ROSHAN
Hi, I'm Roshan Kharel 👋

AI Product Engineer

Building intelligent products at the intersection of AI, design, and engineering.

LLM Systems
RAG Pipelines
Full-Stack Development
AI Agents
Product Design
Machine Learning
DevOps & Cloud
UI/UX Engineering

Selected Projects

08 Works
AI Avatar

SIMO Avatar

Vision-aware 3D AI avatar that sees, hears, and responds in real time. It uses GPT-4 Vision for scene understanding, MediaPipe and face-api.js for emotion and gesture detection, a Whisper STT → Pinecone RAG → ElevenLabs TTS pipeline, and HeyGen lip-sync. Demoed live at the Dubai AI Summit.

GPT-4 Vision · MediaPipe · face-api.js · Whisper · ElevenLabs · HeyGen · Pinecone RAG · WebSocket
Voice AI

AllysAI Consulting Agent

Enterprise voice consulting agent with sub-4s end-to-end latency. Whisper STT → Pinecone RAG → GPT-4o-mini → ElevenLabs TTS pipeline with barge-in support (users can interrupt mid-response). Surfaces live lead-qualification metrics, including readiness score, fit score, and ROI projections.

Whisper STT · Pinecone RAG · GPT-4o-mini · ElevenLabs · WebSockets · SQLite
AI SaaS

Klavy

Full enterprise AI SaaS for the French rental market (~40K LOC). 14-node LangGraph agent for intent classification & legal Q&A. Document AI with DeepSeek OCR + extraction + risk scoring. Legal RAG over 3K Légifrance law chunks. Multi-channel (WhatsApp + in-platform), Stripe billing, GDPR-compliant on Azure France Central.

LangGraph · pgvector · DeepSeek OCR · Next.js · FastAPI · Stripe · Celery · Azure
Voice Agent

Sarathi Voice Agent

Production government voice assistant for Northeast India supporting Assamese, Bodo, and Hindi. Fully local speech pipeline (IndicConformer STT + VITS ONNX TTS) running on a $24/month server. Cross-lingual RAG with Pinecone + ChromaDB, Gemini LLM, and streaming NDJSON responses.

IndicConformer · VITS ONNX · Pinecone · ChromaDB · Gemini · FastAPI · Next.js
Crisis Intel

SALAMA Crisis Agent

Real-time crisis intelligence platform built in 1 week during a live UAE emergency. Aggregates verified news from 10+ official UAE sources, fact-checks social media claims via the Claude API, tracks sentiment, and provides AI-powered Q&A with citations. Real-time updates are delivered over SSE.

FastAPI · Next.js · Claude API · PostgreSQL · SSE · RSS Scraping · Sentiment
Privacy AI

VaultAI

Local-first privacy AI desktop app that runs 100% offline. Fully air-gapped LLM chat via Ollama, AES-256 encryption at rest, document analysis (PDF, TXT, MD, JSON), zero telemetry. Built for users who need AI without data leaving their machine.

Electron · React · Ollama · AES-256 · SQLite · TypeScript · Tailwind
Scraper

Hyves

Production WhatsApp luxury watch market scraper (live, deployed on DigitalOcean). Real-time message capture from trading groups, deduplication, contact enrichment, Google Sheets export. Circuit breaker pattern + rate limiting for resilience. Next.js dashboard + 7K LOC FastAPI backend.

FastAPI · WAHA · Next.js · Google Sheets API · SQLite · Docker · DigitalOcean
AI Tool

Resume Weaver

AI resume tailoring tool (8.2K LOC). Upload resume PDF → paste job description → GPT-4 rewrites bullets with targeted keywords while preserving tone. Smart constraints: won’t fabricate experience, won’t add fake jobs, won’t modify dates. Keyword match report + cover letter generation + PDF export.

React · TypeScript · OpenAI · pdfjs-dist · Vite · shadcn/ui · Tailwind

Core Expertise

AI & LLM Systems

LangGraph · RAG Pipelines · Pinecone · pgvector · ChromaDB · OpenAI · Claude API · Gemini · Whisper · ElevenLabs · Ollama · ONNX · Embeddings · Prompt Engineering · Reinforcement Learning

Full-Stack Engineering

React · Next.js · TypeScript · Python · FastAPI · Django · Node.js · Express · PostgreSQL · SQLite · Redis · Prisma · SQLAlchemy · REST APIs · WebSockets · GraphQL

Computer Vision & Voice

GPT-4 Vision · MediaPipe · face-api.js · Moondream · EasyOCR · Presidio PII · Speech-to-Text · Text-to-Speech · HeyGen · Emotion Detection · On-device ML · Cross-lingual NLP

Cloud & Infrastructure

Docker · Google Cloud · Azure · AWS · Vercel · DigitalOcean · Railway · Celery · Nginx · PM2 · CI/CD · Git

Product & Shipping

Electron · React Native · Stripe · SendGrid · WhatsApp API · Google Sheets API · JWT · OAuth · GDPR · Tailwind · Vite · Figma

Work Experience

AllysAI
AI Engineer
Nov 2024 — Present Active

Building enterprise AI products at a Dubai-based agency. Developed SIMO, a vision-aware interactive AI avatar with voice, emotion detection, and gesture recognition, showcased at the Dubai AI Summit. Currently building the AllysAI Consulting Agent with sub-4s latency and real-time lead qualification.

Computer Vision · Voice AI · Emotion Detection · LLM Agents · WebSockets
Sarathi Studio
Founder
Jan 2024 — Present Active

Founded an AI agency serving 95+ clients. Built the Sarathi Voice Agent — a production voice assistant for government services in Assamese, Bodo, and Hindi running local speech models on a $24/month server. Also shipped SALAMA (crisis intelligence platform in 1 week), an election intelligence system for a real 2026 campaign, and a digital library serving 30+ schools.

95+ Clients · Multilingual STT/TTS · Edge AI · Crisis Intel · LangGraph · Election Tech
Magic Square
Full-Stack Engineer — Data & ML
May 2022 — Oct 2023

Built real-time analytics dashboards (React, Next.js, D3.js) visualizing 190K+ user events through a Python ETL pipeline on Google Cloud (Pub/Sub → BigQuery). Developed microservices with FastAPI & Node.js on GKE. Implemented user-churn prediction with XGBoost that lifted 7-day retention by 12%.

React / Next.js · D3.js · Python ETL · Google Cloud · XGBoost · FastAPI · +12% Retention
LunarCrush
Software Engineer — Data Products
Oct 2021 — Nov 2022

Integrated a BERT-based sentiment API (Hugging Face Transformers) into a Node backend, reducing inference latency by 40%. Created a Python pipeline for social-metric embeddings (UMAP) powering a React heat map that surfaces trending tokens to 50K daily users. Built a custom analytics bot for a 10K-member developer Discord.

BERT / HuggingFace · Sentiment Analysis · UMAP Embeddings · 50K DAU · React · -40% Latency

About Me

I'm Roshan, an AI Product Engineer who sits at the intersection of artificial intelligence, software engineering, and product thinking.

I don't just build models — I build products. From ideation to production, I craft intelligent systems that people actually want to use. My work spans LLM applications, full-stack development, and the art of turning messy real-world problems into clean, scalable solutions.

When I'm not shipping code, you'll find me exploring the latest in AI research, contributing to open-source, or mentoring the next generation of engineers.

Thinking Out Loud

06 Articles
01
LLM Engineering

Why RAG Alone Isn't Enough — The Case for Agentic Retrieval

Static retrieval pipelines break when users ask multi-hop questions. Here's how adding planning agents transforms accuracy and user trust.

8 min read
02
Product Thinking

Shipping AI Products: What PMs Get Wrong About Latency Budgets

Users don't wait 12 seconds for a "smarter" answer. How I cut LLM response times by 70% through streaming and caching, without sacrificing quality.

6 min read
03
Architecture

Designing Multi-Agent Systems That Don't Collapse Under Their Own Weight

Orchestrating 5+ LLM agents sounds cool until they hallucinate in a loop. A practical framework for guardrails, fallbacks, and graceful degradation.

11 min read
04
Dev Experience

The Stack I Use to Prototype AI Features in a Weekend

FastAPI + LangChain + Next.js + Vercel. A repeatable blueprint for going from idea to deployed demo in 48 hours flat.

5 min read
05
Fine-Tuning

When to Fine-Tune vs. When to Prompt Engineer — A Decision Framework

Not every problem needs a custom model. I break down cost, latency, and accuracy tradeoffs with real benchmarks from production systems.

9 min read
06
Philosophy

The AI Product Engineer — A New Role for a New Era

Neither purely ML nor purely product. Why the most impactful builders in the next decade will be hybrids who speak both languages fluently.

7 min read
Let's collaborate

Got a project in mind?