A week that changed how I build things
In late 2024, there was an urgent need for a construction safety monitoring system in the UAE. Workers were getting injured on sites that had no real-time hazard detection. I didn't have months. I didn't have a team. I had one week, a clear problem, and a stack I'd been refining across a dozen projects.
SALAMA shipped in 7 days. Real-time video analysis detecting safety violations on construction sites: missing hard hats, missing harnesses, workers too close to heavy machinery — all flagged with instant alerts. It wasn't a toy demo. It went into actual use.
That speed wasn't luck. It was the result of having a "golden path" — a set of tools and decisions I've pre-made so I don't waste the first two days of a prototype on stack debates.
Here's the whole thing, layer by layer.
Frontend: Next.js + Tailwind + shadcn/ui
Next.js App Router is the foundation. Not Pages Router — App Router. Server Components mean I can fetch data and render HTML without shipping a JavaScript bundle for every page. For a prototype, this matters more than you'd think: fewer loading spinners, faster perceived performance, and the code is simpler because I'm not managing client-side state for data that doesn't need to be interactive.
Tailwind CSS because I refuse to context-switch between a component file and a stylesheet during a prototype sprint. Everything is inline, everything is scannable. I know some people hate it. I've tried the alternatives. Nothing else lets me go from "blank div" to "polished card component" in 90 seconds.
shadcn/ui is the secret weapon. It's not a component library you install — it's components you copy into your project and own. Dialogs, dropdowns, data tables, command palettes — all accessible, all composable, all customizable because they're literally just files in your repo. When AllysAI needed a complex multi-panel document viewer, I started with shadcn's sheet and dialog primitives and had a working UI in an hour.
# The command I run first on every project
npx create-next-app@latest my-prototype --typescript --tailwind --app --src-dir
npx shadcn@latest init
npx shadcn@latest add button card dialog input textarea
Five minutes. I have a working app with a design system.
Backend: FastAPI with async everything
FastAPI running on uvicorn. Every route handler is async. Every external call uses httpx or the async client of whatever SDK I'm integrating.
Why FastAPI over Express or Django or whatever else? Three reasons:
- Automatic OpenAPI docs. I define a Pydantic model, I get request validation, response serialization, and a Swagger UI that I can hand to a frontend dev (or myself in 3 days when I've forgotten the API shape). Zero extra work.
- Async native. AI features are I/O bound. You're waiting on OpenAI, you're waiting on embedding APIs, you're waiting on vector search. Async lets you fire these in parallel without threads. Sarathi's voice pipeline hits 4 different APIs per request — async brought the P95 from 3.2s to 1.1s.
- Python ecosystem. LangChain, LlamaIndex, transformers, OpenAI SDK — everything in the AI world is Python-first. Fighting that with a Node.js backend means writing wrappers, dealing with subprocess calls, or using half-baked JS ports. Just use Python.
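The async point above is the one that moves latency numbers. A minimal sketch of the pattern, using stdlib `asyncio` with `asyncio.sleep` standing in for the awaited HTTP calls (in a real handler these would be `httpx.AsyncClient` requests; the service names and delays are made up for illustration):

```python
import asyncio
import time

async def call_service(name: str, delay: float) -> str:
    # Stand-in for an awaited upstream call (e.g. via httpx.AsyncClient)
    await asyncio.sleep(delay)
    return f"{name}: ok"

async def handle_request() -> list[str]:
    # Fire all upstream calls concurrently; total latency is the
    # slowest call, not the sum of all of them
    return await asyncio.gather(
        call_service("llm", 0.3),
        call_service("embeddings", 0.2),
        call_service("vector-search", 0.1),
        call_service("tts", 0.2),
    )

start = time.perf_counter()
results = asyncio.run(handle_request())
elapsed = time.perf_counter() - start
print(f"{len(results)} calls in {elapsed:.2f}s")  # ~0.3s, not 0.8s
```

Sequentially these four awaits would cost 0.8s; gathered, the request costs roughly the slowest one. That's the whole mechanism behind a 3.2s → 1.1s P95 drop.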
My base FastAPI setup looks like this:
from contextlib import asynccontextmanager

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: init DB pool, load models, warm caches
    await init_db()
    await warm_embedding_cache()
    yield
    # Shutdown: clean up
    await close_db()

app = FastAPI(lifespan=lifespan)
app.add_middleware(CORSMiddleware, allow_origins=["*"])  # Tighten for prod
That lifespan handler is crucial. Database connections, embedding model warm-up, any expensive initialization — it happens once at startup, not on the first request. Your first user doesn't eat a 5-second cold start.
AI Layer: LangChain when it helps, raw API calls when it doesn't
Hot take: LangChain is great for chains and terrible for simple things.
If I need a straightforward "send prompt, get response" call, I use the OpenAI SDK directly. Wrapping a single API call in LangChain's abstraction adds complexity with zero benefit. You're importing five modules to do what four lines of code can do.
But the moment I need chains — retrieval-augmented generation, multi-step reasoning, tool use — LangChain earns its weight. Klavy's legal RAG pipeline uses LangChain's retrieval chain with a custom reranker, and building that from scratch would have taken days instead of hours.
My rule of thumb:
- Single LLM call → direct API (openai.chat.completions.create)
- RAG / retrieval → LangChain retrieval chain
- Multi-step agent → LangGraph (the state machine, not the LangChain agent executor)
- Embeddings → direct API call, cache aggressively
Don't let anyone tell you it's "all or nothing." Mix and match. Use the right abstraction level for each task.
Vector DB: Pinecone to start, pgvector to stay
Pinecone for day one. Managed service, generous free tier, dead-simple SDK. I can go from "no vector search" to "semantic search over my documents" in 20 minutes. For a weekend prototype, this is unbeatable.
pgvector for anything going to production. Once I know the prototype has legs, I migrate to pgvector. Why? Because now my vectors live in the same PostgreSQL database as my relational data. One backup strategy. One connection pool. One fewer service to monitor at 3 AM.
Klavy runs 3,000 French-law embeddings on pgvector with an HNSW index. Query time: ~40ms for top-10 retrieval. That's plenty fast for a chat interface. You don't need a dedicated vector database for most use cases. You really don't.
-- pgvector setup: three statements to semantic search
CREATE EXTENSION vector;

CREATE TABLE documents (
    id serial PRIMARY KEY,
    content text,
    embedding vector(1536)
);

CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
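Retrieval against that table is one query. `<=>` is pgvector's cosine-distance operator (matching the `vector_cosine_ops` index above), and `$1` is a placeholder for the parameterized 1536-dim query embedding:

```sql
-- Top-10 nearest documents by cosine distance
SELECT id, content, embedding <=> $1 AS distance
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;
```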
Deployment: split brains, small bills
Frontend on Vercel. Push to main, it deploys. Preview URLs for every PR. Edge functions if I need them. The DX is unmatched and the free tier covers most prototypes.
Backend on DigitalOcean or Railway. For prototypes, Railway is magic — connect a GitHub repo, it auto-deploys. For anything that needs to stick around, I use a DigitalOcean droplet. Sarathi — a full voice-to-voice AI assistant with real-time streaming — runs on a $24/month DigitalOcean droplet. Not a typo. Twenty-four dollars a month for a production AI service handling concurrent voice streams.
The trick is that the expensive compute (LLM inference, embeddings) happens on the API provider's side. Your backend is just orchestrating HTTP calls and managing state. You don't need beefy hardware for that.
The golden path: idea to demo in 48 hours
Here's what the first 48 hours actually look like:
Hour 0-2: Scaffold and wire.
# Project structure I always start with
my-prototype/
  frontend/              # Next.js app
    src/
      app/
        page.tsx         # Landing / main interface
        api/             # API route proxies (optional)
        layout.tsx
      components/
        ui/              # shadcn components
      lib/
        api.ts           # Backend API client
  backend/               # FastAPI
    app/
      main.py            # FastAPI app + routes
      services/          # Business logic
      models/            # Pydantic schemas
    requirements.txt
  docker-compose.yml     # Postgres + Redis locally
Hour 2-8: Build the AI core. This is where the actual value lives. Forget the UI — build the pipeline. Can you take input, process it through your AI layer, and get a useful output? For SALAMA, this was "video frame in, safety violations out." For Sarathi, "audio in, intelligent response audio out." Get this working in a Python script before you touch the frontend.
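The "Python script before frontend" step looks like this in practice — a hypothetical sketch of a SALAMA-style "frame in, violations out" core, where `detect_violations` is a stub standing in for the real vision model (YOLO, a hosted API, whatever you're actually using):

```python
from dataclasses import dataclass

@dataclass
class Violation:
    worker_id: int
    kind: str          # e.g. "missing_hard_hat", "machinery_proximity"
    confidence: float

def detect_violations(frame: bytes) -> list[Violation]:
    # Stand-in for the real model call; hard-coded output so the
    # pipeline shape can be exercised end to end before the model exists
    return [Violation(worker_id=7, kind="missing_hard_hat", confidence=0.91)]

def process_frame(frame: bytes, threshold: float = 0.8) -> list[dict]:
    # The entire "AI core": frame in, filtered alerts out.
    # Prove this works in a script before any FastAPI route exists.
    hits = [v for v in detect_violations(frame) if v.confidence >= threshold]
    return [{"worker": v.worker_id, "violation": v.kind} for v in hits]

alerts = process_frame(b"fake-jpeg-bytes")
print(alerts)
```

Once `process_frame` produces useful output on real inputs, wrapping it in an API (the next step) is mechanical.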
Hour 8-16: Wrap it in an API. FastAPI routes that expose the AI pipeline. Pydantic models for request/response. Basic error handling. Hit it with curl or the Swagger UI to make sure it works.
Hour 16-36: Build the interface. Now the frontend. Connect to the API. Build the minimum UI that lets someone experience the AI feature. Not beautiful — functional. shadcn/ui components, Tailwind styling, maybe one custom animation if I'm feeling fancy.
Hour 36-48: Polish and deploy. Error states. Loading indicators. Mobile responsiveness. Push frontend to Vercel, backend to Railway. Share the URL.
That's it. That's the whole process. No architecture diagrams. No design reviews. No sprint planning. Just build the thing.
Stop overthinking your stack
The best AI prototype is the one that ships this weekend.
I've watched talented engineers spend weeks evaluating vector databases, debating LangChain vs LlamaIndex vs building from scratch, and setting up elaborate CI/CD pipelines — for a prototype. A prototype that, by definition, exists to test whether the idea even works.
Pick a stack. Learn it deeply. Use it for everything until you hit a wall that genuinely requires a different tool. My stack isn't the objectively best stack. It's the stack I know so well that the tools disappear and I can focus on the actual problem.
SALAMA didn't ship in a week because I picked the perfect technologies. It shipped because I didn't waste a single hour on decisions I'd already made. The vector DB was pgvector because it's always pgvector. The frontend was Next.js because it's always Next.js. The backend was FastAPI because it's always FastAPI.
Make your decisions once. Then go build something.