Building with AI, not just on top of it
Most portfolios treat AI as a buzzword. I treat it as engineering: prompts versioned like code, outputs evaluated like tests, security owned end-to-end. No model is going to do that for you.
Evals before vibes
AI output is non-deterministic. If you can't measure it, you're shipping vibes. Every production prompt gets a small eval set: input fixtures, expected behavior, regression guard. Same discipline as unit tests.
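A minimal sketch of what such an eval set can look like. The `classifyTicket` function, the fixture shape, and the 100% pass threshold are all illustrative assumptions; the model call is replaced by a deterministic stub so the example runs offline.

```typescript
// Hypothetical fixture shape: an input plus a check on the expected behavior.
type EvalCase = {
  name: string;
  input: string;
  check: (output: string) => boolean;
};

type EvalReport = { failures: string[]; passRate: number };

// Stand-in for the model call (assumption). A real call would be async and
// non-deterministic; this stub is deterministic so the sketch runs offline.
function classifyTicket(input: string): string {
  return /refund|charge/i.test(input) ? "billing" : "general";
}

const ticketCases: EvalCase[] = [
  { name: "refund goes to billing", input: "I want a refund", check: (o) => o === "billing" },
  { name: "greeting stays general", input: "Hello there", check: (o) => o === "general" },
];

// Run every fixture and collect the names of the cases that failed.
function runEvals(model: (input: string) => string, cases: EvalCase[]): EvalReport {
  const failures = cases.filter((c) => !c.check(model(c.input))).map((c) => c.name);
  const passRate = cases.length === 0 ? 1 : (cases.length - failures.length) / cases.length;
  return { failures, passRate };
}

// Regression guard: a CI step could fail the build when the pass rate drops.
const ticketReport = runEvals(classifyTicket, ticketCases);
if (ticketReport.passRate < 1) {
  throw new Error(`Eval regressions: ${ticketReport.failures.join(", ")}`);
}
```

With a real model the checks would usually assert on properties of the output (a label, a schema, a forbidden phrase) rather than exact strings, and the threshold might sit below 100%.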
Prompts are code
Prompts get versioned, reviewed, and structured. Schemas for outputs. Few-shot examples checked into the repo. Models drift between releases; a prompt that drifts silently is a bug that ships to users.
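One way to make a silent prompt edit visible, sketched under assumptions: each prompt is a typed object with an explicit version, and a fingerprint of its template is recorded in a lockfile that CI can compare against. `PromptDef`, `renderPrompt`, `fingerprint`, and `hasDrifted` are hypothetical names, and the FNV-1a hash is just a dependency-free choice. This catches unversioned prompt changes only; drift in the model itself is what the eval set is for.

```typescript
// Hypothetical prompt definition: the template uses {{name}} placeholders.
type PromptDef = {
  id: string;
  version: string;
  template: string;
};

// Fill placeholders and fail loudly when a variable is missing.
function renderPrompt(prompt: PromptDef, vars: Record<string, string>): string {
  return prompt.template.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => {
    const value = vars[key];
    if (value === undefined) {
      throw new Error(`Prompt ${prompt.id}@${prompt.version} is missing variable "${key}"`);
    }
    return value;
  });
}

// 32-bit FNV-1a hash of the template text, returned as a hex string.
function fingerprint(prompt: PromptDef): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < prompt.template.length; i++) {
    hash ^= prompt.template.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// True when the template changed but the recorded id@version entry did not.
function hasDrifted(prompt: PromptDef, lock: Record<string, string>): boolean {
  const recorded = lock[`${prompt.id}@${prompt.version}`];
  return recorded !== undefined && recorded !== fingerprint(prompt);
}

const summarize: PromptDef = {
  id: "summarize",
  version: "1.0.0",
  template: "Summarize: {{text}}",
};

// The lockfile would be committed to the repo alongside the prompt.
const promptLock: Record<string, string> = {
  "summarize@1.0.0": fingerprint(summarize),
};
```

Editing the template without bumping `version` makes `hasDrifted` return `true`, which a CI step can turn into a failed check.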
Local-first when it matters
Not every problem needs a frontier model. For code search, classification, and summarization, Ollama or MLX on-device is often faster, keeps data private, and costs nothing per request. Reach for the API when the task actually needs it.
Capabilities
Eight domains where I do production work. Two anchor most projects; six round out the surface area. The tools listed are what I reach for, not a wishlist.
LLM apps & AI SDK
Streaming chatbots, tool-calling agents, and generative interfaces built on the Vercel AI SDK. Anthropic, OpenAI, and Gemini sit behind the AI Gateway so failover, observability, and zero-retention come for free. Tokens stream as they arrive; cancel and regenerate work the way the user expects, not the way the model defaults.
AI security & evals
Prompt-injection defenses, output validation with Zod, and red-team suites that run on every prompt change. The same posture that powers ShipSafe's scanner for AI-generated apps. If the model can break it, the eval should catch it before it ships.
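As an illustration of output validation, here is a dependency-free sketch that treats the model's reply as untrusted input. It uses a hand-written type guard as a stand-in for a Zod schema so it runs without installing anything; the `Verdict` shape and `parseVerdict` name are hypothetical.

```typescript
// Hypothetical structured output the prompt asks the model to return.
type Verdict = { label: "safe" | "unsafe"; confidence: number };

type ParseResult =
  | { ok: true; value: Verdict }
  | { ok: false; error: string };

function isLabel(value: unknown): value is Verdict["label"] {
  return value === "safe" || value === "unsafe";
}

// Parse and validate raw model text. Nothing is trusted until it passes.
function parseVerdict(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "not valid JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, error: "expected an object" };
  }
  const record = data as Record<string, unknown>;
  const label = record["label"];
  const confidence = record["confidence"];
  if (!isLabel(label)) {
    return { ok: false, error: "label must be 'safe' or 'unsafe'" };
  }
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    return { ok: false, error: "confidence must be a number in [0, 1]" };
  }
  return { ok: true, value: { label, confidence } };
}
```

A red-team fixture then becomes an ordinary test: feed the parser a reply in which an injected instruction has pushed the model off-format, and assert that the result is `ok: false` rather than a value the rest of the app would act on.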
RAG & embeddings
Retrieval pipelines from raw docs to grounded answers. Chunking strategies, hybrid search, re-ranking. pgvector for Postgres-native search, Pinecone when scale demands it.
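Hybrid search needs some way to merge a keyword ranking with a vector ranking. One common option is reciprocal rank fusion (RRF), sketched below; it is shown as an example of the idea, not as the specific method used in these pipelines. The document IDs are made up, and `k = 60` is the conventional default constant.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per document,
// where rank is the 1-based position of the document in that list.
function reciprocalRankFusion(rankings: string[][], k: number = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map((entry) => entry[0]);
}

// Made-up result lists from a keyword index and a vector index.
const keywordHits = ["doc-a", "doc-b", "doc-c"];
const vectorHits = ["doc-b", "doc-d", "doc-a"];

const fusedHits = reciprocalRankFusion([keywordHits, vectorHits]);
console.log(fusedHits); // → ["doc-b", "doc-a", "doc-d", "doc-c"]
```

Documents that appear in both lists rise to the top, and because only ranks are used, the keyword and vector scores never have to be normalized onto a common scale. A re-ranking step would then reorder the top of this fused list.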
Agents & workflows
Multi-step agents with tool use and durable orchestration. Vercel Queues for at-least-once delivery, structured outputs for type-safe handoffs between steps.
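At-least-once delivery means the same message can arrive more than once, so each step handler has to be idempotent. The sketch below shows the general pattern with a deduplication key. It does not use any real queue API; `StepMessage` and `IdempotentHandler` are hypothetical names, and the in-memory `Set` is a stand-in for a durable store.

```typescript
// Hypothetical message shape for a handoff between agent steps.
type StepMessage = { id: string; orderId: string };

class IdempotentHandler {
  // Stand-in for a durable store (assumption: production would persist this).
  private readonly processed = new Set<string>();
  // Counts how many times the real work ran, for illustration only.
  public sideEffects = 0;

  handle(message: StepMessage): "processed" | "skipped" {
    if (this.processed.has(message.id)) {
      return "skipped"; // duplicate delivery: do nothing
    }
    this.sideEffects += 1; // stand-in for the real work of the step
    this.processed.add(message.id);
    return "processed";
  }
}

const stepHandler = new IdempotentHandler();
const chargeMessage: StepMessage = { id: "msg-1", orderId: "order-42" };

const firstDelivery = stepHandler.handle(chargeMessage);
const secondDelivery = stepHandler.handle(chargeMessage); // redelivery of the same message
console.log(firstDelivery, secondDelivery, stepHandler.sideEffects); // → processed skipped 1
```

In a real system the deduplication record and the side effect need to commit together (for example, in one database transaction), otherwise a crash between the two can still cause a repeat or a lost step.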
MCP servers & tools
Custom Model Context Protocol servers that expose private data and workflows to Claude and other MCP clients. The bridge between an LLM and the systems it actually needs.
Prompt engineering
Prompts treated as a code artifact: structured outputs with Zod schemas, few-shot fixtures committed to the repo, system prompts versioned with the app. No magic strings.
AI product UX
The human side: streaming responses, cancel and regenerate affordances, generative interfaces, conversational error states. Designing for non-determinism without making it feel broken.
Local & on-device
Ollama and MLX for private, offline-capable inference. Useful for code search, classification, and any pipeline where round-tripping to an API isn't worth the latency or the bill.
Security scanning is an AI problem now
ShipSafe audits apps built with Bolt, Lovable, and Cursor before they reach production. The scanner is a security tool. The problem space is something else: how LLMs generate code, what they get wrong, what they leak, what fails silently. Building ShipSafe keeps me honest about the AI-codegen ecosystem.
What I build for myself
The fastest way to know whether someone understands AI tooling is to look at what they built for their own workflow. Mine has four layers.
- 01
MCP servers for Claude
Custom Model Context Protocol servers wired into my daily workflow. They expose project data, search, and deploys to Claude Code so I never copy-paste context into a chat window.
- 02
AI coding workflows
Claude Code skills, slash commands, and custom subagents that turn ad-hoc 'write me this' prompts into reproducible work. Four phases: research, plan, execute, verify. Checkpoints between each.
- 03
Personal automation agents
Agents that handle the recurring work I don't want to do: research synthesis, content drafts, code reviews, deploy verification. Each one owns one job end-to-end and reports back when it's done.
- 04
RAG over my own knowledge
A retrieval pipeline over my notes, past projects, and reference material. Embedded once, searchable across every agent and every session. The closest thing I have to a personal model.
The stack
Tools I reach for first. Opinionated, not exhaustive.
- Models
- Claude (Sonnet, Opus), GPT-4/5, Gemini 2.5, Llama 3, Mistral
- Frameworks
- Vercel AI SDK v6, AI Gateway, MCP, LangChain (selectively)
- Data layer
- pgvector, Pinecone, Supabase, Convex (reactive)
- Infra & ops
- Vercel Functions (Fluid), Vercel Queues, Sentry, Eval CI gates
Have an AI project?
Three lines is enough: what you're building, what it needs to do, what's stuck. I reply within 48 hours, weekends included.