AI Engineering

Building with AI, not just on top of it

Most portfolios treat AI as a buzzword. I treat it as engineering: prompts versioned like code, outputs evaluated like tests, security owned end-to-end. No model is going to do that for you.

Evals before vibes

AI output is non-deterministic. If you can't measure it, you're shipping vibes. Every production prompt gets a small eval set: input fixtures, expected behavior, regression guard. Same discipline as unit tests.

Prompts are code

Prompts get versioned, reviewed, and structured. Schemas for outputs. Few-shot examples checked into the repo. Models drift between releases; a prompt that drifts silently is a bug that ships to users.

Local-first when it matters

Not every problem needs a frontier model. Code search, classification, summarization: Ollama or MLX on-device is faster, private, and free. Reach for the API when the task actually needs it.

Capabilities

Eight domains I work in production. Two anchor most projects; six round out the surface area. The tools listed are what I reach for, not a wishlist.

LLM apps & AI SDK

Streaming chatbots, tool-calling agents, and generative interfaces built on the Vercel AI SDK. Anthropic, OpenAI, and Gemini sit behind the AI Gateway so failover, observability, and zero-retention come for free. Tokens stream as they arrive; cancel and regenerate work the way the user expects, not the way the model defaults.

AI security & evals

Prompt-injection defenses, output validation with Zod, and red-team suites that run on every prompt change. The same posture that powers ShipSafe's scanner for AI-generated apps. If the model can break it, the eval should catch it before it ships.

RAG & embeddings

Retrieval pipelines from raw docs to grounded answers. Chunking strategies, hybrid search, re-ranking. pgvector for Postgres-native search, Pinecone when scale demands it.

pgvectorPineconeHybrid searchRe-rankers

Agents & workflows

Multi-step agents with tool use and durable orchestration. Vercel Queues for at-least-once delivery, structured outputs for type-safe handoffs between steps.

Tool useVercel QueuesWorkflowsStructured outputs

MCP servers & tools

Custom Model Context Protocol servers that expose private data and workflows to Claude and other MCP clients. The bridge between an LLM and the systems it actually needs.

MCP serversTool designClaude clientsSchema-first

Prompt engineering

Prompts treated as a code artifact: structured outputs with Zod schemas, few-shot fixtures committed to the repo, system prompts versioned with the app. No magic strings.

Structured outputsZod schemasFew-shotChain-of-thought

AI product UX

The human side: streaming responses, cancel and regenerate affordances, generative interfaces, conversational error states. Designing for non-determinism without making it feel broken.

Streaming UIGenerative UIConversational UXAffordances

Local & on-device

Ollama and MLX for private, offline-capable inference. Useful for code search, classification, and any pipeline where round-tripping to an API isn't worth the latency or the bill.

OllamaMLXllama.cppOn-device

Anchor Case Study

Security scanning is an AI problem now

ShipSafe audits apps built with Bolt, Lovable, and Cursor before they reach production. The scanner is a security tool. The problem space is something else: how LLMs generate code, what they get wrong, what they leak, what fails silently. Building ShipSafe keeps me honest about the AI-codegen ecosystem.

Read the case study

What I build for myself

The fastest way to know whether someone understands AI tooling is to look at what they built for their own workflow. Mine has four layers.

01
MCP servers for Claude
Custom Model Context Protocol servers wired into my daily workflow. They expose project data, search, and deploys to Claude Code so I never copy-paste context into a chat window.
02
AI coding workflows
Claude Code skills, slash commands, and custom subagents that turn ad-hoc 'write me this' prompts into reproducible work. Four phases: research, plan, execute, verify. Checkpoints between each.
03
Personal automation agents
Agents that handle the recurring work I don't want to: research synthesis, content drafts, code reviews, deploy verification. Each one owns one job end-to-end and reports back when it's done.
04
RAG over my own knowledge
A retrieval pipeline over my notes, past projects, and reference material. Embedded once, searchable across every agent and every session. The closest thing I have to a personal model.

The stack

Tools I reach for first. Opinionated, not exhaustive.

Models: Claude (Sonnet, Opus)GPT-4/5Gemini 2.5Llama 3Mistral
Frameworks: Vercel AI SDK v6AI GatewayMCPLangChain (selectively)
Data layer: pgvectorPineconeSupabaseConvex (reactive)
Infra & ops: Vercel Functions (Fluid)Vercel QueuesSentryEval CI gates

Have an AI project?

Three lines is enough: what you're building, what it needs to do, what's stuck. I reply in 48 hours, weekends included.

Security scanning is an AI problem now

GOLDSTACK

Building with AI, not just on top of it

Evals before vibes

Prompts are code

Local-first when it matters

Capabilities

LLM apps & AI SDK

AI security & evals

RAG & embeddings

Agents & workflows

MCP servers & tools

Prompt engineering

AI product UX

Local & on-device

Security scanning is an AI problem now

What I build for myself

MCP servers for Claude

AI coding workflows

Personal automation agents

RAG over my own knowledge

The stack

Have an AI project?

Building with AI, not just on top of it

Evals before vibes

Prompts are code

Local-first when it matters

Capabilities

LLM apps & AI SDK

AI security & evals

RAG & embeddings

Agents & workflows

MCP servers & tools

Prompt engineering

AI product UX

Local & on-device

Security scanning is an AI problem now

What I build for myself

MCP servers for Claude

AI coding workflows

Personal automation agents

RAG over my own knowledge

The stack

Have an AI project?

G