Staff Engineer · Airbnb · writing since 2026
Agentic AI on systems of record, made reliable.
Fifteen years building enterprise platforms. I architect agentic AI on the systems businesses actually run on — and write about the engineering around the model: the guardrails, the human-in-the-loop, and the parts the hype skips.
Featured — selected writing
02 pieces
Putting AI in Front of a Platform: Lessons from Real Systems
Adding an LLM to a greenfield app is easy. Adding one to an enterprise platform — with its permissions model, data gravity, and audit requirements — is a different problem. What changes, and what you have to respect.
Designing Reliable AI Agents on Top of Enterprise Platforms
An agent that can take actions in a system of record is powerful and dangerous in equal measure. The guardrails — permissions, idempotency, audit, and human checkpoints — that make autonomous actions safe on business-critical data.
Latest — most recent
Putting AI in Front of a Platform: Lessons from Real Systems
Adding an LLM to a greenfield app is easy. Adding one to an enterprise platform — with its permissions model, data gravity, and audit requirements — is a different problem. What changes, and what you have to respect.
Building Reliable LLM Features: What Production Actually Demands
The model is the easy part now. Here are the engineering patterns — validation, evals, observability, and designing for the wrong answer — that decide whether an LLM feature holds up with real users.
Designing Reliable AI Agents on Top of Enterprise Platforms
An agent that can take actions in a system of record is powerful and dangerous in equal measure. The guardrails — permissions, idempotency, audit, and human checkpoints — that make autonomous actions safe on business-critical data.
Non-Functional Requirements for AI Systems: What Staff Engineers Should Specify
Teams obsess over what an AI feature should do and forget to specify how well it must do it. A checklist of the non-functional requirements — accuracy, latency, cost, fallback, governance — that decide whether it's production-ready.
How I Evaluate LLM Output Without a Ground-Truth Dataset
You almost never have labeled data when you ship an AI feature. Here's a practical progression — from a hand-built eval set to LLM-as-judge — for measuring quality when there's no answer key.