Building Production Apps with Claude API: What I Learned After 6 Months

After shipping three production apps using Claude API, here's what actually matters when building real-world AI features.

January 23, 2026 · 7 min read

Building your first AI feature feels magical. You send a prompt, get back exactly what you wanted, and suddenly you're planning the next unicorn startup. Then reality hits when you try to make it work reliably for thousands of users.

I've been building production applications with Claude API for about six months now, shipping everything from a content generation tool for marketers to a code review assistant for development teams. The gap between "it works in my local environment" and "it works for paying customers" taught me more about AI integration than any tutorial ever could.

The Foundation: Getting Your Architecture Right

Most developers jump straight into prompt engineering, but your architecture decisions matter more than your prompts. I learned this the hard way when my first Claude integration brought down our main API because I didn't isolate the AI calls properly.

Here's the setup I use now for every Claude integration:

typescript
// api/claude/client.ts
import Anthropic from '@anthropic-ai/sdk';

class ClaudeClient {
  private client: Anthropic;
  private readonly maxRetries = 3;
  private readonly timeoutMs = 30000;

  constructor() {
    this.client = new Anthropic({
      apiKey: process.env.ANTHROPIC_API_KEY,
      timeout: this.timeoutMs,
    });
  }

  async generateWithRetry(messages: Anthropic.MessageParam[], options = {}) {
    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      try {
        return await this.client.messages.create({
          // Pinned, dated model ID; older IDs get retired, so check the current model list
          model: 'claude-3-sonnet-20240229',
          max_tokens: 1000,
          messages,
          ...options,
        });
      } catch (error) {
        if (attempt === this.maxRetries) throw error;
        const delay = Math.pow(2, attempt) * 1000; // Exponential backoff: 2s, then 4s
        await new Promise(resolve => setTimeout(resolve, delay));
      }
    }
    // Unreachable: the final attempt either returns or rethrows
    throw new Error('generateWithRetry exhausted retries');
  }
}

The key insight here is treating Claude as an external service that will fail. Rate limits, timeouts, and occasional errors are features, not bugs. Your app needs to handle them gracefully.
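
One refinement worth considering: the retry loop above retries every error, including ones that will never succeed on a second attempt (a malformed request, a bad API key). Here is a minimal sketch of separating transient failures from permanent ones, assuming the thrown error exposes a numeric HTTP `status` field. `isRetryable` and `backoffDelayMs` are hypothetical helper names of my own, not SDK functions, and the status list is an illustrative choice.

```typescript
// Hypothetical helpers; names and thresholds are illustrative assumptions.

/** Client-side statuses that usually indicate a transient failure. */
const RETRYABLE_STATUSES = [408, 429];

/** Decide whether an error is worth retrying, based on an optional HTTP status. */
function isRetryable(error: unknown): boolean {
  const status = (error as { status?: unknown } | null)?.status;
  // No numeric status: most likely a network error or timeout, so retry.
  if (typeof status !== 'number') return true;
  return status >= 500 || RETRYABLE_STATUSES.indexOf(status) !== -1;
}

/**
 * Exponential backoff with "full jitter": a random delay between 0 and
 * baseMs * 2^attempt (capped at maxMs), so many clients don't retry in lockstep.
 * `random` is injectable to make the function deterministic in tests.
 */
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  maxMs = 30000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}
```

In the loop, the rethrow condition would become `attempt === this.maxRetries || !isRetryable(error)`, and the fixed delay would be replaced by `backoffDelayMs(attempt)`.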

Prompt Engineering for Reliability, Not Cleverness

I used to write prompts like I was crafting poetry. Lots of context, creative instructions, and complex examples. Then I discovered that production prompts need to be boring and predictable.

Here's what actually works:

typescript
const SYSTEM_PROMPT = `You are a code reviewer. Your job is to:
1. Find potential bugs or issues
2. Suggest improvements for readability

Rules:
- Always respond in valid JSON format
- If no issues found, return {"issues": []}
- Each issue must have: type, line, description, severity
- Severity levels: "low", "medium", "high"`;

const userPrompt = `Review this code:

\`\`\`${language}
${code}
\`\`\`

Respond with JSON only.`;

Notice how specific this is. I'm telling Claude exactly what format I want, what fields are required, and what happens in edge cases. This isn't creative writing—it's API design.

The "respond with JSON only" instruction at the end? That came after debugging why my parser was breaking on responses that included explanatory text before the JSON.
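
The instruction reduces stray text but, in my experience, doesn't guarantee it disappears. As a fallback, here is a rough sketch of pulling the first balanced `{...}` out of a response that has prose around it. `extractFirstJsonObject` is a hypothetical helper, and it has a known limitation: if the prose itself contains a `{` before the real JSON, extraction starts there and the later `JSON.parse` will fail.

```typescript
/**
 * Return the first balanced {...} substring in `text`, or null if none is found.
 * Braces that appear inside JSON string literals are ignored.
 */
function extractFirstJsonObject(text: string): string | null {
  const start = text.indexOf('{');
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      // Inside a string: only track escapes and the closing quote.
      if (escaped) escaped = false;
      else if (ch === '\\') escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === '{') depth++;
    else if (ch === '}') {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // Unbalanced braces: treat as "no JSON found"
}
```

The result still needs `JSON.parse` and validation; this only strips the chatter around the payload.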

Handling the Messy Reality of LLM Outputs

Claude is incredibly good at following instructions, but "incredibly good" isn't "perfect." You need robust parsing and validation for every response.

typescript
import { z } from 'zod';

function parseClaudeResponse<T>(response: string, schema: z.ZodSchema<T>): T {
  // Claude sometimes wraps JSON in markdown code blocks
  const cleanedResponse = response
    .replace(/```json\n?/g, '')
    .replace(/\n?```/g, '')
    .trim();

  try {
    const parsed = JSON.parse(cleanedResponse);
    return schema.parse(parsed); // Using Zod for validation
  } catch (error) {
    console.error('Claude response parsing failed:', {
      original: response,
      cleaned: cleanedResponse,
      // `error` is `unknown` in strict TypeScript, so narrow before reading `.message`
      error: error instanceof Error ? error.message : String(error),
    });
    // Return a safe default or throw a custom error
    throw new Error('Failed to parse AI response');
  }
}

I use Zod schemas to validate every Claude response. It catches issues early and makes debugging much easier. The cleaning step handles Claude's tendency to wrap JSON in markdown code blocks, even when you specifically ask for "JSON only."
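
To make concrete what a schema check does for the review format from the earlier prompt, here is a dependency-free sketch of the same validation written by hand. It is an illustration, not a replacement for Zod: the `Issue` and `ReviewResult` types and the function names are my own assumptions, derived from the fields the system prompt asks for.

```typescript
type Severity = 'low' | 'medium' | 'high';

interface Issue {
  type: string;
  line: number;
  description: string;
  severity: Severity;
}

interface ReviewResult {
  issues: Issue[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

/** True only if every required field is present with the expected type. */
function isIssue(value: unknown): value is Issue {
  if (!isRecord(value)) return false;
  return (
    typeof value.type === 'string' &&
    Number.isInteger(value.line) &&
    typeof value.description === 'string' &&
    (value.severity === 'low' || value.severity === 'medium' || value.severity === 'high')
  );
}

/** Validate parsed JSON; throws with a message naming the first bad entry. */
function validateReview(parsed: unknown): ReviewResult {
  const rawIssues: unknown = isRecord(parsed) ? parsed.issues : undefined;
  if (!Array.isArray(rawIssues)) {
    throw new Error('Expected an object with an "issues" array');
  }
  const issues: Issue[] = [];
  rawIssues.forEach((item: unknown, index: number) => {
    if (!isIssue(item)) throw new Error(`Invalid issue at index ${index}`);
    issues.push(item);
  });
  return { issues };
}
```

The point of failing loudly here is that a response with `"severity": "critical"` gets rejected at the boundary instead of leaking into the UI.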

Cost Management That Actually Scales

Claude API pricing is reasonable, but it adds up fast in production. I've seen developers get $500 surprise bills because they didn't think about token usage.

Here's my approach:

typescript
// Track token usage per user/feature.
// Assumes `this.db` (a Prisma-style client) and `checkUserLimits` are defined elsewhere.
class TokenTracker {
  async trackUsage(userId: string, feature: string, inputTokens: number, outputTokens: number) {
    const cost = this.calculateCost(inputTokens, outputTokens);
    
    await this.db.tokenUsage.create({
      data: {
        userId,
        feature,
        inputTokens,
        outputTokens,
        cost,
        timestamp: new Date(),
      },
    });
    
    // Check if user is approaching limits
    await this.checkUserLimits(userId);
  }
  
  private calculateCost(inputTokens: number, outputTokens: number): number {
    // Claude 3 Sonnet pricing (as of writing)
    const inputCost = (inputTokens / 1000000) * 3; // $3 per million input tokens
    const outputCost = (outputTokens / 1000000) * 15; // $15 per million output tokens
    return inputCost + outputCost;
  }
}

I track costs per user and per feature. This data helps me understand which features are expensive and where to optimize. Sometimes the answer is using Claude 3 Haiku instead of Sonnet for simpler tasks.
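
As a rough illustration of how that data turns into a decision, the sketch below sums recorded costs per feature and sorts the most expensive first. It works on an in-memory array for simplicity; `UsageRecord`, `FeatureCost`, and `costByFeature` are hypothetical names, and in practice this would more likely be a `GROUP BY` query against the usage table.

```typescript
interface UsageRecord {
  userId: string;
  feature: string;
  cost: number; // USD for one call
}

interface FeatureCost {
  feature: string;
  totalCost: number;
  calls: number;
}

/** Sum cost and call count per feature, most expensive feature first. */
function costByFeature(records: UsageRecord[]): FeatureCost[] {
  const totals = new Map<string, FeatureCost>();
  for (const record of records) {
    const entry =
      totals.get(record.feature) ?? { feature: record.feature, totalCost: 0, calls: 0 };
    entry.totalCost += record.cost;
    entry.calls += 1;
    totals.set(record.feature, entry);
  }
  return Array.from(totals.values()).sort((a, b) => b.totalCost - a.totalCost);
}
```

Dividing `totalCost` by `calls` gives a per-call average, which is usually the number that tells you whether a feature is a candidate for a cheaper model.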

The Debugging Experience You Need to Plan For

Debugging AI features is different from debugging normal code. When a user reports "the AI gave me a weird response," you need to be able to trace exactly what happened.

I store every Claude interaction:

typescript
interface ClaudeInteraction {
  id: string;
  userId: string;
  feature: string;
  systemPrompt: string;
  userPrompt: string;
  response: string;
  tokensUsed: number;
  responseTimeMs: number;
  timestamp: Date;
  error?: string;
}

This makes debugging possible. When something goes wrong, I can see the exact prompt that caused the issue and iterate on it. Without this logging, you're flying blind.
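
To show what tracing a report looks like in practice, here is a small sketch of a lookup over stored interactions. It keeps everything in memory, which is an assumption for illustration only; a real store would be a database table with an index on `userId` and `timestamp`. The interface is repeated so the snippet stands alone, and `InteractionLog` is a hypothetical name.

```typescript
interface ClaudeInteraction {
  id: string;
  userId: string;
  feature: string;
  systemPrompt: string;
  userPrompt: string;
  response: string;
  tokensUsed: number;
  responseTimeMs: number;
  timestamp: Date;
  error?: string;
}

class InteractionLog {
  private readonly entries: ClaudeInteraction[] = [];

  record(entry: ClaudeInteraction): void {
    this.entries.push(entry);
  }

  /** Newest-first interactions for one user, optionally limited to one feature. */
  recent(userId: string, feature?: string, limit = 10): ClaudeInteraction[] {
    return this.entries
      .filter((e) => e.userId === userId && (feature === undefined || e.feature === feature))
      .sort((a, b) => b.timestamp.getTime() - a.timestamp.getTime())
      .slice(0, limit);
  }

  /** Interactions that recorded an error, for a quick failure review. */
  failures(): ClaudeInteraction[] {
    return this.entries.filter((e) => e.error !== undefined);
  }
}
```

When a "weird response" report comes in, `recent(userId, feature)` gives the exact prompts and responses to replay.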

Performance Patterns That Matter

Claude API calls are slow compared to database queries. Plan your UX accordingly:

  • Stream responses when possible using Claude's streaming API
  • Show progress indicators that feel meaningful, not just spinners
  • Cache aggressively for repeated queries
  • Run AI calls in background jobs for non-interactive features

For our code review tool, I process reviews in the background and notify users when they're ready. Much better UX than making them wait 30 seconds watching a loading spinner.
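
Of the patterns above, caching is the easiest to sketch. The version below is a minimal in-memory cache with per-entry expiry, keyed on everything that affects the output. `TtlCache` and `cacheKey` are hypothetical names, and a single-process `Map` is an assumption; with multiple servers you would likely want a shared store such as Redis instead.

```typescript
/** Cache with per-entry expiry. The clock is injectable so expiry can be tested. */
class TtlCache<V> {
  private readonly store = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.store.get(key);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.store.delete(key); // Expired: drop it so the map doesn't grow forever
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

/** Build a cache key from every input that changes the model's output. */
function cacheKey(model: string, systemPrompt: string, userPrompt: string): string {
  return JSON.stringify([model, systemPrompt, userPrompt]);
}
```

Including the model and system prompt in the key matters: otherwise a prompt change in production keeps serving responses generated by the old prompt until they expire.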

What I'd Do Differently Next Time

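Start with simpler models: I jumped straight to Claude 3 Sonnet for everything. Haiku handles many tasks just fine at a fraction of the cost (about one-twelfth at Claude 3 list prices: $0.25/$1.25 per million input/output tokens for Haiku versus $3/$15 for Sonnet).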

Build evaluation from day one: I wish I'd started with automated testing of my prompts. Now I'm retrofitting evaluation pipelines.

Plan for prompt versioning: When you update prompts in production, you need a way to roll back. Treat them like code deployments.

Monitor token efficiency: Some of my early prompts were hilariously verbose. Shorter, clearer prompts often work better and cost less.
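
For prompt versioning specifically, the core idea fits in a few lines: never overwrite a prompt, only append a new version and move a pointer. The sketch below is one possible shape, kept in memory for illustration; `PromptRegistry` and its methods are hypothetical names, and a real setup might store versions in a database or in the repository itself.

```typescript
interface PromptVersion {
  version: number;
  text: string;
}

class PromptRegistry {
  private readonly versions: PromptVersion[] = [];
  private activeIndex = -1;

  /** Append a new version and make it active. Returns the new version number. */
  publish(text: string): number {
    const version = this.versions.length + 1;
    this.versions.push({ version, text });
    this.activeIndex = this.versions.length - 1;
    return version;
  }

  /** Re-activate an earlier version without deleting anything. */
  rollbackTo(version: number): void {
    const index = this.versions.findIndex((v) => v.version === version);
    if (index === -1) throw new Error(`Unknown prompt version: ${version}`);
    this.activeIndex = index;
  }

  /** The version currently served to users. */
  active(): PromptVersion {
    const current = this.versions[this.activeIndex];
    if (current === undefined) throw new Error('No prompt has been published');
    return current;
  }
}
```

Logging `active().version` alongside each stored interaction makes it possible to tell which prompt produced a given response after a rollback.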

Practical Next Steps

  • Set up proper error handling and retries before writing your first prompt
  • Implement response validation with a schema library like Zod
  • Track token usage and costs from day one
  • Store all AI interactions for debugging
  • Start with Haiku, upgrade to Sonnet only when you need the extra capability
  • Write automated tests for your critical prompts
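
For the last item, a prompt test doesn't need a framework to get started. Here is a minimal sketch of an evaluation harness: each case pairs an input with a check on the output, and the report lists which cases failed. The names are hypothetical, and `generate` is synchronous only to keep the sketch easy to stub; a real version would be `async` and await the API call.

```typescript
interface EvalCase {
  name: string;
  input: string;
  /** Returns true if the model output is acceptable for this input. */
  check: (output: string) => boolean;
}

interface EvalReport {
  passed: number;
  failed: string[]; // Names of the failing cases
  passRate: number; // Between 0 and 1
}

/** Run each case through `generate` and record which checks fail. */
function runEvals(cases: EvalCase[], generate: (input: string) => string): EvalReport {
  const failed: string[] = [];
  for (const testCase of cases) {
    let ok = false;
    try {
      ok = testCase.check(generate(testCase.input));
    } catch (error) {
      ok = false; // A thrown error (e.g. invalid JSON) counts as a failure
    }
    if (!ok) failed.push(testCase.name);
  }
  const passed = cases.length - failed.length;
  const passRate = cases.length === 0 ? 1 : passed / cases.length;
  return { passed, failed, passRate };
}
```

Checks work best when they test properties ("output parses as JSON", "flags the off-by-one on line 3") rather than exact strings, since model output varies between runs.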

Building with AI APIs isn't just about crafting the perfect prompt—it's about building resilient systems that handle the messy reality of production usage. The magic is still there, but it's the kind of magic that scales and makes money, not just the kind that impresses other developers.

What's been your biggest surprise building with Claude? I'd love to hear about the gaps between demo and production that caught you off guard.

Ibrahim Lawal

Full-Stack Developer & AI Integration Specialist. Building AI-powered products that solve real problems.
