Building Production Apps with Claude API: Lessons from Real Projects
Most AI integrations break in production. Here's what I learned building apps that actually work with Claude API.

Most developers I know have built toy apps with AI APIs. They work great in demos, then fall apart when real users start hitting them.
I've shipped three production apps using Claude API over the past year, and honestly? The first one was a disaster. Users complained about slow responses, inconsistent outputs, and mysterious failures. But each project taught me something crucial about building AI-powered apps that actually work.

Why Claude API for Production?
Before we get into the weeds, let me explain why I chose Claude over other options. I've worked with OpenAI's GPT models, and while they're powerful, Claude consistently gives me more predictable outputs for structured tasks. The API responses feel more reliable, especially when you need consistent JSON formatting or strict instruction-following.
Claude also handles longer contexts better in my experience. When building a document analysis tool, I could feed it entire PDFs without the context getting muddled - something that was hit-or-miss with other models.
The Reality of Production AI Apps
Here's what nobody tells you about AI APIs in production: they're not just slow, they're unpredictably slow. A request might take 2 seconds or 20 seconds, and you can't really know which ahead of time.
My first production app was a content summarization tool. Users would paste articles, and Claude would generate bullet-point summaries. Simple enough, right? Wrong. Here's what went sideways:
- Response times varied wildly (2-30 seconds)
- Some requests would just hang indefinitely
- Claude occasionally returned malformed JSON
- Rate limits hit without warning during traffic spikes
- Costs spiraled when users submitted massive documents
The app worked, but the user experience was frustrating. People would click submit, wait 15 seconds, then click again thinking it broke. Multiple API calls, confused users, angry feedback.
Architecture That Actually Works
After that first disaster, I rebuilt the architecture around these realities. Here's the pattern I use now:
```typescript
// Queue-based processing with status updates
import Queue from 'bull';
import { createClient } from '@supabase/supabase-js';

const claudeQueue = new Queue('claude processing');
const supabase = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_ANON_KEY);

// API endpoint just queues the job
export async function POST(request: Request) {
  const { content, userId, jobId } = await request.json();

  // Store job status immediately
  await supabase
    .from('ai_jobs')
    .insert({
      id: jobId,
      user_id: userId,
      status: 'queued',
      created_at: new Date().toISOString()
    });

  // Queue the actual AI processing
  await claudeQueue.add('process-content', {
    content,
    userId,
    jobId
  });

  return Response.json({ jobId, status: 'queued' });
}
```
The key insight: separate the API request from the AI processing. Users get immediate feedback, then poll for results or get updates via websockets.
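The worker side of that pattern isn't shown above, so here's a sketch of the core handler logic. The queue and database specifics are abstracted behind an injected `updateJob`/`runClaude` pair — both names are my placeholders, not part of the original code — which also makes the status flow easy to test:

```typescript
// Sketch of the worker that consumes 'process-content' jobs. The real
// version would live inside a Bull processor; dependencies are injected
// here so the status transitions stand out.
type Deps = {
  updateJob: (jobId: string, fields: Record<string, unknown>) => Promise<void>;
  runClaude: (content: string) => Promise<string>; // the actual API call
};

export async function handleJob(jobId: string, content: string, deps: Deps) {
  // Flip the row to 'processing' first so polling clients see movement
  await deps.updateJob(jobId, { status: 'processing' });
  try {
    const result = await deps.runClaude(content);
    await deps.updateJob(jobId, { status: 'completed', result });
  } catch (err) {
    // Surface the failure to the frontend instead of leaving the job stuck
    await deps.updateJob(jobId, { status: 'failed', error: String(err) });
  }
}
```

Whatever your queue library, the invariant is the same: every job ends in `completed` or `failed`, never stuck in `processing`.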

Handling Claude API Failures Gracefully
Claude API will fail. Not often, but it happens. Network timeouts, rate limits, model overload - you need to handle all of it. Here's my error handling strategy:
```typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

class ClaudeService {
  async processWithRetry(prompt: string, maxRetries = 3): Promise<string> {
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        const response = await anthropic.messages.create(
          {
            model: 'claude-3-5-sonnet-latest', // or whichever model you're using
            max_tokens: 1024,
            messages: [{ role: 'user', content: prompt }],
          },
          { timeout: 30000 } // fail fast instead of hanging for minutes
        );
        const block = response.content[0];
        return block.type === 'text' ? block.text : '';
      } catch (error) {
        if (attempt === maxRetries) {
          throw new Error(`Claude API failed after ${maxRetries} attempts: ${error.message}`);
        }
        // Exponential backoff: 2s, 4s, 8s...
        const delay = Math.pow(2, attempt) * 1000;
        console.log(`Claude API attempt ${attempt} failed, retrying in ${delay}ms`);
        await new Promise(resolve => setTimeout(resolve, delay));
      }
    }
    throw new Error('unreachable');
  }
}
```
The timeout is crucial. Without it, requests can hang for minutes. I learned this the hard way when users started complaining about jobs that never completed.
Prompt Engineering for Consistency
This is where most developers go wrong. They write prompts like they're talking to a human, then wonder why the output format keeps changing. Claude is powerful, but it needs structure.
Here's a prompt pattern I use for consistent JSON responses:
```typescript
const buildPrompt = (userContent: string) => {
  return `You are a content analyzer. Respond with ONLY a valid JSON object, no additional text.

Required JSON format:
{
  "summary": "Brief summary in 1-2 sentences",
  "key_points": ["point 1", "point 2", "point 3"],
  "sentiment": "positive" | "negative" | "neutral",
  "topics": ["topic1", "topic2"]
}

Content to analyze: ${userContent}

Response:`;
};
```
The key elements:
- Clear role definition
- Explicit output format with examples
- "ONLY" and "no additional text" to prevent extra commentary
- Ending with "Response:" to prime the output
I validate the response immediately and retry if it's not valid JSON:
```typescript
const validateAndParse = (response: string) => {
  try {
    const parsed = JSON.parse(response.trim());
    // Validate required fields
    if (!parsed.summary || !Array.isArray(parsed.key_points)) {
      throw new Error('Invalid response structure');
    }
    return parsed;
  } catch (error) {
    throw new Error(`Failed to parse Claude response: ${error.message}`);
  }
};
```

Cost Control and Monitoring
AI APIs are expensive, especially at scale. My first month with the content tool cost me $200 because I didn't implement proper safeguards. Here's what I do now:
Input Length Limits:
```typescript
const MAX_INPUT_LENGTH = 50000; // characters

const validateInput = (content: string, userId: string) => {
  if (content.length > MAX_INPUT_LENGTH) {
    throw new Error(`Content too long. Maximum ${MAX_INPUT_LENGTH} characters.`);
  }
  // Check user's daily usage
  const dailyUsage = getUserDailyUsage(userId);
  if (dailyUsage.requests > 100) {
    throw new Error('Daily request limit exceeded');
  }
};
```
Usage Tracking:
```typescript
const trackUsage = async (userId: string, inputLength: number, outputLength: number) => {
  const estimatedCost = (inputLength + outputLength) * 0.000001; // rough calculation

  await supabase
    .from('usage_tracking')
    .insert({
      user_id: userId,
      input_tokens: Math.ceil(inputLength / 4), // rough token estimate
      output_tokens: Math.ceil(outputLength / 4),
      estimated_cost: estimatedCost,
      created_at: new Date().toISOString()
    });
};
```
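If you want the estimate to be slightly less hand-wavy, price input and output separately, since output tokens typically cost several times more than input tokens. A pure-function sketch — the per-token rates below are illustrative placeholders, not Anthropic's actual prices:

```typescript
// Rough cost estimator with separate input/output rates. The default
// rates are placeholders; check the current pricing page for real numbers.
export const estimateCost = (
  inputChars: number,
  outputChars: number,
  inputPricePerToken = 0.000003,
  outputPricePerToken = 0.000015,
): number => {
  const inputTokens = Math.ceil(inputChars / 4);  // ~4 chars per token
  const outputTokens = Math.ceil(outputChars / 4);
  return inputTokens * inputPricePerToken + outputTokens * outputPricePerToken;
};
```

Even a crude split like this made my dashboards noticeably closer to the actual bill than a single blended rate.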
Real-Time User Experience
Users hate waiting without feedback. Since Claude requests can take 10-30 seconds, I implement real-time status updates:
```typescript
// Frontend polling pattern
const pollJobStatus = async (jobId: string) => {
  const maxAttempts = 60; // 5 minutes max
  let attempts = 0;

  while (attempts < maxAttempts) {
    const { data } = await supabase
      .from('ai_jobs')
      .select('status, result, error')
      .eq('id', jobId)
      .single();

    if (data.status === 'completed') {
      return data.result;
    }
    if (data.status === 'failed') {
      throw new Error(data.error);
    }

    // Show progress indicators
    setStatus(data.status); // 'queued', 'processing', 'completed'
    await new Promise(resolve => setTimeout(resolve, 5000));
    attempts++;
  }
  throw new Error('Job timed out');
};
```

For better UX, I show different messages based on status:
- "Queued": "Your request is in line..."
- "Processing": "AI is analyzing your content..."
- "Completed": Show results
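That mapping is trivial to centralize so the copy lives in one place. A sketch — the `completed` and `failed` wording here is mine:

```typescript
type JobStatus = 'queued' | 'processing' | 'completed' | 'failed';

// One place for all user-facing status copy
export const statusMessage = (status: JobStatus): string => {
  switch (status) {
    case 'queued':
      return 'Your request is in line...';
    case 'processing':
      return 'AI is analyzing your content...';
    case 'completed':
      return 'Done! Here are your results.';
    case 'failed':
      return 'Something went wrong. Please try again.';
  }
};
```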
Testing AI Integrations
Testing AI features is tricky because outputs aren't deterministic. Here's my approach:
```typescript
// Test the integration, not the AI output
describe('Claude Service', () => {
  it('should handle valid responses', async () => {
    const mockResponse = {
      content: [{ text: '{"summary": "test", "key_points": ["point1"]}' }]
    };
    jest.spyOn(anthropic.messages, 'create').mockResolvedValue(mockResponse);

    const result = await claudeService.processContent('test input');
    expect(result).toHaveProperty('summary');
    expect(Array.isArray(result.key_points)).toBe(true);
  });

  it('should retry on failures', async () => {
    jest.spyOn(anthropic.messages, 'create')
      .mockRejectedValueOnce(new Error('Network error'))
      .mockResolvedValue(validResponse);

    const result = await claudeService.processWithRetry('test');
    expect(anthropic.messages.create).toHaveBeenCalledTimes(2);
    expect(result).toBeDefined();
  });
});
```

I focus on testing error handling, retries, and response parsing rather than trying to validate AI-generated content.
What I'd Do Differently
If I were starting fresh today, here's what I'd change:
- Start with streaming responses for better perceived performance
- Implement caching for similar requests (surprisingly effective)
- Use webhooks instead of polling where possible
- Set up proper monitoring from day one (costs, latency, error rates)
- Build admin tools for managing failed jobs and user limits
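The caching item deserves a sketch, because it's the cheapest win on the list: hash the input and skip the API entirely on a repeat. An in-memory `Map` is used here for illustration; in production this would be Redis or a database table keyed the same way:

```typescript
import { createHash } from 'crypto';

const cache = new Map<string, string>();

const cacheKey = (content: string) =>
  createHash('sha256').update(content).digest('hex');

// Wraps any summarize function with hash-based memoization
export async function summarizeCached(
  content: string,
  summarize: (c: string) => Promise<string>,
): Promise<string> {
  const key = cacheKey(content);
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // cache hit: no API call, no cost
  const result = await summarize(content);
  cache.set(key, result);
  return result;
}
```

Users paste the same viral article far more often than you'd expect, so even exact-match caching like this pays for itself quickly.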
The biggest lesson? AI APIs aren't drop-in replacements for traditional APIs. They need different architectures, different error handling, and different user experience patterns.
Key Takeaways for Your Next AI Project
- Separate AI processing from user requests using queues
- Implement exponential backoff and proper timeouts
- Structure your prompts for consistent outputs
- Track usage and costs from the beginning
- Test integration logic, not AI outputs
- Design UI around unpredictable response times
- Plan for failures - they will happen
Building production AI apps is harder than the tutorials make it look, but it's absolutely doable. The key is respecting the unique challenges of AI APIs and designing around them rather than fighting them.
What's your experience been with AI APIs in production? I'd love to hear about the gotchas you've encountered and how you solved them.

Ibrahim Lawal
Full-Stack Developer & AI Integration Specialist. Building AI-powered products that solve real problems.