Building Production Apps with Claude API: Lessons from Real Projects
Most AI integrations break in production. Here's what I learned building apps that actually work with Claude API.

Most developers I know have built toy apps with AI APIs. They work great in demos, then fall apart when real users start hitting them.
I've shipped three production apps using Claude API over the past year, and honestly? The first one was a disaster. Users complained about slow responses, inconsistent outputs, and mysterious failures. But each project taught me something crucial about building AI-powered apps that actually work.

Why Claude API for Production?
Before we get into the weeds, let me explain why I chose Claude over other options. I've worked with OpenAI's GPT models, and while they're powerful, Claude consistently gives me more predictable outputs for structured tasks. The API responses feel more reliable, especially when you need consistent JSON formatting or strict instruction-following.
Claude also handles longer contexts better in my experience. When building a document analysis tool, I could feed it entire PDFs without the context getting muddled - something that was hit-or-miss with other models.
The Reality of Production AI Apps
Here's what nobody tells you about AI APIs in production: they're not just slow, they're unpredictably slow. A request might take 2 seconds or 20 seconds, and you can't really know which ahead of time.
My first production app was a content summarization tool. Users would paste articles, and Claude would generate bullet-point summaries. Simple enough, right? Wrong. Here's what went sideways:
- Response times varied wildly (2-30 seconds)
- Some requests would just hang indefinitely
- Claude occasionally returned malformed JSON
- Rate limits hit without warning during traffic spikes
- Costs spiraled when users submitted massive documents
The app worked, but the user experience was frustrating. People would click submit, wait 15 seconds, then click again thinking it broke. Multiple API calls, confused users, angry feedback.
Architecture That Actually Works
After that first disaster, I rebuilt the architecture around these realities. Here's the pattern I use now:
```typescript
// Queue-based processing with status updates
import Queue from 'bull';
import { createClient } from '@supabase/supabase-js';

const claudeQueue = new Queue('claude processing');
const supabase = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_ANON_KEY);

// API endpoint just queues the job
export async function POST(request: Request) {
  const { content, userId, jobId } = await request.json();

  // Store job status immediately
  await supabase
    .from('ai_jobs')
    .insert({
      id: jobId,
      user_id: userId,
      status: 'queued',
      created_at: new Date().toISOString()
    });

  // Queue the actual AI processing
  await claudeQueue.add('process-content', {
    content,
    userId,
    jobId
  });

  return Response.json({ jobId, status: 'queued' });
}
```
The key insight: separate the API request from the AI processing. Users get immediate feedback, then poll for results or get updates via websockets.
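The worker side of that pattern isn't shown above, so here's a sketch of the core handler logic. The queue and database specifics are abstracted behind an injected `updateJob`/`runClaude` pair — both names are my placeholders, not part of the original code — which also makes the status flow easy to test:

```typescript
// Sketch of the worker that consumes 'process-content' jobs. The real
// version would live inside a Bull processor; dependencies are injected
// here so the status transitions stand out.
type Deps = {
  updateJob: (jobId: string, fields: Record<string, unknown>) => Promise<void>;
  runClaude: (content: string) => Promise<string>; // the actual API call
};

export async function handleJob(jobId: string, content: string, deps: Deps) {
  // Flip the row to 'processing' first so polling clients see movement
  await deps.updateJob(jobId, { status: 'processing' });
  try {
    const result = await deps.runClaude(content);
    await deps.updateJob(jobId, { status: 'completed', result });
  } catch (err) {
    // Surface the failure to the frontend instead of leaving the job stuck
    await deps.updateJob(jobId, { status: 'failed', error: String(err) });
  }
}
```

Whatever your queue library, the invariant is the same: every job ends in `completed` or `failed`, never stuck in `processing`.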

Handling Claude API Failures Gracefully
Claude API will fail. Not often, but it happens. Network timeouts, rate limits, model overload - you need to handle all of it. Here's my error handling strategy:
```typescript
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

class ClaudeService {
  async processWithRetry(prompt: string, maxRetries = 3): Promise<string> {
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        const response = await anthropic.messages.create(
          {
            model: 'claude-3-5-sonnet-latest', // or whichever model you're using
            max_tokens: 1024,
            messages: [{ role: 'user', content: prompt }],
          },
          { timeout: 30000 } // fail fast instead of hanging for minutes
        );
        const block = response.content[0];
        return block.type === 'text' ? block.text : '';
      } catch (error) {
        if (attempt === maxRetries) {
          throw new Error(`Claude API failed after ${maxRetries} attempts: ${error.message}`);
        }
        // Exponential backoff: 2s, 4s, 8s...
        const delay = Math.pow(2, attempt) * 1000;
        console.log(`Claude API attempt ${attempt} failed, retrying in ${delay}ms`);
        await new Promise(resolve => setTimeout(resolve, delay));
      }
    }
    throw new Error('unreachable');
  }
}
```
The timeout is crucial. Without it, requests can hang for minutes. I learned this the hard way when users started complaining about jobs that never completed.
Prompt Engineering for Consistency
This is where most developers go wrong. They write prompts like they're talking to a human, then wonder why the output format keeps changing. Claude is powerful, but it needs structure.
Here's a prompt pattern I use for consistent JSON responses:
```typescript
const buildPrompt = (userContent: string) => {
  return `You are a content analyzer. Respond with ONLY a valid JSON object, no additional text.

Required JSON format:
{
  "summary": "Brief summary in 1-2 sentences",
  "key_points": ["point 1", "point 2", "point 3"],
  "sentiment": "positive" | "negative" | "neutral",
  "topics": ["topic1", "topic2"]
}

Content to analyze: ${userContent}

Response:`;
};
```
The key elements:
- Clear role definition
- Explicit output format with examples
- "ONLY" and "no additional text" to prevent extra commentary
- Ending with "Response:" to prime the output
I validate the response immediately and retry if it's not valid JSON:
```typescript
const validateAndParse = (response: string) => {
  try {
    const parsed = JSON.parse(response.trim());
    // Validate required fields
    if (!parsed.summary || !Array.isArray(parsed.key_points)) {
      throw new Error('Invalid response structure');
    }
    return parsed;
  } catch (error) {
    throw new Error(`Failed to parse Claude response: ${error.message}`);
  }
};
```

Cost Control and Monitoring
AI APIs are expensive, especially at scale. My first month with the content tool cost me $200 because I didn't implement proper safeguards. Here's what I do now:
Input Length Limits:
```typescript
const MAX_INPUT_LENGTH = 50000; // characters

const validateInput = (content: string, userId: string) => {
  if (content.length > MAX_INPUT_LENGTH) {
    throw new Error(`Content too long. Maximum ${MAX_INPUT_LENGTH} characters.`);
  }
  // Check user's daily usage
  const dailyUsage = getUserDailyUsage(userId);
  if (dailyUsage.requests > 100) {
    throw new Error('Daily request limit exceeded');
  }
};
```
Usage Tracking:
```typescript
const trackUsage = async (userId: string, inputLength: number, outputLength: number) => {
  const estimatedCost = (inputLength + outputLength) * 0.000001; // rough calculation

  await supabase
    .from('usage_tracking')
    .insert({
      user_id: userId,
      input_tokens: Math.ceil(inputLength / 4), // rough token estimate
      output_tokens: Math.ceil(outputLength / 4),
      estimated_cost: estimatedCost,
      created_at: new Date().toISOString()
    });
};
```
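If you want the estimate to be slightly less hand-wavy, price input and output separately, since output tokens typically cost several times more than input tokens. A pure-function sketch — the per-token rates below are illustrative placeholders, not Anthropic's actual prices:

```typescript
// Rough cost estimator with separate input/output rates. The default
// rates are placeholders; check the current pricing page for real numbers.
export const estimateCost = (
  inputChars: number,
  outputChars: number,
  inputPricePerToken = 0.000003,
  outputPricePerToken = 0.000015,
): number => {
  const inputTokens = Math.ceil(inputChars / 4);  // ~4 chars per token
  const outputTokens = Math.ceil(outputChars / 4);
  return inputTokens * inputPricePerToken + outputTokens * outputPricePerToken;
};
```

Even a crude split like this made my dashboards noticeably closer to the actual bill than a single blended rate.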
Real-Time User Experience
Users hate waiting without feedback. Since Claude requests can take 10-30 seconds, I implement real-time status updates:
```typescript
// Frontend polling pattern
const pollJobStatus = async (jobId: string) => {
  const maxAttempts = 60; // 5 minutes max
  let attempts = 0;

  while (attempts < maxAttempts) {
    const { data } = await supabase
      .from('ai_jobs')
      .select('status, result, error')
      .eq('id', jobId)
      .single();

    if (data.status === 'completed') {
      return data.result;
    }
    if (data.status === 'failed') {
      throw new Error(data.error);
    }

    // Show progress indicators
    setStatus(data.status); // 'queued', 'processing', 'completed'
    await new Promise(resolve => setTimeout(resolve, 5000));
    attempts++;
  }
  throw new Error('Job timed out');
};
```

For better UX, I show different messages based on status:
- "Queued": "Your request is in line..."
- "Processing": "AI is analyzing your content..."
- "Completed": Show results
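That mapping is trivial to centralize so the copy lives in one place. A sketch — the `completed` and `failed` wording here is mine:

```typescript
type JobStatus = 'queued' | 'processing' | 'completed' | 'failed';

// One place for all user-facing status copy
export const statusMessage = (status: JobStatus): string => {
  switch (status) {
    case 'queued':
      return 'Your request is in line...';
    case 'processing':
      return 'AI is analyzing your content...';
    case 'completed':
      return 'Done! Here are your results.';
    case 'failed':
      return 'Something went wrong. Please try again.';
  }
};
```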
Testing AI Integrations
Testing AI features is tricky because outputs aren't deterministic. Here's my approach:
```typescript
// Test the integration, not the AI output
describe('Claude Service', () => {
  it('should handle valid responses', async () => {
    const mockResponse = {
      content: [{ text: '{"summary": "test", "key_points": ["point1"]}' }]
    };
    jest.spyOn(anthropic.messages, 'create').mockResolvedValue(mockResponse);

    const result = await claudeService.processContent('test input');
    expect(result).toHaveProperty('summary');
    expect(Array.isArray(result.key_points)).toBe(true);
  });

  it('should retry on failures', async () => {
    jest.spyOn(anthropic.messages, 'create')
      .mockRejectedValueOnce(new Error('Network error'))
      .mockResolvedValue(validResponse);

    const result = await claudeService.processWithRetry('test');
    expect(anthropic.messages.create).toHaveBeenCalledTimes(2);
    expect(result).toBeDefined();
  });
});
```

I focus on testing error handling, retries, and response parsing rather than trying to validate AI-generated content.
What I'd Do Differently
If I were starting fresh today, here's what I'd change:
- Start with streaming responses for better perceived performance
- Implement caching for similar requests (surprisingly effective)
- Use webhooks instead of polling where possible
- Set up proper monitoring from day one (costs, latency, error rates)
- Build admin tools for managing failed jobs and user limits
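The caching item deserves a sketch, because it's the cheapest win on the list: hash the input and skip the API entirely on a repeat. An in-memory `Map` is used here for illustration; in production this would be Redis or a database table keyed the same way:

```typescript
import { createHash } from 'crypto';

const cache = new Map<string, string>();

const cacheKey = (content: string) =>
  createHash('sha256').update(content).digest('hex');

// Wraps any summarize function with hash-based memoization
export async function summarizeCached(
  content: string,
  summarize: (c: string) => Promise<string>,
): Promise<string> {
  const key = cacheKey(content);
  const hit = cache.get(key);
  if (hit !== undefined) return hit; // cache hit: no API call, no cost
  const result = await summarize(content);
  cache.set(key, result);
  return result;
}
```

Users paste the same viral article far more often than you'd expect, so even exact-match caching like this pays for itself quickly.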
The biggest lesson? AI APIs aren't drop-in replacements for traditional APIs. They need different architectures, different error handling, and different user experience patterns.
Key Takeaways for Your Next AI Project
- Separate AI processing from user requests using queues
- Implement exponential backoff and proper timeouts
- Structure your prompts for consistent outputs
- Track usage and costs from the beginning
- Test integration logic, not AI outputs
- Design UI around unpredictable response times
- Plan for failures - they will happen
Building production AI apps is harder than the tutorials make it look, but it's absolutely doable. The key is respecting the unique challenges of AI APIs and designing around them rather than fighting them.
What's your experience been with AI APIs in production? I'd love to hear about the gotchas you've encountered and how you solved them.

Ibrahim Lawal
Full-Stack Developer & AI Integration Specialist. Building AI-powered products that solve real problems.