Build your first document extraction flow using the Doclo SDK.
Prerequisites
Basic TypeScript knowledge
An AI provider API key (OpenRouter, OpenAI, or Anthropic)
Installation
Install the core packages:
pnpm add @doclo/flows @doclo/providers-llm
These two packages include everything you need:
@doclo/flows - Flow builder and all processing nodes
@doclo/providers-llm - LLM/VLM provider integrations
API Key Setup
This guide uses OpenRouter as a gateway to multiple AI providers. You can also use native provider keys directly.
Get an OpenRouter API Key
Sign up at openrouter.ai, navigate to the Keys section, and generate a new API key.
Create environment file
Create a .env.local file in your project root:
OPENROUTER_API_KEY=sk-or-v1-your-key-here
Load environment variables
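For Node.js scripts, install dotenv:
pnpm add dotenv
Then import it at the top of your script:
import 'dotenv/config';
By default, dotenv/config reads a file named .env. To load .env.local instead, set DOTENV_CONFIG_PATH=.env.local when you run the script, or name the file .env.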
Create a file called invoice-extract.ts:
import 'dotenv/config';
import { createFlow, extract, categorize } from '@doclo/flows';
import { createVLMProvider } from '@doclo/providers-llm';
import fs from 'fs';
// Helper to convert a file to a base64 data URL.
// Only PDFs are detected; every other file is labeled as image/jpeg.
function fileToBase64(filePath: string): string {
  const fileBuffer = fs.readFileSync(filePath);
  const base64 = fileBuffer.toString('base64');
  const mimeType = filePath.toLowerCase().endsWith('.pdf') ? 'application/pdf' : 'image/jpeg';
  return `data:${mimeType};base64,${base64}`;
}
// Providers for different document qualities
const proProvider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-pro',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});
const flashProvider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});
const liteProvider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash-lite',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});
// Schema for invoice extraction
const invoiceSchema = {
  type: 'object',
  properties: {
    invoiceNumber: { type: 'string' },
    vendor: { type: 'string' },
    date: { type: 'string' },
    total: { type: 'number' },
    currency: { type: 'string' }
  }
};
// Build the flow with quality-based routing
const flow = createFlow()
  // Assess document quality first
  .step('assess', categorize({
    provider: liteProvider,
    categories: ['low', 'medium', 'high'],
    additionalInstructions: 'Assess document quality: low = poor scan/handwritten, medium = decent quality, high = clean digital document'
  }))
  // Route to the appropriate model based on quality
  .conditional('extract', (data) => {
    const options = {
      schema: invoiceSchema,
      consensus: { runs: 3, level: 'field', strategy: 'majority' }
    };
    switch (data.category) {
      case 'low':
        return extract({ provider: proProvider, ...options });
      case 'medium':
        return extract({ provider: flashProvider, ...options });
      default:
        // 'high' and any unexpected category fall through to the cheapest model
        return extract({ provider: liteProvider, ...options });
    }
  })
  .build();
// Run the flow
async function processInvoice(pdfPath: string) {
  const result = await flow.run({ base64: fileToBase64(pdfPath) });
  console.log(result);
  return result;
}
processInvoice('./invoice.pdf').catch(console.error);
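The fileToBase64 helper above treats every non-PDF file as a JPEG, so a PNG scan would be sent with the wrong MIME type. A minimal sketch of an extension lookup is shown below. The mimeTypeFor name and the list of extensions are illustrative assumptions, not part of the Doclo SDK, and this guide does not state which formats a given provider accepts.

```typescript
// Hypothetical helper (not part of the Doclo SDK): map a file extension
// to the MIME type used in the data URL prefix.
const MIME_BY_EXTENSION = new Map<string, string>([
  ['pdf', 'application/pdf'],
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['png', 'image/png'],
  ['webp', 'image/webp'],
]);

function mimeTypeFor(filePath: string): string {
  // Take the text after the last dot, compared case-insensitively.
  const dot = filePath.lastIndexOf('.');
  const extension = dot === -1 ? '' : filePath.slice(dot + 1).toLowerCase();
  const mimeType = MIME_BY_EXTENSION.get(extension);
  if (mimeType === undefined) {
    // Fail loudly instead of guessing a type for unknown files.
    throw new Error(`Unsupported file type: ${filePath}`);
  }
  return mimeType;
}
```

Inside fileToBase64, the ternary could then be replaced with a call to mimeTypeFor(filePath).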
Run the Example
Add a test PDF
Save an invoice PDF as invoice.pdf in your project directory.
Run the script
npx tsx invoice-extract.ts
View the output
{
  output: {
    invoiceNumber: "INV-2024-001",
    vendor: "Acme Corporation",
    date: "2024-01-15",
    total: 1250.00,
    currency: "USD"
  },
  aggregated: {
    totalDurationMs: 2134,
    totalCostUSD: 0.0045,
    totalInputTokens: 2400,
    totalOutputTokens: 320,
    stepCount: 2
  },
  metrics: [
    { step: "assess", ms: 312, costUSD: 0.0004 },
    { step: "extract", ms: 1822, costUSD: 0.0041 }
  ],
  artifacts: {
    assess: { category: "high" },
    extract: { invoiceNumber: "INV-2024-001", ... }
  }
}
Understanding the Flow
This example demonstrates two key Doclo features:
| Feature | What it does |
|---|---|
| Quality-based routing | Assesses document quality, routes to the right model for the job |
| Consensus voting | Runs extraction 3 times and votes on each field for accuracy |
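To make the consensus row concrete, here is a minimal sketch of field-level majority voting over several extraction runs. It illustrates the general technique only. How Doclo implements consensus internally (tie-breaking, value normalization, confidence) is not described in this guide, and majorityVote is a hypothetical name.

```typescript
// Illustrative only: field-level majority voting across extraction runs.
type FieldValue = string | number | null;
type Extraction = Record<string, FieldValue>;
type Tally = { value: FieldValue; count: number };

function majorityVote(runs: Extraction[]): Extraction {
  // Collect every field name seen in any run, in first-seen order.
  const fields: string[] = [];
  for (const run of runs) {
    for (const field of Object.keys(run)) {
      if (fields.indexOf(field) === -1) fields.push(field);
    }
  }

  const result: Extraction = {};
  for (const field of fields) {
    // Count how often each distinct value appears for this field.
    const counts = new Map<string, Tally>();
    for (const run of runs) {
      const value: FieldValue = run[field] ?? null;
      const key = JSON.stringify(value);
      const entry = counts.get(key) ?? { value, count: 0 };
      entry.count += 1;
      counts.set(key, entry);
    }
    // Keep the most frequent value; on a tie, the value seen first wins.
    let winner: Tally | null = null;
    for (const entry of Array.from(counts.values())) {
      if (winner === null || entry.count > winner.count) winner = entry;
    }
    result[field] = winner === null ? null : winner.value;
  }
  return result;
}
```

With three runs, a value that appears twice outvotes a single dissenting run. If all three runs disagree on a field, this sketch simply keeps the first value, which is a design choice of the example rather than documented SDK behavior.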
The routing logic:
- Low quality (poor scans, handwritten) → gemini-2.5-pro for maximum accuracy
- Medium quality (decent scans) → gemini-2.5-flash for balanced performance
- High quality (clean digital docs) → gemini-2.5-flash-lite for speed and cost
The result object contains:
| Property | Description |
|---|---|
| output | Final extracted data matching your schema |
| aggregated | Totals: duration, cost, tokens, step count |
| metrics | Per-step timing and cost breakdown |
| artifacts | Intermediate outputs from each step |
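As a sketch of how these properties might be consumed, the snippet below finds the slowest step and sums the per-step timings. The FlowResultLike interface is an assumption inferred from the sample output shown earlier, not a type exported by the SDK, and both function names are hypothetical.

```typescript
// Assumed shape, inferred from the sample output; not an SDK type.
interface StepMetric {
  step: string;
  ms: number;
  costUSD: number;
}

interface FlowResultLike {
  aggregated: { totalDurationMs: number; totalCostUSD: number; stepCount: number };
  metrics: StepMetric[];
}

// Return the metric with the largest duration, or undefined if there are none.
function slowestStep(result: FlowResultLike): StepMetric | undefined {
  return result.metrics.reduce<StepMetric | undefined>(
    (slowest, metric) => (slowest === undefined || metric.ms > slowest.ms ? metric : slowest),
    undefined,
  );
}

// Add up the per-step durations in milliseconds.
function summedStepMs(result: FlowResultLike): number {
  return result.metrics.reduce((sum, metric) => sum + metric.ms, 0);
}

// Values copied from the sample output.
const sample: FlowResultLike = {
  aggregated: { totalDurationMs: 2134, totalCostUSD: 0.0045, stepCount: 2 },
  metrics: [
    { step: 'assess', ms: 312, costUSD: 0.0004 },
    { step: 'extract', ms: 1822, costUSD: 0.0041 },
  ],
};
```

For the sample values, the slowest step is extract, and the two step durations add up to the aggregated totalDurationMs. Whether that equality holds for every real run is not stated in this guide.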
Alternative Providers
Use createVLMProvider for a single provider, or buildLLMProvider for fallback chains:
Single Provider
import { createVLMProvider } from '@doclo/providers-llm';
const provider = createVLMProvider({
  provider: 'anthropic',
  model: 'anthropic/claude-sonnet-4',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});
With Fallback
import { buildLLMProvider } from '@doclo/providers-llm';
const provider = buildLLMProvider({
  providers: [
    { provider: 'anthropic', model: 'anthropic/claude-sonnet-4', apiKey: process.env.OPENROUTER_API_KEY!, via: 'openrouter' },
    { provider: 'openai', model: 'openai/gpt-5.1', apiKey: process.env.OPENROUTER_API_KEY!, via: 'openrouter' }
  ],
  maxRetries: 2
});
Native Keys
import { createVLMProvider } from '@doclo/providers-llm';
// Use provider directly without OpenRouter
const provider = createVLMProvider({
  provider: 'openai',
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY!
});
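The idea behind a fallback chain can be sketched without the SDK: try each provider in order, retry a failing one a limited number of times, and move on only when its retries are exhausted. This is a generic illustration, not the actual implementation of buildLLMProvider. The callWithFallback function and Attempt type are hypothetical, and the exact semantics of maxRetries in Doclo are not specified in this guide.

```typescript
// Generic illustration of a fallback chain; not the SDK's implementation.
type Attempt<T> = () => Promise<T>;

async function callWithFallback<T>(attempts: Attempt<T>[], maxRetries: number): Promise<T> {
  let lastError: unknown = new Error('No providers configured');
  for (const attempt of attempts) {
    // One initial try plus up to maxRetries retries per provider.
    for (let tryIndex = 0; tryIndex <= maxRetries; tryIndex++) {
      try {
        return await attempt();
      } catch (error) {
        // Remember the failure and keep going.
        lastError = error;
      }
    }
  }
  // Every provider failed on every try.
  throw lastError;
}
```

A production version would usually add a delay between retries and distinguish retryable errors (timeouts, rate limits) from permanent ones (invalid API key), which this sketch does not do.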
Troubleshooting
Cannot find module '@doclo/flows'
Make sure packages are installed:
pnpm add @doclo/flows @doclo/providers-llm
OPENROUTER_API_KEY is undefined
- Check .env.local exists with your key
- Make sure you imported dotenv/config at the top of your file
- Restart your dev server if using Next.js
- Check your OpenRouter usage
- Add credits to your account
- Use buildLLMProvider() with retry logic for production
- Make required fields optional if data might not exist
- Check your schema uses valid JSON Schema format
- Review the error message for which field failed
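The checks for an undefined OPENROUTER_API_KEY listed earlier in this section can also be enforced in code. The non-null assertion used in this guide (process.env.OPENROUTER_API_KEY!) only silences the compiler, so a missing key surfaces later as a failed API call. A small guard, sketched here with a hypothetical requireEnv helper that is not part of the SDK, fails at startup instead.

```typescript
// Hypothetical helper: fail fast when a required variable is missing or empty.
// The environment is passed in so the function stays easy to test.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`${name} is not set. Check your .env.local file.`);
  }
  return value;
}
```

In a script, the call would look like requireEnv(process.env, 'OPENROUTER_API_KEY'), with the returned string passed as apiKey.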
Next Steps