Providers are the AI services that power Doclo’s document processing. The SDK normalizes all provider APIs into unified interfaces, handling cross-provider quirks like different JSON modes, token counting, cost calculation, error formats, and rate limit behaviors. This means you can swap providers without changing your flow code.
## Provider Types

Doclo supports three types of providers:

### OCRProvider

Converts documents to structured DocumentIR. Supports PDFs, images (JPEG, PNG, WebP, TIFF, etc.), and Office documents (DOCX, XLSX, PPTX), depending on the provider.

```typescript
interface OCRProvider {
  parseToIR(input: { url?: string; base64?: string }): Promise<DocumentIR>;
}
```

**Providers:** Datalab, Mistral, Reducto, Unsiloed

**Best for:** Text-heavy documents, high-accuracy requirements, multi-page PDFs, RAG pipelines, agentic search
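As a minimal sketch of the OCR path (using the Datalab factory shown later on this page; the URL is illustrative, and the sketch deliberately avoids assuming DocumentIR's internal shape):

```typescript
import { createOCRProvider } from '@doclo/providers-datalab';

const ocr = createOCRProvider({
  provider: 'datalab',
  model: 'marker',
  apiKey: process.env.DATALAB_API_KEY!
});

// Parse a hosted PDF into DocumentIR (a base64 payload works too)
const ir = await ocr.parseToIR({ url: 'https://example.com/invoice.pdf' });
console.log(ir); // structured DocumentIR for downstream extract/RAG steps
```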
### LLMProvider

Text-only structured extraction using language models.

```typescript
interface LLMProvider {
  completeJson(input: {
    prompt: string;
    schema: object;
  }): Promise<{ json: unknown; costUSD: number; ... }>;
}
```
**Examples:** GPT 5.1, Claude Haiku 4.5, Grok 4

**Best for:** Working with pre-parsed DocumentIR, text-only documents, cost-sensitive scenarios
LLMProvider has no vision capabilities. It requires DocumentIR or text as input.
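Given that constraint, here is a sketch of the text-only path. `llmProvider` stands for any LLMProvider instance, `documentText` for text from an earlier parse step, and the plain JSON Schema object is an assumption about the expected schema format:

```typescript
// Sketch: text-only extraction over already-parsed document text.
// `llmProvider` and `documentText` are illustrative names, and the
// JSON Schema shape below is an assumption.
const result = await llmProvider.completeJson({
  prompt: `Extract the invoice number and total from:\n\n${documentText}`,
  schema: {
    type: 'object',
    properties: {
      invoiceNumber: { type: 'string' },
      total: { type: 'number' }
    }
  }
});

console.log(result.json);
```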
### VLMProvider

Vision + language models that can extract from images directly.

```typescript
interface VLMProvider {
  completeJson(input: {
    prompt: string | MultimodalInput;
    schema: object;
  }): Promise<{ json: unknown; costUSD: number; ... }>;
}
```
**Examples:** GPT 5.1, Claude Sonnet 4, Gemini 2.5 Flash

**Best for:** Documents with images/charts, complex visual layouts, speed over cost
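The MultimodalInput type isn't reproduced on this page, so the following is only a hypothetical sketch that assumes it accepts text alongside base64-encoded images; `vlmProvider`, `receiptBase64`, and `receiptSchema` are illustrative names:

```typescript
// Hypothetical: the exact MultimodalInput fields are an assumption
const result = await vlmProvider.completeJson({
  prompt: {
    text: 'Extract the line items from this receipt photo',
    images: [{ base64: receiptBase64 }]
  },
  schema: receiptSchema
});

console.log(result.json);
```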
## Available Providers

### LLM Providers

| Provider | Models | Via |
|---|---|---|
| OpenAI | gpt-5, gpt-5-mini, gpt-4.1, o3, o4-mini | Direct or OpenRouter |
| Anthropic | claude-sonnet-4, claude-haiku-4.5 | Direct or OpenRouter |
| Google | gemini-2.5-flash, gemini-2.5-pro, gemini-3.0-pro | Direct or OpenRouter |
| xAI | grok-4, grok-4-fast | OpenRouter |
We’re actively expanding OpenRouter model support, with GPT OSS, Qwen, Kimi, GLM and others coming soon.
### OCR Providers

| Provider | Models | Package |
|---|---|---|
| Datalab | surya, marker | @doclo/providers-datalab |
| Mistral | ocr-2512 | @doclo/providers-mistral |
| Reducto | reducto | @doclo/providers-reducto |
| Unsiloed | unsiloed | @doclo/providers-unsiloed |
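For example, to instantiate one of these, a sketch that assumes the Mistral package exposes the same createOCRProvider factory as the Datalab example later on this page (the env var name is also an assumption):

```typescript
import { createOCRProvider } from '@doclo/providers-mistral';

// Assumption: @doclo/providers-mistral mirrors the Datalab factory shape
const ocr = createOCRProvider({
  provider: 'mistral',
  model: 'ocr-2512',
  apiKey: process.env.MISTRAL_API_KEY!
});
```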
## Unified Interface

All providers return consistent responses:

```typescript
const result = await provider.completeJson({
  prompt: 'Extract invoice data',
  schema: invoiceSchema
});

// Every provider returns:
result.json     // Extracted data
result.costUSD  // Cost in USD
result.provider // Provider name (e.g., 'openai:gpt-4o')
result.ms       // Duration in milliseconds
```
Because the response shape is identical everywhere, you can swap providers without changing your flow code.
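For example, a helper can stay provider-agnostic; `extractInvoice` and `invoiceSchema` are illustrative names, and the provider types are the interfaces shown above:

```typescript
// Accepts any provider with the uniform completeJson contract
async function extractInvoice(
  provider: LLMProvider | VLMProvider,
  text: string
) {
  const result = await provider.completeJson({
    prompt: `Extract invoice data from:\n\n${text}`,
    schema: invoiceSchema
  });
  console.log(`${result.provider}: $${result.costUSD} in ${result.ms}ms`);
  return result.json;
}
```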
## OpenRouter vs Native

You can access providers through OpenRouter (a unified gateway) or directly:

### Via OpenRouter

```typescript
import { createVLMProvider } from '@doclo/providers-llm';

const provider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});
```

- **Pros:** Single API key for all providers, automatic fallback
- **Cons:** Slightly higher latency, additional cost markup

### Native Access

```typescript
import { createVLMProvider } from '@doclo/providers-llm';

const provider = createVLMProvider({
  provider: 'openai',
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY!
});
```

- **Pros:** Lower latency, direct pricing
- **Cons:** Separate API key per provider
## Fallback and Retry

Use buildLLMProvider for production workloads with automatic fallback:

```typescript
import { buildLLMProvider } from '@doclo/providers-llm';

const provider = buildLLMProvider({
  providers: [
    {
      provider: 'google',
      model: 'google/gemini-2.5-flash',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    },
    {
      provider: 'openai',
      model: 'openai/gpt-5.1',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }
  ],
  maxRetries: 3,
  retryDelay: 1000,
  useExponentialBackoff: true
});
```
Retry behavior:
- Retries on 5xx errors and rate limits (429)
- Skips retries on 4xx errors (bad requests)
- Exponential backoff: 1s, 2s, 4s delays (see the sketch after this list)
- Falls back to next provider after max retries
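To make the schedule concrete, here is an illustrative helper (not part of the SDK) that reproduces the delays produced by the configuration above:

```typescript
// Illustrative only: 1000ms base delay doubled per attempt -> 1s, 2s, 4s
function backoffDelay(attempt: number, retryDelay = 1000, exponential = true) {
  return exponential ? retryDelay * 2 ** attempt : retryDelay;
}

console.log(backoffDelay(0), backoffDelay(1), backoffDelay(2)); // 1000 2000 4000
```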
## Choosing a Provider

### Our Recommendation: Start with Gemini

For OCR and data extraction, Gemini models are by far the best, and it's not even close. We strongly recommend starting with Gemini and experimenting with other models as needed:

- **Gemini 2.5 Flash** - Great workhorse for just about any OCR/extraction task. Fast, accurate, cost-effective.
- **Gemini 2.5 Pro / 3.0 Pro** - Use for the trickiest, most complicated extraction tasks where you need maximum accuracy.
```typescript
// Start here for most extraction tasks
const provider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});
```
### When to Use OCR Providers

OCR providers (Datalab, Mistral, Reducto, Unsiloed) are valuable when you need the parsed content for purposes beyond just extraction:
- **RAG pipelines** - Store and search document content
- **Text imports** - Ingest documents into your system
- **Agentic search** - Let agents query document content
- **Record keeping** - Maintain searchable archives
- **Long-form text extraction** - Multi-page documents with dense text
### OCR + LLM Pipeline

For text-heavy documents, use OCR to parse first, then extract with an LLM:

```typescript
import { createFlow, parse, extract } from '@doclo/flows';
import { createOCRProvider } from '@doclo/providers-datalab';
import { createVLMProvider } from '@doclo/providers-llm';

const ocrProvider = createOCRProvider({
  provider: 'datalab',
  model: 'marker',
  apiKey: process.env.DATALAB_API_KEY!
});

const llmProvider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

// Parse to DocumentIR, then extract structured data
const flow = createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .step('extract', extract({
    provider: llmProvider,
    schema: invoiceSchema
  }))
  .build();
```
This approach gives you:
- OCR text for RAG, search, and record keeping
- Maximum text accuracy from specialized OCR
- Citation tracking back to source lines
- Lower cost at scale (OCR + text-only LLM is cheaper than VLM)
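A hypothetical usage sketch for running the flow; the run method, its input shape, and the result layout are assumptions rather than confirmed @doclo/flows API:

```typescript
// Assumption: the built flow exposes run() and accepts the same input
// shape as parseToIR ({ url } or { base64 })
const result = await flow.run({ url: 'https://example.com/invoice.pdf' });

console.log(result); // e.g., parsed DocumentIR plus extracted invoice data
```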
## Next Steps