Providers are the AI services that power Doclo’s document processing. The SDK normalizes all provider APIs into unified interfaces, handling cross-provider quirks like different JSON modes, token counting, cost calculation, error formats, and rate limit behaviors. This means you can swap providers without changing your flow code.

## Documentation Index
Fetch the complete documentation index at: https://docs.doclo.ai/llms.txt
Use this file to discover all available pages before exploring further.
## Provider Types

Doclo supports three types of providers:

### OCRProvider

Converts documents to structured DocumentIR. Supports PDFs, images (JPEG, PNG, WebP, TIFF, etc.), and Office documents (DOCX, XLSX, PPTX), depending on the provider.

### LLMProvider

Text-only structured extraction using language models. LLMProvider has no vision capabilities; it requires DocumentIR or text as input.

### VLMProvider

Vision + language models that can extract from images directly.

## Available Providers
### LLM Providers
| Provider | Models | Via |
|---|---|---|
| OpenAI | gpt-5, gpt-5-mini, gpt-4.1, o3, o4-mini | Direct or OpenRouter |
| Anthropic | claude-sonnet-4, claude-haiku-4.5 | Direct or OpenRouter |
| Google | gemini-2.5-flash, gemini-2.5-pro, gemini-3.0-pro | Direct or OpenRouter |
| xAI | grok-4, grok-4-fast | OpenRouter |
### OCR Providers
| Provider | Models | Package |
|---|---|---|
| Datalab | surya, marker | @doclo/providers-datalab |
| Mistral | ocr-2512 | @doclo/providers-mistral |
| Reducto | reducto | @doclo/providers-reducto |
| Unsiloed | unsiloed | @doclo/providers-unsiloed |
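To make the OCRProvider contract concrete, here is a minimal sketch of what a DocumentIR-like result and its use might look like. The field names (`blocks`, `page`, etc.) are illustrative assumptions, not the actual `@doclo/*` types:

```typescript
// Illustrative sketch of a DocumentIR-like structure. The real Doclo
// types may differ; this only shows the idea: an OCRProvider turns a
// PDF or image into structured, page-aware text blocks.
interface IRBlock {
  type: "heading" | "paragraph" | "table";
  text: string;
  page: number; // 1-based page the block came from
}

interface DocumentIR {
  pages: number;
  blocks: IRBlock[];
}

// A toy OCR result for a one-page invoice.
const ir: DocumentIR = {
  pages: 1,
  blocks: [
    { type: "heading", text: "Invoice #1042", page: 1 },
    { type: "paragraph", text: "Total due: $512.00", page: 1 },
  ],
};

// Flatten the IR to plain text, e.g. to feed a text-only LLMProvider.
const asText = ir.blocks.map((b) => b.text).join("\n");
```

Because each block keeps its page (and, in richer IRs, line) position, downstream steps can cite where extracted values came from.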
## Unified Interface

All providers return consistent responses: the same response shape, usage accounting, and error types regardless of the underlying API.

## OpenRouter vs Native

You can access providers through OpenRouter (a unified gateway) or directly:

- Via OpenRouter
- Native Access
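The normalization that makes providers interchangeable can be pictured roughly as follows. The `UnifiedResponse` shape and the mapper functions are a hypothetical sketch, loosely modeled on OpenAI- and Anthropic-style payloads, not Doclo's actual internals:

```typescript
// Sketch: two providers report the same information under different
// keys, and a unified layer maps both into one shape.
interface UnifiedResponse {
  text: string;
  inputTokens: number;
  outputTokens: number;
}

// Raw payload shapes loosely modeled on OpenAI- and Anthropic-style APIs.
type OpenAIStyle = {
  choices: { message: { content: string } }[];
  usage: { prompt_tokens: number; completion_tokens: number };
};
type AnthropicStyle = {
  content: { text: string }[];
  usage: { input_tokens: number; output_tokens: number };
};

function fromOpenAI(r: OpenAIStyle): UnifiedResponse {
  return {
    text: r.choices[0].message.content,
    inputTokens: r.usage.prompt_tokens,
    outputTokens: r.usage.completion_tokens,
  };
}

function fromAnthropic(r: AnthropicStyle): UnifiedResponse {
  return {
    text: r.content.map((c) => c.text).join(""),
    inputTokens: r.usage.input_tokens,
    outputTokens: r.usage.output_tokens,
  };
}
```

Whichever access path you choose, flow code only ever sees the unified shape, which is what makes provider swaps a one-line change.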
## Fallback and Retry

Use `buildLLMProvider` for production workloads with automatic fallback:
- Retries on 5xx errors and rate limits (429)
- Skips retries on 4xx errors (bad requests)
- Exponential backoff: 1s, 2s, 4s delays
- Falls back to next provider after max retries
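The policy above can be sketched as follows. This is a self-contained illustration of the described behavior, not `buildLLMProvider`'s actual implementation; the `ProviderError` class and `callWithFallback` helper are assumptions for the sketch:

```typescript
// Sketch of the retry-and-fallback policy: retry 5xx and 429 with
// exponential backoff (1s, 2s, 4s at the default base delay), fail a
// provider immediately on other 4xx errors, then move to the next
// provider once retries are exhausted.
type Call = () => Promise<string>;

class ProviderError extends Error {
  constructor(public status: number) {
    super(`provider error ${status}`);
  }
}

const retryable = (status: number) => status === 429 || status >= 500;

async function callWithFallback(
  providers: Call[],
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<string> {
  let lastError: unknown;
  for (const call of providers) {
    for (let attempt = 0; attempt < maxRetries; attempt++) {
      try {
        return await call();
      } catch (err) {
        lastError = err;
        // Non-retryable client errors (e.g. 400): skip retries,
        // fall through to the next provider.
        if (!(err instanceof ProviderError) || !retryable(err.status)) break;
        // Exponential backoff: 1s, 2s, 4s with the default base delay.
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```

For example, `callWithFallback([primary, backup])` would retry `primary` up to three times on 500s before ever touching `backup`.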
## Choosing a Provider

### Our Recommendation: Start with Gemini

For OCR and data extraction, Gemini models are by far the best, and it’s not even close. We strongly recommend starting with Gemini and experimenting with other models as needed:

- Gemini 2.5 Flash - Great workhorse for just about any OCR/extraction task. Fast, accurate, cost-effective.
- Gemini 2.5 Pro / 3.0 Pro - Use for the trickiest, most complicated extraction tasks where you need maximum accuracy.
### When to Use OCR Providers

OCR providers (Datalab, Mistral, Reducto, Unsiloed) are valuable when you need the parsed content for purposes beyond just extraction:

- RAG pipelines - Store and search document content
- Text imports - Ingest documents into your system
- Agentic search - Let agents query document content
- Record keeping - Maintain searchable archives
- Long-form text extraction - Multi-page documents with dense text
### OCR + LLM Pipeline

For text-heavy documents, use OCR to parse first, then extract with an LLM:

- OCR text for RAG, search, and record keeping
- Maximum text accuracy from specialized OCR
- Citation tracking back to source lines
- Lower cost at scale (OCR + text-only LLM is cheaper than VLM)
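The two-stage split can be sketched with stubbed providers like this. The function names (`ocr`, `extractTotal`, `runPipeline`) are illustrative assumptions, not the Doclo API; a real pipeline would call an OCRProvider and an LLMProvider where the stubs sit:

```typescript
// Sketch of the OCR-then-LLM pipeline: OCR produces numbered lines,
// a text-only extraction step reads them, and the source line number
// travels with the extracted value for citation tracking.
interface Line {
  n: number; // 1-based source line number, kept for citations
  text: string;
}

// Stub OCR step: a real OCRProvider would parse the document bytes.
async function ocr(_doc: Uint8Array): Promise<Line[]> {
  return [
    { n: 1, text: "Invoice #1042" },
    { n: 2, text: "Total due: $512.00" },
  ];
}

// Stub text-only extraction step: a real LLMProvider call would go
// here; a regex stands in for the model.
async function extractTotal(
  lines: Line[],
): Promise<{ total: string; citedLine: number } | null> {
  for (const line of lines) {
    const m = line.text.match(/\$[\d,.]+/);
    if (m) return { total: m[0], citedLine: line.n };
  }
  return null;
}

async function runPipeline(doc: Uint8Array) {
  return extractTotal(await ocr(doc));
}
```

Because the LLM stage only ever sees text, it can run on cheaper text-only models, which is where the cost advantage over a VLM comes from at scale.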
## Next Steps

- Provider Setup - Configure providers in detail
- Nodes - Learn about processing nodes