- VLM Providers (Vision Language Models): Process documents visually, extract structured data, classify documents
- OCR Providers: Convert documents to text with layout information
Provider Types
VLM Providers
VLM providers can see document images directly. Use them for:
- Direct extraction from visually complex documents
- Document classification and categorization
- Splitting multi-document files
- Quality assessment
OCR Providers
OCR providers convert documents to structured text. Use them for:
- High-fidelity text extraction with bounding boxes
- Processing text-heavy documents
- Building RAG pipelines with chunking
Supported Providers
VLM Providers
| Provider | Models | Vision | PDFs | Structured Output |
|---|---|---|---|---|
| OpenAI | GPT-4.1, o3, o4-mini | Yes | Yes | Yes |
| Anthropic | Claude 4, Sonnet 4.5, Haiku 4.5 | Yes | Yes | Yes |
| Google | Gemini 2.5 Pro/Flash | Yes | Yes | Yes |
| xAI | Grok 4.1 | Yes | Yes | Yes |
OCR Providers
| Provider | Package | Features |
|---|---|---|
| Surya | @docloai/providers-datalab | Text + bounding boxes |
| Marker | @docloai/providers-datalab | Markdown conversion |
| Reducto | @docloai/providers-reducto | Chunking, citations |
Access Methods
VLM providers can be accessed in two ways:
Via OpenRouter (Recommended)
Route all VLM requests through one OpenRouter account:
- Single API key for all providers
- Unified billing and usage tracking
- Automatic cost tracking in responses
- Provider fallback without multiple API keys
Native APIs
Direct access to provider APIs. Use them when:
- You have existing API keys
- You need provider-specific features
- You want to avoid the OpenRouter intermediary
Provider Selection
Choose based on your needs:
| Use Case | Recommended Provider |
|---|---|
| Fast extraction | Google Gemini 2.5 Flash |
| Complex documents | Anthropic Claude Sonnet 4.5 |
| Cost-sensitive | Google Gemini 2.5 Flash Lite |
| Reasoning required | OpenAI o3, Anthropic Claude |
| OCR + text extraction | Surya or Marker |
| RAG chunking | Reducto Parse |
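The table above can be encoded as a small lookup so the choice of provider lives in one place in application code. This is only a sketch: the `UseCase` keys and the `recommend` helper are hypothetical names invented for illustration, and the values are the display names from the table, not model identifiers accepted by any API.

```typescript
// Hypothetical use-case labels for this sketch; not part of any Doclo package.
type UseCase =
  | "fast-extraction"
  | "complex-documents"
  | "cost-sensitive"
  | "reasoning"
  | "ocr-text"
  | "rag-chunking";

// Recommendations copied from the table above (display names only).
const RECOMMENDED: Record<UseCase, string[]> = {
  "fast-extraction": ["Google Gemini 2.5 Flash"],
  "complex-documents": ["Anthropic Claude Sonnet 4.5"],
  "cost-sensitive": ["Google Gemini 2.5 Flash Lite"],
  "reasoning": ["OpenAI o3", "Anthropic Claude"],
  "ocr-text": ["Surya", "Marker"],
  "rag-chunking": ["Reducto Parse"],
};

// Return the recommended providers for a use case, in table order.
function recommend(useCase: UseCase): string[] {
  return RECOMMENDED[useCase];
}
```

Because `UseCase` is a closed union, the compiler flags any use case that has no entry in the map.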
Production Configuration
For production, use buildLLMProvider with fallback support. With fallback configured, the provider:
- Retries failed requests up to 2 times
- Falls back to the next provider if one fails
- Uses circuit breaker to skip failing providers
- Applies exponential backoff between retries
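The four behaviors listed above can be illustrated with a generic, self-contained sketch. This is not the implementation or the signature of buildLLMProvider: `Provider`, `FallbackOptions`, `backoffDelayMs`, and `createFallbackCaller` are hypothetical names, and the circuit breaker here is deliberately simplified (it never resets after a cooldown, which a production breaker normally would).

```typescript
// Hypothetical types for illustration only; not the Doclo API.
type Provider = { name: string; call: (input: string) => Promise<string> };

interface FallbackOptions {
  maxRetries: number;       // retries per provider after the first attempt
  baseDelayMs: number;      // delay before the first retry
  failureThreshold: number; // consecutive failures before a provider is skipped
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

// Exponential backoff: base, 2x base, 4x base, ...
function backoffDelayMs(retryIndex: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** retryIndex;
}

function createFallbackCaller(providers: Provider[], opts: FallbackOptions) {
  const failures = new Map<string, number>(); // consecutive failures per provider
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));

  return async function call(input: string): Promise<string> {
    let lastError: unknown = new Error("no providers available");
    for (const p of providers) {
      // Circuit breaker: skip a provider whose failure count reached the threshold.
      if ((failures.get(p.name) ?? 0) >= opts.failureThreshold) continue;

      for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
        try {
          const result = await p.call(input);
          failures.set(p.name, 0); // a success resets the failure count
          return result;
        } catch (err) {
          lastError = err;
          failures.set(p.name, (failures.get(p.name) ?? 0) + 1);
          // Back off before the next retry, but not after the final attempt.
          if (attempt < opts.maxRetries) {
            await sleep(backoffDelayMs(attempt, opts.baseDelayMs));
          }
        }
      }
      // All attempts failed for this provider: fall back to the next one.
    }
    throw lastError;
  };
}
```

With `maxRetries: 2`, each provider is tried up to three times before the caller moves on, which matches the "up to 2 times" retry behavior described above. The threshold and delay values are left to the caller because the source does not state the library's defaults.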