Providers are the external services that power document processing. Doclo supports two types:
  • VLM Providers (Vision Language Models): Process documents visually, extract structured data, classify documents
  • OCR Providers: Convert documents to text with layout information

Provider Types

VLM Providers

VLM providers can see document images directly. Use them for:
  • Direct extraction from visually complex documents
  • Document classification and categorization
  • Splitting multi-document files
  • Quality assessment
import { createVLMProvider } from '@doclo/providers-llm';

const vlmProvider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash-preview-09-2025',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

OCR Providers

OCR providers convert documents to structured text. Use them for:
  • High-fidelity text extraction with bounding boxes
  • Processing text-heavy documents
  • Building RAG pipelines with chunking
import { createOCRProvider } from '@doclo/providers-datalab';

const ocrProvider = createOCRProvider({
  endpoint: 'https://www.datalab.to/api/v1/ocr',
  apiKey: process.env.DATALAB_API_KEY!
});

Supported Providers

VLM Providers

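| Provider | Models | Vision | PDFs | Structured Output |
| --- | --- | --- | --- | --- |
| OpenAI | GPT-4.1, o3, o4-mini | Yes | Yes | Yes |
| Anthropic | Claude 4, Sonnet 4.5, Haiku 4.5 | Yes | Yes | Yes |
| Google | Gemini 2.5 Pro/Flash | Yes | Yes | Yes |
| xAI | Grok 4.1 | Yes | Yes | Yes |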

OCR Providers

| Provider | Package | Features |
| --- | --- | --- |
| Surya | @doclo/providers-datalab | Text + bounding boxes |
| Marker | @doclo/providers-datalab | Markdown conversion |
| Mistral | @doclo/providers-mistral | Markdown, native extraction, handwriting |
| Reducto | @doclo/providers-reducto | Chunking, citations |
| Unsiloed | @doclo/providers-unsiloed | Parse, extract, classify, split |

Access Methods

VLM providers can be accessed in two ways: through OpenRouter or through each provider's native API.

OpenRouter

Single API key for all providers with unified billing:
const provider = createVLMProvider({
  provider: 'anthropic',
  model: 'anthropic/claude-sonnet-4.5',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});
Benefits:
  • Single API key for all providers
  • Unified billing and usage tracking
  • Automatic cost tracking in responses
  • Provider fallback without multiple API keys

Native APIs

Direct access to provider APIs:
const provider = createVLMProvider({
  provider: 'openai',
  model: 'gpt-4.1',
  apiKey: process.env.OPENAI_API_KEY!
  // No 'via' parameter = native API
});
Use native APIs when:
  • You have existing API keys
  • You need provider-specific features
  • You want to avoid the OpenRouter intermediary
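
The choice between the two access methods can be wrapped in a small helper. The sketch below is only an illustration: `buildConfig` and `VLMProviderConfig` are hypothetical names, not part of the Doclo SDK, and the rule that OpenRouter model IDs carry a `<provider>/` prefix is inferred from the examples above rather than from a documented contract.

```typescript
// Hypothetical helper (not part of the Doclo SDK): builds the options object
// passed to createVLMProvider for either access method.
type ProviderName = 'openai' | 'anthropic' | 'google';

interface VLMProviderConfig {
  provider: ProviderName;
  model: string;
  apiKey: string;
  via?: 'openrouter';
}

function buildConfig(
  provider: ProviderName,
  model: string,
  keys: { openrouter?: string; native?: string }
): VLMProviderConfig {
  if (keys.openrouter) {
    // Assumption: OpenRouter model IDs are "<provider>/<model>", as in the
    // examples on this page. Add the prefix only if it is missing.
    const prefix = `${provider}/`;
    const id = model.indexOf(prefix) === 0 ? model : prefix + model;
    return { provider, model: id, apiKey: keys.openrouter, via: 'openrouter' };
  }
  if (keys.native) {
    // Omitting `via` selects the native API.
    return { provider, model, apiKey: keys.native };
  }
  throw new Error(`No API key available for ${provider}`);
}
```

This sketch prefers OpenRouter whenever an OpenRouter key is present and falls back to the native key otherwise. Reverse the two branches if provider-specific features matter more than unified billing.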

Provider Selection

Choose based on your needs:
| Use Case | Recommended Provider |
| --- | --- |
| Fast extraction | Google Gemini 2.5 Flash |
| Complex documents | Anthropic Claude Sonnet 4.5 |
| Cost-sensitive | Google Gemini 2.5 Flash Lite |
| Reasoning required | OpenAI o3, Anthropic Claude |
| OCR + text extraction | Surya or Marker |
| RAG chunking | Reducto Parse |

Production Configuration

For production, use buildLLMProvider with fallback support:
import { buildLLMProvider } from '@doclo/providers-llm';

const provider = buildLLMProvider({
  providers: [
    {
      provider: 'google',
      model: 'google/gemini-2.5-flash-preview-09-2025',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    },
    {
      provider: 'anthropic',
      model: 'anthropic/claude-sonnet-4.5',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }
  ],
  maxRetries: 2,
  retryDelay: 1000,
  useExponentialBackoff: true,
  circuitBreakerThreshold: 3
});
This configuration:
  • Retries failed requests up to 2 times
  • Falls back to the next provider if one fails
  • Uses circuit breaker to skip failing providers
  • Applies exponential backoff between retries
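
To make the interaction of these four settings concrete, here is a minimal, self-contained sketch of retry, backoff, fallback, and circuit breaking. It is not the `buildLLMProvider` implementation. `FallbackRunner`, `backoffDelay`, and the injectable `sleep` function are hypothetical names, and the exact semantics (for example, counting consecutive failures per provider) are assumptions made for illustration.

```typescript
// Illustrative sketch only; the SDK's internal behaviour may differ.
interface FallbackOptions {
  maxRetries: number;              // retries per provider after the first attempt
  retryDelay: number;              // base delay in milliseconds
  useExponentialBackoff: boolean;
  circuitBreakerThreshold: number; // consecutive failures before a provider is skipped
}

type Call<T> = () => Promise<T>;
type Sleep = (ms: number) => Promise<void>;

// Delay before retry number `retry` (0-based): base, 2x base, 4x base, ...
function backoffDelay(retry: number, opts: FallbackOptions): number {
  return opts.useExponentialBackoff
    ? opts.retryDelay * Math.pow(2, retry)
    : opts.retryDelay;
}

class FallbackRunner<T> {
  private providers: Call<T>[];
  private opts: FallbackOptions;
  private sleep: Sleep;
  private failures: number[]; // consecutive failure count per provider

  constructor(providers: Call<T>[], opts: FallbackOptions, sleep?: Sleep) {
    this.providers = providers;
    this.opts = opts;
    this.sleep = sleep || ((ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
    this.failures = providers.map(() => 0);
  }

  async run(): Promise<T> {
    let lastError: unknown = new Error('All providers are skipped (circuit open)');
    for (let i = 0; i < this.providers.length; i++) {
      // Circuit open: skip a provider that keeps failing.
      if (this.failures[i] >= this.opts.circuitBreakerThreshold) continue;
      for (let attempt = 0; attempt <= this.opts.maxRetries; attempt++) {
        try {
          const result = await this.providers[i]();
          this.failures[i] = 0; // a success resets the counter
          return result;
        } catch (err) {
          lastError = err;
          this.failures[i]++;
          if (this.failures[i] >= this.opts.circuitBreakerThreshold) break;
          if (attempt < this.opts.maxRetries) {
            await this.sleep(backoffDelay(attempt, this.opts));
          }
        }
      }
      // Retries exhausted or circuit opened: fall through to the next provider.
    }
    throw lastError;
  }
}
```

With the settings shown above (`maxRetries: 2`, `retryDelay: 1000`, exponential backoff, threshold 3), a primary provider that is down is attempted three times with waits of 1000 ms and 2000 ms in between, after which this sketch moves to the second provider and skips the first on later calls.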

Cost Tracking

All providers return cost information:
const result = await flow.run(input);

console.log('Cost:', result.aggregated.totalCostUSD);
console.log('Tokens:', result.aggregated.totalInputTokens, 'in /',
            result.aggregated.totalOutputTokens, 'out');
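
When processing a batch of documents, the per-run totals can be summed into a single figure. The sketch below assumes that `result.aggregated` exposes at least the three fields logged above. The `AggregatedUsage` interface and `sumUsage` helper are hypothetical names introduced for illustration, and the real type may carry additional fields.

```typescript
// Shape assumed from the fields logged above; not the SDK's own type.
interface AggregatedUsage {
  totalCostUSD: number;
  totalInputTokens: number;
  totalOutputTokens: number;
}

// Sum the aggregated usage of several flow runs into one total.
function sumUsage(runs: AggregatedUsage[]): AggregatedUsage {
  return runs.reduce(
    (acc, r) => ({
      totalCostUSD: acc.totalCostUSD + r.totalCostUSD,
      totalInputTokens: acc.totalInputTokens + r.totalInputTokens,
      totalOutputTokens: acc.totalOutputTokens + r.totalOutputTokens,
    }),
    { totalCostUSD: 0, totalInputTokens: 0, totalOutputTokens: 0 }
  );
}

// Usage with made-up numbers standing in for two result.aggregated objects.
const batchTotal = sumUsage([
  { totalCostUSD: 0.25, totalInputTokens: 1200, totalOutputTokens: 300 },
  { totalCostUSD: 0.5, totalInputTokens: 2400, totalOutputTokens: 700 },
]);
console.log('Batch cost:', batchTotal.totalCostUSD);
```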

Provider Metadata Utilities

The SDK exports utility functions for querying provider capabilities programmatically:
import {
  PROVIDER_METADATA,
  isImageTypeSupported,
  supportsPDFsInline,
  getProvidersForNode,
  isProviderCompatibleWithNode,
  estimateCost,
  getCheapestProvider,
  compareNativeVsOpenRouter
} from '@doclo/providers-llm';

Check Image Support

isImageTypeSupported('openai', 'image/png');   // true
isImageTypeSupported('openai', 'image/bmp');   // false
isImageTypeSupported('google', 'image/bmp');   // true (extended support)

Check PDF Support

supportsPDFsInline('openai');     // true
supportsPDFsInline('anthropic');  // true
supportsPDFsInline('google');     // true

Get Providers for Node Type

// Get all providers compatible with parse()
const parseProviders = getProvidersForNode('parse');

// Get providers compatible with extract()
const extractProviders = getProvidersForNode('extract');

// Check specific provider compatibility
isProviderCompatibleWithNode('openai', 'categorize'); // true

Estimate Costs

// Estimate cost for 1000 input + 500 output tokens
const cost = estimateCost('openai', 1000, 500);
console.log(`$${cost.toFixed(4)}`); // "$0.0125"

// Find cheapest provider for a workload
const cheapest = getCheapestProvider(10000, 1000);
console.log(cheapest.name); // "Google (Gemini)"
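
The arithmetic behind a per-1k-token estimate is simple enough to sketch. The code below is an assumption about how such an estimate could be computed from `inputPer1k` and `outputPer1k` rates, not the SDK's actual `estimateCost` or `getCheapestProvider`. The names `estimateCostSketch` and `cheapestProviderSketch` are hypothetical. The `anthropic` rates are the ones shown in the PROVIDER_METADATA example on this page, and the `example-budget` entry is invented for illustration.

```typescript
interface Pricing {
  inputPer1k: number;  // USD per 1,000 input tokens
  outputPer1k: number; // USD per 1,000 output tokens
}

// cost = (input tokens / 1000) * input rate + (output tokens / 1000) * output rate
function estimateCostSketch(pricing: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1000) * pricing.inputPer1k
       + (outputTokens / 1000) * pricing.outputPer1k;
}

// Return the name of the entry with the lowest estimated cost for a workload.
function cheapestProviderSketch(
  table: Record<string, Pricing>,
  inputTokens: number,
  outputTokens: number
): string {
  let best = '';
  let bestCost = Infinity;
  for (const name of Object.keys(table)) {
    const cost = estimateCostSketch(table[name], inputTokens, outputTokens);
    if (cost < bestCost) {
      best = name;
      bestCost = cost;
    }
  }
  return best;
}

const pricingTable: Record<string, Pricing> = {
  anthropic: { inputPer1k: 0.003, outputPer1k: 0.015 },        // from the metadata example
  'example-budget': { inputPer1k: 0.001, outputPer1k: 0.002 }, // invented rates
};

// 1000 input + 500 output tokens at the Anthropic rates: 0.003 + 0.0075
console.log(estimateCostSketch(pricingTable.anthropic, 1000, 500).toFixed(4)); // "0.0105"
console.log(cheapestProviderSketch(pricingTable, 10000, 1000)); // "example-budget"
```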

Compare Access Methods

const comparison = compareNativeVsOpenRouter('anthropic');
console.log(comparison.differences);
// ['Uses OpenAI-compatible format...', 'Response prefill trick...']

Access Provider Metadata

The PROVIDER_METADATA constant provides complete metadata for all providers:
const anthropic = PROVIDER_METADATA.anthropic;

console.log(anthropic.models);           // ['claude-opus-4.5', 'claude-sonnet-4.5', ...]
console.log(anthropic.capabilities);     // { supportsImages: true, supportsPDFs: true, ... }
console.log(anthropic.pricing);          // { inputPer1k: 0.003, outputPer1k: 0.015, ... }
console.log(anthropic.limits);           // { maxContextTokens: 200000, ... }
console.log(anthropic.inputFormats);     // { images: {...}, pdfs: {...} }

Next Steps

  • OpenAI: GPT-4.1, o3, o4-mini configuration
  • Anthropic: Claude models configuration
  • Google: Gemini models configuration
  • Mistral OCR: Mistral OCR 3 and Document AI setup
  • Surya OCR: Datalab Surya OCR setup