Providers Overview

Providers are the external services that power document processing. Doclo supports two types:

VLM Providers (Vision Language Models): Process documents visually, extract structured data, classify documents
OCR Providers: Convert documents to text with layout information

Provider Types

VLM Providers

VLM providers can see document images directly. Use them for:

Direct extraction from visually complex documents
Document classification and categorization
Splitting multi-document files
Quality assessment

import { createVLMProvider } from '@doclo/providers-llm';

const vlmProvider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash-preview-09-2025',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

OCR Providers

OCR providers convert documents to structured text. Use them for:

High-fidelity text extraction with bounding boxes
Processing text-heavy documents
Building RAG pipelines with chunking

import { createOCRProvider } from '@doclo/providers-datalab';

const ocrProvider = createOCRProvider({
  endpoint: 'https://www.datalab.to/api/v1/ocr',
  apiKey: process.env.DATALAB_API_KEY!
});

Supported Providers

VLM Providers

Provider	Models	Vision	PDFs	Structured Output
OpenAI	GPT-4.1, o3, o4-mini	Yes	Yes	Yes
Anthropic	Claude 4, Sonnet 4.5, Haiku 4.5	Yes	Yes	Yes
Google	Gemini 2.5 Pro/Flash	Yes	Yes	Yes
xAI	Grok 4.1	Yes	Yes	Yes

OCR Providers

Provider	Package	Features
Surya	`@doclo/providers-datalab`	Text + bounding boxes
Marker	`@doclo/providers-datalab`	Markdown conversion
Mistral	`@doclo/providers-mistral`	Markdown, native extraction, handwriting
Reducto	`@doclo/providers-reducto`	Chunking, citations
Unsiloed	`@doclo/providers-unsiloed`	Parse, extract, classify, split

Access Methods

VLM providers can be accessed in two ways:

Via OpenRouter (Recommended)

Use a single API key for all providers, with unified billing:

import { createVLMProvider } from '@doclo/providers-llm';

const provider = createVLMProvider({
  provider: 'anthropic',
  model: 'anthropic/claude-sonnet-4.5',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

Benefits:

Single API key for all providers
Unified billing and usage tracking
Automatic cost tracking in responses
Provider fallback without multiple API keys

Native APIs

Direct access to provider APIs:

import { createVLMProvider } from '@doclo/providers-llm';

const provider = createVLMProvider({
  provider: 'openai',
  model: 'gpt-4.1',
  apiKey: process.env.OPENAI_API_KEY!
  // No 'via' parameter = native API
});

Use native APIs when:

You have existing API keys
You need provider-specific features
You want to avoid the OpenRouter intermediary
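
The choice between the two access methods can be made in one place. The sketch below builds a config object in the shape used by the createVLMProvider examples on this page (provider, model, apiKey, optional via). The `VLMConfig` type, the `ApiKeys` type, and the `buildVLMConfig` helper are hypothetical and not part of the SDK, and the rule that OpenRouter model IDs carry a `provider/` prefix is inferred from the examples here rather than confirmed.

```typescript
// Hypothetical config shape mirroring the createVLMProvider examples on this page.
type ProviderName = 'openai' | 'anthropic' | 'google';

interface VLMConfig {
  provider: ProviderName;
  model: string;
  apiKey: string;
  via?: 'openrouter';
}

// Hypothetical container for whichever keys are available at startup.
interface ApiKeys {
  openrouter?: string;
  native?: string;
}

// Prefer OpenRouter when its key is present; otherwise use the native key.
function buildVLMConfig(provider: ProviderName, model: string, keys: ApiKeys): VLMConfig {
  if (keys.openrouter) {
    // Assumption: OpenRouter model IDs are prefixed with the provider name.
    const prefixed = model.includes('/') ? model : `${provider}/${model}`;
    return { provider, model: prefixed, apiKey: keys.openrouter, via: 'openrouter' };
  }
  if (keys.native) {
    // Omitting `via` selects the native API, as in the example above.
    return { provider, model, apiKey: keys.native };
  }
  // Fail early instead of sending a request with an undefined key.
  throw new Error(`No API key available for provider "${provider}"`);
}
```

The returned object could then be passed to createVLMProvider. Throwing when no key is set surfaces a missing environment variable at startup rather than at the first request.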

Provider Selection

Choose based on your needs:

Use Case	Recommended Provider
Fast extraction	Google Gemini 2.5 Flash
Complex documents	Anthropic Claude Sonnet 4.5
Cost-sensitive	Google Gemini 2.5 Flash Lite
Reasoning required	OpenAI o3, Anthropic Claude
OCR + text extraction	Surya or Marker
RAG chunking	Reducto Parse
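
If the selection is made in code, the table above can be kept as a small lookup. This is only a sketch: the `UseCase` labels and the `recommend` helper are hypothetical names, and the values are copied verbatim from the table.

```typescript
// Hypothetical labels for the use cases listed in the table above.
type UseCase =
  | 'fast-extraction'
  | 'complex-documents'
  | 'cost-sensitive'
  | 'reasoning'
  | 'ocr-text'
  | 'rag-chunking';

// Recommendations copied from the table; a row with two options keeps both, in order.
const RECOMMENDED: Record<UseCase, string[]> = {
  'fast-extraction': ['Google Gemini 2.5 Flash'],
  'complex-documents': ['Anthropic Claude Sonnet 4.5'],
  'cost-sensitive': ['Google Gemini 2.5 Flash Lite'],
  'reasoning': ['OpenAI o3', 'Anthropic Claude'],
  'ocr-text': ['Surya', 'Marker'],
  'rag-chunking': ['Reducto Parse'],
};

// Return the recommended providers for a use case.
function recommend(useCase: UseCase): string[] {
  return RECOMMENDED[useCase];
}
```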

Production Configuration

For production, use buildLLMProvider with fallback support:

import { buildLLMProvider } from '@doclo/providers-llm';

const provider = buildLLMProvider({
  providers: [
    {
      provider: 'google',
      model: 'google/gemini-2.5-flash-preview-09-2025',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    },
    {
      provider: 'anthropic',
      model: 'anthropic/claude-sonnet-4.5',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }
  ],
  maxRetries: 2,
  retryDelay: 1000,
  useExponentialBackoff: true,
  circuitBreakerThreshold: 3
});

This configuration:

Retries failed requests up to 2 times
Falls back to the next provider if one fails
Uses a circuit breaker to skip providers that keep failing
Applies exponential backoff between retries
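
To make the interaction between these four behaviors concrete, here is a minimal standalone sketch of retry, backoff, fallback, and a circuit breaker. It is not the SDK's implementation: `createFallbackRunner`, `backoffDelay`, and `Call` are hypothetical names, the option names are borrowed from the example above, and their exact semantics (per-provider retries, delays in milliseconds, consecutive-failure threshold) are assumptions.

```typescript
// Option names mirror the buildLLMProvider example; their SDK semantics are assumed.
interface FallbackOptions {
  maxRetries: number;              // extra attempts per provider after the first
  retryDelay: number;              // base delay, assumed to be milliseconds
  useExponentialBackoff: boolean;
  circuitBreakerThreshold: number; // consecutive failures before a provider is skipped
}

type Call<T> = () => Promise<T>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay before retry `attempt` (0-based): base, 2x base, 4x base, ...
function backoffDelay(attempt: number, opts: FallbackOptions): number {
  return opts.useExponentialBackoff ? opts.retryDelay * 2 ** attempt : opts.retryDelay;
}

function createFallbackRunner<T>(calls: Call<T>[], opts: FallbackOptions): () => Promise<T> {
  // Failure counters persist across runs so a tripped provider stays skipped.
  const states = calls.map((call) => ({ call, failures: 0 }));

  return async function run(): Promise<T> {
    let lastError: unknown = new Error('No provider available');
    for (const state of states) {
      // Circuit open: skip this provider without calling it.
      if (state.failures >= opts.circuitBreakerThreshold) continue;
      for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
        try {
          const result = await state.call();
          state.failures = 0; // a success resets the counter
          return result;
        } catch (err) {
          lastError = err;
          state.failures++;
          // Stop retrying once the breaker trips; move on to the next provider.
          if (state.failures >= opts.circuitBreakerThreshold) break;
          if (attempt < opts.maxRetries) await sleep(backoffDelay(attempt, opts));
        }
      }
    }
    throw lastError;
  };
}
```

With maxRetries: 2 and circuitBreakerThreshold: 3, a provider that fails three times in a row is skipped on later runs. A production breaker would normally re-admit the provider after a cooldown; that step is left out of this sketch.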

Cost Tracking

All providers return cost information:

// `flow` is a flow you have already built; `input` is the document passed to it
const result = await flow.run(input);

console.log('Cost:', result.aggregated.totalCostUSD);
console.log('Tokens:', result.aggregated.totalInputTokens, 'in /',
            result.aggregated.totalOutputTokens, 'out');
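
The aggregated totals are presumably sums over the individual steps of a flow. The sketch below shows that arithmetic in isolation. Only the three aggregated field names come from the example above; the per-step `StepUsage` shape and the `aggregateUsage` helper are hypothetical.

```typescript
// Hypothetical per-step usage record (not an SDK type).
interface StepUsage {
  costUSD: number;
  inputTokens: number;
  outputTokens: number;
}

// Field names taken from the result.aggregated example above.
interface AggregatedUsage {
  totalCostUSD: number;
  totalInputTokens: number;
  totalOutputTokens: number;
}

// Sum cost and token counts across all steps of a run.
function aggregateUsage(steps: StepUsage[]): AggregatedUsage {
  return steps.reduce<AggregatedUsage>(
    (acc, step) => ({
      totalCostUSD: acc.totalCostUSD + step.costUSD,
      totalInputTokens: acc.totalInputTokens + step.inputTokens,
      totalOutputTokens: acc.totalOutputTokens + step.outputTokens,
    }),
    { totalCostUSD: 0, totalInputTokens: 0, totalOutputTokens: 0 }
  );
}
```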

Provider Metadata Utilities

The SDK exports utility functions for querying provider capabilities programmatically:

import {
  PROVIDER_METADATA,
  isImageTypeSupported,
  supportsPDFsInline,
  getProvidersForNode,
  isProviderCompatibleWithNode,
  estimateCost,
  getCheapestProvider,
  compareNativeVsOpenRouter
} from '@doclo/providers-llm';

Check Image Support

isImageTypeSupported('openai', 'image/png');   // true
isImageTypeSupported('openai', 'image/bmp');   // false
isImageTypeSupported('google', 'image/bmp');   // true (extended support)
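
A check like this is typically used to filter candidate providers before sending a file. The sketch below takes the check as a parameter so it runs without the SDK; in real code you would presumably pass isImageTypeSupported itself. `providersAccepting` and `stubCheck` are hypothetical names, and the stub knows only the two pairs shown as true above, so it is deliberately incomplete.

```typescript
type SupportCheck = (provider: string, mimeType: string) => boolean;

// Keep only the providers that accept the given MIME type.
function providersAccepting(
  providers: string[],
  mimeType: string,
  isSupported: SupportCheck
): string[] {
  return providers.filter((provider) => isSupported(provider, mimeType));
}

// Stub limited to the pairs listed as `true` in the examples above.
// It is NOT a full support matrix; swap in the SDK function for real use.
const knownSupported = new Set<string>(['openai:image/png', 'google:image/bmp']);
const stubCheck: SupportCheck = (provider, mimeType) =>
  knownSupported.has(`${provider}:${mimeType}`);

const bmpCapable = providersAccepting(['openai', 'google'], 'image/bmp', stubCheck);
console.log(bmpCapable); // ['google']
```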

Check PDF Support

supportsPDFsInline('openai');     // true
supportsPDFsInline('anthropic');  // true
supportsPDFsInline('google');     // true

Get Providers for Node Type

// Get all providers compatible with parse()
const parseProviders = getProvidersForNode('parse');

// Get providers compatible with extract()
const extractProviders = getProvidersForNode('extract');

// Check specific provider compatibility
isProviderCompatibleWithNode('openai', 'categorize'); // true

Estimate Costs

// Estimate cost for 1000 input + 500 output tokens
const cost = estimateCost('openai', 1000, 500);
console.log(`$${cost.toFixed(4)}`); // "$0.0125"

// Find cheapest provider for a workload
const cheapest = getCheapestProvider(10000, 1000);
console.log(cheapest.name); // "Google (Gemini)"
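
For intuition, per-1k-token pricing turns into a cost by simple proportion. The sketch below uses the inputPer1k and outputPer1k field names and the Anthropic figures from the PROVIDER_METADATA example on this page. Whether estimateCost applies exactly this formula is an assumption, and `costFromPricing` is a hypothetical helper.

```typescript
// Field names follow the pricing object in the PROVIDER_METADATA example.
interface Pricing {
  inputPer1k: number;
  outputPer1k: number;
}

// cost = (input tokens / 1000) * input rate + (output tokens / 1000) * output rate
function costFromPricing(pricing: Pricing, inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1000) * pricing.inputPer1k + (outputTokens / 1000) * pricing.outputPer1k;
}

// Figures from the anthropic.pricing example: 0.003 in, 0.015 out per 1k tokens.
const anthropicPricing: Pricing = { inputPer1k: 0.003, outputPer1k: 0.015 };

// 1000 input tokens cost 0.003; 500 output tokens cost 0.0075.
const sketchCost = costFromPricing(anthropicPricing, 1000, 500);
console.log(`$${sketchCost.toFixed(4)}`); // "$0.0105"
```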

Compare Access Methods

const comparison = compareNativeVsOpenRouter('anthropic');
console.log(comparison.differences);
// ['Uses OpenAI-compatible format...', 'Response prefill trick...']

Access Provider Metadata

The PROVIDER_METADATA constant provides complete metadata for all providers:

const anthropic = PROVIDER_METADATA.anthropic;

console.log(anthropic.models);           // ['claude-opus-4.5', 'claude-sonnet-4.5', ...]
console.log(anthropic.capabilities);     // { supportsImages: true, supportsPDFs: true, ... }
console.log(anthropic.pricing);          // { inputPer1k: 0.003, outputPer1k: 0.015, ... }
console.log(anthropic.limits);           // { maxContextTokens: 200000, ... }
console.log(anthropic.inputFormats);     // { images: {...}, pdfs: {...} }

Next Steps

OpenAI: GPT-4.1, o3, o4-mini configuration
Anthropic: Claude models configuration
Google: Gemini models configuration
Mistral OCR: Mistral OCR 3 and Document AI setup
Surya OCR: Datalab Surya OCR setup

