Providers are the AI services that power Doclo’s document processing. The SDK normalizes all provider APIs into unified interfaces, handling cross-provider quirks like different JSON modes, token counting, cost calculation, error formats, and rate limit behaviors. This means you can swap providers without changing your flow code.

Provider Types

Doclo supports three types of providers:

OCRProvider

Converts documents to structured DocumentIR. Supports PDFs, images (JPEG, PNG, WebP, TIFF, etc.), and Office documents (DOCX, XLSX, PPTX) depending on the provider.
interface OCRProvider {
  parseToIR(input: { url?: string; base64?: string }): Promise<DocumentIR>;
}
Providers: Datalab, Mistral, Reducto, Unsiloed
Best for: Text-heavy documents, high-accuracy requirements, multi-page PDFs, RAG pipelines, agentic search

LLMProvider

Text-only structured extraction using language models.
interface LLMProvider {
  completeJson(input: {
    prompt: string;
    schema: object;
  }): Promise<{ json: unknown; costUSD: number; ... }>;
}
Examples: GPT 5.1, Claude Haiku 4.5, Grok 4
Best for: Working with pre-parsed DocumentIR, text-only documents, cost-sensitive scenarios
LLMProvider has no vision capabilities. It requires DocumentIR or text as input.
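Because LLMProvider only accepts text, a DocumentIR typically has to be flattened into a prompt first. The exact DocumentIR shape isn't shown on this page, so the sketch below assumes a minimal hypothetical structure of pages containing text blocks:

```typescript
// Hypothetical minimal DocumentIR shape (the real interface may differ).
interface IRBlock { text: string }
interface IRPage { blocks: IRBlock[] }
interface MinimalDocumentIR { pages: IRPage[] }

// Flatten parsed pages into a single text prompt for an LLMProvider.
function irToPrompt(ir: MinimalDocumentIR): string {
  return ir.pages
    .map((page, i) =>
      `--- Page ${i + 1} ---\n` + page.blocks.map((b) => b.text).join('\n'))
    .join('\n\n');
}
```

The flattened string can then be passed as the `prompt` to `completeJson`, optionally prefixed with extraction instructions.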

VLMProvider

Vision + language models that can extract from images directly.
interface VLMProvider {
  completeJson(input: {
    prompt: string | MultimodalInput;
    schema: object;
  }): Promise<{ json: unknown; costUSD: number; ... }>;
}
Examples: GPT 5.1, Claude Sonnet 4, Gemini 2.5 Flash
Best for: Documents with images/charts, complex visual layouts, speed over cost

Available Providers

LLM Providers

Provider   | Models                                           | Via
OpenAI     | gpt-5, gpt-5-mini, gpt-4.1, o3, o4-mini          | Direct or OpenRouter
Anthropic  | claude-sonnet-4, claude-haiku-4.5                | Direct or OpenRouter
Google     | gemini-2.5-flash, gemini-2.5-pro, gemini-3.0-pro | Direct or OpenRouter
xAI        | grok-4, grok-4-fast                              | OpenRouter
We’re actively expanding OpenRouter model support, with GPT OSS, Qwen, Kimi, GLM and others coming soon.

OCR Providers

Provider   | Models         | Package
Datalab    | surya, marker  | @doclo/providers-datalab
Mistral    | ocr-2512       | @doclo/providers-mistral
Reducto    | reducto        | @doclo/providers-reducto
Unsiloed   | unsiloed       | @doclo/providers-unsiloed

Unified Interface

All providers return consistent responses:
const result = await provider.completeJson({
  prompt: 'Extract invoice data',
  schema: invoiceSchema
});

// Every provider returns:
result.json       // Extracted data
result.costUSD    // Cost in USD
result.provider   // Provider name (e.g., 'openai:gpt-4o')
result.ms         // Duration in milliseconds
Because the response shape is identical everywhere, swapping providers never requires changes to your flow code.
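One practical benefit of the uniform response shape is that cost and latency metrics can be aggregated across providers with no per-provider handling. A small sketch (the result fields are taken from the list above; `summarize` is an illustrative helper, not part of the SDK):

```typescript
// Uniform result shape returned by every provider, per the docs above.
interface JsonResult {
  json: unknown;
  costUSD: number;
  provider: string;
  ms: number;
}

// Aggregate spend and average latency across a batch of extraction calls.
function summarize(results: JsonResult[]): { totalUSD: number; avgMs: number } {
  const totalUSD = results.reduce((sum, r) => sum + r.costUSD, 0);
  const avgMs =
    results.reduce((sum, r) => sum + r.ms, 0) / Math.max(results.length, 1);
  return { totalUSD, avgMs };
}
```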

OpenRouter vs Native

You can access providers through OpenRouter (a unified gateway) or directly:
import { createVLMProvider } from '@doclo/providers-llm';

const provider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});
Pros: Single API key for all providers, automatic fallback
Cons: Slightly higher latency, additional cost markup

Fallback and Retry

Use buildLLMProvider for production workloads with automatic fallback:
import { buildLLMProvider } from '@doclo/providers-llm';

const provider = buildLLMProvider({
  providers: [
    {
      provider: 'google',
      model: 'google/gemini-2.5-flash',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    },
    {
      provider: 'openai',
      model: 'openai/gpt-5.1',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }
  ],
  maxRetries: 3,
  retryDelay: 1000,
  useExponentialBackoff: true
});
Retry behavior:
  • Retries on 5xx errors and rate limits (429)
  • Skips retries on 4xx errors (bad requests)
  • Exponential backoff: 1s, 2s, 4s delays
  • Falls back to next provider after max retries
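The retry schedule described above can be sketched as follows. This is an illustrative model of the documented behavior (the helper names are hypothetical, not the SDK's internals):

```typescript
// Compute the delay before each retry attempt, matching the documented
// schedule: with exponential backoff and a 1000ms base, delays are 1s, 2s, 4s.
function retryDelays(
  maxRetries: number,
  baseMs: number,
  exponential: boolean
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(exponential ? baseMs * 2 ** attempt : baseMs);
  }
  return delays;
}

// Retry on rate limits (429) and server errors (5xx); skip other 4xx errors.
function shouldRetry(status: number): boolean {
  return status === 429 || (status >= 500 && status < 600);
}
```

After `maxRetries` failed attempts on one provider, control moves to the next entry in the `providers` array.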

Choosing a Provider

Our Recommendation: Start with Gemini

For OCR and data extraction, Gemini models are by far the strongest performers. We strongly recommend starting with Gemini and experimenting with other models as needed:
  • Gemini 2.5 Flash - Great workhorse for just about any OCR/extraction task. Fast, accurate, cost-effective.
  • Gemini 2.5 Pro / 3.0 Pro - Use for the trickiest, most complicated extraction tasks where you need maximum accuracy.
// Start here for most extraction tasks
const provider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

When to Use OCR Providers

OCR providers (Datalab, Mistral, Reducto, Unsiloed) are valuable when you need the parsed content for purposes beyond just extraction:
  • RAG pipelines - Store and search document content
  • Text imports - Ingest documents into your system
  • Agentic search - Let agents query document content
  • Record keeping - Maintain searchable archives
  • Long-form text extraction - Multi-page documents with dense text
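For the RAG and search use cases above, the parsed text typically gets chunked before embedding. A minimal sketch of a naive fixed-size chunker with overlap (illustrative only; real pipelines usually split on structural boundaries like pages or headings from the DocumentIR):

```typescript
// Split OCR text into overlapping fixed-size chunks for embedding/indexing.
// `overlap` carries trailing context into the next chunk so sentences that
// straddle a boundary remain searchable.
function chunkText(text: string, maxChars: number, overlap: number): string[] {
  if (maxChars <= overlap) throw new Error('maxChars must exceed overlap');
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + maxChars));
    start += maxChars - overlap;
  }
  return chunks;
}
```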

OCR + LLM Pipeline

For text-heavy documents, use OCR to parse first, then extract with an LLM:
import { createFlow, parse, extract } from '@doclo/flows';
import { createOCRProvider } from '@doclo/providers-datalab';
import { createVLMProvider } from '@doclo/providers-llm';

const ocrProvider = createOCRProvider({
  provider: 'datalab',
  model: 'marker',
  apiKey: process.env.DATALAB_API_KEY!
});

const llmProvider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

// Parse to DocumentIR, then extract structured data
const flow = createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .step('extract', extract({
    provider: llmProvider,
    schema: invoiceSchema
  }))
  .build();
This approach gives you:
  • OCR text for RAG, search, and record keeping
  • Maximum text accuracy from specialized OCR
  • Citation tracking back to source lines
  • Lower cost at scale (OCR + text-only LLM is cheaper than VLM)

Next Steps