Providers are the AI services that power Doclo’s document processing. The SDK normalizes all provider APIs into unified interfaces, handling cross-provider quirks like different JSON modes, token counting, cost calculation, error formats, and rate limit behaviors. This means you can swap providers without changing your flow code.

Provider Types

Doclo supports three types of providers:

OCRProvider

Converts documents to structured DocumentIR. Supports PDFs, images (JPEG, PNG, WebP, TIFF, etc.), and Office documents (DOCX, XLSX, PPTX) depending on the provider.
interface OCRProvider {
  parseToIR(input: { url?: string; base64?: string }): Promise<DocumentIR>;
}
Providers: Datalab, Mistral, Reducto, Unsiloed
Best for: Text-heavy documents, high-accuracy requirements, multi-page PDFs, RAG pipelines, agentic search

LLMProvider

Text-only structured extraction using language models.
interface LLMProvider {
  completeJson(input: {
    prompt: string;
    schema: object;
  }): Promise<{ json: unknown; costUSD: number; ... }>;
}
Examples: GPT 5.1, Claude Haiku 4.5, Grok 4
Best for: Working with pre-parsed DocumentIR, text-only documents, cost-sensitive scenarios
LLMProvider has no vision capabilities. It requires DocumentIR or text as input.
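Because LLMProvider only accepts text, a DocumentIR typically has to be flattened into a prompt first. The exact DocumentIR shape isn't shown on this page, so the sketch below assumes a minimal hypothetical structure of pages containing text blocks:

```typescript
// Hypothetical minimal DocumentIR shape (the real interface may differ).
interface IRBlock { text: string }
interface IRPage { blocks: IRBlock[] }
interface MinimalDocumentIR { pages: IRPage[] }

// Flatten parsed pages into a single text prompt for an LLMProvider.
function irToPrompt(ir: MinimalDocumentIR): string {
  return ir.pages
    .map((page, i) =>
      `--- Page ${i + 1} ---\n` + page.blocks.map((b) => b.text).join('\n'))
    .join('\n\n');
}
```

The flattened string can then be passed as the `prompt` to `completeJson`, optionally prefixed with extraction instructions.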

VLMProvider

Vision + language models that can extract from images directly.
interface VLMProvider {
  completeJson(input: {
    prompt: string | MultimodalInput;
    schema: object;
  }): Promise<{ json: unknown; costUSD: number; ... }>;
}
Examples: GPT 5.1, Claude Sonnet 4, Gemini 2.5 Flash
Best for: Documents with images/charts, complex visual layouts, speed over cost

Available Providers

LLM Providers

Provider   | Models                                           | Via
OpenAI     | gpt-5, gpt-5-mini, gpt-4.1, o3, o4-mini          | Direct or OpenRouter
Anthropic  | claude-sonnet-4, claude-haiku-4.5                | Direct or OpenRouter
Google     | gemini-2.5-flash, gemini-2.5-pro, gemini-3.0-pro | Direct or OpenRouter
xAI        | grok-4, grok-4-fast                              | OpenRouter
We’re actively expanding OpenRouter model support, with GPT OSS, Qwen, Kimi, GLM and others coming soon.

OCR Providers

Provider   | Models         | Package
Datalab    | surya, marker  | @doclo/providers-datalab
Mistral    | ocr-2512       | @doclo/providers-mistral
Reducto    | reducto        | @doclo/providers-reducto
Unsiloed   | unsiloed       | @doclo/providers-unsiloed

Unified Interface

All providers return consistent responses:
const result = await provider.completeJson({
  prompt: 'Extract invoice data',
  schema: invoiceSchema
});

// Every provider returns:
result.json       // Extracted data
result.costUSD    // Cost in USD
result.provider   // Provider name (e.g., 'openai:gpt-4o')
result.ms         // Duration in milliseconds
Because the response shape is identical everywhere, swapping providers never requires changes to your flow code.
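One practical benefit of the uniform response shape is that cost and latency metrics can be aggregated across providers with no per-provider handling. A small sketch (the result fields are taken from the list above; `summarize` is an illustrative helper, not part of the SDK):

```typescript
// Uniform result shape returned by every provider, per the docs above.
interface JsonResult {
  json: unknown;
  costUSD: number;
  provider: string;
  ms: number;
}

// Aggregate spend and average latency across a batch of extraction calls.
function summarize(results: JsonResult[]): { totalUSD: number; avgMs: number } {
  const totalUSD = results.reduce((sum, r) => sum + r.costUSD, 0);
  const avgMs =
    results.reduce((sum, r) => sum + r.ms, 0) / Math.max(results.length, 1);
  return { totalUSD, avgMs };
}
```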

OpenRouter vs Native

You can access providers through OpenRouter (a unified gateway) or directly:
import { createVLMProvider } from '@doclo/providers-llm';

const provider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});
Pros: Single API key for all providers, automatic fallback
Cons: Slightly higher latency, additional cost markup

Fallback and Retry

Use buildLLMProvider for production workloads with automatic fallback:
import { buildLLMProvider } from '@doclo/providers-llm';

const provider = buildLLMProvider({
  providers: [
    {
      provider: 'google',
      model: 'google/gemini-2.5-flash',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    },
    {
      provider: 'openai',
      model: 'openai/gpt-5.1',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }
  ],
  maxRetries: 3,
  retryDelay: 1000,
  useExponentialBackoff: true
});
Retry behavior:
  • Retries on 5xx errors and rate limits (429)
  • Skips retries on 4xx errors (bad requests)
  • Exponential backoff: 1s, 2s, 4s delays
  • Falls back to next provider after max retries
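The retry schedule described above can be sketched as follows. This is an illustrative model of the documented behavior (the helper names are hypothetical, not the SDK's internals):

```typescript
// Compute the delay before each retry attempt, matching the documented
// schedule: with exponential backoff and a 1000ms base, delays are 1s, 2s, 4s.
function retryDelays(
  maxRetries: number,
  baseMs: number,
  exponential: boolean
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(exponential ? baseMs * 2 ** attempt : baseMs);
  }
  return delays;
}

// Retry on rate limits (429) and server errors (5xx); skip other 4xx errors.
function shouldRetry(status: number): boolean {
  return status === 429 || (status >= 500 && status < 600);
}
```

After `maxRetries` failed attempts on one provider, control moves to the next entry in the `providers` array.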

Choosing a Provider

Our Recommendation: Start with Gemini

For OCR and data extraction, Gemini models are by far the strongest performers. We strongly recommend starting with Gemini and experimenting with other models as needed:
  • Gemini 2.5 Flash - Great workhorse for just about any OCR/extraction task. Fast, accurate, cost-effective.
  • Gemini 2.5 Pro / 3.0 Pro - Use for the trickiest, most complicated extraction tasks where you need maximum accuracy.
// Start here for most extraction tasks
const provider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

When to Use OCR Providers

OCR providers (Datalab, Mistral, Reducto, Unsiloed) are valuable when you need the parsed content for purposes beyond just extraction:
  • RAG pipelines - Store and search document content
  • Text imports - Ingest documents into your system
  • Agentic search - Let agents query document content
  • Record keeping - Maintain searchable archives
  • Long-form text extraction - Multi-page documents with dense text
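For the RAG and search use cases above, the parsed text typically gets chunked before embedding. A minimal sketch of a naive fixed-size chunker with overlap (illustrative only; real pipelines usually split on structural boundaries like pages or headings from the DocumentIR):

```typescript
// Split OCR text into overlapping fixed-size chunks for embedding/indexing.
// `overlap` carries trailing context into the next chunk so sentences that
// straddle a boundary remain searchable.
function chunkText(text: string, maxChars: number, overlap: number): string[] {
  if (maxChars <= overlap) throw new Error('maxChars must exceed overlap');
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + maxChars));
    start += maxChars - overlap;
  }
  return chunks;
}
```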

OCR + LLM Pipeline

For text-heavy documents, use OCR to parse first, then extract with an LLM:
import { createFlow, parse, extract } from '@doclo/flows';
import { createOCRProvider } from '@doclo/providers-datalab';
import { createVLMProvider } from '@doclo/providers-llm';

const ocrProvider = createOCRProvider({
  provider: 'datalab',
  model: 'marker',
  apiKey: process.env.DATALAB_API_KEY!
});

const llmProvider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

// Parse to DocumentIR, then extract structured data
const flow = createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .step('extract', extract({
    provider: llmProvider,
    schema: invoiceSchema
  }))
  .build();
This approach gives you:
  • OCR text for RAG, search, and record keeping
  • Maximum text accuracy from specialized OCR
  • Citation tracking back to source lines
  • Lower cost at scale (OCR + text-only LLM is cheaper than VLM)

Next Steps