Mistral provides VLM-based document processing through two services: OCR 3 for parsing documents to markdown, and Document AI for structured extraction directly from source documents.
Installation
npm install @doclo/providers-mistral
Providers Overview
Mistral offers two services through the same package:
| Service | Function | Use Case | Cost |
|---|---|---|---|
| OCR 3 | Document to markdown | Parsing for RAG, text extraction | $0.002/page |
| Document AI | Schema-based extraction | Direct extraction from source | $0.002/page |
Mistral OCR 3 is a VLM under the hood, not traditional OCR. It provides excellent handwriting recognition and handles complex layouts well.
OCR Provider
Use mistralOCRProvider for parsing documents to markdown/DocumentIR.
Basic Setup
import { mistralOCRProvider } from '@doclo/providers-mistral';

const ocrProvider = mistralOCRProvider({
  apiKey: process.env.MISTRAL_API_KEY!
});
Configuration Options
mistralOCRProvider({
  apiKey: string,                       // Required: Mistral API key

  // Table handling
  tableFormat?: 'html' | 'markdown',    // Table output format (default: html)

  // Header/footer extraction
  extractHeader?: boolean,              // Extract headers into separate field
  extractFooter?: boolean,              // Extract footers into separate field

  // Image handling
  includeImageBase64?: boolean,         // Include base64 images in response

  // Page selection
  pages?: string | number | number[],   // Specific pages: "0-5", 3, or [0, 2, 5]
})
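The pages option accepts three shapes, which can be awkward to reason about when building requests. The sketch below shows one way to expand them into an explicit list of page indices before passing them on. normalizePages is a hypothetical helper, not part of @doclo/providers-mistral, and it assumes that a string range such as "0-5" is inclusive at both ends.

```typescript
// Hypothetical helper (not part of @doclo/providers-mistral).
// Assumption: "start-end" ranges are inclusive at both ends.
type PagesOption = string | number | number[];

function normalizePages(pages: PagesOption): number[] {
  if (typeof pages === 'number') return [pages];
  if (Array.isArray(pages)) return [...pages];

  // String form: either a single index ("3") or a range ("0-5").
  const match = /^(\d+)(?:-(\d+))?$/.exec(pages.trim());
  if (!match) throw new Error(`Invalid page spec: "${pages}"`);

  const start = Number(match[1]);
  const end = match[2] !== undefined ? Number(match[2]) : start;
  if (end < start) throw new Error(`Invalid page range: "${pages}"`);

  const result: number[] = [];
  for (let page = start; page <= end; page++) result.push(page);
  return result;
}
```

Expanding the option up front makes it straightforward to count the selected pages, for example when estimating cost or checking page limits.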
Usage with Flows
import { createFlow, parse } from '@doclo/flows';
import { mistralOCRProvider } from '@doclo/providers-mistral';

const ocrProvider = mistralOCRProvider({
  apiKey: process.env.MISTRAL_API_KEY!,
  tableFormat: 'html'
});

const flow = createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .build();

const result = await flow.run({
  url: 'https://example.com/document.pdf'
});

// Access parsed content
console.log(result.output.pages[0].lines);
Output: DocumentIR
interface DocumentIR {
  pages: {
    width: number;
    height: number;
    lines: { text: string }[];
    images?: {
      id: string;
      bbox: {
        top_left_x: number;
        top_left_y: number;
        bottom_right_x: number;
        bottom_right_y: number;
      };
      base64?: string;  // If includeImageBase64 was true
    }[];
  }[];
  extras?: {
    raw: object;        // Raw Mistral response
    costUSD: number;
    pageCount: number;
    markdown: string;   // Full markdown output
  };
}
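To illustrate how the DocumentIR shape above can be consumed, here is a minimal sketch that flattens the per-page lines into a single plain-text string. It declares only the subset of fields it needs (DocumentIRLike is a local stand-in type, and documentText is a hypothetical helper, not an export of any @doclo package).

```typescript
// Local stand-in for the subset of DocumentIR used here.
interface DocumentIRLike {
  pages: { lines: { text: string }[] }[];
}

// Hypothetical helper: joins lines with newlines and
// separates pages with a blank line.
function documentText(ir: DocumentIRLike): string {
  return ir.pages
    .map((page) => page.lines.map((line) => line.text).join('\n'))
    .join('\n\n');
}

// Usage with a hand-written sample document.
const sample: DocumentIRLike = {
  pages: [
    { lines: [{ text: 'Invoice INV-001' }, { text: 'Total: 1250.00' }] },
    { lines: [{ text: 'Terms and conditions' }] },
  ],
};
const sampleText = documentText(sample);
```

When the full markdown output is wanted instead, extras.markdown already carries it, so a helper like this is mainly useful for line-oriented processing.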
VLM Provider (Document AI)
Use mistralVLMProvider for structured extraction directly from source documents using JSON schema.
Mistral VLM always requires raw document input (URL or base64). It cannot extract from pre-parsed DocumentIR. Use it as the first step in a flow, not after a parse() step.
Basic Setup
import { mistralVLMProvider } from '@doclo/providers-mistral';

const vlmProvider = mistralVLMProvider({
  apiKey: process.env.MISTRAL_API_KEY!
});
Configuration Options
mistralVLMProvider({
  apiKey: string,                         // Required: Mistral API key

  // Annotation mode
  annotationMode?: 'document' | 'bbox',   // Extraction mode (default: document)

  // Image handling
  includeImageBase64?: boolean,           // Include base64 images in response

  // Page selection
  pages?: string | number | number[],     // Specific pages to process
})
Annotation Modes
| Mode | Description | Page Limit |
|---|---|---|
| document | Single JSON output for entire document | 8 pages |
| bbox | Per-image annotations with bounding boxes | 1000 pages |
Usage with Flows
import { createFlow, extract } from '@doclo/flows';
import { mistralVLMProvider } from '@doclo/providers-mistral';

const vlmProvider = mistralVLMProvider({
  apiKey: process.env.MISTRAL_API_KEY!,
  annotationMode: 'document'
});

const invoiceSchema = {
  type: 'object',
  properties: {
    invoiceNumber: { type: 'string' },
    date: { type: 'string' },
    total: { type: 'number' },
    lineItems: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          description: { type: 'string' },
          quantity: { type: 'number' },
          price: { type: 'number' }
        }
      }
    }
  }
};

const flow = createFlow()
  .step('extract', extract({
    provider: vlmProvider,
    schema: invoiceSchema,
    inputMode: 'raw'  // Required: Mistral needs raw document
  }))
  .build();

const result = await flow.run({
  base64: 'data:application/pdf;base64,...'
});

console.log(result.output);
// { invoiceNumber: "INV-001", date: "2025-01-15", total: 1250.00, ... }
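Because the extraction output is model-generated, it can be worth checking its shape at runtime before using it. The following is a minimal sketch of a type guard that mirrors the invoice schema above. Since that schema declares no required fields, every property is treated as optional. The Invoice and LineItem types and the isInvoice function are illustrative names, not part of any @doclo package.

```typescript
// Illustrative types mirroring the invoice JSON schema (all fields optional).
interface LineItem { description?: string; quantity?: number; price?: number }
interface Invoice { invoiceNumber?: string; date?: string; total?: number; lineItems?: LineItem[] }

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// A field passes if it is absent or has the expected primitive type.
function isOptional(value: unknown, expected: 'string' | 'number'): boolean {
  return value === undefined || typeof value === expected;
}

function isLineItem(value: unknown): value is LineItem {
  return (
    isRecord(value) &&
    isOptional(value['description'], 'string') &&
    isOptional(value['quantity'], 'number') &&
    isOptional(value['price'], 'number')
  );
}

function isInvoice(value: unknown): value is Invoice {
  if (!isRecord(value)) return false;
  const items = value['lineItems'];
  return (
    isOptional(value['invoiceNumber'], 'string') &&
    isOptional(value['date'], 'string') &&
    isOptional(value['total'], 'number') &&
    (items === undefined || (Array.isArray(items) && items.every(isLineItem)))
  );
}
```

A caller could then narrow the output with `if (isInvoice(result.output)) { ... }` and handle the failure case explicitly. A full JSON Schema validator would be a more thorough alternative.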
Documents
| Format | MIME Type | Supported |
|---|---|---|
| PDF | application/pdf | Yes |
| DOCX | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Yes |
| PPTX | application/vnd.openxmlformats-officedocument.presentationml.presentation | Yes |
| TXT | text/plain | Yes |
| EPUB | application/epub+zip | Yes |
| RTF | application/rtf | Yes |
| ODT | application/vnd.oasis.opendocument.text | Yes |
| LaTeX | application/x-latex | Yes |
| Jupyter | application/x-ipynb+json | Yes |
| XLSX | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | No |
Images
| Format | MIME Type |
|---|---|
| JPEG | image/jpeg |
| PNG | image/png |
| WebP | image/webp |
| TIFF | image/tiff |
| GIF | image/gif |
| AVIF | image/avif |
| BMP | image/bmp |
| HEIC/HEIF | image/heic, image/heif |
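As a rough illustration of how the format tables above might be applied before submitting a file, the sketch below checks a MIME type against the supported document and image types. The set contents are copied from the tables. isSupportedMimeType is a hypothetical helper, and the case-insensitive matching is an assumption made for convenience.

```typescript
// MIME types listed as supported in the document and image tables.
const SUPPORTED_MIME_TYPES: ReadonlySet<string> = new Set([
  // Documents
  'application/pdf',
  'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  'application/vnd.openxmlformats-officedocument.presentationml.presentation',
  'text/plain',
  'application/epub+zip',
  'application/rtf',
  'application/vnd.oasis.opendocument.text',
  'application/x-latex',
  'application/x-ipynb+json',
  // Images
  'image/jpeg',
  'image/png',
  'image/webp',
  'image/tiff',
  'image/gif',
  'image/avif',
  'image/bmp',
  'image/heic',
  'image/heif',
]);

// Hypothetical helper: true if the MIME type appears in the tables above.
// XLSX is deliberately absent because it is listed as unsupported.
function isSupportedMimeType(mime: string): boolean {
  return SUPPORTED_MIME_TYPES.has(mime.trim().toLowerCase());
}
```

Rejecting unsupported types such as XLSX locally avoids a round trip to the API for a request that would not succeed.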
Limits
| Limit | Value |
|---|---|
| Max file size | 50 MB |
| Max pages (OCR) | 1000 |
| Max pages (Document AI - document mode) | 8 |
| Max pages (Document AI - bbox mode) | 1000 |
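The limits in this table can be checked before a request is made. The sketch below is one possible pre-flight check. checkLimits and the Service type are hypothetical names, and it assumes that "50 MB" means 50 × 1024 × 1024 bytes, which the source does not specify.

```typescript
// Hypothetical pre-flight check based on the limits table.
type Service = 'ocr' | 'document' | 'bbox';

// Assumption: 1 MB = 1024 * 1024 bytes.
const MAX_FILE_BYTES = 50 * 1024 * 1024;

const MAX_PAGES: Record<Service, number> = {
  ocr: 1000,      // OCR
  document: 8,    // Document AI, document mode
  bbox: 1000,     // Document AI, bbox mode
};

// Returns a list of human-readable problems; an empty list means
// the input is within the documented limits.
function checkLimits(service: Service, fileBytes: number, pageCount: number): string[] {
  const problems: string[] = [];
  if (fileBytes > MAX_FILE_BYTES) {
    problems.push(`File is ${fileBytes} bytes; the limit is ${MAX_FILE_BYTES} bytes.`);
  }
  if (pageCount > MAX_PAGES[service]) {
    problems.push(`Document has ${pageCount} pages; the ${service} limit is ${MAX_PAGES[service]}.`);
  }
  return problems;
}
```

Returning a list rather than throwing lets the caller report every violation at once, or decide to fall back to a different mode.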
Pricing
| Service | Cost | Batch Discount |
|---|---|---|
| OCR 3 | $0.002/page | 50% off |
| Document AI | $0.002/page | 50% off |
At $2 per 1,000 pages ($1 per 1,000 with the batch discount), Mistral is one of the most cost-effective OCR options available.
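The pricing arithmetic can be sketched as a small function. The per-page price and the 50% batch discount come from the pricing table. estimateCostUSD is a hypothetical helper and does not reflect how the provider computes costUSD internally.

```typescript
// Values taken from the pricing table.
const PRICE_PER_PAGE_USD = 0.002;
const BATCH_DISCOUNT = 0.5; // 50% off for batch processing

// Hypothetical helper: estimated cost in USD for a given page count.
function estimateCostUSD(pageCount: number, batch: boolean = false): number {
  if (pageCount < 0) throw new Error('pageCount must be non-negative');
  const multiplier = batch ? 1 - BATCH_DISCOUNT : 1;
  const cost = pageCount * PRICE_PER_PAGE_USD * multiplier;
  // Round to six decimal places to hide floating-point noise.
  return Math.round(cost * 1e6) / 1e6;
}

const standardCost = estimateCostUSD(1000);    // 1,000 pages at the standard rate
const batchCost = estimateCostUSD(1000, true); // same volume with the batch discount
```

For 1,000 pages this works out to about $2 at the standard rate and about $1 with the batch discount.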
Mistral vs Other Providers
| Feature | Mistral | Reducto | Surya | Marker |
|---|---|---|---|---|
| Document parsing | Yes | Yes | Yes | Yes |
| Structured extraction | Yes (native) | Yes (native) | Via LLM | Via LLM |
| Handwriting | Excellent | Good | Good | Limited |
| Format support | Extensive | Good | PDF/Images | PDF/Images |
| Bounding boxes | Image-level | Yes | Yes | No |
| Cost/page | $0.002 | $0.004+ | $0.01 | $0.002+ |
| Page limit | 1000 | Unlimited | Unlimited | Unlimited |
Choose Mistral when:
- Processing documents with handwriting
- You need native structured extraction without a separate LLM
- Working with diverse document formats (DOCX, PPTX, EPUB, etc.)
- Cost is a primary concern
Example: Parse and Extract Pipeline
For documents longer than 8 pages (the Document AI document-mode limit), use OCR to parse first, then an LLM to extract:
import { createFlow, parse, extract } from '@doclo/flows';
import { mistralOCRProvider } from '@doclo/providers-mistral';
import { createVLMProvider } from '@doclo/providers-llm';

const ocrProvider = mistralOCRProvider({
  apiKey: process.env.MISTRAL_API_KEY!
});

const llmProvider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

// contractSchema is a JSON schema object for your document, defined elsewhere (not shown here)
const flow = createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .step('extract', extract({
    provider: llmProvider,
    schema: contractSchema,
    inputMode: 'ir'  // Use parsed DocumentIR
  }))
  .build();
For documents of 8 pages or fewer, extract directly:
import { createFlow, extract } from '@doclo/flows';
import { mistralVLMProvider } from '@doclo/providers-mistral';

const vlmProvider = mistralVLMProvider({
  apiKey: process.env.MISTRAL_API_KEY!
});

// invoiceSchema is the JSON schema object from the Document AI "Usage with Flows" example
const flow = createFlow()
  .step('extract', extract({
    provider: vlmProvider,
    schema: invoiceSchema,
    inputMode: 'raw'
  }))
  .build();

// Single-step extraction
const result = await flow.run({
  url: 'https://example.com/invoice.pdf'
});
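The choice between the two pipelines above depends only on page count, so it can be expressed as a small decision function. This is a sketch under the assumption that the page count is known before the flow is built. chooseStrategy and the Strategy type are hypothetical names, not part of @doclo/flows.

```typescript
// Page limit for Document AI in document mode, from the limits table.
const DOCUMENT_MODE_MAX_PAGES = 8;

type Strategy = 'direct-extract' | 'parse-then-extract';

// Hypothetical helper: picks the pipeline shape for a document.
function chooseStrategy(pageCount: number): Strategy {
  if (!Number.isInteger(pageCount) || pageCount < 1) {
    throw new Error('pageCount must be a positive integer');
  }
  // Up to and including 8 pages fits in a single Document AI call;
  // anything longer is parsed with OCR first and extracted with an LLM.
  return pageCount <= DOCUMENT_MODE_MAX_PAGES ? 'direct-extract' : 'parse-then-extract';
}
```

The caller can then build the single-step flow for 'direct-extract' and the two-step flow for 'parse-then-extract'.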