Doclo processes documents through a standardized intermediate representation called DocumentIR. This decouples parsing from extraction, allowing you to use different providers for each step.
Supported Input Formats
Doclo accepts documents in many formats. Provider support varies:

| Format | Extension | Datalab | Mistral | Reducto | Unsiloed | OpenAI | Anthropic | | xAI |
|---|---|---|---|---|---|---|---|---|---|
| PDF | .pdf | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| JPEG | .jpg, .jpeg | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| PNG | .png | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| WebP | .webp | Yes | Yes | Yes | - | Yes | Yes | Yes | - |
| GIF | .gif | Yes | Yes | Yes | - | Yes | Yes | Yes | - |
| TIFF | .tiff, .tif | Yes | Yes | - | Yes | - | - | Yes | - |
| BMP | .bmp | - | Yes | Yes | - | - | - | Yes | - |
| HEIC | .heic, .heif | - | Yes | Yes | - | - | - | Yes | - |
| AVIF | .avif | - | Yes | - | - | - | - | - | - |
| PSD | .psd | - | - | Yes | - | - | - | - | - |
| DOCX | .docx | Yes | Yes | Yes | Yes | - | - | - | Yes |
| DOC | .doc | Yes | - | - | - | - | - | - | - |
| XLSX | .xlsx | - | - | Yes | Yes | - | - | - | - |
| XLS | .xls | - | - | - | - | - | - | - | - |
| PPTX | .pptx | - | Yes | Yes | Yes | - | - | - | - |
| ODT | .odt | Yes | Yes | - | - | - | - | - | - |
| ODS | .ods | Yes | - | - | - | - | - | - | - |
| ODP | .odp | Yes | - | - | - | - | - | - | - |
| HTML | .html, .htm | Yes | - | - | - | - | - | - | - |
| TXT | .txt | - | Yes | Yes | - | - | - | - | Yes |
| CSV | .csv | - | - | Yes | - | - | - | - | Yes |
| RTF | .rtf | - | Yes | Yes | - | - | - | - | - |
| EPUB | .epub | Yes | Yes | - | - | - | - | - | - |
| MD | .md | - | - | - | - | - | - | - | Yes |
| LaTeX | .tex | - | Yes | - | - | - | - | - | - |
| Jupyter | .ipynb | - | Yes | - | - | - | - | - | - |
VLM providers support images and PDFs directly, with variations by provider (see table). xAI also supports DOCX, TXT, CSV, and MD files natively. Mistral OCR has the widest format support, including LaTeX and Jupyter notebooks. For other Office documents and text formats, use an OCR provider first.
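The routing rule above can be sketched as a small helper. This is an illustrative sketch only: `canSendDirectlyToVlm`, `extensionOf`, and `DIRECT_VLM_EXTENSIONS` are hypothetical names, not part of Doclo. The set is limited to PDF, JPEG, and PNG on the assumption that only formats marked "Yes" in every provider column of the table are safe to send to any VLM.

```typescript
// Hypothetical helper, not part of Doclo: decides whether a file can go
// straight to a VLM or should be parsed by an OCR provider first.
// Assumption: only formats marked "Yes" in every column of the table
// (PDF, JPEG, PNG) are treated as safe for any VLM provider.
const DIRECT_VLM_EXTENSIONS = new Set<string>([".pdf", ".jpg", ".jpeg", ".png"]);

// Returns the lowercased extension including the dot, or "" if there is none.
function extensionOf(filename: string): string {
  const dot = filename.lastIndexOf(".");
  return dot === -1 ? "" : filename.slice(dot).toLowerCase();
}

function canSendDirectlyToVlm(filename: string): boolean {
  return DIRECT_VLM_EXTENSIONS.has(extensionOf(filename));
}

console.log(canSendDirectlyToVlm("invoice.PDF"));  // extension check is case-insensitive
console.log(canSendDirectlyToVlm("report.pptx")); // Office formats go through OCR first
```

Formats such as WebP or TIFF are supported by only some VLM providers, so a production check would consult the table per provider rather than use a single shared set.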
Input Methods
Pass documents to flows using any of these methods:
Converting Files to Base64
Use the bufferToDataUri utility from @doclo/core:
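The exact signature of bufferToDataUri is not shown on this page, so the following is a stand-in sketch and not the @doclo/core API. It illustrates what a Buffer-to-data-URI conversion generally involves, using only Node.js built-ins; the function name `toDataUri` and its parameters are hypothetical.

```typescript
// Stand-in for illustration only; this is NOT the @doclo/core implementation.
// Encodes raw bytes as a base64 data URI using Node's global Buffer.
function toDataUri(data: Buffer, mimeType: string): string {
  return `data:${mimeType};base64,${data.toString("base64")}`;
}

// Example with in-memory bytes; in practice the Buffer might come from
// reading a PDF or image file from disk.
const sample = Buffer.from("hello", "utf8");
console.log(toDataUri(sample, "text/plain"));
// prints "data:text/plain;base64,aGVsbG8="
```

The MIME type has to match the actual file content (for example `application/pdf` for a PDF), since the receiving side typically relies on it to pick a decoder.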
DocumentIR (Intermediate Representation)
DocumentIR is Doclo’s standard format for representing parsed documents. It preserves structure and layout, and it enables citation tracking.
Structure
DocumentIR uses a page-centric format:
Content Formats
DocumentIR supports multiple output formats:
- Plain text: Line-by-line OCR output with spatial coordinates
- Markdown: Structured documents with tables, headers, lists preserved
- HTML: Rich formatting with tables and semantic structure
Layout Preservation
DocumentIR preserves document structure through:
- Bounding boxes: Every line has (x, y, width, height) coordinates
- Reading order: Lines are ordered as they should be read
- Table structure: Markdown/HTML capture tables and columns
- Semantic structure: Headings, lists, and formatting preserved
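To make the page-centric layout concrete, here is a hypothetical TypeScript shape for such a representation. The type and field names (`SketchIR`, `pages`, `lines`, `bbox`, `markdown`) are assumptions chosen for illustration and may not match Doclo's actual DocumentIR schema.

```typescript
// Hypothetical types for illustration; the real DocumentIR schema may differ.
interface SketchBBox { x: number; y: number; width: number; height: number; }
interface SketchLine { text: string; bbox: SketchBBox; }
interface SketchPage {
  pageNumber: number;
  lines: SketchLine[]; // assumed to be stored in reading order
  markdown?: string;   // optional structured rendering of the page
}
interface SketchIR { pages: SketchPage[]; }

// Joins lines in reading order; pages are separated by a blank line.
function toPlainText(ir: SketchIR): string {
  return ir.pages
    .map((page) => page.lines.map((line) => line.text).join("\n"))
    .join("\n\n");
}

const exampleIR: SketchIR = {
  pages: [
    {
      pageNumber: 1,
      lines: [
        { text: "Invoice #1001", bbox: { x: 72, y: 40, width: 200, height: 18 } },
        { text: "Total: $42.00", bbox: { x: 72, y: 64, width: 150, height: 18 } },
      ],
    },
  ],
};

console.log(toPlainText(exampleIR));
```

Keeping the bounding box on every line is what lets a later step point back to the exact region of the page a value came from.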
Provenance Tracking
DocumentIR tracks metadata about the parsing process:
When to Use OCR vs VLM
Choose your parsing approach based on document characteristics:
Use VLM Direct (No DocumentIR)
VLMs excel when layout context matters:
- Handwritten forms - VLMs understand spatial relationships between fields and handwriting
- Photos of documents - Receipts, whiteboards, ID cards captured by phone
- Varied layouts - When documents come in many different formats/structures
- Charts and diagrams - Visual elements that need interpretation
- Quick prototyping - Fastest path to get something working
Use OCR → LLM (With DocumentIR)
OCR shines for text-heavy, structured documents:
- Clean PDFs - Invoices, contracts, reports with consistent formatting
- Dense text - Multi-page documents where accuracy matters
- RAG pipelines - When you need to store and search document content
- Agentic loops - Repeated queries against the same document
- Citation tracking - When you need to trace extracted values back to source lines
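Citation tracking, the last item above, amounts to mapping an extracted value back to the line it came from. The following is a minimal sketch under assumed types: `CitedLine`, `Citation`, and `findCitations` are hypothetical, and case-insensitive substring matching stands in for whatever matching Doclo actually performs.

```typescript
// Hypothetical types and helper for illustration; not Doclo's API.
interface CitedLine {
  page: number;
  text: string;
  bbox: { x: number; y: number; width: number; height: number };
}
interface Citation {
  page: number;
  lineIndex: number; // index into the flat `lines` array passed in
  bbox: CitedLine["bbox"];
}

// Returns every line whose text contains the extracted value.
function findCitations(lines: CitedLine[], value: string): Citation[] {
  const needle = value.trim().toLowerCase();
  if (needle === "") return []; // an empty value would match every line
  const hits: Citation[] = [];
  lines.forEach((line, lineIndex) => {
    if (line.text.toLowerCase().includes(needle)) {
      hits.push({ page: line.page, lineIndex, bbox: line.bbox });
    }
  });
  return hits;
}

const parsedLines: CitedLine[] = [
  { page: 1, text: "Invoice #1001", bbox: { x: 72, y: 40, width: 200, height: 18 } },
  { page: 1, text: "Total: $42.00", bbox: { x: 72, y: 64, width: 150, height: 18 } },
];

console.log(findCitations(parsedLines, "$42.00"));
```

This kind of lookup is only possible on the OCR path, because the VLM-direct path produces no line-level text or coordinates to search.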
Document Lifecycle
- Raw Document: PDF, image, or Office document input
- DocumentIR: Parsed text with layout (optional - skipped with VLM direct)
- Structured JSON: Extracted data matching your schema
Next Steps
- Providers: Learn about OCR and LLM providers
- Parse Node: Configure document parsing