The parse node converts raw documents (PDFs, images) into Doclo’s intermediate representation (DocumentIR) with text content and layout information.

Basic Usage

import { createFlow, parse } from '@doclo/flows';
import { suryaProvider } from '@doclo/providers-datalab';

const ocrProvider = suryaProvider({
  endpoint: 'https://www.datalab.to/api/v1/ocr',
  apiKey: process.env.DATALAB_API_KEY!
});

const flow = createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .build();

// pdfDataUrl is supplied by the caller: the PDF encoded as a base64 data URL
const result = await flow.run({ base64: pdfDataUrl });
// result.output is DocumentIR
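
The snippet above passes a `pdfDataUrl` value without showing where it comes from. As a minimal sketch, assuming the `base64` input accepts a standard `data:application/pdf;base64,...` URL (the variable name suggests this, but the exact accepted shape is not confirmed here), the bytes of a PDF could be encoded like this. `toPdfDataUrl` is a hypothetical helper, not part of Doclo.

```typescript
// Hypothetical helper (not a Doclo API): encode raw PDF bytes as a data URL.
// Assumption: flow.run({ base64 }) accepts a "data:application/pdf;base64,..." string.
function toPdfDataUrl(bytes: Uint8Array): string {
  // btoa expects a "binary string" with one character per byte.
  let binary = '';
  bytes.forEach((byte) => {
    binary += String.fromCharCode(byte);
  });
  return `data:application/pdf;base64,${btoa(binary)}`;
}
```

In Node, the bytes could come from `readFile` in `node:fs/promises`, whose result is a `Uint8Array` subclass and can be passed to this helper directly.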

Configuration Options

parse({
  provider: ocrProvider,           // Required: OCR or VLM provider
  format: 'text',                  // Output format: 'text' | 'markdown' | 'html'
  describeFigures: false,          // Describe charts/diagrams (VLM only)
  includeImages: false,            // Extract images (Surya/Marker only)
  citations: { enabled: true },    // Enable citation tracking
  consensus: { runs: 3 },          // Multi-run consensus
  chunked: {                       // Process large PDFs in chunks
    maxPagesPerChunk: 10,
    parallel: true
  }
})

Options Reference

Option                  Type                          Default   Description
provider                OCRProvider | VLMProvider     Required  Provider for parsing
format                  'text' | 'markdown' | 'html'  'text'    Output format
describeFigures         boolean                       false     Describe charts/diagrams (VLM only)
includeImages           boolean                       false     Extract embedded images
citations               CitationConfig                -         Citation tracking configuration
consensus               ConsensusConfig               -         Multi-run voting for accuracy
chunked                 object                        -         Large document chunking
reasoning               object                        -         Extended reasoning (VLM only)
additionalInstructions  string                        -         Custom parsing guidance

Output Format

Text Format (Default)

Line-by-line output with position data:
parse({
  provider: ocrProvider,
  format: 'text'
})
Best for: Maximum precision, citation tracking, numeric data.

Markdown Format

Preserves document structure (tables, headers, lists):
parse({
  provider: vlmProvider,
  format: 'markdown'
})
Best for: Structured documents, reports, forms with tables.

HTML Format

Rich formatting with semantic structure:
parse({
  provider: vlmProvider,
  format: 'html'
})
Best for: Complex layouts, multi-column documents.

Provider Types

OCR Provider

Use for text-heavy documents requiring high accuracy:
import { suryaProvider } from '@doclo/providers-datalab';

const ocrProvider = suryaProvider({
  endpoint: 'https://www.datalab.to/api/v1/ocr',
  apiKey: process.env.DATALAB_API_KEY!
});

parse({ provider: ocrProvider })

VLM Provider

Use for visual documents or when you need structure detection:
import { createVLMProvider } from '@doclo/providers-llm';

const vlmProvider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

parse({
  provider: vlmProvider,
  format: 'markdown',
  describeFigures: true
})

Large Document Handling

For PDFs with many pages, use chunking to avoid timeouts and memory issues:
parse({
  provider: ocrProvider,
  chunked: {
    maxPagesPerChunk: 10,  // Pages per chunk
    overlap: 0,            // Page overlap between chunks
    parallel: true         // Process chunks in parallel
  }
})
The output combines all chunks into a single DocumentIR.
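
To make the interaction between `maxPagesPerChunk` and `overlap` concrete, here is a rough sketch of how page ranges could be planned. It is an illustration only, not Doclo's actual chunking logic: `planChunks` and `PageRange` are hypothetical names, and it assumes `overlap` means the number of pages repeated at the start of the next chunk.

```typescript
// Hypothetical illustration (not Doclo's implementation) of chunk planning.
interface PageRange {
  start: number; // 1-based, inclusive
  end: number;   // 1-based, inclusive
}

function planChunks(
  totalPages: number,
  maxPagesPerChunk: number,
  overlap = 0
): PageRange[] {
  if (maxPagesPerChunk <= 0) {
    throw new RangeError('maxPagesPerChunk must be positive');
  }
  if (overlap < 0 || overlap >= maxPagesPerChunk) {
    throw new RangeError('overlap must be in [0, maxPagesPerChunk)');
  }

  const ranges: PageRange[] = [];
  // Each new chunk starts `stride` pages after the previous one.
  const stride = maxPagesPerChunk - overlap;
  for (let start = 1; start <= totalPages; start += stride) {
    const end = Math.min(start + maxPagesPerChunk - 1, totalPages);
    ranges.push({ start, end });
    if (end === totalPages) break; // last page reached, stop planning
  }
  return ranges;
}
```

Under these assumptions, a 25-page PDF with `maxPagesPerChunk: 10` and `overlap: 0` yields the ranges 1-10, 11-20, and 21-25.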

Citation Tracking

Enable line-level citations for source tracking:
parse({
  provider: ocrProvider,
  format: 'text',
  citations: {
    enabled: true
  }
})
Each line in the output includes a lineId (e.g., p1_l5 for page 1, line 5) that can be referenced during extraction.
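
If downstream code needs the page and line numbers behind a citation, the identifier can be split apart. The sketch below assumes every `lineId` follows the `p<page>_l<line>` pattern of the `p1_l5` example; only that one example is documented here, so treat the pattern as an assumption. `parseLineId` and `LineRef` are hypothetical names.

```typescript
// Hypothetical helper: split a lineId such as "p1_l5" into its parts.
// Assumption: all lineIds match p<page>_l<line>, as in the documented example.
interface LineRef {
  page: number;
  line: number;
}

function parseLineId(lineId: string): LineRef | null {
  const match = /^p(\d+)_l(\d+)$/.exec(lineId);
  if (!match) return null; // not in the assumed format
  return { page: Number(match[1]), line: Number(match[2]) };
}
```

Returning `null` for unrecognized identifiers keeps the helper safe to use if the real format turns out to be broader than assumed.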

Extended Reasoning

For VLM providers that support it, enable extended reasoning:
parse({
  provider: vlmProvider,
  format: 'markdown',
  reasoning: {
    enabled: true,
    effort: 'medium'  // 'low' | 'medium' | 'high'
  }
})

Custom Instructions

Use additionalInstructions to pass free-form guidance to the parser:
parse({
  provider: vlmProvider,
  format: 'markdown',
  additionalInstructions: 'Pay special attention to preserving table structures and footnotes.'
})

Output: DocumentIR

The parse node outputs a DocumentIR object:
interface DocumentIR {
  pages: Array<{
    lines: Array<{
      text: string;
      bbox: { x: number; y: number; w: number; h: number };
      lineId?: string;  // For citations
    }>;
    width: number;
    height: number;
    markdown?: string;  // If format: 'markdown'
    html?: string;      // If format: 'html'
  }>;
  extras?: {
    providerType: 'ocr' | 'vlm';
  };
}
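
As a sketch of how this structure might be consumed, the helpers below flatten a DocumentIR into plain text and look up a line by its `lineId`. The interface is copied from the definition above so the block stands alone; `documentToText` and `findLine` are hypothetical helpers, not Doclo exports.

```typescript
// Local copy of the DocumentIR shape shown above, so this sketch is self-contained.
interface DocumentIR {
  pages: Array<{
    lines: Array<{
      text: string;
      bbox: { x: number; y: number; w: number; h: number };
      lineId?: string;
    }>;
    width: number;
    height: number;
    markdown?: string;
    html?: string;
  }>;
  extras?: { providerType: 'ocr' | 'vlm' };
}

type Line = DocumentIR['pages'][number]['lines'][number];

// Hypothetical helper: join lines with newlines and pages with a blank line.
function documentToText(doc: DocumentIR): string {
  return doc.pages
    .map((page) => page.lines.map((line) => line.text).join('\n'))
    .join('\n\n');
}

// Hypothetical helper: find the first line carrying the given lineId.
function findLine(
  doc: DocumentIR,
  lineId: string
): { pageIndex: number; line: Line } | undefined {
  for (let pageIndex = 0; pageIndex < doc.pages.length; pageIndex++) {
    for (const line of doc.pages[pageIndex].lines) {
      if (line.lineId === lineId) return { pageIndex, line };
    }
  }
  return undefined; // no line with that id (or citations were not enabled)
}
```

`findLine` returns a zero-based `pageIndex` alongside the line, so a caller can use the line's `bbox` together with that page's `width` and `height` to highlight the cited region.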

Next Steps

extract: Extract structured data from parsed documents

OCR Providers: Configure OCR providers