Nodes are stateless, reusable building blocks for document operations. They transform inputs to outputs and can be chained together in flows.

What are Nodes?

Nodes are deterministic operations with consistent behavior:
  • Stateless: No side effects, same input always produces same output
  • Configurable: Accept configuration objects to customize behavior
  • Composable: Chain together in flows via .step() and .conditional()
  • Type-safe: Inputs/outputs are validated at compile time
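The properties above can be sketched in plain TypeScript. This is a conceptual model only, not Doclo's internal types: `Node`, `normalize`, `chain`, and `wordCount` are hypothetical names invented for illustration.

```typescript
// A node is a pure function from input to output; a factory applies configuration.
type Node<I, O> = (input: I) => O;

// Configurable: a factory takes a config object and returns a node.
function normalize(config: { lowercase: boolean }): Node<string, string> {
  return (input) => {
    const trimmed = input.trim();
    return config.lowercase ? trimmed.toLowerCase() : trimmed;
  };
}

// Composable: two nodes chain into one, with types checked at the seam.
function chain<A, B, C>(first: Node<A, B>, second: Node<B, C>): Node<A, C> {
  return (input) => second(first(input));
}

const wordCount: Node<string, number> = (text) => text.split(/\s+/).length;
const pipeline = chain(normalize({ lowercase: true }), wordCount);

// Stateless: the same input always produces the same output.
pipeline('  Hello World  '); // identical result on every call
```

In the real library, `.step()` plays the role of `chain` and each node factory (`parse`, `extract`, ...) plays the role of `normalize`.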

Core Nodes

Doclo provides 8 core nodes for document operations:
  • parse: Convert documents to DocumentIR using OCR or VLM
  • extract: Extract structured data matching a JSON Schema
  • split: Identify document boundaries in multi-doc PDFs
  • categorize: Classify documents into predefined categories
  • chunk: Split documents into chunks for RAG/embeddings
  • combine: Merge results from parallel processing
  • trigger: Execute child flows for composition
  • output: Explicit output selection and transformation
For detailed documentation on each node, see the Nodes Reference.

Node Inputs and Outputs

  • parse: Input: FlowInput (PDF, image, DOCX, etc.) | Output: DocumentIR
  • extract: Input: DocumentIR, FlowInput, or ChunkOutput | Output: JSON
  • split: Input: FlowInput | Output: SplitDocument[]
  • categorize: Input: DocumentIR or FlowInput | Output: category result (its category field is read when routing, as in Categorize → Route below)
  • chunk: Input: DocumentIR or DocumentIR[] | Output: ChunkOutput
  • combine: Input: T[] | Output: T or T[]
  • trigger: Input: Any | Output: Child flow output
  • output: Input: Any | Output: Any
FlowInput accepts any supported format via { base64: '...' } or { url: '...' }. See Documents for the full format list.
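For illustration, the two accepted input shapes can be modeled as a union type. This is a sketch only; `FlowInputSketch` and `isUrlInput` are hypothetical names, and the real FlowInput definition lives in the library.

```typescript
// Sketch of the two input shapes the docs describe.
type FlowInputSketch = { base64: string } | { url: string };

// Inline document content as a base64 data URL.
const fromDataUrl: FlowInputSketch = { base64: 'data:application/pdf;base64,JVBERi0xLjQ=' };

// Or point at a hosted document.
const fromUrl: FlowInputSketch = { url: 'https://example.com/invoice.pdf' };

// A tiny type guard to tell the two shapes apart.
function isUrlInput(input: FlowInputSketch): input is { url: string } {
  return 'url' in input;
}
```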

Using Nodes

Import nodes from @doclo/flows:
import { createFlow, parse, extract, split, categorize, chunk, combine, trigger } from '@doclo/flows';

// Or import output separately from @doclo/nodes
import { output } from '@doclo/nodes';

Basic Usage

const flow = createFlow()
  .step('extract', extract({
    provider: vlmProvider,
    schema: invoiceSchema
  }))
  .build();

const result = await flow.run({ base64: pdfDataUrl });

Chaining Nodes

const flow = createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .step('extract', extract({ provider: llmProvider, schema, inputMode: 'ir' }))
  .build();

Common Patterns

Parse → Extract

Most accurate path for text-heavy documents. Use inputMode: 'ir' with LLM providers since they process text from DocumentIR:
createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .step('extract', extract({ provider: llmProvider, schema, inputMode: 'ir' }))
  .build();

Direct VLM Extract

Fastest path for simple documents:
createFlow()
  .step('extract', extract({ 
    provider: vlmProvider, 
    schema: invoiceSchema
  }))
  .build();

Split → Process Each

Handle multi-document PDFs:
createFlow()
  .step('split', split({ 
    provider: vlmProvider,
    schemas: { invoice: invoiceSchema, receipt: receiptSchema }
  }))
  .forEach('process', (doc) =>
    createFlow()
      .step('extract', extract({ provider: vlmProvider, schema: invoiceSchema }))
  )
  .step('combine', combine())
  .build();
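Conceptually, combine gathers the per-document results produced by .forEach() back into a single output (T[] in, T or T[] out). A plain-TypeScript sketch of that merge step, where `Extracted` and `mergeResults` are hypothetical names and not the combine node itself:

```typescript
// Each parallel branch yields one result of type T.
type Extracted = { vendor: string; total: number };

// Sketch: fold an array of per-document results into one combined value,
// here keeping the individual documents and summing their totals.
function mergeResults(results: Extracted[]): { documents: Extracted[]; grandTotal: number } {
  return {
    documents: results,
    grandTotal: results.reduce((sum, r) => sum + r.total, 0),
  };
}

const merged = mergeResults([
  { vendor: 'Acme', total: 120 },
  { vendor: 'Globex', total: 80 },
]);
// merged.grandTotal === 200
```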

Categorize → Route

Route documents to different extractors:
createFlow()
  .step('categorize', categorize({
    provider: vlmProvider,
    categories: ['invoice', 'receipt', 'contract']
  }))
  .conditional('extract', (data) => {
    switch (data.category) {
      case 'invoice': return extract({ provider: vlmProvider, schema: invoiceSchema });
      case 'receipt': return extract({ provider: vlmProvider, schema: receiptSchema });
      default: return extract({ provider: vlmProvider, schema: genericSchema });
    }
  })
  .build();

Next Steps