Nodes are stateless, reusable building blocks for document operations. They transform inputs to outputs and can be chained together in flows.

What are Nodes?

Nodes are deterministic operations with consistent behavior:
  • Stateless: No side effects, same input always produces same output
  • Configurable: Accept configuration objects to customize behavior
  • Composable: Chain together in flows via .step() and .conditional()
  • Type-safe: Inputs/outputs are validated at compile time
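The properties above can be sketched in plain TypeScript. This is a conceptual model only, not Doclo's internal types: `Node`, `normalize`, `chain`, and `wordCount` are hypothetical names invented for illustration.

```typescript
// A node is a pure function from input to output; a factory applies configuration.
type Node<I, O> = (input: I) => O;

// Configurable: a factory takes a config object and returns a node.
function normalize(config: { lowercase: boolean }): Node<string, string> {
  return (input) => {
    const trimmed = input.trim();
    return config.lowercase ? trimmed.toLowerCase() : trimmed;
  };
}

// Composable: two nodes chain into one, with types checked at the seam.
function chain<A, B, C>(first: Node<A, B>, second: Node<B, C>): Node<A, C> {
  return (input) => second(first(input));
}

const wordCount: Node<string, number> = (text) => text.split(/\s+/).length;
const pipeline = chain(normalize({ lowercase: true }), wordCount);

// Stateless: the same input always produces the same output.
pipeline('  Hello World  '); // identical result on every call
```

In the real library, `.step()` plays the role of `chain` and each node factory (`parse`, `extract`, ...) plays the role of `normalize`.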

Core Nodes

Doclo provides 8 core nodes for document operations:
  • parse: Convert documents to DocumentIR using OCR or VLM
  • extract: Extract structured data matching a JSON Schema
  • split: Identify document boundaries in multi-doc PDFs
  • categorize: Classify documents into predefined categories
  • chunk: Split documents into chunks for RAG/embeddings
  • combine: Merge results from parallel processing
  • trigger: Execute child flows for composition
  • output: Explicit output selection and transformation
For detailed documentation on each node, see the Nodes Reference.

Node Inputs and Outputs

  • parse: Input: FlowInput (PDF, image, DOCX, etc.) | Output: DocumentIR
  • extract: Input: DocumentIR, FlowInput, or ChunkOutput | Output: JSON
  • split: Input: FlowInput | Output: SplitDocument[]
  • categorize: Input: DocumentIR or FlowInput | Output: category result (its category field is read when routing, as in Categorize → Route below)
  • chunk: Input: DocumentIR or DocumentIR[] | Output: ChunkOutput
  • combine: Input: T[] | Output: T or T[]
  • trigger: Input: Any | Output: Child flow output
  • output: Input: Any | Output: Any
FlowInput accepts any supported format via { base64: '...' } or { url: '...' }. See Documents for the full format list.
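For illustration, the two accepted input shapes can be modeled as a union type. This is a sketch only; `FlowInputSketch` and `isUrlInput` are hypothetical names, and the real FlowInput definition lives in the library.

```typescript
// Sketch of the two input shapes the docs describe.
type FlowInputSketch = { base64: string } | { url: string };

// Inline document content as a base64 data URL.
const fromDataUrl: FlowInputSketch = { base64: 'data:application/pdf;base64,JVBERi0xLjQ=' };

// Or point at a hosted document.
const fromUrl: FlowInputSketch = { url: 'https://example.com/invoice.pdf' };

// A tiny type guard to tell the two shapes apart.
function isUrlInput(input: FlowInputSketch): input is { url: string } {
  return 'url' in input;
}
```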

Using Nodes

Import nodes from @doclo/flows:
import { createFlow, parse, extract, split, categorize, chunk, combine, trigger } from '@doclo/flows';

// Or import output separately from @doclo/nodes
import { output } from '@doclo/nodes';

Basic Usage

const flow = createFlow()
  .step('extract', extract({
    provider: vlmProvider,
    schema: invoiceSchema
  }))
  .build();

const result = await flow.run({ base64: pdfDataUrl });

Chaining Nodes

const flow = createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .step('extract', extract({ provider: llmProvider, schema, inputMode: 'ir' }))
  .build();

Common Patterns

Parse → Extract

Most accurate path for text-heavy documents. Use inputMode: 'ir' with LLM providers since they process text from DocumentIR:
createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .step('extract', extract({ provider: llmProvider, schema, inputMode: 'ir' }))
  .build();

Direct VLM Extract

Fastest path for simple documents:
createFlow()
  .step('extract', extract({ 
    provider: vlmProvider, 
    schema: invoiceSchema
  }))
  .build();

Split → Process Each

Handle multi-document PDFs:
createFlow()
  .step('split', split({ 
    provider: vlmProvider,
    schemas: { invoice: invoiceSchema, receipt: receiptSchema }
  }))
  .forEach('process', (doc) =>
    createFlow()
      .step('extract', extract({ provider: vlmProvider, schema: invoiceSchema }))
  )
  .step('combine', combine())
  .build();
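Conceptually, combine gathers the per-document results produced by .forEach() back into a single output (T[] in, T or T[] out). A plain-TypeScript sketch of that merge step, where `Extracted` and `mergeResults` are hypothetical names and not the combine node itself:

```typescript
// Each parallel branch yields one result of type T.
type Extracted = { vendor: string; total: number };

// Sketch: fold an array of per-document results into one combined value,
// here keeping the individual documents and summing their totals.
function mergeResults(results: Extracted[]): { documents: Extracted[]; grandTotal: number } {
  return {
    documents: results,
    grandTotal: results.reduce((sum, r) => sum + r.total, 0),
  };
}

const merged = mergeResults([
  { vendor: 'Acme', total: 120 },
  { vendor: 'Globex', total: 80 },
]);
// merged.grandTotal === 200
```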

Categorize → Route

Route documents to different extractors:
createFlow()
  .step('categorize', categorize({
    provider: vlmProvider,
    categories: ['invoice', 'receipt', 'contract']
  }))
  .conditional('extract', (data) => {
    switch (data.category) {
      case 'invoice': return extract({ provider: vlmProvider, schema: invoiceSchema });
      case 'receipt': return extract({ provider: vlmProvider, schema: receiptSchema });
      default: return extract({ provider: vlmProvider, schema: genericSchema });
    }
  })
  .build();

Next Steps