Nodes Overview

Nodes are stateless building blocks that transform document data. Each node performs a specific operation and can be chained together in flows.

Available Nodes

Node	Description	Provider
parse	Convert documents to DocumentIR using OCR or VLM	OCR or VLM
extract	Extract structured data matching a JSON Schema	VLM or LLM
split	Identify document boundaries in multi-doc PDFs	VLM
categorize	Classify documents into predefined categories	VLM
chunk	Split documents into chunks for RAG/embeddings	None
combine	Merge results from parallel processing	None
output	Control which data is returned from a flow	None
trigger	Execute a child flow from within a parent flow	None

Import

All nodes are exported from @doclo/flows:

import {
  createFlow,
  parse,
  extract,
  split,
  categorize,
  chunk,
  combine,
  output,
  trigger
} from '@doclo/flows';

Basic Usage

Nodes are functions that return configured node objects. Use them with createFlow().step():

const flow = createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .step('extract', extract({ provider: llmProvider, schema }))
  .build();

const result = await flow.run({ base64: documentData });
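Conceptually, step() registers a named node and run() executes the nodes in order. Each output is passed to the next step and recorded under the step name, as result.artifacts does further below. The following is a self-contained toy model of that behavior, not the @doclo/flows implementation. MiniFlow, NodeFn, fakeParse, and fakeExtract are hypothetical names, and nodes are synchronous here for brevity.

```typescript
// Hypothetical node model: a stateless function from input to output
// (synchronous for brevity; real provider calls are asynchronous).
type NodeFn = (input: unknown) => unknown;

interface RunResult {
  output: unknown;
  artifacts: Record<string, unknown>; // every step's output, keyed by step name
}

class MiniFlow {
  private readonly steps: Array<[string, NodeFn]> = [];

  // Register a named step; returning `this` allows chained .step() calls
  step(name: string, node: NodeFn): this {
    this.steps.push([name, node]);
    return this;
  }

  // Execute steps in order, feeding each output into the next step
  run(input: unknown): RunResult {
    const artifacts: Record<string, unknown> = {};
    let current = input;
    for (const [name, node] of this.steps) {
      current = node(current);
      artifacts[name] = current;
    }
    return { output: current, artifacts };
  }
}

// Hypothetical stub nodes standing in for parse and extract
const fakeParse: NodeFn = (input) => ({ text: (input as { base64: string }).base64.toUpperCase() });
const fakeExtract: NodeFn = (ir) => ({ length: (ir as { text: string }).text.length });

const demo = new MiniFlow().step("parse", fakeParse).step("extract", fakeExtract).run({ base64: "abc" });
console.log(demo.output); // { length: 3 }
```

Because the nodes hold no state, the same node object can safely appear in several flows.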

Node Types

Provider Nodes

Nodes that require an AI provider to process documents:

parse - Requires OCR or VLM provider
extract - Requires VLM (for images/PDFs) or LLM (for text)
split - Requires VLM provider
categorize - Requires VLM provider

Utility Nodes

Nodes that transform data without calling external providers:

chunk - Splits DocumentIR into smaller pieces
combine - Merges results from parallel operations
output - Selects and transforms final output
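This page does not say how chunk decides where to split. A common strategy for RAG is fixed-size windows with a small overlap, so text cut at a boundary also appears at the start of the next chunk. The sketch below applies that strategy to plain text. chunkText, size, and overlap are hypothetical names, not the chunk node's options.

```typescript
// Hypothetical fixed-size chunker with overlap (a common RAG strategy).
// Not the chunk node's actual algorithm, which this page does not specify.
function chunkText(text: string, size: number, overlap: number): string[] {
  if (size <= 0) throw new Error("size must be positive");
  if (overlap < 0 || overlap >= size) throw new Error("overlap must be in [0, size)");
  const chunks: string[] = [];
  const stride = size - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += stride) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window reached the end
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```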

Common Patterns

Direct VLM Extraction

Fastest path for simple documents:

const flow = createFlow()
  .step('extract', extract({
    provider: vlmProvider,
    schema: invoiceSchema
  }))
  .build();

OCR → LLM Extraction

Most accurate path for text-heavy documents:

const flow = createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .step('extract', extract({
    provider: llmProvider,
    schema: invoiceSchema
  }))
  .build();

Split → Process Each

Handle multi-document PDFs:

const flow = createFlow()
  .step('split', split({
    provider: vlmProvider,
    schemas: { invoice: invoiceSchema, receipt: receiptSchema }
  }))
  .forEach('process', (doc) =>
    createFlow()
      .step('extract', extract({
        provider: vlmProvider,
        schema: doc.schema
      }))
  )
  .step('combine', combine())
  .build();
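In the flow above, split yields one segment per detected document, forEach processes each segment, and combine gathers the results. The fan-out/fan-in idea behind that can be sketched with Promise.all. Segment, Extracted, extractSegment, and processAndCombine are hypothetical, and the real combine node's output shape is not specified on this page.

```typescript
// Hypothetical shapes for one split segment and its extraction result
interface Segment { type: string; pages: number[] }
interface Extracted extends Segment { data: Record<string, unknown> }

// Stand-in for the per-document extract step (a real one would call a provider)
async function extractSegment(seg: Segment): Promise<Extracted> {
  return { ...seg, data: { pageCount: seg.pages.length } };
}

// Fan out: start every extraction at once. Fan in: Promise.all resolves
// once all have finished, with results in the same order as the input.
async function processAndCombine(segments: Segment[]): Promise<Extracted[]> {
  return Promise.all(segments.map(extractSegment));
}

processAndCombine([
  { type: "invoice", pages: [1, 2] },
  { type: "receipt", pages: [3] },
]).then((results) => console.log(results.map((r) => r.type))); // [ 'invoice', 'receipt' ]
```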

Categorize → Route

Route each document to a category-specific schema:

const flow = createFlow()
  .step('categorize', categorize({
    provider: vlmProvider,
    categories: ['invoice', 'receipt', 'contract']
  }))
  .conditional('extract', (data) => {
    const schemas = {
      invoice: invoiceSchema,
      receipt: receiptSchema,
      contract: contractSchema
    };
    return extract({
      provider: vlmProvider,
      schema: schemas[data.category] || genericSchema
    });
  })
  .build();
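The conditional step above looks up a schema by category and falls back to genericSchema. A standalone version of that lookup is sketched below with hypothetical placeholder schemas. It adds an own-property check, a small hardening over `schemas[data.category] || genericSchema`, which would return the inherited Object.prototype.toString function for a category named "toString".

```typescript
// Hypothetical minimal schema type for illustration
type JsonSchema = { title: string; type: "object" };

const invoiceSchema: JsonSchema = { title: "Invoice", type: "object" };
const receiptSchema: JsonSchema = { title: "Receipt", type: "object" };
const contractSchema: JsonSchema = { title: "Contract", type: "object" };
const genericSchema: JsonSchema = { title: "Generic", type: "object" };

const schemasByCategory: Record<string, JsonSchema> = {
  invoice: invoiceSchema,
  receipt: receiptSchema,
  contract: contractSchema,
};

// Return the schema for a category, falling back to genericSchema.
// hasOwnProperty ignores inherited keys such as "toString".
function schemaFor(category: string): JsonSchema {
  return Object.prototype.hasOwnProperty.call(schemasByCategory, category)
    ? schemasByCategory[category]
    : genericSchema;
}

console.log(schemaFor("receipt").title); // Receipt
console.log(schemaFor("purchase-order").title); // Generic
```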

Consensus Support

Most provider nodes support consensus voting for improved accuracy:

const flow = createFlow()
  .step('extract', extract({
    provider: vlmProvider,
    schema: invoiceSchema,
    consensus: {
      runs: 3,
      strategy: 'majority'
    }
  }))
  .build();

See Consensus Voting for details.
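This page does not document how the 'majority' strategy compares runs. As a rough sketch, assuming whole-output voting, you run the node `runs` times, compare the outputs after serializing them with sorted keys, and keep the most common one. canonical and majorityVote are hypothetical helpers, and a field-by-field vote is another plausible design.

```typescript
// Serialize with object keys sorted, so outputs that differ only in key order match.
function canonical(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonical).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical whole-output majority vote; ties go to the earliest run.
function majorityVote<T>(runs: T[]): { winner: T; votes: number } {
  if (runs.length === 0) throw new Error("no runs to vote on");
  const keys = runs.map(canonical);
  let bestIndex = 0;
  let bestVotes = 0;
  for (let i = 0; i < keys.length; i++) {
    // Count runs whose canonical output equals run i's
    const votes = keys.filter((k) => k === keys[i]).length;
    if (votes > bestVotes) {
      bestVotes = votes;
      bestIndex = i;
    }
  }
  return { winner: runs[bestIndex], votes: bestVotes };
}

const runs = [
  { total: 120, currency: "USD" },
  { currency: "USD", total: 120 }, // same answer, different key order
  { total: 125, currency: "USD" },
];
console.log(majorityVote(runs).votes); // 2
```

The quadratic counting loop is fine for the handful of runs consensus typically uses.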

Node Execution

Each node execution produces:

Output: The transformed data
Metrics: Duration, cost, token usage

const result = await flow.run({ base64: pdf });

// Access per-step metrics
result.metrics.forEach(m => {
  console.log(`${m.step}: ${m.ms}ms, $${m.costUSD}`);
});

// Access intermediate outputs
const parseOutput = result.artifacts['parse'];
const extractOutput = result.artifacts['extract'];
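Because each metrics entry carries step, ms, and costUSD, per-run totals reduce to a simple sum. StepMetric and summarize are hypothetical names. The interface covers only the fields used above, and the sample values are made up.

```typescript
// Assumed shape of one result.metrics entry, limited to the fields used above
interface StepMetric { step: string; ms: number; costUSD: number }

// Sum duration and cost across all steps
function summarize(metrics: StepMetric[]): { totalMs: number; totalCostUSD: number } {
  return metrics.reduce(
    (acc, m) => ({ totalMs: acc.totalMs + m.ms, totalCostUSD: acc.totalCostUSD + m.costUSD }),
    { totalMs: 0, totalCostUSD: 0 },
  );
}

// Illustrative values only
const sample: StepMetric[] = [
  { step: "parse", ms: 1200, costUSD: 0.01 },
  { step: "extract", ms: 800, costUSD: 0.02 },
];
console.log(summarize(sample).totalMs); // 2000
```

Summed ms is total step time. If steps run in parallel, it can exceed the wall-clock duration of the run.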

Trigger Node

The trigger node executes a child flow from within a parent flow, enabling flow composition and reusable sub-flows.

Basic Usage

import { createFlow, parse, extract, trigger } from '@doclo/flows';

const flow = createFlow()
  .step('process', trigger({
    flow: () => createFlow()
      .step('parse', parse({ provider: ocrProvider }))
      .step('extract', extract({ provider: vlmProvider, schema }))
  }))
  .build();

Configuration Options

Option	Type	Default	Description
`flow`	`FlowBuilder | Flow`	Required	Flow to execute
`mapInput`	`function`	-	Transform the input before it is passed to the child flow
`providers`	`object`	-	Provider overrides for the child flow
`mergeMetrics`	`boolean`	`true`	Merge the child flow's metrics into the parent's
`timeout`	`number`	-	Timeout in milliseconds
`flowId`	`string`	`'anonymous-flow'`	ID for debugging
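The timeout option bounds how long the child flow may run. A common way to enforce such a limit in TypeScript is to race the work against a timer, as sketched below with a hypothetical withTimeout helper. This is an assumption about the mechanism, not the trigger node's code. Promise.race stops waiting but does not cancel the underlying work.

```typescript
// Hypothetical helper: reject if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label = "child flow"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Clear the timer whichever promise wins, so it does not keep the process alive
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

withTimeout(Promise.resolve("ok"), 1000).then((v) => console.log(v)); // ok
```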

Input Transformation

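mapInput receives the step input and the flow context, so it can combine the document with earlier artifacts (here, the categorize result) before the child flow runs: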

trigger({
  flow: processingFlow,
  mapInput: (input, context) => ({
    document: input,
    category: context.artifacts.categorize.category
  })
})

Provider Overrides

Override providers for the child flow:

trigger({
  flow: processingFlowBuilder,
  providers: {
    vlm: alternateVlmProvider
  }
})
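This page does not say how overrides combine with the parent flow's providers. One plausible reading, assumed here, is a shallow merge: keys in providers replace the parent's entry and all other providers are inherited. Provider, ProviderMap, and resolveProviders are hypothetical names used only for this sketch.

```typescript
// Hypothetical provider handle; real providers carry configuration and clients
interface Provider { name: string }
type ProviderMap = Record<string, Provider>;

// Assumed semantics: shallow merge, overrides win, the parent map is not mutated
function resolveProviders(parent: ProviderMap, overrides: ProviderMap = {}): ProviderMap {
  return { ...parent, ...overrides };
}

const parentProviders: ProviderMap = { ocr: { name: "ocr-default" }, vlm: { name: "vlm-default" } };
const childProviders = resolveProviders(parentProviders, { vlm: { name: "vlm-alternate" } });
console.log(childProviders.vlm.name, childProviders.ocr.name); // vlm-alternate ocr-default
```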

Conditional Flow Routing

Return a trigger from a conditional step to route each document to a different child flow:

.conditional('route', (data, context) => {
  if (context.artifacts.categorize.category === 'invoice') {
    return trigger({ flow: invoiceFlowBuilder });
  }
  return trigger({ flow: receiptFlowBuilder });
})

Circular Dependency Detection

The trigger node automatically detects circular dependencies and enforces a maximum depth (default: 10 levels).
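This page does not describe the detection algorithm. A common approach, assumed here, is to track the chain of flow IDs currently executing. A flow that reappears on its own chain is a cycle, and a chain deeper than the limit is rejected. FlowGraph and checkTriggers are hypothetical. The sketch walks a static map of which flow triggers which, and at runtime the same chain could be passed along as each child flow starts.

```typescript
// Hypothetical map: flow ID -> IDs of the child flows it triggers
type FlowGraph = Record<string, string[]>;

// Depth-first walk carrying the chain of flows above the current one.
// Throws on a cycle, or when nesting exceeds maxDepth levels (root counts as level 1).
function checkTriggers(graph: FlowGraph, flowId: string, maxDepth = 10, chain: string[] = []): void {
  if (chain.includes(flowId)) {
    throw new Error(`Circular trigger: ${[...chain, flowId].join(" -> ")}`);
  }
  if (chain.length >= maxDepth) {
    throw new Error(`Maximum trigger depth ${maxDepth} exceeded at ${flowId}`);
  }
  for (const child of graph[flowId] ?? []) {
    checkTriggers(graph, child, maxDepth, [...chain, flowId]);
  }
}

const graph: FlowGraph = { main: ["invoice"], invoice: ["enrich"], enrich: ["main"] };
try {
  checkTriggers(graph, "main");
} catch (e) {
  console.log((e as Error).message); // Circular trigger: main -> invoice -> enrich -> main
}
```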

Next Steps

parse - Document parsing with OCR/VLM
extract - Structured data extraction
Flows - Learn about flow orchestration
Providers - Configure AI providers
