Split Node

The split node uses a vision-language model (VLM) to detect document boundaries in a multi-document PDF and classify each resulting document by type.
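
Conceptually, boundary detection turns per-page judgments into contiguous page ranges. The sketch below illustrates only that idea. The `PageLabel` shape and the `groupPages` function are hypothetical and are not part of the `@doclo/flows` API, and this page does not document how the split node works internally.

```typescript
// Hypothetical per-page judgment, e.g. what a VLM could report for each page.
interface PageLabel {
  page: number;               // page number in the original PDF
  type: string;               // detected document type for this page
  startsNewDocument: boolean; // true if this page begins a new document
}

// One detected document: its type and the pages it spans.
interface Segment {
  type: string;
  pages: number[];
}

// Group consecutive pages into documents. A new segment starts at the first page
// or whenever a page is flagged as a boundary. Each segment takes its type from
// its first page.
function groupPages(labels: PageLabel[]): Segment[] {
  const segments: Segment[] = [];
  for (const label of labels) {
    const current = segments[segments.length - 1];
    if (current === undefined || label.startsNewDocument) {
      segments.push({ type: label.type, pages: [label.page] });
    } else {
      current.pages.push(label.page);
    }
  }
  return segments;
}

const segments = groupPages([
  { page: 1, type: 'invoice', startsNewDocument: true },
  { page: 2, type: 'invoice', startsNewDocument: false },
  { page: 3, type: 'receipt', startsNewDocument: true },
]);
console.log(JSON.stringify(segments));
// [{"type":"invoice","pages":[1,2]},{"type":"receipt","pages":[3]}]
```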

Basic Usage

import { createFlow, split, extract, combine } from '@doclo/flows';
import { createVLMProvider } from '@doclo/providers-llm';

const vlmProvider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

// invoiceSchema and receiptSchema are your own schema objects, defined elsewhere
const flow = createFlow()
  .step('split', split({
    provider: vlmProvider,
    schemas: {
      invoice: invoiceSchema,
      receipt: receiptSchema
    }
  }))
  .forEach('process', (doc) =>
    createFlow()
      .step('extract', extract({
        provider: vlmProvider,
        schema: doc.schema
      }))
  )
  .step('combine', combine())
  .build();

// multiDocPdf: base64-encoded PDF that contains several documents
const result = await flow.run({ base64: multiDocPdf });
// result.output is an array of extracted documents

Configuration Options

split({
  provider: vlmProvider,           // Required: VLM provider
  schemas: {                       // Required unless schemaRef is set: document type schemas
    invoice: invoiceSchema,
    receipt: receiptSchema,
    contract: contractSchema
  },
  includeOther: true,              // Include uncategorized documents as type 'other'
  consensus: { runs: 3 },          // Multi-run voting for accuracy
  reasoning: { enabled: true }     // Extended reasoning
})

Options Reference

Option	Type	Default	Description
`provider`	`VLMProvider`	Required	VLM provider for document analysis
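`schemas`	`Record<string, object>`	Required unless `schemaRef` is set	Map of document type names to their schemas
`schemaRef`	`string`	-	Schema registry reference (e.g. `'document-types@1.0.0'`), used instead of `schemas`
`includeOther`	`boolean`	`true`	Include documents that don’t match any type, labeled `'other'`
`consensus`	`ConsensusConfig`	-	Multi-run voting, e.g. `{ runs: 3 }`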
`reasoning`	`object`	-	Extended reasoning options
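
The consensus option runs the split several times and votes on the result. This page does not document how the SDK compares runs. The sketch below assumes a simple majority vote in which two runs agree only if they produce exactly the same segmentation. `Segment`, `keyOf`, and `majorityVote` are hypothetical names used for illustration.

```typescript
// One detected document in a run (hypothetical shape).
interface Segment {
  type: string;
  pages: number[];
}

// Serialize a run's segmentation so identical answers compare equal.
function keyOf(run: Segment[]): string {
  return run.map((s) => `${s.type}:${s.pages.join(',')}`).join('|');
}

// Return the segmentation produced by the most runs. On a tie, the answer
// that reached the highest count first (in run order) is kept.
function majorityVote(runs: Segment[][]): Segment[] {
  if (runs.length === 0) throw new Error('at least one run is required');
  const counts = new Map<string, { count: number; run: Segment[] }>();
  let best: { count: number; run: Segment[] } | undefined;
  for (const run of runs) {
    const key = keyOf(run);
    const entry = counts.get(key) ?? { count: 0, run };
    entry.count += 1;
    counts.set(key, entry);
    if (best === undefined || entry.count > best.count) best = entry;
  }
  return best!.run;
}

const invoiceThenReceipt: Segment[] = [
  { type: 'invoice', pages: [1, 2] },
  { type: 'receipt', pages: [3] },
];
const oneInvoice: Segment[] = [{ type: 'invoice', pages: [1, 2, 3] }];

const winner = majorityVote([invoiceThenReceipt, oneInvoice, invoiceThenReceipt]);
console.log(keyOf(winner)); // invoice:1,2|receipt:3
```

Exact-match voting is the simplest rule to state. A real implementation might instead vote per page or per boundary, which is more tolerant when runs disagree only slightly.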

Output: SplitDocument[]

The split node outputs an array of SplitDocument objects:

interface SplitDocument {
  type: string;           // Document type ('invoice', 'receipt', 'other')
  schema: object;         // Schema for this document type
  pages: number[];        // Page numbers in original PDF
  input: FlowInput;       // Input for processing this document
}
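
Each entry reports the pages it covers, so a quick sanity check on the split result can catch two problems: pages that no document claims (for example, pages dropped when includeOther is false) and pages claimed twice. The sketch below assumes 1-based page numbers and a total page count known from elsewhere. `checkCoverage` and the trimmed `SplitDocumentPages` type are hypothetical. The type keeps only the `type` and `pages` fields of `SplitDocument`.

```typescript
// Minimal stand-in for SplitDocument: only the fields this check needs.
interface SplitDocumentPages {
  type: string;
  pages: number[];
}

// Report pages (assumed 1-based) that no document claims,
// and pages claimed by more than one document.
function checkCoverage(
  docs: SplitDocumentPages[],
  pageCount: number,
): { missing: number[]; duplicated: number[] } {
  const seen = new Map<number, number>();
  for (const doc of docs) {
    for (const page of doc.pages) seen.set(page, (seen.get(page) ?? 0) + 1);
  }
  const missing: number[] = [];
  const duplicated: number[] = [];
  for (let page = 1; page <= pageCount; page++) {
    const count = seen.get(page) ?? 0;
    if (count === 0) missing.push(page);
    if (count > 1) duplicated.push(page);
  }
  return { missing, duplicated };
}

const report = checkCoverage(
  [
    { type: 'invoice', pages: [1, 2] },
    { type: 'receipt', pages: [2, 3] },
  ],
  5,
);
console.log(report.missing, report.duplicated); // [ 4, 5 ] [ 2 ]
```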

Processing Split Documents

With forEach

Process each split document with its corresponding schema:

const flow = createFlow()
  .step('split', split({
    provider: vlmProvider,
    schemas: { invoice: invoiceSchema, receipt: receiptSchema }
  }))
  .forEach('process', (doc) =>
    createFlow()
      .step('extract', extract({
        provider: vlmProvider,
        schema: doc.schema  // Uses the matched schema
      }))
  )
  .step('combine', combine())
  .build();

With Conditional Routing

Route each document to a type-specific extraction step:

const flow = createFlow()
  .step('split', split({
    provider: vlmProvider,
    schemas: { invoice: invoiceSchema, receipt: receiptSchema }
  }))
  .forEach('process', (doc) =>
    createFlow()
      .conditional('extract', () => {
        // doc is the SplitDocument passed to forEach
        if (doc.type === 'invoice') {
          return extract({ provider: vlmProvider, schema: invoiceSchema });
        } else if (doc.type === 'receipt') {
          return extract({ provider: vlmProvider, schema: receiptSchema });
        }
        // Fallback for 'other' and any unlisted type (genericSchema defined elsewhere)
        return extract({ provider: vlmProvider, schema: genericSchema });
      })
  )
  .step('combine', combine())
  .build();
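
When every type maps directly to one schema, a lookup table with a fallback can replace the if/else chain. The sketch below shows only the selection logic in plain TypeScript. The schema objects are hypothetical placeholders, and `schemaFor` is not part of the SDK. A `Map` is used instead of an object literal so that keys like `'toString'` cannot resolve to inherited prototype properties.

```typescript
// Hypothetical placeholder schemas; in a real flow these are your own schema objects.
const invoiceSchema = { type: 'object', properties: { invoiceNumber: { type: 'string' } } };
const receiptSchema = { type: 'object', properties: { total: { type: 'number' } } };
const genericSchema = { type: 'object', properties: {} };

const schemaByType = new Map<string, object>([
  ['invoice', invoiceSchema],
  ['receipt', receiptSchema],
]);

// Pick the schema for a detected type, falling back to the generic schema
// for 'other' or any type without an entry.
function schemaFor(type: string): object {
  return schemaByType.get(type) ?? genericSchema;
}

console.log(schemaFor('receipt') === receiptSchema); // true
console.log(schemaFor('other') === genericSchema);   // true
```

Inside the sub-flow, the same idea would read `extract({ provider: vlmProvider, schema: schemaFor(doc.type) })`. The forEach example's `doc.schema` avoids the lookup entirely when every type has its own schema.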

Schema Registry

Reference schemas stored in a schema registry instead of passing them inline:

split({
  provider: vlmProvider,
  schemaRef: 'document-types@1.0.0'
})

The registry entry should hold a schemas map, keyed by document type, inside its schema field:

// Registry entry structure
const registryEntry = {
  id: 'document-types',
  version: '1.0.0',
  schema: {
    schemas: {
      invoice: { /* schema */ },
      receipt: { /* schema */ }
    }
  }
};
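
In the example, the schemaRef string combines an id and a version as `id@version`. The sketch below assumes that format holds in general. It shows how such a reference could be resolved against an in-memory list of entries shaped like the one above. `parseSchemaRef`, `resolveSchemas`, and the array registry are hypothetical illustrations, not the SDK's registry API.

```typescript
// Shape of a registry entry, following the structure shown above.
interface RegistryEntry {
  id: string;
  version: string;
  schema: { schemas: Record<string, object> };
}

// Split 'document-types@1.0.0' into id and version. The last '@' is used
// in case an id ever contains one.
function parseSchemaRef(ref: string): { id: string; version: string } {
  const at = ref.lastIndexOf('@');
  if (at <= 0 || at === ref.length - 1) {
    throw new Error(`Invalid schemaRef "${ref}": expected "id@version"`);
  }
  return { id: ref.slice(0, at), version: ref.slice(at + 1) };
}

// Find the matching entry and return its map of document type -> schema.
function resolveSchemas(registry: RegistryEntry[], ref: string): Record<string, object> {
  const { id, version } = parseSchemaRef(ref);
  const entry = registry.find((e) => e.id === id && e.version === version);
  if (entry === undefined) {
    throw new Error(`No registry entry for ${ref}`);
  }
  return entry.schema.schemas;
}

// Hypothetical in-memory registry with one entry.
const registry: RegistryEntry[] = [
  {
    id: 'document-types',
    version: '1.0.0',
    schema: {
      schemas: {
        invoice: { type: 'object' },
        receipt: { type: 'object' },
      },
    },
  },
];

console.log(Object.keys(resolveSchemas(registry, 'document-types@1.0.0')).join(', '));
// invoice, receipt
```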

Handling “Other” Documents

By default, documents that don’t match any defined type are categorized as “other”:

split({
  provider: vlmProvider,
  schemas: { invoice: invoiceSchema },
  includeOther: true  // Default: true
})

Set includeOther: false to exclude unrecognized documents:

split({
  provider: vlmProvider,
  schemas: { invoice: invoiceSchema },
  includeOther: false  // Skip unrecognized documents
})
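
You may prefer to review unrecognized pages rather than drop them. One option is to keep the default includeOther: true and separate the `'other'` entries yourself. The minimal sketch below works on plain data and assumes unmatched documents carry `type: 'other'`, as described above. `partitionOther` and `SplitResult` are hypothetical.

```typescript
// Minimal stand-in for a split result entry (hypothetical).
interface SplitResult {
  type: string;
  pages: number[];
}

// Separate recognized documents from those labeled 'other'.
function partitionOther(docs: SplitResult[]): { recognized: SplitResult[]; other: SplitResult[] } {
  const recognized: SplitResult[] = [];
  const other: SplitResult[] = [];
  for (const doc of docs) {
    (doc.type === 'other' ? other : recognized).push(doc);
  }
  return { recognized, other };
}

const parts = partitionOther([
  { type: 'invoice', pages: [1, 2] },
  { type: 'other', pages: [3] },
  { type: 'invoice', pages: [4] },
]);
console.log(parts.recognized.length, parts.other.length); // 2 1
console.log(parts.other.flatMap((d) => d.pages));         // [ 3 ]
```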

Extended Reasoning

Enable extended reasoning when document boundaries or types are hard to tell apart:

split({
  provider: vlmProvider,
  schemas: { invoice: invoiceSchema, receipt: receiptSchema },
  reasoning: {
    enabled: true,
    effort: 'high'
  }
})

Example: Insurance Document Bundle

const insuranceFlow = createFlow()
  .step('split', split({
    provider: vlmProvider,
    schemas: {
      claim_form: claimSchema,
      medical_report: medicalSchema,
      receipt: receiptSchema,
      id_document: idSchema
    }
  }))
  .forEach('extract', (doc) =>
    createFlow()
      .step('extract', extract({
        provider: vlmProvider,
        schema: doc.schema
      }))
  )
  .step('combine', combine({ strategy: 'merge' }))
  .build();

const result = await insuranceFlow.run({ base64: documentBundle });

// result.output contains all extracted documents
// [
//   { type: 'claim_form', data: { ... } },
//   { type: 'medical_report', data: { ... } },
//   { type: 'receipt', data: { ... } }
// ]
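
Because the combined output is an array tagged with type, downstream code usually looks documents up by type. The small sketch below assumes each entry has the `{ type, data }` shape shown in the comment above, with `data` left as `unknown` because its fields depend on your schemas. `indexByType` and `ExtractedDocument` are hypothetical.

```typescript
// Assumed shape of one entry in result.output.
interface ExtractedDocument {
  type: string;
  data: unknown;
}

// Group extracted documents by type; a bundle can contain several of the same type.
function indexByType(docs: ExtractedDocument[]): Map<string, ExtractedDocument[]> {
  const index = new Map<string, ExtractedDocument[]>();
  for (const doc of docs) {
    const bucket = index.get(doc.type);
    if (bucket) bucket.push(doc);
    else index.set(doc.type, [doc]);
  }
  return index;
}

// Stand-in for result.output with placeholder data.
const output: ExtractedDocument[] = [
  { type: 'claim_form', data: {} },
  { type: 'medical_report', data: {} },
  { type: 'receipt', data: {} },
  { type: 'receipt', data: {} },
];
const byType = indexByType(output);
console.log(byType.get('receipt')?.length ?? 0); // 2
console.log(byType.has('id_document'));         // false
```

A type that was never found, such as id_document here, shows up as an absent key. That is a convenient place to flag an incomplete bundle.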
