The split node uses a vision-language model (VLM) to identify document boundaries in multi-document PDFs and to categorize each document by type.
Basic Usage
import { createFlow, split, extract, combine } from '@doclo/flows';
import { createVLMProvider } from '@doclo/providers-llm';

const vlmProvider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

const flow = createFlow()
  .step('split', split({
    provider: vlmProvider,
    schemas: {
      invoice: invoiceSchema,
      receipt: receiptSchema
    }
  }))
  .forEach('process', (doc) =>
    createFlow()
      .step('extract', extract({
        provider: vlmProvider,
        schema: doc.schema
      }))
  )
  .step('combine', combine())
  .build();

const result = await flow.run({ base64: multiDocPdf });
// result.output is an array of extracted documents
Configuration Options
split({
  provider: vlmProvider,        // Required: VLM provider
  schemas: {                    // Required: Document type schemas
    invoice: invoiceSchema,
    receipt: receiptSchema,
    contract: contractSchema
  },
  includeOther: true,           // Include uncategorized documents
  consensus: { runs: 3 },       // Multi-run voting for accuracy
  reasoning: { enabled: true }  // Extended thinking
})
Options Reference
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| provider | VLMProvider | Required | VLM provider for document analysis |
| schemas | Record<string, object> | Required | Map of document types to schemas |
| schemaRef | string | - | Reference to schema registry |
| includeOther | boolean | true | Include documents that don’t match any type |
| consensus | ConsensusConfig | - | Multi-run voting |
| reasoning | object | - | Extended reasoning options |
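The consensus option is described only as multi-run voting. As a rough illustration of that idea, and not a description of how Doclo implements it, a majority vote over the type labels returned by several runs could be sketched as follows. The majorityVote helper is hypothetical and not part of the library.

```typescript
// Hypothetical helper: pick the label that most runs agreed on.
// Ties resolve to the label that reached the top count first.
function majorityVote(labels: string[]): string | undefined {
  const counts = new Map<string, number>();
  let best: string | undefined;
  let bestCount = 0;
  for (const label of labels) {
    const next = (counts.get(label) ?? 0) + 1;
    counts.set(label, next);
    if (next > bestCount) {
      best = label;
      bestCount = next;
    }
  }
  return best;
}

// Three runs, two of which classified the document as an invoice.
const winner = majorityVote(['invoice', 'receipt', 'invoice']);
console.log(winner);
```

More runs generally mean more provider calls, so a setting such as runs: 3 trades cost for stability.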
Output: SplitDocument[]
The split node outputs an array of SplitDocument objects:
interface SplitDocument {
  type: string;      // Document type ('invoice', 'receipt', 'other')
  schema: object;    // Schema for this document type
  pages: number[];   // Page numbers in original PDF
  input: FlowInput;  // Input for processing this document
}
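Because each SplitDocument carries the page numbers it was cut from, a caller can sanity-check a split result before extraction. The sketch below is one possible check, using a simplified stand-in type and a hypothetical checkPageCoverage helper. It assumes page numbers are 1-based, which the interface above does not state.

```typescript
// Simplified stand-in for SplitDocument; only the fields used here.
interface SplitPages {
  type: string;
  pages: number[];
}

// Report pages that no document claims and pages claimed more than once.
// Assumes page numbers are 1-based; adjust if the library counts from 0.
function checkPageCoverage(docs: SplitPages[], totalPages: number) {
  const seen = new Map<number, number>();
  for (const doc of docs) {
    for (const page of doc.pages) {
      seen.set(page, (seen.get(page) ?? 0) + 1);
    }
  }
  const missing: number[] = [];
  const duplicated: number[] = [];
  for (let page = 1; page <= totalPages; page++) {
    const count = seen.get(page) ?? 0;
    if (count === 0) missing.push(page);
    if (count > 1) duplicated.push(page);
  }
  return { missing, duplicated };
}

const report = checkPageCoverage(
  [
    { type: 'invoice', pages: [1, 2] },
    { type: 'receipt', pages: [2, 4] },
  ],
  4
);
console.log(report);
```

Missing pages are not necessarily an error: with includeOther set to false, pages belonging to unrecognized documents would be expected to drop out.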
Processing Split Documents
With forEach
Process each split document with its corresponding schema:
const flow = createFlow()
  .step('split', split({
    provider: vlmProvider,
    schemas: { invoice: invoiceSchema, receipt: receiptSchema }
  }))
  .forEach('process', (doc) =>
    createFlow()
      .step('extract', extract({
        provider: vlmProvider,
        schema: doc.schema  // Uses the matched schema
      }))
  )
  .step('combine', combine())
  .build();
With Conditional Routing
Route each document to a different extraction step based on its type:
const flow = createFlow()
  .step('split', split({
    provider: vlmProvider,
    schemas: { invoice: invoiceSchema, receipt: receiptSchema }
  }))
  .forEach('process', (doc) =>
    createFlow()
      .conditional('extract', (d) => {
        if (d.type === 'invoice') {
          return extract({ provider: vlmProvider, schema: invoiceSchema });
        } else if (d.type === 'receipt') {
          return extract({ provider: vlmProvider, schema: receiptSchema });
        }
        return extract({ provider: vlmProvider, schema: genericSchema });
      })
  )
  .step('combine', combine())
  .build();
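As the number of document types grows, the if/else chain in the routing callback can be replaced by a lookup with a fallback. The following is a minimal sketch of that selection logic in isolation. The pickSchema helper and the Schema placeholder type are hypothetical, and the schemas are dummy objects rather than real Doclo schemas.

```typescript
// Placeholder schema type; real schemas would be richer objects.
type Schema = Record<string, unknown>;

// Hypothetical helper: resolve a schema by document type, with a fallback.
function pickSchema(
  type: string,
  schemas: Record<string, Schema>,
  fallback: Schema
): Schema {
  // Own-property check avoids matching inherited keys such as "toString".
  const found = Object.prototype.hasOwnProperty.call(schemas, type)
    ? schemas[type]
    : undefined;
  return found ?? fallback;
}

const invoiceSchema: Schema = { title: 'invoice' };
const receiptSchema: Schema = { title: 'receipt' };
const genericSchema: Schema = { title: 'generic' };
const byType = { invoice: invoiceSchema, receipt: receiptSchema };

console.log(pickSchema('invoice', byType, genericSchema).title);
console.log(pickSchema('other', byType, genericSchema).title);
```

Inside the conditional callback, the returned schema would then be passed to a single extract call instead of one call per branch.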
Schema Registry
Use registered schemas instead of inline definitions:
split({
  provider: vlmProvider,
  schemaRef: 'document-types@1.0.0'
})
The registry entry's schema field should contain a schemas property that maps document types to schemas:
// Registry entry structure
{
  id: 'document-types',
  version: '1.0.0',
  schema: {
    schemas: {
      invoice: { /* schema */ },
      receipt: { /* schema */ }
    }
  }
}
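The schemaRef value in the example appears to combine the entry's id and version as id@version. Assuming that shape holds (it is inferred from the example, and the library may accept other forms), a reference can be split into its parts as sketched below. The parseSchemaRef helper is hypothetical.

```typescript
interface SchemaRef {
  id: string;
  version: string;
}

// Hypothetical parser for references shaped like "id@version".
// Returns null when the reference has no "@" or an empty part.
function parseSchemaRef(ref: string): SchemaRef | null {
  const at = ref.lastIndexOf('@');
  if (at <= 0 || at === ref.length - 1) return null;
  return { id: ref.slice(0, at), version: ref.slice(at + 1) };
}

const parsed = parseSchemaRef('document-types@1.0.0');
console.log(parsed);
```

A check like this can catch a malformed reference in application code before the flow runs.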
Handling “Other” Documents
By default, documents that don’t match any defined type are categorized as “other”:
split({
  provider: vlmProvider,
  schemas: { invoice: invoiceSchema },
  includeOther: true  // Default: true
})
Set includeOther: false to exclude unrecognized documents:
split({
  provider: vlmProvider,
  schemas: { invoice: invoiceSchema },
  includeOther: false  // Skip unrecognized documents
})
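When "other" documents are kept, downstream code may want to handle them separately, for example to queue them for manual review. A minimal sketch, assuming the type label is the literal string 'other' as shown in the SplitDocument interface; the partitionOther helper is hypothetical.

```typescript
// Hypothetical helper: separate recognized documents from "other" ones.
function partitionOther<T extends { type: string }>(
  docs: T[]
): { known: T[]; other: T[] } {
  const known: T[] = [];
  const other: T[] = [];
  for (const doc of docs) {
    (doc.type === 'other' ? other : known).push(doc);
  }
  return { known, other };
}

const parts = partitionOther([
  { type: 'invoice', pages: [1, 2] },
  { type: 'other', pages: [3] },
  { type: 'invoice', pages: [4] },
]);
console.log(parts.known.length, parts.other.length);
```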
Extended Reasoning
Enable extended reasoning for complex document analysis:
split({
  provider: vlmProvider,
  schemas: { invoice: invoiceSchema, receipt: receiptSchema },
  reasoning: {
    enabled: true,
    effort: 'high'
  }
})
Example: Insurance Document Bundle
const insuranceFlow = createFlow()
  .step('split', split({
    provider: vlmProvider,
    schemas: {
      claim_form: claimSchema,
      medical_report: medicalSchema,
      receipt: receiptSchema,
      id_document: idSchema
    }
  }))
  .forEach('extract', (doc) =>
    createFlow()
      .step('extract', extract({
        provider: vlmProvider,
        schema: doc.schema
      }))
  )
  .step('combine', combine({ strategy: 'merge' }))
  .build();

const result = await insuranceFlow.run({ base64: documentBundle });
// result.output contains all extracted documents
// [
//   { type: 'claim_form', data: { ... } },
//   { type: 'medical_report', data: { ... } },
//   { type: 'receipt', data: { ... } }
// ]
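A bundle may contain several documents of the same type, so it can help to group the combined output by type before further processing. The sketch below assumes each output item has the { type, data } shape shown in the comment above. The ExtractedDoc type and the groupByType helper are hypothetical, and the sample data is invented for illustration.

```typescript
// Assumed shape of one item in result.output, based on the example comment.
interface ExtractedDoc {
  type: string;
  data: Record<string, unknown>;
}

// Hypothetical helper: bucket extracted documents by their type label.
function groupByType(docs: ExtractedDoc[]): Map<string, ExtractedDoc[]> {
  const groups = new Map<string, ExtractedDoc[]>();
  for (const doc of docs) {
    const bucket = groups.get(doc.type);
    if (bucket) {
      bucket.push(doc);
    } else {
      groups.set(doc.type, [doc]);
    }
  }
  return groups;
}

// Invented sample output with two receipts in the same bundle.
const grouped = groupByType([
  { type: 'claim_form', data: { claimant: 'A. Example' } },
  { type: 'receipt', data: { total: 40 } },
  { type: 'receipt', data: { total: 15 } },
]);
console.log(grouped.get('receipt')?.length);
```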
Next Steps
combine: Merge split document results
Flows: Learn about forEach and conditional