The categorize node uses a vision language model (VLM) to classify each document into one of your predefined categories, which enables conditional routing in flows.
Basic Usage
import { createFlow, categorize, extract } from '@doclo/flows';
import { createVLMProvider } from '@doclo/providers-llm';

const vlmProvider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

// invoiceSchema, receiptSchema, genericSchema, and documentData are not defined on this page;
// they are assumed to exist in your own code.
const flow = createFlow()
  .step('categorize', categorize({
    provider: vlmProvider,
    categories: ['invoice', 'receipt', 'contract', 'other']
  }))
  .conditional('extract', (data) => {
    // Route on the category chosen by the categorize step
    switch (data.category) {
      case 'invoice':
        return extract({ provider: vlmProvider, schema: invoiceSchema });
      case 'receipt':
        return extract({ provider: vlmProvider, schema: receiptSchema });
      default:
        return extract({ provider: vlmProvider, schema: genericSchema });
    }
  })
  .build();

const result = await flow.run({ base64: documentData });
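The example above references invoiceSchema, receiptSchema, and genericSchema without defining them. The following is a minimal sketch of what they might look like, assuming extract accepts JSON-Schema-style objects like the inline schema in the multi-type example on this page. All field names here are hypothetical and chosen only for illustration.

```typescript
// Hypothetical schemas: the field names are illustrative and not taken from Doclo's docs.
const invoiceSchema = {
  type: 'object',
  properties: {
    invoiceNumber: { type: 'string' },
    issueDate: { type: 'string' },
    totalAmount: { type: 'number' }
  }
} as const;

const receiptSchema = {
  type: 'object',
  properties: {
    merchant: { type: 'string' },
    paidAt: { type: 'string' },
    amountPaid: { type: 'number' }
  }
} as const;

// Fallback for categories without a dedicated schema.
const genericSchema = {
  type: 'object',
  properties: {
    title: { type: 'string' },
    date: { type: 'string' },
    summary: { type: 'string' }
  }
} as const;
```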
Configuration Options
categorize({
  provider: vlmProvider,                          // Required: VLM provider
  categories: ['invoice', 'receipt', 'contract'], // Required: Category list
  additionalPrompt: '...',                        // Custom categorization instructions
  consensus: { runs: 3 },                         // Multi-run voting for accuracy
  reasoning: { enabled: true }                    // Extended thinking
})
Options Reference
| Option | Type | Description |
| --- | --- | --- |
| provider | VLMProvider | Required. VLM provider for classification |
| categories | string[] | Required. List of valid categories |
| additionalPrompt | string | Custom categorization instructions |
| consensus | ConsensusConfig | Multi-run voting configuration |
| reasoning | object | Extended reasoning options |
| promptRef | string | Reference to prompt asset |
| promptVariables | object | Variables for prompt rendering |
| additionalInstructions | string | Extra instructions for the prompt |
Output
The categorize node outputs:
{
  input: DocumentIR | FlowInput; // Original input (passed through)
  category: string;              // Matched category
}
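Because category is typed as a plain string, downstream code may benefit from narrowing it to the list you configured. The following is a minimal sketch of one way to do that; CategorizeOutput is a local stand-in for the shape shown above (with input typed loosely), and isKnownCategory and toCategory are hypothetical helpers, not part of Doclo.

```typescript
// The same list you would pass as `categories` to the categorize node.
const CATEGORIES = ['invoice', 'receipt', 'contract', 'other'] as const;
type Category = (typeof CATEGORIES)[number];

// Local stand-in for the documented output shape; `input` is left as unknown here.
interface CategorizeOutput {
  input: unknown;
  category: string;
}

// Type guard: true only when the string is one of the configured categories.
function isKnownCategory(value: string): value is Category {
  return (CATEGORIES as readonly string[]).includes(value);
}

// Narrow the output's category, falling back to 'other' for anything unexpected.
function toCategory(output: CategorizeOutput): Category {
  return isKnownCategory(output.category) ? output.category : 'other';
}
```

With the narrowed type, a switch over the category can be checked for exhaustiveness by the compiler.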
Conditional Routing
The main use case for categorize is routing documents to different extraction schemas:
const flow = createFlow()
  .step('categorize', categorize({
    provider: vlmProvider,
    categories: ['invoice', 'receipt', 'contract']
  }))
  .conditional('extract', (data) => {
    const schemas: Record<string, object> = {
      invoice: invoiceSchema,
      receipt: receiptSchema,
      contract: contractSchema
    };

    return extract({
      provider: vlmProvider,
      schema: schemas[data.category] || genericSchema
    });
  })
  .build();
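The lookup-with-fallback pattern above can be factored into a small helper. The following sketch uses a Map rather than a plain object, so that a string such as 'constructor' cannot accidentally match an inherited property if a category ever falls outside the configured list. pickSchema and the Schema type are hypothetical names introduced for illustration only.

```typescript
// Loose stand-in type for a schema object.
type Schema = Record<string, unknown>;

// Return the schema registered for `category`, or `fallback` when none is registered.
function pickSchema(
  category: string,
  schemas: ReadonlyMap<string, Schema>,
  fallback: Schema
): Schema {
  return schemas.get(category) ?? fallback;
}

// Example registry with placeholder schemas.
const exampleInvoiceSchema: Schema = { type: 'object', properties: {} };
const exampleFallbackSchema: Schema = { type: 'object', properties: {} };
const schemaRegistry = new Map<string, Schema>([['invoice', exampleInvoiceSchema]]);
```

Inside a conditional step, the helper would be called as pickSchema(data.category, schemaRegistry, exampleFallbackSchema) and its result passed as the schema option.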
Custom Instructions
Provide guidance for categorization:
categorize({
  provider: vlmProvider,
  categories: ['invoice', 'receipt', 'quote', 'purchase_order'],
  additionalInstructions: `
    - Invoices have "Invoice" in the header and show amounts due
    - Receipts show "Paid" or "Payment Received"
    - Quotes contain pricing but no payment due
    - Purchase orders have "PO Number" and request goods/services
  `
})
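When the category list grows, it can help to keep each category next to its hint so the two do not drift apart. The following is a minimal sketch that derives both the category list and the instruction text from a single object; buildInstructions and categoryHints are hypothetical names, and the output format is only one plausible way to phrase the guidance.

```typescript
// One entry per category: the key is the category name, the value is its hint.
const categoryHints: Record<string, string> = {
  invoice: 'has "Invoice" in the header and shows amounts due',
  receipt: 'shows "Paid" or "Payment Received"',
  quote: 'contains pricing but no payment due',
  purchase_order: 'has "PO Number" and requests goods/services'
};

// Render the hints as a bulleted block, one line per category.
function buildInstructions(hints: Record<string, string>): string {
  return Object.entries(hints)
    .map(([category, hint]) => `- ${category}: ${hint}`)
    .join('\n');
}

// Both values come from the same source, so they always agree.
const categoryNames = Object.keys(categoryHints);
const instructionText = buildInstructions(categoryHints);
```

The two results would then be passed as the categories and additionalInstructions options respectively.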
With Parsed Documents
Categorize works with both raw documents and parsed DocumentIR:
// Direct categorization (VLM)
const directFlow = createFlow()
  .step('categorize', categorize({
    provider: vlmProvider,
    categories: ['invoice', 'receipt']
  }))
  .build();

// Parse first, then categorize
// (`parse` and `ocrProvider` are assumed to be imported and configured elsewhere)
const parsedFlow = createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .step('categorize', categorize({
    provider: vlmProvider,
    categories: ['invoice', 'receipt']
  }))
  .build();
Consensus Voting
Improve classification accuracy with multiple runs:
categorize({
  provider: vlmProvider,
  categories: ['invoice', 'receipt', 'contract'],
  consensus: {
    runs: 3,
    strategy: 'majority'
  }
})
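To make the idea of majority voting concrete, the following sketch shows how several independent classification results could be reduced to one answer. This is an illustration of the general technique only, not Doclo's implementation; majorityVote is a hypothetical helper, and how the library breaks ties or handles a missing majority is not documented on this page.

```typescript
// Return the category that appears in more than half of the votes,
// or undefined when no category reaches a strict majority.
function majorityVote(votes: string[]): string | undefined {
  const counts = new Map<string, number>();
  let winner: string | undefined;
  let best = 0;

  for (const vote of votes) {
    const count = (counts.get(vote) ?? 0) + 1;
    counts.set(vote, count);
    // Track the most frequent category seen so far.
    if (count > best) {
      best = count;
      winner = vote;
    }
  }

  return best * 2 > votes.length ? winner : undefined;
}
```

With three runs, two agreeing results are enough to decide the category, while three different results leave it undecided in this sketch.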
Extended Reasoning
Enable for documents that are difficult to classify:
categorize({
  provider: vlmProvider,
  categories: ['invoice', 'receipt', 'contract'],
  reasoning: {
    enabled: true,
    effort: 'medium'
  }
})
Example: Multi-Type Document Processing
const documentProcessor = createFlow()
  .step('categorize', categorize({
    provider: vlmProvider,
    categories: [
      'invoice',
      'receipt',
      'bank_statement',
      'tax_form',
      'unknown'
    ]
  }))
  .conditional('process', (data) => {
    switch (data.category) {
      case 'invoice':
        return extract({ provider: vlmProvider, schema: invoiceSchema });
      case 'receipt':
        return extract({ provider: vlmProvider, schema: receiptSchema });
      case 'bank_statement':
        return extract({ provider: vlmProvider, schema: bankStatementSchema });
      case 'tax_form':
        return extract({ provider: vlmProvider, schema: taxFormSchema });
      default:
        // Return basic metadata for unknown documents
        return extract({
          provider: vlmProvider,
          schema: {
            type: 'object',
            properties: {
              title: { type: 'string' },
              date: { type: 'string' },
              summary: { type: 'string' }
            }
          }
        });
    }
  })
  .build();
Next Steps
extract: Extract data after categorization
Flows: Learn about conditional routing