The extract node uses AI to extract structured data from documents according to a JSON Schema. It works with both raw documents (via VLM) and parsed DocumentIR (via LLM).
Basic Usage
import { createFlow, extract } from '@doclo/flows';
import { createVLMProvider } from '@doclo/providers-llm';

const vlmProvider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

const invoiceSchema = {
  type: 'object',
  properties: {
    invoiceNumber: { type: 'string' },
    totalAmount: { type: 'number' },
    date: { type: 'string' }
  },
  required: ['invoiceNumber', 'totalAmount']
};

const flow = createFlow()
  .step('extract', extract({
    provider: vlmProvider,
    schema: invoiceSchema
  }))
  .build();

const result = await flow.run({ base64: pdfDataUrl });
// result.output matches invoiceSchema
Configuration Options
extract({
  provider: vlmProvider,          // Required: VLM or LLM provider
  schema: invoiceSchema,          // Required: JSON Schema for output
  citations: { enabled: true },   // Enable source tracking
  consensus: { runs: 3 },         // Multi-run voting for accuracy
  reasoning: { enabled: true },   // Extended thinking (supported providers)
  additionalInstructions: '...'   // Custom extraction guidance
})
Options Reference
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| provider | `VLMProvider \| LLMProvider` | Required | AI provider for extraction |
| schema | `object \| { ref: string }` | Required | JSON Schema or registry reference |
| inputMode | `'auto' \| 'ir' \| 'ir+source' \| 'source'` | `'auto'` | Controls what inputs the node ingests |
| preferVisual | `boolean` | `true` | In auto mode, prefer multimodal extraction |
| useOriginalSource | `boolean` | `false` | Use the original unsplit document in forEach contexts |
| citations | `CitationConfig` | - | Citation/source tracking |
| consensus | `ConsensusConfig` | - | Multi-run voting configuration |
| reasoning | `object` | - | Extended reasoning options |
| additionalInstructions | `string` | - | Custom extraction guidance |
| promptRef | `string` | - | Reference to prompt asset |
| promptVariables | `object` | - | Variables for prompt rendering |
| maxTokens | `number` | - | Maximum tokens for LLM response |
The inputMode option controls what input the extract node uses for extraction. This is one of the most important configuration options for optimizing accuracy and cost.
Mode Options
| Mode | Description | Provider | Best For |
| --- | --- | --- | --- |
| auto | Automatically detect and route (default) | VLM or LLM | Most use cases |
| ir | Text-only from parsed DocumentIR | LLM or VLM | Cost-effective, text-heavy docs |
| ir+source | Both parsed text AND source images | VLM only | Maximum accuracy, complex layouts |
| source | Direct from raw document | VLM only | Simple docs without prior parsing |
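The provider column of the mode table can be captured as a small lookup, which may be handy for checking a configuration before running a flow. This is only a sketch: `ExplicitMode`, `ProviderKind`, `allowedProviders`, and `supportsMode` are hypothetical names introduced for illustration and are not part of the library.

```typescript
// Hypothetical names for illustration; none of these are exported by the library.
type ExplicitMode = 'ir' | 'ir+source' | 'source';
type ProviderKind = 'vlm' | 'llm';

// Provider kinds each explicit mode accepts, transcribed from the mode table.
const allowedProviders: Record<ExplicitMode, ProviderKind[]> = {
  ir: ['llm', 'vlm'],
  'ir+source': ['vlm'],
  source: ['vlm'],
};

// Returns true when the given provider kind can serve the given mode.
function supportsMode(mode: ExplicitMode, provider: ProviderKind): boolean {
  return allowedProviders[mode].some((kind) => kind === provider);
}

console.log(supportsMode('ir', 'llm'));        // true
console.log(supportsMode('ir+source', 'llm')); // false
```

A check like this only mirrors the documented table; how the node itself reacts to an unsupported combination is not specified here.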
Auto Mode (Default)
Auto mode selects the extraction path based on the inputs available to the node and the type of provider:
extract({
  provider: vlmProvider,
  schema: invoiceSchema,
  inputMode: 'auto',    // Default - automatically determines best mode
  preferVisual: true    // When both IR and source available, use ir+source
})
Auto mode decision tree:
If DocumentIR + source available + VLM provider + preferVisual: true → ir+source
If only DocumentIR available → ir
If only FlowInput (raw document) + VLM provider → source
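The three rules above can be written out as a plain function. This is a minimal sketch of the documented decision tree, not the library's actual implementation: `AutoModeContext` and `resolveAutoMode` are hypothetical names, and combinations the rules do not mention return `undefined` rather than guessing at the real behavior.

```typescript
// Hypothetical types and function; they only restate the documented rules.
type ResolvedMode = 'ir' | 'ir+source' | 'source';

interface AutoModeContext {
  hasDocumentIR: boolean;       // a parse step produced DocumentIR
  hasSource: boolean;           // the raw document (FlowInput) is available
  providerKind: 'vlm' | 'llm';
  preferVisual: boolean;
}

function resolveAutoMode(ctx: AutoModeContext): ResolvedMode | undefined {
  // Rule 1: IR + source + VLM + preferVisual -> hybrid mode
  if (ctx.hasDocumentIR && ctx.hasSource && ctx.providerKind === 'vlm' && ctx.preferVisual) {
    return 'ir+source';
  }
  // Rule 2: only DocumentIR available -> text-only mode
  if (ctx.hasDocumentIR && !ctx.hasSource) {
    return 'ir';
  }
  // Rule 3: only the raw document + VLM -> direct source mode
  if (!ctx.hasDocumentIR && ctx.hasSource && ctx.providerKind === 'vlm') {
    return 'source';
  }
  // Not covered by the documented rules
  return undefined;
}

console.log(resolveAutoMode({
  hasDocumentIR: true,
  hasSource: true,
  providerKind: 'vlm',
  preferVisual: true,
})); // 'ir+source'
```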
IR Mode (Text-Only)
Use parsed text only, ignoring visual context:
const flow = createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .step('extract', extract({
    provider: llmProvider,   // LLM is sufficient for text-only
    schema: invoiceSchema,
    inputMode: 'ir'
  }))
  .build();
Best for:
Text-heavy documents (contracts, reports)
Cost optimization (LLM is cheaper than VLM)
When OCR accuracy is sufficient
IR+Source Mode (Hybrid)
Combine parsed text with visual context for maximum accuracy:
const flow = createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .step('extract', extract({
    provider: vlmProvider,   // VLM required for multimodal
    schema: invoiceSchema,
    inputMode: 'ir+source'
  }))
  .build();
Best for:
Complex layouts (tables, forms with checkboxes)
Documents with visual elements (signatures, stamps)
When highest accuracy is required
Source Mode (Direct VLM)
Skip parsing entirely and extract directly from the raw document:
const flow = createFlow()
  .step('extract', extract({
    provider: vlmProvider,   // VLM required
    schema: invoiceSchema,
    inputMode: 'source'
  }))
  .build();
Best for:
Simple, well-structured documents
When OCR adds no value (clean PDFs)
Fastest processing time
Using Original Source in forEach
When processing split documents, set useOriginalSource: true so that extraction uses the full original document instead of the individual segment:
  .forEach('process', (doc) =>
    createFlow()
      .step('extract', extract({
        provider: vlmProvider,
        schema: doc.schema,
        inputMode: 'ir+source',
        useOriginalSource: true   // Use full document, not segment
      }))
  )
The extract node accepts different input types depending on the configured mode:
Raw Documents (VLM)
Direct extraction from PDFs or images:
// VLM provider processes the document directly
const flow = createFlow()
  .step('extract', extract({
    provider: vlmProvider,   // Must be VLM for raw input
    schema: invoiceSchema
  }))
  .build();

await flow.run({ base64: pdfDataUrl });
Parsed Documents (LLM)
Extract from previously parsed DocumentIR:
// Parse first, then extract with LLM
const flow = createFlow()
  .step('parse', parse({ provider: ocrProvider }))
  .step('extract', extract({
    provider: llmProvider,   // Can use LLM for text input
    schema: invoiceSchema,
    inputMode: 'ir'          // Explicitly use text-only mode
  }))
  .build();
Schema Definition
Basic Schema
const schema = {
  type: 'object',
  properties: {
    invoiceNumber: {
      type: 'string',
      description: 'Invoice number or reference ID'
    },
    vendor: {
      type: 'object',
      properties: {
        name: { type: 'string', description: 'Company name' },
        address: { type: 'string', description: 'Full address' }
      }
    },
    totalAmount: {
      type: 'number',
      description: 'Total invoice amount'
    },
    lineItems: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          description: { type: 'string' },
          quantity: { type: 'number' },
          amount: { type: 'number' }
        }
      }
    }
  },
  required: ['invoiceNumber', 'totalAmount']
};
Schema Registry Reference
Use registered schemas:
extract({
  provider: vlmProvider,
  schema: { ref: 'invoice@1.0.0' }
})
Enhanced Schema
Include examples and extraction guidance:
const enhancedSchema = {
  schema: invoiceSchema,
  contextPrompt: 'This is a maritime bunker delivery note',
  extractionRules: 'Focus on the delivery summary table',
  examples: [
    {
      description: 'Standard invoice',
      input: 'Invoice #: INV-001\nTotal: $1,250.00',
      output: { invoiceNumber: 'INV-001', totalAmount: 1250.00 }
    }
  ]
};

extract({
  provider: vlmProvider,
  schema: enhancedSchema
})
Citation Tracking
Track which parts of the source document contributed to each field:
extract({
  provider: vlmProvider,
  schema: invoiceSchema,
  citations: {
    enabled: true,
    detectInferred: true   // Flag calculated/inferred values
  }
})
Output includes citation metadata:
interface OutputWithCitations<T> {
  data: T;                    // Extracted data
  citations: {
    [fieldPath: string]: {
      lineIds: string[];      // Source line IDs (e.g., 'p1_l5')
      confidence: number;     // 0-1 confidence score
      inferred?: boolean;     // True if value was calculated
      reasoning?: string;     // Explanation for inferred values
    };
  };
}
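As a sketch of how this metadata might be consumed, the helper below lists the fields that were inferred or whose confidence falls below a cutoff, so they can be routed to manual review. The interface is restated from the definition above so the block is self-contained; `fieldsNeedingReview`, the `0.8` default cutoff, and the sample values are assumptions made for illustration, not library behavior.

```typescript
// Restated from the documented output shape so this block stands alone.
interface FieldCitation {
  lineIds: string[];
  confidence: number;
  inferred?: boolean;
  reasoning?: string;
}

interface OutputWithCitations<T> {
  data: T;
  citations: { [fieldPath: string]: FieldCitation };
}

// Hypothetical helper: returns field paths that were inferred or fall
// below the given confidence cutoff (0.8 is an arbitrary example value).
function fieldsNeedingReview<T>(
  output: OutputWithCitations<T>,
  minConfidence = 0.8
): string[] {
  return Object.keys(output.citations).filter((path) => {
    const citation = output.citations[path];
    return citation.inferred === true || citation.confidence < minConfidence;
  });
}

// Sample data invented for this example.
const sample: OutputWithCitations<{ invoiceNumber: string; totalAmount: number }> = {
  data: { invoiceNumber: 'INV-001', totalAmount: 1250 },
  citations: {
    invoiceNumber: { lineIds: ['p1_l5'], confidence: 0.97 },
    totalAmount: {
      lineIds: ['p1_l12', 'p1_l13'],
      confidence: 0.9,
      inferred: true,
      reasoning: 'Sum of line items',
    },
  },
};

console.log(fieldsNeedingReview(sample)); // [ 'totalAmount' ]
```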
Consensus Voting
Run extraction multiple times and vote on results:
extract({
  provider: vlmProvider,
  schema: invoiceSchema,
  consensus: {
    runs: 3,                // Number of extraction runs
    strategy: 'majority',   // Voting strategy
    threshold: 0.6          // Minimum agreement threshold
  }
})
See Consensus Voting for strategies and configuration.
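To make the idea of majority voting with an agreement threshold concrete, here is a minimal sketch for a single field. It is an illustration of the general technique only: `majorityVote` and `VoteResult` are hypothetical names, and the library's own strategies and its exact interpretation of `threshold` may differ.

```typescript
// Hypothetical types and function illustrating majority voting for one field.
interface VoteResult<T> {
  value: T | undefined;   // winning value, or undefined if agreement is too low
  agreement: number;      // fraction of runs that produced the winning value
}

function majorityVote<T>(values: T[], threshold: number): VoteResult<T> {
  if (values.length === 0) {
    return { value: undefined, agreement: 0 };
  }

  // Count occurrences, comparing values by their JSON serialization.
  // This is adequate for primitives such as strings and numbers.
  const counts: Record<string, { value: T; count: number }> = {};
  for (const value of values) {
    const key = JSON.stringify(value);
    const entry = counts[key];
    if (entry) {
      entry.count += 1;
    } else {
      counts[key] = { value, count: 1 };
    }
  }

  // Pick the most frequent value.
  let best = { value: values[0], count: 0 };
  for (const key of Object.keys(counts)) {
    if (counts[key].count > best.count) {
      best = counts[key];
    }
  }

  const agreement = best.count / values.length;
  return { value: agreement >= threshold ? best.value : undefined, agreement };
}

// Three runs, two of which agree: 2/3 meets a 0.6 threshold.
const vote = majorityVote(['INV-001', 'INV-001', 'INV-0O1'], 0.6);
console.log(vote.value); // 'INV-001'
```

With three runs that all disagree, the agreement is 1/3 and the sketch returns `undefined` for the value instead of picking one arbitrarily.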
Extended Reasoning
Enable chain-of-thought reasoning for complex extractions:
extract({
  provider: vlmProvider,
  schema: invoiceSchema,
  reasoning: {
    enabled: true,
    effort: 'high',   // 'low' | 'medium' | 'high'
    exclude: false    // Include reasoning in output
  }
})
Extended reasoning improves accuracy for complex documents but increases latency and cost.
Custom Instructions
Add extraction guidance:
extract({
  provider: vlmProvider,
  schema: invoiceSchema,
  additionalInstructions: `
    - Be strict with date formats. Use YYYY-MM-DD format only.
    - For amounts, preserve exact decimal precision.
    - If a field is partially visible, extract what's readable.
  `
})
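Instructions of this kind guide the model but may not be followed in every case, so it can be worth checking the result afterwards. The sketch below validates that a string is a real calendar date in YYYY-MM-DD form; `isIsoDate` is a hypothetical helper written for this example and is not provided by the library.

```typescript
// Hypothetical helper: checks that a string is a valid YYYY-MM-DD calendar date.
function isIsoDate(value: string): boolean {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
  if (!match) {
    return false;
  }
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);

  // Build the date in UTC and confirm it did not roll over
  // (for example, 2023-02-29 would become 2023-03-01).
  const date = new Date(Date.UTC(year, month - 1, day));
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(isIsoDate('2024-02-29')); // true
console.log(isIsoDate('2023-02-29')); // false
console.log(isIsoDate('02/29/2024')); // false
```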
Use TypeScript generics for typed output:
interface Invoice {
  invoiceNumber: string;
  totalAmount: number;
  lineItems?: Array<{
    description: string;
    amount: number;
  }>;
}

const flow = createFlow()
  .step('extract', extract<Invoice>({
    provider: vlmProvider,
    schema: invoiceSchema
  }))
  .build();

const result = await flow.run({ base64: pdf });
// result.output is typed as Invoice
Error Handling
Extraction may fail if:
Document cannot be read
Schema cannot be satisfied
Provider returns invalid response
Handle errors:
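try {
  const result = await flow.run({ base64: pdf });
  // Use result.output here
} catch (error) {
  // Caught values are typed `unknown` in strict TypeScript, so narrow before reading `code`
  const code = (error as { code?: unknown } | null)?.code;
  if (code === 'SCHEMA_VALIDATION_FAILED') {
    console.error('Extracted data does not match schema');
  } else {
    throw error;   // Re-throw anything this handler does not recognize
  }
}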
Next Steps
Schemas: Learn about schema definition
Consensus Voting: Improve accuracy with multi-run voting
Citations: Track extraction sources
Providers: Configure LLM/VLM providers