What are Nodes?
Nodes are deterministic operations with consistent behavior:
- Stateless: No side effects, same input always produces same output
- Configurable: Accept configuration objects to customize behavior
- Composable: Chain together in flows via .step() and .conditional()
- Type-safe: Inputs/outputs are validated at compile time
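The properties above can be illustrated with a small, self-contained sketch. It does not use Doclo's API: `NodeFn`, `WordCountConfig`, and `wordCount` are hypothetical names chosen for illustration. The point is only the shape: a configuration object goes in, and a pure function with typed input and output comes out.

```typescript
// Hypothetical stand-in type; this is not Doclo's actual node type.
type NodeFn<In, Out> = (input: In) => Out;

interface WordCountConfig {
  minLength: number; // words shorter than this are ignored
}

// Configurable: the factory takes a config object.
// Stateless: the returned function reads only its argument and the config.
function wordCount(config: WordCountConfig): NodeFn<string, number> {
  return (text) =>
    text.split(/\s+/).filter((word) => word.length >= config.minLength).length;
}

const countLongWords = wordCount({ minLength: 4 });

// Same input, same output, no matter how often the node runs.
const first = countLongWords("invoice total due in thirty days");
const second = countLongWords("invoice total due in thirty days");
console.log(first, second);
```

Because the node is typed as `NodeFn<string, number>`, passing it anything other than a string is rejected by the compiler rather than at run time.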
Core Nodes
Doclo provides 8 core nodes for document operations:
- parse: Convert documents to DocumentIR using OCR or VLM
- extract: Extract structured data matching a JSON Schema
- split: Identify document boundaries in multi-doc PDFs
- categorize: Classify documents into predefined categories
- chunk: Split documents into chunks for RAG/embeddings
- combine: Merge results from parallel processing
- trigger: Execute child flows for composition
- output: Explicit output selection and transformation
Node Inputs and Outputs
- parse: Input: FlowInput (PDF, image, DOCX, etc.) | Output: DocumentIR
- extract: Input: DocumentIR, FlowInput, or ChunkOutput | Output: JSON
- split: Input: FlowInput | Output: SplitDocument[]
- categorize: Input: DocumentIR or FlowInput | Output:
- chunk: Input: DocumentIR or DocumentIR[] | Output: ChunkOutput
- combine: Input: T[] | Output: T or T[]
- trigger: Input: Any | Output: Child flow output
- output: Input: Any | Output: Any
FlowInput takes the form { base64: '...' } or { url: '...' }. See Documents for the full format list.
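The two input forms mentioned here can be modelled as a TypeScript union. This is a sketch under an assumption: the `FlowInput` type below contains only the `base64` and `url` fields named in the text, and the real type may carry more. `describeInput` is a hypothetical helper that shows how code can tell the two forms apart.

```typescript
// Assumed shape, built only from the two forms named in the text.
type FlowInput = { base64: string } | { url: string };

// Hypothetical helper: narrows the union by checking which key is present.
function describeInput(input: FlowInput): string {
  if ("url" in input) {
    return `remote document at ${input.url}`;
  }
  return `inline document, ${input.base64.length} base64 characters`;
}

const fromUrl: FlowInput = { url: "https://example.com/invoice.pdf" };
const inline: FlowInput = { base64: "JVBERi0xLjQ=" };

console.log(describeInput(fromUrl));
console.log(describeInput(inline));
```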
Using Nodes
Import nodes from @doclo/flows.
Basic Usage
Chaining Nodes
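Chaining can be pictured as each node's output becoming the next node's input. The sketch below uses a hypothetical `MiniFlow` class, not Doclo's flow builder; only the method names `.step()` and `.conditional()` are borrowed from the text, and their signatures here are assumptions. The nodes are toy string functions so the example runs on its own.

```typescript
// Hypothetical stand-in type; not Doclo's actual node type.
type NodeFn<In, Out> = (input: In) => Out;

// Hypothetical mini flow builder. Each call returns a new, longer pipeline.
class MiniFlow<In, Out> {
  private readonly runAll: NodeFn<In, Out>;

  private constructor(runAll: NodeFn<In, Out>) {
    this.runAll = runAll;
  }

  static start<T>(): MiniFlow<T, T> {
    return new MiniFlow<T, T>((x) => x);
  }

  // Append a node; the compiler checks that it accepts the current output type.
  step<Next>(_name: string, node: NodeFn<Out, Next>): MiniFlow<In, Next> {
    return new MiniFlow<In, Next>((input) => node(this.runAll(input)));
  }

  // Choose a node at run time based on the value reaching this point.
  conditional<Next>(
    name: string,
    choose: (value: Out) => NodeFn<Out, Next>,
  ): MiniFlow<In, Next> {
    return this.step(name, (value) => choose(value)(value));
  }

  run(input: In): Out {
    return this.runAll(input);
  }
}

const trim: NodeFn<string, string> = (s) => s.trim();
const shout: NodeFn<string, string> = (s) => s.toUpperCase();
const whisper: NodeFn<string, string> = (s) => s.toLowerCase();

const flow = MiniFlow.start<string>()
  .step("trim", trim)
  .conditional("route", (s) => (s.startsWith("!") ? shout : whisper))
  .step("measure", (s) => ({ text: s, length: s.length }));

const urgent = flow.run("  !pay now ");
const routine = flow.run("  Thanks  ");
console.log(urgent, routine);
```

Swapping the order of "trim" and "measure" in this sketch fails to compile, because "measure" produces an object and `trim` expects a string. That is the kind of mismatch compile-time validation is meant to catch.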
Common Patterns
Parse → Extract
Most accurate path for text-heavy documents. Use inputMode: 'ir' with LLM providers, since they process text from DocumentIR.
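The data flow of this pattern can be sketched without any provider. Everything below is a simplified stand-in: the `DocumentIR` shape, `parseText`, and `extractInvoice` are hypothetical, the real parse node runs OCR or a VLM over a file, and the real extract node works against a JSON Schema rather than regular expressions. What the sketch preserves is the key point of the pattern: the extract step sees only text taken from the intermediate representation.

```typescript
// Hypothetical, simplified shapes; Doclo's real DocumentIR is likely richer.
interface DocumentIR {
  pages: { lines: string[] }[];
}

interface Invoice {
  invoiceNumber: string;
  total: number;
}

// Stand-in for parse: splits raw text into pages (form feed) and lines.
function parseText(raw: string): DocumentIR {
  const pages = raw.split("\f").map((page) => ({
    lines: page
      .split("\n")
      .map((line) => line.trim())
      .filter((line) => line.length > 0),
  }));
  return { pages };
}

// Returns the first capture group or fails loudly if the field is missing.
function firstGroup(pattern: RegExp, text: string): string {
  const group = pattern.exec(text)?.[1];
  if (group === undefined) {
    throw new Error(`no match for ${pattern.source}`);
  }
  return group;
}

// Stand-in for extract: works purely on text from the IR.
function extractInvoice(ir: DocumentIR): Invoice {
  const text = ir.pages.map((page) => page.lines.join("\n")).join("\n");
  return {
    invoiceNumber: firstGroup(/Invoice\s*#\s*(\S+)/, text),
    total: Number(firstGroup(/Total:\s*\$?([\d.]+)/, text)),
  };
}

const ir = parseText("Invoice # INV-001\nAcme Corp\n\fTotal: $125.50\n");
const invoice = extractInvoice(ir);
console.log(ir.pages.length, invoice);
```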
Direct VLM Extract
Fastest path for simple documents.
Split → Process Each
Handle multi-document PDFs.
Categorize → Route
Route documents to different extractors.
Next Steps
- Nodes Reference - Detailed documentation for each node
- Flows - Learn about flow orchestration