The chunk node splits a parsed DocumentIR into smaller pieces, which is useful for RAG pipelines, embedding generation, and processing documents that exceed context limits.
Basic Usage
Configuration Options
Chunking Strategies
Recursive (Default)
Splits by hierarchical separators, respecting natural boundaries:

| Option | Type | Default | Description |
|---|---|---|---|
| maxSize | number | 1000 | Maximum characters per chunk |
| minSize | number | 100 | Minimum characters per chunk |
| overlap | number | 0 | Character overlap between chunks |
| separators | string[] | ['\n\n', '\n', '. ', ' '] | Separator hierarchy |
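
To make the separator hierarchy concrete, here is a minimal sketch of recursive splitting. It is not the chunk node's implementation: `recursiveSplit` and `RecursiveOptions` are hypothetical names, and `minSize` and `overlap` are left out to keep the idea visible. The text is split on the coarsest separator first, and any piece that is still larger than `maxSize` is split again with the next, finer separator.

```typescript
// Illustrative sketch only: `recursiveSplit` is a hypothetical helper,
// not part of the chunk node's API.
interface RecursiveOptions {
  maxSize: number;      // maximum characters per chunk
  separators: string[]; // tried in order, coarsest first
}

function recursiveSplit(text: string, opts: RecursiveOptions, level = 0): string[] {
  if (text.length <= opts.maxSize) return text.length > 0 ? [text] : [];

  const sep = opts.separators[level];
  if (sep === undefined) {
    // No separators left: fall back to a hard cut every maxSize characters.
    const pieces: string[] = [];
    for (let i = 0; i < text.length; i += opts.maxSize) {
      pieces.push(text.slice(i, i + opts.maxSize));
    }
    return pieces;
  }

  const chunks: string[] = [];
  let current = "";

  for (const part of text.split(sep)) {
    const candidate = current === "" ? part : current + sep + part;
    if (candidate.length <= opts.maxSize) {
      current = candidate; // still fits: keep growing the current chunk
      continue;
    }
    if (current !== "") chunks.push(current);
    if (part.length > opts.maxSize) {
      // A single part is too large: retry it with the next, finer separator.
      chunks.push(...recursiveSplit(part, opts, level + 1));
      current = "";
    } else {
      current = part;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

const sample = "aaaa bbbb\n\ncccc dddd eeee";
const pieces = recursiveSplit(sample, {
  maxSize: 10,
  separators: ["\n\n", "\n", ". ", " "],
});
console.log(pieces); // ["aaaa bbbb", "cccc dddd", "eeee"]
```

The first paragraph fits within `maxSize` and is kept whole. The second does not, so it falls through to the space separator and is split between words.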
Section
Splits by document sections (headers, chapters):

| Option | Type | Default | Description |
|---|---|---|---|
| maxSize | number | 2000 | Maximum characters per section chunk |
| minSize | number | 100 | Minimum characters per section chunk |
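
As a rough illustration of section-based splitting, the sketch below works on plain Markdown text rather than a DocumentIR, which is an assumption made to keep the example self-contained. `splitBySections` and `Section` are hypothetical names. How the chunk node actually applies `minSize` is not stated here; this sketch assumes that a section shorter than `minSize` is folded into the section before it, and it does not model `maxSize`.

```typescript
// Hypothetical sketch: assumes the document is available as Markdown text
// with `#` headings. Not the chunk node's implementation.
interface Section {
  title: string;
  text: string;
}

function splitBySections(markdown: string, minSize: number): Section[] {
  const sections: Section[] = [];
  let current: Section = { title: "", text: "" };

  for (const line of markdown.split("\n")) {
    const heading = /^#{1,6}\s+(.*)$/.exec(line);
    if (heading) {
      // A heading closes the current section and opens a new one.
      if (current.text.trim() !== "") sections.push(current);
      current = { title: heading[1] ?? "", text: "" };
    } else {
      current.text += line + "\n";
    }
  }
  if (current.text.trim() !== "") sections.push(current);

  // Assumed behaviour: fold sections shorter than minSize into the previous
  // one (the short section's title is dropped in this simplified version).
  const merged: Section[] = [];
  for (const s of sections) {
    const prev = merged[merged.length - 1];
    if (prev && s.text.length < minSize) {
      prev.text += s.text;
    } else {
      merged.push({ ...s });
    }
  }
  return merged;
}
```

A section that exceeds `maxSize` would still need a second pass, for example with a recursive or fixed-size split.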
Page
Splits by page boundaries:

| Option | Type | Default | Description |
|---|---|---|---|
| pagesPerChunk | number | 1 | Pages per chunk |
| combineShortPages | boolean | true | Combine short pages together |
| minPageContent | number | 100 | Minimum content to keep a page |
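
The sketch below shows one way page-based grouping could work. It assumes pages arrive as an array of strings, and `splitByPages` and `PageChunk` are hypothetical names rather than the chunk node's API. It also reads `minPageContent` as "drop pages with fewer characters than this", which is an interpretation, and it does not model `combineShortPages`.

```typescript
// Hypothetical sketch: pages are plain strings, one per page.
interface PageChunk {
  text: string;
  pages: number[]; // 1-based page numbers covered by this chunk
}

function splitByPages(pages: string[], pagesPerChunk = 1, minPageContent = 100): PageChunk[] {
  const step = Math.max(1, Math.floor(pagesPerChunk)); // guard against 0 or fractions

  // Remember each page's original number, then drop near-empty pages
  // (assumed meaning of minPageContent).
  const kept = pages
    .map((text, i) => ({ text, page: i + 1 }))
    .filter((p) => p.text.trim().length >= minPageContent);

  const chunks: PageChunk[] = [];
  for (let i = 0; i < kept.length; i += step) {
    const group = kept.slice(i, i + step);
    chunks.push({
      text: group.map((p) => p.text).join("\n"),
      pages: group.map((p) => p.page),
    });
  }
  return chunks;
}
```

Keeping the original page numbers on each chunk is what makes it possible to map a retrieved chunk back to its source pages.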
Fixed
Splits at fixed intervals:

| Option | Type | Default | Description |
|---|---|---|---|
| size | number | 512 | Fixed size per chunk |
| unit | 'characters' \| 'tokens' | 'characters' | Size unit |
| overlap | number | 0 | Overlap between chunks |
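
To show how `size` and `overlap` interact, here is a small character-based sketch. `fixedSplit` is a hypothetical helper, not the chunk node's API, and the `'tokens'` unit is not modelled because it would require a tokenizer. Each chunk starts `size - overlap` characters after the previous one, so consecutive chunks share `overlap` characters.

```typescript
// Illustrative sketch of fixed-size splitting with overlap, in characters.
function fixedSplit(text: string, size = 512, overlap = 0): string[] {
  if (size <= 0) throw new RangeError("size must be positive");
  if (overlap < 0 || overlap >= size) {
    throw new RangeError("overlap must be at least 0 and less than size");
  }

  const step = size - overlap; // distance between consecutive chunk starts
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // the last chunk reached the end
  }
  return chunks;
}

console.log(fixedSplit("abcdefghij", 4, 1)); // ["abcd", "defg", "ghij"]
```

With `size` 4 and `overlap` 1, the last character of each chunk is repeated as the first character of the next, which is the same mechanism that preserves context across chunk boundaries at larger sizes.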
Output: ChunkOutput
Use Cases
RAG Pipeline
Chunk documents for retrieval-augmented generation.
Large Document Processing
Process documents that exceed context limits.
Overlap for Context
Use overlap to maintain context across chunk boundaries.
Preserving Source Information
Chunk metadata includes page numbers and positions for citation mapping.
Next Steps
parse
Parse documents before chunking
combine
Merge results from chunked processing