The chunk node splits a parsed DocumentIR into smaller pieces. This is useful for RAG pipelines, embedding generation, and processing documents that exceed context limits.
Basic Usage
Configuration Options
Chunking Strategies
Recursive (Default)
Splits by hierarchical separators, respecting natural boundaries:

| Option | Type | Default | Description |
|---|---|---|---|
| `maxSize` | `number` | `1000` | Maximum characters per chunk |
| `minSize` | `number` | `100` | Minimum characters per chunk |
| `overlap` | `number` | `0` | Character overlap between chunks |
| `separators` | `string[]` | `['\n\n', '\n', '. ', ' ']` | Separator hierarchy |
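
To illustrate what a separator hierarchy means in practice, here is a minimal sketch of recursive splitting. It is not the chunk node's implementation: `recursiveSplit` is a hypothetical helper, it works on a plain string rather than a DocumentIR, and it ignores `minSize` and `overlap`. It tries the coarsest separator first and only falls back to finer ones for pieces that are still larger than `maxSize`.

```typescript
// Hypothetical helper for illustration only; not part of the chunk node's API.
function recursiveSplit(
  text: string,
  maxSize: number = 1000,
  separators: string[] = ["\n\n", "\n", ". ", " "],
): string[] {
  if (text.length <= maxSize) return text.length > 0 ? [text] : [];

  if (separators.length === 0) {
    // No separators left: fall back to hard cuts every maxSize characters.
    const hardCuts: string[] = [];
    for (let i = 0; i < text.length; i += maxSize) {
      hardCuts.push(text.slice(i, i + maxSize));
    }
    return hardCuts;
  }

  const sep = separators[0] as string;
  const finer = separators.slice(1);
  const chunks: string[] = [];
  let current = "";

  for (const part of text.split(sep)) {
    // Greedily pack parts into the current chunk while it still fits.
    const candidate = current === "" ? part : current + sep + part;
    if (candidate.length <= maxSize) {
      current = candidate;
      continue;
    }
    if (current !== "") chunks.push(current);
    if (part.length > maxSize) {
      // This part alone is too large: retry it with the finer separators.
      chunks.push(...recursiveSplit(part, maxSize, finer));
      current = "";
    } else {
      current = part;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}
```

Under these assumptions, `recursiveSplit("aaaa bbbb cccc", 9)` finds no paragraph, line, or sentence breaks, so it falls through to the space separator and packs words until the 9-character budget is reached.
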
Section
Splits by document sections (headers, chapters):

| Option | Type | Default | Description |
|---|---|---|---|
| `maxSize` | `number` | `2000` | Maximum characters per section chunk |
| `minSize` | `number` | `100` | Minimum characters per section chunk |
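
The following sketch shows one way section-based splitting could work. It makes several assumptions that the text above does not confirm: the input is Markdown-style text where sections start at `#` headings (the real node works on a DocumentIR), sections shorter than `minSize` are merged into the preceding section, and oversized sections are left as they are rather than split further against `maxSize`. `SectionChunk` and `splitBySections` are hypothetical names.

```typescript
// Hypothetical types and helper for illustration; not the chunk node's API.
interface SectionChunk {
  heading: string | null; // null for text that appears before the first heading
  text: string;
}

function splitBySections(markdown: string, minSize: number = 100): SectionChunk[] {
  const sections: SectionChunk[] = [];
  let current: SectionChunk = { heading: null, text: "" };

  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line)) {
      // A heading line closes the current section and opens a new one.
      if (current.text.trim() !== "") sections.push(current);
      current = { heading: line.replace(/^#{1,6}\s+/, "").trim(), text: line + "\n" };
    } else {
      current.text += line + "\n";
    }
  }
  if (current.text.trim() !== "") sections.push(current);

  // Assumed policy: fold sections shorter than minSize into the previous one.
  const merged: SectionChunk[] = [];
  for (const section of sections) {
    const prev = merged[merged.length - 1];
    if (prev !== undefined && section.text.length < minSize) {
      prev.text += section.text;
    } else {
      merged.push(section);
    }
  }
  return merged;
}
```

Merging short sections backward keeps a stray one-line section attached to the context that introduces it. Merging forward would be an equally reasonable policy.
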
Page
Splits by page boundaries:

| Option | Type | Default | Description |
|---|---|---|---|
| `pagesPerChunk` | `number` | `1` | Pages per chunk |
| `combineShortPages` | `boolean` | `true` | Combine short pages together |
| `minPageContent` | `number` | `100` | Minimum content to keep a page |
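
The exact interaction between these three options is not spelled out above, so the sketch below fixes one plausible interpretation and should be read as an assumption, not as the node's behavior. A page whose trimmed text is shorter than `minPageContent` characters counts as "short". When `combineShortPages` is true, a short page does not count toward `pagesPerChunk` and is carried into the next chunk (or appended to the last chunk if it comes at the end). `chunkByPages` is a hypothetical helper that takes pages as plain strings.

```typescript
// Hypothetical helper for illustration; the option semantics here are assumed.
function chunkByPages(
  pages: string[],
  pagesPerChunk: number = 1,
  combineShortPages: boolean = true,
  minPageContent: number = 100,
): string[] {
  const chunks: string[] = [];
  let buffer: string[] = [];
  let countedPages = 0;

  for (const page of pages) {
    buffer.push(page);
    const isShort = page.trim().length < minPageContent;
    // Assumption: short pages ride along without using up the page budget.
    if (!(combineShortPages && isShort)) countedPages += 1;

    if (countedPages >= pagesPerChunk) {
      chunks.push(buffer.join("\n"));
      buffer = [];
      countedPages = 0;
    }
  }

  if (buffer.length > 0) {
    if (combineShortPages && countedPages === 0 && chunks.length > 0) {
      // Only short pages remain: attach them to the last emitted chunk.
      const last = chunks.pop() as string;
      chunks.push(last + "\n" + buffer.join("\n"));
    } else {
      chunks.push(buffer.join("\n"));
    }
  }
  return chunks;
}
```
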
Fixed
Splits at fixed intervals:

| Option | Type | Default | Description |
|---|---|---|---|
| `size` | `number` | `512` | Fixed size per chunk |
| `unit` | `'characters' \| 'tokens'` | `'characters'` | Size unit |
| `overlap` | `number` | `0` | Overlap between chunks |
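
As a rough illustration of how `size`, `unit`, and `overlap` relate, the sketch below advances a window of `size` units by `size - overlap` units at a time. `fixedChunks` is a hypothetical helper, not the node's API. Because the tokenizer behind `'tokens'` is not specified here, the sketch stands in whitespace-separated words for tokens, which is an assumption and will not match a real model tokenizer.

```typescript
// Hypothetical helper for illustration; not the chunk node's implementation.
type Unit = "characters" | "tokens";

function fixedChunks(
  text: string,
  size: number = 512,
  unit: Unit = "characters",
  overlap: number = 0,
): string[] {
  if (size <= 0) throw new RangeError("size must be positive");
  if (overlap < 0 || overlap >= size) {
    throw new RangeError("overlap must satisfy 0 <= overlap < size");
  }

  const step = size - overlap; // how far the window moves each iteration
  const chunks: string[] = [];

  if (unit === "characters") {
    for (let start = 0; start < text.length; start += step) {
      chunks.push(text.slice(start, start + size));
      if (start + size >= text.length) break; // window already reached the end
    }
  } else {
    // Assumption: whitespace-separated words approximate tokens.
    const tokens = text.split(/\s+/).filter((t) => t.length > 0);
    for (let start = 0; start < tokens.length; start += step) {
      chunks.push(tokens.slice(start, start + size).join(" "));
      if (start + size >= tokens.length) break;
    }
  }
  return chunks;
}
```

With `size = 4` and `overlap = 1`, the string `"abcdefghij"` yields windows starting at offsets 0, 3, and 6, so each chunk repeats the last character of the previous one.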