Reducto provides document parsing with RAG-optimized chunking, schema-based extraction with citations, and multi-document splitting.
Installation
npm install @doclo/providers-reducto
Overview
Reducto offers three services:
| Service | Function | Cost |
|---|---|---|
| Parse | Document parsing with chunking | $0.004/page |
| Extract | Schema-based extraction with citations | $0.008/page |
| Split | Multi-document file segmentation | $0.008/page |
Parse Provider
Basic Setup
import { reductoParseProvider } from '@doclo/providers-reducto';
const ocrProvider = reductoParseProvider({
apiKey: process.env.REDUCTO_API_KEY!
});
Configuration Options
reductoParseProvider({
apiKey: string, // Required: API key
// Chunking
chunkMode?: 'variable' | 'page' | 'block',
targetChunkSize?: number, // Target characters per chunk (for variable)
// Table handling
tableFormat?: 'markdown' | 'html' | 'dynamic',
// Page selection
pageRange?: [number, number] | number[], // Page range or specific pages
// Options
forceOCR?: boolean, // Force OCR even for native text
extractImages?: boolean, // Extract embedded images
figureEnrichment?: boolean, // AI-enhanced figure descriptions
})
Chunk Modes
| Mode | Description |
|---|---|
| variable | Semantic chunking based on content boundaries |
| page | One chunk per page |
| block | One chunk per content block |
// Variable chunking for RAG
const provider = reductoParseProvider({
apiKey: process.env.REDUCTO_API_KEY!,
chunkMode: 'variable',
targetChunkSize: 1000
});
Output: DocumentIR with Chunks
interface DocumentIR {
pages: {
width: number;
height: number;
lines: { text: string; bbox?: object }[];
}[];
extras?: {
raw: object;
costUSD: number;
pageCount: number;
chunks?: ReductoChunk[]; // RAG-ready chunks
};
}
interface ReductoChunk {
content: string; // Chunk text
pageNumber: number; // Source page
blockType: string; // 'text', 'table', 'figure', etc.
bbox: object; // Bounding box
confidence: number; // Confidence score
}
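A minimal sketch of consuming this output, assuming the field names shown in the interfaces above: use extras.chunks when Reducto returns them, and otherwise fall back to one chunk per page built from pages[].lines. The toRagChunks helper and the RagChunk shape are hypothetical, not part of @doclo/providers-reducto, and the fallback assumes 1-based page numbers like ReductoChunk.pageNumber.

```typescript
// Local copies of the documented shapes so this sketch compiles on its own.
interface ReductoChunk {
  content: string;
  pageNumber: number;
  blockType: string;
  bbox: object;
  confidence: number;
}

interface DocumentIR {
  pages: { width: number; height: number; lines: { text: string; bbox?: object }[] }[];
  extras?: { raw: object; costUSD: number; pageCount: number; chunks?: ReductoChunk[] };
}

// Hypothetical normalized chunk for downstream indexing.
interface RagChunk {
  id: string;
  text: string;
  pageNumber: number;
  blockType: string;
}

// Prefer Reducto's chunks; otherwise build one chunk per page from its line text.
function toRagChunks(doc: DocumentIR): RagChunk[] {
  const chunks = doc.extras?.chunks;
  if (chunks && chunks.length > 0) {
    return chunks.map((c, index) => ({
      id: `chunk-${c.pageNumber}-${index}`,
      text: c.content,
      pageNumber: c.pageNumber,
      blockType: c.blockType,
    }));
  }
  return doc.pages.map((page, i) => ({
    id: `page-${i + 1}`,
    text: page.lines.map((line) => line.text).join('\n'),
    pageNumber: i + 1, // assumption: pages are numbered from 1
    blockType: 'page',
  }));
}
```

The fallback matters mainly when chunking is not configured or a response omits chunks; downstream code can then treat both cases uniformly.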
Extract Provider
Schema-based extraction with field-level citations.
Basic Setup
import { reductoExtractProvider } from '@doclo/providers-reducto';
const extractProvider = reductoExtractProvider({
apiKey: process.env.REDUCTO_API_KEY!,
citations: true
});
Configuration Options
reductoExtractProvider({
apiKey: string, // Required: API key
citations?: boolean, // Include field citations (default: true)
systemPrompt?: string, // Custom system prompt
})
Usage with Flows
import { createFlow, extract } from '@doclo/flows';
import { reductoExtractProvider } from '@doclo/providers-reducto';
const extractProvider = reductoExtractProvider({
apiKey: process.env.REDUCTO_API_KEY!,
citations: true
});
const invoiceSchema = {
type: 'object',
properties: {
invoiceNumber: { type: 'string' },
total: { type: 'number' },
vendor: { type: 'string' }
}
};
const flow = createFlow()
.step('extract', extract({
provider: extractProvider,
schema: invoiceSchema
}))
.build();
Citations
When citations is true, each extracted field is paired with a citation entry that records its source location:
const result = await flow.run({ base64: documentData });
// result.output includes a citation for each field
// {
// invoiceNumber: "INV-001",
// invoiceNumber_citation: { page: 1, bbox: {...} },
// total: 1250.00,
// total_citation: { page: 1, bbox: {...} }
// }
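If you want the extracted values and their citations in separate objects, a small helper can split them. This is a sketch that assumes the flat field / field_citation naming shown in the example output above; separateCitations and the Citation type are hypothetical names, not part of the provider package.

```typescript
// Citation shape as shown in the example output; bbox is treated as opaque.
type Citation = { page: number; bbox?: unknown };

// Split "<field>_citation" entries away from the extracted values.
// Assumes the flat naming convention from the example output.
function separateCitations(result: Record<string, unknown>): {
  data: Record<string, unknown>;
  citations: Record<string, Citation>;
} {
  const suffix = '_citation';
  const data: Record<string, unknown> = {};
  const citations: Record<string, Citation> = {};
  for (const [key, value] of Object.entries(result)) {
    if (key.endsWith(suffix)) {
      // "total_citation" is stored under "total"
      citations[key.slice(0, -suffix.length)] = value as Citation;
    } else {
      data[key] = value;
    }
  }
  return { data, citations };
}
```

Keeping the two apart lets you validate data against the schema without the extra citation keys getting in the way.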
Split Function
Detect and separate multiple documents in a single file.
Basic Usage
import { splitDocument } from '@doclo/providers-reducto';
const segments = await splitDocument(
{ base64: stackedPdfData },
{
apiKey: process.env.REDUCTO_API_KEY!,
splitDescription: [
{ name: 'Invoice', description: 'Invoice document with totals' },
{ name: 'Receipt', description: 'Payment receipt' }
]
}
);
// Returns array of document segments
for (const segment of segments) {
console.log(`${segment.type}: pages ${segment.startPage}-${segment.endPage}`);
}
Configuration Options
splitDocument(input, {
apiKey: string,
splitDescription: [
{
name: string, // Document type name
description: string // Description for identification
}
],
partitionStrategy?: 'hi_res' | 'fast'
})
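Before extracting from each segment, it can be worth checking which pages the split covered. A minimal sketch, assuming segments expose the type, startPage and endPage fields used in the loop above, that page ranges are 1-based and inclusive, and that the total page count is known separately (for example from a parse run's extras.pageCount). The checkCoverage helper is hypothetical.

```typescript
// Minimal segment shape, assuming the fields used in the Basic Usage loop.
interface Segment {
  type: string;
  startPage: number;
  endPage: number;
}

// List pages (1..pageCount) that no segment covers, and pages claimed by more than one.
function checkCoverage(segments: Segment[], pageCount: number): { uncovered: number[]; overlapping: number[] } {
  const hits = new Array<number>(pageCount + 1).fill(0); // index 0 unused; pages are 1-based
  for (const s of segments) {
    // Clamp each inclusive range to the document's pages
    for (let p = Math.max(1, s.startPage); p <= Math.min(pageCount, s.endPage); p++) {
      hits[p]++;
    }
  }
  const uncovered: number[] = [];
  const overlapping: number[] = [];
  for (let p = 1; p <= pageCount; p++) {
    if (hits[p] === 0) uncovered.push(p);
    else if (hits[p] > 1) overlapping.push(p);
  }
  return { uncovered, overlapping };
}
```

Uncovered pages usually mean a document type was missing from splitDescription; overlaps are worth flagging for review rather than treating as errors.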
Common Document Types
import { COMMON_DOCUMENT_TYPES } from '@doclo/providers-reducto';
// Pre-defined document types
const segments = await splitDocument(
{ base64: documentData },
{
apiKey: process.env.REDUCTO_API_KEY!,
splitDescription: [
COMMON_DOCUMENT_TYPES.INVOICE,
COMMON_DOCUMENT_TYPES.RECEIPT,
COMMON_DOCUMENT_TYPES.CONTRACT
]
}
);
Supported Formats
| Format | MIME Type |
|---|---|
| PDF | application/pdf |
| PNG | image/png |
| JPEG | image/jpeg |
| DOCX | application/vnd.openxmlformats-officedocument.wordprocessingml.document |
| XLSX | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
| PPTX | application/vnd.openxmlformats-officedocument.presentationml.presentation |
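The table above can double as a lookup when building inputs from files on disk. This sketch maps a filename extension to its MIME type and wraps an already base64-encoded payload in the data:<mime>;base64, form used in the RAG pipeline example's flow.run call. Whether the provider requires that prefix or also accepts bare base64 is an assumption to confirm; mimeTypeFor and toDataUrl are hypothetical helpers.

```typescript
// Extension-to-MIME map taken from the supported formats table.
const MIME_TYPES: Record<string, string> = {
  pdf: 'application/pdf',
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  docx: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  xlsx: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
  pptx: 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
};

// MIME type for a filename, or undefined if the format is not in the table.
function mimeTypeFor(filename: string): string | undefined {
  const dot = filename.lastIndexOf('.');
  if (dot === -1) return undefined;
  return MIME_TYPES[filename.slice(dot + 1).toLowerCase()];
}

// Wrap a base64 payload in a data URL, rejecting unsupported formats early.
function toDataUrl(filename: string, base64Payload: string): string {
  const mime = mimeTypeFor(filename);
  if (!mime) throw new Error(`Unsupported format: ${filename}`);
  return `data:${mime};base64,${base64Payload}`;
}
```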
Pricing
| Service | Credits | USD |
|---|---|---|
| Parse | 1 credit/page | $0.004/page |
| Extract | 2 credits/page | $0.008/page |
| Split | 2 credits/page | $0.008/page |
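Since 1 credit corresponds to $0.004 in the table, the cost of a pipeline is the per-page credits summed across services, times the page count, times $0.004. A minimal estimator that uses only the rates in the table (it models no other billing rules); estimateCost is a hypothetical helper.

```typescript
type ReductoService = 'parse' | 'extract' | 'split';

// Credits per page and USD per credit, from the pricing table.
const CREDITS_PER_PAGE: Record<ReductoService, number> = { parse: 1, extract: 2, split: 2 };
const USD_PER_CREDIT = 0.004;

// Estimate credits and USD for running each listed service over the same pages.
function estimateCost(pages: number, services: ReductoService[]): { credits: number; usd: number } {
  const credits = services.reduce((sum, s) => sum + CREDITS_PER_PAGE[s] * pages, 0);
  return { credits, usd: credits * USD_PER_CREDIT };
}
```

For example, splitting a 100-page file and then extracting from every page is (2 + 2) × 100 = 400 credits, or $1.60.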
Reducto vs Other Providers
| Feature | Reducto | Surya | Marker |
|---|---|---|---|
| RAG chunking | Yes | No | No |
| Citations | Yes | Via bbox | No |
| Multi-format | Yes | Yes | Limited |
| Document splitting | Yes | No | No |
| Structured extraction | Native | Via LLM | Via LLM |
| Cost/page | $0.004+ | $0.01 | $0.002+ |
Choose Reducto when:
- You are building RAG pipelines that need chunking
- You need field-level citations
- You are processing files that contain multiple documents
- You are working with spreadsheets or presentations
Example: RAG Pipeline
import { createFlow, parse } from '@doclo/flows';
import { reductoParseProvider } from '@doclo/providers-reducto';
const parseProvider = reductoParseProvider({
apiKey: process.env.REDUCTO_API_KEY!,
chunkMode: 'variable',
targetChunkSize: 500
});
const flow = createFlow()
.step('parse', parse({ provider: parseProvider }))
.build();
const result = await flow.run({
base64: 'data:application/pdf;base64,...'
});
// Get RAG-ready chunks
const chunks = result.output.extras?.chunks || [];
// entries() supplies the index used to build unique chunk IDs below
for (const [index, chunk] of chunks.entries()) {
console.log(`Chunk (${chunk.blockType}, page ${chunk.pageNumber}):`);
console.log(` ${chunk.content.substring(0, 100)}...`);
console.log(` Confidence: ${chunk.confidence}`);
// Store in vector DB
// await vectorStore.upsert({
// id: `chunk-${chunk.pageNumber}-${index}`,
// content: chunk.content,
// metadata: { pageNumber: chunk.pageNumber, type: chunk.blockType }
// });
}
Example: Multi-Document Processing
import { splitDocument, reductoExtractProvider } from '@doclo/providers-reducto';
import { createFlow, extract } from '@doclo/flows';
// Split stacked document
const segments = await splitDocument(
{ base64: stackedDocument },
{
apiKey: process.env.REDUCTO_API_KEY!,
splitDescription: [
{ name: 'Invoice', description: 'Invoice with line items' },
{ name: 'Receipt', description: 'Payment receipt' }
]
}
);
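// Create the extract provider once and reuse it for every segment
const extractProvider = reductoExtractProvider({
apiKey: process.env.REDUCTO_API_KEY!,
citations: true
});
// Process each segment (invoiceSchema is defined earlier; receiptSchema is assumed to be defined the same way)
const results = [];
for (const segment of segments) {
const flow = createFlow()
.step('extract', extract({
provider: extractProvider,
schema: segment.type === 'Invoice' ? invoiceSchema : receiptSchema
}))
.build();
const result = await flow.run(segment.input);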
results.push({
type: segment.type,
pages: `${segment.startPage}-${segment.endPage}`,
data: result.output
});
}
console.log('Processed documents:', results);
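The ternary in the loop above sends every non-invoice segment to receiptSchema, which gets fragile as more document types are added to splitDescription. One alternative is a lookup table keyed by the segment type names, failing loudly for unknown types. In this sketch the JsonSchema type, the receipt fields and the schemaFor helper are all hypothetical.

```typescript
// Minimal JSON Schema shape for this sketch.
type JsonSchema = { type: string; properties?: Record<string, unknown> };

const invoiceSchema: JsonSchema = {
  type: 'object',
  properties: { invoiceNumber: { type: 'string' }, total: { type: 'number' }, vendor: { type: 'string' } },
};

// Hypothetical receipt fields, for illustration only.
const receiptSchema: JsonSchema = {
  type: 'object',
  properties: { merchant: { type: 'string' }, amount: { type: 'number' } },
};

// Keys must match the names given in splitDescription.
const schemasByType: Record<string, JsonSchema> = {
  Invoice: invoiceSchema,
  Receipt: receiptSchema,
};

// Look up the schema for a segment type, throwing instead of guessing.
function schemaFor(segmentType: string): JsonSchema {
  const schema = schemasByType[segmentType];
  if (!schema) throw new Error(`No extraction schema registered for segment type "${segmentType}"`);
  return schema;
}
```

Inside the loop, schema: schemaFor(segment.type) would then replace the ternary.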
Next Steps
- Surya OCR: text with bounding boxes
- Marker: Markdown conversion