Reducto provides document parsing with RAG-optimized chunking, schema-based extraction with citations, and multi-document splitting.

Installation

npm install @docloai/providers-reducto

Providers Overview

Reducto offers three services:
| Service | Function | Cost |
| --- | --- | --- |
| Parse | Document parsing with chunking | $0.004/page |
| Extract | Schema-based extraction with citations | $0.008/page |
| Split | Multi-document file segmentation | $0.008/page |

Parse Provider

Basic Setup

import { reductoParseProvider } from '@docloai/providers-reducto';

const ocrProvider = reductoParseProvider({
  apiKey: process.env.REDUCTO_API_KEY!
});

Configuration Options

reductoParseProvider({
  apiKey: string,              // Required: API key

  // Chunking
  chunkMode?: 'variable' | 'page' | 'block',
  targetChunkSize?: number,    // Target characters per chunk (for variable)

  // Table handling
  tableFormat?: 'markdown' | 'html' | 'dynamic',

  // Page selection
  pageRange?: [number, number] | number[],  // Page range or specific pages

  // Options
  forceOCR?: boolean,          // Force OCR even for native text
  extractImages?: boolean,     // Extract embedded images
  figureEnrichment?: boolean,  // AI-enhanced figure descriptions
})

Chunk Modes

| Mode | Description |
| --- | --- |
| variable | Semantic chunking based on content boundaries |
| page | One chunk per page |
| block | One chunk per content block |

// Variable chunking for RAG
const provider = reductoParseProvider({
  apiKey: process.env.REDUCTO_API_KEY!,
  chunkMode: 'variable',
  targetChunkSize: 1000
});

Output: DocumentIR with Chunks

interface DocumentIR {
  pages: {
    width: number;
    height: number;
    lines: { text: string; bbox?: object }[];
  }[];
  extras?: {
    raw: object;
    costUSD: number;
    pageCount: number;
    chunks?: ReductoChunk[];  // RAG-ready chunks
  };
}

interface ReductoChunk {
  content: string;            // Chunk text
  pageNumber: number;         // Source page
  blockType: string;          // 'text', 'table', 'figure', etc.
  bbox: object;               // Bounding box
  confidence: number;         // Confidence score
}
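
Chunks usually need some screening before they are embedded. The sketch below filters by confidence and block type using only the ReductoChunk fields documented above. The helper name selectChunks, the 0.8 threshold, and the sample data are illustrative assumptions, and the code assumes confidence is on a 0 to 1 scale, which this page does not state.

```typescript
// Same shape as the ReductoChunk interface documented above.
interface ReductoChunk {
  content: string;
  pageNumber: number;
  blockType: string;
  bbox: object;
  confidence: number;
}

// Hypothetical helper, not part of @docloai/providers-reducto.
// Keeps chunks that meet the confidence threshold, have non-blank content,
// and (optionally) match one of the requested block types.
function selectChunks(
  chunks: ReductoChunk[],
  minConfidence: number,
  blockTypes?: string[]
): ReductoChunk[] {
  return chunks.filter(
    (c) =>
      c.confidence >= minConfidence &&
      c.content.trim().length > 0 &&
      (blockTypes === undefined || blockTypes.indexOf(c.blockType) !== -1)
  );
}

// Made-up sample data for illustration only.
const sampleChunks: ReductoChunk[] = [
  { content: 'Payment is due within 30 days.', pageNumber: 1, blockType: 'text', bbox: {}, confidence: 0.98 },
  { content: '| Item | Qty |', pageNumber: 1, blockType: 'table', bbox: {}, confidence: 0.91 },
  { content: '   ', pageNumber: 2, blockType: 'text', bbox: {}, confidence: 0.99 },
  { content: 'Smudged footer', pageNumber: 2, blockType: 'text', bbox: {}, confidence: 0.42 },
];

const kept = selectChunks(sampleChunks, 0.8, ['text', 'table']);
console.log(kept.map((c) => c.content));
```

The right threshold depends on the documents involved, so it is worth checking a sample of dropped chunks before settling on a value.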

Extract Provider

Schema-based extraction with field-level citations.

Basic Setup

import { reductoExtractProvider } from '@docloai/providers-reducto';

const extractProvider = reductoExtractProvider({
  apiKey: process.env.REDUCTO_API_KEY!,
  citations: true
});

Configuration Options

reductoExtractProvider({
  apiKey: string,              // Required: API key
  citations?: boolean,         // Include field citations (default: true)
  systemPrompt?: string,       // Custom system prompt
})

Usage with Flows

import { createFlow, extract } from '@docloai/flows';
import { reductoExtractProvider } from '@docloai/providers-reducto';

const extractProvider = reductoExtractProvider({
  apiKey: process.env.REDUCTO_API_KEY!,
  citations: true
});

const invoiceSchema = {
  type: 'object',
  properties: {
    invoiceNumber: { type: 'string' },
    total: { type: 'number' },
    vendor: { type: 'string' }
  }
};

const flow = createFlow()
  .step('extract', extract({
    provider: extractProvider,
    schema: invoiceSchema
  }))
  .build();

Citations

When citations: true, each extracted field gets a companion <field>_citation entry with its source page and bounding box:

const result = await flow.run({ base64: documentData });

// Result includes citations for each field
// {
//   invoiceNumber: "INV-001",
//   invoiceNumber_citation: { page: 1, bbox: {...} },
//   total: 1250.00,
//   total_citation: { page: 1, bbox: {...} }
// }
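
Because citations sit next to the values in one flat object, downstream code often wants them separated. The following is a minimal sketch that splits such a result by the _citation suffix shown above. The helper name splitCitations and the FieldCitation type are hypothetical, and the bounding box is treated as an opaque object.

```typescript
// Citation shape as it appears in the example result above; bbox is left opaque.
interface FieldCitation {
  page: number;
  bbox: object;
}

const CITATION_SUFFIX = '_citation';

// Hypothetical helper: separates extracted values from their citations.
function splitCitations(result: Record<string, unknown>): {
  values: Record<string, unknown>;
  citations: Record<string, FieldCitation>;
} {
  const values: Record<string, unknown> = {};
  const citations: Record<string, FieldCitation> = {};
  for (const key of Object.keys(result)) {
    const isCitation =
      key.length > CITATION_SUFFIX.length &&
      key.slice(-CITATION_SUFFIX.length) === CITATION_SUFFIX;
    if (isCitation) {
      // Strip the suffix so the citation is keyed by its field name.
      const field = key.slice(0, -CITATION_SUFFIX.length);
      citations[field] = result[key] as FieldCitation;
    } else {
      values[key] = result[key];
    }
  }
  return { values, citations };
}

const { values, citations } = splitCitations({
  invoiceNumber: 'INV-001',
  invoiceNumber_citation: { page: 1, bbox: {} },
  total: 1250.0,
  total_citation: { page: 1, bbox: {} },
});
console.log(values);
console.log(citations['total']?.page);
```

A schema field whose own name ends in _citation would be misclassified by this approach, so such names are best avoided.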

Split Function

Detect and separate multiple documents in a single file.

Basic Usage

import { splitDocument } from '@docloai/providers-reducto';

const segments = await splitDocument(
  { base64: stackedPdfData },
  {
    apiKey: process.env.REDUCTO_API_KEY!,
    splitDescription: [
      { name: 'Invoice', description: 'Invoice document with totals' },
      { name: 'Receipt', description: 'Payment receipt' }
    ]
  }
);

// Returns array of document segments
for (const segment of segments) {
  console.log(`${segment.type}: pages ${segment.startPage}-${segment.endPage}`);
}
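
A quick sanity check after splitting is to see whether any pages were left unassigned. The sketch below does that using only the type, startPage, and endPage fields logged above. It assumes page numbers are 1-based and ranges are inclusive, which this page does not state, and the helper name findUncoveredPages and the totalPages argument are hypothetical.

```typescript
// Minimal view of a segment, limited to the fields used in the loop above.
interface SegmentRange {
  type: string;
  startPage: number;
  endPage: number;
}

// Hypothetical helper: returns the pages not covered by any segment.
// Assumes 1-based, inclusive page ranges.
function findUncoveredPages(segments: SegmentRange[], totalPages: number): number[] {
  const covered: boolean[] = [];
  for (let p = 0; p <= totalPages; p++) {
    covered.push(false);
  }
  for (const s of segments) {
    const first = Math.max(1, s.startPage);
    const last = Math.min(totalPages, s.endPage);
    for (let p = first; p <= last; p++) {
      covered[p] = true;
    }
  }
  const missing: number[] = [];
  for (let p = 1; p <= totalPages; p++) {
    if (!covered[p]) missing.push(p);
  }
  return missing;
}

// Made-up segments for a 7-page file.
const uncovered = findUncoveredPages(
  [
    { type: 'Invoice', startPage: 1, endPage: 3 },
    { type: 'Receipt', startPage: 5, endPage: 6 },
  ],
  7
);
console.log(uncovered);
```

Uncovered pages may point to a document type that is missing from splitDescription.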

Configuration Options

splitDocument(input, {
  apiKey: string,
  splitDescription: [
    {
      name: string,           // Document type name
      description: string     // Description for identification
    }
  ],
  partitionStrategy?: 'hi_res' | 'fast'
})

Common Document Types

import { COMMON_DOCUMENT_TYPES } from '@docloai/providers-reducto';

// Pre-defined document types
const segments = await splitDocument(
  { base64: documentData },
  {
    apiKey: process.env.REDUCTO_API_KEY!,
    splitDescription: [
      COMMON_DOCUMENT_TYPES.INVOICE,
      COMMON_DOCUMENT_TYPES.RECEIPT,
      COMMON_DOCUMENT_TYPES.CONTRACT
    ]
  }
);

Supported Formats

| Format | MIME Type |
| --- | --- |
| PDF | application/pdf |
| PNG | image/png |
| JPEG | image/jpeg |
| DOCX | application/vnd.openxmlformats-officedocument.wordprocessingml.document |
| XLSX | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
| PPTX | application/vnd.openxmlformats-officedocument.presentationml.presentation |
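
It can help to reject unsupported files before making a billable call. The following sketch maps file extensions to the MIME types listed above and builds a data URL like the one passed in the RAG pipeline example on this page. The helper names mimeTypeFor and toDataUrl are hypothetical, and the jpg alias is an addition based on the usual extension for image/jpeg.

```typescript
// MIME types copied from the table above, keyed by lowercase file extension.
const SUPPORTED_MIME_TYPES: Record<string, string> = {
  pdf: 'application/pdf',
  png: 'image/png',
  jpg: 'image/jpeg', // common alias, not listed separately in the table
  jpeg: 'image/jpeg',
  docx: 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
  xlsx: 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
  pptx: 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
};

// Hypothetical helper: returns the MIME type for a filename, or undefined if unsupported.
function mimeTypeFor(filename: string): string | undefined {
  const dot = filename.lastIndexOf('.');
  if (dot < 0) return undefined;
  const ext = filename.slice(dot + 1).toLowerCase();
  // Own-property check so names like "x.constructor" do not match inherited keys.
  return Object.prototype.hasOwnProperty.call(SUPPORTED_MIME_TYPES, ext)
    ? SUPPORTED_MIME_TYPES[ext]
    : undefined;
}

// Hypothetical helper: wraps an already base64-encoded payload in a data URL.
function toDataUrl(filename: string, base64Payload: string): string {
  const mime = mimeTypeFor(filename);
  if (mime === undefined) {
    throw new Error(`Unsupported file type: ${filename}`);
  }
  return `data:${mime};base64,${base64Payload}`;
}

console.log(mimeTypeFor('report.XLSX'));
console.log(toDataUrl('scan.jpg', 'AAAA'));
```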

Pricing

| Service | Credits | USD |
| --- | --- | --- |
| Parse | 1 credit/page | $0.004/page |
| Extract | 2 credits/page | $0.008/page |
| Split | 2 credits/page | $0.008/page |
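
The table implies a rate of $0.004 per credit, so a job's cost can be estimated from its page count and the services it uses. The sketch below does that arithmetic. The helper name estimateCost is hypothetical, and the code assumes every selected service is billed for every page, which may not hold for every workflow.

```typescript
// Credit prices copied from the pricing table above.
type ReductoService = 'parse' | 'extract' | 'split';

const CREDITS_PER_PAGE: Record<ReductoService, number> = {
  parse: 1,
  extract: 2,
  split: 2,
};

// Derived from the table: 1 credit/page corresponds to $0.004/page.
const USD_PER_CREDIT = 0.004;

// Hypothetical helper: estimates credits and USD for running the given
// services over every page of a document.
function estimateCost(
  pageCount: number,
  services: ReductoService[]
): { credits: number; usd: number } {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError('pageCount must be a non-negative integer');
  }
  const creditsPerPage = services.reduce((sum, s) => sum + CREDITS_PER_PAGE[s], 0);
  const credits = creditsPerPage * pageCount;
  // Round to four decimals to hide floating-point noise.
  const usd = Math.round(credits * USD_PER_CREDIT * 10000) / 10000;
  return { credits, usd };
}

// A 250-page stacked file that is split and then extracted.
const estimate = estimateCost(250, ['split', 'extract']);
console.log(`${estimate.credits} credits, about $${estimate.usd.toFixed(2)}`);
```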

Reducto vs Other Providers

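| Feature | Reducto | Surya | Marker |
| --- | --- | --- | --- |
| RAG chunking | Yes | No | No |
| Citations | Yes | Via bbox | No |
| Multi-format | Yes | Yes | Limited |
| Document splitting | Yes | No | No |
| Structured extraction | Native | Via LLM | Via LLM |
| Cost/page | $0.004+ | $0.01 | $0.002+ |

Choose Reducto when: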
  • Building RAG pipelines with chunking
  • You need field-level citations
  • Processing multi-document files
  • Working with spreadsheets/presentations

Example: RAG Pipeline

import { createFlow, parse } from '@docloai/flows';
import { reductoParseProvider } from '@docloai/providers-reducto';

const parseProvider = reductoParseProvider({
  apiKey: process.env.REDUCTO_API_KEY!,
  chunkMode: 'variable',
  targetChunkSize: 500
});

const flow = createFlow()
  .step('parse', parse({ provider: parseProvider }))
  .build();

const result = await flow.run({
  base64: 'data:application/pdf;base64,...'
});

// Get RAG-ready chunks
const chunks = result.output.extras?.chunks || [];

for (const [index, chunk] of chunks.entries()) {
  console.log(`Chunk (${chunk.blockType}, page ${chunk.pageNumber}):`);
  console.log(`  ${chunk.content.substring(0, 100)}...`);
  console.log(`  Confidence: ${chunk.confidence}`);

  // Store in vector DB
  // await vectorStore.upsert({
  //   id: `chunk-${chunk.pageNumber}-${index}`,
  //   content: chunk.content,
  //   metadata: { pageNumber: chunk.pageNumber, type: chunk.blockType }
  // });
}
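
The commented-out upsert call above builds each ID from the page number and a chunk index. Below is a minimal sketch of the same mapping as a pure function, which makes it easy to test apart from any vector store. The VectorRecord type, the toVectorRecords helper, and the docId prefix are hypothetical. The prefix is added so IDs stay unique across documents.

```typescript
// Same shape as the ReductoChunk interface documented on this page.
interface ReductoChunk {
  content: string;
  pageNumber: number;
  blockType: string;
  bbox: object;
  confidence: number;
}

// Hypothetical record shape, mirroring the commented-out upsert payload.
interface VectorRecord {
  id: string;
  content: string;
  metadata: { pageNumber: number; type: string };
}

// Hypothetical helper: turns chunks into records with deterministic IDs.
function toVectorRecords(docId: string, chunks: ReductoChunk[]): VectorRecord[] {
  return chunks.map((chunk, index) => ({
    id: `${docId}-chunk-${chunk.pageNumber}-${index}`,
    content: chunk.content,
    metadata: { pageNumber: chunk.pageNumber, type: chunk.blockType },
  }));
}

// Made-up chunks for illustration only.
const records = toVectorRecords('contract-42', [
  { content: 'Term: 12 months.', pageNumber: 1, blockType: 'text', bbox: {}, confidence: 0.97 },
  { content: '| Fee | Amount |', pageNumber: 2, blockType: 'table', bbox: {}, confidence: 0.93 },
]);
console.log(records.map((r) => r.id));
```

Deterministic IDs mean that re-ingesting the same document overwrites its earlier records instead of duplicating them, provided the store's upsert is keyed by ID.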

Example: Multi-Document Processing

import { splitDocument, reductoExtractProvider } from '@docloai/providers-reducto';
import { createFlow, extract } from '@docloai/flows';

// Split stacked document
const segments = await splitDocument(
  { base64: stackedDocument },
  {
    apiKey: process.env.REDUCTO_API_KEY!,
    splitDescription: [
      { name: 'Invoice', description: 'Invoice with line items' },
      { name: 'Receipt', description: 'Payment receipt' }
    ]
  }
);

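// Process each segment.
// invoiceSchema (see "Usage with Flows") and receiptSchema are JSON Schemas
// assumed to be defined by the caller.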
const results = [];
for (const segment of segments) {
  const flow = createFlow()
    .step('extract', extract({
      provider: reductoExtractProvider({
        apiKey: process.env.REDUCTO_API_KEY!,
        citations: true
      }),
      schema: segment.type === 'Invoice' ? invoiceSchema : receiptSchema
    }))
    .build();

  const result = await flow.run(segment.input);
  results.push({
    type: segment.type,
    pages: `${segment.startPage}-${segment.endPage}`,
    data: result.output
  });
}

console.log('Processed documents:', results);
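
The ternary in the loop above sends every non-Invoice segment to receiptSchema, which stops scaling once a third document type is added. One alternative is a lookup keyed by segment type that fails loudly on unknown types, sketched below. The JsonSchema type, the schemaFor helper, and the receipt fields are hypothetical and are not taken from the library.

```typescript
// Simplified JSON Schema shape, sufficient for this sketch.
type JsonSchema = { type: string; properties?: Record<string, JsonSchema> };

// Keys match the names passed in splitDescription.
const schemasByType: Record<string, JsonSchema> = {
  Invoice: {
    type: 'object',
    properties: {
      invoiceNumber: { type: 'string' },
      total: { type: 'number' },
      vendor: { type: 'string' },
    },
  },
  // Hypothetical receipt fields, for illustration only.
  Receipt: {
    type: 'object',
    properties: {
      merchant: { type: 'string' },
      amountPaid: { type: 'number' },
    },
  },
};

// Hypothetical helper: returns the schema for a segment type or throws.
function schemaFor(segmentType: string): JsonSchema {
  const schema = Object.prototype.hasOwnProperty.call(schemasByType, segmentType)
    ? schemasByType[segmentType]
    : undefined;
  if (schema === undefined) {
    throw new Error(`No schema registered for segment type "${segmentType}"`);
  }
  return schema;
}

console.log(Object.keys(schemaFor('Invoice').properties ?? {}));
```

In the loop, schema: schemaFor(segment.type) would then replace the ternary, and an unexpected segment type surfaces as an error instead of being extracted with the wrong schema.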
