Reducto

Reducto provides document parsing with RAG-optimized chunking, schema-based extraction with citations, and multi-document splitting.

Installation

npm install @doclo/providers-reducto

Providers Overview

Reducto offers three services:

Service	Function	Cost
Parse	Document parsing with chunking	$0.004/page
Extract	Schema-based extraction with citations	$0.008/page
Split	Multi-document file segmentation	$0.008/page

Parse Provider

Basic Setup

import { reductoParseProvider } from '@doclo/providers-reducto';

const ocrProvider = reductoParseProvider({
  apiKey: process.env.REDUCTO_API_KEY!
});

Configuration Options

reductoParseProvider({
  apiKey: string,              // Required: API key

  // Chunking
  chunkMode?: 'variable' | 'page' | 'block',
  targetChunkSize?: number,    // Target characters per chunk (for variable)

  // Table handling
  tableFormat?: 'markdown' | 'html' | 'dynamic',

  // Page selection
  pageRange?: [number, number] | number[],  // Page range or specific pages

  // Options
  forceOCR?: boolean,          // Force OCR even for native text
  extractImages?: boolean,     // Extract embedded images
  figureEnrichment?: boolean,  // AI-enhanced figure descriptions
})
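
The option list above notes that targetChunkSize is meant for variable chunking. A small client-side check can catch mismatched options before any request is made. This is only a sketch: ParseOptionsSketch and checkParseOptions are hypothetical names that mirror a subset of the options above, they are not exports of @doclo/providers-reducto, and how the provider itself treats such combinations is not documented here.

```typescript
// Local mirror of a few options listed above; not imported from the package.
type ChunkMode = 'variable' | 'page' | 'block';

interface ParseOptionsSketch {
  apiKey: string;
  chunkMode?: ChunkMode;
  targetChunkSize?: number;
}

// Returns human-readable problems; an empty array means nothing looked off.
function checkParseOptions(opts: ParseOptionsSketch): string[] {
  const problems: string[] = [];

  if (opts.apiKey.trim() === '') {
    problems.push('apiKey is empty');
  }

  if (opts.targetChunkSize !== undefined) {
    if (!Number.isInteger(opts.targetChunkSize) || opts.targetChunkSize <= 0) {
      problems.push('targetChunkSize should be a positive integer');
    }
    // Assumption: the size only has an effect in variable mode.
    if (opts.chunkMode !== undefined && opts.chunkMode !== 'variable') {
      problems.push(
        `targetChunkSize is documented for chunkMode "variable", got "${opts.chunkMode}"`
      );
    }
  }

  return problems;
}

console.log(checkParseOptions({ apiKey: 'key', chunkMode: 'page', targetChunkSize: 500 }));
```

Returning a list rather than throwing lets the caller decide whether a mismatch is a hard error or just a warning.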

Chunk Modes

Mode	Description
`variable`	Semantic chunking based on content boundaries
`page`	One chunk per page
`block`	One chunk per content block

// Variable chunking for RAG
const provider = reductoParseProvider({
  apiKey: process.env.REDUCTO_API_KEY!,
  chunkMode: 'variable',
  targetChunkSize: 1000
});

Output: DocumentIR with Chunks

interface DocumentIR {
  pages: {
    width: number;
    height: number;
    lines: { text: string; bbox?: object }[];
  }[];
  extras?: {
    raw: object;
    costUSD: number;
    pageCount: number;
    chunks?: ReductoChunk[];  // RAG-ready chunks
  };
}

interface ReductoChunk {
  content: string;            // Chunk text
  pageNumber: number;         // Source page
  blockType: string;          // 'text', 'table', 'figure', etc.
  bbox: object;               // Bounding box
  confidence: number;         // Confidence score
}
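
Because each chunk carries a page number and a confidence score, a common preprocessing step is to drop low-confidence chunks and group the rest by page. The following is a minimal sketch that assumes the ReductoChunk shape above. The helper name groupConfidentChunksByPage, the sample data, and the 0.8 threshold are illustrative, and the confidence scale is not specified on this page (the samples assume 0 to 1).

```typescript
// Redeclared locally so the sketch is self-contained.
interface ReductoChunk {
  content: string;
  pageNumber: number;
  blockType: string;
  bbox: object;
  confidence: number;
}

// Keep chunks at or above minConfidence and group the survivors by source page.
function groupConfidentChunksByPage(
  chunks: ReductoChunk[],
  minConfidence: number
): Map<number, ReductoChunk[]> {
  const byPage = new Map<number, ReductoChunk[]>();
  for (const chunk of chunks) {
    if (chunk.confidence < minConfidence) continue;
    const bucket = byPage.get(chunk.pageNumber);
    if (bucket) {
      bucket.push(chunk);
    } else {
      byPage.set(chunk.pageNumber, [chunk]);
    }
  }
  return byPage;
}

// Made-up sample chunks; bbox is left empty because its shape is not shown above.
const sampleChunks: ReductoChunk[] = [
  { content: 'Quarterly revenue grew.', pageNumber: 1, blockType: 'text', bbox: {}, confidence: 0.97 },
  { content: '| Q1 | Q2 |', pageNumber: 1, blockType: 'table', bbox: {}, confidence: 0.91 },
  { content: 'smudged footer', pageNumber: 2, blockType: 'text', bbox: {}, confidence: 0.42 },
  { content: 'Outlook for next year.', pageNumber: 3, blockType: 'text', bbox: {}, confidence: 0.88 },
];

const grouped = groupConfidentChunksByPage(sampleChunks, 0.8);
console.log(Array.from(grouped.keys())); // pages 1 and 3 remain
```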

Extract Provider

Schema-based extraction with field-level citations.

Basic Setup

import { reductoExtractProvider } from '@doclo/providers-reducto';

const extractProvider = reductoExtractProvider({
  apiKey: process.env.REDUCTO_API_KEY!,
  citations: true
});

Configuration Options

reductoExtractProvider({
  apiKey: string,              // Required: API key
  citations?: boolean,         // Include field citations (default: true)
  systemPrompt?: string,       // Custom system prompt
})

Usage with Flows

import { createFlow, extract } from '@doclo/flows';
import { reductoExtractProvider } from '@doclo/providers-reducto';

const extractProvider = reductoExtractProvider({
  apiKey: process.env.REDUCTO_API_KEY!,
  citations: true
});

const invoiceSchema = {
  type: 'object',
  properties: {
    invoiceNumber: { type: 'string' },
    total: { type: 'number' },
    vendor: { type: 'string' }
  }
};

const flow = createFlow()
  .step('extract', extract({
    provider: extractProvider,
    schema: invoiceSchema
  }))
  .build();
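
Extraction output arrives as untyped data, so it can help to narrow it before use. The sketch below pairs the invoiceSchema above with a hand-written TypeScript type guard. Invoice and isInvoice are hypothetical names, and the guard is stricter than the schema: it treats all three properties as required, which the schema above does not state.

```typescript
// Hand-written counterpart of invoiceSchema; kept in sync manually.
interface Invoice {
  invoiceNumber: string;
  total: number;
  vendor: string;
}

// Runtime check that narrows unknown data to Invoice.
// Assumption: all three fields must be present, which the schema does not require.
function isInvoice(value: unknown): value is Invoice {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.invoiceNumber === 'string' &&
    typeof record.total === 'number' &&
    Number.isFinite(record.total) &&
    typeof record.vendor === 'string'
  );
}

// Made-up candidate value standing in for an extraction result.
const candidate: unknown = { invoiceNumber: 'INV-001', total: 1250, vendor: 'Acme' };
if (isInvoice(candidate)) {
  // Inside this branch the compiler knows the field types.
  console.log(`${candidate.vendor}: ${candidate.total.toFixed(2)}`);
}
```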

Citations

When `citations` is set to `true`, each extracted field is accompanied by a `<field>_citation` entry that points to its source:

const result = await flow.run({ base64: documentData });

// Result includes citations for each field
// {
//   invoiceNumber: "INV-001",
//   invoiceNumber_citation: { page: 1, bbox: {...} },
//   total: 1250.00,
//   total_citation: { page: 1, bbox: {...} }
// }
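
The flat layout above mixes values and citations in one object. For downstream code it may be easier to pair each field with its citation. The following sketch assumes the `_citation` suffix convention shown in the example comment. FieldCitation, CitedField, and pairFieldsWithCitations are hypothetical names, and bbox is left as an empty object because its shape is elided above.

```typescript
interface FieldCitation {
  page: number;
  bbox: object;
}

interface CitedField {
  value: unknown;
  citation?: FieldCitation;
}

const CITATION_SUFFIX = '_citation';

// Turn { total, total_citation, ... } into { total: { value, citation }, ... }.
// Caveat: a real field whose own name ends in "_citation" would be skipped.
function pairFieldsWithCitations(flat: Record<string, unknown>): Record<string, CitedField> {
  const paired: Record<string, CitedField> = {};
  for (const [key, value] of Object.entries(flat)) {
    if (key.endsWith(CITATION_SUFFIX)) continue; // picked up with its field below
    paired[key] = {
      value,
      citation: flat[key + CITATION_SUFFIX] as FieldCitation | undefined,
    };
  }
  return paired;
}

// Made-up result in the flat shape from the example comment above.
const flatResult: Record<string, unknown> = {
  invoiceNumber: 'INV-001',
  invoiceNumber_citation: { page: 1, bbox: {} },
  total: 1250.0,
  total_citation: { page: 1, bbox: {} },
  vendor: 'Acme', // no citation, to show the optional case
};

const paired = pairFieldsWithCitations(flatResult);
console.log(paired.total.value, paired.total.citation?.page);
```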

Split Function

Detect and separate multiple documents in a single file.

Basic Usage

import { splitDocument } from '@doclo/providers-reducto';

const segments = await splitDocument(
  { base64: stackedPdfData },
  {
    apiKey: process.env.REDUCTO_API_KEY!,
    splitDescription: [
      { name: 'Invoice', description: 'Invoice document with totals' },
      { name: 'Receipt', description: 'Payment receipt' }
    ]
  }
);

// Returns array of document segments
for (const segment of segments) {
  console.log(`${segment.type}: pages ${segment.startPage}-${segment.endPage}`);
}
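
Once a file is split, a quick per-type summary helps verify the result before running extraction. This sketch uses only the type, startPage, and endPage fields used in the loop above. DocumentSegmentSketch and summarizeSegments are hypothetical names, the sample segments are made up, and endPage is assumed to be inclusive.

```typescript
// Subset of the segment fields used in the loop above.
interface DocumentSegmentSketch {
  type: string;
  startPage: number;
  endPage: number; // assumption: inclusive
}

interface TypeSummary {
  documents: number;
  pages: number;
}

// Count documents and pages per detected document type.
function summarizeSegments(segments: DocumentSegmentSketch[]): Map<string, TypeSummary> {
  const summary = new Map<string, TypeSummary>();
  for (const segment of segments) {
    const entry = summary.get(segment.type) ?? { documents: 0, pages: 0 };
    entry.documents += 1;
    entry.pages += segment.endPage - segment.startPage + 1;
    summary.set(segment.type, entry);
  }
  return summary;
}

const sampleSegments: DocumentSegmentSketch[] = [
  { type: 'Invoice', startPage: 1, endPage: 3 },
  { type: 'Receipt', startPage: 4, endPage: 4 },
  { type: 'Invoice', startPage: 5, endPage: 6 },
];

const segmentSummary = summarizeSegments(sampleSegments);
segmentSummary.forEach((entry, type) => {
  console.log(`${type}: ${entry.documents} document(s), ${entry.pages} page(s)`);
});
// Invoice: 2 document(s), 5 page(s)
// Receipt: 1 document(s), 1 page(s)
```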

Configuration Options

splitDocument(input, {
  apiKey: string,
  splitDescription: [
    {
      name: string,           // Document type name
      description: string     // Description for identification
    }
  ],
  partitionStrategy?: 'hi_res' | 'fast'
})

Common Document Types

import { COMMON_DOCUMENT_TYPES } from '@doclo/providers-reducto';

// Pre-defined document types
const segments = await splitDocument(
  { base64: documentData },
  {
    apiKey: process.env.REDUCTO_API_KEY!,
    splitDescription: [
      COMMON_DOCUMENT_TYPES.INVOICE,
      COMMON_DOCUMENT_TYPES.RECEIPT,
      COMMON_DOCUMENT_TYPES.CONTRACT
    ]
  }
);

Supported Formats

Format	MIME Type
PDF	`application/pdf`
PNG	`image/png`
JPEG	`image/jpeg`
DOCX	`application/vnd.openxmlformats-officedocument.wordprocessingml.document`
XLSX	`application/vnd.openxmlformats-officedocument.spreadsheetml.sheet`
PPTX	`application/vnd.openxmlformats-officedocument.presentationml.presentation`
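
To reject unsupported files before sending them, the table above can be turned into a lookup keyed by file extension. This is a sketch only: mimeTypeForFile is a hypothetical helper, not part of the package, and mapping both `.jpg` and `.jpeg` to `image/jpeg` is a convenience added here.

```typescript
// Extension-to-MIME lookup built from the Supported Formats table.
const SUPPORTED_MIME_TYPES = new Map<string, string>([
  ['pdf', 'application/pdf'],
  ['png', 'image/png'],
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['docx', 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'],
  ['xlsx', 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'],
  ['pptx', 'application/vnd.openxmlformats-officedocument.presentationml.presentation'],
]);

// Returns the MIME type for a supported file name, or undefined otherwise.
function mimeTypeForFile(fileName: string): string | undefined {
  const dot = fileName.lastIndexOf('.');
  if (dot < 0 || dot === fileName.length - 1) return undefined; // no extension
  const extension = fileName.slice(dot + 1).toLowerCase();
  return SUPPORTED_MIME_TYPES.get(extension);
}

console.log(mimeTypeForFile('scan.PDF'));  // application/pdf
console.log(mimeTypeForFile('notes.txt')); // undefined
```

The check looks at the file name only. It does not inspect file contents, so a mislabeled file would still pass.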

Pricing

Service	Credits	USD
Parse	1 credit/page	$0.004/page
Extract	2 credits/page	$0.008/page
Split	2 credits/page	$0.008/page
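
The table implies a rate of $0.004 per credit, so a job's cost can be estimated from its page count. The sketch below does that arithmetic. estimateCost is a hypothetical helper, and the rates are copied from the table above and may change.

```typescript
type ReductoService = 'parse' | 'extract' | 'split';

// Credits per page, copied from the pricing table.
const CREDITS_PER_PAGE: Record<ReductoService, number> = {
  parse: 1,
  extract: 2,
  split: 2,
};

// Implied by the table: 1 credit/page corresponds to $0.004/page.
const USD_PER_CREDIT = 0.004;

function estimateCost(
  service: ReductoService,
  pageCount: number
): { credits: number; usd: number } {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError('pageCount must be a non-negative integer');
  }
  const credits = CREDITS_PER_PAGE[service] * pageCount;
  // Round to a tenth of a cent to hide floating-point noise.
  const usd = Math.round(credits * USD_PER_CREDIT * 1000) / 1000;
  return { credits, usd };
}

// Splitting and then extracting a 250-page file: 500 credits for each step.
const splitCost = estimateCost('split', 250);
const extractCost = estimateCost('extract', 250);
console.log(splitCost.credits + extractCost.credits); // 1000
```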

Reducto vs Other Providers

Feature	Reducto	Surya	Marker
RAG chunking	Yes	No	No
Citations	Yes	Via bbox	No
Multi-format	Yes	Yes	Limited
Document splitting	Yes	No	No
Structured extraction	Native	Via LLM	Via LLM
Cost/page	$0.004+	$0.01	$0.002+

Choose Reducto when:

You are building RAG pipelines that need chunking
You need field-level citations
You are processing multi-document files
You are working with spreadsheets or presentations

Example: RAG Pipeline

import { createFlow, parse } from '@doclo/flows';
import { reductoParseProvider } from '@doclo/providers-reducto';

const parseProvider = reductoParseProvider({
  apiKey: process.env.REDUCTO_API_KEY!,
  chunkMode: 'variable',
  targetChunkSize: 500
});

const flow = createFlow()
  .step('parse', parse({ provider: parseProvider }))
  .build();

const result = await flow.run({
  base64: 'data:application/pdf;base64,...'
});

// Get RAG-ready chunks
const chunks = result.output.extras?.chunks || [];

for (const [index, chunk] of chunks.entries()) {
  console.log(`Chunk (${chunk.blockType}, page ${chunk.pageNumber}):`);
  console.log(`  ${chunk.content.substring(0, 100)}...`);
  console.log(`  Confidence: ${chunk.confidence}`);

  // Store in vector DB
  // await vectorStore.upsert({
  //   id: `chunk-${chunk.pageNumber}-${index}`,
  //   content: chunk.content,
  //   metadata: { pageNumber: chunk.pageNumber, type: chunk.blockType }
  // });
}
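
The commented upsert call above builds an ID from the page number and an index. One way to prepare such records up front, before talking to any vector store, is sketched below. VectorRecord and toVectorRecords are hypothetical names, the documentId prefix is an addition made here to keep IDs unique across files, and no particular vector database API is assumed.

```typescript
// Redeclared locally so the sketch is self-contained.
interface ReductoChunk {
  content: string;
  pageNumber: number;
  blockType: string;
  bbox: object;
  confidence: number;
}

// Record shape taken from the commented upsert call above.
interface VectorRecord {
  id: string;
  content: string;
  metadata: { pageNumber: number; type: string };
}

// Skip empty chunks but keep each chunk's original index in its ID,
// so IDs stay stable even when some chunks are filtered out.
function toVectorRecords(chunks: ReductoChunk[], documentId: string): VectorRecord[] {
  return chunks
    .map((chunk, index) => ({ chunk, index }))
    .filter(({ chunk }) => chunk.content.trim().length > 0)
    .map(({ chunk, index }) => ({
      id: `${documentId}-chunk-${chunk.pageNumber}-${index}`,
      content: chunk.content,
      metadata: { pageNumber: chunk.pageNumber, type: chunk.blockType },
    }));
}

// Made-up sample chunks.
const parsedChunks: ReductoChunk[] = [
  { content: 'Introduction', pageNumber: 1, blockType: 'text', bbox: {}, confidence: 0.99 },
  { content: '   ', pageNumber: 1, blockType: 'text', bbox: {}, confidence: 0.5 },
  { content: 'Totals table', pageNumber: 2, blockType: 'table', bbox: {}, confidence: 0.93 },
];

const vectorRecords = toVectorRecords(parsedChunks, 'report-2024');
console.log(vectorRecords.map((record) => record.id));
// [ 'report-2024-chunk-1-0', 'report-2024-chunk-2-2' ]
```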

Example: Multi-Document Processing

import { splitDocument, reductoExtractProvider } from '@doclo/providers-reducto';
import { createFlow, extract } from '@doclo/flows';

// Split stacked document
const segments = await splitDocument(
  { base64: stackedDocument },
  {
    apiKey: process.env.REDUCTO_API_KEY!,
    splitDescription: [
      { name: 'Invoice', description: 'Invoice with line items' },
      { name: 'Receipt', description: 'Payment receipt' }
    ]
  }
);

// Process each segment.
// invoiceSchema and receiptSchema are JSON Schemas assumed to be defined elsewhere.
const results = [];
for (const segment of segments) {
  const flow = createFlow()
    .step('extract', extract({
      provider: reductoExtractProvider({
        apiKey: process.env.REDUCTO_API_KEY!,
        citations: true
      }),
      schema: segment.type === 'Invoice' ? invoiceSchema : receiptSchema
    }))
    .build();

  const result = await flow.run(segment.input);
  results.push({
    type: segment.type,
    pages: `${segment.startPage}-${segment.endPage}`,
    data: result.output
  });
}

console.log('Processed documents:', results);
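
The ternary on segment.type above covers only two document types and silently falls back to the receipt schema for anything else. A lookup keyed by type name scales better and fails loudly on unknown types. The sketch below is one way to do that. schemaForSegment and JsonSchema are hypothetical names, and the receiptSchema fields are invented for illustration since the page does not define that schema.

```typescript
type JsonSchema = Record<string, unknown>;

// Same shape as the invoiceSchema shown under "Usage with Flows".
const invoiceSchema: JsonSchema = {
  type: 'object',
  properties: {
    invoiceNumber: { type: 'string' },
    total: { type: 'number' },
    vendor: { type: 'string' },
  },
};

// Hypothetical receipt schema; the field names are made up for this sketch.
const receiptSchema: JsonSchema = {
  type: 'object',
  properties: {
    merchant: { type: 'string' },
    amountPaid: { type: 'number' },
  },
};

// Keys match the names passed in splitDescription.
const SCHEMAS_BY_TYPE = new Map<string, JsonSchema>([
  ['Invoice', invoiceSchema],
  ['Receipt', receiptSchema],
]);

// Look up the schema for a segment type, failing loudly on unknown types.
function schemaForSegment(segmentType: string): JsonSchema {
  const schema = SCHEMAS_BY_TYPE.get(segmentType);
  if (schema === undefined) {
    throw new Error(`No schema registered for segment type "${segmentType}"`);
  }
  return schema;
}

console.log(Object.keys(schemaForSegment('Invoice').properties as object));
// [ 'invoiceNumber', 'total', 'vendor' ]
```

In the loop above, `schema: schemaForSegment(segment.type)` would then replace the ternary.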

Next Steps

Surya OCR

Marker