The SDK includes pre-built flows for common document processing patterns. These flows handle provider configuration, error handling, and retries out of the box.

VLM Direct Flow

Skip OCR and send documents directly to a Vision Language Model:
import { buildVLMDirectFlow } from '@docloai/flows';

const flow = buildVLMDirectFlow({
  llmConfigs: [
    {
      provider: 'google',
      model: 'gemini-2.5-flash',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }
  ]
});

const result = await flow.run({ base64: documentData });
console.log(result.output);

Configuration

buildVLMDirectFlow({
  // Required: LLM provider configurations
  llmConfigs: [
    {
      provider: 'openai' | 'anthropic' | 'google' | 'xai',
      model: string,
      apiKey: string,
      via?: 'openrouter' | 'native',
      baseUrl?: string
    }
  ],

  // Optional: Retry settings
  maxRetries?: number,              // Default: 2
  retryDelay?: number,              // Default: 1000ms
  circuitBreakerThreshold?: number  // Default: 3
})
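
For example, a setup tuned for resilience might add a second config and tighten the retry budget. The values below are illustrative, and the fallback ordering assumes llmConfigs is treated as a priority list, as documented for the Multi-Provider flow:
import { buildVLMDirectFlow } from '@docloai/flows';

const flow = buildVLMDirectFlow({
  llmConfigs: [
    {
      provider: 'google',
      model: 'gemini-2.5-flash',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    },
    {
      // Assumed fallback: tried only if the first config fails
      provider: 'openai',
      model: 'gpt-4.1',
      apiKey: process.env.OPENAI_API_KEY!,
      via: 'native'
    }
  ],
  maxRetries: 1,               // fewer retries per provider, faster failover
  retryDelay: 500,             // wait 500ms between retries
  circuitBreakerThreshold: 3   // mark a provider "open" after 3 failures
});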

When to Use

VLM Direct is ideal when:
  • Documents have complex visual layouts (tables, forms, charts)
  • Speed is more important than cost
  • OCR might introduce errors (handwritten text, unusual fonts)
It is a poorer fit when:
  • Documents are text-heavy with simple layouts
  • Cost is a primary concern (vision tokens are more expensive)
  • You need to process very large documents

Output

interface VLMDirectResult {
  output: {
    vessel: string | null;
    port: string | null;
    quantity_mt: number | null;
  };
  metrics: StepMetric[];
  artifacts: {
    vlm_extract: unknown;
  };
}
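
Every field in output is nullable, so a minimal consumer should guard before use:
const result = await flow.run({ base64: documentData });
const { output } = result;

// Each field may be null if the model could not find it
if (output.vessel !== null && output.quantity_mt !== null) {
  console.log(`${output.vessel}: ${output.quantity_mt} MT via ${output.port ?? 'unknown port'}`);
} else {
  console.warn('Extraction incomplete:', output);
}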

Multi-Provider Flow

OCR + LLM extraction with automatic provider fallback:
import { buildMultiProviderFlow } from '@docloai/flows';
import { createOCRProvider } from '@docloai/providers-llm';

const ocrProvider = createOCRProvider({
  provider: 'surya',
  endpoint: process.env.SURYA_ENDPOINT!,
  apiKey: process.env.SURYA_API_KEY!
});

const flow = buildMultiProviderFlow({
  ocr: ocrProvider,
  llmConfigs: [
    {
      provider: 'openai',
      model: 'gpt-4.1',
      apiKey: process.env.OPENAI_API_KEY!
    },
    {
      provider: 'anthropic',
      model: 'claude-haiku-4.5',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    },
    {
      provider: 'google',
      model: 'gemini-2.5-flash',
      apiKey: process.env.GOOGLE_API_KEY!
    }
  ],
  maxRetries: 2
});

const result = await flow.run({ base64: pdfData });

Configuration

buildMultiProviderFlow({
  // Required: OCR provider
  ocr: OCRProvider,

  // Required: LLM provider configurations (in priority order)
  llmConfigs: [
    {
      provider: 'openai' | 'anthropic' | 'google' | 'xai',
      model: string,
      apiKey: string,
      via?: 'openrouter' | 'native',
      baseUrl?: string
    }
  ],

  // Optional: Retry and fallback settings
  maxRetries?: number,              // Default: 2
  retryDelay?: number,              // Default: 1000ms
  circuitBreakerThreshold?: number  // Default: 3
})

Provider Fallback

Providers are tried in priority order; if one exhausts its retries, the flow falls back to the next:
Request → OpenAI (retry 1) → OpenAI (retry 2) → Anthropic → Google → Error
The circuit breaker prevents repeated calls to failing providers (a simplified sketch follows this list):
  • After circuitBreakerThreshold failures, the provider is marked “open”
  • Open providers are skipped until the breaker resets
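
The SDK's internal breaker is not exposed, but the pattern is small enough to sketch. This standalone example (not the SDK's actual implementation) shows the two rules above:
class CircuitBreaker {
  private failures = 0;

  constructor(private threshold = 3) {}

  get isOpen(): boolean {
    return this.failures >= this.threshold;  // "open" = skip this provider
  }

  recordSuccess(): void {
    this.failures = 0;  // a success resets the breaker
  }

  recordFailure(): void {
    this.failures += 1;
  }
}

// Try providers in priority order, skipping any whose breaker is open
async function callWithFallback<T>(
  providers: { call: () => Promise<T>; breaker: CircuitBreaker }[]
): Promise<T> {
  for (const p of providers) {
    if (p.breaker.isOpen) continue;
    try {
      const result = await p.call();
      p.breaker.recordSuccess();
      return result;
    } catch {
      p.breaker.recordFailure();  // fall through to the next provider
    }
  }
  throw new Error('All providers failed or are open');
}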

Output

interface MultiProviderResult {
  ir: DocumentIR;                   // Parsed document
  output: {
    vessel: string | null;
    port: string | null;
    quantity_mt: number | null;
  };
  metrics: StepMetric[];
  artifacts: {
    parse: unknown;
    extract: unknown;
  };
}
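
Because metrics and artifacts are loosely typed (StepMetric[] and unknown), the safest way to inspect a run is to dump them rather than assume their shape:
const result = await flow.run({ base64: pdfData });

console.log('Extracted:', result.output);
console.table(result.metrics);  // one entry per flow step
console.log(JSON.stringify(result.artifacts.extract, null, 2));  // raw extract artifact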

Two-Provider Flow

Compare extraction results from two LLM providers:
import { buildTwoProviderFlow } from '@docloai/flows';
import { createOCRProvider, createLLMProvider } from '@docloai/providers-llm';

const ocrProvider = createOCRProvider({
  provider: 'surya',
  endpoint: process.env.SURYA_ENDPOINT!,
  apiKey: process.env.SURYA_API_KEY!
});

const llmA = createLLMProvider({
  provider: 'openai',
  model: 'gpt-4.1',
  apiKey: process.env.OPENAI_API_KEY!
});

const llmB = createLLMProvider({
  provider: 'anthropic',
  model: 'claude-sonnet-4',
  apiKey: process.env.ANTHROPIC_API_KEY!
});

const flow = buildTwoProviderFlow({
  ocr: ocrProvider,
  llmA: llmA,
  llmB: llmB
});

const result = await flow.run({ base64: pdfData });

console.log('Provider A:', result.outputA);
console.log('Provider B:', result.outputB);

Use Cases

  • Quality validation: Compare results to detect extraction errors
  • A/B testing: Evaluate provider performance on your documents
  • Consensus: Use matching results as high-confidence extractions (see the sketch below)
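
For the consensus case, an agreement check needs only the documented output fields. Exact equality is the simplest policy; fuzzy matching or per-field confidence thresholds are left to you:
const result = await flow.run({ base64: pdfData });
const { outputA, outputB } = result;

// Treat the extraction as high-confidence only when both providers agree
const agree =
  outputA.vessel === outputB.vessel &&
  outputA.port === outputB.port &&
  outputA.quantity_mt === outputB.quantity_mt;

if (agree) {
  console.log('High-confidence extraction:', outputA);
} else {
  console.warn('Providers disagree; flag for review', { outputA, outputB });
}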

Output

interface TwoProviderResult {
  ir: DocumentIR;
  outputA: {
    vessel: string | null;
    port: string | null;
    quantity_mt: number | null;
  };
  outputB: {
    vessel: string | null;
    port: string | null;
    quantity_mt: number | null;
  };
  metrics: StepMetric[];
  artifacts: {
    parse: unknown;
    extractA: unknown;
    extractB: unknown;
  };
}

Building Custom Flows

Use createFlow() to build your own reusable flows:
import { createFlow, parse, extract } from '@docloai/flows';
import { createVLMProvider, createOCRProvider } from '@docloai/providers-llm';

export function buildInvoiceFlow(config: {
  vlmApiKey: string;
  ocrEndpoint: string;
  ocrApiKey: string;
}) {
  const vlmProvider = createVLMProvider({
    provider: 'google',
    model: 'google/gemini-2.5-flash',
    apiKey: config.vlmApiKey,
    via: 'openrouter'
  });

  const ocrProvider = createOCRProvider({
    provider: 'surya',
    endpoint: config.ocrEndpoint,
    apiKey: config.ocrApiKey
  });

  const invoiceSchema = {
    type: 'object',
    properties: {
      invoiceNumber: { type: 'string' },
      date: { type: 'string' },
      vendor: { type: 'string' },
      total: { type: 'number' },
      lineItems: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            description: { type: 'string' },
            quantity: { type: 'number' },
            unitPrice: { type: 'number' },
            amount: { type: 'number' }
          }
        }
      }
    }
  };

  return createFlow()
    .acceptFormats(['application/pdf', 'image/jpeg', 'image/png'])
    .step('parse', parse({ provider: ocrProvider }))
    .step('extract', extract({
      provider: vlmProvider,
      schema: invoiceSchema,
      // Run the extraction 3 times and keep the majority result
      consensus: { runs: 3, strategy: 'majority' }
    }))
    .build();
}

// Usage
const invoiceFlow = buildInvoiceFlow({
  vlmApiKey: process.env.OPENROUTER_API_KEY!,
  ocrEndpoint: process.env.SURYA_ENDPOINT!,
  ocrApiKey: process.env.SURYA_API_KEY!
});

const result = await invoiceFlow.run({ base64: invoicePdf });

Comparing Approaches

Flow             OCR            Speed    Cost     Best For
VLM Direct       No             Fast     Higher   Visual documents, forms
Multi-Provider   Yes            Medium   Lower    Text documents, fallback
Two-Provider     Yes            Slower   Higher   Validation, comparison
Custom           Configurable   Varies   Varies   Specific requirements
