Production document processing requires handling failures gracefully. This guide covers the error hierarchy, retry patterns, provider fallbacks, and strategies for building resilient extraction pipelines.

Error Hierarchy

The SDK provides a structured error hierarchy for precise error handling:
import {
  DocloError,           // Base error class
  AuthenticationError,  // Invalid API key (401)
  AuthorizationError,   // Insufficient permissions (403)
  NotFoundError,        // Resource not found (404)
  ValidationError,      // Invalid input (400)
  RateLimitError,       // Rate limit exceeded (429)
  TimeoutError,         // Request timed out (408)
  NetworkError,         // Connection issues
  ExecutionError,       // Flow execution failed
  InvalidApiKeyError    // API key format invalid
} from '@doclo/client';

Error Properties

All errors extend DocloError and include:
interface DocloError {
  name: string;                    // Error class name
  code: string;                    // Error code (e.g., 'RATE_LIMIT_EXCEEDED')
  message: string;                 // Human-readable message
  statusCode?: number;             // HTTP status code
  details?: Record<string, any>;   // Additional context
}
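The same fields are available on every SDK error, which makes centralized logging straightforward. A small helper you might use in any catch block (a sketch; adapt the logging to your own monitoring setup):
import { DocloError } from '@doclo/client';

function logDocloError(error: unknown): void {
  if (error instanceof DocloError) {
    // Structured fields make routing and alerting straightforward
    console.error({
      name: error.name,
      code: error.code,
      statusCode: error.statusCode,
      message: error.message,
      details: error.details
    });
  } else {
    console.error('Non-SDK error:', error);
  }
}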

Error Codes

Common error codes:
Code                     Description
INVALID_API_KEY          API key is invalid or malformed
API_KEY_REVOKED          API key has been revoked
FLOW_NOT_FOUND           Specified flow does not exist
EXECUTION_NOT_FOUND      Execution ID not found
INVALID_INPUT            Input validation failed
RATE_LIMIT_EXCEEDED      Too many requests
EXECUTION_TIMEOUT        Execution took too long
PROVIDER_ERROR           LLM/OCR provider failed
PROVIDER_RATE_LIMITED    Provider rate limit hit
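These codes also drive retry decisions: some indicate transient conditions, others mean the request itself must change. A small classifier sketch using the codes above (the split shown is a reasonable default, not an exhaustive rule):
function isRetryableCode(code: string): boolean {
  // Transient conditions: usually safe to retry with backoff
  return [
    'RATE_LIMIT_EXCEEDED',
    'PROVIDER_RATE_LIMITED',
    'EXECUTION_TIMEOUT',
    'PROVIDER_ERROR'
  ].includes(code);
}

function isPermanentCode(code: string): boolean {
  // Permanent conditions: fix the input or configuration instead of retrying
  return [
    'INVALID_API_KEY',
    'API_KEY_REVOKED',
    'FLOW_NOT_FOUND',
    'EXECUTION_NOT_FOUND',
    'INVALID_INPUT'
  ].includes(code);
}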

Basic Error Handling

Handle errors by type:
import {
  DocloClient,
  AuthenticationError,
  ValidationError,
  RateLimitError,
  TimeoutError,
  NotFoundError,
  ExecutionError
} from '@doclo/client';

const client = new DocloClient({
  apiKey: process.env.DOCLO_API_KEY!
});

async function processDocument(flowId: string, document: any) {
  try {
    const result = await client.flows.run(flowId, {
      input: { document },
      wait: true,
      timeout: 30000
    });
    return result.output;

  } catch (error) {
    if (error instanceof AuthenticationError) {
      // API key issues - likely configuration problem
      console.error('Authentication failed:', error.message);
      throw new Error('Service configuration error');
    }

    if (error instanceof ValidationError) {
      // Bad input - don't retry, fix the input
      console.error('Invalid input:', error.message);
      throw new Error(`Invalid document: ${error.message}`);
    }

    if (error instanceof RateLimitError) {
      // Rate limited - retry after delay
      const retryAfter = error.rateLimitInfo?.retryAfter || 60;
      console.warn(`Rate limited. Retry after ${retryAfter}s`);
      throw error;  // Let caller handle retry
    }

    if (error instanceof TimeoutError) {
      // Timeout - consider async processing
      console.warn('Processing timed out');
      throw new Error('Document processing timed out. Try async processing.');
    }

    if (error instanceof NotFoundError) {
      // Flow doesn't exist
      console.error('Flow not found:', flowId);
      throw new Error('Processing flow not configured');
    }

    if (error instanceof ExecutionError) {
      // Flow execution failed
      console.error('Execution failed:', error.executionId, error.message);
      throw error;
    }

    // Unknown error
    console.error('Unexpected error:', error);
    throw error;
  }
}

Retry with Exponential Backoff

Implement automatic retries for transient failures:
interface RetryOptions {
  maxAttempts: number;
  initialDelayMs: number;
  maxDelayMs: number;
  backoffMultiplier: number;
  retryableErrors: string[];
}

const defaultRetryOptions: RetryOptions = {
  maxAttempts: 3,
  initialDelayMs: 1000,
  maxDelayMs: 30000,
  backoffMultiplier: 2,
  retryableErrors: [
    'RATE_LIMIT_EXCEEDED',
    'PROVIDER_RATE_LIMITED',
    'NETWORK_ERROR',
    'TIMEOUT',
    'PROVIDER_ERROR'
  ]
};

async function withRetry<T>(
  fn: () => Promise<T>,
  options: Partial<RetryOptions> = {}
): Promise<T> {
  const config = { ...defaultRetryOptions, ...options };
  let lastError: Error | undefined;

  for (let attempt = 1; attempt <= config.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error as Error;

      // Check if error is retryable
      const errorCode = (error as any).code;
      const isRetryable = config.retryableErrors.includes(errorCode);

      if (!isRetryable || attempt === config.maxAttempts) {
        throw error;
      }

      // Calculate delay with exponential backoff
      const delay = Math.min(
        config.initialDelayMs * Math.pow(config.backoffMultiplier, attempt - 1),
        config.maxDelayMs
      );

      // Add jitter to prevent thundering herd
      const jitter = delay * 0.1 * Math.random();
      const totalDelay = delay + jitter;

      console.warn(
        `Attempt ${attempt} failed with ${errorCode}. ` +
        `Retrying in ${Math.round(totalDelay)}ms...`
      );

      await sleep(totalDelay);
    }
  }

  throw lastError;
}

function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}
Usage:
const result = await withRetry(
  () => client.flows.run(flowId, { input: { document }, wait: true }),
  { maxAttempts: 3, initialDelayMs: 2000 }
);

Rate Limit Handling

Handle rate limits with proper backoff:
import { RateLimitError } from '@doclo/client';

async function processWithRateLimitHandling(
  flowId: string,
  document: any
): Promise<any> {
  const maxRetries = 5;

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await client.flows.run(flowId, {
        input: { document },
        wait: true
      });
    } catch (error) {
      if (error instanceof RateLimitError) {
        if (attempt === maxRetries) {
          throw new Error('Rate limit exceeded after maximum retries');
        }

        // Use server-provided retry-after if available
        const retryAfter = error.rateLimitInfo?.retryAfter || (attempt * 30);

        console.warn(`Rate limited. Waiting ${retryAfter}s before retry ${attempt + 1}/${maxRetries}`);
        await sleep(retryAfter * 1000);
        continue;
      }
      throw error;
    }
  }
}

Provider Fallback

Use multiple providers with automatic fallback:
import { buildLLMProvider } from '@doclo/providers-llm';

// Create provider with fallback chain
const provider = buildLLMProvider({
  providers: [
    {
      provider: 'google',
      model: 'google/gemini-2.5-flash',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    },
    {
      provider: 'anthropic',
      model: 'anthropic/claude-sonnet-4.5',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    },
    {
      provider: 'openai',
      model: 'openai/gpt-4.1',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }
  ],
  maxRetries: 2,              // Retries per provider
  retryDelay: 1000,           // Base retry delay
  useExponentialBackoff: true,
  circuitBreakerThreshold: 3  // Failures before skipping provider
});
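The resulting provider drops into an extraction step exactly like a single provider; when one model keeps failing, the chain moves on to the next. A minimal usage sketch (schema and documentBase64 are placeholders for your own values):
import { createFlow, extract } from '@doclo/flows';

const flow = createFlow({
  observability: {
    onProviderRetry: (ctx) => {
      console.warn(`Retrying ${ctx.provider} (attempt ${ctx.attemptNumber})`);
    }
  }
})
  .step('extract', extract({ provider, schema }))
  .build();

const result = await flow.run({ base64: documentBase64 });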

Circuit Breaker Pattern

Prevent cascading failures by temporarily disabling failing providers:
interface CircuitBreaker {
  failures: number;
  lastFailure: number;
  isOpen: boolean;
}

class ProviderCircuitBreaker {
  private breakers: Map<string, CircuitBreaker> = new Map();
  private threshold: number;
  private resetTimeMs: number;

  constructor(threshold = 3, resetTimeMs = 60000) {
    this.threshold = threshold;
    this.resetTimeMs = resetTimeMs;
  }

  recordFailure(providerId: string): void {
    const breaker = this.breakers.get(providerId) || {
      failures: 0,
      lastFailure: 0,
      isOpen: false
    };

    breaker.failures++;
    breaker.lastFailure = Date.now();

    if (breaker.failures >= this.threshold) {
      breaker.isOpen = true;
      console.warn(`Circuit breaker opened for provider: ${providerId}`);
    }

    this.breakers.set(providerId, breaker);
  }

  recordSuccess(providerId: string): void {
    this.breakers.delete(providerId);
  }

  isAvailable(providerId: string): boolean {
    const breaker = this.breakers.get(providerId);

    if (!breaker) return true;
    if (!breaker.isOpen) return true;

    // Check if reset time has passed
    if (Date.now() - breaker.lastFailure > this.resetTimeMs) {
      // Reset and allow retry
      breaker.isOpen = false;
      breaker.failures = 0;
      return true;
    }

    return false;
  }
}

const circuitBreaker = new ProviderCircuitBreaker();

async function extractWithFallback(
  providers: string[],
  document: any,
  schema: any
): Promise<any> {
  for (const providerId of providers) {
    if (!circuitBreaker.isAvailable(providerId)) {
      console.log(`Skipping provider ${providerId} (circuit open)`);
      continue;
    }

    try {
      // extractWith is a user-supplied helper that calls the given provider (not shown here)
      const result = await extractWith(providerId, document, schema);
      circuitBreaker.recordSuccess(providerId);
      return result;
    } catch (error) {
      console.error(`Provider ${providerId} failed:`, error);
      circuitBreaker.recordFailure(providerId);
    }
  }

  throw new Error('All providers failed');
}
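A hypothetical usage of the fallback helper; the provider IDs, documentBase64, and invoiceSchema below are placeholders for whatever your extractWith helper understands:
const data = await extractWithFallback(
  ['gemini-2.5-flash', 'claude-sonnet-4.5', 'gpt-4.1'],
  documentBase64,
  invoiceSchema
);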

Partial Results Recovery

When a flow fails mid-execution, recover partial results:
import { FlowExecutionError } from '@doclo/core';

async function processWithPartialRecovery(
  flow: any,
  input: any
): Promise<any> {
  try {
    return await flow.run(input);
  } catch (error) {
    if (error instanceof FlowExecutionError) {
      console.error(`Flow failed at step: ${error.stepId}`);

      // Access results from completed steps
      const artifacts = error.artifacts;

      if (artifacts) {
        console.log('Completed steps:', Object.keys(artifacts));

        // Return partial results
        return {
          partial: true,
          failedAt: error.stepId,
          completedSteps: Object.keys(artifacts),
          results: artifacts
        };
      }
    }
    throw error;
  }
}
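Callers can then check the partial flag and decide whether the completed steps are worth keeping. A short usage sketch (flow and documentBase64 are placeholders for a flow built as elsewhere in this guide):
const outcome = await processWithPartialRecovery(flow, { base64: documentBase64 });

if (outcome.partial) {
  console.warn(`Partial result only, failed at step: ${outcome.failedAt}`);
  // e.g. persist the completed steps and queue the document for reprocessing
} else {
  console.log('Completed:', outcome.output);
}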

Graceful Degradation

Implement degraded modes when providers fail:
interface ExtractionResult {
  data: any;
  confidence: 'high' | 'medium' | 'low';
  mode: 'full' | 'degraded' | 'fallback';
}

async function extractWithDegradation(
  document: any,
  schema: any
): Promise<ExtractionResult> {
  // Attempt 1: Full extraction with VLM
  try {
    const result = await vlmFlow.run({ base64: document });
    return {
      data: result.output,
      confidence: 'high',
      mode: 'full'
    };
  } catch (error) {
    console.warn('VLM extraction failed, trying OCR + LLM');
  }

  // Attempt 2: Degraded mode - OCR then LLM
  try {
    const result = await ocrLlmFlow.run({ base64: document });
    return {
      data: result.output,
      confidence: 'medium',
      mode: 'degraded'
    };
  } catch (error) {
    console.warn('OCR + LLM failed, trying basic extraction');
  }

  // Attempt 3: Fallback - basic text extraction only
  try {
    const result = await basicOcrFlow.run({ base64: document });
    return {
      data: { rawText: result.output.text },
      confidence: 'low',
      mode: 'fallback'
    };
  } catch (error) {
    throw new Error('All extraction methods failed');
  }
}
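The three flows above (vlmFlow, ocrLlmFlow, basicOcrFlow) are assumed to exist already; how you build them depends on your providers. One possible sketch using the builder API shown elsewhere in this guide (vlmProvider, llmProvider, ocrProvider, and schema are placeholders, and createFlow is assumed to accept an empty options object):
import { createFlow, parse, extract } from '@doclo/flows';

// Full mode: a vision-capable model extracts directly from the document
const vlmFlow = createFlow({})
  .step('extract', extract({ provider: vlmProvider, schema }))
  .build();

// Degraded mode: OCR first, then an LLM extracts from the parsed text
const ocrLlmFlow = createFlow({})
  .step('parse', parse({ provider: ocrProvider }))
  .step('extract', extract({ provider: llmProvider, schema }))
  .build();

// Fallback mode: OCR only, returning raw text
const basicOcrFlow = createFlow({})
  .step('parse', parse({ provider: ocrProvider }))
  .build();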

Timeout Handling

Handle long-running extractions appropriately:
async function processWithTimeoutHandling(
  flowId: string,
  document: any
): Promise<any> {
  // Try sync first with short timeout
  try {
    return await client.flows.run(flowId, {
      input: { document },
      wait: true,
      timeout: 30000  // 30 seconds
    });
  } catch (error) {
    if (error instanceof TimeoutError) {
      console.log('Sync timeout, switching to async');

      // Fall back to async processing
      const execution = await client.flows.run(flowId, {
        input: { document },
        webhookUrl: process.env.WEBHOOK_URL
      });

      console.log('Started async execution:', execution.id);

      // Either wait for webhook or poll
      return await client.runs.waitForCompletion(execution.id, {
        interval: 5000,
        timeout: 300000  // 5 minutes
      });
    }
    throw error;
  }
}
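If you rely on the webhook rather than polling, you need an endpoint to receive the completion event. A minimal receiver sketch using Express; the payload shape is not documented in this guide, so the handler only acknowledges and logs the body:
import express from 'express';

const app = express();
app.use(express.json({ limit: '5mb' }));

app.post('/webhooks/doclo', (req, res) => {
  // Acknowledge quickly; do any heavy processing asynchronously
  console.log('Execution update received:', req.body);
  res.status(200).send('ok');
});

app.listen(3000);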

Observability Hooks for Error Tracking

Use hooks to track errors across your flows:
import { createFlow, extract } from '@doclo/flows';

const flow = createFlow({
  observability: {
    onFlowError: (ctx) => {
      // Log to monitoring service
      console.error('Flow error:', {
        flowId: ctx.flowId,
        executionId: ctx.executionId,
        error: ctx.error.message,
        errorCode: ctx.errorCode,
        failedAtStep: ctx.failedAtStepIndex
      });

      // Send to error tracking (Sentry, etc.)
      // Sentry.captureException(ctx.error, { extra: ctx });
    },

    onStepError: (ctx) => {
      console.error('Step error:', {
        stepId: ctx.stepId,
        stepIndex: ctx.stepIndex,
        error: ctx.error.message,
        willRetry: ctx.willRetry,
        retryAttempt: ctx.retryAttempt
      });
    },

    onProviderRetry: (ctx) => {
      console.warn('Provider retry:', {
        provider: ctx.provider,
        attempt: ctx.attemptNumber,
        error: ctx.error.message,
        nextRetryDelay: ctx.nextRetryDelay
      });
    }
  }
})
  .step('extract', extract({ provider, schema }))
  .build();

Complete Example: Resilient Pipeline

import { DocloClient, RateLimitError, TimeoutError } from '@doclo/client';
import { createFlow, parse, extract } from '@doclo/flows';
import { buildLLMProvider, createOCRProvider } from '@doclo/providers-llm';

// Resilient provider configuration
const vlmProvider = buildLLMProvider({
  providers: [
    {
      provider: 'google',
      model: 'google/gemini-2.5-flash',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    },
    {
      provider: 'anthropic',
      model: 'anthropic/claude-sonnet-4.5',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }
  ],
  maxRetries: 2,
  useExponentialBackoff: true,
  circuitBreakerThreshold: 3
});

const ocrProvider = createOCRProvider({
  endpoint: 'https://www.datalab.to/api/v1/marker',
  apiKey: process.env.DATALAB_API_KEY!
});

// Resilient flow with error tracking
const resilientFlow = createFlow({
  observability: {
    onFlowError: (ctx) => {
      console.error(`Flow ${ctx.flowId} failed:`, ctx.error.message);
      // Track in monitoring
    },
    onStepError: (ctx) => {
      console.warn(`Step ${ctx.stepId} error (attempt ${ctx.retryAttempt}):`, ctx.error.message);
    }
  }
})
  .step('parse', parse({ provider: ocrProvider }))
  .step('extract', extract({
    provider: vlmProvider,
    schema,                   // your extraction schema, defined elsewhere
    inputMode: 'ir+source'
  }))
  .build();

// Processing function with full error handling
async function processDocumentResilient(
  document: { base64: string; filename: string; mimeType: string }
) {
  const maxRetries = 3;
  let lastError: Error | undefined;

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await resilientFlow.run({ base64: document.base64 });

      return {
        success: true,
        data: result.output,
        metrics: result.aggregated,
        attempts: attempt
      };

    } catch (error) {
      lastError = error as Error;
      const errorCode = (error as any).code;

      // Don't retry validation errors
      if (errorCode === 'INVALID_INPUT' || errorCode === 'SCHEMA_VALIDATION_FAILED') {
        return {
          success: false,
          error: 'Invalid document or schema',
          code: errorCode
        };
      }

      // Rate limit - wait and retry
      if (error instanceof RateLimitError) {
        const retryAfter = error.rateLimitInfo?.retryAfter || (attempt * 30);
        console.warn(`Rate limited. Waiting ${retryAfter}s...`);
        await sleep(retryAfter * 1000);
        continue;
      }

      // Timeout - maybe use async
      if (error instanceof TimeoutError && attempt === maxRetries) {
        return {
          success: false,
          error: 'Processing timed out',
          suggestion: 'Consider using async processing with webhooks'
        };
      }

      // Other errors - retry with backoff
      if (attempt < maxRetries) {
        const delay = 1000 * Math.pow(2, attempt - 1);
        console.warn(`Attempt ${attempt} failed. Retrying in ${delay}ms...`);
        await sleep(delay);
        continue;
      }
    }
  }

  return {
    success: false,
    error: lastError?.message || 'Processing failed after retries',
    attempts: maxRetries
  };
}

function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms));
}

Best Practices

  1. Classify errors - Know which errors are retryable vs permanent
  2. Use exponential backoff - Prevent overwhelming failing services
  3. Add jitter - Avoid thundering herd when services recover
  4. Implement circuit breakers - Stop calling failing providers
  5. Log comprehensively - Include context for debugging
  6. Set reasonable timeouts - Balance wait time vs failure detection
  7. Plan for partial failures - Extract value from completed steps
  8. Monitor error rates - Alert on anomalies

Next Steps