OpenAI provides GPT and o-series models with vision capabilities for document processing.

Installation

npm install @docloai/providers-llm

Basic Setup

import { createVLMProvider } from '@docloai/providers-llm';

const provider = createVLMProvider({
  provider: 'openai',
  model: 'openai/gpt-4.1',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

Native API

const provider = createVLMProvider({
  provider: 'openai',
  model: 'gpt-4.1',
  apiKey: process.env.OPENAI_API_KEY!
});

Available Models

| Model | Context | Reasoning | Best For |
|---|---|---|---|
| gpt-5.1 | 256k | Yes | Complex extraction |
| gpt-4.1 | 128k | No | General extraction |
| gpt-4.1-mini | 128k | No | Cost-effective extraction |
| o3 | 200k | Yes | Complex reasoning |
| o3-mini | 200k | Yes | Fast reasoning |
| o4-mini | 200k | Yes | Latest reasoning model |

OpenRouter Model IDs

When using OpenRouter, prefix models with openai/:
// OpenRouter
model: 'openai/gpt-4.1'

// Native
model: 'gpt-4.1'
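
The prefix rule can be wrapped in a small helper so that one model name works for both access paths. This is only a sketch: `resolveModelId` is a hypothetical function written for illustration, not part of `@docloai/providers-llm`.

```typescript
// Hypothetical helper (not part of @docloai/providers-llm).
// Adds the "openai/" prefix for OpenRouter and strips it for the native API.
function resolveModelId(model: string, via?: 'openrouter'): string {
  const prefix = 'openai/';
  if (via === 'openrouter') {
    return model.startsWith(prefix) ? model : prefix + model;
  }
  return model.startsWith(prefix) ? model.slice(prefix.length) : model;
}

console.log(resolveModelId('gpt-4.1', 'openrouter')); // openai/gpt-4.1
console.log(resolveModelId('openai/gpt-4.1'));        // gpt-4.1
```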

Configuration Options

createVLMProvider({
  provider: 'openai',
  model: string,           // Model ID
  apiKey: string,          // API key
  via?: 'openrouter',      // Access method
  baseUrl?: string,        // Custom endpoint
  limits?: {
    maxFileSize?: number,      // Max file size (bytes)
    requestTimeout?: number,   // Timeout (ms)
    maxJsonDepth?: number      // Max JSON nesting
  }
})
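
As an illustration of how the `limits` fields might be filled in, the sketch below defines a local `Limits` type mirroring the options above and a client-side size check. The concrete values and the `exceedsMaxFileSize` helper are assumptions for the example; how the library itself enforces these limits is not described here.

```typescript
// Local mirror of the documented `limits` shape (defined here for illustration).
interface Limits {
  maxFileSize?: number;    // bytes
  requestTimeout?: number; // milliseconds
  maxJsonDepth?: number;   // maximum JSON nesting
}

// Illustrative values only; choose limits that fit your workload.
const limits: Limits = {
  maxFileSize: 10 * 1024 * 1024, // 10 MiB
  requestTimeout: 60_000,        // 60 seconds
  maxJsonDepth: 10
};

// Hypothetical pre-flight check: reject oversized files before sending a request.
function exceedsMaxFileSize(sizeInBytes: number, l: Limits): boolean {
  return l.maxFileSize !== undefined && sizeInBytes > l.maxFileSize;
}

console.log(exceedsMaxFileSize(5 * 1024 * 1024, limits));  // false
console.log(exceedsMaxFileSize(20 * 1024 * 1024, limits)); // true
```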

Capabilities

| Feature | Support |
|---|---|
| Images | Yes (PNG, JPEG, WebP, GIF) |
| PDFs | Yes (up to 100 pages) |
| Structured Output | Yes (json_schema) |
| Reasoning | o3, o4-mini models |
| Streaming | Yes |

Input Formats

Images

// Via URL
{
  images: [{
    url: 'https://example.com/image.jpg',
    mimeType: 'image/jpeg'
  }]
}

// Via Base64
{
  images: [{
    base64: 'data:image/jpeg;base64,...',
    mimeType: 'image/jpeg'
  }]
}

PDFs

// Via Base64
{
  pdfs: [{
    base64: 'data:application/pdf;base64,...'
  }]
}
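
The `base64` fields above hold data URLs. A minimal sketch of building one from raw bytes in Node.js follows; `toDataUrl` is a hypothetical helper, and the bytes would normally come from something like `fs.readFileSync`.

```typescript
// Hypothetical helper: encode raw bytes as a data URL for the `base64` field.
// Assumes a Node.js runtime, where the global Buffer is available.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  const encoded = Buffer.from(bytes).toString('base64');
  return `data:${mimeType};base64,${encoded}`;
}

// Tiny stand-in payload; a real call would pass the contents of a PDF or image file.
const sample = new TextEncoder().encode('hi');
console.log(toDataUrl(sample, 'application/pdf')); // data:application/pdf;base64,aGk=
```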

Extended Reasoning

Enable reasoning for complex extractions:
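import { createFlow, extract } from '@docloai/flows';
import { createVLMProvider } from '@docloai/providers-llm';

// complexSchema is a JSON schema object defined elsewhere in your code.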

const flow = createFlow()
  .step('extract', extract({
    provider: createVLMProvider({
      provider: 'openai',
      model: 'openai/o3',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }),
    schema: complexSchema,
    reasoning: {
      enabled: true,
      effort: 'high'  // 'low' | 'medium' | 'high'
    }
  }))
  .build();

Reasoning configuration:

| Option | Type | Description |
|---|---|---|
| enabled | boolean | Enable reasoning |
| effort | string | Reasoning depth: 'low', 'medium', 'high' |
| exclude | boolean | Exclude reasoning tokens from response |

Production Setup

Use buildLLMProvider for retry logic and fallback:
import { buildLLMProvider } from '@docloai/providers-llm';

const provider = buildLLMProvider({
  providers: [
    {
      provider: 'openai',
      model: 'openai/gpt-4.1',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }
  ],
  maxRetries: 2,
  retryDelay: 1000,
  useExponentialBackoff: true,
  circuitBreakerThreshold: 3
});
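
To make the retry options concrete, the sketch below computes a wait schedule under two assumptions that this page does not confirm: `retryDelay` is in milliseconds, and exponential backoff doubles the delay after each failed attempt. The library's actual schedule (jitter, caps) may differ; `backoffDelays` is a hypothetical function.

```typescript
// Hypothetical illustration of a retry schedule; not the library's implementation.
// Assumes retryDelay is in milliseconds and the delay doubles on each attempt.
function backoffDelays(
  maxRetries: number,
  retryDelay: number,
  useExponentialBackoff: boolean
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(useExponentialBackoff ? retryDelay * 2 ** attempt : retryDelay);
  }
  return delays;
}

console.log(backoffDelays(2, 1000, true));  // [ 1000, 2000 ]
console.log(backoffDelays(2, 1000, false)); // [ 1000, 1000 ]
```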

Pricing

Via OpenRouter (approximate, USD):

| Model | Input (per 1k tokens) | Output (per 1k tokens) |
|---|---|---|
| gpt-4.1 | $0.002 | $0.008 |
| gpt-4.1-mini | $0.0004 | $0.0016 |
| o3 | $0.010 | $0.040 |
| o3-mini | $0.0011 | $0.0044 |
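
A rough per-request cost can be estimated from these rates. The sketch below copies the approximate prices from the table; `estimateCostUsd` is a hypothetical helper, and the token counts in the example are made up for illustration.

```typescript
// Approximate prices in USD per 1k tokens, copied from the table above.
const PRICES_PER_1K: Record<string, { input: number; output: number }> = {
  'gpt-4.1': { input: 0.002, output: 0.008 },
  'gpt-4.1-mini': { input: 0.0004, output: 0.0016 },
  'o3': { input: 0.010, output: 0.040 },
  'o3-mini': { input: 0.0011, output: 0.0044 }
};

// Hypothetical helper: estimate the cost of one request from its token counts.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES_PER_1K[model];
  if (!price) {
    throw new Error(`No price listed for model: ${model}`);
  }
  return (inputTokens / 1000) * price.input + (outputTokens / 1000) * price.output;
}

// 10k input tokens and 2k output tokens on gpt-4.1:
console.log(estimateCostUsd('gpt-4.1', 10_000, 2_000).toFixed(4)); // 0.0360
```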

Example: Invoice Extraction

import { createFlow, extract } from '@docloai/flows';
import { createVLMProvider } from '@docloai/providers-llm';

const provider = createVLMProvider({
  provider: 'openai',
  model: 'openai/gpt-4.1',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

const invoiceSchema = {
  type: 'object',
  properties: {
    invoiceNumber: { type: 'string' },
    date: { type: 'string' },
    total: { type: 'number' },
    vendor: { type: 'string' }
  }
};

const flow = createFlow()
  .step('extract', extract({
    provider,
    schema: invoiceSchema
  }))
  .build();

const result = await flow.run({
  base64: 'data:application/pdf;base64,...'
});

console.log(result.output);
// { invoiceNumber: 'INV-001', date: '2024-01-15', total: 1250, vendor: 'Acme Corp' }
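
Because `invoiceSchema` lists no `required` fields, it can be worth checking the result at runtime before using it. The following is a minimal sketch of a type guard; the `Invoice` interface and `isInvoice` function are hypothetical names mirroring the schema above, not part of the library.

```typescript
// Shape mirroring invoiceSchema (defined here for illustration).
interface Invoice {
  invoiceNumber: string;
  date: string;
  total: number;
  vendor: string;
}

// Hypothetical type guard: true only when every field is present with the expected type.
function isInvoice(value: unknown): value is Invoice {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.invoiceNumber === 'string' &&
    typeof v.date === 'string' &&
    typeof v.total === 'number' &&
    typeof v.vendor === 'string'
  );
}

const candidate: unknown = {
  invoiceNumber: 'INV-001',
  date: '2024-01-15',
  total: 1250,
  vendor: 'Acme Corp'
};
console.log(isInvoice(candidate));                      // true
console.log(isInvoice({ invoiceNumber: 'INV-002' }));   // false
```

In the flow above, the same guard could be applied to `result.output` before the value is passed on.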

Next Steps