Google Gemini models combine strong extraction performance with 1M-token context windows and competitive pricing.

Installation

npm install @docloai/providers-llm

Basic Setup

import { createVLMProvider } from '@docloai/providers-llm';

const provider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash-preview-09-2025',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

Native API

const provider = createVLMProvider({
  provider: 'google',
  model: 'gemini-2.5-flash',
  apiKey: process.env.GOOGLE_API_KEY!
});

Available Models

| Model | Context | Reasoning | Best For |
|---|---|---|---|
| gemini-3-pro | 1M | Yes | Latest, most capable |
| gemini-2.5-pro | 1M | Yes | Complex extraction |
| gemini-2.5-flash | 1M | No | Fast, cost-effective |
| gemini-2.5-flash-lite | 1M | No | Lowest cost |

OpenRouter Model IDs

When using OpenRouter, use the full model path:
// OpenRouter
model: 'google/gemini-2.5-flash-preview-09-2025'

// Native
model: 'gemini-2.5-flash'

Configuration Options

createVLMProvider({
  provider: 'google',
  model: string,           // Model ID
  apiKey: string,          // API key
  via?: 'openrouter',      // Access method
  baseUrl?: string,        // Custom endpoint
  limits?: {
    maxFileSize?: number,      // Max file size (bytes)
    requestTimeout?: number,   // Timeout (ms)
    maxJsonDepth?: number      // Max JSON nesting
  }
})
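For example, a provider configured with explicit limits (the values here are illustrative, not required defaults):
import { createVLMProvider } from '@docloai/providers-llm';

const provider = createVLMProvider({
  provider: 'google',
  model: 'gemini-2.5-flash',
  apiKey: process.env.GOOGLE_API_KEY!,
  limits: {
    maxFileSize: 50 * 1024 * 1024,  // maximum accepted file size in bytes (50MB here)
    requestTimeout: 120_000,        // allow 2 minutes for large documents
    maxJsonDepth: 10                // cap nesting when parsing responses
  }
});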

Capabilities

| Feature | Support |
|---|---|
| Images | Yes (PNG, JPEG, WebP, GIF, BMP, TIFF, HEIF) |
| PDFs | Yes (up to 1000 pages) |
| Structured Output | Yes (responseMimeType: application/json) |
| Reasoning | Yes (thinking_config) |
| Streaming | Yes |

Input Formats

Images

Gemini supports more image formats than other providers:
// Supported: PNG, JPEG, WebP, GIF, BMP, TIFF, HEIF
{
  images: [{
    base64: 'data:image/jpeg;base64,...',
    mimeType: 'image/jpeg'
  }]
}
Image limits: 20MB per image, 3072x3072 max (auto-scaled).
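If you load images from disk, a small helper can enforce the 20MB limit before the request is sent. A minimal sketch (the helper name and file paths are illustrative, not part of the SDK):
import { readFileSync } from 'node:fs';

const MAX_IMAGE_BYTES = 20 * 1024 * 1024; // 20MB per image

// Build an `images` entry from a local file, rejecting oversized files early.
// Dimensions beyond 3072x3072 are auto-scaled by the API, so only byte size
// is checked here.
function toImageInput(path: string, mimeType: string) {
  const bytes = readFileSync(path);
  if (bytes.length > MAX_IMAGE_BYTES) {
    throw new Error(`${path} is ${bytes.length} bytes; Gemini accepts up to 20MB per image`);
  }
  return { base64: `data:${mimeType};base64,${bytes.toString('base64')}`, mimeType };
}

const input = { images: [toImageInput('./invoice.jpg', 'image/jpeg')] };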

PDFs

Gemini has the highest PDF capacity:
{
  pdfs: [{
    base64: 'data:application/pdf;base64,...'
  }]
}

// Or via Files API (native only)
{
  pdfs: [{
    fileId: 'files/abc123'  // Gemini Files API reference
  }]
}
PDF limits: 50MB per file, up to 1000 pages.
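For local files, one way to decide between inline base64 and a Files API reference is by size. A sketch (the helper name and error handling are illustrative):
import { readFileSync, statSync } from 'node:fs';

const MAX_INLINE_PDF_BYTES = 50 * 1024 * 1024; // 50MB per file

// Build a `pdfs` entry: inline base64 for files within the limit, otherwise
// reference a document already uploaded to the Gemini Files API (native only).
function toPdfInput(path: string, uploadedFileId?: string) {
  if (statSync(path).size <= MAX_INLINE_PDF_BYTES) {
    const base64 = readFileSync(path).toString('base64');
    return { base64: `data:application/pdf;base64,${base64}` };
  }
  if (!uploadedFileId) {
    throw new Error(`${path} exceeds 50MB; upload it via the Files API and pass its file ID`);
  }
  return { fileId: uploadedFileId }; // e.g. 'files/abc123'
}

const input = { pdfs: [toPdfInput('./annual-report.pdf')] };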

Extended Thinking

Gemini models support thinking mode for complex reasoning:
import { createFlow, extract } from '@docloai/flows';

const flow = createFlow()
  .step('extract', extract({
    provider: createVLMProvider({
      provider: 'google',
      model: 'google/gemini-2.5-pro-preview-06-05',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }),
    schema: complexSchema,
    reasoning: {
      enabled: true,
      effort: 'high'
    }
  }))
  .build();
Thinking budget limits:
  • Gemini 2.5 Flash: 0-24576 tokens (default: auto, up to 8192)
  • Gemini 2.5 Pro: Higher limits available
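If you call Gemini directly rather than through the extract step, the budget can be set explicitly. A minimal sketch using Google's @google/genai SDK (separate from this package; the budget value is just an example within the 2.5 Flash range above):
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY! });

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Summarize the obligations in the attached contract.',
  config: {
    thinkingConfig: { thinkingBudget: 8192 } // 0 disables thinking; up to 24576 for 2.5 Flash
  }
});

console.log(response.text);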

Large Document Processing

Gemini’s 1M token context makes it ideal for large documents:
import { createFlow, extract } from '@docloai/flows';

// Process large documents without chunking
const flow = createFlow()
  .step('extract', extract({
    provider: createVLMProvider({
      provider: 'google',
      model: 'google/gemini-2.5-flash-preview-09-2025',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }),
    schema: largeDocSchema
  }))
  .build();

// Can handle 500+ page documents in a single request

Production Setup

import { buildLLMProvider } from '@docloai/providers-llm';

const provider = buildLLMProvider({
  providers: [
    {
      provider: 'google',
      model: 'google/gemini-2.5-flash-preview-09-2025',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    },
    {
      provider: 'google',
      model: 'google/gemini-2.5-flash-lite',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }
  ],
  maxRetries: 2,
  retryDelay: 1000,
  useExponentialBackoff: true
});

Pricing

Via OpenRouter (approximate):
| Model | Input (per 1k tokens) | Output (per 1k tokens) |
|---|---|---|
| gemini-3-pro | $0.00125 | $0.005 |
| gemini-2.5-pro | $0.00125 | $0.005 |
| gemini-2.5-flash | $0.00015 | $0.0006 |
| gemini-2.5-flash-lite | $0.000075 | $0.0003 |
Gemini is typically the most cost-effective option for high-volume processing.
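As a rough back-of-the-envelope check using the gemini-2.5-flash rates above (prices are approximate and change over time):
const FLASH_INPUT_PER_1K = 0.00015;  // USD per 1k input tokens
const FLASH_OUTPUT_PER_1K = 0.0006;  // USD per 1k output tokens

function estimateFlashCostUSD(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1000) * FLASH_INPUT_PER_1K +
         (outputTokens / 1000) * FLASH_OUTPUT_PER_1K;
}

// A 200-page report at roughly 500 tokens per page, extracting ~2k tokens of JSON:
console.log(estimateFlashCostUSD(200 * 500, 2000)); // ≈ $0.016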

Example: Multi-Page Report

import { createFlow, extract } from '@docloai/flows';
import { createVLMProvider } from '@docloai/providers-llm';

const provider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash-preview-09-2025',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

const reportSchema = {
  type: 'object',
  properties: {
    title: { type: 'string' },
    author: { type: 'string' },
    date: { type: 'string' },
    executiveSummary: { type: 'string' },
    sections: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          title: { type: 'string' },
          content: { type: 'string' },
          pageNumber: { type: 'number' }
        }
      }
    },
    keyFindings: {
      type: 'array',
      items: { type: 'string' }
    }
  }
};

const flow = createFlow()
  .step('extract', extract({
    provider,
    schema: reportSchema
  }))
  .build();

// Process a 200-page annual report
const result = await flow.run({
  base64: 'data:application/pdf;base64,...'
});

console.log(`Found ${result.output.sections.length} sections`);
console.log('Key findings:', result.output.keyFindings);

Structured Output Notes

Gemini uses responseMimeType: application/json for JSON output. The SDK embeds the schema in the prompt for reliable structured output, as Gemini’s native responseSchema has limitations with complex schemas.
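For reference, this is roughly what the underlying request looks like when calling Gemini directly with Google's @google/genai SDK. The schema-in-prompt approach mirrors what the SDK does, but the exact prompt wording below is illustrative:
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY! });

const schema = { type: 'object', properties: { total: { type: 'number' } } };

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: `Extract the invoice total. Return JSON matching this schema:\n${JSON.stringify(schema)}`,
  config: { responseMimeType: 'application/json' } // forces a JSON response body
});

console.log(JSON.parse(response.text ?? '{}'));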

Next Steps