Google Gemini models combine strong extraction performance with 1M-token context windows and competitive pricing.

Installation

npm install @docloai/providers-llm

Basic Setup

import { createVLMProvider } from '@docloai/providers-llm';

const provider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash-preview-09-2025',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

Native API

const provider = createVLMProvider({
  provider: 'google',
  model: 'gemini-2.5-flash',
  apiKey: process.env.GOOGLE_API_KEY!
});

Available Models

| Model | Context | Reasoning | Best For |
|---|---|---|---|
| gemini-3-pro | 1M | Yes | Latest, most capable |
| gemini-2.5-pro | 1M | Yes | Complex extraction |
| gemini-2.5-flash | 1M | No | Fast, cost-effective |
| gemini-2.5-flash-lite | 1M | No | Lowest cost |

OpenRouter Model IDs

When using OpenRouter, use the full model path:
// OpenRouter
model: 'google/gemini-2.5-flash-preview-09-2025'

// Native
model: 'gemini-2.5-flash'

Configuration Options

createVLMProvider({
  provider: 'google',
  model: string,           // Model ID
  apiKey: string,          // API key
  via?: 'openrouter',      // Access method
  baseUrl?: string,        // Custom endpoint
  limits?: {
    maxFileSize?: number,      // Max file size (bytes)
    requestTimeout?: number,   // Timeout (ms)
    maxJsonDepth?: number      // Max JSON nesting
  }
})
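For example, a provider configured with explicit limits (the values here are illustrative, not required defaults):
import { createVLMProvider } from '@docloai/providers-llm';

const provider = createVLMProvider({
  provider: 'google',
  model: 'gemini-2.5-flash',
  apiKey: process.env.GOOGLE_API_KEY!,
  limits: {
    maxFileSize: 50 * 1024 * 1024,  // maximum accepted file size in bytes (50MB here)
    requestTimeout: 120_000,        // allow 2 minutes for large documents
    maxJsonDepth: 10                // cap nesting when parsing responses
  }
});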

Capabilities

| Feature | Support |
|---|---|
| Images | Yes (PNG, JPEG, WebP, GIF, BMP, TIFF, HEIF) |
| PDFs | Yes (up to 1000 pages) |
| Structured Output | Yes (responseMimeType: application/json) |
| Reasoning | Yes (thinking_config) |
| Streaming | Yes |

Input Formats

Images

Gemini supports more image formats than other providers:
// Supported: PNG, JPEG, WebP, GIF, BMP, TIFF, HEIF
{
  images: [{
    base64: 'data:image/jpeg;base64,...',
    mimeType: 'image/jpeg'
  }]
}
Image limits: 20MB per image, 3072x3072 max (auto-scaled).
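If you load images from disk, a small helper can enforce the 20MB limit before the request is sent. A minimal sketch (the helper name and file paths are illustrative, not part of the SDK):
import { readFileSync } from 'node:fs';

const MAX_IMAGE_BYTES = 20 * 1024 * 1024; // 20MB per image

// Build an `images` entry from a local file, rejecting oversized files early.
// Dimensions beyond 3072x3072 are auto-scaled by the API, so only byte size
// is checked here.
function toImageInput(path: string, mimeType: string) {
  const bytes = readFileSync(path);
  if (bytes.length > MAX_IMAGE_BYTES) {
    throw new Error(`${path} is ${bytes.length} bytes; Gemini accepts up to 20MB per image`);
  }
  return { base64: `data:${mimeType};base64,${bytes.toString('base64')}`, mimeType };
}

const input = { images: [toImageInput('./invoice.jpg', 'image/jpeg')] };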

PDFs

Gemini has the highest PDF capacity:
{
  pdfs: [{
    base64: 'data:application/pdf;base64,...'
  }]
}

// Or via Files API (native only)
{
  pdfs: [{
    fileId: 'files/abc123'  // Gemini Files API reference
  }]
}
PDF limits: 50MB per file, up to 1000 pages.
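For local files, one way to decide between inline base64 and a Files API reference is by size. A sketch (the helper name and error handling are illustrative):
import { readFileSync, statSync } from 'node:fs';

const MAX_INLINE_PDF_BYTES = 50 * 1024 * 1024; // 50MB per file

// Build a `pdfs` entry: inline base64 for files within the limit, otherwise
// reference a document already uploaded to the Gemini Files API (native only).
function toPdfInput(path: string, uploadedFileId?: string) {
  if (statSync(path).size <= MAX_INLINE_PDF_BYTES) {
    const base64 = readFileSync(path).toString('base64');
    return { base64: `data:application/pdf;base64,${base64}` };
  }
  if (!uploadedFileId) {
    throw new Error(`${path} exceeds 50MB; upload it via the Files API and pass its file ID`);
  }
  return { fileId: uploadedFileId }; // e.g. 'files/abc123'
}

const input = { pdfs: [toPdfInput('./annual-report.pdf')] };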

Extended Thinking

Gemini models support thinking mode for complex reasoning:
import { createFlow, extract } from '@docloai/flows';

const flow = createFlow()
  .step('extract', extract({
    provider: createVLMProvider({
      provider: 'google',
      model: 'google/gemini-2.5-pro-preview-06-05',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }),
    schema: complexSchema,
    reasoning: {
      enabled: true,
      effort: 'high'
    }
  }))
  .build();
Thinking budget limits:
  • Gemini 2.5 Flash: 0-24576 tokens (default: auto, up to 8192)
  • Gemini 2.5 Pro: Higher limits available
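If you call Gemini directly rather than through the extract step, the budget can be set explicitly. A minimal sketch using Google's @google/genai SDK (separate from this package; the budget value is just an example within the 2.5 Flash range above):
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY! });

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: 'Summarize the obligations in the attached contract.',
  config: {
    thinkingConfig: { thinkingBudget: 8192 } // 0 disables thinking; up to 24576 for 2.5 Flash
  }
});

console.log(response.text);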

Large Document Processing

Gemini’s 1M token context makes it ideal for large documents:
import { createFlow, extract } from '@docloai/flows';

// Process large documents without chunking
const flow = createFlow()
  .step('extract', extract({
    provider: createVLMProvider({
      provider: 'google',
      model: 'google/gemini-2.5-flash-preview-09-2025',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }),
    schema: largeDocSchema
  }))
  .build();

// Can handle 500+ page documents in a single request

Production Setup

import { buildLLMProvider } from '@docloai/providers-llm';

const provider = buildLLMProvider({
  providers: [
    {
      provider: 'google',
      model: 'google/gemini-2.5-flash-preview-09-2025',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    },
    {
      provider: 'google',
      model: 'google/gemini-2.5-flash-lite',
      apiKey: process.env.OPENROUTER_API_KEY!,
      via: 'openrouter'
    }
  ],
  maxRetries: 2,
  retryDelay: 1000,
  useExponentialBackoff: true
});

Pricing

Via OpenRouter (approximate):
| Model | Input (per 1k tokens) | Output (per 1k tokens) |
|---|---|---|
| gemini-3-pro | $0.00125 | $0.005 |
| gemini-2.5-pro | $0.00125 | $0.005 |
| gemini-2.5-flash | $0.00015 | $0.0006 |
| gemini-2.5-flash-lite | $0.000075 | $0.0003 |
Gemini is typically the most cost-effective option for high-volume processing.
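As a rough back-of-the-envelope check using the gemini-2.5-flash rates above (prices are approximate and change over time):
const FLASH_INPUT_PER_1K = 0.00015;  // USD per 1k input tokens
const FLASH_OUTPUT_PER_1K = 0.0006;  // USD per 1k output tokens

function estimateFlashCostUSD(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1000) * FLASH_INPUT_PER_1K +
         (outputTokens / 1000) * FLASH_OUTPUT_PER_1K;
}

// A 200-page report at roughly 500 tokens per page, extracting ~2k tokens of JSON:
console.log(estimateFlashCostUSD(200 * 500, 2000)); // ≈ $0.016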

Example: Multi-Page Report

import { createFlow, extract } from '@docloai/flows';
import { createVLMProvider } from '@docloai/providers-llm';

const provider = createVLMProvider({
  provider: 'google',
  model: 'google/gemini-2.5-flash-preview-09-2025',
  apiKey: process.env.OPENROUTER_API_KEY!,
  via: 'openrouter'
});

const reportSchema = {
  type: 'object',
  properties: {
    title: { type: 'string' },
    author: { type: 'string' },
    date: { type: 'string' },
    executiveSummary: { type: 'string' },
    sections: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          title: { type: 'string' },
          content: { type: 'string' },
          pageNumber: { type: 'number' }
        }
      }
    },
    keyFindings: {
      type: 'array',
      items: { type: 'string' }
    }
  }
};

const flow = createFlow()
  .step('extract', extract({
    provider,
    schema: reportSchema
  }))
  .build();

// Process a 200-page annual report
const result = await flow.run({
  base64: 'data:application/pdf;base64,...'
});

console.log(`Found ${result.output.sections.length} sections`);
console.log('Key findings:', result.output.keyFindings);

Structured Output Notes

Gemini uses responseMimeType: application/json for JSON output. The SDK embeds the schema in the prompt for reliable structured output, as Gemini’s native responseSchema has limitations with complex schemas.
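For reference, this is roughly what the underlying request looks like when calling Gemini directly with Google's @google/genai SDK. The schema-in-prompt approach mirrors what the SDK does, but the exact prompt wording below is illustrative:
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY! });

const schema = { type: 'object', properties: { total: { type: 'number' } } };

const response = await ai.models.generateContent({
  model: 'gemini-2.5-flash',
  contents: `Extract the invoice total. Return JSON matching this schema:\n${JSON.stringify(schema)}`,
  config: { responseMimeType: 'application/json' } // forces a JSON response body
});

console.log(JSON.parse(response.text ?? '{}'));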

Next Steps