FIM One is provider-agnostic — any OpenAI-compatible endpoint works. This page helps you pick the best model combination for your use case. For configuration details, see Environment Variables.
How FIM One Uses Models
FIM One has two model slots:
| Slot | Env Variable | Used For |
|---|---|---|
| Main LLM | LLM_MODEL | Planning, analysis, ReAct agent, complex reasoning |
| Fast LLM | FAST_LLM_MODEL | DAG step execution, context compaction (cheaper, faster) |
If FAST_LLM_MODEL is not set, it falls back to LLM_MODEL. For production deployments, splitting into two models gives the best cost/quality balance.
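The fallback rule can be sketched in a few lines. This is an illustrative helper, not FIM One's actual internal code; the function name `resolve_models` is made up for this example.

```python
import os

def resolve_models() -> tuple[str, str]:
    """Resolve the main and fast model IDs from the environment.

    FAST_LLM_MODEL falls back to LLM_MODEL when unset or empty.
    """
    main = os.environ["LLM_MODEL"]
    fast = os.environ.get("FAST_LLM_MODEL") or main
    return main, fast
```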
Quick Selection Matrix
| Provider | Main LLM | Fast LLM | Reasoning | Notes |
|---|---|---|---|---|
| OpenAI | gpt-5.4 / o3 | gpt-5-mini / gpt-5-nano | ✅ reasoning_effort | Best native tool-calling; GPT-5.4 is the latest flagship |
| Anthropic | claude-sonnet-4-6 | claude-haiku-4-5 | ✅ via LiteLLM | Native API routing; full reasoning_content support |
| Google Gemini | gemini-2.5-pro / gemini-3.1-pro-preview | gemini-2.5-flash / gemini-3-flash-preview | ✅ reasoning_effort | 2.5 is stable GA; 3.x is preview |
| DeepSeek | deepseek-chat (V3.2) | deepseek-chat | ✅ deepseek-reasoner | Best cost/performance; V4 imminent |
| Qwen (Alibaba) | qwen3.5-plus / qwen3-max | qwen-turbo | ✅ qwen3-max-thinking | Strongest Chinese language support |
| ChatGLM (Zhipu) | glm-5 | glm-4-flash | ❌ | GLM-5 is 744B MoE; free tier on glm-4-flash |
| MiniMax | MiniMax-M2.5 | MiniMax-M2.5-Lightning | ❌ | Open-weight, strong coding (80.2% SWE-Bench) |
| Kimi (Moonshot) | kimi-k2.5 | kimi-k2.5 | ❌ | 256K context, strong coding |
| Ollama (local) | qwen3.5 / llama4 | qwen3.5:9b | ❌ | Fully offline, no API key |
Provider Details
OpenAI
The most battle-tested option. OpenAI models have the best native function calling (tool-calling) support, which directly impacts agent reliability. The GPT-5 family (August 2025+) is a major generational leap over GPT-4.
Recommended models:
- Main: gpt-5.4 (latest flagship, Mar 2026 — built-in computer use) or o3 (best reasoning accuracy)
- Fast: gpt-5-mini ($0.25/$2.00 per MTok) or gpt-5-nano (cheapest at $0.05/$0.40 per MTok)
- Legacy: gpt-4.1 (still in API, 1M context, good for coding) — retired from ChatGPT Feb 2026
Reasoning: Set LLM_REASONING_EFFORT=medium — works natively with o-series and GPT-5.x models. The o-series requires max_completion_tokens instead of max_tokens, which LiteLLM handles automatically. Note: GPT-5.x does not support reasoning_effort combined with tool-calling in /v1/chat/completions — FIM One silently drops it during agent tool-use steps so workflows run uninterrupted. GPT-5.x also only supports temperature=1 — FIM One handles this automatically via LiteLLM’s parameter filtering (drop_params).
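The parameter adjustments described above can be summarized in one sketch. This is a hypothetical helper written for illustration — it mirrors the behavior the paragraph describes (and what LiteLLM's `drop_params` handles), not FIM One's actual source.

```python
def filter_openai_params(model: str, params: dict, using_tools: bool) -> dict:
    """Illustrative sketch of the OpenAI parameter quirks described above."""
    out = dict(params)
    if model.startswith("gpt-5"):
        # GPT-5.x only accepts temperature=1
        out["temperature"] = 1
        # reasoning_effort is incompatible with tool-calling on /v1/chat/completions,
        # so it is dropped during agent tool-use steps
        if using_tools:
            out.pop("reasoning_effort", None)
    if model.startswith(("o3", "o4")):
        # o-series expects max_completion_tokens instead of max_tokens
        if "max_tokens" in out:
            out["max_completion_tokens"] = out.pop("max_tokens")
    return out
```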
| Model | Input $/MTok | Output $/MTok | Context |
|---|---|---|---|
| gpt-5.4 | $2.50 | $15.00 | 272K |
| o3 | $2.00 | $8.00 | 200K |
| o4-mini | $1.10 | $4.40 | 200K |
| gpt-5-mini | $0.25 | $2.00 | — |
| gpt-5-nano | $0.05 | $0.40 | — |
# .env — OpenAI (production with reasoning)
LLM_API_KEY=sk-...
LLM_BASE_URL=https://api.openai.com/v1
LLM_MODEL=gpt-5.4
FAST_LLM_MODEL=gpt-5-nano
LLM_REASONING_EFFORT=medium
# .env — OpenAI (budget reasoning)
LLM_API_KEY=sk-...
LLM_BASE_URL=https://api.openai.com/v1
LLM_MODEL=o3
FAST_LLM_MODEL=gpt-5-nano
LLM_REASONING_EFFORT=medium
Anthropic (Claude)
Claude excels at nuanced reasoning and complex multi-step tasks. FIM One connects via LiteLLM, which routes Anthropic models through their native API automatically. The current generation is Claude 4.6 (February 2026).
Recommended models:
- Main: claude-sonnet-4-6 (best balance of capability and cost — $3/$15 per MTok)
- Fast: claude-haiku-4-5 (fast and cheap — $1/$5 per MTok)
- Premium: claude-opus-4-6 (most capable, 128K max output — $5/$25 per MTok)
Base URL: https://api.anthropic.com/v1/
All current Claude models support extended thinking and have a 200K context window (1M in beta).
Reasoning: Set LLM_REASONING_EFFORT=medium — LiteLLM routes Anthropic models through the native API, so reasoning_content (extended thinking) is fully returned and visible in the UI “thinking” step. When extended thinking is enabled, Anthropic requires temperature=1 — set LLM_TEMPERATURE=1 in your .env or model configuration. See Extended Thinking for details.
# .env — Anthropic Claude
LLM_API_KEY=sk-ant-...
LLM_BASE_URL=https://api.anthropic.com/v1/
LLM_MODEL=claude-sonnet-4-6
FAST_LLM_MODEL=claude-haiku-4-5
LLM_REASONING_EFFORT=medium
Google Gemini
Gemini models offer strong performance at competitive pricing via Google’s OpenAI-compatible endpoint. The 3.x generation (late 2025+) is a major leap — Gemini 3 Flash outperforms 2.5 Pro while being 3x faster.
Recommended models:
- Stable (GA): gemini-2.5-pro (main) + gemini-2.5-flash (fast) — production-ready
- Latest (Preview): gemini-3.1-pro-preview (main) + gemini-3-flash-preview (fast) — best performance, but preview status
Base URL: https://generativelanguage.googleapis.com/v1beta/openai/
Reasoning: reasoning_effort is supported on the compatibility endpoint — set LLM_REASONING_EFFORT=medium and it works out of the box.
| Model | Input $/MTok | Output $/MTok | Status |
|---|---|---|---|
| gemini-3.1-pro-preview | $2.00 | $12.00 | Preview |
| gemini-3-flash-preview | $0.50 | $3.00 | Preview |
| gemini-2.5-pro | $1.25 | $10.00 | Stable GA |
| gemini-2.5-flash | $0.30 | $2.50 | Stable GA |
| gemini-2.5-flash-lite | $0.10 | $0.40 | Stable GA |
# .env — Gemini (stable)
LLM_API_KEY=AIza...
LLM_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai/
LLM_MODEL=gemini-2.5-pro
FAST_LLM_MODEL=gemini-2.5-flash
LLM_REASONING_EFFORT=medium
# .env — Gemini (latest preview)
LLM_API_KEY=AIza...
LLM_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai/
LLM_MODEL=gemini-3.1-pro-preview
FAST_LLM_MODEL=gemini-3-flash-preview
LLM_REASONING_EFFORT=medium
DeepSeek
DeepSeek offers the best cost/performance ratio in the market. V3.2 (December 2025) unified the chat and reasoning lineages into a single model, with incredibly low pricing.
Model IDs (both backed by V3.2):
- deepseek-chat — general purpose (non-thinking mode)
- deepseek-reasoner — chain-of-thought reasoning mode, returns reasoning_content
Base URL: https://api.deepseek.com
Pricing: $0.28 input / $0.42 output per MTok (cache-hit input: $0.028) — by far the cheapest frontier-class API.
V4 is imminent (March 2026): trillion-parameter multimodal model with 1M context window. Expect new model IDs when it launches.
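At these rates, monthly spend is simple arithmetic. The sketch below plugs the prices above into a rough estimator; the workload numbers in the usage example are made up for illustration, not benchmarks.

```python
# DeepSeek V3.2 list prices in USD per million tokens (from the text above)
INPUT_PER_MTOK = 0.28
INPUT_CACHE_HIT_PER_MTOK = 0.028
OUTPUT_PER_MTOK = 0.42

def monthly_cost(input_mtok: float, cached_fraction: float, output_mtok: float) -> float:
    """Estimate monthly spend in USD for a given token volume.

    cached_fraction is the share of input tokens served from the prompt cache.
    """
    fresh = input_mtok * (1 - cached_fraction) * INPUT_PER_MTOK
    cached = input_mtok * cached_fraction * INPUT_CACHE_HIT_PER_MTOK
    return round(fresh + cached + output_mtok * OUTPUT_PER_MTOK, 2)
```

For example, 100 MTok of input with a 50% cache-hit rate plus 20 MTok of output comes to $14.00 + $1.40 + $8.40 = $23.80 per month.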
# .env — DeepSeek (budget-friendly)
LLM_API_KEY=sk-...
LLM_BASE_URL=https://api.deepseek.com
LLM_MODEL=deepseek-chat
FAST_LLM_MODEL=deepseek-chat
# .env — DeepSeek (with reasoning)
LLM_API_KEY=sk-...
LLM_BASE_URL=https://api.deepseek.com
LLM_MODEL=deepseek-reasoner
FAST_LLM_MODEL=deepseek-chat
Chinese Domestic Models
All major Chinese model providers expose OpenAI-compatible endpoints. These are particularly strong for Chinese-language tasks and offer competitive local pricing.
Qwen / 通义千问 (Alibaba Cloud)
Qwen 3.5 (February 2026) is the latest generation — the 397B MoE flagship outperforms GPT-5.2 on MMLU-Pro.
- Base URL: https://dashscope.aliyuncs.com/compatible-mode/v1
- International: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
- Main: qwen3.5-plus (flagship, 1M context) or qwen3-max (trillion-param)
- Fast: qwen-turbo (fast and cheap)
- Reasoning: qwen3-max-thinking (comparable to GPT-5.2-Thinking)
# .env — Qwen
LLM_API_KEY=sk-...
LLM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
LLM_MODEL=qwen3.5-plus
FAST_LLM_MODEL=qwen-turbo
ChatGLM / 智谱
GLM-5 (2026) is the latest flagship — 744B total params (40B active), approaching Claude Opus-level on coding/agent tasks.
- Base URL: https://open.bigmodel.cn/api/paas/v4
- Main: glm-5 (flagship)
- Fast: glm-4-flash (free tier available!)
Some HTTP clients auto-append /v1 to base URLs. Zhipu uses /v4 — ensure your client does not force an OpenAI-style path suffix or you’ll get 404 errors.
# .env — ChatGLM
LLM_API_KEY=...
LLM_BASE_URL=https://open.bigmodel.cn/api/paas/v4
LLM_MODEL=glm-5
FAST_LLM_MODEL=glm-4-flash
MiniMax
MiniMax M2.5 (February 2026) is open-weight and scores 80.2% on SWE-Bench.
- Base URL (China): https://api.minimaxi.com/v1
- Base URL (Global): https://api.minimax.io
- Main: MiniMax-M2.5
- Fast: MiniMax-M2.5-Lightning
# .env — MiniMax
LLM_API_KEY=...
LLM_BASE_URL=https://api.minimaxi.com/v1
LLM_MODEL=MiniMax-M2.5
FAST_LLM_MODEL=MiniMax-M2.5-Lightning
Kimi / 月之暗面 (Moonshot)
Kimi K2.5 (January 2026) has 256K context and strong coding performance (76.8% SWE-Bench among open-source models).
- Base URL: https://api.moonshot.ai/v1
- Model: kimi-k2.5
# .env — Kimi
LLM_API_KEY=...
LLM_BASE_URL=https://api.moonshot.ai/v1
LLM_MODEL=kimi-k2.5
FAST_LLM_MODEL=kimi-k2.5
Local Models (Ollama)
Run models entirely on your own hardware — no API key needed, fully offline. Ollama exposes an OpenAI-compatible endpoint out of the box. The open-source landscape has changed dramatically — Qwen 3.5, Llama 4, and GPT-OSS (OpenAI’s first open-weight models) are all available.
Base URL: http://localhost:11434/v1
Recommended models by VRAM:
| VRAM | Main LLM | Fast LLM | Notes |
|---|---|---|---|
| 8 GB | qwen3.5:9b / gemma3:4b | qwen3.5:4b | Qwen 3.5 9B is the standout at this tier |
| 16 GB | gpt-oss:20b / deepseek-r1:14b | qwen3.5:9b | GPT-OSS 20B is agent-optimized |
| 24 GB | qwen3:32b / deepseek-r1:32b | qwen3.5:9b | Qwen 3 32B is best for tool-calling |
| 48 GB+ | llama3.3:70b / gpt-oss:120b | qwen3.5:14b | Near-frontier quality |
Best for tool-calling: Qwen 3/3.5 (32B+), GLM-4.7, GPT-OSS, Mistral — these have explicit function-calling training. Tool-calling quality varies significantly across local models, and not all of them reliably generate valid function calls: treat 14B parameters as the minimum and 32B+ as strongly preferred for agent tasks, and test your chosen model with agent workflows before using it in production.
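A common failure mode with smaller models is truncated or malformed JSON in the function-call arguments. A minimal validator like the sketch below (a hypothetical helper, not part of FIM One) is a quick way to smoke-test a local model's tool-call output before trusting it in agent workflows.

```python
import json

def is_valid_tool_call(raw_arguments: str, required: set[str]) -> bool:
    """Check that a model's function-call arguments parse as a JSON object
    and contain every required parameter name."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return False
    return isinstance(args, dict) and required <= args.keys()
```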
# .env — Ollama (balanced, 16GB VRAM)
LLM_API_KEY=ollama
LLM_BASE_URL=http://localhost:11434/v1
LLM_MODEL=gpt-oss:20b
FAST_LLM_MODEL=qwen3.5:9b
LLM_CONTEXT_SIZE=32768
LLM_MAX_OUTPUT_TOKENS=8192
# .env — Ollama (agent-optimized, 24GB VRAM)
LLM_API_KEY=ollama
LLM_BASE_URL=http://localhost:11434/v1
LLM_MODEL=qwen3:32b
FAST_LLM_MODEL=qwen3.5:9b
LLM_CONTEXT_SIZE=32768
LLM_MAX_OUTPUT_TOKENS=8192
Relay Platforms
Many users access multiple model providers through a single relay (proxy) service. FIM One automatically detects the correct API protocol based on URL path patterns — just fill in the LLM_BASE_URL and it works.
How It Works
When your base URL points to a third-party relay, FIM One inspects the URL path to determine which protocol to use:
| URL Path Contains | Detected Protocol | Auth Header | Key Benefit |
|---|---|---|---|
| /v1 (or no match) | OpenAI compatible | Authorization: Bearer | Universal fallback, works with most relays |
| /claude or /anthropic | Anthropic native | x-api-key | Full reasoning_content (extended thinking) support |
| /gemini | Google native | x-goog-api-key | Native Gemini parameter translation |
Resolution order: Explicit DB provider field > domain match (official APIs) > URL path hint (relay platforms) > OpenAI compatible fallback.
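The path heuristic from the table above boils down to a few substring checks. This is an illustrative sketch of that last resolution step, not FIM One's actual implementation (which also consults the DB provider field and official domains first).

```python
from urllib.parse import urlparse

def detect_protocol(base_url: str) -> str:
    """Sketch of the URL-path protocol hint described above."""
    path = urlparse(base_url).path.lower()
    if "/claude" in path or "/anthropic" in path:
        return "anthropic"   # native /v1/messages + x-api-key
    if "/gemini" in path:
        return "gemini"      # native Gemini API + x-goog-api-key
    return "openai"          # universal fallback, also matches /v1
```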
Example: One Relay, Three Protocols
With a single relay account, you can access different providers by simply changing the base URL path:
# .env — Claude via relay (Anthropic native protocol)
LLM_API_KEY=your-relay-key
LLM_BASE_URL=https://relay.example.com/anthropic
LLM_MODEL=claude-sonnet-4-6
# .env — Gemini via relay (Google native protocol)
LLM_API_KEY=your-relay-key
LLM_BASE_URL=https://relay.example.com/gemini
LLM_MODEL=gemini-2.5-pro
# .env — GPT via relay (OpenAI compatible protocol)
LLM_API_KEY=your-relay-key
LLM_BASE_URL=https://relay.example.com/v1
LLM_MODEL=gpt-5.4
No extra configuration needed — authentication headers, parameter formats, and response parsing all switch automatically.
Step-by-Step: How Path Detection Works
Here’s a concrete example showing what happens internally when you configure a relay:
# .env — Claude via a relay platform
LLM_API_KEY=your-relay-key
LLM_BASE_URL=https://my-relay.example.com/claude
LLM_MODEL=claude-sonnet-4-6
LLM_REASONING_EFFORT=medium
- FIM One sees /claude in the URL path → detects Anthropic native protocol
- Model is prefixed as anthropic/claude-sonnet-4-6 for LiteLLM routing
- Requests use Anthropic’s /v1/messages format with x-api-key auth header
- reasoning_effort=medium is translated to Anthropic’s native thinking parameter (not OpenAI’s reasoning_effort)
If the same relay URL were https://my-relay.example.com/v1 instead, the /claude hint would be missing — FIM One would fall back to OpenAI-compatible protocol, sending /v1/chat/completions requests to a Claude-native endpoint, which would fail. The URL path matters.
Why This Matters
- Anthropic native endpoint gives you proper reasoning_content support (extended thinking visible in the UI), correct tool-calling format, and x-api-key authentication — features lost when using OpenAI-compatible translation.
- Google native endpoint gives you native Gemini parameters and x-goog-api-key authentication.
- OpenAI compatible is the universal fallback and works with any relay, but provider-specific features (like extended thinking output) may be unavailable.
If your relay platform uses non-standard path conventions (e.g., no /claude or /anthropic in the URL), FIM One falls back to OpenAI compatible protocol — which works for most use cases. For full native protocol support, you can set the provider field explicitly via the admin model configuration UI.
Configuration Strategy
Main vs Fast: When to Split
- Split when your main model is expensive or slow (e.g., gpt-5.4 + gpt-5-nano). DAG mode runs many parallel steps — using a cheaper fast model saves significant cost.
- Same model when your model is already cheap (e.g., deepseek-chat for both). The overhead of managing two models isn’t worth it.
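The savings from splitting are easy to quantify with back-of-the-envelope arithmetic. The step counts and token volumes below are made-up illustrations; the per-MTok prices reuse the OpenAI input rates listed earlier.

```python
def dag_run_cost(steps: int, tokens_per_step_mtok: float,
                 planning_mtok: float, main_price: float, step_price: float) -> float:
    """Cost of one DAG run in USD: planning on the main model,
    each step on the step-execution model."""
    return round(planning_mtok * main_price
                 + steps * tokens_per_step_mtok * step_price, 4)
```

With 20 steps of 0.01 MTok each and 0.02 MTok of planning, running steps on gpt-5-nano ($0.05/MTok input) instead of gpt-5.4 ($2.50/MTok input) cuts the run from roughly $0.55 to $0.06.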
When to Enable Reasoning
- Enable for complex analytical tasks, multi-step planning, and tasks requiring careful judgment
- Disable (default) for routine tasks, simple Q&A, and cost-sensitive deployments
- Reasoning typically increases cost 2-5x per request — medium effort is a good starting point
Context Window Sizing
Set LLM_CONTEXT_SIZE to match your model’s actual window:
| Model | Context Window |
|---|---|
| GPT-5.4 | 272K |
| o3 / o4-mini | 200K |
| Claude Sonnet 4.6 | 200K (1M beta) |
| Gemini 2.5 Pro | 1M |
| Gemini 3.1 Pro | 1M |
| DeepSeek V3.2 | 128K |
| Qwen 3.5 Plus | 1M |
| Local (Ollama) | 4K–128K (varies) |
For local models, set both LLM_CONTEXT_SIZE and LLM_MAX_OUTPUT_TOKENS explicitly — defaults assume cloud-scale context windows that local models cannot support.
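The relationship between the two variables is that a prompt must leave room for the full output budget inside the window. A minimal sanity check, assuming token counts are known (illustrative only — FIM One's internal accounting may differ):

```python
def fits_in_window(prompt_tokens: int, llm_context_size: int,
                   llm_max_output_tokens: int) -> bool:
    """True if the prompt plus the reserved output budget
    fits inside the model's context window."""
    return prompt_tokens + llm_max_output_tokens <= llm_context_size
```

With the Ollama settings above (LLM_CONTEXT_SIZE=32768, LLM_MAX_OUTPUT_TOKENS=8192), any prompt over 24,576 tokens would overflow the window.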