FIM One is provider-agnostic — any OpenAI-compatible endpoint works. This page helps you pick the best model combination for your use case. For configuration details, see Environment Variables.

How FIM One Uses Models

FIM One has two model slots:
| Slot | Env Variable | Used For |
| --- | --- | --- |
| Main LLM | LLM_MODEL | Planning, analysis, ReAct agent, complex reasoning |
| Fast LLM | FAST_LLM_MODEL | DAG step execution, context compaction (cheaper, faster) |
If FAST_LLM_MODEL is not set, it falls back to LLM_MODEL. For production deployments, splitting into two models gives the best cost/quality balance.
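The fallback behavior can be pictured as a small resolver (an illustrative sketch, not FIM One's actual code; the dict stands in for your environment):

```python
def resolve_models(env: dict) -> tuple[str, str]:
    """Return (main, fast) model names; FAST_LLM_MODEL falls back to LLM_MODEL."""
    main = env["LLM_MODEL"]
    fast = env.get("FAST_LLM_MODEL") or main  # unset or empty -> fall back
    return main, fast

# Split setup: expensive main model, cheap fast model
print(resolve_models({"LLM_MODEL": "gpt-5.4", "FAST_LLM_MODEL": "gpt-5-nano"}))
# Fast slot unset: both roles use the main model
print(resolve_models({"LLM_MODEL": "deepseek-chat"}))
```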

Quick Selection Matrix

| Provider | Main LLM | Fast LLM | Reasoning | Notes |
| --- | --- | --- | --- | --- |
| OpenAI | gpt-5.4 / o3 | gpt-5-mini / gpt-5-nano | reasoning_effort | Best native tool-calling; GPT-5.4 is the latest flagship |
| Anthropic | claude-sonnet-4-6 | claude-haiku-4-5 | ✅ via LiteLLM | Native API routing; full reasoning_content support |
| Google Gemini | gemini-2.5-pro / gemini-3.1-pro-preview | gemini-2.5-flash / gemini-3-flash-preview | reasoning_effort | 2.5 is stable GA; 3.x is preview |
| DeepSeek | deepseek-chat (V3.2) | deepseek-chat | deepseek-reasoner | Best cost/performance; V4 imminent |
| Qwen (Alibaba) | qwen3.5-plus / qwen3-max | qwen-turbo | qwen3-max-thinking | Strongest Chinese language support |
| ChatGLM (Zhipu) | glm-5 | glm-4-flash | — | GLM-5 is 744B MoE; free tier on glm-4-flash |
| MiniMax | MiniMax-M2.5 | MiniMax-M2.5-Lightning | — | Open-weight, strong coding (80.2% SWE-Bench) |
| Kimi (Moonshot) | kimi-k2.5 | kimi-k2.5 | — | 256K context, strong coding |
| Ollama (local) | qwen3.5 / llama4 | qwen3.5:9b | — | Fully offline, no API key |

Provider Details

OpenAI

The most battle-tested option. OpenAI models have the best native function calling (tool-calling) support, which directly impacts agent reliability. The GPT-5 family (August 2025+) is a major generational leap over GPT-4. Recommended models:
  • Main: gpt-5.4 (latest flagship, Mar 2026 — built-in computer use) or o3 (best reasoning accuracy)
  • Fast: gpt-5-mini (0.25/0.25/2.00 per MTok) or gpt-5-nano (cheapest at 0.05/0.05/0.40 per MTok)
  • Legacy: gpt-4.1 (still in API, 1M context, good for coding) — retired from ChatGPT Feb 2026
Reasoning notes:
  • Set LLM_REASONING_EFFORT=medium — works natively with o-series and GPT-5.x models.
  • The o-series requires max_completion_tokens instead of max_tokens, which LiteLLM handles automatically.
  • GPT-5.x does not support reasoning_effort combined with tool-calling in /v1/chat/completions — FIM One silently drops it during agent tool-use steps so workflows run uninterrupted.
  • GPT-5.x only supports temperature=1 — FIM One handles this automatically via LiteLLM’s parameter filtering (drop_params).
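These per-model adjustments can be sketched as a parameter filter (illustrative only — FIM One delegates this to LiteLLM, and the model-name prefix checks here are simplified assumptions):

```python
def filter_openai_params(model: str, params: dict, has_tools: bool) -> dict:
    """Apply the GPT-5.x / o-series request quirks described above."""
    out = dict(params)
    if model.startswith(("o3", "o4", "gpt-5")):
        # o-series and GPT-5.x expect max_completion_tokens, not max_tokens
        if "max_tokens" in out:
            out["max_completion_tokens"] = out.pop("max_tokens")
    if model.startswith("gpt-5"):
        out["temperature"] = 1  # GPT-5.x only accepts temperature=1
        if has_tools:
            # reasoning_effort is incompatible with tool-calling here: drop it
            out.pop("reasoning_effort", None)
    return out
```

For example, `filter_openai_params("gpt-5.4", {"max_tokens": 1024, "reasoning_effort": "medium"}, has_tools=True)` renames the token limit, pins the temperature, and drops the reasoning parameter.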
| Model | Input $/MTok | Output $/MTok | Context |
| --- | --- | --- | --- |
| gpt-5.4 | $2.50 | $15.00 | 272K |
| o3 | $2.00 | $8.00 | 200K |
| o4-mini | $1.10 | $4.40 | 200K |
| gpt-5-mini | $0.25 | $2.00 | — |
| gpt-5-nano | $0.05 | $0.40 | — |
# .env — OpenAI (production with reasoning)
LLM_API_KEY=sk-...
LLM_BASE_URL=https://api.openai.com/v1
LLM_MODEL=gpt-5.4
FAST_LLM_MODEL=gpt-5-nano
LLM_REASONING_EFFORT=medium
# .env — OpenAI (budget reasoning)
LLM_API_KEY=sk-...
LLM_BASE_URL=https://api.openai.com/v1
LLM_MODEL=o3
FAST_LLM_MODEL=gpt-5-nano
LLM_REASONING_EFFORT=medium

Anthropic (Claude)

Claude excels at nuanced reasoning and complex multi-step tasks. FIM One connects via LiteLLM, which routes Anthropic models through their native API automatically. The current generation is Claude 4.6 (February 2026). Recommended models:
  • Main: claude-sonnet-4-6 (best balance of capability and cost — 3/3/15 per MTok)
  • Fast: claude-haiku-4-5 (fast and cheap — 1/1/5 per MTok)
  • Premium: claude-opus-4-6 (most capable, 128K max output — 5/5/25 per MTok)
Base URL: https://api.anthropic.com/v1/
All current Claude models support extended thinking and have a 200K context window (1M in beta).
Reasoning: Set LLM_REASONING_EFFORT=medium — LiteLLM routes Anthropic models through the native API, so reasoning_content (extended thinking) is fully returned and visible in the UI “thinking” step. When extended thinking is enabled, Anthropic requires temperature=1 — set LLM_TEMPERATURE=1 in your .env or model configuration. See Extended Thinking for details.
# .env — Anthropic Claude
LLM_API_KEY=sk-ant-...
LLM_BASE_URL=https://api.anthropic.com/v1/
LLM_MODEL=claude-sonnet-4-6
FAST_LLM_MODEL=claude-haiku-4-5
LLM_REASONING_EFFORT=medium

Google Gemini

Gemini models offer strong performance at competitive pricing via Google’s OpenAI-compatible endpoint. The 3.x generation (late 2025+) is a major leap — Gemini 3 Flash outperforms 2.5 Pro while being 3x faster. Recommended models:
  • Stable (GA): gemini-2.5-pro (main) + gemini-2.5-flash (fast) — production-ready
  • Latest (Preview): gemini-3.1-pro-preview (main) + gemini-3-flash-preview (fast) — best performance, but preview status
Base URL: https://generativelanguage.googleapis.com/v1beta/openai/
Reasoning: reasoning_effort is supported on the compatibility endpoint — set LLM_REASONING_EFFORT=medium and it works out of the box.
| Model | Input $/MTok | Output $/MTok | Status |
| --- | --- | --- | --- |
| gemini-3.1-pro-preview | $2.00 | $12.00 | Preview |
| gemini-3-flash-preview | $0.50 | $3.00 | Preview |
| gemini-2.5-pro | $1.25 | $10.00 | Stable GA |
| gemini-2.5-flash | $0.30 | $2.50 | Stable GA |
| gemini-2.5-flash-lite | $0.10 | $0.40 | Stable GA |
# .env — Gemini (stable)
LLM_API_KEY=AIza...
LLM_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai/
LLM_MODEL=gemini-2.5-pro
FAST_LLM_MODEL=gemini-2.5-flash
LLM_REASONING_EFFORT=medium
# .env — Gemini (latest preview)
LLM_API_KEY=AIza...
LLM_BASE_URL=https://generativelanguage.googleapis.com/v1beta/openai/
LLM_MODEL=gemini-3.1-pro-preview
FAST_LLM_MODEL=gemini-3-flash-preview
LLM_REASONING_EFFORT=medium

DeepSeek

DeepSeek offers the best cost/performance ratio in the market. V3.2 (December 2025) unified the chat and reasoning lineages into a single model, with incredibly low pricing. Model IDs (both backed by V3.2):
  • deepseek-chat — general purpose (non-thinking mode)
  • deepseek-reasoner — chain-of-thought reasoning mode, returns reasoning_content
Base URL: https://api.deepseek.com
Pricing: 0.28/0.28/0.42 per MTok (cache hit: $0.028) — by far the cheapest frontier-class API.
V4 is imminent (March 2026): trillion-parameter multimodal model with 1M context window. Expect new model IDs when it launches.
# .env — DeepSeek (budget-friendly)
LLM_API_KEY=sk-...
LLM_BASE_URL=https://api.deepseek.com
LLM_MODEL=deepseek-chat
FAST_LLM_MODEL=deepseek-chat
# .env — DeepSeek (with reasoning)
LLM_API_KEY=sk-...
LLM_BASE_URL=https://api.deepseek.com
LLM_MODEL=deepseek-reasoner
FAST_LLM_MODEL=deepseek-chat

Chinese Domestic Models

All major Chinese model providers expose OpenAI-compatible endpoints. These are particularly strong for Chinese-language tasks and offer competitive local pricing.

Qwen / 通义千问 (Alibaba Cloud)

Qwen 3.5 (February 2026) is the latest generation — the 397B MoE flagship outperforms GPT-5.2 on MMLU-Pro.
  • Base URL: https://dashscope.aliyuncs.com/compatible-mode/v1
  • International: https://dashscope-intl.aliyuncs.com/compatible-mode/v1
  • Main: qwen3.5-plus (flagship, 1M context) or qwen3-max (trillion-param)
  • Fast: qwen-turbo (fast and cheap)
  • Reasoning: qwen3-max-thinking (comparable to GPT-5.2-Thinking)
# .env — Qwen
LLM_API_KEY=sk-...
LLM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
LLM_MODEL=qwen3.5-plus
FAST_LLM_MODEL=qwen-turbo

ChatGLM / 智谱

GLM-5 (2026) is the latest flagship — 744B total params (40B active), approaching Claude Opus-level on coding/agent tasks.
  • Base URL: https://open.bigmodel.cn/api/paas/v4
  • Main: glm-5 (flagship)
  • Fast: glm-4-flash (free tier available!)
Some HTTP clients auto-append /v1 to base URLs. Zhipu uses /v4 — ensure your client does not force an OpenAI-style path suffix or you’ll get 404 errors.
# .env — ChatGLM
LLM_API_KEY=...
LLM_BASE_URL=https://open.bigmodel.cn/api/paas/v4
LLM_MODEL=glm-5
FAST_LLM_MODEL=glm-4-flash

MiniMax

MiniMax M2.5 (February 2026) is open-weight and scores 80.2% on SWE-Bench.
  • Base URL (China): https://api.minimaxi.com/v1
  • Base URL (Global): https://api.minimax.io
  • Main: MiniMax-M2.5
  • Fast: MiniMax-M2.5-Lightning
# .env — MiniMax
LLM_API_KEY=...
LLM_BASE_URL=https://api.minimaxi.com/v1
LLM_MODEL=MiniMax-M2.5
FAST_LLM_MODEL=MiniMax-M2.5-Lightning

Kimi / 月之暗面 (Moonshot)

Kimi K2.5 (January 2026) has 256K context and strong coding performance (76.8% SWE-Bench among open-source models).
  • Base URL: https://api.moonshot.ai/v1
  • Model: kimi-k2.5
# .env — Kimi
LLM_API_KEY=...
LLM_BASE_URL=https://api.moonshot.ai/v1
LLM_MODEL=kimi-k2.5
FAST_LLM_MODEL=kimi-k2.5

Local Models (Ollama)

Run models entirely on your own hardware — no API key needed, fully offline. Ollama exposes an OpenAI-compatible endpoint out of the box. The open-source landscape has changed dramatically — Qwen 3.5, Llama 4, and GPT-OSS (OpenAI’s first open-weight models) are all available.
Base URL: http://localhost:11434/v1
Recommended models by VRAM:
| VRAM | Main LLM | Fast LLM | Notes |
| --- | --- | --- | --- |
| 8 GB | qwen3.5:9b / gemma3:4b | qwen3.5:4b | Qwen 3.5 9B is the standout at this tier |
| 16 GB | gpt-oss:20b / deepseek-r1:14b | qwen3.5:9b | GPT-OSS 20B is agent-optimized |
| 24 GB | qwen3:32b / deepseek-r1:32b | qwen3.5:9b | Qwen 3 32B is best for tool-calling |
| 48 GB+ | llama3.3:70b / gpt-oss:120b | qwen3.5:14b | Near-frontier quality |
Best for tool-calling: Qwen 3/3.5 (32B+), GLM-4.7, GPT-OSS, Mistral — these have explicit function-calling training.
Tool-calling quality varies significantly across local models: not all of them reliably generate valid function calls, so test your chosen model with agent workflows before using it in production. As a general rule, 14B parameters is the minimum for reliable tool calling, and 32B+ is strongly preferred for agent tasks.
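One cheap smoke test is to check that the arguments a local model emits for a tool call actually parse as JSON and carry the required parameters — a minimal validator sketch (the get_weather tool and its city field are made up for illustration):

```python
import json

def valid_tool_args(raw_args: str, required: set) -> bool:
    """True if a model-emitted arguments string is valid JSON
    containing every required parameter."""
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError:
        return False
    return isinstance(args, dict) and required <= args.keys()

# A well-formed call for a hypothetical get_weather(city) tool passes...
print(valid_tool_args('{"city": "Paris"}', {"city"}))   # True
# ...while truncated JSON, common with small models, fails
print(valid_tool_args('{"city": "Par', {"city"}))       # False
```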
# .env — Ollama (balanced, 16GB VRAM)
LLM_API_KEY=ollama
LLM_BASE_URL=http://localhost:11434/v1
LLM_MODEL=gpt-oss:20b
FAST_LLM_MODEL=qwen3.5:9b
LLM_CONTEXT_SIZE=32768
LLM_MAX_OUTPUT_TOKENS=8192
# .env — Ollama (agent-optimized, 24GB VRAM)
LLM_API_KEY=ollama
LLM_BASE_URL=http://localhost:11434/v1
LLM_MODEL=qwen3:32b
FAST_LLM_MODEL=qwen3.5:9b
LLM_CONTEXT_SIZE=32768
LLM_MAX_OUTPUT_TOKENS=8192

Third-Party Relay Platforms

Many users access multiple model providers through a single relay (proxy) service. FIM One automatically detects the correct API protocol based on URL path patterns — just fill in the LLM_BASE_URL and it works.

How It Works

When your base URL points to a third-party relay, FIM One inspects the URL path to determine which protocol to use:
| URL Path Contains | Detected Protocol | Auth Header | Key Benefit |
| --- | --- | --- | --- |
| /v1 (or no match) | OpenAI compatible | Authorization: Bearer | Universal fallback, works with most relays |
| /claude or /anthropic | Anthropic native | x-api-key | Full reasoning_content (extended thinking) support |
| /gemini | Google native | x-goog-api-key | Native Gemini parameter translation |
Resolution order: Explicit DB provider field > domain match (official APIs) > URL path hint (relay platforms) > OpenAI compatible fallback.
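The path-hint stage of that resolution order can be sketched as follows (illustrative only — the real resolver also checks the explicit DB provider field and official API domains first):

```python
from urllib.parse import urlparse

def detect_protocol(base_url: str) -> str:
    """Map a relay base URL to a protocol using the path hints above."""
    path = urlparse(base_url).path.lower()
    if "/claude" in path or "/anthropic" in path:
        return "anthropic"   # native /v1/messages, x-api-key auth
    if "/gemini" in path:
        return "google"      # native Gemini API, x-goog-api-key auth
    return "openai"          # universal fallback (covers /v1 and no match)

print(detect_protocol("https://relay.example.com/anthropic"))  # anthropic
print(detect_protocol("https://relay.example.com/gemini"))     # google
print(detect_protocol("https://relay.example.com/v1"))         # openai
```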

Example: One Relay, Three Protocols

With a single relay account, you can access different providers by simply changing the base URL path:
# .env — Claude via relay (Anthropic native protocol)
LLM_API_KEY=your-relay-key
LLM_BASE_URL=https://relay.example.com/anthropic
LLM_MODEL=claude-sonnet-4-6
# .env — Gemini via relay (Google native protocol)
LLM_API_KEY=your-relay-key
LLM_BASE_URL=https://relay.example.com/gemini
LLM_MODEL=gemini-2.5-pro
# .env — GPT via relay (OpenAI compatible protocol)
LLM_API_KEY=your-relay-key
LLM_BASE_URL=https://relay.example.com/v1
LLM_MODEL=gpt-5.4
No extra configuration needed — authentication headers, parameter formats, and response parsing all switch automatically.

Step-by-Step: How Path Detection Works

Here’s a concrete example showing what happens internally when you configure a relay:
# .env — Claude via a relay platform
LLM_API_KEY=your-relay-key
LLM_BASE_URL=https://my-relay.example.com/claude
LLM_MODEL=claude-sonnet-4-6
LLM_REASONING_EFFORT=medium
  1. FIM One sees /claude in the URL path → detects Anthropic native protocol
  2. Model is prefixed as anthropic/claude-sonnet-4-6 for LiteLLM routing
  3. Requests use Anthropic’s /v1/messages format with x-api-key auth header
  4. reasoning_effort=medium is translated to Anthropic’s native thinking parameter (not OpenAI’s reasoning_effort)
If the same relay URL were https://my-relay.example.com/v1 instead, the /claude hint would be missing — FIM One would fall back to OpenAI-compatible protocol, sending /v1/chat/completions requests to a Claude-native endpoint, which would fail. The URL path matters.

Why This Matters

  • Anthropic native endpoint gives you proper reasoning_content support (extended thinking visible in the UI), correct tool-calling format, and x-api-key authentication — features lost when using OpenAI-compatible translation.
  • Google native endpoint gives you native Gemini parameters and x-goog-api-key authentication.
  • OpenAI compatible is the universal fallback and works with any relay, but provider-specific features (like extended thinking output) may be unavailable.
If your relay platform uses non-standard path conventions (e.g., no /claude or /anthropic in the URL), FIM One falls back to OpenAI compatible protocol — which works for most use cases. For full native protocol support, you can set the provider field explicitly via the admin model configuration UI.

Configuration Strategy

Main vs Fast: When to Split

  • Split when your main model is expensive or slow (e.g., gpt-5.4 + gpt-5-nano). DAG mode runs many parallel steps — using a cheaper fast model saves significant cost.
  • Same model when your model is already cheap (e.g., deepseek-chat for both). The overhead of managing two models isn’t worth it.
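To see why splitting pays off, here is a rough cost estimate using the OpenAI prices from the table above (the step count and token sizes are made-up illustrative numbers):

```python
# $/MTok prices from the OpenAI pricing table above
main_in, main_out = 2.50, 15.00   # gpt-5.4
fast_in, fast_out = 0.05, 0.40    # gpt-5-nano

# Hypothetical DAG run: 50 steps, ~4K input / 1K output tokens each
steps, tok_in, tok_out = 50, 4_000, 1_000
cost_main = steps * (tok_in * main_in + tok_out * main_out) / 1e6
cost_fast = steps * (tok_in * fast_in + tok_out * fast_out) / 1e6
print(f"all steps on gpt-5.4:  ${cost_main:.2f}")  # $1.25
print(f"steps on gpt-5-nano:   ${cost_fast:.2f}")  # $0.03
```

Under these assumptions, routing DAG steps to the fast model cuts per-run cost by roughly 40x while the main model still handles planning.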

When to Enable Reasoning

  • Enable for complex analytical tasks, multi-step planning, and tasks requiring careful judgment
  • Disable (default) for routine tasks, simple Q&A, and cost-sensitive deployments
  • Reasoning typically increases cost 2-5x per request — medium effort is a good starting point

Context Window Sizing

Set LLM_CONTEXT_SIZE to match your model’s actual window:
| Model | Context Window |
| --- | --- |
| GPT-5.4 | 272K |
| o3 / o4-mini | 200K |
| Claude Sonnet 4.6 | 200K (1M beta) |
| Gemini 2.5 Pro | 1M |
| Gemini 3.1 Pro | 1M |
| DeepSeek V3.2 | 128K |
| Qwen 3.5 Plus | 1M |
| Local (Ollama) | 4K–128K (varies) |
For local models, set both LLM_CONTEXT_SIZE and LLM_MAX_OUTPUT_TOKENS explicitly — defaults assume cloud-scale context windows that local models cannot support.
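As a quick sanity check, the 16 GB Ollama settings above leave this much room for the prompt (assuming the output budget is reserved out of the context window):

```python
context_size = 32_768       # LLM_CONTEXT_SIZE
max_output_tokens = 8_192   # LLM_MAX_OUTPUT_TOKENS

# Assumption: the output budget is carved out of the total window
prompt_budget = context_size - max_output_tokens
print(prompt_budget)  # 24576 tokens left for system prompt, history, and tools
```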