
FIM One provides a full-featured admin UI for managing LLM providers and models. This guide covers how to add providers, configure individual models, tune advanced structured-output settings, and organize models into groups for one-click switching. For ENV-based configuration (no admin UI), see Environment Variables. For model selection recommendations, see Recommended Models.

Architecture: Provider, Model, Group

FIM One organizes LLM configuration in three tiers:
| Tier | What it represents | Example |
|---|---|---|
| Provider | A set of shared credentials (API key + base URL). One provider can host many models. | “My OpenAI Account”, “Company Bedrock Relay” |
| Model | An individual model under a provider. Has its own display name, API model identifier, and advanced settings. | “GPT-4o”, “Claude Sonnet 4.6” |
| Model Group | A named preset that assigns models to roles (General / Fast / Reasoning). Activating a group switches all roles at once. | “Production (OpenAI)”, “Budget (DeepSeek)” |
Provider: "My OpenAI Account"
  ├── Model: "GPT-4o"         (model_name: gpt-4o)
  ├── Model: "GPT-5 Nano"     (model_name: gpt-5-nano)
  └── Model: "o3"             (model_name: o3)

Provider: "Anthropic Direct"
  ├── Model: "Claude Sonnet"   (model_name: claude-sonnet-4-6)
  └── Model: "Claude Haiku"    (model_name: claude-haiku-4-5)

Group: "Production"
  ├── General → GPT-4o
  ├── Fast    → GPT-5 Nano
  └── Reasoning → o3

Adding a Provider

1. Open the Models page. Navigate to Admin (sidebar) and select the Models tab.

2. Click Add Provider. The button is in the top-right area of the Providers section.

3. Select a preset or use a custom endpoint. The dialog shows preset buttons for common providers: OpenAI, Anthropic (Claude), Google Gemini, DeepSeek, Mistral AI, and OpenAI Compatible (custom endpoint). Clicking a preset auto-fills the provider name and base URL. Choose OpenAI Compatible if your provider is not listed (e.g., a third-party relay, Ollama, or any other OpenAI-compatible endpoint).

4. Enter credentials. Fill in the required fields:
  • Provider Name — A friendly label (e.g., “My OpenAI Account”). This is for your reference only.
  • Base URL — The API endpoint. Presets fill this automatically. For custom endpoints, enter the full URL (e.g., http://localhost:11434/v1 for Ollama).
  • API Key — Your provider’s API key. For local models (Ollama), enter any non-empty string (e.g., ollama).

5. Save. Click Create. The provider appears in the list, ready for you to add models under it.
You can create multiple providers for the same service. For example, two “OpenAI” providers with different API keys for separate billing accounts, or an “Anthropic (Direct)” and “Anthropic (via Bedrock)” with different base URLs.
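Before saving, you can sanity-check a Base URL and API key pair from outside FIM One. The sketch below is ours (not a FIM One feature): it calls the standard OpenAI-compatible GET /models endpoint, which Ollama and most compatible relays also expose.

```python
import json
import urllib.request

def models_url(base_url: str) -> str:
    """Build the OpenAI-compatible model-list endpoint from a provider Base URL."""
    return base_url.rstrip("/") + "/models"

def list_models(base_url: str, api_key: str) -> list[str]:
    """Return the model IDs the provider reports (requires network access)."""
    req = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return [m["id"] for m in json.load(resp)["data"]]

# Example with a local Ollama endpoint (any non-empty key works):
# list_models("http://localhost:11434/v1", "ollama")
```

If the call returns a model list, the same credentials should work when entered in the Add Provider dialog.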

Adding a Model

1. Expand a provider. On the Models page, click the chevron next to an existing provider to expand it and see its models.

2. Click Add Model. The button appears under the expanded provider.

3. Enter model details. Fill in the two required fields:
  • Display Name — A human-readable name shown in the UI (e.g., “GPT-4o”, “Claude Sonnet”). Can be anything you like.
  • Model Name (API) — The exact model identifier sent to the API (e.g., gpt-4o, claude-sonnet-4-6, deepseek-chat). This must match what your provider expects.

4. Configure advanced settings (optional). Click the Advanced toggle to reveal additional settings: Max Output Tokens, Context Size, Temperature, Native Function Calling, and JSON Mode. See the Advanced Settings section below for details on each.

5. Save. Click Create. The model appears under its provider and is now available for assignment to model groups.
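You can likewise verify a Model Name (API) outside FIM One before saving it. This sketch (ours, using only the standard OpenAI-compatible chat completions endpoint) sends one tiny request with the exact identifier you plan to enter:

```python
import json
import urllib.request

def chat_payload(model_name: str, prompt: str) -> dict:
    """Minimal OpenAI-compatible chat completion body; `model` must be the
    exact Model Name (API), e.g. gpt-4o or deepseek-chat."""
    return {
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 8,
    }

def ping_model(base_url: str, api_key: str, model_name: str) -> str:
    """Send one tiny completion and return the reply text (network required)."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(chat_payload(model_name, "ping")).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

A 404 or "model not found" response here means the identifier will also fail inside FIM One.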

Advanced Settings

Each model has advanced settings that control how FIM One interacts with the provider’s API for structured output extraction. These settings are found under the Advanced toggle in the model create/edit dialog.

Native Function Calling

Setting name: Native Function Calling (stored as tool_choice_enabled)
Default: ON

Controls whether FIM One uses forced tool_choice for structured output extraction. This is Level 1 in the structured output degradation chain — the most reliable method when the model supports it.

When to disable:
  • Your model returns errors like "tool_choice 'specified' is incompatible with thinking enabled" — common with always-on thinking models (DeepSeek R1, Kimi K2.5)
  • Structured output requests consistently incur a ~10-second penalty per call, only to fall back to JSON Mode anyway
Effect when disabled: FIM One skips Level 1 (native function calling) and starts from Level 2 (JSON Mode) for structured output. The ReAct agent’s tool calling is completely unaffected — it uses tool_choice="auto", which works with all models regardless of this setting.
This setting only affects forced tool selection used for structured output extraction (DAG planning, schema annotation). It does not affect the ReAct agent, which freely decides when to call tools using tool_choice="auto".
For technical details, see LLM Provider Compatibility — tool_choice_enabled.

JSON Mode

Setting name: JSON Mode (stored as json_mode_enabled)
Default: ON

Controls whether FIM One uses response_format=json_object for structured output. This is Level 2 in the degradation chain.

When to disable:
  • Your provider rejects assistant message prefill — primarily AWS Bedrock relays, which throw "This model does not support assistant message prefill"
Effect when disabled: FIM One skips Level 2 (JSON Mode) and falls to Level 3 (plain text extraction). Modern models produce valid JSON from prompt instructions alone, so there is typically no quality loss. For technical details, see LLM Provider Compatibility — json_mode_enabled.
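Taken together, the two toggles define which levels of the degradation chain a model's structured-output calls will attempt. A minimal sketch of that selection logic (an illustration, not FIM One's actual implementation):

```python
def degradation_chain(tool_choice_enabled: bool = True,
                      json_mode_enabled: bool = True) -> list[str]:
    """Ordered extraction methods for one model, per its Advanced toggles."""
    chain = []
    if tool_choice_enabled:
        chain.append("native_fc")    # Level 1: forced tool_choice
    if json_mode_enabled:
        chain.append("json_mode")    # Level 2: response_format=json_object
    chain.append("plain_text")       # Level 3: prompt-only JSON, always available
    return chain

# Always-on thinking model (e.g., DeepSeek R1): skip Level 1
# degradation_chain(tool_choice_enabled=False) == ["json_mode", "plain_text"]
```

Level 3 is unconditional, which is why disabling both toggles still yields working (if less constrained) structured output.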

Temperature

Default: 0.7 (inherited from the global setting if left unset)

Controls the randomness of model output. Range: 0 (deterministic) to 2 (highly creative).
When reasoning/extended thinking is enabled for Anthropic models, temperature is automatically forced to 1.0 by the system. You do not need to set this manually.

Max Output Tokens

The maximum number of tokens the model can generate in a single response. Leave blank to use the system default (64,000). For local models with limited VRAM, set this explicitly to a lower value (e.g., 8192).

Context Size

The model’s context window size in tokens. Leave blank to use the system default (128,000). Set this to match your model’s actual capability — for local models, this is often 4K-32K depending on the model and available memory.
Most models work correctly with the default settings (both toggles ON). Only adjust when you encounter errors or unnecessary latency.

The table below covers common providers and models. Data sourced from UniAPI capability tags and verified against runtime behavior as of 2026-03-22. Model capabilities change frequently — if you encounter errors, check your provider’s latest documentation.

Quick Rules

  • Native FC ON for models with function calling support (most modern models)
  • Native FC OFF for thinking-always-on models that reject forced tool_choice
  • JSON Mode ON for most models (safe default)
  • JSON Mode OFF only for AWS Bedrock relays (prefill rejection)

Per-Provider Configuration Matrix

OpenAI

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| gpt-5.4 | General | 1,050K | 128K | ON | ON | Function calling + structured output + reasoning |
| gpt-5.4-mini | Fast | 400K | 128K | ON | ON | Function calling + structured output + reasoning |
| o3-pro | Reasoning | 200K | 100K | ON | ON | Reasoning model; FC works with auto-disabled thinking |

Anthropic (Claude)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| claude-sonnet-4-6 | General | 1,000K | 64K | ON | ON | Function calling + reasoning; thinking auto-disabled for FC |
| claude-haiku-4-5 | Fast | 200K | 64K | ON | ON | Function calling supported |
| claude-opus-4-6 | Reasoning | 1,000K | 128K | ON | ON | Function calling + reasoning; thinking auto-disabled for FC |

Google Gemini

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| gemini-3.1-pro-preview | General | 1,048K | 65K | ON | ON | Latest preview; successor to deprecated gemini-3-pro-preview |
| gemini-2.5-pro | Fast | 1,048K | 65K | ON | ON | Stable GA; production-ready |
| gemini-3.1-pro-preview | Reasoning | 1,048K | 65K | ON | ON | Thinking support with configurable thinking_level |

DeepSeek

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| deepseek-chat | General | 128K | 8K | ON | ON | V3.2 non-thinking mode; FC + JSON mode supported |
| deepseek-chat | Fast | 128K | 8K | ON | ON | Same model as General; only two official API model IDs exist |
| deepseek-reasoner | Reasoning | 128K | 64K | OFF | ON | Thinking always-on; forced tool_choice rejected; 64K includes CoT |

xAI (Grok)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| grok-4-1-fast-non-reasoning | General | 2,000K | 30K | ON | ON | Function calling + structured output |
| grok-3-mini-fast | Fast | 131K | 131K | ON | ON | Function calling + structured output + reasoning; 131K is shared context budget |
| grok-4-1-fast-reasoning | Reasoning | 2,000K | 30K | ON | ON | Function calling + structured output + reasoning |

Qwen (Alibaba Cloud)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| qwen3.5-plus | General | 1,000K | 64K | ON | ON | Function calling + structured output |
| qwen-turbo-latest | Fast | 1,000K | 16K | ON | ON | FC likely supported (UniAPI tags incomplete) |
| qwq-plus | Reasoning | 131K | 16K | ON | ON | Reasoning + function calling; thinking toggleable via enable_thinking |

Zhipu (GLM)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| glm-4.7 | General | 200K | 65K | OFF | ON | Forced tool_choice not supported (auto only); strong coding |
| glm-4.7-flashx | Fast | 200K | 65K | OFF | ON | Higher throughput variant; free glm-4.7-flash also available |
| glm-5 | Reasoning | 200K | 65K | OFF | ON | 745B MoE flagship; built-in reasoning (no API toggle) |

Moonshot (Kimi)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| kimi-k2.5 | General | 262K | 65K | OFF | ON | FC works but forced tool_choice rejected when thinking is on (default) |
| kimi-k2 | Fast | 131K | 32K | ON | ON | Non-thinking; native FC works (verified in production) |
| kimi-k2-thinking | Reasoning | 131K | — | OFF | ON | Thinking always-on; forced tool_choice rejected |

MiniMax

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| MiniMax-M2.7 | General | 205K | 131K | ON | ON | Latest (Mar 2026); function calling + structured output |
| MiniMax-M2.5 | Fast | 197K | 65K | ON | ON | Function calling + structured output; cheaper cache read ($0.03/MTok) |
| MiniMax-M2.7-highspeed | Fast (speed) | 205K | 131K | ON | ON | 2x throughput (~100 tok/s), 2x cost |
| MiniMax-M2.5-highspeed | Fast (speed) | 197K | 65K | ON | ON | 2x throughput (~100 tok/s), 2x cost |

ByteDance (Doubao)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| doubao-seed-2-0-pro | General | 256K | 128K | ON | ON | Function calling + structured output + reasoning |
| doubao-seed-1-6 | Fast | 256K | 16K | ON | ON | Function calling + structured output + reasoning |
| doubao-seed-1-6 | Reasoning | 256K | 16K | ON | ON | Supports reasoning_effort (minimal/low/medium/high) |

Meta (Llama)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| llama-3.3-70b | General | 131K | 16K | ON | ON | FC + JSON mode depend on hosting provider; max output varies (2K–16K) |
“—” in Max Output means the provider did not report a limit. In practice, these models typically support 4K-16K output tokens. Set Max Output Tokens explicitly in the model’s Advanced settings if you need a specific value.
How to diagnose: Check your application logs for warnings matching structured_llm_call: native_fc call raised. If you see these warnings followed by successful JSON Mode extraction, the model does not benefit from native function calling. Disable Native Function Calling for that model to eliminate the wasted API call and the ~10-second latency penalty per structured output request.
Model capabilities change frequently as providers update their APIs. The recommendations above are based on data from 2026-03-26 (UniAPI capability tags + production runtime verification). If a model that previously worked starts returning errors, check the provider’s changelog for breaking changes.

Model Groups

Model groups let you assign models to specific roles and switch between configurations with a single click.

Roles

FIM One uses three model roles. Each role serves a different purpose in the execution pipeline:
| Role | Used for | Recommendation |
|---|---|---|
| General | Planning, analysis, ReAct agent, DAG step execution (default) | Your most capable model (e.g., gpt-4o, claude-sonnet-4-6) |
| Fast | model_hint="fast" DAG steps, context compaction, history summary | Optimized for speed and cost (e.g., gpt-5-nano, deepseek-chat). Falls back to General if not assigned. |
| Reasoning | model_hint="reasoning" DAG steps, domain-escalated ReAct (legal/medical/financial) | A strong reasoning model (e.g., o3, deepseek-reasoner). Falls back to General if not assigned. |

Creating a Model Group

1. Open the Groups section. On the Admin > Models page, scroll to the Model Groups section.

2. Click Add Group.

3. Name the group. Enter a descriptive name (e.g., “Production (OpenAI)”, “Budget (DeepSeek)”, “Local Dev”).

4. Assign models to roles. For each role (General, Fast, Reasoning), select a model from the dropdown. The dropdown shows all active models from active providers, grouped by provider name. You can leave a role unassigned — it will fall back to the General model (or to ENV-configured models if General is also unassigned).

5. Save. Click Create. The group is now available for activation.

Activating a Group

To activate a model group, use the dropdown or activation control on the Models page. Only one group can be active at a time. Activating a group immediately applies its model assignments to all new conversations. To deactivate the current group (falling back to ENV-configured models), select the deactivate option.
Switching the active model group affects all new conversations system-wide. Existing in-progress conversations continue using whichever model was active when they started.

Domain-Aware Model Escalation

When the auto-router detects a specialist domain — legal, medical, or financial — the system automatically escalates model selection beyond the normal role assignments:
  • ReAct mode: The general model is replaced by the reasoning model (registry.get_by_role("reasoning")). This means the Reasoning slot in your Model Group is not only used for DAG model_hint="reasoning" steps — it also serves as the escalation target for domain-specific ReAct tasks.
  • DAG mode: Domain context is injected into the planner prompt, guiding it to assign model_hint="reasoning" to steps requiring specialist accuracy.
This escalation is automatic and requires no configuration beyond having a Reasoning model assigned in your active Model Group (or via the REASONING_LLM_MODEL env var). Related environment variables:
| Variable | Default | Description |
|---|---|---|
| DAG_CITATION_VERIFICATION | true | Enable post-step citation verification for legal/medical/financial content. Extracts citations via regex and verifies accuracy via LLM judgment. |
| DAG_STRUCTURED_CONTEXT_MULTIPLIER | 3.0 | Truncation budget multiplier for structured content (citations, tables, code blocks) in DAG dependency context. Higher values preserve more structured data between steps. |
If your workload involves legal, medical, or financial queries, ensure your Reasoning model is a strong reasoner (e.g., o3, claude-opus-4-6, deepseek-reasoner). The automatic escalation relies on this slot being populated with a model that can handle domain-critical accuracy requirements.

ENV Fallback

When no admin-configured model group is active, FIM One falls back to ENV-based configuration:
| Role | ENV variable |
|---|---|
| General | LLM_MODEL |
| Fast | FAST_LLM_MODEL (falls back to LLM_MODEL) |
| Reasoning | REASONING_LLM_MODEL (falls back to LLM_MODEL) |
Admin-configured models always take priority over ENV variables. The system health check considers both sources — as long as either an active model group or valid ENV variables are configured, the LLM subsystem reports healthy. For the full ENV reference, see Environment Variables.

Export and Import

The Models page supports exporting your entire provider and model configuration (providers, models, and groups) as a JSON file, and importing it on another instance. This is useful for:
  • Migrating configuration between development, staging, and production environments
  • Sharing a known-good model setup with team members
  • Backing up your configuration before making changes
Exported configuration does not include API keys. After importing, you must edit each provider to enter the appropriate API key.
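Since keys are never exported, a configuration file is safe to commit or share. If you want to double-check one before sending it, here is an illustrative guard; the api_key field name is an assumption about the export schema, not documented behavior:

```python
import json

def assert_no_secrets(export_text: str) -> None:
    """Fail if an exported config appears to contain a populated key-like field.
    The field-name heuristic ("api_key") is a guess; adjust to the real schema."""
    def walk(node):
        if isinstance(node, dict):
            for k, v in node.items():
                # Empty key fields are fine; non-empty ones look like a leak.
                assert "api_key" not in k.lower() or not v, f"possible secret in field {k!r}"
                walk(v)
        elif isinstance(node, list):
            for item in node:
                walk(item)
    walk(json.loads(export_text))
```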