FIM One provides a full-featured admin UI for managing LLM providers and models. This guide covers how to add providers, configure individual models, tune advanced structured-output settings, and organize models into groups for one-click switching. For ENV-based configuration (no admin UI), see Environment Variables. For model selection recommendations, see Recommended Models.

Architecture: Provider, Model, Group

FIM One organizes LLM configuration in three tiers:
| Tier | What it represents | Example |
|------|--------------------|---------|
| Provider | A set of shared credentials (API key + base URL). One provider can host many models. | “My OpenAI Account”, “Company Bedrock Relay” |
| Model | An individual model under a provider. Has its own display name, API model identifier, and advanced settings. | “GPT-4o”, “Claude Sonnet 4.6” |
| Model Group | A named preset that assigns models to roles (General / Fast / Reasoning). Activating a group switches all roles at once. | “Production (OpenAI)”, “Budget (DeepSeek)” |
Provider: "My OpenAI Account"
  ├── Model: "GPT-4o"         (model_name: gpt-4o)
  ├── Model: "GPT-5 Nano"     (model_name: gpt-5-nano)
  └── Model: "o3"             (model_name: o3)

Provider: "Anthropic Direct"
  ├── Model: "Claude Sonnet"   (model_name: claude-sonnet-4-6)
  └── Model: "Claude Haiku"    (model_name: claude-haiku-4-5)

Group: "Production"
  ├── General → GPT-4o
  ├── Fast    → GPT-5 Nano
  └── Reasoning → o3

Adding a Provider

1. Open the Models page
   Navigate to Admin (sidebar) and select the Models tab.

2. Click Add Provider
   Click the Add Provider button in the top-right area of the Providers section.

3. Select a preset or use a custom endpoint
   The dialog shows preset buttons for common providers: OpenAI, Anthropic (Claude), Google Gemini, DeepSeek, Mistral AI, and OpenAI Compatible (custom endpoint). Clicking a preset auto-fills the provider name and base URL. Choose OpenAI Compatible if your provider is not listed (e.g., a third-party relay, Ollama, or any other OpenAI-compatible endpoint).

4. Enter credentials
   Fill in the required fields:
   • Provider Name — A friendly label (e.g., “My OpenAI Account”). This is for your reference only.
   • Base URL — The API endpoint. Presets fill this automatically. For custom endpoints, enter the full URL (e.g., http://localhost:11434/v1 for Ollama).
   • API Key — Your provider’s API key. For local models (Ollama), enter any non-empty string (e.g., ollama).

5. Save
   Click Create. The provider appears in the list, ready for you to add models under it.
You can create multiple providers for the same service. For example, two “OpenAI” providers with different API keys for separate billing accounts, or an “Anthropic (Direct)” and “Anthropic (via Bedrock)” with different base URLs.
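Before saving a custom endpoint, it can help to confirm the base URL and key actually work. The sketch below (a hypothetical helper, not part of FIM One) queries the standard /models route that OpenAI-compatible endpoints such as Ollama expose; the returned identifiers are what you would enter later as the Model Name (API) field:

```python
import json
import urllib.request

# Placeholders for the values entered in the provider dialog.
BASE_URL = "http://localhost:11434/v1"   # e.g. a local Ollama endpoint
API_KEY = "ollama"                       # any non-empty string for Ollama

def list_models(base_url, api_key):
    """GET /models on an OpenAI-compatible endpoint.
    Returns the list of model identifiers, or None if the endpoint
    is unreachable or rejects the credentials."""
    req = urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            data = json.load(resp)
        # OpenAI-compatible servers wrap the list in a "data" array.
        return [m["id"] for m in data.get("data", [])]
    except OSError:
        return None  # connection refused, timeout, or HTTP error
```

Note that Anthropic's native API is not OpenAI-compatible, so this check only applies to presets that use an OpenAI-style base URL.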

Adding a Model

1. Expand a provider
   On the Models page, click the chevron next to an existing provider to expand it and see its models.

2. Click Add Model
   Click the Add Model button that appears under the expanded provider.

3. Enter model details
   Fill in the two required fields:
   • Display Name — A human-readable name shown in the UI (e.g., “GPT-4o”, “Claude Sonnet”). It can be anything you like.
   • Model Name (API) — The exact model identifier sent to the API (e.g., gpt-4o, claude-sonnet-4-6, deepseek-chat). This must match what your provider expects.

4. Configure advanced settings (optional)
   Click the Advanced toggle to reveal additional settings: Max Output Tokens, Context Size, Temperature, Native Function Calling, and JSON Mode. See the Advanced Settings section below for details on each.

5. Save
   Click Create. The model appears under its provider and is now available for assignment to model groups.

Advanced Settings

Each model has advanced settings that control how FIM One interacts with the provider’s API for structured output extraction. These settings are found under the Advanced toggle in the model create/edit dialog.

Native Function Calling

Setting name: Native Function Calling (stored as tool_choice_enabled)
Default: ON

Controls whether FIM One uses forced tool_choice for structured output extraction. This is Level 1 in the structured output degradation chain — the most reliable method when the model supports it.

When to disable:
  • Your model returns errors like "tool_choice 'specified' is incompatible with thinking enabled" — common with always-on thinking models (DeepSeek R1, Kimi K2.5)
  • Structured output requests are consistently slow, incurring a ~10-second penalty per call before falling back to JSON Mode anyway

Effect when disabled: FIM One skips Level 1 (native function calling) and starts from Level 2 (JSON Mode) for structured output. The ReAct agent’s tool calling is completely unaffected — it uses tool_choice="auto", which works with all models regardless of this setting.
This setting only affects forced tool selection used for structured output extraction (DAG planning, schema annotation). It does not affect the ReAct agent, which freely decides when to call tools using tool_choice="auto".
For technical details, see LLM Provider Compatibility — tool_choice_enabled.

JSON Mode

Setting name: JSON Mode (stored as json_mode_enabled)
Default: ON

Controls whether FIM One uses response_format=json_object for structured output. This is Level 2 in the degradation chain.

When to disable:
  • Your provider rejects assistant message prefill — primarily AWS Bedrock relays, which throw "This model does not support assistant message prefill"

Effect when disabled: FIM One skips Level 2 (JSON Mode) and falls to Level 3 (plain text extraction). Modern models produce valid JSON from prompt instructions alone, so there is typically no quality loss. For technical details, see LLM Provider Compatibility — json_mode_enabled.
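The two toggles above map onto a three-level degradation chain. The sketch below illustrates the try-then-fall-back logic under assumed invoke semantics; the function and its parameters are illustrative, not FIM One's actual internals:

```python
def structured_call(llm, prompt, schema,
                    tool_choice_enabled=True, json_mode_enabled=True):
    """Illustrative three-level degradation chain for structured output.
    Levels are tried in order; each per-model toggle skips its level."""
    if tool_choice_enabled:
        try:
            # Level 1: force a tool call (most reliable when supported).
            return llm.invoke(prompt, tools=[schema], tool_choice="required")
        except Exception:
            pass  # e.g. thinking-always-on models reject forced tool_choice
    if json_mode_enabled:
        try:
            # Level 2: request a JSON object via response_format.
            return llm.invoke(prompt, response_format={"type": "json_object"})
        except Exception:
            pass  # e.g. Bedrock relays that reject assistant prefill
    # Level 3: plain-text extraction, relying on prompt instructions alone.
    return llm.invoke(prompt)
```

Disabling a toggle simply removes that level from the chain, which is why turning off Native Function Calling avoids the wasted Level 1 attempt on incompatible models.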

Temperature

Default: 0.7 (inherited from the global setting if left unset)

Controls the randomness of model output. Range: 0 (deterministic) to 2 (highly creative).
When reasoning/extended thinking is enabled for Anthropic models, temperature is automatically forced to 1.0 by the system. You do not need to set this manually.

Max Output Tokens

The maximum number of tokens the model can generate in a single response. Leave blank to use the system default (64,000). For local models with limited VRAM, set this explicitly to a lower value (e.g., 8192).

Context Size

The model’s context window size in tokens. Leave blank to use the system default (128,000). Set this to match your model’s actual capability — for local models, this is often 4K-32K depending on the model and available memory.
Most models work correctly with the default settings (both toggles ON). Only adjust them when you encounter errors or unnecessary latency.

The tables below cover common providers and models. Data is sourced from UniAPI capability tags and verified against runtime behavior as of 2026-03-22. Model capabilities change frequently — if you encounter errors, check your provider’s latest documentation.

Quick Rules

  • Native FC ON for models with function calling support (most modern models)
  • Native FC OFF for thinking-always-on models that reject forced tool_choice
  • JSON Mode ON for most models (safe default)
  • JSON Mode OFF only for AWS Bedrock relays (prefill rejection)

Per-Provider Configuration Matrix

OpenAI
| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|-------|------|---------|------------|-----------|-----------|-------|
| gpt-5.4 | General | 1,050K | 128K | ON | ON | Function calling + structured output + reasoning |
| gpt-5.4-mini | Fast | 400K | 128K | ON | ON | Function calling + structured output + reasoning |
| o3-pro | Reasoning | 200K | 100K | ON | ON | Reasoning model; FC works with auto-disabled thinking |
Anthropic (Claude)
| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|-------|------|---------|------------|-----------|-----------|-------|
| claude-sonnet-4-6 | General | 1,000K | 64K | ON | ON | Function calling + reasoning; thinking auto-disabled for FC |
| claude-haiku-4-5 | Fast | 200K | 64K | ON | ON | Function calling supported |
| claude-opus-4-6 | Reasoning | 1,000K | 128K | ON | ON | Function calling + reasoning; thinking auto-disabled for FC |
Google Gemini
| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|-------|------|---------|------------|-----------|-----------|-------|
| gemini-3-pro-preview | General | 1,048K | 65K | ON | ON | Full support (UniAPI tags incomplete — Gemini natively supports FC) |
| gemini-2.5-pro | Fast | 1,048K | 65K | ON | ON | Full support |
| gemini-3.1-pro-preview | Reasoning | 1,048K | 65K | ON | ON | Full support |
DeepSeek
| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|-------|------|---------|------------|-----------|-----------|-------|
| deepseek-v3.2 | General | 164K | 64K | ON | ON | FC supported (UniAPI tags incomplete) |
| deepseek-chat | Fast | 64K | 8K | ON | ON | Basic chat model; FC supported |
| deepseek-reasoner | Reasoning | 164K | 164K | OFF | ON | Thinking always-on; forced tool_choice may be rejected |
xAI (Grok)
| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|-------|------|---------|------------|-----------|-----------|-------|
| grok-4-1-fast-non-reasoning | General | 2,000K | 2,000K | ON | ON | Function calling + structured output |
| grok-3-mini-fast | Fast | 131K | 131K | ON | ON | Function calling + structured output + reasoning |
| grok-4-1-fast-reasoning | Reasoning | 2,000K | 2,000K | ON | ON | Function calling + structured output + reasoning |
Qwen (Alibaba Cloud)
| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|-------|------|---------|------------|-----------|-----------|-------|
| qwen3.5-plus | General | 1,000K | 64K | ON | ON | Function calling + structured output |
| qwen-turbo-latest | Fast | 1,000K | 16K | ON | ON | FC likely supported (UniAPI tags incomplete) |
| qwq-plus | Reasoning | 128K | 8K | ON | ON | Reasoning + function calling (thinking may be toggleable) |
Zhipu (GLM)
| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|-------|------|---------|------------|-----------|-----------|-------|
| glm-4.7 | General | 200K | — | ON | ON | Function calling + structured output + reasoning |
| glm-4.7-flashx | Fast | 200K | — | ON | ON | Function calling + structured output + reasoning |
| glm-5 | Reasoning | 200K | — | ON | ON | Function calling + structured output + reasoning |
Moonshot (Kimi)
| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|-------|------|---------|------------|-----------|-----------|-------|
| kimi-k2.5 | General | 262K | — | OFF | ON | Thinking always-on; forced tool_choice rejected (400 error) |
| kimi-k2 | Fast | 131K | — | ON | ON | Non-thinking; native FC works (verified in production) |
| kimi-k2-thinking | Reasoning | 63K | — | OFF | ON | Thinking always-on; forced tool_choice rejected |
MiniMax
| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|-------|------|---------|------------|-----------|-----------|-------|
| MiniMax-M2.5 | General | 205K | — | ON | ON | Function calling + structured output (verified in production) |
| MiniMax-M2.5-highspeed | Fast | 205K | — | ON | ON | Function calling + structured output (verified in production) |
| MiniMax-M1 | Reasoning | — | — | ON | ON | Function calling + structured output |
ByteDance (Doubao)
| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|-------|------|---------|------------|-----------|-----------|-------|
| doubao-seed-2-0-pro | General | 256K | 128K | ON | ON | Function calling + structured output + reasoning |
| doubao-seed-1-6 | Fast | 256K | 32K | ON | ON | Function calling + structured output + reasoning |
| doubao-seed-1-6 | Reasoning | 256K | 32K | ON | ON | Supports reasoning_effort (minimal/low/medium/high) |
Meta (Llama)
| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|-------|------|---------|------------|-----------|-----------|-------|
| llama-3.3-70b | General | 131K | 131K | ON | ON | FC depends on hosting provider; try defaults first |
“—” in Max Output means the provider did not report a limit. In practice, these models typically support 4K-16K output tokens. Set Max Output Tokens explicitly in the model’s Advanced settings if you need a specific value.
How to diagnose: check your application logs for warnings matching structured_llm_call: native_fc call raised. If you see these warnings followed by a successful JSON Mode extraction, the model does not benefit from native function calling. Disable Native Function Calling for that model to eliminate the wasted API call and the ~10-second latency penalty per structured output request.
Model capabilities change frequently as providers update their APIs. The recommendations above are based on data from 2026-03-22 (UniAPI capability tags + production runtime verification). If a model that previously worked starts returning errors, check the provider’s changelog for breaking changes.

Model Groups

Model groups let you assign models to specific roles and switch between configurations with a single click.

Roles

FIM One uses three model roles. Each role serves a different purpose in the execution pipeline:
| Role | Used for | Recommendation |
|------|----------|----------------|
| General | Planning, analysis, ReAct agent, complex reasoning | Your most capable model (e.g., gpt-4o, claude-sonnet-4-6) |
| Fast | DAG step execution, context compaction | Optimized for speed and cost (e.g., gpt-5-nano, deepseek-chat). Falls back to General if not assigned. |
| Reasoning | Tasks requiring deep analysis — complex planning, mathematical proofs, multi-step logic | A strong reasoning model (e.g., o3, deepseek-reasoner). Falls back to General if not assigned. |
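The per-role fallback rule is simple: an unassigned role resolves to the group's General model. A minimal sketch, assuming a group is represented as a role-to-model mapping (the function name and data shape are illustrative):

```python
def resolve_model(role, group):
    """Illustrative role resolution for a model group: Fast and Reasoning
    fall back to the group's General model when they have no assignment."""
    assigned = group.get(role)
    if assigned is not None:
        return assigned
    # May itself be None, in which case the ENV fallback (see below) applies.
    return group.get("general")

# Example: a group with no Fast assignment.
production = {"general": "gpt-4o", "fast": None, "reasoning": "o3"}
# resolve_model("fast", production) -> "gpt-4o"
```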

Creating a Model Group

1. Open the Groups section
   On the Admin > Models page, scroll to the Model Groups section.

2. Click Add Group
   Click the Add Group button.

3. Name the group
   Enter a descriptive name (e.g., “Production (OpenAI)”, “Budget (DeepSeek)”, “Local Dev”).

4. Assign models to roles
   For each role (General, Fast, Reasoning), select a model from the dropdown. The dropdown shows all active models from active providers, grouped by provider name. You can leave a role unassigned — it will fall back to the General model (or to ENV-configured models if General is also unassigned).

5. Save
   Click Create. The group is now available for activation.

Activating a Group

To activate a model group, use the dropdown or activation control on the Models page. Only one group can be active at a time. Activating a group immediately applies its model assignments to all new conversations. To deactivate the current group (falling back to ENV-configured models), select the deactivate option.
Switching the active model group affects all new conversations system-wide. Existing in-progress conversations continue using whichever model was active when they started.

ENV Fallback

When no admin-configured model group is active, FIM One falls back to ENV-based configuration:
| Role | ENV variable |
|------|--------------|
| General | LLM_MODEL |
| Fast | FAST_LLM_MODEL (falls back to LLM_MODEL) |
| Reasoning | REASONING_LLM_MODEL (falls back to LLM_MODEL) |
Admin-configured models always take priority over ENV variables. The system health check considers both sources — as long as either an active model group or valid ENV variables are configured, the LLM subsystem reports healthy. For the full ENV reference, see Environment Variables.
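The ENV fallback in the table above can be sketched as follows; only the variable names come from the table, while the function itself is illustrative:

```python
import os

def env_model(role):
    """Illustrative ENV resolution: FAST_LLM_MODEL and REASONING_LLM_MODEL
    each fall back to LLM_MODEL when unset or empty."""
    var = {
        "general": "LLM_MODEL",
        "fast": "FAST_LLM_MODEL",
        "reasoning": "REASONING_LLM_MODEL",
    }[role]
    # An empty string is treated the same as an unset variable.
    return os.environ.get(var) or os.environ.get("LLM_MODEL")
```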

Export and Import

The Models page supports exporting your entire provider and model configuration (providers, models, and groups) as a JSON file, and importing it on another instance. This is useful for:
  • Migrating configuration between development, staging, and production environments
  • Sharing a known-good model setup with team members
  • Backing up your configuration before making changes
Exported configuration does not include API keys. After importing, you must edit each provider to enter the appropriate API key.
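As a rough illustration, an export might look like the JSON below. The field names are hypothetical (the actual schema may differ); the point is that providers, models, and group role assignments are present while API keys are not:

```json
{
  "providers": [
    {
      "name": "My OpenAI Account",
      "base_url": "https://api.openai.com/v1",
      "models": [
        {
          "display_name": "GPT-4o",
          "model_name": "gpt-4o",
          "tool_choice_enabled": true,
          "json_mode_enabled": true
        }
      ]
    }
  ],
  "groups": [
    {
      "name": "Production",
      "general": "GPT-4o",
      "fast": "GPT-5 Nano",
      "reasoning": "o3"
    }
  ]
}
```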