
FIM One provides a full-featured admin UI for managing LLM providers and models. This guide covers how to add providers, configure individual models, tune advanced structured-output settings, and organize models into groups for one-click switching. For ENV-based configuration (no admin UI), see Environment Variables. For model selection recommendations, see Recommended Models.

Architecture: Provider, Model, Group

FIM One organizes LLM configuration in three tiers:
| Tier | What it represents | Example |
|---|---|---|
| Provider | A set of shared credentials (API key + base URL). One provider can host many models. | “My OpenAI Account”, “Company Bedrock Relay” |
| Model | An individual model under a provider. Has its own display name, API model identifier, and advanced settings. | “GPT-4o”, “Claude Sonnet 4.6” |
| Model Group | A named preset that assigns models to roles (General / Fast / Reasoning). Activating a group switches all roles at once. | “Production (OpenAI)”, “Budget (DeepSeek)” |
Provider: "My OpenAI Account"
  ├── Model: "GPT-4o"         (model_name: gpt-4o)
  ├── Model: "GPT-5 Nano"     (model_name: gpt-5-nano)
  └── Model: "o3"             (model_name: o3)

Provider: "Anthropic Direct"
  ├── Model: "Claude Sonnet"   (model_name: claude-sonnet-4-6)
  └── Model: "Claude Haiku"    (model_name: claude-haiku-4-5)

Group: "Production"
  ├── General → GPT-4o
  ├── Fast    → GPT-5 Nano
  └── Reasoning → o3

Adding a Provider

1. Open the Models page. Navigate to Admin (sidebar) and select the Models tab.

2. Click Add Provider. The button is in the top-right area of the Providers section.

3. Select a preset or use a custom endpoint. The dialog shows preset buttons for common providers: OpenAI, Anthropic (Claude), Google Gemini, DeepSeek, Mistral AI, and OpenAI Compatible (custom endpoint). Clicking a preset auto-fills the provider name and base URL. Choose OpenAI Compatible if your provider is not listed (e.g., a third-party relay, Ollama, or any other OpenAI-compatible endpoint).

4. Enter credentials. Fill in the required fields:
  • Provider Name — A friendly label (e.g., “My OpenAI Account”). This is for your reference only.
  • Base URL — The API endpoint. Presets fill this automatically. For custom endpoints, enter the full URL (e.g., http://localhost:11434/v1 for Ollama).
  • API Key — Your provider’s API key. For local models (Ollama), enter any non-empty string (e.g., ollama).

5. Save. Click Create. The provider appears in the list, ready for you to add models under it.
You can create multiple providers for the same service. For example, two “OpenAI” providers with different API keys for separate billing accounts, or an “Anthropic (Direct)” and “Anthropic (via Bedrock)” with different base URLs.
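Before saving, you can sanity-check a Base URL and API key pair from outside FIM One. The sketch below is ours (not a FIM One feature): it calls the standard OpenAI-compatible GET /models endpoint, which Ollama and most compatible relays also expose.

```python
import json
import urllib.request

def models_url(base_url: str) -> str:
    """Build the OpenAI-compatible model-list endpoint from a provider Base URL."""
    return base_url.rstrip("/") + "/models"

def list_models(base_url: str, api_key: str) -> list[str]:
    """Return the model IDs the provider reports (requires network access)."""
    req = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return [m["id"] for m in json.load(resp)["data"]]

# Example with a local Ollama endpoint (any non-empty key works):
# list_models("http://localhost:11434/v1", "ollama")
```

If the call returns a model list, the same credentials should work when entered in the Add Provider dialog.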

Adding a Model

1. Expand a provider. On the Models page, click the chevron next to an existing provider to expand it and see its models.

2. Click Add Model. The button appears under the expanded provider.

3. Enter model details. Fill in the two required fields:
  • Display Name — A human-readable name shown in the UI (e.g., “GPT-4o”, “Claude Sonnet”). Can be anything you like.
  • Model Name (API) — The exact model identifier sent to the API (e.g., gpt-4o, claude-sonnet-4-6, deepseek-chat). This must match what your provider expects.

4. Configure advanced settings (optional). Click the Advanced toggle to reveal additional settings: Max Output Tokens, Context Size, Temperature, Native Function Calling, and JSON Mode. See the Advanced Settings section below for details on each.

5. Save. Click Create. The model appears under its provider and is now available for assignment to model groups.
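You can likewise verify a Model Name (API) outside FIM One before saving it. This sketch (ours, using only the standard OpenAI-compatible chat completions endpoint) sends one tiny request with the exact identifier you plan to enter:

```python
import json
import urllib.request

def chat_payload(model_name: str, prompt: str) -> dict:
    """Minimal OpenAI-compatible chat completion body; `model` must be the
    exact Model Name (API), e.g. gpt-4o or deepseek-chat."""
    return {
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 8,
    }

def ping_model(base_url: str, api_key: str, model_name: str) -> str:
    """Send one tiny completion and return the reply text (network required)."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(chat_payload(model_name, "ping")).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

A 404 or "model not found" response here means the identifier will also fail inside FIM One.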

Advanced Settings

Each model has advanced settings that control how FIM One interacts with the provider’s API for structured output extraction. These settings are found under the Advanced toggle in the model create/edit dialog.

Native Function Calling

Setting name: Native Function Calling (stored as tool_choice_enabled)
Default: ON

Controls whether FIM One uses forced tool_choice for structured output extraction. This is Level 1 in the structured output degradation chain — the most reliable method when the model supports it.

When to disable:
  • Your model returns errors like "tool_choice 'specified' is incompatible with thinking enabled" — common with always-on thinking models (DeepSeek R1, Kimi K2.5)
  • Structured output requests consistently incur a ~10-second penalty per call, only to fall back to JSON Mode anyway
Effect when disabled: FIM One skips Level 1 (native function calling) and starts from Level 2 (JSON Mode) for structured output. The ReAct agent’s tool calling is completely unaffected — it uses tool_choice="auto", which works with all models regardless of this setting.
This setting only affects forced tool selection used for structured output extraction (DAG planning, schema annotation). It does not affect the ReAct agent, which freely decides when to call tools using tool_choice="auto".
For technical details, see LLM Provider Compatibility — tool_choice_enabled.

JSON Mode

Setting name: JSON Mode (stored as json_mode_enabled)
Default: ON

Controls whether FIM One uses response_format=json_object for structured output. This is Level 2 in the degradation chain.

When to disable:
  • Your provider rejects assistant message prefill — primarily AWS Bedrock relays, which throw "This model does not support assistant message prefill"
Effect when disabled: FIM One skips Level 2 (JSON Mode) and falls to Level 3 (plain text extraction). Modern models produce valid JSON from prompt instructions alone, so there is typically no quality loss. For technical details, see LLM Provider Compatibility — json_mode_enabled.
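Taken together, the two toggles define which levels of the degradation chain a model's structured-output calls will attempt. A minimal sketch of that selection logic (an illustration, not FIM One's actual implementation):

```python
def degradation_chain(tool_choice_enabled: bool = True,
                      json_mode_enabled: bool = True) -> list[str]:
    """Ordered extraction methods for one model, per its Advanced toggles."""
    chain = []
    if tool_choice_enabled:
        chain.append("native_fc")    # Level 1: forced tool_choice
    if json_mode_enabled:
        chain.append("json_mode")    # Level 2: response_format=json_object
    chain.append("plain_text")       # Level 3: prompt-only JSON, always available
    return chain

# Always-on thinking model (e.g., DeepSeek R1): skip Level 1
# degradation_chain(tool_choice_enabled=False) == ["json_mode", "plain_text"]
```

Level 3 is unconditional, which is why disabling both toggles still yields working (if less constrained) structured output.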

Temperature

Default: 0.7 (inherited from the global setting if left unset)

Controls the randomness of model output. Range: 0 (deterministic) to 2 (highly creative).
When reasoning/extended thinking is enabled for Anthropic models, temperature is automatically forced to 1.0 by the system. You do not need to set this manually.

Max Output Tokens

The maximum number of tokens the model can generate in a single response. Leave blank to use the system default (64,000). For local models with limited VRAM, set this explicitly to a lower value (e.g., 8192).

Context Size

The model’s context window size in tokens. Leave blank to use the system default (128,000). Set this to match your model’s actual capability — for local models, this is often 4K-32K depending on the model and available memory.
Most models work correctly with the default settings (both toggles ON). Only adjust when you encounter errors or unnecessary latency.

The table below covers common providers and models. Data sourced from UniAPI capability tags and verified against runtime behavior as of 2026-03-22. Model capabilities change frequently — if you encounter errors, check your provider’s latest documentation.

Quick Rules

  • Native FC ON for models with function calling support (most modern models)
  • Native FC OFF for thinking-always-on models that reject forced tool_choice
  • JSON Mode ON for most models (safe default)
  • JSON Mode OFF only for AWS Bedrock relays (prefill rejection)

Per-Provider Configuration Matrix

OpenAI

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| gpt-5.4 | General | 1,050K | 128K | ON | ON | Function calling + structured output + reasoning |
| gpt-5.4-mini | Fast | 400K | 128K | ON | ON | Function calling + structured output + reasoning |
| o3-pro | Reasoning | 200K | 100K | ON | ON | Reasoning model; FC works with auto-disabled thinking |

Anthropic (Claude)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| claude-sonnet-4-6 | General | 1,000K | 64K | ON | ON | Function calling + reasoning; thinking auto-disabled for FC |
| claude-haiku-4-5 | Fast | 200K | 64K | ON | ON | Function calling supported |
| claude-opus-4-6 | Reasoning | 1,000K | 128K | ON | ON | Function calling + reasoning; thinking auto-disabled for FC |

Google Gemini

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| gemini-3.1-pro-preview | General | 1,048K | 65K | ON | ON | Latest preview; successor to deprecated gemini-3-pro-preview |
| gemini-2.5-pro | Fast | 1,048K | 65K | ON | ON | Stable GA; production-ready |
| gemini-3.1-pro-preview | Reasoning | 1,048K | 65K | ON | ON | Thinking support with configurable thinking_level |

DeepSeek

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| deepseek-chat | General | 128K | 8K | ON | ON | V3.2 non-thinking mode; FC + JSON mode supported |
| deepseek-chat | Fast | 128K | 8K | ON | ON | Same model as General; only two official API model IDs exist |
| deepseek-reasoner | Reasoning | 128K | 64K | OFF | ON | Thinking always-on; forced tool_choice rejected; 64K includes CoT |

xAI (Grok)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| grok-4-1-fast-non-reasoning | General | 2,000K | 30K | ON | ON | Function calling + structured output |
| grok-3-mini-fast | Fast | 131K | 131K | ON | ON | Function calling + structured output + reasoning; 131K is shared context budget |
| grok-4-1-fast-reasoning | Reasoning | 2,000K | 30K | ON | ON | Function calling + structured output + reasoning |

Qwen (Alibaba Cloud)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| qwen3.5-plus | General | 1,000K | 64K | ON | ON | Function calling + structured output |
| qwen-turbo-latest | Fast | 1,000K | 16K | ON | ON | FC likely supported (UniAPI tags incomplete) |
| qwq-plus | Reasoning | 131K | 16K | ON | ON | Reasoning + function calling; thinking toggleable via enable_thinking |

Zhipu (GLM)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| glm-4.7 | General | 200K | 65K | OFF | ON | Forced tool_choice not supported (auto only); strong coding |
| glm-4.7-flashx | Fast | 200K | 65K | OFF | ON | Higher throughput variant; free glm-4.7-flash also available |
| glm-5 | Reasoning | 200K | 65K | OFF | ON | 745B MoE flagship; built-in reasoning (no API toggle) |

Moonshot (Kimi)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| kimi-k2.5 | General | 262K | 65K | OFF | ON | FC works but forced tool_choice rejected when thinking is on (default) |
| kimi-k2 | Fast | 131K | 32K | ON | ON | Non-thinking; native FC works (verified in production) |
| kimi-k2-thinking | Reasoning | 131K | — | OFF | ON | Thinking always-on; forced tool_choice rejected |

MiniMax

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| MiniMax-M2.7 | General | 205K | 131K | ON | ON | Latest (Mar 2026); function calling + structured output |
| MiniMax-M2.5 | Fast | 197K | 65K | ON | ON | Function calling + structured output; cheaper cache read ($0.03/MTok) |
| MiniMax-M2.7-highspeed | Fast (speed) | 205K | 131K | ON | ON | 2x throughput (~100 tok/s), 2x cost |
| MiniMax-M2.5-highspeed | Fast (speed) | 197K | 65K | ON | ON | 2x throughput (~100 tok/s), 2x cost |

ByteDance (Doubao)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| doubao-seed-2-0-pro | General | 256K | 128K | ON | ON | Function calling + structured output + reasoning |
| doubao-seed-1-6 | Fast | 256K | 16K | ON | ON | Function calling + structured output + reasoning |
| doubao-seed-1-6 | Reasoning | 256K | 16K | ON | ON | Supports reasoning_effort (minimal/low/medium/high) |

Meta (Llama)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| llama-3.3-70b | General | 131K | 16K | ON | ON | FC + JSON mode depend on hosting provider; max output varies (2K–16K) |
“—” in Max Output means the provider did not report a limit. In practice, these models typically support 4K-16K output tokens. Set Max Output Tokens explicitly in the model’s Advanced settings if you need a specific value.
How to diagnose: Check your application logs for warnings matching structured_llm_call: native_fc call raised. If you see these warnings followed by successful JSON Mode extraction, the model does not benefit from native function calling. Disable Native Function Calling for that model to eliminate the wasted API call and the ~10-second latency penalty per structured output request.
Model capabilities change frequently as providers update their APIs. The recommendations above are based on data from 2026-03-26 (UniAPI capability tags + production runtime verification). If a model that previously worked starts returning errors, check the provider’s changelog for breaking changes.

Model Groups

Model groups let you assign models to specific roles and switch between configurations with a single click.

Roles

FIM One uses three model roles. Each role serves a different purpose in the execution pipeline:
| Role | Used for | Recommendation |
|---|---|---|
| General | Planning, analysis, ReAct agent, DAG step execution (default) | Your most capable model (e.g., gpt-4o, claude-sonnet-4-6) |
| Fast | model_hint="fast" DAG steps, context compaction, history summary | Optimized for speed and cost (e.g., gpt-5-nano, deepseek-chat). Falls back to General if not assigned. |
| Reasoning | model_hint="reasoning" DAG steps, domain-escalated ReAct (legal/medical/financial) | A strong reasoning model (e.g., o3, deepseek-reasoner). Falls back to General if not assigned. |

Creating a Model Group

1. Open the Groups section. On the Admin > Models page, scroll to the Model Groups section.

2. Click Add Group.

3. Name the group. Enter a descriptive name (e.g., “Production (OpenAI)”, “Budget (DeepSeek)”, “Local Dev”).

4. Assign models to roles. For each role (General, Fast, Reasoning), select a model from the dropdown. The dropdown shows all active models from active providers, grouped by provider name. You can leave a role unassigned — it will fall back to the General model (or to ENV-configured models if General is also unassigned).

5. Save. Click Create. The group is now available for activation.

Activating a Group

To activate a model group, use the dropdown or activation control on the Models page. Only one group can be active at a time. Activating a group immediately applies its model assignments to all new conversations. To deactivate the current group (falling back to ENV-configured models), select the deactivate option.
Switching the active model group affects all new conversations system-wide. Existing in-progress conversations continue using whichever model was active when they started.

Domain-Aware Model Escalation

When the auto-router detects a specialist domain — legal, medical, or financial — the system automatically escalates model selection beyond the normal role assignments:
  • ReAct mode: The general model is replaced by the reasoning model (registry.get_by_role("reasoning")). This means the Reasoning slot in your Model Group is not only used for DAG model_hint="reasoning" steps — it also serves as the escalation target for domain-specific ReAct tasks.
  • DAG mode: Domain context is injected into the planner prompt, guiding it to assign model_hint="reasoning" to steps requiring specialist accuracy.
This escalation is automatic and requires no configuration beyond having a Reasoning model assigned in your active Model Group (or via the REASONING_LLM_MODEL env var). Related environment variables:
| Variable | Default | Description |
|---|---|---|
| DAG_CITATION_VERIFICATION | true | Enable post-step citation verification for legal/medical/financial content. Extracts citations via regex and verifies accuracy via LLM judgment. |
| DAG_STRUCTURED_CONTEXT_MULTIPLIER | 3.0 | Truncation budget multiplier for structured content (citations, tables, code blocks) in DAG dependency context. Higher values preserve more structured data between steps. |
If your workload involves legal, medical, or financial queries, ensure your Reasoning model is a strong reasoner (e.g., o3, claude-opus-4-6, deepseek-reasoner). The automatic escalation relies on this slot being populated with a model that can handle domain-critical accuracy requirements.

ENV Fallback

When no admin-configured model group is active, FIM One falls back to ENV-based configuration:
| Role | ENV variable |
|---|---|
| General | LLM_MODEL |
| Fast | FAST_LLM_MODEL (falls back to LLM_MODEL) |
| Reasoning | REASONING_LLM_MODEL (falls back to LLM_MODEL) |
Admin-configured models always take priority over ENV variables. The system health check considers both sources — as long as either an active model group or valid ENV variables are configured, the LLM subsystem reports healthy. For the full ENV reference, see Environment Variables.

Export and Import

The Models page supports exporting your entire provider and model configuration (providers, models, and groups) as a JSON file, and importing it on another instance. This is useful for:
  • Migrating configuration between development, staging, and production environments
  • Sharing a known-good model setup with team members
  • Backing up your configuration before making changes
Exported configuration does not include API keys. After importing, you must edit each provider to enter the appropriate API key.
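Since keys are never exported, a configuration file is safe to commit or share. If you want to double-check one before sending it, here is an illustrative guard; the api_key field name is an assumption about the export schema, not documented behavior:

```python
import json

def assert_no_secrets(export_text: str) -> None:
    """Fail if an exported config appears to contain a populated key-like field.
    The field-name heuristic ("api_key") is a guess; adjust to the real schema."""
    def walk(node):
        if isinstance(node, dict):
            for k, v in node.items():
                # Empty key fields are fine; non-empty ones look like a leak.
                assert "api_key" not in k.lower() or not v, f"possible secret in field {k!r}"
                walk(v)
        elif isinstance(node, list):
            for item in node:
                walk(item)
    walk(json.loads(export_text))
```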