Architecture: Provider, Model, Group
FIM One organizes LLM configuration in three tiers:

| Tier | What it represents | Example |
|---|---|---|
| Provider | A set of shared credentials (API key + base URL). One provider can host many models. | “My OpenAI Account”, “Company Bedrock Relay” |
| Model | An individual model under a provider. Has its own display name, API model identifier, and advanced settings. | “GPT-4o”, “Claude Sonnet 4.6” |
| Model Group | A named preset that assigns models to roles (General / Fast / Reasoning). Activating a group switches all roles at once. | “Production (OpenAI)”, “Budget (DeepSeek)” |
Adding a Provider
Select a preset or use a custom endpoint
The dialog shows preset buttons for common providers: OpenAI, Anthropic (Claude), Google Gemini, DeepSeek, Mistral AI, and OpenAI Compatible (custom endpoint). Clicking a preset auto-fills the provider name and base URL.

Choose OpenAI Compatible if your provider is not listed (e.g., a third-party relay, Ollama, or any other OpenAI-compatible endpoint).
Enter credentials
Fill in the required fields:
- Provider Name — A friendly label (e.g., “My OpenAI Account”). This is for your reference only.
- Base URL — The API endpoint. Presets fill this automatically. For custom endpoints, enter the full URL (e.g., http://localhost:11434/v1 for Ollama).
- API Key — Your provider’s API key. For local models (Ollama), enter any non-empty string (e.g., ollama).
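The rules above can be sketched as a small validation routine. This is a hypothetical illustration, not FIM One's actual code; the function and field names are assumptions.

```python
def validate_provider(name: str, base_url: str, api_key: str) -> list:
    """Return a list of problems; an empty list means the entry looks usable."""
    problems = []
    if not name.strip():
        problems.append("Provider Name must not be empty")
    if not base_url.startswith(("http://", "https://")):
        problems.append("Base URL must be a full URL, e.g. http://localhost:11434/v1")
    if not api_key.strip():
        # Local models (e.g. Ollama) still require a non-empty placeholder key
        problems.append("API Key must be a non-empty string (e.g. 'ollama')")
    return problems

print(validate_provider("Local Ollama", "http://localhost:11434/v1", "ollama"))  # []
```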
Adding a Model
Expand a provider
On the Models page, click the chevron next to an existing provider to expand it and see its models.
Enter model details
Fill in the two required fields:
- Display Name — A human-readable name shown in the UI (e.g., “GPT-4o”, “Claude Sonnet”). Can be anything you like.
- Model Name (API) — The exact model identifier sent to the API (e.g., gpt-4o, claude-sonnet-4-6, deepseek-chat). This must match what your provider expects.
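The distinction matters because only the API identifier goes on the wire. A minimal sketch, assuming the standard OpenAI-compatible chat payload shape (field names in the model record are illustrative):

```python
# Display Name is UI-only; Model Name (API) is what appears in the request.
model = {"display_name": "GPT-4o", "api_name": "gpt-4o"}

# Standard OpenAI-compatible chat request; FIM One's internal shape may differ.
payload = {
    "model": model["api_name"],  # must match what the provider expects
    "messages": [{"role": "user", "content": "Hello"}],
}
print(payload["model"])  # gpt-4o
```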
Configure advanced settings (optional)
Click the Advanced toggle to reveal additional settings: Max Output Tokens, Context Size, Temperature, Native Function Calling, and JSON Mode. See the Advanced Settings section below for details on each.
Advanced Settings
Each model has advanced settings that control how FIM One interacts with the provider’s API for structured output extraction. These settings are found under the Advanced toggle in the model create/edit dialog.

Native Function Calling
Setting name: Native Function Calling (stored as tool_choice_enabled)
Default: ON
Controls whether FIM One uses forced tool_choice for structured output extraction. This is Level 1 in the structured output degradation chain — the most reliable method when the model supports it.
When to disable:
- Your model returns errors like "tool_choice 'specified' is incompatible with thinking enabled" — common with always-on thinking models (DeepSeek R1, Kimi K2.5)
- Structured output requests are consistently slow, with a ~10-second penalty per call followed by a fallback to JSON Mode anyway

This setting only affects forced tool selection used for structured output extraction (DAG planning, schema annotation). It does not affect the ReAct agent, which freely decides when to call tools using tool_choice="auto"; auto tool choice works with all models regardless of this setting.

JSON Mode
Setting name: JSON Mode (stored as json_mode_enabled)
Default: ON
Controls whether FIM One uses response_format=json_object for structured output. This is Level 2 in the degradation chain.
When to disable:
- Your provider rejects assistant message prefill — primarily AWS Bedrock relays, which throw "This model does not support assistant message prefill"
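Taken together, the two toggles select a level in the degradation chain described above. A hedged sketch of the selection logic — Levels 1 and 2 follow this section; the function name, the "extract" tool name, and the plain-prompting last resort are assumptions:

```python
def structured_output_params(tool_choice_enabled: bool, json_mode_enabled: bool) -> dict:
    """Pick request parameters for a structured output extraction call."""
    if tool_choice_enabled:
        # Level 1: force a specific tool call (most reliable when supported).
        # "extract" is a placeholder tool name.
        return {"tool_choice": {"type": "function", "function": {"name": "extract"}}}
    if json_mode_enabled:
        # Level 2: request a raw JSON object instead.
        return {"response_format": {"type": "json_object"}}
    # Both disabled: fall back to plain prompting and parse JSON out of the
    # text reply (assumed last resort; not spelled out in this section).
    return {}

print(structured_output_params(False, True))  # {'response_format': {'type': 'json_object'}}
```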
Temperature
Default: 0.7 (inherited from the global setting if left unset)

Controls the randomness of model output. Range: 0 (deterministic) to 2 (highly creative).

When reasoning/extended thinking is enabled for Anthropic models, temperature is automatically forced to 1.0 by the system. You do not need to set this manually.
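The override behavior amounts to a small resolution rule, sketched below (function and parameter names are illustrative):

```python
def effective_temperature(configured, anthropic_thinking_enabled: bool) -> float:
    # Extended thinking on Anthropic models forces temperature to 1.0.
    if anthropic_thinking_enabled:
        return 1.0
    # Otherwise use the model's own value, or the 0.7 global default if unset.
    return configured if configured is not None else 0.7

print(effective_temperature(None, False))  # 0.7
print(effective_temperature(0.2, True))    # 1.0
```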
Max Output Tokens
The maximum number of tokens the model can generate in a single response. Leave blank to use the system default (64,000). For local models with limited VRAM, set this explicitly to a lower value (e.g., 8192).

Context Size
The model’s context window size in tokens. Leave blank to use the system default (128,000). Set this to match your model’s actual capability — for local models, this is often 4K-32K depending on the model and available memory.

Recommended Configuration
Most models work correctly with the default settings (both toggles ON). Only adjust when you encounter errors or unnecessary latency. The table below covers common providers and models. Data sourced from UniAPI capability tags and verified against runtime behavior as of 2026-03-22. Model capabilities change frequently — if you encounter errors, check your provider’s latest documentation.

Quick Rules
- Native FC ON for models with function calling support (most modern models)
- Native FC OFF for thinking-always-on models that reject forced tool_choice
- JSON Mode ON for most models (safe default)
- JSON Mode OFF only for AWS Bedrock relays (prefill rejection)
Per-Provider Configuration Matrix
OpenAI

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| gpt-5.4 | General | 1,050K | 128K | ON | ON | Function calling + structured output + reasoning |
| gpt-5.4-mini | Fast | 400K | 128K | ON | ON | Function calling + structured output + reasoning |
| o3-pro | Reasoning | 200K | 100K | ON | ON | Reasoning model; FC works with auto-disabled thinking |
Anthropic (Claude)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| claude-sonnet-4-6 | General | 1,000K | 64K | ON | ON | Function calling + reasoning; thinking auto-disabled for FC |
| claude-haiku-4-5 | Fast | 200K | 64K | ON | ON | Function calling supported |
| claude-opus-4-6 | Reasoning | 1,000K | 128K | ON | ON | Function calling + reasoning; thinking auto-disabled for FC |
Google Gemini

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| gemini-3-pro-preview | General | 1,048K | 65K | ON | ON | Full support (UniAPI tags incomplete — Gemini natively supports FC) |
| gemini-2.5-pro | Fast | 1,048K | 65K | ON | ON | Full support |
| gemini-3.1-pro-preview | Reasoning | 1,048K | 65K | ON | ON | Full support |
DeepSeek

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| deepseek-v3.2 | General | 164K | 64K | ON | ON | FC supported (UniAPI tags incomplete) |
| deepseek-chat | Fast | 64K | 8K | ON | ON | Basic chat model; FC supported |
| deepseek-reasoner | Reasoning | 164K | 164K | OFF | ON | Thinking always-on; forced tool_choice may be rejected |
xAI (Grok)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| grok-4-1-fast-non-reasoning | General | 2,000K | 2,000K | ON | ON | Function calling + structured output |
| grok-3-mini-fast | Fast | 131K | 131K | ON | ON | Function calling + structured output + reasoning |
| grok-4-1-fast-reasoning | Reasoning | 2,000K | 2,000K | ON | ON | Function calling + structured output + reasoning |
Qwen (Alibaba)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| qwen3.5-plus | General | 1,000K | 64K | ON | ON | Function calling + structured output |
| qwen-turbo-latest | Fast | 1,000K | 16K | ON | ON | FC likely supported (UniAPI tags incomplete) |
| qwq-plus | Reasoning | 128K | 8K | ON | ON | Reasoning + function calling (thinking may be toggleable) |
GLM (Zhipu AI)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| glm-4.7 | General | 200K | — | ON | ON | Function calling + structured output + reasoning |
| glm-4.7-flashx | Fast | 200K | — | ON | ON | Function calling + structured output + reasoning |
| glm-5 | Reasoning | 200K | — | ON | ON | Function calling + structured output + reasoning |
Kimi (Moonshot AI)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| kimi-k2.5 | General | 262K | — | OFF | ON | Thinking always-on; forced tool_choice rejected (400 error) |
| kimi-k2 | Fast | 131K | — | ON | ON | Non-thinking; native FC works (verified in production) |
| kimi-k2-thinking | Reasoning | 63K | — | OFF | ON | Thinking always-on; forced tool_choice rejected |
MiniMax

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| MiniMax-M2.5 | General | 205K | — | ON | ON | Function calling + structured output (verified in production) |
| MiniMax-M2.5-highspeed | Fast | 205K | — | ON | ON | Function calling + structured output (verified in production) |
| MiniMax-M1 | Reasoning | — | — | ON | ON | Function calling + structured output |
Doubao (ByteDance)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| doubao-seed-2-0-pro | General | 256K | 128K | ON | ON | Function calling + structured output + reasoning |
| doubao-seed-1-6 | Fast | 256K | 32K | ON | ON | Function calling + structured output + reasoning |
| doubao-seed-1-6 | Reasoning | 256K | 32K | ON | ON | Supports reasoning_effort (minimal/low/medium/high) |
Llama (OpenAI-compatible hosts)

| Model | Role | Context | Max Output | Native FC | JSON Mode | Notes |
|---|---|---|---|---|---|---|
| llama-3.3-70b | General | 131K | 131K | ON | ON | FC depends on hosting provider; try defaults first |
“—” in Max Output means the provider did not report a limit. In practice, these models typically support 4K-16K output tokens. Set Max Output Tokens explicitly in the model’s Advanced settings if you need a specific value.
Model Groups
Model groups let you assign models to specific roles and switch between configurations with a single click.

Roles
FIM One uses three model roles. Each role serves a different purpose in the execution pipeline:

| Role | Used for | Recommendation |
|---|---|---|
| General | Planning, analysis, ReAct agent, complex reasoning | Your most capable model (e.g., gpt-4o, claude-sonnet-4-6) |
| Fast | DAG step execution, context compaction | Optimized for speed and cost (e.g., gpt-5-nano, deepseek-chat). Falls back to General if not assigned. |
| Reasoning | Tasks requiring deep analysis — complex planning, mathematical proofs, multi-step logic | A strong reasoning model (e.g., o3, deepseek-reasoner). Falls back to General if not assigned. |
Creating a Model Group
Name the group
Enter a descriptive name (e.g., “Production (OpenAI)”, “Budget (DeepSeek)”, “Local Dev”).
Assign models to roles
For each role (General, Fast, Reasoning), select a model from the dropdown. The dropdown shows all active models from active providers, grouped by provider name. You can leave a role unassigned — it will fall back to the General model (or to ENV-configured models if General is also unassigned).
Activating a Group
To activate a model group, use the dropdown or activation control on the Models page. Only one group can be active at a time. Activating a group immediately applies its model assignments to all new conversations. To deactivate the current group (falling back to ENV-configured models), select the deactivate option.

ENV Fallback
When no admin-configured model group is active, FIM One falls back to ENV-based configuration:

| Role | ENV variable |
|---|---|
| General | LLM_MODEL |
| Fast | FAST_LLM_MODEL (falls back to LLM_MODEL) |
| Reasoning | REASONING_LLM_MODEL (falls back to LLM_MODEL) |
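The full resolution order (group assignment, then the group's General model, then the role's ENV variable, then LLM_MODEL) can be sketched as follows. This is a hedged illustration; the function name and group shape are assumptions, while the ENV variable names come from the table above.

```python
import os

def resolve_model(role: str, active_group: dict = None):
    """Resolve a role to a model name: group assignment -> group's General
    model -> role-specific ENV var -> LLM_MODEL."""
    env_var = {"general": "LLM_MODEL",
               "fast": "FAST_LLM_MODEL",
               "reasoning": "REASONING_LLM_MODEL"}[role]
    if active_group:
        # Unassigned roles fall back to the group's General model.
        assigned = active_group.get(role) or active_group.get("general")
        if assigned:
            return assigned
    # No active group (or nothing assigned): fall back to ENV configuration.
    return os.environ.get(env_var) or os.environ.get("LLM_MODEL")

os.environ["LLM_MODEL"] = "gpt-4o"
print(resolve_model("fast"))                                    # gpt-4o
print(resolve_model("fast", {"general": "claude-sonnet-4-6"}))  # claude-sonnet-4-6
```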
Export and Import
The Models page supports exporting your entire provider and model configuration (providers, models, and groups) as a JSON file, and importing it on another instance. This is useful for:

- Migrating configuration between development, staging, and production environments
- Sharing a known-good model setup with team members
- Backing up your configuration before making changes
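The exact export schema is not documented here; a hypothetical file might look like the fragment below, with one array per tier (all field names are illustrative, not FIM One's actual format).

```json
{
  "providers": [
    {"name": "My OpenAI Account", "base_url": "https://api.openai.com/v1"}
  ],
  "models": [
    {"provider": "My OpenAI Account", "display_name": "GPT-4o", "api_name": "gpt-4o"}
  ],
  "groups": [
    {"name": "Production (OpenAI)", "general": "GPT-4o", "fast": "GPT-4o", "reasoning": "GPT-4o"}
  ]
}
```

Whether API keys are included in the export is not stated here; treat exported files as potentially sensitive when sharing them.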