## [Unreleased]

### Added

- Non-English docs now show the full API Reference (OpenAPI endpoints) and Channels configuration pages. Previously only the English site rendered these sections — the zh/ja/ko/de/fr nav was missing the Endpoints group and the Configuration > Channels sub-group, so readers on those locales couldn’t reach the auto-generated API playground or the Feishu channel setup guide. Fixed as a side effect of migrating `docs/docs.json` to a single English source of truth (`docs/nav.template.json` + `scripts/docs-nav-glossary.json`) that regenerates all six locales deterministically; manual per-locale sync is no longer required when adding new doc pages.
- Admin users: per-user unlimited quota + Registered column. Leaving the quota field empty now inherits the global limit; setting it to `0` grants the user unlimited usage. Previously both collapsed into the same state, so granting an individual unlimited access required editing the DB directly. The users table also gains a Registered column for onboarding audits.
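A minimal sketch of the quota semantics above (empty field inherits the global limit, `0` means unlimited) — names and the global default are illustrative, not FIM One’s actual implementation:

```python
from typing import Optional

GLOBAL_QUOTA = 100_000  # hypothetical org-wide default (tokens/day)

def effective_quota(user_quota: Optional[int]) -> Optional[int]:
    """Resolve a user's daily quota.

    None  -> quota field left empty: inherit the global limit.
    0     -> explicit unlimited: no cap at all.
    n > 0 -> per-user override.
    Returns None for "unlimited" so callers can skip the check entirely.
    """
    if user_quota is None:
        return GLOBAL_QUOTA
    if user_quota == 0:
        return None  # unlimited
    return user_quota
```

Keeping the two states distinct (`None` vs `0`) is what removes the need to edit the DB directly for unlimited grants.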
### Fixed

- Agent settings: bound resources no longer flicker as “(已删除)” (“deleted”) on entry. The KB / connector / MCP-server pickers used to derive the orphan badge synchronously from selected − available, so any agent whose inventory list fetch was slower than the agent fetch would briefly mark every linked resource as deleted before snapping back to the correct state — most noticeable for org agents with multiple connectors. The orphan badge now waits for each inventory fetch to settle before rendering.
- Toasts now follow the light/dark theme. The root toaster was hardcoded to dark regardless of user preference.
- Usage by-agent breakdown no longer duplicates “Direct Chat” rows. Conversations tied to deleted agents used to produce one row per orphaned `agent_id`; they now collapse into a single row.
- Self-hosted deploys no longer fail intermittently with `container name already in use`. A new idempotent `./deploy.sh` wrapper sweeps hash-prefixed zombie containers and DooD sandbox children before `docker compose up`, so repeat deploys stop colliding on the sandbox service name.
- No more redundant `chore(i18n): sync translations` commits after pushes. The pre-commit translation hook updated `.translation-cache.json` on disk but forgot to stage it, so CI would see stale cache entries, re-translate the same sections, and auto-commit a slightly different output due to LLM nondeterminism. The hook now stages the cache alongside the translated files — CI detects no diff and exits silently as intended.
- Agent chat no longer crashes when a confirmation-required tool runs in an unbound-agent session. Pure model conversations that invoked a built-in tool marked `requires_confirmation=true` hit a hard 500 (or a red “no agent_id” gate error in the UI) because the per-agent approval router had no configuration to consult. The gate now bows out gracefully when there is no agent to route for, and the chat stream no longer trips on a missing `agent_cfg`.
- Retry no longer fails on strict-alternation providers when history contains orphan user messages. Stopping and retrying a turn leaves a “stopped” user message in conversation history; replaying that history to Claude (which enforces alternating user/assistant turns) previously returned HTTP 400 and silently dropped earlier messages. Consecutive same-role messages are now collapsed into a single turn before dispatch, so every orphan is carried into the next call as context — retry works even if you stop, add “I wasn’t done — continue with this…”, and retry again.
- Playground retry no longer briefly shows the query twice. The retry flow simultaneously replayed the stopped turn from history and rendered a fresh user bubble for the in-flight turn; during the pre-stream window both were visible. The in-flight bubble is now suppressed whenever history already renders the same content.
- ReAct agent no longer retries paraphrased tool calls after an operator rejects an approval request — it acknowledges the rejection and stops.
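The same-role collapsing fix above can be sketched as follows — function name and message shape are illustrative, not the project’s actual code:

```python
def collapse_same_role(messages: list[dict]) -> list[dict]:
    """Merge consecutive messages with the same role into one turn.

    Strict-alternation providers (e.g. Anthropic's Messages API) reject
    histories with back-to-back user messages, which occur when a turn is
    stopped and retried. Contents are joined so no message is dropped.
    """
    out: list[dict] = []
    for msg in messages:
        if out and out[-1]["role"] == msg["role"]:
            out[-1] = {
                "role": msg["role"],
                "content": out[-1]["content"] + "\n\n" + msg["content"],
            }
        else:
            out.append(dict(msg))
    return out
```

Because orphan content is merged rather than filtered out, a stopped turn plus a follow-up correction both reach the model as one user turn.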
### Changed

- Translation authority shifted from locale files to glossary: translation rules now live in `scripts/translation-glossary.md` — a single source of truth loaded into every LLM translation call (JSON, MDX, README). The pre-commit hook now unconditionally refuses manual edits to generated locale files (removing the prior `ALLOW_LOCALE_EDIT=1` override), because scattered per-locale edits silently drift and get overwritten on full retranslation. To fix a mistranslation, edit the glossary (a permanent rule that applies to all five locales) and regenerate affected files with `--force`. Inline glossary rules previously embedded in three system prompts have been consolidated into the external glossary file.
## [v0.8.5] - 2026-04-23

### Added

- Contributor-friendly i18n workflow: contributors no longer need to configure `LLM_API_KEY` to submit PRs that touch English source files. If the local pre-commit translation step is skipped (no key), a new `.github/workflows/i18n-sync.yml` workflow translates EN → ZH/JA/KO/DE/FR on `master` after the PR is merged and auto-commits the result. The pre-commit hook also now refuses manual edits to generated locale files (with an `ALLOW_LOCALE_EDIT=1` override for legitimate translation fixes), preventing silent drift between EN sources and translated outputs.
- i18n CI fallback end-to-end verified: a smoke-test push (EN-only, local translate hook skipped) confirmed the GitHub Actions workflow detects the changed source, translates into all five locales, and auto-commits the result to master with `[skip ci]` to prevent recursion.
- Exa integration docs page: dedicated Integrations section in the sidebar with a first-class Exa page covering the full Exa search surface (neural / fast / deep-reasoning / instant), filtering, content retrieval, and three tuned presets for news monitoring, research paper retrieval, and deep-reasoning agents. Serves as the partner-facing landing page for the Exa integration directory.
- Xinchuang (信创) database support: the Database Connector now lists KingbaseES (人大金仓), HighGo (瀚高), and DM8 (达梦) alongside PostgreSQL/MySQL. KingbaseES and HighGo are PG-compatible and reuse `asyncpg` with no extra dependencies; DM8 uses the official `dmPython` vendor wheel. A standalone `scripts/test_xinchuang_dbs.py` lets operators verify live connectivity from the CLI.
- Feishu Channel + confirmation gate via IM: a new `Channel` resource type (org-scoped, credentials encrypted at rest) lets orgs connect a Feishu app for outbound messaging. Tools flagged `requires_confirmation=True` now send an Approve/Reject interactive card to the configured Feishu group instead of only showing the confirmation dialog in the portal — any authorized member of the group can approve or reject directly from Feishu. Covers Settings → Channels management UI (list, create/edit with dirty-state protection, details with copyable callback URL, test-send), CRUD API (`/api/channels`), and the Feishu event callback endpoint (`/api/channels/{id}/callback`) with signature verification and URL challenge support. First step of the v0.9 IM Channel Integration roadmap item, shipping ahead of schedule for the 2026-04-24 roadshow.
- Agent Hook System (skeleton): a new `PreToolUseHook`/`PostToolUseHook` abstraction in `src/fim_one/core/hooks/` lets deterministic logic run outside the LLM loop — the FeishuGateHook is the first concrete implementation, attached to the confirmation-gate flow. Full hook lifecycle + user-defined YAML hooks remain v0.9 scope.
- Hook System is now live in the ReAct and DAG runtime: agents that declare `hooks.class_hooks` in their `model_config_json` have those hooks instantiated and registered on every chat session. The first consumer — `FeishuGateHook` — fires automatically when an agent calls a tool whose connector action is flagged `requires_confirmation=True`, posts an Approve/Reject card to the org’s Feishu group, blocks the tool, and resumes or aborts based on the verdict. Previously the hook abstraction was in place but nothing in the web layer wired it to live chat.
- Channels and the Hook System architecture documented: a new `docs/architecture/hook-system.mdx` explains the three hook points, why hooks run outside the LLM loop, and walks through FeishuGateHook end-to-end. Existing architecture pages (system-overview, organization, react-engine, philosophy) cross-link to it. README now lists Messaging Channels as a first-class v0.8 capability and the Application Layer diagram includes IM targets alongside Portal/API/iframe.
- Hook Approval Playground: the Channels details sheet now has a “Test Approval Flow” action that simulates a sensitive tool call, pushes a real confirmation card to the linked Feishu group, and polls for the reviewer’s decision live. Unlike the existing preview button, this exercises the full production path (genuine `ConfirmationRequest` row, real Feishu callback, status transitions), so demos and pre-rollout rehearsals use the same code path a production hook would.
- Per-agent task completion notifications: agents can now push a summary card to the org’s channel (currently Feishu) when a long-running ReAct or DAG task finishes. Configurable per-agent in Settings → Agent → Notifications. First consumer of the generic outbound notification pattern.
- Configurable confirmation gate — inline or channel: every agent now has an “Approval” section in Settings with three routing modes (Auto / Inline only / Channel only), an approver-scope selector (initiator / agent owner / anyone in the org), a “require confirmation for every tool call” override, and an explicit approval-channel picker. Auto mode uses a linked channel if one exists and gracefully falls back to an inline approval card in the chat stream otherwise — so agents without any channel still get a real approval UX instead of silently failing. A new `POST /api/confirmations/{id}/respond` endpoint shares a single decision-recording path with the Feishu webhook, so every approval — whether clicked in chat or in a Feishu group — stamps the same `approver_user_id` and `decided_at` audit fields.
### Changed

- Playground loading indicators now use a subtle text shimmer instead of the fake-looking progress bar that froze at a pseudo-full width after 8 seconds. Unified the two existing shimmer implementations (`.shiny-text` and `.text-shimmer`) into a single theme-aware primitive with an optional warm preset.
- Connector cards now surface a “Private default” badge (with tooltip) when a connector has `allow_fallback` disabled, so owners can tell at a glance which connectors require every user to bring their own credentials. The help text under the Allow-Fallback toggle in the connector settings form also clarifies that the flag only gates sharing with other users — the owner can always use their own default credential regardless.
### Fixed

- Connector calls made by the owner of a connector with `allow_fallback=false` and only a default credential (no per-user credential) no longer 401 with “Requires authentication”. The owner is now exempt from the fallback gate — that flag only controls whether other users may borrow the owner’s default credential. Previously the owner’s own agents were silently sending unauthenticated requests, and the same issue also affected workflow `connector_action` nodes.
- Conversation export now shows the correct mode label (“Planner” / “规划”) for auto-routed DAG conversations instead of always displaying “Standard”.
- Export timestamps now respect the user’s configured timezone instead of displaying raw UTC.
- Uploaded file content no longer leaks into exported conversations; only the user’s message text is included.
- Parallel tool calls no longer collide when a provider reuses `index=0` for every streamed tool-call delta; the aggregator now detects boundaries via id or name change and remaps subsequent deltas to the correct slot.
- Settings → Channels now reflects the current user’s org role: members (non-admin/owner) see a disabled “New Channel” button, hidden Edit / Enable-Disable / Delete actions, a read-only banner, and a permission-aware empty state — instead of an enabled CTA that failed on submit with “Organization admin access required”.
- Session-expiry redirect now preserves the query string, so users land back on the exact tab / filter they were viewing after re-authenticating instead of the bare path.
- Feishu channel form no longer shows a spurious “discard unsaved changes” prompt when interacting with the chat picker that’s layered above the dialog.
- Feishu channel setup hints no longer duplicate Chinese labels when the UI itself is already in Chinese (e.g. previously rendered “事件与回调 (事件与回调)”, i.e. “Events & Callbacks” twice).
- “Annotate All” in the schema manager no longer returns 500 Internal Server Error — the full-annotate backend path had an unbound-variable bug that blocked every invocation.
- Editing a database connector now shows the `********` placeholder in the password field instead of three-bullet masked text, making it obvious that leaving the field blank keeps the stored password.
- Updating a connector action no longer collapses the detail panel — the edited action stays selected so users can keep iterating on it.
- AI connector editor now distinguishes success, partial failure, and complete failure instead of showing the same “completed” message for all three. Failure reasons are surfaced inline so users can see what actually went wrong.
- AI connector editor can no longer silently wipe multiple actions in one go. Bulk-delete (>2 actions) now requires an explicit destructive keyword in the user’s instruction (“rebuild”, “全部重建”, “wipe”, etc.); otherwise the operation is rejected with a clear error, protecting requires_confirmation / JMESPath settings from accidental loss.
- Confirmation cards in the portal chat now show whether the request was routed to a channel (e.g. Feishu) or handled inline, alongside a human-readable hint about who is allowed to approve (the initiator, the agent owner, or any org member). Channel-routed requests also produce an inline pending card so the user isn’t left wondering whether a notification was actually sent.
- Feishu approval cards now become read-only after the first decision: the `/callback` webhook returns a replacement card with the Approve/Reject buttons removed and the header coloured green (approved) or red (rejected), preventing repeated clicks. Duplicate clicks that still arrive from stale Feishu clients get a “This request was already approved/rejected.” toast and a fresh copy of the decided card so the stale view catches up.
- Restored the plain “Send Test Message” action on channel rows and the details sheet. The Approval Playground exercises the full hook round-trip, but a notification-only channel (no approval hook wired) still needs a quick credential/connectivity sanity check, which the plain test-send covers.
- Concurrent clicks on the same Feishu approval card can no longer both succeed. The `/callback` handler now flips the `ConfirmationRequest` status via a conditional `UPDATE ... WHERE status='pending'` and uses the affected rowcount to decide which caller “won”; previously two parallel requests could both read `pending` and race a write, potentially ending up with approved-then-rejected on the same row.
- Pending approval requests now auto-expire after `CHANNEL_CONFIRMATION_TTL_MINUTES` (default 24h) via a background sweeper. This prevents a stale click days later from flipping agent state that has already been torn down; the next click on an expired card gets a grey “Expired” decided card and a “no longer active” toast.
- Send Test Message now delivers a plain text notification (no Approve/Reject buttons) and lives only in the channel details sheet — not the row dropdown. Users who don’t intend to use approval hooks aren’t confused by interactive buttons on a “test” message. Approval round-trip testing remains available via the Approval Playground button.
- Channel details sheet tightened: “How to finish setup” is now a collapsible section collapsed by default (so it doesn’t dominate the sheet for already-configured channels), and the outer padding was reduced so content sits closer to the sheet edge.
- Builder AI no longer reports masked (`****`) credentials as missing — it now recognizes them as configured and skips false “credential missing” guidance.
- Playground agent list now shows all accessible agents instead of only published ones, so draft agents can be tested without publishing first.
- Chat image uploads no longer crash the stream on malformed `data:` URLs — the MIME extractor now safely falls back to `application/octet-stream` instead of raising an IndexError mid-generate.
- Playground image thumbnails now cancel in-flight fetches on unmount via `AbortController`, avoiding stale blob-URL assignments and wasted bandwidth during rapid navigation.
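The streamed tool-call collision fix above (providers reusing `index=0` for every delta) can be sketched like this — the delta shape and function name are illustrative assumptions, not the project’s actual aggregator:

```python
def aggregate_tool_deltas(deltas: list[dict]) -> list[dict]:
    """Aggregate streamed tool-call deltas into complete calls.

    Some providers reuse index=0 for every parallel tool call, so the
    provider-supplied index cannot be trusted. A delta carrying a new
    `id` or `name` marks the start of a new call; bare argument
    fragments are appended to the current slot.
    """
    calls: list[dict] = []
    for d in deltas:
        starts_new_call = d.get("id") is not None or d.get("name") is not None
        if starts_new_call or not calls:
            calls.append({
                "id": d.get("id"),
                "name": d.get("name"),
                "arguments": d.get("arguments", ""),
            })
        else:
            calls[-1]["arguments"] += d.get("arguments", "")
    return calls
```

Keying on id/name change rather than the stream index is what keeps two parallel calls from being interleaved into one garbled argument string.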
## [v0.8.4] - 2026-04-17

### Added

- Conversation recovery: synthetic tool_result rows now persist after an interrupted turn; clients can resume a disconnected SSE stream via `POST /chat/resume` with the last-seen cursor.
- Playground now auto-reconnects dropped SSE streams using the `/chat/resume` endpoint with exponential backoff (max 3 attempts) and shows a “Reconnecting…” indicator during recovery.
- Prompt cache observability: `cache_read_input_tokens` and `cache_creation_input_tokens` are captured from LLM responses, aggregated per turn in `TurnProfiler`, logged as a `turn_cache` summary line (read/create tokens + estimated savings), and surfaced in the chat `done_payload` under a new `cache` field. Enables verification that Anthropic prompt caching actually hits, and doubles as a detector for whether API relay stations honor the cache discount.
### Changed

- System prompts now use a memoized section registry with Anthropic prompt-caching breakpoints on the stable prefix — reducing per-turn token cost by ~60-80% on the cached prefix for Claude models. ReAct JSON mode, native function-calling mode, and synthesis all emit two system messages for cache-capable providers (Claude, Bedrock Anthropic, Vertex Claude) and fall back to a single concatenated message for every other provider.
### Fixed

- Thinking/reasoning tokens now persist across multi-turn conversations — the Anthropic `signature` field is captured and replayed per API requirements.
- Provider-aware reasoning replay policy: `reasoning_content` (from DeepSeek-R1, Qwen QwQ, Gemini thinking, OpenAI o-series) is no longer replayed back to non-Anthropic providers on subsequent turns. Previously the field was serialized unconditionally in `ChatMessage.to_openai_dict()`, which violated provider documentation (DeepSeek and Qwen both explicitly document “do not send `reasoning_content` back in message history”) and silently invalidated their automatic prefix / KV caches on every multi-turn exchange. Policy is centralized in `core/prompt/reasoning.py` — the Claude family (including Bedrock and Vertex proxies) still replays thinking blocks with signature as required.
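A minimal sketch of the replay policy above — the provider names and function are illustrative stand-ins for the centralized logic in `core/prompt/reasoning.py`:

```python
# Illustrative provider identifiers, not FIM One's actual registry.
CLAUDE_FAMILY = {"anthropic", "bedrock-anthropic", "vertex-claude"}

def serialize_for_replay(message: dict, provider: str) -> dict:
    """Drop reasoning_content when replaying history to non-Claude providers.

    DeepSeek and Qwen document that reasoning_content must not be sent back
    in message history; doing so also invalidates their prefix/KV caches.
    Claude-family providers still require thinking blocks (with signature)
    to be replayed, so the field is kept for them.
    """
    out = dict(message)  # copy: never mutate the stored message
    if provider not in CLAUDE_FAMILY:
        out.pop("reasoning_content", None)
    return out
```

The key design point is that the decision lives at serialization time, per provider, instead of being baked unconditionally into `to_openai_dict()`.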
## [v0.8.3] - 2026-04-16

### Added

- `convert_to_markdown` built-in tool — New general-purpose Agent tool that converts any file, URL, YouTube link, or data URI to clean Markdown using Microsoft’s MarkItDown. Supports PDF, Word (.docx), Excel (.xlsx/.xls), PowerPoint (.pptx), HTML, JSON, CSV, XML, ZIP, EPUB, Outlook .msg, images, audio (speech → text), and YouTube transcripts. Available to every agent by default — same tier as `web_fetch`. When a vision-capable LLM is configured, embedded images and scanned PDF pages are OCR’d automatically via the official `markitdown-ocr` plugin. Previously this capability was hidden inside the background RAG ingestion pipeline; agents now have it on the interactive conversation path.
- Document OCR via `markitdown-ocr` — Embedded images in DOCX / XLSX / PPTX and scanned PDF pages are now OCR’d using the same vision-capable LLM the rest of FIM One routes through. Applies to both the built-in `convert_to_markdown` tool and the RAG ingestion pipeline, so chat-time conversion and knowledge-base ingestion produce byte-identical Markdown for the same input.
- Universal vision provider support for document OCR — A new `LiteLLMOpenAIShim` duck-type wraps any FIM One `OpenAICompatibleLLM` in the openai SDK’s `.chat.completions.create(...)` API shape, then dispatches through `litellm.completion()`. MarkItDown (which hard-codes the openai SDK surface) can now consume Anthropic Claude, Google Gemini, Azure, Bedrock, and any other provider LiteLLM supports — no per-provider adapter code in FIM One.
- Vision-aware RAG ingestion — Knowledge-base uploads of Office documents and scanned PDFs now resolve the workspace’s default vision LLM (DB-first, ENV fallback) and pass it through to MarkItDown for OCR during ingestion. Zero-regression: when no vision-capable model is available, ingestion silently falls back to text-only mode — exactly the pre-feature behavior.
- Expanded MarkItDown format coverage — RAG now natively ingests `.pdf`, `.msg` (Outlook), `.epub`, `.mp3`, `.wav`, and `.m4a` via MarkItDown’s audio-transcription and outlook extras. YouTube URLs flow through `convert_to_markdown` via `markitdown[youtube-transcription]`.
- `LLM_SUPPORTS_VISION` env var — Optional opt-out (`=false`) for the ENV-mode document-OCR fallback. Default behavior is optimistic (`true`), which covers the common ENV setups (`gpt-4o`, `claude-3-5-sonnet`, `gemini-1.5-pro`/`flash`). Set to `false` only when your ENV-configured `LLM_MODEL` does not support vision (e.g. `deepseek-v3`, `qwen-chat`, `llama-3.1`, `gpt-3.5-turbo`, `o1-mini`) to skip a failing vision call on every document upload. Ignored entirely when an admin-curated ModelGroup is active — DB mode is always the source of truth when available.
- Turn-level profiler — Each ReAct turn now logs phase-level timings (`memory_load`, `compact`, `tool_schema_build`, `llm_first_token`, `llm_total`, `tool_exec`) in a single structured log line per turn. Toggleable via `REACT_TURN_PROFILE_ENABLED` (default: on; set to `false` for a zero-overhead no-op).
- Structured compact work card — Conversation compaction now parses its own 9-section markdown output into a typed `WorkCard` and merges new compacts into the previous one, so errors and pending tasks from earlier in a long session survive across multiple compaction rounds instead of being re-summarized from scratch.
### Changed

- Per-user rate limiting — The LLM-layer rate limiter now maintains a separate bucket per user instead of a single process-global bucket, preventing one noisy user from throttling all other users on the same worker. Toggleable via `LLM_RATE_LIMIT_PER_USER` (default: on).
### Fixed

- Dangling tool_use recovery — Conversations interrupted mid-tool-execution (user Stop, SSE disconnect, crash) previously left an assistant message with a `tool_use` block and no matching `tool_result`, causing the next turn to crash with an opaque HTTP 400 from the LLM API. `DbMemory.get_messages()` now detects and repairs these dangling blocks on the read path with a synthetic `[interrupted]` tool_result. The raw DB log is not mutated.
- Empty-content assistant messages with tool_calls are no longer dropped — The DbMemory load-path filter previously silently discarded any assistant row with empty text content, wiping out native function-calling intermediates (which carry only `tool_calls`, no text). The filter now requires BOTH empty content AND no `tool_calls`.
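The read-path repair described above can be sketched as follows — the flat message shape (`tool_use_id` / `tool_result_for` keys) is a simplifying assumption, not the actual `DbMemory` schema:

```python
def repair_dangling_tool_use(messages: list[dict]) -> list[dict]:
    """Insert a synthetic tool_result after any tool_use that was
    interrupted before its result was written (Stop, disconnect, crash).

    Runs on the read path only; the stored log is never mutated.
    """
    out: list[dict] = []
    for i, msg in enumerate(messages):
        out.append(msg)
        if msg.get("role") == "assistant" and "tool_use_id" in msg:
            nxt = messages[i + 1] if i + 1 < len(messages) else None
            answered = nxt is not None and nxt.get("tool_result_for") == msg["tool_use_id"]
            if not answered:
                # Synthetic result keeps the history well-formed for the API.
                out.append({
                    "role": "tool",
                    "tool_result_for": msg["tool_use_id"],
                    "content": "[interrupted]",
                })
    return out
```

Repairing on read rather than on write means crashes need no cleanup pass and the raw log stays an accurate record of what actually happened.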
## [v0.8.2] - 2026-04-10

### Added

- Intelligent Document Processing (Vision-Aware) — Adaptive document handling based on model capabilities. When the target LLM supports vision (GPT-4o, Claude 3/4, Gemini), PDF pages are rendered as images and sent via vision content blocks for full visual fidelity. Text-only models fall back to pdfplumber text extraction. Two modes: Vision and Text-only. Configurable via `DOCUMENT_PROCESSING_MODE`, `DOCUMENT_VISION_DPI`, `DOCUMENT_VISION_MAX_PAGES` env vars. Per-model `supports_vision` toggle in Admin.
- Document vision pipeline — DOCX, PPTX, and PDF files uploaded in chat now have their embedded images extracted and sent as vision content to the LLM when vision is enabled on the model.
- Multi-turn vision persistence — Vision content from uploaded documents and images persists across conversation turns, so the model retains visual context throughout the conversation.
- Smart PDF processing — Text-rich PDF pages extract text plus embedded images separately (saving tokens). Scanned or image-only pages render as full-page PNG for maximum fidelity.
- Pre-built sandbox image — `Dockerfile.sandbox` with common data-science packages (pdfplumber, Pillow, pandas, etc.) so AI code execution works out of the box in `--network=none` containers.
- Resource Fork completion — All five resource types now support fork: Agent, Connector, Workflow, MCP Server, and Skill. KB fork removed (inherently user-local).
### Changed
- Faster chat response completion — SSE stream now closes immediately after the agent finishes; title generation and follow-up suggestions run in the background instead of blocking the response.
- Smarter context compaction — Conversation compaction uses a structured 9-section format that better preserves key information (original request, errors, pending tasks) across long sessions.
- Reduced agent looping — Anti-loop instructions added to agent prompts; cycle detection threshold lowered so repeated identical tool calls are caught earlier.
- Faster request startup — LLM configuration lookups and domain classification now run concurrently, reducing per-request overhead by 400-1100ms.
- Better empty tool handling — Tools that return no output now produce a descriptive message instead of bare “(no output)”, preventing wasteful retries.
- Automatic old tool result cleanup — Tool results older than the 6 most recent are automatically cleared before context compaction, keeping conversations lean.
- Tool result aggregate budget — Total tool result tokens are capped at 40K per session; new results are truncated when the budget is exceeded, preventing context bloat from large API responses.
- Context overflow auto-recovery — When the LLM rejects a request due to context length overflow, the agent automatically compacts to 50% and retries, instead of crashing the entire conversation.
- Keyword-based tool selection — When a query obviously matches a specific tool by name or description keywords, the agent skips the LLM-based tool selection call, saving 200-500ms.
- LLM connection pooling — All LLM API calls now share a single connection pool with optimized keepalive settings, reducing connection overhead across the entire session.
- Smarter completion check — The post-answer verification step is skipped for long detailed answers (>200 tokens), eliminating an unnecessary LLM round-trip.
- Model fallback on provider outage — When the primary model is unavailable (rate limited, overloaded, or down), the agent automatically retries with the fast model instead of failing.
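The tool-result aggregate budget above (40K tokens per session, new results truncated once the budget is exceeded) can be sketched like this — the function, the 4-chars-per-token estimator, and the truncation markers are illustrative assumptions:

```python
TOOL_RESULT_BUDGET_TOKENS = 40_000  # per-session cap, as described above

def admit_tool_result(used_tokens: int, result: str,
                      count_tokens=lambda s: len(s) // 4) -> tuple[int, str]:
    """Truncate a new tool result so the session total stays under budget.

    `count_tokens` is a stand-in estimator (~4 chars/token); a real
    implementation would use the model's tokenizer. Returns the updated
    running total and the (possibly truncated) result text.
    """
    remaining = TOOL_RESULT_BUDGET_TOKENS - used_tokens
    if remaining <= 0:
        return used_tokens, "[truncated: tool result budget exhausted]"
    cost = count_tokens(result)
    if cost > remaining:
        result = result[: remaining * 4] + "\n[truncated]"
        cost = remaining
    return used_tokens + cost, result
```

Capping the aggregate rather than each result individually is what stops a handful of large API responses from crowding everything else out of context.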
### Fixed
- Agent hallucination on unreadable files — When the AI agent could not read a file (e.g., image-based PDF), it previously read other unrelated files and presented their content as the target file’s. A file integrity guardrail in the system prompt now prevents this.
- File ID injection for uploads — Uploaded files now include their UUID file_id in the message context, so the agent can directly access them via `read_uploaded_file` without guessing.
- Vision toggle reading from the new model structure — The `supports_vision` flag on model configs was not being read correctly from the ModelGroup/ModelProviderModel ORM structure. Fixed.
- Improved error messages for unreadable files — When files cannot be read, the tool now returns specific guidance (file type, vision suggestion) instead of generic errors.
## [v0.8.1] - 2026-03-29

### Added

- Timezone-aware admin notifications — Admin notification emails now display event times in each recipient’s configured timezone instead of always showing UTC.
- Progressive database tool disclosure — A single `database` meta-tool with `list_tables`/`discover`/`query` subcommands replaces individual per-table tools. Configurable via the `DATABASE_TOOL_MODE` env var (`progressive` default, `legacy` fallback).
- On-demand tool loading — When more than 12 tools are available, a `request_tools` meta-tool lets the agent dynamically load additional tools mid-conversation instead of being stuck with the initial selection.
- Progressive MCP tool disclosure — A single `mcp` meta-tool with `discover`/`call` subcommands replaces individual per-server tools. Configurable via the `MCP_TOOL_MODE` env var (`progressive` default, `legacy` fallback).
- Per-turn token budget circuit breaker — The `REACT_MAX_TURN_TOKENS` env var provides an emergency stop for runaway agent loops. Default `0` (unlimited) — use the per-user `token_quota` for daily cost control instead.
- Per-model Native Function Calling toggle — The `tool_choice_enabled` setting (ENV + Admin per-model) lets models that reject forced tool selection skip Level 1 and go directly to JSON Mode. Configurable in Settings → Models → Advanced.
- DAG quality overhaul — Five improvements: default model upgrade to the general model for non-fast steps; skill auto-discovery in planning; citation verifier for legal/medical/financial domains; structured content context preservation with a configurable truncation multiplier; domain classification in the router with domain-aware model selection.
- Domain model escalation in ReAct — Specialist domains (legal/medical/financial) auto-escalate to the reasoning model with mandatory web search and citation verification.
- File attachment download — File cards in chat messages are now clickable to download the original file.
- Admin notification master switch — Global on/off toggle for admin email notifications with runtime SMTP detection. Shows a warning banner when SMTP is not configured and disables all notification controls.
- SMTP Reply-To header — A new `SMTP_REPLY_TO` env var allows replies to go to a different address than the sender.
- Resource Fork Phase 1 (MCP Server + Skill) — `POST /api/mcp-servers/{id}/fork` and `POST /api/skills/{id}/fork` endpoints create user-owned deep copies with `visibility=personal` and `forked_from` lineage tracking. Encrypted env/headers are skipped on MCP Server fork; publish status is skipped on Skill fork. An Alembic migration adds the `forked_from` column to both tables. 41 tests.
- Workflow Connection Dep Auto-Subscribe — `DependencyAnalyzer._resolve_workflow` now recursively resolves sub-workflow dependencies with cycle detection (visited set). Agent and sub-workflow nodes are correctly added as content deps in dependency manifests. Missing resources are handled gracefully (log a warning, no failure). 19 tests.
- Prebuilt Solution Templates (Market Seed Content) — 8 vertical solution templates bootstrapped idempotently on first-user registration: Financial Audit, Contract Review, Data Reporting, IT Helpdesk, HR Onboarding, Sales Assistant, Content Writer, Meeting Summary. Each bundles an Agent + Skill with Chinese SOPs. Published to the Market org (`visibility=org`, `publish_status=approved`) for immediate marketplace availability. 4 tests.
- ReAct cycle detection — Deterministic detection of repeated identical tool calls. Injects a warning after 3 consecutive calls with the same arguments, preventing agents from looping on failing tools. Configurable via `REACT_CYCLE_DETECTION_THRESHOLD`.
- ReAct completion checklist — One-time verification prompt before accepting final answers when tools were used, reducing premature or incomplete responses. Toggleable per agent instance.
Changed
- Completion checklist min-tools threshold — Checklist now only fires when the agent has made 3+ tool calls (configurable via `REACT_COMPLETION_CHECK_MIN_TOOLS`). Simple 1-2 tool tasks skip verification to avoid unnecessary latency.
- Dynamic system prompt budgeting — Removed the fixed `SYSTEM_PROMPT_RESERVE` (4K tokens) from context budget calculation. ContextGuard now accounts for the system prompt dynamically, giving each iteration ~4K more usable context.
- Centralized tool truncation — All tool types now delegate truncation to a shared module. Defaults configurable via the `TOOL_OUTPUT_MAX_CHARS`, `TOOL_OUTPUT_MAX_ITEMS`, and `TOOL_OUTPUT_MAX_BYTES` env vars.
- Domain detection decoupled — Domain classification runs independently in each endpoint, no longer bundled with auto-routing. Domain SOP instructions softened to guide rather than mandate web search.
- `AUTO_ROUTING` env var removed — The Auto endpoint always classifies queries.
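A shared, env-driven truncation helper like the one above might look like this — a minimal sketch in which the env var names come from the changelog but the function name, defaults, and truncation markers are illustrative assumptions:

```python
import os

# Assumed defaults; only the env var names are from the changelog.
MAX_CHARS = int(os.getenv("TOOL_OUTPUT_MAX_CHARS", "8000"))
MAX_ITEMS = int(os.getenv("TOOL_OUTPUT_MAX_ITEMS", "50"))

def truncate_tool_output(output):
    """Single truncation point every tool delegates to."""
    if isinstance(output, list):
        if len(output) > MAX_ITEMS:
            # Keep the head of the list and say how much was dropped.
            return output[:MAX_ITEMS] + [f"... {len(output) - MAX_ITEMS} more items truncated"]
        return output
    text = str(output)
    if len(text) > MAX_CHARS:
        return text[:MAX_CHARS] + f"\n... [{len(text) - MAX_CHARS} chars truncated]"
    return text
```

Centralizing this means the limits can be tuned in one place instead of per tool type.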
Fixed
- Duplicate message submission — Chat input now uses a synchronous guard to prevent the same message from being submitted multiple times on rapid clicks.
- Structured output degradation chain — The 3-level fallback (native FC → JSON mode → plain text) now properly falls through all levels.
- `json_mode_enabled` DB value ignored — Models configured via Admin now correctly use their per-model setting instead of always falling back to the env var.
- DAG planning failure message — Now shows a user-friendly bilingual message instead of a raw pipeline error.
- MCP server owner bypass for `allow_fallback` — The server owner is no longer blocked by `allow_fallback=False`.
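The degradation-chain fix above is easiest to see as a fall-through loop. This is a hypothetical sketch (the changelog names the three levels; the `structured_call` helper and `regex_fallback` shown here are illustrative, not the project's API):

```python
import json
import re

def structured_call(levels, prompt):
    """Try each extraction level in order; degrade on any failure."""
    for extract in levels:
        try:
            result = extract(prompt)
            if result is not None:
                return result
        except Exception:
            continue  # fall through to the next level instead of surfacing the error
    return None

def regex_fallback(text):
    # Last resort: pull the first JSON object out of free-form text.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    return json.loads(match.group(0)) if match else None
```

The bug class being fixed is exactly the missing `continue`: if a level raises and the exception escapes, the chain never reaches the later, more forgiving levels.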
[v0.8] - 2026-03-20
Added
- Marketplace redesign Phase 1 — Solutions + Components — Two-tier Market model (Solutions: Agent/Skill/Workflow; Components: Connector/MCP Server) with scope selector (Global Market / org). KB removed from Market scope. Unified subscription model.
- Smart file content injection + `read_uploaded_file` tool — Small uploads (<32K chars) auto-inlined into LLM context; large files get metadata + a tool hint. Dual-mode reading tool with pagination and regex search. `GET /api/files/{file_id}/content` endpoint.
- Workflow Blueprint System — Visual workflow editor for multi-step automation: 25 node types (Start, End, LLM, ConditionBranch, QuestionClassifier, Agent, KnowledgeRetrieval, Connector, HTTPRequest, VariableAssign, TemplateTransform, CodeExecution, Iterator, Loop, VariableAggregator, ParameterExtractor, ListOperation, Transform, DocumentExtractor, QuestionUnderstanding, HumanIntervention, SubWorkflow, ENV + more), React Flow v12 editor with drag-and-drop palette, auto-layout, SSE real-time execution, variable interpolation, condition/classifier branching, error strategies per node, per-node timeout, import/export/duplicate, version history with diff viewer, 14 built-in templates, 306 tests.
- Workflow Triggers — Cron scheduling with timezone support; public API keys (`wf_` prefix) for external execution without user auth; batch execution (up to 100 input sets, configurable parallelism).
- Workflow Operations — Real-time execution log viewer, trace viewer with variable snapshots, run replay overlay on canvas, run history export, analytics dashboard with daily trends and percentiles, per-node statistics panel, favorites/pinning, inline validation badges, canvas node search (`Cmd+F`), keyboard shortcuts, snap-to-grid.
- Workflow Admin + Templates — Admin management tab for all workflows, `WorkflowTemplate` model with admin CRUD and 5 seed templates, publish flow with org-level review gating, import conflict resolver for external references.
- Agent Skill System — On-demand skill loading: `Skill` model with CRUD/publish/review, `read_skill(name)` tool for progressive disclosure (~80% token reduction), per-agent `compact_instructions` for custom ContextGuard compaction. Full Skills UI with list page, editor, and agent skill selector.
- ConnectorMetaTool (Progressive Disclosure Phase 1-2) — Single meta-tool replaces per-action tools. The system prompt receives lightweight stubs (~30 tokens/connector); the agent calls `discover`/`execute` on demand. Feature flag `CONNECTOR_TOOL_MODE` for backward compatibility.
- Connector import/export/fork — Share connector templates via JSON export, clone and customize via fork. Backend sanitizes credentials on export.
- Connector credential encryption + per-user override — `connector_credentials` table with Fernet encryption, `allow_fallback` flag, `GET/PUT/DELETE /my-credentials` endpoints.
- Publish review UI — Org-level review system with approve/reject workflow, status badges on resource cards, a review notice in the publish dialog, and resubmit for rejected resources.
- Semantic schema annotations — 16 predefined semantic tags for connector fields with `description` and `pii` flags, surfaced in LLM tool descriptions.
- Agent mid-loop self-reflection — Goal-check prompt injected every 6 iterations in ReAct to prevent drift in long chains.
- Shadow Market org + resource subscriptions — Pull-based resource sharing: resources discovered via marketplace and explicitly subscribed. Market API for browse/subscribe/unsubscribe.
- Agent auto-discovery + sub-agent binding — `discoverable` flag + `sub_agent_ids` whitelist + `CallAgentTool` for one-level delegation.
- MCP server credentials + per-user override — `mcp_server_credentials` table with an `allow_fallback` flag for credential fallback behavior.
- Connector/KB toggle — Suspend/resume endpoints for both resource types.
- Standalone KB conversations — `kb_ids` field on conversations for direct KB chat without agent binding.
- Review log audit tab — Admin audit page with a system log / review log toggle and a filterable review trail per org/resource.
- Agent directive in synthesis — `agent_directive` parameter ensures final answers honour the agent's core purpose.
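The ConnectorMetaTool pattern above (one tool, two verbs, stubs in the prompt) can be sketched roughly like this — class name, registry shape, and stub wording are all illustrative assumptions, not the project's actual API:

```python
class ConnectorMetaTool:
    """One meta-tool instead of a tool per connector action."""

    def __init__(self, registry):
        # registry: {connector_name: {action_name: callable}}
        self.registry = registry

    def stub(self, name):
        # Lightweight system-prompt stub (~30 tokens) instead of full action schemas.
        return f"connector '{name}': call discover('{name}') to list its actions"

    def discover(self, connector):
        """First verb: reveal available actions only when the agent asks."""
        return sorted(self.registry[connector].keys())

    def execute(self, connector, action, **kwargs):
        """Second verb: run one action with the agent-supplied arguments."""
        return self.registry[connector][action](**kwargs)
```

The token saving comes from deferring schema detail: the prompt carries only one stub line per connector, and full action lists are fetched on demand.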
Changed
- Subscription-based visibility model — Simplified from 3-tier to 2-tier (own → subscribed). Auto-migration preserves existing access.
- Tool cache whitelist — Replaced the blacklist with an explicit `cacheable` property on tools. 11 read-only tools marked cacheable.
- DAG executor cascade failure — Failed steps now cascade-block dependents with transitive propagation.
- DAG planner improvements — Tool descriptions in planner, full re-plan history across all rounds, 14 engine constants parameterized as env vars.
- `stream_answer` observation truncation — Increased from 2000 to 8000 characters (configurable via `REACT_TOOL_OBS_TRUNCATION`).
- Evidence confidence UI — Amber warning cards, `[N]` citation badges with hover popovers, a conflict warning banner with side-by-side comparison.
- Workflow version change summaries — Auto-generated human-readable summaries from blueprint diffs on version save.
- Workflow run retention cleanup — Background cleanup task with configurable age/count limits. Env vars: `WORKFLOW_RUN_MAX_AGE_DAYS`, `WORKFLOW_RUN_MAX_PER_WORKFLOW`.
- Connector circuit breaker — Three-state machine (closed/open/half-open) with per-connector failure tracking and monitoring endpoints.
- Replaced elkjs with lightweight BFS auto-layout — `/workflows/[id]` bundle reduced from 473 kB to 43 kB.
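The closed/open/half-open machine named above follows the standard circuit-breaker shape; a minimal sketch, with illustrative thresholds and method names (the real per-connector implementation will differ):

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow(self):
        """Should the next call be attempted?"""
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # let exactly one probe request through
                return True
            return False
        return True

    def record(self, success):
        """Report the outcome of an attempted call."""
        if success:
            self.failures, self.state = 0, "closed"
        else:
            self.failures += 1
            # A failed half-open probe re-opens immediately; otherwise open at threshold.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = time.monotonic()
```

One breaker instance per connector gives the per-connector failure tracking the entry describes; the monitoring endpoints would just expose `state` and `failures`.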
Fixed
- Workflow eval namespace flattening — Fixed short variable name resolution in ConditionBranch and VariableAssign.
- No re-review on `is_active` toggle — Toggling `is_active` no longer reverts `publish_status` from `approved` to `pending_review`.
- Cascade-skip for condition branches — Skipped nodes correctly deactivate outgoing edges.
- Dependency analyzer — Fixed `skill_ids` resolution; case-insensitive node type matching.
Removed
- Removed the `is_global` field and all global visibility concepts — replaced by the Market org + subscriptions.
- Removed global agent/MCP server admin endpoints.
[v0.7.5] - 2026-03-12
Added
- Free mode switching — Switch between Auto/ReAct/DAG mid-conversation. Per-turn mode tracking via `metadata.mode`.
- Three model roles — Independent env config for General, Fast, and Reasoning tiers. The Fast model no longer inherits the main model's settings.
- DAG engine improvements — `StepOutput` structured data, tool cache with async-lock stampede prevention, per-step LLM verification with retry (`DAG_STEP_VERIFICATION`), auto-routing via fast LLM classification (`AUTO_ROUTING`).
- Connector credential encryption — Auth tokens extracted to a `connector_credentials` table with Fernet encryption via `CREDENTIAL_ENCRYPTION_KEY`. Per-user credential override endpoints. `allow_fallback` flag.
- ModelConfig API key encryption at rest — Transparent encrypt-on-write / decrypt-on-read with backward-compatible plaintext detection.
Changed
- Skeleton screens — All list/grid pages show layout-aware skeletons during load instead of spinners.
Fixed
- Fast model no longer inherits settings from the main model.
- SSE routing event field names aligned with backend.
[v0.7.4] - 2026-03-12
Added
- Evaluation Center — Test dataset management, parallel eval runs with LLM grading, per-case pass/fail/latency/token results viewer with auto-polling.
- Admin: `json_mode_enabled` per-model flag — Explicit toggle preventing AWS Bedrock prefill issues. ENV models controlled by `LLM_JSON_MODE_ENABLED`.
- SSE Protocol v2 — Real-time streaming with `delta_reasoning` and `usage` fields, split `done`/`suggestions`/`title`/`end` events.
- AI Builder expansion — 7 new builder tools, `is_builder` flag, builder prompt auto-refresh, SSRF guard. Full ReAct agent dialog for connector management.
- Dual database support — SQLite (zero-config) + PostgreSQL (production). Docker Compose auto-provisions PG with health checks.
- Extended thinking / reasoning — `LLM_REASONING_EFFORT` and `LLM_REASONING_BUDGET_TOKENS` for OpenAI o-series, Gemini 2.5+, Claude.
- Admin: tool disable — Per-tool enable/disable toggles; disabled tools filtered from chat at runtime.
- Settings: Organizations tab — Create, join, manage orgs with member roles directly from Settings.
- Docker Compose deployment — Single image, named volumes, standalone Next.js output.
- Export: PDF format — Conversations exportable as PDF documents.
- Multi-worker support — `WORKERS=N` env var; Redis interrupt broker for cross-worker relay.
Changed
- LLM layer: LiteLLM — Replaced the direct `AsyncOpenAI` client for universal provider support.
- Structured output degradation — Unified `structured_llm_call()` with 3-level extraction (native FC → JSON mode → plain text + regex).
- Smart relay routing — Auto-detects the API protocol from URL path patterns for third-party relay platforms.
Fixed
- Docker sandbox (DooD) volume mount path translation.
- Security: sandbox AST dunder validation, MCP stdio defaults, SSRF DNS rebinding, shell metacharacter evasion, connector template injection.
- Admin dashboard stats crash on PostgreSQL.
- Docker: i18n file discovery, startup race condition, OAuth auto-detect for custom ports.
- Export: RFC 5987 filename for CJK.
[v0.7.3] - 2026-03-06
Added
- Global MCP servers — Admin-provisioned, loaded in all chat sessions.
- Structured audit logging — `write_audit()` helper with structured columns.
Fixed
- Invite code backward-compat for the legacy `registration_enabled` field.
[v0.7.2] - 2026-03-06
Added
- Invite-only registration — Three modes (open/invite/disabled) with invite code CRUD.
- Storage management — Per-user disk usage, clear, orphan cleanup.
- Per-user force logout — Admin token revocation.
- Conversation moderation — Admin list/delete all conversations.
[v0.7.1] - 2026-03-06
Added
- API health dashboard — System stats, connector metrics, token usage charts.
- JWT auth — Token-based SSE auth, conversation ownership.
- Admin API — Agent management, per-user token quota (429 enforcement).
[v0.7] - 2026-03-06
Added
- Admin Platform — User management, role toggle, password reset, account enable/disable.
- First-run setup wizard — Guided admin account creation.
- Personal Center — Per-user global instructions, language preference.
[v0.6.5] - 2026-03-05
Added
- Utility tools — `email_send`, `json_transform`, `template_render`, `text_utils`.
- Connector response filtering — `CONNECTOR_RESPONSE_MAX_CHARS` and `CONNECTOR_RESPONSE_MAX_ITEMS`.
- Embedding model options — Jina, OpenAI, and custom providers.
[v0.6] - 2026-03-01
Added
- Connector Platform — Full CRUD, ConnectorToolAdapter, per-user credential encryption, confirmation gate, circuit breaker, audit logging.
- MCP integration — Tool auto-discovery via protocol, process isolation.
[v0.5] - 2026-02-28
Added
- Full RAG pipeline — Jina embedding + LanceDB + FTS + RRF + reranker.
- Grounded Generation — Evidence-anchored citations, conflict detection, confidence scores.
- KB document management — Chunk-level CRUD, search, retry, schema migration.
- ContextGuard + Pinned Messages — Token budget manager.
- DAG Re-Planning — Up to 3 rounds; LLM Compact for memory.
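RRF in the pipeline above refers to Reciprocal Rank Fusion, the usual way to merge a vector ranking with an FTS ranking. A generic sketch (the function name is illustrative; `k=60` is the conventional constant from the RRF literature, not necessarily this project's setting):

```python
def rrf_fuse(rankings, k=60):
    """rankings: list of ranked doc-id lists (best first). Returns ids by fused score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1/(k + rank); documents high in any list win.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because only ranks are used, no score normalization between the embedding distance and the FTS score is needed, which is why RRF is a common glue between heterogeneous retrievers before a reranker.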
[v0.4] - 2026-02-25
Added
- Multi-turn conversations — DbMemory persistence, smart truncation.
- Tool step folding UI — Collapse/expand tool calls.
- HTTP request + shell exec tools.
- Agent management — Create, configure, publish with bound models/tools.
- JWT authentication.
[v0.3] - 2026-02-25
Added
- Web tools — `web_search` (Jina/Tavily/Brave), `web_fetch`.
- File operations + MCP client.
- DAG visualization — Interactive flow graph with live status.
- Code execution in Docker — `--network=none`, memory limits, timeout.
[v0.2] - 2026-02-24
Added
- Retry & rate limiting — Exponential backoff.
- Usage tracking — Per-request token/cost accounting.
- Native function calling — Direct model tool selection.
- Multi-model support — `FAST_LLM_MODEL` for DAG steps.
- Memory system — Window, Summary, Db memory.
- FastAPI backend — `/api/execute`, `/api/stream` (SSE).
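Retry with exponential backoff, as introduced here, typically adds jitter so concurrent clients don't retry in lockstep. A generic sketch with illustrative parameter names and defaults (not this project's actual helper):

```python
import random
import time

def retry(fn, attempts=5, base=0.5, cap=30.0):
    """Call fn(); on failure, sleep an exponentially growing, jittered delay and retry."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: random delay up to the capped exponential bound.
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

The cap keeps the worst-case wait bounded, and raising on the final attempt preserves the original exception for the caller.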
[v0.1] - 2026-02-22
Added
- ReActAgent — Reason → Act → Observe loop.
- DAGPlanner — LLM-generated dependency graphs, concurrent execution, result verification.
- Tools — Calculator, Python exec.
- Portal UI — Next.js with streaming, dark/light theme, KaTeX.