Model Usage and Cost Management

This feature is EXPERIMENTAL and under active development. It may change significantly, be discontinued, or have breaking changes without notice. Documentation may be incomplete or outdated, and the feature is NOT recommended for production use. Use at your own risk. Please refer to our Upgrade and Release Process for more information.

Overview

The Model Usage and Cost Management feature gives organisations full visibility into how their users interact with AI models — tracking token consumption and estimated spend per user, assistant, and application in real time.

Once enabled, the feature:

Records every LLM call (input tokens + completion tokens + estimated cost in USD)
Shows an Admin dashboard with usage reporting and model pricing
Shows a per-user spend badge in the Chat interface
Supports CSV export for offline analysis

Enabling the Feature

Model Usage tracking is controlled by three feature flags. All are disabled by default and must be activated by Unique.

Feature Flag	What it enables	Feature Status
`FEATURE_FLAG_SAVE_MODEL_USAGE_UN_12832`	Starts recording usage data to the database. Must be on for anything to work.	Experimental
`FEATURE_FLAG_SAVE_MODEL_USAGE_DASHBOARD_UN_18889`	Shows the Cost Management section in the Admin UI and the spend badge in the Chat user menu.	Experimental
`FEATURE_FLAG_MODEL_USAGE_LIMITS_UN_18889`	Allows setting limits for model usage.	Experimental

To enable these flags for a client, please contact your Unique Representative or raise a request via the Enterprise mailbox.

Admin UI — Cost Management

The Cost Management section appears under Settings → Cost Management (left sidebar) once the dashboard flag is enabled.

Requires the user to have the role CHAT_FEEDBACK_READ or CHAT_DATA_ADMIN.

Model Pricing

A read-only table showing the active pricing for every supported model:

Column	Description
Model	Internal model identifier (e.g. `AZURE_GPT_4o_2024_1120`, `litellm:anthropic-claude-sonnet-4-5`)
Input cost / 1M tokens	Cost in USD per million prompt tokens
Completion cost / 1M tokens	Cost in USD per million completion (output) tokens
Currency	USD by default

Usage Reporting

An aggregated view of token consumption and estimated spend across the organisation.

Filters available:

Date range — This month, last month, this week, or custom range (with previous/next period navigation)
View by — Model, User, or Assistant
Drill-down — Click any row to see the breakdown within that dimension
Text filter — Search by model name, user, or assistant
Pagination — Configurable page size

Limits

The Limits tab inside the Cost Management dashboard lets administrators cap how much each user spends on AI model calls per day. All limits are denominated in USD and reset automatically at midnight in the user's local timezone — there is no manual reset.

Navigate to Settings → Cost Management → Limits to configure limits. Requires the CHAT_DATA_ADMIN role.

Feature flag: Limits are controlled by `FEATURE_FLAG_MODEL_USAGE_LIMITS_UN_18889` and must be activated by Unique.

Limit levels

Three levels of limits can be configured:

Level	Description
Default (Company)	Applies to every user in the organisation who has no user or group limit. Acts as the global fallback.
Group limit	Applies to all members of a specific group. Overrides the company default.
User limit	Applies to a specific user. Takes priority over everything else.

Priority and resolution

When a user makes a model call, the system resolves which limit applies in this order:

User limit — if a direct limit has been set for that user, it is used. No group or company limit is evaluated.
Group limit — if the user belongs to one or more groups that have limits, the highest group limit applies (the most permissive one wins).
Company default — if no user or group limit applies, the company-wide default is used.
No limit — if no limit is configured at any level, there is no spending cap.
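The resolution order above can be sketched as a small function. This is an illustrative sketch only, not the platform's implementation: the function name, its parameters, and the use of `None` for "not configured" are assumptions made for the example. A company default of 0 is treated as "no cap", following the Setting a limit section.

```python
from typing import Optional, Sequence


def resolve_daily_limit(
    user_limit: Optional[float],
    group_limits: Sequence[float],
    company_default: Optional[float],
) -> Optional[float]:
    """Return the applicable daily limit in USD, or None if there is no cap.

    Hypothetical helper: None means "no limit configured at this level".
    """
    # 1. A direct user limit wins; group and company limits are not evaluated.
    if user_limit is not None:
        return user_limit
    # 2. Among the limits of the user's groups, the highest (most permissive) applies.
    if group_limits:
        return max(group_limits)
    # 3. Fall back to the company default; 0 is treated as "no default set".
    if company_default is not None and company_default > 0:
        return company_default
    # 4. Nothing configured at any level: no spending cap.
    return None


# A user without a direct limit who belongs to two groups capped at 10 and 20 USD.
print(resolve_daily_limit(None, [10.0, 20.0], 2.0))
```

In this example the 20 USD group limit applies, because the most permissive group limit overrides the 2 USD company default.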

Daily reset

Limits are daily and reset at midnight in the user's timezone. The timezone is locked at the start of each calendar day and does not shift mid-day if the user changes their timezone settings.
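To make the reset behaviour concrete, the sketch below computes the start and end of a user's current calendar day. It is a minimal illustration under stated assumptions: `daily_window` is a hypothetical helper, and how the platform actually stores or locks the timezone is not described here.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo
from typing import Tuple


def daily_window(now: datetime, user_tz: tzinfo) -> Tuple[datetime, datetime]:
    """Return (start, end) of the user's current calendar day as aware datetimes.

    `now` must be timezone-aware. Hypothetical helper for illustration only.
    """
    local_day = now.astimezone(user_tz).date()
    start = datetime.combine(local_day, time.min, tzinfo=user_tz)
    end = datetime.combine(local_day + timedelta(days=1), time.min, tzinfo=user_tz)
    return start, end


# Fixed UTC+2 offset for the example; a real deployment would more likely use
# an IANA zone such as zoneinfo.ZoneInfo("Europe/Zurich").
user_tz = timezone(timedelta(hours=2))
now = datetime(2026, 4, 10, 22, 30, tzinfo=timezone.utc)
start, end = daily_window(now, user_tz)
print(start.isoformat(), end.isoformat())
```

Here 22:30 UTC is already 00:30 on 11 April for the user, so the window runs from local midnight on 11 April to local midnight on 12 April. One way to model the "locked" timezone is to compute this window once when the day starts and reuse it until `end`, even if the user changes their timezone setting in between.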

Enforcement

Limits are checked before each model call. When a user's accumulated spend for the day reaches their limit, the next model call is blocked and the user sees a message indicating how much they have used and what their daily limit is.
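The pre-call check can be pictured roughly as follows. This is a sketch under assumptions: the function name, its signature, and the message wording are invented for the example and do not reflect the actual message shown in the product.

```python
from typing import Optional


def check_before_call(
    spent_today_usd: float, daily_limit_usd: Optional[float]
) -> Optional[str]:
    """Return None if the call may proceed, otherwise a message for the user.

    Hypothetical helper; None as the limit means no cap applies.
    """
    if daily_limit_usd is None:
        return None
    # The call is blocked once accumulated spend has reached the limit.
    if spent_today_usd >= daily_limit_usd:
        return (
            f"Daily limit reached: you have used ${spent_today_usd:.2f} "
            f"of your ${daily_limit_usd:.2f} daily limit."
        )
    return None


print(check_before_call(4.99, 5.00))  # below the limit, call proceeds
print(check_before_call(5.02, 5.00))  # limit reached, call is blocked
```

Because the check runs before the call and the cost is only known afterwards, a call that starts just under the limit can, under this reading, push the day's total slightly above it; the following call is then blocked.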

Setting a limit

Company default — enter a USD amount and save. Set to 0 to remove the default (no cap).
Group limits — select a group, enter a USD amount, and save. Remove the row to lift the cap for that group.
User limits — search for a user, enter a USD amount, and save. Remove the row to lift the cap for that user.

How Costs Are Calculated

Cost is computed per LLM call using the formula:

spent = (inputTokens × inputCostPer1M + completionTokens × completionCostPer1M) / 1,000,000

All costs are expressed in USD.
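As a worked example, the formula can be written as a short function. This is an illustrative sketch: the function and parameter names are assumptions, and `Decimal` is used here only to avoid floating-point rounding in currency amounts, not because the platform is known to do so.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(
    input_tokens: int,
    completion_tokens: int,
    input_cost_per_1m: str,
    completion_cost_per_1m: str,
) -> Decimal:
    """Estimated USD cost of one LLM call, following the formula above.

    Prices are passed as strings (e.g. "2.50") so they convert to Decimal exactly.
    """
    total = (
        Decimal(input_tokens) * Decimal(input_cost_per_1m)
        + Decimal(completion_tokens) * Decimal(completion_cost_per_1m)
    )
    return total / TOKENS_PER_MILLION


# 12,000 prompt tokens and 3,000 completion tokens at 2.50 / 10.00 USD per 1M tokens.
cost = estimate_cost_usd(12_000, 3_000, "2.50", "10.00")
print(cost)
```

With these inputs the call costs (12,000 × 2.50 + 3,000 × 10.00) / 1,000,000 = 0.06 USD.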

Default Pricing Sources

Default prices are maintained by the Unique engineering team and sourced from:

Azure OpenAI models — Azure OpenAI Service Pricing (Sweden Region): https://azure.microsoft.com/en-us/pricing/details/azure-openai/
LiteLLM-proxied models (Anthropic Claude, Gemini, Mistral, OpenAI via LiteLLM) — LiteLLM Providers and Models: https://models.litellm.ai/

Sample Default Prices (April 2026)

Model	Input (USD / 1M tokens)	Completion (USD / 1M tokens)
AZURE_GPT_4o_2024_1120	$2.50	$10.00
AZURE_GPT_4o_MINI_2024_0718	$0.15	$0.60
AZURE_GPT_41_2025_0414	$2.00	$8.00
AZURE_o3_2025_0416	$2.00	$8.00
AZURE_o4_MINI_2025_0416	$1.10	$4.40
litellm:anthropic-claude-sonnet-4-5	$3.00	$15.00
litellm:anthropic-claude-opus-4-5	$5.00	$25.00
litellm:gemini-2-5-pro	$1.25	$10.00
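To give a feel for how these prices translate into spend, the sketch below applies three rows of the table to the same workload. The workload size (1,000,000 input tokens and 200,000 completion tokens) and the helper name are made up for illustration; the prices are copied from the sample table and may differ in a given deployment.

```python
from decimal import Decimal

# (input, completion) prices in USD per 1M tokens, copied from the sample table.
SAMPLE_PRICES = {
    "AZURE_GPT_4o_2024_1120": ("2.50", "10.00"),
    "AZURE_GPT_4o_MINI_2024_0718": ("0.15", "0.60"),
    "litellm:anthropic-claude-sonnet-4-5": ("3.00", "15.00"),
}


def workload_cost(model: str, input_tokens: int, completion_tokens: int) -> Decimal:
    """Estimated USD cost of a workload on one model (hypothetical helper)."""
    input_price, completion_price = (Decimal(p) for p in SAMPLE_PRICES[model])
    total = input_tokens * input_price + completion_tokens * completion_price
    return total / Decimal(1_000_000)


for model in SAMPLE_PRICES:
    print(model, workload_cost(model, 1_000_000, 200_000))
```

For this workload the estimates come to 4.50 USD, 0.27 USD, and 6.00 USD respectively.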

Per-Client Price Override

Prices can be customised per client. If a client's Azure contract or LiteLLM agreement carries different rates, the Unique engineering team can supply a custom pricing configuration for that deployment. Contact your Customer Success (CS) representative to arrange this.

CSV Export

Usage data can be exported as a CSV file via the analytics export pipeline. User privacy settings (pseudonymisation or anonymisation) are respected per organisation configuration.

CSV columns:

Column	Description
S/N	Row sequence number
User ID	User identifier (may be pseudonymised/anonymised per org settings)
Assistant ID	The assistant used (N/A if not applicable)
Chat ID	The conversation session
App ID	The Unique application
Language Model	Model identifier
Spent	Estimated cost in USD
Input Tokens	Number of prompt tokens
Completion Tokens	Number of output tokens
Timestamp	UTC timestamp of the LLM call
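For offline analysis, an exported file can be aggregated with a few lines of Python. The sketch below sums spend per model. It assumes the file is comma-separated and that the header row uses the column names listed above verbatim; the sample rows, IDs, and timestamp format are invented for illustration.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal
from typing import Dict


def spend_by_model(csv_text: str) -> Dict[str, Decimal]:
    """Sum the Spent column per Language Model (assumed header names)."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Language Model"]] += Decimal(row["Spent"])
    return dict(totals)


# Made-up sample data in the assumed export layout.
sample = """S/N,User ID,Assistant ID,Chat ID,App ID,Language Model,Spent,Input Tokens,Completion Tokens,Timestamp
1,user_a,N/A,chat_1,app_1,AZURE_GPT_4o_2024_1120,0.06,12000,3000,2026-04-01T09:00:00Z
2,user_b,assistant_1,chat_2,app_1,AZURE_GPT_4o_MINI_2024_0718,0.006,20000,5000,2026-04-01T09:05:00Z
3,user_a,N/A,chat_1,app_1,AZURE_GPT_4o_2024_1120,0.03,6000,1500,2026-04-01T09:10:00Z
"""

totals = spend_by_model(sample)
for model, spent in totals.items():
    print(model, spent)
```

Grouping by `User ID` or `Assistant ID` instead works the same way by changing the key used in the loop.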

Required User Roles

Role	What it grants
`CHAT_FEEDBACK_READ`	View usage reporting and model pricing in Admin
`CHAT_DATA_ADMIN`	Full access to model usage data in Admin

API Model Usage Tracking — Technical reference for developers integrating via the API