API Model Usage Tracking
Overview
The Model Usage feature records usage for each model API call: input tokens, completion tokens, estimated cost, and associated metadata.
The feature is controlled by two feature flags:
FEATURE_FLAG_SAVE_MODEL_USAGE_UN_12832: enables persisting usage records to the database
FEATURE_FLAG_SAVE_MODEL_USAGE_DASHBOARD_UN_18889: enables the Cost Management UI in Admin and the per-user spend badge in Chat
How It Works
1. What It Tracks
Language Model: The language model used in the request (e.g., AZURE_GPT_4o_2024_0806).
Input Tokens: Number of tokens sent in the request.
Completion Tokens: Number of tokens generated in the response.
Spent: Estimated cost in USD, calculated as (inputTokens * inputCostPer1M + completionTokens * completionCostPer1M) / 1,000,000. Cost is looked up from the active model cost sheet (see Cost Configuration below).
User Information: User Id (userId), Company Id (companyId)
Other Optional Fields (saved if present on the request):
App Id
Language Model
Chat Id
Assistant Id
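As a worked example of the Spent formula above, here is a minimal Python sketch. The function name `estimate_spend` and the token counts are illustrative assumptions and not part of the service; the prices match the AZURE_GPT_4o_2024_1120 entry in the sample cost sheet.

```python
def estimate_spend(input_tokens: int, completion_tokens: int,
                   input_cost_per_1m: float, completion_cost_per_1m: float) -> float:
    """Estimated cost in USD, following the documented formula."""
    return (input_tokens * input_cost_per_1m
            + completion_tokens * completion_cost_per_1m) / 1_000_000


# Hypothetical call: 1,200 input tokens and 300 completion tokens,
# priced at 2.5 and 10 USD per 1M tokens.
spent = estimate_spend(1200, 300, 2.5, 10)
print(f"{spent:.6f}")  # 0.006000
```

Because prices are expressed per 1M tokens, a single call usually costs a small fraction of a cent, so any rounding is best left to display time.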
2. Use Cases
Analytics: Provides insights into token usage per user, app, or assistant (queryable via GraphQL; Admin dashboard available when FEATURE_FLAG_SAVE_MODEL_USAGE_DASHBOARD_UN_18889 is enabled).
Cost visibility: Tracks spend per user, model, and assistant in real time.
Rate Limiting: Enables enforcement of usage limits (e.g., tokens, API calls). (future release)
Enabling the Feature
Set the feature flags on the node-chat service:
| Flag | Value | Effect |
|---|---|---|
| FEATURE_FLAG_SAVE_MODEL_USAGE_UN_12832 | | Persists usage records. Required for any data to be collected. |
| FEATURE_FLAG_SAVE_MODEL_USAGE_DASHBOARD_UN_18889 | | Shows Cost Management in Admin UI and spend badge in Chat. |
For the features behind FEATURE_FLAG_SAVE_MODEL_USAGE_DASHBOARD_UN_18889, both flags must also be set on the chat and admin frontend apps.
What It Covers
Public API / SDK / Agentic workflows: Tracks all model usage via the public API, SDK or agentic workflows.
Azure SDK calls: Tracks usage from direct Azure OpenAI SDK calls within the chat service.
Supported Models: Any supported model (e.g., AZURE_GPT_5_2025_0807, AZURE_GPT_4o_2024_0806, litellm:anthropic-claude-sonnet-4-5).
Metadata: Tracks usage per user, app, assistant, and company.
Cost Configuration
Model prices are loaded from a YAML cost sheet (MODEL_COSTS_FILE) at service startup. The default file is bundled with the Docker image.
YAML format:
costSchemaVersion: 1
models:
  AZURE_GPT_4o_2024_1120:
    input: 2.5 # USD per 1M input tokens
    completion: 10 # USD per 1M completion tokens
  "litellm:anthropic-claude-sonnet-4-5":
    input: 3
    completion: 15

Pricing sources:
Azure models: Azure OpenAI Service Pricing (Sweden Region)
LiteLLM models: LiteLLM Providers and Models
Per-client overrides are possible by mounting a custom YAML and setting MODEL_COSTS_FILE to its path in the Helm values.
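Before mounting a custom cost sheet, it can help to sanity-check its structure. The sketch below is an assumption about what a reasonable check looks like, not the service's actual validation: `validate_cost_sheet` is a hypothetical helper, and it operates on the dictionary a YAML parser would typically produce from the format shown above.

```python
from numbers import Real

SUPPORTED_SCHEMA_VERSION = 1  # matches costSchemaVersion in the sample sheet


def validate_cost_sheet(sheet: dict) -> list[str]:
    """Return a list of problems found in a parsed cost sheet (empty if none)."""
    problems = []
    if sheet.get("costSchemaVersion") != SUPPORTED_SCHEMA_VERSION:
        problems.append("unexpected costSchemaVersion")
    models = sheet.get("models")
    if not isinstance(models, dict) or not models:
        problems.append("models must be a non-empty mapping")
        return problems
    for name, prices in models.items():
        for key in ("input", "completion"):
            value = prices.get(key) if isinstance(prices, dict) else None
            # bool counts as a Real in Python, so exclude it explicitly
            if isinstance(value, bool) or not isinstance(value, Real) or value < 0:
                problems.append(f"{name}: '{key}' must be a non-negative number")
    return problems


# The sample sheet, as a YAML parser would typically return it.
sheet = {
    "costSchemaVersion": 1,
    "models": {
        "AZURE_GPT_4o_2024_1120": {"input": 2.5, "completion": 10},
        "litellm:anthropic-claude-sonnet-4-5": {"input": 3, "completion": 15},
    },
}
print(validate_cost_sheet(sheet))  # []
```

Returning a list of problems rather than raising on the first one makes it easier to fix a hand-edited override file in a single pass.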
Retrieval
Usage data is retrievable via GraphQL query. Example curl (US-MT environment, AZURE_GPT_5_2025_0807 model):
curl --request POST \
  --url https://api.<base-url>/chat/graphql \
  --header 'authorization: Bearer token' \
  --header 'content-type: application/json' \
  --data '{"query":"query ModelUsage($input: ModelUsageQueryDto!) { ModelUsage(input: $input) { appId assistantId chatId userId languageModel completionTokens inputTokens spent } }","variables":{"input":{"languageModel":"AZURE_GPT_5_2025_0807","take":100}}}'

We advise against building integrations on top of this query. Analytics will soon be available via the Public API.
Retrieval Filters:
App ID (appId)
Language Model (languageModel)
Start Date (startDate)
End Date (endDate)
Skip (skip)
Take (take)
Chat Id (chatId, from version 2026.04)
Assistant Id (assistantId, from version 2026.04)
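For ad-hoc exploration (not as the basis for an integration), the skip and take filters can be combined to page through results. The following Python sketch assumes that skip and take behave as offset and limit, and that variables are sent as a JSON object. `build_payload`, `fetch_all`, and `fake_fetch` are hypothetical names; `fake_fetch` stands in for the real HTTP call so the example runs without network access.

```python
import json
from typing import Callable, Iterator


def build_payload(filters: dict, skip: int, take: int) -> str:
    """Serialize a ModelUsage request body with the given filters and page window."""
    query = (
        "query ModelUsage($input: ModelUsageQueryDto!) { "
        "ModelUsage(input: $input) { userId languageModel inputTokens completionTokens spent } }"
    )
    variables = {"input": {**filters, "skip": skip, "take": take}}
    return json.dumps({"query": query, "variables": variables})


def fetch_all(fetch_page: Callable[[str], list], filters: dict, take: int = 100) -> Iterator[dict]:
    """Yield records page by page until a page comes back shorter than `take`."""
    skip = 0
    while True:
        records = fetch_page(build_payload(filters, skip, take))
        yield from records
        if len(records) < take:
            break
        skip += take


# Stand-in for an HTTP POST to the GraphQL endpoint: serves 5 fake records.
def fake_fetch(body: str) -> list:
    page = json.loads(body)["variables"]["input"]
    data = [{"userId": f"user-{i}"} for i in range(5)]
    return data[page["skip"]: page["skip"] + page["take"]]


rows = list(fetch_all(fake_fetch, {"languageModel": "AZURE_GPT_5_2025_0807"}, take=2))
print(len(rows))  # 5
```

Stopping when a page is shorter than `take` avoids needing a total count, at the cost of one extra request when the record count is an exact multiple of the page size.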
An aggregation query (UsageAggregation) is also available. It groups results by languageModel, userId, or assistantId and returns totals for input tokens, completion tokens, and spent.
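To illustrate what such grouping produces, the Python sketch below performs the equivalent aggregation locally on a few made-up records. It only mirrors the described semantics; it is not the server-side implementation, and `aggregate_usage` is a hypothetical helper.

```python
from collections import defaultdict


def aggregate_usage(records: list[dict], group_by: str) -> dict:
    """Sum inputTokens, completionTokens and spent per value of `group_by`."""
    totals = defaultdict(lambda: {"inputTokens": 0, "completionTokens": 0, "spent": 0.0})
    for record in records:
        bucket = totals[record[group_by]]
        bucket["inputTokens"] += record["inputTokens"]
        bucket["completionTokens"] += record["completionTokens"]
        bucket["spent"] += record["spent"]
    return dict(totals)


# Made-up records, shaped like the ModelUsage fields.
records = [
    {"languageModel": "AZURE_GPT_4o_2024_0806", "userId": "u1",
     "inputTokens": 100, "completionTokens": 20, "spent": 0.25},
    {"languageModel": "AZURE_GPT_4o_2024_0806", "userId": "u2",
     "inputTokens": 50, "completionTokens": 10, "spent": 0.5},
    {"languageModel": "AZURE_GPT_5_2025_0807", "userId": "u1",
     "inputTokens": 10, "completionTokens": 5, "spent": 1.0},
]
by_model = aggregate_usage(records, "languageModel")
print(by_model["AZURE_GPT_4o_2024_0806"])
# {'inputTokens': 150, 'completionTokens': 30, 'spent': 0.75}
```

Passing "userId" or "assistantId" as `group_by` yields the other two groupings, provided every record carries that field.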
Limitations
Feature Flag Dependency: The feature will not work unless FEATURE_FLAG_SAVE_MODEL_USAGE_UN_12832 is enabled.
Public API/SDK/Azure SDK: Tracking covers public API, SDK, agentic workflows, and Azure SDK calls within the chat service. Other internal services are not covered.
REST API: Usage data is not yet available via REST API. A Public API endpoint is planned.
Future Enhancements
Additional Retrieval Methods: REST / Public API endpoints for usage data.
Rate Limiting: Enforcement of per-user or per-company usage limits.