API Model Usage Tracking


Overview

The Model Usage feature records usage data for every model API call: input tokens, completion tokens, estimated cost, and associated metadata.

The feature is controlled by two feature flags:

  • FEATURE_FLAG_SAVE_MODEL_USAGE_UN_12832 — enables persisting usage records to the database

  • FEATURE_FLAG_SAVE_MODEL_USAGE_DASHBOARD_UN_18889 — enables the Cost Management UI in Admin and the per-user spend badge in Chat

How It Works

1. What It Tracks

  • Language Model: The language model used in the request (e.g., AZURE_GPT_4o_2024_0806).

  • Input Tokens: Number of tokens sent in the request.

  • Completion Tokens: Number of tokens generated in the response.

  • Spent: Estimated cost in USD, calculated as (inputTokens * inputCostPer1M + completionTokens * completionCostPer1M) / 1,000,000, where inputCostPer1M and completionCostPer1M are the prices in USD per 1M tokens. Prices are looked up from the active model cost sheet (see Cost Configuration below).

  • User Information: User Id (userId), Company Id (companyId)

  • Other Optional Fields (saved if present on the request):

    • App Id

    • Language Model

    • Chat Id

    • Assistant Id
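As a worked example of the Spent formula, the sketch below uses the example prices for AZURE_GPT_4o_2024_1120 from the Cost Configuration section (2.5 USD per 1M input tokens, 10 USD per 1M completion tokens). The function name `estimate_spent` and the use of `Decimal` are assumptions made for illustration; the service's actual implementation may differ.

```python
from decimal import Decimal

# Prices in USD per 1M tokens, copied from the example cost sheet in this document.
INPUT_COST_PER_1M = Decimal("2.5")
COMPLETION_COST_PER_1M = Decimal("10")


def estimate_spent(
    input_tokens: int,
    completion_tokens: int,
    input_cost_per_1m: Decimal,
    completion_cost_per_1m: Decimal,
) -> Decimal:
    """Apply the Spent formula: token counts times per-1M prices, divided by 1,000,000."""
    total = input_tokens * input_cost_per_1m + completion_tokens * completion_cost_per_1m
    return total / Decimal(1_000_000)


# Hypothetical call with 1,200 input tokens and 300 completion tokens.
spent = estimate_spent(1200, 300, INPUT_COST_PER_1M, COMPLETION_COST_PER_1M)
print(spent)
```

For this call the formula gives (1,200 * 2.5 + 300 * 10) / 1,000,000 = 0.006 USD.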

2. Use Cases

  • Analytics: Provides insights into token usage per user, app, or assistant. Data is queryable via GraphQL, and the Admin dashboard is available when FEATURE_FLAG_SAVE_MODEL_USAGE_DASHBOARD_UN_18889 is enabled.

  • Cost visibility: Tracks spend per user, model, and assistant in real time.

  • Rate Limiting: Enables enforcement of usage limits (e.g., tokens, API calls). Planned for a future release.

Enabling the Feature

Set the feature flags on the node-chat service:

| Flag | Value | Effect |
| --- | --- | --- |
| FEATURE_FLAG_SAVE_MODEL_USAGE_UN_12832 | "true" or comma-separated company IDs | Persists usage records. Required for any data to be collected. |
| FEATURE_FLAG_SAVE_MODEL_USAGE_DASHBOARD_UN_18889 | "true" or comma-separated company IDs | Shows Cost Management in Admin UI and spend badge in Chat. |

FEATURE_FLAG_SAVE_MODEL_USAGE_DASHBOARD_UN_18889 must also be set on the chat and admin frontend apps; otherwise the Cost Management UI and the spend badge are not shown.
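The two accepted value shapes ("true" or a comma-separated list of company IDs) can be read as follows. This is a minimal sketch: `is_flag_enabled` is a hypothetical helper, and treating an unset value as disabled, ignoring surrounding whitespace, and matching "true" case-insensitively are assumptions, not documented behavior.

```python
from typing import Optional


def is_flag_enabled(flag_value: Optional[str], company_id: str) -> bool:
    """Hypothetical check: "true" enables the flag for every company,
    otherwise the value is treated as a comma-separated list of company IDs."""
    if not flag_value:
        # Assumption: an unset or empty flag means the feature is off.
        return False
    value = flag_value.strip()
    if value.lower() == "true":
        return True
    enabled_ids = {part.strip() for part in value.split(",") if part.strip()}
    return company_id in enabled_ids


# Hypothetical company IDs, used only for illustration.
print(is_flag_enabled("true", "company-a"))
print(is_flag_enabled("company-a,company-b", "company-c"))
```

With a company ID list, only the listed companies collect usage data or see the dashboard, which allows a staged rollout before switching the flag to "true".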

What It Covers

  • Public API / SDK / Agentic workflows: Tracks all model usage via the public API, SDK or agentic workflows.

  • Azure SDK calls: Tracks usage from direct Azure OpenAI SDK calls within the chat service.

  • Supported Models: Any supported model (e.g., AZURE_GPT_5_2025_0807, AZURE_GPT_4o_2024_0806, litellm:anthropic-claude-sonnet-4-5).

  • Metadata: Tracks usage per user, app, assistant, and company.

Cost Configuration

Model prices are loaded from a YAML cost sheet (MODEL_COSTS_FILE) at service startup. The default file is bundled with the Docker image.

YAML format:

```yaml
costSchemaVersion: 1
models:
  AZURE_GPT_4o_2024_1120:
    input: 2.5        # USD per 1M input tokens
    completion: 10    # USD per 1M completion tokens
  "litellm:anthropic-claude-sonnet-4-5":
    input: 3
    completion: 15
```

Pricing sources:

Per-client overrides are possible by mounting a custom YAML and setting MODEL_COSTS_FILE to its path in the Helm values.
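Before mounting a custom cost sheet, it can help to check that it has the same shape as the example above. The sketch below validates an already parsed sheet (for instance the dictionary returned by a YAML parser). `validate_cost_sheet` is a hypothetical helper, and the rules it applies (schema version 1, non-negative numeric prices) are assumptions inferred from the example, not a published schema.

```python
from typing import Any, Dict, List


def validate_cost_sheet(sheet: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed cost sheet (empty means none found)."""
    problems: List[str] = []
    # Assumption: version 1 is the only schema version, as in the example sheet.
    if sheet.get("costSchemaVersion") != 1:
        problems.append("costSchemaVersion must be 1")
    models = sheet.get("models")
    if not isinstance(models, dict) or not models:
        problems.append("models must be a non-empty mapping")
        return problems
    for name, prices in models.items():
        for key in ("input", "completion"):
            value = prices.get(key) if isinstance(prices, dict) else None
            # bool is a subclass of int in Python, so it is rejected explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)) or value < 0:
                problems.append(f"{name}.{key} must be a non-negative number")
    return problems


# The example sheet from this section, as it would look after parsing.
example_sheet = {
    "costSchemaVersion": 1,
    "models": {
        "AZURE_GPT_4o_2024_1120": {"input": 2.5, "completion": 10},
        "litellm:anthropic-claude-sonnet-4-5": {"input": 3, "completion": 15},
    },
}
print(validate_cost_sheet(example_sheet))
```

Because prices are loaded at service startup, a check like this is most useful before deployment rather than at request time.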

Retrieval

Usage data can be retrieved with a GraphQL query. Example curl (US-MT environment, AZURE_GPT_5_2025_0807 model):

```bash
# Replace <base-url> and <token> with the values for your environment.
# The query is kept on one line because JSON strings cannot contain raw newlines,
# and variables are sent as a JSON object.
curl --request POST \
  --url 'https://api.<base-url>/chat/graphql' \
  --header 'authorization: Bearer <token>' \
  --header 'content-type: application/json' \
  --data '{
    "query": "query ModelUsage($input: ModelUsageQueryDto!) { ModelUsage(input: $input) { appId assistantId chatId userId languageModel completionTokens inputTokens spent } }",
    "variables": { "input": { "languageModel": "AZURE_GPT_5_2025_0807", "take": 100 } }
  }'
```

We advise against building integrations on top of this query. Analytics will soon be available via the Public API.

Retrieval Filters:

  • App ID (appId)

  • Language Model (languageModel)

  • Start Date (startDate)

  • End Date (endDate)

  • Skip (skip)

  • Take (take)

  • Chat Id (chatId, from version 2026.04)

  • Assistant Id (assistantId, from version 2026.04)
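For ad hoc inspection (rather than a long-lived integration), the filters can be combined with skip and take to page through results. The sketch below only builds the request bodies and drives the paging loop. `send` is a hypothetical callable that would POST the body to the GraphQL endpoint and return the ModelUsage list, and the assumption that skip and take behave as offset and page size, with a short page marking the end, is not confirmed by this document.

```python
import json
from typing import Any, Callable, Dict, Iterator, List

MODEL_USAGE_QUERY = """
query ModelUsage($input: ModelUsageQueryDto!) {
  ModelUsage(input: $input) {
    userId
    languageModel
    inputTokens
    completionTokens
    spent
  }
}
"""


def build_payload(filters: Dict[str, Any], skip: int, take: int) -> str:
    """Serialize the query and its variables into a JSON request body."""
    variables = {"input": {**filters, "skip": skip, "take": take}}
    return json.dumps({"query": MODEL_USAGE_QUERY, "variables": variables})


def iter_usage(
    send: Callable[[str], List[dict]], filters: Dict[str, Any], take: int = 100
) -> Iterator[dict]:
    """Page through results with skip/take until a short page is returned."""
    skip = 0
    while True:
        page = send(build_payload(filters, skip, take))
        yield from page
        if len(page) < take:
            break
        skip += take


# Stand-in transport for illustration: serves 250 fake records from memory.
fake_records = [{"userId": f"user-{i}"} for i in range(250)]


def fake_send(body: str) -> List[dict]:
    page_input = json.loads(body)["variables"]["input"]
    start = page_input["skip"]
    return fake_records[start:start + page_input["take"]]


rows = list(iter_usage(fake_send, {"languageModel": "AZURE_GPT_5_2025_0807"}))
print(len(rows))
```

Building the body with `json.dumps` keeps the query string correctly escaped, which avoids the quoting problems that hand-written JSON tends to cause.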

An aggregation query (UsageAggregation) is also available. It groups records by languageModel, userId, or assistantId and returns totals for input tokens, completion tokens, and spent.
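To illustrate what such grouped totals look like, the sketch below performs the same kind of grouping client-side over ModelUsage records. It is not the server's implementation and does not show the UsageAggregation query itself; `aggregate_usage` and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def aggregate_usage(records: Iterable[dict], group_by: str) -> Dict[str, dict]:
    """Sum inputTokens, completionTokens, and spent per distinct value of `group_by`."""
    totals: Dict[str, dict] = defaultdict(
        lambda: {"inputTokens": 0, "completionTokens": 0, "spent": 0.0}
    )
    for record in records:
        bucket = totals[record[group_by]]
        bucket["inputTokens"] += record["inputTokens"]
        bucket["completionTokens"] += record["completionTokens"]
        bucket["spent"] += record["spent"]
    return dict(totals)


# Made-up records for illustration only.
sample = [
    {"languageModel": "AZURE_GPT_4o_2024_0806", "userId": "u1",
     "inputTokens": 100, "completionTokens": 40, "spent": 0.5},
    {"languageModel": "AZURE_GPT_4o_2024_0806", "userId": "u2",
     "inputTokens": 200, "completionTokens": 60, "spent": 0.25},
    {"languageModel": "AZURE_GPT_5_2025_0807", "userId": "u1",
     "inputTokens": 50, "completionTokens": 10, "spent": 0.125},
]

by_model = aggregate_usage(sample, "languageModel")
by_user = aggregate_usage(sample, "userId")
print(by_model)
```

Grouping the same records by userId instead of languageModel answers a different question (spend per person rather than per model), which is why the query offers several grouping keys.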

Limitations

  • Feature Flag Dependency: The feature will not work unless FEATURE_FLAG_SAVE_MODEL_USAGE_UN_12832 is enabled.

  • Public API/SDK/Azure SDK: Tracking covers public API, SDK, agentic workflows, and Azure SDK calls within the chat service. Other internal services are not covered.

  • REST API: Usage data is not yet available via REST API. A Public API endpoint is planned.

Future Enhancements

  • Additional Retrieval Methods: REST / Public API endpoints for usage data.

  • Rate Limiting: Enforcement of per-user or per-company usage limits.
