How Azure OpenAI Tracks Token Usage
Azure OpenAI measures usage in terms of tokens processed – counting both prompt tokens (input) and completion tokens (output) for each API call. Every request’s response actually includes a usage breakdown in the JSON (showing prompt tokens, completion tokens, and total tokens). Under the hood, Azure OpenAI aggregates these token counts as metrics. Importantly, this usage tracking works the same way regardless of billing model – whether you are on Pay-As-You-Go or using Provisioned Throughput Units (PTUs). The service doesn’t “count” tokens differently for different billing plans; it always tallies the number of tokens consumed by your requests in a consistent manner (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn). In other words, token consumption is recorded identically; only the billing interpretation of that usage differs (as explained below).
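The usage breakdown in each response can be read directly from the JSON body. A minimal sketch (the token values below are illustrative, not from a real call):

```python
import json

# Trimmed example of the JSON body an Azure OpenAI chat completions call
# returns; the token counts here are illustrative.
raw = """
{
  "choices": [{"message": {"role": "assistant", "content": "Hello!"}}],
  "usage": {"prompt_tokens": 12, "completion_tokens": 5, "total_tokens": 17}
}
"""

usage = json.loads(raw)["usage"]

# total_tokens is the sum of prompt and completion tokens; this is the same
# number the service aggregates into its monitoring metrics.
assert usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
print(usage["prompt_tokens"], usage["completion_tokens"], usage["total_tokens"])
```

This same usage object is what you would log yourself if you need per-caller accounting later on.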
- Pay-As-You-Go (PAYG): On the PAYG model, you are charged per token consumed (with different rates for input vs. output tokens, varying by model). The Azure OpenAI resource still tracks how many tokens you use, but in this model those token counts translate directly into costs on your bill. Azure imposes rate limits (tokens-per-minute quotas) on PAYG deployments since they run on shared infrastructure. For example, API responses for PAYG calls include headers like x-ratelimit-remaining-tokens, indicating how many tokens you have left in the current time window (Azure OpenAI PTU utilization - Microsoft Q&A). These headers help you gauge usage against the rate limit but do not change how tokens are counted; they are purely throttling feedback.
- Provisioned Throughput Units (PTU): PTU is a provisioned-capacity model: you reserve dedicated throughput (measured in token processing units per second/minute) for a fixed hourly or monthly fee. Token usage is still counted in the same way (the service logs how many tokens your calls used), but you aren't charged per token. Instead, you pay for the reserved capacity, whether you fully utilize it or not. Because capacity is prepaid, there is no pay-per-token charge to meter in real time; however, Azure provides metrics to show how much of your capacity you are using. For instance, API responses on PTU deployments include an azure-openai-deployment-utilization header, which indicates the current utilization percentage of your reserved throughput (Azure OpenAI PTU utilization - Microsoft Q&A). This header tells you how close the deployment is to its maximum PTU capacity at that moment (unlike PAYG, where the focus is on remaining tokens before throttling). Again, the internal token counting is the same; it is the billing that differs (PTU is a flat rate for capacity, so those token counts are not billed directly, but they are used to calculate utilization).
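Both headers can be read off any HTTP response. A small helper as a sketch; the header names match the discussion above, but the exact value format (for example, whether the utilization header carries a percent sign) is an assumption here and worth verifying against a live response:

```python
def read_capacity_headers(headers: dict) -> dict:
    """Interpret the PAYG/PTU capacity headers discussed above.
    Header names follow the docs; value formats are assumptions."""
    info = {}
    if "x-ratelimit-remaining-tokens" in headers:          # PAYG deployments
        info["remaining_tokens"] = int(headers["x-ratelimit-remaining-tokens"])
    if "azure-openai-deployment-utilization" in headers:   # PTU deployments
        # strip a trailing "%" if present, so both "42.5" and "42.5%" parse
        info["utilization_pct"] = float(
            headers["azure-openai-deployment-utilization"].rstrip("%")
        )
    return info

print(read_capacity_headers({"x-ratelimit-remaining-tokens": "8976"}))
print(read_capacity_headers({"azure-openai-deployment-utilization": "42.5%"}))
```

A client could feed this into its throttling logic: slow down as remaining tokens approach zero (PAYG) or as utilization approaches 100% (PTU).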
Consistent Token Counting Across Billing Models
No matter the billing model, Azure OpenAI’s usage metrics count tokens uniformly. Each call’s prompt and completion tokens are summed as “tokens processed,” and these get recorded in Azure’s monitoring system. Microsoft’s documentation confirms that core usage metrics like “Processed Prompt Tokens” (input tokens) and “Generated Completion Tokens” (output tokens) apply to both PTU and Pay-As-You-Go deployments (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn). In other words, the same metric definitions are used whether you’re on a PTU (provisioned) deployment or a standard PAYG deployment. The billing model does not change how the service measures token usage – it only changes how you pay for that usage.
To be clear, using PTUs doesn’t give you any “different kind” of token count; it simply means you have purchased a certain throughput. You can imagine that under both models, an internal counter is adding up tokens in the same way. The PAYG model converts those counts into a dollar cost per 1,000 tokens, whereas the PTU model converts them into a percentage of your reserved capacity used. Microsoft’s official metrics reference shows that metrics like “Processed Inference Tokens” (which counts total tokens = prompt + completion) are reported for all deployment types (Standard PAYG, PTU, and PTU-managed) (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn). This confirms that usage reporting is consistent across billing models – the system counts tokens the same; it’s just that with PTU you won’t see a running monetary charge for each token.
Additionally, the Azure OpenAI monitoring dashboards provided in Azure illustrate both scenarios side by side. The out-of-box dashboard for an Azure OpenAI resource has a “Tokens-Based Usage” section (showing token consumption over time) and a “PTU Utilization” section for those with provisioned throughput (Monitor Azure OpenAI Service - Azure AI services | Microsoft Learn). The presence of both categories indicates that token usage is tracked universally, while PTU customers get an extra view of capacity usage. In summary, you can trust that a “token” is a token – counted the same way – regardless of whether you pay per token (PAYG) or via reserved capacity (PTU). The billing model only affects how costs are calculated, not how usage is measured.
PTU-Specific Metrics (Utilization and Throughput)
While the fundamental usage metrics are the same for all billing models, Azure provides additional metrics for PTU deployments to help you monitor your reserved capacity utilization. If you are using PTUs, you’ll want to pay attention to metrics that reflect throughput and utilization of your provisioned units:
- Utilization (%) Metrics: The key PTU-specific metric is “Provisioned-Managed Utilization V2”, which measures what percentage of your allocated throughput is being used over time (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn). This metric essentially tracks (tokens consumed / tokens capacity) in each time interval to show how close you are to saturating your PTU. Microsoft documentation describes it as “Utilization % for a provisioned-managed deployment, calculated as (PTUs consumed / PTUs deployed) x 100”, reported in 1-minute increments (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn) (Azure OpenAI Service provisioned throughput - Azure AI services | Microsoft Learn). When this utilization hits 100%, your deployment is at full capacity, and further requests are throttled with HTTP 429 errors until utilization drops. The Azure portal’s PTU Utilization dashboard graphs this percentage so you can see usage vs. capacity at a glance. (For PAYG deployments this metric isn’t applicable, since there is no fixed capacity; PAYG instead uses rate-limit policies at the service level.)
- Active Tokens (Throughput) Metric: Another PTU-related metric is “Active Tokens”, which represents the number of tokens processed minus any tokens served from cache (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn). It applies to PTU and PTU-managed deployments and gauges the actual token throughput hitting the model (excluding cached reuse). In practice, “Active Tokens” helps PTU customers understand their TPS/TPM (tokens per second or per minute) against the provisioned capacity (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn). You can compare this to your expected throughput to see if you are within bounds. (This metric is less relevant for PAYG: there you are typically concerned with total tokens for cost, whereas on PTU you care about tokens per time interval for utilization.)
- Tokens per Second: Azure Monitor also offers a “Tokens Per Second” metric (a real-time throughput rate) and related timing metrics, but note that Microsoft currently reports Tokens/sec and some latency metrics only for PTU deployments, not for pay-as-you-go (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn). This is likely because in shared (PAYG) mode those performance metrics can vary unpredictably, whereas with dedicated PTU capacity they can measure consistent throughput. So if you are on PTU, you have a richer set of performance metrics (e.g. latency, time between tokens, etc.) to analyze; on PAYG these specific metrics are not populated (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn).
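The documented utilization formula is simple enough to sanity-check locally. A sketch using the formula quoted above (the PTU numbers are made up):

```python
def ptu_utilization_pct(ptus_consumed: float, ptus_deployed: float) -> float:
    # Per the docs: (PTUs consumed / PTUs deployed) x 100,
    # reported in 1-minute increments.
    return ptus_consumed / ptus_deployed * 100.0

# e.g. consuming the equivalent of 45 PTUs on a 100-PTU deployment:
util = ptu_utilization_pct(45, 100)
print(util)           # → 45.0
# at or above 100%, new requests are throttled with HTTP 429
print(util >= 100.0)  # → False
```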
In short, PTU customers get extra metrics to manage their reserved capacity (utilization % and throughput rates), which are accessible via Azure Monitor. These are in addition to the standard token consumption metrics that everyone gets. The billing model doesn’t affect the counting of tokens, but with PTU you’ll use these metrics to ensure you’re using what you paid for efficiently (and not consistently hitting 100% utilization, for example).
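Because a saturated PTU deployment answers with HTTP 429 until utilization drops, client code typically retries with backoff. A hedged sketch, where send_request is a hypothetical stand-in for your actual API call (assumed to return a status code, an optional Retry-After delay in seconds, and the response body):

```python
import random
import time

def call_with_backoff(send_request, max_retries: int = 5):
    """Retry a callable that signals HTTP 429 when a PTU deployment
    (or a PAYG rate limit) is saturated. `send_request` is a stand-in
    for your real API call: () -> (status_code, retry_after_secs, body)."""
    for attempt in range(max_retries):
        status, retry_after, body = send_request()
        if status != 429:
            return body
        # Honor Retry-After if the service sent one; otherwise back off
        # exponentially with a little jitter, capped at 30 seconds.
        delay = retry_after if retry_after else min(2 ** attempt, 30) + random.random()
        time.sleep(delay)
    raise RuntimeError("still throttled after retries")
```

This is a generic pattern, not an SDK feature; the official client libraries have their own retry behavior that you may prefer to configure instead.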
Viewing Usage Metrics in the Azure Portal
You can validate and monitor token usage for your Azure OpenAI resource directly in the Azure Portal. Microsoft provides multiple ways to see both your token consumption and, if applicable, your PTU utilization:
- Azure OpenAI Resource Dashboards: Navigate to your Azure OpenAI resource in the Azure Portal. On the Overview blade, you’ll typically see some high-level metrics. For a deeper look, Microsoft’s documentation mentions an AI Foundry metrics dashboard accessible via the Azure OpenAI resource page (there’s a “Go to Azure AI Foundry portal” link on the overview pane) (Monitor Azure OpenAI Service - Azure AI services | Microsoft Learn). The built-in metrics dashboard is grouped into categories like “HTTP Requests”, “Tokens-Based Usage”, “PTU Utilization”, and “Fine-tuning” (Monitor Azure OpenAI Service - Azure AI services | Microsoft Learn). To check token usage, you’d focus on the Tokens-Based Usage graphs, which display the number of tokens used over time. If you have PTU deployments, the PTU Utilization section will show how much of your capacity is being used (often as a percentage or as active tokens vs. allocated tokens). These out-of-box dashboards provide a convenient at-a-glance view. For example, you might see a chart of “Total Tokens per hour” and a chart of “Utilization % of PTU deployment X” on the same page.
- Metrics Explorer (Custom Metrics): For more control, use the Metrics blade under Monitoring for your Azure OpenAI resource. Here you can plot and filter specific metrics. In the Metrics explorer:
- Select your Azure OpenAI resource and the metric namespace (it may default to Azure OpenAI metrics).
- For token usage, choose metrics such as “Processed Prompt Tokens” (input tokens), “Generated Completion Tokens” (output tokens), or “Processed Inference Tokens” (total tokens). These metrics are recorded automatically for all deployments (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn) (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn). You can view them summed over time or as a rate, and adjust the time range as needed.
- Apply splits or filters by deployment or model, if desired. The metrics include dimensions like ModelDeploymentName and ModelName. For instance, you can filter the metric to a specific deployment (if you have multiple model deployments under the same Azure OpenAI resource) to see token usage for that particular model endpoint (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn). This effectively gives you a per-deployment breakdown of token consumption. Similarly, splitting by ModelName could show separate lines for GPT-4 vs GPT-3.5 deployments, etc.
- If you are using PTU, select metrics like “AzureOpenAIProvisionedManagedUtilizationV2” (the utilization % discussed above) or “ActiveTokens” to monitor capacity usage (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn) (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn). These can be filtered by deployment as well (in case you have multiple PTU deployments).
- You can also set up alerts on these metrics (for example, an alert if utilization % goes above 90% or if token usage spikes beyond a certain rate).
- Cost Analysis (Billing): For PAYG users, you can cross-check cost and usage via Azure Cost Management. In the Azure Portal, go to your subscription’s Cost Analysis (or the resource’s Cost Analysis if supported) to see charges. Token usage charges appear under Cognitive Services for the Azure OpenAI resource. For example, you can filter by your resource or by service name to see how much you spent on input and output tokens in a given period. This is more about dollars, but it indirectly reflects the token counts (since cost is proportional to tokens in PAYG). For PTU, your cost will be a fixed amount (for the reserved capacity hours) rather than per-token charges, so Cost Analysis will show the reservation costs. The token metrics in Azure Monitor are the better way to see actual token volumes in PTU scenarios (since cost won’t fluctuate with usage when on a fixed PTU plan).
In summary, the Azure Portal’s Monitoring -> Metrics section is the primary place to validate token usage numbers. The Metrics dashboard gives a friendly overview, and the Metrics explorer allows detailed queries and breakdowns (e.g., per deployment or model). All these are consistent across billing models – you’ll see token counts in both cases; if you have PTU, you’ll just have some extra metrics like utilization available as well.
Breakdown by Deployment, Model, or API Key
By default, Azure OpenAI’s built-in metrics let you break down usage per deployment and model, but not by individual end user or API key (the service has no visibility into separate callers when they all use the resource’s shared key). Here’s how to achieve the various breakdowns:
- Per Deployment / Model: As noted, metrics can be filtered by the ModelDeploymentName (the name you gave the deployment in Azure) or ModelName (the base model, e.g., “gpt-4” or “gpt-35-turbo”). This means if you have multiple deployments (for example, one deployment of gpt-4 and another of gpt-3.5 in your resource), you can see token usage for each separately (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn). In the Azure Portal metrics interface, you would either apply a filter for a specific deployment or use the “Split” function on the deployment dimension to get a chart with one line per deployment. This is very useful for monitoring which model is consuming how many tokens. It answers questions like “Which of my deployments is driving most of the usage?” directly from the Azure metrics – no external instrumentation needed.
- Per API Key / Consumer: Azure OpenAI doesn’t natively report metrics by API key or caller identity if you’re using the resource access keys directly – all usage with a given resource key aggregates under that resource. If you need to track usage by different users or applications, you have a couple of options:
- Use Azure API Management (APIM): Microsoft recommends using APIM as a front-end to Azure OpenAI if you want to expose it to multiple internal or external consumers with separate credentials. By importing the Azure OpenAI API into APIM, you can issue separate subscription keys to different consumers, and APIM can then emit custom metrics per subscription. In fact, there is a built-in APIM policy called azure-openai-emit-token-metric which records token usage metrics to Application Insights and allows adding dimensions such as the APIM Subscription ID (which maps to an individual API key/consumer) (Azure API Management policy reference - azure-openai-emit-token-metric | Microsoft Learn). This way, you can get a breakdown of tokens used per client: APIM captures the usage from each caller separately and forwards the calls to your Azure OpenAI resource. Azure OpenAI itself still sees only total tokens, but APIM’s metrics or logs attribute which subscription (user) was responsible. This is a recommended approach for multi-tenant scenarios or chargeback models. (Microsoft’s documentation and samples confirm you can include dimensions like Subscription ID or User ID in the token metric policy to achieve per-consumer tracking (Azure API Management policy reference - azure-openai-emit-token-metric | Microsoft Learn).)
- Custom Logging in Application Code: Alternatively, you can parse the usage from each API response (the usage JSON mentioned earlier) and log it, along with an identifier of the user/request, in your own database or analytics tool. This requires more custom work, but it is another way to get per-user token counts if APIM is not used. Each response provides total_tokens, so your application can sum those per user over time.
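The custom-logging approach amounts to a small accumulator keyed by caller. A sketch, assuming you capture each response's usage object alongside a user identifier of your choosing (the entries below are made up):

```python
from collections import defaultdict

# Application-side per-user token accounting: one running total per caller.
tokens_by_user: dict[str, int] = defaultdict(int)

def record_usage(user_id: str, usage: dict) -> None:
    """Accumulate total_tokens from one API response against one user."""
    tokens_by_user[user_id] += usage["total_tokens"]

# Illustrative log entries:
record_usage("alice", {"prompt_tokens": 120, "completion_tokens": 80, "total_tokens": 200})
record_usage("bob",   {"prompt_tokens": 40,  "completion_tokens": 10, "total_tokens": 50})
record_usage("alice", {"prompt_tokens": 60,  "completion_tokens": 40, "total_tokens": 100})

print(dict(tokens_by_user))  # → {'alice': 300, 'bob': 50}
```

In production you would persist these records (database, Application Insights, etc.) rather than keep them in memory, but the aggregation logic is the same.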
It’s worth noting that Azure Monitor’s default metrics do include dimensions called UsageChannel and ApiName, which indicate how the call was made (for example, which API operation or channel, such as ChatCompletion vs. Completion) (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn). But they do not include a caller ID by default. Thus, for a token breakdown by API key or user, you will need to implement a solution like APIM or custom logging. Billing itself (especially for PAYG) is at the resource level, so Azure’s own cost reports won’t split by user; that is something you would build via the above methods if needed for internal accounting.
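For reference, the APIM token-metric policy mentioned above is configured as an inbound policy fragment roughly along these lines. This is a hedged sketch: the element name matches the policy reference, but verify the exact dimension names and attributes against the current azure-openai-emit-token-metric documentation before use.

```xml
<policies>
  <inbound>
    <!-- Emit token usage to Application Insights, split per consumer.
         Dimension names here are assumptions based on the policy docs. -->
    <azure-openai-emit-token-metric namespace="openai-usage">
      <dimension name="Subscription ID" />
      <dimension name="User ID" />
    </azure-openai-emit-token-metric>
  </inbound>
</policies>
```

Each APIM subscription key then shows up as its own dimension value, giving the per-consumer breakdown that Azure OpenAI’s native metrics do not provide.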
Summary
In conclusion, Azure OpenAI tracks token usage uniformly across both Pay-As-You-Go and PTU billing models. All usage is measured in tokens (input and output) and surfaced through Azure Monitor metrics. The billing model only affects how you are charged (per token in PAYG vs. per hour of capacity in PTU) – it does not change the underlying token counting. PAYG and PTU deployments both report into metrics like “Processed Prompt Tokens” and “Generated Completion Tokens” (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn), ensuring consistent usage reporting. PTU deployments simply have additional metrics (like utilization percentage) to help you gauge your usage of the reserved capacity (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn).
To monitor your usage, use the Azure Portal: check the Metrics (or the provided dashboards) for token counts and, if applicable, utilization stats. You can see breakdowns per model deployment easily in the metrics view. For more granular per-user or per-key insights, consider fronting the service with API Management and using its token metrics capability or implement custom logging. All official guidance (Microsoft Learn docs and Azure Portal tools) confirms that token usage is counted the same regardless of PTU vs PAYG – the difference lies only in cost calculation and capacity management (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn) (Azure OpenAI Service provisioned throughput - Azure AI services | Microsoft Learn). By regularly checking the “Tokens-Based Usage” metrics and (for PTU) the “PTU Utilization” metrics in the Azure Portal, you can validate exactly how many tokens are being used and ensure that aligns with your expectations and billing model (Monitor Azure OpenAI Service - Azure AI services | Microsoft Learn).
Sources:
- Microsoft Azure OpenAI Monitoring Reference – showing token metrics apply to both PAYG and PTU deployments (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn) (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn).
- Azure OpenAI PTU Utilization and Throughput – utilization metric definition and usage in Azure Monitor (Monitoring data reference for Azure OpenAI - Azure AI services | Microsoft Learn) (Azure OpenAI Service provisioned throughput - Azure AI services | Microsoft Learn).
- Azure documentation on built-in dashboards for OpenAI (Tokens-Based Usage and PTU Utilization categories) (Monitor Azure OpenAI Service - Azure AI services | Microsoft Learn).
- Microsoft Q&A and Azure API Management docs – confirming PAYG vs PTU headers and methods for tracking usage per subscription (APIM) (Azure OpenAI PTU utilization - Microsoft Q&A) (Azure API Management policy reference - azure-openai-emit-token-metric | Microsoft Learn).