Provider Adapters¶

The package is generic in the middle and provider-specific at the edges.

CapacityProfile¶

A provider adapter boils down to a CapacityProfile:

throughput_per_unit
purchase_increment
min_units
input_weight
cached_input_weight
output_weight
thinking_weight
optional long-context overrides

That is enough to turn requests into adjusted work and then into required reserved units.

Vertex AI GSU¶

The package ships built-in Vertex AI profiles based on Google Cloud’s provisioned throughput documentation.

Available Models¶

Model	Throughput per GSU	Output Weight	Long Context
gemini-2.0-flash-001	3,360	4x	No
gemini-2.0-flash-lite-001	6,720	4x	No
gemini-2.5-flash	2,690	9x	Yes (>200k)
gemini-2.5-flash-lite	8,070	4x	No
gemini-2.5-pro	650	8x	Yes (>200k)
gemini-3.1-flash-lite-preview	4,030	6x	No

Token Burndown Rates¶

Vertex AI uses different burndown rates for input vs output tokens:

Input tokens: 1x weight (baseline)
Cached input tokens: 0.1x weight (90% discount)
Output tokens: 4-9x weight depending on model
Thinking tokens: Same as output weight

Long Context Threshold¶

For models with long context support, requests exceeding 200,000 input tokens use elevated weights:

Input: 2x (instead of 1x)
Output: 12x (instead of 8-9x)

Usage¶

import slosizer as slz

profile = slz.vertex_profile("gemini-2.5-flash")

These profiles are text-centric. If you use images, audio, video, or other token classes, add columns and extend the profile before trusting the numbers.

Azure OpenAI PTU¶

Azure PTU support is calibration-first. PTU behavior is highly workload-sensitive, so we don’t ship built-in profiles.

Reference: Azure OpenAI Provisioned Throughput

Key Characteristics¶

Workload-sensitive: Throughput varies significantly based on prompt/completion ratios
Token ratio: For GPT-4.1 and later, 1 output token ≈ 4 input tokens
Calibration required: Use Azure capacity calculator + benchmarks

Calibration Process¶

Use the Azure capacity calculator to estimate baseline throughput
Deploy with your actual workload and measure via Azure Monitor
Refine the profile based on observed throughput

Usage¶

import slosizer as slz

profile = slz.azure_profile(
    "gpt-4.1",
    throughput_per_unit=12000.0,
    input_weight=1.0,
    output_weight=4.0,
    thinking_weight=4.0,
)

Anthropic Claude (Planned)¶

Status: Not Yet Implemented

Anthropic doesn’t offer a provisioned throughput model like Vertex GSU or Azure PTU. Claude uses tier-based rate limits which don’t map cleanly to slosizer’s capacity unit model. A future version may add support for modeling Claude rate limits, but there is currently no built-in anthropic_profile() function.

Reference: Anthropic Rate Limits