# Data requirements You can start with only three fields: | Field | Required | Meaning | | --- | --- | --- | | timestamp | yes | request arrival time; numeric seconds or a datetime-like column | | input_tokens | yes | request input or prompt tokens | | output_tokens | yes | generated response tokens | The package gets more useful when you also provide these: | Field | Recommended | Why it matters | | --- | --- | --- | | cached_input_tokens | yes | some providers discount cached tokens | | thinking_tokens | yes | reasoning-heavy routes can burn far more reserved capacity | | max_output_tokens | yes | useful for conservative planning and admission-style estimates | | class_name | yes | lets you segment chat, RAG, tool use, reasoning, and other traffic classes | | latency_s | yes | helps fit a baseline-latency model from real telemetry | ## Minimal schema ```csv timestamp,input_tokens,output_tokens 0.0,800,120 0.2,1200,180 0.7,600,95 ``` ## Recommended schema ```csv timestamp,class_name,input_tokens,cached_input_tokens,output_tokens,thinking_tokens,max_output_tokens,latency_s 0.0,chat,800,120,120,0,512,0.74 0.2,rag,1200,350,180,0,768,1.10 0.7,reasoning,600,0,95,210,1024,1.48 ``` ## Example files in this repo - `examples/input/synthetic_request_trace_baseline.csv` - `examples/input/synthetic_request_trace_optimized.csv` Those are fake, but structurally realistic.