Provenance Tracking
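StatQA attaches provenance metadata to every generated Q/A pair. The metadata records when and how the pair was produced (tool version, generation method, analyzer, variables, statistical tests, and the underlying Python commands), so each result can be traced back to its source and reproduced.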
Metadata Structure
Every Q/A pair carries a provenance record like the following (the # comments are explanatory annotations, not part of the stored data):
{
  "provenance": {
    "generated_at": "2025-11-19T19:21:28+00:00",
    "tool": "statqa",
    "tool_version": "0.2.0",
    "generation_method": "template",  # or "llm_paraphrase"
    "analysis_type": "univariate",  # univariate, bivariate, temporal, or causal
    "analyzer": "UnivariateAnalyzer",
    "variables": ["age"],
    "statistical_tests": ["shapiro_wilk", "jarque_bera"],
    "python_commands": [
      "data['age'].mean()  # Result: 42.5",
      "data['age'].std()  # Result: 12.3"
    ],
    "llm_model": "gpt-4-turbo",  # only if an LLM was used
    "template_id": "distribution_summary"  # only if template-based
  }
}
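The python_commands field is what makes a recorded answer checkable. Below is a minimal sketch of replaying those commands, assuming each entry has the form "<expression> # Result: <number>" as in the example above and that the recorded number is the true value rounded for display. The helpers parse_command and replay_commands are hypothetical illustrations, not part of StatQA's API.

```python
import re

# Assumption: entries look like "<expression> # Result: <plain decimal number>".
_COMMAND_RE = re.compile(
    r"^(?P<expr>.+?)\s*#\s*Result:\s*(?P<value>-?\d+(?:\.\d+)?)\s*$"
)


def parse_command(entry: str) -> tuple[str, float, int]:
    """Split a recorded command into (expression, recorded value, decimal places)."""
    match = _COMMAND_RE.match(entry)
    if match is None:
        raise ValueError(f"unrecognized command entry: {entry!r}")
    value_text = match.group("value")
    decimals = len(value_text.split(".")[1]) if "." in value_text else 0
    return match.group("expr"), float(value_text), decimals


def replay_commands(commands: list[str], namespace: dict) -> list[tuple[str, bool]]:
    """Re-evaluate each command and check it against its recorded result.

    A command passes if the recomputed value is within half a unit of the
    last recorded decimal place, i.e. it rounds to the recorded number.
    eval() is not a sandbox: only replay provenance you trust.
    """
    results = []
    for entry in commands:
        expr, recorded, decimals = parse_command(entry)
        # Names in the expression (e.g. "data") are looked up in `namespace`.
        actual = float(eval(expr, {"__builtins__": {}}, namespace))
        tolerance = 0.5 * 10 ** (-decimals)
        results.append((expr, abs(actual - recorded) <= tolerance + 1e-12))
    return results
```

To replay the example record, you would pass the same dataset under the name the commands use, for instance replay_commands(commands, {"data": df}) with df being the pandas DataFrame that was analyzed. A False entry means the data or the code path has changed since the pair was generated.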
Benefits of Provenance Tracking
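Reproducibility: The exact Python commands, variables, and statistical tests behind each answer are recorded.
Quality Control: The generation method (template or LLM paraphrase), template ID, and LLM model let you compare and filter outputs by how they were produced.
Audit Trails: Each pair records which analyzer and analysis type produced it, and when.
Version Management: The StatQA version and, for LLM-paraphrased pairs, the model name are stored with every pair.
Research Integrity: The methodology behind each Q/A pair is documented alongside the pair itself.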
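For quality control and version management, the provenance records can be aggregated across a dataset. A minimal sketch, assuming (hypothetically) that Q/A pairs are stored one JSON object per line, each with a top-level "provenance" key shaped like the example above; summarize_provenance is an illustrative helper, not a StatQA function.

```python
import json
from collections import Counter


def summarize_provenance(jsonl_lines):
    """Count Q/A pairs per (tool_version, generation_method, llm_model).

    Assumption: one JSON object per line with a "provenance" key; pairs
    generated without an LLM may omit llm_model, which is counted as None.
    """
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        prov = json.loads(line)["provenance"]
        key = (
            prov.get("tool_version", "unknown"),
            prov.get("generation_method", "unknown"),
            prov.get("llm_model"),
        )
        counts[key] += 1
    return counts
```

Grouping by these three fields makes it easy to spot, for example, pairs produced by an older StatQA release or by a different LLM than the rest of the dataset.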