Q/A Generation

Question/Answer pair generation from statistical insights.

Generator

Main Q/A generation engine with template-based and LLM-powered approaches.

Q/A pair generation from statistical insights.

Converts facts into multiple question/answer pairs using:

1. Template-based generation
2. LLM paraphrasing and augmentation

class statqa.qa.generator.QAGenerator(use_llm=False, llm_provider='openai', llm_model=None, api_key=None, paraphrase_count=2)

Bases: object

Generates Q/A pairs from statistical insights.

Parameters:

use_llm (bool) – Whether to use LLM for paraphrasing
llm_provider (Literal['openai', 'anthropic']) – LLM provider (‘openai’ or ‘anthropic’)
llm_model (str | None) – Model name
api_key (str | None) – API key for LLM
paraphrase_count (int) – Number of paraphrased versions per question

Raises:

ImportError – If required LLM package not installed
ValueError – If LLM provider configuration is invalid

export_qa_dataset(qa_results, output_format='jsonl')

Export Q/A pairs in format suitable for LLM fine-tuning.

Parameters:

qa_results (list[dict[str, Any]]) – Results from generate_batch
output_format (str) – ‘jsonl’, ‘openai’, or ‘anthropic’

Return type:

list[str]

Returns:

List of formatted strings (one per line for JSONL)
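
The one-string-per-line JSONL shape can be illustrated without the library. The sketch below is not statqa's implementation: it assumes each result carries a 'qa_pairs' list of dictionaries with 'question' and 'answer' keys, as described for generate_batch and generate_qa_pairs. The helper name to_jsonl_lines and the sample record are hypothetical.

```python
import json
from typing import Any


def to_jsonl_lines(qa_results: list[dict[str, Any]]) -> list[str]:
    """Flatten results into one JSON string per Q/A pair (hypothetical helper)."""
    lines: list[str] = []
    for result in qa_results:
        # Assumed layout: each result holds a 'qa_pairs' list of dicts.
        for pair in result.get("qa_pairs", []):
            record = {"question": pair["question"], "answer": pair["answer"]}
            # ensure_ascii=False keeps non-ASCII text readable in the output.
            lines.append(json.dumps(record, ensure_ascii=False))
    return lines


# Made-up sample data, for illustration only.
results = [
    {
        "qa_pairs": [
            {"question": "What is the mean age?", "answer": "The mean age is 41.2."},
        ]
    }
]
lines = to_jsonl_lines(results)
```

Writing the file is then a matter of joining the strings with newlines. Each line parses independently with json.loads.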

generate_batch(insights, formatted_answers)

Generate Q/A pairs for multiple insights.

Parameters:

insights (list[dict[str, Any]]) – List of statistical insights
formatted_answers (list[str]) – Corresponding natural language answers

Return type:

list[dict[str, Any]]

Returns:

List of insight dictionaries with added ‘qa_pairs’ field
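
The pairing of insights with answers can be sketched as follows. This is an illustration of the documented input and output shapes, not the library's code: the function attach_qa_pairs, the make_pairs callback, and the 'variable' key are hypothetical. The length check and the shallow copy are assumptions, since this reference does not say whether the library validates lengths or mutates its inputs.

```python
from typing import Any, Callable

PairMaker = Callable[[dict[str, Any], str], list[dict[str, Any]]]


def attach_qa_pairs(
    insights: list[dict[str, Any]],
    formatted_answers: list[str],
    make_pairs: PairMaker,
) -> list[dict[str, Any]]:
    """Return copies of the insights with a 'qa_pairs' field added."""
    # Assumption: the two lists are positionally aligned and equal in length.
    if len(insights) != len(formatted_answers):
        raise ValueError("insights and formatted_answers must have equal length")
    results = []
    for insight, answer in zip(insights, formatted_answers):
        enriched = dict(insight)  # shallow copy, leaves the input untouched
        enriched["qa_pairs"] = make_pairs(insight, answer)
        results.append(enriched)
    return results


def stub_pairs(insight: dict[str, Any], answer: str) -> list[dict[str, Any]]:
    # Stand-in for real question generation.
    question = f"What does the analysis of {insight['variable']} show?"
    return [{"question": question, "answer": answer}]


insights = [{"variable": "age"}, {"variable": "income"}]
answers = ["Age is roughly symmetric.", "Income is right-skewed."]
batch = attach_qa_pairs(insights, answers, stub_pairs)
```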

generate_exploratory_questions(insight, context=None)

Generate exploratory follow-up questions using LLM.

Parameters:

insight (dict[str, Any]) – Statistical insight
context (str | None) – Optional dataset/domain context

Return type:

list[str]

Returns:

List of exploratory questions

generate_qa_pairs(insight, formatted_answer, variables=None, visual_data=None)

Generate Q/A pairs from a statistical insight.

Parameters:

insight (dict[str, Any]) – Statistical analysis result
formatted_answer (str) – Natural language answer
variables (list[str] | None) – List of variable names involved in the analysis
visual_data (dict[str, Any] | None) – Optional visual metadata to include with Q/A pairs

Returns:

List of Q/A pair dictionaries with keys: question, answer, type, provenance, visual
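
Code that consumes these dictionaries may want to check for the five keys before relying on them. The helper below is a hypothetical sketch, not part of statqa. It assumes every key is always present, which this reference does not confirm (for example, 'visual' might be absent when no visual data is supplied). The sample values are placeholders.

```python
from typing import Any

# Keys named in this reference for each Q/A pair dictionary.
EXPECTED_KEYS = frozenset({"question", "answer", "type", "provenance", "visual"})


def missing_keys(pair: dict[str, Any]) -> set[str]:
    """Return the expected keys that are absent from a Q/A pair."""
    return set(EXPECTED_KEYS - set(pair))


# Placeholder values: the real structure of 'provenance' and 'visual'
# is not described in this reference.
pair = {
    "question": "What is the mean age?",
    "answer": "The mean age is 41.2.",
    "type": "descriptive",
    "provenance": {},
    "visual": None,
}
absent = missing_keys(pair)
```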

generate_visual_metadata(insight, variables=None, plot_data=None)

Generate visual metadata for a statistical insight.

Parameters:

insight (dict[str, Any]) – Statistical analysis result
variables (list[str] | None) – List of variable names involved in the analysis
plot_data (dict[str, Any] | None) – Optional plot data (data and variable objects)

Return type:

dict[str, Any] | None

Returns:

Visual metadata dictionary or None if no visualization appropriate

Templates

Template-based question generation for different analysis types.

Question templates for Q/A pair generation.

Defines templates for converting facts into question/answer pairs.

class statqa.qa.templates.QuestionTemplate(question_type)

Bases: object

Template for generating questions from statistical insights.

Parameters:

question_type (QuestionType) – Type of question to generate

generate(insight, answer)

Generate question/answer pairs from an insight.

Parameters:

insight (dict[str, Any]) – Statistical insight dictionary
answer (str) – Formatted natural language answer

Return type:

list[dict[str, str]]

Returns:

List of Q/A pair dictionaries

Raises:

ValueError – If question type is not supported
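
As a rough picture of template-based generation, the sketch below fills question templates from an insight and pairs each question with the same answer. Everything in it is hypothetical: the template wording, the 'variable', 'var1', and 'var2' keys, and the fill_templates function are assumptions for illustration, since the library's actual templates are not shown in this reference. It mirrors the documented return type and the ValueError for unsupported types.

```python
from typing import Any

# Hypothetical templates keyed by question type value.
TEMPLATES: dict[str, list[str]] = {
    "descriptive": [
        "What is the typical value of {variable}?",
        "How would you summarize {variable}?",
    ],
    "correlational": [
        "How are {var1} and {var2} related?",
    ],
}


def fill_templates(
    question_type: str, insight: dict[str, Any], answer: str
) -> list[dict[str, str]]:
    """Build one Q/A pair per template for the given question type."""
    if question_type not in TEMPLATES:
        raise ValueError(f"Unsupported question type: {question_type}")
    pairs = []
    for template in TEMPLATES[question_type]:
        # str.format ignores extra keyword arguments, so unused insight
        # fields are harmless; a missing field raises KeyError.
        pairs.append({"question": template.format(**insight), "answer": answer})
    return pairs


qa = fill_templates(
    "descriptive",
    {"variable": "income", "n": 500},
    "Income is right-skewed.",
)
```

Several templates per type yield several phrasings of the same fact, which is the "multiple question/answer pairs" behavior described for this module.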

class statqa.qa.templates.QuestionType(*values)

Bases: str, Enum

Types of questions that can be generated.

CAUSAL = 'causal'

COMPARATIVE = 'comparative'

CORRELATIONAL = 'correlational'

DESCRIPTIVE = 'descriptive'

DISTRIBUTIONAL = 'distributional'

TEMPORAL = 'temporal'
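
Because the enum derives from both str and Enum, its members behave like plain strings in comparisons. The sketch below uses a local mirror of the documented members so that it runs without statqa installed. With the library available, the class would be imported from statqa.qa.templates instead.

```python
from enum import Enum


class QuestionType(str, Enum):
    """Local mirror of the documented members, for illustration only."""

    CAUSAL = "causal"
    COMPARATIVE = "comparative"
    CORRELATIONAL = "correlational"
    DESCRIPTIVE = "descriptive"
    DISTRIBUTIONAL = "distributional"
    TEMPORAL = "temporal"


# Lookup by value returns the corresponding member.
kind = QuestionType("temporal")

# The str mixin makes a member compare equal to its plain string value.
is_plain_match = kind == "temporal"

# .value yields plain strings, in definition order.
all_values = [member.value for member in QuestionType]
```

An unknown value such as QuestionType("other") raises ValueError, which is standard Enum behavior.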

statqa.qa.templates.infer_question_type(insight)

Infer the appropriate question type from an insight.

Parameters:

insight (dict[str, Any]) – Statistical insight dictionary

Return type:

QuestionType

Returns:

Inferred question type