Q/A Generation

Question/Answer pair generation from statistical insights.

Generator

Main Q/A generation engine with template-based and LLM-powered approaches.

Q/A pair generation from statistical insights.

Converts facts into multiple question/answer pairs using: 1. Template-based generation 2. LLM paraphrasing and augmentation

class statqa.qa.generator.QAGenerator(use_llm=False, llm_provider='openai', llm_model=None, api_key=None, paraphrase_count=2)[source]

Bases: object

Generates Q/A pairs from statistical insights.

Parameters:
  • use_llm (bool) – Whether to use LLM for paraphrasing

  • llm_provider (Literal['openai', 'anthropic']) – LLM provider (‘openai’ or ‘anthropic’)

  • llm_model (str | None) – Model name

  • api_key (str | None) – API key for LLM

  • paraphrase_count (int) – Number of paraphrased versions per question

Raises:
  • ImportError – If required LLM package not installed

  • ValueError – If LLM provider configuration is invalid

export_qa_dataset(qa_results, output_format='jsonl')[source]

Export Q/A pairs in format suitable for LLM fine-tuning.

Parameters:
  • qa_results (list[dict[str, Any]]) – Results from generate_batch

  • output_format (str) – ‘jsonl’, ‘openai’, or ‘anthropic’

Return type:

list[str]

Returns:

List of formatted strings (one per line for JSONL)

generate_batch(insights, formatted_answers)[source]

Generate Q/A pairs for multiple insights.

Parameters:
  • insights (list[dict[str, Any]]) – List of statistical insights

  • formatted_answers (list[str]) – Corresponding natural language answers

Return type:

list[dict[str, Any]]

Returns:

List of insight dictionaries with added ‘qa_pairs’ field

generate_exploratory_questions(insight, context=None)[source]

Generate exploratory follow-up questions using LLM.

Parameters:
  • insight (dict[str, Any]) – Statistical insight

  • context (str | None) – Optional dataset/domain context

Return type:

list[str]

Returns:

List of exploratory questions

generate_qa_pairs(insight, formatted_answer, variables=None, visual_data=None)[source]

Generate Q/A pairs from a statistical insight.

Parameters:
  • insight (dict[str, Any]) – Statistical analysis result

  • formatted_answer (str) – Natural language answer

  • variables (list[str] | None) – List of variable names involved in the analysis

  • visual_data (dict[str, Any] | None) – Optional visual metadata to include with Q/A pairs

Returns:

question, answer, type, provenance, visual

Return type:

List of Q/A pair dictionaries with keys

generate_visual_metadata(insight, variables=None, plot_data=None)[source]

Generate visual metadata for a statistical insight.

Parameters:
  • insight (dict[str, Any]) – Statistical analysis result

  • variables (list[str] | None) – List of variable names involved in the analysis

  • plot_data (dict[str, Any] | None) – Optional plot data (data and variable objects)

Return type:

dict[str, Any] | None

Returns:

Visual metadata dictionary or None if no visualization appropriate

Templates

Template-based question generation for different analysis types.

Question templates for Q/A pair generation.

Defines templates for converting facts into question/answer pairs.

class statqa.qa.templates.QuestionTemplate(question_type)[source]

Bases: object

Template for generating questions from statistical insights.

Parameters:

question_type (QuestionType) – Type of question to generate

generate(insight, answer)[source]

Generate question/answer pairs from an insight.

Parameters:
  • insight (dict[str, Any]) – Statistical insight dictionary

  • answer (str) – Formatted natural language answer

Return type:

list[dict[str, str]]

Returns:

List of Q/A pair dictionaries

Raises:

ValueError – If question type is not supported

class statqa.qa.templates.QuestionType(*values)[source]

Bases: str, Enum

Types of questions that can be generated.

CAUSAL = 'causal'
COMPARATIVE = 'comparative'
CORRELATIONAL = 'correlational'
DESCRIPTIVE = 'descriptive'
DISTRIBUTIONAL = 'distributional'
TEMPORAL = 'temporal'
statqa.qa.templates.infer_question_type(insight)[source]

Infer the appropriate question type from an insight.

Parameters:

insight (dict[str, Any]) – Statistical insight dictionary

Return type:

QuestionType

Returns:

Inferred question type