Analysis

Statistical analysis modules for comprehensive data insights.

Univariate Analysis

Single-variable descriptive statistics and distribution analysis.

Univariate statistical analysis.

Performs descriptive statistics for single variables, including:

  • Numeric: mean, median, std, robust statistics, distribution tests

  • Categorical: frequencies, mode, diversity measures

  • Missing: missingness analysis
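To make the "robust statistics" item concrete, here is a minimal standard-library sketch of the median, the median absolute deviation (MAD), and a MAD-based outlier flag. It does not call statqa. The function names `robust_summary` and `flag_outliers` and the 3.5 cutoff are assumptions made for illustration, and the library may compute these quantities differently.

```python
import statistics


def robust_summary(values):
    """Return classical (mean/std) and robust (median/MAD) summaries."""
    med = statistics.median(values)
    # MAD: median of the absolute deviations from the median
    mad = statistics.median([abs(v - med) for v in values])
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
        "median": med,
        "mad": mad,
    }


def flag_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold (hypothetical cutoff)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # All deviations collapse to zero; the modified z-score is undefined
        return []
    # 0.6745 rescales the MAD so it is comparable to a standard deviation
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]


sample = [10, 12, 11, 13, 12, 95]
summary = robust_summary(sample)
outliers = flag_outliers(sample)
```

In this sample the single extreme value pulls the mean to 25.5, while the median stays at 12, which is why robust measures are reported alongside the classical ones.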

class statqa.analysis.univariate.UnivariateAnalyzer(handle_outliers=True, robust=True)[source]

Bases: object

Analyzer for single-variable statistics.

Parameters:
  • handle_outliers (bool) – Whether to detect and report outliers

  • robust (bool) – Whether to include robust statistics (median, MAD)

analyze(data, variable)[source]

Analyze a single variable.

Parameters:
  • data (Series) – Data series

  • variable (Variable) – Variable metadata

Return type:

dict[str, Any]

Returns:

Analysis results as a dictionary (UnivariateResult fields)

batch_analyze(df, variables)[source]

Analyze multiple variables at once.

Parameters:
  • df (DataFrame) – DataFrame with data

  • variables (dict[str, Variable]) – Mapping of column names to Variable metadata

Return type:

list[dict[str, Any]]

Returns:

List of analysis results as dictionaries

Bivariate Analysis

Two-variable relationship analysis including correlations and group comparisons.

Bivariate statistical analysis.

Analyzes relationships between pairs of variables:

  • Numeric x Numeric: Pearson/Spearman correlation, regression

  • Categorical x Categorical: Chi-square, Cramér’s V

  • Categorical x Numeric: Group comparisons, ANOVA
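For the categorical x categorical case, the following sketch shows how Cramér’s V follows from the chi-square statistic of a contingency table. It uses only the standard library and does not call statqa. The function name `cramers_v` is an assumption, and no continuity or bias correction is applied, which may differ from what the library does.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-square: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Normalize by sample size and the smaller table dimension minus one
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


perfect = cramers_v([[10, 0], [0, 10]])      # categories fully determine each other
independent = cramers_v([[5, 5], [5, 5]])    # no association
```

V ranges from 0 (no association) to 1 (perfect association), which makes it easier to compare across tables than the raw chi-square value.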

class statqa.analysis.bivariate.BivariateAnalyzer(significance_level=0.05, min_sample_size=10, use_robust=True)[source]

Bases: object

Analyzer for two-variable relationships.

Parameters:
  • significance_level (float) – Alpha level for statistical tests

  • min_sample_size (int) – Minimum sample size for analysis

  • use_robust (bool) – Use robust methods (Spearman) when appropriate

analyze(data, var1, var2)[source]

Analyze relationship between two variables.

Parameters:
  • data (DataFrame) – DataFrame containing both variables

  • var1 (Variable) – First variable metadata

  • var2 (Variable) – Second variable metadata

Return type:

dict[str, Any] | None

Returns:

Analysis results, or None if analysis not applicable

batch_analyze(df, variables, max_pairs=None)[source]

Analyze multiple variable pairs.

Parameters:
  • df (DataFrame) – DataFrame with data

  • variables (dict[str, Variable]) – Mapping of variable names to metadata

  • max_pairs (int | None) – Maximum number of pairs to analyze (None for all)

Return type:

list[dict[str, Any]]

Returns:

List of analysis results as dictionaries

Temporal Analysis

Time series analysis with trend detection and change point analysis.

Temporal analysis for time series data.

Analyzes trends and patterns over time:

  • Trend detection (Mann-Kendall, linear regression)

  • Seasonal decomposition

  • Change point detection

  • Year-over-year changes
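As an illustration of the Mann-Kendall trend test named above, here is a minimal standard-library sketch. It does not call statqa. The function name `mann_kendall` is an assumption, and the variance formula assumes no tied values, so it is a simplification of what a full implementation would do.

```python
import math


def mann_kendall(values):
    """Return (S, z, two-sided p) for a series ordered in time."""
    n = len(values)

    # S counts later-minus-earlier sign agreements over all ordered pairs
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the null hypothesis, assuming no ties
    var_s = n * (n - 1) * (2 * n + 5) / 18

    # Continuity-corrected z-score
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


s_up, z_up, p_up = mann_kendall([1, 2, 3, 4, 5, 6, 7, 8])
```

A trend would be reported as significant when the p-value falls below the analyzer's `significance_level` (0.05 by default).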

class statqa.analysis.temporal.TemporalAnalyzer(significance_level=0.05, min_periods=3)[source]

Bases: object

Analyzer for temporal patterns and trends.

Parameters:
  • significance_level (float) – Alpha level for statistical tests

  • min_periods (int) – Minimum number of time periods required

analyze_grouped_trend(data, time_var, value_var, group_var)[source]

Analyze trends separately for different groups.

Parameters:
  • data (DataFrame) – DataFrame with time, value, and group columns

  • time_var (Variable) – Time variable

  • value_var (Variable) – Value variable

  • group_var (Variable) – Grouping variable

Return type:

dict[str, Any]

Returns:

Grouped trend analysis results as dictionary

analyze_trend(data, time_var, value_var)[source]

Analyze trend in a variable over time.

Parameters:
  • data (DataFrame) – DataFrame with time and value columns

  • time_var (Variable) – Time variable metadata

  • value_var (Variable) – Value variable being analyzed over time

Return type:

dict[str, Any]

Returns:

Trend analysis results as dictionary

detect_change_points(data, time_var, value_var)[source]

Detect significant change points in time series.

Uses a simple segmentation approach that compares the means before and after each candidate point.

Parameters:
  • data (DataFrame) – DataFrame with time and value columns

  • time_var (Variable) – Time variable

  • value_var (Variable) – Value variable

Return type:

dict[str, Any]

Returns:

Change point detection results as dictionary
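One plausible reading of the before/after comparison used by detect_change_points is to scan candidate split positions and keep the one with the largest absolute gap between segment means. The sketch below follows that reading with the standard library only. The function name `best_mean_shift` and the `min_segment` parameter are assumptions, and the actual method may add a significance test that is omitted here.

```python
import statistics


def best_mean_shift(values, min_segment=2):
    """Return (split_index, before_mean, after_mean) for the largest mean gap.

    Returns None when the series is too short to form two segments.
    """
    best = None
    # Each split leaves at least min_segment points on both sides
    for split in range(min_segment, len(values) - min_segment + 1):
        before = statistics.fmean(values[:split])
        after = statistics.fmean(values[split:])
        gap = abs(after - before)
        if best is None or gap > best[0]:
            best = (gap, split, before, after)
    if best is None:
        return None
    return best[1:]


result = best_mean_shift([1, 1, 1, 1, 5, 5, 5, 5])
```

Here `split_index` is the position of the first observation in the "after" segment.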

year_over_year_change(data, year_var, value_var)[source]

Calculate year-over-year changes.

Parameters:
  • data (DataFrame) – DataFrame with year and value columns

  • year_var (Variable) – Year variable

  • value_var (Variable) – Value variable

Return type:

dict[str, Any]

Returns:

Year-over-year analysis results as dictionary
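The year-over-year calculation can be sketched with plain Python as follows. This does not call statqa. The function name `year_over_year` and the shape of the returned mapping are assumptions, and skipping non-consecutive years is one possible design choice rather than documented behavior.

```python
def year_over_year(yearly_values):
    """Map each year to (absolute change, percent change) versus the prior year.

    yearly_values: dict of year -> value. Years without an immediately
    preceding year in the data are skipped.
    """
    changes = {}
    years = sorted(yearly_values)
    for prev, curr in zip(years, years[1:]):
        if curr - prev != 1:
            # A gap in the series: no valid year-over-year comparison
            continue
        delta = yearly_values[curr] - yearly_values[prev]
        base = yearly_values[prev]
        # Percent change is undefined when the previous value is zero
        pct = delta / base * 100 if base != 0 else None
        changes[curr] = (delta, pct)
    return changes


yoy = year_over_year({2020: 100, 2021: 110, 2022: 99})
```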

Causal Analysis

Causal inference with confounding control and sensitivity analysis.

Causal analysis with confounding control.

Performs regression analysis with control variables to surface associations in causal language:

  • Linear regression with controls

  • Logistic regression for binary outcomes

  • Confounder identification

  • Sensitivity analysis
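To show why adding a control changes the estimated effect, here is a small standard-library sketch of linear regression with a single control, computed by residualizing both treatment and outcome on the control (the Frisch-Waugh-Lovell result). It does not call statqa. The function names and the toy data are assumptions, and a real analysis would use a regression library and report standard errors.

```python
import statistics


def slope_and_residuals(x, y):
    """Least-squares fit of y on x with an intercept; returns (slope, residuals)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, [b - (intercept + slope * a) for a, b in zip(x, y)]


def adjusted_slope(treatment, outcome, control):
    """Coefficient on treatment in outcome ~ treatment + control."""
    # Remove the part of each variable that the control explains
    _, t_resid = slope_and_residuals(control, treatment)
    _, y_resid = slope_and_residuals(control, outcome)
    # Regressing residuals on residuals gives the partial coefficient
    slope, _ = slope_and_residuals(t_resid, y_resid)
    return slope


# Toy data built so that outcome = 2 * treatment + 3 * control exactly
control = [0, 0, 1, 1, 2, 2]
treatment = [0, 1, 1, 2, 2, 3]
outcome = [2 * t + 3 * c for t, c in zip(treatment, control)]

naive, _ = slope_and_residuals(treatment, outcome)
adjusted = adjusted_slope(treatment, outcome, control)
```

In this toy data the unadjusted slope overstates the effect because the control is correlated with both treatment and outcome, while the adjusted slope recovers the coefficient of 2 used to build the data.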

class statqa.analysis.causal.CausalAnalyzer(significance_level=0.05, min_sample_size=30, robust_se=True)[source]

Bases: object

Analyzer for causal relationships with confounding control.

Note: These are observational analyses and do not establish true causation without strong assumptions. Results should be interpreted as associations controlling for measured confounders.

Parameters:
  • significance_level (float) – Alpha level for hypothesis tests

  • min_sample_size (int) – Minimum sample size required

  • robust_se (bool) – Use heteroskedasticity-robust standard errors

analyze_treatment_effect(data, treatment_var, outcome_var, control_vars=None)[source]

Estimate treatment effect on outcome with optional controls.

Parameters:
  • data (DataFrame) – DataFrame with variables

  • treatment_var (Variable) – Treatment/exposure variable

  • outcome_var (Variable) – Outcome variable

  • control_vars (list[Variable] | None) – List of control/confounder variables

Return type:

dict[str, Any]

Returns:

Treatment effect analysis results as dictionary

identify_confounders(data, treatment_var, outcome_var, potential_confounders)[source]

Identify which variables act as confounders.

A confounder must:

  1. Be associated with the treatment

  2. Be associated with the outcome

  3. Not be on the causal path between treatment and outcome
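The first two criteria can be screened from data, as in the standard-library sketch below. The third cannot be tested statistically and needs domain knowledge. This does not call statqa. The function names, the use of Pearson correlation, and the `min_abs_r` threshold of 0.3 are all assumptions made for illustration, and the library may rely on significance tests instead.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_confounders(treatment, outcome, candidates, min_abs_r=0.3):
    """Return names of candidates associated with both treatment and outcome.

    Covers criteria 1 and 2 only; whether a variable lies on the causal
    path (criterion 3) must be judged from subject-matter knowledge.
    """
    flagged = []
    for name, values in candidates.items():
        r_treatment = pearson_r(values, treatment)
        r_outcome = pearson_r(values, outcome)
        if abs(r_treatment) >= min_abs_r and abs(r_outcome) >= min_abs_r:
            flagged.append(name)
    return flagged


treatment = [0, 1, 1, 2, 2, 3]
outcome = [0, 2, 5, 7, 10, 12]
candidates = {
    "age": [0, 0, 1, 1, 2, 2],         # tracks both treatment and outcome
    "noise": [1, -1, -1, 1, 1, -1],    # weakly related to either
}
flagged = screen_confounders(treatment, outcome, candidates)
```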

Parameters:
  • data (DataFrame) – DataFrame with variables

  • treatment_var (Variable) – Treatment variable

  • outcome_var (Variable) – Outcome variable

  • potential_confounders (list[Variable]) – List of potential confounders to test

Return type:

dict[str, Any]

Returns:

Confounder identification results as dictionary