Analysis

Statistical analysis modules for univariate, bivariate, temporal, and causal analysis.

Univariate Analysis

Single variable descriptive statistics and distribution analysis.

Univariate statistical analysis.

Computes descriptive statistics for single variables:

- Numeric: mean, median, standard deviation, robust statistics, distribution tests
- Categorical: frequencies, mode, diversity measures
- Missing values: missingness analysis
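
As a rough illustration of the categorical and missing-value summaries, the standard-library sketch below computes frequencies, the mode, a diversity measure, and missingness. It is not statqa's implementation: the helper name `summarize_categorical`, the convention that `None` marks a missing value, and the choice of Shannon entropy as the diversity measure are all assumptions.

```python
import math
from collections import Counter


def summarize_categorical(values):
    """Hypothetical helper: frequencies, mode, diversity, and missingness.

    Assumes missing values are represented as None.
    """
    observed = [v for v in values if v is not None]
    n_total = len(values)
    n_missing = n_total - len(observed)
    counts = Counter(observed)
    n = len(observed)
    # Shannon entropy (natural log) as one possible diversity measure.
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values()) if n else 0.0
    return {
        "frequencies": dict(counts),
        "mode": counts.most_common(1)[0][0] if counts else None,
        "n_unique": len(counts),
        "entropy": entropy,
        "n_missing": n_missing,
        "missing_rate": n_missing / n_total if n_total else 0.0,
    }


summary = summarize_categorical(["a", "b", "a", None, "c", "a"])
print(summary["mode"], summary["n_missing"])  # a 1
```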

class statqa.analysis.univariate.UnivariateAnalyzer(handle_outliers=True, robust=True)

Bases: object

Analyzer for single-variable statistics.

Parameters:

handle_outliers (bool) – Whether to detect and report outliers
robust (bool) – Whether to include robust statistics (median, MAD)
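
The `robust` and `handle_outliers` flags suggest summaries along the lines of the sketch below. The helper `summarize_numeric` is hypothetical, and its specific choices (the 1.4826 MAD scale factor and Tukey's 1.5 × IQR outlier fences) are common conventions assumed here; this documentation does not say which outlier rule `UnivariateAnalyzer` uses.

```python
import statistics


def summarize_numeric(values, handle_outliers=True, robust=True):
    """Hypothetical sketch of a numeric summary; None marks a missing value."""
    x = [v for v in values if v is not None]
    result = {
        "n": len(x),
        "mean": statistics.mean(x),
        "median": statistics.median(x),
        "std": statistics.stdev(x),  # sample standard deviation (n - 1 denominator)
    }
    if robust:
        med = result["median"]
        # Median absolute deviation from the median.
        result["mad"] = statistics.median(abs(v - med) for v in x)
        # Scaling by 1.4826 makes MAD estimate sigma under normality.
        result["mad_scaled"] = 1.4826 * result["mad"]
    if handle_outliers:
        # Tukey's fences: flag points more than 1.5 IQR beyond the quartiles.
        q1, _, q3 = statistics.quantiles(x, n=4)
        iqr = q3 - q1
        low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        result["outliers"] = [v for v in x if v < low or v > high]
    return result


print(summarize_numeric([10, 12, 11, 13, 12, 11, 95])["outliers"])  # [95]
```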

analyze(data, variable)

Analyze a single variable.

Parameters:

data (Series) – Data series
variable (Variable) – Variable metadata

Return type:

dict[str, Any]

Returns:

Analysis results as a dictionary (a UnivariateResult)

batch_analyze(df, variables)

Analyze multiple variables at once.

Parameters:

df (DataFrame) – DataFrame with data
variables (dict[str, Variable]) – Mapping of column names to Variable metadata

Return type:

list[dict[str, Any]]

Returns:

List of analysis results as dictionaries

Bivariate Analysis

Two-variable relationship analysis including correlations and group comparisons.

Bivariate statistical analysis.

Analyzes relationships between pairs of variables:

- Numeric × Numeric: Pearson/Spearman correlation, regression
- Categorical × Categorical: chi-square test, Cramér’s V
- Categorical × Numeric: group comparisons, ANOVA
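
The statistics behind the first two pairings can be sketched with the standard library alone. These are the textbook definitions of Pearson's r, Spearman's rho (Pearson's r computed on average ranks), and Cramér's V (derived from the chi-square statistic of the contingency table); the function names are hypothetical, and statqa may compute these differently, for example through SciPy and with p-values.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson's r from centered cross-products."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    # Spearman's rho is Pearson's r applied to the ranks.
    return pearson(ranks(x), ranks(y))


def cramers_v(a, b):
    """Cramér's V; needs at least two categories in each variable."""
    n = len(a)
    joint = Counter(zip(a, b))
    row, col = Counter(a), Counter(b)
    chi2 = 0.0
    for r_key, r_n in row.items():
        for c_key, c_n in col.items():
            expected = r_n * c_n / n
            chi2 += (joint[(r_key, c_key)] - expected) ** 2 / expected
    k = min(len(row), len(col))
    return math.sqrt(chi2 / (n * (k - 1)))


x, y = [1, 2, 3, 4, 5], [2, 4, 6, 8, 100]
print(pearson(x, y))   # below 1: the outlier breaks linearity
print(spearman(x, y))  # 1.0: the relationship is perfectly monotonic
print(cramers_v(["m", "m", "f", "f"], ["y", "y", "n", "n"]))  # 1.0
```

Spearman is the natural "robust" choice here because ranking discards the magnitude of the outlying 100 while keeping the ordering.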

class statqa.analysis.bivariate.BivariateAnalyzer(significance_level=0.05, min_sample_size=10, use_robust=True)

Bases: object

Analyzer for two-variable relationships.

Parameters:

significance_level (float) – Alpha level for statistical tests
min_sample_size (int) – Minimum sample size for analysis
use_robust (bool) – Whether to use robust methods (Spearman correlation) when appropriate

analyze(data, var1, var2)

Analyze relationship between two variables.

Parameters:

data (DataFrame) – DataFrame containing both variables
var1 (Variable) – First variable metadata
var2 (Variable) – Second variable metadata

Return type:

dict[str, Any] | None

Returns:

Analysis results, or None if the analysis is not applicable

batch_analyze(df, variables, max_pairs=None)

Analyze multiple variable pairs.

Parameters:

df (DataFrame) – DataFrame with data
variables (dict[str, Variable]) – Mapping of variable names to metadata
max_pairs (int | None) – Maximum number of pairs to analyze (None for all)

Return type:

list[dict[str, Any]]

Returns:

List of analysis results as dictionaries
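
Because k variables yield k(k - 1)/2 unordered pairs, the number of tests grows quadratically with the number of variables, which is what `max_pairs` bounds. The sketch below enumerates pairs with `itertools.combinations` and keeps the first `max_pairs`; the helper name, the variable names, and that truncation order are assumptions, since this documentation does not say which pairs are kept when the cap applies.

```python
from itertools import combinations, islice


def candidate_pairs(variable_names, max_pairs=None):
    """Hypothetical sketch: unordered variable pairs, optionally capped."""
    pairs = combinations(variable_names, 2)
    if max_pairs is not None:
        pairs = islice(pairs, max_pairs)  # keep only the first max_pairs pairs
    return list(pairs)


names = ["age", "income", "region", "score"]
print(len(candidate_pairs(names)))          # 6 = 4 * 3 / 2
print(candidate_pairs(names, max_pairs=2))  # [('age', 'income'), ('age', 'region')]
```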

Temporal Analysis

Time series analysis with trend detection and change point analysis.

Temporal analysis for time series data.

Analyzes trends and patterns over time:

- Trend detection (Mann-Kendall, linear regression)
- Seasonal decomposition
- Change point detection
- Year-over-year changes
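
The Mann-Kendall test counts, over all pairs of time points, how often later values exceed earlier ones, and compares that count with what a trendless series would produce. A minimal sketch follows; it uses the normal approximation with a continuity correction and omits the tie correction to the variance, simplifications assumed here rather than taken from statqa. The name `mann_kendall` is hypothetical.

```python
import math


def mann_kendall(values):
    """Mann-Kendall trend test (normal approximation, no tie correction)."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of later-minus-earlier
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected z statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return {"s": s, "z": z, "p_value": p_value}


print(mann_kendall(list(range(1, 11))))
```

A trend would be reported when `p_value` falls below `significance_level`.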

class statqa.analysis.temporal.TemporalAnalyzer(significance_level=0.05, min_periods=3)

Bases: object

Analyzer for temporal patterns and trends.

Parameters:

significance_level (float) – Alpha level for statistical tests
min_periods (int) – Minimum number of time periods required

analyze_grouped_trend(data, time_var, value_var, group_var)

Analyze trends separately for different groups.

Parameters:

data (DataFrame) – DataFrame with time, value, and group columns
time_var (Variable) – Time variable
value_var (Variable) – Value variable
group_var (Variable) – Grouping variable

Return type:

dict[str, Any]

Returns:

Grouped trend analysis results as a dictionary

analyze_trend(data, time_var, value_var)

Analyze trend in a variable over time.

Parameters:

data (DataFrame) – DataFrame with time and value columns
time_var (Variable) – Time variable metadata
value_var (Variable) – Value variable being analyzed over time

Return type:

dict[str, Any]

Returns:

Trend analysis results as a dictionary
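
For the linear-regression side of trend detection, the trend is the ordinary least-squares slope of the value on time. The sketch below assumes, for illustration only, that `min_periods` counts distinct time points and that a series with too few of them yields `None`; `linear_trend` is a hypothetical helper, not part of statqa.

```python
def linear_trend(times, values, min_periods=3):
    """OLS slope of value on time; None when there are too few periods."""
    n_periods = len(set(times))
    if n_periods < min_periods:
        return None
    n = len(times)
    mt, mv = sum(times) / n, sum(values) / n
    stt = sum((t - mt) ** 2 for t in times)
    stv = sum((t - mt) * (v - mv) for t, v in zip(times, values))
    slope = stv / stt  # change in value per unit of time
    return {"slope": slope, "intercept": mv - slope * mt, "n_periods": n_periods}


print(linear_trend([2019, 2020, 2021, 2022], [10, 12, 14, 16]))
```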

detect_change_points(data, time_var, value_var)

Detect significant change points in time series.

Uses a simple segmentation approach that compares the means before and after candidate change points.

Parameters:

data (DataFrame) – DataFrame with time and value columns
time_var (Variable) – Time variable
value_var (Variable) – Value variable

Return type:

dict[str, Any]

Returns:

Change point detection results as a dictionary
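
One way to compare before/after means is to try every split point and keep the one whose two segments differ most. The sketch below scores each split with Welch's t statistic and requires at least `min_segment` observations per side. Both the scoring rule and the helper `best_split` are assumptions; the sketch looks for a single change point and does not report significance.

```python
import statistics


def best_split(values, min_segment=2):
    """Hypothetical sketch: the split index with the strongest mean shift.

    Compares values[:k] with values[k:] for every admissible k and scores
    each split with Welch's t statistic. Returns None if no split fits.
    """
    best = None
    for k in range(min_segment, len(values) - min_segment + 1):
        before, after = values[:k], values[k:]
        m1, m2 = statistics.mean(before), statistics.mean(after)
        v1, v2 = statistics.variance(before), statistics.variance(after)
        se = (v1 / len(before) + v2 / len(after)) ** 0.5
        if se > 0:
            score = abs(m2 - m1) / se
        else:
            # Both segments are constant: any difference is a perfect split.
            score = float("inf") if m1 != m2 else 0.0
        if best is None or score > best["score"]:
            best = {"index": k, "mean_before": m1, "mean_after": m2, "score": score}
    return best


series = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
print(best_split(series)["index"])  # 4: the level shifts after the fourth value
```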

year_over_year_change(data, year_var, value_var)

Calculate year-over-year changes.

Parameters:

data (DataFrame) – DataFrame with year and value columns
year_var (Variable) – Year variable
value_var (Variable) – Value variable

Return type:

dict[str, Any]

Returns:

Year-over-year analysis results as a dictionary
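
Year-over-year change is the difference between consecutive years, usually also expressed as a percentage of the earlier year's value. In the sketch below, the dict-of-years input, the helper name `year_over_year`, and returning `None` when the base value is zero are assumptions; gaps between years are compared as-is rather than filled.

```python
def year_over_year(values_by_year):
    """Absolute and percent change between consecutive years in the data."""
    years = sorted(values_by_year)
    changes = []
    for prev, curr in zip(years, years[1:]):
        old, new = values_by_year[prev], values_by_year[curr]
        changes.append({
            "from": prev,
            "to": curr,
            "change": new - old,
            # Percent change is undefined when the base year's value is zero.
            "pct_change": (new - old) / old * 100 if old != 0 else None,
        })
    return changes


for row in year_over_year({2020: 100, 2021: 110, 2022: 99}):
    print(row["from"], row["to"], row["change"], row["pct_change"])
```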

Causal Analysis

Causal inference with confounding control and sensitivity analysis.

Causal analysis with confounding control.

Performs regression analysis with control variables, reporting associations in causal terms:

- Linear regression with controls
- Logistic regression for binary outcomes
- Confounder identification
- Sensitivity analysis

class statqa.analysis.causal.CausalAnalyzer(significance_level=0.05, min_sample_size=30, robust_se=True)

Bases: object

Analyzer for causal relationships with confounding control.

Note: These are observational analyses and do not establish true causation without strong assumptions. Results should be interpreted as associations controlling for measured confounders.

Parameters:

significance_level (float) – Alpha level for hypothesis tests
min_sample_size (int) – Minimum sample size required
robust_se (bool) – Use heteroskedasticity-robust standard errors
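
Heteroskedasticity-robust standard errors use a sandwich estimator, which lets each observation contribute its own squared residual rather than assuming a single shared error variance. For a single regressor, both the classical and the HC0 (White) estimators have closed forms, sketched below. Which HC variant statqa uses (HC0 through HC3 are common) is not stated in this documentation, and `slope_with_se` is a hypothetical helper.

```python
import math


def slope_with_se(x, y):
    """OLS slope of y on x with classical and HC0 (White) standard errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    # Classical SE: one pooled error variance for all observations.
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)
    # HC0: each squared residual is weighted by its (x_i - mean)^2.
    se_hc0 = math.sqrt(sum((a - mx) ** 2 * e ** 2 for a, e in zip(x, resid))) / sxx
    return {"slope": slope, "se_classical": se_classical, "se_hc0": se_hc0}


print(slope_with_se([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]))
```

The slope itself is identical under both options; only its uncertainty, and therefore the significance test, changes.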

analyze_treatment_effect(data, treatment_var, outcome_var, control_vars=None)

Estimate treatment effect on outcome with optional controls.

Parameters:

data (DataFrame) – DataFrame with variables
treatment_var (Variable) – Treatment/exposure variable
outcome_var (Variable) – Outcome variable
control_vars (list[Variable] | None) – List of control/confounder variables

Return type:

dict[str, Any]

Returns:

Treatment effect analysis results as a dictionary
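
Controlling for confounders amounts to adding them as regressors alongside the treatment. The sketch below solves ordinary least squares through the normal equations in pure Python and compares the crude and adjusted treatment coefficients on simulated data where the true effect is 2. The helper `ols` and the simulation are illustrative assumptions; a production implementation would normally use a QR-based solver from a numerical library.

```python
import random


def ols(columns, y):
    """Hypothetical helper: OLS coefficients [intercept, b1, ..., bk].

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    Assumes the regressors are not perfectly collinear.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(p)
    ]
    for c in range(p):
        # Partial pivoting: move the largest remaining entry in column c up.
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Simulated data: conf drives both treatment and outcome; the true effect is 2.
rng = random.Random(0)
n = 500
conf = [rng.gauss(0, 1) for _ in range(n)]
treat = [c + rng.gauss(0, 1) for c in conf]
outcome = [2 * t + 3 * c + rng.gauss(0, 1) for t, c in zip(treat, conf)]

naive = ols([treat], outcome)           # omits conf, so the estimate is biased
adjusted = ols([treat, conf], outcome)  # controls for conf
print(round(naive[1], 2), round(adjusted[1], 2))
```

Under this simulation the crude slope converges to 3.5 rather than 2, because part of the confounder's effect is attributed to the treatment; adding `conf` as a control removes that bias.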

identify_confounders(data, treatment_var, outcome_var, potential_confounders)

Identify which variables act as confounders.

A confounder must:

1. Be associated with the treatment
2. Be associated with the outcome
3. Not lie on the causal path between treatment and outcome

Parameters:

data (DataFrame) – DataFrame with variables
treatment_var (Variable) – Treatment variable
outcome_var (Variable) – Outcome variable
potential_confounders (list[Variable]) – List of potential confounders to test

Return type:

dict[str, Any]

Returns:

Confounder identification results as a dictionary
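
Criteria 1 and 2 can be screened from data, but criterion 3 cannot: a mediator on the causal path is also associated with both treatment and outcome, so ruling it out takes subject-matter knowledge. The sketch below pairs the two association checks with the common 10% change-in-estimate heuristic. The thresholds, the heuristic itself, and every name in the sketch are assumptions, not statqa's documented behavior.

```python
import math
import random


def _centered(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def corr(x, y):
    xc, yc = _centered(x), _centered(y)
    return sum(a * b for a, b in zip(xc, yc)) / math.sqrt(
        sum(a * a for a in xc) * sum(b * b for b in yc))


def treatment_coef(t, y, c=None):
    """Slope on t from y ~ t, or from y ~ t + c, via centered sums."""
    tc, yc = _centered(t), _centered(y)
    s_tt = sum(a * a for a in tc)
    s_ty = sum(a * b for a, b in zip(tc, yc))
    if c is None:
        return s_ty / s_tt
    cc = _centered(c)
    s_cc = sum(a * a for a in cc)
    s_tc = sum(a * b for a, b in zip(tc, cc))
    s_cy = sum(a * b for a, b in zip(cc, yc))
    # Two-regressor solution of the normal equations (Cramer's rule).
    return (s_ty * s_cc - s_cy * s_tc) / (s_tt * s_cc - s_tc ** 2)


def screen_confounders(t, y, candidates, min_abs_corr=0.1, min_change=0.10):
    """Flag candidates associated with both t and y whose adjustment
    shifts the treatment coefficient by at least min_change (relative)."""
    crude = treatment_coef(t, y)
    flagged = {}
    for name, c in candidates.items():
        assoc_t, assoc_y = abs(corr(c, t)), abs(corr(c, y))
        change = abs(treatment_coef(t, y, c) - crude) / abs(crude)
        flagged[name] = (assoc_t >= min_abs_corr and assoc_y >= min_abs_corr
                         and change >= min_change)
    return flagged


rng = random.Random(1)
n = 500
conf = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]  # unrelated to treatment and outcome
treat = [c + rng.gauss(0, 1) for c in conf]
outcome = [2 * t + 3 * c + rng.gauss(0, 1) for t, c in zip(treat, conf)]
print(screen_confounders(treat, outcome, {"conf": conf, "noise": noise}))
```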