Analysis
Statistical analysis modules for comprehensive data insights.
Univariate Analysis
Single variable descriptive statistics and distribution analysis.
Univariate statistical analysis.
Performs descriptive statistics for single variables including:
- Numeric: mean, median, std, robust statistics, distribution tests
- Categorical: frequencies, mode, diversity measures
- Missing: missingness analysis
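The kinds of summaries listed above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the statqa implementation: the function names `summarize_numeric` and `summarize_categorical` are hypothetical, missing values are assumed to be encoded as `None`, and Shannon entropy is only one possible diversity measure.

```python
import math
import statistics
from collections import Counter


def summarize_numeric(values):
    """Descriptive and robust statistics for a numeric sample (None = missing)."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    # Median absolute deviation: a spread estimate that resists outliers.
    mad = statistics.median(abs(v - median) for v in observed)
    return {
        "n": len(observed),
        "missing": len(values) - len(observed),
        "mean": statistics.fmean(observed),
        "median": median,
        "std": statistics.stdev(observed),  # sample standard deviation
        "mad": mad,
    }


def summarize_categorical(values):
    """Frequencies, mode, and Shannon entropy for a categorical sample."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    total = len(observed)
    # Entropy in bits; higher values mean the categories are more evenly used.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {
        "frequencies": dict(counts),
        "mode": counts.most_common(1)[0][0],
        "entropy_bits": entropy,
        "missing": len(values) - total,
    }
```

With a sample such as `[1.0, 2.0, 3.0, 4.0, 100.0]`, the mean is pulled to 22.0 by the single large value while the median stays at 3.0, which is the motivation for reporting robust statistics alongside the classical ones.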
class statqa.analysis.univariate.UnivariateAnalyzer(handle_outliers=True, robust=True)
Bases: object
Analyzer for single-variable statistics.
- Parameters:
handle_outliers (bool) – Whether to detect and report outliers
robust (bool) – Whether to include robust statistics (median, MAD)
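To make the two options concrete, here is a minimal sketch of one common way to flag outliers with robust statistics: the modified z-score built from the median and the MAD. The function name `flag_outliers` and the 3.5 cutoff are assumptions chosen for illustration; the documentation above does not state which rule the analyzer applies.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        # More than half the values equal the median; the score is undefined.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the data are approximately normal.
    return [v for v in values if abs(0.6745 * (v - median) / mad) > threshold]


sample = [10, 11, 12, 13, 14, 95]
print(flag_outliers(sample))  # [95]
```

Because both the center and the spread are medians, one extreme value cannot inflate the scale and hide itself, which can happen with a mean and standard deviation rule.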
analyze(data, variable)
Analyze a single variable.
- Parameters:
-
- Return type:
dict[str, Any]
- Returns:
Analysis results (UnivariateResult) as a dictionary
batch_analyze(df, variables)
Analyze multiple variables at once.
- Parameters:
-
- Return type:
list[dict[str, Any]]
- Returns:
List of analysis results as dictionaries
Bivariate Analysis
Two-variable relationship analysis including correlations and group comparisons.
Bivariate statistical analysis.
Analyzes relationships between pairs of variables:
- Numeric x Numeric: Pearson/Spearman correlation, regression
- Categorical x Categorical: Chi-square, Cramér’s V
- Categorical x Numeric: Group comparisons, ANOVA
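For the numeric x numeric case, the difference between the two correlation coefficients can be sketched as follows. This is a standard-library illustration, not the statqa code path; the helper names `pearson`, `average_ranks`, and `spearman` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For a monotonic but nonlinear relationship such as `y = x ** 2` on positive `x`, `spearman` gives 1 (up to floating-point error) while `pearson` stays below 1, which is why a rank-based coefficient is the usual robust choice.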
class statqa.analysis.bivariate.BivariateAnalyzer(significance_level=0.05, min_sample_size=10, use_robust=True)
Bases: object
Analyzer for two-variable relationships.
- Parameters:
significance_level (float) – Alpha level for statistical tests
min_sample_size (int) – Minimum sample size for analysis
use_robust (bool) – Use robust methods (Spearman) when appropriate
analyze(data, var1, var2)
Analyze relationship between two variables.
- Parameters:
data (DataFrame) – DataFrame containing both variables
var1 (Variable) – First variable metadata
var2 (Variable) – Second variable metadata
- Return type:
dict[str, Any] | None
- Returns:
Analysis results, or None if analysis not applicable
batch_analyze(df, variables, max_pairs=None)
Analyze multiple variable pairs.
- Parameters:
df (DataFrame) – DataFrame with data
variables (dict[str, Variable]) – Mapping of variable names to metadata
max_pairs (int | None) – Maximum number of pairs to analyze (None for all)
- Return type:
list[dict[str, Any]]
- Returns:
List of analysis results as dictionaries
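The role of a cap such as `max_pairs` is easier to see once the pair count is written out: n variables yield n(n-1)/2 unordered pairs. The sketch below shows one plausible way to enumerate and truncate them. The helper `variable_pairs` is hypothetical, and the order in which statqa visits or prioritizes pairs is not documented here.

```python
from itertools import combinations, islice


def variable_pairs(names, max_pairs=None):
    """List unordered pairs of names, stopping after max_pairs (None = all)."""
    # islice with a stop of None consumes the whole iterator.
    return list(islice(combinations(names, 2), max_pairs))


names = ["age", "income", "region", "year"]
print(len(variable_pairs(names)))           # 6
print(variable_pairs(names, max_pairs=2))   # [('age', 'income'), ('age', 'region')]
```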
Temporal Analysis
Time series analysis with trend detection and change point analysis.
Temporal analysis for time series data.
Analyzes trends and patterns over time:
- Trend detection (Mann-Kendall, linear regression)
- Seasonal decomposition
- Change point detection
- Year-over-year changes
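As an illustration of the first item, the Mann-Kendall test can be sketched in a few lines. This version assumes the values are already ordered in time and contain no ties (the variance formula omits the tie correction), and it uses the normal approximation, which is usually considered reasonable from about ten observations. The function name `mann_kendall` is hypothetical and this is not the statqa implementation.

```python
import math
from statistics import NormalDist


def mann_kendall(values):
    """Mann-Kendall trend test for a time-ordered sequence without ties."""
    n = len(values)
    # S = (number of increasing pairs) - (number of decreasing pairs), i < j.
    s = sum(
        (values[j] > values[i]) - (values[j] < values[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    # Variance of S under the null hypothesis of no trend (no-ties form).
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction moves S one unit toward zero.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return {"s": s, "z": z, "p_value": p_value}
```

The test only uses the sign of each pairwise difference, so it detects monotonic trends without assuming linearity or normally distributed residuals. A positive `s` with a small `p_value` indicates an upward trend.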
class statqa.analysis.temporal.TemporalAnalyzer(significance_level=0.05, min_periods=3)
Bases: object
Analyzer for temporal patterns and trends.
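- Parameters:
significance_level (float) – Alpha level for statistical tests
min_periods (int) – Minimum number of time periods required for analysis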
analyze_grouped_trend(data, time_var, value_var, group_var)
Analyze trends separately for different groups.
- Parameters:
data (DataFrame) – DataFrame with time, value, and group columns
time_var (Variable) – Time variable
value_var (Variable) – Value variable
group_var (Variable) – Grouping variable
- Return type:
dict[str, Any]
- Returns:
Grouped trend analysis results as dictionary
analyze_trend(data, time_var, value_var)
Analyze trend in a variable over time.
- Parameters:
data (DataFrame) – DataFrame with time and value columns
time_var (Variable) – Time variable metadata
value_var (Variable) – Value variable being analyzed over time
- Return type:
dict[str, Any]
- Returns:
Trend analysis results as dictionary
detect_change_points(data, time_var, value_var)
Detect significant change points in time series.
Uses a simple segmentation approach that compares the means before and after a candidate split.
- Parameters:
data (DataFrame) – DataFrame with time and value columns
time_var (Variable) – Time variable
value_var (Variable) – Value variable
- Return type:
dict[str, Any]
- Returns:
Change point detection results as dictionary
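A before/after comparison of means can be sketched as a scan over candidate split points. The details below are assumptions made for illustration: the function name `best_split`, the `min_segment` guard, and the Welch-style standardized score are not taken from the library, which may score or test candidate splits differently.

```python
import math
import statistics


def best_split(values, min_segment=3):
    """Return the split with the largest standardized difference in means.

    min_segment must be at least 2 so each side has a sample variance.
    Returns None when the series is too short to split.
    """
    best = None
    for k in range(min_segment, len(values) - min_segment + 1):
        before, after = values[:k], values[k:]
        mean_before = statistics.fmean(before)
        mean_after = statistics.fmean(after)
        # Standard error of the difference in means, unequal variances.
        se = math.sqrt(
            statistics.variance(before) / len(before)
            + statistics.variance(after) / len(after)
        )
        score = abs(mean_after - mean_before) / se if se > 0 else math.inf
        if best is None or score > best["score"]:
            best = {
                "index": k,  # first position of the "after" segment
                "mean_before": mean_before,
                "mean_after": mean_after,
                "score": score,
            }
    return best


series = [10, 11, 9, 10, 11, 20, 21, 19, 20, 21]
print(best_split(series)["index"])  # 5
```

Scanning every split and keeping the best one inflates the score relative to a single pre-specified test, so a significance threshold applied to it should be interpreted with that in mind.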
year_over_year_change(data, year_var, value_var)
Calculate year-over-year changes.
- Parameters:
data (DataFrame) – DataFrame with year and value columns
year_var (Variable) – Year variable
value_var (Variable) – Value variable
- Return type:
dict[str, Any]
- Returns:
Year-over-year analysis results as dictionary
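The underlying arithmetic is a comparison of each year with the one before it. The sketch below assumes one aggregated value per year held in a plain dictionary and skips years whose predecessor is missing or zero. The function name `year_over_year` and the output keys are hypothetical, not the structure statqa returns.

```python
def year_over_year(values_by_year):
    """Absolute and percent change between consecutive calendar years."""
    changes = []
    for year in sorted(values_by_year):
        previous = values_by_year.get(year - 1)
        if previous is None or previous == 0:
            # No prior year, a gap in the series, or an undefined percentage.
            continue
        current = values_by_year[year]
        changes.append(
            {
                "year": year,
                "absolute": current - previous,
                "percent": 100 * (current - previous) / previous,
            }
        )
    return changes


data = {2020: 100.0, 2021: 110.0, 2022: 99.0, 2024: 120.0}
for row in year_over_year(data):
    print(row["year"], row["absolute"], round(row["percent"], 1))
```

Here 2024 produces no row because 2023 is absent; comparing it with 2022 would silently turn a two-year change into a year-over-year figure.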
Causal Analysis
Causal inference with confounding control and sensitivity analysis.
Causal analysis with confounding control.
Performs regression analysis with control variables to surface
associations in causal language:
- Linear regression with controls
- Logistic regression for binary outcomes
- Confounder identification
- Sensitivity analysis
class statqa.analysis.causal.CausalAnalyzer(significance_level=0.05, min_sample_size=30, robust_se=True)
Bases: object
Analyzer for causal relationships with confounding control.
Note: These are observational analyses and do not establish true causation
without strong assumptions. Results should be interpreted as associations
controlling for measured confounders.
- Parameters:
significance_level (float) – Alpha level for hypothesis tests
min_sample_size (int) – Minimum sample size required
robust_se (bool) – Use heteroskedasticity-robust standard errors
analyze_treatment_effect(data, treatment_var, outcome_var, control_vars=None)
Estimate treatment effect on outcome with optional controls.
- Parameters:
data (DataFrame) – DataFrame with variables
treatment_var (Variable) – Treatment/exposure variable
outcome_var (Variable) – Outcome variable
control_vars (list[Variable] | None) – List of control/confounder variables
- Return type:
dict[str, Any]
- Returns:
Treatment effect analysis results as dictionary
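What "controlling for" a variable does to an estimate can be shown with a small standard-library sketch. It handles a single numeric control by residualizing both treatment and outcome on it (the Frisch-Waugh-Lovell result), which gives the same treatment coefficient as a regression of outcome on treatment and control together. The helper names are hypothetical, standard errors (robust or otherwise) are omitted, and this is not how statqa itself fits the model.

```python
def ols_slope_intercept(x, y):
    """Slope and intercept of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    slope, intercept = ols_slope_intercept(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def adjusted_effect(treatment, outcome, control):
    """Treatment coefficient in outcome ~ treatment + control."""
    treatment_resid = residuals(control, treatment)
    outcome_resid = residuals(control, outcome)
    slope, _ = ols_slope_intercept(treatment_resid, outcome_resid)
    return slope


# Synthetic data where outcome = 2 * treatment + 3 * control exactly.
control = [0, 1, 2, 3, 4, 5]
treatment = [1, 0, 2, 4, 3, 5]
outcome = [2 * t + 3 * c for t, c in zip(treatment, control)]

naive, _ = ols_slope_intercept(treatment, outcome)
print(round(naive, 2))                                          # 4.66
print(round(adjusted_effect(treatment, outcome, control), 2))   # 2.0
```

The unadjusted slope absorbs part of the control's effect because control and treatment are correlated; adjusting recovers the coefficient of 2 built into the synthetic data. As the class note says, this only accounts for measured confounders.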
identify_confounders(data, treatment_var, outcome_var, potential_confounders)
Identify which variables act as confounders.
A confounder must:
1. Be associated with treatment
2. Be associated with outcome
3. Not lie on the causal path between treatment and outcome
- Parameters:
data (DataFrame) – DataFrame with variables
treatment_var (Variable) – Treatment variable
outcome_var (Variable) – Outcome variable
potential_confounders (list[Variable]) – List of potential confounders to test
- Return type:
dict[str, Any]
- Returns:
Confounder identification results as dictionary
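The first two criteria can be screened from data; the third cannot and needs subject-matter knowledge. The sketch below flags candidates whose absolute correlation with both treatment and outcome reaches a cutoff. The function names, the dictionary input, and the 0.3 cutoff are assumptions for illustration; the library presumably relies on significance tests at `significance_level` rather than a fixed threshold.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_confounders(treatment, outcome, candidates, min_abs_r=0.3):
    """Flag candidates associated with both treatment and outcome.

    Criterion 3 (not on the causal path) is not checked here.
    """
    flagged = {}
    for name, values in candidates.items():
        r_treatment = correlation(values, treatment)
        r_outcome = correlation(values, outcome)
        if abs(r_treatment) >= min_abs_r and abs(r_outcome) >= min_abs_r:
            flagged[name] = {
                "r_treatment": r_treatment,
                "r_outcome": r_outcome,
            }
    return flagged


treatment = [1, 0, 2, 4, 3, 5]
outcome = [2, 3, 10, 17, 18, 25]
candidates = {
    "age": [0, 1, 2, 3, 4, 5],
    "noise": [1, -1, -1, 1, 1, -1],
}
print(sorted(screen_confounders(treatment, outcome, candidates)))  # ['age']
```

A variable flagged this way may still be a mediator rather than a confounder, so the output is a shortlist for review, not a final adjustment set.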