Utils

Utility functions and helper modules.

Statistics

Statistical utilities and helper functions.

statqa.utils.stats.calculate_effect_size(data1, data2=None, effect_type='cohen_d')

Calculate effect size for statistical tests.

Parameters:

data1 (Series | ndarray[Any, dtype[floating[Any]]] | float) – First sample or correlation coefficient
data2 (Series | ndarray[Any, dtype[floating[Any]]] | None) – Second sample (for two-sample tests)
effect_type (Literal['cohen_d', 'r_to_d', 'cramers_v', 'eta_squared']) – Type of effect size (‘cohen_d’, ‘r_to_d’, ‘cramers_v’, ‘eta_squared’)

Return type:

float

Returns:

Effect size value

Raises:

ValueError – If effect_type is invalid or the data are incompatible with it
NotImplementedError – If effect_type is not yet implemented
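When effect_type='r_to_d', data1 is a correlation coefficient rather than a sample. A widely used conversion is d = 2r / sqrt(1 - r^2). The sketch below assumes statqa uses that formula, which this reference does not state; r_to_d is a hypothetical standalone helper, not part of statqa.

```python
import math


def r_to_d(r: float) -> float:
    """Convert a correlation coefficient r to Cohen's d via d = 2r / sqrt(1 - r^2).

    Hypothetical helper: assumes the standard r-to-d conversion formula.
    """
    if not -1.0 < r < 1.0:
        # |r| = 1 would divide by zero; values outside [-1, 1] are not correlations
        raise ValueError(f"r must lie strictly between -1 and 1, got {r}")
    return 2.0 * r / math.sqrt(1.0 - r * r)


# r = 0.6 gives 1.2 / 0.8, i.e. about 1.5 (up to floating-point rounding)
print(r_to_d(0.6))
```

The sign of r carries over to d, so a negative correlation yields a negative effect size.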

statqa.utils.stats.cohens_d(group1, group2)

Calculate Cohen’s d effect size for two groups.

Parameters:

group1 (Series | ndarray[Any, dtype[floating[Any]]]) – First group
group2 (Series | ndarray[Any, dtype[floating[Any]]]) – Second group

Return type:

float

Returns:

Cohen’s d (standardized mean difference)
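Cohen's d is the difference in group means divided by a pooled standard deviation. The sketch below uses the common pooled estimator built from the two sample variances (ddof=1); whether statqa pools the variances exactly this way is an assumption, and cohens_d_sketch is a hypothetical name.

```python
import numpy as np


def cohens_d_sketch(group1, group2) -> float:
    """Standardized mean difference (group1 - group2) using the pooled sample SD.

    Hypothetical stand-in for statqa.utils.stats.cohens_d; the variance
    estimator statqa actually uses is an assumption here.
    """
    a = np.asarray(group1, dtype=float)
    b = np.asarray(group2, dtype=float)
    n1, n2 = a.size, b.size
    # Sample variances (ddof=1), pooled with weights n - 1
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


# Means 4 and 3, pooled SD 2, so d = 0.5
print(cohens_d_sketch([2, 4, 6], [1, 3, 5]))  # 0.5
```

Swapping the groups flips the sign, so the order of arguments determines the direction of the effect.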

statqa.utils.stats.correct_multiple_testing(p_values, method='fdr_bh', alpha=0.05)

Apply multiple testing correction to p-values.

Parameters:

p_values (list[float] | ndarray[Any, dtype[floating[Any]]]) – List or array of p-values
method (Literal['bonferroni', 'fdr_bh', 'fdr_by']) – Correction method:
- bonferroni: Bonferroni correction (most conservative)
- fdr_bh: Benjamini-Hochberg FDR (recommended)
- fdr_by: Benjamini-Yekutieli FDR (more conservative than fdr_bh)
alpha (float) – Significance level

Return type:

tuple[ndarray[Any, dtype[bool]], ndarray[Any, dtype[floating[Any]]]]

Returns:

Tuple of (reject, corrected_p_values):
- reject: Boolean array indicating which tests reject the null hypothesis
- corrected_p_values: Adjusted p-values

Raises:

ValueError – If correction method is not supported
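The method names match those accepted by statsmodels' multipletests, which statqa may wrap; that is an assumption. The self-contained sketch below implements Bonferroni and the Benjamini-Hochberg step-up adjustment with NumPy only. fdr_by, which additionally multiplies by the harmonic sum 1 + 1/2 + ... + 1/m, is omitted. Both function names are hypothetical.

```python
import numpy as np


def fdr_bh_sketch(p_values, alpha=0.05):
    """Benjamini-Hochberg adjustment (hypothetical sketch, not statqa's code).

    Returns (reject, adjusted) in the same order as the input p-values.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i
    # Enforce monotonicity from the largest p-value downward, then cap at 1
    adjusted_sorted = np.minimum(np.minimum.accumulate(ranked[::-1])[::-1], 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted  # undo the sort
    return adjusted <= alpha, adjusted


def bonferroni_sketch(p_values, alpha=0.05):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    adjusted = np.minimum(np.asarray(p_values, dtype=float) * len(p_values), 1.0)
    return adjusted <= alpha, adjusted


reject, adj = fdr_bh_sketch([0.01, 0.04, 0.03, 0.20])
print(reject)  # [ True False False False]
print(adj)
```

Only the smallest p-value survives here: its adjusted value is 0.01 * 4 / 1 = 0.04, while 0.03 and 0.04 both adjust to about 0.053.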

statqa.utils.stats.cramers_v(contingency_table)

Calculate Cramér’s V effect size for categorical associations.

Parameters:

contingency_table (DataFrame | ndarray[Any, dtype[integer[Any]]]) – Contingency table (crosstab)

Return type:

float

Returns:

Cramér’s V, ranging from 0 (no association) to 1 (perfect association)
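Cramér's V rescales the Pearson chi-square statistic of the table: V = sqrt(chi2 / (n * (min(r, c) - 1))), where n is the total count and r, c are the numbers of rows and columns. The sketch below computes it with NumPy only; it uses the uncorrected chi-square (no Yates continuity correction), which may differ from statqa, and cramers_v_sketch is a hypothetical name.

```python
import numpy as np


def cramers_v_sketch(table) -> float:
    """Cramér's V from a contingency table (hypothetical sketch).

    Uses the uncorrected Pearson chi-square statistic.
    """
    obs = np.asarray(table, dtype=float)
    n = obs.sum()
    # Expected counts under independence: row_total * col_total / n
    expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / n
    chi2 = ((obs - expected) ** 2 / expected).sum()
    k = min(obs.shape) - 1
    return float(np.sqrt(chi2 / (n * k)))


print(cramers_v_sketch([[10, 0], [0, 10]]))  # 1.0, perfect association
print(cramers_v_sketch([[5, 5], [5, 5]]))    # 0.0, no association
```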

statqa.utils.stats.detect_outliers(data, method='iqr', threshold=1.5)

Detect outliers in data.

Parameters:

data (Series | ndarray[Any, dtype[floating[Any]]]) – Input data
method (Literal['iqr', 'mad', 'zscore']) – Detection method (‘iqr’, ‘mad’, ‘zscore’)
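threshold (float) – Threshold for outlier detection; its meaning depends on method, and the signature default is 1.5 for every method:
- iqr: Multiplier for the IQR (1.5 is the conventional value)
- mad: Multiplier for the MAD (3.0 recommended; pass it explicitly)
- zscore: Absolute z-score cutoff (3.0 recommended; pass it explicitly)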

Return type:

ndarray[Any, dtype[floating[Any]]]

Returns:

Boolean array indicating outliers

Raises:

ValueError – If outlier detection method is not supported
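The three methods correspond to common outlier rules. The sketch below assumes the textbook versions: Tukey fences for iqr, |x - median| > threshold * MAD (unscaled, without the 1.4826 normal-consistency factor) for mad, and |z| > threshold with the population standard deviation for zscore. statqa's exact formulas are not documented here, and detect_outliers_sketch is a hypothetical name.

```python
import numpy as np


def detect_outliers_sketch(data, method="iqr", threshold=1.5):
    """Return a boolean mask of outliers (hypothetical sketch of the documented methods)."""
    x = np.asarray(data, dtype=float)
    if method == "iqr":
        q1, q3 = np.percentile(x, [25, 75])
        iqr = q3 - q1
        # Tukey fences: flag values outside [Q1 - k*IQR, Q3 + k*IQR]
        return (x < q1 - threshold * iqr) | (x > q3 + threshold * iqr)
    if method == "mad":
        med = np.median(x)
        mad = np.median(np.abs(x - med))
        # Raw MAD; no 1.4826 scaling (an assumption)
        return np.abs(x - med) > threshold * mad
    if method == "zscore":
        z = (x - x.mean()) / x.std()  # population standard deviation
        return np.abs(z) > threshold
    raise ValueError(f"Unsupported method: {method}")


data = [10, 11, 12, 11, 10, 12, 11, 50]
print(detect_outliers_sketch(data))              # only the last value is flagged
print(detect_outliers_sketch(data, "mad", 3.0))  # only the last value is flagged
```

The z-score rule is weak on small samples: with n points, no |z| can exceed sqrt(n - 1), so for these eight values (maximum about 2.65) a cutoff of 3.0 flags nothing.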

statqa.utils.stats.mann_kendall_trend(series)

Perform Mann-Kendall trend test for temporal data.

Parameters:

series (Series | ndarray[Any, dtype[floating[Any]]]) – Time series data

Return type:

dict

Returns:

Dictionary with:

tau: Kendall’s tau statistic
p_value: Two-tailed p-value
trend: Trend direction (‘increasing’, ‘decreasing’, ‘no trend’)
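The Mann-Kendall test sums sign(x_j - x_i) over all pairs i < j into a statistic S, compares S against its null variance n(n - 1)(2n + 5) / 18 with a continuity-corrected normal approximation, and reports tau = S / (n(n - 1) / 2). The sketch below follows that outline without a tie correction; the alpha cut-off used to label the trend is a hypothetical parameter, since this reference does not say how statqa chooses the trend label.

```python
import math

import numpy as np


def mann_kendall_sketch(series, alpha=0.05):
    """Mann-Kendall trend test without tie correction (hypothetical sketch)."""
    x = np.asarray(series, dtype=float)
    n = x.size
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S = sum over i < j of sign(x_j - x_i)
    s = sum(np.sign(x[i + 1:] - x[i]).sum() for i in range(n - 1))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity-corrected normal approximation
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed: 2 * (1 - Phi(|z|))
    tau = s / (n * (n - 1) / 2)
    if p_value < alpha:
        trend = "increasing" if s > 0 else "decreasing"
    else:
        trend = "no trend"
    return {"tau": float(tau), "p_value": p_value, "trend": trend}


print(mann_kendall_sketch(range(10)))  # tau 1.0, trend 'increasing'
```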

statqa.utils.stats.robust_stats(data)

Calculate robust statistics for potentially outlier-heavy data.

Parameters:

data (Series | ndarray[Any, dtype[floating[Any]]]) – Input data

Return type:

dict

Returns:

Dictionary with robust statistics:

median: Median (robust central tendency)
mad: Median Absolute Deviation (robust dispersion)
iqr: Interquartile Range
q25, q75: Quartiles
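A minimal sketch of the documented keys, assuming an unscaled MAD and NumPy's default linear-interpolation percentiles for the quartiles; statqa's exact conventions are not stated here, and robust_stats_sketch is a hypothetical name.

```python
import numpy as np


def robust_stats_sketch(data) -> dict:
    """Median, MAD, IQR and quartiles (hypothetical sketch of the documented keys)."""
    x = np.asarray(data, dtype=float)
    med = float(np.median(x))
    q25, q75 = (float(q) for q in np.percentile(x, [25, 75]))
    return {
        "median": med,
        "mad": float(np.median(np.abs(x - med))),  # unscaled MAD
        "iqr": q75 - q25,
        "q25": q25,
        "q75": q75,
    }


print(robust_stats_sketch([1, 2, 3, 4, 100]))
# {'median': 3.0, 'mad': 1.0, 'iqr': 2.0, 'q25': 2.0, 'q75': 4.0}
```

The single extreme value barely moves these statistics, whereas the mean of the same data is 22.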

I/O

Input/output utilities for loading and saving data.


statqa.utils.io.load_data(source, file_pattern='(?i)\\.csv$', **kwargs)

Load data from various sources.

Parameters:

source (str | Path) – Path to file (CSV, ZIP containing CSVs, etc.)
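file_pattern (str) – Regular expression selecting which files inside a ZIP archive to load; the default matches names ending in .csv, case-insensitively
**kwargs (Any) – Additional keyword arguments passed to pd.read_csv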

Return type:

DataFrame

Returns:

Loaded DataFrame

Raises:

FileNotFoundError – If source doesn’t exist
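The sketch below shows one way such a loader can dispatch between a plain CSV and a ZIP archive, filtering archive members with file_pattern. Concatenating every matching member into one DataFrame is an assumption about how statqa combines multiple files, and load_data_sketch is a hypothetical name.

```python
import re
import zipfile
from pathlib import Path

import pandas as pd


def load_data_sketch(source, file_pattern=r"(?i)\.csv$", **kwargs) -> pd.DataFrame:
    """Read a CSV file, or the matching members of a ZIP archive (hypothetical sketch).

    Assumption: all matching ZIP members are concatenated row-wise.
    """
    path = Path(source)
    if not path.exists():
        raise FileNotFoundError(path)
    if not zipfile.is_zipfile(path):
        return pd.read_csv(path, **kwargs)
    pattern = re.compile(file_pattern)
    frames = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if pattern.search(name):  # e.g. 'a.CSV' matches the default pattern
                with zf.open(name) as fh:
                    frames.append(pd.read_csv(fh, **kwargs))
    if not frames:
        raise ValueError(f"No files matching {file_pattern!r} in {path}")
    return pd.concat(frames, ignore_index=True)
```

Extra keyword arguments such as sep=";" pass straight through to pd.read_csv for every file read.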

statqa.utils.io.load_json(input_path)

Load data from JSON file.

Parameters:

input_path (str | Path) – Input file path

Return type:

Any

Returns:

Loaded data

statqa.utils.io.save_json(data, output_path, indent=2)

Save data to JSON file.

Parameters:

data (Any) – Data to save (must be JSON-serializable)
output_path (str | Path) – Output file path
indent (int) – JSON indentation level

Return type:

None
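A minimal sketch of the save/load pair using only the standard json module, assuming UTF-8 files; save_json_sketch and load_json_sketch are hypothetical names, not statqa's implementation.

```python
import json
from pathlib import Path
from typing import Any


def save_json_sketch(data: Any, output_path, indent: int = 2) -> None:
    """Write JSON-serializable data to output_path (hypothetical sketch)."""
    with Path(output_path).open("w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=indent)


def load_json_sketch(input_path) -> Any:
    """Read and parse a JSON file (hypothetical sketch)."""
    with Path(input_path).open(encoding="utf-8") as fh:
        return json.load(fh)
```

A round trip is not always the identity: JSON has no tuples and only string keys, so saving {1: (2, 3)} and loading it back yields {'1': [2, 3]}.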

Logging

Logging configuration and utilities.

Simple logging setup for statqa.

Provides minimal logging configuration with debug support via an environment variable. No complex logging infrastructure, just simple, useful debugging.

statqa.utils.logging.get_logger(name)

Get a logger for a module with statqa’s simple configuration.

Return type:

Logger

statqa.utils.logging.setup_logging(logger_name, level=None)

Set up simple logging for statqa modules.

Respects the STATQA_DEBUG environment variable:
- STATQA_DEBUG=1: DEBUG level
- Default: INFO level

Parameters:

logger_name (str) – Usually __name__ from calling module
level (Optional[Literal['DEBUG', 'INFO', 'WARNING', 'ERROR']]) – Override log level (optional)

Return type:

Logger

Returns:

Configured logger
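The level-selection rule above can be sketched with the standard logging module. This assumes only the exact value "1" enables DEBUG and that a single stream handler is attached per logger; the handler format and the name setup_logging_sketch are hypothetical.

```python
import logging
import os
from typing import Literal, Optional

Level = Literal["DEBUG", "INFO", "WARNING", "ERROR"]


def setup_logging_sketch(logger_name: str, level: Optional[Level] = None) -> logging.Logger:
    """Configure a logger whose default level follows STATQA_DEBUG (hypothetical sketch)."""
    if level is None:
        # STATQA_DEBUG=1 selects DEBUG; anything else falls back to INFO
        level = "DEBUG" if os.environ.get("STATQA_DEBUG") == "1" else "INFO"
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)  # setLevel accepts level names as strings
    if not logger.handlers:  # avoid duplicate output on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
    return logger


logger = setup_logging_sketch(__name__)
logger.debug("shown only when STATQA_DEBUG=1")
```

Running a script as STATQA_DEBUG=1 python script.py would then emit the debug message, while an explicit level argument always overrides the environment variable.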