Utils

Utility functions and helper modules.

Statistics

Statistical utilities and helper functions.

statqa.utils.stats.calculate_effect_size(data1, data2=None, effect_type='cohen_d')

Calculate effect size for statistical tests.

Parameters:
Return type:

float

Returns:

Effect size value

Raises:

statqa.utils.stats.cohens_d(group1, group2)

Calculate Cohen’s d effect size for two groups.

Parameters:
Return type:

float

Returns:

Cohen’s d (standardized mean difference)
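
For orientation, the classic pooled-standard-deviation form of Cohen's d can be sketched with the standard library alone. This is an illustration of the textbook formula, not statqa's implementation: the name cohens_d_sketch is hypothetical, and whether statqa pools variances in exactly this way is an assumption.

```python
import math
import statistics


def cohens_d_sketch(group1, group2):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    # Sample variances (n - 1 in the denominator), as in the textbook formula.
    var1 = statistics.variance(group1)
    var2 = statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd


# Means 4 and 3, both sample variances 4, so the pooled SD is 2.
d = cohens_d_sketch([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
```

The sign of the result follows the argument order: a positive value means the first group has the larger mean.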

statqa.utils.stats.correct_multiple_testing(p_values, method='fdr_bh', alpha=0.05)

Apply multiple testing correction to p-values.

Parameters:
  • p_values (list[float] | ndarray[Any, dtype[floating[Any]]]) – List or array of p-values

  • method (Literal['bonferroni', 'fdr_bh', 'fdr_by']) – Correction method:

      - bonferroni: Bonferroni correction (most conservative)
      - fdr_bh: Benjamini-Hochberg FDR (recommended)
      - fdr_by: Benjamini-Yekutieli FDR (more conservative)

  • alpha (float) – Significance level

Return type:

tuple[ndarray[Any, dtype[bool]], ndarray[Any, dtype[floating[Any]]]]

Returns:

Tuple of (reject, corrected_p_values):

  - reject: Boolean array indicating which tests reject the null hypothesis
  - corrected_p_values: Adjusted p-values

Raises:

ValueError – If correction method is not supported
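
To make the 'fdr_bh' option concrete, the Benjamini-Hochberg adjustment can be sketched in plain Python. The function name benjamini_hochberg_sketch is hypothetical and the code works on lists rather than arrays. It shows the standard step-up procedure, not necessarily how statqa computes it internally.

```python
def benjamini_hochberg_sketch(p_values, alpha=0.05):
    """Return (reject, adjusted_p) lists in the original order of p_values."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    reject = [p <= alpha for p in adjusted]
    return reject, adjusted


reject, adjusted = benjamini_hochberg_sketch([0.01, 0.04, 0.03, 0.20])
```

With these four p-values only the smallest one survives at alpha=0.05: its adjusted value is 0.04, while the next two are both adjusted to roughly 0.053.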

statqa.utils.stats.cramers_v(contingency_table)

Calculate Cramér’s V effect size for categorical associations.

Parameters:

contingency_table (DataFrame | ndarray[Any, dtype[integer[Any]]]) – Contingency table (crosstab)

Return type:

float

Returns:

Cramér’s V (0 to 1)
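
As a rough illustration of where the 0-to-1 range comes from, Cramér's V can be derived from the chi-squared statistic of the table. The sketch below takes a plain list of rows rather than a DataFrame and applies no continuity or bias correction; the name cramers_v_sketch is hypothetical, and statqa may apply corrections that this version omits.

```python
import math


def cramers_v_sketch(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))


v_perfect = cramers_v_sketch([[10, 0], [0, 10]])  # perfectly associated
v_none = cramers_v_sketch([[5, 5], [5, 5]])       # no association
```

The sketch assumes every row and column total is nonzero; an empty row or column would make an expected count zero and trigger a division error.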

statqa.utils.stats.detect_outliers(data, method='iqr', threshold=1.5)

Detect outliers in data.

Parameters:
  • data (Series | ndarray[Any, dtype[floating[Any]]]) – Input data

  • method (Literal['iqr', 'mad', 'zscore']) – Detection method (‘iqr’, ‘mad’, ‘zscore’)

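  • threshold (float) – Threshold for outlier detection. The signature default is 1.5, which suits the IQR method; pass a value explicitly for the other methods:

      - iqr: Multiplier for IQR (default 1.5)
      - mad: Multiplier for MAD (3.0 recommended)
      - zscore: Z-score threshold (3.0 recommended)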

Return type:

ndarray[Any, dtype[floating[Any]]]

Returns:

Boolean array indicating outliers

Raises:

ValueError – If outlier detection method is not supported
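
The 'iqr' rule with its 1.5 multiplier is the familiar box-plot fence. A minimal standard-library sketch follows, assuming linearly interpolated quartiles; the name iqr_outliers_sketch is hypothetical, and statqa's quartile method may differ.

```python
import statistics


def iqr_outliers_sketch(data, threshold=1.5):
    """Flag values lying more than threshold * IQR outside the quartiles."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - threshold * iqr, q3 + threshold * iqr
    return [x < lower or x > upper for x in data]


# Quartiles are 2 and 4, so the fences sit at -1 and 7.
flags = iqr_outliers_sketch([1.0, 2.0, 3.0, 4.0, 100.0])
```

Only the value 100.0 falls outside the fences, so it is the single flagged entry.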

statqa.utils.stats.mann_kendall_trend(series)

Perform Mann-Kendall trend test for temporal data.

Parameters:

series (Series | ndarray[Any, dtype[floating[Any]]]) – Time series data

Returns:

Dictionary with:

  • tau: Kendall’s tau statistic

  • p_value: Two-tailed p-value

  • trend: Trend direction (‘increasing’, ‘decreasing’, ‘no trend’)
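
To show how the three returned keys relate, here is a simplified Mann-Kendall sketch in plain Python. It ignores tie corrections and uses a normal approximation with continuity correction. The name mann_kendall_sketch and the alpha parameter are hypothetical (the documented function takes only series), and statqa may well compute tau and the p-value differently.

```python
import math


def mann_kendall_sketch(series, alpha=0.05):
    """Simplified Mann-Kendall test: no tie correction, normal approximation."""
    n = len(series)
    # S is the number of increasing pairs minus decreasing pairs in time order.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    tau = s / (n * (n - 1) / 2)
    # Variance of S under the null hypothesis when there are no ties.
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected z statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    if p_value < alpha and s != 0:
        trend = "increasing" if s > 0 else "decreasing"
    else:
        trend = "no trend"
    return {"tau": tau, "p_value": p_value, "trend": trend}


rising = mann_kendall_sketch(list(range(1, 11)))
flat = mann_kendall_sketch([1, 1, 1, 1])
```

A strictly increasing series gives tau = 1.0 and a small p-value, while a constant series gives tau = 0.0 and a p-value of 1.0.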

statqa.utils.stats.robust_stats(data)

Calculate robust statistics for potentially outlier-heavy data.

Parameters:

data (Series | ndarray[Any, dtype[floating[Any]]]) – Input data

Returns:

Dictionary with robust statistics:

  • median: Median (robust central tendency)

  • mad: Median Absolute Deviation (robust dispersion)

  • iqr: Interquartile Range

  • q25, q75: Quartiles
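
The quantities listed above can be reproduced with the standard library, which helps when checking results by hand. In this sketch the MAD is left unscaled (some libraries multiply it by about 1.4826 to match the standard deviation under normality) and quartiles are linearly interpolated. Both choices are assumptions about what statqa does, and robust_stats_sketch is a hypothetical name.

```python
import statistics


def robust_stats_sketch(data):
    """Median, unscaled MAD, quartiles, and IQR for a list of numbers."""
    median = statistics.median(data)
    # Median of absolute deviations from the median, with no scaling factor.
    mad = statistics.median([abs(x - median) for x in data])
    q25, _, q75 = statistics.quantiles(data, n=4, method="inclusive")
    return {"median": median, "mad": mad, "iqr": q75 - q25, "q25": q25, "q75": q75}


summary = robust_stats_sketch([1.0, 2.0, 3.0, 4.0, 100.0])
```

Despite the extreme value 100.0, the median stays at 3.0 and the MAD at 1.0, which is the point of using these measures on outlier-heavy data.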

I/O

Input/output utilities for loading and saving data.

statqa.utils.io.load_data(source, file_pattern='(?i)\\.csv$', **kwargs)

Load data from various sources.

Parameters:
  • source (str | Path) – Path to file (CSV, ZIP containing CSVs, etc.)

  • file_pattern (str) – Regex pattern for files in ZIP

  • **kwargs (Any) – Additional arguments for pd.read_csv

Return type:

DataFrame

Returns:

Loaded DataFrame

Raises:

FileNotFoundError – If source doesn’t exist
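
The file_pattern default reads as a case-insensitive regex matching member names that end in .csv. The sketch below shows how such a pattern might select members of a ZIP archive, using only the standard library and returning row dictionaries instead of a DataFrame. The name read_csvs_from_zip_sketch is hypothetical; how statqa matches the pattern or combines several CSVs is not stated in this reference.

```python
import csv
import io
import re
import zipfile


def read_csvs_from_zip_sketch(zip_source, file_pattern=r"(?i)\.csv$"):
    """Return {member_name: list of row dicts} for members matching the pattern."""
    matcher = re.compile(file_pattern)
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if matcher.search(name):
                with archive.open(name) as raw:
                    text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                    tables[name] = list(csv.DictReader(text))
    return tables


# Build a small in-memory archive to exercise the filter.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.CSV", "x,y\n1,2\n")
    archive.writestr("notes.txt", "ignored")
buffer.seek(0)
tables = read_csvs_from_zip_sketch(buffer)
```

The upper-case a.CSV member is picked up because of the (?i) flag, while notes.txt is skipped.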

statqa.utils.io.load_json(input_path)

Load data from JSON file.

Parameters:

input_path (str | Path) – Input file path

Return type:

Any

Returns:

Loaded data

statqa.utils.io.save_json(data, output_path, indent=2)

Save data to JSON file.

Parameters:
  • data (Any) – Data to save (must be JSON-serializable)

  • output_path (str | Path) – Output file path

  • indent (int) – JSON indentation level

Return type:

None
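
A save/load round trip with the documented signatures might look like the following standard-library sketch. The _sketch functions are hypothetical stand-ins rather than statqa's code, and the UTF-8 encoding is an assumption.

```python
import json
import tempfile
from pathlib import Path


def save_json_sketch(data, output_path, indent=2):
    """Write JSON-serializable data to output_path."""
    Path(output_path).write_text(json.dumps(data, indent=indent), encoding="utf-8")


def load_json_sketch(input_path):
    """Read JSON data back from input_path."""
    return json.loads(Path(input_path).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "result.json"
    payload = {"effect": "cohen_d", "value": 0.5, "groups": ["a", "b"]}
    save_json_sketch(payload, target)
    restored = load_json_sketch(target)
    text = target.read_text(encoding="utf-8")
```

Values that are not JSON-serializable, such as NumPy arrays, would need converting (for example to lists) before saving with a plain json.dumps call like this one.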

Logging

Logging configuration and utilities.

Simple logging setup for statqa.

Provides minimal logging configuration with debug support via an environment variable. There is no complex logging infrastructure, just simple, useful debugging.

statqa.utils.logging.get_logger(name)

Get a logger for a module with statqa’s simple configuration.

Return type:

Logger

statqa.utils.logging.setup_logging(logger_name, level=None)

Set up simple logging for statqa modules.

Respects the STATQA_DEBUG environment variable:

  - STATQA_DEBUG=1: DEBUG level
  - Default: INFO level

Parameters:
  • logger_name (str) – Usually __name__ from calling module

  • level (Optional[Literal['DEBUG', 'INFO', 'WARNING', 'ERROR']]) – Override log level (optional)

Return type:

Logger

Returns:

Configured logger
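
The behaviour described for setup_logging can be approximated with the standard logging module. This sketch assumes that only the exact value "1" enables debug output and that an explicit level always overrides the environment variable. The handler, the message format, and the name setup_logging_sketch are illustrative choices, not statqa's.

```python
import logging
import os


def setup_logging_sketch(logger_name, level=None):
    """Configure a logger whose level is driven by STATQA_DEBUG unless overridden."""
    if level is None:
        # Assumption: only the exact value "1" switches on DEBUG.
        level = "DEBUG" if os.environ.get("STATQA_DEBUG") == "1" else "INFO"
    logger = logging.getLogger(logger_name)
    logger.setLevel(getattr(logging, level))
    if not logger.handlers:
        # Attach a handler once so repeated calls do not duplicate output.
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(name)s %(levelname)s: %(message)s"))
        logger.addHandler(handler)
    return logger


os.environ["STATQA_DEBUG"] = "1"
debug_logger = setup_logging_sketch("demo.debug")
os.environ.pop("STATQA_DEBUG")
info_logger = setup_logging_sketch("demo.info")
override_logger = setup_logging_sketch("demo.override", level="WARNING")
```

In a calling module the usual pattern would be to pass __name__ as logger_name, as the parameter description suggests.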