Temporal Sampler

The temporal sampler randomly selects days within a specified date range and retrieves repositories updated during those periods using weighted sampling based on repository activity.

class reporoulette.TemporalSampler(token: str | None = None, start_date: datetime | str | None = None, end_date: datetime | str | None = None, rate_limit_safety: int = 100, seed: int | None = None, years_back: int = 10, log_level: int = 20)

Bases: BaseSampler

Sample repositories by randomly selecting days and fetching repos updated in those periods.

This sampler selects random days within a specified date range, weights them by repository count, and retrieves repositories with proportional sampling.
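The day-selection and weighting steps described above can be sketched roughly as follows. This is an illustrative sketch, not reporoulette's implementation. The helper names `pick_random_days` and `allocate_samples` are hypothetical, the per-day repository counts are assumed to come from an earlier count query, and largest-remainder rounding is one assumed way to make the per-day allocations add up to `n_samples`.

```python
import random
from datetime import date, timedelta

# Hypothetical helpers illustrating the technique; not part of the reporoulette API.


def pick_random_days(start: date, end: date, k: int, rng: random.Random) -> list[date]:
    """Pick up to k distinct days uniformly at random from the inclusive range [start, end]."""
    span = (end - start).days + 1
    offsets = rng.sample(range(span), k=min(k, span))
    return sorted(start + timedelta(days=offset) for offset in offsets)


def allocate_samples(day_counts: dict[date, int], n_samples: int) -> dict[date, int]:
    """Split n_samples across days in proportion to each day's repository count.

    Largest-remainder rounding (an assumed choice) makes the allocations sum to
    n_samples whenever at least one day has a non-zero count.
    """
    total = sum(day_counts.values())
    if total == 0:
        return {day: 0 for day in day_counts}
    exact = {day: n_samples * count / total for day, count in day_counts.items()}
    alloc = {day: int(share) for day, share in exact.items()}
    leftover = n_samples - sum(alloc.values())
    # Give the remaining slots to the days with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda day: exact[day] - alloc[day], reverse=True)
    for day in by_remainder[:leftover]:
        alloc[day] += 1
    return alloc


rng = random.Random(42)  # a fixed seed makes the day selection reproducible
days = pick_random_days(date(2024, 1, 1), date(2024, 12, 31), k=3, rng=rng)
counts = dict(zip(days, [120, 30, 50]))  # made-up per-day repository counts
print(allocate_samples(counts, n_samples=10))
```

With these made-up counts of 120, 30 and 50 repositories, the three days receive 6, 2 and 2 of the 10 samples. The busiest day gets the largest share, which illustrates how weighting by repository count favors more active periods.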

sample(n_samples: int = 100, days_to_sample: int = 10, per_page: int = 100, min_wait: float = 1.0, min_stars: int = 0, min_size_kb: int = 0, language: str | None = None, **kwargs: Any) -> list[dict[str, Any]]

Sample repositories by randomly selecting days, weighted by the number of repositories on each day.

Parameters:
  • n_samples – Target number of repositories to collect

  • days_to_sample – Number of random days sampled initially to estimate per-day repository counts

  • per_page – Number of results per page (max 100)

  • min_wait – Minimum wait time between API requests

  • min_stars – Minimum number of stars (0 for no filtering)

  • min_size_kb – Minimum repository size in KB (0 for no filtering)

  • language – Programming language to filter by (None for no filtering)

  • **kwargs – Additional filters to apply
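The filtering parameters above plausibly map onto GitHub repository-search qualifiers. The sketch below builds such a query string for one sampled day. It is an assumption, not the library's documented behavior: the helper `build_search_query` is hypothetical, matching the day with the `pushed:` qualifier is a guess at how "updated" is expressed, and treating `**kwargs` as extra `key:value` qualifiers is likewise assumed. The `stars:`, `size:` (in kilobytes) and `language:` qualifiers themselves are standard GitHub search syntax.

```python
from datetime import date
from typing import Optional

# Hypothetical helper; the qualifier choices below are assumptions, not reporoulette's code.


def build_search_query(day: date, min_stars: int = 0, min_size_kb: int = 0,
                       language: Optional[str] = None, **filters: str) -> str:
    """Build a GitHub repository-search query string for one sampled day."""
    # Assumption: "updated on this day" is expressed with the pushed: qualifier.
    parts = [f"pushed:{day.isoformat()}"]
    if min_stars > 0:  # 0 means no star filter, mirroring sample()
        parts.append(f"stars:>={min_stars}")
    if min_size_kb > 0:  # GitHub's size: qualifier is measured in kilobytes
        parts.append(f"size:>={min_size_kb}")
    if language:
        parts.append(f"language:{language}")
    # Assumption: extra keyword filters become additional key:value qualifiers.
    parts.extend(f"{key}:{value}" for key, value in filters.items())
    return " ".join(parts)


print(build_search_query(date(2024, 3, 5), min_stars=10, language="python"))
# prints: pushed:2024-03-05 stars:>=10 language:python
```

Filters left at their "no filtering" defaults add no qualifier, so the query narrows only as far as the caller asks.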

Returns:

List of dictionaries, one per sampled repository

Advantages

  • Can filter repositories during sampling

  • Weighting days by repository count concentrates sampling on periods with more repository activity

  • Customizable date ranges

Disadvantages

  • May be biased toward more active repositories

  • Limited by GitHub API rate limits

  • Requires careful parameter tuning
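One way to stay within GitHub's rate limits is to pace requests using the rate-limit headers GitHub returns with each response. The rule below is a hypothetical sketch: the function name `wait_seconds` is invented, and reading the constructor's `rate_limit_safety` as a threshold on remaining requests and `min_wait` as a delay in seconds are assumptions about what those parameters mean.

```python
# Hypothetical pacing rule; the parameter semantics are assumptions, not reporoulette's code.


def wait_seconds(remaining: int, reset_epoch: float, now: float,
                 rate_limit_safety: int = 100, min_wait: float = 1.0) -> float:
    """Return how long to sleep (in seconds) before the next API request.

    `remaining` and `reset_epoch` correspond to GitHub's X-RateLimit-Remaining and
    X-RateLimit-Reset response headers; the reset value is a Unix timestamp in seconds.
    """
    if remaining <= rate_limit_safety:
        # Too close to the limit: wait until the rate-limit window resets.
        return max(reset_epoch - now, min_wait)
    # Plenty of quota left: only enforce the minimum spacing between requests.
    return min_wait


print(wait_seconds(remaining=50, reset_epoch=1000.0, now=900.0))   # prints 100.0
print(wait_seconds(remaining=500, reset_epoch=1000.0, now=900.0))  # prints 1.0
```

In a request loop, the caller would read both headers from the latest response, pass `time.time()` as `now`, and call `time.sleep()` with the result before issuing the next request.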

Usage Example

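from datetime import datetime, timedelta

from reporoulette import TemporalSampler

# Direct usage: the date range is set on the constructor, not passed to sample()
end = datetime.now()
sampler = TemporalSampler(
    token="your_github_token",
    start_date=end - timedelta(days=365),
    end_date=end,
)
repos = sampler.sample(n_samples=10)

# Optional filters are passed to sample()
python_repos = sampler.sample(n_samples=10, min_stars=50, language="python")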

# Using convenience function
from reporoulette import sample
results = sample(method='temporal', n_samples=10)