Temporal Sampler
The temporal sampler randomly selects days within a specified date range and retrieves repositories updated during those periods using weighted sampling based on repository activity.
- class reporoulette.TemporalSampler(token: str | None = None, start_date: datetime | str | None = None, end_date: datetime | str | None = None, rate_limit_safety: int = 100, seed: int | None = None, years_back: int = 10, log_level: int = 20)
Bases: BaseSampler
Sample repositories by randomly selecting days and fetching repos updated in those periods.
This sampler selects random days within a specified date range, weights them by repository count, and retrieves repositories with proportional sampling.
- sample(n_samples: int = 100, days_to_sample: int = 10, per_page: int = 100, min_wait: float = 1.0, min_stars: int = 0, min_size_kb: int = 0, language: str | None = None, **kwargs: Any) → list[dict[str, Any]]
Sample repositories by randomly selecting days with weighting based on repo count.
- Parameters:
n_samples – Target number of repositories to collect
days_to_sample – Number of random days to initially sample for count assessment
per_page – Number of results per page (max 100)
min_wait – Minimum wait time between API requests (the float default of 1.0 suggests seconds)
min_stars – Minimum number of stars (0 for no filtering)
min_size_kb – Minimum repository size in KB (0 for no filtering)
language – Programming language to filter by
**kwargs – Additional filters to apply
- Returns:
List of repository data
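The day-weighting idea described above can be illustrated with a small standalone sketch. It does not call reporoulette and is not the library's implementation: the function name allocate_samples and the day_counts mapping are hypothetical, and the sketch simply assumes that each sampled day has a known repository count and that the target n_samples is split across days in proportion to those counts.

```python
from __future__ import annotations

import random


def allocate_samples(
    day_counts: dict[str, int],
    n_samples: int,
    seed: int | None = None,
) -> dict[str, int]:
    """Split n_samples across days in proportion to each day's repo count.

    Hypothetical helper for illustration only; not part of reporoulette.
    """
    rng = random.Random(seed)
    # Days with no repositories cannot contribute any samples.
    days = [day for day, count in day_counts.items() if count > 0]
    if not days or n_samples <= 0:
        return {}
    weights = [day_counts[day] for day in days]
    # Draw one day per requested sample, with probability proportional
    # to that day's repository count.
    picks = rng.choices(days, weights=weights, k=n_samples)
    allocation: dict[str, int] = {}
    for day in picks:
        allocation[day] = allocation.get(day, 0) + 1
    return allocation


# Made-up counts for three sampled days (assumed values, not real data).
counts = {"2023-01-05": 1200, "2023-06-17": 300, "2024-02-29": 0}
allocation = allocate_samples(counts, n_samples=10, seed=42)
```

Busier days receive more of the sample on average, which is also the source of the activity bias listed under Disadvantages.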
Advantages
Can filter repositories during sampling
Weighting days by repository count favors periods with more repository activity
Customizable date ranges
Disadvantages
May be biased toward more active repositories
Limited by GitHub API rate limits
Requires careful parameter tuning
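To make the rate-limit constraint concrete, the following is a minimal sketch of how a minimum wait between requests, in the spirit of the min_wait parameter, could be enforced. The class MinIntervalThrottle is hypothetical and is not part of reporoulette; how the library actually paces its requests is not documented here.

```python
from __future__ import annotations

import time


class MinIntervalThrottle:
    """Keep successive calls at least min_wait seconds apart.

    Hypothetical helper for illustration only; not part of reporoulette.
    """

    def __init__(self, min_wait: float = 1.0) -> None:
        self.min_wait = min_wait
        self._last: float | None = None  # time of the previous call

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_wait - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept


throttle = MinIntervalThrottle(min_wait=0.05)
for _ in range(3):
    throttle.wait()  # call this before each API request
```

A larger min_wait lowers the chance of hitting the API limit but lengthens the run, which is one of the trade-offs behind the parameter tuning noted above.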
Usage Example
from datetime import datetime, timedelta
from reporoulette import TemporalSampler
# Direct usage: per the signatures above, the date range is set on the
# sampler constructor rather than passed to sample()
sampler = TemporalSampler(
    token="your_github_token",
    start_date=datetime.now() - timedelta(days=365),
    end_date=datetime.now(),
)
repos = sampler.sample(n_samples=10)
# Using the convenience function
from reporoulette import sample
results = sample(method='temporal', n_samples=10)