Sample Function
The sample() function provides a single entry point to all sampling methods, so you do not need to instantiate sampler classes directly.
- reporoulette.sample(method: str = 'temporal', n_samples: int = 50, token: str | None = None, **kwargs: Any) -> dict[str, Any]
Sample repositories using the specified method.
- Parameters:
method – Sampling method ('id', 'temporal', 'archive', or 'bigquery'); defaults to 'temporal'
n_samples – Number of repositories to sample; defaults to 50
token – GitHub personal access token (not used by the 'bigquery' method)
**kwargs – Additional parameters specific to each sampler (for example, credentials_path and project_id for 'bigquery')
- Returns:
Dictionary containing the sampled repositories and sampling statistics
- Raises:
ValueError – If an unknown sampling method is provided
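One way to picture this unified interface is a lookup from method name to sampler class, with the remaining keyword arguments handed to the chosen sampler and a ValueError for any name not in the table. The sketch below assumes that design; StubSampler, its run() method, the SAMPLERS registry, and sample_sketch are hypothetical placeholders, not reporoulette's actual classes or internals.

```python
from __future__ import annotations

from typing import Any


class StubSampler:
    """Hypothetical stand-in for a real reporoulette sampler class."""

    def __init__(self, method: str, token: str | None = None, **kwargs: Any) -> None:
        self.method = method
        self.token = token
        self.params = kwargs

    def run(self, n_samples: int) -> dict[str, Any]:
        # A real sampler would query GitHub or BigQuery here; this stub
        # makes no attempts and returns an empty sample list.
        return {
            "method": self.method,
            "params": self.params,
            "attempts": 0,
            "success_rate": 0.0,
            "samples": [],
        }


# Hypothetical registry keyed by the documented method names.
SAMPLERS: dict[str, type[StubSampler]] = {
    name: StubSampler for name in ("id", "temporal", "archive", "bigquery")
}


def sample_sketch(method: str = "temporal", n_samples: int = 50,
                  token: str | None = None, **kwargs: Any) -> dict[str, Any]:
    """Illustrative dispatcher with the same signature as the documented sample()."""
    if method not in SAMPLERS:
        raise ValueError(f"Unknown sampling method: {method!r}")
    # The token is documented as unused for BigQuery, so drop it for that method.
    if method == "bigquery":
        token = None
    sampler = SAMPLERS[method](method, token=token, **kwargs)
    return sampler.run(n_samples)
```

In this sketch, swapping StubSampler for real sampler classes is where the GitHub or BigQuery logic would live; the dispatcher itself only validates the method name and routes arguments.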
Usage Examples
The sample() function instantiates and configures the appropriate sampler for you:
```python
from reporoulette import sample

# ID-based sampling
results = sample(method='id', n_samples=10)

# Temporal sampling (default method)
results = sample(n_samples=20)

# BigQuery sampling with credentials
results = sample(
    method='bigquery',
    n_samples=100,
    credentials_path="/path/to/credentials.json",
    project_id="your-gcp-project"
)

# GitHub Archive sampling
results = sample(method='archive', n_samples=50)
```
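Because the token is optional and is not used by the BigQuery method, a convenient pattern is to read it from the environment and assemble the keyword arguments before calling sample(). This is a minimal sketch, assuming the token is stored in an environment variable named GITHUB_TOKEN (a convention chosen here for illustration, not something the library requires); build_sample_kwargs is a hypothetical helper.

```python
from __future__ import annotations

import os
from typing import Any

# Method names taken from the documented parameters of sample().
VALID_METHODS = {"id", "temporal", "archive", "bigquery"}


def build_sample_kwargs(method: str, n_samples: int, **extra: Any) -> dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for sample()."""
    if method not in VALID_METHODS:
        # Fail early with the same exception type sample() raises for unknown methods.
        raise ValueError(f"Unknown sampling method: {method!r}")
    kwargs: dict[str, Any] = {"method": method, "n_samples": n_samples, **extra}
    # GITHUB_TOKEN is an assumed variable name; BigQuery does not use a GitHub token.
    token = os.environ.get("GITHUB_TOKEN")
    if method != "bigquery" and token:
        kwargs["token"] = token
    return kwargs
```

The result can then be unpacked into the call, for example sample(**build_sample_kwargs('id', 10)).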
Return Format
All methods return a dictionary with the following keys:
- method: The sampling method used
- params: Parameters passed to the sampler
- attempts: Total number of sampling attempts made
- success_rate: Ratio of successful attempts to total attempts
- samples: List of repository data dictionaries
For example:
```
{
    "method": "temporal",
    "params": {"start_date": "2024-01-01", ...},
    "attempts": 25,
    "success_rate": 0.8,
    "samples": [
        {"full_name": "user/repo1", "stars": 42, ...},
        {"full_name": "user/repo2", "stars": 123, ...},
        ...
    ]
}
```
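Since success_rate is the ratio of successful to total attempts, the number of successful attempts can be recovered as attempts × success_rate; for the example above, 25 × 0.8 = 20. The sketch below is a hypothetical helper, summarize_results, that uses only the documented keys to compute that figure and rank the sampled repositories by star count. It assumes each entry in samples carries the full_name and stars fields shown in the example, and it runs on a trimmed copy of that example with the "..." elisions removed.

```python
from __future__ import annotations

from typing import Any


def summarize_results(results: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper that condenses a sample() result dictionary."""
    # success_rate = successful / attempts, so successful = attempts * success_rate.
    successful = round(results["attempts"] * results["success_rate"])
    # Sort repositories by star count, highest first; a missing count is treated as 0.
    ranked = sorted(results["samples"], key=lambda repo: repo.get("stars", 0), reverse=True)
    return {
        "method": results["method"],
        "successful_attempts": successful,
        "repos": [(repo["full_name"], repo.get("stars", 0)) for repo in ranked],
    }


# Trimmed copy of the documented example result.
example = {
    "method": "temporal",
    "params": {"start_date": "2024-01-01"},
    "attempts": 25,
    "success_rate": 0.8,
    "samples": [
        {"full_name": "user/repo1", "stars": 42},
        {"full_name": "user/repo2", "stars": 123},
    ],
}
print(summarize_results(example))
# {'method': 'temporal', 'successful_attempts': 20, 'repos': [('user/repo2', 123), ('user/repo1', 42)]}
```

Rounding guards against floating-point error when multiplying attempts by a fractional success_rate.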