Changelog
Source:NEWS.md
bloomjoin 1.0.0
Major Release - Production Ready
This major release represents a complete overhaul of the bloomjoin package with significant enhancements to functionality, performance, reliability, and CRAN compliance. The package has been thoroughly tested with 54 comprehensive tests and is now production-ready.
New Features
-
Intelligent Join Strategy: Bloom filters are now used by default as users expect from
bloomjoin()
-
Enhanced Verbose Output: Added comprehensive performance reporting with
verbose = TRUE
- Performance Metadata: Results now include detailed metadata about Bloom filter effectiveness
- Multi-Column Join Optimization: Improved performance for composite key joins
Performance Improvements
- Optimized Bloom Filter Sizing: Smart sizing based on unique keys rather than total rows
- Enhanced Hash Functions: Implemented double hashing with MurmurHash3 for better distribution
- Memory Efficiency: Significant memory usage optimizations and proper cleanup
- Row Reduction: Achieving 94-97% row reduction in optimal scenarios
Bug Fixes
- Critical NA Handling: Fixed major bug in NA processing that caused incorrect join results
- Anti-Join Logic: Fixed anti-join implementation - now correctly bypasses Bloom filters due to false positive limitations
- Memory Leaks: Resolved potential memory leaks in C++ layer
-
Temporary Column Cleanup: Fixed issue where temporary columns weren’t properly removed
- Input Validation: Added comprehensive parameter validation and error handling
- CRAN Compliance: Fixed all major R CMD check issues for CRAN submission
API Changes
- Added
verbose
parameter tobloom_join()
for performance reporting - Enhanced error messages with clear, specific guidance
- Improved handling of edge cases (empty datasets, no overlap scenarios)
Testing & Quality
- 54 comprehensive tests covering all functionality and edge cases
- Memory and performance benchmarking integrated into test suite
- 100% test pass rate with extensive edge case coverage
- CRAN compliance verified