Release v0.1.1 – vLLM Improvements & BitsAndBytes Fix
This release includes improvements for vLLM-based inference and a necessary fix for quantized models using bitsandbytes.
How to Use
You can install the latest release directly from PyPI:
```
pip install mergenetic
```

Or update an existing installation:

```
pip install --upgrade mergenetic
```

More usage examples and evaluation scripts can be found in the repository's README.
PyPI: https://pypi.org/project/mergenetic/
What's New
- **Enabled Eager Mode for vLLM**
  vLLM now runs in eager mode to avoid expensive CUDA graph capture, which can cause slowdowns when evaluating a small number of samples. This significantly improves responsiveness in low-batch scenarios; a minimal sketch of the setting follows this list.

- **Fixed BitsAndBytes Loading Issue**
  Models loaded with `quantization="bitsandbytes"` now explicitly set `load_format="bitsandbytes"`, resolving the runtime error `BitsAndBytes quantization and QLoRA adapter only support 'bitsandbytes' load format, but got auto`. See #2 for more details; the second sketch below shows the combined configuration.
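For illustration, here is a minimal sketch of enabling eager mode on a vLLM engine. The model name is a placeholder, and the exact wiring inside mergenetic may differ:

```python
from vllm import LLM, SamplingParams

# enforce_eager=True skips CUDA graph capture, trading some steady-state
# throughput for much faster startup -- useful when only a handful of
# samples are being evaluated.
llm = LLM(
    model="meta-llama/Llama-3.2-1B-Instruct",  # placeholder model name
    enforce_eager=True,
)

params = SamplingParams(max_tokens=32, temperature=0.0)
outputs = llm.generate(["Merging models is"], params)
print(outputs[0].outputs[0].text)
```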
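And a sketch of the load-format fix, assuming a bitsandbytes-quantized checkpoint (the model name below is again a placeholder):

```python
from vllm import LLM

# When quantization="bitsandbytes" is requested, the load format must also
# be set to "bitsandbytes"; leaving it at the default "auto" triggers the
# error quoted above.
llm = LLM(
    model="unsloth/llama-3-8b-bnb-4bit",  # placeholder: any bnb-quantized checkpoint
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    enforce_eager=True,  # eager mode, as described in the first item
)
```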