Release v0.1.1 – vLLM Improvements & BitsAndBytes Fix
This release includes improvements for vLLM-based inference and a necessary fix for quantized models using bitsandbytes.
How to Use
You can install the latest release directly from PyPI:
```
pip install mergenetic
```

Or update an existing installation:

```
pip install --upgrade mergenetic
```

More usage examples and evaluation scripts can be found in the repository's README.
PyPI: https://pypi.org/project/mergenetic/
What's New
- **Enabled Eager Mode for vLLM**
  vLLM now runs in eager mode to avoid expensive CUDA graph capture, which can cause slowdowns when evaluating a small number of samples. This significantly improves responsiveness in low-batch scenarios; a minimal sketch of the setting follows this list.

- **Fixed BitsAndBytes Loading Issue**
  Models loaded with `quantization="bitsandbytes"` now explicitly set `load_format="bitsandbytes"`, resolving the runtime error `BitsAndBytes quantization and QLoRA adapter only support 'bitsandbytes' load format, but got auto`. See #2 for more details; the second sketch below shows the combined configuration.
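For illustration, here is a minimal sketch of enabling eager mode on a vLLM engine. The model name is a placeholder, and the exact wiring inside mergenetic may differ:

```python
from vllm import LLM, SamplingParams

# enforce_eager=True skips CUDA graph capture, trading some steady-state
# throughput for much faster startup -- useful when only a handful of
# samples are being evaluated.
llm = LLM(
    model="meta-llama/Llama-3.2-1B-Instruct",  # placeholder model name
    enforce_eager=True,
)

params = SamplingParams(max_tokens=32, temperature=0.0)
outputs = llm.generate(["Merging models is"], params)
print(outputs[0].outputs[0].text)
```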
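And a sketch of the load-format fix, assuming a bitsandbytes-quantized checkpoint (the model name below is again a placeholder):

```python
from vllm import LLM

# When quantization="bitsandbytes" is requested, the load format must also
# be set to "bitsandbytes"; leaving it at the default "auto" triggers the
# error quoted above.
llm = LLM(
    model="unsloth/llama-3-8b-bnb-4bit",  # placeholder: any bnb-quantized checkpoint
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    enforce_eager=True,  # eager mode, as described in the first item
)
```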