v0.1.1 – vLLM Improvements & BitsAndBytes Fix

@adrianrob1 released this 20 May 06:16

This release includes improvements to vLLM-based inference and a fix for quantized models loaded with bitsandbytes.

βœ… How to Use

You can install the latest release directly from PyPI:

pip install mergenetic

Or update an existing installation:

pip install --upgrade mergenetic

More usage examples and evaluation scripts can be found in the repository’s README.

πŸ“¦ PyPI: https://pypi.org/project/mergenetic/


πŸš€ What's New

  • Enabled Eager Mode for vLLM
    vLLM now runs in eager mode, avoiding the expensive CUDA graph capture that can cause slowdowns when evaluating on a small number of samples. This significantly improves responsiveness in low-batch scenarios; a sketch of both settings follows this list.

  • Fixed BitsAndBytes Loading Issue
    Models loaded with quantization="bitsandbytes" now explicitly set load_format="bitsandbytes", resolving a runtime error:

    BitsAndBytes quantization and QLoRA adapter only support 'bitsandbytes' load format, but got auto
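
For illustration, here is a minimal vLLM sketch of the two settings this release touches. This is not mergenetic's own API: mergenetic applies these options internally, and the model name below is an arbitrary placeholder.

from vllm import LLM, SamplingParams

# Hypothetical example checkpoint; any bitsandbytes-compatible model works.
llm = LLM(
    model="meta-llama/Llama-3.2-1B-Instruct",
    enforce_eager=True,           # skip CUDA graph capture: faster for small evaluation runs
    quantization="bitsandbytes",  # load 4-bit quantized weights via bitsandbytes
    load_format="bitsandbytes",   # must match the quantization, or vLLM raises the error above
)

outputs = llm.generate(["Hello!"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)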
    

See #2 for more details.