Code release for "SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency"
- Models.
- Evaluation pipeline.
- Training pipeline.
- Clone this repository and navigate to the SAISA folder:
git clone https://github.com/icip-cas/SAISA.git
cd SAISA
- Install the package:
conda create -n saisa python=3.10 -y
conda activate saisa
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
- Install additional packages for evaluation with lmms-eval:
cd lmms_eval
pip install -e .
Chat about images using SAISA:
python -m llava.serve.cli \
    --model-path yuanqianhao/saisa-vicuna \
    --image-file "https://llava-vl.github.io/static/images/view.jpg"
LMMs-Eval is an evaluation framework designed for consistent and efficient evaluation of large multimodal models (LMMs).
To evaluate the Vicuna-based SAISA model on the MMMU validation split:
export MODEL_PATH="yuanqianhao/saisa-vicuna"
export MODEL_NAME="saisa_vicuna"
export CONV_MODE="v1"
accelerate launch --num_processes=1 --main_process_port=12346 -m lmms_eval \
    --model llava \
    --model_args pretrained=${MODEL_PATH},conv_template=${CONV_MODE} \
    --tasks mmmu_val \
    --batch_size 1 \
    --log_samples_suffix ${MODEL_NAME} \
    --output_path ./logs/
To evaluate the Llama-3-based SAISA model:
export MODEL_PATH="yuanqianhao/saisa-llama3"
export MODEL_NAME="saisa_llama3"
export CONV_MODE="llama_3"
accelerate launch --num_processes=1 --main_process_port=12346 -m lmms_eval \
    --model llava \
    --model_args pretrained=${MODEL_PATH},conv_template=${CONV_MODE} \
    --tasks mmmu_val \
    --batch_size 1 \
    --log_samples_suffix ${MODEL_NAME} \
    --output_path ./logs/
See Evaluation.md.
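The two lmms-eval commands above differ only in `MODEL_PATH`, `MODEL_NAME`, and `CONV_MODE`. If you want to evaluate both checkpoints in one go, the following is a hypothetical POSIX shell sketch that reuses exactly the flags shown above. The function name `run_eval` and the `DRY_RUN` variable are our own illustration, not part of the repository. By default the script only prints the commands; set `DRY_RUN=0` to actually launch them.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SAISA repo): evaluate both released
# SAISA checkpoints on MMMU (val) with lmms-eval, using the flags from above.
set -eu

# Safe default: print commands instead of running them. Use DRY_RUN=0 to run.
DRY_RUN="${DRY_RUN:-1}"

# run_eval MODEL_PATH MODEL_NAME CONV_MODE
run_eval() {
  model_path="$1"; model_name="$2"; conv_mode="$3"
  # Reuse the function's positional parameters to hold the full command line.
  set -- accelerate launch --num_processes=1 --main_process_port=12346 -m lmms_eval \
    --model llava \
    --model_args "pretrained=${model_path},conv_template=${conv_mode}" \
    --tasks mmmu_val \
    --batch_size 1 \
    --log_samples_suffix "${model_name}" \
    --output_path ./logs/
  if [ "${DRY_RUN}" = "1" ]; then
    echo "$*"    # dry run: print the space-joined command
  else
    "$@"         # real run: execute the command
  fi
}

# Model path, log suffix, and conversation template, as in the commands above.
run_eval yuanqianhao/saisa-vicuna saisa_vicuna v1
run_eval yuanqianhao/saisa-llama3 saisa_llama3 llama_3
```

Running the evaluations one after another (rather than in parallel) keeps the fixed `--main_process_port=12346` from colliding. For example, if the script is saved as `eval_saisa.sh` (a hypothetical filename), `DRY_RUN=0 sh eval_saisa.sh` would launch both runs in sequence.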
This work is built upon LLaVA and lmms-eval.
If you find SAISA useful for your research and applications, please cite it using this BibTeX:
@article{yuan2025saisa,
  title={SAISA: Towards Multimodal Large Language Models with Both Training and Inference Efficiency},
  author={Yuan, Qianhao and Liu, Yanjiang and Lu, Yaojie and Lin, Hongyu and He, Ben and Han, Xianpei and Sun, Le},
  journal={arXiv preprint arXiv:2502.02458},
  year={2025}
}