Lingxiao Yang, Ru-Yuan Zhang, Qi Chen, Xiaohua Xie
Sun Yat-sen University, Shanghai Jiao Tong University
IJCV 2025
Abstract: Vision-Language Models, pre-trained on large-scale image-text pairs, serve as strong foundation models for transfer learning across a variety of downstream tasks. For few-shot generalization tasks, i.e., when the model is trained on few-shot samples and then tested on unseen categories or datasets, a balance must be struck between generalization and discrimination when fine-tuning these models. Existing approaches typically rely on one or two strategies during training to learn task-specific knowledge while preserving as much task-agnostic representation as possible. However, these methods overlook the importance of other useful inductive biases, thereby limiting their generalization capabilities. In this work, we propose Learning with Enriched Inductive Biases (LwEIB), a method that explores multiple inductive biases at the text, model, and optimization levels. Specifically, we first enrich the handcrafted text prompt with descriptions generated by a Large Language Model for each category. To better capture structural cues in both language and vision, we design two new adapters for the text and image encoders, respectively. Additionally, we propose a slow-fast optimization method to explore different degrees of adaptation more efficiently, learning task-specific representations while maintaining task-agnostic ones. We empirically validate the effectiveness of LwEIB on three widely used benchmarks. Remarkably, LwEIB outperforms numerous state-of-the-art methods across all evaluation metrics, demonstrating its efficacy and versatility.
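As a rough illustration of the text-level inductive bias, the sketch below pairs a handcrafted CLIP template with a category description and encodes both with a frozen CLIP text encoder. The class name, description string, prompt format, and ViT-B/16 backbone are placeholders chosen for illustration; in LwEIB the descriptions come from a large language model, and the exact prompt construction follows the paper.

```python
import torch
import clip  # OpenAI CLIP package

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/16", device=device)

# Hypothetical LLM-generated description for one category (illustrative only).
classname = "goldfish"
description = "a small orange freshwater fish with shiny scales"

handcrafted = f"a photo of a {classname}."
enriched = f"a photo of a {classname}, {description}."  # enriched prompt (illustrative format)

with torch.no_grad():
    tokens = clip.tokenize([handcrafted, enriched]).to(device)
    text_features = model.encode_text(tokens)  # shape (2, 512) for ViT-B/16
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
```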
- We propose Learning with Enriched Inductive Biases (LwEIB), a novel parameter-efficient fine-tuning framework that can be trained end-to-end to leverage multiple inductive biases.
- We propose inductive biases at three levels, i.e., the text level, the model level, and the optimization level, to increase the generalizability of VLMs in few-shot settings (a minimal sketch of the optimization-level idea follows this list).
- We evaluate LwEIB on three widely used and challenging few-shot generalization tasks. Experimental results show that LwEIB achieves leading performance among all compared methods on all evaluated benchmarks.
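As a minimal sketch of the optimization-level bias mentioned above, the snippet below keeps a fast adapter updated by the optimizer and a slow copy that tracks it via an exponential moving average. This is only one plausible slow-fast scheme, written as an assumption for illustration; the adapter shape, learning rate, and momentum are placeholder values, and the paper defines the actual LwEIB update.

```python
import copy

import torch
import torch.nn as nn

# Placeholder adapter standing in for LwEIB's text/image adapters (illustrative).
fast_adapter = nn.Linear(512, 512)
slow_adapter = copy.deepcopy(fast_adapter)  # slow branch: not updated by gradients
for p in slow_adapter.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.AdamW(fast_adapter.parameters(), lr=1e-3)

@torch.no_grad()
def slow_update(slow, fast, momentum=0.999):
    """Nudge the slow weights toward the fast weights after each optimizer step."""
    for ps, pf in zip(slow.parameters(), fast.parameters()):
        ps.mul_(momentum).add_(pf, alpha=1.0 - momentum)

# In a training loop (loss computation omitted):
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
#   slow_update(slow_adapter, fast_adapter)
```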
Results reported below are average accuracies for the three evaluated test settings. Please refer to our paper for more details.
| Method | Base2New (HM) | Cross-Datasets | Domain Generalization | Avg |
|---|---|---|---|---|
| CLIP | 71.70 | 65.15 | 57.18 | 64.67 |
| CoOp | 71.66 | 63.88 | 59.28 | 64.94 |
| CoCoOp | 75.83 | 65.74 | 59.91 | 67.16 |
| MaPLe | 78.55 | 66.30 | 60.27 | 68.37 |
| PromptSRC | 79.97 | 65.81 | 60.65 | 68.81 |
| HPT | 80.23 | 67.74 | 60.71 | 69.56 |
| LwEIB (Paper) | 81.21 | 68.61 | 60.84 | 70.22 |
| LwEIB (This repo) | 81.18 | 68.79 | 60.83 | 70.27 |
Some hyper-parameters in the configs differ slightly from those in our paper, which yields better average performance over the three benchmarks (see above).
We provide all trained models and logs produced with this repo, which reproduce the results above (70.27), on BaiduYunPan (passcode: 6hge) and Google Drive.
This code is built on top of the awesome CoOp project, so you need to follow its setup steps:
First, you need to install the dassl environment - Dassl.pytorch. Simply follow the instructions described here to install dassl as well as PyTorch. After that, run `pip install -r requirements.txt` under VLM-LwEIB/ to install a few more packages required by CLIP (do this with the dassl environment activated).
Second, you need to follow DATASETS.md to install the datasets.
# arg1 = GPU id to use
# arg2 = seed number
# use the following command for the base2new experiment
bash run_base2new.sh 0 1
# use the following command for the cross-datasets and domain-generalization experiments
bash run_xd.sh 0 1

If you find our work or this repo helpful for your research, please kindly cite the following paper:
@article{LwEIB-Yang2025,
title={Learning with Enriched Inductive Biases for Vision-Language Models},
author={Yang, Lingxiao and Zhang, Ru-Yuan and Chen, Qi and Xie, Xiaohua},
journal={International Journal of Computer Vision},
year={2025},
publisher={Springer}
}

Our code is based on the Co-CoOp, CoOp and MMA repositories. We thank the authors for releasing their code.