F4-ITS: Fine-grained Feature Fusion for Food Image-Text Search is a training-free, vision-language model (VLM)-guided framework that significantly improves retrieval performance through enhanced multi-modal feature representations. Our approach introduces two key contributions: (1) a uni-directional (and bi-directional) multi-modal fusion strategy that combines image embeddings with VLM-generated textual descriptions to improve query expressiveness, and (2) a novel feature-based re-ranking mechanism for top-k retrieval that leverages predicted food ingredients to refine results and boost precision.
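The two contributions can be illustrated with a minimal sketch. This is not the repository's implementation: the weighted-sum fusion, the `alpha` and `weight` parameters, the Jaccard ingredient score, and all function names below are assumptions chosen for illustration, and the embeddings are assumed to come from some pre-trained encoder that is not shown.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(image_emb, description_emb, alpha=0.5):
    """Assumed fusion: weighted sum of the normalized image embedding and the
    embedding of a VLM-generated description, re-normalized afterwards."""
    a = l2_normalize(image_emb)
    b = l2_normalize(description_emb)
    return l2_normalize([alpha * x + (1 - alpha) * y for x, y in zip(a, b)])


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def top_k(query_emb, candidates, k):
    """Return the k candidate ids with the highest cosine similarity."""
    scored = [(cosine(query_emb, emb), cid) for cid, emb in candidates.items()]
    scored.sort(reverse=True)
    return [(cid, score) for score, cid in scored[:k]]


def jaccard(a, b):
    """Overlap between two ingredient lists, in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def rerank(hits, query_ingredients, candidate_ingredients, weight=0.5):
    """Assumed re-ranking: blend the retrieval score with ingredient overlap
    between the query's predicted ingredients and each candidate's."""
    rescored = [
        (cid, (1 - weight) * score
         + weight * jaccard(query_ingredients, candidate_ingredients.get(cid, ())))
        for cid, score in hits
    ]
    return sorted(rescored, key=lambda t: t[1], reverse=True)


# Toy 2-D embeddings and ingredient lists, purely illustrative.
query = fuse([1.0, 0.0], [0.0, 1.0])
candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
hits = top_k(query, candidates, k=2)
reranked = rerank(hits, ["rice", "egg"],
                  {"a": ["rice", "egg", "peas"], "b": ["beef"]})
```

In this toy example, candidates `a` and `b` are equally similar to the fused query, and the ingredient overlap is what moves `a` to the top of the re-ranked list.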
F4-ITS: Fine-grained Feature Fusion for Food Image-Text Search
mailcorahul/f4-its