RA-Touch: Retrieval-Augmented Touch Understanding with Enriched Visual Data
Yoorhim Cho*, Hongyeob Kim*, Semin Kim, Youjia Zhang, Yunseok Choi, Sungeun Hong†
* Denotes equal contribution
Sungkyunkwan University
We introduce RA-Touch, a retrieval-augmented framework that improves visuo-tactile perception by leveraging visual data enriched with tactile semantics. We carefully recaption a large-scale visual dataset with tactile-focused descriptions, yielding ImageNet-T, which enables the model to access tactile semantics typically absent from conventional visual datasets.
Full Abstract
Visuo-tactile perception aims to understand an object’s tactile properties, such as texture, softness, and rigidity. However, the field remains underexplored because collecting tactile data is costly and labor-intensive. We observe that visually distinct objects can exhibit similar surface textures or material properties. For example, a leather sofa and a leather jacket have different appearances but share similar tactile properties. This implies that tactile understanding can be guided by material cues in visual data, even without direct tactile supervision. In this paper, we introduce RA-Touch, a retrieval-augmented framework that improves visuo-tactile perception by leveraging visual data enriched with tactile semantics. We carefully recaption a large-scale visual dataset with tactile-focused descriptions, enabling the model to access tactile semantics typically absent from conventional visual datasets. A key challenge remains in effectively utilizing these tactile-aware external descriptions. RA-Touch addresses this by retrieving visual-textual representations aligned with tactile inputs and integrating them to focus on relevant textural and material properties. By outperforming prior methods on the TVL benchmark, our method demonstrates the potential of retrieval-based visual reuse for tactile understanding.

To set up the environment and install the package:

```bash
conda create -n ra-touch python=3.10 -y
conda activate ra-touch

git clone https://github.com/AIM-SKKU/RA-Touch.git
cd RA-Touch
pip install -r requirements.txt
pip install -e .
```

Alternatively, set up the environment with uv:

```bash
uv sync
source .venv/bin/activate
```

Download Pre-trained Weights: Please download the required pre-trained weights from the TVL repository:
- TVL Encoder weights (e.g., `tvl_enc_vitb.pth`, `tvl_enc_vits.pth`, `tvl_enc_vittiny.pth`)
- TVL-LLaMA weights (e.g., `tvl_llama_vitb.pth`, `tvl_llama_vits.pth`, `tvl_llama_vittiny.pth`)

Place these weights in the `./weights/` directory.
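As a reference, a minimal placement sketch for the ViT-Base checkpoints (assuming the files were downloaded into the repository root; adjust the source paths to wherever you saved them):

```bash
# Create the directory expected by the training and evaluation scripts
mkdir -p ./weights

# Move the downloaded TVL checkpoints into place (ViT-Base shown here)
mv tvl_enc_vitb.pth tvl_llama_vitb.pth ./weights/
```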
We used torch 2.5.1 for training and evaluation on A6000 GPUs.
To train the tactile-guided retriever:
```bash
bash scripts/train_retriever.sh <gpu_ids> <vit_type> <epochs> <batch_size> <port>
# Example: bash scripts/train_retriever.sh 0,1 base 60 256 23500
```

Arguments:
- `gpu_ids`: Comma-separated GPU IDs (e.g., "0,1,2,3")
- `vit_type`: Vision Transformer type ("tiny", "small", "base") - default: "base"
- `epochs`: Number of training epochs - default: 60
- `batch_size`: Training batch size - default: 256
- `port`: Distributed training port - default: 23500
Requirements:
- Pre-trained tactile encoder weights (e.g., `./weights/tvl_enc_vitb.pth`)
- Training data configuration in `configs/finetune-data-config.yaml`
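Before launching, a quick sanity check of the prerequisites listed above can save a failed run (a minimal sketch; swap in the encoder weights matching your chosen ViT size):

```bash
# Confirm the retriever training prerequisites are in place
ls ./weights/tvl_enc_vitb.pth           # pre-trained tactile encoder (ViT-Base)
ls configs/finetune-data-config.yaml    # training data configuration
```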
To train the full RA-Touch model with the texture-aware integrator:

```bash
bash scripts/train_ra_touch.sh <gpu_ids> <vit_type> <retriever_weight> <topk> <external_dataset> <retrieval_method> <port>
# Example: bash scripts/train_ra_touch.sh 0,1,2,3 base ./output/retriever_checkpoint.pth 5 imgnet_t_150k txt2txt 1113
```

Arguments:
- `gpu_ids`: Comma-separated GPU IDs
- `vit_type`: Vision Transformer type ("tiny", "small", "base")
- `retriever_weight`: Path to the trained retriever checkpoint
- `topk`: Number of top-k retrieved samples - default: 5
- `external_dataset`: External dataset for retrieval ("imgnet_t_10k", "imgnet_t_50k", "imgnet_t_100k", "imgnet_t_150k")
- `retrieval_method`: Retrieval method - default: "txt2txt"
- `port`: Distributed training port - default: 1113
Requirements:
- LLaMA-2 model in the `./llama-2/` directory
- Pre-trained TVL-LLaMA weights (e.g., `./weights/tvl_llama_vitb.pth`)
- ImageNet-T embeddings (e.g., `./data/embeddings/imagenet_t_150k_embeddings.npz`)
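The expected layout, following the example paths above, can be verified with a short check (a minimal sketch; filenames differ if you use another ViT size or embedding subset):

```bash
# Confirm the RA-Touch training prerequisites are in place
ls ./llama-2/                                          # LLaMA-2 model directory
ls ./weights/tvl_llama_vitb.pth                        # pre-trained TVL-LLaMA weights (ViT-Base)
ls ./data/embeddings/imagenet_t_150k_embeddings.npz    # ImageNet-T retrieval embeddings
```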
To evaluate the trained RA-Touch model:
```bash
bash scripts/eval_ra_touch.sh <gpu_ids> <vit_type> <retriever_ckpt> <ra_touch_ckpt> <topk> <external_dataset> <retrieval_method> <port>
# Example: bash scripts/eval_ra_touch.sh 0 base ./output/retriever_checkpoint.pth ./output/ra_touch_checkpoint.pth 5 imgnet_t_150k txt2txt 1113
```

Requirements:
- OpenAI API key in `scripts/openai_key.txt` for GPT evaluation
- TVL dataset path configured in the script
- Trained retriever and RA-Touch model checkpoints
We follow the same paired t-test as TVL. After obtaining your OpenAI API key, please place it in `scripts/openai_key.txt`.
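For example, the key can be written to the file the script reads (a minimal sketch; replace the placeholder with your actual key):

```bash
# Store the OpenAI API key where the evaluation script expects it
echo "sk-your-api-key" > scripts/openai_key.txt
```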
The ImageNet-T dataset is hosted on Hugging Face. To use it, download it either through the web interface or with git:
```bash
# install git-lfs
sudo apt install git-lfs
git lfs install

# clone the dataset
git clone https://huggingface.co/datasets/yoorhim/ImageNet-T

# or you can download the zip files manually from here:
# https://huggingface.co/datasets/yoorhim/ImageNet-T/tree/main
```
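If you grab the zip archives manually instead, a minimal extraction sketch (archive names vary; use whatever files are listed on the Hugging Face page, and the target directory here is only an assumed example, so adjust it to your data path):

```bash
# Extract all downloaded ImageNet-T archives into a single dataset directory
mkdir -p ./data/ImageNet-T
for f in *.zip; do
    unzip -q "$f" -d ./data/ImageNet-T/
done
```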
TBU
- Training Code
- Release ImageNet-T Dataset
This repository is built on top of the TVL repository.
```bibtex
@inproceedings{cho2025ra,
  title={RA-Touch: Retrieval-Augmented Touch Understanding with Enriched Visual Data},
  author={Cho, Yoorhim and Kim, Hongyeob and Kim, Semin and Zhang, Youjia and Choi, Yunseok and Hong, Sungeun},
  booktitle={ACM Multimedia 2025},
  year={2025}
}
```