- Group Name: deficienza_computazionale
- Authors: Luigi Pizza, Aryan Sood, Alessandro Tazza, Federico Villa
Create a conda environment and activate it:

```bash
conda create -n CIL python=3.12
conda activate CIL
```

Install pip inside the conda environment:

```bash
conda install pip
```

Alternatively, create a Python virtual environment:

```bash
python -m venv CIL
```

Then install the required packages:

```bash
pip install -r requirements.txt
```

The file example_training.py contains an example for running a model. It is the same script we used to achieve our best Kaggle score.
The example file calls the begin_training_loop function, which starts the main training process. Its use_random_split parameter is a boolean: set it to True to split the data at random into a training set and a validation set at runtime. If use_random_split = False, a custom split must be provided, so both the train_split and val_split parameters have to be non-empty.
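For instance, the random-split mode can be invoked as follows (a minimal sketch; we assume here that begin_training_loop can be imported from example_training.py, where it is called):

```python
# Sketch only: the import location is an assumption based on example_training.py.
from example_training import begin_training_loop

# Random split: a training and a validation set are created at random at runtime,
# so no custom split needs to be passed.
begin_training_loop(use_random_split=True)
```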
Here is an example of such parameters, using the training and validation splits created by clustering images based on their cosine similarity (as we did in our experiments):

```python
import pandas as pd

train_split = pd.read_csv("train_split.csv")["file_name"].to_list()
val_split = pd.read_csv("val_split.csv")["file_name"].to_list()
```
Check the files train_split.csv and val_split.csv in the repository root folder to see how to format them. Both are Comma-Separated Values (.csv) files with a single column, file_name, whose entries alternate between depth map (.npy) and RGB image (.png) filenames.
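If you build your own split files, a small sanity check like the one below can confirm the alternating pattern (illustrative only; it assumes nothing beyond the single file_name column described above):

```python
import pandas as pd

# Illustrative check: entries should alternate between depth maps (.npy)
# and RGB images (.png); the pairing order follows the CSVs in the repo root.
names = pd.read_csv("train_split.csv")["file_name"].to_list()
suffixes = [name.rsplit(".", 1)[-1] for name in names]
assert set(suffixes) <= {"npy", "png"}
assert all(a != b for a, b in zip(suffixes, suffixes[1:]))  # strictly alternating
```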
To evaluate a previously trained model, run the script test_evaluation.py after setting the path to the model weights, the path to your test data, and the directory where the predictions should be saved (refer to the comments inside the evaluation file for details on how to change these values to evaluate a specific model).
To use the same model weights and test data as in our best experiment, do not change any variables in the script: simply download the model weights and place the checkpoint file in the root directory of this repository.
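For orientation, loading a checkpoint in PyTorch generally follows the pattern below; the file name is a placeholder, and the actual loading logic lives in test_evaluation.py:

```python
import torch

# Placeholder path -- the real checkpoint filename is set inside test_evaluation.py.
state = torch.load("checkpoint.pth", map_location="cpu")
# The matching architecture must then be instantiated and the weights restored, e.g.:
#   model.load_state_dict(state)   # or state["state_dict"], depending on how it was saved
#   model.eval()
```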
After evaluation, to generate a Kaggle submission file, run the script create_outputs.py.
The models/ subfolder contains the files for the different models we tested. The models listed below were trained from scratch on the data referenced in train_split.csv; a generic sketch of the kind of building block they share follows the list.
- Base U-Net: models/large_unet.py
- UNet++: models/unetpp.py
- Swin transformer encoder with a U-Net decoder: models/unet_swin_depth_estimator.py
- ViT transformer encoder with a U-Net decoder: models/unet_vit_depth_estimator.py
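The exact layer definitions live in the files above; purely as orientation (this is not copied from the repository), a typical U-Net style double-convolution block looks like this:

```python
import torch
import torch.nn as nn

# Generic illustration of the double-convolution block that U-Net style
# encoder/decoder stages are commonly built from.
class DoubleConv(nn.Module):
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)
```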
The model created using a ResNet encoder plus a U-Net decoder used pre-trained weights for the encoder and was fine-tuned on the training data referenced in train_split.csv. The encoder's pre-trained weights were taken from torchvision.models.ResNet50_Weights.
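For reference, a pre-trained ResNet-50 backbone can be obtained from torchvision as sketched below; how it is split into encoder stages and wired to the U-Net decoder is defined in models/resnet_unet_decoder.py, and the exact weight variant used there may differ:

```python
import torch.nn as nn
from torchvision.models import ResNet50_Weights, resnet50

# Sketch: load a ResNet-50 with torchvision's pre-trained weights and drop the
# classification head so it can serve as a convolutional encoder.
backbone = resnet50(weights=ResNet50_Weights.DEFAULT)
encoder = nn.Sequential(*list(backbone.children())[:-2])  # remove avgpool and fc
```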
The files detailing ResNet based models are:
- ResNet: models/resnet_unet_decoder.py
- ResNet-Transformer: models/resnet_transformer_unet.py
The model built from a SegFormer encoder plus a SegFormer decoder (modified by us to output a depth mask) used pre-trained weights for the encoder and was fine-tuned on the training data referenced in train_split.csv. The encoder's pre-trained weights were taken from nvidia/segformer-b5-finetuned-ade-640-640.
For more details on the model's implementation, refer to models/segformer_depth.py.
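A sketch of how the pre-trained encoder can be pulled from the Hugging Face Hub is shown below; the modified decoder that produces the depth mask is implemented in models/segformer_depth.py, and the way the checkpoint is actually consumed there may differ:

```python
from transformers import SegformerModel

# Sketch: load only the SegFormer encoder weights from the Hub checkpoint
# (the segmentation head of the original checkpoint is not used here).
encoder = SegformerModel.from_pretrained("nvidia/segformer-b5-finetuned-ade-640-640")
```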
The model created using a Mask2Former architecture plus a double convolutional layer used pre-trained weights for the Mask2Former architecture and was fine-tuned on the training data referenced in train_split.csv. The pre-trained weights were taken from facebook/mask2former-swin-small-coco-instance.
For more details on the model's implementation, refer to models/Mask2Former_depth.py.
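Similarly, the pre-trained Mask2Former weights can be loaded from the Hugging Face Hub as sketched below; the additional double convolutional layer that turns the output into a depth mask is defined in models/Mask2Former_depth.py:

```python
from transformers import Mask2FormerForUniversalSegmentation

# Sketch: load the pre-trained Mask2Former backbone from the Hub checkpoint.
mask2former = Mask2FormerForUniversalSegmentation.from_pretrained(
    "facebook/mask2former-swin-small-coco-instance"
)
```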
Table: Depth estimation performance across different architectures
| Model | Validation Loss | Training Loss | Kaggle Public Score |
|---|---|---|---|
| U-Net Based models | | | |
| Base U-Net | 0.27101 | 0.32242 | -- |
| UNet++ | 0.36963 | 0.44006 | -- |
| ResNet Based models | | | |
| ResNet | 0.15886 | 0.09157 | 0.16063 |
| ResNet-Transformer | 0.17421 | 0.11357 | 0.21408 |
| Mask2Former model | | | |
| mask2former-swin-small-coco-instance | 0.12976 | 0.097775 | 0.13343 |
| SegFormer models | | | |
| segformer-b4-512-512 | 0.14236 | 0.11561 | 0.14048 |
| segformer-b5-640-640 | 0.11474 | 0.15951 | 0.12621 |
Note: The SegFormer based model achieved the best results overall. In terms of validation loss, segformer-b5-640-640 shows a 27.8% improvement over the ResNet based model, a 57.7% improvement over the Base U-Net, and a 68.9% improvement over UNet++.
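These percentages are the relative reductions in validation loss with respect to the values in the table above; a quick check:

```python
# Relative reduction in validation loss of segformer-b5-640-640 vs. three baselines,
# using the values from the table above.
segformer_b5 = 0.11474
baselines = {"ResNet": 0.15886, "Base U-Net": 0.27101, "UNet++": 0.36963}
for name, loss in baselines.items():
    print(name, round((loss - segformer_b5) / loss, 4))
# ResNet 0.2777, Base U-Net 0.5766, UNet++ 0.6896
```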
To evaluate our results, we provide the weights of the following models:
| Model | File |
|---|---|
| segformer-b4-512-512 | polybox |
| segformer-b5-640-640 | polybox |
| mask2former-swin-small-coco-instance | polybox |
| ResNet50 | polybox |