
ETH Computational Intelligence Lab Project 2025

Monocular Depth Estimation

  • Group Name: deficienza_computazionale
  • Authors: Luigi Pizza, Aryan Sood, Alessandro Tazza, Federico Villa

Environment Setup

Using Conda

Create a conda environment and activate it:

conda create -n CIL python=3.12
conda activate CIL

Install pip inside the conda environment:

conda install pip

Using Python virtual environment

Create a Python virtual environment:

python -m venv CIL
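Then activate it (Linux/macOS; on Windows use CIL\Scripts\activate):

source CIL/bin/activate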

Install the required dependencies:

pip install -r requirements.txt

Model Training

The file example_training.py contains an example of how to train a model; it is the same script we used to achieve our best Kaggle score.

In the example file, the begin_training_loop function is called to start the main model training process. Its parameter use_random_split is a boolean: set it to True to split the data at random into a training set and a validation set at runtime. If use_random_split = False, a custom split must be provided, i.e. both the train_split and val_split parameters must be non-empty.

Here is an example of such parameters, using the training and validation splits created by clustering images based on their cosine similarity (as we did in our experiments):

  • train_split=pd.read_csv("train_split.csv")["file_name"].to_list()
  • val_split=pd.read_csv("val_split.csv")["file_name"].to_list()

Check the files train_split.csv and val_split.csv in the repository root folder to see how the files are formatted. Both are Comma-Separated Values (.csv) files with a single column, file_name, whose rows alternate between depth-map filenames (.npy) and RGB image filenames (.png).
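A minimal usage sketch, assuming begin_training_loop is exposed by example_training.py and accepts the parameters described above (adjust the import to the actual module layout):

import pandas as pd
from example_training import begin_training_loop  # assumed import path

# Custom split: one file_name column whose rows alternate between
# depth maps (.npy) and RGB images (.png), as in the provided CSVs.
train_split = pd.read_csv("train_split.csv")["file_name"].to_list()
val_split = pd.read_csv("val_split.csv")["file_name"].to_list()

# Disable the random split and pass the custom train/validation lists.
begin_training_loop(
    use_random_split=False,
    train_split=train_split,
    val_split=val_split,
)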

Model Evaluation

To evaluate a previously trained model, run the script test_evaluation.py after setting the model weights, the path of your test data, and the directory where the predictions should be saved (refer to the comments inside the evaluation script for details on how to change these values for a specific model).

To use the same model weights and test data as in our best experiment, do not change any variables in the script. Simply download the model weights and save the checkpoint file in the root directory of this repository.
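As an illustrative sketch only (the variable names below are placeholders; use the actual names defined inside test_evaluation.py):

# Hypothetical names -- refer to the comments in test_evaluation.py
# for the real variables to edit.
MODEL_WEIGHTS_PATH = "checkpoint.ckpt"   # path to the downloaded model weights
TEST_DATA_PATH = "data/test"             # path to your test data
OUTPUT_DIR = "predictions"               # directory where predictions are saved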

Submitting Results

After evaluation, to generate a Kaggle submission file, run the script create_outputs.py.
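For example, assuming both scripts are run from the repository root, the full evaluate-and-submit sequence is:

python test_evaluation.py   # writes predictions to the configured output directory
python create_outputs.py    # converts the predictions into the Kaggle submission file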

Models

The models/ subfolder contains the implementation files for the different models we tested.

Models trained from scratch

These are the models we trained from scratch using the data referenced in train_split.csv.

  • Base U-Net: models/large_unet.py
  • UNet++: models/unetpp.py
  • Swin transformer encoder with a U-Net decoder: models/unet_swin_depth_estimator.py
  • ViT transformer encoder with a U-Net decoder: models/unet_vit_depth_estimator.py

ResNet Model

The model built from a ResNet encoder plus a U-Net decoder uses pre-trained weights for the encoder and was fine-tuned on the training data referenced in train_split.csv. The encoder's pre-trained weights were taken from torchvision.models.ResNet50_Weights. The ResNet-based models are implemented in:

  • ResNet: models/resnet_unet_decoder.py
  • ResNet-Transformer: models/resnet_transformer_unet.py
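For reference, the pre-trained encoder weights mentioned above can be obtained through torchvision (this sketch shows only the weight source, not the repository's encoder/decoder wiring):

import torchvision

# ImageNet pre-trained ResNet-50 weights (torchvision.models.ResNet50_Weights);
# the repository fine-tunes an encoder initialised from these weights.
weights = torchvision.models.ResNet50_Weights.DEFAULT
resnet50 = torchvision.models.resnet50(weights=weights)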

SegFormer Model

The model built from a SegFormer encoder plus a SegFormer decoder (modified by us to output a depth mask) uses pre-trained weights for the encoder and was fine-tuned on the training data referenced in train_split.csv. The encoder's pre-trained weights were taken from nvidia/segformer-b5-finetuned-ade-640-640.

For more details on the model's implementation, refer to models/segformer_depth.py.
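For reference, the pre-trained checkpoint named above can be loaded with the Hugging Face transformers library (a sketch of the weight source only; the depth-specific decoder changes live in models/segformer_depth.py):

from transformers import SegformerForSemanticSegmentation

# Pre-trained SegFormer-B5 checkpoint used to initialise the encoder;
# the decoder is modified in this repository to output a depth mask.
model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/segformer-b5-finetuned-ade-640-640"
)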

Mask2Former Model

The model built from a Mask2Former architecture plus a double convolutional layer uses pre-trained weights for the Mask2Former backbone and was fine-tuned on the training data referenced in train_split.csv. The pre-trained weights were taken from facebook/mask2former-swin-small-coco-instance.

For more details on the model's implementation, refer to models/Mask2Former_depth.py.
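Likewise, the pre-trained Mask2Former checkpoint can be loaded with transformers (a sketch of the weight source only; the depth head is defined in models/Mask2Former_depth.py):

from transformers import Mask2FormerForUniversalSegmentation

# Pre-trained Mask2Former (Swin-Small, COCO instance) checkpoint that the
# repository fine-tunes together with an added double convolutional layer.
model = Mask2FormerForUniversalSegmentation.from_pretrained(
    "facebook/mask2former-swin-small-coco-instance"
)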

Table: Depth estimation performance across different architectures

| Model | Validation Loss | Training Loss | Kaggle Public Score |
|---|---|---|---|
| **U-Net based models** | | | |
| Base U-Net | 0.27101 | 0.32242 | -- |
| UNet++ | 0.36963 | 0.44006 | -- |
| **ResNet based models** | | | |
| ResNet | 0.15886 | 0.09157 | 0.16063 |
| ResNet-Transformer | 0.17421 | 0.11357 | 0.21408 |
| **Mask2Former model** | | | |
| mask2former-swin-small-coco-instance | 0.12976 | 0.097775 | 0.13343 |
| **SegFormer models** | | | |
| segformer-b4-512-512 | 0.14236 | 0.11561 | 0.14048 |
| segformer-b5-640-640 | 0.11474 | 0.15951 | 0.12621 |

Note: The SegFormer-based model achieved a better result than the other models, with a 27.8% lower validation loss than the ResNet-based model and a 68.9% lower validation loss than the UNet++ model.
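These percentages are relative reductions in validation loss computed from the table above, for example:

# Relative reduction in validation loss of segformer-b5-640-640
# versus the ResNet and UNet++ models (values taken from the table).
segformer_b5, resnet, unetpp = 0.11474, 0.15886, 0.36963

print((resnet - segformer_b5) / resnet)    # ~0.278 -> 27.8%
print((unetpp - segformer_b5) / unetpp)    # ~0.690 -> 68.9%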

Model Weights

To evaluate our results, we provide the model weights of the following models:

| Model | File |
|---|---|
| segformer-b4-512-512 | polybox |
| segformer-b5-640-640 | polybox |
| mask2former-swin-small-coco-instance | polybox |
| ResNet50 | polybox |

Comparison between trained models

(Figure: comparison between the trained models.)
