Enquiry regarding training and dataset downloading #29

@gtc-gh

Hi Authors,

Thanks for your great work and your beautiful and clean code!

I have 4 questions about training and dataset downloading.

  1. Can you elaborate on the training details, such as GPU type, number of GPUs, and how long the training process takes?
  2. I followed the download process from the original BEDLAM GitHub repository, but there is a lot of data and I am not sure which subset is used for training. Could you elaborate on that as well? I currently do not have much free disk space.

The reason I ask is that your TRAM script expects JPG images, but I cannot find any JPG files on the official BEDLAM website.

The following script is from the official BEDLAM download page.

#      Run this script with desired target data type from that folder
#      to download data.
#      Do not use `all` if you don't need depth data.
#      We recommend to start with smallest folder first. 
#      + `all`:   will download everything (depth,gt,masks,mp4,png), ~6TB local space needed
#      + `depth`: depth images (EXR), ~3.8TB
#      + `gt`:    scene ground truth (CSV), ~100MB
#      + `masks`: segmentation masks (PNG), ~30GB
#      + `mp4`:   movies (MP4), ~20GB
#      + `png`:   image sequences (PNG), ~2.2TB
#
#      Example: `bash ./be_download.sh mp4` 
  3. I am a little confused by this part of estimate_camera.py:
    camera = {'pred_cam_R': cam_R.numpy(), 'pred_cam_T': cam_T.numpy(),
    'world_cam_R': wd_cam_R.numpy(), 'world_cam_T': wd_cam_T.numpy(),
    'img_focal': cam_int[0], 'img_center': cam_int[2:], 'spec_focal': spec_f}
    I commented out 'pred_cam_R' and 'pred_cam_T' and the results still look good, so I am not sure what these two values are used for.
    It looks like 'world_cam_R' and 'world_cam_T' together form the extrinsic matrix.
    For the intrinsic matrix, is fx = fy = 'img_focal' and (cx, cy) = 'img_center' in your case? Have you tried using different focal lengths for x and y? What does 'spec_focal' mean? And is the skew factor s equal to 0 in all of your cases?

  4. I noticed that on relatively dense multi-person videos there is an obvious drop in model performance: people float in the air and the model struggles to place them on the same plane. Have you encountered this, and do you have any possible solutions?
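
To make the intrinsics question above concrete, this is the pinhole model I am assuming. It is only my own sketch, not code from the repository: the helper names `intrinsic_matrix` and `project` are hypothetical, the numeric values are made up for illustration, and it assumes a single focal length (fx = fy = 'img_focal'), a principal point taken from 'img_center', and zero skew. Please correct me if the actual convention is different.

```python
def intrinsic_matrix(img_focal, img_center, skew=0.0):
    """Build a 3x3 pinhole intrinsic matrix, assuming fx == fy == img_focal."""
    cx, cy = img_center
    return [
        [img_focal, skew, cx],
        [0.0, img_focal, cy],
        [0.0, 0.0, 1.0],
    ]


def project(K, point_cam):
    """Project a 3D point given in camera coordinates to pixel coordinates."""
    x, y, z = point_cam
    u = (K[0][0] * x + K[0][1] * y + K[0][2] * z) / z
    v = (K[1][1] * y + K[1][2] * z) / z
    return u, v


# Made-up values for illustration only (not taken from any real video).
K = intrinsic_matrix(img_focal=1000.0, img_center=(640.0, 360.0))
u, v = project(K, (0.1, -0.2, 2.0))
print(u, v)
```

If fx and fy were estimated separately, `K[1][1]` would hold its own value instead of reusing `img_focal`, which is why I am asking whether that case was ever tried.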

tram_output.mp4

Thanks a lot for your time and help in advance!
