
Questions Regarding Model Design and Dataset #9

@jhkwag970

Description


Hello,

Thank you again for your excellent work. I have a few questions regarding the model design and dataset:

  1. In the token decoder, you used 5×5 masked attention for self-attention and a 3×3 convolution block at the end. Have you tried full (unmasked) attention in place of the masked attention, keeping the 3×3 convolution block at the end? Masked attention is known to perform better in segmentation tasks because of its local-region awareness, but the token decoder here performs index prediction rather than segmentation.
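For reference, this is how I understand the 5×5 masked self-attention constraint: each token may only attend to tokens inside a centered 5×5 spatial window. A minimal NumPy sketch (the function name and grid size are my own, not from your repo):

```python
import numpy as np

def local_attention_mask(h, w, window=5):
    """Boolean mask of shape (h*w, h*w): entry (q, k) is True when
    query token q may attend to key token k, i.e. k lies within a
    centered window x window neighborhood of q on the (h, w) grid."""
    n = h * w
    half = window // 2
    mask = np.zeros((n, n), dtype=bool)
    for q in range(n):
        qi, qj = divmod(q, w)          # grid coordinates of the query
        for k in range(n):
            ki, kj = divmod(k, w)      # grid coordinates of the key
            if abs(qi - ki) <= half and abs(qj - kj) <= half:
                mask[q, k] = True
    return mask

# On a small 4x4 grid, the corner token (0, 0) sees the 3x3 block of
# in-bounds neighbors, not the full grid.
m = local_attention_mask(4, 4, window=5)
```

Full attention would simply replace this mask with all-True, which is the variant I am asking about.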

  2. The BEV labels from the nuScenes dataset do not appear to be rotated upward.

[Image: BEV label at timestamp 1]

[Image: BEV label at timestamp 2]

The first image is from timestamp 1, and the second image is from timestamp 2. As you can see, the vehicle moves from left to right. Did you rotate the BEV map 90 degrees counterclockwise for visualization in the paper?
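To make the question concrete, the rotation I am asking about would amount to the following (the array contents are hypothetical; `np.rot90` rotates counterclockwise by default):

```python
import numpy as np

# Hypothetical BEV label grid: one occupied cell near the right edge,
# matching a vehicle that has moved from left to right.
bev = np.zeros((4, 6), dtype=int)
bev[1, 4] = 1

# Rotate 90 degrees counterclockwise for visualization,
# so that "right" in the raw label becomes "up" in the figure.
rot = np.rot90(bev)
```

If the paper's figures were produced with such a rotation, that would explain the orientation difference I am seeing.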

  3. For the MLP head, transformer-based autoregressive models typically use a simple linear head with cross-entropy loss. However, you used an MLP with focal loss. I understand that focal loss addresses class imbalance, but I’m curious how much performance gain you observed from using both the MLP and focal loss.
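For context, this is the focal loss formulation I have in mind, where gamma = 0 recovers plain cross-entropy (the single-token NumPy sketch and default gamma = 2 are my assumptions, not taken from your code):

```python
import numpy as np

def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one token: -(1 - p_t)^gamma * log(p_t),
    where p_t is the softmax probability of the target class.
    With gamma = 0 this reduces to standard cross-entropy."""
    z = logits - logits.max()          # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum()
    pt = p[target]
    return -((1.0 - pt) ** gamma) * np.log(pt)

# The focusing term (1 - p_t)^gamma down-weights well-classified
# tokens, so the focal value is always <= the cross-entropy value.
logits = np.array([2.0, 0.5, 0.1])
ce = focal_loss(logits, 0, gamma=0.0)  # plain cross-entropy
fl = focal_loss(logits, 0, gamma=2.0)  # focal loss
```

My question is essentially how much of the reported gain comes from this down-weighting versus from the extra MLP capacity.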

Thank you again for your great work.
