Thank you for your work!
I ran inference on custom data using only one frame with 6 surround-view images.
The prediction results are poor; in particular, the relative positions of the surround views seem to be predicted incorrectly.
Could you please help me understand what might be going wrong?
Does this have anything to do with the input order of the images?

(Image A: ground-truth pointmap)

(Image B: inference result)
I also cropped some of the image borders, and the results became even worse.
Does the model require all surround-view images to have the same size?

(Image C: inference result after cropping)
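For reference, this is roughly how I handled the crop. I did not adjust any camera intrinsics, since I'm not sure whether the model consumes them; if it does, cropping shifts the principal point, and the sketch below (my own illustration with made-up matrix values, not this repo's API) shows the correction I believe would be needed:

```python
import numpy as np

def crop_with_intrinsics(image, K, top, left, new_h, new_w):
    """Crop an image and shift the principal point so the
    pinhole intrinsics still describe the cropped view.

    image: (H, W, C) array
    K:     3x3 intrinsic matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
    """
    cropped = image[top:top + new_h, left:left + new_w]
    K_new = K.astype(float).copy()
    K_new[0, 2] -= left  # cx shifts by the left crop offset
    K_new[1, 2] -= top   # cy shifts by the top crop offset
    return cropped, K_new

# toy example: 900x1600 image, 100 px cropped from the top and left
img = np.zeros((900, 1600, 3), dtype=np.uint8)
K = np.array([[1266.0, 0.0, 800.0],
              [0.0, 1266.0, 450.0],
              [0.0, 0.0, 1.0]])
cropped, K_new = crop_with_intrinsics(img, K, top=100, left=100,
                                      new_h=800, new_w=1500)
print(cropped.shape, K_new[0, 2], K_new[1, 2])  # (800, 1500, 3) 700.0 350.0
```

Is a correction like this expected on the model's input side, or does the model estimate geometry purely from the images?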