Description
Hi, thanks for sharing your code.
I was pretraining the model, but got the following error after running `python train.py --cfg_path ./configs/clay/clay_seq2seq_all.yaml`:
```
2024-04-12 17:17:45 [root] INFO Setup output directory - experiments/clay_seq2seq_upsample_manual.
2024-04-12 17:17:59 [dataloader] INFO Setup [clay] dataset in [Seq2Seq] mode.
2024-04-12 17:17:59 [dataloader] INFO [clay] dataset in [Seq2Seq] mode => Test dataset False.
2024-04-12 17:17:59 [model] INFO Setup model Rel2Bbox.
2024-04-12 17:17:59 [model] INFO Model structure:
2024-04-12 17:18:00 [model] INFO Rel2Bbox(
  (encoder): RelEncoder(
    (input_embeddings): Sentence_Embeddings(
      (word_embeddings): Embedding(34, 256, padding_idx=0)
      (obj_id_embeddings): Embedding(300, 256, padding_idx=0)
      (parent_id_embeddings): Embedding(300, 256, padding_idx=0)
      (sentence_type): Embedding(33, 256, padding_idx=0)
      (token_type): Embedding(4, 256, padding_idx=0)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): TransformerEncoder(num_layers=8, num_heads=8)
    (vocab_classifier): Linear(in_features=256, out_features=34, bias=True)
    (obj_id_classifier): Linear(in_features=256, out_features=300, bias=True)
    (parent_id_classifier): Linear(in_features=256, out_features=300, bias=True)
    (token_type_classifier): Linear(in_features=256, out_features=4, bias=True)
  )
  (bbox_head): BBox_Head(
    (Decoder): PDFDecoder(
      (box_embedding): Linear(in_features=4, out_features=64, bias=True)
      (output_Layer): Linear(in_features=576, out_features=256, bias=True)
      (latent_transformer): Linear(in_features=256, out_features=192, bias=True)
      (decoder): CustomTransformerDecoder(num_layers=2, num_heads=2)
      (box_predictor): GMM_head(
        (xy_bivariate): Linear(in_features=256, out_features=30, bias=True)
        (xy_embedding): Linear(in_features=2, out_features=64, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
        (wh_bivariate): Linear(in_features=320, out_features=30, bias=True)
      )
    )
    (refine_encoder): Refine_Encoder(
      (box_embedding): Linear(in_features=4, out_features=64, bias=True)
      (layer): TransformerRefineLayer(
        (layer_norm): LayerNorm((256,), eps=1e-06, elementwise_affine=True)
        (box_norm): LayerNorm((64,), eps=1e-06, elementwise_affine=True)
        (src_src_att): Custom_Attention(
          (k_layer): Linear(in_features=64, out_features=64, bias=True)
          (v_layer): Linear(in_features=256, out_features=256, bias=True)
          (q_layer): Linear(in_features=64, out_features=64, bias=True)
          (confident_layer): Sequential(
            (0): Linear(in_features=64, out_features=64, bias=True)
            (1): ReLU()
          )
          (output_layer): Linear(in_features=256, out_features=256, bias=True)
          (softmax): Softmax(dim=-1)
          (dropout): Dropout(p=0.1, inplace=False)
        )
        (combine_layer): Linear(in_features=320, out_features=256, bias=True)
        (feed_forward): PositionwiseFeedForward(
          (layer_norm): LayerNorm((256,), eps=1e-06, elementwise_affine=True)
          (pwff_layer): Sequential(
            (0): Linear(in_features=256, out_features=1024, bias=True)
            (1): GELU()
            (2): Dropout(p=0.1, inplace=False)
            (3): Linear(in_features=1024, out_features=256, bias=True)
            (4): Dropout(p=0.1, inplace=False)
          )
        )
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (layer_norm): LayerNorm((256,), eps=1e-06, elementwise_affine=True)
      (emb_dropout): Dropout(p=0.1, inplace=False)
    )
    (refine_box_head): Linear_head(
      (box_embedding): Linear(in_features=4, out_features=64, bias=True)
      (dense): Linear(in_features=320, out_features=64, bias=True)
      (feed_forward): Linear(in_features=64, out_features=4, bias=True)
      (activation): Sigmoid()
    )
  )
)
Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    capturable: False
    differentiable: False
    eps: 1e-08
    foreach: None
    fused: None
    lr: 0.001
    maximize: False
    weight_decay: 0.01
)
2024-04-12 17:18:01 [scheduler] INFO Setup scheduler BertScheduler.
2024-04-12 17:18:01 [dataloader] INFO Setup trainer PretrainTrainer.
2024-04-12 17:18:01 [PretrainTrainer] INFO [Phase: train, Epoch: 0]
2024-04-12 17:18:07 [PretrainTrainer] INFO [1/1007] Loss: 482.7012 Loss_position: 16.2822 Loss_size: 7.2394 Loss_vocab: 3.7860 Loss_obj_id: 2.3140 Loss_parent_id: 1.6424 Loss_token_type: 0.0149 Loss_box: [45.4274,362.9301] Loss_kl: [0.3773,0.0000] Loss_rel: [25.7745, 5.0757] Co IOU: 0.0000 Re IOU: 0.0186 overlap loss: 2.3434 overlap loss inside: 9.4938
D:\MLLMforGUI\GUILGET-main\trainer\loss.py:447: UserWarning: Using a target size (torch.Size([461, 2])) that is different to the input size (torch.Size([462, 2])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  xy_loss = F.mse_loss(sub_pred-obj_pred, sample_rel, reduction='sum')
Traceback (most recent call last):
  File "D:\MLLMforGUI\GUILGET-main\train.py", line 68, in <module>
    T.train()
  File "D:\MLLMforGUI\GUILGET-main\trainer\Pretrain.py", line 111, in train
    log = self._run_epoch(i, 'train', mode)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\MLLMforGUI\GUILGET-main\trainer\Pretrain.py", line 281, in _run_epoch
    rel_loss, rel2_loss = self.rel_loss(coarse_gmm, coarse_box_label)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\MLLMforGUI\GUILGET-main\trainer\loss.py", line 447, in forward
    xy_loss = F.mse_loss(sub_pred-obj_pred, sample_rel, reduction='sum')
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\torch\nn\functional.py", line 3338, in mse_loss
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\torch\functional.py", line 76, in broadcast_tensors
    return _VF.broadcast_tensors(tensors)  # type: ignore[attr-defined]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The size of tensor a (462) must match the size of tensor b (461) at non-singleton dimension 0
```
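For reference, the failure reproduces outside the repository with nothing but mismatched shapes. This minimal snippet is my own (the names and the 462/461 shapes are taken from the warning above, not from the GUILGET code) and raises the same RuntimeError:

```python
import torch
import torch.nn.functional as F

# Shapes taken from the UserWarning above: the prediction has 462 rows,
# the target 461, so broadcasting fails at dimension 0.
pred = torch.randn(462, 2)    # stands in for sub_pred - obj_pred
target = torch.randn(461, 2)  # stands in for sample_rel

# RuntimeError: The size of tensor a (462) must match the size of
# tensor b (461) at non-singleton dimension 0
xy_loss = F.mse_loss(pred, target, reduction='sum')
```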
Could you please help me get it working?
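In the meantime, the only workaround I have found is to truncate input and target to their common length before computing the loss, as in the sketch below (`sub_pred`, `obj_pred`, and `sample_rel` are the variables from `trainer/loss.py` line 447; the truncation itself is my own addition, not the repository's code). It silences the crash, but I suspect it only hides the symptom, since the extra row probably points at a mismatch between predictions and labels earlier in the pipeline:

```python
import torch
import torch.nn.functional as F

def xy_loss_truncated(sub_pred: torch.Tensor,
                      obj_pred: torch.Tensor,
                      sample_rel: torch.Tensor) -> torch.Tensor:
    """Hypothetical drop-in replacement for the line at trainer/loss.py:447.

    Truncates both sides to their common length so the MSE operands agree
    at dimension 0. This avoids the RuntimeError but does not explain
    where the extra prediction row comes from.
    """
    diff = sub_pred - obj_pred
    n = min(diff.size(0), sample_rel.size(0))
    return F.mse_loss(diff[:n], sample_rel[:n], reduction='sum')
```

Is this one-off difference expected, or does it indicate something wrong with my dataset preprocessing?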