Please check the training/validation code: the validation result differs from the training result. Could you also provide prediction code? #52

@MachineLP


For training:
CUDA_VISIBLE_DEVICES="2" \
nohup python -u main.py fit \
-c configs/dinov3/painting/semantic/eomt_large_512.yaml \
--trainer.devices 1 \
--data.batch_size 16 \
--data.path /mnt/lepeng/ \
--model.ckpt_path ./model_zoo/pytorch_model.bin \
--model.load_ckpt_class_head False > painting_seg.log 2>&1 &

Epoch 0: 100%|██████████| 700/700 [09:15<00:00, 1.26it/s, v_num=hxdk, losses/train_loss_total=4.010]mIoU: 96.0
Epoch 1: 100%|██████████| 700/700 [09:04<00:00, 1.29it/s, v_num=hxdk, losses/train_loss_total=1.740]mIoU: 97.8
Epoch 2: 100%|██████████| 700/700 [08:53<00:00, 1.31it/s, v_num=hxdk, losses/train_loss_total=1.250]mIoU: 98.8
Epoch 3: 100%|██████████| 700/700 [09:14<00:00, 1.26it/s, v_num=hxdk, losses/train_loss_total=0.983]mIoU: 99.1
Epoch 4: 100%|██████████| 700/700 [08:47<00:00, 1.33it/s, v_num=hxdk, losses/train_loss_total=2.380]mIoU: 99.1
Epoch 5: 100%|██████████| 700/700 [08:49<00:00, 1.32it/s, v_num=hxdk, losses/train_loss_total=1.000]mIoU: 99.2
Epoch 6: 100%|██████████| 700/700 [08:48<00:00, 1.33it/s, v_num=hxdk, losses/train_loss_total=1.080]mIoU: 98.6
Epoch 7: 100%|██████████| 700/700 [08:48<00:00, 1.32it/s, v_num=hxdk, losses/train_loss_total=1.090]mIoU: 99.0
Epoch 8: 100%|██████████| 700/700 [08:47<00:00, 1.33it/s, v_num=hxdk, losses/train_loss_total=1.290]mIoU: 99.2
Epoch 9: 100%|██████████| 700/700 [08:49<00:00, 1.32it/s, v_num=hxdk, losses/train_loss_total=1.050]mIoU: 99.3
Epoch 10: 100%|██████████| 700/700 [08:46<00:00, 1.33it/s, v_num=hxdk, losses/train_loss_total=1.060]mIoU: 99.4
Epoch 11: 100%|██████████| 700/700 [08:46<00:00, 1.33it/s, v_num=hxdk, losses/train_loss_total=0.999]mIoU: 99.4
Epoch 12: 100%|██████████| 700/700 [08:49<00:00, 1.32it/s, v_num=hxdk, losses/train_loss_total=1.140]mIoU: 99.5
Epoch 13: 100%|██████████| 700/700 [08:48<00:00, 1.32it/s, v_num=hxdk, losses/train_loss_total=0.726]mIoU: 99.5
Epoch 14: 100%|██████████| 700/700 [08:48<00:00, 1.33it/s, v_num=hxdk, losses/train_loss_total=0.959]mIoU: 99.6
Epoch 15: 100%|██████████| 700/700 [08:49<00:00, 1.32it/s, v_num=hxdk, losses/train_loss_total=0.861]mIoU: 99.6
Epoch 15: 100%|██████████| 700/700 [08:49<00:00, 1.32it/s, v_num=hxdk, losses/train_loss_total=0.861]Trainer.fit stopped: max_epochs=16 reached.
Epoch 15: 100%|██████████| 700/700 [09:14<00:00, 1.26it/s, v_num=hxdk, losses/train_loss_total=0.861]

(Image from ./wandb/offline-run-20251028_000101-ac9nhxdk/files/media/images)
Image

For validation:
CUDA_VISIBLE_DEVICES="2" \
python3 main.py validate \
-c configs/dinov3/painting/semantic/eomt_large_512.yaml \
--model.network.masked_attn_enabled False \
--trainer.devices 1 \
--data.batch_size 4 \
--data.path /mnt/lepeng/ \
--model.ckpt_path /home/ext_disk1/lepeng/eomt/eomt/ac9nhxdk/checkpoints/epoch=15-step=11200.ckpt

Seed set to 0
INFO:root:Delta weights mode
INFO:root:Zeroed 11,667,459 / 314,796,035 parameters (everything not under 'network.encoder.')
INFO:root:Loaded 436 keys
Using 16bit Automatic Mixed Precision (AMP)
Using default ModelCheckpoint. Consider installing litmodels package to enable LitModelCheckpoint for automatic upload to the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
wandb: WARNING resume will be ignored since W&B syncing is set to offline. Starting a new run with run id fth2gp2q.
wandb: Tracking run with wandb version 0.19.10
wandb: W&B syncing is set to offline in this directory. Run wandb online or set WANDB_MODE=online to enable cloud syncing.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [2]
Validation DataLoader 0: 100%|███████████████████████████████████████████████████████████████████████████| 478/478 [01:22<00:00, 5.77it/s]
mIoU: 31.3
──────────────────────────────────────────────────
     Validate metric            DataLoader 0
──────────────────────────────────────────────────
   metrics/val_iou_all       0.3132353127002716
──────────────────────────────────────────────────
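
The validation log above reports "Delta weights mode" and "Zeroed 11,667,459 / 314,796,035 parameters (everything not under 'network.encoder.')", so it may be worth checking how the checkpoint's parameters split between the encoder prefix and everything else. Below is a minimal sketch of that count in plain Python. The helper name `summarize_by_prefix` and the toy parameter names are hypothetical and only illustrate the idea; they are not taken from the repository.

```python
def summarize_by_prefix(param_sizes, prefix="network.encoder."):
    """Count parameter elements inside and outside a key prefix.

    param_sizes maps a parameter name to its number of elements.
    """
    inside = sum(n for name, n in param_sizes.items() if name.startswith(prefix))
    total = sum(param_sizes.values())
    return {"inside": inside, "outside": total - inside, "total": total}


# Toy input (hypothetical names and sizes, for illustration only).
sizes = {
    "network.encoder.blocks.0.weight": 6,
    "network.class_head.weight": 3,
    "network.q.weight": 1,
}
summary = summarize_by_prefix(sizes)
print(summary)
```

For a real checkpoint, `param_sizes` could be built with PyTorch, for example `{k: v.numel() for k, v in ckpt["state_dict"].items()}` after `ckpt = torch.load(path, map_location="cpu")`. This assumes the file is a Lightning-style checkpoint with a `state_dict` entry. If the "outside" count matches the number of zeroed parameters in the log, the non-encoder weights from training may not be the ones used at validation time, but that is a guess to verify, not a confirmed diagnosis.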

Image

inference.ipynb
Is there an error somewhere?
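
One way to cross-check the reported mIoU independently of the training loop is to recompute it from predicted and ground-truth class IDs. The following is a sketch in plain Python, assuming flattened integer label maps and an ignore index of 255. Both are assumptions, and the repository's own metric may differ, for example in how it treats classes that never appear.

```python
def mean_iou(preds, targets, num_classes, ignore_index=255):
    """Mean IoU over classes that appear in the predictions or targets.

    preds and targets are flat sequences of integer class IDs.
    """
    inter = [0] * num_classes  # true positives per class
    union = [0] * num_classes  # TP + FP + FN per class
    for p, t in zip(preds, targets):
        if t == ignore_index:
            continue  # skip pixels without a valid label
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:
            union[t] += 1  # false negative for the true class
            if 0 <= p < num_classes:
                union[p] += 1  # false positive for the predicted class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: two classes, last pixel ignored.
targets = [0, 0, 1, 1, 255]
preds = [0, 1, 1, 1, 0]
miou = mean_iou(preds, targets, num_classes=2)
print(round(miou * 100, 1))
```

If a value computed this way from the notebook's predictions is close to the training-time mIoU while `main.py validate` still reports about 31, the gap more likely comes from how the checkpoint is loaded than from the metric itself.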
Image

Image Image
