Description
For Training:
CUDA_VISIBLE_DEVICES="2" nohup python -u main.py fit \
  -c configs/dinov3/painting/semantic/eomt_large_512.yaml \
  --trainer.devices 1 \
  --data.batch_size 16 \
  --data.path /mnt/lepeng/ \
  --model.ckpt_path ./model_zoo/pytorch_model.bin \
  --model.load_ckpt_class_head False > painting_seg.log 2>&1 &
Epoch 0: 100%|██████████| 700/700 [09:15<00:00, 1.26it/s, v_num=hxdk, losses/train_loss_total=4.010]mIoU: 96.0
Epoch 1: 100%|██████████| 700/700 [09:04<00:00, 1.29it/s, v_num=hxdk, losses/train_loss_total=1.740]mIoU: 97.8
Epoch 2: 100%|██████████| 700/700 [08:53<00:00, 1.31it/s, v_num=hxdk, losses/train_loss_total=1.250]mIoU: 98.8
Epoch 3: 100%|██████████| 700/700 [09:14<00:00, 1.26it/s, v_num=hxdk, losses/train_loss_total=0.983]mIoU: 99.1
Epoch 4: 100%|██████████| 700/700 [08:47<00:00, 1.33it/s, v_num=hxdk, losses/train_loss_total=2.380]mIoU: 99.1
Epoch 5: 100%|██████████| 700/700 [08:49<00:00, 1.32it/s, v_num=hxdk, losses/train_loss_total=1.000]mIoU: 99.2
Epoch 6: 100%|██████████| 700/700 [08:48<00:00, 1.33it/s, v_num=hxdk, losses/train_loss_total=1.080]mIoU: 98.6
Epoch 7: 100%|██████████| 700/700 [08:48<00:00, 1.32it/s, v_num=hxdk, losses/train_loss_total=1.090]mIoU: 99.0
Epoch 8: 100%|██████████| 700/700 [08:47<00:00, 1.33it/s, v_num=hxdk, losses/train_loss_total=1.290]mIoU: 99.2
Epoch 9: 100%|██████████| 700/700 [08:49<00:00, 1.32it/s, v_num=hxdk, losses/train_loss_total=1.050]mIoU: 99.3
Epoch 10: 100%|██████████| 700/700 [08:46<00:00, 1.33it/s, v_num=hxdk, losses/train_loss_total=1.060]mIoU: 99.4
Epoch 11: 100%|██████████| 700/700 [08:46<00:00, 1.33it/s, v_num=hxdk, losses/train_loss_total=0.999]mIoU: 99.4
Epoch 12: 100%|██████████| 700/700 [08:49<00:00, 1.32it/s, v_num=hxdk, losses/train_loss_total=1.140]mIoU: 99.5
Epoch 13: 100%|██████████| 700/700 [08:48<00:00, 1.32it/s, v_num=hxdk, losses/train_loss_total=0.726]mIoU: 99.5
Epoch 14: 100%|██████████| 700/700 [08:48<00:00, 1.33it/s, v_num=hxdk, losses/train_loss_total=0.959]mIoU: 99.6
Epoch 15: 100%|██████████| 700/700 [08:49<00:00, 1.32it/s, v_num=hxdk, losses/train_loss_total=0.861]mIoU: 99.6
Trainer.fit stopped: max_epochs=16 reached.
(the image from ./wandb/offline-run-20251028_000101-ac9nhxdk/files/media/images)
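To compare the training curve with the validation number, the per-epoch loss and mIoU can be pulled out of painting_seg.log. This is only a rough sketch that assumes every line has the same shape as the ones pasted above; `parse_training_log` is a helper name made up for this example, not something from the repository.

```python
import re

# Matches lines shaped like the pasted log, e.g.
# "Epoch 3: 100%|...| 700/700 [...], losses/train_loss_total=0.983]mIoU: 99.1"
LINE_RE = re.compile(
    r"Epoch (\d+):.*?train_loss_total=([0-9.]+)\]mIoU: ([0-9.]+)"
)


def parse_training_log(text):
    """Return {epoch: (train_loss, miou)} for every line that carries an mIoU."""
    results = {}
    # splitlines() also splits on carriage returns, which progress bars
    # written through nohup tend to leave in the file.
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            epoch, loss, miou = match.groups()
            results[int(epoch)] = (float(loss), float(miou))
    return results


# Three lines copied from the log above (progress bar shortened).
sample = (
    "Epoch 0: 100%|#| 700/700 [09:15<00:00, 1.26it/s, v_num=hxdk, "
    "losses/train_loss_total=4.010]mIoU: 96.0\n"
    "Epoch 15: 100%|#| 700/700 [08:49<00:00, 1.32it/s, v_num=hxdk, "
    "losses/train_loss_total=0.861]mIoU: 99.6\n"
    "Epoch 15: 100%|#| 700/700 [08:49<00:00, 1.32it/s, v_num=hxdk, "
    "losses/train_loss_total=0.861]Trainer.fit stopped: max_epochs=16 reached.\n"
)

history = parse_training_log(sample)
last_epoch = max(history)
print(f"epoch {last_epoch}: loss={history[last_epoch][0]}, mIoU={history[last_epoch][1]}")
```

On the full log this would make it easy to confirm that the in-training mIoU stays above 96 for all 16 epochs, so the gap appears only when the checkpoint is validated separately.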
For Validation:
CUDA_VISIBLE_DEVICES="2" python3 main.py validate \
  -c configs/dinov3/painting/semantic/eomt_large_512.yaml \
  --model.network.masked_attn_enabled False \
  --trainer.devices 1 \
  --data.batch_size 4 \
  --data.path /mnt/lepeng/ \
  --model.ckpt_path /home/ext_disk1/lepeng/eomt/eomt/ac9nhxdk/checkpoints/epoch=15-step=11200.ckpt
Seed set to 0
INFO:root:Delta weights mode
INFO:root:Zeroed 11,667,459 / 314,796,035 parameters (everything not under 'network.encoder.')
INFO:root:Loaded 436 keys
Using 16bit Automatic Mixed Precision (AMP)
Using default ModelCheckpoint. Consider installing litmodels package to enable LitModelCheckpoint for automatic upload to the Lightning model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
wandb: WARNING resume will be ignored since W&B syncing is set to offline. Starting a new run with run id fth2gp2q.
wandb: Tracking run with wandb version 0.19.10
wandb: W&B syncing is set to offline in this directory. Run wandb online or set WANDB_MODE=online to enable cloud syncing.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [2]
Validation DataLoader 0: 100%|███████████████████████████████████████████████████████████████████████████| 478/478 [01:22<00:00, 5.77it/s]
mIoU: 31.3
────────────────────────────────────────────────────────────
     Validate metric               DataLoader 0
────────────────────────────────────────────────────────────
   metrics/val_iou_all          0.3132353127002716
────────────────────────────────────────────────────────────
inference.ipynb
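Is there an error somewhere? Training reports mIoU 99.6 after the final epoch, but validating the saved checkpoint (epoch=15-step=11200.ckpt) gives only 31.3.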
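One thing that may be worth checking is the "Delta weights mode" message in the validation log, which says that 11,667,459 of 314,796,035 parameters (everything not under 'network.encoder.') were zeroed. If that applies to the trained heads of my own checkpoint, a low mIoU would not be surprising, but this is only a guess about what the loader does. The sketch below shows the kind of check I have in mind on a toy state dict; the parameter names other than the 'network.encoder.' prefix are hypothetical, and plain lists stand in for tensors to keep it dependency-free.

```python
def count_outside_prefix(state, prefix="network.encoder."):
    """Return (elements outside prefix, total elements) for a name -> values mapping."""
    outside = sum(len(v) for name, v in state.items() if not name.startswith(prefix))
    total = sum(len(v) for v in state.values())
    return outside, total


def zeroed_outside_prefix(state, prefix="network.encoder."):
    """Return the names outside prefix whose values are all zero."""
    return sorted(
        name
        for name, values in state.items()
        if not name.startswith(prefix) and all(v == 0 for v in values)
    )


# Toy stand-in for the model state after loading (names are hypothetical).
loaded_state = {
    "network.encoder.blocks.0.weight": [0.12, -0.40, 0.33],
    "network.mask_head.weight": [0.0, 0.0],
    "network.class_head.weight": [0.0, 0.0, 0.0],
}

outside, total = count_outside_prefix(loaded_state)
print(f"{outside} / {total} elements are outside 'network.encoder.'")
print("all-zero entries outside the encoder:", zeroed_outside_prefix(loaded_state))
```

With real PyTorch tensors the all-zero test could be written as `int(tensor.count_nonzero()) == 0`. If the head weights come back as zeros after loading my checkpoint, that would point at the loading path rather than at training.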