I've run into a confusing situation: the conductor is still running as a background process and all of my jobs have completed successfully, but running maestro status still lists the jobs that were run via SLURM as "PENDING".
Is there any way to figure out what has gone wrong, or otherwise reset the conductor process?
Running ps aux | grep $USER, I see:
login4.stampede3(626)$ ps aux | grep $USER
bwibking 1549018 0.0 0.0 20820 11936 ? Ss Apr30 0:09 /usr/lib/systemd/systemd --user
bwibking 1549019 0.0 0.0 202568 6848 ? S Apr30 0:00 (sd-pam)
bwibking 1725176 0.0 0.0 8436 3692 ? S Apr30 0:00 /bin/sh -c nohup conductor -t 60 -d 2 /scratch/02661/bwibking/precipitator-paper/outputs/medres_compressive_20240430-160408 > /scratch/02661/bwibking/precipitator-paper/outputs/medres_compressive_20240430-160408/medres_compressive.txt 2>&1
bwibking 1725177 0.0 0.0 325552 67876 ? S Apr30 0:32 /scratch/projects/compilers/intel24.0/oneapi/intelpython/python3.9/bin/python3.9 /home1/02661/bwibking/.local/bin/conductor -t 60 -d 2 /scratch/02661/bwibking/precipitator-paper/outputs/medres_compressive_20240430-160408
root 3152422 0.0 0.0 39932 11976 ? Ss 13:31 0:00 sshd: bwibking [priv]
bwibking 3152677 0.0 0.0 40240 7472 ? S 13:31 0:00 sshd: bwibking@pts/122
bwibking 3152678 0.0 0.0 17920 6116 pts/122 Ss 13:31 0:00 -bash
root 3163591 0.0 0.0 39932 12044 ? Ss 13:43 0:00 sshd: bwibking [priv]
bwibking 3163629 0.0 0.0 40076 7236 ? S 13:43 0:00 sshd: bwibking@notty
root 3168472 0.0 0.0 39932 12020 ? Ss 13:48 0:00 sshd: bwibking [priv]
bwibking 3168505 0.0 0.0 40080 7208 ? S 13:48 0:00 sshd: bwibking@notty
bwibking 3174162 0.0 0.0 20408 4220 pts/122 R+ 13:54 0:00 ps aux
bwibking 3174163 0.0 0.0 7592 2836 pts/122 S+ 13:54 0:00 grep --color=auto bwibking
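To pick the conductor processes out of a listing like the one above, here is a minimal Python sketch. It is not part of Maestro: the function name find_conductor_pids is hypothetical, and it assumes the standard 11-column ps aux layout with the full command in the last column.

```python
def find_conductor_pids(ps_output: str) -> list[int]:
    """Return PIDs of `ps aux` lines whose command line invokes conductor.

    Hypothetical helper, assuming the usual 11-column `ps aux` layout:
    USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    """
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header line and anything malformed
        tokens = fields[10].split()
        # Match a bare "conductor" or a path ending in "/conductor".
        if any(t == "conductor" or t.endswith("/conductor") for t in tokens):
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    import subprocess

    listing = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    print(find_conductor_pids(listing))
```

Applied to the listing above, this would report the /bin/sh wrapper started with nohup and the Python process it launched.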
Partial output from maestro status:
run-sim_amp.0.0002222222222222222.f_sol.0.0.tc_tff.1.333521432163324 3 run-sim/amp.0.0002222222222222222.f_sol.0.0.tc_tff.1.333521432163324 PENDING --:--:-- --:--:-- -- 2024-04-30 16:06:52 -- 0
run-sim_amp.0.0003888888888888888.f_sol.0.0.tc_tff.4.216965034285822 3 run-sim/amp.0.0003888888888888888.f_sol.0.0.tc_tff.4.216965034285822 PENDING --:--:-- --:--:-- -- 2024-04-30 16:06:54 -- 0
run-sim_amp.0.00011111111111111112.f_sol.0.0.tc_tff.2.371373705661655 3 run-sim/amp.0.00011111111111111112.f_sol.0.0.tc_tff.2.371373705661655 PENDING --:--:-- --:--:-- -- 2024-04-30 16:06:55 -- 0
run-sim_amp.0.0002777777777777778.f_sol.0.0.tc_tff.7.498942093324558 3 run-sim/amp.0.0002777777777777778.f_sol.0.0.tc_tff.7.498942093324558 PENDING --:--:-- --:--:-- -- 2024-04-30 16:06:57 -- 0
run-sim_amp.0.00044444444444444447.f_sol.0.0.tc_tff.1.1547819846894583 3 run-sim/amp.0.00044444444444444447.f_sol.0.0.tc_tff.1.1547819846894583 PENDING --:--:-- --:--:-- -- 2024-04-30 16:06:58 -- 0
run-sim_amp.1.8518518518518515e-05.f_sol.0.0.tc_tff.3.651741272548377 3 run-sim/amp.1.8518518518518515e-05.f_sol.0.0.tc_tff.3.651741272548377 PENDING --:--:-- --:--:-- -- 2024-04-30 16:07:00 -- 0
run-sim_amp.0.00018518518518518518.f_sol.0.0.tc_tff.2.0535250264571463 3 run-sim/amp.0.00018518518518518518.f_sol.0.0.tc_tff.2.0535250264571463 PENDING --:--:-- --:--:-- -- 2024-04-30 16:07:02 -- 0
run-sim_amp.0.0003518518518518519.f_sol.0.0.tc_tff.6.493816315762113 3 run-sim/amp.0.0003518518518518519.f_sol.0.0.tc_tff.6.493816315762113 PENDING --:--:-- --:--:-- -- 2024-04-30 16:07:04 -- 0
run-sim_amp.7.407407407407406e-05.f_sol.0.0.tc_tff.1.539926526059492 3 run-sim/amp.7.407407407407406e-05.f_sol.0.0.tc_tff.1.539926526059492 PENDING --:--:-- --:--:-- -- 2024-04-30 16:07:06 -- 0
run-sim_amp.0.00024074074074074072.f_sol.0.0.tc_tff.4.869675251658631 3 run-sim/amp.0.00024074074074074072.f_sol.0.0.tc_tff.4.869675251658631 PENDING --:--:-- --:--:-- -- 2024-04-30 16:07:08 -- 0