Hi ruffus team,
I'm using the drmaa wrapper to submit and run jobs on an SGE cluster, and I keep hitting DRMAA communication exceptions (traceback below) that I've been trying to resolve (related issue: aws/aws-parallelcluster#1592). Has the ruffus team encountered this error? If not, is there a resubmit/retry feature that is ready to use? Although it isn't explicitly documented, the run_job function appears to accept a resubmit parameter.
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] File "/shared/amgenesis/helpers.py", line 126, in run
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] cmdline.run (options, logger=logger_proxy, multithread = options.jobs, exceptions_terminate_immediately = True)
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/ruffus/cmdline.py", line 834, in run
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] **appropriate_options)
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/ruffus/task.py", line 5424, in pipeline_run
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] raise job_errors
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] ruffus.ruffus_exceptions.RethrownJobError:
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] Original exception:
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] Exception #1
[2020-02-11 00:29:15,628: WARNING/ForkPoolWorker-1] 'drmaa.errors.DrmCommunicationException(code 2: failed receiving gdi request response for mid=65535 (can't send response for this message id - protocol error).)' raised in ...
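For context, the kind of retry behaviour I have in mind is roughly the following. This is only a minimal sketch, not ruffus's own mechanism: `call_with_retries`, `FakeCommunicationError`, and `flaky_submit` are hypothetical names I made up for illustration, and the flaky function just simulates a submission that fails twice before succeeding.

```python
import time


def call_with_retries(func, *args, retry_on=(Exception,), attempts=3, delay=0.0, **kwargs):
    """Call func(*args, **kwargs), retrying when it raises one of `retry_on`.

    Hypothetical helper, not part of ruffus or drmaa.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the last error propagate
            time.sleep(delay * attempt)  # simple linear backoff between tries


class FakeCommunicationError(Exception):
    """Stand-in for a transient scheduler communication error."""


calls = {"n": 0}


def flaky_submit(cmd):
    """Simulated submission that fails on the first two calls."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeCommunicationError("failed receiving gdi request response")
    return "submitted: " + cmd


result = call_with_retries(
    flaky_submit,
    "echo hello",
    retry_on=(FakeCommunicationError,),
    attempts=5,
    delay=0.0,
)
```

In my pipeline I would pass `run_job` as `func` and `drmaa.errors.DrmCommunicationException` (the exception named in the traceback) in `retry_on`. That assumes the exception reaches the calling task function unwrapped, which I have not verified.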