Commit 02a85d0

Time limit
1 parent 17b852f commit 02a85d0

17 files changed

Lines changed: 228 additions & 133 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@

 ## New features

+* Time limit for tasks (option ``--time-limit``)
 * Job and task times are shown
 * Integers anywhere can be now written with underscore (e.g. ``--array=1-1_000``)

Cargo.lock

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

docs/arrays.md

Lines changed: 4 additions & 0 deletions
@@ -59,6 +59,10 @@ By default, when a task fails the computation of job continues.
 You can change it by ``--max-fails=X`` where ``X`` is a non-negative integer.
 If more tasks than ``X`` fail, then the rest of the non-finished tasks are canceled.

+## Time limit
+
+Time limit (``--time-limit``) is counted for each task separately.
+
 ## Job canceling

 When a job with more tasks is canceled then all non-finished tasks are canceled.

docs/jobs.md

Lines changed: 31 additions & 0 deletions
@@ -47,6 +47,11 @@ from which was the job submitted.

 ### Output of the job

+!!! Warning
+
+    If you want to avoid creating many files, see the section about streaming
+
+
 By default, each job will produce two files containing the standard output and standard error output, respectively.

 The paths where these files will be created can be changed via the parameters ``--stdout=<path>`` and ``--stderr=<path>``.
@@ -111,6 +116,7 @@ Detailed information about a job:

 You can also use `hq job last` to get information about the most recently submitted job.

+
 ## Task states

 ```
@@ -138,6 +144,31 @@ Finished Failed Canceled
 * *Canceled* - The task has been canceled by a user.


+## Time limit
+
+Time limit is set as follows:
+
+``hq submit --time-limit=TIME ...``
+
+Where ``TIME`` is a number followed by units (e.g. ``10 min``).
+
+You can use the following units:
+
+* msec, ms -- milliseconds
+* seconds, second, sec, s
+* minutes, minute, min, m
+* hours, hour, hr, h
+* days, day, d
+* weeks, week, w
+* months, month, M -- defined as 30.44 days
+* years, year, y -- defined as 365.25 days
+
+
+Time can also be a combination of several units:
+
+``hq submit --time-limit="1h 30min" ...``
+
+
 ## Task instance

 It may happen that a task is started more than once when a worker crashes during execution of a task and the task is rescheduled to another worker. Instance IDs exist to distinguish each run when necessary. Instance ID is a 32-bit non-negative number and it is guaranteed that the newer execution has a bigger value. HyperQueue explicitly does *not* guarantee any specific value or differences between two ids. Instance ID is valid only for a particular task. Two different tasks may have the same instance ID.
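
The unit list documented in the `## Time limit` section can be illustrated with a small standalone parser. This is only a sketch using the standard library: the commit itself delegates parsing to `humantime::parse_duration`, and the names `unit_millis` and `parse_time_limit` are hypothetical. It covers just the units listed in the documentation and assumes whole-number values.

```rust
use std::time::Duration;

/// Maps a unit name from the documented list to its length in milliseconds.
/// (Hypothetical helper, not part of HyperQueue or humantime.)
fn unit_millis(unit: &str) -> Option<u64> {
    const SEC: u64 = 1_000;
    const DAY: u64 = 86_400 * SEC;
    Some(match unit {
        "msec" | "ms" => 1,
        "seconds" | "second" | "sec" | "s" => SEC,
        "minutes" | "minute" | "min" | "m" => 60 * SEC,
        "hours" | "hour" | "hr" | "h" => 3_600 * SEC,
        "days" | "day" | "d" => DAY,
        "weeks" | "week" | "w" => 7 * DAY,
        "months" | "month" | "M" => 2_630_016 * SEC, // 30.44 days
        "years" | "year" | "y" => 31_557_600 * SEC,  // 365.25 days
        _ => return None,
    })
}

/// Parses strings such as "10 min" or "1h 30min" into a `Duration`.
/// (Hypothetical helper; assumes integer values followed by a unit.)
fn parse_time_limit(input: &str) -> Result<Duration, String> {
    let mut rest = input.trim();
    if rest.is_empty() {
        return Err("empty time limit".to_string());
    }
    let mut total_ms: u64 = 0;
    while !rest.is_empty() {
        // The leading ASCII digits form the number.
        let digits_end = rest
            .find(|c: char| !c.is_ascii_digit())
            .unwrap_or(rest.len());
        if digits_end == 0 {
            return Err(format!("expected a number at '{}'", rest));
        }
        let value = rest[..digits_end]
            .parse::<u64>()
            .map_err(|e| e.to_string())?;
        rest = rest[digits_end..].trim_start();

        // The following ASCII letters form the unit.
        let unit_end = rest
            .find(|c: char| !c.is_ascii_alphabetic())
            .unwrap_or(rest.len());
        let unit = &rest[..unit_end];
        let factor = unit_millis(unit).ok_or_else(|| format!("unknown unit '{}'", unit))?;

        // Accumulate with overflow checks.
        total_ms = value
            .checked_mul(factor)
            .and_then(|ms| total_ms.checked_add(ms))
            .ok_or_else(|| "time limit overflows".to_string())?;
        rest = rest[unit_end..].trim_start();
    }
    Ok(Duration::from_millis(total_ms))
}
```

With this sketch, `parse_time_limit("1h 30min")` yields a 5400-second `Duration`, and an input with a missing or unknown unit is rejected with an error.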

docs/streaming.md

Lines changed: 6 additions & 3 deletions
@@ -70,10 +70,13 @@ Disabling stderr and streaming only stdout into log file.

 # Guarantees

-When a task is *finished* or *failed* with a non-streaming error then it is guaranteed that its stream is fully flushed into the log file.
+When a task is *finished* or *failed* (except for a streaming failure, see below) then it is guaranteed that its stream is fully flushed into the log file.

-When a task is *canceled* or *failed* with a streaming error, then the stream is not necessarily fully written into the log file in the moment when the state occurs
-and some part may be written later, but the stream will be eventually closed. In this case, HQ is also allowed to drop any suffix of the buffered part of the stream.
+When a task is *canceled* then the stream is not necessarily fully written into the log file at the moment when the state occurs and some parts may be written later, but the stream will eventually be closed.
+
+When a task is *canceled* or the time limit is reached then the part of the stream buffered in the worker is dropped to avoid spending additional resources on this task. In practice, this should be only the part that is produced immediately before the event, because data are sent to the server as soon as possible.
+
+If streaming fails (e.g. insufficient disk space for the log file) then the task fails with an error prefixed "Streamer:" and no guarantees for streaming are provided.



 # Current limitations

src/client/commands/submit.rs

Lines changed: 6 additions & 0 deletions
@@ -16,6 +16,7 @@ use crate::client::job::{get_worker_map, print_job_detail};
 use crate::client::resources::parse_cpu_request;
 use crate::client::status::StatusList;
 use crate::common::arraydef::IntArray;
+use crate::common::timeutils::ArgDuration;
 use crate::transfer::connection::ClientConnection;
 use crate::transfer::messages::{
     FromClientMessage, JobType, ResubmitRequest, SubmitRequest, ToClientMessage,
@@ -137,6 +138,10 @@ pub struct SubmitOpts {
     #[clap(long, default_value = "0")]
     priority: tako::Priority,

+    #[clap(long)]
+    /// Time limit per task. E.g. --time-limit=10min
+    time_limit: Option<ArgDuration>,
+
     /// Wait on the job(s) execution.
     #[clap(long)]
     wait: bool,
@@ -231,6 +236,7 @@ pub async fn submit_computation(
         max_fails: opts.max_fails,
         submit_dir: std::env::current_dir().unwrap().to_str().unwrap().into(),
         priority: opts.priority,
+        time_limit: opts.time_limit.map(|x| x.into()),
         log,
     });

src/client/job.rs

Lines changed: 8 additions & 0 deletions
@@ -209,6 +209,14 @@ pub fn print_job_detail(
         job.submission_date.round_subsecs(0).cell(),
     ]);

+    rows.push(vec![
+        "Task time limit".cell().bold(true),
+        job.time_limit
+            .map(|duration| humantime::format_duration(duration).to_string())
+            .unwrap_or_else(|| "None".to_string())
+            .cell(),
+    ]);
+
     rows.push(vec![
         "Makespan".cell().bold(true),
         human_duration(job.completion_date_or_now - job.submission_date).cell(),
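
The `Task time limit` row relies on the `Option::map` followed by `unwrap_or_else` idiom to show either a formatted duration or the literal `None`. The sketch below isolates that idiom with the standard library only. The function name `format_time_limit` is hypothetical, and the simple `h/m/s` layout is an assumption standing in for `humantime::format_duration`, whose output format is not reproduced here.

```rust
use std::time::Duration;

/// Renders an optional time limit for display.
/// (Hypothetical helper; the real code uses `humantime::format_duration`.)
fn format_time_limit(limit: Option<Duration>) -> String {
    limit
        .map(|d| {
            // Split whole seconds into hours, minutes, and seconds.
            let secs = d.as_secs();
            format!("{}h {}m {}s", secs / 3600, (secs % 3600) / 60, secs % 60)
        })
        // A job submitted without `--time-limit` has no limit to show.
        .unwrap_or_else(|| "None".to_string())
}
```

Keeping the fallback string inside `unwrap_or_else` means the `String` for `"None"` is only allocated when no limit is set.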

src/common/timeutils.rs

Lines changed: 6 additions & 0 deletions
@@ -16,3 +16,9 @@ impl FromStr for ArgDuration {
         Ok(Self(humantime::parse_duration(s)?))
     }
 }
+
+impl From<ArgDuration> for Duration {
+    fn from(x: ArgDuration) -> Self {
+        x.0
+    }
+}
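
The new `From<ArgDuration> for Duration` impl is what lets `submit.rs` write `opts.time_limit.map(|x| x.into())`. A self-contained sketch of the same newtype pattern follows. Two things here are assumptions made so the example compiles without external crates: `from_str` accepts plain whole seconds instead of calling `humantime::parse_duration`, and `to_time_limit` is a hypothetical helper that mirrors the conversion in `submit_computation`.

```rust
use std::str::FromStr;
use std::time::Duration;

/// Stand-in for the commit's `ArgDuration` newtype around `Duration`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct ArgDuration(Duration);

impl FromStr for ArgDuration {
    type Err = std::num::ParseIntError;

    // Assumption: parse whole seconds; the real impl uses humantime.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(Self(Duration::from_secs(s.trim().parse()?)))
    }
}

impl From<ArgDuration> for Duration {
    // Unwraps the newtype, exactly as in the diff above.
    fn from(x: ArgDuration) -> Self {
        x.0
    }
}

/// Hypothetical helper mirroring `opts.time_limit.map(|x| x.into())`.
fn to_time_limit(arg: Option<ArgDuration>) -> Option<Duration> {
    arg.map(Into::into)
}
```

The newtype keeps command-line parsing concerns in one place, while the rest of the code only ever sees a plain `Option<Duration>`.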

src/server/client.rs

Lines changed: 8 additions & 7 deletions
@@ -333,6 +333,7 @@ async fn handle_submit(
     let pin = message.pin;
     let submit_dir = message.submit_dir;
     let priority = message.priority;
+    let time_limit = message.time_limit;

     let make_task = |job_id, task_id, tako_id, entry: Option<BString>| {
         let mut program = make_program_def_for_task(&spec, job_id, task_id, &submit_dir);
@@ -351,6 +352,7 @@ async fn handle_submit(
             conf: TaskConfiguration {
                 resources: resources.clone(),
                 n_outputs: 0,
+                time_limit,
                 type_id: 0,
                 body,
             },
@@ -391,7 +393,8 @@ async fn handle_submit(
         pin,
         message.max_fails,
         message.entries.clone(),
-        message.priority,
+        priority,
+        time_limit,
         message.log.clone(),
     );
     let job_detail = job.make_job_detail(false);
@@ -474,21 +477,19 @@ async fn handle_resubmit(
             let spec = job.program_def.clone();
             let name = job.name.clone();
             let resources = job.resources.clone();
-            let pin = job.pin;
             let entries = job.entries.clone();
-            let max_fails = job.max_fails;
-            let priority = job.priority;

             SubmitRequest {
                 job_type,
                 name,
-                max_fails,
+                max_fails: job.max_fails,
                 spec,
                 resources,
-                pin,
+                pin: job.pin,
                 entries,
                 submit_dir: std::env::current_dir().unwrap().to_str().unwrap().into(),
-                priority,
+                priority: job.priority,
+                time_limit: job.time_limit,
                 log: None, // TODO: Reuse log configuration
             }
         } else {

src/server/job.rs

Lines changed: 4 additions & 0 deletions
@@ -103,6 +103,7 @@ pub struct Job {

     pub entries: Option<Vec<BString>>,
     pub priority: tako::Priority,
+    pub time_limit: Option<std::time::Duration>,

     pub submission_date: DateTime<Utc>,
     pub completion_date: Option<DateTime<Utc>>,
@@ -123,6 +124,7 @@ impl Job {
         max_fails: Option<JobTaskCount>,
         entries: Option<Vec<BString>>,
         priority: tako::Priority,
+        time_limit: Option<std::time::Duration>,
         job_log: Option<PathBuf>,
     ) -> Self {
         let state = match &job_type {
@@ -158,6 +160,7 @@ impl Job {
             entries,
             priority,
             log: job_log,
+            time_limit,
             submission_date: Utc::now(),
             completion_date: None,
         }
@@ -185,6 +188,7 @@ impl Job {
             pin: self.pin,
             max_fails: self.max_fails,
             priority: self.priority,
+            time_limit: self.time_limit,
             submission_date: self.submission_date,
             completion_date_or_now: self.completion_date.unwrap_or_else(Utc::now),
         }
