Use task_id in Task.__eq__ comparison #2462

riga · 2018-07-16T09:08:10Z

Description, Motivation and Context

Currently, Task.__eq__ works like this:

Lines 543 to 544 in 1192003

    
    def __eq__(self, other):
        return self.__class__ == other.__class__ and self.param_kwargs == other.param_kwargs

param_kwargs is a dictionary, so comparing tasks can be slow when they have many parameters. It should be safe to compare only the task_id, which is computed in the task's __init__ from all parameters and is therefore a valid hash for task comparison.

(originally added in #2446)
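A minimal sketch of the idea described above, using a toy class rather than luigi's actual Task. The class name SketchTask, the OtherTask subclass, and the id scheme (class name plus a short hash of the JSON-serialized parameters) are assumptions made for illustration only. The point is that the id string is computed once at construction, so __eq__ reduces to a class check and a string comparison instead of a dictionary comparison.

```python
import hashlib
import json


class SketchTask:
    """Toy stand-in for a task; not luigi's actual Task class."""

    def __init__(self, **param_kwargs):
        self.param_kwargs = param_kwargs
        # Hypothetical id scheme: class name plus a short hash of the
        # serialized parameters, computed once at construction time.
        payload = json.dumps(param_kwargs, sort_keys=True, default=str)
        digest = hashlib.md5(payload.encode("utf-8")).hexdigest()[:10]
        self.task_id = "{}_{}".format(type(self).__name__, digest)

    def __eq__(self, other):
        # Compare the precomputed id string instead of the parameter dicts.
        return self.__class__ == other.__class__ and self.task_id == other.task_id

    def __hash__(self):
        # Keep hashing consistent with equality.
        return hash(self.task_id)


class OtherTask(SketchTask):
    """Hypothetical second task class with the same parameters."""


a = SketchTask(date="2018-07-16", n=1)
b = SketchTask(n=1, date="2018-07-16")   # same parameters, different order
c = SketchTask(date="2018-07-16", n=2)   # one parameter differs
d = OtherTask(date="2018-07-16", n=1)    # same parameters, different class
```

Here a and b compare equal, while c and d each differ from a. The comparison is only as reliable as the id itself: any parameter that does not feed into the id is ignored by __eq__.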

Have you tested this? If so, how?

No new feature was added, so the existing tests should completely cover the code changes.

Tarrasch

Looks great:

Have you run this in production? It's not necessary, but it would be nice.
Does your code run faster or something?

Regardless I think this can be merged unless anyone objects.

riga · 2018-07-19T08:18:40Z

Have you run this in production? It's not necessary, but it would be nice.

Yes, we are already running this in production, works as expected.

Does your code run faster or something?

In some places we explicitly compare tasks, and for ~100-1k tasks we definitely see a difference (well, that's just our subjective impression).

Tarrasch · 2018-07-21T12:47:55Z

Now let's pray we didn't break anything. :)

* upstream-master: (82 commits)
  S3 client refactor (spotify#2482)
  Rename to rpc_log_retries, and make it apply to all the logging involved
  Factor log_exceptions into a configuration parameter
  Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478)
  Add a visiblity level for luigi.Parameters (spotify#2278)
  Add support for multiple requires and inherits arguments (spotify#2475)
  Add metadata columns to the RDBMS contrib (spotify#2440)
  Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477)
  tests: Use RunOnceTask where possible (spotify#2476)
  Optional TOML configs support (spotify#2457)
  Added default port behaviour for Redshift (spotify#2474)
  Add codeowners file with default and specific example (spotify#2465)
  Add Data Revenue to the `blogged` list (spotify#2472)
  Fix Scheduler.add_task to overwrite accepts_messages attribute. (spotify#2469)
  Use task_id comparison in Task.__eq__. (spotify#2462)
  Add stale config
  Move github templates to .github dir
  Fix transfer config import (spotify#2458)
  Additions to provide support for the Load Sharing Facility (LSF) job scheduler (spotify#2373)
  Version 2.7.6
  ...

* upstream-master:
  Remove long-deprecated scheduler config variable alternatives (spotify#2491)
  Bump tornado milestone version (spotify#2490)
  Update moto to 1.x milestone version (spotify#2471)
  Use passed password when create a redis connection (spotify#2489)
  S3 client refactor (spotify#2482)
  Rename to rpc_log_retries, and make it apply to all the logging involved
  Factor log_exceptions into a configuration parameter
  Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478)
  Add a visiblity level for luigi.Parameters (spotify#2278)
  Add support for multiple requires and inherits arguments (spotify#2475)
  Add metadata columns to the RDBMS contrib (spotify#2440)
  Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477)
  tests: Use RunOnceTask where possible (spotify#2476)
  Optional TOML configs support (spotify#2457)
  Added default port behaviour for Redshift (spotify#2474)
  Add codeowners file with default and specific example (spotify#2465)
  Add Data Revenue to the `blogged` list (spotify#2472)
  Fix Scheduler.add_task to overwrite accepts_messages attribute. (spotify#2469)
  Use task_id comparison in Task.__eq__. (spotify#2462)

Use task_id comparison in Task.__eq__.

6330c72

Tarrasch approved these changes Jul 18, 2018

Tarrasch merged commit 459a260 into spotify:master Jul 21, 2018

thisiscab pushed a commit to glossier/luigi that referenced this pull request Aug 3, 2018

Use task_id comparison in Task.__eq__. (spotify#2462)

6541ecb

thisiscab pushed a commit to glossier/luigi that referenced this pull request Aug 8, 2018

Use task_id comparison in Task.__eq__. (spotify#2462)

e6c5f52
