@riga riga commented Jul 16, 2018

Description, Motivation and Context

Currently, Task.__eq__ works like this:

luigi/luigi/task.py

Lines 543 to 544 in 1192003

    def __eq__(self, other):
        return self.__class__ == other.__class__ and self.param_kwargs == other.param_kwargs

param_kwargs is a dictionary, so task comparison might be slow for tasks with many parameters. It should be safe to compare just the task_id, which is computed in the task's __init__ from its parameters and is therefore a valid hash for task comparison.
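
To illustrate the idea, here is a minimal sketch of an identifier-based `__eq__`. It is not Luigi's actual code: `SketchTask` and `make_task_id` are hypothetical names, and the MD5-of-JSON scheme is only an assumption about how a stable identifier could be derived from the parameters.

```python
import hashlib
import json


def make_task_id(family, params):
    # Hypothetical helper: build a stable identifier from the task family
    # and a canonical (sorted-key) JSON dump of the parameters.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()[:10]
    return "{}_{}".format(family, digest)


class SketchTask(object):
    # Hypothetical stand-in for a task class; not luigi.Task.

    def __init__(self, **params):
        self.param_kwargs = dict(params)
        # The identifier is computed once, at construction time.
        self.task_id = make_task_id(type(self).__name__, self.param_kwargs)

    def __eq__(self, other):
        # A class check plus one string comparison, instead of a
        # key-by-key dictionary comparison.
        return self.__class__ == other.__class__ and self.task_id == other.task_id

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Keep hashing consistent with equality.
        return hash(self.task_id)
```

With this sketch, `SketchTask(a=1, b="x") == SketchTask(b="x", a=1)` holds because both instances produce the same identifier. The comparison is only as reliable as the identifier: if the identifier omits a parameter, or a truncated digest collides, two different tasks would compare equal.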

(originally added in #2446)

Have you tested this? If so, how?

No new feature was added, so the existing tests should completely cover the code changes.

@Tarrasch Tarrasch left a comment

Looks great:

  • Have you run this in production? It's not necessary but it would be nice.
  • Does your code run faster or something?

Regardless I think this can be merged unless anyone objects.

riga commented Jul 19, 2018

> Have you run this in production? It's not necessary but it would be nice.

Yes, we are already running this in production, and it works as expected.

> Does your code run faster or something?

In some places we explicitly compare tasks, and for ~100-1k tasks we see a noticeable difference (though this is only our subjective impression, not a measurement).
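
One way to turn that impression into a number is a small micro-benchmark. The following is only a sketch under stated assumptions: `time_comparisons`, the parameter count, and the identifier string are hypothetical, and the result says nothing about the timings observed in the production setup mentioned above.

```python
import timeit


def time_comparisons(n_params=50, repeats=10000):
    # Two equal dictionaries, standing in for two tasks' param_kwargs.
    params_a = {"param_{}".format(i): i for i in range(n_params)}
    params_b = dict(params_a)

    # Two equal identifier strings built separately, standing in for
    # task_id, so the comparison cannot succeed on object identity alone.
    id_a = "SketchTask_0123456789"
    id_b = "".join(list(id_a))

    dict_time = timeit.timeit(lambda: params_a == params_b, number=repeats)
    id_time = timeit.timeit(lambda: id_a == id_b, number=repeats)
    return dict_time, id_time


if __name__ == "__main__":
    dict_time, id_time = time_comparisons()
    print("dict comparison:   {:.6f} s".format(dict_time))
    print("string comparison: {:.6f} s".format(id_time))
```

The absolute numbers depend on the interpreter, the machine, and the parameter types, so they are best read as a relative comparison.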

@Tarrasch Tarrasch merged commit 459a260 into spotify:master Jul 21, 2018
@Tarrasch

Now let's pray we didn't break anything. :)

thisiscab pushed a commit to glossier/luigi that referenced this pull request Aug 3, 2018
thisiscab pushed a commit to glossier/luigi that referenced this pull request Aug 8, 2018
dlstadther added a commit to dlstadther/luigi that referenced this pull request Aug 14, 2018
* upstream-master: (82 commits)
  S3 client refactor (spotify#2482)
  Rename to rpc_log_retries, and make it apply to all the logging involved
  Factor log_exceptions into a configuration parameter
  Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478)
  Add a visiblity level for luigi.Parameters (spotify#2278)
  Add support for multiple requires and inherits arguments (spotify#2475)
  Add metadata columns to the RDBMS contrib (spotify#2440)
  Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477)
  tests: Use RunOnceTask where possible (spotify#2476)
  Optional TOML configs support (spotify#2457)
  Added default port behaviour for Redshift (spotify#2474)
  Add codeowners file with default and specific example (spotify#2465)
  Add Data Revenue to the `blogged` list (spotify#2472)
  Fix Scheduler.add_task to overwrite accepts_messages attribute. (spotify#2469)
  Use task_id comparison in Task.__eq__. (spotify#2462)
  Add stale config
  Move github templates to .github dir
  Fix transfer config import (spotify#2458)
  Additions to provide support for the Load Sharing Facility (LSF) job scheduler (spotify#2373)
  Version 2.7.6
  ...
dlstadther added a commit to dlstadther/luigi that referenced this pull request Aug 16, 2018
* upstream-master:
  Remove long-deprecated scheduler config variable alternatives (spotify#2491)
  Bump tornado milestone version (spotify#2490)
  Update moto to 1.x milestone version (spotify#2471)
  Use passed password when create a redis connection (spotify#2489)
  S3 client refactor (spotify#2482)
  Rename to rpc_log_retries, and make it apply to all the logging involved
  Factor log_exceptions into a configuration parameter
  Fix attribute forwarding for tasks with dynamic dependencies (spotify#2478)
  Add a visiblity level for luigi.Parameters (spotify#2278)
  Add support for multiple requires and inherits arguments (spotify#2475)
  Add metadata columns to the RDBMS contrib (spotify#2440)
  Fix race condition in luigi.lock.acquire_for (spotify#2357) (spotify#2477)
  tests: Use RunOnceTask where possible (spotify#2476)
  Optional TOML configs support (spotify#2457)
  Added default port behaviour for Redshift (spotify#2474)
  Add codeowners file with default and specific example (spotify#2465)
  Add Data Revenue to the `blogged` list (spotify#2472)
  Fix Scheduler.add_task to overwrite accepts_messages attribute. (spotify#2469)
  Use task_id comparison in Task.__eq__. (spotify#2462)