Build task.param_kwargs dynamically. #2446
I'm quite sceptical of this. Is it ever necessary to mutate the parameters? Even in the cases where you do, I would appreciate being able to debug a task object and see the original values of its parameters.
```diff
 def __eq__(self, other):
-    return self.__class__ == other.__class__ and self.param_kwargs == other.param_kwargs
+    return self.__class__ == other.__class__ and self.task_id == other.task_id
```
We probably could do this anyway. :)
Yeah, we even saw a noticeable speed-up during tree building with more than O(100) tasks, since the dict comparison (length + keys + values) is avoided (depending on the use case, of course).
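To make the trade-off concrete, here is a minimal, self-contained sketch of equality by `task_id` with a rough micro-benchmark of both comparisons. `SketchTask`, `MyTask` and `OtherTask` are hypothetical stand-ins, not luigi classes, and the way the id is derived (class name plus an MD5 of a sorted JSON dump) is an assumption for illustration only. Timings will vary by machine and Python version.

```python
import hashlib
import json
import timeit


class SketchTask:
    """Hypothetical stand-in for luigi.Task, not luigi's implementation."""

    def __init__(self, **params):
        self.param_kwargs = dict(params)
        # Assumption: the id is derived once, at construction time, from the
        # class name and a canonical JSON dump of the parameters.
        payload = json.dumps(params, sort_keys=True).encode("utf-8")
        digest = hashlib.md5(payload).hexdigest()[:10]
        self.task_id = "{}_{}".format(type(self).__name__, digest)

    def __eq__(self, other):
        # One string comparison instead of a length + keys + values dict check.
        return self.__class__ == other.__class__ and self.task_id == other.task_id

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash(self.task_id)


class MyTask(SketchTask):
    pass


class OtherTask(SketchTask):
    pass


# Hypothetical task with many parameters, to make the dict comparison costly.
many = {"param_{}".format(i): i for i in range(100)}
a, b = MyTask(**many), MyTask(**many)

dict_time = timeit.timeit(lambda: a.param_kwargs == b.param_kwargs, number=20000)
id_time = timeit.timeit(lambda: a.task_id == b.task_id, number=20000)
print("dict: {:.4f}s, task_id: {:.4f}s".format(dict_time, id_time))
```

In this sketch the id is computed once in `__init__`, so it would go stale in the same way as a snapshot dict if a parameter were reassigned later. Comparing ids is only as trustworthy as the id's link to the current parameter values.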
I guess this strongly depends on the use case / personal task setup. In our analysis code, we definitely need to change some parameters (but only on rare occasions), and avoiding this would have some inconvenient implications. But this is just how we use it, so it might not apply to all users ;) Independently, I think it's somewhat confusing that a simple

```python
task = MyTask(param=1)
task.param = 2
# task.param_kwargs["param"] => 1
```

can lead to sync issues between the …
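A runnable sketch of the sync issue described above, using a hypothetical `SketchTask` base that mimics the constructor behavior described in this PR: parameters are set as attributes, then copied into `param_kwargs` once. It is not luigi's actual code.

```python
class SketchTask:
    """Hypothetical stand-in: parameters are set as attributes and
    copied into param_kwargs once, at construction time."""

    def __init__(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        # Snapshot taken once; later attribute changes are not reflected.
        self.param_kwargs = dict(params)


class MyTask(SketchTask):
    pass


task = MyTask(param=1)
task.param = 2
print(task.param)                  # 2
print(task.param_kwargs["param"])  # 1, the dict is now out of sync
```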
I would prefer this. What's wrong with simply having computed fields?
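One way to read "computed fields" is to make `param_kwargs` a property that is rebuilt from the current attribute values on every access. The sketch below assumes a hypothetical `Parameter` marker class and a hypothetical `_param_names` helper; luigi's real parameter machinery is not used here.

```python
class Parameter:
    """Hypothetical marker class standing in for luigi.Parameter."""


class SketchTask:
    def __init__(self, **params):
        for key, value in params.items():
            setattr(self, key, value)

    @classmethod
    def _param_names(cls):
        # Collect class-level Parameter declarations (hypothetical helper).
        return [name for name in dir(cls) if isinstance(getattr(cls, name), Parameter)]

    @property
    def param_kwargs(self):
        # Computed on every access from the current attribute values.
        return {name: getattr(self, name) for name in self._param_names()}


class MyTask(SketchTask):
    param = Parameter()
    other = Parameter()


task = MyTask(param=1, other="a")
task.param = 2
print(task.param_kwargs)  # {'other': 'a', 'param': 2}
```

The concern raised earlier in the review still applies: each access rebuilds the dict, and the values passed to the constructor are no longer retained anywhere.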
Me too. Or if they were synchronized with the dict immediately. How about this addition to the base task:

```python
def __getattribute__(self, attr):
    if attr != "__class__" and isinstance(getattr(self.__class__, attr, None), luigi.Parameter):
        return self.param_kwargs[attr]
    else:
        return super(Task, self).__getattribute__(attr)

def __setattr__(self, attr, value):
    if isinstance(getattr(self.__class__, attr, None), luigi.Parameter):
        self.param_kwargs[attr] = value
    else:
        super(Task, self).__setattr__(attr, value)
```

Unfortunately, this is not possible with lines 436 to 439 in 1192003.
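A minimal sketch of how the proposed `__setattr__` can break during construction, assuming (not verified here) that the referenced constructor lines assign the parameter attributes before `param_kwargs` exists. `Parameter`, `SketchTask` and `MyTask` are hypothetical stand-ins, not luigi classes.

```python
class Parameter:
    """Hypothetical marker class standing in for luigi.Parameter."""


class SketchTask:
    def __init__(self, **params):
        # Assumed constructor order: attributes first, dict afterwards.
        for key, value in params.items():
            setattr(self, key, value)
        self.param_kwargs = dict(params)

    def __setattr__(self, attr, value):
        # The proposed redirect of parameter writes into param_kwargs.
        if isinstance(getattr(self.__class__, attr, None), Parameter):
            self.param_kwargs[attr] = value  # fails: dict not created yet
        else:
            super().__setattr__(attr, value)


class MyTask(SketchTask):
    param = Parameter()


try:
    MyTask(param=1)
    construction_failed = False
except AttributeError as exc:
    construction_failed = True
    print("construction failed:", exc)
```

Making the redirect work under this assumption would require creating the dict before any parameter attribute is assigned, which means changing the constructor order itself.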
I think we're over-thinking this. What about a separate PR with only the …
I think you're right. Especially the …
Description, Motivation and Context
In the current task constructor, all task parameters at initialization time are stored in `param_kwargs`. Dynamic changes of parameter values are not reflected in this dict, which is somewhat dangerous. Consider an example using `util.common_params` (which uses `param_kwargs`).

(OK, this might not be the best example... imagine a situation where a parameter value changes after `Task.__init__`.) `util.common_params` uses the `param_kwargs` attribute, which is not synchronized with the actual task parameters at the time it is called.

This PR makes `param_kwargs` a property so its content becomes dynamic. There is a small implication for `task.__eq__`, which currently compares the `param_kwargs` dicts of two instances. This might be too inefficient with dynamic properties, but I think it should be safe to compare `task.task_id` instead.

Have you tested this?

No new feature was added, so the existing tests should fully cover the changed code.
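The scenario from the description can be sketched with a property-based `param_kwargs`: a value changed after `__init__` is picked up by a helper that reads `param_kwargs`. `common_params_sketch` is a hypothetical simplification of the idea behind `util.common_params` (copy the values of parameters declared by both classes), not luigi's implementation; `Parameter`, `DynamicTask`, `Upstream` and `Downstream` are hypothetical too.

```python
class Parameter:
    """Hypothetical marker class standing in for luigi.Parameter."""


def declared_params(cls):
    # Names of class-level Parameter declarations (hypothetical helper).
    return {name for name in dir(cls) if isinstance(getattr(cls, name), Parameter)}


class DynamicTask:
    """Hypothetical task base where param_kwargs is a computed property."""

    def __init__(self, **params):
        for key, value in params.items():
            setattr(self, key, value)

    @property
    def param_kwargs(self):
        return {name: getattr(self, name) for name in declared_params(type(self))}


def common_params_sketch(task_instance, task_cls):
    # Copy the current values of parameters that both classes declare.
    shared = declared_params(type(task_instance)) & declared_params(task_cls)
    return {name: task_instance.param_kwargs[name] for name in shared}


class Upstream(DynamicTask):
    date = Parameter()


class Downstream(DynamicTask):
    date = Parameter()
    version = Parameter()


task = Downstream(date="2018-01-01", version=1)
task.date = "2018-06-01"  # changed after __init__
print(common_params_sketch(task, Upstream))  # {'date': '2018-06-01'}
```

With a snapshot dict taken in the constructor, the same helper would have returned the stale `"2018-01-01"`.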