Build task.param_kwargs dynamically. #2446

riga · 2018-06-18T12:04:09Z

Description, Motivation and Context

In the current task constructor, all task parameters at initialization time are stored in param_kwargs:

    def __init__(self, *args, **kwargs):
        params = self.get_params()
        param_values = self.get_param_values(params, args, kwargs)

        # Set all values on class instance
        for key, value in param_values:
            setattr(self, key, value)

        # Register kwargs as an attribute on the class. Might be useful
        self.param_kwargs = dict(param_values)  # <--- set once

Dynamic changes of parameter values are not reflected in this dict, which is somewhat dangerous. Consider this example using util.common_params (which uses param_kwargs):

class MyTask(luigi.Task):

    weight = luigi.FloatParameter(default=0.)
    unit = luigi.ChoiceParameter(default="g", choices=("g", "kg"))

    def __init__(self, *args, **kwargs):
        super(MyTask, self).__init__(*args, **kwargs)

        # store weight internally as grams
        if self.unit == "kg":
            self.weight *= 1000

    def requires(self):
        required_cls = ... # class of the required task
        common_params = luigi.util.common_params(self, required_cls)
        #
        # -> common_params["weight"] will have the wrong value when unit is kg!
        #
        return required_cls(**common_params)

(OK, this might not be the best example. Imagine any situation where a parameter value changes after Task.__init__.)

util.common_params uses the param_kwargs attribute which is not synchronized to the actual task parameters at the time it is called.

This PR makes param_kwargs a property so its content becomes dynamic. There is a small implication on task.__eq__ which currently compares the param_kwargs dict between two instances. This might be too inefficient with dynamic properties, but I think it should be safe to compare task.task_id instead.
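To illustrate the idea, here is a minimal sketch that uses plain-Python stand-ins rather than luigi itself. The `Parameter` and `Task` classes below are hypothetical simplifications, not luigi's implementation; they only show how a `param_kwargs` property picks up values that change after `__init__`.

```python
class Parameter(object):
    """Hypothetical stand-in for luigi.Parameter (only stores a default)."""

    def __init__(self, default=None):
        self.default = default


class Task(object):
    """Hypothetical, stripped-down stand-in for luigi.Task."""

    @classmethod
    def get_param_names(cls):
        # Names of all class attributes that are Parameter instances.
        return sorted(
            name for name in dir(cls)
            if isinstance(getattr(cls, name, None), Parameter)
        )

    def __init__(self, **kwargs):
        # Store each value on the instance, shadowing the class attribute.
        for name in self.get_param_names():
            default = getattr(type(self), name).default
            setattr(self, name, kwargs.get(name, default))

    @property
    def param_kwargs(self):
        # Rebuilt on every access, so later setattr calls are reflected.
        return {name: getattr(self, name) for name in self.get_param_names()}


class MyTask(Task):
    weight = Parameter(default=0.0)
    unit = Parameter(default="g")


task = MyTask(weight=2.0, unit="kg")
task.weight *= 1000  # mutate after construction
print(task.param_kwargs["weight"])  # 2000.0, not the stale 2.0
```

The cost is that the dict is rebuilt on each access, which is why comparing `param_kwargs` in `__eq__` becomes less attractive.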

Have you tested this?

No new feature was added, so the existing tests should completely cover changed code.

Tarrasch

I'm quite sceptical of this. Is it ever necessary to mutate the parameters? Even in the cases where you do, I would appreciate being able to debug a task object and see the original values of the parameters.

Tarrasch · 2018-07-04T20:48:14Z

luigi/task.py


    def __eq__(self, other):
-        return self.__class__ == other.__class__ and self.param_kwargs == other.param_kwargs
+        return self.__class__ == other.__class__ and self.task_id == other.task_id


We probably could do this anyway. :)

Yeah, we even experienced a noticeable speed-up during tree building when dealing with > O(100) tasks as the dict comparison (== length + keys + values comparison) is avoided (depends on the use-case of course).
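A rough sketch of why the `task_id` comparison is cheaper: the id is computed once at construction time, so `__eq__` reduces to a single string comparison. The `make_task_id` helper and its hashing scheme are assumptions made for this example, not luigi's actual id format.

```python
import hashlib
import json


def make_task_id(family, params):
    """Hypothetical id scheme: family name plus a short hash of the parameters."""
    payload = json.dumps(params, sort_keys=True, default=str)
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()[:10]
    return "{}_{}".format(family, digest)


class Task(object):
    """Hypothetical stand-in for luigi.Task."""

    def __init__(self, **params):
        self.param_kwargs = dict(params)
        # Computed once here, then reused by every comparison.
        self.task_id = make_task_id(type(self).__name__, self.param_kwargs)

    def __eq__(self, other):
        return self.__class__ == other.__class__ and self.task_id == other.task_id

    def __hash__(self):
        return hash(self.task_id)


a = Task(x=1, y="a")
b = Task(y="a", x=1)   # same parameters, different keyword order
c = Task(x=2, y="a")

print(a == b)  # True
print(a == c)  # False
```

Note that with an id fixed at construction time, two tasks still compare equal after one of them has had a parameter mutated, which is one more argument for not mutating parameters at all.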

riga · 2018-07-05T09:16:36Z

I'm quite sceptical of this. Is it ever necessary to mutate the parameters?

I guess this strongly depends on the use case / personal task setup. In our analysis code, we definitely need to change some parameters (but only on rare occasions) and avoiding this would have some inconvenient implications. But this is just how we use it, so this might not really apply to all users ;)

Independently, I think it's somewhat confusing if a simple setattr on a task parameter value such as

task = MyTask(param=1)
task.param = 2
# task.param_kwargs["param"] => 1

can lead to sync issues between the param_kwargs field (which is used in various places) and the actual parameter value. Instead of making param_kwargs dynamic, one could also disable the setter on task parameters.
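Disabling the setter could look roughly like the following. This is only a sketch with hypothetical stand-in classes (`Parameter`, `Task`), not a patch against luigi: parameter values are written once during construction and any later assignment raises.

```python
class Parameter(object):
    """Hypothetical stand-in for luigi.Parameter."""

    def __init__(self, default=None):
        self.default = default


class Task(object):
    """Hypothetical stand-in for luigi.Task with read-only parameters."""

    def __init__(self, **kwargs):
        values = {}
        for name in dir(type(self)):
            attr = getattr(type(self), name, None)
            if isinstance(attr, Parameter):
                values[name] = kwargs.get(name, attr.default)
                # Bypass the guard in __setattr__ while constructing.
                object.__setattr__(self, name, values[name])
        object.__setattr__(self, "param_kwargs", values)

    def __setattr__(self, attr, value):
        # Refuse assignments to anything declared as a Parameter on the class.
        if isinstance(getattr(type(self), attr, None), Parameter):
            raise AttributeError("parameter {!r} is read-only".format(attr))
        object.__setattr__(self, attr, value)


class MyTask(Task):
    param = Parameter(default=0)


task = MyTask(param=1)
try:
    task.param = 2
except AttributeError as exc:
    print(exc)  # parameter 'param' is read-only

task.scratch = "fine"  # non-parameter attributes are still settable
```

This keeps `param_kwargs` and the attribute values in sync by construction, at the price of breaking code that currently assigns to parameters.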

Tarrasch · 2018-07-09T20:01:42Z

one could also disable the setter on task parameters.

I would prefer this. What's wrong with simply having computed fields?
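One possible reading of "computed fields", sketched with a plain class instead of a luigi task (the `weight_in_grams` name is made up for illustration): the parameters stay exactly as they were passed in, and the derived value lives in a property.

```python
class MyTask(object):
    """Plain-Python stand-in; a real task would declare luigi parameters."""

    def __init__(self, weight=0.0, unit="g"):
        self.weight = weight
        self.unit = unit

    @property
    def weight_in_grams(self):
        # Derived on demand; the original weight and unit are never modified.
        return self.weight * 1000 if self.unit == "kg" else self.weight


task = MyTask(weight=2.0, unit="kg")
print(task.weight)           # 2.0
print(task.weight_in_grams)  # 2000.0
```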

riga · 2018-07-12T16:03:54Z

I would prefer this.

Me too. Or if they were synchronized with the dict immediately. How about this addition to the base task:

    def __getattribute__(self, attr):
        if attr != "__class__" and isinstance(getattr(self.__class__, attr, None), luigi.Parameter):
            return self.param_kwargs[attr]
        else:
            return super(Task, self).__getattribute__(attr)

    def __setattr__(self, attr, value):
        if isinstance(getattr(self.__class__, attr, None), luigi.Parameter):
            self.param_kwargs[attr] = value
        else:
            super(Task, self).__setattr__(attr, value)

Unfortunately this is not possible with __getattr__ (which I would prefer), since the parameters are registered on the class with the same name, so __getattr__ would never be invoked for them. But the above implementation is safe nevertheless. One could also drop

luigi/luigi/task.py, lines 436 to 439 in 1192003:

        # Set all values on class instance
        for key, value in param_values:
            setattr(self, key, value)

then (as well as the current changes in this PR).
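For reference, a self-contained variant of the idea that runs without luigi. `Parameter` and `Task` are hypothetical stand-ins, and the lookups go through `object.__getattribute__` to avoid recursion; the point is only to show that `param_kwargs` becomes the single source of truth.

```python
class Parameter(object):
    """Hypothetical stand-in for luigi.Parameter."""

    def __init__(self, default=None):
        self.default = default


class Task(object):
    """Hypothetical stand-in for luigi.Task."""

    def __init__(self, **kwargs):
        cls = type(self)
        values = {}
        for name in dir(cls):
            attr = getattr(cls, name, None)
            if isinstance(attr, Parameter):
                values[name] = kwargs.get(name, attr.default)
        # Parameter values live only in this dict, not in instance attributes.
        self.param_kwargs = values

    def __getattribute__(self, attr):
        cls = object.__getattribute__(self, "__class__")
        if isinstance(getattr(cls, attr, None), Parameter):
            # Parameter reads are redirected to the dict.
            return object.__getattribute__(self, "param_kwargs")[attr]
        return object.__getattribute__(self, attr)

    def __setattr__(self, attr, value):
        if isinstance(getattr(type(self), attr, None), Parameter):
            # Parameter writes are redirected to the dict as well.
            self.param_kwargs[attr] = value
        else:
            object.__setattr__(self, attr, value)


class MyTask(Task):
    param = Parameter(default=0)


task = MyTask(param=1)
task.param = 2
print(task.param)                  # 2
print(task.param_kwargs["param"])  # 2
```

Every attribute access on a task now passes through `__getattribute__`, so this adds overhead to all lookups, not just parameter reads.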

Tarrasch · 2018-07-15T10:04:07Z

I think we're over-thinking this. What about a separate PR with only the `task_id ==` change? I would love to make the parameter values unsettable, but I don't want to break compatibility now. Also, I'm not sure the `__getattribute__`/`__setattr__` magic is worth it. Maybe we should just encourage our users not to set parameters on instantiated tasks?

riga · 2018-07-16T08:54:16Z

I think you're right. The __getattribute__ override in particular is probably too much for this case. I will open a new PR with the task_id comparison.

riga added 2 commits June 18, 2018 12:49

Build task.param_kwargs dynamically.

9c5fb53

Respect ordering in param_kwargs.

eab11c8

Tarrasch reviewed Jul 4, 2018

riga closed this Jul 16, 2018

riga mentioned this pull request Jul 16, 2018

Use task_id in Task.__eq__ comparison #2462

Merged
