riga commented Jun 18, 2018

Description, Motivation and Context

In the current task constructor, all task parameters at initialization time are stored in param_kwargs:

    def __init__(self, *args, **kwargs):
        params = self.get_params()
        param_values = self.get_param_values(params, args, kwargs)

        # Set all values on class instance
        for key, value in param_values:
            setattr(self, key, value)

        # Register kwargs as an attribute on the class. Might be useful
        self.param_kwargs = dict(param_values)  # <--- set once

Dynamic changes of parameter values are not reflected in this dict, which is somewhat dangerous. Consider this example using util.common_params (which uses param_kwargs):

import luigi
import luigi.util


class MyTask(luigi.Task):

    weight = luigi.FloatParameter(default=0.)
    unit = luigi.ChoiceParameter(default="g", choices=("g", "kg"))

    def __init__(self, *args, **kwargs):
        super(MyTask, self).__init__(*args, **kwargs)

        # store weight internally as grams
        if self.unit == "kg":
            self.weight *= 1000

    def requires(self):
        required_cls = ... # class of the required task
        # common_params reads self.param_kwargs, not the current attribute values
        common_params = luigi.util.common_params(self, required_cls)
        #
        # -> common_params["weight"] will have the wrong value when unit is kg!
        #
        return required_cls(**common_params)

(ok, this might not be the best example...imagine a situation where a parameter value changes after Task.__init__)

util.common_params uses the param_kwargs attribute which is not synchronized to the actual task parameters at the time it is called.
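The mismatch can be reproduced without luigi. The following is a minimal sketch, assuming hypothetical stand-in classes `Param` and `FakeTask` (neither is part of luigi) that only mirror the constructor quoted above:

```python
class Param:
    """Hypothetical stand-in for a luigi parameter; it only holds a default."""

    def __init__(self, default):
        self.default = default


class FakeTask:
    """Hypothetical stand-in for luigi.Task that mirrors the quoted constructor."""

    weight = Param(default=0.0)
    unit = Param(default="g")

    def __init__(self, **kwargs):
        # Collect (name, value) pairs for every Param declared on the class.
        names = [n for n, v in vars(type(self)).items() if isinstance(v, Param)]
        param_values = [
            (n, kwargs.get(n, getattr(type(self), n).default)) for n in names
        ]

        # Set all values on the instance, shadowing the class-level Param objects.
        for key, value in param_values:
            setattr(self, key, value)

        # Snapshot taken once; later attribute changes do not reach this dict.
        self.param_kwargs = dict(param_values)


task = FakeTask(weight=2.0, unit="kg")
task.weight *= 1000  # mutate after construction, as in MyTask.__init__

print(task.weight)                  # current attribute value
print(task.param_kwargs["weight"])  # value recorded at construction time
```

Any code that reads `param_kwargs` after the mutation, as `util.common_params` does, sees the construction-time value rather than the current one.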

This PR makes param_kwargs a property so its content becomes dynamic. This has a small implication for Task.__eq__, which currently compares the param_kwargs dicts of two instances. That comparison might be too inefficient once param_kwargs is computed dynamically, but I think it should be safe to compare task.task_id instead.
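As an illustration of the idea (not the actual diff of this PR), here is a minimal sketch of a dynamic `param_kwargs`. The classes `Param` and `DynamicTask` and the helper `_param_names` are hypothetical stand-ins, not luigi code:

```python
class Param:
    """Hypothetical stand-in for a luigi parameter; it only holds a default."""

    def __init__(self, default):
        self.default = default


class DynamicTask:
    """Hypothetical task whose param_kwargs is rebuilt on every access."""

    weight = Param(default=0.0)
    unit = Param(default="g")

    def __init__(self, **kwargs):
        for name in self._param_names():
            default = getattr(type(self), name).default
            setattr(self, name, kwargs.get(name, default))

    @classmethod
    def _param_names(cls):
        # Names of all Param objects declared directly on this class.
        return [n for n, v in vars(cls).items() if isinstance(v, Param)]

    @property
    def param_kwargs(self):
        # Built from the current attribute values, so it cannot go stale.
        return {n: getattr(self, n) for n in self._param_names()}


task = DynamicTask(weight=2.0, unit="kg")
task.weight *= 1000
print(task.param_kwargs)
```

The cost is that the dict is rebuilt on each access, which is why a dict-based `__eq__` becomes less attractive with this approach.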

Have you tested this?

No new feature was added, so the existing tests should completely cover the changed code.

Tarrasch left a comment

I'm quite sceptical of this. Is it ever necessary to mutate the parameters? Even in the cases where you do, I would appreciate being able to debug a task object and see the original values of the parameters.


    def __eq__(self, other):
-       return self.__class__ == other.__class__ and self.param_kwargs == other.param_kwargs
+       return self.__class__ == other.__class__ and self.task_id == other.task_id
Contributor

We probably could do this anyway. :)

Contributor Author

Yeah, we even experienced a noticeable speed-up during tree building when dealing with > O(100) tasks as the dict comparison (== length + keys + values comparison) is avoided (depends on the use-case of course).
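A minimal sketch of identifier-based equality follows. The class `IdTask` and the way its `task_id` string is built are assumptions made for illustration; luigi derives its real `task_id` differently:

```python
class IdTask:
    """Hypothetical task that is compared by a precomputed identifier string."""

    def __init__(self, **params):
        self.param_kwargs = dict(params)
        # Hypothetical id format: class name plus sorted "key=value" pairs.
        # It is computed once, at construction time.
        self.task_id = "{}({})".format(
            type(self).__name__,
            ", ".join("{}={!r}".format(k, v) for k, v in sorted(params.items())),
        )

    def __eq__(self, other):
        # One string comparison instead of comparing dict length, keys and values.
        return self.__class__ == other.__class__ and self.task_id == other.task_id

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash(self.task_id)


a = IdTask(x=1, y="a")
b = IdTask(y="a", x=1)
c = IdTask(x=2, y="a")
print(a == b, a == c, len({a, b, c}))
```

In this sketch the identifier is fixed at construction, so equality reflects the original parameter values even if an attribute is changed later.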

riga commented Jul 5, 2018

I'm quite sceptical of this. Is it ever necessary to mutate the parameters?

I guess this strongly depends on the use case / personal task setup. In our analysis code, we definitely need to change some parameters (but only on rare occasions) and avoiding this would have some inconvenient implications. But this is just how we use it, so this might not really apply to all users ;)

Independently, I think it's somewhat confusing if a simple setattr on a task parameter value such as

task = MyTask(param=1)
task.param = 2
# task.param_kwargs["param"] => 1

can lead to sync issues between the param_kwargs field (which is used in various places) and the actual parameter value. Instead of making param_kwargs dynamic, one could also disable the setter on task parameters.
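Disabling the setter could look roughly like the sketch below. `Param` and `FrozenParamTask` are hypothetical stand-ins, and raising `AttributeError` is an assumed design choice, not something luigi does:

```python
class Param:
    """Hypothetical stand-in for a luigi parameter; it only holds a default."""

    def __init__(self, default):
        self.default = default


class FrozenParamTask:
    """Hypothetical task whose parameter attributes cannot be reassigned."""

    param = Param(default=0)

    def __init__(self, **kwargs):
        values = {n: kwargs.get(n, p.default) for n, p in self._params().items()}
        # Bypass our own __setattr__ while the instance is being built.
        for key, value in values.items():
            object.__setattr__(self, key, value)
        object.__setattr__(self, "param_kwargs", values)

    @classmethod
    def _params(cls):
        # All Param objects declared directly on this class, keyed by name.
        return {n: v for n, v in vars(cls).items() if isinstance(v, Param)}

    def __setattr__(self, attr, value):
        # Reject writes to names that are declared as parameters on the class.
        if isinstance(getattr(type(self), attr, None), Param):
            raise AttributeError("parameter {!r} is read-only".format(attr))
        super().__setattr__(attr, value)


task = FrozenParamTask(param=1)
try:
    task.param = 2
    blocked = False
except AttributeError:
    blocked = True

task.note = "non-parameter attributes can still be set"
print(blocked, task.param, task.param_kwargs["param"])
```

With the setter disabled, the attribute and `param_kwargs` cannot drift apart, at the price of breaking code that currently reassigns parameters.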

Tarrasch commented Jul 9, 2018

one could also disable the setter on task parameters.

I would prefer this. What's wrong with simply having computed fields?

riga commented Jul 12, 2018

I would prefer this.

Me too. Or if they were synchronized with the dict immediately. How about this addition to the base task:

    def __getattribute__(self, attr):
        if attr != "__class__" and isinstance(getattr(self.__class__, attr, None), luigi.Parameter):
            return self.param_kwargs[attr]
        else:
            return super(Task, self).__getattribute__(attr)

    def __setattr__(self, attr, value):
        if isinstance(getattr(self.__class__, attr, None), luigi.Parameter):
            self.param_kwargs[attr] = value
        else:
            super(Task, self).__setattr__(attr, value)

Unfortunately this is not possible with __getattr__ (which I would prefer), since the parameters are registered on the class with the same name, so __getattr__ would never be invoked for them. But the above implementation is safe nevertheless. One could also drop

luigi/luigi/task.py, lines 436 to 439 in 1192003:

    # Set all values on class instance
    for key, value in param_values:
        setattr(self, key, value)

then (as well as the current changes in this PR).
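The point about `__getattr__` follows from Python's lookup rules: it is only called when normal attribute lookup fails. A small sketch, using hypothetical `Param` and `Demo` classes that are unrelated to luigi:

```python
class Param:
    """Hypothetical stand-in for a luigi parameter object."""


class Demo:
    # Class attribute carrying the parameter's name, as on a luigi task class.
    param = Param()

    def __getattr__(self, attr):
        # Only reached when normal lookup (instance, class, bases) fails.
        return "fallback for " + attr


demo = Demo()
found = demo.param    # the class attribute is found, __getattr__ is not called
missing = demo.other  # nothing named "other" exists, so __getattr__ runs

print(type(found).__name__, missing)
```

`__getattribute__`, by contrast, is called for every attribute access, which is why the proposal above has to use it.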

@Tarrasch

I think we're overthinking this. What about a separate PR with only the task_id == change? I would love to make the parameter flags unsettable, but I don't want to break compatibility now. Also, I'm not sure the __getattribute__/__setattr__ magic is worth it. Maybe just encourage our users not to set the instantiated parameters?

riga commented Jul 16, 2018

I think you're right. The __getattribute__ override in particular is probably too much for this case. I will open a new PR with the task_id comparison.
