
Conversation

@sebffischer (Member) commented Apr 16, 2025

Resolves #374

@tdhock With this PR, mlr3torch will now expect an output of shape (batch_size, 1) for binary classification problems.
t_loss("cross_entropy") will automatically select the appropriate loss depending on the number of classes.

The example below shows it in action: we train the same learner first on a binary classification problem (iris subset to two classes) and then on a multi-class classification problem (the full iris task).

Depending on the task, the correct output dimension is set, the targets are encoded correctly during training, and the appropriate loss function is instantiated. All mlr3torch learners now respect this.

library(mlr3torch)

# callback that stores the instantiated loss function in the trained model,
# so we can inspect afterwards which loss was selected
cb = torch_callback("loss_fn",
  state_dict = function() {
    self$ctx$loss_fn
  },
  load_state_dict = function(state_dict) {
    self$loss_fn = state_dict
  }
)

learner = lrn("classif.mlp", neurons = 100, batch_size = 32, epochs = 10,
  callbacks = cb)

tsk_binary = tsk("iris")$filter(1:100)$droplevels()
tsk_multi = tsk("iris")

learner$train(tsk_binary)
learner$network(torch_randn(1, 4))
#> torch_tensor
#> -0.3322
#> [ CPUFloatType{1,1} ][ grad_fn = <AddmmBackward0> ]
class(learner$model$callbacks$loss_fn)
#> [1] "nn_bce_with_logits_loss" "nn_loss"                
#> [3] "nn_module"

learner$train(tsk_multi)
learner$network(torch_randn(1, 4))
#> torch_tensor
#>  1.0422  0.3267 -0.5658
#> [ CPUFloatType{1,3} ][ grad_fn = <AddmmBackward0> ]
class(learner$model$callbacks$loss_fn)
#> [1] "nn_cross_entropy_loss" "nn_weighted_loss"      "nn_loss"              
#> [4] "nn_module"

Created on 2025-04-16 with reprex v2.1.1
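For reference, a minimal standalone sketch in plain torch (no mlr3torch; shapes and values are illustrative) of the pairing that the output above reflects: a (batch_size, 1) head goes with nn_bce_with_logits_loss, a (batch_size, n_classes) head with nn_cross_entropy_loss.

library(torch)

# binary head: (batch_size, 1) logits, 0/1 float targets of the same shape
bce = nn_bce_with_logits_loss()
bce(torch_randn(4, 1), torch_tensor(matrix(c(1, 0, 1, 0), ncol = 1)))

# multi-class head: (batch_size, n_classes) logits, 1-based class-index targets
ce = nn_cross_entropy_loss()
ce(torch_randn(4, 3), torch_tensor(c(1L, 2L, 3L, 1L)))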

Note that after this it is no longer possible to have a neural network with output shape (batch_size, 2) for binary classification. I think this is okay, as (batch_size, 1) is clearly the better convention.

glrn = po("torch_ingress_num") %>>%
  nn("linear", out_features = 2) %>>%
  po("torch_loss", loss = "cross_entropy") %>>%
  po("torch_optimizer", "adamw") %>>%
  po("torch_model_classif", epochs = 1, batch_size = 32) |> as_learner()

glrn$train(tsk("sonar"))
#> Error in (function (self, target, weight, pos_weight, reduction) : output with shape [32, 1] doesn't match the broadcast shape [32, 2]
#> Exception raised from mark_resize_outputs at /Users/runner/work/libtorch-mac-m1/libtorch-mac-m1/pytorch/aten/src/ATen/TensorIterator.cpp:1208 (most recent call first):
#> frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>) + 52 (0x10422811c in libc10.dylib)
#> frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) + 140 (0x104224d6c in libc10.dylib)
#> ... 
#> This happened PipeOp torch_model_classif's $train()

Created on 2025-04-16 with reprex v2.1.1
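For completeness, a hedged sketch of the working counterpart: with out_features = 1 the same graph should train under the new convention (not run here, but it matches the (batch_size, 1) expectation above).

glrn = po("torch_ingress_num") %>>%
  nn("linear", out_features = 1) %>>%
  po("torch_loss", loss = "cross_entropy") %>>%
  po("torch_optimizer", "adamw") %>>%
  po("torch_model_classif", epochs = 1, batch_size = 32) |> as_learner()

glrn$train(tsk("sonar"))  # nn_bce_with_logits_loss is selected for the binary task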

@sebffischer
Copy link
Member Author

@tdhock let me know if this is what you had in mind :)

@tdhock (Contributor) commented Apr 16, 2025

The code seems to work as I would expect, thanks!
I tried this code:

remotes::install_github("mlr-org/mlr3torch@c03d61a18e9785e2dbb5b20e2b6dada74a9b58b8")
stask <- mlr3::tsk("sonar")
po_list <- list(
  mlr3torch::PipeOpTorchIngressNumeric$new(),
  mlr3torch::nn("head"),
  mlr3pipelines::po(
    "torch_loss",
    loss = torch::nn_bce_with_logits_loss),
  mlr3pipelines::po("torch_optimizer"),
  mlr3pipelines::po(
    "torch_model_classif",
    epochs = 1,
    batch_size = 3,
    predict_type="prob"))
graph <- Reduce(mlr3pipelines::concat_graphs, po_list)
glrn <- mlr3::as_learner(graph)
glrn$train(stask)
glrn$predict(stask)

And I got this result:

> glrn$predict(stask)
<PredictionClassif> for 208 observations:
 row_ids truth response    prob.M    prob.R
       1     R        M 0.5324456 0.4675544
       2     R        M 0.5387607 0.4612393
       3     R        M 0.5339928 0.4660072
     ---   ---      ---       ---       ---
     206     M        M 0.5342165 0.4657835
     207     M        M 0.5279804 0.4720196
     208     M        M 0.5211176 0.4788824

@tdhock (Contributor) commented Apr 16, 2025

overall looks good!
I suggested some comments for the docs.

@sebffischer (Member Author)

@tdhock thanks for the review!

@sebffischer sebffischer merged commit 307ab40 into main Apr 16, 2025
5 of 6 checks passed
@sebffischer sebffischer deleted the binary-head branch April 16, 2025 15:24
Comment on lines +41 to +43
#' * binary classification: The `factor` target variable of a [`TaskClassif`][mlr3::TaskClassif] is encoded as a
#' [`torch_float`][torch::torch_float] with shape `(batch_size, 1)` where the positive class is `1` and the negative
#' class is `0`.
@tdhock (Contributor):

ok, but what is the positive and negative class at the R-level?
In R it is a factor with two levels. Is the first factor level always considered the positive class?

@sebffischer (Member Author) replied Apr 17, 2025:

mlr3 binary classification tasks have a field $positive. Will add that to the docs!

@sebffischer (Member Author):

And yes, it is also ensured that the positive class is the first level.

@tdhock (Contributor):

thanks
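To make this concrete, a small illustrative snippet (the concrete labels depend on the task):

library(mlr3)
task = tsk("sonar")      # binary TaskClassif
task$positive            # the class that is encoded as 1
task$negative            # the class that is encoded as 0
levels(task$truth())[1]  # per the comment above, equal to task$positive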

#'
#' Furthermore, the target encoding is expected to be as follows:
#' * regression: The `numeric` target variable of a [`TaskRegr`][mlr3::TaskRegr] is encoded as a
#' [`torch_float`][torch::torch_float] with shape `c(batch_size, 1)`.
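A minimal sketch of that encoding in plain torch (the numeric vector is made up for illustration):

library(torch)
y = c(1.5, 2.0, 3.2)                   # numeric target of a TaskRegr
target = torch_tensor(y)$unsqueeze(2)  # float tensor of shape (batch_size, 1)
target$shape
#> [1] 3 1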
@tdhock (Contributor):

great!
Maybe this is more of a question/issue for mlr3 rather than mlr3torch, but is it possible to do multi-task regression? (more than one output to predict?)

@sebffischer (Member Author):

Not yet, but we could implement a Task where the target is a lazy_tensor, which would allow this.

@tdhock (Contributor):

setting the target to a lazy_tensor would be specific to torch, right? other non-torch learners can't use lazy_tensors, right?
it would be better if there were support for any learner (for example glmnet or rpart), not just torch.

@sebffischer (Member Author):

yeah that’s true; if you want this, it should be a feature request in mlr3.

Merging this pull request may close the linked issue: predict_type="prob" does not work with out_features=1