@coatless
Contributor

I'm promoting @cgiachalis' branch changes discussed in the issue ticket #3722 over to a PR for ease of testing.

@eddelbuettel
Contributor

That is good but shouldn't we also provide a few stub member functions for the classes?

And/or, to make our life easier, make this 'one superclass' (with a catch-all implementation, maybe a nicer one than a plain list would give us, but also adding a reference to mlpack etc.) plus the binding name on top for possible specialisation?

Also, what is the list of generated classes? Do we need NAMESPACE entries for the methods?

Adding the one-liner is good but does not move the needle that far.

@cgiachalis

And/or, to make our life easier, make this 'one superclass' (with a catch-all implementation, maybe a nicer one than a plain list would give us, but also adding a reference to mlpack etc.) plus the binding name on top for possible specialisation?

We can drop "list" from the class vector:

cout << "  class(out) <- c(\"" << bindingName << "\", \"mlpack\")" << endl;

and have the following

# add subclass and class
class(res) <- c("knn", "mlpack")

class(res)
# [1] "knn" "mlpack" 
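For anyone less familiar with the generator: the one-liner above is C++ that prints R source. A minimal sketch of that step, assuming we factor it into a hypothetical helper ClassAssignmentLine() that returns the line instead of writing it to cout (print_R.cpp itself streams straight to std::cout), so the emitted R can be checked in isolation:

```cpp
#include <string>

// Hypothetical helper, not part of print_R.cpp: returns the R statement
// that tags a binding's output with its binding name and "mlpack".
// The real generator writes the equivalent text directly to std::cout.
std::string ClassAssignmentLine(const std::string& bindingName)
{
  return "  class(out) <- c(\"" + bindingName + "\", \"mlpack\")";
}
```

For example, ClassAssignmentLine("knn") yields the R line `class(out) <- c("knn", "mlpack")` (with two leading spaces), which is what produces the class(res) output shown above.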

@eddelbuettel
Contributor

eddelbuettel commented Aug 27, 2025

Sure. But we have no methods for classes knn or mlpack so what good does the classing do for us?

To make this concrete, run example(kMeans) (from my little spin-off rcppmlpackexamples; the help files in package mlpack do not seem to have working examples, which may be something else we want to fix). Then, on object cl2:

> class(cl2) <- c("kMeans", "mlpack")
> str(cl2)
List of 2
 $ clusters: int 3
 $ result  : num [1, 1:31] 1 1 1 1 2 2 1 2 2 2 ...
 - attr(*, "class")= chr [1:2] "kMeans" "mlpack"
> print(cl2)
$clusters
[1] 3

$result
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31]
[1,]    1    1    1    1    2    2    1    2    2     2     2     2     2     1     2     2     2     2     2     1     2     2     2     2     0     0     0     0     0     0     0

attr(,"class")
[1] "kMeans" "mlpack"
> 

So I see your point. It behaves like a list even though "list" is not in the class vector. Fair enough.

@coatless
Contributor Author

I'm less worried about the hierarchy, e.g.

binding > mlpack > list

From what I recall, the original issue was we were just returning:

list

By salting the object with class attributes, downstream packages can take advantage of them, e.g.:

is_knn_mlpack <- function(x) {
  inherits(x, "knn")  &&  inherits(x, "mlpack")
}

That said, the current modus operandi makes ample use of the model's type attribute (attributes(model)$type) and a dispatch switch statement:

  model_serialization_function <-
    switch(attributes(model)$type,
      "GaussianKernel" = SerializeGaussianKernelPtr, # Populated by a print C++ binding
      ...
    )

https://github.com/cran/mlpack/blob/956bbce2f3f444aa8adff8cd91eab31a3088f0de/R/serialization.R#L9
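Since the comment in that snippet says the switch arms are populated from the C++ side, here is a minimal sketch of how such arms could be emitted from a list of model type names. SerializationSwitch() is a hypothetical helper, and the Serialize<Type>Ptr naming is an assumption inferred from SerializeGaussianKernelPtr here and SerializeKNNModelPtr later in this thread; the real generator may build this differently.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: emit the R dispatch switch used in serialization.R,
// one arm per model type, assuming the Serialize<Type>Ptr naming pattern.
std::string SerializationSwitch(const std::vector<std::string>& modelTypes)
{
  std::ostringstream out;
  out << "  model_serialization_function <-\n"
      << "    switch(attributes(model)$type,\n";
  for (std::size_t i = 0; i < modelTypes.size(); ++i)
  {
    const std::string& type = modelTypes[i];
    out << "      \"" << type << "\" = Serialize" << type << "Ptr";
    // Separate arms with commas, but leave no trailing comma before ")".
    out << (i + 1 < modelTypes.size() ? ",\n" : "\n");
  }
  out << "    )\n";
  return out.str();
}
```

Avoiding the trailing comma matters because R's switch() would otherwise see an extra, empty alternative.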

So I see this mainly benefiting how mlpack objects interact with R's class()/inherits(). I'm not sure we need to embrace the class structure further inside the package, as the purpose of #3722 was to make class information more apparent and actionable.

@cgiachalis

cgiachalis commented Aug 27, 2025

Methods will be the next step; the rationale was provided at the opening of #3722.

Back then I was looking for a native serialisation method (we don't have one); the specific routines are not exported, and on top of that the model outputs are plain lists. So crafting your own method on top of mlpack is not an option right now.

# New S3 generic for native serialisation
serialise_mlpack <- function(object, ...) {
  UseMethod("serialise_mlpack")
}

# Optional
serialise_mlpack.default <- function(object, ...) {
  stop(sprintf("No method for object %s. See ?serialise_mlpack for details.",
               sQuote(deparse(substitute(object)))))
}

# method for KNNModel
serialise_mlpack.knn <- function(object, ...) {
  mlpack:::SerializeKNNModelPtr(object$output_model)
}

@cgiachalis

cgiachalis commented Aug 27, 2025

Yes, that could be reduced to:

# Using serialise_mlpack() method internally
Serialize <- function(object, filename) {
  model_serialization <- serialise_mlpack(object)

  con <- file(as.character(filename), "wb")
  on.exit(close(con))  # close the connection even if serialize() fails
  serialize(model_serialization, con)
}
res2 <- mlpack::knn(query = x, reference = x, k = 3)
Serialize(res2, filename = "Hello.rds")

But we would still need to auto-generate

serialise_mlpack.knn <- function(object, ...) {
  mlpack:::SerializeKNNModelPtr(object$output_model)
}

unless we do it directly in R (faster).
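If the methods are generated rather than hand-written, the C++ side would need to print one method per binding that has an output model. A minimal sketch, assuming a hypothetical helper SerialiseMethod() that receives the binding name and model type, and assuming the Serialize<Type>Ptr naming holds for every model (inferred from SerializeKNNModelPtr, not verified for all bindings):

```cpp
#include <string>

// Hypothetical sketch: R source for a serialise_mlpack() S3 method for one
// binding. Only meaningful for bindings with an output_model; assumes the
// serializer is named Serialize<modelType>Ptr.
std::string SerialiseMethod(const std::string& bindingName,
                            const std::string& modelType)
{
  return "serialise_mlpack." + bindingName + " <- function(object, ...) {\n"
         "  mlpack:::Serialize" + modelType + "Ptr(object$output_model)\n"
         "}\n";
}
```

For example, SerialiseMethod("knn", "KNNModel") reproduces the three-line serialise_mlpack.knn method above. Bindings without a model, such as pca(), would simply fall through to serialise_mlpack.default.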

@rcurtin
Member

rcurtin commented Sep 15, 2025

@cgiachalis @eddelbuettel @coatless are we happy with the change here? From the mlpack/C++ side it looks good to me, and I agree in principle that returning a class can be useful for exactly the inheritance-based reasons proposed.

I have a couple of comments code-wise; a bit higher up in print_R.cpp there is this line:

  if (outputOptions.size() > 0)
    cout << "#' @return A list with several components:" << endl;

So, that's what's used to print the documentation about what the output is. But now we are returning a class, not a list; does it make sense in the R lexicon to change this to something like "A class with several members:"? Here is an example of how that documentation actually gets used (from decision_tree.R):

#' @return A list with several components:
#' \item{output_model}{Output for trained decision tree
#'   (DecisionTreeModel).}
#' \item{predictions}{Class predictions for each test point (integer
#'   row).}
#' \item{probabilities}{Class probabilities for each test point (numeric
#'   matrix).}

We should also add a note to HISTORY.md about this change. I think it would be fine to add serialization functions to the class in a future PR, or this one is fine too if we want to do it here.

@cgiachalis

But now we are returning a class, not a list;

We're still returning a list structure, but we're prepending extra class names (subclasses); as @coatless wrote, the hierarchy will be binding > mlpack > list.

# before
class(res)
# [1]  "list"

# after
class(res)
# [1] "knn" "mlpack"  "list"

@rcurtin
Member

rcurtin commented Sep 15, 2025

I agree in the technical sense, as a list is indeed a class, but wouldn't it make the documentation more accurate to point out that what is being returned is a class?

@cgiachalis

Yes, definitely. One suggestion is "A list object of class mlpack with several components."

@coatless
Contributor Author

@rcurtin, this comes down to R semantics. While we're now returning an object with several classes instead of a bare list, users still interact with it as a list: accessing components via model$output_model, etc.

Given that, I think we could modify the documentation to:

"A list with several components that has the class attributes of "${MODEL}" and mlpack_model:"

This accurately describes both the user-facing list interface and the added class structure. The ${MODEL} would get substituted with the actual model type (like knn), making it clear what class attributes have been added to the object while emphasizing users still work with it as a list. I'm also suggesting moving away from mlpack to maybe mlpack_model for better differentiation (and because mlpack might be too general here).

Agree on the HISTORY.md note. It's worth documenting that we're now adding these class attributes to returned objects.

@cgiachalis

cgiachalis commented Sep 15, 2025

I'm also suggesting moving away from mlpack to maybe mlpack_model for better differentiation (and because mlpack might be too general here).

We might want to keep mlpack because we're adding classes to all exported objects:

cout << "  class(out) <- c(\"" << bindingName << "\", \"mlpack\", \"list\")" << endl;

So the output of a function like mlpack::preprocess_binarize() is not an mlpack_model, and its class structure will be

"preprocess_binarize" "mlpack" "list"

@eddelbuettel
Contributor

(Just wanted to note that for faster iteration/testing we could work all this out in an ad-hoc throw-away package with just R code that adds S3 wrappers around what the real mlpack package does. Once finalised, we port this over and kill the one-off ad-hoc package. Better?)

@cgiachalis

cgiachalis commented Sep 15, 2025

@eddelbuettel
Contributor

eddelbuettel commented Sep 15, 2025

That was fast. I was thinking of something even skinnier, i.e.

x <- mlpack::mlpack_some_model_here(some args)  # bootstrap an object
class(x) <- c("foo", "bar", "mlpack", "list") # as needed

But I did not make myself very clear. Sorry 'bout that.

@cgiachalis

That's OK. But you can peruse and see what has been generated.

@rcurtin
Member

rcurtin commented Sep 15, 2025

Sounds good, whenever things are ready for a review here I will happily oblige. 👍 The deeper R details will be lost on me.

@cgiachalis

cgiachalis commented Sep 16, 2025

PR Summary

This PR modifies src/mlpack/bindings/R/print_R.cpp so that all mlpack results gain the class attributes <fn name> and 'mlpack' (followed by 'list').

Examples:

 # Serialisable model
 mlpack::adaboost()

 # Add binding name as class to the output.
 class(out) <- c("adaboost", "mlpack", "list")

 # Non-serialisable model
 mlpack::pca()

 # Add binding name as class to the output.
 class(out) <- c("pca", "mlpack", "list")

For more outputs, see the package generated by the mlpack build process here.

In the above examples, adaboost() has model type "AdaBoostModel", which can be accessed via the ModelType attribute of the model pointer. On the other hand, pca() has no ModelType, so we couldn't generate (via C++) a subclass based on it; we therefore resort to BINDING_NAME, which keeps the C++ side much simpler: a one-liner.

A quick demonstration: it is now easier to create an S3 method to extract the model type (if any) from an mlpack object:

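 # model_type generic (needed so the method below can be dispatched)
 model_type <- function(object, ...) UseMethod("model_type")

 # model_type method
 model_type.mlpack <- function(object, ...) {
   attr(object$output_model, "type")
 }

 # -- -- --

 res <- mlpack::knn(query = x, reference = x, k = 3)

 model_type(res)
 # [1] "KNNModel"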
 

Documentation (todo)

As per @coatless's suggestion, with a small change:

`"A list with several components that has the class attributes of "${BINDING_NAME}" and "mlpack":"`
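On the generator side this means changing the existing "#' @return A list with several components:" line. A minimal sketch, assuming a hypothetical helper ReturnDocLine() that keeps the existing guard (the line is only printed when the binding has output options) and substitutes the binding name into the proposed wording:

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: roxygen @return line using the proposed wording.
// Mirrors the existing guard in print_R.cpp (outputOptions.size() > 0):
// bindings without outputs get no @return line at all.
std::string ReturnDocLine(const std::string& bindingName, std::size_t numOutputs)
{
  if (numOutputs == 0)
    return "";
  return "#' @return A list with several components that has the class "
         "attributes of \"" + bindingName + "\" and \"mlpack\":";
}
```

For knn this yields: #' @return A list with several components that has the class attributes of "knn" and "mlpack": and the existing \item{...} lines would follow unchanged.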

Unit test (todo)

res <- mlpack::knn(query = x, reference = x, k = 3)
# exact = TRUE checks the full class vector, not just inheritance from any one class
testthat::expect_s3_class(res, c("knn", "mlpack", "list"), exact = TRUE)

@eddelbuettel
Contributor

I am losing my marbles. The PR is about the file src/mlpack/bindings/R/print_R.cpp, but no such file is in the one-off repo we created to look at / extend the PR. Am I forgetting how the code generator works?

@cgiachalis

Checking the CRAN mlpack archives, it seems that the src/mlpack/bindings/R/print_R.cpp/.hpp files have never been included.

Is it because they're not listed in CMakeLists.txt?

set(BINDINGS_SOURCES
"${CMAKE_CURRENT_SOURCE_DIR}/get_type.hpp"
"${CMAKE_CURRENT_SOURCE_DIR}/print_doc.hpp"
"${CMAKE_CURRENT_SOURCE_DIR}/print_doc_functions.hpp"
"${CMAKE_CURRENT_SOURCE_DIR}/print_doc_functions_impl.hpp"
"${CMAKE_CURRENT_SOURCE_DIR}/print_input_param.hpp"
"${CMAKE_CURRENT_SOURCE_DIR}/get_param.hpp"
"${CMAKE_CURRENT_SOURCE_DIR}/get_printable_param.hpp"
"${CMAKE_CURRENT_SOURCE_DIR}/get_r_type.hpp"
"${CMAKE_CURRENT_SOURCE_DIR}/print_input_processing.hpp"
"${CMAKE_CURRENT_SOURCE_DIR}/print_output_processing.hpp"
"${CMAKE_CURRENT_SOURCE_DIR}/print_serialize_util.hpp"
"${CMAKE_CURRENT_SOURCE_DIR}/mlpack_main.hpp"
"${CMAKE_CURRENT_SOURCE_DIR}/R_option.hpp"

@eddelbuettel
Contributor

Maybe because CRAN complains about use of std::cout which we could alleviate via use of Rcpp::Rcout as usual?

@rcurtin
Member

rcurtin commented Sep 17, 2025

No worries on the marbles, mine are long gone too. Quick refresher:

  • The mlpack R package is completely generated from the mlpack sources.
  • The file print_R.cpp is what's actually run to generate each .R file in the mlpack R package.
  • So, if we want changes to those .R files, we have to modify print_R.cpp here.
  • But, in the package we upload to CRAN, there is no need for the code that actually generated those .R files, so print_R.cpp is not included.

So, whatever you are doing in the test package, I assume you will get the R files looking like you want, and then following that we will make the appropriate changes here such that the generated R files match the hand-crafted ones.

@github-actions github-actions bot closed this Nov 1, 2025
@coatless coatless removed the s: stale label Nov 1, 2025
@coatless coatless self-assigned this Nov 1, 2025
@coatless coatless reopened this Nov 1, 2025
@coatless
Contributor Author

coatless commented Nov 1, 2025

I think this should largely be fine as-is. The main component is tweaking the underlying class names, e.g.

# Present
class(out) <- c("list")

To:

# Proposed
class(out) <- c("adaboost", "mlpack", "list")

The main possible change would be namespacing the model name and splitting this into a class for the specific binding, e.g.

# Alternative Proposal.
class(out) <- c("mlpack_adaboost", "mlpack_model_binding", "list")

Again, to reiterate, it's fine to keep the "Proposed" version that is contained in this PR.
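Either scheme stays a one-liner on the generator side. A minimal sketch, assuming a hypothetical helper ClassVector() with a flag that selects between the "Proposed" and "Alternative Proposal" class vectors above:

```cpp
#include <string>

// Hypothetical sketch: R class vector for a binding's output under either
// the "Proposed" scheme (binding, "mlpack", "list") or the namespaced
// "Alternative Proposal" (mlpack_<binding>, "mlpack_model_binding", "list").
std::string ClassVector(const std::string& bindingName, bool namespaced)
{
  if (namespaced)
    return "c(\"mlpack_" + bindingName + "\", \"mlpack_model_binding\", \"list\")";
  return "c(\"" + bindingName + "\", \"mlpack\", \"list\")";
}
```

The generator would then print "  class(out) <- " followed by ClassVector(bindingName, ...), so switching schemes later only touches this helper.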

@cgiachalis

Namespacing is a good idea!

@rcurtin
Member

rcurtin commented Nov 3, 2025

Is this one ready and waiting for review, or is there more to do? Sorry if I should have reviewed it and dropped the ball.
