[RFC] Always convert lists of lists of numbers to numpy arrays during input validation. #24745

### Describe the workflow you want to enable

Transformers and Estimators accept lists of lists of numbers as valid values for inputs like `X`.

Yet a list of lists of numbers is an inconvenient structure when it comes to accessing basic attributes of the dataset (like the shape and the dtype, which numpy arrays expose) or to reaching the best performance (e.g. being able to use Cython implementations, which only operate on contiguous buffers of memory).
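As a small illustration of the point above (a sketch using only plain `numpy`, not scikit-learn code), a list of lists exposes neither `shape` nor `dtype`, while the converted array does and is stored in one contiguous buffer:

```python
import numpy as np

X_list = [[1, 2, 3], [4, 5, 6]]

# A plain list of lists has no `shape` or `dtype` attribute.
has_shape = hasattr(X_list, "shape")
has_dtype = hasattr(X_list, "dtype")

# After conversion both attributes exist, and the data lives in a
# single C-contiguous buffer that compiled code can operate on.
X_arr = np.asarray(X_list)
shape = X_arr.shape
is_contiguous = X_arr.flags["C_CONTIGUOUS"]
```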

Also, lists of lists are mostly used for simple examples (such as doctests) and are unlikely to be used in practice.

### Describe your proposed solution

I propose changing input validation to always convert lists of lists of numbers to their natural numpy array equivalent.

In this context:
 - lists of lists of Python `int` will be converted to 2D numpy arrays of `np.int64`
 - lists of lists of Python `float` will be converted to 2D numpy arrays of `np.float64`
 - a `RuntimeError` will be raised if leaf elements aren't numbers
 - a `RuntimeError` will be raised if inner lists have different lengths (the case of ragged arrays)
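The rules above could be sketched roughly as follows. This is only an illustration: `convert_list_of_lists` is a hypothetical helper, not an existing scikit-learn function, and sending mixed `int`/`float` inputs to `np.float64` is an assumption the proposal does not spell out.

```python
import numbers

import numpy as np


def convert_list_of_lists(X):
    """Hypothetical helper sketching the proposed conversion rules."""
    # Anything that is not a list of lists is left to the existing validation.
    if not isinstance(X, list) or not all(isinstance(row, list) for row in X):
        return X

    # Ragged input: inner lists must all have the same length.
    if len({len(row) for row in X}) > 1:
        raise RuntimeError("Inner lists have different lengths (ragged input).")

    # Every leaf element must be a real number.
    leaves = [value for row in X for value in row]
    if not all(isinstance(value, numbers.Real) for value in leaves):
        raise RuntimeError("All leaf elements must be numbers.")

    # All-integer input maps to int64; anything else maps to float64
    # (assumption: mixed int/float input is promoted to float64).
    if all(isinstance(value, numbers.Integral) for value in leaves):
        return np.array(X, dtype=np.int64)
    return np.array(X, dtype=np.float64)
```

For example, `convert_list_of_lists([[1, 2], [3, 4]])` would give a `(2, 2)` array of `np.int64`, while `convert_list_of_lists([[1, 2], [3]])` would raise a `RuntimeError`.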

Converting lists of lists to numpy arrays might come with some runtime cost and maintenance complexity.

Changes would mostly need to be made in:

 - `BaseEstimator._validate_data`:
    https://github.com/scikit-learn/scikit-learn/blob/1dc23d7a1a798151a45ce1d72954821d61728411/sklearn/base.py#L453-L460
 
 - `sklearn.utils.check_array`:
   https://github.com/scikit-learn/scikit-learn/blob/7b0a16206558b0297a7f38a5a17fe06970a45894/sklearn/utils/validation.py#L629-L644

### Describe alternatives you've considered, if relevant

Continue supporting lists of lists of numbers and introduce utility functions to get basic attributes of datasets with this structure.
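For comparison, such utility functions might look like the sketch below. Both `shape_of` and `dtype_of` are hypothetical names used only for illustration, and they assume the input is already known to be a list of lists of numbers:

```python
def shape_of(X):
    """Hypothetical helper: (n_rows, n_cols) of a rectangular list of lists."""
    n_rows = len(X)
    n_cols = len(X[0]) if n_rows else 0
    if any(len(row) != n_cols for row in X):
        raise ValueError("Inner lists have different lengths.")
    return (n_rows, n_cols)


def dtype_of(X):
    """Hypothetical helper: `int` if every leaf is an int, otherwise `float`."""
    leaves = [value for row in X for value in row]
    return int if all(isinstance(value, int) for value in leaves) else float
```

Every such helper has to walk the nested lists in Python, which is part of the maintenance cost this alternative carries.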

### Additional context

Listing references as I find them:
 - [`43a61c4` (#22665)](https://github.com/scikit-learn/scikit-learn/pull/22665/commits/43a61c48c14192f99dfb5c49460d745c5377e0a5)
 - https://github.com/scikit-learn/scikit-learn/pull/23958/files#r1003462690

	def _validate_data(
	    self,
	    X="no_validation",
	    y="no_validation",
	    reset=True,
	    validate_separately=False,
	    **check_params,
	):
	    ...  # signature excerpt only; body omitted

	def check_array(
	    array,
	    accept_sparse=False,
	    *,
	    accept_large_sparse=True,
	    dtype="numeric",
	    order=None,
	    copy=False,
	    force_all_finite=True,
	    ensure_2d=True,
	    allow_nd=False,
	    ensure_min_samples=1,
	    ensure_min_features=1,
	    estimator=None,
	    input_name="",
	):
	    ...  # signature excerpt only; body omitted
