Merged
5 changes: 5 additions & 0 deletions doc/whats_new/v1.2.rst
@@ -220,6 +220,11 @@ Changelog
 - |Enhancement| :func:`datasets.dump_svmlight_file` is now accelerated with a
   Cython implementation, providing 2-4x speedups.
   :pr:`23127` by :user:`Meekail Zain <micky774>`

+- |Enhancement| Path-like objects, such as those created with pathlib, are now
+  allowed as paths in :func:`datasets.load_svmlight_file` and
+  :func:`datasets.load_svmlight_files`.
+  :pr:`19075` by :user:`Carlos Ramos Carreño <vnmabus>`
+
 :mod:`sklearn.decomposition`
 ............................
14 changes: 11 additions & 3 deletions sklearn/datasets/_svmlight_format_io.py
@@ -85,12 +85,15 @@ def load_svmlight_file(

     Parameters
     ----------
-    f : str, file-like or int
+    f : str, path-like, file-like or int
         (Path to) a file to load. If a path ends in ".gz" or ".bz2", it will
         be uncompressed on the fly. If an integer is passed, it is assumed to
         be a file descriptor. A file-like or file descriptor will not be closed
         by this function. A file-like object must be opened in binary mode.

+        .. versionchanged:: 1.2
+           Path-like objects are now accepted.
+
     n_features : int, default=None
         The number of features to use. If None, it will be inferred. This
         argument is useful to load several files that are subsets of a
@@ -182,8 +185,10 @@ def get_data():
 def _gen_open(f):
     if isinstance(f, int):  # file descriptor
         return io.open(f, "rb", closefd=False)
+    elif isinstance(f, os.PathLike):
+        f = os.fspath(f)
     elif not isinstance(f, str):
-        raise TypeError("expected {str, int, file-like}, got %s" % type(f))
+        raise TypeError("expected {str, int, path-like, file-like}, got %s" % type(f))

     _, ext = os.path.splitext(f)
     if ext == ".gz":
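
The dispatch in this hunk can be tried outside scikit-learn. The following is a minimal standalone sketch, not the library's implementation: `open_source` is a hypothetical name, and the `.bz2` and plain-file branches are assumptions based on the docstring, since the hunk is cut off after the `.gz` check.

```python
import bz2
import gzip
import io
import os
import pathlib
import tempfile


def open_source(f):
    """Hypothetical helper: open a str, path-like, or file descriptor in binary mode."""
    if isinstance(f, int):  # file descriptor: leave it open for the caller
        return io.open(f, "rb", closefd=False)
    if isinstance(f, os.PathLike):
        f = os.fspath(f)  # e.g. pathlib.Path -> str
    if not isinstance(f, str):
        raise TypeError(
            "expected {str, int, path-like, file-like}, got %s" % type(f)
        )

    # Pick a decompressor from the file extension (assumed branches).
    _, ext = os.path.splitext(f)
    if ext == ".gz":
        return gzip.open(f, "rb")
    if ext == ".bz2":
        return bz2.BZ2File(f, "rb")
    return open(f, "rb")


# Usage: a pathlib.Path and its string form should yield the same bytes.
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "sample.txt.gz"
    with gzip.open(path, "wb") as fh:
        fh.write(b"1 1:0.5\n")
    with open_source(path) as fh:
        from_path = fh.read()
    with open_source(str(path)) as fh:
        from_str = fh.read()
```

Converting with `os.fspath` before the `str` check means the extension handling that follows only ever sees a plain string, so the rest of the function needs no changes.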
@@ -249,13 +254,16 @@ def load_svmlight_files(

     Parameters
     ----------
-    files : array-like, dtype=str, file-like or int
+    files : array-like, dtype=str, path-like, file-like or int
         (Paths of) files to load. If a path ends in ".gz" or ".bz2", it will
         be uncompressed on the fly. If an integer is passed, it is assumed to
         be a file descriptor. File-likes and file descriptors will not be
         closed by this function. File-like objects must be opened in binary
         mode.

+        .. versionchanged:: 1.2
+           Path-like objects are now accepted.
+
     n_features : int, default=None
         The number of features to use. If None, it will be inferred from the
         maximum column index occurring in any of the files.
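
To illustrate what "inferred from the maximum column index occurring in any of the files" means, here is a rough sketch. It is not scikit-learn's parser: `max_feature_index` is a hypothetical helper, the sample lines are made up, and indices are assumed to be one-based, so the largest index equals the feature count.

```python
def max_feature_index(lines):
    """Return the largest feature index in svmlight-style lines, or -1 if none."""
    largest = -1
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        for token in line.split()[1:]:  # the first token is the label
            if token.startswith("qid:"):
                continue  # query ids are not features
            index, _value = token.split(":", 1)
            largest = max(largest, int(index))
    return largest


# Two made-up files covering different subsets of the feature space.
file_a = ["1 1:0.5 3:1.0", "-1 2:2.0"]
file_b = ["1 5:0.1  # only this file uses feature 5"]

# Taking the maximum over all files gives every matrix the same width.
shared_n_features = max(max_feature_index(f) for f in (file_a, file_b))
```

Loading `file_a` on its own would suggest three features. Taking the maximum across both files is what keeps the resulting matrices column-compatible.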
12 changes: 11 additions & 1 deletion sklearn/datasets/tests/test_svmlight_format.py
@@ -11,7 +11,7 @@
 import pytest

 from sklearn.utils._testing import assert_array_equal
-from sklearn.utils._testing import assert_array_almost_equal
+from sklearn.utils._testing import assert_array_almost_equal, assert_allclose
 from sklearn.utils._testing import fails_if_pypy

 import sklearn
@@ -89,6 +89,16 @@ def test_load_svmlight_file_fd():
             os.close(fd)


+def test_load_svmlight_pathlib():
+    # test loading from a path-like object (pathlib.Path)
+    with resources.path(TEST_DATA_MODULE, datafile) as data_path:
+        X1, y1 = load_svmlight_file(str(data_path))
+        X2, y2 = load_svmlight_file(data_path)
+
+    assert_allclose(X1.data, X2.data)
+    assert_allclose(y1, y2)
+
+
 def test_load_svmlight_file_multilabel():
     X, y = _load_svmlight_local_test_file(multifile, multilabel=True)
     assert y == [(0, 1), (2,), (), (1, 2)]
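
The new test only exercises `pathlib.Path`, but the `isinstance(f, os.PathLike)` check in `_gen_open` should cover any object that implements `__fspath__`. The sketch below shows that protocol using only the standard library. `DataFile` and the path string are hypothetical, and nothing here calls scikit-learn.

```python
import os
import pathlib


class DataFile:
    """Hypothetical wrapper that implements the os.PathLike protocol."""

    def __init__(self, location):
        self._location = location

    def __fspath__(self):
        # os.fspath() calls this to obtain the underlying str path.
        return self._location


wrapped = DataFile("data/train.svmlight")

# os.PathLike recognizes any class that defines __fspath__.
is_path_like = isinstance(wrapped, os.PathLike)
as_string = os.fspath(wrapped)

# pathlib objects follow the same protocol.
from_pathlib = os.fspath(pathlib.PurePosixPath("data") / "train.svmlight")

# A plain str is returned unchanged by os.fspath but is not os.PathLike,
# which is why the diff keeps a separate isinstance(f, str) branch.
plain_is_path_like = isinstance("data/train.svmlight", os.PathLike)
```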