ENH Add array API for PolynomialFeatures #31580
Conversation
Some benchmarks (Kaggle notebook):

Avg fit time for numpy: 0.007334113121032715
Avg fit time for torch cuda: 0.050480985641479494

from time import time

import numpy as np
import torch as xp
from tqdm import tqdm

from sklearn._config import config_context
from sklearn.preprocessing._polynomial import PolynomialFeatures

X_np = np.random.rand(100000, 100)
X_xp_cuda = xp.asarray(X_np, device="cuda")

# Numpy benchmarks
fit_times = []
transform_times = []
for _ in tqdm(range(10), desc="Numpy Flow"):
    start = time()
    pf_np = PolynomialFeatures(degree=2)
    pf_np.fit(X_np)
    fit_times.append(time() - start)

    start = time()
    pf_np.transform(X_np)
    transform_times.append(time() - start)

avg_fit_time = sum(fit_times) / 10
avg_transform_time = sum(transform_times) / 10
print(f"Avg fit time for numpy: {avg_fit_time}")
print(f"Avg transform time for numpy: {avg_transform_time}")

# Torch cuda benchmarks
fit_times = []
transform_times = []
for _ in tqdm(range(10), desc="Torch cuda Flow"):
    with config_context(array_api_dispatch=True):
        start = time()
        pf_xp = PolynomialFeatures(degree=2)
        pf_xp.fit(X_xp_cuda)
        fit_times.append(time() - start)

        start = time()
        pf_xp.transform(X_xp_cuda)
        transform_times.append(time() - start)

avg_fit_time = sum(fit_times) / 10
avg_transform_time = sum(transform_times) / 10
print(f"Avg fit time for torch cuda: {avg_fit_time}")
print(f"Avg transform time for torch cuda: {avg_transform_time}")

Local system with MPS (just changed the device and the dtype to float32 in the above code):

Avg fit time for numpy: 0.0025035619735717775
Avg fit time for torch mps: 0.16063039302825927

I don't think we can expect any improvement (and actually some slowdown) in the fit time, because I did not have to change anything in the fit part to support the array API, which means fit can't really benefit from it. The transform times, however, are significantly better.

CC: @ogrisel @lesteve @lucyleeow @StefanieSenger for reviews
The code coverage warning can be ignored because it is related to a special case for MPS devices.
Thanks @OmarManzoor. This looks good to me.
I edited the benchmark as follows to insert a blocking call on the resulting array to force CUDA synchronization. We still get a 36x speed-up on this data and runtime!
from time import time

import numpy as np
import torch as xp
from tqdm import tqdm

from sklearn._config import config_context
from sklearn.preprocessing._polynomial import PolynomialFeatures

X_np = np.random.rand(100000, 100)
X_xp_cuda = xp.asarray(X_np, device="cuda")

# Numpy benchmarks
fit_times = []
transform_times = []
n_iter = 10
for _ in tqdm(range(n_iter), desc="Numpy Flow"):
    start = time()
    pf_np = PolynomialFeatures(degree=2)
    pf_np.fit(X_np)
    fit_times.append(time() - start)

    start = time()
    pf_np.transform(X_np)
    transform_times.append(time() - start)

avg_fit_time_numpy = sum(fit_times) / n_iter
avg_transform_time_numpy = sum(transform_times) / n_iter
print(f"Avg fit time for numpy: {avg_fit_time_numpy:.3f}")
print(f"Avg transform time for numpy: {avg_transform_time_numpy:.3f}")

# Torch cuda benchmarks
fit_times = []
transform_times = []
for _ in tqdm(range(n_iter), desc="Torch cuda Flow"):
    with config_context(array_api_dispatch=True):
        start = time()
        pf_xp = PolynomialFeatures(degree=2)
        pf_xp.fit(X_xp_cuda)
        fit_times.append(time() - start)

        start = time()
        # Reading one scalar blocks until the CUDA kernels finish, so the
        # measured transform time includes device synchronization.
        float(pf_xp.transform(X_xp_cuda)[0, 0])
        transform_times.append(time() - start)

avg_fit_time_cuda = sum(fit_times) / n_iter
avg_transform_time_cuda = sum(transform_times) / n_iter
print(
    f"Avg fit time for torch cuda: {avg_fit_time_cuda:.3f}, "
    f"speed-up: {avg_fit_time_numpy / avg_fit_time_cuda:.1f}x"
)
print(
    f"Avg transform time for torch cuda: {avg_transform_time_cuda:.3f} "
    f"speed-up: {avg_transform_time_numpy / avg_transform_time_cuda:.1f}x"
)
Numpy Flow: 100%|██████████| 10/10 [00:37<00:00, 3.70s/it]
Avg fit time for numpy: 0.008
Avg transform time for numpy: 3.695
Torch cuda Flow: 100%|██████████| 10/10 [00:01<00:00, 9.76it/s]
Avg fit time for torch cuda: 0.001, speed-up: 6.8x
Avg transform time for torch cuda: 0.100 speed-up: 36.8x
I think the supported_float_dtypes function could be simplified by leveraging the new inspection API. Otherwise, +1 for merge.
I also get a 5.5x speed-up over numpy using the MPS GPU on my M1 laptop (compared to your 25x speed-up on your MPS GPU).
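Regarding the simplification, one possible shape for supported_float_dtypes that leans on the standard __array_namespace_info__() inspection API could look roughly like the sketch below (illustrative only, not this PR's actual code; it assumes the namespace implements the 2023.12 inspection API):

def supported_float_dtypes(xp, device=None):
    # The inspection API only reports the dtypes this namespace provides
    # for the given device, so float16 is included or skipped automatically
    # instead of being special-cased per namespace.
    real_float_dtypes = xp.__array_namespace_info__().dtypes(
        device=device, kind="real floating"
    )
    # Keep the usual preference order: wider dtypes first.
    return tuple(
        real_float_dtypes[name]
        for name in ("float64", "float32", "float16")
        if name in real_float_dtypes
    )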
One more follow-up comment below.
Besides, the __sklearn_tags__ method should be updated to declare that this transformer supports array API inputs.
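A minimal sketch of what that could look like, assuming the estimator tags dataclass with an array_api_support flag available in recent scikit-learn versions (the subclass name is only for illustration; in the PR the method would live on PolynomialFeatures itself):

from sklearn.preprocessing import PolynomialFeatures

class ArrayAPIPolynomialFeatures(PolynomialFeatures):
    def __sklearn_tags__(self):
        # Start from the parent tags and flag array API support.
        tags = super().__sklearn_tags__()
        tags.array_api_support = True
        return tags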
        return (xp.float64, xp.float32, xp.float16)
    else:
        return (xp.float64, xp.float32)

    valid_float_dtypes.append(xp.float16)
Is this still needed? I think it can be wrong: some devices might not support float16 even when the namespace exposes it.
The array API specification does not include float16, which is why we have this condition: https://data-apis.org/array-api/latest/API_specification/data_types.html
kind="real floating", device=device | ||
) | ||
valid_float_dtypes = [] | ||
for dtype_key in ("float64", "float32"): |
Suggested change:
-    for dtype_key in ("float64", "float32"):
+    for dtype_key in ("float64", "float32", "float16"):
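Tying this to the float16 question above, the loop could also stay device-aware; a rough, illustrative sketch (not necessarily the final code in this PR), using numpy as a stand-in namespace so the snippet runs on its own:

import numpy as xp  # stand-in; any array API namespace would work here
device = None

device_dtypes = xp.__array_namespace_info__().dtypes(
    kind="real floating", device=device
)
valid_float_dtypes = []
for dtype_key in ("float64", "float32", "float16"):
    # Keep only the dtypes the namespace actually supports on this device,
    # so float16 is added where available and skipped otherwise.
    if dtype_key in device_dtypes:
        valid_float_dtypes.append(device_dtypes[dtype_key])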
Reference Issues/PRs
Towards #26024
What does this implement/fix? Explain your changes.
Adds array API support to preprocessing.PolynomialFeatures.
Any other comments?