Fix: change the dtype of self._kernel when input args have a different dtype #3034
Conversation
…t dtype
If both dtypes are not the torch default dtype, F.conv2d will fail because self._kernel is of the default dtype. This solves the issue by switching the dtype of self._kernel.
It may be clearer this way. But _kernel should always have the pytorch default dtype, so the two solutions are equivalent. Co-authored-by: vfdev <[email protected]>
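For reference, here is a minimal sketch of the kind of cast being discussed (the helper name is hypothetical, not ignite's actual implementation): either the kernel or the input can be converted so that both operands of F.conv2d share a dtype.

```python
import torch
import torch.nn.functional as F


def depthwise_gaussian(y_pred: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    # F.conv2d requires input and weight to share a dtype; if the input uses a
    # non-default dtype (e.g. float16 or float64), cast the precomputed kernel.
    if y_pred.dtype != kernel.dtype:
        kernel = kernel.to(dtype=y_pred.dtype)
    # the kernel is assumed here to have shape (C, 1, k, k) for a per-channel convolution
    return F.conv2d(y_pred, kernel, groups=y_pred.shape[1])
```

Equivalently, y_pred could be cast to the kernel's (default) dtype; both choices make F.conv2d see matching dtypes.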
@MarcBresson thanks for the fix! Can you also write a test here: https://github.com/pytorch/ignite/blob/master/tests/ignite/metrics/test_ssim.py? Let's create a separate test function like
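A possible shape for such a test (the function name and parametrization below are assumptions, not the exact test that landed in the PR):

```python
import pytest
import torch

from ignite.metrics import SSIM


@pytest.mark.parametrize("dtype", [torch.float32, torch.float64, torch.float16])
def test_ssim_variable_dtype(dtype):
    # float16 needs CUDA here: some ops used by SSIM.update lack CPU half kernels
    if dtype == torch.float16 and not torch.cuda.is_available():
        pytest.skip("float16 SSIM requires CUDA in this sketch")
    device = "cuda" if torch.cuda.is_available() else "cpu"

    y_pred = torch.rand(4, 3, 28, 28, device=device, dtype=dtype)
    y = y_pred * 0.8
    ssim = SSIM(data_range=1.0, device=device)
    ssim.update((y_pred, y))

    # compute() should not raise and should return a finite value
    assert torch.isfinite(torch.as_tensor(ssim.compute()))
```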
Nice idea to write the test! I just discovered that a few operations are not available in float16 on CPU. Working on a fix.
Just need a little help: how do I exclude a test from being run on CPU? I don't see any exclusion pattern in
And by the way, how is it possible that some tests failed while I haven't touched anything in this remote repo yet?
I think the simplest way to do that is to just call pytest:
GH CI is rather flaky, so some jobs are failing due to issues with distributed configs for the tests. Do not worry about that.
Some functions used in SSIM.update do not exist on CPU for the half type, so CPU tests are excluded.
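One common way to express that exclusion in pytest (a sketch; the PR's actual test layout may differ) is to attach a skip mark to the CPU-unsupported dtypes:

```python
import pytest
import torch

# Run half-precision cases only when a GPU is available: some ops used by
# SSIM.update have no CPU kernels for the half types.
needs_cuda = pytest.mark.skipif(
    not torch.cuda.is_available(), reason="half precision SSIM needs CUDA"
)


@pytest.mark.parametrize(
    "dtype",
    [
        torch.float32,
        torch.float64,
        pytest.param(torch.float16, marks=needs_cuda),
        pytest.param(torch.bfloat16, marks=needs_cuda),
    ],
)
def test_ssim_dtypes(dtype):
    ...  # body elided; see the dtype test sketch above
```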
Here is a little analysis of the difference between ignite and scikit-image. Code sample:

import torch
from ignite.metrics import SSIM
import tqdm
from skimage.metrics import structural_similarity as ski_ssim
import numpy as np
from matplotlib import pyplot as plt


def test_ssim(available_device, shape, dtype):
    y_pred = torch.rand(shape, device=available_device, dtype=dtype)
    y = y_pred * 0.8
    sigma = 1.5
    data_range = 1.0

    ssim = SSIM(data_range=data_range, sigma=sigma, device=available_device)
    ssim.update((y_pred, y))
    ignite_ssim = ssim.compute()

    if y_pred.dtype == torch.bfloat16:
        y_pred = y_pred.to(dtype=torch.float16)
    skimg_pred = y_pred.cpu().numpy()
    skimg_y = skimg_pred * 0.8

    skimg_ssim = ski_ssim(
        skimg_pred,
        skimg_y,
        win_size=y_pred.shape[0] - 1,
        sigma=sigma,
        channel_axis=1,
        data_range=data_range,
    )

    difference = ignite_ssim - skimg_ssim
    return difference


diff = {}
for device in [torch.device("cpu"), torch.device("cuda")]:
    available_dtypes = [torch.float32, torch.float64]
    if device == torch.device("cuda"):
        available_dtypes.extend((torch.bfloat16, torch.float16))
    for dtype in available_dtypes:
        diff[str(dtype) + " on " + str(device)] = []
        for _ in tqdm.tqdm(range(1000)):
            difference_ssim = test_ssim(device, (12, 3, 28, 28), dtype=dtype)
            diff[str(dtype) + " on " + str(device)].append(difference_ssim)

for key, values in diff.items():
    # they are too far apart, so we exclude them
    if key in ["torch.float16 on cuda", "torch.bfloat16 on cuda"]:
        continue
    plt.hist(values, bins=10, label=key)

axes = plt.gca()
axes.ticklabel_format(style="scientific", scilimits=(0, 3))
plt.title("SSIM difference between ignite and scikit-image")
plt.xlabel("difference value")
plt.ylabel("frequency")
plt.legend()
plt.show()
@MarcBresson can you run the following to fix the formatting issue:
Looks good to me, thanks for the PR @MarcBresson!
I'll merge it once the required CI jobs are done.
By the way, while checking your PR I saw in the actual code that we are doing rather strange things with the device of self._kernel. Would you like to tackle this problem?
Basically, self._kernel should respect self._device, but there may be a trade-off in running all padding and conv ops on cuda before moving to self._device. We may want to measure that and make an appropriate decision.
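If it helps, here is a rough way one could measure that trade-off (a sketch under assumptions; the padding mode and kernel shape are placeholders, not ignite's exact ops):

```python
import time

import torch
import torch.nn.functional as F


def time_depthwise_conv(x: torch.Tensor, kernel: torch.Tensor, runs: int = 100) -> float:
    # Time a padded depthwise convolution similar to what SSIM.update performs.
    pad = kernel.shape[-1] // 2
    if x.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(runs):
        padded = F.pad(x, (pad, pad, pad, pad), mode="reflect")
        F.conv2d(padded, kernel, groups=x.shape[1])
    if x.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs


x = torch.rand(12, 3, 256, 256)
k = torch.ones(3, 1, 11, 11) / 11 ** 2  # stand-in for the Gaussian kernel
print("cpu:", time_depthwise_conv(x, k))
if torch.cuda.is_available():
    print("cuda:", time_depthwise_conv(x.cuda(), k.cuda()))
```

Comparing the two timings (plus the cost of moving results back to self._device) would tell whether keeping the conv on cuda is worth it.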
@vfdev-5 you are welcome! Happy to help. I think that The
Well, this code: seems a bit weird, as only a kernel with ndims < 4 will have y_pred.device. And do we need to change
Here is a new analysis on dtype. It compares the precision of float16, bfloat16 and float64 against float32. The error is signed (not absolute), so we can see whether a dtype overshoots compared to float32.

import tqdm
import torch
from ignite.metrics import SSIM
from matplotlib import pyplot as plt


def test_ssim_dtype_error(available_device, shape, dtype):
    y_pred_float32 = torch.rand(shape, device=available_device, dtype=torch.float32)
    y_float32 = y_pred_float32 * 0.8

    y_pred_typed = torch.clone(y_pred_float32)
    y_pred_typed = y_pred_typed.to(dtype=dtype)
    y_typed = y_pred_typed * 0.8

    sigma = 1.5
    data_range = 1.0

    ssim = SSIM(data_range=data_range, sigma=sigma, device=available_device)
    ssim.update((y_pred_float32, y_float32))
    ssim_float32 = ssim.compute()

    ssim_ty = SSIM(data_range=data_range, sigma=sigma, device=available_device)
    ssim_ty.update((y_pred_typed, y_typed))
    ssim_typed = ssim_ty.compute()

    difference = ssim_float32 - ssim_typed
    return difference


diff = {}
for device in [torch.device("cuda")]:
    available_dtypes = [torch.float32, torch.float64]
    if device == torch.device("cuda"):
        available_dtypes.extend((torch.bfloat16, torch.float16))
    for dtype in available_dtypes:
        diff[str(dtype) + " on " + str(device)] = []
        for _ in tqdm.tqdm(range(1000)):
            difference_ssim = test_ssim_dtype_error(device, (12, 3, 28, 28), dtype=dtype)
            diff[str(dtype) + " on " + str(device)].append(difference_ssim)

for dtype in diff:
    plt.hist(diff[dtype], bins=50)
    axes = plt.gca()
    axes.ticklabel_format(axis='x', style="scientific", scilimits=(0, 0))
    plt.title(f"SSIM difference between float32 and a {dtype}")
    plt.xlabel("difference value")
    plt.ylabel("frequency")
    plt.show()
When the SSIM metric is given a float16 (Half) or float64 (Double), it fails because self._kernel will be of the default type.
There are two ways around that: either ask the user to change the default dtype themselves, e.g. via torch.set_default_dtype(torch.float16), or change the dtype of self._kernel to match the inputs.
This pull request puts the second option in action. The result is now transparent to the user, who can use float16 or float64 on GPU directly.
A new test has been added to ensure the right behaviour of this change.
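For example, after this change a call like the following (hypothetical shapes and values) should run directly with half precision on GPU, without dtype errors:

```python
import torch

from ignite.metrics import SSIM

device = torch.device("cuda")
ssim = SSIM(data_range=1.0, device=device)

# half-precision inputs are now handled transparently
y_pred = torch.rand(4, 3, 64, 64, device=device, dtype=torch.float16)
y = y_pred * 0.9
ssim.update((y_pred, y))
print(ssim.compute())
```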
Check list: