NOTE: This package is no longer maintained. The API calls in this repo are exactly the same as those in torch-mfcc.
from conv_stft import STFT # same as below
from torch_mfcc import STFT
Author: Shimin Zhang
The code refers to the following repo:
An STFT/iSTFT implemented in PyTorch (Python 3) using 1D convolutions. There are two windowing modes, break and continue.
break: a Kaldi-like framing method.
When win_len and fft_len differ, each frame (len(frame) = win_len) is padded at its end with fft_len - win_len zeros, and the window (len(window) = win_len) is always multiplied element-wise with the frame before padding.
continue: a librosa-like framing method.
When win_len and fft_len differ, the signal is framed with a frame length of fft_len, and the window (len(window) = win_len) is zero-padded on both sides so that len(center_pad(window)) = fft_len.
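The difference between the two modes can be sketched in plain Python. This is only an illustration of the framing described above, not the package's implementation: the helper names hann, frame_break and frame_continue are hypothetical, center_pad is written here from its description, the periodic Hann window is an assumption, and any padding of the signal edges is ignored.

```python
import math

def hann(n):
    # Periodic Hann window of length n (assumed convention, for illustration only).
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def center_pad(window, fft_len):
    # Zero-pad the window on both sides so that its length becomes fft_len.
    left = (fft_len - len(window)) // 2
    right = fft_len - len(window) - left
    return [0.0] * left + list(window) + [0.0] * right

def frame_break(signal, win_len, fft_len, hop):
    # "break" (Kaldi-like): cut frames of win_len samples, apply the window,
    # then append fft_len - win_len zeros to each windowed frame.
    window = hann(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        chunk = signal[start:start + win_len]
        windowed = [s * w for s, w in zip(chunk, window)]
        frames.append(windowed + [0.0] * (fft_len - win_len))
    return frames

def frame_continue(signal, win_len, fft_len, hop):
    # "continue" (librosa-like): cut frames of fft_len samples and multiply
    # them by the window after it has been center-padded to fft_len.
    window = center_pad(hann(win_len), fft_len)
    frames = []
    for start in range(0, len(signal) - fft_len + 1, hop):
        chunk = signal[start:start + fft_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

signal = [1.0] * 16
brk = frame_break(signal, win_len=4, fft_len=8, hop=4)
cont = frame_continue(signal, win_len=4, fft_len=8, hop=4)
print(brk[0])   # window values first, zeros at the end
print(cont[0])  # window values in the middle, zeros on both sides
```

In both modes every frame has fft_len samples, so the same FFT size applies. What differs is how many signal samples each frame covers (win_len in break, fft_len in continue) and where the zeros end up.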
Install with pip: pip install conv_stft, or download this repo and run python setup.py install.
import torch
from conv_stft import STFT
import numpy as np
import librosa
import matplotlib.pyplot as plt
audio = librosa.load(librosa.util.example_audio_file(), duration=10.0, offset=30)[0]
device = 'cpu'
fft_len = 1024
win_hop = 256
win_len = 1024
window = 'hann'
audio = torch.FloatTensor(audio)
audio = audio.unsqueeze(0)
audio = audio.to(device)
stft = STFT(
fft_len=fft_len,
win_hop=win_hop,
win_len=win_len,
win_type=window,
).to(device)
magnitude, phase = stft.transform(audio, return_type='magphase') # 'magphase' or 'realimag'
output = stft.inverse(magnitude, phase, input_type='magphase') # 'magphase' or 'realimag'
output = output.cpu().data.numpy()[..., :]
audio = audio.cpu().data.numpy()[..., :]
print(np.mean((output - audio) ** 2)) # on order of 1e-15
Output of compare_stft.py:
To run the tests, clone this repo and run:
pip install -r requirements.txt
python -m pytest .