Post-processing data with Matlab®
Best Practice
TMR7 - 01/01/2014 - Valentin Chabaud
[email protected]
• Cleaning data
• Filtering data
• Extracting data’s frequency content
Introduction
• A trade-off between do-it-yourself philosophy, time spent on side tasks and
quality of the results
• Keeping the data as is and using default settings when filtering or computing the
power spectral density leads to inaccurate, or even misleading, results
which are hard to comment on. The fault is often wrongly attributed to
measurement uncertainties.
• Many possibilities in Matlab (various toolboxes and built-in functions of
various complexity and flexibility)
The following is only a suggestion of efficient methods to save time. Help
will preferably be provided for these methods. You are, however, free to
choose your own, as long as you keep a critical eye on the underlying
uncertainties.
Cleaning data
• Equipment limitations lead to:
• Erroneous data: Infinite (very large) or NaN (not a number).
• Missing data: recorded as 0. This can last for a fairly long period of time and thus
affects the results, even if the mean value is small or even 0.
• Acquired data should already be uniformly sampled (constant step
size). However, for safety, run the function:
    x=interp1(t0,x0,t)
  t0, x0: raw time and data arrays
  t=tstart:dt:tend: selected time array
  x: uniformly sampled selected data
This also cuts the data to the desired time span.
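A minimal sketch of this resampling step. The raw arrays t0 and x0 and the values of tstart, dt and tend are illustrative assumptions, not course data. Note that interp1 returns NaN for requested times outside the range of t0, so the selected span should lie inside the recorded one.

```matlab
% Hypothetical raw record with slightly irregular time stamps
t0 = [0 0.11 0.19 0.30 0.42 0.50 0.61 0.69 0.80 0.90 1.00]';
x0 = sin(2*pi*t0);                 % raw samples at the irregular instants

% Selected, uniformly spaced time array
tstart = 0.1; dt = 0.1; tend = 0.9;
t = (tstart:dt:tend)';

% Linear interpolation onto the uniform grid; also cuts to [tstart, tend]
x = interp1(t0, x0, t);

% Largest deviation of the new step size from dt
max_step_error = max(abs(diff(t) - dt));
```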
Cleaning data cont.
• The data can be cleaned by the function:
    xclean=clean_data(x,CrtSTD,CrtCONV)
  x: original data (uniformly sampled)
  CrtSTD: iterative outlier criterion
  CrtCONV: convergence criterion
  xclean: cleaned data
Play around with these criteria to get the desired result.
• Home-made function. Tested on Labtest 4 last year. Still, always check
the results! Modifications and suggestions are welcome.
• The clean_data function is found in the Resource section of the TMR7
webpage and at the end of this presentation.
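How clean_data works
$\mu$: mean value, $\sigma$: standard deviation, $\dot{x}$: derivative of the signal
If
  Signal: $|x_i - \mu_x| \ge \mathrm{CrtSTD} \cdot \sigma_x$
  or
  Derivative: $|\dot{x}_i| \ge \mathrm{CrtSTD} \cdot \sigma_{\dot{x}}$ or $|\dot{x}_i| \le \dfrac{\sigma_{\dot{x}}}{10 \cdot \mathrm{CrtSTD}}$
Then
  Replace $x_i$ by a linear interpolation of the nearest valid points.
Recompute $\mu_x$ and $\sigma_x$ and iterate until it has converged:
  $\dfrac{|\sigma_x^{(n)} - \sigma_x^{(n-1)}|}{\sigma_x^{(n-1)}} \le \mathrm{CrtCONV}$
Less error is induced by keeping corrupt points than by simply removing them!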
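A minimal single-pass sketch of the criteria above, on a synthetic signal. It is not the clean_data function itself: the signal, the spike and the run of zeros are illustrative assumptions, and the flagged points are replaced with interp1 on a logical mask. The mask is also shifted by one sample so that the first point after each flagged run is replaced too (an assumption made here so that the last sample of a run of zeros is caught).

```matlab
% Synthetic signal with one spike and one run of zeros (illustrative values)
t = (0:0.1:60)';
x = sin(2*pi*0.1*t);
x(100)     = 5;                    % erroneous point (outlier)
x(300:320) = 0;                    % period of missing data

CrtSTD = 5;                        % outlier criterion
d = diff(x); d = [d; d(end)];      % finite-difference derivative, padded to length(x)

bad = abs(x - mean(x)) >= CrtSTD*std(x) ...   % signal too far from the mean
    | abs(d) >= CrtSTD*std(d) ...             % derivative too large
    | abs(d) <= std(d)/(10*CrtSTD);           % derivative suspiciously small
bad = bad | [false; bad(1:end-1)];            % also replace the point after each run

% Replace flagged points by linear interpolation between valid neighbours
idx = (1:length(x))';
xclean = x;
xclean(bad) = interp1(idx(~bad), x(~bad), idx(bad), 'linear', 'extrap');
```

Plotting x and xclean on top of each other is a quick way to check that only the corrupt points were modified.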
Filtering data
Digital Butterworth filters:
• Most commonly used filters for this kind of application. One is already
in place in the data acquisition set up, removing very high frequencies.
• Described by a transfer function
  $H(z) = \dfrac{b(1) + b(2) z^{-1} + \dots + b(n+1) z^{-n}}{1 + a(2) z^{-1} + \dots + a(n+1) z^{-n}}$
• Designed by
    [b,a]=butter(order,wstar,'ftype')
  order: order of the filter
  wstar: normalized cutoff frequency, $w^* = \dfrac{\text{Cutoff frequency}}{\text{Nyquist frequency}}$,
    or interval of frequencies (bandpass filter)
  'ftype':
    'low'       low-pass filter, filters frequencies > cutoff freq.
    'high'      high-pass filter, filters frequencies < cutoff freq.
    'bandpass'  band-pass filter, filters frequencies outside the cutoff freq. interval.
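A short sketch of how the normalized cutoff frequency can be computed before calling butter (Signal Processing Toolbox). The sampling step and the cutoff frequency below are illustrative assumptions.

```matlab
dt    = 0.1;              % sampling step (s), illustrative
fs    = 1/dt;             % sampling frequency (Hz)
fnyq  = fs/2;             % Nyquist frequency (Hz)
fc    = 1;                % desired cutoff frequency (Hz), illustrative
wstar = fc/fnyq;          % normalized cutoff, must lie strictly between 0 and 1

[b,a] = butter(4, wstar, 'low');   % 4th-order low-pass filter

dc_gain = sum(b)/sum(a);  % H(z) evaluated at z = 1 (zero frequency)
```

For a band-pass filter, wstar would be a two-element vector [f1 f2]/fnyq.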
Filtering data cont.
How does it work?
A function, called gain, attenuates some parts of the frequency content of the signal.
  $x_{filt}(t) = \mathfrak{F}^{-1}\big( G(\omega) \cdot \mathfrak{F}(x(t))(\omega) \big)(t)$
  $x_{filt}$: filtered signal, $\mathfrak{F}^{-1}$: IFFT, $G$: gain, $\mathfrak{F}(x(t))$: FFT of the signal
• In the frequency domain, two different processes having the same frequency cannot be told apart.
  For filtering to be successful, undesired processes should have a frequency content distinct from
  that of the studied process.
• $G(\omega)$ must be continuous for the IFFT to exist.
  The attenuation evolves gradually with the frequency. A sharp cut in the frequency content is not possible
  with low-order filters.
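The FFT, gain, IFFT chain can be sketched directly with fft and ifft. This is an illustration of the principle only, not what butter and filtfilt do internally: the gain below is a real, Butterworth-type magnitude curve chosen as an assumption, and the test signal (a 1 Hz process plus a 20 Hz disturbance) is synthetic.

```matlab
dt = 0.01; N = 1000;
t  = (0:N-1)'*dt;
x  = sin(2*pi*1*t) + 0.5*sin(2*pi*20*t);   % 1 Hz process + 20 Hz disturbance

f  = (0:N-1)'/(N*dt);                      % FFT bin frequencies (Hz)
f  = min(f, 1/dt - f);                     % fold to |f| so the gain is symmetric

fc = 5; n = 4;                             % cut-off (Hz) and order, illustrative
G  = 1./sqrt(1 + (f/fc).^(2*n));           % gain: close to 1 below fc, decaying above

xfilt = real(ifft(G .* fft(x)));           % FFT -> gain -> IFFT
```

Raising n steepens the decay of G above fc, which is the frequency-domain counterpart of a higher filter order.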
Filtering data cont.
The filtering effect is best described by Bode diagrams of the filter's
continuous transfer function:
    [b,a]=butter(order,wstar,'low')
    figure()
    bode(d2c(tf(b,a,dt)))
  tf(b,a,dt): discrete transfer function
  d2c: discrete to continuous
[Figure: Bode diagram (gain and phase) with the cut-off frequency marked]
Slope in gain reduction:
• = «filtering strength»
• Increasing with the order
• Increasing with frequency (for a low-pass filter) from the cut-off frequency
Filtering induces a phase shift in the signal, increasing with the order.
The cut-off frequency should be lower than the undesired frequencies, but higher than the
frequencies of interest. Otherwise the signal will be badly filtered or the amplitude attenuated!
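The effect of the order on the gain can also be checked numerically. The sketch below uses freqz (Signal Processing Toolbox) instead of a Bode diagram, which is a substitution made here for illustration; the sampling step, cut-off frequency and orders are assumed values.

```matlab
dt   = 0.1;                 % sampling step (s), illustrative
fs   = 1/dt;                % sampling frequency (Hz)
fnyq = fs/2;                % Nyquist frequency (Hz)
fc   = 1;                   % cut-off frequency (Hz)
fpts = [fc 2*fc];           % evaluate the gain at fc and at 2*fc

orders  = [2 4 8];
gain_dB = zeros(numel(orders), numel(fpts));
for k = 1:numel(orders)
    [b,a] = butter(orders(k), fc/fnyq, 'low');
    H = freqz(b, a, fpts, fs);             % complex response at the frequencies fpts
    gain_dB(k,:) = 20*log10(abs(H(:))).';  % one row per order
end
```

The first column of gain_dB (gain at the cut-off frequency) is about -3 dB whatever the order, while the second column (gain at twice the cut-off frequency) drops as the order increases.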
Filtering data cont.
• A so-called “spectral gap” is needed for efficient filtering
= No energy in the spectrum around the cut-off frequency
If this is not the case, uncertainties will be introduced. Take note of
them!
• To avoid phase shift (improves readability in time domain plots), use:
    xfilt=filtfilt(b,a,x)
  b, a: digital filter coefficients
  x: original data (uniformly sampled)
  xfilt: filtered data
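A small comparison between one-pass filtering (filter) and forward-backward filtering (filtfilt), on an assumed 0.5 Hz test signal that lies well below the cut-off frequency and should therefore pass unchanged.

```matlab
dt = 0.01; t = (0:dt:20)';
x  = sin(2*pi*0.5*t);                  % 0.5 Hz test signal
fnyq = 1/(2*dt);                       % Nyquist frequency (50 Hz)
[b,a] = butter(4, 2/fnyq, 'low');      % 2 Hz cut-off, 4th order

x1 = filter(b,a,x);                    % one pass: output is delayed (phase shift)
x2 = filtfilt(b,a,x);                  % forward-backward: zero phase shift

mid = t > 5 & t < 15;                  % ignore start-up and end effects
err_filter   = max(abs(x1(mid) - x(mid)));
err_filtfilt = max(abs(x2(mid) - x(mid)));
```

Since filtfilt applies the filter twice, the gain is applied twice as well, so the effective attenuation above the cut-off frequency is that of a filter of twice the order.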
Extracting PSD (Power Spectral Density)
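2 approaches:
Upper: autocorrelation in time domain, then FFT
  $R_{xx}(\tau) = \int_{-\infty}^{+\infty} x(t)\,\bar{x}(t-\tau)\,dt$
  $S_{xx}(f) = \int_{-\infty}^{+\infty} R_{xx}(\tau)\,e^{-j 2\pi f \tau}\,d\tau$
Lower: FFT, then square
  $\mathfrak{F}_x(f) = \int_{-\infty}^{+\infty} x(t)\,e^{-j 2\pi f t}\,dt$
  $S_{xx}(f) = |\mathfrak{F}_x(f)|^2$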
• Upper: pcov function and variants. Sensitive to signal manipulations.
• Lower: pwelch function and variants. Sensitive to signal length.
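The "FFT, then square" approach can be sketched with the built-in fft alone. The scaling below (dt/N, then folding into a one-sided spectrum) is one common convention, chosen here so that the area under the PSD equals the variance of the signal; the test signal is an assumption.

```matlab
dt = 0.1; N = 2000;
t  = (0:N-1)'*dt;
x  = 2*sin(2*pi*0.1*t);                  % amplitude 2, frequency 0.1 Hz

X   = fft(x - mean(x));
Sxx = (dt/N) * abs(X(1:N/2+1)).^2;       % squared FFT, scaled to a density
Sxx(2:end-1) = 2*Sxx(2:end-1);           % fold negative frequencies (one-sided PSD)
f   = (0:N/2)'/(N*dt);                   % frequencies (Hz)

variance_from_psd = sum(Sxx)*(f(2)-f(1)); % area under the PSD
[~,ipk] = max(Sxx);
fpeak   = f(ipk);                         % frequency of the spectral peak
```

For this signal the variance is amplitude^2/2 = 2, which gives a simple check of the scaling.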
Extracting PSD cont.
pwelch is recommended. Default values often lead to inaccurate results.
    [Sxx,f]=pwelch(x,Window,Noverlap,NFFT,fs)
  x: time series. Uniformly sampled. Preferably minus mean value.
  Window: window length. The signal is segmented into «windows». The FFT is computed segment
    by segment, and the segments are then assembled (averaged) to give the PSD.
    The broader the window, the finer the spectrum. The narrower, the smoother. Adjust
    it to get a readable yet accurate spectrum (use values from NFFT/2 to NFFT/10).
  Noverlap: number of overlapping samples between windows. Window/10 is a good start. Change
    if you suspect inaccuracies.
  NFFT: total number of points used for computation. Use the whole length of the signal.
  fs: sampling frequency (Hz)
  Sxx: PSD
  f: frequencies (Hz)
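A sketch of a pwelch call (Signal Processing Toolbox) following the guidelines above, on an assumed test signal: a 0.1 Hz oscillation with a mean offset. The window length NFFT/4 is just one value inside the suggested range.

```matlab
dt = 0.1; N = 4000;
t  = (0:N-1)'*dt;
x  = sin(2*pi*0.1*t) + 3;          % 0.1 Hz oscillation plus mean offset

NFFT     = N;                      % whole length of the signal
Window   = round(NFFT/4);          % between NFFT/2 and NFFT/10
Noverlap = round(Window/10);       % suggested starting value
fs       = 1/dt;                   % sampling frequency (Hz)

[Sxx,f] = pwelch(x - mean(x), Window, Noverlap, NFFT, fs);

[~,ipk] = max(Sxx);
fpeak   = f(ipk);                  % should be close to 0.1 Hz
area    = trapz(f, Sxx);           % should be close to the variance (0.5)
```

Comparing the area under the PSD with var(x) is a cheap sanity check after changing Window or Noverlap.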
Extracting PSD cont.
pwelch may give inaccurate results for short signals with transients
(oscillations in low frequencies). Try to play around with the Noverlap
parameter.
If not successful, or if you want another source to check the result, try
[Sxx,f] = pcov(x,order,NFFT,fs)
The smoothness of the spectrum is adjusted through the order parameter.
pcov is altered by the cleaning process. Use it preferably on the raw signal
and take note of the introduced uncertainties.
Example: Irregular wave elevation
Generated from JONSWAP spectrum, then added:
• Erroneous and missing data
• Measurement noise (high frequencies)
• Transients (low frequency)
• Mean offset (zero frequency)
Example cont.
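% x0 (raw data), x (uniformly sampled data), w (frequencies in rad/s) and
% jonswap (target spectrum) are assumed to be defined beforehand.
duration=200;
dt=0.1;
t=0:dt:duration;
Nt=length(t);
xclean=clean_data(x,5,0.0001); %CrtSTD=5, CrtCONV=0.0001
cutoff=[0.3 4]/(2*pi); %Cut-off frequencies, converted from rad/s to Hz
fnyq=1/(2*dt); %Nyquist frequency (Hz)
[b,a]=butter(4,cutoff/fnyq,'bandpass');
xfilt=filtfilt(b,a,xclean); %Zero-phase filtering
%PSD of the raw and of the cleaned and filtered signal
[Sxx0,f0]=pwelch(x-mean(x),round(Nt/2),round(Nt/10),Nt,1/dt);
[Sxx,f]=pwelch(xfilt-mean(xfilt),round(Nt/2),round(Nt/10),Nt,1/dt);
figure(1)
plot(t,[x0 x xclean xfilt])
figure(2)
plot(w,jonswap,f0*2*pi,Sxx0,f*2*pi,Sxx) %Frequencies converted from Hz to rad/s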
Example cont.
[Figures: time series and spectra of the example signal. Annotations:]
• Outlier
• Period of missing data
• Cut-off frequencies
• Transients
• Noise
• Uncertainties due to short time series
• Band-pass filtering removes high and low (including offset = 0 rad/s) frequencies
• Incomplete spectral gap: slightly uncertain filtering of the transients
• Large spectral gaps allowing efficient filtering of the noise
Questions?
Now or later on, about this or anything related to the course, don’t hesitate.
[email protected]
Office G2.130
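clean_data.m
function x=clean_data(data,CrtSTD,CrtCONV)
% Replaces erroneous and missing points of the column vector data by
% linear interpolation. CrtSTD: outlier criterion. CrtCONV: convergence criterion.
x=data;
d=diff(x);
d=[d;d(end)];              % derivative, padded to the length of x
figure(3)
plot([data d])
std_prev=std(x)/CrtSTD;    % start value forcing a first pass
N=10;                      % number of padding points at each end
while abs((std(x)-std_prev)/std_prev)>CrtCONV
    std_prev=std(x);       % standard deviation before this pass
    sx=std(x);             % recompute the statistics of the signal
    mx=mean(x);
    d=diff(x);             % and of its derivative
    sd=std(d);
    d=[d;d(end)];
    flag=0;
    ind=[];                % [first last] indices of each interval to replace
    for i=1:length(x)
        if abs(x(i)-mx)>sx*CrtSTD || abs(d(i))>sd*CrtSTD || abs(d(i))<sd/CrtSTD*0.1
            if flag==0     % first corrupt point: open a new interval
                flag=1;
                ind=[ind;[i 0]];
            end
        else
            if flag==1     % first valid point: close the interval
                ind(end,2)=i;
                flag=0;
            end
        end
    end
    if isempty(ind)        % no corrupt point left
        break
    end
    if(ind(end,end))==0    % last interval runs to the end of the signal
        ind(end,end)=length(x);
    end
    y=[ones(N,1)*x(1);x;ones(N,1)*x(end)];   % pad both ends with constant values
    for i=1:size(ind,1)
        inttot=(1:length(y))';
        intrem=ind(i,1)+N:ind(i,2)+N;        % points to replace
        intfit=setdiff(inttot,intrem);       % points kept for the interpolation
        z=y(intfit);
        % f = fit(intfit, z, 'smoothingspline','SmoothingParam', 0.1);
        % y(intrem)=feval(f,intrem);
        y(intrem)=interp1(intfit,y(intfit),intrem);
        x=y(N+(1:length(x)));                % remove the padding
    end
end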