A Python 3 script for baseline correction, smoothing, processing and plotting of Raman spectra. Data must be in the format wavenumber [space] intensity. The baseline correction uses the asymmetrically reweighted penalized least squares smoothing algorithm (arPLS). The Whittaker filter is (by default) applied for smoothing. Optionally, the Savitzky–Golay filter can be used. Data of the processed spectra can be saved as "csv"-like data files in the format wavenumber [delimiter] intensity. An overlay spectrum (normalized and not normalized) and a normalized stacked spectrum of all processed spectra can be plotted. Plots can be saved as PNG bitmap files and as PDF.
If you use the arPLS algorithm to process your spectra, please cite:
"Baseline correction using asymmetrically reweighted penalized least squares smoothing"
Sung-June Baek, Aaron Park, Young-Jin Ahn, Jaebum Choo, Analyst 2015, 140, 250-257
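The following is a minimal sketch of the arPLS idea as described in the paper cited above, not the code used in raman-tl.py. The function name `arpls_baseline` and the parameters `ratio` and `max_iter` are assumptions chosen for illustration; only the default `lam = 1000` is taken from this README.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
from scipy.special import expit


def arpls_baseline(y, lam=1000.0, ratio=1e-6, max_iter=50):
    """Estimate a baseline with arPLS (illustrative sketch, hypothetical name)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference operator; lam * D'D penalizes a rough baseline.
    d2 = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    penalty = lam * (d2.T @ d2)
    w = np.ones(n)
    z = y.copy()
    for _ in range(max_iter):
        # Weighted penalized least squares: solve (W + lam * D'D) z = W y.
        system = (sparse.diags(w, 0, shape=(n, n)) + penalty).tocsc()
        z = spsolve(system, w * y)
        d = y - z
        negative = d[d < 0]
        if negative.size < 2 or negative.std() == 0:
            break
        m, s = negative.mean(), negative.std()
        # Points below the baseline keep weight 1; points above it are
        # down-weighted by a logistic function, so peaks stop pulling on z.
        w_new = np.where(d >= 0, expit(-2.0 * (d - (2.0 * s - m)) / s), 1.0)
        if np.linalg.norm(w - w_new) / np.linalg.norm(w) < ratio:
            break
        w = w_new
    return z
```

The corrected spectrum would then be `y - arpls_baseline(y, lam)`. A larger `lam` gives a stiffer baseline.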
The Whittaker algorithm (sometimes also referred to as Whittaker-Eilers smoother) is adapted from:
"A perfect smoother"
Paul H. C. Eilers, Anal. Chem. 2003, 75, 3631-3636
which is based on:
"On a new method of graduation"
E. T. Whittaker, Proceedings of the Edinburgh Mathematical Society 1922, 41, 63-75
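As a rough illustration of the Whittaker smoother, the sketch below solves the penalized least squares system (I + lambda D'D) z = y. The function name `whittaker_smooth` and the second-order difference penalty are assumptions; the script itself may use a different order or implementation. The default `lam = 1` matches the default given in this README.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def whittaker_smooth(y, lam=1.0, order=2):
    """Whittaker smoother (illustrative sketch, hypothetical name)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Build the difference operator of the requested order by
    # repeatedly subtracting neighbouring rows of the identity matrix.
    d = sparse.eye(n, format="csr")
    for _ in range(order):
        d = d[1:] - d[:-1]
    # Minimize |y - z|^2 + lam * |D z|^2, i.e. solve (I + lam * D'D) z = y.
    system = (sparse.eye(n) + lam * (d.T @ d)).tocsc()
    return spsolve(system, y)
```

With a very small `lam` the penalty term almost vanishes and the output stays close to the input, which is why a value such as 0.01 makes the filter practically ineffective.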
Required external modules: numpy, scipy, matplotlib
Start the script with:
python3 raman-tl.py filename
to open a single file.
Start the script with:
python3 raman-tl.py *.txt
to process all files with the extension .txt in the folder.
Under Windows, you have to open PowerShell first and start the script with:
python raman-tl.py (Get-ChildItem *.txt -Name)
to process all files with the extension .txt in the folder.
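The PowerShell call is needed because the Windows shell does not expand wildcards such as *.txt before passing them to Python. The sketch below shows how a Python script could, in principle, expand such patterns itself with the standard library. The helper `expand_patterns` is hypothetical and not part of raman-tl.py.

```python
import glob


def expand_patterns(arguments):
    """Expand wildcard patterns that the shell left untouched (hypothetical helper)."""
    filenames = []
    for argument in arguments:
        matches = sorted(glob.glob(argument))
        # Keep the argument unchanged when nothing matches,
        # e.g. for options or for names of files that do not exist.
        filenames.extend(matches if matches else [argument])
    return filenames
```

Called as `expand_patterns(sys.argv[1:])`, this would turn `*.txt` into the list of matching files on any platform.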
If the plot window appears empty, please resize it. Check "Known issues" for further details.
In all cases a file summary.pdf will be created which contains the following plots:
On the first page (from top to bottom):
- raw spectrum with baseline plot (red)
- baseline corrected spectrum
- smoothed / filtered spectrum with peak annotation
On the following page(s):
- smoothed / filtered spectrum with peak annotation
- not normalized and normalized overlay spectra and normalized stacked spectra if the -o option was invoked
- filename, required: filename(s), input file(s) in the format wavenumber [space] intensity
- -l N, optional: the lambda parameter for the arPLS algorithm (default is N = 1000)
- -p N:M, optional: invokes the Savitzky–Golay filter, N:M are the window length and polynomial order of the Savitzky–Golay filter
- -w N, optional: the lambda parameter for the Whittaker filter (default is N = 1)
- -xmin N, optional: start spectra at N wave numbers
- -xmax N, optional: end spectra at N wave numbers
- -t N, optional: threshold for peak detection, with N being the intensity (default is 5% of the maximum intensity)
- -m N, optional: multiply intensities with N (default is N = 1)
- -a N, optional: add or subtract N to / from wave numbers (default is N = 0)
- -i N, optional: add or subtract N to / from intensities (default is N = 0)
- -o, optional: show the normalized and not normalized overlay spectrum and the normalized stacked spectrum
- -n, optional: do not save summary.pdf
- -s p,d, optional: save P(NG) and / or D(ATA) files. The filenames are filename.png and / or filename-mod.dat for the single spectra. Data files are in the format wavenumber [delimiter] intensity. The delimiter can be set in the script. The default delimiter is [space]. The bitmaps summary.png, overlay.png, overlay-normalized.png and stack-normalized.png will be saved as well, overlay and stacked spectra only if the -o option has been invoked.
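To make the effect of the data-related options more concrete, here is a small sketch of how a two-column file could be read and modified. The function `load_and_prepare` and its parameter names are hypothetical, and the order of the operations (shift and scale first, then crop) is an assumption, not taken from the script.

```python
import io

import numpy as np


def load_and_prepare(source, xmin=None, xmax=None, multiply=1.0, add_x=0.0, add_y=0.0):
    """Read 'wavenumber [space] intensity' data and apply simple corrections."""
    data = np.loadtxt(source)           # whitespace-separated columns
    x = data[:, 0] + add_x              # comparable to -a (shift wave numbers)
    y = data[:, 1] * multiply + add_y   # comparable to -m and -i (assumed order)
    keep = np.ones(x.size, dtype=bool)
    if xmin is not None:
        keep &= x >= xmin               # comparable to -xmin
    if xmax is not None:
        keep &= x <= xmax               # comparable to -xmax
    return x[keep], y[keep]


# Usage with an in-memory file; a filename works the same way.
example = io.StringIO("100 1\n200 2\n300 3\n400 4\n")
x, y = load_and_prepare(example, xmin=150, xmax=350, multiply=2, add_x=10, add_y=1)
```

Because the wave numbers are shifted before cropping in this sketch, the selected range refers to the shifted axis.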
- The safe values for the arPLS parameter lambda start from 1000. Smaller values will give sharper peaks, but broader peaks become part of the baseline. Check the red baseline curve in the summary page.
- There is no way to turn off smoothing directly, but with two Savitzky–Golay parameters close together, e.g. -p3:2, or a small Whittaker parameter, e.g. -w0.01, filtering is ineffective.
- The window length for the Savitzky–Golay filter must be an odd number and the window length must be greater than the polynomial order.
- Polynomial-based filters, such as the Savitzky–Golay filter, sometimes tend to overshoot in negative regions, especially with sharp signals in the Raman spectrum. Reducing the filtering (see above) is one way to solve this problem.
- -xmin and / or -xmax values outside the experimental wave number range will result in errors or strange outputs.
- -a changes the range for -xmin and -xmax.
- -i and -m change the range for -t.
- The .dat file contains the data of the processed spectrum in the given range as it is shown in the plot for the single spectrum.
- The -o option invokes the overlay plots (normalized and not normalized) and the normalized stacked plot of all processed spectra. Normalized means that the intensities are divided by the maximum intensity in the given range. The maximum intensity becomes unity. The peak detection threshold for the normalized spectrum is 0.05 (can be changed in the script: normalized_height).
- The delimiter in the .dat file can be changed in the script: dat_delimiter = " " or dat_delimiter = " ; " for example.
- The files summary.pdf, summary.png, overlay.png, overlay-normalized.png and stack-normalized.png will be overwritten every time the script is started (with respective options) in the same directory. Single spectra with the same filenames will be overwritten as well. Rename them if you want to keep them.
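The remarks on the Savitzky–Golay parameters can be illustrated with `scipy.signal.savgol_filter`. The helper `parse_window_order` is hypothetical and only mirrors the N:M notation and the two constraints stated above. Whether the script validates its input this way is an assumption.

```python
import numpy as np
from scipy.signal import savgol_filter


def parse_window_order(text):
    """Split an 'N:M' string into (window_length, polyorder); hypothetical helper."""
    window, order = (int(part) for part in text.split(":"))
    if window % 2 == 0:
        raise ValueError("window length must be an odd number")
    if window <= order:
        raise ValueError("window length must be greater than the polynomial order")
    return window, order


# Synthetic noisy band, for illustration only.
rng = np.random.default_rng(1)
x = np.linspace(600.0, 800.0, 201)
y = np.exp(-0.5 * ((x - 700.0) / 3.0) ** 2) + rng.normal(0.0, 0.02, x.size)

# 3:2 fits a parabola through every three points, so the data pass through unchanged.
unchanged = savgol_filter(y, *parse_window_order("3:2"))
# 7:4 averages over a wider window and therefore alters the noisy data.
smoothed = savgol_filter(y, *parse_window_order("7:4"))
```

This is why parameters that are close together, such as 3:2, effectively switch the filter off.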
- Some of the peaks that are close together are not annotated. To change this, one can reduce peak_distance in the script, which is peak_distance = 8 by default.
- Peak annotations can be overprinted by other peak annotations in the overlay spectrum. There is no workaround for this. If annotations are in the same position, one can uncomment the instruction under #no dupes in the script, then only one annotation is displayed.
- The legend obscures part of the spectrum. If this is a problem, one can change the position of the legend in the script or keep the legend from overlapping the spectrum (try changing head_space_y_o_s in the script for the overlay and stacked spectra).
- Recent versions of Matplotlib and Python may encounter an issue where the plot window appears empty. As a temporary solution, resizing the window seems to solve the problem. This issue is currently unresolved: matplotlib/matplotlib#25768. However, with Matplotlib version 3.9.0, everything functions as expected.
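The interplay of normalization, peak threshold and peak distance can be sketched with `scipy.signal.find_peaks`. That the script uses this function is an assumption, and the helper names `normalize` and `detect_peaks` are hypothetical. The values 5% and 8 are the defaults named in this README.

```python
import numpy as np
from scipy.signal import find_peaks


def normalize(y):
    """Divide by the maximum intensity so that the highest point becomes 1."""
    return y / y.max()


def detect_peaks(x, y, threshold=None, peak_distance=8):
    """Return positions and heights of peaks at or above the threshold."""
    if threshold is None:
        threshold = 0.05 * y.max()   # default: 5% of the maximum intensity
    # 'distance' is counted in data points, not in wave numbers.
    index, _ = find_peaks(y, height=threshold, distance=peak_distance)
    return x[index], y[index]


# Synthetic spectrum: strong band at 650, weak band at 700, medium band at 750.
x = np.arange(600.0, 801.0)
y = (100.0 * np.exp(-0.5 * ((x - 650.0) / 3.0) ** 2)
     + 3.0 * np.exp(-0.5 * ((x - 700.0) / 3.0) ** 2)
     + 40.0 * np.exp(-0.5 * ((x - 750.0) / 3.0) ** 2))

positions, heights = detect_peaks(x, y)
positions_norm, _ = detect_peaks(x, normalize(y), threshold=0.05)
```

In this sketch the weak band at 700 stays below the threshold and is not reported. Of two maxima that are closer than `peak_distance` data points, only the higher one is kept, which corresponds to the missing annotations described above.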
Remember, under Windows you have to open PowerShell first and start the script with:
python raman-tl.py (Get-ChildItem *.txt -Name)
to open more than one file at once.
python3 raman-tl.py s*.txt
Process all files starting with s and the extension .txt.
Summary:
Single spectra:
python3 raman-tl.py sample-A.txt -xmin 600 -xmax 800 -spd
Process spectrum sample-A.txt in the range from xmin = 600 to xmax = 800 cm⁻¹ and save the PNG and DATA files (-spd).
Summary:
Single spectrum:
python3 raman-tl.py sample-A.txt -l10000 -p7:4 -xmin 600 -xmax 800 -t50 -spd
Process spectrum sample-A.txt with lambda = 10000 (baseline parameter), window length = 7 and polynomial order = 4 (smoothing parameters) in the range from xmin = 600 to xmax = 800 cm⁻¹, annotate peaks with intensities equal to or greater than t = 50 and save the PNG and DATA files (-spd).
Summary:
Single spectrum:
python3 raman-tl.py sample-A.txt sample-B.txt -o -xmin 200 -xmax 1100 -sp
Process spectra sample-A.txt and sample-B.txt in the range from xmin = 200 to xmax = 1100 cm⁻¹, plot the overlay and stacked spectra (-o) and save the PNG files (-sp).
Overlay spectrum (not normalized):
Overlay spectrum (normalized):
Stacked spectrum (normalized):