New Indicator for Optimal Preprocessing and Wavelength Selection of Near-Infrared Spectra

E. T. S. SKIBSTED, H. F. M. BOELENS, J. A. WESTERHUIS,* D. T. WITTE, and A. K. SMILDE
Process Analysis and Chemometrics, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands (E.T.S.S., H.F.M.B., J.A.W., A.K.S.); Pharmaceutical Site Maaloev, Novo Nordisk A/S, Novo Nordisk Park, 2760 Maaloev, Denmark (E.T.S.S.); and QA/ALC-A, N.V. Organon, Molenstraat 10, P.O. Box 20, 5340 BH Oss, The Netherlands (D.T.W.)

Received 19 June 2003; accepted 30 October 2003.
* Author to whom correspondence should be sent. E-mail: westerhuis@science.uva.nl.
APPLIED SPECTROSCOPY, Volume 58, Number 3, 2004, p. 264. 0003-7028/04/5803-0264$2.00/0. © 2004 Society for Applied Spectroscopy.

Preprocessing of near-infrared spectra to remove unwanted, i.e., non-related spectral variation and selection of informative wavelengths is considered to be a crucial step prior to the construction of a quantitative calibration model. The standard methodology when comparing various preprocessing techniques and selecting different wavelengths is to compare prediction statistics computed with an independent set of data not used to make the actual calibration model. When the errors of reference value are large, no such values are available at all, or only a limited number of samples are available, other methods exist to evaluate the preprocessing method and wavelength selection. In this work we present a new indicator (SE) that only requires blank sample spectra, i.e., spectra of samples that are mixtures of the interfering constituents (everything except the analyte), a pure analyte spectrum, or alternatively, a sample spectrum where the analyte is present. The indicator is based on computing the net analyte signal of the analyte and the total error, i.e., instrumental noise and bias. By comparing the indicator values when different preprocessing techniques and wavelength selections are applied to the spectra, the optimal preprocessing technique and the optimal wavelength selection can be determined without knowledge of reference values, i.e., it minimizes the non-related spectral variation. The SE indicator is compared to two other indicators that also use net analyte signal computations. To demonstrate the feasibility of the SE indicator, two near-infrared spectral data sets from the pharmaceutical industry were used, i.e., diffuse reflectance spectra of powder samples and transmission spectra of tablets. Especially in pharmaceutical spectroscopic applications, it is expected beforehand that the non-related spectral variation is rather large and it is important to remove it. The indicator gave excellent results with respect to wavelength selection and optimal preprocessing. The SE indicator performs better than the two other indicators, and it is also applicable to other situations where the Beer–Lambert law is valid.

Index Headings: Spectral preprocessing; Wavelength selection; Near-infrared spectroscopy; Error indicator; Net analyte signal; Signal-to-noise ratio; Pharmaceutical powders and tablets.

INTRODUCTION

Near-infrared (NIR) spectroscopy is gaining popularity as a quantitative analytical method in the pharmaceutical industry.1–3 Quality control of incoming raw materials and quantitative analysis of intermediate4,5 and finalized products3 are examples of that. Spectra can be recorded quickly and in a non-invasive manner and they can be combined with a multivariate calibration technique, e.g., principal component regression6 (PCR) and partial least-squares regression7 (PLS), whereby quantitative measures can easily be obtained. One known problem in near-infrared spectroscopy is spectral variations that are not related to the property of interest.8 This non-related variation is especially important in pharmaceutical applications of NIR. In the pharmaceutical industry, spectra are often recorded in reflectance mode. Varying particle sizes and varying compression of, e.g., powders cause non-related spectral variation. To correct for this variation various spectral preprocessing techniques are used prior to calibration, e.g., multiplicative scatter correction9 (MSC), offset correction, or Savitzky–Golay10 derivatives. Another problem is that if a large part of the recorded spectrum does not contain any information about the analyte, wavelength selection becomes very important. Several methods have been proposed for wavelength selection.11,12 Until recently, it was believed that full spectrum methods, e.g., PLS, would automatically overcome the problem of wavelength selection by setting the regression coefficients for non-informative wavelengths to zero or near zero. However, this is not the case and PLS-based calibrations can in many cases be improved by a proper selection of wavelengths.13

The most common way of judging whether a preprocessing method is beneficial for the analytical performance is to compute the prediction uncertainty for an independent test set, i.e., the root mean square error of prediction (RMSEP) or root mean square error of prediction cross-validation (RMSECV) if only a smaller dataset is available, and then select the preprocessing method that gives the lowest RMSEP/RMSECV. Some pitfalls with this method are that it requires a fairly large number of samples, i.e., both calibration and test set data. Secondly, if the uncertainty of the reference values is high then judgments are based on reference values with errors. Finally, when using PCR or PLS the RMSEP/RMSECV values are influenced by the model dimensionality. If the model dimensionality is not estimated correctly with some kind of validation technique, then the RMSEP/RMSECV values will be misleading and therefore, judgments of preprocessing method selection or wavelength selection may also be incorrect.

Other methods exist to help choose the optimal preprocessing method, i.e., methods using the net analyte signal (NAS) concept. Net analyte signal is defined as the part of a signal that is unique for the analyte of interest.14 Lorber14 demonstrated how figures of merit, e.g., multivariate sensitivity, signal-to-noise ratio, selectivity, and limit of detection could be computed from the net analyte signal of the analyte. These figures of merit can be used
to judge whether a preprocessing method is beneficial for the analytical performance, and they can also be used for wavelength selection. Faber15 used the inverse multivariate sensitivity of the analyte to judge whether a certain preprocessing method, e.g., derivative, would improve the predictive ability of the calibration model or not. Xu and Schechter16 developed an error indicator for wavelength selection. Boelens et al. have also demonstrated the usability of NAS for improving the detection limit for a spectroscopic process analysis by tuning Savitzky–Golay filters.17 All these methodologies use the net analyte signal of the analyte of interest.

In this work we introduce a new error indicator called the signal-to-error indicator (SE). A signal-to-error (SE) value is computed for the analyte when various preprocessing methods and wavelength selections are applied to the spectra. The highest SE value indicates the optimal preprocessing and wavelength interval.

We will demonstrate the performance of the inverse sensitivity indicator, the error indicator, and the signal-to-error indicator with two NIR data sets from different stages in a pharmaceutical tablet production process. The indicators are compared to the standard PLS methodology and the RMSECV. For the applications presented in this paper the PLS method is used as a standard to which the other indicators can be compared. This is possible since the reference method is known to be accurate. The first set contains spectra of powder samples after mixing the tablet constituents. In the second data set, finalized tablets using the powder composition from the first data set are measured. In both cases the analyte is the active pharmaceutical ingredient (API) and the optimal preprocessing and wavelength selection is sought.

First some theory about net analyte signal and the method to compute figures of merit will be presented. Secondly, the different error indicators will be described and compared. Then in the Experimental section the instrumentation and different data sets used are described in detail, and finally, in the Results and Discussion section the different error indicators are compared and the results are commented on.

THEORY

Notation. Boldface capital characters denote matrices, boldface lower-case characters denote vectors, and lower-case italic characters denote scalars. $\|\mathbf{r}\|$ is the Euclidean norm of the vector $\mathbf{r}$, superscript T denotes the transposed matrix or vector, and the superscript + denotes the Moore–Penrose generalized inverse of a matrix. The matrix $\mathbf{I}_J$ is the $J \times J$ identity matrix.

Net Analyte Signal. The net analyte signal is defined as the part of a spectrum that is orthogonal to a subspace spanned by the spectra of all constituents except the analyte, i.e., all interfering constituents.14 So the net analyte signal of analyte k can be found by the following orthogonal projection:

$$\mathbf{r}_k^* = (\mathbf{I}_J - \mathbf{S}_{-k}\mathbf{S}_{-k}^{+})\,\mathbf{r}_k \tag{1}$$

$$\mathbf{s}_k^* = (\mathbf{I}_J - \mathbf{S}_{-k}\mathbf{S}_{-k}^{+})\,\mathbf{s}_k \tag{2}$$

where $\mathbf{r}_k$ is a $J \times 1$ vector containing the spectral response for a sample including the analyte k measured at J wavenumbers. The pure analyte spectrum $\mathbf{s}_k$ is a $J \times 1$ vector. $\mathbf{S}_{-k}$ is a $J \times L$ matrix with L spectra of blank samples. In some publications14 pure spectra of the interfering constituents are used to construct the $\mathbf{S}_{-k}$ matrix. In our experience this is not the best method, e.g., pure spectra are not always available and the pure constituent spectrum might differ slightly in shape from the spectral contribution in a mixture of interfering constituents. Practically, the $\mathbf{S}_{-k}$ matrix is most easily constructed by measuring mixtures of the interfering constituents. $\mathbf{S}_{-k}^{+}$ is its Moore–Penrose inverse, an $L \times J$ matrix. $\mathbf{r}_k^*$ and $\mathbf{s}_k^*$ are $J \times 1$ vectors called the net analyte signal vector of the kth constituent. The net analyte signal for constituent k in any sample can now be computed with Eq. 1.

Inverse Sensitivity. Various figures of merit14 can be computed using the net analyte signal concept, e.g., analyte sensitivity. Faber15 evaluated the effect of various preprocessing methods of near-infrared spectra with an error indicator based on computation of the inverse of the analyte sensitivity ($a^{-1}$; from here we denote this as invSEN) using the net analyte signal concept. Faber used the assumption that the length of the net analyte signal vector is proportional to the concentration of the analyte. Faber converted the net analyte signal vector into a scalar value by taking the Euclidean norm15 of the net analyte signal vector and plotted the value against the analyte concentration of the sample, thereby constructing a univariate calibration plot. The analyte sensitivity can then be computed with:

$$a = \frac{\|\mathbf{r}_{k,c}^*\|}{c_{k,c}} \tag{3}$$

$$\mathrm{invSEN} = a^{-1} = \frac{c_{k,c}}{\|\mathbf{r}_{k,c}^*\|} \tag{4}$$

where $\|\mathbf{r}_{k,c}^*\|$ is the norm of the net analyte signal of a calibration sample with concentration $c_{k,c}$ and the slope of the calibration line $a$ is the sensitivity of analyte k.

Faber concluded that a preprocessing method is beneficial for the final predictive ability if the inverse sensitivity is decreasing with that particular pretreatment. The effect on the inverse sensitivity when doing first and second derivatives compared to multiplicative scatter corrected (MSC) spectra was evaluated. This indicator needs a collection of spectra to span the interference space and spectra containing the analyte and their respective reference concentrations of the analyte to compute analyte sensitivity.

Error Indicator. Xu and Schechter16 developed an error indicator (EI) for wavelength selection. The assumption for their EI is that the prediction error in multivariate analysis is determined by the quality of the corresponding net analyte signal. By minimizing the relative error in the norm of the NAS, the analytical conditions are optimized and lower prediction errors are achieved. The EI was defined as follows:

$$\mathrm{EI} = \frac{\left[\operatorname{var}\!\left(\|\mathbf{r}_k^*\| - \|\mathbf{r}_{k,\mathrm{true}}^*\|\right)\right]^{1/2}}{\|\mathbf{r}_{k,\mathrm{true}}^*\|} \tag{5}$$

Due to non-related variations (interferents or baseline offsets) the norm of the NAS may be affected. The numerator of the EI describes the variance in the norm of the NAS caused by noise in the spectra due to non-related


variations. Xu and Schechter assume that the noise in the spectra due to non-related variations is homoscedastic, i.e., each wavenumber has the same variance, and that the noise is not correlated for neighboring wavenumbers. In that case, the variance in the norm of the NAS due to non-related variation can be written as follows:18

$$\operatorname{var}\!\left(\|\mathbf{r}_k^*\| - \|\mathbf{r}_{k,\mathrm{true}}^*\|\right) = \frac{\left(2\,\|\mathbf{r}_{k,\mathrm{true}}^*\|\,\sigma\right)^2 + \left(J\sigma^2\right)^2}{\left(\|\mathbf{r}_k^*\| + \|\mathbf{r}_{k,\mathrm{true}}^*\|\right)^2} \tag{6}$$

Here J is the number of wavenumbers in the spectra used. The standard deviation of the spectral noise described above is represented by $\sigma$. Since $\|\mathbf{r}_{k,\mathrm{true}}^*\|$ cannot be known, Xu and Schechter propose to replace it with $\|\mathbf{r}_k^*\|$, which leads, according to Ferré and Rius,18 to the following expression for the error indicator:

$$\mathrm{EI} = \frac{\left[\sigma^2\left(1 + \dfrac{J^2\sigma^2}{4\,\|\mathbf{r}_k^*\|^2}\right)\right]^{1/2}}{\|\mathbf{r}_k^*\|} \tag{7}$$

The standard deviation of the spectral noise, $\sigma$, is found from the net analyte signal regression plot (NASRP). First take the NAS vector of the pure analyte spectrum $\mathbf{s}_k^*$ and the NAS vector of a sample spectrum containing the analyte $\mathbf{r}_k^*$. Then the absorbance at each wavelength j in $\mathbf{s}_k^*$ is plotted against the absorbance in $\mathbf{r}_k^*$ at the same wavelength, for all j = 1, ..., J wavelengths in the vectors. This results in the NASRP plot. In the ideal case with no non-related variation, both NAS vectors will point in the same direction and the points in the NASRP plot will form a perfectly straight line passing through (0, 0). The assumption made by Xu and Schechter16 is that at each wavelength the error is normally distributed with the same standard deviation, i.e., white noise. A straight line is fitted through the points in the NASRP plot in a least-squares sense and by computing the residual vector, i.e., deviation of each of the points from the line, $\sigma$ can be computed:18

$$\sigma = \sqrt{\frac{\mathbf{e}_{k,\mathrm{res}}^{T}\cdot\mathbf{e}_{k,\mathrm{res}}}{J-1}} \tag{8}$$

where $\mathbf{e}_{k,\mathrm{res}}$ is a $J \times 1$ vector containing the residuals. The residuals are computed in the following manner:

$$\mathbf{e}_{k,\mathrm{res}} = \mathbf{r}_k^* - \mathbf{s}_k^*\,c_k \tag{9}$$

$$c_k = \frac{\mathbf{r}_k^{*T}\cdot\mathbf{s}_k^*}{\|\mathbf{s}_k^*\|^2} \tag{10}$$

The error indicator needs a collection of blank spectra to span the interference space, the pure analyte spectrum, and a sample spectrum containing the analyte.

Signal-to-Error Indicator. In this work we present a new indicator based on the computations of the signal-to-error (SE). We assume that the error in the spectra is made of two contributions, i.e., noise and bias. If a certain preprocessing method or wavelength selection is not removing unwanted interference, then extra blank samples may have a small contribution orthogonal to $\mathbf{S}_{-k}$ when they are projected onto the interference space. We compute this contribution as the projection ($\mathrm{PROJ}_{\mathrm{blank}}$) of some extra blank spectra ($\mathbf{r}_{\mathrm{blank}}$) on the normed $\mathbf{s}_k^*$ vector, i.e., normed to unit length. We call the normed $\mathbf{s}_k^*$ vector the net analyte signal regression vector $\mathbf{nas}_{\mathrm{reg}}$.

$$\mathrm{PROJ}_{\mathrm{blank}} = \mathbf{r}_{\mathrm{blank}}^{T}\cdot\frac{\mathbf{s}_k^*}{\left(\mathbf{s}_k^{*T}\mathbf{s}_k^*\right)^{1/2}} \tag{11}$$

$$\mathrm{PROJ}_{\mathrm{blank}} = \mathbf{r}_{\mathrm{blank}}^{T}\cdot\mathbf{nas}_{\mathrm{reg}} \tag{12}$$

The error taking into account both bias and noise is computed by:

$$\mathrm{error} = \sqrt{\frac{\sum_{i=1}^{I}\left(\mathrm{PROJ}_{\mathrm{blank},i} - 0\right)^2}{I}} = \sqrt{\frac{\sum_{i=1}^{I}\left(\mathrm{PROJ}_{\mathrm{blank},i}\right)^2}{I}} \tag{13}$$

In the denominator we use I and not I − 1 because no mean is subtracted so the degrees of freedom are preserved.

The signal is then computed by projecting the analyte spectrum on the NAS regression vector and the SE can be computed as the ratio between the signal and the error:

$$\mathrm{Signal} = \mathbf{s}_k^{T}\cdot\mathbf{nas}_{\mathrm{reg}} \tag{14}$$

$$\mathrm{SE} = \frac{\mathrm{Signal}}{\mathrm{error}} \tag{15}$$

This error indicator needs a collection of blank spectra to span the interference space and to quantify the error part plus the pure analyte spectrum. If the pure analyte spectrum is not available, a sample spectrum containing the analyte can be used.

Although the error indicator and the signal-to-error indicator seem to be comparable, there are some important differences. The EI minimizes the difference between the length of two vectors, $\mathbf{r}_{k,\mathrm{true}}^*$ and $\mathbf{r}_k^*$. However, these vectors will not necessarily point in the same direction. Therefore, the difference in lengths is not directly related to errors in concentration. The SE indicator focuses on errors in the direction of the NAS regression vector, i.e., the same direction. The projections on the NAS regression vector are used (can also be negative) and not only the lengths of the projected vector. These projections are directly related to the concentrations (cf. Fig. 1).

FIG. 1. Geometrical display of the interference space and the net analyte signal vectors.

Toolboxes for net analyte signal calibrations are available for free download at http://www-its.chem.uva.nl/research/pac/index.html.
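
As a rough illustration of the signal-to-error computation described in this section (Eqs. 1, 2, and 11 through 15), the following Python sketch may help. It is not the authors' Matlab toolbox code. All function and variable names are hypothetical, and the orthogonal projection is carried out with a Gram-Schmidt basis of the blank spectra instead of an explicit Moore–Penrose inverse, which should give the same projection in exact arithmetic.

```python
import math


def dot(a, b):
    """Inner product of two equally long spectra (lists of floats)."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(spectra, tol=1e-12):
    """Gram-Schmidt basis for the interference space spanned by blank spectra."""
    basis = []
    for s in spectra:
        v = list(s)
        for q in basis:
            coeff = dot(q, v)
            v = [vi - coeff * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(dot(v, v))
        if norm > tol:  # skip spectra that add no new direction
            basis.append([vi / norm for vi in v])
    return basis


def net_analyte_signal(spectrum, basis):
    """Part of `spectrum` orthogonal to the interference space (Eqs. 1 and 2)."""
    v = list(spectrum)
    for q in basis:
        coeff = dot(q, v)
        v = [vi - coeff * qi for vi, qi in zip(v, q)]
    return v


def signal_to_error(span_blanks, extra_blanks, analyte_spectrum):
    """SE value (Eq. 15) from blank spectra and one analyte spectrum."""
    basis = orthonormal_basis(span_blanks)
    s_star = net_analyte_signal(analyte_spectrum, basis)
    norm = math.sqrt(dot(s_star, s_star))
    nas_reg = [x / norm for x in s_star]  # unit-length NAS regression vector
    projections = [dot(r, nas_reg) for r in extra_blanks]  # Eq. 12
    # Eq. 13: divide by I, not I - 1, because no mean is subtracted
    error = math.sqrt(sum(p * p for p in projections) / len(projections))
    signal = dot(analyte_spectrum, nas_reg)  # Eq. 14
    return signal / error
```

In this sketch each spectrum is a plain list of absorbances at the same J wavenumbers, already preprocessed and cut to the wavelength interval under test. Calling `signal_to_error` once per preprocessing and interval combination and keeping the largest value mirrors the selection rule in the text. The sketch assumes the extra blank spectra do not all project to exactly zero, otherwise the ratio is undefined.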


EXPERIMENTAL

The powder samples were measured with a BOMEM MB 160 FT-NIR spectrometer equipped with a SpinningVial™ accessory for measuring powder samples. The samples were measured with diffuse reflectance, and an InGaAs detector was used. The SpinningVial™ accessory measured through the side of the glass sample vials (where the glass walls are assumed to be the most homogeneous). The wavenumber range from 3800 cm⁻¹ to 12 000 cm⁻¹ was recorded and the spectral resolution was set to 8 cm⁻¹. For each spectrum a total of 32 scans were averaged (the scanning time for 32 scans measured with a spectral resolution of 8 cm⁻¹ is the same time as the SpinningVial™ accessory uses to spin the sample vial one revolution). The tablet samples were measured with a BOMEM MB 160 FT-NIR spectrometer equipped with a TabletSamplIR™ accessory. The tablets were measured with a transmission measurement and an InGaAs detector was used. The wavenumber range from 4000 cm⁻¹ to 12 000 cm⁻¹ was recorded and the spectral resolution was set to 16 cm⁻¹. When measuring transmission spectra of tablets, normally only broad peaks in the first and second overtone region are useful for quantification and 16 cm⁻¹ is a reasonable resolution. For each spectrum a total of 32 scans were averaged. In both cases the data were collected with GRAMS32 (ThermoGalactic.com, GRAMS/32, 1998) software and imported into Matlab (MathWorks Inc., Matlab ver. 12.1., 2001) with in-house written software. Computations were performed in Matlab with in-house written routines plus the PLS_Toolbox (Eigenvector Research, Inc. PLS_Toolbox. Version 2.1., 1998).

Dataset 1: Powder Samples. The samples were made according to a triangular mixture design. The samples contained five constituents, i.e., the active pharmaceutical ingredient (API), two filler binders (A and B), and two glidants (C and D). Three doses are normally produced, i.e., 0.64, 1.27, and 2.57 API w/w % (low, medium, and high strength). To have samples that resemble the heterogeneous nature of powder mixtures, samples with over- and under-dose of API, filler binder A, and filler binder B were produced according to a triangular mixture design (Fig. 2). Samples with ±10% of target dose of API and ±10 and ±20 w/w % of target dose of filler binder A and filler binder B, respectively, were made while the added amounts of glidant C and glidant D were kept constant. Initial experiments (not shown here) indicated that homogeneity of filler binder A and filler binder B could be difficult to obtain in a large-scale mixing process. It was therefore assumed that the span, i.e., ±20% from target concentration, of those constituents would resemble the heterogeneity that could be expected in the interference matrix, while glidant C and glidant D are assumed to be less important and for practical reasons the added amount was kept constant. Blank samples without API were also prepared (marked with squares in Fig. 2). The samples were prepared in 25 mL glass vials that fitted into the SpinningVial™ accessory. The total sample size was 8.0 g and the samples were prepared in the following manner. First the filler binder A was weighed with an electronic precision balance and transferred into the vial. Then API was weighed and transferred into the vial. The constituents were mixed manually with a metal spatula. Then filler binder B, glidant C, and finally the glidant D were added, each time manually mixing with a metal spatula. Each sample was measured eight times in the SpinningVial™ accessory and between each measurement the sample was removed and shaken vigorously. The mean of the eight spectra was then used to represent the sample. The powder samples are generally problematic to measure because of the heterogeneous distribution of the sample constituents, but other studies (not shown) have shown that the SpinningVial™ accessory and the use of the mean spectrum is a valid methodology, and the methodology has also been reported elsewhere.19 As a reference method, the weighed amount was used (gravimetric) and the uncertainty of this value was believed to be low, i.e., ±10⁻⁴ g.

FIG. 2. Triangular mixture design for powder samples.

Dataset 2: Tablet Samples. No specific experimental design was used for the tablet samples, but a small data set based on a stratified sampling scheme was used. Tablets were taken from nine different production batches (pilot scale batches): three batches with placebo tablets, i.e., blank samples without API, and six batches with API in three different levels. From each batch two tablets were used, for a total of 18 tablets. Because it is not possible to measure a transmission spectrum of the pure API ($\mathbf{s}_k$), we used a spectrum of a tablet from a batch with high concentration of API as a replacement for the pure analyte spectrum. One tablet spectrum from each of the placebo batches, i.e., three spectra, was used to span the interference space and the three remaining spectra were used as blank samples to quantify the error.

RESULTS AND DISCUSSION

A selection of different preprocessing methods (Table I) that are normally20,21 applied when doing preprocessing of NIR spectra obtained from diffuse reflectance measurements of powders and transmission spectra of tablets were compared. For both data sets we compared the same preprocessing methods.

TABLE I. Preprocessing methods.
No.  Method             Note
1    No preprocessing
2    MSC
3    Offset             Using 9990–10 000 cm⁻¹ as offset point
4    First derivative   Using 11 spectral points
5    First derivative   Using 25 spectral points
6    Second derivative  Using 11 spectral points
7    Second derivative  Using 25 spectral points

The wavelength selection can be conducted in many different ways. In this article we used the prior knowledge we have about the analyte, i.e., location of main analyte peaks. The search for the optimal

wavelength interval was conducted by choosing a starting point, i.e., a wavenumber where an analyte peak is present, and then computing the various indicator values and RMSECV for a wavelength interval defined around this starting point. Then the interval was extended in both directions and new indicator values and RMSECV were computed. This was done a proper number of times using an increasing interval width until a large part of the wavelength axis was examined. The selection of wavelength intervals to examine can be made in numerous ways either using prior knowledge about major peak locations or more automatic routines, e.g., moving windows. In any case, the indicator values can be computed and therefore applied to existing wavelength selection methods.

For the powder samples the starting point was 6000 cm⁻¹, i.e., an analyte peak is found there (Fig. 3) with an interval width of 160 cm⁻¹, i.e., from 5920 to 6080 cm⁻¹. Then the interval was extended 160 cm⁻¹ to 5840–6160 cm⁻¹. This was repeated until 20 intervals were examined, the last covering 4400–7600 cm⁻¹. For the tablet samples the starting point was 8800 cm⁻¹, i.e., an analyte peak is found there with an interval width of 120 cm⁻¹, i.e., from 8740 to 8860 cm⁻¹. Then the interval was extended 120 cm⁻¹ to 8680–8920 cm⁻¹, and this was repeated until 15 intervals were examined, the last covering 7900–9700 cm⁻¹.

Comparing the Indicator Values and RMSECV Using GAIN Values. To compare the indicator values and the RMSECV we computed the GAIN for each value. The GAIN is computed as the ratio between an indicator or RMSECV value and a reference value. The reference value for the indicators or RMSECV is the value when using spectra without any preprocessing applied and using the whole wavelength range:

$$\mathrm{SE}_{\mathrm{gain}} = \frac{\mathrm{SE}_{\mathrm{pre}}}{\mathrm{SE}_{\mathrm{ref}}} \tag{16}$$

$$\mathrm{invSEN}_{\mathrm{gain}} = \frac{\mathrm{invSEN}_{\mathrm{ref}}}{\mathrm{invSEN}_{\mathrm{pre}}} \tag{17}$$

$$\mathrm{EI}_{\mathrm{gain}} = \frac{\mathrm{EI}_{\mathrm{ref}}}{\mathrm{EI}_{\mathrm{pre}}} \tag{18}$$

$$\mathrm{RMSECV}_{\mathrm{gain}} = \frac{\mathrm{RMSECV}_{\mathrm{ref}}}{\mathrm{RMSECV}_{\mathrm{pre}}} \tag{19}$$

where the subscript "ref" means that the indicator and RMSECV value are computed using non-preprocessed spectra and the whole wavelength range, i.e., 4000 to 10 000 cm⁻¹ for the powder samples and 7300 to 10 000 cm⁻¹ for the tablet samples. The subscript "pre" means that a preprocessing method or preprocessing method and wavelength interval selection have been applied to the spectra. Note that $\mathrm{SE}_{\mathrm{ref}}$ is the denominator in Eq. 16. This is because the optimal preprocessing and wavelength selection corresponds to the highest SE value, in contrast to the other indicators and RMSECV, where the lowest value corresponds to the optimal preprocessing and wavelength selection. If the gain value is bigger than one, then the preprocessing or preprocessing and wavelength selection will improve the final calibration model, while if the gain value is equal to or lower than one then the preprocessing or preprocessing and wavelength selection does not improve, or worsens, the final calibration model.

Results for Powder Samples. In Fig. 3 the pure analyte spectrum, a spectrum of a blank sample, and a spectrum of a sample containing 2.57 w/w % API are depicted. The difference between the blank spectrum and the spectrum containing 2.57 w/w % API is mainly caused by scattering phenomena seen as offset differences from 7000 to 12 000 cm⁻¹. In the API spectrum main peaks are identified in the combinational band region, i.e., 4650 cm⁻¹ and 4940 cm⁻¹, and in the first overtone region we find a peak at 6000 cm⁻¹, and in the second overtone region a peak at 8800 cm⁻¹ is apparent.

FIG. 3. NIR spectra of blank powder sample, powder sample with 2.57 w/w % API, and analyte spectrum.

Choosing the Optimal Preprocessing Method for Powder Samples. To span the interference space for the invSEN, SE, and EI indicator we used five blank sample spectra, symbolized with open squares in Fig. 2. To compute the invSEN we used two sample spectra containing the analyte, i.e., samples marked with grey circles in Fig. 2. To compute the SE we used two analyte spectra to compute the signal and an additional twenty-five blank sample spectra to compute the error. To compute the EI, two sample spectra, i.e., samples marked with grey color in Fig. 2, and two analyte spectra were used. The RMSECV values were calculated using the 32 samples depicted in Fig. 2. When computing the RMSECV values the 32 samples were divided into 11 blocks, i.e., 10 blocks with three samples each and one block with two samples, and then cross-validation was performed leaving out one block each time. Based on the cross-validation


results, five PLS components were selected for the PLS model of the whole wavelength range.

The indicator values and the RMSECV were calculated using the 4000–10 000 cm⁻¹ wavelength region and by applying the preprocessing methods listed in Table I. In Fig. 4 the gain values are depicted for the indicators and RMSECV. The RMSECV shows that the best preprocessing method is first derivatives using 25 spectral points with a gain value of 2.9. The SE indicator has the highest gain for first derivatives, while the EI indicator has the highest gain for second derivatives. The invSEN indicator has the highest gain for MSC, which is clearly wrong compared to the PLS results.

FIG. 4. Optimal preprocessing method. Gain values for indicators and RMSECV for powder samples.

Wavelength Selection for Powder Samples. Indicator and RMSECV values were computed for twenty wavelength intervals around 6000 cm⁻¹. For all intervals, four PLS components were used to calculate the RMSECV values. Again the number of PLS components is based on cross-validation results. This was done for all seven preprocessing methods and the highest gain value for the RMSECV was then found to be 5.95 when preprocessing method 5 was used with the wavelength interval from 5840–6160 cm⁻¹ (Fig. 5). This matched perfectly the SE indicator that had the highest gain value for the same preprocessing method and wavelength interval as the PLS method. Also, the EI indicator had the highest gain value for preprocessing method 5, but using the wavelength interval from 5760–6240 cm⁻¹. The shape of the RMSECV gain curve corresponded well with the shape of the SE gain curve, and also the gain values were all above one for the RMSECV and the SE. The gain values for the EI when applying preprocessing method 5 were only above one for three intervals, i.e., I-2, I-3, and I-4, while the remaining intervals were less than one, indicating that no preprocessing and using the whole wavelength region was better for those intervals (Fig. 5). The invSEN indicator was not useful for wavelength selection using any of the preprocessing methods. The highest gain value for the invSEN was 11.8 using MSC as the preprocessing method and the wavelength interval from 4000 to 10 000 cm⁻¹, and when using all other preprocessing methods the gain for the invSEN was always below one, with the lowest value for the smallest wavelength interval, i.e., I-1 and increasing with increasing interval width (see inset in Fig. 5).

FIG. 5. Optimal preprocessing/wavelength selection. Gain values for indicators and RMSECV for powder samples.

It is important to notice that the selection of preprocessing method using the whole wavenumber range is not representative of the results when only a small wavelength region is used. Therefore, combining preprocessing and wavelength selection, as is done here, seems to be necessary.

Results for Tablet Samples. To span the interference space for the invSEN, SE, and EI indicators we used three blank sample spectra. To compute the invSEN we used two samples with a high concentration of API. To compute the SE we used two sample spectra, i.e., using two samples with a high concentration of API as substitution for pure analyte tablet spectra that were not available to compute the signal, and an additional three blank sample spectra to compute the error. To compute the EI four sample spectra with a high API concentration were used. Two of the sample spectra were used to compute the average $\mathbf{r}_k^*$ and the two other sample spectra were used to compute the average $\mathbf{s}_k^*$ (Eqs. 1 and 2) because no pure analyte tablet samples are available. The RMSECV values were calculated using all 18 samples. When the RMSECV value was computed the leave-one-out principle was used because of the limited size of the dataset.

Choosing the Optimal Preprocessing Method for Tablet Samples. Also for the tablet samples, comparison of the preprocessing methods using a broad spectral range was not a feasible method, i.e., preprocessing combined with wavelength selection was necessary.

Wavelength Selection for Tablet Samples. Indicator and RMSECV values were computed for fifteen wavelength intervals around 8800 cm⁻¹ with all the preprocessing methods described in Table I. All PLS models were calculated using four PLS components. The highest gain value for the RMSECV was 3.6 when using preprocessing method 5, i.e., first derivatives with 25 spectral points and the wavelength interval 8620–8980 cm⁻¹ (Fig. 6). Also, the SE had the maximum gain value of 3.8 using preprocessing method 5 and the interval 8620–8980 cm⁻¹ (Fig. 6). The shapes of the RMSECV and the SE gain curves were fairly similar. As for the powder samples,

the invSEN was not useful for wavelength selection and the gain values were less than one except for the MSC method. The EI had a maximum gain of 1.29 when MSC was used for preprocessing and the wavelength interval was 8320–9280 cm⁻¹ (not depicted) and was in general not useful for wavelength selection of the tablet samples.

FIG. 6. Optimal preprocessing/wavelength selection. Gain values for indicators and RMSECV for tablet samples.

The problem with the invSEN indicator is that when the spectra are preprocessed using first and second derivatives, the Euclidean length of the spectra and subsequently of the net analyte signal vectors is lowered. This decreases the analyte sensitivity as computed in Eq. 4 without regard to the analytical performance of a calibration model using derivative spectra. In the original publication, Faber assumed that only white noise is present, which is a huge simplification of real spectroscopic systems in pharmaceutical applications. This might also explain why the method fails with our examples.

The EI indicator performed reasonably well but with failures. Wavelength selection of the tablet samples was not possible. The reason for the failure with the tablet samples might be that no “pure analyte tablet” was available. In the EI, the net analyte signal vectors of a sample and of the analyte spectra are compared. But as pure analyte spectra are not always available, and generally not for tablet samples, the EI is not usable for this sample type.

The validation of the SE method is only performed on the zero concentration level. Therefore, it can be expected that the method will work better for low concentrations. During the work we discovered that a good selection of blank samples is the “key” to the SE indicator. For the powder samples we had measured each of the five blank samples eight times, giving forty blank spectra. Among these spectra we picked a few spectra to span the interference space and a larger portion to quantify the error. We recommend that as many blank samples as possible be measured using repeated measurements, so that instrumental noise and baseline drift are included. This is easy to do in most industrial applications, but might be more difficult for environmental products. Also reposition the samples and, for powder samples, shake the samples. In that manner, heterogeneous samples are best measured.

A problematic issue for all NAS methods is that it is unclear how interactions between the analyte and the interferents are dealt with. This is a general problem of the NAS approach, but even for more commonly used inverse calibration methods such as PLS or PCR this is not clear.

CONCLUSION

We have demonstrated a new indicator for choosing the optimal preprocessing method and conducting wavelength selection of NIR spectra. The indicator was compared to existing indicators also using net analyte signal computations and to the standard methodology using cross-validation results from a PLS regression model. The indicator performed better than the two reference methods using net analyte signal methodology. The invSEN generally failed to find the optimal preprocessing method and was also not useful for wavelength selection. The EI indicator was developed for wavelength selection, but we tried to use it for selection of the optimal preprocessing method, without success for both the powder and tablet samples. For wavelength selection the EI indicator performed reasonably for the powder samples and identified a few wavelength intervals that improved the calibration model, but not the optimal selection (Fig. 5). The indicator could not be used for wavelength selection of the tablet samples. The SE indicator identified the right preprocessing method and also the optimal wavelength selection both for the powder and the tablet samples. For the tablet samples the right preprocessing method was not obvious and was identified only after subsequent wavelength selection was performed (Fig. 6). Thus, in cases where only a few samples are available, or where reference values are determined with a high error or are not available, we recommend this new indicator.

In this study the proposed method is only demonstrated for reflectance spectra of powder samples and transmittance spectra of whole tablets. More and different spectroscopic applications are necessary to corroborate the obtained results and to understand the limitations of this method. It might be the case that for different applications, the proposed indicator will not always be the best choice for selection of the optimal preprocessing and wavelength points.

ACKNOWLEDGMENT

Novo Nordisk, Corporate Research Affairs (CORA) sponsored this work as a part of E.T.S. Skibsted’s Ph.D. project.

1. M. Blanco, H. Iturriaga, S. Maspoch, and C. Pezuela, Analyst (Cambridge, U.K.) 123, 135 (1998).
2. M. Blanco, J. Coello, A. Eustaquio, H. Iturriaga, and S. Maspoch, Anal. Chim. Acta 392, 237 (1999).
3. M. Blanco, J. Coello, H. Iturriaga, S. Maspoch, and D. Serrano, Analyst (Cambridge, U.K.) 123, 2307 (1998).
4. R. D. Maesschalck, F. C. Sánchez, D. L. Massart, P. Doherty, and P. Hailey, Appl. Spectrosc. 52, 725 (1998).
5. F. C. Sánchez, J. Toft, B. Bogaert, S. S. Dive, and P. Hailey, Fresenius’ J. Anal. Chem. 352, 771 (1995).
6. H. Martens and T. Næs, Trends Anal. Chem. 3, 204 (1984).
7. H. Martens and T. Næs, Multivariate Calibration (John Wiley and Sons, Chichester, 1989).
8. H. Swierenga, A. P. de Weijer, R. J. van Wijk, and L. M. C. Buydens, Chemom. Intell. Lab. Syst. 49, 1 (1999).
9. P. Geladi, D. McDougall, and H. Martens, Appl. Spectrosc. 39, 491 (1985).
10. A. Savitzky and M. J. E. Golay, Anal. Chem. 36, 1627 (1964).
11. Q. Ding and G. W. Small, Anal. Chem. 70, 4472 (1998).
12. L. Nørgaard, A. Saudland, J. Wagner, J. P. Nielsen, L. Munck, and S. B. Engelsen, Appl. Spectrosc. 54, 413 (2000).
13. C. H. Spiegelman, M. J. McShane, M. J. Goetz, M. Motamedi, Q. L. Yue, and G. L. Coté, Anal. Chem. 70, 35 (1998).
14. A. Lorber, Anal. Chem. 58, 1167 (1986).
15. N. M. Faber, Anal. Chem. 71, 557 (1999).
16. L. Xu and I. Schechter, Anal. Chem. 68, 1842 (1996).
17. H. F. Boelens, W. T. Kok, O. E. de Noord, and A. K. Smilde, internal report, available upon request.
18. J. Ferré and F. X. Rius, Anal. Chem. 70, 1999 (1998).
19. O. Berntsson, L. G. Danielsson, M. O. Johansson, and S. Folestad, Anal. Chim. Acta 419, 45 (2000).
20. O. Berntsson, Ph.D. Thesis, “Characterization and Application of Near Infrared Reflection Spectroscopy for Quantitative Process Analysis of Powder Mixtures”, Kungliga Tekniska Högskolan (KTH), Stockholm, Sweden (2001).
21. H. Swierenga, Ph.D. Thesis, “Robust Multivariate Calibration Models in Vibrational Spectroscopic Applications”, Katholieke Universiteit Nijmegen, The Netherlands (2000).
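
The discussion of the invSEN indicator rests on the observation that derivative preprocessing lowers the Euclidean length of a spectrum regardless of how well a calibration model would perform on the derivative data. The following is a minimal sketch of that effect only. It assumes a synthetic Gaussian band on 101 points (hypothetical data, not from this study) and uses plain finite differences as a stand-in for the Savitzky-Golay derivatives used in the paper; the function names are illustrative.

```python
import math


def euclidean_length(v):
    """Euclidean (2-norm) length of a spectrum given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def first_difference(v):
    """Point-to-point difference, a crude stand-in for a first derivative."""
    return [b - a for a, b in zip(v, v[1:])]


# Hypothetical smooth absorption band (assumption: Gaussian, 101 points).
band = [math.exp(-((i - 50) / 10.0) ** 2) for i in range(101)]

d1 = first_difference(band)   # first-derivative-like spectrum
d2 = first_difference(d1)     # second-derivative-like spectrum

# Lengths of the raw, first-difference, and second-difference spectra.
lengths = [euclidean_length(x) for x in (band, d1, d2)]

for name, length in zip(("raw", "1st diff", "2nd diff"), lengths):
    print(f"{name:>8}: {length:.4f}")
```

Because the band is smooth, each differencing step shrinks the vector length, so any indicator built directly on that length will tend to penalize derivative preprocessing even when the derivative spectra calibrate well.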

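
The recommendation on blank samples (a few blank spectra to span the interference space, the rest to quantify the error) can be illustrated with a small net analyte signal computation. This is a sketch under stated assumptions, not the authors' implementation: it assumes that the signal is the mean length of the net signal of the high-concentration spectra, that the error is the root-mean-square length of the net signal of the held-out blanks, and that the indicator is their ratio. The exact definition is given by the equations earlier in the paper and may differ. All spectra below are made-up four-channel toy vectors, and the function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt orthonormal basis of the space spanned by `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n = norm(w)
        if n > tol:  # skip vectors already in the span
            basis.append([wi / n for wi in w])
    return basis


def net_signal(spectrum, basis):
    """Part of `spectrum` orthogonal to the interference space."""
    r = list(spectrum)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def signal_to_error(analyte_like, blanks_span, blanks_error):
    """Assumed ratio: mean net-signal length / RMS net-signal length of blanks."""
    basis = orthonormal_basis(blanks_span)
    signal = sum(norm(net_signal(s, basis)) for s in analyte_like) / len(analyte_like)
    squared = [norm(net_signal(b, basis)) ** 2 for b in blanks_error]
    error = math.sqrt(sum(squared) / len(squared))
    return signal / error


# Toy data (hypothetical): blanks spanning an offset and a sloping baseline.
blanks_span = [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 2.0, 3.0]]
# Held-out blanks: same baselines plus small deviations, used as the error.
blanks_error = [[1.00, 1.02, 0.99, 1.01],
                [0.50, 0.49, 0.51, 0.50],
                [1.00, 2.01, 2.99, 4.00]]
# High-concentration samples: a baseline plus a peak in the second channel.
analyte_like = [[1.0, 2.0, 1.0, 1.0], [0.0, 2.0, 2.0, 3.0]]

se = signal_to_error(analyte_like, blanks_span, blanks_error)
print(f"signal-to-error ratio (toy data): {se:.1f}")
```

In this arrangement the blanks used to span the interference space and the blanks used for the error are kept separate, which mirrors the split described in the text; repeating the computation for each preprocessing method and wavelength interval would give the values from which gains are compared.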
