New Indicator for Optimal Preprocessing and Wavelength Selection of Near-Infrared Spectra

E. T. S. SKIBSTED, H. F. M. BOELENS, J. A. WESTERHUIS,* D. T. WITTE, and A. K. SMILDE
Process Analysis and Chemometrics, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands
(E.T.S.S., H.F.M.B., J.A.W., A.K.S.); Pharmaceutical Site Maaloev, Novo Nordisk A/S, Novo Nordisk Park, 2760 Maaloev, Denmark
(E.T.S.S.); and QA/ALC-A, N.V. Organon, Molenstraat 10, P.O. Box 20, 5340 BH Oss, The Netherlands (D.T.W.)

Received 19 June 2003; accepted 30 October 2003.
* Author to whom correspondence should be sent. E-mail: westerhuis@science.uva.nl.
Applied Spectroscopy, Volume 58, Number 3, 2004, p. 264. 0003-7028/04/5803-0264$2.00/0. © 2004 Society for Applied Spectroscopy.

Preprocessing of near-infrared spectra to remove unwanted, i.e., non-related spectral variation and selection of informative wavelengths is considered to be a crucial step prior to the construction of a quantitative calibration model. The standard methodology when comparing various preprocessing techniques and selecting different wavelengths is to compare prediction statistics computed with an independent set of data not used to make the actual calibration model. When the errors of reference value are large, no such values are available at all, or only a limited number of samples are available, other methods exist to evaluate the preprocessing method and wavelength selection. In this work we present a new indicator (SE) that only requires blank sample spectra, i.e., spectra of samples that are mixtures of the interfering constituents (everything except the analyte), a pure analyte spectrum, or alternatively, a sample spectrum where the analyte is present. The indicator is based on computing the net analyte signal of the analyte and the total error, i.e., instrumental noise and bias. By comparing the indicator values when different preprocessing techniques and wavelength selections are applied to the spectra, the optimal preprocessing technique and the optimal wavelength selection can be determined without knowledge of reference values, i.e., it minimizes the non-related spectral variation. The SE indicator is compared to two other indicators that also use net analyte signal computations. To demonstrate the feasibility of the SE indicator, two near-infrared spectral data sets from the pharmaceutical industry were used, i.e., diffuse reflectance spectra of powder samples and transmission spectra of tablets. Especially in pharmaceutical spectroscopic applications, it is expected beforehand that the non-related spectral variation is rather large and it is important to remove it. The indicator gave excellent results with respect to wavelength selection and optimal preprocessing. The SE indicator performs better than the two other indicators, and it is also applicable to other situations where the Beer–Lambert law is valid.

Index Headings: Spectral preprocessing; Wavelength selection; Near-infrared spectroscopy; Error indicator; Net analyte signal; Signal-to-noise ratio; Pharmaceutical powders and tablets.

INTRODUCTION

Near-infrared (NIR) spectroscopy is gaining popularity as a quantitative analytical method in the pharmaceutical industry.1–3 Quality control of incoming raw materials and quantitative analysis of intermediate4,5 and finalized products3 are examples of that. Spectra can be recorded quickly and in a non-invasive manner and they can be combined with a multivariate calibration technique, e.g., principal component regression6 (PCR) and partial least-squares regression7 (PLS), whereby quantitative measures can easily be obtained. One known problem in near-infrared spectroscopy is spectral variations that are not related to the property of interest.8 This non-related variation is especially important in pharmaceutical applications of NIR. In the pharmaceutical industry, spectra are often recorded in reflectance mode. Varying particle sizes and varying compression of, e.g., powders cause non-related spectral variation. To correct for this variation various spectral preprocessing techniques are used prior to calibration, e.g., multiplicative scatter correction9 (MSC), offset correction, or Savitzky–Golay10 derivatives. Another problem is that if a large part of the recorded spectrum does not contain any information about the analyte, wavelength selection becomes very important. Several methods have been proposed for wavelength selection.11,12 Until recently, it was believed that full spectrum methods, e.g., PLS, would automatically overcome the problem of wavelength selection by setting the regression coefficients for non-informative wavelengths to zero or near zero. However, this is not the case and PLS-based calibrations can in many cases be improved by a proper selection of wavelengths.13

The most common way of judging whether a preprocessing method is beneficial for the analytical performance is to compute the prediction uncertainty for an independent test set, i.e., the root mean square error of prediction (RMSEP) or root mean square error of prediction cross-validation (RMSECV) if only a smaller dataset is available, and then select the preprocessing method that gives the lowest RMSEP/RMSECV. Some pitfalls with this method are that it requires a fairly large number of samples, i.e., both calibration and test set data. Secondly, if the uncertainty of the reference values is high then judgments are based on reference values with errors. Finally, when using PCR or PLS the RMSEP/RMSECV values are influenced by the model dimensionality. If the model dimensionality is not estimated correctly with some kind of validation technique, then the RMSEP/RMSECV values will be misleading and therefore, judgments of preprocessing method selection or wavelength selection may also be incorrect.

Other methods exist to help choose the optimal preprocessing method, i.e., methods using the net analyte signal (NAS) concept. Net analyte signal is defined as the part of a signal that is unique for the analyte of interest.14 Lorber14 demonstrated how figures of merit, e.g., multivariate sensitivity, signal-to-noise ratio, selectivity, and limit of detection could be computed from the net analyte signal of the analyte. These figures of merit can be used
to judge whether a preprocessing method is beneficial for the analytical performance, and they can also be used for wavelength selection. Faber15 used the inverse multivariate sensitivity of the analyte to judge whether a certain preprocessing method, e.g., derivative, would improve the predictive ability of the calibration model or not. Xu and Schechter16 developed an error indicator for wavelength selection. Boelens et al. have also demonstrated the usability of NAS for improving the detection limit for a spectroscopic process analysis by tuning Savitzky–Golay filters.17 All these methodologies use the net analyte signal of the analyte of interest.

In this work we introduce a new error indicator called the signal-to-error indicator (SE). A signal-to-error (SE) value is computed for the analyte when various preprocessing methods and wavelength selections are applied to the spectra. The highest SE value indicates the optimal preprocessing and wavelength interval.

We will demonstrate the performance of the inverse sensitivity indicator, the error indicator, and the signal-to-error indicator with two NIR data sets from different stages in a pharmaceutical tablet production process. The indicators are compared to the standard PLS methodology and the RMSECV. For the applications presented in this paper the PLS method is used as a standard to which the other indicators can be compared. This is possible since the reference method is known to be accurate. The first set contains spectra of powder samples after mixing the tablet constituents. In the second data set, finalized tablets using the powder composition from the first data set are measured. In both cases the analyte is the active pharmaceutical ingredient (API) and the optimal preprocessing and wavelength selection is sought.

First some theory about net analyte signal and the method to compute figures of merit will be presented. Secondly, the different error indicators will be described and compared. Then in the Experimental section the instrumentation and different data sets used are described in detail, and finally, in the Results and Discussion section the different error indicators are compared and the results are commented on.

THEORY

Notation. Boldface capital characters denote matrices, boldface lower-case characters denote vectors, and lower-case italic characters denote scalars. $\|\mathbf{r}\|$ is the Euclidean norm of the vector $\mathbf{r}$, superscript T denotes the transposed matrix or vector, and the superscript + denotes the Moore–Penrose generalized inverse of a matrix. The matrix $\mathbf{I}_J$ is the $J \times J$ identity matrix.

Net Analyte Signal. The net analyte signal is defined as the part of a spectrum that is orthogonal to a subspace spanned by the spectra of all constituents except the analyte, i.e., all interfering constituents.14 So the net analyte signal of analyte k can be found by the following orthogonal projection:

$$\mathbf{r}_k^* = (\mathbf{I}_J - \mathbf{S}_{-k}\mathbf{S}_{-k}^{+})\,\mathbf{r}_k \qquad (1)$$

$$\mathbf{s}_k^* = (\mathbf{I}_J - \mathbf{S}_{-k}\mathbf{S}_{-k}^{+})\,\mathbf{s}_k \qquad (2)$$

where $\mathbf{r}_k$ is a $J \times 1$ vector containing the spectral response for a sample including the analyte k measured at J wavenumbers. The pure analyte spectrum $\mathbf{s}_k$ is a $J \times 1$ vector. $\mathbf{S}_{-k}$ is a $J \times L$ matrix with L spectra of blank samples. In some publications14 pure spectra of the interfering constituents are used to construct the $\mathbf{S}_{-k}$ matrix. In our experience this is not the best method, e.g., pure spectra are not always available and the pure constituent spectrum might differ slightly in shape from the spectral contribution in a mixture of interfering constituents. Practically, the $\mathbf{S}_{-k}$ matrix is most easily constructed by measuring mixtures of the interfering constituents. $\mathbf{S}_{-k}^{+}$ is its Moore–Penrose inverse, an $L \times J$ matrix. $\mathbf{r}_k^*$ and $\mathbf{s}_k^*$ are $J \times 1$ vectors called the net analyte signal vector of the kth constituent. The net analyte signal for constituent k in any sample can now be computed with Eq. 1.

Inverse Sensitivity. Various figures of merit14 can be computed using the net analyte signal concept, e.g., analyte sensitivity. Faber15 evaluated the effect of various preprocessing methods of near-infrared spectra with an error indicator based on computation of the inverse of the analyte sensitivity ($a^{-1}$; from here we denote this as invSEN) using the net analyte signal concept. Faber used the assumption that the length of the net analyte signal vector is proportional to the concentration of the analyte. Faber converted the net analyte signal vector into a scalar value by taking the Euclidean norm15 of the net analyte signal vector and plotted the value against the analyte concentration of the sample, thereby constructing a univariate calibration plot. The analyte sensitivity can then be computed with:

$$a = \frac{\|\mathbf{r}_{k,c}^*\|}{c_{k,c}} \qquad (3)$$

$$\mathrm{invSEN} = a^{-1} = \frac{c_{k,c}}{\|\mathbf{r}_{k,c}^*\|} \qquad (4)$$

where $\|\mathbf{r}_{k,c}^*\|$ is the norm of the net analyte signal of a calibration sample with concentration $c_{k,c}$ and the slope of the calibration line a is the sensitivity of analyte k.

Faber concluded that a preprocessing method is beneficial for the final predictive ability if the inverse sensitivity is decreasing with that particular pretreatment. The effect on the inverse sensitivity when doing first and second derivatives compared to multiplicative scatter corrected (MSC) spectra was evaluated. This indicator needs a collection of spectra to span the interference space and spectra containing the analyte and their respective reference concentrations of the analyte to compute analyte sensitivity.

Error Indicator. Xu and Schechter16 developed an error indicator (EI) for wavelength selection. The assumption for their EI is that the prediction error in multivariate analysis is determined by the quality of the corresponding net analyte signal. By minimizing the relative error in the norm of the NAS, the analytical conditions are optimized and lower prediction errors are achieved. The EI was defined as follows:

$$\mathrm{EI} = \frac{\left[\mathrm{var}\left(\|\mathbf{r}_k^*\| - \|\mathbf{r}_{k,\mathrm{true}}^*\|\right)\right]^{1/2}}{\|\mathbf{r}_{k,\mathrm{true}}^*\|} \qquad (5)$$

Due to non-related variations (interferents or baseline offsets) the norm of the NAS may be affected. The numerator of the EI describes the variance in the norm of the NAS caused by noise in the spectra due to non-related
variations. Xu and Schechter assume that the noise in the spectra due to non-related variations is homoscedastic, i.e., each wavenumber has the same variance, and that the noise is not correlated for neighboring wavenumbers. In that case, the variance in the norm of the NAS due to non-related variation can be written as follows:18

$$\mathrm{var}\left(\|\mathbf{r}_k^*\| - \|\mathbf{r}_{k,\mathrm{true}}^*\|\right) = \frac{\left(2\|\mathbf{r}_{k,\mathrm{true}}^*\|\sigma\right)^2 + \left(J\sigma^2\right)^2}{\left(\|\mathbf{r}_k^*\| + \|\mathbf{r}_{k,\mathrm{true}}^*\|\right)^2} \qquad (6)$$

Here J is the number of wavenumbers in the spectra used. The standard deviation of the spectral noise described above is represented by $\sigma$. Since $\|\mathbf{r}_{k,\mathrm{true}}^*\|$ cannot be known, Xu and Schechter propose to replace it with $\|\mathbf{r}_k^*\|$, which leads, according to Ferré and Rius,18 to the following expression for the error indicator:

$$\mathrm{EI} = \frac{\left[\sigma^2\left(1 + \dfrac{J^2\sigma^2}{4\|\mathbf{r}_k^*\|^2}\right)\right]^{1/2}}{\|\mathbf{r}_k^*\|} \qquad (7)$$

The standard deviation of the spectral noise, $\sigma$, is found from the net analyte signal regression plot (NASRP). First take the NAS vector of the pure analyte spectrum $\mathbf{s}_k^*$ and the NAS vector of a sample spectrum containing the analyte $\mathbf{r}_k^*$. Then the absorbance at each wavelength j in $\mathbf{s}_k^*$ is plotted against the absorbance in $\mathbf{r}_k^*$ at the same wavelength, for all j = 1, ..., J wavelengths in the vectors. This results in the NASRP plot. In the ideal case with no non-related variation, both NAS vectors will point in the same direction and the points in the NASRP plot will form a perfectly straight line passing through (0, 0). The assumption made by Xu and Schechter16 is that at each wavelength the error is normally distributed with the same standard deviation, i.e., white noise. A straight line is fitted through the points in the NASRP plot in a least-squares sense and by computing the residual vector, i.e., deviation of each of the points from the line, $\sigma$ can be computed:18

$$\sigma = \sqrt{\frac{\mathbf{e}_{k,\mathrm{res}}^{T}\,\mathbf{e}_{k,\mathrm{res}}}{J - 1}} \qquad (8)$$

where $\mathbf{e}_{k,\mathrm{res}}$ is a $J \times 1$ vector containing the residuals. The residuals are computed in the following manner:

$$\mathbf{e}_{k,\mathrm{res}} = \mathbf{r}_k^* - \mathbf{s}_k^*\,c_k \qquad (9)$$

$$c_k = \frac{\mathbf{r}_k^{*T}\,\mathbf{s}_k^*}{\|\mathbf{s}_k^*\|^2} \qquad (10)$$

The error indicator needs a collection of blank spectra to span the interference space, the pure analyte spectrum, and a sample spectrum containing the analyte.

Signal-to-Error Indicator. In this work we present a new indicator based on the computations of the signal-to-error (SE). We assume that the error in the spectra is made of two contributions, i.e., noise and bias. If a certain preprocessing method or wavelength selection is not removing unwanted interference, then extra blank samples may have a small contribution orthogonal to $\mathbf{S}_{-k}$ when they are projected onto the interference space. We compute this contribution as the projection ($\mathrm{PROJ}_{\mathrm{blank}}$) of some extra blank spectra ($\mathbf{r}_{\mathrm{blank}}$) on the normed $\mathbf{s}_k^*$ vector, i.e., normed to unit length. We call the normed $\mathbf{s}_k^*$ vector the net analyte signal regression vector $\mathbf{nas}_{\mathrm{reg}}$.

$$\mathrm{PROJ}_{\mathrm{blank}} = \mathbf{r}_{\mathrm{blank}}^{T} \cdot \frac{\mathbf{s}_k^*}{\sqrt{\mathbf{s}_k^{*T}\mathbf{s}_k^*}} \qquad (11)$$

$$\mathrm{PROJ}_{\mathrm{blank}} = \mathbf{r}_{\mathrm{blank}}^{T} \cdot \mathbf{nas}_{\mathrm{reg}} \qquad (12)$$

The error taking into account both bias and noise is computed by:

$$\mathrm{error} = \sqrt{\frac{\sum_{i=1}^{I}\left(\mathrm{PROJ}_{\mathrm{blank},i} - 0\right)^2}{I}} = \sqrt{\frac{\sum_{i=1}^{I}\left(\mathrm{PROJ}_{\mathrm{blank},i}\right)^2}{I}} \qquad (13)$$

In the denominator we use I and not I − 1 because no mean is subtracted so the degrees of freedom are preserved.

The signal is then computed by projecting the analyte spectrum on the NAS regression vector and the SE can be computed as the ratio between the signal and the error:

$$\mathrm{Signal} = \mathbf{s}_k^{T} \cdot \mathbf{nas}_{\mathrm{reg}} \qquad (14)$$

$$\mathrm{SE} = \frac{\mathrm{Signal}}{\mathrm{error}} \qquad (15)$$

This error indicator needs a collection of blank spectra to span the interference space and to quantify the error part plus the pure analyte spectrum. If the pure analyte spectrum is not available, a sample spectrum containing the analyte can be used.

Although the error indicator and the signal-to-error indicator seem to be comparable, there are some important differences. The EI minimizes the difference between the length of two vectors, $\mathbf{r}_{k,\mathrm{true}}^*$ and $\mathbf{r}_k^*$. However, these vectors will not necessarily point in the same direction. Therefore, the difference in lengths is not directly related to errors in concentration. The SE indicator focuses on errors in the direction of the NAS regression vector, i.e., the same direction. The projections on the NAS regression vector are used (can also be negative) and not only the lengths of the projected vector. These projections are directly related to the concentrations (cf. Fig. 1).

FIG. 1. Geometrical display of the interference space and the net analyte signal vectors.

Toolboxes for net analyte signal calibrations are available for free download at http://www-its.chem.uva.nl/research/pac/index.html.
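
As an illustration of Eqs. 1–4 and 11–15, the following is a minimal Python sketch, not the authors' Matlab routines. All function names are hypothetical, the three-point "spectra" are made-up toy values, and the projection onto the orthogonal complement of the interference space is carried out with a Gram–Schmidt basis, which is assumed here as an equivalent substitute for the explicit Moore–Penrose form of Eqs. 1 and 2.

```python
import math


def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm."""
    return math.sqrt(dot(u, u))


def remove_components(vector, basis):
    """Subtract from `vector` its components along each orthonormal basis vector."""
    v = list(vector)
    for q in basis:
        coef = dot(v, q)
        v = [a - coef * b for a, b in zip(v, q)]
    return v


def interference_basis(blank_spectra, tol=1e-12):
    """Orthonormal basis (Gram-Schmidt) of the space spanned by the blank spectra."""
    basis = []
    for spectrum in blank_spectra:
        v = remove_components(spectrum, basis)
        length = norm(v)
        if length > tol:  # skip spectra that add no new direction
            basis.append([a / length for a in v])
    return basis


def net_analyte_signal(spectrum, basis):
    """Eqs. 1-2: part of the spectrum orthogonal to the interference space."""
    return remove_components(spectrum, basis)


def inverse_sensitivity(r_cal, conc, basis):
    """Eq. 4: concentration divided by the norm of the NAS of a calibration sample."""
    return conc / norm(net_analyte_signal(r_cal, basis))


def signal_to_error(s_k, basis, extra_blanks):
    """Eqs. 11-15: signal-to-error ratio for one analyte spectrum."""
    s_star = net_analyte_signal(s_k, basis)
    length = norm(s_star)
    nas_reg = [a / length for a in s_star]                   # unit-length NAS regression vector
    proj = [dot(r, nas_reg) for r in extra_blanks]           # Eq. 12, one value per extra blank
    error = math.sqrt(sum(p * p for p in proj) / len(proj))  # Eq. 13, divided by I
    signal = dot(s_k, nas_reg)                               # Eq. 14
    return signal / error                                    # Eq. 15


# Toy data (assumed values): one blank spans the interference space,
# two further blanks carry a small residual along the analyte direction.
span_blanks = [[1.0, 0.0, 0.0]]
basis = interference_basis(span_blanks)
s_k = [1.0, 1.0, 0.0]
extra_blanks = [[2.0, 0.1, 0.0], [1.0, -0.1, 0.0]]

print(signal_to_error(s_k, basis, extra_blanks))
print(inverse_sensitivity([3.0, 2.0, 0.0], 4.0, basis))
```

In such a sketch the same functions would be called once per candidate preprocessing method and wavelength interval, with the preprocessed and truncated spectra as input; a setting that leaves less residual blank signal along the NAS regression vector yields a larger SE value.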



EXPERIMENTAL

The powder samples were measured with a BOMEM MB 160 FT-NIR spectrometer equipped with a SpinningVial™ accessory for measuring powder samples. The samples were measured with diffuse reflectance, and an InGa detector was used. The SpinningVial™ accessory measured through the side of the glass sample vials (where the glass walls are assumed to be the most homogeneous). The wavenumber range from 3800 cm⁻¹ to 12 000 cm⁻¹ was recorded and the spectral resolution was set to 8 cm⁻¹. For each spectrum a total of 32 scans were averaged (the scanning time for 32 scans measured with a spectral resolution of 8 cm⁻¹ is the same time as the SpinningVial™ accessory uses to spin the sample vial one revolution). The tablet samples were measured with a BOMEM MB 160 FT-NIR spectrometer equipped with a TabletSamplIR™ accessory. The tablets were measured with a transmission measurement and an InGaAs detector was used. The wavenumber range from 4000 cm⁻¹ to 12 000 cm⁻¹ was recorded and the spectral resolution was set to 16 cm⁻¹. When measuring transmission spectra of tablets, normally only broad peaks in the first and second overtone region are useful for quantification and 16 cm⁻¹ is a reasonable resolution. For each spectrum a total of 32 scans were averaged. In both cases the data were collected with GRAMS32 (ThermoGalactic.com, GRAMS/32, 1998) software and imported into Matlab (MathWorks Inc., Matlab ver. 12.1., 2001) with in-house written software. Computations were performed in Matlab with in-house written routines plus the PLS_Toolbox (Eigenvector Research, Inc. PLS_Toolbox. Version 2.1., 1998).

Dataset 1: Powder Samples. The samples were made according to a triangular mixture design. The samples contained five constituents, i.e., the active pharmaceutical ingredient (API), two filler binders (A and B), and two glidants (C and D). Three doses are normally produced, i.e., 0.64, 1.27, and 2.57 API w/w % (low, medium, and high strength). To have samples that resemble the heterogeneous nature of powder mixtures, samples with over- and under-dose of API, filler binder A, and filler binder B were produced according to a triangular mixture design (Fig. 2). Samples with ±10% of target dose of API and ±10 and ±20 w/w % of target dose of filler binder A and filler binder B, respectively, were made while the added amounts of glidant C and glidant D were kept constant. Initial experiments (not shown here) indicated that homogeneity of filler binder A and filler binder B could be difficult to obtain in a large-scale mixing process. It was therefore assumed that the span, i.e., ±20% from target concentration, of those constituents would resemble the heterogeneity that could be expected in the interference matrix, while glidant C and glidant D are assumed to be less important and for practical reasons the added amount was kept constant. Blank samples without API were also prepared (marked with squares in Fig. 2).

FIG. 2. Triangular mixture design for powder samples.

The samples were prepared in 25 mL glass vials that fitted into the SpinningVial™ accessory. The total sample size was 8.0 g and the samples were prepared in the following manner. First the filler binder A was weighed with an electronic precision weight and transferred into the vial. Then API was weighed and transferred into the vial. The constituents were mixed manually with a metal spatula. Then filler binder B, glidant C, and finally the glidant D were added, each time manually mixing with a metal spatula. Each sample was measured eight times in the SpinningVial™ accessory and between each measurement the sample was removed and shaken vigorously. The mean of the eight spectra was then used to represent the sample. The powder samples are generally problematic to measure because of the heterogeneous distribution of the sample constituents, but other studies (not shown) have shown that the SpinningVial™ accessory and the use of the mean spectrum is a valid methodology, and the methodology has also been reported elsewhere.19 As a reference method, the weighed amount was used (gravimetric) and the uncertainty of this value was believed to be low, i.e., ±10⁻⁴ g.

Dataset 2: Tablet Samples. No specific experimental design was used for the tablet samples, but a small data set based on a stratified sampling scheme was used. Tablets were taken from nine different production batches (pilot scale batches): three batches with placebo tablets, i.e., blank samples without API, and six batches with API in three different levels. From each batch two tablets were used, for a total of 18 tablets. Because it is not possible to measure a transmission spectrum of the pure API ($\mathbf{s}_k$), we used a spectrum of a tablet from a batch with high concentration of API as a replacement for the pure analyte spectrum. One tablet spectrum from each of the placebo batches, i.e., three spectra, were used to span the interference space and the three remaining spectra were used as blank samples to quantify the error.

RESULTS AND DISCUSSION

A selection of different preprocessing methods (Table I) that are normally20,21 applied when doing preprocessing of NIR spectra obtained from diffuse reflectance measurements of powders and transmission spectra of tablets were compared. For both data sets we compared the same preprocessing methods.

TABLE I. Preprocessing methods.
No.  Method             Note
1    No preprocessing
2    MSC
3    Offset             Using 9990–10 000 cm⁻¹ as offset point
4    First derivative   Using 11 spectral points
5    First derivative   Using 25 spectral points
6    Second derivative  Using 11 spectral points
7    Second derivative  Using 25 spectral points

The wavelength selection can be conducted in many different ways. In this article we used the prior knowledge we have about the analyte, i.e., location of main analyte peaks. The search for the optimal
wavelength interval was conducted by choosing a starting point, i.e., a wavenumber where an analyte peak is present, and then computing the various indicator values and RMSECV for a wavelength interval defined around this starting point. Then the interval was extended in both directions and new indicator values and RMSECV were computed. This was done a proper number of times using an increasing interval width until a large part of the wavelength axis was examined. The selection of wavelength intervals to examine can be made in numerous ways either using prior knowledge about major peak locations or more automatic routines, e.g., moving windows. In any case, the indicator values can be computed and therefore applied to existing wavelength selection methods.

For the powder samples the starting point was 6000 cm⁻¹, i.e., an analyte peak is found there (Fig. 3) with an interval width of 160 cm⁻¹, i.e., from 5920 to 6080 cm⁻¹. Then the interval was extended 160 cm⁻¹ to 5840–6160 cm⁻¹. This was repeated until 20 intervals were examined, the last covering 4400–7600 cm⁻¹. For the tablet samples the starting point was 8800 cm⁻¹, i.e., an analyte peak is found there with an interval width of 120 cm⁻¹, i.e., from 8740 to 8860 cm⁻¹. Then the interval was extended 120 cm⁻¹ to 8680–8920 cm⁻¹, and this was repeated until 15 intervals were examined, the last covering 7900–9700 cm⁻¹.

Comparing the Indicator Values and RMSECV Using GAIN Values. To compare the indicator values and the RMSECV we computed the GAIN for each value. The GAIN is computed as the ratio between an indicator or RMSECV value to a reference value. The reference value for the indicators or RMSECV is the value when using spectra without any preprocessing applied and using the whole wavelength range:

$$\mathrm{SE}_{\mathrm{gain}} = \frac{\mathrm{SE}_{\mathrm{pre}}}{\mathrm{SE}_{\mathrm{ref}}} \qquad (16)$$

$$\mathrm{invSEN}_{\mathrm{gain}} = \frac{\mathrm{invSEN}_{\mathrm{ref}}}{\mathrm{invSEN}_{\mathrm{pre}}} \qquad (17)$$

$$\mathrm{EI}_{\mathrm{gain}} = \frac{\mathrm{EI}_{\mathrm{ref}}}{\mathrm{EI}_{\mathrm{pre}}} \qquad (18)$$

$$\mathrm{RMSECV}_{\mathrm{gain}} = \frac{\mathrm{RMSECV}_{\mathrm{ref}}}{\mathrm{RMSECV}_{\mathrm{pre}}} \qquad (19)$$

where the subscript "ref" means that the indicator and RMSECV value are computed using non-preprocessed spectra and the whole wavelength range, i.e., 4000 to 10 000 cm⁻¹ for the powder samples and 7300 to 10 000 cm⁻¹ for the tablet samples. The subscript "pre" means that a preprocessing method or preprocessing method and wavelength interval selection have been applied to the spectra. Note that $\mathrm{SE}_{\mathrm{ref}}$ is the denominator in Eq. 16. This is because the optimal preprocessing and wavelength selection corresponds to the highest SE value, in contrast to the other indicators and RMSECV, where the lowest value corresponds to the optimal preprocessing and wavelength selection. If the gain value is greater than one, then the preprocessing or preprocessing and wavelength selection will improve the final calibration model, while if the gain value is equal to or lower than one then the preprocessing or preprocessing and wavelength selection does not improve, or worsens, the final calibration model.

Results for Powder Samples. In Fig. 3 the pure analyte spectrum, a spectrum of a blank sample, and a spectrum of a sample containing 2.57 w/w % API are depicted. The difference between the blank spectrum and the spectrum containing 2.57 w/w % API is mainly caused by scattering phenomena seen as offset differences from 7000 to 12 000 cm⁻¹. In the API spectrum main peaks are identified in the combinational band region, i.e., 4650 cm⁻¹ and 4940 cm⁻¹, and in the first overtone region we find a peak at 6000 cm⁻¹, and in the second overtone region a peak at 8800 cm⁻¹ is apparent.

FIG. 3. NIR spectra of blank powder sample, powder sample with 2.57 w/w % API, and analyte spectrum.

Choosing the Optimal Preprocessing Method for Powder Samples. To span the interference space for the invSEN, SE, and EI indicator we used five blank sample spectra, symbolized with open squares in Fig. 2. To compute the invSEN we used two sample spectra containing the analyte, i.e., samples marked with grey circles in Fig. 2. To compute the SE we used two analyte spectra to compute the signal and an additional twenty-five blank sample spectra to compute the error. To compute the EI, two sample spectra, i.e., samples marked with grey color in Fig. 2, and two analyte spectra were used. The RMSECV values were calculated using the 32 samples depicted in Fig. 2. When computing the RMSECV values the 32 samples were divided into 11 blocks, i.e., 10 blocks with three samples each and one block with two samples, and then cross-validation was performed leaving out one block each time. Based on the cross-validation
results, five PLS components were selected for the PLS model of the whole wavelength range.

The indicator values and the RMSECV were calculated using the 4000–10 000 cm⁻¹ wavelength region and by applying the preprocessing methods listed in Table I. In Fig. 4 the gain values are depicted for the indicators and RMSECV. The RMSECV shows that the best preprocessing method is first derivatives using 25 spectral points with a gain value of 2.9. The SE indicator has the highest gain for first derivatives, while the EI indicator has the highest gain for second derivatives. The invSEN indicator has the highest gain for MSC, which is clearly wrong compared to the PLS results.

FIG. 4. Optimal preprocessing method. Gain values for indicators and RMSECV for powder samples.

Wavelength Selection for Powder Samples. Indicator and RMSECV values were computed for twenty wavelength intervals around 6000 cm⁻¹. For all intervals, four PLS components were used to calculate the RMSECV values. Again the number of PLS components is based on cross-validation results. This was done for all seven preprocessing methods and the highest gain value for the RMSECV was then found to be 5.95 when preprocessing method 5 was used with the wavelength interval from 5840–6160 cm⁻¹ (Fig. 5). This matched perfectly the SE indicator that had the highest gain value for the same preprocessing method and wavelength interval as the PLS method. Also, the EI indicator had the highest gain value for preprocessing method 5, but using the wavelength interval from 5760–6240 cm⁻¹. The shape of the RMSECV gain curve corresponded well with the shape of the SE gain curve, and also the gain values were all above one for the RMSECV and the SE. The gain values for the EI when applying preprocessing method 5 were only above one for three intervals, i.e., I-2, I-3, and I-4, while the remaining intervals were less than one, indicating that no preprocessing and using the whole wavelength region was better for those intervals (Fig. 5). The invSEN indicator was not useful for wavelength selection using any of the preprocessing methods. The highest gain value for the invSEN was 11.8 using MSC as the preprocessing method and the wavelength interval from 4000 to 10 000 cm⁻¹, and when using all other preprocessing methods the gain for the invSEN was always below one, with the lowest value for the smallest wavelength interval, i.e., I-1, and increasing with increasing interval width (see insert in Fig. 5).

FIG. 5. Optimal preprocessing/wavelength selection. Gain values for indicators and RMSECV for powder samples.

It is important to notice that the selection of preprocessing method using the whole wavenumber range is not representative of the results when only a small wavelength region is used. Therefore, combining preprocessing and wavelength selection, as is done here, seems to be necessary.

Results for Tablet Samples. To span the interference space for the invSEN, SE, and EI indicators we used three blank sample spectra. To compute the invSEN we used two samples with a high concentration of API. To compute the SE we used two sample spectra to compute the signal, i.e., two samples with a high concentration of API as a substitute for the pure analyte tablet spectra that were not available, and an additional three blank sample spectra to compute the error. To compute the EI four sample spectra with a high API concentration were used. Two of the sample spectra were used to compute the average $\mathbf{r}_k^*$ and the two other sample spectra were used to compute the average $\mathbf{s}_k^*$ (Eqs. 1 and 2) because no pure analyte tablet samples are available. The RMSECV values were calculated using all 18 samples. When the RMSECV value was computed the leave-one-out principle was used because of the limited size of the dataset.

Choosing the Optimal Preprocessing Method for Tablet Samples. Also for the tablet samples, comparison of the preprocessing methods using a broad spectral range was not a feasible method, i.e., preprocessing combined with wavelength selection was necessary.

Wavelength Selection for Tablet Samples. Indicator and RMSECV values were computed for fifteen wavelength intervals around 8800 cm⁻¹ with all the preprocessing methods described in Table I. All PLS models were calculated using four PLS components. The highest gain value for the RMSECV was 3.6 when using preprocessing method 5, i.e., first derivatives with 25 spectral points and the wavelength interval 8620–8980 cm⁻¹ (Fig. 6). Also, the SE had the maximum gain value of 3.8 using preprocessing method 5 and the interval 8620–8980 cm⁻¹ (Fig. 6). The shape of the RMSECV and the SE gain curves were fairly similar. As for the powder samples,



the invSEN was not useful for wavelength selection and the gain values were less than one except for the MSC method. The EI had a maximum gain of 1.29 when MSC was used for preprocessing and the wavelength interval was 8320–9280 cm⁻¹ (not depicted), and was in general not useful for wavelength selection of the tablet samples.

FIG. 6. Optimal preprocessing/wavelength selection. Gain values for indicators and RMSECV for tablet samples.

The problem with the invSEN indicator is that when the spectra are preprocessed using first and second derivatives, the Euclidean length of the spectra, and subsequently of the net analyte signal vectors, is lowered. This decreases the analyte sensitivity as computed in Eq. 4 without regard to the analytical performance of a calibration model using derivative spectra. In the original publication, Faber assumed that only white noise is present, which is a huge simplification of real spectroscopic systems in pharmaceutical applications. This might also explain why the method fails with our examples.

The EI indicator performed reasonably well but with failures. Wavelength selection of the tablet samples was not possible. The reason for the failure with the tablet samples might be that no “pure analyte tablet” was available. In the EI, the net analyte signal vector of a sample and analyte spectra are compared. But as pure analyte spectra are not always available, and generally not for tablet samples, the EI is not usable for this sample type.

The validation of the SE method is only performed on the zero concentration level. Therefore, it can be expected that the method will work better for low concentrations. During the work we discovered that a good selection of blank samples is the “key” to the SE indicator. For the powder samples we had measured each of the five blank samples eight times, giving forty blank spectra. Among these spectra we picked a few spectra to span the interference space and a larger portion to quantify the error. We recommend that as many blank samples as possible be measured using repeated measurements; in that manner, instrumental noise and baseline drift are included. This is easy to do in most industrial applications, but might be more difficult for environmental products. Also reposition the samples and, for powder samples, shake the samples. In that manner, heterogeneous samples are best measured.

A problematic issue for all NAS methods is that it is unclear how interactions between the analyte and the interferents are dealt with. This is a general problem of the NAS approach, but even for more commonly used inverse calibration methods such as PLS or PCR this is not clear.

CONCLUSION

We have demonstrated a new indicator for choosing the optimal preprocessing method and conducting wavelength selection of NIR spectra. The indicator was compared to existing indicators also using net analyte signal computations and to the standard methodology using cross-validation results from a PLS regression model. The indicator performed better than the two reference methods using net analyte signal methodology. The invSEN failed generally to find the optimal preprocessing method and was also not useful for wavelength selection. The EI indicator was developed for wavelength selection, but we tried to use it for selection of the optimal preprocessing method, without success for both the powder and tablet samples. For wavelength selection the EI indicator performed reasonably for the powder samples and identified a few wavelength intervals that improved the calibration model, but not the optimal selection (Fig. 5). The EI indicator could not be used for wavelength selection of the tablet samples. The SE indicator identified the right preprocessing method and also the optimal wavelength selection both for the powder and the tablet samples. For the tablet samples the right preprocessing method was not obvious and was identified only after subsequent wavelength selection was performed (Fig. 6). Thus, in cases where only a few samples are available, reference values are determined with a high error, or are not available, we recommend this new indicator.

In this study the proposed method is only demonstrated for reflectance spectra of powder samples and transmittance spectra of whole tablets. More and different spectroscopic applications are necessary to corroborate the obtained results and to understand the limitations of this method. It might be the case that for different applications, the proposed indicator will not always be the best choice for selection of the optimal preprocessing and wavelength points.

ACKNOWLEDGMENT

Novo Nordisk, Corporate Research Affairs (CORA) sponsored this work as a part of E.T.S. Skibsted’s Ph.D. project.

1. M. Blanco, H. Iturriaga, S. Maspoch, and C. Pezuela, Analyst (Cambridge, U.K.) 123, 135 (1998).
2. M. Blanco, J. Coello, A. Eustaquio, H. Iturriaga, and S. Maspoch, Anal. Chim. Acta 392, 237 (1999).
3. M. Blanco, J. Coello, H. Iturriaga, S. Maspoch, and D. Serrano, Analyst (Cambridge, U.K.) 123, 2307 (1998).
4. R. D. Maesschalck, F. C. Sánchez, D. L. Massart, P. Doherty, and P. Hailey, Appl. Spectrosc. 52, 725 (1998).
5. F. C. Sánchez, J. Toft, B. Bogaert, S. S. Dive, and P. Hailey, Fresenius’ J. Anal. Chem. 352, 771 (1995).
6. H. Martens and T. Næs, Trends Anal. Chem. 3, 204 (1984).
7. H. Martens and T. Næs, Multivariate Calibration (John Wiley and Sons, Chichester, 1989).
8. H. Swierenga, A. P. de Weijer, R. J. van Wijk, and L. M. C. Buydens, Chemom. Intell. Lab. Syst. 49, 1 (1999).
9. P. Geladi, D. McDougall, and H. Martens, Appl. Spectrosc. 39, 491 (1985).



10. A. Savitzky and M. J. E. Golay, Anal. Chem. 36, 1627 (1964).
11. Q. Ding and G. W. Small, Anal. Chem. 70, 4472 (1998).
12. L. Nørgaard, A. Saudland, J. Wagner, J. P. Nielsen, L. Munck, and S. B. Engelsen, Appl. Spectrosc. 54, 413 (2000).
13. C. H. Spiegelman, M. J. McShane, M. J. Goetz, M. Motamedi, Q. L. Yue, and G. L. Coté, Anal. Chem. 70, 35 (1998).
14. A. Lorber, Anal. Chem. 58, 1167 (1986).
15. N. M. Faber, Anal. Chem. 71, 557 (1999).
16. L. Xu and I. Schechter, Anal. Chem. 68, 1842 (1996).
17. H. F. Boelens, W. T. Kok, O. E. de Noord, and A. K. Smilde, internal report, available upon request.
18. J. Ferré and F. X. Rius, Anal. Chem. 70, 1999 (1998).
19. O. Berntsson, L. G. Danielsson, M. O. Johansson, and S. Folestad, Anal. Chim. Acta 419, 45 (2000).
20. O. Berntsson, Ph.D. Thesis, “Characterization and Application of Near Infrared Reflection Spectroscopy for Quantitative Process Analysis of Powder Mixtures”, Kungliga Tekniska Högskolan (KTH), Stockholm, Sweden (2001).
21. H. Swierenga, Ph.D. Thesis, “Robust Multivariate Calibration Models in Vibrational Spectroscopic Applications”, Katholieke Universiteit Nijmegen, The Netherlands (2000).
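
Supplementary sketch of the gain comparison. The results above rank every combination of preprocessing method and wavelength interval by a gain value, where a gain above one means the combination beats the reference setting (no preprocessing, whole wavelength region). The Python sketch below illustrates that selection step only. The direction of the ratio (reference divided by candidate for RMSECV, candidate divided by reference for a signal-to-error style indicator), the function names `gain` and `best_setting`, and all numbers are assumptions made for illustration; none of them are taken from the paper.

```python
def gain(reference, candidate, lower_is_better):
    """Gain > 1 means the candidate setting beats the reference setting."""
    if lower_is_better:            # e.g. RMSECV: smaller values are better
        return reference / candidate
    return candidate / reference   # e.g. an indicator where larger is better


def best_setting(values, reference_key, lower_is_better):
    """Return (setting, gain) for the setting with the highest gain."""
    ref = values[reference_key]
    gains = {k: gain(ref, v, lower_is_better) for k, v in values.items()}
    best = max(gains, key=gains.get)
    return best, gains[best]


# Hypothetical RMSECV values, for illustration only (not from the paper).
rmsecv = {
    ("none", "full range"): 2.0,
    ("MSC", "full range"): 1.6,
    ("1st derivative", "narrow interval"): 0.5,
}
best, g = best_setting(rmsecv, ("none", "full range"), lower_is_better=True)
print(best, g)
```

Applying the same function to an indicator and to the RMSECV, and checking whether both point to the same setting, mirrors the comparison made for Figs. 5 and 6.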

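
Supplementary sketch of the derivative effect on vector length. The discussion attributes the failure of the invSEN to the fact that derivative preprocessing lowers the Euclidean length of the spectra and hence of the net analyte signal vectors. The Python sketch below illustrates this for a single smooth band. It is a simplified stand-in: a plain first difference replaces the Savitzky-Golay derivatives used in the paper, and the Gaussian band, its width, and the number of points are hypothetical choices.

```python
import math


def euclidean_length(v):
    """Euclidean (2-norm) length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def first_difference(v):
    """Simple point-to-point difference, a crude stand-in for a derivative filter."""
    return [b - a for a, b in zip(v, v[1:])]


# Hypothetical smooth absorbance band sampled at 601 points.
n = 601
spectrum = [math.exp(-((i - 300) / 60.0) ** 2) for i in range(n)]

d1 = first_difference(spectrum)   # first-derivative-like spectrum
d2 = first_difference(d1)         # second-derivative-like spectrum

lengths = [euclidean_length(s) for s in (spectrum, d1, d2)]
print(lengths)  # the length drops with each differentiation step
```

Because the length of the vector shrinks regardless of whether the calibration improves, an indicator built only on that length penalizes derivative spectra, which is consistent with the behavior reported for the invSEN.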
