data.
Normalization (clusterSim)
Types of variable normalization formulas
A. Variable (column) normalization
Variable (column) normalization can be applied to any data matrix.
1. Selection of objects and variables: data matrix $[x_{ij}]$.

2. Selection of variable normalization formula, depending on the variable scale level:

Formula  Name                                                 Variable scale level  Transformed variable scale level
n1       standardization                                      ratio or interval     interval
n2       positional standardization                           ratio or interval     interval
n3       unitization                                          ratio or interval     interval
n3a      positional unitization                               ratio or interval     interval
n4       unitization with zero minimum                        ratio or interval     interval
n5       normalization in range [–1, 1]                       ratio or interval     interval
n5a      positional normalization in range [–1, 1]            ratio or interval     interval
n6       quotient transformation                              ratio                 ratio
n6a      positional quotient transformation                   ratio                 ratio
n7       quotient transformation                              ratio                 ratio
n8       quotient transformation                              ratio                 ratio
n9       quotient transformation                              ratio                 ratio
n9a      positional quotient transformation                   ratio                 ratio
n10      quotient transformation                              ratio                 ratio
n11      quotient transformation                              ratio                 ratio
n12      normalization                                        ratio or interval     interval
n12a     positional normalization                             ratio or interval     interval
n13      normalization with zero being the central point      ratio or interval     interval
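
The scheme can also be read as a lookup: given a formula and the scale level of the input variable, it tells which scale level the transformed variable has. The following is a minimal Python sketch of that reading only. The names `QUOTIENT`, `SHIFT_AND_SCALE` and `transformed_scale` are hypothetical and are not part of clusterSim.

```python
# Formula groups as listed in the scheme (group names are illustrative only).
QUOTIENT = {"n6", "n6a", "n7", "n8", "n9", "n9a", "n10", "n11"}
SHIFT_AND_SCALE = {"n1", "n2", "n3", "n3a", "n4", "n5", "n5a",
                   "n12", "n12a", "n13"}


def transformed_scale(formula, input_scale):
    """Return the scale level of the normalized variable."""
    if input_scale not in ("ratio", "interval"):
        raise ValueError(f"unsupported scale level: {input_scale!r}")
    if formula in QUOTIENT:
        # Quotient transformations are listed only for ratio-scale variables
        # and keep the ratio scale.
        if input_scale != "ratio":
            raise ValueError(f"{formula} requires a ratio-scale variable")
        return "ratio"
    if formula in SHIFT_AND_SCALE:
        # Formulas that subtract a location parameter yield an interval scale.
        return "interval"
    raise ValueError(f"unknown formula: {formula!r}")


print(transformed_scale("n6", "ratio"))     # ratio
print(transformed_scale("n1", "ratio"))     # interval
print(transformed_scale("n1", "interval"))  # interval
```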
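(n1)   $z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}$
(n2)   $z_{ij} = \frac{x_{ij} - med_j}{mad_j}$
(n3)   $z_{ij} = \frac{x_{ij} - \bar{x}_j}{r_j}$
(n3a)  $z_{ij} = \frac{x_{ij} - med_j}{r_j}$
(n4)   $z_{ij} = \frac{x_{ij} - \min_i\{x_{ij}\}}{r_j}$
(n5)   $z_{ij} = \frac{x_{ij} - \bar{x}_j}{\max_i |x_{ij} - \bar{x}_j|}$
(n5a)  $z_{ij} = \frac{x_{ij} - med_j}{\max_i |x_{ij} - med_j|}$
(n6)   $z_{ij} = \frac{x_{ij}}{s_j}$
(n6a)  $z_{ij} = \frac{x_{ij}}{mad_j}$
(n7)   $z_{ij} = \frac{x_{ij}}{r_j}$
(n8)   $z_{ij} = \frac{x_{ij}}{\max_i\{x_{ij}\}}$
(n9)   $z_{ij} = \frac{x_{ij}}{\bar{x}_j}$
(n9a)  $z_{ij} = \frac{x_{ij}}{med_j}$
(n10)  $z_{ij} = \frac{x_{ij}}{\sum_{i=1}^{n} x_{ij}}$
(n11)  $z_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i=1}^{n} x_{ij}^2}}$
(n12)  $z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2}}$
(n12a) $z_{ij} = \frac{x_{ij} - med_j}{\sqrt{\sum_{i=1}^{n} (x_{ij} - med_j)^2}}$
(n13)¹ $z_{ij} = \frac{x_{ij} - m_j}{r_j / 2}$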
where: $x_{ij}$ ($z_{ij}$) – i-th observation on j-th variable (i-th normalized observation on j-th variable),
$\bar{x}_j$ ($s_j$) – mean (standard deviation) for j-th variable,
$med_j = med_i(x_{ij})$ – median for j-th variable,
$mad_j = mad_i(x_{ij})$ – median absolute deviation for j-th variable,
$r_j = \max_i\{x_{ij}\} - \min_i\{x_{ij}\}$ – range for j-th variable,
$m_j = \frac{\max_i\{x_{ij}\} + \min_i\{x_{ij}\}}{2}$ – mid-range for j-th variable.
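
To make the formulas concrete, the following minimal Python sketch applies a few of them (n1, n2, n4, n8, n10, n12, n13) to a single variable (column). It is an illustration under stated assumptions, not the clusterSim implementation. The names `column_stats` and `normalize_column` are hypothetical, the standard deviation is assumed to be the sample version (n − 1 denominator), and the median absolute deviation is assumed to be the raw one, without a scaling constant.

```python
import math
import statistics


def column_stats(col):
    """Summary statistics of one variable (column) used by the formulas."""
    med = statistics.median(col)
    lo, hi = min(col), max(col)
    return {
        "mean": statistics.mean(col),
        # Assumption: sample standard deviation (n - 1 denominator).
        "sd": statistics.stdev(col),
        "med": med,
        # Assumption: raw median absolute deviation, no scaling constant.
        "mad": statistics.median([abs(x - med) for x in col]),
        "min": lo,
        "max": hi,
        "range": hi - lo,
        "midrange": (hi + lo) / 2,
    }


def normalize_column(col, kind):
    """Normalize one column with the formula named by `kind`."""
    s = column_stats(col)
    if kind == "n1":   # standardization
        return [(x - s["mean"]) / s["sd"] for x in col]
    if kind == "n2":   # positional standardization
        return [(x - s["med"]) / s["mad"] for x in col]
    if kind == "n4":   # unitization with zero minimum
        return [(x - s["min"]) / s["range"] for x in col]
    if kind == "n8":   # quotient transformation by the maximum
        return [x / s["max"] for x in col]
    if kind == "n10":  # quotient transformation by the column sum
        total = sum(col)
        return [x / total for x in col]
    if kind == "n12":  # normalization by the root of summed squared deviations
        norm = math.sqrt(sum((x - s["mean"]) ** 2 for x in col))
        return [(x - s["mean"]) / norm for x in col]
    if kind == "n13":  # normalization with zero being the central point
        return [(x - s["midrange"]) / (s["range"] / 2) for x in col]
    raise ValueError(f"formula not covered by this sketch: {kind!r}")


col = [2.0, 4.0, 6.0, 8.0, 10.0]
print(normalize_column(col, "n2"))   # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(normalize_column(col, "n4"))   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(normalize_column(col, "n13"))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
```

The sketch does not guard against zero denominators: a constant column has $s_j = 0$ and $r_j = 0$, so the corresponding formulas are undefined for it.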
B. Object (row) normalization
The same normalization formulas can be applied to objects (rows) as to variables (columns). Object
(row) normalization makes sense only when all variables are expressed in the same unit. This is often
the case, for instance, with structural data.
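
As an illustration of object (row) normalization, the sketch below applies the quotient transformation n10 to each row of a small data matrix, so that every object is expressed as shares of its own total. The function name `normalize_rows_n10` and the spending figures are hypothetical. All values are assumed to be in the same unit.

```python
def normalize_rows_n10(matrix):
    """Apply formula n10 row by row: divide each value by its row sum."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([x / total for x in row])
    return result


# Hypothetical structural data: spending per category, same currency unit.
spending = [
    [50.0, 30.0, 20.0],
    [10.0, 10.0, 20.0],
]
shares = normalize_rows_n10(spending)
print(shares)  # [[0.5, 0.3, 0.2], [0.25, 0.25, 0.5]]
```

If the columns were in different units (for example currency and kilograms), the row sums would be meaningless, which is why the same-unit condition matters.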
References
Anderberg, M.R. (1973), Cluster analysis for applications, Academic Press, New York, San Francisco,
London.
Gatnar, E., Walesiak, M. (Eds.) (2004), Metody statystycznej analizy wielowymiarowej w badaniach
marketingowych [Multivariate statistical analysis methods in marketing research], Wydawnictwo
AE, Wroclaw, 35-38.
Jajuga, K., Walesiak, M. (2000), Standardisation of data set under different measurement scales, In:
R. Decker, W. Gaul (Eds.), Classification and information processing at the turn of the millennium,
Springer-Verlag, Berlin, Heidelberg, 105-112. DOI: https://doi.org/10.1007/978-3-642-57280-7_11.
Milligan, G.W., Cooper, M.C. (1988), A study of standardization of variables in cluster analysis,
“Journal of Classification”, vol. 5, 181-204.
Młodak, A. (2006), Analiza taksonomiczna w statystyce regionalnej [Taxonomic analysis in regional statistics], Difin, Warszawa.
Walesiak, M. (2014), Przegląd formuł normalizacji wartości zmiennych oraz ich własności w
statystycznej analizie wielowymiarowej [Data normalization in multivariate data analysis. An overview
and properties], “Przegląd Statystyczny” (Statistical Review), vol. 61, no 4, 365-374.
¹ http://www.benetzkorn.com/2011/11/data-normalization-and-standardization/ (1.06.2014).