Sensors 21 04412
Review
An Overview of Machine Learning within Embedded and
Mobile Devices–Optimizations and Applications
Taiwo Samuel Ajani 1 , Agbotiname Lucky Imoize 1,2, * and Aderemi A. Atayero 3
Abstract: Embedded systems technology is undergoing a phase of transformation owing to the novel
advancements in computer architecture and the breakthroughs in machine learning applications.
The areas of applications of embedded machine learning (EML) include accurate computer vision
schemes, reliable speech recognition, innovative healthcare, robotics, and more. However, there
exists a critical drawback in the efficient implementation of ML algorithms targeting embedded
applications. Machine learning algorithms are generally computationally and memory intensive,
making them unsuitable for resource-constrained environments such as embedded and mobile
devices. In order to efficiently implement these compute and memory-intensive algorithms within
the embedded and mobile computing space, innovative optimization techniques are required at
the algorithm and hardware levels. To this end, this survey aims at exploring current research
trends within this area. First, we present a brief overview of compute-intensive machine
learning algorithms such as hidden Markov models (HMMs), k-nearest neighbors (k-NNs), support
vector machines (SVMs), Gaussian mixture models (GMMs), and deep neural networks (DNNs).
Furthermore, we consider different optimization techniques currently adopted to squeeze these
computational and memory-intensive algorithms within resource-limited embedded and mobile
environments. Additionally, we discuss the implementation of these algorithms in microcontroller
units, mobile devices, and hardware accelerators. Finally, we give a comprehensive overview
of key application areas of EML technology, point out key research directions and highlight key
take-away lessons for future research exploration in the embedded machine learning domain.
Citation: Ajani, T.S.; Imoize, A.L.; Atayero, A.A. An Overview of Machine Learning within Embedded
and Mobile Devices–Optimizations and Applications. Sensors 2021, 21, 4412.
https://doi.org/10.3390/s21134412
Academic Editor: Paolo Gastaldo
has opened a research thrust between embedded devices and machine learning models
termed “Embedded Machine Learning” where machine learning models are executed
within resource-constrained environments [13]. This research surveys key issues within
this convergence of embedded systems and machine learning.
Machine learning methods such as SVMs for feature classification [14], CNNs for
intrusion detection [15], and other deep learning techniques, require high computational
and memory resources for effective training and inferencing [16–19]. General-purpose
CPUs, even with their architectural modification over the years, including pipelining,
deep cache memory hierarchies, multicore enhancements, etc., cannot meet the high
computational demand of deep learning models. However, graphics processing units
(GPUs), due to their high floating-point performance and thread-level parallelism, are
more suitable for training deep learning models [13]. Extensive research is actively being
carried out to develop suitable hardware acceleration units using FPGAs [20–26], GPUs,
ASICs, and TPUs to create heterogeneous and sometimes distributed systems that meet
the high computational demand of deep learning models. At both the algorithm
and hardware levels, optimization techniques such as pruning, quantization, reduced
precision, and hardware acceleration are being investigated for classical machine learning
and deep learning algorithms to enable the efficient execution of machine learning models in
mobile devices and other embedded systems [27–29].
The convergence of machine learning methods and embedded systems, in which
computationally intensive machine learning models target the resource-constrained embedded
environment, has opened a plethora of opportunities for research in computing technology.
Although EML is still in its infancy, considerable work has been done to: (1) optimize
different machine learning models to fit into resource-limited environments, (2) develop
efficient hardware architectures (acceleration units) using custom chipsets to accelerate
the implementation of these algorithms, and (3) create novel and innovative specialized
hardware architectures to meet the high-performance requirements of these models. Thus,
there is a need to bring these perspectives together to provide the interested researcher
with the fundamental concepts of EML and further provide the computer architect with
insights and possibilities within this space.
Several surveys have been carried out to achieve this. For example,
references [30,31] survey deep learning concepts, models, and optimizations. In these
surveys, little consideration is given to the hardware architectural design, which is a key
concern in developing efficient machine learning systems. Jawandhiya [32] surveys recent trends
in the hardware architectural design for machine learning applications, using the tensor
processing unit as a case study. However, that work did not explore the different DNN
architectures and only skimmed through some deep learning optimization techniques. Chen
and Ran [33], in their review, explored deep learning concepts, narrowing down on
inference at end devices, but did not compare different embedded chipset architectures to
inform which architecture or optimization is appropriate for the different DNN models.
They also present applications of deep learning in end devices. They, however, only
explored one type of deep learning model (DNNs) and did not discuss other deep learning
models (CNN, RNN), which have gained attention in recent times. Branco et al. [24] have
carried out a comprehensive survey on ML in embedded and mobile devices, presenting
ML concepts and techniques for optimization and investigating different application
areas. They, however, also do not explore other models of DNNs or make appropriate
trade-offs. To address these drawbacks, this survey presents key compute and memory intensive
machine learning algorithms, which are the HMM, k-NN, SVM, GMM, and the different
variants of DNNs (CNN and RNN), and presents hardware-based and algorithm-based
optimization techniques required to compress these algorithms within resource-constrained
environments. Finally, the survey considers diverse application areas where
machine learning has been utilized in proffering solutions to stringent problems in this big
data era. A comprehensive layout of this survey is presented in Figure 1.
Sensors 2021, 21, 4412 3 of 44
[Figure 1: layout diagram; legible labels include HMM, k-NN, and Support Vector Machines.]
Figure 1. The layout of Embedded Machine Learning Computing Architectures and Machine Learning
and Optimization Techniques.
The key contributions of this survey are as follows:
i. We present a survey of machine learning models commonly used in embedded
systems applications.
ii. We describe an overview of compute-intensive machine learning models such as
HMMs, k-NNs, SVMs, GMMs, and DNNs.
iii. We provide an overview of different optimization schemes adopted for
these algorithms.
iv. We present an overview of the implementation of these algorithms within resource-
limited environments such as MCUs, mobile devices, hardware accelerators,
and TinyML.
v. We survey the challenges faced in embedded machine learning and review different
optimization techniques to enhance the execution of deep learning models within
resource-constrained environments.
vi. We present diverse application areas of embedded machine learning, identify open
issues and highlight key lessons learned for future research exploration.
The remainder of this paper is organized as follows. Section 2 presents embedded
machine learning algorithms and specific optimization techniques, while Section 3 de-
scribes machine learning in resource-constrained environments (MCUs, mobile devices,
acceleration units, and TinyML). Section 4 presents challenges and possible optimization
opportunities in embedded machine learning. Section 5 provides diverse areas of applica-
tions of embedded machine learning technology, while Section 6 presents plausible research
directions, open issues, and lessons learned. In Section 7, a concise conclusion is presented.
Table 2. Cont.
$$P(Z \mid X) = \prod_{i=1}^{T} P(z_i \mid x_i) \qquad (1)$$
maintaining a near-optimal configuration of a MANET. Also, SVMs are used in the design
and development of a low-cost and energy-efficient intelligent sensor [94]. SVMs are,
however, computationally and memory intensive and thus require hardware acceleration
units to be effectively executed in resource-limited situations. In [95], the FPGA hardware
implementation of an SVM is surveyed with optimization techniques.
$$\max W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} y_i y_j \, k(x_i, x_j) \, \alpha_i \alpha_j \qquad (6)$$
where $\alpha_i$, $\alpha_j$ are Lagrange multipliers, $k(x_i, x_j)$ are the kernel functions, x and y are positions,
most suitable for resource-constrained environments is the Laplacian kernel because it can
be implemented using shifters [97].
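As a rough illustration of the dual objective in Equation (6), the following Python sketch evaluates W(α) for a given set of multipliers. It assumes the common Laplacian kernel form k(x, z) = exp(−γ‖x − z‖₁); the function names, the `gamma` parameter, and the toy data are hypothetical, and the code uses ordinary floating-point math rather than the shifter-based hardware realization discussed in [97].

```python
import math


def laplacian_kernel(x, z, gamma=0.5):
    """Laplacian kernel k(x, z) = exp(-gamma * ||x - z||_1) (assumed form)."""
    l1_distance = sum(abs(a - b) for a, b in zip(x, z))
    return math.exp(-gamma * l1_distance)


def dual_objective(alpha, y, samples, kernel=laplacian_kernel):
    """Evaluate W(alpha) = sum_i alpha_i - 1/2 sum_i sum_j y_i y_j k(x_i, x_j) alpha_i alpha_j."""
    n = len(alpha)
    linear_term = sum(alpha)
    quadratic_term = 0.0
    for i in range(n):
        for j in range(n):
            quadratic_term += (
                y[i] * y[j] * kernel(samples[i], samples[j]) * alpha[i] * alpha[j]
            )
    return linear_term - 0.5 * quadratic_term


# Toy example: two one-dimensional samples with opposite labels.
samples = [[0.0], [2.0]]
labels = [1, -1]
alphas = [1.0, 1.0]
w_value = dual_objective(alphas, labels, samples)
```

The double loop makes the quadratic cost in the number of training samples explicit, which is one reason SVM training is memory and compute intensive on small devices.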
Table 7. Cont.
$$g(x \mid \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_i)' \, \Sigma_i^{-1} (x - \mu_i) \right\}, \qquad (7b)$$
$$\lambda = \{ w_i, \mu_i, \Sigma_i \}, \quad i = 1, \ldots, M.$$
where x is a D-dimensional continuous-valued data vector (features), $w_i$, i = 1, . . . , M, are
the mixture weights, $p(x \mid \lambda)$ is the probability density function, and $g(x \mid \mu_i, \Sigma_i)$ are the
component Gaussian densities, $\mu_i$ is the mean vector, $\Sigma_i$ is the covariance matrix.
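The component density in Equation (7b) can be sketched in a few lines of Python. This is a minimal illustration under two assumptions that are not taken from the text: the covariance matrices are diagonal (passed as lists of variances), and the mixture density p(x|λ) is the usual weighted sum of the component densities. All function names are hypothetical.

```python
import math


def gaussian_density_diag(x, mean, variances):
    """Component density g(x | mu_i, Sigma_i) of Equation (7b), diagonal Sigma_i assumed."""
    dim = len(x)
    determinant = math.prod(variances)  # |Sigma_i| for a diagonal matrix
    quadratic = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, variances))
    normalizer = (2.0 * math.pi) ** (dim / 2.0) * math.sqrt(determinant)
    return math.exp(-0.5 * quadratic) / normalizer


def gmm_density(x, weights, means, variances):
    """Mixture density as a weighted sum of M component densities (assumed form)."""
    return sum(
        w * gaussian_density_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    )


# Two identical unit-variance components reduce to a single standard normal.
p = gmm_density([0.0], [0.5, 0.5], [[0.0], [0.0]], [[1.0], [1.0]])
```

Each evaluation needs an exponential and several floating-point divisions per component, which is the cost that integer-based and reduced floating-point GMM variants try to avoid.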
Reference | Optimization Method | Application | Comments
[106] | Minimization of Floating-Point Computations | Background Subtraction | The results of this research were impressive, showing no degradation in accuracy except for lower recall rates.
[107] | Comprehensive Sensing | Background Subtraction for real-time tracking in Embedded Vision | The results of this research reveal good performance for computational speed and reduce the memory footprint by 50%.
[39] | Integer-based technique | Background/foreground Segmentation | This work shows good performance for processors without FPU, thus reducing computation cost and reducing the memory footprint to 1/12 of the original GMM; however, it cannot be adopted for models with more than two Gaussians.
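Name | FC | Conv | Vector | Pool | Total | Nonlinear Function | Weights
MLP0 | 5 | - | - | - | 5 | ReLU | 20 M
MLP1 | 4 | - | - | - | 4 | ReLU | 5 M
LSTM0 | 24 | - | 34 | - | 58 | Sigmoid, tanh | 52 M
LSTM1 | 37 | - | 19 | - | 56 | Sigmoid, tanh | 34 M
CNN0 | - | 16 | - | - | 16 | ReLU | 8 M
CNN1 | - | 72 | - | 13 | 89 | ReLU | 100 M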
Furthermore, like other machine learning models, deep learning models go through
three phases: train, test, and predict. The training phase of deep learning models is carried
out by sequentially passing data forward through the entire network for a prediction to be
made and then back-propagating the error through the network. The weights are updated
using stochastic gradient descent (SGD), which adjusts the weights or synapses of each
layer in the network based on the back-propagated error, with each layer applying a
non-linear activation function (tanh, sigmoid, rectified linear unit (ReLU)) [108–110]. Lin
and Juan [111] carry out research where they explore the possibility of developing an
efficient hardware architecture to accelerate the activation function of the network. The
training process is often repeated many times for the model to learn effectively, and the
trained model is then used to make predictions on new data. Training is very
computationally and memory intensive and is often carried out offline using very
high-performance computing resources mostly found in large data centers, while inference
targets low-cost and resource-constrained environments.
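The train-then-predict cycle described above can be sketched with a single sigmoid neuron trained by SGD. This is a toy illustration only: the squared-error loss, learning rate, epoch count, and the function names are assumptions chosen for brevity, not details from the surveyed works.

```python
import math
import random


def sigmoid(z):
    """Non-linear activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, learning_rate=0.5, epochs=1000, seed=0):
    """Train one sigmoid neuron with SGD on (input, target) pairs."""
    rng = random.Random(seed)
    weight, bias = 0.0, 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # "stochastic": visit samples in random order
        for x, target in data:
            output = sigmoid(weight * x + bias)   # forward pass
            error = output - target               # derivative of 0.5 * (output - target)^2
            grad_z = error * output * (1.0 - output)  # back-propagate through the sigmoid
            weight -= learning_rate * grad_z * x  # gradient step on the weight
            bias -= learning_rate * grad_z        # gradient step on the bias
    return weight, bias


# Training (expensive, done offline), then inference (cheap, done on the device).
weight, bias = train_neuron([(0.0, 0.0), (1.0, 1.0)])
prediction_for_one = sigmoid(weight * 1.0 + bias)
prediction_for_zero = sigmoid(weight * 0.0 + bias)
```

Inference only needs the forward pass (one multiply, one add, one activation per neuron), which is why it is the part that is usually pushed to resource-constrained devices.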
$$\mathrm{out}(x, y)_{f_o} = \sum_{f_i=0}^{N_{if}} \sum_{k_x=0}^{K_x} \sum_{k_y=0}^{K_y} w_{f_i, f_o}(k_x, k_y) * \mathrm{in}(x + k_x, y + k_y)_{f_i} \qquad (8)$$
where $\mathrm{out}(x, y)_{f_o}$ represents the output neuron and $\mathrm{in}(x, y)_{f_i}$ represents the input neuron
in the x and y directions, $w_{f_i, f_o}(k_x, k_y)$ represents the synaptic weight and $(k_x, k_y)$ represents the
kernel position, $N_{if}$ are the input features. In addition, $f_i$ and $f_o$ represent the input and
output feature maps, respectively.
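A direct, unoptimized reading of Equation (8) is sketched below in Python. The nested-list layout (`inp[f_i][x][y]` and `weights[f_i][f_o][k_x][k_y]`) and the function name are assumptions made for illustration; the loops run over the indices that exist in the arrays rather than over the inclusive upper limits written in the summation.

```python
def conv_output(inp, weights, x, y, f_o):
    """Compute one output neuron out(x, y) for output feature map f_o.

    inp[f_i][x][y]             : input neuron of input feature map f_i
    weights[f_i][f_o][k_x][k_y]: synaptic weight at kernel position (k_x, k_y)
    """
    total = 0.0
    for f_i in range(len(inp)):                 # sum over input feature maps
        kernel = weights[f_i][f_o]
        for k_x in range(len(kernel)):          # sum over kernel rows
            for k_y in range(len(kernel[0])):   # sum over kernel columns
                total += kernel[k_x][k_y] * inp[f_i][x + k_x][y + k_y]
    return total


# One 3x3 input feature map and one 2x2 kernel that adds the diagonal entries.
feature_maps = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
kernels = [[[[1, 0], [0, 1]]]]
top_left = conv_output(feature_maps, kernels, 0, 0, 0)
bottom_right = conv_output(feature_maps, kernels, 1, 1, 0)
```

Every output neuron costs one multiply-accumulate per input feature map and kernel position, so the total MAC count grows with the product of output size, kernel size, and the number of input and output feature maps.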
$$\mathrm{out}(x, y)_f = \frac{\mathrm{in}(x, y)_f}{\left( c + \alpha \sum_{g=\max(0,\, f - k/2)}^{\min(N_f - 1,\, f + k/2)} \left( \mathrm{in}(x, y)_g \right)^2 \right)^{\beta}} \qquad (10)$$
where $\mathrm{out}(x, y)_f$ are output neurons, $\mathrm{in}(x, y)_f$ are input neurons, $N_f$ are the input features,
f and g are input and output feature maps respectively, k is the number of adjacent feature
maps, and c, α, and β are constants.
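The normalization in Equation (10) can be sketched as follows. The sketch takes the summation window to extend k/2 feature maps on either side of f, clipped to the valid range, which is an assumption of this example; the default constants and the function name are hypothetical.

```python
def local_response_norm(inp, f, x, y, k=2, c=1.0, alpha=1e-4, beta=0.75):
    """Normalize in(x, y)_f by the squared activity of adjacent feature maps.

    inp[g][x][y] is the input neuron at position (x, y) of feature map g.
    """
    num_maps = len(inp)
    lower = max(0, f - k // 2)               # first adjacent feature map
    upper = min(num_maps - 1, f + k // 2)    # last adjacent feature map
    squared_sum = sum(inp[g][x][y] ** 2 for g in range(lower, upper + 1))
    return inp[f][x][y] / (c + alpha * squared_sum) ** beta


# Three 1x1 feature maps with activations 1, 2, and 3.
maps = [[[1.0]], [[2.0]], [[3.0]]]
middle = local_response_norm(maps, f=1, x=0, y=0, k=2, c=1.0, alpha=1.0, beta=1.0)
edge = local_response_norm(maps, f=0, x=0, y=0, k=2, c=1.0, alpha=1.0, beta=1.0)
```

The power and division make this layer comparatively expensive on hardware without a floating-point unit.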
where “t” is the non-linear activation function, $N_i$ are the input features, $w_{ij}$ are the synaptic
weights, i and j are the input and output feature maps respectively.
[Figure 2: two-panel chart, panels (a) and (b).]
Figure 2. Description of a spectrum of certain CNN models to reveal their compute and memory
demand: (a) Describes the Memory Demand of these models in terms of the number of weight
parameters in (millions) (b) Computational Demand of these models in terms of their number of
operations (GOPs).
3.4. TinyML
Machine learning inference at the edge particularly within very low power MCUs is
gaining increased interest amongst the ML community. This interest pivots on creating
a suitable platform where ML models may be efficiently executed within IoT devices.
This has thus opened a growing research area in embedded machine learning termed
TinyML. TinyML is a machine learning technique that integrates compressed and optimized
machine learning models to suit very low-power MCUs [141]. TinyML primarily differs from cloud
machine learning (where compute-intensive models are implemented using high-end
computers in large datacenters like those of Facebook [142]) and from mobile machine learning in
its very low power consumption (about 0.1 W on average), as shown in Table 15. TinyML creates
a platform whereby machine learning models are pushed to user devices to deliver a good
user experience for diverse applications, and it has advantages such as energy efficiency,
reduced costs, data security, and low latency, which are major concerns in contemporary
cloud computing technology [141]. Colby et al. [143] presented a survey where neural
network architectures (MicroNets) target commodity microcontroller units. The authors
efficiently ported MicroNets to MCUs using the TensorFlow Lite Micro platform. There
are different platforms developed to easily port ML algorithms to resource-constrained
environments. Table 16 presents a list of available TinyML frameworks commonly adopted
to push ML models into different compatible resource-limited devices.
Computing Technology | Platform | Architecture | Memory | Storage | Power | Ref.
CloudML | Nvidia V100 | GPU Nvidia Volta™ | 16 GB | TBs-PBs ¹ | 250 W | [144]
CloudML | Nvidia Titan RTX | GPU Nvidia Turing™ | 24 GB | TBs-PBs ¹ | 280 W | [145]
CloudML | Nvidia V100S | GPU Nvidia Volta™ | 32 GB | TBs-PBs ¹ | 250 W | [144]
TinyML | ST F446RE | Arm M4 | 128 KB | 0.5 MB | 0.1 W | [146]
TinyML | ST F746ZG | Arm M7 | 320 KB | 1 MB | 0.3 W | [147]
TinyML | ST F767ZI | Arm M7 | 512 KB | 2 MB | 0.3 W | [147]
¹ Terabytes to Petabytes.
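To make the storage figures in the table concrete, the following Python sketch checks whether a model's weights fit into the storage of each listed MCU at a given bit width. The storage sizes are taken from the table (interpreting 1 MB as 1024 × 1024 bytes, which is an assumption); the 300,000-parameter model and the function names are hypothetical, and the check ignores the space needed for application code and the inference runtime.

```python
# Storage of the TinyML platforms in the table above, in bytes.
STORAGE_BYTES = {
    "ST F446RE": 512 * 1024,        # 0.5 MB
    "ST F746ZG": 1024 * 1024,       # 1 MB
    "ST F767ZI": 2 * 1024 * 1024,   # 2 MB
}


def model_size_bytes(num_params, bits_per_param):
    """Bytes needed to store the weights, rounded up to a whole byte."""
    return (num_params * bits_per_param + 7) // 8


def fits_in_storage(num_params, bits_per_param, platform):
    """True if the weights alone fit into the platform's storage."""
    return model_size_bytes(num_params, bits_per_param) <= STORAGE_BYTES[platform]


# Hypothetical model with 300,000 weights, stored as float32 or as int8.
NUM_PARAMS = 300_000
float32_fits_small = fits_in_storage(NUM_PARAMS, 32, "ST F446RE")
int8_fits_small = fits_in_storage(NUM_PARAMS, 8, "ST F446RE")
float32_fits_large = fits_in_storage(NUM_PARAMS, 32, "ST F767ZI")
```

In this example the float32 weights (1.2 MB) only fit the largest device, while the 8-bit version (0.3 MB) fits all three, which is the kind of gain that motivates compression before deployment.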
to perform different operations in an ASIC. It can be observed from Table 17 that the
amount of energy required to fetch data from the SRAM is much less than when fetching
data from the off-chip DRAM and very minimal if the computation is done at the register
files. From this insight, we can conclude that computation should be done as close to the
processor as possible to save energy. However, this is a bottleneck because the standard
size of available on-chip memory in embedded architectures is very low compared to the
size of deep learning models [124]. Algorithmic-based optimization techniques for model
compression such as parameter pruning, sparsity, and quantization may be applied to
address this challenge [150]. Also, hardware design-based optimizations such as Tiling
and data reuse may be utilized [25]. The next section expatiates some of these optimization
methods in further detail. Furthermore, most machine-learning models, especially deep
learning models, require huge amounts of multiply and accumulate (MAC) operations
for effective training and inference. Figure 3 describes the power consumed by the MAC
unit as a function of the bit precision adopted by the system. We may observe that the
higher the number of bits, the higher the power consumed. Thus, to reduce the power
consumed during computation, reduced bit precision arithmetic and data quantization
may be utilized [151].
Table 17. Energy Consumption in (pJ) of performing operations.
Operation | Energy (pJ)
8 bit int ADD | 0.03
16 bit int ADD | 0.05
32 bit int ADD | 0.1
16 bit float ADD | 0.4
32 bit float ADD | 0.9
8 bit MULT | 0.2
32 bit MULT | 3.1
16 bit float MULT | 1.1
32 bit float MULT | 3.7
32 bit SRAM READ | 5.0
32 bit DRAM READ | 640
Source: Bill Dally, Cadence Embedded Neural Network Summit, 1 February 2017.
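The figures in Table 17 can be combined into a rough back-of-the-envelope energy estimate. The Python sketch below uses a subset of the table's values; the cost model (one multiply, one add, and one 32-bit operand read per MAC, with no data reuse) and the function names are simplifying assumptions for illustration only.

```python
# Energy per operation in picojoules, copied from Table 17.
ENERGY_PJ = {
    "8 bit int ADD": 0.03,
    "32 bit float ADD": 0.9,
    "8 bit MULT": 0.2,
    "32 bit float MULT": 3.7,
    "32 bit SRAM READ": 5.0,
    "32 bit DRAM READ": 640.0,
}


def mac_energy_pj(mult_op, add_op):
    """Energy of one multiply-accumulate: one multiply plus one add."""
    return ENERGY_PJ[mult_op] + ENERGY_PJ[add_op]


def layer_energy_uj(num_macs, mult_op, add_op, read_op):
    """Energy in microjoules, assuming one operand read per MAC (no reuse)."""
    per_mac_pj = mac_energy_pj(mult_op, add_op) + ENERGY_PJ[read_op]
    return num_macs * per_mac_pj * 1e-6  # 1 pJ = 1e-6 microjoules


int8_mac = mac_energy_pj("8 bit MULT", "8 bit int ADD")
float32_mac = mac_energy_pj("32 bit float MULT", "32 bit float ADD")
dram_vs_sram = ENERGY_PJ["32 bit DRAM READ"] / ENERGY_PJ["32 bit SRAM READ"]

# One million MACs: 8-bit arithmetic from SRAM versus float32 arithmetic from DRAM.
on_chip_int8_uj = layer_energy_uj(1_000_000, "8 bit MULT", "8 bit int ADD", "32 bit SRAM READ")
off_chip_float_uj = layer_energy_uj(
    1_000_000, "32 bit float MULT", "32 bit float ADD", "32 bit DRAM READ"
)
```

Under these assumptions a float32 MAC costs about 20 times the energy of an 8-bit MAC, and a DRAM read costs 128 times an SRAM read, so memory traffic dominates the budget unless data is kept on-chip.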
where value is the floating-point value, sign is the sign bit, mantissa is the mantissa bit.
$$-2^{QI-1} \le \alpha \le 2^{QI-1} - 2^{-QF}, \qquad \varepsilon = 2^{-QF} \qquad (13)$$
where α represents the input integer, QI is the number of integer bits, QF is the number of
fractional bits, and ε is the resolution of the fixed-point number.
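The range and resolution in Equation (13) translate directly into a small fixed-point quantizer. The Python sketch below is illustrative only: the function names are hypothetical, and rounding to the nearest step followed by saturation at the range limits is one possible policy, not one prescribed by the text.

```python
def fixed_point_range(qi, qf):
    """Return (minimum, maximum, resolution) for QI integer and QF fractional bits."""
    resolution = 2.0 ** -qf
    minimum = -(2.0 ** (qi - 1))
    maximum = 2.0 ** (qi - 1) - resolution
    return minimum, maximum, resolution


def to_fixed_point(value, qi, qf):
    """Round value to the nearest representable step and saturate to the range."""
    minimum, maximum, resolution = fixed_point_range(qi, qf)
    rounded = round(value / resolution) * resolution
    return min(max(rounded, minimum), maximum)


# Q4.4 format: 4 integer bits (including sign) and 4 fractional bits.
q44_range = fixed_point_range(4, 4)
pi_q44 = to_fixed_point(3.14159, 4, 4)
saturated = to_fixed_point(100.0, 4, 4)
```

With only 8 bits in total, the value 3.14159 is stored as 3.125, showing the accuracy cost of the reduced resolution, and out-of-range values saturate at the limits rather than wrapping around.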
where X represents the Posit value, rvalue represents the number regime and useed repre-
sents the scale factor.
4.5.6. Quantization
SVM and DNN model parameters are often represented using 32-bit floating-point
values [92,158], which are highly computationally and memory intensive. However, re-
search shows that these models can be implemented efficiently using low precision pa-
rameters (8-bit or 16-bit) with minimal accuracy loss [113,169]. Quantization describes
techniques aimed at reducing the bit width of the weights and activations of a machine-
learning model to reduce the memory storage and communication overhead required for
computation. This process thereby reduces the bandwidth required for communication,
overall power consumption, area, and circuitry required to implement the design. Many
research works have considered different quantization techniques for deep learning models.
Courbariaux et al. [170] consider training a deep model using a binary representation
(+1 and −1) of the model parameters, obtained with the binarization function given in
Equation (15). Also, [166] proposes a quantization scheme using ternary values (+1, 0, −1).
The proposed scheme is given in Equations (16) and (17). Other quantization techniques involving
Bayesian quantization, weighted entropy-based quantization, vector quantization, and
two-bit networks are adopted in [50], [51], [158], and [171], respectively. Although quantization
techniques increase execution speed, the quantized model often requires fine-tuning to limit
accuracy loss [172,173]:
$$x^{b} = \mathrm{Sign}(x) = \begin{cases} +1 & \text{if } x \ge 0, \\ -1 & \text{otherwise} \end{cases} \qquad (15)$$
where $x^{b}$ is the binarized variable (weights and activations) and x is the real-valued variable.
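The binarization function of Equation (15) and its storage benefit can be sketched as follows. The bit-packing helper and all names are hypothetical additions for illustration; they are not part of the scheme in [170].

```python
def binarize(x):
    """Sign function of Equation (15): +1 if x >= 0, otherwise -1."""
    return 1 if x >= 0 else -1


def binarize_all(values):
    """Binarize a list of real-valued weights or activations."""
    return [binarize(v) for v in values]


def pack_bits(signs):
    """Pack +1/-1 values into one integer, one bit per value (+1 -> 1, -1 -> 0)."""
    packed = 0
    for position, sign in enumerate(signs):
        if sign == 1:
            packed |= 1 << position
    return packed


real_weights = [0.7, -0.2, 0.0, -1.5]
binary_weights = binarize_all(real_weights)
packed_weights = pack_bits(binary_weights)
```

Four float32 weights occupy 128 bits, whereas their binarized form fits in 4 bits, a 32-fold reduction in storage. Note that zero maps to +1 under Equation (15).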
$$w_{ij,\mathrm{new}} = w_{ij} - \alpha \frac{\partial E}{\partial w_{ij}} = w_{ij} - \alpha \, \delta_i \, y_j \qquad (16)$$
where $w_{ij,\mathrm{new}}$ is the new ternarized weight, α is the learning rate, E is the output error, $y_j$ is the
output signal, and $\delta_i$ is the error signal.
where ∅( x ) are the new ternarized activation functions and y is the output signal.
Table 20. Deep learning training in general purpose graphic processing units (GPGPUs).
systems are systems with more than one type of processor core. Most heterogeneous
computing systems are used as acceleration units for offloading computationally intensive
operations from the CPU, thereby increasing the system’s overall execution speed. Table 21
presents an area of application of deep learning training in heterogeneous computing
systems. A critical drawback in heterogeneous computing systems pivots around the
sharing of memory resources, data bus, etc. If designed inefficiently, it can result in data
traffic and thus increase latency and power consumption.
accuracy concerns, the quantized parameters of the model may be retrained and fine-tuned
to restore prediction confidence.
Lesson five: To tackle latency concerns that result from off-chip memory transfers,
optimizations may be carried out such that model parameters are cached on-chip for
data reuse. This optimization can be done using techniques such as tiling or simple vector
decomposition, where input data is partitioned into blocks or tiles that fit into
on-chip memory and can be reused for computation when required. This technique avoids
frequent off-chip memory transfers, which are a major concern for both latency and power
consumption. Hardware acceleration units may be designed to integrate a tiling unit
to carry out this operation at the hardware level. Other techniques for achieving high
throughput include pipelining, on-chip buffer optimization, and data access optimizations.
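The tiling idea in lesson five can be illustrated with a blocked matrix multiplication. This Python sketch only shows the loop structure: the tile size and function name are hypothetical, and in real hardware each tile would be copied into an on-chip buffer before the inner loops run, which plain Python cannot express.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply matrices a (n x m) and b (m x p) one tile at a time."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):            # tile of output rows
        for j0 in range(0, p, tile):        # tile of output columns
            for k0 in range(0, m, tile):    # tile of the shared dimension
                # Only tile x tile blocks of a, b, and c are touched here,
                # so they could stay resident in a small on-chip memory.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, p)):
                        acc = c[i][j]
                        for k in range(k0, min(k0 + tile, m)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


left = [[1, 2, 3], [4, 5, 6]]
right = [[7, 8], [9, 10], [11, 12]]
product = matmul_tiled(left, right, tile=2)
```

The result is identical to an untiled multiplication; only the order of memory accesses changes, so each loaded block is reused several times before it is evicted.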
Lesson six: Although hardware acceleration using custom FPGA logic, GPUs, or CPUs
addresses compute power demands, a most promising solution is to develop
application-specific architectures using ASICs. Every processor architecture has its pros
and cons: FPGAs are energy efficient and reconfigurable but slow and hard to program;
GPUs offer high performance but are power-hungry; and general-purpose CPU architectures
are flexible but slow with ML computations. Of all these processor architectures, ASICs
possess the best performance in terms of energy efficiency because they are hardwired
designs that target a specific application. They consume very low power and incur very
low costs too. They, however, trade off flexibility for performance and have a long time
to market. ASICs are thus gaining renewed interest in the design and development of
application-specific machine learning architectures, with the Google TPU being a
successful case study.
7. Conclusions
Machine learning models are fast proliferating in embedded devices with limited
computational power and memory space. These machine learning models are compute and
memory intensive and thus, face the critical limitation of available hardware resources in
embedded and mobile devices. In this paper, optimization techniques and various applica-
tions of machine learning algorithms within resource-limited environments are presented.
We first survey the embedded machine learning space to determine the common machine
learning algorithms adopted and select key compute and memory-intensive models such as
HMMs, k-NNs, SVMs, GMMs, and DNNs. We survey specialized optimization techniques
commonly adopted to squeeze these algorithms within resource-limited environments.
Also, we present different hardware platforms such as microcontroller units, mobile devices,
accelerators, and even TinyML frameworks, which are used to port these algorithms to
resource-limited MCUs. Furthermore, we survey the challenges encountered in embedded
machine learning and present a more detailed exposition on certain hardware-oriented and
algorithm-oriented optimization schemes to address these bottlenecks. Additionally, a
closer look is given to different hardware and algorithm-based optimization techniques,
including model pruning, data quantization, reduced precision, tiling, and others to
determine which optimization technique best suits the different ML algorithms. Interesting and
viable application areas, open research issues, and key take-away lessons are presented in
this intersection of embedded systems and machine learning. Finally, this survey
attempts to create awareness and to help the interested researcher begin
exploring the promising landscape of embedded machine learning.
Author Contributions: T.S.A. and A.L.I. were responsible for the Conceptualization of the topic;
Article gathering and sorting were done by T.S.A. and A.L.I.; Manuscript writing and original drafting
and formal analysis were carried out by T.S.A. and A.L.I.; Writing of reviews and editing was done
by A.L.I. and A.A.A.; A.L.I. led the overall research activity. All authors have read and agreed to the
published version of the manuscript.
Funding: Agbotiname Lucky Imoize is supported by the Nigerian Petroleum Technology Devel-
opment Fund (PTDF) and the German Academic Exchange Service (DAAD) through the Nigerian-
German Postgraduate Program under Grant 57473408.
Institutional Review Board Statement: This article does not contain any studies with human partic-
ipants or animals performed by any of the authors.
Informed Consent Statement: Not applicable.
Data Availability Statement: Data sharing does not apply to this article.
Acknowledgments: This work was carried out in collaboration with the IoT-enabled Smart and
Connected Communities (SmartCU) Research Cluster of Covenant University. The Article Processing
Charges is sponsored by Covenant University Centre for Research, Innovation, and Development
(CUCRID), Covenant University, Ota, Nigeria.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
3. Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf.
Process. Syst. 2012, 25, 1097–1105.
4. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions.
In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June
2015; pp. 1–9. [CrossRef]
5. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Computer Society
Conference Computer Vision Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [CrossRef]
6. Real, E.; Moore, S.; Selle, A.; Saxena, S.; Suematsu, Y.L.; Tan, J.; Le, Q.V.; Kurakin, A. Large-scale evolution of image classifiers. In
Proceedings of the 34th International Conference Machine Learning ICML, Sydney, Australia, 6–11 August 2017; pp. 4429–4446.
7. Tan, M.; Le, Q.V. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th
International Conference Machine Learning ICML 2019, Long Beach, CA, USA, 10–15 June 2019; pp. 10691–10700.
8. Hinton, G.; Deng, L.; Yu, D.; Dahl, G.E.; Mohamed, A.R.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.N.; et al.
Deep neural networks for acoustic modeling in speech recognition. IEEE Signal. Process. Mag. 2012, 29, 82–97. [CrossRef]
9. Chan, W.; Jaitly, N.; Le, Q.V.; Vinyals, O. Listen, attend and spell. In Proceedings of the 2016 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016.
10. Wu, Y.; Schuster, M.; Chen, Z.; Le, Q.V.; Norouzi, M.; Macherey, W.; Krikun, M.; Cao, Y.; Gao, Q.; Macherey, K.; et al. Google’s
neural machine translation system: Bridging the Gap between human and machine translation. arXiv 2016, arXiv:1609.08144.
11. Collobert, R.; Weston, J.; Bottou, L.; Karlen, M.; Kavukcuoglu, K.; Kuksa, P. Natural language processing (almost) from scratch.
J. Mach. Learn. Res. 2011, 12, 2493–2537.
12. Haj, R.B.; Orfanidis, C. A discreet wearable long-range emergency system based on embedded machine learning. In Proceedings
of the 2021 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops),
Kassel, Germany, 22–26 March 2021.
13. Dean, J. The deep learning revolution and its implications for computer architecture and chip design. In Proceedings of the 2020
IEEE International Solid-State Circuits Conference-(ISSCC), San Francisco, CA, USA, 16–20 February 2020; pp. 8–14. [CrossRef]
14. Cui, X.; Liu, H.; Fan, M.; Ai, B.; Ma, D.; Yang, F. Seafloor habitat mapping using multibeam bathymetric and backscatter intensity
multi-features SVM classification framework. Appl. Acoust. 2020, 174, 107728. [CrossRef]
15. Khan, M.A.; Kim, J. Toward developing efficient Conv-AE-based intrusion detection system using heterogeneous dataset.
Electronics 2020, 9, 1771. [CrossRef]
16. Li, P.; Luo, Y.; Zhang, N.; Cao, Y. HeteroSpark: A heterogeneous CPU/GPU spark platform for machine learning algorithms. In
Proceedings of the 2015 IEEE International Conference Networking, Architecture Storage, NAS, Boston, MA, USA, 6–7 August
2015; pp. 347–348. [CrossRef]
17. Raparti, V.Y.; Pasricha, S. RAPID: Memory-aware NoC for latency optimized GPGPU architectures. IEEE Trans. Multi-Scale
Comput. Syst. 2018, 4, 874–887. [CrossRef]
18. Cheng, X.; Zhao, Y.; Robaei, M.; Jiang, B.; Zhao, H.; Fang, J. A low-cost and energy-efficient noc architecture for GPGPUs. J. Nat.
Gas Geosci. 2019, 4, 1–28. [CrossRef]
19. Zhang, L.; Cheng, X.; Zhao, H.; Mohanty, S.P.; Fang, J. Exploration of system configuration in effective training of CNNs on
GPGPUs. In Proceedings of the 2019 IEEE International Conference Consumer Electronics ICCE, Las Vegas, NV, USA, 11 January
2019; pp. 1–4. [CrossRef]
20. Yu, Q.; Wang, C.; Ma, X.; Li, X.; Zhou, X. A deep learning prediction process accelerator based FPGA. In Proceedings of the
2015 IEEE/ACM 15th International Symposium Cluster Cloud, Grid Computer CCGrid 2015, Shenzhen, China, 4–7 May 2015;
pp. 1159–1162. [CrossRef]
21. Noronha, D.H.; Zhao, R.; Goeders, J.; Luk, W.; Wilton, S.J.E. On-chip FPGA debug instrumentation for machine learning
applications. In Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Seaside,
CA, USA, 24–26 February 2019. [CrossRef]
22. Wang, C.; Gong, L.; Yu, Q.; Li, X.; Xie, Y.; Zhou, X. DLAU: A scalable deep learning accelerator unit on FPGA. IEEE Trans. Comput.
Des. Integr. Circuits Syst. 2016, 36, 513–517. [CrossRef]
23. Chang, A.X.M.; Martini, B.; Culurciello, E. Recurrent Neural Networks Hardware Implementation on FPGA. Available online:
http://arxiv.org/abs/1511.05552 (accessed on 15 January 2021).
24. Branco, S.; Ferreira, A.G.; Cabral, J. Machine learning in resource-scarce embedded systems, FPGAs, and end-devices: A survey.
Electronics 2019, 8, 1289. [CrossRef]
25. Zhang, C.; Li, P.; Sun, G.; Guan, Y.; Xiao, B.; Cong, J. Optimizing FPGA-based accelerator design for deep convolutional neural
networks. In Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey,
CA, USA, 22–24 February 2015; pp. 161–170. [CrossRef]
26. Neshatpour, K.; Mokrani, H.M.; Sasan, A.; Ghasemzadeh, H.; Rafatirad, S.; Homayoun, H. Architectural considerations for FPGA
acceleration of machine learning applications in MapReduce. In Proceedings of the 18th International Conference on Embedded
Computer Systems: Architectures, Modeling, and Simulation, Pythagorion, Greece, 15–19 July 2018. [CrossRef]
27. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level Accuracy With 50×
Fewer Parameters and <0.5 mb Model Size. Available online: http://arxiv.org/abs/1602.07360 (accessed on 15 February 2021).
Sensors 2021, 21, 4412 38 of 44
28. Deng, Y. Deep learning on mobile devices: A review. In Proceedings of the SPIE 10993, Mobile Multimedia/Image Processing,
Security, and Applications 2019, 109930A, Baltimore, ML, USA, 14–18 April 2019. [CrossRef]
29. Kim, D.; Ahn, J.; Yoo, S. A novel zero weight/activation-aware hardware architecture of convolutional neural network. In
Proceedings of the 2017 Design, Automation and Test in Europe DATE 2017, Lausanne, Switzerland, 27–31 March 2017;
pp. 1462–1467. [CrossRef]
30. Lecun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [CrossRef] [PubMed]
31. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117. [CrossRef]
32. Jawandhiya, P. Hardware design for machine learning. Int. J. Artif. Intell. Appl. 2018, 9, 1–6. [CrossRef]
33. Chen, J.; Ran, X. Deep learning with edge computing: A review. Proc. IEEE 2019, 107, 1655–1674. [CrossRef]
34. Frank, M.; Drikakis, D.; Charissis, V. Machine-learning methods for computational science and engineering. Computation
2020, 8, 15. [CrossRef]
35. Xiong, Z.; Zhang, Y.; Niyato, D.; Deng, R.; Wang, P.; Wang, L.C. Deep reinforcement learning for mobile 5G and beyond:
Fundamentals, applications, and challenges. IEEE Veh. Technol. Mag. 2019, 14, 44–52. [CrossRef]
36. Carbonell, J.G. Machine learning research. ACM SIGART Bull. 1981, 18, 29. [CrossRef]
37. Jadhav, S.D.; Channe, H.P. Comparative STUDY of K-NN, naive bayes and decision tree classification techniques. Int. J. Sci. Res.
2016, 5, 1842–1845.
38. Chapter 4 Logistic Regression as a Classifier. Available online: https://www.cs.cmu.edu/~{}kdeng/thesis/logistic.pdf (accessed
on 29 December 2020).
39. Salvadori, C.; Petracca, M.; del Rincon, J.M.; Velastin, S.A.; Makris, D. An optimisation of Gaussian mixture models for integer
processing units. J. Real Time Image Process. 2017, 13, 273–289. [CrossRef]
40. Das, A.; Borisov, N.; Caesar, M. Do you hear what i hear? Fingerprinting smart devices through embedded acoustic components.
In Proceedings of the ACM Conference on Computer, Communication and Security, Scottsdale, AZ, USA, 3–7 November 2014;
pp. 441–452. [CrossRef]
41. Bojinov, H.; Michalevsky, Y.; Nakibly, G.; Boneh, D. Mobile Device Identification via Sensor Fingerprinting. Available online:
http://arxiv.org/abs/1408.1416 (accessed on 12 January 2021).
42. Huynh, M.; Nguyen, P.; Gruteser, M.; Vu, T. Mobile device identification by leveraging built-in capacitive signature. In Proceedings
of the ACM Conference on Compututer, Communication and Security, Denver, CO, USA, 12–16 October 2015; pp. 1635–1637.
[CrossRef]
43. Dhar, S.; Sreeraj, K.P. FPGA implementation of feature extraction based on histopathalogical image and subsequent classification
by support vector machine. IJISET Int. J. Innov. Sci. Eng. Technol. 2015, 2, 744–749.
44. Yu, L.; Ukidave, Y.; Kaeli, D. GPU-accelerated HMM for speech recognition. In Proceedings of the International Conference
Parallel Processing Work, Minneapolis, MN, USA, 9–12 September 2014; pp. 395–402. [CrossRef]
45. Zubair, M.; Yoon, C.; Kim, H.; Kim, J.; Kim, J. Smart wearable band for stress detection. In Proceedings of the 2015 5th International
Conference IT Converg. Secur. ICITCS, Kuala Lumpur, Malaysia, 24–27 August 2015; pp. 1–4. [CrossRef]
46. Razavi, A.; Valkama, M.; Lohan, E.S. K-means fingerprint clustering for low-complexity floor estimation in indoor mobile
localization. In Proceedings of the 2015 IEEE Globecom Work. GC Wkshps 2015, San Diego, CA, USA, 6–10 December 2015.
[CrossRef]
47. Bhide, V.H.; Wagh, S. I-learning IoT: An intelligent self learning system for home automation using IoT. In Proceedings of the 2015
International Conference Communication Signalling Process. ICCSP 2015, Melmaruvathur, India, 2–4 April 2015; pp. 1763–1767.
[CrossRef]
48. Munisami, T.; Ramsurn, M.; Kishnah, S.; Pudaruth, S. Plant Leaf recognition using shape features and colour histogram with
K-nearest neighbour classifiers. Proc. Comput. Sci. 2015, 58, 740–747. [CrossRef]
49. Sowjanya, K.; Singhal, A.; Choudhary, C. MobDBTest: A machine learning based system for predicting diabetes risk using mobile
devices. In Proceedings of the Souvenir 2015 IEEE Int. Adv. Comput. Conference IACC 2015, Banglore, India, 12–13 June 2015;
pp. 397–402. [CrossRef]
50. Lee, J.; Stanley, M.; Spanias, A.; Tepedelenlioglu, C. Integrating machine learning in embedded sensor systems for Internet-of-
Things applications. In Proceedings of the 2016 IEEE International Symposium on Signal Processing and Information Technology
(ISSPIT), Limassol, Cyprus, 12–14 December 2016; pp. 290–294. [CrossRef]
51. Qiu, J.; Wang, J.; Yao, S.; Guo, K.; Li, B.; Zhou, E.; Yu, J.; Tang, T.; Xu, N.; Song, S.; et al. Going deeper with embedded
FPGA platform for convolutional neural network. In Proceedings of the FPGA 2016ACM/SIGDA International Symposium
Field-Programmable Gate Arrays, Monterey, CA, USA, 21–23 February 2016; pp. 26–35. [CrossRef]
52. Huynh, L.N.; Balan, R.K.; Lee, Y. DeepSense: A GPU-based deep convolutional neural network framework on commodity mobile
devices. In Proceedings of the Workshop on Wearable Systems and Application Co-Located with MobiSys 2016, Singapore,
30 June 2016; pp. 25–30. [CrossRef]
53. Tuama, A.; Comby, F.; Chaumont, M. Camera model identification based machine learning approach with high order statistics
features. In Proceedings of the 24th European Signal Processing Conference (EUSIPCO), Budapest, Hungary, 29 August–2
September 2016; pp. 1183–1187. [CrossRef]
54. Kurtz, A.; Gascon, H.; Becker, T.; Rieck, K.; Freiling, F. Fingerprinting Mobile Devices Using Personalized Configurations. Proc.
Priv. Enhanc. Technol. 2016, 1, 4–19. [CrossRef]
Sensors 2021, 21, 4412 39 of 44
55. Mohsin, M.A.; Perera, D.G. An FPGA-based hardware accelerator for k-nearest neighbor classification for machine learning on
mobile devices. In Proceedings of the ACM International Conference Proceeding Series, HEART 2018, Toronto, ON, Canada,
20–22 June 2018; pp. 6–12. [CrossRef]
56. Patil, S.S.; Thorat, S.A. Early detection of grapes diseases using machine learning and IoT. In Proceedings of the 2016 Second
International Conference on Cognitive Computing and Information Processing (CCIP), Mysuru, India, 12–13 August 2016.
[CrossRef]
57. Ollander, S.; Godin, C.; Campagne, A.; Charbonnier, S. A comparison of wearable and stationary sensors for stress detection. In
Proceedings of the IEEE International Conference System Man, and Cybernetic SMC 2016, Budapest, Hungary, 9–12 October
2016; pp. 4362–4366. [CrossRef]
58. Moreira, M.W.L.; Rodrigues, J.J.P.C.; Oliveira, A.M.B.; Saleem, K. Smart mobile system for pregnancy care using body sensors. In
Proceedings of the International Conference Sel. Top. Mob. Wirel. Networking, MoWNeT 2016, Cairo Egypt, 11–13 April 2016;
pp. 1–4. [CrossRef]
59. Shapsough, S.; Hesham, A.; Elkhorazaty, Y.; Zualkernan, I.A.; Aloul, F. Emotion recognition using mobile phones. In Proceedings
of the 2016 IEEE 18th International Conference on e-Health Networking, Applications and Services (Healthcom), Munich,
Germany, 14–16 September 2016; pp. 276–281. [CrossRef]
60. Hakim, A.; Huq, M.S.; Shanta, S.; Ibrahim, B.S.K.K. Smartphone based data mining for fall detection: Analysis and design. Proc.
Comput. Sci. 2016, 105, 46–51. [CrossRef]
61. Ronao, C.A.; Cho, S.B. Recognizing human activities from smartphone sensors using hierarchical continuous hidden Markov
models. Int. J. Distrib. Sens. Netw. 2017, 13, 1–16. [CrossRef]
62. Kodali, S.; Hansen, P.; Mulholland, N.; Whatmough, P.; Brooks, D.; Wei, G.Y. Applications of deep neural networks for ultra
low power IoT. In Proceedings of the 35th IEEE International Conference on Computer Design ICCD 2017, Boston, MA, USA,
5–8 November 2017; pp. 589–592. [CrossRef]
63. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An extremely efficient convolution neural network for mobile devices. In Pro-
ceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018;
pp. 6848–6856. [CrossRef]
64. Baldini, G.; Dimc, F.; Kamnik, R.; Steri, G.; Giuliani, R.; Gentile, C. Identification of mobile phones using the built-in magnetome-
ters stimulated by motion patterns. Sensors 2017, 17, 783. [CrossRef] [PubMed]
65. Azimi, I.; Anzanpour, A.; Rahmani, A.M.; Pahikkala, T.; Levorato, M.; Liljeberg, P.; Dutt, N. HiCH: Hierarchical fog-assisted
computing architecture for healthcare IoT. ACM Trans. Embed. Comput. Syst. 2017, 16, 1–20. [CrossRef]
66. Pandey, P.S. Machine Learning and IoT for prediction and detection of stress. In Proceedings of the 17th International Conference
on Computational Science and Its Applications ICCSA 2017, Trieste, Italy, 3–6 July 2017. [CrossRef]
67. Sneha, H.R.; Rafi, M.; Kumar, M.V.M.; Thomas, L.; Annappa, B. Smartphone based emotion recognition and classification. In
Proceedings of the 2nd IEEE International Conference on Electrical, Computer and Communication Technology ICECCT 2017,
Coimbatore, India, 22–24 February 2017. [CrossRef]
68. Al Mamun, M.A.; Puspo, J.A.; Das, A.K. An intelligent smartphone based approach using IoT for ensuring safe driving. In
Proceedings of the 2017 International Conference on Electrical Engineering and Computer Science (ICECOS), Palembang,
Indonesia, 22–23 August 2017; pp. 217–223. [CrossRef]
69. Neyja, M.; Mumtaz, S.; Huq, K.M.S.; Busari, S.A.; Rodriguez, J.; Zhou, Z. An IoT-based e-health monitoring system using ECG
signal. In Proceedings of the IEEE Global Communications Conference GLOBECOM 2017, Singapore, 4–8 December 2017; pp. 1–6.
[CrossRef]
70. Gupta, C.; Suggala, A.S.; Goyal, A.; Simhadri, H.V.; Paranjape, B.; Kumar, A.; Goyal, S.; Udupa, R.; Varma, M.; Jain, P. ProtoNN:
Compressed and accurate kNN for resource-scarce devices. In Proceedings of the 34th International Conference on Machine
Learning, Sydney, Australia, 6–11 August 2017; pp. 1331–1340.
71. Fafoutis, X.; Marchegiani, L.; Elsts, A.; Pope, J.; Piechocki, R.; Craddock, I. Extending the battery lifetime of wearable sensors with
embedded machine learning. In Proceedings of the IEEE World Forum on Internet Things, WF-IoT 2018, Singapore, 5–8 February
2018; pp. 269–274. [CrossRef]
72. Damljanovic, A.; Lanza-Gutierrez, J.M. An embedded cascade SVM approach for face detection in the IoT edge layer. In
Proceedings of the IECON 2018—44th Annual Conference of the IEEE Industrial Electronics Society, Washington, DC, USA,
21–23 October 2018; pp. 2809–2814. [CrossRef]
73. Hochstetler, J.; Padidela, R.; Chen, Q.; Yang, Q.; Fu, S. Embedded deep learning for vehicular edge computing. In Proceedings of
the 3rd ACM/IEEE Symposium on Edge Computing SEC 2018, Seattle, WA, USA, 25–27 October 2018; pp. 341–343. [CrossRef]
74. Taylor, B.; Marco, V.S.; Wolff, W.; Elkhatib, Y.; Wang, Z. Adaptive deep learning model selection on embedded systems. ACM
SIGPLAN Not. 2018, 53, 31–43. [CrossRef]
75. Strielkina, A.; Kharchenko, V.; Uzun, D. A markov model of healthcare internet of things system considering failures of
components. CEUR Workshop Proc. 2018, 2104, 530–543.
76. Vhaduri, S.; van Kessel, T.; Ko, B.; Wood, D.; Wang, S.; Brunschwiler, T. Nocturnal cough and snore detection in noisy
environments using smartphone-microphones. In Proceedings of the IEEE International Conference on Healthcare Informatics,
ICHI 2019, Xi’an, China, 10–13 June 2019. [CrossRef]
Sensors 2021, 21, 4412 40 of 44
77. Sattar, H.; Bajwa, I.S.; Amin, R.U.; Sarwar, N.; Jamil, N.; Malik, M.A.; Mahmood, A.; Shafi, U. An IoT-based intelligent wound
monitoring system. IEEE Access 2019, 7, 144500–144515. [CrossRef]
78. Mengistu, D.; Frisk, F. Edge machine learning for energy efficiency of resource constrained IoT devices. In Proceedings of the
Fifth International Conference on Smart Portable, Wearable, Implantable and Disabilityoriented Devices and Systems, SPWID
2019, Nice, France, 28 July–1 August 2019; pp. 9–14.
79. Wang, S.; Tuor, T.; Salonidis, T.; Leung, K.K.; Makaya, C.; He, T.; Chan, K. Adaptive Federated Learning in Resource Constrained
Edge Computing Systems. IEEE J. Sel. Areas Commun. 2019, 37, 1205–1221. [CrossRef]
80. Suresh, P.; Fernandez, S.G.; Vidyasagar, S.; Kalyanasundaram, V.; Vijayakumar, K.; Archana, V.; Chatterjee, S. Reduction of
transients in switches using embedded machine learning. Int. J. Power Electron. Drive Syst. 2020, 11, 235–241. [CrossRef]
81. Giri, D.; Chiu, K.L.; di Guglielmo, G.; Mantovani, P.; Carloni, L.P. ESP4ML: Platform-based design of systems-on-chip for
embedded machine learning. In Proceedings of the 2020 Design, Automation and Test in European Conference Exhibition DATE
2020, Grenoble, France, 9–13 March 2020; pp. 1049–1054. [CrossRef]
82. Tiku, S.; Pasricha, S.; Notaros, B.; Han, Q. A hidden markov model based smartphone heterogeneity resilient portable indoor
localization framework. J. Syst. Archit. 2020, 108, 101806. [CrossRef]
83. Mazlan, N.; Ramli, N.A.; Awalin, L.; Ismail, M.; Kassim, A.; Menon, A. A smart building energy management using internet of
things (IoT) and machine learning. Test. Eng. Manag. 2020, 83, 8083–8090.
84. Cornetta, G.; Touhafi, A. Design and evaluation of a new machine learning framework for iot and embedded devices. Electronics
2021, 10, 600. [CrossRef]
85. Rabiner, L.; Juang, B. An introduction to hidden Markov models. IEEE ASSP Mag. 1986, 3, 4–16. [CrossRef]
86. Degirmenci, A. Introduction to hidden markov models. Harv. Univ. 2014, 3, 1–5. Available online: http://scholar.harvard.edu/
files/adegirmenci/files/hmm_adegirmenci_2014.pdf (accessed on 10 October 2016).
87. Tóth, B.; Németh, G. Optimizing HMM speech synthesis for low-resource devices. J. Adv. Comput. Intell. Intell. Inform. 2012, 16,
327–334. [CrossRef]
88. Fu, R.; Zhao, Z.; Tu, Q. Reducing computational and memory cost for HMM-based embedded TTS system. Commun. Comput. Inf.
Sci. 2011, 224, 602–610. [CrossRef]
89. Baoli, L.; Shiwen, Y.; Qin, L. An improved K-nearest neighbor algorithm for text categorization. Dianzi Yu Xinxi Xuebao J. Electron.
Inf. Technol. 2005, 27, 487–491.
90. Norouzi, M.; Fleet, D.J.; Salakhutdinov, R. Hamming distance metric learning. Adv. Neural Inf. Process. Syst. 2012, 2, 1061–1069.
91. Saikia, J.; Yin, S.; Jiang, Z.; Seok, M.; Seo, J.S. K-nearest neighbor hardware accelerator using in-memory computing SRAM.
In Proceedings of the 2019 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), Lausanne,
Switzerland, 29–31 July 2019. [CrossRef]
92. Pedersen, R.; Schoeberl, M. An embedded support vector machine. In Proceedings of the 2006 International Workshop on
Intelligent Solutions in Embedded Systems, Vienna, Austria, 30 June 2006; pp. 79–89. [CrossRef]
93. You, Y.; Fu, H.; Song, S.L.; Randles, A.; Kerbyson, D.; Marquez, A.; Yang, G.; Hoisie, A. Scaling support vector machines on
modern HPC platforms. J. Parallel Distrib. Comput. 2015, 76, 16–31. [CrossRef]
94. Boni, A.; Pianegiani, F.; Petri, D. Low-power and low-cost implementation of SVMs for smart sensors. IEEE Trans. Instrum. Meas.
2007, 56, 39–44. [CrossRef]
95. Afifi, S.M.; Gholamhosseini, H.; Sinha, R. Hardware implementations of SVM on FPGA: A state-of-the-art review of current
practice. Int. J. Innov. Sci. Eng. Technol. 2015, 2, 733–752.
96. Zeng, Z.Q.; Yu, H.B.; Xu, H.R.; Xie, Y.Q.; Gao, J. Fast training support vector machines using parallel sequential minimal
optimization. In Proceedings of the 2008 3rd International Conference on Intelligent System and Knowledge Engineering, Xiamen,
China, 17–19 November 2008; pp. 997–1001. [CrossRef]
97. Anguita, D.; Ghio, A.; Oneto, L.; Parra, X.; Reyes-Ortiz, J.L. Human activity recognition on smartphones using a multiclass
hardware-friendly support vector machine. Lect. Notes Comput. Sci. 2012, 7657, 216–223. [CrossRef]
98. Kudo, T.; Matsumoto, Y. Chunking with support vector machines. In Proceedings of the Second Meeting of the North American
Chapter of the Association for Computational Linguistics 2001, Pittsburgh, PA, USA, 2–7 June 2001; pp. 1–8. [CrossRef]
99. Osuna, E.; Freund, R.; Girosi, F. Improved training algorithm for support vector machines. Neural Networks for Signal Processing
VII. In Proceedings of the 1997 IEEE Signal Processing Society Workshop, Amelia Island, FL, USA, 24–26 September 1997;
pp. 276–285. [CrossRef]
100. Lee, Y.J.; Mangasarian, O. RSVM: Reduced Support vector machines. In Proceedings of the Proceedings of the 2001 SIAM
International Conference on Data Mining, Chicago, IL, USA, 5–7 April 2001; pp. 1–17. [CrossRef]
101. Anguita, D.; Ghio, A.; Pischiutta, S.; Ridella, S. A hardware-friendly support vector machine for embedded automotive
applications. In Proceedings of the 2007 International Joint Conference on Neural Networks, Orlando, FL, USA, 12–17 August
2007; pp. 1360–1364. [CrossRef]
102. Anguita, D.; Bozza, G. The effect of quantization on support vector machines with Gaussian kernel. In Proceedings of the 2005
IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada, 31 July–4 August 2005. [CrossRef]
103. Khan, F.M.; Arnold, M.G.; Pottenger, W.M. Hardware-based support vector machine classification in logarithmic number systems.
In Proceedings of the 2005 IEEE International Symposium on Circuits and Systems, Kobe, Japan, 23–26 May 2005; pp. 5154–5157.
[CrossRef]
Sensors 2021, 21, 4412 41 of 44
104. Anguita, D.; Pischiutta, S.; Ridella, S.; Sterpi, D. Feed-forward support vector machine without multipliers. IEEE Trans. Neural
Netw. 2006, 17, 1328–1331. [CrossRef]
105. Reynolds, D. Gaussian mixture models. Encycl. Biometr. 2009, 741, 659–663. [CrossRef]
106. Gorur, P.; Amrutur, B. Speeded up Gaussian mixture model algorithm for background subtraction. In Proceedings of the
2011 8th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Klagenfurt, Austria,
30 August–2 September 2011; pp. 386–391. [CrossRef]
107. Shen, Y.; Hu, W.; Liu, J.; Yang, M.; Wei, B.; Chou, C.T. Efficient background subtraction for real-time tracking in embedded
camera networks. In Proceedings of the 10th ACM Conference on Embedded Networked Sensor System, Toronto, ON, Canada,
6–9 November 2012; pp. 295–308. [CrossRef]
108. Bottou, L. Stochastic Gradient Descent Tricks. In Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science; Montavon,
G., Orr, G.B., Müller, K.R., Eds.; Springer: Berlin/Heidelberg, Germany, 2012. [CrossRef]
109. Johnson, R.; Zhang, T. Accelerating stochastic gradient descent using predictive variance reduction. Adv. Neural Inf. Process. Syst.
2013, 1, 1–9.
110. Bottou, L. Stochastic gradient learning in neural networks, Proc. Neuro-Nımes 1991, 8, 1–12.
111. Li, L.; Zhang, S.; Wu, J. An efficient hardware architecture for activation function in deep learning processor. In Proceedings
of the 2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC), Chongqing, China, 27–29 June 2018;
pp. 911–918. [CrossRef]
112. Suda, N.; Chandra, V.; Dasika, G.; Mohanty, A.; Ma, Y.; Vrudhula, S.; Seo, J.S.; Cao, Y. Throughput-optimized OpenCL-based
FPGA Accelerator for large-scale convolutional neural networks. In Proceedings of the ACM/SIGDA International Symposium
on Field-Programmable Gate Arrays, Monterey, CA, USA, 21–23 February 2016; pp. 16–25. [CrossRef]
113. Learning, S.D. Smartphones devices. IEEE Pervasive Comput. 2017, 16, 82–88.
114. Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a convolutional neural network. In Proceedings of the 2017
International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017; pp. 1–6. [CrossRef]
115. O’Shea, K.; Nash, R. An Introduction to Convolutional Neural Networks. Available online: http://arxiv.org/abs/1511.08458
(accessed on 2 March 2021).
116. Lawrence, S.; Giles, L.; Tsoi, C.; Back, A. Face recognition: A convolutional neural-network approach. IEEE Trans. Neural Netw.
1997, 8, 98–112. [CrossRef]
117. Hochreiter, S.; Schmidhuber, J. Long Short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
118. Shah, S.; Haghi, B.; Kellis, S.; Bashford, L.; Kramer, D.; Lee, B.; Liu, C.; Andersen, R.; Emami, A. Decoding kinematics from
human parietal cortex using neural networks. In Proceedings of the 2019 9th International IEEE/EMBS Conference on Neural
Engineering (NER), San Francisco, CA, USA, 20–23 March 2019; pp. 1138–1141. [CrossRef]
119. Lee, D.; Lim, M.; Park, H.; Kang, Y.; Park, J.S.; Jang, G.J.; Kim, J.H. Long short-term memory recurrent neural network-based
acoustic model using connectionist temporal classification on a large-scale training corpus. Chin. Commun. 2017, 14, 23–31.
[CrossRef]
120. Yu, Y.; Si, X.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31,
1235–1270. [CrossRef] [PubMed]
121. Khan, M.A.; Karim, M.R.; Kim, Y. A two-stage big data analytics framework with real world applications using spark machine
learning and long short-term memory network. Symmetry 2018, 10, 485. [CrossRef]
122. Jouppi, N.P.; Young, C.; Patil, N.; Patterson, D. A domain-specific architecture for deep neural networks. Commun. ACM 2018, 61,
50–59. [CrossRef]
123. Zeiler, M.D.; Fergus, R. Visualizing and understanding convolutional networks. Lect. Notes Comput. Sci. 2014, 8689, 818–833.
[CrossRef]
124. Han, S.; Pool, J.; Tran, J.; Dally, W.J. Learning both weights and connections for efficient neural networks. In Proceedings of the
NIPS’15: Proceedings of the 28th International Conference on Neural Information Processing Systems; ACM: New York, NY, USA, 2015;
Volume 1, pp. 1135–1143.
125. Khoram, S.; Li, J. Adaptive quantization of neural networks. In Proceedings of the 6th International Conference on Learning
Representations (ICLR 2018), Vancouver, BC, Canada, 30 April–3 May 2018; pp. 1–13.
126. Al-Kofahi, M.M.; Al-Shorman, M.Y.; Al-Kofahi, O.M. Toward energy efficient microcontrollers and Internet-of-Things systems.
Comput. Electr. Eng. 2019, 79. [CrossRef]
127. Keras, A. Keras API Reference/Keras Applications. Available online: https://keras.io/api/applications/ (accessed on
14 March 2021).
128. Atmel. ATMEL—ATmega48P/88P/168P/328P. Available online: https://www.sparkfun.com/datasheets/Components/SMD/
ATMega328.pdf (accessed on 14 March 2021).
129. Atmel Corporation. ATMEL—ATmega640/V-1280/V-1281/V-2560/V-2561/V. Available online: https://ww1.microchip.com/
downloads/en/devicedoc/atmel-2549-8-bit-avr-microcontroller-atmega640-1280-1281-2560-2561_datasheet.pdf (accessed on
14 March 2021).
130. STMicroelectronics. STM32L073x8 STM32L073xB. Available online: https://www.st.com/resource/en/datasheet/stm32l073v8
.pdf (accessed on 15 March 2021).
Sensors 2021, 21, 4412 42 of 44
131. Atmel Corporation. 32-Bit ARM-Based Microcontrollers SAM D21E/SAM D21G/SAM D21J Summary. Available online:
www.microchip.com (accessed on 15 March 2021).
132. Atmel. SAM3X / SAM3A Series datasheet. Available online: http://www.atmel.com/Images/Atmel-11057-32-bit-Cortex-M3
-Microcontroller-SAM3X-SAM3A_Datasheet.pdf (accessed on 15 March 2021).
133. STMicroelectronics. STM32F215xx STM32F217xx. Available online: https://www.st.com/resource/en/datasheet/stm32f215re.
pdf (accessed on 15 March 2021).
134. STMicroelectronics. STM32F469xx. Available online: https://www.st.com/resource/en/datasheet/stm32f469ae.pdf (accessed
on 15 March 2021).
135. Raspberry Pi Dramble. Power Consumption Benchmarks. Available online: https://www.pidramble.com/wiki/benchmarks/
power-consumption (accessed on 15 March 2021).
136. The First Affordable RISC-V Computer Designed to Run Linux. Available online: https://www.seeedstudio.com/blog/2021/01/
13/meet-beaglev-the-first-affordable-risc-v-single-board-computer-designed-to-run-linux/ (accessed on 20 April 2021).
137. Lane, N.D.; Bhattacharya, S.; Georgiev, P.; Forlivesi, C.; Jiao, L.; Qendro, L.; Kawsar, F. DeepX: A Software accelerator for
low-power deep learning inference on mobile devices. In Proceedings of the 2016 15th ACM/IEEE International Conference on
Information Processing in Sensor Networks (IPSN), Vienna, Austria, 11–14 April 2016. [CrossRef]
138. Li, D.; Wang, X.; Kong, D. DeepRebirth: Accelerating deep neural network execution on mobile devices. In Proceedings of the
Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2017; pp. 2322–2330.
139. Ren, T.I.; Cavalcanti, G.D.C.; Gabriel, D.; Pinheiro, H.N.B. A Hybrid GMM Speaker Verification System for Mobile Devices in
Variable Environments. In Intelligent Data Engineering and Automated Learning—IDEAL 2012; Lecture Notes in Computer Science;
Yin, H., Costa, J.A.F., Barreto, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2012. [CrossRef]
140. Lei, X.; Senior, A.; Gruenstein, A.; Sorensen, J. Accurate and compact large vocabulary speech recognition on mobile devices. In
Proceedings of the Annual Conference of the International Speech Communication Association INTERSPEECH, Lyon, France,
25–29 August 2013; pp. 662–665.
141. Sanchez-Iborra, R.; Skarmeta, A.F. TinyML-enabled frugal smart objects: Challenges and opportunities. IEEE Circuits Syst. Mag.
2020, 20, 4–18. [CrossRef]
142. Park, J.; Naumov, M.; Basu, P.; Deng, S.; Kalaiah, A.; Khudia, D.; Law, J.; Malani, P.; Malevich, A.; Nadathur, S.; et al. Deep
learning inference in facebook data centers: Characterization, performance optimizations and hardware implications. arXiv 2018,
arXiv:1811.09886.
143. Banbury, C.; Zhou, C.; Fedorov, I.; Matas, R.; Thakker, U.; Gope, D.; Janapa Reddi, V.; Mattina, M.; Whatmough, P. MicroNets:
Neural network architectures for deploying TinyML Applications on commodity microcontrollers. In Proceedings of the 4th
MLSys Conference, San Jose, CA, USA, 4–7 April 2021. Available online: https://proceedings.mlsys.org/paper/2021/file/a3c6
5c2974270fd093ee8a9bf8ae7d0b-Paper.pdf (accessed on 20 April 2021).
144. NVIDIA. NVIDIA V100 Tensor Core GPU. Available online: https://www.nvidia.com/en-us/data-center/v100/ (accessed on
20 February 2021).
145. NVIDIA. The Ultimate PC GPU Nvidia Titan RTX. Available online: https://www.nvidia.com/content/dam/en-zz/Solutions/
titan/documents/titan-rtx-for-creators-us-nvidia-1011126-r6-web.pdf (accessed on 16 February 2021).
146. ST Microelectronics. STM32F745xx STM32F746xx Datasheet. Available online: http://www.st.com/content/ccc/resource/
technical/document/datasheet/96/ed/61/9b/e0/6c/45/0b/DM00166116.pdf/files/DM00166116.pdf/jcr:content/translations/
en.DM00166116.pdf (accessed on 22 January 2021).
147. ST Microelectronics Inc. STM32F765xx, STM32F767xx Datasheet. Available online: https://pdf1.alldatasheet.com/datasheet-
pdf/view/933989/STMICROELECTRONICS/STM32F767ZI.html (accessed on 17 January 2021).
148. Capra, M.; Bussolino, B.; Marchisio, A.; Shafique, M.; Masera, G.; Martina, M. An Updated survey of efficient hardware
architectures for accelerating deep convolutional neural networks. Future Internet 2020, 12, 113. [CrossRef]
149. Sun, S.; Cao, Z.; Zhu, H.; Zhao, J. A survey of optimization methods from a machine learning perspective. IEEE Trans. Cybern.
2020, 50, 3668–3681. [CrossRef] [PubMed]
150. Han, S.; Mao, H.; Dally, W.J. Deep compression: Compressing deep neural networks with pruning, trained quantization and
Huffman coding. In Proceedings of the 4th International Conference on Learning Representations, San Juan, Puerto Rico,
2–4 May 2016; pp. 1–14. Available online: https://arxiv.org/abs/1510.00149 (accessed on 17 January 2021).
151. Hubara, I.; Courbariaux, M.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Quantized neural networks: Training neural networks with low
precision weights and activations. J. Mach. Learn. Res. 2018, 18, 1–30.
152. Tanaka, K.; Arikawa, Y.; Ito, T.; Morita, K.; Nemoto, N.; Miura, F.; Terada, K.; Teramoto, J.; Sakamoto, T. Communication-
efficient distributed deep learning with GPU-FPGA heterogeneous computing. In Proceedings of the 2020 IEEE Symposium on
High-Performance Interconnects (HOTI), Piscataway, NJ, USA, 19–21 August 2020; pp. 43–46. [CrossRef]
153. Lane, N.; Bhattacharya, S.; Georgiev, P.; Forlivesi, C. Squeezing deep learning into mobile and embedded devices. IEEE Pervasive
Comput. 2017, 16, 82–88. [CrossRef]
154. Gysel, P. Ristretto: Hardware-Oriented Approximation of Convolutional Neural Networks. Available online: http://arxiv.org/
abs/1605.06402 (accessed on 20 February 2021).
Sensors 2021, 21, 4412 43 of 44
155. Moons, B.; Goetschalckx, K.; van Berckelaer, N.; Verhelst, M. Minimum energy quantized neural networks. In Proceed-
ings of the 2017 51st Asilomar Conference on Signals, Systems, and Computers ACSSC 2017, Pacific Grove, CA, USA,
29 October–1 November 2017; pp. 1921–1925. [CrossRef]
156. Xu, C.; Kirk, S.R.; Jenkins, S. Tiling for performance tuning on different models of GPUs. In Proceedings of the 2009 Second
International Symposium on Information Science and Engineering ISISE 2009, Shanghai, China, 26–28 December 2009; pp. 500–504.
[CrossRef]
157. Sun, F.; Li, X.; Wang, Q.; Tang, C. FPGA-based embedded system design. In Proceedings of the IEEE Asia-Pacific Conference
Circuits Systems APCCAS, Macao, China, 30 November–3 December 2008. [CrossRef]
158. Roth, W.; Schindler, G.; Zöhrer, M.; Pfeifenberger, L.; Peharz, R.; Tschiatschek, S.; Fröning, H.; Pernkopf, F.; Ghahramani, Z.
Resource-Efficient Neural Networks for Embedded Systems. Available online: http://arxiv.org/abs/2001.03048 (accessed on
27 March 2021).
159. Courbariaux, M.; Bengio, Y.; David, J.P. Low Precision Storage for Deep Learning. Available online: http://arxiv.org/abs/1511.0
0363%5Cnhttp://arxiv.org/abs/1412.7024 (accessed on 10 February 2021).
160. Courbariaux, M.; David, J.P.; Bengio, Y. Training deep neural networks with low precision multiplications. In Proceedings of
the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–10. Available online:
https://arxiv.org/abs/1412.7024 (accessed on 20 February 2021).
161. Tong, J.Y.F.; Nagle, D.; Rutenbar, R.A. Reducing power by optimizing the necessary precision/range of floating-point arithmetic.
IEEE Trans. Very Large Scale Integr. Syst. 2000, 8, 273–286. [CrossRef]
162. Tagliavini, G.; Mach, S.; Rossi, D.; Marongiu, A.; Benin, L. A transprecision floating-point platform for ultra-low power
computing. In Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany,
19–23 March 2018; pp. 151–1056. [CrossRef]
163. Langroudi, S.H.F.; Pandit, T.; Kudithipudi, D. Deep Learning inference on embedded devices: Fixed-point vs posit. In Proceedings
of the 2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2),
Williamsburg, VA, USA, 25–25 March 2018; pp. 19–23. [CrossRef]
164. Oberstar, E. Fixed-Point Representation & Fractional Math. Available online:
http://www.superkits.net/whitepapers/Fixed%20Point%20Representation%20&%20Fractional%20Math.pdf (accessed on 2 February 2021).
165. Yates, R. Fixed-point arithmetic: An introduction. Technical Reference. Available online:
https://courses.cs.washington.edu/courses/cse467/08au/labs/l5/fp.pdf (accessed on 15 February 2021).
166. Hwang, K.; Sung, W. Fixed-point feedforward deep neural network design using weights +1, 0, and −1. In Proceedings of the
2014 IEEE Workshop on Signal Processing Systems (SiPS), Belfast, UK, 20–22 October 2014. [CrossRef]
167. Gupta, S.; Agrawal, A.; Gopalakrishnan, K.; Narayanan, P. Deep learning with limited numerical precision. In Proceedings of the
32nd International Conference on Machine Learning ICML 2015, Lille, France, 6–11 July 2015; pp. 1737–1746.
168. Gustafson, J.L.; Yonemoto, I. Beating floating point at its own game: Posit arithmetic. Supercomput. Front. Innov. 2017, 4, 71–86.
169. Hammerstrom, D. A VLSI architecture for high-performance, low-cost, on-chip learning. In Proceedings of the IJCNN International
Joint Conference on Neural Networks, San Diego, CA, USA, 17–21 June 1990; pp. 537–544. [CrossRef]
170. Courbariaux, M.; Hubara, I.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized Neural Networks: Training Deep Neural Networks
with Weights and Activations Constrained to +1 or −1. Available online: http://arxiv.org/abs/1602.02830 (accessed on
22 January 2021).
171. Meng, W.; Gu, Z.; Zhang, M.; Wu, Z. Two-Bit Networks for Deep Learning on Resource-Constrained Embedded Devices.
Available online: http://arxiv.org/abs/1701.00485 (accessed on 3 February 2021).
172. Park, E.; Ahn, J.; Yoo, S. Weighted-entropy-based quantization for deep neural networks. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5456–5464. [CrossRef]
173. Burrascano, P. Learning vector quantization for the probabilistic neural network. IEEE Trans. Neural Netw. 1991, 2, 458–461.
[CrossRef]
174. Mittal, A.; Tiku, S.; Pasricha, S. Adapting convolutional neural networks for indoor localization with smart mobile devices. In
Proceedings of the 2018 Great Lakes Symposium on VLSI (GLSVLSI '18), Chicago, IL, USA, 23–25 May 2018; pp. 117–122.
[CrossRef]
175. Hu, R.; Tian, B.; Yin, S.; Wei, S. Efficient hardware architecture of softmax layer in deep neural network. In Proceedings of the
2018 IEEE 23rd International Conference on Digital Signal Processing (DSP), Shanghai, China, 19–21 November 2018; pp. 323–326.
[CrossRef]
176. Hennessy, J.L.; Patterson, D.A. A new golden age for computer architecture. Commun. ACM 2019, 62, 48–60. [CrossRef]
177. Kim, R.G.; Doppa, J.R.; Pande, P.P.; Marculescu, D.; Marculescu, R. Machine learning and manycore systems design: A
serendipitous symbiosis. Computer 2018, 51, 66–77. [CrossRef]
178. Kim, R.G.; Doppa, J.R.; Pande, P.P. Machine learning for design space exploration and optimization of manycore systems. In
Proceedings of the 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), San Diego, CA, USA,
5–8 November 2018. [CrossRef]
179. Vazquez, R.; Gordon-Ross, A.; Stitt, G. Machine learning-based prediction for dynamic architectural optimizations. In Proceedings
of the 10th International Green and Sustainable Computing Conference (IGSC 2019), Alexandria, VA, USA, 21–24 October 2019;
pp. 1–6. [CrossRef]
180. Papp, D.; Ma, Z.; Buttyan, L. Embedded systems security: Threats, vulnerabilities, and attack taxonomy. In Proceedings of the
2015 13th Annual Conference on Privacy, Security and Trust (PST), Izmir, Turkey, 21–23 July 2015; pp. 145–152. [CrossRef]
181. Ogbebor, J.O.; Imoize, A.L.; Atayero, A.A.-A. Energy Efficient Design Techniques in Next-Generation Wireless Communication
Networks: Emerging Trends and Future Directions. Wirel. Commun. Mob. Comput. 2020, 2020, 19. [CrossRef]
182. Imoize, A.L.; Ibhaze, A.E.; Atayero, A.A.; Kavitha, K.V.N. Standard Propagation Channel Models for MIMO Communication
Systems. Wirel. Commun. Mob. Comput. 2021, 2021, 36. [CrossRef]
183. Popoola, S.I.; Jefia, A.; Atayero, A.A.; Kingsley, O.; Faruk, N.; Oseni, O.F.; Abolade, R.O. Determination of neural network
parameters for path loss prediction in very high frequency wireless channel. IEEE Access 2019, 7, 150462–150483. [CrossRef]
184. Faruk, N.; Popoola, S.I.; Surajudeen-Bakinde, N.T.; Oloyede, A.A.; Abdulkarim, A.; Olawoyin, L.A.; Ali, M.; Calafate, C.T.;
Atayero, A.A. Path loss predictions in the VHF and UHF bands within urban environments: Experimental investigation of
empirical, heuristics and geospatial models. IEEE Access 2019, 7, 77293–77307. [CrossRef]
185. Pasricha, S.; Nikdast, M. A Survey of Silicon Photonics for Energy-Efficient Manycore Computing. IEEE Des. Test 2020, 37, 60–81.
[CrossRef]
186. Soref, R. The past, present, and future of silicon photonics. IEEE J. Sel. Top. Quantum Electron. 2006, 12, 1678–1687. [CrossRef]
187. Chittamuru, S.V.R.; Dang, D.; Pasricha, S.; Mahapatra, R. BiGNoC: Accelerating big data computing with application-specific
photonic network-on-chip architectures. IEEE Trans. Parallel Distrib. Syst. 2018, 29, 2402–2415. [CrossRef]