
Letter doi:10.1038/nature23011

Neuromorphic computing with nanoscale spintronic oscillators
Jacob Torrejon1, Mathieu Riou1, Flavio Abreu Araujo1, Sumito Tsunegi2, Guru Khalsa3†, Damien Querlioz4, Paolo Bortolotti1,
Vincent Cros1, Kay Yakushiji2, Akio Fukushima2, Hitoshi Kubota2, Shinji Yuasa2, Mark D. Stiles3 & Julie Grollier1

Neurons in the brain behave as nonlinear oscillators, which develop rhythmic activity and interact to process information (ref. 1). Taking inspiration from this behaviour to realize high-density, low-power neuromorphic computing will require very large numbers of nanoscale nonlinear oscillators. A simple estimation indicates that to fit $10^8$ oscillators organized in a two-dimensional array inside a chip the size of a thumb, the lateral dimension of each oscillator must be smaller than one micrometre. However, nanoscale devices tend to be noisy and to lack the stability that is required to process data in a reliable way. For this reason, despite multiple theoretical proposals (refs 2–5) and several candidates, including memristive (ref. 6) and superconducting (ref. 7) oscillators, a proof of concept of neuromorphic computing using nanoscale oscillators has yet to be demonstrated. Here we show experimentally that a nanoscale spintronic oscillator (a magnetic tunnel junction; refs 8, 9) can be used to achieve spoken-digit recognition with an accuracy similar to that of state-of-the-art neural networks. We also determine the regime of magnetization dynamics that leads to the greatest performance. These results, combined with the ability of the spintronic oscillators to interact with each other, and their long lifetime and low energy consumption, open up a path to fast, parallel, on-chip computation based on networks of oscillators.

Nanoscale spintronic oscillators (or spin-torque nano-oscillators) are nanoscale pillars composed of two ferromagnetic layers separated by a non-magnetic spacer (Fig. 1a). Charge currents become spin-polarized when they flow through these junctions and generate torques on the magnetizations (refs 10, 11) that lead to sustained magnetization precession at frequencies of hundreds of megahertz to several tens of gigahertz. Magnetization oscillations are converted into voltage oscillations through magneto-resistance. The resulting radio-frequency oscillations, of up to tens of millivolts (ref. 12), can be detected by measuring

[Figure 1 image. Panels: a, spin-torque schematic (ferromagnet / normal spacer / ferromagnet, 10–500 nm, current direction); b, oscillator voltage $V_{\mathrm{osc}}$ (mV) versus time (ns); c, voltage amplitude $\tilde{V}$ (mV) versus current $I_{\mathrm{DC}}$ (mA); d, set-up schematic (arbitrary waveform generator, $I_{\mathrm{DC}}$ source, FeB/MgO/CoFeB junction under field $\mu_0 H$, diode); e, $V_{\mathrm{in}}$ (mV) and $V_{\mathrm{osc}}$ (mV) versus time $t$ (μs), with sampled values V1–V7.]
Figure 1 | Spin-torque nano-oscillator for neuromorphic computing. a, Schematic of a spin-torque nano-oscillator, consisting of a non-magnetic spacer (gold) between two ferromagnetic layers, with magnetization m for the free layer (blue) and M for the fixed layer (silver). A current injected into the oscillator induces magnetization precessions of m. For our experiments we used a nano-oscillator with a diameter of 375 nm; however, diameters of 10–500 nm are possible. b, Measured a.c. voltage emitted by the oscillator as a function of time, $V_{\mathrm{osc}} = \tilde{V}(t)\cos(\omega t + \phi)$, for a steady current injection of 7 mA at an external magnetic field $\mu_0 H = 430$ mT. The dotted blue lines highlight the amplitude $\tilde{V}$. c, Voltage amplitude $\tilde{V}$ as a function of d.c. current $I_{\mathrm{DC}}$ at $\mu_0 H = 430$ mT (blue squares). The purple shaded area highlights the typical excursion in the voltage amplitude that results when an input signal of $V_{\mathrm{in}} = \pm 250$ mV is injected (here for $I_{\mathrm{DC}} = 6.5$ mA (vertical dotted line) and $\mu_0 H = 430$ mT). d, Schematic of the experimental set-up. A d.c. current $I_{\mathrm{DC}}$ and a rapidly varying waveform that encodes the input $V_{\mathrm{in}}$ are injected into the spin-torque nano-oscillator. The microwave voltage $V_{\mathrm{osc}}$ emitted by the oscillator in response to the excitation is measured with an oscilloscope. For computing, the amplitude $\tilde{V}$ of the oscillator is used, and measured directly with a microwave diode. e, Input $V_{\mathrm{in}}$ (top; magenta) and measured microwave voltage $V_{\mathrm{osc}}$ (bottom; grey) emitted by the oscillator as a function of time. Here $I_{\mathrm{DC}} = 6$ mA and $\mu_0 H = 430$ mT. The envelope $\tilde{V}$ of the oscillator signal is highlighted in blue. For computing it is sampled periodically, as shown by the blue circles labelled V1–7.
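The signal model in the caption, $V_{\mathrm{osc}} = \tilde{V}(t)\cos(\omega t + \phi)$, and the periodic sampling of the envelope can be illustrated with a short numerical sketch. Everything below is an assumption made for illustration: the 300 MHz carrier (chosen inside the 250–400 MHz range quoted in Methods), the step-shaped toy envelope, and the windowed peak detector that stands in for the microwave diode. It is not the authors' acquisition code.

```python
import math

def oscillator_voltage(t_ns, envelope_mv, freq_ghz=0.3, phase=0.0):
    """Return V_osc(t) = V~(t) * cos(omega * t + phi) in mV, with t in ns."""
    return envelope_mv(t_ns) * math.cos(2.0 * math.pi * freq_ghz * t_ns + phase)

def sample_envelope(volts_mv, points_per_window):
    """Estimate the envelope as the peak |V| in consecutive, non-overlapping windows.

    This windowed peak detector is a hypothetical stand-in for the diode read-out.
    """
    return [
        max(abs(v) for v in volts_mv[start:start + points_per_window])
        for start in range(0, len(volts_mv) - points_per_window + 1, points_per_window)
    ]

def step_envelope(t_ns):
    """Assumed toy envelope: 10 mV before 300 ns, 5 mV afterwards."""
    return 10.0 if t_ns < 300.0 else 5.0

# 600 ns of signal on a 0.1 ns grid, sampled once per 100 ns window.
times_ns = [i / 10.0 for i in range(6000)]
trace_mv = [oscillator_voltage(t, step_envelope) for t in times_ns]
samples_mv = sample_envelope(trace_mv, points_per_window=1000)
```

Each entry of `samples_mv` plays the role of one sampled value $\tilde{V}_i$; the fast carrier is discarded and only the slowly varying amplitude is kept.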

1 Unité Mixte de Physique, CNRS, Thales, Université Paris-Sud, Université Paris-Saclay, 91767 Palaiseau, France. 2 National Institute of Advanced Industrial Science and Technology (AIST), Spintronics Research Center, Tsukuba, Ibaraki 305-8568, Japan. 3 Center for Nanoscale Science and Technology, National Institute of Standards and Technology, Gaithersburg, Maryland 20899-6202, USA. 4 Centre de Nanosciences et de Nanotechnologies, CNRS, Université Paris-Sud, Université Paris-Saclay, 91405 Orsay, France. † Present address: Cornell University, Department of Materials Science and Engineering, Ithaca, New York 14853-1501, USA.

428 | NATURE | VOL 547 | 27 JULY 2017
© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

[Figure 2 image. Panels: a, input audio file, amplitude (a.u.) versus time (a.u.); b, filtering to frequency channels (spectrogram or cochlear model; 78 or 65 channels), amplitude (a.u.) versus time ($\tau$); c, pre-processed input, voltage (mV) versus time (μs); d, recorded trace with or without oscillator, voltage (mV) versus time (μs), followed by reconstruction of the output by a computer; e, spectrogram filtering, and f, cochlear filtering: recognition rate (%) versus number of utterances $N$, with and without oscillator.]
Figure 2 | Spoken-digit recognition. a–d, Principle of the experiment. a, Audio waveform corresponding to the digit 1 pronounced by speaker 1. b, Filtering to frequency channels for acoustic feature extraction. The audio waveform is divided in intervals of duration $\tau$. The cochlear model filters each interval into 78 frequency channels (65 for the spectrogram model), which are then concatenated as 78 (65) values for each interval, to form the filtered input. c, Pre-processed input (transformed from the purple shaded region in b). The filtered input is multiplied by a randomly filled binary matrix (masking process), resulting in 400 points separated by a time step $\theta$ of 100 ns in each interval of duration $\tau$ ($\tau = 400\theta$). d, Oscillator output. The envelope $\tilde{V}(t)$ of the emitted voltage amplitude of the experimental oscillator is shown ($\mu_0 H = 430$ mT, $I_{\mathrm{DC}} = 6$ mA). The 400 values of $\tilde{V}(t)$ per interval $\tau$ ($\tilde{V}_i$, sampled with a time step $\theta$) emulate 400 neurons. The reconstructed output '1', corresponding to this digit, is obtained by linearly combining the 400 values of $\tilde{V}_i$, sampled from each interval $\tau$. e, f, Spoken-digit recognition rates in the testing set as a function of the number of utterances $N$ used for training for the spectrogram filtering (e; $\mu_0 H = 430$ mT, $I_{\mathrm{DC}} = 6$ mA) and for the cochlear filtering (f; $\mu_0 H = 448$ mT, $I_{\mathrm{DC}} = 7$ mA). Because there are many ways to pick the $N$ utterances, the recognition rate is an average over all $10!/[(10-N)!\,N!]$ combinations of $N$ utterances out of the 10 in the dataset. The red curves are the experimental results using the magnetic oscillator. The black curves are control trials, in which the pre-processed inputs are used for reconstructing the output on a computer directly, as described in Methods, without going through the experimental set-up. The error bars correspond to the standard deviation of the recognition rate, based on training with all possible combinations.
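The masking step described in panel c can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pre-processing code: the two mask levels (-1 and +1), the seed, the function names and the toy filtered input are all hypothetical. Only the sizes (65 channels for the spectrogram model, 400 points per interval, a time step of 100 ns) come from the caption.

```python
import random

THETA_NS = 100   # time step per emulated neuron, from the caption
N_THETA = 400    # emulated neurons per interval, from the caption

def make_mask(n_channels, n_theta, seed=0):
    """Random binary mask of shape (n_channels, n_theta).

    The two levels (-1 and +1) and the seed are assumptions for illustration.
    """
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(n_theta)] for _ in range(n_channels)]

def preprocess(filtered, mask):
    """Multiply each interval (a list of channel values) by the mask.

    Returns one flat sequence with n_theta points per interval.
    """
    n_theta = len(mask[0])
    sequence = []
    for interval in filtered:
        for k in range(n_theta):
            sequence.append(sum(x * mask[c][k] for c, x in enumerate(interval)))
    return sequence

# Toy filtered input: 3 intervals of 65 channels, with a single active channel.
filtered = [[0.0] * 65 for _ in range(3)]
for interval in filtered:
    interval[5] = 2.0

mask = make_mask(n_channels=65, n_theta=N_THETA)
sequence = preprocess(filtered, mask)
tau_us = N_THETA * THETA_NS / 1000.0  # duration of one interval in microseconds
```

With these numbers each interval lasts $\tau = 400 \times 100$ ns $= 40$ μs, and every interval of the filtered input is spread over 400 consecutive drive values.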

the voltage across the junction (Fig. 1b). Spin-torque nano-oscillators are therefore simple and ultra-compact: their lateral size can be scaled down to 10 nm and their power consumption reduced to 1 μW (ref. 13). Because they have the same structure as present-day magnetic memory cells, they are compatible with complementary metal–oxide–semiconductor (CMOS) technology, have high endurance, operate at room temperature and can be fabricated in large numbers (currently up to hundreds of millions) on a single chip (ref. 14). Just as the frequency of a neuron is modified by the spikes received from other neurons, the frequencies of spin-torque nano-oscillators are highly sensitive to the magnetization dynamics of neighbouring oscillators to which they are coupled (refs 15, 16). Together, these features of spin-torque nano-oscillators make them promising candidates for use in neuromorphic computing with large arrays of coupled oscillators (refs 17–21). However, they have yet to be used to perform an actual computing task.

Our idea is to exploit the amplitude dynamics of spin-torque nano-oscillators for neuromorphic computing. Their oscillation amplitude $\tilde{V}$ (dotted blue line in Fig. 1b) is robust to noise, owing to the confinement that is provided by the counteracting torques exerted by the injected current and magnetic damping (ref. 22). In addition, $\tilde{V}$ is highly nonlinear as a function of the injected current and depends intrinsically on past inputs (ref. 15). Exploiting the amplitude dynamics of spin-torque nano-oscillators thus combines in a single nanodevice the two most crucial properties of neurons, nonlinearity and memory, the realization of which would otherwise require several electronic components and a much larger on-chip area using conventional CMOS (ref. 23). To compute, we encode neural inputs in the time-dependent current $I(t)$ that is injected into the oscillator and use the amplitude response $\tilde{V}(t)$ as the neural output.

Our nano-oscillators consist of circular magnetic tunnel junctions, with a 6-nm-thick free layer of FeB of 375-nm diameter, which have magnetic vortex ground states (see Methods). We measure the dynamics of the signal amplitude $\tilde{V}(t)$ directly using a microwave diode. In Fig. 1c we show the nonlinear response of the amplitude $\tilde{V}$ to a d.c. current $I_{\mathrm{DC}}$: $\tilde{V} \propto (I_{\mathrm{DC}} - I_{\mathrm{th}})$, where $I_{\mathrm{th}}$ is the current threshold for steady oscillations to occur (ref. 15). Using an arbitrary waveform generator, we inject a varying current through the junctions in addition to the d.c. current, using the set-up schematized in Fig. 1d. The resulting voltage oscillations, recorded with an oscilloscope, are shown in Fig. 1e. The amplitude of the oscillator varies in response to the injected d.c. current, with a relaxation time that induces a memory of past inputs lasting a few hundred nanoseconds (ref. 22).

Recent studies have revealed that time-multiplexing can enable a single oscillator to emulate a full neural network (refs 24–26). Here we use this approach, a form of "reservoir computing" (refs 4, 5; see Methods), to demonstrate the ability of spin-torque nano-oscillators to realize neuromorphic tasks. We perform a benchmark task of spoken-digit recognition. The input data, taken from the TI-46 database (ref. 27), are


[Figure 3 image. Panels a, c, d and e are maps of magnetic field $\mu_0 H$ (mT, 300–600) versus current $I_{\mathrm{DC}}$ (mA, 1–7): a, r.m.s. deviation (%); c, $V_{\mathrm{up}}V_{\mathrm{dw}}$ (mV$^2$); d, $1/\Delta V$ (mV$^{-1}$), with $I_{\mathrm{th}}$ marked; e, $V_{\mathrm{up}}V_{\mathrm{dw}}/\Delta V$ (mV). Panel b: voltage amplitude $\tilde{V}$ (mV) versus time (μs), showing $V_{\mathrm{up}}$, $V_{\mathrm{dw}}$ (top) and $\Delta V$ (bottom).]

Figure 3 | Conditions for optimal waveform classification and identification of important oscillator properties. The task consists of distinguishing sine waveforms from square ones with the same period. The target for the output that is reconstructed from the oscillator's response is one for square, zero for sine. We emulate 24 neurons $\tilde{V}_i$, $\tau = 24\theta$. a, Root-mean-square (r.m.s.) of output-to-target deviations: map as a function of d.c. current $I_{\mathrm{DC}}$ and magnetic field $\mu_0 H$. b, Extraction of parameters from the time traces of the oscillator's response. Top, maximum positive ($V_{\mathrm{up}}$) and negative ($V_{\mathrm{dw}}$) variations in the oscillator's amplitude in response to the varying pre-processed input. Bottom, noise $\Delta V$ of the voltage amplitude $\tilde{V}$ at steady state under $I_{\mathrm{DC}}$. c, Maximal response ($V_{\mathrm{up}}V_{\mathrm{dw}}$) of the oscillator to the input: map in the $I_{\mathrm{DC}}$–$\mu_0 H$ plane. d, Inverse of the noise amplitude $1/\Delta V$: map in the $I_{\mathrm{DC}}$–$\mu_0 H$ plane. The threshold current $I_{\mathrm{th}}$ is indicated by a white solid line. In c and d, the optimal range of bias conditions for waveform classification is marked by a white dashed rectangle (currents of 6–7 mA and magnetic fields of 350–450 mT). e, Map of the ratio of maximal amplitudes to noise $V_{\mathrm{up}}V_{\mathrm{dw}}/\Delta V$, showing that these parameters largely determine the performance of the oscillator (compare with a).
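The quantities defined in panel b can be computed from two amplitude traces, as in the sketch below. It rests on assumptions that the caption does not settle: the excursions $V_{\mathrm{up}}$ and $V_{\mathrm{dw}}$ are measured here relative to the mean of a steady-state trace, $\Delta V$ is taken as the population standard deviation of that steady-state trace, and the function and variable names are hypothetical.

```python
import statistics

def signal_to_noise(driven_trace_mv, steady_trace_mv):
    """Return (V_up, V_dw, Delta_V, V_up * V_dw / Delta_V) from amplitude traces in mV.

    Assumption: excursions are measured from the mean steady-state amplitude.
    """
    baseline = sum(steady_trace_mv) / len(steady_trace_mv)
    v_up = max(driven_trace_mv) - baseline      # largest positive excursion
    v_dw = baseline - min(driven_trace_mv)      # largest negative excursion
    delta_v = statistics.pstdev(steady_trace_mv)  # noise at steady state
    return v_up, v_dw, delta_v, v_up * v_dw / delta_v

# Toy traces (made-up numbers, not measured data).
steady = [9.9, 10.1, 9.9, 10.1]
driven = [10.0, 13.0, 8.0, 11.0]
v_up, v_dw, delta_v, merit = signal_to_noise(driven, steady)
```

Evaluating such a ratio at every bias point ($I_{\mathrm{DC}}$, $\mu_0 H$) would produce a map of the kind shown in panel e.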

audio waveforms of isolated spoken digits (0 to 9) pronounced by five different female speakers (Fig. 2a). The goal is to recognize the digits, independent of the speaker.

Neural networks classify information through chain reactions: neuron after neuron, each input undergoes a series of nonlinear transformations (ref. 28). In a trained network, the same digit always triggers a similar chain reaction even if it is pronounced by different speakers, whereas different digits generate different chain reactions, thus allowing pattern recognition. An input can trigger a chain reaction in space by using ensembles of neurons, wherein the state of downstream neurons depends on the state of upstream neurons. But an input can also trigger a chain reaction in time by constantly exciting a single nonlinear oscillator with memory: in this case, the state of the oscillator in the future depends on the state of the oscillator in the past. We use the latter approach, which simplifies the hardware because only one oscillator is needed, but requires preprocessing of the input: each point of the audio waveform is converted into a fast-paced binary sequence that is designed to generate a chain reaction of amplitude variations in the oscillator (ref. 24).

The procedure is illustrated in Fig. 2a–d and detailed in Methods. Because acoustic features are mainly encoded in frequencies (ref. 29), we filter each audio file into $N_f$ different frequency channels (a standard procedure in speech recognition), which are then concatenated in intervals of duration $\tau$ (Fig. 2b). For preprocessing, each of these segments is multiplied by a randomly filled binary matrix (of dimension $N_f \times N_\theta$). In this way, each point of the input audio waveform is converted into a binary sequence of duration $\tau$ that is composed of $N_\theta$ points separated by a time step $\theta$ ($\tau = N_\theta \theta$). When this preprocessed input (Fig. 2c) is applied as a current to our spin-torque nano-oscillator, the resulting amplitude variations $\tilde{V}(t)$ (Fig. 2d) function as a set of $N_\theta$ neurons coupled in time (we take $N_\theta$ samples $\tilde{V}_i$ per interval $\tau$). For spoken-digit recognition, we emulate $N_\theta = 400$ neurons and use $\theta = 100$ ns (about one-fifth of the relaxation time of the oscillators) to set the oscillator in a transient state.

The responses of the voltage amplitude $\tilde{V}(t)$ of the oscillator are recorded for each utterance of each spoken digit. The goal of the subsequent training process, performed on a computer, is to choose a linear combination of these responses (sets of $\tilde{V}_i$ in each $\tau$) for each digit such that the sum is one for that digit and zero for the rest (see Methods). Because each digit has been pronounced ten times by each of the five speakers, we can use some of the data to determine the coefficients (training), and the rest to evaluate the recognition performance (testing); see Methods. To assess the effect of our oscillator on the quality of recognition, we always perform a control trial without the oscillator. In that case, the preprocessed input traces are used to reconstruct the outputs on the computer directly, without going through the experimental set-up.

The improvement shown in the experimental results over the control results (see Fig. 2e, f) indicates that the spin-torque nano-oscillator greatly improves the quality of spoken-digit recognition, despite the added noise that accompanies its nanometre-scale size. In Fig. 2e (linear spectrogram filtering), we present an example in which the extraction of acoustic features, achieved by Fourier transforming the audio waveform over finite time windows, plays a minimal part in classification. Without the oscillator (black line), the recognition rates are consistent with random choices; with the oscillator (red line), the recognition rate is improved by 70%, reaching values of up to 80%. This example highlights the crucial role of the oscillator in the recognition process. Using nonlinear cochlear filtering (ref. 30; Fig. 2f), which is the standard in reservoir computing (refs 24–26) and has been optimized on the basis of the behaviour of biological ears, we achieve recognition rates of up to 99.6%, as high as the state of the art. Compared to the control trial, the oscillator reduces the error rate by a factor of up to 15. Our results with a spin-torque nano-oscillator are therefore comparable to the recognition rates obtained with more complicated electronic or optical systems (between 95.7% and 99.8% for the same task with cochlear filtering; refs 23–26, 29).

The optimal operating conditions for pattern recognition with our spin-torque nano-oscillator are determined by the oscillation amplitude and noise. We use a simpler task, classification of sine and square waveforms with the same period (ref. 25), to investigate the ability of the oscillator to classify waveforms in a wide range of injected d.c. currents $I_{\mathrm{DC}}$ and applied magnetic fields $\mu_0 H$ (see Methods). As can be seen in Fig. 3a, the quality of pattern recognition, characterized by the root-mean-square of deviations between the reconstructed output and the target, varies from 10% to more than 30% depending on the bias conditions. The oscillator performs well when it responds strongly to the time-varying preprocessed input, with large amplitude variations in both the


positive and negative directions, $V_{\mathrm{up}}$ and $V_{\mathrm{dw}}$, respectively (Fig. 3b, top). On the other hand, it performs poorly when the noise in the oscillator $\Delta V$ (the standard deviation of the noise in the voltage amplitude) is high (Fig. 3b, bottom). As shown in Fig. 3b, we extract these parameters from the time traces of the voltage emitted from the oscillator at each bias point, and plot $V_{\mathrm{up}}V_{\mathrm{dw}}$ (Fig. 3c) and $1/\Delta V$ (Fig. 3d) as a function of the d.c. current $I_{\mathrm{DC}}$ and field $\mu_0 H$. The red regions of large oscillation amplitudes in Fig. 3c correspond to low magnetic fields, in which the magnetization is weakly confined, and to high currents, for which the spin torque on magnetization is maximal. The blue regions of high noise in Fig. 3d correspond to areas just above the threshold current $I_{\mathrm{th}}$ for oscillation, in which the oscillation amplitude $\tilde{V}$ is growing rapidly as a function of current and is becoming sensitive to external fluctuations (ref. 15). As can be seen by comparing Fig. 3c and d, the range of bias conditions highlighted by the dotted white boxes (currents of 6–7 mA and magnetic fields of 350–450 mT) features wide variations in oscillation amplitudes and low noise. In this region, root-mean-square deviations below 15% are achieved, and there are no classification errors between sine and square waveforms. The similarity between the map of $V_{\mathrm{up}}V_{\mathrm{dw}}/\Delta V$ (Fig. 3e) and that of the classification performance (Fig. 3a) confirms that the best conditions for classification correspond to regions of optimal compromise between low noise and large amplitude variations. The necessity of a high signal-to-noise ratio for efficient neuromorphic computing, highlighted here for magnetic oscillators, is a general guideline that applies to any type of nanoscale oscillator.

In conclusion, our pattern-recognition results show that simple, ultra-compact spintronic oscillators have all of the properties that are needed to emulate collections of neurons: nonlinearity, memory and stability. The ability of groups of these oscillators to mimic neural connections by influencing the behaviour of one another through current and magnetic-field coupling opens up a route to realizing large-scale neural networks in hardware, which exploit magnetization dynamics for computing (refs 15–21).

Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.

Received 25 January; accepted 2 June 2017.

1. Buzsaki, G. Rhythms of the Brain (Oxford Univ. Press, 2011).
2. Hoppensteadt, F. C. & Izhikevich, E. M. Oscillatory neurocomputers with dynamic connectivity. Phys. Rev. Lett. 82, 2983–2986 (1999).
3. Aonishi, T., Kurata, K. & Okada, M. Statistical mechanics of an oscillator associative memory with scattered natural frequencies. Phys. Rev. Lett. 82, 2800–2803 (1998).
4. Jaeger, H. & Haas, H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science 304, 78–80 (2004).
5. Maass, W., Natschläger, T. & Markram, H. Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput. 14, 2531–2560 (2002).
6. Pickett, M. D., Medeiros-Ribeiro, G. & Williams, R. S. A scalable neuristor built with Mott memristors. Nat. Mater. 12, 114–117 (2013).
7. Segall, K. et al. Synchronization dynamics on the picosecond time scale in coupled Josephson junction neurons. Phys. Rev. E 95, 032220 (2017).
8. Kiselev, S. I. et al. Microwave oscillations of a nanomagnet driven by a spin-polarized current. Nature 425, 380–383 (2003).
9. Rippard, W. H., Pufall, M. R., Kaka, S., Russek, S. E. & Silva, T. J. Direct-current induced dynamics in Co90Fe10/Ni80Fe20 point contacts. Phys. Rev. Lett. 92, 027201 (2004).
10. Slonczewski, J. C. Current-driven excitation of magnetic multilayers. J. Magn. Magn. Mater. 159, L1–L7 (1996).
11. Berger, L. Emission of spin waves by a magnetic multilayer traversed by a current. Phys. Rev. B 54, 9353–9358 (1996).
12. Tsunegi, S., Yakushiji, K., Fukushima, A., Yuasa, S. & Kubota, H. Microwave emission power exceeding 10 μW in spin torque vortex oscillator. Appl. Phys. Lett. 109, 252402 (2016).
13. Sato, H. et al. Properties of magnetic tunnel junctions with a MgO/CoFeB/Ta/CoFeB/MgO recording structure down to junction diameter of 11 nm. Appl. Phys. Lett. 105, 062403 (2014).
14. Apalkov, D., Dieny, B. & Slaughter, J. M. Magnetoresistive random access memory. Proc. IEEE 104, 1796–1830 (2016).
15. Slavin, A. & Tiberkevich, V. Nonlinear auto-oscillator theory of microwave generation by spin-polarized current. IEEE Trans. Magn. 45, 1875–1918 (2009).
16. Houshang, A. et al. Spin-wave-beam driven synchronization of nanocontact spin-torque oscillators. Nat. Nanotechnol. 11, 280–286 (2016).
17. Macià, F., Kent, A. D. & Hoppensteadt, F. C. Spin-wave interference patterns created by spin-torque nano-oscillators for memory and computation. Nanotechnology 22, 095301 (2011).
18. Pufall, M. R. et al. Physical implementation of coherently coupled oscillator networks. IEEE J. Explor. Solid-State Comput. Devices Circuits 1, 76–84 (2015).
19. Nikonov, D. E. et al. Coupled-oscillator associative memory array operation for pattern recognition. IEEE J. Explor. Solid-State Comput. Devices Circuits 1, 85–93 (2015).
20. Yogendra, K., Fan, D. & Roy, K. Coupled spin torque nano oscillators for low power neural computation. IEEE Trans. Magn. 51, 4003909 (2015).
21. Grollier, J., Querlioz, D. & Stiles, M. D. Spintronic nanodevices for bioinspired computing. Proc. IEEE 104, 2024–2039 (2016).
22. Grimaldi, E. et al. Response to noise of a vortex based spin transfer nano-oscillator. Phys. Rev. B 89, 104404 (2014).
23. Soriano, M. C. et al. Delay-based reservoir computing: noise effects in a combined analog and digital implementation. IEEE Trans. Neural Netw. Learn. Syst. 26, 388–393 (2015).
24. Appeltant, L. et al. Information processing using a single dynamical node as complex system. Nat. Commun. 2, 468 (2011).
25. Paquot, Y. et al. Optoelectronic reservoir computing. Sci. Rep. 2, 287 (2012).
26. Martinenghi, R., Rybalko, S., Jacquot, M., Chembo, Y. K. & Larger, L. Photonic nonlinear transient computing with multiple-delay wavelength dynamics. Phys. Rev. Lett. 108, 244101 (2012).
27. Texas Instruments. 46-Word Speaker-Dependent Isolated Word Corpus (TI-46), NIST Speech Disc 7-1.1, https://catalog.ldc.upenn.edu/LDC93S9 (NIST, 1991).
28. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
29. Yildiz, I. B., von Kriegstein, K. & Kiebel, S. J. From birdsong to human speech recognition: Bayesian inference on a hierarchy of nonlinear dynamical systems. PLOS Comput. Biol. 9, e1003219 (2013).
30. Lyon, R. A computational model of filtering, detection, and compression in the cochlea. in IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP 82) Vol. 7, 1282–1285 (IEEE, 1982).

Acknowledgements This work was supported by the European Research Council (ERC) under grant bioSPINspired 682955. We thank L. Larger, B. Penkovsky and F. Duport for discussions.

Author Contributions The study was designed by J.G. and M.D.S., samples were optimized and fabricated by S.T. and K.Y., experiments were performed by J.T. and M.R., numerical studies were realized by F.A.A., M.R. and G.K., and all authors contributed to analysing the results and writing the paper.

Author Information Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. Readers are welcome to comment on the online version of the paper. Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Correspondence and requests for materials should be addressed to J.G. ([email protected]).

Reviewer Information Nature thanks F. Hoppensteadt and the other anonymous reviewer(s) for their contribution to the peer review of this work.


Methods

Samples. Magnetic tunnel junction (MTJ) films with a stacking structure of buffer/PtMn(15)/Co71Fe29(2.5)/Ru(0.9)/Co60Fe20B20(1.6)/Co70Fe30(0.8)/MgO(1)/Fe80B20(6)/MgO(1)/Ta(8)/Ru(7) (with thicknesses given in parentheses in nanometres) were prepared by ultrahigh vacuum (UHV) magnetron sputtering. After annealing at 360 °C for 1 h, the resistance–area products (RA) were approximately 3.6 Ω μm$^2$. Circular-shape MTJs with a diameter of approximately 375 nm were patterned using Ar ion etching and e-beam lithography. The resistance of the samples is close to 40 Ω and the magneto-resistance ratio is about 135% at room temperature. The FeB layer presents a vortex structure as the ground state for the dimensions used here. In a small region called the core of the vortex, the magnetization spirals out of plane. Under d.c. current injection, the core of the vortex steadily gyrates around the centre of the dot with a frequency in the range 250–400 MHz for the oscillators we consider here. Vortex dynamics driven by spin torque are well understood, well controlled and have been shown to be particularly stable (ref. 22).

Measurement set-up. The experimental implementation for spoken-digit recognition and sine/square classification tasks is illustrated in Fig. 1d. The pre-processed input signal $V_{\mathrm{in}}$ is generated by a high-frequency arbitrary-waveform generator and injected as a current through the magnetic nano-oscillator. The sampling rate of the source is set to 200 MHz (20 points per interval of time $\theta$) for the spoken-digit recognition task and 500 MHz (50 points per interval of time $\theta$) for the classification of sines and squares. The peak-to-peak variation in the input signal is 500 mV, which corresponds to peak-to-peak current variations of 6 mA, as illustrated in Fig. 1c (part of the incoming signal is reflected owing to impedance mismatch). The bias conditions of the oscillator are set by a d.c. current source and an electromagnet that applies a field perpendicular to the plane of the magnetic layers. The oscillating voltage emitted by the nano-oscillator is rectified by a planar tunnel microwave diode, with a bandwidth of 0.1–12.4 GHz and a response time of 5 ns. The input dynamic range of the diode is between 1 μW and 3.15 mW, corresponding to a d.c. output level of 0–400 mV. We use an amplifier to adjust the emitted power of the nano-oscillator to the working range of the diode. The output signal is then recorded by a real-time oscilloscope. In Figs 1b, c, e, 2d and 3b–e, the amplitude of the signal emitted by the oscillator is shown without amplification (the signal measured after the diode has been divided by the total amplification of the circuit, about +21 dB). If, owing to sampling errors, the measured envelope of the oscillators is shifted with respect to the input, classification accuracy can be degraded. We use alignment marks to align our measurements with the input when we reconstruct the output. The alignment precision is ±1 ns.

frequency content for each time interval by a mask matrix containing $N_f \times N_\theta$ random binary values, giving a total of $N_\tau \times N_\theta$ values as input to the oscillator (Fig. 2c). Here, we are modelling $N_\theta = 400$ input neurons, each of which is connected to all of the frequency channels for each time interval.

Each preprocessed input value is consecutively applied to the oscillator as a constant current for a time interval of $\theta \approx 100$ ns, which is about five times shorter than the relaxation time of the oscillator, as recommended in ref. 24. This time is short enough to guarantee that the oscillator is maintained in its transient regime so the emulated neurons are connected to each other, but is long enough to let the oscillator respond to the input excitation. The amplitude of the a.c. voltage across the oscillator is recorded for offline post-processing (Fig. 2d).

The post-processing of the output consists of two distinct steps. The first is called the training (or learning) process and the second is called the classification (or recognition) process. The goal of training is to determine a set of weights $w_{i,\theta}$, where $i$ indexes the desired digit. These weights are used to multiply the output voltages to give $10 N_\tau$ output values, which are then averaged over the $N_\tau$ time intervals to give 10 output values $y_i$, which should ideally be equal to the target values $\tilde{y}_i = 1.0$ for the appropriate digit and 0.0 for the rest. In the training process, a fraction of the utterances are used to train these weights; the rest of the utterances are used in the classification process to test the results.

The optimum weights are found by minimizing the difference between $\tilde{y}_i$ and $y_i$ for all of the words used in the training. In practice, optimal values are determined by using techniques for extracting meaningful eigenvalues from singular matrices such as the linear Moore–Penrose pseudo-inverse operator (denoted by a dagger symbol †). If we consider the target matrix $Y$, which contains the targets $\tilde{y}_i$ for all of the time steps $\tau$ used for the training, and the response matrix $S$, which contains all neuron responses for all of the time steps $\tau$ used for the training, then the matrix $W$, which contains the optimal weights, is given by $W = Y S^{\dagger}$. This step is performed on a computer and takes several seconds. In the future, real-time processing on a nanosecond timescale could be realized using fully parallel networks of interacting nano-oscillators.

During the classification phase, the ten reconstructed outputs corresponding to one digit are averaged over all of the time steps $\tau$ of the signal, and the digit is identified by taking the maximum value of the ten averaged reconstructed outputs. The averaged reconstructed output that corresponds to the digit in question should be close to 1 and the others should be close to 0. The efficiency of the recognition is evaluated by the word success rate, which is the rate of digits that are correctly identified. The training can be done using more or fewer data (here 'utterances').
General concepts of reservoir computing. In machine learning, a reservoir is We always trained the system using the ten digits spoken by the five speakers. The
a network of recurrently and randomly connected nonlinear nodes4,5. When an only parameter that we changed is the number of utterances used for the training.
input signal is injected in the reservoir, it is mapped to a higher-dimensional space If we use N utterances for training, then we use the remaining 10 −​  N utterances
in which it can become linearly separable. The key insight behind reservoir com- for testing. However, some utterances are very well pronounced whereas others are
puting is that the network does not need any tuning: all connections inside the hardly distinguishable. As a consequence, the resulting recognition rate depends
reservoir are kept fixed. Only external connections (between the reservoir and an on which N utterances are picked for training in the set of ten (for example, if
output layer) are trained to achieve the desired task. N =​ 2, then the utterances picked for training could be the first and second, but
In other words, reservoir computing requires the generation of complex nonlinear also the second and third, or the sixth and tenth, or any other of the 10!/(8!2!)
dynamics but, as a trade-off, learning is greatly simplified. For efficient reservoir combinations of 2 picked out of 10). To avoid this bias, the recognition rates that
computing, several requirements related to the dynamical properties of the network we present here are the average of the results over all possible combinations. The
should be satisfied. First, different inputs should trigger different dynamics (separa- error bars corresponds to the standard deviation of the word recognition rate.
tion property) and similar inputs should generate similar dynamics (approximation The raw spectrogram is not complex enough to allow a correct reconstruction
property), enabling efficient classification. Second, the reservoir state should not of the target during the training. Adding the oscillator brings complexity and
depend only on present inputs but also on recent past inputs. This short-term suppresses this phenomenon.
memory, called fading memory, is essential for processing temporal sequences for Sine- and square-wave classification. For this classification task, the input is a
which the history of the signal is important. random sequence of 160 sines and squares with the same period—the first half
A single nonlinear oscillator can emulate a reservoir when it is set in transient of the sequence for training and the second half for classification. Each period is
dynamics by a rapidly varying input24. The loss of parallelism is compensated discretized into eight points separated by a time step τ. The pre-processing con-
by an additional pre-processing input step: the input is multiplied by a rapidly sists of multiplying the value of each point by the same binary sequence that is
varying mask, which enables virtual nodes to be defined, interconnected in time generated by a random distribution of +​1 and −​1 values. In contrast to spoken-
through the resultant oscillator dynamics. This approach provides a marked sim- digit recognition, the mask is a binary vector (instead of a binary matrix). The fast
~
plification of the reservoir scheme for hardware implementations, and has been binary sequence contains 24 values, so 24 neurons Vi are emulated during each
realized in hardware with optical or electronic oscillators assembled from several time step τ.
components23–26. The target y for the network output y is 0 for all of the trajectories in response
Spoken-digit recognition. For this task, the inputs are taken from the NIST TI-46 to a sine and 1 for all of the trajectories in response to a square. The best weights
data corpus27. The input consists of isolated spoken digits said by five different are found by linear regression, as explained above for the spoken-digit recognition
female speakers. Each speaker pronounces each digit ten times. The 500 audio task. For sine/square recognition, we record five points instead of one for each
waveforms are sampled at a rate of 12.5 kHz and have variable time lengths. neuron when we measure the output of the oscillator. During post-processing, we
~ ~
We used two different filtering methods: spectrogram and cochlear models. use these additional states between Vi and Vi+ 1 to increase the number of coeffi-
Both filters break the word into several time intervals Nτ of duration τ and ana- cients available for solving the problem, and thus increase classification accuracy.
lyse the frequency content in each interval τ through either a Fourier transform In addition, the best performance does not necessarily correspond to a target in
(spectrogram model; 65 channels, Nτ ∈​ {24, …, 67}; Fig. 2b) or a more complicated exact phase with the oscillator’s output. The standard deviation of the root-mean-
nonlinear approach (cochlear model; 78 channels, Nτ∈​{14, …, 41}). The input for square value of Voutput −​  Vtarget, obtained with ten repetitions, is around 1%.
each word is composed of an amplitude for each of the Nf =​  65 or Nf =​  78 frequency Data availability. The datasets generated and analysed during this study are avail-
channels times Nτ time intervals. This input is pre-processed by multiplying the able from the corresponding author on reasonable request.
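
As a rough illustration of the readout training and classification steps described under Spoken-digit recognition, the NumPy sketch below computes W = YS† with the Moore–Penrose pseudo-inverse, averages the reconstructed outputs over the time steps of one word, and takes the maximum. The synthetic reservoir responses, the array sizes and the helper names (`train_readout`, `classify`, `fake_word`) are assumptions made for illustration only; they are not the authors' code or data.

```python
import numpy as np


def train_readout(S, Y):
    """Least-squares readout weights W = Y S† (Moore-Penrose pseudo-inverse).

    S: (n_neurons, n_total_steps) responses, Y: (n_classes, n_total_steps) targets.
    """
    return Y @ np.linalg.pinv(S)


def classify(W, S_word):
    """Average the reconstructed outputs over time steps, return the largest one."""
    y = W @ S_word                      # (n_classes, n_steps) reconstructed outputs
    return int(np.argmax(y.mean(axis=1)))


# --- Synthetic stand-in data (assumption, not measured oscillator responses) ---
rng = np.random.default_rng(0)
n_classes, n_neurons, n_steps = 10, 40, 30
# Hypothetical mean response of the virtual neurons for each digit.
prototypes = rng.normal(size=(n_classes, n_neurons))


def fake_word(digit):
    """Hypothetical response to one word: digit prototype plus noise per time step."""
    noise = 0.1 * rng.normal(size=(n_neurons, n_steps))
    return prototypes[digit][:, None] + noise


# Training set: three synthetic utterances of each digit.
train_digits = list(range(n_classes)) * 3
S = np.hstack([fake_word(d) for d in train_digits])
# Target is 1.0 for the appropriate digit and 0.0 for the rest, at every time step.
Y = np.hstack([np.eye(n_classes)[:, [d]].repeat(n_steps, axis=1)
               for d in train_digits])

W = train_readout(S, Y)

# Test on fresh synthetic utterances and compute the word success rate.
predicted = [classify(W, fake_word(d)) for d in range(n_classes)]
success_rate = np.mean([p == d for d, p in enumerate(predicted)])
print("W shape:", W.shape, "success rate:", success_rate)
```

In this sketch only `W` is learned; the synthetic responses play the role of the fixed reservoir, which mirrors the point made above that only the output connections are trained.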

© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.
