Generating Conjectures On Fundamental Constants With The Ramanujan Machine
Throughout history, simple formulas of fundamental constants symbolized simplicity, aesthetics and mathematical beauty2. A couple of well known examples include Euler’s identity $e^{i\pi} + 1 = 0$ and the continued fraction representation of the golden ratio:

$$\varphi = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \dots}}}. \tag{1}$$

We use the term regular formulas (RFs) for any mathematical expression that can be encapsulated using a computable expression14, such as equation (1).
The act of discovering new RFs is often attributed to profound intuition, such as in the case of Gauss’ ability to see meaningful patterns in numerical data that led to the famous prime number theorem and new fields of analysis such as elliptic and modular functions. He is even famous for saying: “I have the result, but I do not yet know how to get it”15, which emphasizes the role of identifying patterns and RFs in data as enabling acts of mathematical discovery.
In a different field but a similar manner, Johannes Rydberg’s discovery of his formula of hydrogen spectral lines16 resulted from his analysis of the spectral emission by chemical elements: $\lambda^{-1} = R_H (n_1^{-2} - n_2^{-2})$, where λ is the emission wavelength, $R_H$ is the Rydberg constant, and $n_1$ and $n_2$ are the lower and upper quantum energy levels, respectively. This insight, emerging directly from identifying patterns in data, had profound implications for quantum mechanics and modern physics.
Unlike measurements in physics and all other sciences, most mathematical constants can be calculated to an arbitrary precision (number of digits) with an appropriate formula, thus providing an absolute ground truth. In this sense, mathematical constants contain an unlimited amount of data (for example, the digits in an irrational number), which we use as ground truth for finding new RFs. Since the
1
Technion—Israel Institute of Technology, Haifa, Israel. 2The Technion Harry and Lou Stern Family Science and Technology Youth Center, Pre-University Education, Haifa, Israel. 3Google,
Tel Aviv, Israel. 4These authors contributed equally: Gal Raayoni, Shahar Gottlieb, Yahel Manor. ✉e-mail: [email protected]
[Figure 1: flow diagram with the labels ‘Pattern learning and generalization’, ‘RF’, ‘Validation’, ‘Redundancy elimination’, ‘Conjecture’ and ‘Rigorous proof’, grouped under ‘Conjecturing’ and ‘Proving’.]
Fig. 1 | Conceptual flow of the wider concept of the Ramanujan Machine. First, using approaches of pattern learning and generalization, we can generate a space of RF conjectures, for example, PCFs. We then apply a search algorithm, validate potential conjectures, and remove redundant results. Finally, validated results form mathematical conjectures that need to be proven analytically, thus closing a complete research endeavour from pattern generation to proof, potentially yielding further mathematical insight.
fundamental constants are universal and ubiquitous in their applications, finding such patterns can reveal new mathematical structures with broad implications, for example, the Rogers–Ramanujan continued fraction (which has implications on modular forms)17. Consequently, having systematic methods to derive new RFs could help research in many fields of science.
In this Article, we present a concept of learning mathematical relations of fundamental constants and provide a list of conjectures found using this method. Although the concept can be leveraged for many forms of RFs, we demonstrate its potential with equations involving polynomial continued fractions (PCFs)18

$$a_0 + \cfrac{b_1}{a_1 + \cfrac{b_2}{a_2 + \cfrac{b_3}{a_3 + \dots}}} \tag{2}$$

where the partial denominators $a_n$ and the partial numerators $b_n$ are the evaluations (at x = n) of polynomials α(x), β(x) ∈ ℤ[x], respectively. PCFs have been of interest to mathematicians for centuries and still are, for example, William Brouncker’s π representation19 and Zudilin’s work on dif-
Our MITM-RF algorithm was able to produce several novel conjectures that have short proofs, for example:

$$\begin{aligned}
\frac{4}{3\pi - 8} &= 3 - \cfrac{1 \times 1}{6 - \cfrac{2 \times 3}{9 - \cfrac{3 \times 5}{12 - \cfrac{4 \times 7}{\dots}}}} \\
\frac{2}{\pi + 2} &= 0 - \cfrac{1 \times (3 - 2 \times 1)}{3 - \cfrac{2 \times (3 - 2 \times 2)}{6 - \cfrac{3 \times (3 - 2 \times 3)}{9 - \cfrac{4 \times (3 - 2 \times 4)}{\dots}}}} \\
\frac{e}{e - 2} &= 4 - \cfrac{1}{5 - \cfrac{2}{6 - \cfrac{3}{7 - \cfrac{4}{\dots}}}} \\
\frac{1}{e - 2} &= 1 + \cfrac{1}{1 + \cfrac{-1}{1 + \cfrac{2}{1 + \cfrac{-1}{1 + \cfrac{3}{\dots}}}}}
\end{aligned} \tag{3}$$
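As a quick numerical sanity check of identities such as those in equation (3), the convergents of a PCF can be computed with exact integers through the standard three-term recurrence for continued-fraction convergents. The sketch below is illustrative only: the function name `pcf_convergent` and the chosen depths are assumptions for this example, not part of the published code.

```python
import math
from fractions import Fraction

def pcf_convergent(a, b, depth):
    """Return the depth-th convergent p/q of a0 + b1/(a1 + b2/(a2 + ...)).

    `a` and `b` are callables returning the integer terms a_n and b_n.
    Uses p_n = a_n*p_{n-1} + b_n*p_{n-2} (same for q_n) with exact integers.
    """
    p_prev, q_prev = 1, 0      # p_{-1}, q_{-1}
    p, q = a(0), 1             # p_0, q_0
    for n in range(1, depth + 1):
        p, p_prev = a(n) * p + b(n) * p_prev, p
        q, q_prev = a(n) * q + b(n) * q_prev, q
    return Fraction(p, q)

# e/(e-2) = 4 - 1/(5 - 2/(6 - 3/(7 - ...))): a_n = n + 4, b_n = -n
e_ratio = pcf_convergent(lambda n: n + 4, lambda n: -n, depth=30)

# 4/(3*pi-8) = 3 - 1*1/(6 - 2*3/(9 - ...)): a_n = 3n + 3, b_n = -n(2n - 1)
pi_ratio = pcf_convergent(lambda n: 3 * n + 3, lambda n: -n * (2 * n - 1), depth=80)

print(float(e_ratio), math.e / (math.e - 2))
print(float(pi_ratio), 4 / (3 * math.pi - 8))
```

Agreement to double precision is of course only numerical evidence; it does not replace the proofs referred to in the text.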
[Figure 2 schematic: an ‘LHS enumeration’ and an ‘RHS enumeration’ feed a hash table whose keys are numerical values and whose values are symbolic expressions; matching values are marked as ‘hits’.]
Fig. 2 | The Meet-In-The-Middle Regular Formula algorithm. The figure describes the MITM-RF algorithm that finds PCFs for fundamental constants. First, we enumerate the LHS to a low precision (for example, 10 digits) and store the results in a hash table. Second, we enumerate over the RHS at low precision and search for matches. Finally, the matches are re-evaluated to higher precision and compared again, thus eliminating false positives. The final results are then presented as new conjectures.
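The three stages named in this caption (hash the LHS at low precision, look up each RHS, re-check the hits more accurately) can be illustrated with a toy search. This is a minimal sketch under stated assumptions, not the authors' implementation: the LHS family (u + v·c)/(w + x·c), the coefficient ranges, the linear forms a_n = a0 + a1·n and b_n = b0 + b1·n, and the names `pcf_value` and `mitm_search` are all hypothetical, and double-precision floats stand in for the arbitrary-precision arithmetic a real search would need.

```python
import math
from itertools import product

def pcf_value(a0, a1, b0, b1, depth):
    """Evaluate the PCF with a_n = a0 + a1*n and b_n = b0 + b1*n using the
    convergent recurrence on exact integers; None if the denominator is 0."""
    p_prev, q_prev = 1, 0
    p, q = a0, 1
    for n in range(1, depth + 1):
        a_n, b_n = a0 + a1 * n, b0 + b1 * n
        p, p_prev = a_n * p + b_n * p_prev, p
        q, q_prev = a_n * q + b_n * q_prev, q
    return p / q if q != 0 else None

def mitm_search(c, digits=8, low_depth=30, high_depth=200, tol=1e-12):
    # Stage 1: enumerate the LHS and store rounded values in a hash table.
    table = {}
    for u, v, w, x in product(range(-3, 4), repeat=4):
        if u * x == v * w:          # ratio does not depend on c (or w = x = 0)
            continue
        lhs = (u + v * c) / (w + x * c)
        table.setdefault(round(lhs, digits), (u, v, w, x))
    # Stage 2: enumerate the RHS at low depth and look for matching keys.
    hits = []
    ranges = (range(0, 6), range(0, 3), range(-2, 3), range(-2, 3))
    for a0, a1, b0, b1 in product(*ranges):
        rhs = pcf_value(a0, a1, b0, b1, low_depth)
        if rhs is None:
            continue
        lhs_params = table.get(round(rhs, digits))
        if lhs_params is None:
            continue
        # Stage 3: re-evaluate deeper with a tighter tolerance to drop false positives.
        u, v, w, x = lhs_params
        deep = pcf_value(a0, a1, b0, b1, high_depth)
        if deep is not None and abs(deep - (u + v * c) / (w + x * c)) < tol:
            hits.append((lhs_params, (a0, a1, b0, b1)))
    return hits

for lhs_params, pcf_params in mitm_search(math.e)[:5]:
    print(lhs_params, pcf_params)
```

Rounding to a fixed number of digits can miss values that straddle a rounding boundary, and the tight final tolerance rejects slowly converging PCFs; both are simplifications accepted here to keep the example short.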
Hilbert’s problems, Landau’s problems, and of course the Riemann Hypothesis44,45. Maybe the most famous example is Ramanujan, who posed dozens of conjectures involving fundamental constants and considered them to be revelations from his family’s goddess7. Our work aims to automate the process of conjecture generation and demonstrate it by providing new conjectures for fundamental constants. By analysing mathematical relationships of fundamental constants that are aesthetic and concise, the Ramanujan Machine can eventually extend the work of great mathematicians such as Gauss, Riemann and Ramanujan.

The MITM-RF algorithm
The first algorithm we present searches for a PCF of a given fundamental constant c (for example, c = π) of the following form:

$$\frac{\gamma(c)}{\delta(c)} = f_i(\mathrm{PCF}(\alpha, \beta)), \tag{5}$$

for a set of four integer-coefficient polynomials (α, β, γ and δ), and a given set of functions $\{f_i\}$ (for example, $f_1(x) = x$, $f_2(x) = 1/x$, ⋯). PCF(α, β) means the PCF with the partial denominators $a_n = \alpha(n)$ and the partial numerators $b_n = \beta(n)$, as defined in equation (2).
As showcased in Fig. 2, we start by enumerating over the two sides of equation (5) and successively generate integer polynomials for α, β, γ and δ. We calculate the left-hand side (LHS) of each instance up to limited precision and store the results in a hash table. We continue by evaluating the right-hand side (RHS) and attempt to match each result in the hash table, where successful attempts are considered candidate solutions. The RHS is calculated with arbitrary-size integers, directly using the recurrence formula for the numerators $p_n$ and the denominators $q_n$ of the rational approximation of the PCF:

$$\begin{aligned}
q_{-1} &= 0, & p_{-1} &= 1, \\
q_0 &= 1, & p_0 &= a_0, \\
q_{n+1} &= a_{n+1} q_n + b_{n+1} q_{n-1}, & p_{n+1} &= a_{n+1} p_n + b_{n+1} p_{n-1}.
\end{aligned} \tag{6}$$

Since the LHS and RHS calculations are performed up to a limited precision, some of the candidate solutions are typically false positives, which are eliminated by calculating the RHS and LHS to higher precision in the last stage (Fig. 2). See Methods for the algorithm complexity and implementation details (see code at http://www.RamanujanMachine.com).
Our proposed MITM algorithm discovered previously known PCFs and new PCF conjectures for mathematical constants such as ζ(3) (that is, the Apéry constant) and the Catalan constant, presented in equation (4). (Supplementary Information section A provides details of the constants for which we ran searches, successful or otherwise.) After discovering dozens of PCFs, we empirically observed (and later proved, Supplementary Information section D) a relationship between the ratio of the polynomial orders of $a_n$ and $b_n$, and the formula’s convergence rate (Extended Data Fig. 1). Supplementary Information section C provides a wider outlook on PCFs.

The Descent&Repel algorithm
We propose a GD optimization method and demonstrate its success in finding RFs. Although it proved successful, the MITM-RF method is not trivially scalable. This issue can be addressed either by a more sophisticated variant or by switching to an optimization-based method, as is done by the following algorithm (Fig. 3).
To find integer solutions to equation (5), we write the following constrained optimization problem with the loss function ℒ:

$$\min_{\alpha, \beta, \gamma, \delta} \mathcal{L} = \left| \frac{\gamma(\pi)}{\delta(\pi)} - \mathrm{PCF}(\alpha, \beta) \right| \quad \text{where } \{\alpha, \beta, \gamma, \delta\} \subset \mathbb{Z}[x]. \tag{7}$$

Solving this optimization problem with GD seems implausible because we are only satisfied with exact ℒ = 0 for integer parameters. Non-zero ℒ solutions are usually meaningless as mathematical conjectures, as they are only approximations.
Nevertheless, we found a feature of ℒ that helped us develop a slightly modified GD, which we name Descent&Repel (Fig. 3). Examples of the
[Figure 3 panels: maps of the loss over the parameters x and y (both axes from −20 to 20), showing the initial points, GD steps, ‘Coulomb’ repulsion steps (repeated) and a final step that seeks nearby integers; annotations mark the global minima curves and the diverging error curves.]
Fig. 3 | The Descent&Repel algorithm. The figure describes the Descent&Repel algorithm that finds RFs for fundamental constants by relying on GD optimization. The x and y axes are parameters defining the polynomials of the continued fraction (in this case α(n) = n, β(n) = n² + yn + x; see Supplementary Information section E, Supplementary Table 4, and Supplementary Fig. 1). The key observation that enables this method is that almost all minima have zero loss (ℒ = 0) and appear as (d − 1)-dimensional manifolds, where d is the number of optimization variables. Starting with our initial conditions (in this example, consisting of 600 points on a vertical line), we perform ordinary GD alternated with ‘Coulomb’ repulsion between all the points. Finally, we alternate two GD optimizations to reach grid points: towards integer points and the minimum curves. Lastly, we check whether any point satisfies the equation. The colours indicate the loss ℒ (logarithmic scale): for the background, purple indicates larger loss and white indicates zero loss; for the points, red indicates larger loss and dark blue indicates zero loss.
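The alternation of GD and ‘Coulomb’ repulsion described in this caption can be sketched on a toy problem. Everything specific below is an assumption made for illustration: the loss is a hypothetical stand-in (squared distance to the circle x² + y² = 25, whose zero set is a 1-dimensional curve in d = 2), not the PCF loss of the paper; the step sizes, point count and function names are invented; and the final lattice stage is replaced by plain rounding to the nearest integers. The repel update follows the Coulomb-like form x_i ← x_i + ν Σ_j (x_i − x_j)/‖x_i − x_j‖³ given in Methods.

```python
import math

def loss(x, y):
    # Hypothetical stand-in loss (not the paper's): zero on the circle x^2 + y^2 = 25.
    return (math.hypot(x, y) - 5.0) ** 2

def gd_step(points, mu):
    """One gradient-descent step per point, using the analytic gradient of `loss`."""
    new_points = []
    for x, y in points:
        r = math.hypot(x, y)
        if r == 0.0:                      # gradient undefined at the origin
            new_points.append((x, y))
            continue
        g = 2.0 * (r - 5.0) / r           # gradient is g * (x, y)
        new_points.append((x - mu * g * x, y - mu * g * y))
    return new_points

def repel_step(points, nu):
    """Coulomb-like repulsion: x_i += nu * sum_j (x_i - x_j) / |x_i - x_j|^3."""
    new_points = []
    for i, (xi, yi) in enumerate(points):
        fx = fy = 0.0
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue
            dx, dy = xi - xj, yi - yj
            r3 = (dx * dx + dy * dy) ** 1.5 + 1e-12   # small term avoids 0-division
            fx += dx / r3
            fy += dy / r3
        new_points.append((xi + nu * fx, yi + nu * fy))
    return new_points

def descent_and_repel(n_points=17, rounds=5, mu=0.1, nu=0.05):
    # Initial conditions on a vertical line, as in the example of Fig. 3.
    points = [(1.0, -8.0 + k) for k in range(n_points)]
    for _ in range(rounds):
        for _ in range(20):
            points = gd_step(points, mu)
        points = repel_step(points, nu)
    for _ in range(200):                  # final descent onto the zero-loss curve
        points = gd_step(points, mu)
    # Simplified stand-in for the lattice stage: round and test exactly.
    solutions = set()
    for x, y in points:
        xi, yi = round(x), round(y)
        if xi * xi + yi * yi == 25:
            solutions.add((xi, yi))
    return points, sorted(solutions)

final_points, integer_solutions = descent_and_repel()
print(integer_solutions)
```

The repel step spreads points along the zero-loss curve so that more of it is explored; whether any point ends up next to an integer solution depends on the heuristic parameter choices.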
results appear in Extended Data Table 1. Without the restriction of being integers, the zero ℒ minima are not 0-dimensional points but rather (d − 1)-dimensional manifolds with d being the number of optimization variables. Specifically, in the case plotted in Fig. 3, there are d = 2 optimization variables, and therefore a 1-dimensional manifold of minima, appearing as bright curves in the maps. This dimensionality of the minima is expected given the definition of the loss function ℒ, which poses only a single constraint. Consequently, the GD process is expected to result in a solution with ℒ = 0. The high dimension of the manifold of minima motivates our approach of adding the repel step to the algorithm since most minima have a neighbourhood that contains additional minima. See Methods for the algorithm initialization and stages.
We ran the algorithm on several different search spaces (mostly with d = 2, Supplementary Information section E). The current implementation of the algorithm serves as a proof of concept and as a testing environment for GD variants. As such, it has not yet been executed on large search spaces. The success we had in finding conjectures in these limited runs shows the prospects of using this algorithm on larger search spaces with different parameter choices.

Irrationality bounds of the Catalan constant
Finding RFs for fundamental constants can have important prospects for proving their intrinsic properties. An example is Apéry’s proof that ζ(3) is irrational, which uses a PCF representation3, and led to similar proofs for other constants46. Finding fast-converging RFs could also provide more efficient ways of computing fundamental constants; for example, one of the most efficient historical methods of computing π was based on a formula by Ramanujan47. Similarly, the fastest-converging expression for the Catalan constant was a PCF by Zudilin5 until a relatively recent contribution48. The latter was recently used in the y-cruncher algorithm for calculating the record number of digits of the Catalan constant. Efficient formulas for calculating fundamental constants to high precision are used for checking their statistical consistencies and properties, such as normality (the distribution of digits in different integer bases)49.
As a consequence of the MITM-RF results for the Catalan constant, we found an infinite family of PCFs for the Catalan constant (see Methods). Some of these PCFs have faster convergence rates than the current best formula48. Figure 4a summarizes the convergence rates alongside the computational effort per term, conveying the comparative advantage of the new PCFs we found.
Another important implication for such expressions is their potential to help prove the irrationality of the Catalan constant. Each PCF provides a Diophantine approximation sequence that can be characterized by an effective irrationality exponent that quantifies how ‘efficiently’ it approximates the constant (see Methods).
A paper from 2003 (ref. 5) found the state-of-the-art exponent of the Catalan constant to be approximately 0.524. A paper from 2016 (ref. 50) proved this value and presented the better value of about 0.554 as a conjecture. These values are now the best exponents available in the literature. One of the PCFs we found here has an exponent of around 0.567, which surpasses all the previous values in the literature, as shown in Fig. 4b. Finding an explicit sequence for which the exponent is larger than 1 will directly prove irrationality. However, it is not trivial to find such a sequence explicitly (see, for example, ref. 51), and thus, it is of interest to try to find sequences for which the exponent is as large as possible.
Figure 4b summarizes the convergence of the approximation exponent as a function of the number of computed terms. This comparison includes the best values in the literature and several of our PCFs (detailed in Supplementary Information section G). We write the numerical value of the approximation exponent for each of the results in Supplementary Tables 5 and 6. Looking forward, it may well be that the automated exploration of PCF Diophantine approximation sequences
[Figure 4 panels: a, terms per digit versus compute degree, with hyperbolas labelled C = 0.56, C = 1 and C = 2.5; b, approximation exponent versus number of terms, with legend entries ‘New (Supplementary Table 6, row 4)’ and ‘New (Supplementary Table 6, row 5)’.]
Fig. 4 | Efficient computation of the Catalan constant with new PCFs. Comparison of computational metrics with previous results. a, For each formula computing the Catalan constant, the scatter plot shows the asymptotic number of terms required per digit of accuracy, relative to the computational effort (compute degree: defined as the smallest possible polynomial degree that can be used in the calculation, found after transforming the PCF into a matrix of balanced degrees). Green hyperbolas mark the relative efficiencies. Readers should search ‘Guillera (2019)’ within the page http://www.numberworld.org/y-cruncher/internals/formulas.html to see this result. b, The convergence of the effective irrationality exponent (lower bound on the Liouville–Roth irrationality measure, see Methods) as a function of the number of computed terms. The previous best result, first found in ref. 5, is presented in dark blue. A conjecture for a better value, from ref. 50, is presented by a horizontal orange line. The new PCF marked in green surpasses both previous values and yields the new best value for the Catalan constant’s approximation exponent. See Supplementary Information section G (specifically, Supplementary Tables 5 and 6).
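To make the quantity plotted in panel b concrete, the sketch below computes a finite-n estimate of the exponent of a rational sequence p_n/q_n approaching a value x, namely −log|x − p_n/q_n| / log(q_n / gcd(p_n, q_n)). It is a hedged illustration rather than the authors' code: it uses the golden-ratio continued fraction of equation (1) instead of a Catalan PCF, because its limit is available to high precision through an integer square root, and the names `convergents` and `exponent_estimate` and the 120-digit working precision are assumptions of this example.

```python
import math
from fractions import Fraction

def convergents(a, b, depth):
    """Yield (p_n, q_n) for n = 0..depth of a0 + b1/(a1 + b2/(a2 + ...))."""
    p_prev, q_prev = 1, 0
    p, q = a(0), 1
    yield p, q
    for n in range(1, depth + 1):
        p, p_prev = a(n) * p + b(n) * p_prev, p
        q, q_prev = a(n) * q + b(n) * q_prev, q
        yield p, q

def log_fraction(fr):
    # math.log accepts arbitrarily large Python integers.
    return math.log(fr.numerator) - math.log(fr.denominator)

def exponent_estimate(x, p, q):
    """Finite-n estimate of the exponent mu in |x - p/q| ~ (q/gcd(p, q))**(-mu)."""
    reduced_q = abs(q) // math.gcd(p, q)
    error = abs(x - Fraction(p, q))
    return -log_fraction(error) / math.log(reduced_q)

# Golden ratio to about 120 digits as an exact rational, via an integer square root.
SCALE = 10 ** 120
phi = Fraction(SCALE + math.isqrt(5 * SCALE * SCALE), 2 * SCALE)

estimates = [
    exponent_estimate(phi, p, q)
    for n, (p, q) in enumerate(convergents(lambda n: 1, lambda n: 1, 100))
    if n >= 10                      # skip early terms where q is still tiny
]
print(estimates[0], estimates[-1])
```

For this sequence the estimate decreases slowly towards 2 as more terms are used; plotting such estimates against the number of terms gives curves of the kind summarized in panel b.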
will eventually provide a higher approximation exponent that can lead to proving the irrationality of the Catalan constant.
The same approach can also be used with other constants. More generally, we expect further explorations of PCFs based on the Ramanujan Machine to lead to additional advances in Diophantine approximations and irrationality measures. For example, it could be intriguing to look for PCFs for values of the Riemann zeta function at odd integers, and specifically ζ(5) (ref. 52), because such PCFs may help prove their irrationality and provide more efficient ways of calculating ζ values.

Correspondence with the community
Following the appearance of the initial version of our work on arXiv in 2019 (ref. 53), numerous people ran our algorithms, some found new conjectures, and a few provided proofs for the new formulas. Over the span of a few months, proofs for all the original manuscript formulas were presented. This led us to expand our search with the MITM-RF algorithm and find more intriguing results such as PCFs for ζ(3), π², and Catalan’s constant, most of which are still unproved.
This back-and-forth dynamic between algorithms and mathematicians is the essence of what we believe can be achieved with automatically generated conjectures of fundamental constants. A recent example of this successful correspondence is the work of Zeilberger’s group54, generalizing and proving part of the conjectures that appeared in the earlier arXiv version of our work53 (Supplementary Information section F.3). An example from their paper is the elegant formula

$$1 + k + \cfrac{a \times 1}{2 + k + \cfrac{a \times 2}{3 + k + \dots}} = \frac{a^{a+k+1}}{(a+k)! \left( e^{a} - \sum_{s=0}^{a+k} \frac{a^{s}}{s!} \right)}, \qquad a > -k,\ a \in \mathbb{N},\ k \in \mathbb{Z}.$$

Their method combines the proof as an inherent part of the discovery and thus can be viewed as a successful case study of algorithms that combine automated conjecture generation and automated theorem proving (ATP).
A wide range of such identities is likely to be useful in future approaches for different math problems, especially in adjacent fields (for example, proving the irrationality of Riemann zeta function values20). More generally, automatically discovered formulas can assist further research efforts by enriching the modern ‘integral books’, which are software and computing environments such as Maple or Wolfram Mathematica. This process provides an elegant example of the symbiosis between computer-generated mathematics and human-generated mathematics.
Although our work focuses on PCFs, we think that it can be systematically extended to other spaces of candidate RF conjectures. We envision harvesting the scientific literature (for example, over 1.5 million papers on http://arXiv.org) to generalize known formulas and identify new RFs using machine learning algorithms such as clustering methods (see, for example, ref. 55). The scientific literature provides a strong ground truth for candidate RFs, and this method may discover mathematical conjectures that go far beyond PCFs.

Outlook on the universality of fundamental constants
Our work provides the groundwork for a more comprehensive study into fundamental constants and their underlying mathematical structure. Our proposed algorithms found PCFs for the constants π, e, Catalan’s constant and ζ(3). Table 1 presents a selection of additional fundamental constants of particular interest to our approach. For some of them, such as the Feigenbaum constants, no PCF (or any RF) is known.
Potentially the most interesting constants for further research are from fields like number theory (not so ironically, some of them are
Generalizations of the MITM-RF algorithm
We also generalized the algorithm to allow for α and β to be integer sequences generated by any countable parametric function. For example, α and β can be interlaced sequences, that is, they may consist of multiple (alternating) integer polynomials. In the case of just two interlaced sequences, the terms at odd values of n are given by one polynomial, and the terms at even values of n are given by a different polynomial.
Seeing how successful our algorithm was despite its relative simplicity, we believe there is still ample room for new results. By leveraging more sophisticated algorithms, other results will follow, thus discovering hidden truths about even more fundamental constants, perhaps with formulas that are more complex than the PCFs used in this work.

Stages of the Descent&Repel algorithm
We chose the optimization problem’s variables as the coefficients of the α, β, γ, δ polynomials in equation (7). The algorithm is initialized with a large set of points. In the specific examples we present, all initial conditions were set on a line, as shown in Fig. 3.
The algorithm then consists of three main stages: GD, ‘Repel’, and Lattice GD. We iterate between the first two stages and then perform the third stage once to converge to a possible solution.
(1) GD. We perform a standard GD separately for each point $x_i$, which is a d-dimensional vector. The loss function ℒ is defined in equation (7), and thus, for each point $x_i$, we define its next iteration t + 1 as $x_i^{(t+1)} = x_i^{(t)} - \mu \nabla \mathcal{L} |_{x_i^{(t)}}$, where μ is some small enough step size.
(2) ‘Repel’. We update the values of all the points so that they ‘push off’ one another via a Coulomb-like repulsion proportional to $1 / \| x_i - x_j \|^2$. Namely, we define the ‘repel’ iterations as

$$x_i^{(t+1)} = x_i^{(t)} + \nu \sum_{j \neq i} \frac{x_i^{(t)} - x_j^{(t)}}{\left\| x_i^{(t)} - x_j^{(t)} \right\|^{3}},$$

with another small step size ν that accounts for the strength of the repulsion. The ‘repel’ mechanism is used to increase the search space to more effectively cover the space of integer parameters and thus increase the probability of finding a match. We tune the repulsion strength heuristically.
(3) Lattice GD. We enforce the constraint of integer results by alternating the GD optimization between the original loss ℒ of equation (7) and a different loss function $\mathcal{L}_I$ that scales like the square of the

The irrationality measure of a constant and its lower bound
The irrationality measure of x, sometimes called the approximation exponent or the Liouville–Roth constant6, is defined as the largest μ = μ(x) for which there exists a sequence of rational numbers $p_n/q_n$ that satisfy $0 < |x - p_n/q_n| < q_n^{-\mu}$. For every x, μ(x) is always either exactly 1 when x is rational or ≥2 when x is irrational.
We can define the effective irrationality exponent of a sequence as the largest exponent (supremum) for which the sequence satisfies the inequality. Sequences of this kind are called Diophantine approximations6. Every PCF we find is such a sequence of rational numbers and it has an effective irrationality exponent μ′. Generally, each explicit Diophantine approximation sequence gives a μ′ that can be calculated by

$$\mu' = \liminf_{n \to +\infty} \frac{-\log\left( \left| x - p_n / q_n \right| \right)}{\log\left( q_n / \gcd(p_n, q_n) \right)},$$

where gcd indicates the greatest common divisor. Each μ′ provides a lower bound for the irrationality measure μ(x) of the value x to which the sequence converges.
However, finding an explicit sequence from which the value of μ(x) can be extracted is a challenge. This challenge motivated the search for such sequences for important fundamental constants, with the goal of extracting bounds on their value of μ(x). When a constant is not known to be irrational, the sequences all still have μ′ ≤ 1, as in the case of the Catalan constant. Then, finding an explicit sequence for which μ′ > 1 will directly prove irrationality. In principle, there must be a sequence with μ′ = 1 or μ′ ≥ 2. However, it is not trivial to find such a sequence explicitly51,56,57, and thus, it is of interest to try to find sequences $p_n/q_n$ for which μ′ is as large as possible.

An infinite family of PCFs with complex variables
Example outcomes of the mathematics–algorithm correspondence in our work are aesthetic generalizations that we found based on results of the Ramanujan Machine algorithms. One example is the following PCF with a complex variable:

$$\forall z \in \mathbb{C}: \quad 1 + \cfrac{1 \times (2 \times z - 1)}{4 + \cfrac{2 \times (2 \times z - 3)}{7 + \cfrac{3 \times (2 \times z - 5)}{10 + \cfrac{4 \times (2 \times z - 7)}{13 + \dots}}}} = \frac{2^{2 \times z + 1}}{\pi \binom{2 \times z}{z}}$$

This PCF was found as a conjecture by generalizing several automatically generated conjectures (specific integer values for z) produced by the MITM-RF algorithm. Like many other results involving π, it can
be proved using generalized hypergeometric functions. The proof is quite straightforward, provided one finds certain identities involving ratios of generalized hypergeometric functions, presented in Supplementary Information section F.2.1 along with other proofs and related information. It remains to be seen whether related methods would be able to prove the unproved conjectures in Supplementary Tables 1–3 of Supplementary Information section A. The above family of PCFs is brought here as an example of how automatically generated conjectures can be generalized to a wider conjecture and later a proof. We believe that this process could be used more widely with future results of the Ramanujan Machine, so that automatically generated conjectures on fundamental constants become a catalyst for mathematical research. For an extended discussion, see Supplementary Information section B.

Data availability
All the results of the Ramanujan Machine project are shared in the paper, with newer updates appearing periodically on the project website.

Code availability
Code is available at: http://www.ramanujanmachine.com/ and the GitHub links therein.

56. Zudilin, W. A third-order Apéry-like recursion for ζ(5). Mathematical Notes [Mat. Zametki] 72, 733–737 [796–800] (2002).
57. Rivoal, T. Rational approximations for values of derivatives of the Gamma function. Trans. Am. Math. Soc. 361, 6115–6149 (2009).

Acknowledgements We thank M. Soljačić, B. Weiss, D. Soudry and D. Carmon for helpful discussions. I.K. is grateful for the support of R. Magid and B. Magid and for the support of the Azrieli Faculty Fellowship. Y.M. acknowledges the support and guidance of the Israeli Alpha Program for Excellent High-School Students.

Author contributions G.R., G.P. and I.K. implemented the first proof-of-concept algorithms. G.R. implemented the first generation MITM-RF algorithm. S.G. and Y. Harris implemented the state-of-the-art MITM-RF algorithm. S.G. made the developments that led to the discovery of the ζ(3) and Catalan PCFs. Y.M. implemented the Descent&Repel algorithm. Y.M., S.G., U.M. and I.K. found how to convert the Catalan PCFs into expressions with record approximation exponents and fast convergence rates. U.M., Y.M., G.R., S.G., Y. Harris and I.K. proposed parts of the algorithms and developed proofs for some of the conjectures. D.H. and Y. Hadad developed the online community. Y. Hadad, G.P. and I.K. came up with the conceptual flow of the wider concept. I.K. conceived the idea and led the research. All authors provided substantial input to all aspects of the project and to the writing of the paper.

Competing interests The authors declare no competing interests.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41586-021-03229-4.
Correspondence and requests for materials should be addressed to I.K.
Peer review information Nature thanks Yang-Hui He, Doron Zeilberger and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | Convergence rates of the PCFs. The plots present the absolute difference between the PCF value and the corresponding fundamental constant (that is, the error) versus the number of terms calculated in the PCF. On the left are PCFs with exponential/super-exponential convergence rates, and on the right are PCFs that converge polynomially. The majority of previously known PCFs for π converge polynomially, whereas all of our newly found results converge exponentially.
Extended Data Table 1 | RFs for π and e found in a proof-of-concept run of the Descent&Repel algorithm