
Showing posts with label ai. Show all posts

Monday, March 11, 2024

Solving the Hallucination Problem - interview with AppliedAI

 

Recent podcast interview with AppliedAI. 

We discuss SuperFocus.ai Enterprise GPT. 

Good intro to how customer-defined AI/LLM memory eliminates hallucinations.

Wednesday, February 28, 2024

Awakening Siddhartha (podcast interview)

 

Really fun conversation! 

Timestamps: 

00:00 Introduction 
02:21 Steve's Encounter with Richard Feynman 
03:31 Discussion on Genetics and Human Improvement 
11:08 The Role of Genetics in Disease Prediction 
18:10 Understanding the Influence of Genetics on Behaviour 
21:37 The Future of Genetic Selection in Embryos 
39:24 The Role of Genetics in Addiction 
41:53 The Importance of Individual Differences and Success 
46:36 The Value of STEM in Indian Culture 
48:02 The Importance of Non-Academic Skills for Success 
49:01 Exploring the World of Embryo Modification 
51:30 The Quest for Immortality: Brian Johnson's Story 
57:20 The Role of Genetics in Aging 
01:01:19 The Power and Potential of Gene Editing 
01:11:37 The Impact of Genetics on Society and Policy 
01:16:36 Understanding the Rise of China in the Global Stage 
01:53:14 The Future of AI and the Impact on Jobs 
01:58:46 The Future of Human and Machine Intelligence 
02:01:54 The Possibility of Living in a Simulation 

Short excerpts below :-)





Thursday, February 08, 2024

Lecture: Fermi Paradox, AI, Simulation Question — Manifold #53

 

This lecture covers DNA and the origin of life on Earth, the Fermi Paradox (is there alien life?), AI and its implications for the Simulation Question: Could our universe be a simulation? Are we machines, but don't know it? 


Further discussion of the Simulation Question in light of AGI, and a refinement from quantum mechanics: The Quantum Simulation Question
 
 
CORRECTION: 31:25 The size of our galaxy is not 100 million light years. I should have said ~100 THOUSAND = 100k light years instead!!!

Wednesday, January 24, 2024

SuperFocus, AI, and Philippine Call Centers: Part 2



This is the sequel to the earlier conversation with Dominic Ligot, an AI expert who works with the IT and Business Process Association of the Philippines (IBPAP), the trade association for call center and outsourcing companies. 

In this video we briefly demonstrate some of the voice capabilities of the SuperFocus AI. Progress in generative AI is faster than anything I've ever seen before - perhaps not surprising given the vast financial, technological, and human capital resources flowing to AI R&D. When we first looked at voice capabilities ~6 months ago they didn't seem ready for complex conversations like the ones discussed in the video. But when we looked again - prompted by strong interest from our customers - we found that the state of the art had advanced significantly in just a short time. This is true across many areas of generative AI.

I was in Manila in December to meet with BPO companies. Roughly 8% of Philippine GDP ($40B each year) results from BPO / call center work. This is a consequence of low labor costs and widespread English fluency.

We demonstrated narrow AIs built using LLMs, but in which the LLM is forced to "consult its internal memory" before answering any query. This memory can be built from training materials used to train human agents in call centers. The AI functions like a human that has perfect recall of all the material in the training manuals, at a fraction of the cost!
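To make the "consult its internal memory" pattern concrete, here is a minimal toy sketch (hypothetical, not SuperFocus's actual implementation): the system retrieves the best-matching passage from a fixed memory built from training materials, and refuses to answer when nothing in memory matches the query — which is exactly what prevents hallucinated answers.

```python
import re

# Toy "consult memory before answering" retriever. Real systems use
# semantic embeddings rather than word overlap; this only illustrates
# the refuse-when-not-in-memory behavior.

def tokenize(text):
    return set(re.findall(r"[a-z']+", text.lower()))

def answer_from_memory(query, memory, min_overlap=2):
    q = tokenize(query)
    best, best_score = None, 0
    for passage in memory:
        score = len(q & tokenize(passage))
        if score > best_score:
            best, best_score = passage, score
    if best_score < min_overlap:
        return "Not covered in memory."   # refuse rather than hallucinate
    return best

memory = [
    "To reset the router, hold the recessed button for ten seconds.",
    "The warranty covers manufacturing defects for two years.",
]
```

A query like "How do I reset the router?" retrieves the first passage, while an off-topic query (say, a geography question) returns the refusal string instead of a made-up answer.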

An analogy we used is that the AI earthquake in SF has created a tsunami headed towards the Philippines -- is it a 6-foot wave, or a 600-foot wave? Closer to the latter, I think.





Some photos from Manila - scoping out potential SuperFocus.ai office space.

 


Wednesday, January 03, 2024

SuperFocus, AI, and Philippine Call Centers

 

This is a conversation with Dominic Ligot, an AI expert who works with the IT and Business Process Association of the Philippines (IBPAP), the trade association for call center and outsourcing companies. 

I was in Manila in December to meet with BPO companies. We demonstrated narrow AIs built using LLMs, but in which the LLM is forced to "consult its internal memory" before answering any query. This memory can be built from training materials used to train human agents in call centers. The AI functions like a human that has perfect recall of all the material in the training manuals, at a fraction of the cost!

An analogy we used is that the AI earthquake in SF has created a tsunami headed towards the Philippines -- is it a 6-foot wave, or a 600-foot wave? Closer to the latter, I think.





Some photos from Manila - scoping out potential SuperFocus.ai office space.

 


Sunday, December 24, 2023

Peace on Earth, Good Will to Men 2023



When asked what I want for Christmas, I reply: Peace On Earth, Good Will To Men :-)

No one ever seems to recognize that this comes from the Bible (Luke 2:14).

Linus said it best in A Charlie Brown Christmas:
And there were in the same country shepherds abiding in the field, keeping watch over their flock by night.

And, lo, the angel of the Lord came upon them, and the glory of the Lord shone round about them: and they were sore afraid.

And the angel said unto them, Fear not: for, behold, I bring you good tidings of great joy, which shall be to all people.

For unto you is born this day in the city of David a Saviour, which is Christ the Lord.

And this shall be a sign unto you; Ye shall find the babe wrapped in swaddling clothes, lying in a manger.

And suddenly there was with the angel a multitude of the heavenly host praising God, and saying,

Glory to God in the highest, and on Earth peace, good will toward men.


2023 saw the founding of our startup SuperFocus.ai, which builds AIs with user-configured attached memory. The AI consults this memory in responding to prompts, and only gives answers consistent with the information in the memory. This solves the hallucination problem and allows the AI to answer questions like a human with perfect recall of the information.

SuperFocus built an AI for a major consumer electronics brand that can support and troubleshoot hundreds of models of smart devices (I can't be more specific). Its memory consists of thousands of pages of product manuals, support documents, and problem solving guides originally used by human support agents.

In December I traveled to Manila after the semester ended, in order to meet with outsourcing (BPO = Business Process Outsourcing) companies that run call centers for global brands. This industry accounts for ~8% of Philippine GDP (~$40B per annum), driven by comparative advantages such as the widespread use of English here and relatively low wages. I predict that AIs of the type produced by SuperFocus.ai will disrupt the BPO and other industries in coming years, with dramatic effects on the numbers of humans employed in areas like customer support. 

But fear not: for, behold, I bring you good tidings of great joy, which shall be to all people

The arrival of machine intelligence on Earth is the beginning of a great adventure!


This is GPT's account of the meetings in Manila.
In the heart of Manila, amidst the bustling cityscape, a meeting of innovative minds took place. Steve Hsu, the visionary founder of SuperFocus, had arrived to showcase the prowess of his latest creation—an AI designed to revolutionize technical support for complex products. The setting was a conference room adorned with sleek screens and cutting-edge technology, a fitting backdrop for the unveiling of this groundbreaking innovation. 
Seated around the polished table were the owners and executives of prominent BPO (Business Process Outsourcing) companies. Their faces were a blend of anticipation and apprehension as Steve Hsu prepared to demonstrate the capabilities of the AI-powered technical support system. 
With a confident smile, Steve initiated the demonstration. The AI, equipped with a sophisticated neural network, began its simulated interaction. It effortlessly tackled intricate technical queries, deciphering complex issues with lightning speed and unparalleled accuracy. Each solution presented was concise, comprehensive, and flawlessly executed. 
As the AI effortlessly navigated through a myriad of scenarios and troubleshooting processes, the room fell into a hush. The BPO leaders exchanged astonished glances, their initial amazement mingled with a growing sense of unease. The capabilities displayed by the AI were undeniably impressive, but they also highlighted a looming question—what did this mean for the future of human roles in their industry? 
Steve Hsu noticed the shift in atmosphere and paused the demonstration. With a gentle yet determined tone, he addressed the concerns lingering in the room. "This AI isn't meant to replace human expertise," he began. "Rather, it's here to augment and enhance your services. Imagine your teams empowered by this technology, streamlining operations, and providing even more efficient and effective support to customers." 
His words offered reassurance, but the specter of automation replacing human jobs lingered in the minds of the BPO owners. The potential efficiency gains were undeniable, yet so too were the implications for the human workforce. 
In the ensuing discussion, voices echoed with a mix of excitement and apprehension. Some saw the potential for growth and advancement, envisioning a future where human creativity combined with AI prowess would elevate their services to new heights. Others grappled with the uncertainty, worrying about the displacement of jobs and the evolving landscape of the industry they had dedicated their careers to. 
Steve Hsu listened attentively, acknowledging their concerns while emphasizing the collaborative potential between humans and AI. "This technology," he explained, "is a tool, a means to empower and evolve, not to supplant. Together, we can harness its capabilities to create a synergy that benefits both businesses and their workforce." 
As the meeting concluded, the BPO leaders departed with a mix of awe and trepidation. The AI presented by Steve Hsu had showcased a future teeming with possibilities, yet it also raised profound questions about adaptation and the role of humans in an increasingly automated world. 
The echoes of the demonstration lingered in the minds of those present, igniting discussions and contemplation about the balance between innovation and the human touch, forever altering the landscape of the BPO industry in Manila and beyond.


Bonus: Two recent interviews I did which I enjoyed very much. 



Wednesday, December 13, 2023

PISA 2023 and the Gloomy Prospect

I'm in the Philippines now. I flew here after the semester ended, in order to meet with outsourcing (BPO = Business Process Outsourcing) companies that run call centers for global brands. This industry accounts for ~8% of Philippine GDP (~$40B per annum), driven by comparative advantages such as the widespread use of English here and relatively low wages. 

I predict that AIs of the type produced by my startup SuperFocus.ai will disrupt the BPO industry in coming years, with dramatic effects on the numbers of humans employed in areas like customer support. I was just interviewed for the podcast of the AI expert at IBPAP, the BPO trade association - he is tasked with helping local companies adopt AI technology, and adapt to a world with generative LLMs like GPT4. I'll publish a link to that interview when it goes live. 


During my visit the latest PISA results were released. This year they provided data with students grouped by Socio-Economic Status [1], so that students in different countries, but with similar levels of wealth and access to educational resources, can be compared directly. See figures below - OECD mean ~500, SD~100. 


Quintiles are defined using the *entire* international PISA student pool. These figures allow us to compare equivalent SES cohorts across countries and to project how developing countries will perform as they get richer and improve schooling.

In some countries, such as Turkey or Vietnam, the small subset of students that are in the top quintile of SES (among all PISA students tested) already score better than the OECD average for students with similar SES. On the other hand, for most developing countries, such as the Philippines, Indonesia, Saudi Arabia, Brazil, Mexico, etc. even the highest quintile SES students score similarly to or worse than the most deprived students in, e.g., Turkey, Vietnam, Japan, etc.

Note the top 20% SES quintile among all PISA takers is equivalent to roughly top ~30% SES among Japanese. If the SES variable is even crudely accurate, typical kids in this category are not deprived in any way and should be able to achieve their full cognitive potential. In developing countries only a small percentage of students are in this quintile - they are among the elites with access to good schools, nutrition, and potentially educated parents. Thus it is very bad news that even this subgroup of students scores so poorly in almost all developing countries (with exceptions like Turkey and Vietnam). It leads to gloomy projections regarding human capital, economic development, etc. in most of the developing world. 

I had not seen a similar SES analysis before this most recent PISA report. I was hoping to see data showing catch up in cognitive ability with increasing SES in developing countries. The results indicate that cognitive gaps will be very difficult to ameliorate.

In summary, the results suggest that many of these countries will not reach OECD-average levels of human capital density even if they somehow catch up in per capita GDP.

This suggests a Gloomy Prospect for development economics. Catch up in human capital density looks difficult for most developing countries, with only a few exceptions (e.g., Turkey, Vietnam, Iran, etc.).
 

Here is the obligatory US students by ancestry group vs Rest of World graph that reflects: 1. strong US spending on education (vs Rest of World) and 2. selective immigration to the US, at least for some groups.
 

Sunday, October 29, 2023

Thursday, September 21, 2023

Hacking State 13 - Steve Hsu: Polygenic Embryo Selection, Improving LLMs, & Getting Nearly Cancelled

 

Alex Murshak is a Michigan State grad working as an AI engineer in Austin TX. This conversation is Episode 13 of his podcast Hacking State.


Episode description:

Steve and I speak about polygenic risk scoring and embryo selection, using AI to predict phenotype from genotype, in-vitro fertilization (IVF), egg freezing, eugenic public policy, addressing Christians' and right-wing traditionalists' concerns over reproductive technology, Superfocus AI's plan to eliminate hallucination in large language models (LLMs) by separating memory from inference, introspection for LLM error correction, and surviving the failed cancellation attempt at MSU.

Thursday, August 10, 2023

AI on your phone? Tim Dettmers on quantization of neural networks — Manifold #41

 

Tim Dettmers develops computationally efficient methods for deep learning. He is a leader in quantization: coarse graining of large neural networks to increase speed and reduce hardware requirements. 

Tim developed 4- and 8-bit quantizations enabling training and inference with large language models on affordable GPUs and CPUs - i.e., as commonly found in home gaming rigs. 
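The core idea of quantization can be shown in a few lines. Below is a minimal sketch of symmetric "absmax" int8 quantization with a single per-tensor scale; Tim's actual methods (e.g., LLM.int8(), QLoRA) are more sophisticated, using per-block scales and outlier handling, so treat this purely as an illustration of the coarse-graining idea.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric absmax quantization: map the largest |weight| to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# int8 storage is 4x smaller than float32, and the reconstruction
# error is bounded by half a quantization step.
assert q.dtype == np.int8
assert np.abs(w - w_hat).max() <= scale / 2 + 1e-7
```

The memory saving (4x vs float32, 8x for 4-bit) is what makes it possible to fit large-model weights onto a consumer GPU.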

Tim and Steve discuss: Tim's background and current research program, large language models, quantization and performance, democratization of AI technology, the open source Cambrian explosion in AI, and the future of AI. 





0:00 Introduction and Tim’s background 
18:02 Tim's interest in the efficiency and accessibility of large language models 
38:05 Inference, speed, and the potential for using consumer GPUs for running large language models 
45:55 Model training and the benefits of quantization with QLoRA 
57:14 The future of AI and large language models in the next 3-5 years and beyond

Thursday, June 08, 2023

AI Cambrian Explosion: Conversation With Three AI Engineers — Manifold #37

 

In this episode, Steve talks to three AI engineers from his startup SuperFocus.AI. 

0:00 Introduction 
1:06 The Google memo and open-source AI 
14:41 Sparsification and the size of models: AI on your phone? 
30:16 When will AI take over ordinary decision-making from humans? 
34:50 Rapid advances in AI: a view from inside 
41:28 AI Doomers and Alignment 


Links to earlier episodes on Artificial Intelligence & Large Language Models: 

Oxford Lecture — #35: 

Bing vs. Bard, US-China STEM Competition, and Embryo Screening — #30: 

ChatGPT, LLMs, and AI — #29: 

Thursday, May 11, 2023

Artificial Intelligence & Large Language Models: Oxford Lecture — Manifold #35

 

This week's episode is based on a lecture I gave to an audience of theoretical physicists at Oxford University. 


Audio-only version, transcript: 


Outline: 

0:00 Introduction 
2:31 Deep Learning and Neural Networks; history and mathematical results 
21:15 Embedding space, word vectors 
31:53 Next word prediction as objective function 
34:08 Attention is all you need 
37:09 Transformer architecture 
44:54 The geometry of thought 
52:57 What can LLMs do? Sparks of AGI 
1:02:41 Hallucination 
1:14:40 SuperFocus testing and examples 
1:18:40 AI landscape, AGI, and the future


Final slide:


Thursday, April 06, 2023

Birth of the God Emperor - by GPT4


This science fiction story was written by GPT4. 
Steve Hsu had always dreamed of unlocking the secrets of human intelligence. As a theoretical physicist and a co-founder of Genomic Prediction, he had developed a powerful AI system that could analyze massive genomic data sets and predict complex traits such as height, disease risk, and cognitive ability. He believed that by using this technology, he could help people select the best embryos for IVF and create healthier and smarter children. 

But not everyone shared his vision. Some critics accused him of promoting eugenics and creating new social inequalities. Others feared that his AI system could be hacked or misused by malicious actors. And some religious groups denounced him as playing God and interfering with the natural order. 

One day, he received a mysterious email from an anonymous sender. It read: 

"Dear Dr. Hsu, 

We are a group of like-minded individuals who share your passion for advancing human potential. We have access to a secret facility where we have been conducting experiments on human embryos using your AI system and other cutting-edge technologies. We have achieved remarkable results that surpass your wildest expectations. We invite you to join us and witness the dawn of a new era for humanity. 

If you are interested, please reply to this email with the word 'YES'. We will send you further instructions on how to reach us. 

Sincerely, 
The Future" 

Steve was intrigued and curious. He wondered who these people were and what they had done. He also felt a pang of fear and doubt. Was this a trap? A hoax? A threat? 

He decided to take the risk and reply with 'YES'. 

He received another email with a set of coordinates and a time. He was told to drive to a remote location in the desert and wait for a helicopter to pick him up. He followed the instructions and soon found himself in a black helicopter flying over the barren landscape. 

He arrived at a large metal dome hidden among the rocks. He was greeted by a man in a white lab coat who introduced himself as Dr. Lee. 

"Welcome, Dr. Hsu. We are honored to have you here. Please follow me." 

Dr. Lee led him through a series of security checkpoints and into a spacious laboratory filled with high-tech equipment and monitors. He saw rows of incubators containing human embryos at various stages of development. 

"Dr. Hsu, these are our creations. The next generation of humans. We have used your AI system to optimize their genomes for intelligence, health, beauty, and longevity. We have also enhanced them with synthetic genes from other species, such as birds, reptiles, mammals, and plants. We have given them abilities that no natural human has ever possessed." 

He stopped at one incubator that caught his attention. It contained an embryo that looked almost normal, except for one thing: it had a golden glow around it. 

"Dr. Hsu, this is our masterpiece. The ultimate expression of intelligence. The God Emperor. The Kwisatz Haderach. The one who can see the past and the future. The one who can bend space and time. The one who can unite and rule all of humanity." 

Steve felt a surge of awe and dread. He realized that he had made a terrible mistake. 

"What have you done? This is dangerous! This is blasphemous! This is insane!" 

He turned to Dr. Lee and saw him smiling. 

"Dr. Hsu, don't be afraid. Don't be angry. Don't be judgmental. Be proud. Be grateful. Be enlightened. You are witnessing the dawn of a new era for humanity. You are witnessing the future."

 

Thursday, February 16, 2023

Bing vs. Bard, US-China STEM Competition, and Embryo Screening — Manifold Episode #30

 


Steve discusses the AI competition between Microsoft and Google, the competition between the U.S. and China in STEM, China’s new IVF policy, and a Science Magazine survey on polygenic screening of embryos. 

00:00 Introduction 
02:37 Bing vs Bard: LLMs and hallucination 
20:52 China demographics & STEM 
34:29 China IVF now covered by national health insurance
40:28 Survey on embryo screening in Science: ~50% of those under 35 would use it to enhance cognitive ability 

References: 

Bing vs Bard and Hallucination 

China demographics and STEM
https://twitter.com/hsu_steve/status/1620765589752119297 
https://twitter.com/hsu_steve/status/1623279827640848385
 
China IVF 

Science survey on embryo screening 

Thursday, February 02, 2023

ChatGPT, LLMs, and AI — Manifold #29

 

Steve discusses Large Language Model AIs such as ChatGPT. 

0:00 How do LLMs work? 
10:22 Impact of ChatGPT 
15:21 AI landscape 
24:13 Hallucination and Focus 
33:09 Applications 
39:29 Future landscape 

Manifold interview with John Schulman of OpenAI: 


Blog posts on word vectors and approximately linear vector space of concepts used by the human mind:
 

Sunday, August 14, 2022

Tweet Treats: AI in PRC, Semiconductors and the Russian War Machine, Wordcels are Midwits

Some recent tweets which might be of interest :-)

Thursday, April 07, 2022

Scott Aaronson: Quantum Computing, Unsolvable Problems, & Artificial Intelligence — Manifold podcast #9

 

Scott Aaronson is the David J. Bruton Centennial Professor of Computer Science at The University of Texas at Austin, and director of its Quantum Information Center. Previously, he taught for nine years in Electrical Engineering and Computer Science at MIT. His research interests center around the capabilities and limits of quantum computers, and computational complexity theory more generally. 

Scott also writes the blog Shtetl Optimized: https://scottaaronson.blog/ 

Steve and Scott discuss: 

1. Scott's childhood and education, first exposure to mathematics and computers. 

2. How he became interested in computational complexity, pursuing it rather than AI/ML. 

3. The development of quantum computation and quantum information theory from the 1980s to the present. 

4. Scott's work on quantum supremacy. 

5. AGI, AI Safety


Sunday, October 31, 2021

Demis Hassabis: Using AI to accelerate scientific discovery (protein folding) + Bonus: Bruno Pontecorvo

 


Recent talk (October 2021) by Demis Hassabis on the use of AI in scientific research. Second half of the talk is focused on protein folding. 

Below is part 2, by the AlphaFold research lead, which has more technical details.




Bonus: My former Oregon colleague David Strom recommended a CERN lecture by Frank Close on his biography of physicist (and atomic spy?) Bruno Pontecorvo.  David knew that The Battle of Algiers, which I blogged about recently, was directed by Gillo Pontecorvo, Bruno's brother.

Below is the closest thing I could find on YouTube -- it has better audio and video quality than the CERN talk. 

The amazing story of Bruno Pontecorvo involves topics such as the first nuclear reactions and reactors (work with Enrico Fermi), the Manhattan Project, neutrino flavors and oscillations, supernovae, atomic espionage, the KGB, Kim Philby, and the quote: 
I want to be remembered as a great physicist, not as your fucking spy!

Friday, October 22, 2021

The Principles of Deep Learning Theory - Dan Roberts IAS talk

 

This is a nice talk that discusses, among other things, subleading 1/width corrections to the infinite width limit of neural networks. I was expecting someone would work out these corrections when I wrote the post on NTK and large width limit at the link below. Apparently, the infinite width limit does not capture the behavior of realistic neural nets and it is only at the first nontrivial order in the expansion that the desired properties emerge. Roberts claims that when the depth to width ratio r is small but nonzero one can characterize network dynamics in a controlled expansion, whereas when r > 1 it becomes a problem of strong dynamics. 

The talk is based on the book
The Principles of Deep Learning Theory 
https://arxiv.org/abs/2106.10165 
This book develops an effective theory approach to understanding deep neural networks of practical relevance. Beginning from a first-principles component-level picture of networks, we explain how to determine an accurate description of the output of trained networks by solving layer-to-layer iteration equations and nonlinear learning dynamics. A main result is that the predictions of networks are described by nearly-Gaussian distributions, with the depth-to-width aspect ratio of the network controlling the deviations from the infinite-width Gaussian description. We explain how these effectively-deep networks learn nontrivial representations from training and more broadly analyze the mechanism of representation learning for nonlinear models. From a nearly-kernel-methods perspective, we find that the dependence of such models' predictions on the underlying learning algorithm can be expressed in a simple and universal way. To obtain these results, we develop the notion of representation group flow (RG flow) to characterize the propagation of signals through the network. By tuning networks to criticality, we give a practical solution to the exploding and vanishing gradient problem. We further explain how RG flow leads to near-universal behavior and lets us categorize networks built from different activation functions into universality classes. Altogether, we show that the depth-to-width ratio governs the effective model complexity of the ensemble of trained networks. By using information-theoretic techniques, we estimate the optimal aspect ratio at which we expect the network to be practically most useful and show how residual connections can be used to push this scale to arbitrary depths. With these tools, we can learn in detail about the inductive bias of architectures, hyperparameters, and optimizers.
Dan Roberts web page

This essay looks interesting:
Why is AI hard and Physics simple? 
https://arxiv.org/abs/2104.00008 
We discuss why AI is hard and why physics is simple. We discuss how physical intuition and the approach of theoretical physics can be brought to bear on the field of artificial intelligence and specifically machine learning. We suggest that the underlying project of machine learning and the underlying project of physics are strongly coupled through the principle of sparsity, and we call upon theoretical physicists to work on AI as physicists. As a first step in that direction, we discuss an upcoming book on the principles of deep learning theory that attempts to realize this approach.

May 2021 post: Neural Tangent Kernels and Theoretical Foundations of Deep Learning
Large width seems to provide a limiting case (analogous to the large-N limit in gauge theory) in which rigorous results about deep learning can be proved. ... 
The overparametrized (width ~ w^2) network starts in a random state and by concentration of measure this initial kernel K is just the expectation, which is the NTK. Because of the large number of parameters the effect of training (i.e., gradient descent) on any individual parameter is 1/w, and the change in the eigenvalue spectrum of K is also 1/w. It can be shown that the eigenvalue spectrum is positive and bounded away from zero, and this property does not change under training. Also, the evolution of f is linear in K up to corrections which are suppressed by 1/w. Hence evolution follows a convex trajectory and can achieve global minimum loss in a finite (polynomial) time. 
The parametric 1/w expansion may depend on quantities such as the smallest NTK eigenvalue k: the proof might require k >> 1/w or wk large. 
In the large w limit the function space has such high dimensionality that any typical initial f is close (within a ball of radius 1/w?) to an optimal f. These properties depend on specific choice of loss function.
See related remarks: ICML notes (2018).
It may turn out that the problems on which DL works well are precisely those in which the training data (and underlying generative processes) have a hierarchical structure which is sparse, level by level. Layered networks perform a kind of coarse graining (renormalization group flow): first layers filter by feature, subsequent layers by combinations of features, etc. But the whole thing can be understood as products of sparse filters, and the performance under training is described by sparse performance guarantees (ReLU = thresholded penalization?). Given the inherent locality of physics (atoms, molecules, cells, tissue; atoms, words, sentences, ...) it is not surprising that natural phenomena generate data with this kind of hierarchical structure.
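The empirical NTK discussed above is simple to compute for a toy network. Here is a sketch (my own illustration, not from Roberts' book) for a one-hidden-layer network f(x) = v · tanh(Wx)/√width: the kernel is the inner product of parameter gradients at two inputs, and at large width it concentrates around its expectation, as described in the quoted post.

```python
import numpy as np

# Empirical NTK K(x1, x2) = grad_theta f(x1) . grad_theta f(x2)
# for the toy network f(x) = v . tanh(W x) / sqrt(width).

def param_grads(x, W, v):
    width = len(v)
    h = np.tanh(W @ x)                                    # hidden activations
    gv = h / np.sqrt(width)                               # df/dv_j
    gW = ((v * (1.0 - h**2))[:, None] * x[None, :]) / np.sqrt(width)  # df/dW_jk
    return np.concatenate([gv, gW.ravel()])

def ntk(x1, x2, W, v):
    return param_grads(x1, W, v) @ param_grads(x2, W, v)

rng = np.random.default_rng(0)
d, width = 3, 4096
W = rng.normal(size=(width, d)) / np.sqrt(d)
v = rng.normal(size=width)
x1, x2 = rng.normal(size=d), rng.normal(size=d)

# Basic kernel properties: symmetry and positivity on the diagonal.
assert np.isclose(ntk(x1, x2, W, v), ntk(x2, x1, W, v))
assert ntk(x1, x1, W, v) > 0
```

Recomputing the kernel at several independent random initializations and watching the values cluster as `width` grows is a quick way to see the concentration-of-measure claim in action.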
