Microsoft’s $100 Billion Scientific Gamble
Microsoft’s Stargate project with OpenAI is premised on a potentially costly misunderstanding of how science works.
By Vincent J. Carchidi
Supercomputers and Scaling Laws: The Hope of Emergent Abilities
The Information reported in late March that Microsoft is planning to build a supercomputer in coordination with OpenAI to the tune of up to one hundred billion dollars. The supercomputer, called “Stargate,” would be the fifth in a series of planned supercomputers, with Stargate constituting an assembly of millions of GPUs dedicated to artificial intelligence (AI). The news follows reporting that Sam Altman is engaging with Abu Dhabi-based AI investment firm MGX as part of an effort to raise up to seven trillion dollars for a chip-building venture that would reduce dependency on Nvidia.
If you keep up with AI, then you have likely already read commentary or formulated your own opinions on these reports. The always-curmudgeonly Gary Marcus wrote that a “$100B [Large Language Model] is still an LLM…Reasoning and planning in unexpected scenarios still likely to be flawed.” Moreover, even with improvements in these respects, justifying the price tag is nigh impossible, given the increased costs of operation and the lack of a unique ability that would compel consumers to pay what’s required to generate a return on investment. Curmudgeonly indeed, but likely correct.
Nevertheless, I want to comment here on what’s driving all this: namely, as Chomba Bupe puts it, that Microsoft and OpenAI “are betting on ‘scale is all you need’, the scaling law & hoping intelligence would emerge from going extremely big…Without the right architecture & learning strategies, scaling alone, is a fools errand.”
In the background here is the belief that, as AI models are scaled up in their internal capacity (e.g., parameter count), in the size of their training datasets, or in the amount of compute they require (or all three), they exhibit emergent abilities: “abilities that are not present in smaller-scale models but are present in large-scale models; thus they cannot be predicted by simply extrapolating the performance improvements on smaller-scale models.” All sorts of abilities have been claimed, amid plenty of controversy, to be emergent. Some researchers argue, to the contrary, that emergent abilities are a “mirage,” wholly dependent on the metrics used to assess the performance of LLMs.
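To make the “mirage” argument concrete, consider a minimal Python sketch. The numbers below are invented purely for illustration (they are not drawn from Wei et al. or Schaeffer et al.): per-token accuracy is assumed to improve smoothly with scale, and the only thing that changes between the two metrics is how that same underlying improvement is scored.

```python
import numpy as np

# Hypothetical setup: assume per-token accuracy improves smoothly and
# gradually as models are scaled up. All values are invented for illustration.
per_token_acc = np.linspace(0.60, 0.99, num=9)   # smooth, continuous metric
seq_len = 20                                      # task needs 20 correct tokens in a row

# Discontinuous metric: exact-match accuracy requires *every* token to be
# correct, so small per-token gains compound into an apparently sudden jump.
exact_match = per_token_acc ** seq_len

for step, (tok, em) in enumerate(zip(per_token_acc, exact_match), start=1):
    print(f"scale step {step}: per-token accuracy {tok:.2f} -> exact match {em:.4f}")
```

On numbers like these, the continuous per-token metric improves gradually at every step, while exact-match accuracy sits near zero for most of the range and then climbs steeply. Nothing about the underlying capability is discontinuous; the apparent “emergence” is an artifact of the metric, which is the crux of the mirage critique.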
Bupe seems to be implicitly referencing this idea, arguing that Microsoft and OpenAI are essentially hoping that emergent abilities are not only “real” (in the sense that they cannot be predicted from current state-of-the-art capabilities), but that they will also lead to a generally intelligent system.
There is something deeper at play here bearing on the relationship between data and theory in science.
The Cliché View of Science
I do not want to make a firm claim on emergent abilities one way or another. Instead, I highlight a mistake in scientific reasoning that may end up costing Microsoft $100 billion without the realistic possibility of recouping the investment. Specifically, much of today’s AI research neglects the relationship between data and theory that the natural sciences take for granted, a relationship quite different from its commonsense form.
Consider the following oh-so-cliché view of scientific theory construction: a scientist collects as much data as she can about a phenomenon. Then, through hypothesizing and experimentation, she and her colleagues build up a collection of statements that serve to explain the relationships between said data — we’ll call this a “theory.” Confidence in the theory grows with repeated experimentation affirming its core tenets. Now comes the important part: when new data that appear to contradict the theory are collected, our hypothetical scientist must either revise one or more of the theory’s fundamental tenets or throw the theory away and build another in its place. Theory rises and falls, on this view, according to the data.
I do not want to press this point too much, but something like this thought process is operative in AI when it comes to scaling and emergent behaviors. The claim that machine learning models will suddenly exhibit emergent abilities when scaled up (along some dimension) effectively implies that machine learning was previously misunderstood in some way, or not understood well enough, and that these new data prove our old ideas about its potential wrong.
(Also note that emergent abilities are not quite what Rich Sutton appeared to have in mind in “The Bitter Lesson,” which emphasized “the great power of general purpose methods” and “scaling computation by search and learning” over an approach favoring built-in knowledge that eventually “plateaus and even inhibits further progress.”)
You might be thinking to yourself, if (some) machine learning researchers are acting like the hypothetical scientist, then what’s the problem?
I’ll let the late theoretical physicist Steven Weinberg explain:
One often reads in popular histories of science that “So and so’s data showed clearly that this and that were false, but no one at the time was willing to believe him.” Again, this impression that scientists wantonly reject uncomfortable data is based on a misapprehension as to the way scientific research is carried on.
Weinberg continues:
The fact is that a scientist in any active field of research is continually bombarded with new data, much of which eventually turns out to be either misleading or just plain wrong…When a new datum appears which contradicts our expectations, the likelihood of its being correct and relevant must be measured against the total mass of previously successful theory which might have to be abandoned if it were accepted.
To put this as simply as possible: new data are secondary to established understanding. If data contradict established understanding, then the theory in question may well continue to be used in research with the expectation that it simply does not — indeed, cannot — account for the full breadth of data, and perhaps even most data. Some data are more important than others for deep understanding, and it is never immediately clear which data these will turn out to be.
The False Hope of Stargate (?)
Unfortunately, much of the research and commercial enthusiasm for generative AI is underpinned by the cliché view of science, rather than the science actually practiced in the most successful disciplines, like physics. Examples are not hard to find, and, while I again do not wish to cast too wide a net, the idea of models exhibiting emergent abilities appears to seriously contradict what was previously understood about the behavior of machine learning models, without much reflection on how the new data should be weighed against the old understanding.
There is no slam dunk here, mind you. Much of AI amounts to engineering that implicitly adopts some theoretical content but consciously aims to find out what works. What works for a given task is often quite a different matter from why it works and what this portends for the future of a research paradigm (and why something does not work is just as important). And, to be sure, details may yet emerge showing that Stargate is a different beast from what early commentators, myself included, are expecting.
Yet Microsoft is investing a cool $100 billion in a supercomputer, an investment that does not appear contingent on any re-evaluation of our understanding of machine learning, or of AI generally; the company is not letting the generative AI dust settle, so to speak. More than this, Microsoft is effectively betting on the cliché view of science being correct in the domain of generative AI. If we are talking about investing real money (and $100 billion is real money) into an approach premised, in part, on the cliché view of science, then a step back and a look in the mirror may be in order.
References:
[1] Adam Lucente. (2024). “What to Know about MGX, UAE’s Latest AI Investment Firm.” Al-Monitor.
[2] Anissa Gardizy and Amir Efrati. (2024). “Microsoft and OpenAI Plot $100 Billion Stargate Supercomputer.” The Information.
[3] Chloe Cornish and Madhumita Murgia. (2024). “Abu Dhabi In Talks to Invest in OpenAI Chip Venture.” Financial Times.
[4] Gary Marcus. (2024). “The Second Worst $100B Investment in The History of AI?” Marcus on AI.
[5] Jason Wei, et al. (2022). “Emergent Abilities of Large Language Models.” arXiv.
[6] Rich Sutton. (2019). “The Bitter Lesson.” Incomplete Ideas.
[7] Rylan Schaeffer, Brando Miranda, and Sanmi Koyejo. (2023). “Are Emergent Abilities of Large Language Models a Mirage?” arXiv.
[8] Steven Weinberg. (1974). “Reflections of a Working Scientist.” Daedalus.