Racing towards GenAI “Supremacy”: Top tier players must disrupt for survival
By Philippe Ciampossin and Carlos Dangelo, PhD* (Jan 2024. All rights reserved)
* Editorial Disclosure:
Opinions expressed in this article are the authors' responsibility alone.
The authors' opinions might not reflect the positions of the companies or products cited in the article.
The authors are not affiliated with, and do not receive compensation from, the companies cited.
The current generation of GenAI technology is only about one year old. Yet it has already captured the world's imagination and driven rapid adoption, with billions in investment poured into the space, mostly on the strength of its perceived potential across numerous applications, all centered on its ability to distill information from the huge knowledge base it was trained on.
GenAI supremacy may also be a matter of long-term survival for top-tier companies that make their livelihood from direct connections to an enormous end-user customer base, one that today generates trillions of interactions through non-AI applications. Companies like Google, Apple, and Microsoft have the most to lose as natural language redefines how people interact with computers, from search to the man-machine interface itself.
Therein lies a source of disruption. Survival implies maintaining hard-to-replicate barriers to entry against new, disruptive competitors in the future market spaces created by GenAI. Tech titans cannot afford to have someone else sit between them and their user base; they must fight to keep full control of the interaction so the users they currently own are not directed away from the services they provide.
While highly promising, the technology is still very imperfect in terms of predictability and factual reliability; it needs to keep improving to realize its full potential in driving real business outcomes. New models like GPT-5 aim to improve reasoning and predictability to make AI more tangible, assuming matching progress in hardware to sustain the growing complexity of the software models.
Current GenAI workloads account for only around 10% of the $500+ billion public cloud market, according to Gartner, a market currently concentrated around Amazon, Microsoft, and Google. Since we are only in the early days of GenAI adoption, this share is expected to change significantly, assuming its potential can be turned into real business value woven into our daily lives.
Newly introduced purpose-built AI hardware, like NVIDIA's Grace Hopper architecture, is expected to force a rapid refresh of the existing graphics-derived, GPU-only installed base. These optimized systems make AI training possible in hours instead of days and will help scale inference beyond what can be served today, at a fraction of the power budget of current systems.
While most mainstream companies will sensibly opt to leverage third-party AI services and models via API integration, the full-stack development race (model, hardware, datacenter) seems destined to stay concentrated among consumer tech giants like Google, Apple, Microsoft, and Amazon, who alone can bear the massive costs.
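To give a sense of what API-level integration looks like in practice, here is a minimal sketch using the OpenAI Python SDK (v1.x) as one example of a hosted model provider; the model name, prompts, and environment setup are illustrative placeholders, not a recommendation of any specific vendor.

```python
# Minimal sketch of consuming a hosted model via API; uses the OpenAI Python
# SDK (v1.x) as one example. Model name and prompts are placeholders.
# Requires: pip install openai, and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # any chat model the provider exposes
    messages=[
        {"role": "system", "content": "You are a support assistant for our product."},
        {"role": "user", "content": "Summarize the open tickets for account 1234."},
    ],
)
print(response.choices[0].message.content)
```

The point of this pattern is that the heavy lifting (model, hardware, datacenter) stays with the provider; the integrating company only writes the glue code.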
So in this, our third article about GenAI, let's look at why we believe only a few top-tier companies will be leading the race for GenAI supremacy:
Continuous expansion of large multimodal models towards AGI implies that more resources will need to be injected
A fast hardware refresh cycle, driven by a new generation of integrated hardware designed for AI, is emerging to replace GPU-centric clusters
Power budgets will shortly exceed the energy production of entire countries and may require rethinking how to scale clean energy production
The continuous expansion of large multimodal models towards AGI implies that more resources will need to be injected
70B parameters... 145B parameters... trillions of parameters: more modalities and a global push towards AGI mean that cost-effective compute will be needed in very large quantities. So despite the hardware advances, most of the gains are expected to be offset by power-hungry mega models - but to what extent?
AGI loosely refers to a "generalized GenAI model" that supposedly learns most, if not all, of the necessary and sufficient domain knowledge to generate interactive inferences in future human-AI dialogues across modalities like text, images, speech, and more.
AGI models will likely require data, parameters, and computing at a vastly larger scale (10x) than today’s models. OpenAI (OAI) sees a path to develop an AGI model. Today's models suffer from issues like hallucination and unpredictability that AGI models would need to solve - for example, safely recommending Tylenol dosages for infants is just not possible today.
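To make the resource argument concrete, here is a back-of-envelope sketch using the commonly cited approximation that training compute is roughly 6 x parameters x training tokens; the GPT-3-scale baseline numbers are illustrative assumptions, and the 10x factors reflect the rough scale-up discussed above.

```python
# Back-of-envelope only: training compute is commonly approximated as
# C ~ 6 * N * D, with N = parameters and D = training tokens.
def training_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

# Illustrative baseline (roughly GPT-3 scale: 175B params, ~300B tokens).
today = training_flops(175e9, 300e9)
# The rough 10x scale-up applied to both parameters and data.
agi_guess = training_flops(10 * 175e9, 10 * 300e9)

print(f"today's scale:   {today:.1e} FLOPs")      # ~3.2e23
print(f"10x params+data: {agi_guess:.1e} FLOPs")  # ~3.2e25, i.e. ~100x more compute
```

In other words, a 10x jump in both parameters and data multiplies the compute bill by roughly 100x, which is why the hardware and power questions below matter so much.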
Many companies now commit R&D resources to address these model issues on top of alignment challenges to prevent AI from going rogue. Based on public information, OAI has a plausible technical path to AGI success absent restricting events like regulations.
Because AGI asymptotically amalgamates comprehensive knowledge across domains, it may become the ultimate single model good enough for most company needs. Lower-tier firms lacking resources for full model stacks would desire such a cost-effective AGI solution suited to their domain.
OAI seems to be the only major company actively pursuing an expansive AGI model for both end-users and diverse enterprises. Microsoft was first to invest, adopting OAI's models across Windows, Office, Bing, and more. Other adopters span sectors like finance (Morgan Stanley) and news (Reuters).
While it is unclear what is happening in secrecy at companies like Apple, OAI appears far ahead, with the focus, talent, critical mass, and learning velocity to inspire paying enterprise customers. Google seems to perpetually trail despite its resources, perhaps indicating innovator stagnation.
For business-critical models, the race currently favors OAI and Microsoft. As for the rest, Fortune 1000 companies will pour funding into bringing these fundamental building blocks to their own user bases, but they are unlikely to be part of the race to become the dominant GenAI provider. These firms will license closed models or leverage open-source options as basic building blocks.
A new generation of integrated hardware designed for AI is emerging to replace GPU-centric clusters
So far, the AI data center story has largely been driven by NVIDIA's GPUs, which hold a quasi-monopoly in public perception, if not in market share.
While invented for graphics, GPUs proved well suited to AI computations, though less versatile than CPUs as generic computing building blocks. This leads to non-uniform AI systems, forcing substantial data movement between cluster types to complete workflows - and creating significant idle cycles as datasets cannot be fed fast enough across the hardware.
As an example, training a Llama 2 70B model (no longer cutting-edge) requires ~6,000 GPUs for 12 days, at a cost of ~$2M, to process ~10TB of data per run - and that is before powering millions of inference sessions.
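A quick sanity check on those figures; the per-GPU-hour price below is our assumption, since cloud rates vary widely.

```python
# Rough arithmetic behind the ~$2M estimate above; the hourly rate is an
# assumed cloud price, not a quoted figure.
gpus, days = 6_000, 12
gpu_hours = gpus * days * 24              # ~1.73 million GPU-hours
cost_per_gpu_hour = 1.2                   # assumption (USD per GPU-hour)
cost = gpu_hours * cost_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours -> ~${cost / 1e6:.1f}M")
# 1,728,000 GPU-hours -> ~$2.1M, consistent with the ~$2M figure
```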
To improve efficiency and lower costs, discrete AI GPU clusters are now morphing into more integrated architectures supporting expanded workloads. For now, NVIDIA remains the sole advanced supplier of integrated GPU/CPU/memory clusters:
Their newest "AI supercomputer" packs 256 combined GPU/CPU superchips into one unit with 1 exaFLOP of capacity (10^18 ops/sec) and 144TB of unified memory (the NVIDIA GH200 cluster).
For reference, GPT-3's 175B-parameter model took an estimated 3.1x10^23 ops over months of training. Inference is cheaper - just 7.4x10^14 ops per token.
If correct, Nvidia's new clusters could train a GPT-3-scale model in just ~100 hours when deployed. These systems seem robust for years of AI workloads.
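The ~100-hour figure follows from simple division; the utilization factor below is our assumption, since sustained throughput is always below the advertised peak.

```python
# Rough check of the ~100-hour training-time claim for a GPT-3-scale model.
total_ops = 3.1e23            # estimated GPT-3 training compute (see above)
cluster_ops_per_s = 1e18      # 1 exaFLOP peak for the 256-chip cluster
utilization = 0.85            # assumption; real sustained efficiency varies

hours = total_ops / (cluster_ops_per_s * utilization) / 3600
print(f"~{hours:.0f} hours")  # ~101 hours at 85% utilization, ~86 at peak
```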
So while NVIDIA captures most of the mindshare, AI progress in general does not seem constrained by hardware design, which is rapidly advancing - many players are driving hardware forward, and none of it requires exotic architectures like quantum computing.
Today's system designs should suffice for 3-5 years of projected workloads, on both the training and inference sides, as long as they are available in sufficient quantity - which could be a challenge in itself.
For now, only top-tier players have access to systems such as the NVIDIA Grace Hopper clusters, cementing their edge and their control of the agenda, while others will most likely have to implement AI on previous-generation hardware or pay premium rental fees for the latest and greatest.
Power budgets will exceed the energy production of entire countries and may require rethinking how to scale clean energy production
Power is often assumed to be infinite in the cloud, as we consume it indirectly through computing, storage, and services without considering how it is produced or how clean each watt is.
The truth is we have offloaded the problem to our data center providers, but with AI poised to surge through billions of interactions woven into daily life, consumption will soon scale massively. Newer generations of workers and individuals may struggle to function without AI, similar to the dependency we have created on our smartphones.
As Sam Altman said at the 2024 Davos conference, realizing AI's potential requires an energy breakthrough down the road. With fossil fuels unsuitable for powering humanity's next stage, nuclear power may need a comeback - whether fusion or fission.
Cloud providers specialize in rapid expansion, but were perhaps caught off guard by the sudden, desperate race to integrate AI capabilities and by the immense power needed to fuel this revolution.
So after model complexity and securing the newer-generation hardware in sufficient quantity, the supremacy battle may come down to the scalability of energy production, leaving top-tier providers best positioned to capitalize.
Several articles like this one showcase the scope of AI's growing appetite: https://www.sciencedaily.com/releases/2023/10/231010133607.htm
Some examples of the estimated numbers surfacing in that article (a quick scale check follows below):
Hugging Face's model training consumed ~433 megawatt-hours, enough to power 40 US homes for a year.
Estimates suggest ChatGPT alone could consume 564 MWh daily.
By 2027, AI electricity use may rise to 85-134 terawatt-hours yearly - rivaling the annual consumption of countries such as the Netherlands, Argentina, and Sweden.
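To put those estimates side by side, here is a quick scale check; the average-household and country-level consumption baselines are our assumptions based on commonly cited public figures, used only for order-of-magnitude comparison.

```python
# Scale check on the cited figures; household and country baselines are
# assumed averages, used only for order-of-magnitude comparison.
training_mwh = 433                       # one training run (Hugging Face example)
us_home_mwh_per_year = 10.8              # assumed average US household usage
print(f"training run ~= {training_mwh / us_home_mwh_per_year:.0f} US homes for a year")

chatgpt_mwh_per_day = 564
print(f"ChatGPT estimate ~= {chatgpt_mwh_per_day * 365 / 1e6:.2f} TWh per year")

ai_2027_twh_low, ai_2027_twh_high = 85, 134
netherlands_twh = 110                    # assumed annual electricity use, Netherlands
print(f"2027 projection: {ai_2027_twh_low}-{ai_2027_twh_high} TWh "
      f"vs ~{netherlands_twh} TWh for the Netherlands")
```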
Some perspective and conclusion
The race for AI supremacy is drawing enormous, focused investment from extremely well-funded leaders who view it as a prime business imperative and, more or less, a survival issue, as the interaction between humans and computers is being disrupted forever.
While giants like Google, Facebook, and Apple are making great progress, the public stage seems to be held by OpenAI, NVIDIA, and Azure. Despite their might and access to talent, Google, Meta, and others appear to be in catch-up mode, now flexing their budgets and realigning their resources to refocus on GenAI and, by extension, AGI.
For most Fortune 1000 companies (or even smaller ones), it will be enough to consume services from top-tier providers, leveraging closed or open foundation models to improve customer intimacy and interactions within their specific domain.
On the system side, fully integrated AI architectures combining shared memory, GPUs, and CPUs into uniform clusters will likely keep pushing compute boundaries while cutting costs. We would prefer more entrants here to prevent an outright monopoly, but smaller players like SambaNova would need much wider adoption to even start competing with NVIDIA's footprint. As for AMD and Intel, they currently seem more focused on discrete chips than on fully integrated architectures, at least for now.
What about quantum computing? Its use cases seem disconnected from AI, making one wonder whether funding shifts towards integrated AI hardware could relegate QC to a niche that never reaches mass production.
For enthusiasts like us, these are fascinating times, but tracking the torrent of announcements to stay current is nearly a full-time job, and we sometimes wonder whether what we wrote still stands by the time we finish writing an article 🙂