Discover more from Posts in the Shell by Demren Sinik
Introducing the Multi-Layer
Why OpenAI wants to eat your lunch
Thanks for reading Posts in the Shell by Demren Sinik! Subscribe for free to receive new posts and support my work.
The AI ecosystem is often described in “layers” – the Application Layer (e.g., Codium), the Tooling Layer (e.g., LangChain), the Hardware Layer (e.g., Nvidia), and the Model Layer (e.g., OpenAI).
The primary reason the layer framework became popular – aside from giving VCs an excuse to create market maps – is that most AI companies fit neatly in a single layer. After all, modern AI companies are nascent and usually good at only one thing. But talk to any founder and you will realize that most AI companies eventually want to do more than just one thing. In other words, the future of AI is “multi-layer.”
Companies that occupy the model-layer are the most likely to go multi-layer – and the least likely to admit it. Furthermore, this transition will be driven not by choice but by necessity. Unfortunately, too many founders still see the model-layer as an enabler rather than a competitor, a mistake that has led (and will continue to lead) to suboptimal outcomes for AI startups.
Foundational Model or Business Model?
It’s important to understand first why model-layer companies are highly incentivized to go multi-layer:
Open-source alternatives are numerous and robust: Open-source alternatives to state-of-the-art closed source models are increasingly common and comparable in terms of accuracy, context window length, use cases, etc. Not to mention the myriad open-source small-scale models that exist for more specific tasks (and can be superior in those niches to their more generalized, large-scale cousins).
Foundational models are expensive with unclear ROI: The cost to create cutting-edge LLMs is significant (e.g., the cost to train GPT-4 was reportedly ~$100M), so much so that model-layer companies are searching for ways to improve the next generation of models beyond scaling known techniques. For an investment of this scale to make sense, the revenue generated from these models needs to outweigh the costs to create them. Today, that cost is high, but the period during which a model remains cutting edge is surprisingly short. Assuming longevity correlates with revenue, model-layer companies will need to generate significant revenues in short periods of time or meaningfully drive down costs to justify the economics.
Perpetual demand for higher performance is not guaranteed: I’ve had hundreds of conversations with users of AI models across multiple industries. Regardless of use case or modality, many buyers have a threshold of performance that they care about (I like to call this the “Threshold of Commercial Viability”). For example, a call center with thousands of hours of audio data wanted to turn that audio into text but was indifferent among solutions that provided >90% speech-to-text accuracy (90% accuracy is shockingly low!). Once such a threshold is reached, customers only care about price, which is the definition of a commodity market. Commoditization will further accelerate when open-source alternatives also reach the Threshold of Commercial Viability.
Sky-high valuations will force expansion: Many model-layer companies have unicorn valuations with minimal (if any) revenue to show for it. Due to their ongoing capital requirements and margin profiles, they are unlikely to receive long-term valuation multiples in excess of public SaaS companies. Consequently, model-layer companies face pressure to grow their revenue and margins and will seek new opportunities in adjacent layers of the stack.
Going multi-layer makes simple operational sense: Better AI applications can be built with better foundational models (especially if those models are inaccessible to other applications). Those applications can then become defensible through other means, such as via network effects, customer service, or bundling. Alternatively, AI tools can make models sticky and easier to use. In other words, even if the model-layer primarily focuses on models, it still makes sense to expand to other layers to support that effort.
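The “Threshold of Commercial Viability” argument above can be sketched as a simple buying rule: any model that clears the buyer's accuracy bar is interchangeable, so the cheapest one wins. The snippet below is only an illustration. The vendor names, accuracies, and prices are made-up assumptions, not data from the call center example; only the 90% bar comes from it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offer:
    name: str              # hypothetical vendor or model label
    accuracy: float        # speech-to-text accuracy as a fraction (assumed)
    price_per_hour: float  # price per hour of audio in USD (assumed)


def pick_offer(offers: list[Offer], threshold: float) -> Optional[Offer]:
    """Return the cheapest offer whose accuracy exceeds the threshold.

    Once an offer clears the threshold, extra accuracy is ignored,
    which is what makes the market behave like a commodity market.
    """
    viable = [o for o in offers if o.accuracy > threshold]
    if not viable:
        return None
    return min(viable, key=lambda o: o.price_per_hour)


# Illustrative numbers only.
offers = [
    Offer("frontier_closed_model", 0.97, 1.20),
    Offer("mid_tier_closed_model", 0.93, 0.60),
    Offer("open_source_self_hosted", 0.91, 0.25),
    Offer("small_legacy_model", 0.85, 0.10),
]

choice = pick_offer(offers, threshold=0.90)
print(choice.name if choice else "no viable offer")  # open_source_self_hosted
```

Under these assumed numbers the most accurate model loses to the cheapest model that clears the bar, which is the dynamic that pushes the model-layer to look for revenue elsewhere in the stack.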
The Multi-Layer Today
To what extent has the multi-layer transition already started? Below, I’ve mapped out the product suites of the major model-layer companies, occasionally providing color on their recent strategic directions and decisions.
OpenAI is the most interesting example of a model-layer company going multi-layer, primarily because its public positioning (to this day) puts it at odds with this transition. OpenAI was founded as a non-profit organization whose “mission is to ensure that AGI benefits all of humanity.” Even as the company has shifted from non-profit to for-profit, it continues to position itself as an enabling layer of the AI ecosystem, providing powerful research models that companies and people can use and build on top of. OpenAI even has an investment fund dedicated to backing early-stage founders.
The release of ChatGPT precipitated the first major shift within OpenAI towards the multi-layer. No one expected ChatGPT to succeed in the way that it did – perhaps not even OpenAI. Since its release, OpenAI has poured significant resources into the product, releasing Plugins, ChatGPT for Enterprise, and most recently multi-modality.
Microsoft’s investment in OpenAI was the company’s second major shift towards the multi-layer. Today, OpenAI’s finances and resources are closely intertwined with Microsoft’s, and it comes as no surprise that the two companies have collaborated in releasing a slew of applications.
Ironically, OpenAI’s applications have brought it into direct competition with the ecosystem it brands itself as enabling. The first iteration of ChatGPT, for example, was directly competitive with (and fairly destructive to) Jasper, a major early corporate partner. And this is just one example of many. Sadly, it is common within VC circles to discuss the number of startups that will likely fail with each new OpenAI application release.
Unlike OpenAI/Microsoft, the connection between Google DeepMind and Google has been obvious from day one (hint: they’re literally the same company). It is then no surprise that Google has released many applications powered by the foundational models created by its research teams. While Google DeepMind’s multi-layer strategy is more obvious than OpenAI’s, there are a few nuanced items worth noting:
First, Google DeepMind was initially perceived as slow-moving, and it was believed to be stalling the release of its technology due to (1) reputational risks and (2) potential cannibalization of core revenue streams such as search. However, Google has recently taken a more aggressive approach to model and application releases – signaling a competitive showdown with OpenAI and others.
Second, Google has started to build LLM-enabled applications beyond its existing core application suite. Vertex AI (ML workflow platform) and SynthID (image watermark tool) are two examples where Google has built new AI applications/tools vs. upgrading existing Google products. If this trend continues, entrepreneurs may struggle to identify categories with limited competitive risk from the model-layer.
Third, after merging DeepMind and Brain, Google has been internally pushing its researchers to spend more time on commercial research areas vs. non-commercial research areas – a shift that emphasizes Google’s commitment to becoming a power player in the multi-layer.
Anthropic has consistently portrayed itself as an application-first AI company with its own models. Relative to OpenAI and Google DeepMind, Anthropic directly markets Claude for specific end markets such as customer service, legal, coaching, search, back-office, and sales. Amazon’s recent multi-billion-dollar investment in Anthropic is remarkably similar to Microsoft’s investment in OpenAI, and I expect the relationship to evolve in a similar way, further accelerating Anthropic’s application suite.
Less needs to be said about Cohere, Adept, and AI21 Labs – their model and application suites, while smaller, are similar to those of their larger peers.
Multi-Layer Thoughts and Predictions
Model-layer companies are well into the multi-layer transition: The model-layer is not content to sell access to its models. Instead, there is an accelerating shift towards other layers in the stack. Most model-layer companies are open about their application aspirations, though others (e.g., OpenAI) demonstrate inconsistencies between their messaging and actions. So far, this has yet to elicit pushback from the community.
Resources + time correlate with application density: The model-layer companies that have gone the most multi-layer are those with the most funding, research resources, time (i.e., were founded earlier), and ties with big tech. It is safe to assume application density will continue to increase as time goes on (assuming capital continues to fund these businesses).
Model-layer companies have significant application parity: Despite having limited product suites, there is a surprising amount of application parity in the model-layer (e.g., the chatbot/knowledge assistant is nearly ubiquitous). This is likely driven by (1) nascent and limited customer demand, (2) unproven and untested use cases, and (3) limited moats among this cohort.
Prediction #1 – Every application that logically fits into the Big Tech product suite will be built by the model layer: If you’re a founder, don’t just ask yourself whether a model-layer company will compete with you, ask yourself whether your product deserves to exist within the context of the Google, Microsoft, or Amazon product suites. These are the most obvious applications that will be built in the next 1-2 years.
Prediction #2 – Model-layer companies will build more products in the tooling layer: Many view the tooling-layer as a place where model-agnostic newcomer solutions will thrive. Model-layer companies, however, may view the tooling-layer as a means to improve user experience. Currently, Google is the main player innovating in this space. I believe all model-layer companies will build their own tools down the line. It’s also possible those tools may be model agnostic.
For the sake of playing devil’s advocate, it’s also worth noting a few potential obstacles to the multi-layer:
How valuable is model performance? Earlier, I described the concept of the “threshold of commercial viability,” which I argued would eventually limit demand (and thus monetization potential) for more powerful models. There are a few ways in which this could play out differently:
If it turns out the threshold of commercial viability is years away for most industries, the model-layer may not need to go multi-layer for a long time, allowing newcomers to scale in the meantime.
The value of AGI remains very unclear. If AGI proves reachable and valuable, the model-layer may allocate fewer resources to applications and tools. The applications and tools they build may also be easy to compete with.
The “threshold of commercial viability” may be irrelevant for certain industries. Some industries may follow the trajectory of the call center example I provided. But we can imagine other categories (e.g., cybersecurity, finance, etc.) where incremental performance gains, no matter how small or hard won, provide value into perpetuity.
Will technical achievements rewrite the cost equation? High costs are a critical consideration for model ROI and thus a potential driver for the multi-layer. If more performant models are no longer gatekept by high-cost requirements, model-layer companies may no longer be incentivized to go multi-layer.
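The cost equation can be made concrete with a back-of-the-envelope payback calculation. This is a rough sketch under simplifying assumptions, not a financial model: it ignores inference costs, salaries, and discounting, and the function names and the revenue, margin, and time-window figures are hypothetical. Only the ~$100M training cost echoes the figure cited earlier.

```python
from typing import Optional


def required_monthly_gross_profit(training_cost: float, frontier_months: float) -> float:
    """Gross profit per month needed to recoup training cost
    while the model is still considered cutting edge."""
    if frontier_months <= 0:
        raise ValueError("frontier_months must be positive")
    return training_cost / frontier_months


def breakeven_months(training_cost: float, monthly_revenue: float,
                     gross_margin: float) -> Optional[float]:
    """Months of revenue needed to recoup training cost, or None if never."""
    monthly_profit = monthly_revenue * gross_margin
    if monthly_profit <= 0:
        return None
    return training_cost / monthly_profit


# Assumed inputs: $100M training cost, a 10-month window at the frontier.
today = required_monthly_gross_profit(100e6, frontier_months=10)

# Assumed scenario: a technical breakthrough cuts training cost tenfold.
cheaper = required_monthly_gross_profit(10e6, frontier_months=10)

print(f"required today:   ${today:,.0f} per month")
print(f"after 10x cheaper: ${cheaper:,.0f} per month")
print(breakeven_months(100e6, monthly_revenue=20e6, gross_margin=0.5))
```

If training costs fall by an order of magnitude, the profit a model must earn during its short time at the frontier falls with it, which weakens the economic pressure to expand into applications and tools.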
Why the Multi-Layer Matters
People currently perceive a distinction between the model-layer and companies building applications and tools around it. Because of the lack of differentiation among model builders and the economic pressures they face, I believe this distinction will disappear over time, and I would argue we’re already witnessing that shift.
I generally believe people draw too many parallels between the future of LLMs and the history of the cloud. However, in this case, I’ll allow a brief comparison. The cloud providers, in their early days, provided storage and compute. Today, they offer hundreds of products and managed services around storage and compute. This isn’t to say that startups haven’t succeeded in any of these categories or that the cloud providers are clear market leaders in everything they offer. But it tells a story of rapid horizontal and vertical expansion that will likely repeat itself within artificial intelligence.
I previously wrote that incumbents are well-positioned to own the AI landscape. Founders need to forget the model-layer and think in terms of the multi-layer. Find the platforms that deserve to exist outside the multi-layer or that have a reasonable strategy to compete with the multi-layer. Leverage your enablers but treat them as competitors.