The Great AI Pricing Paradox: What Happens When the Free Lunch Ends?

Since the launch of ChatGPT, everyone (myself included) has been excited about the potential of AI to transform the way that we work and live.

Nearly every post on most networks these days talks about how work is being accelerated by the various LLM platforms, whether it be OpenAI’s GPT, X’s Grok, Anthropic’s Claude, Google’s Gemini, Meta’s Llama… The enthusiasm is infectious, and rightfully so—these tools are genuinely transformational.

We also read countless stories about the billions upon billions being invested by these companies in R&D, infrastructure and compute to develop and train these models, not to mention the inference compute they require. In 2024, over 50% of all global VC funding went to AI startups, totalling $131.5 billion, marking a 52% year-over-year increase. These aren’t small bets—they’re existential wagers on the future of technology.

In many of the early interviews with the CEOs of these companies, questions about ROI were often batted away with a “we don’t really know how we are going to return on this investment, but if we don’t make it we risk being left behind and becoming irrelevant.” It is an existential threat that they are all reacting to by throwing ever-increasing sums of money at the problem, to ensure they have a chance of maintaining their dominance of the commercial tech landscape. According to industry analysts, the current investment climate isn’t sustainable, with investors needing to look for AI opportunities elsewhere.

The Subsidised Paradise We’re Living In

I’ve been a bit worried about this paradigm recently. Everyone is getting used to using all of these LLM platforms for free, or very cheaply, and this use is being subsidised by the large tech giants, and by the VCs backing them, in this high-stakes game to dominate the fourth industrial revolution.

What’s particularly fascinating is that whilst we’re enjoying this subsidised period, the cost of LLM inference is actually decreasing by 10x every year for equivalent performance. This rapid cost reduction might seem like great news, but it’s creating an interesting dynamic where users are becoming accustomed to increasingly powerful capabilities at artificially low prices.

The scale of this subsidisation becomes clearer when you consider the energy implications. Processing 500,000 input and output tokens can cost $7.50 with smaller models, but the energy consumption—and associated costs—scale dramatically with model size and complexity. Yet most users are experiencing these capabilities at a fraction of this true cost.
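
To put that in perspective, here’s a quick back-of-envelope calculation in Python. It simply turns the example figures above into a per-million-token rate and a rough monthly bill for a heavy user; the usage assumption is mine, purely for illustration.

```python
# Back-of-envelope: turn a headline token price into a monthly estimate.
# Figures are illustrative, taken from the example above; real pricing varies
# by provider, model, and input/output split.

tokens_processed = 500_000        # input + output tokens in the example
cost_for_batch = 7.50             # USD for that batch on a smaller model

cost_per_million = cost_for_batch / tokens_processed * 1_000_000
print(f"Implied rate: ${cost_per_million:.2f} per million tokens")  # ~$15.00

# Assumption: a heavy user pushing 2M tokens a day, 22 working days a month.
daily_tokens = 2_000_000
monthly_cost = daily_tokens / 1_000_000 * cost_per_million * 22
print(f"Rough monthly bill for one heavy user: ${monthly_cost:,.0f}")  # ~$660
```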

I’ve been wondering recently, how are organisations planning their use of AI within these constraints? The situation feels eerily familiar to what we experienced during the great cloud migration of the 2010s.

Back then, companies rushed to adopt cloud services like AWS, Azure, and Google Cloud Platform because the pricing was incredibly attractive. Amazon, Microsoft, and Google were locked in fierce competition for market share, essentially subsidising adoption through aggressive pricing strategies. CIOs and IT directors couldn’t resist the pitch: “Why maintain expensive on-premises infrastructure when you can rent compute power for pennies on the dollar?”

The early adopters seemed like geniuses. They were reducing capital expenditure, gaining scalability, and appearing more agile than their competitors still wrestling with physical servers. Cloud providers sweetened the deal with generous free tiers, migration incentives, and pricing models that seemed too good to be true.

But here’s the thing—they were too good to be true, at least in the long term. Once companies had migrated their core systems, trained their teams on cloud-native architectures, and restructured their entire IT operations around these platforms, something predictable happened: prices started climbing. What began as a race to the bottom in pricing became a comfortable oligopoly with steadily increasing costs.

More recently, there have been grumblings of how expensive all the cloud services have become, with some highly publicised moves back to in-house infrastructure again at much improved return on investment. Companies like 37signals (Basecamp) made headlines by moving away from cloud services, claiming to save millions annually. They discovered that once you factor in the true cost of cloud services—including bandwidth, storage, compute, and the premium for managed services—running your own infrastructure could be significantly cheaper, especially at scale.

The Enterprise Reality Check

I’ve been looking at the paid versions of some of these platforms, and the pricing reality is quite sobering once you move beyond the basic consumer tiers. Let’s break down what serious usage actually costs:

OpenAI’s pricing structure shows this escalation clearly. ChatGPT Pro costs $200 monthly and includes unlimited access to their smartest model, OpenAI o1, as well as o1-mini, GPT-4o, and Advanced Voice. But that’s for individual power users. For businesses, ChatGPT Team starts at around $25-30 per user per month, while ChatGPT Enterprise pricing isn’t publicly disclosed but requires direct sales contact—always a red flag for pricing that’s likely to make your CFO wince.

Anthropic’s Claude follows a similar pattern. Claude Pro is $20 per month for individuals, but Claude Enterprise costs more than their Team plan (which is $30 per month, per member), though Anthropic refuses to disclose the exact Enterprise pricing. When companies won’t publish their enterprise prices, it’s usually because those prices are negotiated based on usage patterns that can quickly spiral into five or six-figure monthly bills.

X’s Grok is particularly aggressive at $300 per month for their premium tier—and that’s just the beginning of enterprise pricing tiers that can quickly escalate into thousands of dollars monthly for serious usage.

Google’s Gemini and Microsoft’s Copilot follow similar tiered structures, with consumer-friendly entry points that mask the reality of enterprise-level costs.

What’s particularly concerning is that these pricing structures are designed around the current subsidised model. The “Pro” and “Enterprise” tiers aren’t necessarily profitable—they’re often just less subsidised than the free tiers. When the investment climate shifts and these companies need to achieve actual profitability, we could see dramatic pricing adjustments across all tiers.

For context, enterprise customers using these tools heavily can easily rack up usage costs equivalent to employing additional staff members. A company with 100 employees using enterprise-tier AI tools could be looking at $5,000-15,000 monthly, and that’s before considering API usage costs for integrated applications, which operate on entirely different (and often more expensive) pricing models based on token consumption.
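
To make that 100-employee figure concrete, here’s a rough sketch of how the numbers can stack up. Every input (seat price, adoption rate, API volume and rate) is an illustrative assumption, not a quote from any vendor.

```python
# Rough enterprise AI cost model for a 100-person company.
# All inputs are illustrative assumptions, not vendor quotes.

employees = 100
seat_price_per_month = 60            # assumed blended enterprise seat price (USD)
adoption_rate = 0.8                  # share of staff actually licensed

api_tokens_per_month = 200_000_000   # assumed tokens consumed by integrated apps
api_price_per_million = 15.0         # assumed blended API rate (USD per 1M tokens)

seat_cost = employees * adoption_rate * seat_price_per_month
api_cost = api_tokens_per_month / 1_000_000 * api_price_per_million

print(f"Seat licences: ${seat_cost:,.0f}/month")                 # $4,800
print(f"API usage:     ${api_cost:,.0f}/month")                  # $3,000
print(f"Total:         ${seat_cost + api_cost:,.0f}/month")      # $7,800
```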

One of the pitfalls of SaaS software for businesses is that you go through a big change management process to introduce it to your organisation, and then you become a victim of the platform’s pricing strategies. Rolling out one or more of these LLMs might be cheap today, but what happens if costs increase 5-10x or more once the landscape shakes out and these platforms are looking to make a profit?

This concern is particularly valid given that the average ROI on enterprise-wide AI initiatives is sitting at just 5.9%, even as spending on AI is set to hit $200 billion in 2024. Companies are making significant investments in AI integration, but the returns aren’t guaranteed—and that’s before considering potential price increases.

The dependency risk is real. Once you’ve restructured workflows, retrained staff, and integrated these tools into core business processes, switching costs become prohibitive. Platform providers understand this lock-in effect perfectly well.

The Consumer Conundrum

And what of the consumer side of things? We all love free services. At the moment, as the competing platforms chase adoption, we are getting a sweetheart deal. I couldn’t help but laugh at the thought of some VC burning his money when my son discovered ChatGPT’s image generation features, and immediately started creating poo emojis.

This behaviour highlights something important: consumers are already exhibiting the casual, high-usage patterns that come with perceived “free” services. My son’s emoji-generating spree represents thousands of inference calls that each carry real computational costs—costs currently absorbed by investors rather than users.

I wonder whether people realise that these services cannot stay free, or cheap, forever. Might these platforms be setting themselves, and us, up for another one of those periods where free or cheap becomes the expectation?

This has happened a few times in the past, notably when news organisations gave away content to protect market share as news moved online, or when retailers offered unsustainable free shipping and returns policies. We’ve been left with a vastly degraded experience in both of those domains, and I wonder whether we might be setting ourselves up for another version of this.

The news industry’s experience is particularly instructive. By training consumers to expect free content, news organisations created a race to the bottom that ultimately undermined quality journalism and sustainable business models. Are we witnessing a similar dynamic with AI services?

The Hardware Efficiency Hope

I read recently about the efficiency improvements in the newest NVIDIA chips, and how they have driven costs down to roughly a tenth of what they used to be for the same work. I sincerely hope NVIDIA can keep this efficiency curve going, because it is the only way we stand any chance of maintaining some level of cheap-ish LLM services.

Recent research shows that whilst LLM inference costs have been declining by around 10x annually on average, there have been some remarkable outliers where specific price reduction trends reached as high as 900x per year after January 2024. These dramatic improvements appear to be driven by a combination of factors: significant hardware efficiency gains, algorithmic optimisations, and increased competition between providers.

However, it’s important to note that these extreme cost reductions often apply to specific use cases or model configurations rather than across-the-board pricing. The 900x figure likely represents the best-case scenario for particular workloads or the most aggressive pricing moves by providers seeking market share. More realistic expectations might centre around the broader trend of 10x annual improvements, which is still remarkable by any historical technology standard.

This suggests that hardware efficiency gains are indeed playing a crucial role in cost reduction, but the sustainability of these improvements—especially the more dramatic ones—remains questionable as competition dynamics evolve.

However, I still don’t understand how the large tech companies will return anything on the initial investments they made in the prior generations of GPUs that they will need to replace to gain this efficiency. The sunk cost of existing infrastructure creates an interesting economic tension—companies have billions invested in current-generation hardware that may need to be written off to capture efficiency gains from newer chips.

The sustainability equation is complex. While newer chips are more efficient per operation, the total energy consumption continues to grow as usage scales exponentially. The energy consumption of LLMs varies significantly across different stages of application, with inference costs scaling directly with usage patterns.
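
A tiny illustration of that tension: even if the cost per unit of work falls 10x a year, total spend (and energy) can still climb if usage grows faster. The growth rate below is an assumption for illustration only, not a forecast.

```python
# Illustration: per-unit efficiency gains vs. total consumption.
# Assumes cost per unit of work falls 10x per year while usage grows
# 20x per year -- both numbers are illustrative, not forecasts.

cost_per_unit = 1.0     # arbitrary starting cost per inference "unit"
units_consumed = 1.0    # arbitrary starting usage

for year in range(1, 4):
    cost_per_unit /= 10       # 10x annual efficiency improvement
    units_consumed *= 20      # usage growing faster than efficiency
    total = cost_per_unit * units_consumed
    print(f"Year {year}: unit cost x{cost_per_unit:.3f}, "
          f"usage x{units_consumed:.0f}, total spend x{total:.1f}")

# Despite the efficiency gains, total spend still doubles every year here.
```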

The Open Source Alternative

As a user of many of the LLMs, I’ve started exploring some of the open source models that I can download and run on my own hardware. These are not as powerful as the latest leading-edge LLMs, and honestly you wouldn’t use them as a general-purpose substitute unless you were forced offline, such as on a plane or when your internet goes down. I’m pretty confident the open source models will improve as capabilities trickle down over time, and we will get to a place where you can run a pretty competent LLM, or a group of LLMs for different tasks, on your local machine or on a server within your household powered by your local solar array.

This exploration has been eye-opening. While open source models currently lag behind proprietary offerings in raw capability, they excel in specific domains and offer something invaluable: predictable, controlled costs. There’s no risk of pricing changes, usage limits, or service discontinuation.

The energy efficiency angle is particularly compelling for household deployment. Running a capable open source model on local hardware powered by renewable energy creates a genuinely sustainable AI workflow—at least from an ongoing operational perspective.

The technology is improving rapidly too. Recent open-source large language models are showing unexpectedly rapid progress, suggesting that the capability gap between proprietary and open source models may narrow faster than anticipated.
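
For the curious, here’s a minimal sketch of what local experimentation can look like, using the Hugging Face transformers library. The model named below is just one example of a small open-weight model; you’d swap in whatever suits your hardware.

```python
# Minimal local inference sketch using Hugging Face transformers.
# The model below is just an example of a small open-weight model;
# larger models will want a GPU and more memory.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example small model
)

prompt = "Summarise the risks of depending on subsidised AI pricing:"
result = generator(prompt, max_new_tokens=120, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])
```

Nothing leaves your machine, there are no per-token charges, and the only ongoing cost is electricity, which is exactly the predictability argument above.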

Strategic Recommendations for Organisations

If I were running a large company, I would be instructing my tech leadership to explore a similar paradigm, or alternatively I’d be looking for a long-term contractual commitment from these large tech companies to provide LLM capability to my organisation.

More specifically, I’d be recommending a three-pronged approach:

Hybrid Strategy Development: Don’t put all your eggs in one basket. Develop capabilities across proprietary services, open source alternatives, and potentially on-premises solutions. This provides pricing leverage and reduces dependency risk.

Long-term Cost Modelling: Build financial models that account for potential 5-10x price increases in AI services. If your business case only works at current subsidised prices, you may need to reconsider the approach or build in significant pricing contingencies (see the sketch after this list).

Contractual Protection: For critical AI-dependent workflows, negotiate longer-term pricing agreements or explore volume commitments that provide some protection against dramatic price increases. Given the current investment climate, providers may be willing to offer attractive long-term deals to secure revenue predictability.
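
On the cost modelling point, here’s a minimal sketch of the kind of sensitivity check I have in mind. The baseline spend, value estimate, and multipliers are placeholders you would replace with your own figures.

```python
# Sensitivity check: what happens to the AI line item if subsidised
# pricing ends? All inputs are placeholder assumptions for illustration.

current_monthly_ai_spend = 10_000      # USD, e.g. seats plus API usage today
price_multipliers = [1, 2, 5, 10]      # scenarios: today, 2x, 5x, 10x prices
annual_value_delivered = 400_000       # your estimate of yearly benefit (USD)

for m in price_multipliers:
    annual_cost = current_monthly_ai_spend * m * 12
    roi = (annual_value_delivered - annual_cost) / annual_cost
    print(f"{m:>2}x prices: ${annual_cost:>9,.0f}/year, ROI {roi:+.0%}")
```

In this illustrative example the business case flips negative somewhere between 3x and 4x current prices, which is exactly the kind of threshold worth knowing before, not after, the subsidies end.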

The Bigger Picture

What we’re witnessing is a classic technology adoption curve, but with a twist. Usually, early adopters pay premium prices for bleeding-edge technology, and costs decrease as the technology matures and scales. With LLMs, we’re seeing the reverse: artificially low prices during the adoption phase, supported by massive capital investment, with the real pricing test still to come.

With AI startups capturing about a third of the roughly $314 billion in global VC funding deployed in 2024, the pressure to generate returns will eventually translate into pressure on pricing models.

The question isn’t whether prices will increase—it’s when, and by how much. Understanding this dynamic and preparing for it now, while we’re still in the subsidised phase, may be the difference between successfully integrating AI into long-term business operations and finding ourselves priced out of tools we’ve become dependent on.

The free lunch won’t last forever. The question is: are we preparing for when the bill comes due?


What’s your experience been with integrating AI tools into your work or organisation? Have you started thinking about long-term pricing sustainability, or are you riding the current wave of cheap/free services? I’d love to hear your thoughts on how we navigate this transition.

