Company Building
Top Considerations for Infrastructure for LLMs and Generative AI

Mark LaRosa

At Amplify, I work with early-stage technical companies, which means I regularly encounter captivating new ideas centered on Large Language Models (LLMs) and Generative AI. Many of these projects sit at the frontier of computing, pushing well beyond what we’ve attempted before.

But there’s a catch: groundbreaking projects need solid foundations before they can truly scale and work on the higher-order problems involved.

It can be tempting to jump straight into training models and deploying them for inference—after all, that’s the fun part. But the reality is, the infrastructure decisions you make at this stage can set your LLM or Generative AI project up for success—or leave you with a huge technical debt to deal with later on. 

Prior to Amplify, I worked at AWS as a Principal Solutions Architect in Strategic Accounts. That role gave me enormous exposure to some of the largest-scale customers in the world, and in particular to the unique pain points that surface at hyper-scale. In my role at Amplify, I share those learnings about infrastructure for AI projects in the enterprise.

In this article, I offer a concise introduction to several essential infrastructure factors that deserve your attention. This overview stays at a broad level, offering only a glimpse into each subject, but my aim is to get you thinking about these matters early.

What are my hardware options?

While the occasional engineer will still physically buy and plug in their own GPUs, most AI engineers use GPU cloud providers, so that’s what we’re going to focus on. Those options currently include:

  • Enterprise-grade GPUs from NVIDIA, such as the A10, A100, and H100, generally purchased, hosted, and managed by AWS, GCP, Azure, Oracle, Lambda Labs, etc. You’re likely well familiar with the newer chips like the A100 and H100; they are the most in-demand (and most expensive), but for good reason - they are the most powerful. If you really have a workload that can keep them busy near 100% of wall time, it’s hard to find a better alternative.
  • Proprietary accelerators built by major players like Amazon and Google. Solutions like Trainium from AWS are extremely powerful and viable alternatives to the NVIDIA chips, but you will have to implement them slightly differently, which can create some developer friction up front. They also lock you into that specific vendor for that hardware, but they are generally easier to obtain than the NVIDIA chips and offer potential joint PR opportunities as well.
  • Less discussed, but still quite available, are consumer-grade (gaming) GPUs such as the NVIDIA RTX 4090. Cloud providers can’t offer these under NVIDIA’s licensing terms, but there are “time share” services that let you rent them from consumers with idle cycles, and plenty of ex-crypto miners are selling the cards themselves. This is the least expensive option, but it comes with plenty of tradeoffs, not least of which is that these chips are also the least powerful.
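One way to compare these options on paper is dollars per unit of training work rather than raw hourly price. The sketch below uses entirely made-up prices and relative throughput figures (real numbers vary by provider, region, and contract), so treat it as a template for your own analysis, not as data:

```python
# Hypothetical back-of-the-envelope comparison of GPU options.
# Prices ($/GPU-hour) and relative throughputs are illustrative
# assumptions only -- plug in real quotes and benchmarks.

OPTIONS = {
    # name: (assumed $/GPU-hour, assumed relative training throughput)
    "H100 (hyperscaler)":     (4.50, 3.00),
    "A100 (hyperscaler)":     (2.50, 1.00),
    "A10 (hyperscaler)":      (1.00, 0.25),
    "Consumer 4090 (rental)": (0.50, 0.35),
}

def cost_per_unit_work(price_per_hour, throughput):
    """Dollars spent per unit of training work (lower is better)."""
    return price_per_hour / throughput

# Rank options from most to least cost-effective under these assumptions.
for name, (price, tput) in sorted(
        OPTIONS.items(), key=lambda kv: cost_per_unit_work(*kv[1])):
    print(f"{name:24s} ${cost_per_unit_work(price, tput):.2f} per unit of work")
```

Note that under these made-up numbers the "cheapest" card per hour is not necessarily the cheapest per unit of work, which is exactly why raw hourly pricing can mislead.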

No, really, what are my hardware options?

If you haven’t attracted funding yet, accessing GPUs from the biggest companies, like AWS and GCP, will likely be difficult. By default, you generally can’t even start a single instance without reaching out to account teams, and those teams are stretched incredibly thin with requests from existing important customers, so you’re not likely to have much success or leverage from the outset. Newer GPU-focused clouds like Lambda Labs are an option, as well as the consumer GPU sharing services, but uptime and reliability won’t be anywhere close to the larger players, nor are there guarantees that they can scale up to meet the demand as they grow. This may not matter as much in early days, but it will quickly become an albatross as you scale.

If you have attracted funding, you’ll be better positioned to negotiate a contract with a bigger player, but this is still far from trivial in most cases. Companies like AWS and GCP are looking for groundbreaking companies they can partner with, and strong funding is certainly one indicator that your project has potential. If your VC has a track record of securing these kinds of deals, they could be an excellent resource—so don’t hesitate to ask. But also understand that many VCs approach these types of requests as relationship-based favors pursued top-down, often calling on their C-Suite friends. While this historically may have helped, in this space, a favor generally won’t take you very far (you can imagine how many requests they get on a daily basis from everyone they know!). Accordingly, founders need to make sure their investors have strong relationships and traction with the teams who actually sponsor the compute allocation.

A growth mindset

If you can access and afford GPUs or proprietary accelerators from a major player, this is, in my opinion, likely your best option over the long term because:

  • These providers are more reliable: they have longevity, strong uptime, and an extensive track record of customers exercising both under the most challenging workloads.
  • They’ll be able to meet your capacity needs as you grow.
  • They come with valuable partnership opportunities that can increase your credibility and reach in the marketplace.

I talk to founders all the time who are lured by the attractive pricing and alleged availability of compute from the smaller upstart providers. This can certainly be a good option given the right conditions, but beware the risks associated with partnering here. For example, should you hit a certain level of scale - coupled with availability requirements, which balloon in production - you may quickly find yourself in an untenable position. Worst of all, your ability to appeal to the larger providers as a strategic partner has now been significantly diluted because you already “partnered” with another provider before making your public debut.

There are always ways to approach every permutation of the problem, but securing the right partner out of the gate is an important milestone that should not be overlooked.

If you’re unable to secure a path forward with the major providers, a smaller niche provider is probably your next best option. They often come with scalability and reliability challenges, but those tend to manifest later on, and I believe there are a few benefits too. Namely:

  • They make it easier to customize the capacity you need, saving you money and power; commits tend to be lower and more flexible as your needs fluctuate in the early days.
  • They’re a low-cost way to test out a new idea without a big commitment.
  • They let you get started faster in most cases.

Pondering pricing

There’s also cost to consider, and this is where the current scarcity of both the GPUs themselves and electricity availability in provider regions really plays a major role. With most commodities, you get the maximum discount when you buy in bulk—but that’s not always the case with GPUs. Because of global capacity issues, supply is low and demand is high, and you can imagine some of the major customers who are demanding enormous quantities from their GPU cloud providers.

This actually can give you a bit of an advantage when negotiating discount levels - assuming a provider deems your workload strategically valuable, the amount of compute you’ll likely be requesting in the early stages of your company will be quite small in comparison to a much larger customer. Providing discount levels on smaller amounts of GPU capacity is actually easier for most GPU cloud providers for this reason - they get a seat at the table as your company matures - and they can still keep most of their GPUs reserved for very large strategic customers.

Why are larger proprietary providers so choosy about who they work with? Because there’s a hard limit to how many chips they can provide, and demand is currently far outstripping supply, as you’re well aware by now. If you manage to access these GPUs, expect to pay more for the increased capacity than you might otherwise pay a smaller cloud vendor—but remember, you’ll also benefit from a full suite of services and partnership opportunities. You want to negotiate the pricing as close as possible to what a smaller vendor would charge, but know you’ll be paying a premium no matter what. If and when you start to hit hyper growth, you’ll be glad you paid that premium!
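One way to pressure-test the premium-versus-discount tradeoff above is to normalize list prices by how much productive use you actually get from each GPU-hour. A minimal sketch, with entirely made-up prices and utilization figures:

```python
# Hypothetical utilization-adjusted cost comparison. All figures
# are illustrative assumptions, not real provider quotes.

def effective_hourly_cost(list_price, utilization):
    """Dollars paid per *productive* GPU-hour.

    utilization: fraction of wall-clock time the GPU is doing useful
    work, folding in both your pipeline's efficiency and the
    provider's effective uptime.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return list_price / utilization

# A premium provider at $4.00/hr that you keep 90% busy...
premium = effective_hourly_cost(4.00, 0.90)   # ~$4.44 per useful hour
# ...can beat a cheaper provider at $2.50/hr whose reliability and
# queueing leave you with only 50% effective utilization.
cheaper = effective_hourly_cost(2.50, 0.50)   # $5.00 per useful hour
```

The point is not these specific numbers but the shape of the math: a price premium can be cheaper in practice once reliability and utilization are priced in.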

Partner up with your provider

Whatever route you choose, be strategic about your partnership with your provider. If you’ve secured a contract with a larger proprietary provider, take advantage of their market influence by pursuing whatever partnership opportunities they make available. Try to understand the major strategic initiatives they are thinking about: which products are getting maximum exposure, how they’re being messaged, and, most importantly, what you can learn from anyone in your network with connections deep inside the provider.

If you’re with a smaller/niche GPU cloud provider, think long and hard before agreeing to promotional opportunities (should they be presented), such as publicly naming them your preferred provider. If you ever need to grow and choose a larger provider, they won’t want to repeat PR opportunities you’ve already launched with a smaller company. If you think you’ll be switching to AWS or Google eventually, you may want to hold out for a partnership press release with a bit more reach.

Bringing it all together 

I recently worked with a portfolio company to help them make a decision on their cloud provider. The company had begun conversations with two major cloud vendors; however, they were concerned about the substantial price difference between the two and wanted to explore cost-effective options while maintaining their technical requirements. We engaged in negotiations with both vendors to bridge the pricing gap and identify potential cost-saving opportunities. We also conducted a thorough assessment to ensure that the selected vendor could meet the company’s technical requirements without compromising performance or security. And lastly, recognizing the company’s interest in public relations, we explored the potential for joint PR opportunities.

After careful consideration and negotiations, the portfolio company chose to go with the slightly more expensive option that not only addressed their immediate cloud credit concerns but also allowed them to make a strategic decision that aligned with their technical needs and public relations goals. The outcome was a resounding success, with the company not only satisfied with the pricing but also experiencing significant marketing momentum that greatly surpassed their initial expectations.

The electricity conundrum

You should now be well aware of the scarcity of GPUs themselves, but in some cases, this isn’t actually the constraint causing the lack of resources. GPU cloud providers often have enough chips but lack another critical resource needed to deploy them - electricity.

With the explosion of cloud computing over the last couple of decades, there’s very little electricity going unused, especially from always available renewable resources (such as hydro). It’s also why we’ve seen so many data centers popping up in locations like The Dalles, Oregon, along the Columbia River. Consequently, electricity has become a limiting resource, leading providers to be more hesitant about selling their available supply. This unfortunate truth will undeniably affect your choices regarding infrastructure, making it essential for you to be aware of this from the outset. And unlike chip production, this can’t be fixed in a matter of months or even years, as providers also have targets for renewable energy and other low-carbon initiatives that require a lot of buildout to solve.

Pause for thought

Every start-up is different, so when it comes to infrastructure, sometimes there isn’t a right or wrong answer. That’s why it is so troubling, at least in my opinion, that there are so many matter-of-fact recommendations about how this should be done generically. In an industry that is still in its infancy, it is important to remember that infrastructure recommendations are even more immature and unsettled.

It’s important to pause, take stock, and avoid potential technical debt headaches later. That way, when your project takes off (and I hope it does!), you can keep focusing on the fun parts. If you don’t have a deep background in infrastructure, make sure you’re relying on the advice of several trusted advisors who do have such expertise. That way, you’ll have more of a sense for the texture of the problem, as well as several paths forward that you can evaluate based on your options.

I hope you've picked up on some of the nitty-gritty details and subtle aspects of this topic. Working with founders on this subject is something I absolutely love; its ever-changing depth and intricacy keep me in the zone. If you're itching to chat, hit me up, and let's keep the conversation going!