Amplify News
3.14.2024
Our Investment in WarpStream

Sarah Catanzaro

In 2019, we connected with Richie Artoul, an Uber engineer who developed an interesting service for monitoring ML models. In 2023, we invested in WarpStream, a company founded by Richie and Ryan Worl to develop a cloud-native data streaming platform. So, how exactly did this amazing pair of engineers settle on re-imagining streaming infrastructure when everyone around them was building foundation models and AI-driven developer tools? Let’s dig in…

Why are streams as (more?) important as neural networks?

Today, most tech news focuses on how AI will impact the development of future applications. Like most others, we believe that future applications will be intelligent. However, we also believe that applications will be faster, more collaborative, and more interactive. The distributed log is a powerful abstraction for building better, smarter applications that make use of real-time data, but most systems for collecting and aggregating log data in real time are too hard to implement and too costly to maintain. As a result, existing stream processing platforms, like Kafka, are better suited to web-scale companies building applications like fraud detection or application performance monitoring. Most other companies cannot implement stream processing systems reliably or cost-effectively, let alone expand their use of stream processing or explore new use cases for it.

By contrast, web-scale tech companies like LinkedIn, Netflix, and Uber have adopted (or invented) Kafka and similar stream processing systems. Existing stream processing systems were tailor-made for companies like these: big tech companies with big budgets and big headcounts that manage big data centers. For other companies, real-time data processing, streaming analytics, and event-driven architectures are out of reach. There are no alternatives that address their need for simple systems that are both easy to manage and cost-effective to scale. And even for larger companies, a system like Kafka still requires a massive investment of time and resources to set up and maintain.

We invested in WarpStream because they are designing a solution that is purpose-built for the use cases that most companies have today and that will persist into the future, not for the on-prem behemoths of a decade ago. By building on top of cloud primitives and focusing on the needs of companies that don’t manage their own data centers, WarpStream can create a cheaper, easier, more reliable stream processing system that will unlock many opportunities for companies to build better applications.

During their tenure at Datadog, Richie Artoul and Ryan Worl, the founders of WarpStream, were tasked with replacing Datadog’s legacy event analytics system. They developed Husky, a next-generation event storage system, which is dramatically more efficient than its predecessor. However, upon completing this system, they realized that the cost and operational toil of streaming events into Husky were still exorbitant.

After studying this problem, they observed that the biggest line item associated with Kafka workloads at scale was networking costs - specifically, inter-zone bandwidth fees. This is not surprising: Kafka was designed by LinkedIn to run on-premises, where there is no toll to pay for cross-availability zone traffic.

Richie and Ryan saw an opportunity to eliminate these fees by removing the need to write data cross-zone. They determined that they could build a dramatically cheaper Kafka replacement by streaming data directly to S3. What’s more, by using S3 as a durable storage backend, they could eliminate the need for engineers to manage local SSDs. Their Kafka alternative, designed purposefully for the cloud, would be far simpler to use. This new system could make stream processing better for existing Kafka users while also supporting companies for whom Kafka would be impossible to afford or manage today. The market opportunity could be massive… so they called us to chat…

But why did they call us? 

In early January 2023, we connected Richie to another Amplify founder who was focused on developing technologies that might make data lakehouses accessible to a broader market - including by reimagining file formats that were originally designed for local disk by companies that managed their own data centers. This founder emphasized the opportunity to replace outdated data infrastructure with cloud-native alternatives and encouraged Richie to leave his position at Datadog and work with Amplify to transform his ideas into a game-changing product.

But our relationship with Richie and Ryan began long before that, in March 2019, when Richie was building an ML monitoring tool called Hyperdash. While Richie informed us that he was not excited about revisiting Hyperdash, he mentioned that he was thinking about starting a company alongside Ryan to help any team diagnose performance problems and customer issues by storing and accessing telemetry faster and more cheaply.

47 emails, several demonstrations, and a few lunches later, we were bought in and poised to invest in WarpTable. Alas, another Amplify founder beat us to the punch. In October 2019, Richie demo’ed WarpTable to Alexis Lê-Quôc. A few months later, Richie and Ryan joined Datadog to develop a WarpTable-like storage solution that could optimize Datadog’s APM and logging products (this platform later became Husky, Datadog’s next-gen log storage engine).

Nonetheless, we remained confident that Richie and Ryan might start a company in the future. We thought they’d be even more excited about pursuing entrepreneurship after learning about product management, sales, and leadership from the talented team at Datadog. And we just loved talking about databases with them. So we did. For the next three years, we talked about transaction isolation, instance-optimized databases, query optimization, and more. We debated the advantages and disadvantages of relational databases for transaction processing, imagined how databases might exploit new hardware platforms, and tried to anticipate Andy Pavlo’s next antics. 

The truth is, after spending four years with Richie and Ryan, we could not imagine a future where we didn’t one day work with them. They knew more about databases than most experts delivering keynote presentations at VLDB. They cared more about developer ergonomics and user impact than the most sophisticated product designers. They understood that building cloud-native infrastructure meant more than just lifting and shifting to the cloud. We just hoped and prayed that they wouldn’t start a space tourism startup on the blockchain.

So when Richie emailed us to say that Ryan was trying to convince him to build a “cloud-native replacement for Kafka WarpTable-style,” we were in. Upon hearing more about their vision, we told them that they’d be crazy not to build this given the clear market demand for stream processing tools with much higher ROI. Within weeks, we were wiring our investment. 

A lot has changed since we first invested in WarpStream in April 2023. They’ve developed both a Bring-Your-Own-Cloud and a serverless product, they’ve released key features like compacted topics and infinite retention, they’ve helped companies like Middleware dramatically reduce their stream processing costs…and they’ve raised a Series A round! The history of the Amplify-WarpStream relationship is long; over the past few years, we’ve had the chance to support Richie and Ryan as they’ve developed so many compelling data infrastructure projects. We know that WarpStream’s first products are just the beginning and look forward to continuing to support them on their journey.

Nowadays, we spend most of our time working with the WarpStream team to support deeper integration into the Kafka ecosystem, expand their engineering and GTM teams, and explore future opportunities like programmable streams. But we still find time to spar over LSM tree-managed storage - and we always will.