AI teams need a better data platform: Our investment in Spiral

Some companies take your breath away. That’s how I felt as Will Manning dissected database internals over dinner at a hip Asian fusion spot in the East Village. Then I realized that it wasn’t the conversation that had me gasping. It was a piece of shrimp lodged in my throat.

Oddly calm, I thought: “This isn’t how I’m going down” (dying mid-deal would be a far worse exit than an early acquisition). I stood up to flag a waiter who could perform the Heimlich, but Will got there first. A chunk of shrimp was quickly expelled through my mouth onto his shoe.

Mortified, I apologized, but Will just shrugged: “Don’t sweat it. I have kids. I get spit on all the time.”

That moment showed me something about him: he could handle pressure without losing his cool. Minutes later, he was back to talking about the Python GIL.

But I didn’t invest in Will because he saved my life (although I’m certainly grateful to still be breathing). I invested because AI teams need a data platform built specifically for them.

Right now, many AI teams are trying to retrofit their existing data or ML platforms. That may suffice if genAI isn’t core to the product, but for genAI-native companies, it’s the wrong approach. Some engineering leaders once believed that adapting old infrastructure would let them ship faster than building from scratch, but they’re learning it does the opposite. They need purpose-built infrastructure because genAI use cases differ from analytics, data science, and ML in three fundamental ways:

GenAI data is different: Most existing databases and data warehouses are optimized for high-volume, high-velocity structured data, specifically tabular and time-series data (metrics and events). These systems excel at storing, transforming, and modeling such data to produce dashboards and reports that can be consumed by humans (most often, executives and other decision-makers). In contrast, most genAI-native teams work with both unstructured and structured data. Although some new systems support text and images, very few handle more complex data types like video, audio, or geospatial well. In the past few years, I’ve spoken with AI and engineering leaders working with these data types. These conversations remind me of discussions with Hadoop users before Snowflake showed up on the scene. Everything is hard. These teams need systems that remove overhead and make it simple to work with any data type.

Data gets consumed by machines: Existing data systems are optimized for data teams that prepare data for humans, who consume this data through dashboards, reports, and analysis, or through other SaaS applications like CRM, ATS, or marketing automation tools. In contrast, genAI features and products feed data directly to machines. Foundation models generate output, including conversations or analysis, using petabytes of input, and can retrieve millions of tokens as context. Soon, agents may act autonomously, using tools to solve complex problems with whatever data they can access. Humans like tidy summaries and aggregations; machines want everything.

GPUs are expensive: Before LLMs became widespread, most ML workloads (even deep learning) ran on CPUs. Today, training and deploying LLMs require GPUs, which are very costly. Even after significant price drops, an H200 still runs around $2 per hour. AI engineering teams must maximize GPU utilization to avoid wasting money, which makes infrastructure efficiency non-negotiable.

AI engineering teams need data platforms that support multimodal data, deliver outputs at machine scale, and load data into GPUs without delay. So what are their options today?

Well, if you look at Snowflake’s website today, you’ll see it’s no longer a data warehouse; it’s an “AI data cloud.” Databricks has rebranded itself as a data intelligence platform. Even Oracle insists it’s the only hyperscaler capable of delivering AI services. In short, every incumbent data platform is rushing to position itself as the right solution for AI engineers. But positioning is not the same as engineering. None of these platforms has been fundamentally redesigned for the requirements of generative AI. Instead, they’re bolting new marketing onto old architectures and trying to retrofit systems optimized for analytics and BI to support workloads they were never meant to serve. The result: performance bottlenecks, wasted GPU cycles, and frustrated engineering teams.

It’s becoming more obvious that building a data platform specifically for genAI requires rethinking things from first principles. You can’t jerry-rig your way into a genAI-native database. Spiral began with Vortex, a state-of-the-art columnar file format that the team recently donated to the Linux Foundation. Vortex enables users to decode data directly from S3 to GPU memory, so loading data is whip-fast. On top of Vortex, they’ve built Spiral, a database that unifies governance and exposes a single API for every data modality: video, audio, geospatial, text, and more. Spiral is also engineered for machine-scale throughput to ensure that GPUs remain fully saturated. Every design choice reflects a singular focus on throughput and efficiency for complex, multimodal workloads.

Not all heroes wear capes, but at least three of them can be found in cotton t-shirts emblazoned with periodic table logos marking each new Palantir release. As we’ve seen from more than a decade of backing infra startups, building databases is brutally hard. Spiral’s founders, Will, Nick, and Rob, know this better than most, since they spent years designing and iterating on the infrastructure behind Palantir Foundry, one of the most demanding data platforms in the world. What sets them apart is the combination of traits they exhibit: deep technical rigor paired with respect for academic research; relentless engineering execution balanced with creative problem solving; and a true obsession with customer needs.

That’s why we invested. Spiral isn’t just another data platform with a fresh coat of AI paint. It’s a ground-up reimagining of the database for the genAI era, built by people who understand both the technical and human requirements. It’s exactly the team you want when the stakes are high, whether that means intelligently batching massive image datasets for peak performance or making sure a stray shrimp tail doesn’t take you out before dessert.
