Our Investment in PostgresML

Natalie Vais and Sarah Catanzaro

The Complexity of ML Infrastructure

Today more than ever, everyone wants to build AI-powered applications. However, in conversations with developers and data practitioners, we’ve consistently heard the same thing about AI infrastructure: it’s hard. Setting up machine learning workflows (which may include data preparation, training, fine-tuning, and inference) creates a ton of overhead for teams. Many companies address this by hiring experienced ML engineers to build custom infrastructure like feature stores, model stores, and inference layers. Even then, they are left with the burden of maintaining these systems, which grow more complex (and more bespoke) over time. Here are some of the biggest challenges developers told us they face in managing AI infrastructure:

  1. Deployment and scaling: Deploying ML models into production is a massive coordination problem that requires collaboration among data scientists, ML engineers, and software engineers (plus integration with other systems; see #2). Ensuring that a model runs efficiently in production, and can handle massive data volumes and request loads, is non-trivial, especially when users expect sub-second prediction latency.
  2. Platform integration: After teams set up a platform for training and deploying models, the next challenge is connecting it to real-time data systems, like databases and stream processing engines, to power their applications. For online predictions, timely access to fresh data matters even more.
  3. Model maintenance and updates: ML models can become outdated quickly as the underlying data and business logic change. Updating training data and retraining models are essential but can be really hard to do, especially in fast-paced environments. Ideally, this workflow should be trivial — models should auto-update and learn from live data (to be truly “online”).

Many companies we spoke to don’t consider themselves to have “fancy ML workloads”. Instead, they frequently have a single SQL database sitting around (usually Postgres). For these folks, machine learning can be an aspirational roadmap item. While the recent surge of LLM APIs makes AI more accessible to developers without classical ML training, most of these APIs don’t fit neatly into existing developer workflows and technical stacks, especially when it comes to online training.

What if you could bring AI tasks directly to the database and make it dead simple to deploy AI applications from a single platform?

PostgresML: Bringing ML Code to Your Data(base)

Last year, we met a team of experienced ML and infrastructure engineers solving exactly this problem. Montana Low and Lev Kokotov are the creators and founders of PostgresML, an open-source Postgres extension written in Rust that allows developers to easily train and deploy ML models using SQL. The pair met at Instacart, where they worked on the ML and platform teams, respectively, during one of the company’s highest-growth periods.

PostgresML is at the forefront of a trend wherein widely adopted developer tools and databases are being adapted and extended to an AI-first world. PostgresML allows developers to prototype and deploy AI applications quickly with their end-to-end platform on Postgres by bringing common AI tasks directly into the database.

Some of the most exciting features of their current platform are:

  • State-of-the-art LLMs: PostgresML integrates Hugging Face Transformers to bring your favorite models to the data layer. Tens of thousands of pre-trained models (like LLaMA, FLAN-UL2, and BLOOM) are available to hook up directly to your data, for example to generate embeddings quickly.
  • Vector operations: PostgresML supports optimized vector operations that can be used inside SQL queries. In other words, you can use the full expressive power of SQL to combine semantic, geospatial, and full-text search. Rather than shipping vectors back to an external application for processing, PostgresML computes the results inside the database.
  • Web UI: PostgresML comes with a dashboard app that provides visibility into the models and datasets in your database, giving users an overview of their models and projects for easier management. The Web UI is optional; the PostgresML extension can also be used on its own.

By allowing companies to run ML models directly on a Postgres database (and bringing the “code to the data”), PostgresML removes the need for a separate feature store and reduces the data management overhead associated with training and deploying ML models. This is huge, especially for smaller companies that view machine learning as a walled garden.

Our Investment

During our due diligence process, we spoke to numerous developers who expressed their excitement about the potential of PostgresML to address their pain points in ML development. This feedback resonated with our own experience, as we've seen firsthand how the complexity of ML infrastructure can hinder the adoption of AI-driven applications. Here are the things that excite us most about PostgresML:

  • PostgresML makes it faster and easier to train and deploy ML models. PostgresML lowers the barrier to entry for small and medium-sized teams to train and deploy ML models right where their data lives (in their database). Many companies have neither the time nor the resources to stand up bespoke ML infrastructure, and many don’t yet have a data warehouse strategy.
  • AI-driven applications are the future (now you can keep up). Engineers should be empowered to ship ML-driven features whether or not their company can hire a full data science team. We envision environments where engineers can use PostgresML to prototype end-to-end applications with simpler models. This shift could dramatically accelerate the ML development cycle.
  • Postgres is eating the world (and your data). Postgres is one of the most beloved developer tools on the planet. As such, it’s frequently the first database folks get started with to run their production applications. Bringing ML closer to where companies already store their data (i.e. Postgres) makes it easier to get started.

Today, we’re excited to announce our lead investment in PostgresML’s $4.7M seed round to make machine learning more accessible in the era of AI-driven applications. Welcome to the Amplify family!