
pg_lake: Postgres as a lakehouse

Marco Slot


(Livestream 2)

When Postgres is bad at something, we can make it good at it through extensions. Postgres is not a good analytics database: its analytical query performance is relatively poor, it has no facilities for interacting with object storage, and it only supports basic CSV as a file format.

pg_lake is a set of open source Postgres extensions that add the ability to query, import, and export raw data files in your data lake via simple SQL commands, and to create and manage Iceberg tables with high analytical query performance. It enables you to use Postgres as a versatile data "lakehouse".

This talk describes how pg_lake extends Postgres and introduces a new query engine (by "de-embedding" DuckDB), a new table storage engine (Iceberg), and seamlessly integrates them with all existing Postgres features and transactions in a production-ready way. We also show various new patterns that have emerged for using pg_lake, and how it combines with the pg_incremental extension.

Join the conversation

Use the hashtag #PosetteConf