Speaker:
Adam Wolk
Principal Program Manager in Postgres @ Microsoft
Spent 7 years working with Oracle PL/SQL in the financial sector, then moved on to using PostgreSQL at a golf startup for the next 4 years. He later became an OpenBSD developer and spent 4 years at a FreeBSD-based company, where he mostly worked with C on FreeBSD and began a deep dive into PostgreSQL internals. A longtime database & infosec geek and a gamer, he is currently pushing at the edge of distributed SQL as a Principal Technical Program Manager for PostgreSQL at Microsoft.
POSETTE 2024
Talk
From Postgres full text search to Retrieval Augmented Generative search
This talk iterates through the design of an email search application, starting from basic search through Postgres full text search, semantic search, and finally retrieval augmented generation (RAG), a new AI capability enabled by Azure Database for PostgreSQL. And this talk will introduce you to Milly the Mailiphant, who is ready to help with any Postgres database questions you may have.
Through the session we cover designing the search document, FTS feature inner workings, the pgvector extension to Postgres, plus how to leverage the azure_ai extension to Azure Database for PostgreSQL.
Speaker
Interview
About the Speaker
-
Tell us about yourself: career, family, passions
My wife and I moved to Warsaw roughly six years ago. We had an amazing Australian Shepherd called Iria (named after Irulan Corrino from Dune); unfortunately, we said our goodbyes to her in early May this year. She was 14 years old. I'm passionate about database engineering, security, OpenBSD (though slacking lately) and gaming. I like reading the works of H. P. Lovecraft, and I have an interest in the past and current history of organized crime (it's interesting from a socio-economic perspective). I also used to do a lot of dog trekking, but that was put on hold as our dog grew older, and I recently started driving, which has been more fun than I initially anticipated.
-
What is your icebreaker for PostgreSQL events?
"So, what are you doing with PostgreSQL?" is my default opener, as I truly enjoy hearing about all the innovative ways people use the database.
-
How do you prepare for an online presentation?
It's mostly rehearsing the timings and doing dry-runs of the presentation with peers to get early feedback.
-
Which book are you reading right now?
Savage Kiss by Roberto Saviano, but it has been a slow read compared to his Gomorrah and Zero Zero Zero. I have Tokyo Vice by Jake Adelstein on my reading queue that I am really looking forward to.
About the Talk
-
What will your talk be about, exactly? Why this topic?
"From Postgres full text search to Retrieval Augmented Generative search" is an ongoing talk for me. It started as "Googling in PostgreSQL", where I described how full text search works and put a lot of emphasis on how to structure documents for full text search. The same document-building concepts translate nicely from full text search to semantic search, and both can be leveraged to build RAG applications.
-
What is the audience for your talk?
Developers wanting to build any form of search into their application, especially if they are daunted by stacks like Elasticsearch. Also anyone interested in doing AI with PostgreSQL or just wanting to see what is possible without leaving the DB.
-
What existing knowledge should the attendee have?
You should be familiar with PostgreSQL itself and know what SQL looks like in general, but you don't need to know how to write it. I have given a version of this talk to high school students in Poland who had absolutely no experience with PostgreSQL, and they were still able to interact and ask very on-point questions.
-
Which other talk at this year’s conference would you like to watch?
Just one? It's impossible to pick one, this is such a good lineup. I'm looking forward to seeing "A Walking Tour of PostgreSQL" by Thomas Munro (I silently hope that it will be like the Narrative History of BSD by Kirk McKusick 😉). I'm also interested to see what Marco Slot has been up to in his "Data-intensive PostgreSQL: Three ways to scale". I'm going to stop listing here, but just open the page and read it name by name, as I will be watching all of them.
-
How do you balance technical depth with engaging storytelling in your conference presentations?
I wish I knew! I guess it's easier with in-person events, where you can assess how well the talk is landing and decide to dive deeper or go one level up instead. In either case, I think it's important to tune for the audience. At a PostgreSQL conference I will skip describing the obvious (what PostgreSQL is and some other basics), while at a different event I would dedicate some additional slides to setting the context.
About PostgreSQL
-
What inspired you to work with PostgreSQL?
When I was working on my startup, I had to pick a database engine. I had roughly 7 years of Oracle experience by that time, and maybe a year with MySQL. I evaluated various storage engines (I remember being very impressed with Riak) and ended up picking PostgreSQL, as it felt the most mature feature-wise and in its approach of doing the right thing.
-
What is your favorite PostgreSQL feature?
For built-in features that would be Full Text Search, but I consider the things around it equally important (GIN and GiST indexes, JSONB).
-
What is the single thing that you think differentiates PostgreSQL most from other databases?
Being able to extend the database without touching the core engine. There are so many things that the community was able to iterate very quickly on (JSON support, vector support) because these could be implemented in isolation without impacting the stability of the core PG.
-
What is your favorite PostgreSQL extension or tool? And why?
Citus is of course my favorite extension, but I am biased as the PM of that project. Apart from Citus my second favorite is pg_trgm, because it's super cool to have indexes used for REGEX and unanchored wildcard searches.
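To make the pg_trgm remark a bit more concrete, here is a minimal Python sketch of the idea behind trigram matching. It is not the extension's actual code: the function names (`trigrams`, `similarity`, `search_substring`) are hypothetical, and the padding rule (two leading spaces and one trailing space per lowercased word) is a simplified model of the behaviour described in the pg_trgm documentation. The point it illustrates is why an unanchored search such as `LIKE '%vector%'` can be served by an index: any row containing the fragment must also contain all of the fragment's trigrams, so the trigram set works as a cheap pre-filter before the exact recheck.

```python
import re


def trigrams(text):
    """Simplified model of trigram extraction: lowercase the text, split it
    into alphanumeric words, pad each word with two leading spaces and one
    trailing space, and collect every 3-character window."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def search_substring(fragment, rows):
    """Emulate an unanchored, case-insensitive substring search in two steps.

    1. Filter: keep rows whose trigram set contains every unpadded trigram
       of each word in the fragment (the fragment may start mid-word, so no
       padding is applied). A fragment shorter than 3 characters yields no
       trigrams, so every row stays a candidate.
    2. Recheck: confirm the candidates with a real substring test, since the
       filter can produce false positives.
    """
    needed = set()
    for word in re.findall(r"[a-z0-9]+", fragment.lower()):
        for i in range(len(word) - 2):
            needed.add(word[i:i + 3])
    candidates = [row for row in rows if needed <= trigrams(row)]
    return [row for row in candidates if fragment.lower() in row.lower()]


rows = [
    "PostgreSQL full text search",
    "pgvector semantic search",
    "Citus distributed tables",
]
matches = search_substring("vector", rows)
```

In this sketch the filter step stands in for the index lookup and the recheck step stands in for the final comparison against the row itself; a real index stores the trigrams ahead of time rather than recomputing them per query.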
-
What advice would you give to someone starting their journey with PostgreSQL?
Don't be afraid to open up the code and just read it. I was diagnosing a gnarly hanging insert at my previous job and spent weeks reading the PG implementation of GIN indexes. The documentation for PG is great; you will be able to follow it or just poke around until things make sense. That code reading ended up as a talk at PGCon Ottawa in 2019, and I credit it as the first step that put the Citus team on my radar.
-
What are your favorite resources for learning about PostgreSQL?
Apart from the code itself and the project documentation, I must mention InterDB by Hironobu Suzuki. It's a very detailed resource, describing all the low-level details of what PG does (how things are laid out in memory and on disk, how everything works). I love it and read it frequently!
-
Could you share a memorable experience or challenge you faced while working with PostgreSQL?
I mentioned the hanging insert, because how often do you dtrace and disassemble a running PostgreSQL to confirm that a pending list is being merged into the btree? Apart from that, the most gnarly related issue I remember was a machine where PG reported data corruption, ZFS reported data corruption (at different places than PG) and nothing made sense. It was ECC RAM failing, but I was questioning reality before we found out what it was.
-
In your opinion, what are the most common pitfalls or mistakes developers make when working with PostgreSQL?
Not understanding what bloat is and how to avoid it. People hit TXID wraparound and then spend time tuning autovacuum, when they should first look at their app and see if it is generating unnecessary bloat. I gave a talk on that at PGConf in Prague.
-
Which skills are a must have for a PostgreSQL user/developer?
Curiosity and patience. When something feels off, just take the time to examine it and reason about why it is the way it is. That can get you to surprising places, and you may find that PG is good at many things, but you need to spend the time to learn it.
-
PostgreSQL is open source. Did that ever help you in any way, and how?
Yes. As mentioned before, the ability to look at how things are implemented is a game changer. You can tell exactly what the database is doing and why, even if it is not documented. You can also try to change that if needed.
-
If you had a magic wand, what single thing would you change in PostgreSQL as it is today?
In the paper "The Implementation of POSTGRES" by Michael Stonebraker, Lawrence A. Rowe and Michael Hirohama, the authors state that they spawn a system process for each active user and mention that it's only an expedient to get a system operational while they plan to use threads. I would like a magic wand to have them finish that work (or make it event-based instead) 🙂
A last aspect of our design concerns the operating system process structure. Currently, POSTGRES runs as one process for each active user. This was done as an expedient to get a system operational as quickly as possible. We plan on converting POSTGRES to use lightweight processes available in the operating systems we are using. These include PRESTO for the Sequent Symmetry and threads in Version 4 of Sun/OS.
– Michael Stonebraker, Lawrence A. Rowe and Michael Hirohama (berkeley.edu)
About POSETTE & Events
-
Have you enjoyed previous POSETTE (formerly Citus Con) conferences, either as an attendee or as a speaker?
Yes, I enjoyed them as a speaker, organizer and attendee! 🙂
-
What motivated you to speak at this year’s POSETTE: An Event for Postgres?
I really wanted to update my FTS talk, and POSETTE has great production values, so it is a nice target to practice for. I just wish the time slots were longer!
-
What other PostgreSQL events in 2024 are you excited about?
PGConf of course! I may also have an opportunity to visit Prague again in September for a local PG meetup.
-
What advice would you give to fellow speakers preparing for a PostgreSQL conference?
The community is friendly, don't stress and push through any errors during the talk - we all make mistakes and nobody in the audience will mind.
-
What would be helpful for a first-time speaker?
If in a room with an audio setup, talk a bit at the beginning to get used to the sound of your own voice (it sometimes echoes back but your mind will cancel it out in a second). Don't rush through the content, make pauses and don't read the slides.
Past Talks
Multi-tenant SaaS apps made simple on Azure Cosmos DB for PostgreSQL (Citus Con 2023)
- Video: Watch the talk on YouTube
The Postgres team at Microsoft is proud to be the organizer of POSETTE: An Event for Postgres (formerly Citus Con).