Latest Blog Posts

Table Access Method: How Data Update is Handled in PostgreSQL
Posted by cary huang in Highgo Software on 2024-09-27 at 22:05

Introduction

In previous blogs, we talked about an overview of PostgreSQL's table access method API here, how sequential scan is handled within this API here, and how data insertion is handled here. Today, we will look closely at how PostgreSQL handles updates. A successful update in PostgreSQL can be viewed as "inserting a new record" while "marking the old record as invisible", due to the MVCC technique that PostgreSQL employs. It sounds simple enough, but quite a lot of considerations are needed to make a successful update. Let's dive in.
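
To make that concrete, here is a minimal sketch (the table name demo is just an illustration) that exposes the hidden system columns involved. After the UPDATE, the surviving row version sits at a new ctid, while the old version remains on the page, stamped with the updater's transaction ID, until VACUUM reclaims it:

CREATE TABLE demo (id int PRIMARY KEY, val text);
INSERT INTO demo VALUES (1, 'old');

SELECT ctid, xmin, xmax, val FROM demo;   -- e.g. ctid (0,1), xmax 0

UPDATE demo SET val = 'new' WHERE id = 1;

SELECT ctid, xmin, xmax, val FROM demo;   -- new version, e.g. ctid (0,2)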

APIs Involved

Two table access method APIs are primarily involved when performing an update:

  • tuple_fetch_row_version(): called to find the latest version of the tuple to be updated, identified by the given TID. We are expected to use the given TID to look up the particular tuple to update; a snapshot structure is also given to perform a visibility check if applicable. The function is expected to fetch the tuple, convert it to a Tuple Table Slot, and return true if the tuple is fetched. Otherwise, it should return false.
  • tuple_update(): the primary handler for processing a tuple update request. Basically, this function is given several parameters to perform an update:
    • TID of the old tuple: the location of the old tuple to be updated
    • A new tuple expressed in Tuple Table Slot: PostgreSQL converts this to HeapTuple to update
    • Command ID and snapshot: so we can perform a visibility check on the old tuple to be updated

The Update Routine

To perform an update, PostgreSQL runs through a series of checks and considerations before it can perform the update. This process is illustrated in the diagram below:

(1) Determine Columns Updated

The very first check the main update routine performs is to determine the columns to be updated, in particular to find out whether identity key columns have been updated. This could be a primary key, index key, or partition key. It needs t

[...]

pgsql_tweaks 0.10.6 released
Posted by Stefanie Janine on 2024-09-27 at 22:00

pgsql_tweaks is a bundle of functions and views for PostgreSQL

The source code is available on GitLab, a mirror is hosted on GitHub.
One could install the whole package, or just copy what is needed from the source code.

The extension is also available on PGXN.

General changes

Because pgTAP uses the same function name, is_empty, there was a conflict when pgTAP had been installed before pgsql-tweaks.

To solve this problem without breaking existing code, I decided to install the function under a different name, is_empty_b, when pgTAP is already installed.

When pgTAP is not installed, the function keeps the name is_empty, so existing code using the function does not break.

There is also a difference between the two implementations: the pgTAP function returns text, while the pgsql-tweaks function returns a boolean result.
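
A quick illustration of the intended usage (hedged, since the exact semantics are described in the extension's documentation); with pgTAP installed first, the same calls would use is_empty_b instead:

SELECT is_empty('');      -- expected: true
SELECT is_empty('abc');   -- expected: false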

I have also created an issue with the pgTAP project, because I can only solve the case where pgTAP has been installed before pgsql-tweaks.

There have been no new or changed features, therefore this is only a minor release.

Coming up in Postgres - PostgreSQL 17 and trends and innovations to watch
Posted by Amit Kapila in Fujitsu on 2024-09-27 at 01:00

PostgreSQL has released a new version, with a host of updates and improvements to the world's most advanced open-source database. In this blog post, I want to take you through the key new features implemented in PostgreSQL 17.

pgsql_tweaks 0.10.5 released
Posted by Stefanie Janine on 2024-09-26 at 22:00

pgsql_tweaks is a bundle of functions and views for PostgreSQL

The source code is available on GitLab, a mirror is hosted on GitHub.
One could install the whole package, or just copy what is needed from the source code.

The extension is also available on PGXN.

General changes

In this release the recommended PostgreSQL version has been changed to 17.

PostgreSQL 11 has been removed from the list of supported versions.

There have been no new or changed features, therefore this is only a minor release.

Using pg_upgrade to Upgrade Your PostgreSQL Cluster on Windows
Posted by semab tariq in Stormatics on 2024-09-26 at 09:36

Upgrading your PostgreSQL cluster is an important task to keep your database running smoothly and securely. With each new release, PostgreSQL introduces performance improvements, security patches, and new features that can benefit your system. However, upgrading can be a bit tricky, especially if you're working in a Windows environment, where certain challenges like permissions, service management, and file handling may differ from Linux setups. In this blog, we'll walk you through the process of performing an upgrade on a PostgreSQL cluster in Windows, covering the key steps to ensure everything goes smoothly without causing data loss.

The post Using pg_upgrade to Upgrade Your PostgreSQL Cluster on Windows appeared first on Stormatics.

Online Upgrading Logical and Physical Replication Nodes
Posted by Amit Kapila in Fujitsu on 2024-09-26 at 04:13

In my last blog post, Evolution of Logical Replication, I mentioned the future development of a feature to allow "upgrades of logical replication nodes." The upcoming release of PostgreSQL 17 includes this feature. Previously, after major version upgrades (via pg_upgrade), users couldn't immediately connect and write data to logical replication nodes. This was because the slots were lost during upgrades, preventing replication from continuing. As a result, new writes wouldn't get replicated, causing data on both nodes to become out of sync. As explained in this blog post, users had to block applications from writing until the replication setup was re-enabled after the upgrade.

With PostgreSQL 17, logical replication nodes can be upgraded without blocking writes or requiring users to manually detach/attach subscriptions or create slots. The migration of logical slots is supported only when the old cluster is version 17.0 or later, so users will benefit from this feature when upgrading from 17.0 to 18.0 (or later versions). Refer to the PostgreSQL documentation for the prerequisites for upgrading publisher and subscriber nodes.

This blog will delve into the internals of this feature and provide an example of upgrading a two-node logical replica setup. The later part of this blog will talk about online upgrading physical replication setups.

Let's start with an example of upgrading a logical replica setup where the old and new versions of databases are PostgreSQL 17.

Publisher:
1. CREATE TABLE foo(c1 int);
2. CREATE PUBLICATION pub FOR TABLE foo;

Subscriber:
1. CREATE TABLE foo(c1 int);
2. CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres' PUBLICATION pub;

Publisher:
3. INSERT INTO foo VALUES(generate_series(1,5));

Subscriber:
3. SELECT * FROM foo;
c1
----
1
2
3
4
5
(5 rows)

This shows that logical replication is happening between the publisher and the subscriber. Now, we can upgrade either the publisher or subscriber first and let the other node continue operations. In this example, I choose
[...]

PostGIS 3.5.0
Posted by Regina Obe in PostGIS on 2024-09-26 at 00:00

The PostGIS Team is pleased to release PostGIS 3.5.0! Best Served with PostgreSQL 17 RC1 and GEOS 3.13.0.

This version requires PostgreSQL 12 - 17, GEOS 3.8 or higher, and Proj 6.1+. To take advantage of all features, GEOS 3.12+ is needed. SFCGAL 1.4+ is needed to enable postgis_sfcgal support. To take advantage of all SFCGAL features, SFCGAL 1.5 is needed.

3.5.0

This release is a feature release that includes bug fixes since PostGIS 3.4.3, new features, and a few breaking changes.

Playing with BOLT and Postgres
Posted by Tomas Vondra on 2024-09-25 at 10:00

A couple of days ago I had a bit of free time in the evening, and I was bored, so I decided to play with BOLT a little bit. No, not the dog from a Disney movie, but the BOLT tool from the LLVM project, aimed at optimizing binaries. It took me a while to get it working, but the results are unexpectedly good, in some cases up to 40%. So let me share my notes and benchmark results, and maybe there's something we can learn from it. We'll start by going through a couple of rabbit holes first, though.

pgBackRest dedicated backup host
Posted by Stefan Fercot in Data Egret on 2024-09-25 at 08:00

As I mentioned in my last blog post, as your cluster grows with multiple standby servers and potentially automated failover (using tools like Patroni), it becomes more practical to set up a dedicated repository host, also known as a dedicated backup server. This backup server can then trigger backups and automatically select the appropriate node in case of failover, eliminating the need for manual intervention.

In this post, I'll show you how easy it is to add a repository host to an existing cluster. I'll also give you a sneak peek at a new feature expected to be included in the next pgBackRest release 😉


Example setup for repository host

Initial situation

In this example, we pick up from where we left off last time: a primary server (pg1) with a standby (pg2), both already configured to use pgBackRest (with an NFS mount) for backups taken from the standby. Now, we will add a new node, repo1, to take over pgBackRest backups.

The pgBackRest user guide provides a comprehensive overview of how to set up a repository host. Since pgBackRest needs to interact with local processes on each node, we must enable communication between the hosts, either through passwordless SSH or TLS with client certificates. While SSH is generally easier to set up, TLS offers better performance. If you're interested in an example of the TLS setup, I wrote this blog post when the feature was first introduced.

Installation

Let's return to our repository host setup. The first step, of course, is to install pgBackRest:

$ sudo dnf install pgbackrest -y

Any user can own the repository, but it's best to avoid using the postgres user (if it exists) to prevent confusion. Instead, let's create a dedicated system user for this purpose:

$ sudo groupadd pgbackrest
$ sudo adduser -g pgbackrest -n pgbackrest
$ sudo chown -R pgbackrest: /var/log/pgbackrest/

The SSH setup is up to you, but usually it is as simple as creating SSH keys and authorizing them on the other nodes. Example:

# From repo1
[pgbackre
[...]

Prague PostgreSQL Meetup on September 30th
Posted by GĆ¼lƧin Yıldırım JelĆ­nek in Xata on 2024-09-25 at 00:00
The fifth Prague PostgreSQL Meetup of 2024 will take place on September 30th!

Building an image search engine on Postgres
Posted by Adam Hendel in Tembo on 2024-09-24 at 09:00
This post contained content that could not be rendered in the Atom feed. Please use the official post link: https://tembo.io/blog/image-search

PGDay Hyderabad 2024 Reflections
Posted by Pavlo Golub in Cybertec on 2024-09-24 at 06:00
Pavlo Golub rocking the show, talking about the super important topic of monitoring: professional PostgreSQL monitoring made easy

Introduction

As a senior database consultant and developer at CYBERTEC PostgreSQL International, I enjoyed speaking at PGDay Hyderabad 2024. This event marked two firsts for me: my first time visiting India and the first-ever PostgreSQL conference held in Hyderabad. I was filled with excitement and curiosity as I started this journey. I didn't know what to expect, but I was eager to experience it all.

Arrival and first Impressions

I arrived in Hyderabad late at night, around 2:00 AM. While I was a little tired from the long flight, I was extremely excited. However, I made two rookie mistakes. First, I didn't buy a local SIM card, which left me without mobile internet for the whole trip, a particularly unpleasant experience in India, where OTPs are required even for something as simple as Uber. Second, I didn't exchange money for local currency, leading to small logistical challenges. And the driving culture? Well, let's just say that seeing it live was a real surprise!

Being in India for the first time was a culture shock, but in a good way! Everything here is different from what you're used to in Europe. The tempo of life is wild, and the cuisine is an adventure of its own. Yes, it's spicy, but I handled it! If you ever have the opportunity to visit India, please do so. It's an unforgettable experience.

Exploring Hyderabad

Despite the rain pouring down the entire day, I was glad I took a full-day tour of Hyderabad. I visited Golconda Fort, The Charminar, Laad Bazaar, and Mecca Masjid Mosque. One of the highlights was stopping at the famous Nimrah Cafe, where the owner welcomed me and gave me a kitchen tour. I sampled an assortment of delicious pastries and tasted their renowned tea. Spending time there, soaking in the atmosphere, was a beautiful experience. I highly recommend it!

My First PGDay in India

In general I had no particular expectations at my first PostgreSQL conference in India. I just wanted to live in the moment and take everything in as it un

[...]

PostgreSQL 17: part 5 or CommitFest 2024-03
Posted by Pavel Luzanov in Postgres Professional on 2024-09-24 at 00:00

With PostgreSQL 17 RC1 out, we are on the home run towards the official PostgreSQL release, scheduled for September 26, 2024.

Let's take a look at the patches that came in during the March CommitFest. Previous articles about PostgreSQL 17 CommitFests: 2023-07, 2023-09, 2023-11, 2024-01.

Together, these give an idea of what the new PostgreSQL will look like.

Reverts after code freeze

Unfortunately, some previously accepted patches didn't make it in after all. Some of the notable ones:

Now, let's get to the new stuff.

SQL commands

  • New features of the MERGE command
  • COPY ... FROM: messages about discarded rows
  • The SQL/JSON standard support

Performance

  • SLRU cache configuration
  • Planner: Merge Append for the UNION implementation
  • Planner: materialized CTE statistics (continued)
  • Optimization of parallel plans with DISTINCT
  • Optimizing B-tree scans for sets of values
  • VACUUM: new dead tuples storage
  • VACUUM: combine WAL records for vacuuming and freezing
  • Functions with subtransactions in parallel processes

Monitoring and management

  • EXPLAIN (analyze, serialize): data conversion costs
  • EXPLAIN: improved display of SubPlan and InitPlan nodes
  • pg_buffercache: eviction from cache

Server

  • random: a random number in the specified range
  • transaction_timeout: session termination when the transaction timeout is reached
  • Prohibit the use of ALTER SYSTEM
  • The MAINTAIN privilege and the pg_maintain predefined role
  • Built-in locale provider for C.UTF8
  • pg_column_toast_chunk_id: ID of the TOAST value
  • pg_basetype function: basic domain type
  • pg_constraint: NOT NULL restrictions for domains
  • New function to_regtypemod
  • Hash indexes for
[...]

Real World Performance Gains With Postgres 17 B-tree Bulk Scans
Posted by Brandur Leach in Crunchy Data on 2024-09-23 at 14:15

With RC1 freshly cut, the release of Postgres 17 is right on the horizon, giving us a host of features, improvements, and optimizations to look forward to.

As a backend developer, one in particular pops off the page, distinguishing itself amongst the dozens of new release items:

Allow btree indexes to more efficiently find a set of values, such as those supplied by IN clauses using constants (Peter Geoghegan, Matthias van de Meent)

The B-tree is Postgres' overwhelmingly most common and best optimized index, used for lookups on a table's primary key or secondary indexes, and undoubtedly powering all kinds of applications all over the world, many of which we interact with on a daily basis.

During lookups, a B-tree is scanned, with Postgres descending down through its hierarchy from the root until it finds a target value on one of its leaf pages. Previously, multi-value lookups like id IN (1, 2, 3) or id = any('{1,2,3}') would require that process be repeated multiple times, once for each of the requested values. Although not perfectly efficient, it wasn't a huge problem because B-tree lookups are very fast. It'd take an extremely performance sensitive user to even notice the deficiency.

As of a Postgres 17 enhancement to nbtree's ScalarArrayOp execution, that's no longer always the case. Any particular scan with multiple scalar inputs will consider all those inputs as it's traversing a B-tree, and where multiple values land on the same leaf page, they're retrieved together to avoid repetitive traversals.
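
As a rough sketch of the query shape this helps (table and column names are illustrative, not from the post), the gain shows up in multi-value lookups on an indexed column, where EXPLAIN (ANALYZE, BUFFERS) makes the reduced page accesses visible:

CREATE TABLE items (id bigint PRIMARY KEY, payload text);
INSERT INTO items SELECT g, 'row ' || g FROM generate_series(1, 1000000) g;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM items WHERE id IN (1, 2, 3, 4, 5);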

A narrowly focused script to demonstrate the original problem shows a dramatic performance increase before and after the ScalarArrayOp improvement, so we already know the gains are very real. With Postgres 17 so close to hand, we wanted to try to measure what kind of gain a realistic web app might expect from the optimization by testing it against the real API service that powers Crunchy Bridge.

In our experiment we saw roughly a 30% improvement in throughput and a 20% drop in average r

[...]

Nicolas Payart
Posted by Andreas 'ads' Scherbaum on 2024-09-23 at 14:00
PostgreSQL Person of the Week Interview with Nicolas Payart: I live in Brittany, France and work as a Database System Engineer for OVHcloud. My wife, daughters and I live in a house in the countryside with our cats (official and otherwise).

Contributions of w/c 2024-09-16 (week 38)
Posted by Pavlo Golub in postgres-contrib.org on 2024-09-23 at 10:08

pgBackRest backups from the standby server
Posted by Stefan Fercot in Data Egret on 2024-09-23 at 08:15

Recently, we've received many questions about how to take backups from a standby server using pgBackRest. In this post, I'd like to clarify one of the most frequently asked questions and address a common misconception for new users.

First of all, it's important to understand that taking a backup exclusively from the standby server is not currently possible. When you trigger a backup from the standby, pgBackRest creates a standby backup that is identical to a backup performed on the primary. It does this by starting/stopping the backup on the primary, copying only files that are replicated from the standby, then copying the remaining few files from the primary.

For this setup to work, both the primary and standby servers must share a common backup repository. This can be any supported repository type.

Let's take an example, using an NFS mount point.


Example setup for backups from the standby server

Initial situation

Both the primary (pg1) and the standby (pg2) are seeing the same content of the mentioned NFS mount:

[postgres@pg1 ~]$ ls /shared/
[postgres@pg1 ~]$ touch /shared/test_write_from the primary
[postgres@pg1 ~]$ mkdir /shared/pgbackrest
[postgres@pg2 ~]$ ls /shared
pgbackrest  test_write_from

And we've got a working replication connection:

postgres=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 27773
usename          | replicator
application_name | pg2
state            | streaming
sent_lsn         | 0/500AD6C8
write_lsn        | 0/500AD6C8
flush_lsn        | 0/500AD6C8
replay_lsn       | 0/500AD6C8
sync_state       | async

WAL archiving

Let's configure pgBackRest on pg1 and pg2:

$ pgbackrest version
pgBackRest 2.53.1

$ cat<<EOF | sudo tee "/etc/pgbackrest.conf"
[global]
repo1-path=/shared/pgbackrest
repo1-retention-full=4
repo1-bundle=y
repo1-block=y
start-fast=y
log-level-console=info
log-level-file=detail
delta=y
process-max=2
compress-type=zst

[mycluster]
pg1-path=/var/lib/pgsql/16/dat
[...]

Exploring PostgreSQL 17: A Developerā€™s Guide to New Features ā€“ Part 7: pg_maintain Predefined Role for Maintenance.
Posted by Deepak Mahto on 2024-09-23 at 07:15

Welcome to Part 7 of our series exploring the exciting new features anticipated in the official PostgreSQL 17 release. In this series, we delve into newly added features and discuss how they can benefit database developers and migration engineers transitioning to PostgreSQL 17 in the future.

In case you missed the earlier parts:

Ready to enhance your PostgreSQL development skills? My course on PostgreSQL Migration and PL/pgSQL will help you master database development and migration. Click here to start your journey, and use code DBGRP30 to get 30% off!

New Maintenance Role and Access privilege in PostgreSQL 17

Databases are designed to handle critical and sensitive information, and it's essential to enforce minimal grants to control who can access certain data and what operations can be performed. While necessary grants on data access and processing are common, having permissions for maintenance activities is equally crucial to ensure business SLA adherence and performance optimization.

With PostgreSQL 17, a new predefined role, pg_maintain, has been introduced to manage privileges for maintenance operations such as VACUUM, ANALYZE, REINDEX, and more on database tables, indexes, or views.

This means that a database developer is not only responsible for building business-critical functionality but can also maintain the respective database objects, like tables or indexes, ensuring they meet performance and business SLAs, all governed by grants that can be precisely controlled.
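
As a minimal, illustrative sketch (the role and table names are assumptions, not from the post), the privileges can be granted either globally through the predefined role or per table through the new MAINTAIN privilege:

GRANT pg_maintain TO app_developer;               -- maintenance on all relations
GRANT MAINTAIN ON TABLE orders TO app_developer;  -- or scoped to a single table

-- the grantee can then run, for example:
VACUUM (ANALYZE) orders;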

The following maintenance activities are supported with the new pg_maintain role, allowing for granular maintenance permissions at the table, index, or view level:

VACUUM: Cleans up dead tuples and recovers storage.
ANALYZ
[...]

Kubernetes Requests and Limits for Postgres
Posted by Jeremy Schneider on 2024-09-23 at 01:02

As Joe said a few days ago: so many Postgres providers. Aiven, AWS, Azure, Crunchy, DigitalOcean, EDB, GCP, Heroku, Neon, Nile, Oracle, Supabase, Tembo, Timescale, Xata, Yugabyte… 🤯 I'm sure there's more I missed. And that's not even the providers using Postgres underneath services they offer with a different focus than Postgres compatibility. (I noticed Qian Li's upcoming PGConf NYC talk in 2 weeks… I have questions about DBOS!)

Kubernetes. I have a theory that more people are using kubernetes to run Postgres than we realize, even people on that list above. Neon's architecture docs describe their sprinkling of k8s stardust (but not quite vanilla k8s; Neon did a little extra engineering here). There are hints around the internet suggesting some others on that list also found out about kubernetes.

And of course there are the Postgres operators. Crunchy and Zalando were first out of the gate in 2017. But not far behind, we had ongres and percona and kubegres and cloudnativepg.

We are database people. We are not actually a priesthood (the act is only for fun), but we are different. We are not like application people who can spin a compute container anywhere and everywhere without a care in the world. We have state. We are the arch enemies of the storage people. When the ceph team says they have finished their fio performance testing, we laugh and kick off the database benchmark suite and watch them panic as their storage crumbles under the immense beating of our IOPS and their caches utterly fail to predict our read/write patterns.

But we all have at least one thing in common: none of us really want to pay for a bunch of servers to sit around and do nothing, unless it's really necessary. Since the dawn of time: from mainframes to PowerVM to VMware and now to kubernetes. We're hooked on consolidating better and saving more money and kubernetes is the best drug yet.

In kubernetes, you manage consolidation with two things: requests and limits.

The Production Kube
[...]

SQLite3 Vacuum and Autovacuum
Posted by Luca Ferrari on 2024-09-23 at 00:00

Similarly to PostgreSQL, SQLite3 also needs some care…

SQLite3 Vacuum and Autovacuum

Today I discovered, by accident I must confess, that PostgreSQL is not the only database requiring VACUUM: SQLite3 does too.

And there's more: SQLite3 includes an auto-vacuum too! They behave similarly, at least in theory, to their PostgreSQL counterparts, but clearly there is no autovacuum daemon or process. Moreover, the configuration is simpler, and I've not found any thresholds like we have in PostgreSQL. In the following, I explain how VACUUM works in SQLite3, at least at a glance.

SQLite3 does not have the full enterprise-level MVCC machinery that PostgreSQL has, but when tuples or tables are updated or deleted, fragmentation and unreclaimed space mean the database file never shrinks. Similarly to PostgreSQL, the now-empty space (no longer occupied by old tuples) is kept for future use, so the database file grows without ever shrinking, even after large data removals.

VACUUM is the solution that also SQLite3 uses to reclaim space.

VACUUM is a command available at the SQLite3 prompt to start a manual space reclaim. It works by copying the database file content into another (temporary) file and restructuring it, so nothing really fancy and new here!

Then comes auto-vacuum, which is turned off by default. The auto-vacuum works in a full mode or an incremental mode. The former is the more aggressive and happens after a COMMIT. The latter is less intrusive and "prepares" what the vacuum process has to do, without performing it; it is only when incremental_vacuum (https://sqlite.org/pragma.html#pragma_incremental_vacuum) is launched that the space is freed. Therefore, auto-vacuum in SQLite3 either executes at each COMMIT or is postponed until it is considered safe to execute.
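
As a small illustrative session (pragma names as documented by SQLite; enabling auto_vacuum on a database created without it requires a full VACUUM first):

PRAGMA auto_vacuum = INCREMENTAL;   -- or FULL; default is NONE
VACUUM;                             -- manual reclaim: rewrites the file compactly
PRAGMA incremental_vacuum(100);     -- free up to 100 pages from the freelist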

Solid Cache for Rails and PostgreSQL
Posted by Andrew Atkinson on 2024-09-23 at 00:00

Solid Cache is a relatively new caching framework that's available now as a Ruby gem. In the next major version of Ruby on Rails, version 8, it's becoming the default cache backend.

Solid Cache has a noteworthy difference from alternatives in that it stores cache entries in a relational database and not a memory-based data store like Redis.

In this post, we'll set up Solid Cache, explore the schema and operations, and discuss some Postgres optimizations to consider.

Before we do that, let's discuss caching in relational vs. non-relational stores.

Why use a relational DB for caching?

A big change in the last decade is that there are now fast SSD disk drives that have huge capacities at low price points.

SSDs attached locally to an instance, not over the network, offer very fast read and write access. This configuration is available whether self-hosting or using cloud providers.

Besides the hardware gains, PostgreSQL itself has improved its efficiency across many releases in the last decade. Features like index deduplication cut down on index sizes, offering faster writes and reads.

Developers can optimize reads for their application by leveraging things like database indexes, materialized views, and denormalization. These tactics all consume more space and can add latency to write operations, but can greatly improve read access speeds. With Solid Cache, we'll primarily look at a single solid_cache_entries table, and how it's indexed. The indexes themselves will contain all cache entry data, and when they're small enough based on available system memory, can fit entirely into the fast memory buffer cache.
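
As a simplified illustration of that shape (this is not the gem's actual migration, just a sketch of a hash-keyed cache table whose unique index serves every read):

CREATE TABLE cache_entries (
    id         bigserial PRIMARY KEY,
    key_hash   bigint NOT NULL,
    value      bytea NOT NULL,
    byte_size  integer NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

CREATE UNIQUE INDEX cache_entries_key_hash_idx ON cache_entries (key_hash);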

With faster hardware, abundant storage capacities, and optimized indexes, keeping cache data in the relational database is starting to make more sense. Being able to reduce dependencies on multiple data stores can simplify operations and reduce costs.

Now that we've covered a bit about why to consider a relational store for cache, let's flip it around. Why would we not want

[...]

CNPG Playground: A New Learning Environment for Postgres in Kubernetes
Posted by Gabriele Bartolini in EDB on 2024-09-20 at 15:57

Welcome CNPG Playground, a local learning environment for exploring CloudNativePG and PostgreSQL in Kubernetes. Using Docker and Kind, it simulates real-world scenarios, enabling developers and DBAs to experiment with PostgreSQL replication across two clusters. Designed for hands-on learning, CNPG Playground provides an accessible entry point for testing configurations and features, with plans for future enhancements and community collaboration.

Transitioning from Oracle to PostgreSQL: Understanding the Concept of Schema
Posted by Umair Shahid in Stormatics on 2024-09-19 at 15:19

As businesses increasingly move toward open-source technologies, many Oracle Database professionals find themselves needing to work with PostgreSQL, one of the most popular open-source relational database management systems (RDBMS). Although both Oracle and PostgreSQL share many similar concepts, there are fundamental differences in how these systems handle certain database structures, one of which is the schema.

The post Transitioning from Oracle to PostgreSQL: Understanding the Concept of Schema appeared first on Stormatics.

Hierarchical data types
Posted by Florent Jardin in Dalibo on 2024-09-19 at 11:20

The SQL standard defines a set of rules so that database systems can be interchangeable, but there are small singularities in the wild. In this regard, the hierarchyid data type provided by SQL Server is a striking example. If you are switching to PostgreSQL, two solutions are available to you.

A first, simpler solution consists of linking each node to its parent using a new parentid column and applying a foreign key constraint. Another, more complete approach consists of using the ltree extension. This article deals with the latter case.
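
As a minimal sketch of the two options (table and column names are illustrative, not from the article):

-- adjacency list: each node references its parent
CREATE TABLE org_node (
    id        bigint PRIMARY KEY,
    parent_id bigint REFERENCES org_node (id),
    name      text NOT NULL
);

-- ltree: store the full label path and index it with GiST
CREATE EXTENSION IF NOT EXISTS ltree;
CREATE TABLE org_tree (
    id   bigint PRIMARY KEY,
    path ltree NOT NULL
);
CREATE INDEX org_tree_path_idx ON org_tree USING gist (path);

-- descendants of a node:
SELECT * FROM org_tree WHERE path <@ 'company.engineering';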

Bridging the Gap Between Compressed and Uncompressed Data in Postgres: Introducing Compression Tuple Filtering
Posted by Sven Klemm in Timescale on 2024-09-18 at 13:00

When we introduced columnar compression for Postgres in 2019, our goal was to help developers scale Postgres and efficiently manage growing datasets, such as IoT sensors, financial ticks, product metrics, and even vector data. Compression quickly became a game-changer, saving users significant storage costs and boosting query performance, all while keeping their data in Postgres. With many seeing over 95% compression rates, the impact was immediate.

But we didn't stop there. Recognizing that many real-time analytics workloads demand flexibility for updating and backfilling data, we slowly but surely enhanced our compression engine to support INSERT, UPDATE, and DELETE (DML) operations directly on compressed data. This allowed users to work with compressed data almost as easily as they do with uncompressed data.

However, it also created a problem. While we had originally intended mutating compressed chunks to be a rare event, people were now pushing its limits with frequent inserts, updates, and deletes. Seeing our customers go all in on this feature confirmed that we were on the right track, but we had to double down on performance.

Today, we're proud to announce significant improvements as of TimescaleDB 2.16.0, delivering up to 500x faster updates and deletes and 10x faster upserts on compressed data. These optimizations make compressed data behave even more like uncompressed data, without sacrificing performance or flexibility.
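
For context, here is a minimal sketch of the kind of setup these improvements apply to (assuming TimescaleDB is installed; the hypertable name, columns, and policy interval are illustrative):

CREATE TABLE metrics (time timestamptz NOT NULL, device_id int, value double precision);
SELECT create_hypertable('metrics', 'time');

ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);
SELECT add_compression_policy('metrics', INTERVAL '7 days');

-- DML can now target compressed chunks directly:
UPDATE metrics SET value = 0
WHERE device_id = 42 AND time < now() - INTERVAL '30 days';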

Let's dive into how we achieved these performance gains and what they mean for you. To check this week's previous launches and keep track of upcoming ones, head to this blog post or our launch page.

How We Allowed DML Operations on Compressed Data

To understand our latest improvements, it helps to revisit how we initially threw away the rule book and enabled DML operations on compressed data.

Working with compressed data is tricky. Imagine trying to update a zipped file. You'd need to unzip the file, make your changes, and then zip it back up. Similarly, u

[...]

Optimising your Database for Analytics
Posted by Karen Jex in Crunchy Data on 2024-09-18 at 10:06

This post contains the slides and transcript from the talk that I gave at PyCon Italia 2024, and at EuRuKo 2024.

You can also watch the video from PyCon Italia here.

"Elevator Pitch"

Your database is configured for the needs of your day-to-day application activity, but what if you need to run complex analytics queries against your application data? Let's look at how you can optimise your database for an analytics workload without compromising the performance of your application.

Talk Abstract

Data analytics isn't always done in a dedicated analytics database. The business wants to glean insights and value from the data that's generated over time by your OLTP applications, and the simplest way to do that is often just to run analytics queries directly on your application database.

Of course, this almost certainly involves running complex queries, joining data from multiple tables, and working on large data sets.

If your database and code are optimised for performance of your day-to-day application activity, you're likely to slow down your application and find yourself with analytics queries that take far too long to run.

In this talk, we'll discuss the challenges associated with running data analytics on an existing application database. We'll look at some of the impacts this type of workload could have on the application, and why it could cause the analytics queries themselves to perform poorly.

We'll then look at a number of different strategies, tools and techniques that can prevent the two workloads from impacting each other. We will look at things such as architecture choices, configuration parameters, materialized views and external tools.
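
To give one concrete flavour of those strategies, here is a minimal materialized-view sketch (table and column names are assumptions, not from the talk): precompute an expensive aggregate once and refresh it on a schedule, rather than re-running it for every analytics query.

CREATE MATERIALIZED VIEW daily_sales AS
SELECT order_date::date AS day, sum(total_amount) AS revenue
FROM orders
GROUP BY 1;

CREATE UNIQUE INDEX ON daily_sales (day);

-- CONCURRENTLY avoids blocking readers, but needs the unique index above
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;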

The focus will be on PostgreSQL, but most of the concepts are relevant to other database systems.

'whoami' slide: Head shot of Karen in bike helmet and sunglasses alongside diagram containing database icons joined by arrows and labelled Junior DBA, DBA, Senior DBA, Database Expert, Senior Database Consultant, Senior Solutions Architect

A few facts about me, just for fun (you can skip this bit if you've already seen one of my talks or read my previous posts):

When I'm speaking at a developer conference, I always feel the need to get this confess

[...]

Trying out Solid Queue and Mission Control with PostgreSQL
Posted by Andrew Atkinson on 2024-09-18 at 00:00

Why Solid Queue?

Background jobs are used commonly in Ruby on Rails apps to perform any work possible outside of a user request. A classic example is sending an email to a new user, where that doesn't need to happen synchronously within a request. Thus, a background job framework of some kind helps to keep things consistent.

In the 2010s, Sidekiq seemed to become the most popular choice, usable as a free open source version, or a commercial Pro version. Sidekiq uses Redis to persist the job data.

Ruby on Rails added a middleware layer called Active Job back in Rails 4.2 that helped standardize background jobs using a consistent API. Background job systems could then become "backends" for Active Job. Starting from the upcoming release of Ruby on Rails 8, there's a new official background job backend called "Solid Queue." How does it work? What's it like with PostgreSQL?

Let's kick the tires!

Adding Solid Queue to a Rails app

To test it out, we'll add Solid Queue to Rideshare.

To see background job data visually, we'll add the Mission Control gem. We'll generate a background job, process it, and take a look at what information about the job is visible.

Later in the post we'll discuss some considerations for using Postgres for background jobs.
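
One of those considerations, sketched here with an illustrative table (not Solid Queue's actual schema), is how concurrent workers claim rows without blocking each other; Postgres-backed queues commonly rely on FOR UPDATE SKIP LOCKED for this:

CREATE TABLE jobs (
    id          bigserial PRIMARY KEY,
    queue_name  text NOT NULL,
    payload     jsonb NOT NULL,
    finished_at timestamptz
);

WITH claimed AS (
    SELECT id FROM jobs
    WHERE finished_at IS NULL AND queue_name = 'default'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE jobs SET finished_at = now()
FROM claimed
WHERE jobs.id = claimed.id
RETURNING jobs.id, jobs.payload;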

Code to try

# PR: https://github.com/andyatkinson/rideshare/pull/209
cd rideshare
bundle add solid_queue
bundle add mission_control-jobs
# config/environments/development.rb
config.active_job.queue_adapter = :solid_queue

Generate the migrations:

bin/rails solid_queue:install

The migrations are in a file called db/queue_schema.rb.

Running bin/rails db:migrate should pick these up and add the tables to your main application database. If you want to run a separate database for Solid Queue, which is not a bad idea, you'll need additional configuration beyond what's covered here.

Run:

bin/rails solid_queue:start

Let's generate a "Hello World" type of job just to get something going through:

bin/rail
[...]

Window Functions for Data Analysis with Postgres
Posted by Elizabeth Garrett Christensen in Crunchy Data on 2024-09-17 at 14:00

SQL makes sense when it's working on a single row, or even when it's aggregating across multiple rows. But what happens when you want to compare between rows of something you've already calculated? Or make groups of data and query those? Enter window functions.

Window functions tend to confuse people - but they're a pretty awesome tool in SQL for data analytics. The best part is that you don't need charts, fancy BI tools or AI to get some actionable and useful data for your stakeholders. Window functions let you quickly:

  • Calculate running totals
  • Provide summary statistics for groups/partitions of data
  • Create rankings
  • Perform lag/lead analysis, i.e. comparing two separate sets of data with each other
  • Compute moving/rolling averages

In this post, I will show various types of window functions and how they can apply to certain situations. I'm using a super simple e-commerce schema so you can follow along with the kinds of queries I'm going to run with window functions.

This post is available as an interactive tutorial as well in our Postgres playground.

The OVER function

The OVER part of the window function is what creates the window. Annoyingly, the word window appears nowhere in any of the functions. 😂 Typically the OVER part is preceded by another function, either an aggregate or a mathematical function. There's also often a frame, to specify which rows you're looking at, like ROWS BETWEEN 6 PRECEDING AND CURRENT ROW.
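
For instance, against an illustrative orders(customer_id, order_date, amount) table (column names assumed, not from the post), a running total per customer and a 7-row moving average look like this:

SELECT customer_id,
       order_date,
       amount,
       sum(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS running_total,
       avg(amount) OVER (ORDER BY order_date
                         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW)     AS moving_avg_7
FROM orders;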

Window functions vs where clauses

Window functions kind of feel like a where clause at first, since they're looking at a set of data. But they're really different. Window functions are more for times when you need to look across sets of data or across groups of data. There are cases where you could use either. In general:

  • Use WHERE clause when you need to filter rows based on a condition.
  • Use window functions when you need to perform calculations across rows that remain after filtering, without removing any rows from the result set.
[...]

[PATCH IDEA] amcheck support for BRIN indexes
Posted by Tomas Vondra on 2024-09-17 at 10:00

Time for yet another "first patch" idea post ;-) This time it's about BRIN indexes. Postgres has a contrib module called amcheck, meant to check logical consistency of objects (tables and indexes). At the moment the module supports heap relations (i.e. tables) and B-Tree indexes (by far the most commonly used index type). There is a patch adding support for GiST and GIN indexes, and the idea is to also allow checking BRIN indexes.

Andrew Atkinson
Posted by Andreas 'ads' Scherbaum on 2024-09-16 at 14:00
PostgreSQL Person of the Week Interview with Andrew Atkinson: I'm a programmer, dad, and husband, and live with my family in Minneapolis, Minnesota, a northern state in the USA. I've worked remotely as a software engineer for more than 15 years at big companies like Microsoft and Groupon, and many small and mid-sized startups, scaling up their platform.
