The Great PostgreSQL Collation Conundrum: A Tale of Alphabetical Anarchy

I recently faced a puzzling issue with PostgreSQL that left me scratching my head. What should have been a quick half-point story turned into a full story point task..

The Unexpected Sorting Surprise

After pushing a new sorting feature to the CI server, I noticed something odd. This was a Django application, and at first, the behavior was really mysterious.

But after some time I found that the problem was in how ORDER BY worked locally and on server.

On my local machine, when I ran:

SELECT * FROM users ORDER BY name;

The result was:

a
A
b
B

But on the server, I saw this instead:

A
B
a
b

Same query, different results. What was going on?

The Culprit: Collation Differences

At this point, it was easy to guess the root cause: collation settings. My local machine was using a case-insensitive collation that followed dictionary order, while the production server was using the default “C” locale, which sorts by ASCII codes.

Attempted Fix: Specifying Collation

My first thought was to specify the collation for the column:

ALTER TABLE users ALTER COLUMN name TYPE VARCHAR COLLATE "en_US.utf8";

This approach should ensure consistent sorting for the column across all queries. However, I hit a snag…

The Server Complication

It turned out our production PostgreSQL server didn’t support the “en_US.utf8” collation I was trying to use.

This was unexpected and threw a wrench in my initial plan.

The Solution: LOWER to the Rescue

Ok different approach. I used LOWER in the ORDER BY clause:

SELECT * FROM users ORDER BY LOWER(name);

This method achieves case-insensitive sorting. It’s non-deterministic, but it works for my use case.

An additional benefit is that we don’t have to alter the table structure.

To make this solution more efficient, I created an index:

CREATE INDEX ON users (LOWER(name));

This index significantly improves query performance when using LOWER in the ORDER BY clause.