Troubleshooting SQL queries with omni_id

Troubleshooting SQL queries with omni_id

Once you have a complex SQL query, sometimes the query doesn't fail, but it returns unexpected results.

Something is going on with it, but it's hard to tell what exactly, and SQL and databases don't always help you figure that out. Let's look at this fictitious scenario: we have a users table, a products table, and an orders table. Every order contains items, and we have reviews for the products.

create table users
(
    id         serial primary key,
    username   varchar(50)  not null,
    email      varchar(100) not null,
    created_at timestamp default current_timestamp
);

create table products
(
    id           serial primary key,
    product_name varchar(100)   not null,
    price        numeric(10, 2) not null,
    stock        int            not null,
    created_at   timestamp default current_timestamp
);

create table orders
(
    id           serial primary key,
    user_id      int references users (id),
    order_date   timestamp default current_timestamp,
    total_amount numeric(10, 2) not null
);

create table order_items
(
    id         serial primary key,
    order_id   int references orders (id),
    product_id int references products (id),
    quantity   int            not null,
    price      numeric(10, 2) not null
);

create table reviews
(
    id          serial primary key,
    user_id     int references users (id),
    product_id  int references products (id),
    rating      int check (rating between 1 and 5),
    review_text text,
    created_at  timestamp default current_timestamp
);

Let's add some data!


insert into users (username, email)
values ('john_doe', 'john@example.com'),
       ('jane_smith', 'jane@example.com'),
       ('alice_jones', 'alice@example.com');

insert into products (product_name, price, stock)
values ('Laptop', 1200.00, 10),
       ('Smartphone', 800.00, 20),
       ('Tablet', 500.00, 15),
       ('Headphones', 150.00, 30);

insert into orders (user_id, total_amount)
values (1, 1950.00),
       (2, 800.00),
       (3, 650.00);

insert into order_items (order_id, product_id, quantity, price)
values (1, 1, 1, 1200.00), -- Laptop
       (1, 4, 5, 150.00),  -- Headphones
       (2, 2, 1, 800.00),  -- Smartphone
       (3, 3, 1, 500.00),  -- Tablet
       (3, 4, 1, 150.00);  -- Headphones

insert into reviews (user_id, product_id, rating, review_text)
values (1, 1, 5, 'Excellent laptop!'),
       (2, 2, 4, 'Great smartphone, but a bit pricey.'),
       (3, 3, 3, 'Tablet is okay, but could be better.'),
       (1, 4, 5, 'Fantastic headphones for the price.'),
       (2, 4, 4, 'Good quality sound.');

Finally, let's look at the query we might be having trouble with.

select distinct o.id                                   as order_id,
                o.order_date,
                u.username,
                u.email,
                p.product_name,
                oi.quantity,
                oi.price,
                (oi.quantity * oi.price)               as total_product_cost,
                count(oi.id) over (partition by o.id)  as total_items,
                avg(r.rating) over (partition by p.id) as average_rating
from orders o
         join
     users u on o.user_id = u.id
         join
     order_items oi on o.id = oi.order_id
         join
     products p on oi.order_id = p.id
         left join
     reviews r on p.id = r.product_id
order by o.id, p.product_name

This query retrieves information about orders, including what people bought, how much they spent, the number of items per order, and the average rating for each product. Let's run it.

It looks fine at first, but if you examine it closely, you'll notice something is off. For example, there is a laptop with a quantity of 1 and another entry for the same laptop with a quantity of 5. What does that mean? The data we populated does not match. So, something is wrong with the query.

This is where the recently released omni_id extension can help! Let's explore it.

select identity_type(name)
from unnest('{user_id, product_id, order_id, order_item_id, review_id}'::text[]) t(name);

create table users
(
    id         user_id primary key default user_id_nextval(),
    -- rest is the same
);


create table orders
(
    id           order_id primary key default order_id_nextval(),
    user_id      user_id references users (id),
    -- rest is the same
);

create table products
(
    id           product_id primary key default product_id_nextval(),
    -- rest is the same
);

create table order_items
(
    id         order_item_id primary key default order_item_id_nextval(),
    order_id   order_id references orders (id),
    product_id product_id references products (id),
    -- rest is the same
);

create table reviews
(
    id          review_id primary key default review_id_nextval(),
    user_id     user_id references users (id),
    product_id  product_id references products (id),
    -- rest is the same
);

So, we define the user_id, product_id, order_id, order_item_id, and review_id types. Then, we use these types in the respective table definitions. We also refer to them using these same types for foreign keys.
So, if we rebuild the schema with these tables, we can go back to the query and try to figure out what happened.

ERROR: operator does not exist: order_id = product_id

This is what happened! We are comparing the order ID with the product ID, and they don't compare as they are independent types. Okay, let's try to fix this:

select distinct /* ... */
from orders o
         join
     users u on o.user_id = u.id
         join
     order_items oi on o.id = oi.order_id
         join
     --- BELOW: changed oi.order_id to oi.product_id
     products p on oi.product_id = p.id 
         left join
     reviews r on p.id = r.product_id
order by o.id, p.product_name;

And now the data looks correct—we have five headphones as in the original data and just one laptop:

So, we just saved ourselves a lot of time trying to figure out what went wrong.

So, what does omni_id do? It creates a new integer type for each relation identity and prevents different types from being compared to each other. This concept is well-known in programming languages, called "newtype." That's exactly what we did here for Postgres—it performs just as well as normal integer keys and only requires PL/pgSQL to create the type.

This little trick shows how important (and useful) it is to enforce the correctness of data and queries at the model level.


If you prefer a video format, check out the Loom below!