PGH Web


JSONB in PostgreSQL: When It's the Right Answer and When It Isn't

Jeff Straney

I've seen JSONB used well once and used to avoid schema design three times. The three avoidance cases had the same pattern: the data was genuinely relational, the team didn't want to argue about the schema, and JSONB let them defer the argument indefinitely. By the time the performance problems arrived, the team had moved on and someone else got to deal with it.

The well-used case was a product that stored user-defined form fields. The schema of the data was genuinely unknown at design time. Different customers defined different fields. Querying the specific fields a given customer defined was the whole point. That is the problem JSONB exists to solve.

The actual problem JSONB solves

JSONB is the right answer when the schema of your data is genuinely variable and you need to query it efficiently.

"Genuinely variable" means the structure differs between rows in a way you cannot predict at design time. Not "we might add more fields later" (add a column). Not "different record types have slightly different attributes" (use a joined attributes table or separate tables). Genuinely variable means you cannot enumerate the possible fields without talking to the customer who defines them.

"Query it efficiently" is the part that changes when you hit the ceiling. JSONB supports GIN indexes that allow you to query the contents of JSON documents. If you need to find all rows where a JSONB column contains {"status": "active"}, a GIN index can do that. It is not as fast as an index on a dedicated column, but it works.

How GIN indexes work and what they cost

A GIN index (Generalized Inverted Index) works by building an index of every key and value in every JSONB document. When you query data @> '{"status": "active"}', the GIN index finds all documents that contain that key-value pair.
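As a sketch of what that looks like in practice (the submissions table and data column are assumptions for illustration):

```sql
-- Hypothetical table for illustration.
CREATE TABLE submissions (
    id   BIGSERIAL PRIMARY KEY,
    data JSONB NOT NULL
);

-- Default operator class (jsonb_ops): indexes every key and value.
CREATE INDEX submissions_data_gin ON submissions USING GIN (data);

-- Smaller alternative: jsonb_path_ops indexes hashed paths instead of
-- individual keys and values. It supports containment (@>) but not the
-- key-existence operators (?, ?|, ?&).
-- CREATE INDEX submissions_data_gin ON submissions
--     USING GIN (data jsonb_path_ops);

-- A containment query the GIN index can serve.
SELECT id FROM submissions WHERE data @> '{"status": "active"}';
```

The jsonb_path_ops variant is worth knowing about: it trades the key-existence operators for a noticeably smaller index, which matters on the write-heavy tables discussed below.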

The write cost of a GIN index is higher than a B-tree index. Every insert or update that touches the JSONB column has to update the index entries for every key and value in the document. On a write-heavy table with large JSONB documents, this adds up.

Index size is also larger than you might expect. A GIN index on a JSONB column with documents that have twenty keys will be substantially larger than a B-tree index on a single column, because it indexes every key and value separately.

For read-heavy workloads with moderate write volume and reasonably compact documents, this is acceptable. For write-heavy workloads or documents with hundreds of keys, it becomes a real cost worth measuring.

The performance ceiling

JSONB queries that look simple get expensive at scale. Querying a specific key across millions of rows is one thing. Joining on a JSONB value, aggregating on a JSONB key, or filtering on multiple nested keys pushes the planner into territory where JSONB's flexibility becomes a liability.

The reason is that the planner does not have column-level statistics on JSONB contents the way it does on regular columns. It can't estimate selectivity on data->>'status' = 'active' with the same accuracy it can on status = 'active'. The estimates are cruder, the plans are less reliable, and ANALYZE has less to work with.
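One partial mitigation, sketched here under the same assumed table and column names: PostgreSQL collects statistics on the expressions of expression indexes during ANALYZE, so a B-tree expression index on the extracted key gives the planner real selectivity estimates without changing the table.

```sql
-- B-tree expression index on the extracted text value.
-- ANALYZE gathers statistics for indexed expressions, so the planner
-- can estimate the selectivity of data->>'status' far more accurately
-- than it can for an arbitrary JSONB path.
CREATE INDEX submissions_status_expr ON submissions ((data->>'status'));
ANALYZE submissions;

-- This predicate can now use both the index and its statistics.
SELECT count(*) FROM submissions WHERE data->>'status' = 'active';
```

This is a half-step toward the column extraction described next; it fixes the statistics problem for one expression but still leaves the value buried in the document.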

If you find yourself writing complex queries against JSONB columns and wondering why they're slow, this is usually why. The data is there, the index is there, but the planner is working with less information than it would have with dedicated columns.

When to extract a column

If a JSONB key appears in queries frequently, extract it to a column. This is not a failure of the original design. It's JSONB working as intended: you started with genuinely variable data, discovered that one attribute is accessed consistently, and promoted it.

ALTER TABLE submissions ADD COLUMN status TEXT 
  GENERATED ALWAYS AS (data->>'status') STORED;

CREATE INDEX ON submissions (status);

The generated column approach keeps the original document intact while giving the planner a real column to work with. The index is B-tree, statistics are accurate, and queries on status are now as fast as any other indexed column.
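Usage, under the same assumed table: queries filter on the plain column, and the generated column stays in sync automatically because PostgreSQL recomputes it whenever data changes.

```sql
-- Filters and joins now reference the real column, not the JSONB path.
SELECT id, data
FROM submissions
WHERE status = 'active';

-- EXPLAIN on this query should show a scan of the B-tree index on
-- status rather than a GIN lookup or a sequential scan.
```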

The case you should actually use it for

User-defined fields, form submissions with variable schemas, configuration data that different customers define differently, product attributes in a catalog where product types have incompatible attribute sets. These are cases where the schema genuinely cannot be fixed at design time and the alternative is a schemaless key-value table, which has its own problems.
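A sketch of the well-used case from the intro, user-defined form fields, where the field names themselves are customer data (all table and field names here are assumptions):

```sql
-- Each customer defines their own form fields, so the shape of the
-- fields document genuinely varies per row and cannot be enumerated
-- at design time.
CREATE TABLE form_submissions (
    id          BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL,
    fields      JSONB NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX form_submissions_fields_gin
    ON form_submissions USING GIN (fields);

-- "Find this customer's submissions where their custom field
-- shirt_size is XL" -- a field name no schema could have anticipated.
SELECT id, fields
FROM form_submissions
WHERE customer_id = 42
  AND fields @> '{"shirt_size": "XL"}';
```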

If you find yourself reaching for JSONB because you're not sure what columns you need, slow down. Spend an hour on the schema. That hour costs far less than the months you'll spend later when the queries are slow and the data model is baked into three years of application code.

JSONB is a good answer to a specific question. The question is: does the schema of this data vary in a way I cannot predict? If the answer is yes, use it. If the answer is "we're not sure yet," that's a design meeting, not a reason for a schema escape hatch.