PostgreSQL and MySQL can do JSON too

Drupal Mountain Camp, Davos, Switzerland - 12. March 2025

© David Buchmann

David Buchmann - david@liip.ch

PHP Engineer, Liip AG, Switzerland

Why store JSON in a relational database?

Store unstructured or semi-structured data
Do not bother to denormalize everything when not necessary
Your NoSQL database at home?

Repository

Repository at github.com/dbu/postgres-json
and github.com/dbu/postgres-json/pull/2 for MySQL

SQL JSON syntax

Define column as JSON

CREATE TABLE auction_jsonb (
    id SERIAL NOT NULL,
    title VARCHAR(255) NOT NULL,
    current_price INT DEFAULT NULL,
    item JSONB NOT NULL,
    PRIMARY KEY(id)
);

Higher storage consumption, repeated keys are not optimized.
If value occurs more than 1 / 80 rows, column uses less space
Querying and joins might be less optimized than for fields

heap.io blog

JSON vs JSONB (Postgres only)

JSON: Untyped, stored raw	JSONB: Stored structured
Preserve whitespace Preserve object key order Can handle invalid JSON, preserve duplicate keys	Parsed JSON Faster for querying

=> Use JSONB unless you rely on non-standard behaviour.

Read values

Result	Postgres	MySQL
'"David"'	item->'author'	item->'$.author'
'David'	item->>'author'	item->>'$.author'

Postgres: ->> is 2.5 times faster
MySQL: Query with -> need to CAST('"David"' AS JSON)

JSON data or value?

>
The JSON information

>>
Extract primitive type

Filtering on JSON data

SELECT COUNT(*)
FROM auction_jsonb
WHERE item->>'author' = 'David';
# WHERE item->>'$.author' = 'David';

Rows	json	jsonb	MySQL
10k	136	31	21
100k	319	46	148
1M	1'280	157	1'225
10M	15'505	2'768	11'758

Query time in milliseconds

Let's add an index

CREATE INDEX json_author
ON auction_json_indexed ((item->>'author'));
# ON auction_idx ((CAST(item->>'$.author' AS CHAR(255)));

Rows	json	json indexed	jsonb	jsonb indexed	mysql	mysql indexed
10k	136	25	31	20	21	9
100k	319	12	46	13	148	15
1M	1'280	3	157	3	1225	146
10M	15'505	7	2'768	6	11'758	1'276

Query time in milliseconds

# mysql needs to be queried with cast to use index
CAST(item->>'$.author' AS CHAR(255)) = ...

We can also extract a column

CREATE TABLE auction_generated_fields (
    id SERIAL NOT NULL,
    author VARCHAR(255) generated always
        as (item->>'author') stored,
        # as (item->>'$.author') stored,
    item JSON NOT NULL, PRIMARY KEY(id));

Rows	jsonb indexed	gen	gen indexed	mysql gen	mysql gen indexed
10k	20	21	18	15	11
100k	13	30	10	55	8
1M	3	88	3	741	20
10M	6	786	4	7'097	21

Query time in milliseconds

Complex search in JSON

Searching

Use	Postgres	MySQL
Contains	item @> '{"author": "David"}'	JSON_CONTAINS(item, '{"author": "David"}')
Path Exists	item ? 'author jsonb_exists(item, 'author')	JSON_CONTAINS_PATH(item, 'one', '$.author')
Any key	item ?\| array['author','foo'] jsonb_exists_any(item, array[...]	JSON_CONTAINS_PATH(item, 'one', 'a..', 'f..')
All keys	item ?& array['author','title'] jsonb_exists_all(item, array[...]	JSON_CONTAINS_PATH(item, 'all', 'a..', 'f..')

JSON object contains object

CREATE INDEX auction_json_gin_idx
ON auction_jsonb_gin USING GIN (item);

SELECT COUNT(*)
FROM auction_jsonb_gin
WHERE item @> '{"author": "David"}';

Rows	->> indexed	@> no index	@> with index	@> with GIN	MySQL
10k	20	5	5	5	29
100k	13	20	19	5	117
1M	3	144	149	13	1'478
10M	6	1'239	1'235	76	13'866

Query time in milliseconds

Postgres: Generalized Inverted Index (GIN)

Speed up arbitrary JSON contains queries, but not other queries

No equivalent in MySQL

Side note: value distribution

Data distribution of the indexed column makes a huge difference for Postgres

Variants	json	json idx	jsonb	jsonb idx	GIN >>	GIN @	gen	gen idx
10	12'372	784	1'895	676	1'882	1'364	1'030	44
1'000	12'424	6	1'583	6	1'260	58	931	3
100'000	13'197	3	1'710	3	1'501	5	838	3

Response times in milliseconds, table with 10 million rows

Insert Performance

Jsonb, indexes and generated fields do not come for free

Rows	json idx	jsonb idx	GIN	gen idx	MySQL	MySQL idx	MySQL gen idx
10k	90	120	130	143	240	330	480
100k	740	990	2'110	1'000	1'600	2'610	2'840
1M	8'500	11'340	24'000	11'773	15'410	27'110	28'650
10M	96'000	115'000	256'000	116'000	216'480	275'060	297'660

Inserting dummy rows, duration in milliseconds

The impact of data size

Postgres/MySQL processing time scales linear with size of JSON
Storage consumption scales linear too (no optimization)
With many rows, GIN index becomes larger than the table itself

Update JSON

Partial Update: Postgres

Manipulate JSON instead of replacing it completely

-- update, ignore if not exists
jsonb_set(item, '{author}', '"value"', false)

-- update or create
jsonb_set(item, '{author}', 'value', true)

-- create, ignore if already exists
jsonb_insert(item, '{author}', 'value')
-- optional insert_after flag to append elements into array

Postgres is replacing the whole JSON, query is not faster

Partial Update: MySQL

Manipulate JSON instead of replacing it completely

-- update, ignore if not exists
JSON_REPLACE(item, '$.author', '"value"')

-- update or create
JSON_SET(item, '$.author', '"value"')

-- create, ignore if already exists
JSON_INSERT(item, '$.author', '"value"')

MySQL doc claims engine can optimize to not rebuild whole JSON object, but I was not able to observe velocity difference

Delete fields in JSON

-- delete attribute
item - 'key'

-- delete attribute alternate
jsonb_set_lax(item, '{author}', null, true, 'delete_key')

-- delete multiple
item - array['key1', 'key2']

-- delete in MySQL (takes any number of path parameters)
JSON_REMOVE(item, '$.author', '$.other')

Versatile set_jsonb_lax (Postgres only)

raise_exception	Fail if value is null
use_json_null	Set JSON field to null (default)
delete_key	Delete the key on null value
return_target	Leave previous value when new is null

set_jsonb_lax(item, '{author}', null, true, 'return_target');

JSON schema validation

Postgres: Needs custom extension
(pg_jsonschema / postgres-json-schema)
MySQL has built-in JSON_SCHEMA_VALID function, can be defined as CHECK
When you serialize PHP models, do you need validation?
Validation discards flexibility in favor of reliability

Some other constructs

Postgres	MySQL	Use
`\|\|`	`JSON_MERGE_PATCH()`	concat array / merge objects
`jsonb_build_object()`	JSON_OBJECT()	Alternate key and value
jsonb(...)	CAST('...' as JSON)	Parse from string
jsonb_serialize()	JSON_PRETTY or CAST	JSON to string
jsonb_each	Not supported	Expand top-level object key-value into rows

Conclusions

Standardization?

Similar functionality, but lots of small differences
SQL:2016 defines JSON functionality on string fields
SQL:2023 defines JSON data type and operations
MySQL uses a dedicated type for JSON, MariaDB not
Operators like ->> are not standard
(MySQL supports them but different, MariaDB has no support)

Postgres can handle JSON quite well
MySQL mostly same functionality, performance is a lot worse
MariaDB has less support
Harder to support Postgres and MySQL with the same code
Caveats - but so does learning a NoSQL database
Use jsonb with Postgres
Essential data better in dedicated columns

Performance depends a lot on how your data looks
Test on real or at least realistic data in realistic quantities!
Don't take things for granted (ever, with databases)
If you have less than 10k rows: Do whatever you want
Scaling Postgres or MySQL is possible, but first look into configuration and optimizing queries
If you need frequent or complex searches, use a dedicated search engine like Elasticsearch or look into MongoDB

Thank you!

github.com/dbu/postgres-json

Mastodon: @dbu@phpc.social

David Buchmann, Liip AG

json_each(item)

jsonb_each(item)

SELECT *
FROM jsonb_each
((SELECT item FROM auction_jsonb WHERE id=1));

Postgres has (too many) similar options

item->author
JSON_QUERY(item, '$.author')
item #> '{author}'
item['author']

No sharing of indexes.
=> Decide which construct you use