Can Postgres cover your NoSQL needs?



SymfonyCon, Vienna, Austria - December, 2024

© David Buchmann







David Buchmann - david@liip.ch

PHP Engineer, Liip AG, Switzerland

What does NoSQL Even Mean?

Even Wikipedia only offers a negative definition:
«A database that focuses on data that is modeled in means other than tabular relations»

What do you want from NoSQL?

Repository

Repository at github.com/dbu/postgres-json




SQL JSON syntax

Define column as JSON

CREATE TABLE auction_jsonb (
    id SERIAL NOT NULL,
    title VARCHAR(255) NOT NULL,
    current_price INT DEFAULT NULL,
    item JSONB NOT NULL,
    PRIMARY KEY(id)
);
            

heap.io blog

JSON vs JSONB

JSON: Untyped, stored raw JSONB: Stored structured
  • Preserve whitespace
  • Preserve object key order
  • Can handle invalid JSON, preserve duplicate keys
  • Parsed JSON
  • Faster for querying

=> Use JSONB unless you rely on non-standard behaviour.

item->'author' = '"David"'

or, 2.5 times faster

item->>'author' = 'David'


JSON data or value?

>
The JSON information
>>
Extract primitive type

Filtering on JSON data

SELECT COUNT(*)
FROM auction_jsonb
WHERE item->>'author' = 'David';
            
Rows json jsonb
10k 136 31
100k 319 46
1M 1'280 157
10M 15'505 2'768

Query time in milliseconds

Let's add an index

CREATE INDEX json_author
ON auction_json_indexed ((item->>'author'));
            
Rows json json indexed jsonb jsonb indexed
10k 136 25 31 20
100k 319 12 46 13
1M 1'280 3 157 3
10M 15'505 7 2'768 6

Query time in milliseconds

Side note: Tuning the query planner

At 10M rows, postgres stopped using the index and went for full table scan

-- Value <= 1.15 makes postgres use the index
ALTER TABLESPACE pg_default
SET (random_page_cost=1.1);
            

Thanks Frank Heikens on stackoverflow for hinting random_page_cost

We can also extract a column

CREATE TABLE auction_generated_fields (
    id SERIAL NOT NULL,
    author VARCHAR(255) generated always
        as (item->>'author') stored,
    item JSONB NOT NULL, PRIMARY KEY(id));
            
Rows jsonb indexed generated generated indexed
10k 20 21 18
100k 13 30 10
1M 3 88 3
10M 6 786 4

Query time in milliseconds

Side note: Extract a date (thanks jamie@zomglol.wtf)

CREATE FUNCTION text_to_timestamp(text) RETURNS TIMESTAMP
LANGUAGE sql IMMUTABLE AS
$$
SELECT CASE
WHEN $1 ~ '^\d{4}-\d{2}-\d{2}[ |T]\d{2}:\d{2}:\d{2}(\.\d+)?(\+00:00)?$'
    THEN CAST($1 AS timestamp without time zone)
END
$$;

CREATE TABLE auction_generated_fields (
    id SERIAL NOT NULL,
    start_date TIMESTAMP(0) generated always
        AS (text_to_timestamp(item->>'startDate')) stored,
    item JSONB NOT NULL, PRIMARY KEY(id));
            

Too many similar options

No sharing of indexes.
=> Decide which construct you use




Complex search in JSON

contains
item @> '{"author": "David"}'

key exists
item ? 'author'
jsonb_exists(item, 'author')


any key exists
item ?| array['author','foo'] jsonb_exists_any(item, array['a..', 'f..']

all keys exist
item ?& array['author','title']
jsonb_exists_all(item, array['a..', 'f..']


JSON object contains object (jsonb only)

CREATE INDEX auction_json_gin_idx
ON auction_jsonb_gin USING GIN (item);

SELECT COUNT(*)
FROM auction_jsonb_gin
WHERE item @> '{"author": "David"}';
            
Rows ->> indexed @> no index @> with index @> with GIN
10k 20 5 5 5
100k 13 20 19 5
1M 3 144 149 13
10M 6 1'239 1'235 76

Query time in milliseconds

Generalized Inverted Index (GIN)


Speed up arbitrary JSON contains queries, but not other queries

Side note: value distribution

Data distribution of the indexed column makes a huge difference

Variants json json idx jsonb jsonb idx GIN >> GIN @ gen gen idx
10 12'372 784 1'895 676 1'882 1'364 1'030 44
1'000 12'424 6 1'583 6 1'260 58 931 3
100'000 13'197 3 1'710 3 1'501 5 838 3

Response times in milliseconds, table with 10 million rows

Insert Performance

Jsonb, indexes and generated fields do not come for free

Rows json json idx jsonb jsonb idx GIN gen gen idx
10k 60 90 90 120 130 94 143
100k 550 740 820 990 2'110 890 1'000
1M 5'150 8'500 7'670 11'340 24'000 8'440 11'773
10M 55'000 96'000 97'000 115'000 256'000 84'000 116'000

Inserting dummy rows, duration in milliseconds

The impact of data size




Update JSON

Partial Update

Manipulate JSON instead of replacing it completely

-- update, ignore if not exists
UPDATE auction_jsonb
SET item = jsonb_set(item, '{author}', '"value"', false)
WHERE id=42;

-- update or create
UPDATE auction_jsonb
SET item = jsonb_set(item, '{author}', 'value', true)
WHERE id=42;
            

Postgres is replacing the whole JSON, query is not faster

Delete fields in JSON

-- delete attribute
UPDATE auction_jsonb
SET item = item - 'key'
WHERE ID=42;

-- delete attribute alternate
UPDATE auction_jsonb
SET item = jsonb_set_lax(item, '{author}', null, true, 'delete_key')
WHERE ID=42;

-- delete multiple
item - array['key1', 'key2']
            

Versatile set_jsonb_lax

raise_exceptionFail if value is null
use_json_nullSet JSON field to null (default)
delete_keyDelete the key on null value
return_targetLeave previous value when new is null
set_jsonb_lax(item, '{author}', null, true, 'return_target');
            

JSON schema validation

Some other constructs




Doctrine ORM

Declare column

#[ORM\Column(type: Types::JSON)]
private array $item;

#[ORM\Column(type: Types::JSON, options: ['jsonb' => true])]
private array $item;
            

Index can not be defined in Doctrine.
You have to manually add them to the migration.

$this->addSql('CREATE INDEX json_author ON auction_json_indexed ((item->>\'author\'))');
            

Define a generated field

#[ORM\Column(
    length: 255,
    nullable: true,
    insertable: false,
    updatable: false,
    columnDefinition: "VARCHAR(255) generated always as (item->>'author') stored",
    generated: "ALWAYS"
)]
private ?string $author = null;
            

Note: the column definition is native SQL, not DQL

DQL does not support JSON syntax

You can create native queries with Doctrine DBAL

Or use scienta/doctrine-json-functions

# config/packages/doctrine.yaml
doctrine:
  orm:
    dql:
      string_functions:
        JSON_GET_TEXT: Scienta\...\Postgresql\JsonGetText
        JSONB_CONTAINS: Scienta\...\Postgresql\JsonbContains

            

Register boolean functions as string and compare with = true

->> as DQL function

$qb = $this->createQueryBuilder('a');
$qb->select('COUNT(a)')
    // Need to qualify with a. in front of item
    ->where("JSON_GET_TEXT(a.item, 'author') = :author")
    ->setParameter('author', $authorName)
;

return $qb->getQuery()->getSingleScalarResult();

@> as DQL function

$qb = $this->createQueryBuilder('a');
$connection = $this->getEntityManager()->getConnection();
// Could not figure out how to pass author as parameter.
// Not sure if escaping is safe,
// needs to be verified for direct user input
$author = str_replace('"', '\"', $authorName);
$author = trim($connection->quote($author), "'");
$qb->select('COUNT(a)')
    ->where("JSONB_CONTAINS(a.item, '{\"author\": \"$author\"}') = true")
;

return $qb->getQuery()->getSingleScalarResult();
            

Doctrine/PHP Overhead

Rows jsonb dbal dbal, batch 1k entities
10k 60 4519 163 6290
100k 550 49964 1587 53823

MySQL / MariaDB




Conclusions



Thank you!


github.com/dbu/postgres-json


Mastodon: @dbu@phpc.social

David Buchmann, Liip AG




Scaling/Replicating postgres

Workarounds

Read replicas

Logical replicas

See pgdaash.io

json_each(item)

jsonb_each(item)

SELECT *
FROM jsonb_each
((SELECT item FROM auction_jsonb WHERE id=1));