This is the only Postgres schema you need, change my mind

3 points by philmcp 10 hours ago

CREATE TABLE x (

id uuid DEFAULT uuid_generate_v4() PRIMARY KEY,

data jsonb,

inserted timestamp with time zone DEFAULT now(),

updated timestamp with time zone

);

perrygeo 4 hours ago

uuids as a primary key can be a major insert performance bottleneck since they are randomly distributed.

data isn't indexed so queries other than select-by-uuid will be slow (unless you're putting indexes on special keys which is just an ad-hoc schema with extra steps)

data migrations will be painful and require full table scan/rewrite (hope you can afford downtime)

No relationship between any of your data; it's all just independent blobs of data that may or may not be related. No referential integrity means you need another out-of-band process to make sure everything is pointing to valid data.

I get the temptation to nope out of schemas and do schema-on-read. Worked for Mongo, right? (did it?) However, postgres allows an even better option: create an actual schema for your business domain THEN add a jsonb column to every table. If you need to add extra stuff, you can just shove it into the data column. You get all the benefits of a static schema plus an option to jump ship and do JSON in a pinch.

codegeek 6 hours ago

Tell me you have never built a real world production application without telling me you have never built a real world production application.

philmcp 5 hours ago

Doing this on 2 apps
200k users a month, not huge but not nothing
Why do you think it's a bad schema?

bravesoul2 9 hours ago

Depends. Having a real schema can help with performance and correctness guarantees.

speedgoose 6 hours ago

I believe you can make constraints and indexes on jsonb fields.
- bravesoul2 20 minutes ago
  
  Yes you can. Ergonomically that'll end up being harder sometimes. Sometimes it might be easier too! E.g. mostly loose data but you need a user_id field.
  Not sure about performance. I imagine if that user_id field is a column you might get better performance. Not least because by choosing columns in a query you reduce the amount of data to be processed.
  Another benefit of doing it the boring way is tooling. Orms, schema migrations and so on. I work in this space somewhat and think alot about the DX of both key value and relational stores!
  Think of me as a "Just use postgres" and "Also just use dynamo" at the same time :)

roscas 9 hours ago

Care to example some usage cases? And why uuid over auto-increment sequenced number?

speedgoose 6 hours ago

An auto-incremented sequenced number can leak business information that one may not want to leak. Such as number of customers, transactions, documents…
Sometimes it’s fine and the simplicity is worth it if you aren’t dealing with a distributed database, but a random uuid is a better default in my opinion.

sargstuff 7 hours ago

Given schema is limited to where uuid v4 usage is relevant/appropriate.

uuid version 7 more appropriate for keys in high-load databaseses and distributed systems.

Issues if need something other than uuid_v4. aka v8,

snowflake_id bit more compact than separate uuid_v4 & timezone

json, "blob" storage, not efficent for/optimized for search/replace operations. Json blob will need to be normalized every time data cached. File system storage with uuid_v7 index less overhead.

philmcp 5 hours ago

Fair enough, ive not used v7/v8
I stand by the rest however
More pros than cons
- sargstuff 4 hours ago
  
  Access/search for data within json blob is non-sequential/random, kinda defeating whole purpose of using database. Not efficent way to update json if json larger than original size aka cache coherency issues.
  Essentially using a database as a file system[0].
  [0] : postgres fuse file system : https://github.com/petere/postgresqlfs

mooreds 10 hours ago

Use mongoDB without using mongoDB.

philmcp 9 hours ago

10x better than mongo