Taking full advantage of SQLite's extensibility is a centerpiece of my Go driver.
It started because I had to reimplement the OS layer in Go due to tech constraints, but it means you _can_ implement VFSes, UDFs (scalar, aggregate, and window functions), and virtual tables in Go, with reasonable performance and nice APIs.
I also made a point of dogfooding this as much as possible, with a bunch of extensions and a few custom VFSes that use the same APIs available to clients of the library.
https://github.com/ncruces/go-sqlite3/tree/main/ext
https://github.com/ncruces/go-sqlite3/tree/main/vfs#custom-v...
Interesting, I need to take a look at your vfs implementations. Some of my projects could benefit from a transparent zstd vfs layer, but compiling and loading one of the existing C options in a Go project is yucky even if possible.
I've never done it, but for the read-only case, I think all you need to do is combine the existing reader VFS with zstd-seekable-format-go.
https://github.com/ncruces/go-sqlite3/discussions/117#discus...
For read/write, I'm honestly not sure. ZIPVFS is an… erm… architectural mess that only really works because it accesses private SQLite APIs. Which is fine, but history has shown the SQLite team is willing to break those APIs, as they did for the ones they built the SQLite Encryption Extension on.
https://sqlite.org/zipvfs/doc/trunk/www/howitworks.wiki
The zstandard alternative is sqlite_zstd_vfs, which has the same architectural issues. So I'd rather not go there. But it should be doable, as long as you don't need private APIs.
https://github.com/mlin/sqlite_zstd_vfs
The problem with those API breaks is rarely the fact that they got broken; it's the fire drill that happens after they do.
If you get lucky, the two necessary upgrades happen at a time that fits well into your schedule. If you don't get lucky, the SQLite upgrade you need contains a CERT advisory for a zero-day attack, and not only does that not fit into your schedule, it also doesn't fit into the schedule of the person who did the customization.
These are rare events, but over the course of a project, that low probability, compounded across the number of vendors you decide to play that game with, approaches or exceeds 1.00 (>1 meaning 'happened to us twice').
Thanks for the pointers!
I wish there were something like this for WASM. There are WASM builds available, but it's not easy to extend them with your own functions.
You can hypothetically write your entire product in SQL with appropriate bindings to SQLite. For me, SQL with CTEs is a very compelling way to model tricky business logic. Building a SQL-based rules engine is trivial if your host language has a good SQLite library.
This thing starts to grow legs once you realize you can recurse down the rabbit hole by binding something like an Execute_Sql UDF: you can store the actual scripts within the same schema they operate on. Treating your code as data means you can do things like transactional updates of business logic while the system is serving live requests. You also get simple reflection and search over the business logic.
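A minimal sketch of this pattern using Python's sqlite3 module (the `run_script` UDF and the `scripts` table are made-up names for the example, not from any library):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scripts (name TEXT PRIMARY KEY, sql TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
    INSERT INTO orders (total) VALUES (120.0), (80.0);
    -- The "business logic" lives in the same schema it operates on.
    INSERT INTO scripts VALUES
        ('big_orders', 'SELECT count(*) FROM orders WHERE total > 100');
""")

def run_script(name):
    # Look up the stored SQL by name and execute it on the same
    # connection, returning the first column of the first row.
    (sql,) = conn.execute(
        "SELECT sql FROM scripts WHERE name = ?", (name,)
    ).fetchone()
    (value,) = conn.execute(sql).fetchone()
    return value

conn.create_function("run_script", 1, run_script)

print(conn.execute("SELECT run_script('big_orders')").fetchone()[0])
```

Because the scripts are just rows, updating the logic is an ordinary transactional UPDATE on the `scripts` table.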
Is this the same thing as create_function for python?
https://docs.python.org/3/library/sqlite3.html#sqlite3.Conne...
OP here. Yes, it is, except that the only built-in flag it offers is `deterministic`; if you also want `directonly`, you'll probably need the same OR trick.
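For reference, a minimal sketch of the one flag Python does expose (the function name `double` is made up for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Python 3.8+ exposes only SQLITE_DETERMINISTIC, as a keyword argument;
# there is no keyword for SQLITE_DIRECTONLY.
conn.create_function("double", 1, lambda x: x * 2, deterministic=True)

print(conn.execute("SELECT double(21)").fetchone()[0])  # 42
```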
I've nothing to say but “that’s cool”. I want to try this in NodeJS!
I suppose using functions defined by the host with SQLite is cheaper than using similarly defined functions in databases that are separated by a network. I wonder what the overhead is.
Also, what’s with the weird UUIDs? Why not use UUIDv7 if you want time-ordered UUIDs?
> Why not use UUIDv7 if you want time-ordered UUIDs?
It is our flavour of NIH. That said, Tou has a finer-resolution timestamp. We also didn't do our homework properly and assumed v7 UUIDs wouldn't be accepted by Postgres because of the different "version" value.
> I suppose using functions defined by the host with SQLite is cheaper than using similarly defined functions in databases that are separated by a network.
“Infeasible” is very fast. SQLite runs in-process, so you can register a function pointer or five, with a trampoline back into the runtime.
Can't do that over the network; you can create functions, but only in the database's procedural language (or languages, in the case of Postgres).
FWIW, the set of procedural languages in postgres is runtime extensible: https://www.postgresql.org/docs/current/sql-createlanguage.h...
> Also, what’s with the weird UUIDs? Why not use UUIDv7 if you want time-ordered UUIDs?
It looks like the rationale for the Tou library is that some systems do not accept unfamiliar UUID variants as UUIDs, so a time-ordered ID that looks like a UUIDv4 is safer for some legacy systems than a (newer, and less likely to be recognized) UUIDv7.
This. We did discover, however, that Postgres will, in fact, swallow a UUIDv7 just fine. After having written that library :-)
I had assumed it would ingest, but not generate, unknown UUID versions that nonetheless fit the broader UUID structure; but not having tried it, I didn't want to bring that up.
It is much cheaper, because there are no round trips and no requirement that extensions be available on the database server. It's a really sweet capability that SQLite can provide precisely because it is hosted by the application.
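As a rough way to measure the remaining overhead, one can time a built-in SQL function against an equivalent host-language UDF over the same data. A sketch in Python's sqlite3 (numbers will vary by machine; the point is the per-call trampoline cost, not the absolute figures):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# A Python UDF that mirrors SQLite's built-in upper().
conn.create_function("py_upper", 1, str.upper, deterministic=True)

conn.execute("CREATE TABLE t (s TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("hello",)] * 100_000)

# Same query twice: once with the built-in, once with the UDF.
for fn in ("upper", "py_upper"):
    start = time.perf_counter()
    (n,) = conn.execute(f"SELECT count(DISTINCT {fn}(s)) FROM t").fetchone()
    print(fn, n, round(time.perf_counter() - start, 4))
```

Even the slower UDF path is orders of magnitude cheaper than a network round trip per call.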
> Also, what’s with the weird UUIDs? Why not use UUIDv7 if you want time-ordered UUIDs?
I’m not the author but UUIDv7 came out in about 2022. Guessing this is legacy stuff that long predated that. There were lots of solutions to solve this problem before there was a standard.
> Guessing this is legacy stuff that long predated that.
I’m guessing it’s not, since the tou library seems to be 8 months old and mentions avoiding the need for extensions when used with Postgres as an advantage over UUIDv7.
Great, thanks
better-sqlite3 has you covered!
better-sqlite3 is (was?) my favorite sqlite3 lib, it is incredibly fast.
NodeJS has SQLite support these days!
https://nodejs.org/api/sqlite.html
Interestingly, it is NOT async, just like better-sqlite3. I wonder why. I've been looking for any public remarks about it, but found nothing.
These may be useful reading:
https://github.com/TryGhost/node-sqlite3/issues/408#issue-57...
https://github.com/WiseLibs/better-sqlite3/issues/32#issueco...
Copying a quote from the second:
> The sqlite3 C API serializes all operations (even reads) within a single process. You can parallelize reads to the database but only by having multiple processes, in which case one process being blocked doesn't affect the other processes anyways. In other words, because sqlite3 serializes everything, doing things asynchronously won't speed up database access within a process. It would only free up time for your app to do other things (like HTTP requests to other servers). Unfortunately, the overhead imposed on sqlite3 to serialize asynchronous operations is quite high, making it disadvantageous 95% of the time.
The sqlite3 C API very much does not serialize "all operations within a single process."
The way threading and concurrency work in SQLite may not mesh well with NodeJS's concurrency model. I dunno, I'm not a NodeJS/libuv expert.
But at the C API level that statement is just wrong. Normally you cannot share a single connection across threads. If you compile SQLite to allow this, yes, it'll serialize operations using locks. The solution is to create additional database connections, not (necessarily) launch another process. With multiple database connections, you can have concurrency, with or without threads.
https://sqlite.org/threadsafe.html
Again, whether this is viable in NodeJS, I have no idea. But it's a Node issue, not a C API issue.
BTW, we're commenting on a Ruby article, and SQLite in Ruby has seen "recent" advances that increase concurrency by implementing SQLite's busy handler in Ruby, which allows the GVL to be released so other Ruby and SQLite code can run while waiting on a busy connection.
https://fractaledmind.github.io/2023/12/11/sqlite-on-rails-i...
The user-supplied busy handler has been available for a long while; it's just that the Rails connection adapters didn't quite use it right. Indeed, there is elevated interest in SQLite these days.
Tnx for the links!
That's nice! I use Bun, but its lack of UDF support makes me consider switching back to Node.
I'm starting to second-guess my choice of Deno.
The options for SQLite there are either unstable APIs or limited by the particulars of WASM.
They're going to add it once it stabilizes in Node: https://github.com/denoland/deno/issues/24828#issuecomment-2...
Also available with Lua: http://lua.sqlite.org/index.cgi/doc/tip/doc/lsqlite3.wiki#db...
Also part of Python's standard implementation of sqlite3. [0]
[0] https://docs.python.org/3/library/sqlite3.html#sqlite3.Conne...
With any decent host runtime, as it were.
There are also some good libs for SQLite. This is the standard set that I use. https://github.com/nalgeon/sqlean
That is pretty cool. And good to see you still doing talks.
I've recently started working with BigQuery and heard you can do UDFs in JavaScript. Good to know this is a thing in more databases. I haven't needed UDFs up to this point, but now that I know a bit more about them, I just might, as JS happens to be a strong language of mine (though I'm not currently using it professionally).
We actually use a couple Ruby UDFs with BigQuery, deployed as Google Cloud Functions. Works pretty well.
What an interesting find
Yes, I remember using that in PHP; it was very handy before SQLite's JSON support, since it also worked with aggregate functions.
https://www.php.net/manual/en/sqlite3.createfunction.php
Also available through PDO:
https://www.php.net/manual/en/pdo-sqlite.createfunction.php
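The same aggregate trick works in Python's sqlite3 via create_aggregate. A sketch of the kind of hand-rolled JSON aggregate people used before json_group_array existed (the class and function names here are made up for the example):

```python
import json
import sqlite3

class JsonGroup:
    """Collects a column's values into a JSON array."""
    def __init__(self):
        self.items = []

    def step(self, value):
        # Called once per row in the group.
        self.items.append(value)

    def finalize(self):
        # Called once at the end to produce the aggregate result.
        return json.dumps(self.items)

conn = sqlite3.connect(":memory:")
conn.create_aggregate("json_group", 1, JsonGroup)
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

print(conn.execute("SELECT json_group(x) FROM t").fetchone()[0])  # [1, 2, 3]
```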
from the comments:
https://docs.python.org/3/library/sqlite3.html#sqlite3.Conne... http://lua.sqlite.org/index.cgi/doc/tip/doc/lsqlite3.wiki#db... https://nodejs.org/api/sqlite.html#databasefunctionname-opti... https://www.php.net/manual/en/sqlite3.createfunction.php