Showing posts with label databases. Show all posts
Showing posts with label databases. Show all posts

Friday, April 21, 2023

Data queries and transformations via WebAssembly plugins

Following on the previous research I've done on using WebAssembly as a way to write plugins possibly in several languages and running them in a Rust app via the Wasmer.io runtime, I've started building something more extensive.

The plugins now allow writing data querying and transformation code without having to deal with the low level details on how to connect to the underlying database. A plugin can:

- define the input parameters it needs to run.

- based on the actual provided values for these parameters, generate a SQL query string and bound parameters that actually need to run on the database. So a plugin can have full control on the SQL generation. Some plugins could always use the same SQL query and just use bound parameters coming from their input parameters, or could do any kind of preprocessing to generate the SQL, for things that bound parameters don't allow.

- the runtime will run the query with the bound parameters on the underlying database, and return the result row by row to the plugin. The plugin can then do whatever processing on each row it needs, and return intermediate results or only return results at the end.

Since I using Wasmer, let's define the interface we need in WAI format:

// Get the metadata of the query.
metadata: func() -> query-metadata

// Start the query processing with the given variables.
start: func(variables: list<variable>) -> execution

// Encapsulates the query row processing.
resource execution {
// The actual query to run.
query-string: func() -> string

// The variables to use in the query.
variables: func() -> list<variable>

// Callback on each data row, returning potential intermediate results.
row: func(data: list<variable>) -> option<query-result>

// Callback on query end, returning potential final results.
// Columns are passed in case no data was returned.
end: func(columns: list<string>) -> option<query-result>
}

The simple data types are left out for brevity (they are defined in their own file), but hopefully the intent of this interface should be clear.  The metadata function returns a description of what the plugin does and what parameters it takes (a parameter is strongly typed). The start function take actual values for parameters and return an execution resource. This execution exposes the actual query string and bound parameters for that query (via the variables method). The runtime will then call the row function for each result row and the end function at the end. Each of these can return a possibly partial result. The end function takes the names of columns so that proper metadata is known even if the query returned no row.

Examples of very basic plugins used in tests can be seen here and here. They just collect the data passed to them and return it in the end method.

Each plugin can then be compiled to WASM for example via the cargo wapm --dry-run command provided by Wasmer.

The current runtime I've built is very simple: it takes all plugins from a folder and database connections are defined in a YAML file, and only Sqlite and Postgres are supported. An executable is provided to be able to run plugins from the command line.

Using a WAI interface and not having to deal with low level WASM code is great. cargo expand is your friend to understand what Wasmer generates as structures are generated differently between the import! and export! macros, so some structures own their data while some take references, which can sometimes trip you up.

Of course I would need to test this for performance, to determine how much copying of data is done between the Rust runtime and the WASM plugins.

Let me know if you have use cases where this approach could be interesting! As usual, all code is available on Github.


Friday, January 23, 2015

Writing a low level graph database

I've been interested in graph databases for a long time, and I've developed several applications that offer an API close enough to a graph API, but with relational storage. I've also played with off the shelf graph databases, but I thought it would be fun to try an implement my own, in Haskell of course.
I found that in general literature is quite succinct on how database products manage their physical storage, so I've used some of the ideas behind the Neo4J database, as explained in the Graph Databases books and in a few slideshows online.
So I've written the start of a very low level graph database, writing directly on disk via Handles and some Binary instances. I try to use fixed length record so that their IDs translate easily into offsets in the file. Mostly everything ends up looking like linked lists on disk: vertices have a pointer to their first property and their first edge, and in turn these have pointers to the next property or edge. Vertex have pointers to edges linking to and from them.
I've also had some fun trying to implement an index trie on disk.
All in all, it's quite fun, even though I realize my implementations are quite naive, and I just hope that the OS disk caching is enough to make performance acceptable. I've written a small benchmark using the Hackage graph of packages as sample data, but I would need to write the same with a relational backend.

If anybody is interested in looking at the code or even participate, everything is of course on Github!