Friday, April 21, 2023

Data queries and transformations via WebAssembly plugins

Following on the previous research I've done on using WebAssembly as a way to write plugins possibly in several languages and running them in a Rust app via the Wasmer.io runtime, I've started building something more extensive.

The plugins now allow writing data querying and transformation code without having to deal with the low level details on how to connect to the underlying database. A plugin can:

- define the input parameters it needs to run.

- based on the actual provided values for these parameters, generate a SQL query string and bound parameters that actually need to run on the database. So a plugin can have full control on the SQL generation. Some plugins could always use the same SQL query and just use bound parameters coming from their input parameters, or could do any kind of preprocessing to generate the SQL, for things that bound parameters don't allow.

- the runtime will run the query with the bound parameters on the underlying database, and return the result row by row to the plugin. The plugin can then do whatever processing on each row it needs, and return intermediate results or only return results at the end.

Since I using Wasmer, let's define the interface we need in WAI format:

// Get the metadata of the query.
metadata: func() -> query-metadata

// Start the query processing with the given variables.
start: func(variables: list<variable>) -> execution

// Encapsulates the query row processing.
resource execution {
// The actual query to run.
query-string: func() -> string

// The variables to use in the query.
variables: func() -> list<variable>

// Callback on each data row, returning potential intermediate results.
row: func(data: list<variable>) -> option<query-result>

// Callback on query end, returning potential final results.
// Columns are passed in case no data was returned.
end: func(columns: list<string>) -> option<query-result>
}

The simple data types are left out for brevity (they are defined in their own file), but hopefully the intent of this interface should be clear.  The metadata function returns a description of what the plugin does and what parameters it takes (a parameter is strongly typed). The start function take actual values for parameters and return an execution resource. This execution exposes the actual query string and bound parameters for that query (via the variables method). The runtime will then call the row function for each result row and the end function at the end. Each of these can return a possibly partial result. The end function takes the names of columns so that proper metadata is known even if the query returned no row.

Examples of very basic plugins used in tests can be seen here and here. They just collect the data passed to them and return it in the end method.

Each plugin can then be compiled to WASM for example via the cargo wapm --dry-run command provided by Wasmer.

The current runtime I've built is very simple: it takes all plugins from a folder and database connections are defined in a YAML file, and only Sqlite and Postgres are supported. An executable is provided to be able to run plugins from the command line.

Using a WAI interface and not having to deal with low level WASM code is great. cargo expand is your friend to understand what Wasmer generates as structures are generated differently between the import! and export! macros, so some structures own their data while some take references, which can sometimes trip you up.

Of course I would need to test this for performance, to determine how much copying of data is done between the Rust runtime and the WASM plugins.

Let me know if you have use cases where this approach could be interesting! As usual, all code is available on Github.


No comments: