I've published my prototype of a graph database, HGraphStorage, on Hackage. Not to say that this is a database you should use in production! This is just a test bed to play around graph database concepts and low level algorithms for storage and retrieval.
Just releasing it was, ahem, fun, as somebody noticed that it didn't build in a clean sandbox, and then that the haddock comments were broken (it's a bit annoying that a one space error in Haddock markup causes the whole install to fail). So hopefully now it's ready for people to try and improve!
Pull requests welcome!!
In this blog I talk about some of the personal programming I do as a hobby. From Java to Rust via Haskell, I've played around with a lot of technologies and still try to have fun with new languages and APIs!
Showing posts with label graph databases. Show all posts
Showing posts with label graph databases. Show all posts
Sunday, February 01, 2015
Friday, January 23, 2015
Writing a low level graph database
I've been interested in graph databases for a long time, and I've developed several applications that offer an API close enough to a graph API, but with relational storage. I've also played with off the shelf graph databases, but I thought it would be fun to try an implement my own, in Haskell of course.
I found that in general literature is quite succinct on how database products manage their physical storage, so I've used some of the ideas behind the Neo4J database, as explained in the Graph Databases books and in a few slideshows online.
So I've written the start of a very low level graph database, writing directly on disk via Handles and some Binary instances. I try to use fixed length record so that their IDs translate easily into offsets in the file. Mostly everything ends up looking like linked lists on disk: vertices have a pointer to their first property and their first edge, and in turn these have pointers to the next property or edge. Vertex have pointers to edges linking to and from them.
I've also had some fun trying to implement an index trie on disk.
All in all, it's quite fun, even though I realize my implementations are quite naive, and I just hope that the OS disk caching is enough to make performance acceptable. I've written a small benchmark using the Hackage graph of packages as sample data, but I would need to write the same with a relational backend.
If anybody is interested in looking at the code or even participate, everything is of course on Github!
I found that in general literature is quite succinct on how database products manage their physical storage, so I've used some of the ideas behind the Neo4J database, as explained in the Graph Databases books and in a few slideshows online.
So I've written the start of a very low level graph database, writing directly on disk via Handles and some Binary instances. I try to use fixed length record so that their IDs translate easily into offsets in the file. Mostly everything ends up looking like linked lists on disk: vertices have a pointer to their first property and their first edge, and in turn these have pointers to the next property or edge. Vertex have pointers to edges linking to and from them.
I've also had some fun trying to implement an index trie on disk.
All in all, it's quite fun, even though I realize my implementations are quite naive, and I just hope that the OS disk caching is enough to make performance acceptable. I've written a small benchmark using the Hackage graph of packages as sample data, but I would need to write the same with a relational backend.
If anybody is interested in looking at the code or even participate, everything is of course on Github!
Subscribe to:
Posts (Atom)