Git: the NoSQL database
We all know that Git is pretty amazing. It’s fast, reliable, flexible, and it keeps our project history safely nuzzled in its cozy object database while we sleep soundly at night. But I’m curious to see if it can be used for more than code. I’ve had a few apps in the back of my mind for a while now that would be really interesting if the data were stored in Git.
If only there was an easy way to read and write a Git repo from Ruby…
Toystore & adapter-git
Toystore is an ActiveModel-based object mapper for key-value data stores. The beauty of Toystore is that it doesn’t care what the backend is. It uses Adapter to abstract the connection to any data store that can set, get, and delete keys.
Well, Git is a key-value store; it supports set, get and delete on keys (a.k.a. paths). So I sat down with Scott Chacon’s Git Internals Peepcode PDF and put together adapter-git, built on top of Grit.
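To make the key-value idea concrete, here is a minimal in-memory sketch of how Git addresses content. It is not adapter-git's or Grit's actual code: the `TinyGitStore` class and its method names are hypothetical, and the tree is flattened to a simple path-to-ID hash. The one part taken directly from Git is the blob ID, which is the SHA-1 of a short header followed by the content.

```ruby
require 'digest/sha1'

# Hypothetical, in-memory illustration; not how adapter-git or Grit is implemented.
class TinyGitStore
  def initialize
    @objects = {} # blob ID => content (stand-in for Git's object database)
    @tree    = {} # path => blob ID (a flattened stand-in for a Git tree)
  end

  # Git hashes "blob <bytesize>\0<content>" with SHA-1 to get the blob ID.
  def self.blob_id(content)
    Digest::SHA1.hexdigest("blob #{content.bytesize}\0#{content}")
  end

  # Store the content under its blob ID and point the path at it.
  def set(path, content)
    id = self.class.blob_id(content)
    @objects[id] = content
    @tree[path] = id
  end

  # Look up the blob ID for the path, then the content for that ID.
  def get(path)
    id = @tree[path]
    id && @objects[id]
  end

  # Remove the path; the blob itself stays in the object database.
  def delete(path)
    @tree.delete(path)
  end
end

store = TinyGitStore.new
store.set('items/1', "hello world\n")
store.get('items/1')
store.delete('items/1')
```

Note that deleting a path leaves the blob in place, which mirrors how Git keeps old objects around so history stays readable.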
Now I can create pretty models that are stored in Git.
    class Item
      include Toy::Store
      store :git, Gaskit.repo, :branch => 'content', :path => 'items'

      attribute :title, String
      attribute :description, String
      attribute :created_at, Time, :default => lambda { Time.now }
    end
Toystore uses conventions that will be familiar to anyone who has used Active Record or MongoMapper.
    item = Item.create!(:title => 'Git: the NoSQL database')
    item.update_attributes(:description => "OMG this is awesome!")
The biggest difference is that you can’t “find” records. The data stored in a key-value store is opaque, so all you can do is get it by key.
item = Item.get!('3FB053FA-0A3B-4903-9CE0-2A8A964E0F37')
Caveats
I have no idea if Git will work as a data backend for an application. I’m sure GitHub has solved many of the problems with concurrency and scaling a filesystem.
There is still a lot of room for improvement on adapter-git too. Here are just a few things I’d like to add soon:
- Locking – I wouldn’t want to use adapter-git for anything with a lot of concurrency at the moment. Git commits are atomic; you’ll never corrupt a repo by a failed commit, but if you have a lot of concurrent access, you might lose commits.
- Custom commit messages – Currently adapter-git just uses the ID of the key being set in the commit message. In the app I’m experimenting with, I’ve already had a desire to set custom commit messages.
- Update working copy – adapter-git currently just works against the git repo itself. It doesn’t update your working copy. So `git status` will currently tell you that you’ve deleted files from your working copy after you update records with adapter-git.
- Merge conflicts – I’m looking forward to being able to programmatically resolve merge conflicts. Riak has a cool pattern for resolving conflicts on read, so I’d love to see if I can build something into adapter-git and toystore to work in a similar fashion.
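The lost-commit risk from the locking item can be illustrated without Git at all. The sketch below is my assumption about the failure mode, not adapter-git's code: two writers read the same branch head, each builds a commit on top of it, and whoever updates the ref last silently discards the other's commit. A compare-and-swap on the ref (roughly what `git update-ref <ref> <new> <old>` provides) lets the second writer detect the change and retry. `Branch`, `Commit`, and `history` are hypothetical names.

```ruby
# Hypothetical in-memory model of a branch ref; illustration only.
class Branch
  attr_reader :head

  def initialize
    @head = nil # newest commit, or nil for an empty branch
    @lock = Mutex.new
  end

  # Unsafe: overwrite the ref regardless of what it currently points to.
  def update!(new_head)
    @head = new_head
  end

  # Safe: move the ref only if it still points at the commit we started from.
  def compare_and_swap(expected, new_head)
    @lock.synchronize do
      return false unless @head.equal?(expected)
      @head = new_head
      true
    end
  end
end

Commit = Struct.new(:parent, :message)

# Walk parent links from the head to list every reachable commit message.
def history(branch)
  messages = []
  commit = branch.head
  while commit
    messages << commit.message
    commit = commit.parent
  end
  messages
end

# Two writers start from the same (empty) head and overwrite blindly.
unsafe = Branch.new
base = unsafe.head
unsafe.update!(Commit.new(base, 'writer A'))
unsafe.update!(Commit.new(base, 'writer B'))
lost = history(unsafe) # writer A's commit is no longer reachable

# With compare-and-swap, the second writer notices and rebuilds on the new head.
safe = Branch.new
base = safe.head
safe.compare_and_swap(base, Commit.new(base, 'writer A'))
unless safe.compare_and_swap(base, Commit.new(base, 'writer B'))
  current = safe.head
  safe.compare_and_swap(current, Commit.new(current, 'writer B'))
end
kept = history(safe) # both commits are reachable
```

A real implementation would need the check and the ref update to be atomic across processes, not just threads, which is where a file lock or Git's own ref locking would come in.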
Check out adapter-git on GitHub and try building an app backed by Git!
Update: Check out the video and slides for a talk I gave on this topic.
8 Comments
You could definitely have an extra param to read, write, delete that is a hash of options. This would allow you to customize the commit messages. Cool stuff.
I think the interesting thing here is that this shows how git could be improved as well.
I’m definitely going to have to look into ToyStore. Could become my favorite new gem.
John: Great idea. I’m thinking about creating toystore-git to extend toystore with some other git functionality (such as commit messages), like you did with toystore-mongo.
Kurtis: Yeah, toystore has quickly become one of my favorite tools. I highly recommend it.
You did it while I was thinking about it! Great!
Beyond using Git as a repository, I just wonder these days about the similarities between Git and a web server.
Well, I don’t think a web server will ever offer a diff feature between document versions, but in a perfect world I think a web server should offer versioned management of documents (for example, through the ETag HTTP header field). So I wonder whether something could be added to the HTTP protocol to better support versioning natively in web servers, the way Git does it.
Or, to say things differently: will Git be webified in order to support natively http, just like CouchDB (repository), or will we see web servers evolving to support natively some Git operations like versioning ?
Thinking about Git and web protocols could be extended beyond the server side to the client side. Indeed, a browser cache may be seen as a Git instance that synchronizes with a server-side Git instance to download documents before rendering them.
So thinking about Git as a repository may be just the beginning.
@Dominique:
I don’t think Git should (or ever would) include things such as an HTTP interface because that’s the job of tools built on Git. The core of Git should only care about blobs and trees and the like.
As for applying Git to more general problems, I think it’s drawing the wrong conclusions. The more problems you find to use Git on, the more likely it’s an imperfect solution (because it was designed for code version control).
Let’s leave git to version control. It’s good. “Anything that works will be used in progressively more challenging applications until it fails”.
We explored Git as a possible backend for our iPhone app which supports contact diffs/versioning. This approach didn’t scale in the slightest. Once you move beyond one server the difficulties become enormous. NFS is too slow for anything approaching a large repository. Facebook has run into similar issues with Git not being able to keep up with some of their larger repositories.
Using git as anything but a versioning system is madness. Good solid fun kind of madness, but still madness.
For Michael Rose, there are plenty of really big Git services scaled over buckets of machines, and they’ve written about how they do it; e.g. here’s an (old) entry about GitHub:
https://github.com/blog/530-how-we-made-github-fast