Thursday, July 23, 2009

Alternatives to relational databases

At a brainstorm meeting with one of my colleagues we briefly discussed the way in which data is being stored in the cloud. Since you’re not sure of where data is being stored, it’s difficult to define relations. Most web applications nowadays use relational databases to store their data, which means it would require a significant data model change if relational data can no longer be used when they move to the cloud.

Almost all current database systems are relational: Oracle, MySQL, SQL Server, etc. A problem of relational databases is that they do not comply with the object-oriented programming model. In fact, 40% of total effort often goes into writing and maintaining data access code. Another problem with relational databases is that they are difficult to scale, because of the defined relations. In this blog post I present some alternatives to relation databases.

Alternative 1: Key/value database

In a key/value database, key/value pairs are stored in a domain, which is a bucket of data. These buckets have no relation to each other, nor does the data in them. Instead of defining relations, data is duplicated so that queries can be executed easily. It is important to realize that this may result in data inconsistency and the requirement for more storage capacity. Because no relations are defined between data, the responsibility of data integrity falls to the application, e.g. if a customers is removed, it is up to the application to remove all the orders made by that customer. Key/value databases are currently the preferred way in cloud applications, because they are very easily scalable and have no relations defined within them. Because key/value databases are therefore usually multi-tenanted, some sort of mechanism is required to make sure a user can’t overload the system. This is currently done by either limiting the maximum execution time of a query (Amazon SimpleDB) or the number of rows returned by a query (Google AppEngine Datastore).

Alternative 2: Object-oriented database

In an object-oriented database objects are sent to the database, which stores them as objects. The main advantage of an OO database is that there is no need to write a large amount of data access code, as the database is 1:1 with the code. The downside is that aggregate queries are much more expensive than in SQL.

Alternative 3: XML database

XML databases are very good at storing hierarchical data. The downside is that they are slower to query than SQL databases, especially for aggregate functions.

Alternative 4: Document-oriented database

Document-oriented databases are suitable for applications which are document-oriented (which are in fact most current web applications). They are schema-free, which leads to a very flexible database. The main advantage is that they are much simpler than relational databases, and that they are very scalable. Another advantage is that document-oriented databases have a mature implementation, Apache CouchDB. CouchDB is a distributed database system which offers mechanisms for replication and synchronization (scalability), and uses JavaScript to create views of data.

Conclusion

While trying to achieve a scalable database system, it is important to not lose track of the highest requirement: functionality. If we have a perfectly scalable database, but we can’t perform the queries we want, what good is it?

Non-relational databases appear to be useful in less complex web applications, as they seem to lack (or are very slow with) some advanced functionality such as aggregate queries. I do not see a good alternative yet for relational databases concerning this type of query. Especially in complex applications, with functionality such as report generation, relational databases still seem to be the best approach.

No comments:

Post a Comment