Pick the right database for data warehouse

For those who don’t want to go for licensing path. Open source is definitely a better solution. However, whether open source DBMS can be used to build your data warehouse? I am not a good person to answer this question. But I have seen more and more small and medium size companies launched their business intelligent platform powered by open source DBMS like PostgreSQL and MySQL. Before MySQL v5 released, I don’t recommend to use MySQL for data warehouse because it missed some of key features that others provide like trigger, stored procedure, partitioning. But now, I suggest to revisit it, especially I have heard MySQL become a golden partner with the great open source business intelligent platform “Pentaho”. Below is a rough comparison chart for DBMS I got from devx.com. Take a look at it first.

dbcomparison.JPG

There are debates about whether we should choose PostgreSQL vs MySQL. Here is one case study that shows PostgreSQL is better in OLTP system. However, for select query, another study shows MySQL v5 is 2X faster the PostgreSQL v8. For data warehouse application, MySQL sounds like a better option as it is mostly read-only.

Here is the summary that I obtained from this article that compares PostgreSQL with MySQL:

  1. MySQL uses traditional row-level locking. PostgreSQL uses something called Multi Version Concurrency Control (MVCC) by default. MVCC is a little different from row-level locking in that transactions on the database are performed on a snapshot of the data and then serialized.
  2. MySQL supports the advanced feature of data partitioning within a database whereas PostgreSQL does not.
  3. PostgreSQL has many of the database features that Oracle, DB2, or MS-SQL has, including triggers, views, inheritance, sequences, stored procedures, cursors, and user-defined data types. MySQL’s development version, version 5.0, supports views, stored procedures, and cursors. MySQL’s future version, version 5.1, will support triggers.
  4. PostgreSQL supports user-defined data types, while MySQL does not.
  5. Both MySQL and PostgreSQL have support for single-master, multi-slave replication scenarios. PostgreSQL offers additional support for multi-master, multi-slave replication from a third-party vendor, as well as additional replication methods.
  6. MySQL uses a threaded model for server processes, wherein all of the users connect to a single database daemon for access. PostgreSQL uses a non-threaded model where every new connection to the database gets a new database process.
  7. MySQL does not support for bitmap indexes. Bitmap indexes are ideal for the kind of low-cardinality data that is commonly used in data warehouses. PostgreSQL supports bitmap indexes as of version v8.1, as do a number of commercial database systems.

Comments

comments

Subscribe

Subscribe my "7 Days Crack Course" to make money online together! Free for the first 100 registration.

No comments yet.

Leave a Reply