Category Archives: 4.3. Extract Intelligence

Learning Hive

Starting to learn Hive: As I mentioned in my last article, I was getting excited about the potential of Hive. Today, I decided to start my journey of learning it. I found a great introductory video that gives you a nice warm-up on using Hive (a basic knowledge of how Hadoop and MapReduce work would [...]

Business Intelligence – Part 1 Pentaho

Getting into the Business Intelligence World: When I dug deeper into business intelligence, I found that it is a huge topic, ranging from reporting to data mining. As with any knowledge acquisition plan, I set a series of milestones for myself. If you are interested, here is my list: Get and prepare your data; Data [...]

PostgreSQL – Power of Array Type

Create 2 tables: Item(id) and Item_log(item_id, price). Populate them:
insert into item(id) values(1);
insert into item(id) values(2);
insert into item(id) values(3);
insert into item(id) values(4);
insert into item_log(item_id, price) values(1, 100);
insert into item_log(item_id, price) values(1, 100);
insert into item_log(item_id, price) values(1, 100);
insert into item_log(item_id, price) values(1, 200);
insert into item_log(item_id, price) values(1, 200);
[...]

Data warehouse 101

To build a data warehouse, you will use the techniques of dimensional modeling. Here are the guidelines you can follow: Divide the world into measurements and context. Numeric measurements are placed in the fact table, whereas context is broken down into dimensions. A fact table in a pure star schema consists of multiple foreign keys, each paired with [...]
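To make the split between measurements and context concrete, here is a minimal sketch of a star schema. It is only an illustration under assumptions: the table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are hypothetical and do not come from the original post, and it uses SQLite through Python's standard `sqlite3` module instead of any particular warehouse database.

```python
import sqlite3


def build_star_schema(conn):
    """Create two dimension tables (context) and one fact table (measurements).

    All table and column names are hypothetical, chosen for illustration.
    """
    conn.executescript("""
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            category    TEXT NOT NULL
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            year     INTEGER NOT NULL,
            month    INTEGER NOT NULL
        );
        -- The fact table holds foreign keys to the dimensions
        -- plus the numeric measurements.
        CREATE TABLE fact_sales (
            product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
            date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
            quantity    INTEGER NOT NULL,
            amount      REAL NOT NULL
        );
    """)


def load_sample_rows(conn):
    """Insert a few made-up rows so the query below has something to read."""
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20240101, 2024, 1), (20240201, 2024, 2)],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [
            (1, 20240101, 2, 20.0),
            (2, 20240101, 1, 15.0),
            (3, 20240201, 4, 40.0),
            (1, 20240201, 1, 10.0),
        ],
    )


def sales_by_category(conn):
    """Sum a measurement (amount) grouped by a dimension attribute (category)."""
    return conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
load_sample_rows(conn)
print(sales_by_category(conn))  # [('Books', 40.0), ('Hardware', 45.0)]
```

The query shows the typical access pattern: join the fact table to a dimension, then aggregate a numeric measurement by one of the dimension's descriptive attributes.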

How to build your data warehouse

Operational databases are most commonly designed using normalized modeling, often using third normal form or entity-relationship modeling. Normalized database schemas are tuned to support fast updates and inserts by minimizing the number of rows that must be changed when recording new data. Example: order-management schema for an operational database. Data warehouses differ from operational databases in the [...]

Pick the right database for data warehouse

For those who don’t want to go down the licensing path, open source is definitely a better solution. However, can an open source DBMS be used to build your data warehouse? I am not the right person to answer this question. But I have seen more and more small and medium-sized companies launch their business [...]

Pentaho – Quick Start

The goal of this post is to walk you through an awesome business intelligence framework named “Pentaho”. I believe in the philosophy of “Learn by Practice”. So, I will show you the steps to get Pentaho up and running for a fictitious company. Along with this exercise, you should be able to understand how Pentaho works [...]

Pentaho Reporting Framework – Architecture

I am currently looking into Pentaho for my project. It looks very promising so far. Here is a video that talks about its architecture. Enjoy.