Category Archives: 4. Data Intelligence

Learning Hive

Starting to learn Hive
As I mentioned in my last article, I was getting excited about the potential of Hive. Today, I decided to start my journey to learn it. I found a great introductory video that gives you a nice warm-up on using Hive (a basic knowledge of how Hadoop and MapReduce work would [...]

Hive on Amazon EC2 cloud

I once worked for a display ad network company that collected over 400 million impression/click logs per day. To handle this amount of data, my former company bought a supercomputer and crossed its fingers that it could keep up with the growth in both data volume and analytic demand. It is obviously not a [...]

How to build a data warehouse

Operational databases are most commonly designed using normalized modeling, often in third normal form or with entity-relationship modeling. Normalized database schemas are tuned to support fast updates and inserts by minimizing the number of rows that must be changed when recording new data.
[Figure: Example order-management schema for an operational database]
Data warehouses differ from operational databases in the way [...]
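To make the point about normalized schemas concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables (customer, orders, orders_flat), their columns, and the sample rows are hypothetical and are not the post's order-management schema. It records one real-world change (a customer's new address) and counts how many rows each design has to touch.

```python
import sqlite3
from typing import Tuple


def build_schemas(conn: sqlite3.Connection) -> None:
    """Create a tiny normalized schema plus a denormalized copy for comparison (hypothetical)."""
    conn.executescript("""
        -- Normalized: each customer attribute is stored in exactly one row.
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            address     TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
            order_date  TEXT NOT NULL
        );
        -- Denormalized: the customer's address is repeated on every order row.
        CREATE TABLE orders_flat (
            order_id         INTEGER PRIMARY KEY,
            customer_name    TEXT NOT NULL,
            customer_address TEXT NOT NULL,
            order_date       TEXT NOT NULL
        );
        INSERT INTO customer VALUES (1, 'Acme', '1 Old Street');
        INSERT INTO orders VALUES (1, 1, '2010-01-01'), (2, 1, '2010-01-02'), (3, 1, '2010-01-03');
        INSERT INTO orders_flat VALUES
            (1, 'Acme', '1 Old Street', '2010-01-01'),
            (2, 'Acme', '1 Old Street', '2010-01-02'),
            (3, 'Acme', '1 Old Street', '2010-01-03');
    """)


def change_address(conn: sqlite3.Connection, new_address: str) -> Tuple[int, int]:
    """Apply the same change to both designs; return (rows touched normalized, rows touched denormalized)."""
    normalized = conn.execute(
        "UPDATE customer SET address = ? WHERE customer_id = 1", (new_address,)
    ).rowcount
    denormalized = conn.execute(
        "UPDATE orders_flat SET customer_address = ? WHERE customer_name = 'Acme'",
        (new_address,),
    ).rowcount
    return normalized, denormalized


def demo() -> Tuple[int, int]:
    conn = sqlite3.connect(":memory:")
    build_schemas(conn)
    return change_address(conn, "2 New Road")


if __name__ == "__main__":
    print(demo())  # (1, 3)
```

The normalized design touches one row no matter how many orders exist, while the flat table touches one row per order, which is the update cost the paragraph above is describing.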

Adobe AIR with SQLite database

Recently, I have been trying to build an interactive reporting tool that needs to handle a lot of data. The data is not dynamic, because it comes from historical performance log files. However, the volume of the data is large (over a few million rows), and I still want my clients to interact with [...]

Flex Startup Sequence

Magic behind the scenes
I have always wondered how my Flex application gets displayed in Flash Player in the browser. Why does decompiling a Flex SWF give me a two-frame movie? What is SystemManager, and how can I get a handle on it? Many of these questions are at a lower level, the level that makes [...]

Flex Annotated Charting

Recently, I wanted to extend the LineChart in Flex to get a line chart with event annotations, like the one in Google Finance. First, I searched the web to see whether anyone had already done it; it would be even better if I could find an open source project related to this. Below are the [...]

Database concurrency control – MVCC

Concurrency Issue – Lost Update
The lost update is the key concurrency problem that we try to avoid. In the sequence above, transaction A (a SELECT followed by an UPDATE) overwrites the update that transaction B made to the balance. If transactions A and B were properly serialized, the correct balance after both transactions would be 700. [...]
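Here is a minimal single-connection simulation of a lost update, using Python's sqlite3 with hypothetical numbers (starting balance 500, A deposits 100, B withdraws 50, so any serial schedule ends at 550). The interleaving is replayed step by step rather than run on real concurrent connections. The second variant adds a hypothetical version column with an optimistic check-and-retry. That is one common remedy, swapped in here for illustration; it does not show how an MVCC engine works internally.

```python
import sqlite3
from typing import Tuple


def open_account(balance: int) -> sqlite3.Connection:
    """Hypothetical one-row account table; `version` is only used by the optimistic variant."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL, version INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO account VALUES (1, ?, 0)", (balance,))
    return conn


def read(conn: sqlite3.Connection) -> Tuple[int, int]:
    return conn.execute("SELECT balance, version FROM account WHERE id = 1").fetchone()


def naive_interleaving(conn: sqlite3.Connection) -> int:
    # A (deposit 100) and B (withdraw 50) both SELECT the same starting balance...
    a_balance, _ = read(conn)
    b_balance, _ = read(conn)
    # ...B writes first, then A writes a value computed from its now-stale read.
    conn.execute("UPDATE account SET balance = ? WHERE id = 1", (b_balance - 50,))
    conn.execute("UPDATE account SET balance = ? WHERE id = 1", (a_balance + 100,))
    return read(conn)[0]  # B's withdrawal has been overwritten


def write_if_unchanged(conn: sqlite3.Connection, new_balance: int, seen_version: int) -> bool:
    # The WHERE clause matches only if nobody has bumped the version since our read.
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 WHERE id = 1 AND version = ?",
        (new_balance, seen_version),
    )
    return cur.rowcount == 1


def versioned_interleaving(conn: sqlite3.Connection) -> int:
    a_balance, a_version = read(conn)
    b_balance, b_version = read(conn)
    write_if_unchanged(conn, b_balance - 50, b_version)  # B wins the race
    while not write_if_unchanged(conn, a_balance + 100, a_version):
        a_balance, a_version = read(conn)  # A detects the conflict, re-reads, retries
    return read(conn)[0]


if __name__ == "__main__":
    print(naive_interleaving(open_account(500)))      # 600: B's update is lost
    print(versioned_interleaving(open_account(500)))  # 550: same as a serial schedule
```

The version check turns a silent overwrite into a detectable conflict; the cost is that the losing transaction has to re-read and redo its work.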

Concurrent Programming – Part 1: Synchronization

Get yourself familiar with concurrent programming
When I interview candidates, I like to ask questions related to multi-threading. I have found that it is a good topic for distinguishing a hardcore programmer from an application-oriented programmer. I am not saying I am looking for someone who could write the concurrency library as efficiently as [...]

Common DBA jobs

Export schema/data from MySQL
To export a schema and/or its data, you can use the mysqldump command:
mysqldump -u [username] -p[password] -d [schema_name] > [filename].sql
The -d option means no data (schema only), -B is needed to dump multiple schemas, and -h specifies the hostname.
Export data from PostgreSQL
Export table data from PostgreSQL to CSV format
Backup [...]
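If you script these dumps, a small Python helper can assemble the same flags. This is a sketch: the function names are hypothetical, the helper only builds the argument list, and mysqldump must be installed for dump() to actually run. When no password is given it passes a bare -p so mysqldump prompts for it instead of exposing it on the command line; otherwise it uses the -p[password] form (no space) shown above.

```python
import subprocess
from typing import List, Optional, Sequence


def mysqldump_args(
    user: str,
    schemas: Sequence[str],
    password: Optional[str] = None,
    host: Optional[str] = None,
    schema_only: bool = False,
) -> List[str]:
    """Build a mysqldump argument list using the flags described above."""
    # -p with no value makes mysqldump prompt; -pSECRET must not have a space.
    args = ["mysqldump", "-u", user, f"-p{password}" if password else "-p"]
    if host:
        args += ["-h", host]
    if schema_only:
        args.append("-d")  # no data: table definitions only
    if len(schemas) > 1:
        args.append("-B")  # treat every following name as a database/schema
    args += list(schemas)
    return args


def dump(args: List[str], out_path: str) -> None:
    """Run mysqldump and write its output to out_path (replaces the shell's `>`)."""
    with open(out_path, "w") as out:
        subprocess.run(args, stdout=out, check=True)
```

For example, mysqldump_args("root", ["shop"], schema_only=True) yields ["mysqldump", "-u", "root", "-p", "-d", "shop"]. Passing a list to subprocess.run sidesteps shell quoting problems.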

Database Performance – Indexing

I focus on two main things when analyzing a database. First, I find out how it manages data. Second, I look at how it scales in terms of data volume and traffic. Today, I will talk about the most common indexing scheme used by databases today. It [...]
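A quick way to check whether a database actually uses an index is to ask for its query plan. This sketch does not show the index structure itself; it only uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 to compare plans before and after creating an index. The impression table, its data, and the index name are hypothetical, and the exact detail strings vary between SQLite versions (for example SCAN impression versus SCAN TABLE impression).

```python
import sqlite3
from typing import List, Tuple

QUERY = "SELECT COUNT(*) FROM impression WHERE campaign_id = ?"


def plan(conn: sqlite3.Connection, sql: str, params=()) -> List[str]:
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def compare_plans() -> Tuple[List[str], List[str]]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE impression (id INTEGER PRIMARY KEY, campaign_id INTEGER, clicked INTEGER)"
    )
    conn.executemany(
        "INSERT INTO impression (campaign_id, clicked) VALUES (?, ?)",
        [(i % 100, i % 2) for i in range(10_000)],
    )
    before = plan(conn, QUERY, (42,))  # no usable index: full table scan
    conn.execute("CREATE INDEX idx_impression_campaign ON impression (campaign_id)")
    after = plan(conn, QUERY, (42,))   # equality lookup can now search the index
    return before, after


if __name__ == "__main__":
    before, after = compare_plans()
    print("before:", before)
    print("after: ", after)
```

The plan switches from a SCAN of every row to a SEARCH on idx_impression_campaign, which is the difference an index makes for an equality lookup.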
