Tag Archives: hadoop

Learning Hive

Starting to learn Hive As I mentioned in my last article,  I was getting excited about the potential of Hive. Today, I decide to start my journey to learn this. I found a great introductory video that gives you a nice warm-up of using Hive (A basic knowledge of how hadoop and mapreduce work would [...]

Hive on Amazon EC2 cloud

  I ever worked for a display ad network company that collects over 400 million of impression/ click logs per day. With this amount of data, my ex-company bought a supercomputer and cross their fingers that it can handle the grow in both volume and analytic demand of the data. It is obviously not a [...]

Powerful Full Text Search – Part 2 Nutch

Introduction of Nutch & Hadoop After Lucene, the author created another powerful tool. Its name is Nutch. Nutch is a powerful crawler built on top of the Lucene. With Nutch, you can launch a multi-threaded crawler to obtain information from the Net. At this point of writing, Nutch is in its 0.9 version. Nutch comes [...]