Category Archives: 2.6. Unleash Your System

Learning Hive

Starting to learn Hive. As I mentioned in my last article, I was getting excited about the potential of Hive. Today, I decided to start learning it. I found a great introductory video that gives a nice warm-up to using Hive (a basic knowledge of how Hadoop and MapReduce work would [...]

Hive on Amazon EC2 cloud

I once worked for a display ad network company that collected over 400 million impression and click logs per day. With this amount of data, my former company bought a supercomputer and crossed its fingers that it could handle the growth in both data volume and analytic demand. It is obviously not a [...]

Powerful Linux Text Processing Commands

Common Text Processing Commands. In our daily life, we deal with lots of data. The data is normally stored in text format so that humans can read it easily. With the large amount of data we have, we need ways to deal with it. There are several things we frequently do with the data: search, [...]

Plenty of Fish – Cash cow!

A site called “PlentyOfFish.com” is currently getting 30 million hits a day. The number doesn’t blow me away. What surprises me, however, is that this site is basically operated by a single man, Markus Frind. How did he achieve that? If you want to hear how he does it, you can go to his interview from [...]

Powerful Full Text Search Engine – Part 1 Lucene Introduction

Introduction to Lucene. I have heard of Lucene and its powerful full-text search capability many times. Today, I decided to take a look at it. Before diving into the user guide, I went to Google Tech Talks to find a video about Lucene. Here is what I found: After I finished [...]

Tomcat Performance Tuning

Most companies I have worked for use Tomcat as their servlet container. It is the de facto standard, just as Apache is for web servers. However, most of us just drop our WAR file into the webapps folder and use Tomcat with all its default, out-of-the-box settings. It works fine in [...]

Power of awk

If you have a file of records and you want to find out which records meet criteria like field1=xyz, field2=abc… How would you approach it? Simple! Load the file into a database, write a SQL query with a WHERE clause, and have the database take care of it for you. Is it the simplest way? Maybe not! [...]

WebDav vs FTP

Today, I came across a technical issue: a process was taking too long to download a file from one of our file servers. The reason is that the number of files in a folder increased over time and finally reached ~12,000. If you use FTP, you need to [...]

Basic hardware knowledge

What should we look at for a machine? CPU (how many cores, how many physical CPUs, how fast, 64-bit or not, cache size), memory (RAM), and IO speed. Dual-core CPU vs multiprocessor: a dual-core CPU is a CPU with two separate cores on the same die, each with its own cache. It’s the equivalent of getting two microprocessors in [...]

Tomcat 5.5 – Quick Notes

Configure Tomcat. Change the port to 80: edit install_dir/conf/server.xml and change the port attribute of the Connector element from 8080 to 80. Turn on servlet reloading: edit install_dir/conf/context.xml and change <Context> to <Context reloadable="true">. Change the default AJP/1.3 connector port of Tomcat: edit install_dir/conf/server.xml and change the value of the port attribute in the AJP/1.3 Connector element. [...]
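
As a rough illustration of the first two changes described in this excerpt, the edited fragments might look something like the sketch below. It is a minimal sketch rather than a complete file: the attributes and child elements that ship in your own server.xml and context.xml are assumed to stay exactly as they are, and only the values named in the excerpt are shown.

```xml
<!-- install_dir/conf/server.xml: HTTP Connector, port changed from 8080 to 80.
     Keep every other attribute of the shipped Connector element unchanged. -->
<Connector port="80" />

<!-- install_dir/conf/context.xml: servlet reloading turned on.
     Keep the existing child elements of Context unchanged. -->
<Context reloadable="true">
</Context>
```

On most Unix-like systems, binding to port 80 requires elevated privileges, so Tomcat may need to be started accordingly after this change.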
