<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Solution Hacker</title>
	<atom:link href="http://www.solutionhacker.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.solutionhacker.com</link>
	<description>This blog provides solutions for entrepreneurs!</description>
	<lastBuildDate>Wed, 28 Jul 2010 06:08:04 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Steve, how could you stop Flash on your gadgets?</title>
		<link>http://www.solutionhacker.com/fun-stuff/steve-how-could-you-stop-flash-on-your-gadgets/</link>
		<comments>http://www.solutionhacker.com/fun-stuff/steve-how-could-you-stop-flash-on-your-gadgets/#comments</comments>
		<pubDate>Mon, 26 Jul 2010 06:12:10 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[2. Technology]]></category>
		<category><![CDATA[5. Fun]]></category>
		<category><![CDATA[ban]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[frash]]></category>
		<category><![CDATA[iPad]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=408</guid>
		<description><![CDATA[For many Flash/Flex programmers, it may be bad news that Steve Jobs openly banned Flash on his devices like the iPad. His message clearly stated that HTML5 can replace the rich experience of Flash and that it will be our future. He may be right about the cons of Flash. Nothing is [...]]]></description>
			<content:encoded><![CDATA[<p><img align="left" height="100" src="http://www.solutionhacker.com/wp-content/uploads/apple-ipad-no-flash-2.jpg" style="margin-right: 10px;" width="100" />For many Flash/Flex programmers, it may be bad news that Steve Jobs openly <a href="http://www.apple.com/hotnews/thoughts-on-flash/">banned</a> Flash on his devices like the iPad. His message clearly stated that HTML5 can replace the rich experience of Flash and that it will be our future. He may be right about the cons of Flash. Nothing is perfect. However, I am surprised that he went a step further and banned Flash totally. Why couldn&#39;t he simply provide an option for users to turn it off if they want? I really doubt his intent. Whenever I see something like that, it only reminds me of what Microsoft did in the old days.</p>
<p><span id="more-408"></span>In fact, his TOS states clearly that the most important reason for Apple to ban Flash is that Apple doesn&#39;t want a third-party layer of software to come between the platform and the developer.</p>
<blockquote>
<p>Allowing Flash &mdash; which is a development platform of its own &mdash; would just be too dangerous for Apple, a company that enjoys exerting total dominance over its hardware and the software that runs on it. Flash has evolved from being a mere animation player into a multimedia platform capable of running applications of its own. That means Flash would open a new door for application developers to get their software onto the iPhone: Just code them in Flash and put them on a web page. In so doing, Flash would divert business from the App Store, as well as enable publishers to distribute music, videos and movies that could compete with the iTunes Store &#8211; <a href="http://www.wired.com/gadgetlab/2008/11/adobe-flash-on/">Brian</a><a href="http://www.wired.com/gadgetlab/2008/11/adobe-flash-on/"> on wired.com</a></p>
</blockquote>
<div style="overflow: hidden; color: rgb(0, 0, 0); background-color: transparent; text-align: left; text-decoration: none; border: medium none;">&nbsp;</div>
<div style="overflow: hidden; color: rgb(0, 0, 0); background-color: transparent; text-align: left; text-decoration: none; border: medium none;">Putting aside Apple&#39;s intent, as a consumer, I don&#39;t think I want to carry a bigger iPhone (i.e. the iPad) that doesn&#39;t give me the full web experience. I don&#39;t care how HTML5 turns out; I need to view Flash content now, because it could take years, if not decades, for Flash to be totally eliminated from the Web (I doubt this will happen!). If you really like the iPad but still want to see Flash on it, here is the good news for you. You can now install Frash to get around the ban.&nbsp; Below is a video that shows you how to install it and get it working on your iPad.</div>
<div style="overflow: hidden; color: rgb(0, 0, 0); background-color: transparent; text-align: left; text-decoration: none; border: medium none;">&nbsp;</div>
<p><object height="385" width="640"><param name="movie" value="http://www.youtube.com/v/933NcE_X_t0&amp;hl=en_US&amp;fs=1" /><param name="allowFullScreen" value="true" /><param name="allowscriptaccess" value="always" /><embed allowfullscreen="true" allowscriptaccess="always" height="385" src="http://www.youtube.com/v/933NcE_X_t0&amp;hl=en_US&amp;fs=1" type="application/x-shockwave-flash" width="640"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/fun-stuff/steve-how-could-you-stop-flash-on-your-gadgets/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Learning Perl &#8211; Day 1 (Data Type)</title>
		<link>http://www.solutionhacker.com/fun-stuff/learning-perl-day-1-data-type/</link>
		<comments>http://www.solutionhacker.com/fun-stuff/learning-perl-day-1-data-type/#comments</comments>
		<pubDate>Sat, 24 Jul 2010 09:32:53 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[5. Fun]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=379</guid>
		<description><![CDATA[Recently, I needed to pick up Perl for some of my projects. After going through some websites and books, I started to see why some tasks can be done much more easily in Perl than in Java, the language that I am quite familiar with. Since Perl is loose in data type, not OO-enforced, non-explicit [...]]]></description>
			<content:encoded><![CDATA[<h3><img align="left" height="100" src="http://www.solutionhacker.com/wp-content/uploads/camel_head.png" style="margin-right: 10px;" width="100" /></h3>
<p>Recently, I needed to pick up Perl for some of my projects. After going through some websites and books, I started to see why some tasks can be done much more easily in Perl than in Java, the language that I am quite familiar with. Because Perl is loosely typed, does not enforce OO, has no explicit subroutine signatures, and has a syntax full of symbols, Perl programmers who enjoy that freedom and power without following best practices can write very illegible code. Although it runs, it can be very hard to maintain. For those who feel the pain, you may like this <a href="http://www.garshol.priv.no/download/text/perl.html">post</a>. However, I don&#39;t want to go to extremes on this. I do see its power in text processing and system integration. After taking a look at the extensive CPAN library, I decided to give it a fair trial. Not only that, I will write a series of posts to help shorten your learning curve as well. In this post, I will focus on Perl data types and their usage.</p>
<p><span id="more-379"></span></p>
<h2>Data Type</h2>
<h3>Number and String</h3>
<pre># numeric
$var = 123;
$var = 123.34;  #float or double

# very large unsigned integer, like an MD5 value (needs &quot;use Math::BigInt;&quot;)
$bignum = Math::BigInt-&gt;new(&quot;0x1821231223238234234&quot;);

# string - single quote: no interpolation; double quote: variables are interpolated at runtime
$myVar = &#39;abc&#39;;
$myVar = &quot;product price is $price&quot;;
</pre>
<h3>Array</h3>
<pre>@myarray = (1,2,4, &#39;abc&#39;, &quot;abc&quot;, $myvar);
print $myarray[0]; #1 is shown

@coins = (&quot;Quarter&quot;,&quot;Dime&quot;,&quot;Nickel&quot;); # quotation could be hassle!
@coins = qw(Quarter Dime Nickel);     # remove the quotation headache easily

# To get the length of an array, you can use the scalar() function or
# assign the array to a scalar variable.
$coinlength = @coins;
print scalar(@coins);
print $coinlength;

# arrays in Perl can grow and shrink dynamically
push(@coins, &quot;Penny&quot;);     #add element to the end of the array
unshift(@coins, &quot;Dollar&quot;); #add element to the front of the array
pop(@coins);               #remove last element of the array
shift(@coins);             #remove first element of the array
splice(@coins, 1, 1);      #remove the element at index 1 (delete would leave an undef hole)

@names = split(&#39;,&#39;,$namelist); #convert csv line in string to array
$namelist = join(&quot;,&quot;,@names);  #reconstruct via joining them back

@Foods = qw(Pizza Steak chicken burgers);
@foods = sort(@Foods);         #sort in ASCII order: uppercase letters sort before lowercase

# transform to lowercase first, as mixed case can mess up the order
@foods = ();                   #start from an empty array to avoid duplicates
foreach $food (@Foods) {
	push(@foods,  &quot;\L$food&quot;);
}

# sort it
@foods = sort(@foods);

my $count = 0;
for (@list) {
   $count++ if $_ eq &quot;apple&quot;; # $_ is assigned the current element of the loop
}

# slice the array
my @stuff = qw/everybody wants a rock/;
my @rock  = @stuff[1 .. $#stuff];      # @rock is qw/wants a rock/
my @want  = @stuff[ 0 .. 1];           # @want is qw/everybody wants/
</pre>
<h3>Hash</h3>
<p>A hash, also called an associative array, is a list of key-value pairs in which the keys are unique. It is like a Map in Java.</p>
<pre>%myhash = (&#39;key1&#39; =&gt; &#39;abc&#39;, &#39;key2&#39; =&gt; 5, &#39;key3&#39; =&gt; $myvar);

# a list with an even number of elements can be converted into a hash
# element[i] =&gt; element[i+1] where i is 0, 2, 4, etc.
%coins = ( &quot;Quarter&quot;, 25, &quot;Dime&quot;, 10, &quot;Nickel&quot;, 5 );   

$myKey = &quot;Dime&quot;;
$coins{$myKey} = 15;  # assign using variable
print $coins{&quot;Dime&quot;}; # output is 15;
print $coins{Dime};   # you can simplify it without quote there

foreach $key (sort keys %coins) {
     print &quot;$key: $coins{$key} \n&quot;;
}

$coins{HalfDollar} = .50;   #add new element
delete($coins{HalfDollar}); #delete an element

if ( exists $hash{key} ){
   #retrieve value here
}

# slice a hash
my %table = qw/schmoe joe smith john simpson bart/;
my @friends = @table{&#39;schmoe&#39;, &#39;smith&#39;};   # @friends has qw/joe john/
</pre>
<h3>Subroutine</h3>
<p>Parameters passed to a subroutine are stored in @_.&nbsp; You can create a reference to a subroutine and pass it to another subroutine as a function pointer.</p>
<pre>sub my_sub{
  my($msg) = @_;
  print &quot;this is my own message: $msg&quot;;
}
&amp;my_sub(); #call with the optional &amp; in front
my_sub(&#39;hello world&#39;); #call without &amp; in front; the number of arguments can vary
</pre>
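<p>To make the function pointer idea concrete, here is a minimal sketch. The names apply_to_each and double_it are made up for illustration; the \&amp; and -&gt; syntax used here is explained in the reference section.</p>
<pre>use strict;
use warnings;

# apply_to_each is a hypothetical helper: it takes a subroutine reference
# followed by a list, and calls the subroutine once for each element.
sub apply_to_each {
    my ($callback, @items) = @_;
    return map { $callback-&gt;($_) } @items;
}

sub double_it {
    my ($n) = @_;
    return $n * 2;
}

# \&amp;double_it creates a reference to the subroutine without calling it
my @doubled = apply_to_each(\&amp;double_it, 1, 2, 3);
print &quot;@doubled\n&quot;;   # prints &quot;2 4 6&quot;
</pre>
<p>The helper never needs to know which subroutine it receives, so the same loop can be reused with any callback.</p>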
<h2><span class="mw-headline">Reference and Dereference </span></h2>
<p><span class="mw-headline">So far so good? Now I am going to show you how to pass by reference rather than pass by value. You want to pass by reference because it is more memory efficient.</span> However, to do that, I have to take you on a trip through a syntax mess. It gave me a hard time before, and I hope I can make it clear so you will not suffer the pain I had. OK. Let&#39;s start! A <strong>reference</strong> is a <strong>scalar</strong> that refers to the data stored in another variable of any type, or to a subroutine or method. It lets you pass a large variable to a function by reference instead of by value.</p>
<h3>Reference</h3>
<pre>$varRef = \$myvar; #varRef now stores the physical address of $myvar
$arrayRef = \@myarray;
$subRef = \&amp;my_sub;
$arrayRef = [1, 2, 3]; #arrayRef is pointing to the address of an anonymous array

%hash = (&#39;key1&#39;=&gt; &#39;var1&#39;, &#39;key2&#39; =&gt; &#39;var2&#39;); #regular hash
$rhash = \%hash;  #create reference to the hash
$href = {&#39;key1&#39;=&gt; &#39;var1&#39;, &#39;key2&#39; =&gt; &#39;var2&#39;}; #reference to anonymous hash
</pre>
<h3>Dereference</h3>
<p>To access the content of a reference, there are special ways to do that:</p>
<pre>my $name = &quot;Test Users&quot;;
my $rname = \$name; 

sub my_func{
  my ($ref) = @_;
  print &quot;Scalar ref value is $$ref&quot;;  #dereference
}

my_func($rname);
my_func(\$name);

# an array reference can be dereferenced in 2 ways
$arrayRef-&gt;[1];
$$arrayRef[1];
@array = @$arrayRef; #the whole array available again for regular usage

print $hash{key1};        # Ordinary hash lookup
print $$rhash{key1};      # hash replaced by $rhash
print $rhash-&gt;{key1};     # hash replaced by $rhash

# if the dereference syntax gets confusing, wrap the reference in curly
# brackets to get back the actual value:
#   @{$array_reference}
#   %{$hash_reference}
#   ${$scalar_reference}

print Dumper %$rhash;     # dump the whole hash (needs &quot;use Data::Dumper;&quot;; valid syntax without the brackets)
for(keys %{$rhash})       # for more clarity, enclose it in curly brackets.
{
...
}

# map of anonymous subroutines
my %hash = (add =&gt; sub {my ($var1) = @_; print &quot;Add $var1\n&quot;},
            subtract =&gt; sub {my ($var1) = @_; print &quot;Subtract $var1 \n&quot;} );
# get a function pointer and invoke it using -&gt; if you need to pass parameter(s).
$hash{add}-&gt;(5);
</pre>
<h2><span class="mw-headline">Confused! Give me a chart!!<br />
	</span></h2>
<p><span class="mw-headline">I hope that you are still OK. When you see @varArray in code, it could mean the array itself or its size, depending on the context. If @varArray is in scalar context, it means the length of the array.</span> Otherwise, it means the array itself. To avoid confusion, you can always use scalar(@varArray) to get the size of the array. Some people use $#varArray+1 for the length instead. $#varArray is the last index of the array, and since array indexes start from 0, you add 1 to get the actual size. Below is a chart that I find quite helpful when I get lost in Perl syntax.</p>
<pre>============================================================================================================
Expression       Context   Variable             Evaluates to
============================================================================================================
$scalar          scalar    $scalar, a scalar    the value held in $scalar
@array           list      @array, an array     the list of values (in order) held in @array
@array           scalar    @array, an array     the total number of elements in @array (same as $#array + 1)
$array[$x]       scalar    @array, an array     the ($x+1)th element of @array
$#array          scalar    @array, an array     the subscript of the last element in @array (same as @array - 1)
@array[$x, $y]   list      @array, an array     a slice, listing two elements from @array (same as ($array[$x], $array[$y]))
&quot;$scalar&quot;        scalar    $scalar, a scalar    a string containing the contents of $scalar
&quot;@array&quot;         scalar    @array, an array     a string containing the elements of @array, separated by spaces
%hash            list      %hash, a hash        a list of alternating keys and values from %hash
$hash{$x}        scalar    %hash, a hash        the element from %hash with the key of $x
@hash{$x, $y}    list      %hash, a hash        a slice, listing two elements from %hash (same as ($hash{$x}, $hash{$y}))
</pre>
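<p>A few rows of this chart can be checked with a short script. This is only a sketch; the @coins data is reused from the array examples earlier in this post.</p>
<pre>use strict;
use warnings;

my @coins = qw(Quarter Dime Nickel);

my $count      = @coins;     # scalar context: the number of elements
my ($first)    = @coins;     # list context: the first element is assigned
my $last_index = $#coins;    # the subscript of the last element

print &quot;count: $count\n&quot;;             # count: 3
print &quot;first: $first\n&quot;;             # first: Quarter
print &quot;last index: $last_index\n&quot;;   # last index: 2
print &quot;as string: @coins\n&quot;;         # as string: Quarter Dime Nickel
</pre>
<p>The only difference between the first two assignments is the pair of parentheses on the left-hand side, which switches the assignment from scalar context to list context.</p>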
<h2>Real Life Usage</h2>
<p>You have now gone through the most common Perl syntax. If you are still reading this, congratulations! You can now move a step further and see how it applies to some real-life examples. I am going to show how people use hashes because I find that more fun.</p>
<pre>#--------- counting ---------
# How do you generate a histogram of terms from a text file?
# (1) Convert the text to a list of terms (you can strip punctuation, normalize case and drop stop words as you wish).
# (2) Walk through the list and build a map of term to count. In Java, I would loop through the list and check whether
#     the term is in the map. If not, put it there with count=1. If yes, fetch the current count and increment it.
#     In Perl, the syntax is much easier:

my %histogram;
$histogram{$_}++ for @list;  # loop over the list; each element is assigned to $_ by default
$unique = keys %histogram;   # obtain the number of unique terms
@unique = keys %histogram;   # obtain the list of the keys
@popular = (sort { $histogram{$b} &lt;=&gt; $histogram{$a} } @unique)[0..4];  # obtain top 5 based on the count

#--------- searching ---------
my $index;
for ($index = 0; $index &lt; @chambers; $index++) {   # linear search; a C-style loop keeps $index after the loop ends
   last if $chambers[$index] == $bullet;              # exit the loop on a match
}
print &quot;Found at index $index&quot; if $index &lt; @chambers;

# NOTICE: many statements in Perl can be controlled by a trailing condition. If the condition is false, the statement is not executed.

#---------- dispatch table -----------

# Suppose you have a script that does several related things: it manages your to-do list by adding, editing, listing, and deleting to-do items:

&gt;&gt; todo add &quot;Email Samuel about photos&quot;
Output: Todo item 129 created
&gt;&gt; todo done 129
Output: Item 129 marked as done

# You might expect the script to look like:

    my $command = shift @ARGV;
    if    ($command eq &quot;add&quot;)  { add(@ARGV)  }
    elsif ($command eq &quot;list&quot;) { list(@ARGV) }
    elsif ($command eq &quot;done&quot;) { done(@ARGV) }
    elsif ($command eq &quot;edit&quot;) { edit(@ARGV) }
    ...
    else { die &quot;Unknown command: $command&quot; }

# That is quite tedious. You could use a hash to deal with this since we don&#39;t have a traditional &#39;switch&#39; in Perl:

    %commands = (
        add  =&gt; \&amp;add,
        list =&gt; \&amp;list,
        edit =&gt; \&amp;edit,
        done =&gt; \&amp;done,
    );
    my $action = shift @ARGV;
    if (!exists $commands{$action}) { die &quot;Unknown command: $action&quot; }
    $commands{$action}-&gt;(@ARGV);  #dereference the subroutine and use -&gt; for argument passing
</pre>
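<p>The counting idiom above can be put together as a small complete script. This is a sketch only; the sample sentence is made up, and ties are broken alphabetically so the output order is predictable.</p>
<pre>use strict;
use warnings;

my $text = &quot;the cat and the dog and the bird&quot;;   # made-up sample text
my @list = split &#39; &#39;, lc $text;                   # normalize case, split on whitespace

my %histogram;
$histogram{$_}++ for @list;

# sort by descending count, then alphabetically for equal counts
my @ranked = sort { $histogram{$b} &lt;=&gt; $histogram{$a} || $a cmp $b } keys %histogram;

print &quot;$_: $histogram{$_}\n&quot; for @ranked;
# the: 3
# and: 2
# bird: 1
# cat: 1
# dog: 1
</pre>
<p>Note that a fixed slice such as [0..4] returns undef entries when the list has fewer than five unique terms, so check the list length first if that matters to you.</p>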
<h2>Conclusion</h2>
<p>Thanks for reading such a long post. Buy yourself a coffee, because you are now equipped with an essential skill for writing Perl. You may wonder why I haven&#39;t shown you how to set up your environment and write your first Hello World program in Perl. I skipped this because there are tons of articles showing you how to do it. I also skipped some basics such as conditional control, packaging, library usage and exception handling, because I find their syntax quite easy to grasp. In the next article, I plan to cover the power of text processing, with a deep dive into regular expressions. Stay tuned! There will be more symbols to get familiar with. By the way, if you see some weird syntax related to data structures, leave a comment on my blog so we can learn from each other.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/fun-stuff/learning-perl-day-1-data-type/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Learning Hive</title>
		<link>http://www.solutionhacker.com/data-intelligence/collective-intelligence/learning-hive/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/collective-intelligence/learning-hive/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 12:08:05 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[2.1. Architect Corner]]></category>
		<category><![CDATA[2.6. Unleash Your System]]></category>
		<category><![CDATA[4.3. Extract Intelligence]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[query performance]]></category>
		<category><![CDATA[sql interface]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=292</guid>
		<description><![CDATA[Starting to learn Hive As I mentioned in my last article,  I was getting excited about the potential of Hive. Today, I decided to start my journey of learning it. I found a great introductory video that gives you a nice warm-up for using Hive (a basic knowledge of how Hadoop and MapReduce work would [...]]]></description>
			<content:encoded><![CDATA[<h3><strong>Starting to learn Hive</strong></h3>
<p>As I mentioned in my last article,  I was getting excited about the potential of Hive. Today, I decided to start my journey of learning it. I found a great introductory video that gives you a nice warm-up for using Hive (a basic knowledge of how Hadoop and MapReduce work will help you digest the material).</p>
<p>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="400" height="300" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://vimeo.com/moogaloop.swf?clip_id=3591321&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" /><embed type="application/x-shockwave-flash" width="400" height="300" src="http://vimeo.com/moogaloop.swf?clip_id=3591321&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</p>
<h3><strong>Below are some highlights from this video</strong></h3>
<p>Hive is an SQL interface built on top of Hadoop. It supports Web access and JDBC. I am amazed at how close its syntax is to regular SQL for an RDBMS. Below are some of the SQL statements used in this tutorial.</p>
<blockquote><p><strong>//---------- Set up your tables in HIVE ----------</strong><br />
 SHOW TABLES;</p>
<p>CREATE TABLE shakespeare (freq INT, word STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY &#39;\t&#39; STORED AS TEXTFILE;</p>
<p>DESCRIBE shakespeare;</p>
<p><strong>//---------- Load data into Hive table from Hadoop HDFS ----------</strong><br />
 LOAD DATA INPATH &quot;shakespeare_freq&quot; INTO TABLE shakespeare;</p>
<p><strong>//---------- Query against the data using hive sql interface ----------</strong><br />
 select * from shakespeare limit 10;<br />
 select * from shakespeare where freq &gt; 100 sort by freq asc limit 10;<br />
 select freq, count(1) as f2 from shakespeare group by freq sort by f2 desc limit 10;</p>
<p>//show me the plan<br />
 explain select freq, count(1) as f2 from shakespeare group by freq sort by f2 desc limit 10;</p>
<p><strong>//---------- Create a merge table and populate it by joining 2 different tables ----------</strong><br />
 insert overwrite table merged select s.word, s.freq, k.freq from shakespeare s join kjv k on (s.word = k.word);</p>
<p><strong>//---------- Query the merge table ----------</strong><br />
 select word, shake_f, kjv_f, (shake_f+kjv_f) as ss from merged sort by ss limit 20;</p>
</blockquote>
<p>To prepare the data for Hive to load, the demo uses another MapReduce job. Remember to delete the job log before loading the Hive table.</p>
<blockquote><p>hadoop jar $HADOOP_HOME/hadoop-*-examples.jar grep input shakespeare_freq &#39;\w+&#39;</p>
<p><strong>//remove the mapreduce job log</strong><br />
 hadoop fs -rmr shakespeare_freq/_logs</p>
</blockquote>
<p>Large-scale data processing systems are often IO bound. In a MapReduce job, your mapper is typically waiting for data to load from disk. Hadoop mitigates the problem by loading in parallel from lots of hard drives. However, a single hard drive still maxes out at about 75MB/s of read throughput as a physical limit, and there is nothing we can do about this. To achieve good speed, the key is to minimize the number of Hadoop passes.</p>
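<p>To get a feel for the numbers, here is a back-of-the-envelope sketch in Perl. The 75MB/s figure comes from the video; the 1 TB dataset size and the 100 drives are assumptions picked only for illustration.</p>
<pre>use strict;
use warnings;

my $data_mb     = 1_000_000;   # assumed dataset size: 1 TB expressed in MB
my $rate_mb_sec = 75;          # read speed of a single hard drive
my $drives      = 100;         # assumed number of drives read in parallel

my $single_sec   = $data_mb / $rate_mb_sec;   # one full pass on one drive
my $parallel_sec = $single_sec / $drives;     # ideal case: perfectly even split

printf &quot;one drive: %.1f hours\n&quot;, $single_sec / 3600;               # one drive: 3.7 hours
printf &quot;%d drives: %.1f minutes\n&quot;, $drives, $parallel_sec / 60;    # 100 drives: 2.2 minutes
</pre>
<p>Every extra pass over the data pays this cost again, which is why reducing the number of passes matters so much.</p>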
<p>Since Hive sits on top of Hadoop&#8217;s HDFS, it has the same restrictions. So, you cannot UPDATE, DELETE or INSERT individual records as you would in a regular RDBMS. However, you can bulk load new files (data) into a table, and you can delete a file from Hive.</p>
<p>Hive needs to store the metadata of its tables outside HDFS. You can use a regular RDBMS for this job. But when you start Hive locally, it looks for a local metastore. So, in a distributed environment, you may need to centralize the metastore in a remote location. There is a wiki page on the Hive site that documents how to set it up.</p>
<p>
<h3>See Hive in Action</h3>
<p><object width="400" height="300"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=3598672&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=3598672&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="300"></embed></object></p>
<p><a href="http://vimeo.com/3598672">Cloudera Hadoop Training: Hive Tutorial Screencast</a> from <a href="http://vimeo.com/cloudera">Cloudera</a> on <a href="http://vimeo.com">Vimeo</a>.</p>
</p>
<h3>Other projects similar to Hadoop</h3>
<ul>
<li>Parallel databases: Gamma, Bubba, Volcano</li>
<li>Google: Sawzall</li>
<li>Yahoo: Pig</li>
<li>IBM Research: JAQL</li>
<li>Microsoft: DryadLINQ, SCOPE</li>
<li>Greenplum: YAML MapReduce</li>
<li>Aster Data: In-database MapReduce</li>
<li>Business.com: CloudBase</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/collective-intelligence/learning-hive/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hive on Amazon EC2 cloud</title>
		<link>http://www.solutionhacker.com/implement-your-idea/unleash-your-system/hive-on-amazon-ec2-cloud/</link>
		<comments>http://www.solutionhacker.com/implement-your-idea/unleash-your-system/hive-on-amazon-ec2-cloud/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 11:00:44 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[2.1. Architect Corner]]></category>
		<category><![CDATA[2.6. Unleash Your System]]></category>
		<category><![CDATA[4.1. Store Your Data]]></category>
		<category><![CDATA[ad serving]]></category>
		<category><![CDATA[amazon ec2]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[column-based]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[infobright]]></category>
		<category><![CDATA[lucid db]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[shared nothing]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=247</guid>
		<description><![CDATA[  I once worked for a display ad network company that collected over 400 million impression/click logs per day. With this amount of data, my ex-company bought a supercomputer and crossed its fingers that it could handle the growth in both the volume and the analytic demand of the data. It is obviously not a [...]]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;"><a title="adserving-ec2-hive-system-arch" rel="lightbox[pics247]" href="http://www.solutionhacker.com/wp-content/uploads/adserving-ec2-hive-system-arch.png"><img class="alignleft" style="border: 10px solid white;" src="http://www.solutionhacker.com/wp-content/uploads/adserving-ec2-hive-system-arch-150x150.png" alt="adserving-ec2-hive-system-arch" width="150" height="150" /></a></p>
<p style="text-align: left;"> </p>
<p style="text-align: left;">I once worked for a display ad network company that collected over 400 million impression/click logs per day. With this amount of data, my ex-company bought a supercomputer and crossed its fingers that it could handle the growth in both the volume and the analytic demand of the data. It is obviously not a scalable solution. However, what is the best solution?</p>
<p style="text-align: left;">Although I no longer work for this company, it is still an interesting problem to solve. I have a great friend who proposed a shared-nothing solution for this company. The solution is to partition the data across a set of PostgreSQL databases and put Greenplum on top of them to parallelize the queries; there is no disk-level sharing or contention to be concerned with (i.e. it is a &#8216;shared-nothing&#8217; architecture). I like this approach. The only catch is that Greenplum is not free, and it may be difficult for a startup to face this upfront cost. Apart from that, this setup requires all the databases to run on the same network, which prevented us from moving it to an elastic cloud like Amazon EC2.</p>
<p>Later on, I joined a great company in the same industry that was looking for a cloud solution to host its data warehouse. So, I got a chance to revisit this problem. During the research, I came across an interesting technology &#8211; column-based databases (e.g. Infobright and LucidDB). A traditional database stores data in rows and fetches whole rows from data files into memory, which is inefficient if your query only needs a few columns for its computation. A column-based data store instead stores your data by column, and it can compress effectively because all values in a column have the same data type. This solution is great, but it doesn&#8217;t do MPP (i.e. massively parallel processing) and it is also not ready for the cloud yet.</p>
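<p>The difference between the two layouts can be illustrated with a toy in-memory sketch in Perl. This is not how Infobright or LucidDB are implemented; the field names and values are made up, and a real column store works on compressed files on disk rather than on Perl arrays.</p>
<pre>use strict;
use warnings;

# Row-oriented layout: one record per log entry (made-up sample data)
my @rows = (
    { ad_id =&gt; 1, clicks =&gt; 10, country =&gt; &#39;US&#39; },
    { ad_id =&gt; 2, clicks =&gt; 25, country =&gt; &#39;CA&#39; },
    { ad_id =&gt; 3, clicks =&gt; 5,  country =&gt; &#39;US&#39; },
);

# Summing clicks walks over every full record,
# even though only one field is needed.
my $row_total = 0;
$row_total += $_-&gt;{clicks} for @rows;

# Column-oriented layout: the same data stored as one array per column
my %columns = (
    ad_id   =&gt; [1, 2, 3],
    clicks  =&gt; [10, 25, 5],
    country =&gt; [&#39;US&#39;, &#39;CA&#39;, &#39;US&#39;],
);

# Summing clicks touches only the clicks column.
my $col_total = 0;
$col_total += $_ for @{ $columns{clicks} };

print &quot;row total: $row_total, column total: $col_total\n&quot;;   # row total: 40, column total: 40
</pre>
<p>Both layouts give the same answer; the column layout simply reads less data when a query needs only a few columns, and values of the same type sitting side by side are easier to compress.</p>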
<p>Here comes another solution: Hive on top of Hadoop on top of the Amazon cloud. It is an interesting idea. Check out the videos below to learn about it.</p>
<table border="0">
<tbody>
<tr>
<td>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="326" height="264" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://www.youtube.com/v/Y3UXDtDR9bg&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;feature=player_embedded&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="326" height="264" src="http://www.youtube.com/v/Y3UXDtDR9bg&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;feature=player_embedded&amp;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</td>
<td>
<p><br class="spacer_" /></p>
</td>
<td>
<p>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="324" height="264" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://www.youtube.com/v/1hDhpVmeSGI&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;feature=player_embedded&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="324" height="264" src="http://www.youtube.com/v/1hDhpVmeSGI&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;feature=player_embedded&amp;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</p>
</td>
</tr>
</tbody>
</table>
<p>If you are not sure what Hadoop is and want a warm-up in massive computing, I suggest you go through the following 5 excellent Google lectures.</p>
<ul>
<li><a href="http://www.youtube.com/watch?v=yjPBkvYh-ss&amp;feature=channel">Cluster Computing and MapReduce &#8211; Lecture 1</a></li>
<li><a href="http://www.youtube.com/watch?v=-vD6PUdf3Js">Cluster Computing and MapReduce &#8211; Lecture 2</a></li>
<li><a href="http://www.youtube.com/watch?v=5Eib_H_zCEY&amp;feature=related">Cluster Computing and MapReduce &#8211; Lecture 3</a></li>
<li><a href="http://www.youtube.com/watch?v=1ZDybXl212Q">Cluster Computing and MapReduce &#8211; Lecture 4</a></li>
<li><a href="http://www.youtube.com/watch?v=BT-piFBP4fE">Cluster Computing and MapReduce &#8211; Lecture 5</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/implement-your-idea/unleash-your-system/hive-on-amazon-ec2-cloud/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Java 5 Features &#8211; Enum and Annotation</title>
		<link>http://www.solutionhacker.com/implement-your-idea/architect-corner/java-5-features-enum-and-annotation/</link>
		<comments>http://www.solutionhacker.com/implement-your-idea/architect-corner/java-5-features-enum-and-annotation/#comments</comments>
		<pubDate>Tue, 18 Aug 2009 19:01:23 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[2.1. Architect Corner]]></category>
		<category><![CDATA[annotation]]></category>
		<category><![CDATA[enum]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[tiger]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=244</guid>
		<description><![CDATA[Intent I want to summarize some new and interesting Java 5 features in this article and how they change the way I code. Enum I use int constants to make my life easier b/c it can avoid typo. However, it has several drawbacks: Java doesn&#8217;t provide namespace for int enum groups. I can either prefix [...]]]></description>
			<content:encoded><![CDATA[<h2>Intent</h2>
<p>I want to summarize some new and interesting Java 5 features in this article and how they change the way I code.</p>
<h2>Enum</h2>
<p>I used to rely on int constants to make my life easier because they help me avoid typos. However, this int enum pattern has several drawbacks:</p>
<ol>
<li>Java doesn&#8217;t provide a namespace for int enum groups. I can either prefix my constants (like ABC_) or use inner interfaces to organize them.</li>
<li>They are compile-time constants, so clients must be recompiled whenever a value changes.</li>
<li>There is no easy way to translate int enum constants into printable strings during debugging.</li>
<li>You cannot easily iterate over all the int enum constants.</li>
<li>You need a way to validate that an int is a valid enum value.</li>
</ol>
<p>Use the new enum type in Java 5 instead:</p>
<blockquote>
<p>public enum Apple {FUJI, PIPPIN, GRANNY_SMITH}</p>
</blockquote>
<p>An enum is a full-fledged, effectively final class that exports one instance for each enumeration constant via a public static final field. Compared with the int enum pattern:</p>
<ol>
<li>A namespace is provided by the enum type name.</li>
<li>You can reorder and add enumeration constants without recompiling clients.</li>
<li>You can translate an enum into a printable string via its toString() method.</li>
<li>The enum type provides a values() method to iterate over your enumeration constants (in declaration order).</li>
<li>Compile-time type checking replaces the validation check.</li>
<li>You can associate data with each enum constant.</li>
<li>Enums are immutable by nature (keep their fields final), serializable and comparable.</li>
</ol>
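<p>The points above can be sketched in a few lines. This is only an illustration: the price field, its values and the AppleDemo class are assumptions added for the example, not part of the original Apple enum.</p>

```java
// Hypothetical example: Apple mirrors the enum above, with an assumed price field added.
enum Apple {
    FUJI(120), PIPPIN(90), GRANNY_SMITH(150);

    private final int priceInCents; // data associated with each constant (illustrative values)

    Apple(int priceInCents) {
        this.priceInCents = priceInCents;
    }

    int priceInCents() {
        return priceInCents;
    }
}

class AppleDemo {
    // Sums the prices by iterating values(), which follows declaration order.
    static int totalPriceInCents() {
        int total = 0;
        for (Apple apple : Apple.values()) {
            total += apple.priceInCents();
        }
        return total;
    }

    public static void main(String[] args) {
        for (Apple apple : Apple.values()) {
            // toString() returns the constant's name unless it is overridden.
            System.out.println(apple + " costs " + apple.priceInCents() + " cents");
        }
        // valueOf() turns a name back into the constant and rejects unknown names.
        Apple parsed = Apple.valueOf("FUJI");
        System.out.println(parsed == Apple.FUJI);
        System.out.println(totalPriceInCents());
    }
}
```

<p>Because the compiler only accepts an Apple where an Apple is expected, no separate range check is needed, which is the main gain over the int pattern.</p>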
<h2>EnumSet</h2>
<p>If the elements of an enumerated type are used primarily in sets, it is traditional to use the int enum pattern, assigning a different power of 2 to each constant, like READ = 1 &lt;&lt; 2, WRITE = 1 &lt;&lt; 1, EXECUTE = 1 &lt;&lt; 0 to represent the permissions of each entity in Unix. This representation lets you use the bitwise OR operation to combine several constants into a set, known as a bit field. The bit field representation also lets you perform set operations such as union and intersection efficiently using bitwise arithmetic. But bit fields have all the disadvantages of the int enum pattern mentioned above.</p>
<p>Now, the java.util package provides EnumSet to efficiently represent sets of values drawn from a single enum type. This class implements the Set interface and internally uses a bit vector to represent the set of values. For example, if your enum type has 64 or fewer values, the entire EnumSet is represented as a single long, so its performance is comparable to that of a bit field.</p>
<p>The EnumSet class provides three benefits a normal set does not:</p>
<ol>
<li>Various creation methods that simplify the construction of a set based on an enum type</li>
<li>Guaranteed ordering of the elements in the set, based on the order in which the enumeration constants are declared</li>
<li>Performance and memory benefits that a regular set implementation cannot come close to</li>
</ol>
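<p>As a rough sketch of the Unix permission example, the bit field can be replaced with an EnumSet as follows. The Permission enum and the PermissionDemo helper are hypothetical names made up for this illustration.</p>

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical enum standing in for the READ/WRITE/EXECUTE bit flags described above.
enum Permission { READ, WRITE, EXECUTE }

class PermissionDemo {
    // Union: every permission held by either set (the bitwise OR of the bit-field version).
    static Set<Permission> union(Set<Permission> a, Set<Permission> b) {
        EnumSet<Permission> result = EnumSet.noneOf(Permission.class);
        result.addAll(a);
        result.addAll(b);
        return result;
    }

    // Intersection: permissions held by both sets (the bitwise AND of the bit-field version).
    static Set<Permission> intersection(Set<Permission> a, Set<Permission> b) {
        EnumSet<Permission> result = EnumSet.noneOf(Permission.class);
        result.addAll(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Permission> owner = EnumSet.of(Permission.EXECUTE, Permission.READ, Permission.WRITE);
        Set<Permission> guest = EnumSet.of(Permission.READ);

        // Iteration and printing follow declaration order, not insertion order.
        System.out.println(union(owner, guest));
        System.out.println(intersection(owner, guest));
        System.out.println(EnumSet.complementOf(EnumSet.of(Permission.READ)));
    }
}
```

<p>The calling code works against the plain Set interface, so it keeps the type safety of enums while EnumSet handles the bit vector internally.</p>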
<h2>Annotation</h2>
<p>Annotations are a language feature introduced in J2SE 5.0. Simply put, annotations allow developers to mark classes, methods, and members with secondary information that is not part of the operating code. You can see annotations as a way to extend the Java language.</p>
<p>Before annotations arrived in Java 5, you might use naming patterns to indicate that some program elements, such as methods, demanded special treatment by a tool or a framework. For example, JUnit required its users to name test methods with a pattern like testXXX(). It works, but with some big disadvantages:</p>
<ol>
<li>Typos fail silently: a misspelled test method name is simply never run.</li>
<li>It doesn&#8217;t provide a way to associate parameter values with program elements.</li>
</ol>
<p>Annotations solve these problems. To use them, you can:</p>
<ol>
<li>Create your own marker annotation or parameterized annotation (<strong>@interface</strong> is the keyword). A marker annotation has no parameters associated with it. You can also annotate the annotation itself (i.e. with meta-annotations such as @Retention and @Target).</li>
<li>Annotate the program elements.</li>
<li>Write a processor to handle your annotated code. Generally, annotations never change the semantics of the annotated code, but enable it for special treatment by tools. The metadata of a Method now carries additional information for your job. You can use <strong>Method&#8217;s isAnnotationPresent()</strong> to check whether a method is annotated with a certain annotation type. If your annotation carries a parameter, you can use <strong>Method&#8217;s getAnnotation()</strong> to get the annotation object and call <strong>value()</strong> to obtain the parameter.</li>
</ol>
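<p>The three steps above can be sketched as follows. The @Check annotation, the Sample class and the CheckProcessor helper are hypothetical names invented for this illustration; only the reflection calls come from the standard library.</p>

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Step 1: a hypothetical parameterized annotation; @Retention and @Target are meta-annotations.
@Retention(RetentionPolicy.RUNTIME) // keep the annotation visible to reflection at run time
@Target(ElementType.METHOD)         // allow it on methods only
@interface Check {
    String value();
}

// Step 2: annotate a program element (one method is annotated, one is left plain).
class Sample {
    @Check("smoke")
    public void annotated() { }

    public void plain() { }
}

// Step 3: a tiny processor that reads the annotation through reflection.
class CheckProcessor {
    // Returns the annotation's parameter for the named method, or null if it is not annotated.
    static String labelOf(Class<?> type, String methodName) {
        Method method;
        try {
            method = type.getMethod(methodName);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No public method named " + methodName, e);
        }
        if (!method.isAnnotationPresent(Check.class)) {
            return null;
        }
        return method.getAnnotation(Check.class).value();
    }

    public static void main(String[] args) {
        System.out.println(labelOf(Sample.class, "annotated"));
        System.out.println(labelOf(Sample.class, "plain"));
    }
}
```

<p>Without RetentionPolicy.RUNTIME the annotation would not be visible to reflection, so isAnnotationPresent() would return false even for the annotated method.</p>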
<h2>Reference</h2>
<p>Below are some related articles I find useful:</p>
<ol>
<li><a class="linkification-ext" href="http://www.javalobby.org/java/forums/t16967.html" title="Linkification: http://www.javalobby.org/java/forums/t16967.html">http://www.javalobby.org/java/forums/t16967.html</a></li>
<li><a href="http://www.ibm.com/developerworks/library/j-annotate1/">Annotation in Tiger &#8211; Part 1 Meta-Annotation</a></li>
<li><a href="http://www.ibm.com/developerworks/library/j-annotate2.html">Annotation in Tiger &#8211; Part 2 Custom Annotation</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/implement-your-idea/architect-corner/java-5-features-enum-and-annotation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Create a Virtual Company</title>
		<link>http://www.solutionhacker.com/implement-your-idea/dev-process/create-a-virtual-company/</link>
		<comments>http://www.solutionhacker.com/implement-your-idea/dev-process/create-a-virtual-company/#comments</comments>
		<pubDate>Mon, 13 Jul 2009 17:21:17 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[2.1. Develop Your Team]]></category>
		<category><![CDATA[5. Fun]]></category>
		<category><![CDATA[amazon aws]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[conferencing]]></category>
		<category><![CDATA[entrepreneur]]></category>
		<category><![CDATA[mediwiki]]></category>
		<category><![CDATA[skype]]></category>
		<category><![CDATA[startup]]></category>
		<category><![CDATA[tools]]></category>
		<category><![CDATA[virtual office]]></category>
		<category><![CDATA[vnc]]></category>
		<category><![CDATA[web hosting]]></category>
		<category><![CDATA[webinar]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=241</guid>
		<description><![CDATA[Nowadays there are many tools available on the Net ranging from IM to cloud computing that certainly lowers the barrier for entrepreneurs like us. Today, I am going to list out the tools that helps me to run my company: Set up virtual office Skype &#8211; Save you from long distance bill FREE. If you [...]]]></description>
			<content:encoded><![CDATA[<p>Nowadays there are many tools available on the Net, ranging from IM to cloud computing, that certainly lower the barrier for entrepreneurs like us. Today, I am going to list the tools that help me run my company:</p>
<h2>Set up virtual office</h2>
<ol>
<li>Skype &#8211; saves you from long-distance bills
<ul>
<li>FREE. If you pay a fee, it will let you call regular phone lines.</li>
<li>When I am tired of typing, I switch to this.</li>
<li>If you like face-to-face conversation, I would use iChat on Mac. All you need is an AOL account.</li>
</ul>
</li>
<li>Yugma &#8211; Web conferencing
<ul>
<li>FREE up to 20 attendees.</li>
<li>It has Skype integration.</li>
</ul>
</li>
<li><a href="http://www.freeconferencecall.com/prodfreeconferencecall.asp">Free conferencing</a>
<ul>
<li>FREE</li>
<li>This is a good tool in case you don&#8217;t have internet access or your laptop is not next to you, because it gives you a dedicated line to dial in. However, the number is not TOLL-FREE.</li>
<li>Why 712 area code? Check <a href="http://saunderslog.com/2006/10/11/whats-with-the-712-area-code/">this</a> out</li>
</ul>
</li>
<li>Google tools &#8211; all FREE
<ul>
<li>Google doc (shareable)</li>
<li>Google calendar (it can sync with my iCal on Mac now. If you use Outlook, you need to install a plugin to do the calendar sync). Follow this <a href="http://lifehacker.com/399407/how-to-sync-any-desktop-calendar-with-google-calendar">guide</a> to set it up.</li>
<li>Google email (have gmail to host your mail server &#8211; <a title="Linkification: mailto:yourname@yourcompany.com" href="mailto:yourname@yourcompany.com" class="linkification-ext">yourname@yourcompany.com</a>)</li>
</ul>
</li>
<li><a href="http://www.mediawiki.org/wiki/MediaWiki">MediaWiki</a> &#8211; good wiki tool for information sharing</li>
<li><a href="http://datapop.posterous.com/">Posterous</a> &#8211; create your company blog via email <img src='http://www.solutionhacker.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
<li>iPhone
<ul>
<li>Not FREE</li>
<li>I use it to sync email and calendar, and to access the Web.</li>
</ul>
</li>
<li>VNC &#8211; Remote desktop tool
<ul>
<li>FREE</li>
<li>For Mac, download the OS X Vine Server from <a href="http://www.testplant.com/downloads/VineServer.dmg">here</a>.</li>
<li>If your DSL provider assigns your IP address dynamically, keeping track of it is quite a headache. You can obtain a domain name from <a href="https://www.dyndns.com/">DynDNS</a> so that you don&#8217;t have to deal with the IP address.</li>
<li>By default, the VNC server listens on port 5900. If you want to use remote desktop from outside your subnet, make sure your DSL router opens that port and forwards requests to your machine.</li>
<li>Here is a <a href="http://danielwebb.us/software/vnc/vncviewer.html">web-based VNC viewer</a>. With that, you can do remote desktop from anywhere. Just key in your DynDNS domain and you are done.</li>
<li>If you want to secure this access, you can password-protect your box by configuring the VNC server.</li>
<li>Here is a great <a href="http://lifehacker.com/software/feature/geek-to-live-how-to-control-your-home-computer-from-anywhere-125607.php">article</a> for that.</li>
<li>Some people use the <a href="https://secure.logmein.com/products/pro/home1.asp?lang=en">LogMeIn</a> service, which provides more secure remote access features. However, it is NOT FREE. To me, a VNC server is a good enough solution.</li>
</ul>
</li>
</ol>
<h2>Build your virtual dev team</h2>
<ol>
<li><a href="http://trac.edgewall.org/">Trac</a> &#8211; combines a wiki, a ticket system and project planning in one
<ul>
<li>FREE</li>
<li>A bit complicated to set up at a web hosting company</li>
<li>It integrates with Subversion as well</li>
<li><a href="http://www.bugzilla.org/">Bugzilla</a> is pretty good for issue tracking as well</li>
</ul>
</li>
<li><a href="http://www.dreamhost.com/hosting.html">Dreamhost</a> for web hosting
<ul>
<li>Below $10 per month</li>
<li>I currently use it for hosting my own blog and Subversion repository.</li>
<li>No Java support yet.</li>
<li>For VPS solution, this <a href="http://www.vpsville.com/vps-plans">one</a> is cheap and my buddy said it is great.</li>
</ul>
</li>
<li><a href="http://www.virtualbox.org/">VirtualBox</a> &#8211; run several operating systems on your laptop. Very appealing!
<ul>
<li>FREE</li>
<li>I set up Ubuntu on my Mac. Full screen, shared folders, shared mouse. I love it.</li>
<li>With this, I can ensure all developers are working in the same environment. Furthermore, I can have dev, QA and production using the same environment.</li>
</ul>
</li>
<li>OmniGraffle &#8211; graphical design tool on Mac
<ul>
<li>NOT FREE but cheap</li>
<li>Free stencils are available <a href="http://graffletopia.com/">here</a>.</li>
</ul>
</li>
<li>Amazon AWS
<ul>
<li>Much lower cost than hiring your own team to keep your system running 24&#215;7</li>
<li>Cloud computing allows you to scale on demand.</li>
<li>Processing power via EC2</li>
<li>Storage via S3</li>
<li>CDN via CloudFront</li>
<li>Messaging via Amazon SQS</li>
</ul>
</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/implement-your-idea/dev-process/create-a-virtual-company/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adobe Air with SQLite database</title>
		<link>http://www.solutionhacker.com/uncategorized/adobe-air-with-sqlite-database/</link>
		<comments>http://www.solutionhacker.com/uncategorized/adobe-air-with-sqlite-database/#comments</comments>
		<pubDate>Thu, 02 Jul 2009 05:57:56 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[4.2. Visualize Your Data]]></category>
		<category><![CDATA[6. Uncategorized]]></category>
		<category><![CDATA[adobe air]]></category>
		<category><![CDATA[offline synchronization]]></category>
		<category><![CDATA[performance tuning]]></category>
		<category><![CDATA[sqlite]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=240</guid>
		<description><![CDATA[Recently, I am trying to build an interactive reporting tool that needs to deal with lots of data. The data is not dynamic because it is basically data from historical performance log files. However, the volume of the data is large (over few millions of rows) and I still want my clients to interact with [...]]]></description>
			<content:encoded><![CDATA[<p>Recently, I have been trying to build an interactive reporting tool that needs to deal with lots of data. The data is <strong>not</strong> dynamic because it basically comes from historical performance log files. However, the volume of the data is <strong>large</strong> (over a few million rows) and I still want my clients to interact with this large amount of data with ease. So I am looking into Adobe AIR, as I heard that it comes with the embedded database &#8220;<strong>SQLite</strong>&#8221;. I believe it should perform better than a web-based application because the data is local and SQLite is lightweight and fast. Apart from that, SQLite supports <strong>parameterized queries</strong>, <strong>strongly-typed results</strong>, <strong>asynchronous/synchronous processing</strong>, <strong>indexes, views, triggers,</strong> <strong>transactions</strong> and most of <strong>SQL92</strong>. On top of that, it has a <strong>small footprint and is cross-platform and open source</strong>. The tradeoff with SQLite is its weak support for <strong>concurrency</strong>, because a writer takes<strong> an exclusive lock on the whole database file</strong>. However, that is totally fine for a desktop application because it normally serves only one user. For more info on SQLite, check out my notes below.</p>
<p><u>Update</u></p>
<p>SQLite 3 has been released and addresses some of the issues in version 2:</p>
<ul>
<li>BLOB support</li>
<li>Fulltext searching</li>
<li>Connection shared between threads</li>
<li>Improve concurrency</li>
</ul>
<p>However, it still doesn&#8217;t support writable views, nested transactions or foreign keys.</p>
<h2>Presentation</h2>
<p>Here is a nice presentation from Paul Robertson. Look at it first.</p>
<p><embed width="400" height="305" pluginspage="http://www.macromedia.com/shockwave/download/index.cgi?P1_Prod_Version=ShockwaveFlash" swliveconnect="true" type="application/x-shockwave-flash" seamlesstabbing="false" name="flashObj" base="http://admin.brightcove.com" flashvars="videoId=1741212624&amp;playerId=1596744118&amp;viewerSecureGatewayURL=https://console.brightcove.com/services/amfgateway&amp;servicesURL=http://services.brightcove.com/services&amp;cdnURL=http://admin.brightcove.com&amp;domain=embed&amp;autoStart=false&amp;" bgcolor="#FFFFFF" src="http://c.brightcove.com/services/viewer/federated_f8/1596744118"></embed></p>
<p><strong>Notes from the video:</strong></p>
<ol>
<li>Warm up with general SQL tips (prefer joins over subqueries, avoid IN, avoid LIKE, specify column names in SELECT and INSERT, avoid unnecessary joins)</li>
<li>An AIR SQL connection can be connected to up to 10 databases at a time; you can qualify your table names with the database name.</li>
<li>Don&#8217;t reuse the same SQLStatement instance for different SQL statements, or you lose the benefit of the prepared statement.</li>
<li>Use a transaction to batch insert/update/delete operations (48x faster!)</li>
<li>Index the columns used in WHERE clauses, and put columns that are used together into one index</li>
<li>Create table structure before you add data because internally SQLite&#8230;?</li>
<li>Handle large resultset in parts for perceived performance gain (<a href="http://help.adobe.com/en_US/AIR/1.5/devappsflex/WS5b3ccc516d4fbf351e63e3d118666ade46-7d4c.html#WS5b3ccc516d4fbf351e63e3d118666ade46-7d46">detailed</a>)</li>
</ol>
<p>There are several things I want to find out:</p>
<ol>
<li>Can SQLite handle large datasets?
<ul>
<li>Yes. According to the spec, it can handle terabytes of data.</li>
</ul>
</li>
<li>Does SQLite support pagination?
<ul>
<li>Yes. Look at SQL Statement object.</li>
</ul>
</li>
<li>How does SQLite synchronize with updated data from the remote database?
<ul>
<li>Strategies: overwrite vs delta (timestamp, field-by-field comparison, dirty flag)</li>
<li>LiveCycle Data Services has built-in SQLite synchronization support, including offline caching and conflict management.</li>
<li><a class="linkification-ext" href="http://coenraets.org/blog/2008/05/insync-automatic-offline-data-synchronization-in-air-using-lcds-26/" title="Linkification: http://coenraets.org/blog/2008/05/insync-automatic-offline-data-synchronization-in-air-using-lcds-26/">http://coenraets.org/blog/2008/05/insync-automatic-offline-data-synchronization-in-air-using-lcds-26/</a></li>
</ul>
</li>
</ol>
<h2>Some notes about SQLite</h2>
<p>Below are some SQLite tips and practices I obtained from different sources:</p>
<ul>
<li>A big advantage of SQLite over a flat file is the possibility to index your data.</li>
<li>Using parameterized queries protects against SQL injection and makes the problems with quote characters (&#8216;) go away. It is also much faster because SQLite can reuse the execution plan of a statement when you use parameters.</li>
<li>Make sure to import the records <strong>in a transaction</strong> so that it doesn&#8217;t spend a lot of time creating indexes until everything is imported.</li>
<li>The SQLite documentation states that SQLite databases can be <strong>terabytes</strong> in size, and that the primary limitation of SQLite is concurrency (many users at the same time).</li>
<li><em>&#8220;The SQLite database is pretty damn fast. I was getting near instantaneous searching with databases that were ~100,000 records. Somewhere around 800,000 – 1,000,000 records you start losing performance, waiting a few seconds for a search&#8221;</em> &#8211; by <a href="http://www.wabysabi.com/blog/">Daniel</a></li>
<li>Each database is contained within a single file.</li>
</ul>
<h2>Reference</h2>
<p>Below are some good links I have found:</p>
<ul>
<li><a href="http://www.lfpug.com/flexair-developing-for-large-datasets/">Develop air application with large dataset</a></li>
<li><a href="http://www.peterelst.com/blog/2009/06/26/sqlite-at-the-lake/">SQLite at the Lake &#8211; Peter Elst</a> (06/26/2009)</li>
<li><a href="http://www.dehats.com/drupal/?q=node/58">SQLite admin tool by David Deraedt</a> &#8211; not open source</li>
<li><a href="http://coenraets.org/blog/2007/10/new-air-sqlite-administration-app-with-source-code/">Another SQLite admin tool by Christophe Coenraets</a> (10/2007) &#8211; open source</li>
<li><a href="https://addons.mozilla.org/en-US/firefox/addon/5817">Firefox SQLite Manager plugin</a> &#8211; support csv import. However, the project is no longer active.</li>
<li><a href="http://www.peterelst.com/blog/2008/04/18/sqlite-in-adobe-air-session-video/">Adobe Air and SQLite video session</a> (4/18/2008) &#8211; it contains 3 interesting examples.</li>
<li><a href="http://blog.affirmix.com/2009/01/28/getting-started-with-adobe-air-and-sqlite-and-avoiding-the-problems/">Get started with adobe air and sqlite</a> (1/28/2009)</li>
<li><a href="http://www.adobe.com/devnet/air/flex/articles/air_sql_operations_print.html">Develop the connection and statement pool</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/uncategorized/adobe-air-with-sqlite-database/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SOA Approach for Business</title>
		<link>http://www.solutionhacker.com/uncategorized/soa-approach-for-business/</link>
		<comments>http://www.solutionhacker.com/uncategorized/soa-approach-for-business/#comments</comments>
		<pubDate>Tue, 28 Apr 2009 04:49:02 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[6. Uncategorized]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=231</guid>
		<description><![CDATA[What is SOA? What is ESB? ESB = Enterprise Service Bus. The definition is flexible, but in general it’s a conduit for messages of multiple, different formats, between application endpoints, over more than one protocol. Mule vs ServiceMix Compared to Mule, the major difference for ServiceMix is its architectural design, which is fundamentally based on [...]]]></description>
			<content:encoded><![CDATA[<h2>What is SOA?</h2>
<h2>What is ESB?</h2>
<p><strong>ESB </strong>= Enterprise Service Bus. The definition is flexible, but in general it’s a <strong>conduit for messages</strong> of multiple, different formats, between application endpoints, over more than one protocol.</p>
<h3>Mule vs ServiceMix</h3>
<ol>
<li>Compared to <strong>Mule</strong>, the major difference for <strong>ServiceMix </strong>is its architectural design, which is fundamentally based on the <strong>Java Business Integration (JBI) standard</strong>. Mule provides a <strong>JBI binding</strong> so that Mule components can interact with JBI containers, including the ServiceMix JBI container. However, the internal Mule APIs are not based on the JBI standard.</li>
<li><strong><span class="postbody">JBI</span></strong><span class="postbody"> uses a notion of <strong>Message Exchanges and a Normalized Message</strong> to communicate between components, whereas Mule uses a <strong>&#8220;POJO / Endpoint&#8221; </strong>architecture. </span></li>
<li><span class="postbody">Mule doesn&#8217;t require your service to implement any Mule interface. The components are wired up through the mule-config.xml file.</span></li>
</ol>
<p><object height="400" width="550" align="middle" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://fpdownload.macromedia.com/pub/shockwave/cabs/ flash/swflash.cab#version=8,0,0,0" id="test"><param name="allowScriptAccess" value="sameDomain" /><param name="movie" value="http://www.mulesource.org/download/attachments/21823814/player.swf" /><param name="quality" value="high" /><param name="bgcolor" value="#ffffff" /><param name="FlashVars" value="file=http://www.mulesource.com/demos/meet-mule/video.flv" /><embed height="400" width="550" align="middle" src="http://www.mulesource.org/download/attachments/21823814/player.swf" flashvars="file=http://www.mulesource.com/demos/meet-mule/video.flv" quality="high" bgcolor="#ffffff" name="test" allowscriptaccess="sameDomain" allowfullscreen="true" type="application/x-shockwave-flash" pluginspage="http://www.macromedia.com/go/getflashplayer"></embed></object></p>
<p>The video above gives you a good taste of Mule, but the example runs as a standalone app. If you want to use Mule in your web application, I found this <a href="http://www.mulesource.com/webinars/do-more/">video</a> a good starting point.</p>
<h2>What is EDA?</h2>
<p>Asynchronous messaging lets two or more    applications send data to each other without having to wait for receipt confirmation.    The infrastructure guarantees message delivery, even if the receiving application    isn’t currently running or the network connection is interrupted. It sounds    simple enough, but asynchronous messaging demands a new way of thinking about    system architecture.</p>
<h3>Caveat of asynchronous messaging</h3>
<ol>
<li>The convenience of    sending guaranteed messages literally around the world to applications running    on different platforms or technologies is a huge benefit. What’s the catch?    The messaging system can guarantee that the message will be delivered, but it     <strong>does not guarantee when it will be delivered</strong>.</li>
<li>Worse yet, if an application sends two messages, the messaging system <strong>doesn’t    guarantee that they’ll arrive in the same order.</strong></li>
<li>Because    destinations are unidirectional, the application receives responses on a different    destination than the one on which it publishes requests. This means that the    responses can arrive in a different order than the requests were sent. As a    result, <strong>the application must explicitly correlate incoming response messages    to the requests</strong>. In the asynchronous world, correlation is such a common    need that the JMS API includes the methods <strong>getJMSCorrelationID and setJMSCorrelationID    </strong>in the message interface.</li>
<li>How do we have sender and receiver in a transaction?</li>
</ol>
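<p>To make the correlation point concrete, here is a minimal plain-Java simulation. It does not use the JMS API: Message and Correlator are hypothetical classes invented for this sketch, and the correlationId field only plays the role that getJMSCorrelationID and setJMSCorrelationID play on a real JMS message.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation; Message and Correlator are hypothetical, not JMS types.
class Message {
    final String correlationId;
    final String body;

    Message(String correlationId, String body) {
        this.correlationId = correlationId;
        this.body = body;
    }
}

class Correlator {
    // Requests that have been sent but not yet answered, keyed by correlation ID.
    private final Map<String, String> pendingRequests = new HashMap<String, String>();
    private int nextId = 1;

    // Records the request under a fresh correlation ID and returns the outgoing message.
    Message send(String requestBody) {
        String id = "req-" + nextId++;
        pendingRequests.put(id, requestBody);
        return new Message(id, requestBody);
    }

    // Matches a response to its request by correlation ID, whatever the arrival order.
    String receive(Message response) {
        String request = pendingRequests.remove(response.correlationId);
        if (request == null) {
            throw new IllegalStateException("No pending request for " + response.correlationId);
        }
        return request + " -> " + response.body;
    }

    public static void main(String[] args) {
        Correlator correlator = new Correlator();
        Message first = correlator.send("quote for order A");
        Message second = correlator.send("quote for order B");

        // Simulate responses arriving in the opposite order to the requests.
        List<Message> responses = new ArrayList<Message>();
        responses.add(new Message(second.correlationId, "42"));
        responses.add(new Message(first.correlationId, "17"));

        for (Message response : responses) {
            System.out.println(correlator.receive(response));
        }
    }
}
```

<p>The receiver never relies on arrival order; it only looks up the ID that the responder copied from the request.</p>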
<h2>Reference</h2>
<ol>
<li><a href="http://www.martinfowler.com/eaaDev/index.html">Enterprise Architecture Patterns &#8211; by Martin Fowler</a></li>
<li><a href="http://www.ddj.com/architect/184415001">An Asynchronous World</a></li>
<li><a href="http://www.ddj.com/184414966">Errant Architecture</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/uncategorized/soa-approach-for-business/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Top 10 Flex Programming Tips</title>
		<link>http://www.solutionhacker.com/uncategorized/top-10-flex-programming-tips/</link>
		<comments>http://www.solutionhacker.com/uncategorized/top-10-flex-programming-tips/#comments</comments>
		<pubDate>Sun, 19 Apr 2009 00:48:47 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[6. Uncategorized]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=228</guid>
		<description><![CDATA[There are some interesting tips I found during the time I work on Flex Programming. I will cover Embedding, Binding, Event Handling, Function Pointer, Mixin and more. I hope these tips will make your life easier when you work on Flex. Tip 1: Embedding Many Adobe Flex applications use external assets like images, sounds, and [...]]]></description>
			<content:encoded><![CDATA[<p><img height="120" width="120" class="alignleft" alt="" src="http://www.solutionhacker.com/wp-content/uploads/image/flexLogo.jpg" />Here are some interesting tips I picked up while working on Flex programming. I will cover <strong>Embedding</strong>, <strong>Binding</strong>, <strong>Event Handling</strong>, <strong>Function Pointer, Mixin</strong> and more. I hope these tips will make your life easier when you work on Flex.</p>
<p>
<span id="more-228"></span>
</p>
<p><!--more--></p>
<h2>Tip 1: Embedding</h2>
<blockquote>
<p>Many Adobe Flex applications use external assets like images, sounds, and fonts. Although you can reference and load assets at run time, you often <strong>compile </strong>these assets into your applications. The process of compiling an asset into your application is called <i>embedding</i> <i>the asset</i>. Flex lets you embed image files, movie files, MP3 files, and TrueType fonts into your applications&#8230; When you embed an asset, you compile it into your application&#8217;s SWF file. The advantage of embedding an asset is that it is included in the SWF file, and can be <strong>accessed faster </strong>than when the application has to load it from a remote location at run time. The disadvantage of embedding an asset is that your SWF file is larger than if you load the asset at run time &#8211; Adobe</p>
</blockquote>
<p>There are 3 ways to embed an asset in Flex. In Flex code, you can use the <strong>@Embed</strong> directive for direct use, or you can associate the embedded asset with a variable by using the <strong>[Embed]</strong> metadata tag. In styles, you can use <strong>Embed </strong>to associate an asset as well. Check the detailed syntax <a href="http://livedocs.adobe.com/flex/2/docs/wwhelp/wwhimpl/common/html/wwhelp.htm?context=LiveDocs_Parts&amp;file=00000968.html">here</a>.</p>
<p>Flex also has an option to embed any kind of file at compile time. The trick is the ‘<strong>mimeType</strong>’ option of the [Embed] tag: when embedding other kinds of files, such as text, you need to specify ‘<strong>application/octet-stream</strong>’ for the mimeType option. Here is a sample:</p>
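<pre name="code" class="java">
import mx.core.ByteArrayAsset;

// Embed the raw bytes of MyFile.txt; the compiler generates a ByteArrayAsset subclass.
[Bindable]
[Embed(source="MyFile.txt", mimeType="application/octet-stream")]
private var myFileClass:Class;
...

// Instantiate the generated class and decode its bytes as UTF-8 text.
var myFileByteArray:ByteArrayAsset = ByteArrayAsset(new myFileClass());
var story:String = myFileByteArray.readUTFBytes(myFileByteArray.length);
</pre>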
<p>You must specify that the <strong>MIME </strong>type for the embedding is application/octet-stream, which causes the byte data   to be embedded &#8220;as is&#8221;, with no interpretation.   It also causes the autogenerated class to extend ByteArrayAsset   rather than another asset class. For example, if you embed a PNG file without specifying this   MIME type, the PNG data will be <strong>automatically transcoded</strong>   into the bitmap format used by the player, and a subclass   of BitmapAsset will be autogenerated to represent it.   But if you specify the MIME type as application/octet-stream,   then no transcoding will occur, the PNG data will be embedded   as is, and the autogenerated class will extend ByteArrayAsset.</p>
<h2>Tip 2: Binding</h2>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/uncategorized/top-10-flex-programming-tips/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Powerful Extension of Flex DataGrid &#8211; Part 1</title>
		<link>http://www.solutionhacker.com/uncategorized/powerful-extension-of-flex-datagrid/</link>
		<comments>http://www.solutionhacker.com/uncategorized/powerful-extension-of-flex-datagrid/#comments</comments>
		<pubDate>Sat, 18 Apr 2009 05:50:29 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[6. Uncategorized]]></category>
		<category><![CDATA[autocompleted]]></category>
		<category><![CDATA[datagrid]]></category>
		<category><![CDATA[flex]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=227</guid>
		<description><![CDATA[Features wanted! To make the Flex DataGrid complete, I would like it to have the following features. Autocomplete Search &#8211; Locate the data I want quickly when the grid has too many rows. Internationalization &#8211; Handle currency, number and date formats. Data Export &#8211; Output the data in CSV format, so users can import it into [...]]]></description>
			<content:encoded><![CDATA[<h2>Features wanted!</h2>
<p><img height="120" width="120" alt="" src="http://www.solutionhacker.com/wp-content/uploads/image/datagrid.PNG" class="alignleft" />To make the Flex DataGrid complete, I would like it to have the following features. <strong>Autocomplete Search</strong> &#8211; Locate the data I want quickly when the grid has too many rows. <strong>Internationalization</strong> &#8211; Handle currency, number and date formats. <strong>Data Export</strong> &#8211; Output the data in CSV format, so users can import it into Excel. <strong>Pagination</strong> &#8211; Given the total number of records, the subset of data rows and the number of rows per page, the grid should paginate and fire an event when the user clicks on another page. In this article, I will show you how to make these happen. <span id="more-227"></span></p>
<h2>Autocomplete Search</h2>
<p>After you obtain the result set from the database and pass it back to Flex as a list of value objects, Flex will, as usual, convert it to an ArrayCollection of value objects that you bind to the DataGrid. If you want to filter the records shown in the DataGrid, you don&#8217;t need to remove them from the ArrayCollection. All you need to do is provide an implementation of the ArrayCollection&#8217;s <strong>filterFunction</strong>. Here is an example:</p>
<pre class="java" name="code">
...
import mx.collections.ArrayCollection;

[Bindable]
private var _data:ArrayCollection;

private function init():void
{
	_data = new ArrayCollection(
	[
	{ "name":"One" },
	{ "name":"Two" },
	{ "name":"Three" },
	{ "name":"Four" },
	{ "name":"Five" }
	]);

	_data.filterFunction = filterFunction;
}

// Keep an item only if its name begins with the search string (case-insensitive)
private function filterFunction( item:Object ):Boolean
{
	var name:String = String( item.name ).toLowerCase();
	var searchStr:String = textInput.text.toLowerCase();
	return searchStr == name.substr( 0, searchStr.length );
}

// Called whenever the user types in the search text box
private function handleChange():void
{
	// refresh() re-applies the filterFunction to every item in the collection
	_data.refresh();
}
...
</pre>
<p>Here is the caveat. When you use a filterFunction on an ArrayCollection, the Flash Player runs the filter against every item on every refresh, which is inefficient for large data sets. To speed it up, <strong>Hillel Coren</strong> has documented an approach in which subsequent, narrower searches effectively work on the already filtered list instead. See his <a href="http://hillelcoren.com/2009/04/17/faster-searching-in-flex-part-2/">article</a> for details. The demo is <a href="http://web.me.com/hillelcoren/Site/SearchDemo.html">here</a>. I found Hillel&#8217;s solution elegant and well documented. Apart from making the search more efficient, his design also modularizes the search code so that it can be unit tested easily.</p>
<p>The idea is to keep the last failed search string in each record. If the next search string <strong>begins with</strong> that failed string, the record cannot match either, so there is no need to check each of its fields again.</p>
<pre class="java" name="code">

//---- SearchDemo.mxml ---
private function filterFunction( item:Person ):Boolean
{
	return SearchUtils.isMatch( item, textInput.text );
}

//--- SearchUtils.as ---
public static function isMatch( item:ISearchable, searchStr:String ):Boolean
{
	// Shortcut: skip items already known not to match a prefix of this search string
	if (_enableFasterSearch &amp;&amp; !quickCheck( item, searchStr )){
		return false;
	}

	// Comma-separated parts are OR'ed together
	var orSearchStrs:Array = searchStr.split( "," );

	for each (var orSearchPart:String in orSearchStrs)
	{
		// Space-separated words within a part are AND'ed together
		var andSearchStrs:Array = orSearchPart.split( " " );
		var matched:Boolean;

		for each (var andSearchStr:String in andSearchStrs){
			matched = false;
			// A word matches if any searchable field matches it
			for each (var field:String in item.getSearchFields()){
				if (item.matchesField( field, andSearchStr )){
					matched = true;
					break;
				}
			}
			if (!matched){break;}
		}

		if (matched){
			item.setLastFailedSearchStr( "" );
			return true;
		}
	}
	// Remember the failed string so the next, longer search can be short-circuited
	item.setLastFailedSearchStr( searchStr );
	return false;
}
...
</pre>
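<p>The quickCheck() call used by isMatch() is not shown in the excerpt above. A minimal sketch of what such a check might look like follows. It is not Hillel&#8217;s actual code. It assumes a getLastFailedSearchStr() getter on ISearchable (hypothetical, mirroring setLastFailedSearchStr()) and that matchesField() can only get stricter as the search string grows, for example a &#8220;begins with&#8221; or &#8220;contains&#8221; match.</p>
<pre class="java" name="code">
//--- SearchUtils.as (sketch, not the original implementation) ---
// Returns false only when the item is known not to match, so the full check can be skipped
private static function quickCheck( item:ISearchable, searchStr:String ):Boolean
{
	// getLastFailedSearchStr() is an assumed getter paired with setLastFailedSearchStr()
	var lastFailed:String = item.getLastFailedSearchStr();

	// No recorded failure: nothing to shortcut, run the full check
	if (lastFailed == null || lastFailed.length == 0){
		return true;
	}

	// The new string does not extend the failed one: run the full check
	if (searchStr.indexOf( lastFailed ) != 0){
		return true;
	}

	// A newly typed comma adds an OR alternative that could match: run the full check
	var appended:String = searchStr.substr( lastFailed.length );
	if (appended.indexOf( "," ) != -1){
		return true;
	}

	// The failed string was only narrowed further, so the item still cannot match
	return false;
}
</pre>
<p>Returning true here only means &#8220;run the full check&#8221;, so the sketch errs on the side of doing the extra work whenever it cannot be sure the item will fail.</p>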
<p>A few things are worth noting about this search routine. It is generic because it works against any item that implements the ISearchable interface. Each item class supplies its own implementation of the matchesField method, so the generic routine can focus on layering features on top of it, such as the &#8216;AND&#8217; and &#8216;OR&#8217; filters and the quick-check algorithm.</p>
<p>NOTE: Once a filterFunction is set and applied with refresh(), iterating the ArrayCollection returns only the filtered records. If you want the full data set, iterate its source array instead:</p>
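<p>To make the interface idea concrete, here is a sketch of what ISearchable and an implementing value object could look like. The interface shape is inferred from the calls made in isMatch(); the getLastFailedSearchStr() getter, the Person fields and the &#8220;begins with&#8221; matching rule are assumptions made for this example, not necessarily what Hillel&#8217;s code does.</p>
<pre class="java" name="code">
//--- ISearchable.as (assumed shape, inferred from the calls in isMatch) ---
package
{
	public interface ISearchable
	{
		function getSearchFields():Array;
		function matchesField( field:String, searchStr:String ):Boolean;
		function setLastFailedSearchStr( searchStr:String ):void;
		// Assumed getter that a quick check would need
		function getLastFailedSearchStr():String;
	}
}

//--- Person.as (hypothetical value object) ---
package
{
	public class Person implements ISearchable
	{
		public var firstName:String = "";
		public var lastName:String = "";
		private var _lastFailedSearchStr:String = "";

		// Names of the public properties that the search should look at
		public function getSearchFields():Array
		{
			return [ "firstName", "lastName" ];
		}

		// Case-insensitive "begins with" match on one field
		public function matchesField( field:String, searchStr:String ):Boolean
		{
			var value:String = String( this[field] ).toLowerCase();
			return value.indexOf( searchStr.toLowerCase() ) == 0;
		}

		public function setLastFailedSearchStr( searchStr:String ):void
		{
			_lastFailedSearchStr = searchStr;
		}

		public function getLastFailedSearchStr():String
		{
			return _lastFailedSearchStr;
		}
	}
}
</pre>
<p>Because the matching rule lives in the value object, a different class could match on other fields or use a &#8220;contains&#8221; rule without touching the generic search code.</p>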
<pre class="java" name="code">
for each (var obj:Object in arrayCollection.source)
{
     // do stuff
}
</pre>
<p>Again, thanks to Hillel for the tip.</p>
<p>I will talk about Data Export in Part 2 of this series.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/uncategorized/powerful-extension-of-flex-datagrid/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
