<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
xmlns:rawvoice="http://www.rawvoice.com/rawvoiceRssModule/"
>

<channel>
	<title>Solution Hacker &#187; Data Intelligence</title>
	<atom:link href="http://www.solutionhacker.com/category/data-intelligence/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.solutionhacker.com</link>
	<description>This blog provides solutions for entrepreneurs!</description>
	<lastBuildDate>Mon, 06 Feb 2012 07:19:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=375</generator>
<!-- podcast_generator="Blubrry PowerPress/2.0.4" -->
	<itunes:summary>This blog provides solutions for entrepreneurs!</itunes:summary>
	<itunes:author>Solution Hacker</itunes:author>
	<itunes:explicit>no</itunes:explicit>
	<itunes:image href="http://www.solutionhacker.com/wp-content/plugins/powerpress/itunes_default.jpg" />
	<itunes:subtitle>This blog provides solutions for entrepreneurs!</itunes:subtitle>
	<image>
		<title>Solution Hacker &#187; Data Intelligence</title>
		<url>http://www.solutionhacker.com/wp-content/plugins/powerpress/rss_default.jpg</url>
		<link>http://www.solutionhacker.com/category/data-intelligence/</link>
	</image>
		<item>
		<title>Flex Hacking Series Part 1 &#8211; Event Model</title>
		<link>http://www.solutionhacker.com/implement-your-idea/build-your-website/flex-hacking-series-part-1-event-model/</link>
		<comments>http://www.solutionhacker.com/implement-your-idea/build-your-website/flex-hacking-series-part-1-event-model/#comments</comments>
		<pubDate>Sat, 27 Nov 2010 20:14:35 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Report]]></category>
		<category><![CDATA[Site Building]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=447</guid>
		<description><![CDATA[<h2>Event Model</h2>
<h3>Event Flow</h3>
<p>The idea is simple. Here is the regular event flow: a user interacts with the UI, an event is generated and broadcast via the event dispatcher (bubbling up the display hierarchy if enabled), any registered listeners capture it, and a set of actions is taken in response. To understand it in a bit more detail, you can check <a href="http://www.adnandoric.com/2008/12/29/understanding-the-flex-event-propagation/">this article</a> and play with its <a href="http://www.adnandoric.com/wp-content/uploads/2008/12/flex/EventPropagation/EventPropagation.html">demo</a>. In short, under the hood, it has 3 phases: <strong>capturing</strong>, <strong>targeting</strong> and <strong>bubbling</strong>. In the display list, from the top, it always starts from <strong>Stage</strong> as the root, then <strong>SystemManager</strong>, then your <strong>Application</strong>. The event is created by Flash Player, travels down from Stage to the target component, and then bubbles back up to Stage (if bubbling is enabled). During this round trip, it triggers the registered event listeners&#39; actions. Most components communicate with others using events, which convey useful information and data, but only visual components (objects in the display list) can participate in the event flow described above.</p>
<p>In practice, you normally just need to worry about registering a listener on the target component for a particular event dispatched from it. You seldom need to understand the event flow above in detail. However, if you want to exercise more control over the event, such as stopping it from propagating, you need to understand how it works first.</p>
<h3>Cancel/Stop the Event</h3>
<p>Within Flex, by default, an event you create does <strong>not</strong> bubble: it is delivered only to listeners registered on the component that dispatched it (typically added by its parent). If you want the event to reach its parent&#39;s parent (and all the way up your component chain in the display object hierarchy), then you tell that event to <strong>bubble</strong>. Any listener along the chain can halt the event via <strong>stopPropagation()</strong> or <strong>stopImmediatePropagation()</strong>. Note that making the event non-cancelable during event construction does not prevent this; the cancelable flag only controls whether <strong>preventDefault()</strong> has any effect. The difference between stopPropagation and stopImmediatePropagation is that stopPropagation only prevents the event from moving to the next node, while stopImmediatePropagation also prevents any remaining listeners on the current node from being called.</p>
<p>Some events have an associated <strong>default</strong> behavior. For example, the <strong>doubleClick</strong> event has an associated default behavior that highlights the word under the mouse pointer at the time of the event. Your event listener can cancel this behavior by calling the <strong>preventDefault()</strong> method, provided the event is cancelable.</p>
<h3>Define your own Custom Event</h3>
<p>Now you understand the event flow. Next, you need to know how to create a custom event to carry additional info. This <a href="http://livedocs.adobe.com/flex/3/html/help.html?content=createevents_3.html">article</a> helps you to achieve this goal.</p>]]></description>
			<content:encoded><![CDATA[<h2>Event Model</h2>
<h3>Event Flow</h3>
<p>The idea is simple. Here is the regular event flow: a user interacts with the UI, an event is generated and broadcast via the event dispatcher (bubbling up the display hierarchy if enabled), any registered listeners capture it, and a set of actions is taken in response. To understand it in a bit more detail, you can check <a href="http://www.adnandoric.com/2008/12/29/understanding-the-flex-event-propagation/">this article</a> and play with its <a href="http://www.adnandoric.com/wp-content/uploads/2008/12/flex/EventPropagation/EventPropagation.html">demo</a>. In short, under the hood, it has 3 phases: <strong>capturing</strong>, <strong>targeting</strong> and <strong>bubbling</strong>. In the display list, from the top, it always starts from <strong>Stage</strong> as the root, then <strong>SystemManager</strong>, then your <strong>Application</strong>. The event is created by Flash Player, travels down from Stage to the target component, and then bubbles back up to Stage (if bubbling is enabled). During this round trip, it triggers the registered event listeners&#39; actions. Most components communicate with others using events, which convey useful information and data, but only visual components (objects in the display list) can participate in the event flow described above.</p>
<p>In practice, you normally just need to worry about registering a listener on the target component for a particular event dispatched from it. You seldom need to understand the event flow above in detail. However, if you want to exercise more control over the event, such as stopping it from propagating, you need to understand how it works first.</p>
<h3>Cancel/Stop the Event</h3>
<p>Within Flex, by default, an event you create does <strong>not</strong> bubble: it is delivered only to listeners registered on the component that dispatched it (typically added by its parent). If you want the event to reach its parent&#39;s parent (and all the way up your component chain in the display object hierarchy), then you tell that event to <strong>bubble</strong>. Any listener along the chain can halt the event via <strong>stopPropagation()</strong> or <strong>stopImmediatePropagation()</strong>. Note that making the event non-cancelable during event construction does not prevent this; the cancelable flag only controls whether <strong>preventDefault()</strong> has any effect. The difference between stopPropagation and stopImmediatePropagation is that stopPropagation only prevents the event from moving to the next node, while stopImmediatePropagation also prevents any remaining listeners on the current node from being called.</p>
<p>Some events have an associated <strong>default</strong> behavior. For example, the <strong>doubleClick</strong> event has an associated default behavior that highlights the word under the mouse pointer at the time of the event. Your event listener can cancel this behavior by calling the <strong>preventDefault()</strong> method, provided the event is cancelable.</p>
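<p>To make the three phases and the two stop methods concrete, here is a minimal sketch. It is written in Python rather than ActionScript and is only a simplified model of the flow, not the Flex API: the names Node, Event, dispatch, stop_propagation and stop_immediate_propagation are all hypothetical.</p>

```python
# Simplified Python model of the capture/target/bubble flow.
# This is NOT the Flex/ActionScript API; every name here is hypothetical.

class Event:
    def __init__(self, type, bubbles=False):
        self.type = type
        self.bubbles = bubbles
        self.stopped = False
        self.stopped_immediately = False

    def stop_propagation(self):
        # Remaining listeners on the current node still run; later nodes do not.
        self.stopped = True

    def stop_immediate_propagation(self):
        # Nothing else runs, not even other listeners on the current node.
        self.stopped = True
        self.stopped_immediately = True


class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.listeners = []  # (event_type, callback, use_capture)

    def add_listener(self, event_type, callback, use_capture=False):
        self.listeners.append((event_type, callback, use_capture))

    def _run(self, event, capture_phase):
        for event_type, callback, use_capture in list(self.listeners):
            if event_type == event.type and use_capture == capture_phase:
                callback(event, self)
                if event.stopped_immediately:
                    return


def dispatch(target, event):
    # Collect the ancestors, ordered from the root (Stage) down to the parent.
    ancestors = []
    node = target.parent
    while node is not None:
        ancestors.append(node)
        node = node.parent
    ancestors.reverse()

    # Phase 1: capturing, root first.
    for node in ancestors:
        node._run(event, capture_phase=True)
        if event.stopped:
            return
    # Phase 2: targeting.
    target._run(event, capture_phase=False)
    if event.stopped or not event.bubbles:
        return
    # Phase 3: bubbling, back up towards the root.
    for node in reversed(ancestors):
        node._run(event, capture_phase=False)
        if event.stopped:
            return


stage = Node("Stage")
app = Node("Application", parent=stage)
button = Node("Button", parent=app)

log = []
stage.add_listener("click", lambda e, n: log.append("capture:" + n.name), use_capture=True)
button.add_listener("click", lambda e, n: log.append("target:" + n.name))
app.add_listener("click", lambda e, n: log.append("bubble:" + n.name))
stage.add_listener("click", lambda e, n: log.append("bubble:" + n.name))

dispatch(button, Event("click", bubbles=True))
print(log)  # ['capture:Stage', 'target:Button', 'bubble:Application', 'bubble:Stage']
```

<p>If the Button listener called stop_propagation(), the two bubble entries would disappear from the log; with stop_immediate_propagation(), any further listeners on the Button itself would be skipped as well.</p>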
<h3>Define your own Custom Event</h3>
<p>Now you understand the event flow. Next, you need to know how to create a custom event to carry additional info. This <a href="http://livedocs.adobe.com/flex/3/html/help.html?content=createevents_3.html">article</a> helps you to achieve this goal.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/implement-your-idea/build-your-website/flex-hacking-series-part-1-event-model/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Postgresql &#8211; any easier way to create a sample database?</title>
		<link>http://www.solutionhacker.com/data-intelligence/data-store/postgresql-any-easier-way-to-create-a-sample-database/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/data-store/postgresql-any-easier-way-to-create-a-sample-database/#comments</comments>
		<pubDate>Wed, 04 Aug 2010 09:13:48 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Data Store]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=426</guid>
		<description><![CDATA[<h3><img align="left" height="100" src="http://www.solutionhacker.com/wp-content/uploads/postgresql_logo.png" style="margin-right: 10px;" width="100" /></h3>
<p>I wanted to test my new code against real production data. According to typical company policy, I should not run any of my non-production code against production data. Period. It makes a lot of sense. So, I decided to build a sample database in the test environment and load a <strong>subset</strong> of the production data into it. I just needed one table to play around with, and I didn&#39;t want to back up the whole table because the production table is too big for me. Using the popular <strong>PostgreSQL</strong>, I expected this kind of request to be fulfilled easily, but it eventually took more effort than I had thought. Why so complicated? Did I miss anything? Let me show you what I did, and please let me know if you have an easier way to achieve this.</p>
<p>]]></description>
			<content:encoded><![CDATA[
<h3><img align="left" height="100" src="http://www.solutionhacker.com/wp-content/uploads/postgresql_logo.png" style="margin-right: 10px;" width="100" /></h3>
<p>I wanted to test my new code against real production data. According to typical company policy, I should not run any of my non-production code against production data. Period. It makes a lot of sense. So, I decided to build a sample database in the test environment and load a <strong>subset</strong> of the production data into it. I just needed one table to play around with, and I didn&#39;t want to back up the whole table because the production table is too big for me. Using the popular <strong>PostgreSQL</strong>, I expected this kind of request to be fulfilled easily, but it eventually took more effort than I had thought. Why so complicated? Did I miss anything? Let me show you what I did, and please let me know if you have an easier way to achieve this.</p>
<p>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/data-store/postgresql-any-easier-way-to-create-a-sample-database/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Learning Hive</title>
		<link>http://www.solutionhacker.com/data-intelligence/collective-intelligence/learning-hive/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/collective-intelligence/learning-hive/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 12:08:05 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[Extract Intelligence]]></category>
		<category><![CDATA[System]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[query performance]]></category>
		<category><![CDATA[sql interface]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=292</guid>
		<description><![CDATA[<h3><strong>Starting to learn Hive</strong></h3>
<p>As I mentioned in my last article, I was getting excited about the potential of Hive. Today, I decided to start learning it. I found a great introductory video that gives you a nice warm-up on using Hive (a basic knowledge of how Hadoop and MapReduce work will help you digest the material).</p>
<p>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="400" height="300" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://vimeo.com/moogaloop.swf?clip_id=3591321&#38;server=vimeo.com&#38;show_title=1&#38;show_byline=1&#38;show_portrait=0&#38;color=&#38;fullscreen=1" /><embed type="application/x-shockwave-flash" width="400" height="300" src="http://vimeo.com/moogaloop.swf?clip_id=3591321&#38;server=vimeo.com&#38;show_title=1&#38;show_byline=1&#38;show_portrait=0&#38;color=&#38;fullscreen=1" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</p>
<h3><strong>Below are some highlights from this video</strong></h3>
<p>Hive is an SQL interface built on top of Hadoop. It supports Web access and JDBC. I am amazed at how close its syntax is to the regular SQL of an RDBMS. Below are some of the SQL statements used in this tutorial.</p>
<blockquote><p><strong>-- ---------- Set up your tables in Hive -----------------</strong><br />
 SHOW TABLES;</p>
<p>CREATE TABLE shakespeare (freq INT, word STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;</p>
<p>DESCRIBE shakespeare;</p>
<p><strong>-- ---------- Load data into the Hive table from Hadoop HDFS -------------------</strong><br />
 LOAD DATA INPATH "shakespeare_freq" INTO TABLE shakespeare;</p>
<p><strong>-- ---------- Query the data using the Hive SQL interface --------------</strong><br />
 select * from shakespeare limit 10;<br />
 select * from shakespeare where freq &gt; 100 sort by freq asc limit 10;<br />
 select freq, count(1) as f2 from shakespeare group by freq sort by f2 desc limit 10;</p>
<p>-- show me the plan<br />
 explain select freq, count(1) as f2 from shakespeare group by freq sort by f2 desc limit 10;</p>
<p><strong>-- ---------- Create a merged table and populate it by joining 2 different tables</strong><br />
 -- assumed schema, inferred from the query on merged; the kjv table is assumed to be created and loaded the same way as shakespeare<br />
 CREATE TABLE merged (word STRING, shake_f INT, kjv_f INT);<br />
 insert overwrite table merged select s.word, s.freq, k.freq from shakespeare s join kjv k on (s.word = k.word);</p>
<p><strong>-- ---------- Query the merged table ---------------------</strong><br />
 select word, shake_f, kjv_f, (shake_f+kjv_f) as ss from merged sort by ss limit 20;</p>
</blockquote>
<p>To prepare the data for Hive to load, the demo uses another MapReduce job. Remember to delete the job logs before loading the Hive table.</p>
<blockquote><p>hadoop jar $HADOOP_HOME/hadoop-*-examples.jar grep input shakespeare_freq '\w+'</p>
<p><strong>//remove the mapreduce job log</strong><br />
 hadoop fs -rmr shakespeare_freq/_logs</p>
</blockquote>
<p>Large-scale data processing systems are often I/O bound. In a MapReduce job, your mapper spends much of its time waiting for data to load from disk. Hadoop mitigates the problem by loading in parallel from lots of hard drives. However, a single hard drive still maxes out at about 75MB/s read, a physical limit we can do nothing about. To achieve good speed, the key is to minimize the number of Hadoop passes.</p>
<p>Since Hive sits on top of Hadoop's HDFS, it inherits the same restrictions. So, you cannot UPDATE, DELETE or INSERT individual records as in a regular RDBMS. However, you can bulk load new files (data) into a table, and you can delete a file from Hive.</p>
<p>Hive needs to store the metadata of its tables outside of HDFS. You can use a regular RDBMS for this job. But when you start Hive locally, it looks for a local metastore. So, in a distributed environment, you may need to centralize the metastore in a remote location. There is a wiki page on the Hive site that documents how to set it up.</p>
<p><h3>See Hive in Action</h3>
<p><object width="400" height="300"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=3598672&#38;server=vimeo.com&#38;show_title=1&#38;show_byline=1&#38;show_portrait=0&#38;color=&#38;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=3598672&#38;server=vimeo.com&#38;show_title=1&#38;show_byline=1&#38;show_portrait=0&#38;color=&#38;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="300"></embed></object>
<p><a href="http://vimeo.com/3598672">Cloudera Hadoop Training: Hive Tutorial Screencast</a> from <a href="http://vimeo.com/cloudera">Cloudera</a> on <a href="http://vimeo.com">Vimeo</a>.</p></p>
<h3>Other projects similar to Hadoop</h3>
<ul>
<li>Parallel databases: Gamma, Bubba, Volcano</li>
<li>Google: Sawzall</li>
<li>Yahoo: Pig</li>
<li>IBM Research: JAQL</li>
<li>Microsoft: DryadLINQ, SCOPE</li>
<li>Greenplum: YAML MapReduce</li>
<li>Aster Data: In-database MapReduce</li>
<li>Business.com: CloudBase</li>
</ul>
]]></description>
			<content:encoded><![CDATA[<h3><strong>Starting to learn Hive</strong></h3>
<p>As I mentioned in my last article, I was getting excited about the potential of Hive. Today, I decided to start learning it. I found a great introductory video that gives you a nice warm-up on using Hive (a basic knowledge of how Hadoop and MapReduce work will help you digest the material).</p>
<p>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="400" height="300" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://vimeo.com/moogaloop.swf?clip_id=3591321&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" /><embed type="application/x-shockwave-flash" width="400" height="300" src="http://vimeo.com/moogaloop.swf?clip_id=3591321&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</p>
<h3><strong>Below are some highlights from this video</strong></h3>
<p>Hive is an SQL interface built on top of Hadoop. It supports Web access and JDBC. I am amazed at how close its syntax is to the regular SQL of an RDBMS. Below are some of the SQL statements used in this tutorial.</p>
<blockquote><p><strong>-- ---------- Set up your tables in Hive -----------------</strong><br />
 SHOW TABLES;</p>
<p>CREATE TABLE shakespeare (freq INT, word STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;</p>
<p>DESCRIBE shakespeare;</p>
<p><strong>-- ---------- Load data into the Hive table from Hadoop HDFS -------------------</strong><br />
 LOAD DATA INPATH "shakespeare_freq" INTO TABLE shakespeare;</p>
<p><strong>-- ---------- Query the data using the Hive SQL interface --------------</strong><br />
 select * from shakespeare limit 10;<br />
 select * from shakespeare where freq &gt; 100 sort by freq asc limit 10;<br />
 select freq, count(1) as f2 from shakespeare group by freq sort by f2 desc limit 10;</p>
<p>-- show me the plan<br />
 explain select freq, count(1) as f2 from shakespeare group by freq sort by f2 desc limit 10;</p>
<p><strong>-- ---------- Create a merged table and populate it by joining 2 different tables</strong><br />
 -- assumed schema, inferred from the query on merged; the kjv table is assumed to be created and loaded the same way as shakespeare<br />
 CREATE TABLE merged (word STRING, shake_f INT, kjv_f INT);<br />
 insert overwrite table merged select s.word, s.freq, k.freq from shakespeare s join kjv k on (s.word = k.word);</p>
<p><strong>-- ---------- Query the merged table ---------------------</strong><br />
 select word, shake_f, kjv_f, (shake_f+kjv_f) as ss from merged sort by ss limit 20;</p>
</blockquote>
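<p>If the group-by query is hard to picture, here is a rough Python model of what it computes. The rows are made up for illustration and this is not how Hive executes the query (Hive compiles it into MapReduce jobs); it only shows the result you should expect.</p>

```python
from collections import Counter

# Made-up (freq, word) rows standing in for the shakespeare table.
rows = [(1, "zounds"), (1, "alas"), (2, "thee"), (2, "thou"), (2, "hath"), (5, "the")]

# select freq, count(1) as f2 from shakespeare group by freq ...
counts = Counter(freq for freq, _word in rows)

# ... sort by f2 desc limit 10 (ties broken by freq to keep the output stable)
top = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:10]
print(top)  # [(2, 3), (1, 2), (5, 1)]
```

<p>One caveat, as far as I understand Hive: sort by only orders rows within each reducer, so the result is globally sorted only when a single reducer is used, while order by asks for a total order.</p>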
<p>To prepare the data for Hive to load, the demo uses another MapReduce job. Remember to delete the job logs before loading the Hive table.</p>
<blockquote><p>hadoop jar $HADOOP_HOME/hadoop-*-examples.jar grep input shakespeare_freq '\w+'</p>
<p><strong>//remove the mapreduce job log</strong><br />
 hadoop fs -rmr shakespeare_freq/_logs</p>
</blockquote>
<p>Large-scale data processing systems are often I/O bound. In a MapReduce job, your mapper spends much of its time waiting for data to load from disk. Hadoop mitigates the problem by loading in parallel from lots of hard drives. However, a single hard drive still maxes out at about 75MB/s read, a physical limit we can do nothing about. To achieve good speed, the key is to minimize the number of Hadoop passes.</p>
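<p>A quick back-of-the-envelope calculation shows why both the number of drives and the number of passes matter. This is an idealized sketch: the 1 TB dataset size and the drive counts are my own assumptions, and it ignores network, CPU and scheduling overhead.</p>

```python
DISK_READ_MB_PER_S = 75  # per-disk read rate quoted above

def scan_seconds(dataset_mb, disks, passes=1):
    """Idealized time to read the dataset `passes` times, spread evenly over `disks`."""
    return passes * dataset_mb / (DISK_READ_MB_PER_S * disks)

one_tb_mb = 1_000_000  # 1 TB in decimal MB, an assumed dataset size

print(scan_seconds(one_tb_mb, disks=1))              # about 13333 s, roughly 3.7 hours
print(scan_seconds(one_tb_mb, disks=100))            # about 133 s
print(scan_seconds(one_tb_mb, disks=100, passes=3))  # 400.0
```

<p>Adding drives divides the scan time, but every extra pass over the data multiplies it again, which is why cutting passes pays off.</p>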
<p>Since Hive sits on top of Hadoop&#8217;s HDFS, it inherits the same restrictions. So, you cannot UPDATE, DELETE or INSERT individual records as in a regular RDBMS. However, you can bulk load new files (data) into a table, and you can delete a file from Hive.</p>
<p>Hive needs to store the metadata of its tables outside of HDFS. You can use a regular RDBMS for this job. But when you start Hive locally, it looks for a local metastore. So, in a distributed environment, you may need to centralize the metastore in a remote location. There is a wiki page on the Hive site that documents how to set it up.</p>
<p>
<h3>See Hive in Action</h3>
<p><object width="400" height="300"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=3598672&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=3598672&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="300"></embed></object></p>
<p><a href="http://vimeo.com/3598672">Cloudera Hadoop Training: Hive Tutorial Screencast</a> from <a href="http://vimeo.com/cloudera">Cloudera</a> on <a href="http://vimeo.com">Vimeo</a>.</p>
</p>
<h3>Other projects similar to Hadoop</h3>
<ul>
<li>Parallel databases: Gamma, Bubba, Volcano</li>
<li>Google: Sawzall</li>
<li>Yahoo: Pig</li>
<li>IBM Research: JAQL</li>
<li>Microsoft: DryadLINQ, SCOPE</li>
<li>Greenplum: YAML MapReduce</li>
<li>Aster Data: In-database MapReduce</li>
<li>Business.com: CloudBase</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/collective-intelligence/learning-hive/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hive on Amazon EC2 cloud</title>
		<link>http://www.solutionhacker.com/implement-your-idea/unleash-your-system/hive-on-amazon-ec2-cloud/</link>
		<comments>http://www.solutionhacker.com/implement-your-idea/unleash-your-system/hive-on-amazon-ec2-cloud/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 11:00:44 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Data Store]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[System]]></category>
		<category><![CDATA[ad serving]]></category>
		<category><![CDATA[amazon ec2]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[column-based]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[infobright]]></category>
		<category><![CDATA[lucid db]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[shared nothing]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=247</guid>
		<description><![CDATA[<p style="text-align: left;"><a title="adserving-ec2-hive-system-arch" rel="lightbox[pics247]" href="http://www.solutionhacker.com/wp-content/uploads/adserving-ec2-hive-system-arch.png"><img class="alignleft" style="border: 10px solid white;" src="http://www.solutionhacker.com/wp-content/uploads/adserving-ec2-hive-system-arch-150x150.png" alt="adserving-ec2-hive-system-arch" width="150" height="150" /></a></p>

<p style="text-align: left;"> </p>

<p style="text-align: left;">I once worked for a display ad network company that collected over 400 million impression/click logs per day. To cope with this amount of data, my ex-company bought a supercomputer and crossed its fingers that it could handle the growth in both the volume of the data and the analytic demand on it. It is obviously not a scalable solution. However, what is the best solution?</p>

<p style="text-align: left;">Although I no longer work for this company, it is still an interesting problem to solve. I have a great friend who proposed a shared-nothing solution for this company. The solution is to partition the data across a set of PostgreSQL databases and put Greenplum on top of them to parallelize the query. There is no disk-level sharing or contention to be concerned with (i.e. it is a 'shared-nothing' architecture). I like this approach. The only thing is that Greenplum is not free, and the upfront cost may be difficult for a startup to bear. Apart from that, this setup requires all the databases to run on the same network, which prevented us from moving it to an elastic cloud like Amazon EC2.</p>

<p>Later on, I joined a great company in the same industry that was looking for a cloud solution to host its data warehouse. So, I got a chance to revisit this problem. During the research, I came across an interesting technology: the column-based database (e.g. Infobright and LucidDB). The idea is this: a traditional database stores data in rows and fetches whole rows from the data files into memory, which is inefficient if your query only needs a few columns for its computation. A column-based data store instead stores your data column by column, and it can compress each column effectively because all the values in it have the same data type. This solution is great but it doesn't do MPP (i.e. massively parallel processing) and it is also not ready for the cloud yet.</p>

<p>Here comes another solution: Hive on top of Hadoop on top of the Amazon cloud. It is an interesting idea. Check out the videos below to learn more about it.</p>


<table border="0">
<tbody>
<tr>
<td>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="326" height="264" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://www.youtube.com/v/Y3UXDtDR9bg&#38;color1=0xb1b1b1&#38;color2=0xcfcfcf&#38;feature=player_embedded&#38;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="326" height="264" src="http://www.youtube.com/v/Y3UXDtDR9bg&#38;color1=0xb1b1b1&#38;color2=0xcfcfcf&#38;feature=player_embedded&#38;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</td>
<td><p><br class="spacer_" /></p></td>
<td><p>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="324" height="264" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://www.youtube.com/v/1hDhpVmeSGI&#38;color1=0xb1b1b1&#38;color2=0xcfcfcf&#38;feature=player_embedded&#38;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="324" height="264" src="http://www.youtube.com/v/1hDhpVmeSGI&#38;color1=0xb1b1b1&#38;color2=0xcfcfcf&#38;feature=player_embedded&#38;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</p></td>
</tr>
</tbody>
</table>


<p>If you are not sure what Hadoop is and want a warm-up in massive-scale computing, I suggest you go through the following 5 excellent Google lectures.</p>


<ul>
		<li><a href="http://www.youtube.com/watch?v=yjPBkvYh-ss&#38;feature=channel">Cluster Computing and MapReduce - Lecture 1</a></li>
		<li><a href="http://www.youtube.com/watch?v=-vD6PUdf3Js">Cluster Computing and MapReduce - Lecture 2</a></li>
		<li><a href="http://www.youtube.com/watch?v=5Eib_H_zCEY&#38;feature=related">Cluster Computing and MapReduce - Lecture 3</a></li>
		<li><a href="http://www.youtube.com/watch?v=1ZDybXl212Q">Cluster Computing and MapReduce - Lecture 4</a></li>
		<li><a href="http://www.youtube.com/watch?v=BT-piFBP4fE">Cluster Computing and MapReduce - Lecture 5</a></li>
</ul>


<p><br class="spacer_" /></p>]]></description>
			<content:encoded><![CDATA[<p style="text-align: left;"><a title="adserving-ec2-hive-system-arch" rel="lightbox[pics247]" href="http://www.solutionhacker.com/wp-content/uploads/adserving-ec2-hive-system-arch.png"><img class="alignleft" style="border: 10px solid white;" src="http://www.solutionhacker.com/wp-content/uploads/adserving-ec2-hive-system-arch-150x150.png" alt="adserving-ec2-hive-system-arch" width="150" height="150" /></a></p>
<p style="text-align: left;"> </p>
<p style="text-align: left;">I once worked for a display ad network company that collected over 400 million impression/click logs per day. To cope with this amount of data, my ex-company bought a supercomputer and crossed its fingers that it could handle the growth in both the volume of the data and the analytic demand on it. It is obviously not a scalable solution. However, what is the best solution?</p>
<p style="text-align: left;">Although I no longer work for this company, it is still an interesting problem to solve. I have a great friend who proposed a shared-nothing solution for this company. The solution is to partition the data across a set of PostgreSQL databases and put Greenplum on top of them to parallelize the query. There is no disk-level sharing or contention to be concerned with (i.e. it is a &#8216;shared-nothing&#8217; architecture). I like this approach. The only thing is that Greenplum is not free, and the upfront cost may be difficult for a startup to bear. Apart from that, this setup requires all the databases to run on the same network, which prevented us from moving it to an elastic cloud like Amazon EC2.</p>
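<p>To illustrate the shared-nothing idea, here is a minimal scatter/gather sketch in Python. It is not how Greenplum is implemented; the node count, the rows and the helper names (node_for, local_clicks) are all made up for illustration, and the per-node work runs in a plain loop here instead of in parallel.</p>

```python
from collections import Counter
import zlib

NUM_NODES = 4  # assumed cluster size

def node_for(key):
    # Stable hash so the same key always lands on the same node.
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES

# Made-up log rows: (ad_id, clicks).
rows = [("ad-1", 1), ("ad-2", 0), ("ad-1", 1), ("ad-3", 1), ("ad-2", 1)]

# Scatter: each node owns a disjoint slice of the data.
partitions = [[] for _ in range(NUM_NODES)]
for ad_id, clicks in rows:
    partitions[node_for(ad_id)].append((ad_id, clicks))

def local_clicks(partition):
    # Runs independently on every node, touching only its own rows.
    totals = Counter()
    for ad_id, clicks in partition:
        totals[ad_id] += clicks
    return totals

# Gather: the coordinator merges the per-node partial results.
merged = Counter()
for partition in partitions:
    merged.update(local_clicks(partition))

print(dict(merged))
```

<p>Because every ad_id lives on exactly one node, no node ever needs another node's disk to answer its part of the query, and only the small partial totals travel to the coordinator.</p>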
<p>Later on, I joined a great company in the same industry that was looking for a cloud solution to host its data warehouse. So, I got a chance to revisit this problem. During the research, I came across an interesting technology: the column-based database (e.g. Infobright and LucidDB). The idea is this: a traditional database stores data in rows and fetches whole rows from the data files into memory, which is inefficient if your query only needs a few columns for its computation. A column-based data store instead stores your data column by column, and it can compress each column effectively because all the values in it have the same data type. This solution is great but it doesn&#8217;t do MPP (i.e. massively parallel processing) and it is also not ready for the cloud yet.</p>
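<p>Here is a tiny Python sketch of the row versus column layout. The rows are made up, and run-length encoding is just one simple compression scheme I picked for illustration; I am not claiming it is what Infobright or LucidDB actually use.</p>

```python
from itertools import groupby

# Made-up rows: (country, device, clicks).
rows = [("US", "web", 3), ("US", "web", 1), ("US", "mobile", 2), ("CA", "mobile", 5)]

# Row layout: summing one field still walks every full row.
total_from_rows = sum(row[2] for row in rows)

# Column layout: one list per column; the query reads only the column it needs.
columns = {
    "country": [r[0] for r in rows],
    "device": [r[1] for r in rows],
    "clicks": [r[2] for r in rows],
}
total_from_column = sum(columns["clicks"])

def run_length_encode(values):
    # Same-typed, repetitive column values compress well as (value, run length) pairs.
    return [(value, len(list(run))) for value, run in groupby(values)]

print(total_from_rows, total_from_column)     # 11 11
print(run_length_encode(columns["country"]))  # [('US', 3), ('CA', 1)]
```

<p>Both layouts give the same answer; the difference is how much data has to be read from disk to get it, and how well each column compresses once its values sit next to each other.</p>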
<p>That brings us to another solution: Hive on top of Hadoop, running in the Amazon cloud. It is an interesting idea. Check out these videos to learn more.</p>
<table border="0">
<tbody>
<tr>
<td>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="326" height="264" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://www.youtube.com/v/Y3UXDtDR9bg&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;feature=player_embedded&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="326" height="264" src="http://www.youtube.com/v/Y3UXDtDR9bg&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;feature=player_embedded&amp;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</td>
<td>
<p><br class="spacer_" /></p>
</td>
<td>
<p>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="324" height="264" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://www.youtube.com/v/1hDhpVmeSGI&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;feature=player_embedded&amp;fs=1" /><param name="allowfullscreen" value="true" /><embed type="application/x-shockwave-flash" width="324" height="264" src="http://www.youtube.com/v/1hDhpVmeSGI&amp;color1=0xb1b1b1&amp;color2=0xcfcfcf&amp;feature=player_embedded&amp;fs=1" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</p>
</td>
</tr>
</tbody>
</table>
<p>If you are not sure what Hadoop is and want a warm-up in large-scale computing, I suggest you go through the following five excellent Google lectures.</p>
<ul>
<li><a href="http://www.youtube.com/watch?v=yjPBkvYh-ss&amp;feature=channel">Cluster Computing and MapReduce &#8211; Lecture 1</a></li>
<li><a href="http://www.youtube.com/watch?v=-vD6PUdf3Js">Cluster Computing and MapReduce &#8211; Lecture 2</a></li>
<li><a href="http://www.youtube.com/watch?v=5Eib_H_zCEY&amp;feature=related">Cluster Computing and MapReduce &#8211; Lecture 3</a></li>
<li><a href="http://www.youtube.com/watch?v=1ZDybXl212Q">Cluster Computing and MapReduce &#8211; Lecture 4</a></li>
<li><a href="http://www.youtube.com/watch?v=BT-piFBP4fE">Cluster Computing and MapReduce &#8211; Lecture 5</a></li>
</ul>
<p><br class="spacer_" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/implement-your-idea/unleash-your-system/hive-on-amazon-ec2-cloud/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to build data warehouse</title>
		<link>http://www.solutionhacker.com/data-intelligence/collective-intelligence/how-to-build-data-warehouse/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/collective-intelligence/how-to-build-data-warehouse/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 03:13:30 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Extract Intelligence]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[denormailzed schema]]></category>
		<category><![CDATA[dimensional modeling]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=421</guid>
		<description><![CDATA[<p><strong>Operational databases</strong> are most commonly designed using <strong><em>normalized modeling</em></strong>, often using <strong><em>third-normal form</em></strong> or <strong><em>entity-relationship modeling</em></strong>. Normalized database schemas are tuned to support <em>fast updates and inserts</em> by minimizing the number of rows that must be changed when recording new data. <strong>Example: Order-Management Schema for operational database</strong></p>
<p><a href="http://www.solutionhacker.com/wp-content/uploads/2007/06/relatonalmodel.JPG" title="relatonalmodel.JPG"><img alt="relatonalmodel.JPG" src="http://www.solutionhacker.com/wp-content/uploads/2007/06/relatonalmodel.JPG" /></a></p>
<p><strong>Data warehouses</strong> differ from operational databases in the way they are designed; they are optimized for efficient querying and not for updating. Data warehouses provide a read-only version of the data in the operational databases, which is optimized for querying. The kind of modeling most commonly used in warehouse design is called <em><strong>dimensional modeling</strong></em>, and the schemas produced are known as <em><strong>star schemas</strong></em>. In dimensional modeling, a database is organized around a small number of <em><strong>fact tables</strong></em>. Each row in a fact table is a single measurable event: a single sale, a single hit to a web page, etc. <strong>Example: Order-Management Dimension Schema</strong></p>
<p><a href="http://www.solutionhacker.com/wp-content/uploads/2007/06/dimensionmodeling.JPG" title="dimensionmodeling.JPG"><img alt="dimensionmodeling.JPG" src="http://www.solutionhacker.com/wp-content/uploads/2007/06/dimensionmodeling.JPG" /></a></p>
<p>The key benefits of a data warehouse are <strong>simplification</strong> and <strong>consolidation</strong> of data. It normally gathers data from different operational databases into a single dimensional model for reporting and analysis. Dimensional modeling, in turn, offers a chance to reduce the level of complexity in your database. By collapsing complex chains of tables into dimension tables, the schema becomes smaller and performance tends to improve. We take two approaches to reduce the complexity: (1) we try to model <em>one aspect</em> of the system in each DM schema, and (2) we <em>denormalize</em> the schema to reduce the number of joins.</p>
<p><strong>ETL Process</strong></p>
<p>Once you have a data schema for your warehouse, you&#39;ll need to fill it with data. This process is known as <em>extract, transform, and load</em>, or <em>ETL</em> for short. The first step, extraction, is simply the process of <em>selecting all the data of interest</em> from the operational database. Then the data must be transformed into the format needed by the warehouse. This could be as simple as <em>renaming some of the fields</em> or as complex as <em>cleaning dirty data and computing new fields</em>. Finally, the data must be loaded into the data warehouse. There are some areas you need to pay attention to when you perform the ETL:</p>
<ol>
	<li>During <strong>extraction</strong>, you will put a lot of strain on the operational database. To deal with this problem, we can replicate a low-cost copy of the operational database onto the warehouse machine and extract from that copy. The output of the extraction process can be a CSV file.</li>
	<li><strong>Transformation</strong> can mean computing summary data or converting postal codes into geocodes (i.e. latitude and longitude) that power &#34;within X miles&#34; queries. You can use Perl to do this job. The output of the transformation may be another CSV file.</li>
	<li>Finally, you <strong>load</strong> the data from the CSV file into the dimensional model. To speed up the load in MySQL, we first <strong>disable indexes</strong> with <font color="#003366" face="Courier New">ALTER TABLE foo DISABLE KEYS</font>, and after the load we re-enable them with <font color="#003366" face="Courier New">ALTER TABLE foo ENABLE KEYS</font>. Each table needs to be cleared before loading with the <font color="#003366" face="Courier New">TRUNCATE</font> command.</li>
	<li>You may be wondering what happens to clients using the warehouse while an ETL process is running. In our case, nothing at all! This magic is achieved by actually having two warehouse databases, one in use and the other free for loading. All the data goes into the loading database, and when it&#39;s fully loaded we swap it into place with <font color="#003366" face="Courier New">RENAME</font>. This produces an <strong>atomic switch</strong> of all tables in the loading database with the tables in the live database. The rename waits for any running queries in the warehouse to finish before performing the swap, which is exactly what we want.</li>
</ol>
<p><strong>Quick Tips</strong></p>
<ol>
	<li>CSV isn&#39;t a strictly standardized format. Using XML can solve character issues, but it might not perform as well due to formatting overhead.</li>
	<li>A transform step is not always needed. If it isn&#39;t, use &#34;INSERT ... SELECT&#34; (MySQL does not support &#34;SELECT ... INTO TABLE&#34;) to get a straight database-to-database extract-and-load.</li>
	<li>Incremental loading is highly desirable. Triggers on the operational tables are one way to achieve it.</li>
	<li>Our operational database uses MySQL&#39;s InnoDB storage engine, which provides referential integrity and transactions. However, we chose MySQL&#39;s MyISAM engine for our warehouse for better performance, since the warehouse is read-only and does not need transactions.</li>
	<li>MySQL does not support bitmap indexes, which are ideal for the kind of low-cardinality data that is commonly used in data warehouses. PostgreSQL has no on-disk bitmap indexes either, but as of version 8.1 it can build bitmaps in memory at query time to combine several indexes (bitmap index scans). A number of commercial database systems offer true bitmap indexes.</li>
</ol>
]]></description>
			<content:encoded><![CDATA[<p><strong>Operational databases</strong> are most commonly designed using <strong><em>normalized modeling</em></strong>, often using <strong><em>third-normal form</em></strong> or <strong><em>entity-relationship modeling</em></strong>. Normalized database schemas are tuned to support <em>fast updates and inserts</em> by minimizing the number of rows that must be changed when recording new data. <strong>Example: Order-Management Schema for operational database</strong></p>
<p><a href="http://www.solutionhacker.com/wp-content/uploads/2007/06/relatonalmodel.JPG" title="relatonalmodel.JPG"><img alt="relatonalmodel.JPG" src="http://www.solutionhacker.com/wp-content/uploads/2007/06/relatonalmodel.JPG" /></a></p>
<p><strong>Data warehouses</strong> differ from operational databases in the way they are designed; they are optimized for efficient querying and not for updating. Data warehouses provide a read-only version of the data in the operational databases, which is optimized for querying. The kind of modeling most commonly used in warehouse design is called <em><strong>dimensional modeling</strong></em>, and the schemas produced are known as <em><strong>star schemas</strong></em>. In dimensional modeling, a database is organized around a small number of <em><strong>fact tables</strong></em>. Each row in a fact table is a single measurable event: a single sale, a single hit to a web page, etc. <strong>Example: Order-Management Dimension Schema</strong></p>
<p><a href="http://www.solutionhacker.com/wp-content/uploads/2007/06/dimensionmodeling.JPG" title="dimensionmodeling.JPG"><img alt="dimensionmodeling.JPG" src="http://www.solutionhacker.com/wp-content/uploads/2007/06/dimensionmodeling.JPG" /></a></p>
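<p>As a rough illustration of a star schema, the sketch below builds one fact table and two dimension tables and runs a typical reporting query. It uses Python&#39;s built-in sqlite3 module only so that it can run anywhere; the table names, column names and sample rows are hypothetical and are not taken from the diagram above.</p>
<pre name="code" class="python">
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the context of each event.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    -- The fact table holds one row per measurable event (here: one order line).
    CREATE TABLE fact_order (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        quantity INTEGER,
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO dim_date VALUES (20090801, 2009, 8), (20090901, 2009, 9);
    INSERT INTO fact_order VALUES
        (1, 20090801, 2, 20.0),
        (1, 20090901, 1, 10.0),
        (2, 20090801, 3, 45.0);
""")

# A typical warehouse query: one join per dimension, then aggregate the facts.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_order AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    WHERE d.year = 2009 AND d.month = 8
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
</pre>
<p>Every dimension is exactly one join away from the fact table, which is what keeps such queries short and predictable.</p>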
<p>The key benefits of a data warehouse are <strong>simplification</strong> and <strong>consolidation</strong> of data. It normally gathers data from different operational databases into a single dimensional model for reporting and analysis. Dimensional modeling, in turn, offers a chance to reduce the level of complexity in your database. By collapsing complex chains of tables into dimension tables, the schema becomes smaller and performance tends to improve. We take two approaches to reduce the complexity: (1) we try to model <em>one aspect</em> of the system in each DM schema, and (2) we <em>denormalize</em> the schema to reduce the number of joins.</p>
<p><strong>ETL Process</strong></p>
<p>Once you have a data schema for your warehouse, you&#39;ll need to fill it with data. This process is known as <em>extract, transform, and load</em>, or <em>ETL</em> for short. The first step, extraction, is simply the process of <em>selecting all the data of interest</em> from the operational database. Then the data must be transformed into the format needed by the warehouse. This could be as simple as <em>renaming some of the fields</em> or as complex as <em>cleaning dirty data and computing new fields</em>. Finally, the data must be loaded into the data warehouse. There are some areas you need to pay attention to when you perform the ETL:</p>
<ol>
<li>During <strong>extraction</strong>, you will put a lot of strain on the operational database. To deal with this problem, we can replicate a low-cost copy of the operational database onto the warehouse machine and extract from that copy. The output of the extraction process can be a CSV file.</li>
<li><strong>Transformation</strong> can mean computing summary data or converting postal codes into geocodes (i.e. latitude and longitude) that power &quot;within X miles&quot; queries. You can use Perl to do this job. The output of the transformation may be another CSV file.</li>
<li>Finally, you <strong>load</strong> the data from the CSV file into the dimensional model. To speed up the load in MySQL, we first <strong>disable indexes</strong> with <font color="#003366" face="Courier New">ALTER TABLE foo DISABLE KEYS</font>, and after the load we re-enable them with <font color="#003366" face="Courier New">ALTER TABLE foo ENABLE KEYS</font>. Each table needs to be cleared before loading with the <font color="#003366" face="Courier New">TRUNCATE</font> command.</li>
<li>You may be wondering what happens to clients using the warehouse while an ETL process is running. In our case, nothing at all! This magic is achieved by actually having two warehouse databases, one in use and the other free for loading. All the data goes into the loading database, and when it&#39;s fully loaded we swap it into place with <font color="#003366" face="Courier New">RENAME</font>. This produces an <strong>atomic switch</strong> of all tables in the loading database with the tables in the live database. The rename waits for any running queries in the warehouse to finish before performing the swap, which is exactly what we want.</li>
</ol>
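<p>The four steps above can be sketched end to end as follows. This is only an illustration under loud assumptions: it is written in Python rather than Perl, it uses the built-in sqlite3 module as a stand-in for MySQL, and the CSV columns, the geocode lookup and the table names are all hypothetical. In MySQL the swap would be a single RENAME TABLE statement listing all the tables; here three renames inside one transaction play that role.</p>
<pre name="code" class="python">
import csv
import io
import sqlite3

# Extract: in practice this would be the CSV dump taken from the operational copy.
extracted = io.StringIO("order_id,postal_code,amount\n1,94105,20.00\n2,10001,45.50\n")

# Hypothetical lookup table standing in for a real geocoding source.
GEOCODES = {"94105": (37.79, -122.39), "10001": (40.75, -73.99)}


def transform(reader):
    """Turn raw CSV rows into typed rows with latitude and longitude added."""
    for row in reader:
        lat, lon = GEOCODES[row["postal_code"]]
        yield int(row["order_id"]), lat, lon, float(row["amount"])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_order (order_id INTEGER, lat REAL, lon REAL, amount REAL)")
conn.execute("INSERT INTO fact_order VALUES (0, 0.0, 0.0, 0.0)")  # stale data from the last run
conn.execute("CREATE TABLE fact_order_loading (order_id INTEGER, lat REAL, lon REAL, amount REAL)")

# Load: fill the spare table while readers keep using fact_order.
conn.executemany(
    "INSERT INTO fact_order_loading VALUES (?, ?, ?, ?)",
    transform(csv.DictReader(extracted)),
)
conn.commit()

# Swap: rename the tables inside one transaction so readers never see a half-loaded table.
with conn:
    conn.execute("BEGIN")
    conn.execute("ALTER TABLE fact_order RENAME TO fact_order_old")
    conn.execute("ALTER TABLE fact_order_loading RENAME TO fact_order")
    conn.execute("ALTER TABLE fact_order_old RENAME TO fact_order_loading")

live_rows = conn.execute("SELECT order_id, amount FROM fact_order ORDER BY order_id").fetchall()
</pre>
<p>After the swap, the old live table becomes the loading table for the next run, which is why it has to be cleared before the next load.</p>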
<p><strong>Quick Tips</strong></p>
<ol>
<li>CSV isn&#39;t a strictly standardized format. Using XML can solve character issues, but it might not perform as well due to formatting overhead.</li>
<li>A transform step is not always needed. If it isn&#39;t, use &quot;INSERT &#8230; SELECT&quot; (MySQL does not support &quot;SELECT &#8230; INTO TABLE&quot;) to get a straight database-to-database extract-and-load.</li>
<li>Incremental loading is highly desirable. Triggers on the operational tables are one way to achieve it.</li>
<li>Our operational database uses MySQL&#39;s InnoDB storage engine, which provides referential integrity and transactions. However, we chose MySQL&#39;s MyISAM engine for our warehouse for better performance, since the warehouse is read-only and does not need transactions.</li>
<li>MySQL does not support bitmap indexes, which are ideal for the kind of low-cardinality data that is commonly used in data warehouses. PostgreSQL has no on-disk bitmap indexes either, but as of version 8.1 it can build bitmaps in memory at query time to combine several indexes (bitmap index scans). A number of commercial database systems offer true bitmap indexes.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/collective-intelligence/how-to-build-data-warehouse/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Flex Startup Sequence</title>
		<link>http://www.solutionhacker.com/data-intelligence/report/flex-startup-sequence/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/report/flex-startup-sequence/#comments</comments>
		<pubDate>Mon, 16 Mar 2009 06:56:50 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[Report]]></category>
		<category><![CDATA[dynamic loading theme]]></category>
		<category><![CDATA[flex]]></category>
		<category><![CDATA[preloader]]></category>
		<category><![CDATA[startup events]]></category>
		<category><![CDATA[SystemManager]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=217</guid>
		<description><![CDATA[<h2>Magic behind the scenes</h2>
<p>I have always wondered how my Flex application ends up displayed in the Flash Player in the browser. Why does decompiling a Flex SWF give me a <strong>2-frame movie</strong>? What is the <strong>SystemManager</strong> and how can I get a handle on it? Many questions of this kind sit at a lower level, the level that makes a Flex application possible. Normally, we don't need to dive into this to write a Flex application. However, I would feel more comfortable understanding it before I advocate Flex as the main part of our company's UI strategy.&#160;</p>
<p>To understand how your Flex application gets loaded and displayed in the Flash Player, you can read this great <a href="http://iamdeepa.com/blog/?p=11">article</a>. I summarize the sequence in steps below:</p>
<h2>Flex is a 2-frame movie</h2>
<p>The <strong>first frame</strong> of a Flex SWF contains the <strong>SystemManager</strong>, the <strong>Preloader</strong>, the <strong>DownloadProgressBar </strong>and some glue helper classes. Remember, the Preloader is what creates the DownloadProgressBar control which displays the progress of a Flex application downloading and being initialized. The <strong>second frame</strong> of a Flex SWF contains the rest of the Flex framework code, your application code and all of your application assets like embedded fonts, images, etc.</p>
<p><!--more--><br />
By creating a 2-frame movie, Flex applications can take advantage of the <strong>streaming </strong>support built into the Flash Player and a preloader can appear before all of the Flex framework code and your application code are downloaded.</p>
<blockquote>
<p>The .swf format is a progressive download format which means that Flash player can access content on frames as they download without having to wait for the entire file to download.</p>
</blockquote>
<p><u>Here are the steps</u>:</p>
<ol>
    <li>First, enough bytes for <strong>frame 1</strong> are streamed down to the Flash Player.</li>
    <li>The Flash Player executes those bytes by creating a <strong>SystemManager </strong>instance.</li>
    <li>The SystemManager instructs the Flash Player to stop at the end of frame 1.</li>
    <li>SystemManager then goes on to create the <strong>Preloader </strong>which creates the <strong>DownloadProgressBar </strong>control and pops that up on the client screen.</li>
    <li>The Preloader then starts tracking the rest of the bytes streaming in from the Flex SWF.&#160;</li>
    <li>Once all the bytes for the Flex framework and application code are in, the SystemManager goes on to frame 2 and instantiates the <strong>Application </strong>instance.</li>
    <li>Once the Application instance has been created, the SystemManager sets <strong>Application.systemManager</strong> to itself. This is how you, the application developer, can access the SystemManager at a later time.</li>
    <li>The Application dispatches the <strong>preinitialize event</strong> at the beginning of the initialization process.</li>
    <li><strong>The Application goes on to create its children. </strong>The method createChildren() is called on the application. At this point each of the application’s components is constructed, and each component’s createChildren() is called as well. For details, see the <strong>component lifecycle section</strong>.</li>
    <li>The Application dispatches the <strong>initialize event</strong>, which indicates that all of the application’s components have been initialized. However, at this stage, the components are not yet laid out.</li>
    <li>Eventually, once all the Application child controls and containers have been created, sized and positioned, the Application dispatches the <strong>creationComplete </strong>event.</li>
    <li>Once the creationComplete event has been dispatched, the Preloader removes the DownloadProgressBar control and the SystemManager <strong>adds the Application instance to the Flash Player display list</strong>. (The Flash Player display list is basically the tree of visible or potentially visible objects that make up your application. When you add child components to or remove them from your application, you are basically just adding them to or removing them from the display list.)</li>
    <li>Once the Application is added to the Flash Player display list, the Application dispatches its <strong>applicationComplete</strong> event.</li>
    <li>The Application has been created and is up on the screen ready to be interacted with.<br />
    &#160;</li>
</ol>
<h2>Component Architecture</h2>
<p>&#160;This is the best video I have found that discusses the component lifecycle in detail. Thanks to Deepa.</p>
<embed height="350" width="550" flashvars="v=~b64~aHR0cDovL2Fkb2JlLmVkZ2Vib3NzLm5ldC9mbGFzaC9hZG9iZS9hZG9iZXR2Mi9tYXhfMjAwOF9kZXZlbG9wLzE1OTY3NDE2MTNfMjkzMTA5MzAwMV8yMDAxLS1zdWJyYW1hbmlhbS10dWUtMTMwcG0tZGV2ZWxvcC5mbHY/cnNzX2ZlZWRpZD0xNTM4NCZ4bWx2ZXJzPTI=&#38;w=550&#38;t=http://tv.adobe.com/MAX-2008-Develop/Creating-New-Components-in-Flex-3-by-Deepa-Subramaniam.htmlvi+f15384v1002&#38;h=350" pluginspage="http://www.adobe.com/go/getflashplayer" type="application/x-shockwave-flash" allowscriptaccess="always" quality="high" loop="false" play="true" name="AdobeTVPlayer" bgcolor="#000000" src="http://tv.adobe.com/Embed.swf"></embed>
<p class="MsoNormal" style="text-align: left;">Below is the summary I took from the video above, in case you don't want to spend an hour watching it, although I really think you should. To start, Deepa introduces "<strong>Spark</strong>", the component and skinning architecture that is part of Flex 4 (<strong>Gumbo</strong>). Spark is built on top of <strong>Halo </strong>(i.e. the Flex 3 component architecture), and components using Halo or Spark can co-exist. Deepa then focuses on how to develop a custom video component on both Halo and Spark for comparison. Spark is great in that it factors the layout code out of the component.</p>
<p class="MsoNormal" style="text-align: left;">Halo component lifecycle can be separated into 3 phases:</p>
<p><u><strong>Phase 1 - Initialization</strong></u></p>
<ol>
    <li><u>Construction</u><br />
    <ul>
        <li>Choose the right base class and provide a default (i.e. zero-argument) constructor. By extending <strong>UIComponent</strong>, you inherit all of the lifecycle methods, events and properties. Also, since UIComponent extends <strong>EventDispatcher</strong>, your component inherits the ability to listen for and dispatch events.</li>
        <li>Best practice: call super() and add event listeners. Minimal work should occur here.</li>
    </ul>
    </li>
    <li><u>Configuration</u>
    <ul>
        <li>Setters and getters for properties.</li>
        <li>Takes part in the invalidation and validation cycle (details later).</li>
        <li>Best practice: a setter should not assume that the internal children have been created, and it should be fast. Use _xxx for the storage variable and keep a dirty flag for each property. Dispatch events when the internal state of your component changes.</li>
    </ul>
    </li>
    <li><u>Attachment</u>
    <ul>
        <li>A component is added to the Flash <strong>display list </strong>by its parent via an <strong>addChild()</strong> call. Without attachment, the component lifecycle stalls and nothing further happens to your component, so this is an important step.</li>
        <li>The display list is a tree of visible or potentially visible objects in your application. At the root of the display list is your main application. Things like the containment hierarchy and rendering order are all maintained by the display list.</li>
    </ul>
    </li>
    <li><u>Initialization</u> - 5 lifecycle actions occur here.
    <ul>
        <li><strong>preinitialize </strong>event is dispatched. It signifies that your component has been added to the display list by its parent via addChild().</li>
        <li><strong>createChildren()</strong> - create, configure and attach all of your children to the display list. Best practice: call super.createChildren(), construct each child only if it does not exist yet, and attach it via addChild(). If a child component is dynamic and data-driven, create it in <strong>commitProperties()</strong> instead, because that method is called on every invalidation and validation cycle. <em>Halo rules: Container --&#62; nested structure of UIComponents --&#62; MovieClip, Video, Shape and Sprite.</em></li>
        <li><strong>initialize </strong>event is dispatched. It signifies that you and all your children have been created and attached.</li>
        <li>The first <strong>invalidation/validation pass </strong>occurs.</li>
        <li><strong>creationComplete</strong> event is dispatched. Ready for prime time.</li>
    </ul>
    </li>
</ol>
<p><u><strong>Phase 2 - Update</strong></u></p>
<p>In this phase, your component is fully initialized and ready for use. Now it needs to know how to update, and an update is needed whenever its internal state changes, for example through user interaction. To respond to changes, a component uses the invalidation and validation cycle. The key is to flag the changed variable as dirty during invalidation and handle it later, during validation, right before rendering. This way many invalidations can be served by a single validation, which performs better because it avoids repetitive work. To explain this approach, Deepa talks about the <strong>Flash Player Elastic Racetrack</strong>, in which every frame has two parts: code execution and rendering. If either part takes too long, the frame stretches and the Flash Player can no longer keep up with its frame rate. Your application then lags and stalls, which is bad!&#160;</p>
<ol>
    <li>Invalidation/validation can be split into 3 phases. <br />
    <ul>
        <li>invalidateProperties() --&#62; commitProperties()</li>
        <li>invalidateSize() --&#62; measure() (measure() may not be called, so don't make your code depend on it)</li>
        <li>invalidateDisplayList() --&#62; updateDisplayList()</li>
    </ul>
    </li>
</ol>
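<p>The dirty-flag idea behind invalidation and validation can be sketched in a few lines. The example is in Python and is a toy model, not the Flex UIComponent API: the Label class, set_text() and validate() are hypothetical names standing in for a property setter that invalidates and for the commitProperties() pass.</p>
<pre name="code" class="python">
class Label:
    """Toy component showing the dirty-flag pattern; not the Flex UIComponent API."""

    def __init__(self):
        self._text = ""
        self._text_dirty = False
        self.commit_count = 0

    def set_text(self, value):
        # Invalidation: store the value and raise the flag; no expensive work here.
        self._text = value
        self._text_dirty = True

    def validate(self):
        # Validation: meant to run once per frame, just before rendering.
        if self._text_dirty:
            self._text_dirty = False
            self.commit_count += 1  # stand-in for the expensive update


label = Label()
for word in ("a", "b", "c"):
    label.set_text(word)   # three invalidations in the same frame
label.validate()           # one validation pass does the work once
label.validate()           # nothing is dirty, so nothing happens
</pre>
<p>Three property changes cost only one expensive update, which is the saving the invalidation/validation cycle is designed to deliver.</p>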
<p><u><strong>Phase 3 - Destruction</strong></u></p>
<ol>
    <li>Detachment</li>
    <li>Garbage Collection</li>
</ol>
<p>&#160;</p>
<h2>What is Mixin?</h2>
<p>When you put the [Mixin] metadata just above your class definition and add a static init function to the class like so,</p>
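<pre name="code" class="java">
package {
	import mx.managers.ISystemManager;

	[Mixin]
	public class Model {
		// Flex calls this static hook while the application is starting up
		public static function init(systemManager:ISystemManager):void {
			trace("I get called first");
		}
	}
}
</pre>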
<p>The static init function will be called as soon as your application loads (assuming this class is referenced somewhere in the app), much like static initialization blocks in Java. This is useful if you have some code that you want to run before any of the other code in the class.</p>
<h2>Reference</h2>
<p>Below are some interesting articles I found on this topic.</p>
<ol>
    <li><a href="http://npacemo.com/wordpress/2008/07/06/flex-application-bootstrapping-totally-custom-preloader/">Create Custom Preloader</a>, <a href="http://www.onflex.org/ted/2006/07/flex-2-custom-preloaders.php">Ted has an article related to this too</a></li>
    <li><a href="http://www.insideria.com/2008/04/flex-ria-performance-considera.html">Speed up startup loading time</a></li>
    <li><a href="http://userflex.wordpress.com/2008/02/07/preload-runtime-styles/">Dynamically loading a new custom theme without a fraction-of-a-second delay</a></li>
    <li><a href="http://tv.adobe.com/#vi+f15384v1002">Deepa's presentation at MAX</a></li>
    <li><a href="http://www.adobe.com/support/documentation/en/flex/1/mixin/mixin2.html">Introduction to mixins</a></li>
</ol>
<p>&#160;</p>]]></description>
			<content:encoded><![CDATA[<h2>Magic behind the scenes</h2>
<p>I have always wondered how my Flex application ends up displayed in the Flash Player in the browser. Why does decompiling a Flex SWF give me a <strong>2-frame movie</strong>? What is the <strong>SystemManager</strong> and how can I get a handle on it? Many questions of this kind sit at a lower level, the level that makes a Flex application possible. Normally, we don&#8217;t need to dive into this to write a Flex application. However, I would feel more comfortable understanding it before I advocate Flex as the main part of our company&#8217;s UI strategy.&#160;</p>
<p>To understand how your Flex application gets loaded and displayed in the Flash Player, you can read this great <a href="http://iamdeepa.com/blog/?p=11">article</a>. I summarize the sequence in steps below:</p>
<h2>Flex is a 2-frame movie</h2>
<p>The <strong>first frame</strong> of a Flex SWF contains the <strong>SystemManager</strong>, the <strong>Preloader</strong>, the <strong>DownloadProgressBar </strong>and some glue helper classes. Remember, the Preloader is what creates the DownloadProgressBar control which displays the progress of a Flex application downloading and being initialized. The <strong>second frame</strong> of a Flex SWF contains the rest of the Flex framework code, your application code and all of your application assets like embedded fonts, images, etc.</p>
<p><span id="more-217"></span><br />
By creating a 2-frame movie, Flex applications can take advantage of the <strong>streaming </strong>support built into the Flash Player and a preloader can appear before all of the Flex framework code and your application code are downloaded.</p>
<blockquote>
<p>The .swf format is a progressive download format which means that Flash player can access content on frames as they download without having to wait for the entire file to download.</p>
</blockquote>
<p><u>Here are the steps</u>:</p>
<ol>
<li>First, enough bytes for <strong>frame 1</strong> are streamed down to the Flash Player.</li>
<li>The Flash Player executes those bytes by creating a <strong>SystemManager </strong>instance.</li>
<li>The SystemManager instructs the Flash Player to stop at the end of frame 1.</li>
<li>SystemManager then goes on to create the <strong>Preloader </strong>which creates the <strong>DownloadProgressBar </strong>control and pops that up on the client screen.</li>
<li>The Preloader then starts tracking the rest of the bytes streaming in from the Flex SWF.&#160;</li>
<li>Once all the bytes for the Flex framework and application code are in, the SystemManager goes on to frame 2 and instantiates the <strong>Application </strong>instance.</li>
<li>Once the Application instance has been created, the SystemManager sets <strong>Application.systemManager</strong> to itself. This is how you, the application developer, can access the SystemManager at a later time.</li>
<li>The Application dispatches the <strong>preinitialize event</strong> at the beginning of the initialization process.</li>
<li><strong>The Application goes on to create its children. </strong>The method createChildren() is called on the application. At this point each of the application’s components is constructed, and each component’s createChildren() is called as well. For details, see the <strong>component lifecycle section</strong>.</li>
<li>The Application dispatches the <strong>initialize event</strong>, which indicates that all of the application’s components have been initialized. However, at this stage, the components are not yet laid out.</li>
<li>Eventually, once all the Application child controls and containers have been created, sized and positioned, the Application dispatches the <strong>creationComplete </strong>event.</li>
<li>Once the creationComplete event has been dispatched, the Preloader removes the DownloadProgressBar control and the SystemManager <strong>adds the Application instance to the Flash Player display list</strong>. (The Flash Player display list is basically the tree of visible or potentially visible objects that make up your application. When you add child components to or remove them from your application, you are basically just adding them to or removing them from the display list.)</li>
<li>Once the Application is added to the Flash Player display list, the Application dispatches its <strong>applicationComplete</strong> event.</li>
<li>The Application has been created and is up on the screen ready to be interacted with.<br />
    &#160;</li>
</ol>
<h2>Component Architecture</h2>
<p>&#160;This is the best video I have found that discusses the component lifecycle in detail &#8211; thanks to Deepa.</p>
<p><embed height="350" width="550" flashvars="v=~b64~aHR0cDovL2Fkb2JlLmVkZ2Vib3NzLm5ldC9mbGFzaC9hZG9iZS9hZG9iZXR2Mi9tYXhfMjAwOF9kZXZlbG9wLzE1OTY3NDE2MTNfMjkzMTA5MzAwMV8yMDAxLS1zdWJyYW1hbmlhbS10dWUtMTMwcG0tZGV2ZWxvcC5mbHY/cnNzX2ZlZWRpZD0xNTM4NCZ4bWx2ZXJzPTI=&amp;w=550&amp;t=http://tv.adobe.com/MAX-2008-Develop/Creating-New-Components-in-Flex-3-by-Deepa-Subramaniam.htmlvi+f15384v1002&amp;h=350" pluginspage="http://www.adobe.com/go/getflashplayer" type="application/x-shockwave-flash" allowscriptaccess="always" quality="high" loop="false" play="true" name="AdobeTVPlayer" bgcolor="#000000" src="http://tv.adobe.com/Embed.swf"></embed></p>
<p class="MsoNormal" style="text-align: left;">Below is the summary I took from the video above, in case you don&#8217;t want to spend an hour watching it. However, I really think you should. To start out, Deepa introduces &#8220;<strong>Spark</strong>&#8221;, the component and skinning architecture that is part of Flex 4 (<strong>Gumbo</strong>). Spark is built on top of <strong>Halo </strong>(i.e. the Flex 3 component architecture), and components using Halo or Spark can co-exist. Then&#160;Deepa focuses on how she develops a custom video component on top of Halo and on top of Spark for comparison. Spark is great in that it can factor the layout code out of the component.</p>
<p class="MsoNormal" style="text-align: left;">The Halo component lifecycle can be separated into 3 phases:</p>
<p><u><strong>Phase 1 &#8211; Initialization</strong></u></p>
<ol>
<li><u>Construction</u>
<ul>
<li>Choose the right base class and provide a default (i.e. zero-argument) constructor. By extending <strong>UIComponent</strong>, you inherit all of the lifecycle methods, events and properties. Also, since UIComponent descends from <strong>EventDispatcher</strong>, your component inherits the ability to listen for and dispatch events.</li>
<li>Best practice: call super() and add event listeners. Minimal work should occur here.</li>
</ul>
</li>
<li><u>Configuration</u>
<ul>
<li>Setters and getters for properties.</li>
<li>Setters take part in the invalidation and validation cycle. Details later.</li>
<li>Best practice: a setter should not expect the internal children to have been created, and you want your setters to be fast. Use _xxx for the storage variable and keep a dirty flag for it. Dispatch events when the internal state of your component changes.</li>
</ul>
</li>
<li><u>Attachment</u>
<ul>
<li>A component is added to the Flash <strong>display list </strong>by its parent via an <strong>addChild()</strong> call. Without attachment, the component lifecycle stalls and nothing more happens to your component. So, this is an important step.</li>
<li>The display list is a tree of visible or potentially visible objects in your application. At the root of the display list is your main application. Things like the containment hierarchy and rendering order are all maintained by the display list.</li>
</ul>
</li>
<li><u>Initialization</u> &#8211; 5 lifecycle actions occur here.
<ul>
<li>The <strong>preinitialize </strong>event is dispatched. It signifies that your component has been added to the display list by its parent via addChild().</li>
<li><strong>createChildren()</strong> &#8211; walk through all your children; create, configure and attach them to the display list. Best practice: call super.createChildren(), construct each child if it does not exist yet and attach it via addChild(). If your child components are dynamic and data driven, create them in <strong>commitProperties()</strong> instead, because it gets called in every invalidation and validation cycle. <em>Halo rules: Container &#8211;&gt; nested structure of UIComponents &#8211;&gt; MovieClip, Video, Shape and Sprite.</em></li>
<li>The <strong>initialize </strong>event is dispatched. It signifies that you and all your children have been created and attached.</li>
<li>The first <strong>invalidation/validation pass </strong>occurs.</li>
<li><strong>creationComplete</strong> event is dispatched. Ready for prime time.</li>
</ul>
</li>
</ol>
<p><u><strong>Phase 2 &#8211; Update</strong></u></p>
<p>At this phase, your component is fully initialized and ready for use. Now it needs to know how to update. An update occurs when its internal state is changed, for example by user interaction. To respond to changes, the component uses the invalidation and validation cycle. The key is to flag the changed variable as dirty during invalidation and handle it later during validation, right before rendering. That way many invalidations lead to a single validation, which gives better performance by avoiding repetitive work. To explain this approach, Deepa talks about the <strong>Flash Player Elastic Racetrack</strong>, in which every frame has 2 parts: code execution and rendering. If either part takes too long, the frame stretches and Flash Player can no longer keep up with the application&#8217;s frame rate. Your application then lags and stutters, which is bad!&#160;</p>
<ol>
<li>Invalidation/validation can be split into 3 phases:
<ul>
<li>invalidateProperties() &#8211;&gt; commitProperties()</li>
<li>invalidateSize() &#8211;&gt; measure() (measure() may not be called, so don&#8217;t make your code depend on it)</li>
<li>invalidateDisplayList() &#8211;&gt; updateDisplayList()</li>
</ul>
</li>
</ol>
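<p>The "many invalidations, one validation" idea is not specific to ActionScript. Below is a minimal, language-neutral sketch in Python rather than Flex code. The class and method names (Component, commit_properties, validate_now) are hypothetical stand-ins for a UIComponent setter, commitProperties() and the framework's validation pass, and the upper-casing step is just a placeholder for expensive work.</p>

```python
class Component:
    """Toy model of deferred validation: many invalidations, one commit."""

    def __init__(self):
        self._label = ""            # storage variable (the _xxx convention)
        self._label_dirty = False   # dirty flag for the storage variable
        self.commit_count = 0       # how often the expensive work actually ran
        self.rendered = ""

    @property
    def label(self):
        return self._label

    @label.setter
    def label(self, value):
        # The setter stays cheap: store the value and mark it dirty.
        self._label = value
        self._label_dirty = True

    def commit_properties(self):
        # Stand-in for commitProperties(): do the expensive work once.
        if self._label_dirty:
            self._label_dirty = False
            self.rendered = self._label.upper()
            self.commit_count += 1

    def validate_now(self):
        # Stand-in for the framework's validation pass before rendering.
        self.commit_properties()


c = Component()
for text in ("a", "ab", "abc"):
    c.label = text      # three invalidations within one frame
c.validate_now()        # a single validation pass
print(c.rendered, c.commit_count)  # ABC 1
```

<p>Three property changes cost only one commit, which is the saving the invalidation and validation cycle is designed to give you.</p>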
<p><u><strong>Phase 3 &#8211; Destruction</strong></u></p>
<ol>
<li>Detachment</li>
<li>Garbage Collection</li>
</ol>
<p>&#160;</p>
<h2>What is Mixin?</h2>
<p>When you put the [Mixin] metadata just above your class definition and add a static init function to the class like so,</p>
<p><pre><pre name="code" class="java">
package {
  import mx.managers.ISystemManager;

  [Mixin]
  public class Model {
    // Static hook invoked once while the application is loading.
    public static function init(systemManager:ISystemManager):void {
      trace(&quot;I get called first&quot;);
    }
  }
}
</pre></pre></p>
<p>The static init function will be called as soon as your application loads (assuming this class is referenced somewhere in the app), much like static initialization blocks in Java. This is useful if you have some code that you want to run before any of the other code in the class.</p>
<h2>Reference</h2>
<p>Below are some interesting articles I found related to this topic:</p>
<ol>
<li><a href="http://npacemo.com/wordpress/2008/07/06/flex-application-bootstrapping-totally-custom-preloader/">Create Custom Preloader</a>, <a href="http://www.onflex.org/ted/2006/07/flex-2-custom-preloaders.php">Ted has an article related to this too</a></li>
<li><a href="http://www.insideria.com/2008/04/flex-ria-performance-considera.html">Speed up startup loading time</a></li>
<li><a href="http://userflex.wordpress.com/2008/02/07/preload-runtime-styles/">Dynamically loading a new custom theme without a split-second delay</a></li>
<li><a href="http://tv.adobe.com/#vi+f15384v1002">Deepa&#8217;s presentation at MAX</a></li>
<li><a href="http://www.adobe.com/support/documentation/en/flex/1/mixin/mixin2.html">Introduction to mixins</a></li>
</ol>
<p>&#160;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/report/flex-startup-sequence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Flex Annotated Charting</title>
		<link>http://www.solutionhacker.com/data-intelligence/report/flex-annotated-charting/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/report/flex-annotated-charting/#comments</comments>
		<pubDate>Wed, 07 Jan 2009 23:02:17 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Report]]></category>
		<category><![CDATA[annotation]]></category>
		<category><![CDATA[chart]]></category>
		<category><![CDATA[flex]]></category>
		<category><![CDATA[google finance]]></category>
		<category><![CDATA[line chart]]></category>
		<category><![CDATA[range selector]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=201</guid>
		<description><![CDATA[<p>Recently, I wanted to extend the LineChart in Flex. I wanted a line chart with annotated events, like Google Finance.</p>
<div align="center"><a href="http://meutzner.com/examples/flex_finance/Flex_Finance_Step5.html" target="_blank">  <img border="0" alt="" src="http://meutzner.com/examples/flex_finance/flex_finance.jpg" /></a></div>
<div align="center">&#160;</div>
<div style="text-align: left;">First of all, I googled the Net to see whether anyone had already done it. It would be even better if I could find an open source project related to this. Below are the interesting things I found:</div>
<ol>
    <li><a href="http://www.djindexes.com/DJIA110/learning-center/">Dow Jones Interactive Chart </a>(commercial - it is exactly what I am looking for)</li>
    <li><a href="http://demo.quietlyscheming.com/InteractiveBubble/InteractiveBubble.html">Interactive Bubble Chart</a> (open - although it is not exactly what I want, I believe the code can help me if I need to customize the line chart. <img src="../../../../../wp-includes/images/smilies/icon_cool.gif" alt=":cool:" onclick="grin(':cool:');" /> I may just need to draw small interactive bubbles on the line to get my job done!)</li>
    <li>This <a href="http://demo.quietlyscheming.com/ChartSampler/app.html">demo </a>gives you tons of chart samples. They are all great examples, although none of them satisfies my current need.</li>
    <li>This <a href="http://www.stretchmedia.ca/blog/index.cfm/2007/3/28/Chart-Milestones-using-annotationElements">demo </a>is close to what I want. From this demo, I noticed I can use "<strong>annotationElement</strong>" to draw on top of the data series. However, the trick is to convert the data points to pixel coordinates so that what I draw moves along with the graph even when someone stretches it. To make things easier, Ely Greenfield has created <strong>DataDrawingCanvas</strong>, which helps us draw on the chart by specifying only data points instead of pixel coordinates. This class extends <strong>ChartElement</strong>, like <strong>AnnotationElement </strong>does (<a href="http://www.quietlyscheming.com/blog/charts/easy-custom-charts/">blog</a>). That is amazing!! Thanks!!! <img onclick="grin(':smile:');" alt=":smile:" src="../../../../../wp-includes/images/smilies/icon_smile.gif" /></li>
    <li><a href="http://meutzner.com/examples/flex_finance/Flex_Finance_Step5.html">Google Finance Chart</a> (It is exactly what I want. I wonder whether I can get the source of it)<br />
    <ul>
        <li>I have found the blog and <a href="http://meutzner.com/examples/flex_finance/Flex Finance PPT.ppt">powerpoint </a>of this sample (1/7/2009)</li>
        <li>Google uses the Flash/JavaScript integration kit to get it to work (<a href="http://www.mikechambers.com/blog/2006/03/21/google-finance-flash-ajax-integration/">blog</a>) - I heard that it is a very nice combination of Flash and AJAX. This is similar to <a href="http://www.measuremap.com/">MeasureMap</a>'s use of the kit.</li>
        <li>It is an open source example!!! (<a href="http://www.meutzner.com/blog/attachments/360/srcview/index.html">code</a>). Thanks to Brendan Meutzner!!</li>
        <li>Brendan also shows us how he created his demo in 5 steps to help us understand how to build it ourselves.</li>
        <li><a href="http://www.meutzner.com/blog/attachments/360/Flex_Finance_Step1.html">Step 1</a>, <a href="http://www.meutzner.com/blog/attachments/360/Flex_Finance_Step2.html">Step 2</a>, <a href="http://www.meutzner.com/blog/attachments/360/Flex_Finance_Step3.html">Step 3</a>, <a href="http://www.meutzner.com/blog/attachments/360/Flex_Finance_Step4.html">Step 4</a>, <a href="http://www.meutzner.com/blog/attachments/360/Flex_Finance_Step5.html">Step 5</a> - enjoy!!</li>
    </ul>
    </li>
</ol>
<h2>Reference</h2>
<p>Useful resources:</p>
<ol>
    <li><a href="http://tv.adobe.com/#vi+f15384v1024">Data Visualization by Tom Gonzalez</a>. (Tom created an open source visualization framework named Axiis. It looks great. Once I get a chance, I will dig into it) - 7/31/2009</li>
    <li><a href="http://www.onflex.org/ACDS/BuildingAFlexComponent.pdf">Building a Flex Component</a> by Ely Greenfield</li>
    <li><a href="http://www.adobe.com/devnet/flex/articles/components_separation.html">Create components and enforce separation of concerns</a></li>
    <li><a href="http://www.edwardtufte.com/tufte/">http://www.edwardtufte.com/tufte/ </a>(Edward Tufte - famous guy in data visualization)</li>
    <li><a href="http://www.insideria.com/2008/03/image-manipulation-in-flex.html">http://www.insideria.com/2008/03/image-manipulation-in-flex.html</a> (Image Manipulation)</li>
</ol>
<p>&#160;</p>]]></description>
			<content:encoded><![CDATA[<p>Recently, I wanted to extend the LineChart in Flex. I wanted a line chart with annotated events, like Google Finance.</p>
<div align="center"><a href="http://meutzner.com/examples/flex_finance/Flex_Finance_Step5.html" target="_blank">  <img border="0" alt="" src="http://meutzner.com/examples/flex_finance/flex_finance.jpg" /></a></div>
<div align="center">&#160;</div>
<div style="text-align: left;">First of all, I googled the Net to see whether anyone had already done it. It would be even better if I could find an open source project related to this. Below are the interesting things I found:</div>
<ol>
<li><a href="http://www.djindexes.com/DJIA110/learning-center/">Dow Jones Interactive Chart </a>(commercial &#8211; it is exactly what I am looking for)</li>
<li><a href="http://demo.quietlyscheming.com/InteractiveBubble/InteractiveBubble.html">Interactive Bubble Chart</a> (open &#8211; although it is not exactly what I want, I believe the code can help me if I need to customize the line chart. <img src="../../../../../wp-includes/images/smilies/icon_cool.gif" alt=":cool:" onclick="grin(':cool:');" /> I may just need to draw small interactive bubbles on the line to get my job done!)</li>
<li>This <a href="http://demo.quietlyscheming.com/ChartSampler/app.html">demo </a>gives you tons of chart samples. They are all great examples, although none of them satisfies my current need.</li>
<li>This <a href="http://www.stretchmedia.ca/blog/index.cfm/2007/3/28/Chart-Milestones-using-annotationElements">demo </a>is close to what I want. From this demo, I noticed I can use &#8220;<strong>annotationElement</strong>&#8221; to draw on top of the data series. However, the trick is to convert the data points to pixel coordinates so that what I draw moves along with the graph even when someone stretches it. To make things easier, Ely Greenfield has created <strong>DataDrawingCanvas</strong>, which helps us draw on the chart by specifying only data points instead of pixel coordinates. This class extends <strong>ChartElement</strong>, like <strong>AnnotationElement </strong>does (<a href="http://www.quietlyscheming.com/blog/charts/easy-custom-charts/">blog</a>). That is amazing!! Thanks!!! <img onclick="grin(':smile:');" alt=":smile:" src="../../../../../wp-includes/images/smilies/icon_smile.gif" /></li>
<li><a href="http://meutzner.com/examples/flex_finance/Flex_Finance_Step5.html">Google Finance Chart</a> (It is exactly what I want. I wonder whether I can get the source of it)
<ul>
<li>I have found the blog and <a href="http://meutzner.com/examples/flex_finance/Flex Finance PPT.ppt">powerpoint </a>of this sample (1/7/2009)</li>
<li>Google uses the Flash/JavaScript integration kit to get it to work (<a href="http://www.mikechambers.com/blog/2006/03/21/google-finance-flash-ajax-integration/">blog</a>) &#8211; I heard that it is a very nice combination of Flash and AJAX. This is similar to <a href="http://www.measuremap.com/">MeasureMap</a>&#8217;s use of the kit.</li>
<li>It is an open source example!!! (<a href="http://www.meutzner.com/blog/attachments/360/srcview/index.html">code</a>). Thanks to Brendan Meutzner!!</li>
<li>Brendan also shows us how he created his demo in 5 steps to help us understand how to build it ourselves.</li>
<li><a href="http://www.meutzner.com/blog/attachments/360/Flex_Finance_Step1.html">Step 1</a>, <a href="http://www.meutzner.com/blog/attachments/360/Flex_Finance_Step2.html">Step 2</a>, <a href="http://www.meutzner.com/blog/attachments/360/Flex_Finance_Step3.html">Step 3</a>, <a href="http://www.meutzner.com/blog/attachments/360/Flex_Finance_Step4.html">Step 4</a>, <a href="http://www.meutzner.com/blog/attachments/360/Flex_Finance_Step5.html">Step 5</a> &#8211; enjoy!!</li>
</ul>
</li>
</ol>
<h2>Reference</h2>
<p>Useful resources:</p>
<ol>
<li><a href="http://tv.adobe.com/#vi+f15384v1024">Data Visualization by Tom Gonzalez</a>. (Tom created an open source visualization framework named Axiis. It looks great. Once I get a chance, I will dig into it) &#8211; 7/31/2009</li>
<li><a href="http://www.onflex.org/ACDS/BuildingAFlexComponent.pdf">Building a Flex Component</a> by Ely Greenfield</li>
<li><a href="http://www.adobe.com/devnet/flex/articles/components_separation.html">Create components and enforce separation of concerns</a></li>
<li><a href="http://www.edwardtufte.com/tufte/">http://www.edwardtufte.com/tufte/ </a>(Edward Tufte &#8211; famous guy in data visualization)</li>
<li><a href="http://www.insideria.com/2008/03/image-manipulation-in-flex.html">http://www.insideria.com/2008/03/image-manipulation-in-flex.html</a> (Image Manipulation)</li>
</ol>
<p>&#160;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/report/flex-annotated-charting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Database concurrency control &#8211; MVCC</title>
		<link>http://www.solutionhacker.com/data-intelligence/data-store/database-concurrency-control-mvcc/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/data-store/database-concurrency-control-mvcc/#comments</comments>
		<pubDate>Sat, 13 Dec 2008 01:37:03 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Data Store]]></category>
		<category><![CDATA[mvcc]]></category>
		<category><![CDATA[postgresql]]></category>
		<category><![CDATA[read lock]]></category>
		<category><![CDATA[transaction]]></category>
		<category><![CDATA[write lock]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=198</guid>
		<description><![CDATA[<h2>Concurrency Issue - Lost Update</h2>
<p><strong>Lost update</strong> is the key concurrency problem that we try to avoid:</p>
<p><img height="345" width="532" alt="" src="http://www.solutionhacker.com/wp-content/uploads/image/lostupdate.JPG" /></p>
<p>In the sequence above, the <strong>SELECT-UPDATE</strong> in transaction A overwrites the update that transaction B made to the balance.&#160; If transactions A and B were serialized properly, the correct balance after both would be 700. For performance reasons, most RDBMSs use the isolation level "<strong>READ COMMITTED</strong>" as the default. However, this isolation level does not protect data read by a transaction from becoming <strong>outdated </strong>right after it is read, because another transaction can change it. You can use isolation level "<strong>REPEATABLE READ</strong>" or&#160; "<strong>SERIALIZABLE</strong>" to protect the values read in a transaction from becoming outdated, by holding <strong>shared read locks </strong>on these rows until the end of the transaction. In case you are new to RDBMS, let me give you some background.</p>
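<p>To make the lost update concrete, here is a minimal Python sketch that replays the interleaving with plain variables instead of a real database. The starting balance (1000) and the two amounts (-100 and -200) are assumptions picked so that the serialized result is the 700 mentioned above; the function names are made up for illustration.</p>

```python
def run_interleaved(balance, delta_a, delta_b):
    """Both transactions read before either writes, so one update is lost."""
    read_a = balance              # A: SELECT balance
    read_b = balance              # B: SELECT balance
    balance = read_b + delta_b    # B: UPDATE and COMMIT
    balance = read_a + delta_a    # A: UPDATE and COMMIT, overwriting B's change
    return balance


def run_serialized(balance, delta_a, delta_b):
    """A runs to completion before B starts, so both updates survive."""
    balance = balance + delta_a   # A: SELECT, UPDATE, COMMIT
    balance = balance + delta_b   # B: SELECT, UPDATE, COMMIT
    return balance


# Hypothetical numbers, not taken from the diagram.
start, delta_a, delta_b = 1000, -100, -200
print(run_interleaved(start, delta_a, delta_b))  # 900: B's update is lost
print(run_serialized(start, delta_a, delta_b))   # 700: the correct balance
```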
<p><!--more--></p>
<p>There are 3 things we try to avoid in a reliable RDBMS:</p>
<ol>
    <li><strong>Dirty read</strong> - if transaction A can read uncommitted changes made by transaction B, it may be working on data that is later rolled back if transaction B fails to commit.</li>
    <li><strong>Non-repeatable read</strong> - if transaction A issues the same select twice, it may get different results because another transaction committed changes in between.</li>
    <li><strong>Phantom read</strong> - if transaction A issues the same select twice, it may get new rows because another transaction inserted records that fit the criteria of the query.</li>
</ol>
<p>To protect against the 3 issues above, an RDBMS provides 4 isolation levels:</p>
<ol>
    <li><strong>READ UNCOMMITTED</strong> - lowest level - no protection at all. Dirty, non-repeatable and phantom reads are all possible.</li>
    <li><strong>READ COMMITTED - </strong>default setting - prevents dirty reads but not non-repeatable or phantom reads</li>
    <li><strong>REPEATABLE READ - </strong>phantom reads are still possible</li>
    <li><strong>SERIALIZABLE </strong>- highest level with the largest <strong>overhead </strong>as it makes transactions behave as if they ran in serial fashion (i.e. one by one). However, it prevents all the issues above.</li>
</ol>
<p>Under the hood, an RDBMS uses a locking scheme to achieve the 4 transaction isolation levels above. How? It uses 2 kinds of locks:</p>
<ol>
    <li><strong>Shared read lock</strong> - more than one transaction can read concurrently, but no write is possible</li>
    <li><strong>Exclusive write lock</strong> - once an exclusive write lock is acquired, no one else can read or write until it is released.</li>
</ol>
<p>In other words, <strong>read locks block write locks and vice versa</strong>. In the Lost Update problem above, transaction A can use a <strong>shared read lock</strong> to make sure no write is possible on the data it read during the course of its transaction. So far so good? <img src="../../../../../wp-includes/images/smilies/icon_biggrin.gif" alt=":grin:" onclick="grin(':grin:');" />. Let's move on!</p>
<p>Using a high isolation level to solve the lost update issue may give you a new set of issues related to <strong>locking</strong>:</p>
<ol>
    <li><strong>Long-running transaction</strong> - if transaction A takes a long time to commit, other transactions that want to change the value are queued up. Typically, this happens when your transaction reads data from the db, lets the user change it in the UI and writes it back. You should consider this a long transaction, as you never know how long the user takes to change it in the UI (i.e. user think time). In this scenario, you split the transaction into 2 (i.e. a transaction to read and a transaction to write back). However, once you split the transaction, you cannot guarantee&#160;that the data from the previous read is still up to date, no matter what isolation level you use. How to solve this problem then? <img src="../../../../../wp-includes/images/smilies/icon_eek.gif" alt=":shock:" onclick="grin(':shock:');" /> It can be solved by checking the <strong>version </strong>or <strong>timestamp </strong>of the data in the database before you write it in the write transaction. If the version is the same as in the first read transaction, no one has changed the data even though you left it unlocked, so you are safe to change it. Otherwise, an exception should be thrown. This mechanism is called<strong> optimistic locking</strong>.</li>
    <li><strong>Deadlock</strong> - if transaction A needs other resources in order to commit, but these resources are held by a transaction that is itself waiting for A to commit, you get a typical <strong>circular </strong>dependency.</li>
</ol>
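<p>The optimistic locking check described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the account table, its version column, the helper names and the StaleDataError exception are all invented for the example, and a real application would pick its own schema and error handling.</p>

```python
import sqlite3


class StaleDataError(Exception):
    """Raised when the row changed between the read and the write."""


def read_account(conn, account_id):
    # Read transaction: remember the version alongside the data.
    row = conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    return {"id": account_id, "balance": row[0], "version": row[1]}


def write_account(conn, snapshot, new_balance):
    # Write transaction: the UPDATE only matches if the version is unchanged.
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, snapshot["id"], snapshot["version"]),
    )
    conn.commit()
    if cur.rowcount == 0:
        raise StaleDataError("account %d was modified by someone else" % snapshot["id"])


# Hypothetical schema and data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 1000, 0)")
conn.commit()

seen_by_a = read_account(conn, 1)   # user A opens the edit screen
seen_by_b = read_account(conn, 1)   # user B opens the same record
write_account(conn, seen_by_b, seen_by_b["balance"] - 200)      # B saves first
try:
    write_account(conn, seen_by_a, seen_by_a["balance"] - 100)  # A holds a stale version
except StaleDataError as err:
    print("rejected:", err)
```

<p>No lock is held during the user think time. The price is that the second writer has to re-read the data and retry.</p>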
<p>Locks are not just for rows. They also apply to indexes. That is to say, for heavy updates to distinct rows of a table, the bottleneck can be<strong> index locking</strong>.</p>
<h2>New Database Concurrency Control - MVCC</h2>
<p>The aim of <strong>Multi-Version Concurrency</strong> Control (MVCC) is to <strong>avoid Writers blocking Readers and vice-versa</strong>, by making use of <strong>multiple versions</strong> of data (i.e. no read lock is necessary). The problem of Writers blocking Readers can be avoided if Readers can obtain access to a previous version of the data that is locked by Writers for modification. Writers can still block other Writers, so there may not be concurrent writes to the same data.&#160; However, while MVCC improves database concurrency, its impact on data consistency is more complex. I will discuss this later. Now let's look at how to implement MVCC.</p>
<p><strong>Challenges in implementing MVCC</strong></p>
<ol>
    <li>If multiple versions are stored in the database, an efficient garbage collection mechanism is required to get rid of old versions when they are no longer needed.</li>
    <li>The DBMS must provide efficient access methods that avoid looking at redundant versions.</li>
    <li>The DBMS must avoid expensive lookups when determining the relative commit time of a transaction.</li>
</ol>
<p><strong>There are essentially two approaches to multi-version concurrency. </strong></p>
<ol>
    <li>The first approach is to store <strong>multiple versions</strong> of records in the database, and garbage collect records when they are no longer required. This is the approach adopted by <strong>PostgreSQL </strong> and Firebird/Interbase.</li>
    <li>The second approach is to keep only the <strong>latest version</strong> of data in the database, but reconstruct older versions of data dynamically as required by exploiting undo information (rollback segments in Oracle, the undo log in InnoDB). This is the approach taken by <strong>Oracle</strong> and <strong>MySQL/InnoDB</strong>.</li>
</ol>
<p><strong>How does PostgreSQL implement MVCC?</strong></p>
<p>In PostgreSQL, when a row is updated, a new version (called a tuple) of the row is created and inserted into the table. The previous version is given a pointer to the new version. The previous version is marked "expired", but remains in the database until it is garbage collected. To achieve this, each tuple needs 2 additional fields:</p>
<ol>
    <li><strong>creation id/ xmin </strong>- The ID of the transaction that inserted/updated the row and created this tuple.</li>
    <li><strong>expired id/ xmax </strong>- The transaction that deleted the row, or created a new version of this tuple. Initially it is null.</li>
</ol>
<p>A row is visible if:</p>
<ul>
    <li>its<strong> creation id is a committed transaction</strong> (ensures it doesn't read uncommitted data from other open transactions) <strong>AND </strong>its <strong>creation id is &#60; current transaction id</strong> (prevents non-repeatable reads)</li>
</ul>
<ul>
    <li>its <strong>expired id is null</strong> (ensures it is not stale data) or <strong>&#62; current transaction</strong> id, because if it is deleted by a transaction that started after the current one, the current transaction should ignore the deletion.</li>
</ul>
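<p>The two visibility rules above can be written down as a small Python sketch. It only models the simplified rules from this article, not what PostgreSQL actually does (real visibility checks work on snapshots and also consider, for example, whether the expiring transaction committed). TupleVersion, is_visible and the transaction ids are hypothetical names and values.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TupleVersion:
    value: str
    xmin: int                    # creation id
    xmax: Optional[int] = None   # expired id, None until deleted or superseded


def is_visible(t, current_xid, committed):
    """Apply the two simplified rules from the text to one tuple version."""
    created_ok = t.xmin in committed and current_xid > t.xmin
    not_expired = t.xmax is None or t.xmax > current_xid
    return created_ok and not_expired


# Transaction 10 inserted the row, transaction 20 updated it; both committed.
committed = {10, 20}
old = TupleVersion("balance=1000", xmin=10, xmax=20)
new = TupleVersion("balance=800", xmin=20)

print(is_visible(old, 15, committed))  # True: 15 started before the update
print(is_visible(new, 15, committed))  # False: created by a later transaction
print(is_visible(old, 25, committed))  # False: already expired for 25
print(is_visible(new, 25, committed))  # True
```

<p>Each reader sees exactly one version of the row without taking a read lock, which is the point of MVCC.</p>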
<p>To track the status of transactions, a special table called <strong>PG_LOG</strong> is maintained. Since Transaction Ids are implemented using a monotonically increasing counter, the PG_LOG table can represent transaction status as a bitmap. This table contains two bits of status information for each transaction; the possible states are <strong>in-progress</strong>, <strong>committed</strong>, or <strong>aborted</strong>.</p>
<p>One of the drawbacks of MVCC in PostgreSQL is that old tuple versions are kept in the same table. To garbage collect them, you can "<strong>VACUUM</strong>" the table.</p>
<h2>Reference</h2>
<p>Below are the references I used for this article:</p>
<ol>
    <li><a href="http://era.teipir.gr/era3/fpapers/b43.doc">Paper - Data Access Pattern</a></li>
    <li><a href="http://www.developer.com/open/print.php/877181">PostgreSQL Ends the Waiting Game</a></li>
    <li><a href="http://simpledbm.googlecode.com/files/mvcc-survey-1.0.pdf">MVCC implementation algorithms in different databases</a></li>
    <li><a href="http://postgresql.markmail.org/">MarkMail - very good postgresql forum</a></li>
    <li><a href="http://www.postgresql.org/docs/current/static/preface.html">Postgresql 8.3.6 Manual </a></li>
</ol>
<p>&#160;&#160;</p>
<p>&#160;</p>]]></description>
			<content:encoded><![CDATA[<h2>Concurrency Issue &#8211; Lost Update</h2>
<p><strong>Lost update</strong> is the key concurrency problem that we try to avoid:</p>
<p><img height="345" width="532" alt="" src="http://www.solutionhacker.com/wp-content/uploads/image/lostupdate.JPG" /></p>
<p>In the sequence above, the <strong>SELECT-UPDATE</strong> in transaction A overwrites the update that transaction B made to the balance.&#160; If transactions A and B were serialized properly, the correct balance after both would be 700. For performance reasons, most RDBMSs use the isolation level &#8220;<strong>READ COMMITTED</strong>&#8221; as the default. However, this isolation level does not protect data read by a transaction from becoming <strong>outdated </strong>right after it is read, because another transaction can change it. You can use isolation level &#8220;<strong>REPEATABLE READ</strong>&#8221; or&#160; &#8220;<strong>SERIALIZABLE</strong>&#8221; to protect the values read in a transaction from becoming outdated, by holding <strong>shared read locks </strong>on these rows until the end of the transaction. In case you are new to RDBMS, let me give you some background.</p>
<p><span id="more-198"></span></p>
<p>There are 3 things we try to avoid in a reliable RDBMS:</p>
<ol>
<li><strong>Dirty read</strong> &#8211; if transaction A can read uncommitted changes made by transaction B, it may be working on data that is later rolled back if transaction B fails to commit.</li>
<li><strong>Non-repeatable read</strong> &#8211; if transaction A issues the same select twice, it may get different results because another transaction committed changes in between.</li>
<li><strong>Phantom read</strong> &#8211; if transaction A issues the same select twice, it may get new rows because another transaction inserted records that fit the criteria of the query.</li>
</ol>
<p>To protect against the 3 issues above, an RDBMS provides 4 isolation levels:</p>
<ol>
<li><strong>READ UNCOMMITTED</strong> &#8211; lowest level &#8211; no protection at all. Dirty, non-repeatable and phantom reads are all possible.</li>
<li><strong>READ COMMITTED &#8211; </strong>default setting &#8211; prevents dirty reads but not non-repeatable or phantom reads</li>
<li><strong>REPEATABLE READ &#8211; </strong>phantom reads are still possible</li>
<li><strong>SERIALIZABLE </strong>&#8211; highest level with the largest <strong>overhead </strong>as it makes transactions behave as if they ran in serial fashion (i.e. one by one). However, it prevents all the issues above.</li>
</ol>
<p>Under the hood, an RDBMS uses a locking scheme to achieve the 4 transaction isolation levels above. How? It uses 2 kinds of locks:</p>
<ol>
<li><strong>Shared read lock</strong> &#8211; more than one transaction can read concurrently, but no write is possible</li>
<li><strong>Exclusive write lock</strong> &#8211; once an exclusive write lock is acquired, no one else can read or write until it is released.</li>
</ol>
<p>In other words, <strong>read locks block write lock and vice versa</strong>. In the above Lost Update problem, transaction A can use <strong>shared read lock</strong> to make sure no write is possible for the data it read during the course of its transaction. So far so good? <img src="../../../../../wp-includes/images/smilies/icon_biggrin.gif" alt=":grin:" onclick="grin(':grin:');" />. Lets move on!</p>
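<p>The shared/exclusive behaviour can be tried out in plain Java with <code>ReentrantReadWriteLock</code>. This is only an in-process analogy, not how any RDBMS implements its locks, and the class and method names are hypothetical. The calling thread holds a read lock while a second thread tries, without waiting, to take either another read lock or the write lock:</p>

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; an in-process analogy of shared read / exclusive write locks.
class SharedExclusiveDemo {

    // Holds a read lock on the calling thread, then lets a second thread try
    // to take either another read lock or the write lock without waiting.
    static boolean otherThreadCanLock(boolean wantWrite) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        boolean[] acquired = new boolean[1];
        lock.readLock().lock();
        try {
            Thread other = new Thread(() -> {
                if (wantWrite) {
                    acquired[0] = lock.writeLock().tryLock();
                    if (acquired[0]) lock.writeLock().unlock();
                } else {
                    acquired[0] = lock.readLock().tryLock();
                    if (acquired[0]) lock.readLock().unlock();
                }
            });
            other.start();
            other.join(); // join() makes the other thread's result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.readLock().unlock();
        }
        return acquired[0];
    }
}
```

<p>A second reader gets in, while a writer is refused for as long as the read lock is held.</p>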
<p>Using a high isolation level to solve the lost update issue may give you a new set of issues related to <strong>locking</strong>:</p>
<ol>
<li><strong>Long-running transaction</strong> &#8211; if transaction A takes a long time to commit, other transactions that want to change the value will queue up. A typical case is a transaction that reads data from the db, lets the user change it in the UI and then writes it back. You should treat this as a long transaction because you never know how long the user takes to make the change (ie. user think time). In this scenario, you would split the transaction into 2 (ie. a transaction to read and a transaction to write back). However, once you split the transaction, no isolation level can guarantee that the data from the earlier read is still up-to-date. How to solve this problem then? It can be solved by checking the <strong>version </strong>or <strong>timestamp </strong>of the data in the database before you write it in the write transaction. If the version is the same as the one seen by the read transaction, no one has changed the data while it was left unlocked, so it is safe to change it. Otherwise, an exception should be thrown. This mechanism is called<strong> optimistic locking</strong>.</li>
<li><strong>Deadlock</strong> &#8211; if transaction A needs other resources in order to commit, but these resources are held by a transaction that is itself waiting for A to commit, the result is a typical <strong>circular </strong>dependency.</li>
</ol>
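<p>The version check behind optimistic locking can be sketched in a few lines of Java. Everything here is an in-memory stand-in: <code>VersionedAccount</code>, <code>Snapshot</code> and <code>StaleVersionException</code> are hypothetical names, and the <code>synchronized</code> methods play the role of the two short database transactions:</p>

```java
// Hypothetical exception thrown when the row changed since it was read.
class StaleVersionException extends RuntimeException {
    StaleVersionException(String message) {
        super(message);
    }
}

// Hypothetical in-memory stand-in for a table row with a version column.
class VersionedAccount {

    // What the short "read transaction" hands to the UI.
    static final class Snapshot {
        final int balance;
        final long version;

        Snapshot(int balance, long version) {
            this.balance = balance;
            this.version = version;
        }
    }

    private int balance;
    private long version = 0;

    VersionedAccount(int balance) {
        this.balance = balance;
    }

    // Read transaction: return the value together with its version.
    synchronized Snapshot read() {
        return new Snapshot(balance, version);
    }

    // Write transaction: only succeeds if the version is still the one we read.
    synchronized void write(int newBalance, long expectedVersion) {
        if (version != expectedVersion) {
            throw new StaleVersionException(
                "expected version " + expectedVersion + " but found " + version);
        }
        balance = newBalance;
        version++; // every successful write produces a new version
    }
}
```

<p>If two users read the same snapshot, the first write succeeds and bumps the version, and the second write is rejected instead of silently overwriting the first. Against a real table the same check is commonly written as an UPDATE whose WHERE clause includes the version that was read.</p>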
<p>Locks are not just for rows; indexes are locked too. That is to say, under heavy updates to distinct rows of a table, the bottleneck can be<strong> index locking</strong>.</p>
<h2>New Database Concurrency Control &#8211; MVCC</h2>
<p>The aim of <strong>Multi-Version Concurrency</strong> Control (MVCC) is to <strong>avoid Writers blocking Readers and vice-versa</strong> by making use of <strong>multiple versions</strong> of data (ie. no read lock is necessary). Writers blocking Readers can be avoided if Readers can access a previous version of the data that Writers have locked for modification. Writers may still block other Writers, so concurrent writes to the same data are not necessarily possible.&#160; However, while MVCC improves database concurrency, its impact on data consistency is more complex. I will discuss this later. Now let&#8217;s look at how to implement MVCC.</p>
<p><strong>Challenges in implementing MVCC</strong></p>
<ol>
<li>If multiple versions are stored in the database, an efficient garbage collection mechanism is required to get rid of old versions when they are no longer needed.</li>
<li>The DBMS must provide efficient access methods that avoid looking at redundant versions.</li>
<li>The DBMS must avoid expensive lookups when determining the relative commit time of a transaction.</li>
</ol>
<p><strong>There are essentially two approaches to multi-version concurrency. </strong></p>
<ol>
<li>The first approach is to store <strong>multiple versions</strong> of records in the database, and garbage collect records when they are no longer required. This is the approach adopted by <strong>PostgreSQL </strong> and Firebird/Interbase.</li>
<li>The second approach is to keep only the <strong>latest version</strong> of data in the database, but reconstruct older versions of data dynamically as required from logged undo information (rollback segments in Oracle, the undo log in InnoDB). This is the approach taken by <strong>Oracle</strong> and <strong>MySQL/InnoDB</strong>.</li>
</ol>
<p><strong>How does PostgreSQL implement MVCC</strong></p>
<p>In PostgreSQL, when a row is updated, a new version (called a tuple) of the row is created and inserted into the table. The previous version is given a pointer to the new version and is marked &#8220;expired&#8221;, but it remains in the database until it is garbage collected. To achieve this, each tuple needs 2 additional fields:</p>
<ol>
<li><strong>creation id/ xmin </strong>- The ID of the transaction that inserted/updated the row and created this tuple.</li>
<li><strong>expired id/ xmax </strong>- The transaction that deleted the row, or created a new version of this tuple. Initially it is null.</li>
</ol>
<p>A row is visible if:</p>
<ul>
<li>its<strong> creation id is a committed transaction</strong> (so it doesn&#8217;t read uncommitted data from other open transactions) <strong>AND </strong>its <strong>creation id is &lt; current transaction id</strong> (so it avoids non-repeatable reads), <strong>AND</strong></li>
<li>its <strong>expired id is null</strong> (so it is not stale data) or <strong>&gt; current transaction id</strong>, because if it was deleted by a transaction after the current one, the current transaction should ignore that deletion.</li>
</ul>
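<p>The two conditions above can be written down as a small predicate. This is a sketch of the simplified rule in this article only; real PostgreSQL visibility checks work with snapshots and handle more cases. <code>TupleVersion</code> and <code>isVisibleTo</code> are hypothetical names, and the <code>isCommitted</code> predicate stands in for a lookup of the transaction status:</p>

```java
import java.util.function.LongPredicate;

// Hypothetical tuple header carrying the two extra fields described above.
class TupleVersion {
    final long xmin; // creation id: transaction that created this tuple
    final Long xmax; // expired id: transaction that deleted/replaced it, null if none

    TupleVersion(long xmin, Long xmax) {
        this.xmin = xmin;
        this.xmax = xmax;
    }

    // Simplified visibility rule from the article, not PostgreSQL's full algorithm.
    boolean isVisibleTo(long currentTxId, LongPredicate isCommitted) {
        // Created by a committed transaction that is older than the current one
        boolean created = isCommitted.test(xmin) && xmin < currentTxId;
        // Not expired at all, or expired only by a later transaction
        boolean notExpired = xmax == null || xmax > currentTxId;
        return created && notExpired;
    }
}
```

<p>For example, if transaction 10 created a row and transaction 20 updated it, transaction 15 still sees the old tuple (xmin 10, xmax 20) and not the new one (xmin 20), while transaction 25 sees only the new one.</p>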
<p>To track the status of transactions, a special table called <strong>PG_LOG</strong> is maintained. Since Transaction Ids are implemented using a monotonically increasing counter, the PG_LOG table can represent transaction status as a bitmap. This table contains two bits of status information for each transaction; the possible states are <strong>in-progress</strong>, <strong>committed</strong>, or <strong>aborted</strong>.</p>
<p>One of the drawbacks of MVCC in PostgreSQL is that old tuple versions are kept in the same table. To garbage collect them, you run &#8220;<strong>VACUUM</strong>&#8221; on the table.</p>
<h2>Reference</h2>
<p>Below are the references I used for this article:</p>
<ol>
<li><a href="http://era.teipir.gr/era3/fpapers/b43.doc">Paper &#8211; Data Access Pattern</a></li>
<li><a href="http://www.developer.com/open/print.php/877181">PostgreSQL Ends the Waiting Game</a></li>
<li><a href="http://simpledbm.googlecode.com/files/mvcc-survey-1.0.pdf">MVCC implementation algorithms in different databases</a></li>
<li><a href="http://postgresql.markmail.org/">MarkMail &#8211; very good postgresql forum</a></li>
<li><a href="http://www.postgresql.org/docs/current/static/preface.html">Postgresql 8.3.6 Manual </a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/data-store/database-concurrency-control-mvcc/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Concurrent Programming &#8211; Part 1 Synchronization</title>
		<link>http://www.solutionhacker.com/implement-your-idea/design/concurrent-programming-part-1-synchronization/</link>
		<comments>http://www.solutionhacker.com/implement-your-idea/design/concurrent-programming-part-1-synchronization/#comments</comments>
		<pubDate>Sat, 29 Nov 2008 22:43:52 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Data Store]]></category>
		<category><![CDATA[Design]]></category>
		<category><![CDATA[concurrency]]></category>
		<category><![CDATA[data race]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Java Memory Model]]></category>
		<category><![CDATA[JSR-166]]></category>
		<category><![CDATA[locking]]></category>
		<category><![CDATA[multithreading]]></category>
		<category><![CDATA[synchronization]]></category>
		<category><![CDATA[Thread-safe]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=157</guid>
		<description><![CDATA[<h2>Get yourself familiar with concurrency programming</h2>
<p>When I interview candidates, I like to ask questions related to <strong>multi-threading</strong>. I have found it a good topic for telling a hardcore programmer apart from an <strong>application-oriented programmer.</strong> I am not saying I am looking for someone who could write a concurrency library as efficient as the one created by Doug Lea; I am looking for candidates who have a solid understanding of this topic. However, I have found that most candidates have little knowledge in this area apart from the meaning of the "<strong>synchronized</strong>" keyword in Java. Therefore I decided to write a series of articles covering the areas of multi-threading that I feel are important to understand. Of course, I will start with the basics.</p>
<p><!--more--></p>
<h2>Introduction of Synchronization</h2>
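<p><strong>Synchronization is a way to lock an object so that no 2 threads can run code guarded by that lock at the same time.</strong></p>
<pre name="code" class="java">
public class SynchronizedCounter {
    // int, so that value() can return it without a narrowing cast
    private int c = 0;

    // Each synchronized method locks this instance while it runs
    public synchronized void increment() {
        c++;
    }

    public synchronized void decrement() {
        c--;
    }

    public synchronized int value() {
        return c;
    }

    // Not synchronized: callable even while another thread holds the lock
    public void method2() {
        // ...
    }
}</pre>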
<p>If count is an instance of SynchronizedCounter, then making these methods synchronized has two effects:</p>
<ul>
    <li><u><strong>Mutual Exclusion</strong></u> - It is not possible for two invocations of synchronized methods on the same object to interleave. When one thread is executing a synchronized method for an object, all other threads that invoke synchronized methods for the same object <strong>block</strong> (suspend execution) until the first thread is done with the object. <span style="color: rgb(255, 0, 0);">Remember this rule doesn't apply to non-synchronized methods.</span> Also, the thread that holds the lock of the object can re-enter its synchronized methods (ie. <strong>reentrancy</strong>).</li>
    <li><u><strong>Memory-Visibility</strong></u> - When a synchronized method exits, it automatically establishes a <strong>happens-before relationship</strong> with <i>any subsequent invocation</i> of a synchronized method for the same object. This guarantees that changes to the state of the object are <strong>visible </strong>to all threads. Most interviewees miss this one. <em>In a database, it is like the concept of "commit". If you don't commit your changes, others cannot see them.</em></li>
</ul>
<p>All in all, synchronized methods enable a simple strategy for preventing <strong>thread interference</strong> and <strong>memory consistency</strong> <strong>errors</strong>.  <strong>Interference </strong>happens when two operations, running in different threads but acting on the same data, <i>interleave</i>. This means that the two operations consist of multiple steps, and the sequences of steps overlap. The result is unpredictable data loss that is hard to track down. Memory consistency errors occur when the compiler and processor <strong>reorder </strong>statements or use <strong>cached </strong>values for better performance, so that different threads see inconsistent views of the same data.</p>
<h2>Problem of Synchronization</h2>
<p>Synchronization serializes the method calls from different threads. At any given time, only one thread can execute a synchronized method of an object, and the other threads must wait until the object lock is released. This can dramatically reduce the liveness of your application. To minimize the impact, you should:</p>
<ol>
    <li><strong>Reduce lock duration - Synchronized statements</strong> let you hold the lock only around the few lines that touch shared state instead of for the whole method, which improves concurrency through fine-grained synchronization.</li>
    <li><strong>Reduce lock scope - </strong>Synchronizing on a separate <strong>mutex variable </strong>may help you avoid locking the whole object. In the Java 5 concurrency library, there is a class called <strong>ReentrantLock </strong>that provides the same mutual exclusion and memory-visibility guarantees as synchronized, with more flexibility and, in some cases, better performance. Here is what is stated in "Java Concurrency in Practice":</li>
</ol>
<blockquote>
<p>Why create a new locking mechanism (ie. ReentrantLock) that is so similar to intrinsic locking (ie. synchronized)? Intrinsic locking works fine in most situations but has some functional limitations. It is not possible to interrupt a thread waiting to acquire a lock, or to attempt to acquire a lock without being willing to wait for it forever. Intrinsic locks also must be released in the same block of code in which they are acquired; this simplifies coding and interacts nicely with exception handling, but makes non-block structured locking disciplines impossible. None of these are reasons to abandon <tt>synchronized</tt>, but in some cases a more flexible locking mechanism offers better liveness or performance.</p>
</blockquote>
<p>So far so good? Great! Let me ask you 4 questions:</p>
<ol>
    <li><u>Question 1:</u> In the example above, if thread A is executing the synchronized method "increment", can another thread execute method2? <span style="color: rgb(0, 0, 255);"><em>Yes, because method2 is not synchronized.</em></span></li>
    <li><u>Question 2:</u> If thread A is in the synchronized method xyz, can it invoke another synchronized method abc on the same object? <span style="color: rgb(0, 0, 255);"><em>Yes. The object lock is reentrant.</em></span></li>
    <li><u>Question 3: </u>If I want to make the above class thread-safe without using the synchronized object lock, are there any alternatives? <span style="color: rgb(0, 0, 255);"><em>Yes.</em></span><br />
    <ul>
        <li>ReentrantLock - <a href="http://www.ibm.com/developerworks/java/library/j-jtp10264/">http://www.ibm.com/developerworks/java/library/j-jtp10264/</a></li>
        <li>Atomic variable - <a href="http://www.ibm.com/developerworks/java/library/j-jtp11234/">http://www.ibm.com/developerworks/java/library/j-jtp11234/</a></li>
    </ul>
    </li>
    <li><u>Question 4: </u>How about declaring the variable volatile?
    <ul>
        <li><a href="http://www.ibm.com/developerworks/java/library/j-jtp06197.html">Managing Volatility by Brian Goetz</a></li>
        <li><a href="http://www.javalobby.org/java/forums/m91839242.html">Compare Atomic variable, ReentrantLock and Volatile variable.</a></li>
        <li>Use int or byte instead of long or double where you can, because reading or writing an int or byte is an <strong>atomic action</strong>, while a read or write of a non-volatile long or double is not guaranteed to be.  Atomic actions cannot be interleaved, so they can be used without fear of thread interference. However, this does not eliminate all need to synchronize atomic actions, because memory consistency errors are still possible (for example, thread A updated a variable atomically but the change has not yet been flushed to main memory, so it is not visible to other threads). Unless the fields in question are declared <code>volatile</code>, the JMM does not require the underlying platform to provide cache coherency or sequential consistency across processors, so it is possible, on some platforms, to read stale data in the absence of synchronization. Look <a href="http://www.javaperformancetuning.com/news/qotm030.shtml">here</a> for a better explanation. However, <strong>volatile can only guarantee atomicity and memory consistency for a single read or write of a single variable</strong>. If you want to guarantee that for compound operations, you need to use a synchronized block (or the new java.util.concurrent classes).<em> It is worth pointing out that increment (i.e. ++) and similar operations are not atomic in Java. So incrementing a volatile  variable <code>volatileVar++</code> is NOT thread-safe. If you need thread-safe semantics i.e. no possibility of multiple  threads corrupting the variable value by having the updates unexpectedly interfere with each other, then you need to use  a synchronized block to increment a variable, e.g. <code>synchronized(LOCK){myVar++}</code>, regardless of the overheads this causes - <a href="http://www.javaperformancetuning.com/news/qotm051.shtml">Java Performance Tuning</a></em></li>
    </ul>
    </li>
</ol>
<h2>More On Thread Safety</h2>
<p>All the techniques I have discussed so far show you how to make your code thread-safe. They are needed only if you share resources across multiple threads and those threads may modify the resources. That is to say, if you don't share any resources, or all threads only read and never write, your code is thread-safe already. Here are some tips related to not sharing and read-only sharing:</p>
<ol>
    <li>Use <strong>local </strong>variables to carry out the logic in your methods if possible (not shared).</li>
    <li>Use <strong>ThreadLocal </strong>to hold a resource if you want to access it across multiple methods on the same thread (not shared).</li>
    <li>Use immutable objects (private fields without setXXX methods) for read-only sharing, for example String and the primitive wrappers like Integer. However, make sure you declare the reference that holds your immutable object <strong>final</strong>.</li>
    <li>Most of the time, you use collection classes like HashMap and ArrayList to hold your objects. Those classes are not thread-safe. To make them thread-safe, you may use the Collections.synchronizedXXX wrappers or simply use their synchronized counterparts, Hashtable and Vector. However, these approaches have 2 problems, listed first below, followed by the alternatives:<br />
    <ul>
        <li>They do not perform well. Why lock every read only to protect an occasional write?</li>
        <li>They are only <a href="http://www.ibm.com/developerworks/java/library/j-jtp07233.html">conditionally thread-safe</a>. All individual operations are thread-safe, but sequences of operations where the control flow depends on the results of previous operations may be subject to data races. For example, calling containsKey(), size() or iterator() before the actual read or write can give you a <strong>NullPointerException</strong> or a <strong>ConcurrentModificationException </strong>if you don't do external synchronization.</li>
        <li>The <strong>unconditionally </strong><strong>thread-safe</strong> alternatives are classes like <strong>ConcurrentHashMap, </strong><span style="font-weight: bold;">ConcurrentLinkedQueue and </span><strong>CopyOnWriteArrayList</strong>, which achieve thread safety with good performance.</li>
        <li><em>When you write an unconditionally thread-safe class, consider using a <strong>private lock object</strong> in place of synchronized methods. This protects you against synchronization interference by clients and subclasses and gives you the flexibility to adopt a more sophisticated approach to concurrency control in a later release - Joshua Bloch in Effective Java, 2nd edition, p.281.</em></li>
    </ul>
    </li>
    <li>Deal with lazy initialization</li>
    <li>Handle denial of service attack that holds the object lock forever</li>
</ol>
<h2>Java Memory Model</h2>
<p>The JMM is what makes concurrent programming way more complicated than it seems it should be. Honestly, I am not the right person to write this part because I do not understand it in full. All I can do is point you to a video from Jeremy Manson at Google. Hear what the expert has to say:</p>
<embed id="VideoPlayback" style="width: 400px; height: 326px;" allowfullscreen="true" src="http://video.google.com/googleplayer.swf?docid=8394326369005388010&#38;hl=en&#38;fs=true" type="application/x-shockwave-flash"></embed>
<p>&#160;If you still have questions, make sure to visit his blog:</p>
<p><a href="http://jeremymanson.blogspot.com/">http://jeremymanson.blogspot.com/</a></p>
<h2>Reference</h2>
<p>Below are some of the articles I use:</p>
<ol>
    <li><a href="http://www.javaperformancetuning.com/news/qotm030.shtml">What does volatile do?</a></li>
    <li><a href="http://java.sun.com/docs/books/tutorial/essential/concurrency/index.html">Sun lesson on concurrency</a></li>
    <li><a href="http://www.ibm.com/developerworks/library/j-jtp02244.html">Fixing Java Memory Model - Brian Goetz - Part 1,</a> <a href="http://www.ibm.com/developerworks/library/j-jtp03304/">Part 2</a></li>
    <li><a href="http://rox-xmlrpc.sourceforge.net/niotut/index.html">Rox Java NIO Tutorial</a></li>
    <li><a href="http://www.roseindia.net/javatutorials/blocking_queue.shtml">Blocking Queue</a></li>
    <li><a href="http://gee.cs.oswego.edu/dl/cpj/jmm.html">Synchronization and Java Memory Model - Doug Lea</a>&#160;</li>
</ol>
<p>&#160;</p>]]></description>
			<content:encoded><![CDATA[<h2>Get yourself familiar with concurrency programming</h2>
<p>When I interview candidates, I like to ask questions related to <strong>multi-threading</strong>. I have found it a good topic for telling a hardcore programmer apart from an <strong>application-oriented programmer.</strong> I am not saying I am looking for someone who could write a concurrency library as efficient as the one created by Doug Lea; I am looking for candidates who have a solid understanding of this topic. However, I have found that most candidates have little knowledge in this area apart from the meaning of the &#8220;<strong>synchronized</strong>&#8221; keyword in Java. Therefore I decided to write a series of articles covering the areas of multi-threading that I feel are important to understand. Of course, I will start with the basics.</p>
<p><span id="more-157"></span></p>
<h2>Introduction of Synchronization</h2>
<p><strong>Synchronization is a way to lock an object so that no 2 threads can run code guarded by that lock at the same time.</strong></p>
<pre name="code" class="java">
public class SynchronizedCounter {
    // int, so that value() can return it without a narrowing cast
    private int c = 0;

    // Each synchronized method locks this instance while it runs
    public synchronized void increment() {
        c++;
    }

    public synchronized void decrement() {
        c--;
    }

    public synchronized int value() {
        return c;
    }

    // Not synchronized: callable even while another thread holds the lock
    public void method2() {
        // ...
    }
}</pre>
<p>If count is an instance of SynchronizedCounter, then making these methods synchronized has two effects:</p>
<ul>
<li><u><strong>Mutual Exclusion</strong></u> &#8211; It is not possible for two invocations of synchronized methods on the same object to interleave. When one thread is executing a synchronized method for an object, all other threads that invoke synchronized methods for the same object <strong>block</strong> (suspend execution) until the first thread is done with the object. <span style="color: rgb(255, 0, 0);">Remember this rule doesn&#8217;t apply to non-synchronized methods.</span> Also, the thread that holds the lock of the object can re-enter its synchronized methods (ie. <strong>reentrancy</strong>).</li>
<li><u><strong>Memory-Visibility</strong></u> &#8211; When a synchronized method exits, it automatically establishes a <strong>happens-before relationship</strong> with <i>any subsequent invocation</i> of a synchronized method for the same object. This guarantees that changes to the state of the object are <strong>visible </strong>to all threads. Most interviewees miss this one. <em>In a database, it is like the concept of &#8220;commit&#8221;. If you don&#8217;t commit your changes, others cannot see them.</em></li>
</ul>
<p>All in all, synchronized methods enable a simple strategy for preventing <strong>thread interference</strong> and <strong>memory consistency</strong> <strong>errors</strong>.  <strong>Interference </strong>happens when two operations, running in different threads but acting on the same data, <i>interleave</i>. This means that the two operations consist of multiple steps, and the sequences of steps overlap. The result is unpredictable data loss that is hard to track down. Memory consistency errors occur when the compiler and processor <strong>reorder </strong>statements or use <strong>cached </strong>values for better performance, so that different threads see inconsistent views of the same data.</p>
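<p>To see mutual exclusion at work, here is a minimal sketch that hammers a synchronized counter from several threads. <code>CounterDemo</code> and <code>runThreads</code> are hypothetical names; the counter itself follows the SynchronizedCounter example above:</p>

```java
// Hypothetical demo: several threads increment one synchronized counter.
class CounterDemo {
    private int c = 0;

    synchronized void increment() {
        c++; // read-modify-write, protected by the lock on this instance
    }

    synchronized int value() {
        return c;
    }

    // Starts the given number of threads, each incrementing perThread times,
    // waits for all of them and returns the final count.
    static int runThreads(int threads, int perThread) {
        CounterDemo counter = new CounterDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.value();
    }
}
```

<p>With the synchronized keyword in place, <code>CounterDemo.runThreads(4, 10000)</code> always returns 40000. If you remove it from <code>increment()</code>, the steps of <code>c++</code> can interleave and some increments may be lost.</p>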
<h2>Problem of Synchronization</h2>
<p>Synchronization serializes the method calls from different threads. At any given time, only one thread can execute a synchronized method of an object, and the other threads must wait until the object lock is released. This can dramatically reduce the liveness of your application. To minimize the impact, you should:</p>
<ol>
<li><strong>Reduce lock duration &#8211; Synchronized statements</strong> let you hold the lock only around the few lines that touch shared state instead of for the whole method, which improves concurrency through fine-grained synchronization.</li>
<li><strong>Reduce lock scope &#8211; </strong>Synchronizing on a separate <strong>mutex variable </strong>may help you avoid locking the whole object. In the Java 5 concurrency library, there is a class called <strong>ReentrantLock </strong>that provides the same mutual exclusion and memory-visibility guarantees as synchronized, with more flexibility and, in some cases, better performance. Here is what is stated in &#8220;Java Concurrency in Practice&#8221;:</li>
</ol>
<blockquote>
<p>Why create a new locking mechanism (ie. ReentrantLock) that is so similar to intrinsic locking (ie. synchronized)? Intrinsic locking works fine in most situations but has some functional limitations. It is not possible to interrupt a thread waiting to acquire a lock, or to attempt to acquire a lock without being willing to wait for it forever. Intrinsic locks also must be released in the same block of code in which they are acquired; this simplifies coding and interacts nicely with exception handling, but makes non-block structured locking disciplines impossible. None of these are reasons to abandon <tt>synchronized</tt>, but in some cases a more flexible locking mechanism offers better liveness or performance.</p>
</blockquote>
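<p>As a small sketch of what the quote describes, here is the counter rewritten with <code>ReentrantLock</code>. The class name <code>LockCounter</code> and the method <code>tryIncrement</code> are hypothetical; <code>tryLock()</code> is the call that lets a thread give up instead of waiting forever:</p>

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical counter guarded by an explicit lock instead of synchronized.
class LockCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int c = 0;

    void increment() {
        lock.lock(); // waits until the lock is free, like synchronized
        try {
            c++;
        } finally {
            lock.unlock(); // always release in finally
        }
    }

    // Increments only if the lock is free right now; never blocks.
    boolean tryIncrement() {
        if (!lock.tryLock()) {
            return false; // someone else holds the lock, caller can do other work
        }
        try {
            c++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    int value() {
        lock.lock();
        try {
            return c;
        } finally {
            lock.unlock();
        }
    }
}
```

<p>The price of the extra flexibility is that you must remember the try/finally yourself; with synchronized the lock is released for you.</p>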
<p>So far so good? Great! Let me ask you 4 questions:</p>
<ol>
<li><u>Question 1:</u> In the example above, if thread A is executing the synchronized method &#8220;increment&#8221;, can another thread execute method2? <span style="color: rgb(0, 0, 255);"><em>Yes, because method2 is not synchronized.</em></span></li>
<li><u>Question 2:</u> If thread A is in the synchronized method xyz, can it invoke another synchronized method abc on the same object? <span style="color: rgb(0, 0, 255);"><em>Yes. The object lock is reentrant.</em></span></li>
<li><u>Question 3: </u>If I want to make the above class thread-safe without using the synchronized object lock, are there any alternatives? <span style="color: rgb(0, 0, 255);"><em>Yes.</em></span>
<ul>
<li>ReentrantLock &#8211; <a href="http://www.ibm.com/developerworks/java/library/j-jtp10264/">http://www.ibm.com/developerworks/java/library/j-jtp10264/</a></li>
<li>Atomic variable &#8211; <a href="http://www.ibm.com/developerworks/java/library/j-jtp11234/">http://www.ibm.com/developerworks/java/library/j-jtp11234/</a></li>
</ul>
</li>
<li><u>Question 4: </u>How about declaring the variable volatile?
<ul>
<li><a href="http://www.ibm.com/developerworks/java/library/j-jtp06197.html">Managing Volatility by Brian Goetz</a></li>
<li><a href="http://www.javalobby.org/java/forums/m91839242.html">Compare Atomic variable, ReentrantLock and Volatile variable.</a></li>
<li>Use int or byte instead of long or double where you can, because reading or writing an int or byte is an <strong>atomic action</strong>, while a read or write of a non-volatile long or double is not guaranteed to be.  Atomic actions cannot be interleaved, so they can be used without fear of thread interference. However, this does not eliminate all need to synchronize atomic actions, because memory consistency errors are still possible (for example, thread A updated a variable atomically but the change has not yet been flushed to main memory, so it is not visible to other threads). Unless the fields in question are declared <code>volatile</code>, the JMM does not require the underlying platform to provide cache coherency or sequential consistency across processors, so it is possible, on some platforms, to read stale data in the absence of synchronization. Look <a href="http://www.javaperformancetuning.com/news/qotm030.shtml">here</a> for a better explanation. However, <strong>volatile can only guarantee atomicity and memory consistency for a single read or write of a single variable</strong>. If you want to guarantee that for compound operations, you need to use a synchronized block (or the new java.util.concurrent classes).<em> It is worth pointing out that increment (i.e. ++) and similar operations are not atomic in Java. So incrementing a volatile  variable <code>volatileVar++</code> is NOT thread-safe. If you need thread-safe semantics i.e. no possibility of multiple  threads corrupting the variable value by having the updates unexpectedly interfere with each other, then you need to use  a synchronized block to increment a variable, e.g. <code>synchronized(LOCK){myVar++}</code>, regardless of the overheads this causes - <a href="http://www.javaperformancetuning.com/news/qotm051.shtml">Java Performance Tuning</a></em></li>
</ul>
</li>
</ol>
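<p>For Question 3, the atomic variable route might look like the sketch below. <code>AtomicCounter</code> is a hypothetical name; <code>AtomicInteger</code> and its methods come from java.util.concurrent.atomic:</p>

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lock-free version of the counter above.
class AtomicCounter {
    private final AtomicInteger c = new AtomicInteger(0);

    void increment() {
        c.incrementAndGet(); // atomic read-modify-write, no lock needed
    }

    void decrement() {
        c.decrementAndGet();
    }

    int value() {
        return c.get(); // has the memory-visibility semantics of a volatile read
    }
}
```

<p>Unlike <code>volatileVar++</code>, <code>incrementAndGet()</code> performs the whole read-modify-write as one atomic step, so no update is lost when several threads call it at once.</p>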
<h2>More On Thread Safety</h2>
<p>All the techniques I have discussed so far show you how to make your code thread-safe. They are needed only if you share resources across multiple threads and those threads may modify the resources. That is to say, if you don't share any resources, or all threads only read and never write, your code is thread-safe already. Here are some tips related to not sharing and read-only sharing:</p>
<ol>
<li>Use <strong>local </strong>variables to carry out the logic in your methods if possible (not shared).</li>
<li>Use <strong>ThreadLocal </strong>to hold a resource if you want to access it across multiple methods on the same thread (not shared).</li>
<li>Use immutable objects (private fields without setXXX methods) for read-only sharing, for example String and the primitive wrappers like Integer. However, make sure you declare the reference that holds your immutable object <strong>final</strong>.</li>
<li>Most of the time, you use collection classes like HashMap and ArrayList to hold your objects. Those classes are not thread-safe. To make them thread-safe, you may use the Collections.synchronizedXXX wrappers or simply use their synchronized counterparts, Hashtable and Vector. However, these approaches have 2 problems, listed first below, followed by the alternatives:
<ul>
<li>They do not perform well. Why lock every read only to protect an occasional write?</li>
<li>They are only <a href="http://www.ibm.com/developerworks/java/library/j-jtp07233.html">conditionally thread-safe</a>. All individual operations are thread-safe, but sequences of operations where the control flow depends on the results of previous operations may be subject to data races. For example, calling containsKey(), size() or iterator() before the actual read or write can give you a <strong>NullPointerException</strong> or a <strong>ConcurrentModificationException </strong>if you don't do external synchronization.</li>
<li>The <strong>unconditionally </strong><strong>thread-safe</strong> alternatives are classes like <strong>ConcurrentHashMap, </strong><span style="font-weight: bold;">ConcurrentLinkedQueue and </span><strong>CopyOnWriteArrayList</strong>, which achieve thread safety with good performance.</li>
<li><em>When you write an unconditionally thread-safe class, consider using a <strong>private lock object</strong> in place of synchronized methods. This protects you against synchronization interference by clients and subclasses and gives you the flexibility to adopt a more sophisticated approach to concurrency control in a later release - Joshua Bloch in Effective Java, 2nd edition, p.281.</em></li>
</ul>
</li>
<li>Deal with lazy initialization</li>
<li>Handle denial of service attack that holds the object lock forever</li>
</ol>
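<p>The collection tips above can be sketched roughly as follows. Everything named here (SharingTipsDemo, LockedCounter, hitWithExternalLock, hitAtomically and the "page" key) is hypothetical and only serves the illustration. The sketch shows a private lock object, a check-then-act sequence on a synchronized wrapper that is made safe by locking on the wrapper itself, and the same update done with a single atomic call on ConcurrentHashMap.</p>

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SharingTipsDemo {
    /** Guards its state with a private lock object that clients cannot reach. */
    static final class LockedCounter {
        private final Object lock = new Object();
        private int count;

        void increment() { synchronized (lock) { count++; } }
        int get() { synchronized (lock) { return count; } }
    }

    /** get-then-put is two calls, so the wrapper needs external synchronization. */
    static void hitWithExternalLock(Map<String, Integer> syncMap, String key) {
        synchronized (syncMap) {   // the wrapper returned by synchronizedMap locks on itself
            Integer old = syncMap.get(key);
            syncMap.put(key, old == null ? 1 : old + 1);
        }
    }

    /** ConcurrentHashMap performs the whole read-modify-write as one atomic call. */
    static void hitAtomically(ConcurrentHashMap<String, Integer> map, String key) {
        map.merge(key, 1, Integer::sum);
    }

    /** Returns {counter total, synchronized-map total, concurrent-map total}. */
    static int[] run(int threads, int hitsPerThread) {
        LockedCounter counter = new LockedCounter();
        Map<String, Integer> syncMap = Collections.synchronizedMap(new HashMap<>());
        ConcurrentHashMap<String, Integer> concurrentMap = new ConcurrentHashMap<>();

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < hitsPerThread; i++) {
                    counter.increment();
                    hitWithExternalLock(syncMap, "page");
                    hitAtomically(concurrentMap, "page");
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return new int[] {
            counter.get(),
            syncMap.getOrDefault("page", 0),
            concurrentMap.getOrDefault("page", 0)
        };
    }

    public static void main(String[] args) {
        int[] totals = run(4, 10_000);
        System.out.println("private lock counter: " + totals[0]);
        System.out.println("synchronizedMap:      " + totals[1]);
        System.out.println("ConcurrentHashMap:    " + totals[2]);
    }
}
```

<p>If you drop the synchronized (syncMap) block in hitWithExternalLock, each individual call is still thread-safe, but the pair is not, and hits can be lost. That is what conditionally thread-safe means in practice.</p>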
<h2>Java Memory Model</h2>
<p>The JMM is what makes concurrent programming far more complicated than it should be. Honestly, I am not the right person to write this part because I do not understand it in full. All I can do is point you to a video by Jeremy Manson of Google. Hear what the expert says:</p>
<p><embed id="VideoPlayback" style="width: 400px; height: 326px;" allowfullscreen="true" src="http://video.google.com/googleplayer.swf?docid=8394326369005388010&amp;hl=en&amp;fs=true" type="application/x-shockwave-flash"></embed></p>
<p>If you still have questions, make sure to visit his blog:</p>
<p><a href="http://jeremymanson.blogspot.com/">http://jeremymanson.blogspot.com/</a></p>
<h2>Reference</h2>
<p>Below are some of the articles I used:</p>
<ol>
<li><a href="http://www.javaperformancetuning.com/news/qotm030.shtml">What does volatile do?</a></li>
<li><a href="http://java.sun.com/docs/books/tutorial/essential/concurrency/index.html">Sun lesson on concurrency</a></li>
<li><a href="http://www.ibm.com/developerworks/library/j-jtp02244.html">Fixing Java Memory Model - Brian Goetz - Part 1</a>, <a href="http://www.ibm.com/developerworks/library/j-jtp03304/">Part 2</a></li>
<li><a href="http://rox-xmlrpc.sourceforge.net/niotut/index.html">Rox Java NIO Tutorial</a></li>
<li><a href="http://www.roseindia.net/javatutorials/blocking_queue.shtml">Blocking Queue</a></li>
<li><a href="http://gee.cs.oswego.edu/dl/cpj/jmm.html">Synchronization and Java Memory Model - Doug Lea</a></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/implement-your-idea/design/concurrent-programming-part-1-synchronization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Common DBA jobs</title>
		<link>http://www.solutionhacker.com/data-intelligence/data-store/common-dba-jobs/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/data-store/common-dba-jobs/#comments</comments>
		<pubDate>Mon, 17 Nov 2008 00:46:06 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Data Store]]></category>
		<category><![CDATA[etl]]></category>
		<category><![CDATA[export]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=192</guid>
		<description><![CDATA[<p><u><strong>Export schema/ data out from mysql</strong></u></p>
<p>To export the schema and/or data, you can use the mysqldump command:</p>
<blockquote>
<p>mysqldump -u [username] -p[password] -d&#160;[schema_name] &#62; [filename].sql</p>
</blockquote>
<ol>
    <li>-d (--no-data) dumps only the schema, without the row data.</li>
    <li>-B (--databases) is needed to dump multiple schemas.</li>
    <li>-h (--host) specifies the hostname of the server.</li>
</ol>
<p><u><strong>Export data out from postgresql</strong></u></p>
<ol>
    <li><a href="http://www.mkyong.com/database/how-to-export-table-data-to-file-csv-postgresql/">Export table data from postgresql to csv format</a></li>
    <li><a href="http://www.mkyong.com/database/backup-restore-database-in-postgresql-pg_dumppg_restore/">Backup and restore database in postgresql</a></li>
</ol>
<p>However, if you want to export a SQL result set to CSV in postgresql, consider using the COPY command. The first statement below exports a query result; the second imports a CSV file into the stock table.</p>
<blockquote>
<p>COPY (&#160;select statement&#160;) TO STDOUT WITH CSV<br />
<span id="intelliTxt" name="intelliTxt">COPY stock FROM 'mydir/Stock.csv';</span></p>
</blockquote>
<p><u><strong>Run sql script using mysql command</strong></u></p>
<p>To run the scripts as input, we can use the following command:</p>
<blockquote>
<p>mysql [schema_name] -u [username] -p[password] &#60; [filename].sql</p>
</blockquote>
<p><u><strong>SQL Tips</strong></u></p>
<p>There are times when we want to put logic in SQL without writing a stored procedure. Here are some useful functions that may get you there:</p>
<ul>
    <li>Conditional statement - CASE WHEN xxx THEN abc WHEN yyy THEN bbc ...ELSE ccc END</li>
</ul>
<blockquote>
<p>UPDATE Account SET Sales_Location__c =<br />
&#160;&#160;&#160; CASE WHEN Sales_Country__c != '' THEN Sales_Country__c WHEN Country__c != '' THEN Country__c<br />
ELSE '--' END</p>
</blockquote>
<ul>
    <li>COALESCE (input1, input2,....) - This function takes as many parameters as you want and returns the first non-NULL one. Suppose we have a table A with 3 columns: FullName,&#160;CompleteName and DisplayName. Any of these columns can contain null values. Now we want to select the DisplayName from this table, but if it is null, return FullName, and if that is also null, return CompleteName. We can easily do this in one select statement: (<a href="http://databases.aspfaq.com/database/coalesce-vs-isnull-sql.html">COALESCE vs ISNULL</a>)</li>
</ul>
<blockquote>
<p>SELECT COALESCE(DisplayName, FullName, CompleteName) From A</p>
</blockquote>
<p><u><strong>&#160;ETL</strong></u></p>
<p>In mysql, you can export a table from db1 on a remote host and import it into db2. For example, on the db2 host, you can issue the following command:</p>
<blockquote>
<p>/usr/bin/mysqldump --force --compress --opt -u [username] -p[password] -h [hostname] db1 [table] &#124; /usr/bin/mysql -u [username] -p[password] -D db2</p>
</blockquote>]]></description>
			<content:encoded><![CDATA[<p><u><strong>Export schema/ data out from mysql</strong></u></p>
<p>To export the schema and/or data, you can use the mysqldump command:</p>
<blockquote>
<p>mysqldump -u [username] -p[password] -d&#160;[schema_name] &gt; [filename].sql</p>
</blockquote>
<ol>
<li>-d (--no-data) dumps only the schema, without the row data.</li>
<li>-B (--databases) is needed to dump multiple schemas.</li>
<li>-h (--host) specifies the hostname of the server.</li>
</ol>
<p><u><strong>Export data out from postgresql</strong></u></p>
<ol>
<li><a href="http://www.mkyong.com/database/how-to-export-table-data-to-file-csv-postgresql/">Export table data from postgresql to csv format</a></li>
<li><a href="http://www.mkyong.com/database/backup-restore-database-in-postgresql-pg_dumppg_restore/">Backup and restore database in postgresql</a></li>
</ol>
<p>However, if you want to export a SQL result set to CSV in postgresql, consider using the COPY command. The first statement below exports a query result; the second imports a CSV file into the stock table.</p>
<blockquote>
<p>COPY (&#160;select statement&#160;) TO STDOUT WITH CSV<br />
<span id="intelliTxt" name="intelliTxt">COPY stock FROM 'mydir/Stock.csv';</span></p>
</blockquote>
<p><u><strong>Run sql script using mysql command</strong></u></p>
<p>To run the scripts as input, we can use the following command:</p>
<blockquote>
<p>mysql [schema_name] -u [username] -p[password] &lt; [filename].sql</p>
</blockquote>
<p><u><strong>SQL Tips</strong></u></p>
<p>There are times when we want to put logic in SQL without writing a stored procedure. Here are some useful functions that may get you there:</p>
<ul>
<li>Conditional statement &#8211; CASE WHEN xxx THEN abc WHEN yyy THEN bbc &#8230;ELSE ccc END</li>
</ul>
<blockquote>
<p>UPDATE Account SET Sales_Location__c =<br />
&#160;&#160;&#160; CASE WHEN Sales_Country__c != '' THEN Sales_Country__c WHEN Country__c != '' THEN Country__c<br />
ELSE '--' END</p>
</blockquote>
<ul>
<li>COALESCE (input1, input2,&#8230;.) &#8211; This function takes as many parameters as you want and returns the first non-NULL one. Suppose we have a table A with 3 columns: FullName,&#160;CompleteName and DisplayName. Any of these columns can contain null values. Now we want to select the DisplayName from this table, but if it is null, return FullName, and if that is also null, return CompleteName. We can easily do this in one select statement: (<a href="http://databases.aspfaq.com/database/coalesce-vs-isnull-sql.html">COALESCE vs ISNULL</a>)</li>
</ul>
<blockquote>
<p>SELECT COALESCE(DisplayName, FullName, CompleteName) From A</p>
</blockquote>
<p><u><strong>&#160;ETL</strong></u></p>
<p>In mysql, you can export a table from db1 on a remote host and import it into db2. For example, on the db2 host, you can issue the following command:</p>
<blockquote>
<p>/usr/bin/mysqldump --force --compress --opt -u [username] -p[password] -h [hostname] db1 [table] &#124; /usr/bin/mysql -u [username] -p[password] -D db2</p>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/data-store/common-dba-jobs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

