<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"
xmlns:rawvoice="http://www.rawvoice.com/rawvoiceRssModule/"
>

<channel>
	<title>Solution Hacker &#187; Extract Intelligence</title>
	<atom:link href="http://www.solutionhacker.com/category/data-intelligence/collective-intelligence/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.solutionhacker.com</link>
	<description>This blog provides solutions for entrepreneurs!</description>
	<lastBuildDate>Mon, 06 Feb 2012 07:19:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=279</generator>
<!-- podcast_generator="Blubrry PowerPress/2.0.4" -->
	<itunes:summary>This blog provides solutions for entrepreneurs!</itunes:summary>
	<itunes:author>Solution Hacker</itunes:author>
	<itunes:explicit>no</itunes:explicit>
	<itunes:image href="http://www.solutionhacker.com/wp-content/plugins/powerpress/itunes_default.jpg" />
	<itunes:subtitle>This blog provides solutions for entrepreneurs!</itunes:subtitle>
	<image>
		<title>Solution Hacker &#187; Extract Intelligence</title>
		<url>http://www.solutionhacker.com/wp-content/plugins/powerpress/rss_default.jpg</url>
		<link>http://www.solutionhacker.com/category/data-intelligence/collective-intelligence/</link>
	</image>
		<item>
		<title>Learning Hive</title>
		<link>http://www.solutionhacker.com/data-intelligence/collective-intelligence/learning-hive/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/collective-intelligence/learning-hive/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 12:08:05 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[Extract Intelligence]]></category>
		<category><![CDATA[System]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hive]]></category>
		<category><![CDATA[mapreduce]]></category>
		<category><![CDATA[query performance]]></category>
		<category><![CDATA[sql interface]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=292</guid>
		<description><![CDATA[<h3><strong>Starting to learn Hive</strong></h3>
<p>As I mentioned in my last article, I have been getting excited about the potential of Hive, so today I decided to start learning it. I found a great introductory video that gives you a nice warm-up on using Hive (a basic knowledge of how Hadoop and MapReduce work will help you digest the material).</p>
<p>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="400" height="300" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://vimeo.com/moogaloop.swf?clip_id=3591321&#38;server=vimeo.com&#38;show_title=1&#38;show_byline=1&#38;show_portrait=0&#38;color=&#38;fullscreen=1" /><embed type="application/x-shockwave-flash" width="400" height="300" src="http://vimeo.com/moogaloop.swf?clip_id=3591321&#38;server=vimeo.com&#38;show_title=1&#38;show_byline=1&#38;show_portrait=0&#38;color=&#38;fullscreen=1" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</p>
<h3><strong>Below are some highlights from this video</strong></h3>
<p>Hive is a SQL interface built on top of Hadoop. It supports Web access and JDBC. I am amazed at how close its syntax is to the regular SQL of an RDBMS. Below are some of the statements used in the tutorial.</p>
<blockquote><p><strong>//---------- Set up your tables in HIVE -----------------</strong><br />
 SHOW TABLES;</p>
<p>CREATE TABLE shakespeare (freq INT, word STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;</p>
<p>DESCRIBE shakespeare;</p>
<p><strong>//---------- Load data into Hive table from Hadoop HDFS -------------------</strong><br />
 LOAD DATA INPATH "shakespeare_freq" INTO TABLE shakespeare;</p>
<p><strong>//---------- Query against the data using hive sql interface --------------</strong><br />
 select * from shakespeare limit 10;<br />
 select * from shakespeare where freq > 100 sort by freq asc limit 10;<br />
 select freq, count(1) as f2 from shakespeare group by freq sort by f2 desc limit 10;</p>
<p>//show me the plan<br />
 explain select freq, count(1) as f2 from shakespeare group by freq sort by f2 desc limit 10;</p>
<p><strong>//---------- Populate the merged table by joining 2 different tables (kjv and merged are assumed to exist already)</strong><br />
 insert overwrite table merged select s.word, s.freq, k.freq from shakespeare s join kjv k on (s.word = k.word);</p>
<p><strong>//---------- Query the merge table ---------------------</strong><br />
 select word, shake_f, kjv_f, (shake_f+kjv_f) as ss from merged sort by ss limit 20;</p>
</blockquote>
<p>To prepare the data for Hive to load, the demo runs a separate MapReduce job. Remember to delete the job log directory before loading the output into the Hive table.</p>
<blockquote><p>hadoop jar $HADOOP_HOME/hadoop-*-examples.jar grep input shakespeare_freq '\w+'</p>
<p><strong>//remove the mapreduce job log</strong><br />
 hadoop fs -rmr shakespeare_freq/_logs</p>
</blockquote>
<p>Large-scale data processing systems are often IO bound: in a MapReduce job, the mapper spends much of its time waiting for data to load from disk. Hadoop mitigates the problem by reading in parallel from many hard drives. However, a single hard drive still maxes out at about 75MB/s of read throughput, a physical limit we can do nothing about. To achieve good speed, the key is to minimize the number of Hadoop passes over the data.</p>
<p>Since Hive sits on top of Hadoop's HDFS, it inherits the same restrictions. You cannot UPDATE, DELETE or INSERT individual records as you would in a regular RDBMS. However, you can bulk load new files (data) into a table, and you can delete a file from Hive.</p>
<p>Hive needs to store the table metadata outside of HDFS, and a regular RDBMS can do the job. When you start Hive locally it looks for a local metastore, so in a distributed environment you may need to centralize the metastore in a remote location. There is a wiki page on the Hive site that documents how to set this up.</p>
<p><h3>See Hive in Action</h3>
<p><object width="400" height="300"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=3598672&#38;server=vimeo.com&#38;show_title=1&#38;show_byline=1&#38;show_portrait=0&#38;color=&#38;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=3598672&#38;server=vimeo.com&#38;show_title=1&#38;show_byline=1&#38;show_portrait=0&#38;color=&#38;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="300"></embed></object>
<p><a href="http://vimeo.com/3598672">Cloudera Hadoop Training: Hive Tutorial Screencast</a> from <a href="http://vimeo.com/cloudera">Cloudera</a> on <a href="http://vimeo.com">Vimeo</a>.</p></p>
<h3>Other projects similar to Hadoop</h3>
<ul>
<li>Parallel databases: Gamma, Bubba, Volcano</li>
<li>Google: Sawzall</li>
<li>Yahoo: Pig</li>
<li>IBM Research: JAQL</li>
<li>Microsoft: DryadLINQ, SCOPE</li>
<li>Greenplum: YAML MapReduce</li>
<li>Aster Data: In-database MapReduce</li>
<li>Business.com: CloudBase</li>
</ul>
]]></description>
			<content:encoded><![CDATA[<h3><strong>Starting to learn Hive</strong></h3>
<p>As I mentioned in my last article, I have been getting excited about the potential of Hive, so today I decided to start learning it. I found a great introductory video that gives you a nice warm-up on using Hive (a basic knowledge of how Hadoop and MapReduce work will help you digest the material).</p>
<p>
<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="400" height="300" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="src" value="http://vimeo.com/moogaloop.swf?clip_id=3591321&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" /><embed type="application/x-shockwave-flash" width="400" height="300" src="http://vimeo.com/moogaloop.swf?clip_id=3591321&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" allowscriptaccess="always" allowfullscreen="true"></embed></object>
</p>
<h3><strong>Below are some highlights from this video</strong></h3>
<p>Hive is a SQL interface built on top of Hadoop. It supports Web access and JDBC. I am amazed at how close its syntax is to the regular SQL of an RDBMS. Below are some of the statements used in the tutorial.</p>
<blockquote><p><strong>//&#8212;&#8212;&#8212;- Set up your tables in HIVE &#8212;&#8212;&#8212;&#8212;&#8212;&#8211;</strong><br />
 SHOW TABLES;</p>
<p>CREATE TABLE shakespeare (freq INT, word STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY &#8216;\t&#8217; STORED AS TEXTFILE;</p>
<p>DESCRIBE shakespeare;</p>
<p><strong>//&#8212;&#8212;&#8212;- Load data into Hive table from Hadoop HDFS &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-</strong><br />
 LOAD DATA INPATH &#8220;shakespeare_freq&#8221; INTO TABLE shakespeare;</p>
<p><strong>//&#8212;&#8212;&#8212;- Query against the data using hive sql interface &#8212;&#8212;&#8212;&#8212;&#8211;</strong><br />
 select * from shakespeare limit 10;<br />
 select * from shakespeare where freq > 100 sort by freq asc limit 10;<br />
 select freq, count(1) as f2 from shakespeare group by freq sort by f2 desc limit 10;</p>
<p>//show me the plan<br />
 explain select freq, count(1) as f2 from shakespeare group by freq sort by f2 desc limit 10;</p>
<p><strong>//&#8212;&#8212;&#8212;- Populate the merged table by joining 2 different tables (kjv and merged are assumed to exist already)</strong><br />
 insert overwrite table merged select s.word, s.freq, k.freq from shakespeare s join kjv k on (s.word = k.word);</p>
<p><strong>//&#8212;&#8212;&#8212;- Query the merge table &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;</strong><br />
 select word, shake_f, kjv_f, (shake_f+kjv_f) as ss from merged sort by ss limit 20;</p>
</blockquote>
<p>To prepare the data for Hive to load, the demo runs a separate MapReduce job. Remember to delete the job log directory before loading the output into the Hive table.</p>
<blockquote><p>hadoop jar $HADOOP_HOME/hadoop-*-examples.jar grep input shakespeare_freq &#8216;\w+&#8217;</p>
<p><strong>//remove the mapreduce job log</strong><br />
 hadoop fs -rmr shakespeare_freq/_logs</p>
</blockquote>
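<p>To make the shape of that input concrete, here is a minimal Python sketch of what the job is assumed to compute: it counts every match of the regular expression and emits tab-separated <em>frequency, word</em> lines, which is the layout the shakespeare table expects. The function name and the sample text are made up for illustration; the real job runs as distributed MapReduce tasks over files in HDFS.</p>
<pre>
```python
import re
from collections import Counter


def word_frequencies(text: str) -> list[str]:
    """Count regex matches and format them as 'freq TAB word' lines."""
    counts = Counter(re.findall(r"\w+", text))
    # Most frequent first; ties broken alphabetically so the output is stable.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"{freq}\t{word}" for word, freq in ordered]


if __name__ == "__main__":
    sample = "to be or not to be"  # hypothetical stand-in for the input files
    for line in word_frequencies(sample):
        print(line)
```
</pre>
<p>Each output line has two tab-separated fields, so it lines up with the (freq INT, word STRING) columns declared with FIELDS TERMINATED BY '\t'.</p>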
<p>Large-scale data processing systems are often IO bound: in a MapReduce job, the mapper spends much of its time waiting for data to load from disk. Hadoop mitigates the problem by reading in parallel from many hard drives. However, a single hard drive still maxes out at about 75MB/s of read throughput, a physical limit we can do nothing about. To achieve good speed, the key is to minimize the number of Hadoop passes over the data.</p>
<p>Since Hive sits on top of Hadoop&#8217;s HDFS, it inherits the same restrictions. You cannot UPDATE, DELETE or INSERT individual records as you would in a regular RDBMS. However, you can bulk load new files (data) into a table, and you can delete a file from Hive.</p>
<p>Hive needs to store the table metadata outside of HDFS, and a regular RDBMS can do the job. When you start Hive locally it looks for a local metastore, so in a distributed environment you may need to centralize the metastore in a remote location. There is a wiki page on the Hive site that documents how to set this up.</p>
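<p>A rough back-of-the-envelope calculation shows why the number of passes matters. The sketch below assumes the 75MB/s figure from the video, a perfectly even spread of data across drives and no other bottleneck; the function name and the 1 TB data size are only illustrative.</p>
<pre>
```python
def scan_seconds(data_mb: float, disks: int, mb_per_sec: float = 75.0, passes: int = 1) -> float:
    """Seconds needed to read data_mb megabytes `passes` times, spread evenly over `disks` drives."""
    return passes * data_mb / (disks * mb_per_sec)


if __name__ == "__main__":
    one_tb = 1_000_000  # 1 TB expressed in MB (decimal units)
    print(scan_seconds(one_tb, disks=1))              # a single drive
    print(scan_seconds(one_tb, disks=100))            # 100 drives read in parallel
    print(scan_seconds(one_tb, disks=100, passes=3))  # three passes over the same data
```
</pre>
<p>Under these assumptions one drive needs about 3.7 hours to read 1 TB, 100 drives need about 133 seconds, and every extra pass adds another full scan on top of that.</p>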
<p>
<h3>See Hive in Action</h3>
<p><object width="400" height="300"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=3598672&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=3598672&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="400" height="300"></embed></object></p>
<p><a href="http://vimeo.com/3598672">Cloudera Hadoop Training: Hive Tutorial Screencast</a> from <a href="http://vimeo.com/cloudera">Cloudera</a> on <a href="http://vimeo.com">Vimeo</a>.</p>
</p>
<h3>Other projects similar to Hadoop</h3>
<ul>
<li>Parallel databases: Gamma, Bubba, Volcano</li>
<li>Google: Sawzall</li>
<li>Yahoo: Pig</li>
<li>IBM Research: JAQL</li>
<li>Microsoft: DryadLINQ, SCOPE</li>
<li>Greenplum: YAML MapReduce</li>
<li>Aster Data: In-database MapReduce</li>
<li>Business.com: CloudBase</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/collective-intelligence/learning-hive/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to build data warehouse</title>
		<link>http://www.solutionhacker.com/data-intelligence/collective-intelligence/how-to-build-data-warehouse/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/collective-intelligence/how-to-build-data-warehouse/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 03:13:30 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Extract Intelligence]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[denormailzed schema]]></category>
		<category><![CDATA[dimensional modeling]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=421</guid>
		<description><![CDATA[<p><strong>Operational databases</strong> are most commonly designed using <strong><em>normalized modeling</em></strong>, often using <strong><em>third-normal form</em></strong> or <strong><em>entity-relationship modeling</em></strong>. Normalized database schemas are tuned to support <em>fast updates and inserts</em> by minimizing the number of rows that must be changed when recording new data. <strong>Example: Order-Management Schema for operational database</strong></p>
<p><a href="http://www.solutionhacker.com/wp-content/uploads/2007/06/relatonalmodel.JPG" title="relatonalmodel.JPG"><img alt="relatonalmodel.JPG" src="http://www.solutionhacker.com/wp-content/uploads/2007/06/relatonalmodel.JPG" /></a></p>
<p><strong>Data warehouses</strong> differ from operational databases in the way they are designed; they are optimized for efficient querying and not for updating. Data warehouses provide a read-only version of the data in the operational databases, which is optimized for querying. The kind of modeling most commonly used in warehouse design is called <em><strong>dimensional modeling</strong></em>, and the schemas produced are known as <em><strong>star schemas</strong></em>. In dimensional modeling, a database is organized around a small number of <em><strong>fact tables</strong></em>. Each row in a fact table is a single measurable event: a single sale, a single hit to a web page, etc. <strong>Example: Order-Management Dimension Schema</strong></p>
<p><a href="http://www.solutionhacker.com/wp-content/uploads/2007/06/dimensionmodeling.JPG" title="dimensionmodeling.JPG"><img alt="dimensionmodeling.JPG" src="http://www.solutionhacker.com/wp-content/uploads/2007/06/dimensionmodeling.JPG" /></a></p>
<p>The key benefits of a data warehouse are <strong>simplification</strong> and <strong>consolidation</strong> of data. It normally gathers data from different operational databases into a single dimensional model for reporting and analysis. Dimensional modeling also offers a chance to reduce the level of complexity in your database. By collapsing complex chains of tables into dimension tables, the schema becomes smaller and performance tends to improve. We take two approaches to reduce the complexity: (1) we try to model <em>one aspect</em> of the system in each DM schema, and (2) we <em>denormalize</em> the schema to reduce the number of joins.</p>
<p><strong>ETL Process</strong></p>
<p>Once you have a data schema for your warehouse, you&#39;ll need to fill it with data. This process is known as <em>extract, transform, and load</em>, or <em>ETL</em> for short. The first step, extraction, is simply the process of <em>selecting all the data of interest</em> from the operational database. Then the data must be transformed into the format needed by the warehouse. This could be as simple as <em>renaming some of the fields</em> or as complex as <em>cleaning dirty data and computing new fields</em>. Finally, the data must be loaded into the data warehouse. There are some areas you need to pay attention to when you perform the ETL:</p>
<ol>
	<li>During <strong>extraction</strong>, you will put a lot of strain on the operational database. To deal with this problem, we can replicate a low-cost copy of the operational database onto the warehouse machine before doing the extraction. The output of the extraction query can be a CSV file.</li>
	<li><strong>Transformation</strong> can mean computing summary data or converting postal codes into geo-codes (i.e. latitude and longitude) to power &#34;within X miles&#34; queries. You can use Perl to do this job. The output of the transformation may be another CSV file.</li>
	<li>Finally, you <strong>load</strong> the data from the CSV file into the dimensional model. To speed up the load in MySQL, we first <strong>disable indexes</strong> with <font color="#003366" face="Courier New">ALTER TABLE foo DISABLE KEYS</font>, and after the load we re-enable them with <font color="#003366" face="Courier New">ALTER TABLE foo ENABLE KEYS</font>. Each table needs to be cleared with the <font color="#003366" face="Courier New">TRUNCATE</font> command before loading.</li>
	<li>You may be wondering what happens to clients using the warehouse while an ETL process is running. In our case, nothing at all! This magic is achieved by having two warehouse databases, one in use and the other free for loading. All the data goes into the loading database, and when the load is complete we swap it into place with <font color="#003366" face="Courier New">RENAME</font>. This produces an <strong>atomic switch</strong> of all tables in the loading database with the tables in the live database. MySQL will wait for any running queries in the warehouse to finish before performing the swap, which is exactly what we want.</li>
</ol>
<p><strong>Quick Tips</strong></p>
<ol>
	<li>CSV isn&#39;t a strictly standardized format. Using XML can solve character-escaping issues, but it might not perform as well due to formatting overhead.</li>
	<li>A transform step is not always needed. If it isn&#39;t, use &#34;INSERT ... SELECT&#34; to do a straight database-to-database extract-and-load (MySQL does not support the &#34;SELECT ... INTO TABLE&#34; form).</li>
	<li>Incremental loading is highly desirable. Triggers on the operational tables are one way to achieve it.</li>
	<li>Our operational database uses MySQL&#39;s InnoDB backend, which provides referential integrity and transactions. For the warehouse, however, we chose MySQL&#39;s MyISAM backend for better performance, since the warehouse is read-only and does not need transactions.</li>
	<li>MySQL does not support bitmap indexes, which are ideal for the kind of low-cardinality data commonly found in data warehouses. PostgreSQL has supported in-memory bitmap index scans since version 8.1, and a number of commercial database systems offer true on-disk bitmap indexes.</li>
</ol>
]]></description>
			<content:encoded><![CDATA[<p><strong>Operational databases</strong> are most commonly designed using <strong><em>normalized modeling</em></strong>, often using <strong><em>third-normal form</em></strong> or <strong><em>entity-relationship modeling</em></strong>. Normalized database schemas are tuned to support <em>fast updates and inserts</em> by minimizing the number of rows that must be changed when recording new data. <strong>Example: Order-Management Schema for operational database</strong></p>
<p><a href="http://www.solutionhacker.com/wp-content/uploads/2007/06/relatonalmodel.JPG" title="relatonalmodel.JPG"><img alt="relatonalmodel.JPG" src="http://www.solutionhacker.com/wp-content/uploads/2007/06/relatonalmodel.JPG" /></a></p>
<p><strong>Data warehouses</strong> differ from operational databases in the way they are designed; they are optimized for efficient querying and not for updating. Data warehouses provide a read-only version of the data in the operational databases, which is optimized for querying. The kind of modeling most commonly used in warehouse design is called <em><strong>dimensional modeling</strong></em>, and the schemas produced are known as <em><strong>star schemas</strong></em>. In dimensional modeling, a database is organized around a small number of <em><strong>fact tables</strong></em>. Each row in a fact table is a single measurable event: a single sale, a single hit to a web page, etc. <strong>Example: Order-Management Dimension Schema</strong></p>
<p><a href="http://www.solutionhacker.com/wp-content/uploads/2007/06/dimensionmodeling.JPG" title="dimensionmodeling.JPG"><img alt="dimensionmodeling.JPG" src="http://www.solutionhacker.com/wp-content/uploads/2007/06/dimensionmodeling.JPG" /></a></p>
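<p>For readers who cannot see the diagram, the following Python sketch builds a tiny star schema in an in-memory SQLite database. The table names, columns and rows are invented for illustration and are not taken from the order-management schema pictured above; the point is only that a report needs a single join from the fact table to a dimension.</p>
<pre>
```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables hold the descriptive attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    -- Fact table: one row per measurable event (here, one sale).
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        quantity INTEGER,
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO dim_date VALUES (10, 2009, 7), (11, 2009, 8);
    INSERT INTO fact_sales VALUES (1, 10, 2, 20.0), (1, 11, 1, 10.0), (2, 11, 3, 45.0);
""")


def sales_by_category(connection):
    """Total sales amount per product category: one join from fact to dimension."""
    rows = connection.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales f JOIN dim_product p ON p.product_id = f.product_id
        GROUP BY p.category ORDER BY p.category
    """)
    return rows.fetchall()


print(sales_by_category(conn))
```
</pre>
<p>Slicing the same facts by month instead of category only means joining to dim_date instead of dim_product; the fact table itself does not change.</p>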
<p>The key benefits of a data warehouse are <strong>simplification</strong> and <strong>consolidation</strong> of data. It normally gathers data from different operational databases into a single dimensional model for reporting and analysis. Dimensional modeling also offers a chance to reduce the level of complexity in your database. By collapsing complex chains of tables into dimension tables, the schema becomes smaller and performance tends to improve. We take two approaches to reduce the complexity: (1) we try to model <em>one aspect</em> of the system in each DM schema, and (2) we <em>denormalize</em> the schema to reduce the number of joins.</p>
<p><strong>ETL Process</strong></p>
<p>Once you have a data schema for your warehouse, you&#39;ll need to fill it with data. This process is known as <em>extract, transform, and load</em>, or <em>ETL</em> for short. The first step, extraction, is simply the process of <em>selecting all the data of interest</em> from the operational database. Then the data must be transformed into the format needed by the warehouse. This could be as simple as <em>renaming some of the fields</em> or as complex as <em>cleaning dirty data and computing new fields</em>. Finally, the data must be loaded into the data warehouse. There are some areas you need to pay attention to when you perform the ETL:</p>
<ol>
<li>During <strong>extraction</strong>, you will put a lot of strain on the operational database. To deal with this problem, we can replicate a low-cost copy of the operational database onto the warehouse machine before doing the extraction. The output of the extraction query can be a CSV file.</li>
<li><strong>Transformation</strong> can mean computing summary data or converting postal codes into geo-codes (i.e. latitude and longitude) to power &quot;within X miles&quot; queries. You can use Perl to do this job. The output of the transformation may be another CSV file.</li>
<li>Finally, you <strong>load</strong> the data from the CSV file into the dimensional model. To speed up the load in MySQL, we first <strong>disable indexes</strong> with <font color="#003366" face="Courier New">ALTER TABLE foo DISABLE KEYS</font>, and after the load we re-enable them with <font color="#003366" face="Courier New">ALTER TABLE foo ENABLE KEYS</font>. Each table needs to be cleared with the <font color="#003366" face="Courier New">TRUNCATE</font> command before loading.</li>
<li>You may be wondering what happens to clients using the warehouse while an ETL process is running. In our case, nothing at all! This magic is achieved by having two warehouse databases, one in use and the other free for loading. All the data goes into the loading database, and when the load is complete we swap it into place with <font color="#003366" face="Courier New">RENAME</font>. This produces an <strong>atomic switch</strong> of all tables in the loading database with the tables in the live database. MySQL will wait for any running queries in the warehouse to finish before performing the swap, which is exactly what we want.</li>
</ol>
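<p>The swap in the last step can be sketched as a small Python helper that builds a single MySQL RENAME TABLE statement. The database and table names are placeholders, the <em>_swap</em> suffix is an assumed temporary name that must not already exist, and identifiers are not quoted, so treat it as an outline rather than production code.</p>
<pre>
```python
def build_swap_statement(tables: list[str], live_db: str, loading_db: str) -> str:
    """Build one RENAME TABLE statement that swaps every table between two databases."""
    clauses = []
    for table in tables:
        parked = f"{loading_db}.{table}_swap"  # temporary name, assumed to be unused
        clauses.append(f"{live_db}.{table} TO {parked}")              # park the live table
        clauses.append(f"{loading_db}.{table} TO {live_db}.{table}")  # promote the fresh load
        clauses.append(f"{parked} TO {loading_db}.{table}")           # old data becomes the next load target
    return "RENAME TABLE " + ", ".join(clauses) + ";"


if __name__ == "__main__":
    print(build_swap_statement(["fact_sales"], "warehouse_live", "warehouse_loading"))
```
</pre>
<p>Because all the renames travel in one statement, clients see either the old set of tables or the new one, never a mixture.</p>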
<p><strong>Quick Tips</strong></p>
<ol>
<li>CSV isn&#39;t a strictly standardized format. Using XML can solve character-escaping issues, but it might not perform as well due to formatting overhead.</li>
<li>A transform step is not always needed. If it isn&#39;t, use &quot;INSERT &#8230; SELECT&quot; to do a straight database-to-database extract-and-load (MySQL does not support the &quot;SELECT &#8230; INTO TABLE&quot; form).</li>
<li>Incremental loading is highly desirable. Triggers on the operational tables are one way to achieve it.</li>
<li>Our operational database uses MySQL&#39;s InnoDB backend, which provides referential integrity and transactions. For the warehouse, however, we chose MySQL&#39;s MyISAM backend for better performance, since the warehouse is read-only and does not need transactions.</li>
<li>MySQL does not support bitmap indexes, which are ideal for the kind of low-cardinality data commonly found in data warehouses. PostgreSQL has supported in-memory bitmap index scans since version 8.1, and a number of commercial database systems offer true on-disk bitmap indexes.</li>
</ol>
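<p>To illustrate why bitmap indexes suit low-cardinality columns, here is a toy version in Python. It is only a sketch of the idea, not how any database implements it: each distinct value gets one integer used as a bit string, and combining predicates becomes a bitwise AND. The column values and function names are invented.</p>
<pre>
```python
import operator
from functools import reduce


def build_bitmap_index(values: list[str]) -> dict[str, int]:
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index: dict[str, int] = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (2 ** row)
    return index


def matching_rows(*bitmaps: int) -> list[int]:
    """Row numbers present in every bitmap (a bitwise AND of all predicates)."""
    combined = reduce(operator.and_, bitmaps)
    return [row for row in range(combined.bit_length()) if (combined >> row) % 2 == 1]


if __name__ == "__main__":
    status = build_bitmap_index(["open", "shipped", "open", "open"])
    region = build_bitmap_index(["west", "west", "east", "west"])
    # Rows where status is 'open' AND region is 'west'.
    print(matching_rows(status["open"], region["west"]))
```
</pre>
<p>With only a handful of distinct values per column the index stays small, and a multi-column filter costs a few bitwise operations instead of several index lookups.</p>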
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/collective-intelligence/how-to-build-data-warehouse/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Business Intelligence &#8211; Part 1 Pentaho</title>
		<link>http://www.solutionhacker.com/data-intelligence/collective-intelligence/business-intelligence-part-1-pentaho/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/collective-intelligence/business-intelligence-part-1-pentaho/#comments</comments>
		<pubDate>Thu, 26 Jun 2008 09:00:41 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[Extract Intelligence]]></category>
		<category><![CDATA[BI]]></category>
		<category><![CDATA[business intelligence]]></category>
		<category><![CDATA[Kettle]]></category>
		<category><![CDATA[pentaho]]></category>
		<category><![CDATA[reporting]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/?p=153</guid>
		<description><![CDATA[<h2>Getting into the Business Intelligence World</h2>
<p>When I dug deeper into business intelligence, I found out that it is a huge topic ranging from reporting to data mining. As with any knowledge-acquisition plan, I set a series of milestones for myself. If you are interested, here is my list:</p>
<p><strong>Get and prepare your data</strong></p>
<ul>
    <li>Data collection - log processing, web services (SOAP and REST), RSS, screen scraping and more.</li>
    <li>Data preparation and crunching - ETL (Kettle)</li>
    <li>Data storage - data warehousing</li>
</ul>
<p><strong>Visualize your data</strong></p>
<ul>
    <li>Reporting and Charting (Pentaho as server and Flex as frontend)</li>
</ul>
<p><strong>Analyze your data</strong></p>
<ul>
    <li>Data modeling</li>
    <li>Data analysis (OLAP)</li>
</ul>
<p><strong>Get smart about your data</strong></p>
<ul>
    <li>Collective intelligence</li>
    <li>Data mining</li>
</ul>
<p><!--more--></p>
<h2>Introduction to Pentaho</h2>
<p>First, I wanted to see whether there is an out-of-the-box open-source solution that covers what I am trying to do here. If so, I can reach my goal much faster. And yes, it has to be open source, because I don't have the money and I don't want to be just a user. After doing a bit of homework, I found an open-source BI tool named Pentaho that looks pretty solid, so I decided to dive deep into it. As with all the tools I mess around with, I want to integrate Pentaho as a library. However, I could not find anything on the Net that shows how to do that. I looked into its download and checked out its pentaho-sample project, but all it shows is how to use their tools to create a report on their system using their UI. I definitely need more!</p>
<p>After a few days of effort, I managed to strip all the unnecessary dependencies out of Pentaho. The heart of Pentaho is its xaction interpreter. Pentaho's approach is to write adapters, plug them into its framework and use xactions to wire them up in a workflow fashion. In fact, most of its functionality comes from other open-source projects, such as Quartz for scheduling, Shark for the workflow engine and JFreeReport for reporting. I don't think their xaction format is clean, but I do like their architectural approach.</p>
<p>Enough talk, let's start! I will use a series of articles to shorten your learning curve and show you how to get familiar with Pentaho as a developer rather than as a user. First things first: follow the articles below to set up your environment.</p>
<ol>
    <li><span style="font-size: 10pt; font-family: &#34;Arial&#34;,&#34;sans-serif&#34;; color: navy;"><a href="http://wiki.pentaho.com/display/PentahoDoc/07.+Debugging+with+the+Standalone+Platform+Project">http://wiki.pentaho.com/display/PentahoDoc/07.+Debugging+with+the+Standalone+Platform+Project</a></span></li>
    <li><span style="font-size: 10pt; font-family: &#34;Arial&#34;,&#34;sans-serif&#34;; color: navy;"><a href="http://wiki.pentaho.com/display/PentahoDoc/Building+and+Debugging+Pentaho+with+Eclipse">http://wiki.pentaho.com/display/PentahoDoc/Building+and+Debugging+Pentaho+with+Eclipse</a></span></li>
    <li><span style="font-size: 10pt; font-family: &#34;Arial&#34;,&#34;sans-serif&#34;; color: navy;"><a href="http://wiki.pentaho.com/display/PentahoDoc/Manual+Deployment+of+Pentaho">http://wiki.pentaho.com/display/PentahoDoc/Manual+Deployment+of+Pentaho</a></span></li>
</ol>
]]></description>
			<content:encoded><![CDATA[<h2>Getting into the Business Intelligence World</h2>
<p>When I dug deeper into business intelligence, I found out that it is a huge topic ranging from reporting to data mining. As with any knowledge-acquisition plan, I set a series of milestones for myself. If you are interested, here is my list:</p>
<p><strong>Get and prepare your data</strong></p>
<ul>
<li>Data collection &#8211; log processing, web services (SOAP and REST), RSS, screen scraping and more.</li>
<li>Data preparation and crunching &#8211; ETL (Kettle)</li>
<li>Data storage &#8211; data warehousing</li>
</ul>
<p><strong>Visualize your data</strong></p>
<ul>
<li>Reporting and Charting (Pentaho as server and Flex as frontend)</li>
</ul>
<p><strong>Analyze your data</strong></p>
<ul>
<li>Data modeling</li>
<li>Data analysis (OLAP)</li>
</ul>
<p><strong>Get smart about your data</strong></p>
<ul>
<li>Collective intelligence</li>
<li>Data mining</li>
</ul>
<p><span id="more-153"></span></p>
<h2>Introduction to Pentaho</h2>
<p>First, I wanted to see whether there is an out-of-the-box open-source solution that covers what I am trying to do here. If so, I can reach my goal much faster. And yes, it has to be open source, because I don&#8217;t have the money and I don&#8217;t want to be just a user. After doing a bit of homework, I found an open-source BI tool named Pentaho that looks pretty solid, so I decided to dive deep into it. As with all the tools I mess around with, I want to integrate Pentaho as a library. However, I could not find anything on the Net that shows how to do that. I looked into its download and checked out its pentaho-sample project, but all it shows is how to use their tools to create a report on their system using their UI. I definitely need more!</p>
<p>After a few days of effort, I managed to strip all the unnecessary dependencies out of Pentaho. The heart of Pentaho is its xaction interpreter. Pentaho&#8217;s approach is to write adapters, plug them into its framework and use xactions to wire them up in a workflow fashion. In fact, most of its functionality comes from other open-source projects, such as Quartz for scheduling, Shark for the workflow engine and JFreeReport for reporting. I don&#8217;t think their xaction format is clean, but I do like their architectural approach.</p>
<p>Enough talk, let&#8217;s start! I will use a series of articles to shorten your learning curve and show you how to get familiar with Pentaho as a developer rather than as a user. First things first: follow the articles below to set up your environment.</p>
<ol>
<li><span style="font-size: 10pt; font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: navy;"><a href="http://wiki.pentaho.com/display/PentahoDoc/07.+Debugging+with+the+Standalone+Platform+Project">http://wiki.pentaho.com/display/PentahoDoc/07.+Debugging+with+the+Standalone+Platform+Project</a></span></li>
<li><span style="font-size: 10pt; font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: navy;"><a href="http://wiki.pentaho.com/display/PentahoDoc/Building+and+Debugging+Pentaho+with+Eclipse">http://wiki.pentaho.com/display/PentahoDoc/Building+and+Debugging+Pentaho+with+Eclipse</a></span></li>
<li><span style="font-size: 10pt; font-family: &quot;Arial&quot;,&quot;sans-serif&quot;; color: navy;"><a href="http://wiki.pentaho.com/display/PentahoDoc/Manual+Deployment+of+Pentaho">http://wiki.pentaho.com/display/PentahoDoc/Manual+Deployment+of+Pentaho</a></span></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/collective-intelligence/business-intelligence-part-1-pentaho/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Postgresql &#8211; Power of Array Type</title>
		<link>http://www.solutionhacker.com/data-intelligence/collective-intelligence/postgresql-power-of-array-type/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/collective-intelligence/postgresql-power-of-array-type/#comments</comments>
		<pubDate>Wed, 20 Feb 2008 00:18:44 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Extract Intelligence]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/2008/02/19/postgresql-power-of-array-type/</guid>
		<description><![CDATA[<strong>Create two tables</strong>
item(id) and item_log(id, item_id, price), where item_log.id is an auto-incrementing key (for example, a serial column)

<strong>Populate them</strong>
<font>insert into item(id) values(1);
insert into item(id) values(2);
insert into item(id) values(3);
insert into item(id) values(4);
</font>

<font>insert into item_log(item_id, price) values(1, 100);
insert into item_log(item_id, price) values(1, 100);
insert into item_log(item_id, price) values(1, 100);
insert into item_log(item_id, price) values(1, 200);
insert into item_log(item_id, price) values(1, 200);
insert into item_log(item_id, price) values(1, 200);
insert into item_log(item_id, price) values(1, 200);
insert into item_log(item_id, price) values(1, 200);
insert into item_log(item_id, price) values(1, 200);
insert into item_log(item_id, price) values(2, 200);
insert into item_log(item_id, price) values(2, 200);</font>

<strong>Run SQL</strong>
<font>SELECT COUNT(il.price), i.id AS item_id, il.price
FROM item i, item_log il
WHERE i.id = il.item_id GROUP BY il.price, i.id; </font>

<strong>Result</strong>
<font>count &#124; item_id &#124; price
-------+---------+-------
3 &#124;       1 &#124;   100
6 &#124;       1 &#124;   200
2 &#124;       2 &#124;   200 </font>

<strong>Run SQL</strong>
<font>SELECT COUNT(il.price), i.id AS item_id, il.price,  array_accum(il.id) AS item_id_array
FROM item i, item_log il
WHERE i.id = il.item_id
GROUP BY il.price, i.id;  </font>

<strong>Result
</strong><font>count &#124; item_id &#124; price &#124; item_id_array
-------+---------+-------+---------------
3 &#124;       1 &#124;   100 &#124; {1,2,3}
6 &#124;       1 &#124;   200 &#124; {4,5,6,7,8,9}
2 &#124;       2 &#124;   200 &#124; {10,11} </font>]]></description>
			<content:encoded><![CDATA[<p><strong>Create two tables</strong><br />
item(id) and item_log(id, item_id, price), where item_log.id is an auto-incrementing key (for example, a serial column)</p>
<p><strong>Populate them</strong><br />
<font>insert into item(id) values(1);<br />
insert into item(id) values(2);<br />
insert into item(id) values(3);<br />
insert into item(id) values(4);<br />
</font></p>
<p><font>insert into item_log(item_id, price) values(1, 100);<br />
insert into item_log(item_id, price) values(1, 100);<br />
insert into item_log(item_id, price) values(1, 100);<br />
insert into item_log(item_id, price) values(1, 200);<br />
insert into item_log(item_id, price) values(1, 200);<br />
insert into item_log(item_id, price) values(1, 200);<br />
insert into item_log(item_id, price) values(1, 200);<br />
insert into item_log(item_id, price) values(1, 200);<br />
insert into item_log(item_id, price) values(1, 200);<br />
insert into item_log(item_id, price) values(2, 200);<br />
insert into item_log(item_id, price) values(2, 200);</font></p>
<p><strong>Run SQL</strong><br />
<font>SELECT COUNT(il.price), i.id AS item_id, il.price<br />
FROM item i, item_log il<br />
WHERE i.id = il.item_id GROUP BY il.price, i.id; </font></p>
<p><strong>Result</strong><br />
<font>count | item_id | price<br />
-------+---------+-------<br />
3 |       1 |   100<br />
6 |       1 |   200<br />
2 |       2 |   200 </font></p>
<p><strong>Run SQL</strong><br />
<font>SELECT COUNT(il.price), i.id AS item_id, il.price,  array_accum(il.id) AS item_id_array<br />
FROM item i, item_log il<br />
WHERE i.id = il.item_id<br />
GROUP BY il.price, i.id;  </font></p>
<p><strong>Result<br />
</strong><font>count | item_id | price | item_id_array<br />
-------+---------+-------+---------------<br />
3 |       1 |   100 | {1,2,3}<br />
6 |       1 |   200 | {4,5,6,7,8,9}<br />
2 |       2 |   200 | {10,11} </font></p>
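<p>The grouping query above can be tried without a PostgreSQL server. What follows is only a minimal sketch in Python, under stated assumptions: it uses the standard-library sqlite3 module with an in-memory database, and SQLite's group_concat stands in for array_accum, so it is not the PostgreSQL array type itself. Note that array_accum is not built into PostgreSQL; it is the user-defined aggregate shown in the PostgreSQL documentation, and PostgreSQL 8.4 and later ship a built-in array_agg that serves the same purpose. The auto-incrementing item_log.id column is an assumption, added so that the query has ids to collect.</p>

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY);
CREATE TABLE item_log (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- assumed surrogate id, like a serial column
    item_id INTEGER REFERENCES item(id),
    price INTEGER
);
""")

# Same sample data as in the post: 4 items and 11 log rows.
conn.executemany("INSERT INTO item(id) VALUES (?)", [(i,) for i in range(1, 5)])
log_rows = [(1, 100)] * 3 + [(1, 200)] * 6 + [(2, 200)] * 2
conn.executemany("INSERT INTO item_log(item_id, price) VALUES (?, ?)", log_rows)

# group_concat collects the log ids of each group into a comma-separated string.
rows = conn.execute("""
    SELECT COUNT(il.price), i.id, il.price, group_concat(il.id)
    FROM item i JOIN item_log il ON i.id = il.item_id
    GROUP BY il.price, i.id
    ORDER BY i.id, il.price
""").fetchall()

# Turn each id string into a sorted list of integers to mimic an array column.
result = [
    (count, item_id, price, sorted(int(x) for x in ids.split(",")))
    for count, item_id, price, ids in rows
]

for row in result:
    print(row)
```

<p>Each printed row carries the count, the item id, the price, and the list of item_log ids that fell into that group, which is the same shape as the second result table above.</p>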
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/collective-intelligence/postgresql-power-of-array-type/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data warehouse 101</title>
		<link>http://www.solutionhacker.com/data-intelligence/collective-intelligence/data-warehouse-101/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/collective-intelligence/data-warehouse-101/#comments</comments>
		<pubDate>Fri, 22 Jun 2007 16:02:02 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Extract Intelligence]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/2007/06/22/data-warehouse-101/</guid>
		<description><![CDATA[To build a data warehouse, you use the techniques of dimensional modeling. Here are the guidelines you can follow:
<ol>
	<li>Divide the world into measurements and context.</li>
	<li>Place numeric measurements in the <strong>Fact</strong> table and break the context down into <strong>Dimensions</strong>. A fact table in a pure <strong>star schema</strong> consists of multiple foreign keys, each paired with a primary key in a dimension, together with the facts containing the measurements.</li>
	<li>Build the <strong>FK-PK</strong> pairs as <em>surrogate</em> keys that are just sequentially assigned integers.</li>
	<li>Use a special record in each Dimension to represent an unknown or missing value, because we want to avoid putting null in an FK.</li>
	<li>Resist <strong>snowflaking</strong> the dimension tables; leave them in flat second normal form, because flat tables are much more efficient to query. Snowflaking a dimension into third normal form, while not incorrect, destroys the ability to use <strong>bitmap indexes</strong> and increases the user-perceived complexity of the design.</li>
	<li><strong>Semi-additive fact</strong> - Most fact tables are huge, with millions or even billions of rows, so you almost never fetch a single record into your answer set. Rather, you fetch a very large number of records, which you compress into digestible form by adding, counting, averaging, or taking the min or max. Bank balances and inventory levels represent intensities that are awkward to express in an additive format: summing a balance over one month is not really meaningful. Normally, we still treat these semi-additive facts as if they were additive, but just before presenting the results to the end user, we divide the answer by the number of time periods to get the right result. This technique is called <strong>averaging over time</strong>.</li>
	<li><strong>Slowly changing dimension </strong>-</li>
	<li><strong>Hierarchical Dimension</strong> - There are two types of hierarchies: the "Parent-Child" relationship and the "Array of Levels". An array of levels looks like Country -&#62; State -&#62; City -&#62; Store. A parent-child hierarchy looks like product categories, which can be nested in different ways.</li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>To build a data warehouse, you use the techniques of dimensional modeling. Here are the guidelines you can follow:</p>
<ol>
<li>Divide the world into measurements and context.</li>
<li>Place numeric measurements in the <strong>Fact</strong> table and break the context down into <strong>Dimensions</strong>. A fact table in a pure <strong>star schema</strong> consists of multiple foreign keys, each paired with a primary key in a dimension, together with the facts containing the measurements.</li>
<li>Build the <strong>FK-PK</strong> pairs as <em>surrogate</em> keys that are just sequentially assigned integers.</li>
<li>Use a special record in each Dimension to represent an unknown or missing value, because we want to avoid putting null in an FK.</li>
<li>Resist <strong>snowflaking</strong> the dimension tables; leave them in flat second normal form, because flat tables are much more efficient to query. Snowflaking a dimension into third normal form, while not incorrect, destroys the ability to use <strong>bitmap indexes</strong> and increases the user-perceived complexity of the design.</li>
<li><strong>Semi-additive fact</strong> &#8211; Most fact tables are huge, with millions or even billions of rows, so you almost never fetch a single record into your answer set. Rather, you fetch a very large number of records, which you compress into digestible form by adding, counting, averaging, or taking the min or max. Bank balances and inventory levels represent intensities that are awkward to express in an additive format: summing a balance over one month is not really meaningful. Normally, we still treat these semi-additive facts as if they were additive, but just before presenting the results to the end user, we divide the answer by the number of time periods to get the right result. This technique is called <strong>averaging over time</strong>.</li>
<li><strong>Slowly changing dimension </strong>-</li>
<li><strong>Hierarchical Dimension</strong> &#8211; There are two types of hierarchies: the &#8220;Parent-Child&#8221; relationship and the &#8220;Array of Levels&#8221;. An array of levels looks like Country -&gt; State -&gt; City -&gt; Store. A parent-child hierarchy looks like product categories, which can be nested in different ways.</li>
</ol>
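<p>The "averaging over time" guideline can be illustrated with a small Python sketch. The account names, days, and balances below are hypothetical sample data, and the function name is made up for this example; it only shows the arithmetic, not how any particular warehouse or BI tool implements it.</p>

```python
# Hypothetical daily balance snapshots: (account, day, balance).
snapshots = [
    ("A", 1, 100.0), ("A", 2, 200.0), ("A", 3, 300.0),
    ("B", 1, 50.0), ("B", 2, 50.0), ("B", 3, 50.0),
]


def average_over_time(rows):
    """Sum the balances as if they were additive, then divide by the number of periods."""
    total = sum(balance for _account, _day, balance in rows)
    periods = len({day for _account, day, _balance in rows})
    return total / periods


# Adding balances across days gives a number with no business meaning.
naive_sum = sum(balance for _account, _day, balance in snapshots)

# Dividing by the number of days gives the average total balance held per day.
average_balance = average_over_time(snapshots)

print(naive_sum, average_balance)
```

<p>Balances are still additive across accounts (the two accounts together hold 150.0 on day 1), which is why the fact is called semi-additive: only the time dimension needs the division.</p>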
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/collective-intelligence/data-warehouse-101/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pick the right database for data warehouse</title>
		<link>http://www.solutionhacker.com/data-intelligence/collective-intelligence/pick-the-right-database-for-data-warehouse/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/collective-intelligence/pick-the-right-database-for-data-warehouse/#comments</comments>
		<pubDate>Sat, 16 Jun 2007 00:36:46 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Extract Intelligence]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/2007/06/15/pick-the-right-database-for-data-warehouse/</guid>
		<description><![CDATA[For those who don't want to go down the licensing path, open source is definitely a better solution. But can an open-source DBMS be used to build your data warehouse? I am not the best person to answer this question, but I have seen more and more small and medium-sized companies launch business intelligence platforms powered by open-source DBMSs like PostgreSQL and MySQL. Before MySQL v5 was released, I did not recommend MySQL for data warehousing because it lacked key features that others provide, such as triggers, stored procedures, and partitioning. Now I suggest revisiting it, especially since I have heard that MySQL has become a golden partner of the great open-source business intelligence platform "Pentaho". Below is a rough DBMS comparison chart that I got from devx.com. Take a look at it first.

<a href="http://www.solutionhacker.com/wp-content/uploads/2007/06/dbcomparison.JPG" title="dbcomparison.JPG"><img src="http://www.solutionhacker.com/wp-content/uploads/2007/06/dbcomparison.JPG" alt="dbcomparison.JPG" /></a>

There are debates about whether to choose PostgreSQL or MySQL. Here is one <a target="_blank" href="http://blog.page2rss.com/2007/01/postgresql-vs-mysql-performance.html">case study</a> that shows PostgreSQL is better in an OLTP system. However, for select queries, another <a target="_blank" href="http://wskills.blogspot.com/2007/01/postgresql-vs-mysql-benchmark.html">study</a> shows MySQL v5 is 2X faster than PostgreSQL v8. For a data warehouse application, MySQL sounds like a better option, as the workload is mostly read-only.

Here is the summary that I took from this <a target="_blank" href="http://www.devx.com/dbzone/Article/20743/0/page/2">article</a>, which compares PostgreSQL with MySQL:
<ol>
	<li>MySQL uses traditional row-level locking. PostgreSQL uses something called Multi Version Concurrency Control (MVCC) by default. MVCC is a little different from row-level locking in that transactions on the database are performed on a snapshot of the data and then serialized.</li>
	<li>MySQL supports the advanced feature of data partitioning within a database whereas PostgreSQL does not.</li>
	<li>PostgreSQL has many of the database features that Oracle, DB2, or MS-SQL has, including triggers, views, inheritance, sequences, stored procedures, cursors, and user-defined data types. MySQL's development version, version 5.0, supports views, stored procedures, and cursors. MySQL's future version, version 5.1, will support triggers.</li>
	<li>PostgreSQL supports user-defined data types, while MySQL does not.</li>
	<li>Both MySQL and PostgreSQL have support for single-master, multi-slave replication scenarios. PostgreSQL offers additional support for multi-master, multi-slave replication from a third-party vendor, as well as additional replication methods.</li>
	<li>MySQL uses a threaded model for server processes, wherein all of the users connect to a single database daemon for access. PostgreSQL uses a non-threaded model where every new connection to the database gets a new database process.</li>
	<li>MySQL does not support bitmap indexes. Bitmap indexes are ideal for the kind of low-cardinality data that is commonly used in data warehouses. PostgreSQL supports bitmap indexes as of version v8.1, as do a number of commercial database systems.</li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>For those who don&#8217;t want to go down the licensing path, open source is definitely a better solution. But can an open-source DBMS be used to build your data warehouse? I am not the best person to answer this question, but I have seen more and more small and medium-sized companies launch business intelligence platforms powered by open-source DBMSs like PostgreSQL and MySQL. Before MySQL v5 was released, I did not recommend MySQL for data warehousing because it lacked key features that others provide, such as triggers, stored procedures, and partitioning. Now I suggest revisiting it, especially since I have heard that MySQL has become a golden partner of the great open-source business intelligence platform &#8220;Pentaho&#8221;. Below is a rough DBMS comparison chart that I got from devx.com. Take a look at it first.</p>
<p><a href="http://www.solutionhacker.com/wp-content/uploads/2007/06/dbcomparison.JPG" title="dbcomparison.JPG"><img src="http://www.solutionhacker.com/wp-content/uploads/2007/06/dbcomparison.JPG" alt="dbcomparison.JPG" /></a></p>
<p>There are debates about whether to choose PostgreSQL or MySQL. Here is one <a target="_blank" href="http://blog.page2rss.com/2007/01/postgresql-vs-mysql-performance.html">case study</a> that shows PostgreSQL is better in an OLTP system. However, for select queries, another <a target="_blank" href="http://wskills.blogspot.com/2007/01/postgresql-vs-mysql-benchmark.html">study</a> shows MySQL v5 is 2X faster than PostgreSQL v8. For a data warehouse application, MySQL sounds like a better option, as the workload is mostly read-only.</p>
<p>Here is the summary that I took from this <a target="_blank" href="http://www.devx.com/dbzone/Article/20743/0/page/2">article</a>, which compares PostgreSQL with MySQL:</p>
<ol>
<li>MySQL uses traditional row-level locking. PostgreSQL uses something called Multi Version Concurrency Control (MVCC) by default. MVCC is a little different from row-level locking in that transactions on the database are performed on a snapshot of the data and then serialized.</li>
<li>MySQL supports the advanced feature of data partitioning within a database whereas PostgreSQL does not.</li>
<li>PostgreSQL has many of the database features that Oracle, DB2, or MS-SQL has, including triggers, views, inheritance, sequences, stored procedures, cursors, and user-defined data types. MySQL&#8217;s development version, version 5.0, supports views, stored procedures, and cursors. MySQL&#8217;s future version, version 5.1, will support triggers.</li>
<li>PostgreSQL supports user-defined data types, while MySQL does not.</li>
<li>Both MySQL and PostgreSQL have support for single-master, multi-slave replication scenarios. PostgreSQL offers additional support for multi-master, multi-slave replication from a third-party vendor, as well as additional replication methods.</li>
<li>MySQL uses a threaded model for server processes, wherein all of the users connect to a single database daemon for access. PostgreSQL uses a non-threaded model where every new connection to the database gets a new database process.</li>
<li>MySQL does not support bitmap indexes. Bitmap indexes are ideal for the kind of low-cardinality data that is commonly used in data warehouses. PostgreSQL supports bitmap indexes as of version v8.1, as do a number of commercial database systems.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/collective-intelligence/pick-the-right-database-for-data-warehouse/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pentaho &#8211; Quick Start</title>
		<link>http://www.solutionhacker.com/data-intelligence/collective-intelligence/pentaho-quick-start/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/collective-intelligence/pentaho-quick-start/#comments</comments>
		<pubDate>Fri, 15 Jun 2007 07:57:59 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Extract Intelligence]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/2007/06/15/pentaho-quick-start/</guid>
		<description><![CDATA[<p>The goal of this post is to walk you through an awesome business intelligence framework named "Pentaho". I believe in the philosophy of "Learn by Practice", so I will show you the steps to get Pentaho up and running for a fictitious company. Through this exercise, you should come to understand how Pentaho works and what features it provides. Let's start. :) </p>
<h2><strong>Installation of Pentaho</strong></h2>
<ol>
    <li>Download Pentaho Demo (PCI) <a href="http://www.pentaho.org/download/" target="_blank">here</a></li>
    <li>Read <em>Pentaho Quick Start</em> and the <em>Creating Pentaho Solutions</em> pdf documents. You can get those documents from the download center above as well.</li>
    <li>Unzipping the downloaded file produces a <font size="2" face="Courier" color="#ff6633">pentaho-demo</font> directory. This is the server root, commonly referred to as the <em>PCI root</em> or <em>PCI install directory</em>. To start the server, Windows users run <font size="2" face="Courier" color="#ff6633">start-pentaho.bat</font>; *nix users run <font size="2" face="Courier" color="#ff6633">start-pentaho.sh</font>.</li>
    <li>Open a web browser and navigate to <a href="http://localhost:8080/" target="_pentaho"><font size="2" color="#6699cc">http://localhost:8080/</font></a>. This may take a little while - the server needs to warm up.</li>
    <li>Now you should see the Pentaho web front end. Try these samples to make sure the setup is correct.
    <ul>
        <li><a href="http://localhost:8080/pentaho/Navigate?solution=samples&#38;path=getting-started" target="_pentaho"><font color="#6699cc">Getting Started</font></a>::<a href="http://localhost:8080/pentaho/ViewAction?&#38;solution=samples&#38;path=getting-started&#38;action=HelloWorld.xaction" target="_pentaho"><font color="#666699">Hello World</font></a></li>
        <li><a href="http://localhost:8080/pentaho/Navigate?&#38;solution=samples&#38;path=reporting&#38;action=" target="_pentaho"><font color="#666699">Reporting</font></a></li>
        <li><a href="http://localhost:8080/pentaho/Navigate?&#38;solution=samples&#38;path=charts&#38;action="><font color="#6699cc">Chart Examples</font></a>. Shows some of the included charting capabilities</li>
        <li><a href="http://localhost:8080/pentaho/Navigate?&#38;solution=samples&#38;path=analysis&#38;action="><font color="#6699cc">Analysis / OLAP Examples</font></a>. Demonstrates slice and dice</li>
        <li><a href="http://localhost:8080/pentaho/Navigate?&#38;solution=samples&#38;path=dashboard&#38;action="><font color="#6699cc">Dashboards</font></a>. There is a demo "Flash Dashboard" that uses the XML-driven <a href="http://www.maani.us/xml_charts/" target="_blank">free Flash chart</a>. It looks pretty good! However, I suggest using Flex Charting, although it is not free.</li>
    </ul>
    </li>
    <li>When you're done with Pentaho, locate the <font size="2" face="Courier" color="#ff6633">stop-pentaho</font> script in the PCI installation directory and execute it to stop the server.</li>
    <li>For more information on how to set up PCI as a server and how to configure the email service, take a look at <a href="http://rpbouman.blogspot.com/2006/07/hands-on-mysql-samples-for-pentaho-2_12.html" target="_blank">Roland's blog</a>.</li>
</ol>
<h2><strong>Create a new sample</strong></h2>
<ol>
    <li>Start pentaho demo as stated above.</li>
    <li>Create a folder named "mysql" under %PCI%/pentaho-solutions/samples/.</li>
    <li>To make the mysql folder show up on the entry page, put an index.xml file (shown below) in the mysql folder. You may notice that it uses variables for the name and description. The values of these variables are defined in index.properties in the same folder. This supports internationalization: for example, you can define index_cn.properties for Chinese wording. <strong>Note</strong>: Click "Update Solution Repository" under the Admin tab to pick up the change.</li>
    <blockquote>&#60;index&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;name&#62;%directory_name&#60;/name&#62; <br />
    &#160;&#160;&#160;&#160;&#160; &#60;description&#62;%directory_description&#60;/description&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;icon&#62;folder.png&#124;dashboard.jpg&#60;/icon&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;visible&#62;true&#60;/visible&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;display-type&#62;list&#60;/display-type&#62; <br />
    &#60;/index&#62;</blockquote>
    <li>Download the MySQL sample database, sakila. Here is the <a href="http://www.stardata.it/sakila/sakila.html" target="_blank">schema view</a>.</li>
    <li>Download <a href="http://dev.mysql.com/downloads/mysql/5.0.html#win32" target="_blank">mysql v5 database </a>and its <a href="http://dev.mysql.com/downloads/connector/j/5.0.html" target="_blank">jdbc driver.</a></li>
    <li>Run sakila_schema.sql and then sakila_data.sql against the MySQL database. Now you have your sample movie database ready.</li>
    <li>
    <p align="left">Create a file named <strong>mysql-ds.xml</strong> in $DEMO_BASE/jboss/server/default/deploy/</p>
    </li>
    <blockquote>&#60;?xml version="1.0" encoding="UTF-8"?&#62; <br />
    &#60;datasources&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;local-tx-datasource&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;jndi-name&#62;sakila&#60;/jndi-name&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;connection-url&#62;jdbc:mysql://localhost/sakila&#60;/connection-url&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;driver-class&#62;com.mysql.jdbc.Driver&#60;/driver-class&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;user-name&#62;root&#60;/user-name&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;password&#62;honr&#60;/password&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;/local-tx-datasource&#62; <br />
    &#60;/datasources&#62;</blockquote>
    <li>
    <p align="left">Edit the file $DEMO_BASE/jboss/server/default/deploy/pentaho.war/WEB-INF/<strong>web.xml</strong>. Add the following right below the solution5 resource-ref entry:</p>
    <blockquote>&#60;resource-ref&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;description&#62;sakila&#60;/description&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;res-ref-name&#62;jdbc/sakila&#60;/res-ref-name&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;res-type&#62;javax.sql.DataSource&#60;/res-type&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;res-auth&#62;Container&#60;/res-auth&#62;<br />
    &#60;/resource-ref&#62;</blockquote></li>
    <li>
    <p align="left">Edit the file $DEMO_BASE/jboss/server/default/deploy/pentaho.war/WEB-INF/<strong>jboss-web.xml</strong>. Add the following right below the solution5 entry:</p>
    <blockquote>&#60;resource-ref&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;res-ref-name&#62;jdbc/sakila&#60;/res-ref-name&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;res-type&#62;javax.sql.DataSource&#60;/res-type&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;jndi-name&#62;java:/sakila&#60;/jndi-name&#62;<br />
    &#60;/resource-ref&#62;</blockquote></li>
    <li>Copy your MySQL JDBC driver library into the following directory: $DEMO_BASE/jboss/server/default/lib</li>
    <li>Create your own myFirst.xaction file.</li>
    <blockquote>&#60;?xml version="1.0" encoding="UTF-8"?&#62; <br />
    &#60;action-sequence&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;name&#62;myFirst.xaction&#60;/name&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;title&#62;%title&#60;/title&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;version&#62;1&#60;/version&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;logging-level&#62;debug&#60;/logging-level&#62;<br />
    &#160;&#160;&#160;&#160;&#160; &#60;documentation&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;author&#62;Raymond Hon&#60;/author&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;description&#62;%description&#60;/description&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;help/&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;result-type&#62;rule&#60;/result-type&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;icon&#62;SQL_Datasource.png&#60;/icon&#62;<br />
    &#160;&#160;&#160;&#160; &#60;/documentation&#62;<br />
    &#160;&#160;&#160;&#160; &#60;inputs/&#62;<br />
    &#160;&#160;&#160;&#160; &#60;outputs&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;rule-result&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#160; &#160;&#160; &#60;type&#62;list&#60;/type&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;/rule-result&#62;<br />
    &#160;&#160;&#160;&#160; &#60;/outputs&#62;<br />
    &#160;&#160;&#160;&#160; &#60;resources/&#62;<br />
    &#160;&#160;&#160;&#160; &#60;actions&#62; <br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;action-definition&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;action-outputs&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;rule-result type="list" /&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;/action-outputs&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;component-name&#62;SQLLookupRule&#60;/component-name&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;action-type&#62;rule&#60;/action-type&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;component-definition&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;jndi&#62;sakila&#60;/jndi&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;query&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;![CDATA[select * from actor where actor_id = 1]]&#62; <br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;/query&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;/component-definition&#62;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#60;/action-definition&#62; <br />
    &#160;&#160;&#160;&#160; &#60;/actions&#62; <br />
    &#60;/action-sequence&#62;</blockquote>
    <li>Create your own myFirst.properties with title and description.</li>
    <li>Restart pentaho</li>
    <li>Open up Firefox (cough IE) and go to the following URL: <a href="http://localhost:8080/pentaho/ViewAction?&#38;solution=samples&#38;path=mysql&#38;action=myFirst.xaction">http://localhost:8080/pentaho/ViewAction?&#38;solution=samples&#38;path=mysql&#38;action=<strong>myFirst.xaction</strong></a></li>
    <li>That's it! :)</li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>The goal of this post is to walk you through an awesome business intelligence framework named &#8220;Pentaho&#8221;. I believe in the philosophy of &#8220;Learn by Practice&#8221;, so I will show you the steps to get Pentaho up and running for a fictitious company. By working through this exercise, you should come to understand how Pentaho works and what features it provides. Let&#8217;s start. <img src='http://www.solutionhacker.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  </p>
<h2><strong>Installation of Pentaho</strong></h2>
<ol>
<li>Download the Pentaho demo (PCI) <a href="http://www.pentaho.org/download/" target="_blank">here</a>.</li>
<li>Read the <em>Pentaho Quick Start</em> and <em>Creating Pentaho Solutions</em> PDF documents. You can get them from the same download center.</li>
<li>Unzipping the downloaded file produces a <font size="2" face="Courier" color="#ff6633">pentaho-demo</font> directory. This is the server root, commonly referred to as the <em>PCI root</em> or <em>PCI install directory</em>. To start the server, Windows users run <font size="2" face="Courier" color="#ff6633">start-pentaho.bat</font>; *nix users run <font size="2" face="Courier" color="#ff6633">start-pentaho.sh</font>.</li>
<li>Open a web browser and navigate to <a href="http://localhost:8080/" target="_pentaho"><font size="2" color="#6699cc">http://localhost:8080/</font></a>. This may take a little while &#8211; the server needs to warm up.</li>
<li>You should now see the Pentaho web front end. Try these samples to make sure the setup is correct:
<ul>
<li><a href="http://localhost:8080/pentaho/Navigate?solution=samples&amp;path=getting-started" target="_pentaho"><font color="#6699cc">Getting Started</font></a>::<a href="http://localhost:8080/pentaho/ViewAction?&amp;solution=samples&amp;path=getting-started&amp;action=HelloWorld.xaction" target="_pentaho"><font color="#666699">Hello World</font></a></li>
<li><a href="http://localhost:8080/pentaho/Navigate?&amp;solution=samples&amp;path=reporting&amp;action=" target="_pentaho"><font color="#666699">Reporting</font></a></li>
<li><a href="http://localhost:8080/pentaho/Navigate?&amp;solution=samples&amp;path=charts&amp;action="><font color="#6699cc">Chart Examples</font></a>. Shows some of the included charting capabilities</li>
<li><a href="http://localhost:8080/pentaho/Navigate?&amp;solution=samples&amp;path=analysis&amp;action="><font color="#6699cc">Analysis / OLAP Examples</font></a>. Demonstrates slice and dice</li>
<li><a href="http://localhost:8080/pentaho/Navigate?&amp;solution=samples&amp;path=dashboard&amp;action="><font color="#6699cc">Dashboards</font></a>. There is demo &#8220;Flash Dashboard&#8221; that actually uses XML-driven <a href="http://www.maani.us/xml_charts/" target="_blank">free Flash chart</a>. Look pretty good! However, I suggest to use FLEX Charting although it is not free</li>
</ul>
</li>
<li>When you&#8217;re done with Pentaho, locate the <font size="2" face="Courier" color="#ff6633">stop-pentaho</font> script in the PCI installation directory and run it to stop the server.</li>
<li>For more information on how to set up PCI as a server and how to configure the email service, take a look at <a href="http://rpbouman.blogspot.com/2006/07/hands-on-mysql-samples-for-pentaho-2_12.html" target="_blank">Roland&#8217;s blog</a>.</li>
</ol>
<h2><strong>Create a new sample</strong></h2>
<ol>
<li>Start the Pentaho demo as described above.</li>
<li>Create a folder named &#8220;mysql&#8221; under %PCI%/pentaho-solutions/samples, so that its path is %PCI%/pentaho-solutions/samples/mysql.</li>
<li>To make the mysql folder appear on the entry page, put an index.xml file in it, as shown below. Notice that the name and description use variables; their values are defined in index.properties in the same folder. This indirection supports internationalization: for example, you can define index_cn.properties for Chinese wording. <strong>Note</strong>: Click &#8220;Update Solution Repository&#8221; under the Admin tab to pick up the change.</li>
<blockquote><p>&lt;index&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;name&gt;%directory_name&lt;/name&gt; <br />
    &#160;&#160;&#160;&#160;&#160; &lt;description&gt;%directory_description&lt;/description&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;icon&gt;folder.png|dashboard.jpg&lt;/icon&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;visible&gt;true&lt;/visible&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;display-type&gt;list&lt;/display-type&gt; <br />
    &lt;/index&gt;</p></blockquote>
<li>Download the MySQL sample database, sakila. Here is the <a href="http://www.stardata.it/sakila/sakila.html" target="_blank">schema view</a>.</li>
<li>Download the <a href="http://dev.mysql.com/downloads/mysql/5.0.html#win32" target="_blank">MySQL v5 database</a> and its <a href="http://dev.mysql.com/downloads/connector/j/5.0.html" target="_blank">JDBC driver</a>.</li>
<li>Run sakila_schema.sql and then sakila_data.sql against the MySQL database (schema first, so that the tables exist before the data is loaded). Now you have your sample movie database ready.</li>
<li>
<p align="left">Create a file named <strong>mysql-ds.xml</strong> in $DEMO_BASE/jboss/server/default/deploy/</p>
</li>
<blockquote><p>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt; <br />
    &lt;datasources&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;local-tx-datasource&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;jndi-name&gt;sakila&lt;/jndi-name&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;connection-url&gt;jdbc:mysql://localhost/sakila&lt;/connection-url&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;driver-class&gt;com.mysql.jdbc.Driver&lt;/driver-class&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;user-name&gt;root&lt;/user-name&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;password&gt;honr&lt;/password&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;/local-tx-datasource&gt; <br />
    &lt;/datasources&gt;</p></blockquote>
<li>
<p align="left">Â&#160;Edit the file $DEMO_BASE/jboss/server/default/deploy/<a title="Linkification: http://pentaho.war/WEB-INF/" href="http://pentaho.war/WEB-INF/" class="linkification-ext">pentaho.war/WEB-INF/</a><strong>web.xml</strong>. Add the following right below solution5 resource-ref entry</p>
<blockquote><p>&lt;resource-ref&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;description&gt;sakila&lt;/description&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;res-ref-name&gt;jdbc/sakila&lt;/res-ref-name&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;res-type&gt;javax.sql.DataSource&lt;/res-type&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;res-auth&gt;Container&lt;/res-auth&gt;<br />
    &lt;/resource-ref&gt;</p></blockquote>
</li>
<li>
<p align="left">Â&#160;Edit the file $DEMO_BASE/jboss/server/default/deploy/<a title="Linkification: http://pentaho.war/WEB-INF/" href="http://pentaho.war/WEB-INF/" class="linkification-ext">pentaho.war/WEB-INF/</a><strong>jboss-web.xml</strong>. Add the following right below theÂ&#160;solution5 entry.</p>
<blockquote><p>&lt;resource-ref&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;res-ref-name&gt;jdbc/sakila&lt;/res-ref-name&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;res-type&gt;javax.sql.DataSource&lt;/res-type&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;jndi-name&gt;java:/sakila&lt;/jndi-name&gt;<br />
    &lt;/resource-ref&gt;</p></blockquote>
</li>
<li>Copy your MySQL JDBC driver library into the following directory: $DEMO_BASE/jboss/server/default/lib</li>
<li>Create your own myFirst.xaction file.</li>
<blockquote><p>&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt; <br />
    &lt;action-sequence&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;name&gt;myFirst.xaction&lt;/name&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;title&gt;%title&lt;/title&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;version&gt;1&lt;/version&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;logging-level&gt;debug&lt;/logging-level&gt;<br />
    &#160;&#160;&#160;&#160;&#160; &lt;documentation&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;author&gt;Raymond Hon&lt;/author&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;description&gt;%description&lt;/description&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;help/&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;result-type&gt;rule&lt;/result-type&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;icon&gt;SQL_Datasource.png&lt;/icon&gt;<br />
    &#160;&#160;&#160;&#160; &lt;/documentation&gt;<br />
    &#160;&#160;&#160;&#160; &lt;inputs/&gt;<br />
    &#160;&#160;&#160;&#160; &lt;outputs&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;rule-result&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &#160; &#160;&#160; &lt;type&gt;list&lt;/type&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;/rule-result&gt;<br />
    &#160;&#160;&#160;&#160; &lt;/outputs&gt;<br />
    &#160;&#160;&#160;&#160; &lt;resources/&gt;<br />
    &#160;&#160;&#160;&#160; &lt;actions&gt; <br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;action-definition&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;action-outputs&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;rule-result type=&quot;list&quot; /&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;/action-outputs&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;component-name&gt;SQLLookupRule&lt;/component-name&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;action-type&gt;rule&lt;/action-type&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;component-definition&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;jndi&gt;sakila&lt;/jndi&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;query&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;![CDATA[select * from actor where actor_id = 1]]&gt; <br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;/query&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;/component-definition&gt;<br />
    &#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; &lt;/action-definition&gt; <br />
    &#160;&#160;&#160;&#160; &lt;/actions&gt; <br />
    &lt;/action-sequence&gt;</p></blockquote>
<li>Create your own myFirst.properties, defining the title and description values referenced by the xaction.</li>
<li>Restart Pentaho.</li>
<li>Open up Firefox (cough IE) and hit the following URL: <a href="http://localhost:8080/pentaho/ViewAction?&amp;solution=samples&amp;path=mysql&amp;action=myFirst.xaction">http://localhost:8080/pentaho/ViewAction?&amp;solution=samples&amp;path=mysql&amp;action=<strong>myFirst.xaction</strong></a></li>
<li>That&#8217;s it! <img src='http://www.solutionhacker.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
</ol>
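<p>The steps above mention two properties files without showing them. Here is a rough sketch of what they might look like, assuming the keys simply match the %-prefixed variables used in index.xml and myFirst.xaction. The values are placeholders I made up for illustration, so use whatever wording suits your project.</p>
<blockquote><p># index.properties (sits next to index.xml in the mysql folder)<br />
    # assumed keys: they mirror %directory_name and %directory_description<br />
    directory_name=MySQL Samples<br />
    directory_description=Samples that query the sakila database<br />
    <br />
    # myFirst.properties (sits next to myFirst.xaction)<br />
    # assumed keys: they mirror %title and %description<br />
    title=My First Action Sequence<br />
    description=Looks up one actor from the sakila database</p></blockquote>
<p>A translated copy such as the index_cn.properties file mentioned earlier would presumably keep the same keys and change only the values.</p>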
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/collective-intelligence/pentaho-quick-start/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Pentaho Reporting Framework &#8211; Architecture</title>
		<link>http://www.solutionhacker.com/data-intelligence/collective-intelligence/pentaho-reporting-framework-architecture/</link>
		<comments>http://www.solutionhacker.com/data-intelligence/collective-intelligence/pentaho-reporting-framework-architecture/#comments</comments>
		<pubDate>Thu, 07 Jun 2007 00:14:12 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Extract Intelligence]]></category>

		<guid isPermaLink="false">http://www.solutionhacker.com/2007/06/06/pentaho-reporting-framework-architecture/</guid>
		<description><![CDATA[I am currently looking into Pentaho for my project. It looks very promising so far. Here is the <a target="_blank" href="http://www.pentaho.com/products/demos/pentaho_architecture/Pentaho_Architecture_controller.swf">video</a> that talks about its architecture. Enjoy.]]></description>
			<content:encoded><![CDATA[<p>I am currently looking into Pentaho for my project. It looks very promising so far. Here is the <a target="_blank" href="http://www.pentaho.com/products/demos/pentaho_architecture/Pentaho_Architecture_controller.swf">video</a> that talks about its architecture. Enjoy.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.solutionhacker.com/data-intelligence/collective-intelligence/pentaho-reporting-framework-architecture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

