Solution Hacker

This blog provides solutions for entrepreneurs!


Java System Architecture Resources - Links

Posted by admin on September 22nd, 2008

Memory Management

One strength of the Java™ 2 Platform, Standard Edition (J2SE™) is that it performs automatic memory management, thereby shielding the developer from the complexity of explicit memory management. However, that doesn’t mean memory leaks cannot happen. So I decided to summarize a few key areas of this topic:

  1. How does the garbage collector work?
  2. How can memory leaks still occur?
  3. What is a weak reference?
  4. What is stored on the heap and on the stack?

If you want to understand this topic in detail, you can read this article from Sun. The key points are summarized below:

  1. Without automatic memory management (GC), we may face two common issues: dangling references (an object is deallocated while others are still referencing it) and space leaks (objects that are no longer referenced are never deallocated). GC is great, but it doesn’t solve every memory allocation problem. For example, a list of objects can keep growing until it uses up all the free memory.
  2. Garbage collection takes time and resources, which is sometimes unacceptable for real-time, mission-critical systems.
  3. The task of fulfilling an allocation request, which involves finding a block of unused memory of a certain size in the heap, is a difficult one. The main problem for most dynamic memory allocation algorithms is to avoid fragmentation while keeping both allocation and deallocation efficient. One approach to eliminating fragmentation is called compaction.
  4. It is also desirable that a garbage collector operate efficiently, without introducing long pauses during which the application is not running.
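To make item 1 concrete, here is a minimal Java sketch (the class and method names are hypothetical) of a leak that the garbage collector cannot fix: buffers kept in an ever-growing static list stay strongly reachable, so they are never collected. It also shows a java.lang.ref.WeakReference, which does not keep its referent alive. Note that System.gc() is only a hint, so whether the weak reference is actually cleared at that point is not guaranteed.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class LeakDemo {
    // A static list lives as long as its class is loaded, so every buffer
    // added here stays strongly reachable and can never be reclaimed by the GC.
    private static final List<byte[]> HISTORY = new ArrayList<>();

    // Hypothetical request handler that "remembers" every buffer it allocates.
    static void handleRequest() {
        byte[] buffer = new byte[1024];
        HISTORY.add(buffer); // forgotten reference: a logical memory leak
    }

    static int retainedBuffers() {
        return HISTORY.size();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            handleRequest();
        }
        // All 1,000 buffers are still reachable through HISTORY.
        System.out.println("Retained buffers: " + retainedBuffers());

        // A WeakReference does not prevent its referent from being collected.
        Object payload = new Object();
        WeakReference<Object> weak = new WeakReference<>(payload);
        // While a strong reference exists, get() returns the same object.
        System.out.println("Still reachable: " + (weak.get() == payload));

        payload = null; // drop the only strong reference
        System.gc();    // only a hint: collection and clearing are not guaranteed
        System.out.println("Cleared after GC hint: " + (weak.get() == null));
    }
}
```

The fix for this kind of leak is not a better collector but removing the stale references (or holding them weakly when the program can tolerate losing them).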

 

under: 11. Architect Corner
Tags: algorithm, dangling reference, garbage collector, memory leak, memory management

Amazon Web Service Solutions

Posted by admin on September 7th, 2008

When we talk about SOA, I think of Amazon. It is the company that took SOA to the next level, proving to the world that it is a viable approach. So I decided to spend some time learning from Amazon: reviewing the web services it provides, reading related interviews and blogs, studying how to build an application on top of its infrastructure, and developing an application that consumes data from its web services. I believe the best way to learn SOA is to get a taste of the services offered by a company that relies so heavily on it to scale its business.

Before I delve deeper, I need to clarify one thing. Many people use the terms SOA and web service interchangeably. To be honest, I was one of them. However, by definition they are not the same. SOA is about design; web services are a specific technology set that supports distributed computing. Web services make it easier to create a service-based system, but only if your developers follow SOA design principles, where functions are packaged into modular, shareable, distributable services that can be used and reused by multiple consumers.

At Amazon, each service is independent and encapsulates three things: data, business logic, and a public service interface. Each service owns its data, which is never accessed directly by other services. According to its CTO, this is the core architecture that lets Amazon scale.
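The "each service owns its data" principle can be sketched in Java as follows. This is an illustrative sketch, not Amazon's code: InventoryService, InMemoryInventoryService, and OrderService are hypothetical names. The inventory data is private to its service, and the order service can reach it only through the public interface, never by touching the data store directly.

```java
import java.util.HashMap;
import java.util.Map;

// Public service interface: the only contract other services may depend on.
interface InventoryService {
    int available(String sku);
    boolean reserve(String sku, int quantity);
}

// Implementation that encapsulates its own data and business logic.
class InMemoryInventoryService implements InventoryService {
    // Data owned exclusively by this service; no other service sees the map.
    private final Map<String, Integer> stock = new HashMap<>();

    InMemoryInventoryService() {
        stock.put("BOOK-1", 5); // hypothetical starting stock
    }

    @Override
    public int available(String sku) {
        return stock.getOrDefault(sku, 0);
    }

    @Override
    public boolean reserve(String sku, int quantity) {
        int current = available(sku);
        if (quantity <= 0 || quantity > current) {
            return false; // business rule enforced inside the owning service
        }
        stock.put(sku, current - quantity);
        return true;
    }
}

// A consumer service that talks to inventory only through its interface.
class OrderService {
    private final InventoryService inventory;

    OrderService(InventoryService inventory) {
        this.inventory = inventory;
    }

    String placeOrder(String sku, int quantity) {
        return inventory.reserve(sku, quantity) ? "CONFIRMED" : "REJECTED";
    }
}

public class SoaSketch {
    public static void main(String[] args) {
        InventoryService inventory = new InMemoryInventoryService();
        OrderService orders = new OrderService(inventory);
        System.out.println(orders.placeOrder("BOOK-1", 3)); // CONFIRMED
        System.out.println(orders.placeOrder("BOOK-1", 3)); // REJECTED (only 2 left)
        System.out.println(inventory.available("BOOK-1"));  // 2
    }
}
```

In a real SOA deployment the interface would be exposed over the network (for example as a web service), so the implementation and its storage can change without affecting any consumer.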
 

 


under: 05. Scale your site, Uncategorized

Database Performance - Indexing

Posted by admin on August 24th, 2008

I focus on two main things when analyzing a database. First, how it manages data. Second, how it scales in terms of data volume and traffic. Today, I will talk about the indexing scheme that most databases use: B-Tree indexing.

B-Tree Indexing

 
Many people think B-Tree means binary tree. That is not right. If the index were structured as a binary tree, 1 million index values would form a very deep tree: even a perfectly balanced binary tree needs about 20 levels (2^20 is roughly 1 million), and each node retrieval is equivalent to a read operation. It could take around 20 reads just to get down to a leaf node, which would perform poorly. Instead, the B in B-Tree is commonly read as "balanced". A B-tree is balanced because it never becomes lopsided as entries are added and removed: every leaf is at the same depth. In addition, each node can have many children, so millions of records can be handled by a tree only 2-3 levels deep, which makes lookups very fast. A B-tree offers O(log n) performance for single-record lookups.
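The difference in depth is easy to estimate. Assuming every node is completely full and holds `fanout` entries (a simplification; real index pages are usually only partly full), a tree with h levels can address fanout^h keys, so the number of levels needed is the smallest h with fanout^h >= n. The Java sketch below (the method name is hypothetical) computes this for a binary tree and for B-trees with 500 and 1,000 entries per node.

```java
public class TreeDepth {
    // Smallest number of levels h such that fanout^h >= keys,
    // assuming every node is completely full (an idealization).
    static int levelsNeeded(long keys, long fanout) {
        if (keys < 1 || fanout < 2) {
            throw new IllegalArgumentException("need keys >= 1 and fanout >= 2");
        }
        int levels = 1;
        long capacity = fanout; // keys addressable by a tree with this many levels
        while (capacity < keys) {
            capacity *= fanout;
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        long keys = 1_000_000L;
        System.out.println("binary tree:    " + levelsNeeded(keys, 2));    // 20
        System.out.println("B-tree, f=500:  " + levelsNeeded(keys, 500));  // 3
        System.out.println("B-tree, f=1000: " + levelsNeeded(keys, 1000)); // 2
    }
}
```

Since each level costs roughly one page read, a high fanout is what turns about 20 reads into 2 or 3.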
 
The fundamental unit of an index is the index item. An index item contains a key value that represents the value of the indexed column for a particular row. It also contains rowid information that the database server uses to locate the row in a data page. A node is an index page that stores a group of index items.
 
It is interesting to note that the leaf nodes of the index actually form a doubly linked list. Once we find where to start in the leaf nodes (the first matching value), an ordered scan of values (an index range scan) is very easy. We don’t have to navigate the tree structure any more; we just move forward through the leaf nodes. That makes range-based queries such as "BETWEEN 20 AND 30" much easier to answer.
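Here is a toy Java model of the structures just described (hypothetical class names, not any particular database's on-disk format): each index item pairs a key with a rowid, each leaf page holds a sorted group of items, and leaf pages are linked to their neighbours. Finding the starting leaf is simplified to a walk along the chain, standing in for the descent through the upper tree levels; after that, the range scan only moves forward.

```java
import java.util.ArrayList;
import java.util.List;

public class LeafRangeScan {
    // An index item: the indexed column value plus the rowid of its row.
    static final class IndexItem {
        final int key;
        final long rowid;

        IndexItem(int key, long rowid) {
            this.key = key;
            this.rowid = rowid;
        }
    }

    // A leaf node (index page): a sorted group of items, linked to its neighbours.
    static final class LeafPage {
        final List<IndexItem> items = new ArrayList<>();
        LeafPage prev; // doubly linked, so descending scans are possible too
        LeafPage next;
    }

    // Packs items (already sorted by key) into a chain of leaf pages.
    static LeafPage buildLeaves(List<IndexItem> sorted, int itemsPerPage) {
        LeafPage head = new LeafPage();
        LeafPage current = head;
        for (IndexItem item : sorted) {
            if (current.items.size() == itemsPerPage) {
                LeafPage fresh = new LeafPage();
                fresh.prev = current;
                current.next = fresh;
                current = fresh;
            }
            current.items.add(item);
        }
        return head;
    }

    // Index range scan for low <= key <= high, returning the matching rowids.
    static List<Long> rangeScan(LeafPage head, int low, int high) {
        List<Long> rowids = new ArrayList<>();
        LeafPage page = head;
        // Stand-in for descending the upper levels of the tree: skip leaf
        // pages whose largest key is still below 'low'.
        while (page != null && !page.items.isEmpty()
                && page.items.get(page.items.size() - 1).key < low) {
            page = page.next;
        }
        // From the starting leaf, just walk forward along the linked list.
        for (; page != null; page = page.next) {
            for (IndexItem item : page.items) {
                if (item.key > high) {
                    return rowids; // keys are sorted, so nothing later can match
                }
                if (item.key >= low) {
                    rowids.add(item.rowid);
                }
            }
        }
        return rowids;
    }

    public static void main(String[] args) {
        List<IndexItem> items = new ArrayList<>();
        for (int key = 0; key < 50; key += 5) {
            items.add(new IndexItem(key, 1000L + key)); // hypothetical rowids
        }
        LeafPage head = buildLeaves(items, 3);
        System.out.println(rangeScan(head, 20, 30)); // [1020, 1025, 1030]
    }
}
```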
 

When should you use a B-Tree index?

To understand when you should use a B-Tree index, you should know there are two ways to use an index. First, you can use an index as a means to access rows in a table via their rowids. If you use an index this way, you want to access only a very small percentage of the rows in the table. Otherwise, you go through the "index, then row" cycle many times, which implies many I/Os, and that is worse than reading rows in large batches to reduce the number of I/Os (the costly part of a database operation). According to experiments, a full table scan is faster once you access a high enough percentage of the rows through an index. Second, you can use an index to answer the query when the index contains enough information to answer the entire query. In this case, we don’t need to go to the table at all; the index serves as a thinner version of the table. So, if you need to access a large percentage of rows, consider answering the query from the information in the index alone.
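A back-of-the-envelope model shows why the "index, then row" pattern loses for large result sets. Assume, purely for illustration, that each row fetched through the index costs one random page read, that a full scan reads every data page once sequentially, and that a random read costs `randomToSequentialCost` sequential reads. `rowsPerPage` and `randomToSequentialCost` are hypothetical parameters, not measured values; the model also ignores index pages and caching. Under these assumptions the break-even fraction of rows is 1 / (rowsPerPage × randomToSequentialCost).

```java
public class ScanVsIndex {
    // Cost, in sequential-page-read units, of fetching matchingRows through
    // the index when each row costs one random page read.
    static double indexAccessCost(long matchingRows, double randomToSequentialCost) {
        return matchingRows * randomToSequentialCost;
    }

    // Cost of reading every data page once, sequentially.
    static double fullScanCost(long totalRows, int rowsPerPage) {
        long pages = (totalRows + rowsPerPage - 1) / rowsPerPage; // ceiling division
        return pages;
    }

    // Fraction of the table above which the full scan becomes cheaper.
    static double breakEvenFraction(int rowsPerPage, double randomToSequentialCost) {
        return 1.0 / (rowsPerPage * randomToSequentialCost);
    }

    public static void main(String[] args) {
        long totalRows = 1_000_000L;
        int rowsPerPage = 100;    // hypothetical
        double randomCost = 4.0;  // hypothetical: one random read = 4 sequential reads
        double fraction = breakEvenFraction(rowsPerPage, randomCost);
        System.out.printf("break-even at %.2f%% of rows%n", fraction * 100);
        double scan = fullScanCost(totalRows, rowsPerPage);
        for (long matching : new long[] {1_000, 5_000, 10_000}) {
            String winner = indexAccessCost(matching, randomCost) < scan ? "index" : "full scan";
            System.out.println(matching + " matching rows -> " + winner);
        }
    }
}
```

The exact threshold depends entirely on the assumed parameters, but the shape of the result matches the rule of thumb above: the index wins only for a small fraction of the table.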

MySQL Indexing

There are several rules to remember for MySQL indexing:

  1. MySQL traditionally uses only one index per table per query (UNION is the exception, because each part is treated as a separate query). Since MySQL 5.0, the index merge optimization can sometimes combine several indexes on one table.
  2. To get around the one-index limit, you can create multicolumn indexes.
  3. When there is more than one index to choose from, MySQL makes an educated guess based on the statistics it has gathered.
  4. MyISAM keeps indexes in a file completely separate from the table rows. The rows are stored in no particular key order and are located through the rowid held in each index item.
  5. InnoDB uses a clustered index: each record is stored together with its primary key, and records are kept in primary-key order. When your data is almost always searched via its primary key, a clustered index can make lookups incredibly fast because a single lookup retrieves the record in question.
  6. A primary key cannot contain NULL values, whereas a unique index can.
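The contrast between rules 4 and 5 can be modelled with ordinary Java collections. This is a conceptual sketch only (class names are hypothetical; real storage engines work with pages on disk): the MyISAM-style table keeps rows in a separate "data file" addressed by position, with an index mapping each key to that position, while the InnoDB-style table keeps the rows themselves inside a structure sorted by primary key, so one lookup returns the record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class StorageLayouts {
    // MyISAM-style: rows live in a separate "data file", not sorted by key;
    // the index maps each key to the row's position (its rowid).
    static final class HeapTable {
        private final List<String> dataFile = new ArrayList<>();
        private final TreeMap<Integer, Integer> index = new TreeMap<>();

        void insert(int id, String row) {
            index.put(id, dataFile.size()); // remember where the row landed
            dataFile.add(row);
        }

        String findById(int id) {
            Integer rowid = index.get(id);                       // step 1: search the index
            return rowid == null ? null : dataFile.get(rowid);   // step 2: fetch the row
        }
    }

    // InnoDB-style clustered index: the row is stored with its primary key,
    // and rows are kept in primary-key order.
    static final class ClusteredTable {
        private final TreeMap<Integer, String> clustered = new TreeMap<>();

        void insert(int id, String row) {
            clustered.put(id, row);
        }

        String findById(int id) {
            return clustered.get(id); // a single lookup returns the record itself
        }

        List<String> rowsInPrimaryKeyOrder() {
            return new ArrayList<>(clustered.values()); // already sorted by PK
        }
    }

    public static void main(String[] args) {
        HeapTable myisam = new HeapTable();
        ClusteredTable innodb = new ClusteredTable();
        for (int id : new int[] {42, 7, 19}) {
            myisam.insert(id, "row-" + id);
            innodb.insert(id, "row-" + id);
        }
        System.out.println(myisam.findById(7));              // row-7
        System.out.println(innodb.findById(7));              // row-7
        System.out.println(innodb.rowsInPrimaryKeyOrder());  // [row-7, row-19, row-42]
    }
}
```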

 

under: 15. Database Performance
Tags: B-Tree, balanced Tree, database performance

Secret of Warren Buffett

Posted by admin on August 9th, 2008

Recently I came across a great book named "Even Buffett Isn’t Perfect" that analyzes Warren Buffett. What differentiates this book from others is that it is not simply a love letter to Warren but an objective analysis of what contributed to his success. Below are some of the key points I took from this book:

  1. Buffett loves the insurance business. By steering Berkshire into insurance, Buffett was able to get his hands on a tremendous amount of float.
  2. Buffett puts a lot of focus on good-quality management, which he believes creates long-term value.
  3. Buffett relies heavily on discounted cash flow (DCF) analysis to find stocks and companies that can be purchased for less than their intrinsic value. In other words, Buffett doesn’t buy cheap stock; he buys stock cheap.
  4. The key here is calculating the intrinsic value accurately.
  5. Diversification lowers your risk, but it also locks you into the market rate of return. Buffett believes that if you know what you are doing and really understand how to evaluate a business, holding a dozen stocks is sufficient diversification.
  6. Studies show that value stocks beat growth stocks (i.e., stocks with high price multiples such as P/E) over the long term, but that growth stocks are a better bet for investors with shorter horizons. They also show that small-cap stocks do better than large-cap stocks over the long term.
  7. Buffett is a buyer of companies rather than a buyer of stocks.
  8. Buffett actively petitioned against efforts to eliminate personal taxes on dividends, even though dividends are paid to shareholders only after corporations have paid their own taxes. In other words, dividends are taxed twice. However, lower tax rates could benefit everyone by stimulating economic growth, which means better job opportunities. The author believes taxing the investment class at higher rates is likely to do more harm than good.
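Since points 3 and 4 hinge on estimating intrinsic value, here is the basic discounted cash flow arithmetic as a small Java sketch: each projected cash flow CF_t is divided by (1 + r)^t and the results are summed. The cash flows and the 10% discount rate are made-up inputs for illustration, and a real valuation would also add a terminal value; this is not the book's or Buffett's own model.

```java
public class DiscountedCashFlow {
    // Present value of cash flows received at the end of years 1..n,
    // discounted at annual rate r: sum of CF_t / (1 + r)^t.
    static double presentValue(double[] cashFlows, double rate) {
        double total = 0.0;
        double discountFactor = 1.0;
        for (double cf : cashFlows) {
            discountFactor *= (1.0 + rate); // (1 + r)^t for the current year t
            total += cf / discountFactor;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] projected = {100.0, 110.0, 121.0}; // hypothetical yearly cash flows
        double value = presentValue(projected, 0.10);
        System.out.printf("Estimated value: %.2f%n", value); // Estimated value: 272.73
    }
}
```

The output is only as good as the projected cash flows and the chosen rate, which is exactly why point 4 calls accurate estimation the key.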

 

under: Uncategorized

Evolution of XML parsing technologies

Posted by admin on July 16th, 2008

Introduction

A few years ago, there were two main XML parsing technologies: SAX and DOM.

  1. SAX is event-driven: events are fired (fire-and-forget) as the XML is parsed. Advantages: it doesn’t need to hold the whole XML document in memory, and you don’t need to wait until the whole document has been parsed before the first event is emitted. Disadvantages: it uses a push API in which the parser holds control during parsing, so clients cannot control the parsing, and it is not suited to XML manipulation.
  2. DOM converts the XML into an object tree in memory before manipulation. Advantages: the XML is easier to manipulate. Disadvantages: it eats up a lot of memory, which makes it a poor fit for documents larger than a few MB or for memory-constrained environments such as J2ME.
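The difference between the two models is visible in code. The following Java sketch uses only the JDK's built-in JAXP parsers (SAXParserFactory and DocumentBuilderFactory); the sample document and method names are illustrative. The SAX version receives callbacks as the parser pushes events and never builds a tree, while the DOM version loads the whole document into memory first, which is what makes modifying it easy.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxVsDom {
    static final String XML = "<books><book>A</book><book>B</book></books>";

    // SAX (push): the parser calls our handler for each event; no tree is kept.
    static int countBooksWithSax(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if ("book".equals(qName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    // DOM: the whole document becomes an in-memory tree that we can modify.
    static int countBooksAfterAddingOneWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element extra = doc.createElement("book");
        extra.setTextContent("C");
        doc.getDocumentElement().appendChild(extra); // manipulation is easy on a tree
        return doc.getElementsByTagName("book").getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countBooksWithSax(XML));               // 2
        System.out.println(countBooksAfterAddingOneWithDom(XML)); // 3
    }
}
```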

A pull API is a more comfortable alternative for streaming XML processing. A pull API is based on the more familiar iterator design pattern rather than the observer design pattern. In a pull API, the client program asks the parser for the next piece of information, rather than the parser telling the client program when the next datum is available. In a pull API the client program drives the parser; in a push API the parser drives the client. This led to the creation of StAX (Streaming API for XML).
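StAX ships with the JDK as javax.xml.stream. In this minimal sketch (the sample XML and method name are illustrative), the client loop calls next() to pull one event at a time. Because the client is in control, it can simply stop once it has what it needs and leave the rest of the document unparsed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPullDemo {
    // Pulls events until 'limit' <book> titles are collected, then stops early.
    static List<String> firstBookTitles(String xml, int limit) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        try {
            while (titles.size() < limit && reader.hasNext()) {
                int event = reader.next(); // the client asks for the next event
                if (event == XMLStreamConstants.START_ELEMENT
                        && "book".equals(reader.getLocalName())) {
                    titles.add(reader.getElementText()); // reads the text-only content
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<books><book>A</book><book>B</book><book>C</book></books>";
        System.out.println(firstBookTitles(xml, 2)); // [A, B]
    }
}
```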

In this article, I will introduce a new object model from Axis2 named AXIOM, which uses StAX underneath for XML parsing. With it, XML parsing uses less memory and gives you better control.

under: 11. Architect Corner
Tags: axiom, Axis2, data binding, dom, rpc, sax, soap, StAX, web service, XFire, xml, xml parser

Salesforce.com opens up Google Data API

Posted by admin on July 11th, 2008

Salesforce + Google

It is good news that Salesforce.com has made the Google Data API available on its platform. To better understand the full potential of the new platform, I googled around to see whether anyone had talked about it. Here is the first article I found that covers some use cases on this topic.

After getting a taste of its power, let’s set it up and try it ourselves.

 

under: 16. Salesforce
Tags: Google, Google Data API, salesforce

Powerful Full Text Search - Part 3 Solr

Posted by admin on July 6th, 2008

Introduction of Solr

Solr is a standalone enterprise search server with a web-services-like API. You put documents into it (called "indexing") via XML over HTTP (RESTful). You query it via HTTP GET and receive XML results.

  • Advanced Full-Text Search Capabilities
  • Optimized for High Volume Web Traffic
  • Standards Based Open Interfaces - XML and HTTP
  • Comprehensive HTML Administration Interfaces
  • Scalability - Efficient Replication to other Solr Search Servers
  • Flexible and Adaptable with XML configuration
  • Extensible Plugin Architecture
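To make the "XML over HTTP" description concrete, here is a Java sketch that only builds the two kinds of requests: an XML add message to POST to Solr's update handler, and a query URL for an HTTP GET against the select handler. It does not contact a server. The base URL http://localhost:8983/solr and the id/title field names are assumptions based on a typical example setup, so adjust them to your own deployment and schema.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrRequests {
    // Assumed base URL of a local Solr example install.
    static final String BASE_URL = "http://localhost:8983/solr";

    // Minimal XML escaping for field names and values (& must be replaced first).
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Body to POST to BASE_URL + "/update" (Content-Type: text/xml) to index one document.
    static String addDocumentXml(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            xml.append("<field name=\"").append(escapeXml(field.getKey())).append("\">")
               .append(escapeXml(field.getValue())).append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    // URL for an HTTP GET query against the select handler.
    static String selectUrl(String query) throws UnsupportedEncodingException {
        return BASE_URL + "/select?q=" + URLEncoder.encode(query, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        Map<String, String> doc = new LinkedHashMap<>(); // keeps field order stable
        doc.put("id", "1");
        doc.put("title", "Lucene & Solr");
        System.out.println(addDocumentXml(doc));
        // <add><doc><field name="id">1</field><field name="title">Lucene &amp; Solr</field></doc></add>
        System.out.println(selectUrl("title:lucene"));
        // http://localhost:8983/solr/select?q=title%3Alucene
    }
}
```

Keep in mind that Solr makes added documents searchable only after a commit, for example by posting a `<commit/>` message to the same update URL.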


under: 02. Build your site, 05. Scale your site
Tags: indexing, lucene, solr, web service

Powerful Full Text Search - Part 2 Nutch

Posted by admin on July 6th, 2008

Introduction of Nutch & Hadoop

After Lucene, its author created another powerful tool named Nutch. Nutch is a powerful crawler built on top of Lucene. With Nutch, you can launch a multi-threaded crawler to gather information from the Web. At the time of writing, Nutch is at version 0.9. Nutch comes with a list of cool features, including whole-Web crawling and local file crawling for intranets, with indexing happening all the while.

Hadoop was designed to handle the petabytes of data that Nutch could potentially store and process. In fact, Hadoop has its own file system, the Hadoop Distributed File System (HDFS), which can run on run-of-the-mill, low-cost hardware. Hadoop works by storing parts of the file system’s data across all the servers in the cluster. As new queries come in, HDFS follows the "moving computation is cheaper than moving data" rule, meaning that moving the processing of a query as close as possible to the data is faster than placing the query at random within the cluster and moving data long distances across the network.
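The "moving computation is cheaper than moving data" rule boils down to a scheduling decision: run a task on a machine that already stores the block it needs. The Java sketch below is a hypothetical, heavily simplified scheduler, not Hadoop's actual code or API; the block ids, node names, and replica map are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LocalityScheduler {
    // Hypothetical map from block id to the nodes holding a replica of it.
    private final Map<String, List<String>> replicas = new HashMap<>();

    void addBlock(String block, String... nodes) {
        replicas.put(block, Arrays.asList(nodes));
    }

    // Prefer an idle node that already stores the block (data-local).
    // Otherwise fall back to the first idle node, which must pull the block
    // over the network. Returns null when no node is idle.
    String chooseNode(String block, List<String> idleNodes) {
        List<String> holders = replicas.getOrDefault(block, Collections.<String>emptyList());
        for (String node : holders) {
            if (idleNodes.contains(node)) {
                return node; // move the computation to the data
            }
        }
        return idleNodes.isEmpty() ? null : idleNodes.get(0); // move the data instead
    }

    public static void main(String[] args) {
        LocalityScheduler scheduler = new LocalityScheduler();
        scheduler.addBlock("blk_1", "node-a", "node-c");
        scheduler.addBlock("blk_2", "node-b");
        List<String> idle = Arrays.asList("node-c", "node-d");
        System.out.println(scheduler.chooseNode("blk_1", idle)); // node-c (holds a replica)
        System.out.println(scheduler.chooseNode("blk_2", idle)); // node-c (must read remotely)
    }
}
```

Real Hadoop schedulers are more nuanced; for instance, they also prefer a node in the same rack when no replica holder is free.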

I have searched around to see if anyone could give me some tips on this tool. Surprisingly, I didn’t find much. But don’t worry, I have found some resources that can at least get you started playing with it.


under: 02. Build your site, 11. Architect Corner
Tags: crawler, hadoop, HDFS, indexing service, lucene, nutch

Powerful Full Text Search Engine - Part 1 Lucene Introduction

Posted by admin on July 4th, 2008

Introduction of Lucene

I have heard of Lucene and its powerful full-text search capability many times. Today, I decided to take a look at it. Before diving into the user guide, I went to Google Tech Talks to find a video about Lucene. Here is what I found:

After watching this video, I found Lucene to be a really great tool, so I decided to take a deeper look. After a quick search, I found a great blog post that showed me how to use Lucene with Digg. With Solr on top of Lucene, you can make Lucene available as a RESTful web service. Isn’t that awesome? In this article, I will list all the information I found during my little research on Lucene, and I hope you find it useful.


under: 02. Build your site, 10. Unleash your system
Tags: digester, full-text search, grep, lucene, REST, solr, web service

Grid Computing - Part 1 Introduction

Posted by admin on July 3rd, 2008

Introduction from Cameron Purdy

 
