
Plenty of Fish – Cash cow!

A site called “PlentyOfFish.com” is currently getting 30 million hits a day. That number doesn’t blow me away. What surprises me, however, is that the site is basically run by a single man, “Markus Frind”. How does he achieve that? If you want to hear how he does it, you can watch his interview via this link. Otherwise, you can read the summary I put together from his interview.

The stuff I learnt from Markus

You may think that Markus must spend a lot of $$ to maintain his site. A picture of a server farm may pop into your head. Hahaha… all he needs is 1 web server and 3 database servers. That is a cost you and I can afford. No need to write a business plan and wait for VC $$ nowadays. :grin:

Here are some quick tips from Markus:

  1. You need a lot of RAM. RAM is cheap, so go ahead and power up your box with tons of it!
  2. Markus uses the Akamai CDN to offload the bandwidth of serving images to different locales.
  3. Separate database read and write operations.
  4. Markus uses one database as the master for writes and 2 databases as slaves to handle searches (reads). According to him, radius-based searches demand lots of resources. “If you have one system to do just one thing, it will do it much more efficiently.”
  5. Markus puts lots of RAM in both the web and DB servers. “If you can load your whole db in the RAM, do it!”
  6. Optimizing DB access is the key to handling lots of requests.
  7. Denormalization is necessary if you want to reduce the number of joins that can potentially slow down your queries.
  8. PlentyOfFish.com relies purely on “Word of Mouth” marketing. Do things right and your users will spread the word for you. Cheapest marketing strategy ever!
  9. PlentyOfFish.com is a FREE site. Because it is free, it doesn’t have strict requirements such as high uptime. It can go down without causing much trouble.
  10. PlentyOfFish.com is monetized solely through advertising such as Google Ads. From this alone, Markus is making around 10 million annually. Amazing!
  11. PlentyOfFish.com runs purely on Microsoft technology: IIS, ASP.NET and SQL Server. That said, you could build the same thing with other stacks such as Apache, Spring and MySQL.
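Tips 3 and 4 (one master for writes, two slaves for reads) boil down to a routing decision in the data-access layer. Below is a minimal sketch, assuming a hypothetical ReadWriteRouter class that hands out connection URLs; the URLs are placeholders and round-robin is just one simple way to spread reads. The interview doesn’t go into code, so this is not how PlentyOfFish actually does it.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical router: writes go to the master, reads rotate over the slaves. */
public class ReadWriteRouter {
    private final String masterUrl;
    private final List<String> slaveUrls;
    private final AtomicInteger next = new AtomicInteger();

    public ReadWriteRouter(String masterUrl, List<String> slaveUrls) {
        this.masterUrl = masterUrl;
        this.slaveUrls = List.copyOf(slaveUrls);
    }

    /** INSERT/UPDATE/DELETE must always hit the master. */
    public String forWrite() {
        return masterUrl;
    }

    /** Searches are spread over the slaves in round-robin order. */
    public String forRead() {
        if (slaveUrls.isEmpty()) {
            return masterUrl; // no slaves configured: fall back to the master
        }
        // floorMod keeps the index valid even after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), slaveUrls.size());
        return slaveUrls.get(i);
    }

    public static void main(String[] args) {
        ReadWriteRouter router = new ReadWriteRouter(
                "jdbc:sqlserver://db-master", // placeholder URLs
                List.of("jdbc:sqlserver://db-slave1", "jdbc:sqlserver://db-slave2"));
        System.out.println(router.forWrite()); // jdbc:sqlserver://db-master
        System.out.println(router.forRead());  // jdbc:sqlserver://db-slave1
        System.out.println(router.forRead());  // jdbc:sqlserver://db-slave2
    }
}
```

One caveat with this design: slaves lag behind the master, so a read that must see a write the user just made may need to go to the master instead.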

I love to see how people like Markus beat giants like Match.com. One man beats hundreds of people with a simple system setup. Incredible! Folks, there is no excuse to whine about having no $$ to start your business! :lol:

Although Markus makes it sound easy in the interview, there are areas the interviewer didn’t cover:

  1. The PlentyOfFish.com front end doesn’t look good. How did it attract its first set of users in the first place? Because it is FREE?
  2. If you go to a FREE site with no data, you may leave right away. How did PlentyOfFish.com attract its first real users? Did PlentyOfFish.com crawl competitors’ data to bootstrap the site?
  3. PlentyOfFish.com makes $$ purely from Google AdSense. However, according to John Chow, AdSense is not a good place to make $$. Why is that?

What could go wrong with his approach:

His database architecture follows the traditional master-slave approach. It can offload reads but not writes. Obviously the master becomes the write bottleneck and a single point of failure. And as load increases, the cost of replication increases as well: replication costs CPU, network bandwidth and disk I/O. The slaves fall behind and serve stale data. The folks at YouTube had a big problem with replication overhead as they scaled. This problem can be tackled by sharding/federation. I will discuss this topic later.
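Sharding splits the data itself, so each database holds only some of the users and takes only their writes. A minimal sketch, assuming users are sharded by a numeric user ID with a simple modulo rule (a hypothetical scheme for illustration; the shard names are placeholders):

```java
import java.util.List;

/** Hypothetical shard map: each user lives on exactly one database. */
public class ShardRouter {
    private final List<String> shardUrls;

    public ShardRouter(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shardUrls = List.copyOf(shardUrls);
    }

    /** All reads and writes for a given user go to the same shard. */
    public String shardFor(long userId) {
        // floorMod keeps the index non-negative even for negative IDs
        int index = (int) Math.floorMod(userId, (long) shardUrls.size());
        return shardUrls.get(index);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(List.of("db0", "db1", "db2"));
        System.out.println(router.shardFor(42)); // db0 (42 % 3 == 0)
        System.out.println(router.shardFor(43)); // db1
    }
}
```

The catch with a plain modulo rule is that adding a fourth shard changes the home of most users, which is why a lookup directory or consistent hashing is often preferred in practice.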

 


Powerful Full Text Search – Part 3 Solr

Introduction of Solr

Solr is a standalone enterprise search server with a web-services-like API. You put documents in it (called "indexing") via XML over HTTP (RESTful). You query it via HTTP GET and receive XML results.

  • Advanced Full-Text Search Capabilities
  • Optimized for High Volume Web Traffic
  • Standards Based Open Interfaces – XML and HTTP
  • Comprehensive HTML Administration Interfaces
  • Scalability – Efficient Replication to other Solr Search Servers
  • Flexible and Adaptable with XML configuration
  • Extensible Plugin Architecture

Set up Solr

To set up Solr, follow this guideline. Once Solr is set up, you effectively have an indexing service running.

The HTTP/XML interface of the indexer has two main access points: the update URL, which maintains the index, and the select URL, which is used for queries. In the default configuration, they are found at:

  • http://[hostname:port]/solr/update
  • http://[hostname:port]/solr/select

To add a document to the index, we POST an XML representation of the fields to index to the update URL. You can also delete documents, and update them (i.e. re-post a document with the same unique key). All change operations require a commit before they are flushed to the file system. Once we have indexed some data, an HTTP GET on the select URL does the querying.
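To make the two access points concrete, here is a minimal sketch using only the JDK’s HttpURLConnection. It assumes the default example install at http://localhost:8983/solr and a schema with id and name fields (both assumptions; adjust to your setup). The messages follow Solr’s XML update format (&lt;add&gt;&lt;doc&gt;, &lt;commit/&gt;); the helper names are my own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class SolrXmlClient {
    static final String BASE = "http://localhost:8983/solr"; // assumed example install

    /** Builds an <add> message for one document (field name -> value). */
    static String addXml(Map<String, String> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            xml.append("<field name=\"").append(escape(f.getKey())).append("\">")
               .append(escape(f.getValue())).append("</field>");
        }
        return xml.append("</doc></add>").toString();
    }

    /** Minimal XML escaping for text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** The select URL: an HTTP GET with the query in the q parameter. */
    static String selectUrl(String query) {
        return BASE + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    /** POSTs an XML message to the update URL and returns the HTTP status. */
    static int postUpdate(String xml) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(BASE + "/update").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        int status = conn.getResponseCode();
        try (InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream()) {
            if (in != null) in.readAllBytes(); // drain the small XML response
        }
        return status;
    }

    public static void main(String[] args) throws IOException {
        postUpdate(addXml(Map.of("id", "doc1", "name", "Plenty of Fish")));
        postUpdate("<commit/>"); // changes become searchable only after the commit
        System.out.println("Query with: " + selectUrl("name:fish"));
    }
}
```

Deleting works the same way: post &lt;delete&gt;&lt;id&gt;doc1&lt;/id&gt;&lt;/delete&gt; to the update URL, then &lt;commit/&gt;.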

Powerful features Behind Solr

If you followed the guideline above, you are already familiar with indexing, searching and faceted browsing. Now let’s get down to making Solr a scalable, high-performance solution.

Caching

TBA

Distribution and Replication

For applications that receive large volumes of queries, a single Solr server may not be enough to meet performance requirements. Therefore, Solr provides mechanisms for replicating the Lucene index across multiple servers that are part of a load-balanced suite of query servers. The replication process is handled through a combination of event listeners enabled through the solrconfig.xml file and several shell scripts (located in solr/bin of the example application).

In a replicating architecture, one Solr server acts as the master server, providing copies of the index (called snapshots) to one or more slave servers that handle query requests. Indexing commands are sent to the master server and queries are sent to the slave servers. The master server can create snapshots manually, or automatically by configuring the <updateHandler> section of solrconfig.xml to trigger snapshot creation when commit and/or optimize events are received. In either the manual or the event-driven process, the snapshooter script is invoked on the master server, creating a directory on the server named snapshot.yyyymmddHHMMSS, where yyyymmddHHMMSS is the actual time the snapshot was created. The slave servers then use rsync to copy only those files in the Lucene index that have changed. For example, the following listener in the <updateHandler> section runs snapshooter after every commit:

<listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">snapshooter</str>
    <str name="dir">solr/bin</str>
    <bool name="wait">true</bool>
    <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
    <arr name="env"> <str>MYVAR=val1</str> </arr>
</listener>
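Because the snapshot directory name embeds a fixed-width timestamp, sorting the names as plain strings also sorts them chronologically, so finding the newest snapshot needs no date parsing. A small sketch of that idea in Java (the helper names are hypothetical; in Solr 1.x this work is done by the shell scripts, e.g. snapshooter on the master and snappuller on the slaves):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class Snapshots {
    // Same layout as snapshot.yyyymmddHHMMSS (Java pattern letters differ from date(1)).
    static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    static String snapshotName(LocalDateTime time) {
        return "snapshot." + STAMP.format(time);
    }

    /** Fixed-width timestamps mean lexicographic order equals chronological order. */
    static Optional<String> newest(List<String> dirNames) {
        return dirNames.stream()
                .filter(name -> name.startsWith("snapshot."))
                .max(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        System.out.println(snapshotName(LocalDateTime.of(2008, 3, 1, 9, 5, 7)));
        // prints snapshot.20080301090507
        System.out.println(newest(List.of("snapshot.20080229235959",
                "snapshot.20080301090507", "index")).orElse("none"));
        // prints snapshot.20080301090507
    }
}
```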

Reference

Below are some cool references I found:

  1. Search smarter with Apache Solr, Part 1: Essential features and the Solr schema
  2. Search smarter with Apache Solr, Part 2: Solr for the enterprise
  3. Advanced Lucene

 

 


Powerful Full Text Search – Part 2 Nutch

Introduction of Nutch & Hadoop

After Lucene, its author created another powerful tool: Nutch. Nutch is a powerful crawler built on top of Lucene. With Nutch, you can launch a multi-threaded crawler to obtain information from the Net. At the time of writing, Nutch is at version 0.9. Nutch comes with a list of cool features, including whole-Web crawling, local file crawling for the intranet, and indexing all the while.

Hadoop was designed to handle the petabytes of data that Nutch could potentially store and process. Hadoop has its own file system, the Hadoop Distributed File System (HDFS), which can run on ordinary, low-cost commodity hardware.
Hadoop works by spreading the file system’s data across all the servers in the cluster. As new work comes in, it follows the "moving computation is cheaper than moving data" rule: running the processing as close as possible to the data is faster than placing it at random within the cluster and moving data long distances across the network.

I have searched around to see if anyone could give me some tips on this tool. Surprisingly, I didn’t find much. But don’t worry, I have found some resources that can at least get you started.

Set up Nutch

Here is the guideline written by Peter Wang that I followed to get Nutch up. Follow it and get your Nutch running before going further. By the way, if you want to run Nutch with Solr, this is a good tutorial.

Nutch Architectural Review

 

 

 


Powerful Full Text Search Engine – Part 1 Lucene Introduction

Introduction of Lucene

I have heard of Lucene and its powerful full text search capability many times. Today, I decided to take a look at it. Before diving into the user guide, I looked through Google Tech Talks for a video about Lucene. Here is what I found:

After watching this video, I found Lucene to be a really great tool, so I decided to take a deeper look at it. After a quick search, I found a great blog that showed me how to use Lucene with Digg. With Solr on top of Lucene, you can make Lucene available as a RESTful Web Service. Awesome, isn’t it? In this article, I will list all the information I found during my little research on Lucene, and I hope you find it useful.

Architecture Overview

Before we dig into the code or setup guidelines, I would like to get a high-level picture of Lucene first. I borrowed a diagram from this article that helped me grasp the key components of search.

This high-level picture shows that the search keywords you enter (normally via a form) become an HTTP search request, which the query parser then translates into a form the search engine understands. The search engine performs the search against the index files previously prepared by the indexer. The results are then ranked by a predefined ranking algorithm and returned to the user. The data source can be a Web Service, a database or documents in your file system. The diagram also shows that you can launch a spider or crawler, as Google does, to obtain data from web pages on the Internet and feed it to the indexer.

Get one step deeper

Now you know the high-level flow of how search works. Let’s go one step deeper into the details.

  1. What search interface should I use?
  2. How does the search interface communicate with the search engine?
  3. What kinds of search does the search engine provide?
  4. How does the search engine index the documents?
  5. How are results ranked, and what ranking algorithms do we normally use?

Below are the answers to the questions above:

  1. Up to you. I would use Flex as I want to provide a rich search interface to my users.
  2. Flex can talk HTTP, Web Service or RemoteObject (AMF). If you put a web service layer on top of Lucene (i.e. Solr), you can use REST calls (i.e. HTTP) to obtain the results.
  3. Lucene supports several kinds of advanced searches, such as:
    • Boolean operators – users can compose queries using AND, OR, NOT
    • Field Search – which fields does the search operate on? Title, author or content?
    • Wildcard Search – supports * (any sequence of characters) and ? (a single character).
    • Fuzzy Search – Lucene provides a fuzzy search based on an edit distance algorithm. You can use the tilde character (~) at the end of a single search word to do a fuzzy search. For example, the query "think~" searches for terms similar in spelling to the term "think". The key here is the word "similar": it means similar in spelling, not in meaning. So "hose" and "horse" (one letter apart) count as similar, while "horse" and "donkey" do not, even though both are animals.
    • Range Search – age, date, etc.
  4. Large topic. I will come back to it later.
  5. Up to you. If you want to look at the most popular ranking algorithm in the world, check out Google PageRank. It is one of the algorithms many of us are interested in. Because I wanted my wedding website, Justproposed.com, to show up at the top of the results when users type "wedding website" as search keywords, I looked into SEO. It is a fun area to explore. Generally speaking, if the query keywords appear in the title, the page weighs more; if the keyword frequency is higher, it ranks higher, and so on. However, I know Google weighs links heavily, so ranking is not based purely on the content of your document. How to obtain that additional information during crawling is beyond the scope of this article.
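Edit distance, the basis of the fuzzy search in answer 3, is the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. A minimal Levenshtein sketch (plain dynamic programming, not Lucene’s own code) shows why spelling, not meaning, decides what counts as "similar":

```java
public class EditDistance {
    /** Classic Levenshtein distance using a rolling two-row DP table. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(
                        prev[j] + 1,          // delete a[i-1]
                        curr[j - 1] + 1),     // insert b[j-1]
                        prev[j - 1] + cost);  // substitute (or keep if equal)
            }
            int[] tmp = prev; prev = curr; curr = tmp; // last row is now in prev
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("hose", "horse"));  // 1: insert 'r'
        System.out.println(distance("think", "thank")); // 1: substitute 'i' with 'a'
        System.out.println(distance("horse", "donkey"));
    }
}
```

A fuzzy match then simply means the distance is small relative to the word length; where exactly to draw that line is a tunable threshold.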

Get your hands dirty

Look into this great article.

What this article doesn’t mention is that you need to create your dataDir and indexDir folders under C: and copy a set of HTML files into dataDir before you start the web server. If you add new HTML files later, you need to clean out indexDir and restart the web server to rebuild the indexes.

I have got the application up and running. It was a nice trial run. My next step is to enhance this example. I will do the following:

  1. Use Flex as the search interface.
  2. Use Solr to expose the Lucene search engine as a Web Service.
  3. Have Flex call my search engine via REST.
  4. Display the results in Flex.

After I have my new enhancements working, I would do the following:

  1. Look into how Lucene does the indexing.
  2. Look into Nutch, so I can have it crawl some sites and put the HTML files in dataDir for me automatically.

How Lucene Indexes the documents?

Yes, I haven’t forgotten to answer question 4. Here is the article that answers it. To summarize, here are several key points I extracted from it:

  1. Content Extraction – Lucene only indexes text, so you need parsers to extract the content from different types of documents, such as HTML, Word and PDF. If you cannot find a parser for your document type, extracting the content for Lucene is your responsibility. This article shows you how to use Digester to extract content from XML and feed it to Lucene. If you have a large pool of XML to process, pay attention to the parsing time. Someone has done this and published some performance numbers for reference, although that article is a bit outdated.
  2. Content Preprocessing – Before text is indexed, it is passed through an Analyzer. Analyzers are in charge of extracting indexable tokens out of the text and eliminating the rest. Lucene comes with a few different Analyzer implementations. Some of them skip stop words (frequently used words that don’t help distinguish one document from another, such as "a," "an," "the," "in," "on," etc.), some convert all tokens to lowercase so that searches are not case-sensitive, and so on.
  3. Indexing – IndexWriter is the key component in the indexing process. This class uses the Analyzer you pass in as a parameter to create a new index, or open an existing one, and add documents to it. You set up fields and documents and feed them to the IndexWriter to do the job. The code below fetches a list of .txt files and their metadata (such as the path) from a directory and feeds them to the IndexWriter, which indexes them one by one.
  4. Configuration – You can tune IndexWriter for better performance, for example by increasing its buffer size, because the bottleneck is normally the I/O on the index files.
  5. Index Structure – Lucene uses an inverted index. An inverted index is an inside-out arrangement of documents in which terms take center stage: each term points to a list of the documents that contain it. By contrast, in a forward index, documents take center stage, and each document refers to the list of terms it contains. With an inverted index, you can easily find which documents contain certain terms.

 
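import java.io.File;
import java.io.FileReader;
import java.io.Reader;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Lucene 1.x API; dataDir is a File and indexWriter an already opened IndexWriter
File[] textFiles = dataDir.listFiles();
for (int i = 0; i < textFiles.length; i++) {
    if (textFiles[i].isFile() && textFiles[i].getName().endsWith(".txt")) {
        Reader textReader = new FileReader(textFiles[i]);
        Document document = new Document();
        document.add(Field.Text("content", textReader));             // tokenized and indexed
        document.add(Field.Keyword("path", textFiles[i].getPath())); // indexed and stored as-is
        indexWriter.addDocument(document); // the reader is consumed here
        textReader.close();
    }
}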
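The analyzer and inverted-index ideas from points 2 and 5 above can be sketched without Lucene at all. The toy below (a hypothetical class built on plain JDK collections) lowercases text, splits on non-letters, drops a few stop words, and maps each remaining term to the sorted set of IDs of the documents containing it. Lucene’s real index also stores term positions, frequencies and more, so treat this only as an illustration of the structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class TinyInvertedIndex {
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "in", "on");

    // term -> IDs of documents containing it (the "inside-out" arrangement)
    private final Map<String, Set<Integer>> postings = new TreeMap<>();

    /** Analyzer step: lowercase, split on non-letters, drop stop words. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Indexer step: record docId under every term the analyzer produced. */
    void addDocument(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    /** Searcher step for one term: a single map lookup, no scan over documents. */
    Set<Integer> search(String term) {
        List<String> analyzed = analyze(term); // analyze the query the same way
        if (analyzed.isEmpty()) return Set.of();
        return postings.getOrDefault(analyzed.get(0), Set.of());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.addDocument(1, "The quick brown fox");
        index.addDocument(2, "A fox in the forest");
        System.out.println(index.search("Fox"));   // [1, 2]
        System.out.println(index.search("brown")); // [1]
        System.out.println(index.search("the"));   // [] (stop word)
    }
}
```

Note that the query goes through the same analyzer as the documents; otherwise "Fox" would never match the indexed term "fox".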

Lucene offers four different types of fields from which a developer can choose: Keyword, UnIndexed, UnStored, and Text.

Keyword fields are not tokenized, but are indexed and stored in the index verbatim. This field is suitable for fields whose original value should be preserved in its entirety, such as URLs, dates, personal names, Social Security numbers, telephone numbers, etc.

UnIndexed fields are neither tokenized nor indexed, but their value is stored in the index word for word. This field is suitable for fields that you need to display with search results, but whose values you will never search directly. Because this type of field is not indexed, it cannot be searched at all. Since the original value of a field of this type is stored in the index, this type is not suitable for storing fields with very large values if index size is an issue.

UnStored fields are the opposite of UnIndexed fields. Fields of this type are tokenized and indexed, but are not stored in the index. This field is suitable for indexing large amounts of text that do not need to be retrieved in their original form, such as the bodies of Web pages, or any other type of text document.

Text fields are tokenized, indexed, and stored in the index. This means fields of this type can be both searched and retrieved, but be cautious about the size of the values you store as Text fields. (One exception: a Text field built from a Reader, like the "content" field in the code above, is tokenized and indexed but not stored.)

Conclusion

To use Lucene, there are 3 main concepts you need to grasp. They are:

  1. Indexer – creates the search engine indexes
  2. Analyzer – splits text into tokens that make sense for the search engine. The structure is: a document is a sequence of fields, each field is a name/value pair, and each value is split into tokens. Field values may be stored, indexed and/or analyzed (tokenized), and, now, vectored. The lecture notes from Doug Cutting will give you more detail.
  3. Searcher – runs queries against the indexes

You may think of using grep or a database to achieve what Lucene does. Grep is a powerful Linux tool; however, if you use it to search files several MB in size, you will see that it is inefficient. The reason is that grep doesn’t build indexes of your files ahead of the time you do the search. A database can index your varchar fields, but not as sophisticatedly as Lucene. Oracle may provide a full-text solution, but I am not familiar with it. One key thing to remember: Lucene is open source, free, and does the job extremely well. Why bother digging into Oracle’s costly solution?

Lucene gives our web applications rich search engine capability. It has many features that I haven’t had a chance to discuss in this article. I will continue to write more articles on this topic as my research moves forward. Have a nice day! :lol:

Reference

The blog of Lucene’s creator, Doug Cutting

 


Scale your site via Amazon solution – EC2 and S3

Steps to use EC2

  1. To get started, read here.
  2. Create a custom AMI

How others use it

  1. How SmugMug uses Amazon solution
  2. http://www.rajiv.com/blog/2008/02/04/amazon-ec2/

Web Utility Package

Http Proxy Servlet
Both Flex and AJAX are subject to a browser security restriction that only allows them to access the web server they were originally loaded from. For your Flex or AJAX app to access a web service elsewhere on the Net, you need to deploy a proxy on the web server they can access. In PHP, you can use this php proxy (code). In Java, you can use this proxy servlet (code).
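The proxy idea is simple: the browser calls your own server, and your server fetches the remote resource on its behalf. The linked PHP and servlet proxies are the ready-made options; below is only a minimal sketch of the same idea using the JDK’s built-in com.sun.net.httpserver server instead of a servlet container, so it runs without extra jars. The /proxy path, the url parameter and the allowed host are all hypothetical. The allow-list matters: a proxy that forwards to any URL is an open proxy anyone can abuse.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Set;

public class SimpleProxy {
    // Hypothetical allow-list: only these hosts may be fetched through the proxy.
    static final Set<String> ALLOWED_HOSTS = Set.of("api.example.com");

    /** Returns the target from the "url" query parameter, or null if it is not allowed. */
    static URI resolveTarget(String rawQuery) {
        if (rawQuery == null) return null;
        for (String pair : rawQuery.split("&")) {
            if (!pair.startsWith("url=")) continue;
            try {
                URI uri = URI.create(URLDecoder.decode(pair.substring(4), StandardCharsets.UTF_8));
                boolean web = "http".equals(uri.getScheme()) || "https".equals(uri.getScheme());
                if (web && uri.getHost() != null && ALLOWED_HOSTS.contains(uri.getHost())) {
                    return uri;
                }
            } catch (IllegalArgumentException badUrl) {
                return null; // malformed escape or URI
            }
        }
        return null;
    }

    /** Fetches the allowed target and copies status, content type and body back. */
    static void handle(HttpExchange exchange) throws IOException {
        URI target = resolveTarget(exchange.getRequestURI().getRawQuery());
        if (target == null) {
            exchange.sendResponseHeaders(403, -1); // -1: no response body
            exchange.close();
            return;
        }
        HttpURLConnection conn = (HttpURLConnection) target.toURL().openConnection();
        int status = conn.getResponseCode();
        String type = conn.getContentType();
        if (type != null) exchange.getResponseHeaders().set("Content-Type", type);
        exchange.sendResponseHeaders(status, 0); // 0: length unknown, send chunked
        try (InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream();
             OutputStream out = exchange.getResponseBody()) {
            if (in != null) in.transferTo(out);
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/proxy", SimpleProxy::handle);
        server.start();
        // e.g. http://localhost:8080/proxy?url=http%3A%2F%2Fapi.example.com%2Ffeed
    }
}
```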

Shell Servlet
If you want to send commands to the web server and have it run them, you can do that with Shell Servlet. With it, you can do web administration online, even from a WAP phone.

Reference:

http://www.servletsuite.com/servlets.htm


Web Technology – Application Events

Web application events
The Servlet 2.3 spec introduced web application events, which give you a greater degree of control over your web application. The two important application events are:

  1. Application startup and shutdown
  2. Session creation and invalidation

As their names suggest, application startup event occurs when your web application is first loaded and started by the Servlet container and application shutdown event occurs when the web application is shutdown.

Session creation event occurs every time a new session is created on the server and similarly session invalidation event occurs every time a session is invalidated. To make use of these web application events and to do something useful you’ll have to create and make use of special “listener” classes.

Listener
These are simple Java classes which implement one of the two following interfaces:

  1. javax.servlet.ServletContextListener
  2. javax.servlet.http.HttpSessionListener

If you want your class to listen for application startup and shutdown events then implement ServletContextListener interface. If you want your class to listen for session creation and invalidation events then implement HttpSessionListener interface.

ServletContextListener – This interface contains two methods:
public void contextInitialized(ServletContextEvent sce);
public void contextDestroyed(ServletContextEvent sce);

HttpSessionListener – This interface also contains two methods:
public void sessionCreated(HttpSessionEvent se);
public void sessionDestroyed(HttpSessionEvent se);

Example of usage – use HttpSessionListener to count how many sessions are active:

package com.stardeveloper.web.listener;

import java.util.concurrent.atomic.AtomicInteger;
import javax.servlet.http.HttpSessionEvent;
import javax.servlet.http.HttpSessionListener;

public class SessionCounter implements HttpSessionListener {
    // AtomicInteger because the container may fire events from several threads at once
    private static final AtomicInteger activeSessions = new AtomicInteger();

    /* Session Creation Event */
    public void sessionCreated(HttpSessionEvent se) {
        activeSessions.incrementAndGet();
    }

    /* Session Invalidation Event */
    public void sessionDestroyed(HttpSessionEvent se) {
        // never let the counter drop below zero
        activeSessions.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    public static int getActiveSessions() {
        return activeSessions.get();
    }
}

Register the listener with the web application:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- web.xml -->
<!DOCTYPE web-app
  PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
  "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
  <!-- Listeners -->
  <listener>
    <listener-class>com.stardeveloper.web.listener.SessionCounter</listener-class>
  </listener>
  <listener>
    <listener-class>com.stardeveloper.web.listener.ApplicationWatch</listener-class>
  </listener>
</web-app>

Spring Events
Spring provides a simple mechanism for sending and receiving events between beans. To receive an event, a bean implements ApplicationListener, which has a single method:
public void onApplicationEvent(ApplicationEvent event);
To publish events to listeners, you call the publishEvent() method on the ApplicationContext. This publishes the same event to every listener in the context. Event listeners receive events synchronously, which means the publishEvent() method blocks until all listeners have finished processing the event. It is possible to supply an alternate event publishing strategy via an ApplicationEventMulticaster implementation. Furthermore, when a listener receives an event, it operates inside the transaction context of the publisher, if a transaction context is available.

A bean can be a listener, a publisher, or both. A publisher needs access to the ApplicationContext, which means the bean has to be made aware of the container it is running in (e.g. by implementing ApplicationContextAware). You can create your own custom events by extending the ApplicationEvent class. In addition to events published by other beans, the Spring container itself publishes a handful of events during the course of an application’s lifetime. These application events include:

  1. ContextClosedEvent – published when the application context is closed
  2. ContextRefreshedEvent – published when the application context is initialized or refreshed
  3. RequestHandledEvent – published when a request is handled
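Spring’s own classes need the framework on the classpath, so as a stand-in here is a plain-Java sketch (hypothetical names, not Spring’s API) of the behaviour described above: one publish call delivers the same event to every registered listener on the caller’s thread, so publish() returns only after the last listener is done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class SimpleEventBus {
    /** Plays the role of ApplicationListener in this sketch. */
    interface Listener {
        void onEvent(Object event);
    }

    // Copy-on-write so listeners can be registered while an event is being delivered.
    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    void register(Listener listener) {
        listeners.add(listener);
    }

    /** Like publishEvent(): same event to every listener, synchronously. */
    void publish(Object event) {
        for (Listener listener : listeners) {
            listener.onEvent(event); // blocks until this listener returns
        }
    }

    /** A custom event type, analogous to extending ApplicationEvent. */
    record UserRegistered(String userName) { }

    public static void main(String[] args) {
        SimpleEventBus bus = new SimpleEventBus();
        List<String> log = new ArrayList<>();
        bus.register(e -> log.add("mailer saw " + e));
        bus.register(e -> log.add("auditor saw " + e));
        bus.publish(new UserRegistered("markus"));
        // Both listeners have already run by the time publish() returns.
        System.out.println(log);
    }
}
```

The synchronous design is what lets a listener share the publisher’s transaction, but it also means one slow listener delays the publisher; swapping in an asynchronous strategy is exactly what a custom ApplicationEventMulticaster allows in Spring.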

Put them together
Look at how Acegi publishes session creation/destruction events to the bean(s) listening for them. It has an HttpSessionEventPublisher class that implements HttpSessionListener, so the web container calls its sessionCreated and sessionDestroyed methods when a session is created and destroyed. Within these methods, the publisher uses the ApplicationContext to publish its own HttpSessionCreatedEvent and HttpSessionDestroyedEvent to all the Spring bean(s) listening for them. (code)

Reference
http://java.sys-con.com/read/171482_1.htm (Acegi)
http://www.acegisecurity.org/articles.html (Acegi)


Migrate Struts to Spring MVC

How Spring MVC works?
Many people have suggested that I use Spring MVC instead of Struts, so I decided to take a quick look at it. Before I give you a comparison chart, I think it is good to outline the core mechanism Spring MVC uses to achieve its goal. Here it is:

  • A dispatcher servlet, which will intercept incoming requests.
  • A configuration that instructs the web container to route requests to the dispatcher servlet (ie. a servlet mapping)
  • Controllers (the C in MVC) that can perform the business logic and route to particular views (the V) perhaps conveying information models (the M).
  • A way for the dispatcher servlet to know which URLs to map to which controllers – called a url mapping (Decouple servlet from controller)
  • A way to determine, when a controller has done its job, how to get to the view – called a view resolver  (Decouple controller from view)
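To see how these pieces fit together without pulling in Spring, here is a plain-Java sketch of the pattern. All names are hypothetical stand-ins, not Spring MVC’s classes: a URL mapping picks the controller, the controller returns a logical view name plus a model, and a view resolver turns that name into a concrete view path (the /WEB-INF/jsp/ prefix is an assumed convention).

```java
import java.util.HashMap;
import java.util.Map;

public class MiniDispatcher {
    /** A controller returns a logical view name plus model data (the V and the M). */
    record ModelAndView(String viewName, Map<String, ?> model) { }

    interface Controller {
        ModelAndView handle(Map<String, String> params);
    }

    // URL mapping: decouples the dispatcher from concrete controllers.
    private final Map<String, Controller> urlMapping = new HashMap<>();

    void map(String url, Controller controller) {
        urlMapping.put(url, controller);
    }

    // View resolver: decouples controllers from where the JSPs actually live.
    static String resolveView(String viewName) {
        return "/WEB-INF/jsp/" + viewName + ".jsp";
    }

    /** What the dispatcher servlet does for every request it intercepts. */
    String dispatch(String url, Map<String, String> params) {
        Controller controller = urlMapping.get(url);
        if (controller == null) {
            return null; // no controller mapped: a real servlet would send a 404
        }
        ModelAndView mav = controller.handle(params);
        // A real dispatcher would now expose mav.model() to the view and forward to it.
        return resolveView(mav.viewName());
    }

    public static void main(String[] args) {
        MiniDispatcher dispatcher = new MiniDispatcher();
        dispatcher.map("/hello.htm", params ->
                new ModelAndView("hello", Map.of("name", params.getOrDefault("name", "guest"))));
        System.out.println(dispatcher.dispatch("/hello.htm", Map.of("name", "Markus")));
        // prints /WEB-INF/jsp/hello.jsp
    }
}
```

Because the controller only names a logical view, moving the JSPs or switching view technology touches the resolver alone.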

Now you know the heart of Spring MVC. It is really similar to how Struts works: ActionServlet -> Action (logic) -> View with Model (struts-config.xml holds the information on how to wire them up).

Reference

http://www.devx.com/Java/Article/29208/1954?pf=true


Spring Web Flow – basic concept

How can you implement the page flow for an airline booking system with Struts alone? Before we go deep into the new technology, let’s try to tackle it the old way. First, take a look at the page flow below.

To implement a multi-step page flow in Struts, we can chain individual actions together through the various views. Action URLs to process different events like “back” or “submit” are hard-coded into each view. Some form of ad-hoc session storage is used to manage flow state. Redirect-after-post is used to prevent duplicate submissions, and so on. Although this is a simple and functional approach, it has a major disadvantage: the overall page flow of the web application is not clear from looking at the action definitions in the struts-config.xml file. That is to say, the flow is not explicitly defined but implicitly hidden in your struts-config.xml. Flexibility also suffers, since actions and views cannot easily be reused.

This is where Spring Web Flow comes in, allowing you to represent the page flow of a web application as a finite state machine.

In a finite state machine, a web flow is composed of a set of states. A state is a point in the flow where something happens; for instance, displaying a view (i.e. a ViewState) or executing an action (i.e. an ActionState). Each state has one or more transitions, triggered by events, that are used to move to another state. In Spring Web Flow, an ActionState executes an action when entered. That action returns the logical result of its execution, and that result is mapped to a state transition. When a ViewState is entered, it causes the executing flow to pause and returns control to the client with instructions to render the configured view. Later, after some user think-time, the client signals an event describing what action the user took. That resumes the flow, and the event that occurred is mapped to a state transition, which takes the user to the next step in the flow.
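The state-machine model above is easy to sketch in plain Java. The example below is a hypothetical booking flow, not Spring Web Flow’s API: an action state runs some logic and maps its result to a transition, a view state pauses the flow until the user signals an event, and an end state finishes it. The state and event names are made up for illustration.

```java
import java.util.Map;
import java.util.function.Supplier;

public class BookingFlow {
    enum Kind { VIEW, ACTION, END }

    /** A state: its kind, an optional action, and event -> next-state transitions. */
    record State(Kind kind, Supplier<String> action, Map<String, String> transitions) { }

    private final Map<String, State> states;
    private String current;

    BookingFlow(Map<String, State> states, String start) {
        this.states = states;
        this.current = start;
        runActions(); // entering the start state may already trigger actions
    }

    /** Action states execute immediately; their result picks the transition. */
    private void runActions() {
        State state = states.get(current);
        while (state.kind() == Kind.ACTION) {
            current = next(state, state.action().get());
            state = states.get(current);
        }
        // Now paused in a VIEW state (waiting for the user) or finished in an END state.
    }

    private static String next(State state, String event) {
        String target = state.transitions().get(event);
        if (target == null) {
            throw new IllegalStateException("No transition for event " + event);
        }
        return target;
    }

    /** Called when the client signals an event after the view was rendered. */
    void signal(String event) {
        State state = states.get(current);
        if (state.kind() != Kind.VIEW) {
            throw new IllegalStateException("Flow is not waiting for user input");
        }
        current = next(state, event);
        runActions();
    }

    String currentState() { return current; }

    boolean isEnded() { return states.get(current).kind() == Kind.END; }

    public static void main(String[] args) {
        Map<String, State> states = Map.of(
            "enterDetails", new State(Kind.VIEW, null, Map.of("submit", "checkSeats")),
            "checkSeats", new State(Kind.ACTION, () -> "available",
                    Map.of("available", "confirm", "full", "enterDetails")),
            "confirm", new State(Kind.VIEW, null, Map.of("back", "enterDetails", "book", "done")),
            "done", new State(Kind.END, null, Map.of()));
        BookingFlow flow = new BookingFlow(states, "enterDetails");
        flow.signal("submit");                   // runs checkSeats, then pauses at confirm
        System.out.println(flow.currentState()); // confirm
        flow.signal("book");
        System.out.println(flow.isEnded());      // true
    }
}
```

The whole flow is visible in one place (the states map), which is the same benefit the XML or Java flow definition gives you in Spring Web Flow.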

The state machine model can be reused anywhere, including environments like Struts, Spring MVC, Tapestry, JSF, and even Portlets. Besides, the page flow of a web application is now clearly and explicitly specified in the corresponding web flow definition (either an XML file or a Java class), so you can see it at a glance. The flow is self-contained and reusable in multiple situations. On top of that, it has a clear, observable lifecycle that is managed for you automatically.

Q & A

Q1) “… since the executing flow is paused when a ViewState is entered, and control is returned to the browser, how is the same flow picked up and resumed on subsequent events?” The answer is that the client tracks the unique id of the executing flow and provides it as input when the next event is signaled. This is typically done using a hidden form field. This is the concept of a continuation.
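The server side of that continuation idea can be sketched as a registry of paused executions. This is a hypothetical illustration, not Spring Web Flow’s implementation: each paused execution is stored under a random key, the key is rendered into the hidden form field, and the next request uses it to find and resume the same execution.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class FlowExecutionRegistry<T> {
    private final Map<String, T> paused = new ConcurrentHashMap<>();

    /** Stores a paused execution and returns the key to render into the hidden field. */
    public String pause(T execution) {
        String key = UUID.randomUUID().toString(); // hard to guess, unlike a counter
        paused.put(key, execution);
        return key;
    }

    /** Looks up the execution for the key the client posted back; null if unknown. */
    public T resume(String key) {
        return key == null ? null : paused.get(key); // ConcurrentHashMap rejects null keys
    }

    /** When the flow reaches an end state, forget it so its resources can be freed. */
    public void end(String key) {
        if (key != null) paused.remove(key);
    }

    public static void main(String[] args) {
        FlowExecutionRegistry<StringBuilder> registry = new FlowExecutionRegistry<>();
        String key = registry.pause(new StringBuilder("booking in progress"));
        // ...render the page with the key in a hidden form field and wait...
        System.out.println(registry.resume(key)); // booking in progress
        registry.end(key);
        System.out.println(registry.resume(key)); // null
    }
}
```

In a real application such a registry would typically live in the user’s HTTP session and be bounded in size, so abandoned flows don’t pile up.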

<input type="hidden" value="<c:out value="${flowExecution.id}"/>">

Q2) “What about decision state and subflow state?” A decision state decides which transition to take using embedded conditional logic. A flow that is independent from the main flow can be modeled as a subflow. When a subflow state is entered, a child flow is spawned, and the parent flow is suspended until the child flow ends. This lets you view your application as a set of self-contained modules (flows) that you can easily embed in multiple situations in a consistent manner. You can also pass information to the subflow.

Q3) “What about the end states of the state machine?” When an end state is entered, the active flow session terminates. Upon termination, all resources associated with the flow are cleaned up for you automatically. This introduces a new scope, “flow”, that spans multiple requests but is shorter than a session.

Q4) “When should I use web flow?” It should be noted that Spring Web Flow is not a one-size-fits-all solution. As you’ve seen, it’s a stateful system that automates the management of page flows that drive business processes. It should not be used when simpler, stateless solutions are more appropriate. For example, it should not be used for sites that require free navigation, where the user is free to “click around” anywhere they please. Spring Web Flow is designed to power controlled navigation, where the user is guided through a process with a clear business goal and lifecycle.

Reference

TheServerSide Spring Web Flow Article

Spring Web Flow Architecture

Spring Web Flow – Practical Introduction

Official Spring Web Flow Site
