Archive | December, 2008

How to pick a good web hosting company for Java webapp

I currently use Dreamhost for my own company, “JustProposed.com”, which runs on a typical LAMP stack. It is a great shared hosting solution, but it doesn’t support websites powered by Java. If you have a Java website, I suggest you try a web hosting company that offers a VPS (Virtual Private Server). A VPS occupies the middle ground between a dedicated server and regular shared hosting: you get the features of dedicated hosting in a shared environment. In other words, your virtual server runs on a host server alongside a number of other virtual servers. Each virtual server shares the host server’s memory, CPU, Internet connection and other resources, but no single VPS can monopolize them: each VPS gets a guaranteed share of the server’s CPU, disk I/O and network.

Before you pick a web hosting company for your Java webapp, you had better get familiar with the system needs of your webapp first. Here are some topics that you may encounter:

  1. Shared JVM vs Private JVM – The two biggest problems with shared Java hosting are memory leaks and security. If someone in a shared environment has a memory leak, or code that uses a lot of memory, everyone sharing that JVM suffers the same memory problems. A shared JVM is also less secure. Therefore, you don’t want to share a JVM with other users, even though it is cheaper. One more disadvantage of a shared JVM is that you cannot restart Tomcat whenever you wish.
  2. PermGen space (default = 64MB) is used for things that do not change (or rarely change), e.g. Java classes. Large, complex apps will therefore often need lots of PermGen space. Similarly, if you frequently deploy war/ear/jar files to running servers like Tomcat or JBoss, you may need to restart the server after a few deploys or increase your PermGen space. To increase the PermGen space, use something like: -XX:MaxPermSize=128m
  3. Heap size setting – Java has a couple of settings that help control how much memory it uses:
    • -Xms sets the minimum memory heap size.
    • -Xmx sets the maximum memory heap size. When choosing the -Xmx value, keep in mind that it has to be large enough to run your app. If it is set too low, you may get java.lang.OutOfMemoryError (even when there is sufficient spare memory on the server).
    • We typically specify the same amount of memory for both flags (-Xms and -Xmx) to force the server to use all the allocated memory from startup. This way, the JVM doesn’t need to dynamically change the heap size at runtime, which is a leading cause of JVM instability.
    • If you don’t specify a memory size in the JVM startup flags, the JVM limits the heap memory to 64MB (512MB on Linux), no matter how much physical memory you have on the server!
    • For 64-bit servers, make sure that you run a 64-bit JVM on top of a 64-bit operating system to take advantage of all the RAM on the server. Otherwise, the JVM will only be able to use 2GB or less of memory space. 64-bit JVMs are typically only available for JDK 5.0 and later.
    • Suggested memory size: PermGen + Max Heap = 256MB. Of course, the more you get the better!
  4. Garbage collection (GC) – With a large heap, garbage collection could become a major performance bottleneck. It could take more than ten seconds for the GC to sweep through a multi-gigabyte heap.
    • Single-threaded vs concurrent GC – In JDK 1.3 and earlier, GC is a single-threaded operation that stops all other tasks in the JVM. That not only causes long and unpredictable pauses in the application, but also results in very poor performance on multi-CPU computers, since all other CPUs must sit idle while one CPU runs at 100% to free up heap memory. It is crucial to select a JDK 1.4+ JVM that supports parallel and concurrent GC operations. Actually, the concurrent GC implementation in the JDK 1.4 series of JVMs is not very stable, so we strongly recommend you upgrade to JDK 5.0.
    • Pick a good GC algorithm – Parallel GC frees up memory faster but with longer pauses. Concurrent GC has shorter pauses but doesn’t free up all memory at once.
  5. How to have your tomcat be available at port 80
  6. More on Java system tuning on multi-core server
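To check which of the heap limits from item 3 a JVM on your VPS has actually picked up, a small program along these lines may help. It is only a sketch: the class name HeapReport is my own, and the memory pool names it prints differ between JVM versions and garbage collectors.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Prints the memory limits the running JVM picked up from its startup flags. */
class HeapReport {

    /** Converts a byte count to whole megabytes. */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** One-line summary: heap reserved right now and the ceiling set by -Xmx (or the default). */
    static String heapSummary() {
        Runtime rt = Runtime.getRuntime();
        return "heap now=" + toMegabytes(rt.totalMemory())
                + "MB, max=" + toMegabytes(rt.maxMemory()) + "MB";
    }

    public static void main(String[] args) {
        System.out.println(heapSummary());
        // List every memory pool the JVM exposes; pool names vary by JVM version and collector.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            long max = usage.getMax(); // -1 means no limit is defined
            System.out.println(pool.getName() + ": max="
                    + (max < 0 ? "undefined" : toMegabytes(max) + "MB"));
        }
    }
}
```

Run it once without flags and once with, say, -Xms256m -Xmx256m, and compare the reported maximum. The number can come out slightly below the -Xmx value because of how the JVM accounts for its internal spaces.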

Below are some of the great VPS options on the Net:

RimuHosting.com provides Xen-based VPS. Xen is the virtualization technology that Amazon uses for EC2.

  • Guaranteed 99.9% uptime
  • They target at most 8 customers per CPU core, with 16% usage.
  • The host server is a 2U Supermicro with 32GB of memory, eight 2.4GHz Xeon CPU cores and 4TB of disk space.
  • You can get 256MB memory and 4GB disk space allocated for about $30.
  • Great how-to wiki page


SliceHost.com also provides Xen-based VPS.

  • Guaranteed 99.9% uptime
  • The host server has 16GB of memory and a 64-bit quad-core CPU with 8+ GHz, plus RAID-10 disk storage and a Gigabit network.
  • You can get 256MB memory, 10GB disk space and 100GB bandwidth for about $20.
  • No contracts, no setup fees.
  • Great how-to wiki page

WestHost.com costs $10 per month.

  • Guaranteed 99.9% uptime
  • Lots of hard drive space but no guarantee on memory allocation.

http://www.perfectblogger.com/2007/10/why-i-think-slicehost-is-the-best/

How to get SVN service for FREE?

If your website is written in PHP, I believe Dreamhost is enough for you. Apart from using Dreamhost for web hosting, I also use it as my SVN repository. I have a project named “Justproposed.com” that I want to work on in collaboration with my partners. I want to use SVN, which I have found very helpful at work, but I don’t want to pay too much for it. I wondered whether any web hosting company on the Net provides this service, so that I could pay my regular web hosting fee and get this extra service for FREE (it is practically free if the web hosting company you pick gives you tons of disk space). I searched through different postings and feedback and eventually found “Dreamhost”, which gives me exactly what I want.

DreamHost

If you use “SOLUTIONHACKER” as the code, you will get your first year FREE as well. Apart from the great service they provide, there are many supporters on the Net who help each other. Here is a good article that shows you how to set up your first SVN repository on Dreamhost. Follow it to get your project started cheaply!

http://www.jtbullitt.com/projects/tech/svn-for-website-development


Wiring up Flex, Mate, BlazeDS, Spring, Hibernate and MySQL with Maven 2 – Part 1

Introduction

This article builds on the great work that Sébastien Arbogast has done. He has written 3 articles that show you how to wire up Flex, BlazeDS, Spring, Hibernate and MySQL with Maven as the build process. I have included his articles below for your reference.

  1. The Flex, Spring, and BlazeDS full stack – Part 1: Creating a Flex module
  2. The Flex, Spring and BlazeDS full stack – Part 2: Writing the to-do list server
  3. The Flex, Spring and BlazeDS full stack – Part 3: Putting the application together

I have found Sebastien’s work to be a good foundation for my own project. To contribute back to the community, I will write a series of articles to show you how you can customize and extend the todolist sample.

What is in the Part 1 of the series…

  1. Enhancements on the Maven build process
    • Leverage RSL to factor out the framework swc, so the size of the application swf is reduced. Apart from that, I also take advantage of the Flash Player Cache, available since version 9 update 3, to cache the framework libraries.
    • Clean up the Flex and BlazeDS dependencies in the POM, as the latest version of the sdk is available and the BlazeDS dependencies are officially available.
    • Include some common reports for maven site generation
    • Embed Jetty web server in the build process for quick deployment and testing
  2. Document how to get the sample up on Eclipse for development
  3. Use Mate as Flex framework
    • Restructure ToDoList sample to leverage Mate framework
    • Factor out Mate as RSL and integrate it with Maven build process via Flex-mojo plugin.

What is in the coming articles…

  1. In part 2 of this series, I will show you how to use flex-mojo to build a modular Flex application.
  2. In part 3 of this series, I will show you how to test your Flex app via FlexUnit (unit test) and FlexMonkey (functional test).
  3. In part 4 of this series, I will work on the server side. I am planning to add monitoring, caching and security to the server side.

Review “ToDoList” sample

Before I start my journey, let me highlight what Sebastien has done first:

  1. Sebastien’s sample demonstrates how to use Maven as a build process. There are 3 parts or subprojects in his sample. They are:
    • todolist-config (configuration files shared by other subprojects)
    • todolist-ria (Flex frontend)
    • todolist-web (Server side that supports the Frontend)
  2. All these subprojects are modules of the main project (root POM). Finally, they are combined into a war artifact, ready to deploy to Tomcat or another J2EE webapp server.
  3. The Flex frontend and the backend communicate through a binary RPC protocol – AMF. AMF is considered to be the simplest and fastest remoting approach available in Flex. Recently, Adobe released BlazeDS as an open source implementation of the AMF spec. In this sample, BlazeDS is used. To use BlazeDS, there are a few things you need to do:
    • Externalize your POJO service via BlazeDS. This sample shows you how to integrate BlazeDS with Spring.
    • Make the BlazeDS endpoints available to the Net via a Servlet.
    • Have the frontend and backend share the same BlazeDS configuration files.
  4. In this sample, you can also find out how to use the flex-mojo maven plugin to compile the Flex frontend code into a swf. Apart from the flex-mojo plugin, there are two other good plugins worth mentioning:
    • maven-assembly-plugin - can be used to bundle all the files under a directory into a zip file. It is used by todolist-config to bundle all the configuration files (services-config.xml and remoting-config.xml) into a zip during the package phase.
    • maven-dependency-plugin - can be used to unpack the zip file and move its contents to the place you want. It is used by todolist-web to unpack the config zip during the generate-resources phase.

Enhancements on maven POM

I have modified the sample’s maven pom as follows:

  • Link to the new “Sonatype Forge” repository in the root POM, so that I can use the new version of flex-mojo and simplify the todolist-ria Adobe framework dependencies. Apart from that, I also removed Sebastien’s private repository because the BlazeDS libraries are available in the official Maven repository (Note: the BlazeDS libraries available in the official Maven repo are version 3.0 instead of 3.0.0.544, so you need to modify the webapp pom correspondingly).

    <repositories>
        <repository>
            <id>flex-mojos-repository</id>
            <url>http://svn.sonatype.org/flexmojos/repository/</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>

    <pluginRepositories>
        <pluginRepository>
            <id>flex-mojos-repository</id>
            <url>http://svn.sonatype.org/flexmojos/repository/</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </pluginRepository>
    </pluginRepositories>

  • Because I link to the Sonatype repository, my todolist-ria can depend on a single flex-framework pom dependency instead of all the swc dependencies. Note that the pom dependency is a way to factor out all the Adobe swc dependencies, which makes your pom easier to maintain.

        <dependency>
            <groupId>com.adobe.flex.framework</groupId>
            <artifactId>flex-framework</artifactId>
            <version>3.1.0.2710</version>
            <type>pom</type>
        </dependency>

  • I include the MySQL driver as a dependency in my webapp pom. I think it is cleaner to bundle it in the war. I have also added the jetty plugin to the POM, so you have a web server embedded in the build process. With this, you can run the sample application right after you check it out from svn (assuming you have Maven 2 installed). To start jetty, issue the following maven commands: the first at the project root, the second under your webapp project.

project_root> mvn clean install
project_root/jp-web> mvn jetty:run-war

  • I have included some reports that will be shown after site generation. You may not be able to run mvn site-deploy because it is linked to my web hosting site. However, you can modify it to point to your own site.

Get the sample up on Eclipse

To develop on Eclipse, you can follow the steps below:

  1. Create the Eclipse project files by running the command below at the project root. This will create 2 Eclipse projects: one for todolist-ria and one for the webapp. Notice that I use -Declipse.downloadSource=true to include the source files of my dependencies in my Eclipse projects, so I can get to the source code if needed.

mvn -Declipse.downloadSource=true eclipse:eclipse

  2. Import the projects into Eclipse
  3. Add a new variable M2_REPO and set it equal to [home]/.m2/repository
  4. If you have installed the Flex Builder plugin in your Eclipse, you can add the Flex Project Nature to the todolist-ria project.
    • Select Application Server Type: J2EE
    • Put check on “Use remote object access service” with LiveCycle Data Service selected.
    • Set up the path. I have my tomcat installed under C:\tools with default 8080 as port. You should make the changes if you installed it differently.
    • Remove the generated main.mxml under the src folder.
    • Set index.mxml under src folder as default Flex application file to run.
    • Use Default Flex SDK in Flex Compiler Configuration instead of Server Flex SDK
    • Right click and select Recreate HTML Template if you see error.
    • After all these, you have configured your Flex application to point to the webapp server and share the BlazeDS configuration files. You can verify this in Flex Compiler Configuration’s Additional Compiler Parameters. Check whether you see this: -services "C:\tools\tomcat-6.0.16\webapps\jp\WEB-INF\flex\services-config.xml" -locale en_US
    • Move the war to your tomcat’s webapps folder and start it with remote debugging enabled. If you are using Windows, set DEBUG_OPTS=-Xdebug -Xrunjdwp:transport=dt_socket,server=y,address=8787,suspend=n in your bin/catalina.bat.
    • Start your webapp via bin/startup.bat
    • Put breakpoint under TodoServiceImpl save method and start remote debugger on localhost:8787
    • Right click the index.mxml and Run As Flex Application.
    • Add a new entry and save it in the Flex app. You should see your remote debugger halt at the breakpoint for you to debug.
    • Now you can change your Flex code and test it without leaving Eclipse. However, if you modify the service in the webapp, you need to run “mvn clean install” and deploy the war to Tomcat before your Flex code can call your server-side code via AMF.

Use Mate as Framework

If you are not familiar with Mate, click the image below that moves you to a nice presentation.

 

What did I do to restructure the todolist sample to make it a Mate app?

  1.  

Download

I have made my work available at: www.solutionhacker.com/wp-content/uploads/todolist-jp-modified.zip

Reference

Below are the references I used for the article:

  1. Flex mojo compiler user guide
  2. Flex mojo dependency scope rules
  3. Flex 3 feature introduction: Flex 3 RSL
  4. Improving Flex application performance using Flash Player Cache
  5. FNA archetype projects 

 


Database concurrency control – MVCC

Concurrency Issue – Lost Update

Lost update is the key concurrency problem that we try to avoid:

From the sequences above, the SELECT-UPDATE transaction A will overwrite the update that transaction B made to the balance. If transactions A and B were serialized properly, the correct balance after both would be 700. For performance reasons, most RDBMSs use the isolation level “READ COMMITTED” by default. However, this isolation level does not protect data read by a transaction from becoming outdated right after it is read, because another transaction can change it. You can use the isolation level “REPEATABLE READ” or “SERIALIZABLE” to protect the values read in a transaction from becoming outdated, by holding shared read locks on those rows until the end of the transaction. In case you are new to RDBMS, let me give you some background.
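The lost update described above can be replayed step by step with a plain in-memory "row". This is only a sketch. The starting balance of 1000 and the two withdrawals of 200 and 100 are numbers I assume for illustration, chosen so that the properly serialized result matches the 700 mentioned above.

```java
/** Replays a lost update with a plain in-memory balance instead of a real database. */
class LostUpdateDemo {

    /** Interleaved read-modify-write: both transactions read before either one writes. */
    static int interleaved(int startBalance, int withdrawA, int withdrawB) {
        int balance = startBalance;
        int readByA = balance;          // A: SELECT balance
        int readByB = balance;          // B: SELECT balance
        balance = readByB - withdrawB;  // B: UPDATE and COMMIT
        balance = readByA - withdrawA;  // A: UPDATE from its stale read, B's change is lost
        return balance;
    }

    /** Serialized execution: B only reads after A has committed. */
    static int serialized(int startBalance, int withdrawA, int withdrawB) {
        int balance = startBalance;
        int readByA = balance;          // A: SELECT balance
        balance = readByA - withdrawA;  // A: UPDATE and COMMIT
        int readByB = balance;          // B: SELECT sees A's committed value
        balance = readByB - withdrawB;  // B: UPDATE and COMMIT
        return balance;
    }

    public static void main(String[] args) {
        System.out.println("interleaved: " + interleaved(1000, 200, 100)); // prints "interleaved: 800"
        System.out.println("serialized:  " + serialized(1000, 200, 100));  // prints "serialized:  700"
    }
}
```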

There are 3 things we are trying to avoid in a reliable RDBMS:

  1. Dirty read – if transaction A can read uncommitted changes made by transaction B, it may be working on data that will be rolled back if transaction B cannot successfully commit its changes.
  2. Non-repeatable read – if transaction A issues the same select twice during the transaction, it may get different results, as some other transaction may commit its changes in between.
  3. Phantom read – if transaction A issues the same select twice during the transaction, it may get new rows, as some other transaction may insert new records that fit the criteria of the query.

To protect against the 3 issues above, an RDBMS provides 4 isolation levels:

  1. READ UNCOMMITTED – the lowest level, with no protection at all. Dirty, non-repeatable and phantom reads are all possible.
  2. READ COMMITTED – the default setting. It does not protect against non-repeatable and phantom reads.
  3. REPEATABLE READ – phantom reads are still possible.
  4. SERIALIZABLE - the highest level, with the largest overhead, as it forces all transactions to run in a serial fashion (ie. one by one). However, it resolves all the issues above.
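The table of levels and anomalies above can be written down as a small Java enum. This is a sketch of the standard definitions only. The enum and the weakestPreventing helper are names I made up, while the java.sql.Connection constants are the real JDBC ones.

```java
import java.sql.Connection;

/** The four standard isolation levels and the read anomalies each one still allows. */
enum IsolationLevel {
    // Declared from weakest to strongest.
    READ_UNCOMMITTED(Connection.TRANSACTION_READ_UNCOMMITTED, true, true, true),
    READ_COMMITTED(Connection.TRANSACTION_READ_COMMITTED, false, true, true),
    REPEATABLE_READ(Connection.TRANSACTION_REPEATABLE_READ, false, false, true),
    SERIALIZABLE(Connection.TRANSACTION_SERIALIZABLE, false, false, false);

    final int jdbcConstant;
    final boolean allowsDirtyRead;
    final boolean allowsNonRepeatableRead;
    final boolean allowsPhantomRead;

    IsolationLevel(int jdbcConstant, boolean dirty, boolean nonRepeatable, boolean phantom) {
        this.jdbcConstant = jdbcConstant;
        this.allowsDirtyRead = dirty;
        this.allowsNonRepeatableRead = nonRepeatable;
        this.allowsPhantomRead = phantom;
    }

    /** Returns the lowest level that rules out every anomaly the caller asks to prevent. */
    static IsolationLevel weakestPreventing(boolean dirty, boolean nonRepeatable, boolean phantom) {
        for (IsolationLevel level : values()) {
            boolean acceptable = !(dirty && level.allowsDirtyRead)
                    && !(nonRepeatable && level.allowsNonRepeatableRead)
                    && !(phantom && level.allowsPhantomRead);
            if (acceptable) {
                return level;
            }
        }
        return SERIALIZABLE; // not reached: SERIALIZABLE prevents all three anomalies
    }

    public static void main(String[] args) {
        // Preventing dirty and non-repeatable reads, but tolerating phantoms:
        System.out.println(weakestPreventing(true, true, false)); // prints "REPEATABLE_READ"
    }
}
```

With a real JDBC connection, the chosen level would be applied with connection.setTransactionIsolation(level.jdbcConstant). Whether the database honours each level exactly as defined here depends on the product.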

Under the hood, an RDBMS uses a locking scheme to achieve the 4 transaction isolation levels above. How? It uses 2 kinds of locks:

  1. Shared read lock – more than one reader can hold it concurrently, but no write is possible
  2. Exclusive write lock – once an exclusive write lock is acquired, no one else can read or write until it is released.

In other words, read locks block write locks and vice versa. In the Lost Update problem above, transaction A can use a shared read lock to make sure no write is possible to the data it read during the course of its transaction. So far so good? Let’s move on!
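The same shared/exclusive behaviour can be observed in plain Java with java.util.concurrent's ReentrantReadWriteLock. This is an analogy under my own assumptions, not how any particular database implements its locks. The class and method names are made up for the sketch.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Shows which lock requests from a second thread succeed while the first thread holds a lock. */
class SharedExclusiveLockDemo {

    /** Asks a second thread to try the given lock without waiting and reports whether it got it. */
    static boolean otherThreadCanLock(Lock lock) {
        ExecutorService other = Executors.newSingleThreadExecutor();
        try {
            return other.submit(() -> {
                boolean granted = lock.tryLock();
                if (granted) {
                    lock.unlock();
                }
                return granted;
            }).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            other.shutdown();
        }
    }

    /** While this thread holds the shared read lock: {another reader allowed, a writer allowed}. */
    static boolean[] whileReadLockHeld() {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        rw.readLock().lock();
        try {
            return new boolean[] {otherThreadCanLock(rw.readLock()), otherThreadCanLock(rw.writeLock())};
        } finally {
            rw.readLock().unlock();
        }
    }

    /** While this thread holds the exclusive write lock: {a reader allowed, another writer allowed}. */
    static boolean[] whileWriteLockHeld() {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        rw.writeLock().lock();
        try {
            return new boolean[] {otherThreadCanLock(rw.readLock()), otherThreadCanLock(rw.writeLock())};
        } finally {
            rw.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(whileReadLockHeld()));  // prints "[true, false]"
        System.out.println(Arrays.toString(whileWriteLockHeld())); // prints "[false, false]"
    }
}
```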

Using a high isolation level to solve the lost update issue may give you a new set of issues related to locking:

  1. Long-running transaction – if transaction A takes a long time to commit, other transactions that want to change the value will queue up. Typically, if your transaction reads info from the db, lets the user change it in the UI and then writes it back, you should consider it a long transaction, as you never know how long the user will take to change it in the UI (ie. user think time). In this scenario, you would split the transaction into 2 (ie. a transaction to read and a transaction to write back). However, once you split up the transaction, you have no way to guarantee that the data from the earlier read is still up to date, no matter what isolation level you use. How do you solve this problem then? You check the version or timestamp of the data in the database before you write it in the write transaction. If the version is the same as in the read transaction, no one has changed the data even though you left it unlocked, so it is safe to change it. Otherwise, an exception should be thrown. This mechanism is called optimistic locking.
  2. Deadlock – if transaction A needs other resources in order to commit, but these resources are held by a transaction that is itself waiting for A to commit, you get a typical circular dependency.
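The optimistic locking check from item 1 can be sketched with an in-memory stand-in for a row that carries a version column. Everything here is assumed for illustration: the class name, the numbers and the exception type. In a real database the check is typically folded into the UPDATE itself, by adding the version to the WHERE clause and treating zero affected rows as a conflict.

```java
/** An in-memory stand-in for a database row that carries a version column. */
class VersionedBalance {
    private int balance;
    private long version;

    VersionedBalance(int initialBalance) {
        this.balance = initialBalance;
    }

    synchronized int balance() {
        return balance;
    }

    synchronized long version() {
        return version;
    }

    /** Write transaction: applies the change only if nobody has written since the caller's read. */
    synchronized void write(int newBalance, long versionSeenAtRead) {
        if (versionSeenAtRead != version) {
            throw new IllegalStateException("stale data: read version " + versionSeenAtRead
                    + ", current version " + version);
        }
        balance = newBalance;
        version++;
    }

    public static void main(String[] args) {
        VersionedBalance row = new VersionedBalance(1000);

        // Read transactions: A and B both read the row, then the users "think".
        int readByA = row.balance();
        long versionSeenByA = row.version();
        int readByB = row.balance();
        long versionSeenByB = row.version();

        row.write(readByB - 100, versionSeenByB); // B writes first and bumps the version
        try {
            row.write(readByA - 200, versionSeenByA); // A's version is now stale
        } catch (IllegalStateException e) {
            System.out.println("A must re-read and retry: " + e.getMessage());
        }
        System.out.println("balance=" + row.balance()); // prints "balance=900"
    }
}
```

The point of the design is that no lock is held during the user's think time. A conflict is only detected at write time, and the losing transaction re-reads and retries instead of silently overwriting.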

Locks are not just for rows; indexes are locked too. That is to say, for heavy updates to distinct rows of a table, the bottleneck can be index locking.

New Database Concurrency Control – MVCC

The aim of Multi-Version Concurrency Control (MVCC) is to avoid Writers blocking Readers and vice versa, by making use of multiple versions of data (ie. no read lock is necessary). The problem of Writers blocking Readers can be avoided if Readers can obtain access to a previous version of the data that is locked by Writers for modification. Writers can still block other Writers, so concurrent writes to the same data are not possible. However, while MVCC improves database concurrency, its impact on data consistency is more complex. I will discuss this later. Now let’s look at how to implement MVCC.

Challenges in implementing MVCC

  1. If multiple versions are stored in the database, an efficient garbage collection mechanism is required to get rid of old versions when they are no longer needed.
  2. The DBMS must provide efficient access methods that avoid looking at redundant versions.
  3. The DBMS must avoid expensive lookups when determining the relative commit time of a transaction.

There are essentially two approaches to multi-version concurrency.

  1. The first approach is to store multiple versions of records in the database, and garbage collect records when they are no longer required. This is the approach adopted by PostgreSQL and Firebird/Interbase.
  2. The second approach is to keep only the latest version of data in the database, but reconstruct older versions of data dynamically as required by exploiting the undo information the database logs for each change (rollback segments in Oracle, the undo log in InnoDB). This is the approach taken by Oracle and MySQL/InnoDB.

How does PostgreSQL implement MVCC

In PostgreSQL, when a row is updated, a new version (called a tuple) of the row is created and inserted into the table. The previous version is given a pointer to the new version and is marked “expired”, but it remains in the database until it is garbage collected. To achieve this, each tuple needs 2 additional fields:

  1. creation id/ xmin - The ID of the transaction that inserted/updated the row and created this tuple.
  2. expired id/ xmax - The transaction that deleted the row, or created a new version of this tuple. Initially it is null.

A row is visible if both of the following hold:

  • its creation id belongs to a committed transaction (so it doesn’t read uncommitted data from other open transactions) AND its creation id is < the current transaction id (so reads stay repeatable)
  • its expired id is null (so it is not stale data) or > the current transaction id, because if it is deleted by a transaction after the current one, the current transaction should ignore the deletion.
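The two visibility conditions above translate almost directly into code. The sketch below implements only the simplified rule as stated in this article, treating both conditions as required. It is not PostgreSQL's actual visibility check, which works with snapshots and handles more cases. The Tuple class and the set of committed transaction ids are my own stand-ins.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** The simplified visibility rule: created by an earlier committed transaction and not yet expired. */
class TupleVisibility {

    /** One version of a row. */
    static final class Tuple {
        final long xmin; // creation id
        final Long xmax; // expired id, null while the tuple has not been deleted or replaced

        Tuple(long xmin, Long xmax) {
            this.xmin = xmin;
            this.xmax = xmax;
        }
    }

    static boolean isVisible(Tuple tuple, long currentTxId, Set<Long> committedTxIds) {
        // Condition 1: creator committed, and it came before the current transaction.
        boolean createdBefore = committedTxIds.contains(tuple.xmin) && tuple.xmin < currentTxId;
        // Condition 2: never expired, or expired only by a later transaction.
        boolean notExpired = tuple.xmax == null || tuple.xmax > currentTxId;
        return createdBefore && notExpired;
    }

    public static void main(String[] args) {
        Set<Long> committed = new HashSet<>(Arrays.asList(10L, 20L));
        long current = 15L;
        System.out.println(isVisible(new Tuple(10L, null), current, committed)); // prints "true"
        System.out.println(isVisible(new Tuple(10L, 12L), current, committed));  // prints "false"
        System.out.println(isVisible(new Tuple(10L, 20L), current, committed));  // prints "true"
        System.out.println(isVisible(new Tuple(20L, null), current, committed)); // prints "false"
    }
}
```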

To track the status of transactions, a special table called PG_LOG is maintained. Since Transaction Ids are implemented using a monotonically increasing counter, the PG_LOG table can represent transaction status as a bitmap. This table contains two bits of status information for each transaction; the possible states are in-progress, committed, or aborted.
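To see why a monotonically increasing transaction id makes a bitmap workable, here is a sketch that packs two status bits per transaction into a byte array, so the id itself is the index. The numeric encoding of the three states and the class name are my own assumptions, not PostgreSQL's on-disk format.

```java
/** Stores a 2-bit status per transaction id, four transactions per byte. */
class TransactionStatusBitmap {
    static final int IN_PROGRESS = 0; // assumed encoding, for illustration only
    static final int COMMITTED = 1;
    static final int ABORTED = 2;

    private final byte[] bits;

    TransactionStatusBitmap(int maxTransactions) {
        bits = new byte[(maxTransactions + 3) / 4]; // every transaction starts as IN_PROGRESS (00)
    }

    void set(int txId, int status) {
        if (status < 0 || status > 3) {
            throw new IllegalArgumentException("status must fit in two bits");
        }
        int index = txId / 4;
        int shift = (txId % 4) * 2;
        int cleared = bits[index] & ~(0b11 << shift); // wipe the old two bits
        bits[index] = (byte) (cleared | (status << shift));
    }

    int get(int txId) {
        int shift = (txId % 4) * 2;
        return (bits[txId / 4] >> shift) & 0b11;
    }

    public static void main(String[] args) {
        TransactionStatusBitmap log = new TransactionStatusBitmap(16);
        log.set(5, COMMITTED);
        log.set(6, ABORTED);
        System.out.println(log.get(5)); // prints "1"
        System.out.println(log.get(6)); // prints "2"
        System.out.println(log.get(7)); // prints "0"
    }
}
```

Looking up a transaction's status is a single array access with no search, which is what makes the commit check in the visibility rule cheap.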

One of the drawbacks of MVCC in PostgreSQL is that old tuple versions are kept in the same table. To garbage collect them, you can “VACUUM” the table.

Reference

Below are the references I used for this article:

  1. Paper – Data Access Pattern
  2. PostgreSQL Ends the Waiting Game
  3. MVCC implementation algorithms in different databases
  4. MarkMail – very good postgresql forum
  5. Postgresql 8.3.6 Manual

  

         

 
