Introduction
This page covers the full application stack that I use as the infrastructure for all my projects. I selected the technologies in this stack carefully, and they have proven themselves across several projects. To justify these decisions, I explain the reasoning behind each one. I also cover some non-functional requirements (NFRs) such as security, logging, transaction management, and SLA fulfillment. Hopefully, it gives you an idea of how to build a serious website that can scale and support millions of users.

This article, by Sébastien Arbogast, explains in detail how to set up the technology stack above. Terracotta has proposed an application stack as well. Except for the web frontend, it is similar to the stack I picked. However, Terracotta picked some interesting tools that I want to dig into further:
- JPA - For the high-level domain persistence layer
- Cargo - To launch the web container(s) for scripted/automated testing
- Crosscheck - For unit testing JavaScript
- FreeMarker - To manage the layout of email messages
- EhCache - For standalone caching and Hibernate L2 caching
Frontend
I adopted Flex as my RIA solution. I find it far better than AJAX because it frees me from the headache of browser compatibility issues. To simplify communication between my J2EE backend and Flex, I adopted BlazeDS as well. BlazeDS lets Flex talk to Java POJOs over the AMF protocol without sending XML back and forth.
- Flex
- Look and Feel
- CSS, Skinning and Font
- Effects/ Animation
- Remoting
- AMF vs JSON vs XML. There is an article that did the performance analysis on this topic.
- Navigational support (Back, Forward and Bookmark)
- UrlKit to externalize the state to URL.
- Testability
- FlexUnit (Unit testing) and FlexMonkey (Functional testing)
- Event model
- Binding
- Metadata/ annotation
- Security
- Internationalization
- Look and Feel
- BlazeDS
- Push Mechanism (Comet)
- AMF
- Cairngorm or Mate as Flex programming framework.
- Cocomo is a Platform as a Service that allows Flex developers to easily add real-time social capabilities into their RIA (rich Internet applications) – developer guide is here!
- Offline application (Google Gears, AIR, native drag and drop, integrate with Excel, iCam support, SQLite support)
Middle Tier
I use a J2EE solution on the back end. I believe Java is not just a language but an application platform, because so many tools are available for it as open source.
Apache HTTP Web Server 2.0
Tomcat 6.0 – Servlet container
- The servlet container creates a single instance of each servlet defined in the web.xml configuration file and serves concurrent requests with it, so the servlet needs to be thread-safe. A piece of code is thread-safe if it is reentrant or protected from simultaneous execution by some form of mutual exclusion. Either make your servlet stateless (local variables live on each thread's own stack and are not shared with other threads), or protect the shared state with synchronization, which may lead to contention or deadlock. If you have state that does not need to be shared with other threads but is needed again later by the same thread, consider moving it to a ThreadLocal. Some achieve this with an object pool, but I think that complicates your system unnecessarily.
- Database connectivity objects like connections, result sets, statements or Hibernate sessions are stateful objects. They are not thread-safe by design and should not be shared by multiple threads simultaneously. To avoid synchronization, you can use a servlet filter to create a JDBC connection or Hibernate session at the start of the request, before the web tier is invoked, and bind it to the current thread by means of a ThreadLocal for use in the business logic.
- Tomcat 6 has an NIO connector that can handle a large number of concurrent connections, which is definitely a requirement for today's AJAX applications. Apart from Tomcat 6, Jetty supports NIO as well. (is it a myth?)
- Thread pool management – Thread pooling works to limit the level of activity in your application by limiting the number of requests it can handle. The only way to know if you've set a proper upper and lower limit is to monitor the number of active threads, response times, and the level of utilization of critical system resources.
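The thread-bound resource idea from the bullets above can be sketched in plain Java. This is a minimal illustration, not servlet or Hibernate code: FakeSession, SessionHolder and RequestPipeline are hypothetical names, and RequestPipeline.handle() stands in for what a servlet filter would do around the rest of the request.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a stateful, non-thread-safe resource (JDBC connection, Hibernate session). */
class FakeSession {
    private static final AtomicInteger COUNTER = new AtomicInteger();
    final int id = COUNTER.incrementAndGet();
    boolean closed;

    void close() {
        closed = true;
    }
}

/** Binds one FakeSession to the current thread for the duration of a request. */
class SessionHolder {
    private static final ThreadLocal<FakeSession> CURRENT = new ThreadLocal<>();

    static void open() {
        CURRENT.set(new FakeSession());
    }

    static FakeSession current() {
        FakeSession session = CURRENT.get();
        if (session == null) {
            throw new IllegalStateException("no session bound to this thread");
        }
        return session;
    }

    static void close() {
        FakeSession session = CURRENT.get();
        CURRENT.remove(); // always clear: pooled threads are reused for later requests
        if (session != null) {
            session.close();
        }
    }
}

class RequestPipeline {
    /** Roughly what a filter would do: open, run the request, always clean up. */
    static int handle() {
        SessionHolder.open();
        try {
            return businessLogic();
        } finally {
            SessionHolder.close();
        }
    }

    /** Business code reaches the session without passing it around and without locking. */
    static int businessLogic() {
        return SessionHolder.current().id;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Callable<Integer> request = RequestPipeline::handle;
        Future<Integer> first = pool.submit(request);
        Future<Integer> second = pool.submit(request);
        System.out.println("first request used session " + first.get());
        System.out.println("second request used session " + second.get());
        pool.shutdown();
    }
}
```

The remove() call in the finally path matters in a container, because worker threads come from a pool and would otherwise carry a stale session into the next request.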
Spring 2.5 – POJO framework
- IoC - Dependency injection, object configuration management and construction.
- AOP - AspectJ vs Spring-AOP
- Bean Instantiation
- setter vs constructor injection
- singleton vs prototype – per ApplicationContext
- direct vs indirect – A FactoryBean introduces a layer of indirection: the bean injected as a dependency comes from getObject()
- You can define an abstract bean definition that other bean definitions extend, which saves repetitive typing.
- You can set up properties, lists and maps via the ApplicationContext.
- Spring MVC to expose my POJO to the Web
- Spring's data access abstraction is known for declarative transaction demarcation and reuse of data access best practices through templates to cut boilerplate code. It also provides a thread-safe way to access data source plus convert checked exceptions to unchecked exceptions of its abstracted and portable exception hierarchy.
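To make the template idea concrete, here is a small plain-Java sketch of the pattern. It is not Spring's API: MiniTemplate, SqlWork, DataAccessFailure and UserNameDao are hypothetical names chosen for illustration. The fixed part (error handling and exception translation) lives in the template, the variable part is passed in, and the DAO receives its collaborator through constructor injection.

```java
import java.sql.SQLException;

/** Unchecked exception: callers may catch it but are not forced to declare it. */
class DataAccessFailure extends RuntimeException {
    DataAccessFailure(String message, Throwable cause) {
        super(message, cause);
    }
}

/** The variable part of a data access operation; free to throw the checked SQLException. */
interface SqlWork<T> {
    T run() throws SQLException;
}

/** The fixed part: run the work and translate checked failures into unchecked ones. */
class MiniTemplate {
    <T> T execute(String description, SqlWork<T> work) {
        try {
            return work.run();
        } catch (SQLException e) {
            throw new DataAccessFailure("Failed to " + description, e);
        }
    }
}

class UserNameDao {
    private final MiniTemplate template;

    // Constructor injection: the collaborator is mandatory and never changes.
    UserNameDao(MiniTemplate template) {
        this.template = template;
    }

    String findName(int id) {
        return template.execute("load user " + id, () -> {
            // A real DAO would query the database here; this fakes a lookup.
            if (id < 0) {
                throw new SQLException("invalid id " + id);
            }
            return "user-" + id;
        });
    }
}
```

Callers of findName no longer need try/catch blocks for SQLException, which is the boilerplate reduction described above.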
Hibernate 3.0 for ORM – If you are new to hibernate, read this.
- Cascading save
- Automatic dirty checking
- Automatic CRUD sql generation with dialect taken care of
- Transactional write behind (reordering sql to make it right)
- Support user-defined custom type – very flexible for object – relational mapping.
- How to make Spring and Hibernate work together (Session Management). This article discusses the session-per-request pattern and how Spring and Hibernate cooperate to achieve it.
- How to write your own DAO
- Don't repeat your DAO
- Advanced DAO programming
- Transaction Demarcation: (1) Inside or outside the DAO? Outside, because you may want multiple DAOs to participate in a single transaction. (2) JDBC vs JTA? With JDBC transaction demarcation, you can combine multiple SQL statements into a single transaction. One drawback of JDBC transactions is that the transaction's scope is limited to a single database connection, whereas a JTA transaction can have multiple participants. If you plan to demarcate transactions with JTA, you will need a JDBC driver that implements the javax.sql.XADataSource, javax.sql.XAConnection, and javax.transaction.xa.XAResource interfaces. A driver that implements these interfaces will be able to participate in JTA transactions. An XADataSource object is a factory for XAConnection objects. XAConnections are JDBC connections that participate in JTA transactions. (3) Programmatic vs declarative? Prefer declarative.
- How to implement your domain object if it will be persisted by ORM
- Hibernate caching strategy
- Hibernate mapping strategy
- Hibernate Shard
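The point about demarcating transactions outside the DAO can be sketched with plain JDBC. This is a programmatic illustration under stated assumptions, not the declarative Spring setup recommended above: AccountDao, AuditDao, TransferService and the table and column names are all hypothetical. The DAOs only run SQL on the connection they are given, and the service owns commit and rollback, so both DAOs take part in one transaction.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** DAO that never commits or rolls back; it only runs SQL on the given connection. */
class AccountDao {
    void adjustBalance(Connection con, long accountId, long deltaCents) throws SQLException {
        // Hypothetical table and column names.
        try (PreparedStatement ps = con.prepareStatement(
                "UPDATE account SET balance = balance + ? WHERE id = ?")) {
            ps.setLong(1, deltaCents);
            ps.setLong(2, accountId);
            ps.executeUpdate();
        }
    }
}

class AuditDao {
    void record(Connection con, String message) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "INSERT INTO audit_log (message) VALUES (?)")) {
            ps.setString(1, message);
            ps.executeUpdate();
        }
    }
}

/** The service layer owns the transaction boundary, not the DAOs. */
class TransferService {
    private final AccountDao accounts;
    private final AuditDao audit;

    TransferService(AccountDao accounts, AuditDao audit) {
        this.accounts = accounts;
        this.audit = audit;
    }

    void transfer(Connection con, long fromId, long toId, long cents) throws SQLException {
        boolean previousAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // start of the JDBC transaction
        try {
            accounts.adjustBalance(con, fromId, -cents);
            accounts.adjustBalance(con, toId, cents);
            audit.record(con, "transfer " + cents + " from " + fromId + " to " + toId);
            con.commit(); // all three statements succeed together
        } catch (SQLException | RuntimeException e) {
            con.rollback(); // or fail together
            throw e;
        } finally {
            con.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Because everything runs on one Connection, this is the single-connection scope that JDBC demarcation is limited to. Spanning several resources would need JTA instead.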
Caching
Most existing in-memory caching solutions fall into the category of what we call a "plain" cache system, in which the direct object references are stored and cached. Since a plain cache deals with the object references directly, it acts like an elaborate HashMap and thus is very intuitive to use. When an object needs to be replicated or persisted in a plain cache system, the object has to implement the Serializable interface. However, a plain cache also has some known limitations relating to replication or persistency:
- Cumbersome - The user will have to manage the cache specifically. For instance, when an object is updated, a user will need to execute a corresponding API to update the cache content.
- Hamper performance - If the object size is huge, even a single field update would trigger serialization of the whole object and replication across the cluster. Thus, it can be unnecessarily expensive.
- Relationship cannot be preserved between cached objects - In particular, the cached object cannot be referenced multiple times by other objects (multiple referenced), or have an indirect reference to itself (cyclic). Otherwise, the relationship will be broken upon serialization. During replication, if we have two Person instances that share the same Address object, upon replication, it will be split into two separate Address instances (instead of one).
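The broken-relationship limitation is easy to reproduce with standard Java serialization. The sketch below assumes a plain cache that replicates each cached entry on its own. Person and Address come from the example above, while their fields and the PlainCacheReplication helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Address implements Serializable {
    final String city;

    Address(String city) {
        this.city = city;
    }
}

class Person implements Serializable {
    final String name;
    final Address address;

    Person(String name, Address address) {
        this.name = name;
        this.address = address;
    }
}

class PlainCacheReplication {
    /** Serializes and deserializes one cache entry on its own, as per-entry replication would. */
    static <T extends Serializable> T replicate(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("replication failed", e);
        }
    }

    public static void main(String[] args) {
        Address home = new Address("Paris");
        Person ann = new Person("Ann", home);
        Person bob = new Person("Bob", home);
        System.out.println("shared before replication: " + (ann.address == bob.address));

        Person annCopy = replicate(ann);
        Person bobCopy = replicate(bob);
        System.out.println("shared after replication: " + (annCopy.address == bobCopy.address));
    }
}
```

Each call also writes the whole object graph, which is the performance concern from the previous bullet: changing one field still means serializing the entire Person.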
Solution: POJO cache - No need to implement the Serializable interface for POJOs. Replication is done on a per-field basis resulting in a potential boost to performance. The object relationship and identity are preserved automatically in a distributed replicated environment. This enables transparent usage experience and increases software performance.
- JBoss has POJO Cache.
- Distributed Caching with JBoss Cache
- Terracotta outperforms JBoss POJO Cache
- Object Identity Part I, Part II, Part III (From Terracotta)
Other Middle Tier Services
- Messaging - ActiveMQ
- Scheduling - Quartz
- Web Service - Axis 2 (StAX's pull parsing)
- Reporting - JasperSoft Server
- Dynamic SQL creation and domain model population from JDBC resultset - iBatis
- Searching - Lucene
- Crawling - Nutch
- Batch processing in J2EE environment
- Rule-based engine
- Why we need a Rule Engine? Take a look at InfoQ article: Real World Rule Engines
- Forward chaining vs backward chaining
- Rete Algorithm to improve performance of forward chaining (linear vs exponential)
- Rete Implementation - JBoss Rules (on top of Drools). presentation
- Example: Build a stock trading application using Drools Part I, Part II
- Data mining - R
- BPMS - Intalio
Database Server
Horizontal Partitioning
- You can handle your routing logic in application. The following articles you may find useful.
- Skype keeps its routing logic at the database level.
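Application-level routing can be as simple as a function from a partition key to a shard. The sketch below assumes user id as the key and a fixed list of shards. ShardRouter and the shard names are hypothetical, and real setups often use a lookup table or consistent hashing instead, because plain modulo routing moves most keys whenever the shard count changes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Routes each user id to exactly one shard so the application knows which database to query. */
class ShardRouter {
    private final List<String> shards;

    ShardRouter(List<String> shards) {
        if (shards.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shards = Collections.unmodifiableList(new ArrayList<>(shards));
    }

    /** floorMod keeps the index non-negative even for negative ids. */
    int shardIndexFor(long userId) {
        return (int) Math.floorMod(userId, (long) shards.size());
    }

    String shardFor(long userId) {
        return shards.get(shardIndexFor(userId));
    }
}
```

A DAO would call shardFor(userId) first and then obtain a connection for the returned shard, so all rows for one user live in one database.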
Performance Tuning
Memory Management
- If your system does not have enough memory, it will start paging data from memory to disk based on an LRU algorithm. Finding that page requires CPU time, and moving it uses the disk I/O channel.
- Java objects are stored in the heap. A large heap will cause your application to "stall" for a long period of time. A small heap will produce frequent short stalls. In either case, the process of managing memory, garbage collection, can consume a serious amount of CPU.
- The Hotspot Virtual Machine employs generational garbage collection
- Young generation holds recently created objects
- Tenured generation holds objects that have survived multiple minor garbage collections (GCs).
- Permanent generation (or: perm space) is used to store class and method data as well as interned strings. Just like heap space, you can also run out of perm space. You have to increase the available perm space using -XX:MaxPermSize.
- If you want to learn more about the different garbage collection algorithm, check this out.
- This article shows you how to use Memory Analyzer (discussed below) to examine perm space. The goal is to avoid leaking class loaders.
- Memory Analyzer (eclipse plugin)
- Perform analysis on the snapshot of a heapdump.
- Java VMs from Sun, HP, SAP and some other vendors can be configured to write a heap dump on the first thrown OutOfMemoryError (OOM). Such a heap dump basically contains all the Java objects, including their field values and the references among them, and helps you understand the state of the application at the time the Java VM ran out of memory.
- You can open those heap dumps with the SAP Memory Analyzer and inspect them in detail. The SAP Memory Analyzer won’t tell you when and by whom the objects were allocated, because this information is not present in the heap dump. Also, you will only see the live objects, i.e. you won’t see the objects which have been allocated and garbage collected in the past. Again, this information is not present in the heap dump. What you will see are all the Java Objects which couldn’t be garbage collected at the time the heap dump was written, along with the information who keeps them alive, their size and more.
- Good webinar that shows you how to use this tool. Good articles related to this topic:
- Note: The Histogram shows memory consumption and the number of instances per class; the Dominator Tree helps you find the biggest objects in the heap and their reference paths; the GC chain shows what is referencing an object; the Grouping feature gives an aggregated view.
- GC
- By setting the -verbose:gc flag (-Xverbose:gc on JRockit), you can view a summary of GC activity. With the numbers there, you can calculate GC efficiency = time spent in GC / running time of the app. A GC efficiency greater than 10% is an indication that you need to tune the JVM heap. If CPU utilization is high and GC is running fine, then an algorithmic inefficiency is most likely at fault. To diagnose that possibility, we'd turn to an execution profiler.
- Lock Contention
- If we've ruled out the hardware and determined that the JVM is properly configured, the only things left to consider are lock contention and interactions with external systems. Each of these cases is characterized by threads that are stalled and thus unable to do "useful work". That said, long stalls may be an indication that the transaction should be handled asynchronously, thus freeing the thread to perform other useful tasks.
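The GC efficiency ratio above (time spent in GC divided by running time) can also be read from inside the JVM through the standard java.lang.management beans, as a rough alternative to parsing GC logs. In this sketch the GcOverhead class is a hypothetical name, and the reported collection time is approximate and, with concurrent collectors, not all of it is stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcOverhead {
    /** Time spent in GC divided by running time, as a fraction between 0 and 1. */
    static double ratio(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return (double) gcMillis / uptimeMillis;
    }

    /** Sums the accumulated collection time over all collectors in this JVM. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    static double current() {
        return ratio(totalGcMillis(), ManagementFactory.getRuntimeMXBean().getUptime());
    }

    /** The 10% rule of thumb from the text. */
    static boolean needsHeapTuning(double ratio) {
        return ratio > 0.10;
    }

    public static void main(String[] args) {
        double ratio = current();
        System.out.println("GC overhead: " + ratio + ", tune heap: " + needsHeapTuning(ratio));
    }
}
```

Sampling this value periodically and comparing deltas gives a better picture than one reading, since the ratio since startup hides recent spikes.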
CPU
- Kernel vs application utilization. If kernel > 20%, something in our code is causing OS to work too hard.
- Kernel tasks: Context switching between threads to achieve parallelism, managed by the thread scheduler. There are conditions under which a thread may be removed from the CPU before it has finished its time quantum. Some of the more common reasons are being blocked on I/O or being forced to wait for a lock. Repeated removal of threads from the CPU makes the scheduler work harder. If this happens very frequently, it can drive up kernel CPU utilization to the point where it impacts application performance.
Scale out your webapp via data grid
A Data Grid is a horizontally scalable in-memory data management solution. Data grids try to eliminate data source contention by scaling out data management with commodity hardware. There are solutions in this area that I want to look into:
- Terracotta (JVM level)
- Hibernate without database bottleneck (This article talks about how to integrate Hibernate with Terracotta in 2 different areas: detached object vs second level cache. According to the performance metrics in the article, the detached object mode is the best option. The key benefit that Terracotta brings is that you can now trust your memory heap, as it is shared across all the nodes in your cluster through byte-code injection without requiring you to make your domain objects serializable).
- How does Terracotta work? The magic behind Terracotta is its ability to combine memory in different JVMs and present it to your Java application as a single master memory resource. They call it Network Attached Memory, which is similar in concept to Network Attached Storage (NAS). How does Terracotta achieve this?
- Gigaspaces
Cloud Computing
If your task is very CPU intensive, you need a way to split it and leverage different machines to process a subset of it. The technologies you can use are:
- Hadoop - implementation of Google MapReduce in Java
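The split-and-combine idea behind MapReduce can be shown in plain Java without the Hadoop API. In this sketch, threads in a local pool stand in for the separate machines, and WordCountSketch is a hypothetical name. Each chunk is mapped to partial word counts independently, and a reduce step merges the partial results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class WordCountSketch {
    /** Map step: count the words of one chunk, independent of all other chunks. */
    static Map<String, Integer> map(String chunk) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : chunk.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Reduce step: merge the partial counts into one result. */
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    /** Runs the map step for every chunk on a worker pool, then reduces. */
    static Map<String, Integer> run(List<String> chunks, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (String chunk : chunks) {
                Callable<Map<String, Integer>> task = () -> map(chunk);
                futures.add(pool.submit(task));
            }
            List<Map<String, Integer>> partials = new ArrayList<>();
            for (Future<Map<String, Integer>> future : futures) {
                partials.add(future.get());
            }
            return reduce(partials);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The map step shares no state between chunks, which is what lets a framework like Hadoop move it onto different machines.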
Reference
Here are some good resources you can look into:
- Java Performance Tuning
- Hibernate 3.3.1 reference documentation
- Java Reference Guide
- Scalability Video Lectures
- Directi scalability video - very good
- Storage architecture
- A good blog here to compare Coherence, Terracotta and Gigaspace
- Gigaspace CTO's blog
- OSGi component architecture