Dev

Java Theory and Practice

This section shows how to write effective Java code. I will cover topics such as data structures, threading, memory management, dynamic proxies, generics, and more.

Online Java Book and Articles

Let’s follow the outline of Heinz’s Java Master course and see whether his syllabus can take us to expert level. Because I have already included the Thinking in Java book there, I will cover only the advanced topics here.


  1. Constructor vs Static factory method
    • Constructors are restricted by signature: a class can have only one constructor with a given parameter list, whereas static factory methods can have distinct, descriptive names.
    • A static factory method can control the number of instances in the system by reusing the same immutable instance(s) (Flyweight Pattern, Singleton Pattern).
    • A static factory method enables interface-based frameworks (eg. the Collections framework) because it can return an object of any subtype of its return type. Normally, it is used with non-public implementation classes (Strategy Pattern). Client code remains unchanged even if you later add more implementation strategies. This approach forms the basis of service provider frameworks (eg. the JDBC API).
    • However, a class without public or protected constructors cannot be subclassed. On the other hand, this encourages us to use Composition over Inheritance.
  2. Use the Builder Pattern when you are faced with many constructor parameters.
    • Static factories and constructors share a limitation: they do not scale well to large numbers of optional parameters. A clean solution is to call a constructor (or static factory) with the required parameters to get a builder object (typically a static nested class). The client then calls setter-like methods on the builder for the optional parameters. Finally, the client calls a parameterless build method to generate the immutable object.
  3. Making a class a singleton can make it difficult to test its clients.
  4. A WeakHashMap is a special Map implementation in which the keys of the map are held through java.lang.ref.WeakReference. Because the keys are weakly referenced, a key-value pair can be dropped from the map automatically once the only remaining reference to the key is the weak reference (ie. no external reference to the key exists). It can be used to implement a class that manages its own memory, because clients do not always null out references to objects they no longer use.
  5. Never use the database identifier to implement equality; use a business key, a combination of unique, usually immutable, attributes. The database identifier will change if a transient object is made persistent. If the transient instance (usually together with detached instances) is held in a Set, the changed hash code breaks the contract of the Set. So, remember to implement equals() and hashCode() using business key equality. This works as long as you have a relatively immutable natural key for each of your objects. However, you may not have such a key for every object type. If so, assign a UUID as the id as soon as the object is instantiated. UUIDs are 16-byte (128-bit) numbers that adhere to a standard format. The String version of a UUID looks like this: 2cdb8cee-9134-453f-9d7a-14c0ae8184c6. The catch is that Hibernate checks whether the id field is null to determine whether the object is new. Obviously this no longer works, since our object id is never null. We can easily solve this by configuring Hibernate to check whether the version field, rather than the id field, is null. The major drawback of using UUIDs as database primary keys is their size in the database (rather than in memory), where indexes and foreign keys compound the size increase. Here you have to make trade-offs. With a String representation, each primary key takes 32 or 36 bytes (without or with the hyphens).

 

<hibernate-mapping package="my.package">
   <class name="Person" table="PERSON">
     <id name="id" column="ID">
       <generator class="assigned" />
     </id>
     <version name="version" column="VERSION" unsaved-value="null" />
     <!-- Map Person-specific properties here. -->
   </class>
</hibernate-mapping>
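
On the Java side, the UUID approach described in item 5 could look roughly like the sketch below. The Person class, its fields, and the PersonIdentityDemo helper are hypothetical names chosen for illustration. The point is only that the id is assigned at construction, so equals() and hashCode() stay stable before and after the object is saved, while the version field stays null until the first save.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical entity: class and field names are illustrative only.
class Person {
    // Assigned at construction, so it never changes when the object is persisted.
    private final UUID id = UUID.randomUUID();
    // Stays null until the first save; the mapping uses it to detect new objects.
    private Integer version;
    private String name;

    Person(String name) { this.name = name; }

    UUID getId() { return id; }
    Integer getVersion() { return version; }
    String getName() { return name; }
    void setName(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        return id.equals(((Person) o).id);
    }

    @Override
    public int hashCode() { return id.hashCode(); }
}

class PersonIdentityDemo {
    // Returns true if the person is still found in the set after its state changes.
    static boolean survivesMutation() {
        Set<Person> people = new HashSet<>();
        Person p = new Person("Ann");
        people.add(p);
        p.setName("Ann Smith"); // does not affect equals() or hashCode()
        return people.contains(p);
    }
}
```

Two Person objects with the same name are deliberately not equal here, because identity comes from the UUID rather than from the attributes.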

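The static factory and builder ideas from items 1 and 2 of the list above can be combined in one small class. This is a minimal sketch, not a definitive implementation: ServerConfig, its fields, and the default values are all invented for the example.

```java
// Hypothetical value class: ServerConfig and its fields are illustrative, not from any library.
final class ServerConfig {
    private final String host;       // required
    private final int port;          // optional, defaults to 80
    private final int timeoutMillis; // optional, defaults to 30000

    // Private constructor: instances can only be created through the builder.
    private ServerConfig(Builder b) {
        this.host = b.host;
        this.port = b.port;
        this.timeoutMillis = b.timeoutMillis;
    }

    // Static factory with a descriptive name; takes only the required parameter.
    static Builder builder(String host) {
        return new Builder(host);
    }

    String host() { return host; }
    int port() { return port; }
    int timeoutMillis() { return timeoutMillis; }

    static final class Builder {
        private final String host;
        private int port = 80;
        private int timeoutMillis = 30_000;

        private Builder(String host) { this.host = host; }

        // Setter-like methods for the optional parameters; each returns the builder for chaining.
        Builder port(int port) { this.port = port; return this; }
        Builder timeoutMillis(int timeoutMillis) { this.timeoutMillis = timeoutMillis; return this; }

        // Parameterless build method produces the immutable object.
        ServerConfig build() { return new ServerConfig(this); }
    }
}
```

A client would write something like ServerConfig.builder("localhost").port(8080).build() and leave the timeout at its default.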
 

  1. Choose a uniformly distributed hash function. HashMap uses hashCode() to assign an object to the right bucket on put(), and then uses equals() to fish out the right object within that bucket on get(). The worst case for a HashMap is a hashCode() that returns the same value for every object, b/c the map then effectively degenerates into a linked list. Each lookup takes linear rather than constant time, so a program that should run in linear time runs in quadratic time instead. A good hash function tends to produce unequal hash codes for unequal objects. Ideally, a hash function should distribute any reasonable collection of unequal instances uniformly across all the possible hash values.
  2. Reduce coupling and increase cohesion. A well-designed module has a good degree of encapsulation: its API is cleanly separated from its implementation details, and modules communicate only through their APIs. This decouples the modules that comprise a system, allowing them to be developed, tested, optimized, used, understood, and modified in isolation. It speeds up system development b/c modules can be developed in parallel and are more reusable (ie. the heart of SOA). Access modifiers help with encapsulation (public, protected, package-private, private).
    • If a package-private top-level class or interface is used by only one class, consider making it a private nested class of that class. This reduces its accessibility from all the classes in its package to the one class that uses it.
    • Make your classes and fields as inaccessible as possible.
    • If a method overrides a superclass method, it is not permitted to have a lower access level in the subclass than it does in the superclass. This is necessary to ensure that an instance of the subclass is usable anywhere that an instance of superclass is usable.
    • Constants are exposed as public static final fields. A final field containing a reference to a mutable object has all the disadvantages of a nonfinal field.
    • Don’t return internal mutable fields for clients to change (return a copy of it or immutable version instead).
  3. Immutability – If the state of an object never changes once it is instantiated, you can freely share and cache references to it without having to copy or clone it. You can also cache its fields or the results of its methods without worrying about the values becoming stale or inconsistent with the rest of the object’s state. Because of this, immutable classes have several advantages:
    • Best map keys – If you use a mutable object as a HashSet element and the object then changes its state, the HashSet implementation becomes confused: the object is still present if you enumerate the set, but it may not appear to be present if you query the set with contains(). Needless to say, this can cause some confusing behavior.
    • Inherently thread-safe – You don’t have to synchronize access to them across threads.
    • Enable the Flyweight pattern – Sharing makes it efficient to represent large numbers of fine-grained objects. For example, create 26 character objects and reuse them in different documents to reduce the object count.
    • Infrequently changing data – Is there any way to obtain the convenience and thread-safety benefits of immutability with data that sometimes changes? The CopyOnWriteArrayList class, from the java.util.concurrent package, is a good example of how to harness the power of immutability while still permitting occasional modifications. CopyOnWriteArrayList behaves much like the ArrayList class, except that when the list is modified, instead of mutating the underlying array, a new array is created and the old array is discarded.
    • How to minimize mutability – Provide no setters, make the class final, make all fields private, and never return mutable internal fields.
  4. Favor Composition over Inheritance. Here inheritance means implementation inheritance, not interface inheritance. Inheritance violates encapsulation b/c subclasses may depend on the implementation details of the superclass. For example, if B extends A and adds a security check on top of every method of A, a security hole opens as soon as A gains a new method that B has not yet overridden. The subclass has to evolve in step with its superclass for everything to work as expected, and this makes your system fragile. Composition avoids the problem (Decorator Pattern, Adapter Pattern).
  5. Interface vs Abstract class. If you use abstract classes to define types, you leave a programmer who wants to add functionality with no alternative but to use inheritance, b/c he cannot use composition when he cannot instantiate the abstract class. You can still use an abstract class to create a skeletal implementation of an interface (Template Method Pattern).
  6. Nested classes exist to serve their enclosing class. Object-orientation gives objects state and behavior; inner classes give state and behavior to the “members”, allowing multifaceted objects. In effect, nested classes make a form of multiple implementation inheritance possible. There are 4 different types of nested classes:
    • static member classes (have access to all static fields of the enclosing class)
    • nonstatic member classes (instance-specific, with access to all methods and members of the enclosing instance, even its this reference) – this article gives a good example of usage. In short, a nonstatic member class allows us to take logic out of the parent and objectify it.
    • anonymous classes (defined inside a method of the enclosing class) – usage: on-the-fly callbacks, such as adding a new ActionListener to a button instead of creating a new named class just to serve one button.
    • local classes
  7. Generics
    • A good article that talks about it.
    • Generics are not covariant – An Integer[] can be assigned where a Number[] is called for, but a List<Integer> cannot be assigned where a List<Number> is called for. The reason is type safety: if you could assign a List<Integer> to a List<Number>, you could then add a Float to the List<Number>, and later, when you went to get your Integer back, you would find a Float instead.
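
The difference between arrays and generic lists described in the last bullet can be seen in a few lines. This is a small sketch with a hypothetical VarianceDemo class. The bounded wildcard in sum() is an addition that the list above does not discuss; it is shown only as one common way to accept a List<Integer> where numbers are read but never added.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class VarianceDemo {
    // Arrays are covariant: the assignment compiles, but the store fails at run time.
    static boolean arrayStoreFails() {
        Number[] numbers = new Integer[1]; // allowed: an Integer[] is a Number[]
        try {
            numbers[0] = 3.14f;            // tries to put a Float into an Integer[]
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    // A bounded wildcard gives read-only access to a list of any Number subtype.
    static double sum(List<? extends Number> values) {
        double total = 0;
        for (Number n : values) {
            total += n.doubleValue();
        }
        return total;
    }

    static double sumOfIntegers() {
        List<Integer> ints = new ArrayList<>();
        ints.add(1);
        ints.add(2);
        // List<Number> nums = ints; // does not compile: generics are not covariant
        return sum(ints);
    }
}
```

With arrays the mistake surfaces only at run time as an ArrayStoreException, whereas with generics the compiler rejects the commented-out assignment outright.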

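The advice in the list above about not handing out internal mutable fields and about minimizing mutability can be sketched with defensive copies. Period is a hypothetical class used only for illustration; java.util.Date is used here precisely because it is mutable.

```java
import java.util.Date;

// Hypothetical immutable class: final, private final fields, no setters.
final class Period {
    private final Date start;
    private final Date end;

    Period(Date start, Date end) {
        // Copy the mutable arguments first, then validate the copies.
        this.start = new Date(start.getTime());
        this.end = new Date(end.getTime());
        if (this.start.after(this.end)) {
            throw new IllegalArgumentException("start is after end");
        }
    }

    // Accessors return copies, so callers cannot change the internal state.
    Date start() { return new Date(start.getTime()); }
    Date end() { return new Date(end.getTime()); }
}
```

Changing the Date objects passed to the constructor, or the ones returned by the accessors, leaves the Period itself untouched.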
Concurrency

Write a thread-safe application:

  • Thread-safety – avoiding the issues that arise when 2 or more threads access the same shared mutable data.
  • synchronized provides both mutual exclusion and happens-before visibility.
  • volatile provides happens-before visibility but not mutual exclusion.
  • Reads and writes of variables other than long and double are guaranteed to be atomic (no mutual exclusion needed), but atomicity alone does not provide happens-before visibility.
  • AtomicXXX classes such as AtomicLong provide atomic updates with happens-before visibility on an individual field.
  • Better still, share only immutable data.
  • On the other hand, you should avoid excessive synchronization as well.
  • Locks in Java are reentrant, so a thread in a synchronized region can call into other region(s) protected by the same lock without deadlocking. However, this is not always safe. For example, suppose a thread iterates over a collection in synchronized method A and, before the iteration ends, calls another synchronized method B of the same object that removes an element from the collection. This causes a ConcurrentModificationException. You might try to avoid it by delegating the removal to another thread via an Executor. However, this leads to deadlock: method A waits for the Executor’s task to finish, while that task cannot remove the element until the thread in method A releases the lock. To get around this, you can either move the alien method (the method B invocation) outside of the synchronized block or use CopyOnWriteArrayList instead of ArrayList. This special list is a variant of ArrayList in which all write operations are implemented by making a fresh copy of the entire underlying array.
  • Achieving Thread Synchronization & Parallelized Execution in Java
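
The reentrancy pitfall described in the list above can be reproduced in a single thread. The sketch below uses a hypothetical Registry class: purge() plays the role of method A and remove() the role of method B. Backed by an ArrayList, the reentrant removal breaks the iteration; backed by a CopyOnWriteArrayList, the iteration runs over a snapshot and completes.

```java
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical class; names are illustrative only.
class Registry {
    private final List<String> names;

    Registry(List<String> names) { this.names = names; }

    synchronized void add(String name) { names.add(name); }

    // "Method B": removes an element while holding the lock.
    synchronized void remove(String name) { names.remove(name); }

    synchronized int size() { return names.size(); }

    // "Method A": iterates while holding the lock and calls remove() reentrantly.
    // Returns false if the iteration was broken by the modification.
    synchronized boolean purge(String unwanted) {
        try {
            for (String name : names) {
                if (name.equals(unwanted)) {
                    remove(name); // same lock, so no deadlock, but the list changes under the iterator
                }
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }
}
```

Constructing the registry with new ArrayList<>() makes purge() report a broken iteration, while constructing it with new CopyOnWriteArrayList<>() lets the same call succeed, at the cost of copying the array on every write.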

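To make the difference between the visibility and atomicity bullets above concrete, here is a minimal sketch with a hypothetical HitCounter class. The AtomicLong gives atomic updates with visibility, while the volatile flag gives visibility only, which is enough for a simple on/off switch.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class; names are illustrative only.
class HitCounter {
    private final AtomicLong hits = new AtomicLong(); // atomic update plus visibility
    private volatile boolean closed;                  // visibility only, fine for a simple flag

    void hit() {
        if (!closed) {
            hits.incrementAndGet(); // safe without synchronized
        }
    }

    void close() { closed = true; }

    long total() { return hits.get(); }

    // Starts the given number of threads, each recording hitsPerThread hits,
    // waits for all of them, and returns the total.
    static long countConcurrently(int threads, int hitsPerThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < hitsPerThread; j++) {
                    counter.hit();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.total();
    }
}
```

Replacing the AtomicLong with a plain long and hits++ would make lost updates possible, because the increment is a read followed by a write rather than one atomic step.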
IO Programming

Write a server to support client connections:

  • Thread Pooling Approach (traditional thread-per-client model): Assign an active thread from a thread pool to each socket and let it handle the job. When the job is done, the thread is released and goes back to the pool. However, this approach cannot handle a large number of connections b/c once your pool size grows to hundreds of threads, your CPU spends a large portion of its time on context switching rather than performing real tasks.
  • I/O Multiplexing Approach: A single thread handles an arbitrary number of sockets. To implement this, we need the Java NIO APIs: selectors and non-blocking I/O. In theory, with I/O multiplexing a single thread can do all of the work in a server application. In practice, that is a very bad idea: a single thread cannot hide the latency of disk I/O (Java NIO does not support non-blocking file operations) or take advantage of systems with multiple CPUs. As a rough guideline, a server application should have at least 2*n threads, where n is the number of execution units available, so the work has to be divided among threads. There are three threading models: (1) M dispatchers/no workers, (2) 1 dispatcher/N workers, and (3) M dispatchers/N workers. In all cases, an incoming connection is assigned to an event-dispatch thread (a SelectorThread) for the duration of its life. In the first architecture, I/O events are fully processed by the event-dispatch thread; in the other two, the processing is delegated to worker threads. The lesson learnt: a selector, its selection keys, and its registered channels should never be accessed by more than one thread. So we chose the M dispatchers/no workers model.
  • Build highly scalable server using Java NIO
  • Introducing non-blocking sockets

 

  • Each ready key doesn’t represent the entire stream of information a client sends to the server, just a part of it. Another piece is delivered to the server each time the read event fires, so the server never waits or blocks for the data to be fully transmitted from the client.
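
The I/O multiplexing approach above can be sketched as a single dispatcher that owns its selector and handles every event itself (the "dispatchers/no workers" model with one dispatcher). EchoDispatcher and its methods are hypothetical names. The write loop assumes small messages; a fuller server would register interest in OP_WRITE instead of looping. roundTrip() is only a self-contained way to exercise the loop over the loopback interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical single-threaded echo dispatcher; names are illustrative only.
class EchoDispatcher implements AutoCloseable {
    private final Selector selector = Selector.open();
    private final ServerSocketChannel server = ServerSocketChannel.open();

    EchoDispatcher() throws IOException {
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: any free port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() { return server.socket().getLocalPort(); }

    // One pass of the event loop: waits up to timeoutMillis, handles every ready key,
    // and returns the number of bytes echoed during this pass.
    int dispatchOnce(long timeoutMillis) throws IOException {
        int echoed = 0;
        selector.select(timeoutMillis);
        Iterator<SelectionKey> ready = selector.selectedKeys().iterator();
        while (ready.hasNext()) {
            SelectionKey key = ready.next();
            ready.remove();
            if (key.isAcceptable()) {
                SocketChannel client = server.accept();
                if (client != null) {
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                }
            } else if (key.isReadable()) {
                echoed += echo((SocketChannel) key.channel());
            }
        }
        return echoed;
    }

    // Reads whatever part of the stream has arrived and writes it straight back.
    private int echo(SocketChannel client) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        int n = client.read(buf);
        if (n < 0) {          // the client closed the connection
            client.close();   // closing the channel also cancels its key
            return 0;
        }
        buf.flip();
        while (buf.hasRemaining()) {
            client.write(buf); // assumes small messages; see the note above
        }
        return n;
    }

    @Override
    public void close() throws IOException {
        for (SelectionKey key : selector.keys()) {
            key.channel().close();
        }
        selector.close();
    }

    // Sends a message through a blocking client and returns what comes back.
    static String roundTrip(String message) {
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        try (EchoDispatcher dispatcher = new EchoDispatcher();
             SocketChannel client = SocketChannel.open(
                     new InetSocketAddress("127.0.0.1", dispatcher.port()))) {
            client.write(ByteBuffer.wrap(payload));
            int echoed = 0;
            for (int pass = 0; pass < 10 && echoed < payload.length; pass++) {
                echoed += dispatcher.dispatchOnce(1000);
            }
            if (echoed < payload.length) {
                throw new IllegalStateException("echo did not complete");
            }
            ByteBuffer reply = ByteBuffer.allocate(payload.length);
            while (reply.hasRemaining() && client.read(reply) >= 0) {
                // keep reading until the whole reply has arrived
            }
            reply.flip();
            return StandardCharsets.UTF_8.decode(reply).toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only the thread calling dispatchOnce() ever touches the selector, its keys, or the registered channels, in line with the lesson learnt above.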

APIs

A website (http://www.programmableweb.com) lists tons of public APIs that you can use to build your mashups. Apart from that, it even gives you a chart showing the most popular APIs on the Net. A very good resource!

Design Pattern

Below is an overview of the 23 design patterns in the Gang of Four book.

Algorithm

Below are some good references that you can follow up:

  1. MIT Introduction to Algorithms
  2. Greedy Algorithm

Testing

Here are some very good testing blogs I have found:

  1. Google Testing Blog

Sample Code

TBA

Tools

There are various tools that will facilitate the development cycle. I will list the ones I use here:

  1. Eclipse – IDE
  2. Maven – Build process
  3. CruiseControl – continuous integration (automated builds and tests, TDD)
  4. WebTest or JMeter – stress testing
  5. Bugzilla
  6. TWiki
  7. Selenium – automated integration tests
  8. SVN
  9. Methodology – Agile, TDD, Scrum, XP

Team development

This section provides best practices for building a dev team and having its members work together productively.

  1. Scrum process
    • Product Backlog -> Sprint Backlog -> Sprint iterations -> Post-mortem review for each sprint
    • Stand up meeting for each day
    • Product Scope -> Product Spec -> Eng Spec (use Wiki for coordination)
    • Activities (code review, study group, pair programming and demo).
  2. Environments: Local, Dev (demo), QA (testing for the next launch), Staging, and Production
  3. Database schema changes – launch process
    • The database schema needs to evolve as your application does, but schema changes differ from code deployments in 2 very important ways: they are slow, and they can’t always be undone.
    • The speed issue doesn’t occur until the dataset reaches a certain size. If you use MySQL, you can modify a table with < 100,000 rows in a few seconds. Past a certain point, a modification to a large table is going to lock that table for a considerable time, and you may need to take the database offline to make it. To avoid downtime, you can take a set of databases out of a pool, modify them offline, bring them back up, and then continue with the rest in the same fashion (ie. a rolling window).
    • Dropping a column means data loss. If your database is too large, it is not practical to back it up just so you can roll back later.
  4. SVN
    • Atomic fileset commits. The repository as a whole, rather than the individual files, gets a new revision number after each commit. So tags become less important b/c you can treat a revision number as a tag.
    • Files can be renamed and moved in the repository with their edit histories intact.
  5. Build process – Maven 2.0

 
