Archive | November, 2008

How to test my system?

Here is a good video from Google 

Testing Strategies Overview

I believe I don’t need to stress how important testing is. So let’s cut the crap and talk about the challenging areas of this subject. Here are the things I want to talk about:

  1. Write your unit tests first, then the code
    • This is the core concept of Test Driven Development (TDD). Everyone who has tried it will tell you how great it is. The difficult part is enforcing it across your team, especially under time constraints. Everyone wants to get something out right away, and we all feel a lot of pressure from the business side to release features and stay competitive in the market. How can we allocate time for this “luxury”? The key is whether you believe it will save you a lot of time in the long run. If you don’t, it is hard to carry out this idea.
    • Where could it save my time? Agile development may face one big challenge from the product side. The problem is mainly due to the way a project is broken down into mini-specs that fit into a sprint of work. Sometimes the product team finds it difficult to think about a project at the micro level. To deliver a mini-spec in an agile process, they may leave details out of the spec and fill them in iteratively. Writing your tests first forces you to think about the API, inputs, outputs and use cases ahead of time. This helps you give feedback to the product team earlier in the loop. Otherwise, it will cost you more time to change your logic or patch your code later.
    • If you have a suite of unit tests, you are more confident about refactoring because you can run regression tests to make sure nothing is broken. Apart from that, you can add new features to someone else’s code without worrying about breaking the old features, as you have the owner’s unit tests to verify them.
    • Things you need to pay attention to: (1) Don’t commit code to the trunk that breaks the build (compile). It is OK if it temporarily fails some tests while you are still filling in the code, since you don’t want your code sitting on your local box for too long. (2) Measure code coverage to make sure your unit tests exercise a good percentage of your code.
    • Tools I used to write my unit tests: TestNG/JUnit for Java code, FlexUnit for Flex code
    • How do we unit test our DAO? Test against a local database of the same vendor and version as the one in production. I don’t like to use embedded or in-memory databases like Hypersonic or SQLite because they don’t really test your SQL.
  2. Functional testing – use Selenium to automate it.
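
To make the test-first idea concrete, here is a minimal sketch. The ShoppingCart class and its methods are hypothetical names made up for illustration, and plain Java checks stand in for TestNG/JUnit so the example stays self-contained. The point is that the three test methods pin down the API, the inputs, the outputs and one edge case before the implementation exists.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class under test; its name and API are assumptions for illustration.
final class ShoppingCart {
    private final List<Integer> pricesInCents = new ArrayList<>();

    void add(int priceInCents) {
        if (priceInCents < 0) {
            throw new IllegalArgumentException("price must not be negative");
        }
        pricesInCents.add(priceInCents);
    }

    int totalInCents() {
        int total = 0;
        for (int price : pricesInCents) {
            total += price;
        }
        return total;
    }
}

// In a test-first workflow these checks are written before ShoppingCart is implemented.
final class ShoppingCartTest {
    static void emptyCartTotalsZero() {
        check(new ShoppingCart().totalInCents() == 0, "empty cart should total 0");
    }

    static void totalIsSumOfItems() {
        ShoppingCart cart = new ShoppingCart();
        cart.add(250);
        cart.add(199);
        check(cart.totalInCents() == 449, "total should be the sum of added prices");
    }

    static void negativePriceIsRejected() {
        try {
            new ShoppingCart().add(-1);
        } catch (IllegalArgumentException expected) {
            return; // the edge case is handled as specified
        }
        throw new AssertionError("negative price should be rejected");
    }

    static void runAll() {
        emptyCartTotalsZero();
        totalIsSumOfItems();
        negativePriceIsRejected();
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }
}
```

With a real framework each static method would become an annotated test method, and the suite would double as the regression safety net when refactoring.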

Performance/load testing

Basic workflow

  1. Create use case scripts for your application via recording. For Flex, we want a tool that can convert the binary AMF message to readable text as part of our script.
  2. Introduce natural variability - varying the login credentials, records opened, records saved, and other user actions including thinking time. This step is important because it provides the real-world variability that keeps your application working hard and prevents you from missing performance problems because of caching at various layers on your server.
  3. Define test data sets to be used by your scripts as you ramp up hundreds or thousands of simulated users executing your tests.
  4. Define profiles of load on your application. You might decide to have 20% of your users entering information slowly, 20% entering information very quickly, 40% looking up information, 10% producing reports, and 10% making account changes. The breakdowns and level of detail are entirely up to you.
  5. Specify how to run the test. You control such things as the speed of ramping up user load, how fast each user executes the test, and whether there should be natural variability between users and sessions. Each tool provides different fine-grained control over this step.
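
Steps 2 and 4 can be sketched in a few lines of code. This is only an illustration of the idea, not how any particular load tool implements it: the LoadProfile class, the action names and the seed are assumptions, and the weights mirror the example split above (20/20/40/10/10).

```java
import java.util.Random;

// Hypothetical load profile; action names and weights are assumptions for illustration.
final class LoadProfile {
    private static final String[] ACTIONS = {
        "enter-slowly", "enter-quickly", "lookup", "report", "account-change"
    };
    private static final int[] WEIGHTS = {20, 20, 40, 10, 10}; // percentages, sum to 100

    private final Random random;

    LoadProfile(long seed) {
        this.random = new Random(seed); // seeded so a test run can be reproduced
    }

    // Picks the next user action with probability proportional to its weight.
    String nextAction() {
        int roll = random.nextInt(100); // 0..99
        int cumulative = 0;
        for (int i = 0; i < ACTIONS.length; i++) {
            cumulative += WEIGHTS[i];
            if (roll < cumulative) {
                return ACTIONS[i];
            }
        }
        throw new IllegalStateException("weights must sum to 100");
    }

    // Think time varies uniformly between min and max (both inclusive), in milliseconds.
    long nextThinkTimeMillis(long minMillis, long maxMillis) {
        return minMillis + (long) (random.nextDouble() * (maxMillis - minMillis + 1));
    }
}
```

Each simulated user would call nextAction() to choose a script and sleep for nextThinkTimeMillis(...) between requests, which gives the variability described in step 2.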

Tips

  • Don’t run performance tests until all the functional tests pass.
  • Test against a single box first, and do not run JMeter on the same machine as the application under test.
  • Make sure the test is affected as little as possible by network traffic (run it within the same subnet to take out the network IO factor).
  • Create a test plan that contains one or more thread groups, which control the threads JMeter creates to simulate simultaneous users. Determine the number of threads and the ramp-up period. Add an HTTP Request sampler, then a listener to view the results in table and/or graph format. Each thread issues the HTTP request and its response time is recorded. The average response time and standard deviation are then calculated. Another way people use JMeter is to use its HTTP session recording to produce load tests.
  • At the same time, monitor the target’s CPU, memory and IO consumption (use OpenNMS) during the performance testing.
  • Create a scheduled integration build that runs the JMeter tests (use CruiseControl). Read this article for integration details.
  • How to achieve the load via distribution?
  • How to obtain meaningful information from the raw data?
  • Look at throughput (QPS) and resource consumption (CPU, memory, IO) over time and as the number of users/load increases.
  • Set up the database with a realistic amount of data before testing.
  • Performance profiling
    • Obtain the rough data from performance test first
    • Identify the bottleneck and fix the problem.
    • Don’t waste too much time over-tuning your system. Just fix the bottlenecks that impact the SLA.
  • Stress testing
  • Scalability testing – testing on scaling out or up your system.
  • Availability testing
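
As a rough illustration of turning raw data into the numbers mentioned in the tips (average response time, standard deviation, throughput), here is a small sketch. The ResponseStats class is a made-up helper, not part of JMeter, and it assumes response times were collected in milliseconds.

```java
// Hypothetical helper that summarises raw response times collected during a load test.
final class ResponseStats {
    // Average response time in milliseconds.
    static double mean(long[] samplesMillis) {
        requireSamples(samplesMillis);
        double sum = 0;
        for (long sample : samplesMillis) {
            sum += sample;
        }
        return sum / samplesMillis.length;
    }

    // Population standard deviation of the response times, in milliseconds.
    static double standardDeviation(long[] samplesMillis) {
        double mean = mean(samplesMillis);
        double squaredDiffs = 0;
        for (long sample : samplesMillis) {
            double diff = sample - mean;
            squaredDiffs += diff * diff;
        }
        return Math.sqrt(squaredDiffs / samplesMillis.length);
    }

    // Throughput in requests per second over the measured wall-clock window.
    static double throughputPerSecond(int requests, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        return requests * 1000.0 / elapsedMillis;
    }

    private static void requireSamples(long[] samplesMillis) {
        if (samplesMillis == null || samplesMillis.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
    }
}
```

A large standard deviation relative to the mean is usually a hint to look at the individual slow requests rather than trusting the average alone.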

Reference

Below are the references I read to write this article

  1. http://www.opensourcetesting.org
  2. JMeter User Manual

 


Concurrent Programming – Part 1 Synchronization

Get yourself familiar with concurrency programming

When I interview candidates, I like to ask questions related to multi-threading. I have found it a good topic for telling a hardcore programmer apart from an application-oriented programmer. I am not saying I am looking for someone who could write a concurrency library as efficient as the one created by Doug Lea; I am looking for candidates who have a solid understanding of the topic. However, I have found that most candidates have little knowledge in this area apart from the meaning of the “synchronized” keyword in Java. Therefore I decided to write a series of articles covering the areas of multi-threading that I feel are important to understand. Of course, I will start with the basics.

Introduction of Synchronization

Synchronization is a way to lock an object so that no two threads can run the synchronized code of that object at the same time.

public class SynchronizedCounter {
    private int c = 0;

    public synchronized void increment() {
        c++;
    }

    public synchronized void decrement() {
        c--;
    }

    public synchronized int value() {
        return c;
    }

    public void method2() {
        // ...
    }
}

If count is an instance of SynchronizedCounter, then making these methods synchronized has two effects:

  • Mutual Exclusion – It is not possible for two invocations of synchronized methods on the same object to interleave. When one thread is executing a synchronized method for an object, all other threads that invoke synchronized methods for the same object block (suspend execution) until the first thread is done with the object. Remember that this rule doesn’t apply to non-synchronized methods. Also, the thread that holds the object’s lock can re-enter its synchronized methods (i.e. the lock is reentrant).
  • Memory Visibility – When a synchronized method exits, it automatically establishes a happens-before relationship with any subsequent invocation of a synchronized method for the same object. This guarantees that changes to the state of the object are visible to all threads. Most interviewees miss this one. :) In a database, it is like the concept of “commit”: if you don’t commit your changes, others cannot see them.

All in all, synchronized methods enable a simple strategy for preventing thread interference and memory consistency errors. Interference happens when two operations, running in different threads but acting on the same data, interleave. This means that the two operations consist of multiple steps, and the sequences of steps overlap. The result is unpredictable data loss that is hard to track down. Memory consistency errors occur when the compiler and processor reorder statements or use cached values for better performance, so that different threads see inconsistent views of the same data.
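
To see interference in practice, here is a small sketch that runs the same increments against an unsynchronized and a synchronized counter. The class and method names are made up for this example. The unsynchronized count may come out lower than expected on some runs (lost updates), while the synchronized count is always exact.

```java
// Hypothetical demo: several threads increment a shared counter.
final class InterferenceDemo {
    static final class UnsafeCounter {
        private int c = 0;
        void increment() { c++; } // read, add, write: three steps that can interleave
        int value() { return c; }
    }

    static final class SafeCounter {
        private int c = 0;
        synchronized void increment() { c++; }
        synchronized int value() { return c; }
    }

    // Starts the given number of threads running the task and waits for all of them.
    static void runConcurrently(int threads, Runnable task) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(task);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }

    static int countUnsafely(int threads, int incrementsPerThread) {
        UnsafeCounter counter = new UnsafeCounter();
        runConcurrently(threads, () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                counter.increment();
            }
        });
        return counter.value();
    }

    static int countSafely(int threads, int incrementsPerThread) {
        SafeCounter counter = new SafeCounter();
        runConcurrently(threads, () -> {
            for (int i = 0; i < incrementsPerThread; i++) {
                counter.increment();
            }
        });
        return counter.value();
    }
}
```

Calling countSafely(4, 100000) always returns 400000. countUnsafely(4, 100000) can return any value up to 400000, depending on how the threads happened to interleave.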

Problem of Synchronization

Synchronization serializes the method calls from different threads. At any given time, only one thread can execute the synchronized method, and the other threads have to wait until the object lock is released. This can dramatically diminish the liveness of your application. To minimize the impact, you should:

  1. Reduce lock duration – Use synchronized statements (blocks) instead of whole synchronized methods, so that the lock is held only around the code that really needs it.
  2. Reduce lock scope – A separate mutex object used in a synchronized block lets you avoid locking the whole object. The Java 5 concurrency library also has a class called ReentrantLock that provides the same features as synchronized with more flexibility and, at least on Java 5, better performance under contention. Here is what is stated in “Java Concurrency in Practice”:

Why create a new locking mechanism (ie. ReentrantLock) that is so similar to intrinsic locking (ie. synchronized)? Intrinsic locking works fine in most situations but has some functional limitations. It is not possible to interrupt a thread waiting to acquire a lock, or to attempt to acquire a lock without being willing to wait for it forever. Intrinsic locks also must be released in the same block of code in which they are acquired; this simplifies coding and interacts nicely with exception handling, but makes non-block structured locking disciplines impossible. None of these are reasons to abandon synchronized, but in some cases a more flexible locking mechanism offers better liveness or performance.
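
As a sketch of the flexibility described in the quote, the counter below gives up if it cannot get the lock within a timeout instead of waiting forever. TimedLockCounter is a hypothetical class written for this article. ReentrantLock.tryLock(timeout, unit), lock() and unlock() are the real java.util.concurrent.locks API.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical counter guarded by an explicit lock instead of synchronized.
final class TimedLockCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int c = 0;

    // Returns false instead of blocking forever if the lock is not free within the timeout.
    boolean tryIncrement(long timeout, TimeUnit unit) {
        boolean acquired;
        try {
            acquired = lock.tryLock(timeout, unit); // the wait can also be interrupted
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();     // preserve the interrupt for callers
            return false;
        }
        if (!acquired) {
            return false;
        }
        try {
            c++;
            return true;
        } finally {
            lock.unlock(); // unlike synchronized, the release is explicit and must be in finally
        }
    }

    int value() {
        lock.lock();
        try {
            return c;
        } finally {
            lock.unlock();
        }
    }
}
```

The price of the flexibility is the explicit unlock: forgetting the finally block leaves the lock held, which cannot happen with synchronized.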

So far so good? Great! Let me ask you a few questions:

  1. Question 1: In the example above, if thread A is executing the synchronized method “increment”, can another thread execute method2? Yes, because method2 is not synchronized.
  2. Question 2: If thread A is in a synchronized method xyz, can it invoke another synchronized method abc on the same object? Yes. The object lock is reentrant.
  3. Question 3: If I want to make the above class thread-safe without using the synchronized object lock, are there any alternatives? Yes.
  4. Question 4: How about declaring the variable volatile?
    • Managing Volatility by Brian Goetz
    • Compare atomic variables, ReentrantLock and volatile variables.
    • Use int or byte instead of long or double, because reading or writing an int or byte is an atomic action, while a non-volatile long or double may be written in two separate 32-bit halves. Atomic actions cannot be interleaved, so they can be used without fear of thread interference. However, this does not eliminate all need to synchronize atomic actions, because memory consistency errors are still possible (for example, thread A updated a variable atomically but the change has not been flushed to main memory yet, so it is not visible to other threads).
    • Unless the fields in question are declared volatile, the JMM does not require the underlying platform to provide cache coherency or sequential consistency across processors, so it is possible, on some platforms, to read stale data in the absence of synchronization. Look here for a better explanation.
    • However, volatile only guarantees atomic reads and writes and memory consistency for a single variable. If you want that guarantee for compound operations, you need to use a synchronized block (or the new java.util.concurrent classes).
    • It is worth pointing out that increment (i.e. ++) and similar operations are not atomic in Java. So incrementing a volatile variable (volatileVar++) is NOT thread-safe. If you need thread-safe semantics, i.e. no possibility of multiple threads corrupting the variable value by having the updates unexpectedly interfere with each other, then you need to use a synchronized block to increment a variable, e.g. synchronized(LOCK){myVar++}, regardless of the overheads this causes – Java Performance Tuning
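
One possible answer to Question 3 is an atomic variable. The sketch below rewrites the counter with java.util.concurrent.atomic.AtomicInteger and, next to it, shows the kind of job volatile is suited for: a simple flag that is written and read but never incremented. AtomicCounter, StopFlag and the helper method are hypothetical names for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Thread-safe counter without synchronized: each update is a single atomic operation.
final class AtomicCounter {
    private final AtomicInteger c = new AtomicInteger(0);

    void increment() { c.incrementAndGet(); }
    void decrement() { c.decrementAndGet(); }
    int value() { return c.get(); }

    // Helper for the demo: increments from several threads and returns the final value.
    static int incrementFromThreads(int threads, int incrementsPerThread) {
        AtomicCounter counter = new AtomicCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counter.value();
    }
}

// volatile is enough here: plain writes and reads of one variable, no compound operation.
final class StopFlag {
    private volatile boolean stopped = false;

    void stop() { stopped = true; }
    boolean isStopped() { return stopped; }
}
```

AtomicCounter.incrementFromThreads(4, 100000) returns 400000 every time, whereas a volatile int incremented with ++ would not give that guarantee.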

More On Thread Safety

All the techniques I have discussed so far show you how to make your code thread-safe. They are needed only if you share resources across multiple threads and those threads may modify the resources. That is to say, if you don’t share any resources, or the other threads only read and never write, your code is thread-safe already. Here are some tips related to not sharing and read-only sharing:

  1. Use local variables to carry out the logic in your methods where possible (no sharing).
  2. Use ThreadLocal to hold a resource if you want to access it across multiple methods on the same thread (no sharing).
  3. Use immutable objects (private fields without setXXX methods) for read-only sharing, for example String and the primitive wrappers like Integer. However, make sure you declare the reference that holds your immutable object final.
  4. Most of the time, you use collection classes like HashMap and ArrayList to hold your objects. Those classes are not thread-safe. To make them thread-safe, you may use the Collections.synchronized wrappers or simply use the synchronized versions like Hashtable and Vector. However, these approaches have two problems:
    • They do not perform well. Why lock every read only to protect an occasional write?
    • They are only conditionally thread-safe. All individual operations are thread-safe, but sequences of operations where the control flow depends on the results of previous operations are subject to race conditions. For example, calling containsKey(), size() or iterator() and then reading or writing based on the result can give you a NullPointerException or ConcurrentModificationException if you don’t synchronize externally.
    • There are unconditionally thread-safe alternatives like ConcurrentHashMap, ConcurrentLinkedQueue and CopyOnWriteArrayList that achieve thread safety with good performance.
    • When you write an unconditionally thread-safe class, consider using a private lock object in place of synchronized methods. This protects you against synchronization interference by clients and subclasses and gives you the flexibility to adopt a more sophisticated approach to concurrency control in a later release – Joshua Bloch in Effective Java, 2nd edition, p.281.
  5. Deal with lazy initialization
  6. Handle denial of service attack that holds the object lock forever
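
The “conditionally thread-safe” point in tip 4 is easiest to see with a check-then-act sequence. The sketch below is an illustration with made-up class and method names. The synchronized wrapper needs an external lock around containsKey() plus put(), while ConcurrentHashMap provides the same compound operation atomically through putIfAbsent().

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical example of a compound (check-then-act) operation on two kinds of map.
final class CompoundOperations {
    private final Map<String, Integer> syncMap =
            Collections.synchronizedMap(new HashMap<String, Integer>());
    private final ConcurrentMap<String, Integer> concurrentMap =
            new ConcurrentHashMap<String, Integer>();

    // Each call on syncMap is thread-safe, but the sequence is not, so we lock the
    // wrapper itself (the lock it uses internally) around the whole sequence.
    Integer putIfAbsentSynchronized(String key, Integer value) {
        synchronized (syncMap) {
            if (!syncMap.containsKey(key)) {
                syncMap.put(key, value);
                return null; // nothing was there before
            }
            return syncMap.get(key);
        }
    }

    // ConcurrentHashMap performs the check and the write as one atomic step.
    Integer putIfAbsentConcurrent(String key, Integer value) {
        return concurrentMap.putIfAbsent(key, value);
    }
}
```

Both methods return null when the key was absent and the existing value otherwise. Without the synchronized block in the first method, two threads could both see the key as absent and overwrite each other.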

Java Memory Model

The JMM is what makes concurrent programming way more complicated than it should be. Honestly, I am not the right person to write this part because I do not understand it in full. All I can do is point you to a video from Jeremy Manson at Google. Hear what the expert says:

If you still have questions, make sure you go to his blog:

http://jeremymanson.blogspot.com/

Reference

Below are some of the articles I use:

  1. What does volatile do?
  2. Sun lesson on concurrency
  3. Fixing Java Memory Model – Brian Goetz – Part 1, Part 2
  4. Rox Java NIO Tutorial
  5. Blocking Queue
  6. Synchronization and Java Memory Model – Doug Lea 

 


Common DBA jobs

Export schema/ data out from mysql

To export schema and/or data, you can use mysqldump command:

mysqldump -u [username] -p[password] -d [schema_name] > [filename].sql

  1. -d means no data (just gives me the schema).
  2. -B is needed for multiple schema output
  3. -h (hostname)

Export data out from postgresql

  1. Export table data from postgresql to csv format
  2. Backup and restore database in postgresql

However, if you want to export a SQL result set to CSV in postgresql, consider using the COPY functionality. COPY can also load a file back into a table.

COPY ( select statement ) TO STDOUT WITH CSV;
COPY stock FROM 'mydir/Stock.csv';

Run sql script using mysql command

To run the scripts as input, we can use the following command:

mysql [schema_name] -u [username] -p[password] < [filename].sql

SQL Tips

There are times when we want to put logic in SQL without writing a stored procedure. Here are some functions that may get you there:

  • Conditional statement – CASE WHEN xxx THEN abc WHEN yyy THEN bbc …ELSE ccc END

UPDATE Account SET Sales_Location__c =
    CASE WHEN Sales_Country__c != '' THEN Sales_Country__c
         WHEN Country__c != '' THEN Country__c
         ELSE '--' END

  • COALESCE (input1, input2, …) – This function takes as many parameters as you want and returns the first non-NULL one. Suppose we have a table A with 3 columns: FullName, CompleteName and DisplayName. Any of these columns can contain NULL values. Now we want to select the DisplayName from this table, but if it is NULL, return FullName, and if that is also NULL, return CompleteName. We can easily do this in one select statement: (COALESCE vs ISNULL)

SELECT COALESCE(DisplayName, FullName, CompleteName) From A

 ETL

In mysql, you can export a table from db1 and import it to db2 remotely. For example, in db2 host, you can issue the following command:

/usr/bin/mysqldump --force --compress --opt -u [username] -p[password] -h [hostname] db1 \
  | /usr/bin/mysql -u [username] -p[password] -D db2


Data representation

Data can be represented in text format for humans and in binary format for computers. Here my focus will be on text representation.

For applications, we commonly use XML because:

  1. Its self-documenting format describes structure and field names as well as specific values, and it is easily digested by both humans and machines.
  2. It is platform-independent, thus relatively immune to changes in technology, and it facilitates data exchange across heterogeneous systems.
  3. It supports Unicode.
  4. It can represent common computer science data structures: records, lists and trees.
  5. It allows validation using schema languages such as DTD and XSD. XSDs are far more powerful than DTDs in describing XML languages. They use a rich datatyping system, allow for more detailed constraints on an XML document’s logical structure, and must be processed in a more robust validation framework.

With all the above advantages, it quickly became the standard for data exchange, especially in the web service world. However, XML also has its disadvantages: it is verbose, and its hierarchical model of representation is limited in comparison to an object-oriented graph.

Other options:

  1. XML vs JSON - JSON is now more attractive than XML for the kinds of data interchange that power Web-based mashups, Web gadgets and widgets. Why? Look at the articles below:
    • Fixing AJAX: XMLHttpRequest considered Harmful – You don’t see many AJAX examples that access third-party web services like Amazon, Yahoo and Google. That is because all the newest web browsers impose a significant security restriction on the use of XMLHttpRequest: you aren’t allowed to make an XMLHttpRequest to any server except the one your web page came from. If you attempt to do so, XMLHttpRequest will either fail or pop up warnings, depending on the browser you are using… Solution: Application Proxy, Apache Proxy or the Script Tag Hack (On-demand Javascript).
    • JSON vs XML: Browser Security Model – This article comments on the solutions proposed above. It argues that the Script Tag approach is better than a proxy.
    • JSON and Yahoo!’s Javascript API – This article gives you an example of how to use a Script Tag to communicate with the Yahoo Web service API and bypass the restriction of XMLHttpRequest. The way to bypass the XHR restriction is not to use XHR at all. The cross-site requests are made by adding script tags to a document’s HEAD with DOM methods (i.e. .appendChild())
    • Is JSON better than XML (a good objective review)
    • In conclusion, JSON enables you to use the Script Tag approach to bypass the XHR security restriction because JSON itself is part of Javascript. That makes JSON popular.
  2. YAML as an alternative of data serialization
  3. Java Serialization will take object to binary representation (versioning headache). XStream is a simple library to serialize objects to XML and back again.
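
To illustrate option 3, here is a minimal round trip through Java Serialization using only the JDK. The Contact class is a hypothetical example. The serialVersionUID field is where the versioning headache shows up: if it changes between the writer and the reader, deserialization fails with an InvalidClassException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical record used only for this example.
final class Contact implements Serializable {
    private static final long serialVersionUID = 1L; // must match on both sides

    final String name;
    final int age;

    Contact(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

final class SerializationDemo {
    // Object -> binary representation (not human readable).
    static byte[] toBytes(Contact contact) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(contact);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    // Binary representation -> object.
    static Contact fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Contact) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }
}
```

The resulting byte array starts with the serialization stream header rather than anything a human can read, which is the trade-off against text formats like XML, JSON or YAML.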

For machines, data is represented in binary format:

  1. The art of assembly language (a free book that you can read online)