Reporting solution!

Open source reporting

My company needs a reporting engine, but it doesn’t want to pay for an expensive commercial one like MicroStrategy. In fact, I don’t see why we should pay so much when there are tools out there for FREE. As usual, I googled the Net and found two seemingly promising open source reporting solutions.

  1. Pentaho Reporting
  2. JasperReports

Both of them are bundled with a suite of tools for OLAP, data mining, ETL, etc. I just want a non-invasive reporting engine that can easily integrate into our architecture. To my dismay, I found out that Pentaho doesn’t go this route. It basically gives you a preconfigured reporting server. You can build your reports and deploy them by following the manual. However, I have hardly ever seen a reporting solution that could satisfy all the business requirements without customization. All I expected from Pentaho was a jar file with documentation that shows me how to use its API to generate reports in different formats and how to integrate it with our database. I tried looking into the code and extracting the parts I wanted from Pentaho. However, I found out that the engine is actually not that powerful. Once you strip out the workflow part, it is basically a simple SQL executor that then renders the result according to the UI info embedded in the report definition. What is wrong with that?

» Read more…

Linux System Overview – File System

Linux File System Basic

Ext3 (the successor of Ext2) is the standard file system for Linux: it is robust, fast and suitable for all fields of use. The main difference between the two is that Ext3 keeps a journal that records pending operations so that it can recover quickly after a system crash. This record guarantees a consistent file system at all times and reduces the time needed to check a file system after a crash from several hours to a few seconds, because instead of checking the entire disk, the system can check just those areas noted in the journal as having pending operations.

Like all decent Unix file systems, Ext3 uses three general data structures: directories, inodes and data blocks. Directories contain only file names and the inode numbers assigned to them. Each file has one inode that holds the list of addresses of the disk blocks storing its content. The list is needed because a file’s content is normally not stored in contiguous disk blocks: files are constantly added and deleted, and their sizes change over time (i.e. external fragmentation). If a file’s content is scattered, it takes longer to read because the disk head physically has to seek more.
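
To make the relationship between the three structures concrete, here is a minimal toy model in Java. It is only a sketch: the class and method names (ToyFileSystem, write, read, nonContiguousJumps) are made up for illustration, and the in-memory maps are nothing like the real Ext3 on-disk layout. It only shows that a directory maps a name to an inode number, and the inode lists the block addresses in file order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of directory -> inode -> data blocks (hypothetical, not the Ext3 layout). */
class ToyFileSystem {
    /** An inode lists the block addresses of one file, in file order. */
    static final class Inode {
        final List<Integer> blockAddresses = new ArrayList<>();
    }

    // Directory: file name -> inode number, and nothing else.
    private final Map<String, Integer> directory = new HashMap<>();
    // Inode table: inode number -> inode.
    private final Map<Integer, Inode> inodes = new HashMap<>();
    // The "disk": block address -> block content.
    private final Map<Integer, String> blocks = new HashMap<>();
    private int nextInodeNumber = 1;

    /** Stores a file whose chunks live at the given (possibly scattered) block addresses. */
    void write(String name, int[] addresses, String[] chunks) {
        if (addresses.length != chunks.length) {
            throw new IllegalArgumentException("one address per chunk is required");
        }
        Inode inode = new Inode();
        for (int i = 0; i < addresses.length; i++) {
            blocks.put(addresses[i], chunks[i]);
            inode.blockAddresses.add(addresses[i]);
        }
        int number = nextInodeNumber++;
        inodes.put(number, inode);
        directory.put(name, number);
    }

    /** Resolves name -> inode -> blocks and concatenates the block contents. */
    String read(String name) {
        StringBuilder content = new StringBuilder();
        for (int address : inodeOf(name).blockAddresses) {
            content.append(blocks.get(address));
        }
        return content.toString();
    }

    /** Counts jumps between non-adjacent blocks, a rough proxy for extra head seeks. */
    int nonContiguousJumps(String name) {
        List<Integer> addresses = inodeOf(name).blockAddresses;
        int jumps = 0;
        for (int i = 1; i < addresses.size(); i++) {
            if (addresses.get(i) != addresses.get(i - 1) + 1) {
                jumps++;
            }
        }
        return jumps;
    }

    private Inode inodeOf(String name) {
        Integer number = directory.get(name);
        if (number == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        return inodes.get(number);
    }

    public static void main(String[] args) {
        ToyFileSystem fs = new ToyFileSystem();
        fs.write("notes.txt", new int[] {7, 42, 8}, new String[] {"Hel", "lo ", "fs"});
        System.out.println(fs.read("notes.txt"));
        System.out.println("jumps: " + fs.nonContiguousJumps("notes.txt"));
    }
}
```

The more jumps a file has, the more scattered it is, which is the fragmentation cost described above.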

http://www.heise-online.co.uk/images/110398/0/1

» Read more…

Flex Annotated Charting

Recently I wanted to extend the LineChart in Flex to get a line chart annotated with events, like the one on Google Finance.

First of all, I googled the Net to see whether anyone had already done it. It would be even better if I could find an open source project related to this. Below are the interesting things I found:
  1. Dow Jones Interactive Chart (commercial – it is exactly what I am looking for)
  2. Interactive Bubble Chart (open – although it is not exactly what I want, I believe the code can help me if I need to customize the line chart. I may just need to draw small interactive bubbles on the line to get my job done!)
  3. This demo gives you tons of chart samples. They are all great examples, although none of them satisfies my current need.
  4. This demo is close to what I want. From it, I noticed that I can use “annotationElement” to draw on top of the data series. However, the trick is to convert the data points to pixel coordinates so that what I draw moves along with the graph even when someone stretches it. To make things easier, Ely Greenfield has created DataDrawingCanvas, which lets us draw on the chart by specifying only data points instead of pixel coordinates. This class extends ChartElement, as AnnotationElement does (blog). That is amazing!! Thanks!!!
  5. Google Finance Chart (It is exactly what I want. I wonder whether I can get the source for it)
    • I have found the blog and powerpoint of this sample (1/7/2009)
    • Google uses the Flash/JavaScript Integration Kit to get it working (blog) – I heard that it is a very nice combination of Flash and AJAX. This is similar to MeasureMap‘s use of the kit.
    • It is an open source example!!! (code). Thanks to Brendan Meutzner!!
    • Brendan also shows us how he created his demo in 5 steps to help us understand how to build it ourselves.
    • Step 1, Step 2, Step 3, Step 4, Step 5 – enjoy!!

Reference

Useful resources:

  1. Data Visualization by Tom Gonzalez. (Tom created an open source visualization framework named Axiis. It looks great. Once I get a chance, I will dig into it) – 7/31/2009
  2. Building a Flex Component by Ely Greenfield
  3. Create component and enforce separation of concerns
  4. http://www.edwardtufte.com/tufte/ (Edward Tufte – famous guy in data visualization)
  5. http://www.insideria.com/2008/03/image-manipulation-in-flex.html (Image Manipulation)

 

How to pick a good web hosting company for Java webapp

I currently use Dreamhost for my own company, “JustProposed.com”, which is powered by a typical LAMP stack. It is a great shared hosting solution, but it doesn’t support websites powered by Java. If you have a Java website, I suggest you try a web hosting company that offers a VPS (Virtual Private Server). A VPS occupies a middle ground between a dedicated server and regular shared hosting: you get the features of dedicated hosting in a shared environment. In other words, your virtual server runs on one of the provider’s host servers, and each host server runs a number of virtual servers. Each virtual server shares the host server’s memory, CPU, Internet connection and other resources. No single VPS can monopolize resources: each VPS gets a guaranteed share of the server’s CPU, disk IO and network.

» Read more…

Wiring up Flex, Mate, BlazeDS, Spring, Hibernate and MySQL with Maven 2 – Part 1

Introduction

This article builds on the great work that Sébastien Arbogast has done. He has written 3 articles that show you how to wire up Flex, BlazeDS, Spring, Hibernate and MySQL with Maven as the build process. I have included his articles below for your reference.

  1. The Flex, Spring, and BlazeDS full stack – Part 1: Creating a Flex module
  2. The Flex, Spring and BlazeDS full stack – Part 2: Writing the to-do list server
  3. The Flex, Spring and BlazeDS full stack – Part 3: Putting the application together

I have found Sébastien’s work to be a good foundation for my own project. To contribute back to the community, I will write a series of articles to show you how you can customize and extend the todolist sample.

What is in Part 1 of the series…

  1. Enhancements on the Maven build process
    • Leverage RSL to factor out the framework swc, so the size of the application swf will be reduced. Apart from that, I also take advantage of the Flash Player Cache, available since version 9 update 3, to cache the framework libraries.
    • Clean up the Flex and BlazeDS dependencies in POM as the latest version of the sdk is available and the BlazeDS dependencies are officially available.
    • Include some common reports for maven site generation
    • Embed Jetty web server in the build process for quick deployment and testing
  2. Document how to get the sample up on Eclipse for development
  3. Use Mate as Flex framework
    • Restructure ToDoList sample to leverage Mate framework
    • Factor out Mate as RSL and integrate it with Maven build process via Flex-mojo plugin.

What is in the coming articles…

  1. In part 2 of this series, I will show you how to use flex-mojo to build a modular Flex application.
  2. In part 3 of this series, I will show you how to test your flex app via FlexUnit (Unit test) and FlexMonkey (Functional test)
  3. In part 4 of this series, I will work on the server side. I am planning to add monitoring, caching and security to it.

» Read more…

Database concurrency control – MVCC

Concurrency Issue – Lost Update

Lost update is the key concurrency problem that we try to avoid:

From the sequences above, transaction A’s SELECT-UPDATE overwrites the update that transaction B made to the balance. If transactions A and B were serialized properly, the correct balance after both would be 700. For performance reasons, most RDBMSs use the isolation level “READ COMMITTED” as the default. However, this isolation level does not protect data read by a transaction from becoming outdated right after it is read, because another transaction can change it. You can use the isolation level “REPEATABLE READ” or “SERIALIZABLE” to protect the values read in a transaction from becoming outdated; lock-based implementations do this by holding shared read locks on those rows until the end of the transaction. In case you are new to RDBMS, let me give you some background.
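
The lost update can be replayed deterministically in a few lines of Java. This is only an illustration with no real database or threads. The starting balance of 1000 and the withdrawals of 200 (transaction A) and 100 (transaction B) are hypothetical numbers, chosen so that the properly serialized result is 700 as mentioned above.

```java
/** Deterministic replay of a lost update (hypothetical numbers, no real database). */
class LostUpdateDemo {
    /** A reads, B reads, B writes, A writes: B's update is silently overwritten. */
    static int interleaved(int balance, int withdrawA, int withdrawB) {
        int readByA = balance;          // A: SELECT balance
        int readByB = balance;          // B: SELECT balance
        balance = readByB - withdrawB;  // B: UPDATE and COMMIT
        balance = readByA - withdrawA;  // A: UPDATE based on its stale read
        return balance;
    }

    /** A runs to completion before B starts, so B reads A's committed value. */
    static int serialized(int balance, int withdrawA, int withdrawB) {
        balance = balance - withdrawA;  // A: SELECT, UPDATE, COMMIT
        balance = balance - withdrawB;  // B: SELECT, UPDATE, COMMIT
        return balance;
    }

    public static void main(String[] args) {
        System.out.println("interleaved: " + interleaved(1000, 200, 100));
        System.out.println("serialized:  " + serialized(1000, 200, 100));
    }
}
```

In the interleaved run, B’s withdrawal of 100 disappears because A writes a value computed from a read that was already outdated.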

» Read more…

Build your project via Maven 2

POM demystified

The POM lets you tell Maven the “what”, and Maven knows the “how”. Put another way, the POM gives you an abstraction layer over your build process that makes things more explicit, more standard and easier to follow. There are several key elements in the POM that I want to talk about. Below is a nice diagram I borrowed; it shows you what information the POM captures.

 

» Read more…

How to test my system?

Here is a good video from Google 

Testing Strategies Overview

I believe that I don’t need to stress how important testing is. So, let’s cut the crap and talk about the challenging areas related to this subject. Here are the things I want to talk about:

  1. Write your unit test first then code
    • It is the core concept of Test Driven Development (TDD). Everyone who has tried it will tell you how great it is. However, the difficult part is enforcing it across your team, especially under time constraints. Everyone wants to get something out right away, and we all get a lot of pressure from the business side to release features and stay competitive in the market. How can we allocate time for this “luxury”? The key is whether you believe it will save you tons of time in the long run. If not, it is hard to carry out this idea.
    • In which areas could it save my time? Agile development may face one big challenge from the product side. The problem is mainly due to the way the product team breaks a project down into mini-specs that each fit into a sprint of work. Sometimes they find it difficult to think about a project at the micro level. To deliver the mini-specs in an agile process, they may leave details out of the spec and try to fill them in iteratively. Writing your tests first forces you to think about the API, inputs, outputs and use cases ahead of time. This helps you give feedback to the product team earlier in the loop. Otherwise, it will cost you more time to change your logic or patch your code later.
    • If you have a suite of unit tests, you can refactor with more confidence because you can run regression tests to make sure nothing is broken. Apart from that, you can add new features to someone else’s code without worrying about breaking the old features, as you have the owner’s unit tests to verify them.
    • Things you need to pay attention to: (1) Don’t commit code to the trunk that breaks the build (compile). It is OK if it temporarily fails some tests while you are still filling in the code, since you don’t want your code sitting on your local box for too long. (2) Measure code coverage to make sure your unit tests exercise a good percentage of your code.
    • Tools I use to write my unit tests: TestNG/JUnit for Java code, FlexUnit for Flex code
    • How do we unit test our DAO? Test against your local database (use the same vendor and version as the database in production. I don’t like to use in-memory or embedded databases like Hypersonic or SQLite because they don’t really test out your SQL).
  2. Functional testing – use Selenium to automate it.
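
As a small illustration of the test-first idea, the sketch below writes the checks before the implementation. Everything in it is hypothetical: applyDiscount and its rules are invented for the example, and plain Java checks stand in for what would be JUnit or TestNG test methods in a real project. The point is that writing runTests first pins down the API, the inputs, the outputs and the edge cases.

```java
/** Test-first sketch; applyDiscount and its rules are invented for illustration. */
class TestFirstSketch {
    // Step 1: written first. These checks decide the API and the edge cases.
    static void runTests() {
        check(applyDiscount(10_000, 0) == 10_000, "0% leaves the price unchanged");
        check(applyDiscount(10_000, 25) == 7_500, "25% off 100.00 is 75.00");
        check(applyDiscount(10_000, 100) == 0, "100% off is free");
        try {
            applyDiscount(10_000, 101);
            check(false, "a percent above 100 must be rejected");
        } catch (IllegalArgumentException expected) {
            // Rejecting bad input was decided while writing the test.
        }
    }

    // Step 2: written afterwards, with just enough logic to make the checks pass.
    static long applyDiscount(long priceInCents, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
        return priceInCents * (100 - percent) / 100;
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        runTests();
        System.out.println("all checks passed");
    }
}
```

Questions such as "what happens above 100%?" come up while writing the checks, which is exactly the kind of early feedback you can pass on to the product team.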

Performance/load testing

Basic workflow

  1. Create use case scripts for your application via recording. For Flex, we want a tool that can convert the binary AMF message to readable text as part of our script.
  2. Introduce natural variability - varying the login credentials, records opened, records saved, and other user actions including thinking time. This step is important because it provides the real-world variability that keeps your application working hard and prevents you from missing performance problems because of caching at various layers on your server.
  3. Define test data sets to be used by your scripts as you ramp up hundreds or thousands of simulated users executing your tests.
  4. Define profiles of load on your application. You might decide to have 20% of your users entering information slowly, 20% entering information very quickly, 40% looking up information, 10% producing reports, and 10% making account changes. The breakdowns and level of detail are entirely up to you.
  5. Specify how to run the test. You control such things as the speed of ramping up user load, how fast each user executes the test, and whether there should be natural variability between users and sessions. Each tool provides different fine-grained control over this step.
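
The load profile from step 4 can be sketched as a weighted choice. The code below is an assumption-laden illustration, not how any particular load tool implements it: the class LoadProfile and the scenario names are made up, and only the 20/20/40/10/10 split comes from the text. Each simulated user draws a number and is mapped to a scenario by walking the cumulative weights.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Weighted choice of a scenario per simulated user (hypothetical names). */
class LoadProfile {
    private final Map<String, Integer> weights = new LinkedHashMap<>();
    private int total = 0;

    LoadProfile add(String scenario, int weight) {
        Integer previous = weights.put(scenario, weight);
        total += weight - (previous == null ? 0 : previous);
        return this;
    }

    /** Maps a roll in [0, total) to a scenario by walking the cumulative weights. */
    String scenarioFor(int roll) {
        if (roll < 0 || roll >= total) {
            throw new IllegalArgumentException("roll must be in [0, " + total + ")");
        }
        int upperBound = 0;
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            upperBound += entry.getValue();
            if (roll < upperBound) {
                return entry.getKey();
            }
        }
        throw new IllegalStateException("weights do not cover the roll");
    }

    /** Picks the scenario for the next simulated user. */
    String next(Random random) {
        return scenarioFor(random.nextInt(total));
    }

    /** The split described in the text: 20% slow, 20% fast, 40% lookup, 10% reports, 10% account changes. */
    static LoadProfile example() {
        return new LoadProfile()
                .add("slow-entry", 20)
                .add("fast-entry", 20)
                .add("lookup", 40)
                .add("report", 10)
                .add("account-change", 10);
    }

    public static void main(String[] args) {
        LoadProfile profile = example();
        Random random = new Random(42);
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int user = 0; user < 10_000; user++) {
            counts.merge(profile.next(random), 1, Integer::sum);
        }
        System.out.println(counts);
    }
}
```

With enough simulated users, the observed counts should land close to the configured percentages.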

Tips

  • Don’t carry out performance tests before all the functional tests pass.
  • Test against a single box first, and don’t run JMeter on the same machine as the application under test.
  • Make sure that the testing is affected as little as possible by network traffic (run it within the same subnet to take out the network IO factor).
  • Create a test plan that contains one or more thread groups, which control the threads JMeter creates to simulate simultaneous users. Determine the number of threads and the ramp-up period. Add an HTTP Request, then a listener to view the results in table and/or graph format. Each thread issues the HTTP Request and the response time is recorded; the average response time and standard deviation are then calculated. Another way people use JMeter is to use its HTTP session recording to produce load tests.
  • On the other hand, you need to monitor your target’s CPU, memory and IO consumption (use OpenNMS) during the performance testing.
  • Create a scheduled integration build that runs JMeter tests (use CruiseControl). Read this article for integration details.
  • How to achieve the load via distribution?
  • How to obtain meaningful information from the raw data?
  • Look at throughput (QPS), resources consumption (CPU, Memory, IO) over time and over users/ load.
  • Set up the database with a good amount of data before testing.
  • Performance profiling
    • Obtain the rough data from performance test first
    • Identify the bottleneck and fix the problem.
    • Don’t waste too much time over-tuning your system. Just fix the bottlenecks that impact the SLA.
  • Stress testing
  • Scalability testing – testing on scaling out or up your system.
  • Availability testing
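
To show what turning raw data into meaningful numbers can look like, here is a minimal sketch that computes the average response time, the standard deviation and the throughput mentioned in the tips above. It is not JMeter code: the class ResponseStats and the sample values are hypothetical, and it uses the population standard deviation, which may differ from what a given tool reports.

```java
/** Summary statistics over recorded response times (a sketch, not JMeter code). */
class ResponseStats {
    /** Average response time in milliseconds. */
    static double mean(long[] millis) {
        requireSamples(millis);
        double sum = 0;
        for (long value : millis) {
            sum += value;
        }
        return sum / millis.length;
    }

    /** Population standard deviation of the response times in milliseconds. */
    static double standardDeviation(long[] millis) {
        double mean = mean(millis);
        double squaredDiffs = 0;
        for (long value : millis) {
            squaredDiffs += (value - mean) * (value - mean);
        }
        return Math.sqrt(squaredDiffs / millis.length);
    }

    /** Throughput (QPS): completed requests per second of wall-clock time. */
    static double throughputPerSecond(int completedRequests, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsed time must be positive");
        }
        return completedRequests * 1000.0 / elapsedMillis;
    }

    private static void requireSamples(long[] millis) {
        if (millis == null || millis.length == 0) {
            throw new IllegalArgumentException("at least one sample is required");
        }
    }

    public static void main(String[] args) {
        long[] samples = {120, 135, 110, 480, 125}; // hypothetical response times in ms
        System.out.printf("avg=%.1f ms, stddev=%.1f ms, throughput=%.1f req/s%n",
                mean(samples), standardDeviation(samples),
                throughputPerSecond(samples.length, 1_000));
    }
}
```

A large standard deviation relative to the average, as the single 480 ms sample produces here, is a hint to look at individual slow requests rather than at the average alone.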

Reference

Below are the references I read to write this article

  1. http://www.opensourcetesting.org
  2. JMeter User Manual

 

Concurrent Programming – Part 1 Synchronization

Get yourself familiar with concurrent programming

When I interview candidates, I like to ask questions related to multi-threading. I have found that it is a good topic for telling a hardcore programmer apart from an application-oriented programmer. I am not saying I am looking for someone who could write a concurrency library as efficient as the one created by Doug Lea. I am simply looking for candidates who have a solid understanding of this topic. However, I have found that most candidates have little knowledge in this area apart from the meaning of the “synchronized” keyword in Java. Therefore I decided to write a series of articles to cover some areas of multi-threading that I feel are important to understand. Of course, I will start with the basics.

» Read more…

Our future – Cloud Computing

Cloud computing is a way to leverage resources in the cloud (the Internet) instead of your local box. You may have heard the terms SaaS (software as a service) and HaaS (hardware as a service). Combined, they are moving us to a new generation of computing called utility computing. You can imagine software and computing power being leased like electricity, with our laptop more like a device that hooks up to it. That is great news for entrepreneurs like us. Now we can leverage the cloud to reduce the up-front setup cost of supporting millions of users. It is not just an idea. Amazon has released EC2 (HaaS), Google has released Google App Engine (closer to PaaS, platform as a service) and Salesforce has released an online customizable CRM (SaaS). We can expect more to come in this area. Here is a good video that gives you a better explanation.

 

 

SaaS is similar to the ASP model. Since Web 2.0, developers can build applications that give you a user experience similar to that of your desktop applications. You can now find applications like Excel and PowerPoint on the Net. Check out Google Docs and Sliderocket.com and you will find out how powerful these tools are. The beauty of these tools is that you don’t need to buy or download applications to your laptop, and you can access them anywhere, anytime (assuming you have Internet access). On top of that, they promote collaborative work that is only achievable through the Internet. I am really excited to see more applications come out to replace desktop applications, so that I can use them from my iPhone anytime I want.
