Archive | Data Intelligence RSS feed for this section

Postgresql – Power of Array Type

Create 2 tables
Item(id) and Item_log(item_id, price)

Populate it
insert into item(id) values(1);
insert into item(id) values(2);
insert into item(id) values(3);
insert into item(id) values(4);

insert into item_log(item_id, price) values(1, 100);
insert into item_log(item_id, price) values(1, 100);
insert into item_log(item_id, price) values(1, 100);
insert into item_log(item_id, price) values(1, 200);
insert into item_log(item_id, price) values(1, 200);
insert into item_log(item_id, price) values(1, 200);
insert into item_log(item_id, price) values(1, 200);
insert into item_log(item_id, price) values(1, 200);
insert into item_log(item_id, price) values(1, 200);
insert into item_log(item_id, price) values(2, 200);
insert into item_log(item_id, price) values(2, 200);

Run SQL
SELECT COUNT(il.price), i.id AS item_id, il.price
FROM item i, item_log il
WHERE i.id = il.item_id GROUP BY il.price, i.id;

Result
count | item_id | price
——-+———+——-
3 | 1 | 100
6 | 1 | 200
2 | 2 | 200

Run SQL
SELECT COUNT(il.price), i.id AS item_id, il.price, array_accum(il.id) AS item_id_array
FROM item i, item_log il
WHERE i.id = il.item_id
GROUP BY il.price, i.id;

Result
count | item_id | price | item_id_array
——-+———+——-+—————
3 | 1 | 100 | {1,2,3}
6 | 1 | 200 | {4,5,6,7,8,9}
2 | 2 | 200 | {10,11}

Leave a comment Continue Reading →

Flex – Store data in client via SharedObject

Introduction 

The SharedObject class functions like a browser cookie. You use the class to store data on the user’s local hard drive and call that data during the same session or in a later session. Below are the keys of using SharedObject:

  1. Applications can only access their own SharedObject data, and only if they are running on the same domain. The data is not sent to the server and is not accessible by other Flex applications running on other domains.
  2. When you create a SharedObject, Macromedia Flash Player creates a new directory for the application and domain, and creates an empty *.sol file that stores the SharedObject data. The default location of this file is in a subdirectory of the user’s home directory.
  3. By default, Flash can save locally persistent SharedObjects of up to 100K in size. When you try to save a larger set of data, Flash Player displays the Local Storage dialog box, which lets the user allow or deny local storage for the domain that is requesting access.
  4. Some important methods:
    • clear()
    • flush() – write to the file. If not, flex persists it when the application is closed. However, this does not provide the user with an opportunity to increase the available space
    • getLocal(“name”)
    • getSize()
  5. You can store simple data types in a SharedObject. These types are Number, String, Boolean, XML, Date, Array, and Object. After you assign values to the data property, you must instruct Flash Player to write those values to the SharedObject’s file. To force Flash Player to write the values to the SharedObject’s file, use the flush() method:

[code]]czo1MzpcInNoYXJlZE9iamVjdF9JRC5kYXRhLnZhcmlhYmxlID0gdmFsdWU7IC8vc3RvcmUgZGF0YQ0KXCI7e1smKiZdfQ==[[/code]

LSO vs Cookie

  1. LSOs are can store more data than cookies
  2. LSOs never expire
  3. LSOs arent transmitted between the client and the server
  4. LSOs can store native ActionScript datatypes. 

Reference

http://www.adobe.com/support/documentation/en/flex/1/lsos/lsos5.html

Leave a comment Continue Reading →

RichTube – Flex-based Youtube!

This is great application to demonstrate the power of Flex

http://www.richapps.de/flexsources/richtube/richtube.html

Leave a comment Continue Reading →

How to unit test your Flex?

Use FlexUnit to do the job

  • Getting Started with FlexUnit – Darron Schall
  • Using FlexUnit with Ant  – Peter Martin
  • Advanced ActionScript 3 with Design Patterns Book  – Chapter 1
  • Unit testing and TDD for Flex and ActionScript 3.0 - Neil Webb
  • FlexUnit reference doc
  • Leave a comment Continue Reading →

    How to Export Data out from Flex Datagrid?

    There are 2 ways that you can do to export data out from the datagrid.

    1. You can format the data and write it to TextArea where users can use the mouse to highlight and use Ctrl+C to copy it.
    2. You can send the formatted data back to server and the server-side script like servlet will return the data with the right MIME-TYPE. For this to work, you need to use FileReference API. (detailed)
    Leave a comment Continue Reading →

    Why Flex for RIA, no AJAX?

    Here is the list of reasons why I chose Flex for the RIA development.

    1. Write Once Deploy Everywhere – Flex generates SWF that runs on top of Flash Player VM and behaves consistently across different browsers, even mobile phones later. With this, all the browser compatibility issues are basically offloaded by Adobe.
    2. Solid programming model with rich widgets and libraries.
    3. AMF makes Flex object to Java POJO communication possible. No need to use verbose XML – Check out BlazeDS.
    4. Flex IDE is a plugin in Eclipse that gives stepwise debugging, UI design console, code completion and more. Working with Actionscript is like Java.
    5. Flex SDK is open source and free.
    6. Great support on video streaming
    7. Integrate with HTML, Javascript and CSS, so it is not invasive adoption.
    8. Support offline application via AIR – Adobe has been working on the Adobe Integrated Runtime (AIR) that allows for using existing web application development skills to build and deploy desktop applications. AIR is still in early development, but promises to allow developers to use their newly learned Flex skills to build desktop applications. No need to learn Swing, Applet…etc.
    9. Provide several RPC methods like HTTPService, WebService, AMF and JSON. AMF is 10x faster than SOAP. James Ward developed his Census Flex application to provide performance benchmarks for the different RPC methods in the mainstream RIA technologies. (Download)
    10. You can keep the state in the Flex app and have your server completely stateless.
    11. More to come! :)

     

    Leave a comment Continue Reading →

    Flex – Performance Guidelines

    Flex Performance Guidelines

    1. Avoid nested container - it is resource intensive for Flex to figure out the x,y of the elements via sizing and measuring algorithm. Tips: hard-code x,y and width and height.
    2. Naviagtor containers have built-in deferred instantiation – eg. Accrdion, TabNavigator, ViewStack.
    3. Progressive layout improves initialization experience using queued creationPolicy in the deferred instantiation architecture. UI element are loaded in the successively fashion. Progressive layout does not quantifiably reduce application startup time, but it significantly improves the perceived startup time.
    4. Handle large data sets may make your Flex application appears slow. You can use pagination or other methods to address this. (reference).
    5. Improve charting performance
      • All charts cache intermediary values in the transformation from data to screen, so that only the minimum amount of recalculation occurs in response to any change to the data or chart.
      • The most expensive actions to perform in Flex charts is forcing a chart to redraw an axis, or forcing a chart to recalculate its labels. So, change dataProvider is costly in this sense.
    6. Use Runtime Shared Library (RSLs) to shrink the size of your application’s resulting SWF file. Because once you externalizing shared assets into standalone files, you can separately download and cache on the client. (reference)
    7. Use Profiler: getTimer() to record the time in millisecond in ActionScript.

    Reference

    Good Video: https://admin.adobe.acrobat.com/_a300965365/p71169528/

    Leave a comment Continue Reading →

    Data warehouse 101

    To build data warehouse, you will use the techniques of dimensional modeling. Here are the guidelines you can follow:

    1. Divide the world into measurements and context.
    2. Numeric measurements place in Fact table whereas context are broken down into Dimensions. A fact table in a pure star schema consists of multiple foreign keys, each paired with a primary key in a dimension, together with the facts containing the measurements.
    3. Build the FK-PK pairs as surrogate keys that are just sequentially assigned integers.
    4. Use a special record in Dimension to represent unknown or no because we want to avoid putting null as FK.
    5. Resist snowflake the dimensional tables and leave them in flat second normal form because the flat tables are much more efficient to query. Snowflaking a dimension into third normal form, while not incorrect, destroys the ability to use bitmap indexes and increases the user-perceived complexity of the design.
    6. Semi-additive fact – Most fact tables are huge, with millions or even billions of rows, you almost never fetch a single record into your answer set. Rather, you fetch a very large number of records, which you compress into digestible form by adding, counting, averaging, or taking the min or max. Bank balance and inventory levels represent intensities that are awkward to express in an additive format. You sum over balance for 1 month is not really meaningful. Normally, we still treat these semiadditive facts as if they were additive but just before presenting the results to the end user, divide the answer y the number of time periods to get the right result. This technique is called averging over time.
    7. Slowly changing dimension -
    8. Hierarchical Dimension – There are 2 types of hierarchies. One is “Parent-Child” relationship and the other one is “Array of Level”. Array of level like Country -> State -> City -> Store. Parent-Child like product categories that can be nested in different ways.
    Leave a comment Continue Reading →

    Flex – Cairngorm Microarchitecture

    What is Cairngorm?

    Adobe Consulting Group has defined an architectural framework for Flex application, named "Cairngorm". The framework has borrowed quite a bit of design patterns from GOF and J2EE. Its goal is to help us to layout the groundwork for complicated RIA. It is interesting to see how those patterns work together in a seamless way. The framework not only cleans up our code, but it also decouples our components. The result is very neat and elegant!

    flexarch.JPG

    Lets go through some interesting points for this microarchitecture. I assume you already read this excellent 6 parts articles from Steven Webster.

    Encapsulate your Flex component and make it no dependency to the Cairngorm framework to promote high reusability

    • To do a better encapsulation – input (VO) and output (event) are non-Cairngorm object.
    • Component internally use VO to bind to its controls and external link VO from ModelLocator to it. So, whenever the VO in ModelLocator be changed by Command, the components’ controls that link to it will be updated.
    • User gesture is first captured by the component via event listener and then re-dispatched as custom event and bubbled up. Have the View that uses the component to capture the custom event and re-dispatch again as one type of CairngormEvent with VO loaded. In Cairngorm 2.2, you simple can call the "dispatch()" method from the CairngormEvent. Then, the FrontController can take care the rest. The CairngormEvent dispatched should be non-bubbled and cancellable.

    View is the place for layout the controls and interact with Cairgnorm objects (normally it is non-reusable)

    • View can set up control with VO from ModelLocator and capture custom event from control and re-dispatch it as a type of CairngormEvent.
    • View contains state definition that is explicitly binding to the ModelLocator where it contains what state the View should be. The code below indicates that when the search_state is changed, the searchState() method will be invoked and alter the state of the View.

    [code]]czoxMTc6XCImbHQ7bXg6QmluZGluZyBzb3VyY2U9JnF1b3Q7e01vZGVsTG9jYXRvci5nZXRJbnN0YW5jZSgpLnNlYXJjaF9zdGF0ZX17WyYqJl19JnF1b3Q7IGRlc3RpbmF0aW9uPSZxdW90O3NlYXJjaFN0YXRlJnF1b3Q7IC8mZ3Q7XCI7e1smKiZdfQ==[[/code]

    Command should be the one that changes the model

    • FrontController is a registry that associates Event with Command. This mapping can be M:1.
    • Consider ModelLocator is a mediator for Views to interact one another via changing the model.
    • Delegate may intercept the response via registered its result and fault handler. The goal is to hide the detail from remote call. Therefore, it is good practice to convert XML to VO (via Factory) before invoking Command’s result and fault handler.

    [code]]czoxOTU6XCI8Zm9udCBzaXplPVwiMlwiPnZhciB0b2tlbjpBc3luY1Rva2VuID0gc2VydmljZS5zZW5kKHBhcmFtcyk7IHZhciByZXNwb3tbJiomXX1uZGVyOm14LnJwYy5SZXNwb25kZXIgPSBuZXcgbXgucnBjLlJlc3BvbmRlcihzZWFyY2hCb29rc19vblJlc3VsdCwgc2VhcmNoQm9ve1smKiZdfWtzX29uRmF1bHQpOyB0b2tlbi5hZGRSZXNwb25kZXIocmVzcG9uZGVyKTs8L2ZvbnQ+XCI7e1smKiZdfQ==[[/code]

    How to unit test Cairgnorm-enabled project

    • VO, Service, FrontController, Event and ModelLocator are simple classes that are not subject to test.
    • Command can be tested with Mock Delegate
    • Model can be tested if it contains logic
    • Factory can be tested if it contains parsing logic.
    • Delegate can be tested with Mock Service (but a bit tricky as how to write a mock service)
    • Control can be tested via addListener for the custom event thrown.

    References

    Update

    Recently, I have heard that PureMVC provides a clean framework than cairngorm as cairgnorm uses a lot of singleton framework class. I haven’t got a chance to look into it. Be sure will keep you update if I find it interesting.

     

    Leave a comment Continue Reading →

    Pick the right database for data warehouse

    For those who don’t want to go for licensing path. Open source is definitely a better solution. However, whether open source DBMS can be used to build your data warehouse? I am not a good person to answer this question. But I have seen more and more small and medium size companies launched their business intelligent platform powered by open source DBMS like PostgreSQL and MySQL. Before MySQL v5 released, I don’t recommend to use MySQL for data warehouse because it missed some of key features that others provide like trigger, stored procedure, partitioning. But now, I suggest to revisit it, especially I have heard MySQL become a golden partner with the great open source business intelligent platform “Pentaho”. Below is a rough comparison chart for DBMS I got from devx.com. Take a look at it first.

    dbcomparison.JPG

    There are debates about whether we should choose PostgreSQL vs MySQL. Here is one case study that shows PostgreSQL is better in OLTP system. However, for select query, another study shows MySQL v5 is 2X faster the PostgreSQL v8. For data warehouse application, MySQL sounds like a better option as it is mostly read-only.

    Here is the summary that I obtained from this article that compares PostgreSQL with MySQL:

    1. MySQL uses traditional row-level locking. PostgreSQL uses something called Multi Version Concurrency Control (MVCC) by default. MVCC is a little different from row-level locking in that transactions on the database are performed on a snapshot of the data and then serialized.
    2. MySQL supports the advanced feature of data partitioning within a database whereas PostgreSQL does not.
    3. PostgreSQL has many of the database features that Oracle, DB2, or MS-SQL has, including triggers, views, inheritance, sequences, stored procedures, cursors, and user-defined data types. MySQL’s development version, version 5.0, supports views, stored procedures, and cursors. MySQL’s future version, version 5.1, will support triggers.
    4. PostgreSQL supports user-defined data types, while MySQL does not.
    5. Both MySQL and PostgreSQL have support for single-master, multi-slave replication scenarios. PostgreSQL offers additional support for multi-master, multi-slave replication from a third-party vendor, as well as additional replication methods.
    6. MySQL uses a threaded model for server processes, wherein all of the users connect to a single database daemon for access. PostgreSQL uses a non-threaded model where every new connection to the database gets a new database process.
    7. MySQL does not support for bitmap indexes. Bitmap indexes are ideal for the kind of low-cardinality data that is commonly used in data warehouses. PostgreSQL supports bitmap indexes as of version v8.1, as do a number of commercial database systems.
    Leave a comment Continue Reading →