Archive | October, 2007

Architectural Review – Scalability and Performance

How to start our Architectural Review  

We can review our architecture in 3 different aspects to evaluate its scalability and performance.

  1. Deployment and infrastructure
  2. Technology Stacks
  3. Performance and Scalability Frame

Deployment and Infrastructure

  • Do you need a distributed architecture? If not, you better co-locate web and app server to eliminate the overhead via remote communications (ie. serialization cost and network latency). If you really need to separate them, use facade pattern to provide a coarse-grained interface and cache data locally where appropriate.
  • What distributed communication should you use? In Java, you have choice like SOAP, XML, REST, Hessian, RMI, EJB…
  • Do you have frequent interaction across boundaries? Reduce coupling and increase cohesion via encapsulation and component design.
  • What restrictions does your infrastructure impose? Like internal and external firewall, SSL and etc.
  • Do you consider network bandwidth restrictions? Evaluate the size of the average request and response and multiple it by the expected concurrent load of users. The total figure should be considerably lower than the available network bandwidth. There are various ways to minimize the client bandwidth like use data paging, enable HTTP 1.1 compression (eg. gzip).
  • Do you share resources with other applications? CPU, Memory and IO are 3 major resources for an application.
  • Does your design support scaling up and out? Scale out first then up. Avoid server affinity by designing appropriate state and caching mechanisms.

Performance and Scalability Frame

  1. Communication
    • Chatty remote call? How you do remoting?
    • Secure communication?  Pick the right key size and encryption algorithm. If you use encryption, where possible (when both parties are known in advance), use symmetric encryption instead of asymmetric encryption. Asymmetric encryption provides improved security but has a greater negative impact on performance. A common approach is to use asymmetric only to exchange a secret key and then to use symmetric encryption.
    • Asynchronous messaging? queue vs topic. Do you make long running call? If yes, you should consider doing it asynchronously via messaging. If the client needs a way to get the result back and pick up for continue processing, register the callback for the return.
  2. Concurrency
    • How well your design minimizes contention and maximizes concurrency?
    • Issues: Blocking calls, hold on the lock longer than necessary, inappropriate isolation level, misusing threads, nongranular lock
    • Do you need to execute tasks concurrently? Concurrent execution tends to be most suitable for tasks that are independent of each other. You do not benefit from asynchronous implementation if the work is CPU bound (especially for single processor servers) instead of I/O-bound. If the work is CPU bound, an asynchronous implementation results in increased utilization and thread switching on an already busy processor. This is likely to hurt performance and throughput. Consider using asynchronous invocation when the client can execute parallel tasks that are I/O-bound as part of the unit of work. For example, you can use an asynchronous call to a Web service to free up the executing thread to do some parallel work before blocking on the Web service call and waiting for the results.
    • Do you create threads on a per-request basis? Use thread pool is better option. Threads are shared resources and are expensive to initialize and manage. If you create threads on a per-request basis in a server-side application, this affects scalability by increasing the likelihood of thread starvation and affects performance, due to the increased overhead of thread creation, processor context switching, and garbage collection..
    • Do you design thread safe types by default? If it is unnecessary, use non-thread safe type (eg. use HashMap instead of Hashtable).
    • Do you use fine-grained locks?
    • Do you acquire late and release early? Acquiring late and releasing shared resources early is the key to reducing contention. For example, you may use synchronized block instead of synchronized method to reduce the contention.
    • Do you use the appropriate synchronization primitive? Read/Write lock, Mutex, Semaphore, atomic operation like increment on integer rather than double.
    • Do you use an appropriate transaction isolation level?
    • Does your design consider asynchronous execution?
  3. Caching
    • Do you cache data?
    • Do you know which data to cache?
    • Do you cache volatile data?
    • Have you chosen the right cache location? Different cache layers.
    • What is your expiration policy?
  4. State Management
    • Keep your server as stateless as possible.
    • If you cannot, pay attention what to put in the session in term of size and amount.
    • Expire the session once the user logout or you set the session expiration time for idle session to timeout.
    • The objects you put in the session need to be serializable.
  5. Data structure and algorithm
    • Use the right data structure and algorithm can greatly improve the performance.
Leave a comment Continue Reading →

Yahoo Talk – Rules to scale your Frontend

From the Yahoo research on user experience, they found out that 90% of user waiting time is coming from the frontend. So, we should start paying attention on this to get a bigger impact on our changes in a short time.

Here are the 14 rules they formulated:

  1. Make fewer HTTP requests
    • Concatenate scripts and stylesheets to reduce HTTP traffics
    • CSS sprites (combine images to 1 file and use background position attribute for particular image).
  2. Use a CDN
    • Akamai (distribute your static content before distributing your dynamic content).
  3. Add an Expires header
    • Maximize the use of Browser cache (not just for images but stylesheet, scripts as well).
  4. Gzip components
    • Gzip will cut 90% of size for response to send back (fewer TCP packets). This will affect users’ download times. Configure apache to do it.
  5. Put stylesheets at the top
    • IE waits for all external stylesheets to load up before rendering.
    • Improve user perception to put it at the top.
    • use LINK tag but not @import
  6. Move scripts to the bottom
    • scripts block parallel downloads across all hostnames. In HTTP 1.1, you can download 2 components in parallel per hostname. So, JS right above your component to minimize the block.
    • scripts block rendering of everything below them in the page
    • scripts defer attribute is not a solution. Block rendering and downloads in FF but slightly in IE. So, rely on it.
  7. Avoid CSS expressions
    • Use one-time expression
  8. Make JS and CSS external
    • Could be cached!
  9. Reduce DNS lookups
    • typically takes 20-120ms.
  10. Minify JS
    • Take away white space, tab… and comments on JS to reduce its size.
    • Obfuscation is risky and not save you much unless you want to protect your JS.
  11. Avoid redirects
  12. Remove duplicate scripts
  13. Configure ETags
  14. Make AJAX cacheable
    • You can make expire time far in the future and put last modified time in the url.

Tool

http://developer.yahoo.com/yslow/ (Firebug)

Leave a comment Continue Reading →

How YouTube tackles its scalability issues?

This video is very informative that shows you how to scale a site from infancy to 100 millions view per day. It is amazing that the team behind this is very small.

Interesting areas:

  1. Database scaling
    • Initially YouTube had single master multiple slaves replica database architecture. Replicating your database has many advantages. For example, backup and recovery become easier. Rather than shutting down your master server to back it up, you can simply back up a slave. More importantly, replication makes it easier to scale large applications. By sending all write queries (INSERT, UPDATE, DELETE) to the master and using slaves for most of the read queries (SELECT), it’s possible to achieve nearly linear scalability for read-intensive applications. Because YouTube used MySQL, its replication is done in asynchronous log-based fashion. Under the hood, master puts the write queries (DML) to the binary log and slaves’ IO threads are responsible for connecting to the master, retrieving copies of the events recorded in the binary log and write to their own relay log. Then SQL thread will pick up and replay the queries (better than replicating delta if cheap DMLs with large data change). This dual-threaded design helps to ensure that the slave always has the most recent data, even if it hasn’t had a chance to process it.
    • Master-N-Slaves is very good for read intensive application. However, YouTube is also write intensive. If your application is write intensive this configuration will be saturated much faster because it has to handle much more write load. Especially keeping into account MySQL replication is single thread it might be not long before it will be unable to keep up. Just like everyone else that goes down this path, it eventually reach a point where replicating the writes to all the DBs, uses up all the capacity of the slaves. Apart from that, asynchronous nature of replication has its own compromise b/c slaves can contain stale data. It is true delay is often insignificant but in times of heavy load or in case you was running some heavy queries on the master which not take time to replicate to the slave replication lag can be significant.
    • Db partitioning from monolithic DB to multiple disjoint shards. This way, you can spread both read and write. Plus you will have better use of cache. 
    • In YouTube, they do user partition and have the web server to determine which shard to hit based on what user content is requested. You can use an algorithm to figure out what shard to go at runtime with user_id as input. However, YouTube decides to create a table for user and shard mapping because it will give the flexibility to move user data to different shards if necessary. For example, power account may have huge amount of video requests. Relying on hashing the userId to find the shard will end up having unbalanced resource allocation (ie. overloading a shard while other shards are underused).
    • In this video, YouTube also found out there is swapping problem in Linux 2.4. Look like the kernel is aggressively swap out the pages even MySQL hasn’t not used up the free memory. It finally causes the database to halt as MySQL and kernel finally end up swapping in and out the page. YouTube simply removes the swap from kernel to get around this problem. According to Jermey Zawodny, he has noticed that the kernel still attempts to do swap check even you disable the swap (detail here). In term of virtual memory management, FreeBSD has done a much better job. However, Linux is better in managing threads.
  2. Improve performance
    • Caching, CDN 
  3. Google BigTable
  4. RAID
  5. MapReduce

Reference

  1. http://kylecordes.com/2007/07/12/youtube-scalability/
  2. http://freescienceonline.blogspot.com/2007/08/scalability-and-scalable-architecture.html (more videos)
  3. http://www.mysqlperformanceblog.com/category/replication/ (great blog)
  4. Google Big Table Paper
  5. Great MySQL HA Presentation
  6. Scalability Resource Links
Leave a comment Continue Reading →

RichTube – Flex-based Youtube!

This is great application to demonstrate the power of Flex

http://www.richapps.de/flexsources/richtube/richtube.html

Leave a comment Continue Reading →

Tomcat 5.5 – Quick Notes

Configure Tomcat

  1. Change port to 80. 
    • Edit install_dir/conf/server.xml and change the port attribute of the Connector element from 8080 to 80.
  2. Turn on servlet reloading.
    • Edit install_dir/conf/context.xml and change [code]]czoxNTpcIiZsdDtDb250ZXh0Jmd0O1wiO3tbJiomXX0=[[/code] to [code]]czozNjpcIiZsdDtDb250ZXh0w4LCoHJlbG9hZGFibGU9XCJ0cnVlXCImZ3Q7XCI7e1smKiZdfQ==[[/code].
  3. Change the default AJP/1.3 connector port of Tomcat.
    • Edit install_dir/conf/server.xml and change the value of the port attribute in the AJP/1.3 Connector element.
  4. Access manager console.
    • Create a new username/password combination and associate the role name manager with it.
    • Edit install_dir/conf/tomcat-users.xml

Reference

  1. http://pdf.coreservlets.com/ (free online servlet book)
  2. http://www.anders.com/projects/sysadmin/tomcat.html (configure to have apache 2.0 and tomcat 5.5 worked together)
Leave a comment Continue Reading →

Database Indexing – MySQL

Database index is similar to the book index that helps you quickly locate the information you want. Without index, you need to do table scan to find out a set of records that match – O(n). However, to maintain a separate list of indexes’ values and keep them updated as your data change (insert/ update/ delete), you introduce overhead as well. So, we don’t index every column in a table.Multicolumn indexes
Some of the indexes have more than 1 columns (multi-column). Such indexes can improve the query speed if you often query all columns together in the WHERE clause or if a single column doesn’t have sufficient variety (not selective as >25% of the rows have the same value in this column – table scan is better for that). For example, you create multicolumn index for last_name & first_name that gives you fewer rows than just look at last name. If you create 2 indexes instead of multicolumn index, it will not be helpful as MySQL won’t use them at the same time. In fact, MySQL will ever use one index per table per query except for UNION. To choose the index, MySQL will pick the one that gives fewer rows back based on some statistics. If you create a composite (multi-column) index, the order of the columns in the key are very important. Try to order the columns in the key as to enhance selectivity, with the most selective columns to the leftmost of the key.

[code]]czo1MjpcIkFMVEVSIFRBQkxFIHBob25lX2Jvb2sgSU5ERVggKGxhc3RfbmFtZSwgZmlyc3RfbmFtZSlcIjt7WyYqJl19[[/code]

If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer tries to find the most restrictive index by deciding which index finds fewer rows and using that index to fetch the rows. If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

MySQL can’t use a partial index if the columns don’t form a leftmost prefix of the index. Suppose that you have the SELECT statements shown here:

[code]]czoxMzU6XCJTRUxFQ1QgKiBGUk9NIHRibF9uYW1lIFdIRVJFIGNvbDE9dmFsMTsNClNFTEVDVCAqIEZST00gdGJsX25hbWUgV0hFUkV7WyYqJl19IGNvbDI9dmFsMjsNClNFTEVDVCAqIEZST00gdGJsX25hbWUgV0hFUkUgY29sMj12YWwyIEFORCBjb2wzPXZhbDM7XCI7e1smKiZdfQ==[[/code]

If an index exists on (col1, col2, col3), only the first of the preceding queries uses the index. The second and third queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).

Clustered Index
Indexes are normally kept in a completely separate file that contains a list of primary (and possibly secondary) keys and a value that represents the byte offset of the record. These ensure MySQL can find and then quickly skip ot that point within database to locate record. MySQL has to store the indexes this way because the records are stored in essentially random order.

With clustered indexes, primary key and the record itself are “clusterd” together, and the records are all stored in primary-key order. InnoDB uses clustered indexes. In Oracle, clustered indexes are known as “index-organized tables”. When your data is almost always searched on via its primary key, clustered indexes can make lookups incredibly fast. For a standard MyISAM index, there are 2 lookups,  one to the index, and a second to the table itself via the location specified in the index. With clustered indexes, there is single lookup only.

Index Structure

B-Tree Indexes

Balanced tree is a most common type of index. It is the default index structure because of its unique combination of flexibility, size, and overall performance. As indicated in its name, B-tree is a tree and nodes are arranged in sorted order based on the key values. A B-tree is said to be balanced because it will never become lopsided as new nodes are added and removed. The main benefit of this balance is that the worst-case performance of a B-tree is always quite good. It offers O(log n) performance for single-record lookups. Unlike binary tree, B-tree has many keys per node and don’t grow tall or deep as quick as a binary tree. B-tree indexes offer a lot of flexibility when you need to resolve queries esp range-base query. For example:

[code]]czo3MDpcIlNFTEVDVCAqIEZST00gcGhvbmVfYm9vayBXSEVSRSBsYXN0X25hbWUgQkVUV0VFTiBcJ01hcnRlblwnIGFuZCBcJ01hc29uXCc7e1smKiZdfVwiO3tbJiomXX0=[[/code]

The server simply finds the first ‘Marten’ record and the last ‘Mason’ record. It then knows that everything in between are also matches.

A B-tree index can be used for column comparisons in expressions that use the =, >, >=, <, <=, or BETWEEN operators. The index also can be used for LIKE comparisons if the argument to LIKE is a constant string that doesn’t start with a wildcard character. For example, the following SELECT statements use indexes:

[code]]czoxMDg6XCJTRUxFQ1QgKiBGUk9NIHRibF9uYW1lIFdIRVJFIGtleV9jb2wgTElLRSBcJ1BhdHJpY2slXCc7DQpTRUxFQ1QgKiBGUk9NIHtbJiomXX10YmxfbmFtZSBXSEVSRSBrZXlfY29sIExJS0UgXCdQYXQlX2NrJVwnO1wiO3tbJiomXX0=[[/code]

Hash Indexes

Hash indexes resemble to a hashtable rather than a tree. The structure is very flat compared to a tree. Rather than ordering index records based on a comparison of the key value with similar key values, hash indexes are based on the result of running each key through a hash function. The end result is that the hash index provides very fast lookup O(1).

Given a key k, store it in an array T with position h(k) where h is the hash function.

Leave a comment Continue Reading →

MySQL performance tuning – best practices

This video from Google is awesome! It gives a good summary of MySQL performance tuning in 45 minutes.  For those who want to get the quick digest from this video. I have put down some notes below. Apart from this,  I also put down the background information for some topics to make it more completed in my blog. Enjoy!

 

Transcript:

Benchmarking Core Concept

  1. Set a target with baseline as starting point. Change 1 thing at a time to see how it affects the performance. At the end, record the configuration and result for tracking later.
  2. Disable query cache via giving it size=0 to avoid the cache affecting the result. However, it is OS cache can aslo skew the result. To minimize its impact, you can pump up # of runs.

Profiling Concept

  1. Explain plan (Access type? path?)
  2. Use the Slow Query Log (threshold to log, no index log) and mysqldumpslow
  3. Concentrate on the stuff that gives you bigger impact
  4. Use mytop to catch locking and long-running queries

Sources of Problems

  1. Poor indexing choices
  2. Inefficient or bloated schema design (don’t denormalize at the beginning)
  3. Bad coding practices (MySql has some subqueries issue, in general using join is better)
  4. Server variable not tuned properly (time consuming test)
  5. Hardware and/or Network Bottlenecks (IO & CPU bound, Network latency)

Index Guidelines

  1. Poor or missing index is fastest way to kill a system
  2. Covering index (MySql can get all the information from the index record which is slimmer than data record. Use this information to complete whatever the queries are. The slimmer the index, the more you can fit into single index block, the fewer reach you going to do, the faster it is).
  3. Ensure good selectivity on index fields. Cardinality = 1 (unique). Lower selective column can attach to other columns to become multi-column indexes.
  4. On multi-column indexes, pay attention to order of fields within the index definition.
      – clustered index organization
      – PK (product_id, tag_id)
      – Innodb vs myism
  5. Database grows, make sure distribution is good.
  6. Remove redundancy indexes (mysql doesn’t check). Update key value and insert will touch indexes (overhead).

Database index is similar to the book index that helps you quickly locate the information you want. Without index, you need to do table scan to find out a set of records that match – O(n). However, to maintain a separate list of indexes’ values and keep them updated as your data change (insert/ update/ delete), you introduce overhead as well. So, we don’t index every column in a table. Some of the indexes have more than 1 columns (multi-column). Such indexes can improve the query speed if you often query all columns together in the WHERE clause or if a single column doesn’t have sufficient variety (not selective as >25% of the rows have the same value in this column – table scan is better for that). For example, you create multicolumn index for last_name & first_name that gives you fewer rows than just look at last name. If you create 2 indexes instead of multicolumn index, it will not be helpful as MySQL won’t use them at the same time. In fact, MySQL will ever use one index per table per query except for UNION. To choose the index, MySQL will pick the one that gives fewer rows back based on some statistics.

Schema Guidelines

  1. Use the smallest data types needed -> narrower index records -> more records per block -> fewer reads -> faster
  2. Choose a small clusting key since it is appended to each secondary index record
  3. Don’t use surrogate keys when a naturally occuring primary key exists
  4. Consider horizontally splitting many-columned table where some columns are rarely accessed or some columns has large size but others are more frequently accessed. Split and join by 1×1 is more efficient.
  5. Consider vertically splitting

Coding Guideline

  1. Use stored procedure for a big performance boost
  2. Innodb doesn’t optimize select count(*) query b/c multiversioning (Use counter table – increment it when insert)
  3. Isolate indexed fields on one side of equation. Use application to insert a day instead of using current day.
  4. LIKE ‘%.com’ cannot use index (use reverse email address and put % on the right).
  5. Set-based programming vs procedural programming
  6. Don’t try to outhink the optimizator.

Get rid of correlated subquery

[code]]czoxMjQ6XCJzZWxlY3QgcC5uYW1lLA0Kw4LCoCAoc2VsZWN0IE1BWChwcmljZSkgRlJPTSBPcmRlckl0ZW1zIFdIRVJFIHByb2R1Y3R7WyYqJl19X2lkID0gcC5wcm9kdWN0X2lkKSBBUyBtYXhfc29sZF9wcmljZQ0KRlJPTSBQcm9kdWN0cyBwO1wiO3tbJiomXX0=[[/code][code]]czoyOlwidnNcIjt7WyYqJl19[[/code][code]]czoxMzc6XCJzZWxlY3QgcC5uYW1lLCBNQVgob2kucHJpY2UpIEFTIG1heF9zb2xkX3ByaWNlDQpGUk9NIFByb2R1Y3RzIHANCklOTkV7WyYqJl19UiBKT0lOIE9yZGVySXRlbSBvaSBPTiBwLnByb2R1Y3RfaWQgPSBvaS5wcm9kdWN0X2lkDQpHUk9VUCBCWSBwLm5hbWVcIjt7WyYqJl19[[/code]

Derived Table is better than Correlated Subquery

Server Parameters

  1. per global vs per thread (sort buffer size)
  2. Query Cache (off by default): READ intensive (turn it on) but not for write intensive
  3. key_buffer_Size != innodb_buffer_pool_size (50-60% of machine memory should allocate)
  4. Memory is cheapest, fastest, easiest way to better performance
Leave a comment Continue Reading →

What is retargeting?

Connecting businesses with their past website visitors so there is an increased likelihood of a completed transaction. Aware that current industry statistics show that on average, 90% of site visitors leave without taking action, companies are interested to employ retargeting technology to maximize opportunities to reconnect with these previous visitors and convert them into sales. One example of how companies do retargeting is that they can retarget all consumers who visited the site within the last 30 days (retargeting with an associated period).

Some studies suggest that you need seven contacts with a customer to make a sale. Retargeting is an invaluable tool for developing that relationship with prospects.

Case Studies:
During the campaign, DrapeStyle experienced a 38% increase in clicks. Previous site visitors who were retargeted converted into sales an astounding 5.5 times more often than prospects who were only targeted through DrapeStyle’s paid search campaigns. Moreover, DrapeStyle’s CPA (defined as sales or brochure requests) fell approximately $50 to an extremely cost-effective $2.80 when search and retargeting were combined. In summary, during a four-month period, the company invested $80 for retargeting and gained $22,500 in revenue. DrapeStyle’s retargeting campaign highlights the key benefits of adding retargeting as an extension to current online advertising and traffic-driving methods.

Reference

Leave a comment Continue Reading →