Tag Archives: performance

Speed up your website via caching

Introduction

Caching is a crucial performance tuning strategy, especially your system has high read to write ratio. You can perform caching strategy at different levels from client browser cache all the way to disk cache at server side. Lets take a brief look at where we can cache based on the invocation path for a request to be fulfilled:

  1. Client browser cache
  2. CDN network
    • A CDN is a network, like Akamai, where a web site such as JustProposed.com can offload high-bandwidth static files like photos and videos to another network, so that my web site doesn’t need to have such huge bandwidth to run. Since bandwidth is a major expense, especially as we grow or when we get slashdotted (in which case we run out of bandwidth), a CDN has looked interesting. However, Akamai is too expensive for us to use. So, we will go for the free network, Coral CDN.
    • Apart from the bandwidth, JustProposed.com has lots of non-USA users who sometimes find my site slow to use. So, CDN network gives us proximity advantages.
    • To use Coral CDN, you simply append nydu.net:8080 to the end of the hostname in the URL of your expensive resources. For example, http://www.justproposed.com/raydoris/myphoto.jpg to http://www.justproposed.com.nydu.net:8080/raydoris/myphoto.jpg
    • Coral looks great, the only problem I have with it is that it’s running on a high port, so that people behind proxy servers that don’t automatically support http over anything bug port 80 will have problems. To use Coral, follow this instruction.
  3. Reverse proxy server and content accelerator – Squid
    •  Why not use Apache as reverse proxy instead of putting Squid in front of Apache? Here are some of the benefits of this setup. The main reason is that Apache spawns out a new process per request that eats up lots of resources.
    •  

 

There are several things that you need to look at when you go for caching approach:

  1. What to cache? The data used by most web applications varies in its dynamicity, from completely static to always changing at every request. Everything that has some degree of stability can be cached. However, I always pick the ones that are most frequently access and/or expensive to compute and retrieve to cache because of the limited resource (ie. memory).

Application level caching (for J2EE)

JCS – Java Caching System

  1. Configuration
    • To understand the power of JCS, the best way is to look at its configuration file. To find out what is each configurable parameter does, take a look at this article.
  2. Integrate with Spring
    • To use JCS with Spring, take a look at this article. It talks about how to create a wrapper or Interceptor for your DAO and inject it to your service for caching purpose. To implement cache as an aspect with full control of what and how to cache, it doesn’t use the declarative Spring module caching approach. Regular dependency injection can do the trick!
  3. Distributed caching
    • JCS is a front-tier cache that can be configured to maintain consistency across multiple servers by using a centralized remote server (client-server) or by lateral distribution (peer-to-peer) of cache updates. 

Reference

  1. Speed up your LAMP stack with lighhttpd
  2. Squid and Apache on the same server – have squid listened on port 80 and apache listened on port 8080
  3. Squid configuration variable

 

Leave a comment Continue Reading →

Tomcat Performance Tuning

Most companies I have worked for use Tomcat as Servlet Container. It is de facto standard just like how Apache been used as Web Server. However, most of us just drag our war file to the webapp folder and use Tomcat with all the settings as default out of the box. It works fine in development environment but may not in production. This article will give you advice in several areas:

  1. Production Tomcat Architecture
  2. Tuning tomcat for performance
  3. Resolving problems which affect availability

 Production Tomcat Architecture

In production Tomcat relies on a number of resources which can impact its overall performance. Understanding the overall system architecture is key to tuning performance and troubleshooting problems.

  1. Hardware: CPU(s), memory, network IO and file IO
  2. OS: SMP (symmetric multiprocessing) and thread support
  3. JVM: version, tuning memory usage, and tuning GC
  4. Tomcat: version (example, Tomcat 6 supports NIO)
  5. Application: Application design can have the largest impact on overall performance
  6. Database: concurrent db connection is allowed (pooling and object caching)
  7. Web Server: Apache can sit in front of Tomcat and serves the static content. It also can do load balancing across multiple Tomcat instances.
  8. Network: Network delays.
  9. Remote Client: How fast is the communication protocol? Content can be compressed. 

Performance Tuning

How to measure and test performance

  • Request latency is key b/c it reflects the responsiveness of your site for visitors.
  • Test environment should match production as closely as possible.
  • The data volume is important to simulate in database side.
  • Test HTTP requests with different request parameters (test corner cases)
  • Use load test to simulate the traffics (ex. JMeter)
  • Final tests should be over longer periods like days because JVM performance changes over time and can actually improve if using HotSpot. Memory leaks, db temporary unavailable, etc can only be found when running longer tests.

JVM version, memory usage and GC

  • Sun Java 1.3 and later releases inlcude HotSpot profiling optimizer customized for long running server application.
  • Tomcat will freeze processing of all requests while the JVM is performing GC. On a poorly tuned JVM this can last 10′s of seconds. Most GC’s should take < 1 second and never exceed 10 seconds
  • Tune the -Xms (min) and -Xmx (max) java stack memory (set them to the same value can improve GC performance)
  • Make sure the java process always keeps the memory it uses resident in physical memory and not swapped out to virtual memory.
  • Use -Xincgc to enable incremental garbage collection
  • Try reducing -Xss thread stack memory usage

Tomcat version and configuration

  • Tomcat 6 supports NIO.
  • Set “reloadable” false – remove unnecessary detection overhead
  • Set “liveDeploy” to false – liveDeploy controls whether your webapps directory is periodically checked for new war files. This is done using background thread.
  • Set “debug” to 0
  • Set “swallowOutput” to true – This makes sure all output to stdout or stderr for a web application gets directed to the web application log rather than the console or catalina.out. This make it easier to troubleshoot problems.
  • Connector configuration – minProcessor, maxProcessor, acceptCount, enableLookups. Don’t set the acceptCount too high b/c this sets the number of pending requests awaiting processing. It is better deny few requests than overload Tomcat and cause problems for all requests. Set “enableLookups” to false b/c DNS lookups can add significant delays.

Database connection pool

  • We use connection pool provided by Spring instead
  • Using middleware to persist and cache objects from your database can significantly improve performance b/c of fewer db calls, less thrashing of the JVM for creation and subsequent GC of object craeted for resultset.

Application design and profiling

  • If the data used to generate a dynamic page rarely changes, modify it to a static page which you regenerate periodically.
  • Cache dynamic page
  • Use tool like JProble to profle your web applications during development phase
  • Look for possible thread synchronization bottlenecks
  • Date and Time thread synchronization bottleneck 

Troubleshooting

Collecting and analyzing log data

Common problems

  • Broken pipe – For HTTP Connector indicates that the remote client aborted the request. For web server JK Connector indicates that the web server process or thread was terminated. These are normal and rarely due to a problem with Tomcat. However, if you have long request, the connectionTimeout may close the connection before you send your response back.
  • Tomcat freezes or pauses with no request being processed – usually due to a long pause of JVM GC. A long pause can cause a cascading effect and high load once Tomcat starts handling requests again. Don’t set the “acceptCount” too high and use java -verbose:gc startup argument to collect GC data.
  • Out of Memory Exception – look into application code to fix the leak (profile tool can help). Increase available memory on the system via -Xmx. Restart tomcat!
  • Database connection failure – connection used up when traffic is spike.
  • Random connection close exception - when you close your connection twice. First close(), the connection returns to the pool. It may be picked up by another thread. Now, second close() may close a connection that is being used by other thread. Don’t close connection twice, use JDBC Template from Spring to avoid this problem. 

Reference

  1. JavaWorld GC Article
  2. Sun HotSpot Performance Document
  3. Tomcat Performance Slides

  

Leave a comment Continue Reading →

Database – Tuning Tips

Understand Query Processing Basic

Before tuning your slow query, you better understand how your database process your query first. Here are the steps:

  1. Each client connections gets its own thread within the server process (MySQL). The connetion’s queries execute within that single thread, which in turn resides on one core or CPU. The database server caches threads, so they don’t need to be created and destroyed for each new connection (ie. has its own thread pool).
  2. Look up the query cache to see whether the resultset of your query is available. MySQL does this via hashing your query and use the hashed value to check for the results in the cache (ie. Map[ hashed query : resultset] ). MySQL uses the exact query text it receives, so the cache is sensitive to the most trivial variations like case, space and etc. Best practice: Have your application generates the queries. If your query cannot be found in the query cache, go to step 3. Once the table is updated, the query in the cache will be flushed and updated.
  3. Parse the query to create an internal structure (the parse tree).
  4. Analysis the query
  5. Optimize the query – Create the query plan that is believed to be the most efficient way to execute a query. The effort center around indexes and table join order. Use Explain to look into the query plan that database uses for your query.
  6. Execution
  7. Send the resultset back to client

Approaches that can speed up your queries

  1. Index tables to allow the database server to look up rows more quickly.
  2. Write effective SQL that takes advantage of those indexes to the fullest extent, and use the EXPLAIN statement to check whether the RDBMS really is doing so.
  3. Tune the hardware

Concept of Indexing

The article from Paul Dubios gives a great summary of the benefit and cost of indexing and when we should use it. If you don’t want to read up 6 pages of his article (but I really suggest you do), here is my summary.

Benefits/Cost of Indexing

  1. The column you indexed are sorted and you can combine few columns together (order is important) as single index (ie. multi-columns index).
  2. Indexed column is sorted. To locate record with last name equals to “HON”, you will look it up via lastname index and pull consecutive entries until it is not equal to “HON” and you can ignore the rest. Therefore, one efficiency gained by using the index is that we can tell where the matching rows end and can skip the rest. Another efficiency comes about through the use of positioning algorithms for finding the first matching entry without doing a linear scan from the start of the index (for example, a binary search is much quicker than a scan). Normally, B-Tree index is applied by default.
  3. If you search your table in different ways, you can have more than 1 index. Remember at most one index per table is picked by RDBMS each time you query a table. You can get around this limitation via composite index.
  4. Index may be stored separately from data. But MySQL InnoDB stores them together.
  5. Indexes are always sorted, but the data on disk is not. Using an index means accessing the rows in index-sorted order (ie. random seek) rather than in the order they reside on the disk (ie. sequential access). The end result is more time spent moving around the disk and less time reading data. You can draw 2 conclusions form this knowledge:
    • If a table is going to remain very small, you may want to leave off the indexes.
    • Always use the production data for your test

Choose columns to index

  1. Index columns that you use for searching, sorting, or grouping, not columns you only display as output. In other words, the best candidate columns for indexing are the columns that appear in your WHERE clause, columns named in join clauses, or columns that appear in ORDER BY or GROUP BY clauses. If RDBMS can optimize a query using joined columns, it cuts down the potential table-row combinations quite a bit by eliminating full table scans.
  2. Consider column cardinality. The cardinality of a column is the number of distinct values that it contains. Indexes work best for columns that have a high cardinality relative to the number of rows in the table (that is, columns that have many unique values and few duplicates). The query optimizer generally skips an index in favor of a full table scan if it determines that a value occurs in a large percentage of a table’s rows. The conventional wisdom for this percentage used to be “30%.” In that case, your index gives you no benefit but cost to maintain!
  3. Favor the column with smaller data type to index. Faster for comparsion, less disk IO required, more can be cached in memory. For the InnoDB and BDB storage engines that use clustered indexes, it’s especially beneficial to keep the primary key short. A clustered index is one where the data rows are stored together with (that is, clustered with) the primary key values. Other indexes are secondary indexes; these store the primary key value with the secondary index values. A lookup in a secondary index yields a primary key value, which then is used to locate the data row. The implication is that primary key values are duplicated into each secondary index, so if primary key values are longer, the extra storage is required for each secondary index as well.
  4. Take advantage of leftmost prefixes. When you create an n-column composite index, you actually create n indexes that MySQL can use. A composite index serves as several indexes because any leftmost set of columns in the index can be used to match rows. Such a set is called a “leftmost prefix.” Suppose that you have a table with a composite index on columns named state, city, and zip. Rows in the index are sorted in state/city/zip order, so they’re automatically sorted in state/city order and in state order as well. This means that MySQL can take advantage of the index even if you specify only state values in a query, or only state and city values. However, it cannot use the index for searches that don’t involve a leftmost prefix. For example, if you search by city or by zip, the index isn’t used

Concept of Concurrency

To provide better concurrency, below are the rules of thumbs that I used

  1. Locking level: Locking at a finer level allows better concurrency (ie. row level is better than page level, and page level is better than table level), because more clients can be using a table at the same time if they use different parts of it.
  2. Table locking does have an advantage over finer levels of locking in terms of deadlock prevention.
  3. MVCC is new mechanism to achieve better concurrency b/c read doesn’t block write and vice versa..

Schema design and structuring

This section will cover the techniques that you can leverage to design your schema with performance in mind.

  1. Normalize first then denormalize later. Don’t over normalize. Use 3rd Normal Form
  2. Use the right data type. Don’t waste space unnecessarily (general)
    • Use INT unsigned for IPv4 address
    • Pay attention on TEXT, BLOB and BIGINT data types. Consider separate table for TEXT, filesystem for BLOB and other numeric data type for BIGINT.
  3. Partitioning
    • Vertical partitioning – split table with many columns into multiple tables (ex. static vs dynamic columns, frequent vs infrequent access pattern)
    • Horizontal partitioning – split table with many rows into multiple tables (ex. user-based)
    • MySQL 5.1 partitioning has issues (Postgresql and Oracle are more mature in this area).
  4. Understand the benefit and disadvantages of different storage engines (mysql)
  5. Leverage query cache effectively
    • Each time an update on the any record in the table, all queries referencing the table are invalidated in the query cache. So if you split out the dynamic fields into another table, you can make your query cache less frequent in thrashing.
  6. Index smartly

Write effective SQL

  1. Use ANSI SQL style that explicitly declare JOIN conditions using the ON clause. This syntax can handle Outer join.
  2. Use Explain plan to find out how RDBMS chose the execution plan for your SQL. Tune it accordingly.
  3. Don’t use indexed field in function call
  4. In explain plan, “Extra: Using Index” means FULL Index Scan. It happens when scan is better than seek and index table is enough to satisfy the query.
  5. Why full table scan?
    • No WHERE condition
    • No index on any type of field in the WHERE clause
    • Poor selectivity/ cardinality on an indexed field
    • Too many records meet WHERE condition (scan favors seek).
    • < MySQL 5.0 and using OR in WHERE clause

Put it all together

  • Think in set instead of iteration.

Example: For each customer, fetch the last payment record in the Payment History table. You can solve this query via correlated subquery. That is:

select p.* from Payment_History p
where p.payment_date = (select max(payment_date) from Payment_History where customer_id = p.customer_id)

The above query can be optimized a bit if you use compound index (customer_id, payment_date). However, if you want to achieve the ultimate performance, you should not use correlated subquery b/c it will execute n times and n = number of rows in the Payment_History table. To avoid that, you can use derived table. The query will be:

SELECT p.* FROM (select customer_id, max(payment_date) as last_order from Payment_History group by customer_id) as last_orders
INNER JOIN Payment_History p
ON p.customer_id = last_orders.customer_id
AND p.payment_date = last_orders.last_order

Optimize the N:M joins

If you write an application that you can tag a project many times. You may model your database schema with Project table, Tag table and Tag2Project join table. When you want to grab all projects that either tag by “mysql” OR “php”. You may write a select statement that have Project inner join Tag2Project with project_id and resultset inner join Tag table filtered by “mysql” or “php” with tag_id. However, optimizer may not join it as the order you specified. It may have Tag table filtered first then inner join Tag2Project table and pass the resultset to inner join Project table. You can force the sequence of join via using derived table.

On the other hand, if you want to grab all project that has tagged by both mysql AND php. You may end up solve it via using derived table from Tag2Project and Tag that “GROUP BY” project_id and having count(*) = 2 with Tag filtered by IN clause.

Configurable variables

Before you start tuning your server for mysql performance, you need to obtain the system information first. To do that, there are some linux command that may be useful. top command

top_display.JPG How to interpret the output from top command? Here you go.

  1. The first line is telling you that the system is up 211 days 9 hr and 38 minutes. There are 2 users using the system (from process table, they are mysql and ec users). For the last 1, 5 and 15 minutes, the load of the system shows 0.01, 0.11 and 0.21 respectively.
  2. The 2nd line shows you that there are 82 processes running in the system, only 1 of them are active (In the process table, S=R means Running) and 81 of them are in sleeping mode (S = S means sleeping).
  3. The 3rd – 5th line are very useful. It gives you the current status of our system. CPU is 99% idle. If it is not, you want to find out who is using it. The process table is sort by %CPU (if you have dual core processor, the % here can reach 200). In term of memory, the total memory in the system is around 4G in total. As you can see, 3.5 G is in used. For our mysql instance, it is using 1.2G of memory that is about 32% of the used memory. Another interesting thing to examine is the swap memory. There is 2G of swap memory allocated in this system and the system barely uses it. It is good because system takes time to swap in and out the data from memory to disk and vice versa.
  4. Troubleshooting: When you see mysql process is taking high %CPU – CPU-bound, there may be queries costing intensive cpu cycles. You can look into it via mysql administrative commands. When you see high % of IO wait – ie. IO-bound, you may have inefficient queries that cause mysql reading too many rows to locate the data you are interested in. In system level, you could reduce the # of IO via better use of memory/buffer or reduce the seek time via faster disk and less random data storage.

Other linux commands

  1. more /proc/version (check linux os version)
  2. more /proc/cpuinfo (for dual processor with speed 2.80MHz: cpu core = 2, processor = 1, cpu MHz = 2.80)

Hardware Tuning Tips

There are several key areas that we normally look at when we try to tune our database in hardware perspective.

  1. Add more memory if you can – reduce # of IOs (primary bottleneck).
  2. Achieve parallelism - multi-processor for processing and parallel write to different disk devices (eg. log and database store in different disk device). MySQL is multi-threaded.
  3. Faster disk to improve IO latency
  4. RAID can give you some advantages of parallelism

In MySQL, most of the memory mysql allocates is used for various internal buffers and caches. These buffers fall into 2 major groups: global vs per-connection buffers. The 2 most important global buffers are MyISAM key buffer (key_buffer_size) and InnoDB’s buffer pool (innodb_buffer_pool_size).

  • For MyISAM, remember that msyql doesn’t cache row data but the index block. So, the less often mysql needs to hit the disk to scan a table’s index, the faster the queries will be. In practice, OS may somehow help you out via caching some row data in cache. Even this, if possible, consider making the key buffer large enough to hold the indexes for your most actively used tables. By adding up the size of the .MYI files for the tables, you will have a good idea to set the buffer.
  • For InnoDB, both cache index and row data are together in its buffer pool via using the clustered indexes. So if the table size is not large, you will achieve the best performance via storing the whole table in cache because you don’t need to do any IO. However, if you modify the data in the table, you need to sync up your disk either per min or per commit.

Allocate enough memory for mysql

min_memory_needed = global_buffers + (thread_buffers * max_connections)

thread_buffers includes:

  • sort_buffer
  • myisam_sort_buffer
  • read_buffer
  • join_buffer
  • read_rnd_buffer

global_buffers includes:

  • key_buffer
  • innodb_buffer_pool
  • innodb_log_buffer
  • innodb_additional_mem_pool
  • net_buffer

When there are large number of concurrent users like 300-400, a particular nasty series of events may happen. The reason is large number of threads need to allocate additional memory, it can cause mysql to allocate so much memory that the OS begins swapping, which causes performance to degrade further, which means that each query takes longer to complete. With queries running more slowly, the odds of more threads needing memory increases. It is a vicious spiral. The only solution is to restore balance between the system’s memory and mysql’s memory needs. That means doing one of the following:

  1. Add more memory
  2. Decrease max_connections
  3. Decrease some of the per-thread buffer sizes

Be proactive, monitor memory use on your server.

Below are some quick tips for MySQL tuning that I have learnt from my colleagues and books. To keep this article short and useful as quick reference, I will just put the common ones here.

  • key_buffer size. (for MyISAM – default at only 8MB)
  • innodb_buffer_pool_size (for InnoDB – default at only 8MB) – reference

You need buffer pool a bit (say 10%) larger than your data (total size of Innodb TableSpaces) because it does not only contain data pages – it also contain adaptive hash indexes, insert buffer, locks which also take some time – Peter Zaitsev

  • Bumping these values up can decrease disk IO and make your database run a lot smoother.Use indexes. Enable the slow query log and look for queries that show up most often, then learn the explain command on them.
  • Check Created_tmp_disk_tables in show status; This value usually goes in hand with huge queries that aren’t using indexes. If it keeps increasing it means you are having to create on disk files to execute the query. Consider increasing tmp_table_size or putting your disk files on a TMPFS by setting tmpdir = /tmp.
  • Use the query cache if it can help you. Check the query_cache_size. Check how well it is performing. You can do this by doing a “show status” and then using this formula:

qcache_hit_ratio = qcache_hits / (qcache_hits + qcache_inserts + qcache_not_cached)

  • If there are a lot of Qcache_lowmem_prunes, that means your query cache size isn’t big enough.
  • If Opened_tables is increasing you probably have table_cache too small.
  • If you have a lot of connections, check Threads_created in show status; If it keeps increasing, you need to up the thread_cache_size in my.cnf. The max connections in MySQL are default at 200.

Reference

Below are some references I used to write this article

  1. MySQL Toolkit
  2. One large table vs many small tables
  3. The Art of SQL Tuning (Jay Pipes has a great presentation on sql tuning)
  4. MySQL Performance Blog
  5. Peteris’s Blog on SQL Tuning
  6. MySQL Query Optimization

 

Leave a comment Continue Reading →