Archive | Data Store RSS feed for this section

Postgresql – any easier way to create a sample database?

матраци

I wanted to have my new code test against the real data in production. According to the typical company policy, I should not have any of my non-production code run against production data. Period. It makes a lot of sense. So, I decided to build my sample database in test environment and have subset of the production data loaded into it. I just needed one table to play around and I don't want to backup the whole table because the size of the table in production is too big for me. Using the popular postgresql, I expected this kind of request could be fulfilled easily but it eventually took me more than I had thought. Why so complicated? Did I miss anything? Let me show you what I did and please let me know if you have an easier way to achieve this.

Leave a comment Continue Reading →

Hive on Amazon EC2 cloud

adserving-ec2-hive-system-arch

 

I ever worked for a display ad network company that collects over 400 million of impression/ click logs per day. With this amount of data, my ex-company bought a supercomputer and cross their fingers that it can handle the grow in both volume and analytic demand of the data. It is obviously not a scalable solution. However, what is the best solution?

Although I haven’t worked for this company anymore, it is still an interesting problem to solve. I have a great friend who proposed a shared nothing solution for this company. The solution is to partition the data across a set of Postgresql databases and put Greenplum on top of them to parallelize the query —there is no disk-level sharing or contention to be concerned with (i.e. it is a ‘shared-nothing’ architecture). I like this approach. The only thing is that Greenplum is not free and it may be difficult for a startup to face this upfront cost. Apart from that, this setting requires all the databases are running on the same network that hindered us to move this in the elastic cloud like Amazon EC2.

Later on, I joined a great company in the same industry that seeks for a solution in the cloud to host its data warehouse. So, I got a  chance to revisit this problem. During the research, I came across an interesting technology – column-based database (eg. infobright and lucid db). The idea of column-based data store is that traditional database stores and fetches data in row from data files into the memory. It is inefficient if your query only requires few columns for computation. So, column-based data stores your data in column with effective compression algorithm due to all values in it has the same data type. This solution is great but it doesn’t do MPP (ie. massive parallel processing) and it is also not ready for cloud yet.

Here comes another solution. That is Hive on top of Hadoop on top of Amazon cloud. It is an interesting idea. Check out this video to learn about this.


If you are not sure what Hadoop is and want to get some warm up in massive computing. I suggest you go through the following 5 excellent Google lectures.


Leave a comment Continue Reading →

Database concurrency control – MVCC

Concurrency Issue – Lost Update

Lost update is the key concurrency problem that we try to avoid:

From the sequences above, SELECT-UPDATE transaction A will overwrite the update Transaction B made to the balance. .  If the transactions of A and B would serialize properly, the correct balance value after these transactions would be 700. For performance reasons, most RDBMS uses the isolation level “READ COMMITTED” as default. However, this isolation level does not protect any data read by transaction of getting outdated right after reading the value as another transaction can change it. You can use isolation level “REPEATABLE READ” or  “SERIALIZABLE” to protect the values read in the transaction of getting outdated via holding shared read lock on these rows up to the end of the transaction. In case you are new to RDBMS, let me give you some background.

There are 3 things we are trying to avoid in reliable RDBMS:

  1. Dirty read – if transaction A can read uncommitted changes by transaction B, it may be working on the data that may be rolled back if transaction B cannot successfully commit the changes.
  2. Non-repeatable read – Transaction A issues the same select twice during the transaction may get different results as some other transactions may commit its changes in between.
  3. Phantom read – Transaction A issues the same select twice during the transaction may get new rows as some other transactions may insert new records that fits the criteria of the query.

To protect the 3 issues above, RDBMS provides us 4 level of isolations:

  1. READ UNCOMMITTED – lowest level – no protection at all. Dirty, non-repeatable and phantom read are possible.
  2. READ COMMITTED – default setting – cannot protect non-repeatable and phantom read
  3. REPEABLABLE READ – phantom read is still possible
  4. SERIALIZABLE - highest level with largest overhead as it forces all transaction to run in serial fashion (ie. one by one). However, it resolves all the issues above.

Under the hood, RDBMS uses locking scheme to achieve the 4 transaction isolation level above. How? It uses 2 kinds of locks:

  1. Shared read lock – more than one read concurrently but no write is possible
  2. Exclusive write lock – once exclusive write lock is acquired, no one can read and write until it is released.

In other words, read locks block write lock and vice versa. In the above Lost Update problem, transaction A can use shared read lock to make sure no write is possible for the data it read during the course of its transaction. So far so good? :grin:. Lets move on!

Using high level of isolation to solve the lost update issue may give you a new set of issues related to locking:

  1. Long-running transaction – if transaction A takes a long time to commit, other transactions that wants to change the value will be queued up. Normally, if your transaction read info out from db, change it in UI and write it back. You should consider it as long transaction as you never know how long it takes for user to change it in UI (ie. user think time). In this scenario, you will split the transaction into 2. (ie. a transaction to read and a transaction to write back). However, once you split up the transaction, you have no way to guarantee  your data of the previous read up-to-date no matter what isolation level you use. How to solve this problem then? :shock: It can be solved by checking the version or timestamp of the data in database before you write it in the write transaction. If the version is the same as the first read transaction, it means no one has changed it even you leave it wide open, then you are safe to change it. Otherwise, an exception should be thrown. This mechanism is called optimistic locking.
  2. Deadlock – if transaction A needs other resources in order to commit its transaction but these resources are holding up by a transaction that is waiting for A to commit, it will create a typical circular dependency.

Locks are not just for row. It also happens to index. That is to say, for heavy updates to distinct rows of a table, the bottleneck can be index locking.

New Database Currency Control – MVCC

The aim of Multi-Version Concurrency Control (MVCC) is to avoid Writers blocking Readers and vice-versa, by making use of multiple versions of data (ie. No read lock is necessarily). The problem of Writers blocking Readers can be avoided if Readers can obtain access to a previous version of the data that is locked by Writers for modification. That is to say, there may not have concurrent write.  However, while MVCC improves database concurrency, its impact on data consistency is more complex. I will discuss about this later. Now lets look at how to implement MVCC.

Challenges in implementing MVCC

  1. If multiple versions are stored in the database, an efficient garbage collection mechanism is required to get rid of old versions when they are no longer needed.
  2. The DBMS must provide efficient access methods that avoid looking at redundant versions.
  3. The DBMS must avoid expensive lookups when determining the relative commit time of a transaction.

There are essentially two approaches to multi-version concurrency.

  1. The first approach is to store multiple versions of records in the database, and garbage collect records when they are no longer required. This is the approach adopted by PostgreSQL and Firebird/Interbase.
  2. The second approach is to keep only the latest version of data in the database, but reconstruct older versions of data dynamically as required by exploiting information within the Write Ahead Log. This is the approach taken by Oracle and MySQL/InnoDb.

How does Postgresql implement MVCC

In PostgreSQL, when a row is updated, a new version (called a tuple) of the row is created and inserted into the table. The previous version is provided a pointer to the new version. The previous version is marked “expired”, but remains in the database until it is garbage collected. So, to achieve this, each tuple needs 2 additional fields:

  1. creation id/ xmin - The ID of the transaction that inserted/updated the row and created this tuple.
  2. expired id/ xmax - The transaction that deleted the row, or created a new version of this tuple. Initially it is null.

A row is visible if:

  • its creation id is a committed transaction (ensure it doesn’t read uncommitted data from other open transaction) AND its creation id is < current transaction id (ensure non-repeatable read)
  • its expired id is null (ensure it is not stale data) or > current transaction id b/c if it is deleted by the transaction after the current one, the current transaction should ignore it.

To track the status of transactions, a special table called PG_LOG is maintained. Since Transaction Ids are implemented using a monotonically increasing counter, the PG_LOG table can represent transaction status as a bitmap. This table contains two bits of status information for each transaction; the possible states are in-progress, committed, or aborted.

One of the drawback of MVCC in PostgreSQL is that old version tuples are kept in the same table. To garbage collected these, you can “VACCUM” it.

Reference

Below are the references I used for this artice:

  1. Paper – Data Access Pattern
  2. PostgreSQL Ends the Waiting Game
  3. MVCC implementation algorithms in different databases
  4. MarkMail – very good postgresql forum
  5. Postgresql 8.3.6 Manual

  

         

 

Leave a comment Continue Reading →

Concurrent Programming – Part 1 Synchronization

Get yourself familiar with concurrency programming

When I interview my candidates, I like to ask questions related to multi-threading. I found out that it is a good topic to differentiate out a hardcore programmer from application-oriented programmer. I am not saying I am looking for someone who could write the concurrency library as efficient as the one created by Doug Lea. In fact, I am looking for candidates who has solid understanding of this topic. However, I found out that most candidates have little knowledge in this area apart from the meaning of “synchronized” keyword in Java syntax. Therefore I decide to write a series of articles to cover some areas of multi-threading that I feel important to understand. Of course, I would start from the basic first.

Introduction of Synchronization

Synchronization is a way to lock an object, so no 2 threads possibly running on the same code at the same time.

public class SynchronizedCounter {
    private double c = 0;

    public synchronized void increment() {
        c++;
    }

    public synchronized void decrement() {
        c--;
    }

    public synchronized int value() {
        return c;
    }

    public void method2() {
        ...
    }
}

If count is an instance of SynchronizedCounter, then making these methods synchronized has two effects:

  • Mutual Exclusion – It is not possible for two invocations of synchronized methods on the same object to interleave. When one thread is executing a synchronized method for an object, all other threads that invoke synchronized methods for the same object block (suspend execution) until the first thread is done with the object. Remember this rule doesn’t apply to non-synchronized methods. And the thread holds the lock of the object can reenter its synchronized methods (ie. reentrance).
  • Memory-Visibility – When a synchronized method exits, it automatically establishes a happens-before relationship with any subsequent invocation of a synchronized method for the same object. This guarantees that changes to the state of the object are visible to all threads. Most of the interviewers miss this one. :smile: In database, it is like the concept of “commit”. If you don’t commit your changes, others could not see your changes.

All in all, synchronized methods enable a simple strategy for preventing thread interference and memory consistency errors. Interference happens when two operations, running in different threads, but acting on the same data, interleave. This means that the two operations consist of multiple steps, and the sequences of steps overlap. This will result in unpredictable data lost (hard to fix). Memory consistency error occurs when complier and processor reorder statements and uses the cached value for better performance

Problem of Synchronization

Synchronization will serialize the method calls from different threads. At any given time, only one thread can execute the synchronized method and the other threads need to wait until the object lock releases. This will dramatically diminish the liveness of your application. To minimize the impact, you should:

  1. Reduce lock duration – Synchronized statements are useful for improving concurrency with fine-grained synchronization
  2. Reduce lock scope – Mutex variable in the synchronized lock may help you to avoid locking the whole object. In Java 5 concurrency library, there is class called ReentrantLock that provides the same features as synchronized with better performance and flexibility. Here is what is stated in “Java Concurrency in Practice”:

Why create a new locking mechanism (ie. ReentrantLock) that is so similar to intrinsic locking (ie. synchronized)? Intrinsic locking works fine in most situations but has some functional limitations. It is not possible to interrupt a thread waiting to acquire a lock, or to attempt to acquire a lock without being willing to wait for it forever. Intrinsic locks also must be released in the same block of code in which they are acquired; this simplifies coding and interacts nicely with exception handling, but makes non-block structured locking disciplines impossible. None of these are reasons to abandon synchronized, but in some cases a more flexible locking mechanism offers better liveness or performance.

So far so good.? Great! lets me ask you 3 questions:

  1. Question 1: In the example above, if thread A is executing a synchronized method “increment”, can another thread execute method2? Yes. Because method2 is not synchronized
  2. Question 2: If thread A is in the synchronized method xyz , can it invoke another synchronized abc? Yes. The object lock is reentrant.
  3. Question 3: If I want to make the above class thread-safe without using synchronized object lock, are there any other alternatives? Yes.
  4. Question 4: How about declare the variable volatile?
    • Managing Volatility by Brian Goetz
    • Compare Atomic variable, ReentrantLock and Volatile variable.
    • Use int, byte instead of long or double because updating int or byte is an atomic action. Atomic actions cannot be interleaved, so they can be used without fear of thread interference. However, this does not eliminate all need to synchronize atomic actions, because memory consistency errors are still possible (for example, thread A updated an variable atomically but it hasn’t flushed and sync up the main memory, so it is not visible to other threads). Unless the fields in question are declared [code]]czoxMDpcInZvbGF0aWxlLCBcIjt7WyYqJl19[[/code] the JMM does not require the underlying platform to provide cache coherency or sequential consistency across processors, so it is possible, on some platforms, to read stale data in the absence of synchronization. Look at here for better explanation. However, volatile can only guarantee atomicity and memory consistency for single variable. If you want to guarantee that for compound operations, you need to use synchronized block.(or the new java.util.concurrent classes). It is worth pointing out that increment (i.e. ++) and similar operations are not atomic in Java. So incrementing a volatile variable [code]]czoxMzpcInZvbGF0aWxlVmFyKytcIjt7WyYqJl19[[/code] is NOT thread-safe. If you need thread-safe semantics i.e. no possibility of multiple threads corrupting the variable value by having the updates unexpectedly interfere with each other, then you need to use a synchronized block to increment a variable, e.g. [code]]czoyNzpcInN5bmNocm9uaXplZChMT0NLKXtteVZhcisrfVwiO3tbJiomXX0=[[/code], regardless of the overheads this causes – Java Performance Tuning

More On Thread Safety

All the techniques I discussed so far is to show you how to make your code thread safe. They are applicable only if you have to share resources across multiple threads and those threads may modify the resources. That is to say, if you don’t share any resources or other threads only read but no write, your code are thread safe already. Here are some tips related to not sharing and read-only sharing:

  1. You can use local variables to carry out logic in the methods if possible (not share)
  2. You can use TheadLocal to hold the resources if you want to access it across multiple methods for the same thread (not share)
  3. You can use immutable object (private variable without setXXX methods) for read-only sharing. For example, String and PrimitiveWrappers like Integer. However, make sure the declare final for the reference that holds your immutable object.
  4. Most of the time, you use collection classes like HashMap, ArrayList to hold our objects. Those classes are not thread safe. To make it thread safe, you may use Collections.synchronized wrappers or simply use the synchronized version of them like Hashtable and Vector. However, these approaches have 2 problems
    • They are not performed. Why lock every reads only to protect occasional write?
    • They are just conditionally thread-safe. All individual operations are thread-safe, but sequences of operations where the control flow depends on the results of previous operations may be subject to data races like doing containsKey(), size() and iterator() methods before actually read and write can give you NullPointerException and ConcurrentModificationException if you don’t do external synchronization.
    • Here are the unconditionally thread-safe version like ConcurrentHashMap, ConcurrentLinkedQueue and CopyOnWriteArrayList to achieve thread-safe with good performance number.
    • When you write an unconditionally thread-safe class, consider using private lock object in place of synchronized methods. This protect you against synchronization interference by clients and subclasses and gives you the flexibility to adopt a more sophisticated approach to concurrent control in later release – Joshua Bloch in Effective Java 2nd version p.281.
  5. Deal with lazy initialization
  6. Handle denial of service attack that holds the object lock forever

Java Memory Model

JMM is what causes concurrent programming way more complicated than it should be. Honestly, I am not good to write this part because I cannot understand it in full. All I can do is to provide you a video from Jeremy Manson in Google. Hear what the expert said:

 If you still have questions, make sure go to his blog

http://jeremymanson.blogspot.com/

Reference

Below are some of the articles I use:

  1. What does volatile do?
  2. Sun lesson on concurrency
  3. Fixing Java Memory Model – Brian Goetz – Part 1, Part 2
  4. Rox Java NIO Tutorial
  5. Blocking Queue
  6. Synchronization and Java Memory Model – Doug Lea 

 

Leave a comment Continue Reading →

Common DBA jobs

Export schema/ data out from mysql

To export schema and/or data, you can use mysqldump command:

mysqldump -u [username] -p[password] -d [schema_name] > [filename].sql

  1. -d means no data (just gives me the schema).
  2. -B is needed for multiple schema output
  3. -h (hostname)

Export data out from postgresql

  1. Export table data from postgresql to csv format
  2. Backup and restore database in postgresql

However, if you want to export sql result set to csv in postgresql, you can consider to use COPY functionality.

COPY ( select statement ) TO STDOUT WITH CSV
COPY stock FROM ‘mydir/Stock.csv’;

Run sql script using mysql command

To run the scripts as input, we can use the following command:

mysql [schema_name] -u [username] -p[password] < [filename].sql

SQL Tips

There are times we want to put logic in SQL but not writing store procedure. Here are some of using functions that may get you there:

  • Conditional statement – CASE WHEN xxx THEN abc WHEN yyy THEN bbc …ELSE ccc END

UPDATE Account SET Sales_Location__c =
    CASE WHEN Sales_Country__c != ” THEN Sales_Country__c WHEN Country__c != ” THEN Country__c
ELSE ‘–’ END

  • COALESCE (input1, input2,….) – This function takes in as many parameter as you want and return you the first non-NULL parameter. Suppose we have a table A having 3 columns FullName, CompleteName and DisplayName. Any of these columns can contain null values. Now we want to select the DisplayName from this table, but if it is null, then return FullName, if that is also null then return CompleteName. We can easily perform the same in one select statement as: (COALESCE vs ISNULL)

SELECT COALESCE(DisplayName, FullName, CompleteName) From A

 ETL

In mysql, you can export a table from db1 and import it to db2 remotely. For example, in db2 host, you can issue the following command:

/usr/bin/mysqldump – -force – -compress – -opt -u [username] -p[password] -h [hostname] db1

| /usr/bin/mysql -u [username] -p[password] -D db2

Leave a comment Continue Reading →

Database Performance – Indexing

There are 2 main focuses I will take to analyze a database. First, I will find out how it manages the data. Second, I will look at how it scales in term of data volume and traffics. Today, I will talk about the most common indexing scheme that most of the databases use today. It is B-Tree Indexing.

B-Tree Indexing

begin figure description - The paragraph that precedes this figure describes the content of the figure. - end figure description
 
Many people think B-Tree means binary tree. It is not right. If I really use binary tree to structure the index, 1 million index values will have a very deep tree to traverse and each node retrieval is equivalent to a read operation. Then, it may take so many reads to get down to the leaf node. How can it be performed? Instead, B-Tree means balanced tree. A B-tree is said to be balanced because it will never become lopsided as new nodes are added and removed. Apart from that, each node can have many sub-nodes. So, for millions of records, it can be handled by 2-3 levels of balanced tree. So, it is very good in performance. It offers O(log n) performance for a single-record lookups.
 
The fundamental unit of an index is the index item and a node is an index page that stores a group of index items. An index item contains a key value that represents the value of the indexed column for a particular row. An index item also contains rowid information that the database server uses to locate the row in a data page. A node is an index page that stores a group of index items.
 
It is interesting to note that the leaf nodes of the index are actually  a doubly linked list. Once we find out where to start in the leaf node (find the first value), doing an ordered scan of value (index range scan) is very easy. We don’t have to navigate the structure any more; we just go forward through the leaf nodes. That makes solving range-based queries such as "BETWEEN 20 and 30" much easier.
 

When should you use B-Tree Index?

To understand when you should use B-Tree index, you should know there are 2 ways to use an index. First, you can use index as a mean to access rows in a table via rowid. If you use index for that, you want to access a very small percentage of the rows in the table. Otherwise, you need to get into "index then row" cycle many times (implies many IOs) and it will be worse than pulling bunch of rows in batch to reduce the number of IOs (the costly part of database operation). According to the experiment done, full scan is faster if  we access too high % of  rows via index. Second, you can use index as a mean to answer the query if the index contains enough information to answer the entire query. In this case, we don’t need to go to the table at all. The index will be used as a thinner version of the table. So, if you want to access a large % of rows via index, you should consider to get the query answer via the information in the index.

MySQL Indexing

There are several rules to remember for MySQL indexing

  1. MySQL will only ever use one index per table per query (except for UNION b/c it is considered as separated queries).
  2. To get around that, you can create multicolumn indexes.
  3. When there are more than 1 indexes to choose from, MySQL makes an educated guess based on the statistics gathered.
  4. MyISAM has indexes kept in a completely separate file from table rows. And table rows are stored in the random order that are retrieved by the rowid in the index items.
  5. InnoDB uses clustered indexes that has primary key and the record itself clustered and the records are all stored in primary-key order. When your data is almost always searched on via its PK, clustered indexes can make lookups incredibly fast because single lookup can pull out record in question.
  6. Primary key cannot contain NULL whereas unique index can.

 

Leave a comment Continue Reading →

Database – Tuning Tips

Understand Query Processing Basic

Before tuning your slow query, you better understand how your database process your query first. Here are the steps:

  1. Each client connections gets its own thread within the server process (MySQL). The connetion’s queries execute within that single thread, which in turn resides on one core or CPU. The database server caches threads, so they don’t need to be created and destroyed for each new connection (ie. has its own thread pool).
  2. Look up the query cache to see whether the resultset of your query is available. MySQL does this via hashing your query and use the hashed value to check for the results in the cache (ie. Map[ hashed query : resultset] ). MySQL uses the exact query text it receives, so the cache is sensitive to the most trivial variations like case, space and etc. Best practice: Have your application generates the queries. If your query cannot be found in the query cache, go to step 3. Once the table is updated, the query in the cache will be flushed and updated.
  3. Parse the query to create an internal structure (the parse tree).
  4. Analysis the query
  5. Optimize the query – Create the query plan that is believed to be the most efficient way to execute a query. The effort center around indexes and table join order. Use Explain to look into the query plan that database uses for your query.
  6. Execution
  7. Send the resultset back to client

Approaches that can speed up your queries

  1. Index tables to allow the database server to look up rows more quickly.
  2. Write effective SQL that takes advantage of those indexes to the fullest extent, and use the EXPLAIN statement to check whether the RDBMS really is doing so.
  3. Tune the hardware

Concept of Indexing

The article from Paul Dubios gives a great summary of the benefit and cost of indexing and when we should use it. If you don’t want to read up 6 pages of his article (but I really suggest you do), here is my summary.

Benefits/Cost of Indexing

  1. The column you indexed are sorted and you can combine few columns together (order is important) as single index (ie. multi-columns index).
  2. Indexed column is sorted. To locate record with last name equals to “HON”, you will look it up via lastname index and pull consecutive entries until it is not equal to “HON” and you can ignore the rest. Therefore, one efficiency gained by using the index is that we can tell where the matching rows end and can skip the rest. Another efficiency comes about through the use of positioning algorithms for finding the first matching entry without doing a linear scan from the start of the index (for example, a binary search is much quicker than a scan). Normally, B-Tree index is applied by default.
  3. If you search your table in different ways, you can have more than 1 index. Remember at most one index per table is picked by RDBMS each time you query a table. You can get around this limitation via composite index.
  4. Index may be stored separately from data. But MySQL InnoDB stores them together.
  5. Indexes are always sorted, but the data on disk is not. Using an index means accessing the rows in index-sorted order (ie. random seek) rather than in the order they reside on the disk (ie. sequential access). The end result is more time spent moving around the disk and less time reading data. You can draw 2 conclusions form this knowledge:
    • If a table is going to remain very small, you may want to leave off the indexes.
    • Always use the production data for your test

Choose columns to index

  1. Index columns that you use for searching, sorting, or grouping, not columns you only display as output. In other words, the best candidate columns for indexing are the columns that appear in your WHERE clause, columns named in join clauses, or columns that appear in ORDER BY or GROUP BY clauses. If RDBMS can optimize a query using joined columns, it cuts down the potential table-row combinations quite a bit by eliminating full table scans.
  2. Consider column cardinality. The cardinality of a column is the number of distinct values that it contains. Indexes work best for columns that have a high cardinality relative to the number of rows in the table (that is, columns that have many unique values and few duplicates). The query optimizer generally skips an index in favor of a full table scan if it determines that a value occurs in a large percentage of a table’s rows. The conventional wisdom for this percentage used to be “30%.” In that case, your index gives you no benefit but cost to maintain!
  3. Favor the column with smaller data type to index. Faster for comparsion, less disk IO required, more can be cached in memory. For the InnoDB and BDB storage engines that use clustered indexes, it’s especially beneficial to keep the primary key short. A clustered index is one where the data rows are stored together with (that is, clustered with) the primary key values. Other indexes are secondary indexes; these store the primary key value with the secondary index values. A lookup in a secondary index yields a primary key value, which then is used to locate the data row. The implication is that primary key values are duplicated into each secondary index, so if primary key values are longer, the extra storage is required for each secondary index as well.
  4. Take advantage of leftmost prefixes. When you create an n-column composite index, you actually create n indexes that MySQL can use. A composite index serves as several indexes because any leftmost set of columns in the index can be used to match rows. Such a set is called a “leftmost prefix.” Suppose that you have a table with a composite index on columns named state, city, and zip. Rows in the index are sorted in state/city/zip order, so they’re automatically sorted in state/city order and in state order as well. This means that MySQL can take advantage of the index even if you specify only state values in a query, or only state and city values. However, it cannot use the index for searches that don’t involve a leftmost prefix. For example, if you search by city or by zip, the index isn’t used

Concept of Concurrency

To provide better concurrency, below are the rules of thumbs that I used

  1. Locking level: Locking at a finer level allows better concurrency (ie. row level is better than page level, and page level is better than table level), because more clients can be using a table at the same time if they use different parts of it.
  2. Table locking does have an advantage over finer levels of locking in terms of deadlock prevention.
  3. MVCC is new mechanism to achieve better concurrency b/c read doesn’t block write and vice versa..

Schema design and structuring

This section will cover the techniques that you can leverage to design your schema with performance in mind.

  1. Normalize first then denormalize later. Don’t over normalize. Use 3rd Normal Form
  2. Use the right data type. Don’t waste space unnecessarily (general)
    • Use INT unsigned for IPv4 address
    • Pay attention on TEXT, BLOB and BIGINT data types. Consider separate table for TEXT, filesystem for BLOB and other numeric data type for BIGINT.
  3. Partitioning
    • Vertical partitioning – split table with many columns into multiple tables (ex. static vs dynamic columns, frequent vs infrequent access pattern)
    • Horizontal partitioning – split table with many rows into multiple tables (ex. user-based)
    • MySQL 5.1 partitioning has issues (Postgresql and Oracle are more mature in this area).
  4. Understand the benefit and disadvantages of different storage engines (mysql)
  5. Leverage query cache effectively
    • Each time an update on the any record in the table, all queries referencing the table are invalidated in the query cache. So if you split out the dynamic fields into another table, you can make your query cache less frequent in thrashing.
  6. Index smartly

Write effective SQL

  1. Use ANSI SQL style that explicitly declare JOIN conditions using the ON clause. This syntax can handle Outer join.
  2. Use Explain plan to find out how RDBMS chose the execution plan for your SQL. Tune it accordingly.
  3. Don’t use indexed field in function call
  4. In explain plan, “Extra: Using Index” means FULL Index Scan. It happens when scan is better than seek and index table is enough to satisfy the query.
  5. Why full table scan?
    • No WHERE condition
    • No index on any type of field in the WHERE clause
    • Poor selectivity/ cardinality on an indexed field
    • Too many records meet WHERE condition (scan favors seek).
    • < MySQL 5.0 and using OR in WHERE clause

Put it all together

  • Think in set instead of iteration.

Example: For each customer, fetch the last payment record in the Payment History table. You can solve this query via correlated subquery. That is:

select p.* from Payment_History p
where p.payment_date = (select max(payment_date) from Payment_History where customer_id = p.customer_id)

The above query can be optimized a bit if you use compound index (customer_id, payment_date). However, if you want to achieve the ultimate performance, you should not use correlated subquery b/c it will execute n times and n = number of rows in the Payment_History table. To avoid that, you can use derived table. The query will be:

SELECT p.* FROM (select customer_id, max(payment_date) as last_order from Payment_History group by customer_id) as last_orders
INNER JOIN Payment_History p
ON p.customer_id = last_orders.customer_id
AND p.payment_date = last_orders.last_order

Optimize the N:M joins

If you write an application that you can tag a project many times. You may model your database schema with Project table, Tag table and Tag2Project join table. When you want to grab all projects that either tag by “mysql” OR “php”. You may write a select statement that have Project inner join Tag2Project with project_id and resultset inner join Tag table filtered by “mysql” or “php” with tag_id. However, optimizer may not join it as the order you specified. It may have Tag table filtered first then inner join Tag2Project table and pass the resultset to inner join Project table. You can force the sequence of join via using derived table.

On the other hand, if you want to grab all project that has tagged by both mysql AND php. You may end up solve it via using derived table from Tag2Project and Tag that “GROUP BY” project_id and having count(*) = 2 with Tag filtered by IN clause.

Configurable variables

Before you start tuning your server for mysql performance, you need to obtain the system information first. To do that, there are some linux command that may be useful. top command

top_display.JPG How to interpret the output from top command? Here you go.

  1. The first line is telling you that the system is up 211 days 9 hr and 38 minutes. There are 2 users using the system (from process table, they are mysql and ec users). For the last 1, 5 and 15 minutes, the load of the system shows 0.01, 0.11 and 0.21 respectively.
  2. The 2nd line shows you that there are 82 processes running in the system, only 1 of them are active (In the process table, S=R means Running) and 81 of them are in sleeping mode (S = S means sleeping).
  3. The 3rd – 5th line are very useful. It gives you the current status of our system. CPU is 99% idle. If it is not, you want to find out who is using it. The process table is sort by %CPU (if you have dual core processor, the % here can reach 200). In term of memory, the total memory in the system is around 4G in total. As you can see, 3.5 G is in used. For our mysql instance, it is using 1.2G of memory that is about 32% of the used memory. Another interesting thing to examine is the swap memory. There is 2G of swap memory allocated in this system and the system barely uses it. It is good because system takes time to swap in and out the data from memory to disk and vice versa.
  4. Troubleshooting: When you see mysql process is taking high %CPU – CPU-bound, there may be queries costing intensive cpu cycles. You can look into it via mysql administrative commands. When you see high % of IO wait – ie. IO-bound, you may have inefficient queries that cause mysql reading too many rows to locate the data you are interested in. In system level, you could reduce the # of IO via better use of memory/buffer or reduce the seek time via faster disk and less random data storage.

Other linux commands

  1. more /proc/version (check linux os version)
  2. more /proc/cpuinfo (for dual processor with speed 2.80MHz: cpu core = 2, processor = 1, cpu MHz = 2.80)

Hardware Tuning Tips

There are several key areas that we normally look at when we try to tune our database in hardware perspective.

  1. Add more memory if you can – reduce # of IOs (primary bottleneck).
  2. Achieve parallelism - multi-processor for processing and parallel write to different disk devices (eg. log and database store in different disk device). MySQL is multi-threaded.
  3. Faster disk to improve IO latency
  4. RAID can give you some advantages of parallelism

In MySQL, most of the memory mysql allocates is used for various internal buffers and caches. These buffers fall into 2 major groups: global vs per-connection buffers. The 2 most important global buffers are MyISAM key buffer (key_buffer_size) and InnoDB’s buffer pool (innodb_buffer_pool_size).

  • For MyISAM, remember that msyql doesn’t cache row data but the index block. So, the less often mysql needs to hit the disk to scan a table’s index, the faster the queries will be. In practice, OS may somehow help you out via caching some row data in cache. Even this, if possible, consider making the key buffer large enough to hold the indexes for your most actively used tables. By adding up the size of the .MYI files for the tables, you will have a good idea to set the buffer.
  • For InnoDB, both cache index and row data are together in its buffer pool via using the clustered indexes. So if the table size is not large, you will achieve the best performance via storing the whole table in cache because you don’t need to do any IO. However, if you modify the data in the table, you need to sync up your disk either per min or per commit.

Allocate enough memory for mysql

min_memory_needed = global_buffers + (thread_buffers * max_connections)

thread_buffers includes:

  • sort_buffer
  • myisam_sort_buffer
  • read_buffer
  • join_buffer
  • read_rnd_buffer

global_buffers includes:

  • key_buffer
  • innodb_buffer_pool
  • innodb_log_buffer
  • innodb_additional_mem_pool
  • net_buffer

When there are large number of concurrent users like 300-400, a particular nasty series of events may happen. The reason is large number of threads need to allocate additional memory, it can cause mysql to allocate so much memory that the OS begins swapping, which causes performance to degrade further, which means that each query takes longer to complete. With queries running more slowly, the odds of more threads needing memory increases. It is a vicious spiral. The only solution is to restore balance between the system’s memory and mysql’s memory needs. That means doing one of the following:

  1. Add more memory
  2. Decrease max_connections
  3. Decrease some of the per-thread buffer sizes

Be proactive, monitor memory use on your server.

Below are some quick tips for MySQL tuning that I have learnt from my colleagues and books. To keep this article short and useful as quick reference, I will just put the common ones here.

  • key_buffer size. (for MyISAM – default at only 8MB)
  • innodb_buffer_pool_size (for InnoDB – default at only 8MB) – reference

You need buffer pool a bit (say 10%) larger than your data (total size of Innodb TableSpaces) because it does not only contain data pages – it also contain adaptive hash indexes, insert buffer, locks which also take some time – Peter Zaitsev

  • Bumping these values up can decrease disk IO and make your database run a lot smoother.Use indexes. Enable the slow query log and look for queries that show up most often, then learn the explain command on them.
  • Check Created_tmp_disk_tables in show status; This value usually goes in hand with huge queries that aren’t using indexes. If it keeps increasing it means you are having to create on disk files to execute the query. Consider increasing tmp_table_size or putting your disk files on a TMPFS by setting tmpdir = /tmp.
  • Use the query cache if it can help you. Check the query_cache_size. Check how well it is performing. You can do this by doing a “show status” and then using this formula:

qcache_hit_ratio = qcache_hits / (qcache_hits + qcache_inserts + qcache_not_cached)

  • If there are a lot of Qcache_lowmem_prunes, that means your query cache size isn’t big enough.
  • If Opened_tables is increasing you probably have table_cache too small.
  • If you have a lot of connections, check Threads_created in show status; If it keeps increasing, you need to up the thread_cache_size in my.cnf. The max connections in MySQL are default at 200.

Reference

Below are some references I used to write this article

  1. MySQL Toolkit
  2. One large table vs many small tables
  3. The Art of SQL Tuning (Jay Pipes has a great presentation on sql tuning)
  4. MySQL Performance Blog
  5. Peteris’s Blog on SQL Tuning
  6. MySQL Query Optimization

 

Leave a comment Continue Reading →

Art of using database indexes

Need of Indexes

Image of you have a table of user info, if the table contains 50 million of rows. Without index, running a query like below will need a full table scan. Clearly it is not efficient as it is O(n) problem.

SELECT * FROM user_info WHERE last_name = “Tom”

But if we index it on last_name column, the last_name field will be sorted alphabetically. Now if you look up the last_name = ‘Tom’, you can go directly to the one starts with ‘T’. Internally, index table contains the fields you are indexed and the position of the matching records (ie. pointer).

Cost of Indexes

  1. Because database needs to maintain a separate list of indexes’ values, there is cost to keep them updated as your data changes
  2. Indexes cost space. It is a trade-off between space and time. Lets say the user_info table has 2 billions rows and last_name is 8 bytes long, you are looking at roughly 16 GB of space for the data portion of the index. Plus 4-8 bytes for each row pointer, it will go up to 32 GB of space.

With these cost, you don’t want to index every column in a table.

Type of Indexes

 

  1. Multicolumn indexes – reduce the set that matches single column only (ie. more selective). For example, if you have each field as index separately, database may not use them all at once at the same time. Like MySQL, it will only ever use one index per table per query. To choose which index to use, MySQL will make a decision about which index will return fewer rows via index statistics.
  2. Partial indexes – if you don’t have too much space for your index, you can specify a subset of bytes from your index value be used.
  3. Clustered indexes – In MySQL, for MyISAM, the indexes are kept completely separate from the row data. With clustered indexes, the index and the record itself are “clustered” together. InnoDB uses clustered indexes. In Oracle, clustered indexes are known as “index-organized tables”. The type of index will reduce two lookups (index and record data) to one. Internally, clustered index reorders the way records in the table are physically stored. Therefore table can have only one clustered index. The leaf nodes of a clustered index contain the data pages.

Deep look in MySQL: The InnoDB storage engine creates a clustered index for every table. If the table has a primary key, that is the clustered index. If not, InnoDB internally assigns a six-byte unique ID to every row and uses that as the clustered index. (don’t let it generate a useless one for you). All indexes are B-trees. In InnoDB, the primary key’s leaf nodes are the data. Secondary indexes have a pointer to the data at their leaf nodes.

Index Structure

  • B-Tree Indexes – (balanced tree indexes, a tree structure that will never become lopsided as new nodes are added and removed. It gives us O(log n) performance for single-record lookup). Unlike binary tree, B-trees have many keys per node and don’t grow tall or deep as quick as a binary tree. It is very good for range-based queries as well. For example, for quey below, server simply finds the first Ray record and last Robert record. It then knows everything in between are also matches. The same is true of virtually any query that involves understanding the range of values (>, <, MIN, MAX, BETWEEN xx AND yy)

SELECT * FROM user_info WHERE last_name BETWEEN ‘Ray’ AND ‘Robert’

  • Hash Indexes – It gives very fast lookups O(1) but it is less flexible and predictable than other indexes. First, the key is hashed and compare. So, the range-based queries cannot use it. Uniform distribution is the key here.

Index Limitation

  1. Index doesn’t work together with wildcard and regular expression search.
  2. If index selectively is like > 30% of rows (very low), table scan may be better.

 

Leave a comment Continue Reading →