This video is very informative that shows you how to scale a site from infancy to 100 millions view per day. It is amazing that the team behind this is very small.
Interesting areas:
- Database scaling
- Initially YouTube had single master multiple slaves replica database architecture. Replicating your database has many advantages. For example, backup and recovery become easier. Rather than shutting down your master server to back it up, you can simply back up a slave. More importantly, replication makes it easier to scale large applications. By sending all write queries (INSERT, UPDATE, DELETE) to the master and using slaves for most of the read queries (SELECT), it’s possible to achieve nearly linear scalability for read-intensive applications. Because YouTube used MySQL, its replication is done in asynchronous log-based fashion. Under the hood, master puts the write queries (DML) to the binary log and slaves’ IO threads are responsible for connecting to the master, retrieving copies of the events recorded in the binary log and write to their own relay log. Then SQL thread will pick up and replay the queries (better than replicating delta if cheap DMLs with large data change). This dual-threaded design helps to ensure that the slave always has the most recent data, even if it hasn’t had a chance to process it.
- Master-N-Slaves is very good for read intensive application. However, YouTube is also write intensive. If your application is write intensive this configuration will be saturated much faster because it has to handle much more write load. Especially keeping into account MySQL replication is single thread it might be not long before it will be unable to keep up. Just like everyone else that goes down this path, it eventually reach a point where replicating the writes to all the DBs, uses up all the capacity of the slaves. Apart from that, asynchronous nature of replication has its own compromise b/c slaves can contain stale data. It is true delay is often insignificant but in times of heavy load or in case you was running some heavy queries on the master which not take time to replicate to the slave replication lag can be significant.
- Db partitioning from monolithic DB to multiple disjoint shards. This way, you can spread both read and write. Plus you will have better use of cache.Â
- In YouTube, they do user partition and have the web server to determine which shard to hit based on what user content is requested. You can use an algorithm to figure out what shard to go at runtime with user_id as input. However, YouTube decides to create a table for user and shard mapping because it will give the flexibility to move user data to different shards if necessary. For example, power account may have huge amount of video requests. Relying on hashing the userId to find the shard will end up having unbalanced resource allocation (ie. overloading a shard while other shards are underused).
- In this video, YouTube also found out there is swapping problem in Linux 2.4. Look like the kernel is aggressively swap out the pages even MySQL hasn’t not used up the free memory. It finally causes the database to halt as MySQL and kernel finally end up swapping in and out the page. YouTube simply removes the swap from kernel to get around this problem. According to Jermey Zawodny, he has noticed that the kernel still attempts to do swap check even you disable the swap (detail here). In term of virtual memory management, FreeBSD has done a much better job. However, Linux is better in managing threads.
- Improve performance
- Caching, CDNÂ
- Google BigTable
- RAID
- MapReduce
Reference
- http://kylecordes.com/2007/07/12/youtube-scalability/
- http://freescienceonline.blogspot.com/2007/08/scalability-and-scalable-architecture.html (more videos)
- http://www.mysqlperformanceblog.com/category/replication/Â (great blog)
- Google Big Table Paper
- Great MySQL HA Presentation
- Scalability Resource Links






































(4.75 out of 5)
No Comment Received
Sorry the comment area are closed for non registered users