Scaling - a series of small, incremental evolutionary changes interspersed with intense “oh shit!” moments.

We’ve recently undergone a significant change in architecture in response to a series of bad downtimes. You can see above what it used to look like - a fairly simple setup, with a few app servers - and what it looks like now. We’ve made a few important changes to the way everything works, and we’re rolling these out in stages.

Tender has been growing at a fantastic rate (it has doubled in size this year) but the hardware infrastructure hasn’t really changed - as we’ve grown, we’ve just increased the instance size of the master db and added app servers as necessary. This has worked well because it doesn’t require any downtime. But as you can probably tell, the old infrastructure had many points of failure. The two main issues we’ve seen are (a) the haproxy machine going down but not failing over and (b) high IO on the database master’s EBS disks.

We can’t fix the failover issues - that’s the domain of our webhost, and they assure us it’s fixed. Regardless, we’ve upgraded our support plan with them to super-mega-premium, so we can get expert help a bit faster (an additional $2,000 per month! There goes the summer home in the Bahamas!).

The database/EBS IO issues are unavoidable on EC2. Sometimes the disks are just slow, which means random queries like ‘update sites set updated_at = NOW() where id=12254’ take fifteen to twenty seconds. On top of the latency issue, we’ve also outgrown our web host: the largest machine EC2 offers, the quad extra large, has 68GB of RAM, but our largest tables (comments and discussions) each have indexes larger than that, which means a lot of MySQL hunting on disk for the data. To make things worse, Engineyard, who provides the stack we’re using on top of EC2, doesn’t support multi-master replication or even multiple writable database servers. And since we’re already on the largest machine EC2 offers, the existing strategy has nowhere left to grow. If we hosted our own gear, we could easily build a bigger server. But we don’t.

As it turns out, we can work around this limitation with NoSQL. It involves taking a lot of the in-memory data MySQL used to handle and managing it ourselves in Redis, which is ridiculously easy to scale horizontally. Here’s how we’re doing it:

  • Store as much as possible in Redis, which means store as much as possible in RAM. Redis makes big intersections of lists (i.e. all discussions in this queue, on this site, in this state) super fast and easy - see the sketch after this list.
  • Minimize database table and index scans by building our own indexes (e.g. all discussions by this user_id) in Redis instead of MySQL.
  • Add a bunch of Redis instances on servers dedicated to running Redis. Redis and its ruby library make this super easy with consistent hashing. We’re running up to 32 instances per server.
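
To make those bullet points concrete, here’s a rough sketch of the kind of thing we mean, using redis-rb. The key names, ids and host are made up for illustration - they’re not our actual schema:

    require "redis"

    # One client for the sketch. redis-rb also ships Redis::Distributed, which
    # consistent-hashes keys across a list of instances; keys that get
    # intersected together just have to land on the same instance.
    redis = Redis.new(:host => "localhost", :port => 6379)

    # Hand-rolled indexes: when a discussion is created or changes state, add
    # its id to each relevant set. (Key names are illustrative.)
    def index_discussion(redis, discussion)
      redis.sadd "site:#{discussion[:site_id]}:discussions",   discussion[:id]
      redis.sadd "queue:#{discussion[:queue_id]}:discussions", discussion[:id]
      redis.sadd "state:#{discussion[:state]}:discussions",    discussion[:id]
      redis.sadd "user:#{discussion[:user_id]}:discussions",   discussion[:id]
    end

    index_discussion(redis, :id => 987, :site_id => 34, :queue_id => 12,
                            :state => "open", :user_id => 5)

    # "All discussions in this queue, on this site, in this state" is a single
    # set intersection, computed entirely in RAM.
    redis.sinter("queue:12:discussions", "site:34:discussions", "state:open:discussions")
    # => ["987"]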

Finally, we can also add read-only slaves to MySQL (and hope that replication doesn’t fall too far behind), and move most of our reads to those. We already use background jobs to process most writes to the database, so this was easier than expected. (On Friday, we were down due to a memory leak in the sharding library. D’oh! Two steps forward, one step back.)
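
Here’s a minimal sketch of what that read/write split can look like in a Rails 2.x app - the class names and the ‘slave’ entry in database.yml are illustrative, not our actual code:

    # Reads that can tolerate a little replication lag go through a model bound
    # to the read-only slave; everything that writes still uses the master.
    class SlaveDatabase < ActiveRecord::Base
      self.abstract_class = true
      establish_connection :slave   # a read-only replica defined in database.yml
    end

    class ReadOnlyDiscussion < SlaveDatabase
      set_table_name "discussions"   # same table, different connection

      def readonly?
        true
      end
    end

    # Rendering a queue page hits the slave...
    discussions = ReadOnlyDiscussion.find_all_by_site_id(34)

    # ...while writes still go to the master, mostly wrapped in background
    # jobs, so a slow EBS volume backs up the workers rather than the web
    # frontend.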

So, implementing these improvements involves all your standard scaling steps:

  • Shard the database vertically so that reads go to a separate slave. Once this works reliably,
  • Shard the database horizontally so that each site has its own dedicated database on a server with very few other databases (better CPU to RAM ratio)
  • Add more Redis servers with up to 32 instances per box
  • Reindex all the existing Redis data so that it is spread onto the new instances (see the sketch after this list)
  • Add a bunch more app servers, so the frontend is always responsive even if the master db is backed up.
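
The reindexing step in particular is mostly mechanical once everything lives in Redis sets. A rough sketch, assuming made-up hostnames and a one-off migration run during quiet hours:

    require "redis"
    require "redis/distributed"

    # The old, single dedicated Redis box...
    old = Redis.new(:host => "old-redis.internal", :port => 6379)

    # ...and the new ring of instances; Redis::Distributed consistent-hashes
    # each key onto one of them. (In production this list is much longer -
    # up to 32 instances per box.)
    ring = Redis::Distributed.new([
      "redis://redis1.internal:6379",
      "redis://redis1.internal:6380",
      "redis://redis2.internal:6379",
      "redis://redis2.internal:6380"
    ])

    # Copy every set onto whichever new instance the hash picks for its key.
    # KEYS * blocks the server, which is fine for a one-off, off-peak migration.
    old.keys("*").each do |key|
      old.smembers(key).each { |member| ring.sadd(key, member) }
    end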

Combined, this allows us to move individual chunks of both Redis and MySQL databases onto their own server to equalize load and memory usage; once one instance gets too large (at this stage, over a few hundred MB), we can move that one instance onto a new server with very little effort and almost zero downtime. It also gives us an order of magnitude more CPU per GB of RAM, which means everything’s faster.

What’s next? Hopefully all this will happen behind the scenes and everything will get faster and more reliable. We’ve just hired two more skilled developers to help us implement all this as soon as possible. Once we have the new backend fully online, we can start shipping the fully dynamic ajax frontend, live in-page updates, and more.
