Increasing Availability Using Anti-Affinity
May we introduce: Anti-Affinity. Use this small but powerful new feature to build even more resilient setups. Furthermore, we would like to share some insights into how we approach high availability (HA) on many different levels.
- How to benefit most from Anti-Affinity
- What measures we take to maximize availability
- Why HA is more than just a reliable server
How to benefit most from Anti-Affinity
With the most popular virtualization technologies, a crashing physical compute host inevitably takes down all the servers that were running on it. As a remedy, you may already have "N+1 redundancy" in place by clustering multiple virtual servers (e.g. multiple web workers or a DB cluster) to increase your solution's availability. It is possible, however, that some of those servers are running on the same physical host by coincidence.
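To get a feeling for how likely such a coincidence is, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that each virtual server lands on one of the available compute hosts uniformly at random and independently of the others; real schedulers weigh load and capacity, so treat the numbers as an intuition aid only. The function name `colocation_probability` is made up for this example.

```python
from math import prod


def colocation_probability(servers: int, hosts: int) -> float:
    """Chance that at least two of `servers` end up on the same host.

    Assumption (illustrative only): every server is placed on one of
    `hosts` uniformly at random, independently of the other servers.
    """
    if servers > hosts:
        # More servers than hosts: sharing a host is unavoidable.
        return 1.0
    # Probability that all servers land on pairwise different hosts.
    all_distinct = prod((hosts - i) / hosts for i in range(servers))
    return 1.0 - all_distinct


if __name__ == "__main__":
    # Example: a 3-node cluster spread randomly over 10 compute hosts.
    print(f"{colocation_probability(3, 10):.2f}")
```

Under this simplified model, a three-node cluster on ten hosts has a better than one-in-four chance of having two nodes on the same physical machine.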
With our new Anti-Affinity feature, you can ensure that servers with identical tasks will always be running on separate physical hosts. This effectively protects you against the impact of a single compute host's hardware defects.
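The guarantee described above can be illustrated with a minimal placement sketch. This is not our actual scheduler and does not use any real API; the function `place_with_anti_affinity`, the group labels and the host names are hypothetical and only show the idea: servers in the same anti-affinity group are never assigned to the same physical host, and the placement fails rather than silently violating that rule.

```python
from collections import defaultdict


def place_with_anti_affinity(servers, hosts):
    """Assign each server to a host, keeping group members apart.

    servers: list of (server_name, group) tuples; group may be None
             for servers without an anti-affinity constraint.
    hosts:   list of host names (hypothetical identifiers).
    Returns a dict mapping server_name -> host.
    Raises RuntimeError if a group has more members than there are hosts.
    """
    used_by_group = defaultdict(set)   # group -> hosts already taken
    load = {host: 0 for host in hosts}  # number of servers per host
    placement = {}

    for name, group in servers:
        # Only hosts not yet used by this server's group are eligible.
        candidates = [
            h for h in hosts
            if group is None or h not in used_by_group[group]
        ]
        if not candidates:
            raise RuntimeError(
                f"no host left for {name!r} without violating "
                f"anti-affinity group {group!r}"
            )
        # Among the eligible hosts, pick the least loaded one.
        host = min(candidates, key=lambda h: load[h])
        placement[name] = host
        load[host] += 1
        if group is not None:
            used_by_group[group].add(host)

    return placement


if __name__ == "__main__":
    demo = [("web1", "web"), ("web2", "web"), ("db1", "db"), ("db2", "db")]
    print(place_with_anti_affinity(demo, ["host-a", "host-b"]))
```

In this sketch, the two web workers and the two database nodes each end up on different hosts, so losing a single host leaves one member of every group running.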
What measures we take to maximize availability
Of course, a server failure is still annoying. That is why we at cloudscale.ch only use systems that are designed to keep running and stay online:
All of our systems are equipped with redundant, hot-swappable power supplies. All physical servers are connected to multiple switches simultaneously and can be administered through a separate out-of-band management network.
In case of a defective compute host, all affected virtual servers are promptly restarted on separate, hot-standby compute hosts. Thanks to our distributed storage cluster based on Ceph, the contents of their disks remain intact. Moreover, with a replication factor of 3, your data is well protected against hardware defects, a risk category which we further reduce by using enterprise-grade SSDs only.
At the level of supplies, we have built in redundancies, too, aiming for the highest availability. For one thing, we can rely on the data center's redundant cooling as well as two power sources, both of them backed by UPS systems and diesel generators. For another, we maintain multiple Internet connections with different upstream providers and a link to the SwissIX Internet Exchange.
Finally, we run our own critical software (e.g. OpenStack components) following the N+1 principle to operate seamlessly through a possible outage.
Why HA is more than just a reliable server
To us, availability means that your servers are running and reachable. Think about what availability means to you – or your users, for that matter. What could go wrong, and how can you prevent or minimize a negative impact? Choose the approach and tools which suit you and your use case best. In the end it is all about being prepared.
For servers that work!
Your cloudscale.ch team
PS: Matching the topic, André Keller of VSHN AG and our CEO Manuel Schweizer will give a presentation addressing "How to increase availability using ExaBGP". For registration and more information on this event taking place in Berne on November 4, 2016, see http://www.swinog.ch/meetings/swinog30/