Testing our Infrastructure from a User Perspective
When you commission a complex technical solution, you want to be sure that you actually receive what was agreed on. This is why, especially in the IT sector, it is common practice to perform acceptance tests when a solution is handed over to a customer, to ensure that it adheres to the specification. Standardized continuous services, such as cloud services, are in effect handed over to the customer continuously, which is exactly why, at cloudscale.ch, we have developed an "Acceptance Test Suite" and recently released it on GitHub. This gives you an insight into part of our quality assurance process and allows you to see for yourself which standards we set for our cloud infrastructure.
What is tested
We developed the Acceptance Test Suite so that we can see our cloud infrastructure from a customer perspective as completely as possible. These end-to-end tests therefore cover almost every aspect of our cloud offer. For example, they confirm that a single server really can have up to 128 volumes, that servers can be scaled between Flex and Plus flavors, that a Floating IP can be moved between servers, and that jumbo frames can be used in a private network. In doing so, the acceptance tests simulate a power user who uses all the features of our offer. This gives us confidence that our infrastructure also works reliably under intense day-to-day use.
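To make the jumbo-frame check more concrete: one common way to verify that a network path really carries large frames is to send a ping that must not be fragmented and whose payload exactly fills the MTU. The sketch below only computes that payload size. The MTU of 9000 bytes is an assumption for illustration, and the function name is hypothetical rather than taken from the Acceptance Test Suite.

```python
IPV4_HEADER_BYTES = 20  # minimum IPv4 header (no options)
IPV6_HEADER_BYTES = 40  # fixed IPv6 header
ICMP_HEADER_BYTES = 8   # ICMP / ICMPv6 echo header


def max_ping_payload(mtu: int, ipv6: bool = False) -> int:
    """Return the largest ping payload that fits into one unfragmented packet.

    Hypothetical helper for illustration; `mtu` is the IP-layer MTU in bytes.
    """
    ip_header = IPV6_HEADER_BYTES if ipv6 else IPV4_HEADER_BYTES
    payload = mtu - ip_header - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small for an echo request")
    return payload


# Assumed jumbo-frame MTU of 9000 bytes on the private network.
jumbo_payload = max_ping_payload(9000)
```

On Linux, a payload of this size would typically be sent with `ping -M do -s <payload> <address>`, which prohibits fragmentation, so the ping only succeeds if every hop accepts the full frame.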
The acceptance tests use our public API and thus the same technical interface as our CLI and DevOps tools such as Ansible and Terraform. Because this allows the acceptance tests to be fully automated, they can be performed regularly: from GitHub, we run them against our two cloud locations every day, so that we are constantly aware of what our customers see "from the outside". We also use them to test our lab setups every day and to perform targeted test runs before and after major updates. As a complement to our manual verification steps in the lab, the acceptance tests provide additional assurance that planned work on production systems will not have any negative effects for our customers.
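As a rough illustration of what "using the public API" means in practice, the following minimal sketch builds an authenticated request that lists servers. The base URL, the `/servers` path and the bearer-token header are assumptions based on common REST conventions and should be checked against the API documentation. The function name is hypothetical and not part of the Acceptance Test Suite.

```python
import urllib.request

# Assumed base URL of the public API; verify against the API documentation.
API_BASE = "https://api.cloudscale.ch/v1"


def build_list_servers_request(api_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists all servers.

    Hypothetical helper: the token is passed as a bearer token in the
    Authorization header.
    """
    return urllib.request.Request(
        f"{API_BASE}/servers",
        headers={"Authorization": f"Bearer {api_token}"},
        method="GET",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Keeping construction and sending separate makes the sketch easy to inspect without touching any real resources.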
Our S3-compatible Object Storage, which is based on Ceph, is the only component that is completely excluded from the acceptance tests we developed. For it, we use the automated tests that are already publicly available for this open source project.
What the acceptance tests mean for our customers
We use redundancy and extensive monitoring to minimize the negative effects of isolated events, such as hardware defects, for our customers. With the acceptance tests, which simulate a wide range of use cases, we have extended this monitoring so that it also covers cases where all the "cogs" work in isolation but, for some reason, still do not mesh together correctly. Typical examples are configuration errors or version updates that bring slightly modified system behavior with them. Our comprehensive acceptance tests mean that we detect many of these edge cases while still in the lab; regular testing of the production systems then confirms that all features remain available to our customers as usual over the long term.
Despite all these precautions, problems can still occur in unexpected places, causing previously correct system behavior to suddenly disappear. Expanding our Acceptance Test Suite is one way to prevent regressions of this kind in future. As a project that has grown and continues to grow, the acceptance tests are developing together with our cloud offer. It goes without saying that all customers automatically benefit from this institutionalized learning process.
How to see for yourself
The main aim of our acceptance tests is, of course, that you can rely on the documented features of our infrastructure in your day-to-day work. However, we go one step further. On GitHub, you can see the tests we run against our production cloud infrastructure, including the results. Please bear in mind that, because things sometimes go wrong on the Internet, a test may be repeated once before it is assessed as "failed".
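The "repeat once before failing" rule can be sketched as a small decorator. This is only an illustration of the idea under simple assumptions (any exception counts as a failure, and there is no delay between attempts). It is not the mechanism the Acceptance Test Suite actually uses, and the name `retry_once` is hypothetical.

```python
import functools


def retry_once(test_func):
    """Run a test a second time if the first attempt raises an exception.

    Hypothetical helper: the test only counts as failed if both attempts fail.
    """
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        try:
            return test_func(*args, **kwargs)
        except Exception:
            # First attempt failed, possibly due to a transient network issue.
            # Try exactly one more time; a second failure propagates.
            return test_func(*args, **kwargs)
    return wrapper
```

A test decorated with `@retry_once` passes if either attempt succeeds, which filters out one-off network glitches while still reporting problems that persist.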
For everyone who would like to look at this in more detail, we have published the source code of our acceptance tests on GitHub. This enables you to see exactly which tests we perform and how we perform them. If you wish, you can also run the tests against our infrastructure yourself. All you need is a Linux or macOS system with Python (version 3.6 or later) and a cloudscale.ch account (for security reasons, we recommend that you use a separate account that does not contain any production resources).
Our customers depend on their infrastructure working reliably at cloudscale.ch. Regular tests, both in our lab and on the production infrastructure, are a fixed component of our quality assurance process. Releasing our acceptance tests on GitHub gives you a direct insight into this essential tool, against which we measure ourselves every day.
Performing the acid test,
Your cloudscale.ch team