At Alfa we attend a range of technology conferences in order to learn and improve, meet other developers, and share ideas. QCon is a conference focused on innovation and, unlike many of its counterparts, uses an editorial panel to appoint speakers by special invitation only. Alfa attends QCon London every year, and we always come away with new knowledge about the latest computer science research, new architectures and approaches, and lessons learned by other agile teams.
Distributed systems: the new normal
QCon aims to cover topics at the "bleeding-edge for the enterprise", concentrating on those considered to be ‘innovators’ or ‘early adopters’ on the technology adoption curve. Tech conference speakers and thought leaders have been presenting cloud-based, high-performance, scalable, distributed systems for years. Because these have now become the norm for many teams, there was a change of tone from this year's presenters.
As well as pointing to familiar tech industry names like Netflix or Spotify as organisations to follow and learn from, many of this year's QCon talks were about organisations that have successfully run projects to move to cloud systems, or that have adopted a "cloud first" approach. We listened to diverse presenters, from NHS Digital to the team behind BBC iPlayer, and from the Financial Times to challenger banks, about the trade-offs they've made.
If you arrange your architecture into many small services, where each does one thing and does it well, it’s possible to change each service independently. If you can harness this potential, it can be fast and cheap to change your system, and get the benefits of those changes into production quickly. This requires discipline, attention to feedback, and investment in the automated build, test and delivery pipeline that makes those changes safely.
How do you make sure your software is supportable in production? Do you optimise for flexibility and have a single engineering team responsible for dev and ops, or do you have specialists and develop a shared understanding of each service's characteristics? Challenger bank Monzo discussed how they write the majority of their services in Go, concentrate on very small services that are easy for any of their developers to change, and aim for the shortest possible time to get changes into production.
Assembling a large number of services into a distributed application adds complexity – we’re actively working in this space as we evolve our Alfa hosting services. Kubernetes, a project we’re already looking at, has emerged as the container orchestrator of choice, and many presenters demonstrated how they use it to manage sophisticated deployments.
Security > Resilience
In all software, but in cloud and finance systems in particular, security is vital. We heard from teams who embed security expertise alongside developers, and a series of talks on the potential of Blockchain technology reminded us how important this is likely to become.
It was fascinating to gain an insight into how challenger bank Starling has demonstrated to regulators that its highly available, resilient architecture of Java microservices with Docker on Amazon Web Services (AWS) meets the sector’s strict requirements. Many of the tech team there are former Alfa colleagues, and we enjoyed hearing how they prove this architecture every day: they use chaos testing in production to deliberately kill servers, and simulated traffic to observe the system under high load.
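To make the idea of chaos testing concrete, here is a minimal sketch in Python. It is a toy simulation and not Starling's tooling: the `Instance`, `LoadBalancer` and `chaos_kill` names are hypothetical, and a real chaos test would terminate actual servers rather than flip a flag. It shows the principle of killing an instance at random and then checking that requests are still served.

```python
import random


class Instance:
    """A hypothetical service instance that can be killed."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class LoadBalancer:
    """Tries each instance in turn and skips any that are down."""

    def __init__(self, instances):
        self.instances = instances

    def handle(self, request):
        for instance in self.instances:
            try:
                return instance.handle(request)
            except ConnectionError:
                continue
        raise RuntimeError("no healthy instances")


def chaos_kill(instances, rng):
    """Kill one randomly chosen live instance and return it."""
    victim = rng.choice([i for i in instances if i.alive])
    victim.alive = False
    return victim


rng = random.Random(42)  # seeded so the experiment is repeatable
instances = [Instance(f"svc-{n}") for n in range(3)]
balancer = LoadBalancer(instances)

victim = chaos_kill(instances, rng)
# The system should keep serving traffic with one instance gone.
responses = [balancer.handle(f"req-{n}") for n in range(5)]
```

The useful part is the assertion that follows the kill: the experiment only proves resilience if you check that every request still gets a response from a surviving instance.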
Who watches the watchers?
Distributed systems made up of many services are often extremely complex. Having a clear picture of how they are behaving can become difficult, and we watched several teams talk about their experiences with monitoring and telemetry systems.
At the Financial Times, a team responsible for a system made up of more than 300 microservices discussed the changes they’ve made to avoid being overwhelmed by floods of alert emails, sometimes generated when no intervention was needed. Monitoring is a key production requirement; they have standardised health-check APIs for every production service, they test their alerts, and they concentrate on aligning each alert to the relevant business function. This theme was echoed in several conference tracks – whether it’s service up-time or performance benchmarking, it pays to measure the real impact on users and the business.
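As an illustration of a standardised health check with alerts aligned to business functions, here is a small Python sketch. The response shape, the `Check` class and the `businessImpact` field are our own assumptions for the example, not the format the Financial Times team uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Check:
    name: str                  # technical name of the dependency
    business_impact: str       # what users lose if this check fails
    probe: Callable[[], bool]  # returns True when healthy


def run_health_checks(checks: List[Check]) -> Dict:
    """Run every probe and return one report in a common shape."""
    results = []
    for check in checks:
        try:
            ok = bool(check.probe())
        except Exception:
            ok = False  # a probe that crashes counts as unhealthy
        results.append({
            "name": check.name,
            "ok": ok,
            "businessImpact": check.business_impact,
        })
    return {"ok": all(r["ok"] for r in results), "checks": results}


def alerts_for(report: Dict) -> List[str]:
    """Describe each failure in terms of its impact on the business."""
    return [
        f"{c['businessImpact']} (failed check: {c['name']})"
        for c in report["checks"]
        if not c["ok"]
    ]


report = run_health_checks([
    Check("database", "Readers cannot load articles", lambda: True),
    Check("search-index", "Site search returns no results", lambda: False),
])
alerts = alerts_for(report)
```

Because every service reports in the same shape, the alerting code can be shared, and because each check carries its business impact, an alert says what users are losing rather than only which component failed.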
Scaling to meet demand
Services that operate at internet scale can have vast potential reach, and unpredictable, extremely “spiky” demand. The Facebook Live team faced interesting challenges where one celebrity’s live stream was watched by over a million simultaneous users, and employed innovative caching strategies to meet them.
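One widely used way to protect a backend from this kind of spike is request coalescing, where many simultaneous requests for the same item trigger only a single load. The Python sketch below illustrates that general technique under our own assumptions; the `CoalescingCache` class and `slow_loader` function are hypothetical, and we are not claiming this is how the Facebook Live team implemented theirs.

```python
import threading
import time


class CoalescingCache:
    """Cache in which concurrent misses for one key share a single load."""

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self._values = {}
        self._in_flight = {}  # key -> Event set when the load finishes

    def get(self, key):
        with self._lock:
            if key in self._values:
                return self._values[key]
            event = self._in_flight.get(key)
            leader = event is None
            if leader:
                event = threading.Event()
                self._in_flight[key] = event
        if not leader:
            # Another thread is already loading this key: wait, then re-check.
            event.wait()
            return self.get(key)
        try:
            value = self._loader(key)
            with self._lock:
                self._values[key] = value
            return value
        finally:
            with self._lock:
                del self._in_flight[key]
            event.set()  # wake the waiting threads, even if the load failed


calls = []


def slow_loader(key):
    calls.append(key)  # record each trip to the "backend"
    time.sleep(0.05)
    return f"segment:{key}"


cache = CoalescingCache(slow_loader)
results = []
threads = [
    threading.Thread(target=lambda: results.append(cache.get("stream-1")))
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Twenty viewers ask for the same segment at once, but `slow_loader` runs only once; everyone else either waits for that load or reads the cached value.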
As we’ve seen, the value in delivering a distributed system comes from easy-to-change, scalable services, which often require administering servers or fleets of containers. The next step on this arc might be to move to a so-called serverless deployment model, in which the "cloud" is a pure compute-as-a-service resource such as AWS Lambda and you only pay while your code is actually executing. This presents scaling opportunities and cost savings, but also novel problems – how do we test, debug or monitor these lambdas?
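On the testing question, one partial answer is that a Python Lambda handler is an ordinary function taking `event` and `context`, so its logic can be unit-tested locally without deploying anything. The sketch below assumes an API Gateway proxy-style event with a `queryStringParameters` field; the greeting logic and the `name` parameter are invented for illustration.

```python
import json


def handler(event, context):
    """Return a greeting for the optional 'name' query parameter."""
    # queryStringParameters may be missing or None when no query is sent.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}"}),
    }


# Call the handler directly with hand-built events; context is unused here.
response = handler({"queryStringParameters": {"name": "QCon"}}, None)
default_response = handler({"queryStringParameters": None}, None)
```

This only covers the function's own logic. Permissions, timeouts, cold starts and the wiring between services still need testing and monitoring in a deployed environment, which is where the harder problems lie.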
As usual, we had a great time at QCon and we will use what we learned to help us improve our products and services. This year, we’ve picked out many of the sessions that were directly relevant to our cloud hosting offering at Alfa, but we also took away ideas on how to attract and build diverse teams, tips on optimising Java performance, machine learning techniques, and an idea of what the future of the internet of things looks like. See you next year, QCon.