With this kind of power, companies can respond immediately to customers’ changing needs and overcome traditional limitations across the entire business lifecycle. Manufacturers, for example, might design a sports shoe, a fashion accessory, or a car body using just a laptop, and bring the products to market in just a few days. With the power of data virtualization, companies are limited only by their imaginations.
© Springer-Verlag GmbH Germany 2018
Claudia Linnhoff-Popien, Ralf Schneider and Michael Zaddach (eds.), Digital Marketplaces Unleashed, https://doi.org/10.1007/978-3-662-49275-8_63
63. The Cloud Native Stack: Building Cloud Applications as Google Does
Josef Adersberger1 and Johannes Siedersleben1
(1) QAware GmbH, Munich, Germany
Josef Adersberger (Corresponding author)
Email: [email protected]
Johannes Siedersleben
Email: [email protected]
63.1 Cloud Native Disruption
The term Digitalization disguises the world’s perplexity about the immense success and the hegemony of digital age companies – notably the GAFA gang (Google, Apple, Facebook, Amazon), often called disruptors for having disrupted classical industries such as retail, banking and travel. Other areas like insurance, logistics and mobility will be affected before long. The digital disruptors not only have had bright business ideas and good strategies to grow and monetize, but have also been clever at vastly improving non‐functional properties such as hyperscalability, continuous feature delivery and antifragility, all of them unknown until recently.
Hyperscalability
is the ability to scale up and down in real time as data or traffic vary, even if this happens at exponential rates. Opportunity costs are reduced through high utilization and pay‐as‐you‐go resource consumption. Hyperscale systems scale inductively: adding or removing a single resource to N resources already present requires a constant effort independent of N. Examples of hyperscale systems are Facebook and Apple: Facebook scaled from 1200 Harvard students in 2004 to 100 million users in 2008 and reached two billion users across the globe in 2016. Apple has rolled out Siri, a highly computing‐intensive service, on tens of thousands of nodes, adopted by hundreds of millions of users.
Continuous feature delivery
[1] is all about continuously delivering new features to the customer. Starting from a minimum viable product (MVP), the system is steadily enhanced in small, quick steps. This can be seen as another form of inductive scaling, in terms of features rather than data or traffic. New features are developed by independent teams with little synchronization involved. This requires a suitable software architecture and highly automated post‐development tasks such as acceptance testing, deployment and all kinds of standard operations procedures. Continuous feature delivery presupposes agile development; it is incompatible with whatever looks remotely like a waterfall. Look at Walmart: back in 2012 they were deploying a new version every two months – way too slow when competing with Amazon. Today, in 2016, they manage over 1000 deployments per day, directly triggered by their development teams.
Antifragility
is one of the ultimate aims of software engineering. The term, coined by [2], conveys the idea of systems not only coping with the fact that everything fails all the time, but handling failures gracefully and emerging invigorated from mishaps. Hardware fails, software is buggy. When failure is not an option, we need resilient systems, processes and organizations. Leanness is an important ingredient of antifragility: what isn’t there cannot fail. You are done when there is nothing left to remove. Systems with no accidental complexity left and essential complexity boxed are easily hardened. Twitter used to be notorious for its fail whale, a last‐resort error message presented to end users in case of serious production problems. In 2007 Twitter was down for no less than six days; in 2008 it crashed during Steve Jobs’ keynote at MacWorld. But Twitter has been able to harden its technology successfully: the fail whale has been gone since 2013, with uptime close to 100%.
The GAFAs and other digital disruptors have gathered top engineering talent to commoditize the technology necessary for hyperscale, antifragile systems and for continuous feature delivery. They did the heavy lifting and then released much of their achievements as open source projects to the world. These projects enable everyone to develop systems like those of the GAFAs. This is called GIFEE (Google Infrastructure for Everyone Else) or, more descriptively, the Cloud Native Stack, because the technology targets cloud computing applications. Systems running on top of the Cloud Native Stack are called Cloud Native Applications, and the whole thing (stack and applications) is called Cloud Native Computing.
Cloud Native Computing leverages disruptive digital solutions but requires deep changes in organization, methodology and technology. Organizational and methodological changes raise acceptance barriers, to be overcome gradually by means of pilot projects and small teams exploring the new approach. Let the change grow only if you know what you are doing. Technological change is risky indeed, because the available technology is at least partly immature, the know‐how is restricted to a few experts, and the complexity of Cloud Native Applications is inherently higher than that of conventional ones. So it is again advisable to start with proofs of concept and small pilot projects, and then to move on to suitable building blocks, using well‐tried technology to tame the essential complexity of Cloud Native Applications.
Our view is that of a friendly but critical observer: we describe the cloud native world out there, its features and advantages, the costs and risks. The cloud native world has emerged over the last few years, with no a priori roadmap. We, normal mortals not hired by GAFA, woke up one day and saw a cloud that had arrived. Now, let’s make the best of it, which means: Cloud Native technology enables applications no one would have thought possible a few years ago; billions of users and terabytes of data can now be handled, at least by some types of applications. Cloud Native technology opens the door to a new world. On the other hand, it might or might not be useful for applications meeting standard requirements. This paper contains some hints as to how much Cloud Native technology standard applications need, but it would be foolish to expect any definite answer beyond the obvious It depends.
63.1.1 Organisational Change
The organisational change amounts to rotating the IT by 90 degrees (Fig. 63.1), replacing a sequential pattern with a parallel one. Referring to Gartner’s bimodal IT, the cloud native development paradigm is mode 2. In mode 1, IT is organised along the waterfall process: artefacts go all the way from design and architecture via development and test into production, causing new features to be delivered in stages, say three times per year, which hardly corresponds to the idea of continuous delivery. Mode 2 therefore requires a parallel organisation: several teams work in parallel, largely independently and with little interaction. The infrastructure team manages whatever the feature teams need: professionally run platforms, cross‐sectional tasks, and the build pipeline. Several feature teams each create and run a disjoint set of features. Boundaries between single systems and system landscapes vanish, making room for something new: a vast set of features we call the feature lake, gradually ousting conventional systems.
Fig. 63.1 Mode 1 vs. Mode 2 Organization
63.1.2 Methodological Change
Cloud Native Applications are incompatible with conventional, waterfall‐like methods with at best two or three deliveries per year. They heavily rely on two premises: A software architecture suitable as a basis for meeting non‐functional requirements, the most important being flexibility and extensibility.
A feature‐driven development process, enabling many feature teams to work in parallel, and based on emergent design.
Software architecture is about decomposing systems into cohesive and loosely coupled units: the components and their required and provided interfaces. Components drive how teams are organized: each team is responsible for one or more components. Interfaces drive the interaction of components as well as that of teams. Design decisions present themselves at two levels: macro decisions affect many teams and are taken by a suitable architectural authority; micro decisions affect one team only and are dealt with locally. All decisions are deferred as long as possible; uncertainty is part of the game rather than an undesirable state to be eliminated as quickly as possible. This organisation has been chosen by many teams and, for the time being, seems to be the best one at hand.
63.1.3 Technological Change
A lot of new technology is available, waiting to be understood and tamed. It is important to understand the basic concepts and the anatomy of a cloud native stack and of cloud native applications. There are two fundamental technological concepts: Ops Components and the Data Centre as a Computer. In what follows we describe the facts, the tools available, their features and how they interact, very much like a biologist would describe a particular species: we explain the animal as it has developed very quickly during the last five years or so. It is the result of the accumulated but largely unsynchronized efforts of many clever minds at the GAFAs.
Ops Components
They are called Microservices, Nanoservices [3, 4], self‐contained systems or twelve‐factor apps, but essentially they are all based on the same concept: Ops Components. Ops Components transfer the idea of component‐based software into the realm of operations. Ops Components feature interfaces such as the following (a minimal code sketch follows the list): Lifecycle interface for start, restart and termination (e. g. Docker container).
Remote interface exposing the component’s functionality (often as a REST interface).
Diagnosis interface providing access for tools monitoring metrics, traces, and logs (e. g. collectd and logstash agents).
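To make these three interfaces concrete, here is a minimal Go sketch of our own (not taken from any particular product): a hypothetical “greeter” component that exposes its functionality via REST, offers a simple health endpoint for diagnosis, and reacts to the termination signal a container runtime would send. The endpoint names and the port are illustrative assumptions.

// Minimal sketch of an Ops Component (hypothetical "greeter" service).
// It exposes a remote interface (/greet), a diagnosis interface (/health),
// and honours the lifecycle interface by shutting down gracefully on SIGTERM.
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	mux := http.NewServeMux()

	// Remote interface: the component's functionality, exposed via REST.
	mux.HandleFunc("/greet", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"message":"hello"}`))
	})

	// Diagnosis interface: a simple health endpoint for monitoring tools.
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{"status":"UP"}`))
	})

	srv := &http.Server{Addr: ":8080", Handler: mux}

	// Lifecycle interface: react to SIGTERM/SIGINT with a graceful shutdown.
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, syscall.SIGINT)

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("server failed: %v", err)
		}
	}()

	<-stop
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("graceful shutdown failed: %v", err)
	}
}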
An Ops Component is many things at the same time, namely: A testing unit. It can be tested in isolation, with other Ops Components mocked away or fully integrated.
A release unit. It can be released stand‐alone.
A deployment unit. It can be deployed stand‐alone.
A runtime unit. It can be started and terminated independently. It has a lifecycle of its own.
A scaling unit. Ops Components live as arbitrarily many parallel instances being added and removed on demand.
A transport unit. It can be moved around across nodes.
An Ops Component does not run in a vacuum. It requires the other Ops Components it depends on as well as some special infrastructure like service discovery, an API gateway or a configuration & coordination server. More about this infrastructure as part of the cloud native stack follows in Sect. 63.2.3.
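As a rough illustration (our own sketch, not part of the chapter), the following Go fragment shows a component registering itself with a hypothetical service registry over plain HTTP and resolving a dependency by name. The registry address and its /register and /lookup endpoints are invented for illustration; a real stack would use a dedicated discovery service.

// Illustrative only: an Ops Component registering itself with a hypothetical
// service registry and resolving a dependency. The registry API is invented.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

type instance struct {
	Name    string `json:"name"`
	Address string `json:"address"`
}

// register announces this component's address to the registry.
func register(registryURL string, self instance) error {
	body, err := json.Marshal(self)
	if err != nil {
		return err
	}
	resp, err := http.Post(registryURL+"/register", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// lookup resolves a dependency by its logical name.
func lookup(registryURL, name string) (instance, error) {
	var inst instance
	resp, err := http.Get(fmt.Sprintf("%s/lookup?name=%s", registryURL, name))
	if err != nil {
		return inst, err
	}
	defer resp.Body.Close()
	err = json.NewDecoder(resp.Body).Decode(&inst)
	return inst, err
}

func main() {
	registry := "http://registry:8500" // hypothetical discovery service
	if err := register(registry, instance{Name: "greeter", Address: "10.0.0.7:8080"}); err != nil {
		log.Fatal(err)
	}
	dep, err := lookup(registry, "pricing")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("calling dependency at %s", dep.Address)
}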
Data Centre as a Computer
The idea of the Data Centre as a Computer was introduced by [3]. It can be thought of as an operating system for clusters of up to tens of thousands of nodes rather than for a single one, abstracting away the essential complexity of distributed computing. This is also called a Cluster Operating System [5]. It manages many Ops Components on a cluster, performs standard procedures like deployment, scaling or rollback, and provides drivers for cluster resources such as processors, storage, network or memory. A cluster scheduler distributes Ops Components on suitable nodes, optimising utilisation; a cluster orchestrator is in charge of keeping all Ops Components up and running. Sect. 63.2.3 contains more details.
Cluster resources are treated as cattle rather than pets, a metaphor invented by Bill Baker (Microsoft): pets have individual names and are pampered till death, whereas cattle are identified by numbers, made use of and killed when their time is up. A cluster operating system can also be useful for mode 1 applications, which benefit from automated operations, improved utilisation and availability.
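To make the scheduler’s job tangible, here is a deliberately naive Go sketch of our own: it places each component instance on the node with the most free capacity, the kind of utilisation‐driven placement a real cluster operating system performs at far larger scale and under many more constraints (affinity, storage, network, failure domains). All names and numbers are illustrative.

// Naive illustration of cluster scheduling: place each instance on the node
// with the most remaining capacity.
package main

import "fmt"

type node struct {
	name     string
	freeCPUs int
}

type request struct {
	component string
	cpus      int
}

// schedule picks the node with the most free CPUs that can host the request.
func schedule(nodes []*node, req request) (*node, error) {
	var best *node
	for _, n := range nodes {
		if n.freeCPUs >= req.cpus && (best == nil || n.freeCPUs > best.freeCPUs) {
			best = n
		}
	}
	if best == nil {
		return nil, fmt.Errorf("no node can host %s", req.component)
	}
	best.freeCPUs -= req.cpus // reserve the capacity on the chosen node
	return best, nil
}

func main() {
	cluster := []*node{{"node-1", 16}, {"node-2", 8}, {"node-3", 32}}
	for _, r := range []request{{"greeter", 4}, {"pricing", 8}, {"search", 16}} {
		n, err := schedule(cluster, r)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s -> %s\n", r.component, n.name)
	}
}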
63.2 Applied Cloud Native Computing
In this section we outline how cloud native computing can be applied: what design principles count and how Ops Components can be derived from Dev Components. We then describe the anatomy of the cloud native stack, containing the ops component infrastructure on top of a cluster operating system, and finally report on available open source technology.
63.2.1 Design Principles
When designing cloud native applications on a cloud native stack the following design principles apply:
Design for Performance
Be performant in terms of responsiveness (provide feedback as fast as possible), concurrency (parallelize tasks as much as possible) and efficiency (consume as few resources as possible).
Design for Automation
Automate stereotyped tasks. All processes are cast into code: building, testing, shipping, deployment, running and monitoring.
Design for Resiliency
Tolerate failures by compensating for errors and healing root causes (a minimal sketch follows the list of principles).
Design for Elasticity
Scale dynamically and fast. Detect the need to scale automatically, add or remove instances within seconds, and do so without disturbing the components running in parallel.
Design for Delivery
Focus on short round trip times from coding to production, automate the delivery process and have components loosely coupled.
Design for Diagnosability
Provide a cluster‐wide way to collect, store and analyse the plethora of diagnosis data such as metrics, logs and traces. Focus on lowering the MTTR (Mean Time to Repair).
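As a small illustration of Design for Resiliency, the following Go fragment bounds a remote call with a timeout, retries a few times with exponential backoff and finally degrades gracefully to a fallback answer. It is our own minimal example of error compensation under assumed endpoint names, not a prescription of any particular library.

// Minimal resiliency sketch (illustrative only): bound every remote call by a
// timeout, retry with exponential backoff, and fall back to a default answer.
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"
)

// fetchGreeting performs one attempt against the (assumed) greeter endpoint.
func fetchGreeting(ctx context.Context, url string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// fetchWithRetry compensates for errors: bounded retries, then a fallback.
func fetchWithRetry(url string) string {
	backoff := 100 * time.Millisecond
	for attempt := 0; attempt < 3; attempt++ {
		ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
		greeting, err := fetchGreeting(ctx, url)
		cancel()
		if err == nil {
			return greeting
		}
		time.Sleep(backoff)
		backoff *= 2 // exponential backoff between attempts
	}
	return "hello (cached fallback)" // degrade gracefully instead of failing
}

func main() {
	fmt.Println(fetchWithRetry("http://greeter:8080/greet"))
}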
These principles apply to any cloud application, because the intended non‐functional properties are exactly what cloud computing is about: if performance or elasticity is no issue, why bother building a cloud application? Depending on the requirements, other principles may enter the stage, such as design for security. Anyway, with these design principles in mind we are prepared to discuss how cloud native applications can be designed.
63.2.2 From Dev Components to Ops Components
Software components or Dev Components are units of software design and development. It is well known how to design and implement them using methods like domain‐driven design [6] or component‐based software development. They are often represented as separate build projects with well‐defined dependencies. Dev Components remain what they have always been, unaffected by whatever new technology comes along. They are the rock on which we stand. The question is how to cut well‐known Dev Components into as yet largely unexplored Ops Components.
Fig. 63.2 shows different levels of either type (Dev and Ops), which makes us ask at which level Ops Components are to be carved out. The monolith approach crams the whole system into one big Ops Component, whereas transforming each and every application service into a single Ops Component leads to as many nanoservices. Microservices represent a compromise, with application services suitably grouped into Ops Components. Rigid rules such as “microservices must not exceed 1000 lines of code” (or not fall below, for that matter) are not helpful. Instead we present some arguments for small‐grained Ops Components, followed by others in favour of large‐grained ones. Please note that we are discussing Ops Components as opposed to Dev Components, which are, as we have shown, largely invariant as Ops Components are fused or split.
Fig. 63.2 Levels for cutting out Ops Components
Why should Ops Components be small?
Scaling
Small Ops Components scale easily by multiplication. This saves resources.
High availability
As a rule of thumb with obvious limitations, one can say that small components do less harm when crashing than large ones. Small components are a useful prerequisite for resilience and high availability: if one Ops Component crashes, only this one becomes unavailable and the system as a whole has a chance to survive.
Dependency management
With Ops Components reflecting the Dev Components more or less 1:1, dependencies can be managed better. Cross‐component dependencies are always remote, so no accidental in‐code dependency can occur and all dependencies and interfaces are explicit. A further benefit is that component interfaces receive more attention, as they are remote interfaces and have to be exposed and described explicitly.