Feb 17, 2020

Breaking your build on SonarQube quality gate failure

We use SonarQube to do a static analysis of our code for code smells, security bugs and more. SonarQube had a great feature in earlier versions: breaking the build if the quality gate of your project is red. A quality gate defines the metrics you want your code to have, for example: no detected bugs, code coverage 80% or more, no TODO comments. SonarQube quality gates help us to follow the rule "don't live with broken windows" [1] in our daily work.

If the quality gate fails, the build breaks, and the developer who made the change immediately notices that something went wrong and fixes the code. I found this feature really helpful, especially when working in a pull-request based manner. If the SonarQube analysis runs on the pull request and the quality gate turns red, the build of that branch fails. As a passing build is a requirement for merging into master, this ensures that no broken code (at least as SonarQube sees it) ever reaches the master branch.

Unfortunately, the SonarQube developers don't agree with me on that. They removed the functionality to break the build from their Maven SonarQube plugin. They later posted a way to fail the build using a webhook, which is a really great solution if you are allowed to install Jenkins plugins. It won't work if you are not allowed to install the plugin or if you don't use Jenkins. In my project we work with GitLab CI, and we wanted a way to fail the build if the SonarQube quality gate is red.

Here at QAware there is the concept of QAlabs - this is a place where you essentially get (paid) time to develop something outside of your project if it benefits the company. I did exactly that: I pitched the idea to the head of QAlabs, got the green light and implemented it.

We open-sourced the code, as we think other people also need a solution for this (just google "SonarQube break build"...). I wrote a Maven plugin to break the build. Integration is easy: all you need is the URL of the SonarQube server and an API token. "But I don't use Maven", I hear you say. Fear not, as we also built a standalone application which you can run in your pipeline. It returns exit code 0 when the quality gate is green and exit code 1 when it's red. You should be able to integrate this into any CI pipeline you use.
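To illustrate the exit-code contract, here is a minimal sketch in Java. It is not the code of the actual build breaker. It assumes the JSON body was fetched from SonarQube's `api/qualitygates/project_status` endpoint, which reports the gate as `"OK"` or `"ERROR"`, and that the project-level `status` field is the first one in the body. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that turns a quality gate response into a process exit code. */
final class QualityGateCheck {
    // Assumption: the first "status" field in the body is the project-level one,
    // e.g. {"projectStatus":{"status":"ERROR","conditions":[...]}}
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    private QualityGateCheck() {
        // static methods only
    }

    /** Returns 0 if the quality gate is green ("OK") and 1 in every other case. */
    static int exitCodeFor(String projectStatusJson) {
        Matcher matcher = STATUS.matcher(projectStatusJson);
        boolean green = matcher.find() && "OK".equals(matcher.group(1));
        return green ? 0 : 1;
    }
}
```

A pipeline step would end with `System.exit(QualityGateCheck.exitCodeFor(body))`. Treating an unreadable response as a failure is a deliberate choice in this sketch, so that a broken check cannot silently let a red gate pass.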

At the end of the year, there is a prize for the best QAlabs project. And guess which project won? The SonarQube Build Breaker! Yeah!

You have a cool idea for a QAlabs project and want to participate, too? We're hiring :)


[1] Andrew Hunt, David Thomas (1999). The Pragmatic Programmer. Addison-Wesley

Nov 15, 2019

A Field Report From the O’Reilly Software Architecture in Berlin

by Susanne Apel, Stephan Klapproth, Andreas Zitzelsberger

Last week, we were at the O’Reilly Software Architecture in Berlin. We showed off our toys at our booth and chatted with the nice people at the conference, Leander Reimer presented a hitchhiker's guide to cloud native API gateways, and we attended the conference talks. In this article we share our key learnings and takeaways with you. Slides or videos are provided where available.

Photo of our booth at the O'Reilly Software Architecture

Cognitive Biases in the Architect's Life - Birgitta Böckeler

[by Susanne Apel] [Video]

I was impressed by how honestly Birgitta spoke about past communications and decisions. She started her presentation by talking openly about how she feels when she receives the feedback to be ‘even more confident’, and about her immediate impulse to explain herself. This keynote was a profound answer that will hopefully get many minds thinking about cognitive biases and about confidence as a concept.
I was happy to see the topic of cognitive biases in the tech world backed by good examples. To give one: a past decision to use framework X should not be judged by its outcome alone, because that would be outcome bias. You cannot know the future of framework X in advance (Reflux in this case). You should be aware of the source of a positive outcome: was it the decision making or was it luck?
Birgitta is very much aware of the difference and encourages all of us to be as well. This leads us to make fewer absolute and more relative statements.

The 3-headed dog: Architecture, Process, Structure - Allen Holub

[by Susanne Apel] [Video]

In addition to the three heads in the talk’s title, Allen also mentioned culture and physical work environment.
In agile teams, the real goal is to quickly add value to the product, and value can only be delivered faster by speeding up feedback cycles.
The teams should be autonomous and empowered to make decisions.
From my point of view, these are the underlying principles of agile software development, regardless of the particular framework used.
Allen pointed out the role of teams and offices and the real meaning of an MVP: a small solution that can be extended (as opposed to a throw-away product), demonstrated with impressive images of actual houses built this way. He emphasized that if you want to change one head of the (by now) five-headed dog, you also have to change all of the other heads.

A CRDT Primer - John Mumm

[by Susanne Apel]

John explained conflict-free replicated data types (CRDT) with a clear motivation and a nice mathematical introduction providing an intuitive grasp of the topic.

From a computer science point of view, the talk seemed very mathematical; from a mathematical point of view, it gave plausible explanations while leaving out the more subtle parts of definitions and proofs. The intuition is sufficient; the validity is proven elsewhere.

John motivated the issue with a Twitter-like application in which the number of likes has to be maintained 'correctly'. This is not a trivial task for a large-scale application with many instances.
For the Twitter likes, assume that you cannot unlike a post once you have liked it. This gives the following implementation:
Each node maintains a local copy of the like counters of every node in the cluster. When the number of likes is requested, the node sums up these counters. If a user likes a tweet, the node 'n' answering the like request increases its own counter. When there 'is time', node 'n' broadcasts (gossips) its view of the cluster. The other nodes compare and, on seeing a higher number of ‘n’-likes, incorporate this number into their own local copy. To be more precise, a node broadcasts its internal state for all nodes, not just its own counter. This makes the broadcasting more efficient, but the principle of distribution just explained stays the same. The nice thing is that the broadcasting works very smoothly, and you do not have to think about the order of events. A user may see old data, but there will be eventual consistency, and their own interactions are always reflected immediately.

Mathematics confirms that this works, also with data types other than counters, provided that they fulfill certain mathematical relations. Roughly speaking, the relations can be put as follows: you need to define lookup, update, merge and compare methods (or variations thereof; the CRDT Wikipedia page provides a good explanation).
If these functions together fulfill certain rules (the states form a monotonic join semi-lattice and the comparison function is a partial order), the lookup value of the data type eventually converges. Broadcasting is part of the very concept of CRDTs. The CRDTs provide the framework for the actual operations to be executed within the cluster.
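The like counter described above can be sketched in a few lines of Java. This is a minimal, single-threaded illustration of a grow-only counter, not code from the talk. The class name `LikeCounter` and its method names are made up for this example, and the gossip transport between nodes is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Grow-only like counter: one slot per node, merged by taking the per-node maximum. */
final class LikeCounter {
    private final String nodeId;
    private final Map<String, Long> likesByNode = new HashMap<>();

    LikeCounter(String nodeId) {
        this.nodeId = nodeId;
    }

    /** update: a node only ever increments its own slot. */
    void like() {
        likesByNode.merge(nodeId, 1L, Long::sum);
    }

    /** lookup: the total is the sum over all slots known locally. */
    long value() {
        return likesByNode.values().stream().mapToLong(Long::longValue).sum();
    }

    /** merge: keep the higher count per node, so the order of gossip messages does not matter. */
    void merge(LikeCounter other) {
        other.likesByNode.forEach((node, likes) -> likesByNode.merge(node, likes, Math::max));
    }

    /** compare: true if every slot of this counter is less than or equal to the same slot of the other. */
    boolean isIncludedIn(LikeCounter other) {
        return likesByNode.entrySet().stream()
                .allMatch(e -> e.getValue() <= other.likesByNode.getOrDefault(e.getKey(), 0L));
    }
}
```

Because the merge takes a maximum per slot, applying the same gossip message twice or in a different order yields the same state. This is what makes the counter converge without any coordination.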

The rise and fall of microservices - Mark Richards

[by Stephan Klapproth] [Presentation]

Mark talked about how microservices are everywhere these days. DDD, continuous delivery, cloud environments, and agility in business and technology were some of the drivers of the rise of microservices. Unfortunately, projects that introduce microservice architectures often struggle with the complexity and overrun their project plans and budgets.

So before jumping on the bandwagon, you have to be aware of the challenges such a highly distributed architecture comes with. In his talk Mark outlined several pitfalls and gave some best practices to stop the decline and fall of microservices.

How do we take architectural decisions in eBay Classifieds Group - Engin Yöyen

[by Stephan Klapproth]

In his talk, Engin presented different approaches to coping with the challenges of a widely distributed team with hundreds of developers, which forced him to rethink the classical role of a software architect.
Consensual high-level architectures, empowering the engineers to lead, architects as supporting enablers (rather than as governance), and techniques like delegation levels and architecture decision records ensured the success of the project at eBay Classifieds Group.

Reactive domain-driven design: From implicit blocking to explicit concurrency - Vaughn Vernon

[by Andreas Zitzelsberger]

Vaughn Vernon took us on an elaborate journey to a reactive domain-driven world. I had two key takeaways:
1. An anemic model, that is, a model consisting only of data types, is not sufficient for a reactive domain-driven world. Instead, state-changing actions should be properly defined: for instance, provide a method Person.changeAddress instead of Person.setStreet, Person.setCity, … Vaughn pressed the point that this is a necessity for effective reactivity.
2. When migrating to reactive microservices, the strangler pattern is an effective approach. Vaughn pointed out two tools that can help to enable reactivity with the strangler approach: Debezium, which turns database changes into events, and Oracle GoldenGate.
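The first takeaway can be illustrated with a small Java sketch. It is an illustration of the idea only, not code from the talk. Apart from the method name changeAddress, all names and the validation rule are assumptions.

```java
import java.util.Objects;

/** A Person that exposes one intention-revealing action instead of separate setters. */
final class Person {
    private String street;
    private String city;

    Person(String street, String city) {
        changeAddress(street, city);
    }

    /** Street and city always change together, so the address is never half-updated. */
    void changeAddress(String newStreet, String newCity) {
        // Validate everything first, then mutate.
        Objects.requireNonNull(newStreet, "street");
        Objects.requireNonNull(newCity, "city");
        this.street = newStreet;
        this.city = newCity;
    }

    String address() {
        return street + ", " + city;
    }
}
```

With setStreet and setCity, an observer could see the new street combined with the old city. A single changeAddress call is also a natural place to emit one "address changed" event.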

Nov 11, 2019

DevOps Enterprise Summit 2019

by Michael Rohleder (division manager at QAware)

The IT transformation towards DevOps currently occupies a great many companies around the world, including our customers BMW, Deutsche Telekom and Allianz. Reason enough to find out at the DevOps Enterprise Summit which trends are on the DevOps community's mind and how companies are faring in their transformation. The conference is hosted by Gene Kim, founder of IT Revolution. He is known as the author of several successful books, such as The Phoenix Project, The DevOps Handbook and Accelerate. His brand-new book The Unicorn Project was heavily promoted at the conference - just as you would expect in the USA.

This article summarizes my personal impressions of the DevOps Enterprise Summit in Las Vegas and provides links to talks and further reading.

The three days of the conference were marked by many vivid and impressive experience reports on the IT initiatives of large companies. CSG showed how, over several years, they modernized their outdated mainframe IT landscape and made it fit for the future with DevOps practices. Walmart showed how they implemented their critical and difficult use case of checking item availability across their broad system landscape. The basis was a switch from a synchronous message architecture to an event-oriented message architecture. Many other initiatives were presented by executives of companies such as Adidas, John Deere, Optum and many more.

I was particularly pleased with the talk by our customer BMW. Ralf Waltram and Frank Ramsak presented their 100% Agile Journey at BMW. It is wonderful to see how our customer is evolving in IT. I also had the impression that the audience was very impressed by the story.

Psychological Safety

A company's success depends more and more on how safe and comfortable its employees feel. This was underlined in many talks as well as in panel discussions among executives. "Psychological Safety" is a term the community has long named as a success factor. It is no surprise that it is also a topic in Gene Kim's new book "The Unicorn Project", in which he speaks of "The Five Ideals":
  • The First Ideal - Locality and Simplicity
  • The Second Ideal - Focus, Flow, and Joy
  • The Third Ideal - Improvement of Daily Work
  • The Fourth Ideal - Psychological Safety
  • The Fifth Ideal - Customer Focus

Deloitte also advertises with the slogan "better value sooner safer happier" by Jonathan Smart, who in his talk Risk and Control is Dead, Long Live Risk and Control impressively explained how important "Psychological Safety" is for dealing appropriately with risks in a company.


The Accelerate State of DevOps Report represents the research results and data of more than 31,000 survey participants worldwide. It shows which DevOps practices and methods lead to higher software delivery and operational (SDO) performance. Dr. Nicole Forsgren, who is responsible for the report, presented the new results and findings at the summit and showed what distinguishes "DevOps Elite Performers" from "Low Performers". The basis is exactly four metrics: "lead time", "deployment frequency", "mean time to restore (MTTR)" and "change fail percentage". In April of this year, Thoughtworks moved these four metrics from "Trial" to "Adopt" on its Technology Radar, which amounts to a recommendation to use this technique. The book Accelerate explains the results of the report and the scientific approach in more detail. The State of DevOps Report is available from Google, together with helpful descriptions of DevOps practices and methods that are meant to help the community implement DevOps in their companies.

Another find at the conference for me was the Open Practice Library, a collection of current DevOps tools and practices that is created by the community itself.

The "Project to Product" movement

Moving the IT organization from a project-oriented to a product-oriented approach is an increasingly common building block in the IT transformation of many companies. Besides the companies' experience reports, there were also exciting talks on the topic. In his talk Project to Product, Mik Kersten presented his concept of the Flow Framework™: a new approach intended to let companies measure the "flow of business value" in the software delivery process in a way that both IT and the business side can understand. More information can be found in his book Project to Product.

Dominica DeGrandis tried to help answer the question "Do You Have the Right Teams to Manage Work by Product?". She showed why "Full Stack Teams" are needed rather than "Full Stack Engineers", which new roles should be considered in a product-oriented organization and how to deal with the team skill set.

Dominica is also known as the author of the book Making Work Visible, in which she identifies five time thieves in software development and explains how to stop them.

The slides and video recordings of the DevOps Enterprise Summit have, happily, been made publicly available.

These were three interesting and exciting days in the USA, where the spirit and the enormous progress of the DevOps movement could be felt in the talks and panel discussions as well as in conversations with other attendees. To top it off, I got a nice souvenir to take back to Germany: a first edition of the book "The Unicorn Project", a gift from host and author Gene Kim, with a personal dedication. Thank you very much!

Mar 11, 2019

How to dispatch flux to worker in Reactor

This post shows how to dispatch a flux of items to services of separate functional domains when using Reactor in Java. The author encountered this problem while developing a larger reactive application, where a strict separation of the application's domains is key to maintaining a clean architecture.
Reactor is a library for developing reactive applications, and its reference guide is a good read for understanding the basic principles of reactive programming.
The examples the author found for Reactor, or for other reactive libraries, show how to deal with a flux of items without mentioning how to dispatch and share this flux among separate functional domains. When mapping a flux to one functional domain, there is no obvious way to obtain the original flux to map it to another functional domain. In the following, an example details the problem and presents a solution for the Reactor library.

An example application

This section introduces an example application which will later be transformed into a reactive one. It dispatches some deletion tasks to independent services, which is a common feature of larger software systems.
A customer is represented by (the usual Java boilerplate such as getters, setters, equals, hashCode and toString is omitted)
public class Customer {
    CustomerId id;
    AccountId account;
    Set<InvoiceId> invoices;
}
A customer has its own account and a set of associated invoices. The classes CustomerId, AccountId and InvoiceId are simple wrapper classes that uniquely identify the corresponding entities.
A service supposed to delete a set of customers has the interface
public interface CustomerService {
    void deleteCustomers(Set<CustomerId> customerIds);
}
An implementation of CustomerService should take care of deleting the account and the invoices as well.
public class CustomerServiceImpl {
    public void deleteCustomers(Set<CustomerId> customerIds) {
        Set<Customer> deletedCustomers = customerRepository.deleteCustomersByIds(customerIds);
        Set<AccountId> toBeDeletedAccounts = deletedCustomers.stream()
                .map(Customer::getAccount).collect(Collectors.toSet());
        accountService.deleteAccounts(toBeDeletedAccounts);
        Set<InvoiceId> toBeDeletedInvoices = deletedCustomers.stream()
                .flatMap(customer -> customer.getInvoices().stream())
                .collect(Collectors.toSet());
        invoiceService.deleteInvoices(toBeDeletedInvoices);
    }
}
The deletion of the customers itself is delegated to an underlying customerRepository, which returns a collection of the deleted customers for further processing (this "find and delete" pattern is common for NoSQL databases, such as MongoDB).
Furthermore, the deletion of the associated accounts and invoices is delegated to the respective accountService and invoiceService, which have the following interfaces:
public interface AccountService {
    void deleteAccounts(Set<AccountId> accountIds);
}

public interface InvoiceService {
    void deleteInvoices(Set<InvoiceId> invoiceIds);
}
Note that this example application has clearly separated domains, which are the customers, the invoices and the accounts.

Reactive interfaces

Turning the service interfaces into reactive services is straightforward:
public interface ReactiveAccountService {
    Mono<Void> deleteAccounts(Flux<AccountId> accountIds);
}

public interface ReactiveInvoiceService {
    Mono<Void> deleteInvoices(Flux<InvoiceId> invoiceIds);
}

public interface ReactiveCustomerRepository {
    Flux<Customer> deleteCustomersByIds(Set<CustomerId> customerIds);
}

public interface ReactiveCustomerService {
    Mono<Void> deleteCustomers(Set<CustomerId> customerIds);
}
Note that returning a Mono<Void> is the reactive way of telling the caller that the requested operation has completed (with or without errors). Also note that the input to the ReactiveCustomerRepository stays non-reactive, as we want to focus on the reactive implementation of the CustomerService in combination with ReactiveAccountService and ReactiveInvoiceService.

Reactive implementation

A first attempt

A first attempt to implement CustomerService reactively could lead to the following code
public Mono<Void> deleteCustomers(Set<CustomerId> customerIds) {
    Flux<Customer> deletedCustomers = reactiveCustomerRepository.deleteCustomersByIds(customerIds);

    Flux<AccountId> toBeDeletedAccounts = deletedCustomers
            .map(Customer::getAccount);
    Mono<Void> accountsDeleted = reactiveAccountService.deleteAccounts(toBeDeletedAccounts);

    Flux<InvoiceId> toBeDeletedInvoices = deletedCustomers
            .flatMap(customer -> Flux.fromIterable(customer.getInvoices()));
    Mono<Void> invoicesDeleted = reactiveInvoiceService.deleteInvoices(toBeDeletedInvoices);

    return Flux.merge(accountsDeleted, invoicesDeleted).then();
}
However, when using the following dummy implementation for the reactiveCustomerRepository,
public Flux<Customer> deleteCustomersByIds(Set<CustomerId> customerIds) {
    Flux<Integer> generatedNumbers = Flux.generate(
            () -> 0,
            (state, sink) -> {
                System.out.println("Generating " + state);
                // emit the current number and complete after the last customer
                sink.next(state);
                if (state == customerIds.size() - 1)
                    sink.complete();
                return state + 1;
            });
    return generatedNumbers
            .doOnSubscribe(subscription -> {
                System.out.println("Subscribed to repository source");
            })
            .map(i -> {
                CustomerId id = new CustomerId("Customer " + i);
                return createDummyCustomerFromId(id);
            });
}
the following output is obtained:
Subscribed to repository source
Generating 0
Deleting account AccountId[id=Account CustomerId[id=Customer 0]]
Generating 1
Deleting account AccountId[id=Account CustomerId[id=Customer 1]]
Generating 2
Deleting account AccountId[id=Account CustomerId[id=Customer 2]]
Subscribed to repository source
Generating 0
Deleting invoice InvoiceId[id=Invoice CustomerId[id=Customer 0]]
Generating 1
Deleting invoice InvoiceId[id=Invoice CustomerId[id=Customer 1]]
Generating 2
Deleting invoice InvoiceId[id=Invoice CustomerId[id=Customer 2]]
This might be surprising, as the reactiveCustomerRepository is asked twice to generate the customers. If the repository weren’t a dummy implementation here, the account deletion would have consumed all those deletedCustomers, and the subsequent invoice deletion would have worked on a completed stream (meaning it would do nothing at all). This is certainly undesired behavior.

Handling multiple subscribers

The reference documentation has an answer to this problem: broadcasting to multiple subscribers with .publish(). The failing attempt should thus be modified as follows
public Mono<Void> deleteCustomers(Set<CustomerId> customerIds) {
    Flux<Customer> deletedCustomers = reactiveCustomerRepository.deleteCustomersByIds(customerIds);

    deletedCustomers = deletedCustomers.publish().autoConnect(2);
    Flux<AccountId> toBeDeletedAccounts = deletedCustomers
            .map(Customer::getAccount);
    Mono<Void> accountsDeleted = reactiveAccountService.deleteAccounts(toBeDeletedAccounts);
    deletedCustomers = Flux.merge(deletedCustomers, accountsDeleted).map(customer -> (Customer)customer);

    deletedCustomers = deletedCustomers.publish().autoConnect(2);
    Flux<InvoiceId> toBeDeletedInvoices = deletedCustomers
            .flatMap(customer -> Flux.fromIterable(customer.getInvoices()));
    Mono<Void> invoicesDeleted = reactiveInvoiceService.deleteInvoices(toBeDeletedInvoices);
    deletedCustomers = Flux.merge(deletedCustomers, invoicesDeleted).map(customer -> (Customer)customer);

    return deletedCustomers.then();
}
As .autoConnect(2) is used, the subscription to the repository publisher only happens once two subscriptions have happened downstream. This requires the reactiveAccountService and reactiveInvoiceService to return a Mono<Void> which completes once the given input flux is consumed completely, which ensures one subscription. The second subscription is achieved by merging the output with the original input flux.
The output is then as expected
Subscribed to repository source
Generating 0
Deleting invoice InvoiceId[id=Invoice CustomerId[id=Customer 0]]
Deleting account AccountId[id=Account CustomerId[id=Customer 0]]
Generating 1
Deleting invoice InvoiceId[id=Invoice CustomerId[id=Customer 1]]
Deleting account AccountId[id=Account CustomerId[id=Customer 1]]
Generating 2
Deleting invoice InvoiceId[id=Invoice CustomerId[id=Customer 2]]
Deleting account AccountId[id=Account CustomerId[id=Customer 2]]
At this point, the reactiveAccountService and reactiveInvoiceService could now also decide to .buffer their own given flux if they wanted to delete the given accounts or invoices in batch. Each implementation is free to choose a different buffer (or batch) size on its own. This is an advantage over the non-reactive implementation, where all items have been collected in one large list beforehand and are then given in bulk to the accountService and invoiceService.
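The difference between the first attempt and the working solution can be modelled in plain Java without Reactor. The following sketch is an analogy only: `ColdSource` and `SharedSource` are hypothetical classes, not Reactor types, and they ignore backpressure, asynchrony, completion and error signals. They only show why subscribing twice to a cold source runs it twice, while waiting for two subscribers before connecting runs it once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Stand-in for a cold source: every subscribe() call runs the generator again. */
final class ColdSource {
    private final int size;
    private int subscriptions = 0;

    ColdSource(int size) {
        this.size = size;
    }

    void subscribe(Consumer<Integer> subscriber) {
        subscriptions++;
        for (int i = 0; i < size; i++) {
            subscriber.accept(i);
        }
    }

    int subscriptions() {
        return subscriptions;
    }
}

/** Models the idea of publish().autoConnect(n): connect upstream once the n-th subscriber arrives. */
final class SharedSource {
    private final ColdSource upstream;
    private final int minSubscribers;
    private final List<Consumer<Integer>> subscribers = new ArrayList<>();

    SharedSource(ColdSource upstream, int minSubscribers) {
        this.upstream = upstream;
        this.minSubscribers = minSubscribers;
    }

    void subscribe(Consumer<Integer> subscriber) {
        subscribers.add(subscriber);
        if (subscribers.size() == minSubscribers) {
            // Single upstream subscription; each item is fanned out to all subscribers.
            upstream.subscribe(item -> subscribers.forEach(s -> s.accept(item)));
        }
    }
}
```

In this model the first subscriber receives nothing until the second one has registered. This mirrors why the working solution needs both the worker and the merged output to subscribe before any customer is deleted.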

Introducing a utility method

The above working solution has already been written such that a generic utility method can be extracted
public class ReactiveUtil {
    private ReactiveUtil() {
        // static methods only
    }

    public static <T> Flux<T> dispatchToWorker(Flux<T> input, Function<Flux<T>, Mono<Void>> worker) {
        Flux<T> splitFlux = input.publish().autoConnect(2);
        Mono<Void> workerResult = worker.apply(splitFlux);
        return Flux.mergeDelayError(Queues.XS_BUFFER_SIZE, workerResult, splitFlux)
                .map(ReactiveUtil::uncheckedCast);
    }

    @SuppressWarnings("unchecked")
    private static <T> T uncheckedCast(Object o) {
        return (T)o;
    }
}
Instead of Flux.merge, Flux.mergeDelayError is used, which handles the situation better if the worker completes with an error. In this particular use case, it’s desired that deletion continues even if one worker fails. The worker is also expected to return a Mono<Void> which completes once the input flux is consumed. The simplest worker function would thus be Flux::then.
The unchecked cast could not be removed, but in this circumstance it should never fail: the merged flux can only contain items of type T, because the Mono<Void> just completes with no items at all.
A usage example in a more reactive style of coding would be
return reactiveCustomerRepository.deleteCustomersByIds(customerIds)
        .transform(deletedCustomers -> ReactiveUtil.dispatchToWorker(
                deletedCustomers,
                workerFlux -> {
                    Flux<AccountId> toBeDeletedAccounts = workerFlux
                            .map(Customer::getAccount);
                    return reactiveAccountService.deleteAccounts(toBeDeletedAccounts);
                }))
        .transform(deletedCustomers -> ReactiveUtil.dispatchToWorker(
                deletedCustomers,
                workerFlux -> {
                    Flux<InvoiceId> toBeDeletedInvoices = workerFlux
                            .flatMap(customer -> Flux.fromIterable(customer.getInvoices()));
                    return reactiveInvoiceService.deleteInvoices(toBeDeletedInvoices);
                }))
        .then();
Note the pattern of using .transform together with the utility function. The output is the same as the working example above.


Reactive applications should still follow the overall architecture of larger applications, which are usually split into components, one per functional domain. This approach clashes with reactive programming, where usually one stream is mapped with operators and dispatching work to other services is not easily supported. This post shows a solution, although using the presented utility function in Java is still somewhat clumsy.
In Kotlin, extension functions would make this utility easier to use without the rather clumsy .transform pattern above.
It also remains an open question whether there is a better solution for the presented problem. Comments welcome!

Jul 12, 2018

Virtual Kubelet - Run pods without nodes

During my recent visit to ContainerDays 2018 in Hamburg (June 19-20, 2018), I attended an interesting talk by Ria Bhatia from Microsoft about Virtual Kubelet.

Virtual Kubelet is an open source Kubernetes Kubelet implementation that allows you to run Kubernetes pods without having to manage nodes with enough capacity to run the pods.

In classical Kubernetes setups, the Kubelet is an agent running on each node of the Kubernetes cluster. The Kubelet provides an API for managing the pod lifecycle. After a Kubelet has launched, it registers itself as a node with the Kubernetes API Server. The node is then known within the cluster, and the Kubernetes scheduler can assign pods to the new node.

Especially in environments with volatile workloads, managing a Kubernetes cluster means providing the right number of nodes over time. Adding nodes just in time is often not an option, since spinning up a new node simply takes too much time. Thus, operators are forced into running and paying for additional nodes to handle load spikes.

The Virtual Kubelet project addresses such operational hardships by introducing an application that masquerades as a Kubelet. Just like a normal Kubelet, the Virtual Kubelet registers with the Kubernetes API Server as a node and provides the Kubelet API to manage pod lifecycles. Instead of interacting with the container runtime on a host, the Virtual Kubelet uses serverless container platforms like Azure Container Instances, Fargate or Hyper.sh to run the pods.

Image Source: https://github.com/virtual-kubelet/virtual-kubelet

Using these services via the Virtual Kubelet allows you to run containers within seconds and to pay for them per second of use, while still having the Kubernetes capabilities for orchestrating them.

The interaction between the Virtual Kubelet and external services is abstracted by a provider interface. Implementing it makes it possible to bind other external services for running pods.

The project is still in an early state and currently not ready for use in production. However, it’s a very interesting link between container orchestration platforms and serverless platforms and has numerous use cases.

Jul 11, 2018

ContainerDays 2018: Top talks on conference day 2


I attended ContainerDays 2018 in the Hamburg Hafenmuseum. It was a very cool location amid containers (the real ones), cranes and big freight ships. There were three stages; the coolest one was definitely in the belly of an old ship. I'll write about the talks I attended and give a short summary of each. You can find a list of all talks here: https://containerdays.io/program/conference-day-2.html. Videos are coming soon - I'll edit this post when they are available.

Update: Here they are: https://www.youtube.com/playlist?list=PLHhKcdBlprMcVSL5OmlzyUrFtc7ib1V4w


One Cluster to Rule Them All - Multi-Tenancy for 1k Users

Lutz Behnke from the HAW Hamburg (Hamburg University of Applied Sciences) talked about running a private cloud at a university. Their requirements for a cloud are somewhat different from what you see in the private sector: sometimes they just need a small web server to serve a static web page for a student, sometimes they need loads of GPUs for a compute-heavy research project. All of that should be easily available to more than 1000 students, who are reluctant to read documentation.
They had to build it from the ground up, as universities mostly can’t use AWS, Google Cloud etc. The first version was based on VMware, but it was scrapped quickly: students overestimated their resource requirements and requested quad-core CPUs with loads of memory just to serve the small web application they needed for their network course. After the course was done, no one released the resources. And of course, no student ever applied security patches to these eternally running virtual machines.
The second version of the private cloud is based on Kubernetes (k8s). In theory, k8s should support multi-tenancy, but everyone understands that concept a little differently. The HAW needed LDAP authentication in k8s, so they built a small tool called kubelogin, which authenticates against an LDAP server. Authorization is managed via GitLab groups, and they built a tool which syncs these GitLab groups back into k8s. Rook.io is used for distributed storage.
They have already solved many problems, and the solutions have been fed back into the Multitenancy Working Group. But some problems are still unsolved: How to handle the logs from 2000+ nodes? How to share GPU nodes? They also ran into problems with etcd - the default of 2 GB of storage space is too little when you have high pod churn.
One lesson learned: even when every one of your nodes is cattle, etcd is your pet. They published all their work on their website http://hypathia.net/en/.


Lightning Talk: Gitkube: Continuous Deployment to Kubernetes using Git Push

Shahidh K. Muhammed from Hasura talked about Gitkube. He isn’t happy with the current deployment flow when using k8s and wants something similar to Heroku, where you just push code to a git remote and the system does the rest: compiling, packaging, deploying. He showed us Gitkube (https://github.com/hasura/gitkube), which works by using git hooks. When you push to the git remote, a special worker in the k8s cluster builds, packages and deploys the code. He showed the whole setup in a live demo. Very cool!


Distributed Microservices Metrics and Tracing with Istio and OpenCensus

Sandeep Dinesh from Google talked about the microservice hype, metrics, tracing and Istio. Turns out that microservices, for all their glory, have downsides: They increase the complexity of infrastructure and development, and they introduce more latency. Tracing a request through multiple services and collecting metrics also get a lot harder.
For distributed tracing, you need in essence: a trace id generator, a way to pass the trace id to downstream services, a way to send spans (service and method calls) to a collector, and finally some processing of these traces and spans in the collector. An example of such a processing tool is Zipkin (https://zipkin.io/). As you don’t want vendor lock-in to one tracing tool, Google created a new initiative called OpenCensus (https://opencensus.io/). This decouples the implementation (e.g. Zipkin) from the API against which the service is compiled.
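To make these four building blocks a bit more tangible, here is a minimal in-memory sketch in Python. It is my own illustration, not code from the talk and not the OpenCensus or Zipkin API: the header name `x-trace-id` and the names `Span`, `Collector` and `handle` are assumptions made up for this example.

```python
import uuid
from dataclasses import dataclass

# Hypothetical header name, chosen for this sketch only.
TRACE_HEADER = "x-trace-id"


@dataclass
class Span:
    """One unit of work (a service or method call) belonging to a trace."""
    trace_id: str
    service: str
    operation: str


class Collector:
    """Stands in for a tracing backend; it only stores and groups spans."""

    def __init__(self):
        self.spans = []

    def report(self, span):
        self.spans.append(span)

    def by_trace(self, trace_id):
        return [s for s in self.spans if s.trace_id == trace_id]


def new_trace_id():
    """Trace id generator: a random 32-character hex string."""
    return uuid.uuid4().hex


def handle(service, operation, headers, collector, downstream=()):
    """Simulate a service handling a request and calling other services."""
    # Reuse the incoming trace id, or create one if this is the first hop.
    trace_id = headers.get(TRACE_HEADER) or new_trace_id()
    # Send a span describing this call to the collector.
    collector.report(Span(trace_id, service, operation))
    # Pass the trace id on to every downstream service.
    for next_service, next_operation in downstream:
        handle(next_service, next_operation, {TRACE_HEADER: trace_id}, collector)
    return trace_id


collector = Collector()
trace_id = handle(
    "frontend", "GET /order", {}, collector,
    downstream=[("orders", "load_order"), ("billing", "charge")],
)
for span in collector.by_trace(trace_id):
    print(span.service, span.operation)
```

All three spans end up under the same trace id, which is what allows the collector to stitch them back together into one request.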
Istio, which uses sidecars to instrument and trace services on k8s, also supports OpenCensus. Istio takes care, for example, of monitoring the incoming and outgoing traffic. It also creates the trace id if none is available. As Istio can’t look inside the k8s service, you need to call the OpenCensus API to create the spans. Istio then merges the spans from OpenCensus with its own observed behavior and reports the result to the collector.
Sandeep showed all of that in a live demo and also emphasized that the whole stack is in early development and should not be used in production.


Applying (D)DDD and CQ(R)S to Cloud Architectures

Benjamin Nothdurft talked about domain-driven design (DDD) and high-level architecture. He explained a technique to find the bounded contexts of your domain and gave an introduction to CQRS.
CQRS essentially means splitting your model into two: one for querying, one for updating. In the software Benjamin presented, they used JPA and a relational database for the update model and an Elasticsearch instance for the querying model. They also split the service into two, one for updating, one for querying. When the data is updated, the update event is put in a queue. The querying service processes the events from the queue and applies the updates to the querying model.
This split, of course, complicates the whole system and makes sense when you have an asymmetric load – in this case the querying side had to be scaled independently from the updating side.
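The flow described above can be sketched in a few lines of Python. This is only my own toy version, not the software from the talk: plain dictionaries stand in for the relational database and the search index, a `deque` stands in for the queue, and all class and field names are made up.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductUpdated:
    """Event published by the update side after every write."""
    product_id: int
    name: str
    price: float


class UpdateService:
    """Write side: owns the update model and publishes update events."""

    def __init__(self, queue):
        self._store = {}  # stands in for the relational database
        self._queue = queue

    def save(self, product_id, name, price):
        self._store[product_id] = (name, price)
        self._queue.append(ProductUpdated(product_id, name, price))


class QueryService:
    """Read side: builds its own model purely from the events."""

    def __init__(self, queue):
        self._index = {}  # stands in for the search index
        self._queue = queue

    def process_events(self):
        # Apply all pending update events to the querying model.
        while self._queue:
            event = self._queue.popleft()
            self._index[event.product_id] = {"name": event.name, "price": event.price}

    def search(self, term):
        return [doc for doc in self._index.values()
                if term.lower() in doc["name"].lower()]


queue = deque()
writer = UpdateService(queue)
reader = QueryService(queue)

writer.save(1, "Blue Container", 10.0)
print(reader.search("container"))  # the event has not been processed yet
reader.process_events()
print(reader.search("container"))  # now the querying model has caught up
```

The first search comes back empty: the querying model lags behind until the event has been processed. That delay is part of the extra complexity you buy with this architecture.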


Containers on billions of devices

Alexander Sack from Pantacor talked about containers on devices. In his talk, a device can be a router, a drone, a tablet etc. He excluded the smaller things, like embedded sensors or actuators.
The way these devices are built today is as follows: Design the hardware and the firmware, then develop the software, assemble the hardware in the factory, put the software on it and then never touch it until end of life. One of the things lost this way is security. A better way would be to pick general stock hardware and peripherals, assemble them in a chassis and put some general-purpose software on it. The device-specific software can then be developed and updated even after the device has been shipped to the customer.
One way to do that is - you guessed it - with containers! There are already some products which do this, for example resin.io using Docker containers. The problem with Docker containers is that they are really heavy on the disk and not that suitable for smaller devices with, say, 8 MB of flash space.
Pantacor developed a solution which is completely open source. They package the whole system in containers, with a small hypervisor to orchestrate the containers and to update the base system. They use LXC containers under the hood, which lowers the space consumption. PantaHub (https://www.pantahub.com/) is their UI to manage the devices.


Secret Management with Hashicorp's Vault

Daniel Bornkessel from innoQ talked about Vault (https://www.vaultproject.io/). Vault manages application secrets like encryption keys, database credentials and more. It also does credential rolling, revocation of credentials and auditing.
He explained the architecture and concepts of Vault: A client authenticates to the Vault server, and Vault then uses a connector to read (or generate) the secret (e.g. the database credentials). The authentication is pluggable and supports multiple plugins, like hard-coded secrets, k8s security or AWS credentials.
Vault also supports generating credentials on the fly. It can, for example, log into your PostgreSQL database, create a user name and password on the fly and hand them to the client. When the client authentication expires, it cleans up the user in PostgreSQL. That way there are no hard-coded passwords. This is definitely a big win for security!
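To illustrate the idea behind these on-the-fly credentials, here is a small toy model in Python. It does not use Vault or its API, and it does not talk to a real PostgreSQL: `FakeDatabase`, `DynamicSecrets` and `Lease` are hypothetical names I made up, and the time is passed in explicitly to keep the example deterministic.

```python
import secrets
from dataclasses import dataclass


class FakeDatabase:
    """Stands in for PostgreSQL; it only remembers which users exist."""

    def __init__(self):
        self.users = {}

    def create_user(self, name, password):
        self.users[name] = password

    def drop_user(self, name):
        self.users.pop(name, None)


@dataclass
class Lease:
    """Short-lived credentials handed to a client."""
    username: str
    password: str
    expires_at: float


class DynamicSecrets:
    """Toy secret broker: creates database users on demand and removes them on expiry."""

    def __init__(self, database, ttl_seconds):
        self._db = database
        self._ttl = ttl_seconds
        self._leases = []

    def issue(self, now):
        # Generate a fresh user name and password for every request.
        username = "app-" + secrets.token_hex(4)
        password = secrets.token_urlsafe(16)
        self._db.create_user(username, password)
        lease = Lease(username, password, now + self._ttl)
        self._leases.append(lease)
        return lease

    def revoke_expired(self, now):
        # Drop the database user of every lease that has run out.
        still_valid = []
        for lease in self._leases:
            if lease.expires_at <= now:
                self._db.drop_user(lease.username)
            else:
                still_valid.append(lease)
        self._leases = still_valid


db = FakeDatabase()
broker = DynamicSecrets(db, ttl_seconds=60)
lease = broker.issue(now=0)
print(lease.username in db.users)  # the user exists while the lease is valid
broker.revoke_expired(now=60)
print(lease.username in db.users)  # the user is gone after expiry
```

The point of the sketch is the lifecycle: the application never sees a long-lived password, and a leaked credential stops working once its lease has expired.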

Jun 14, 2018

Impressions from SEACON 2018 - Part 2

by Harald Störrle

"Domain Driven Design" and "Taylorism"

Henning Schwentner (wps solutions GmbH) presented the concepts behind Domain Driven Design (DDD, see [2,6,8] for general references, and [7] for the slides of the talk). The general idea behind DDD is to structure applications vertically rather than horizontally into domains: Design small, self-contained portions of an application domain rather than attempt to get (only?) the big picture. It doesn’t stop there, though: The domain structure ought to be established, says Schwentner, not just in the design (aka models), but likewise in the architecture, code structure, organisation structure, and tooling (e.g. repositories). The “Design” in DDD refers to domain-level models (mostly conceptual, it appears) that constitute the ontology of a (sub-)domain and make it possible to define the boundaries ("Bounded Context") which are reflected in the interfaces at code level.

At first hearing, DDD reminds me a lot of the Role-Modeling approaches of the late 1990s [1,3,4] (then absorbed into UML), or the Business Objects from the early 1990s [5], or, even earlier, of the vision and promise of OO technology in general: closing the “semantic gap” between application and technology. Of course, DDD offers modern(ized) terminology, and there certainly has been a lot of technical progress since the early days of OO, but the idea is not as new as it might seem… Still, it is a good idea, and it easily survives being renamed, rephrased, and repackaged (again). Maybe, this time around, we will finally see the convergence of application needs and technology opportunities.

Obviously, structuring organisations vertically is all the rage today. The main benefit is the increased agility of small-scale teams, hopefully without losing the capability to tackle large-scale problems, or maybe even upgrading organisational capabilities from solving complicated to solving complex problems, never mind wicked problems. Clearly, introducing proper modules into Java 9 is an important contribution towards this goal. And it makes perfect sense to me to bet on this one, even though “module” is not quite a brand new concept either… better late than never. I remain cautious, though, since vertical structures have downsides, too (ever heard the term “information silo”?). And I can’t see the reasons for having horizontal structures really going away for good (synergy, reuse, integration).

Having said that, I do like the idea of starting at an (elevated) level of abstraction. In my experience, this is difficult enough at the level of models, let alone code. What I find truly interesting, though, is the breadth and prominence that the social or organisational perspective has gained in IT conferences. A side topic in Schwentner's talk, it took center stage in a talk by Frank Düsterbeck. He spoke about leadership in learning organisations ("Taylor ist tot, es lebe der Mensch – Führung in der lernenden Organisation"; "Taylor is dead, long live the human - leadership in learning organisations"). He pointed out that there are, in fact, two types of problems:
  • Complicated problems can be tackled by applying diligence, systematic procedure and delegation. Such problems can be solved by mechanical steps in the end.
  • Complex problems, on the other hand, are by definition beyond what one person can grasp. Only self-organised teams can hope to conquer them.

Of course, in today's highly dynamic market places the latter abound. With the threat of disruption just around the corner, agility is key for thriving as an organisation. So, the call to take teams seriously is perfectly plausible to me. Not many organisations have embraced this idea, and many more should. Düsterbeck's plea strikes me as somewhat shallow, though. As he points out himself, a tree has fewer edges than a (connected) graph. If every edge corresponds to a communication link, then the overhead for self-organised teams grows much faster with increasing numbers than it does for hierarchical organisations (Brooks' Law: adding people to a late project makes it later). He observes that there are two types of communication:
  • Steering communication: This is unavoidable, but it is also the smaller part and is thus not the key factor contributing to communication overhead.
  • Knowledge dissemination: This can, at least to some degree, be replaced by converting fluid and tacit knowledge into a more static form (aka "documentation").
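
To put numbers on the edge-count argument above: a hierarchy (a tree) with n members has n - 1 links, while a team in which everybody talks to everybody has n(n - 1)/2. The following Python sketch is my own illustration; treating a self-organised team as a fully connected graph is a worst-case assumption on my part, not something claimed in the talk.

```python
def tree_links(n):
    """Communication links in a strict hierarchy (a tree) with n members."""
    return max(n - 1, 0)


def full_mesh_links(n):
    """Links if every one of the n members talks to every other member."""
    return n * (n - 1) // 2


for n in (3, 5, 10, 50):
    print(n, tree_links(n), full_mesh_links(n))
# 3 2 3
# 5 4 10
# 10 9 45
# 50 49 1225
```

The hierarchy grows linearly, the fully connected team quadratically. The distinction between the two types of communication is meant to take some of the edge off this growth.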

I am not sure how much slack this distinction cuts a team. And what about those problems that are too big for one (small) team? DDD will answer: Create another subdomain and establish interfaces. However, the overall picture must be established, too, and "emergent interfaces" are bound to create friction, duplication and defects of every sort. Düsterbeck also highlights that the usual T-shaped profile in technology (broad coverage with deep, deep rooting in some place) is not enough. It must be complemented by the second dimension of domain knowledge, again T-shaped. What is more, he wants a third dimension in this picture, the social dimension of individuals and teams (see the figure below, taken from https://twitter.com/fduesterbeck). Indeed, the times of Taylorism are over.



[1]   Epstein, Pete, and Ravi Sandhu. "Towards a UML based approach to role engineering." Proc 4th ACM Ws. Role-based Access Control. ACM, 1999.
[2]   Evans, Eric: “Domain-driven design: tackling complexity in the heart of software” Addison-Wesley, 2004.
[3]   Halpin, Terry "Object-role modeling (ORM/NIAM)" Handbook on architectures of Information Systems. Springer, Berlin, Heidelberg, 1998. 81-103.
[4]   Halpin, Terry, and Anthony Bloesch "Data modeling in UML and ORM: a comparison" Journal of Database Management (JDM) 10.4 (1999): 4-13.
[5]   Sims, Oliver “Business objects: Delivering cooperative objects for client-server” McGraw-Hill, Inc., 1994.
[6]   Schwentner, Henning: Domain Storytelling Website http://domainstorytelling.org/
[7]   Schwentner, Henning: “Models, Modules, and Microservices” Speakerdeck.com/hschwentner
[8]   Vernon, Vaughn: “Domain-driven design distilled” Addison-Wesley, 2016.