raphting.dev

Recent Posts

What Software Engineering can learn from Aviation

Is it not magical that an airplane crew that has possibly never worked together in this constellation can flawlessly execute a flight on an extremely complex airplane? A few key ingredients make that possible. Can Software Engineers learn from it?

Disclaimer: I am a licensed pilot, but I have never worked with a Crew. I was lucky enough to have excellent instructors who work(ed) as a fighter jet pilot, a KLM Boeing 747 Captain and a Lufthansa Airbus A320 Captain. What I write about airplane crews comes from what I learned from them.

What unites the procedures for executing a flight is so-called Crew Resource Management (CRM). NASA invented it after the number of fatalities peaked in the 1970s. Even though all pilots undergo lengthy training to fly an airplane, most fatalities originate from human failure. With this observation, the idea was to change pilots' training. Just being good at flying is not enough. The same holds for Software Engineering: just because a team comprises good coders does not mean the team is successful at building a great software product.

CRM puts communication, leadership and decision making into focus. I will describe how a crew works and how we could use these patterns in Software Engineering.

The Mission Goal

In aviation, defining the goal might be easier than in engineering. On a regular mission, the Crew wants to execute a safe, orderly and expeditious flight, which is the “How”. Flying an airplane from one point to another is the “What”. In Software Engineering, I have seen all kinds of goal definitions, from very heroic to measurable and time-bound. What these goals rarely contain is a prioritization of the goals that may have to be compromised.

Software Engineering usually does not have enough time or Engineers. Software Engineering is constrained. So is flying. You have what you have aboard, not more, not less. A Crew that needs to decide knows exactly what to put first: safety. Then comes orderly. Then expeditious. In a more tactical way this is aviate, navigate, communicate. Does a Software Engineer know what to put first when the goal is “By the end of this quarter, the time spent per task should be reduced by 10%”? This is a “What” goal. We need the “How”.

How to reach software engineering goals

Let’s say we keep the wording simple and give clear goals for what software should be like:

  1. Readable
  2. Correct
  3. Reliable
  4. Reusable
  5. Extendable
  6. Flexible
  7. Efficient

With an ordered list like this, there is a clear picture of how to structure the development process. An Engineer trying to decrease the time spent on a task will trade away the efficiency of the written software first, as it comes last in the priorities. This is only an example. The order and content of this list should be adapted to the needs of the respective software project.

Leadership

Aviation’s “flat hierarchy” is widely known. The Captain has the last word, but the Co-Pilot is an effective follower. Followership is a skill to learn. I think Software Engineering gets this one quite right in most cases. Many teams have a dedicated lead developer. Hierarchies are usually flat in modern developer teams.

I want to emphasise that Psychological Safety is the number one indicator for a well-performing team (The five keys to a successful Google team). This applies to both sides: leaders need to foster respectful communication, and followers need to be assertive and respectful at the same time.

Communication

Communication skills are not a given. Knowledge about communication is often taught but not always put into practice. Most people have heard about how to make a proper appeal and how to do nonviolent communication, and most people have an intuition about the right tone. The biggest difference between aviation and software engineering is how often communication is addressed.

Airplane Crews speak about communication. Usually in the briefing before the flight, the Captain mentions again that everyone is encouraged to communicate about issues they notice. Even though every single crew member learned about communication in their training, Crews address its importance frequently.

In Engineering we sometimes take communication for granted. I think we can learn from aviation by repeating how and why we want to communicate. Before I fly, I brief my passengers to tell me when they see something they find unusual. I tell them I take responsibility for noticing unusual situations, but a second pair of eyes never hurts. I tell them even if they have been my passengers before.

Decision making

One action I learned to take in aviation is to “sit on my hands” before falling for hasty measures. Usually no situation is so time critical that you can’t think about it for a few seconds. I remember when I was on approach to a small airport. The controller notified me that an airplane was stuck on the runway. She offered to let me land on the grass strip next to it. On that day, a licensed pilot sat in the back of my airplane. He had joined the flight as a passenger. I informed him about the situation and that I would go for a landing on the grass strip. If anything felt unsafe, I would abort the landing. I asked for his opinion. He agreed. Instead of diverting or accepting the controller’s offer right away, I “sat on my hands” in a situation I had never trained for, asked for more input, and made a decision after I could see the big picture again.

It turns out that in critical and stressful situations, humans fall back on behaviour they have trained before (this is also the reason you should read the safety card: in the “unlikely event”, you need to know how to get out). Usually as a Software Engineer, you have more time to think about decisions than when flying a plane. Still, production databases get accidentally deleted and other mistakes happen, especially in critical situations affecting production environments. If Software Engineers simulated stressful scenarios as part of their training, maybe failure rates would drop. If Software Engineers practiced the process of decision making, risk calculation and asking for a second opinion, we could gain uptime in production.

Conclusion

Finding analogies for software development has been done to excess before. With this article, I don’t want to open another analogy by saying developing software is like flying an airplane. I want to explain how Software Engineering, and probably other disciplines too, can learn from Crew Resource Management. It is not the machinery or the craft skills; it is the incorporation of human factors that leads to the significant results modern aviation achieves, and I would like to see more of these methodologies applied in Engineering.

It requires discipline to work in the way outlined here, and it is not “the sexy way” to work, but I am convinced it is worth the effort.

Digital Vaccination Certificate - A Critique

The digital vaccination certificate is coming. The German federal government has awarded IBM the contract to coordinate the project. A company called Ubirch from Cologne supplies the protocol for issuing and verifying vaccination data. I do not want to write about the usefulness or the societal aspects of the digital vaccination certificate. My main concern is the technical implementation with regard to data minimization. I first criticize three points of the implementation as it is currently visible. The information I can find online is sparse, though. Finally, I explain an approach that seems more practical to me and that takes data protection and the progress of science into account.

1. What the Federal Ministry of Health would like to have

The concept described by the Federal Ministry of Health does not seem coherent to me. The description mixes an overview with technical details, so it is not clear to me whether things were perhaps just oversimplified. I was particularly bothered by this paragraph:

“The app stores the vaccination certificate locally on the smartphone. This 2D barcode can only be read once, and the vaccination certificate is subsequently bound to the smartphone that read it.”

It must be clear to everyone that a barcode can be read multiple times, unless it crumbles to dust after being photographed. It is enough to restore the same backup onto further phones, even if an app checks the registered barcodes online. I assume the requirement was made to prevent the barcode from being passed on. Handing over the phone, however, would be possible at any time.

2. What we don’t need a blockchain for

Next up is the blockchain, the aspect of the digital vaccination certificate that has drawn the most jokes online. As far as I can see, the blockchain concept has no advantage that could not also be achieved with a conventional database. According to the infographic, the proposed blockchain is operated by GovDigital. One could argue that the private blockchain guarantees that the order of entries is traceable. In private hands, however, the order could be changed at any time despite the blockchain. Banks also have to store transactions in a traceable way. There, too, nobody has to rely on a blockchain.

3. Data minimization

If I understand the example on Ubirch’s website correctly, a digital fingerprint of the patient data (a so-called cryptographic hash value) is stored in the blockchain. If, during a check, I hand over all of my data that was also reported by the vaccination site, the hash value can be recomputed from my data and is then found in the blockchain. If the hash value is found and the data I presented is sufficient, the check succeeds. The checkpoint therefore receives my name, date of birth, the vaccine administered, the dates of the doses and a document number for comparison with an ID. As a user of the app, I therefore cannot undergo a check without handing all patient data to a possibly untrustworthy checkpoint.
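A minimal sketch of why a hash lookup forces full disclosure. Everything here is an assumption for illustration: the record fields, the JSON serialization, SHA-256 and the in-memory `ledger` set are stand-ins, not Ubirch's actual format or protocol. The record itself is made-up placeholder data.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """Hash a canonical serialization of the record (assumed: sorted JSON)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check(presented: dict, ledger: set) -> bool:
    """A checkpoint recomputes the hash and looks it up in the ledger."""
    return fingerprint(presented) in ledger


# Placeholder record as the vaccination site might have reported it.
full_record = {
    "name": "Erika Mustermann",
    "date_of_birth": "1964-08-12",
    "vaccine": "ExampleVax",
    "dose_dates": ["2021-03-01", "2021-04-12"],
    "document_number": "T22000129",
}

ledger = {fingerprint(full_record)}  # stand-in for the blockchain

# Presenting only the fields a checkpoint really needs changes the hash.
partial_record = {
    "name": full_record["name"],
    "document_number": full_record["document_number"],
}

print(check(full_record, ledger))     # full disclosure: found
print(check(partial_record, ledger))  # partial disclosure: not found
```

Because the fingerprint covers the whole record, leaving out a single field produces a different hash, so the lookup only succeeds when every field is handed over.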

Digital but feasible

I have explained my points of criticism above: barcodes can, contrary to what the Ministry of Health assumes, be read multiple times. A central storage location for patient data or its fingerprint, especially in a blockchain, is more cumbersome than necessary. During checks, more data is exchanged than is actually needed.

I propose the following concept: every vaccination site receives a digital certificate. Digital certificates can be used like digital signatures. With this certificate, the vaccination site can attest patient data. The vaccination site creates a 2D barcode of the patient data and digitally signs this data with its certificate. The vaccinated person then receives this barcode.

Concept of the digital vaccination certificate

The 2D barcode is scanned with an app. Before a check is carried out, the user can decide which data she wants to disclose. As a rule, only an “okay/not okay” and an ID number are necessary. It does not matter which approved vaccine was used, or whether the “okay” is given, for example, because of a past infection.

Using an app, the user sends the scanned patient data, digitally signed by the vaccination site, to a validation service on the internet. What comes back is a smaller data packet that is also digitally signed. This time, however, the signature comes from the central authority, for example the Ministry of Health. This smaller data packet contains only the information “okay, ID number, valid until”. The user can now hand it to the checkpoint as a 2D barcode. The checkpoint can verify the digital signature, which, by the way, also works without an internet connection. After that, only the ID number has to be compared with the ID presented.

The disadvantage is that the patient data has to pass through the online service once, rather than remaining exclusively at the vaccination center and with the vaccinated person. In the Ubirch concept, however, the data always goes in full to every checkpoint visited, some of which may not be trustworthy. The advantage is that access permissions with limited validity can be reissued again and again according to the current state of science. For direct comparison: once data is in the Ubirch blockchain, it is at the discretion of the checkpoint to decide whether, for example, a particular vaccine manufacturer is sufficient for them. In the concept I propose, a central authority decides what, according to the current state of science, counts as “okay”, at least for a limited validity period.

It remains to be seen what the concrete implementation will look like. Perhaps no blockchain will be used after all. Perhaps digital certificates will be used after all, to enable verification without an internet connection. Perhaps there will yet be a system in which not all patient data has to be handed to the checkpoint, but only the data actually needed.

Secure updating is hard

Some systems have higher security requirements for auto-updates than others. Think about cars, airplanes and wherever physical harm can result. For updates, secure or not, there is a common pattern:

  1. Check a remote endpoint for available updates
  2. Retrieve the update
  3. Apply the update

Number 3 is environment and application specific, so it won’t be covered in this text. If you think infrastructure will never be compromised, then this text is not for you. The approach here is: “Trust the people, don’t trust the infrastructure”. The following paragraphs are written under the assumption that update hosts could be compromised.

Attacks on updates

To have a realistic case, imagine you wrote an App and want to apply the above three-step pattern for updates. The App knows an endpoint for updates. This could be an HTTPS endpoint. There, the client finds the link to the update, called the target, and downloads it.

Arbitrary data

If an attacker controls the update host, she has the option to change the target link for the update to arbitrary sources. The client would blindly download these compromised sources and apply them 💥 A provided hash, cryptographically identifying the target, does not help, as the attacker could change the hash to her liking too.
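To see why an unsigned hash adds nothing here, consider this minimal sketch. The manifest layout and the names `make_manifest` and `client_accepts` are made up for illustration; the only point is that whoever controls the host can rewrite the hash together with the payload.

```python
import hashlib


def make_manifest(payload: bytes) -> dict:
    """Build the (unsigned) update description the host serves."""
    return {"sha256": hashlib.sha256(payload).hexdigest()}


def client_accepts(manifest: dict, payload: bytes) -> bool:
    """Naive client: trusts whatever hash the update host announces."""
    return hashlib.sha256(payload).hexdigest() == manifest["sha256"]


genuine = b"app v2"
malicious = b"something else entirely"

# Honest host: hash and payload belong together.
honest_manifest = make_manifest(genuine)

# Compromised host: the attacker swaps the payload AND recomputes the hash.
forged_manifest = make_manifest(malicious)

print(client_accepts(honest_manifest, genuine))    # accepted
print(client_accepts(forged_manifest, malicious))  # accepted as well
```

The hash only detects accidental corruption or a payload swapped without touching the manifest; it says nothing about who produced the manifest.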

Let’s sign the data on the server. The client is (pre-)loaded with a public key and able to check the signature. So now we are safe?

Roll back

The attacker watched us already for a while and has a history of all signed update files. She knows that one of the older versions was vulnerable. She puts the old signed file on the update server. Our client trusts the signature and downloads and applies the vulnerable version 💥

Alright, this one was easy. We add a timestamp to the updates. If the local version is older than the remote version, the client can safely download and apply data. We are done, right 😅?
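The comparison could look roughly like this. It is a sketch under assumptions: the timestamps are Unix times taken from metadata whose signature has already been verified, and the function name `is_newer` and the example values are hypothetical.

```python
def is_newer(installed_timestamp: int, offered_timestamp: int) -> bool:
    """Return True only if the offered update is strictly newer.

    Both values are Unix timestamps from *signed* metadata; the signature
    check itself is assumed to have happened before this call.
    """
    return offered_timestamp > installed_timestamp


installed = 1_700_000_000             # release currently on the device
replayed_old_release = 1_600_000_000  # old, validly signed file
fresh_release = 1_800_000_000

print(is_newer(installed, replayed_old_release))  # rollback is rejected
print(is_newer(installed, fresh_release))         # real update is accepted
```

Using a strict comparison also rejects the currently installed release being offered again, which matters for the next attack.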

Indefinite freeze

So far we have achieved the update’s authenticity and freshness. The attacker could still block us from new updates without our clients ever noticing, though. The attacker gives us the same update file over and over again. This could give her time to find security bugs or to exploit the window before a responsible disclosure is fixed, while cars keep driving with old traffic rules, planes keep flying with old airspace data etc. 💥

We add an expiration field to our update file. This field needs to be re-signed frequently with an online key (a cryptographic key that is stored on a server, in contrast to other keys that should reside on an offline device). Even if the attacker prevents us from downloading updates, at least the clients can detect the expiration and give a warning.
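On the client, the expiration check might look like the following sketch. The field name `expires_at`, the one-day validity window and the function name are assumptions; the value is expected to come from metadata whose signature was verified beforehand.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_expired(expires_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the signed metadata is past its expiration time."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expires_at


# The server re-signs the metadata regularly, here assumed to be daily.
signed_at = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires_at = signed_at + timedelta(days=1)

# Twelve hours later the metadata is still valid.
print(is_expired(expires_at, now=signed_at + timedelta(hours=12)))

# Three days later the client has only ever seen the same stale file.
print(is_expired(expires_at, now=signed_at + timedelta(days=3)))
```

A client that sees expired metadata cannot tell whether it is under attack or the server is simply down, but in both cases it knows that it should no longer consider itself up to date.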

Wow, that was more work than expected to secure the update process, don’t you think? You guessed it. There is more.

Endless data

The attacker gets a little frustrated about our good security measures, so she thinks about doing harm in other ways. She controls the update target as well. Instead of letting the client download a normal file, she puts a few TB of randomness there, enough to fill any of our clients’ drives. In the worst case, a full disk will brick our clients forever 💥

We could have thought about it before, but with this cautionary example it is clear. To the signed update file we add a field for the file size and tell our client to stop downloading once this amount of data has been received.
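A bounded read could be sketched as follows. The names `read_bounded` and `OversizedUpdate` are hypothetical, and `stream` stands for whatever file-like object the HTTP client returns. As a small variation on the text, this sketch reads at most one byte beyond the signed size so that it can reject an oversized target instead of silently truncating it.

```python
import io
from typing import BinaryIO


class OversizedUpdate(Exception):
    """The server sent more data than the signed metadata announced."""


def read_bounded(stream: BinaryIO, expected_size: int,
                 chunk_size: int = 64 * 1024) -> bytes:
    """Read exactly expected_size bytes, never buffering much more."""
    received = bytearray()
    while True:
        remaining = expected_size - len(received)
        # Ask for one byte more than the budget to detect oversized input.
        chunk = stream.read(min(chunk_size, remaining + 1))
        if not chunk:
            break
        received.extend(chunk)
        if len(received) > expected_size:
            raise OversizedUpdate(f"more than {expected_size} bytes received")
    if len(received) != expected_size:
        raise ValueError("download ended before the announced size")
    return bytes(received)


# Usage with in-memory streams standing in for network responses.
good = read_bounded(io.BytesIO(b"12345"), expected_size=5)
try:
    read_bounded(io.BytesIO(b"x" * 1_000_000), expected_size=5)
except OversizedUpdate as error:
    print("rejected:", error)
```

In a real client the chunks would be written to a temporary file rather than kept in memory, but the size budget works the same way.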

Hey, what a fun ride. The update process is secure now, right?

Wrong software

We forgot that we run similar clients with entirely different software on them. That software has been around a few years longer, so its update version numbers are higher than those of our current App. I think you think what the attacker thinks: Hell’s bells, the attacker could present the other software’s update data. It was signed with the same key. Our clients will just download the other software 💥 Who knows how they react to it?

This was a tricky one, but we got it. Together with the other information, we store an App identifier, for example the product name and platform. We ask the client to always compare the given App identifier to a hard-coded App identifier. Only if they match is the update applied.
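As a sketch, the comparison is a single check on the signed metadata. The identifier format, the `app_id` field name and the constant are assumptions made for this example.

```python
# Hard-coded into the client at build time (hypothetical value).
EXPECTED_APP_ID = "example-app/linux-amd64"


def is_for_this_app(metadata: dict) -> bool:
    """Accept only metadata that names exactly this product and platform."""
    return metadata.get("app_id") == EXPECTED_APP_ID


own_update = {"app_id": "example-app/linux-amd64", "timestamp": 1_800_000_000}
other_product = {"app_id": "other-app/linux-amd64", "timestamp": 1_900_000_000}
missing_field = {"timestamp": 1_900_000_000}

print(is_for_this_app(own_update))     # matches
print(is_for_this_app(other_product))  # signed with the same key, still rejected
print(is_for_this_app(missing_field))  # no identifier at all, rejected
```

Treating a missing identifier as a mismatch keeps older or malformed metadata from slipping through.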

Compromised online key

Did I write earlier that we should work with an online key to sign the expiration date? We don’t know whether the attacker could access the secret online key. So expiration dates will be vulnerable forever 💥? No good, no good. What we need are multiple keys for multiple roles. The secret root key should never get lost, but it’s okay to store it offline somewhere.

In a separate file on the update server, we store the needed public keys. These entries are signed with the secret root key. The client is able to retrieve public keys through this file, for example the key that signs the expiration date. If that one is compromised or should be refreshed, just update and re-sign the root file. From that moment on, the client can trust the new keys again.

Summing it up

A lot can go wrong when a client tries to update itself, especially if you cannot rule out a compromise of the update infrastructure. Most of the scary attacks can be prevented with cryptography and logic. The Update Framework is a great resource for exploring the space of secure updates further.

Writing again

I have been writing for as long as I have known how to use a text editor. It started early, I think when I was eight years old and my mum brought home a used computer. She worked at an IT company at that time, otherwise I don’t think my family would have gotten a computer that early.

My stories were about kids running through forests (the stuff I usually did as a kid) and witnessing mysterious situations (the stuff I wished would happen as a kid).

When I became more familiar with the internet and followed all kinds of blogs (oh, these Google Reader times!), I was ready for my own. It was called “Schuelerleben” (student’s life) and was about my days at high school. Two more blogs followed. One about the year I lived in Ghana. The other about the years I studied Computer Science and Electrical Engineering. And then - void. Somehow I felt I couldn’t write about my daily work, to protect my co-workers, me or my companies.

A few years have passed, and I see more and more topics around work that I want to write down like I used to write down what’s on my mind. Hence this blog, hence this first blog post. If you use a feed reader, I would be delighted if you added this blog to your reader.

By Raphael Sprenger licensed under CC BY-NC 4.0