Resilience

Four-part blog series by Prof. Dr. Mario Trapp on resilience

In view of recent crises, the term resilience is receiving considerable attention. In connection with technical systems and software reliability, however, the term has been much discussed for several years. The increased use of artificial intelligence has made it even more important.

Read the four-part blog series in our Safe Intelligence Blog by Prof. Dr. Mario Trapp:

The Story of Resilience, Part 1 / 29.9.2022

1. Understanding resilience

Resilience is a key ingredient for lifting cars, industrial manufacturing and medical devices to the next level. It is far more than a fancier synonym for dependability, and what it stands for is essential for building cognitive systems. That is why resilience is the focus of a new series on the Safe Intelligence Blog. The first part looks behind the term and clarifies why resilience is so important.

Understanding resilience

The Story of Resilience, Part 2 / 27.10.2022

2. The complexity challenge

Resilience is the key ingredient for the next generation of cognitive cyber-physical systems. In this article, part 2 of our series on resilience, we look at the nature of complexity to better understand where engineering has to start in order to manage it.

The Complexity Challenge

The Story of Resilience, Part 3 / 16.11.2022

3. Understanding self-adaptive systems

What are the basic principles of self-adaptive systems? Understanding them is the prerequisite for bringing the concept of resilience to life. This is part 3 of our series.

Self-Adaptive Systems

The Story of Resilience, Part 4 / 30.1.2023

4. Adaptive safety control – why it's the icing on the cake

Safety is a property of a system and its environment. That's why predicting a system's context is so important to safety engineers. In this fourth and final part, we add the missing ingredient by taking safety engineering to the next level.

Adaptive Safety Control

To our Safe Intelligence Blog

On our Safe Intelligence Blog you will find blog articles on all our research topics. Read on directly!

Safe Intelligence Blog

What is resilience?

The roots of the concept of resilience go back a long way. The philosopher and lawyer Francis Bacon (1561 to 1626) defined resilience as the physical property of a body that allows it to return to its original state after a force has acted on it.
With regard to technical systems, resilience means not failing completely in the event of faults or partial failures, but maintaining essential system services.

The computer scientist Jean-Claude Laprie defined resilience in connection with software reliability. A very important point: resilience also covers changes that have not been foreseen or are even unpredictable. The concept of resilience thus accepts the fact that the context of a system cannot be fully predicted. Resilient systems therefore have to adapt themselves again and again in order to achieve their overarching goals in changing, uncertain contexts.

Resilient software systems can react to a change in their context by adapting to it, so that essential characteristics of the system are maintained or optimized.

Definition of resilience at Fraunhofer IKS

In the areas in which Fraunhofer IKS is active, the key characteristic that systems are supposed to preserve is safety. Specifically, systems are required to provide a certain functionality in a safe manner. That is why the scientists at Fraunhofer IKS aim to optimize the benefits of a system while at the same time maintaining safety in uncertain, unknown and changing contexts. The appropriate definition of resilience is then: optimizing the benefits while maintaining safety in uncertain contexts.
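The idea of optimizing benefit under a safety constraint can be illustrated with a small Python sketch. It is not Fraunhofer IKS's method: the `Configuration` class, the `select_configuration` function, the benefit scores and the visibility thresholds are all hypothetical and chosen only to make the principle concrete.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Configuration:
    name: str
    benefit: float           # hypothetical utility score; higher is better
    min_visibility_m: float  # assumed condition under which this configuration counts as safe


def select_configuration(
    visibility_m: float, candidates: Sequence[Configuration]
) -> Optional[Configuration]:
    """Return the highest-benefit candidate that is safe in the observed context."""
    # Safety acts as a hard constraint: unsafe candidates are never considered.
    safe = [c for c in candidates if visibility_m >= c.min_visibility_m]
    # Benefit is optimized only among the remaining safe candidates.
    return max(safe, key=lambda c: c.benefit, default=None)


# Illustrative values only, not taken from any real system.
CANDIDATES = [
    Configuration("full_speed", benefit=1.0, min_visibility_m=150.0),
    Configuration("reduced_speed", benefit=0.6, min_visibility_m=50.0),
    Configuration("safe_stop", benefit=0.0, min_visibility_m=0.0),
]

for visibility in (200.0, 80.0, 10.0):
    chosen = select_configuration(visibility, CANDIDATES)
    print(visibility, chosen.name if chosen else None)
```

In this sketch the system gives up benefit step by step as the context degrades, but it never selects a configuration whose assumed safety condition is violated.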

Differentiation from reliability

When we talk about the reliability of a system, the term refers primarily to internal problems such as faulty components or programming errors. Resilience, in contrast, focuses on the external context. Even though changes in the context are often detected within the system, the cause of the problem lies in the context.

Differentiation from robustness

Resilience is related to robustness, but it goes beyond it. Since resilience refers to changes in context, it is not enough to build the same system just a little tougher and more robust. Instead, the system needs to adapt, because its current structure and behavior can no longer achieve its objectives under the new boundary conditions.

Resilience and its meaning – where is resilience relevant?

Resilience becomes critical when a system operates in an environment that is not clearly defined, the so-called open-world context, i.e. a context that cannot be fully predicted and specified. In such a context it is not possible to develop a single static solution that is suitable for all possible operating situations. Examples are cyber-physical systems such as autonomous vehicles or mobile robots.

Examples of resilience

Autonomous driving – level 5

Level 5 autonomous vehicles operate in a completely open and highly complex context. They drive on a four-lane highway as well as in dense city traffic. They encounter a large number of different road users whose behavior is not foreseeable. For the vision of driverless cars to become reality, it is essential to enable the systems to adapt to their context. The aim is to maximize their benefits while at the same time ensuring safety.

Autonomous mobile robots

Mobile robots that bring components or goods from A to B in storage logistics, for instance, also move in an open context. They have to avoid people, bypass obstacles and orient themselves independently.

Automation with cobots – robots working together with humans

Whenever systems have to interact with people, unpredictable changes in the context can occur. Examples of this are collaborating robots, so-called cobots. They cooperate with human workers, for example in the manufacturing environment, ideally hand in hand. In this setting, the robot system must always adapt to the human. While the robot always works in the same way, this is not the case with the worker. The worker's attentiveness, for example, may vary depending on the time of day, individual characteristics or personal motivation.

Promoting resilience – a challenge

Enabling systems to adapt to changes in the context, and thereby increasing their resilience, leads to more complexity. Example: if a software system consists of 100 different components, each of which has five configurations, then this system offers 5^100 (roughly 7.9 × 10^69) different combinations with which to respond to a specific event.
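To get a feel for the size of this number, the arithmetic for 100 components with five configurations each (five to the power of 100) can be checked in a few lines of Python. This is only an illustrative sketch: the function name `configuration_count` and the assumed rate of one billion configurations checked per second are hypothetical and not part of the original example.

```python
def configuration_count(components: int, options_per_component: int) -> int:
    """Number of distinct system configurations when every component is set independently."""
    return options_per_component ** components


# Figures from the example in the text: 100 components, five configurations each.
total = configuration_count(100, 5)
print(f"{total:.3e} possible configurations")

# Assumption for illustration only: one billion configurations checked per second.
CHECKS_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 365 * 24 * 3600
years = total // (CHECKS_PER_SECOND * SECONDS_PER_YEAR)
print(f"about {years:.3e} years to enumerate them all")
```

Even under that generous assumption, trying out every combination is far out of reach, so the configuration space cannot simply be enumerated and tested.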

This complexity can hardly be controlled. This in turn leads to quality problems because it can no longer be guaranteed that the system can still fulfill its functions. So as a consequence of increased flexibility, it may no longer be possible to guarantee the benefit of a system.

What role does artificial intelligence play?

The challenge of resilience is particularly evident when using artificial intelligence (AI). The strength of AI technologies such as neural networks is their ability to react flexibly to unforeseen events. They therefore play a decisive role in autonomous driving, for example, because the car can only move in the open-world context with their help.

However, because an AI component such as a neural network is a very complex system, it is not possible to understand how its decisions come about. AI is a black box, which means that its correct functioning cannot be guaranteed. When it comes to safety-critical applications such as autonomous driving, however, guarantees are necessary.

Installing a safety barrier in the form of “if/then/else adaptations” does not solve the problem either, as this would deprive AI of its greatest strength: flexibility.

Solution approach – resilience requires holistic thinking

Developing resilient systems requires holistic thinking: artificial intelligence, engineering methods and suitable architectures have to be considered together. Fraunhofer IKS calls this safe intelligence.

Software engineering is also of great importance here, as it already offers instruments for self-adaptive systems. Methods and architectures for these systems have been developed for more than two decades and form a basis for the development of resilient systems. They are also capable of controlling the complexity of resilient systems. Among other things, approaches like DevOps will play an increasingly important role.

What is Fraunhofer IKS working on?

Specifically, Fraunhofer IKS is working on various approaches to safeguard artificial intelligence, such as that used in autonomous vehicles. One of them is structured safety analysis. First, a logical model of the system architecture is created, which represents signal flows and their quality as well as the limitations of the sensors. Next, it is assessed how critical the identified vulnerabilities are and what risk they pose. The safety analysis then examines which critical situations lead to safety-relevant errors.