Micro Replication for Dependable Network-based Services (Mirador)

Third-party funded individual grant


Acronym: Mirador

Start date: 01.11.2024

End date: 31.10.2027


Project details

Scientific Abstract

Network-based services such as distributed databases, file systems, and blockchains are essential parts of today's computing infrastructures and must therefore withstand a wide spectrum of fault scenarios, including hardware crashes, software failures, and attacks. Although a variety of state-machine replication protocols provide fault and intrusion tolerance, it is inherently difficult to build dependable systems from their complex and often incomplete specifications. As a result, systems are commonly vulnerable to correlated failures or attacks, for example when all replicas in a system share the same homogeneous implementation to save development and maintenance costs.

In the Mirador project, we seek to close this gap between theory and practice by proposing a novel paradigm for the specification and implementation of dependable systems: micro replication. In contrast to existing systems, micro-replication architectures do not consist of a few monolithic and complex replicas, but are instead organized as dedicated, loosely coupled micro-replica clusters that are each responsible for a different protocol step or mechanism. Because each micro replica provides only a small subset of the overall protocol functionality, it becomes significantly easier to reason about the completeness and correctness of both specifications and implementations. To further reduce complexity, all micro replicas follow a standardized internal workflow, which greatly facilitates introducing heterogeneity at the replica, communication, and authentication levels.
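The architectural idea can be illustrated with a small toy model. The following Python sketch is not the Mirador design or protocol. It only assumes, for illustration, that each cluster performs exactly one step, that every micro replica exposes the same `handle` workflow, and that a cluster forwards a result once a quorum of its replicas agrees. All names (`MicroReplica`, `MicroReplicaCluster`, `run_pipeline`, `make_cluster`) and the quorum rule are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class MicroReplica:
    """A replica that implements a single protocol step (hypothetical model)."""
    name: str
    step: Callable[[str], str]

    def handle(self, message: str) -> str:
        # Assumed uniform workflow: take one input message, apply the step,
        # and return one output message.
        return self.step(message)


@dataclass
class MicroReplicaCluster:
    """A group of micro replicas that all perform the same step."""
    replicas: List[MicroReplica]
    quorum: int

    def run(self, message: str) -> Optional[str]:
        # Collect every replica's output and forward the most common one
        # only if at least `quorum` replicas produced it.
        outputs = Counter(r.handle(message) for r in self.replicas)
        value, votes = outputs.most_common(1)[0]
        return value if votes >= self.quorum else None


def run_pipeline(clusters: List[MicroReplicaCluster], message: str) -> Optional[str]:
    """Pass a message through loosely coupled clusters, one step per cluster."""
    for cluster in clusters:
        result = cluster.run(message)
        if result is None:
            return None  # no quorum at this step
        message = result
    return message


def make_cluster(step, faulty_step=None, size=3, quorum=2) -> MicroReplicaCluster:
    """Build a cluster; optionally replace one replica with a faulty one."""
    replicas = [MicroReplica(f"r{i}", step) for i in range(size)]
    if faulty_step is not None:
        replicas[0] = MicroReplica("r0-faulty", faulty_step)
    return MicroReplicaCluster(replicas, quorum)


# Two illustrative steps: assign a sequence number, then execute the request.
ordering = make_cluster(lambda m: f"seq=1;{m}", faulty_step=lambda m: "garbage")
execution = make_cluster(lambda m: m.upper())

result = run_pipeline([ordering, execution], "put x")
print(result)
```

In this toy setup the single faulty replica in the ordering cluster is outvoted by the two correct ones, so the request still reaches the execution cluster. Because each replica only implements one small step behind a common interface, swapping in a differently implemented replica for the same step would not affect the other clusters.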

Starting from this basic concept, in the Mirador project we explore micro replication as a means to build dependable replicated systems and examine its flexibility by developing micro-replication architectures for different fault models (i.e., crashes and Byzantine faults). In particular, our research focuses on two problem areas. First, we aim to increase the resilience of micro-replicated systems by enabling them to recover from replica failures. Among other things, this requires mechanisms for rejuvenating micro replicas from a clean state and integrating replacement replicas at runtime. Second, our goal is to improve the performance and efficiency of micro-replicated systems and the applications running on top of them. Specifically, this includes designing techniques that reduce overhead through optimistic approaches, which save processor and network resources in the absence of faults. Furthermore, we investigate ways to restructure the service logic, for example by outsourcing preprocessing steps to upstream micro-replica clusters. To evaluate the concepts, protocols, and mechanisms developed in the Mirador project, we build a heterogeneous micro-replicated platform that allows us to conduct experiments in a wide range of settings and with a variety of applications.
