There are three available Master thesis topics at the Distributed Systems and Computer Networks Lab, under the supervision of Prof. Zsolt István.
For further information, please email zsolt.istvan@cs.tu-darmstadt.de with an up-to-date CV and a short explanation of how your skills fit one of these topics:
1. Finding Bottlenecks at Scale in Consensus Algorithms
Keywords: Benchmark, Distributed Systems
Consensus mechanisms for ensuring consistency are some of the most expensive operations in managing large amounts of data. Often, there is a trade-off: coordination overhead can be reduced, but only at the price of accepting possible data loss or inconsistencies. As the demand for more efficient data centers increases, it is important to provide better ways of ensuring consistency without affecting performance.
In recent years, there have been several research proposals for Crash Fault Tolerant (e.g., [1] and [2]) and Byzantine Fault Tolerant (e.g., [3] and [4]) consensus protocols that, even though they offer promising results on a handful of machines, have not been evaluated at large scale. With the increasing importance of Distributed Ledger-based systems, it is important to evaluate state-of-the-art consensus libraries and identify the bottlenecks they face at scale, on hundreds of nodes.
By working on this project in our group, you will have the opportunity to perform experiments in the cloud but also build analytical models of the behavior and learn in detail about the different consensus protocols.
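To give a flavor of what an analytical model might look like, here is a deliberately simplified sketch. It is not taken from any of the cited papers; the function names, the single-link assumption, and the message patterns are illustrative assumptions only. It estimates how the leader's outgoing link and the number of messages per commit change as the number of nodes grows.

```python
def leader_throughput_bound(n_nodes, payload_bytes, link_bytes_per_s):
    """Upper bound on committed requests per second.

    Assumption (hypothetical model): the leader sends a full copy of each
    request to every follower over one shared outgoing link, and nothing
    else limits throughput.
    """
    if n_nodes < 2:
        raise ValueError("need at least one follower")
    bytes_per_request = (n_nodes - 1) * payload_bytes
    return link_bytes_per_s / bytes_per_request


def messages_per_commit(n_nodes, all_to_all=False):
    """Messages exchanged to commit one request under two toy patterns.

    Leader-based: one proposal to and one acknowledgement from each follower.
    All-to-all: one proposal to each follower, then every node sends a vote
    to every other node.
    """
    followers = n_nodes - 1
    if all_to_all:
        return followers + n_nodes * (n_nodes - 1)
    return 2 * followers


if __name__ == "__main__":
    for n in (5, 50, 500):
        bound = leader_throughput_bound(n, payload_bytes=1000,
                                        link_bytes_per_s=1_000_000)
        print(n, bound, messages_per_commit(n), messages_per_commit(n, True))
```

Even this toy model suggests why scale matters: the leader's bound shrinks linearly with the number of followers, while an all-to-all voting round grows quadratically in messages. Part of the project would be finding out where real libraries deviate from such simple predictions.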
Student Profile
- Skills needed: Solid Distributed Systems background, Ideally some Go and Python experience
- Skills to be acquired: Design and implementation of distributed protocols, Experimentation with large-scale distributed systems
[1] [https://dl.acm.org/doi/abs/10.1145/2749246.2749267]
[2] [https://www.usenix.org/conference/atc14/technical-sessions/presentation/ongaro]
[3] [https://arxiv.org/abs/1906.05552]
[4] [https://ieeexplore.ieee.org/abstract/document/6903593]
2. Low-latency Inference inside a Smart Storage Node
Keywords: FPGA, Data Management, Machine Learning
(This project will be carried out in collaboration with Xilinx Research Labs in Dublin)
Machine learning operators are increasingly used in data management systems. In this project, we will explore the challenges and benefits of integrating inference operators from FINN [1] within a so-called Smart Storage system [2]. Both the inference and the data management aspects will be handled by an FPGA in order to provide a small energy footprint and guarantee high access bandwidth at the same time (see details below). During this project, you will explore questions such as: What are the data management challenges of organizing the key-value contents to make inference possible with low latency? What are the integration challenges at the FPGA circuit level of this functionality in a Smart Storage node?
For inference, we will rely on FINN [1], an experimental framework from Xilinx Research Labs to explore deep neural network inference on FPGAs. It specifically targets quantized neural networks, with emphasis on generating dataflow-style architectures customized for each network.
For the Smart Storage Node we will use Caribou [2]. Caribou nodes are built with FPGAs and each node stores key-value pairs in main memory and exposes a simple interface over TCP/IP that software clients can connect to. Caribou is “smart” because it is possible to offload filtering into the storage nodes. The nodes can also perform scans on the data. In this design filtering is a combination of regular expression matching and predicate evaluation. Different types of processing can, however, easily be added to the processing pipeline. Caribou is “distributed” because it runs on multiple FPGAs that replicate the data using a leader-based consensus protocol that is both low latency and high throughput. Caribou provides a “storage service” because it stores key-value pairs in a Cuckoo hash table and implements slab-based storage allocation.
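For readers unfamiliar with cuckoo hashing, the following software sketch illustrates the general idea together with a scan that filters values by regular expression and predicate. It is not Caribou's implementation, which is a hardware design; the class name, parameters, and the grow-on-failure strategy are assumptions made for illustration, and slab-based allocation is not modeled.

```python
import re


class CuckooTable:
    """Toy two-table cuckoo hash map with string values (illustrative only)."""

    def __init__(self, capacity=16, max_kicks=32):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, which, key):
        # Two hash functions, derived by salting Python's hash() with the table index.
        return hash((which, key)) % self.capacity

    def get(self, key):
        # A key can only live in one of two slots, so a lookup probes at most twice.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def put(self, key, value):
        # Overwrite in place if the key is already stored.
        for which in (0, 1):
            idx = self._slot(which, key)
            entry = self.tables[which][idx]
            if entry is not None and entry[0] == key:
                self.tables[which][idx] = (key, value)
                return
        # Otherwise insert, evicting ("kicking") occupants to their other table.
        entry, which = (key, value), 0
        for _ in range(self.max_kicks):
            idx = self._slot(which, entry[0])
            entry, self.tables[which][idx] = self.tables[which][idx], entry
            if entry is None:
                return
            which = 1 - which
        # Too many evictions: double the capacity and reinsert everything,
        # including the entry that is currently without a slot.
        self._grow(entry)

    def _grow(self, pending):
        old = [e for table in self.tables for e in table if e is not None]
        old.append(pending)
        self.capacity *= 2
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k, v in old:
            self.put(k, v)

    def scan(self, pattern, predicate=lambda value: True):
        """Return (key, value) pairs whose value matches the regex and the predicate."""
        rx = re.compile(pattern)
        hits = []
        for table in self.tables:
            for entry in table:
                if entry is not None and rx.search(entry[1]) and predicate(entry[1]):
                    hits.append(entry)
        return hits
```

The appeal of this scheme for a hardware pipeline is that a lookup touches a fixed, small number of memory locations regardless of how full the table is; only inserts may need extra work.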
Student Profile
- Skills needed: VHDL/Verilog coding, Debugging FPGA projects at least in simulation, Ideally some Go and Python experience
- Skills to be acquired: Designing HW/SW systems, Working with network-facing FPGA designs, Possibly HLS coding
[1] [https://xilinx.github.io/finn/]
[2] [fpgasystems/caribou]
3. Towards In-network Processing in Cloud Servers
Keywords: Networking, eBPF, Packet Processing
(This project will be carried out in collaboration with SAP)
The networking stack of cloud servers has traditionally been difficult for applications to extend, even though such extensions could lead to better security, observability, and even higher performance. The main impediment is that implementing application-specific features in the Linux kernel is cumbersome and finicky, out of reach for most programmers. With the general availability of eBPF [1], however, it is possible to write packet manipulation logic in a simple portable format and run it securely: the eBPF virtual machine is sandboxed inside the Linux kernel and helps abstract away many of the OS and hardware details.
In this project, after a hands-on introduction to eBPF, you will explore how eBPF can be used not only to filter packets but also to make applications more resilient (by assisting replication) or more responsive in a distributed setting (by implementing a proxy / load balancer). eBPF also makes it easier to program emerging specialized hardware, such as Smart NICs built with FPGAs [2], which means that some of the ideas covered in the first part of the project could also be evaluated with programmable hardware accelerators in mind.
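As a rough illustration of the load balancer idea, the sketch below models the per-packet decision such a component makes: map each flow to a backend by hashing its 5-tuple, so that all packets of one connection reach the same server. An actual eBPF program would typically be written in restricted C and run inside the kernel; this Python version only models the logic, and the `Flow` type, the `pick_backend` function, and the addresses are hypothetical.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """The 5-tuple identifying a connection (hypothetical type for this sketch)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


def pick_backend(flow: Flow, backends: Sequence[str]) -> str:
    """Choose a backend for a flow by hashing its 5-tuple.

    The same flow always maps to the same backend as long as the backend
    list does not change, which keeps a connection pinned to one server.
    """
    if not backends:
        raise ValueError("no backends configured")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # example addresses
    flow = Flow("192.168.1.7", 51234, "10.0.0.100", 443, "tcp")
    print(pick_backend(flow, servers))
```

Simple modulo hashing remaps most flows whenever the backend list changes; handling that gracefully inside the constraints of the eBPF sandbox is one of the design questions a thesis could explore.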
After the thesis, there will be a possibility of applying for an internship with our collaborators at SAP (Walldorf or Berlin), using the skills acquired to explore novel ideas for increasing the reliability of SAP HANA Cloud.
Student Profile
- Skills needed: Working knowledge of Operating Systems and Networking, Ideally some C and Go experience
- Skills to be acquired: eBPF coding, working with programmable networks, notions of high performance packet processing
[1] [https://ebpf.io/]
[2] [https://www.usenix.org/conference/osdi20/presentation/brunella]
~~~February 2022~~~