
Several Master's thesis topics are available at the Systems Group at TU Darmstadt, under the supervision of Prof. Zsolt István.

For further information, please email zsolt.istvan@cs.tu-darmstadt.de with an up-to-date CV and a short explanation of how your skills fit one of the topics below. If you don't see a topic of interest but would still like to work with us, send the same documents together with your own topic proposal.

1. Who Decides What to Forget? A Near-Data KV Cache Eviction Oracle on a SmartNIC

Keywords: SmartNIC, LLM Inference, KV Cache

The ever-growing adoption of long-context language models has made the KV cache, i.e., the key and value tensors that a transformer's attention layers accumulate for every token processed so far, the dominant consumer of GPU high-bandwidth memory in production inference deployments. When memory is full, entries must be evicted. Today that decision is made on the GPU itself: systems such as H2O and SnapKV scan accumulated attention scores to identify and drop the lowest-ranked token entries. This stands at odds with efficient hardware utilization: the GPU, the most expensive component in the stack, spends cycles on memory bookkeeping rather than computation. Furthermore, the policy is inherently local, i.e., it sees only a single request's scores rather than the global access distribution across the full concurrent batch.

A SmartNIC (also called a Data Processing Unit, or DPU) sits in the data path between the GPU and the rest of the system and, crucially, sees every KV tensor that moves between accelerator memory and disaggregated storage. In this project you will implement a near-data eviction oracle on the ARM cores of an NVIDIA BlueField SmartNIC: the oracle intercepts KV block transfers via DOCA data-path APIs, maintains per-block access-frequency and recency scores in a hash table in NIC-local DRAM, and supplies eviction candidates to the vLLM scheduler. The eviction policy is thus fully offloaded from the GPU and gains visibility across the entire batch. Policy exploration at cluster scale will use the open-source Opal simulator (IBM Research), which models distributed KV cache management without requiring a full multi-node hardware deployment.
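To make the bookkeeping concrete, here is a minimal host-side sketch in Python of a per-block score table that combines access frequency and recency. It is only an illustration: the class and method names, the logical clock, and the scoring formula (`hits - recency_weight * age`) are assumptions made for this example, and it does not use DOCA or vLLM interfaces.

```python
from dataclasses import dataclass


@dataclass
class BlockStats:
    hits: int = 0       # access frequency of the block
    last_seen: int = 0  # logical time of the most recent access


class EvictionOracle:
    """Toy score table with one entry per KV block, keyed by block ID."""

    def __init__(self, recency_weight: float = 1.0):
        self.table = {}  # block ID -> BlockStats (stands in for the NIC-side hash table)
        self.clock = 0   # logical clock, advanced on every observed transfer
        self.recency_weight = recency_weight

    def observe(self, block_id: int) -> None:
        """Record one observed transfer of a KV block."""
        self.clock += 1
        stats = self.table.setdefault(block_id, BlockStats())
        stats.hits += 1
        stats.last_seen = self.clock

    def score(self, block_id: int) -> float:
        """Higher means more worth keeping (hypothetical formula)."""
        stats = self.table[block_id]
        age = self.clock - stats.last_seen
        return stats.hits - self.recency_weight * age

    def candidates(self, n: int) -> list:
        """Return the n lowest-scoring block IDs, i.e. the first to evict."""
        return sorted(self.table, key=self.score)[:n]


oracle = EvictionOracle()
for block_id in [1, 2, 3, 1, 1, 3]:
    oracle.observe(block_id)
victims = oracle.candidates(1)  # block 2: accessed once, long ago
```

Because the table is keyed by block rather than by request, the same structure can rank blocks from all concurrent requests together, which is the batch-wide visibility the topic is after. Which scoring function works best is exactly the kind of question the thesis would explore.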

💡 This topic could also be explored using an FPGA-based SmartNIC. Let us know if you have prior experience with FPGA programming (HDL or HLS).

2. Compress Before You Send: In-Network Block Floating Point Compression for KV Cache Transfers

Keywords: FPGA, HLS, LLM Inference, KV Cache, In-Network Computing

Distributed LLM inference clusters increasingly rely on prefix caching (i.e., reusing the KV tensors already computed for a shared prompt prefix) to reduce time-to-first-token (TTFT) and avoid expensive GPU recomputation. The bottleneck is data movement: when the matching cache resides on a different node, the KV tensors must be transferred across the network before generation can start. For a 70B-parameter model at a 10K-token prefix, that amounts to roughly 25 GB; even at 100 Gbit/s InfiniBand bandwidth, the transfer takes over two seconds. The latency benefit of skipping recomputation is thus largely outweighed by the cost of moving the data.
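The figures above can be reproduced with a short back-of-the-envelope calculation. The model shape used here is an assumption, since the text does not state one: 80 layers, a hidden size of 8192, FP16 values, and full multi-head attention without grouped-query attention. Other configurations will give different sizes.

```python
# Assumed model shape (not given in the text): 80 layers, hidden size 8192,
# FP16 values, full multi-head attention (no grouped-query attention).
LAYERS = 80
HIDDEN_SIZE = 8192
BYTES_PER_VALUE = 2       # FP16
PREFIX_TOKENS = 10_000    # "10K-token prefix"
LINK_GBIT_PER_S = 100     # link bandwidth from the text

# Each token stores one key and one value vector per layer.
bytes_per_token = 2 * LAYERS * HIDDEN_SIZE * BYTES_PER_VALUE
kv_bytes = bytes_per_token * PREFIX_TOKENS
transfer_seconds = kv_bytes * 8 / (LINK_GBIT_PER_S * 1e9)

print(f"KV cache size: {kv_bytes / 1e9:.1f} GB")
print(f"Transfer time: {transfer_seconds:.2f} s")
```

Under these assumptions the cache comes to about 26 GB and the transfer to about 2.1 s at full line rate, in line with the rough numbers quoted above. Real transfers also pay protocol and software overheads that this estimate ignores.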

KV tensors are, however, highly compressible. Within a single transformer layer, activations cluster tightly around a shared order of magnitude, so a block of values can be represented as one shared exponent plus short per-value mantissas. This scheme, known as block floating point (BFP), achieves 4–8× compression with near-lossless quality. We make the case that pushing this compression into the network data path, i.e., performing it on an FPGA in the PCIe-to-NIC pipeline, sidesteps the host CPU bottleneck entirely and makes the transfer transparent to both GPU and application.
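As an illustration of the shared-exponent idea, the following Python sketch encodes one block of values into a single exponent plus small signed integer mantissas and decodes it again. It is a software model under assumed parameters (4-bit signed mantissas, exponent taken from the largest magnitude in the block); the function names are hypothetical, and a hardware pipeline would likely use shifts on fixed-width words rather than floating-point multiplication.

```python
import math


def bfp_encode(block, mantissa_bits=4):
    """Encode floats as (shared_exponent, list of signed integer mantissas)."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp returns max_abs = m * 2**e with 0.5 <= m < 1,
    # so every value in the block satisfies |v| < 2**e.
    _, shared_exp = math.frexp(max_abs)
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1  # symmetric signed range
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in block]
    return shared_exp, mantissas


def bfp_decode(shared_exp, mantissas, mantissa_bits=4):
    """Reconstruct approximate floats from the shared exponent and mantissas."""
    step = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * step for m in mantissas]


block = [0.5, -0.25, 0.125, 0.75]
shared_exp, mantissas = bfp_encode(block)
decoded = bfp_decode(shared_exp, mantissas)
```

These example values happen to be exactly representable, so the round trip is lossless; in general each value is off by at most one quantization step. The storage cost per value is the mantissa width plus the exponent width divided by the block size, so the achievable ratio depends on both choices.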

In this project you will design and implement a streaming BFP pipeline in Vitis HLS on FPGA-based SmartNICs, evaluate it on KV tensor distributions, and use the open-source Opal simulator to project the end-to-end impact at cluster scale.

3. From Sensors to Insights: Building the Next Generation of Edge-to-Cloud Streaming Systems

Keywords: Streaming Queries, Edge, Cloud

Interest in stream processing has been increasing due to the massive amount of data produced by devices at the Edge. This trend also shifts data processing demands from the cloud towards the Edge. Some parts of data processing pipelines, e.g., pre-processing and initial filtering, can be executed more efficiently closer to the data source. In our ongoing research we demonstrate that well-established streaming systems, such as Flink, are not a good match for Edge-to-Cloud use because they are designed for homogeneous cloud resources and suffer unexpected slowdowns in a heterogeneous setting. They might not even be able to handle Edge-to-Cloud failure scenarios. Emerging stream processing systems designed specifically for the Edge are also not a perfect match because they might not be able to leverage high-performance cloud resources, and they lack the general workload compatibility and large user base of established cloud streaming systems.

We show that a promising way forward is to retrofit an established system, such as Flink, to meet the needs of Edge-to-Cloud deployments. In our ongoing work, we implemented a modified version of Flink, called FlinkE2C, that substantially reduces performance fluctuations in heterogeneous settings, adds the ability to dynamically load-balance operators at runtime, and even recovers from node failures through dynamic pipeline updates. There are several exciting thesis topic directions in the context of FlinkE2C.

Concrete topics that build on FlinkE2C include:

1. Decentralized Management -- Replacing the centralized planning and monitoring logic of FlinkE2C with a distributed version that runs on the computing nodes and can take both local and global decisions more efficiently than the centralized one.

2. Automated Topology Discovery -- In Edge-to-Cloud settings, network links and node performance can fluctuate for numerous reasons. Today, the node characteristics and link capacities are assumed to be known before running a query and to be stable throughout. In this research line, you will investigate how changes can be detected reliably.

3. Utilization of Specialized Hardware -- An Edge-to-Cloud environment comprises a diverse set of devices on which the pipeline must run efficiently. This means that an Edge-to-Cloud stream processing system needs to be aware of the different resources (e.g., SmartNICs, in-network processors, embedded GPUs) and use them effectively, for example, by reducing data volume closer to the source while placing more power-hungry operators in the cloud.

Main reference paper to get a sense of the topic: The Edge Awakens: Making Stream Processing Systems Fit for Edge-to-Cloud Deployment by M. Hüttner and Zs. István (under review, can be shared upon request)

4. Making the Unpractical Practical: Efficient MPC-based Analytics in the Enterprise

Keywords: Secure Multi-Party Computation, Database Analytics, Distributed Computing

Data analytics across different branches or subsidiaries of a large enterprise has many benefits, but raises security and privacy concerns. Secure Multi-Party Computation (MPC) is becoming increasingly practical for data federations and could be one way of strengthening guarantees in the enterprise as well. However, in the enterprise setting, MPC incurs prohibitive overheads due to large data volumes and many-join queries. At the same time, not all queries require fully oblivious execution; many allow relaxations tailored to the in-enterprise use case in exchange for better performance. We are building on recent progress in making MPC orders of magnitude faster and are actively investigating to what extent these ideas can be transferred to in-enterprise use cases and what novel solutions are needed. Our goal is to build an MPC query optimizer and integrate it with query operators developed by our group in earlier work, yielding a prototype of an in-enterprise MPC analytics system that can flexibly navigate the trade-off between privacy/security and performance. That is, users control which guarantees to relax, and the system then finds the fastest way to run queries with the chosen guarantees in our MPC framework.

Several different thesis topics are possible in this project -- depending on whether you're more interested in its systems or its formal aspects. The main reference is the recent Reflex paper by L. Gu, S. Zeitouni, C. Binnig and Zs. István.


Looking for a sample of previous thesis topics?

[List of topics from earlier years]

~~~June 2026~~~