
There are several available Master thesis topics at the Systems Group at TU Darmstadt, under the supervision of Prof. Zsolt István.

For further information, please email zsolt.istvan@cs.tu-darmstadt.de with an up-to-date CV and a short explanation of how your skills fit one of the topics below. If you don't see a topic of interest but would still like to work with us, send us the documents mentioned above, along with your own topic proposal.

1. Elementary Functions in Hardware: Algorithm Exploration

Keywords: FPGAs, Algorithms

Day-to-Day Advisor: Lukas Stasytis

Elementary functions are of key importance across computer science, from simulation and image processing to cryptography: operations such as logarithms, sines, cosines and exponentials are widely used, yet difficult to compute. In this project, the student researches, implements and evaluates increasingly elaborate methods for computing elementary functions (sin, cos, square root, logarithm, etc.) on FPGA devices. Starting with look-up table approaches, proceeding to shift-add algorithms and polynomial approximators, and capping off with specialized architectures that mix several simpler approaches, the student incrementally outlines, implements and evaluates the various methods of computing elementary functions. As a final, optional step, the student can incorporate the designed elementary function kernels into a larger algorithm, such as the Fast Fourier Transform or a matrix decomposition kernel, and evaluate their practicality.
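As a taste of the shift-add family, the sketch below is a minimal fixed-point CORDIC routine for sine and cosine in plain C++ (the language most HLS flows start from). It is only an illustration under assumed parameters (16 iterations, Q16 scaling, inputs in [-pi/2, pi/2]), not the kernel the thesis would produce; in hardware the arctangent table would be a constant ROM.

```cpp
// Minimal fixed-point CORDIC sketch (shift-add only in the main loop), assuming
// Q16 scaling, 16 iterations and inputs in [-pi/2, pi/2]; illustrative only.
#include <cstdint>
#include <cstdio>
#include <cmath>

constexpr int     ITER  = 16;
constexpr int32_t SCALE = 1 << 16;   // Q16 fixed point

void cordic_sincos(double theta, double &s, double &c) {
    int32_t atan_tab[ITER];
    for (int i = 0; i < ITER; ++i)   // table of atan(2^-i), scaled to Q16
        atan_tab[i] = int32_t(std::atan(std::ldexp(1.0, -i)) * SCALE);

    // start from (K, 0); K compensates the constant gain of the rotations
    const double K = 0.6072529350088812;
    int32_t x = int32_t(K * SCALE), y = 0, z = int32_t(theta * SCALE);

    for (int i = 0; i < ITER; ++i) { // only shifts, adds and comparisons
        int32_t dx = y >> i, dy = x >> i;
        if (z >= 0) { x -= dx; y += dy; z -= atan_tab[i]; }
        else        { x += dx; y -= dy; z += atan_tab[i]; }
    }
    c = double(x) / SCALE;           // ~cos(theta)
    s = double(y) / SCALE;           // ~sin(theta)
}

int main() {
    double s, c;
    cordic_sincos(0.5, s, c);
    std::printf("sin(0.5)=%f cos(0.5)=%f\n", s, c);  // expect ~0.4794 and ~0.8776
}
```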

Concrete steps: 1. Formalize a set of methods for computing elementary functions, outlining their strengths and weaknesses as well as implementation challenges. What are the expected latencies, resource use and precision? 2. Implement the various kernel types either in register-transfer level languages such as Verilog and VHDL or in High-Level Synthesis, starting with the simple lookup table approach and incrementally moving towards more elaborate approaches (a small sketch of one such approach follows below). 3. Evaluate the implementations regarding performance, resource cost and efficient use of the FPGA, comparing them both among themselves and against CPU and GPU solutions. 4. Optional extra task: incorporate a kernel into a larger algorithm on the FPGA and evaluate the practical challenges of the integration.
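For step 2, the polynomial-approximator approach can be prototyped in a few lines; the sketch below evaluates a truncated Taylor series for sin(x) in Horner form, which an HLS tool would map to a short multiply-add chain. The degree and coefficients here are illustrative assumptions; a real kernel would use minimax coefficients and argument reduction.

```cpp
// Tiny illustration of the polynomial-approximator idea: sin(x) on [-pi/2, pi/2]
// via a truncated Taylor series evaluated in Horner form.
#include <cstdio>

double sin_poly(double x) {
    double x2 = x * x;
    // x - x^3/3! + x^5/5! - x^7/7!, written as nested multiply-adds
    return x * (1.0 + x2 * (-1.0/6.0 + x2 * (1.0/120.0 + x2 * (-1.0/5040.0))));
}

int main() {
    std::printf("sin_poly(0.5) = %f\n", sin_poly(0.5));  // ~0.479426
}
```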

Main reference paper to get a sense of the topic: [1] Elementary Functions and Approximate Computing by Jean-Michel Muller

[1] [https://hal.science/hal-02517784v2/document]

2. Matrix Decomposition FPGA Kernel for Extremely Large Matrices

Keywords: FPGAs, Algorithms

Day-to-Day Advisor: Lukas Stasytis

Matrix decompositions are of key importance in a wide variety of fields, such as image compression, least squares solving, facial recognition and latent semantic analysis. Computing these decompositions, however, is extremely computationally expensive.

One such decomposition is the Singular Value Decomposition (SVD), which can be computed efficiently on FPGAs using the Jacobi method. In this project, the student uses readily available Jacobi method kernels for FPGAs to efficiently solve the SVD of extremely large matrices. While standard library implementations exist, they have a fixed maximum matrix size and are not readily integrated into larger pipelines. The task is to adapt a given Jacobi method SVD kernel on FPGAs so that it is ready for solving real-world problems by a) adding support for arbitrarily sized matrices using streaming, b) incorporating a preprocessing step based on a different matrix decomposition kernel (for example QR) to decrease the computational complexity, and c) designing a pipeline for computing on real datasets, capping off by d) utilizing high bandwidth memory and e) running multiple FPGAs in parallel to speed up the computation further.
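For orientation, the following is a rough software sketch of the one-sided (Hestenes) Jacobi method that such FPGA kernels implement: column pairs are repeatedly orthogonalized by plane rotations until convergence, after which the singular values are the column norms. The data layout and interface are illustrative, not those of the provided kernel, and the accumulation of the V factor is omitted for brevity.

```cpp
// Rough single-threaded sketch of the one-sided (Hestenes) Jacobi SVD.
#include <cmath>
#include <cstdio>
#include <vector>

// A is an m x n matrix stored as n columns; it is overwritten with U * diag(sigma).
std::vector<double> jacobi_svd(std::vector<std::vector<double>> &A, double eps = 1e-12) {
    const size_t n = A.size();           // number of columns
    bool changed = true;
    while (changed) {                    // one "sweep" over all column pairs
        changed = false;
        for (size_t p = 0; p + 1 < n; ++p) {
            for (size_t q = p + 1; q < n; ++q) {
                double alpha = 0, beta = 0, gamma = 0;
                for (size_t k = 0; k < A[p].size(); ++k) {
                    alpha += A[p][k] * A[p][k];
                    beta  += A[q][k] * A[q][k];
                    gamma += A[p][k] * A[q][k];
                }
                if (std::fabs(gamma) <= eps * std::sqrt(alpha * beta)) continue;
                changed = true;
                // Jacobi rotation that makes columns p and q orthogonal
                double zeta = (beta - alpha) / (2.0 * gamma);
                double t = std::copysign(1.0, zeta) / (std::fabs(zeta) + std::sqrt(1.0 + zeta * zeta));
                double c = 1.0 / std::sqrt(1.0 + t * t), s = c * t;
                for (size_t k = 0; k < A[p].size(); ++k) {
                    double ap = A[p][k], aq = A[q][k];
                    A[p][k] = c * ap - s * aq;
                    A[q][k] = s * ap + c * aq;
                }
            }
        }
    }
    std::vector<double> sigma(n);
    for (size_t j = 0; j < n; ++j) {     // singular values are the column norms
        double norm = 0;
        for (double v : A[j]) norm += v * v;
        sigma[j] = std::sqrt(norm);
    }
    return sigma;
}

int main() {
    // columns of the small test matrix [[3,0],[4,5]], stored column-wise
    std::vector<std::vector<double>> A = {{3, 4}, {0, 5}};
    for (double s : jacobi_svd(A)) std::printf("%f\n", s);  // ~2.236 and ~6.708 (unsorted)
}
```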

The task proceeds in steps of increasing complexity, each improving the overall design: 1. Analyze the computational characteristics of the provided Jacobi method and analytically determine the time scale for computing SVDs of varied sizes on the given FPGA hardware. 2. Incorporate streaming architectures to compute the SVD of arbitrarily sized matrices using either a one-sided or two-sided Jacobi method. 3. Further improve the design to exploit multiple memory banks and increase memory bandwidth (e.g. HBM). 4. Incorporate a QR decomposition kernel to precondition the matrix and reduce the overall computational complexity (sketched below). 5. Divide the work between multiple FPGA devices in parallel using MPI techniques.
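Step 4 relies on the observation that for a tall m x n matrix one can first factor A = QR and then run the Jacobi SVD only on the small n x n factor R, recovering U as Q times the left singular vectors of R. The sketch below shows the idea with a simple modified Gram-Schmidt QR; the layout and names are illustrative assumptions, and a production kernel would use a Householder- or Givens-based QR.

```cpp
// Sketch of QR preconditioning before the Jacobi SVD (illustrative only).
#include <cmath>
#include <cstdio>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // matrix stored as columns

// Modified Gram-Schmidt: A is m x n with m >= n; produces Q (m x n) and R (n x n).
void mgs_qr(Matrix A, Matrix &Q, Matrix &R) {
    const size_t n = A.size(), m = A[0].size();
    R.assign(n, std::vector<double>(n, 0.0));
    for (size_t j = 0; j < n; ++j) {
        double norm = 0;
        for (size_t k = 0; k < m; ++k) norm += A[j][k] * A[j][k];
        R[j][j] = std::sqrt(norm);
        for (size_t k = 0; k < m; ++k) A[j][k] /= R[j][j];      // normalize column j
        for (size_t i = j + 1; i < n; ++i) {                    // orthogonalize the rest
            double dot = 0;
            for (size_t k = 0; k < m; ++k) dot += A[j][k] * A[i][k];
            R[i][j] = dot;                                      // R entry (row j, column i)
            for (size_t k = 0; k < m; ++k) A[i][k] -= dot * A[j][k];
        }
    }
    Q = A;
}

int main() {
    Matrix A = {{3, 4}, {0, 5}}, Q, R;   // columns of [[3,0],[4,5]]
    mgs_qr(A, Q, R);
    // The Jacobi SVD from the previous sketch would now be applied to R only.
    std::printf("R = [[%g, %g], [%g, %g]]\n", R[0][0], R[1][0], R[0][1], R[1][1]);
}
```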

Main reference paper to get a sense of the topic: [1] The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale by Jack Dongarra

[1] [https://netlib.org/utk/people/JackDongarra/PAPERS/svd-sirev-M111773R.pdf]

3. Implementation of Probabilistic Data Structures in NVIDIA Smart Network Interface Card

Keywords: Probabilistic Data Structures, NVIDIA Smart Network Interface Card

Day-to-Day Advisor: Faeze Faghih

Deterministic data structures (e.g. arrays, lists, sets) become difficult to use when the data does not fit into memory. Probabilistic data structures (e.g. Bloom filters, Count-Min sketches) are alternatives that address this challenge. With deterministic data structures we always get the exact result; with probabilistic data structures we get an approximation of it. The accuracy of these approximate results is determined by a variety of factors, including the data size and the memory dedicated to the data structure.
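To make the idea concrete, the sketch below is a minimal host-side Bloom filter in plain C++; the sizes and the double-hashing scheme are illustrative assumptions, not the design the thesis would implement on the NIC.

```cpp
// Minimal Bloom filter sketch: set membership with false positives but no false negatives.
#include <cstdint>
#include <cstdio>
#include <vector>
#include <functional>

struct BloomFilter {
    std::vector<bool> bits;
    size_t k;                                   // number of hash functions
    BloomFilter(size_t m, size_t k) : bits(m, false), k(k) {}

    // derive k hash values from two std::hash seeds (double hashing)
    size_t idx(uint64_t key, size_t i) const {
        uint64_t h1 = std::hash<uint64_t>{}(key);
        uint64_t h2 = std::hash<uint64_t>{}(key ^ 0x9e3779b97f4a7c15ULL);
        return (h1 + i * h2) % bits.size();
    }
    void insert(uint64_t key) { for (size_t i = 0; i < k; ++i) bits[idx(key, i)] = true; }
    bool maybe_contains(uint64_t key) const {
        for (size_t i = 0; i < k; ++i) if (!bits[idx(key, i)]) return false;
        return true;                            // "possibly present": false positives allowed
    }
};

int main() {
    BloomFilter bf(1 << 16, 4);
    bf.insert(42);
    std::printf("%d %d\n", bf.maybe_contains(42), bf.maybe_contains(7));  // 1, usually 0
}
```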

In this project, you will implement a Bloom filter and a Count-Min sketch on NVIDIA Smart Network Interface Cards (NICs). Using the CPU on the NIC, it is possible to build the data structures over the data passing through the NIC. With this implementation, we will investigate whether the data structures can be built at line rate (i.e. whether the processing on the NIC CPU can keep up with the rate at which data arrives).
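A Count-Min sketch can be prototyped just as compactly; the following C++ illustration estimates key frequencies with a bounded overestimation error, again as a host-side sketch with assumed parameters rather than the SmartNIC implementation the thesis would produce.

```cpp
// Minimal Count-Min sketch: approximate per-key counters that never underestimate.
#include <cstdint>
#include <cstdio>
#include <vector>
#include <functional>
#include <algorithm>

struct CountMin {
    size_t width, depth;
    std::vector<uint32_t> table;                // depth rows of width counters
    CountMin(size_t w, size_t d) : width(w), depth(d), table(w * d, 0) {}

    size_t idx(uint64_t key, size_t row) const {
        // one hash per row via simple seeding; real designs use independent hash functions
        uint64_t h = std::hash<uint64_t>{}(key ^ (0x9e3779b97f4a7c15ULL * (row + 1)));
        return row * width + h % width;
    }
    void add(uint64_t key) { for (size_t r = 0; r < depth; ++r) ++table[idx(key, r)]; }
    uint32_t estimate(uint64_t key) const {     // minimum over rows: overestimates only
        uint32_t m = UINT32_MAX;
        for (size_t r = 0; r < depth; ++r) m = std::min(m, table[idx(key, r)]);
        return m;
    }
};

int main() {
    CountMin cm(1 << 12, 4);
    for (int i = 0; i < 3; ++i) cm.add(42);
    std::printf("%u\n", cm.estimate(42));       // >= 3, typically exactly 3
}
```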

After a hands-on introduction to the NVIDIA NIC and probabilistic data structures, you will implement the Bloom filter and Count-Min sketch on the NIC CPU. In the next step, you will use this implementation to analyze sample data. As an optional step, you could also look into other probabilistic data structures to implement on the NIC.

4. RISC-V Case Study: Evaluate HLS Code Flexibility and Reusability

Keywords: High Level Synthesis, RISC-V Architecture

Topic in collaboration with industry: Synogate

High-Level Synthesis (HLS) allows turning almost-regular software code into digital circuits. It promises higher productivity than lower-level approaches to circuit design, and it stands to reason that this also extends to higher code flexibility and reusability.

In this project, you will explore and derive design practices that increase code flexibility and reusability in HLS. To this end, you will build two minimal implementations of the RISC-V processor architecture, specifically the rv32i integer base instruction set. One implementation is to be a size-optimized, single-adder iterative design, while the other is a pipelined design. The goal is to maximize code reuse and maintainability between both implementations while keeping them flexible for future, hypothetical feature additions.
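To illustrate the kind of sharing the project aims for, the sketch below describes the rv32i ALU as a single C++ function that both the iterative and the pipelined top level could instantiate in an HLS flow; only the control logic and pipeline registers would differ between the two designs. The code is an illustrative fragment, not part of the collaboration partner's codebase.

```cpp
// One shared rv32i ALU description, reusable from both an iterative and a pipelined core.
#include <cstdint>
#include <cstdio>

// Operation selected by funct3 plus the "alternate" bit (instruction bit 30)
// that distinguishes ADD/SUB and SRL/SRA in the rv32i base ISA.
uint32_t alu(uint32_t a, uint32_t b, uint32_t funct3, bool alt) {
    switch (funct3) {
        case 0x0: return alt ? a - b : a + b;                       // SUB / ADD
        case 0x1: return a << (b & 31);                             // SLL
        case 0x2: return (int32_t)a < (int32_t)b;                   // SLT
        case 0x3: return a < b;                                     // SLTU
        case 0x4: return a ^ b;                                     // XOR
        case 0x5: return alt ? (uint32_t)((int32_t)a >> (b & 31))   // SRA
                             : a >> (b & 31);                       // SRL
        case 0x6: return a | b;                                     // OR
        default:  return a & b;                                     // AND
    }
}

// Both cores would call alu() from their execute stage; only sequencing differs.
int main() {
    std::printf("%u\n", alu(7, 5, 0x0, true));            // 7 - 5 = 2
    std::printf("%u\n", alu(3, 9, 0x2, false));           // 3 < 9 (signed) -> 1
}
```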


Looking for a sample of previous thesis topics?

[List of topics in SoSe 22]

~~~February 2023~~~