Internally funded project
Acronym: FST
Start date : 19.09.2022
With the ongoing rise in global data volumes, database compression is becoming increasingly relevant. While the compression of numeric data types has been extensively researched, the compression of strings has only recently received renewed scientific attention.
A promising approach to string compression is the use of symbol tables, where recurring substrings within a database are substituted with short codes. A corresponding table enables the smooth reconstruction of the original data. This method is distinguished by short compression and decompression times, although the compression rate heavily depends on the quality of the symbol table.
The research project FST focuses on the creation of optimized symbol tables to maximize the compression rate. For this purpose the eponymous Frequent-Substring Trees are constructed, a trie-like data structure that maps all potential table entries and enables the identification of optimal entries through the use of metadata.
The primary objective of the research project is to increase the compression rate of string compression methods without significantly affecting the compression and decompression times.