Generation of Symbol Tables for String Compression with Frequent-Substring Trees (FST)

Internally funded project


Acronym: FST

Start date : 19.09.2022


Project details

Scientific Abstract

With the ongoing rise in global data volumes, database compression is becoming increasingly relevant. While the compression of numeric data types has been extensively researched, the compression of strings has only recently received renewed scientific attention.

A promising approach to string compression is the use of symbol tables, where recurring substrings within a database are substituted with short codes. A corresponding table enables the smooth reconstruction of the original data. This method is distinguished by short compression and decompression times, although the compression rate heavily depends on the quality of the symbol table.

The research project FST focuses on the creation of optimized symbol tables to maximize the compression rate. For this purpose the eponymous Frequent-Substring Trees are constructed, a trie-like data structure that maps all potential table entries and enables the identification of optimal entries through the use of metadata.

The primary objective of the research project is to increase the compression rate of string compression methods without significantly affecting the compression and decompression times.

Involved:

Contributing FAU Organisations:

Research Areas