Near-Data Query Processing on Heterogeneous FPGA-based Systems

Becher A (2022)

Publication Language: English

Publication Type: Thesis

Publication year: 2022

URI: https://nbn-resolving.org/urn:nbn:de:bvb:29-opus4-189289

Abstract

Query processing is a traditional yet still active field of research. Its significance derives from the growing amount of data created and processed every day and from the opportunities that analyzing this data provides. In today’s world, entire businesses are built on sophisticated data processing capabilities. However, as data volumes grow, processing them becomes increasingly challenging, not only because of the time and resources required but also because of the energy costs. Consequently, researchers have broadened the range of architectures investigated for query processing beyond traditional processor-based systems. Next to programmable graphics processing units (GPUs), field-programmable gate arrays (FPGAs) have attracted great interest due to their unique features. FPGAs not only allow the construction of highly optimized hardware circuits for specific tasks but also enable the hardware to be adapted to the task at runtime. Hence, many researchers have presented proposals to exploit the features provided by FPGAs. Although the proposed systems can achieve high throughput and efficiency in general, they often cannot accelerate queries that were not considered during their design. Performance and efficiency are best gained through specialization, and thus a system should adapt to an incoming, previously unknown query. This is possible with FPGAs because they can be reconfigured fully or partially at runtime. However, this comes at the cost of high startup times, as the FPGA has to be configured for the query before the query can be executed. Furthermore, it is almost impossible to generate hardware configurations for every possible query.

This thesis introduces an innovative FPGA-based near-data processing system able to process a wide variety of queries at I/O rate (line rate). It is based on reconfigurable and parametrizable accelerators. The accelerators are composed of parametrizable modules from a library. These modules do not merely implement a specific operator for a specific type but are optimized to implement operators for multiple types or even multiple functions without a drastic increase in resources. Another contribution of this thesis is the concept of optimistic query processing for demanding operators such as the join and the regular expression matching operator. It is based on the idea of approximately filtering as much data as possible in hardware without removing tuples that should be kept. The resulting, often reduced, intermediate table is guaranteed to be a superset of the result of the accurate filter operation. A software-based operator implementation can then be applied to this intermediate table, which contains fewer tuples, to finalize the operation. As an example, the implementation of a module for regular expression matching is presented. Equipped with a parameter sequencer, accelerators assembled from this library can implement a greater variety of queries by setting the parameters of the modules according to the query to be processed. However, the schema in which the tables are stored also influences the design of the accelerator and may therefore limit the types of queries it can implement. To address this, a hardware unit called ReOrder is introduced. It decouples the table schema and storage layout from the accelerator, enabling all accelerators to be used on every table with row-oriented and column-oriented storage layouts. Even though the developed accelerators can implement a wide variety of queries, no one-size-fits-all accelerator is possible. Consequently, the system is designed to concatenate multiple partially reconfigurable (i.e., exchangeable) accelerators without a decrease in tuple throughput. This further increases the types of queries that can be processed.
As accelerators might not use all available resources within a partially reconfigurable FPGA region, the idea of in situ statistics generation is proposed. In situ statistics modules can use the free resources to gather information on the table being processed by an accelerator at no additional cost in time. Complementary to the hardware-related parts mentioned above, control software that manages the execution of a query on such a system is presented as well. Starting from the basic components needed to execute queries on the platform, the description goes into depth on the particularities of such an FPGA-based query execution system. In particular, the query placement problem, that is, the problem of finding a query-specific configuration of the system’s hardware for an incoming query, is formulated. In addition, the challenges of obtaining an optimal placement are discussed and exemplified using the problem of buffer assignment. Afterwards, the parameters of the modules have to be generated. In this regard, an algorithm to obtain the parameters for a ReOrder unit is presented and evaluated in depth. Additionally, considerations about parameter generators for a histogram module and the optimistic regular expression matching module are provided.

Finally, an implemented prototype of the system, called the ReProVide unit, has been evaluated. It provides I/O-rate processing of simple as well as complex queries. Compared to a software-based in-memory database system executed on an ARM processor, queries were executed 19.9× faster on the prototype on average. When the software system was executed on an x86 processor instead, comparable execution times were observed. This means that the prototype system, storing the tables on two solid-state drives, was able to process queries as fast as an x86 system holding the tables in memory. Furthermore, the prototype is shown to be very energy-efficient, consuming on average less than 25% of the energy consumed by the x86 system.
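The optimistic query processing idea from the abstract (an approximate filter that never drops a qualifying tuple, followed by an exact software operator on the reduced table) can be illustrated with a minimal software-only sketch. Everything in it is an assumption made for illustration: the function names, the sample rows, and the substring check standing in for the approximate hardware filter are hypothetical and are not taken from the thesis.

```python
import re

def optimistic_prefilter(rows, required_substrings):
    """Approximate filter (hypothetical stand-in for the hardware stage).

    Keeps every row that contains all required substrings. This is only a
    necessary condition for a match, so false positives may pass, but no
    row that truly matches is ever removed.
    """
    return [r for r in rows if all(s in r for s in required_substrings)]

def exact_filter(rows, pattern):
    """Exact software operator: full regular expression matching."""
    compiled = re.compile(pattern)
    return [r for r in rows if compiled.search(r)]

# Illustrative data, not from the thesis.
rows = [
    "error: disk 12 full",
    "error: disk full",
    "warning: disk 7 full",
    "disk error 3",
    "ok",
]
pattern = r"error: disk \d+ full"
# Literals that every match of the pattern must contain.
required = ["error", "disk", "full"]

candidates = optimistic_prefilter(rows, required)  # superset of the result
result = exact_filter(candidates, pattern)         # finalized in software

print(len(rows), len(candidates), len(result))
```

In this sketch the exact operator only has to inspect the candidate rows rather than the full table, and its output equals that of running the exact filter on all rows, because the prefilter tests a condition that every true match satisfies.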

Authors with CRIS profile

Andreas Becher, Lehrstuhl für Informatik 12 (Hardware-Software-Co-Design)

Related research project(s)

Query Optimisation and Near-Data Processing on Reconfigurable SoCs for Big Data Analysis (ReProVide)
DFG-Schwerpunktprogramm (SPP) 2037 - Skalierbares Datenmanagement für zukünftige Hardware (DFG Priority Programme 2037 - Scalable Data Management for Future Hardware)
Aug. 28, 2017 - March 1, 2023

How to cite

APA:

Becher, A. (2022). Near-Data Query Processing on Heterogeneous FPGA-based Systems (Dissertation).

MLA:

Becher, Andreas. Near-Data Query Processing on Heterogeneous FPGA-based Systems. Dissertation, 2022.
