Query Optimisation and Near-Data Processing on Reconfigurable SoCs for Big Data Analysis

Third Party Funds Group - Sub project

Overall project details

Overall project: DFG Priority Programme (SPP) 2037 - Scalable Data Management for Future Hardware

Project Details

Project leader:
Dr.-Ing. Stefan Wildermann
Prof. Dr.-Ing. Jürgen Teich
Prof. Dr.-Ing. Klaus Meyer-Wegener

Project members:
Andreas Becher
Lekshmi Beena Gopalakrishnan Nair

Contributing FAU Organisations:
Chair for Computer Science 6 (Data Management)
Chair for Computer Science 12 (Hardware-Software-Co-Design)

Funding source: DFG / Schwerpunktprogramm (SPP)
Acronym: ReProVide
Start date: 28/08/2017
End date: 31/08/2020

Research Fields

Database Systems
Chair for Computer Science 6 (Data Management)
Reconfigurable Computing
Chair for Computer Science 12 (Hardware-Software-Co-Design)

Abstract (technical / expert description):

The goal of this project is to provide novel hardware and optimisation techniques for scalable, high-performance processing of Big Data. We particularly target huge data sets with flexible schemata (row-oriented, column-oriented, document-oriented, irregular, and/or non-indexed) as well as data streams, as found in click-stream analysis, in enterprise sources such as emails, software logs and discussion-forum archives, and in data produced by sensors in IoT and Industrie 4.0. In this realm, the project investigates the potential of hardware-reconfigurable, FPGA-based SoCs for near-data processing, where computations are pushed towards such heterogeneous data sources. Based on FPGA technology, and in particular on its dynamic reconfiguration, we propose a generic architecture called ReProVide for low-cost processing of database queries.

The concepts are intended to enable the integration of FPGA-based acceleration support into available SQL, NoSQL, and in-memory database management systems (DBMSs) as well as stream-processing frameworks. Our intention is to attach volatile and non-volatile storage directly to ReProVide nodes, which will not only contain cleansed and integrated data sets, but can also be used for temporarily or persistently storing uncleaned data from new data sources and data streams.

Our FPGA-based SoC is developed by the Chair for Computer Science 12. It

  • makes use of hardware reconfiguration to adapt datapaths and accelerators so that different OLAP and data-mining operators can be processed on data from such heterogeneous data sources,

  • provides management techniques to generate local meta-data, indexes, and statistics of such data sources for optimised data processing, as well as

  • offers schema-on-read capabilities for the DBMS accessing the SoC.

While support for irregular data (e.g., graph processing) is not the main focus of our research, we provide a design methodology that is generic and extensible by user-defined functions and data schemata.
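The schema-on-read and local-statistics capabilities listed above can be illustrated with a minimal software sketch. It is not the ReProVide implementation: the names `ColumnStats`, `read_with_schema` and `collect_stats`, the CSV input and the choice of min/max/count statistics are all assumptions made for illustration. The raw data is stored uninterpreted, and a schema is applied only when the data is read.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Optional


@dataclass
class ColumnStats:
    """Hypothetical local statistics a near-data node might keep per column."""
    count: int = 0
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def update(self, value: float) -> None:
        self.count += 1
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)


def read_with_schema(raw: str, schema: Dict[str, Callable]) -> Iterator[dict]:
    """Schema-on-read: parse raw CSV text and cast fields only at access time."""
    for record in csv.DictReader(io.StringIO(raw)):
        yield {name: cast(record[name]) for name, cast in schema.items()}


def collect_stats(rows: Iterable[dict], column: str) -> ColumnStats:
    """Build statistics for one numeric column while scanning the rows."""
    stats = ColumnStats()
    for row in rows:
        stats.update(row[column])
    return stats


# Assumed example data: an uncleaned sensor log stored as plain text.
RAW_LOG = "sensor,temperature\na,21.5\nb,19.0\na,23.25\n"
schema = {"sensor": str, "temperature": float}

stats = collect_stats(read_with_schema(RAW_LOG, schema), "temperature")
print(stats)
```

A different schema (for example, reading only the `sensor` column) can be applied to the same raw text without rewriting the stored data, which is the point of deferring schema interpretation to read time.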

Integrating such architectures, which come with their own local optimiser, into a DBMS requires novel global query optimisation techniques based on concepts known from multi-database research. This is the task of the Chair for Computer Science 6. While the local optimiser builds statistics on its local data, the global optimiser has to access these data and information from the near-data processor. Based on them, global query optimisation decides which operations are worthwhile to assign to ReProVide SoCs and which are not. It is vital that the optimiser has enough knowledge to engage ReProVide in query processing whenever there is a benefit. This requires functional knowledge (which data and which operators can be offered) as well as non-functional knowledge (e.g., cost estimates). In this project, we provide an extensible interface over which the global optimiser can hand over the query execution plan (QEP) to be processed on the ReProVide system and over which ReProVide can return the query result. The interface will also allow both optimisers to exchange hints bidirectionally in order to improve their respective optimisation.
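The push-down decision described above can be sketched as a simple cost comparison. This is only an illustration under stated assumptions: the cost model (a per-row processing cost, a per-row transfer cost and a one-off reconfiguration cost) and all names such as `Operator`, `NodeCapabilities` and `should_push_down` are hypothetical and do not describe the project's actual optimiser or interface. The functional knowledge is modelled as the set of supported operators, the non-functional knowledge as the cost figures.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Operator:
    name: str           # e.g. "filter"
    input_rows: int     # rows the operator has to read
    selectivity: float  # fraction of rows that remain after the operator


@dataclass
class NodeCapabilities:
    # Functional knowledge: which operators the node offers,
    # mapped to an assumed processing cost per row on the node.
    supported: Dict[str, float]
    # Non-functional knowledge (assumed cost figures, arbitrary units).
    transfer_cost_per_row: float
    reconfiguration_cost: float


def host_cost(op: Operator, host_cost_per_row: float, caps: NodeCapabilities) -> float:
    """Host execution: all input rows are transferred, then processed on the host."""
    return op.input_rows * (caps.transfer_cost_per_row + host_cost_per_row)


def pushdown_cost(op: Operator, caps: NodeCapabilities) -> Optional[float]:
    """Near-data execution: reconfigure, process locally, transfer only the result."""
    if op.name not in caps.supported:
        return None  # the node cannot offer this operator at all
    output_rows = op.input_rows * op.selectivity
    return (caps.reconfiguration_cost
            + op.input_rows * caps.supported[op.name]
            + output_rows * caps.transfer_cost_per_row)


def should_push_down(op: Operator, host_cost_per_row: float, caps: NodeCapabilities) -> bool:
    """Assign the operator to the node only if it is supported and cheaper there."""
    near = pushdown_cost(op, caps)
    return near is not None and near < host_cost(op, host_cost_per_row, caps)


caps = NodeCapabilities(supported={"filter": 0.2},
                        transfer_cost_per_row=1.0,
                        reconfiguration_cost=1000.0)

large_filter = Operator("filter", input_rows=10_000, selectivity=0.1)
small_filter = Operator("filter", input_rows=100, selectivity=0.9)
join = Operator("join", input_rows=10_000, selectivity=0.5)

for op in (large_filter, small_filter, join):
    print(op.name, op.input_rows, should_push_down(op, 0.5, caps))
```

In this toy model, a selective filter over many rows is worth pushing down because the saved data transfer outweighs the reconfiguration cost, whereas a small scan or an unsupported operator stays on the host.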

External Partners

Technische Universität Ilmenau
Technische Universität München (TUM)
Technische Universität Dresden
Technische Universität Berlin
Otto-von-Guericke-Universität Magdeburg


Publications

Becher, A., Herrmann, A., Wildermann, S., & Teich, J. (2019). ReProVide: Towards Utilizing Heterogeneous Partially Reconfigurable Architectures for Near-Memory Data Processing. In Gesellschaft für Informatik, Bonn (Eds.), Proceedings of the 1st Workshop on Novel Data Management Ideas on Heterogeneous (Co-)Processors (NoDMC) (pp. 51-70). Universität Rostock, DE: Bonn: Gesellschaft für Informatik.
Becher, A., Beena Gopalakrishnan Nair, L., Broneske, D., Drewes, T., Gurumurthy, B., Meyer-Wegener, K.,... Wildermann, S. (2018). Integration of FPGAs in Database Management Systems: Challenges and Opportunities. Datenbank-Spektrum. https://dx.doi.org/10.1007/s13222-018-0294-9
Becher, A., Wildermann, S., & Teich, J. (2018). Optimistic Regular Expression Matching on FPGAs for Near-Data Processing. In Proceedings of the Data Management on New Hardware (DaMoN). Houston, Texas, US: ACM.

Last updated on 2019-02-28 at 12:01