Query-Driven Data Profiling with OCEANProfile

Wahl AM, Sauerhammer C, Schwab P, Herbst S, Lenz R (2018)


Publication Language: English

Publication Type: Conference contribution

Publication year: 2018

Publisher: ACM

Conference Proceedings Title: Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics

Event location: Rio de Janeiro, BR

ISBN: 978-1-4503-6607-6

DOI: 10.1145/3242153.3242154

Abstract

Complex data analysis scenarios often require discovering and combining multiple data sources. Data scientists usually formulate a series of SQL queries building on each other, also called a session, to iteratively derive results. However, due to a lack of familiarity with data sources or the complexity of query results, it can be a hard task to decide on the next query iteration solely based on the results of the last one.

While existing approaches provide mechanisms to assess the results of a specific query, support for analyzing results in the context of the respective session remains mostly absent. Such approaches also do not integrate seamlessly with established tools and workflows.

To overcome these problems, we introduce OCEANProfile, a framework for session-based profiling of query results. Query results are intercepted at driver level and streamed into our framework for automated data profiling. Result profiles can be compared with those of previous queries and visualized in a companion app compatible with existing analysis tools. Visualizations are automatically ranked according to their usefulness in the context of the respective session.
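The interception and comparison steps described in the abstract can be illustrated with a minimal sketch. It is not OCEANProfile's actual implementation or API: the names `ProfilingSession`, `profile_result`, and `compare_profiles` are hypothetical, the profile is limited to row, null, and distinct counts, and results are fetched in full from an in-memory SQLite database rather than intercepted at driver level and streamed.

```python
import sqlite3


def profile_result(columns, rows):
    """Compute a simple per-column profile (assumed metrics, not the paper's)."""
    profile = {}
    for i, name in enumerate(columns):
        values = [row[i] for row in rows]
        non_null = [v for v in values if v is not None]
        profile[name] = {
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
    return profile


def compare_profiles(previous, current):
    """Report columns added, removed, or changed between two profiles."""
    shared = set(previous) & set(current)
    return {
        "added": sorted(set(current) - set(previous)),
        "removed": sorted(set(previous) - set(current)),
        "changed": sorted(c for c in shared if previous[c] != current[c]),
    }


class ProfilingSession:
    """Hypothetical wrapper: runs queries and profiles each result as a side effect."""

    def __init__(self, connection):
        self.connection = connection
        self.profiles = []  # one profile per query in the session
        self.diffs = []     # comparison of each profile with its predecessor

    def execute(self, sql, params=()):
        cursor = self.connection.execute(sql, params)
        columns = [d[0] for d in cursor.description]
        rows = cursor.fetchall()  # simplification: no streaming
        profile = profile_result(columns, rows)
        if self.profiles:
            self.diffs.append(compare_profiles(self.profiles[-1], profile))
        self.profiles.append(profile)
        return rows  # the caller still receives the ordinary query result


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "a", 10.0), (2, "b", None), (3, "a", 5.0)],
    )
    session = ProfilingSession(conn)
    session.execute("SELECT id, customer, amount FROM orders")
    session.execute("SELECT customer, amount FROM orders WHERE amount IS NOT NULL")
    return session
```

Calling `demo()` runs a two-query session. The second query drops the `id` column and filters out the null amount, so the recorded comparison lists `id` as removed and both remaining columns as changed.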

How to cite

APA:

Wahl, A.M., Sauerhammer, C., Schwab, P., Herbst, S., & Lenz, R. (2018). Query-Driven Data Profiling with OCEANProfile. In Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics. Rio de Janeiro, BR: ACM.

MLA:

Wahl, Andreas Maximilian, et al. "Query-Driven Data Profiling with OCEANProfile." Proceedings of the Twelfth International Workshop on Real-Time Business Intelligence and Analytics (BIRTE 2018), Rio de Janeiro, ACM, 2018.
