ParSeMiS - the Parallel and Sequential Graph Mining Suite (ParSeMiS)

Internally funded project

Acronym: ParSeMiS

Start date : 01.05.2006

End date : 31.12.2010

Overview

Project details

Scientific Abstract

The ParSeMiS project (Parallel and Sequential Graph Mining Suite) searches for frequent, interesting substructures in graph databases. This task is becoming increasingly popular because science and commerce need to detect, store, and process complex relations in huge graph structures.
For huge data that cannot be worked on manually, algorithms are needed that detect interesting correlations. Since in general the problem is NP-hard and requires huge amounts of computation time and memory, parallel or specialized algorithms and heuristics are required that can perform the search within time boundaries and memory limits.
Our target is to provide an efficient and flexible tool for searching in arbitrary graph data, to improve the adaption to new application areas, and to simplify and unify the design of new mining algorithms.

Aufbauend auf den Ergebnissen des Projekts ParMol2 wurden 2006/2007 folgende Ziele erreicht:

Restrukturierung und Neudesign der gewachsenen ParMol-Strukturen zu einer flexiblen Graphbibiliothek.
Ergänzung des objekt-orientierten Graphdesigns zu kompakteren, zur Parallelisierung besser geeigneten Datenstrukturen.
Überführung und Zerlegung der Algorithmen gSpan und Gaston in die neuen Strukturen und Einbau von Erweiterungen für das aktuelle Anwendungsgebiet ”Prozedurale Abstraktion”.
Entwurf und Implementierung eines neuen Algorithmus zur Suche in gerichteten azyklischen Graphen (DAGs) als Spezialisierung für die Prozedurale Abstraktion.
Implementierung einer angepassten grafischen Anzeige für DAGs.

In 2008, the following goals have been achieved:

Documentation and publication of the source code to enlarge the user base of the project,
Implementation of a specialized graph layout for DAGs,
Restructuring of the graphical user interface, and
Added support for clusters of multi-core nodes.

In 2009, the following goals have been attacked:

Optimizations for embedding-based frequency mining: A more detailed look at the pruning-related properties of the maximum-clique-test resulted in a huge run-time improvement, particularly for low frequencies that are of special interest for applications. This is achieved by an early detection during the NP-complete test that decides for a fragment whether can become frequent at all.
Improved distribution for clusters of multi-core nodes: Co-location of threads in the same memory speeds up parallel search. First results have been seen in 2009, more are expected in 2010.

In 2010, the distributed stack implementations have also been tested on other algorithms and data structures.

Involved:

Michael Philippsen Project Leader Marc Wörlein Project Member Alexander Dreweke Project Member Tobias Werth Project Member

Contributing FAU Organisations:

Lehrstuhl für Informatik 2 (Programmiersysteme)