ParSeMiS - the Parallel and Sequential Graph Mining Suite (ParSeMiS)
Internally funded project
Acronym:
ParSeMiS
Start date :
01.05.2006
End date :
31.12.2010
Project details
Scientific Abstract
The ParSeMiS project (Parallel and Sequential Graph Mining Suite) searches for frequent, interesting substructures in graph databases. This task is becoming increasingly popular because science and commerce need to detect, store, and process complex relations in huge graph structures.
For huge data that cannot be worked on manually, algorithms are needed that detect interesting correlations. Since in general the problem is NP-hard and requires huge amounts of computation time and memory, parallel or specialized algorithms and heuristics are required that can perform the search within time boundaries and memory limits.
Our target is to provide an efficient and flexible tool for searching in arbitrary graph data, to improve the adaption to new application areas, and to simplify and unify the design of new mining algorithms.
Aufbauend auf den Ergebnissen des Projekts ParMol2 wurden 2006/2007 folgende Ziele erreicht:
- Restrukturierung und Neudesign der gewachsenen ParMol-Strukturen zu einer flexiblen Graphbibiliothek.
- Ergänzung des objekt-orientierten Graphdesigns zu kompakteren, zur Parallelisierung besser geeigneten Datenstrukturen.
- Überführung
und Zerlegung der Algorithmen gSpan und Gaston in die neuen Strukturen
und Einbau von Erweiterungen für das aktuelle Anwendungsgebiet
”Prozedurale Abstraktion”.
- Entwurf und Implementierung eines
neuen Algorithmus zur Suche in gerichteten azyklischen Graphen (DAGs)
als Spezialisierung für die Prozedurale Abstraktion.
- Implementierung einer angepassten grafischen Anzeige für DAGs.
In 2008, the following goals have been achieved:
- Documentation and publication of the source code to enlarge the user base of the project,
- Implementation of a specialized graph layout for DAGs,
- Restructuring of the graphical user interface, and
- Added support for clusters of multi-core nodes.
In 2009, the following goals have been attacked:
- Optimizations for embedding-based frequency mining: A more detailed look at the pruning-related properties of the maximum-clique-test resulted in a huge run-time improvement, particularly for low frequencies that are of special interest for applications. This is achieved by an early detection during the NP-complete test that decides for a fragment whether can become frequent at all.
- Improved distribution for clusters of multi-core nodes: Co-location of threads in the same memory speeds up parallel search. First results have been seen in 2009, more are expected in 2010.
In 2010, the distributed stack implementations have also been tested on other algorithms and data structures.
Involved:
Contributing FAU Organisations: