Text Mining in Program Code

Dreweke A, Fischer I, Werth T, Wörlein M (2009)

Publication Language: English

Publication Type: Book chapter / Article in edited volumes

Publication year: 2009

Publisher: Idea Group Publishing

Edited Volumes: Handbook of Research on Text and Web Mining Technologies

City/Town: Hershey

Pages Range: 626-645

ISBN: 1599049902

URI: https://cs2-gitlab.cs.fau.de/i2public/publications/-/blob/master/DFWW09.pdf

DOI: 10.4018/978-1-59904-990-8.ch035

Abstract

Searching for frequent fragments in a database of text is a well-known problem. Program code, such as C++ source or machine code for embedded systems, is a special kind of text. Filtering out duplicates in large software projects leads to more understandable programs and helps avoid mistakes when reengineering them. On embedded systems, the size of the machine code is an important issue, so duplicates must be avoided to keep programs small. This chapter covers several approaches for finding code duplicates, based either on the text representation of the code or on graphs representing the data and control flow of the program together with graph mining algorithms.

Authors with CRIS profile

Alexander Dreweke, Lehrstuhl für Informatik 2 (Programmiersysteme)

Tobias Werth, Lehrstuhl für Informatik 2 (Programmiersysteme)

Marc Wörlein, Lehrstuhl für Informatik 2 (Programmiersysteme)

How to cite

APA:

Dreweke, A., Fischer, I., Werth, T., & Wörlein, M. (2009). Text Mining in Program Code. In M. Song & Y.-F. B. Wu (Eds.), Handbook of Research on Text and Web Mining Technologies (pp. 626-645). Hershey: Idea Group Publishing.

MLA:

Dreweke, Alexander, et al. "Text Mining in Program Code." Handbook of Research on Text and Web Mining Technologies. Ed. Min Song and Yi-Fang Brook Wu. Hershey: Idea Group Publishing, 2009. 626-645.
