Learning Code Transformations from Repositories

Dotzler G (2018)

Publication Language: English

Publication Type: Thesis

Publication year: 2018

Publisher: FAU University Press

ISBN: 978-3961471416

DOI: 10.25593/978-3-96147-142-3


Library updates, program errors, and maintenance tasks in general force developers to apply the same code change to different locations within their projects. If the locations differ greatly from each other, identifying all of them is very time-consuming. Even with sufficient time, there is no guarantee that a manual search reveals all locations. If the change is critical, each missed location can have severe consequences. Applying the code change manually to each location can also become tedious. If the change is large, developers have to execute several transformation steps for each code location. In the worst case, they forget a required step and thus introduce new errors into their projects.
Several example-based code recommendation systems support developers in executing such repetitive code changes. These tools use one or more code changes as input to generate a pattern. With such a pattern, they search for code locations that are similar to the input examples. If the tools find a suitable location, they apply the pattern and present the transformed code as a recommendation to the developer. Using such tools automates the search and requires less time. The tools also perform the required code transformation automatically and thus cause fewer errors than manual execution. However, current state-of-the-art tools have two drawbacks. First, the tools often generate recommendations that are syntactically or semantically wrong. Developers cannot copy such recommendations directly into their projects, and adapting them means additional manual work. Second, developers have to provide the input examples. This creates an additional workload for developers and often makes the use of such tools too costly.
To address the first drawback, this thesis presents the recommendation system ARES. It uses a pattern design that leads to more accurate recommendations than previous approaches. ARES achieves this by preserving variations in the input examples in more detail and by handling code movements better. The evaluation of ARES uses historical data from code repositories. This makes it possible to compare the generated recommendations with real code changes. In this scenario, ARES achieves an average accuracy of 96%.
With the tool C3, this thesis also presents a solution to the second drawback. ARES requires groups of two or more similar code changes as input. C3 extracts such groups from code repositories. To identify the groups, C3 supports two different syntactic similarity metrics and two different clustering algorithms, all of which are specialized for code changes.
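The idea of grouping similar code changes can be illustrated with a minimal sketch. It does not reproduce C3's metrics or clustering algorithms: the Jaccard similarity, the single-link grouping, the threshold, and the token format of the changes are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token collections (illustrative metric, not C3's)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def group_changes(changes, threshold=0.5):
    """Group changes whose pairwise similarity reaches the threshold.

    `changes` maps a hypothetical change id to the tokens describing that
    change. Groups are built by single-link clustering with a union-find
    structure; only groups with two or more members are returned.
    """
    parent = {cid: cid for cid in changes}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(changes, 2):
        if jaccard(changes[a], changes[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for cid in changes:
        groups.setdefault(find(cid), []).append(cid)
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)


# Hypothetical changes, each described by made-up edit tokens.
changes = {
    "c1": ["insert:null_check", "move:call", "update:name"],
    "c2": ["insert:null_check", "move:call"],
    "c3": ["delete:loop", "insert:stream"],
    "c4": ["insert:null_check", "update:name"],
}
print(group_changes(changes))
```

Dropping single-member groups mirrors the stated requirement that a group contains two or more similar code changes.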
ARES, C3, and similar tools rely on lists of edit operations to express code changes. However, creating compact (i.e., short) lists of edit operations from data in repositories is difficult. As previous approaches produce lists that are too long for ARES and C3, this thesis presents a novel tree differencing approach called MTDIFF. It includes six general optimizations that are also compatible with other tree differencing approaches. The evaluation shows that the optimizations shorten the edit operation lists of other state-of-the-art approaches. Compared to these optimized approaches, MTDIFF shortens the lists even further.
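Why the length of an edit operation list depends on the differencing strategy can be shown with a small sketch. It is neither MTDIFF nor one of its six optimizations: the tree encoding as `(label, children)` tuples, the positional comparison in `diff`, and the `merge_moves` pass are assumptions made for illustration, and each inserted or deleted subtree is counted as a single operation for simplicity.

```python
def diff(old, new, path="root"):
    """Naive top-down diff that compares children position by position."""
    ops = []
    old_label, old_children = old
    new_label, new_children = new
    if old_label != new_label:
        ops.append(("update", path, old_label, new_label))
    for i in range(max(len(old_children), len(new_children))):
        child_path = f"{path}/{i}"
        if i >= len(old_children):
            ops.append(("insert", child_path, new_children[i]))
        elif i >= len(new_children):
            ops.append(("delete", child_path, old_children[i]))
        else:
            ops.extend(diff(old_children[i], new_children[i], child_path))
    return ops


def merge_moves(ops):
    """Replace a delete and an insert of an identical subtree by one move."""
    paired = {}    # index of a delete -> index of its matching insert
    taken = set()  # insert indices already consumed by a move
    for d, op in enumerate(ops):
        if op[0] != "delete":
            continue
        for i, other in enumerate(ops):
            if other[0] == "insert" and i not in taken and other[2] == op[2]:
                paired[d] = i
                taken.add(i)
                break
    result = []
    for k, op in enumerate(ops):
        if k in taken:
            continue
        if k in paired:
            result.append(("move", op[1], ops[paired[k]][1]))
        else:
            result.append(op)
    return result


# Hypothetical syntax trees: the call moves from the `if` to the `return`.
old = ("block", [("if", [("call", [])]), ("return", [])])
new = ("block", [("if", []), ("return", [("call", [])])])

naive = diff(old, new)
compact = merge_moves(naive)
print(len(naive), naive)
print(len(compact), compact)
```

In this example the naive list holds a delete and an insert, while the merged list holds a single move from `root/0/0` to `root/1/0`.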

How to cite


Dotzler, G. (2018). Learning Code Transformations from Repositories (Dissertation).


Dotzler, Georg. Learning Code Transformations from Repositories. Dissertation, FAU University Press, 2018.