SeSaMe: A Data Set of Semantically Similar Java Methods

Kamp M, Kreutzer P, Philippsen M (2019)


Publication Language: English

Publication Type: Conference contribution, Original article

Publication year: 2019

Publisher: IEEE Press

City/Town: Piscataway, NJ, USA

Page Range: 529-533

Conference Proceedings Title: Proceedings of the 16th International Conference on Mining Software Repositories (MSR 2019)

Event location: Montréal, QC, Canada

ISBN: 978-1-7281-3412-3

URI: https://i2git.cs.fau.de/i2public/publications/-/raw/master/MSR19.pdf

DOI: 10.1109/MSR.2019.00079

Abstract

In the past, techniques for detecting similarly behaving code fragments were often evaluated only with small, artificial oracles or with code originating from programming competitions. Such code fragments differ substantially from production code.

To enable more realistic evaluations, this paper presents SeSaMe, a data set of method pairs that are classified according to their semantic similarity. We applied text similarity measures to JavaDoc comments mined from 11 open-source repositories and manually classified a selection of 857 pairs.

How to cite

APA:

Kamp, M., Kreutzer, P., & Philippsen, M. (2019). SeSaMe: A Data Set of Semantically Similar Java Methods. In Proceedings of the 16th International Conference on Mining Software Repositories (MSR 2019) (pp. 529-533). Montréal, QC, Canada. Piscataway, NJ, USA: IEEE Press.

MLA:

Kamp, Marius, Patrick Kreutzer, and Michael Philippsen. "SeSaMe: A Data Set of Semantically Similar Java Methods." Proceedings of the 16th International Conference on Mining Software Repositories (MSR 2019), Montréal, QC, Canada. Piscataway, NJ, USA: IEEE Press, 2019. 529-533.
