Understanding and explaining Delta measures for authorship attribution

Journal article
(Original article)


Publication details

Authors: Evert S, Proisl T, Jannidis F, Reger I, Pielström S, Schöch C, Vitt T
Journal: Digital Scholarship in the Humanities
Year of publication: 2017
Volume: 32
Issue: suppl_2
Page range: ii4–ii16
ISSN: 2055-7671
eISSN: 2055-768X
Language: English


Abstract


This article builds on a mathematical explanation of one of the most prominent stylometric measures, Burrows’s Delta (and its variants), to understand and explain how it works. Starting with the conceptual separation between feature selection, feature scaling, and distance measures, we have designed a series of controlled experiments in which we used the kind of feature scaling (various types of standardization and normalization) and the type of distance measure (notably Manhattan, Euclidean, and Cosine) as independent variables and the correct authorship attributions as the dependent variable indicative of the performance of each of the methods proposed. In this way, we are able to describe in some detail how these two variables interact with each other and how they influence the results. Thus, we can show that feature vector normalization, that is, the transformation of the feature vectors to a uniform length of 1 (implicit in the cosine measure), is the decisive factor for the improvement of Delta proposed recently. We are also able to show that the information particularly relevant to the identification of the author of a text lies in the profile of deviation across the most frequent words rather than in the extent of the deviation or in the deviation of specific words only.
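
As an illustration of the separation described in the abstract, the following minimal Python sketch keeps feature scaling (z-scores per word, vector normalization) apart from the distance measure (Manhattan, Euclidean, Cosine). The relative frequencies in `freqs` are invented toy values and all function names are hypothetical; this is not the authors' code. The sketch also assumes the population standard deviation for the z-scores, which is only one possible choice.

```python
import math
from statistics import mean, pstdev


def z_scores(freqs):
    """Standardize each word (column) across all texts: (f - mean) / sd."""
    columns = list(zip(*freqs))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]  # assumption: population sd
    return [
        [(f - mu) / sd if sd > 0 else 0.0 for f, mu, sd in zip(row, mus, sds)]
        for row in freqs
    ]


def normalize(vec):
    """Scale a feature vector to Euclidean length 1 (zero vectors stay as-is)."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec] if length > 0 else list(vec)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; the length normalization is built in."""
    u, v = normalize(a), normalize(b)
    return 1.0 - sum(x * y for x, y in zip(u, v))


def burrows_delta(a, b):
    """Burrows's Delta: mean absolute difference of z-scores."""
    return manhattan(a, b) / len(a)


# Toy data (invented): rows are texts, columns are relative frequencies
# of four most frequent words.
freqs = [
    [0.060, 0.030, 0.020, 0.010],
    [0.055, 0.034, 0.018, 0.012],
    [0.040, 0.045, 0.025, 0.008],
]

z = z_scores(freqs)
for i in range(len(z)):
    for j in range(i + 1, len(z)):
        print(i, j,
              "Burrows:", round(burrows_delta(z[i], z[j]), 3),
              "Cosine:", round(cosine_distance(z[i], z[j]), 3))
```

For vectors normalized to length 1, the squared Euclidean distance equals twice the cosine distance, which is one way to see why normalization is implicit in the cosine measure: only the direction of the deviation profile matters, not its overall magnitude.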



FAU authors / FAU editors

Evert, Stefan Prof. Dr.
Lehrstuhl für Korpus- und Computerlinguistik
Proisl, Thomas
Lehrstuhl für Korpus- und Computerlinguistik


How to cite

APA:
Evert, S., Proisl, T., Jannidis, F., Reger, I., Pielström, S., Schöch, C., & Vitt, T. (2017). Understanding and explaining Delta measures for authorship attribution. Digital Scholarship in the Humanities, 32(suppl_2), ii4–ii16. https://dx.doi.org/10.1093/llc/fqx023

MLA:
Evert, Stefan, et al. "Understanding and explaining Delta measures for authorship attribution." Digital Scholarship in the Humanities 32.suppl_2 (2017): ii4–ii16.

Last updated 2018-06-12 at 13:50