Heinrich P (2018)
Publication Language: English
Publication Type: Conference contribution
Publication year: 2018
Page Range: 129 - 134
Conference Proceedings Title: Proceedings of 4th Asia Pacific Corpus Linguistics Conference (APCLC2018)
We are concerned with the automatic processing of annual reports submitted to the U.S. SEC’s EDGAR filing system. The filings consist of structured as well as unstructured information. One part of the filings, the 10-K forms, contains mostly linguistic data segmented into up to 20 items. We briefly describe the steps needed to extract the relevant linguistic information from the unstructured part of the data. We then present results of a first exploratory corpus analysis and provide descriptive statistics for our NLP measures (sentiment, readability, and further stylistic dimensions) for each item of the 10-K form, and point out connections between the semantic content of the analyzed items and the quantitative linguistic observables. The linguistic register varies both across items and with the standard industrial classification of the company. We conclude by applying a dimensionality reduction algorithm (t-SNE) to the linguistic observables and use the embedding for a qualitative comparison with the company’s industry.
APA:
Heinrich, P. (2018). Stylistic Features in Corporate Disclosures and their Predictive Power. In Yukio Tono & Hitoshi Isahara (Eds.), Proceedings of 4th Asia Pacific Corpus Linguistics Conference (APCLC2018) (pp. 129 - 134). Takamatsu, JP.
MLA:
Heinrich, Philipp. "Stylistic Features in Corporate Disclosures and their Predictive Power." Proceedings of the 4th Asia Pacific Corpus Linguistics Conference (APCLC2018), Takamatsu. Ed. Yukio Tono & Hitoshi Isahara, 2018. 129 - 134.