Separating the signal from the noise - financial machine learning for Twitter

Schnaubelt M, Fischer T, Krauß C (2020)


Publication Language: English

Publication Status: Submitted

Publication Type: Journal article, Original article

Future Publication Type: Journal article

Publication year: 2020

Journal: Journal of Economic Dynamics & Control

Series: FAU Discussion Papers in Economics

Book Volume: 114

Pages Range: 103895

Edition: 14/2018

URI: https://www.sciencedirect.com/science/article/pii/S0165188920300634

DOI: 10.1016/j.jedc.2020.103895

Abstract

Most statistical arbitrage strategies in the academic literature rely solely on price time series. By contrast, alternative data sources are of growing importance for professional investors. We contribute to bridging this gap by assessing the price-predictive value of more than nine million tweets for intraday returns of the S&P 500 constituents. For this purpose, we design a machine learning pipeline that addresses the specific challenges inherent to this task. First, we engineer domain-specific features in three categories: directional indicators, relevance indicators, and meta features. Next, we leverage a random forest to extract the relationship between these features and subsequent stock returns in a low signal-to-noise setting. For performance evaluation, we run a rigorous event-based backtesting study across all tweets and stocks. We find annualized returns of 6.4 percent and a Sharpe ratio of 2.2 after transaction costs. Finally, we illuminate the machine learning black box and unveil the sources of profitability. First, results are both driven and limited by the temporal clustering of tweets: the majority of profits stem from tweets clustered closely together in time, corresponding to high-event situations. Second, the importance of the included features follows an economic rationale; for example, tweets with positive sentiment tend to yield positive returns and vice versa. Third, stocks of medium market capitalization and from the consumer and technology sectors contribute most to our results, which we interpret as a trade-off between tweet coverage and tweet relevance.
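
The abstract reports performance as an annualized return and a Sharpe ratio after transaction costs. As a hedged illustration only, the sketch below shows how these two statistics are commonly computed from a series of strategy returns. It is not the authors' event-based backtest. The daily frequency, the 252-day year, the zero risk-free rate, and the proportional cost model (`cost_bps` per unit of turnover) are assumptions, and all function names are hypothetical.

```python
import math
import statistics

# Assumption: one return observation per trading day, 252 trading days per year.
PERIODS_PER_YEAR = 252


def net_returns(gross_returns, turnover, cost_bps):
    """Hypothetical cost model: deduct cost_bps basis points per unit of turnover."""
    return [g - t * cost_bps / 10_000 for g, t in zip(gross_returns, turnover)]


def annualized_return(returns, periods_per_year=PERIODS_PER_YEAR):
    """Geometric annualization of simple per-period returns."""
    growth = math.prod(1.0 + r for r in returns)
    return growth ** (periods_per_year / len(returns)) - 1.0


def sharpe_ratio(returns, periods_per_year=PERIODS_PER_YEAR, risk_free=0.0):
    """Annualized Sharpe ratio; risk_free is a per-period rate (assumed zero)."""
    excess = [r - risk_free for r in returns]
    sd = statistics.stdev(excess)  # sample standard deviation, needs >= 2 points
    if sd == 0:
        raise ValueError("Sharpe ratio undefined for a constant return series")
    # Mean excess return per unit of volatility, scaled by sqrt(periods per year).
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Toy, made-up numbers for illustration only (not data from the paper).
    gross = [0.0012, -0.0004, 0.0009, 0.0003, -0.0001]
    turns = [1.0, 0.5, 1.0, 0.0, 0.5]
    net = net_returns(gross, turns, cost_bps=5.0)
    print(f"annualized return: {annualized_return(net):.2%}")
    print(f"Sharpe ratio: {sharpe_ratio(net):.2f}")
```

With only five toy observations the annualized figures carry no meaning; the example shows the mechanics only. Geometric annualization compounds the per-period returns, whereas some studies annualize the arithmetic mean instead, so reported figures are comparable only under the same convention.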

How to cite

APA:

Schnaubelt, M., Fischer, T., & Krauß, C. (2020). Separating the signal from the noise - financial machine learning for Twitter. Journal of Economic Dynamics & Control, 114, 103895. https://dx.doi.org/10.1016/j.jedc.2020.103895

MLA:

Schnaubelt, Matthias, Thomas Fischer, and Christopher Krauß. "Separating the signal from the noise - financial machine learning for Twitter." Journal of Economic Dynamics & Control 114 (2020): 103895.