NLP beyond sentiment: the new source of Alpha

The boundaries of true alpha have been pushed further in the past 10 years, due to the development of a large variety of alternative risk premia (ARP) strategies as a new layer between passive (beta) investing and active (alpha) generation.

To go beyond risk premia strategies, active portfolio managers need to look past current market observables such as historical asset prices and implied volatilities.

Truly forward-looking fundamental information, as well as exogenous variables rooted in macroeconomics, socioeconomics and geopolitics, are innovative sources of alpha that can be harnessed using Natural Language Processing (NLP) technology.

NLP can be used to identify stable causal and predictive relationships that go beyond correlations and sentiment. Combined with a sophisticated quantitative model, this opens up new sources of systematic alpha.

Beta, Risk Premia and Alpha

  • 1970s Market beta vs Alpha – Capital Asset Pricing Model
    • The CAPM was introduced by several economists independently (Jack Treynor, William Sharpe, John Lintner and Jan Mossin), all building on the earlier work of Harry Markowitz on diversification and modern portfolio theory. Sharpe, Markowitz and Merton Miller jointly received the 1990 Nobel Memorial Prize in Economics for their contributions to the field of financial economics. Fischer Black (1972) developed another version of the CAPM, called the Black CAPM or zero-beta CAPM, which does not assume the existence of a risk-free asset.
    • The main mathematical tool used here is a single-factor linear regression.
  • Early 1990s Fama-French Style Factors and multi-factor approaches
    • In their famous 1993 paper “Common risk factors in the returns on stocks and bonds” (Journal of Financial Economics), Fama and French observed that two classes of stocks tend to do better than the market as a whole: (i) small caps and (ii) stocks with a high book-to-market ratio. They then added two factors to the CAPM to reflect a portfolio’s exposure to these two classes.
    • The approach was then quickly generalised to multi-factor models by academic researchers, practitioners and vendors of risk/performance models. The main mathematical tool here is still linear regression, this time in a multi-factor world. In most cases the factors are not independent. Hence some regularisation (shrinkage) of the covariance matrix between the various factors is required to stabilise the regression.
  • Late 1990s – Alternative Risk Premia (ARP)
    • Gradually, these factors and the associated systematic, rule-based strategies came to be interpreted as ways for investors to contribute to the liquidity of the market in particular market conditions. Investors are rewarded for these alternative sources of risk and return, hence the term Alternative Risk Premia.
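
The regressions mentioned in the timeline above can be sketched in a few lines of Python. This is a minimal illustration, not a production risk model: the function names, the sample returns and the `ridge` parameter are assumptions made for this example, and the ridge-style penalty is only one possible way of regularising the factor covariance matrix. With a single factor and `ridge=0.0`, it reduces to the CAPM regression, where the intercept is alpha and the slope is beta.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def factor_regression(returns, factors, ridge=0.0):
    """Regress excess returns on K factor returns.

    returns: list of T excess returns of the portfolio.
    factors: list of T rows, each holding K factor returns.
    ridge:   assumed regularisation strength; 0.0 gives ordinary least squares.
    Returns (alpha, betas).
    """
    t, k = len(returns), len(factors[0])
    mean_y = sum(returns) / t
    mean_f = [sum(row[j] for row in factors) / t for j in range(k)]
    # Covariance matrix between the factors.
    cov_ff = [[sum((row[i] - mean_f[i]) * (row[j] - mean_f[j]) for row in factors) / t
               for j in range(k)] for i in range(k)]
    # Covariance between each factor and the portfolio returns.
    cov_fy = [sum((row[j] - mean_f[j]) * (y - mean_y) for row, y in zip(factors, returns)) / t
              for j in range(k)]
    # Ridge-style shrinkage: add a multiple of the average factor variance
    # to the diagonal so the system stays solvable when factors are correlated.
    avg_var = sum(cov_ff[i][i] for i in range(k)) / k
    shrunk = [[cov_ff[i][j] + (ridge * avg_var if i == j else 0.0) for j in range(k)]
              for i in range(k)]
    betas = solve(shrunk, cov_fy)
    alpha = mean_y - sum(b * mu for b, mu in zip(betas, mean_f))
    return alpha, betas


# Single-factor (CAPM-style) usage on made-up excess returns.
market = [0.010, -0.020, 0.015, 0.005, -0.010, 0.020]
asset = [0.001 + 1.5 * r for r in market]
alpha, betas = factor_regression(asset, [[r] for r in market])
print(alpha, betas)
```

With perfectly collinear factors and `ridge=0.0` the covariance matrix is singular and `solve` fails with a division by zero; a small positive `ridge` makes the system solvable at the cost of biasing the betas towards zero.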

These approaches can be time-series based, cross-sectional, or both. The process that generates these sources of systematic returns can be non-linear, but it remains based on pure market data: essentially, past market prices are used to predict future prices.

[Figure. Source: Alternative Risk Premia & Alternative Beta, bfinance Insights]

Alternative Data and NLP as a new source of algorithmic edge and alpha

An ever-increasing share of human interactions, communications, and culture is recorded as digital text. All this digitised information is becoming a precious input for different types of models in economics and finance [1].

To take just a few examples: in finance, text from financial news, social media, and company filings is used to predict asset price movements and study the causal impact of new information. In macroeconomics, text is used to forecast variation in inflation and unemployment and to estimate the effects of monetary policy.

As a sign of this acceleration, in 2018 the Association for Computational Linguistics, the main international scientific and professional society for research in natural language and computation, organised ECONLP 2018, the first Workshop on Economics and Natural Language Processing.

Key advances

Among the different themes, some key advances were submitted on the following topics:

  • NLP-based market analytics: prediction of economic performance indicators (trend prediction and performance forecasting). This can be done by analysing verbal statements of enterprises, businesses, companies, and associated legal or administrative actors
  • NLP-based organisation/enterprise analytics: tracing and altering social images, risk prediction, fraud analysis, analysis of business, sustainability and auditing reports
  • Competitive intelligence services based on NLP tooling
  • Relationship and interaction between quantitative (structured) economic data (time series data) and qualitative (unstructured verbal) economic data (press releases, newswire streams, social media)
  • Information management based on organising and archiving verbal communication of organisations and enterprises (emails, meeting minutes, business letters, etc.)

Beyond sentiment

An efficient use of NLP in macroeconomics and finance goes well beyond sentiment analysis. NLP can be used to identify the factors and themes that currently dominate the dynamics of financial markets.
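
As a rough illustration of theme identification, the sketch below measures which themes dominate a set of headlines by counting keyword matches. The theme lexicon, the function name `theme_shares` and the headlines are all hypothetical, invented for this example; a real system would use a far richer representation than keyword sets.

```python
from collections import Counter

# Hypothetical theme lexicon, for illustration only.
THEMES = {
    "inflation": {"inflation", "prices", "cpi"},
    "monetary policy": {"fed", "ecb", "rates", "tightening"},
    "trade": {"tariffs", "trade", "exports"},
}


def theme_shares(documents, themes=THEMES):
    """Return, for each theme, the share of documents mentioning at least one of its keywords."""
    counts = Counter()
    for doc in documents:
        tokens = set(doc.lower().split())
        for theme, keywords in themes.items():
            if tokens & keywords:
                counts[theme] += 1
    total = len(documents)
    if total == 0:
        return {}
    return {theme: counts[theme] / total for theme in themes}


# Made-up headlines.
headlines = [
    "Fed signals further tightening",
    "Inflation surprises as prices jump",
    "ECB holds rates as inflation cools",
    "Tariffs weigh on exports",
]
print(theme_shares(headlines))
```

Tracking these shares over time gives a simple view of which themes are gaining or losing attention, without assigning any positive or negative sentiment.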

The machine can learn the fundamental relationships between economic variables from the best textbooks. This can be done using a combination of word embedding techniques (distributional semantics) and causal inference.
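
The distributional-semantics half of this idea can be sketched with simple co-occurrence vectors: words that appear in similar contexts get similar vectors. This is a toy illustration under stated assumptions. The three sentences, the function names and the window size are invented for the example, real word embeddings are learned from large corpora with dedicated models, and the causal inference step is not covered here.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words found within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up miniature corpus.
corpus = [
    "inflation raises interest rates",
    "tightening raises interest rates",
    "unemployment lowers consumer spending",
]
vectors = cooccurrence_vectors(corpus)
# "inflation" and "tightening" share their contexts; "unemployment" shares none.
print(cosine(vectors["inflation"], vectors["tightening"]))
print(cosine(vectors["inflation"], vectors["unemployment"]))
```

In this toy corpus, "inflation" and "tightening" occur in identical contexts and so come out as close neighbours, while "unemployment" shares no context with "inflation". Similarity of this kind captures association only; establishing the direction of a relationship is the job of the causal inference step.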

The true new sources of alpha are access to exogenous information (beyond market data) and an analysis of that information that goes beyond sentiment to identify “causation”, “relevance” and “sense-making”.

By Dr. Francois Oustry

  1. Gentzkow, Matthew, Bryan Kelly, and Matt Taddy. “Text as Data.” Journal of Economic Literature 57.3 (2019): 535-574. (A recent review by authors from Stanford, Yale and AQR.)