This paper introduces two statistical models, Wordscores and Wordfish, to study and predict banking crises. Wordscores is akin to supervised learning, whereas Wordfish is analogous to unsupervised learning. Both methods estimate the position of banking distress on a tranquil-to-crisis spectrum. The findings suggest that both methods signal a banking crisis up to two years in advance, and the results are robust across AUROC, Granger causality and VAR impulse response analyses. Both methods outperform random forests in predicting crises from textual data. The Wordscores index highlights increased usage of banking sector nomenclature in the two years preceding a crisis, and it Granger-causes the crisis series at lag lengths of one and two. The Wordfish index, derived from a statistical model that assumes a Poisson distribution, spikes before and during the Global Financial Crisis, when a large share of countries worldwide experienced banking crises. This paper contributes to the literature on text-based models of banking crises by bolstering the preemptive policy responses available to policy makers. Given their early warning signals, both Wordscores and Wordfish can be considered part of the toolset for monitoring the stability and resilience of the banking sector.
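To illustrate why Wordscores is described as akin to supervised learning, the following is a minimal sketch of the standard Wordscores calculation: words are scored from reference texts with known positions, and a new text is then placed on the same scale. It is not the paper's implementation. The function names, the toy token lists and the reference positions of -1 (tranquil) and +1 (crisis) are assumptions made purely for illustration.

```python
from collections import Counter


def word_scores(ref_texts, ref_positions):
    """Score each word from reference texts with known positions.

    ref_texts: list of token lists (hypothetical reference documents).
    ref_positions: one assumed position per reference text.
    """
    # Relative frequency of every word within each reference text.
    rel_freqs = []
    for tokens in ref_texts:
        counts = Counter(tokens)
        total = sum(counts.values())
        rel_freqs.append({w: c / total for w, c in counts.items()})

    vocab = set().union(*rel_freqs)
    scores = {}
    for w in vocab:
        freqs = [rf.get(w, 0.0) for rf in rel_freqs]
        denom = sum(freqs)
        # f / denom is the probability of reading reference r given word w;
        # the word score is the position averaged over those probabilities.
        scores[w] = sum(f / denom * pos for f, pos in zip(freqs, ref_positions))
    return scores


def text_score(tokens, scores):
    """Place a new text on the scale: mean score of its scored words."""
    scored = [t for t in tokens if t in scores]
    if not scored:
        return None  # no overlap with the reference vocabulary
    return sum(scores[t] for t in scored) / len(scored)


# Toy data (made up for this sketch, not taken from the paper).
tranquil = ["growth", "stable", "credit", "growth"]
crisis = ["default", "bailout", "credit", "default"]
ws = word_scores([tranquil, crisis], [-1.0, 1.0])
new_text_position = text_score(["bailout", "credit", "unknown"], ws)
```

In this toy setup a word that appears equally often in both reference texts ("credit") receives a neutral score, while words unseen in the reference texts are ignored when a new text is scored. Wordfish, by contrast, requires no reference positions and estimates document positions directly from word counts.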
Digital transformation brings new sources of economic information in the form of rich texts, which can provide a deeper understanding of banking sector developments. Drawing on textual data that are available and accessible in digital format, this paper develops three distinct indices from a large corpus of economic news articles to forecast banking crises. The methodological approaches centre on identifying key topics within a large volume of texts. A Banking Crisis Lexicon Index and a Sentiment Index are developed by analysing a vast number of economic articles to detect the evolution of banking sector discourse. Granger causality tests point to the leading indicator status of the Banking Crisis Lexicon Index, which signals a change in the banking crisis series four years in advance. This finding is reinforced by the response to innovations in a VAR analysis using Cholesky decomposition, and substantiated by receiver operating characteristic analysis, with area-under-the-curve estimates above 70% on a global scale, for developed economies and for crisis countries. Serving as a benchmark, the Sentiment Index is a concurrent indicator that moves in tandem with the crisis series, thereby providing more granular information on banking developments. A combined Banking Crisis Lexicon and Sentiment Index exhibits solid forecasting performance that is comparable to that of the Banking Crisis Lexicon Index, yet with a shorter lead time. In a robustness test, German-language indices outperform those based on English-language reporting in a predominantly German-speaking region, highlighting the value of textual analysis in the vernacular. In reading between the lines, this paper contributes to the literature on quantitative analyses of textual data by constructing text-based banking crisis indicators to support a preemptive policy response.
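The distinction between a leading and a concurrent indicator can be made concrete with a small sketch of how an area-under-the-curve estimate might be computed at different lead times. This is an illustrative assumption, not the paper's procedure: the function names and the toy series are hypothetical, and the area under the ROC curve is computed through its rank interpretation, namely the probability that the index is higher in a randomly chosen crisis observation than in a randomly chosen tranquil one.

```python
def auroc(signal, crisis):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for s, c in zip(signal, crisis) if c == 1]
    neg = [s for s, c in zip(signal, crisis) if c == 0]
    if not pos or not neg:
        raise ValueError("need at least one crisis and one tranquil observation")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auroc_with_lead(index, crisis, lead):
    """Pair the index at period t with the crisis dummy at period t + lead."""
    if lead > 0:
        index, crisis = index[:-lead], crisis[lead:]
    return auroc(index, crisis)


# Toy annual series (made up for this sketch, not taken from the paper).
toy_index = [0.1, 0.2, 0.9, 0.3, 0.1, 0.8, 0.2]
toy_crisis = [0, 0, 0, 1, 0, 0, 1]

auc_concurrent = auroc_with_lead(toy_index, toy_crisis, 0)
auc_one_year_lead = auroc_with_lead(toy_index, toy_crisis, 1)
```

In the toy series the index peaks one period before each crisis, so the estimate at a one-period lead is higher than the concurrent one. An estimate of 0.5 corresponds to an uninformative signal, which is the baseline against which values above 70% are read.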