Prof Allen HUANG from the Department of Accounting and Prof YANG Yi from the Department of Information Systems, Business Statistics and Operations Management (ISOM) have developed a deep-learning language model based on Google’s BERT and customized for financial text. FinBERT, created by the two professors as part of the Fintech Research Project, is pre-trained on financial communication text with a total corpus of 4.9 billion tokens.

“We seek to advance our understanding of the information content in qualitative corporate disclosure by applying deep learning methods to its analysis. Compared to general deep-learning methods, this new method has advantages in extracting information accurately when applied to financial text data,” said Prof Huang. “This may help explain various market anomalies and signal earnings management, misreporting and even financial fraud.”

Prof Yang is confident that HKUST’s FinBERT will be a welcome addition for the financial industry. “We use earnings conference calls, analyst reports and corporate filings to pre-train the model while other similar models use financial news. Our financial text sample size is also substantially larger,” said Prof Yang.

FinBERT can be used to analyze financial texts and predict outcomes including stock returns, stock volatility, and corporate fraud. It can serve either as a pre-trained model that companies fine-tune on their own datasets, or as a ready fine-tuned model, trained on 10,000 manually annotated analyst statements, for sentiment classification in financial communication. Going forward, the research group will explore creating a finance-related sentiment index using FinBERT, which would further serve the applications and needs of industry users.
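As a rough sketch of how the fine-tuned sentiment model and a derived sentiment index might look in practice, the snippet below labels sentences with a FinBERT checkpoint and aggregates the labels into a simple net-tone score. The Hugging Face model identifier and the label names are assumptions for illustration, not details confirmed here.

```python
# Sketch: scoring financial sentences with a fine-tuned FinBERT checkpoint.
# The model id "yiyanghkust/finbert-tone" and the label set below are
# assumptions for illustration, not details stated in this article.
from collections import Counter
from typing import List

LABELS = ["Positive", "Neutral", "Negative"]  # assumed sentiment classes


def classify_sentences(sentences: List[str]) -> List[str]:
    """Label each sentence Positive/Neutral/Negative with FinBERT.

    Requires the third-party `transformers` package and network access
    to download the checkpoint, so the import is kept local.
    """
    from transformers import pipeline

    clf = pipeline("text-classification", model="yiyanghkust/finbert-tone")
    return [result["label"] for result in clf(sentences)]


def net_tone(labels: List[str]) -> float:
    """Aggregate sentence labels into a net-tone score in [-1, 1]:
    (positive - negative) / total, one common building block for
    a document- or market-level sentiment index."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["Positive"] - counts["Negative"]) / total
```

For example, `net_tone(["Positive", "Positive", "Negative", "Neutral"])` returns 0.25, i.e. a mildly positive overall tone.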

The FinBERT model is available as an open-source project and can be downloaded via the Fintech Research Project: https://ust.az1.qualtrics.com/jfe/form/SV_9LXsU66wTPpyCmF