The investment industry is increasingly using AI technology to make investment decisions, but computers are not ready to replace human intelligence just yet.

Professor YOU Haifeng, Department of Accounting, HKUST Business School

Artificial intelligence (AI), which is powered by machine learning technology, has affected many aspects of our lives. For example, Apple’s Siri uses natural language processing to understand the commands of users and then frame the appropriate responses. Medical image analysis, which uses deep neural networks, is also widely used in the healthcare industry to provide more accurate medical diagnoses.

The adoption of machine learning technology in the investment industry has also picked up steam in recent years. Hedge fund managers and other institutional investors are increasingly turning to machine learning for opportunities to generate superior returns. Many have attempted to identify outperforming stocks and estimate the likelihood of bond default, using mathematical algorithms to facilitate their investment decisions.

This accelerated adoption is partially driven by an explosive growth in alternative data. Such data is compiled from sources such as financial transactions, sensors, mobile devices, satellites, and the internet, and cannot be handled by traditional software such as Microsoft Excel. Machine learning technology makes it substantially easier to extract information from this hard-to-process data, and this has led to an “arms race” in alternative data among institutional investors.

While alternative data has certainly taken the spotlight in recent years, traditional data, particularly data relating to financial statements, also merits a fresh look through the lens of AI. Financial statements, together with other information contained in corporate reports, have long been one of the most important sources of information for investors. The tradition of analyzing financial statements for investment decision making can be traced back at least as far as Graham and Dodd (1934). In their canonical book of value investing, Graham and Dodd devote several hundred pages to analyzing financial statements to arrive at the intrinsic value of a company.

Financial Statement Data

This “fundamental analysis” has been a dominant approach to investing for almost a century. Traditional fundamental analysis relies primarily on “human intelligence”: investors must painstakingly decipher financial statements, together with other information in corporate reports. For example, it is well known that Warren Buffett likes to read corporate filings. He famously got his start as an investor by reading Moody’s manual of publicly listed companies from cover to cover.

However, corporate financial reports have become increasingly complex and lengthy. One study finds that the median text length of corporate annual reports in the US doubled from 23,000 words in 1996 to nearly 50,000 words in 2013 [1]. Firms also report an intimidating number of financial statement items. The most commonly used machine-readable financial statement database, COMPUSTAT, currently reports nearly 1,000 items of data for 840,469 firm-year observations of 41,159 unique firms (as of Oct 10, 2021). Complex financial reports pose considerable challenges for investors who must process the information for a large number of firms, and this prevents them from fully appreciating the information content of the financial statements. Indeed, researchers have demonstrated that complex annual reports lead to a significant market underreaction to the information contained in these reports [2]. Thus, it is conceivable that significant information remains hidden in financial statements and is not yet fully understood by investors.

Can machine learning technology come to the rescue? The answer is likely to be yes. Machine learning has been developed to efficiently handle high-dimensional data, and is capable of accommodating more complex relationships. As discussed above, financial statements, together with the footnotes, include nearly 1,000 data items. It is a formidable task for a human to process such a large volume of data for thousands of publicly listed companies, but it could be a breeze for a powerful computer armed with advanced AI algorithms.

Furthermore, financial statement data is the accumulated result of numerous transactions and it involves a complicated generation process. Rich information is consequently hidden in the relationships among the line items of different financial statements. These relationships can be very subtle and non-linear in nature, and therefore cannot be easily understood by investors. In contrast, machine learning has the capability to deal with complicated nonlinear relationships, making it a promising technology to navigate through these subtle relationships and extract useful information for investors.
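As a toy illustration of this point (not taken from any study cited here), the sketch below fits a straight line to a hypothetical U-shaped relationship between one standardized line-item signal and the next year's earnings change. The data and the names `signal`, `earnings_change`, and `fit_line` are invented for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predictions, actuals):
    """Mean squared error between predictions and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)


# Hypothetical data: extreme values of the signal in either direction
# are followed by larger earnings changes (a U-shaped relationship).
signal = [-2.0, -1.0, 0.0, 1.0, 2.0]
earnings_change = [x ** 2 for x in signal]

# A purely linear model sees no relationship at all.
slope, intercept = fit_line(signal, earnings_change)
linear_mse = mse([intercept + slope * x for x in signal], earnings_change)

# The same least-squares fit on a nonlinear transform of the signal
# recovers the relationship exactly.
squared_signal = [x ** 2 for x in signal]
slope_sq, intercept_sq = fit_line(squared_signal, earnings_change)
nonlinear_mse = mse(
    [intercept_sq + slope_sq * x for x in squared_signal], earnings_change
)

print(f"linear slope: {slope:.2f}, linear MSE: {linear_mse:.2f}")
print(f"nonlinear MSE: {nonlinear_mse:.2f}")
```

In this contrived case the linear slope is zero even though the signal fully determines the outcome. Real financial statement relationships are far noisier, and the right transform is not known in advance, which is where flexible machine learning models come in.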

Forecasting Corporate Earnings

A recent study adopted machine learning technology to perform one of the most important tasks of fundamental analysis: forecasting corporate earnings. The study provides clear evidence of the usefulness of the technology in equity investment using financial statement data [3]. Corporate earnings are one of the most important drivers of equity valuation and stock returns. Both academics and industry practitioners have devoted tremendous effort to forecasting earnings accurately.

For example, according to Thomson Reuters, over 30,000 analysts from 3,000+ contributing brokers have contributed their earnings forecasts to its IBES database. Revisions to these forecasts often trigger dramatic stock price changes. Academics have also developed a battery of statistical models to produce earnings forecasts. However, researchers have challenged the performance of these model-based forecasts, concluding that they are not much more accurate than the simple guess that future earnings will remain the same as in the prior year [4].
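The “simple guess” benchmark can be made concrete with a short sketch. It assumes a hypothetical six-year series of earnings per share for a single firm and uses mean absolute error as the accuracy measure. Both the numbers and the choice of error measure are illustrative assumptions, not details from the cited research.

```python
def random_walk_forecast(earnings):
    """Forecast each year's earnings as the prior year's actual value."""
    return earnings[:-1]


def mean_absolute_error(forecasts, actuals):
    """Average absolute gap between forecasts and realized values."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


# Hypothetical earnings per share for one firm over six years.
eps = [1.00, 1.10, 1.05, 1.20, 1.15, 1.30]

rw = random_walk_forecast(eps)   # forecasts for years 2 to 6
actual = eps[1:]                 # realized values for years 2 to 6
rw_mae = mean_absolute_error(rw, actual)

print(f"random-walk MAE: {rw_mae:.2f}")
```

Any model-based forecast is worth its complexity only if its error on the same firm-years is clearly lower than `rw_mae`.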

The predictive power of the extant models is limited because most of them focus only on highly aggregated measures (e.g. net income) and fail to account for the differential effects of other detailed financial statement line items. Furthermore, these models are largely designed to take a linear form, and cannot capture the subtle nonlinear relationships that could be very important in driving future earnings. The research shows that machine learning forecasts outperform the extant models substantially in terms of forecast accuracy. Moreover, machine learning forecasts almost completely subsume the information content in the extant models regarding future earnings changes. Finally, the new information extracted by machine learning models predicts analyst forecast errors and future stock returns, suggesting that both financial analysts and the stock market fail to make full use of the information in financial statements.

Machine Learning Limitations

While machine learning shows great promise in fundamental analysis and investment decision making, it is not without limitations. Compared to other applications such as image recognition and voice recognition, the application of machine learning to investment faces several distinct challenges. First, there is much more noise in stock prices than in images or voices, and this leads to a low information-to-noise ratio for many of the prediction problems in financial markets. The low information-to-noise ratio exacerbates the risk of overfitting, and therefore may render machine learning models useless for out-of-sample predictions.
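The overfitting risk can be sketched with simulated data. The example assumes a weak linear signal buried in much larger noise and compares a one-parameter straight-line fit with a highly flexible nearest-neighbour rule that memorizes the training sample. The signal strength, noise level, sample sizes, and function names are all illustrative assumptions.

```python
import random


def simulate(n, rng, signal=0.1, noise_sd=1.0):
    """Draw n observations of a weak linear signal plus large noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [signal * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_neighbour(train_x, train_y, x):
    """Predict with the outcome of the closest training observation."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]


def mse(predictions, actuals):
    """Mean squared error between predictions and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)


rng = random.Random(0)
train_x, train_y = simulate(500, rng)
test_x, test_y = simulate(500, rng)

slope, intercept = fit_line(train_x, train_y)
line_in = mse([intercept + slope * x for x in train_x], train_y)
line_out = mse([intercept + slope * x for x in test_x], test_y)

nn_in = mse([nearest_neighbour(train_x, train_y, x) for x in train_x], train_y)
nn_out = mse([nearest_neighbour(train_x, train_y, x) for x in test_x], test_y)

print(f"straight line:     in-sample {line_in:.2f}, out-of-sample {line_out:.2f}")
print(f"nearest neighbour: in-sample {nn_in:.2f}, out-of-sample {nn_out:.2f}")
```

The flexible rule looks perfect in sample because it reproduces the noise in the training data, yet it does worse than the simple line on fresh data. With a stronger signal relative to the noise, as in image or voice data, the penalty for flexibility would be much smaller.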

Non-stationarity is another problem when machine learning models are used to make investment decisions. Stock markets are highly dynamic, and the underlying relationships may change significantly because of shifting market conditions, changing investor preferences, and arbitrage activities by investors.

In 2020, the market witnessed a significant drop in the investment returns of Renaissance Technologies, a legendary hedge fund known for adopting machine learning in its investment process [5]. In a letter to its clients, Renaissance admitted that its models are trained on historical data, so the results were not surprising considering that 2020 was an unusual year. Adam Taback, chief investment officer of Wells Fargo Private Wealth Management, also notes that quantitative models might have difficulty capturing useful information when markets are volatile.

These examples highlight the importance of human judgment and experience in overcoming the limitations of machine learning in the investment process. Human knowledge can be useful to determine whether patterns detected by machines reflect overfitting due to noise or reflect a sensible and sustainable relationship. Humans may also make better forward-looking logical judgments or predictions with a small amount of historical data. Because humans can better isolate information from noise and take a forward-looking approach, human intelligence remains of paramount importance in making sound investment decisions.

Thus, at least in the foreseeable future, humans are unlikely to be completely replaced by machines in the investment industry. Both humans and machines have their own parts to play in the game. The best approach would be man+machine. As Paul Tudor Jones, a renowned billionaire hedge fund manager, said, “No man is better than a machine, and no machine is better than a man with a machine.” [6]

The aforementioned study on earnings forecasts demonstrates the promise of this approach. The authors find that machine learning forecasts compete well against, and most of the time outperform, the collective wisdom of financial analysts from thousands of contributing brokerage firms. More importantly, the simple approach of combining machine learning and analysts’ forecasts outperforms both individually. The combined forecasts are more accurate than analysts’ consensus forecasts in 29, 31, and 32 years of the 33-year testing period for one-, two-, and three-year-ahead forecasts respectively, indicating that human analysts can substantially improve their performance by incorporating the insights from machine learning models.
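One intuition for why a combination can beat both of its inputs is that averaging cancels part of each forecaster's independent error. The simulation below is a minimal sketch of that intuition, not the study's actual method. It assumes two unbiased forecasts with independent, normally distributed errors and combines them with an equal-weighted average. The error sizes and variable names are hypothetical.

```python
import random


def mse(forecasts, actuals):
    """Mean squared error between forecasts and realized values."""
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)


rng = random.Random(42)
n = 5000

# Hypothetical realized earnings changes.
actual = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Assumption: both forecasts are unbiased and their errors are independent.
# The machine forecast is given a somewhat smaller error than the analysts'.
machine_forecast = [a + rng.gauss(0.0, 1.0) for a in actual]
analyst_forecast = [a + rng.gauss(0.0, 1.2) for a in actual]

# Equal-weighted combination of the two forecasts.
combined_forecast = [
    (m + h) / 2 for m, h in zip(machine_forecast, analyst_forecast)
]

machine_mse = mse(machine_forecast, actual)
analyst_mse = mse(analyst_forecast, actual)
combined_mse = mse(combined_forecast, actual)

print(f"machine MSE:  {machine_mse:.2f}")
print(f"analyst MSE:  {analyst_mse:.2f}")
print(f"combined MSE: {combined_mse:.2f}")
```

Under these assumptions the theoretical error variance of the average is (1.0² + 1.2²) / 4 = 0.61, below that of either input. The gain shrinks as the two sets of errors become more correlated, so the benefit in practice depends on humans and machines making different mistakes.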


1. Travis Dyer, Mark Lang, Lorien Stice-Lawrence, 2017, The evolution of 10-K textual disclosure: Evidence from Latent Dirichlet Allocation, Journal of Accounting and Economics, 64(2–3): 221-245.
2. Haifeng You, Xiao-Jun Zhang, 2009, Financial reporting complexity and investor underreaction to 10-K information, Review of Accounting Studies, 14: 559-586.
3. Kai Cao, Haifeng You, 2021, Fundamental analysis via machine learning:
4. Steven J. Monahan, 2018, Financial statement analysis and earnings forecasting, Foundations and Trends® in Accounting, 12(2): 105-215.