Thursday, July 25, 2024
- Advertisement -

    Latest Posts

    GPT-4 Performs Better Than Humans At Financial Statement Analysis, Says Study

    GPT-4 can rival or even outperform human professionals at financial statement analysis, revealed research done by the University of Chicago’s Booth School of Business. The researchers Alex G. Kim, Maximilian Muhn and Valeri V. Nikolaev provided standardized and anonymous financial statements to GPT4 and instructed it to determine the direction of future earnings. Using Chain of Thought (CoT) reasoning that mimicked the steps followed by human analysts, GPT-4 was asked to predict whether earnings would decrease or increase in the future, alongside the rationale for its prediction. The LLM model achieved an accuracy of 60.35%, or a 7 percentage points increase compared to analyst predictions one month after a company’s earnings release. The research has been published as a working paper, titled ‘Financial Statement Analysis with Large Language Models’ on the Social Science Research Network (SSRN).

    The researchers pointed out that financial statement analysis was a difficult task, especially for an LLM as it required common sense, broad knowledge and narrative context like industry knowledge and macroeconomic trends. “Financial statement analysis is a broad task that is more of an art than science, whereas machines typically excel in narrow, well-defined tasks,” stated the researchers. They add, “ humans are more capable of incorporating their knowledge of broader context – something a machine often cannot do – by taking into account soft information, knowledge of the industry, regulatory, political, and macroeconomic factors. These factors stack up against the odds that an LLM can achieve a human-like performance in analyzing financial statements.”

    The methods used by the researchers

    The researchers used two methods, one was a ‘simple prompt’, which instructed the LLM to analyze the financial statements of a company and determine the direction of future earnings, without any further guidance on the task. The second was a CoT prompt where the model was instructed to identify any notable changes in certain financial statements, compute key financial ratios and provide economic interpretations of the results. Then, using this basic quantitative information and the insights that follow from it, the model is asked to predict whether earnings are likely to increase or decrease in the subsequent period and provide a rationale for its claim. The model is also asked to state the magnitude of the change and its confidence in the answer. 

    The model used for the experiment was gpt-4-0125-preview, the most updated version of GPT, with the temperature set to zero.

    The experiment utilised two years of balance sheet and three years of income statement data, acquired from Compustat, a financial database, and filtered to 150,678 observations from 15,401 companies. The researchers also anonymised the data.

    Analyst forecasts were taken from The Institutional Brokers’ Estimate System (IBES) database and compiled into monthly consensus forecasts, which are the median of individual forecasts. 

    Forecasts were evaluated on the basis of two metrics,

    • Accuracy, which is the percentage of correctly predicted cases scaled by the total number of predictions made and
    • F1-score, which is the harmonic mean of precision and recall.

    Precision measures the proportion of true positive predictions in the total positive predictions, while recall measures the proportion of true positive predictions out of all actual positives.

    Results of the study

    The results of the study showed that the analysts had an accuracy of 52.71% and the F1 score was 54.48% when predicting the direction of one-year-ahead earnings. Simple prompts had an accuracy of 52.33% and an F1-score of 54.52%, while CoT prompts were accurate 60.35% of the time and had an F1 of 60.90%.

    The researchers also compared GPT-4 with specialised Artificial Neural Networks (ANN) that were designed to predict earnings data. They found that GPT-4 performed on par with state-of-the-art specialised neural networks. “Not only does it outperform human analysts, but it generates performance on par with the narrowly specialized state-of-the-art ML applications,” said the report.

    When discussing the source of GPT-4’s ability, the researchers state, “We first rule out that the model’s performance stems from its memory. Instead, our analysis suggests that the model draws its inference by gleaning useful insights from its analysis of trends and financial ratios, and by leveraging its theoretical understanding and economic reasoning. Notably, the narrative financial statement analysis generated by the language model has substantial informational value in its own right.” They added that “even though we strive to understand the sources of model predictions, it is empirically difficult to pinpoint how and why the model performs well.”

    Also Read:

    The post GPT-4 Performs Better Than Humans At Financial Statement Analysis, Says Study appeared first on MEDIANAMA.

    Latest Posts

    - Advertisement -

    Don't Miss

    Stay in touch

    To be updated with all the latest news, offers and special announcements.