
Unstructured data and AI: Fine-tuning LLMs to enhance the investment process
The report examines the use of unstructured data and AI, particularly large language models (LLMs), in investment processes. It explains how fine-tuning these models can improve investment strategies and includes a case study on using AI in ESG investing.

OVERVIEW
Executive summary
The report explores the rise of AI in investment, focusing on large language models (LLMs) like ChatGPT. It highlights the integration of alternative and unstructured data, emphasising natural language processing (NLP) for enhanced insights from financial narratives. Fine-tuning LLMs on proprietary data is shown to add significant value, especially in ESG investing.
Introduction
A 2023 CFA Institute survey found that 55% of investment professionals incorporate unstructured data, and 64% use alternative data. The exponential growth of digital data and advances in technology offer new opportunities for investment, requiring sophisticated methods to parse unstructured data and gain insights.
Unstructured, alternative, and open-source data
Unstructured data, such as PDFs and news articles, differ from traditional financial statements and require advanced algorithms for analysis. The report discusses ethical considerations and methods for using these data in investment projects. It also highlights the benefits of open-source tools in processing unstructured data.
Fine-tuning large language models
The evolution of NLP has enabled significant advances in data analysis. Fine-tuning an LLM means continuing to train a pre-trained model with supervised learning on human-labelled data, so that it becomes more accurate on a specific task. Applied to ESG-related communications, the method can effectively categorise disclosures, some of which affect stock prices. The report outlines the benefits and challenges of fine-tuning, including dataset curation and performance optimisation.
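As a rough illustration of the dataset curation step, the sketch below removes duplicate texts from a set of human-labelled examples and holds out a validation share from each label. It is a minimal sketch and not the report's method: the function name `stratified_split`, the 20% validation share, and the duplicate rule (case-insensitive exact match) are all assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_split(examples, validation_fraction=0.2, seed=0):
    """Split (text, label) pairs into train and validation lists.

    Hypothetical helper for illustration only. Exact duplicates
    (ignoring case and surrounding whitespace) are dropped first, then
    a validation share is held out from every label so that rare
    labels are still represented.
    """
    by_label = defaultdict(list)
    seen = set()
    for text, label in examples:
        key = text.strip().lower()
        if key in seen:  # skip texts already seen
            continue
        seen.add(key)
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, validation = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Hold out at least one example per label when there is more than one.
        n_val = max(1, round(len(items) * validation_fraction)) if len(items) > 1 else 0
        validation.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, validation
```

The training list would then be passed to whichever fine-tuning toolkit is in use, with the validation list reserved for checking accuracy on examples the model has not seen.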
ESG case study
The case study investigates the impact of ESG disclosures on investment returns, using LLM fine-tuning on Twitter data. It identifies four categories of ESG tweets: “Not Important,” “Community Outreach,” “Industry Recognition,” and “Actions and Innovations.” The study finds that the most material ESG disclosures, especially in small-cap portfolios, drive performance.
The findings show that a fine-tuned model can effectively distinguish between ESG communications of differing materiality. On the effect of materiality, returns differed little across ESG categories for large-cap portfolios, with “Not Important” slightly outperforming. For small-cap portfolios, only the most material ESG disclosures significantly affected stock prices. On the effect of company size, the highest-performing small-cap portfolio was the “Actions and Innovations” category.
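The comparison across categories and company sizes can be sketched as a simple grouping exercise: average the returns of each (size, category) portfolio and pick the best category within a size bucket. The sketch below is an assumption-laden illustration, not the study's methodology. The function names and the `SAMPLE` figures are invented for this example and do not reproduce the report's results.

```python
from collections import defaultdict
from statistics import mean

# Made-up observations for illustration only: (size bucket, ESG category, return).
SAMPLE = [
    ("small", "Not Important", 0.00),
    ("small", "Not Important", 0.02),
    ("small", "Actions and Innovations", 0.04),
    ("small", "Actions and Innovations", 0.06),
    ("large", "Not Important", 0.02),
    ("large", "Actions and Innovations", 0.01),
]


def mean_return_by_group(observations):
    """Average the returns for each (size, category) pair."""
    groups = defaultdict(list)
    for size, category, ret in observations:
        groups[(size, category)].append(ret)
    return {key: mean(values) for key, values in groups.items()}


def best_category(means, size):
    """Return the category with the highest mean return in one size bucket."""
    candidates = {cat: m for (s, cat), m in means.items() if s == size}
    return max(candidates, key=candidates.get)


means = mean_return_by_group(SAMPLE)
print(best_category(means, "small"))
print(best_category(means, "large"))
```

A real analysis would also need risk adjustment and significance testing before drawing conclusions from differences of this kind.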
Recommendations
The report suggests:
- Integrating NLP tools to analyse real-time ESG data.
- Fine-tuning LLMs for specific investment use cases.
- Developing in-house capabilities for processing unstructured data.
- Staying updated with technological advancements in AI and NLP.
Conclusion
Fine-tuning LLMs offers valuable insights into ESG and investment performance. While this approach has shown promising results, it requires careful dataset curation and ongoing adaptation to technological changes. Investment professionals should leverage these tools to stay competitive in a rapidly evolving field.