Abstract:
In today’s rapidly evolving financial domain, effective information retrieval and analysis are crucial for making well-informed decisions. This study constructs a domain-specific knowledge graph (KG) tailored for financial data retrieval, aiming to overcome challenges such as the time-intensive nature of traditional retrieval processes, hallucination in large language models (LLMs), and knowledge cut-off limitations. Related work has explored improving the efficiency of retrieval-augmented generation (RAG) pipelines in finance by enhancing retriever accuracy, reducing hallucinations, and increasing context relevance through techniques such as fine-tuning and re-ranking. However, none of these works has employed a KG to achieve these objectives. To build the finance-specific KG, an LLM was used to extract entities and relationships from unstructured financial news articles. The extracted data were structured as a graph, stored in a Neo4j database, and queried with Cypher. This approach improved the ability to retrieve related information while maintaining scalability and a stable level of accuracy. The resulting KG offers a concise and insightful overview of the market, highlighting key entities and their interconnections. The accuracy of the KG was evaluated by comparing its extracted entities and relationships against validated financial articles. The KG comprises 858 nodes and 1,624 edges, with a density of 0.0022 and an average node degree of 3.79. A clustering coefficient of 0.1447 indicates moderate interconnectedness, the 12 node types and 47 relationship types indicate diversity, and a relationship-type entropy of 4.2073 reflects the graph's complexity.
Future work will focus on integrating the developed KG with RAG systems to enhance LLMs’ contextual reasoning and response accuracy. The current findings already indicate that KGs improve semantic search by reducing ambiguity and enhancing the quality of retrieved financial information.
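
As a sanity check on the graph statistics reported above, the following minimal Python sketch recomputes density and average degree from the node and edge counts. It rests on assumptions the abstract does not state: that the KG is treated as a directed graph without self-loops (density = E / (N(N - 1))), that average degree counts both incoming and outgoing edges (2E / N), and that the relationship-type entropy is Shannon entropy in bits. The function names are illustrative only and are not taken from the study.

```python
import math
from collections import Counter


def directed_density(num_nodes: int, num_edges: int) -> float:
    """Density of a directed graph without self-loops: E / (N * (N - 1))."""
    return num_edges / (num_nodes * (num_nodes - 1))


def average_degree(num_nodes: int, num_edges: int) -> float:
    """Average total degree (in + out): each edge contributes to two nodes."""
    return 2 * num_edges / num_nodes


def type_entropy_bits(type_labels) -> float:
    """Shannon entropy (base 2) of the distribution of relationship types."""
    counts = Counter(type_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Counts reported in the abstract.
nodes, edges = 858, 1624

print(round(directed_density(nodes, edges), 4))  # 0.0022
print(round(average_degree(nodes, edges), 2))    # 3.79

# Upper bound on the entropy for 47 relationship types (uniform distribution).
print(round(math.log2(47), 4))                   # 5.5546
```

Under these assumptions the recomputed density and average degree agree with the reported values. The reported entropy of 4.2073 lies below the 47-type upper bound of about 5.55 bits, which would be consistent with a relationship-type distribution that is varied but not uniform.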