Prompting the Market? A Large-Scale Meta-Analysis of

GenAI in Finance NLP (2022–2025)

Paolo Pedinotti∗, Peter Baumann, Nathan Jessurun

Leslie Barrett, Enrico Santus∗

Bloomberg {ppedinotti, pbaumann25, njessurun, lbarrett4, esantus}@bloomberg.net

Abstract

Large Language Models (LLMs) have rapidly reshaped financial NLP, enabling new tasks and driving a proliferation of datasets and diversification of data sources. Yet, this transformation has outpaced traditional surveys. In this paper, we present MetaGraph, a generalizable methodology for extracting knowledge graphs from scientific literature and analyzing them to obtain a structured, queryable view of research trends. We define an ontology for financial NLP research and apply an LLM-based extraction pipeline to 681 papers (2022–2025), enabling large-scale, data-driven analysis. MetaGraph reveals three key phases: early LLM adoption and task/dataset innovation; critical reflection on LLM limitations; and growing integration of peripheral techniques into modular systems. This structured view offers both practitioners and researchers a clear understanding of how financial NLP has evolved—highlighting emerging trends, shifting priorities, and methodological shifts—while also demonstrating a reusable approach for mapping scientific progress in other domains.

1 Introduction

The release of ChatGPT in late 2022 triggered a structural shift in NLP, rapidly accelerating adoption in high-stakes domains such as finance. LLMs expanded what was possible, shifting the field from supervised learning to zero-shot reasoning, and from structured tasks to flexible document parsing. Financial NLP is undergoing rapid reinvention, after a long period of dominance by sentiment analysis and structured extraction (e.g., NER).

This fast-moving evolution has outpaced traditional surveys, which lack a quantitative grasp of generative AI's impact on Financial NLP.

∗ Equal contribution.

Figure 1: Example of paper subgraph.

We introduce MetaGraph, a methodology for automated Knowledge Graph (KG) construction from research papers using LLMs, to address this gap. MetaGraph involves the manual definition of an ontology of information that is relevant to tracking research evolution (such as paper metadata, motivations and limitations, tasks approached, techniques, models, and datasets), and the use of LLMs in a human-in-the-loop approach to extract this information and gather it into a unified, queryable KG, enabling large-scale, data-driven insights about the evolution of the field.

We applied MetaGraph to 681 Financial NLP papers (2022–2025), uncovering a landscape that has undergone substantial transformation. LLMs have forcefully entered the domain, reshaping both the tasks and the data landscape. In these three years, we noticed the rise of financial QA and an explosion of datasets powered by synthetic generation, followed by a growing awareness of LLMs' limitations. The growth in model size has slowed down and has been accompanied by both more sophisticated architectures and the integration of LLMs with peripheral technologies (such as retrieval) to build integrated systems.

Our contributions are three-fold: i) methodology, as we provide a generalizable pipeline for extraction of quantitative insights from scientific literature; this pipeline combines ontology definition, human-in-the-loop graph extraction, and taxonomy construction, and it is designed to be reused in other domains (e.g., biology, legal, etc.); ii) field synthesis, as we map the shift from model adoption in pre-existing structured pipelines to the redesign of modular, generative systems; iii) open resources, as we made the graph available.¹

Table 1: Comparison with Financial NLP surveys.

Survey              GenAI  Quantitative  Taxonomy  KG
Xing et al. (2018)    ✗        ✗            ✗      ✗
Li et al. (2023)      ✓        ✗            ✓      ✗
Nie et al. (2024)     ✓        ✗            ✓      ✗
Du et al. (2025)      ✓        ✗            ✓      ✗
Ours (2025)           ✓        ✓            ✓      ✓

2 Related Work

Financial NLP Surveys While the survey by Xing et al. (2018) focused on traditional tasks such as classification, more recent surveys (Li et al., 2023b; Nie et al., 2024; Du et al., 2025) have focused on the transformative impact LLMs have had on financial applications. Yet, these works rely on a traditional narrative review methodology, qualitatively summarizing the applications in the literature.

Our approach diverges fundamentally by employing a bibliometric and holistic analysis of the field. We uncover structural shifts and data-driven trends within the field by quantitatively mapping the research landscape into a knowledge graph, offering a comprehensive view of the impact of LLMs on its evolution.

LLM-Assisted Knowledge Graph Construction Carta et al. (2023) construct domain-specific knowledge graphs through stepwise prompting strategies, while Funk et al. (2023) and Babaei Giglou et al. (2023) use LLMs to learn hierarchical relations among concepts. We take inspiration from GraphRAG (Edge et al., 2025) to extract a knowledge graph and enrich it with information at different levels of granularity.

¹ 58 of the 681 papers used in our analysis are not included in the published dataset. These papers were posted to arXiv under a CC-BY-NC-ND 4.0 license, which prevents reusers from distributing any derivative, adapted form of the original material. As such, we will not be distributing them as part of the MetaGraph knowledge graph.

3 Methodology

We introduce MetaGraph, a methodology for automatically constructing knowledge graphs to extract quantitative insights from large scientific corpora. Graphs support structured and scalable representations of complex information, enabling analysis beyond frequency queries and uncovering relational patterns otherwise dispersed across text. MetaGraph is designed to be generalizable and reusable, although in this work we apply it specifically to NLP in Finance.

3.1 Method and Implementation

Ontology definition This stage involves finding the graph structure that allows us to attain the most actionable insights. The expressiveness of this ontology directly shapes the analytical power of the resulting graph. In our case, we focused on NLP entities, attributes, and relationships. Whenever possible, we constrain attributes by either controlled vocabularies or existing frameworks. The graph structure with the entity types and their relationships is described in Figure 13.
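To make the ontology concrete, the following is a minimal sketch of how a single paper subgraph (in the spirit of Figure 1) could be represented and queried. The entity and relation names here are illustrative assumptions, not MetaGraph's exact schema.

```python
import networkx as nx

# One paper subgraph under an ontology like the one described above.
# Node types (Paper, Task, Model, Dataset, Limitation) and relation names
# are assumptions for illustration only.
g = nx.MultiDiGraph()

g.add_node("paper:123", type="Paper", year=2024)
g.add_node("task:financial_qa", type="Task", name="Financial QA")
g.add_node("model:gpt-4", type="Model", name="GPT-4")
g.add_node("dataset:finqa", type="Dataset", name="FinQA")
g.add_node("lim:weak_reasoning", type="Limitation", name="Weak Reasoning")

g.add_edge("paper:123", "task:financial_qa", relation="approaches")
g.add_edge("paper:123", "model:gpt-4", relation="uses_model")
g.add_edge("paper:123", "dataset:finqa", relation="uses_dataset")
g.add_edge("paper:123", "lim:weak_reasoning", relation="reports_limitation")

# A frequency-style query: count papers per approached task.
task_counts: dict[str, int] = {}
for _, task, data in g.edges(data=True):
    if data["relation"] == "approaches":
        task_counts[task] = task_counts.get(task, 0) + 1
print(task_counts)  # {'task:financial_qa': 1}
```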

Corpus acquisition We curated a corpus of 681 financial NLP papers (2022–2025) from the ACL Anthology and arXiv (cs.CL, cs.AI, q-fin) using keyword filters.²,³ The geographical distribution of the corpus is shown in Figure 12. To support trend analysis, we grouped papers into time periods of varying granularity (see partition details in Appendix F), ensuring each contained an equal number of papers. We also mapped each paper to its Semantic Scholar ID to capture citation relationships between papers.

² The set of keywords is: financial, fintech, fraud, stock, portfolio, finance.
³ Documents were processed via Mistral OCR: https://mistral.ai/news/mistral-ocr

LLM-based extraction We used Gemini 2.5 Flash⁴ for its trade-off between cost and performance.⁵ We adopted a human-in-the-loop process with sample audits, error analysis, and prompt refinement. Key design choices include: i) crafting separate prompts for each information type (e.g., motivations, limitations, tasks) to improve extraction precision; ii) allowing abstention and fallback options to minimize hallucinations; iii) using CoT prompting, where models are required to explain their answers. The full set of prompts can be seen in Section C. For a subset of entity types (Models, Tasks, and Datasets), we manually compared the extracted entities with a gold set on 12 papers. We observed almost perfect performance, with the exception of two minor tasks from one paper and one model from another.

⁴ https://deepmind.google/models/gemini/flash/
⁵ https://lmarena.ai/leaderboard
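As an illustration of design choices i)–iii), here is a hedged sketch of what such a per-type extraction loop could look like. The prompt wording, output format, and the call_llm placeholder are assumptions; the paper's actual prompts are in Section C.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for whatever client wraps Gemini 2.5 Flash."""
    raise NotImplementedError("wire up your LLM client here")

# Design choice i): one focused prompt per information type.
EXTRACTION_PROMPTS = {
    "tasks": (
        "List the NLP tasks this paper approaches. "
        "Explain your reasoning step by step (CoT, design choice iii), "
        'then output on the final line only JSON: {"tasks": [...]}. '
        'If no task is identifiable, output {"tasks": []}.'  # abstention, choice ii)
    ),
    "limitations": (
        "List the limitations the authors explicitly report. "
        "Reason step by step, then output on the final line only "
        'JSON: {"limitations": [...]}; if none, {"limitations": []}.'
    ),
}

def extract(paper_text: str) -> dict:
    record = {}
    for info_type, instructions in EXTRACTION_PROMPTS.items():
        raw = call_llm(f"{instructions}\n\nPaper:\n{paper_text}")
        payload = raw.strip().splitlines()[-1]  # CoT precedes the final JSON line
        record[info_type] = json.loads(payload)[info_type]
    return record
```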

Entity Resolution Term inconsistencies such as name variants (e.g., Finqa and FinQA) were addressed by clustering over text embeddings using OpenAI's text-embedding-3-small, and merging semantically equivalent mentions based on a cosine similarity threshold of ≥ 0.93, tuned empirically (see Zeakis et al. (2023) for a similar methodology).
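A minimal sketch of this merging step is given below, assuming an embed function that wraps the embedding API. The union-find clustering is one straightforward way to realize "merge all pairs above the threshold" and is not necessarily the authors' exact procedure.

```python
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder for an OpenAI text-embedding-3-small call."""
    raise NotImplementedError("call your embedding API here")

def resolve(mentions: list[str], threshold: float = 0.93) -> list[set[str]]:
    vecs = embed(mentions)
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize
    sims = vecs @ vecs.T  # cosine similarity matrix

    # Union-find over the "similarity >= threshold" graph.
    parent = list(range(len(mentions)))
    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            if sims[i, j] >= threshold:
                parent[find(i)] = find(j)

    clusters: dict[int, set[str]] = {}
    for i, mention in enumerate(mentions):
        clusters.setdefault(find(i), set()).add(mention)
    return list(clusters.values())

# e.g., resolve(["FinQA", "Finqa", "ConvFinQA"]) should merge the first two.
```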

Taxonomy Induction We organized selected entity types into taxonomies. We applied zero-shot LLM-based categorization by iteratively prompting the model on subsets of entities and manually merging the resulting hierarchies. Each entity is annotated with the taxonomy categories it is an example of.
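A compact sketch of one pass of this loop follows; the batch size and prompt wording are assumptions, and call_llm is the same placeholder client as in the extraction sketch.

```python
def call_llm(prompt: str) -> str:  # placeholder LLM client, as above
    raise NotImplementedError

def induce_taxonomy(entities: list[str], batch_size: int = 50) -> list[str]:
    """Zero-shot categorization over entity subsets; returns partial hierarchies."""
    partial_hierarchies = []
    for i in range(0, len(entities), batch_size):
        batch = entities[i:i + batch_size]
        prompt = (
            "Group the following entities into a category hierarchy "
            "(parent > child), inventing category names as needed:\n- "
            + "\n- ".join(batch)
        )
        partial_hierarchies.append(call_llm(prompt))
    # Per the text above, the partial hierarchies are then merged manually.
    return partial_hierarchies
```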

Relevance Scoring We define a paper relevance score based on three factors: i) institutional centrality, i.e., PageRank of affiliated institutions in the co-authorship graph; ii) productivity, i.e., number of papers published by the institution; iii) citation normalization, i.e., paper citations normalized by year-average citations. We have used these scores to capture emerging trends, by paying more attention to the most relevant papers.
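The text names the three factors but not how they are combined, so the aggregation below (a product of normalized terms, taking the strongest affiliated institution) is purely an assumption; the centrality step uses networkx's PageRank.

```python
import networkx as nx

def relevance_scores(coauthor_graph: nx.Graph, papers: list[dict]) -> dict[str, float]:
    # i) institutional centrality: PageRank over a graph whose nodes are institutions.
    centrality = nx.pagerank(coauthor_graph)

    # ii) productivity: papers published per institution.
    productivity: dict[str, int] = {}
    for p in papers:
        for inst in p["institutions"]:
            productivity[inst] = productivity.get(inst, 0) + 1

    # iii) citation normalization: citations divided by year-average citations.
    totals: dict[int, int] = {}
    counts: dict[int, int] = {}
    for p in papers:
        totals[p["year"]] = totals.get(p["year"], 0) + p["citations"]
        counts[p["year"]] = counts.get(p["year"], 0) + 1

    scores = {}
    for p in papers:
        year_avg = totals[p["year"]] / counts[p["year"]]
        norm_citations = p["citations"] / max(year_avg, 1e-9)
        inst_term = max(centrality.get(i, 0.0) * productivity.get(i, 0)
                        for i in p["institutions"])
        scores[p["id"]] = inst_term * norm_citations  # assumed aggregation
    return scores
```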

4 Findings and Insights

In this section, we showcase the types of analyses that MetaGraph can enable. We focus on how ChatGPT's release in late 2022 marked a turning point for Financial NLP. Until then, the field focused mainly on sentiment analysis, information extraction, and stock prediction. Up until April 2023, these tasks constituted 90% of published work, and the most widely used datasets (Table 2a) reflected this focus.

LLMs dramatically altered this landscape, unlocking new applications and pushing research toward more complex tasks. Financial QA has become the leading focus, rising from 10% to 33% of tasks by 2025, while traditional tasks have steadily declined (see 6 for the detailed distribution of tasks over time).

Figure 2: Increasing Focus on Financial QA. Task frequency here is the number of papers with an instance of the task category.

LLMs have transformed the way researchers approach financial problems. This shift moves from narrow, task-specific pipelines to flexible, generative systems that bridge previously isolated tasks. Between April 2023 and February 2024, the average number of tasks per paper rose from 1.36 to 1.9. Traditional tasks such as sentiment analysis and information extraction are now often used as intermediate steps in broader systems, such as RAG and financial agents.
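As an example of the kind of graph query behind a statistic like this, here is a hypothetical tasks-per-paper computation over a MetaGraph-style graph, reusing the illustrative schema from the Section 3 sketch.

```python
from statistics import mean

def tasks_per_paper(g, period_of: dict[str, str]) -> dict[str, float]:
    """Average number of 'approaches' edges per paper, grouped by time period."""
    counts: dict[str, int] = {}
    for paper, _, data in g.edges(data=True):
        if data.get("relation") == "approaches":
            counts[paper] = counts.get(paper, 0) + 1
    by_period: dict[str, list[int]] = {}
    for paper, n in counts.items():
        by_period.setdefault(period_of[paper], []).append(n)
    return {period: mean(ns) for period, ns in by_period.items()}

# e.g., tasks_per_paper(g, {"paper:123": "Apr 2023 - Feb 2024"})
```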

Data Sources and Datasets Datasets changed too. On the one hand, QA benchmarks now lead the field, overtaking traditional datasets (Table 2). On the other hand, we witnessed an expansion and diversification of the data sources used to generate QA datasets. Recent papers increasingly mention multimodal and structured inputs—such as tables, charts, audio, and analyst commentaries—alongside core sources such as news and company reports (Table 3). This expansion has been supported by synthetic data generation, which reduced the need for expert annotations. The share of synthetic or human-in-the-loop datasets nearly tripled as LLMs became data generators, from 5% in April 2023 to almost 15% by November 2024 (see references in Table 10).

Data trends for tasks are plotted in Figure 3. Most new datasets target QA tasks, while the development of datasets for other tasks, such as sentiment analysis, has slowed. An exception is stock prediction, which continues to see new benchmarks due to proprietary constraints that make the relevant data unshareable.

Dataset                            Task  Freq.
FPB (Malo et al., 2014)            SA    29
FinQA (Chen et al., 2021)          QA    19
FiQA-SA (Maia et al., 2018)        SA    15
ConvFinQA (Chen et al., 2022)      QA    13
REFinD (Kaur et al., 2023)         RE    7

(a) Nov 2022 – Feb 2024

Dataset                            Task  Freq.
ConvFinQA (Chen et al., 2022)      QA    13
FPB (Malo et al., 2014)            SA    13
FinQA (Chen et al., 2021)          QA    12
FiQA-SA (Maia et al., 2018)        SA    7
FinanceBench (Islam et al., 2023)  QA    7

(b) Feb 2024 – Apr 2025

Table 2: Top datasets in Financial NLP by usage across two time periods (QA: Question Answering, SA: Sentiment Analysis, RE: Relation Extraction). We can note lower dataset usage frequencies in the second bin (a sign of fragmentation of the dataset landscape), with the relative proportion of QA vs. non-QA shifting to favor QA.

Table 3: Distribution of data sources and signal types across time periods (T1: Jan '22–Aug '23, T2: Sep '23–Jul '24, T3: Aug '24–Apr '25).

Sources                             T1 (%)  T2 (%)  T3 (%)
News                                 27.48   29.14   25.35
Social Media / Forums                21.85   14.20   14.43
Company Reports                      28.15   27.37   28.99
Company Fundamentals & Indicators    11.04   15.09   15.13
Earnings Calls                        5.18    7.10    6.58
Analyst Reports                       4.73    3.25    3.92
University Textbooks                  1.13    0.89    2.38
Financial Analyst Exams               0.45    2.96    3.22

Signals
Text                                 80.42   73.72   70.44
Tables                               16.08   19.87   20.13
Image                                 1.40    2.56    5.03
Audio                                 0.70    1.92    1.89
Other                                 2.10    1.92    2.52

Figure 3: New datasets by period.

4.1 A Growing Awareness

LLMs have lowered key barriers to both adoption and data processing. On one hand, they remove data format constraints—enabling the processing of unstructured data. On the other, they support synthetic data generation, helping mitigate challenges such as cost, scarcity, and domain bias (Table 4). We show how limitations have changed over time in Figure 4. As data constraints eased, research attention increasingly shifted toward model-level challenges—particularly reasoning, interpretability, efficiency, and safety. We observed growing concerns around bias, privacy risks, and potential misuse (Table 7).

Figure 4: Reported limitations by period. Synthetic data share complements data scarcity concerns.

This shift toward critical reflection is evident in the evolution of research motivations, which increasingly convey a more cautious stance toward LLMs. By 2024, critical themes such as robustness, efficiency, reasoning, and RAG appeared in nearly 18% of papers—twice the share observed in early 2023 (see Table 5). This marks a shift from earlier studies, which primarily focused on leveraging LLMs through zero-shot learning and fine-tuning.

4.2 From Models to Systems

In the wake of GenAI's rise, researchers initially focused on adapting general-purpose LLMs to financial NLP tasks through prompt engineering—especially zero-shot and in-context learning—which quickly gained momentum across applications. This was often complemented by post-training methods such as instruction tuning to further specialize models for the financial domain (Table 9).

Researchers began to move beyond model-centric approaches as the limitations of reasoning, safety, interpretability, and scalability became more apparent. Over time, these

Table 4: Distribution of reported limitations across time periods.

Data-related Limitations            T1 (%)  T2 (%)  T3 (%)
Costly Human Judging                  3.20    3.22    2.83
Insufficient Data Scale/Coverage     12.45   13.00   10.59
Skewed/Imbalanced Classes             3.08    2.29    1.47
Domain/Language Bias                 12.03   11.69    9.59

LLM Limitations
Interpretability Gaps                 1.63    2.29    2.04
Weak Reasoning                        4.11    4.59    5.24
Cost & Environmental Footprint        2.66    2.79    3.46
Hallucination & Bias                  2.18    2.95    3.72
Prompt Sensitivity                    1.75    2.84    2.78
Latency / Scalability                 2.06    2.18    2.57
Synthetic Data / Label Issues         7.38    4.86    5.14
Capacity Constraints                  9.07    9.07    9.70
Gaps: Lab vs. Live                    9.43    8.68   10.27
Other (see appendix)                 28.81   29.55   30.06

Table 5: Distribution of future research directions across time periods.

Data                                T1 (%)  T2 (%)  T3 (%)
Data Scarcity & Annotation Cost      32.25   28.82   23
Other                                37.12   35.81   35.48

Exploiting LLMs
Zero/Few-Shot Evaluation              4.41    4.37    2.53
Domain-Specific LLM Training         10.44    8.95   11.89

Solving LLM Limitations
Quantitative Reasoning Gaps           5.10    5.02    6.43
Interpretability & Explainability     3.25    3.71    5.07
Efficiency Constraints                3.02    5.24    5.65
Safety, Robustness, & Fairness        2.78    5.90    7.21
RAG & Retrieval Bottlenecks           1.62    2.18    2.73

techniques were increasingly complemented by system-level innovations that integrate LLMs into broader frameworks (Figure 5).

The most prominent of these is RAG, which has become a cornerstone of the field. Zooming in on RAG's evolution (Table 8), we find it mirrors the datasets trend: the spectrum of source types and data formats has widened, knowledge bases have grown, and the size of retrieved context has expanded from single sentences to large document chunks.

Figure 5: Technique evolution over time.

Figure 6: Share of papers using open-source models by task and timeframe.

This marks a move from standalone LLMs to system-oriented design. Prompting strategies have evolved as well (Table 9): the progression from in-context learning to augmented methods such as chain-of-thought, retrieval-based prompts, and self-criticism reflects a move away from relying solely on the model's few-shot capabilities toward more deliberate prompt enrichment aimed at reducing errors.

4.3 Towards Maturity

As the field matured, researchers began prioritizing shared resources over creating new datasets, increasingly relying on established, literature-backed benchmarks (Figure 14), with notable growth in datasets covering different tasks.

A similar trend emerged on the modeling side. The community increasingly turned to open-source models, valued for their transparency, controllability, and adaptability, as attention shifted from rapid expansion and adaptation to critical evaluation (Figure 6). Figure 7 illustrates three key phases: the early dominance of GPT models, the emergence of LLaMA (Touvron et al., 2023), and the current diversification toward a mix of open models—such as Qwen (Bai et al., 2023) and DeepSeek (DeepSeek-AI et al., 2025)—and proprietary ones.

Figure 8 shows how model sizes have also changed over time. The field is revisiting cost-performance tradeoffs, driven by the financial and computational cost of large models. This shift is reflected in the observed peak and a recent inflection in model size.

Figure 7: LLMs' usage distribution over time.

Figure 8: Open-source LLMs' sizes over time.

One Revolution, Two Speeds GenAI reshaped industry and academia at different paces (Figure 9). We took all the instances of tasks, models, and datasets in our corpus, and computed the relative proportion of financial QA instances, open-model instances, new datasets (datasets created after 2022), and created datasets (datasets created by the same authors who use them). Industry moved faster—dominating financial QA and driving dataset innovation to stay competitive. Academia responded more cautiously, focusing on established tasks and open-source models, with a stronger emphasis on transparency and reproducibility. This is likely due to academia's structural constraints, which prioritize transparency, reproducibility, and the use of publicly available data and models—factors that inherently slow down the adoption of cutting-edge approaches. In contrast, industry has largely traded off transparency in favor of rapid experimentation, leveraging proprietary data and closed-source LLMs to push forward advanced use cases such as Financial QA.

4.4 Looking Ahead

Financial NLP is entering a new phase, driven not just by LLMs' impact but by a deeper understanding of their strengths and limitations. As techniques such as RAG and open-source fine-tuning become standard (grey line in Figure 11), multimodal and small language models (green line) are gaining ground.

Figure 9: Academia vs. Industry.

Figure 10: Latest trends in financial NLP.

New trends are also emerging (blue line in Figure 11), most notably multi-agent systems. These range from simple expert-critic setups to more complex designs: hierarchical agents simulate organizational roles and top-down flows, while collaborative systems divide tasks among peer agents (see references in Table 10).

Finally, reinforcement learning is re-entering the conversation: not only as a training objective for trading policies, but also as a tool for improving LLM reasoning itself by aligning it with a broader system, ensuring more controlled and reliable output.

The gap between academic research and real-world financial practice remains an open debate, as the focus shifts from QA to reasoning systems.

Figure 11: Latest trends in financial NLP.

5 Conclusion

MetaGraph is a scalable, general-purpose framework for analyzing scientific literature via LLM-powered knowledge graphs. Applied to 681 Financial NLP papers, it traces the field's evolution, from rapid LLM adoption and multimodal expansion to a growing emphasis on system-level integration. Recent work shifts toward architectural solutions such as RAG, agent-based workflows, and reinforcement learning. MetaGraph offers a structured lens on the field's changing priorities and a reusable toolkit for data-driven meta-analysis.

6 Limitations

• Our approach relies on a manually defined ontology, which introduces an inductive bias in how entities and relations are categorized. While this provides structure and interpretability, it may also limit flexibility and overlook alternative or emergent conceptualizations.

• Despite the use of a human-in-the-loop approach with continuous validation, the entity extraction and taxonomy induction processes remain based on LLMs, which are inherently susceptible to hallucinations, inaccuracies, and bias. These limitations may affect both the precision and completeness of the extracted knowledge.

References

Hamed Babaei Giglou, Jennifer D'Souza, and Sören Auer. 2023. LLMs4OL: Large language models for ontology learning. In The Semantic Web – ISWC 2023, pages 408–427, Cham. Springer Nature Switzerland.

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, and 29 others. 2023. Qwen technical report. Preprint, arXiv:2309.16609.

Gagan Bhatia, El Moatez Billah Nagoudi, Hasan Cavusoglu, and Muhammad Abdul-Mageed. 2024. FinTral: A family of GPT-4 level multimodal financial large language models. In Findings of the Association for Computational Linguistics: ACL 2024, pages 13064–13087, Bangkok, Thailand. Association for Computational Linguistics.

Salvatore Carta, Alessandro Giuliani, Leonardo Piano, Alessandro Sebastian Podda, Livio Pompianu, and Sandro Gabriele Tiddia. 2023. Iterative zero-shot LLM prompting for knowledge graph construction. Preprint, arXiv:2307.01128.

Chung-Chi Chen, Hiroya Takamura, Ichiro Kobayashi, and Yusuke Miyao. 2024a. Hierarchical organization simulacra in the investment sector. Preprint, arXiv:2410.00354.

Wei Chen, Qiushi Wang, Zefei Long, Xianyin Zhang, Zhongtian Lu, Bingxuan Li, Siyuan Wang, Jiarong Xu, Xiang Bai, Xuanjing Huang, and Zhongyu Wei. 2023. DISC-FinLLM: A Chinese financial large language model based on multiple experts fine-tuning. Preprint, arXiv:2310.15205.

Yuemin Chen, Feifan Wu, Jingwei Wang, Hao Qian, Ziqi Liu, Zhiqiang Zhang, Jun Zhou, and Meng Wang. 2024b. Knowledge-augmented financial market analysis and report generation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 1207–1217, Miami, Florida, US. Association for Computational Linguistics.

Zhiyu Chen, Wenhu Chen, Charese Smiley, Sameena Shah, Iana Borova, Dylan Langdon, Reema Moussa, Matt Beane, Ting-Hao Huang, Bryan Routledge, and William Yang Wang. 2021. FinQA: A dataset of numerical reasoning over financial data. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 3697–3711, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Zhiyu Chen, Shiyang Li, Charese Smiley, Zhiqiang Ma, Sameena Shah, and William Yang Wang. 2022. ConvFinQA: Exploring the chain of numerical reasoning in conversational finance question answering. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 6279–6292, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.

Nicole Cho, Nishan Srishankar, Lucas Cecchi, and William Watson. 2024. FISHNET: Financial intelligence from sub-querying, harmonizing, neural-conditioning, expert swarms, and task planning. In Proceedings of the 5th ACM International Conference on AI in Finance, ICAIF '24, pages 591–599, New York, NY, USA. Association for Computing Machinery.

Raul Salles de Padua, Imran Qureshi, and Mustafa U. Karakaplan. 2023. GPT-3 models are few-shot financial reasoners. Preprint, arXiv:2307.13617.

DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, and 181 others. 2025. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. Preprint, arXiv:2501.12948.

Xiang Deng, Vasilisa Bashlovkina, Feng Han, Simon Baumgartner, and Michael Bendersky. 2022. What do LLMs know about financial markets? A case study on Reddit market sentiment analysis. Preprint, arXiv:2212.11311.

Kelvin Du, Yazhi Zhao, Rui Mao, Frank Xing, and Erik Cambria. 2025. Natural language processing in finance: A survey. Information Fusion, 115:102755.

Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. 2025. From local to global: A graph RAG approach to query-focused summarization. Preprint, arXiv:2404.16130.

Sorouralsadat Fatemi and Yuheng Hu. 2024. Enhancing financial question answering with a multi-agent reflection framework. In Proceedings of the 5th ACM International Conference on AI in Finance, ICAIF '24, pages 530–537. ACM.

George Fatouros, Kostas Metaxas, John Soldatos, and Dimosthenis Kyriazis. 2024. Can large language models beat Wall Street? Evaluating GPT-4's impact on financial decision-making with MarketSenseAI. Neural Computing and Applications.

Maurice Funk, Simon Hosemann, Jean Christoph Jung, and Carsten Lutz. 2023. Towards ontology construction with language models. In Joint proceedings of the 1st workshop on Knowledge Base Construction from Pre-Trained Language Models (KBC-LM) and the 2nd challenge on Language Models for Knowledge Base Construction (LM-KBC) co-located with the 22nd International Semantic Web Conference (ISWC 2023), Athens, Greece, November 6, 2023, volume 3577 of CEUR Workshop Proceedings. CEUR-WS.org.

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang, and Haofen Wang. 2023. Retrieval-augmented generation for large language models: A survey. ArXiv, abs/2312.10997.

Jiayu Guo, Yu Guo, Martha Li, and Songtao Tan. 2025. FLAME: Financial large-language model assessment and metrics evaluation. Preprint, arXiv:2501.06211.

Yue Guo and Yi Yang. 2024. EconNLI: Evaluating large language models on economics reasoning. In Findings of the Association for Computational Linguistics: ACL 2024, pages 982–994, Bangkok, Thailand. Association for Computational Linguistics.

Xuewen Han, Neng Wang, Shangkun Che, Hongyang Yang, Kunpeng Zhang, and Sean Xin Xu. 2024. Enhancing investment analysis: Optimizing AI-agent collaboration in financial research. Papers 2411.04788, arXiv.org.

Pranab Islam, Anand Kannappan, Douwe Kiela, Rebecca Qian, Nino Scherrer, and Bertie Vidgen. 2023. FinanceBench: A new benchmark for financial question answering. Preprint, arXiv:2311.11944.

Simerjot Kaur, Charese Smiley, Akshat Gupta, Joy Sain, Dongsheng Wang, Suchetha Siddagangappa, Toyin Aguda, and Sameena Shah. 2023. REFinD: Relation extraction financial dataset. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '23, pages 3054–3063, New York, NY, USA. Association for Computing Machinery.

Rik Koncel-Kedziorski, Michael Krumdick, Viet Lai, Varshini Reddy, Charles Lovering, and Chris Tanner. 2024. BizBench: A quantitative reasoning benchmark for business and finance. Preprint, arXiv:2311.06602.

Michael Krumdick, Rik Koncel-Kedziorski, Viet Dac Lai, Varshini Reddy, Charles Lovering, and Chris Tanner. 2024. BizBench: A quantitative reasoning benchmark for business and finance. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8309–8332, Bangkok, Thailand. Association for Computational Linguistics.

Viet Dac Lai, Michael Krumdick, Charles Lovering, Varshini Reddy, Craig Schmidt, and Chris Tanner. 2024. SEC-QA: A systematic evaluation corpus for financial QA. Preprint, arXiv:2406.14394.

Jiangtong Li, Yuxuan Bian, Guoxuan Wang, Yang Lei, Dawei Cheng, Zhijun Ding, and Changjun Jiang. 2023a. CFGPT: Chinese financial assistant with large language model. Preprint, arXiv:2309.10654.

Xiang Li, Zhenyu Li, Chen Shi, Yong Xu, Qing Du, Mingkui Tan, and Jun Huang. 2024a. AlphaFin: Benchmarking financial analysis with retrieval-augmented stock-chain framework. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 773–783, Torino, Italia. ELRA and ICCL.

Yinheng Li, Shaofei Wang, Han Ding, and Hang Chen. 2023b. Large language models in finance: A survey. arXiv preprint arXiv:2311.10723. Accepted at the 4th ACM International Conference on AI in Finance (ICAIF-23).

Yuan Li, Bingqiao Luo, Qian Wang, Nuo Chen, Xu Liu, and Bingsheng He. 2024b. CryptoTrade: A reflective LLM-based agent to guide zero-shot cryptocurrency trading. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 1094–1106, Miami, Florida, USA. Association for Computational Linguistics.

Jiaxin Liu, Yi Yang, and Kar Yan Tam. 2025a. Evaluating and aligning human economic risk preferences in LLMs. Preprint, arXiv:2503.06646.

Xiao-Yang Liu, Guoxuan Wang, Hongyang Yang, and Daochen Zha. 2023a. FinGPT: Democratizing internet-scale data for financial large language models. Preprint, arXiv:2307.10485.

Xiao-Yang Liu, Guoxuan Wang, Hongyang Yang, and Daochen Zha. 2023b. FinGPT: Democratizing internet-scale data for financial large language models. Papers 2307.10485, arXiv.org.

Zhaowei Liu, Xin Guo, Fangqi Lou, Lingfeng Zeng, Jinyi Niu, Zixuan Wang, Jiajie Xu, Weige Cai, Ziwei Yang, Xueqian Zhao, Chao Li, Sheng Xu, Dezhi Chen, Yun Chen, Zuo Bai, and Liwen Zhang. 2025b. Fin-R1: A large language model for financial reasoning through reinforcement learning. Preprint, arXiv:2503.16252.

Zhiwei Liu, Xin Zhang, Kailai Yang, Qianqian Xie, Jimin Huang, and Sophia Ananiadou. 2025c. FMDLlama: Financial misinformation detection based on large language models. In Companion Proceedings of the ACM on Web Conference 2025, WWW '25, pages 1153–1157, New York, NY, USA. Association for Computing Machinery.

Dakuan Lu, Hengkui Wu, Jiaqing Liang, Yipei Xu, Qianyu He, Yipeng Geng, Mengkun Han, Yingsi Xin, and Yanghua Xiao. 2023. BBT-Fin: Comprehensive construction of Chinese financial domain pre-trained language model, corpus and benchmark. Preprint, arXiv:2302.09432.

Macedo Maia, Siegfried Handschuh, André Freitas, Brian Davis, Ross McDermott, Manel Zarrouk, and Alexandra Balahur. 2018. WWW'18 open challenge: Financial opinion mining and question answering. In Companion Proceedings of the The Web Conference 2018, WWW '18, pages 1941–1942, Republic and Canton of Geneva, CHE. International World Wide Web Conferences Steering Committee.

Pekka Malo, Ankur Sinha, Pekka Korhonen, Jyrki Wallenius, and Pyry Takala. 2014. Good debt or bad debt: Detecting semantic orientations in economic texts. Journal of the Association for Information Science and Technology, 65(4):782–796.

Yuqi Nie, Yaxuan Kong, Xiaowen Dong, John M. Mulvey, H. Vincent Poor, Qingsong Wen, and Stefan Zohren. 2024. A survey of large language models for financial applications: Progress, prospects and challenges. Preprint, arXiv:2406.11903.

Sohini Roychowdhury. 2024. Journey of hallucination-minimized generative AI solutions for financial decision makers. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, WSDM '24, pages 1180–1181, New York, NY, USA. Association for Computing Machinery.

Manish Sanwal. 2025. Layered chain-of-thought prompting for multi-agent LLM systems: A comprehensive approach to explainable large language models. Preprint, arXiv:2501.18645.

Sander Schulhoff, Michael Ilie, Nishant Balepur, Konstantine Kahadze, Amanda Liu, Chenglei Si, Yinheng Li, Aayush Gupta, HyoJung Han, Sevien Schulhoff, Pranav Sandeep Dulepet, Saurav Vidyadhara, Dayeon Ki, Sweta Agrawal, Chau Pham, Gerson Kroiz, Feileen Li, Hudson Tao, Ashay Srivastava, and 12 others. 2025. The prompt report: A systematic survey of prompt engineering techniques. Preprint, arXiv:2406.06608.

Raj Shah, Kunal Chawla, Dheeraj Eidnani, Agam Shah, Wendi Du, Sudheer Chava, Natraj Raman, Charese Smiley, Jiaao Chen, and Diyi Yang. 2022. When FLUE meets FLANG: Benchmarks and large pretrained language model for financial domain. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 2322–2335, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.

Jiashuo Sun, Hang Zhang, Chen Lin, Xiangdong Su, Yeyun Gong, and Jian Guo. 2024. APOLLO: An optimized training approach for long-form numerical reasoning. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 1370–1382, Torino, Italia. ELRA and ICCL.

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurelien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. 2023. LLaMA: Open and efficient foundation language models. Preprint, arXiv:2302.13971.

Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, and Gideon Mann. 2023. BloombergGPT: A large language model for finance. Preprint, arXiv:2303.17564.

Zengqing Wu, Run Peng, Shuyuan Zheng, Qianying Liu, Xu Han, Brian I. Kwon, Makoto Onizuka, Shaojie Tang, and Chuan Xiao. 2024. Shall we team up: Exploring spontaneous cooperation of competing LLM agents. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 5163–5186, Miami, Florida, USA. Association for Computational Linguistics.

Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, Yongfu Dai, Duanyu Feng, Yijing Xu, Haoqiang Kang, Ziyan Kuang, Chenhan Yuan, Kailai Yang, Zheheng Luo, Tianlin Zhang, Zhiwei Liu, Guojun Xiong, and 15 others. 2024. FinBen: A holistic financial benchmark for large language models. In Advances in Neural Information Processing Systems, volume 37, pages 95716–95743. Curran Associates, Inc.

Qianqian Xie, Weiguang Han, Yanzhao Lai, Min Peng, and Jimin Huang. 2023a. The Wall Street neophyte: A zero-shot analysis of ChatGPT over multimodal stock movement prediction challenges. Preprint, arXiv:2304.05351.

Qianqian Xie, Weiguang Han, Xiao Zhang, Yanzhao Lai, Min Peng, Alejandro Lopez-Lira, and Jimin Huang. 2023b. PIXIU: A large language model, instruction data and evaluation benchmark for finance. In Proceedings of the 37th International Conference on Neural Information Processing Systems, NIPS '23, Red Hook, NY, USA. Curran Associates Inc.

Frank Z. Xing, Erik Cambria, and Roy E. Welsch. 2018. Natural language based financial forecasting: A survey. Artificial Intelligence Review, 50(1):49–73.

Siqiao Xue, Fan Zhou, Yi Xu, Ming Jin, Qingsong Wen, Hongyan Hao, Qingyang Dai, Caigao Jiang, Hongyu Zhao, Shuo Xie, Jianshan He, James Zhang, and Hongyuan Mei. 2024. WeaverBird: Empowering financial decision-making with large language model, knowledge base, and search engine. Preprint, arXiv:2308.05361.

Hongyang Yang, Xiao-Yang Liu, and Christina Dan Wang. 2023a. FinGPT: Open-source financial large language models. Preprint, arXiv:2306.06031.

Hongyang Yang, Boyu Zhang, Neng Wang, Cheng Guo, Xiaoli Zhang, Likun Lin, Junlin Wang, Tianyu Zhou, Mao Guan, Runjia Zhang, and Christina Dan Wang. 2024. FinRobot: An open-source AI agent platform for financial applications using large language models. Preprint, arXiv:2405.14767.

Yi Yang, Yixuan Tang, and Kar Yan Tam. 2023b. InvestLM: A large language model for investment using financial domain instruction tuning. Preprint, arXiv:2309.13064.

Antonio Jimeno Yepes, Yao You, Jan Milczek, Sebastian Laverde, and Renyu Li. 2024. Financial report chunking for effective retrieval augmented generation. Preprint, arXiv:2402.05131.

Xinli Yu, Zheng Chen, and Yanbin Lu. 2023. Harnessing LLMs for temporal data - a study on explainable financial time series forecasting. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 739–753, Singapore. Association for Computational Linguistics.

Yangyang Yu, Zhiyuan Yao, Haohang Li, Zhiyang Deng, Yuechen Jiang, Yupeng Cao, Zhi Chen, Jordan W. Suchow, Zhenyu Cui, Rong Liu, Zhaozhuo Xu, Denghui Zhang, Koduvayur Subbalakshmi, Guojun Xiong, Yueru He, Jimin Huang, Dong Li, and Qianqian Xie. 2024. FinCon: A synthesized LLM multi-agent system with conceptual verbal reinforcement for enhanced financial decision making. In Advances in Neural Information Processing Systems, volume 37, pages 137010–137045. Curran Associates, Inc.

Alexandros Zeakis, George Papadakis, Dimitrios Skoutas, and Manolis Koubarakis. 2023. Pre-trained embeddings for entity resolution: An experimental analysis. Proc. VLDB Endow., 16(9):2225–2238.

Boyu Zhang, Hongyang Yang, Tianyu Zhou, Muhammad Ali Babar, and Xiao-Yang Liu. 2023. Enhancing financial sentiment analysis via retrieval augmented large language models. In Proceedings of the Fourth ACM International Conference on AI in Finance, ICAIF '23, pages 349–356, New York, NY, USA. Association for Computing Machinery.

Yiyun Zhao, Prateek Singh, Hanoz Bhathena, Bernardo Ramos, Aviral Joshi, Swaroop Gadiyaram, and Saket Sharma. 2024. Optimizing LLM based retrieval augmented generation pipelines in the financial domain. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track), pages 279–294, Mexico City, Mexico. Association for Computational Linguistics.

A The Geography of Financial NLP

The map depicted in Figure 12 illustrates the geographical distribution of institutions represented in our corpus. It clearly highlights that financial NLP research predominantly clusters around three major global hubs. In the United States, research activity is highly concentrated along the Atlantic Coast, with a distinct epicenter in New York City. In East Asia, significant research centers have emerged in major economic and technological hubs, notably within China, Korea, Japan,
