Dianjin- r1: Evaluating and enhancing financial reasoning in large language models

Dianjin-r1: Evaluating, enhancing financial reasoning in large language models , author= · 2025 · arXiv 2504.15716

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

representative citing papers

VertMark: A Unified Training-Free Robust Watermarking Framework for Vertical Domain Pre-trained Language Models

cs.CR · 2026-05-04 · unverdicted · novelty 7.0

VertMark embeds robust, training-free watermarks into vertical domain language models by creating hidden semantic equivalence between low-frequency triggers and high-frequency domain terms via parameter swaps, supporting reliable verification with negligible performance impact.

CLExEval: A Human-in-the-Loop Framework for Qualitative Evaluation of LLM Clinical Reasoning

cs.CL · 2026-06-30 · unverdicted · novelty 6.0

CLExEval introduces a human-annotated evaluation framework on 40 rare cases that identifies verbosity bias, hidden knowledge paradox, and 68.6% reasoning-to-output mismatch in LLMs while showing LLM-as-a-Judge overestimates reliability.

FinReasoning: A Hierarchical Benchmark for Reliable Financial Research Reporting

cs.CL · 2026-02-25 · unverdicted · novelty 6.0

FinReasoning is a hierarchical benchmark that decomposes LLM financial research capabilities into semantic consistency, data alignment, and deep insight, revealing model-type differences in auditing versus insight generation.

Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models

cs.CL · 2025-08-21 · unverdicted · novelty 6.0

Fin-PRM is a domain-specialized process reward model that supplies binary step-level and trajectory-level supervision signals for financial reasoning in LLMs and outperforms general PRMs on CFLUE and FinQA benchmarks.

YouZhi: Towards High-Concurrency Financial LLMs via Adaptive GQA-to-MLA Transition

cs.CL · 2026-06-04 · unverdicted · novelty 5.0

YouZhi-LLM applies a layer-adaptive GQA-to-MLA transition plus Ascend-specific distillation and fine-tuning to reduce KV-cache size, yielding up to 2.69× higher concurrency and modest gains on financial benchmarks versus base models.

Reasoning through Verifiable Forecast Actions: Consistency-Grounded RL for Financial LLMs

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

StockR1 unifies LLM-based financial reasoning and time-series forecasting by emitting verifiable forecast actions that condition a decoder, optimized via consistency-grounded RL to improve accuracy on QA and prediction tasks.

PubSwap: Public-Data Off-Policy Coordination for Federated RLVR

cs.LG · 2026-04-14 · unverdicted · novelty 5.0

PubSwap uses a small public dataset for selective off-policy response swapping in federated RLVR to improve coordination and performance over standard baselines on math and medical reasoning tasks.

citing papers explorer

Showing 7 of 7 citing papers.

VertMark: A Unified Training-Free Robust Watermarking Framework for Vertical Domain Pre-trained Language Models cs.CR · 2026-05-04 · unverdicted · none · ref 59
VertMark embeds robust, training-free watermarks into vertical domain language models by creating hidden semantic equivalence between low-frequency triggers and high-frequency domain terms via parameter swaps, supporting reliable verification with negligible performance impact.
CLExEval: A Human-in-the-Loop Framework for Qualitative Evaluation of LLM Clinical Reasoning cs.CL · 2026-06-30 · unverdicted · none · ref 104
CLExEval introduces a human-annotated evaluation framework on 40 rare cases that identifies verbosity bias, hidden knowledge paradox, and 68.6% reasoning-to-output mismatch in LLMs while showing LLM-as-a-Judge overestimates reliability.
FinReasoning: A Hierarchical Benchmark for Reliable Financial Research Reporting cs.CL · 2026-02-25 · unverdicted · none · ref 42
FinReasoning is a hierarchical benchmark that decomposes LLM financial research capabilities into semantic consistency, data alignment, and deep insight, revealing model-type differences in auditing versus insight generation.
Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models cs.CL · 2025-08-21 · unverdicted · none · ref 31
Fin-PRM is a domain-specialized process reward model that supplies binary step-level and trajectory-level supervision signals for financial reasoning in LLMs and outperforms general PRMs on CFLUE and FinQA benchmarks.
YouZhi: Towards High-Concurrency Financial LLMs via Adaptive GQA-to-MLA Transition cs.CL · 2026-06-04 · unverdicted · none · ref 53
YouZhi-LLM applies a layer-adaptive GQA-to-MLA transition plus Ascend-specific distillation and fine-tuning to reduce KV-cache size, yielding up to 2.69× higher concurrency and modest gains on financial benchmarks versus base models.
Reasoning through Verifiable Forecast Actions: Consistency-Grounded RL for Financial LLMs cs.LG · 2026-05-21 · unverdicted · none · ref 16
StockR1 unifies LLM-based financial reasoning and time-series forecasting by emitting verifiable forecast actions that condition a decoder, optimized via consistency-grounded RL to improve accuracy on QA and prediction tasks.
PubSwap: Public-Data Off-Policy Coordination for Federated RLVR cs.LG · 2026-04-14 · unverdicted · none · ref 31
PubSwap uses a small public dataset for selective off-policy response swapping in federated RLVR to improve coordination and performance over standard baselines on math and medical reasoning tasks.

Dianjin- r1: Evaluating and enhancing financial reasoning in large language models

fields

years

verdicts

representative citing papers

citing papers explorer