Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study
6 Pith papers cite this work.
Citation roles: background (2)
Citation polarities: background (2)
citing papers explorer
- Context-Aware Explanations for Spatialized Document Layouts
CAPE produces spatially grounded natural-language explanations for document layouts using pattern detection and multi-level context, rated more helpful than content-only baselines in a user study.
- Towards Robust Real-World Spreadsheet Understanding with Multi-Agent Multi-Format Reasoning
SpreadsheetAgent uses incremental multi-format reading, structural sketching, and verification to raise spreadsheet benchmark accuracy from 35.27% to 38.16%.
- MachineLearningLM: Scaling Many-shot In-context Learning via Continued Pretraining
MachineLearningLM uses continued pretraining on SCM-synthesized ML tasks with random-forest distillation to give LLMs robust many-shot in-context learning on tabular classification, reaching random-forest accuracy levels while preserving general chat performance.
- SemStruct: Contextualizing Semantic Embeddings with Structural Information for Schema Matching
SemStruct models tables as heterogeneous graphs with GNNs on frozen PLM embeddings to incorporate row co-occurrences for schema matching and reports SOTA results on Valentine and SOTAB-SM benchmarks.
- A Blueprint for AI-Driven Software Quality: Integrating LLMs with Established Standards
A survey that maps LLM applications in software quality assurance to established standards, including ISO/IEC 12207, ISO 25010, CMMI, and TMM, with case studies, challenges, and future directions.
- ProfiliTable: Profiling-Driven Tabular Data Processing via Agentic Workflows