Leveraging Weighted Syntactic and Semantic Context Assessment Summary (wSSAS) Towards Text Categorization Using LLMs
Pith reviewed 2026-05-10 15:07 UTC · model grok-4.3
The pith
wSSAS provides a deterministic framework using hierarchical text organization and SNR scoring to enhance LLM-based text categorization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that wSSAS, through its two-phased approach of hierarchical classification followed by SNR-based feature prioritization within a Summary-of-Summaries architecture, effectively isolates essential information from background noise, thereby improving clustering integrity and categorization accuracy while reducing entropy in LLM-driven text analysis.
What carries the argument
The wSSAS framework, which uses a hierarchical structure of Themes, Stories, and Clusters combined with Signal-to-Noise Ratio scoring to prioritize high-value semantic features for deterministic summarization.
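The Themes → Stories → Clusters hierarchy described above can be pictured as a simple nested data structure; the sketch below is my own minimal rendering (field and method names are assumptions, not taken from the paper):

```python
from dataclasses import dataclass, field

@dataclass
class Cluster:
    """Leaf level: a tight group of closely related text snippets."""
    label: str
    snippets: list = field(default_factory=list)

@dataclass
class Story:
    """Mid level: a narrative grouping several clusters."""
    title: str
    clusters: list = field(default_factory=list)

@dataclass
class Theme:
    """Top level: a broad topic spanning several stories."""
    name: str
    stories: list = field(default_factory=list)

    def all_snippets(self):
        # Flatten the hierarchy back into the raw snippets it organizes.
        return [s for st in self.stories for c in st.clusters for s in c.snippets]

theme = Theme("Service quality", stories=[
    Story("Wait times", clusters=[Cluster("long waits", ["waited 40 min"])]),
])
```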
If this is right
- Clustering integrity and categorization accuracy increase when applied to large review datasets.
- Categorization entropy decreases, supporting more consistent LLM outputs.
- The process becomes more reproducible for enterprise-scale text categorization tasks.
- Model attention stays focused on the most representative data points rather than noise.
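The entropy claim in particular is directly measurable. One plausible operationalization (my sketch, not the paper's stated definition) is the Shannon entropy of the category labels an LLM assigns to the same item across repeated runs: 0 bits when every run agrees, rising as outputs become inconsistent.

```python
from collections import Counter
from math import log2

def categorization_entropy(labels):
    """Shannon entropy (bits) of a list of category labels.

    0.0 means perfectly consistent runs; log2(k) means labels are
    spread uniformly over k categories.
    """
    n = len(labels)
    h = -sum((c / n) * log2(c / n) for c in Counter(labels).values())
    return h + 0.0  # normalize -0.0 to 0.0

print(categorization_entropy(["pricing"] * 10))              # 0.0 (fully consistent)
print(categorization_entropy(["pricing"] * 5 + ["ux"] * 5))  # 1.0 (50/50 split)
```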
Where Pith is reading between the lines
- This could be extended to other LLM tasks involving classification or summarization to achieve greater output stability.
- Integration with different underlying models might show varying degrees of improvement depending on their inherent stochasticity.
- Applying the method to real-time streaming text data could test its scalability beyond static datasets.
- The hierarchical organization might reveal latent structures in text that standard clustering misses.
Load-bearing premise
The proposed hierarchical organization combined with SNR scoring reliably enforces determinism and isolates essential information without introducing selection bias or discarding context that the LLM would use productively.
What would settle it
If applying wSSAS to the same diverse datasets yields no measurable gains in clustering integrity, categorization accuracy, or entropy reduction compared to direct LLM use, the central claims would be falsified.
Original abstract
The use of Large Language Models (LLMs) for reliable, enterprise-grade analytics such as text categorization is often hindered by the stochastic nature of attention mechanisms and sensitivity to noise that compromise their analytical precision and reproducibility. To address these technical frictions, this paper introduces the Weighted Syntactic and Semantic Context Assessment Summary (wSSAS), a deterministic framework designed to enforce data integrity on large-scale, chaotic datasets. We propose a two-phased validation framework that first organizes raw text into a hierarchical classification structure containing Themes, Stories, and Clusters. It then leverages a Signal-to-Noise Ratio (SNR) to prioritize high-value semantic features, ensuring the model's attention remains focused on the most representative data points. By incorporating this scoring mechanism into a Summary-of-Summaries (SoS) architecture, the framework effectively isolates essential information and mitigates background noise during data aggregation. Experimental results using Gemini 2.0 Flash Lite across diverse datasets - including Google Business reviews, Amazon Product reviews, and Goodreads Book reviews - demonstrate that wSSAS significantly improves clustering integrity and categorization accuracy. Our findings indicate that wSSAS reduces categorization entropy and provides a reproducible pathway for improving LLM based summaries based on a high-precision, deterministic process for large-scale text categorization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the Weighted Syntactic and Semantic Context Assessment Summary (wSSAS), a two-phased deterministic framework for LLM-based text categorization. The first phase hierarchically organizes raw text into Themes, Stories, and Clusters; the second applies Signal-to-Noise Ratio (SNR) scoring to prioritize semantic features before aggregation via a Summary-of-Summaries (SoS) architecture. Experiments using Gemini 2.0 Flash Lite on Google Business reviews, Amazon Product reviews, and Goodreads Book reviews are reported to demonstrate improved clustering integrity, higher categorization accuracy, reduced entropy, and a reproducible deterministic process for large-scale tasks.
Significance. If the determinism and performance gains can be substantiated, the work would address a practical barrier to deploying LLMs in enterprise analytics by reducing sensitivity to stochastic attention and noise. A validated method for enforcing reproducibility in hierarchical summarization and categorization could be useful for high-stakes text processing pipelines.
major comments (3)
- [Abstract] Abstract: the central empirical claim of 'significant improvements in clustering integrity and categorization accuracy' and 'reduced categorization entropy' is asserted without any reported quantitative metrics, baselines, statistical tests, ablation studies, or description of how determinism was measured or enforced, so the claim cannot be evaluated.
- [Framework description] Framework description (two-phased validation): the claim of a 'high-precision, deterministic process' relies on LLMs (Gemini 2.0 Flash Lite) for hierarchical organization and feature prioritization, yet no controls such as temperature=0, fixed seeds, or non-LLM deterministic algorithms are specified; this is load-bearing for the reproducibility assertion.
- [Methods] Methods / wSSAS definition: the weighting coefficients and SNR threshold/scaling factor are free parameters whose concrete values and selection procedure are not stated, making it impossible to determine whether reported gains are independent of these choices or artifacts of tuning.
minor comments (2)
- The SNR scoring and SoS aggregation steps would benefit from explicit equations or pseudocode to clarify the computation of weights and noise isolation.
- Dataset descriptions and preprocessing steps are referenced but lack sufficient detail on size, labeling, and any preprocessing that could affect clustering integrity.
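To make the pseudocode request concrete, a sketch of the shape such a specification could take. Every scoring rule, weight, and threshold below is a hypothetical placeholder of mine, not the paper's actual wSSAS computation:

```python
def snr_score(segment, signal_terms, noise_terms, w_syn=0.5, w_sem=0.5):
    """Hypothetical SNR: weighted ratio of signal-term hits to noise-term hits.

    w_syn and w_sem stand in for the paper's unstated syntactic/semantic
    weighting coefficients.
    """
    tokens = segment.lower().split()
    signal = sum(t in signal_terms for t in tokens)
    noise = sum(t in noise_terms for t in tokens)
    return (w_syn + w_sem) * signal / (1 + noise)

def summary_of_summaries(segments, signal_terms, noise_terms, keep=2):
    """Keep the top-`keep` segments by SNR, then aggregate (here: concatenate)."""
    ranked = sorted(segments,
                    key=lambda s: snr_score(s, signal_terms, noise_terms),
                    reverse=True)
    return " | ".join(ranked[:keep])

segs = ["battery life is great",
        "uh random filler words",
        "great battery but slow shipping"]
print(summary_of_summaries(segs,
                           {"battery", "great", "shipping"},
                           {"uh", "random", "filler"}))
```

Even a placeholder like this pins down the free parameters (weights, `keep`, the SNR denominator) that the referee notes are currently unreported.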
Simulated Author's Rebuttal
We thank the referee for their thorough and constructive review, which highlights important areas for strengthening the manuscript's clarity and rigor. We address each major comment below and commit to revisions that will make the empirical claims, determinism controls, and parameter details fully evaluable.
Point-by-point responses
-
Referee: [Abstract] Abstract: the central empirical claim of 'significant improvements in clustering integrity and categorization accuracy' and 'reduced categorization entropy' is asserted without any reported quantitative metrics, baselines, statistical tests, ablation studies, or description of how determinism was measured or enforced, so the claim cannot be evaluated.
Authors: We agree that the abstract would be stronger with explicit quantitative support. The manuscript body reports experimental outcomes on the three review datasets, but we will revise the abstract to include specific metrics (e.g., accuracy gains and entropy reductions relative to baselines), mention the use of statistical tests, and briefly note how determinism is measured and enforced. Ablation results will also be highlighted in the revised abstract where space permits. revision: yes
-
Referee: [Framework description] Framework description (two-phased validation): the claim of a 'high-precision, deterministic process' relies on LLMs (Gemini 2.0 Flash Lite) for hierarchical organization and feature prioritization, yet no controls such as temperature=0, fixed seeds, or non-LLM deterministic algorithms are specified; this is load-bearing for the reproducibility assertion.
Authors: We accept that explicit controls are required to substantiate the determinism claim. In the revised manuscript we will specify the exact configuration of Gemini 2.0 Flash Lite, including temperature set to 0 and fixed seeds for any non-deterministic operations. We will also add a short discussion of how these settings, combined with the deterministic SNR-based prioritization step, produce reproducible outputs across runs. revision: yes
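A determinism control of this kind can also be verified mechanically. A hedged sketch of such a check, where `classify` is a stand-in stub for a temperature-0, seed-pinned model call (this is not the Gemini API):

```python
import hashlib

def classify(text):
    """Stand-in for a temperature-0 LLM call: any fixed function of its input.

    A real pipeline would invoke the model with temperature=0 and pinned
    seeds; this stub just hashes the input to one of three categories.
    """
    digest = hashlib.sha256(text.encode()).hexdigest()
    categories = ["pricing", "quality", "shipping"]
    return categories[int(digest, 16) % len(categories)]

def is_deterministic(pipeline, inputs, runs=3):
    """Re-run the whole pipeline and compare outputs across runs."""
    outputs = [[pipeline(x) for x in inputs] for _ in range(runs)]
    return all(o == outputs[0] for o in outputs)

print(is_deterministic(classify, ["great product", "arrived late"]))  # True
```

Reporting the result of exactly this kind of repeated-run comparison would substantiate the reproducibility claim.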
-
Referee: [Methods] Methods / wSSAS definition: the weighting coefficients and SNR threshold/scaling factor are free parameters whose concrete values and selection procedure are not stated, making it impossible to determine whether reported gains are independent of these choices or artifacts of tuning.
Authors: We acknowledge the need for full transparency on these hyperparameters. The revised Methods section will include a new subsection that states the concrete values chosen for the weighting coefficients, SNR threshold, and scaling factor, together with the selection procedure (empirical tuning on a held-out validation subset of each dataset). This will allow readers to assess whether the gains are robust to these choices. revision: yes
Circularity Check
No circularity in derivation chain
full rationale
The paper introduces wSSAS as a descriptive two-phased framework (hierarchical Themes/Stories/Clusters organization followed by SNR scoring and SoS aggregation) and reports experimental outcomes on clustering integrity and entropy reduction using Gemini 2.0 Flash Lite. No equations, parameter-fitting procedures, self-citations, or uniqueness theorems are referenced that would reduce any claimed result to its own inputs by construction. The determinism assertion is presented as a design goal rather than a derived quantity, and the experimental claims rest on external dataset evaluations rather than tautological re-labeling of fitted values.
Axiom & Free-Parameter Ledger
free parameters (2)
- wSSAS weighting coefficients
- SNR threshold or scaling factor
axioms (1)
- Domain assumption: LLM attention mechanisms can be made effectively deterministic through external hierarchical structuring and SNR-based feature selection.
invented entities (1)
- wSSAS framework (no independent evidence)
Reference graph
Works this paper leans on
-
[1]
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008. URL: https://nlp.stanford.edu/IR-book/
-
[2]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, et al. Language Models are Few-Shot Learners, 2020. doi:10.48550/arXiv.2005.14165
-
[3]
Canyu Chen and Kai Shu. Can LLM-Generated Misinformation Be Detected?, April 2024. arXiv:2309.13788 [cs]. URL: http://arxiv.org/abs/2309.13788, doi:10.48550/arXiv.2309.13788
-
[4]
Elias Hossain, Rajib Rana, Niall Higgins, Jeffrey Soar, Prabal Datta Barua, Anthony R. Pisani, and Kathryn Turner. Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review. Computers in Biology and Medicine, 155:106649, March 2023. URL: https://www.sciencedirect.com/science/article/pii/S001048...
-
[5]
Monica Agrawal, Stefan Hegselmann, Hunter Lang, Yoon Kim, and David Sontag. Large language models are few-shot clinical information extractors. In Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang, editors, Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 1998–2022, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics.
-
[6]
Zihao Zhao, Eric Wallace, Shi Feng, Dan Klein, and Sameer Singh. Calibrate Before Use: Improving Few-shot Performance of Language Models. In Proceedings of the 38th International Conference on Machine Learning, pages 12697–12706. PMLR, July 2021. URL: https://proceedings.mlr.press/v139/zhao21c.html
-
[7]
Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Tianyu Liu, Baobao Chang, Xu Sun, Lei Li, and Zhifang Sui. A Survey on In-context Learning, October 2024. arXiv:2301.00234 [cs]. URL: http://arxiv.org/abs/2301.00234, doi:10.48550/arXiv.2301.00234
-
[8]
Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, et al. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models, 2023. doi:10.48550/arXiv.2206.04615
-
[9]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention Is All You Need, August 2023. arXiv:1706.03762 [cs]. URL: http://arxiv.org/abs/1706.03762, doi:10.48550/arXiv.1706.03762
-
[10]
Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT '21, pages 610–623, New York, NY, USA, March 2021. Association for Computing Machinery. URL: https://dl.acm.org/doi/10.1145/3442188.3445922, doi:10.1145/3442188.3445922
-
[12]
Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed H. Chi, Nathanael Schärli, and Denny Zhou. Large Language Models Can Be Easily Distracted by Irrelevant Context. In Proceedings of the 40th International Conference on Machine Learning, pages 31210–31227. PMLR, July 2023. URL: https://proceedings.mlr.press/v202/shi23a.html
-
[13]
R. Andrew Kreek and Emilia Apostolova. Training and Prediction Data Discrepancies: Challenges of Text Classification with Noisy, Historical Data. In Wei Xu, Alan Ritter, Tim Baldwin, and Afshin Rahimi, editors, Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text, pages 104–109, Brussels, Belgium, November 2018. Association for Computational Linguistics.
-
[14]
Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Comput. Surv., 55(9):195:1–195:35, January 2023. URL: https://dl.acm.org/doi...
-
[15]
Nitin Mayande, Sharookh Daruwalla, Sumedh Khodke, Nitin Joglekar, and Charles Weber. Syntactic and Semantic Attention Summary (SSAS): An Approach to Improve LLM Summary Generation, October 2024.
-
[16]
Nitin Mayande, Sharookh Daruwalla, Shreeya Verma Kathuria, Nitin Joglekar, and Charles Weber. Leveraging Weighted Syntactic and Semantic Attention Summary (wSSAS) Towards Text Categorization Using LLMs, October 2025.
-
[17]
Naftali Tishby and Noga Zaslavsky. Deep Learning and the Information Bottleneck Principle, March 2015. arXiv:1503.02406 [cs]. URL: http://arxiv.org/abs/1503.02406, doi:10.48550/arXiv.1503.02406
-
[18]
Thomas H. Davenport and Nitin Mittal. All-in On AI: How Smart Companies Win Big with Artificial Intelligence. January 2023.
-
[19]
Jiawei Ma, Yulei Niu, Jincheng Xu, Shiyuan Huang, Guangxing Han, and Shih-Fu Chang. DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection, March 2023. arXiv:2303.09674 [cs]. URL: http://arxiv.org/abs/2303.09674, doi:10.48550/arXiv.2303.09674
-
[20]
Sewon Min, Mike Lewis, Luke Zettlemoyer, and Hannaneh Hajishirzi. MetaICL: Learning to Learn In Context, May 2022. arXiv:2110.15943 [cs]. URL: http://arxiv.org/abs/2110.15943, doi:10.48550/arXiv.2110.15943
-
[21]
Zhaoyang Niu, Guoqiang Zhong, and Hui Yu. A review on the attention mechanism of deep learning. Neurocomputing, 452:48–62, September 2021. URL: https://www.sciencedirect.com/science/article/pii/S092523122100477X, doi:10.1016/j.neucom.2021.03.091
-
[22]
Tejpalsingh Siledar, Swaroop Nath, Sankara Sri Raghava Ravindra Muddu, Rupasai Rangaraju, Swaprava Nath, Pushpak Bhattacharyya, Suman Banerjee, Amey Patil, Sudhanshu Shekhar Singh, Muthusamy Chelliah, and Nikesh Garera. One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation, June 2024. arXiv:2402.11683 [cs]. URL: http://arxiv.org/abs/2402.11683, doi:10.48550/arXiv.2402.11683
-
[23]
Qingyue Wang, Yanhe Fu, Yanan Cao, Shuai Wang, Zhiliang Tian, and Liang Ding. Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models, August 2025. arXiv:2308.15022 [cs]. URL: http://arxiv.org/abs/2308.15022, doi:10.48550/arXiv.2308.15022
-
[24]
SangHun Im, GiBaeg Kim, Heung-Seon Oh, Seongung Jo, and Dong Hwan Kim. Hierarchical Text Classification as Sub-hierarchy Sequence Generation. Proceedings of the AAAI Conference on Artificial Intelligence, 37(11):12933–12941, June 2023. URL: https://ojs.aaai.org/index.php/AAAI/article/view/26520, doi:10.1609/aaai.v37i11.26520
-
[25]
Janara Christensen, Stephen Soderland, Gagan Bansal, and Mausam. Hierarchical Summarization: Scaling Up Multi-Document Summarization. In Kristina Toutanova and Hua Wu, editors, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 902–912, Baltimore, Maryland, June 2014. Association for Computational Linguistics.
-
[26]
Weicheng Ma and Torsten Suel. Structural Sentence Similarity Estimation for Short Texts. FLAIRS, 2016. URL: https://aaai.org/papers/232-flairs-2016-12940/
-
[27]
Daniel Jurafsky and James H. Martin. Speech and Language Processing. Pearson Education, December 2014.
-
[28]
Chengyu Nan. Semantic Map and HBV in English, Chinese and Korean—A Case Study of hand, Shou and Son. Journal of Language Teaching and Research, 7(6):1216, November 2016. URL: http://www.academypublication.com/issues2/jltr/vol07/06/21.pdf, doi:10.17507/jltr.0706.21
-
[29]
Arya Roy. Recent Trends in Named Entity Recognition (NER), January 2021. arXiv:2101.11420 [cs]. URL: http://arxiv.org/abs/2101.11420, doi:10.48550/arXiv.2101.11420
-
[30]
Hamed Jelodar, Yongli Wang, Chi Yuan, Xia Feng, Xiahui Jiang, Yanchao Li, and Liang Zhao. Latent Dirichlet allocation (LDA) and topic modeling: models, applications, a survey. Multimedia Tools Appl., 78(11):15169–15211, June 2019. doi:10.1007/s11042-018-6894-4
-
[31]
Guangxu Xun, Vishrawas Gopalakrishnan, Fenglong Ma, Yaliang Li, Jing Gao, and Aidong Zhang. Topic Discovery for Short Texts Using Word Embeddings. pages 1299–1304, December 2016. doi:10.1109/ICDM.2016.0176
-
[32]
Robert Desimone and John Duncan. Neural Mechanisms of Selective Visual Attention. Annual Review of Neuroscience, 18:193–222, March 1995. URL: https://www.annualreviews.org/content/journals/10.1146/annurev.ne.18.030195.001205, doi:10.1146/annurev.ne.18.030195.001205
-
[33]
Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed, Michael S. Ryoo, and Tsung-Yu Lin. Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs, April 2024. arXiv:2404.07449 [cs]. URL: http://arxiv.org/abs/2404.07449, doi:10.48550/arXiv.2404.07449
-
[34]
Mike Lewis and Angela Fan. Generative Question Answering: Learning to Answer the Whole Question. URL: https://openreview.net/forum?id=Bkx0RjA9tX
-
[36]
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space, September 2013. arXiv:1301.3781 [cs]. URL: http://arxiv.org/abs/1301.3781, doi:10.48550/arXiv.1301.3781
-
[37]
Barbara H. Partee. Lexical Semantics and Compositionality. 1995. URL: https://direct.mit.edu/books/edited-volume/4671/chapter/214107/Lexical-Semantics-and-Compositionality, doi:10.7551/mitpress/3964.001.0001
-
[38]
Yucheng Li, Bo Dong, Chenghua Lin, and Frank Guerin. Compressing Context to Enhance Inference Efficiency of Large Language Models, October 2023. arXiv:2310.06201 [cs]. URL: http://arxiv.org/abs/2310.06201, doi:10.48550/arXiv.2310.06201
-
[39]
Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, et al. Gemini: A Family of Highly Capable Multimodal Models, 2023. doi:10.48550/arXiv.2312.11805
-
[40]
Chin-Yew Lin. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain, July 2004. Association for Computational Linguistics. URL: https://aclanthology.org/W04-1013/
-
[41]
Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, and Chenguang Zhu. G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 2511–2522, Singapore, December 2023. Association for Computational Linguistics.
-
[42]
Potsawee Manakul, Adian Liusie, and Mark Gales. SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 9004–9017, Singapore, December 2023. Association for Computational Linguistics.
-
[43]
Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan, editors, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, ...
-
[44]
Peter J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53–65, November 1987. URL: https://www.sciencedirect.com/science/article/pii/0377042787901257, doi:10.1016/0377-0427(87)90125-7
-
[45]
David L. Davies and Donald W. Bouldin. A Cluster Separation Measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2):224–227, April 1979. URL: https://ieeexplore.ieee.org/document/4766909, doi:10.1109/TPAMI.1979.4766909
-
[46]
T. Caliński and J. Harabasz. A dendrite method for cluster analysis. Communications in Statistics, 3(1):1–27, January 1974. doi:10.1080/03610927408827101
-
[47]
Jianmo Ni, Jiacheng Li, and Julian McAuley. Justifying Recommendations using Distantly-Labeled Reviews and Fine-Grained Aspects. In Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan, editors, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), ...
-
[48]
Davies-Bouldin Index: measures the average similarity between clusters. Lower scores indicate better separation between thematic groups.
-
[49]
Calinski-Harabasz (CH) Index: evaluates the ratio of between-cluster dispersion to within-cluster dispersion (the Variance Ratio Criterion).
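The Calinski-Harabasz criterion has a closed form that can be computed directly. A small pure-Python sketch over toy 2-D points (illustrative only; the paper presumably evaluates it over text-embedding vectors):

```python
def calinski_harabasz(points, labels):
    """Variance Ratio Criterion: between-cluster over within-cluster dispersion,
    each normalized by its degrees of freedom. Higher = better-separated clusters."""
    def mean(pts):
        return [sum(coord) / len(pts) for coord in zip(*pts)]

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = mean(points)
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    between = sum(len(pts) * sqdist(mean(pts), overall) for pts in clusters.values())
    within = sum(sqdist(p, mean(pts)) for pts in clusters.values() for p in pts)
    k, n = len(clusters), len(points)
    return (between / (k - 1)) / (within / (n - k))

tight = [[0, 0], [0, 1], [10, 10], [10, 11]]   # two compact, far-apart clusters
loose = [[0, 0], [4, 5], [10, 10], [6, 4]]     # diffuse, overlapping clusters
labels = [0, 0, 1, 1]
print(calinski_harabasz(tight, labels) > calinski_harabasz(loose, labels))  # True
```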
discussion (0)