Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.
17 Pith papers cite this work.
Citing papers explorer
-
Evaluating Very Long-Term Conversational Memory of LLM Agents
Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.
-
Human-Grounded Multimodal Benchmark with 900K-Scale Aggregated Student Response Distributions from Japan's National Assessment of Academic Ability
A new benchmark dataset drawn from Japan's National Assessment of Academic Ability supplies real exam layouts, diagrams, Japanese text, and nationwide student response distributions for evaluating multimodal LLMs.
-
The Silent Vote: Improving Zero-Shot LLM Reliability by Aggregating Semantic Neighborhoods
Semantic Softmax aggregates probabilities from semantic synonyms around target labels to correct renormalization bias in zero-shot LLM classification, lowering calibration error and raising AUROC and F1.
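The aggregation idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `semantic_softmax`, the synonym sets, and the example probabilities are all hypothetical — it simply pools the model's token probabilities over each label's semantic neighborhood and renormalizes.

```python
def semantic_softmax(token_probs, synonym_sets):
    """Pool token probabilities over each label's synonym set, then renormalize.

    token_probs: dict token -> probability from the LLM's next-token distribution.
    synonym_sets: dict label -> list of synonym tokens (hypothetical neighborhoods).
    """
    raw = {
        label: sum(token_probs.get(tok, 0.0) for tok in toks)
        for label, toks in synonym_sets.items()
    }
    total = sum(raw.values())
    return {label: p / total for label, p in raw.items()}

# Toy distribution: mass on synonyms that naive label-token softmax would drop.
probs = {"positive": 0.30, "good": 0.20, "negative": 0.25, "bad": 0.05, "the": 0.20}
labels = {"positive": ["positive", "good"], "negative": ["negative", "bad"]}
scores = semantic_softmax(probs, labels)
```

Without pooling, renormalizing over only the exact tokens "positive" and "negative" would discard the mass on "good" and "bad" and skew the label distribution.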
-
CA-SQL: Complexity-Aware Inference Time Reasoning for Text-to-SQL via Exploration and Compute Budget Allocation
CA-SQL achieves 51.72% execution accuracy on the challenging tier of the BIRD benchmark using GPT-4o-mini by scaling exploration breadth according to estimated task difficulty, evolutionary prompt seeding, and candidate voting.
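The budget-allocation and voting steps can be sketched as follows; the function names, the linear budget schedule, and the cap of 32 candidates are illustrative assumptions, not values from the paper. The idea is to sample more candidate queries for harder questions and pick the answer that the most candidates agree on after execution.

```python
from collections import Counter

def allocate_budget(difficulty, base=4, max_candidates=32):
    """Scale exploration breadth with estimated difficulty in [0, 1] (hypothetical schedule)."""
    return min(max_candidates, base * (1 + int(difficulty * 7)))

def vote(candidate_results):
    """Pick the execution result produced by the most candidate SQL queries.

    candidate_results: list of hashable execution results, one per candidate.
    """
    return Counter(candidate_results).most_common(1)[0][0]

# Easy questions get a small budget, hard ones the full breadth.
n_easy, n_hard = allocate_budget(0.0), allocate_budget(1.0)
winner = vote([("Alice",), ("Bob",), ("Alice",)])
```

Voting on execution results rather than SQL strings lets syntactically different but semantically equivalent queries pool their votes.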
-
Accurate and Efficient Statistical Testing for Word Semantic Breadth
A new permutation test uses a Householder reflection to align word-embedding clouds before testing dispersion differences, cutting Type-I error by 32.5% and running 23x faster on GPU.
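A minimal sketch of the two ingredients, under stated assumptions: the Householder matrix H = I − 2ww^T with w ∝ (u − v) maps unit vector u to unit vector v, so reflecting one cloud aligns its mean direction with the other's before comparing dispersions under permutation. The dispersion statistic (mean distance to centroid) and sample sizes here are illustrative choices, not the paper's exact test.

```python
import numpy as np

rng = np.random.default_rng(0)

def householder_align(X, Y):
    """Reflect cloud X so its mean direction matches cloud Y's mean direction."""
    u = X.mean(axis=0); u = u / np.linalg.norm(u)
    v = Y.mean(axis=0); v = v / np.linalg.norm(v)
    w = u - v
    n = np.linalg.norm(w)
    if n < 1e-12:          # already aligned
        return X
    w = w / n
    H = np.eye(X.shape[1]) - 2.0 * np.outer(w, w)  # Householder reflection, H @ u == v
    return X @ H.T

def dispersion(X):
    """Mean distance to the centroid — one simple notion of semantic breadth."""
    return np.mean(np.linalg.norm(X - X.mean(axis=0), axis=1))

def permutation_test(X, Y, n_perm=999):
    """Permutation p-value for a difference in dispersion after alignment."""
    Xa = householder_align(X, Y)
    obs = abs(dispersion(Xa) - dispersion(Y))
    pooled = np.vstack([Xa, Y])
    n = len(Xa)
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        d = abs(dispersion(pooled[idx[:n]]) - dispersion(pooled[idx[n:]]))
        hits += d >= obs
    return (hits + 1) / (n_perm + 1)

X = rng.normal(size=(40, 8))
Y = 1.5 * rng.normal(size=(40, 8))   # more dispersed cloud
p = permutation_test(X, Y)
```

Because reflection is an isometry, aligning the clouds changes no within-cloud distances — it only removes a directional offset that would otherwise inflate false positives.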
-
Longformer: The Long-Document Transformer
Longformer uses local windowed attention plus task-specific global attention to achieve linear scaling and state-of-the-art results on long-document language modeling, QA, and summarization after pretraining.
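The attention pattern is easy to visualize as a boolean mask: a diagonal band for the sliding window, plus full rows and columns for the designated global tokens. This is a sketch of the pattern only (window size and global indices are arbitrary here), not Longformer's chunked CUDA implementation.

```python
import numpy as np

def longformer_mask(seq_len, window, global_idx=()):
    """Boolean attention mask: local sliding window plus symmetric global attention."""
    i = np.arange(seq_len)
    mask = np.abs(i[:, None] - i[None, :]) <= window // 2  # local diagonal band
    for g in global_idx:
        mask[g, :] = True   # global token attends to every position...
        mask[:, g] = True   # ...and every position attends to it
    return mask

m = longformer_mask(8, window=2, global_idx=(0,))
```

The number of allowed pairs grows as O(n·w) with sequence length n and window w, versus O(n²) for full attention — which is where the linear scaling comes from.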
-
Linking Extreme Discourse to Structural Polarization in Signed Interaction Networks
A pipeline derives continuous signed edges from LLM stance scores on text and links discourse signals such as toxicity and extreme claims to changes in structural polarization measured by spectral and frustration scores on Reddit Brexit data.
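One of the polarization measures named above, the frustration count, has a compact brute-force definition: the minimum number of edges inconsistent with any two-camp split, where a positive edge is frustrated across camps and a negative edge within one. The sketch below is a generic signed-graph illustration (tiny graphs only, exponential in node count), not the paper's scalable estimator.

```python
from itertools import product

def frustration(nodes, signed_edges):
    """Minimum number of frustrated edges over all 2-partitions (brute force).

    signed_edges: list of (u, v, sign) with sign in {+1, -1}.
    """
    best = len(signed_edges)
    for colors in product([0, 1], repeat=len(nodes)):
        side = dict(zip(nodes, colors))
        bad = sum(
            1 for u, v, s in signed_edges
            if (s > 0) == (side[u] != side[v])   # +edge across camps, or -edge inside one
        )
        best = min(best, bad)
    return best

# A triangle with one positive and two negative edges splits cleanly into two camps;
# an all-negative triangle cannot, so one edge stays frustrated.
balanced = frustration("abc", [("a", "b", 1), ("b", "c", -1), ("a", "c", -1)])
unbalanced = frustration("abc", [("a", "b", -1), ("b", "c", -1), ("a", "c", -1)])
```

Zero frustration means the signed network is perfectly two-polarized; higher values indicate cross-cutting ties that resist a clean split.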
-
Output Composability of QLoRA PEFT Modules for Plug-and-Play Attribute-Controlled Text Generation
Summing outputs from separately trained QLoRA PEFT modules provides strong performance for attribute-controlled text generation, often matching or exceeding single-task modules even on single-attribute tests.
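One plausible reading of output composition can be sketched with plain linear layers: each module computes its output through the frozen base weight plus its own low-rank update, and composition sums those outputs rather than merging adapter weights. Dimensions, ranks, and function names below are illustrative assumptions, and real QLoRA modules sit inside quantized transformer layers rather than a single matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, rank = 16, 16, 4
W = rng.normal(scale=0.1, size=(d_out, d_in))   # shared frozen base weight

def lora_delta():
    """Low-rank update A @ B, standing in for one separately trained module."""
    A = rng.normal(scale=0.1, size=(d_out, rank))
    B = rng.normal(scale=0.1, size=(rank, d_in))
    return A @ B

delta_style, delta_topic = lora_delta(), lora_delta()

def module_output(delta, x):
    """One adapted layer: frozen base plus its module's low-rank update."""
    return (W + delta) @ x

def composed_output(x):
    """Plug-and-play composition: sum the per-module outputs."""
    return module_output(delta_style, x) + module_output(delta_topic, x)

x = rng.normal(size=d_in)
y = composed_output(x)
```

For a linear layer the summed outputs equal the output of the summed weights, so no module needs retraining when attributes are combined.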
-
Improving Lexical Difficulty Prediction with Context-Aligned Contrastive Learning and Ridge Ensembling
Context-Aligned Contrastive Regression combines cross-view context alignment and ordinal soft contrastive learning with ridge ensembles to improve lexical difficulty prediction across L1 backgrounds on three datasets.
-
Response-G1: Explicit Scene Graph Modeling for Proactive Streaming Video Understanding
Response-G1 uses query-guided scene graphs, memory retrieval, and augmented prompting to improve Video-LLMs' decisions about when to respond during streaming video.
-
Grounded Satirical Generation with RAG
RAG and topic-based word selection increase perceived political relevance in generated satirical definitions but produce no clear improvement in humor according to human raters.
-
SciVQR: A Multidisciplinary Multimodal Benchmark for Advanced Scientific Reasoning Evaluation
SciVQR is a new multimodal benchmark covering 54 scientific subfields that evaluates MLLMs on visual comprehension and multi-step reasoning, revealing significant limitations in leading models.
-
Text-Guided Multi-Scale Frequency Representation Adaptation
FreqAdapter adapts multimodal models by text-guided multi-scale fine-tuning in the frequency domain, claiming better performance and efficiency than signal-space PEFT methods.
-
Towards General Text Embeddings with Multi-stage Contrastive Learning
GTE_base is a compact text embedding model trained with multi-stage contrastive learning on diverse data; it outperforms OpenAI's embedding API and models 10x its size on the Massive Text Embedding Benchmark (MTEB) and also handles code search by treating code as text.
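The contrastive objective underlying such training can be sketched as an in-batch InfoNCE loss: each query's paired document is its positive, every other document in the batch is a negative. The temperature value and function name are illustrative assumptions; GTE's actual recipe spans multiple stages and data mixtures.

```python
import numpy as np

def info_nce(queries, docs, temperature=0.05):
    """In-batch contrastive loss: docs[i] is the positive for queries[i]."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    sims = (q @ d.T) / temperature               # scaled cosine similarities
    sims -= sims.max(axis=1, keepdims=True)      # numerical stability
    logp = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(logp))               # push diagonal (positives) up

# Perfectly matched orthogonal pairs give a near-zero loss.
loss = info_nce(np.eye(4), np.eye(4))
```

The low temperature sharpens the softmax so that near-misses among in-batch negatives are penalized heavily, which is what drives the embedding space apart.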
-
From Traditional Taggers to LLMs: A Comparative Study of POS Tagging for Medieval Romance Languages
LLM-based POS tagging outperforms traditional taggers on medieval Occitan, Catalan, and French, with fine-tuning and cross-lingual transfer providing the largest gains for under-resourced varieties.
-
Qwen Goes Brrr: Off-the-Shelf RAG for Ukrainian Multi-Domain Document Understanding
A RAG pipeline built on Qwen3 models, with contextual PDF chunking, question-and-answer-aware retrieval, and reranking, reaches 0.96 accuracy on a Ukrainian multi-domain document QA shared task.
-
PSK@EEUCA 2026: Fine-Tuning Large Language Models with Synthetic Data Augmentation for Multi-Class Toxicity Detection in Gaming Chat
Llama 3.1 8B fine-tuned with calibrated 5% synthetic data augmentation reaches 0.6234 F1-macro on multi-class toxicity detection in gaming chat and places fourth among 35 teams.