pith. machine review for the scientific record.

arxiv: 1301.3781 · v3 · submitted 2013-01-16 · 💻 cs.CL

Recognition: 2 theorem links

Efficient Estimation of Word Representations in Vector Space

Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean


Pith reviewed 2026-05-11 02:16 UTC · model grok-4.3

classification 💻 cs.CL
keywords: word vectors · vector space models · neural networks · skip-gram · continuous bag-of-words · word similarity · syntactic analogies · semantic relationships

The pith

Two new neural network architectures learn continuous vector representations of words from massive text data with higher accuracy and far lower training cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces two model architectures for learning word vectors from very large datasets. These architectures are evaluated on word similarity tasks and compared against earlier neural network techniques. They achieve large gains in accuracy while training high-quality vectors on a 1.6 billion word corpus in less than a day. This matters for applications that rely on representations capturing syntactic and semantic word relationships.

Core claim

We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

What carries the argument

The continuous bag-of-words and skip-gram architectures: shallow neural networks that derive dense vector representations by predicting a target word from its surrounding context (CBOW) or the surrounding words from the target (skip-gram).
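
As a concreteness aid, here is a minimal runnable sketch of one skip-gram gradient step, assuming a full softmax over a toy vocabulary; the paper itself uses a hierarchical softmax to make this update cheap, and every name below is illustrative rather than the authors' code:

```python
import numpy as np

# Minimal skip-gram SGD step with a full softmax over a toy vocabulary.
# The paper replaces the O(V) output cost with a hierarchical softmax
# (roughly O(log V)); the objective being optimized is the same.

rng = np.random.default_rng(0)
V, D = 10, 4                                  # vocabulary size, vector dims
W_in = rng.normal(scale=0.1, size=(V, D))     # input (target-word) vectors
W_out = rng.normal(scale=0.1, size=(V, D))    # output (context-word) vectors

def skipgram_step(W_in, W_out, target, context, lr=0.05):
    """One gradient-ascent step on log p(context | target); updates in place."""
    v = W_in[target]                          # target vector, shape (D,)
    scores = W_out @ v                        # scores for every word, shape (V,)
    p = np.exp(scores - scores.max())
    p /= p.sum()                              # softmax over the vocabulary
    grad = -p                                 # d log p(context) / d scores ...
    grad[context] += 1.0                      # ... equals one-hot(context) - p
    W_in[target] += lr * (W_out.T @ grad)     # update the target vector
    W_out += lr * np.outer(grad, v)           # update all output vectors

# e.g. one (target, context) pair drawn from a sliding window over text
skipgram_step(W_in, W_out, target=3, context=7)
```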

Load-bearing premise

That performance on the chosen word similarity and analogy test sets reliably indicates that the vectors capture general syntactic and semantic relationships rather than dataset-specific patterns.
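
To make that premise concrete, a hedged sketch of the vector-offset analogy protocol such test sets rely on: for "a is to b as c is to ?", every vocabulary vector is scored by cosine similarity against vec(b) - vec(a) + vec(c), and the best-scoring non-query word is returned. Function and variable names here are illustrative:

```python
import numpy as np

def analogy(vocab, vectors, a, b, c):
    """Return the vocabulary word closest (cosine) to vec(b) - vec(a) + vec(c)."""
    ids = {w: i for i, w in enumerate(vocab)}
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    query = unit[ids[b]] - unit[ids[a]] + unit[ids[c]]
    sims = unit @ (query / np.linalg.norm(query))
    for w in (a, b, c):
        sims[ids[w]] = -np.inf                # never return a query word
    return vocab[int(np.argmax(sims))]

# Toy vectors arranged so the offset works; accuracy on a test set is the
# fraction of questions where the held-out fourth word is returned exactly.
vocab = ["man", "king", "woman", "queen"]
vectors = np.array([[1.0, 0.0], [2.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
print(analogy(vocab, vectors, "man", "king", "woman"))  # -> "queen"
```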

What would settle it

Training the models on the 1.6 billion word dataset and finding no accuracy gain on the syntactic and semantic test sets relative to prior neural network methods, or requiring substantially more computation time to reach comparable performance.


Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript proposes two novel neural network architectures (Continuous Bag-of-Words and Skip-gram) for learning continuous vector representations of words from very large corpora. It evaluates these representations on word similarity tasks against prior neural methods and introduces an analogy task for syntactic and semantic relations, claiming substantially higher accuracy at far lower computational cost, including training high-quality vectors on a 1.6 billion word dataset in less than a day.

Significance. If the reported accuracy gains and training-time reductions hold under scrutiny, the work is significant for establishing practical, scalable methods to produce high-quality word embeddings. The efficiency stems from the architectural simplifications and use of hierarchical softmax, enabling training on billion-word scales that were previously prohibitive. This has provided a foundation for subsequent embedding techniques and downstream NLP improvements.
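
For reference, the efficiency argument can be made explicit in the paper's own per-example complexity accounting, paraphrased here as a sketch (symbols: E epochs, T training words, N context words, D vector dimensionality, H hidden units, C maximum context distance, V vocabulary size):

```latex
% Training cost O = E \times T \times Q, with Q the per-example cost.
% The hierarchical softmax reduces the output term from a factor of V
% to \log_2 V; dropping the N \times D \times H hidden-layer term is
% what makes the new architectures cheap relative to the NNLM.
\begin{align*}
  Q_{\mathrm{NNLM}}       &= N \times D + N \times D \times H + H \times \log_2 V \\
  Q_{\mathrm{CBOW}}       &= N \times D + D \times \log_2 V \\
  Q_{\mathrm{Skip\text{-}gram}} &= C \times (D + D \times \log_2 V)
\end{align*}
```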

major comments (2)
  1. [§4] §4 (Experimental results): The central efficiency claim rests on the reported training time (<1 day on 1.6B words) and accuracy improvements versus prior neural baselines, but the section provides insufficient detail on exact baseline re-implementations, hyperparameter search procedures, and whether the same hardware/resources were used for all methods. This makes it difficult to confirm the comparisons are free of post-hoc tuning.
  2. [§4.2] §4.2 (Evaluation on word analogy task): The state-of-the-art claim is made on a test set introduced by the authors themselves. While the task is a useful contribution, the manuscript does not include results on independent downstream tasks (e.g., named entity recognition or machine translation) or cross-corpus validation to support the broader interpretation that the vectors capture general syntactic and semantic relationships.
minor comments (3)
  1. [§2] §2 (Model architectures): The notation for the input/output layers and context window could be clarified with an explicit equation for the CBOW averaging operation to avoid ambiguity in implementation; one standard formulation is sketched just after this list.
  2. [Table 1, Figure 2] Table 1 and Figure 2: The reported accuracy numbers and training times would benefit from error bars or multiple runs to indicate variability, especially given the stochastic nature of the training.
  3. [References] References: The comparison to prior work (e.g., neural language models by Bengio et al.) could include a more explicit discussion of why the proposed models avoid the computational bottlenecks of those approaches.
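
One standard formulation of the CBOW averaging and output step, offered as a reconstruction rather than a quotation from the manuscript:

```latex
% Average the input vectors of the 2C surrounding words, then score the
% centre word w_t with a softmax over the vocabulary V.
\[
  h_t = \frac{1}{2C} \sum_{\substack{-C \le j \le C \\ j \neq 0}} v_{w_{t+j}},
  \qquad
  p(w_t \mid w_{t-C}, \dots, w_{t+C})
    = \frac{\exp\left(u_{w_t}^{\top} h_t\right)}
           {\sum_{w \in V} \exp\left(u_{w}^{\top} h_t\right)}
\]
```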

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive review and constructive comments. We address each major comment below and will make the indicated revisions to improve clarity and transparency.

read point-by-point responses
  1. Referee: [§4] §4 (Experimental results): The central efficiency claim rests on the reported training time (<1 day on 1.6B words) and accuracy improvements versus prior neural baselines, but the section provides insufficient detail on exact baseline re-implementations, hyperparameter search procedures, and whether the same hardware/resources were used for all methods. This makes it difficult to confirm the comparisons are free of post-hoc tuning.

    Authors: We agree that greater detail on the experimental setup would strengthen the comparisons. In the revised manuscript we will expand §4 with additional information on the re-implementations of the prior neural baselines, the hyperparameter ranges explored for each method, and explicit confirmation that all timing and accuracy measurements were performed under comparable hardware and resource constraints. revision: yes

  2. Referee: [§4.2] §4.2 (Evaluation on word analogy task): The state-of-the-art claim is made on a test set introduced by the authors themselves. While the task is a useful contribution, the manuscript does not include results on independent downstream tasks (e.g., named entity recognition or machine translation) or cross-corpus validation to support the broader interpretation that the vectors capture general syntactic and semantic relationships.

    Authors: The analogy task was introduced in this work precisely to probe syntactic and semantic relations in a controlled manner. While we recognize that evaluations on downstream tasks would provide further support, the scope of the paper centers on efficient learning of high-quality vectors and direct assessment via the new task. We will add a short discussion in the revised version acknowledging this limitation and outlining how the vectors could be applied to downstream problems. revision: partial

Circularity Check

0 steps flagged

No circularity; empirical claims rest on independent external benchmarks

full rationale

The paper defines the CBOW and Skip-gram models as explicit prediction objectives trained on raw text corpora, then measures vector quality on held-out evaluation sets: a newly constructed semantic-syntactic analogy test set, an existing syntactic relation set, and the SemEval-2012 relational similarity task. These evaluation sets are not constructed from the fitted parameters or training objective, nor do any central claims reduce to self-citation or renaming of inputs. The reported accuracy gains and computational savings are direct empirical outcomes against external references, satisfying the self-contained benchmark criterion.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard neural-network training assumptions plus the empirical claim that similarity-task performance measures semantic quality. No new physical entities or ad-hoc constants beyond ordinary hyperparameters are introduced.

free parameters (2)
  • vector dimensionality
    Hyperparameter chosen by the authors; value not stated in abstract.
  • context window size
    Hyperparameter controlling how many surrounding words are used.
axioms (2)
  • domain assumption Back-propagation through a single hidden layer produces useful word vectors when trained on next-word or context prediction.
    Invoked implicitly by proposing the architectures.
  • domain assumption Word similarity and analogy test sets are valid proxies for syntactic and semantic understanding.
    Used to claim state-of-the-art performance.
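
As a usage note, the two ledger parameters map directly onto hyperparameters of the widely used gensim reimplementation; this is an assumption about tooling (the original release was a standalone C program), and the toy corpus below is purely illustrative:

```python
from gensim.models import Word2Vec

# vector_size = dimensionality, window = context size from the ledger;
# sg=1 selects skip-gram, sg=0 selects CBOW. min_count=1 keeps every
# word of this tiny illustrative corpus in the vocabulary.
sentences = [["the", "quick", "brown", "fox"], ["jumps", "over", "the", "dog"]]
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
print(model.wv.most_similar("fox", topn=2))
```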

pith-pipeline@v0.9.0 · 5383 in / 1332 out tokens · 30340 ms · 2026-05-11T02:16:02.313356+00:00 · methodology

discussion (0)


Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

    cs.CL 2026-05 unverdicted novelty 8.0

    REALISTA optimizes continuous combinations of valid editing directions in latent space to produce realistic adversarial prompts that elicit hallucinations more effectively than prior methods, including on large reason...

  2. Language Models are Few-Shot Learners

    cs.CL 2020-05 accept novelty 8.0

    GPT-3 shows that scaling an autoregressive language model to 175 billion parameters enables strong few-shot performance across diverse NLP tasks via in-context prompting without fine-tuning.

  3. Intriguing properties of neural networks

    cs.CV 2013-12 accept novelty 8.0

    Deep neural networks exhibit distributed high-level semantic representations and discontinuous input-output mappings vulnerable to transferable adversarial perturbations.

  4. Differentially Private Sampling from Distributions via Wasserstein Projection

    stat.ML 2026-05 unverdicted novelty 7.0

    Proposes Wasserstein Projection Mechanism for differentially private sampling that optimizes Wasserstein distance utility and provides convergence guarantees for approximate computation.

  5. OZ-TAL: Online Zero-Shot Temporal Action Localization

    cs.CV 2026-05 unverdicted novelty 7.0

    Defines OZ-TAL task and presents a training-free VLM-based method that outperforms prior approaches for online and offline zero-shot temporal action localization on THUMOS14 and ActivityNet-1.3.

  6. An Experimental Method to Study Opinion Diffusion in Human-AI Hybrid Societies

    cs.SI 2026-05 unverdicted novelty 7.0

    Hybrid human-AI networks in 5x5 grids reached lower final polarization than human-only networks after eight rounds of opinion revision on polarizing topics.

  7. EditRefiner: A Human-Aligned Agentic Framework for Image Editing Refinement

    cs.CV 2026-05 unverdicted novelty 7.0

    EditRefiner uses a perception-reasoning-action-evaluation agent loop and the EditFHF-15K human feedback dataset to refine text-guided image edits more accurately than prior methods.

  8. Expressiveness Limits of Autoregressive Semantic ID Generation in Generative Recommendation

    cs.IR 2026-05 unverdicted novelty 7.0

    Autoregressive semantic ID generation creates tree-induced probability correlations that prevent generative recommenders from capturing simple patterns; Latte adds latent tokens to relax these correlations.

  9. Rational Communication Shapes Morphological Composition

    cs.CL 2026-05 unverdicted novelty 7.0

    Using historical corpora and the Rational Speech Act framework, attested English morphological compositions are ranked higher than plausible alternatives from the same time period when both semantic recoverability and...

  10. Embedding-based In-Context Prompt Training for Enhancing LLMs as Text Encoders

    cs.CL 2026-05 unverdicted novelty 7.0

    EPIC trains LLMs to treat continuous embeddings as in-context prompts, yielding state-of-the-art text embedding performance on MTEB with or without prompts at inference and lower compute.

  11. Identifying and Characterizing Semantic Clones of Solidity Functions

    cs.SE 2026-04 unverdicted novelty 7.0

    A code-and-comment analysis method detects semantic clones in Solidity functions with 59% overall precision (84% for same-name functions) and 97% recall on 300k contracts, plus LLM summaries for uncommented code.

  12. Self-Improving Tabular Language Models via Iterative Group Alignment

    cs.LG 2026-04 unverdicted novelty 7.0

    TabGRAA enables self-improving tabular language models through iterative group-relative advantage alignment using modular automated quality signals like distinguishability classifiers.

  13. Beyond Nodes vs. Edges: A Multi-View Fusion Framework for Provenance-Based Intrusion Detection

    cs.CR 2026-04 unverdicted novelty 7.0

    PROVFUSION fuses three complementary views of provenance data with lightweight schemes and voting to achieve higher detection accuracy and lower false positives than node- or edge-only baselines on nine benchmarks.

  14. A Simple Framework for Contrastive Learning of Visual Representations

    cs.LG 2020-02 accept novelty 7.0

    SimCLR learns visual representations by contrasting augmented views of the same image and reaches 76.5% ImageNet top-1 accuracy with a linear classifier, matching a supervised ResNet-50.

  15. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

    cs.CL 2019-10 accept novelty 7.0

    BART introduces a denoising pretraining method for seq2seq models that matches RoBERTa on GLUE and SQuAD while setting new state-of-the-art results on abstractive summarization, dialogue, and QA with up to 6 ROUGE gains.

  16. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

    cs.LG 2019-10 unverdicted novelty 7.0

    T5 casts all NLP tasks as text-to-text generation, systematically explores pre-training choices, and reaches strong performance on summarization, QA, classification and other tasks via large-scale training on the Colo...

  17. Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space

    cs.CL 2026-05 unverdicted novelty 6.0

    LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via inte...

  18. FeatMap: Understanding image manipulation in the feature space and its implications for feature space geometry

    cs.LG 2026-05 unverdicted novelty 6.0

    Linear mappings in feature space can reconstruct a wide range of image manipulations including semantic edits, suggesting that feature representations are approximately linearly organized.

  19. Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs

    cs.CV 2026-05 unverdicted novelty 6.0

    Exploiting linear structure in VLM embeddings, a synthetic-data pre-training method yields background-invariant representations that exceed 90% worst-group accuracy on Waterbirds even under 100% spurious correlation w...

  20. Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes

    cs.CL 2026-05 unverdicted novelty 6.0

    Fixed 16-bit binary token codes can replace trainable input embeddings in 32-layer decoder-only models while maintaining comparable held-out perplexity on 17B tokens.

  21. Semantic Smoothing for Language Models via Distribution Estimation and Embeddings

    cs.IT 2026-05 conditional novelty 6.0

    Semantic smoothing formulates next-word distribution estimation under KL loss with embedding-based KL-proximity side information, yielding an interpolation estimator with worst-case risk O(min{Δ, d/n}) that empiricall...

  22. TAS-LoRA: Transformer Architecture Search with Mixture-of-LoRA Experts

    cs.CV 2026-05 unverdicted novelty 6.0

    TAS-LoRA attaches a mixture of LoRA experts to a supernet and uses a dynamic router plus group-wise initialization to let different architecture subnets learn distinct features, yielding higher accuracy than prior TAS...

  23. Query-efficient model evaluation using cached responses

    cs.LG 2026-05 unverdicted novelty 6.0

    DKPS-based methods leverage cached model responses to achieve equivalent benchmark prediction accuracy with substantially fewer queries than standard evaluation.

  24. The Weight Gram Matrix Captures Sequential Feature Linearization in Deep Networks

    cs.LG 2026-05 unverdicted novelty 6.0

    Gradient descent in deep networks implicitly drives features toward target-linear structure as captured by the weight Gram matrix and a derived virtual covariance.

  25. When AI Meets Science: Research Diversity, Interdisciplinarity, Visibility, and Retractions across Disciplines in a Global Surge

    cs.DL 2026-05 unverdicted novelty 6.0

    AI adoption in science has shown exponential growth since 2015 across domains but stays confined to few CS-linked topics, carries citation premiums, higher retraction rates, and uneven geographic spread, leaving its t...

  26. When AI Meets Science: Research Diversity, Interdisciplinarity, Visibility, and Retractions across Disciplines in a Global Surge

    cs.DL 2026-05 unverdicted novelty 6.0

    Post-2015 AI adoption in science grew exponentially across domains but stayed limited to CS-linked topics, carried citation premiums, higher retractions, and showed rising Asian middle-income country involvement.

  27. When AI Meets Science: Research Diversity, Interdisciplinarity, Visibility, and Retractions across Disciplines in a Global Surge

    cs.DL 2026-05 unverdicted novelty 6.0

    AI use in science has grown exponentially since 2015 but stays confined to computer science and statistics topics, shows higher retraction rates and citations, and follows distinct global adoption patterns.

  28. A Unified Benchmark for Evaluating Knowledge Graph Construction Methods and Graph Neural Networks

    cs.LG 2026-05 unverdicted novelty 6.0

    A dual-purpose benchmark supplies two text-derived knowledge graphs and one expert reference graph on the same biomedical corpus to jointly measure construction method quality and GNN robustness via semi-supervised no...

  29. Provable Accuracy Collapse in Embedding-Based Representations under Dimensionality Mismatch

    cs.DS 2026-05 unverdicted novelty 6.0

    Triplet constraints realizable in D-dimensional Euclidean space cannot be preserved above 50% accuracy by any embedding of dimension at most cD for constant c<1, with UGC-hardness preventing better polynomial-time sol...

  30. Deep Kernel Learning for Stratifying Glaucoma Trajectories

    cs.LG 2026-05 unverdicted novelty 6.0

    A deep kernel learning architecture with transformer feature extraction on clinical-BERT embeddings and Gaussian process backend identifies three glaucoma subgroups by decoupling progression trajectories from current ...

  31. The TEA Nets framework combines AI and cognitive network science to model targets, events and actors in text

    cs.AI 2026-04 unverdicted novelty 6.0

    TEA Nets extracts agents, events, and targets from text to reveal emotional and semantic patterns in conspiracy theories and psychotherapy transcripts from humans and LLMs.

  32. ImproBR: Bug Report Improver Using LLMs

    cs.SE 2026-04 unverdicted novelty 6.0

    ImproBR combines a hybrid detector with GPT-4o mini and RAG to raise bug report structural completeness from 7.9% to 96.4% and executable steps from 28.8% to 67.6% on 139 Mojira reports.

  33. ADE: Adaptive Dictionary Embeddings -- Scaling Multi-Anchor Representations to Large Language Models

    cs.CL 2026-04 unverdicted novelty 6.0

    ADE scales multi-anchor word representations to transformers via Vocabulary Projection, Grouped Positional Encoding, and context-aware reweighting, achieving 98.7% fewer trainable parameters than DeBERTa-v3-base while...

  34. Self-supervised pretraining for an iterative image size agnostic vision transformer

    cs.CV 2026-04 unverdicted novelty 6.0

    A sequential-to-global SSL method based on DINO pretrains iterative foveal-inspired vision transformers to achieve competitive ImageNet-1K performance with constant compute regardless of input resolution.

  35. Context-Aware Search and Retrieval Under Token Erasure

    cs.IR 2026-04 unverdicted novelty 6.0

    Assigning higher redundancy to semantically important query features reduces retrieval error probability under token erasures, via multivariate Gaussian approximations of similarity margins and supporting numerical results.

  36. Embedding Arithmetic: A Lightweight, Tuning-Free Framework for Post-hoc Bias Mitigation in Text-to-Image Models

    cs.CV 2026-04 unverdicted novelty 6.0

    Embedding Arithmetic performs vector operations in the embedding space of T2I models to mitigate bias at inference time, outperforming baselines on diversity while preserving coherence via a new Concept Coherence Score.

  37. Beyond Fine-Tuning: In-Context Learning and Chain-of-Thought for Reasoned Distractor Generation

    cs.CL 2026-04 unverdicted novelty 6.0

    LLMs prompted with few-shot examples and rationales generate better reasoned distractors for MCQs than fine-tuned contrastive models across six benchmarks.

  38. REZE: Representation Regularization for Domain-adaptive Text Embedding Pre-finetuning

    cs.CL 2026-04 unverdicted novelty 6.0

    REZE controls representation shifts in contrastive pre-finetuning of text embeddings via eigenspace decomposition of anchor-positive pairs and adaptive soft-shrinkage on task-variant directions.

  39. SIMMER: Cross-Modal Food Image--Recipe Retrieval via MLLM-Based Embedding

    cs.CV 2026-04 unverdicted novelty 6.0

    SIMMER uses a single multimodal LLM (VLM2Vec) with custom prompts and partial-recipe augmentation to embed food images and recipes, achieving new state-of-the-art retrieval accuracy on Recipe1M.

  40. AFGNN: API Misuse Detection using Graph Neural Networks and Clustering

    cs.SE 2026-04 unverdicted novelty 6.0

    AFGNN detects API misuses in Java code more effectively than prior methods by representing usage as graphs and clustering learned embeddings from self-supervised training.

  41. The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment

    cs.LG 2026-04 unverdicted novelty 6.0

    The Master Key Hypothesis states that capabilities are low-dimensional directions transferable across models through linear subspace alignment, with UNLOCK demonstrating gains such as 12.1% accuracy improvement on MAT...

  42. Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space

    cs.CL 2026-04 unverdicted novelty 6.0

    PAM, a complex-valued associative memory model, exhibits steeper power-law scaling in loss and perplexity than a matched real-valued baseline when trained on WikiText-103 from 5M to 100M parameters.

  43. ERPPO: Entropy Regularization-based Proximal Policy Optimization

    cs.LG 2026-05 unverdicted novelty 5.0

    ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.

  44. Lost in Translation? Exploring the Shift in Grammatical Gender from Latin to Occitan

    cs.CL 2026-05 unverdicted novelty 5.0

    An interpretable deep learning framework with a new tokenizer is used to quantify how grammatical gender information is distributed between lemmas and sentential context during the Latin-to-Occitan transition.

  45. Bridging the Programming Language Gap: Constructing a Multilingual Shared Semantic Space through AST Unification and Graph Matching

    cs.SE 2026-05 unverdicted novelty 5.0

    Unifying AST labels across languages and encoding paired graphs with a Graph Matching Network creates a shared semantic vector space that places functionally equivalent code from different languages near each other.

  46. Natural Language Processing: A Comprehensive Practical Guide from Tokenisation to RLHF

    cs.CL 2026-05 unverdicted novelty 5.0

    A structured practicum guides readers through the complete modern NLP pipeline with reproducible sessions and new linguistic resources for Tajik and Tatar.

  47. Semantic Structure of Feature Space in Large Language Models

    cs.CL 2026-04 unverdicted novelty 5.0

    LLM hidden states encode semantic features whose geometric relations, including axis projections, cosine similarities, low-dimensional subspaces, and steering spillovers, closely mirror human psychological associations.

  48. Come Together: Analyzing Popular Songs Through Statistical Embeddings

    stat.AP 2026-04 unverdicted novelty 5.0

    Logistic PCA embeddings of musical features enable statistical analysis of clustering by album and stylistic changes in Beatles songs by Lennon and McCartney.

  49. Multi-Perspective Evidence Synthesis and Reasoning for Unsupervised Multimodal Entity Linking

    cs.CL 2026-04 unverdicted novelty 5.0

    MSR-MEL synthesizes instance-centric, group-level, lexical, and statistical evidence with LLMs and asymmetric teacher-student GNNs to outperform prior unsupervised methods on multimodal entity linking benchmarks.

  50. Language, Place, and Social Media: Geographic Dialect Alignment in New Zealand

    cs.CL 2026-04 unverdicted novelty 5.0

    New Zealand Reddit users link language to place and form contiguous speech communities with complex geographic alignment; Word2Vec embeddings reveal semantic variations and shifts in NZ English on a 4.26 billion word corpus.

  51. Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)

    cs.CL 2026-04 unverdicted novelty 5.0

    SSAS improves LLM sentiment prediction consistency and data quality by up to 30% on three review datasets via syntactic and semantic context assessment summarization.

  52. NFTDELTA: Detecting Permission Control Vulnerabilities in NFT Contracts through Multi-View Learning

    cs.CR 2026-04 unverdicted novelty 5.0

    NFTDELTA detects permission control vulnerabilities in NFT contracts by combining sequence and graph views of function CFGs, reporting 241 confirmed issues across 795 collections with 97.92% average precision.

  53. NOMAD: Generating Embeddings for Massive Distributed Graphs

    cs.LG 2026-04 unverdicted novelty 5.0

    NOMAD delivers an MPI-based distributed implementation of graph embedding models achieving 10-100x median speedups over multi-threaded baselines and 35-76x over prior distributed systems on large clusters.

  54. Digital Skin, Digital Bias: Uncovering Tone-Based Biases in LLMs and Emoji Embeddings

    cs.SI 2026-04 unverdicted novelty 5.0

    LLMs handle skin tone emoji modifiers better than dedicated embedding models but display systemic disparities in sentiment and semantic consistency across tones.

  55. Bridging the Language Gap in Scholarly Data I: Enhancing Author Disambiguation Algorithms for Chinese Names

    cs.DL 2026-04 unverdicted novelty 5.0

    A rule-based disambiguation method using networks and content features achieves F1 scores of 0.88 for Pinyin and 0.89 for character names on 80 annotated pairs from 65k physics papers, outperforming baselines via high...

  56. What Are Adversaries Doing? Automating Tactics, Techniques, and Procedures Extraction: A Systematic Review

    cs.SE 2026-04 accept novelty 5.0

    Systematic review of 80 papers shows TTP extraction shifting to transformer and LLM methods but limited by narrow datasets, single-label focus, and low reproducibility.

  57. StarCoder: may the source be with you!

    cs.CL 2023-05 accept novelty 5.0

    StarCoderBase matches or beats OpenAI's code-cushman-001 on multi-language code benchmarks; the Python-fine-tuned StarCoder reaches 40% pass@1 on HumanEval while retaining other-language performance.

  58. FastOmniTMAE: Parallel Clause Learning for Scalable and Hardware-Efficient Tsetlin Embeddings

    cs.LG 2026-05 unverdicted novelty 4.0

    FastOmniTMAE parallelizes clause learning in Tsetlin Machine autoencoders to achieve up to 5x faster training with comparable embedding quality and low-footprint FPGA deployment.

  59. "I Don't Know" -- Towards Appropriate Trust with Certainty-Aware Retrieval Augmented Generation

    cs.IR 2026-05 unverdicted novelty 4.0

    CERTA adds relevance-based certainty estimation to RAG so LLMs can better signal uncertainty on non-objective questions, reducing overconfidence.

  60. Generating Synthetic Malware Samples Using Generative AI

    cs.LG 2026-04 conditional novelty 4.0

    Opcode-sequence generative models produce synthetic malware data that raises minor-class classification accuracy by up to 60% and overall detection to 96%.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · cited by 64 Pith papers

  1. [1] Y. Bengio, R. Ducharme, P. Vincent. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137-1155, 2003.

  2. [2] Y. Bengio, Y. LeCun. Scaling learning algorithms towards AI. In: Large-Scale Kernel Machines, MIT Press, 2007.

  3. [3] T. Brants, A. C. Popat, P. Xu, F. J. Och, J. Dean. Large language models in machine translation. In: Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Language Learning, 2007.

  4. [4] R. Collobert, J. Weston. A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. In: International Conference on Machine Learning, ICML, 2008.

  5. [5] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa. Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research, 12:2493-2537, 2011.

  6. [6] J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. A. Ranzato, A. Senior, P. Tucker, K. Yang, A. Y. Ng. Large Scale Distributed Deep Networks. NIPS, 2012.

  7. [7] J. C. Duchi, E. Hazan, Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 2011.

  8. [8] J. Elman. Finding Structure in Time. Cognitive Science, 14:179-211, 1990.

  9. [9] E. H. Huang, R. Socher, C. D. Manning, A. Y. Ng. Improving Word Representations via Global Context and Multiple Word Prototypes. In: Proc. Association for Computational Linguistics, 2012.

  10. [10] G. E. Hinton, J. L. McClelland, D. E. Rumelhart. Distributed representations. In: Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations, MIT Press, 1986.

  11. [11] D. A. Jurgens, S. M. Mohammad, P. D. Turney, K. J. Holyoak. SemEval-2012 Task 2: Measuring degrees of relational similarity. In: Proceedings of the 6th International Workshop on Semantic Evaluation (SemEval 2012), 2012.

  12. [12] A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, C. Potts. Learning word vectors for sentiment analysis. In: Proceedings of ACL, 2011.

  13. [13] T. Mikolov. Language Modeling for Speech Recognition in Czech. Master's thesis, Brno University of Technology, 2007.

  14. [14] T. Mikolov, J. Kopecký, L. Burget, O. Glembek, J. Černocký. Neural network based language models for highly inflective languages. In: Proc. ICASSP, 2009.

  15. [15] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, S. Khudanpur. Recurrent neural network based language model. In: Proceedings of Interspeech, 2010.

  16. [16] T. Mikolov, S. Kombrink, L. Burget, J. Černocký, S. Khudanpur. Extensions of recurrent neural network language model. In: Proceedings of ICASSP, 2011.

  17. [17] T. Mikolov, A. Deoras, S. Kombrink, L. Burget, J. Černocký. Empirical Evaluation and Combination of Advanced Language Modeling Techniques. In: Proceedings of Interspeech, 2011.

  18. [18] T. Mikolov, A. Deoras, D. Povey, L. Burget, J. Černocký. Strategies for Training Large Scale Neural Network Language Models. In: Proc. Automatic Speech Recognition and Understanding, 2011.

  19. [19] T. Mikolov. Statistical Language Models based on Neural Networks. PhD thesis, Brno University of Technology, 2012.

  20. [20] T. Mikolov, W. T. Yih, G. Zweig. Linguistic Regularities in Continuous Space Word Representations. NAACL HLT, 2013.

  21. [21] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, J. Dean. Distributed Representations of Words and Phrases and their Compositionality. Accepted to NIPS, 2013.

  22. [22] A. Mnih, G. Hinton. Three new graphical models for statistical language modelling. ICML, 2007.

  23. [23] A. Mnih, G. Hinton. A Scalable Hierarchical Distributed Language Model. Advances in Neural Information Processing Systems 21, MIT Press, 2009.

  24. [24] A. Mnih, Y. W. Teh. A fast and simple algorithm for training neural probabilistic language models. ICML, 2012.

  25. [25] F. Morin, Y. Bengio. Hierarchical Probabilistic Neural Network Language Model. AISTATS, 2005.

  26. [26] D. E. Rumelhart, G. E. Hinton, R. J. Williams. Learning internal representations by back-propagating errors. Nature, 323:533-536, 1986.

  27. [27] H. Schwenk. Continuous space language models. Computer Speech and Language, vol. 21, 2007.

  28. [28] R. Socher, E. H. Huang, J. Pennington, A. Y. Ng, C. D. Manning. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. In: NIPS, 2011.

  29. [29] J. Turian, L. Ratinov, Y. Bengio. Word Representations: A Simple and General Method for Semi-Supervised Learning. In: Proc. Association for Computational Linguistics, 2010.

  30. [30] P. D. Turney. Measuring Semantic Similarity by Latent Relational Analysis. In: Proc. International Joint Conference on Artificial Intelligence, 2005.

  31. [31] A. Zhila, W. T. Yih, C. Meek, G. Zweig, T. Mikolov. Combining Heterogeneous Models for Measuring Relational Similarity. NAACL HLT, 2013.

  32. [32] G. Zweig, C. J. C. Burges. The Microsoft Research Sentence Completion Challenge. Microsoft Research Technical Report MSR-TR-2011-129, 2011.