hub
A Structured Self-attentive Sentence Embedding
12 Pith papers cite this work. Polarity classification is still indexing.
abstract
This paper proposes a new model for extracting an interpretable sentence embedding by introducing self-attention. Instead of using a vector, we use a 2-D matrix to represent the embedding, with each row of the matrix attending on a different part of the sentence. We also propose a self-attention mechanism and a special regularization term for the model. As a side effect, the embedding comes with an easy way of visualizing what specific parts of the sentence are encoded into the embedding. We evaluate our model on 3 different tasks: author profiling, sentiment classification, and textual entailment. Results show that our model yields a significant performance gain compared to other sentence embedding methods in all of the 3 tasks.
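The abstract does not spell out the formulas, so the following is only a minimal sketch of the mechanism it describes, assuming the formulation usually attributed to this paper: an attention matrix A = softmax(W_s2 tanh(W_s1 H^T)), a matrix embedding M = A H, and a penalty ||A A^T - I||_F^2 that discourages the rows of A from attending to the same tokens. The names `structured_self_attention` and `redundancy_penalty`, the sizes `n`, `d`, `d_a`, `r`, and the random weights are illustrative assumptions; in a real model H would come from a trained sentence encoder and the weights would be learned.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def structured_self_attention(H, W_s1, W_s2):
    """H is n x d (one hidden state per token), W_s1 is d_a x d, W_s2 is r x d_a.

    Returns (A, M): A is r x n, one attention distribution per row,
    and M = A H is the r x d matrix embedding of the sentence.
    """
    hidden = [[math.tanh(v) for v in row] for row in matmul(W_s1, transpose(H))]
    scores = matmul(W_s2, hidden)          # r x n unnormalized scores
    A = [softmax(row) for row in scores]   # normalize over tokens
    M = matmul(A, H)
    return A, M


def redundancy_penalty(A):
    """Squared Frobenius norm of A A^T - I (assumed form of the regularizer)."""
    G = matmul(A, transpose(A))
    size = len(G)
    return sum(
        (G[i][j] - (1.0 if i == j else 0.0)) ** 2
        for i in range(size)
        for j in range(size)
    )


# Toy example with random stand-ins for encoder states and learned weights.
rng = random.Random(0)
n, d, d_a, r = 5, 4, 3, 2
H = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
W_s1 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d_a)]
W_s2 = [[rng.uniform(-1, 1) for _ in range(d_a)] for _ in range(r)]

A, M = structured_self_attention(H, W_s1, W_s2)
penalty = redundancy_penalty(A)
```

Each row of `A` is a distribution over the tokens, which is what makes the embedding easy to visualize: plotting a row as a heat map over the sentence shows which words that row of `M` encodes.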
polarities
unclear 2
representative citing papers
Graph Attention Networks compute learnable attention coefficients over node neighborhoods to produce weighted feature aggregations, achieving state-of-the-art results on citation networks and inductive protein-protein interaction graphs.
citing papers explorer
- Emergent Culture in Minimal LLM Systems
  Minimal collectives of three LLM agents develop spontaneous cooperation, storage strategies, and complex evolving cultural artifacts via interaction with a decaying shared text store and evolutionary pressure.
- FRACTAL: SSM with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences
  FRACTAL integrates fractional recurrent architecture into SSMs using a tunable singularity index to capture multi-scale temporal features, reporting 87.11% average on Long Range Arena and outperforming S5.
- Cognitive State Inference from VR Motion via Motion Foundation Model
  VR head and hand motion data can be adapted to motion foundation models to classify cognitive states like confusion and hesitation at 82% accuracy with better cross-user generalization than baseline models on a new 24-participant dataset.
- DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks
  DropAttention regularizes attention weights in fully-connected self-attention networks to reduce overfitting and improve performance.
- Construct Dynamic Graphs for Hand Gesture Recognition via Spatial-Temporal Attention
  DG-STA builds dynamic graphs from hand skeletons, applies spatial-temporal self-attention to learn features, and uses a mask to cut cost by 99%, outperforming prior methods on DHG-14/28 and SHREC'17.
- Deep Mixture Point Processes: Spatio-temporal Event Prediction with Rich Contextual Information
  DMPP models spatio-temporal event intensity as a deep NN-weighted mixture of kernels to incorporate high-dimensional context while keeping likelihood integration tractable.
- Learning Shared Sentiment Prototypes for Adaptive Multimodal Sentiment Analysis
  PRISM learns shared sentiment prototypes to enable structured cross-modal comparison and dynamic modality reweighting in multimodal sentiment analysis, outperforming baselines on three benchmark datasets.
- Universal Transformers
  Universal Transformers combine Transformer parallelism with recurrent updates and dynamic halting to achieve Turing-completeness under assumptions and outperform standard Transformers on algorithmic and language tasks.
- AMAD: Adversarial Multiscale Anomaly Detection on High-Dimensional and Time-Evolving Categorical Data
  AMAD is an end-to-end model using adversarial autoencoders and RNNs with attention for multiscale anomaly detection on time-evolving high-dimensional categorical data.
- Attention Is All You Need
  Pith review generated a malformed one-line summary.
- Automatically Learning Construction Injury Precursors from Text
  Standard NLP classifiers can surface valid injury precursors from raw construction safety reports.