Bishop. Pattern Recognition and Machine Learning
7 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (7)
verdicts: UNVERDICTED (7)
citing papers explorer
- Fitting Multilinear Polynomials for Logic Gate Networks
Fitting logic gates as 4D multilinear polynomials with covariance Jacobian selection matches or beats 16D softmax baselines on seven datasets and remains stable at 12-layer depth where the baseline drops 37 points on CIFAR-10.
- In-Context Learning Operates as Concept Subspace Learning
In-context learning decomposes into concept-coordinate regression plus off-subspace leakage. Recoverable task information concentrates in a 68- to 73-dimensional task-aligned subspace of the residual stream, and this subspace restores 78.8% of the accuracy gap in Llama-3-8B experiments.
- Bayesian low-rank latent-cluster regression for mixed health outcomes
A Bayesian finite mixture of cluster-specific low-rank regressions for mixed Gaussian-Bernoulli-negative binomial outcomes, with posterior contraction results and WAIC-based tuning of clusters and rank.
- Laplace Variational Inference for Dirichlet Process Mixtures of Marked Poisson Point Processes
A Dirichlet process mixture model for marked Poisson point processes with squared-link intensities and Laplace variational inference jointly infers clusters, cluster count, and continuous mark-specific intensity surfaces.
- Why Self-Supervised Encoders Want to Be Normal
Self-supervised encoders prefer isotropic Gaussian latent states because the Information Bottleneck, recast as rate-distortion over the predictive manifold, makes these states optimal for target-neutral representations.
- From Imitation to Interaction: Mastering Game of Schnapsen with Shallow Reinforcement Learning
Reinforcement learning with shallow networks produces stronger Schnapsen agents than supervised imitation and yields statistically significant gains against strong search-based baselines when paired with lookahead.
- Supervised Latent Restructuring for Small-Data Quantum Learning in Plant Phenomics
Supervised LDA restructuring of PCA-compressed embeddings raises silhouette separability from near zero to 0.197 in plant phenomics but yields mixed classical ML gains and persistent challenges for quantum kernel alignment under limited compute.