International Conference on Machine Learning

26 Pith papers cite this work. Polarity classification is still indexing.

years: 2026 (26)

roles: background (1)

polarities: background (1)

representative citing papers

Language-Induced Priors for Domain Adaptation

cs.LG · 2026-05-14 · conditional · novelty 7.0

Language-Induced Priors from LLMs guide source selection in cold-start domain adaptation through an EM algorithm, matching oracle MSE under a correct prior and remaining asymptotically consistent.

Approximation-Free Differentiable Oblique Decision Trees

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

DTSemNet gives an exact, invertible neural-network encoding of hard oblique decision trees that supports direct gradient training for both classification and regression without probabilistic softening or quantized estimators.

Repeated Deceptive Path Planning against Learnable Observer

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

Deceptive Meta Planning (DeMP) uses two-level optimization to sustain deception against learning observers by combining short-term adaptation with meta-level learning of observer updates.

Randomness is sometimes necessary for coordination

cs.AI · 2026-05-07 · conditional · novelty 7.0

Structured per-agent randomness via ranked masking in attention allows symmetric agents to break ties and coordinate, achieving perfect success on symmetric tasks where deterministic policies fail and enabling zero-shot transfer across team sizes.

Implicit Safety Alignment from Crowd Preferences

cs.AI · 2026-05-20 · unverdicted · novelty 6.0

A hierarchical framework extracts implicit safety criteria from crowd preferences and composes them via high-level policy to reduce safety violations in downstream RL tasks without explicit safety rewards.

Dynamic Plasma Shape Control with Arbitrary Sensor Subsets

cs.RO · 2026-05-15 · unverdicted · novelty 6.0

A reinforcement learning agent trained in a DIII-D tokamak simulator achieves 2.01 cm mean shape error on held-out data, tracks dynamic targets, and remains functional under 30% random sensor dropout, with direct transfer to experimental shots.

Hölder Policy Optimisation

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

HölderPO unifies token-level aggregation in GRPO via the Hölder mean with a tunable p parameter and annealing schedule, delivering 54.9% average accuracy on math benchmarks and 93.8% success on ALFWorld.

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

OGPO enables sample-efficient full-finetuning of generative control policies via off-policy critics and a modified PPO, achieving state-of-the-art results on robot manipulation tasks while rescuing poorly initialized behavior cloning policies without expert data.

Implicit Action Chunking for Smooth Continuous Control

cs.RO · 2026-05-19 · unverdicted · novelty 5.0

Dual-Window Smoothing uses an execution window for deterministic smoothness and a value window to correct critic bias, plus a first-order temporal regularizer, to achieve smoother RL control than explicit chunking or standard baselines.
