{"paper":{"title":"Eradicating Negative Transfer in Multi-Physics Foundation Models via Sparse Mixture-of-Experts Routing","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Shodh-MoE uses Top-1 routing on compressed latents from a divergence-free autoencoder to let distinct physics regimes train separate experts, yielding low MSE and autonomous domain separation in mixed pretraining.","cross_cats":["cs.AI","physics.comp-ph"],"primary_cat":"cs.LG","authors_text":"Arastu Sharma, Ellwil Sharma","submitted_at":"2026-05-14T17:58:15Z","abstract_excerpt":"Scaling Scientific Machine Learning (SciML) toward universal foundation models is bottlenecked by negative transfer: the simultaneous co-training of disparate partial differential equation (PDE) regimes can induce gradient conflict, unstable optimization, and plasticity loss in dense neural operators. In particular, broadband open-channel fluid dynamics and boundary-dominated porous media flows impose incompatible spectral and geometric demands on a single dense parameter path. We introduce Shodh-MoE, a sparse-activated latent transformer architecture for multi-physics transport. Shodh-MoE ope"},"claims":{"count":3,"items":[{"kind":"strongest_claim","text":"These results support sparse expert routing as a practical architectural mechanism for mitigating multi-physics interference in universal neural operators.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That a Top-1 soft-semantic router will reliably produce autonomous domain bifurcation and specialized parameter paths for incompatible PDE regimes without losing shared symmetries or requiring extensive hyperparameter tuning.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Shodh-MoE uses Top-1 routing on compressed latents from a divergence-free autoencoder to let distinct physics regimes train separate experts, yielding low MSE and autonomous domain separation in mixed pretraining.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"}],"snapshot_sha256":"e76be456e71c57d198b2ec7be7b925085850212f14cdefbdd6acfee8054a7dce"},"source":{"id":"2605.15179","kind":"arxiv","version":1},"verdict":{"id":"11a974fb-0d35-4ed9-a551-8266fefc7ee9","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T03:18:39.052599Z","strongest_claim":"These results support sparse expert routing as a practical architectural mechanism for mitigating multi-physics interference in universal neural operators.","one_line_summary":"Shodh-MoE uses Top-1 routing on compressed latents from a divergence-free autoencoder to let distinct physics regimes train separate experts, yielding low MSE and autonomous domain separation in mixed pretraining.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That a Top-1 soft-semantic router will reliably produce autonomous domain bifurcation and specialized parameter paths for incompatible PDE regimes without losing shared symmetries or requiring extensive hyperparameter tuning.","pith_extraction_headline":""},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}