Generalized Slow Roll for Tensors
The recent BICEP2 detection of degree-scale CMB B-mode polarization, coupled with a deficit of observed power in large-angle temperature anisotropy, suggests that the slow-roll parameter $\epsilon_H$, the fractional variation in the Hubble rate per e-fold, is not only relatively large but may also evolve from an even larger value on scales greater than the horizon at recombination. The implied, relatively large tensor contribution also requires finite matching features in the tensor power spectrum for any scalar power spectrum feature proposed to explain anomalies in the temperature data. We extend the generalized slow-roll approach for computing power spectra, appropriate for such models where the slow-roll parameters vary, to tensor features where scalar features are large. This approach also generalizes the tensor-scalar consistency relation to a relation between the ratio of tensor and scalar sources and features in the two power spectra. Features in the tensor spectrum are generically suppressed relative to those in the scalar spectrum by $\epsilon_H$ and by the smoothness of the Hubble rate, which must obey covariant conservation of energy, compared with its derivatives. Their detection in near-future CMB data would indicate a fast-roll period of inflation where $\epsilon_H$ approaches order unity, which is allowed but not required by inflationary explanations of the temperature anomalies.
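For readers outside cosmology, the quantities the abstract leans on are standard. As a brief recap of the textbook leading-order slow-roll relations (standard background, not equations reproduced from this paper), the slow-roll parameter and the tensor and scalar power spectra are

$$
\epsilon_H \equiv -\frac{d\ln H}{dN} = -\frac{\dot H}{H^2},
\qquad
\Delta_h^2 \simeq \frac{2}{\pi^2}\frac{H^2}{M_{\rm Pl}^2},
\qquad
\Delta_\zeta^2 \simeq \frac{1}{8\pi^2}\frac{H^2}{M_{\rm Pl}^2\,\epsilon_H},
$$

which give the usual consistency relation

$$
r \equiv \frac{\Delta_h^2}{\Delta_\zeta^2} = 16\,\epsilon_H,
\qquad
n_t = -2\epsilon_H = -\frac{r}{8}.
$$

Because $\Delta_h^2$ depends on $H$ alone while $\Delta_\zeta^2$ carries the extra factor $1/\epsilon_H$, a sharp feature in $\epsilon_H$ imprints directly on the scalar spectrum but reaches the tensor spectrum only through the integrated, hence smoother, Hubble rate; this is the $\epsilon_H$ suppression the abstract describes.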
This paper has not been read by Pith yet.
Forward citations
Cited by 18 Pith papers
- AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures
  AsyncSparse presents BCSR and WCSR kernels that use TMA and warp specialization to accelerate SpMM, outperforming prior libraries by 1.47-6.24x on SuiteSparse and achieving 2.66x end-to-end speedup on Qwen2.5-7B at 90... (see the BCSR reference sketch after this list)
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
  A recurrent-depth architecture enables language models to improve reasoning performance by iterating computation in latent space, achieving gains equivalent to much larger models on benchmarks.
- Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
  Piper introduces resource modeling and pipelined hybrid parallelism for MoE training, delivering 2-3.5x higher MFU than prior frameworks and 1.2-9x better all-to-all bandwidth.
- COPUS: Co-adaptive Parallelism and Batch Size Selection in Large Language Model Training
  COPUS co-adapts batch size and parallelism during LLM training via a goodput model, delivering 3.9-8% faster convergence on average than fixing one while tuning the other.
- COMPASS: A Unified Decision-Intelligence System for Navigating Performance Trade-off in HPC
  COMPASS formalizes HPC configuration questions as ML tasks on traces, quantifies recommendation trustworthiness, and delivers 65.93% lower average job turnaround time plus 80.93% lower node usage versus prior methods ...
- Muon is Scalable for LLM Training
  The Muon optimizer with weight decay and update scaling achieves ~2x efficiency over AdamW for large LLMs, shown via the Moonlight 3B/16B MoE model trained on 5.7T tokens.
- BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
  BLOOM is a 176B-parameter open-access multilingual language model trained on the ROOTS corpus that achieves competitive performance on benchmarks, with improved results after multitask prompted finetuning.
- Entanglement-informed distributed wavefunction approach to scalable quantum many-body systems
  Entanglement structure provides a natural distributed representation for quantum wavefunctions that reduces Hamiltonian applications to local contractions and enables near-linear scaling in simulations.
- Selecting optimal unrestricted Hartree-Fock trial wavefunctions for phaseless auxiliary-field quantum Monte Carlo: Accuracy and limitations in modeling three iron-sulfur clusters
  Chemical properties and symmetries, not variational energy, should guide UHF trial selection for ph-AFQMC on iron-sulfur clusters, yielding accurate energies despite suboptimal sampling and bias compensation.
- Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics
  A heterogeneous HPC diagnostics framework achieves 314x GPU speedup for 100k execution traces and identifies 32.28% potential speedup for GAMESS on Frontier via a tri-dimensional performance model.
- Enhancing Performance Insight at Scale: A Heterogeneous Framework for Exascale Diagnostics
  An accelerated hpcanalysis framework ingests performance data from 100,000 MPI ranks in 9.69 seconds, delivers up to 314x GPU speedup, maps network congestion on Aurora, and uses a new tri-dimensional model to identif...
- Practical Formal Verification for MLIR Programs
  A hybrid concrete-symbolic verifier checks MLIR program equivalence in linear time for a supported subset and is applied to AMD MLIR-AIR, MLIR-AIE, and mlir-opt on hundreds of benchmarks.
- LASER: Learning Active Sensing for Continuum Field Reconstruction
  LASER trains a reinforcement learning policy inside a latent dynamics model to choose sensor placements that improve reconstruction of continuum fields under sparsity.
- NOMAD: Generating Embeddings for Massive Distributed Graphs
  NOMAD delivers an MPI-based distributed implementation of graph embedding models, achieving 10-100x median speedups over multi-threaded baselines and 35-76x over prior distributed systems on large clusters.
- Measurement of Generative AI Workload Power Profiles for Whole-Facility Data Center Infrastructure Planning
  High-resolution power profiles for AI workloads on H100 GPUs are measured and scaled to whole-facility energy demand using a bottom-up model, with the dataset made public.
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  DeepSeekMoE 2B matches GShard 2.9B performance and approaches a dense 2B model; the 16B version matches LLaMA2-7B at 40% of the compute by using fine-grained expert segmentation plus shared experts.
- Cross-Layer Energy Analysis of Multimodal Training on Grace Hopper Superchips
  On Grace Hopper superchips, energy efficiency during multimodal training is governed by data movement and overlap rather than compute utilization, and runtime-optimal configurations are not always energy-optimal.
- AI-Powered Surrogate Modelling for Multiscale Combustion: A Critical Review and Opportunities
  A critical review of AI surrogate models for multiscale combustion that compares supervised, unsupervised, and physics-guided methods, identifies transferability and consistency challenges, and outlines future opportunities.
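Several of the systems papers above turn on a concrete data structure. As background for the AsyncSparse entry, here is a minimal NumPy reference for SpMM over a BCSR (block compressed sparse row) matrix. This is a sketch of the storage format only, not AsyncSparse's GPU kernels; all names (`bcsr_spmm`, the block dimensions `R`, `C`) are illustrative assumptions rather than that paper's API.

```python
import numpy as np

def bcsr_spmm(block_rowptr, block_colidx, block_vals, B):
    """Dense result A @ B for A stored in BCSR form.

    block_rowptr : (n_block_rows + 1,) ints; the blocks of block-row i live at
                   indices block_rowptr[i]:block_rowptr[i + 1].
    block_colidx : (n_blocks,) ints; block-column index of each stored block.
    block_vals   : (n_blocks, R, C) array; each stored block is dense R x C.
    B            : (n_block_cols * C, K) dense right-hand side.
    """
    n_blocks, R, C = block_vals.shape
    n_block_rows = len(block_rowptr) - 1
    out = np.zeros((n_block_rows * R, B.shape[1]))
    for i in range(n_block_rows):
        # Accumulate every stored block in block-row i against the
        # matching C-row slab of B.
        for b in range(block_rowptr[i], block_rowptr[i + 1]):
            j = block_colidx[b]
            out[i * R:(i + 1) * R] += block_vals[b] @ B[j * C:(j + 1) * C]
    return out

# Tiny example: a 4x4 matrix with 2x2 blocks and two stored blocks,
# at block positions (0, 0) and (1, 1).
rowptr = np.array([0, 1, 2])
colidx = np.array([0, 1])
vals = np.stack([np.eye(2), 2 * np.eye(2)])
B = np.arange(8.0).reshape(4, 2)
print(bcsr_spmm(rowptr, colidx, vals, B))
```

The block layout is the point: each stored block becomes a dense tile multiply, which is what GPU kernels of the kind the AsyncSparse summary describes map onto tensor-core-friendly tiles. The loops above are the serial analogue of that tiling.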