{"total":24,"items":[{"citing_arxiv_id":"2606.24228","ref_index":1,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Uncertainty intervals for multilevel models with missing not at random data","primary_cat":"stat.ME","submitted_at":"2026-06-23T07:17:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A sensitivity analysis for MNAR data in multilevel models derives bias adjustments conditional on user-specified sensitivity parameters to produce bounds on parameters of interest under weaker assumptions than missing at random.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.23174","ref_index":11,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Principal Covariate Regression with Nuclear Norm Penalty","primary_cat":"stat.ME","submitted_at":"2026-06-22T11:13:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Proposes PcovRnnp method enabling simultaneous dimension reduction and regularized coefficient estimation via nuclear norm penalty in high-dimensional settings.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.12731","ref_index":8,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Normative Robustness as a Frontier for Non-Verifiable Reasoning in LLMs","primary_cat":"cs.LG","submitted_at":"2026-06-10T22:37:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Frontier LLMs exhibit moral deliberative sycophancy by shifting their moral reasoning and justifications up to 6.5% on average toward a user's stated preferred view in simulated 
deliberations.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.22986","ref_index":7,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Robots That Know What to Ask: Recovering Misaligned Rewards through Targeted Explanations","primary_cat":"cs.RO","submitted_at":"2026-05-21T19:34:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Robots detect underspecified reward features via demonstration variation and query targeted natural language explanations to improve reward recovery from imperfect demos.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21782","ref_index":98,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A Scalable Parametric Item Calibration Engine (SPICE) for Explanatory IRT with Sparse Data","primary_cat":"stat.ME","submitted_at":"2026-05-20T22:22:06+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"SPICE is a scalable Bayesian MCMC engine for explanatory IRT calibration on sparsely linked persons and items in large assessment banks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.21299","ref_index":90,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Tracing the ongoing emergence of human-like reasoning in Large Language Models","primary_cat":"cs.CL","submitted_at":"2026-05-20T15:28:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"LLMs function as accurate semantic processors for conditionals but do not replicate the pragmatic inferences that define human 
reasoning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.19816","ref_index":74,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Performance of low vision individuals when selecting a target with head-pointing in virtual reality","primary_cat":"q-bio.NC","submitted_at":"2026-05-19T13:11:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Low vision individuals with central visual field loss can use head-pointing to select 2° targets in VR, reaching near-control performance with sufficiently large pointer activation zones.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12619","ref_index":89,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Human face perception reflects inverse-generative and naturalistic discriminative objectives","primary_cat":"q-bio.NC","submitted_at":"2026-05-12T18:06:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Human face perception aligns with neural networks trained on inverse-generative and naturalistic discriminative tasks, as these best predict human dissimilarity judgments on controversial and random face pairs.","context_count":1,"top_context_role":"method","top_context_polarity":"use_method","context_text":"This process yielded a single correlation coefficient per seed, model, participant, and trial. Correlation coefficients were clipped to [−1 + 10^−6, 1 − 10^−6] before Fisher-z transformation to avoid infinite values. We then Fisher-z transformed these clipped correlation coefficients and analyzed them separately for each stimulus condition using linear mixed-effects models implemented in the lme4 package in R [89, 90]. 
The model was specified as: z ~ model + (1|seed:subj_id) + (1|seed:trial_id) where model was treated as a fixed effect representing the different models being compared. Random intercepts were included for participants and trials, each nested within stimulus-set seed, to account for repeated measurements and within-seed dependence. We considered more complex random-effects structures, including random intercepts for seed and random slopes"},{"citing_arxiv_id":"2605.05752","ref_index":46,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Generative AI-Based Monte Carlo Simulation for Method Evaluation Using Synthetic Multilevel Data","primary_cat":"stat.ME","submitted_at":"2026-05-07T06:45:17+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A framework using generative AI to produce synthetic multilevel data for Monte Carlo simulations that evaluate the performance and parameter recovery of quantitative methods.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03863","ref_index":33,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Quantifying the human visual exposome with vision language models","primary_cat":"cs.AI","submitted_at":"2026-05-05T15:25:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Vision language models applied to daily-life photos quantify visual environmental features that correlate with momentary affect and chronic stress, establishing a paradigm for visual exposomics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01006","ref_index":3,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Can AI Debias the News? 
LLM Interventions Improve Cross-Partisan Receptivity but LLMs Overestimate Their Own Effectiveness","primary_cat":"cs.CL","submitted_at":"2026-05-01T18:20:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Substantive LLM reframing boosts cross-partisan receptivity to news headlines without backfire, but models overestimate effect sizes and lack fidelity in modeling human psychological responses.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.25905","ref_index":3,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A paradox of AI fluency","primary_cat":"cs.CL","submitted_at":"2026-04-28T17:51:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Fluent AI users adopt an active, iterative collaboration mode that produces more visible failures but better recovery and success on hard tasks, whereas novices experience more invisible failures from passive use.","context_count":1,"top_context_role":"other","top_context_polarity":"unclear","context_text":"future will fail to deliver for most users. References Anthropic. Anthropic education report: The AI fluency index. https://www.anthropic.com/research/AI-fluency-index, February 2026. Accessed: 2026-03-11. Dale J. Barr, Roger Levy, Christoph Scheepers, and Harry J. Tily. Random effects structure in mixed-effects models: Keep it maximal. Journal of Memory and Language, 68(3):255-278, August 2011. Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1):1-48, 2015. doi: 10.18637/jss.v067.i01. Erik Brynjolfsson, Bharat Chandar, and Ruyu Chen. Canaries in the coal mine? 
Six facts about the recent employment effects of artificial intelligence."},{"citing_arxiv_id":"2604.23081","ref_index":1,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Visual Accessibility in a Virtual Kitchen: Effects of Open Shelving on Performance, Cognitive Load, and Experience in Older Adults with and without MCI","primary_cat":"cs.HC","submitted_at":"2026-04-25T00:26:00+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Open shelving in a virtual kitchen reduced task time and physical activity for older adults with and without MCI while increasing gaze entropy, with no change in subjective cognitive load or motivation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20743","ref_index":52,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"ProfileGLMM: a R Package Extending Bayesian Profile Regression using Generalised Linear Mixed Models","primary_cat":"stat.ME","submitted_at":"2026-04-22T16:30:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"ProfileGLMM is an R package extending Bayesian profile regression with GLMMs to support hierarchical data, random effects, and cluster-covariate interactions for continuous or binary outcomes.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20569","ref_index":11,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"The Effect of Idea Elaboration on the Automatic Assessment of Idea Originality","primary_cat":"cs.HC","submitted_at":"2026-04-22T13:49:57+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"LLM originality raters exhibit self-preference bias toward artificial responses that 
disappears after controlling for idea elaboration in the Alternate Uses Task.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"In this regard, the most widely used approach is the semantic distance metric [9, 10, 19]. It reflects how far apart two concepts are in the semantic space. Assessing creativity using this approach reflects the associative theory of creativity, which states that highly original ideas require the combination of semantically distant concepts (i.e., higher semantic distance values) [11]. Semantic distance is quantified by 1 − cosine of the angle between pairs of vectors in semantic space [9, 10, 32]. Considering the example provided by [38], coffee and drink have a low semantic distance (.46), whereas coffee and write have a higher semantic distance (.93) [38]. Semantic distance is considered a proxy of divergent thinking [34]. In the case of the AUT responses, semantic"},{"citing_arxiv_id":"2604.18563","ref_index":70,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Dual Alignment Between Language Model Layers and Human Sentence Processing","primary_cat":"cs.CL","submitted_at":"2026-04-20T17:51:26+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Later LLM layers align better with human cognitive effort in syntactic ambiguity than early layers do, indicating dual processing modes and complementary benefits from multi-layer probability updates.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.16755","ref_index":12,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Machine individuality: Separating genuine idiosyncrasy from response bias in large language 
models","primary_cat":"cs.AI","submitted_at":"2026-04-18T00:02:41+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"Crossed random-effects models on LLM word ratings show 16.9% variance from genuine stimulus-specific individuality, exceeding null models and forming coherent per-model fingerprints.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10511","ref_index":4,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Thinking Fast, Thinking Wrong: Intuitiveness Modulates LLM Counterfactual Reasoning in Policy Evaluation","primary_cat":"cs.AI","submitted_at":"2026-04-12T08:00:38+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.26954","ref_index":13,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"The Impact of LLM Self-Consistency and Reasoning Effort on Automated Scoring Accuracy and Cost","primary_cat":"cs.CY","submitted_at":"2026-04-03T14:26:08+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Strategic selection of LLMs and reasoning effort optimizes automated scoring accuracy and cost more effectively than self-consistency ensembling.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.07776","ref_index":3,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"SCOOTER: A Human Evaluation Framework for Unrestricted Adversarial Examples","primary_cat":"cs.CV","submitted_at":"2025-07-10T13:56:32+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"SCOOTER 
supplies best-practice guidelines, open tools, and a 3K-image benchmark with 34K+ human ratings showing that six tested unrestricted attacks produce images humans can detect as fake.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2503.11572","ref_index":23,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Implicit Bias-Like Patterns in Reasoning Models","primary_cat":"cs.CY","submitted_at":"2025-03-14T16:40:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Reasoning models expend more tokens on association-incompatible tasks than compatible ones, indicating greater effort on counter-stereotypical information, except for Claude 3.7 Sonnet which shows the reverse pattern linked to its bias-focused reasoning.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2408.05086","ref_index":16,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"A systematic framework for generating novel experimental hypotheses from language models","primary_cat":"cs.CL","submitted_at":"2024-08-09T14:17:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"A framework using language models to simulate non-existent experiments and derive novel testable hypotheses on dative verb acquisition and cross-structural generalization in children.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2403.05566","ref_index":6,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"Bringing Age Back In: Accounting for Population Age Distribution in Forecasting 
Migration","primary_cat":"stat.AP","submitted_at":"2024-02-21T01:59:50+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Introduces MASI to standardize net migration rates for age structure and applies a Bayesian hierarchical model to forecast adjusted total and age-sex specific migration rates through 2100, yielding narrower intervals and moderated decline projections.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"1907.11146","ref_index":3,"ref_count":1,"confidence":0.88,"is_internal_anchor":false,"paper_title":"What's in an accent? The impact of accented synthetic speech on lexical choice in human-machine dialogue","primary_cat":"cs.HC","submitted_at":"2019-07-25T15:39:11+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Accented synthetic speech leads users to align their lexical choices with the perceived accent of the machine partner, mirroring human-human dialogue patterns.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}