{"work":{"id":"8481976a-f196-4822-833d-e487ae5a1e81","openalex_id":null,"doi":null,"arxiv_id":null,"raw_key":"raw:7d36d7be38b77231f031d477","title":"write newline","authors":null,"authors_text":"\" write newline \"\" before","year":null,"venue":null,"abstract":null,"external_url":null,"cited_by_count":null,"metadata_source":"raw_reference","metadata_fetched_at":"2026-05-27T09:58:56.470201+00:00","pith_arxiv_id":null,"created_at":"2026-05-12T16:46:42.137544+00:00","updated_at":"2026-06-05T21:23:00.469572+00:00","title_quality_ok":false,"display_title":"write newline","render_title":"write newline"},"hub":{"state":{"work_id":"8481976a-f196-4822-833d-e487ae5a1e81","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":107,"external_cited_by_count":null,"distinct_field_count":22,"first_pith_cited_at":"2019-06-19T19:10:26+00:00","last_pith_cited_at":"2026-04-30T11:24:04+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T18:49:07.332660+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"background","n":5},{"context_role":"other","n":2}],"polarity_counts":[{"context_polarity":"unclear","n":5},{"context_polarity":"background","n":2}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"write newline","claims":[{"claim_text":"Towards retrieval-augmented large language models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 6491-6501, 2024. Fei, N., Lu, Z., Gao, Y ., Yang, G., Huo, Y ., Wen, J., Lu, H., Song, R., Gao, X., Xiang, T., et al. Towards artificial general intelligence via a multimodal foundation model. Nature Communications, 13(1):3094, 2022. Feng, T., Jin, C., Liu, J., Zhu, K., Tu, H., Cheng, Z., Lin, G., and You, J. How far are we from agi. arXiv preprint arXiv","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Trace through the logic with the given test input 3. Determine the CORRECT output and which code(s) produced it Respond in the following format: <reasoning> Brief explanation (2-3 sentences max) of why this is the correct output. </reasoning> <correct output> The correct output value </correct output> <correct codes id> List of correct code indices, e.g., [1, 3] or [2] 16 ADVERMCTS </correct codes id> E. Algorithm We present the detailed procedure of ADVERMCTS in pseudocode in Algorithm 1. Algor","claim_type":"other","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"tic proximity between entities sharing a common surface, where the dependent object's placement is conditioned by its functional utility relative to an anchor (e.g., a keyboard placed relative to a laptop). Based on these relations, a global scene is represented as an ordered sequence of relational tuplesS={T 1,T 2, . . . ,TN }. Each tupleT i is formulated as: Ti =⟨O dep,i,O sup,i,{O f nc,i}opt⟩,(1) where Odep,i is the object to be generated, Osup,i is the mandatory support anchor, and Of nc,i i","claim_type":"background","confidence":0.7,"evidence_strength":"citation_context"},{"claim_text":"to symbolic constraints as specified in the symbolic scaf- fold. We give some examples of curated reasoning traces following this procedure in Appendix B.1. For each example (x,y) that is correctly predicted by the de- cision tree model, we let R(x,y, S(x)) denote the curated reasoning tokens. As a result, we collect a set of reason- ing data {xi,z i,y i}i∈C, where zi =R(x i,y i, S(xi)), and C ∈[1, . . . , n] denotes the subset of data that is correctly predicted by the decision tree model. 3.4.","claim_type":"other","confidence":0.6,"evidence_strength":"citation_context"},{"claim_text":"reweight these constraints during inversion using a spec- tral objective derived from a local linearization of the full 3 Information-Regularized Constrained Inversion for Stable Avatar Editing from Sparse Supervision decoding-and-rendering pipeline. 3.1 Differentiable Avatar Rendering Pipeline We assume a differentiable, animatable rendering pipeline yt =f(v, θ t)∈R m,(1) where v∈R r is a globalediting codeshared across frames, θt denotes the frame-specific driving state (pose parame- ters, cam","claim_type":"background","confidence":0.6,"evidence_strength":"citation_context"},{"claim_text":"where the squares are applied element-wise. • Low-rank term.Let θi = 1 i Pi j=1 θj be the running mean afteri snapshots, and define deviation columnsdi =θ i−θi. To limit the rank, SW AG retains only the lastK such columns in a matrix D∈R d×K, giving the low-rank covariance Σlr = 1 K−1 DD⊺.(27) The resulting SW AG posterior approximation is qSW AG(θ) =N \u0010 θSW A, 1 2(Σdiag +Σ lr) \u0011 .(28) Givenz 1 ∼ N(0, I d)andz 2 ∼ N(0, I K), SW AG draws samples via eθ=θ SW A+ 1√ 2 Σ1/2 diagz1 + 1p 2(K−1) Dz2.(29","claim_type":"background","confidence":0.5,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks write newline because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (5 contexts).","role_counts":[{"n":5,"context_role":"background"},{"n":2,"context_role":"other"}]},"error":null,"updated_at":"2026-05-26T13:26:38.551484+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"ac6873e2-34b6-473a-8aae-6009abdcfe0e","orcid":null,"display_name":"\" write newline \"\" before"}]},"error":null,"updated_at":"2026-05-26T13:26:38.544757+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-24T09:35:14.229465+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":8},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":7},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":7},{"title":"@esa (Ref","work_id":"b058608d-98d0-4821-a4ae-403d2b7cd411","shared_citers":6},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":6},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":6},{"title":null,"work_id":"ea79bfb8-d434-45e9-8607-416d3839ec5c","shared_citers":6},{"title":"Deep residual learning for image recognition","work_id":"ad888ecb-42a3-4afe-8cad-1b82c4ceba5d","shared_citers":5},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":5},{"title":"write newline","work_id":"8e5fda61-e601-4df4-8204-015bee341570","shared_citers":5},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":4},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":4},{"title":"N., Kaiser, ., and Polosukhin, I","work_id":"1fd3dff0-bb47-462f-9c43-de65746ef810","shared_citers":4},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":3},{"title":"gpt-oss-120b & gpt-oss-20b Model Card","work_id":"178c1f7e-4f19-4392-a45d-45a6dfa88ead","shared_citers":3},{"title":"Gradient-based learning applied to document recognition","work_id":"ad843adc-13d5-44a0-b124-d5468ffec663","shared_citers":3},{"title":"Learning multiple layers of features from tiny images","work_id":"044e652b-5701-4a1e-bda0-ccb538cb31de","shared_citers":3},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":3},{"title":"Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism","work_id":"c888e6d1-0b1d-43d6-9ef5-f0912a0efa1b","shared_citers":3},{"title":"Program Synthesis with Large Language Models","work_id":"fd241a05-03b9-4de2-9588-9d77ce176125","shared_citers":3},{"title":"Qwen2.5 Technical Report","work_id":"d8432992-4980-4a81-85c7-9fa2c2b87f85","shared_citers":3},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":3},{"title":"Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters","work_id":"a8d50b24-bdf5-46ed-bc4f-2927dfd81f1d","shared_citers":3},{"title":"Training Compute-Optimal Large Language Models","work_id":"b2faf28d-86b7-429c-bc42-469458efc246","shared_citers":3}],"time_series":[{"n":4,"year":2023},{"n":9,"year":2024},{"n":23,"year":2025},{"n":35,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-24T09:35:23.738686+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-24T09:35:23.691873+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"write newline","claims":[{"claim_text":"Towards retrieval-augmented large language models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 6491-6501, 2024. Fei, N., Lu, Z., Gao, Y ., Yang, G., Huo, Y ., Wen, J., Lu, H., Song, R., Gao, X., Xiang, T., et al. Towards artificial general intelligence via a multimodal foundation model. Nature Communications, 13(1):3094, 2022. Feng, T., Jin, C., Liu, J., Zhu, K., Tu, H., Cheng, Z., Lin, G., and You, J. How far are we from agi. arXiv preprint arXiv","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Trace through the logic with the given test input 3. Determine the CORRECT output and which code(s) produced it Respond in the following format: <reasoning> Brief explanation (2-3 sentences max) of why this is the correct output. </reasoning> <correct output> The correct output value </correct output> <correct codes id> List of correct code indices, e.g., [1, 3] or [2] 16 ADVERMCTS </correct codes id> E. Algorithm We present the detailed procedure of ADVERMCTS in pseudocode in Algorithm 1. Algor","claim_type":"other","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"tic proximity between entities sharing a common surface, where the dependent object's placement is conditioned by its functional utility relative to an anchor (e.g., a keyboard placed relative to a laptop). Based on these relations, a global scene is represented as an ordered sequence of relational tuplesS={T 1,T 2, . . . ,TN }. Each tupleT i is formulated as: Ti =⟨O dep,i,O sup,i,{O f nc,i}opt⟩,(1) where Odep,i is the object to be generated, Osup,i is the mandatory support anchor, and Of nc,i i","claim_type":"background","confidence":0.7,"evidence_strength":"citation_context"},{"claim_text":"to symbolic constraints as specified in the symbolic scaf- fold. We give some examples of curated reasoning traces following this procedure in Appendix B.1. For each example (x,y) that is correctly predicted by the de- cision tree model, we let R(x,y, S(x)) denote the curated reasoning tokens. As a result, we collect a set of reason- ing data {xi,z i,y i}i∈C, where zi =R(x i,y i, S(xi)), and C ∈[1, . . . , n] denotes the subset of data that is correctly predicted by the decision tree model. 3.4.","claim_type":"other","confidence":0.6,"evidence_strength":"citation_context"},{"claim_text":"reweight these constraints during inversion using a spec- tral objective derived from a local linearization of the full 3 Information-Regularized Constrained Inversion for Stable Avatar Editing from Sparse Supervision decoding-and-rendering pipeline. 3.1 Differentiable Avatar Rendering Pipeline We assume a differentiable, animatable rendering pipeline yt =f(v, θ t)∈R m,(1) where v∈R r is a globalediting codeshared across frames, θt denotes the frame-specific driving state (pose parame- ters, cam","claim_type":"background","confidence":0.6,"evidence_strength":"citation_context"},{"claim_text":"where the squares are applied element-wise. • Low-rank term.Let θi = 1 i Pi j=1 θj be the running mean afteri snapshots, and define deviation columnsdi =θ i−θi. To limit the rank, SW AG retains only the lastK such columns in a matrix D∈R d×K, giving the low-rank covariance Σlr = 1 K−1 DD⊺.(27) The resulting SW AG posterior approximation is qSW AG(θ) =N \u0010 θSW A, 1 2(Σdiag +Σ lr) \u0011 .(28) Givenz 1 ∼ N(0, I d)andz 2 ∼ N(0, I K), SW AG draws samples via eθ=θ SW A+ 1√ 2 Σ1/2 diagz1 + 1p 2(K−1) Dz2.(29","claim_type":"background","confidence":0.5,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks write newline because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (5 contexts).","role_counts":[{"n":5,"context_role":"background"},{"n":2,"context_role":"other"}]},"error":null,"updated_at":"2026-05-26T13:26:38.555260+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"write newline","claims":[{"claim_text":"Towards retrieval-augmented large language models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 6491-6501, 2024. Fei, N., Lu, Z., Gao, Y ., Yang, G., Huo, Y ., Wen, J., Lu, H., Song, R., Gao, X., Xiang, T., et al. Towards artificial general intelligence via a multimodal foundation model. Nature Communications, 13(1):3094, 2022. Feng, T., Jin, C., Liu, J., Zhu, K., Tu, H., Cheng, Z., Lin, G., and You, J. How far are we from agi. arXiv preprint arXiv","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Trace through the logic with the given test input 3. Determine the CORRECT output and which code(s) produced it Respond in the following format: <reasoning> Brief explanation (2-3 sentences max) of why this is the correct output. </reasoning> <correct output> The correct output value </correct output> <correct codes id> List of correct code indices, e.g., [1, 3] or [2] 16 ADVERMCTS </correct codes id> E. Algorithm We present the detailed procedure of ADVERMCTS in pseudocode in Algorithm 1. Algor","claim_type":"other","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"tic proximity between entities sharing a common surface, where the dependent object's placement is conditioned by its functional utility relative to an anchor (e.g., a keyboard placed relative to a laptop). Based on these relations, a global scene is represented as an ordered sequence of relational tuplesS={T 1,T 2, . . . ,TN }. Each tupleT i is formulated as: Ti =⟨O dep,i,O sup,i,{O f nc,i}opt⟩,(1) where Odep,i is the object to be generated, Osup,i is the mandatory support anchor, and Of nc,i i","claim_type":"background","confidence":0.7,"evidence_strength":"citation_context"},{"claim_text":"to symbolic constraints as specified in the symbolic scaf- fold. We give some examples of curated reasoning traces following this procedure in Appendix B.1. For each example (x,y) that is correctly predicted by the de- cision tree model, we let R(x,y, S(x)) denote the curated reasoning tokens. As a result, we collect a set of reason- ing data {xi,z i,y i}i∈C, where zi =R(x i,y i, S(xi)), and C ∈[1, . . . , n] denotes the subset of data that is correctly predicted by the decision tree model. 3.4.","claim_type":"other","confidence":0.6,"evidence_strength":"citation_context"},{"claim_text":"reweight these constraints during inversion using a spec- tral objective derived from a local linearization of the full 3 Information-Regularized Constrained Inversion for Stable Avatar Editing from Sparse Supervision decoding-and-rendering pipeline. 3.1 Differentiable Avatar Rendering Pipeline We assume a differentiable, animatable rendering pipeline yt =f(v, θ t)∈R m,(1) where v∈R r is a globalediting codeshared across frames, θt denotes the frame-specific driving state (pose parame- ters, cam","claim_type":"background","confidence":0.6,"evidence_strength":"citation_context"},{"claim_text":"where the squares are applied element-wise. • Low-rank term.Let θi = 1 i Pi j=1 θj be the running mean afteri snapshots, and define deviation columnsdi =θ i−θi. To limit the rank, SW AG retains only the lastK such columns in a matrix D∈R d×K, giving the low-rank covariance Σlr = 1 K−1 DD⊺.(27) The resulting SW AG posterior approximation is qSW AG(θ) =N \u0010 θSW A, 1 2(Σdiag +Σ lr) \u0011 .(28) Givenz 1 ∼ N(0, I d)andz 2 ∼ N(0, I K), SW AG draws samples via eθ=θ SW A+ 1√ 2 Σ1/2 diagz1 + 1p 2(K−1) Dz2.(29","claim_type":"background","confidence":0.5,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks write newline because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (5 contexts).","role_counts":[{"n":5,"context_role":"background"},{"n":2,"context_role":"other"}]},"error":null,"updated_at":"2026-05-24T09:35:07.450002+00:00"}},"summary":{"title":"write newline","claims":[{"claim_text":"Towards retrieval-augmented large language models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 6491-6501, 2024. Fei, N., Lu, Z., Gao, Y ., Yang, G., Huo, Y ., Wen, J., Lu, H., Song, R., Gao, X., Xiang, T., et al. Towards artificial general intelligence via a multimodal foundation model. Nature Communications, 13(1):3094, 2022. Feng, T., Jin, C., Liu, J., Zhu, K., Tu, H., Cheng, Z., Lin, G., and You, J. How far are we from agi. arXiv preprint arXiv","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"Trace through the logic with the given test input 3. Determine the CORRECT output and which code(s) produced it Respond in the following format: <reasoning> Brief explanation (2-3 sentences max) of why this is the correct output. </reasoning> <correct output> The correct output value </correct output> <correct codes id> List of correct code indices, e.g., [1, 3] or [2] 16 ADVERMCTS </correct codes id> E. Algorithm We present the detailed procedure of ADVERMCTS in pseudocode in Algorithm 1. Algor","claim_type":"other","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"tic proximity between entities sharing a common surface, where the dependent object's placement is conditioned by its functional utility relative to an anchor (e.g., a keyboard placed relative to a laptop). Based on these relations, a global scene is represented as an ordered sequence of relational tuplesS={T 1,T 2, . . . ,TN }. Each tupleT i is formulated as: Ti =⟨O dep,i,O sup,i,{O f nc,i}opt⟩,(1) where Odep,i is the object to be generated, Osup,i is the mandatory support anchor, and Of nc,i i","claim_type":"background","confidence":0.7,"evidence_strength":"citation_context"},{"claim_text":"to symbolic constraints as specified in the symbolic scaf- fold. We give some examples of curated reasoning traces following this procedure in Appendix B.1. For each example (x,y) that is correctly predicted by the de- cision tree model, we let R(x,y, S(x)) denote the curated reasoning tokens. As a result, we collect a set of reason- ing data {xi,z i,y i}i∈C, where zi =R(x i,y i, S(xi)), and C ∈[1, . . . , n] denotes the subset of data that is correctly predicted by the decision tree model. 3.4.","claim_type":"other","confidence":0.6,"evidence_strength":"citation_context"},{"claim_text":"reweight these constraints during inversion using a spec- tral objective derived from a local linearization of the full 3 Information-Regularized Constrained Inversion for Stable Avatar Editing from Sparse Supervision decoding-and-rendering pipeline. 3.1 Differentiable Avatar Rendering Pipeline We assume a differentiable, animatable rendering pipeline yt =f(v, θ t)∈R m,(1) where v∈R r is a globalediting codeshared across frames, θt denotes the frame-specific driving state (pose parame- ters, cam","claim_type":"background","confidence":0.6,"evidence_strength":"citation_context"},{"claim_text":"where the squares are applied element-wise. • Low-rank term.Let θi = 1 i Pi j=1 θj be the running mean afteri snapshots, and define deviation columnsdi =θ i−θi. To limit the rank, SW AG retains only the lastK such columns in a matrix D∈R d×K, giving the low-rank covariance Σlr = 1 K−1 DD⊺.(27) The resulting SW AG posterior approximation is qSW AG(θ) =N \u0010 θSW A, 1 2(Σdiag +Σ lr) \u0011 .(28) Givenz 1 ∼ N(0, I d)andz 2 ∼ N(0, I K), SW AG draws samples via eθ=θ SW A+ 1√ 2 Σ1/2 diagz1 + 1p 2(K−1) Dz2.(29","claim_type":"background","confidence":0.5,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks write newline because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (5 contexts).","role_counts":[{"n":5,"context_role":"background"},{"n":2,"context_role":"other"}]},"graph":{"co_cited":[{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":8},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":7},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":7},{"title":"@esa (Ref","work_id":"b058608d-98d0-4821-a4ae-403d2b7cd411","shared_citers":6},{"title":"Proximal Policy Optimization Algorithms","work_id":"240c67fe-d14d-4520-91c1-38a4e272ca19","shared_citers":6},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":6},{"title":null,"work_id":"ea79bfb8-d434-45e9-8607-416d3839ec5c","shared_citers":6},{"title":"Deep residual learning for image recognition","work_id":"ad888ecb-42a3-4afe-8cad-1b82c4ceba5d","shared_citers":5},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":5},{"title":"write newline","work_id":"8e5fda61-e601-4df4-8204-015bee341570","shared_citers":5},{"title":"DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models","work_id":"c5006563-f3ec-438a-9e35-b7b484f34828","shared_citers":4},{"title":"DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning","work_id":"e6b75ad5-2877-4168-97c8-710407094d20","shared_citers":4},{"title":"N., Kaiser, ., and Polosukhin, I","work_id":"1fd3dff0-bb47-462f-9c43-de65746ef810","shared_citers":4},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":3},{"title":"gpt-oss-120b & gpt-oss-20b Model Card","work_id":"178c1f7e-4f19-4392-a45d-45a6dfa88ead","shared_citers":3},{"title":"Gradient-based learning applied to document recognition","work_id":"ad843adc-13d5-44a0-b124-d5468ffec663","shared_citers":3},{"title":"Learning multiple layers of features from tiny images","work_id":"044e652b-5701-4a1e-bda0-ccb538cb31de","shared_citers":3},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":3},{"title":"Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism","work_id":"c888e6d1-0b1d-43d6-9ef5-f0912a0efa1b","shared_citers":3},{"title":"Program Synthesis with Large Language Models","work_id":"fd241a05-03b9-4de2-9588-9d77ce176125","shared_citers":3},{"title":"Qwen2.5 Technical Report","work_id":"d8432992-4980-4a81-85c7-9fa2c2b87f85","shared_citers":3},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":3},{"title":"Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters","work_id":"a8d50b24-bdf5-46ed-bc4f-2927dfd81f1d","shared_citers":3},{"title":"Training Compute-Optimal Large Language Models","work_id":"b2faf28d-86b7-429c-bc42-469458efc246","shared_citers":3}],"time_series":[{"n":4,"year":2023},{"n":9,"year":2024},{"n":23,"year":2025},{"n":35,"year":2026}],"dependency_candidates":[]},"authors":[{"id":"ac6873e2-34b6-473a-8aae-6009abdcfe0e","orcid":null,"display_name":"\" write newline \"\" before","source":"manual","import_confidence":0.72}]}}