{"work":{"id":"5fa65a50-c437-40cd-bd09-3ae031477803","openalex_id":"https://openalex.org/W2117539524","doi":"10.1007/s11263-015-0816-y","arxiv_id":null,"raw_key":null,"title":"Bernstein, Alexander C","authors":[{"given":"Olga","family":"Russakovsky","sequence":"first","affiliation":[]},{"given":"Jia","family":"Deng","sequence":"additional","affiliation":[]},{"given":"Hao","family":"Su","sequence":"additional","affiliation":[]},{"given":"Jonathan","family":"Krause","sequence":"additional","affiliation":[]},{"given":"Sanjeev","family":"Satheesh","sequence":"additional","affiliation":[]},{"given":"Sean","family":"Ma","sequence":"additional","affiliation":[]},{"given":"Zhiheng","family":"Huang","sequence":"additional","affiliation":[]},{"given":"Andrej","family":"Karpathy","sequence":"additional","affiliation":[]},{"given":"Aditya","family":"Khosla","sequence":"additional","affiliation":[]},{"given":"Michael","family":"Bernstein","sequence":"additional","affiliation":[]},{"given":"Alexander C.","family":"Berg","sequence":"additional","affiliation":[]},{"given":"Li","family":"Fei-Fei","sequence":"additional","affiliation":[]}],"authors_text":"Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S","year":2015,"venue":"International Journal of Computer Vision","abstract":null,"external_url":"https://doi.org/10.1007/s11263-015-0816-y","cited_by_count":30004,"metadata_source":"doi_reference","metadata_fetched_at":"2026-06-29T12:53:26.234346+00:00","pith_arxiv_id":null,"created_at":"2026-05-08T17:08:34.488066+00:00","updated_at":"2026-06-29T12:53:26.234346+00:00","title_quality_ok":false,"display_title":"ImageNet Large Scale Visual Recognition Challenge","render_title":"ImageNet Large Scale Visual Recognition Challenge"},"hub":{"state":{"work_id":"5fa65a50-c437-40cd-bd09-3ae031477803","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":50,"external_cited_by_count":30004,"distinct_field_count":9,"first_pith_cited_at":"2016-11-10T22:02:36+00:00","last_pith_cited_at":"2026-06-25T08:33:34+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-29T14:58:58.795837+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"background","n":2},{"context_role":"dataset","n":2}],"polarity_counts":[{"context_polarity":"background","n":2},{"context_polarity":"use_dataset","n":2}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"Berg, and Li Fei-Fei","claims":[{"claim_text":"T able 4Common datasets used in CDOD benchmarks, summarizing modality, scale, annotation volume, typical role, and dominant shift type. Acronyms: S = Source, T = Target. Symbol:∼ indicates approximate counts. Dataset Y ear Modality #Images #Cls #Anno Role Domain Shift PASCAL VOC [95] 2007-2012 RGB∼16.5K∼20∼40K S/T mild scene shift MS COCO [96] 2014 RGB∼330K∼80∼2.5M S scene diversity ImageNet DET [97] 2013 RGB∼450K∼200∼500K S fine-grained cate- gory Cityscapes [98] 2016 RGB∼3.0K∼8∼65K T urban sce","claim_type":"dataset","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Finally, ifg 1 andg 2 both do not depend on the second argument, (3) is a linear parabolic SPDE with additive noise: dUt =α 1(t)∆Ut dt+α 2(t) dWt for allt∈I.(20) I Numerical simulation For the numerical simulation of the forward and backward processes, (3) and (1), we modeled the image space Λ as Λ = (0, d1)×(0, d 2)and decomposed the boundary∂Λaccording to ∂LΛ :={0} ×[0, d 2);(21) ∂T Λ := [0, d1)× {d 2};(22) ∂RΛ :={d 1} ×(0, d 2];(23) ∂BΛ := (0, d1]× {0}(24) into its left, top, right and bottom","claim_type":"dataset","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"ImageNet Large Scale Visual Recognition Challenge.International Journal of Computer Vision (IJCV)115, 3 (2015), 211-252. doi:10.1007/s11263-015-0816-y [41] Shuran Song, Samuel P Lichtenberg, and Jianxiong Xiao. 2015. Sun rgb-d: A rgb-d scene understanding benchmark suite. InProceedings of the IEEE conference on computer vision and pattern recognition. 567-576. [42] Alex Tamkin, Mike Wu, and Noah D. Goodman. 2020. Viewmaker Networks: Learning Views for Unsupervised Representation Learning.ArXivab","claim_type":"background","confidence":0.8,"evidence_strength":"citation_context"},{"claim_text":"1 Introduction In recent years, the emergence and evolution of auto-regressive models [18, 44, 66] and diffusion models [32, 61, 16, 50, 58, 55, 56] have led to AI-generated content (AIGC) becoming increasingly realistic and widely applied across industries, bringing convenience to fields such as entertainment [51, 2, 63], advertising [ 39, 17], and medicine [ 60, 83]. This progress is particularly evident in AI- synthesized images, which have seen gradual improvements in resolution and semantic","claim_type":"background","confidence":0.75,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Berg, and Li Fei-Fei because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (2 contexts).","role_counts":[{"n":2,"context_role":"background"},{"n":2,"context_role":"dataset"}]},"error":null,"updated_at":"2026-05-20T19:42:28.246770+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"2068d8ae-d5b3-403b-b007-948fb56b81a8","orcid":null,"display_name":"Olga Russakovsky"},{"id":"4ad8dd1a-4071-42bd-bed7-9b5ae70a9ad7","orcid":null,"display_name":"Jia Deng"},{"id":"142360c7-59d4-4b6d-bb98-c28d9dfe4f17","orcid":null,"display_name":"Hao Su"},{"id":"d1b82ee0-798e-406e-b9d4-8a9d7381f94c","orcid":null,"display_name":"Jonathan Krause"},{"id":"4ad7bf42-cf25-4a80-8b67-2a6386aa35b2","orcid":null,"display_name":"Sanjeev Satheesh"},{"id":"faac52d9-d6e0-49f6-a178-969a152727dd","orcid":null,"display_name":"Sean Ma"},{"id":"c29edb90-ac01-494f-936a-f184c2ca85c9","orcid":null,"display_name":"Zhiheng Huang"},{"id":"f94ba83a-cece-440f-accc-0a98ec296ada","orcid":null,"display_name":"Andrej Karpathy"},{"id":"aac3b460-b492-4d43-9709-73ce9a88f62e","orcid":null,"display_name":"Aditya Khosla"},{"id":"4d54c265-d361-49f7-9e02-57781b4bfebe","orcid":null,"display_name":"Michael Bernstein"},{"id":"9e6fde49-87d1-41d4-a901-d7d70f607be0","orcid":null,"display_name":"Alexander C. Berg"},{"id":"28c2b449-b02b-4cb0-bfdb-afe0448f6513","orcid":null,"display_name":"Li Fei-Fei"}]},"error":null,"updated_at":"2026-05-20T19:42:28.241850+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-20T19:42:24.855828+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale","work_id":"e96730e3-129b-4db6-b981-15ab7932e297","shared_citers":3},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":3},{"title":"In: 2021 IEEE/CVF In- ternational Conference on Computer Vision (ICCV)","work_id":"3820f598-11b0-45c3-8c99-0079181ac0a7","shared_citers":3},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":3},{"title":"Adabins: Depth estimation using adap- tive bins","work_id":"7083a41e-5666-435b-ab26-c753f6490b9a","shared_citers":2},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":2},{"title":"and Cogswell, Michael and Das, Abhishek and Vedantam, Ramakrishna and Parikh, Devi and Batra, Dhruv , year = 2017, month = oct, pages =","work_id":"cea663cf-1775-4bde-ae02-57f8ba3348c0","shared_citers":2},{"title":"Barron, Ben Mildenhall, Mehdi S","work_id":"0a23d1b7-bd56-43cc-8a80-7c43ce994e1e","shared_citers":2},{"title":"CoRR , booktitle =","work_id":"4a7c0052-7cf8-4055-a2b8-0853668d673b","shared_citers":2},{"title":"Deep Residual Learning for Image Recognition","work_id":"ae9e5671-23e8-4853-82a4-699b5b8dd639","shared_citers":2},{"title":"Explaining and Harnessing Adversarial Examples","work_id":"2cedf8f6-7539-4c49-8136-f42a20487146","shared_citers":2},{"title":"GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models","work_id":"34430d19-7919-48ce-88a5-17b3bfe2192e","shared_citers":2},{"title":"In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","work_id":"9da51225-b7bd-4032-b7db-ca577971dafe","shared_citers":2},{"title":"Learning Transferable Visual Models From Natural Language Supervision","work_id":"6de86bb5-27bd-4d5c-8b89-967ebfc52659","shared_citers":2},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":2},{"title":"LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop","work_id":"9bec023f-409e-4089-aa94-fc3b64943758","shared_citers":2},{"title":"Pattern Recognition 127 (2022), 108611","work_id":"238df2e4-a3e5-46f3-860e-3ae2b0094b97","shared_citers":2},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":2},{"title":"SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis","work_id":"8034c587-fba6-4941-87ba-c98f2ac962cb","shared_citers":2},{"title":"URLhttp://dx.doi.org/10.1109/CVPR.2016.90","work_id":"b353bda2-591d-479a-9c8b-22dfcba12431","shared_citers":2},{"title":"URL https://doi.org/10.1109/CVPR52733","work_id":"7efbc2dd-b0f2-4f71-bb1c-d2fcf110d805","shared_citers":2},{"title":"","work_id":"34091332-cb3e-4889-864f-83e78bc201a5","shared_citers":1},{"title":"10012–10022","work_id":"a33f7b8d-c102-4985-a0bd-1ce44d82f754","shared_citers":1},{"title":"10.1051/0004-6361/202346396","work_id":"4e231a54-4817-4bad-8825-ca036a9043b2","shared_citers":1}],"time_series":[{"n":1,"year":2016},{"n":1,"year":2017},{"n":1,"year":2022},{"n":4,"year":2025},{"n":19,"year":2026}],"dependency_candidates":[{"n":1,"role":"dataset","polarity":"use_dataset","paper_title":"Score-Based Generative Modeling through Anisotropic Stochastic Partial Differential Equations","primary_cat":"cs.CE","context_text":"Finally, ifg 1 andg 2 both do not depend on the second argument, (3) is a linear parabolic SPDE with additive noise: dUt =α 1(t)∆Ut dt+α 2(t) dWt for allt∈I.(20) I Numerical simulation For the numerical simulation of the forward and backward processes, (3) and (1), we modeled the image space Λ as Λ = (0, d1)×(0, d 2)and decomposed the boundary∂Λaccording to ∂LΛ :={0} ×[0, d 2);(21) ∂T Λ := [0, d1)× {d 2};(22) ∂RΛ :={d 1} ×(0, d 2];(23) ∂BΛ := (0, d1]× {0}(24) into its left, top, right and bottom part. We discretized the derivatives using a mixture of forward, backward and central finite differences, respecting Neumann boundary conditions. I.1 Domain discretization After discretization, we decomposed the discretized domain D={0, . . . , d 1} × {0, . . . , d2} in the same spirit into its interior,","citing_arxiv_id":"2605.08976"},{"n":1,"role":"dataset","polarity":"use_dataset","paper_title":"Generalization Under Scrutiny: Cross-Domain Detection Progresses, Pitfalls, and Persistent Challenges","primary_cat":"cs.CV","context_text":"T able 4Common datasets used in CDOD benchmarks, summarizing modality, scale, annotation volume, typical role, and dominant shift type. Acronyms: S = Source, T = Target. Symbol:∼ indicates approximate counts. Dataset Y ear Modality #Images #Cls #Anno Role Domain Shift PASCAL VOC [95] 2007-2012 RGB∼16.5K∼20∼40K S/T mild scene shift MS COCO [96] 2014 RGB∼330K∼80∼2.5M S scene diversity ImageNet DET [97] 2013 RGB∼450K∼200∼500K S fine-grained cate- gory Cityscapes [98] 2016 RGB∼3.0K∼8∼65K T urban scene shift Foggy Cityscapes [99] 2018 RGB∼3.0K∼8∼65K T weather (clear→fog) SIM10K [100] 2018 RGB (Synthetic)∼10K∼1∼58K S synth→real GTA5 [101] 2016 RGB (Synthetic)∼25K∼9∼300K S synth→real SYNTHIA [102] 2016 RGB (Synthetic)∼9.4K∼9∼200K S synth→real BDD100K [103] 2020 RGB / Video∼100K∼10∼1.","citing_arxiv_id":"2604.08230"}]},"error":null,"updated_at":"2026-05-20T19:42:28.269878+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-20T19:42:22.109518+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"Berg, and Li Fei-Fei","claims":[{"claim_text":"T able 4Common datasets used in CDOD benchmarks, summarizing modality, scale, annotation volume, typical role, and dominant shift type. Acronyms: S = Source, T = Target. Symbol:∼ indicates approximate counts. Dataset Y ear Modality #Images #Cls #Anno Role Domain Shift PASCAL VOC [95] 2007-2012 RGB∼16.5K∼20∼40K S/T mild scene shift MS COCO [96] 2014 RGB∼330K∼80∼2.5M S scene diversity ImageNet DET [97] 2013 RGB∼450K∼200∼500K S fine-grained cate- gory Cityscapes [98] 2016 RGB∼3.0K∼8∼65K T urban sce","claim_type":"dataset","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Finally, ifg 1 andg 2 both do not depend on the second argument, (3) is a linear parabolic SPDE with additive noise: dUt =α 1(t)∆Ut dt+α 2(t) dWt for allt∈I.(20) I Numerical simulation For the numerical simulation of the forward and backward processes, (3) and (1), we modeled the image space Λ as Λ = (0, d1)×(0, d 2)and decomposed the boundary∂Λaccording to ∂LΛ :={0} ×[0, d 2);(21) ∂T Λ := [0, d1)× {d 2};(22) ∂RΛ :={d 1} ×(0, d 2];(23) ∂BΛ := (0, d1]× {0}(24) into its left, top, right and bottom","claim_type":"dataset","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"ImageNet Large Scale Visual Recognition Challenge.International Journal of Computer Vision (IJCV)115, 3 (2015), 211-252. doi:10.1007/s11263-015-0816-y [41] Shuran Song, Samuel P Lichtenberg, and Jianxiong Xiao. 2015. Sun rgb-d: A rgb-d scene understanding benchmark suite. InProceedings of the IEEE conference on computer vision and pattern recognition. 567-576. [42] Alex Tamkin, Mike Wu, and Noah D. Goodman. 2020. Viewmaker Networks: Learning Views for Unsupervised Representation Learning.ArXivab","claim_type":"background","confidence":0.8,"evidence_strength":"citation_context"},{"claim_text":"1 Introduction In recent years, the emergence and evolution of auto-regressive models [18, 44, 66] and diffusion models [32, 61, 16, 50, 58, 55, 56] have led to AI-generated content (AIGC) becoming increasingly realistic and widely applied across industries, bringing convenience to fields such as entertainment [51, 2, 63], advertising [ 39, 17], and medicine [ 60, 83]. This progress is particularly evident in AI- synthesized images, which have seen gradual improvements in resolution and semantic","claim_type":"background","confidence":0.75,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Berg, and Li Fei-Fei because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (2 contexts).","role_counts":[{"n":2,"context_role":"background"},{"n":2,"context_role":"dataset"}]},"error":null,"updated_at":"2026-05-20T19:42:24.859014+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"Berg, and Li Fei-Fei","claims":[{"claim_text":"T able 4Common datasets used in CDOD benchmarks, summarizing modality, scale, annotation volume, typical role, and dominant shift type. Acronyms: S = Source, T = Target. Symbol:∼ indicates approximate counts. Dataset Y ear Modality #Images #Cls #Anno Role Domain Shift PASCAL VOC [95] 2007-2012 RGB∼16.5K∼20∼40K S/T mild scene shift MS COCO [96] 2014 RGB∼330K∼80∼2.5M S scene diversity ImageNet DET [97] 2013 RGB∼450K∼200∼500K S fine-grained cate- gory Cityscapes [98] 2016 RGB∼3.0K∼8∼65K T urban sce","claim_type":"dataset","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Finally, ifg 1 andg 2 both do not depend on the second argument, (3) is a linear parabolic SPDE with additive noise: dUt =α 1(t)∆Ut dt+α 2(t) dWt for allt∈I.(20) I Numerical simulation For the numerical simulation of the forward and backward processes, (3) and (1), we modeled the image space Λ as Λ = (0, d1)×(0, d 2)and decomposed the boundary∂Λaccording to ∂LΛ :={0} ×[0, d 2);(21) ∂T Λ := [0, d1)× {d 2};(22) ∂RΛ :={d 1} ×(0, d 2];(23) ∂BΛ := (0, d1]× {0}(24) into its left, top, right and bottom","claim_type":"dataset","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"ImageNet Large Scale Visual Recognition Challenge.International Journal of Computer Vision (IJCV)115, 3 (2015), 211-252. doi:10.1007/s11263-015-0816-y [41] Shuran Song, Samuel P Lichtenberg, and Jianxiong Xiao. 2015. Sun rgb-d: A rgb-d scene understanding benchmark suite. InProceedings of the IEEE conference on computer vision and pattern recognition. 567-576. [42] Alex Tamkin, Mike Wu, and Noah D. Goodman. 2020. Viewmaker Networks: Learning Views for Unsupervised Representation Learning.ArXivab","claim_type":"background","confidence":0.8,"evidence_strength":"citation_context"},{"claim_text":"1 Introduction In recent years, the emergence and evolution of auto-regressive models [18, 44, 66] and diffusion models [32, 61, 16, 50, 58, 55, 56] have led to AI-generated content (AIGC) becoming increasingly realistic and widely applied across industries, bringing convenience to fields such as entertainment [51, 2, 63], advertising [ 39, 17], and medicine [ 60, 83]. This progress is particularly evident in AI- synthesized images, which have seen gradual improvements in resolution and semantic","claim_type":"background","confidence":0.75,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Berg, and Li Fei-Fei because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (2 contexts).","role_counts":[{"n":2,"context_role":"background"},{"n":2,"context_role":"dataset"}]},"error":null,"updated_at":"2026-05-20T19:42:28.244423+00:00"}},"summary":{"title":"Berg, and Li Fei-Fei","claims":[{"claim_text":"T able 4Common datasets used in CDOD benchmarks, summarizing modality, scale, annotation volume, typical role, and dominant shift type. Acronyms: S = Source, T = Target. Symbol:∼ indicates approximate counts. Dataset Y ear Modality #Images #Cls #Anno Role Domain Shift PASCAL VOC [95] 2007-2012 RGB∼16.5K∼20∼40K S/T mild scene shift MS COCO [96] 2014 RGB∼330K∼80∼2.5M S scene diversity ImageNet DET [97] 2013 RGB∼450K∼200∼500K S fine-grained cate- gory Cityscapes [98] 2016 RGB∼3.0K∼8∼65K T urban sce","claim_type":"dataset","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"Finally, ifg 1 andg 2 both do not depend on the second argument, (3) is a linear parabolic SPDE with additive noise: dUt =α 1(t)∆Ut dt+α 2(t) dWt for allt∈I.(20) I Numerical simulation For the numerical simulation of the forward and backward processes, (3) and (1), we modeled the image space Λ as Λ = (0, d1)×(0, d 2)and decomposed the boundary∂Λaccording to ∂LΛ :={0} ×[0, d 2);(21) ∂T Λ := [0, d1)× {d 2};(22) ∂RΛ :={d 1} ×(0, d 2];(23) ∂BΛ := (0, d1]× {0}(24) into its left, top, right and bottom","claim_type":"dataset","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"ImageNet Large Scale Visual Recognition Challenge.International Journal of Computer Vision (IJCV)115, 3 (2015), 211-252. doi:10.1007/s11263-015-0816-y [41] Shuran Song, Samuel P Lichtenberg, and Jianxiong Xiao. 2015. Sun rgb-d: A rgb-d scene understanding benchmark suite. InProceedings of the IEEE conference on computer vision and pattern recognition. 567-576. [42] Alex Tamkin, Mike Wu, and Noah D. Goodman. 2020. Viewmaker Networks: Learning Views for Unsupervised Representation Learning.ArXivab","claim_type":"background","confidence":0.8,"evidence_strength":"citation_context"},{"claim_text":"1 Introduction In recent years, the emergence and evolution of auto-regressive models [18, 44, 66] and diffusion models [32, 61, 16, 50, 58, 55, 56] have led to AI-generated content (AIGC) becoming increasingly realistic and widely applied across industries, bringing convenience to fields such as entertainment [51, 2, 63], advertising [ 39, 17], and medicine [ 60, 83]. This progress is particularly evident in AI- synthesized images, which have seen gradual improvements in resolution and semantic","claim_type":"background","confidence":0.75,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks Berg, and Li Fei-Fei because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (2 contexts).","role_counts":[{"n":2,"context_role":"background"},{"n":2,"context_role":"dataset"}]},"graph":{"co_cited":[{"title":"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale","work_id":"e96730e3-129b-4db6-b981-15ab7932e297","shared_citers":3},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":3},{"title":"In: 2021 IEEE/CVF In- ternational Conference on Computer Vision (ICCV)","work_id":"3820f598-11b0-45c3-8c99-0079181ac0a7","shared_citers":3},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":3},{"title":"Adabins: Depth estimation using adap- tive bins","work_id":"7083a41e-5666-435b-ab26-c753f6490b9a","shared_citers":2},{"title":"Adam: A Method for Stochastic Optimization","work_id":"1910796d-9b52-4683-bf5c-de9632c1028b","shared_citers":2},{"title":"and Cogswell, Michael and Das, Abhishek and Vedantam, Ramakrishna and Parikh, Devi and Batra, Dhruv , year = 2017, month = oct, pages =","work_id":"cea663cf-1775-4bde-ae02-57f8ba3348c0","shared_citers":2},{"title":"Barron, Ben Mildenhall, Mehdi S","work_id":"0a23d1b7-bd56-43cc-8a80-7c43ce994e1e","shared_citers":2},{"title":"CoRR , booktitle =","work_id":"4a7c0052-7cf8-4055-a2b8-0853668d673b","shared_citers":2},{"title":"Deep Residual Learning for Image Recognition","work_id":"ae9e5671-23e8-4853-82a4-699b5b8dd639","shared_citers":2},{"title":"Explaining and Harnessing Adversarial Examples","work_id":"2cedf8f6-7539-4c49-8136-f42a20487146","shared_citers":2},{"title":"GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models","work_id":"34430d19-7919-48ce-88a5-17b3bfe2192e","shared_citers":2},{"title":"In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","work_id":"9da51225-b7bd-4032-b7db-ca577971dafe","shared_citers":2},{"title":"Learning Transferable Visual Models From Natural Language Supervision","work_id":"6de86bb5-27bd-4d5c-8b89-967ebfc52659","shared_citers":2},{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":2},{"title":"LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop","work_id":"9bec023f-409e-4089-aa94-fc3b64943758","shared_citers":2},{"title":"Pattern Recognition 127 (2022), 108611","work_id":"238df2e4-a3e5-46f3-860e-3ae2b0094b97","shared_citers":2},{"title":"Scaling Laws for Neural Language Models","work_id":"b7dd8749-9c45-4977-ab9b-64478dce1ae8","shared_citers":2},{"title":"SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis","work_id":"8034c587-fba6-4941-87ba-c98f2ac962cb","shared_citers":2},{"title":"URLhttp://dx.doi.org/10.1109/CVPR.2016.90","work_id":"b353bda2-591d-479a-9c8b-22dfcba12431","shared_citers":2},{"title":"URL https://doi.org/10.1109/CVPR52733","work_id":"7efbc2dd-b0f2-4f71-bb1c-d2fcf110d805","shared_citers":2},{"title":"","work_id":"34091332-cb3e-4889-864f-83e78bc201a5","shared_citers":1},{"title":"10012–10022","work_id":"a33f7b8d-c102-4985-a0bd-1ce44d82f754","shared_citers":1},{"title":"10.1051/0004-6361/202346396","work_id":"4e231a54-4817-4bad-8825-ca036a9043b2","shared_citers":1}],"time_series":[{"n":1,"year":2016},{"n":1,"year":2017},{"n":1,"year":2022},{"n":4,"year":2025},{"n":19,"year":2026}],"dependency_candidates":[{"n":1,"role":"dataset","polarity":"use_dataset","paper_title":"Score-Based Generative Modeling through Anisotropic Stochastic Partial Differential Equations","primary_cat":"cs.CE","context_text":"Finally, ifg 1 andg 2 both do not depend on the second argument, (3) is a linear parabolic SPDE with additive noise: dUt =α 1(t)∆Ut dt+α 2(t) dWt for allt∈I.(20) I Numerical simulation For the numerical simulation of the forward and backward processes, (3) and (1), we modeled the image space Λ as Λ = (0, d1)×(0, d 2)and decomposed the boundary∂Λaccording to ∂LΛ :={0} ×[0, d 2);(21) ∂T Λ := [0, d1)× {d 2};(22) ∂RΛ :={d 1} ×(0, d 2];(23) ∂BΛ := (0, d1]× {0}(24) into its left, top, right and bottom part. We discretized the derivatives using a mixture of forward, backward and central finite differences, respecting Neumann boundary conditions. I.1 Domain discretization After discretization, we decomposed the discretized domain D={0, . . . , d 1} × {0, . . . , d2} in the same spirit into its interior,","citing_arxiv_id":"2605.08976"},{"n":1,"role":"dataset","polarity":"use_dataset","paper_title":"Generalization Under Scrutiny: Cross-Domain Detection Progresses, Pitfalls, and Persistent Challenges","primary_cat":"cs.CV","context_text":"T able 4Common datasets used in CDOD benchmarks, summarizing modality, scale, annotation volume, typical role, and dominant shift type. Acronyms: S = Source, T = Target. Symbol:∼ indicates approximate counts. Dataset Y ear Modality #Images #Cls #Anno Role Domain Shift PASCAL VOC [95] 2007-2012 RGB∼16.5K∼20∼40K S/T mild scene shift MS COCO [96] 2014 RGB∼330K∼80∼2.5M S scene diversity ImageNet DET [97] 2013 RGB∼450K∼200∼500K S fine-grained cate- gory Cityscapes [98] 2016 RGB∼3.0K∼8∼65K T urban scene shift Foggy Cityscapes [99] 2018 RGB∼3.0K∼8∼65K T weather (clear→fog) SIM10K [100] 2018 RGB (Synthetic)∼10K∼1∼58K S synth→real GTA5 [101] 2016 RGB (Synthetic)∼25K∼9∼300K S synth→real SYNTHIA [102] 2016 RGB (Synthetic)∼9.4K∼9∼200K S synth→real BDD100K [103] 2020 RGB / Video∼100K∼10∼1.","citing_arxiv_id":"2604.08230"}]},"authors":[{"id":"aac3b460-b492-4d43-9709-73ce9a88f62e","orcid":null,"display_name":"Aditya Khosla","source":"manual","import_confidence":0.72},{"id":"9e6fde49-87d1-41d4-a901-d7d70f607be0","orcid":null,"display_name":"Alexander C. Berg","source":"manual","import_confidence":0.72},{"id":"f94ba83a-cece-440f-accc-0a98ec296ada","orcid":null,"display_name":"Andrej Karpathy","source":"manual","import_confidence":0.72},{"id":"142360c7-59d4-4b6d-bb98-c28d9dfe4f17","orcid":null,"display_name":"Hao Su","source":"manual","import_confidence":0.72},{"id":"4ad8dd1a-4071-42bd-bed7-9b5ae70a9ad7","orcid":null,"display_name":"Jia Deng","source":"manual","import_confidence":0.72},{"id":"d1b82ee0-798e-406e-b9d4-8a9d7381f94c","orcid":null,"display_name":"Jonathan Krause","source":"manual","import_confidence":0.72},{"id":"28c2b449-b02b-4cb0-bfdb-afe0448f6513","orcid":null,"display_name":"Li Fei-Fei","source":"manual","import_confidence":0.72},{"id":"4d54c265-d361-49f7-9e02-57781b4bfebe","orcid":null,"display_name":"Michael Bernstein","source":"manual","import_confidence":0.72},{"id":"2068d8ae-d5b3-403b-b007-948fb56b81a8","orcid":null,"display_name":"Olga Russakovsky","source":"manual","import_confidence":0.72},{"id":"4ad7bf42-cf25-4a80-8b67-2a6386aa35b2","orcid":null,"display_name":"Sanjeev Satheesh","source":"manual","import_confidence":0.72},{"id":"faac52d9-d6e0-49f6-a178-969a152727dd","orcid":null,"display_name":"Sean Ma","source":"manual","import_confidence":0.72},{"id":"c29edb90-ac01-494f-936a-f184c2ca85c9","orcid":null,"display_name":"Zhiheng Huang","source":"manual","import_confidence":0.72}]}}