{"paper":{"title":"You Only Landmark Once: Lightweight U-Net Face Super Resolution with YOLO-World Landmark Heatmaps","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A lightweight U-Net reconstructs 128x128 faces from 16x16 inputs by weighting its loss with YOLO-World landmark heatmaps.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Anna Briotto, Endi Hysa, Lamberto Ballan, Marco Fiorucci, Riccardo Carraro","submitted_at":"2026-05-13T22:41:23Z","abstract_excerpt":"Face image super-resolution aims to recover high-resolution facial images from severely degraded inputs. Under extreme upscaling factors, fine facial details are often lost, making accurate reconstruction challenging. Existing methods typically rely on heavy network architectures, adversarial training schemes, or separate alignment networks, increasing model complexity and computational cost. To address these issues, we propose a lightweight U-Net-based architecture designed to reconstruct $128{ \\times }128$ facial images from severely degraded $16{ \\times }16$ inputs, achieving an $8 \\times "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Experiments on the aligned CelebA dataset demonstrate that the proposed loss consistently improves quantitative metrics and produces sharper, more realistic reconstructions.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"YOLO-World heatmaps generated directly from severely degraded 16x16 inputs remain accurate enough to serve as reliable spatial weights for the reconstruction loss without introducing misalignment artifacts.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Lightweight U-Net for 8x face super-resolution uses YOLO-World landmark heatmaps as spatial loss weights to improve reconstruction on CelebA without extra 
networks or adversarial training.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A lightweight U-Net reconstructs 128x128 faces from 16x16 inputs by weighting its loss with YOLO-World landmark heatmaps.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"6f107e9f5e6b093542fca5122d37bcb493a49c0ad750f4781b260f67480c3812"},"source":{"id":"2605.14166","kind":"arxiv","version":1},"verdict":{"id":"09a799e1-7ed0-4ecf-9817-68686deac460","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T04:48:42.556341Z","strongest_claim":"Experiments on the aligned CelebA dataset demonstrate that the proposed loss consistently improves quantitative metrics and produces sharper, more realistic reconstructions.","one_line_summary":"Lightweight U-Net for 8x face super-resolution uses YOLO-World landmark heatmaps as spatial loss weights to improve reconstruction on CelebA without extra networks or adversarial training.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"YOLO-World heatmaps generated directly from severely degraded 16x16 inputs remain accurate enough to serve as reliable spatial weights for the reconstruction loss without introducing misalignment artifacts.","pith_extraction_headline":"A lightweight U-Net reconstructs 128x128 faces from 16x16 inputs by weighting its loss with YOLO-World landmark heatmaps."},"references":{"count":32,"sample":[{"doi":"","year":2003,"title":"Super-resolution image reconstruction: a technical overview,","work_id":"f8de6d01-f725-4ef4-ba9c-ff16e3fa6f8f","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2019,"title":"Deep learning for single image super-resolution: A brief 
review,","work_id":"6eb6e8aa-fca7-493d-a0ac-1767b9686881","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"Photo-realistic single image super-resolution using a generative adversarial network,","work_id":"4e6adeb9-fbc1-4c56-b34e-fd49149670f1","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"SRFlow: Learning the super-resolution space with normalizing flow,","work_id":"77057d8f-6d0d-42b7-bfdc-123665b57fad","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"ESRGAN: Enhanced super-resolution generative adversarial networks,","work_id":"d2e8e7ac-31aa-4f46-838c-015584744e62","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":32,"snapshot_sha256":"22aba0918074735a94aaae21ccb21f0aaad42f53bfd6a96fb75c308428136c45","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}