Pith Number

pith:3BCWRAFN

pith:2026:3BCWRAFNCLVNTQV55HKVPBOZPC
not attested · not anchored · not stored · refs resolved

VGGT-Occ: Geometry-Grounded and Density-Aware Gated Fusion for 3D Occupancy Prediction

Danwei Wang, Fangjinhua Wang, Hesheng Wang, Hongming Shen, Junyi Ma, Rui Wang, Tianchen Deng, Xun Chen

Embedding camera geometry into every attention and fusion step produces more accurate 3D semantic occupancy from multi-view images.

arxiv:2605.16911 v1 · 2026-05-16 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{3BCWRAFNCLVNTQV55HKVPBOZPC}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

VGGT-Occ embeds geometric tokens throughout the pipeline using Projection-Aware Deformable Attention (PA-DA), which projects 3D offsets onto the image planes and uses the projection Jacobian as an additive bias, together with a view-quality semantic gate and sequential coarse-to-fine gated fusion. With this design it achieves 33.00% IoU and 21.08% mIoU on SurroundOcc-nuScenes, outperforming prior methods with only ~41M trainable parameters in the occupancy head.

C2 · weakest assumption

That projecting 3D offsets back to image planes and adding the projection Jacobian as a bias term will reliably suppress unreliable observations, and that the view-quality semantic gate will correctly enforce cross-view consistency without introducing new errors in feature integration.

C3 · one line summary

VGGT-Occ embeds geometric tokens via PA-DA and uses sequential coarse-to-fine gated fusion to reach 33.00% IoU and 21.08% mIoU on SurroundOcc-nuScenes while using only ~41M parameters in the occupancy head.
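
To make the "projects 3D offsets" and "projection Jacobian" wording in the claims concrete, here is a minimal sketch in plain Python. It assumes an ideal pinhole camera with intrinsics fx, fy, cx, cy and points already expressed in the camera frame. The function names and numbers are illustrative only and are not taken from the paper. How VGGT-Occ turns the Jacobian into an attention bias is not described in this record, so that step is left out.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def project(p: Vec3, fx: float, fy: float, cx: float, cy: float) -> Vec2:
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    x, y, z = p
    if z <= 0.0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def projection_jacobian(p: Vec3, fx: float, fy: float):
    """2x3 Jacobian d(u, v) / d(x, y, z) of the pinhole projection at p."""
    x, y, z = p
    if z <= 0.0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (
        (fx / z, 0.0, -fx * x / (z * z)),
        (0.0, fy / z, -fy * y / (z * z)),
    )


def project_offset(p: Vec3, d: Vec3, fx: float, fy: float) -> Vec2:
    """First-order image-plane displacement caused by a small 3D offset d at p."""
    jac = projection_jacobian(p, fx, fy)
    du = sum(jac[0][c] * d[c] for c in range(3))
    dv = sum(jac[1][c] * d[c] for c in range(3))
    return (du, dv)


if __name__ == "__main__":
    # Hypothetical intrinsics and point, chosen only for illustration.
    point = (1.0, 0.5, 10.0)
    print(project(point, 1000.0, 1000.0, 800.0, 450.0))
    print(project_offset(point, (0.1, 0.0, 0.0), 1000.0, 1000.0))
```

The Jacobian entries shrink as depth grows, so the same 3D offset moves fewer pixels for distant points. This is one plausible reason a Jacobian-derived term could signal how reliable an observation is, though the record does not spell out the mechanism.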

References

50 extracted · 50 resolved · 2 Pith anchors

[1] Maxim Berman, Amal Rannen Triki, and Matthew B. Blaschko. The Lovász-softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks. In CVPR, 2018
[2] Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom 2020
[3] MonoScene: Monocular 3D semantic scene completion 2022
[4] GaussRender: Learning 3D occupancy with Gaussian rendering 2025
[5] Compact 3D Gaussian splatting for dense visual SLAM 2024 · arXiv:2403.11247

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-20T00:03:29.789963Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

d8456880ad12ead9c2bde9d55785d97885049292fb006e9fe678c8865d21f5db

Aliases

arxiv: 2605.16911 · arxiv_version: 2605.16911v1 · doi: 10.48550/arxiv.2605.16911 · pith_short_12: 3BCWRAFNCLVN · pith_short_16: 3BCWRAFNCLVNTQV5 · pith_short_8: 3BCWRAFN
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/3BCWRAFNCLVNTQV55HKVPBOZPC \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d8456880ad12ead9c2bde9d55785d97885049292fb006e9fe678c8865d21f5db
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1353cf93b475b7f2ca6d0e31faac0017f55055833e5f8becd4e7bc431ccea2fa",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-16T09:51:04Z",
    "title_canon_sha256": "77eeea72c02c88cbd3d75aff2d1ddc552ed1cf35d00707d1dd97b03ccaff796a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.16911",
    "kind": "arxiv",
    "version": 1
  }
}
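
The verification one-liner above can also be written as a small standalone script, which may be easier to read and to run offline. This is a sketch that assumes the canonical record is exactly the JSON object printed above and that the hash is SHA-256 over the compact, key-sorted UTF-8 serialization, as the one-liner suggests. The names `canonical_sha256`, `CANONICAL_RECORD`, and `EXPECTED_HASH` are illustrative and not part of any Pith API.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record the same way the shell one-liner does:
    sorted keys, no extra whitespace, non-ASCII kept as-is, UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Copied from the "Canonical record JSON" section of this page.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "1353cf93b475b7f2ca6d0e31faac0017f55055833e5f8becd4e7bc431ccea2fa",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-16T09:51:04Z",
        "title_canon_sha256": "77eeea72c02c88cbd3d75aff2d1ddc552ed1cf35d00707d1dd97b03ccaff796a",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.16911", "kind": "arxiv", "version": 1},
}

# Canonical hash as listed on this page.
EXPECTED_HASH = "d8456880ad12ead9c2bde9d55785d97885049292fb006e9fe678c8865d21f5db"

if __name__ == "__main__":
    digest = canonical_sha256(CANONICAL_RECORD)
    print(digest)
    print("match" if digest == EXPECTED_HASH else "mismatch")
```

Because keys are sorted before hashing, the order in which the record's fields appear does not affect the digest. A mismatch would suggest the record was altered or serialized under different rules than assumed here.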