TCG-AR: Real-Time Multi-View Augmented Reality for Trading Card Game Streaming

Anthony Cioppa; Antoine Verdonck; Marc Van Droogenbroeck; Maxim Henry; Raphaël La Rocca

arxiv: 2607.02090 · v1 · pith:GYHKFRXX · submitted 2026-07-02 · 💻 cs.CV

Pith reviewed 2026-07-03 15:44 UTC · model grok-4.3

classification 💻 cs.CV

keywords augmented reality, trading card games, real-time detection, synthetic data, multi-view cameras, object identification, game streaming, computer vision

The pith

TCG-AR detects, orients, and identifies cards from ordinary RGB cameras then renders virtual models onto them in real time across multiple views.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a pipeline that processes multi-view RGB footage to locate cards on the table, determine their orientation, recognize which card is which, and overlay virtual three-dimensional content directly on each card. Training relies on an automatic generator that creates labeled synthetic images from a flat reference set of card pictures, eliminating the need for hand-annotated real photographs. The same pipeline can also assemble a single summarized broadcast view that shows the current game state for remote spectators and feeds the result into standard streaming tools. Because the method uses only commodity cameras and consumer hardware, it removes the requirement for instrumented play surfaces or embedded electronics. A sympathetic reader would care because the approach could turn existing top-down streams into visually richer experiences without added cost or setup complexity.

Core claim

TCG-AR is a real-time pipeline that augments trading card games using ordinary RGB cameras alone, without physical markers or specialized hardware. It detects, orients, and identifies the cards on the board, renders virtual content onto each card across all views, and can compose a broadcast-style view that summarizes the game state for spectators before streaming the augmented feeds to standard broadcasting software such as OBS. Models for detection, orientation, and identification are trained on annotated synthetic data produced automatically from a reference set of card images, and performance is measured on a new manually annotated real-image dataset to assess usability.
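Identifying a detected card against a reference set is often framed as nearest-neighbour retrieval in an embedding space (Figure 8's caption names ArcFace). The sketch below assumes such an embedding network already exists and shows only the matching step; `identify_cards` and its rejection threshold are hypothetical, not the paper's code.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm so that dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def identify_cards(query_emb, gallery_emb, gallery_ids, min_similarity=0.5):
    """Match each detected card crop to its closest reference card.

    query_emb:   (Q, D) embeddings of the detected card crops.
    gallery_emb: (G, D) embeddings of the reference card images, one per card.
    gallery_ids: G card identifiers aligned with gallery_emb.
    Returns a list of (card_id or None, cosine similarity); None marks a crop
    whose best match falls below min_similarity (a hypothetical rejection rule).
    """
    q = l2_normalize(np.asarray(query_emb, dtype=float))
    g = l2_normalize(np.asarray(gallery_emb, dtype=float))
    sims = q @ g.T  # (Q, G) cosine similarities
    best = sims.argmax(axis=1)
    results = []
    for qi, gi in enumerate(best):
        score = float(sims[qi, gi])
        results.append((gallery_ids[gi] if score >= min_similarity else None, score))
    return results


# Toy usage: random vectors stand in for the output of a trained embedding network.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(3, 8))
ids = ["card-A", "card-B", "card-C"]
queries = gallery[[2, 0]] + 0.01 * rng.normal(size=(2, 8))  # noisy views of C and A
results = identify_cards(queries, gallery, ids)
```

Normalizing both sides makes the match depend only on the direction of each embedding, which is the geometry that margin-based losses such as ArcFace are designed to shape.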

What carries the argument

The automatic procedure that generates annotated synthetic training data from a reference set of card images, which supplies the labeled examples needed to train the detection, orientation, and identification models.
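A cut-and-paste generator of this kind can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' procedure: it pastes reference card images onto a textured background at random positions and in-plane rotations, and records each card's index, corner coordinates, and angle as labels. Perspective, lighting, and blending are omitted, and every function name is hypothetical.

```python
import numpy as np


def rotated_corners(cx, cy, w, h, angle_deg):
    """Corners (4, 2) in (x, y) order of a w-by-h card centred at (cx, cy), rotated by angle_deg."""
    t = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    local = np.array([[-w / 2, -h / 2], [w / 2, -h / 2], [w / 2, h / 2], [-w / 2, h / 2]])
    return local @ rot.T + np.array([cx, cy])


def paste_rotated(canvas, card, cx, cy, angle_deg):
    """Paste card (h, w, 3) into canvas in place, centred at (cx, cy) and rotated.

    Inverse mapping with nearest-neighbour sampling: every canvas pixel is
    rotated back into card coordinates and copied if it falls inside the card.
    """
    h, w = card.shape[:2]
    t = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:canvas.shape[0], 0:canvas.shape[1]]
    dx, dy = xs - cx, ys - cy
    u = np.cos(t) * dx + np.sin(t) * dy + w / 2   # card column
    v = -np.sin(t) * dx + np.cos(t) * dy + h / 2  # card row
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    canvas[inside] = card[v[inside].astype(int), u[inside].astype(int)]
    return rotated_corners(cx, cy, w, h, angle_deg)


def make_detection_sample(cards, background, n_cards, rng):
    """Composite n_cards random reference cards onto a copy of background."""
    canvas = background.copy()
    H, W = canvas.shape[:2]
    labels = []
    for _ in range(n_cards):
        idx = int(rng.integers(len(cards)))
        h, w = cards[idx].shape[:2]
        margin = 0.5 * np.hypot(h, w)  # half-diagonal keeps any rotation fully in frame
        cx, cy = rng.uniform(margin, W - margin), rng.uniform(margin, H - margin)
        angle = float(rng.uniform(0.0, 360.0))
        corners = paste_rotated(canvas, cards[idx], cx, cy, angle)
        labels.append({"card_index": idx, "corners": corners, "angle_deg": angle})
    return canvas, labels


rng = np.random.default_rng(0)
cards = [rng.integers(0, 256, size=(88, 63, 3), dtype=np.uint8) for _ in range(3)]
background = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
canvas, labels = make_detection_sample(cards, background, n_cards=4, rng=rng)
```

Later cards may overlap earlier ones, which loosely mimics occlusion; a real generator would also have to decide how to label cards that end up mostly hidden.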

If this is right

Augmentation becomes possible without instrumented tables or embedded chips.
Multiple ordinary cameras can supply consistent overlays on the same physical cards.
A summarized game-state view can be produced and sent directly to OBS or similar software.
All models, code, and the real-image evaluation dataset are released for community use.
Runtime throughput measurements establish whether the pipeline meets live-streaming latency limits.
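Throughput is usually reported as latency per frame and frames per second over a timed loop. A minimal sketch, assuming a per-frame callable stands in for the full pipeline; the function names and the 30 FPS target are illustrative assumptions, not values from the paper. On a GPU, the timed call would also need to synchronize the device before the clock is read.

```python
import statistics
import time


def measure_throughput(process_frame, frames, warmup=5):
    """Time process_frame on each frame after untimed warm-up calls.

    Returns mean and 95th-percentile latency in milliseconds and the mean FPS.
    Needs at least two timed frames (len(frames) >= warmup + 2).
    """
    for frame in frames[:warmup]:
        process_frame(frame)  # caches, lazy initialisation, kernel compilation
    latencies = []
    for frame in frames[warmup:]:
        start = time.perf_counter()
        process_frame(frame)
        latencies.append(time.perf_counter() - start)
    mean_s = statistics.fmean(latencies)
    p95_s = statistics.quantiles(latencies, n=20)[-1]  # last of 19 cut points = 95th percentile
    fps = 1.0 / mean_s if mean_s > 0 else float("inf")
    return {"mean_ms": 1e3 * mean_s, "p95_ms": 1e3 * p95_s, "fps": fps}


def meets_budget(stats, target_fps=30.0):
    """At target_fps each frame has 1000 / target_fps ms; compare the p95 latency to it."""
    return stats["p95_ms"] <= 1e3 / target_fps


# Toy usage: sorting a list stands in for detection, identification and rendering.
frames = [list(range(20_000, 0, -1)) for _ in range(40)]
stats = measure_throughput(sorted, frames)
```

Checking the 95th percentile rather than the mean matters for live streams, where occasional slow frames show up as visible stutter even when the average rate looks sufficient.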

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same synthetic-data generator could be adapted to other physical card or board games that lack large labeled datasets.
Adding a temporal consistency term across video frames might reduce flickering of the rendered overlays without changing the core pipeline.
The broadcast view could be extended to highlight legal moves or track life totals if game rules are encoded as additional input.
Performance on the real dataset provides a baseline for measuring whether future improvements in synthetic rendering close the remaining domain gap.
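The temporal-consistency idea above is Pith's extension, not part of the paper. One simple version is an exponential moving average of each tracked card's corner positions. The sketch assumes a tracker already supplies stable track IDs and consistently ordered corners; the class name and its parameters are hypothetical.

```python
import numpy as np


class CornerSmoother:
    """Exponential moving average of each tracked card's corners across frames.

    alpha in (0, 1]: 1 means no smoothing; smaller values damp jitter but add lag.
    A track that goes unseen for more than max_missing frames is dropped.
    """

    def __init__(self, alpha=0.4, max_missing=5):
        self.alpha = alpha
        self.max_missing = max_missing
        self.state = {}    # track_id -> smoothed (4, 2) corners
        self.missing = {}  # track_id -> consecutive frames without a detection

    def update(self, detections):
        """detections: dict track_id -> (4, 2) corners observed in this frame."""
        for tid, corners in detections.items():
            corners = np.asarray(corners, dtype=float)
            if tid in self.state:
                self.state[tid] = self.alpha * corners + (1 - self.alpha) * self.state[tid]
            else:
                self.state[tid] = corners
            self.missing[tid] = 0
        for tid in list(self.state):
            if tid not in detections:
                self.missing[tid] += 1  # keep the last smoothed position for now
                if self.missing[tid] > self.max_missing:
                    del self.state[tid], self.missing[tid]
        return {tid: c.copy() for tid, c in self.state.items()}


smoother = CornerSmoother(alpha=0.5)
base = np.array([[0, 0], [63, 0], [63, 88], [0, 88]], dtype=float)
out1 = smoother.update({7: base})
out2 = smoother.update({7: base + 2.0})  # a 2-pixel jitter is halved
```

The lag-versus-jitter trade-off is governed by `alpha`; holding a track for a few missed frames also avoids overlays blinking off during brief hand occlusions.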

Load-bearing premise

Synthetic images created automatically from reference card pictures capture enough of the lighting, perspective, and occlusion variation present in real camera footage for models trained on them to work on actual streams.
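The premise names lighting, perspective, and occlusion. A generator can widen the first and last of these with randomized photometric changes and pasted occluders. The sketch below is a hypothetical augmentation in that spirit, with arbitrarily chosen ranges and no perspective warp; it is not the authors' recipe.

```python
import numpy as np


def augment_crop(img, rng, max_occlusion=0.25):
    """Randomly perturb a uint8 card crop (h, w, 3): lighting, colour cast, noise, occlusion."""
    out = img.astype(np.float32)
    # Global lighting: contrast (gain) and brightness (bias).
    out = out * rng.uniform(0.6, 1.4) + rng.uniform(-40, 40)
    # Colour cast: independent per-channel scaling (e.g., warm versus cool lamps).
    out *= rng.uniform(0.85, 1.15, size=3)
    # Sensor noise with a random standard deviation.
    out += rng.normal(0.0, rng.uniform(0, 12), size=out.shape)
    # Occlusion: an opaque rectangle standing in for counters, markers or fingers.
    h, w = img.shape[:2]
    oh = int(h * rng.uniform(0.05, max_occlusion))
    ow = int(w * rng.uniform(0.05, max_occlusion))
    y0 = int(rng.integers(0, h - oh + 1))
    x0 = int(rng.integers(0, w - ow + 1))
    out[y0:y0 + oh, x0:x0 + ow] = rng.uniform(0, 255, size=3)
    return np.clip(out, 0, 255).astype(np.uint8)


crop = np.full((88, 63, 3), 128, dtype=np.uint8)
aug = augment_crop(crop, np.random.default_rng(1))
```

Whether ranges like these actually cover real table conditions is exactly what the real-image evaluation has to establish.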

What would settle it

If a model trained solely on the generated synthetic data records substantially lower accuracy on the manually annotated real-image test set than on synthetic test images, the generalization premise fails.
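"Substantially lower" needs a number. One common approach is to report the synthetic-minus-real gap with a bootstrap confidence interval. A minimal sketch follows; the toy counts are invented only to show the call and are not results from the paper, and the threshold for "substantial" is left to the reader.

```python
import numpy as np


def accuracy_gap(correct_synth, correct_real, n_boot=2000, seed=0):
    """Synthetic-minus-real accuracy gap with a 95% percentile-bootstrap interval.

    correct_synth, correct_real: one boolean per test sample, True when the
    prediction matches the annotation.
    """
    s = np.asarray(correct_synth, dtype=float)
    r = np.asarray(correct_real, dtype=float)
    gap = s.mean() - r.mean()
    rng = np.random.default_rng(seed)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each test set independently, with replacement.
        boots[i] = rng.choice(s, size=s.size).mean() - rng.choice(r, size=r.size).mean()
    low, high = np.percentile(boots, [2.5, 97.5])
    return float(gap), (float(low), float(high))


# Toy numbers only: 95/100 correct on synthetic, 70/100 correct on real.
synth = np.array([True] * 95 + [False] * 5)
real = np.array([True] * 70 + [False] * 30)
gap, (low, high) = accuracy_gap(synth, real)
```

If the interval excludes zero, the drop is unlikely to be resampling noise; whether it is large enough to break the premise is a separate, application-level judgment.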

Figures

Figures reproduced from arXiv: 2607.02090 by Anthony Cioppa, Antoine Verdonck, Marc Van Droogenbroeck, Maxim Henry, Raphaël La Rocca.

**Figure 2.** Overview of our TCG-AR pipeline. TCG-AR runs two concurrent paths. The game state recognition path (top) operates on a zenithal view alone: it detects and orients the cards, then identifies and tracks them to estimate the game state, i.e., the set of cards on the board with their identity, position, and orientation. The rendering path (bottom) operates on all views: it registers the auxiliary views to the …

**Figure 3.** Examples from our datasets. (a) Synthetic training data generated automatically from the reference card images: a detection sample with multiple cards composited onto a textured background, single-card crops for orientation, and identification crops of one card under varied lighting, color, and noise. (b) Real evaluation scenes captured under different lighting and board conditions, manually annotated wi…

**Figure 4.** Qualitative results. Augmented output produced by TCG-AR. Left: the zenithal view after augmentation, with a virtual model rendered onto each recognized card. Right: auxiliary view augmented from the same game state through registration, with side panels showing card information. Several cards carry damage counters and status markers that occlude part of their design. TCG-AR identifies them correctly.

**Figure 5.** GUI composition produced by TCG-AR. The user can control which camera feeds to send to OBS. A side panel provides extra card annotation and control over the model size and forms, e.g., shiny or male versus female versions of the creature. Furthermore, a small user interface lets the operator control the augmented output without editing code. It displays the available feeds in a grid, showing for each camer…

**Figure 6.** Operator interface (deck-selection panel).

**Figure 7.** Hard-negative tiers used for the triplet head.

**Figure 8.** Representative identification errors (real set, ArcFace).
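Figure 2's caption is truncated before it says how the auxiliary views are registered. For a flat table, one standard tool is a plane-induced homography between the zenithal and an auxiliary image. The sketch below estimates one with the direct linear transform (DLT), assuming point correspondences are already available (e.g., from feature matching) and ignoring outliers, which in practice would call for a robust estimator such as RANSAC. It is not presented as the paper's method, and the function names are hypothetical.

```python
import numpy as np


def estimate_homography(src, dst):
    """Direct linear transform: 3x3 H with dst ~ H @ [x, y, 1] from >= 4 (x, y) pairs.

    Unnormalised and non-robust: suitable for exact or clean correspondences only.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value
    return H / H[2, 2]


def apply_homography(H, pts):
    """Map (N, 2) points through H, including the perspective divide."""
    pts = np.asarray(pts, dtype=float)
    mapped = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


# Toy usage: a known homography plays the role of zenithal-to-auxiliary registration.
H_true = np.array([[0.9, -0.1, 30.0], [0.05, 1.1, 12.0], [1e-4, 2e-4, 1.0]])
table_pts = np.array([[0, 0], [640, 0], [640, 480], [0, 480], [320, 240]], dtype=float)
H_est = estimate_homography(table_pts, apply_homography(H_true, table_pts))
card_in_aux = apply_homography(H_est, [[100, 100], [163, 100], [163, 188], [100, 188]])
```

The mapped corners give the quadrilateral onto which an overlay could be drawn in the auxiliary view. With noisy real matches, normalising the coordinates before the SVD (Hartley normalisation) improves conditioning.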

Original abstract

Trading card games are increasingly played and broadcast online, yet live streams remain mostly limited to flat top-down footage of the playing area. Augmenting such streams with virtual models of the played cards would improve the viewing experience, but most existing systems rely on instrumented playing surfaces and embedded chips, which are costly and impractical for casual players and large-scale events. In this work, we present TCG-AR, a novel real-time pipeline that augments trading card games using ordinary RGB cameras alone, without any physical markers or specialized hardware. Our pipeline detects, orients, and identifies the cards on the board, renders virtual content onto each card across all views, and can additionally compose a broadcast-style view that summarizes the game state for spectators, streaming the augmented feeds to standard broadcasting software such as OBS. To train the detection, orientation, and identification models without manual labeling, we introduce an automatic procedure that generates annotated synthetic training data from a reference set of card images. Then, we evaluate several trained models on a new manually annotated dataset with real images, analyzing performance and runtime throughput that determine real-world usability. Overall, by relying only on commodity cameras and hardware, and by open-sourcing all code, models, and datasets, this work aims to serve as a reference for real-time trading card recognition and to make real-time augmented-reality streaming accessible to the broader community of players and streamers.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

TCG-AR delivers a practical RGB-only AR pipeline for trading card streams with synthetic data and open resources, but the real-image evaluation looks thin on metrics and generalization checks.

The core takeaway is that this paper gives a working end-to-end system for detecting, orienting, and augmenting cards in live TCG streams using only ordinary cameras, plus a broadcast composition step that feeds OBS. It avoids any need for marked tables or chips.

What is new is the integrated pipeline that combines standard detection with automatic synthetic data generation from reference card images, multi-view rendering, and the open release of code, models, and a real-image test set. The authors show the system can run in real time and target casual players and streamers, which is a clear practical step beyond hardware-heavy prior work.

The paper does a reasonable job describing the full flow from card recognition to augmented output and broadcast view. Open-sourcing everything lowers the barrier for others to test or extend it.

The main soft spot is the evaluation. The abstract and description mention testing on a manually annotated real dataset and checking throughput, yet no numbers, baselines, or error breakdowns appear in the provided material. That leaves the synthetic-to-real transfer, the key assumption, largely unproven in detail. If shifts in lighting, perspective, or occlusion cause the models to lose accuracy, the rest of the pipeline cannot recover. This is a common issue in synthetic-data papers and needs stronger evidence here.

The work is aimed at applied computer vision groups and game-streaming developers who want a ready reference implementation. Readers building similar hobby-scale AR systems will find the pipeline and released assets useful even if they adapt the models.

It deserves peer review. The system-level integration and open resources make it worth referee time, though the authors should add quantitative results and failure analysis before publication.

Referee Report

2 major / 1 minor

Summary. The paper presents TCG-AR, a real-time multi-view AR pipeline for trading card game streaming that relies solely on ordinary RGB cameras. It detects, orients, and identifies cards using models trained via an automatic synthetic data generation procedure from reference card images (no manual labeling), renders virtual content onto detected cards across views, and optionally composes a broadcast-style summary view streamed to tools such as OBS. The work claims to evaluate the trained models on a new manually annotated real-image dataset, analyze performance and runtime throughput for real-world usability, and open-sources all code, models, and datasets.

Significance. If the synthetic-to-real transfer achieves usable accuracy, the approach would offer a practical, hardware-free method for enhancing TCG streams with per-card AR overlays and spectator views, addressing limitations of existing instrumented systems. Open-sourcing strengthens potential for reproducibility and adoption in the computer vision and streaming communities.

major comments (2)

[Abstract; evaluation section] The manuscript states that models were evaluated on a manually annotated real-image dataset with analysis of performance and runtime throughput, yet it reports no quantitative metrics (accuracy, mAP, precision/recall, FPS, baselines, or error bars), preventing verification of whether the results support the real-time multi-view AR and usability claims.
[§3] Synthetic data generation procedure: the end-to-end pipeline depends on this procedure producing training data representative of real-world lighting, perspective, and occlusion variations so that models generalize to ordinary RGB footage; no domain-gap analysis, ablation studies, or transfer results on the real dataset are provided to substantiate this load-bearing assumption.
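The first comment lists precision/recall among the missing numbers. As a reference point only, here is how detection precision and recall are commonly computed at a fixed IoU threshold. It simplifies under stated assumptions: boxes are axis-aligned, whereas cards on a table are rotated and would need polygon IoU, and mAP would further average precision over score thresholds (and, in COCO-style evaluation, over several IoU thresholds). The function names are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, gts, iou_thr=0.5):
    """Greedy one-to-one matching of scored predictions to ground-truth boxes.

    preds: list of (score, box); gts: list of boxes. Predictions are visited in
    decreasing score order; each may claim at most one unmatched ground truth
    with IoU >= iou_thr, so duplicate detections count as false positives.
    """
    matched = set()
    tp = 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_i, best_iou = None, iou_thr
        for i, gt in enumerate(gts):
            if i in matched:
                continue
            iou = box_iou(box, gt)
            if iou >= best_iou:
                best_i, best_iou = i, iou
        if best_i is not None:
            matched.add(best_i)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (1, 1, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (0, 0, 10, 10))]
precision, recall = precision_recall(preds, gts)  # one hit, one miss, one duplicate
```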

minor comments (1)

Figure captions and pipeline diagrams could more explicitly label the multi-view fusion and broadcast composition stages for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen the evaluation and analysis sections.

Referee: [Abstract; evaluation section] The manuscript states that models were evaluated on a manually annotated real-image dataset with analysis of performance and runtime throughput, yet it reports no quantitative metrics (accuracy, mAP, precision/recall, FPS, baselines, or error bars), preventing verification of whether the results support the real-time multi-view AR and usability claims.

Authors: We agree that the evaluation section would be strengthened by explicit quantitative results. In the revised version we will report detection and identification accuracy, mAP, precision/recall, runtime FPS (with and without multi-view fusion), relevant baselines, and error bars computed over multiple runs on the manually annotated real-image dataset. revision: yes
Referee: [§3] Synthetic data generation procedure: the end-to-end pipeline depends on this procedure producing training data representative of real-world lighting, perspective, and occlusion variations so that models generalize to ordinary RGB footage; no domain-gap analysis, ablation studies, or transfer results on the real dataset are provided to substantiate this load-bearing assumption.

Authors: The effectiveness of the synthetic-to-real transfer is central to the pipeline. While the manuscript already evaluates the resulting models on real images, we did not include explicit domain-gap quantification or ablations. In the revision we will add (i) an analysis of the domain gap (e.g., feature distribution distances), (ii) ablation studies on the synthetic data augmentations for lighting/perspective/occlusion, and (iii) detailed per-model transfer results comparing synthetic-only training against the real test set. revision: yes
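The promised domain-gap analysis could use any of several feature distribution distances. One common choice is the maximum mean discrepancy (MMD) between backbone features of synthetic and real crops. A minimal sketch, assuming features have already been extracted by some trained network (random Gaussian vectors stand in for them here); the function names and the median-heuristic bandwidth are illustrative choices, not the authors'.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between the rows of a (n, d) and b (m, d)."""
    d2 = (a * a).sum(1)[:, None] + (b * b).sum(1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by rounding


def rbf_mmd2(x, y, sigma=None):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel.

    x: (n, d) features of synthetic images; y: (m, d) features of real images.
    If sigma is None, use the median pairwise distance of the pooled sample.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if sigma is None:
        z = np.vstack([x, y])
        iu = np.triu_indices(len(z), k=1)
        sigma = float(np.sqrt(np.median(pairwise_sq_dists(z, z)[iu])))

    def k(a, b):
        return np.exp(-pairwise_sq_dists(a, b) / (2.0 * sigma ** 2))

    return float(k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean())


rng = np.random.default_rng(0)
synthetic = rng.normal(0.0, 1.0, size=(200, 16))
real_close = rng.normal(0.0, 1.0, size=(200, 16))    # same distribution
real_shifted = rng.normal(0.5, 1.0, size=(200, 16))  # simulated domain shift
mmd_close = rbf_mmd2(synthetic, real_close)
mmd_shifted = rbf_mmd2(synthetic, real_shifted)
```

The biased estimate is non-negative up to rounding and exactly zero for identical samples; computing it between two independent synthetic splits gives a rough baseline for what "no gap" looks like at a given sample size.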

Circularity Check

0 steps flagged

No circularity; pipeline is empirical engineering with external validation

The paper describes a standard computer-vision pipeline (detection, orientation, identification) trained on automatically generated synthetic data from reference card images and evaluated on a separate manually annotated real-image dataset. No equations, derivations, or mathematical claims appear. No self-citations are invoked as load-bearing uniqueness theorems, no fitted parameters are relabeled as predictions, and no ansatz or renaming reduces the central claim to its own inputs by construction. The synthetic-to-real generalization is an empirical precondition that can be falsified by the held-out real test set; it is not circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that synthetic data generation suffices for training models that generalize to real images; no free parameters or invented entities are mentioned.

axioms (1)

Domain assumption: Synthetic data generated from reference card images is representative enough of real-world conditions for model training and generalization.
The training procedure described in the abstract depends on this to avoid manual labeling while still achieving usable performance on real data.

pith-pipeline@v0.9.1-grok · 5797 in / 1195 out tokens · 22654 ms · 2026-07-03T15:44:49.475537+00:00 · methodology

Reference graph

Works this paper leans on

77 extracted references · 57 canonical work pages · 3 internal anchors

In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Wang, X., Han, X., Huang, W., Dong, D., Scott, M.R.: Multi-similarity loss with general pair weighting for deep metric learning. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 5017–5025. IEEE, Long Beach, CA, USA (Jun 2019).https://doi.org/10.1109/cvpr.2019. 00516

work page doi:10.1109/cvpr.2019 2019
[67]

Woods, D.L., Wyma, J.M., Yund, E.W., Herron, T.J., Reed, B.: Factors influencing the latency of simple reaction time. Front. Hum. Neurosci.9 (Mar 2015).https://doi.org/10.3389/fnhum.2015.00131

work page doi:10.3389/fnhum.2015.00131 2015
[68]

In: IEEE Int

Wu, C.Y., Manmatha, R., Smola, A.J., Krahenbuhl, P.: Sampling matters in deep embedding learning. In: IEEE Int. Conf. Comput. Vis. (ICCV). pp. 2859–2867. IEEE, Venice, Italy (Oct 2017).https://doi.org/10.1109/ iccv.2017.309

2017
[69]

arXivabs/2109.11861(2021).https://doi.org/10.48550/ arXiv.2109.11861

Wzorek, P., Kryjak, T.: Training dataset generation for bridge game reg- istration. arXivabs/2109.11861(2021).https://doi.org/10.48550/ arXiv.2109.11861

work page arXiv 2021
[70]

In: IEEE/CVF Int

Xie, X., Cheng, G., Wang, J., Yao, X., Han, J.: Oriented r-CNN for object detection. In: IEEE/CVF Int. Conf. Comput. Vis. (ICCV). pp. 3500–3509. IEEE,Montréal,Can. (Oct2021).https://doi.org/10.1109/iccv48922. 2021.00350

work page doi:10.1109/iccv48922 2021
[71]

IEEE Trans

Xu, Y., Fu, M., Wang, Q., Wang, Y., Chen, K., Xia, G.S., Bai, X.: Glid- ing vertex on the horizontal bounding box for multi-oriented object detec- tion. IEEE Trans. Pattern Anal. Mach. Intell.43(4), 1452–1459 (Apr 2021). https://doi.org/10.1109/tpami.2020.2974745

work page doi:10.1109/tpami.2020.2974745 2021
[72]

In: AAAI Conf

Yang, X., Yan, J., Feng, Z., He, T.: R3Det: Refined single-stage detec- tor with feature refinement for rotating object. In: AAAI Conf. Artif. In- tell. vol. 35, pp. 3163–3171. Assoc. Adv. Artif. Intell. (AAAI) (May 2021). https://doi.org/10.1609/aaai.v35i4.16426

work page doi:10.1609/aaai.v35i4.16426 2021
[73]

Ye, M., Chen, S., Li, C., Zheng, W.S., Crandall, D., Du, B.: Transformer for object re-identification: A survey. Int. J. Comput. Vis.133(5), 2410–2440 (Nov 2024).https://doi.org/10.1007/s11263-024-02284-4

work page doi:10.1007/s11263-024-02284-4 2024
[74]

Yi, D., Lei, Z., Liao, S., Li, S.Z.: Deep metric learning for person re- identification. In: Int. Conf. Pattern Recognit. (ICPR). pp. 34–39. IEEE, Stockholm, Sweden (Aug 2014).https://doi.org/10.1109/icpr.2014. 16

work page doi:10.1109/icpr.2014 2014
[75]

Yang, Extension of a complete monotonicity theorem with ap- plications, arXiv:2507.10954, 2025,https://doi.org/10.48550/arXiv

Zhai, A., Wu, H.Y.: Classification is a strong baseline for deep metric learn- ing. arXivabs/1811.12649(2018).https://doi.org/10.48550/arXiv. 1811.12649

work page internal anchor Pith review doi:10.48550/arxiv 2018
[76]

In: IEEE Conf

Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ADE20K dataset. In: IEEE Conf. Comput. Vis. Pattern TCG-AR 21 Recognit. (CVPR). pp. 5122–5130. IEEE, Honolulu, HI, USA (Jul 2017). https://doi.org/10.1109/cvpr.2017.544 22 A. Cioppa 7 Supplementary Material This appendix presents more information about the interfac...

work page doi:10.1109/cvpr.2017.544 2017
[77]

this energy powers that Pokémon,

A linear warm-up over500iterations (ratio1/3) precedes a step schedule that divides the learning rate by10at epochs2and4. Training runs for5epochs with a batch size of2images on one GPU (∼9,000iterations/epoch over the18,000 training images); a checkpoint is saved every epoch. The RPN uses anchors of scale8with aspect ratios{0.5,1,2}over strides{4,8,16,32...

2048

[1] Azuma, R.T.: A survey of augmented reality. Presence: Teleoperators Virtual Environ. 6(4), 355–385 (Aug 1997). https://doi.org/10.1162/pres.1997.6.4.355

[2] Backes, A.: Pokémon TCG SDK for Python. GitHub site, https://github.com/PokemonTCG/pokemon-tcg-sdk-python (2021)

[3] Big Orbit Cards: A brief history of trading card games. https://www.bigorbitcards.co.uk/blog/a-brief-history-of-trading-card-games.html (2023)

[4] Billinghurst, M., Clark, A., Lee, G.: A survey of augmented reality. Found. Trends® Human-Comput. Interact. 8(2-3), 73–272 (Mar 2015). https://doi.org/10.1561/1100000049

[5] Bromley, J., Guyon, I., LeCun, Y., Säckinger, E., Shah, R.: Signature verification using a “siamese” time delay neural network. In: Adv. Neural Inf. Process. Syst. (NeurIPS). vol. 6, pp. 737–744 (1993)

[6] Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Eur. Conf. Comput. Vis. (ECCV). Lect. Notes Comput. Sci., vol. 12346, pp. 213–229. Springer Int. Publ. (2020). https://doi.org/10.1007/978-3-030-58452-8_13

[7] Chen, Q., Rigall, E., Wang, X., Fan, H., Dong, J.: Poker watcher: Playing card detection based on EfficientDet and sandglass block. In: Int. Conf. Aware. Sci. Technol. (iCAST). pp. 1–6. IEEE, Qingdao, China (Dec 2020). https://doi.org/10.1109/icast51195.2020.9319468

[8] Chum, O., Pajdla, T., Sturm, P.: The geometric error for homographies. Comput. Vis. Image Underst. 97(1), 86–102 (Jan 2005). https://doi.org/10.1016/j.cviu.2004.03.004

[9] Cimpoi, M., Maji, S., Kokkinos, I., Mohamed, S., Vedaldi, A.: Describing textures in the wild. In: IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 3606–3613. IEEE, Columbus, OH, USA (Jun 2014). https://doi.org/10.1109/cvpr.2014.461

[10] Deng, J., Guo, J., Yang, J., Xue, N., Kotsia, I., Zafeiriou, S.: ArcFace: Additive angular margin loss for deep face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 44(10), 5962–5979 (Oct 2022). https://doi.org/10.1109/tpami.2021.3087709

[11] DeTone, D., Malisiewicz, T., Rabinovich, A.: SuperPoint: Self-supervised interest point detection and description. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW). pp. 337–33712. IEEE, Salt Lake City, UT, USA (Jun 2018). https://doi.org/10.1109/cvprw.2018.00060

[12] Ding, J., Xue, N., Long, Y., Xia, G.S., Lu, Q.: Learning RoI transformer for oriented object detection in aerial images. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 2844–2853. IEEE, Long Beach, CA, USA (Jun 2019). https://doi.org/10.1109/cvpr.2019.00296

[13] Dwibedi, D., Misra, I., Hebert, M.: Cut, paste and learn: Surprisingly easy synthesis for instance detection. In: IEEE Int. Conf. Comput. Vis. (ICCV). pp. 1310–1319. IEEE, Venice, Italy (Oct 2017). https://doi.org/10.1109/iccv.2017.146

[14] Eyevo: Eyevo: Pokémon TCG scanner. Web site

[15] Fischler, M.A., Bolles, R.C.: Random sample consensus. Commun. ACM 24(6), 381–395 (Jun 1981). https://doi.org/10.1145/358669.358692

[16] Fonder, M., Van Droogenbroeck, M.: Mid-air: A multi-modal dataset for extremely low altitude drone flights. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW), UAVision. pp. 553–562. IEEE, Long Beach, CA, USA (Jun 2019). https://doi.org/10.1109/cvprw.2019.00081

[17] Georgakis, G., Mousavian, A., Berg, A., Kosecka, J.: Synthesizing training data for object detection in indoor scenes. In: Robotics: Science and Systems XIII. vol. XIII, pp. 1–9. Cambridge, MA, USA (Jul 2017). https://doi.org/10.15607/rss.2017.xiii.043

[18] Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 580–587. Columbus, OH, USA (Jun 2014). https://doi.org/10.1109/CVPR.2014.81

[19] Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). vol. 2, pp. 1735–1742. Inst. Electr. Electron. Eng. (IEEE), New York City, NY, USA (Jun 2006). https://doi.org/10.1109/CVPR.2006.100

[20] Han, J., Ding, J., Xue, N., Xia, G.S.: ReDet: A rotation-equivariant detector for aerial object detection. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 2785–2794. IEEE, Nashville, TN, USA (Jun 2021). https://doi.org/10.1109/cvpr46437.2021.00281

[21] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 770–778. IEEE, Las Vegas, NV, USA (Jun 2016). https://doi.org/10.1109/CVPR.2016.90

[22] He, S., Luo, H., Wang, P., Wang, F., Li, H., Jiang, W.: TransReID: Transformer-based object re-identification. In: IEEE/CVF Int. Conf. Comput. Vis. (ICCV). pp. 14993–15002. IEEE, Montréal, Can. (Oct 2021). https://doi.org/10.1109/iccv48922.2021.01474

[23] HitScanTCG: HitScanTCG: AI-powered card scanning for streamers. Web site, https://www.hitscantcg.com (2025)

[24] Howard, A., Sandler, M., Chen, B., Wang, W., Chen, L.C., Tan, M., Chu, G., Vasudevan, V., Zhu, Y., Pang, R., Adam, H., Le, Q.: Searching for MobileNetV3. In: IEEE/CVF Int. Conf. Comput. Vis. (ICCV). pp. 1314–1324. IEEE, Seoul, South Korea (Oct 2019). https://doi.org/10.1109/iccv.2019.00140

[26] Hype Overlay: Hype Overlay: OBS-native trading card game scanner. Web site, https://hypeoverlay.com (2026)

[27] Kendall, A., Cipolla, R.: Modelling uncertainty in deep learning for camera relocalization. In: IEEE Int. Conf. Robot. Autom. (ICRA). pp. 4762–4769. IEEE, Stockholm, Sweden (May 2016). https://doi.org/10.1109/icra.2016.7487679

[28] Kendall, A., Grimes, M., Cipolla, R.: PoseNet: A convolutional network for real-time 6-DOF camera relocalization. In: IEEE Int. Conf. Comput. Vis. (ICCV). pp. 2938–2946. Inst. Electr. Electron. Eng. (IEEE), Santiago, Chile (Dec 2015). https://doi.org/10.1109/iccv.2015.336

[29] Lam, A.H.T., Chow, K.C.H., Yau, E.H.H., Lyu, M.R.: ART: augmented reality table for interactive trading card game. In: ACM Int. Conf. Virtual Real. Contin. Its Appl. pp. 357–360. ACM, Hong Kong, China (Jun 2006). https://doi.org/10.1145/1128923.1128987

[30] Lee, W., Woo, W., Lee, J.: TARBoard: Tangible augmented reality system for table-top game environment. In: PerGames. pp. 1–5. Munich, Germany (May 2005)

[31] Lin, T.Y., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 2117–2125. Honolulu, HI, USA (Jul 2017). https://doi.org/10.1109/CVPR.2017.106

[32] Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollar, P.: Focal loss for dense object detection. arXiv abs/1708.02002 (2017)

[33] Lindenberger, P., Sarlin, P.E., Pollefeys, M.: LightGlue: Local feature matching at light speed. In: IEEE/CVF Int. Conf. Comput. Vis. (ICCV). pp. 17581–17592. IEEE, Paris, Fr. (Oct 2023). https://doi.org/10.1109/iccv51070.2023.01616

[34] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.: SSD: Single shot multibox detector. arXiv abs/1512.02325 (2016)

[35] Lowe, D.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (Nov 2004)

[36] Ma, N., Zhang, X., Zheng, H.T., Sun, J.: ShuffleNet v2: Practical guidelines for efficient CNN architecture design. In: Eur. Conf. Comput. Vis. (ECCV). Lect. Notes Comput. Sci., vol. 11218, pp. 122–138. Springer Int. Publ. (2018). https://doi.org/10.1007/978-3-030-01264-9_8

[37] Ma, T., Mao, M., Zheng, H., Gao, P., Wang, X., Han, S., Ding, E., Zhang, B., Doermann, D.: Oriented object detection with transformer. arXiv abs/2106.03146 (2021). https://doi.org/10.48550/arXiv.2106.03146

[38] Marchand, E., Uchiyama, H., Spindler, F.: Pose estimation for augmented reality: A hands-on survey. IEEE Trans. Vis. Comput. Graph. 22(12), 2633–2651 (Dec 2016). https://doi.org/10.1109/tvcg.2015.2513408

[39] Mittal, K., Gill, K.S., Chauhan, R., Sharma, M., Sunil, G.: Playing cards classification and detection using sequential CNN model through machine learning techniques using artificial intelligence. In: Int. Conf. E-mobility, Power Control Smart Syst. (ICEMPS). pp. 1–4. IEEE, Thiruvananthapuram, India (Apr 2024). https://doi.org/10.1109/icemps60684.2024.1...

[40] Movshovitz-Attias, Y., Toshev, A., Leung, T.K., Ioffe, S., Singh, S.: No fuss distance metric learning using proxies. In: IEEE Int. Conf. Comput. Vis. (ICCV). pp. 360–368. IEEE, Venice, Italy (Oct 2017). https://doi.org/10.1109/iccv.2017.47

[41] Mur-Artal, R., Montiel, J.M.M., Tardos, J.D.: ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Trans. Robot. 31(5), 1147–1163 (Oct 2015). https://doi.org/10.1109/tro.2015.2463671

[42] Musgrave, K., Belongie, S., Lim, S.N.: A metric learning reality check. In: Eur. Conf. Comput. Vis. (ECCV). Lect. Notes Comput. Sci., vol. 12370, pp. 681–699. Springer Int. Publ. (2020). https://doi.org/10.1007/978-3-030-58595-2_41

[43] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 779–788. Inst. Electr. Electron. Eng. (IEEE), Las Vegas, NV, USA (Jun 2016). https://doi.org/10.1109/cvpr.2016.91

[44] Rematas, K., Kemelmacher-Shlizerman, I., Curless, B., Seitz, S.: Soccer on your tabletop. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 4738–4747. Salt Lake City, UT, USA (Jun 2018). https://doi.org/10.1109/CVPR.2018.00498

[45] Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (Jun 2017). https://doi.org/10.1109/TPAMI.2016.2577031

[46] Richter, S.R., Vineet, V., Roth, S., Koltun, V.: Playing for data: Ground truth from computer games. In: Eur. Conf. Comput. Vis. (ECCV). Lect. Notes Comput. Sci., vol. 9906, pp. 102–118. Springer Int. Publ. (2016). https://doi.org/10.1007/978-3-319-46475-6_7

[47] Ros, G., Sellart, L., Materzynska, J., Vazquez, D., Lopez, A.M.: The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In: IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 3234–3243. IEEE, Las Vegas, NV, USA (Jun 2016). https://doi.org/10.1109/cvpr.2016.352

[48] Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: An efficient alternative to SIFT or SURF. In: IEEE Int. Conf. Comput. Vis. (ICCV). pp. 2564–2571. IEEE, Barcelona, Spain (Nov 2011). https://doi.org/10.1109/iccv.2011.6126544

[49] Sarlin, P.E., DeTone, D., Malisiewicz, T., Rabinovich, A.: SuperGlue: Learning feature matching with graph neural networks. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 4937–4946. IEEE, Seattle, WA, USA (Jun 2020). https://doi.org/10.1109/CVPR42600.2020.00499

[50] Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: A unified embedding for face recognition and clustering. In: IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 815–823. IEEE, Boston, MA, USA (Jun 2015). https://doi.org/10.1109/cvpr.2015.7298682

[51] Scrydex: Scrydex: TCG API and toolkit for Pokémon, Magic, Lorcana, and Yu-Gi-Oh! Web site, https://scrydex.com (2025)

[52] Sohn, K.: Improved deep metric learning with multi-class N-pair loss objective. In: Adv. Neural Inf. Process. Syst. (NeurIPS). vol. 29, pp. 1857–1865. Curran Assoc. Inc., Barcelona, Spain (Dec 2016)

[53] Somers, V.: Person Re-Identification and its Application to Multi-Object Tracking. Ph.D. thesis, Cathol. Univ. Louvain, Belg. (May 1995)

[54] Somers, V., De Vleeschouwer, C., Alahi, A.: Body part-based representation learning for occluded person re-identification. In: IEEE/CVF Winter Conf. Appl. Comput. Vis. (WACV). pp. 1613–1623. IEEE, Waikoloa, HI, USA (Jan 2023). https://doi.org/10.1109/wacv56688.2023.00166

[55] Statista: Pokémon trading card game - statistics & facts. https://www.statista.com/statistics/672617/most-valuable-pokemon-trading-cards/ (2025)

[56] Sun, J., Shen, Z., Wang, Y., Bao, H., Zhou, X.: LoFTR: Detector-free local feature matching with transformers. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 8918–8927. IEEE, Nashville, TN, USA (Jun 2021). https://doi.org/10.1109/cvpr46437.2021.00881

[57] SuperZouloux: Le rêve de tous les fans de Yu-Gi-Oh! [The dream of every Yu-Gi-Oh! fan]. YouTube video, https://www.youtube.com/watch?v=64-LfbggqKI (Oct 2005)

[58] Tan, M., Le, Q.V.: EfficientNet: Rethinking model scaling for convolutional neural networks. In: Int. Conf. Mach. Learn. (ICML). vol. 97, pp. 6105–6114. PMLR, Long Beach, CA, USA (Jun 2019)

[59] Tan, M., Pang, R., Le, Q.V.: EfficientDet: Scalable and efficient object detection. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 10778–10787. Seattle, WA, USA (Jun 2020). https://doi.org/10.1109/CVPR42600.2020.01079

[60] The Pokémon Company International: 2025 Pokémon World Championships. https://worlds.2025.pokemon.com/en-us/championships/ (2025)

[61] The Pokémon Company International: Media alert: Pokémon unveils new Pokémon Trading Card Game: 30th Celebration expansion, commemorating 30 years. https://press.pokemon.com/en/releases/MEDIA-ALERT-Pokemon-Unveils-New-Pokemon-Trading-Card-Game-30th-Celebra (2026)

[62] Thomas, B.H.: A survey of visual, mixed, and augmented reality gaming. Comput. Entertain. 10(1), 1–33 (Oct 2012). https://doi.org/10.1145/2381876.2381879

[63] Tian, Z., Shen, C., Chen, H., He, T.: FCOS: A simple and strong anchor-free object detector. IEEE Trans. Pattern Anal. Mach. Intell. 44(4), 1922–1933 (Apr 2022). https://doi.org/10.1109/tpami.2020.3032166

[64] Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. In: IEEE/RSJ Int. Conf. Intell. Robot. Syst. (IROS). pp. 23–30. IEEE, Vancouver, Can. (Sept 2017). https://doi.org/10.1109/iros.2017.8202133

[65] Tremblay, J., Prakash, A., Acuna, D., Brophy, M., Jampani, V., Anil, C., To, T., Cameracci, E., Boochoon, S., Birchfield, S.: Training deep networks with synthetic data: Bridging the reality gap by domain randomization. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Work. (CVPRW). pp. 1082–10828. IEEE, Salt Lake City, UT, USA (Jun 2018). https://doi.org/10.1109/cvprw.2018.00143

[66] Wang, X., Han, X., Huang, W., Dong, D., Scott, M.R.: Multi-similarity loss with general pair weighting for deep metric learning. In: IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 5017–5025. IEEE, Long Beach, CA, USA (Jun 2019). https://doi.org/10.1109/cvpr.2019.00516

[67] Woods, D.L., Wyma, J.M., Yund, E.W., Herron, T.J., Reed, B.: Factors influencing the latency of simple reaction time. Front. Hum. Neurosci. 9 (Mar 2015). https://doi.org/10.3389/fnhum.2015.00131

[68] Wu, C.Y., Manmatha, R., Smola, A.J., Krahenbuhl, P.: Sampling matters in deep embedding learning. In: IEEE Int. Conf. Comput. Vis. (ICCV). pp. 2859–2867. IEEE, Venice, Italy (Oct 2017). https://doi.org/10.1109/iccv.2017.309

[69] Wzorek, P., Kryjak, T.: Training dataset generation for bridge game registration. arXiv abs/2109.11861 (2021). https://doi.org/10.48550/arXiv.2109.11861

[70] Xie, X., Cheng, G., Wang, J., Yao, X., Han, J.: Oriented R-CNN for object detection. In: IEEE/CVF Int. Conf. Comput. Vis. (ICCV). pp. 3500–3509. IEEE, Montréal, Can. (Oct 2021). https://doi.org/10.1109/iccv48922.2021.00350

[71] Xu, Y., Fu, M., Wang, Q., Wang, Y., Chen, K., Xia, G.S., Bai, X.: Gliding vertex on the horizontal bounding box for multi-oriented object detection. IEEE Trans. Pattern Anal. Mach. Intell. 43(4), 1452–1459 (Apr 2021). https://doi.org/10.1109/tpami.2020.2974745

[72] Yang, X., Yan, J., Feng, Z., He, T.: R3Det: Refined single-stage detector with feature refinement for rotating object. In: AAAI Conf. Artif. Intell. vol. 35, pp. 3163–3171. Assoc. Adv. Artif. Intell. (AAAI) (May 2021). https://doi.org/10.1609/aaai.v35i4.16426

[73] Ye, M., Chen, S., Li, C., Zheng, W.S., Crandall, D., Du, B.: Transformer for object re-identification: A survey. Int. J. Comput. Vis. 133(5), 2410–2440 (Nov 2024). https://doi.org/10.1007/s11263-024-02284-4

[74] Yi, D., Lei, Z., Liao, S., Li, S.Z.: Deep metric learning for person re-identification. In: Int. Conf. Pattern Recognit. (ICPR). pp. 34–39. IEEE, Stockholm, Sweden (Aug 2014). https://doi.org/10.1109/icpr.2014.16

[75] Zhai, A., Wu, H.Y.: Classification is a strong baseline for deep metric learning. arXiv abs/1811.12649 (2018). https://doi.org/10.48550/arXiv.1811.12649

[76] Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ADE20K dataset. In: IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR). pp. 5122–5130. IEEE, Honolulu, HI, USA (Jul 2017). https://doi.org/10.1109/cvpr.2017.544

7 Supplementary Material

This appendix presents more information about the interfac...


A linear warm-up over 500 iterations (ratio 1/3) precedes a step schedule that divides the learning rate by 10 at epochs 2 and 4. Training runs for 5 epochs with a batch size of 2 images on one GPU (∼9,000 iterations/epoch over the 18,000 training images); a checkpoint is saved every epoch. The RPN uses anchors of scale 8 with aspect ratios {0.5, 1, 2} over strides {4, 8, 16, 32...
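The schedule and anchor settings above can be made concrete with a short Python sketch. It is a hypothetical illustration under stated assumptions, not the authors' implementation: the base learning rate is not given in this excerpt, so `base_lr` is left as a parameter; epochs are assumed to be counted from 0, so the rate is divided by 10 from the first iteration of epochs 2 and 4; the warm-up is assumed to rise linearly from 1/3 of the scheduled rate to the full rate over the first 500 iterations; each aspect ratio is read as height/width with the anchor area fixed at (8 × stride)²; and only the strides 4, 8, 16 and 32 visible in the truncated list are used. The helper names `learning_rate` and `anchor_shapes` are hypothetical.

```python
import math


def learning_rate(iteration, base_lr, iters_per_epoch=9000, warmup_iters=500,
                  warmup_ratio=1 / 3, step_epochs=(2, 4), gamma=0.1):
    """Hypothetical helper: linear warm-up followed by step decay.

    Assumption: epochs are counted from 0, so with step_epochs=(2, 4) the
    rate is divided by 10 from the first iteration of epochs 2 and 4.
    """
    epoch = iteration // iters_per_epoch
    # Step schedule: multiply by gamma once for every milestone already reached.
    num_decays = sum(1 for milestone in step_epochs if epoch >= milestone)
    lr = base_lr * gamma ** num_decays
    if iteration < warmup_iters:
        # Linear warm-up: starts at warmup_ratio * lr, reaches lr at warmup_iters.
        progress = iteration / warmup_iters
        lr *= warmup_ratio + (1 - warmup_ratio) * progress
    return lr


def anchor_shapes(strides=(4, 8, 16, 32), scale=8, ratios=(0.5, 1.0, 2.0)):
    """Hypothetical helper: (width, height) of each RPN anchor per stride.

    Assumptions: every anchor keeps the area (scale * stride) ** 2, and each
    ratio is read as height / width; the excerpt does not define either.
    """
    shapes = {}
    for stride in strides:
        side = scale * stride  # side length of the square (ratio 1) anchor
        shapes[stride] = [(side / math.sqrt(r), side * math.sqrt(r)) for r in ratios]
    return shapes


if __name__ == "__main__":
    total_iters = 5 * 9000  # 5 epochs of 9,000 iterations (18,000 images / batch of 2)
    for it in (0, 250, 500, 2 * 9000, 4 * 9000, total_iters - 1):
        print(it, learning_rate(it, base_lr=0.02))  # base_lr=0.02 is a hypothetical value
    for stride, wh in anchor_shapes().items():
        print(stride, [(round(w, 1), round(h, 1)) for w, h in wh])
```

Keeping the anchor area fixed while varying the ratio gives each stride one square and two elongated anchors of equal area; this interpretation, like the epoch indexing, is an assumption that should be checked against the actual training configuration.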
