
arxiv: 1907.05315 · v1 · pith:N462GFOXnew · submitted 2019-07-11 · 💻 cs.CV

Graph Neural Based End-to-end Data Association Framework for Online Multiple-Object Tracking

Pith reviewed 2026-05-24 23:05 UTC · model grok-4.3

classification 💻 cs.CV
keywords multiple object tracking · data association · graph neural networks · bipartite matching · end-to-end learning · affinity learning · online tracking · motion cues

The pith

A graph neural network can solve maximum weighted bipartite matching for data association in online multiple object tracking directly from detections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an end-to-end neural framework that treats frame-by-frame data association in multiple object tracking as a maximum weighted bipartite matching problem. An affinity module computes similarities between objects using both appearance and motion features, which then become edge weights on a bipartite graph. A graph neural network optimization module solves this matching problem while adapting to different numbers of detections. Training uses a multi-level matrix loss with assembled supervision so all components learn together. The result is a tracker that requires less manual tuning and shows stronger performance on standard MOT benchmarks.
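As a rough illustration of how appearance and motion cues can be fused into edge weights, the sketch below builds an affinity matrix for an arbitrary number of tracks and detections. It is not the paper's affinity module: cosine similarity stands in for the learned appearance branch, box IoU stands in for the motion cue, and the names `affinity_matrix`, `alpha`, `feat`, and `box` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = min(box_a[2], box_b[2]) - max(box_a[0], box_b[0])
    ih = min(box_a[3], box_b[3]) - max(box_a[1], box_b[1])
    inter = max(0.0, iw) * max(0.0, ih)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def affinity_matrix(tracks, detections, alpha=0.5):
    """Edge weights of the bipartite graph: one row per track, one column per detection.

    alpha is an assumed mixing weight between the appearance and motion terms.
    """
    return [
        [
            alpha * cosine_similarity(t["feat"], d["feat"])
            + (1.0 - alpha) * iou(t["box"], d["box"])
            for d in detections
        ]
        for t in tracks
    ]


# Two existing tracks, three new detections: the matrix is 2 x 3.
tracks = [
    {"feat": [1.0, 0.0], "box": (0, 0, 10, 10)},
    {"feat": [0.0, 1.0], "box": (20, 20, 30, 30)},
]
detections = [
    {"feat": [1.0, 0.0], "box": (0, 0, 10, 10)},
    {"feat": [0.0, 1.0], "box": (21, 20, 31, 30)},
    {"feat": [1.0, 1.0], "box": (50, 50, 60, 60)},
]
weights = affinity_matrix(tracks, detections)
```

The matrix shape follows the detection counts in each frame, which is why the downstream matcher has to cope with varying cardinalities.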

Core claim

The central claim is that an end-to-end network with an affinity learning module and a graph neural network optimization module can resolve the data association problem in online MOT by learning to solve the maximum weighted bipartite matching task, allowing the entire system to co-adapt during training and handle varying object cardinalities with good scalability.
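For reference, the maximum weighted bipartite matching task named in this claim is commonly written as the integer program below. The notation is an assumption introduced here, not the paper's: $w_{ij}$ is the affinity between track $i$ (of $m$) and detection $j$ (of $n$), and $x_{ij} = 1$ means the pair is matched.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i=1}^{m}\sum_{j=1}^{n} w_{ij}\,x_{ij} \\
\text{s.t.}\quad & \sum_{j=1}^{n} x_{ij} \le 1 \quad \text{for all } i, \\
& \sum_{i=1}^{m} x_{ij} \le 1 \quad \text{for all } j, \\
& x_{ij} \in \{0, 1\}.
\end{aligned}
```

The inequality constraints let tracks and detections stay unmatched, which is what allows $m \neq n$.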

What carries the argument

The graph neural network optimization module that takes computed affinities as edge weights and solves the maximum weighted bipartite matching problem while adapting to varying numbers of detections.

If this is right

  • All modules in the tracker co-adapt during joint training, improving overall model adaptiveness.
  • The system handles association problems with changing numbers of detections without fixed-size assumptions.
  • Parameter tuning effort decreases because the network learns the matching process directly.
  • The approach integrates appearance and motion cues into a single trainable pipeline for online tracking.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same graph neural network approach to matching could apply to other vision tasks that reduce to bipartite assignment.
  • Replacing traditional solvers with learned optimization might lower computational overhead in real-time systems.
  • End-to-end training of association could allow trackers to adjust automatically to new camera setups or object types.

Load-bearing premise

The graph neural network can reliably approximate optimal solutions to the maximum weighted bipartite matching problem for different numbers of objects without post-processing or separate solvers.

What would settle it

Compare the assignments produced by the trained graph neural network against exact solutions from a standard bipartite matching solver on sequences with known ground-truth associations and varying object counts; systematic mismatches would falsify the claim.
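A minimal sketch of such a comparison is given below, assuming small problems where exhaustive search is affordable and at most as many rows as columns. The function names `exact_matching` and `agreement` are hypothetical, and the predicted assignment is a plain list standing in for whatever the trained network outputs.

```python
from itertools import permutations


def exact_matching(weights):
    """Exhaustive maximum-weight matching for a small matrix with rows <= cols.

    Returns (assignment, score), where assignment[i] is the column given to row i.
    """
    rows, cols = len(weights), len(weights[0])
    best_score, best_assignment = float("-inf"), None
    for perm in permutations(range(cols), rows):
        score = sum(weights[i][j] for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_assignment = score, perm
    return list(best_assignment), best_score


def agreement(predicted, weights):
    """Compare a predicted assignment with the exact optimum.

    Returns (fraction of rows assigned identically, optimality gap in total weight).
    """
    optimal, best_score = exact_matching(weights)
    predicted_score = sum(weights[i][j] for i, j in enumerate(predicted))
    same = sum(p == o for p, o in zip(predicted, optimal)) / len(optimal)
    return same, best_score - predicted_score


weights = [[0.9, 0.1, 0.2],
           [0.8, 0.7, 0.1]]
print(agreement([0, 1], weights))  # prediction equal to the optimum
print(agreement([1, 0], weights))  # prediction that swaps the two rows
```

Because several assignments can tie for the optimum, the weight gap is the more robust of the two numbers; a systematically positive gap across sequences would be the kind of mismatch described above.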

Figures

Figures reproduced from arXiv: 1907.05315 by Peizhao Li, Xiantong Zhen, Xiaolong Jiang, Yanjing Li.

Figure 1: The pipeline of the proposed framework. It consists of the Siamese Network for affinity computation and the Graph Neural …
Figure 2: The pipeline of the proposed optimization module based …
Figure 3: Visualization of tracking results on MOT17 benchmark …
Original abstract

In this work, we present an end-to-end framework to settle data association in online Multiple-Object Tracking (MOT). Given detection responses, we formulate the frame-by-frame data association as Maximum Weighted Bipartite Matching problem, whose solution is learned using a neural network. The network incorporates an affinity learning module, wherein both appearance and motion cues are investigated to encode object feature representation and compute pairwise affinities. Employing the computed affinities as edge weights, the following matching problem on a bipartite graph is resolved by the optimization module, which leverages a graph neural network to adapt with the varying cardinalities of the association problem and solve the combinatorial hardness with favorable scalability and compatibility. To facilitate effective training of the proposed tracking network, we design a multi-level matrix loss in conjunction with the assembled supervision methodology. Being trained end-to-end, all modules in the tracker can co-adapt and co-operate collaboratively, resulting in improved model adaptiveness and less parameter-tuning efforts. Experiment results on the MOT benchmarks demonstrate the efficacy of the proposed approach.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper proposes an end-to-end neural framework for online multiple-object tracking that formulates frame-by-frame data association as a maximum weighted bipartite matching problem. An affinity learning module encodes appearance and motion cues to produce edge weights; these are fed to a graph neural network optimization module that is claimed to adapt to varying detection cardinalities and solve the combinatorial problem directly. Training uses a multi-level matrix loss with assembled supervision, allowing all modules to co-adapt.

Significance. If the GNN optimization module produces valid, high-quality matchings for arbitrary cardinalities without external solvers or post-processing, the work would advance fully differentiable MOT pipelines and reduce reliance on hand-tuned components. The multi-level loss and end-to-end training are presented as enabling better adaptability on MOT benchmarks.

major comments (1)
  1. [Abstract / Optimization Module] The central claim, that the optimization module 'leverages a graph neural network to adapt with the varying cardinalities of the association problem and solve the combinatorial hardness' without post-hoc adjustments, is load-bearing for the 'end-to-end' and 'no separate solvers' assertions. Standard message-passing GNNs on bipartite graphs output soft scores; converting them into feasible permutation matrices for unseen cardinalities typically requires a decoding step such as argmax, Sinkhorn normalization, or an external solver such as the Hungarian algorithm. The multi-level matrix loss supervises toward ground-truth assignments only during training and does not guarantee feasible or optimal outputs at inference. Concrete evidence (an architecture diagram, the inference procedure, or an ablation removing any post-processing) is required to substantiate the claim.
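The feasibility concern can be made concrete with a toy example. The sketch below shows a soft score matrix whose row-wise argmax assigns two rows to the same column, and then applies a plain Sinkhorn normalization (alternating row and column scaling of a positive square matrix). The scores are invented for illustration, and nothing here is taken from the paper's optimization module.

```python
def row_argmax(scores):
    """Index of the largest entry in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def is_one_to_one(assignment):
    """True if no column is claimed by more than one row."""
    return len(set(assignment)) == len(assignment)


def sinkhorn(scores, iterations=50):
    """Alternately normalize rows and columns of a positive square matrix."""
    m = [row[:] for row in scores]
    for _ in range(iterations):
        m = [[v / sum(row) for v in row] for row in m]
        col_sums = [sum(col) for col in zip(*m)]
        m = [[v / col_sums[j] for j, v in enumerate(row)] for row in m]
    return m


scores = [[0.9, 0.8],
          [0.7, 0.1]]

raw = row_argmax(scores)                 # [0, 0]: both rows pick column 0
balanced = row_argmax(sinkhorn(scores))  # [1, 0]: a valid one-to-one assignment
print(raw, is_one_to_one(raw))
print(balanced, is_one_to_one(balanced))
```

In this 2 x 2 case the normalized matrix decodes to a valid matching, but argmax over a Sinkhorn output is not guaranteed to be one-to-one for larger or rectangular problems, so the decoding step still needs to be specified.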

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. Below we respond point-by-point to the major comment, offering clarification on the optimization module while committing to revisions that strengthen the presentation of our claims.

  1. Referee: [Abstract / Optimization Module] The central claim, that the optimization module 'leverages a graph neural network to adapt with the varying cardinalities of the association problem and solve the combinatorial hardness' without post-hoc adjustments, is load-bearing for the 'end-to-end' and 'no separate solvers' assertions. Standard message-passing GNNs on bipartite graphs output soft scores; converting them into feasible permutation matrices for unseen cardinalities typically requires a decoding step such as argmax, Sinkhorn normalization, or an external solver such as the Hungarian algorithm. The multi-level matrix loss supervises toward ground-truth assignments only during training and does not guarantee feasible or optimal outputs at inference. Concrete evidence (an architecture diagram, the inference procedure, or an ablation removing any post-processing) is required to substantiate the claim.

    Authors: We appreciate the referee highlighting the need for precision on this central aspect of the framework. The optimization module constructs a bipartite graph whose nodes correspond to detections in the current and previous frames (thus naturally accommodating arbitrary cardinalities) and whose edges are initialized with affinities from the appearance-motion module. Successive GNN layers perform message passing that refines these affinities into an output matrix whose entries directly encode assignment decisions. The multi-level matrix loss, applied with assembled supervision, explicitly penalizes deviations from the ground-truth assignment matrix at multiple resolutions, encouraging the network to produce outputs that are already close to valid permutation matrices. At inference the GNN output is used to recover the matching by selecting the highest-scoring entries while enforcing the one-to-one constraint implicit in the learned representation; no external combinatorial solver is invoked. This design keeps the entire pipeline differentiable. We nevertheless recognize that the manuscript would benefit from greater transparency. In the revision we will add an architecture diagram of the optimization module, a step-by-step description of the inference procedure that converts the GNN output into a feasible matching for unseen cardinalities, and an ablation that isolates the contribution of any minimal post-processing steps. revision: yes
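One possible reading of "selecting the highest-scoring entries while enforcing the one-to-one constraint" is a greedy decoder, sketched below. This is an assumption made for illustration: the rebuttal does not specify the procedure, and the function name `greedy_decode` and the `threshold` parameter are hypothetical.

```python
def greedy_decode(scores, threshold=0.0):
    """Greedy one-to-one decoding of a score matrix of any shape.

    Visits entries from highest to lowest score and keeps an entry only if its
    row and column are both still free. Entries below threshold are ignored,
    which leaves the corresponding rows or columns unmatched.
    """
    entries = sorted(
        ((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row)),
        key=lambda e: e[0],
        reverse=True,
    )
    used_rows, used_cols, matches = set(), set(), []
    for s, i, j in entries:
        if s < threshold:
            break
        if i in used_rows or j in used_cols:
            continue
        matches.append((i, j))
        used_rows.add(i)
        used_cols.add(j)
    return sorted(matches)


scores = [[0.9, 0.8],
          [0.7, 0.1]]
print(greedy_decode(scores))       # [(0, 0), (1, 1)], total weight 1.0
print(greedy_decode(scores, 0.5))  # [(0, 0)], row 1 stays unmatched
```

The output is always feasible, but on this matrix it reaches a total weight of 1.0 while the matching (0, 1), (1, 0) reaches 1.5. A decoder of this kind is therefore a post-processing step whose effect an ablation would need to isolate.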

Circularity Check

0 steps flagged

No circularity: framework trained end-to-end on external data with no self-definitional reductions


The paper formulates data association as a maximum weighted bipartite matching problem and learns its solution via a neural network with affinity and optimization modules. All components are trained on labeled tracking data using a multi-level matrix loss; outputs are not equivalent to inputs by construction, nor are any predictions statistically forced from fitted subsets. No self-citations, uniqueness theorems, or ansatzes imported from prior author work appear in the provided text. The derivation chain is therefore self-contained against external benchmarks and does not reduce to renaming or tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review limits visibility into free parameters or invented entities; the central claim rests on the domain assumption that a GNN can learn to solve the combinatorial matching task.

axioms (1)
  • domain assumption: A graph neural network can adapt to varying cardinalities and solve the maximum weighted bipartite matching problem with favorable scalability.
    Invoked in the abstract when describing the optimization module.


