Boundary-Centric Active Learning for Temporal Action Segmentation
Pith reviewed 2026-05-10 11:30 UTC · model grok-4.3
The pith
Focusing supervision only on action boundaries yields stronger temporal segmentation with far fewer labels than standard active learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
B-ACT ranks unlabeled videos by predictive uncertainty; within each selected video, it detects candidate transitions from the current predictions and keeps the top-K via a boundary score that combines neighborhood uncertainty, class ambiguity, and temporal predictive dynamics. Labels are requested only for those boundary frames, while training proceeds on boundary-centered clips, yielding higher label efficiency than representative TAS active-learning baselines under sparse budgets.
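The frame-level selection step can be sketched concretely. The paper does not publish its exact formulas, so the three components below (windowed entropy for neighborhood uncertainty, top-2 probability margin for class ambiguity, frame-to-frame L1 change for temporal dynamics) and the weighted-sum fusion are illustrative assumptions, not B-ACT's actual score:

```python
import numpy as np

def boundary_scores(probs, window=8, w=(1.0, 1.0, 1.0)):
    """Score each frame of a (T, C) prediction matrix as a boundary candidate.

    Component definitions are assumptions, not the paper's formulas:
      - neighborhood uncertainty: mean entropy over a centered temporal window
      - class ambiguity: 1 minus the margin between the top-2 class probabilities
      - temporal dynamics: L1 change between consecutive frame predictions
    """
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)       # (T,)
    padded = np.pad(entropy, window, mode="edge")
    kernel = np.ones(2 * window + 1) / (2 * window + 1)
    neigh_unc = np.convolve(padded, kernel, mode="valid")      # (T,)
    top2 = np.sort(probs, axis=1)[:, -2:]                      # ascending, (T, 2)
    ambiguity = 1.0 - (top2[:, 1] - top2[:, 0])                # (T,)
    dyn = np.abs(np.diff(probs, axis=0)).sum(axis=1)           # (T-1,)
    dyn = np.concatenate([[0.0], dyn])                         # align to (T,)
    return w[0] * neigh_unc + w[1] * ambiguity + w[2] * dyn

def select_top_k_boundaries(probs, k=3, window=8):
    """Detect candidate transitions (predicted-label flips) and keep the top-k by score."""
    preds = probs.argmax(axis=1)
    candidates = np.flatnonzero(np.diff(preds)) + 1            # frames where the label changes
    if candidates.size == 0:
        return np.array([], dtype=int)
    scores = boundary_scores(probs, window=window)
    order = np.argsort(scores[candidates])[::-1]
    return candidates[order[:k]]
```

On a prediction matrix with a single clean label flip, the sole candidate transition is returned regardless of k, which is the behavior the two-stage loop relies on.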
What carries the argument
The boundary score, which fuses neighborhood uncertainty, class ambiguity, and temporal predictive dynamics to rank candidate transitions for labeling.
If this is right
- Label budgets can be cut while maintaining or improving edit and overlap F1 because supervision targets the locations where errors concentrate.
- Gains are largest on datasets where boundary placement dominates the evaluation metrics rather than interior frame accuracy.
- The two-stage hierarchy first reduces video-level redundancy then focuses frame-level effort on transitions.
- Training on boundary-centered clips preserves receptive-field context without requiring dense labels across entire videos.
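The hierarchy's first stage, video-level redundancy reduction, could be as simple as ranking videos by mean frame entropy. A minimal sketch, where the entropy-based video score is an assumed stand-in for whatever uncertainty measure the paper actually uses:

```python
import numpy as np

def rank_videos_by_uncertainty(video_probs, n_select=2):
    """Stage 1 of the two-stage loop (illustrative): rank unlabeled videos by
    mean per-frame predictive entropy and return the most uncertain indices."""
    eps = 1e-12
    scores = []
    for probs in video_probs:                       # each probs: (T_i, C)
        entropy = -(probs * np.log(probs + eps)).sum(axis=1)
        scores.append(entropy.mean())
    order = np.argsort(scores)[::-1]                # most uncertain first
    return order[:n_select].tolist()
```

A video with near-uniform predictions ranks above one with confidently peaked predictions, so frame-level boundary selection is spent only where the model is still unsure.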
Where Pith is reading between the lines
- The same boundary-first idea could transfer to other dense temporal tasks such as speech diarization or surgical video phase recognition.
- Combining the boundary score with existing semi-supervised consistency losses might shrink the required labeled set even further.
- Evaluating the method on longer, uncurated videos would test whether the two-stage selection still avoids redundant boundary queries at scale.
Load-bearing premise
The proposed boundary score reliably identifies the frames whose labels will improve the model the most, and labeling only those frames while training on centered clips supplies enough temporal context.
What would settle it
Running the same video-selection stage but replacing boundary selection with random or uniform frame sampling inside each video, then measuring whether segmentation F1 scores on GTEA, 50Salads, or Breakfast fall below the boundary-centric results at identical label budgets.
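Such a control swaps only the frame-selection function while holding the video-selection stage and the label budget fixed. A minimal sketch of the two non-boundary baselines (function names are illustrative, not from the paper):

```python
import numpy as np

def uniform_frames(n_frames, budget):
    """Budget-matched control: evenly spaced frame indices across the video."""
    return np.linspace(0, n_frames - 1, num=budget, dtype=int)

def random_frames(n_frames, budget, seed=0):
    """Budget-matched control: uniformly random frame indices without replacement."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_frames, size=budget, replace=False))
```

If segmentation F1 under these controls matches the boundary-centric results at identical budgets, the boundary score carries no real weight; if it falls, the load-bearing premise holds.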
Original abstract
Temporal action segmentation (TAS) demands dense temporal supervision, yet most of the annotation cost in untrimmed videos is spent identifying and refining action transitions, where segmentation errors concentrate and small temporal shifts disproportionately degrade segmental metrics. We introduce B-ACT, a clip-budgeted active learning framework that explicitly allocates supervision to these high-leverage boundary regions. B-ACT operates in a hierarchical two-stage loop: (i) it ranks and queries unlabeled videos using predictive uncertainty, and (ii) within each selected video, it detects candidate transitions from the current model predictions and selects the top-$K$ boundaries via a novel boundary score that fuses neighborhood uncertainty, class ambiguity, and temporal predictive dynamics. Importantly, our annotation protocol requests labels for only the boundary frames while still training on boundary-centered clips to exploit temporal context through the model's receptive field. Extensive experiments on GTEA, 50Salads, and Breakfast demonstrate that boundary-centric supervision delivers strong label efficiency and consistently surpasses representative TAS active learning baselines and prior state of the art under sparse budgets, with the largest gains on datasets where boundary placement dominates edit and overlap-based F1 scores.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents B-ACT, a clip-budgeted active learning framework for temporal action segmentation that prioritizes labeling boundary frames. It uses a hierarchical approach: ranking videos by predictive uncertainty, then selecting top-K boundaries in each chosen video via a boundary score fusing neighborhood uncertainty, class ambiguity, and temporal predictive dynamics. Training is performed on boundary-centered clips using only the boundary labels, with claimed improvements in label efficiency over TAS active learning baselines on GTEA, 50Salads, and Breakfast under sparse budgets.
Significance. Should the empirical results prove robust, this boundary-centric strategy could meaningfully advance label-efficient learning for dense video tasks like TAS, where annotation is costly and errors cluster at transitions. The multi-dataset evaluation and focus on segmental metrics are positive aspects. However, the absence of theoretical derivation for the boundary score and limited ablations temper the broader significance.
major comments (3)
- The definition of the boundary score as a fusion of three components lacks an ablation analysis to determine the individual contributions of neighborhood uncertainty, class ambiguity, and temporal predictive dynamics. This is critical as the central claim depends on this score effectively identifying frames that most improve the model.
- The results do not include statistical significance tests for the reported improvements, nor detailed comparisons with exact metric values against all baselines. Additionally, there is no analysis addressing whether boundary-only labeling suffices for learning action interiors on datasets like Breakfast with variable action lengths, which directly impacts the validity of the label efficiency claim.
- The choice of top-K boundaries per video and the fusion weights in the boundary score are free parameters without sensitivity analysis, potentially affecting the reproducibility and generality of the reported gains.
minor comments (2)
- The abstract mentions 'prior state of the art' but does not specify which methods are included in the comparisons.
- Ensure all figures clearly label the axes and include error bars if applicable for the performance curves.
Simulated Author's Rebuttal
We thank the referee for the constructive and insightful comments on our manuscript. We appreciate the recognition of the potential impact of our boundary-centric active learning strategy for label-efficient temporal action segmentation. We address each major comment point by point below, indicating the revisions we will incorporate to strengthen the paper.
Point-by-point responses
Referee: The definition of the boundary score as a fusion of three components lacks an ablation analysis to determine the individual contributions of neighborhood uncertainty, class ambiguity, and temporal predictive dynamics. This is critical as the central claim depends on this score effectively identifying frames that most improve the model.
Authors: We agree that an ablation analysis is essential to substantiate the boundary score design. In the revised manuscript, we will add a dedicated ablation study evaluating each component (neighborhood uncertainty, class ambiguity, and temporal predictive dynamics) in isolation as well as in all combinations. This will report segmental metrics on GTEA, 50Salads, and Breakfast under identical budgets, highlighting the synergistic gains from the full fusion.
Revision: yes
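Enumerating the promised component combinations is mechanical; a small sketch (component names follow the paper's description, the on/off weighting scheme is an assumption):

```python
from itertools import product

COMPONENTS = ("neighborhood_uncertainty", "class_ambiguity", "temporal_dynamics")

def ablation_configs():
    """All non-empty on/off combinations of the three score components.

    Three binary switches minus the all-off case give 2**3 - 1 = 7 configs:
    three single-component runs, three pairs, and the full fusion.
    """
    configs = []
    for mask in product((0, 1), repeat=len(COMPONENTS)):
        if any(mask):
            configs.append({name: float(on) for name, on in zip(COMPONENTS, mask)})
    return configs
```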
Referee: The results do not include statistical significance tests for the reported improvements, nor detailed comparisons with exact metric values against all baselines. Additionally, there is no analysis addressing whether boundary-only labeling suffices for learning action interiors on datasets like Breakfast with variable action lengths, which directly impacts the validity of the label efficiency claim.
Authors: We acknowledge the validity of these concerns. We will add statistical significance tests (e.g., paired t-tests with p-values) for all reported improvements and expand the result tables to list exact metric values for every baseline and our method across all datasets. To directly examine boundary-only labeling, we will include a new analysis on Breakfast: performance on interior (non-boundary) frames when training exclusively with boundary labels versus full supervision, demonstrating how boundary-centered clips enable the model to learn interiors via temporal context despite variable action lengths.
Revision: yes
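A paired test over per-split or per-seed metric pairs is straightforward; the sketch below computes only the t statistic to stay dependency-free (in practice `scipy.stats.ttest_rel` would also supply the p-value):

```python
import math

def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two methods evaluated on the same splits/seeds.

    t = mean(d) / (s_d / sqrt(n)) where d are the paired differences and
    s_d is their sample standard deviation (requires n >= 2 and non-constant d).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)
```

A large positive t over, say, four cross-validation splits of F1 scores would indicate the improvement is consistent rather than driven by a single split.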
Referee: The choice of top-K boundaries per video and the fusion weights in the boundary score are free parameters without sensitivity analysis, potentially affecting the reproducibility and generality of the reported gains.
Authors: We agree that sensitivity analysis is important for reproducibility. In the revision, we will add experiments varying K (e.g., 1–10) and different fusion weight combinations (equal, grid-searched, and component-specific), showing that gains remain consistent across reasonable ranges. We will explicitly state the hyperparameters used in the main results and release code to support exact reproduction.
Revision: yes
Circularity Check
No significant circularity in the B-ACT framework derivation
Full rationale
The paper defines a procedural two-stage active learning loop that ranks videos by predictive uncertainty and selects boundary frames inside each video via a composite boundary score fusing neighborhood uncertainty, class ambiguity, and temporal dynamics. All performance claims are supported by direct empirical comparisons on GTEA, 50Salads, and Breakfast against external baselines under fixed clip budgets. No equation or prediction is shown to be mathematically identical to a fitted parameter or self-citation input; the boundary score is an explicit design choice whose utility is measured rather than derived by construction from the evaluation data.
Axiom & Free-Parameter Ledger
free parameters (2)
- top-K boundaries per video
- boundary score fusion weights
axioms (2)
- Domain assumption: Segmentation errors in TAS concentrate at action boundaries, and small temporal shifts disproportionately affect segmental metrics.
- Domain assumption: Predictive uncertainty is a reliable proxy for annotation value in both video selection and boundary ranking.
invented entities (1)
- boundary score (no independent evidence)
Reference graph
Works this paper leans on
- [1] H. I. Helvaci, C.-N. Chuah, S. Ozonoff, and S.-C. S. Cheung, "Localizing moments of actions in untrimmed videos of infants with autism spectrum disorder," in 2024 IEEE International Conference on Image Processing (ICIP), 2024, pp. 3841–3847.
- [2] N. Manakitsa, G. S. Maraslidis, L. Moysis, and G. F. Fragulis, "A review of machine learning and deep learning for object detection, semantic segmentation, and human action recognition in machine and robotic vision," Technologies, vol. 12, no. 2, p. 15, 2024.
- [3] H. I. Helvaci, J. P. Huber, J. Bae, and S.-C. S. Cheung, "MMTA: Multi membership temporal attention for fine-grained stroke rehabilitation assessment," arXiv preprint arXiv:2603.00878, 2026.
- [4] G. Ding, F. Sener, and A. Yao, "Temporal action segmentation: An analysis of modern techniques," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 2, pp. 1011–1030, 2023.
- [5] D. Singhania, R. Rahaman, and A. Yao, "Iterative contrast-classify for semi-supervised temporal action segmentation," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 2, 2022, pp. 2262–2270.
- [6] G. Ding and A. Yao, "Leveraging action affinity and continuity for semi-supervised temporal action segmentation," in European Conference on Computer Vision. Springer, 2022, pp. 17–32.
- [7] X. Wang, S. Zhang, Z. Qing, Y. Shao, C. Gao, and N. Sang, "Self-supervised learning for semi-supervised temporal action proposal," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 1905–1914.
- [8] C.-Y. Chang, D.-A. Huang, Y. Sui, L. Fei-Fei, and J. C. Niebles, "D3TW: Discriminative differentiable dynamic time warping for weakly supervised action alignment and segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3546–3555.
- [9] X. Chang, F. Tung, and G. Mori, "Learning discriminative prototypes with dynamic time warping," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8395–8404.
- [10] L. Ding and C. Xu, "Weakly-supervised action segmentation with iterative soft boundary assignment," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6508–6516.
- [11] D.-A. Huang, L. Fei-Fei, and J. C. Niebles, "Connectionist temporal modeling for weakly supervised action labeling," in European Conference on Computer Vision. Springer, 2016, pp. 137–153.
- [12] H. Kuehne, A. Richard, and J. Gall, "Weakly supervised learning of actions from transcripts," Computer Vision and Image Understanding, vol. 163, pp. 78–89, 2017.
- [13] H. Khan, S. Haresh, A. Ahmed, S. Siddiqui, A. Konin, M. Z. Zia, and Q.-H. Tran, "Timestamp-supervised action segmentation with graph convolutional networks," in 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022, pp. 10619–10626.
- [14] Z. Li, Y. Abu Farha, and J. Gall, "Temporal action segmentation from timestamp supervision," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 8365–8374.
- [15] K. Liu, Y. Li, S. Liu, C. Tan, and Z. Shao, "Reducing the label bias for timestamp supervised temporal action segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6503–6513.
- [16] R. Rahaman, D. Singhania, A. Thiery, and A. Yao, "A generalized and robust framework for timestamp supervision in temporal action segmentation," in European Conference on Computer Vision. Springer, 2022, pp. 279–296.
- [17] P. Ren, Y. Xiao, X. Chang, P.-Y. Huang, Z. Li, B. B. Gupta, X. Chen, and X. Wang, "A survey of deep active learning," ACM Computing Surveys (CSUR), vol. 54, no. 9, pp. 1–40, 2021.
- [18] A. Rana and Y. Rawat, "Are all frames equal? Active sparse labeling for video action detection," Advances in Neural Information Processing Systems, vol. 35, pp. 14358–14373, 2022.
- [19] A. J. Rana and Y. S. Rawat, "Hybrid active learning via deep clustering for video action detection," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 18867–18877.
- [20] M. Joudaki, M. Imani, and H. R. Arabnia, "A new efficient hybrid technique for human action recognition using 2D Conv-RBM and LSTM with optimized frame selection," Technologies, vol. 13, no. 2, p. 53, 2025.
- [21] Z. Wang, Z. Gao, L. Wang, Z. Li, and G. Wu, "Boundary-aware cascade networks for temporal action segmentation," in European Conference on Computer Vision. Springer, 2020, pp. 34–51.
- [22] P. Wang and H. Ling, "DIR-AS: Decoupling individual identification and temporal reasoning for action segmentation," arXiv preprint arXiv:2304.02110, 2023.
- [23] L. Chen, M. Li, Y. Duan, J. Zhou, and J. Lu, "Uncertainty-aware representation learning for action segmentation," in IJCAI, vol. 2, 2022, p. 6.
- [24] S. Wang, S. Wang, M. Li, D. Yang, H. Kuang, Z. Qian, and L. Zhang, "Faster diffusion action segmentation," arXiv preprint arXiv:2408.02024, 2024.
- [25] Y. A. Farha and J. Gall, "MS-TCN: Multi-stage temporal convolutional network for action segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 3575–3584.
- [26] A. Fathi, X. Ren, and J. M. Rehg, "Learning to recognize objects in egocentric activities," in CVPR 2011. IEEE, 2011, pp. 3281–3288.
- [27] H. Kuehne, A. Arslan, and T. Serre, "The language of actions: Recovering the syntax and semantics of goal-directed human activities," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 780–787.
- [28] S. Stein and S. J. McKenna, "Combining embedded accelerometers with computer vision for recognizing food preparation activities," in Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2013, pp. 729–738.
- [29] A. Richard, H. Kuehne, and J. Gall, "Weakly supervised action learning with RNN based fine-to-coarse modeling," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 754–763.
- [30] B. Singh, T. K. Marks, M. Jones, O. Tuzel, and M. Shao, "A multi-stream bi-directional recurrent neural network for fine-grained action detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1961–1970.
- [31] C. Lea, M. D. Flynn, R. Vidal, A. Reiter, and G. D. Hager, "Temporal convolutional networks for action segmentation and detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 156–165.
- [32] F. Yi, H. Wen, and T. Jiang, "ASFormer: Transformer for action segmentation," arXiv preprint arXiv:2110.08568, 2021.
- [33] C.-L. Zhang, J. Wu, and Y. Li, "ActionFormer: Localizing moments of actions with transformers," in European Conference on Computer Vision. Springer, 2022, pp. 492–510.
- [34] Y. Huang, Y. Sugano, and Y. Sato, "Improving action segmentation via graph-based temporal reasoning," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 14024–14034.
- [35] J. Zhang, P.-H. Tsai, and M.-H. Tsai, "Semantic2Graph: Graph-based multi-modal feature fusion for action segmentation in videos," arXiv preprint arXiv:2209.05653, 2022.
- [36] D. Liu, Q. Li, A.-D. Dinh, T. Jiang, M. Shah, and C. Xu, "Diffusion action segmentation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 10139–10149.
- [37] D. Gong, S. Kwak, and M. Cho, "ActFusion: A unified diffusion model for action segmentation and anticipation," Advances in Neural Information Processing Systems, vol. 37, pp. 89913–89942, 2024.
- [38] Z. Lu and E. Elhamifar, "FACT: Frame-action cross-attention temporal modeling for efficient action segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 18175–18185.
- [39] H. Ahn and D. Lee, "Refining action segmentation with hierarchical video representations," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 16302–16310.
- [40] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
- [41] J. Li, P. Lei, and S. Todorovic, "Weakly supervised energy-based learning for action segmentation," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 6243–6251.
- [42] E. Bueno-Benito, B. T. Vecino, and M. Dimiccoli, "Leveraging triplet loss for unsupervised action segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 4922–4930.
- [43] A. Miech, D. Zhukov, J.-B. Alayrac, M. Tapaswi, I. Laptev, and J. Sivic, "HowTo100M: Learning a text-video embedding by watching hundred million narrated video clips," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 2630–2640.
- [44] S. N. Aakur and S. Sarkar, "A perceptual prediction framework for self supervised event segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 1197–1206.
- [45] S. Bansal, C. Arora, and C. Jawahar, "My view is the best view: Procedure learning from egocentric videos," in European Conference on Computer Vision. Springer, 2022, pp. 657–675.
- [46] N. Dvornik, I. Hadji, R. Zhang, K. G. Derpanis, R. P. Wildes, and A. D. Jepson, "StepFormer: Self-supervised step discovery and localization in instructional videos," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 18952–18961.
- [47] E. Elhamifar and D. Huynh, "Self-supervised multi-task procedure learning from instructional videos," in European Conference on Computer Vision. Springer, 2020, pp. 557–573.
- [48] A. Kukleva, H. Kuehne, F. Sener, and J. Gall, "Unsupervised learning of action classes with continuous temporal embedding," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 12066–12074.
- [49] S. Sarfraz, N. Murray, V. Sharma, A. Diba, L. Van Gool, and R. Stiefelhagen, "Temporally-weighted hierarchical clustering for unsupervised action segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 11225–11234.
- [50] A. Shah, B. Lundell, H. Sawhney, and R. Chellappa, "STEPs: Self-supervised key step extraction and localization from unlabeled procedural videos," in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 10375–10387.
- [51] P. Wang, Y. Lin, E. Blasch, H. Ling et al., "Efficient temporal action segmentation via boundary-aware query voting," Advances in Neural Information Processing Systems, vol. 37, pp. 37765–37790, 2024.
- [52] W. H. Beluch, T. Genewein, A. Nürnberger, and J. M. Köhler, "The power of ensembles for active learning in image classification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 9368–9377.
- [53] Y. Gal and Z. Ghahramani, "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning," in International Conference on Machine Learning. PMLR, 2016, pp. 1050–1059.
- [54] B. Lakshminarayanan, A. Pritzel, and C. Blundell, "Simple and scalable predictive uncertainty estimation using deep ensembles," Advances in Neural Information Processing Systems, vol. 30, 2017.
- [55] A. Kendall and Y. Gal, "What uncertainties do we need in Bayesian deep learning for computer vision?" in Advances in Neural Information Processing Systems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017.
- [56] N. Houlsby, F. Huszár, Z. Ghahramani, and M. Lengyel, "Bayesian active learning for classification and preference learning," arXiv preprint arXiv:1112.5745, 2011.
- [57] Y. Gal, R. Islam, and Z. Ghahramani, "Deep Bayesian active learning with image data," in International Conference on Machine Learning. PMLR, 2017, pp. 1183–1192.
- [58] O. Sener and S. Savarese, "Active learning for convolutional neural networks: A core-set approach," arXiv preprint arXiv:1708.00489, 2017.
- [59] T. Kasarla, G. Nagendar, G. M. Hegde, V. Balasubramanian, and C. Jawahar, "Region-based active learning for efficient labeling in semantic segmentation," in 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 2019, pp. 1109–1117.
- [60] Y. Siddiqui, J. Valentin, and M. Nießner, "ViewAL: Active learning with viewpoint entropy for semantic segmentation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9433–9443.
- [61] L. Cai, X. Xu, J. H. Liew, and C. S. Foo, "Revisiting superpixels for active learning in semantic segmentation with realistic annotation costs," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 10988–10997.
- [62] V.-L. Nguyen, M. H. Shaker, and E. Hüllermeier, "How to measure uncertainty in uncertainty sampling for active learning," Machine Learning, vol. 111, no. 1, pp. 89–122, 2022.
- [63] Y. Su and E. Elhamifar, "Two-stage active learning for efficient temporal action segmentation," in European Conference on Computer Vision. Springer, 2024, pp. 161–183.
- [64] M. Dvornik, I. Hadji, K. G. Derpanis, A. Garg, and A. Jepson, "Drop-DTW: Aligning common signal between sequences while dropping outliers," Advances in Neural Information Processing Systems, vol. 34, pp. 13782–13793, 2021.
- [65] J. Carreira and A. Zisserman, "Quo vadis, action recognition? A new model and the Kinetics dataset," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6299–6308.
- [66] Y. Xie, H. Lu, J. Yan, X. Yang, M. Tomizuka, and W. Zhan, "Active finetuning: Exploiting annotation budget in the pretraining-finetuning paradigm," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 23715–23724.
- [67] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.
- [68] A. Kirsch and Y. Gal, "PowerEvaluationBALD: Efficient evaluation-oriented deep (Bayesian) active learning with stochastic acquisition functions," arXiv preprint arXiv:2101.03552, 2021.
- [69] J. Lin, "Divergence measures based on the Shannon entropy," IEEE Transactions on Information Theory, vol. 37, no. 1, pp. 145–151, 1991.
- [70] L. C. Freeman, Elementary Applied Statistics: For Students in Behavioral Science. New York: Wiley, 1965.