Subjective and Objective Quality-of-Experience Evaluation Study for Live Video Streaming
Pith reviewed 2026-05-23 21:07 UTC · model grok-4.3
The pith
An end-to-end model predicts retrospective QoE for live video streaming from semantic and motion features alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The work releases the TaoLive QoE dataset and shows where prior metrics fall short on it. On that basis it argues that an end-to-end model, Tao-QoE, which fuses multi-scale semantic features with optical flow-based motion features, can predict retrospective QoE scores for live streaming without any statistical QoS inputs, and that it performs competitively on both the new live dataset and existing VoD datasets.
What carries the argument
Tao-QoE, an end-to-end neural model that integrates multi-scale semantic features and optical flow-based motion features to produce a retrospective QoE score.
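As a rough illustration of the data flow described here, the sketch below pools per-frame semantic features at several scales, pools motion features over clips, concatenates the two, and regresses a single score. It is a minimal sketch under stated assumptions, not the paper's architecture: mean pooling, plain concatenation, the single hidden layer, and every name (`mean_pool`, `predict_qoe`, `params`) are hypothetical, and the feature extractors themselves are left out.

```python
def mean_pool(vectors):
    """Average a list of equal-length feature vectors into one vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def linear(x, weights, bias):
    """Dense layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    """Element-wise rectifier."""
    return [max(0.0, v) for v in x]


def predict_qoe(semantic_scales, motion_clips, params):
    """Fuse pooled semantic and motion features into one retrospective score.

    semantic_scales: list of scales, each a list of per-frame vectors.
    motion_clips:    list of per-clip motion vectors.
    params:          dict with hypothetical weights w1, b1, w2, b2.
    """
    # Pool every semantic scale over time, then join the scales end to end.
    semantic = [v for scale in semantic_scales for v in mean_pool(scale)]
    # Pool motion features over clips.
    motion = mean_pool(motion_clips)
    # Assumption: fusion is plain concatenation followed by a small MLP.
    fused = semantic + motion
    hidden = relu(linear(fused, params["w1"], params["b1"]))
    return linear(hidden, params["w2"], params["b2"])[0]
```

No QoS statistics (bitrate, stall counts, and so on) appear among the inputs, which is the property the claim rests on.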
If this is right
- Existing QoE models struggle to assess live video content accurately.
- The TaoLive QoE dataset supplies the first public subjective ratings for live-specific distortions such as frame skipping.
- Tao-QoE removes the need for statistical QoS features when estimating viewer experience.
- The same feature combination can be benchmarked on both live and on-demand video datasets.
Where Pith is reading between the lines
- The approach could support QoE-driven decisions in settings where network statistics are unavailable or costly to obtain.
- Retraining or fine-tuning the model on additional live sources might reveal how well the semantic-motion combination generalizes across platforms.
- A real-time version of the same architecture could be tested for use inside adaptive streaming controllers.
Load-bearing premise
The distortions and subjective ratings collected in the TaoLive QoE dataset accurately represent real-world live streaming conditions, and semantic plus motion features alone are sufficient to predict QoE without QoS inputs.
What would settle it
Apply Tao-QoE and competing QoS-dependent models to a fresh collection of live streaming videos accompanied by new human subjective ratings; if Tao-QoE's prediction error rises substantially above the QoS-based models, the central claim does not hold.
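The comparison proposed above can be sketched as a simple error check. This is a minimal sketch, assuming RMSE as the error measure and a relative tolerance as the reading of "substantially above"; the `margin` default and the function names are hypothetical, since the source fixes neither.

```python
import math


def rmse(predicted, rated):
    """Root-mean-square error between predictions and subjective ratings."""
    if len(predicted) != len(rated) or not rated:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((p - r) ** 2 for p, r in zip(predicted, rated))
    return math.sqrt(total / len(rated))


def claim_survives(tao_pred, qos_pred, ratings, margin=0.10):
    """Compare a QoS-free model with a QoS-based one on fresh ratings.

    Returns (verdict, tao_error, qos_error). The verdict is True when the
    QoS-free error stays within `margin` (relative) of the QoS-based error.
    The 10% default is an arbitrary placeholder, not a value from the paper.
    """
    tao_err = rmse(tao_pred, ratings)
    qos_err = rmse(qos_pred, ratings)
    return tao_err <= qos_err * (1.0 + margin), tao_err, qos_err
```

A real test would also need a significance check on the error difference, which this sketch omits.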
Original abstract
In recent years, live video streaming has gained widespread popularity across various social media platforms. Quality of experience (QoE), which reflects end-users' satisfaction and overall experience, plays a critical role for media service providers to optimize large-scale live compression and transmission strategies to achieve perceptually optimal rate-distortion trade-off. Although many QoE metrics for video-on-demand (VoD) have been proposed, there remain significant challenges in developing QoE metrics for live video streaming. To bridge this gap, we conduct a comprehensive study of subjective and objective QoE evaluations for live video streaming. For the subjective QoE study, we introduce the first live video streaming QoE dataset, TaoLive QoE, which consists of 42 source videos collected from real live broadcasts and 1,155 corresponding distorted ones degraded due to a variety of streaming distortions, including conventional streaming distortions such as compression and stalling, as well as live streaming-specific distortions like frame skipping, variable frame rate, etc. Subsequently, a human study was conducted to derive subjective QoE scores of videos in the TaoLive QoE dataset. For the objective QoE study, we benchmark existing QoE models on the TaoLive QoE dataset as well as publicly available QoE datasets for VoD scenarios, highlighting that current models struggle to accurately assess video QoE, particularly for live content. Hence, we propose an end-to-end QoE evaluation model, Tao-QoE, which integrates multi-scale semantic features and optical flow-based motion features to predict a retrospective QoE score, eliminating reliance on statistical quality of service (QoS) features.
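To make the difference between stalling and frame skipping concrete, the sketch below applies each to a sequence of frame indices. It is a toy model under stated assumptions: a stall is treated as holding one frame on screen, and a skip as dropping frames so that content jumps ahead. The abstract does not describe how the dataset's distortions were generated, and both function names are hypothetical.

```python
def add_stall(frames, at, duration):
    """Hold the frame at position `at` for `duration` extra ticks.

    No content is lost, but playback takes longer.
    """
    return frames[:at + 1] + [frames[at]] * duration + frames[at + 1:]


def skip_frames(frames, at, count):
    """Drop `count` frames right after position `at`.

    Playback time shrinks and the content jumps forward.
    """
    return frames[:at + 1] + frames[at + 1 + count:]


source = list(range(8))               # frame indices 0..7
stalled = add_stall(source, 2, 3)     # frame 2 is shown four times in a row
skipped = skip_frames(source, 2, 3)   # frames 3, 4 and 5 never appear
```

In this toy model a stall preserves every frame, while a skip removes content the viewer never sees, which is one way to see why the two might affect ratings differently.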
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the TaoLive QoE dataset (42 source videos from real broadcasts and 1,155 distorted versions covering compression, stalling, frame skipping, and variable frame rate), reports results from a human subjective study yielding QoE scores, benchmarks existing QoE models on this dataset and VoD datasets to show their limitations on live content, and proposes the Tao-QoE end-to-end model that combines multi-scale semantic features with optical flow-based motion features to predict retrospective QoE scores without any QoS inputs.
Significance. If the reported correlation metrics and ablations hold, the new live-specific dataset and the QoS-free model would address a clear gap in QoE assessment for live streaming, with the feature-based approach offering a potentially more generalizable alternative to QoS-dependent metrics.
minor comments (4)
- [§3] §3 (Dataset): provide the exact breakdown of distortion types and their frequencies across the 1,155 videos to allow readers to assess coverage of live-specific artifacts.
- [§7] §7 (Experiments): report the precise Pearson/Spearman correlation values, RMSE, and any statistical significance tests for Tao-QoE versus the benchmarked models on the held-out test set.
- [§6] §6 (Model): clarify the exact architecture for fusing multi-scale semantic features with optical-flow motion features (e.g., concatenation layer dimensions or attention mechanism) so the end-to-end claim can be reproduced.
- [Figure 4] Figure 4 (Ablation): add error bars or p-values to the bar plots showing contribution of semantic versus motion features.
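For reference, the Pearson and Spearman correlations requested in the §7 comment are conventionally computed between predicted and subjective scores as sketched below. This is a standard-library sketch with illustrative function names, not the paper's evaluation code, and it omits any nonlinear mapping that may be applied before computing Pearson correlation.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Spearman depends only on ordering, so it rewards a model that ranks videos correctly even when its scores are on a different scale from the subjective ratings.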
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment of the TaoLive QoE dataset and Tao-QoE model, and the recommendation of minor revision. No major comments were provided in the report.
Circularity Check
No significant circularity
The paper introduces a new dataset (TaoLive QoE) with subjective scores collected independently, then trains and evaluates Tao-QoE on held-out test data using multi-scale semantic and optical-flow features. The central claim—that retrospective QoE can be predicted without QoS inputs—is supported by standard correlation metrics and feature ablations that do not reduce to self-definition, fitted-input renaming, or load-bearing self-citation. No equation or step equates the output to its inputs by construction; the model is an independent predictor whose performance is externally falsifiable on the reported test set.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Tao-QoE integrates multi-scale semantic features and optical flow-based motion features to predict retrospective QoE score, eliminating reliance on statistical QoS features
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: Swin Transformer stages + PWC-Net + ResNet3D-18 + MLP fusion + FC regression
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- QoS-QoE Translation with Large Language Model: A new QoS-QoE Translation dataset is constructed from multimedia literature, and fine-tuned LLMs demonstrate strong performance on bidirectional continuous and discrete QoS-QoE predictions.