pith. machine review for the scientific record.

arxiv: 2604.21369 · v1 · submitted 2026-04-23 · 💻 cs.LG · cs.HC

Recognition: unknown

Channel-Free Human Activity Recognition via Inductive-Bias-Aware Fusion Design for Heterogeneous IoT Sensor Environments

Authors on Pith: no claims yet

Pith reviewed 2026-05-09 22:32 UTC · model grok-4.3

classification: 💻 cs.LG · cs.HC
keywords: human activity recognition · channel-free processing · IoT sensor fusion · conditional batch normalization · heterogeneous sensors · metadata conditioning · joint optimization

The pith

A single shared model can recognize human activities from any combination of IoT sensors by processing channels independently and using metadata to guide fusion.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that human activity recognition can operate without assuming any fixed number, order, or semantic arrangement of input channels from heterogeneous IoT sensors. It does so by encoding each channel separately, feeding sensor metadata such as body location and modality into a conditional batch normalization step for late fusion, and training with a joint loss on both per-channel and fused outputs. A sympathetic reader would care because conventional models tie their input layers to specific dataset channel templates, rendering them unusable when sensor setups change across devices or environments. The design therefore aims at reusable inference that preserves discriminability even as channel compositions vary.
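
Under these constraints the forward pass is straightforward to picture. Below is a minimal, hypothetical PyTorch sketch of the design as described: a shared per-channel encoder, a FiLM-style scale-and-shift standing in for the paper's conditional batch normalization (sketched separately further down), order-invariant mean pooling over channels, and separate per-channel and fused heads. Every layer width, the metadata vocabulary, and the pooling choice are my assumptions, not the paper's.

    # Hypothetical sketch only; layer sizes and metadata handling are assumed.
    import torch
    import torch.nn as nn

    class ChannelFreeHAR(nn.Module):
        def __init__(self, n_classes: int, n_meta: int, d: int = 64):
            super().__init__()
            # Shared 1-D conv encoder applied to every channel independently,
            # so no fixed channel count, order, or semantics is assumed.
            self.encoder = nn.Sequential(
                nn.Conv1d(1, d, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            self.meta_embed = nn.Embedding(n_meta, d)  # ids for location/modality/axis
            self.gamma = nn.Linear(d, d)  # FiLM-style stand-in for CBN
            self.beta = nn.Linear(d, d)
            self.channel_head = nn.Linear(d, n_classes)  # per-channel predictions
            self.fused_head = nn.Linear(d, n_classes)    # fused prediction

        def forward(self, x: torch.Tensor, meta: torch.Tensor):
            # x: (batch, channels, time); meta: (batch, channels) metadata ids.
            b, c, t = x.shape
            z = self.encoder(x.reshape(b * c, 1, t)).squeeze(-1)  # (b*c, d)
            m = self.meta_embed(meta.reshape(b * c))
            z = self.gamma(m) * z + self.beta(m)  # metadata conditioning
            per_channel = self.channel_head(z).reshape(b, c, -1)
            fused = self.fused_head(z.reshape(b, c, -1).mean(dim=1))  # order-invariant
            return per_channel, fused

    model = ChannelFreeHAR(n_classes=12, n_meta=32)
    x = torch.randn(4, 7, 128)              # 7 channels here; any count works
    meta = torch.randint(0, 32, (4, 7))
    per_channel, fused = model(x, meta)
    print(per_channel.shape, fused.shape)   # (4, 7, 12) and (4, 12)

Because the encoder weights are shared across channels and the fusion pools over the channel axis, the same parameters serve any channel composition, which is the property the paper calls strict channel-freedom.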

Core claim

The central claim is that strict channel-free HAR becomes feasible through channel-wise encoding paired with a shared encoder, metadata-conditioned late fusion via conditional batch normalization, and a combination loss that jointly optimizes individual channel predictions and the final fused result. Sensor metadata recovers structural relations that independent channel processing would otherwise discard, allowing one model to handle arbitrary channel counts and arrangements across datasets.

What carries the argument

Metadata-conditioned late fusion via conditional batch normalization, which adapts the fusion step using sensor details such as body location, modality, and axis to restore information lost when channels are processed independently.
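
Conditional batch normalization, introduced by de Vries et al. for modulating visual processing with language, normalizes features and then applies a scale and shift predicted from a conditioning vector. A minimal sketch under assumed sizes (the excerpt does not give the paper's exact parameterization):

    # Minimal CBN sketch; feature and condition widths are illustrative.
    import torch
    import torch.nn as nn

    class ConditionalBatchNorm1d(nn.Module):
        def __init__(self, num_features: int, cond_dim: int):
            super().__init__()
            # Normalize without learned affine; the condition supplies it.
            self.bn = nn.BatchNorm1d(num_features, affine=False)
            self.gamma = nn.Linear(cond_dim, num_features)  # per-feature scale
            self.beta = nn.Linear(cond_dim, num_features)   # per-feature shift

        def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
            # x: (batch, features); cond: (batch, cond_dim) metadata embedding.
            return self.gamma(cond) * self.bn(x) + self.beta(cond)

    cbn = ConditionalBatchNorm1d(num_features=64, cond_dim=24)
    features = torch.randn(8, 64)   # channel features from the shared encoder
    metadata = torch.randn(8, 24)   # embedded (location, modality, axis)
    print(cbn(features, metadata).shape)  # torch.Size([8, 64])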

Load-bearing premise

Sensor metadata such as body location, modality, and axis is available and sufficient to recover the structural information that channel-independent processing alone cannot retain.
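
If the premise holds, each channel's metadata reduces to a few small categorical fields. One plausible encoding, with assumed field names and vocabulary sizes, embeds each field and concatenates the results into the conditioning vector the CBN layer above consumes:

    # Assumed metadata fields and vocabulary sizes, for illustration only.
    import torch
    import torch.nn as nn

    class SensorMetadataEmbedding(nn.Module):
        def __init__(self, n_locations=8, n_modalities=4, n_axes=3, d=8):
            super().__init__()
            self.location = nn.Embedding(n_locations, d)   # e.g. wrist, chest, ankle
            self.modality = nn.Embedding(n_modalities, d)  # e.g. accel, gyro, magnetometer
            self.axis = nn.Embedding(n_axes, d)            # x / y / z

        def forward(self, loc, mod, axis):
            # Each argument: (batch,) integer ids for one channel's metadata.
            return torch.cat(
                [self.location(loc), self.modality(mod), self.axis(axis)], dim=-1)

    embed = SensorMetadataEmbedding()
    cond = embed(torch.tensor([0, 2]), torch.tensor([1, 0]), torch.tensor([2, 1]))
    print(cond.shape)  # torch.Size([2, 24]): the CBN condition above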

What would settle it

A direct comparison on the same heterogeneous datasets in which the version without metadata conditioning matches or exceeds the full model's accuracy and cross-dataset transfer performance.
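
Operationally that is a paired ablation over identical splits. A skeletal harness follows; `train_and_eval` is a hypothetical placeholder that a real run would replace with training the full model and the metadata-free variant on the same folds:

    # Skeleton only: train_and_eval is a stub standing in for real training.
    import random

    def train_and_eval(dataset: str, use_metadata: bool, seed: int) -> float:
        # Placeholder; a real implementation trains and returns test accuracy.
        random.seed(hash((dataset, use_metadata, seed)) & 0xFFFF)
        return random.uniform(0.6, 0.9)

    def ablation(datasets, seeds=(0, 1, 2)):
        results = {}
        for name in datasets:
            for use_meta in (True, False):
                accs = [train_and_eval(name, use_meta, s) for s in seeds]
                results[(name, use_meta)] = sum(accs) / len(accs)
        return results

    print(ablation(["PAMAP2", "UCI-HAR"]))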

Figures

Figures reproduced from arXiv: 2604.21369 by Tatsuhito Hasegawa.

Figure 1: Comparison of EF, MF, and LF for channel-free HAR. …
Figure 2: Architecture of the proposed channel-free HAR model. …
Figure 3: Accuracy (%) distributions on PAMAP2 for Baseline, EF, MF, LF, …
Figure 4: Inference time as a function of the number of input channels for …
Figure 5: Accuracy as a function of perturbation intensity under six conditions on PAMAP2 (LOSO-CV; shaded: …)
Figure 6: Sensitivity analysis on PAMAP2 (single trial, error bars: …)
Original abstract

Human activity recognition (HAR) in Internet of Things (IoT) environments must cope with heterogeneous sensor settings that vary across datasets, devices, body locations, sensing modalities, and channel compositions. This heterogeneity makes conventional channel-fixed models difficult to reuse across sensing environments because their input representations are tightly coupled to predefined channel structures. To address this problem, we investigate strict channel-free HAR, in which a single shared model performs inference without assuming a fixed number, order, or semantic arrangement of input channels, and without relying on sensor-specific input layers or dataset-specific channel templates. We argue that fusion design is the central issue in this setting. Accordingly, we propose a channel-free HAR framework that combines channel-wise encoding with a shared encoder, metadata-conditioned late fusion via conditional batch normalization, and joint optimization of channel-level and fused predictions through a combination loss. The proposed model processes each channel independently to handle varying channel configurations, while sensor metadata such as body location, modality, and axis help recover structural information that channel-independent processing alone cannot retain. In addition, the joint loss encourages both the discriminability of individual channels and the consistency of the final fused prediction. Experiments on PAMAP2, together with robustness analysis on six HAR datasets, ablation studies, sensitivity analysis, efficiency evaluation, and cross-dataset transfer learning, demonstrate three main findings...

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes a channel-free HAR framework for heterogeneous IoT sensor environments that processes each input channel independently via a shared encoder, performs metadata-conditioned late fusion using conditional batch normalization (conditioned on sensor metadata such as body location, modality, and axis), and jointly optimizes channel-level and fused predictions with a combination loss. This design aims to enable a single reusable model that handles arbitrary channel counts, orders, and compositions without fixed input layers or dataset-specific templates. The approach is evaluated via experiments on PAMAP2, robustness analysis across six HAR datasets, ablation studies, sensitivity analysis, efficiency evaluation, and cross-dataset transfer learning, with the abstract indicating these demonstrate three main findings on the fusion design's effectiveness.

Significance. If the empirical results hold and the framework generalizes, this could be significant for practical HAR deployment in variable IoT settings, as it reduces the need for per-environment model redesign or retraining. The combination of channel-independent processing with metadata-driven inductive biases via CBN offers a concrete mechanism to retain structural information without sacrificing flexibility, and the joint loss provides a principled way to balance per-channel discriminability with fused consistency. Cross-dataset transfer experiments are a positive element for assessing real-world reusability.

major comments (1)
  1. [Robustness analysis on six HAR datasets] Robustness analysis and ablation studies sections: The central 'strict channel-free' claim depends on the assumption that complete, accurate sensor metadata is always available at inference to condition the CBN fusion and recover structural priors discarded by channel-wise encoding. However, the described PAMAP2 and cross-dataset experiments supply full metadata by construction, with no reported ablation using masked, noisy, or absent metadata. This leaves untested the regime where the late-fusion path would collapse to the weaker channel-independent baseline, which is load-bearing for claims about arbitrary heterogeneous IoT deployments.
minor comments (1)
  1. [Abstract] The abstract states that the experiments 'demonstrate three main findings' but does not enumerate or summarize those findings, which reduces clarity when assessing whether the data supports the central claims.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment regarding the robustness analysis and metadata assumptions below.

Point-by-point responses
  1. Referee: [Robustness analysis on six HAR datasets] Robustness analysis and ablation studies sections: The central 'strict channel-free' claim depends on the assumption that complete, accurate sensor metadata is always available at inference to condition the CBN fusion and recover structural priors discarded by channel-wise encoding. However, the described PAMAP2 and cross-dataset experiments supply full metadata by construction, with no reported ablation using masked, noisy, or absent metadata. This leaves untested the regime where the late-fusion path would collapse to the weaker channel-independent baseline, which is load-bearing for claims about arbitrary heterogeneous IoT deployments.

    Authors: We appreciate this observation. In the proposed framework, sensor metadata (body location, modality, axis) is treated as auxiliary configuration information that is known a priori in IoT deployments and supplied at inference; it is not inferred from the raw signals. The 'strict channel-free' property refers specifically to the absence of fixed input-layer assumptions or dataset-specific channel templates, allowing arbitrary channel counts/orders/compositions via per-channel encoding. Metadata-conditioned CBN then injects the structural priors needed for effective late fusion. We agree that an explicit test of the fallback regime is valuable. In the revision we will add an ablation that simulates missing/noisy metadata (randomly masking 30% of fields and injecting categorical noise) across PAMAP2 and two additional datasets, reporting both fused and per-channel accuracies to quantify graceful degradation to the channel-independent baseline. This will be included in the robustness analysis section without changing the core experimental claims. revision: partial
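
For concreteness, the corruption the rebuttal proposes (masking 30% of metadata fields, plus categorical noise) could look like the sketch below; the reserved UNK id and the 10% noise rate are assumptions, not values from the paper:

    # Hypothetical metadata corruption for the proposed ablation.
    import torch

    UNK = 0  # assumed reserved id meaning "metadata unavailable"

    def corrupt_metadata(meta: torch.Tensor, vocab_size: int,
                         mask_rate: float = 0.3, noise_rate: float = 0.1) -> torch.Tensor:
        out = meta.clone()
        # Mask ~30% of fields, as in the rebuttal.
        mask = torch.rand_like(meta, dtype=torch.float) < mask_rate
        out[mask] = UNK
        # Flip a further fraction of surviving fields to random categories.
        noisy = (torch.rand_like(meta, dtype=torch.float) < noise_rate) & ~mask
        out[noisy] = torch.randint(1, vocab_size, (int(noisy.sum()),))
        return out

    meta = torch.randint(1, 32, (4, 7))  # (batch, channels) metadata ids
    print(corrupt_metadata(meta, vocab_size=32))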

Circularity Check

0 steps flagged

No significant circularity; proposal is self-contained methodological design

Full rationale

The paper introduces a channel-free HAR framework as an explicit architectural proposal combining independent channel encoding, a shared encoder, metadata-conditioned late fusion via conditional batch normalization, and a joint loss for channel-level and fused predictions. This construction is presented as a new design choice whose value is demonstrated empirically on PAMAP2 and cross-dataset experiments rather than derived from or reduced to prior fitted parameters, self-citations, or definitional loops. The role of sensor metadata (body location, modality, axis) is stated as an inductive bias to recover structure, not as a hidden redefinition of the input channels or a prediction forced by construction. No equations or claims in the provided text exhibit self-definitional reduction, fitted-input-as-prediction, or load-bearing self-citation chains. The framework remains falsifiable via ablation on metadata availability, consistent with a non-circular proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the effectiveness of metadata-conditioned fusion and joint optimization. No new physical entities are postulated. The approach assumes metadata availability and that the proposed loss improves both per-channel and fused discriminability.

axioms (2)
  • domain assumption: Sensor metadata (body location, modality, axis) is available at both training and inference time.
    Required to condition the late fusion step via conditional batch normalization.
  • domain assumption: Joint optimization via a combination loss simultaneously improves channel-level discriminability and fused prediction consistency.
    The training procedure is built on this assumption; a plausible form of the loss is sketched below.
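
A plausible form of that combination loss, with a hypothetical weight `lam` (the excerpt does not state the paper's exact weighting): cross-entropy on the fused prediction plus the mean cross-entropy over per-channel predictions.

    # Assumed form of the combination loss; the weighting is hypothetical.
    import torch
    import torch.nn.functional as F

    def combination_loss(per_channel: torch.Tensor, fused: torch.Tensor,
                         target: torch.Tensor, lam: float = 0.5) -> torch.Tensor:
        # per_channel: (batch, channels, classes); fused: (batch, classes).
        b, c, k = per_channel.shape
        channel_loss = F.cross_entropy(per_channel.reshape(b * c, k),
                                       target.repeat_interleave(c))
        fused_loss = F.cross_entropy(fused, target)
        return fused_loss + lam * channel_loss

    loss = combination_loss(torch.randn(4, 7, 12), torch.randn(4, 12),
                            torch.randint(0, 12, (4,)))
    print(loss.item())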

pith-pipeline@v0.9.0 · 5537 in / 1328 out tokens · 31009 ms · 2026-05-09T22:32:35.361796+00:00 · methodology

