Counting Machine Parts
Pith reviewed 2026-05-20 12:47 UTC · model grok-4.3
The pith
Extending FamNet with an added loss term counts machine washer parts at 1.96 mean absolute error.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors extend FamNet by adding a custom loss component and train the resulting model on a dataset of machine washer images. When evaluated against ground-truth counts, the trained network produces a mean absolute error of 1.96 and a root-mean-squared error that is also lower than the errors obtained from a classical image-processing pipeline, an instance-segmentation baseline, and a standard density-map estimator.
What carries the argument
FamNet extended by one additional loss term that is minimized together with the original density-map loss during training on washer images.
If this is right
- The same training recipe can be applied to other small, uniformly shaped industrial parts once a modest labeled set is collected.
- The lower error relative to density-map estimation alone indicates that the extra loss term supplies useful supervisory signal for this scale of object.
- Because the method builds directly on an existing counting architecture, it can be swapped into any pipeline that already uses FamNet without redesigning the rest of the system.
Where Pith is reading between the lines
- If the added loss term is shown to be portable across object classes, the same modification could improve counting performance on other dense, repetitive scenes such as screws, bolts, or electronic components.
- Pairing the counter with a simple foreground mask could further reduce errors caused by background clutter that the current dataset may not fully capture.
- A production version could be deployed on an edge device attached to a conveyor belt, turning the reported accuracy into real-time inventory updates.
Load-bearing premise
The collection of washer images used for training and testing is representative enough of real factory conditions that the measured error will not rise sharply on new photographs.
What would settle it
Run the trained model on a fresh set of washer photographs taken under different lighting, with heavier overlaps, or from a different camera angle and record whether the mean absolute error stays near 1.96 or climbs well above 2.
Figures
read the original abstract
Counting objects in an image is a task applicable across many domains. For instance, crowd counting, inventory counting, and cell counting have been the focus of recent research. The major challenges in estimating the count of objects include overlapping objects, object scale issues, occlusions, and varying lighting conditions. In this report, we explore the problem of counting machine washer parts. Our technique is an extension of FamNet with an additional loss component, trained on the given dataset. We compare to three baseline methods: a traditional image processing pipeline, instance segmentation, and density map estimation. We evaluate the performance of these algorithms by computing the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE) between the true object counts and the model outputs. Our approach achieves a performance of 1.96 MAE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper extends FamNet with an additional loss term for counting machine washer parts in images. It compares the approach to three baselines (traditional image processing pipeline, instance segmentation, and density map estimation) and reports a mean absolute error (MAE) of 1.96 and root mean squared error (RMSE) on the provided dataset of washer images.
Significance. If the 1.96 MAE reflects performance on a held-out test set with proper controls for overfitting, the work could offer a modest practical advance for domain-specific object counting in industrial settings. The addition of a loss term is a standard technique, but without details on dataset size, splits, or hyperparameters, the significance of the improvement over baselines remains difficult to evaluate.
major comments (1)
- [Abstract and Evaluation] The central performance claim of 1.96 MAE is not supported by any description of a train/test split or confirmation that evaluation images were excluded from training. The abstract and evaluation description provide no dataset size, split details, training hyperparameters, or error bars, leaving open the possibility that the metric was computed on training images and does not demonstrate generalization from the added loss term.
minor comments (2)
- [Evaluation] The manuscript should include the total number of images, the train/validation/test split ratios, and whether the reported MAE excludes the training distribution.
- [Methods] Training hyperparameters (learning rate, batch size, number of epochs) and the exact formulation of the additional loss term should be provided to allow reproduction.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the major comment on the evaluation details below and will revise the paper accordingly to improve clarity and rigor.
read point-by-point responses
-
Referee: [Abstract and Evaluation] The central performance claim of 1.96 MAE is not supported by any description of a train/test split or confirmation that evaluation images were excluded from training. The abstract and evaluation description provide no dataset size, split details, training hyperparameters, or error bars, leaving open the possibility that the metric was computed on training images and does not demonstrate generalization from the added loss term.
Authors: We agree that the manuscript lacks sufficient details on the dataset and evaluation protocol, which is necessary to properly evaluate the generalization of the added loss term. In the revised version, we will expand the evaluation section to describe the full dataset size, the train/test split with explicit confirmation that test images were held out during training, the training hyperparameters, and error bars from multiple runs. This will support the reported 1.96 MAE as a measure of performance on unseen data. revision: yes
Circularity Check
No circularity: empirical MAE is independent of model internals
full rationale
The manuscript reports a standard empirical performance metric (MAE of 1.96) obtained by comparing model outputs to ground-truth object counts on the provided washer dataset. No derivation chain, mathematical prediction, or first-principles result is claimed that reduces to fitted parameters, self-definitions, or self-citations. The evaluation step is a direct, external comparison against held-out labels and remains self-contained; the result does not loop back to any quantity defined inside the FamNet extension or loss term.
Axiom & Free-Parameter Ledger
free parameters (1)
- weight of additional loss term
axioms (1)
- domain assumption FamNet architecture transfers effectively to machine-part images when augmented with one extra loss
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Our technique is an extension of FamNet with an additional loss component... loss = density MSE loss + λ * mismatch loss
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We use a train-dev-test split of 80:10:10... FamNet [12] + Angle Aggregation (Mean) 1.96 2.70
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Alison Noble, and An- drew Zisserman
Carlos Arteta, Victor Lempitsky, J. Alison Noble, and An- drew Zisserman. Interactive object counting. InEuropean Conference on Computer Vision, 2014. 2
work page 2014
-
[2]
An Image Processing based Object Counting Approach for Machine Vision Application
Mehmet Baygin, Mehmet Karak ¨ose, Alisan Sarimaden, and Erhan Akin. An image processing based object count- ing approach for machine vision application.CoRR, abs/1802.05911, 2018. 1, 3, 6
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[3]
Kaiming He, Georgia Gkioxari, Piotr Doll ´ar, and Ross Gir- shick. Mask r-cnn. In2017 IEEE International Conference on Computer Vision (ICCV), pages 2980–2988, 2017. 4
work page 2017
-
[4]
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition.arXiv preprint arXiv:1512.03385, 2015. 2, 4
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[5]
Attention scaling for crowd counting
Xiaoheng Jiang, Li Zhang, Mingliang Xu, Tianzhu Zhang, Pei Lv, Bing Zhou, Xin Yang, and Yanwei Pang. Attention scaling for crowd counting. In2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4705–4714, 2020. 1, 2, 3
work page 2020
-
[6]
Learning to count objects in images
Victor Lempitsky and Andrew Zisserman. Learning to count objects in images. InAdvances in Neural Information Pro- cessing Systems, 2010. 2, 3
work page 2010
-
[7]
Feature pyramid networks for object detection
Tsung-Yi Lin, Piotr Doll ´ar, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature pyramid networks for object detection. In2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 936–944, 2017. 4
work page 2017
-
[8]
Srinivasan, Matthew Tancik, Jonathan T
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis, 2020. 6
work page 2020
-
[9]
Indoor segmentation and support inference from rgbd images
Pushmeet Kohli Nathan Silberman, Derek Hoiem and Rob Fergus. Indoor segmentation and support inference from rgbd images. InECCV, 2012. 6
work page 2012
-
[10]
Towards perspective-free object counting with deep learning
Daniel O ˜noro and Roberto L ´opez-Sastre. Towards perspective-free object counting with deep learning. volume 9911, 10 2016. 2, 3
work page 2016
-
[11]
Vi- sion transformers for dense prediction.ArXiv preprint, 2021
Ren ´e Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vi- sion transformers for dense prediction.ArXiv preprint, 2021. 5, 6
work page 2021
-
[12]
Viresh Ranjan, Udbhav Sharma, Thua Nguyen, and Minh Hoai. Learning to count everything.2021 IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition (CVPR), pages 3393–3402, 2021. 1, 2, 3, 4, 6
work page 2021
-
[13]
Dong Kyun Shin, Minhaz Uddin Ahmed, and Phil Kyu Rhee. Incremental deep learning for robust object detection in unknown cluttered environments.IEEE Access, 6:61748– 61760, 2018. 2
work page 2018
-
[14]
Vishwanath A. Sindagi and Vishal M. Patel. A survey of recent advances in cnn-based single image crowd counting and density estimation.Pattern Recognition Letters, 107:3– 16, 2018. Video Surveillance-oriented Biometrics. 1
work page 2018
-
[15]
Jia Wan, Ziquan Liu, and Antoni B. Chan. A generalized loss function for crowd counting and localization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition (CVPR), pages 1974–1983, June 2021. 2, 3
work page 1974
-
[16]
Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2.https://github. com/facebookresearch/detectron2, 2019. 4, 6
work page 2019
-
[17]
Reverse perspective network for perspective- aware object counting
Yifan Yang, Guorong Li, Zhe Wu, Li Su, Qingming Huang, and Nicu Sebe. Reverse perspective network for perspective- aware object counting. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020. 1, 2 7
work page 2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.