pith. machine review for the scientific record.

arxiv: 2604.21893 · v1 · submitted 2026-04-23 · stat.ML · cs.LG · q-fin.RM

Recognition: unknown

Revealing Geography-Driven Signals in Zone-Level Claim Frequency Models: An Empirical Study using Environmental and Visual Predictors

Cristián Bravo, Kristina G. Stankova, Sherly Alfonso-Sánchez

Pith reviewed 2026-05-08 14:08 UTC · model grok-4.3

classification stat.ML · cs.LG · q-fin.RM
keywords geographic information · claim frequency modeling · motor insurance · environmental features · image embeddings · zone-level models · predictive accuracy · MTPL

The pith

Geographic features from maps and imagery improve accuracy in zone-level motor insurance claim frequency models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether geographic context can strengthen predictions of motor insurance claims when individual location details are scarce in public data. It tests this on Belgian zone-level records by feeding coordinates, land-cover indicators from open maps, and image embeddings into linear models and gradient-boosted trees. Sympathetic readers would care because clearer location signals could support more accurate risk pricing without requiring private address data. The experiments show gains across model types, with the largest lifts coming from combining coordinates and moderate-scale environmental features.

Core claim

Geographic information constructed from OpenStreetMap indicators, CORINE Land Cover, and Belgian orthoimagery augments standard actuarial variables to raise predictive accuracy in zone-level Motor Third Party Liability claim frequency models. Linear and tree-based models both improve, with the strongest results from latitude-longitude paired with environmental features at the 5 km scale; smaller neighborhoods still help baselines. Image embeddings add value mainly when environmental features are unavailable, and overall performance hinges more on how geography is represented than on model complexity.

What carries the argument

The argument rests on zone-level aggregation of claims, paired with constructed geographic predictors (coordinates, scale-specific environmental features, and pretrained vision-transformer embeddings) that are added to GLM, regularized GLM, and gradient-boosted tree baselines.
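
A minimal sketch of what zone-level aggregation could look like, assuming policy-level records carry a postcode, a claim count, and an exposure. The field names and toy values are hypothetical and are not taken from beMTPL97; the paper's actual preprocessing may differ.

```python
from collections import defaultdict


def aggregate_by_zone(policies):
    """Sum claims and exposure per postcode and attach the zone claim frequency."""
    totals = defaultdict(lambda: {"claims": 0, "exposure": 0.0})
    for p in policies:
        zone = totals[p["postcode"]]
        zone["claims"] += p["claims"]
        zone["exposure"] += p["exposure"]
    # Frequency is claims per unit of exposure; zones without exposure are dropped.
    return {
        pc: {**t, "frequency": t["claims"] / t["exposure"]}
        for pc, t in totals.items()
        if t["exposure"] > 0
    }


# Toy policy records (invented values, for illustration only).
policies = [
    {"postcode": 1000, "claims": 1, "exposure": 1.0},
    {"postcode": 1000, "claims": 0, "exposure": 0.5},
    {"postcode": 1140, "claims": 2, "exposure": 2.0},
]
zones = aggregate_by_zone(policies)
```

Each zone then becomes one modeling row, with exposure available as an offset or weight for a frequency model.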

If this is right

  • Coordinates combined with 5 km environmental features deliver the largest accuracy lift for both linear and tree-based models.
  • Environmental features at smaller neighborhood scales still improve baseline specifications.
  • Pretrained image embeddings raise accuracy and stability for regularized GLMs only when environmental features are absent.
  • The predictive contribution of geography depends less on model type than on the chosen representation of location.
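
To make the idea of a scale-specific environmental feature concrete, the sketch below counts points of interest within a given great-circle distance of a zone centroid. The centroid is the postcode 1140 coordinate quoted in Figure 8; the point list, the function names, and the 1 km comparison radius are invented for illustration. The paper's actual OpenStreetMap and CORINE feature construction is not reproduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_within_radius(center, points, radius_km):
    """Count how many points fall within radius_km of the centre."""
    lat0, lon0 = center
    return sum(1 for lat, lon in points if haversine_km(lat0, lon0, lat, lon) <= radius_km)


center = (50.87064, 4.39674)  # postcode 1140, as given in Figure 8
# Hypothetical points of interest (for example, road junctions); not real data.
pois = [(50.871, 4.397), (50.88, 4.40), (50.90, 4.42), (51.20, 4.40)]

features = {f"poi_count_{r}km": count_within_radius(center, pois, r) for r in (1, 5)}
```

Computing the same indicator at several radii gives one feature column per scale, which is the kind of comparison the 5 km finding refers to.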

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same open-data approach could be tested in other insurance lines where location influences risk but detailed addresses are restricted.
  • Moving from zone aggregates to policy-level data might show whether the geographic signals strengthen or weaken without averaging.
  • Widespread use of these public sources could lower dependence on proprietary location datasets for actuarial work.

Load-bearing premise

The observed accuracy gains truly reflect location-based risk differences rather than dataset-specific correlations or the particular feature construction choices.

What would settle it

Repeating the same feature additions on an independent insurance dataset from another country or time period and finding no gain or a loss in held-out predictive metrics.

Figures

Figures reproduced from arXiv: 2604.21893 by Cristián Bravo, Kristina G. Stankova, Sherly Alfonso-Sánchez.

Figure 1: ResNet blocks. Because the images used in our experiments are grayscale, we adapt the first convolutional layer from three input channels to one, by averaging the pretrained RGB weights across the channel dimension. All subsequent layers of the ResNet18 backbone remain unchanged. After the final residual stage, a global average pooling layer produces a 512 dimensional embedding that summarizes the visual i…
Figure 2: General scheme of the employed ResNet18 model.
Figure 3: Extended cross validation scheme. In the following subsections, we introduce the tabular data used in the analysis, the construction of the environmental features, the acquisition of the imagery, and the role of image embeddings in the modeling framework. 3.1 Tabular Data The original dataset used in our research is the 1997 Belgian MTPL dataset (beMTPL97), available in the R package CASdatasets (Dutang, C…
Figure 4: Histogram of the aggregated number of claims and exposure.
Figure 5: Histogram of frequency and summary statistics.
Figure 6: Histogram and bar plots of some environmental features at radius 5 km
Figure 7: Orthoimages tiles centered on the given postcodes’ coordinates, with apothem lengths of 0.5 km, 1 km and 3 km.
Figure 8: Detailed example for the squares generated for the postcode 1140 with latitude and longitude, 50.87064 N and 4.39674 E, respectively.
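
The tile geometry described in Figures 7 and 8 can be sketched as follows: a square centered on a postcode coordinate, whose apothem (center-to-edge distance) is 0.5 km, 1 km, or 3 km. This is only an illustration under a spherical-Earth assumption; the function name is hypothetical, and the paper may well use a projected coordinate system instead.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def square_tile_bounds(lat, lon, apothem_km):
    """Return (south, west, north, east) in degrees for a square tile.

    The apothem is the distance from the tile centre to the middle of an edge,
    so the tile side is twice the apothem.
    """
    dlat = math.degrees(apothem_km / EARTH_RADIUS_KM)
    # A degree of longitude shrinks with latitude, hence the cosine correction.
    dlon = math.degrees(apothem_km / (EARTH_RADIUS_KM * math.cos(math.radians(lat))))
    return (lat - dlat, lon - dlon, lat + dlat, lon + dlon)


# Postcode 1140 coordinates from Figure 8, with the three apothems from Figure 7.
tiles = {a: square_tile_bounds(50.87064, 4.39674, a) for a in (0.5, 1.0, 3.0)}
```

The three bounding boxes are nested, so each larger tile contains the smaller ones around the same center.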
Original abstract

Geographic context is often consider relevant to motor insurance risk, yet public actuarial datasets provide limited location identifiers, constraining how this information can be incorporated and evaluated in claim-frequency models. This study examines how geographic information from alternative data sources can be incorporated into actuarial models for Motor Third Party Liability (MTPL) claim prediction under such constraints. Using the BeMTPL97 dataset, we adopt a zone-level modeling framework and evaluate predictive performance on unseen postcodes. Geographic information is introduced through two channels: environmental indicators from OpenStreetMap and CORINE Land Cover, and orthoimagery released by the Belgian National Geographic Institute for academic use. We evaluate the predictive contribution of coordinates, environmental features, and image embeddings across three baseline models: generalized linear models (GLMs), regularized GLMs, and gradient-boosted trees, while raw imagery is modeled using convolutional neural networks. Our results show that augmenting actuarial variables with constructed geographic information improves accuracy. Across experiments, both linear and tree-based models benefit most from combining coordinates with environmental features extracted at 5 km scale, while smaller neighborhoods also improve baseline specifications. Generally, image embeddings do not improve performance when environmental features are available; however, when such features are absent, pretrained vision-transformer embeddings enhance accuracy and stability for regularized GLMs. Our results show that the predictive value of geographic information in zone-level MTPL frequency models depends less on model complexity than on how geography is represented, and illustrate that geographic context can be incorporated despite limited individual-level spatial information.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This paper investigates incorporating geographic information from public sources (OpenStreetMap, CORINE Land Cover, Belgian orthoimagery) into zone-level MTPL claim-frequency models on the BeMTPL97 dataset. It evaluates GLMs, regularized GLMs, and gradient-boosted trees on unseen postcodes, claiming that augmenting actuarial baselines with coordinates plus environmental features (especially at 5 km scale) improves accuracy, while image embeddings help mainly when environmental features are absent; the predictive value depends more on geography representation than model complexity.

Significance. If the reported gains hold after addressing selection and leakage concerns, the work would provide actionable evidence that alternative geographic data can enhance actuarial models when individual-level location identifiers are limited. It would illustrate practical trade-offs between feature construction and model class, with potential to inform risk pricing in motor insurance using publicly available spatial layers.

major comments (2)
  1. [Abstract] The central claim that 'both linear and tree-based models benefit most from combining coordinates with environmental features extracted at 5 km scale' is presented without any quantitative metrics (e.g., change in Poisson deviance, log-loss, or AUC), confidence intervals, or a table of results across all tested scales; this omission makes it impossible to judge the magnitude or robustness of the improvement that underpins the paper's main contribution.
  2. [Evaluation methodology] (Implied in abstract and results description.) Hold-out on 'unseen postcodes' is not described as spatially blocked or geographically stratified; because environmental features are extracted from fixed external maps at fixed radii, random postcode splits permit spatial autocorrelation leakage between train and test sets, which directly threatens the claim that observed gains reflect genuine location-based risk signals rather than correlated predictors.
minor comments (2)
  1. [Abstract] Grammatical error ('is often consider relevant' should be 'is often considered relevant').
  2. [Abstract] The statement 'smaller neighborhoods also improve baseline specifications' is imprecise; it should specify the radii tested and report the corresponding performance deltas to allow readers to assess the scale-sensitivity claim.
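
For readers unfamiliar with the metric named in the first major comment, here is a small sketch of mean Poisson deviance and of the relative reduction between a baseline and an augmented model. The function names and all numbers are illustrative assumptions; none of the values come from the paper.

```python
import math


def mean_poisson_deviance(y_true, y_pred):
    """Mean Poisson deviance: 2 * mean(y * log(y / mu) - (y - mu)), with 0 * log(0) = 0."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, mu in zip(y_true, y_pred):
        if mu <= 0:
            raise ValueError("predicted means must be strictly positive")
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (log_term - (y - mu))
    return total / len(y_true)


def relative_reduction(baseline, candidate):
    """Fraction by which the candidate lowers the baseline deviance."""
    return (baseline - candidate) / baseline


# Toy held-out claim counts and predicted counts (invented for illustration).
observed = [0, 2, 1, 5]
baseline_pred = [1.0, 1.0, 1.0, 3.0]
augmented_pred = [0.5, 1.5, 1.0, 4.0]

d_base = mean_poisson_deviance(observed, baseline_pred)
d_aug = mean_poisson_deviance(observed, augmented_pred)
gain = relative_reduction(d_base, d_aug)
```

Reporting `gain` per feature set and per radius, ideally with an interval across cross-validation folds, is the kind of quantitative summary the comment asks for.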

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed and constructive comments. We address each major comment below, indicating where revisions will be made to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central claim that 'both linear and tree-based models benefit most from combining coordinates with environmental features extracted at 5 km scale' is presented without any quantitative metrics (e.g., change in Poisson deviance, log-loss, or AUC), confidence intervals, or a table of results across all tested scales; this omission makes it impossible to judge the magnitude or robustness of the improvement that underpins the paper's main contribution.

    Authors: We agree that the abstract would be strengthened by including quantitative indicators of the reported improvements. The results section of the manuscript already contains tables and figures with performance metrics (Poisson deviance, log-loss) across models, feature sets, and spatial scales. In the revised version we will update the abstract to report the key numerical gains (e.g., relative reduction in Poisson deviance for the best coordinate-plus-5 km environmental configuration versus the actuarial baseline) and will explicitly reference the corresponding results table. revision: yes

  2. Referee: [Evaluation methodology] (Implied in abstract and results description.) Hold-out on 'unseen postcodes' is not described as spatially blocked or geographically stratified; because environmental features are extracted from fixed external maps at fixed radii, random postcode splits permit spatial autocorrelation leakage between train and test sets, which directly threatens the claim that observed gains reflect genuine location-based risk signals rather than correlated predictors.

    Authors: We acknowledge the validity of this concern. The current evaluation splits postcodes into train and test sets to evaluate performance on unseen locations, but a purely random postcode split does not explicitly enforce spatial separation. Because environmental features are derived from fixed-radius buffers, nearby postcodes can share highly correlated predictors, raising the possibility of leakage. In the revision we will replace the simple hold-out with a spatially blocked or geographically stratified procedure (e.g., blocking by larger administrative units or using a distance-based split) and will report the updated performance metrics together with a discussion of how this change affects the interpretation of the geographic signals. revision: yes
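
One way the spatially blocked procedure discussed above could be sketched is to assign postcodes to coarse latitude/longitude grid cells and hold out whole cells. The block size, the function names, and the toy coordinates (apart from postcode 1140, taken from Figure 8) are assumptions made for illustration; this is not the authors' procedure.

```python
import math
import random


def block_id(lat, lon, block_deg):
    """Grid cell index for a coordinate, using square cells of block_deg degrees."""
    return (math.floor(lat / block_deg), math.floor(lon / block_deg))


def spatial_block_split(zones, block_deg=0.25, test_fraction=0.2, seed=0):
    """Split postcodes so that every grid block is entirely in train or entirely in test.

    zones maps postcode -> (lat, lon). Returns (train_postcodes, test_postcodes).
    """
    blocks = {}
    for pc, (lat, lon) in zones.items():
        blocks.setdefault(block_id(lat, lon, block_deg), []).append(pc)
    ids = sorted(blocks)
    random.Random(seed).shuffle(ids)  # seeded shuffle keeps the split reproducible
    n_test = max(1, round(test_fraction * len(ids)))
    test_ids = set(ids[:n_test])
    train = sorted(pc for b in ids if b not in test_ids for pc in blocks[b])
    test = sorted(pc for b in test_ids for pc in blocks[b])
    return train, test


# Toy zone centroids; only postcode 1140 uses the coordinate quoted in Figure 8.
zones = {
    1000: (50.85, 4.35), 1140: (50.87064, 4.39674),
    2000: (51.22, 4.40), 2018: (51.20, 4.42),
    4000: (50.64, 5.57), 4020: (50.63, 5.59),
}
train, test = spatial_block_split(zones, block_deg=0.25, test_fraction=0.34, seed=0)
```

Grid blocking alone does not guarantee a minimum distance between train and test zones: postcodes on either side of a cell boundary can still share most of a 5 km buffer. A distance-based exclusion band around the test blocks would be a stricter variant.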

Circularity Check

0 steps flagged

No significant circularity in empirical geographic feature augmentation

The paper is a standard empirical ML study on the BeMTPL97 dataset. It augments zone-level claim frequency models with coordinates, environmental features from public sources (OpenStreetMap, CORINE), and image embeddings, then reports predictive performance on held-out postcodes using GLMs, regularized GLMs, and gradient-boosted trees. No mathematical derivation chain exists that reduces predictions or results to inputs by construction. No self-citations are load-bearing, no fitted parameters are relabeled as independent predictions, and no ansatzes or uniqueness theorems are invoked. The reported accuracy gains are data-driven outcomes evaluated against external benchmarks, making the analysis self-contained.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard statistical assumptions plus domain assumptions about geographic relevance to insurance risk; free parameters include chosen spatial scales and model hyperparameters.

free parameters (2)
  • environmental feature extraction scale (5 km)
    Selected neighborhood size for feature aggregation; performance reported as best but chosen from tested options.
  • model hyperparameters (regularization, tree parameters)
    Tuned or selected to optimize performance on the dataset.
axioms (2)
  • domain assumption: Zone-level aggregation preserves predictive signals without introducing bias that geographic features merely compensate for
    Invoked by the zone-level modeling framework choice.
  • domain assumption: Environmental and visual features from public sources are causally or predictively relevant to claim frequency
    Core premise for testing their contribution.

pith-pipeline@v0.9.0 · 5596 in / 1393 out tokens · 42476 ms · 2026-05-08T14:08:08.897351+00:00 · methodology

Reference graph

Works this paper leans on

55 extracted references · 42 canonical work pages

  1. [1]

    Long term care intercompany study

    Long term care intercompany study, January 2015. URL https://www.soa.org/4a6a75/globalassets/assets/files/resources/experience-studies/2019/ltc-intercompany-study.pdf. Accessed: 2025-02-14

  2. [2]

    A survey of cross-validation procedures for model selection

    Arlot, S. and Celisse, A. A survey of cross-validation procedures for model selection. Statistics Surveys, 4: 40--79, 2010. ISSN 1935-7516. doi:10.1214/09-SS054

  3. [3]

    Geo-insurance: Improving big data challenges in the context of insurance services using a geographical information system (gis)

    Asabere, N. Y., Asare, I. O., Lawson, G., Balde, F., Duodu, N. Y., Tsoekeku, G., Afriyie, P. O., and Ganiu, A. R. A. Geo-insurance: Improving big data challenges in the context of insurance services using a geographical information system (gis). Human Behavior and Emerging Technologies, 2024(1): 9015012, 2024. doi:10.1155/2024/9015012

  4. [4]

    Ayuso, M., Guillen, M., and Nielsen, J. P. Improving automobile insurance ratemaking using telematics: incorporating mileage and driver behaviour data. Transportation, 46(3): 735--752, 2019. doi:10.1007/s11116-018-9890-7

  5. [5]

    Traditional versus ai-based fraud detection: cost efficiency in the field of automobile insurance

    Benedek, B. and Nagy, B. Z. Traditional versus ai-based fraud detection: cost efficiency in the field of automobile insurance. Financial and Economic Review, 22(2): 77--98, 2023. doi:10.33893/FER.22.2.77

  6. [6]

    Deep learning, volume 1

    Bengio, Y., Goodfellow, I., Courville, A., et al. Deep learning, volume 1. MIT press Cambridge, MA, USA, 2017

  7. [7]

    Ai revolution in insurance: bridging research and reality

    Bhattacharya, S., Castignani, G., Masello, L., and Sheehan, B. Ai revolution in insurance: bridging research and reality. Frontiers in Artificial Intelligence, 8: 1568266, 2025. doi:10.3389/frai.2025.1568266

  8. [8]

    Geographic ratemaking with spatial embeddings

    Blier-Wong, C., Cossette, H., Lamontagne, L., and Marceau, E. Geographic ratemaking with spatial embeddings. ASTIN Bulletin: The Journal of the IAA, 52(1): 1--31, 2022. doi:10.1017/asb.2021.25

  9. [9]

    A representation-learning approach for insurance pricing with images

    Blier-Wong, C., Lamontagne, L., and Marceau, E. A representation-learning approach for insurance pricing with images. ASTIN Bulletin: The Journal of the IAA, 54(2): 280--309, 2024. doi:10.1017/asb.2024.9

  10. [10]

    Modelling mtpl insurance claim events: Can machine learning methods overperform the traditional glm approach?

    Burka, D., Kovács, L., and Szepesváry, L. Modelling mtpl insurance claim events: Can machine learning methods overperform the traditional glm approach? Hungarian Statistical Review, 4(2), 2021. doi:10.35618/hsr2021.02.en034

  11. [11]

    Belgian national geospatial data portal

    Cartesius / National Geographic Institute (NGI Belgium). Belgian national geospatial data portal. https://www.cartesius.be. Accessed 04.12.2025

  12. [12]

    Xgboost: A scalable tree boosting system

    Chen, T. and Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pages 785--794, 2016. doi:10.1145/2939672.2939785

  13. [13]

    Modelling motor insurance claim frequency and severity using gradient boosting

    Clemente, C., Guerreiro, G. R., and Bravo, J. M. Modelling motor insurance claim frequency and severity using gradient boosting. Risks, 11(9): 163, 2023. doi:10.3390/risks11090163

  14. [14]

    CORINE Land Cover 2000 (CLC 2000)

    Copernicus Land Monitoring Service / European Environment Agency. CORINE Land Cover 2000 (CLC 2000). European Union’s Copernicus Land Monitoring Service, 2020. URL https://land.copernicus.eu/en/products/corine-land-cover/clc-2000. Accessed 02.12.2025

  15. [15]

    Automobile insurance fraud detection based on pso-xgboost model and interpretable machine learning method

    Ding, N., Ruan, X., Wang, H., and Liu, Y. Automobile insurance fraud detection based on pso-xgboost model and interpretable machine learning method. Insurance: Mathematics and Economics, 120: 51--60, 2025. ISSN 0167-6687. doi:10.1016/j.insmatheco.2024.11.006. URL https://www.sciencedirect.com/science/article/pii/S0167668724001112

  16. [16]

    Automated machine learning in insurance

    Dong, P. and Quan, Z. Automated machine learning in insurance. Insurance: Mathematics and Economics, 120: 17--41, 2025. ISSN 0167-6687. doi:10.1016/j.insmatheco.2024.10.002. URL https://www.sciencedirect.com/science/article/pii/S0167668724001057

  17. [17]

    Smart underwriting system: An intelligent decision support system for insurance approval & risk assessment

    Dubey, A., Parida, T., Birajdar, A., Prajapati, A. K., and Rane, S. Smart underwriting system: An intelligent decision support system for insurance approval & risk assessment. In 2018 3rd International Conference for Convergence in Technology (I2CT), pages 1--6. IEEE, 2018. doi:10.1109/I2CT.2018.8529792

  18. [18]

    CASdatasets: Insurance datasets

    Dutang, C. and Charpentier, A. CASdatasets: Insurance datasets, 2024. R package version 1.2-0

  19. [19]

    Insurance dataset

    Dutang, C., Charpentier, A., and Gallic, E. Insurance dataset. 2024

  20. [20]

    Belgium postcode boundaries

    Environmental Systems Research Institute (Esri). Belgium postcode boundaries. https://www.arcgis.com/home/item.html?id=e385aeef974a4aea8ae7fb1b0efc1341, 2022. GIS dataset accessed January 2026

  21. [21]

    Automated vehicle inspection model using a deep learning approach

    Fouad, M. M., Malawany, K., Osman, A. G., Amer, H. M., Abdulkhalek, A. M., and Eldin, A. B. Automated vehicle inspection model using a deep learning approach. Journal of Ambient Intelligence and Humanized Computing, 14(10): 13971--13979, 2023. doi:10.1007/s12652-022-04105-3

  22. [22]

    Gao, G., Wang, H., and Wüthrich, M. V. Boosting poisson regression models with telematics car driving data. Machine Learning, 111(1): 243--272, 2022. doi:10.2139/ssrn.3596034

  23. [23]

    Artificial intelligence adoption in the insurance industry: Evidence using the technology--organization--environment framework

    Gupta, S., Ghardallou, W., Pandey, D. K., and Sahu, G. P. Artificial intelligence adoption in the insurance industry: Evidence using the technology--organization--environment framework. Research in International Business and Finance, 63: 101757, 2022. doi:10.1016/j.ribaf.2022.101757

  24. [24]

    Generalized linear models and actuarial science

    Haberman, S. and Renshaw, A. E. Generalized linear models and actuarial science. Journal of the Royal Statistical Society: Series D (The Statistician), 45(4): 407--436, 1996. doi:10.2307/2988543

  25. [25]

    The Elements of Statistical Learning

    Hastie, T., Tibshirani, R., Friedman, J., et al. The elements of statistical learning. Springer, New York, 2009. ISBN 978-0-387-84857-0. doi:10.1007/978-0-387-84858-7

  26. [26]

    Deep residual learning for image recognition

    He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770--778, 2016. doi:10.1109/CVPR.2016.90

  27. [27]

    The added value of dynamically updating motor insurance prices with telematics collected driving behavior data

    Henckaerts, R. and Antonio, K. The added value of dynamically updating motor insurance prices with telematics collected driving behavior data. Insurance: Mathematics and Economics, 105: 79--95, 2022. doi:10.1016/j.insmatheco.2022.03.011

  28. [28]

    Boosting insights in insurance tariff plans with tree-based machine learning methods

    Henckaerts, R., Côté, M.-P., Antonio, K., and Verbelen, R. Boosting insights in insurance tariff plans with tree-based machine learning methods. North American Actuarial Journal, 25(2): 255--285, 2021. doi:10.1080/10920277.2020.1745656

  29. [29]

    Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff

    Holvoet, F., Antonio, K., and Henckaerts, R. Neural networks for insurance pricing with frequency and severity data: a benchmark study from data preprocessing to technical tariff. North American Actuarial Journal, pages 1--44, 2025. doi:10.1080/10920277.2025.2451860

  30. [30]

    Evaluating xgboost for competitive insurance pricing: A case study on motor third-party liability insurance

    Ibrahim, J., Stanley, J., Murfi, H., Novkaniza, F., and Devila, S. Evaluating xgboost for competitive insurance pricing: A case study on motor third-party liability insurance. In 2024 International Conference on Intelligent Cybernetics Technology & Applications (ICICyTA), pages 847--852. IEEE, 2024. doi:10.1109/icicyta64807.2024.10912952

  31. [31]

    A damage-based crop insurance system for flash flooding: a satellite remote sensing and econometric approach

    Islam, M. M., Ahamed, T., Matsushita, S., and Noguchi, R. A damage-based crop insurance system for flash flooding: a satellite remote sensing and econometric approach. In Remote sensing application II: A climate change perspective in agriculture, pages 121--163. Springer, 2024. doi:10.1007/978-981-97-1188-8_5

  32. [32]

    ISO 19109:2022 Geographic information -- Rules for application schema

    ISO. ISO 19109:2022 Geographic information -- Rules for application schema. Standard, International Organization for Standardization, Geneva, Switzerland, 2022

  33. [33]

    Impact of ai in the general insurance underwriting factors

    Jaiswal, R. Impact of ai in the general insurance underwriting factors. Central European Management Journal, 31(2): 697--705, 2023

  34. [34]

    Google street view image predicts car accident risk

    Kita-Wojciechowska, K. and Kidziński, Ł. Google street view image predicts car accident risk. Central European Economic Journal, 6(53): 151--163, 2019. doi:10.2478/ceej-2019-0011

  35. [35]

    Gradient-based learning applied to document recognition

    Lecun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278--2324, 1998. ISSN 0018-9219. doi:10.1109/5.726791

  36. [36]

    A survey of convolutional neural networks: analysis, applications, and prospects

    Li, Z., Liu, F., Yang, W., Peng, S., and Zhou, J. A survey of convolutional neural networks: analysis, applications, and prospects. IEEE transactions on neural networks and learning systems, 33(12): 6999--7019, 2021. doi:10.1109/tnnls.2021.3084827

  37. [37]

    Geographic information science and systems

    Longley, P. A., Goodchild, M. F., Maguire, D. J., and Rhind, D. W. Geographic information science and systems. John Wiley & Sons, 2015

  38. [38]

    Generalized linear models

    McCullagh, P. Generalized linear models. Routledge, 2019. doi:10.1201/9780203753736

  39. [39]

    Google street view images as predictors of patient health outcomes, 2017--2019

    Nguyen, Q. C., Belnap, T., Dwivedi, P., Deligani, A. H. N., Kumar, A., Li, D., Whitaker, R., Keralis, J., Mane, H., Yue, X., et al. Google street view images as predictors of patient health outcomes, 2017--2019. Big data and cognitive computing, 6(1): 15, 2022. doi:10.3390/bdcc6010015

  40. [40]

    Noll, A., Salzmann, R., and Wuthrich, M. V. Case study: French motor third-party liability claims. Available at SSRN 3164764, 2020. doi:10.2139/ssrn.3164764

  41. [41]

    Nomic embed vision: Expanding the latent space

    Nussbaum, Z., Duderstadt, B., and Mulyar, A. Nomic embed vision: Expanding the latent space. arXiv preprint arXiv:2406.18587, 2024. doi:10.48550/arXiv.2406.18587

  42. [42]

    OpenStreetMap

    OpenStreetMap contributors. OpenStreetMap, 2025a. URL https://www.openstreetmap.org. Data licensed under the Open Database License (ODbL)

  43. [43]

    OpenStreetMap Belgium Data Extract

    OpenStreetMap contributors. OpenStreetMap Belgium Data Extract. Geofabrik GmbH, 2025b. URL https://download.geofabrik.de/europe/belgium.html. Distributed by Geofabrik. Licensed under ODbL

  44. [44]

    Social network analytics for supervised fraud detection in insurance

    Óskarsdóttir, M., Ahmed, W., Antonio, K., Baesens, B., Dendievel, R., Donas, T., and Reynkens, T. Social network analytics for supervised fraud detection in insurance. Risk Analysis, 42(8): 1872--1890, 2022. doi:10.1111/risa.13693

  45. [45]

    Automated car damage assessment using computer vision: Insurance company use case

    Pérez-Zarate, S. A., Corzo-García, D., Pro-Martín, J. L., Álvarez-García, J. A., Martínez-del Amor, M. A., and Fernández-Cabrera, D. Automated car damage assessment using computer vision: Insurance company use case. Applied Sciences, 14(20): 9560, 2024. doi:10.3390/app14209560

  46. [46]

    On the validation of claims with excess zeros in liability insurance: A comparative study

    Qazvini, M. On the validation of claims with excess zeros in liability insurance: A comparative study. Risks, 7(3): 71, 2019. doi:10.3390/risks7030071

  47. [47]

    Rababaah, A. R. Investigation of deep learning models for vehicle damage classification. In 2023 10th International Conference on Signal Processing and Integrated Networks (SPIN), pages 25--30. IEEE, 2023. doi:10.1109/spin57001.2023.10116703

  48. [48]

    Seyam, E. A. Predicting motor insurance claim incidence using generalized and tree-based models: A comparative statistical approach. Insurance Markets and Companies, 16(2): 38, 2025. doi:10.21511/ins.16(2).2025.04

  49. [49]

    Deep residential representations: Using unsupervised learning to unlock elevation data for geo-demographic prediction

    Stevenson, M., Mues, C., and Bravo, C. Deep residential representations: Using unsupervised learning to unlock elevation data for geo-demographic prediction. ISPRS Journal of Photogrammetry and Remote Sensing, 187: 378--392, 2022. ISSN 0924-2716. doi:10.1016/j.isprsjprs.2022.03.015. URL https://www.sciencedirect.com/science/article/pii/S...

  50. [50]

    Accidents de la route et distance au domicile. approche quantitative pour bruxelles

    Thiran, P. and Thomas, I. Accidents de la route et distance au domicile. approche quantitative pour bruxelles. Les Cahiers Scientifiques du Transport-Scientific Papers in Transportation, 32, 1997. doi:10.46298/cst.11958

  51. [51]

    Spatial statistical modelling of insurance risk: a spatial epidemiological approach to car insurance

    Tufvesson, O., Lindström, J., and Lindström, E. Spatial statistical modelling of insurance risk: a spatial epidemiological approach to car insurance. Scandinavian Actuarial Journal, 2019(6): 508--522, 2019. doi:10.1080/03461238.2019.1576146

  52. [52]

    Claim frequency estimation in motor third-party liability (mtpl): Classical statistical models versus machine learning methods

    Vít, O., Seif, L., and Štěpánek, L. Claim frequency estimation in motor third-party liability (mtpl): Classical statistical models versus machine learning methods. In Annals of Computer Science and Information Systems, volume 45, pages 161--166. Polish Information Processing Society, 2025. doi:10.15439/2025f5118

  53. [53]

    Predictive analytics in long term care

    Zail, H. Predictive analytics in long term care. In Actuarial Aspects of Long Term Care, pages 309--336. Springer, 2019. doi:10.1007/978-3-030-05660-5_13

  54. [54]

    Dive into deep learning

    Zhang, A., Lipton, Z. C., Li, M., and Smola, A. J. Dive into deep learning. Cambridge University Press, 2023

  55. [55]

    Regularization and variable selection via the elastic net

    Zou, H. and Hastie, T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(2): 301--320, 2005. doi:10.1111/j.1467-9868.2005.00503.x