arxiv: 2604.04431 · v1 · submitted 2026-04-06 · 📊 stat.CO


iLBA: An R package for confidentially disseminating aggregated frequency tables

Dongsun Yoon, Inkwon Yeo, Jeehyun Hwang, Min-Jeong Park, Sungkyu Jung

Pith reviewed 2026-05-10 19:56 UTC · model grok-4.3

classification 📊 stat.CO

keywords iLBA · disclosure control · frequency tables · R package · confidentiality · aggregation · small cell adjustment · statistical disclosure limitation


The pith

An R package implements the iLBA algorithm to release aggregated frequency tables while protecting confidentiality through controlled ambiguity and bounded information loss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Statistical agencies release frequency tables from microdata, but small cells create disclosure risks. The paper presents the iLBA R package that applies small cell adjustment at the finest table level followed by an aggregation step. This step adds controlled ambiguity to prevent identification of individuals or small groups. The method keeps the total information loss within explicit bounds. A reader would care because it offers a practical, open-source tool for producing usable public tables without heavy suppression or rounding.

Core claim

The iLBA algorithm combines Small Cell Adjustment (SCA) at the finest level table with an aggregation procedure that introduces controlled ambiguity while bounding information loss. The software enables users to construct masked finest level tables, generate confidential aggregated tables for selected variables, and obtain masked frequencies for single-cell queries.

What carries the argument

The Information-Loss-Bounded Aggregation (iLBA) algorithm, which merges small cell adjustment with a subsequent aggregation step that adds ambiguity while limiting total deviation from true counts.
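As a rough illustration of the first stage only, the sketch below applies one common textbook form of small cell adjustment: unbiased random rounding of counts below a threshold to either 0 or the threshold. The exact SCA rule and the aggregation step used by iLBA are not specified in this material, so the function names, the threshold of 3, and the toy table are all assumptions made for illustration, and nothing here reflects the package's actual R interface.

```python
import random

def small_cell_adjust(count, threshold, rng):
    """Generic SCA (assumed form, not the paper's rule): a count c with
    0 < c < threshold becomes `threshold` with probability c / threshold,
    otherwise 0. Zero counts and counts >= threshold are left unchanged."""
    if 0 < count < threshold:
        return threshold if rng.random() < count / threshold else 0
    return count

def mask_table(table, threshold=3, seed=0):
    """Apply the adjustment independently to every cell of a finest-level table."""
    rng = random.Random(seed)  # seeded so the illustration is reproducible
    return {cell: small_cell_adjust(c, threshold, rng) for cell, c in table.items()}

# Hypothetical finest-level table: (region, category) -> true count.
finest = {("A", "x"): 1, ("A", "y"): 12, ("B", "x"): 2, ("B", "y"): 0, ("C", "x"): 7}
masked = mask_table(finest, threshold=3, seed=42)

# Per-cell deviation is strictly below the threshold under this rule.
deviation = {cell: abs(masked[cell] - finest[cell]) for cell in finest}
```

Under this assumed rule each masked cell differs from its true count by less than the threshold, which is the kind of per-cell guarantee an information-loss bound would need to extend to aggregated cells.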

If this is right

Statistical agencies can produce masked finest-level tables from their microdata.
Confidential aggregated tables can be generated for any chosen set of variables.
Masked frequency values become available for single-cell queries on the released tables.
The process supports reproducible disclosure control without requiring custom code for each table.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same aggregation logic could be adapted to protect other tabular outputs such as cross-tabulations of continuous variables after binning.
Agencies might compare iLBA outputs directly against traditional cell suppression on the same datasets to measure utility differences.
The bounded-loss property opens a path to combining iLBA with differential privacy noise addition as a layered defense.

Load-bearing premise

The aggregation procedure adds enough ambiguity to block identification of small groups while the overall information loss stays small enough for practical use on real microdata.

What would settle it

Apply the package to a public microdata file with known small cells and check whether any original small frequency remains identifiable in the output tables or whether the published aggregates differ from the true values by more than the stated bound.
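A check of this kind could be sketched as follows. The function name, the threshold, and the bound are hypothetical, and treating "released value equals the true small count" as exposure is only a crude proxy for identifiability; the paper's own risk and loss measures are not described in this material.

```python
def audit_release(true_counts, released, threshold, bound):
    """Return two lists of cells:
    exposed: small true counts (0 < t < threshold) published unchanged;
    over:    cells whose released value deviates from the truth by more than `bound`."""
    exposed = [cell for cell, t in true_counts.items()
               if 0 < t < threshold and released[cell] == t]
    over = [cell for cell, t in true_counts.items()
            if abs(released[cell] - t) > bound]
    return exposed, over

# Toy data, invented for illustration.
true_counts = {"a": 1, "b": 2, "c": 10}
released = {"a": 0, "b": 2, "c": 14}

exposed, over = audit_release(true_counts, released, threshold=3, bound=3)
```

In this toy case cell "b" is flagged as exposed because its small true count was released unchanged, and cell "c" is flagged because its deviation of 4 exceeds the assumed bound of 3.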

Figures

Figures reproduced from arXiv: 2604.04431 by Dongsun Yoon, Inkwon Yeo, Jeehyun Hwang, Min-Jeong Park, Sungkyu Jung.

Figure 2: The coarser level table and its information loss summary.

Original abstract

Statistical agencies frequently release frequency tables derived from microdata, but small frequency cells may lead to disclosure risks. We present iLBA, an open-source R package for confidential dissemination of aggregated frequency tables. The package implements the Information-Loss-Bounded Aggregation (iLBA) algorithm, which combines Small Cell Adjustment (SCA) at the finest level table with an aggregation procedure that introduces controlled ambiguity while bounding information loss. The software enables users to construct masked finest level tables, generate confidential aggregated tables for selected variables, and obtain masked frequencies for single-cell queries. By providing an accessible implementation of the iLBA method, the package facilitates reproducible and efficient disclosure control for tabular data derived from microdata.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a software paper releasing an R package for table disclosure control that combines small-cell adjustment with aggregation, but it provides almost no evidence that the method actually works as claimed.


The core contribution here is an open-source R package called iLBA that lets users apply small cell adjustment to fine-grained tables and then aggregate with controlled ambiguity to reduce disclosure risk while claiming to bound information loss. That is a practical thing for statistical agencies that need to release frequency tables from microdata, and packaging it in R makes it reproducible and easy to run for people already in that ecosystem. The abstract is clear about the intended workflow: masked finest-level tables, confidential aggregates, and single-cell queries. That part is straightforward and useful as a tool release.

Beyond the package itself, though, there is little new. The method description stays at the level of combining SCA with an aggregation step; no derivation, no formal bounds, and no comparison to established alternatives like cell suppression, rounding, or differential privacy appear in the material. The real weakness is the complete absence of any validation. There are no results on real or simulated data showing how much utility is lost, whether the ambiguity actually prevents disclosure in practice, or how the method performs against other disclosure-control approaches. Without those checks, the claim that information loss stays bounded remains an assertion rather than a demonstrated fact.

This paper is aimed at practitioners in official statistics who want a ready R implementation rather than readers looking for methodological advances. It could be worth a referee if the target venue publishes software tools, but the reviewers will almost certainly ask for empirical tests and comparisons before acceptance. I would not cite it for the algorithm, only if I needed the specific code.

Referee Report

1 major / 2 minor

Summary. The manuscript presents the iLBA R package implementing the Information-Loss-Bounded Aggregation (iLBA) algorithm for confidential dissemination of aggregated frequency tables from microdata. The algorithm applies Small Cell Adjustment (SCA) to the finest-level table followed by an aggregation step that introduces controlled ambiguity while aiming to bound information loss; the package supports construction of masked tables, generation of confidential aggregates for selected variables, and masked frequencies for single-cell queries.

Significance. If the iLBA procedure reliably achieves its stated balance of confidentiality protection and bounded utility loss, the package would supply a practical, open-source tool for statistical agencies handling tabular data release. The combination of SCA with controlled aggregation addresses a recurring need in official statistics, and the R implementation promotes reproducibility. However, the absence of any validation results, explicit bounds, or comparisons in the provided material substantially limits the demonstrated significance.

major comments (1)

Abstract: the central claim that the aggregation procedure 'introduces controlled ambiguity while bounding information loss' is presented without any quantitative bounds, simulation results, error analysis, or comparison to existing methods such as standard SCA or other perturbation techniques; this is load-bearing for assessing whether the method meets its utility and confidentiality objectives.

minor comments (2)

The manuscript would benefit from explicit statements of package availability (e.g., CRAN, GitHub repository) and installation instructions to improve accessibility for users.
A short worked example with real or synthetic microdata, showing input table, SCA step, aggregation output, and resulting information-loss metric, would clarify the workflow for readers.
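The input and aggregation stages of such a worked example might look like the sketch below, which builds a finest-level frequency table from synthetic microdata and sums it over a selected variable. No masking is applied: the iLBA adjustment and aggregation rules are not described in this material, so this shows only plain, unprotected tabulation. The records, variable names, and helper functions are invented for illustration.

```python
from collections import Counter

# Synthetic microdata, one dict per individual (invented for illustration).
microdata = [
    {"region": "N", "sex": "F", "age": "young"},
    {"region": "N", "sex": "F", "age": "young"},
    {"region": "N", "sex": "M", "age": "old"},
    {"region": "S", "sex": "F", "age": "old"},
    {"region": "S", "sex": "M", "age": "young"},
    {"region": "S", "sex": "M", "age": "young"},
]

def finest_table(records, variables):
    """Cross-tabulate all listed variables: one cell per observed combination."""
    return Counter(tuple(r[v] for v in variables) for r in records)

def aggregate(table, variables, keep):
    """Sum the finest-level cells over every variable not listed in `keep`."""
    idx = [variables.index(v) for v in keep]
    out = Counter()
    for cell, n in table.items():
        out[tuple(cell[i] for i in idx)] += n
    return out

variables = ["region", "sex", "age"]
finest = finest_table(microdata, variables)
by_region = aggregate(finest, variables, keep=["region"])
```

The finest-level table here contains two cells with a count of 1, which is the situation a disclosure-control step would need to address before either table is released.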

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and for recommending major revision. The feedback correctly identifies that the abstract's claims regarding controlled ambiguity and bounded information loss lack supporting quantitative evidence in the current manuscript. We address this below and will revise the paper to strengthen the presentation.


Referee: Abstract: the central claim that the aggregation procedure 'introduces controlled ambiguity while bounding information loss' is presented without any quantitative bounds, simulation results, error analysis, or comparison to existing methods such as standard SCA or other perturbation techniques; this is load-bearing for assessing whether the method meets its utility and confidentiality objectives.

Authors: We agree that the manuscript, as a software description, does not currently include empirical validation, explicit numerical bounds, or direct comparisons. The iLBA algorithm's design aims to bound information loss through the controlled aggregation step following SCA, but this is not demonstrated quantitatively here. In revision we will add a dedicated section with simulation results on synthetic and real microdata, reporting metrics such as average information loss, disclosure risk measures, and comparisons against plain SCA and simple perturbation methods. This will provide the necessary evidence for the abstract claims.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity


The paper presents an R package implementing the iLBA algorithm, which combines Small Cell Adjustment at the finest level table with an aggregation procedure that introduces controlled ambiguity while bounding information loss. No mathematical derivations, equations, fitted parameters, predictions, or self-citations are described in the provided abstract and summary. The contribution is purely implementational and algorithmic, with the central claim being a factual description of the software's functionality rather than any load-bearing derivation that reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, mathematical axioms, or new invented entities are described in the abstract; the paper contributes a software tool rather than new theoretical constructs.


