arXiv: 2301.07855 · v4 · submitted 2023-01-19 · econ.EM · stat.AP

Digital Divide: Evidence from the 2020 Canadian Internet Use Survey

Pith reviewed 2026-05-24 10:30 UTC · model grok-4.3

keywords digital divide · education · digital literacy · Canada · internet use · online banking · disabilities · immigrants

The pith

Education is the only determinant that remains significant at every rung of the digital ladder from internet access onward.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines inequality in digital participation using the 2020 Canadian Internet Use Survey. It combines logistic models, Shapley decompositions, sequential logits, and a digital literacy measure to trace where socioeconomic and demographic gaps arise along the adoption path. Education stands out as the sole factor that stays significant across all stages. Digital literacy fully accounts for the education gap at initial internet entry and reduces it substantially at online banking, though a large residual remains. Income effects concentrate at virtual wallets, disability penalties hit hardest at payments, and security shortfalls appear among immigrants and visible minorities.
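
To make the "adoption path" idea concrete, a sequential logit can be read as a chain of conditional transitions: the probability of reaching a rung is the product of the probabilities of passing every transition up to it. The sketch below assumes the standard continuation-ratio form. The function names and the index values are hypothetical illustrations, not the paper's specification or estimates.

```python
import math


def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def rung_probabilities(indices):
    """Unconditional probability of reaching each rung.

    indices[k] is the linear index x'beta_k for passing transition k,
    conditional on having passed transition k-1 (assumed form).
    """
    reached = 1.0
    out = []
    for z in indices:
        reached *= logistic(z)  # multiply in the conditional pass probability
        out.append(reached)
    return out


# Hypothetical indices for three successive rungs (made-up values).
example = rung_probabilities([2.0, 1.0, -0.5])
```

Because each rung has its own coefficient vector in this form, a covariate such as education can matter at one transition and not at another, which is what allows gaps to be located along the path.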

Core claim

The paper establishes that education is the only determinant that remains significant at every rung of the digital ladder. Conditioning on digital literacy eliminates the education gradient at internet entry and reduces it by 61 percent at the online banking rung, but a substantial residual persists, pointing to behavioral and institutional frictions beyond measurable competence. Income inequality is most pronounced for virtual-wallet adoption; for online banking, employment and education together account for nearly half of the pro-rich concentration. Persons with disabilities face the largest penalty at the digital-payments stage rather than at online banking.
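
A reduction such as the 61 percent figure is typically read as the proportional change in the education coefficient once the literacy score is added. The sketch below shows that arithmetic only. Whether the paper computes the figure exactly this way is an assumption, and the two coefficient values are invented so that the result lands at 61; they are not taken from the paper.

```python
def percent_reduction(beta_unadjusted, beta_adjusted):
    """Percentage by which a coefficient shrinks after adding a control."""
    return 100.0 * (beta_unadjusted - beta_adjusted) / beta_unadjusted


# Hypothetical education coefficients before and after conditioning
# on the digital-literacy score (illustrative values only).
reduction = percent_reduction(0.50, 0.195)
```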

What carries the argument

A bifactor item response theory measure of digital literacy combined with survey-weighted logistic Lasso, exact Shapley decomposition of age-education gaps, and sequential logit models to locate gaps along the adoption sequence.
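
As a rough illustration of what "exact" means for a Shapley decomposition, the sketch below averages each factor's marginal contribution over every ordering of the factors instead of sampling orderings. The helper name `exact_shapley`, the lookup table `GAP`, and all numbers in it are hypothetical; the paper's actual value function is not reproduced here.

```python
from itertools import permutations


def exact_shapley(players, value):
    """Average marginal contribution of each player over all orderings."""
    contrib = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            contrib[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in contrib.items()}


# Hypothetical share of an adoption gap explained by each set of factors.
GAP = {
    frozenset(): 0.00,
    frozenset({"age"}): 0.10,
    frozenset({"education"}): 0.20,
    frozenset({"age", "education"}): 0.36,
}

shares = exact_shapley(["age", "education"], lambda s: GAP[frozenset(s)])
```

By construction the contributions add up to the value of the full set of factors, so nothing is left as an unallocated interaction term.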

Load-bearing premise

The bifactor IRT digital-literacy score fully captures competence relevant to adoption decisions and the survey-weighted decompositions isolate each factor's contribution without substantial omitted-variable bias or measurement error in self-reported items.

What would settle it

A finding that the education coefficient loses significance at every rung once the digital literacy score is included would falsify the claim of a persistent residual education effect.

Figures

Figures reproduced from arXiv: 2301.07855 by Joann Jasiak, Peter MacKenzie, Purevdorj Tuvaandorj.

Figure 1. Coordinate plot for Internet Use, Email Use and Online Banking
Figure 2. Coordinate plot for Virtual Wallet and Credit Card Use
Figure 3. Digital Literacy Score By Cluster
Figure 4. Percentage of Respondents in Each Cluster Using Credit Card Online, Email, Online
Figure 5. Demographics of Digital Adopters Cluster
Figure 6. Percentage of Digital Adopters from each Province
Figure 7. COVID-19 Stringency Index by Province
Figure 8. COVID-19 Stringency Index and Percentage of Observations in the Digital Adopters
Figure 9. Regressor-by-Regressor Variation in the Oaxaca-Blinder Decomposition
Figure 10. Weighted Histogram of Digital Literacy Scores

Original abstract

This paper studies inequality in digital participation across socioeconomic and demographic groups using the 2020 Canadian Internet Use Survey (CIUS). We combine survey-weighted logistic Lasso, an exact Shapley decomposition of age-education gaps, a sequential logit, and a bifactor item response theory (IRT) measure of digital literacy to identify who is excluded, why gaps persist, and where along the adoption path they arise. Education is the only determinant that remains significant at every rung of the digital ladder. Income inequality is most pronounced for virtual-wallet adoption; for online banking, employment and education together account for nearly half of the pro-rich concentration, indicating a broad socioeconomic gradient rather than a purely income-based divide. Persons with disabilities face the largest penalty at the digital-payments stage rather than at online banking, pointing to accessibility gaps in retail payment interfaces. Conditioning on digital literacy eliminates the education gradient at internet entry and reduces it by 61% at the online banking rung, but a substantial residual persists, pointing to behavioral and institutional frictions beyond measurable competence. The youngest cohort records the lowest information-seeking score despite high digital engagement, and security deficits are concentrated among landed immigrants and visible minorities.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper analyzes inequality in digital participation using the 2020 Canadian Internet Use Survey, combining survey-weighted logistic Lasso, exact Shapley decomposition of age-education gaps, sequential logit models, and a bifactor IRT measure of digital literacy. Key claims include that education is the only determinant significant at every adoption stage, income effects are strongest for virtual wallets, disability penalties are largest at digital payments, and conditioning on the IRT digital-literacy score eliminates the education gradient at internet entry while reducing it by 61% at online banking, with a residual attributed to behavioral and institutional factors.

Significance. If the bifactor IRT score is a valid, unbiased measure of relevant competence, the multi-method decomposition provides useful evidence on the stages at which socioeconomic gaps arise and the partial role of measurable literacy versus other frictions, with implications for targeted digital-inclusion policies. The combination of Lasso selection, Shapley values, and sequential modeling on a recent Canadian survey is a strength for descriptive decomposition work.

major comments (2)
  1. [Methods (bifactor IRT specification) and Results (education-gradient decompositions)] The headline result that conditioning on the bifactor IRT digital-literacy score reduces the education gradient by 61% at the online-banking rung (and eliminates it at entry) is load-bearing for the claim of residual behavioral frictions. This requires the IRT latent trait to be exogenous to adoption outcomes and free of differential item functioning by education; no tests for DIF, control-function corrections, or validation against objective skill measures are described, and self-reported items on skills and security are likely to violate these conditions.
  2. [§5 (sequential logit and Shapley results)] The sequential logit and Shapley decompositions treat the IRT score as an observed regressor without adjustment for measurement error in the estimated score. That error is plausibly non-classical, that is, correlated with education or other covariates; this could bias the residual education coefficient and the attribution of the 61% reduction.
minor comments (2)
  1. [Abstract] The abstract states that the Shapley decomposition is 'exact' but does not clarify how exactness is preserved under survey weights or with the Lasso-selected covariates.
  2. [Results tables] Table or figure reporting the 61% reduction should include the unadjusted and adjusted coefficients side-by-side with standard errors to allow direct assessment of precision.
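
The concern in major comment 2 can be illustrated with a small simulation. The sketch below is a deliberately simplified, hypothetical data-generating process (linear outcome instead of a logit, standard normal variables, made-up coefficients) in which the outcome depends on true literacy only. Controlling for true literacy removes the education coefficient, while controlling for a noisy literacy score leaves a spurious residual. Nothing here uses CIUS data or the paper's model.

```python
import random


def ols_two(y, x1, x2):
    """OLS slopes of y on x1 and x2 (all variables are demeaned first)."""
    n = len(y)

    def demean(v):
        m = sum(v) / n
        return [a - m for a in v]

    y, x1, x2 = demean(y), demean(x1), demean(x2)
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * b for a, b in zip(x1, y))
    s2y = sum(a * b for a, b in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def simulate(n=20000, seed=0):
    """Return the education slope with true vs. noisy literacy as control."""
    rng = random.Random(seed)
    edu = [rng.gauss(0, 1) for _ in range(n)]
    lit = [e + rng.gauss(0, 1) for e in edu]        # true literacy
    obs = [l + rng.gauss(0, 1) for l in lit]        # error-ridden score
    y = [l + 0.1 * rng.gauss(0, 1) for l in lit]    # outcome: literacy only
    b_edu_true, _ = ols_two(y, edu, lit)
    b_edu_noisy, _ = ols_two(y, edu, obs)
    return b_edu_true, b_edu_noisy


b_true, b_noisy = simulate()
```

In this setup the population value of the education slope is 0 with the true score and 0.5 with the noisy one, so part of a "residual" education effect can be an artifact of score error even when the error is unrelated to education.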

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for focusing on the identification assumptions underlying the bifactor IRT measure and its role in the decompositions. These are central to the paper's claims, and we address each point directly below, indicating the revisions we will make.

Point-by-point responses
  1. Referee: [Methods (bifactor IRT specification) and Results (education-gradient decompositions)] The headline result that conditioning on the bifactor IRT digital-literacy score reduces the education gradient by 61% at the online-banking rung (and eliminates it at entry) is load-bearing for the claim of residual behavioral frictions. This requires the IRT latent trait to be exogenous to adoption outcomes and free of differential item functioning by education; no tests for DIF, control-function corrections, or validation against objective skill measures are described, and self-reported items on skills and security are likely to violate these conditions.

    Authors: We agree that the headline decomposition result rests on the IRT score satisfying exogeneity and no DIF by education. The CIUS items are indeed self-reported, so reporting bias correlated with education cannot be ruled out a priori. The current manuscript does not report formal DIF tests or control-function corrections. In revision we will add a new subsection in the methods that (i) states the local-independence and exogeneity assumptions of the bifactor model, (ii) discusses why DIF by education is a plausible concern given the self-reported nature of the items, and (iii) reports a simple robustness check that re-estimates the sequential logit after dropping the most education-sensitive items. We will also note that objective performance-based skill measures are unavailable in the CIUS and therefore full external validation is not feasible with these data.

    revision: partial

  2. Referee: [§5 (sequential logit and Shapley results)] The sequential logit and Shapley decompositions treat the IRT score as an observed regressor without adjustment for measurement error in the estimated score. That error is plausibly non-classical, that is, correlated with education or other covariates; this could bias the residual education coefficient and the attribution of the 61% reduction.

    Authors: We concur that treating the estimated IRT factor score as an error-free regressor can bias the remaining education coefficient if measurement error is correlated with education. The paper presents the 61 percent reduction as a descriptive mediation result rather than a causal claim. In the revised version we will add an explicit caveat in §5 on the direction and likely magnitude of attenuation bias, and we will include a sensitivity exercise that replaces the point-estimate IRT score with draws from its posterior distribution (multiple-imputation style) to show how the education coefficient and the 61 percent figure change under plausible error assumptions.

    revision: partial
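
One common way to combine results across such draws is Rubin's rules: average the per-draw coefficients, then add the between-draw variance (inflated by 1 + 1/M) to the average within-draw variance. The sketch below shows only that pooling step. Whether the authors would pool this way is an assumption, and the function name and input values are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M per-draw estimates and their sampling variances."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)         # average sampling variance
    between = variance(estimates)    # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical education coefficients from three posterior draws
# of the literacy score, each with its squared standard error.
pooled_beta, pooled_var = pool_rubin([0.20, 0.22, 0.18], [0.01, 0.01, 0.01])
```

The between-draw term is what carries the uncertainty in the score itself into the standard error of the education coefficient.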

Circularity Check

0 steps flagged

No significant circularity; purely empirical survey decompositions

full rationale

The paper applies standard econometric tools (survey-weighted logistic Lasso, exact Shapley decomposition, sequential logit, bifactor IRT) to CIUS microdata to estimate gradients and decompositions. No derivation chain exists that reduces a claimed prediction or result to its own fitted inputs by construction, nor any self-definitional loop, self-citation load-bearing premise, or ansatz imported via prior work. The 61% reduction figure is a direct statistical output from regressing outcomes on the IRT score and education; it is not forced by the paper's equations. The analysis is self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The analysis rests on standard survey-sampling and psychometric assumptions rather than new free parameters or invented entities.

axioms (2)
  • domain assumption The 2020 CIUS sample weights produce unbiased population estimates after non-response adjustment
    Invoked by the survey-weighted logistic Lasso and decompositions
  • domain assumption The bifactor IRT model extracts a general digital-literacy factor that is causally relevant to adoption decisions
    Central to the claim that conditioning on literacy eliminates or reduces education gradients
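
The first axiom matters because every reported share is a weighted quantity. A minimal sketch of a weighted proportion (weighted sum of a 0/1 outcome divided by the sum of weights) is given below. The function name and the toy data are hypothetical, and the CIUS weight construction itself is not reproduced.

```python
def weighted_share(outcomes, weights):
    """Weighted proportion of a 0/1 outcome using person weights."""
    total_weight = sum(weights)
    return sum(w * y for y, w in zip(outcomes, weights)) / total_weight


# Toy example: three respondents, the first representing twice as many people.
share = weighted_share([1, 0, 1], [2.0, 1.0, 1.0])
```

If the weights are off after non-response adjustment, every estimate built on them inherits that bias, which is why the assumption is listed as an axiom.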


