pith. machine review for the scientific record.

arxiv: 2604.21202 · v1 · submitted 2026-04-23 · 💰 econ.EM · cs.CL

Recognition: unknown

Participation and Representation in Local Government Speech

Amar Venugopal, Olivia Martin

Authors on Pith: no claims yet

Pith reviewed 2026-05-08 13:26 UTC · model grok-4.3

classification 💰 econ.EM cs.CL
keywords: public participation · local government · city council meetings · demographic representation · remote access · natural experiment · land use · California

The pith

Public speakers at California city council meetings are older, whiter, more male, more liberal, and more likely to own homes than registered voters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper builds a large dataset of transcribed city council meetings from 115 California cities spanning a decade. It establishes that people who speak at these meetings differ markedly from the broader voter population in age, race, gender, ideology, and homeownership status, and that participation rises when land use and zoning appear on the agenda. The authors then use shifts in remote access during the pandemic as a natural experiment to test whether lowering the cost of attending changes who shows up. These patterns matter because city council meetings are the main formal setting where residents can directly address elected officials and influence local policy.

Core claim

Using transcripts from a decade of city council meetings across 115 California cities, we find that public participants are substantially older, whiter, more male, more liberal, and more likely to own homes than the registered voter population. Participation surges when land use and zoning topics are on the agenda. Exploiting pandemic-era changes in remote access options, we show that removing remote participation reduces the total number of speakers but does not clearly alter the demographic or ideological composition of those who speak.

What carries the argument

Large transcribed dataset of city council meetings paired with pandemic-driven variation in remote access as a natural experiment to measure effects on speaker volume and composition.
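The skew the paper documents can be illustrated with a minimal representation-gap calculation: the share of speakers in a demographic group minus that group's share of registered voters. All group names and shares below are hypothetical placeholders, not figures from the paper.

```python
# Illustrative representation-gap calculation. Positive values mean a group
# is over-represented among public speakers relative to registered voters.
# All shares below are invented for illustration.

def representation_gap(speaker_share, voter_share):
    """Difference between a group's share of speakers and of voters."""
    return speaker_share - voter_share

# Hypothetical shares for one city: (share of speakers, share of voters).
groups = {
    "age_65_plus": (0.45, 0.20),
    "homeowners":  (0.80, 0.55),
    "male":        (0.60, 0.48),
}

gaps = {g: representation_gap(s, v) for g, (s, v) in groups.items()}
for g, gap in sorted(gaps.items(), key=lambda kv: -kv[1]):
    print(f"{g}: {gap:+.2f}")
```

Aggregating such gaps across cities and groups is one way to express the paper's headline descriptive finding as a single comparable number per city.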

If this is right

  • Land use and zoning topics draw substantially more public speakers than other agenda items.
  • Eliminating remote meeting options lowers the total volume of public input.
  • The demographic and ideological skew in speakers remains stable even when remote access is available or removed.
  • Local policy outcomes may systematically reflect the preferences of a non-representative subset of residents.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The persistent skew toward homeowners could tilt local land-use decisions toward preserving property values over broader housing supply goals.
  • Remote access appears more effective at raising overall engagement levels than at correcting demographic imbalances in who participates.
  • Targeted outreach or changes to agenda-setting processes may be needed alongside access reforms to diversify input.
  • Similar patterns may appear in other states, but California-specific rules on public comment could shape the exact size of the skew.

Load-bearing premise

That automated transcription and diarization can reliably infer speaker demographics and ideology, and that pandemic-era changes in remote access created a clean natural experiment without major shifts in meeting topics or resident interests.

What would settle it

Direct validation showing that inferred speaker demographics from transcripts match actual self-reported data from meetings, or evidence that agenda content or public priorities changed systematically alongside remote access rules in a confounding way.
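The validation described above amounts to an agreement-rate check between pipeline-inferred labels and self-reported ground truth on an audited subsample. A minimal sketch, with hypothetical labels (the paper does not publish this code):

```python
# Agreement rate between pipeline-inferred demographic labels and
# self-reported labels on an audited subsample. Labels are hypothetical.

def agreement_rate(inferred, reported):
    """Fraction of audited speakers whose inferred label matches the
    self-reported one; pairs with a missing report are skipped."""
    pairs = [(i, r) for i, r in zip(inferred, reported) if r is not None]
    if not pairs:
        return float("nan")
    return sum(i == r for i, r in pairs) / len(pairs)

inferred = ["65+", "45-64", "65+", "18-44", "65+"]
reported = ["65+", "45-64", "45-64", None, "65+"]  # None = no self-report
print(f"age-bin agreement: {agreement_rate(inferred, reported):.2f}")  # 0.75
```

Reporting this rate per demographic attribute, alongside pipeline word and speaker error rates, is the kind of evidence that would bound measurement error in the skew estimates.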

Figures

Figures reproduced from arXiv: 2604.21202 by Amar Venugopal, Olivia Martin.

Figure 1. Our data construction pipeline. Orange tiles denote data sources we ingest, while blue …
Figure 2. Distribution of agendized discussion (red) and unagendized public comment (blue) topic…
Figure 3. Exponentially weighted moving averages (with smoothing factor …)
Figure 4. Average participation rates by age
Figure 5. City characteristics and per-capita participation rates
Figure 6. Representation gaps and per-capita participation rates
Figure 7. Distribution of first-time participation rates across cities
Figure 8. Average public speakers per meeting by remote access adoption
Figure 9. Average public speakers per meeting by age group
Figure 10. Dynamic treatment effects of eliminating remote public participation
Figure 11. Heterogeneous treatment effects by pre-treatment speaker age and city racial composition
Figure 12. Geographic coverage of collected data. Dots reflect sampled cities, with size of the dot …
Original abstract

Local government meetings are the most common formal channel through which residents speak directly with elected officials, contest policies, and shape local agendas. However, data constraints typically limit the empirical study of these meetings to agendas, single cities, or short time horizons. We collect and transcribe a massive new dataset of city council meetings from 115 California cities over the last decade, using advanced transcription and diarization techniques to analyze the speech content of the meetings themselves. We document two sets of descriptive findings: First, city council meetings are frequent, long, and vary modestly across towns and time in topical content. Second, public participants are substantially older, whiter, more male, more liberal, and more likely to own homes than the registered voter population, and public participation surges when topics related to land use and zoning are included in meeting agendas. Given this skew, we examine the main policy lever municipalities have to shift participation patterns: meeting access costs. Exploiting pandemic-era variation in remote access, we show that eliminating remote options reduces the number of speakers, but does not clearly change the composition of speakers. Collectively, these results provide the most comprehensive empirical portrait to date of who participates in local democracy, what draws them in, and how institutional design choices shape both the volume and composition of public input.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper assembles a large transcribed dataset of city council meetings from 115 California cities over the last decade. It reports that public speakers are substantially older, whiter, more male, more liberal, and more likely to be homeowners than the registered voter population; that participation rises sharply when land-use and zoning items appear on agendas; and that the removal of remote-access options during the pandemic reduced the number of speakers without materially altering their demographic composition.

Significance. If the descriptive skew and the limited compositional effect of remote access hold after robustness checks, the work supplies the largest-scale empirical portrait to date of who participates in local government meetings and what institutional levers affect volume versus composition. The data-collection effort itself is a clear contribution to the study of local democracy.

major comments (2)
  1. [abstract and main empirical section on remote access] The claim that eliminating remote options 'does not clearly change the composition of speakers' (abstract) rests on treating pandemic-era policy shifts as a clean natural experiment. The manuscript must show that changes in meeting agendas, health-risk perceptions, and differential mobility across demographic groups are not confounding the result; without explicit controls or placebo tests for these factors, the finding that composition is stable could reflect offsetting selection rather than a pure access-cost effect.
  2. [data and methods section] The demographic comparisons (older, whiter, more male, etc.) require documented validation of the automated transcription and diarization pipeline. The manuscript should report error rates for speaker identification, the method used to link transcribed speakers to voter-file demographics, and any manual audit of a subsample; absent these, the magnitude of the reported skew cannot be assessed for measurement error.
minor comments (2)
  1. [empirical strategy] Clarify the exact sample restrictions and time windows used for the pre- and post-remote-access comparisons to allow replication.
  2. [data description] Add a table or figure showing the raw number of meetings and speakers per city-year to document coverage and balance.
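The referee's first objection centers on the difference-in-differences logic behind the remote-access result. The paper's actual estimator is a dynamic event-study design (Figure 10); a minimal 2x2 sketch on simulated meeting-level speaker counts makes the identifying comparison explicit. All numbers here are invented.

```python
# Minimal 2x2 difference-in-differences: cities that eliminated remote
# participation ("treated") versus cities that kept it, before and after
# the policy change. Simulated data; a stand-in for the paper's dynamic
# event-study estimator, not a reproduction of it.

def mean(xs):
    return sum(xs) / len(xs)

def did(treated_pre, treated_post, control_pre, control_post):
    """DiD estimate: change in treated cities minus change in controls."""
    return (mean(treated_post) - mean(treated_pre)) - (
        mean(control_post) - mean(control_pre))

# Simulated average speakers per meeting (hypothetical values).
treated_pre  = [8, 10, 9, 11]
treated_post = [5, 6, 7, 6]     # fewer speakers after remote removal
control_pre  = [9, 8, 10, 9]
control_post = [9, 9, 10, 8]

est = did(treated_pre, treated_post, control_pre, control_post)
print(f"DiD estimate: {est:+.2f}")  # -3.50
```

The referee's confounding worry translates here into whether the control-city trend is a valid counterfactual; the placebo tests and agenda controls the report asks for probe exactly that assumption.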

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the strengths and limitations of our analysis. We address the two major comments below, indicating the revisions we will undertake.

Point-by-point responses
  1. Referee: [abstract and main empirical section on remote access] The claim that eliminating remote options 'does not clearly change the composition of speakers' (abstract) rests on treating pandemic-era policy shifts as a clean natural experiment. The manuscript must show that changes in meeting agendas, health-risk perceptions, and differential mobility across demographic groups are not confounding the result; without explicit controls or placebo tests for these factors, the finding that composition is stable could reflect offsetting selection rather than a pure access-cost effect.

    Authors: We agree that additional robustness checks are warranted to strengthen the interpretation of the remote-access results as reflecting changes in access costs rather than confounding factors. In the revised manuscript, we will add controls for agenda composition (specifically the presence of land-use and zoning items, which we already document as strong predictors of participation volume) in the composition regressions. We will also include placebo tests using variation in meeting formats outside the pandemic period where possible, and discuss potential selection due to health risks and mobility by examining whether the effects differ by age and other demographics in ways consistent with differential mobility. While we cannot directly measure health-risk perceptions, these additions should address the main concerns about offsetting selection. revision: partial

  2. Referee: [data and methods section] The demographic comparisons (older, whiter, more male, etc.) require documented validation of the automated transcription and diarization pipeline. The manuscript should report error rates for speaker identification, the method used to link transcribed speakers to voter-file demographics, and any manual audit of a subsample; absent these, the magnitude of the reported skew cannot be assessed for measurement error.

    Authors: We appreciate this point and will expand the data and methods section to include the requested validation details. The revised manuscript will report benchmark error rates for the transcription (word error rate) and diarization (speaker error rate) pipelines based on standard evaluation datasets. We will describe the linking procedure to voter files, including the matching algorithm, match rates, and any assumptions about name and address uniqueness. Additionally, we will present results from a manual audit of a random subsample of 200 transcribed meetings, comparing automated speaker identification and demographic assignments to human-coded versions. These changes will allow readers to better evaluate potential measurement error in the reported demographic skews. revision: yes
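The linking documentation promised here concerns priority-ordered matching of transcript-identified names to a voter-file roster. A much-simplified sketch of that logic, substituting a first-initial fallback for the paper's reported nickname and phonetic (metaphone) tiers, which are not reproduced here:

```python
# Simplified priority matching of a speaker name to a voter-file roster.
# Tier 1: exact first and last name. Tier 2: exact last name plus first
# initial, a stand-in for the pipeline's nickname/metaphone fallbacks.
# A match is assigned only when the highest non-empty tier is unambiguous.

def match_speaker(name, roster):
    """Return a unique match at the highest priority tier, else None."""
    first, _, last = name.lower().partition(" ")
    tiers = [
        [r for r in roster if r.lower() == name.lower()],
        [r for r in roster
         if r.lower().split()[-1] == last and r.lower()[:1] == first[:1]],
    ]
    for tier in tiers:
        if len(tier) == 1:
            return tier[0]
        if len(tier) > 1:
            return None  # ambiguous within the highest non-empty tier
    return None

roster = ["Robert Chen", "Rob Chen", "Maria Lopez"]
print(match_speaker("Maria Lopez", roster))  # Maria Lopez
print(match_speaker("R Chen", roster))       # None (ambiguous at tier 2)
```

Reporting match rates and ambiguity rates at each tier, as the authors propose, is what lets readers judge how much demographic inference rides on the weaker fallback matches.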

Circularity Check

0 steps flagged

No circularity: purely empirical descriptive study with external data

Full rationale

The paper collects and transcribes external meeting data from 115 cities, computes descriptive statistics on speaker demographics versus voter populations, and uses pandemic-era policy variation as a natural experiment to compare participation volume and composition. No equations, fitted parameters, predictions, or derivations are present that could reduce to self-defined quantities or self-citations by construction. All findings rest on independent data sources and standard empirical methods without any load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based on abstract only; no free parameters, invented entities, or non-standard axioms are stated. Work rests on conventional empirical assumptions about data accuracy and exogeneity of the pandemic shock.

axioms (2)
  • domain assumption Automated transcription and diarization produce sufficiently accurate speaker identification and demographic inference for the reported comparisons.
    Necessary to support claims about who participates and how composition changes.
  • domain assumption Pandemic-era variation in remote meeting access constitutes a valid natural experiment for isolating the effect of access costs.
    Underpins the causal interpretation of the remote-access results.

pith-pipeline@v0.9.0 · 5520 in / 1303 out tokens · 74483 ms · 2026-05-08T13:26:58.314767+00:00 · methodology

