Participation and Representation in Local Government Speech
Pith reviewed 2026-05-08 13:26 UTC · model grok-4.3
The pith
Public speakers at California city council meetings are older, whiter, more male, more liberal, and more likely to own homes than registered voters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using transcripts from a decade of city council meetings across 115 California cities, we find that public participants are substantially older, whiter, more male, more liberal, and more likely to own homes than the registered voter population. Participation surges when land use and zoning topics are on the agenda. Exploiting pandemic-era changes in remote access options, we show that removing remote participation reduces the total number of speakers but does not clearly alter the demographic or ideological composition of those who speak.
What carries the argument
A large transcribed dataset of city council meetings, paired with pandemic-driven variation in remote access that serves as a natural experiment for measuring effects on speaker volume and composition.
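To make the comparison concrete, here is a minimal sketch of the two quantities the natural experiment separates: speakers per meeting (volume) and the share of speakers with a given trait (composition). The records, field layout, and numbers are invented for illustration; this is not the paper's data or its estimator.

```python
from statistics import mean

# Hypothetical meeting records:
# (remote_access_available, n_speakers, n_homeowner_speakers).
# All values are made up for illustration only.
meetings = [
    (True, 12, 9),
    (True, 10, 7),
    (False, 6, 5),
    (False, 8, 5),
]


def summarize(records, remote):
    """Return (mean speakers per meeting, pooled homeowner share) for one regime."""
    subset = [r for r in records if r[0] == remote]
    volume = mean(r[1] for r in subset)
    share = sum(r[2] for r in subset) / sum(r[1] for r in subset)
    return volume, share


vol_remote, share_remote = summarize(meetings, True)
vol_in_person, share_in_person = summarize(meetings, False)

# Change when remote access is removed: volume and composition are separate numbers.
volume_change = vol_in_person - vol_remote
composition_change = share_in_person - share_remote
```

A raw difference like this ignores the confounders raised in the referee report; it only shows which two numbers are being compared.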
If this is right
- Land use and zoning topics draw substantially more public speakers than other agenda items.
- Eliminating remote meeting options lowers the total volume of public input.
- The demographic and ideological skew among speakers shows no clear change whether remote access is available or removed.
- Local policy outcomes may systematically reflect the preferences of a non-representative subset of residents.
Where Pith is reading between the lines
- The persistent skew toward homeowners could tilt local land-use decisions toward preserving property values over broader housing supply goals.
- Remote access appears more effective at raising overall engagement levels than at correcting demographic imbalances in who participates.
- Targeted outreach or changes to agenda-setting processes may be needed alongside access reforms to diversify input.
- Similar patterns may appear in other states, but California-specific rules on public comment could shape the exact size of the skew.
Load-bearing premise
That automated transcription and diarization, combined with linkage to voter-file records, can reliably identify speakers and their demographics and ideology, and that pandemic-era changes in remote access created a clean natural experiment without major shifts in meeting topics or resident interests.
What would settle it
Direct validation showing that inferred speaker demographics from transcripts match actual self-reported data from meetings, or evidence that agenda content or public priorities changed systematically alongside remote access rules in a confounding way.
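One way to picture such a validation is an agreement rate between inferred and self-reported labels on an audited subsample. The sketch below assumes paired labels are available; the function name and the toy labels are hypothetical and do not come from the paper.

```python
def agreement_rate(inferred, reported):
    """Share of speakers whose inferred label equals the self-reported label."""
    if len(inferred) != len(reported):
        raise ValueError("label lists must be the same length")
    if not inferred:
        raise ValueError("need at least one labeled speaker")
    matches = sum(a == b for a, b in zip(inferred, reported))
    return matches / len(inferred)


# Toy labels, invented for illustration only.
inferred = ["owner", "owner", "renter", "owner"]
reported = ["owner", "renter", "renter", "owner"]
rate = agreement_rate(inferred, reported)
```

A single pooled rate can hide errors that differ by group, so a real audit would likely report agreement separately for each demographic attribute.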
Original abstract
Local government meetings are the most common formal channel through which residents speak directly with elected officials, contest policies, and shape local agendas. However, data constraints typically limit the empirical study of these meetings to agendas, single cities, or short time horizons. We collect and transcribe a massive new dataset of city council meetings from 115 California cities over the last decade, using advanced transcription and diarization techniques to analyze the speech content of the meetings themselves. We document two sets of descriptive findings: First, city council meetings are frequent, long, and vary modestly across towns and time in topical content. Second, public participants are substantially older, whiter, more male, more liberal, and more likely to own homes than the registered voter population, and public participation surges when topics related to land use and zoning are included in meeting agendas. Given this skew, we examine the main policy lever municipalities have to shift participation patterns: meeting access costs. Exploiting pandemic-era variation in remote access, we show that eliminating remote options reduces the number of speakers, but does not clearly change the composition of speakers. Collectively, these results provide the most comprehensive empirical portrait to date of who participates in local democracy, what draws them in, and how institutional design choices shape both the volume and composition of public input.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper assembles a large transcribed dataset of city council meetings from 115 California cities over the last decade. It reports that public speakers are substantially older, whiter, more male, more liberal, and more likely to be homeowners than the registered voter population; that participation rises sharply when land-use and zoning items appear on agendas; and that the removal of remote-access options during the pandemic reduced the number of speakers without materially altering their demographic composition.
Significance. If the descriptive skew and the limited compositional effect of remote access hold after robustness checks, the work supplies the largest-scale empirical portrait to date of who participates in local government meetings and what institutional levers affect volume versus composition. The data-collection effort itself is a clear contribution to the study of local democracy.
major comments (2)
- [abstract and main empirical section on remote access] The claim that eliminating remote options 'does not clearly change the composition of speakers' (abstract) rests on treating pandemic-era policy shifts as a clean natural experiment. The manuscript must show that changes in meeting agendas, health-risk perceptions, and differential mobility across demographic groups are not confounding the result; without explicit controls or placebo tests for these factors, the finding that composition is stable could reflect offsetting selection rather than a pure access-cost effect.
- [data and methods section] The demographic comparisons (older, whiter, more male, etc.) require documented validation of the automated transcription and diarization pipeline. The manuscript should report error rates for speaker identification, the method used to link transcribed speakers to voter-file demographics, and any manual audit of a subsample; absent these, the magnitude of the reported skew cannot be assessed for measurement error.
minor comments (2)
- [empirical strategy] Clarify the exact sample restrictions and time windows used for the pre- and post-remote-access comparisons to allow replication.
- [data description] Add a table or figure showing the raw number of meetings and speakers per city-year to document coverage and balance.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the strengths and limitations of our analysis. We address the two major comments below, indicating the revisions we will undertake.
Point-by-point responses
- Referee: [abstract and main empirical section on remote access] The claim that eliminating remote options 'does not clearly change the composition of speakers' (abstract) rests on treating pandemic-era policy shifts as a clean natural experiment. The manuscript must show that changes in meeting agendas, health-risk perceptions, and differential mobility across demographic groups are not confounding the result; without explicit controls or placebo tests for these factors, the finding that composition is stable could reflect offsetting selection rather than a pure access-cost effect.
  Authors: We agree that additional robustness checks are warranted to strengthen the interpretation of the remote-access results as reflecting changes in access costs rather than confounding factors. In the revised manuscript, we will add controls for agenda composition (specifically the presence of land-use and zoning items, which we already document as strong predictors of participation volume) in the composition regressions. We will also include placebo tests using variation in meeting formats outside the pandemic period where possible, and discuss potential selection due to health risks and mobility by examining whether the effects differ by age and other demographics in ways consistent with differential mobility. While we cannot directly measure health-risk perceptions, these additions should address the main concerns about offsetting selection.
  Revision: partial
- Referee: [data and methods section] The demographic comparisons (older, whiter, more male, etc.) require documented validation of the automated transcription and diarization pipeline. The manuscript should report error rates for speaker identification, the method used to link transcribed speakers to voter-file demographics, and any manual audit of a subsample; absent these, the magnitude of the reported skew cannot be assessed for measurement error.
  Authors: We appreciate this point and will expand the data and methods section to include the requested validation details. The revised manuscript will report benchmark error rates for the transcription (word error rate) and diarization (speaker error rate) pipelines based on standard evaluation datasets. We will describe the linking procedure to voter files, including the matching algorithm, match rates, and any assumptions about name and address uniqueness. Additionally, we will present results from a manual audit of a random subsample of 200 transcribed meetings, comparing automated speaker identification and demographic assignments to human-coded versions. These changes will allow readers to better evaluate potential measurement error in the reported demographic skews.
  Revision: yes
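For readers unfamiliar with the transcription metric named in the response, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The following is a minimal sketch of that standard definition, assuming whitespace tokenization; it is not the authors' evaluation code, and the example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one substitution and one deletion over five reference words.
wer = word_error_rate(
    "the council approved the rezoning",
    "the council approve rezoning",
)
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.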
Circularity Check
No circularity: purely empirical descriptive study with external data
The paper collects and transcribes external meeting data from 115 cities, computes descriptive statistics on speaker demographics versus voter populations, and uses pandemic-era policy variation as a natural experiment to compare participation volume and composition. No equations, fitted parameters, predictions, or derivations are present that could reduce to self-defined quantities or self-citations by construction. All findings rest on independent data sources and standard empirical methods without any load-bearing self-referential steps.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Automated transcription and diarization produce sufficiently accurate speaker identification and demographic inference for the reported comparisons.
- Domain assumption: Pandemic-era variation in remote meeting access constitutes a valid natural experiment for isolating the effect of access costs.