Pith Number

pith:WUMKX3E3

pith:2026:WUMKX3E3SHNFAAQH7HL7VVC4CD
not attested · not anchored · not stored · refs resolved

Leveraging Multimodal Self-Consistency Reasoning in Coding Motivational Interviewing for Alcohol Use Reduction

Benjamin O. Ladd, Brian Borsari, Guangzeng Han, James G. Murphy, Xiaolei Huang

A multimodal self-consistency method using audio-language models codes motivational interviewing sessions more accurately than single-pass approaches.

arxiv:2605.12987 v1 · 2026-05-13 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{WUMKX3E3SHNFAAQH7HL7VVC4CD}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

The proposed multimodal self-consistency approach achieved 52.56% accuracy, 54.03% precision, 47.45% recall, and a macro-F1 score of 46.40%, exceeding baseline methods.

C2 · weakest assumption

Assumes that the five de-identified motivational interviewing (MI) audio tapes are representative of typical sessions, and that majority voting across the twelve trajectories reliably improves accuracy without introducing systematic bias from the chosen prompts or from model stochasticity.

C3 · one-line summary

Multimodal self-consistency reasoning with audio-language models reaches 52.56% accuracy on coding five MI sessions, outperforming single-pass baselines.
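
The claims above rest on two generic building blocks: majority voting over twelve sampled trajectories per utterance, and macro-averaged scoring. The following is a minimal sketch of both, not the authors' implementation. The label names (`change`, `sustain`, `neutral`), the tie-breaking rule, and the toy data are assumptions made for illustration; the record does not state the paper's label set or how ties are resolved.

```python
from collections import Counter


def majority_vote(trajectory_labels):
    """Return the most frequent label in a non-empty list.

    Assumption: ties are broken in favour of the label that appears first.
    """
    counts = Counter(trajectory_labels)
    best = max(counts.values())
    for label in trajectory_labels:
        if counts[label] == best:
            return label


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the classes present in gold."""
    scores = []
    for cls in sorted(set(gold)):
        tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
        fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
        fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical data: 12 sampled trajectory labels for each of 3 utterances.
trajectories = [
    ["change"] * 7 + ["sustain"] * 3 + ["neutral"] * 2,
    ["neutral"] * 6 + ["change"] * 6,  # a 6-6 tie, resolved by first-seen
    ["sustain"] * 5 + ["neutral"] * 4 + ["change"] * 3,
]
gold = ["change", "change", "sustain"]

pred = [majority_vote(t) for t in trajectories]
print(pred, round(macro_f1(gold, pred), 3))
# prints: ['change', 'neutral', 'sustain'] 0.833
```

The tied second utterance shows why the weakest assumption matters: with an even number of trajectories, the outcome can depend on the tie-breaking rule rather than on the model's evidence.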

Formal links

1 machine-checked theorem link

Receipt and verification
First computed 2026-05-18T03:09:00.584598Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

b518abec9b91da500207f9d7fad45c10dd675a2766dd630a7557ff09dcd3f165

Aliases

arxiv: 2605.12987 · arxiv_version: 2605.12987v1 · doi: 10.48550/arxiv.2605.12987 · pith_short_12: WUMKX3E3SHNF · pith_short_16: WUMKX3E3SHNFAAQH · pith_short_8: WUMKX3E3
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/WUMKX3E3SHNFAAQH7HL7VVC4CD \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b518abec9b91da500207f9d7fad45c10dd675a2766dd630a7557ff09dcd3f165
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "95c0c5d51065316becd12dcbeeb23978a1645a1979cdaef4829b7cd19aaa2bb8",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-05-13T04:36:04Z",
    "title_canon_sha256": "46a4ca994abd6fde2c1d861045d5119b643ffa29015b1ae649ef3b7c722dbd5a"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12987",
    "kind": "arxiv",
    "version": 1
  }
}
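
The shell one-liner under "Verify this Pith Number yourself" can also be written as a small Python function, which may be easier to reuse in a script. This is a sketch that mirrors the serialization settings shown in that command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding, SHA-256); the function name `canonical_hash` is hypothetical, and the record literal is copied from the JSON above rather than fetched from the API.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the record serialized with sorted keys
    and compact separators, matching the shell one-liner's settings."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "95c0c5d51065316becd12dcbeeb23978a1645a1979cdaef4829b7cd19aaa2bb8",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2026-05-13T04:36:04Z",
        "title_canon_sha256": "46a4ca994abd6fde2c1d861045d5119b643ffa29015b1ae649ef3b7c722dbd5a",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.12987", "kind": "arxiv", "version": 1},
}

digest = canonical_hash(record)
print(digest)  # compare against the value listed under "Canonical hash"
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields arrive, which is what lets a mirror recompute the same value from its own copy of the record.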