Cohen, and Xinghua Lu

Qiao Jin, Bhuwan Dhingra, Zhengping Liu, William W · 1909 · arXiv 1909.06146

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

representative citing papers

PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts

cs.CR · 2026-05-07 · unverdicted · novelty 7.0

PragLocker protects agent prompts as IP by building non-portable obfuscated versions that function only on the intended LLM through code-symbol semantic anchoring followed by target-model feedback noise injection.

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

cs.CL · 2024-02-05 · unverdicted · novelty 7.0

M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.

Preventing Safety Drift in Large Language Models via Coupled Weight and Activation Constraints

cs.AI · 2026-04-14 · unverdicted · novelty 6.0

Coupled constraints on weight updates in a safety subspace and regularization of SAE-identified safety features preserve LLM refusal behaviors during fine-tuning better than weight-only or activation-only methods.

MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors

cs.CL · 2026-04-08 · unverdicted · novelty 6.0

MedDialBench shows LLMs suffer 1.7-3.4x larger diagnostic accuracy drops from patients fabricating symptoms than withholding them, with fabrication driving super-additive interaction effects across models.

Galactica: A Large Language Model for Science

cs.CL · 2022-11-16 · unverdicted · novelty 5.0

Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.

Medical Incident Causal Factors and Preventive Measures Generation Using Tag-based Example Selection in Few-shot Learning

cs.CL · 2026-05-11 · unverdicted · novelty 4.0

Tag-based few-shot selection yields higher precision and stability than random or similarity-based methods when using LLMs to analyze medical incidents.

MedGemma 1.5 Technical Report

cs.AI · 2026-04-06 · unverdicted · novelty 4.0

MedGemma 1.5 4B reports absolute gains of 11% on 3D MRI classification, 3% on 3D CT, 47% macro F1 on pathology slides, 35% IoU on anatomical localization, and 5-22% on clinical QA tasks over MedGemma 1.

Comparative Analysis of Large Language Models in Healthcare

cs.CL · 2026-04-11 · unverdicted · novelty 3.0

Domain-specific models like ChatDoctor excel at medically accurate and contextually reliable text while general-purpose models like Grok and LLaMA perform better on structured medical question-answering tasks.

Gyan: An Explainable Neuro-Symbolic Language Model

cs.CL · 2026-05-06

citing papers explorer

Showing 9 of 9 citing papers.

PragLocker: Protecting Agent Intellectual Property in Untrusted Deployments via Non-Portable Prompts cs.CR · 2026-05-07 · unverdicted · none · ref 46
PragLocker protects agent prompts as IP by building non-portable obfuscated versions that function only on the intended LLM through code-symbol semantic anchoring followed by target-model feedback noise injection.
M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation cs.CL · 2024-02-05 · unverdicted · none · ref 73
M3-Embedding is a single model for multi-lingual, multi-functional, and multi-granular text embeddings trained via self-knowledge distillation that achieves new state-of-the-art results on multilingual, cross-lingual, and long-document retrieval benchmarks.
Preventing Safety Drift in Large Language Models via Coupled Weight and Activation Constraints cs.AI · 2026-04-14 · unverdicted · none · ref 21
Coupled constraints on weight updates in a safety subspace and regularization of SAE-identified safety features preserve LLM refusal behaviors during fine-tuning better than weight-only or activation-only methods.
MedDialBench: Benchmarking LLM Diagnostic Robustness under Parametric Adversarial Patient Behaviors cs.CL · 2026-04-08 · unverdicted · none · ref 6
MedDialBench shows LLMs suffer 1.7-3.4x larger diagnostic accuracy drops from patients fabricating symptoms than withholding them, with fabrication driving super-additive interaction effects across models.
Galactica: A Large Language Model for Science cs.CL · 2022-11-16 · unverdicted · none · ref 183
Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.
Medical Incident Causal Factors and Preventive Measures Generation Using Tag-based Example Selection in Few-shot Learning cs.CL · 2026-05-11 · unverdicted · none · ref 27
Tag-based few-shot selection yields higher precision and stability than random or similarity-based methods when using LLMs to analyze medical incidents.
MedGemma 1.5 Technical Report cs.AI · 2026-04-06 · unverdicted · none · ref 10
MedGemma 1.5 4B reports absolute gains of 11% on 3D MRI classification, 3% on 3D CT, 47% macro F1 on pathology slides, 35% IoU on anatomical localization, and 5-22% on clinical QA tasks over MedGemma 1.
Comparative Analysis of Large Language Models in Healthcare cs.CL · 2026-04-11 · unverdicted · none · ref 41
Domain-specific models like ChatDoctor excel at medically accurate and contextually reliable text while general-purpose models like Grok and LLaMA perform better on structured medical question-answering tasks.
Gyan: An Explainable Neuro-Symbolic Language Model cs.CL · 2026-05-06 · unreviewed · ref 31

Cohen, and Xinghua Lu

fields

years

verdicts

representative citing papers

citing papers explorer