C-Pack: Packaged resources to advance general Chinese embedding

Shitao Xiao, Zheng Liu, Peitian Zhang, Niklas Muennighoff · 2023

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

SMMBench: A Benchmark for Source-Distributed Multimodal Agent Memory

cs.CL · 2026-05-15 · unverdicted · novelty 7.0

SMMBench is a benchmark evaluating multimodal agents on cross-source reasoning, conflict resolution, preference reasoning, and action prediction, showing current systems struggle with evidence distributed across heterogeneous sources.

Training with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoning

cs.CL · 2026-05-09 · unverdicted · novelty 7.0

OPHSD uses harness-augmented models as teachers to distill reasoning capabilities into base LLMs, yielding strong standalone performance on classification and math tasks.

"I'm Not Mad, Just Focused": Understanding Human Emotions in Human-Robot Collaboration

cs.RO · 2026-05-16 · conditional · novelty 6.0

A VLM-based emotion recognition system for human-robot collaboration achieves higher semantic and sentiment alignment with human annotations than a CNN baseline and results in preferred adaptive robot behavior in a user study.

LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation

cs.IR · 2026-05-11 · unverdicted · novelty 6.0

LASAR uses two-stage supervised training plus reinforcement learning to ground semantic IDs, align latent reasoning trajectories to CoT hidden states via KL divergence, and adaptively choose reasoning depth, halving average steps while improving quality on three datasets.

MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System

cs.AI · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

MAS-Algorithm is a multi-agent workflow that improves AI acceptance rates on algorithmic problems by 6.48% on average, outperforming parameter-efficient fine-tuning.

Supervising the search process produces reliable and generalizable information-seeking agents

cs.CL · 2025-02-19 · unverdicted · novelty 6.0

Process supervision via RAG-Gym produces more reliable and generalizable search agents, with gains driven by higher-quality queries on out-of-domain multi-hop tasks.

DataComp-LM: In search of the next generation of training sets for language models

cs.LG · 2024-06-17 · unverdicted · novelty 6.0

DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.

Nomic Embed: Training a Reproducible Long Context Text Embedder

cs.CL · 2024-02-02 · conditional · novelty 6.0

Nomic AI produced and open-sourced a reproducible 8192-context English text embedder that exceeds OpenAI Ada-002 and text-embedding-3-small performance on MTEB short-context and LoCo long-context benchmarks.

citing papers explorer

Showing 8 of 8 citing papers.

SMMBench: A Benchmark for Source-Distributed Multimodal Agent Memory
cs.CL · 2026-05-15 · unverdicted · none · ref 45
SMMBench is a benchmark evaluating multimodal agents on cross-source reasoning, conflict resolution, preference reasoning, and action prediction, showing current systems struggle with evidence distributed across heterogeneous sources.

Training with Harnesses: On-Policy Harness Self-Distillation for Complex Reasoning
cs.CL · 2026-05-09 · unverdicted · none · ref 36
OPHSD uses harness-augmented models as teachers to distill reasoning capabilities into base LLMs, yielding strong standalone performance on classification and math tasks.

"I'm Not Mad, Just Focused": Understanding Human Emotions in Human-Robot Collaboration
cs.RO · 2026-05-16 · conditional · none · ref 25
A VLM-based emotion recognition system for human-robot collaboration achieves higher semantic and sentiment alignment with human annotations than a CNN baseline and results in preferred adaptive robot behavior in a user study.

LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation
cs.IR · 2026-05-11 · unverdicted · none · ref 52
LASAR uses two-stage supervised training plus reinforcement learning to ground semantic IDs, align latent reasoning trajectories to CoT hidden states via KL divergence, and adaptively choose reasoning depth, halving average steps while improving quality on three datasets.

MAS-Algorithm: A Workflow for Solving Algorithmic Programming Problems with a Multi-Agent System
cs.AI · 2026-05-07 · unverdicted · none · ref 40 · 2 links
MAS-Algorithm is a multi-agent workflow that improves AI acceptance rates on algorithmic problems by 6.48% on average, outperforming parameter-efficient fine-tuning.

Supervising the search process produces reliable and generalizable information-seeking agents
cs.CL · 2025-02-19 · unverdicted · none · ref 84
Process supervision via RAG-Gym produces more reliable and generalizable search agents, with gains driven by higher-quality queries on out-of-domain multi-hop tasks.

DataComp-LM: In search of the next generation of training sets for language models
cs.LG · 2024-06-17 · unverdicted · none · ref 202
DCLM-Baseline dataset lets a 7B model reach 64% 5-shot MMLU accuracy after 2.6T tokens, beating prior open-data models by 6.6 points on MMLU with 40% less compute.

Nomic Embed: Training a Reproducible Long Context Text Embedder
cs.CL · 2024-02-02 · conditional · none · ref 71
Nomic AI produced and open-sourced a reproducible 8192-context English text embedder that exceeds OpenAI Ada-002 and text-embedding-3-small performance on MTEB short-context and LoCo long-context benchmarks.
