A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.
https://arxiv.org/abs/2211.12561
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
Kosmos-1 shows strong zero-shot and few-shot results on language tasks, image captioning, visual QA, OCR-free document understanding, and image recognition guided by text instructions.
Introduces a fairness layer for deep learning models that guarantees output parity and an online primal-dual algorithm for aggregate fairness guarantees in streaming predictions with small batch sizes.
PVM adds a parallel branch to LVLMs that directly supplies visual embeddings to prevent attention decay over long generated sequences, yielding accuracy gains on reasoning tasks with minimal overhead.
Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.
REPLUG improves frozen black-box LMs by prepending LM-supervised retrieved documents, delivering 6.3% better language modeling on GPT-3 and 5.1% better five-shot MMLU on Codex.
AGREE boosts visual document retrieval by adding local relevance signals from MLLM attention maps to global document labels during retriever training.
mPLUG-Owl3 introduces hyper attention blocks to integrate vision and language for long image-sequence understanding and reports SOTA results on single-image, multi-image, and video benchmarks.
A survey of RAG paradigms, components, benchmarks, and challenges for improving LLMs on knowledge-intensive tasks.
citing papers explorer
-
From Standalone LLMs to Integrated Intelligence: A Survey of Compound Al Systems
A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.
-
Language Is Not All You Need: Aligning Perception with Language Models
Kosmos-1 shows strong zero-shot and few-shot results on language tasks, image captioning, visual QA, OCR-free document understanding, and image recognition guided by text instructions.
-
Differentiable Optimization Layers for Guaranteed Fairness in Deep Learning
Introduces a fairness layer for deep learning models that guarantees output parity and an online primal-dual algorithm for aggregate fairness guarantees in streaming predictions with small batch sizes.
-
Persistent Visual Memory: Sustaining Perception for Deep Generation in LVLMs
PVM adds a parallel branch to LVLMs that directly supplies visual embeddings to prevent attention decay over long generated sequences, yielding accuracy gains on reasoning tasks with minimal overhead.
-
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Self-RAG trains LLMs to adaptively retrieve passages on demand and self-critique using reflection tokens, outperforming ChatGPT and retrieval-augmented Llama2 on QA, reasoning, and fact verification.
-
REPLUG: Retrieval-Augmented Black-Box Language Models
REPLUG improves frozen black-box LMs by prepending LM-supervised retrieved documents, delivering 6.3% better language modeling on GPT-3 and 5.1% better five-shot MMLU on Codex.
-
Attention Grounded Enhancement for Visual Document Retrieval
AGREE boosts visual document retrieval by adding local relevance signals from MLLM attention maps to global document labels during retriever training.
-
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
mPLUG-Owl3 introduces hyper attention blocks to integrate vision and language for long image-sequence understanding and reports SOTA results on single-image, multi-image, and video benchmarks.
-
Retrieval-Augmented Generation for Large Language Models: A Survey
A survey of RAG paradigms, components, benchmarks, and challenges for improving LLMs on knowledge-intensive tasks.