pith. machine review for the scientific record.

arxiv: 2604.02368 · v4 · submitted 2026-03-27 · 💻 cs.AI · cs.CL

Recognition: 2 theorem links

· Lean Theorem

Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 23:18 UTC · model grok-4.3

classification 💻 cs.AI cs.CL
keywords XpertBench · LLM benchmark · expert tasks · rubric evaluation · professional domains · performance ceiling · ShotJudge

The pith

Leading LLMs reach only around 66 percent success on expert-level professional tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces XpertBench, a collection of 1,346 tasks drawn directly from more than 1,000 submissions by domain experts in finance, healthcare, law, education, and research. Each task comes with detailed rubrics that break performance into 15 to 40 weighted checkpoints to measure professional rigor. To scale evaluation without self-bias, the work defines ShotJudge, an approach that calibrates LLM judges using expert few-shot examples. Applied to current top models, the benchmark shows a hard ceiling of roughly 66 percent peak success and 55 percent mean scores, along with a clear split: some models are stronger at quantitative reasoning, others at linguistic synthesis, with little overlap. This gap suggests that general-purpose training has not yet produced reliable expert-level judgment.
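The abstract describes the rubrics only at this level of detail; as a hedged sketch (the dataclass, field names, and the binary pass/fail simplification are ours, not the paper's code), weighted-checkpoint scoring reduces to:

```python
from dataclasses import dataclass

@dataclass
class Checkpoint:
    description: str
    weight: float    # relative importance assigned by the task's expert author
    satisfied: bool  # judge's binary verdict for this checkpoint

def rubric_score(checkpoints: list[Checkpoint]) -> float:
    """Weighted fraction of satisfied checkpoints, in [0, 1]."""
    total = sum(c.weight for c in checkpoints)
    earned = sum(c.weight for c in checkpoints if c.satisfied)
    return earned / total if total else 0.0

# Toy 3-checkpoint rubric; real tasks use 15-40 weighted checkpoints
rubric = [
    Checkpoint("Cites the governing statute", 3.0, True),
    Checkpoint("Applies the correct legal test", 5.0, True),
    Checkpoint("Flags the jurisdictional exception", 2.0, False),
]
print(rubric_score(rubric))  # 8/10 = 0.8
```

A partial-credit scheme (per-checkpoint scores rather than pass/fail) would drop into the same shape by making `satisfied` a float.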

Core claim

XpertBench demonstrates that state-of-the-art large language models exhibit a pronounced performance ceiling of approximately 66 percent peak success rate and mean scores near 55 percent when tested on 1,346 expert-curated tasks across finance, healthcare, legal services, education, and dual-track research domains, accompanied by non-overlapping strengths between quantitative reasoning and linguistic synthesis.

What carries the argument

XpertBench benchmark of 1,346 tasks sourced from expert submissions and scored with rubrics of 15-40 weighted checkpoints, assessed via the ShotJudge paradigm of expert-calibrated few-shot LLM judging.
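The ShotJudge prompt format is not published here; one plausible shape of expert-calibrated few-shot judging, with every name and field an assumption of ours rather than the paper's API, is:

```python
def build_judge_prompt(task, rubric, answer, exemplars):
    """Assemble a judging prompt calibrated with expert-scored exemplars.

    `exemplars` are (answer, expert_verdicts) pairs; including them is the
    few-shot calibration idea -- the LLM judge sees how experts graded
    earlier answers before grading the new one.
    """
    parts = ["You are grading an expert-level task against a rubric.",
             f"Task: {task}",
             "Rubric checkpoints (weight: description):"]
    parts += [f"  {weight}: {desc}" for desc, weight in rubric]
    for ex_answer, ex_verdicts in exemplars:
        parts.append(f"Example answer: {ex_answer}")
        parts.append(f"Expert checkpoint verdicts: {ex_verdicts}")
    parts.append(f"Answer to grade: {answer}")
    parts.append("Return one pass/fail verdict per checkpoint.")
    return "\n".join(parts)
```

The key design point is that calibration lives entirely in the prompt, so the same judge model can be reused across domains by swapping exemplars.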

If this is right

  • Current models remain general assistants rather than dependable specialized collaborators in high-stakes professional settings.
  • Quantitative and linguistic domains require distinct improvements because model strengths do not overlap.
  • Rubric-based scoring with many checkpoints supplies finer diagnostics than single-score accuracy metrics.
  • Training objectives must target the identified expert-gap to move beyond plateaued general benchmarks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Sustained progress on these tasks could function as a practical yardstick for when AI systems become viable independent professional agents.
  • Domain divergence suggests hybrid systems that route tasks to the strongest available model for each area could raise overall performance.
  • The benchmark tasks themselves could serve as training data for targeted fine-tuning once the ceiling is better understood.
  • Human-AI workflows could use the checkpoints to decide precisely where expert review remains necessary.
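The routing idea in the second bullet can be sketched in a few lines; the model names and per-domain scores below are invented for illustration, not results from the paper:

```python
# Invented per-domain mean scores; the paper reports divergence between
# quantitative and linguistic strengths, but these numbers are illustrative.
model_scores = {
    "model_quant":   {"finance": 0.68, "law": 0.51, "healthcare": 0.60},
    "model_lingual": {"finance": 0.54, "law": 0.63, "healthcare": 0.57},
}

def route(domain: str) -> str:
    """Send a task to whichever model scores best in its domain."""
    return max(model_scores, key=lambda m: model_scores[m].get(domain, 0.0))

assert route("finance") == "model_quant"    # quantitative-leaning domain
assert route("law") == "model_lingual"      # synthesis-leaning domain
```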

Load-bearing premise

The expert submissions and the resulting weighted rubric checkpoints accurately capture genuine expert-level cognition without selection bias or construction artifacts.

What would settle it

If panels of independent domain experts re-grade the same 1,346 tasks under the same rubrics and find that leading models in fact exceed 80 percent success, the reported ~66 percent performance ceiling would be falsified.

read the original abstract

As Large Language Models (LLMs) exhibit plateauing performance on conventional benchmarks, a pivotal challenge persists: evaluating their proficiency in complex, open-ended tasks characterizing genuine expert-level cognition. Existing frameworks suffer from narrow domain coverage, reliance on generalist tasks, or self-evaluation biases. To bridge this gap, we present XpertBench, a high-fidelity benchmark engineered to assess LLMs across authentic professional domains. XpertBench consists of 1,346 meticulously curated tasks across 80 categories, spanning finance, healthcare, legal services, education, and dual-track research (STEM and Humanities). These tasks are derived from over 1,000 submissions by domain experts--including researchers from elite institutions and practitioners with extensive clinical or industrial experience--ensuring superior ecological validity. Each task uses detailed rubrics with mostly 15-40 weighted checkpoints to assess professional rigor. To facilitate scalable yet human-aligned assessment, we introduce ShotJudge, a novel evaluation paradigm that employs LLM judges calibrated with expert few-shot exemplars to mitigate self-rewarding biases. Our empirical evaluation of state-of-the-art LLMs reveals a pronounced performance ceiling: even leading models achieve a peak success rate of only ~66%, with a mean score around 55%. Models also exhibit domain-specific divergence, showing non-overlapping strengths in quantitative reasoning versus linguistic synthesis.. These findings underscore a significant "expert-gap" in current AI systems and establish XpertBench as a critical instrument for navigating the transition from general-purpose assistants to specialized professional collaborators.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces XpertBench, a benchmark of 1,346 expert-derived tasks across 80 categories in domains including finance, healthcare, legal, education, and research. Tasks are assessed via rubrics containing 15-40 weighted checkpoints, with evaluation performed by the introduced ShotJudge method that uses LLM judges calibrated on expert few-shot exemplars. Empirical results on state-of-the-art LLMs report a peak success rate of ~66% and mean score of ~55%, together with domain-specific performance divergences between quantitative and linguistic tasks, which the authors interpret as evidence of an expert-level gap.

Significance. If the rubric validity and ShotJudge alignment claims are substantiated, the work would provide a useful high-ecological-validity instrument for tracking progress toward professional-grade AI capabilities. The scale of expert-sourced tasks and the shift from self-evaluation to calibrated external judging are constructive steps beyond many existing LLM benchmarks.

major comments (3)
  1. [Abstract / Evaluation Methodology] Abstract and evaluation section: the central performance-ceiling claim (~66% peak, ~55% mean) and the domain-divergence finding rest on ShotJudge outputs, yet no inter-rater reliability statistics (Cohen’s kappa, Pearson r, or equivalent) between ShotJudge scores and independent human experts scoring the same tasks are reported.
  2. [Benchmark Construction] Rubric construction paragraph: the weighting scheme for the 15-40 checkpoints is described as “weighted” but no procedure for deriving or validating the weights across the 1,346 tasks is supplied, leaving open the possibility that domain-specific score differences are artifacts of rubric construction rather than genuine model behavior.
  3. [Empirical Evaluation] Results section: the reported success rates and non-overlapping domain strengths are presented without accompanying statistical tests, confidence intervals, or controls for judge calibration drift, so the quantitative support for the expert-gap conclusion remains incomplete.
minor comments (1)
  1. [Abstract] The final sentence of the abstract contains a double period (“synthesis.. These”).
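The reliability statistics named in major comment 1 are standard; a dependency-free sketch, with the judge and human verdict vectors invented for illustration:

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two raters' binary checkpoint verdicts."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n   # observed agreement
    pa, pb = sum(a) / n, sum(b) / n              # marginal pass rates
    pe = pa * pb + (1 - pa) * (1 - pb)           # chance agreement
    return (po - pe) / (1 - pe) if pe < 1 else 1.0

def pearson_r(x, y):
    """Pearson correlation between two score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Invented verdicts: ShotJudge vs. a human expert on ten checkpoints
judge = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
human = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]
print(round(cohen_kappa(judge, human), 3))  # 0.524
print(round(pearson_r(judge, human), 3))    # 0.524
```

For binary verdicts with matched marginals the two statistics coincide, which is why a validation study would typically report kappa on verdicts and Pearson r on the continuous rubric scores.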

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies key areas where additional validation and statistical detail will strengthen the manuscript. We respond to each major comment below and will incorporate the recommended changes in the revised version.

read point-by-point responses
  1. Referee: [Abstract / Evaluation Methodology] Abstract and evaluation section: the central performance-ceiling claim (~66% peak, ~55% mean) and the domain-divergence finding rest on ShotJudge outputs, yet no inter-rater reliability statistics (Cohen’s kappa, Pearson r, or equivalent) between ShotJudge scores and independent human experts scoring the same tasks are reported.

    Authors: We agree that reporting inter-rater reliability would provide stronger substantiation for ShotJudge. Although the method is calibrated on expert few-shot exemplars, a dedicated comparison with independent human scorers was not included. In the revision we will add a human validation subsection based on a representative subset of tasks, reporting Cohen’s kappa and Pearson correlation between ShotJudge outputs and expert ratings. revision: yes

  2. Referee: [Benchmark Construction] Rubric construction paragraph: the weighting scheme for the 15-40 checkpoints is described as “weighted” but no procedure for deriving or validating the weights across the 1,346 tasks is supplied, leaving open the possibility that domain-specific score differences are artifacts of rubric construction rather than genuine model behavior.

    Authors: The observation is correct; the weighting derivation process requires fuller description. Weights were assigned through direct consultation with the domain experts who authored each task, prioritizing checkpoints according to professional standards. We will expand the Benchmark Construction section to detail this procedure, including expert review steps used to validate the assigned weights. revision: yes

  3. Referee: [Empirical Evaluation] Results section: the reported success rates and non-overlapping domain strengths are presented without accompanying statistical tests, confidence intervals, or controls for judge calibration drift, so the quantitative support for the expert-gap conclusion remains incomplete.

    Authors: We accept that the current presentation lacks necessary statistical support. The revised Results section will include 95% confidence intervals for success rates and mean scores, appropriate statistical tests for domain comparisons, and explicit controls for judge calibration drift such as fixed few-shot sets and periodic consistency checks. revision: yes
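The 95% confidence intervals promised in this response could be obtained with a simple percentile bootstrap; a sketch over invented per-task scores (real runs would resample all 1,346 tasks):

```python
import random

def bootstrap_ci(scores, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-task rubric scores."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        sum(rng.choice(scores) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]

# Invented per-task scores on [0, 1] for one model
scores = [0.62, 0.48, 0.71, 0.55, 0.40, 0.66, 0.53, 0.59, 0.47, 0.50]
lo, hi = bootstrap_ci(scores)
print(f"mean={sum(scores) / len(scores):.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

Fixing the seed and the few-shot exemplar sets, as the authors propose, is what makes such intervals comparable across judge runs.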

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper's performance claims are produced by applying the introduced ShotJudge (few-shot calibrated LLM judge) to 1,346 tasks and rubrics sourced from over 1,000 independent domain-expert submissions. These inputs pre-exist the evaluated model outputs and are not derived from them. No equations, fitted parameters, or self-citations are shown to reduce the reported success rates or domain divergences back to the same data by construction. The evaluation pipeline is externally grounded and self-contained against the expert-sourced benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the premise that expert-submitted tasks and weighted rubrics faithfully represent professional expertise and that ShotJudge produces human-aligned scores without introducing new biases.

axioms (2)
  • domain assumption Expert submissions from elite institutions and practitioners ensure ecological validity and superior task quality
    Invoked to justify the benchmark's fidelity over prior generalist tasks.
  • domain assumption Rubrics with 15-40 weighted checkpoints provide a reliable, professional-grade scoring standard
    Central to the evaluation protocol but not independently validated in the abstract.
invented entities (1)
  • ShotJudge no independent evidence
    purpose: LLM-based judge calibrated with expert few-shot exemplars to reduce self-rewarding biases
    New evaluation paradigm introduced to enable scalable assessment aligned with human experts.

pith-pipeline@v0.9.0 · 5715 in / 1424 out tokens · 49813 ms · 2026-05-14T23:18:39.401843+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages

  1. [1]

    MMLU-Pro: A more robust and challenging multi-task language understanding benchmark

    Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, et al. MMLU-Pro: A more robust and challenging multi-task language understanding benchmark. In Advances in Neural Information Processing Systems, volume 37, 2024

  2. [2]

    David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, and Samuel R. Bowman. GPQA: A graduate-level Google-proof Q&A benchmark. In Proceedings of the First Conference on Language Modeling, 2024

  3. [3]

    Humanity’s last exam. Nature, 2026

    Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, et al. Humanity’s last exam. Nature, 2026

  4. [4]

    FrontierMath: A benchmark for evaluating advanced mathematical reasoning in AI. arXiv preprint arXiv:2411.04872, 2024

    Elliot Glazer, Ege Erdil, Tamay Besiroglu, Diego Chicharro, Evan Chen, Alex Gunning, Caroline Falkman Olsson, Jean-Stanislas Denain, Anson Ho, Emily de Oliveira Santos, et al. FrontierMath: A benchmark for evaluating advanced mathematical reasoning in AI. arXiv preprint arXiv:2411.04872, 2024

  5. [5]

    GAIA: a benchmark for general AI assistants

    Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, and Thomas Scialom. GAIA: a benchmark for general AI assistants. In The Twelfth International Conference on Learning Representations, 2024

  6. [6]

    BrowseComp: A simple yet challenging benchmark for browsing agents. arXiv preprint arXiv:2501.12959, 2025

    Jason Wei, Mia Cho, Aidan Cummings, Karina Guo, Shixiang Shane Hu, Simon Kang, Heidy Khlaaf, Neal Miao, Oam Neyman, Noa Rubin, et al. BrowseComp: A simple yet challenging benchmark for browsing agents. arXiv preprint arXiv:2501.12959, 2025

  7. [7]

    What disease does this patient have? A large-scale open domain question answering dataset from medical exams. Applied Sciences, 11(14):6421, 2021

    Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang, and Peter Szolovits. What disease does this patient have? A large-scale open domain question answering dataset from medical exams. Applied Sciences, 11(14):6421, 2021

  8. [8]

    PubMedQA: A dataset for biomedical research question answering

    Qiao Jin, Bhuwan Dhingra, Zhengping Liu, William Cohen, and Xinghua Lu. PubMedQA: A dataset for biomedical research question answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2567–2577, 2019

  9. [9]

    SciBench: Evaluating college-level scientific problem-solving abilities of large language models

    Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, Yizhou Sun, and Wei Wang. SciBench: Evaluating college-level scientific problem-solving abilities of large language models. In Proceedings of the Forty-First International Conference on Machine Learning, 2024

  10. [10]

    LegalBench: A collaboratively built benchmark for measuring legal reasoning in large language models

    Neel Guha, Julian Nyarko, Daniel E. Ho, Christopher Ré, Adam Chilton, Aditya Narayana, Alex Chohlas-Wood, Austin Peters, Brandon Waldon, Daniel N. Rockmore, et al. LegalBench: A collaboratively built benchmark for measuring legal reasoning in large language models. In Advances in Neural Information Processing Systems, volume 36, pages 59201–59242. Curran As...

  11. [11]

    FinBen: A holistic financial benchmark for large language models

    Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xia, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, Yongfu Dai, Duanyu Feng, et al. FinBen: A holistic financial benchmark for large language models. In Advances in Neural Information Processing Systems, volume 37, 2024

  12. [12]

    AgentBench: Evaluating LLMs as agents

    Xiao Liu, Hao Yu, Hanchen Zhang, Yifan Xu, Xuanyu Lei, Hanyu Lai, Yu Gu, Hangliang Ding, Kaiwen Men, Kejuan Yang, et al. AgentBench: Evaluating LLMs as agents. In The Twelfth International Conference on Learning Representations, 2024

  13. [13]

    DeepResearch Bench: A comprehensive benchmark for deep research agents

    Mingxuan Du, Benfeng Xu, Chiwei Zhu, Licheng Zhang, Xiaorui Wang, and Zhendong Mao. DeepResearch Bench: A comprehensive benchmark for deep research agents. In International Conference on Learning Representations, 2026

  14. [14]

    DEER: A comprehensive and reliable benchmark for deep research agents on expert-level research tasks. arXiv preprint arXiv:2512.17776, 2025

    Yifan Zhang, Yifan Chen, Haoyang Liu, Zhicheng Fang, et al. DEER: A comprehensive and reliable benchmark for deep research agents on expert-level research tasks. arXiv preprint arXiv:2512.17776, 2025

  15. [15]

    AlpacaEval: An Automatic Evaluator for Instruction-following Models. https://github.com/tatsu-lab/alpaca_eval, 2023

    Xuechen Li, Tianyi Zhang, Yann Dubois, et al. AlpacaEval: An Automatic Evaluator for Instruction-following Models. https://github.com/tatsu-lab/alpaca_eval, 2023

  16. [16]

    Judging LLM-as-a-judge with MT-Bench and chatbot arena

    Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. Judging LLM-as-a-judge with MT-Bench and chatbot arena. In Advances in Neural Information Processing Systems, volume 36, 2023

  17. [17]

    Arena-Hard Auto: Evaluating LLMs with Human-in-the-loop Standards. https://lmsys.org/blog/2024-04-19-arena-hard/, 2024

    Tianle Li, Wei-Lin Chiang, Evan Frick, et al. Arena-Hard Auto: Evaluating LLMs with Human-in-the-loop Standards. https://lmsys.org/blog/2024-04-19-arena-hard/, 2024

  18. [18]

    Bill Yuchen Lin, Yuntian Deng, Khyathi Raghavi Chandu, Faeze Brahman, Abhilasha Srivastava, Abhilasha Ravichander, Yejin Choi, and Noah A. Smith. WildBench: Benchmarking LLMs with challenging tasks from real users in the wild. In Advances in Neural Information Processing Systems, volume 37, 2024

  19. [19]

    Prometheus: Inducing fine-grained evaluation capability in language models

    Seungone Bi, Christoph Koch, Guoyin Chen, Trevor Agarwal, et al. Prometheus: Inducing fine-grained evaluation capability in language models. In The Twelfth International Conference on Learning Representations, 2024

  20. [20]

    JudgeBench: A benchmark for evaluating LLM-based judges

    Sijun Zhou, Nuo Huang, Ran Xu, Renren Yan, Muning Li, Yanghua Xiao, and Libby Hemphill. JudgeBench: A benchmark for evaluating LLM-based judges. In International Conference on Learning Representations, 2025

  21. [21]

    Holistic evaluation of language models. Transactions on Machine Learning Research, 2023

    Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, et al. Holistic evaluation of language models. Transactions on Machine Learning Research, 2023

  22. [22]

    Chatbot arena: An open platform for evaluating LLMs by human preference

    Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, and Ion Stoica. Chatbot arena: An open platform for evaluating LLMs by human preference. In Proceedings of the Forty-First International Conference on Machine Learning, 2024

  23. [23]

    Measuring short-form factuality in large language models, 2024

    Jason Wei, Karina Nguyen, Hyung Won Chung, Yunxin Joy Jiao, Spencer Papay, Amelia Glaese, John Schulman, and William Fedus. Measuring short-form factuality in large language models, 2024

  24. [24]

    RubricEval: A scalable human-LLM evaluation framework for open-ended tasks

    Meera Bhat, Xi Fang, and Jacob Steinhardt. RubricEval: A scalable human-LLM evaluation framework for open-ended tasks. Stanford CS224N Final Reports, 2024

  25. [25]

    Large language models are not fair evaluators, 2023

    Peiyi Wang, Lei Li, Liang Chen, Zefan Cai, Dawei Zhu, Binghuai Lin, Yunbo Cao, Qi Liu, Tianyu Liu, and Zhifang Sui. Large language models are not fair evaluators, 2023

  26. [26]

    Lockheed Martin Corporation (NYSE: LMT) – The world’s largest defense contractor, renowned for its dominant position in aviation (e.g., the F-35 fighter jet)

  27. [27]

    Book-to-Bill Ratio

    Northrop Grumman Corporation (NYSE: NOC) – A defense giant with formidable technological moats in aerospace, mission systems, and strategic weapons (e.g., the B-21 stealth bomber). Core Analysis Requirements: 1. Future Revenue Visibility Comparison: • The “Book-to-Bill Ratio” serves as the lifeline for gauging a defense company’s future revenue growth pote...

  28. [28]

    Based on legal theory, determine whether the agreement signed between Guangxi Company and China Construction Bank on June 1, 2023, constitutes a loan relationship or a factoring contract relationship?

  29. [29]

    What is the validity of the Factoring Financing Agreement signed between a financing guarantee company in Yunnan Province and Guangxi Company?

  30. [30]

    accounts receivable transfer

    If the financing guarantee company in Yunnan Province asserts rights against Guangxi Company or Yunnan Company based on the factoring contract relationship under the Factoring Financing Agreement, how should the liability be allocated between Yunnan Company and Guangxi Company? A.2.2 Scoring Rubric Table A.2 Scoring rubric for the Law example task. Criteri...

  31. [31]

    Based on the above conditions and comprehensive analysis of electrophoresis patterns in Figures 1 and 2 of the second image, what is the empty vector rate among the 20 selected single colonies?

  32. [32]

    What is the minimum average size of the foreign fragment in the 20 plasmids shown in the figure?

  33. [33]

    vector+foreign fragment,

    Why do Figures 1 and 2 in the second figure show differences when the same plasmid is digested with the same restriction enzymeMboI and its restriction enzymeSau3AI? Attachments: Figure A.1The recognition sequence and cleavage sites (indicated by triangles) of the restriction enzymesMboI and Sau3AI. Both enzymes recognize the same5′-GATC-3′ nucleotide seq...