{"work":{"id":"19ed8c44-202a-48f6-8169-637d5a5f2408","openalex_id":null,"doi":null,"arxiv_id":"2210.17323","raw_key":null,"title":"GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers","authors":null,"authors_text":"Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh","year":2022,"venue":"cs.LG","abstract":"Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart through breakthrough performance across complex language modelling tasks, but also by their extremely high computational and storage costs. Specifically, due to their massive size, even inference for large, highly-accurate GPT models may require multiple performant GPUs, which limits the usability of such models. While there is emerging work on relieving this pressure via model compression, the applicability and performance of existing compression techniques is limited by the scale and complexity of GPT models. In this paper, we address this challenge, and propose GPTQ, a new one-shot weight quantization method based on approximate second-order information, that is both highly-accurate and highly-efficient. Specifically, GPTQ can quantize GPT models with 175 billion parameters in approximately four GPU hours, reducing the bitwidth down to 3 or 4 bits per weight, with negligible accuracy degradation relative to the uncompressed baseline. Our method more than doubles the compression gains relative to previously-proposed one-shot quantization methods, preserving accuracy, allowing us for the first time to execute an 175 billion-parameter model inside a single GPU for generative inference. Moreover, we also show that our method can still provide reasonable accuracy in the extreme quantization regime, in which weights are quantized to 2-bit or even ternary quantization levels. We show experimentally that these improvements can be leveraged for end-to-end inference speedups over FP16, of around 3.25x when using high-end GPUs (NVIDIA A100) and 4.5x when using more cost-effective ones (NVIDIA A6000). The implementation is available at https://github.com/IST-DASLab/gptq.","external_url":"https://arxiv.org/abs/2210.17323","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-07-04T07:59:39.556646+00:00","pith_arxiv_id":"2210.17323","created_at":"2026-05-09T06:20:42.391635+00:00","updated_at":"2026-07-04T07:59:39.556646+00:00","title_quality_ok":true,"display_title":"GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers","render_title":"GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers"},"hub":{"state":{"work_id":"19ed8c44-202a-48f6-8169-637d5a5f2408","tier":"super_hub","tier_reason":"100+ Pith inbound or 10,000+ external citations","pith_inbound_count":220,"external_cited_by_count":null,"distinct_field_count":16,"first_pith_cited_at":"2023-05-22T17:16:38+00:00","last_pith_cited_at":"2026-07-02T17:27:34+00:00","author_build_status":"needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-07-04T09:26:40.921799+00:00","tier_text":"super_hub"},"tier":"super_hub","role_counts":[{"context_role":"background","n":20},{"context_role":"baseline","n":3},{"context_role":"method","n":3},{"context_role":"dataset","n":1},{"context_role":"other","n":1}],"polarity_counts":[{"context_polarity":"background","n":19},{"context_polarity":"baseline","n":3},{"context_polarity":"use_method","n":3},{"context_polarity":"unclear","n":2},{"context_polarity":"use_dataset","n":1}],"runs":{"ask_index":{"job_type":"ask_index","status":"succeeded","result":{"title":"GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers","claims":[{"claim_text":"Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart through breakthrough performance across complex language modelling tasks, but also by their extremely high computational and storage costs. Specifically, due to their massive size, even inference for large, highly-accurate GPT models may require multiple performant GPUs, which limits the usability of such models. While there is emerging work on relieving this pressure via model compression, the applicability and performance of existing compression techniques is limited by the scale and complexity of GPT models","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"To compareDittowith alternative quantization approaches, we evaluate it against Activation- aware Weight Quantization (AWQ) [41], a state-of-the-art method that maps individual weights to low-bit integers guided by activation statistics and has been shown to preserve accuracy at 4-bit precision for code LLMs [1], on Code Llama-7B under the same hardware environment. AWQ and similar approaches, such as GPTQ [20], require a calibration dataset to reduce quantization error. In contrast,Dittoperform","claim_type":"baseline","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"erations for the BF16 baseline, TahQuant, and TACO; degradation (Deg.) indicates the relative loss increase over the baseline. Method Val Loss↓Test Loss↓Val Deg.↓Test Deg.↓ Baseline 2.389899 2.344701 - - TahQuant 2.458742 2.413642 +2.88 % +2.94 % TACO 2.395784 2.351210 +0.25 % +0.28 % Models and Datasets.In Section 5.2, we evaluate compression algorithms on GPT-350M trained on the Pile dataset [ 17], using a learning rate schedule of (3 × 10−4 → 3 × 10−5), with a global batch size of 256 and 10,","claim_type":"dataset","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"(c) Agentic workloads significantly increase memory capacity re- quirements for activations and KV cache. Meanwhile, bandwidth uti- lization increases moderately and exhibits distinct behaviors across the prefill and decode stages. 2 Figure 1: Heterogeneous memory design opportunities for agentic LLM inference. In addition, existing LLM models with agentic capabilities [16, 31, 44] have been scaled to millions of tokens in context window size, which natively demands massive memory capacity (e.g.","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"com/xuyuzhuang11/FINE. \"When you reach the edge, hidden forces take over. \" 1 Introduction Large language models (LLMs) face substantial deployment costs, posing a major bottleneck for real-world adoption. To unlock model capability under constrained budgets, model quantization is widely used, replacing high-bit value representations with low-bit counterparts [14, 23]. Model quantization has now been pushed to an extremely low bit-width, such as 1-bit [ 41, 21] or even sub-1-bit [10]. Under such","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"[14] Qinsi Wang, Jinghan Ke, Masayoshi Tomizuka, Yiran Chen, Kurt Keutzer, and Chenfeng Xu. Dobi-svd: Differentiable svd for llm compression and some new perspectives.arXiv preprint arXiv:2502.02723, 2025. [15] Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin, Bing Li, and Grace Li Zhang. Basis shar- ing: Cross-layer parameter sharing for large language model compression.arXiv preprint arXiv:2410.03765, 2024. [16] Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. Gptq: Accurate post-tra","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling 5.2. Efficient Inference with Low-bit Quantization Model quantization has become a mainstream technique for deploying large foundation models. For Large Language Models (LLMs), early breakthroughs primarily focused on 8-bit integer (INT8) quantization [ 49, 50]. For 4-bit quantization, methods such as GPTQ and AWQ [51, 52], leverage second-order Hessian information and activation- aware scaling to maintain hi","claim_type":"background","confidence":0.85,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (8 contexts).","role_counts":[{"n":8,"context_role":"background"},{"n":2,"context_role":"baseline"},{"n":2,"context_role":"method"},{"n":1,"context_role":"dataset"},{"n":1,"context_role":"other"}]},"error":null,"updated_at":"2026-05-18T13:50:47.952185+00:00"},"author_expand":{"job_type":"author_expand","status":"succeeded","result":{"authors_linked":[{"id":"c5135a0d-e531-418f-8ee6-32b0fd16bc44","orcid":null,"display_name":"Elias Frantar"},{"id":"35f59a05-5488-4467-8cf0-e2f2ba5cddd7","orcid":null,"display_name":"Saleh Ashkboos"},{"id":"b23610da-d5c6-4cbf-bd7f-166427f61cd6","orcid":null,"display_name":"Torsten Hoefler"},{"id":"bef9d14f-c238-48b3-818a-7d438e6bd82c","orcid":null,"display_name":"Dan Alistarh"}]},"error":null,"updated_at":"2026-05-18T13:50:47.946932+00:00"},"context_extract":{"job_type":"context_extract","status":"succeeded","result":{"enqueued_papers":25},"error":null,"updated_at":"2026-05-14T08:38:18.534873+00:00"},"graph_features":{"job_type":"graph_features","status":"succeeded","result":{"co_cited":[{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":17},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":17},{"title":"Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge","work_id":"28ea1282-d657-4c61-a83c-f1249be6d6b1","shared_citers":17},{"title":"AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration","work_id":"ea9d1d72-db24-4cae-8c89-4ecd83dd87c1","shared_citers":15},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":14},{"title":"9 Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher","work_id":"489855f8-bd1c-4c87-a334-f6ab27d6707d","shared_citers":12},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":11},{"title":"LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale","work_id":"98201f98-f4e5-4d1c-9ed7-b795e3c8f76c","shared_citers":11},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":10},{"title":"Measuring Massive Multitask Language Understanding","work_id":"e87ec49a-544b-4ec8-8991-75298c64ff5e","shared_citers":10},{"title":"Spqr: A sparse-quantized representation for near-lossless llm weight compression.arXiv preprint arXiv:2306.03078","work_id":"9f122cd0-f5dd-4105-8cf5-cc97759850e8","shared_citers":10},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":9},{"title":"Quip\\#: Even better llm quantization with hadamard incoherence and lattice codebooks","work_id":"921fbb92-5d63-4c93-bb32-b74fe6f12ff8","shared_citers":8},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":8},{"title":"arXiv preprint arXiv:2306.07629 , year=","work_id":"aebf1644-0d5e-418f-a910-1b207d30e16d","shared_citers":7},{"title":"arXiv preprint arXiv:2308.13137 , year=","work_id":"3aac6f51-8082-4d19-9349-f76711c6de1c","shared_citers":7},{"title":"Fast Transformer Decoding: One Write-Head is All You Need","work_id":"160ea164-b1d4-4adb-8ccb-a4655d8a0bb4","shared_citers":7},{"title":"GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints","work_id":"b73ad5b2-e553-4c71-b0c9-67e67ba7b158","shared_citers":7},{"title":"Mistral 7B","work_id":"eb5e1305-ad11-4875-ad8d-ad8b8f697599","shared_citers":7},{"title":"Pointer Sentinel Mixture Models","work_id":"fef3833e-dc80-42a3-a1e0-ffbfafee6fff","shared_citers":7},{"title":"DeepSeek-V3 Technical Report","work_id":"57d2791d-2219-4c31-a077-afc04b12a75c","shared_citers":6},{"title":"Distilling the Knowledge in a Neural Network","work_id":"d927ab1f-17b8-4002-9d09-c3d55764fbad","shared_citers":6},{"title":"HellaSwag: Can a Machine Really Finish Your Sentence?","work_id":"79f44c0c-96f4-4edb-bc50-a3c9d6b85936","shared_citers":6},{"title":"& Hooker, S","work_id":"b0bace20-bbcd-45d2-99df-9bf26e089b47","shared_citers":6}],"time_series":[{"n":2,"year":2023},{"n":3,"year":2024},{"n":70,"year":2026}],"dependency_candidates":[]},"error":null,"updated_at":"2026-05-14T08:38:14.400618+00:00"},"identity_refresh":{"job_type":"identity_refresh","status":"succeeded","result":{"items":[{"title":"Qwen3 Technical Report","outcome":"unchanged","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","resolver":"local_arxiv","confidence":0.98,"old_work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e"}],"counts":{"fixed":0,"merged":0,"unchanged":1,"quarantined":0,"needs_external_resolution":0},"errors":[],"attempted":1},"error":null,"updated_at":"2026-05-14T08:38:27.032075+00:00"},"role_polarity":{"job_type":"role_polarity","status":"succeeded","result":{"title":"GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers","claims":[{"claim_text":"Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart through breakthrough performance across complex language modelling tasks, but also by their extremely high computational and storage costs. Specifically, due to their massive size, even inference for large, highly-accurate GPT models may require multiple performant GPUs, which limits the usability of such models. While there is emerging work on relieving this pressure via model compression, the applicability and performance of existing compression techniques is limited by the scale and complexity of GPT models","claim_type":"abstract","evidence_strength":"source_metadata"},{"claim_text":"To compareDittowith alternative quantization approaches, we evaluate it against Activation- aware Weight Quantization (AWQ) [41], a state-of-the-art method that maps individual weights to low-bit integers guided by activation statistics and has been shown to preserve accuracy at 4-bit precision for code LLMs [1], on Code Llama-7B under the same hardware environment. AWQ and similar approaches, such as GPTQ [20], require a calibration dataset to reduce quantization error. In contrast,Dittoperform","claim_type":"baseline","confidence":0.95,"evidence_strength":"citation_context"},{"claim_text":"erations for the BF16 baseline, TahQuant, and TACO; degradation (Deg.) indicates the relative loss increase over the baseline. Method Val Loss↓Test Loss↓Val Deg.↓Test Deg.↓ Baseline 2.389899 2.344701 - - TahQuant 2.458742 2.413642 +2.88 % +2.94 % TACO 2.395784 2.351210 +0.25 % +0.28 % Models and Datasets.In Section 5.2, we evaluate compression algorithms on GPT-350M trained on the Pile dataset [ 17], using a learning rate schedule of (3 × 10−4 → 3 × 10−5), with a global batch size of 256 and 10,","claim_type":"dataset","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"(c) Agentic workloads significantly increase memory capacity re- quirements for activations and KV cache. Meanwhile, bandwidth uti- lization increases moderately and exhibits distinct behaviors across the prefill and decode stages. 2 Figure 1: Heterogeneous memory design opportunities for agentic LLM inference. In addition, existing LLM models with agentic capabilities [16, 31, 44] have been scaled to millions of tokens in context window size, which natively demands massive memory capacity (e.g.","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"com/xuyuzhuang11/FINE. \"When you reach the edge, hidden forces take over. \" 1 Introduction Large language models (LLMs) face substantial deployment costs, posing a major bottleneck for real-world adoption. To unlock model capability under constrained budgets, model quantization is widely used, replacing high-bit value representations with low-bit counterparts [14, 23]. Model quantization has now been pushed to an extremely low bit-width, such as 1-bit [ 41, 21] or even sub-1-bit [10]. Under such","claim_type":"method","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"[14] Qinsi Wang, Jinghan Ke, Masayoshi Tomizuka, Yiran Chen, Kurt Keutzer, and Chenfeng Xu. Dobi-svd: Differentiable svd for llm compression and some new perspectives.arXiv preprint arXiv:2502.02723, 2025. [15] Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin, Bing Li, and Grace Li Zhang. Basis shar- ing: Cross-layer parameter sharing for large language model compression.arXiv preprint arXiv:2410.03765, 2024. [16] Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. Gptq: Accurate post-tra","claim_type":"background","confidence":0.9,"evidence_strength":"citation_context"},{"claim_text":"FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling 5.2. Efficient Inference with Low-bit Quantization Model quantization has become a mainstream technique for deploying large foundation models. For Large Language Models (LLMs), early breakthroughs primarily focused on 8-bit integer (INT8) quantization [ 49, 50]. For 4-bit quantization, methods such as GPTQ and AWQ [51, 52], leverage second-order Hessian information and activation- aware scaling to maintain hi","claim_type":"background","confidence":0.85,"evidence_strength":"citation_context"}],"why_cited":"Pith tracks GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers because it crossed a citation-hub threshold. Current citing contexts most often use it as background evidence (8 contexts).","role_counts":[{"n":8,"context_role":"background"},{"n":2,"context_role":"baseline"},{"n":2,"context_role":"method"},{"n":1,"context_role":"dataset"},{"n":1,"context_role":"other"}]},"error":null,"updated_at":"2026-05-18T13:50:47.955864+00:00"},"summary_claims":{"job_type":"summary_claims","status":"succeeded","result":{"title":"GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers","claims":[{"claim_text":"Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart through breakthrough performance across complex language modelling tasks, but also by their extremely high computational and storage costs. Specifically, due to their massive size, even inference for large, highly-accurate GPT models may require multiple performant GPUs, which limits the usability of such models. While there is emerging work on relieving this pressure via model compression, the applicability and performance of existing compression techniques is limited by the scale and complexity of GPT models","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers because it crossed a citation-hub threshold.","role_counts":[]},"error":null,"updated_at":"2026-05-14T08:38:14.343271+00:00"}},"summary":{"title":"GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers","claims":[{"claim_text":"Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart through breakthrough performance across complex language modelling tasks, but also by their extremely high computational and storage costs. Specifically, due to their massive size, even inference for large, highly-accurate GPT models may require multiple performant GPUs, which limits the usability of such models. While there is emerging work on relieving this pressure via model compression, the applicability and performance of existing compression techniques is limited by the scale and complexity of GPT models","claim_type":"abstract","evidence_strength":"source_metadata"}],"why_cited":"Pith tracks GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers because it crossed a citation-hub threshold.","role_counts":[]},"graph":{"co_cited":[{"title":"LLaMA: Open and Efficient Foundation Language Models","work_id":"c018fc23-6f3f-4035-9d02-28a2173b2b9d","shared_citers":17},{"title":"The Llama 3 Herd of Models","work_id":"1549a635-88af-4ac1-acfe-51ae7bb53345","shared_citers":17},{"title":"Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge","work_id":"28ea1282-d657-4c61-a83c-f1249be6d6b1","shared_citers":17},{"title":"AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration","work_id":"ea9d1d72-db24-4cae-8c89-4ecd83dd87c1","shared_citers":15},{"title":"Qwen3 Technical Report","work_id":"25a4e30c-1232-48e7-9925-02fa12ba7c9e","shared_citers":14},{"title":"9 Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher","work_id":"489855f8-bd1c-4c87-a334-f6ab27d6707d","shared_citers":12},{"title":"Llama 2: Open Foundation and Fine-Tuned Chat Models","work_id":"68a5177f-d644-44c1-bd4f-4e5278c22f5d","shared_citers":11},{"title":"LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale","work_id":"98201f98-f4e5-4d1c-9ed7-b795e3c8f76c","shared_citers":11},{"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","shared_citers":10},{"title":"Measuring Massive Multitask Language Understanding","work_id":"e87ec49a-544b-4ec8-8991-75298c64ff5e","shared_citers":10},{"title":"Spqr: A sparse-quantized representation for near-lossless llm weight compression.arXiv preprint arXiv:2306.03078","work_id":"9f122cd0-f5dd-4105-8cf5-cc97759850e8","shared_citers":10},{"title":"Evaluating Large Language Models Trained on Code","work_id":"042493e9-b26f-4b4e-bbde-382072ca9b08","shared_citers":9},{"title":"Quip\\#: Even better llm quantization with hadamard incoherence and lattice codebooks","work_id":"921fbb92-5d63-4c93-bb32-b74fe6f12ff8","shared_citers":8},{"title":"Training Verifiers to Solve Math Word Problems","work_id":"acab1aa8-b4d6-40e0-a3ee-25341701dca2","shared_citers":8},{"title":"arXiv preprint arXiv:2306.07629 , year=","work_id":"aebf1644-0d5e-418f-a910-1b207d30e16d","shared_citers":7},{"title":"arXiv preprint arXiv:2308.13137 , year=","work_id":"3aac6f51-8082-4d19-9349-f76711c6de1c","shared_citers":7},{"title":"Fast Transformer Decoding: One Write-Head is All You Need","work_id":"160ea164-b1d4-4adb-8ccb-a4655d8a0bb4","shared_citers":7},{"title":"GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints","work_id":"b73ad5b2-e553-4c71-b0c9-67e67ba7b158","shared_citers":7},{"title":"Mistral 7B","work_id":"eb5e1305-ad11-4875-ad8d-ad8b8f697599","shared_citers":7},{"title":"Pointer Sentinel Mixture Models","work_id":"fef3833e-dc80-42a3-a1e0-ffbfafee6fff","shared_citers":7},{"title":"DeepSeek-V3 Technical Report","work_id":"57d2791d-2219-4c31-a077-afc04b12a75c","shared_citers":6},{"title":"Distilling the Knowledge in a Neural Network","work_id":"d927ab1f-17b8-4002-9d09-c3d55764fbad","shared_citers":6},{"title":"HellaSwag: Can a Machine Really Finish Your Sentence?","work_id":"79f44c0c-96f4-4edb-bc50-a3c9d6b85936","shared_citers":6},{"title":"& Hooker, S","work_id":"b0bace20-bbcd-45d2-99df-9bf26e089b47","shared_citers":6}],"time_series":[{"n":2,"year":2023},{"n":3,"year":2024},{"n":70,"year":2026}],"dependency_candidates":[]},"authors":[{"id":"bef9d14f-c238-48b3-818a-7d438e6bd82c","orcid":null,"display_name":"Dan Alistarh","source":"manual","import_confidence":0.72},{"id":"c5135a0d-e531-418f-8ee6-32b0fd16bc44","orcid":null,"display_name":"Elias Frantar","source":"manual","import_confidence":0.72},{"id":"35f59a05-5488-4467-8cf0-e2f2ba5cddd7","orcid":null,"display_name":"Saleh Ashkboos","source":"manual","import_confidence":0.72},{"id":"b23610da-d5c6-4cbf-bd7f-166427f61cd6","orcid":null,"display_name":"Torsten Hoefler","source":"manual","import_confidence":0.72}]}}