TableSeq: Unified Generation of Structure, Content, and Layout

Laziz Hamdi, Amine Tamasna, Pascal Boisson, Thierry Paquet

Pith reviewed 2026-05-10 08:53 UTC · model grok-4.3

classification 💻 cs.CV

keywords: TableSeq, cell, competitive, content, recognition, structure, table, unified

The pith

TableSeq unifies table structure recognition, content extraction, and cell localization by generating an interleaved autoregressive sequence of HTML tags, cell text, and discretized coordinate tokens from an input image.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The system takes a table image and passes it through a compact encoder: a lightweight high-resolution convolutional network (FCN-H16), a minimal structure-prior head, and a single-layer transformer encoder. A single decoder then predicts tokens one after another: HTML tags that describe rows and columns, the text inside each cell, and discretized (binned) coordinate tokens that say where each cell sits in the image. Because everything comes out in one sequence, the model learns to keep structure, text, and positions consistent without extra post-processing steps or a separate OCR engine. The authors evaluate on PubTabNet, FinTabNet, SciTSR, and PubTables-1M and report structure and content scores (TEDS/S-TEDS, precision/recall/F1, GriTS) that are competitive with, or better than, prior work.
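To make the interleaved format concrete, here is a minimal Python sketch of how one table could be flattened into a single token stream. The token spellings (<x_k>, <y_k>), placing the four coordinate tokens right after each opening td tag, character-level cell text, the Cell class, and NUM_BINS = 1000 are all illustrative assumptions, not TableSeq's documented vocabulary.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_BINS = 1000  # hypothetical number of coordinate bins per axis


@dataclass
class Cell:
    text: str
    bbox: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    colspan: int = 1
    rowspan: int = 1


def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to a bin index in [0, num_bins - 1]."""
    idx = int(value / size * num_bins)
    return min(max(idx, 0), num_bins - 1)


def open_tag(cell: Cell) -> str:
    """Opening td tag, with span attributes folded into one token (assumption)."""
    attrs = ""
    if cell.rowspan > 1:
        attrs += f' rowspan="{cell.rowspan}"'
    if cell.colspan > 1:
        attrs += f' colspan="{cell.colspan}"'
    return f"<td{attrs}>"


def serialize(rows: List[List[Cell]], width: float, height: float) -> List[str]:
    """Interleave HTML structure, cell geometry, and cell text in one sequence."""
    tokens = ["<table>"]
    for row in rows:
        tokens.append("<tr>")
        for cell in row:
            x1, y1, x2, y2 = cell.bbox
            tokens.append(open_tag(cell))
            # Geometry first, so the text that follows is tied to a location.
            tokens += [f"<x_{quantize(x1, width)}>", f"<y_{quantize(y1, height)}>",
                       f"<x_{quantize(x2, width)}>", f"<y_{quantize(y2, height)}>"]
            tokens += list(cell.text)  # one token per character (assumption)
            tokens.append("</td>")
        tokens.append("</tr>")
    tokens.append("</table>")
    return tokens


if __name__ == "__main__":
    rows = [[Cell("Year", (0, 0, 50, 20)), Cell("F1", (50, 0, 100, 20))]]
    print(serialize(rows, width=100, height=20))
```

With this ordering, each cell's geometry would sit right next to its structural tag, so the decoder could condition its text predictions on the location it has just emitted; putting coordinates after the text is an equally plausible design.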

Core claim

TableSeq reaches 95.23 TEDS / 96.83 S-TEDS on PubTabNet, 97.45 TEDS / 98.69 S-TEDS on FinTabNet, and 99.79 / 99.54 / 99.66 precision / recall / F1 on SciTSR under the CAR protocol while using a compact architecture without external OCR or auxiliary decoders.
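The SciTSR figures come from the CAR protocol, which scores cell adjacency relations (pairs of horizontally or vertically neighbouring non-empty cells) and reports precision, recall, and F1 over the predicted and ground-truth relation sets. The Python sketch below is a simplified illustration only: it assumes a grid without spanning cells and identifies cells by their text, whereas the official SciTSR evaluation handles spans and duplicate contents.

```python
from typing import List, Set, Tuple

Relation = Tuple[str, str, str]  # (from_text, to_text, "h" or "v")


def adjacency_relations(grid: List[List[str]]) -> Set[Relation]:
    """Adjacency between non-empty cells of a span-free grid (simplified)."""
    rels: Set[Relation] = set()
    n_rows = len(grid)
    n_cols = max((len(r) for r in grid), default=0)

    def get(r: int, c: int) -> str:
        return grid[r][c] if c < len(grid[r]) else ""

    for r in range(n_rows):
        for c in range(n_cols):
            a = get(r, c)
            if not a:
                continue
            # Nearest non-empty cell to the right in the same row.
            for c2 in range(c + 1, n_cols):
                if get(r, c2):
                    rels.add((a, get(r, c2), "h"))
                    break
            # Nearest non-empty cell below in the same column.
            for r2 in range(r + 1, n_rows):
                if get(r2, c):
                    rels.add((a, get(r2, c), "v"))
                    break
    return rels


def car_scores(pred: Set[Relation], gt: Set[Relation]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over adjacency relation sets."""
    tp = len(pred & gt)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Swapping two cells in one row of a 2x2 table, for example, breaks three of its four relations, so all three scores drop to 0.25; this is why CAR is sensitive to local structure errors even when most cell texts are right.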

Load-bearing premise

That a single autoregressive decoder can reliably produce correctly interleaved HTML structure, accurate cell text, and sufficiently precise discretized coordinates without external OCR, auxiliary heads, or complex post-processing.
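This premise can be checked mechanically at decode time: a generated stream is only usable if tags, coordinates, and text arrive in the expected order. Below is a minimal Python sketch of a parser that recovers cells and rejects malformed interleavings. It assumes a hypothetical layout in which each opening td tag is followed by exactly four coordinate tokens <x_k> <y_k> <x_k> <y_k>, then single-character text tokens, then </td>; TableSeq's actual vocabulary may differ.

```python
import re
from typing import List, Tuple

COORD = re.compile(r"<([xy])_(\d+)>$")  # hypothetical coordinate token format


def is_tag(tok: str) -> bool:
    return len(tok) > 1 and tok.startswith("<") and tok.endswith(">")


def parse_cells(tokens: List[str]) -> List[Tuple[int, str, Tuple[int, ...]]]:
    """Return (row_index, text, (x1, y1, x2, y2) bins) per cell; raise ValueError if malformed."""
    cells = []
    row, in_row, i = -1, False, 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("<table>", "</table>"):
            i += 1
        elif tok == "<tr>":
            if in_row:
                raise ValueError(f"nested <tr> at position {i}")
            in_row, row, i = True, row + 1, i + 1
        elif tok == "</tr>":
            if not in_row:
                raise ValueError(f"unmatched </tr> at position {i}")
            in_row, i = False, i + 1
        elif tok.startswith("<td"):
            if not in_row:
                raise ValueError(f"<td> outside a row at position {i}")
            coords = []
            for axis in "xyxy":  # expected order x1, y1, x2, y2 (assumption)
                i += 1
                m = COORD.match(tokens[i]) if i < len(tokens) else None
                if m is None or m.group(1) != axis:
                    raise ValueError(f"expected <{axis}_k> at position {i}")
                coords.append(int(m.group(2)))
            i += 1
            text = []
            while i < len(tokens) and tokens[i] != "</td>":
                if is_tag(tokens[i]):
                    raise ValueError(f"unexpected tag {tokens[i]!r} inside a cell")
                text.append(tokens[i])
                i += 1
            if i == len(tokens):
                raise ValueError("unterminated cell")
            cells.append((row, "".join(text), tuple(coords)))
            i += 1
        else:
            raise ValueError(f"unexpected token {tok!r} at position {i}")
    if in_row:
        raise ValueError("unterminated row")
    return cells
```

Counting how often such a parser rejects model outputs on a test set would be one direct way to quantify the "reliably produce correctly interleaved" part of the premise.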

Original abstract

We present TableSeq, an image-only, end-to-end framework for joint table structure recognition, content recognition, and cell localization. The model formulates these tasks as a single sequence-generation problem: one decoder produces an interleaved stream of HTML tags, cell text, and discretized coordinate tokens, thereby aligning logical structure, textual content, and cell geometry within a unified autoregressive sequence. This design avoids external OCR, auxiliary decoders, and complex multi-stage post-processing. TableSeq combines a lightweight high-resolution FCN-H16 encoder with a minimal structure-prior head and a single-layer transformer encoder, yielding a compact architecture that remains effective on challenging layouts. Across standard benchmarks, TableSeq achieves competitive or state-of-the-art results while preserving architectural simplicity. It reaches 95.23 TEDS / 96.83 S-TEDS on PubTabNet, 97.45 TEDS / 98.69 S-TEDS on FinTabNet, and 99.79 / 99.54 / 99.66 precision / recall / F1 on SciTSR under the CAR protocol, while remaining competitive on PubTables-1M under GriTS. Beyond TSR/TCR, the same sequence interface generalizes to index-based table querying without task-specific heads, achieving the best IRDR score and competitive ICDR/ICR performance. We also study multi-token prediction for faster blockwise decoding and show that it reduces inference latency with only limited accuracy degradation. Overall, TableSeq provides a practical and reproducible single-stream baseline for unified table recognition, and the source code will be made publicly available at https://github.com/hamdilaziz/TableSeq.
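The multi-token prediction result trades a little accuracy for latency by emitting several tokens per decoder pass. Here is a minimal Python sketch of greedy blockwise decoding that counts forward passes. predict_block is a hypothetical stand-in for a decoder with k output heads, and since the abstract does not say whether TableSeq verifies drafted tokens, this sketch simply accepts every token in a block.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: given the current prefix and a block size k, return
# the next (up to) k tokens produced by one decoder forward pass.
PredictBlock = Callable[[List[str], int], List[str]]


def blockwise_decode(predict_block: PredictBlock, k: int, eos: str = "</table>",
                     max_len: int = 10_000) -> Tuple[List[str], int]:
    """Greedy decoding that appends up to k tokens per forward pass."""
    seq: List[str] = []
    passes = 0
    while len(seq) < max_len:
        block = predict_block(seq, k)
        passes += 1
        if not block:  # model produced nothing; stop instead of looping forever
            break
        for tok in block[: max_len - len(seq)]:
            seq.append(tok)
            if tok == eos:
                return seq, passes
    return seq, passes


def make_oracle(target: List[str]) -> PredictBlock:
    """Toy 'model' that always returns the true next k tokens of target."""
    def predict_block(prefix: List[str], k: int) -> List[str]:
        return target[len(prefix): len(prefix) + k]
    return predict_block


if __name__ == "__main__":
    target = ["<table>"] + ["t"] * 20 + ["</table>"]
    for k in (1, 2, 4):
        _, passes = blockwise_decode(make_oracle(target), k)
        print(k, passes)
```

For a sequence of n tokens the number of passes falls from n to ceil(n / k). Because the oracle never errs, the sketch only shows the latency side; in a real model, tokens predicted further ahead are usually less accurate, which is where the reported accuracy degradation would come from.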

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; coordinate discretization and the choice of single-layer transformer are design decisions whose impact is not quantified here.
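The precision cost of coordinate discretization can at least be bounded with simple arithmetic: with B bins over an axis of S pixels and reconstruction at the bin centre, the round-trip error is at most S / (2B), e.g. half a pixel for a 1000-pixel axis with 1000 bins. A short Python check of that bound follows; B and the bin-centre convention are assumptions, since the abstract does not give TableSeq's bin count.

```python
def quantize(v: float, size: float, num_bins: int) -> int:
    """Bin index of coordinate v on an axis of length size (clamped to range)."""
    return min(max(int(v / size * num_bins), 0), num_bins - 1)


def dequantize(idx: int, size: float, num_bins: int) -> float:
    """Reconstruct a coordinate at the centre of bin idx."""
    return (idx + 0.5) * size / num_bins


def max_roundtrip_error(size: float, num_bins: int, samples: int = 10_000) -> float:
    """Worst observed |dequantize(quantize(v)) - v| over evenly spaced v in [0, size]."""
    worst = 0.0
    for i in range(samples + 1):
        v = size * i / samples
        err = abs(dequantize(quantize(v, size, num_bins), size, num_bins) - v)
        worst = max(worst, err)
    return worst


if __name__ == "__main__":
    for bins in (250, 500, 1000):
        print(bins, max_roundtrip_error(1000.0, bins), 1000.0 / (2 * bins))
```

The bound scales linearly with image size and inversely with bin count, so the impact of the chosen bin count on localization metrics depends on the input resolution.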

pith-pipeline@v0.9.0 · 5612 in / 1096 out tokens · 39796 ms · 2026-05-10T08:53:05.297928+00:00
