TableSeq: Unified Generation of Structure, Content, and Layout
Laziz Hamdi et al.
Pith reviewed 2026-05-10 08:53 UTC · model grok-4.3
The pith
TableSeq unifies table structure recognition, content extraction, and cell localization by generating an interleaved autoregressive sequence of HTML tags, cell text, and discretized coordinate tokens from an input image.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TableSeq reaches 95.23 TEDS / 96.83 S-TEDS on PubTabNet, 97.45 TEDS / 98.69 S-TEDS on FinTabNet, and 99.79 / 99.54 / 99.66 precision / recall / F1 on SciTSR under the CAR protocol while using a compact architecture without external OCR or auxiliary decoders.
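The SciTSR F1 figure can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. A minimal check in Python follows; the function name `f1_score` is ours, not the paper's.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for SciTSR under the CAR protocol.
scitsr_f1 = f1_score(99.79, 99.54)
print(round(scitsr_f1, 2))
```

The result rounds to the reported 99.66, so the three SciTSR numbers are mutually consistent.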
Load-bearing premise
That a single autoregressive decoder can reliably produce correctly interleaved HTML structure, accurate cell text, and sufficiently precise discretized coordinates without external OCR, auxiliary heads, or complex post-processing.
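One way to make this premise testable is a structural check on the decoder output. The sketch below is illustrative only: the token spellings (`<td>`, `<x_12>`, `<y_34>`), the x, y, x, y ordering inside a cell, and the function name `is_well_formed` are assumptions, not the paper's specification.

```python
import re
from typing import List

# Hypothetical coordinate-token spelling: <x_123> or <y_45>.
COORD = re.compile(r"<([xy])_(\d+)>")


def is_well_formed(tokens: List[str]) -> bool:
    """True if tags are balanced and every <td> carries x, y, x, y coordinate tokens."""
    stack: List[str] = []
    axes = ""  # coordinate axes seen so far in the current cell
    for tok in tokens:
        match = COORD.fullmatch(tok)
        if match:
            if not stack or stack[-1] != "td":
                return False  # coordinate token outside a cell
            axes += match.group(1)
        elif tok.startswith("</") and tok.endswith(">"):
            name = tok[2:-1]
            if not stack or stack.pop() != name:
                return False  # closing tag does not match the open tag
            if name == "td" and axes != "xyxy":
                return False  # cell lacks a complete bounding box
        elif tok.startswith("<") and tok.endswith(">"):
            name = tok[1:-1]
            stack.append(name)
            if name == "td":
                axes = ""
        # Any other token is treated as cell text.
    return not stack


good = "<table> <tr> <td> <x_0> <y_0> <x_500> <y_500> A </td> </tr> </table>".split()
missing_coord = "<table> <tr> <td> <x_0> <y_0> <x_500> A </td> </tr> </table>".split()
unbalanced = "<table> <tr> <td> <x_0> <y_0> <x_1> <y_1> A </tr> </table>".split()
print(is_well_formed(good), is_well_formed(missing_coord), is_well_formed(unbalanced))
```

A check of this kind measures how often a single stream breaks its own grammar, which is the failure mode the premise rules out.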
Original abstract
We present TableSeq, an image-only, end-to-end framework for joint table structure recognition, content recognition, and cell localization. The model formulates these tasks as a single sequence-generation problem: one decoder produces an interleaved stream of HTML tags, cell text, and discretized coordinate tokens, thereby aligning logical structure, textual content, and cell geometry within a unified autoregressive sequence. This design avoids external OCR, auxiliary decoders, and complex multi-stage post-processing. TableSeq combines a lightweight high-resolution FCN-H16 encoder with a minimal structure-prior head and a single-layer transformer encoder, yielding a compact architecture that remains effective on challenging layouts. Across standard benchmarks, TableSeq achieves competitive or state-of-the-art results while preserving architectural simplicity. It reaches 95.23 TEDS / 96.83 S-TEDS on PubTabNet, 97.45 TEDS / 98.69 S-TEDS on FinTabNet, and 99.79 / 99.54 / 99.66 precision / recall / F1 on SciTSR under the CAR protocol, while remaining competitive on PubTables-1M under GriTS. Beyond TSR/TCR, the same sequence interface generalizes to index-based table querying without task-specific heads, achieving the best IRDR score and competitive ICDR/ICR performance. We also study multi-token prediction for faster blockwise decoding and show that it reduces inference latency with only limited accuracy degradation. Overall, TableSeq provides a practical and reproducible single-stream baseline for unified table recognition, and the source code will be made publicly available at https://github.com/hamdilaziz/TableSeq.
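To make the "interleaved stream" concrete, here is a minimal sketch of how a target sequence of that kind could be assembled from cell annotations. Everything specific in it is an assumption for illustration: the bin count `NUM_BINS = 1000`, the `<x_i>` / `<y_j>` token spelling, placing coordinates before the text, and character-level text tokens. The paper's actual vocabulary and ordering may differ.

```python
from typing import List, Tuple

NUM_BINS = 1000  # assumed number of coordinate bins per axis

# A cell is (text, (x0, y0, x1, y1)) with the box in pixels.
Cell = Tuple[str, Tuple[float, float, float, float]]


def quantize(value: float, size: float, num_bins: int = NUM_BINS) -> int:
    """Map a pixel coordinate in [0, size] to an integer bin in [0, num_bins - 1]."""
    ratio = min(max(value / size, 0.0), 1.0)
    return min(int(ratio * num_bins), num_bins - 1)


def build_sequence(rows: List[List[Cell]], width: float, height: float) -> List[str]:
    """Interleave HTML tags, coordinate tokens, and text tokens in reading order."""
    tokens = ["<table>"]
    for row in rows:
        tokens.append("<tr>")
        for text, (x0, y0, x1, y1) in row:
            tokens.append("<td>")
            tokens += [
                f"<x_{quantize(x0, width)}>", f"<y_{quantize(y0, height)}>",
                f"<x_{quantize(x1, width)}>", f"<y_{quantize(y1, height)}>",
            ]
            tokens += list(text)  # character-level text tokens (assumption)
            tokens.append("</td>")
        tokens.append("</tr>")
    tokens.append("</table>")
    return tokens


rows = [[("A", (0, 0, 50, 20)), ("B", (50, 0, 100, 20))]]
seq = build_sequence(rows, width=100, height=40)
print(" ".join(seq))
```

Because structure, geometry, and text share one vocabulary, a single softmax over tokens is enough at each step, which is what lets the design drop auxiliary box heads and external OCR.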
Reference graph
Works this paper leans on
[1] Xu Zhong, Elaheh ShafieiBavani, Antonio Jimeno-Yepes. Image-based table recognition: data, model, and evaluation. ECCV, 564–580, 2020.
[2] Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, Sheraz Ahmed. DeepDeSRT: deep learning for detection and structure recognition of tables in document images. ICDAR, 2017.
[3] Sachin Raja, Ajoy Mondal, C. V. Jawahar. Table structure recognition using top-down and bottom-up cues. ECCV, 2020.
[4] Zhenrong Zhang, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Huihui Zhu, Baocai Yin, Bing Yin, Cong Liu. SEMv2: table separation line detection based on conditional convolution. CoRR, abs/2303.04384, 2023.
[5] Zhenrong Zhang, Jianshu Zhang, Jun Du, Fengren Wang. Split, embed and merge: an accurate table structure recognizer. Pattern Recognit., 126:108565, 2022.
[6] Shubham Singh Paliwal, D. Vishwanath, Rohit Rahul, Monika Sharma, Lovekesh Vig. TableNet: deep learning model for end-to-end table detection and tabular data extraction from scanned document images. ICDAR, 128–133, 2019.
[7] Liang Qiao, Zaisheng Li, Zhanzhan Cheng, Peng Zhang, Shiliang Pu, Yi Niu, Wenqi Ren, Wenming Tan, Fei Wu. LGPMA: complicated table structure recognition with local and global pyramid mask alignment. ICDAR, 99–114, 2021.
[8] Brandon Smock, Rohith Pesala, Robin Abraham. Aligning benchmark datasets for table structure recognition. ICDAR, 371–386, 2023.
[9] Darshan Adiga, Shabir Ahmad Bhat, Muzaffar Bashir Shah, Viveka Vyeth. Table structure recognition based on cell relationship, a bottom-up approach. RANLP, 1–8, 2019.
[10] Arushi Jain, Shubham Paliwal, Monika Sharma, Lovekesh Vig. TSR-DSAW: table structure recognition via deep spatial association of words. CoRR, abs/2203.06873, 2022.
[11] Chris Tensmeyer, Vlad I. Morariu, Brian Price, Scott Cohen, Tony Martinez. Deep splitting and merging for table structure decomposition. ICDAR, 114–121, 2019.
[12] Sachin Raja, Ajoy Mondal, C. V. Jawahar. Visual understanding of complex table structures from document images. WACV, 2299–2308, 2022.
[13] Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanxuan Yin, Xian-Ling Mao. Complicated table structure recognition. CoRR, abs/1908.04729, 2019.
[14] Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, Kavita Sultanpure. CascadeTabNet: an approach for end-to-end table detection and structure recognition from image-based documents. CVPR Workshops, 2020.
[15] Johan Fernandes, Bin Xiao, Murat Simsek, Burak Kantarci, Shahzad Khan, Ala Abu Alkheir. TableStrRec: framework for table structure recognition in data sheet images. Int. J. Document Anal. Recognit., 27(2):127–145, 2024.
[16] Brandon Smock, Rohith Pesala, Robin Abraham. PubTables-1M: towards comprehensive table extraction from unstructured documents. CVPR, 2022.
[17] Zengyuan Guo, Yuechen Yu, Pengyuan Lv, Chengquan Zhang, Haojie Li, Zhihui Wang, Kun Yao, Jingtuo Liu, Jingdong Wang. TRUST: an accurate and end-to-end table structure recognizer using splitting-based transformers. CoRR, abs/2208.14687, 2022.
[18] Wenyuan Xue, Baosheng Yu, Wen Wang, Dacheng Tao, Qingyong Li. TGRNet: a table graph reconstruction network for table structure recognition. ICCV, 2021.
[19] Hao Liu, Xin Li, Bing Liu, Deqiang Jiang, Yinsong Liu, Bo Ren. Neural collaborative graph machines for table structure recognition. CVPR, 4533–4542, 2022.
[20] Denis Coquenet, Clément Chatelain, Thierry Paquet. End-to-end handwritten paragraph text recognition using a vertical attention network. IEEE Trans. Pattern Anal. Mach. Intell., 45(1):508–524, 2023.
[21] Maksym Lysak, Ahmed Nassar, Nikolaos Livathinos, Christoph Auer, Peter Staar. Optimized table tokenization for table structure recognition. ICDAR, 37–50, 2023.
[22] Leiyuan Chen, Chengsong Huang, Xiaoqing Zheng, Jinshu Lin, Xuan-Jing Huang. TableVLM: multi-modal pre-training for table structure recognition. ACL, 2437–2449, 2023.
[23] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer. BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. CoRR, abs/1910.13461, 2019.
[24] Hongyi Wang, Yang Xue, Jiaxin Zhang, Lianwen Jin. Scene table structure recognition with segmentation collaboration and alignment. Pattern Recognit. Lett., 165:146–153, 2023.
[25] Weihong Lin, Zheng Sun, Chixiang Ma, Mingze Li, Jiawei Wang, Lei Sun, Qiang Huo. TSRFormer: table structure recognition with transformers. ACM Multimedia, 6473–6482, 2022.
[26] Wenyuan Xue, Qingyong Li, Dacheng Tao. Res2TIM: reconstruct syntactic structures from table images. ICDAR, 749–755, 2019.
[27] Qiyu Hou, Jun Wang. TABLET: table structure recognition using encoder-only transformers. CoRR, abs/2506.07015, 2025.
[28] Jiaquan Ye, Xianbiao Qi, Yelin He, Yihao Chen, Dengyi Gu, Peng Gao, Rong Xiao. PingAn-VCGroup’s solution for ICDAR 2021 competition on scientific literature parsing task B: table recognition to HTML. CoRR, abs/2105.01848, 2021.
[29] Xinyi Zheng, Douglas Burdick, Lucian Popa, Xu Zhong, Nancy Xin Ru Wang. Global Table Extractor (GTE): a framework for joint table identification and cell structure recognition using visual context. WACV, 697–706, 2021.
[30] Zhenrong Zhang, Shuhang Liu, Pengfei Hu, Jiefeng Ma, Jun Du, Jianshu Zhang, Yu Hu. UniTabNet: bridging vision and language models for enhanced table structure recognition. Findings of ACL: EMNLP, 6131–6143, 2024.
[31] Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang. OMNIPARSER: a unified framework for text spotting, key information extraction and table recognition. CVPR, 15641–15653, 2024.
[32] Geewook Kim, Teakgyu Hong, Moonbin Yim, JeongYeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. OCR-free document understanding transformer. ECCV, 498–517, 2022.
[33] Ahmed Nassar, Nikolaos Livathinos, Maksym Lysak, Peter Staar. TableFormer: table structure understanding with transformers. CVPR, 4614–4623, 2022.
[34] Nam Tuan Ly, Atsuhiro Takasu. An end-to-end multi-task learning model for image-based table recognition. VISAPP, 626–634, 2023.
[35] Yongshuai Huang, Ning Lu, Dapeng Chen, Yibo Li, Zecheng Xie, Shenggao Zhu, Liangcai Gao, Wei Peng. Improving table structure recognition with visual-alignment sequential coordinate modeling. CVPR, 11134–11143, 2023.
[36] Jiani Huang, Haihua Chen, Fengchang Yu, Wei Lu. From detection to application: recent advances in understanding scientific tables and figures. ACM Comput. Surv., 56(10):1–39, 2024.
[37] Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Rozière, David Lopez-Paz, Gabriel Synnaeve. Better & faster large language models via multi-token prediction. ICML, 15706–15734, 2024.
[38] Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, Zhoujun Li. TableBank: table benchmark for image-based table detection and recognition. LREC, 1918–1925, 2020.
[39] Jianlin Su, Murtadha H. M. Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, Yunfeng Liu. RoFormer: enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024.
[40] Chenglong Yu, Weibin Li, Wei Li, Zixuan Zhu, Ruochen Liu, Biao Hou, Licheng Jiao. A survey for table recognition based on deep learning. Neurocomputing, 600:128154, 2024.
[41] Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang, Yongpan Wang, Gui-Song Xia. Parsing table structures in the wild. ICCV, 944–952, 2021.
[42] Youngmin Baek, Daehyun Nam, Jaeheung Surh, Seung Shin, Seonghyeon Kim. TRACE: table reconstruction aligned to corner and edges. ICDAR, 472–489, 2023.
[43] Kenton Lee, Mandar Joshi, Iulia Raluca Turc, Hexiang Hu, Fangyu Liu, Julian Martin Eisenschlos, Urvashi Khandelwal, Peter Shaw, Ming-Wei Chang, Kristina Toutanova. Pix2Struct: screenshot parsing as pretraining for visual language understanding. ICML, 18893–18912, 2023.
[44] Yitong Zhou, Mingyue Cheng, Qingyang Mao, Qi Liu, Feiyang Xu, Xin Li, Enhong Chen. Enhancing table recognition with vision LLMs: a benchmark and neighbor-guided toolchain reasoner. IJCAI, 2503–2511, 2025.
[45] Hangdi Xing, Feiyu Gao, Rujiao Long, Jiajun Bu, Qi Zheng, Liangcheng Li, Cong Yao, Zhi Yu. LORE: logical location regression network for table structure recognition. AAAI, 37(3):2992–3000, 2023.
[46] Minsoo Khang, Teakgyu Hong. TFLOP: table structure recognition framework with layout pointer mechanism. CoRR, abs/2501.11800, 2025.
[47] Ting Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey E. Hinton. Pix2Seq: a language modeling framework for object detection. ICLR, 2022.
[48] Ting Chen, Saurabh Saxena, Lala Li, Tsung-Yi Lin, David J. Fleet, Geoffrey E. Hinton. A unified sequence interface for vision tasks. NeurIPS, 2022.
[49] Sachin Raja, Ajoy Mondal, C. V. Jawahar. Treading towards privacy-preserving table structure recognition. WACV, 2311–2321, 2025.
[50] Taeho Kil, Seonghyeon Kim, Sukmin Seo, Yoonsik Kim, Daehee Kim. Towards unified scene text spotting based on sequence generation. CVPR, 15223–15232, 2023.
[51] Shangbang Long, Siyang Qin, Yasuhisa Fujii, Alessandro Bissacco, Michalis Raptis. Hierarchical text spotter for joint text spotting and layout analysis. WACV, 892–902, 2024.
[52] Stefan Elfwing, Eiji Uchibe, Kenji Doya. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw., 107:3–11, 2018.
[53] Yanghao Li, Hanzi Mao, Ross Girshick, Kaiming He. Exploring plain vision transformer backbones for object detection. ECCV, 280–296, 2022.
[54] Haoran Wei, Chenglong Liu, Jinyue Chen, Jia Wang, Lingyu Kong, Yanming Xu, Zheng Ge, Liang Zhao, Jianjian Sun, Yuang Peng, Chunrui Han, Xiangyu Zhang. General OCR theory: towards OCR-2.0 via a unified end-to-end model. CoRR, abs/2409.01704, 2024.
[55] Haoran Wei, Lingyu Kong, Jinyue Chen, Liang Zhao, Zheng Ge, Jinrong Yang, Jianjian Sun, Chunrui Han, Xiangyu Zhang. Vary: scaling up the vision vocabulary for large vision-language model. ECCV, 408–424, 2024.
A Synthetic Data
High-level procedure. We denote the page image by I, the HTML by H (with per-cell coordinate tags <x_i>, <y_j>), and the scale factor fro...