FastTab: A Fast Table Recognizer with a Tiny Recursive Module and 1D Transformers

Amine Tamasna; Laziz Hamdi; Pascal Boisson; Thierry Paquet

arxiv: 2605.22422 · v1 · pith:E5LEVN7Unew · submitted 2026-05-21 · 💻 cs.CV · cs.AI

Pith reviewed 2026-05-22 07:09 UTC · model grok-4.3

keywords: table structure recognition · document image analysis · transformer encoder · recursive module · grid prediction · separator localization · low-latency inference

The pith

FastTab recovers table structure by predicting row and column counts plus separators directly with a tiny recursive module and 1D transformers instead of sequential HTML generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents FastTab as a grid-centric approach to table structure recognition that first builds an explicit grid from predicted counts, headers, and separators, then fills in cell spans using region-of-interest features. This design replaces the common autoregressive decoding of HTML tags with parallel axial processing along rows and columns plus a lightweight recursive module for global coherence. On four standard benchmarks the method reaches competitive accuracy at substantially lower inference latency. It also demonstrates robustness when text is pixel-level anonymized and extends naturally to documents with curved separators.

Core claim

FastTab constructs table structure by predicting the number of rows and columns, identifying header rows, and localizing horizontal and vertical separators with a combination of a Tiny Recursive Module for global reasoning and axial 1D Transformer encoders that model long-range dependencies separately along each axis; cell spans are then inferred from ROI-aligned features extracted after the grid is assembled.
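
The grid-assembly step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `build_grid`, `merge_span`, and `counts_consistent` are hypothetical, separators are assumed to be straight axis-aligned lines given as interior positions, and the span values are assumed to come from a separate predictor.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def counts_consistent(n_rows: int, n_cols: int,
                      row_seps: List[float], col_seps: List[float]) -> bool:
    """Check that predicted counts agree with the number of interior separators."""
    return len(row_seps) + 1 == n_rows and len(col_seps) + 1 == n_cols


def build_grid(table_box: Box, row_seps: List[float],
               col_seps: List[float]) -> List[List[Box]]:
    """Cut the table box into cells at the interior separator positions."""
    x0, y0, x1, y1 = table_box
    ys = [y0] + sorted(row_seps) + [y1]
    xs = [x0] + sorted(col_seps) + [x1]
    return [
        [(xs[c], ys[r], xs[c + 1], ys[r + 1]) for c in range(len(xs) - 1)]
        for r in range(len(ys) - 1)
    ]


def merge_span(grid: List[List[Box]], r: int, c: int,
               rowspan: int, colspan: int) -> Box:
    """Bounding box of a cell anchored at (r, c) covering rowspan x colspan grid cells."""
    top_left = grid[r][c]
    bottom_right = grid[r + rowspan - 1][c + colspan - 1]
    return (top_left[0], top_left[1], bottom_right[2], bottom_right[3])
```

The consistency check matters because, in a pipeline of this shape, the counts and the separators are separate predictions; a disagreement between them is a cheap signal that the grid may be unreliable before any span inference runs.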

What carries the argument

Tiny Recursive Module (TRM) combined with axial 1D Transformer encoders that process rows and columns independently to predict grid elements before span inference.
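
As a rough illustration of what "recursive" means here, the sketch below applies one shared update function repeatedly to a latent vector. It is a toy under stated assumptions: `refine` and `toy_step` are hypothetical names, and `toy_step` is a hand-written stand-in for the small learned network, not the paper's module.

```python
from typing import Callable, List

Vector = List[float]


def refine(z: Vector, x: Vector,
           step: Callable[[Vector, Vector], Vector],
           iterations: int) -> Vector:
    """Apply the same update function `iterations` times to the latent z."""
    for _ in range(iterations):
        z = step(z, x)
    return z


def toy_step(z: Vector, x: Vector) -> Vector:
    """Hypothetical update: move z halfway toward x (stand-in for a tiny network)."""
    return [zi + 0.5 * (xi - zi) for zi, xi in zip(z, x)]
```

Because the same function is reused at every step, extra iterations add compute but no parameters. In this toy each step halves the remaining gap, so later iterations change the latent less and less; that is only an analogy for diminishing returns and does not reproduce any number from the paper.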

If this is right

Table structure can be recovered at inference speeds suitable for real-time document pipelines without sacrificing benchmark accuracy.
Pixel-level anonymization of cell content does not degrade separator localization when the model relies on grid-level predictions.
The same grid-construction pipeline extends to camera-captured documents containing curved separators.
Avoiding sequential tag generation removes a major source of compounding errors in complex multi-span tables.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The grid-first strategy could transfer to other layout-heavy tasks such as form field extraction or chart parsing where sequential decoding currently dominates.
Replacing the Tiny Recursive Module with a larger but still lightweight recurrent unit might improve global coherence on very large tables without losing the speed benefit.
Because the model separates row-wise and column-wise reasoning, it may scale more gracefully to tables with hundreds of rows than fully 2D attention approaches.
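
The scaling argument in the last point can be made concrete by counting attention pairs on an h × w feature map. The sketch below is back-of-the-envelope arithmetic only; the three function names are hypothetical, and which axial variant (if either) matches FastTab's encoders is an assumption not confirmed by the text above.

```python
def full_2d_pairs(h: int, w: int) -> int:
    """Token pairs when each of the h*w positions attends to every position."""
    n = h * w
    return n * n


def axial_pairs(h: int, w: int) -> int:
    """Token pairs when attention runs within each row, then within each column."""
    return h * (w * w) + w * (h * h)


def pooled_1d_pairs(h: int, w: int) -> int:
    """Token pairs when each axis is first pooled into one 1D sequence."""
    return h * h + w * w
```

For a 4 × 5 map these give 400, 180, and 41 pairs respectively, and the gap widens as the map grows, which is the intuition behind expecting axis-separated reasoning to scale better than full 2D attention.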

Load-bearing premise

Directly predicting row and column counts, headers, and separators followed by ROI-based span inference is enough to recover accurate structure without the error buildup that occurs in autoregressive HTML decoding.

What would settle it

A head-to-head comparison on PubTables-1M or SciTSR showing that FastTab's structure accuracy falls below strong autoregressive baselines on tables with frequent spanning cells while its latency advantage remains.

Figures

Figures reproduced from arXiv: 2605.22422 by Amine Tamasna, Laziz Hamdi, Pascal Boisson, Thierry Paquet.

**Figure 2.** Architecture details of the Span Head and Tiny Recursive Module.

**Figure 3.** PubTabNet example (PMC4608158_004_00.png) and its anonymised variants.

Excerpt captured with the caption (Section 4, Experiments, Datasets): We evaluate FastTab for table structure recognition (TSR) on four established benchmarks spanning scientific and financial domains. PubTabNet [1] contains table images from scientific articles paired with HTML annotations, and is widely used to assess image-based structure recovery under diverse visual styles. …

**Figure 4.** Representative predictions. Red/blue: row/column separators; dashed …

**Figure 5.** FinTabNet rotation robustness. Mean S-TEDS (%) ± one standard deviation under random in-plane rotations sampled uniformly from [−α, α]. The dashed line indicates the no-rotation baseline (98.2).

Excerpt captured with the caption (begins mid-sentence): … predict a bounded residual offset for each boundary and each sample point; a smoothness penalty encourages locally regular curves, while a non-crossing regularizer discourages violations of the ordering constraint …

**Figure 6.** Effect of TRM iterations T on PubTabNet. (a) Accuracy improves rapidly for small T and saturates after T ≈ 6. (b) Throughput decreases with T, revealing an elbow around T = 6 that provides a favorable accuracy–speed trade-off.

Excerpt captured with the caption (Section 5, Ablation study; 5.1 Impact of the Tiny Recursive Module (TRM)): FastTab refines the global latent representation z through a small number of recursive refinement steps. To quantify th…
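
The text captured with Figure 5 mentions a bounded residual offset, a smoothness penalty, and a non-crossing regularizer for curved separators, but gives no formulas. The sketch below shows one plausible form of each; the tanh bound, the squared second difference, the hinge, and all function names are assumptions made for illustration rather than the paper's definitions.

```python
import math
from typing import List

Curve = List[float]  # separator position at each sample point along the axis


def bounded_offset(raw: float, max_offset: float) -> float:
    """Squash an unbounded prediction into [-max_offset, max_offset]."""
    return max_offset * math.tanh(raw)


def smoothness_penalty(curve: Curve) -> float:
    """Sum of squared second differences; zero for any straight line."""
    return sum(
        (curve[i - 1] - 2 * curve[i] + curve[i + 1]) ** 2
        for i in range(1, len(curve) - 1)
    )


def non_crossing_penalty(curves: List[Curve], margin: float = 0.0) -> float:
    """Hinge penalty whenever a separator fails to stay `margin` below the previous one."""
    total = 0.0
    for upper, lower in zip(curves, curves[1:]):
        for a, b in zip(upper, lower):
            total += max(0.0, margin - (b - a))
    return total
```

Here `curves` is assumed to be ordered top to bottom (or left to right), so the non-crossing term penalizes exactly the sample points where that order is violated.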

Original abstract

Table structure recognition (TSR) requires both table-level coherence (row/column counts, headers, spanning cells) and precise separator localization. We introduce FastTab, a grid-centric TSR model that avoids autoregressive HTML decoding by combining (i) a lightweight Tiny Recursive Module (TRM) for global reasoning and (ii) axial 1D Transformer encoders that capture long-range dependencies along rows and columns. The model predicts row/column counts, header rows, and separators to construct a grid, then infers rowspan/colspan using ROI-aligned cell features. Across four benchmarks (PubTabNet, FinTabNet, PubTables-1M, and SciTSR), FastTab achieves competitive structure recovery performance while operating at low-latency inference. We further study robustness under pixel-level anonymisation and show an extension to curved separators for camera-captured documents. The source code will be made publicly available at https://github.com/hamdilaziz/FastTab .

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

FastTab swaps autoregressive HTML decoding for direct grid prediction via a tiny recursive module and axial 1D transformers, hitting competitive TSR numbers at lower latency on the usual benchmarks.

FastTab's core idea is to first predict row and column counts, headers, and separators to build a grid, then infer cell spans from ROI features. This avoids the sequential mistakes that pile up in autoregressive HTML generation, and the results look competitive on PubTabNet, FinTabNet, PubTables-1M, and SciTSR while running faster at inference time. They also check robustness on anonymized pixels and add a curved-separator extension for photos, which is a reasonable practical touch. Code release helps too.

Referee Report

2 major / 0 minor

Summary. The paper introduces FastTab, a grid-centric table structure recognition (TSR) model that combines a Tiny Recursive Module (TRM) for global reasoning with axial 1D Transformer encoders. The model predicts row/column counts, header rows, and separators to construct a grid, then infers rowspan and colspan from ROI-aligned cell features. It reports competitive structure recovery on PubTabNet, FinTabNet, PubTables-1M, and SciTSR while emphasizing low-latency inference, robustness under pixel-level anonymization, and an extension to curved separators; the authors promise to release the source code.

Significance. If the performance claims are substantiated with detailed metrics and ablations, the work could offer a practical advance for high-throughput document pipelines by sidestepping autoregressive HTML decoding and its potential error accumulation. The lightweight TRM plus 1D axial design and the public-code commitment are clear strengths; the robustness and curved-separator experiments add real-world relevance.

major comments (2)

Abstract: the central performance claim is stated only qualitatively ('competitive structure recovery performance') with no TEDS, F1, or latency numbers, error bars, or ablation tables. Without these data it is impossible to judge whether the grid-first pipeline actually delivers the promised reduction in error accumulation relative to autoregressive baselines.
Method description (grid construction step): the approach instantiates the table grid from predicted row/column counts and separators before ROI span inference. An off-by-one error in any global count would misalign all subsequent cell features. The manuscript provides no quantitative breakdown of count/separator prediction accuracy versus final structure metrics, leaving open whether competitive results hold only for simple rectangular tables or also for complex spanning cases.
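
The misalignment concern in the second comment can be illustrated with a small count of how many ground-truth rows land in the wrong predicted row when a single separator is missed. This is a hypothetical sketch (`row_index` and `misaligned_rows` are illustrative names, and separators are assumed to be horizontal lines at known positions); it says nothing about how often FastTab actually makes such errors.

```python
from bisect import bisect_right
from typing import List


def row_index(y: float, row_seps: List[float]) -> int:
    """Index of the grid row that contains vertical position y."""
    return bisect_right(sorted(row_seps), y)


def misaligned_rows(true_seps: List[float], pred_seps: List[float],
                    table_top: float, table_bottom: float) -> int:
    """Count ground-truth rows whose centre falls in a different predicted row index."""
    edges = [table_top] + sorted(true_seps) + [table_bottom]
    centres = [(a + b) / 2 for a, b in zip(edges, edges[1:])]
    return sum(1 for i, y in enumerate(centres) if row_index(y, pred_seps) != i)
```

With true separators at 10, 20, and 30 in a table spanning 0 to 40, dropping the first separator shifts three of the four rows, while dropping the last shifts only one. The position of the error therefore matters as much as the count error itself, which is the kind of breakdown the referee is asking for.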

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Referee: Abstract: the central performance claim is stated only qualitatively ('competitive structure recovery performance') with no TEDS, F1, or latency numbers, error bars, or ablation tables. Without these data it is impossible to judge whether the grid-first pipeline actually delivers the promised reduction in error accumulation relative to autoregressive baselines.

Authors: We agree that the abstract would be strengthened by quantitative claims. In the revised version we will include specific TEDS, F1, and latency figures (with comparisons to the main baselines) so that readers can directly evaluate the claimed advantages of the grid-centric pipeline. revision: yes
Referee: Method description (grid construction step): the approach instantiates the table grid from predicted row/column counts and separators before ROI span inference. An off-by-one error in any global count would misalign all subsequent cell features. The manuscript provides no quantitative breakdown of count/separator prediction accuracy versus final structure metrics, leaving open whether competitive results hold only for simple rectangular tables or also for complex spanning cases.

Authors: We acknowledge the importance of quantifying error propagation from the global count and separator predictions. While the final structure metrics already reflect performance on benchmarks containing many spanning cells, we will add a dedicated analysis (new table and discussion) that reports count/separator prediction accuracy separately and correlates it with the end-to-end TEDS and F1 scores. This will explicitly address both simple and complex table cases. revision: yes
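
One way the promised analysis could be computed is sketched below: exact-match accuracy of the predicted (rows, columns) pair, and a Pearson correlation between a per-table indicator and a per-table structure score. The function names are hypothetical, the choice of Pearson correlation is an assumption, and no values from the paper are used.

```python
import math
from typing import Sequence, Tuple


def count_accuracy(pred: Sequence[Tuple[int, int]],
                   true: Sequence[Tuple[int, int]]) -> float:
    """Fraction of tables whose predicted (rows, cols) pair matches exactly."""
    if len(pred) != len(true) or not true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == t for p, t in zip(pred, true)) / len(true)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Feeding `pearson` a 0/1 "counts correct" indicator per table alongside the per-table TEDS would give a first-order view of how much of the end-to-end score depends on getting the global counts right.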

Circularity Check

0 steps flagged

No circularity: architecture and evaluation are self-contained against external benchmarks

The paper presents FastTab as a new grid-centric architecture that predicts row/column counts, headers and separators to instantiate a grid, followed by ROI-based span inference. All performance claims rest on evaluation against four independent external benchmarks (PubTabNet, FinTabNet, PubTables-1M, SciTSR) rather than any internal derivation that reduces to fitted parameters or self-citations. No equations, uniqueness theorems, or ansatzes are shown to be smuggled in via prior self-work; the method is described as a direct architectural choice evaluated empirically. This is the standard non-circular pattern for an ML vision paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that the four named benchmarks are representative and that the grid-centric prediction pipeline captures table structure without systematic failure modes on real documents.

axioms (1)

Domain assumption: Standard benchmarks (PubTabNet, FinTabNet, PubTables-1M, SciTSR) provide reliable ground truth for table structure.
Invoked implicitly by reporting competitive performance on these datasets.

invented entities (1)

Tiny Recursive Module (TRM) · no independent evidence
Purpose: Lightweight global reasoning over table layout.
New module introduced in the paper for the grid-centric approach.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean (and Cost/FunctionalEquation.lean) reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

FastTab combines (i) a Tiny Recursive Module (TRM) for global reasoning and (ii) axial 1D Transformer encoders that capture long-range dependencies along rows and columns. The model predicts row/column counts, header rows, and separators to construct a grid, then infers rowspan/colspan using ROI-aligned cell features.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 3 internal anchors

[1] Zhong, X., ShafieiBavani, E., Jimeno Yepes, A.: Image-based table recognition: Data, model, and evaluation. In: Proc. ECCV (2020)
[2] Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: DeepDeSRT: Deep learning for detection and structure recognition of tables in document images. In: Proc. ICDAR (2017)
[3] Zhang, Z., Hu, P., Ma, J., Du, J., Zhang, J., Zhu, H., Yin, B., Yin, B., Liu, C.: SEMv2: Table separation line detection based on conditional convolution. arXiv:2303.04384 (2023)
[4] Zhang, Z., Zhang, J., Du, J., Wang, F.: Split, embed and merge: An accurate table structure recognizer. Pattern Recognition 126, 108565 (2022)
[5] Paliwal, S.S., Vishwanath, D., Rahul, R., Sharma, M., Vig, L.: TableNet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images. In: Proc. ICDAR (2019)
[6] Qiao, L., Li, Z., Cheng, Z., Zhang, P., Pu, S., Niu, Y., Ren, W., Tan, W., Wu, F.: LGPMA: Complicated table structure recognition with local and global pyramid mask alignment. In: Proc. ICDAR (2021)
[7] Tensmeyer, C., Morariu, V.I., Price, B., Cohen, S., Martinez, T.: Deep splitting and merging for table structure decomposition. In: Proc. ICDAR (2019)
[8] Smock, B., Pesala, R., Abraham, R.: PubTables-1M: Towards comprehensive table extraction from unstructured documents. In: Proc. CVPR (2022)
[9] Guo, Z., Yu, Y., Lv, P., Zhang, C., Li, H., Wang, Z., Yao, K., Liu, J., Wang, J.: TRUST: An accurate and end-to-end table structure recognizer using splitting-based transformers. arXiv:2208.14687 (2022)
[10] Xue, W., Yu, B., Wang, W., Tao, D., Li, Q.: TGRNet: A table graph reconstruction network for table structure recognition. In: Proc. ICCV (2021)
[11] Liu, H., Li, X., Liu, B., Jiang, D., Liu, Y., Ren, B.: Neural collaborative graph machines for table structure recognition. In: Proc. CVPR (2022)
[12] Chi, Z., Huang, H., Xu, H.-D., Yu, H., Yin, W., Mao, X.-L.: Complicated table structure recognition. arXiv:1908.04729 (2019)
[13] Lysak, M., Nassar, A., Livathinos, N., Auer, C., Staar, P.: Optimized table tokenization for table structure recognition. In: Proc. ICDAR (2023)
[14] Chen, L., Huang, C., Zheng, X., Lin, J., Huang, X.-J.: TableVLM: Multi-modal pre-training for table structure recognition. In: Proc. ACL (2023)
[15] Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., Zettlemoyer, L.: BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv:1910.13461 (2019)
[16] Wang, H., Xue, Y., Zhang, J., Jin, L.: Scene table structure recognition with segmentation collaboration and alignment. Pattern Recognition Letters 165 (2023)
[17] Lin, W., Sun, Z., Ma, C., Li, M., Wang, J., Sun, L., Huo, Q.: TSRFormer: Table structure recognition with transformers. In: Proc. ACM Multimedia (2022)
[18] Xue, W., Li, Q., Tao, D.: Res2TIM: Reconstruct syntactic structures from table images. In: Proc. ICDAR (2019)
[19] Hou, Q., Wang, J.: TABLET: Table structure recognition using encoder-only transformers. arXiv:2506.07015 (2025)
[20] Qi, X., He, Y., Chen, Y., Gu, D., Gao, P., Ye, J., Xiao, R.: Ping An VCGROUP’s solution for ICDAR 2021 competition on scientific literature parsing task B: Table recognition to HTML. In: Proc. ICDAR (2021)
[21] Zheng, X., Burdick, D., Popa, L., Zhong, X., Wang, N.X.R.: Global table extractor (GTE): A framework for joint table identification and cell structure recognition using visual context. In: Proc. WACV (2021)
[22] Zhang, Z., Liu, S., Hu, P., Ma, J., Du, J., Zhang, J., Hu, Y.: UniTabNet: Bridging vision and language models for enhanced table structure recognition. arXiv:2409.13148 (2024)
[23] Wan, J., Song, S., Yu, W., Liu, Y., Cheng, W., Huang, F., Bai, X., Yao, C., Yang, Z.: OmniParser: A unified framework for text spotting, key information extraction and table recognition. In: Proc. CVPR (2024)
[24] Kim, G., Hong, T., Yim, M., Nam, J., Park, J., Yim, J., Hwang, W., Yun, S., Han, D., Park, S.: OCR-free document understanding transformer. In: Proc. ECCV (2022)
[25] Nassar, A., Livathinos, N., Lysak, M., Staar, P.: TableFormer: Table structure understanding with transformers. In: Proc. CVPR (2022)
[26] Jolicoeur-Martineau, A.: Less is more: Recursive reasoning with tiny networks. arXiv:2510.04871 (2025)
[27] Ly, N.T., Takasu, A.: An end-to-end multi-task learning model for image-based table recognition. In: Proc. VISAPP (2023)
[28] Huang, Y., Lu, N., Chen, D., Li, Y., Xie, Z., Zhu, S., Gao, L., Peng, W.: Improving table structure recognition with visual-alignment sequential coordinate modeling. In: Proc. CVPR (2023)
[29] Huang, J., Chen, H., Yu, F., Lu, W.: From detection to application: Recent advances in understanding scientific tables and figures. ACM Computing Surveys 56(10) (2024)
[30] Gloeckle, F., Youbi Idrissi, B., Rozière, B., Lopez-Paz, D., Synnaeve, G.: Better & faster large language models via multi-token prediction. arXiv:2404.19737 (2024)
[31] Long, R., Xing, H., Yang, Z., Zheng, Q., Yu, Z., Huang, F., Yao, C.: LORE++: Logical location regression network for table structure recognition with pre-training. Pattern Recognition 157, 110816 (2025)
[32] Yu, C., Li, W., Li, W., Zhu, Z., Liu, R., Hou, B., Jiao, L.: A survey for table recognition based on deep learning. Neurocomputing (2024)
[33] Liu, H., Li, X., Liu, B., Jiang, D., Liu, Y., Ren, B.: Show, read and reason: Table structure recognition with flexible context aggregator. In: Proc. ACM Multimedia (2021)
[34] Nguyen, N.Q., Pham, X.P., Tran, T.-A.: RTSR: A real-time table structure recognition approach. In: Proc. ECAI (2024)
[35] Xing, H., Gao, F., Long, R., Bu, J., Zheng, Q., Li, L., Yao, C., Yu, Z.: LORE: Logical location regression network for table structure recognition. In: Proc. AAAI (2023)
[36] Lyu, P., Ma, W., Wang, H., Yu, Y., Zhang, C., Yao, K., Xue, Y., Wang, J.: GridFormer: Towards accurate table structure recognition via grid prediction. In: Proc. ACM Multimedia (2023)
[37] Khang, M., Hong, T.: TFLOP: Table structure recognition framework with layout pointer mechanism. arXiv:2501.11800 (2025)
