arxiv: 2605.22422 · v1 · pith:E5LEVN7Unew · submitted 2026-05-21 · 💻 cs.CV · cs.AI

FastTab: A Fast Table Recognizer with a Tiny Recursive Module and 1D Transformers

Pith reviewed 2026-05-22 07:09 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords table structure recognition · document image analysis · transformer encoder · recursive module · grid prediction · separator localization · low-latency inference

The pith

FastTab recovers table structure by predicting row and column counts plus separators directly with a tiny recursive module and 1D transformers instead of sequential HTML generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents FastTab as a grid-centric approach to table structure recognition that first builds an explicit grid from predicted counts, headers, and separators, then fills in cell spans using region-of-interest features. This design replaces the common autoregressive decoding of HTML tags with parallel axial processing along rows and columns plus a lightweight recursive module for global coherence. On four standard benchmarks the method reaches competitive accuracy at substantially lower inference latency. It also demonstrates robustness when text is pixel-level anonymized and extends naturally to documents with curved separators.

Core claim

FastTab constructs table structure by predicting the number of rows and columns, identifying header rows, and localizing horizontal and vertical separators with a combination of a Tiny Recursive Module for global reasoning and axial 1D Transformer encoders that model long-range dependencies separately along each axis; cell spans are then inferred from ROI-aligned features extracted after the grid is assembled.
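
The grid-assembly step in this claim can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the paper's implementation: the helper names `build_grid` and `span_box`, the box format `(x0, y0, x1, y1)`, and the use of straight axis-aligned separators are all hypothetical choices made here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed pixel coordinates


def build_grid(table_box: Box, row_seps: List[float], col_seps: List[float]) -> List[List[Box]]:
    """Turn interior separator positions into a rows x cols grid of cell boxes."""
    x0, y0, x1, y1 = table_box
    ys = [y0] + sorted(row_seps) + [y1]  # table edges close the first and last row
    xs = [x0] + sorted(col_seps) + [x1]
    return [
        [(xs[c], ys[r], xs[c + 1], ys[r + 1]) for c in range(len(xs) - 1)]
        for r in range(len(ys) - 1)
    ]


def span_box(grid: List[List[Box]], r: int, c: int, rowspan: int = 1, colspan: int = 1) -> Box:
    """Bounding box of a cell anchored at (r, c) that covers rowspan x colspan grid cells."""
    x0, y0, _, _ = grid[r][c]
    _, _, x1, y1 = grid[r + rowspan - 1][c + colspan - 1]
    return (x0, y0, x1, y1)


grid = build_grid((0, 0, 300, 90), row_seps=[30, 60], col_seps=[100, 200])
print(len(grid), len(grid[0]))         # 3 3
print(grid[1][2])                      # (200, 30, 300, 60)
print(span_box(grid, 0, 0, colspan=2)) # (0, 0, 200, 30)
```

Two interior separators per axis yield a 3 × 3 grid, which is why the predicted counts and the separator set have to agree before any span is inferred.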

What carries the argument

Tiny Recursive Module (TRM) combined with axial 1D Transformer encoders that process rows and columns independently to predict grid elements before span inference.

If this is right

  • Table structure can be recovered at inference speeds suitable for real-time document pipelines without sacrificing benchmark accuracy.
  • Pixel-level anonymization of cell content does not degrade separator localization when the model relies on grid-level predictions.
  • The same grid-construction pipeline extends to camera-captured documents containing curved separators.
  • Avoiding sequential tag generation removes a major source of compounding errors in complex multi-span tables.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The grid-first strategy could transfer to other layout-heavy tasks such as form field extraction or chart parsing where sequential decoding currently dominates.
  • Replacing the Tiny Recursive Module with a larger but still lightweight recurrent unit might improve global coherence on very large tables without losing the speed benefit.
  • Because the model separates row-wise and column-wise reasoning, it may scale more gracefully to tables with hundreds of rows than fully 2D attention approaches.
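
The scaling remark in the last bullet can be checked with simple arithmetic. The sketch below counts query-key pairs for a feature map of h × w tokens, assuming standard axial attention in which each token attends along its own row and its own column; the paper's 1D encoders may be organized differently, so the function names and the cost model are assumptions made for illustration.

```python
def full_2d_pairs(h: int, w: int) -> int:
    """Query-key pairs when each of the h*w tokens attends to all h*w tokens."""
    n = h * w
    return n * n


def axial_pairs(h: int, w: int) -> int:
    """Pairs when each token attends only along its own row (w keys) and column (h keys)."""
    return h * w * (h + w)


for h, w in [(32, 32), (128, 32), (512, 32)]:
    ratio = full_2d_pairs(h, w) / axial_pairs(h, w)
    print(f"{h}x{w}: full/axial = {ratio:.1f}")
# 32x32: full/axial = 16.0
# 128x32: full/axial = 25.6
# 512x32: full/axial = 30.1
```

Under this cost model the ratio equals h·w / (h + w), so with the width fixed it approaches w as the number of rows grows: the saving is real but bounded by the smaller dimension.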

Load-bearing premise

Directly predicting row and column counts, headers, and separators followed by ROI-based span inference is enough to recover accurate structure without the error buildup that occurs in autoregressive HTML decoding.

What would settle it

A head-to-head comparison on PubTables-1M or SciTSR showing that FastTab's structure accuracy falls below strong autoregressive baselines on tables with frequent spanning cells while its latency advantage remains.

Figures

Figures reproduced from arXiv: 2605.22422 by Amine Tamasna, Laziz Hamdi, Pascal Boisson, Thierry Paquet.

Figure 1: Overview of FastTab. An FCN encoder extracts a 2D feature map, which …
Figure 2: Architecture details of the Span Head and Tiny Recursive Module.
Figure 3: PubTabNet example (PMC4608158_004_00.png) and its anonymised variants.
    Section 4, Experiments (Datasets): We evaluate FastTab for table structure recognition (TSR) on four established benchmarks spanning scientific and financial domains. PubTabNet [1] contains table images from scientific articles paired with HTML annotations, and is widely used to assess image-based structure recovery under diverse visual styles. …
Figure 4: Representative predictions. Red/blue: row/column separators; dashed …
Figure 5: FinTabNet rotation robustness. Mean S-TEDS (%) ± one standard deviation under random in-plane rotations sampled uniformly from [−α, α]. The dashed line indicates the no-rotation baseline (98.2).
    … predict a bounded residual offset for each boundary and each sample point; a smoothness penalty encourages locally regular curves, while a non-crossing regularizer discourages violations of the ordering constraint …
Figure 6: Effect of TRM iterations T on PubTabNet. (a) Accuracy improves rapidly for small T and saturates after T ≈ 6. (b) Throughput decreases with T, revealing an elbow around T = 6 that provides a favorable accuracy–speed trade-off.
    Section 5, Ablation study; 5.1 Impact of the Tiny Recursive Module (TRM): FastTab refines the global latent representation z through a small number of recursive refinement steps. To quantify th…
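
The text captured alongside Figure 5 mentions three ingredients of the curved-separator extension: a bounded residual offset per boundary and sample point, a smoothness penalty, and a non-crossing regularizer. The sketch below shows one plausible form of each. The tanh squashing, the squared second differences, the hinge formulation, and all function names are assumptions chosen for illustration; the excerpt does not specify the paper's exact formulas.

```python
import math
from typing import List


def bounded_offset(raw: float, max_shift: float) -> float:
    """Squash an unconstrained prediction into [-max_shift, max_shift] (assumed tanh form)."""
    return max_shift * math.tanh(raw)


def smoothness_penalty(curve: List[float]) -> float:
    """Sum of squared second differences along one boundary's sample points."""
    return sum(
        (curve[i - 1] - 2 * curve[i] + curve[i + 1]) ** 2
        for i in range(1, len(curve) - 1)
    )


def non_crossing_penalty(upper: List[float], lower: List[float], margin: float = 0.0) -> float:
    """Hinge penalty wherever `upper` is not above `lower` (smaller y) by at least `margin`."""
    return sum(max(0.0, u - l + margin) for u, l in zip(upper, lower))


# Two row boundaries sampled at five x positions; y grows downward as in image coordinates.
raw_a = [0.0, 0.5, 1.0, 0.5, 0.0]
curve_a = [30.0 + bounded_offset(r, 5.0) for r in raw_a]  # gently bent boundary
curve_b = [60.0] * 5                                      # straight boundary below it

print(non_crossing_penalty(curve_a, curve_b))            # 0.0
print(smoothness_penalty(curve_b))                       # 0.0
print(non_crossing_penalty([50.0, 65.0], [60.0, 60.0]))  # 5.0
```

The last call shows the regularizer firing only at the sample point where the upper boundary dips below the lower one, which is the ordering violation the excerpt refers to.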
Abstract

Table structure recognition (TSR) requires both table-level coherence (row/column counts, headers, spanning cells) and precise separator localization. We introduce FastTab, a grid-centric TSR model that avoids autoregressive HTML decoding by combining (i) a lightweight Tiny Recursive Module (TRM) for global reasoning and (ii) axial 1D Transformer encoders that capture long-range dependencies along rows and columns. The model predicts row/column counts, header rows, and separators to construct a grid, then infers rowspan/colspan using ROI-aligned cell features. Across four benchmarks (PubTabNet, FinTabNet, PubTables-1M, and SciTSR), FastTab achieves competitive structure recovery performance while operating at low-latency inference. We further study robustness under pixel-level anonymisation and show an extension to curved separators for camera-captured documents. The source code will be made publicly available at https://github.com/hamdilaziz/FastTab .

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces FastTab, a grid-centric table structure recognition (TSR) model that combines a Tiny Recursive Module (TRM) for global reasoning with axial 1D Transformer encoders. The model predicts row/column counts, header rows, and separators to construct a grid, then infers rowspan and colspan from ROI-aligned cell features. The paper reports competitive structure recovery on PubTabNet, FinTabNet, PubTables-1M, and SciTSR while emphasizing low-latency inference, robustness under pixel-level anonymization, and an extension to curved separators; the authors promise to release the source code.

Significance. If the performance claims are substantiated with detailed metrics and ablations, the work could offer a practical advance for high-throughput document pipelines by sidestepping autoregressive HTML decoding and its potential error accumulation. The lightweight TRM plus 1D axial design and the public-code commitment are clear strengths; the robustness and curved-separator experiments add real-world relevance.

major comments (2)
  1. Abstract: the central performance claim is stated only qualitatively ('competitive structure recovery performance') with no TEDS, F1, or latency numbers, error bars, or ablation tables. Without these data it is impossible to judge whether the grid-first pipeline actually delivers the promised reduction in error accumulation relative to autoregressive baselines.
  2. Method description (grid construction step): the approach instantiates the table grid from predicted row/column counts and separators before ROI span inference. An off-by-one error in any global count would misalign all subsequent cell features. The manuscript provides no quantitative breakdown of count/separator prediction accuracy versus final structure metrics, leaving open whether competitive results hold only for simple rectangular tables or also for complex spanning cases.
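
The misalignment concern in comment 2 can be illustrated with a toy example. It assumes cells are assigned to rows by locating their vertical centers between sorted separator positions; the helper `row_index` and the numbers are hypothetical and are not taken from the manuscript.

```python
from bisect import bisect_right
from typing import List


def row_index(y_center: float, row_seps: List[float]) -> int:
    """Row that a cell center falls into, given sorted interior separator positions."""
    return bisect_right(row_seps, y_center)


cell_centers = [15, 45, 75, 105]  # one cell per true row
true_seps = [30, 60, 90]          # three separators, four rows
pred_seps = [60, 90]              # first separator missed: count is off by one

true_rows = [row_index(y, true_seps) for y in cell_centers]
pred_rows = [row_index(y, pred_seps) for y in cell_centers]
shifted = sum(t != p for t, p in zip(true_rows, pred_rows))

print(true_rows)  # [0, 1, 2, 3]
print(pred_rows)  # [0, 0, 1, 2]
print(shifted)    # 3
```

A single missed separator near the top relabels every row below it, which is why a breakdown of count and separator accuracy against the final structure metrics would be informative.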

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

  1. Referee: Abstract: the central performance claim is stated only qualitatively ('competitive structure recovery performance') with no TEDS, F1, or latency numbers, error bars, or ablation tables. Without these data it is impossible to judge whether the grid-first pipeline actually delivers the promised reduction in error accumulation relative to autoregressive baselines.

    Authors: We agree that the abstract would be strengthened by quantitative claims. In the revised version we will include specific TEDS, F1, and latency figures (with comparisons to the main baselines) so that readers can directly evaluate the claimed advantages of the grid-centric pipeline. revision: yes

  2. Referee: Method description (grid construction step): the approach instantiates the table grid from predicted row/column counts and separators before ROI span inference. An off-by-one error in any global count would misalign all subsequent cell features. The manuscript provides no quantitative breakdown of count/separator prediction accuracy versus final structure metrics, leaving open whether competitive results hold only for simple rectangular tables or also for complex spanning cases.

    Authors: We acknowledge the importance of quantifying error propagation from the global count and separator predictions. While the final structure metrics already reflect performance on benchmarks containing many spanning cells, we will add a dedicated analysis (new table and discussion) that reports count/separator prediction accuracy separately and correlates it with the end-to-end TEDS and F1 scores. This will explicitly address both simple and complex table cases. revision: yes

Circularity Check

0 steps flagged

No circularity: architecture and evaluation are self-contained against external benchmarks

The paper presents FastTab as a new grid-centric architecture that predicts row/column counts, headers and separators to instantiate a grid, followed by ROI-based span inference. All performance claims rest on evaluation against four independent external benchmarks (PubTabNet, FinTabNet, PubTables-1M, SciTSR) rather than any internal derivation that reduces to fitted parameters or self-citations. No equations, uniqueness theorems, or ansatzes are shown to be smuggled in via prior self-work; the method is described as a direct architectural choice evaluated empirically. This is the standard non-circular pattern for an ML vision paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the assumption that the four named benchmarks are representative and that the grid-centric prediction pipeline captures table structure without systematic failure modes on real documents.

axioms (1)
  • Domain assumption: standard benchmarks (PubTabNet, FinTabNet, PubTables-1M, SciTSR) provide reliable ground truth for table structure.
    Invoked implicitly by reporting competitive performance on these datasets.
invented entities (1)
  • Tiny Recursive Module (TRM) · no independent evidence
    purpose: Lightweight global reasoning over table layout
    New module introduced in the paper for the grid-centric approach.

pith-pipeline@v0.9.0 · 5709 in / 1344 out tokens · 33941 ms · 2026-05-22T07:09:09.671847+00:00 · methodology

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 3 internal anchors

  [1] Zhong, X., ShafieiBavani, E., Jimeno Yepes, A.: Image-based table recognition: Data, model, and evaluation. In: Proc. ECCV (2020)
  [2] Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: DeepDeSRT: Deep learning for detection and structure recognition of tables in document images. In: Proc. ICDAR (2017)
  [3] Zhang, Z., Hu, P., Ma, J., Du, J., Zhang, J., Zhu, H., Yin, B., Yin, B., Liu, C.: SEMv2: Table separation line detection based on conditional convolution. arXiv:2303.04384 (2023)
  [4] Zhang, Z., Zhang, J., Du, J., Wang, F.: Split, embed and merge: An accurate table structure recognizer. Pattern Recognition 126, 108565 (2022)
  [5] Paliwal, S.S., Vishwanath, D., Rahul, R., Sharma, M., Vig, L.: TableNet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images. In: Proc. ICDAR (2019)
  [6] Qiao, L., Li, Z., Cheng, Z., Zhang, P., Pu, S., Niu, Y., Ren, W., Tan, W., Wu, F.: LGPMA: Complicated table structure recognition with local and global pyramid mask alignment. In: Proc. ICDAR (2021)
  [7] Tensmeyer, C., Morariu, V.I., Price, B., Cohen, S., Martinez, T.: Deep splitting and merging for table structure decomposition. In: Proc. ICDAR (2019)
  [8] Smock, B., Pesala, R., Abraham, R.: PubTables-1M: Towards comprehensive table extraction from unstructured documents. In: Proc. CVPR (2022)
  [9] Guo, Z., Yu, Y., Lv, P., Zhang, C., Li, H., Wang, Z., Yao, K., Liu, J., Wang, J.: TRUST: An accurate and end-to-end table structure recognizer using splitting-based transformers. arXiv:2208.14687 (2022)
  [10] Xue, W., Yu, B., Wang, W., Tao, D., Li, Q.: TGRNet: A table graph reconstruction network for table structure recognition. In: Proc. ICCV (2021)
  [11] Liu, H., Li, X., Liu, B., Jiang, D., Liu, Y., Ren, B.: Neural collaborative graph machines for table structure recognition. In: Proc. CVPR (2022)
  [12] Chi, Z., Huang, H., Xu, H.-D., Yu, H., Yin, W., Mao, X.-L.: Complicated table structure recognition. arXiv:1908.04729 (2019)
  [13] Lysak, M., Nassar, A., Livathinos, N., Auer, C., Staar, P.: Optimized table tokenization for table structure recognition. In: Proc. ICDAR (2023)
  [14] Chen, L., Huang, C., Zheng, X., Lin, J., Huang, X.-J.: TableVLM: Multi-modal pre-training for table structure recognition. In: Proc. ACL (2023)
  [15] Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., Zettlemoyer, L.: BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv:1910.13461 (2019)
  [16] Wang, H., Xue, Y., Zhang, J., Jin, L.: Scene table structure recognition with segmentation collaboration and alignment. Pattern Recognition Letters 165 (2023)
  [17] Lin, W., Sun, Z., Ma, C., Li, M., Wang, J., Sun, L., Huo, Q.: TSRFormer: Table structure recognition with transformers. In: Proc. ACM Multimedia (2022)
  [18] Xue, W., Li, Q., Tao, D.: Res2TIM: Reconstruct syntactic structures from table images. In: Proc. ICDAR (2019)
  [19] Hou, Q., Wang, J.: TABLET: Table structure recognition using encoder-only transformers. arXiv:2506.07015 (2025)
  [20] Qi, X., He, Y., Chen, Y., Gu, D., Gao, P., Ye, J., Xiao, R.: Ping An VCGROUP’s solution for ICDAR 2021 competition on scientific literature parsing task B: Table recognition to HTML. In: Proc. ICDAR (2021)
  [21] Zheng, X., Burdick, D., Popa, L., Zhong, X., Wang, N.X.R.: Global table extractor (GTE): A framework for joint table identification and cell structure recognition using visual context. In: Proc. WACV (2021)
  [22] Zhang, Z., Liu, S., Hu, P., Ma, J., Du, J., Zhang, J., Hu, Y.: UniTabNet: Bridging vision and language models for enhanced table structure recognition. arXiv:2409.13148 (2024)
  [23] Wan, J., Song, S., Yu, W., Liu, Y., Cheng, W., Huang, F., Bai, X., Yao, C., Yang, Z.: OmniParser: A unified framework for text spotting, key information extraction and table recognition. In: Proc. CVPR (2024)
  [24] Kim, G., Hong, T., Yim, M., Nam, J., Park, J., Yim, J., Hwang, W., Yun, S., Han, D., Park, S.: OCR-free document understanding transformer. In: Proc. ECCV (2022)
  [25] Nassar, A., Livathinos, N., Lysak, M., Staar, P.: TableFormer: Table structure understanding with transformers. In: Proc. CVPR (2022)
  [26] Jolicoeur-Martineau, A.: Less is more: Recursive reasoning with tiny networks. arXiv:2510.04871 (2025)
  [27] Ly, N.T., Takasu, A.: An end-to-end multi-task learning model for image-based table recognition. In: Proc. VISAPP (2023)
  [28] Huang, Y., Lu, N., Chen, D., Li, Y., Xie, Z., Zhu, S., Gao, L., Peng, W.: Improving table structure recognition with visual-alignment sequential coordinate modeling. In: Proc. CVPR (2023)
  [29] Huang, J., Chen, H., Yu, F., Lu, W.: From detection to application: Recent advances in understanding scientific tables and figures. ACM Computing Surveys 56(10) (2024)
  [30] Gloeckle, F., Youbi Idrissi, B., Rozière, B., Lopez-Paz, D., Synnaeve, G.: Better & faster large language models via multi-token prediction. arXiv:2404.19737 (2024)
  [31] Long, R., Xing, H., Yang, Z., Zheng, Q., Yu, Z., Huang, F., Yao, C.: LORE++: Logical location regression network for table structure recognition with pre-training. Pattern Recognition 157, 110816 (2025)
  [32] Yu, C., Li, W., Li, W., Zhu, Z., Liu, R., Hou, B., Jiao, L.: A survey for table recognition based on deep learning. Neurocomputing (2024)
  [33] Liu, H., Li, X., Liu, B., Jiang, D., Liu, Y., Ren, B.: Show, read and reason: Table structure recognition with flexible context aggregator. In: Proc. ACM Multimedia (2021)
  [34] Nguyen, N.Q., Pham, X.P., Tran, T.-A.: RTSR: A real-time table structure recognition approach. In: Proc. ECAI (2024)
  [35] Xing, H., Gao, F., Long, R., Bu, J., Zheng, Q., Li, L., Yao, C., Yu, Z.: LORE: Logical location regression network for table structure recognition. In: Proc. AAAI (2023)
  [36] Lyu, P., Ma, W., Wang, H., Yu, Y., Zhang, C., Yao, K., Xue, Y., Wang, J.: GridFormer: Towards accurate table structure recognition via grid prediction. In: Proc. ACM Multimedia (2023)
  [37] Khang, M., Hong, T.: TFLOP: Table structure recognition framework with layout pointer mechanism. arXiv:2501.11800 (2025)