pith. machine review for the scientific record.

arxiv: 2604.02103 · v2 · submitted 2026-04-02 · 💻 cs.CV · cs.LG

Recognition: 2 theorem links

CASHG: Context-Aware Stylized Online Handwriting Generation

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 21:41 UTC · model grok-4.3

classification 💻 cs.CV cs.LG
keywords online handwriting generation · context-aware synthesis · stylized handwriting · transformer decoder · curriculum learning · connectivity metrics · sentence-level generation · bigram modeling

The pith

CASHG explicitly models character transitions to generate more natural sentence-level stylized handwriting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Online handwriting generation at sentence scale requires maintaining style, continuity, and spacing between characters, which prior methods handle only implicitly through sequence modeling. CASHG addresses this by encoding character identity together with sentence-dependent context memory, then fusing them inside a bigram-aware sliding-window Transformer decoder that stresses local predecessor-current transitions plus gated sentence-level context. Training follows a three-stage curriculum that starts with isolated glyphs and scales to full sentences, improving robustness when transition data is sparse. The result is higher scores on a new Connectivity and Spacing Metrics suite while staying competitive on standard DTW trajectory similarity, with human evaluators confirming the gains. If the approach holds, it makes reusable, style-preserving digital ink more reliable for applications that need continuous, context-sensitive strokes.

Core claim

CASHG is a context-aware generator that obtains character identity and sentence context via a Character Context Encoder, fuses them in a bigram-aware sliding-window Transformer decoder with gated context fusion, and trains through a three-stage curriculum from isolated glyphs to full sentences; this explicit modeling of inter-character connectivity yields improved Connectivity and Spacing Metrics under benchmark-matched protocols while remaining competitive in DTW trajectory similarity, with gains corroborated by human evaluation.

What carries the argument

Bigram-aware sliding-window Transformer decoder that emphasizes local predecessor-current transitions, fused with sentence context from the Character Context Encoder via gated fusion.
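A minimal sketch of the two mechanisms named above, under assumptions about the paper's design (the actual window size, gate parameterization, and tensor shapes are not specified in the review): a sliding-window attention mask that restricts each step to its immediate predecessor, and a learned sigmoid gate that blends local decoder states with sentence-level context.

```python
import numpy as np

def sliding_window_mask(seq_len, window=2):
    """Boolean attention mask: position i may attend only to itself and its
    (window - 1) predecessors. window=2 gives a bigram-style local window."""
    idx = np.arange(seq_len)
    return (idx[None, :] <= idx[:, None]) & (idx[:, None] - idx[None, :] < window)

def gated_fusion(local_h, context_h, w, b):
    """Gated context fusion (illustrative): a sigmoid gate computed from the
    concatenated local and context features blends the two representations."""
    g = 1.0 / (1.0 + np.exp(-(np.concatenate([local_h, context_h], axis=-1) @ w + b)))
    return g * local_h + (1 - g) * context_h
```

The mask would be applied inside a standard Transformer decoder's self-attention; the gate is the usual highway-style interpolation, here with hypothetical parameters `w` and `b`.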

If this is right

  • Higher scores on Connectivity and Spacing Metrics than prior methods under matched evaluation protocols.
  • Competitive performance on DTW-based trajectory similarity measures.
  • Gains in boundary naturalness confirmed by human evaluation.
  • Improved robustness to sparse transition coverage through staged curriculum training.
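The DTW trajectory similarity referenced above is a standard dynamic-programming alignment; a minimal O(nm) version over 2D stroke points looks like this (the paper's exact preprocessing and normalization are not specified in the review):

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two trajectories,
    each an array of shape [n, 2] of (x, y) points."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local point distance
            # best alignment extending a match, insertion, or deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Identical trajectories score 0; the metric tolerates timing differences but not shape differences, which is why the review treats it as complementary to the boundary-aware CSM suite.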

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The explicit transition modeling could transfer to other sequential synthesis tasks such as speech prosody or gesture generation where local continuity matters.
  • The new Connectivity and Spacing Metrics may serve as a reusable benchmark that shifts future handwriting evaluation toward boundary properties.
  • Curriculum scaling from glyphs to sentences offers a reusable training pattern for any generative model facing compositional data scarcity.
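The curriculum pattern in the last point can be sketched as a simple epoch-to-stage schedule; the stage boundaries below are illustrative, not the paper's actual values:

```python
def curriculum_stage(epoch, boundaries=(10, 30)):
    """Three-stage curriculum: isolated glyphs, then words, then full
    sentences. Boundaries are hypothetical epoch cutoffs."""
    if epoch < boundaries[0]:
        return "glyphs"
    if epoch < boundaries[1]:
        return "words"
    return "sentences"
```

A data loader would then sample only from the current stage's corpus, so the model sees dense single-character supervision before it has to handle sparse sentence-level transitions.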

Load-bearing premise

That explicit bigram-aware modeling of predecessor-current transitions plus curriculum training will reliably produce natural inter-character connectivity even when training data has sparse transition coverage at sentence scale.

What would settle it

A controlled test on sentences containing rare or unseen character bigrams where CASHG shows no improvement or a drop in Connectivity and Spacing Metrics relative to strong baselines would falsify the central claim.

Figures

Figures reproduced from arXiv: 2604.02103 by Jinsu Shin, JinYeong Bak, Sungeun Hong.

Figure 1: Inter-character connectivity and spacing comparison.

Figure 2: Overview of CASHG. Reference handwriting images are encoded into Writer-style memory Mw and Glyph-style memory Mg. The Character Context Encoder is used in two input modes: isolated-character inputs produce deterministic Character-Identity Embeddings, while sentence inputs are further processed by a lightweight Transformer encoder to produce position-dependent context memory. The handwriting generator s…

Figure 3: Bigram-aware sliding-window Transformer decoding with gated context fusion.

Figure 4: Human evaluation of perceptual similarity in style, connectivity, and spacing under two comparison protocols (DSD, DeepWriting). Tie denotes Cannot judge. We use BRUSH for the DSD-style comparison, IAM-expanded for the DeepWriting comparison, and CASIA-OLHWDB (2.0–2.2) for the OLHWG comparison. We note DiffInk [29] as a relevant Chinese baseline but omit it from our comparisons due to the absence of publicly…

Figure 5: Qualitative sentence-level comparison on BRUSH (English) and CA…

Figure 6: Writer-style diversity under fixed content.
Original abstract

Online handwriting represents strokes as time-ordered trajectories, which makes handwritten content easier to transform and reuse in a wide range of applications. However, generating natural sentence-level online handwriting that faithfully reflects a writer's style remains challenging, since sentence synthesis demands context-dependent characters with stroke continuity and spacing. Prior methods treat these boundary properties as implicit outcomes of sequence modeling, which becomes unreliable at the sentence scale and under limited compositional diversity. We propose CASHG, a context-aware stylized online handwriting generator that explicitly models inter-character connectivity for style-consistent sentence-level trajectory synthesis. CASHG uses a Character Context Encoder to obtain character identity and sentence-dependent context memory and fuses them in a bigram-aware sliding-window Transformer decoder that emphasizes local predecessor-current transitions, complemented by gated context fusion for sentence-level context. Training proceeds through a three-stage curriculum from isolated glyphs to full sentences, improving robustness under sparse transition coverage. We further introduce Connectivity and Spacing Metrics (CSM), a boundary-aware evaluation suite that quantifies cursive connectivity and spacing similarity. Under benchmark-matched evaluation protocols, CASHG consistently improves CSM over comparison methods while remaining competitive in DTW-based trajectory similarity, with gains corroborated by a human evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces CASHG, a context-aware model for stylized online handwriting generation at the sentence level. It uses a Character Context Encoder for character identity and sentence-dependent context memory, fused via a bigram-aware sliding-window Transformer decoder with gated context fusion. Training follows a three-stage curriculum from isolated glyphs to full sentences to handle sparse transitions, and the work proposes Connectivity and Spacing Metrics (CSM) as a boundary-aware evaluation suite. Under benchmark protocols, CASHG reports consistent CSM improvements over baselines while remaining competitive on DTW trajectory similarity, with corroboration from human evaluation.

Significance. If the CSM gains are substantiated by ablations and statistical tests, the explicit modeling of predecessor-current transitions and curriculum training could meaningfully advance sentence-level stylized handwriting synthesis beyond implicit sequence modeling approaches. The new CSM metrics address a gap in evaluating cursive connectivity and spacing, potentially influencing future benchmarks in the field.

major comments (2)
  1. [Experimental Evaluation] Experimental section: the central claim of consistent CSM gains rests on the three-stage curriculum and bigram-aware decoder, yet no ablation isolating the curriculum stages (glyphs to words to sentences) or frequency analysis of bigram coverage in the training data is provided. This leaves open whether gains hold for rare transitions or are driven by frequent ones only.
  2. [Method and Results] Method and results: exact baseline implementations, hyperparameter details, and statistical significance tests (e.g., p-values or confidence intervals) for the reported CSM improvements are not described, making it difficult to verify the 'consistent' outperformance under matched protocols.
minor comments (2)
  1. [Model Architecture] The description of the gated context fusion and sliding-window mechanism would benefit from additional equations or a detailed diagram for clarity and reproducibility.
  2. [Human Evaluation] Ensure the human evaluation protocol (number of participants, rating scale, and statistical analysis) is fully detailed to support the corroboration claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address each major comment below and will revise the paper accordingly to strengthen the experimental validation and reporting.

point-by-point responses
  1. Referee: [Experimental Evaluation] Experimental section: the central claim of consistent CSM gains rests on the three-stage curriculum and bigram-aware decoder, yet no ablation isolating the curriculum stages (glyphs to words to sentences) or frequency analysis of bigram coverage in the training data is provided. This leaves open whether gains hold for rare transitions or are driven by frequent ones only.

    Authors: We agree that the current manuscript would benefit from explicit ablations and bigram analysis to substantiate the curriculum's role. In the revision we will add ablation studies comparing the full three-stage curriculum against reduced variants (e.g., direct sentence-level training and two-stage training) and include a frequency breakdown of bigrams in the training set, reporting separate CSM scores for frequent versus rare transitions to demonstrate that gains are not limited to common cases. revision: yes
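The frequent-versus-rare bigram split the authors promise could be computed with a simple corpus pass; this is a hedged sketch with a hypothetical `rare_threshold`, not the authors' actual analysis:

```python
from collections import Counter

def bigram_frequency_split(sentences, rare_threshold=5):
    """Count character bigrams across a training corpus and split them into
    frequent vs rare buckets, so CSM can be reported separately per bucket."""
    counts = Counter()
    for s in sentences:
        counts.update(zip(s, s[1:]))  # adjacent character pairs
    frequent = {bg for bg, c in counts.items() if c >= rare_threshold}
    rare = set(counts) - frequent
    return frequent, rare
```

Scoring generated transitions per bucket would directly test whether gains are driven by frequent bigrams only, which is the referee's concern.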

  2. Referee: [Method and Results] Method and results: exact baseline implementations, hyperparameter details, and statistical significance tests (e.g., p-values or confidence intervals) for the reported CSM improvements are not described, making it difficult to verify the 'consistent' outperformance under matched protocols.

    Authors: We acknowledge that the manuscript lacks sufficient implementation and statistical details. The revised version will provide exact baseline code references and hyperparameter tables for all models, along with statistical significance tests (paired t-tests with p-values and 95% confidence intervals) on the CSM improvements to rigorously verify consistent outperformance under the benchmark protocols. revision: yes
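The paired t-test and confidence interval the authors commit to can be sketched from scratch (a normal-approximation CI for brevity; a proper t-quantile would be used for small samples):

```python
import math

def paired_t(a, b):
    """Paired t statistic and approximate 95% CI for the mean difference
    between matched per-sample score lists a and b."""
    n = len(a)
    d = [x - y for x, y in zip(a, b)]
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance
    se = math.sqrt(var / n)
    t = mean / se
    return t, (mean - 1.96 * se, mean + 1.96 * se)
```

In practice `scipy.stats.ttest_rel` would supply the p-value as well; the point is that per-sample CSM scores for CASHG and each baseline must be paired on the same test items.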

Circularity Check

0 steps flagged

No circularity: CASHG architecture, curriculum, and CSM metrics are independently defined and empirically tested

full rationale

The paper defines a new Transformer-based architecture with Character Context Encoder, bigram-aware decoder, and gated fusion, plus a three-stage curriculum from glyphs to sentences. It introduces Connectivity and Spacing Metrics (CSM) as a separate boundary-aware evaluation suite. All performance claims (CSM gains, DTW competitiveness, human eval) are presented as results of training on standard external handwriting datasets under benchmark protocols. No equations, parameters, or metrics are defined in terms of each other by construction, no load-bearing self-citations appear, and no fitted inputs are relabeled as predictions. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard assumptions of neural sequence modeling (Transformer attention can capture local transitions when augmented with explicit context) and the empirical claim that curriculum training improves robustness under sparse data. No new physical entities or ad-hoc constants are introduced beyond typical neural-net hyperparameters.

axioms (2)
  • domain assumption Transformer attention mechanisms can be made to emphasize local predecessor-current transitions via sliding-window masking and bigram conditioning.
    Invoked in the description of the bigram-aware sliding-window Transformer decoder.
  • domain assumption A three-stage curriculum from isolated glyphs to full sentences improves robustness when transition coverage is sparse.
    Stated as the training procedure that addresses limited compositional diversity.

pith-pipeline@v0.9.0 · 5507 in / 1180 out tokens · 30327 ms · 2026-05-13T21:41:16.111309+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches — The paper's claim is directly supported by a theorem in the formal canon.
  • supports — The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends — The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses — The paper appears to rely on the theorem as machinery.
  • contradicts — The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear — Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages

  1. [1] Aksan, E., Pece, F., Hilliges, O.: Deepwriting: Making digital ink editable via deep generative modeling. In: CHI, pp. 1–14 (2018)
  2. [2] Berndt, D.J., Clifford, J.: Using dynamic time warping to find patterns in time series. In: KDD Workshop, pp. 359–370 (1994)
  3. [3] Bhunia, A.K., Khan, S., Cholakkal, H., Anwer, R.M., Khan, F., Shah, M.: Handwriting transformers. In: ICCV, pp. 1066–1074. IEEE (2021)
  4. [4] Bhunia, A.K., Bhowmick, A., Bhunia, A.K., Konwer, A., Banerjee, P., Roy, P.P., Pal, U.: Handwriting trajectory recovery using end-to-end deep encoder-decoder network. In: ICPR, pp. 3639–3644. IEEE (2018)
  5. [5] Clark, J.H., Garrette, D., Turc, I., Wieting, J.: Canine: Pre-training an efficient tokenization-free encoder for language representation. TACL 10, 73–91 (2022)
  6. [6] Dai, G., Zhang, Y., Wang, Q., Du, Q., Yu, Z., Liu, Z., Huang, S.: Disentangling writer and character styles for handwriting generation. In: CVPR, pp. 5977–5986 (2023)
  7. [7] Douglas, D.H., Peucker, T.K.: Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica 10(2), 112–122 (1973)
  8. [8] Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., et al.: Scaling rectified flow transformers for high-resolution image synthesis. In: ICML, pp. 12606–12633. PMLR (2024)
  9. [9] Faundez-Zanuy, M., Mekyska, J., Impedovo, D.: Online handwriting, signature and touch dynamics: tasks and potential applications in the field of security and health. Cognitive Computation 13(5), 1406–1421 (2021)
  10. [10] Fogel, S., Averbuch-Elor, H., Cohen, S., Mazor, S., Litman, R.: Scrabblegan: Semi-supervised varying length handwritten text generation. In: CVPR, pp. 4324–4333 (2020)
  11. [11] Gan, J., Wang, W.: Higan: handwriting imitation conditioned on arbitrary-length texts and disentangled styles. In: AAAI, vol. 35, pp. 7484–7492 (2021)
  12. [12] Girdhar, R., El-Nouby, A., Liu, Z., Singh, M., Alwala, K.V., Joulin, A., Misra, I.: Imagebind: One embedding space to bind them all. In: CVPR, pp. 15180–15190 (2023)
  13. [13] Graves, A.: Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013)
  14. [14] Gutmann, M., Hyvärinen, A.: Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In: AISTATS, pp. 297–304. JMLR Workshop and Conference Proceedings (2010)
  15. [15] Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: CVPR, vol. 2, pp. 1735–1742. IEEE (2006)
  16. [16] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
  17. [17] Jungo, M., Wolf, B., Maksai, A., Musat, C., Fischer, A.: Character queries: a transformer-based approach to on-line handwritten character segmentation. In: ICDAR, pp. 98–114. Springer (2023)
  18. [18] Kang, L., Riba, P., Wang, Y., Rusinol, M., Fornés, A., Villegas, M.: Ganwriting: content-conditioned generation of styled handwritten word images. In: ECCV, pp. 273–289. Springer (2020)
  19. [19] Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., Maschinot, A., Liu, C., Krishnan, D.: Supervised contrastive learning. NeurIPS 33, 18661–18673 (2020)
  20. [20] Kotani, A., Tellex, S., Tompkin, J.: Generating handwriting via decoupled style descriptors. In: ECCV, pp. 764–780. Springer (2020)
  21. [21] Lee, H., Verma, B.: Over-segmentation and neural binary validation for cursive handwriting recognition. In: IJCNN, pp. 1–5. IEEE (2010)
  22. [22] Liu, C.L., Yin, F., Wang, D.H., Wang, Q.F.: Casia online and offline chinese handwriting databases. In: ICDAR, pp. 37–41. IEEE (2011)
  23. [23] Liu, Y., Khalid, F.B., Wang, L., Zhang, Y., Wang, C.: Elegantly written: Disentangling writer and character styles for enhancing online chinese handwriting. In: ECCV, pp. 409–425. Springer (2024)
  24. [24] Liwicki, M., Bunke, H.: Iam-ondb: an on-line english sentence database acquired from handwritten text on a whiteboard. In: ICDAR, pp. 956–961. IEEE (2005)
  25. [25] Luo, C., Zhu, Y., Jin, L., Li, Z., Peng, D.: Slogan: handwriting style synthesis for arbitrary-length and out-of-vocabulary text. TNNLS 34(11), 8503–8515 (2022)
  26. [26] Mitrevski, B., Rak, A., Schnitzler, J., Li, C., Maksai, A., Berent, J., Musat, C.C.: Inksight: Offline-to-online handwriting conversion by teaching vision-language models to read and write. TMLR (2025)
  27. [27] Nakatsuru, K., Uchida, S.: Learning to kern: Set-wise estimation of optimal letter space. In: ICDAR, pp. 18–34. Springer (2024)
  28. [28] Ott, F., Rügamer, D., Heublein, L., Hamann, T., Barth, J., Bischl, B., Mutschler, C.: Benchmarking online sequence-to-sequence and character-based handwriting recognition from imu-enhanced pens. IJDAR 25(4), 385–414 (2022)
  29. [29] Pan, W., He, H., Cheng, H., Shi, Y., Jin, L.: Diffink: Glyph- and style-aware latent diffusion transformer for text to online handwriting generation. In: ICLR (2026), https://openreview.net/forum?id=XKOEQFKFdL
  30. [30] Peebles, W., Xie, S.: Scalable diffusion models with transformers. In: ICCV, pp. 4195–4205 (2023)
  31. [31] Plamondon, R., Srihari, S.N.: Online and off-line handwriting recognition: a comprehensive survey. IEEE TPAMI 22(1), 63–84 (2002)
  32. [32] Ramer, U.: An iterative procedure for the polygonal approximation of plane curves. Comput. Graph. Image Process. 1(3), 244–256 (1972)
  33. [33] Ren, M., Zhang, Y.M., Chen, Y.: Decoupling layout from glyph in online chinese handwriting generation. In: ICLR (2025), https://openreview.net/forum?id=DhHIw9Nbl1
  34. [34] Su, J., Ahmed, M., Lu, Y., Pan, S., Bo, W., Liu, Y.: Roformer: Enhanced transformer with rotary position embedding. Neurocomputing 568, 127063 (2024)
  35. [35] Tang, S., Lian, Z.: Write like you: Synthesizing your cursive online chinese handwriting via metric-based meta learning. In: CGF, vol. 40, pp. 141–151. Wiley Online Library (2021)
  36. [36] Tang, S., Xia, Z., Lian, Z., Tang, Y., Xiao, J.: Fontrnn: Generating large-scale chinese fonts via recurrent neural network. In: CGF, vol. 38, pp. 567–577. Wiley Online Library (2019)
  37. [37] Tolosana, R., Delgado-Santos, P., Perez-Uribe, A., Vera-Rodriguez, R., Fierrez, J., Morales, A.: Deepwritesyn: On-line handwriting synthesis via deep short-term representations. In: AAAI, vol. 35, pp. 600–608 (2021)
  38. [38] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. NeurIPS 30 (2017)
  39. [39] Xue, L., Barua, A., Constant, N., Al-Rfou, R., Narang, S., Kale, M., Roberts, A., Raffel, C.: Byt5: Towards a token-free future with pre-trained byte-to-byte models. TACL 10, 291–306 (2022)
  40. [40] Zhang, X.Y., Yin, F., Zhang, Y.M., Liu, C.L., Bengio, Y.: Drawing and recognizing chinese characters with recurrent neural network. IEEE TPAMI 40(4), 849–862 (2017)
  41. [41] Zhao, B., Tao, J., Yang, M., Tian, Z., Fan, C., Bai, Y.: Deep imitator: Handwriting calligraphy imitation via deep attention networks. Pattern Recognition 104, 107080 (2020)