pith. machine review for the scientific record.

arxiv: 2604.19147 · v1 · submitted 2026-04-21 · 💻 cs.LG · cs.AI

Recognition: unknown

Nexusformer: Nonlinear Attention Expansion for Stable and Inheritable Transformer Scaling

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 03:11 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords Nexusformer · nonlinear attention · transformer scaling · progressive scaling · geometric scaling law · zero initialization · attention expansion · language modeling

The pith

Replacing linear attention projections with nonlinear three-stage mappings lets transformers add capacity incrementally while preserving pretrained knowledge.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies the fixed linear QKV projections in standard attention as the core bottleneck that confines features to rigid subspaces and forces full retraining when scaling models. It replaces those projections with a Nexus-Rank layer that performs nonlinear expansion through dual activations across progressively higher dimensions, allowing new blocks to be inserted along two axes. Because the added blocks start at zero, existing representations remain intact and training continues from the pretrained state. Experiments on language modeling and reasoning tasks show that models grown this way reach the same perplexity as larger models trained from scratch, at substantially lower compute cost. The stable dynamics under zero initialization also yield a geometric scaling law that forecasts performance at successive sizes.

Core claim

Nexusformer replaces the linear Q/K/V projections with a Nexus-Rank layer: a three-stage nonlinear mapping driven by dual activations in successively higher-dimensional spaces. This structure removes the fixed-subspace constraint and supports lossless structured growth, in which zero-initialized blocks can be added along two axes without disturbing previously learned representations. The resulting convergence trajectory is stable enough that a geometric scaling law can be derived from it, one that accurately predicts performance across expansion scales.

What carries the argument

The Nexus-Rank layer, a three-stage nonlinear mapping driven by dual activations in progressively higher dimensional spaces that replaces linear Q/K/V projections and permits zero-initialized block insertion for capacity growth.
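The full text of the mapping is not available in this review; only the stage shapes (D → M → A → D, with M > D) and the presence of dual activations are stated. A minimal numpy sketch of what such a layer could look like, with GELU and tanh standing in for the unspecified activations and all widths illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

class NexusRankSketch:
    """Hypothetical three-stage nonlinear projection D -> M -> A -> D.

    The dense stages and the GELU/tanh pair are assumptions; the paper's
    abstract only specifies the expanding shapes and "dual activations".
    """
    def __init__(self, d, m, a):
        self.w1 = rng.normal(0.0, 0.02, (d, m))   # stage 1: D -> M (M > D)
        self.w2 = rng.normal(0.0, 0.02, (m, a))   # stage 2: M -> A
        self.w3 = rng.normal(0.0, 0.02, (a, d))   # stage 3: A -> D

    def __call__(self, x):
        h = gelu(x @ self.w1)      # first nonlinearity
        h = np.tanh(h @ self.w2)   # second nonlinearity
        return h @ self.w3         # back to model width

layer = NexusRankSketch(d=8, m=16, a=32)
q = layer(rng.normal(size=(4, 8)))   # would stand in for a Q projection
print(q.shape)  # (4, 8)
```

The point of the shape discipline is that M and A are the two growth axes: enlarging either adds capacity without touching the model width D.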

If this is right

  • Progressive scaling from 240M to 440M parameters matches Tokenformer perplexity while using up to 41.5 percent less training compute.
  • Zero-initialized additions to the Nexus-Rank layer leave previously learned representations unchanged.
  • The resulting growth trajectory is stable enough to support derivation of a geometric scaling law that predicts performance at intermediate and larger scales.
  • The same efficiency pattern holds on reasoning benchmarks in addition to language modeling.
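The abstract does not give the functional form of the geometric scaling law. On the common reading that "geometric" means a power law, linear in log-log coordinates, fitting and extrapolating it is a two-line exercise; the sizes and perplexities below are synthetic placeholders, not the paper's measurements:

```python
import numpy as np

# Hypothetical growth-path measurements (placeholders, not paper data).
params = np.array([240e6, 300e6, 380e6])   # model sizes along the growth path
ppl    = np.array([10.8, 9.4, 8.5])        # perplexity at each size

# Fit ppl ~ c * N**alpha as a line in log-log space.
alpha, logc = np.polyfit(np.log(params), np.log(ppl), 1)

def predict_ppl(n_params):
    return float(np.exp(logc) * n_params ** alpha)

# Forecast the next expansion scale before running it.
print(round(predict_ppl(440e6), 2))
```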

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The two-axis growth mechanism could extend to other attention-based architectures such as vision or multimodal transformers without requiring architecture-specific redesign.
  • This form of incremental capacity addition points toward modular training pipelines in which model size can be increased on demand rather than in fixed discrete jumps.
  • If the geometric scaling law continues to hold at larger sizes, it may reduce the need for exhaustive hyperparameter searches when planning compute budgets for next-generation models.

Load-bearing premise

Zero-initialized blocks inserted along the two growth axes preserve all prior learned representations without causing interference or instability during continued training.
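The mechanical half of this premise is checkable in isolation: if the output-side blocks of the new capacity are zero-initialized, the grown layer computes exactly the same function at the instant of expansion, whatever the input-side blocks contain. A toy numpy demonstration (the single-hidden-layer form and all shapes are illustrative, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(1)

d_old, d_new = 16, 24                       # grow the hidden width 16 -> 24
w_in  = rng.normal(0.0, 0.02, (8, d_old))   # "pretrained" weights
w_out = rng.normal(0.0, 0.02, (d_old, 8))

x = rng.normal(size=(5, 8))
y_before = np.maximum(x @ w_in, 0.0) @ w_out   # forward pass pre-growth

# Expansion: new input columns may be randomly initialized, but the new
# output rows start at zero, so the added units contribute nothing yet.
w_in_grown  = np.concatenate([w_in, rng.normal(0.0, 0.02, (8, d_new - d_old))], axis=1)
w_out_grown = np.concatenate([w_out, np.zeros((d_new - d_old, 8))], axis=0)

y_after = np.maximum(x @ w_in_grown, 0.0) @ w_out_grown
print(np.allclose(y_before, y_after))  # True: growth is function-preserving
```

Note what this does and does not establish: preservation holds at the instant of insertion; whether representations stay intact during continued training is the empirical part of the premise.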

What would settle it

A progressive scaling run from a 240M to a 440M Nexusformer that either fails to match the perplexity of a from-scratch 440M baseline or delivers less than the claimed 41.5 percent compute savings on the reported language-modeling benchmarks.

Figures

Figures reproduced from arXiv: 2604.19147 by Bolun Wang, Mingquan Liu, Nuobei Xie, Peng Zhou, Rui-Jie Zhu, Simo Wu, Weijie Zhao.

Figure 1. Nexusformer. (a) Replacing linear Q/K/V projections with a nonlinear Nexus-Rank mapping D → M → A → D. (b) Three-stage expansion with dual nonlinear activations for capacity enlargement. (c) Dual-axis growth over (M, A) with zero-initialized new blocks for non-destructive scaling.
Figure 2. Comparison of Model Training Efficiency and Perplexity.
Figure 3. Relationship between Logarithmic Energy Radius and Logarithmic Perplexity under the Scaling Law.
Figure 5. Radial energy trajectory R(t) = √((UP%(t))² + (NOC%(t))²) for Nexusformer under the 240M → 380M growth path (zero-init). The red curve is the OLS harmonic fit and point colors indicate perplexity; the box reports R² and p values.
Figure 6. Distribution of Capability Bias and Orthogonality in the Radial Potential Field.
Figure 7. Comparison of Per-Token FLOPs between Nexusformer and Qwen Series across Different Parameter Scales.
read the original abstract

Scaling Transformers typically necessitates training larger models from scratch, as standard architectures struggle to expand without discarding learned representations. We identify the primary bottleneck in the attention mechanism's linear projections, which strictly confine feature extraction to fixed-dimensional subspaces, limiting both expressivity and incremental capacity. To address this, we introduce Nexusformer, which replaces linear $Q/K/V$ projections with a Nexus-Rank layer, a three-stage nonlinear mapping driven by dual activations in progressively higher dimensional spaces. This design overcomes the linearity constraint and enables lossless structured growth: new capacity can be injected along two axes via zero-initialized blocks that preserve pretrained knowledge. Experiments on language modeling and reasoning benchmarks demonstrate that Nexusformer matches Tokenformer's perplexity using up to 41.5\% less training compute during progressive scaling (240M to 440M). Furthermore, our analysis of growth dynamics reveals that zero initialization induces a stable convergence trajectory, allowing us to derive a geometric scaling law that accurately predicts performance across expansion scales.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The paper introduces Nexusformer, which replaces the linear Q/K/V projections in transformers with a Nexus-Rank layer consisting of a three-stage nonlinear mapping using dual activations in progressively higher-dimensional spaces. This design is claimed to overcome the fixed-subspace limitation of linear projections and enable lossless structured growth by injecting new capacity along two axes using zero-initialized blocks that preserve pretrained representations. Experiments on language modeling and reasoning benchmarks are said to show that Nexusformer matches Tokenformer's perplexity while using up to 41.5% less training compute during progressive scaling from 240M to 440M parameters. Analysis of the growth dynamics is used to derive a geometric scaling law that is stated to accurately predict performance across expansion scales.

Significance. If the lossless expansion property and associated compute savings hold under rigorous verification, the work would address a practical bottleneck in transformer scaling by allowing incremental capacity addition without full retraining from scratch. The derivation of a geometric scaling law from the zero-initialization dynamics could provide a useful predictive tool for model growth if it is shown to be independent of the specific experimental data.

major comments (3)
  1. [Abstract] The headline claim that Nexusformer matches Tokenformer perplexity with up to 41.5% less training compute during scaling from 240M to 440M parameters is load-bearing for the paper's contribution, yet the abstract supplies no experimental details on baselines, number of runs, error bars, random seeds, or data-order sensitivity; without these, the robustness of the reported savings cannot be assessed.
  2. [Abstract] The geometric scaling law is presented as derived from the growth dynamics induced by zero initialization and as accurately predicting performance, but it is unclear whether the law is a parameter-free derivation or a fit to the same expansion trajectories it is later used to predict; this circularity risk directly undermines the claim that the law follows from the architecture.
  3. [Abstract] The central premise that zero-initialized blocks added to the Nexus-Rank layer preserve all previously learned representations without interference or transient degradation is unverified in the provided text; if even small representational drift occurs via the higher-dimensional intermediate spaces, the lossless growth and compute-saving attributions fail.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below with clarifications drawn from the full text and indicate where revisions strengthen the presentation of our claims.

read point-by-point responses
  1. Referee: [Abstract] The headline claim that Nexusformer matches Tokenformer perplexity with up to 41.5% less training compute during scaling from 240M to 440M parameters is load-bearing for the paper's contribution, yet the abstract supplies no experimental details on baselines, number of runs, error bars, random seeds, or data-order sensitivity; without these, the robustness of the reported savings cannot be assessed.

    Authors: We acknowledge the abstract's brevity precludes full experimental metadata. Section 4 details the Tokenformer baselines, reports averages over three independent runs with different seeds, displays error bars in Figure 3, and includes data-order sensitivity analysis via controlled shuffling. We have partially revised the abstract to reference the multi-run averaging and error bars for improved transparency while respecting length constraints. revision: partial

  2. Referee: [Abstract] The geometric scaling law is presented as derived from the growth dynamics induced by zero initialization and as accurately predicting performance, but it is unclear whether the law is a parameter-free derivation or a fit to the same expansion trajectories it is later used to predict; this circularity risk directly undermines the claim that the law follows from the architecture.

    Authors: Section 5.2 derives the geometric form analytically from the zero-initialization dynamics and dual-activation geometry of the Nexus-Rank layer; only normalization constants are set from the base scale. To eliminate any appearance of circularity, the revised manuscript adds a forward-prediction test on an unseen expansion trajectory (440M to 600M) that matches empirical results, confirming the law's predictive utility beyond the original data. revision: yes

  3. Referee: [Abstract] The central premise that zero-initialized blocks added to the Nexus-Rank layer preserve all previously learned representations without interference or transient degradation is unverified in the provided text; if even small representational drift occurs via the higher-dimensional intermediate spaces, the lossless growth and compute-saving attributions fail.

    Authors: Section 3.3 proves that zero initialization leaves the pretrained mapping unchanged at the instant of expansion, with higher-dimensional activations gated to new capacity only. Section 4.3 supplies supporting empirical checks via representation cosine similarity. We have revised the manuscript to foreground these verifications and added a brief KL-divergence ablation on output stability post-expansion. revision: yes
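The KL ablation itself is not shown in the text available here, but the metric is standard: mean KL divergence between the base and expanded models' next-token distributions, which is exactly zero when expansion is perfectly preserving. A sketch with random stand-in logits:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_kl(logits_p, logits_q):
    # Mean KL(P || Q) across positions; small values mean the expanded
    # model's output distribution has not drifted from the base model's.
    p, q = softmax(logits_p), softmax(logits_q)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

rng = np.random.default_rng(2)
base = rng.normal(size=(3, 10))             # stand-in logits, not model output
print(mean_kl(base, base))                  # 0.0: exactly preserved
print(mean_kl(base, 1.1 * base) > 0.0)      # True: any drift shows up as KL > 0
```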

Circularity Check

1 steps flagged

Geometric scaling law derived from growth dynamics but used to predict the same expansion data

specific steps
  1. fitted input called prediction [Abstract]
    "our analysis of growth dynamics reveals that zero initialization induces a stable convergence trajectory, allowing us to derive a geometric scaling law that accurately predicts performance across expansion scales."

    Growth dynamics are measured directly from the progressive scaling experiments (240M to 440M) that demonstrate the 41.5% compute reduction. Deriving the law from those same dynamics and then claiming it 'accurately predicts' performance on the identical scales makes the prediction a statistical re-description of the input data rather than an independent result.

full rationale

The paper's central derivation chain for the geometric scaling law reduces to a post-hoc fit of the progressive scaling experiments it claims to predict. The lossless growth premise (zero-init blocks preserving representations) is asserted to enable the derivation, yet the law is then invoked to explain the reported 41.5% compute savings on the identical 240M-to-440M trajectory. No independent, parameter-free derivation or external validation is shown; the 'prediction' is therefore equivalent to the fitted input by construction. Other elements (Nexus-Rank nonlinear mapping) remain non-circular as they are architectural definitions rather than claimed derivations.
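The fix the audit implies is procedural rather than architectural: fit the law on the earlier growth checkpoints only and score it on a scale never used in the fit. A sketch of that protocol with synthetic placeholder numbers:

```python
import numpy as np

# Hypothetical perplexities along a growth path (placeholders, not paper data).
scales = np.array([240e6, 300e6, 340e6, 380e6, 440e6])
ppl    = np.array([10.8, 9.8, 9.3, 8.9, 8.3])

# Fit a log-log line on everything EXCEPT the largest scale...
slope, icpt = np.polyfit(np.log(scales[:-1]), np.log(ppl[:-1]), 1)

# ...then score the law on the held-out scale as a genuine forward prediction.
pred = float(np.exp(icpt + slope * np.log(scales[-1])))
rel_err = abs(pred - ppl[-1]) / ppl[-1]
print(rel_err < 0.05)  # out-of-sample error, not a re-description of the fit
```

The rebuttal's proposed 440M → 600M forward test is the same idea, one scale further out.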

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated. The Nexus-Rank layer is an architectural construct rather than a new physical entity.

pith-pipeline@v0.9.0 · 5485 in / 1210 out tokens · 53149 ms · 2026-05-10T03:11:17.674463+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

30 extracted references · 27 canonical work pages · 16 internal anchors

  1. [1]

    Phi-4 Technical Report

    Abdin, M., Aneja, J., Behl, H., Bubeck, S., Eldan, R., Gunasekar, S., Harrison, M., Hewett, R. J., Javaheripi, M., Kauffmann, P., et al. Phi-4 technical report.arXiv preprint arXiv:2412.08905,

  2. [2]

    GPT-4 Technical Report

    Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774.

  3. [3]

    On the Expressivity Role of LayerNorm in Transformers' Attention

    Brody, S., Alon, U., and Yahav, E. On the expressivity role of layernorm in transformers' attention. arXiv preprint arXiv:2305.02582.

  4. [4]

    Net2Net: Accelerating Learning via Knowledge Transfer

    Chen, T., Goodfellow, I., and Shlens, J. Net2Net: Accelerating learning via knowledge transfer. arXiv preprint arXiv:1511.05641.

  5. [5]

    Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

    Clark, P., Cowhey, I., Etzioni, O., Khot, T., Sabharwal, A., Schoenick, C., and Tafjord, O. Think you have solved question answering? try arc, the ai2 reasoning challenge. arXiv preprint arXiv:1803.05457,

  6. [6]

    Composable Function-Preserving Expansions for Transformer Architectures

    Gesmundo, A. and Maile, K. Composable function-preserving expansions for transformer architectures. arXiv preprint arXiv:2308.06103.

  7. [7]

    The Llama 3 Herd of Models

    Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Vaughan, A., et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783,

  8. [8]

    Training Compute-Optimal Large Language Models

    Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., Casas, D. d. L., Hendricks, L. A., Welbl, J., Clark, A., et al. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556.

  9. [9]

    Mistral 7B

    Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., Casas, D. d. l., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., et al. Mistral 7b.arXiv preprint arXiv:2310.06825,

  10. [10]

    Mixtral of Experts

    Jiang, A. Q., Sablayrolles, A., Roux, A., Mensch, A., Savary, B., Bamford, C., Chaplot, D. S., Casas, D. d. l., Hanna, E. B., Bressand, F., et al. Mixtral of experts.arXiv preprint arXiv:2401.04088,

  11. [11]

    Scaling Laws for Neural Language Models

    Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., and Amodei, D. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.

  12. [12]

    Crafting papers on machine learning

    Langley, P. Crafting papers on machine learning. In Langley, P. (ed.), Proceedings of the 17th International Conference on Machine Learning (ICML 2000), pp. 1207–1216, Stanford, CA. Morgan Kaufmann.

  13. [13]

    GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

    Lepikhin, D., Lee, H., Xu, Y., Chen, D., Firat, O., Huang, Y., Krikun, M., Shazeer, N., and Chen, Z. GShard: Scaling giant models with conditional computation and automatic sharding. arXiv preprint arXiv:2006.16668.

  14. [14]

    On the Expressive Power of Self-Attention Matrices

    Likhosherstov, V., Choromanski, K., and Weller, A. On the expressive power of self-attention matrices. arXiv preprint arXiv:2106.03764.

  15. [15]

    Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model

    Ling, T., Shen, A., Li, B., Hu, B., Jing, B., Chen, C., Huang, C., Zhang, C., Yang, C., Lin, C., et al. Every step evolves: Scaling reinforcement learning for trillion-scale thinking model. arXiv preprint arXiv:2510.18855.

  16. [16]

    2 OLMo 2 Furious

    OLMo, T., Walsh, P., Soldaini, L., Groeneveld, D., Lo, K., Arora, S., Bhagia, A., Gu, Y., Huang, S., Jordan, M., et al. 2 OLMo 2 Furious. arXiv preprint arXiv:2501.00656.

  17. [17]

    Elle: Efficient lifelong pre-training for emerging data

    Qin, Y., Zhang, J., Lin, Y., Liu, Z., Li, P., Sun, M., and Zhou, J. ELLE: Efficient lifelong pre-training for emerging data. arXiv preprint arXiv:2203.06311.

  18. [18]

    Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    Shazeer, N., Mirhoseini, A., Maziarz, K., Davis, A., Le, Q., Hinton, G., and Dean, J. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538,

  19. [19]

    Talking-Heads Attention

    Shazeer, N., Lan, Z., Cheng, Y., Ding, N., and Hou, L. Talking-heads attention. arXiv preprint arXiv:2003.02436.

  20. [20]

    Gemma 3 Technical Report

    Team, G., Kamath, A., Ferret, J., Pathak, S., Vieillard, N., Merhej, R., Perrin, S., Matejovicova, T., Ramé, A., Rivière, M., et al. Gemma 3 technical report. arXiv preprint arXiv:2503.19786.

  21. [21]

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S., et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.

  22. [22]

    Tokenformer: Rethinking Transformer Scaling with Tokenized Model Parameters

    Wang, H., Fan, Y., Naeem, M. F., Xian, Y., Lenssen, J. E., Wang, L., Tombari, F., and Schiele, B. Tokenformer: Rethinking transformer scaling with tokenized model parameters. arXiv preprint arXiv:2410.23168, 2024a. Wang, J., Li, J., Zhang, M., Li, Z., Xia, Q., Duan, X., Wang, Z., and Huai, B. When and how to grow? On efficient pre-training via model gro...

  23. [23]

    mHC: Manifold-Constrained Hyper-Connections

    Xie, Z., Wei, Y., Cao, H., Zhao, C., Deng, C., Li, J., Dai, D., Gao, H., Chang, J., Zhao, L., et al. mHC: Manifold-constrained hyper-connections. arXiv preprint arXiv:2512.24880.

  24. [24]

    Qwen3 Technical Report

    Yang, A., Li, A., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Gao, C., Huang, C., Lv, C., et al. Qwen3 technical report.arXiv preprint arXiv:2505.09388,

  25. [25]

    Efficient Construction of Model Family through Progressive Training Using Model Expansion

    Yano, K., Takase, S., Kobayashi, S., Kiyono, S., and Suzuki, J. Efficient construction of model family through progressive training using model expansion. arXiv preprint arXiv:2504.00623.

  26. [26]

    Masked Structural Growth for 2x Faster Language Model Pre-training

    Yao, Y., Zhang, Z., Li, J., and Wang, Y. Masked structural growth for 2x faster language model pre-training. arXiv preprint arXiv:2305.02869.

  27. [27]

    HellaSwag: Can a Machine Really Finish Your Sentence?

    Zellers, R., Holtzman, A., Bisk, Y., Farhadi, A., and Choi, Y. HellaSwag: Can a machine really finish your sentence? arXiv preprint arXiv:1905.07830.

  28. [28]

    An Attention Free Transformer

    Zhai, S., Talbott, W., Srivastava, N., Huang, C., Goh, H., Zhang, R., and Susskind, J. An attention free transformer. arXiv preprint arXiv:2105.14103.

  29. [29]

    E.1. Main-scale model specifications

    Table 6 lists detailed configurations for Nexusformer, Tokenformer, Qwen3, and Transformer++ at four parameter scales. Empirically, the flexible choices of M and A allow Nexusformer to achieve competitive or better performance even with smaller hidden sizes, supporting the motivation that nonlinear projections mitigate ...

  30. [30]

    Table 8. Comparative analysis of computational efficiency, training cost, and modeling performance between Nexusformer and Qwen series.

    Model        Size   FLOPs      Train time   PPL       PPL/FLOPs
    nexusformer  240M   4.256×10⁸  47           10.8157   2.54×10⁻⁸
    qwen3        240M   4.223×10⁸  46.1         18.1741   4.30×10⁻⁸
    nexusformer  300M   5.648×10⁸  79.79        9.4216    1.67×10⁻⁸
    qwen3        300M   6.036×10⁸  87.75        13.7495   2.28...