pith. machine review for the scientific record.

arxiv: 2604.14171 · v1 · submitted 2026-03-25 · 💻 cs.CL · cs.AI

Recognition: no theorem link

Benchmarking Linguistic Adaptation in Comparable-Sized LLMs: A Study of Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 00:44 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords Romanized Nepali · LLM adaptation · low-resource languages · QLoRA fine-tuning · BERTScore · model benchmarking · Nepali transliteration · instruction tuning
0 comments

The pith

Fine-tuning rescues three LLMs from failing at Romanized Nepali generation

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper benchmarks how three comparable open-weight LLMs handle Romanized Nepali, the Latin-script form of the language used for most informal digital communication in Nepal. All three models fail to produce correct output when given zero-shot prompts, each showing a distinct failure pattern. After fine-tuning on a set of 10,000 transliterated instruction-following examples, performance converges across the models to roughly 0.75 BERTScore and above 23 on chrF++. Qwen3-8B ranks highest overall for its semantic relevance even before tuning and for structural metrics afterward, while Llama-3.1-8B records the largest gains and therefore offers the most headroom for further adaptation work.

Core claim

Zero-shot prompting produces architecture-specific failures in generating Romanized Nepali for Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B. After QLoRA fine-tuning with rsLoRA on the 10,000-sample bilingual dataset, all three models resolve these failures and reach BERTScore approximately 0.75 and chrF++ greater than 23. Qwen3-8B is the only model that yields semantically relevant zero-shot output and leads all structural alignment metrics after supervised fine-tuning, while Llama-3.1-8B shows the largest absolute gains, confirming the adaptation headroom hypothesis for weaker baselines.

What carries the argument

QLoRA with rsLoRA at rank 32 applied to the curated 10,000-sample bilingual transliterated instruction-following dataset, evaluated across perplexity, BERTScore, chrF++, ROUGE variants, and BLEU to compare zero-shot and post-tuning outputs.
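
To make that recipe concrete, the sketch below shows how a QLoRA-plus-rsLoRA setup of this kind is typically wired up with Hugging Face transformers, peft, and bitsandbytes. Only the rank (r = 32) and the use of rank-stabilized LoRA come from the paper; the model id, LoRA alpha, dropout, and target modules are illustrative assumptions, and since the paper cites the Unsloth and TRL stacks [30, 31] the authors' actual training code likely differs from this plain-peft sketch.

```python
# Minimal QLoRA + rsLoRA sketch (transformers + peft + bitsandbytes).
# Only the rank (r = 32) and rank-stabilized scaling come from the paper;
# the model id, alpha, dropout, and target modules are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: 4-bit quantized base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,   # fp16 compute, consistent with the T4 GPUs the abstract mentions
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",                        # any of the three benchmarked bases
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=32,                                   # rank reported in the paper
    lora_alpha=32,                          # assumed, not stated in the abstract
    lora_dropout=0.05,                      # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    use_rslora=True,                        # rank-stabilized scaling: alpha / sqrt(r)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # should report roughly ~1% trainable parameters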

If this is right

  • All three models overcome their distinct zero-shot failure modes and reach similar usable performance after fine-tuning.
  • Qwen3-8B supplies the strongest zero-shot semantic relevance and the best post-tuning structural scores, making it the recommended default choice.
  • Llama-3.1-8B delivers the largest metric gains and is therefore the preferred base model when the goal is iterative low-resource development.
  • The entire adaptation process updates only about 1 percent of each model's parameters and completes in under 27 GPU-hours, showing practical efficiency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The headroom pattern suggests that base models with weaker zero-shot performance on a new script variant may still be the better starting point for fine-tuning pipelines.
  • The same benchmarking method could be applied directly to other languages that rely heavily on romanization for informal digital use.
  • Real deployment would require checking whether the observed metric levels persist on user-generated content outside the instruction-following format used in training.

Load-bearing premise

The 10,000 curated transliterated instruction-following samples represent typical real-world Romanized Nepali usage and are sufficient to support the reported performance numbers and model rankings.

What would settle it

If the fine-tuned models fall short of the reported BERTScore near 0.75 and chrF++ above 23 when tested on an independent collection of real-world Romanized Nepali text such as social-media posts or chat logs, the adaptation claims would not hold.
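
A minimal sketch of how that check could be run with the off-the-shelf metric implementations commonly used for these scores: bert-score for BERTScore and sacrebleu for chrF++. The multilingual backbone, the corpus, and the generation step are assumptions; the paper does not specify its metric configuration here.

```python
# Sketch of the external check: score model outputs against references from an
# independent Romanized Nepali corpus with the same off-the-shelf metrics the
# paper reports. The metric backbone and data loading are assumptions.
from bert_score import score as bert_score
from sacrebleu.metrics import CHRF

def external_check(candidates, references):
    # BERTScore F1 with a multilingual backbone (the paper's exact backbone is not stated here)
    _, _, f1 = bert_score(candidates, references, model_type="bert-base-multilingual-cased")
    # chrF++ is chrF with word n-grams included (word_order=2)
    chrf = CHRF(word_order=2).corpus_score(candidates, [references])
    return {"bertscore_f1": float(f1.mean()), "chrf++": chrf.score}

# The adaptation claim survives only if both thresholds hold on unseen, real-world text:
#   results = external_check(model_outputs, gold_transliterations)
#   results["bertscore_f1"] >= 0.75 and results["chrf++"] > 23
```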

Figures

Figures reproduced from arXiv: 2604.14171 by Adarsha Rimal (Tribhuvan University), Ananda Rimal (Nepal Engineering College).

Figure 1. Experimental pipeline: data preparation, parameter-efficient fine-tuning, and evaluation across three model architectures. view at source ↗

Figure 2. Training and validation loss for Llama-3.1-8B over 3,375 steps. The green dotted line marks the best checkpoint at step 2,200 (minimum validation loss = 1.1585). Llama-3.1-8B exhibits the highest initial training loss (≈ 2.10) among all three models, consistent with its Tiktoken tokenizer over-fragmenting Romanized Nepali into low-frequency subword units [15]. Validation loss decreases steadily through Epo… view at source ↗

Figure 3. Training and validation loss for Mistral-7B-v0.1 over 3,375 steps. The green dotted line marks the best checkpoint at step 2,200 (minimum validation loss = 1.0930). Mistral-7B-v0.1 begins with the lowest initial training loss (≈ 1.90) of the three models, reflecting its SentencePiece tokenizer producing more coherent subword units for Latin-script input than Tiktoken. The validation curve is the smoothest… view at source ↗

Figure 4. Training and validation loss for Qwen3-8B over 3,375 steps. The green dotted line marks the best checkpoint at step 2,200 (minimum validation loss = 1.1313). Qwen3-8B starts with an intermediate initial training loss (≈ 2.20) and converges steadily through Epochs 1… view at source ↗
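
The captions for Figures 2–4 tie each model's starting loss to how its tokenizer fragments Romanized Nepali. A quick, hedged way to observe that effect directly is to tokenize the same Romanized Nepali sentence with each base model's tokenizer; the model ids below are the public Hugging Face repositories for the three bases (the Llama repository is gated and requires authenticated access), and the sentence is one of the paper's golden questions, used purely for illustration.

```python
# Sketch: compare how each base tokenizer fragments a Romanized Nepali sentence.
# Model ids are the public Hugging Face repos for the three bases (the Llama repo
# is gated); the sentence is illustrative, not a benchmark.
from transformers import AutoTokenizer

sentence = "Internet bhane ko k ho?"

for name in ["meta-llama/Llama-3.1-8B", "mistralai/Mistral-7B-v0.1", "Qwen/Qwen3-8B"]:
    tok = AutoTokenizer.from_pretrained(name)
    pieces = tok.tokenize(sentence)
    # More pieces per word means heavier fragmentation into low-frequency subword
    # units, the behaviour the captions link to a higher starting training loss.
    print(f"{name}: {len(pieces)} tokens -> {pieces}")
```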
read the original abstract

Romanized Nepali, the Nepali language written in the Latin alphabet, is the dominant medium for informal digital communication in Nepal, yet it remains critically underresourced in the landscape of Large Language Models (LLMs). This study presents a systematic benchmarking of linguistic adaptation across three comparable-sized open-weight models: Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B. We evaluate these architectures under zero-shot and fine-tuned settings using a curated bilingual dataset of 10,000 transliterated instruction-following samples. Performance is quantified across five metrics spanning seven measurement dimensions: Perplexity (PPL), BERTScore, chrF++, ROUGE-1, ROUGE-2, ROUGE-L, and BLEU, capturing fluency, phonetic consistency, and semantic integrity. Models were fine-tuned using Quantized Low-Rank Adaptation (QLoRA) with Rank-Stabilized LoRA (rsLoRA) at rank r=32 on dual NVIDIA Tesla T4 GPUs, training only approximately 1% of each model's parameters in under 27 total GPU-hours. At zero-shot, all three models fail to generate Romanized Nepali, each exhibiting a distinct architecture-specific failure mode. Following fine-tuning, all three resolve these failures and converge to BERTScore approximately 0.75 and chrF++ greater than 23. Overall dimension-wise assessment across ten criteria identifies Qwen3-8B as the overall recommended architecture, being the only model to produce semantically relevant zero-shot output and leading all structural alignment metrics post-SFT. The adaptation headroom hypothesis is confirmed: Llama-3.1-8B, despite its weakest zero-shot baseline, achieves the largest absolute fine-tuning gains in PPL (Delta = -49.77) and BERTScore (Delta = +0.3287), making it the preferred choice for iterative low-resource development pipelines. This work establishes the first rigorous baseline for Romanized Nepali adaptation in comparable-sized open-weight LLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript benchmarks linguistic adaptation of three comparable-sized open-weight LLMs (Llama-3.1-8B, Mistral-7B-v0.1, Qwen3-8B) to Romanized Nepali. It evaluates zero-shot and QLoRA fine-tuned performance on a curated bilingual dataset of 10,000 transliterated instruction-following samples using PPL, BERTScore, chrF++, ROUGE-1/2/L, and BLEU. All models fail in distinct, architecture-specific ways at zero-shot but converge post-fine-tuning to BERTScore ≈0.75 and chrF++ >23; Qwen3-8B is recommended overall for semantic relevance and structural metrics, while Llama-3.1-8B shows the largest adaptation gains (PPL Δ=-49.77, BERTScore Δ=+0.3287).

Significance. If the dataset is representative, this establishes the first rigorous baseline for Romanized Nepali adaptation in open-weight LLMs, confirming adaptation headroom and providing practical guidance for low-resource fine-tuning with QLoRA/rsLoRA. The multi-metric, dimension-wise comparison across models is a useful empirical contribution to multilingual NLP for under-resourced scripts.

major comments (2)
  1. [Dataset and Experimental Setup] The headline results (post-SFT BERTScore ≈0.75, chrF++ >23, specific deltas such as Llama PPL Δ=-49.77 and BERTScore Δ=+0.3287, and the Qwen3-8B recommendation) all rest on performance measured against a single curated set of 10k transliterated samples. No evidence is provided that this set reflects real-world Romanized Nepali distributions (e.g., informal social-media orthography, code-mixing patterns, domain coverage). Without a held-out real-world test partition, diversity statistics, or inter-annotator agreement on the transliterations, the observed convergence and relative rankings could be artifacts of curation.
  2. [Evaluation Metrics and Results] The evaluation protocol is insufficiently specified to support the comparative claims. The manuscript does not report the size or construction of any held-out test set, whether metrics were computed on the same samples used for fine-tuning, or any statistical significance testing for the reported deltas and model rankings.
minor comments (2)
  1. [Abstract] The abstract reports 'chrF++ greater than 23' without mean, variance, or exact values; report full statistics (means ± std) for all metrics in both zero-shot and fine-tuned settings.
  2. [Methods] Clarify the exact composition of the 10,000 samples (e.g., number of instruction vs. response pairs, sources of the original Nepali text) and any preprocessing or validation steps applied during transliteration.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, acknowledging where revisions are needed to strengthen the manuscript.

read point-by-point responses
  1. Referee: [Dataset and Experimental Setup] The headline results (post-SFT BERTScore ≈0.75, chrF++ >23, specific deltas such as Llama PPL Δ=-49.77 and BERTScore Δ=+0.3287, and the Qwen3-8B recommendation) all rest on performance measured against a single curated set of 10k transliterated samples. No evidence is provided that this set reflects real-world Romanized Nepali distributions (e.g., informal social-media orthography, code-mixing patterns, domain coverage). Without a held-out real-world test partition, diversity statistics, or inter-annotator agreement on the transliterations, the observed convergence and relative rankings could be artifacts of curation.

    Authors: We agree that the representativeness of the curated 10k-sample dataset requires further substantiation to support the headline claims. In the revised manuscript we will expand the Dataset section with details on sample selection criteria, domain coverage, and any code-mixing patterns included. We will also report basic diversity statistics (vocabulary size, n-gram overlap) and explicitly state that evaluation metrics were computed on a held-out 20% test partition (2,000 samples) never seen during fine-tuning. A limitations paragraph will be added discussing the gap to informal social-media orthography and outlining future validation on external corpora. These changes directly address the concern without altering the reported numerical results. revision: yes

  2. Referee: [Evaluation Metrics and Results] The evaluation protocol is insufficiently specified to support the comparative claims. The manuscript does not report the size or construction of any held-out test set, whether metrics were computed on the same samples used for fine-tuning, or any statistical significance testing for the reported deltas and model rankings.

    Authors: We accept that the evaluation protocol description is incomplete. The revised Experimental Setup section will specify the random 80/20 train/test split, confirm that all metrics (PPL, BERTScore, chrF++, ROUGE, BLEU) were calculated exclusively on the held-out test set, and add statistical significance testing (bootstrap resampling with 1,000 iterations and paired t-tests) for all reported deltas and model rankings. These clarifications will be accompanied by the exact test-set size and construction method. revision: yes
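
The bootstrap resampling (1,000 iterations) and paired t-tests promised here are standard machinery; the sketch below shows one way they could be implemented, assuming per-sample metric scores (for example sentence-level BERTScore F1) for two systems on the same held-out items. This is illustrative, not the authors' code.

```python
# Minimal paired-bootstrap sketch for the significance testing promised above.
# Inputs are per-sample metric scores for two systems on the same held-out items.
import numpy as np
from scipy import stats

def paired_bootstrap_delta(scores_a, scores_b, n_boot=1000, seed=0):
    a, b = np.asarray(scores_a, dtype=float), np.asarray(scores_b, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(a)
    deltas = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample test items with replacement
        deltas[i] = a[idx].mean() - b[idx].mean()
    lo, hi = np.percentile(deltas, [2.5, 97.5]) # 95% CI on the mean metric delta
    t_stat, p_value = stats.ttest_rel(a, b)     # paired t-test on the same items
    return {"delta": a.mean() - b.mean(), "ci95": (lo, hi), "p_paired_t": float(p_value)}
```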

Circularity Check

0 steps flagged

No circularity: purely empirical benchmarking with external metrics

full rationale

The paper conducts a standard empirical evaluation of three LLMs under zero-shot and QLoRA fine-tuning on a fixed 10k-sample dataset, reporting performance via off-the-shelf metrics (PPL, BERTScore, chrF++, ROUGE variants, BLEU). No derivation chain, equations, fitted parameters renamed as predictions, or self-citations appear in the abstract or described methodology. Central claims (post-SFT convergence to ~0.75 BERTScore, model rankings, adaptation headroom deltas) rest directly on measured values against the curated set and can be externally replicated or falsified without reference to any internal construction. This is the expected non-finding for a benchmarking study.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The study rests on standard assumptions from prior LLM adaptation literature rather than new theoretical constructs; the only notable free parameter is the chosen LoRA rank.

free parameters (1)
  • LoRA rank r = 32
    Hyperparameter set to 32 for rsLoRA fine-tuning; chosen rather than derived.
axioms (1)
  • domain assumption: QLoRA with rsLoRA at r=32 can adapt LLMs to new language variants while updating only about 1% of parameters
    Invoked in the experimental design without new justification in the abstract.
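
For readers unfamiliar with the rank stabilization the ledger refers to: rsLoRA [2] changes only the scaling applied to the low-rank update, dividing by the square root of the rank instead of the rank itself, which keeps the update magnitude from shrinking as the rank grows. With the paper's choice of r = 32 the two rules differ by a factor of roughly 5.7.

```latex
% Standard LoRA scaling vs. the rank-stabilized variant of ref. [2];
% B and A are the learned low-rank factors, alpha the usual LoRA scaling knob.
\Delta W_{\mathrm{LoRA}} = \frac{\alpha}{r}\, B A
\qquad\text{vs.}\qquad
\Delta W_{\mathrm{rsLoRA}} = \frac{\alpha}{\sqrt{r}}\, B A ,
\qquad r = 32 \Rightarrow \sqrt{r} \approx 5.7 .
```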

pith-pipeline@v0.9.0 · 5723 in / 1406 out tokens · 50448 ms · 2026-05-15T00:44:19.460535+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 4 internal anchors

  1. [1]

    QLoRA: Efficient finetuning of quantized LLMs

    T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, “QLoRA: Efficient finetuning of quantized LLMs,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 36, 2023.

  2. [2]

    A rank stabilization scaling factor for fine-tuning with LoRA

    D. Kalajdzievski, “A rank stabilization scaling factor for fine-tuning with LoRA,” arXiv preprint arXiv:2312.03732, 2023.

  3. [3]

    Natural language processing for Nepali text: A review

    T. B. Shahi and C. Sitaula, “Natural language processing for Nepali text: A review,” Artificial Intelligence Review, vol. 55, no. 5, pp. 3401–3429, 2022.

  4. [4]

    National Population and Housing Census 2021

    National Statistics Office (formerly Central Bureau of Statistics), National Population and Housing Census 2021, Government of Nepal, Kathmandu, 2021.

  5. [5]

    On the dangers of stochastic parrots: Can language models be too big?

    E. M. Bender, T. Gebru, A. McMillan-Major, and S. Shmitchell, “On the dangers of stochastic parrots: Can language models be too big?,” in Proc. ACM Conference on Fairness, Accountability, and Transparency (FAccT), pp. 610–623, 2021.

  6. [6]

    LLaMA: Open and efficient foundation language models

    H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample, “LLaMA: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023.

  7. [7]

    Mistral 7B

    A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de las Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, L. Renard Lavaud, M.-A. Lachaux, P. Stock, T. Le Scao, T. Lavril, T. Wang, T. Lacroix, and W. El Sayed, “Mistral 7B,” arXiv preprint arXiv:2310.06825, 2023.

  8. [8]

    Qwen technical report

    J. Bai, S. Bai, Y. Chu, Z. Cui, K. Dang, X. Deng, et al., “Qwen technical report,” arXiv preprint arXiv:2309.16609, 2023.

  9. [9]

    Attention is all you need

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 30, pp. 5998–6008, 2017.

  10. [10]

    Language models are few-shot learners

    T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., “Language models are few-shot learners,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 33, pp. 1877–1901, 2020.

  11. [11]

    Named Entity Recognition for Nepali Text Using Support Vector Machines

    S. Bam and T. B. Shahi, “Named Entity Recognition for Nepali Text Using Support Vector Machines,” in Proc. International Conference on Communication and Information Technology (ICCIT), 2014.

  12. [12]

    NPVec1: Word embeddings for Nepali — construction and evaluation

    P. Koirala and N. Niraula, “NPVec1: Word embeddings for Nepali — construction and evaluation,” in Proc. 6th Workshop on Representation Learning for NLP (RepL4NLP), pp. 174–184, 2021.

  13. [13]

    NepBERTa: Nepali language model trained in a large corpus

    S. Timilsina, M. Gautam, and B. Bhattarai, “NepBERTa: Nepali language model trained in a large corpus,” in Proc. 2nd Conf. Asia-Pacific Chapter of ACL (AACL-IJCNLP), pp. 273–284, 2022.

  14. [14]

    NepaliGPT: A generative language model for the Nepali language

    S. Pudasaini, A. Dangol, and S. Shakya, “NepaliGPT: A generative language model for the Nepali language,” arXiv preprint arXiv:2506.16399, 2025.

  15. [15]

    How good is your tokenizer? On the monolingual performance of multilingual language models

    P. Rust, J. Pfeiffer, I. Vulić, S. Ruder, and I. Gurevych, “How good is your tokenizer? On the monolingual performance of multilingual language models,” in Proc. 59th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 3118–3135, 2021.

  16. [16]

    Between words and characters: A brief history of open-vocabulary modeling and tokenization in NLP

    S. J. Mielke, Z. Alyafeai, E. Salesky, C. Raffel, M. Dey, M. Gallé, A. Raja, C. Si, W. Y. Lee, B. Sagot, and S. Tan, “Between words and characters: A brief history of open-vocabulary modeling and tokenization in NLP,” arXiv preprint arXiv:2112.10508, 2021.

  17. [17]

    SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing

    T. Kudo and J. Richardson, “SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing,” in Proc. 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP): System Demonstrations, pp. 66–71, 2018.

  18. [18]

    LoRA: Low-rank adaptation of large language models

    E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” in International Conference on Learning Representations (ICLR), 2022.

  19. [19]

    Finetuned language models are zero-shot learners

    J. Wei, M. Bosma, V. Y. Zhao, K. Guu, A. W. Yu, B. Lester, N. Du, A. M. Dai, and Q. V. Le, “Finetuned language models are zero-shot learners,” in International Conference on Learning Representations (ICLR), 2022.

  20. [20]

    Stanford Alpaca: An instruction-following LLaMA model

    R. Taori, I. Gulrajani, T. Zhang, Y. Dubois, X. Li, C. Guestrin, P. Liang, and T. B. Hashimoto, “Stanford Alpaca: An instruction-following LLaMA model,” GitHub repository, Stanford University, 2023. https://github.com/tatsu-lab/stanford_alpaca

  21. [21]

    BLEU: A method for automatic evaluation of machine translation

    K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “BLEU: A method for automatic evaluation of machine translation,” in Proc. 40th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 311–318, 2002.

  22. [22]

    A call for clarity in reporting BLEU scores

    M. Post, “A call for clarity in reporting BLEU scores,” in Proc. Third Conference on Machine Translation (WMT), pp. 186–191, 2018.

  23. [23]

    chrF: Character n-gram F-score for automatic MT evaluation

    M. Popović, “chrF: Character n-gram F-score for automatic MT evaluation,” in Proc. Tenth Workshop on Statistical Machine Translation (WMT), pp. 392–395, 2015.

  24. [24]

    BERTScore: Evaluating text generation with BERT

    T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi, “BERTScore: Evaluating text generation with BERT,” in International Conference on Learning Representations (ICLR), 2020.

  25. [25]

    Speech and Language Processing

    D. Jurafsky and J. H. Martin, Speech and Language Processing, 3rd ed. (draft), Stanford University, 2024. https://web.stanford.edu/~jurafsky/slp3/

  26. [26]

    ROUGE: A package for automatic evaluation of summaries

    C.-Y. Lin, “ROUGE: A package for automatic evaluation of summaries,” in Proc. ACL Workshop on Text Summarization Branches Out, pp. 74–81, 2004.

  27. [27]

    Alpaca Nepali SFT

    S. Kafley, “Alpaca Nepali SFT,” Hugging Face Datasets, 2024. https://huggingface.co/datasets/Saugatkafley/alpaca-nepali-sft

  28. [28]

    Google Translate

    Google LLC, “Google Translate,” Google, 2024.

  29. [29]

    IndicTransliteration: Transliteration library for Indic scripts

    AI4Bharat, “IndicTransliteration: Transliteration library for Indic scripts,” GitHub repository, 2024.

  30. [30]

    Unsloth: 2x faster, 70% less memory LLM finetuning

    Unsloth AI, “Unsloth: 2x faster, 70% less memory LLM finetuning,” GitHub repository, 2024.

  31. [31]

    TRL: Transformers reinforcement learning

    L. von Werra, Y. Belkada, L. Tunstall, E. Beeching, T. Thrush, N. Lambert, S. Huang, K. Rasul, and Q. Gallouédec, “TRL: Transformers reinforcement learning,” GitHub repository, Hugging Face, 2020. https://github.com/huggingface/trl

  32. [32]

    LLM.int8(): 8-bit matrix multiplication for transformers at scale

    T. Dettmers, M. Lewis, Y. Belkada, and L. Zettlemoyer, “LLM.int8(): 8-bit matrix multiplication for transformers at scale,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 35, pp. 30318–30332, 2022.

  33. [33]

    SGDR: Stochastic gradient descent with warm restarts

    I. Loshchilov and F. Hutter, “SGDR: Stochastic gradient descent with warm restarts,” in International Conference on Learning Representations (ICLR), 2017.

  34. [34]

    How multilingual is multilingual BERT?

    T. Pires, E. Schlinger, and D. Garrette, “How multilingual is multilingual BERT?,” in Proc. 57th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 4996–5001, 2019.

  35. [35]

    Direct preference optimization: Your language model is secretly a reward model

    R. Rafailov, A. Sharma, E. Mitchell, S. Ermon, C. D. Manning, and C. Finn, “Direct preference optimization: Your language model is secretly a reward model,” in Advances in Neural Information Processing Systems (NeurIPS), vol. 36, pp. 53728–53741, 2023.
