NumCoKE: Ordinal-Aware Numerical Reasoning over Knowledge Graphs with Mixture-of-Experts and Contrastive Learning
Pith reviewed 2026-05-23 17:38 UTC · model grok-4.3
The pith
NumCoKE combines a mixture-of-experts encoder with ordinal contrastive learning to integrate numerical attributes into knowledge graph reasoning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NumCoKE is a numerical reasoning framework for knowledge graphs that uses a Mixture-of-Experts Knowledge-Aware encoder to jointly align entities, relations, and numerical attributes in a shared space while routing attribute features to relation-specific experts, paired with Ordinal Knowledge Contrastive Learning that constructs ordinal-aware positive and negative samples from prior knowledge to discriminate subtle semantic shifts.
What carries the argument
Mixture-of-Experts Knowledge-Aware (MoEKA) encoder that jointly encodes symbolic and numeric components with dynamic expert routing, together with Ordinal Knowledge Contrastive Learning (OKCL) that builds positive and negative samples using ordinal prior knowledge.
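To make "dynamic expert routing" concrete, here is a minimal sketch of relation-conditioned soft routing in plain Python. It is not the paper's MoEKA implementation: the class name `ToyRelationRoutedMoE`, the linear experts, the softmax gate, and all dimensions are assumptions chosen for illustration. The only idea taken from the source is that the relation decides how attribute features are distributed across experts.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    # Multiply a row-major matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def make_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyRelationRoutedMoE:
    """Hypothetical stand-in for relation-conditioned expert routing."""

    def __init__(self, dim, num_experts, seed=0):
        rng = random.Random(seed)
        # Each expert is a single linear map (an assumption, kept small on purpose).
        self.experts = [make_matrix(dim, dim, rng) for _ in range(num_experts)]
        # The gate maps a relation embedding to one logit per expert.
        self.gate = make_matrix(num_experts, dim, rng)

    def route(self, relation_vec):
        # Routing weights depend on the relation only.
        return softmax(matvec(self.gate, relation_vec))

    def forward(self, attribute_vec, relation_vec):
        weights = self.route(relation_vec)
        outputs = [matvec(expert, attribute_vec) for expert in self.experts]
        dim = len(attribute_vec)
        # Weighted sum of expert outputs, one coordinate at a time.
        return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(dim)]


moe = ToyRelationRoutedMoE(dim=4, num_experts=3)
attr = [0.2, -0.1, 0.7, 0.0]   # toy attribute feature
rel_a = [1.0, 0.0, 0.0, 0.0]   # toy relation embedding
weights_a = moe.route(rel_a)
mixed_a = moe.forward(attr, rel_a)
```

Passing a different relation vector to `forward` changes the routing weights and therefore the mixed representation of the same attribute feature, which is the behaviour the encoder description relies on.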
If this is right
- Knowledge graph models become able to extract relation-aware semantics directly from numerical attribute values.
- Models distinguish fine-grained ordinal relationships even when values are close or hard negatives are present.
- Numerical fact inference improves on benchmarks that contain attributes with varying distributions.
- The unified representation supports downstream tasks that combine symbolic triples with numeric comparisons.
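The second implication above turns on how close values are treated during training. The sketch below shows one way ordinal-aware positives and hard negatives could be built and scored. The distance thresholds, the function names `split_by_ordinal_gap` and `info_nce`, and the use of an InfoNCE loss are all assumptions; the source only states that OKCL constructs such samples from prior knowledge.

```python
import math


def split_by_ordinal_gap(anchor_value, candidates, pos_gap, hard_gap):
    """Partition candidate values by absolute distance to the anchor.

    Assumed rule (not from the paper): within pos_gap is a positive,
    beyond pos_gap but within hard_gap is a hard negative,
    anything farther is an easy negative.
    """
    positives, hard_negatives, easy_negatives = [], [], []
    for value in candidates:
        gap = abs(value - anchor_value)
        if gap <= pos_gap:
            positives.append(value)
        elif gap <= hard_gap:
            hard_negatives.append(value)
        else:
            easy_negatives.append(value)
    return positives, hard_negatives, easy_negatives


def info_nce(pos_score, neg_scores, temperature=0.1):
    """InfoNCE loss for one positive similarity against negative similarities."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Toy lengths; the thresholds are arbitrary illustration values.
pos, hard, easy = split_by_ordinal_gap(
    100.0, [98.0, 103.0, 110.0, 140.0, 300.0], pos_gap=5.0, hard_gap=15.0
)

# A negative that scores almost as high as the positive costs more
# than one that is clearly separated.
loss_easy = info_nce(0.9, [0.1])
loss_hard = info_nce(0.9, [0.8])
```

In this toy setup `loss_hard` exceeds `loss_easy`, which illustrates why near-value negatives dominate the gradient when they are sampled at all.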
Where Pith is reading between the lines
- The same routing and contrastive design could be tested on recommendation systems that rely on numerical attributes in knowledge graphs.
- Extending the ordinal sampling to handle multi-attribute comparisons would connect this work to broader numerical query answering.
- Applying the encoder to larger, noisier real-world graphs would show whether the gains hold beyond the three public benchmarks.
Load-bearing premise
The two shortcomings of prior work—incomplete semantic integration and ordinal indistinguishability—are the main bottlenecks, and the MoEKA encoder plus OKCL resolve them without introducing new failure modes on the chosen benchmarks.
What would settle it
A direct comparison on the three public KG benchmarks showing that NumCoKE does not outperform competitive baselines across diverse attribute distributions would falsify the claim of superiority in semantic integration and ordinal reasoning.
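The falsification condition can be phrased as a simple check over a score table. The sketch below is only an illustration of that check: the function name `outperforms_everywhere`, the benchmark and baseline labels, and every number in `toy_scores` are invented placeholders and are not results from the paper.

```python
def outperforms_everywhere(scores, model, higher_is_better=True):
    """Return (verdict, failures).

    failures lists every (benchmark, baseline) pair on which `model`
    does not strictly beat the baseline.
    """
    failures = []
    for benchmark, by_model in scores.items():
        target = by_model[model]
        for name, value in by_model.items():
            if name == model:
                continue
            wins = target > value if higher_is_better else target < value
            if not wins:
                failures.append((benchmark, name))
    return (not failures, failures)


# Placeholder numbers for illustration only; NOT taken from the paper.
toy_scores = {
    "benchmark_A": {"proposed": 0.50, "baseline_1": 0.45, "baseline_2": 0.48},
    "benchmark_B": {"proposed": 0.40, "baseline_1": 0.41, "baseline_2": 0.35},
}
verdict, failures = outperforms_everywhere(toy_scores, "proposed")
```

A single entry in `failures` is enough to contradict a claim of consistent outperformance, so the check reports every losing pair rather than an average.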
Original abstract
Knowledge graphs (KGs) serve as a vital backbone for a wide range of AI applications, including natural language understanding and recommendation. A promising yet underexplored direction is numerical reasoning over KGs, which involves inferring new facts by leveraging not only symbolic triples but also numerical attribute values (e.g., length, weight). However, existing methods fall short in two key aspects: (1) Incomplete semantic integration: Most models struggle to jointly encode entities, relations, and numerical attributes in a unified representation space, limiting their ability to extract relation-aware semantics from numeric information. (2) Ordinal indistinguishability: Due to subtle differences between close values and sampling imbalance, models often fail to capture fine-grained ordinal relationships (e.g., longer, heavier), especially in the presence of hard negatives. To address these challenges, we propose NumCoKE, a numerical reasoning framework for KGs based on Mixture-of-Experts and Ordinal Contrastive Embedding. To overcome (C1), we introduce a Mixture-of-Experts Knowledge-Aware (MoEKA) encoder that jointly aligns symbolic and numeric components into a shared semantic space, while dynamically routing attribute features to relation-specific experts. To handle (C2), we propose Ordinal Knowledge Contrastive Learning (OKCL), which constructs ordinal-aware positive and negative samples using prior knowledge, enabling the model to better discriminate subtle semantic shifts. Extensive experiments on three public KG benchmarks demonstrate that NumCoKE consistently outperforms competitive baselines across diverse attribute distributions, validating its superiority in both semantic integration and ordinal reasoning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes NumCoKE, a numerical reasoning framework over knowledge graphs that introduces a Mixture-of-Experts Knowledge-Aware (MoEKA) encoder to jointly align symbolic triples and numerical attributes into a shared space with relation-specific routing, and an Ordinal Knowledge Contrastive Learning (OKCL) objective that constructs positive/negative samples using prior knowledge to capture fine-grained ordinal distinctions. It claims these components address incomplete semantic integration and ordinal indistinguishability, respectively, and reports consistent outperformance over competitive baselines on three public KG benchmarks across diverse attribute distributions.
Significance. If the performance gains can be isolated to the proposed mechanisms, the work would offer a concrete, extensible approach for incorporating numerical attributes into KG embeddings while explicitly handling ordinal relations, which could benefit downstream tasks such as recommendation and numerical question answering. The paper merits credit for clearly articulating two failure modes and for designing targeted components (dynamic expert routing and prior-knowledge contrastive sampling) rather than relying on generic capacity increases.
major comments (2)
- [Experimental section (results and ablations)] The central claim that MoEKA and OKCL directly resolve the two stated bottlenecks (incomplete semantic integration and ordinal indistinguishability) is load-bearing, yet the experimental section provides no ablation studies that remove or disable each component individually while holding parameter count and optimization fixed; without such controls it is impossible to rule out that observed gains arise from added routing capacity or curated sample construction rather than improved numeric-symbolic alignment.
- [§4 (or equivalent experimental analysis subsection)] No error analysis, probing tasks, or per-benchmark breakdown is supplied to verify that baseline failures are primarily attributable to the two identified shortcomings rather than other factors (e.g., embedding dimensionality or negative sampling strategy); this leaves the attribution of NumCoKE's superiority unverified.
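The first comment asks for ablations at a fixed parameter count. One way to set up such a control is to widen the single-expert variant until its parameter count matches the mixture. The sketch below does that arithmetic under assumed architectures: two-layer MLP experts with biases and a linear gate. The paper's actual layer shapes are not given here, so the formulas and the names `mlp_params`, `moe_params`, and `matched_single_expert_width` are hypothetical.

```python
def mlp_params(dim, hidden):
    """Parameters of a two-layer MLP dim -> hidden -> dim, with biases."""
    return dim * hidden + hidden + hidden * dim + dim


def moe_params(dim, hidden, num_experts):
    """All experts plus a linear gate dim -> num_experts, with bias."""
    gate = dim * num_experts + num_experts
    return num_experts * mlp_params(dim, hidden) + gate


def matched_single_expert_width(dim, hidden, num_experts):
    """Hidden width of one MLP whose parameter count is closest to the MoE's."""
    budget = moe_params(dim, hidden, num_experts)
    # Solve 2*dim*h + h + dim = budget for h, then round to an integer width.
    return round((budget - dim) / (2 * dim + 1))


dim, hidden, num_experts = 64, 128, 4   # arbitrary illustration sizes
budget = moe_params(dim, hidden, num_experts)
width = matched_single_expert_width(dim, hidden, num_experts)
relative_gap = abs(mlp_params(dim, width) - budget) / budget
```

With a matched width, any remaining gap between the two variants is harder to attribute to raw capacity, which is the confound the referee raises.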
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the need for stronger experimental controls. We address each major comment below and will revise the manuscript to incorporate the suggested analyses.
- Referee: [Experimental section (results and ablations)] The central claim that MoEKA and OKCL directly resolve the two stated bottlenecks (incomplete semantic integration and ordinal indistinguishability) is load-bearing, yet the experimental section provides no ablation studies that remove or disable each component individually while holding parameter count and optimization fixed; without such controls it is impossible to rule out that observed gains arise from added routing capacity or curated sample construction rather than improved numeric-symbolic alignment.
  Authors: We agree that isolating the contributions of MoEKA and OKCL via controlled ablations is necessary to substantiate the central claims. The current results show overall gains but lack these specific controls. In the revised manuscript, we will add ablation experiments that disable each component individually (e.g., single-expert variant for MoEKA and standard contrastive objective for OKCL) while matching parameter counts and optimization settings across all variants.
  revision: yes
- Referee: [§4 (or equivalent experimental analysis subsection)] No error analysis, probing tasks, or per-benchmark breakdown is supplied to verify that baseline failures are primarily attributable to the two identified shortcomings rather than other factors (e.g., embedding dimensionality or negative sampling strategy); this leaves the attribution of NumCoKE's superiority unverified.
  Authors: We acknowledge the value of additional diagnostic analysis to link baseline shortcomings directly to the two identified issues. We will expand the experimental section with an error analysis, per-benchmark breakdowns, and targeted probing of numerical attribute handling to better attribute performance differences to semantic integration and ordinal reasoning rather than confounding factors.
  revision: yes
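A probing task of the kind promised in the rebuttal could be as simple as a pairwise ordering test on held-out attribute values. The sketch below is one assumed form of such a probe, not something described in the paper: the function name `pairwise_ordinal_accuracy`, the `max_gap` filter for close-value pairs, and the toy values are all illustrative.

```python
from itertools import combinations


def pairwise_ordinal_accuracy(true_values, predicted_scores, max_gap=None):
    """Fraction of value pairs whose order is preserved by the predicted scores.

    If max_gap is given, only pairs whose true values differ by at most
    max_gap are counted, which isolates the close-value cases.
    Pairs with identical true values are skipped.
    """
    correct = total = 0
    for i, j in combinations(range(len(true_values)), 2):
        diff = true_values[i] - true_values[j]
        if diff == 0 or (max_gap is not None and abs(diff) > max_gap):
            continue
        total += 1
        # Same sign on both differences means the order is preserved.
        if diff * (predicted_scores[i] - predicted_scores[j]) > 0:
            correct += 1
    return correct / total if total else float("nan")


# Toy data: the model confuses the two closest values only.
values = [10.0, 11.0, 50.0, 90.0]
scores = [0.30, 0.25, 0.60, 0.95]
overall = pairwise_ordinal_accuracy(values, scores)
close_only = pairwise_ordinal_accuracy(values, scores, max_gap=5.0)
```

Reporting the close-pair figure alongside the overall one separates ordinal indistinguishability from general ranking quality, which is the attribution the referee asks the authors to verify.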
Circularity Check
No significant circularity; empirical ML validation on benchmarks
The paper proposes NumCoKE with MoEKA encoder and OKCL objective, then reports empirical outperformance on three public KG benchmarks. No derivation chain, theorem, or first-principles result is claimed that reduces by construction to its inputs. The listed patterns (self-definitional, fitted-input-as-prediction, self-citation load-bearing, etc.) do not apply: performance deltas are measured against external baselines on held-out test splits, with no equations or self-citations shown to force the outcome. This is standard supervised evaluation and remains self-contained against the benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- number of experts and routing mechanism in MoEKA
- contrastive margin and sampling strategy in OKCL
axioms (2)
- domain assumption: Numerical attribute values in KGs carry relation-aware semantics that can be aligned with symbolic triples in a shared space.
- ad hoc to paper: Hard negatives and sampling imbalance are the main causes of ordinal indistinguishability.
invented entities (2)
- MoEKA encoder: no independent evidence
- OKCL contrastive objective: no independent evidence