Shuming Ma

Identifiers

  • name variant Shuming Ma 0.60 · backfill

Papers (30)

  1. The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits cs.CL · 2024 · author #1
  2. BitNet: Scaling 1-bit Transformers for Large Language Models cs.CL · 2023 · author #2
  3. Retentive Network: A Successor to Transformer for Large Language Models cs.CL · 2023 · author #4
  4. Kosmos-2: Grounding Multimodal Large Language Models to the World cs.CL · 2023 · author #6
  5. Language Is Not All You Need: Aligning Perception with Language Models cs.CL · 2023 · author #6
  6. Unsupervised Machine Commenting with Neural Variational Topic Model cs.CL · 2018 · author #1
  7. LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts cs.CL · 2018 · author #1
  8. A Deep Reinforced Sequence-to-Set Model for Multi-Label Text Classification cs.CL · 2018 · author #2
  9. Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification cs.CL · 2018 · author #4
  10. Identifying High-Quality Chinese News Comments Based on Multi-Target Text Matching Model cs.CL · 2018 · author #2
  11. SGM: Sequence Generation Model for Multi-label Classification cs.CL · 2018 · author #4
  12. Deconvolution-Based Global Decoding for Neural Machine Translation cs.CL · 2018 · author #4
  13. Bag-of-Words as Target for Neural Machine Translation cs.CL · 2018 · author #1
  14. Autoencoder as Assistant Supervisor: Improving Text Representation for Chinese Social Media Text Summarization cs.CL · 2018 · author #1
  15. Global Encoding for Abstractive Summarization cs.CL · 2018 · author #3
  16. Automatic Academic Paper Rating Based on Modularized Hierarchical Convolutional Neural Network cs.CL · 2018 · author #4
  17. A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification cs.CL · 2018 · author #1
  18. Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation cs.CL · 2018 · author #1
  19. Decoding-History-Based Adaptive Control of Attention for Neural Machine Translation cs.CL · 2018 · author #2
  20. Complex Structure Leads to Overfitting: A Structure Regularization Decoding Method for Natural Language Processing cs.LG · 2017 · author #3
  21. Does Higher Order LSTM Have Better Accuracy for Segmenting and Labeling Sequence Data? cs.CL · 2017 · author #3
  22. Training Simplification and Model Simplification for Deep Learning: A Minimal Effort Back Propagation Method cs.LG · 2017 · author #3
  23. Label Embedding Network: Learning Label Representation for Soft Training of Deep Networks cs.LG · 2017 · author #4
  24. A Semantic Relevance Based Neural Network for Text Summarization and Text Simplification cs.CL · 2017 · author #1
  25. meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting cs.LG · 2017 · author #3
  26. Improving Semantic Relevance for Sequence-to-Sequence Learning of Chinese Social Media Text Summarization cs.CL · 2017 · author #1
  27. A Generic Online Parallel Learning Framework for Large Margin Models cs.CL · 2017 · author #1
  28. Lock-Free Parallel Perceptron for Graph-based Dependency Parsing cs.CL · 2017 · author #2
  29. A New Recurrent Neural CRF for Learning Non-linear Edge Features cs.CL · 2016 · author #1
  30. Towards Easier and Faster Sequence Labeling for Natural Language Processing: A Search-based Probabilistic Online Learning Framework (SAPO) cs.LG · 2015 · author #2

Mentions

  • 2402.17764 #1 · arxiv_oai · confidence 0.70 Shuming Ma
  • 2310.11453 #2 · arxiv_oai · confidence 0.70 Shuming Ma
  • 2302.14045 #6 · arxiv_oai · confidence 0.70 Shuming Ma

Frequent Coauthors