arXiv: 1904.03061 · v1 · submitted 2019-04-05 · cs.LG · cs.PL · cs.SE · stat.ML

A Literature Study of Embeddings on Source Code

Keywords: code, embedding, source, been, embeddings, techniques, word, applied

Natural language processing has improved tremendously since the success of word embedding techniques such as word2vec. Recently, the same idea has been applied to source code with encouraging results. In this survey, we collect and discuss uses of word embedding techniques on programs and source code. The articles were collected by asking authors of related work and through an extensive search on Google Scholar. Each article is assigned to one of five categories: (1) embedding of tokens, (2) embedding of functions or methods, (3) embedding of sequences or sets of method calls, (4) embedding of binary code, and (5) other embeddings. We also provide links to experimental data and show some remarkable visualizations of code embeddings. In summary, word embedding has been successfully applied at different granularities of source code. With access to countless open-source repositories, we see great potential in applying other data-driven natural language processing techniques to source code in the future.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Social Life of Code: Modeling Evolution through Code Embedding and Opinion Dynamics

    cs.SE 2026-02 unverdicted novelty 5.0

    Code embeddings combined with the Expressed-Private Opinion model produce trajectories that quantify developer influence and consensus formation across three open-source repositories.