TL;CE
- Exciting opportunity: Every aspect of practical software development can be improved by ML tools
- Hot topic in software engineering and programming languages: "big code" or "naturalness of software"
- Want to create neural networks that have semantic understanding of code
- Wait, isn't program semantics a solved problem? Two answers:
  - Not if you include the connection to the problem domain
  - Not if you want an approximate method that works for common cases
- One way to show that you understand software is to summarize it: given a Java method body, predict the method's name
- Our ICML 2016 paper does this with a novel three-way convolutional attention mechanism
- In more recent work, we have looked at mapping symbolic expressions to continuous semantics
  - We want the semantics to retain information about equivalence
- Tree-based networks seem natural here, but they tend to key on syntactic similarity rather than semantic equivalence
- Our method is called equivalence networks (EqNets)
- We add a regularization term, subexpression forcing, that requires that child semantics be predictable from parents and siblings. This is inspired by what happens in symbolic logics via, e.g., unification.
- Sequence-to-sequence, tree neural networks, etc., do not work well for this problem, but subexpression forcing gives a big improvement.
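To make the gap between syntactic similarity and semantic equivalence concrete, here is a minimal Python sketch. It is not EqNets and not code from the papers: it groups small boolean expressions into equivalence classes by brute-force truth-table enumeration, which is the kind of ground truth a learned representation would need to respect. The function names (`truth_table`, `equivalence_classes`, `token_overlap`) and the example expressions are assumptions made for illustration.

```python
import re
from itertools import product


def truth_table(expr, variables):
    """Evaluate a boolean expression under every assignment of its variables."""
    rows = []
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        # eval is acceptable here only because the expressions are fixed
        # literals written in this file, never untrusted input.
        rows.append(bool(eval(expr, {"__builtins__": {}}, env)))
    return tuple(rows)


def equivalence_classes(exprs, variables):
    """Group expressions that share a truth table (semantic equivalence)."""
    classes = {}
    for expr in exprs:
        classes.setdefault(truth_table(expr, variables), []).append(expr)
    return list(classes.values())


def token_overlap(left, right):
    """Jaccard overlap of word tokens: a crude stand-in for syntactic similarity."""
    a = set(re.findall(r"\w+", left))
    b = set(re.findall(r"\w+", right))
    return len(a & b) / len(a | b)


# Hypothetical example expressions over two variables.
exprs = ["a and b", "b and a", "not (not a or not b)", "a or b", "a and not b"]
classes = equivalence_classes(exprs, ["a", "b"])

for group in classes:
    print(group)

# Syntactically close but semantically different:
print(token_overlap("a and b", "a and not b"))
# Syntactically distant but semantically equivalent (De Morgan):
print(token_overlap("a and b", "not (not a or not b)"))
```

In this toy setting, "a and b" and "a and not b" share most of their tokens yet fall into different classes, while "a and b" and "not (not a or not b)" share few tokens yet are equivalent. Exhaustive enumeration only scales to a handful of variables, which is one reason to want an approximate, learned representation instead.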
Slides
Slides as PDF
Video
Video of the talk
Papers
The papers that the talk is based on. Please see bibliographies of these papers for works by others:
Naming: A Convolutional Attention Network for Extreme Summarization of Source Code. Miltiadis Allamanis, Hao Peng and Charles Sutton. In International Conference on Machine Learning (ICML). 2016.
Continuous semantics: Learning Continuous Semantic Representations of Symbolic Expressions. Miltiadis Allamanis, Pankajan Chanthirasegaran, Pushmeet Kohli and Charles Sutton. Open Review submission. 2016.
Code
More General Resources
Collaborators
Why Am I Doing This?
I was inspired to create a "companion web site" for this talk by Tufte's quixotic diatribe against PowerPoint. He recommends written handouts instead. This is a first attempt to adapt his idea to an all-open all-Internet all-the-time age. Feedback welcome.
By Charles Sutton, last modified 10 December 2016