Exciting opportunity: Every aspect of practical software development can be improved by ML tools
Hot topic in software engineering and programming languages: "big code" or "naturalness of software"
Want to create neural networks that have semantic understanding of code
Wait, isn't program semantics a solved problem? Two answers:
Not if you include the connection to the problem domain
Not if you want an approximate method that works for common cases
One way to show that you understand software is to summarize it: Given a Java method body, what is the name?
Our ICML 2016 paper does this with a novel three-way convolutional attention mechanism
In more recent work, we have looked at mapping symbolic expressions to continuous semantics
We want the semantics to retain information about equivalence
Tree-based networks seem natural here, but they tend to key on syntactic similarity rather than semantic similarity
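To make the gap between syntactic and semantic similarity concrete, here is a minimal sketch for boolean expressions. It is not part of the talk's method: it decides equivalence by brute-force truth tables, and the helper names `truth_table` and `equivalent` are hypothetical. The point is only that two expressions can look alike yet differ in meaning, or look different yet mean the same thing.

```python
from itertools import product


def truth_table(expr, variables):
    """Evaluate a boolean expression string on every assignment of its variables."""
    rows = []
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        # eval is acceptable here only because the inputs are fixed toy strings.
        rows.append(bool(eval(expr, {"__builtins__": {}}, env)))
    return tuple(rows)


def equivalent(e1, e2, variables):
    """Two expressions are semantically equivalent if their truth tables match."""
    return truth_table(e1, variables) == truth_table(e2, variables)


VARS = ("a", "b")

# Syntactically different, semantically equivalent (De Morgan's law).
print(equivalent("not (a and b)", "(not a) or (not b)", VARS))  # True

# Syntactically almost identical, semantically different.
print(equivalent("a and b", "a or b", VARS))  # False
```

A continuous representation that retains equivalence information would place the first pair close together and the second pair apart, which is the opposite of what a purely syntactic measure suggests.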
Our method is called equivalence networks (EqNets)
We add a regularization term, subexpression forcing, that requires that child semantics be predictable from parents and siblings. This is inspired by what happens in symbolic logics via, e.g., unification.
Sequence-to-sequence, tree neural networks, etc., do not work well for this problem, but subexpression forcing gives a big improvement.
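The idea of a penalty that rewards child semantics for being predictable from the parent and sibling can be illustrated with a toy sketch. This is not the EqNet implementation: the linear predictor, the squared-error penalty, and the names `predict_child` and `subexpression_penalty` are all assumptions made for illustration.

```python
def predict_child(weights, context):
    """Linear prediction of a child vector from a context vector (assumed form)."""
    return [sum(w * x for w, x in zip(row, context)) for row in weights]


def subexpression_penalty(weights, parent, sibling, child):
    """Squared error between the child vector and its prediction from parent and sibling."""
    context = parent + sibling  # concatenate the two vectors (Python lists)
    predicted = predict_child(weights, context)
    return sum((p - c) ** 2 for p, c in zip(predicted, child))


# Toy 2-dimensional semantic vectors (made-up values, for illustration only).
parent = [1.0, 0.0]
sibling = [0.0, 1.0]
child = [1.0, 1.0]

# A predictor that recovers the child exactly incurs no penalty.
good_weights = [[1.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]]
# A predictor that ignores the context is penalized.
zero_weights = [[0.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0, 0.0]]

print(subexpression_penalty(good_weights, parent, sibling, child))  # 0.0
print(subexpression_penalty(zero_weights, parent, sibling, child))  # 2.0
```

In training, a term like this would be added to the main objective with some weighting, so that the learned vectors are pushed toward representations in which subexpressions are recoverable from their surroundings.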
I was inspired to create a "companion web site" for this talk by Tufte's quixotic diatribe against PowerPoint. He recommends written handouts instead. This is a first attempt to adapt his idea to an all-open all-Internet all-the-time age. Feedback welcome.