CRAIC: Learning to identify uninformative comments
We want to help developers write better comments
by steering them away from comments that add no value.
Many software development authorities warn against comments
that repeat the code word for word.
How to detect these? Here is where deep learning comes in.
A sequence-to-sequence model predicts each comment sentence from the code.
Basically, if my deep learning method can predict your comment,
then your comment probably wasn't very good.
We find good agreement with human judgments. We also find that Javadoc comments
are much easier to predict!
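The scoring idea can be sketched with a toy stand-in for the trained model. The real system uses a sequence-to-sequence model; here a smoothed unigram model over the code's tokens plays its role, and the token lists and the `redundancy_score` name are purely illustrative:

```python
import math
from collections import Counter

def redundancy_score(code_tokens, comment_tokens, alpha=0.1):
    """Average log-probability of the comment tokens under a smoothed
    unigram model built from the code tokens. Higher (less negative)
    means more predictable from the code, i.e. less informative."""
    counts = Counter(code_tokens)
    vocab = set(code_tokens) | set(comment_tokens)
    total = sum(counts.values()) + alpha * len(vocab)
    log_probs = [math.log((counts[t] + alpha) / total) for t in comment_tokens]
    return sum(log_probs) / len(log_probs)

# Tokenized method body (illustrative).
code = ["def", "count_items", "(", "items", ")", ":",
        "return", "len", "(", "items", ")"]
redundant = ["return", "len", "of", "items"]          # echoes the code
informative = ["caller", "must", "hold", "the", "lock"]  # adds new facts

# The comment that merely restates the code scores as more predictable.
assert redundancy_score(code, redundant) > redundancy_score(code, informative)
```

In the paper the predictability signal comes from a trained deep model rather than token counts, but the decision rule is the same: the more easily the comment is predicted from the code, the less information it carries.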
VEEGAN: Reducing mode collapse in GANs
Deep generative models are great new tools for modeling distributions over complicated manifolds.
We want to fix the problem of mode collapse, which happens when a model,
like a generative adversarial network (GAN), learns to produce only a few modes of the true distribution.
We address this by adding cyclic consistency. GANs map random noise
to data. We add a reconstructor, as in an autoencoder, which maps data back to noise.
Cyclic consistency means that if you map noise → data → noise, you should
get back the same noise distribution you started with.
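A minimal numpy sketch of the cycle penalty, with linear maps standing in for the deep generator and reconstructor (all names are illustrative, and the actual VEEGAN objective also includes the usual adversarial terms):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "generator" G: noise -> data, and "reconstructor" F: data -> noise.
# In VEEGAN both are deep networks; linear maps keep the cycle explicit.
G = rng.normal(size=(2, 2))
F_good = np.linalg.inv(G)        # a perfect reconstructor inverts G
F_bad = rng.normal(size=(2, 2))  # an untrained reconstructor

def cycle_loss(F, G, z):
    """Mean squared error between z and F(G(z)): the noise -> data -> noise
    reconstruction penalty that discourages mode collapse."""
    z_rec = (F @ (G @ z.T)).T
    return float(np.mean((z_rec - z) ** 2))

z = rng.normal(size=(1000, 2))   # samples from the noise distribution

assert cycle_loss(F_good, G, z) < 1e-12                  # perfect cycle
assert cycle_loss(F_bad, G, z) > cycle_loss(F_good, G, z)
```

If the generator collapses many noise vectors onto the same data point, no reconstructor can map them back apart, so the cycle penalty stays high; minimizing it pushes the generator to keep distinct modes distinct.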
The training objective has an interesting variational interpretation.
The generator of the GAN becomes the approximating distribution
in the variational argument.
There were lots of autoencoder + GAN papers appearing at around the same
time as ours. A nice feature of ours is that, unlike a standard autoencoder,
VEEGAN does not require specifying a loss function over the data, only over the representations, which are standard normal by assumption.
Continuous Representations of Symbolic Expressions
Speculative motivation: How much of symbolic logical reasoning
can we reproduce using neural perceptual learning?
Our first step: let's map symbolic expressions to continuous semantics.
We want the semantics to retain information about equivalence
Tree-based networks seem natural here, but they tend to key on syntactic similarity rather than semantic similarity.
Our method is called equivalence networks (EqNets)
We add a regularization term, subexpression forcing, which requires that child semantics be predictable from parents and siblings. This is inspired by what happens in symbolic logics via, e.g., unification.
Sequence-to-sequence, tree neural networks, etc., do not work well for this problem, but subexpression forcing gives a big improvement.
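The shape of the regularizer can be sketched in numpy, assuming a linear predictor from parent and sibling embeddings to the child embedding (the dimensions, learning rate, and all names here are illustrative; EqNets use learned network components):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8  # toy embedding dimension

# Toy semantic embeddings for one parent, sibling, and child node.
parent = rng.normal(size=D)
sibling = rng.normal(size=D)
child = rng.normal(size=D)

# A linear "predictor" maps (parent, sibling) -> predicted child semantics.
W = rng.normal(size=(D, 2 * D)) * 0.1

def subexpression_forcing_loss(W, parent, sibling, child):
    """Penalty forcing the child's semantics to be predictable from its
    parent and sibling; added to the main training objective."""
    predicted = W @ np.concatenate([parent, sibling])
    return float(np.sum((predicted - child) ** 2))

# A few gradient steps on the linear predictor reduce the penalty.
lr = 0.005
x = np.concatenate([parent, sibling])
before = subexpression_forcing_loss(W, parent, sibling, child)
for _ in range(200):
    grad = 2 * np.outer(W @ x - child, x)
    W -= lr * grad
after = subexpression_forcing_loss(W, parent, sibling, child)

assert after < before
```

In training, this penalty is applied across all subexpressions so that the learned semantics compose predictably, rather than merely memorizing surface syntax.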
Houdini (brief mention)
Deep Learning to Detect Redundant Method Comments.
Annie Louis, Santanu Dash, Earl T. Barr, and Charles Sutton.