On the efficiency of recurrent neural network optimization algorithms
Ben Krause,
Liang Lu,
Iain Murray and
Steve Renals.
This study compares the sequential and parallel efficiency of training Recurrent Neural Networks (RNNs) with Hessian-free optimization versus a gradient descent variant. Experiments are performed using the long short-term memory (LSTM) architecture and the newly proposed multiplicative LSTM (mLSTM) architecture. The results provide several insights into these architectures and optimization algorithms, including that Hessian-free optimization has the potential for large efficiency gains in a highly parallel setup.
Appeared in OPT2015: Optimization for Machine Learning, at the Neural Information Processing Systems Conference, 2015.