Sam Roweis described the "bold driver" algorithm to me at ICML 2004. Looking at neural network history (e.g. in Chris Bishop's neural networks book) it seems huge amounts of time were spent tweaking schemes like this:

1/ Take an epsilon step. If the cost function decreases, then multiply epsilon by 1.1.
2/ If *woops* (the cost goes up), then multiply epsilon by 0.5.
3/ Terminate when epsilon is 0 (i.e. when it underflows).

"works surprisingly well, perhaps not as good as minimize, but pretty good compared to coding time"

Let's see... On a Gaussian process learning problem I found that for a reasonable answer (stopping early) it was ~10 times slower than minimize. To actually converge takes a *long* time. Randomizing the increment doesn't seem to make much of a difference. A rough sketch of the heuristic is given at the end of this file.

----

gradopt is my preferred _hack_ in that it sort of learns a (diagonal) metric and has performed /ok/ for me. Very occasionally it's found good solutions fancier methods haven't. I'm not sure whether it's exactly the same as one of the many other gradient heuristics like quick-prop, delta-bar-delta, ...

*** I'm not recommending anything in this directory ***

I'd try L-BFGS from Ed Snelson's webpage instead.
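
----

For reference, a minimal Python sketch of the bold driver heuristic described above. This is not minimize or gradopt; the function names (f, grad), the starting step size, and the quadratic test problem are made up for illustration, and only the 1.1 / 0.5 factors and the underflow stopping rule come from the description above.

    import numpy as np

    def bold_driver(f, grad, x, eps=1e-3, up=1.1, down=0.5):
        """Bold driver heuristic: f is the cost, grad its gradient, x the start point."""
        fx = f(x)
        while eps > 0.0:                  # 3/ stop when the step size underflows to 0
            x_new = x - eps * grad(x)     # take an epsilon step downhill
            f_new = f(x_new)
            if f_new < fx:                # 1/ cost decreased: accept and grow the step
                x, fx = x_new, f_new
                eps *= up
            else:                         # 2/ *woops*: reject the step and shrink it
                eps *= down
        return x, fx

    # e.g. a small quadratic, f(x) = x' A x (hypothetical test problem)
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    x_opt, f_opt = bold_driver(lambda x: x @ A @ x,
                               lambda x: 2.0 * A @ x,
                               np.array([1.0, -1.0]))

Running it all the way to underflow is what makes full convergence so slow; stopping early (a larger epsilon threshold, or a cap on iterations) gives the "reasonable answer" behaviour mentioned above.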