Sam Roweis described the "bold driver" algorithm to me at ICML 2004. Looking at neural network history (e.g. in Chris Bishop's neural networks book) it seems huge amounts of time were spent tweaking schemes like this:

1/ Take an epsilon step. If the cost function decreases, then multiply epsilon by 1.1.
2/ If *woops* (the cost goes up), then multiply epsilon by 0.5.
3/ Terminate when epsilon is 0 (i.e. when it underflows).

"works surprisingly well, perhaps not as good as minimize, but pretty good compared to coding time"

Let's see... On a Gaussian process learning problem I found that for a reasonable answer (stopping early) it was ~10 times slower than minimize. To actually converge takes a *long* time. Randomizing the increment doesn't seem to make much of a difference. A rough sketch of the heuristic is given at the end of this file.

----

gradopt is my preferred _hack_ in that it sort of learns a (diagonal) metric and has performed /ok/ for me. Very occasionally it's found good solutions fancier methods haven't. I'm not sure whether it's exactly the same as one of the many other gradient heuristics like quick-prop, delta-bar-delta, ...

*** I'm not recommending anything in this directory ***

I'd try L-BFGS from Ed Snelson's webpage instead.
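
----

For reference, a minimal Python sketch of the bold driver heuristic described above. This is not minimize or gradopt; the function names (f, grad), the starting step size, and the quadratic test problem are made up for illustration, and only the 1.1 / 0.5 factors and the underflow stopping rule come from the description above.

    import numpy as np

    def bold_driver(f, grad, x, eps=1e-3, up=1.1, down=0.5):
        """Bold driver heuristic: f is the cost, grad its gradient, x the start point."""
        fx = f(x)
        while eps > 0.0:                  # 3/ stop when the step size underflows to 0
            x_new = x - eps * grad(x)     # take an epsilon step downhill
            f_new = f(x_new)
            if f_new < fx:                # 1/ cost decreased: accept and grow the step
                x, fx = x_new, f_new
                eps *= up
            else:                         # 2/ *woops*: reject the step and shrink it
                eps *= down
        return x, fx

    # e.g. a small quadratic, f(x) = x' A x (hypothetical test problem)
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    x_opt, f_opt = bold_driver(lambda x: x @ A @ x,
                               lambda x: 2.0 * A @ x,
                               np.array([1.0, -1.0]))

Running it all the way to underflow is what makes full convergence so slow; stopping early (a larger epsilon threshold, or a cap on iterations) gives the "reasonable answer" behaviour mentioned above.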