MLPs with a lot of layers
   - One of the (maybe the) big successes of AI in the 80s was
       the multilayer perceptron (MLP).
 
   - It's still widely used today in analytics.
 
   - In these nets, people typically use only one hidden layer,
       and at most two.
 
   - Why? It's the vanishing gradient problem.
 
   - It's trained by the backprop rule (or some variant).
 
   - Backprop can only adjust a weight using the error signal
       that is propagated backward to it from the output.
 
   - It doesn't work as well for the connections further from
       the output (and thus from the error): the error signal
       shrinks each time it is propagated back through a layer,
       so weights near the input barely get updated (the sketch
       below illustrates this).
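
   A minimal sketch of the effect, assuming nothing beyond NumPy (the
   layer sizes, weight scale, and toy data are illustrative assumptions,
   not anything from these notes): a deep sigmoid MLP, one backprop
   pass, and a printout of the gradient norm at each layer.

        import numpy as np

        rng = np.random.default_rng(0)

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        # Toy deep MLP: six weight matrices with sigmoid units throughout.
        layer_sizes = [8, 8, 8, 8, 8, 8, 1]
        weights = [rng.normal(0.0, 0.5, (m, n))
                   for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

        x = rng.normal(size=(1, 8))   # one toy input
        y = np.array([[1.0]])         # one toy target

        # Forward pass, keeping every activation for backprop.
        activations = [x]
        for W in weights:
            activations.append(sigmoid(activations[-1] @ W))

        # Backward pass: start with the output error, push it toward the input.
        delta = (activations[-1] - y) * activations[-1] * (1.0 - activations[-1])
        for i in reversed(range(len(weights))):
            grad = activations[i].T @ delta   # gradient for layer i's weights
            print(f"layer {i}: gradient norm = {np.linalg.norm(grad):.2e}")
            if i > 0:
                # Each step multiplies by W^T and by a sigmoid derivative
                # (at most 0.25), so the error signal keeps shrinking.
                delta = (delta @ weights[i].T) * activations[i] * (1.0 - activations[i])

   With small weights like these, the printed norms typically fall off
   layer by layer: the connections nearest the input see almost none of
   the error that backprop is trying to correct from.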