MLPs with a lot of layers
   - One of the (maybe the) big successes of AI in the 80s was
       the multilayer perceptron (MLP).
 
   - It's still widely used today in analytics.
 
   - In these nets, people typically use only one hidden layer,
       and at most two.
 
   - Why? It's the vanishing gradient problem.
 
   - It's trained by the backprop rule (or some variant).
 
   - Backprop can only adjust a weight using the error signal
       that is propagated backward to it from the output.
 
   - It doesn't work as well for the connections further from
       the output (and thus from the error): the error signal
       shrinks each time it is propagated back through a layer,
       so weights near the input barely get updated (the sketch
       below illustrates this).
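
   A minimal sketch of the effect, assuming nothing beyond NumPy (the
   layer sizes, weight scale, and toy data are illustrative assumptions,
   not anything from these notes): a deep sigmoid MLP, one backprop
   pass, and a printout of the gradient norm at each layer.

        import numpy as np

        rng = np.random.default_rng(0)

        def sigmoid(x):
            return 1.0 / (1.0 + np.exp(-x))

        # Toy deep MLP: six weight matrices with sigmoid units throughout.
        layer_sizes = [8, 8, 8, 8, 8, 8, 1]
        weights = [rng.normal(0.0, 0.5, (m, n))
                   for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

        x = rng.normal(size=(1, 8))   # one toy input
        y = np.array([[1.0]])         # one toy target

        # Forward pass, keeping every activation for backprop.
        activations = [x]
        for W in weights:
            activations.append(sigmoid(activations[-1] @ W))

        # Backward pass: start with the output error, push it toward the input.
        delta = (activations[-1] - y) * activations[-1] * (1.0 - activations[-1])
        for i in reversed(range(len(weights))):
            grad = activations[i].T @ delta   # gradient for layer i's weights
            print(f"layer {i}: gradient norm = {np.linalg.norm(grad):.2e}")
            if i > 0:
                # Each step multiplies by W^T and by a sigmoid derivative
                # (at most 0.25), so the error signal keeps shrinking.
                delta = (delta @ weights[i].T) * activations[i] * (1.0 - activations[i])

   With small weights like these, the printed norms typically fall off
   layer by layer: the connections nearest the input see almost none of
   the error that backprop is trying to correct from.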