Nikiya Anton Bettey 2021: 07-28-2023-1210 - FASTER HARDWARE, LONG SHORT-TERM MEMORY, MULTI-LEVEL HIERARCHY, BACKPROPAGATION, FEATURE DETECTORS, LATENT VARIABLES, GENERATIVE MODEL, LOWER BOUND, DEEP BELIEF NETWORK, UNSUPERVISED LEARNING, SUPERVISED LEARNING VARIABLE TREND CURRICULUM INDOCTRINATION SCHEDULE, 1980, 1900, 1800 DUSTY BLUE, 1700 XLIF OR A BOMBER, 1600 WIFE OR N/A, -10000 HUMANS (HUMAN RECORD ONLY)(DRAFT)(VAR ENV)(DRAFT), <-10000 N/A, ETC.. DRAFT

Friday, July 28, 2023

Multi-level hierarchy

One approach to the vanishing gradient problem is Jürgen Schmidhuber's multi-level hierarchy of networks (1992), pre-trained one level at a time through unsupervised learning and then fine-tuned through backpropagation.^[10] Each level learns a compressed representation of the observations, which is fed to the next level.

Related approach

Similar ideas have been used in feed-forward neural networks, where unsupervised pre-training structures the network by making it first learn generally useful feature detectors. The network is then trained further by supervised backpropagation to classify labeled data. The deep belief network model by Hinton et al. (2006) learns the distribution of a high-level representation using successive layers of binary or real-valued latent variables. It uses a restricted Boltzmann machine to model each new layer of higher-level features. Each new layer guarantees an increase in the lower bound on the log-likelihood of the data, thus improving the model, if trained properly. Once sufficiently many layers have been learned, the deep architecture may be used as a generative model by reproducing the data when sampling down the model (an "ancestral pass") from the top-level feature activations.^[11] Hinton reports that his models are effective feature extractors over high-dimensional, structured data.^[12]
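The layer-by-layer procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the training code of Hinton et al.: the names TinyRBM, pretrain_stack and ancestral_pass are hypothetical, the update rule is one-step contrastive divergence (CD-1), hidden probabilities are passed upward as real-valued inputs, and the downward pass propagates mean activations instead of drawing samples.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyRBM:
    """Hypothetical binary restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        # w[i][j] connects visible unit i to hidden unit j.
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j]
                        + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i]
                        + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data.
        h0_p = self.hidden_probs(v0)
        h0 = self.sample(h0_p)
        # Negative phase: one reconstruction step.
        v1_p = self.visible_probs(h0)
        h1_p = self.hidden_probs(v1_p)
        for i in range(len(v0)):
            for j in range(len(h0_p)):
                self.w[i][j] += lr * (v0[i] * h0_p[j] - v1_p[i] * h1_p[j])
        for i in range(len(v0)):
            self.b_vis[i] += lr * (v0[i] - v1_p[i])
        for j in range(len(h0_p)):
            self.b_hid[j] += lr * (h0_p[j] - h1_p[j])


def pretrain_stack(data, layer_sizes, epochs=50, seed=0):
    """Greedy pre-training: each new RBM models the layer below it."""
    rng = random.Random(seed)
    stack = []
    layer_input = data
    for n_hidden in layer_sizes:
        rbm = TinyRBM(len(layer_input[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v)
        stack.append(rbm)
        # The hidden activations become the "data" for the next level.
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return stack


def ancestral_pass(stack, top_activations):
    """Simplified top-down pass using mean activations (an assumption)."""
    h = top_activations
    for rbm in reversed(stack):
        h = rbm.visible_probs(h)
    return h
```

Calling pretrain_stack(data, [3, 2]) on a list of 4-element binary vectors returns two stacked machines, and ancestral_pass maps a 2-element top-level activation back to 4 visible probabilities. The sketch stops before the supervised fine-tuning stage, and it does not compute the log-likelihood bound mentioned in the text.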

Long short-term memory

Another technique, used particularly for recurrent neural networks, is the long short-term memory (LSTM) network of 1997 by Hochreiter & Schmidhuber.^[13] In 2009, deep multidimensional LSTM networks demonstrated the power of deep learning with many nonlinear layers by winning three ICDAR 2009 competitions in connected handwriting recognition, without any prior knowledge about the three different languages to be learned.^[14]^[15]
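As a rough illustration of how an LSTM cell carries information across time steps, the sketch below implements one step of a scalar cell in plain Python. It makes several assumptions: the function name lstm_step and the params layout are hypothetical, the cell includes a forget gate (a later addition to the 1997 design), and it is a one-dimensional toy, not the multidimensional networks used in the ICDAR systems.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    params maps each of "input", "forget", "output" and "candidate"
    to a tuple (w_x, w_h, b); this layout is an illustrative assumption.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # how much new content to write
    f = sigmoid(pre("forget"))       # how much old cell state to keep
    o = sigmoid(pre("output"))       # how much of the cell to expose
    g = math.tanh(pre("candidate"))  # proposed new content

    # Additive update: the old state is scaled by f, not squashed
    # through a nonlinearity at every step.
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c
```

With the forget gate held close to 1 and the input gate close to 0, the cell state passes through many steps almost unchanged, which is the property that lets error signals survive over long time lags:

```python
params = {
    "input": (0.0, 0.0, -20.0),
    "forget": (0.0, 0.0, 20.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (0.0, 0.0, 0.0),
}
h, c = 0.0, 1.0
for _ in range(100):
    h, c = lstm_step(0.0, h, c, params)
# c is still very close to its starting value of 1.0
```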

Faster hardware

Hardware advances mean that from 1991 to 2015, computing power (especially as delivered by GPUs) increased around a million-fold, making standard backpropagation feasible for networks several layers deeper than when the vanishing gradient problem was recognized. Schmidhuber notes that this "is basically what is winning many of the image recognition competitions now", but that it "does not really overcome the problem in a fundamental way",^[16] since the original models by Hinton and others that tackled the vanishing gradient problem were trained on a Xeon processor, not on GPUs.^[11]
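A small numerical sketch may help show why extra compute buys only a few more layers. It assumes a chain of scalar sigmoid units with a shared weight, which is a toy setting and not any model from the sources cited above; the function name gradient_through_chain is hypothetical. Since the derivative of the sigmoid is at most 0.25, the gradient through such a chain with weight 1 shrinks at least geometrically with depth.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gradient_through_chain(depth, weight=1.0, x=0.0):
    """Return d(output)/d(input) for `depth` stacked scalar sigmoid units.

    Each unit computes a_k = sigmoid(weight * a_{k-1}), so the chain rule
    multiplies in one factor weight * a_k * (1 - a_k) per layer.
    """
    a = x
    grad = 1.0
    for _ in range(depth):
        a = sigmoid(weight * a)
        grad *= weight * a * (1.0 - a)
    return grad


if __name__ == "__main__":
    for depth in (1, 5, 10, 20):
        print(depth, gradient_through_chain(depth))
```

Each added layer multiplies the gradient by at most 0.25 in this setting, so the printed values fall off quickly as depth grows. Faster hardware allows more training steps on a small signal but leaves that per-layer factor unchanged.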

https://en.wikipedia.org/wiki/Vanishing_gradient_problem