Deep Neural Networks and Dropout
WHAT IS IT?
This is a model of arbitrarily large neural networks. It is based on the Multilayer Perceptron model, but the network architecture is user-determined.
This model is intended to provide a visualization of the process of neural network training, and to serve as a platform for experimentation with an eye toward qualitative intuitions.
HOW IT WORKS
Initially, the weights on the links of the network are random, scaled according to the number of nodes in the layer feeding into each link.
The nodes in the leftmost layer are called the input nodes, the nodes in the middle layers are called the hidden nodes, and the nodes in the rightmost layer are called the output nodes.
The activation values of the input nodes are the inputs to the network. The activation value of each hidden or output node is computed by multiplying the activation values of the previous layer by the corresponding link weights, summing the results, and passing that sum through the tanh function. Each output node is read as 1 if its activation is greater than 0 and -1 if it is less than 0.
The tanh function maps negative inputs to values between -1 and 0, and positive inputs to values between 0 and 1. It is a smooth, S-shaped curve: steepest around 0 and flattening out as it approaches -1 and 1.
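In the code below, this rule is the NEW-ACTIVATION reporter; with the dropout check omitted, it reduces to:

    ;; a node's activation: tanh of the weighted sum of its inputs
    to-report new-activation  ;; node procedure
      report tanh sum [[activation] of end1 * weight] of my-in-links
    end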
To train the network, many examples are presented to it along with their correct classifications. The network uses the back-propagation algorithm to pass error backwards from the output nodes, and uses this error to update the weight on each link.
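Concretely, the update applied in the BACK-PROPAGATE procedure below has this shape (dropout handling omitted):

    ;; error at an output node:  err = target - activation
    ;; error at a hidden node:   err = (1 - activation ^ 2) * sum of downstream (weight * err)
    ;; each link is then nudged in proportion to the activation feeding into it:
    ask links [
      set weight weight + ([err * learncoeff * learning-rate] of end2) * ([activation] of end1)
    ]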
If dropout learning is enabled, hidden nodes are randomly dropped from training at each training step. This helps prevent the model from "overfitting", that is, from drawing overly strong conclusions about the entire data set based only on the examples it has seen so far.
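In the TRAIN procedure below, the dropout mask is drawn independently for each hidden node at every step, and cleared again once the example has been processed:

    if dropout? [
      ask hidden-nodes [ if (random 1000) < dropout-rate [ set dropped? true ] ]
    ]
    ;; ...the example is propagated and back-propagated, then all nodes are restored:
    ask hidden-nodes [ set dropped? false ]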
HOW TO USE IT
Press SETUP to load the training data and initialize the patches.
Enter a bracketed list of space-separated positive integers into HIDDEN-SIZES-STRING to set the number of hidden nodes in each hidden layer (and how many such layers there are).
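For example, entering "[16 16]" creates two hidden layers of 16 nodes each, while "[30]" creates a single hidden layer of 30 nodes.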
Press INIT-NET to initialize the network.
Press TRAIN ONCE to run one epoch of training. The number of examples presented to the network during this epoch is controlled by the EXAMPLES-PER-EPOCH slider.
Press TRAIN to continually train the network.
In the view, the more intense a link's color, the greater the absolute value of its weight. Red links have positive weights; blue links have negative weights.
LEARNING-RATE controls how much the neural network will learn from any one example.
DROPOUT? activates dropout learning.
DROPOUT-RATE controls the probability, out of 1000, that each hidden node drops out at a given training step.
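For example, a DROPOUT-RATE of 500 gives each hidden node a 50% chance of dropping out on any given step.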
THINGS TO NOTICE
As the network trains, high-weight links (intuitively, connections the model is placing importance on) become brighter. This exposes the process of learning in an intuitive visual medium, which can sometimes be quite informative.
THINGS TO TRY
Manipulate HIDDEN-SIZES-STRING. What happens to the speed of learning as the number of nodes increases? What happens to the accuracy? How does this affect the visualization of the model?
EXTENDING THE MODEL
The ability to set a threshold weight, below which links become transparent, could make the visualization less cluttered and easier to interpret.
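A minimal sketch of that extension, assuming a hypothetical EDGE-THRESHOLD slider, could use the built-in hidden? link variable:

    ;; hypothetical EDGE-THRESHOLD slider: hide links with small weights
    ask links [ set hidden? (abs weight) < edge-threshold ]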
The model is also a good candidate for extension with any standard neural-network optimization technique, which could then be compared against plain training in much the same way dropout is compared here.
NETLOGO FEATURES
This model uses the link primitives. It also makes heavy use of lists. Additionally, the csv extension is used to load the MNIST data set.
COPYRIGHT AND LICENSE
Copyright 2006 Uri Wilensky.
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/3.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
Commercial licenses are also available. To inquire about commercial licenses, please contact Uri Wilensky at uri@northwestern.edu.
extensions [csv]

links-own [
  weight   ;; Weight given to end1 activation by end2
  inlayer  ;; Layer index of end1
]

breed [bias-nodes bias-node]
breed [input-nodes input-node]
breed [output-nodes output-node]
breed [hidden-nodes hidden-node]

turtles-own [
  activation  ;; Determines the node's output
  err         ;; Used by backpropagation to feed error backwards
  layer       ;; Layer of network the node is contained in. Used for agentset manipulation.
  estvar      ;; Estimated variance of error signal
  learncoeff  ;; Individual learning coefficient
  dropped?    ;; Boolean, true if node currently dropped
]

globals [
  epoch-error  ;; Measurement of how many training examples the network got wrong in the epoch
  input-size   ;; Size of inputs
  hiddensizes  ;; Vector of layer sizes, determines net topology
  output-size  ;; Size of outputs
  traindata    ;; Training data
  testdata     ;; Testing data
]

;;;
;;; LOAD FILES
;;;

to load-files
  file-close-all
  file-open "mnist_train.csv"
  set traindata (list)
  repeat 20000 [
    set traindata lput (csv:from-row file-read-line) traindata
  ]
  file-close
  set testdata csv:from-file "mnist_test.csv"
end

;;;
;;; SETUP PROCEDURES
;;;

;; Set patches, shapes, and files (invariant under node change)
to setup
  clear-all
  ask patches [ set pcolor gray ]
  set-default-shape bias-nodes "bias-node"
  set-default-shape input-nodes "circle"
  set-default-shape output-nodes "circle"
  set-default-shape hidden-nodes "output-node"
  set-default-shape links "small-arrow-shape"
  load-files
end

;; Set up nodes and links, initialize values, and propagate so
;; that activations make sense
to init-net
  clear-links
  clear-turtles
  clear-plot
  setup-nodes
  setup-links
  recolor
  set-learncoeffs
  propagate
  reset-ticks
end

;; Create, initialize, and position the nodes in the network
to setup-nodes
  set input-size 400
  set hiddensizes read-from-string hidden-sizes-string
  set output-size 10
  let l-index 0
  let index 0
  create-bias-nodes 1 [
    setxy nodex l-index nodey l-index index (input-size + 1)
    set activation 1
    set layer l-index
    set dropped? false
  ]
  set index 1
  repeat input-size [
    create-input-nodes 1 [
      setxy nodex l-index nodey l-index index (input-size + 1)
      set activation ((random 2) * 2) - 1
      set layer l-index
      set dropped? false
    ]
    set index index + 1
  ]
  set l-index 1
  set index 0
  foreach hiddensizes [
    create-bias-nodes 1 [
      setxy nodex l-index nodey l-index index (? + 1)
      set activation 1
      set layer l-index
      set dropped? false
    ]
    set index 1
    repeat ? [
      create-hidden-nodes 1 [
        setxy nodex l-index nodey l-index index (? + 1)
        set activation ((random 2) * 2) - 1
        set layer l-index
        set dropped? false
      ]
      set index index + 1
    ]
    set l-index l-index + 1
    set index 0
  ]
  repeat output-size [
    create-output-nodes 1 [
      setxy nodex l-index nodey l-index index output-size
      set activation ((random 2) * 2) - 1
      set layer l-index
      set dropped? false
    ]
    set index index + 1
  ]
  ask turtles [ set size 0.5 ]
end

;; Create and initialize links between nodes in the network
to setup-links
  let l-index 0
  repeat (length hiddensizes) [
    connect-all (turtles with [layer = l-index]) (hidden-nodes with [layer = (l-index + 1)])
    set l-index l-index + 1
  ]
  connect-all (turtles with [layer = l-index]) (output-nodes with [layer = (l-index + 1)])
end

;; Completely connect nodes1 to nodes2 with links
to connect-all [nodes1 nodes2]
  let r 1 / (sqrt (count nodes1))
  ask nodes1 [
    create-links-to nodes2 [
      set weight random-float (2 * r) - r
      set inlayer [layer] of one-of nodes1
    ]
  ]
end

;; Adjust color of nodes and edges according to values
to recolor
  ask turtles [
    set color item (step activation) [black white]
  ]
  let l-index 0
  let maxw 0
  repeat (length hiddensizes) + 1 [
    set maxw max [abs weight] of links with [inlayer = l-index]
    ask links with [inlayer = l-index] [
      let wquotient (weight / maxw)
      let colorstr (wquotient * 127)
      let colorvec (list (colorstr + 127) (127 - (abs colorstr)) (127 - colorstr) 196)
      set color colorvec
    ]
    set l-index l-index + 1
  ]
  ask turtles [
    if dropped? [ set color [127 127 127] ]
  ]
  ask links [
    if ([dropped?] of end1) or ([dropped?] of end2) [ set color [127 127 127 196] ]
  ]
end

;; Set the local learning rate coefficients for the nodes
to set-learncoeffs
  let l-index ((length hiddensizes) + 1)
  let v (1 / (item (l-index - 2) hiddensizes))
  let lc (1 / ((item (l-index - 2) hiddensizes) * sqrt v))
  ask output-nodes [
    set estvar v
    set learncoeff lc
  ]
  set l-index (l-index - 1)
  repeat (length hiddensizes) - 1 [
    set v (((count hidden-nodes with [layer = (l-index - 1)]) * v) / (item (l-index - 2) hiddensizes))
    set lc (1 / ((item (l-index - 2) hiddensizes) * sqrt v))
    ask hidden-nodes with [layer = l-index] [
      set estvar v
      set learncoeff lc
    ]
    set l-index (l-index - 1)
  ]
  set v (((count input-nodes) * v) / (count input-nodes))
  set lc (1 / ((count input-nodes) * (sqrt v)))
  ask hidden-nodes with [layer = l-index] [
    set estvar v
    set learncoeff lc
  ]
end

;;;
;;; VISUAL LAYOUT FUNCTIONS
;;;

;; Find the appropriate x coordinate for this layer
to-report nodex [l-index]
  report min-pxcor + (((l-index + 1) * (world-width - 1)) / (length hiddensizes + 3))
end

;; Find the appropriate y coordinate for this node
to-report nodey [l-index index in-layer]
  report max-pycor - (((index + 1) * (world-height - 1)) / (in-layer + 1))
end

;;;
;;; TRAINING PROCEDURES
;;;

to train
  set epoch-error 0
  let currentdatum (one-of traindata)
  let sortin sort input-nodes
  let index 0
  let target n-values 10 [ifelse-value (? = (item 0 currentdatum)) [1] [-1]]
  repeat examples-per-epoch [
    if dropout? [
      ask hidden-nodes [ if (random 1000) < dropout-rate [ set dropped? true ] ]
    ]
    repeat (length sortin) [
      ask (item index sortin) [ set activation ((item (index + 1) currentdatum) / 127.5) - 1 ]
      set index index + 1
    ]
    propagate
    back-propagate target
    set index 0
    set currentdatum (one-of traindata)
    set target n-values 10 [ifelse-value (? = (item 0 currentdatum)) [1] [-1]]
    ask hidden-nodes [ set dropped? false ]
  ]
  set epoch-error epoch-error / examples-per-epoch
  tick
end

;;;
;;; PROPAGATION PROCEDURES
;;;

;; carry out one calculation from beginning to end
to propagate
  let l-index 1
  repeat length hiddensizes [
    ask hidden-nodes with [layer = l-index and not dropped?] [
      set activation new-activation
    ]
    set l-index l-index + 1
  ]
  ask output-nodes [ set activation new-activation ]
  recolor
end

;; Determine the activation of a node based on the activation of its input nodes
to-report new-activation  ;; node procedure
  report tanh sum [[activation] of end1 * weight] of my-in-links with [not [dropped?] of end1]
end

;; changes weights to correct for errors
to back-propagate [target]
  let example-error 0
  let sortout sort output-nodes
  let l-index (length hiddensizes) + 1
  let index 0
  repeat (count output-nodes) [
    ask (item index sortout) [
      set err (item index target) - activation
      set example-error example-error + (err ^ 2)
    ]
    set index index + 1
  ]
  set example-error .5 * example-error
  set l-index l-index - 1
  repeat length hiddensizes [
    ask hidden-nodes with [layer = l-index and not dropped?] [
      let sumerror sum [weight * ([err] of end2)] of my-out-links
      set err (1 - (activation ^ 2)) * sumerror
    ]
    set l-index l-index - 1  ;; step back toward the input layer
  ]
  ask links with [not [dropped?] of end1 and not [dropped?] of end2] [
    let change ([err * learncoeff * learning-rate] of end2) * ([activation] of end1)
    set weight weight + change
  ]
  set epoch-error epoch-error + example-error
end

;;;
;;; MISC PROCEDURES
;;;

;; Calculates the tanh function
to-report tanh [input]
  let exp2x e ^ (2 * input)
  report (exp2x - 1) / (exp2x + 1)
end

;; computes the step function given an input value
to-report step [input]
  report ifelse-value (input > 0) [1] [0]
end

; Copyright 2006 Uri Wilensky.
; See Info tab for full copyright and license.
There is only one version of this model, created over 9 years ago by Jacob Samson.
Attached files
File | Type | Description | Last updated
---|---|---|---
Deep Architectures and Dropout Learning Poster Slam.pptx | powerpoint | Poster Slam Slides | over 9 years ago, by Jacob Samson
Deep Neural Networks and Dropout.png | preview | Preview for 'Deep Neural Networks and Dropout' | over 9 years ago, by Jacob Samson
DeepNetPoster.pptx | powerpoint | Poster Slam Poster | over 9 years ago, by Jacob Samson
Final Report.docx | word | Final Report | over 9 years ago, by Jacob Samson
JacobSamson_June1.docx | word | Project Update - June 1st | over 9 years ago, by Jacob Samson
JacobSamson_May18.docx | word | Project Update - May 18th | over 9 years ago, by Jacob Samson
JacobSamson_May25.docx | word | Project Update - May 25th | over 9 years ago, by Jacob Samson
Project Proposal.docx | word | Project Proposal | over 9 years ago, by Jacob Samson
This model does not have any ancestors.
This model does not have any descendants.