
The Boltzmann Machine



Presentation Transcript


  1. The Boltzmann Machine Psych 419/719 March 1, 2001

  2. Recall Constraint Satisfaction… • We have a network of units and connections… • Finding an optimal state involves relaxation: letting the network settle into a configuration that maximizes a goodness function • This is done by annealing

  3. Simulated Annealing • Update unit states according to a probability distribution based on: • The input to the unit: higher input = greater odds of being on • The temperature: high temperature = more random; low temperature = nearly deterministic function of the input • Start with a high temperature, and gradually reduce it
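
To make the stochastic update concrete, here is a minimal sketch in Python/NumPy. The binary 0/1 units, the particular temperature schedule, and the function and variable names are assumptions for illustration, not something specified on the slides.

```python
import numpy as np

def p_on(net_input, temperature):
    # Probability that a unit turns on: a logistic of its net input, scaled by
    # temperature. High T pushes the probability toward 0.5 (random); low T
    # pushes it toward a hard threshold on the input.
    return 1.0 / (1.0 + np.exp(-net_input / temperature))

def anneal(weights, state, clamped, schedule=(4.0, 2.0, 1.0, 0.5, 0.25), rng=None):
    # One annealing run: sweep over the free (unclamped) units at each
    # temperature, updating each unit stochastically given its current net input.
    if rng is None:
        rng = np.random.default_rng()
    for T in schedule:
        for i in np.flatnonzero(~clamped):
            net = weights[i] @ state          # symmetric weights, zero diagonal assumed
            state[i] = 1.0 if rng.random() < p_on(net, T) else 0.0
    return state

# Usage: clamp four visible units to a pattern and let two hidden units settle.
W = np.zeros((6, 6))
state = np.zeros(6)
state[:4] = [1, 0, 1, 1]
clamped = np.array([True, True, True, True, False, False])
settled = anneal(W, state, clamped)
```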

  4. Constraint Satisfaction Networks Have Nice Properties • Can settle into stable configurations based on partial or noisy information • Can do pattern completion • Have well-formed attractors corresponding to stable states • BUT: How can we make a network learn?

  5. What about Backprop? • Two problems: • It tends to split the probability distributions: if the input is ambiguous (say, the word LEAD), the output reflects that whole distribution rather than settling on one interpretation, unlike the Necker cube • Also: it is not very biologically plausible: error gradients travel backwards along connections, and neurons don't seem to do this

  6. We Need Hidden Units • Hidden units are needed to solve XOR-style problems • In these networks, we have a set of symmetric connections between units • Some units are visible and others are hidden

  7. The Boltzmann Machine: Memorizing Patterns • Here, we want to train the network on a set of patterns • We want the network to learn about the statistics of and relationships between the parts of the patterns • It is not really performing an explicit mapping (the kind of task backprop is good for)

  8. How it Works • Step 1. Pick an example • Step 2. Run the network in the positive phase • Step 3. Run the network in the negative phase • Step 4. Compare the statistics of the two phases • Step 5. Update the weights based on those statistics • Step 6. Go to step 1 and repeat

  9. Step 1: Pick Example • Pretty simple. Just select an example at random.

  10. Step 2: The Positive Phase • Clamp the visible units with the pattern specified by the current example • Let the network settle using the simulated annealing method • Record the outputs of the units • Start again with the same example, settle again, and record the unit outputs again

  11. Step 3: The Negative Phase • Here, we don't clamp any of the units; we just let the network settle to some state as before • Do this several times, again recording the unit outputs

  12. Step 4: Compare Statistics • For each pair of units, we compute the probability that the two units are coactive (both on) in the positive phase, and do the same for the negative phase • If we have n units, this gives us two n x n matrices of probabilities • p_ij is the probability that units i and j are both on
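
The two matrices can be estimated directly from the recorded states; a minimal NumPy sketch, using made-up recorded states for a hypothetical four-unit network:

```python
import numpy as np

# Hypothetical recorded states: one row per settling run, one column per unit,
# 1.0 if the unit ended up on in that run, 0.0 otherwise.
states = np.array([[1, 0, 1, 1],
                   [1, 1, 0, 1],
                   [1, 0, 1, 0]], dtype=float)

# p[i, j] estimates the probability that units i and j are both on, averaged
# over the recorded runs (the diagonal holds each unit's own on-probability).
p = states.T @ states / len(states)
```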

  13. Step 5: Update Weights • Change each weight according to the difference between the probabilities for the positive and negative phases: Δw_ij = k (p⁺_ij − p⁻_ij) • Here, k is like a learning rate
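
Putting slides 8 through 13 together, here is a minimal end-to-end sketch in Python/NumPy. The network sizes, annealing schedule, learning rate, number of settles per phase, and toy patterns are all made-up values, and the settling routine is just the stochastic update idea sketched after slide 3:

```python
import numpy as np

rng = np.random.default_rng(0)

N_VISIBLE, N_HIDDEN = 4, 2                   # hypothetical network sizes
N = N_VISIBLE + N_HIDDEN
SCHEDULE = (4.0, 2.0, 1.0, 0.5, 0.25)        # assumed annealing schedule
K = 0.05                                     # learning rate ("k" on slide 13)
N_SETTLES = 10                               # settles per phase when collecting statistics

patterns = rng.integers(0, 2, size=(3, N_VISIBLE)).astype(float)   # toy training patterns
W = np.zeros((N, N))                         # symmetric weights, zero diagonal

def settle(W, clamp):
    # clamp: length-N array, 0/1 for clamped units and np.nan for free units.
    state = rng.integers(0, 2, size=N).astype(float)
    clamped = ~np.isnan(clamp)
    state[clamped] = clamp[clamped]
    for T in SCHEDULE:
        for i in np.flatnonzero(~clamped):
            prob_on = 1.0 / (1.0 + np.exp(-(W[i] @ state) / T))
            state[i] = 1.0 if rng.random() < prob_on else 0.0
    return state

def coactivation(W, clamp):
    # Average outer product of settled states: entry (i, j) estimates the
    # probability that units i and j are both on in this phase.
    total = np.zeros((N, N))
    for _ in range(N_SETTLES):
        s = settle(W, clamp)
        total += np.outer(s, s)
    return total / N_SETTLES

for sweep in range(100):
    pattern = patterns[rng.integers(len(patterns))]              # step 1: pick an example
    clamp_pos = np.concatenate([pattern, np.full(N_HIDDEN, np.nan)])
    p_plus = coactivation(W, clamp_pos)                          # steps 2 and 4: positive phase
    p_minus = coactivation(W, np.full(N, np.nan))                # steps 3 and 4: negative phase
    W += K * (p_plus - p_minus)                                  # step 5: weight update
    np.fill_diagonal(W, 0.0)                                     # no self-connections
                                                                 # step 6: loop back to step 1
```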

  14. Why it Works • This reduces the difference between what the network settles to when the inputs are clamped and what it settles to when it's allowed to free-run • So the weights learn about which kinds of visible units go together • It recruits hidden units to help learn higher-order relationships

  15. Can Be Used For Mappings Too • Here, the positive phase involves clamping both the input and output units and letting the network settle • The negative phase involves clamping just the input units • The network learns that, given the input, it should settle to a state in which the output units take on their target values
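
For the mapping case, the only change is which units are clamped in each phase. A small illustration, assuming hypothetical layer sizes and reusing the convention that np.nan marks a free (unclamped) unit:

```python
import numpy as np

N_IN, N_OUT, N_HID = 3, 2, 2                     # hypothetical layer sizes
input_pattern = np.array([1.0, 0.0, 1.0])
target_output = np.array([0.0, 1.0])

# Positive phase: clamp both the input and the output units; hidden units are free.
clamp_pos = np.concatenate([input_pattern, target_output, np.full(N_HID, np.nan)])

# Negative phase: clamp only the input units; output and hidden units are free,
# so the network must produce the output pattern on its own.
clamp_neg = np.concatenate([input_pattern, np.full(N_OUT + N_HID, np.nan)])
```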

  16. Contrastive Hebbian Learning • Very similar to a normal Boltzmann machine, except we can have units whose outputs are a deterministic function of their input (like the logistic). • As before, we have two phases: positive and negative.

  17. Contrastive Hebbian Learning Rule • Weight updates are based on the actual unit outputs, not the probabilities that pairs of units are both on: Δw_ij = k (a⁺_i a⁺_j − a⁻_i a⁻_j)
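
A minimal NumPy sketch of this rule, with made-up activation values; the point is only that the update uses the settled activations themselves rather than coactivation probabilities:

```python
import numpy as np

k = 0.05                                     # learning rate (assumed value)

# Hypothetical unit activations after settling in each phase; with deterministic
# logistic units these are real values in (0, 1) rather than stochastic 0/1 states.
a_plus = np.array([0.9, 0.1, 0.8, 0.7])      # positive (clamped) phase
a_minus = np.array([0.6, 0.3, 0.5, 0.4])     # negative (free-running) phase

# Hebbian term from the positive phase minus an anti-Hebbian term from the
# negative phase: delta_w[i, j] = k * (a+_i * a+_j  -  a-_i * a-_j).
delta_w = k * (np.outer(a_plus, a_plus) - np.outer(a_minus, a_minus))
```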

  18. Problems • Weight explosion: if the weights get too big too early, the network will get stuck in one goodness optimum; this can be alleviated with weight decay • Settling time: the time to process an example is long, due to the settling process • Learning time: it takes a lot of presentations to learn • Symmetric weights? Separate phases?
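
The weight-decay remedy mentioned above amounts to shrinking every weight slightly after each update; a one-line sketch with a made-up decay rate:

```python
import numpy as np

W = np.zeros((6, 6))        # stand-in for the network's weight matrix
DECAY = 0.001               # hypothetical decay rate

# After each weight update, pull all weights slightly back toward zero so large
# weights cannot lock the network into a single goodness optimum too early.
W *= (1.0 - DECAY)
```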

  19. Sleep? • It has been suggested that something like the negative (minus) phase might be happening during sleep: • Spontaneous correlations between hidden units (those not driven by external input) get subtracted off, so they vanish unless they are driven by external input while awake • There is not a lot of evidence to support this conjecture • We can learn while awake!

  20. For Next Time • Optional reading handed out. • Ends section on learning internal representations. Next: biologically plausible learning. • Remember: • No class next Thursday • Homework 3 due March 13 • Project proposal due March 15. See web page.
