Deep Learning
Ian Goodfellow
Yoshua Bengio
Aaron Courville

Contents
Website
Acknowledgments
Notation
1 Introduction
1.1 Who Should Read This Book?
1.2 Historical Trends in Deep Learning
I Applied Math and Machine Learning Basics
2 Linear Algebra
2.1 Scalars, Vectors, Matrices and Tensors
2.2 Multiplying Matrices and Vectors
2.3 Identity and Inverse Matrices
2.4 Linear Dependence and Span
2.5 Norms
2.6 Special Kinds of Matrices and Vectors
2.7 Eigendecomposition
2.8 Singular Value Decomposition
2.9 The Moore-Penrose Pseudoinverse
2.10 The Trace Operator
2.11 The Determinant
2.12 Example: Principal Components Analysis
3 Probability and Information Theory
3.1 Why Probability?
3.2 Random Variables
3.3 Probability Distributions
3.4 Marginal Probability
3.5 Conditional Probability
3.6 The Chain Rule of Conditional Probabilities
3.7 Independence and Conditional Independence
3.8 Expectation, Variance and Covariance
3.9 Common Probability Distributions
3.10 Useful Properties of Common Functions
3.11 Bayes' Rule
3.12 Technical Details of Continuous Variables
3.13 Information Theory
3.14 Structured Probabilistic Models
4 Numerical Computation
4.1 Overflow and Underflow
4.2 Poor Conditioning
4.3 Gradient-Based Optimization
4.4 Constrained Optimization
4.5 Example: Linear Least Squares
5 Machine Learning Basics
5.1 Learning Algorithms
5.2 Capacity, Overfitting and Underfitting
5.3 Hyperparameters and Validation Sets
5.4 Estimators, Bias and Variance
5.5 Maximum Likelihood Estimation
5.6 Bayesian Statistics
5.7 Supervised Learning Algorithms
5.8 Unsupervised Learning Algorithms
5.9 Stochastic Gradient Descent
5.10 Building a Machine Learning Algorithm
5.11 Challenges Motivating Deep Learning
II Deep Networks: Modern Practices
Deep Feedforward Networks
6.1 Example: Learning XOR
6.2 Gradient-Based Learning
6.3 Hidden Units
6.4 Architecture Design
6.5 Back-Propagation and Other Differentiation Algorithms
6.6 Historical Notes
7 Regularization for Deep Learning
7.1 Parameter Norm Penalties
7.2 Norm Penalties as Constrained Optimization
7.3 Regularization and Under-Constrained Problems
7.4 Dataset Augmentation
7.5 Noise Robustness
7.6 Semi-Supervised Learning
7.7 Multitask Learning
7.8 Early Stopping
7.9 Parameter Tying and Parameter Sharing
7.10 Sparse Representations
7.11 Bagging and Other Ensemble Methods
7.12 Dropout
7.13 Adversarial Training
7.14 Tangent Distance, Tangent Prop and Manifold Tangent Classifier
8 Optimization for Training Deep Models
8.1 How Learning Differs from Pure Optimization
8.2 Challenges in Neural Network Optimization
8.3 Basic Algorithms
8.4 Parameter Initialization Strategies
8.5 Algorithms with Adaptive Learning Rates
8.6 Approximate Second-Order Methods
8.7 Optimization Strategies and Meta-Algorithms
9 Convolutional Networks
9.1 The Convolution Operation
9.2 Motivation
9.3 Pooling
9.4 Convolution and Pooling as an Infinitely Strong Prior
9.5 Variants of the Basic Convolution Function
9.6 Structured Outputs
9.7 Data Types
9.8 Efficient Convolution Algorithms
9.9 Random or Unsupervised Features
9.10 The Neuroscientific Basis for Convolutional Networks
9.11 Convolutional Networks and the History of Deep Learning
10 Sequence Modeling: Recurrent and Recursive Nets
10.1 Unfolding Computational Graphs
10.2 Recurrent Neural Networks
10.3 Bidirectional RNNs
10.4 Encoder-Decoder Sequence-to-Sequence Architectures
10.5 Deep Recurrent Networks
10.6 Recursive Neural Networks
10.7 The Challenge of Long-Term Dependencies
10.8 Echo State Networks
10.9 Leaky Units and Other Strategies for Multiple Time Scales
10.10 The Long Short-Term Memory and Other Gated RNNs
10.11 Optimization for Long-Term Dependencies
10.12 Explicit Memory
11 Practical Methodology
11.1 Performance Metrics
11.2 Default Baseline Models
11.3 Determining Whether to Gather More Data
11.4 Selecting Hyperparameters
11.5 Debugging Strategies
11.6 Example: Multi-Digit Number Recognition
12 Applications
12.1 Large-Scale Deep Learning
12.2 Computer Vision
12.3 Speech Recognition
12.4 Natural Language Processing
12.5 Other Applications
III Deep Learning Research
13 Linear Factor Models
13.1 Probabilistic PCA and Factor Analysis
13.2 Independent Component Analysis (ICA)
13.3 Slow Feature Analysis
13.4 Sparse Coding
13.5 Manifold Interpretation of PCA
14 Autoencoders
14.1 Undercomplete Autoencoders
14.2 Regularized Autoencoders
14.3 Representational Power, Layer Size and Depth
14.4 Stochastic Encoders and Decoders
14.5 Denoising Autoencoders
14.6 Learning Manifolds with Autoencoders
14.7 Contractive Autoencoders
14.8 Predictive Sparse Decomposition
14.9 Applications of Autoencoders
15 Representation Learning
15.1 Greedy Layer-Wise Unsupervised Pretraining
15.2 Transfer Learning and Domain Adaptation
15.3 Semi-Supervised Disentangling of Causal Factors
15.4 Distributed Representation
15.5 Exponential Gains from Depth
15.6 Providing Clues to Discover Underlying Causes
16 Structured Probabilistic Models for Deep Learning
16.1 The Challenge of Unstructured Modeling
16.2 Using Graphs to Describe Model Structure
16.3 Sampling from Graphical Models
16.4 Advantages of Structured Modeling
16.5 Learning about Dependencies
16.6 Inference and Approximate Inference
16.7 The Deep Learning Approach to Structured Probabilistic Models
17 Monte Carlo Methods
17.1 Sampling and Monte Carlo Methods
17.2 Importance Sampling
17.3 Markov Chain Monte Carlo Methods
17.4 Gibbs Sampling
17.5 The Challenge of Mixing between Separated Modes
18 Confronting the Partition Function
18.1 The Log-Likelihood Gradient
18.2 Stochastic Maximum Likelihood and Contrastive Divergence
18.3 Pseudolikelihood
18.4 Score Matching and Ratio Matching
18.5 Denoising Score Matching
18.6 Noise-Contrastive Estimation
18.7 Estimating the Partition Function
19 Approximate Inference
19.1 Inference as Optimization
19.2 Expectation Maximization
19.3 MAP Inference and Sparse Coding
19.4 Variational Inference and Learning
19.5 Learned Approximate Inference
20 Deep Generative Models
20.1 Boltzmann Machines
20.2 Restricted Boltzmann Machines
20.3 Deep Belief Networks
20.4 Deep Boltzmann Machines
20.5 Boltzmann Machines for Real-Valued Data
20.6 Convolutional Boltzmann Machines
20.7 Boltzmann Machines for Structured or Sequential Outputs
20.8 Other Boltzmann Machines
20.9 Back-Propagation through Random Operations
20.10 Directed Generative Nets
20.11 Drawing Samples from Autoencoders
20.12 Generative Stochastic Networks
20.13 Other Generation Schemes
20.14 Evaluating Generative Models
20.15 Conclusion
Bibliography
Index

Website
www.deeplearningbook.org
This book is accompanied by the above website. The website provides a
variety of supplementary material, including exercises, lecture slides, corrections of
mistakes, and other resources that should be useful to both readers and instructors.