concepts
A running log of concepts I learn, organized by date.
08 Mar 2026
Dropout
Paper: Srivastava, Hinton, et al. (2014)
- Motivation: Prevents “co-adaptation” (feature dependency); approximates an ensemble of $2^n$ thinned networks.
- Equation (with drop rate $p$): $r^{(l)} \sim \text{Bernoulli}(1-p)$; $\tilde{y}^{(l)} = r^{(l)} * y^{(l)}$.
- Training (Inverted): $a_{train} = \frac{a \cdot mask}{1-p}$ (keeps expected sum consistent).
- Inference: $a_{test} = a$ (all neurons active; no scaling needed).
- Pros: Robust generalization; $O(n)$ overhead; prevents overfitting in deep/wide nets.
- Cons: Increases training time (approx. 2x); requires tuning $p$ (dropout rate).
- Impl:
`nn.Dropout(p)` (PyTorch) or `layers.Dropout(p)` (Keras).
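The training/inference rules above can be sketched in a few lines of NumPy (a minimal sketch; the function name and signature are illustrative, not from the paper):

```python
import numpy as np

def dropout_forward(a, p, training=True, rng=None):
    """Inverted dropout: drop each unit with probability p during training,
    scaling survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return a                          # inference: identity, no rescaling
    rng = rng or np.random.default_rng()
    mask = rng.random(a.shape) >= p       # keep each unit with prob 1 - p
    return a * mask / (1.0 - p)           # rescale to preserve E[a]
```

Framework layers (`nn.Dropout` in PyTorch, `layers.Dropout` in Keras) implement this inverted scheme, toggled by the train/eval mode of the model.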
| Date | Concept | Status | Source | Tags |
|---|---|---|---|---|
| 08 Mar 2026 | Dropout | evergreen | paper | deep-learning regularization neural-networks |