Reference: Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
Slide 1: Sonpvh
Slide 2: Outline
1. Learning from data
2. Error & noise
3. Approximation vs. generalization
4. Bias–variance trade-off
5. Learning curve
Slide 4: Hoeffding's inequality
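The inequality itself did not survive extraction; for a single fixed hypothesis $h$, $N$ i.i.d. samples, and tolerance $\epsilon > 0$, the standard statement is:

$$\mathbb{P}\big[\,|E_{in}(h) - E_{out}(h)| > \epsilon\,\big] \;\le\; 2e^{-2\epsilon^{2}N}$$

so the in-sample error tracks the out-of-sample error with high probability as $N$ grows.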
Slide 6: Learning purpose: g(x) ≈ f(x)
But what does "g ≈ f" mean? We need an error measure E(g, f).
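One common way to make this precise (a standard convention, not spelled out on the slide) is to average a pointwise error over the input distribution:

$$E(g, f) = \mathbb{E}_x\big[\,e\big(g(x), f(x)\big)\,\big], \qquad \text{e.g. } e\big(g(x), f(x)\big) = \big(g(x) - f(x)\big)^2$$

The choice of the pointwise error $e$ is application-dependent, which is the point of the examples that follow.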
Slide 7: The error measure depends on the application:
- Supermarket: verify for a discount
- CIA: verify for security
Slide 9: $y = \hat{y} + \text{noise} = f(x) + \text{noise} = \mathbb{E}[y \mid x] + \text{noise}$
Slide 13: $1 - \delta$: confidence requirement
Approximation–generalization trade-off:
- More complex ℋ → better chance of approximating f
- Less complex ℋ → better chance of generalizing out of sample
Slide 14: VC dimension (1960–1990)
"fundamental theory of learning"
Vladimir Vapnik & Alexey Chervonenkis
Generalization bound
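The bound itself is missing from the extracted text; one common statement of the VC generalization bound (in terms of the growth function $m_{\mathcal H}$) holds with probability at least $1 - \delta$:

$$E_{out}(g) \;\le\; E_{in}(g) + \sqrt{\frac{8}{N}\,\ln\frac{4\, m_{\mathcal H}(2N)}{\delta}}$$

The penalty term grows with the complexity of ℋ and shrinks with $N$, which is the quantitative form of the trade-off on slide 13.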
Slide 17: "Approximation" – bias
Slide 18: "Generalization" – variance
Slide 19: Bias–variance: who wins?
Slide 21: Bias–variance decomposition of $E_{out}$:

$$\mathbb{E}_D\big[E_{out}\big(g^{(D)}\big)\big] = \mathbb{E}_x\Big[\underbrace{\mathbb{E}_D\big[\big(g^{(D)}(x) - \bar{g}(x)\big)^2\big]}_{\text{variance}} + \underbrace{\big(\bar{g}(x) - f(x)\big)^2}_{\text{bias}}\Big]$$

The decomposition separates:
- Bias: how well ℋ can approximate f
- Variance: how well we can zoom in on a good h ∈ ℋ
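The decomposition can be estimated numerically. Below is a minimal sketch assuming the classic setup from Abu-Mostafa's Learning From Data (which these slides appear to follow): target $f(x) = \sin(\pi x)$, noiseless two-point datasets, and two hypothesis sets, ℋ0 = constants vs. ℋ1 = lines. Bias and variance are estimated by averaging the hypotheses learned over many datasets.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)      # target function on [-1, 1]
xs = np.linspace(-1, 1, 201)         # grid for averaging over x

def experiment(fit, n_datasets=2000, n_points=2):
    """Fit many small datasets; return (bias, variance) averaged over x."""
    preds = np.empty((n_datasets, xs.size))
    for d in range(n_datasets):
        x = rng.uniform(-1, 1, n_points)
        preds[d] = fit(x, f(x))      # hypothesis g^(D) evaluated on the grid
    g_bar = preds.mean(axis=0)       # average hypothesis ḡ(x)
    bias = np.mean((g_bar - f(xs)) ** 2)
    var = np.mean((preds - g_bar) ** 2)
    return bias, var

# H0: constant hypothesis h(x) = b (least-squares fit is the mean of the targets)
fit_h0 = lambda x, y: np.full(xs.size, y.mean())
# H1: line h(x) = a*x + b through the two data points
fit_h1 = lambda x, y: np.polyval(np.polyfit(x, y, 1), xs)

b0, v0 = experiment(fit_h0)
b1, v1 = experiment(fit_h1)
print(f"H0 (constant): bias={b0:.2f}  var={v0:.2f}  total={b0 + v0:.2f}")
print(f"H1 (line):     bias={b1:.2f}  var={v1:.2f}  total={b1 + v1:.2f}")
```

Under these assumptions the line fits each dataset better (lower bias) but swings wildly from dataset to dataset (higher variance), so the constant model wins on expected out-of-sample error, matching the next slide.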
Slide 22: Who won? Congratulations, ℋ0!