Bayesian Within The Gates: A View From Particle Physics
Harrison B. Prosper
Florida State University
SAMSI, 24 January 2006

Outline:
Measuring Zero as Precisely as Possible!
Signal/Background Discrimination
1-D Example
14-D Example
Some Open Issues
Summary

Measuring Zero!:
Diamonds may not be forever.
Neutron <-> anti-neutron transitions: CRISP Experiment (1982 – 1985), Institut Laue Langevin, Grenoble, France.
Method: Fire a gas of cold neutrons onto a graphite foil. Look for annihilation of the anti-neutron component.

Measuring Zero!:
Count the number of signal + background events, N.
Suppress the putative signal and count background events, B, independently.
Results: N = 3, B = 7

Measuring Zero!:
Classic 2-Parameter Counting Experiment
N ~ Poisson(s+b)
B ~ Poisson(b)
Wanted: A statement like s < u(N,B) @ 90% CL

Measuring Zero!:
In 1984, no exact solution existed in the particle physics literature! But, surely it must have been solved by statisticians. Alas, from Kendall and Stuart I learnt that calculating exact confidence intervals is “a matter of very considerable difficulty”.

Measuring Zero!:
Exact in what way?
Over the ensemble of statements of the form s ∈ [0, u), at least 90% of them should be true whatever the true value of the signal s AND whatever the true value of the background parameter b.
blame… Neyman (1937)

Slide 8:
“Keep it simple, but no simpler” (Albert Einstein)

Bayesian @ the Gate (1984):
Solution:
p(N,B|s,b) = Poisson(N|s+b) Poisson(B|b)   (the likelihood)
p(s,b) = uniform(s,b)   (the prior)
Compute the posterior density p(s,b|N,B):
p(s,b|N,B) = p(N,B|s,b) p(s,b) / p(N,B)
Marginalize over b:
p(s|N,B) = ∫ p(s,b|N,B) db
This reasoning was compelling to me then, and is much more so now!

Particle Physics Data:
proton + anti-proton -> positron (e+) + neutrino (ν) + Jet1 + Jet2 + Jet3 + Jet4
This event “lives” in 3 + 2 + 3 x 4 = 17 dimensions.

Particle Physics Data:
CDF/Dzero: Discovery of the top quark (1995)
[Figure legend: Data red; Signal green; Background blue, magenta]
Dzero: 17-D -> 2-D

Slide 12:
But that was then, and now is now!
Today we have 2 GHz laptops, with 2 GB of memory!
It is fun to deploy huge, sometimes unreliable, computational resources, that is, brains, to reduce the dimensionality of data. But perhaps it is now feasible to work directly in the original high-dimensional space, using hardware!

Signal/Background Discrimination:
The optimal solution is to compute
p(S|x) = p(x|S) p(S) / [p(x|S) p(S) + p(x|B) p(B)]
Every signal/background discrimination method is ultimately an algorithm to approximate this solution, or a mapping thereof. Therefore, if a method is already at the Bayes limit, no other method, however sophisticated, can do better!
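The “Bayesian @ the Gate (1984)” recipe (flat prior, posterior, marginalize over b) can be sketched numerically for the quoted counts N = 3, B = 7. This is only an illustration: the function name `upper_limit`, the grid ranges and the grid sizes are assumptions, not part of the talk, and the result is a Bayesian credible upper limit rather than the exact frequentist limit discussed on the “Measuring Zero!” slides.

```python
import numpy as np

def upper_limit(N, B, cl=0.90, s_max=30.0, b_max=60.0, n_s=3001, n_b=601):
    """Bayesian upper limit on s with a flat prior on (s, b), both >= 0.

    Grid ranges and sizes are illustrative assumptions.
    """
    s = np.linspace(0.0, s_max, n_s)
    b = np.linspace(0.0, b_max, n_b)
    S, Bg = np.meshgrid(s, b, indexing="ij")
    # Likelihood Poisson(N|s+b) Poisson(B|b); the constants N! and B!
    # cancel when the posterior is normalized, so they are omitted.
    post = (S + Bg) ** N * np.exp(-(S + Bg)) * Bg ** B * np.exp(-Bg)
    # Marginalize over b with a Riemann sum on the uniform grid,
    # then normalize the result to unit total probability in s.
    post_s = post.sum(axis=1)
    post_s /= post_s.sum()
    cdf = np.cumsum(post_s)
    # Smallest grid value of s whose cumulative probability reaches cl.
    return s[np.searchsorted(cdf, cl)]

u = upper_limit(N=3, B=7)
print(f"s < {u:.2f} at 90% credibility")
```

A flat prior is used because that is what the slide states; a different prior would change the limit.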
Signal/Background Discrimination:
Given D = (x, y), with x = {x_1, …, x_N} and y = {y_1, …, y_N}, a set of N training examples.
Infer a discriminant function f(x, w), with parameters w:
p(w|x, y) = p(x, y|w) p(w) / p(x, y)
          = p(y|x, w) p(x|w) p(w) / [p(y|x) p(x)]
          = p(y|x, w) p(w) / p(y|x),   assuming p(x|w) -> p(x)

Signal/Background Discrimination:
A typical likelihood for classification:
p(y|x, w) = ∏_i f(x_i, w)^{y_i} [1 - f(x_i, w)]^{1 - y_i}
where y_i = 0 for background events and y_i = 1 for signal events.
If f(x, w) is flexible enough, then maximizing p(y|x, w) with respect to w yields f = p(S|x), asymptotically.

Signal/Background Discrimination:
However, in a full Bayesian calculation one usually averages with respect to the posterior density:
y(x) = ∫ f(x, w) p(w|D) dw
Questions:
1. Do suitably flexible functions f(x, w) exist?
2. Is there a feasible way to do the integral?

Answer 1: Hilbert’s 13th Problem!:
Prove that the following is impossible:
y(x,y,z) = F( A(x), B(y), C(z) )
In 1957, Kolmogorov proved the contrary conjecture:
y(x_1,…,x_n) = F( f_1(x_1),…,f_n(x_n) )
I’ll call such functions, F, Kolmogorov functions.

Kolmogorov Functions:
A neural network is an example of a Kolmogorov function, that is, a function capable of approximating arbitrary mappings f: R^N -> U.
The parameters w = (u, a, v, b) are called weights.

Answer 2: Use Hybrid MCMC:
Computational Method: Generate a Markov chain (MC) of N points {w} drawn from the posterior density p(w|D) and average over the last M points. Each point corresponds to a network.
Software: Flexible Bayesian Modeling by Radford Neal
http://www.cs.utoronto.ca/~radford/fbm.software.html

A 1-D Example:
Signal: p+pbar -> t q b
Background: p+pbar -> W b b
NN Model Class: (1, 15, 1)
MCMC: 500 tqb + Wbb events. Use last 20 networks in a MC chain of 500.
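The averaging y(x) = ∫ f(x, w) p(w|D) dw over points of a Markov chain can be sketched on a toy problem. Several things here are assumptions rather than the talk's setup: the data are two unit-width Gaussians instead of tqb/Wbb events, the network has 3 hidden nodes instead of 15, its functional form (tanh hidden layer, sigmoid output) is a guess at what w = (u, a, v, b) parameterizes, the prior is a Gaussian of width 3, and a plain random-walk Metropolis sampler stands in for the hybrid MCMC of Neal's FBM package.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D training set (assumed): background ~ N(-1, 1), signal ~ N(+1, 1).
n_per_class = 200
x = np.concatenate([rng.normal(-1.0, 1.0, n_per_class),
                    rng.normal(+1.0, 1.0, n_per_class)])
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

H = 3  # hidden nodes (assumed; the talk's 1-D example uses 15)

def network(x, w):
    """n(x, w) = sigmoid(b + sum_j v_j tanh(a_j + u_j x)), w = (u, a, v, b)."""
    u, a, v, b = w[:H], w[H:2 * H], w[2 * H:3 * H], w[3 * H]
    hidden = np.tanh(a + np.outer(x, u))          # shape (len(x), H)
    return 1.0 / (1.0 + np.exp(-(b + hidden @ v)))

def log_posterior(w, prior_sigma=3.0):
    """Bernoulli log-likelihood plus an assumed Gaussian log-prior."""
    f = np.clip(network(x, w), 1e-12, 1.0 - 1e-12)
    log_like = np.sum(y * np.log(f) + (1.0 - y) * np.log(1.0 - f))
    log_prior = -0.5 * np.sum((w / prior_sigma) ** 2)
    return log_like + log_prior

def metropolis(n_steps=4000, step=0.1):
    """Random-walk Metropolis chain; a simple stand-in for hybrid MCMC."""
    w = rng.normal(0.0, 0.5, 3 * H + 1)
    lp = log_posterior(w)
    chain = np.empty((n_steps, w.size))
    for k in range(n_steps):
        proposal = w + step * rng.normal(size=w.size)
        lp_new = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_new - lp:
            w, lp = proposal, lp_new
        chain[k] = w
    return chain

chain = metropolis()
last = chain[-20:]  # average over the last M = 20 networks
x_test = np.array([-2.0, 0.0, 2.0])
y_avg = np.mean([network(x_test, w) for w in last], axis=0)
print(y_avg)  # estimated p(S|x) at x = -2, 0, +2
```

Consecutive points of a random-walk chain are strongly correlated, so in practice one would thin the chain or use a sampler with better mixing before averaging.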
[Figure: distributions of x for Wbb and tqb events]

A 1-D Example:
[Figure, plotted against x]
Dots: p(S|x) = H_S/(H_S + H_B), where H_S, H_B are 1-D histograms
Curves: Individual NNs n(x, w_k)
Black curve: < n(x, w) >

A 14-D Example (Finding Susy!):
Transverse momentum spectra
Signal: black curve
Signal/Noise: 1/100,000

A 14-D Example (Finding Susy!):
Missing transverse momentum spectrum (caused by escape of neutrinos and Susy particles)
Variable count: 4 x (E_T, η, φ) + (E_T, φ) = 14

A 14-D Example (Finding Susy!):
[Equation labels: Likelihood, Prior]
Signal: 250 p+pbar -> top + anti-top (MC) events
Background: 250 p+pbar -> gluino gluino (MC) events
NN Model Class: (14, 40, 1) (641-D parameter space!)
MCMC: Use last 100 networks in a Markov chain of 10,000, skipping every 20.

But does it Work?:
Signal to noise can reach 1/1 with an acceptable signal strength.

But does it Work?:
Let d(x) = N p(x|S) + N p(x|B) be the density of the data, containing 2N events, assuming, for simplicity, p(S) = p(B).
A properly trained classifier y(x) approximates
p(S|x) = p(x|S)/[p(x|S) + p(x|B)]
Therefore, if the signal and background events are weighted with y(x), we should recover the signal density.

But does it Work?:
Amazingly well!

Some Open Issues:
Why does this insane function p(w_1,…,w_641|x_1,…,x_500) behave so well? 641 parameters > 500 events!
How should one verify that an n-D (n ~ 14) swarm of simulated background events matches the n-D swarm of observed events (in the background region)?
How should one verify that y(x) is indeed a reasonable approximation to the Bayes discriminant, p(S|x)?

Summary:
Bayesian methods have been, and are being, used with considerable success by particle physicists. Happily, the frequentist/Bayesian Cold War is abating!
The application of Bayesian methods to highly flexible functions, e.g., neural networks, is very promising and should be broadly applicable.
Needed: A powerful way to compare high-dimensional swarms of points.
Agree, or not agree, that is the question!
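The check described under “But does it Work?” follows from y(x) d(x) = N p(x|S): weighting every event by y(x) should reproduce the signal density. A minimal sketch, assuming toy 1-D Gaussian densities and using the exact Bayes discriminant in place of a trained network (both are illustrative assumptions, not the talk's 14-D study):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000  # events per class, so p(S) = p(B)

# Assumed toy densities: signal ~ N(+1, 1), background ~ N(-1, 1).
signal = rng.normal(+1.0, 1.0, N)
background = rng.normal(-1.0, 1.0, N)
data = np.concatenate([signal, background])

def gauss(x, mu):
    """Unit-width Gaussian density centred at mu."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

def y(x):
    """Exact Bayes discriminant p(S|x) for equal priors."""
    ps, pb = gauss(x, +1.0), gauss(x, -1.0)
    return ps / (ps + pb)

edges = np.linspace(-5.0, 5.0, 41)
width = edges[1] - edges[0]

# Histogram of ALL events, each weighted by y(x) ...
weighted, _ = np.histogram(data, bins=edges, weights=y(data))
# ... compared with the histogram of the signal events alone.
signal_only, _ = np.histogram(signal, bins=edges)

weighted_density = weighted / (N * width)
signal_density = signal_only / (N * width)
max_diff = np.max(np.abs(weighted_density - signal_density))
print(f"largest bin-by-bin difference in density: {max_diff:.4f}")
```

With a trained classifier in place of the exact y(x), the size of this difference gives one rough handle on the open question of whether y(x) approximates p(S|x), at least in low-dimensional projections.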