prosper

Uploaded from authorPOINTLite
Views:
 
Category: Entertainment
     
 

Presentation Description

No description available.

Comments

Presentation Transcript

Bayesian Within The Gates A View From Particle Physics: 

Bayesian Within The Gates A View From Particle Physics Harrison B. Prosper Florida State University SAMSI 24 January, 2006

Outline: 

Outline Measuring Zero as Precisely as Possible! Signal/Background Discrimination 1-D Example 14-D Example Some Open Issues Summary

Measuring Zero!: 

Measuring Zero! Diamonds may not be forever Neutron <-> anti-neutron transitions, CRISP Experiment (1982 – 1985), Institut Laue Langevin Grenoble, France Method Fire gas of cold neutrons onto a graphite foil. Look for annihilation of anti-neutron component.

Measuring Zero!: 

Measuring Zero! Count number of signal + background events N. Suppress putative signal and count background events B, independently. Results: N = 3 B = 7

Measuring Zero!: 

Measuring Zero! Classic 2-Parameter Counting Experiment N ~ Poisson(s+b) B ~ Poisson(b) Wanted: A statement like s < u(N,B) @ 90% CL

Measuring Zero!: 

Measuring Zero! In 1984, no exact solution existed in the particle physics literature! But, surely it must have been solved by statisticians. Alas, from Kendal and Stuart I learnt that calculating exact confidence intervals is “a matter of very considerable difficulty”.

Measuring Zero!: 

Measuring Zero! Exact in what way? Over the ensemble of statements of the form s є [0, u) at least 90% of them should be true whatever the true value of the signal s AND whatever the true value of the background parameter b. blame… Neyman (1937)

Slide8: 

“Keep it simple, but no simpler” Albert Einstein

Bayesian @ the Gate (1984): 

Bayesian @ the Gate (1984) Solution: p(N,B|s,b) = Poisson(s+b) Poisson(b) the likelihood p(s,b) = uniform(s,b) the prior Compute the posterior density p(s,b|N,B) p(s,b|N,B) = p(N,B|s,b) p(s,b)/p(N,B) Marginalize over b p(s|N,B) = ∫p(s,b|N,B) db This reasoning was compelling to me then, and is much more so now!

Particle Physics Data: 

Particle Physics Data proton + anti-proton -> positron (e+) neutrino (n) Jet1 Jet2 Jet3 Jet4 This event “lives” in 3 + 2 + 3 x 4 = 17 dimensions.

Particle Physics Data: 

CDF/Dzero Discovery of top quark (1995) Data red Signal green Background blue, magenta Dzero: 17-D -> 2-D Particle Physics Data

Slide12: 

But that was then, and now is now! Today we have 2 GHz laptops, with 2 GB of memory! It is fun to deploy huge, sometimes unreliable, computational resources, that is, brains, to reduce the dimensionality of data. But perhaps it is now feasible to work directly in the original high-dimensional space, using hardware!

Signal/Background Discrimination: 

Signal/Background Discrimination The optimal solution is to compute p(S|x) = p(x|s) p(s) / [p(x|s) p(s) + p(x|B) p(B)] Every signal/background discrimination method is ultimately an algorithm to approximate this solution, or a mapping thereof. Therefore, if a method is already at the Bayes limit, no other method, however sophisticated, can do better!

Signal/Background Discrimination: 

Given D = x, y x = {x1,…xN}, y = {y1,…yN} of N training examples Infer A discriminant function f(x, w), with parameters w p(w|x, y) = p(x, y|w) p(w) / p(x, y) = p(y|x, w) p(x|w) p(w) / p(y|x) p(x) = p(y|x, w) p(w) / p(y|x) assuming p(x|w) -> p(x) Signal/Background Discrimination

Signal/Background Discrimination: 

A typical likelihood for classification: p(y|x, w) = Pi f(xi, w)y [1 – f(xi, w)]1-y where y = 0 for background events y = 1 for signal events If f(x, w) flexible enough, then maximizing p(y|x, w) with respect to w yields f = p(S|x), asymptotically. Signal/Background Discrimination

Signal/Background Discrimination: 

However, in a full Bayesian calculation one usually averages with respect to the posterior density y(x) = ∫ f(x, w) p(w|D) dw Questions: 1. Do suitably flexible functions f(x, w) exist? 2. Is there a feasible way to do the integral? Signal/Background Discrimination

Answer 1: Hilbert’s 13th Problem!: 

Answer 1: Hilbert’s 13th Problem! Prove that the following is impossible y(x,y,z) = F( A(x), B(y), C(z) ) In 1957, Kolmogorov proved the contrary conjecture y(x1,..,xn) = F( f1(x1),…,fn(xn) ) I’ll call such functions, F, Kolmogorov functions

Kolmogorov Functions: 

Kolmogorov Functions A neural network is an example of a Kolmogorov function, that is, a function capable of approximating arbitrary mappings f:RN -> U The parameters w = (u, a, v, b) are called weights

Answer 2: Use Hybrid MCMC: 

Answer 2: Use Hybrid MCMC Computational Method Generate a Markov chain (MC) of N points {w} drawn from the posterior density p(w|D) and average over the last M points. Each point corresponds to a network. Software Flexible Bayesian Modeling by Radford Neal http://www.cs.utoronto.ca/~radford/fbm.software.html

A 1-D Example: 

A 1-D Example Signal p+pbar -> t q b Background p+pbar -> W b b NN Model Class (1, 15, 1) MCMC 500 tqb + Wbb events Use last 20 networks in a MC chain of 500. x Wbb tqb

A 1-D Example : 

A 1-D Example x Dots p(S|x) = HS/(HS+HB) HS, HB, 1-D histograms Curves Individual NNs n(x, wk) Black curve < n(x, w) >

A 14-D Example (Finding Susy!): 

A 14-D Example (Finding Susy!) Transverse momentum spectra Signal: black curve Signal/Noise 1/100,000

A 14-D Example (Finding Susy!): 

A 14-D Example (Finding Susy!) Missing transverse momentum spectrum (caused by escape of neutrinos and Susy particles) Variable count 4 x (ET, h, f) + (ET, f) = 14

A 14-D Example (Finding Susy!): 

Likelihood Prior A 14-D Example (Finding Susy!) Signal 250 p+pbar -> top + anti-top (MC) events Background 250 p+pbar -> gluino gluino (MC) events NN Model Class (14, 40, 1) (641-D parameter space!) MCMC Use last 100 networks in a Markov chain of 10,000, skipping every 20.

But does it Work?: 

But does it Work? Signal to noise can reach 1/1 with an acceptable signal strength

But does it Work? : 

But does it Work? Let d(x) = N p(x|S) + N p(x|B) be the density of the data, containing 2N events, assuming, for simplicity, p(S) = p(B). A properly trained classifier y(x) approximates p(S|x) = p(x|S)/[p(x|S) + p(x|B)] Therefore, if the signal and background events are weighted with y(x), we should recover the signal density.

But does it Work? : 

But does it Work? Amazingly well !

Some Open Issues: 

Some Open Issues Why does this insane function p(w1,…,w641|x1,…,x500) behave so well? 641 parameters > 500 events! How should one verify that an n-D (n ~ 14) swarm of simulated background events matches the n-D swarm of observed events (in the background region)? How should one verify that y(x) is indeed a reasonable approximation to the Bayes discriminant, p(S|x)?

Summary: 

Summary Bayesian methods have been, and are being, used with considerable success by particle physicists. Happily, the frequentist/Bayesian Cold War is abating! The application of Bayesian methods to highly flexible functions, e.g., neural networks, is very promising and should be broadly applicable. Needed: A powerful way to compare high-dimensional swarms of points. Agree, or not agree, that is the question!