Uncertain numbers
Reliable calculation of probabilities

Scott Ferson, scott@ramas.com
Applied Biomathematics
Second Scandinavian Workshop on Interval Methods and Their Applications
Technical University of Denmark, København, 26 August 2005

Perspective:
- Elementary methods of interval analysis
  - Low-dimensional, usually static problems
  - Verified computing (but not roundoff error)
  - Huge uncertainties
- Intervals combined with probability theory
  - Total probabilities (events)
  - Probability distributions (random variables)
- Naïve methods very easy to use

Bounding probability is an old idea:
- Boole and de Morgan
- Chebyshev and Markov
- Borel and Fréchet
- Kolmogorov and Keynes
- Berger and Walley
- Williamson and Downs

Closely related to other ideas:
- Second-order probability: PBA is easier to work with and more comprehensive
- Imprecise probabilities: PBA is somewhat cruder, but a lot easier
- Robust Bayesian analysis: PBA does convolutions rather than updating

Terminology:
- Dependence = stochastic dependence (more general than repeated variables)
- Independence = stochastic independence
- Best possible = tight (almost); some elements in the set may not be possible
Incertitude:
- Arises from incomplete knowledge
- Incertitude arises from
  - limited sample size
  - measurement uncertainty
  - use of surrogate data
- Reducible with empirical effort

Variability:
- Arises from natural stochasticity
- Variability arises from
  - spatial variation
  - temporal fluctuations
  - manufacturing or genetic differences
- Not reducible by empirical effort

They must be treated differently:
- Variability should be modeled as randomness with the methods of probability theory
- Incertitude should be modeled as ignorance with the methods of interval analysis
- Imprecise probabilities can do both at once

Total probabilities (events):
- Logical expressions (Hailperin 1986)
- Fault trees
- Event trees
- Reliability analyses
- Nuclear power plants
- Aircraft safety system design
- Gene technology release assessments
- etc.

Probabilistic logic:
- Conjunctions (and)
- Disjunctions (or)
- Negations (not)
- Exclusive disjunctions (xor)
- Modus ponens (if-then)
- etc.

Conjunction (and):
P(A |&| B) = P(A) P(B)
Example: P(A) = [0.3, 0.5], P(B) = [0.1, 0.2], A and B are independent
P(A |&| B) = [0.03, 0.1]

Disjunction (or):
P(A |∨| B) = 1 − (1 − P(A))(1 − P(B))
Example: P(A) = [0.3, 0.5], P(B) = [0.1, 0.2], A and B are independent
P(A |∨| B) = [0.37, 0.6]

Disjunction (or):
P(A |∨| B) = P(A) + P(B) − P(A) P(B)
Example: P(A) = [0.3, 0.5], P(B) = [0.1, 0.2], A and B are independent
P(A |∨| B) = [0.3, 0.67]
The interval is wider than [0.37, 0.6] because P(A) and P(B) each appear twice in this form of the expression, and naive interval arithmetic treats the repeated occurrences as unrelated.

Negation (not):
P(not A) = 1 − P(A)
Example: P(A) = [0.3, 0.5]
P(not A) = [0.5, 0.7]

Stochastic dependence:
[Figure: Venn diagram of independent events. Probabilities are depicted here as the areas in Venn diagrams.]

Perfect dependence:
P(A /&/ B) = min(P(A), P(B))
P(A /∨/ B) = max(P(A), P(B))
Examples: P(A) = [0.3, 0.5], P(B) = [0.1, 0.2], A and B are perfectly dependent
P(A /&/ B) = [0.1, 0.2]
P(A /∨/ B) = [0.3, 0.5]

Opposite dependence:
P(A \&\ B) = max(P(A) + P(B) − 1, 0)
P(A \∨\ B) = min(1, P(A) + P(B))
Examples: P(A) = [0.3, 0.5], P(B) = [0.1, 0.2], A and B are oppositely dependent
P(A \&\ B) = 0
P(A \∨\ B) = [0.4, 0.7]

Correlated* events:
[Formula not preserved in the transcript] where a = P(A), b = P(B), s = tan(π(1 − r)/4)
*There are several possible definitions of correlation

Fréchet case:
P(A & B) = [max(0, P(A) + P(B) − 1), min(P(A), P(B))]
P(A ∨ B) = [max(P(A), P(B)), min(1, P(A) + P(B))]
- Makes no assumption about the dependence
- Rigorous (guaranteed to enclose true value)
- Best possible (cannot be any tighter)

The proofs are elementary:
P(A ∨ B) = P(A) + P(B) − P(A & B) implies P(A) + P(B) − P(A ∨ B) = P(A & B). P(A ∨ B) ≤ 1, since probabilities are no bigger than 1, so P(A) + P(B) − 1 ≤ P(A & B). 0 ≤ P(A & B), since probabilities are nonnegative, so max(0, P(A) + P(B) − 1) ≤ P(A & B). This gives the lower bound on the conjunction.
To get the upper bound, recall that P(A & B) = P(A|B) P(B) = P(B|A) P(A). P(A|B) ≤ 1 and P(B|A) ≤ 1, so P(A & B) ≤ P(A) and P(A & B) ≤ P(B). Therefore, P(A & B) ≤ min(P(A), P(B)), which is the upper bound.
The best-possible nature of these bounds follows from observing that they are realized by some dependency between the events A and B. Comparable bounds on the disjunction are similarly derived.

Fréchet examples:
P(A) = [0.3, 0.5], P(B) = [0.1, 0.2]
P(A & B) = [0, 0.2]
P(A ∨ B) = [0.3, 0.7]
P(C) = 0.29, P(D) = 0.22
P(C & D) = [0, 0.22]
P(C ∨ D) = [0.29, 0.51]

Sign of the dependence:
P(A &+ B) = [P(A) P(B), min(P(A), P(B))]
P(A ∨+ B) = [max(P(A), P(B)), 1 − (1 − P(A))(1 − P(B))]
P(A &− B) = [max(P(A) + P(B) − 1, 0), P(A) P(B)]
P(A ∨− B) = [1 − (1 − P(A))(1 − P(B)), min(1, P(A) + P(B))]

Example: pump system:
What’s the chance the tank ruptures under pumping?

Fault tree:
[Figure: fault tree for the top event E1 = “tank rupturing under pressurization”, from Vesely et al. (1981)]

Boolean expression:
E1 = T ∨ (K2 ∨ (S & (S1 ∨ (K1 ∨ R))))

Component          Symbol   Probability
Pressure tank      T        5 × 10⁻⁶
Relay              K2       3 × 10⁻⁵
Pressure switch    S        1 × 10⁻⁴
Relay              K1       3 × 10⁻⁵
Timer relay        R        1 × 10⁻⁴
On-switch          S1       3 × 10⁻⁵

Different dependency models:
- Vesely et al. (all variables precise, independent): E1 = T |∨| (K2 |∨| (S |&| (S1 |∨| (K1 |∨| R))))
- Mixed dependencies: E1 = T |∨| (K2 ∨ (S &r (S1 |∨| (K1 /∨/ R))))
- Correlated to Fréchet: E1 = T |∨| (K2 ∨ (S & (S1 |∨| (K1 /∨/ R))))
- All Fréchet: E1 = T ∨ (K2 ∨ (S & (S1 ∨ (K1 ∨ R))))

Interval probabilities:
Component          Symbol   Probability interval
Pressure tank      T        [4.5 × 10⁻⁶, 5.5 × 10⁻⁶]
Relay              K2       [2.5 × 10⁻⁵, 3.5 × 10⁻⁵]
Pressure switch    S        [0.5 × 10⁻⁴, 1.5 × 10⁻⁴]
Relay              K1       [2.5 × 10⁻⁵, 3.5 × 10⁻⁵]
Timer relay        R        [0.5 × 10⁻⁴, 1.5 × 10⁻⁴]
On-switch          S1       [2.5 × 10⁻⁵, 3.5 × 10⁻⁵]

Results:
Probability of tank rupturing under pumping (plotted on a logarithmic axis from 10⁻⁵ to 10⁻³)

Model                          Probability
Points, all independent        3.5 × 10⁻⁵
Mixed dependencies             [3.499 × 10⁻⁵, 3.504 × 10⁻⁵]
Correlated to Fréchet          [3.50 × 10⁻⁵, 1.35 × 10⁻⁴]
Points, all Fréchet            [3 × 10⁻⁵, 1.4 × 10⁻⁴]
Intervals, mixed dependence    [2.9 × 10⁻⁵, 4.1 × 10⁻⁵]
Intervals, all Fréchet         [2.5 × 10⁻⁵, 1.905 × 10⁻⁴]

The last row as entered in the software:
    t = [4.5e-6, 5.5e-6]
    k2 = [2.5e-5, 3.5e-5]
    s = [0.5e-4, 1.5e-4]
    k1 = [2.5e-5, 3.5e-5]
    r = [0.5e-4, 1.5e-4]
    s1 = [2.5e-5, 3.5e-5]
    e1 = t | (k2 | (s & (s1 | (k1 | r))))
    e1
    [ 2.5e-05, 0.0001905]

Strategies to handle repetitions:
- Interval analysis always gives bounds, but they may not be best possible when parameters are repeated
- Use cancellation to reduce repetitions, e.g., (A & B) ∨ (A & C) ∨ (A & D) = A & (B ∨ C ∨ D)
- When cancellation is not possible, mathematical programming is needed to get the best possible result

Subtle dependencies:
- May also require mathematical programming to obtain the best possible result
- But rigorous bounds are always easy to get with the artful use of the Fréchet rule

Problems:
- Derive an algorithm to compute the probability that n of k events occur, given intervals for the probability of each event, assuming they’re independent.
- Derive an analogous algorithm for the Fréchet case.

Distributions (random numbers):
- Arithmetic expressions, logicals and comparisons
- Stress-strength comparisons
- Vulnerability-threat-consequence calculations
- Exposure analyses
- Human health risk assessments
- Ecological risk assessments
- Financial risk assessments

Probability box (p-box):
Interval bounds on a cumulative distribution function
[Figure: a p-box; cumulative probability (0 to 1) against X (0.0 to 3.0)]

Generalization of objects:
[Figure: three panels labelled Probability distribution, Probability box and Interval; cumulative probability (0 to 1) against values from 0 to 40. The interval panel is annotated “Not a uniform distribution”.]

Generalization of methods:
- Can do arithmetic (and logic) on p-boxes
- When inputs are distributions, its answers conform with probability theory
- When inputs are intervals, it agrees with interval analysis

Probability bounds arithmetic:
What’s the sum of A+B?
[Figure: p-boxes for A (values 0 to 6) and B (values 0 to 14); vertical axes show cumulative probability from 0 to 1]

Cartesian product:
A+B under independence
A ∈ [1,3], p1 = 1/3;  A ∈ [2,4], p2 = 1/3;  A ∈ [3,5], p3 = 1/3
B ∈ [2,8], q1 = 1/3;  B ∈ [6,10], q2 = 1/3;  B ∈ [8,12], q3 = 1/3
First cell: A+B ∈ [3,11], prob = 1/9
The other eight cells are formed the same way: add the two intervals and multiply the two masses, so every cell has probability 1/9.

A+B under independence:
[Figure: p-box for A+B; cumulative probability (0.00 to 1.00) against A+B (0 to 18)]

Calculations:
All standard mathematical operations
- Arithmetic (+, −, ×, ÷, ^, min, max)
- Logical operations (and, or, not, if, etc.)
- Transformations (exp, ln, sin, tan, abs, sqrt, etc.)
- Backcalculation (deconvolutions, updating)
- Magnitude comparisons (<, ≤, >, ≥, …)
- Other operations (envelope, mixture, etc.)
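The Cartesian-product calculation for A+B under independence, described a few slides back, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm used in RAMAS Risk Calc: each p-box is assumed to be discretized into a list of (interval, mass) pairs, and the names `add_independent` and `cdf_bounds` are hypothetical helpers introduced here.

```python
from fractions import Fraction
from itertools import product

# Focal elements (interval, probability mass) taken from the Cartesian-product slide.
A = [((1, 3), Fraction(1, 3)), ((2, 4), Fraction(1, 3)), ((3, 5), Fraction(1, 3))]
B = [((2, 8), Fraction(1, 3)), ((6, 10), Fraction(1, 3)), ((8, 12), Fraction(1, 3))]


def add_independent(xs, ys):
    """Hypothetical helper: sum two discretized p-boxes assuming independence.

    Every pair of focal elements yields one cell whose interval is the
    interval sum and whose mass is the product of the two masses.
    """
    cells = []
    for ((a_lo, a_hi), p), ((b_lo, b_hi), q) in product(xs, ys):
        cells.append(((a_lo + b_lo, a_hi + b_hi), p * q))
    return cells


def cdf_bounds(cells, z):
    """Hypothetical helper: bounds on P(A+B <= z) implied by the cells."""
    # A cell certainly lies at or below z only if its whole interval does.
    lower = sum(mass for (lo, hi), mass in cells if hi <= z)
    # A cell could lie at or below z as soon as its left endpoint does.
    upper = sum(mass for (lo, hi), mass in cells if lo <= z)
    return lower, upper


cells = add_independent(A, B)
for interval, mass in cells:
    print(interval, mass)
print(cdf_bounds(cells, 12))
```

Exact fractions are used so that the nine masses of 1/9 sum to exactly one. Evaluating `cdf_bounds` over a grid of z values traces out the left and right edges of the p-box for A+B.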
These calculations are faster than Monte Carlo and guaranteed to bound the answer, and optimal solutions are often easy to compute.

Dike reliability:
[Figure: cross-section of a dike showing the wave, sea level, revetment blocks of thickness D, and clay layer]

Case study: dike revetment:
Reliability depends on the density and thickness of its facing masonry.

First set of inputs:
- relative density of the revetment blocks Δ = [1.60, 1.65]
- revetment blocks thickness D = [0.68, 0.72] m
- slope of the revetment α = atan([0.32, 0.34])
- model parameter M = [3.0, 5.2]
- significant wave height H = Gumbel(1.4 m, 0.12 m)
- offshore peak wave steepness s = normal([0.036], [0.004]) %

Second set of inputs:
- relative density of the revetment blocks Δ = [1.60, 1.65]
- revetment blocks thickness D = [0.68, 0.72] m
- slope of the revetment α = atan([0.32, 0.34])
- model parameter M = [3.0, 5.2]
- significant wave height H = {([1,1.5], 1/8), ([1.1,1.5], ¼), ([1.3, 1.6], ¼), ([1.3, 1.4]), ([1.5, 1.7],1/8)} m
- offshore peak wave steepness s = {([3,3.6], 1/20), ([3.4,4.2], 9/20), ([3.9, 4], 9/20), ([4.5, 4.8], 1/20)} %

Reliability function:
$$Z = \Delta D - \frac{H \tan(\alpha)}{\cos(\alpha)\, M \sqrt{s}}$$
(all variables are independent)
- The risk that Z is less than zero is less than 0.091
- The risk that Z is less than zero is less than about 0.05

What if distribution shape is unknown?
- Very challenging for sensitivity analysis, since it is an infinite-dimensional problem
- Bayesians usually fall back on a maximum entropy approach, which erases uncertainty rather than propagating it
- Bounding seems most reasonable, but should reflect all available information

What about other dependencies?
- Independent
- Perfectly positive (maximally correlated)
- Opposite (minimally correlated)
- Correlated
- Correlated, with interval correlation
- Unknown dependence (Fréchet case)

Perfect dependence:
A+B, perfect positive dependence

                         A ∈ [1,3], p1 = 1/3    A ∈ [2,4], p2 = 1/3    A ∈ [3,5], p3 = 1/3
B ∈ [2,8],  q1 = 1/3     [3,11], prob = 1/3     [4,12], prob = 0       [5,13], prob = 0
B ∈ [6,10], q2 = 1/3     [7,13], prob = 0       [8,14], prob = 1/3     [9,15], prob = 0
B ∈ [8,12], q3 = 1/3     [9,15], prob = 0       [10,16], prob = 0      [11,17], prob = 1/3

Opposite dependence:
A+B, opposite dependence

                         A ∈ [1,3], p1 = 1/3    A ∈ [2,4], p2 = 1/3    A ∈ [3,5], p3 = 1/3
B ∈ [2,8],  q1 = 1/3     [3,11], prob = 0       [4,12], prob = 0       [5,13], prob = 1/3
B ∈ [6,10], q2 = 1/3     [7,13], prob = 0       [8,14], prob = 1/3     [9,15], prob = 0
B ∈ [8,12], q3 = 1/3     [9,15], prob = 1/3     [10,16], prob = 0      [11,17], prob = 0

Perfect and opposite dependencies:
[Figure not preserved in the transcript]

What if dependence is unknown?
- Suppose X, Y ~ uniform(0,24) but we don’t know the dependence between X and Y
- A sensitivity analysis might vary the correlation coefficient between −1 and +1

Varying the correlation coefficient:
[Figure: cumulative probability (0 to 1) of X+Y (0 to 50) for X, Y ~ uniform(0,24), two panels]

Counterexample:
[Figure: cumulative probability (0 to 1) of X+Y (0 to 50) for X, Y ~ uniform(0,24), two panels]

Unknown dependence:
- No sensitivity study works in this case (even with an infinite number of trials)
- Jan Hesthaven
- Tail risks can be seriously underestimated
- Only probability bounds analysis works

Fréchet (1935) inequalities:
max(0, P(A) + P(B) − 1) ≤ P(A & B) ≤ min(P(A), P(B))
max(P(A), P(B)) ≤ P(A ∨ B) ≤ min(1, P(A) + P(B))

Fréchet case (no assumption):
A+B, Fréchet case

                         A ∈ [1,3], p1 = 1/3       A ∈ [2,4], p2 = 1/3       A ∈ [3,5], p3 = 1/3
B ∈ [2,8],  q1 = 1/3     [3,11], prob = [0,1/3]    [4,12], prob = [0,1/3]    [5,13], prob = [0,1/3]
B ∈ [6,10], q2 = 1/3     [7,13], prob = [0,1/3]    [8,14], prob = [0,1/3]    [9,15], prob = [0,1/3]
B ∈ [8,12], q3 = 1/3     [9,15], prob = [0,1/3]    [10,16], prob = [0,1/3]   [11,17], prob = [0,1/3]

Naïve Fréchet case:
[Figure: p-box for A+B; cumulative probability (0 to 1) against A+B (0 to 18)]
This p-box is not best possible

Naïve Fréchet can be improved:
- Interval estimates of probabilities don’t reflect the fact that the sum must equal one
- Resulting p-box is too fat
- Linear programming can be used to tighten it
- Frank, Nelsen and Sklar (implemented by Williamson) gave a way to compute the optimal answer directly

Frank, Nelsen and Sklar (1987):
Suppose X ~ F and Y ~ G. If X and Y are independent, then the distribution of X+Y is
$$F_{X+Y}(z) = \int F(z - y)\, dG(y)$$
Irrespective of their dependence, this distribution is bounded by
$$\sup_{x+y=z} \max\bigl(F(x) + G(y) - 1,\ 0\bigr) \;\le\; F_{X+Y}(z) \;\le\; \inf_{x+y=z} \min\bigl(F(x) + G(y),\ 1\bigr)$$
This formula can be generalized to work with p-boxes for F and G.

Best possible bounds:
[Figure: p-box for A+B; cumulative probability (0 to 1) against A+B (0 to 18)]

Example: mercury in wild mink:
- Location: Bayou d’Inde, Louisiana
- Receptor: generic piscivorous small mammal
- Contaminant: mercury
- Exposure route: diet (fish and invertebrates)
Based loosely on the assessment described in “Appendix I2: Assessment of Risks to Piscivorus [sic] Mammals in the Calcasieu Estuary”, Calcasieu Estuary Remedial Investigation/Feasibility Study (RI/FS): Baseline Ecological Risk Assessment (BERA), prepared October 2002 for the U.S. Environmental Protection Agency.
See http://www.epa.gov/earth1r6/6sf/pdffiles/appendixi2.pdf.

Total daily intake from diet:
    FMR = lognormal([90,120] Kcal/kg/day, [22,30] Kcal/kg/day)   // normalized free metabolic rate
    BW = normal(608 gram, 66.9 gram)                             // body mass of the mammal
    AEfish = minmaxmean(0.77, 0.98, 0.91)                        // assimilation efficiency
    AEinverts = minmaxmean(0.72, 0.96, 0.87)                     // assimilation efficiency
    GEfish = normal(1200 Kcal per kg, 240 Kcal per kg)           // gross energy of fish tissues
    GEinverts = normal(1050 Kcal per kg, 225 Kcal per kg)        // gross energy of invertebrate tissue
    Cfish = [0.1,0.3] mg per kg                                  // mercury concentration in fish tissue
    Cinverts = [0.02, 0.06] mg per kg                            // mercury concentration in invert tissue
    Pfish = 0.90                                                 // proportion of fish in mammal’s diet
    Pinverts = 0.10                                              // proportion of invertebrates in mammal’s diet
The slide shows two specifications for the concentrations, listed in this order:
    lognormal(0.196 mg per kg, 0.0213 mg per kg)
    lognormal(0.0438 mg per kg, 0.00695 mg per kg)
    Cfish = [0.1,0.3] mg per kg
    Cinverts = [0.02, 0.06] mg per kg

Input p-boxes:
[Figure: ten panels showing the inputs as exceedance risk (complementary cumulative probability): BW (400 to 800), P2 (0.1 to 0.12), P1 (0.8 to 1), C2 (0.02 to 0.06), C1 (0.1 to 0.4), FMR (0 to 300), GE1 (0 to 2000), GE2 (0 to 2000), AE1 (0.7 to 1), AE2 (0.7 to 1)]
Subscript 1 denotes fish, 2 denotes inverts

Results:
[Figure: exceedance risk (0 to 1) against TDI, mg kg⁻¹ day⁻¹ (0 to 0.2)]
- mean [0.0072, 0.038]
- median [0.0065, 0.038]
- 95th percentile [0.011, 0.065]
- standard deviation [0.0017, 0.022]

Example: exotic pest establishment:
F = A & B & C & D
- Probability of arriving in the right season
- Probability of having both sexes present
- Probability there’s a suitable host
- Probability of surviving the next winter

Imperfect information:
Calculate A & B & C & D, with partial information:
- A’s distribution is known, but not its parameters
- B’s parameters known, but not its shape
- C has a small empirical data set
- D is known to be a precise distribution
Bounds assuming independence?
Without any assumption about dependence?

Slide66:
A = {lognormal, mean = [.05,.06], variance = [.0001,.001]}
B = {min = 0, max = 0.05, mode = 0.03}
C = {sample data = 0.2, 0.5, 0.6, 0.7, 0.75, 0.8}
D = uniform(0, 1)

    A = lognormal([.05,.06], sqrt([.0001,.001]))
    B = minmaxmode(0, 0.05, .03)
    B = max(B, 0.000001)
    C = histogram(0.001, .9999, .2, .5, .6, .7, .75, .8)
    D = uniform(0.0001, .9999)
    f = A |&| B |&| C |&| D
    f
    ~(range=[9.48437e-14,0.0109203], mean=[0.00006,0.00119], var=[2.90243743e-09,0.00000208])
    fi = A & B & C & D
    fi
    ~(range=[0,0.05], mean=[0,0.04], var=[0,0.00052])
    show fi, f
    fi ~(range=[0,0.05], mean=[0,0.04], var=[0,0.00052])
    f ~(range=[9.48437e-14,0.0109203], mean=[0.00006,0.00119], var=[2.90243743e-09,0.00000208])

[Figure: four CCDF panels for the inputs: A (0 to 0.3), B (0 to 0.06), C (0 to 1), D (0 to 1)]

Resulting probability:
[Figure: two panels of exceedance risk (0 to 1), one over 0 to 0.06 and one over 0 to 0.02]

Summary statistics:
                       Independent            No assumptions about dependence
Range                  [0, 0.011]             [0, 0.05]
Median                 [0, 0.00113]           [0, 0.04]
Mean                   [0.00006, 0.00119]     [0, 0.04]
Variance               [2.9 × 10⁻⁹, 2.1 × 10⁻⁶]   [0, 0.00052]
Standard deviation     [0.000054, 0.0014]     [0, 0.023]

Uncertainty about dependence:
- Impossible with sensitivity analysis, since it is an infinite-dimensional problem
- Probability bounds analysis lets you be sure
- Can be a large or a small consequence
- (Other dependencies can also be handled)

Assumptions:
Everyone makes assumptions, but not all sets of assumptions are equal!
- Relation: linear, monotonic, any relation
- Distribution: Gaussian, unimodal, any distribution
- Dependence: independence, known correlation, any dependence
PBA doesn’t require unwarranted assumptions

Probability bounds analysis:
- It’s not worst case analysis (distribution tails)
- Marries intervals with probability theory
- Distinguishes variability and incertitude
- Solves many problems in uncertainty analysis
  - Input distributions unknown
  - Imperfectly known correlation and dependency
  - Large measurement error, censoring, small sample sizes
  - Model uncertainty

Probability can be wrong:
- Probability theory doesn’t account for gross uncertainty correctly
- Precision of the answer (measured as the coefficient of variation, cv) depends strongly on the number of inputs and not so strongly on their distribution shapes, even if they are uniforms or flat priors
- The more inputs, the tighter the answer

A few grossly uncertain inputs:
[Figure not preserved in the transcript]

A lot of grossly uncertain inputs...:
Where does this surety come from? What justifies it?

“Smoke and mirrors” certainty:
- Probability makes certainty out of nothing
- It has an inadequate model of ignorance
- Probability bounds analysis gives a vacuous answer if all you give it are vacuous inputs

Conclusions:
- Interval analysis has an inadequate model of dependence
- Probability theory has an inadequate model of ignorance
- Probability bounds analysis corrects both and does things sensitivity studies cannot
- PBA is much simpler computationally than IP (imprecise probabilities)

Software:
- RAMAS Risk Calc 4.0
- StatTool
- Williamson and Downs (1990)

Applications:
- Superfund risk assessments for human health
- Ecological risk assessments, endangered spp.
- Sandia National Labs
- Climate modeling (PIK)

Acknowledgments:
- Vladik Kreinovich
- Roger Nelsen
- Dan Berleant
- Lev Ginzburg
- U.S. National Institutes of Health
- Sandia National Laboratories

Classic references:
- Boole, G. 1854. An Investigation of the Laws of Thought, On Which Are Founded the Mathematical Theories of Logic and Probability. Walton and Maberly, London.
- Fréchet, M. 1935. Généralisations du théorème des probabilités totales. Fundamenta Mathematicae 25: 379–387.
- Hailperin, T. 1986. Boole’s Logic and Probability. North-Holland, Amsterdam.
- Williamson, R.C. and T. Downs. 1990. Probabilistic arithmetic I: numerical methods for calculating convolutions and dependency bounds. International Journal of Approximate Reasoning 4: 89–158.

Web-accessible reading:
- http://www.sandia.gov/epistemic/Reports/SAND2002-4015.pdf (introduction to p-boxes and related structures)
- http://www.ramas.com/pbawhite.pdf (introduction for Monte Carlo jocks)
- http://www.ramas.com/depend.pdf (handling dependencies in uncertainty calculations)
- http://www.ramas.com/bayes.pdf (Bayesian methods in risk analysis)
- http://www.ramas.com/intstats.pdf (statistics for data that may contain interval uncertainty)
- scott@ramas.com (other questions and comments welcome)
ferson Rosalie Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 148 Category: Entertainment License: All Rights Reserved Like it (0) Dislike it (0) Added: November 09, 2007 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Uncertain numbersReliable calculation of probabilities: Uncertain numbers Reliable calculation of probabilities Scott Ferson scott@ramas.com Applied Biomathematics Second Scandinavian Workshop on Interval Methods and Their Applications Technical University of Denmark, København, 26 August 2005Perspective: Perspective Elementary methods of interval analysis Low-dimensional, usually static problems Verified computing (but not roundoff error) Huge uncertainties Intervals combined with probability theory Total probabilities (events) Probability distributions (random variables) Naïve methods very easy to useBounding probability is an old idea: Bounding probability is an old idea Boole and de Morgan Chebyshev and Markov Borel and Fréchet Kolmogorov and Keynes Berger and Walley Williamson and Downs Closely related to other ideas: Closely related to other ideas Second-order probability PBA is easier to work with and more comprehensive Imprecise probabilities PBA is somewhat cruder, but a lot easier Robust Bayesian analysis PBA does convolutions rather than updatingTerminology: Terminology Dependence = stochastic dependence More general than repeated variables Independence = stochastic independence Best possible = tight (almost) some elements in the set may not be possibleIncertitude: Incertitude Arises from incomplete 
knowledge Incertitude arises from limited sample size measurement uncertainty use of surrogate data Reducible with empirical effort Variability: Variability Arises from natural stochasticity Variability arises from spatial variation temporal fluctuations manufacturing or genetic differences Not reducible by empirical effort They must be treated differently: They must be treated differently Variability should be modeled as randomness with the methods of probability theory Incertitude should be modeled as ignorance with the methods of interval analysis Imprecise probabilities can do both at onceTotal probabilities (events): Total probabilities (events) Logical expressions (Hailperin 1986) Fault trees Event trees Reliability analyses Nuclear power plants Aircraft safety system design Gene technology release assessments etc.Probabilistic logic: Probabilistic logic Conjunctions (and) Disjunctions (or) Negations (not) Exclusive disjunctions (xor) Modus ponens (if-then) etc.Conjunction (and): Conjunction (and) P(A |&| B) = P(A) P(B) Example: P(A) = [0.3, 0.5] P(B) = [0.1, 0.2] A and B are independent P(A |&| B) = [0.03, 0.1]Disjunction (or): Disjunction (or) P(A || B) = 1 (1 P(A))(1 P(B)) Example: P(A) = [0.3, 0.5] P(B) = [0.1, 0.2] A and B are independent P(A || B) = [0.37, 0.6] Disjunction (or): Disjunction (or) P(A || B) = P(A) + P(B) P(A) P(B) Example: P(A) = [0.3, 0.5] P(B) = [0.1, 0.2] A and B are independent P(A || B) = [0.3, 0.67] Negation (not): Negation (not) P(not A) = (1 P(A)) Example: P(A) = [0.3, 0.5] P(not A) = [0.5, 0.7] Stochastic dependence: Stochastic dependence Independent Probabilities are depicted here as the areas in Venn diagramsPerfect dependence: Perfect dependence P(A /&/ B) = min(P(A), P(B)) P(A // B) = max(P(A), P(B)) Examples: P(A) = [0.3, 0.5] P(B) = [0.1, 0.2] A and B are perfectly dependent P(A /&/ B) = [0.1, 0.2] P(A // B) = [0.3, 0.5] Opposite dependence: Opposite dependence P(A \&\ B) = max(P(A) + P(B) 1, 0) P(A \\ B) = min(1, P(A) + 
P(B)) Examples: P(A) = [0.3, 0.5] P(B) = [0.1, 0.2] A and B are oppositely dependent P(A \&\ B) = 0 P(A \\ B) = [0.4, 0.7] Correlated* events: Correlated* events where a = P(A), b =P(B), s = tan((1r)/4) *There are several possible definitions of correlation Fréchet case: P(A&B)=[max(0, P(A)+ P(B)–1), min(P(A), P(B))] P(AB)=[max(P(A), P(B)), min(1, P(A)+ P(B))] Makes no assumption about the dependence Rigorous (guaranteed to enclose true value) Best possible (cannot be any tighter) Fréchet caseThe proofs are elementary: The proofs are elementary P(A B) = P(A) + P(B) P(A & B) implies P(A) + P(B) P(A B) = P(A & B). P(A B) 1, since probabilities are no bigger than 1, so P(A) + P(B) 1 P(A & B). 0 P(A & B), since probabilities are positive, so max(0, P(A) + P(B) 1) P(A & B). This gives the lower bound on the conjunction. To get the upper bound, recall that P(A & B) = P(A|B) P(B) = P(B|A) P(A). P(A|B) 1 and P(B|A) 1, so P(A & B) P(A) and P(A & B) P(B). Therefore, P(A & B) min(P(A), P(B)), which is the upper bound. The best-possible nature of these bounds follows from observing that they are realized by some dependency between the events A and B. Comparable bounds on the disjunction are similarly derived.Fréchet examples: Fréchet examples Examples: P(A) = [0.3, 0.5] P(B) = [0.1, 0.2] P(A & B) = [0, 0.2] P(A B) = [0.3, 0.7] P(C) = 0.29 P(D) = 0.22 P(C & D) = [0, 0.22] P(C D) = [0.29, 0.51]Sign of the dependence: Sign of the dependence P(A &+ B) = [P(A) P(B), min(P(A), P(B))] P(A + B) = [1(1P(A))(1P(B)), max(P(A), P(B))] P(A & B) = [max(P(A)+ P(B) 1, 0), P(A) P(B)] P(A B) = [1(1P(A))(1P(B)),min(1,P(A)+P(B))] Example: pump system : Example: pump system What’s the chance the tank ruptures under pumping?Fault tree: E1 = “tank rupturing under pressurization” Fault tree Vesely et al. 
(1981)Boolean expression: Boolean expression E1 = T (K2 (S & (S1 (K1 R)))) Component Pressure tank T Relay K2 Pressure switch S Relay K1 Timer relay R On-switch S1 Probability 5 106 3 105 1 104 3 105 1 104 3 105Different dependency models: Different dependency models Vesely et al. (all variables precise, independent) E1 = T || (K2 || (S |&| (S1 || (K1 || R)))) Mixed dependencies E1 = T || (K2 (S &r (S1 || (K1 // R)))) Correlated to Fréchet E1 = T || (K2 (S & (S1 || (K1 // R)))) All Fréchet E1 = T (K2 (S & (S1 (K1 R)))) Interval probabilities: Interval probabilities Component Pressure tank T Relay K2 Pressure switch S Relay K1 Timer relay R On-switch S1 Probability interval [4.5 106, 5.5 106] [2.5 105, 3.5 105] [0.5 104, 1.5 104] [2.5 105, 3.5 105] [0.5 104, 1.5 104] [2.5 105, 3.5 105]Results: 105 104 103 Probability of tank rupturing under pumping Results 3.5105 [3.499105, 3.504105] [3.50105, 1.35104] [3105, 1.4104] [2.9105, 4.1105] [2.5105, 1.905104] [ 2.5e-05, 0.0001905] Points, all independent Mixed dependencies Correlated to Fréchet Points, all Fréchet Intervals, mixed dependence Intervals, all Fréchet t=[4.5e-6, 5.5e-6] k2=[2.5e-5, 3.5e-5] s=[0.5e-4, 1.5e-4] k1=[2.5e-5, 3.5e-5] r=[0.5e-4, 1.5e-4] s1=[2.5e-5, 3.5e-5] e1 = t | (k2 | (s & (s1 | (k1 | r)))) e1 [ 2.5e-05, 0.0001905] Strategies to handle repetitions: Strategies to handle repetitions Interval analysis is always bounds, but maybe not best possible when parameters are repeated Use cancellation to reduce repetitions, e.g., (A & B) (A & C) (A & D) = A & (B C D) When cancellation is not possible, mathematical programming is needed to get best possible resultSubtle dependencies: Subtle dependencies May also require mathematical programming to obtain the best possible result But rigorous bounds are always easy to get with the artful use of the Fréchet ruleProblems: Problems Derive an algorithm to compute the probability that n of k events occur, given intervals for the probability of each event, assuming 
they’re independent. Derive an analogous algorithm for the Fréchet case.Distributions (random numbers): Distributions (random numbers) Arithmetic expressions logicals and comparisons Stress-strength comparisons Vulnerability-threat-consequence calculations Exposure analyses Human health risk assessments Ecological risk assessments Financial risk assessmentsProbability box (p-box): Probability box (p-box) 0 1 1.0 2.0 3.0 0.0 X Cumulative probability Interval bounds on an cumulative distribution functionGeneralization of objects: Generalization of objects Not a uniform distribution Cumulative probability 0 10 20 30 40 0 1 Probability distribution Probability box IntervalGeneralization of methods: Generalization of methods Can do arithmetic (and logic) on p-boxes When inputs are distributions, its answers conform with probability theory When inputs are intervals, it agrees with interval analysisProbability bounds arithmetic: Probability bounds arithmetic A B What’s the sum of A+B? 0 1 0 2 4 6 8 10 12 14 Cumulative Probability 0 1 0 1 2 3 4 5 6 Cumulative ProbabilityCartesian product: Cartesian product A+B independence A[1,3] p1 = 1/3 A[3,5] p3 = 1/3 A[2,4] p2 = 1/3 B[2,8] q1 = 1/3 B[8,12] q3 = 1/3 B[6,10] q2 = 1/3 A+B[3,11] prob=1/9A+B under independence: A+B under independence 0 3 6 9 12 18 0.00 0.25 0.50 0.75 1.00 15 A+B Cumulative probabilityCalculations: Calculations All standard mathematical operations Arithmetic (+, , ×, ÷, ^, min, max) Logical operations (and, or, not, if, etc.) Transformations (exp, ln, sin, tan, abs, sqrt, etc.) Backcalculation (deconvolutions, updating) Magnitude comparisons (<, ≤, >, ≥, ) Other operations (envelope, mixture, etc.) 
Faster than Monte Carlo Guaranteed to bounds answer Optimal solutions often easy to computeDike reliability: Dike reliability D wave sea level revetment blocks clay layerCase study: dike revetment: Case study: dike revetment Reliability depends on the density and thickness of its facing masonry relative density of the revetment blocks = [1.60, 1.65] revetment blocks thickness D = [0.68, 0.72] m slope of the revetment = atan([0.32, 0.34]) model parameter M = [3.0, 5.2] significant wave height H = Gumbel(1.4 m, 0.12 m) offshore peak wave steepness s = normal([0.036], [0.004]) % Reliability depends on the density and thickness of its facing masonry relative density of the revetment blocks = [1.60, 1.65] revetment blocks thickness D = [0.68, 0.72] m slope of the revetment = atan([0.32, 0.34]) model parameter M = [3.0, 5.2] significant wave height H = {([1,1.5], 1/8), ([1.1,1.5], ¼), ([1.3, 1.6], ¼), ([1.3, 1.4]), ([1.5, 1.7],1/8)} m offshore peak wave steepness s = {([3,3.6], 1/20), ([3.4,4.2], 9/20), ([3.9, 4], 9/20), ([4.5, 4.8], 1/20)} %Reliability function: Reliability function H tan() Z = D —————— (all variables are independent) cos() M s The risk Z is less than zero is less than 0.091 The risk Z is less than zero is less than about 0.05What if distribution shape is unknown?: What if distribution shape is unknown? Very challenging for sensitivity analysis since infinite-dimensional problem Bayesians usually fall back on a maximum entropy approach, which erases uncertainty rather than propagates it Bounding seems most reasonable, but should reflect all available informationWhat about other dependencies?: What about other dependencies? 
Independent
Perfectly positive (maximally correlated)
Opposite (minimally correlated)
Correlated
Correlated, with interval correlation
Unknown dependence (Fréchet case)

Perfect dependence:
A+B, perfect positive dependence
                     A[1,3] p1=1/3    A[2,4] p2=1/3    A[3,5] p3=1/3
B[2,8]  q1=1/3       [3,11]  1/3      [4,12]  0        [5,13]  0
B[6,10] q2=1/3       [7,13]  0        [8,14]  1/3      [9,15]  0
B[8,12] q3=1/3       [9,15]  0        [10,16] 0        [11,17] 1/3

Opposite dependence:
A+B, opposite dependence
                     A[1,3] p1=1/3    A[2,4] p2=1/3    A[3,5] p3=1/3
B[2,8]  q1=1/3       [3,11]  0        [4,12]  0        [5,13]  1/3
B[6,10] q2=1/3       [7,13]  0        [8,14]  1/3      [9,15]  0
B[8,12] q3=1/3       [9,15]  1/3      [10,16] 0        [11,17] 0

Perfect and opposite dependencies:
[Figure: perfect and opposite dependencies]

What if dependence is unknown?:
Suppose X, Y ~ uniform(0,24) but we don’t know the dependence between X and Y
A sensitivity analysis might vary the correlation coefficient between −1 and +1

Varying the correlation coefficient:
[Figure: cumulative probability of X+Y for X, Y ~ uniform(0,24)]

Counterexample:
[Figure: cumulative probability of X+Y for X, Y ~ uniform(0,24)]

Unknown dependence:
No sensitivity study works in this case (even with an infinite number of trials)
Jan Hesthaven
Tail risks can be seriously underestimated
Only probability bounds analysis works

Fréchet (1935) inequalities:
max(0, P(A)+P(B)−1) ≤ P(A & B) ≤ min(P(A), P(B))
max(P(A), P(B)) ≤ P(A ∨ B) ≤ min(1, P(A)+P(B))

Fréchet case (no assumption):
A+B, Fréchet case
                     A[1,3] p1=1/3    A[2,4] p2=1/3    A[3,5] p3=1/3
B[2,8]  q1=1/3       [3,11]  [0,1/3]  [4,12]  [0,1/3]  [5,13]  [0,1/3]
B[6,10] q2=1/3       [7,13]  [0,1/3]  [8,14]  [0,1/3]  [9,15]  [0,1/3]
B[8,12] q3=1/3       [9,15]  [0,1/3]  [10,16] [0,1/3]  [11,17] [0,1/3]

Naïve Fréchet case:
[Figure: cumulative probability of A+B]
This p-box is not best possible

Naïve Fréchet can be improved:
Interval estimates of probabilities don’t reflect the fact that the sum must equal one
Resulting p-box is too fat
Linear programming can be used to tighten
Frank, Nelsen and Sklar (implemented by Williamson) gave a way to compute the optimal answer directly

Frank, Nelsen and Sklar (1987):
Suppose X ~ F and Y ~ G. If X and Y are independent, then the distribution of X+Y is the convolution
F_{X+Y}(z) = ∫ F(z − y) dG(y)
Irrespective of their dependence, this distribution is bounded by
sup_{x+y=z} max(F(x) + G(y) − 1, 0) ≤ F_{X+Y}(z) ≤ inf_{x+y=z} min(F(x) + G(y), 1)
This formula can be generalized to work with p-boxes for F and G.

Best possible bounds:
[Figure: cumulative probability of A+B]

Example: mercury in wild mink:
Location: Bayou d’Inde, Louisiana
Receptor: generic piscivorous small mammal
Contaminant: mercury
Exposure route: diet (fish and invertebrates)
Based loosely on the assessment described in “Appendix I2: Assessment of Risks to Piscivorus [sic] Mammals in the Calcasieu Estuary”, Calcasieu Estuary Remedial Investigation/Feasibility Study (RI/FS): Baseline Ecological Risk Assessment (BERA), prepared October 2002 for the U.S. Environmental Protection Agency.
See http://www.epa.gov/earth1r6/6sf/pdffiles/appendixi2.pdf.

Total daily intake from diet:
FMR = lognormal([90,120] Kcal/kg/day, [22,30] Kcal/kg/day)   //normalized free metabolic rate
BW = normal(608 gram, 66.9 gram)   //body mass of the mammal
AEfish = minmaxmean(0.77, 0.98, 0.91)   //assimilation efficiency
AEinverts = minmaxmean(0.72, 0.96, 0.87)   //assimilation efficiency
GEfish = normal(1200 Kcal per kg, 240 Kcal per kg)   //gross energy of fish tissues
GEinverts = normal(1050 Kcal per kg, 225 Kcal per kg)   //gross energy of invertebrate tissue
Cfish = [0.1,0.3] mg per kg   //mercury concentration in fish tissue
Cinverts = [0.02, 0.06] mg per kg   //mercury concentration in invert tissue
Pfish = 0.90   //proportion of fish in mammal’s diet
Pinverts = 0.10   //proportion of invertebrates in mammal’s diet
lognormal(0.196 mg per kg, 0.0213 mg per kg)
lognormal(0.0438 mg per kg, 0.00695 mg per kg)

Input p-boxes:
[Figure: input p-boxes for BW, P1, P2, C1, C2, FMR, GE1, GE2, AE1 and AE2, plotted as exceedance risk (complementary cumulative probability)]
Subscript 1 denotes fish, 2 denotes inverts

Results:
[Figure: exceedance risk versus TDI, mg kg⁻¹ day⁻¹]
mean [0.0072, 0.038]
median [0.0065, 0.038]
95th percentile [0.011, 0.065]
standard deviation [0.0017, 0.022]

Example: exotic pest establishment:
F = A & B & C & D
Probability of arriving in the right season
Probability of having both sexes present
Probability there’s a suitable host
Probability of surviving the next winter

Imperfect information:
Calculate A & B & C & D, with partial information:
A’s distribution is known, but not its parameters
B’s parameters known, but not its shape
C has a small empirical data set
D is known to be a precise distribution
Bounds assuming independence?
Without any assumption about dependence?

A = {lognormal, mean = [.05,.06], variance = [.0001,.001]}
B = {min = 0, max = 0.05, mode = 0.03}
C = {sample data = 0.2, 0.5, 0.6, 0.7, 0.75, 0.8}
D = uniform(0, 1)

A = lognormal([.05,.06], sqrt([.0001,.001]))
B = minmaxmode(0, 0.05, .03)
B = max(B, 0.000001)
C = histogram(0.001, .9999, .2, .5, .6, .7, .75, .8)
D = uniform(0.0001, .9999)
f = A |&| B |&| C |&| D
f ~(range=[9.48437e-14,0.0109203], mean=[0.00006,0.00119], var=[2.90243743e-09,0.00000208])
fi = A & B & C & D
fi ~(range=[0,0.05], mean=[0,0.04], var=[0,0.00052])
show fi, f
fi ~(range=[0,0.05], mean=[0,0.04], var=[0,0.00052])
f ~(range=[9.48437e-14,0.0109203], mean=[0.00006,0.00119], var=[2.90243743e-09,0.00000208])
[Figure: CCDFs of the inputs A, B, C and D]

Resulting probability:
[Figure: two panels of exceedance risk for the resulting probability]

Summary statistics:
                     Independent              No assumptions about dependence
Range                [0, 0.011]               [0, 0.05]
Median               [0, 0.00113]             [0, 0.04]
Mean                 [0.00006, 0.00119]       [0, 0.04]
Variance             [2.9×10⁻⁹, 2.1×10⁻⁶]     [0, 0.00052]
Standard deviation   [0.000054, 0.0014]       [0, 0.023]

Uncertainty about dependence:
Impossible with sensitivity analysis, since it is an infinite-dimensional problem
Probability bounds analysis lets you be sure
Can be a large or a small consequence
(Other dependencies can also be handled)

Assumptions:
Everyone makes assumptions, but not all sets of assumptions are equal!
Linear
Gaussian
Independence
Monotonic
Unimodal
Known correlation
Any relation
Any distribution
Any dependence
PBA doesn’t require unwarranted assumptions

Probability bounds analysis:
It’s not worst case analysis (distribution tails)
Marries intervals with probability theory
Distinguishes variability and incertitude
Solves many problems in uncertainty analysis
Input distributions unknown
Imperfectly known correlation and dependency
Large measurement error, censoring, small sample sizes
Model uncertainty

Probability can be wrong:
Probability theory doesn’t account for gross uncertainty correctly
Precision of the answer (measured as cv) depends strongly on the number of inputs and not so strongly on their distribution shapes, even if they are uniforms or flat priors
The more inputs, the tighter the answer

A few grossly uncertain inputs:
[Figure]

A lot of grossly uncertain inputs...:
Where does this surety come from?
What justifies it?

“Smoke and mirrors” certainty:
Probability makes certainty out of nothing
It has an inadequate model of ignorance
Probability bounds analysis gives a vacuous answer if all you give it are vacuous inputs

Conclusions:
Interval analysis has an inadequate model of dependence
Probability theory has an inadequate model of ignorance
Probability bounds analysis corrects both and does things sensitivity studies cannot
PBA is much simpler computationally than IP

Software:
RAMAS Risk Calc 4.0
StatTool
Williamson and Downs (1990)

Applications:
Superfund risk assessments for human health
Ecological risk assessments, endangered spp.
Sandia National Labs
Climate modeling (PIK)

Acknowledgments:
Vladik Kreinovich
Roger Nelsen
Dan Berleant
Lev Ginzburg
U.S. National Institutes of Health
Sandia National Laboratories

Classic references:
Boole, G. 1854.
An Investigation of the Laws of Thought, On Which Are Founded the Mathematical Theories of Logic and Probability. Walton and Maberly, London.
Fréchet, M. 1935. Généralisations du théorème des probabilités totales. Fundamenta Mathematicae 25: 379–387.
Hailperin, T. 1986. Boole’s Logic and Probability. North-Holland, Amsterdam.
Williamson, R.C. and T. Downs. 1990. Probabilistic arithmetic I: numerical methods for calculating convolutions and dependency bounds. International Journal of Approximate Reasoning 4: 89–158.

Web-accessible reading:
http://www.sandia.gov/epistemic/Reports/SAND2002-4015.pdf (introduction to p-boxes and related structures)
http://www.ramas.com/pbawhite.pdf (introduction for Monte Carlo jocks)
http://www.ramas.com/depend.pdf (handling dependencies in uncertainty calculations)
http://www.ramas.com/bayes.pdf (Bayesian methods in risk analysis)
http://www.ramas.com/intstats.pdf (statistics for data that may contain interval uncertainty)
scott@ramas.com (other questions and comments welcome)

End
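
Worked sketch of the Cartesian-product arithmetic:
The A+B examples in the slides (independence and perfect dependence) can be reproduced with a few lines of Python. This is only an illustrative sketch, not the RAMAS Risk Calc or Williamson and Downs implementation: the function names add_independent, add_perfect and cdf_bounds are hypothetical, and each input is assumed to be a list of (interval, probability) pairs with equal probabilities, as in the slides.

```python
from fractions import Fraction
from itertools import product

# (interval, probability) pairs for A and B, taken from the slides' example.
third = Fraction(1, 3)
A = [((1, 3), third), ((2, 4), third), ((3, 5), third)]
B = [((2, 8), third), ((6, 10), third), ((8, 12), third)]


def add_independent(a, b):
    """Cartesian product: add every pair of intervals, multiply their probabilities."""
    return [((al + bl, ah + bh), p * q)
            for ((al, ah), p), ((bl, bh), q) in product(a, b)]


def add_perfect(a, b):
    """Perfect positive dependence: pair intervals of equal rank.

    Assumes (as in the slides) that a and b have the same number of
    equally probable intervals, so the k-th smallest of each are paired.
    """
    return [((al + bl, ah + bh), p)
            for ((al, ah), p), ((bl, bh), _) in zip(sorted(a), sorted(b))]


def cdf_bounds(cells, x):
    """Bounds on P(A+B <= x) implied by a list of (interval, probability) cells."""
    lower = sum(m for (lo, hi), m in cells if hi <= x)  # cells surely <= x
    upper = sum(m for (lo, hi), m in cells if lo <= x)  # cells possibly <= x
    return lower, upper


independent = add_independent(A, B)
perfect = add_perfect(A, B)

for x in (3, 9, 13, 17):
    print(x, cdf_bounds(independent, x), cdf_bounds(perfect, x))
```

Evaluating cdf_bounds on a grid of x values traces the left and right edges of the p-box for A+B. The Fréchet (no assumption) case is deliberately left out: as the slides note, simply giving every cell the probability [0, 1/3] is not best possible.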