# Artificial Intelligence Unit V : Uncertainity

Views:

Category: Education

## Presentation Description

Artificial Intelligence : Uncertainty

## Presentation Transcript

Uncertainty

### Outline:

Outline Uncertainty Probability Syntax and Semantics Inference Independence and Bayes' Rule

### Uncertainty:

Uncertainty Let action A t = leave for airport t minutes before flight Will A t get me there on time? Problems: partial observability (road state, other drivers' plans, etc.) noisy sensors (traffic reports) uncertainty in action outcomes (flat tire, etc.) immense complexity of modeling and predicting traffic Hence a purely logical approach either risks falsehood: “ A 25 will get me there on time”, or leads to conclusions that are too weak for decision making: “ A 25 will get me there on time if there's no accident on the bridge and it doesn't rain and my tires remain intact etc etc.” ( A 1440 might reasonably be said to get me there on time but I'd have to stay overnight in the airport …)

### Methods for handling uncertainty:

Methods for handling uncertainty Default or nonmonotonic logic: Assume my car does not have a flat tire Assume A 25 works unless contradicted by evidence Issues: What assumptions are reasonable? How to handle contradiction? Rules with fudge factors : A 25 | → 0.3 get there on time Sprinkler | → 0.99 WetGrass WetGrass | → 0.7 Rain Issues: Problems with combination, e.g., Sprinkler causes Rain ?? Probability Model agent's degree of belief Given the available evidence, A 25 will get me there on time with probability 0.04

### Probability:

Probability Probabilistic assertions summarize effects of laziness : failure to enumerate exceptions, qualifications, etc . ignorance : lack of relevant facts, initial conditions, etc . Subjective probability: Probabilities relate propositions to agent's own state of knowledge e.g., P(A 25 | no reported accidents) = 0.06 These are not assertions about the world Probabilities of propositions change with new evidence: e.g., P(A 25 | no reported accidents, 5 a.m.) = 0.15

### Making decisions under uncertainty:

Making decisions under uncertainty Suppose I believe the following: P(A 25 gets me there on time | …) = 0.04 P(A 90 gets me there on time | …) = 0.70 P(A 120 gets me there on time | …) = 0.95 P(A 1440 gets me there on time | …) = 0.9999 Which action to choose ? Depends on my preferences for missing flight vs. time spent waiting, etc. Utility theory is used to represent and infer preferences Decision theory = probability theory + utility theory

### Syntax:

Syntax Basic element: random variable Similar to propositional logic: possible worlds defined by assignment of values to random variables. Boolean random variables e.g., Cavity (do I have a cavity?) Discrete random variables e.g., Weather is one of < sunny,rainy,cloudy,snow > Domain values must be exhaustive and mutually exclusive Elementary proposition constructed by assignment of a value to a random variable: e.g., Weather = sunny , Cavity = false (abbreviated as  cavity ) Complex propositions formed from elementary propositions and standard logical connectives e.g., Weather = sunny  Cavity = false

### Syntax:

Syntax Atomic event : A complete specification of the state of the world about which the agent is uncertain E.g., if the world consists of only two Boolean variables Cavity and Toothache , then there are 4 distinct atomic events: Cavity = false  Toothache = false Cavity = false  Toothache = true Cavity = true  Toothache = false Cavity = true  Toothache = true Atomic events are mutually exclusive and exhaustive

### Axioms of probability:

Axioms of probability For any propositions A, B 0 ≤ P( A ) ≤ 1 P( true ) = 1 and P( false ) = 0 P( A  B ) = P( A ) + P( B ) - P( A  B )

### Prior probability:

Prior probability Prior or unconditional probabilities of propositions e.g., P( Cavity = true) = 0.1 and P( Weather = sunny) = 0.72 correspond to belief prior to arrival of any (new) evidence Probability distribution gives values for all possible assignments : P ( Weather ) = <0.72,0.1,0.08,0.1> ( normalized , i.e., sums to 1) Joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables P ( Weather,Cavity ) = a 4 × 2 matrix of values: Weather = sunny rainy cloudy snow Cavity = true 0.144 0.02 0.016 0.02 Cavity = false 0.576 0.08 0.064 0.08 Every question about a domain can be answered by the joint distribution

### Conditional probability:

Conditional probability Conditional or posterior probabilities e.g., P( cavity | toothache ) = 0.8 i.e., given that toothache is all I know (Notation for conditional distributions: P ( Cavity | Toothache ) = 2-element vector of 2-element vectors) If we know more, e.g., cavity is also given, then we have P( cavity | toothache,cavity ) = 1 New evidence may be irrelevant, allowing simplification, e.g., P( cavity | toothache, sunny ) = P( cavity | toothache ) = 0.8 This kind of inference, sanctioned by domain knowledge, is crucial

### Conditional probability:

Conditional probability Definition of conditional probability: P(a | b) = P(a  b) / P(b) if P(b) > 0 Product rule gives an alternative formulation: P(a  b) = P(a | b) P(b) = P(b | a) P(a) A general version holds for whole distributions, e.g., P ( Weather,Cavity ) = P ( Weather | Cavity ) P ( Cavity ) (View as a set of 4 × 2 equations, not matrix mult.) Chain rule is derived by successive application of product rule: P (X 1 , …,X n ) = P (X 1 ,...,X n-1 ) P (X n | X 1 ,...,X n-1 ) = P (X 1 ,...,X n-2 ) P (X n-1 | X 1 ,...,X n-2 ) P (X n | X 1 ,...,X n-1 ) = … = π i= 1 ^n P (X i | X 1 , … ,X i-1 )

### Inference by enumeration:

Inference by enumeration Start with the joint probability distribution: For any proposition φ , sum the atomic events where it is true: P( φ ) = Σ ω : ω╞φ P( ω )

### Inference by enumeration:

Inference by enumeration Start with the joint probability distribution: For any proposition φ , sum the atomic events where it is true: P( φ ) = Σ ω : ω╞φ P( ω ) P( toothache ) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2

### Inference by enumeration:

Inference by enumeration Start with the joint probability distribution: For any proposition φ , sum the atomic events where it is true: P( φ ) = Σ ω : ω╞φ P( ω ) P( toothache ) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2

### Inference by enumeration:

Inference by enumeration Start with the joint probability distribution : Can also compute conditional probabilities : P(  cavity | toothache ) = P(  cavity  toothache ) P( toothache ) = 0.016+0.064 0.108 + 0.012 + 0.016 + 0.064 = 0.4

### Normalization:

Normalization Denominator can be viewed as a normalization constant α P ( Cavity | toothache ) = α, P ( Cavity,toothache ) = α, [ P ( Cavity,toothache,catch ) + P ( Cavity,toothache ,  catch )] = α , [<0.108,0.016> + <0.012,0.064>] = α, <0.12,0.08> = <0.6,0.4> General idea: compute distribution on query variable by fixing evidence variables and summing over hidden variables

### Inference by enumeration, contd.:

Inference by enumeration, contd. Typically, we are interested in the posterior joint distribution of the query variables Y given specific values e for the evidence variables E Let the hidden variables be H = X - Y - E Then the required summation of joint entries is done by summing out the hidden variables: P ( Y | E = e ) = α P ( Y , E = e ) = α Σ h P ( Y , E = e , H = h ) The terms in the summation are joint entries because Y , E and H together exhaust the set of random variables Obvious problems : Worst-case time complexity O( d n ) where d is the largest arity Space complexity O( d n ) to store the joint distribution How to find the numbers for O( d n ) entries?

### Independence:

Independence A and B are independent iff P ( A|B ) = P ( A ) or P ( B|A ) = P ( B ) or P (A, B) = P ( A ) P ( B ) P ( Toothache, Catch, Cavity, Weather ) = P ( Toothache, Catch, Cavity ) P ( Weather ) 32 entries reduced to 12; for n independent biased coins, O(2 n ) → O(n ) Absolute independence powerful but rare Dentistry is a large field with hundreds of variables, none of which are independent. What to do ?

### Conditional independence:

Conditional independence P ( Toothache, Cavity, Catch ) has 2 3 – 1 = 7 independent entries If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache : (1) P ( catch | toothache, cavity ) = P ( catch | cavity ) The same independence holds if I haven't got a cavity : (2) P ( catch | toothache,  cavity ) = P ( catch |  cavity ) Catch is conditionally independent of Toothache given Cavity : P ( Catch | Toothache,Cavity ) = P ( Catch | Cavity ) Equivalent statements: P ( Toothache | Catch, Cavity ) = P ( Toothache | Cavity ) P ( Toothache, Catch | Cavity ) = P ( Toothache | Cavity ) P ( Catch | Cavity )

### Conditional independence contd.:

Conditional independence contd. Write out full joint distribution using chain rule : P ( Toothache, Catch, Cavity ) = P ( Toothache | Catch, Cavity ) P ( Catch, Cavity ) = P ( Toothache | Catch, Cavity ) P ( Catch | Cavity ) P ( Cavity ) = P ( Toothache | Cavity ) P ( Catch | Cavity ) P (Cavity) I.e., 2 + 2 + 1 = 5 independent numbers In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n . Conditional independence is our most basic and robust form of knowledge about uncertain environments .

### Bayes' Rule:

Bayes' Rule Product rule P( a  b ) = P(a | b) P(b) = P(b | a) P(a )  Bayes ' rule: P(a | b) = P(b | a) P(a) / P(b) or in distribution form P (Y|X) = P (X|Y) P (Y) / P (X) = α P (X|Y) P (Y) Useful for assessing diagnostic probability from causal probability : P( Cause|Effect ) = P( Effect|Cause ) P(Cause) / P(Effect ) E.g., let M be meningitis, S be stiff neck : P( m|s ) = P( s|m ) P(m) / P(s) = 0.8 × 0.0001 / 0.1 = 0.0008 Note: posterior probability of meningitis still very small !

### Bayes' Rule and conditional independence:

Bayes' Rule and conditional independence P ( Cavity | toothache  catch ) = α P ( toothache  catch | Cavity ) P ( Cavity ) = α P ( toothache | Cavity ) P ( catch | Cavity ) P ( Cavity ) This is an example of a naïve Bayes model: P (Cause,Effect 1 , … ,Effect n ) = P (Cause) π i P (Effect i |Cause) Total number of parameters is linear in n

### Summary:

Summary Probability is a rigorous formalism for uncertain knowledge Joint probability distribution specifies probability of every atomic event Queries can be answered by summing over atomic events For nontrivial domains, we must find a way to reduce the joint size Independence and conditional independence provide the tools