# pldi04

Views:

## Presentation Description

No description available.

## Presentation Transcript

### Cloning-Based Context-Sensitive Pointer Alias Analysis using BDDs:

Cloning-Based Context-Sensitive Pointer Alias Analysis using BDDs John Whaley Monica Lam Stanford University June 10, 2004

### Unification vs. Inclusion:

Unification vs. Inclusion Earlier scalable pointer analysis was context-insensitive unification-based [Steensgaard ’96] Pointers are either unaliased or point to the same set of objects. Near-linear, but VERY imprecise. Inclusion-based pointer analysis Can point to overlapping sets of objects. Closure calculation is O(n3) Various optimizations [Fahndrich,Su,Heintze,…] BDD formulation, simple, scalable [Berndl,Zhu]

### Context Sensitivity:

Context Sensitivity Context sensitivity is important for precision. Unrealizable paths. Object id(Object x) { return x; } a = id(b); c = id(d);

### Context Sensitivity:

Object id(Object x) { return x; } Object id(Object x) { return x; } Context Sensitivity Context sensitivity is important for precision. Unrealizable paths. Conceptually give each caller its own copy. a = id(b); c = id(d);

### Summary-Based Analysis:

Summary-Based Analysis Popular method for context sensitivity. Two phases: Bottom-up: Summarize effects of methods. Top-down: Propagate information down. Problems: Difficult to summarize pointer analysis. Summary-based analysis using BDD: not shown to scale [Zhu’02] Queries (e.g. which context points to x) require expanding an exponential number of contexts.

### Cloning-Based Analysis:

Cloning-Based Analysis Simple brute force technique. Clone every path through the call graph. Run context-insensitive algorithm on expanded call graph. The catch: exponential blowup

### Cloning is exponential!:

Cloning is exponential!

### Recursion:

Recursion Actually, cloning is unbounded in the presence of recursive cycles. Technique: We treat all methods within a strongly-connected component as a single node.

### Recursion:

Recursion A G B C D E F

### Top 20 Sourceforge Java Apps:

Top 20 Sourceforge Java Apps 1016 1012 108 104 100

### Cloning is infeasible (?):

Cloning is infeasible (?) Typical large program has ~1014 paths If you need 1 byte to represent a clone: Would require 256 terabytes of storage Registered ECC 1GB DIMMs: \$98.6 million Power: 96.4 kilowatts = Power for 128 homes 300 GB hard disks: 939 x \$250 = \$234,750 Time to read (sequential): 70.8 days Seems unreasonable!

### BDD comes to the rescue:

BDD comes to the rescue There are many similarities across contexts. Many copies of nearly-identical results. BDDs can represent large sets of redundant data efficiently. Need a BDD encoding that exploits the similarities.

### Contribution (1):

Contribution (1) Can represent context-sensitive call graph efficiently with BDDs and a clever context numbering scheme Inclusion-based pointer analysis 1014 contexts, 19 minutes Generates all answers

### Contribution (2) :

Contribution (2) BDD hacking is complicated  bddbddb (BDD-based deductive database) Pointer algorithm in 6 lines of Datalog Automatic translate into efficient BDD implementation 10x performance over hand-tuned solver (2164 lines of Java)

### Contribution (3):

Contribution (3) bddbddb: General Datalog solver Supports simple declarative queries Easy use of context-sensitive pointer results Simple context-sensitive analyses: Escape analysis Type refinement Side effect analysis Many more presented in the paper

### Context-sensitive call graphs in BDD:

Context-sensitive call graphs in BDD

### Call graph relation:

Call graph relation Call graph expressed as a relation. Five edges: Calls(A,B) Calls(A,C) Calls(A,D) Calls(B,D) Calls(C,D) B D C A

### Call graph relation:

Call graph relation Relation expressed as a binary function. A=00, B=01, C=10, D=11 B D C A 00 10 01 11

### Binary Decision Diagrams:

Binary Decision Diagrams Graphical encoding of a truth table. x2 x4 x3 x3 x4 x4 x4 0 0 0 1 0 0 0 0 x2 x4 x3 x3 x4 x4 x4 0 1 1 1 0 0 0 1 x1 0 edge 1 edge

### Binary Decision Diagrams:

Binary Decision Diagrams Collapse redundant nodes. x2 x4 x3 x3 x4 x4 x4 0 0 0 0 0 0 0 x2 x4 x3 x3 x4 x4 x4 0 0 0 0 x1 1 1 1 1 1

### Binary Decision Diagrams:

Binary Decision Diagrams Collapse redundant nodes. x2 x4 x3 x3 x4 x4 x4 x2 x4 x3 x3 x4 x4 x4 0 x1 1

### Binary Decision Diagrams:

Binary Decision Diagrams Collapse redundant nodes. x2 x4 x3 x3 x2 x3 x3 x4 x4 0 x1 1

### Binary Decision Diagrams:

Binary Decision Diagrams Collapse redundant nodes. x2 x4 x3 x3 x2 x3 x4 x4 0 x1 1

### Binary Decision Diagrams:

Binary Decision Diagrams Eliminate unnecessary nodes. x2 x4 x3 x3 x2 x3 x4 x4 0 x1 1

### Binary Decision Diagrams:

Binary Decision Diagrams Eliminate unnecessary nodes. x2 x3 x2 x3 x4 0 x1 1

### Binary Decision Diagrams:

Binary Decision Diagrams Size is correlated to amount of redundancy, NOT size of relation. As the set gets larger, the number of don’t-care bits increases, leading to fewer necessary nodes.

### Expanded Call Graph:

Expanded Call Graph A D B C E F G H 0 1 2 0 1 2 2 1 0

### Numbering Clones:

Numbering Clones A D B C E F G H A D B C E F G H E E F F G G H H H H H 0 0 0 0 1 2 0-2 0-2 0-2 3-5 0 0 0 0 1 2 0 1 2 2 1 0 0 1 2 3 4 5 0

Pointer Analysis

### Pointer Analysis Example:

Pointer Analysis Example h1: v1 = new Object(); h2: v2 = new Object(); v1.f = v2; v3 = v1.f; Input Relations vPointsTo(v1,h1) vPointsTo(v2,h2) Store(v1,f,v2) Load(v1,f,v3) Output Relations hPointsTo(h1,f,h2) vPointsTo(v3,h2) v1 h1 v2 h2 f v3

### Inference Rule in Datalog:

hPointsTo(h1, f, h2) :- Store(v1, f, v2), vPointsTo(v1, h1), vPointsTo(v2, h2). v1 h1 v2 h2 f Inference Rule in Datalog v1.f = v2; Stores:

### Context-sensitive pointer analysis:

Context-sensitive pointer analysis Compute call graph with context-insensitive pointer analysis. Datalog rules for: assignments, loads, stores discover call targets, bind parameters type filtering Apply rules until fix-point reached. Compute expanded call graph relation. Apply context-insensitive algorithm to expanded call graph.

### bddbddb:BDD-Based Deductive DataBase:

bddbddb: BDD-Based Deductive DataBase

### Datalog:

Datalog Declarative logic programming language designed for databases Horn clauses Operates on relations Datalog is expressive Relational algebra: Explicitly specify relational join, project, rename. Relational calculus: Specify relations between variables; operations are implicit. Datalog: Allows recursively-defined relations.

### Datalog  BDD:

Datalog  BDD Join, project, rename are directly mapped to built-in BDD operations Automatically optimizes: Rule application order Incrementalization Variable ordering BDD parameter tuning Many more…

### Experimental Results:

Experimental Results

### Experimental Results:

Experimental Results Top 20 Java projects on SourceForge Real programs with 100K+ users each Using automatic bddbddb solver Each analysis only a few lines of code Easy to try new algorithms, new queries Test system: Pentium 4 2.2GHz, 1GB RAM RedHat Fedora Core 1, JDK 1.4.2_04, javabdd library, Joeq compiler

### Multi-type variables:

Multi-type variables A variable is multi-type if it can point to objects of different types. Measure of analysis precision One line in Datalog Two ways of handling context sensitivity: Projected: Merge all contexts together Full: Keep each context separate

### Related Work:

Related Work Context-insensitive pointer analysis Steensgaard: Unification-based (POPL’96) Andersen: Inclusion-based (’94) Optimizations: too many to list Berndl: formulate in BDD (PLDI’03) Das: one-level-flow (PLDI’00) Hybrid unification/inclusion

### Related Work:

Related Work Scalable context-sensitive pointer analysis Fähndrich etal, instantiation constraints (PLDI’00) CFL-reachability Unification-based: Imprecise. Handles recursion well. Computes on-demand. GOLF: Das etal. (SAS’01) One level of context sensitivity. Foster, Fahndrich, Aiken (SAS’00) Throws away information. Wilson andamp; Lam: PTF (PLDI’95) Doesn't really scale (especially complexity)

### Related Work:

Related Work Whaley andamp; Rinard: Escape analysis (OOPSLA’99) Compositional summaries: only weak updates. Achieves scalability by collapsing escaped nodes. Emami andamp; Hendren: Invocation graphs (PLDI’94) Only shown to scale to 8K lines. Zhu andamp; Calman: (PLDI’04) To be presented next in this session. More complete coverage in the paper.

### Conclusion:

Conclusion The first scalable context-sensitive inclusion-based pointer analysis. Achieves context sensitivity by cloning. bddbddb: Datalog  efficient BDD Easy to query results, develop new analyses. Very efficient! andlt;19 minutes, andlt;600mb on largest benchmark. Complete system is publically available at: http://suif.stanford.edu/bddbddb