Cloning-Based Context-Sensitive Pointer Alias Analysis using BDDs : Cloning-Based Context-Sensitive Pointer Alias Analysis using BDDs John Whaley
Monica Lam
Stanford University June 10, 2004
Unification vs. Inclusion : Unification vs. Inclusion Earlier scalable pointer analysis was context-insensitive unification-based [Steensgaard ’96]
Pointers are either unaliased or point to the same set of objects.
Near-linear, but VERY imprecise.
Inclusion-based pointer analysis
Can point to overlapping sets of objects.
Closure calculation is O(n3)
Various optimizations [Fahndrich,Su,Heintze,…]
BDD formulation, simple, scalable [Berndl,Zhu]
Context Sensitivity : Context Sensitivity Context sensitivity is important for precision.
Unrealizable paths. Object id(Object x) { return x; } a = id(b); c = id(d);
Context Sensitivity : Object id(Object x) { return x; } Object id(Object x) { return x; } Context Sensitivity Context sensitivity is important for precision.
Unrealizable paths.
Conceptually give each caller its own copy. a = id(b); c = id(d);
Summary-Based Analysis : Summary-Based Analysis Popular method for context sensitivity.
Two phases:
Bottom-up: Summarize effects of methods.
Top-down: Propagate information down.
Problems:
Difficult to summarize pointer analysis.
Summary-based analysis using BDD: not shown to scale [Zhu’02]
Queries (e.g. which context points to x) require expanding an exponential number of contexts.
Cloning-Based Analysis : Cloning-Based Analysis Simple brute force technique.
Clone every path through the call graph.
Run context-insensitive algorithm on expanded call graph.
The catch: exponential blowup
Cloning is exponential! : Cloning is exponential!
Recursion : Recursion Actually, cloning is unbounded in the presence of recursive cycles.
Technique: We treat all methods within a strongly-connected component as a single node.
Recursion : Recursion A G B C D E F
Top 20 Sourceforge Java Apps : Top 20 Sourceforge Java Apps 1016
1012
108
104
100
Cloning is infeasible (?) : Cloning is infeasible (?) Typical large program has ~1014 paths
If you need 1 byte to represent a clone:
Would require 256 terabytes of storage
Registered ECC 1GB DIMMs: $98.6 million
Power: 96.4 kilowatts = Power for 128 homes
300 GB hard disks: 939 x $250 = $234,750
Time to read (sequential): 70.8 days
Seems unreasonable!
BDD comes to the rescue : BDD comes to the rescue There are many similarities across contexts.
Many copies of nearly-identical results.
BDDs can represent large sets of redundant data efficiently.
Need a BDD encoding that exploits the similarities.
Contribution (1) : Contribution (1) Can represent context-sensitive call graph efficiently with BDDs and a clever context numbering scheme
Inclusion-based pointer analysis
1014 contexts, 19 minutes
Generates all answers
Contribution (2) : Contribution (2) BDD hacking is complicated
bddbddb (BDD-based deductive database)
Pointer algorithm in 6 lines of Datalog
Automatic translate into efficient BDD implementation
10x performance over hand-tuned solver (2164 lines of Java)
Contribution (3) : Contribution (3) bddbddb: General Datalog solver
Supports simple declarative queries
Easy use of context-sensitive pointer results
Simple context-sensitive analyses:
Escape analysis
Type refinement
Side effect analysis
Many more presented in the paper
Context-sensitive call graphs in BDD : Context-sensitive call graphs in BDD
Call graph relation : Call graph relation Call graph expressed as a relation.
Five edges:
Calls(A,B)
Calls(A,C)
Calls(A,D)
Calls(B,D)
Calls(C,D) B D C A
Call graph relation : Call graph relation Relation expressed as a binary function.
A=00, B=01, C=10, D=11 B D C A 00 10 01 11
Binary Decision Diagrams : Binary Decision Diagrams Graphical encoding of a truth table. x2 x4 x3 x3 x4 x4 x4 0 0 0 1 0 0 0 0 x2 x4 x3 x3 x4 x4 x4 0 1 1 1 0 0 0 1 x1 0 edge 1 edge
Binary Decision Diagrams : Binary Decision Diagrams Collapse redundant nodes. x2 x4 x3 x3 x4 x4 x4 0 0 0 0 0 0 0 x2 x4 x3 x3 x4 x4 x4 0 0 0 0 x1 1 1 1 1 1
Binary Decision Diagrams : Binary Decision Diagrams Collapse redundant nodes. x2 x4 x3 x3 x4 x4 x4 x2 x4 x3 x3 x4 x4 x4 0 x1 1
Binary Decision Diagrams : Binary Decision Diagrams Collapse redundant nodes. x2 x4 x3 x3 x2 x3 x3 x4 x4 0 x1 1
Binary Decision Diagrams : Binary Decision Diagrams Collapse redundant nodes. x2 x4 x3 x3 x2 x3 x4 x4 0 x1 1
Binary Decision Diagrams : Binary Decision Diagrams Eliminate unnecessary nodes. x2 x4 x3 x3 x2 x3 x4 x4 0 x1 1
Binary Decision Diagrams : Binary Decision Diagrams Eliminate unnecessary nodes. x2 x3 x2 x3 x4 0 x1 1
Binary Decision Diagrams : Binary Decision Diagrams Size is correlated to amount of redundancy, NOT size of relation.
As the set gets larger, the number of don’t-care bits increases, leading to fewer necessary nodes.
Expanded Call Graph : Expanded Call Graph A D B C E F G H 0 1 2 0 1 2 2 1 0
Numbering Clones : Numbering Clones A D B C E F G H A D B C E F G H E E F F G G H H H H H 0 0 0 0 1 2 0-2 0-2 0-2 3-5 0 0 0 0 1 2 0 1 2 2 1 0 0 1 2 3 4 5 0
Pointer Analysis : Pointer Analysis
Pointer Analysis Example : Pointer Analysis Example
h1: v1 = new Object();
h2: v2 = new Object();
v1.f = v2;
v3 = v1.f; Input Relations
vPointsTo(v1,h1)
vPointsTo(v2,h2)
Store(v1,f,v2)
Load(v1,f,v3)
Output Relations
hPointsTo(h1,f,h2)
vPointsTo(v3,h2) v1 h1 v2 h2 f v3
Inference Rule in Datalog : hPointsTo(h1, f, h2) :- Store(v1, f, v2), vPointsTo(v1, h1), vPointsTo(v2, h2). v1 h1 v2 h2 f Inference Rule in Datalog v1.f = v2;
Stores:
Context-sensitive pointer analysis : Context-sensitive pointer analysis Compute call graph with context-insensitive pointer analysis.
Datalog rules for:
assignments, loads, stores
discover call targets, bind parameters
type filtering
Apply rules until fix-point reached.
Compute expanded call graph relation.
Apply context-insensitive algorithm to expanded call graph.
bddbddb:BDD-Based Deductive DataBase : bddbddb: BDD-Based Deductive DataBase
Datalog : Datalog Declarative logic programming language designed for databases
Horn clauses
Operates on relations
Datalog is expressive
Relational algebra:
Explicitly specify relational join, project, rename.
Relational calculus:
Specify relations between variables; operations are implicit.
Datalog:
Allows recursively-defined relations.
Datalog BDD : Datalog BDD Join, project, rename are directly mapped to built-in BDD operations
Automatically optimizes:
Rule application order
Incrementalization
Variable ordering
BDD parameter tuning
Many more…
Experimental Results : Experimental Results
Experimental Results : Experimental Results Top 20 Java projects on SourceForge
Real programs with 100K+ users each
Using automatic bddbddb solver
Each analysis only a few lines of code
Easy to try new algorithms, new queries
Test system:
Pentium 4 2.2GHz, 1GB RAM
RedHat Fedora Core 1, JDK 1.4.2_04, javabdd library, Joeq compiler
Slide38 :
Slide39 :
Multi-type variables : Multi-type variables A variable is multi-type if it can point to objects of different types.
Measure of analysis precision
One line in Datalog
Two ways of handling context sensitivity:
Projected: Merge all contexts together
Full: Keep each context separate
Slide41 :
Related Work : Related Work Context-insensitive pointer analysis
Steensgaard: Unification-based (POPL’96)
Andersen: Inclusion-based (’94)
Optimizations: too many to list
Berndl: formulate in BDD (PLDI’03)
Das: one-level-flow (PLDI’00)
Hybrid unification/inclusion
Related Work : Related Work Scalable context-sensitive pointer analysis
Fähndrich etal, instantiation constraints (PLDI’00)
CFL-reachability
Unification-based: Imprecise.
Handles recursion well.
Computes on-demand.
GOLF: Das etal. (SAS’01)
One level of context sensitivity.
Foster, Fahndrich, Aiken (SAS’00)
Throws away information.
Wilson andamp; Lam: PTF (PLDI’95)
Doesn't really scale (especially complexity)
Related Work : Related Work Whaley andamp; Rinard: Escape analysis (OOPSLA’99)
Compositional summaries: only weak updates.
Achieves scalability by collapsing escaped nodes.
Emami andamp; Hendren: Invocation graphs (PLDI’94)
Only shown to scale to 8K lines.
Zhu andamp; Calman: (PLDI’04)
To be presented next in this session.
More complete coverage in the paper.
Conclusion : Conclusion The first scalable context-sensitive inclusion-based pointer analysis.
Achieves context sensitivity by cloning.
bddbddb: Datalog efficient BDD
Easy to query results, develop new analyses.
Very efficient!
andlt;19 minutes, andlt;600mb on largest benchmark.
Complete system is publically available at: http://suif.stanford.edu/bddbddb