JVM models in ACL2 : JVM models in ACL2 Hanbing Liu
9/17/2002
Overview: modeling the JVM: Overview: modeling the JVM Implementer's view of the JVM
~ 200 instructions + a set of standard API functions.
~ 40 of those API functions have to be a part of ('native' to) the JVM implementation. (referring to the KVM, a JVM for small devices)
(e.g. To start a thread need an API function.
It can’t be done with any JVM instruction)
Modeling the JVM
M3 basic interpreter loop , ~25 instructions. 26KB
M5 concurrent execution with threads, ~125 instructions 100KB
'M6' not complete yet. Currently have ~125 instruction. 300KB
Dynamic class loading, class initialization, interfaces, exceptions.
Related work: the jvm2acl2 tool, the bytecode verifier.
Maps for the talk : Maps for the talk
M3: M3
Call frames, operand stack, locals: Call frames, operand stack, locals Call stack: a list of call frames.
Record the invocation history
In M3, M5, a call frame records current PC, operand stacks (as list of values), locals, the instructions associated with the program, etc.
In M6, a 'return' address in place of current PC; a method pointer instead of a list instructions.
Operand stack:
Local variables:
in M3, it is an association list, indexed on variable names
In M5, M6, an 'array' indexed by positions.
Example: execute-PUSH: Example: execute-PUSH
Class and method resolution : Class and method resolution Class resolution:
Given class name, find the class representation.
Class representation is used for creating objects of that type.
In M3, all class representations are in the class table.
(assoc-equal classname (class-table s))
– the first field of class representation is the class name
In a standard JVM, if not found, the default class loader is used to create class representation from .class files.
Method resolution
Given method signature, find the method representation.
For the factorial program, method signature is represented as
('Fact' 'fact' 1) in M3, M5
‘(methodCP 'fact' 'Fact' (int) int) in M6.
Example: execute-INVOKEVIRTUAL: Example: execute-INVOKEVIRTUAL (defun execute-INVOKEVIRTURAL (inst s) ;; M3 doesn’t have invokestatic, use invokevirtual as example
(let* ((method-name (arg2 inst))
(nformals (arg3 inst))
(obj-ref (nth nformals (stack s)))
(obj-class-name (class-name-of-ref obj-ref (heap s)))
(closest-method
(lookup-method method-name
obj-class-name ;; note classname specified in the instruction is not used.
(class-table s)))
(vars (cons 'this (method-formal-names closest-method)))
(prog (method-program closest-method))
(s1 (modify s
:pc (+ 1 (pc s))
:stack (popn (len vars) (stack s)))))
(if (null obj-ref)
(modify s :status :ERROR)
(modify s1
:call-stack
(push (make-frame 0
(reverse
(bind-formals (reverse vars)
(stack s)))
General approaches for establishing the correctness of a M3 program : General approaches for establishing the correctness of a M3 program Guess an strong enough invariant
Let INV be
(implies (poised-for-execute-program s)
(equal (run s (program-clock s))
(program-effect s)))
Establish the INV, then prove INV implies the final goal.
Define a clock function, an effect function
The effect function captures the effects of running the bytecode program on the JVM state.
The clock function, compute how many clocks are required to execute an method or complete a loop.
J’s framework in writing clock functions : J’s framework in writing clock functions Clock functions use c+, a special kind of + to compute the sum of two integers.
ACL2 are configured to expand
(run s0 (c+ C X)) into
(run s1 X), where s1= (step (step … s0…)),
depth of C, when C is a constant
The structure of the clock function reflects the structures of paths through the corresponding bytecode program.
Example: Writing the clock function: Example: Writing the clock function
Other work on the M3: Other work on the M3 The bytecode verifier work (not finished yet)
Two model of M3,
one 'defensive' JVM check every possible error at runtime. The other 'standard' JVM, only checks for runtime errors.
Goal to prove
If the bytecode verifier returns 'yes' on the program P in state S.
'standard' JVM == 'defensive' JVM
when JVMs execute P from state S.
M5: M5 Models around 125 JVM instructions
Implements Java APIs for start, stop threads
Implements locks for synchronization between threads
The jvm2acl2 tool will translate .class file into a format readable by M5
M5: M5 M5 interpreter top-level loop
(defun run (sched s)
(if (endp sched)
s
(run (cdr sched) (step (car sched) s)))))
(defun step (th s)
(if (equal (call-stack-status th s) ‘READY) ;; note 1
(do-inst (next-inst th s) th s) ;; note 2
s))
A thread could be in the not-READY state, either because it hasn’t started, or because some other thread call the thread stop method on it.
Both next-inst and do-inst have an extra argument th current thread.
New thread are created when execute-NEW is asked to created an object of either java.lang.Thread type or its subclasses.
Programmer can start and stop the thread by sending start, stop messages to the object created.
Note: according to the JVM spec, when thread stop is called, it must release all the monitor it is holding. Current M5 didn’t do so.
M5 Synchronization: M5 Synchronization
Example: execute-MONITORENTER: Example: execute-MONITORENTER (defun execute-MONITORENTER (inst th s)
(let* ((obj-ref (top (stack (top-frame th s))))
(instance (deref obj-ref (heap s))))
(cond ((objectLockable? instance th)
(modify th s
:pc (+ (inst-length inst) (pc (top-frame th s)))
:stack (pop (stack (top-frame th s)))
:heap (lock-object th obj-ref (heap s))))
(t s))))
Simple monitor implementation: if not lockable, remain in the same state.
In a more efficient JVM implementation, if not lockable, the thread is moving into a waiting queue associated with the monitor, pc is advanced. when the thread is removed from the waiting queue, it is guaranteed to have the lock.
Tricky part: thread must keep track of how many times it enters the same monitor. Thread should only really the monitor when it no longer hold the monitor.
Lock-object: Lock-object
(defun lock-object (th obj-ref heap)
(let* ((obj-ref-num (cadr obj-ref))
(instance (binding (cadr obj-ref) heap))
(obj-fields (binding 'java.lang.Object' instance))
(new-mcount (+ 1 (binding 'mcount' obj-fields)))
(new-obj-fields
(bind 'monitor' th
(bind 'mcount' new-mcount obj-fields)))
(new-object (bind 'java.lang.Object' new-obj-fields instance)))
(bind obj-ref-num new-object heap)))
Example: execute-RETURN: Example: execute-RETURN (defun execute-RETURN (inst th s)
(declare (ignore inst))
(let* ((obj-ref (nth 0 (locals (top-frame th s))))
(sync-status (sync-flg (top-frame th s)))
(class (cur-class (top-frame th s)))
(ret-ref (class-decl-heapref (bound? class (class-table s))))
(new-heap (cond ((equal sync-status 'LOCKED)
(unlock-object th obj-ref (heap s)))
((equal sync-status 'S_LOCKED)
(unlock-object th ret-ref (heap s)))
(t (heap s)))))
(modify th s
:call-stack (pop (call-stack th s))
:heap new-heap)))
Talk about synchronized method. Implicit monitorenter and monitorexit
Issues on proving theorems about multithreaded program correct: Issues on proving theorems about multithreaded program correct J mentioned once that
It is hard to write a state invariant that characterizes the all possible inter-leavings of the thread execution.
Possible project suggested: Reduction techniques
define a JVM that executes a sequence of 'local state' modification instructions atomically. This will reduce the possible interleaving.
We need to proving any run on a JVM with normal interleaving can be mapped to some run on this special JVM.
Issues on approaches for proving programs correct: Issues on approaches for proving programs correct Reasoning about programs with complex synchronizations
How clock functions could help? What forms a clock function would take?
First challenge:
Characterize a general notion of non-interacting programs. Reduce interleaving of executions of non-interacting programs to executions of the each of the programs one by one?
M6: M6 M6 class file format
Class loader
Class initialization
Monitor implementation
Exception handling
Supports for Interfaces
Native APIs
Summary of the JVM models: Summary of the JVM models M3
Simple. Good for verifying algorithmic programs. No threads.
M5
Need to research on ways to reason about concurrent programs.
Limited features of the Java, JVM is modeled. No interfaces.
M6
Many features modeled realistically.
In order to run Java programs, especially standard API that are written in Java. The M6 is very complicated. Maybe only good for study JVM properties, instead of JVM programs
Those models are available from separate directories under /projects/jvm.
Type co –r /the/path/to/it/* you will get a copy of those files in your current directory.