back to the moon
Dionigi
Added: January 11, 2008. Category: Education. License: All Rights Reserved.

Presentation Transcript

Back to the Moon:
The Verification of a Small Microprocessor's Logic Design

A Small Microprocessor for What?:
Lunar Orbiter (LRO), scheduled launch 2008, with multiple scientific instruments
One of these is a Laser Altimeter, hence the name “LOLA”
Laser altimetry produces very detailed and precise geodetic maps to aid establishment of a permanent base (why not “selenodetic”??)
Each instrument has at least one embedded control microprocessor

Microprocessor design criteria:
Radiation hardening for high endurance
High performance to stay a step ahead of a rapid-cycling instrument
Simple well-understood architecture
  Appropriate to embedded controller paradigm
  No hardware multiply or divide required
Simple programs are reliable programs
  Straightforward assembly-language or C programming: no operating system!

Technology & Architecture:
Gate arrays fulfill the criteria and support any desired architecture
We created the “80k85”. What’s that?
  Based quite closely on the old Intel 8085!
  Simple instructions assure quick interrupt response
  Not RISC, but uses limited real estate of gates
  An instruction set of known “completeness”
Availability of established tools
  Assemblers, simulators, C compiler
  Exploits skill set of embedded-controller artisans
Unimplemented op codes cause a special trap

Processor Design Verification:
Every processor design needs verification
Even the best have stumbled on this point!
IBM System 4π AP-101 (Space Shuttle GPC)
  4πs had been used in earlier aircraft & spacecraft
  AP-101 was a special variant for the Shuttle
  Not quite as off-the-shelf as everyone wanted to believe
Intel Pentium FPU for P5 core, 1994
  A determined effort to speed up long floating divide
  Verifying high-precision arithmetic is a challenge!
  http://www.maa.org/mathland/mathland_5_12.html
1802 Microprocessor (1986) register interaction

IBM AP-101 Long Divide:
Floating point arithmetic was specified to match results obtained by System 360
  Remove doubts about fidelity of GPC results to those of 360s in JSC control center
Original divide design too slow for Shuttle use
  Designed, in conservative TTL circuit technology, to be interruptible after development of each quotient bit
  Last-minute redesign (AP-101B) solved the problem
Much later “improved” AP-101S divide not well verified!
  http://klabs.org/richcontent/software_content/hal_s/hal-s_compiler_system_specification.pdf
  3-16: “DED and DEDR instructions are broken on the AP-101S”
  6-12: “I2DEDR was substituted for DEDR in DMOD in order to avoid incorrect results caused by some inputs. See CR11164 and DR106660.”

IBM AP-101 Long Divide Cont’d:
DED: Double Exponential (floating) Divide
DEDR: same but gets divisor from register
Both work for most inputs, but …
  “Difficult to define” which inputs don’t work(!!)
  However, OK if low word of divisor = 0 (D’oh!)
DMOD (remainder “modulo” function):
  Is only user of these instructions, per audit
  All uses OK, per the above “however” rule
  I2DEDR substituted anyway, just in case

IBM AP-101 Long Divide Cont’d:
More general remedies:
  Modify compiler to avoid DED and DEDR
  Document problem in Principles of Operation
Conclusion with Nasty Suspicion:
  Did “Process” fail to operate at proper time?
  Developers may have found and worked around the problem without generating DR
  How can all that vector/matrix code never divide??
  DR, audit, etc. may have been “after the fact”

Intel Pentium Long Divide:
Moore’s Law works better for component density than for processing speed
Complex special-casing, with table look-ups, for certain ranges of input values
Intel failed to proof-read a table in a PLA!
Verification of combinations of high-precision numbers cannot be exhaustive
Even if Intel could have tested one combination of input values every microsecond, the exhaustive test would take O(10^30) years (cf. age of universe = O(10^10) years)

1802 Microprocessor (1986):
Not a logic design problem
High byte of a register sometimes writes over high byte of Program Counter
Dependent on electrical design factors
  Voltage and temperature toleration
  Length of polysilicon lines
  Presence of many ones in other registers
A program like Smalley3 could have exposed it (looping through voltage & temp ranges)

The 80k85 Verification Challenge:
Two words: “Rigorous” and “Thorough”
Exhaustive inputs test almost possible
8085/80k85 word length is only 8 bits, but:
  The 16-bit precision inputs to instruction DAD (Double-precision register add) would take days to execute an exhaustive test
  That’s 2^32 combinations, O(10^10)
So why not “suck it up” and spend the days?
A third word: “Looping” (for margin testing)

A Historical Parallel from Apollo:
The Block I and Block II Apollo Guidance Computers (AGC) each needed one self-test program for two purposes
  Enhancement of manual design verification
  Assurance that all features are still working
Ed Smalley of MIT Instrumentation Lab wrote those two programs
Some feedback to design: inclusion of an instruction to perform interrupt (EDRUPT)

Exploiting the Parallel Further:
Like the AGC models when Ed Smalley began his two tasks, the 80k85 was not quite a “newborn” when I began mine
Both machines had a considerable track record of executing a few programs correctly
All we needed was “rigorous” and “thorough”
In Ed’s honor, I named my 80k85 self-check program Smalley3

Overview of 80k85 Architecture:
Addressing by byte (65,536 bytes of RAM)
Central registers and register pairs:
  Accumulator A
  4 general registers B,C,D,E, sometimes as 2 pairs
  2 indirect addressing (or general) registers H,L
  Program Counter PC and Stack Pointer SP (pairs)
  Special: condition flags; interrupt mask
Accumulator and flags sometimes function as a register pair called Program Status Word (PSW)
256 one-byte I/O ports: 128 input and 128 output

Overview of 80k85 Instructions:
First (often only) byte divided by Huffman coding into as many as 3 fields
  Extreme case: MOV with 2-bit op, 3-bit destination tag, and 3-bit source tag: 56 nontrivial functionalities
Ignoring those subdivisions, 245 valid ops
  All the valid 8085 ops except DAA (Decimal Adjust)
  Interrupt masking feature of 8085 omitted
70 distinct instructions functionally
Four interrupts
  All but one of the 8085 interrupts are implemented
  All interrupts have the same priority (unlike 8085)

Phased Development Plan:
Objective: capability to test some ops before complete test is ready
Generally, early releases of Smalley3 tested simpler instructions
Later ones: more complex or involving parts of 80k85 design still subject to change, especially I/O ports
However, no rule that each instruction has to be tested using only simpler ones
  Couldn’t achieve that rigorously anyway
Final phase: general RAM corruption detector

Functional Groups of Instructions:
NOP and single-byte transfers
Double-byte transfers
Single-byte arithmetic binary operations
Double-byte arithmetic binary operations
Single-byte Boolean binary operations
Assorted unary operations
Transfers of control (except HLT)
Stack operations
Data input & output operations
Interrupt management and illegal op codes

Top-Level Design of Smalley3:
Perform entire test “cycle” just once, or:
  Stated number of times
  Indefinitely (until failure or manually stopped)
A Test Cycle is any subset of the 10 functional groups
In each functional group, test any subset of its distinct ops
For each distinct op, test any subset of its “parametric variations” (defined by all 8 bits)
For each variation, test against 16 systematic data value sets and from 1 to 239 pseudo-random data sets
Run RAM corruption check at any of the above levels
  … or just once, at end of run whether good or bad
Any failure stops the test and supports manual analysis
Random data is not the same for successive test cycles

Top-Level Design of Smalley3 (cont’d):
128 bytes of input data placed in input ports by external test equipment at (fairly) regular intervals
  But not predictable from “inside” the 80k85
  Each input port can be read in a (gulp) partially updated state
The four types of interrupt are commanded in turn by external test equipment at (fairly) regular intervals
  But at truly random times as seen by Smalley3

Systematic or Random Data Environment for Instructions:
Current machine-state data set placed in all central & special registers (except PC), by pairs
  Value for Stack Pointer restricted so as not to step on Smalley3’s code or scratch registers
Machine-state data also used for an address of a pair of bytes of RAM, and for contents thereof
  Address value restricted to not step on stack, or on Smalley3’s code or scratch registers
  Insofar as instructions refer to 1 or 2 bytes of RAM, they use these bytes and contents
Similarly for address of an I/O port and contents

Systematic Data Sets:
16 zeros to fill register pairs with all zeros
16 ones to fill register pairs with all ones
8 zeros and 8 ones for each register pair
4 zeros, 4 ones, 4 zeros, 4 ones similarly
Alternating pairs of zeros and ones ditto
Alternating zero and one bits similarly
All these are mixed and matched to make 16 systematic data sets (distinct “interesting & edgy” combinations of 16 bits each)
Each systematic data set is placed in all register pairs

Pseudo-Random Data Sets:
The pseudo-random number generator (PRNG) is an implementation in 80k85 code of an 8-bit linear feedback shift register (LFSR)
  Special-case logic added to “avoid the lockup state” so that the PRNG cycles indefinitely through all 256 states of a byte in a non-trivial sequence
The same “Content Engine” routine that deals out the systematic data sets has a mode that uses the PRNG twice to deal out 16 bits of pseudo-random data
Unlike the systematic mode, each register pair set up gets a different data set of pseudo-random data

Instruction Test Pass/Fail Criteria:
What does each instruction affect?
  A small subset of registers and RAM
  Mustn’t have any side effects
How to test both of these simultaneously?
  Initialize, predict, and verify “machine state”
  All central and special registers
  Two bytes of RAM selected by data set
    Same as top 2 bytes of stack wherever appropriate
  One byte in I/O port selected by data set
  Admittedly not the complete machine state

Initialize, Predict, and Verify:
Scoping/initializing limited “machine state”
  Use current data set as required to establish “PRE” and default (i.e., matching) “POST” state values
Predicting changes in machine state
  Every parametric variation of every instruction has its own predictor routine to establish whatever different “POST” state values are required
Verifying both changes and non-changes
  All “FOUND” state values seen after execution must match corresponding predicted “POST” state values

Principles of Prediction:
As far as possible, make predictions for each functionally distinct instruction without using that instruction type
Table look-ups contribute to this solution
Addition/subtraction prediction (“Blackadder”) uses 256-byte tables aligned with addresses so that entering tables is done by setting L register only, with no address addition involved
Boolean ops prediction uses loop to test each bit position in turn, with no Boolean ops involved

Verification and Analysis Support:
Objective is to save in RAM everything needed to identify the exact fault detected
PRE, POST, and FOUND values reside in organized patterns of RAM locations
“BADS” value for each state variable is XOR of FOUND and POST values, to highlight wrong bits
Tree of BADS values provides overall Go/No-Go state in one byte; some bits are tree branches “pointing” to other BADS
  Root FBADS has a bit “Regs” meaning examine RBADS, each of whose bits shows which register’s BADS to examine
Next slide is a map of analysis support locations

Special Note on “Testing” HLT:
Regular testing of HLT impossible in a self-check program that runs indefinitely
But some of the operational modes do end
  Random machine state, PRE and POST, is set up for these necessary HLTs
  Test engineer can manually obtain actual final machine state and compare against POSTs
By varying initial random seed or other run parameters, test engineer can exercise different machine states for HLT

“Rigorous & Thorough” for I/O:
Instruction testing doesn’t do much for I/O
  Input ports can be written only by test equipment, read only by Smalley3
  Output ports can be written and read by Smalley3, and can be read by test equipment
    But test equipment doesn’t read it critically
An approach used in some projects is for test equipment to wrap output ports back to input ports for self-check code to inspect
  LOLA test equipment can’t do that directly

“Rigorous & Thorough” for Input:
LOLA test equipment generates systematic and pseudo-random input data sets for input ports
  When limited number of test cycles are run, only the pseudo-random input data sets are used
  When multiple complete test cycles are run, systematic data is used, then pseudo-random
Systematic data is all-zeros and all-ones at present, but that leaves a coverage gap
  3 more data sets with mixtures of zeros and ones could detect any case of 2 bit positions in the data being wired into reversed bit positions of an input port
  Whether to add these is a work in progress

Rigorous Input Testing Cont’d:
Checking systematic input data is simple
  Smalley3 knows the correct values a priori
Checking random input data is trickier
  We impose a parity rule on each byte, and require a longitudinal XOR checksum
  Honeywell 800/1800 (1960’s) did this with tape, achieving SEC-DED (“Orthotronic Control®”)
  Parity rule is reversed between random data sets
Smalley3 identifies the bad port (if only one port is bad), and the bad bit within it (if only one bit is bad)

Rigorous Input Testing Cont’d:
How does Smalley3 know when an input data set is completely resident in the input ports?
  Checksum logic arbitrarily rules out checksum = 0
  Zero in the checksum port is a signal that the 128 input ports, as a class, are not in a stable state
  Transition from zero to non-zero in the checksum port is a signal that the new data set is stable in all ports (except in checksum port, but that settles long before use)
How does Smalley3 deal with possibility that a port can be read while partly updated?
  A “bad” checksum is read again to see if it settles out to zero and should therefore be ignored

“Rigorous & Thorough” for Output:
Smalley3 can verify that what it reads from an output port is what it just wrote there…
  But can’t tell if wiring from the output port to the outside world is correct
Another work in progress: my proposal to do the I/O wrapback backwards(!)
  Smalley3 would copy its verified input data to the corresponding output ports
  Test equipment can remember what it put in the input ports, and could compare that to what it later reads from the output ports
  On that point, the test equipment decides pass/fail

Testing of External Interrupts:
Test equipment commands interrupts in a regular pattern, but their arrival looks truly random to Smalley3
Primary objective: verify that progression of machine states commanded by Smalley3 is not corrupted by interrupt
Secondary objective: verify that each interrupt used its correct target location and saved PC correctly
  In this architecture, interrupt is functionally the same as CALL: resume address is in the stack
Can’t use PRE/POST/FOUND paradigm for interrupts

RAM Corruption Detector:
Principle is X-Y arrays of XOR checksums
  65k is conceived as 256 columns of 256 rows
  256 column sums of rows 0-253 form row 254
  256 row sums form row 255
Coverage of these checksums is all of RAM!
  System and Smalley3 scratch registers
  Smalley3 executable code
  Leftover general-purpose RAM
Identifies any single bad byte, and the bad bit if there is only one
Also identifies bad row or column in some multi-byte errors

RAM Corruption Detector Design:
Routine to check checksums uses only central registers, no RAM, for itself
  Separate indirect-address registers H & L are crucial on this point
Time consumption restricts construction of checksums to just once (beginning of run)
All Smalley3’s scratch locations are quadruply allocated, 1 prime and 3 shadow locations
  1 shadow in same row, 1 shadow in same column, and 1 at intersection of those 2 shadows
  When shadows are up to date, values don’t affect checksums

A True Confession:
The considerable effort to include all those shadowed variables may not have been worthwhile
Since the shadowing prevents those variables from affecting the checksum, coverage is only marginally better than if checksums excluded Smalley3 variables
Still, the prologue to checksum checking does get some coverage from verifying that all 3 shadows are equal before updating them

Smalley3’s Achievements:
Caught a bug in implementation of one op
  CMP B (compare Accumulator vs. B register)
Identified a weak point elsewhere in chip
  Fan-out excesses caused low-voltage tests to affect one particular 80k85 instruction
  Smalley3’s alarm induced design engineer to scan non-80k85 parts of chip for the problem
  Solution to that greatly increased undervolt toleration
Motivated upgrade of CPU-memory slew rate (speed) for additional margin during high temperature operation

Summary:
Smalley3 occupies ~ 9.3 kbytes, of which:
  5.8 kbytes are executable code
  2.7 kbytes are tables
  1.1 kbytes are variables
  Some executable code gets overlaid with variables
Smalley3 Test Cycle takes 14 sec if no RAM corruption checking (assuming 4 MIPS)
If RAM checking done at max frequency: 1.8 hour
Remember, the policy is to run many test cycles

Conclusions:
Rigorous & thorough testing of a small 8-bit microprocessor with no complicated instructions wasn’t that all-fired simple
  See later slide on Scalability to 16, 32, or 64 bits
Testing of Smalley3 itself was sort of easy
  Simulator allowed it to be developed on a PC
  Smalley3 bugs looked like 80k85 faults: convenient!
  But fidelity of an 8085 simulator to the 80k85 was less than complete
  Also, a full-bore run overwhelms simulator capacity

Conclusions Continued:
Design of 80k85 and its test equipment could have reduced some complexities
  Allow normal output-to-input wrapback
  Eliminate possibility of reading half-baked port
  Give self-check program more control over when inputs occur
  But … would that be less realistic?
  Greater fidelity of 80k85 instructions to 8085 where it doesn’t hurt 80k85 function

Smalley3 is Valuable:
Smalley3 will occupy one of four pages of EEPROM
Enables in-flight testing of processor
Flight version will include more systematic RAM testing
  Rapid bit reversals may detect “fatigue” modes
  Pattern-Sensitive Fault testing “by the book”
Smalley3 is going to the Moon!

Scalability Considerations:
How would 16-bit, 32-bit, and 64-bit processors be more of a challenge?
  More, and more complex, instructions
  Multiply, divide, floating point, trig, decimal, etc.
  Base and index registers add much complexity
  Vastly greater amounts of RAM to check
  Even so, table look-up methods get impractical
OK, how might the challenge be eased?
  Larger processors in hi-rel applications might have some fancier hardware
  Built-In Self-Test
  Around 1970, we at MIT designed a SIRU (strapdown inertial reference unit) controller whose instructions calculated & compared direct and complement results

A Minority Opinion (Just My Own) on One Design Point:
Allowing interrupts to do everything they do in large multi-tasking systems adds complexity to both mission and test code
Embedded control processors don’t need that much flexibility
  They can sample inputs periodically to obtain the same information that interrupts provide
  Ideally, that would use a program-readable clock
  It’s not as if any of the code is raw product from a raw and loop-vulnerable beginner
Space Shuttle GPC synchronization took out most of the flexibility even at that level
  Response to interrupts restricted to programmed sync points

QUESTIONS?:
That’s why we reserved the room ’til 1 PM
Of course, there is the trade-off vs. lunch!
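
The X-Y checksum principle from the RAM Corruption Detector slides can be illustrated with a minimal Python sketch. It is not Smalley3's 80k85 code: the function names are hypothetical, the grid is a small 16 x 16 block instead of 256 x 256, and the checksums are held in separate lists rather than in rows 254 and 255 of the RAM itself. It shows only how matching row and column XOR mismatches pin down one bad byte, and the bad bit when only one bit is wrong.

```python
from functools import reduce
from operator import xor


def build_checksums(mem, width):
    """Return (row_sums, col_sums): XOR of every row and of every column."""
    rows = [mem[i:i + width] for i in range(0, len(mem), width)]
    row_sums = [reduce(xor, row, 0) for row in rows]
    col_sums = [reduce(xor, (row[c] for row in rows), 0) for c in range(width)]
    return row_sums, col_sums


def locate_fault(mem, width, row_sums, col_sums):
    """Compare current checksums with the saved ones.

    Returns None if nothing changed, a dict naming row/col/diff/bit for a
    single bad byte, or a dict listing the mismatching rows and columns.
    """
    now_rows, now_cols = build_checksums(mem, width)
    bad_rows = [(i, a ^ b) for i, (a, b) in enumerate(zip(now_rows, row_sums)) if a != b]
    bad_cols = [(i, a ^ b) for i, (a, b) in enumerate(zip(now_cols, col_sums)) if a != b]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1 and bad_rows[0][1] == bad_cols[0][1]:
        row, diff = bad_rows[0]
        col = bad_cols[0][0]
        # A power-of-two difference means exactly one bit is wrong.
        bit = diff.bit_length() - 1 if diff & (diff - 1) == 0 else None
        return {"row": row, "col": col, "diff": diff, "bit": bit}
    return {"rows": [r for r, _ in bad_rows], "cols": [c for c, _ in bad_cols]}


# Usage: checksum a 256-byte block once, then corrupt one bit and find it.
ram = bytearray(range(256))
saved_rows, saved_cols = build_checksums(ram, 16)
ram[5 * 16 + 9] ^= 0x08
fault = locate_fault(ram, 16, saved_rows, saved_cols)
```

As on the slides, the checksums are built once and only checked afterwards, so any byte that legitimately changes would need the shadowing scheme described under RAM Corruption Detector Design, which this sketch omits.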
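
The Pseudo-Random Data Sets slide describes an 8-bit LFSR with special-case logic so that all 256 byte values are visited. The Python sketch below shows one way such a generator could work. The slides give neither the feedback taps nor how the lockup state is handled, so the Galois form, the mask 0xB8, the choice to splice the all-zero state in after 0x01, and the names next_state and random_pair are all assumptions made for illustration.

```python
TAPS = 0xB8        # assumed feedback mask (x^8 + x^6 + x^5 + x^4 + 1); not from the slides
ZERO_ENTRY = 0x01  # assumed state after which the all-zero state is spliced in


def next_state(state):
    """Advance an 8-bit Galois LFSR one step, with zero spliced into the cycle."""
    if state == ZERO_ENTRY:
        return 0x00          # detour into the state a plain LFSR can never leave
    if state == 0x00:
        return TAPS          # rejoin the cycle where 0x01 would normally have gone
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state


def random_pair(state):
    """Use the generator twice to fill one 16-bit register pair.

    Returns (pair_value, new_state).
    """
    high = next_state(state)
    low = next_state(high)
    return (high << 8) | low, low


# Usage: walk the generator and record every state it visits.
seen = set()
s = 0x5A
for _ in range(256):
    seen.add(s)
    s = next_state(s)
```

A plain LFSR of this width visits only the 255 non-zero states; the two special cases add the missing state without otherwise changing the sequence.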
back to the moon Dionigi Download Post to : URL : Related Presentations : Share Add to Flag Embed Email Send to Blogs and Networks Add to Channel Uploaded from authorPOINTLite Insert YouTube videos in PowerPont slides with aS Desktop Copy embed code: (To copy code, click on the text box) Embed: URL: Thumbnail: WordPress Embed Customize Embed The presentation is successfully added In Your Favorites. Views: 307 Category: Education License: All Rights Reserved Like it (1) Dislike it (0) Added: January 11, 2008 This Presentation is Public Favorites: 0 Presentation Description No description available. Comments Posting comment... Premium member Presentation Transcript Back to the Moon: Back to the Moon The Verification of a Small Microprocessor's Logic Design A Small Microprocessor for What?: A Small Microprocessor for What? Lunar Orbiter (LRO), scheduled launch 2008, with multiple scientific instruments One of these is a Laser Altimeter, hence the name “LOLA” Laser altimetry produces very detailed and precise geodetic maps to aid establishment of a permanent base (why not “selenodetic”??) Each instrument has at least one embedded control microprocessorMicroprocessor design criteria: Microprocessor design criteria Radiation hardening for high endurance High performance to stay a step ahead of a rapid-cycling instrument Simple well-understood architecture Appropriate to embedded controller paradigm No hardware multiply or divide required Simple programs are reliable programs Straightforward assembly-language or C programming—no operating system!Technology & Architecture: Technology & Architecture Gate arrays fulfill the criteria and support any desired architecture We created the “80k85”—what’s that? Based quite closely on the old Intel 8085! 
Simple instructions assure quick interrupt response Not RISC, but uses limited real estate of gates An instruction set of known “completeness” Availability of established tools Assemblers, simulators, C compiler Exploits skill set of embedded-controller artisans Unimplemented op codes cause a special trapProcessor Design Verification: Processor Design Verification Every processor design needs verification Even the best have stumbled on this point! IBM System 4π AP-101 (Space Shuttle GPC) 4πs had been used in earlier aircraft & spacecraft AP-101 was a special variant for the Shuttle Not quite as off-the-shelf as everyone wanted to believe Intel Pentium FPU for P6 core, 1994 A determined effort to speed up long floating divide Verifying high-precision arithmetic is a challenge! http://www.maa.org/mathland/mathland_5_12.html 1802 Microprocessor (1986) register interactionIBM AP-101 Long Divide: IBM AP-101 Long Divide Floating point arithmetic was specified to match results obtained by System 360 Remove doubts about fidelity of GPC results to those of 360s in JSC control center Original divide design too slow for Shuttle use Designed, in conservative TTL circuit technology, to be interruptible after development of each quotient bit Last-minute redesign (AP-101B) solved the problem Much later “improved” AP-101S divide not well verified! http://klabs.org/richcontent/software_content/hal_s/hal-s_compiler_system_specification.pdf 3-16: “DED and DEDR instructions are broken on the AP-101S” 6-12: “I2DEDR was substituted for DEDR in DMOD in order to avoid incorrect results caused by some inputs. See CR11164 and DR106660.”IBM AP-101 Long Divide Cont’d: IBM AP-101 Long Divide Cont’d DED: Double Exponential (floating) Divide DEDR: same but gets divisor from register Both work for most inputs, but … “Difficult to define” which inputs don’t work(!!) However, OK if low word of divisor = 0 (D’oh!) 
DMOD (remainder “modulo” function): Is only user of these instructions, per audit All uses OK, per the above “however” rule I2DEDR substituted anyway, just in case IBM AP-101 Long Divide Cont’d: IBM AP-101 Long Divide Cont’d More general remedies: Modify compiler to avoid DED and DEDR Document problem in Principles of Operation Conclusion with Nasty Suspicion: Did “Process” fail to operate at proper time? Developers may have found and worked around the problem without generating DR How can all that vector/matrix code never divide?? DR, audit, etc. may have been “after the fact”Intel Pentium Long Divide: Intel Pentium Long Divide Moore’s Law works better for component density than for processing speed Complex special-casing, with table look-ups, for certain ranges of input values Intel failed to proof-read a table in a PLA! Verification of combinations of high-precision numbers cannot be exhaustive Even if Intel could have tested one combination of input values every microsecond, the exhaustive test would take O(10^30) years (cf. age of universe = O(10^10) years)1802 Microprocessor (1986): 1802 Microprocessor (1986) Not a logic design problem High byte of a register sometimes writes over high byte of Program Counter Dependent on electrical design factors Voltage and temperature toleration Length of polysilicon lines Presence of many ones in other registers A program like Smalley3 could have exposed it (looping through voltage & temp ranges)The 80k85 Verification Challenge: The 80k85 Verification Challenge Two words: “Rigorous” and “Thorough” Exhaustive inputs test almost possible 8085/80k85 word length is only 8 bits, but: The 16-bit precision inputs to instruction DAD (Double-precision register add) would take days to execute an exhaustive test That’s 2^32 combinations, O(10^10) So why not “suck it up” and spend the days? 
A third word: “Looping” (for margin testing)A Historical Parallel from Apollo: A Historical Parallel from Apollo The Block I and Block II Apollo Guidance Computers (AGC) each needed one self-test program for two purposes Enhancement of manual design verification Assurance that all features are still working Ed Smalley of MIT Instrumentation Lab wrote those two programs Some feedback to design: inclusion of an instruction to perform interrupt (EDRUPT)Exploiting the Parallel Further: Exploiting the Parallel Further Like the AGC models when Ed Smalley began his two tasks, the 80k85 was not quite a “newborn” when I began mine Both machines had a considerable track record of executing a few programs correctly All we needed was “rigorous” and “thorough” In Ed’s honor, I named my 80k85 self-check program Smalley3Overview of 80k85 Architecture: Overview of 80k85 Architecture Addressing by byte (65,536 bytes of RAM) Central registers and register pairs: Accumulator A 4 general registers B,C,D,E, sometimes as 2 pairs 2 indirect addressing (or general) registers H,L Program Counter PC and Stack Pointer SP (pairs) Special: condition flags; interrupt mask Accumulator and flags sometimes function as a register pair called Program Status Word (PSW) 256 one-byte I/O ports: 128 input and 128 outputOverview of 80k85 Instructions: Overview of 80k85 Instructions First (often only) byte divided by Huffman coding into as many as 3 fields Extreme case: MOV with 2-bit op, 3-bit destination tag, and 3-bit source tag—56 nontrivial functionalities Ignoring those subdivisions, 245 valid ops All the valid 8085 ops except DAA (Decimal Adjust) Interrupt masking feature of 8085 omitted 70 distinct instructions functionally Four interrupts All but one of the 8085 interrupts are implemented All interrupts have the same priority (unlike 8085)Phased Development Plan: Phased Development Plan Objective: capability to test some ops before complete test is ready Generally, early releases of Smalley3 
tested simpler instructions Later ones: more complex or involving parts of 80k85 design still subject to change, especially I/O ports However, no rule that each instruction has to be tested using only simpler ones Couldn’t achieve that rigorously anyway Final phase: general RAM corruption detectorFunctional Groups of Instructions: Functional Groups of Instructions NOP and single-byte transfers Double-byte transfers Single-byte arithmetic binary operations Double-byte arithmetic binary operations Single-byte Boolean binary operations Assorted unary operations Transfers of control (except HLT) Stack operations Data input & output operations Interrupt management and illegal op codesTop-Level Design of Smalley3: Top-Level Design of Smalley3 Perform entire test “cycle” just once, or: Stated number of times Indefinitely (until failure or manually stopped) A Test Cycle is any subset of the 10 functional groups In each functional group, test any subset of its distinct ops For each distinct op, test any subset of its “parametric variations” (defined by all 8 bits) For each variation, test against 16 systematic data value sets and from 1 to 239 pseudo-random data sets Run RAM corruption check at any of the above levels … or just once, at end of run whether good or bad Any failure stops the test and supports manual analysis Random data is not the same for successive test cyclesTop-Level Design of Smalley3 (cont’d): Top-Level Design of Smalley3 (cont’d) 128 bytes of input data placed in input ports by external test equipment at (fairly) regular intervals But not predictable from “inside” the 80k85 Each input port can be read in a (gulp) partially updated state The four types of interrupt are commanded in turn by external test equipment at (fairly) regular intervals But at truly random times as seen by Smalley3Systematic or Random Data Environment for Instructions: Systematic or Random Data Environment for Instructions Current machine-state data set placed in all central & 
special registers (except PC), by pairs Value for Stack Pointer restricted so as not to step on Smalley3’s code or scratch registers Machine-state data also used for an address of a pair of bytes of RAM, and for contents thereof Address value restricted to not step on stack, or on Smalley3’s code or scratch registers Insofar as instructions refer to 1 or 2 bytes of RAM, they use these bytes and contents Similarly for address of an I/O port and contentsSystematic Data Sets: Systematic Data Sets 16 zeros to fill register pairs with all zeros 16 ones to fill register pairs with all ones 8 zeros and 8 ones for each register pair 4 zeros, 4 ones, 4 zeros, 4 ones similarly Alternating pairs of zeros and ones ditto Alternating zero and one bits similarly All these are mixed and matched to make 16 systematic data sets (distinct “interesting & edgy” combinations of 16 bits each) Each systematic data set is placed in all register pairsPseudo-Random Data Sets: Pseudo-Random Data Sets The pseudo-random number generator (PRNG) is an implementation in 80k85 code of an 8-bit linear feedback shift register (LFSR) Special-case logic added to “avoid the lockup state” so that the PRNG cycles indefinitely through all 256 states of a byte in a non-trivial sequence The same “Content Engine” routine that deals out the systematic data sets has a mode that uses the PRNG twice to deal out 16 bits of pseudo-random data Unlike the systematic mode, each register pair set up gets a different data set of pseudo-random dataInstruction Test Pass/Fail Criteria: Instruction Test Pass/Fail Criteria What does each instruction affect? A small subset of registers and RAM Mustn’t have any side effects How to test both of these simultaneously? 
Initialize, predict, and verify “machine state”
All central and special registers
Two bytes of RAM selected by data set
Same as top 2 bytes of stack wherever appropriate
One byte in I/O port selected by data set
Admittedly not the complete machine state

Initialize, Predict, and Verify:
Scoping/initializing limited “machine state”
Use current data set as required to establish “PRE” and default (i.e., matching) “POST” state values
Predicting changes in machine state
Every parametric variation of every instruction has its own predictor routine to establish whatever different “POST” state values are required
Verifying both changes and non-changes
All “FOUND” state values seen after execution must match corresponding predicted “POST” state values

Principles of Prediction:
As far as possible, make predictions for each functionally distinct instruction without using that instruction type
Table look-ups contribute to this solution
Addition/subtraction prediction (“Blackadder”) uses 256-byte tables aligned with addresses so that entering tables is done by setting L register only; no address addition involved
Boolean ops prediction uses loop to test each bit position in turn; no Boolean ops involved

Verification and Analysis Support:
Objective is to save in RAM everything needed to identify the exact fault detected
PRE, POST, and FOUND values reside in organized patterns of RAM locations
“BADS” value for each state variable is XOR of FOUND and POST values, to highlight wrong bits
Tree of BADS values provides overall Go/No-Go state in one byte; some bits are tree branches “pointing” to other BADS
Root FBADS has a bit “Regs” meaning examine RBADS, each of whose bits shows which register’s BADS to examine
Next slide is a map of analysis support locations

Special Note on “Testing” HLT:
Regular testing of HLT impossible in a self-check program that runs indefinitely
But some of the operational modes do end
Random machine state, PRE and POST, is set up for these necessary HLTs
Test engineer can manually obtain actual final machine state and compare against POSTs
By varying initial random seed or other run parameters, test engineer can exercise different machine states for HLT

“Rigorous & Thorough” for I/O:
Instruction testing doesn’t do much for I/O
Input ports can be written only by test equipment, read only by Smalley3
Output ports can be written and read by Smalley3, and can be read by test equipment
But test equipment doesn’t read them critically
An approach used in some projects is for test equipment to wrap output ports back to input ports for self-check code to inspect
LOLA test equipment can’t do that directly

“Rigorous & Thorough” for Input:
LOLA test equipment generates systematic and pseudo-random input data sets for input ports
When a limited number of test cycles is run, only the pseudo-random input data sets are used
When multiple complete test cycles are run, systematic data is used, then pseudo-random
Systematic data is all-zeros and all-ones at present, but that leaves a coverage gap
3 more data sets with mixtures of zeros and ones could detect any case of 2 bit positions in the data being wired into reversed bit positions of an input port
Whether to add these is a work in progress

Rigorous Input Testing Cont’d:
Checking systematic input data is simple
Smalley3 knows the correct values a priori
Checking random input data is trickier
We impose a parity rule on each byte, and require a longitudinal XOR checksum
Honeywell 800/1800 (1960s) did this with tape, achieving SEC-DED (“Orthotronic Control®”)
Parity rule is reversed between random data sets
Smalley3 identifies the bad port (if only one port is bad), and the bad bit within it (if only one bit is bad)

Rigorous Input Testing Cont’d:
How does Smalley3 know when an input data set is completely resident in the input ports?
Checksum logic arbitrarily rules out checksum = 0
Zero in the checksum port is a signal that the 128 input ports, as a class, are not in a stable state
Transition from zero to non-zero in the checksum port is a signal that the new data set is stable in all ports (except in checksum port, but that settles long before use)
How does Smalley3 deal with possibility that a port can be read while partly updated?
A “bad” checksum is read again to see if it settles out to zero and should therefore be ignored

“Rigorous & Thorough” for Output:
Smalley3 can verify that what it reads from an output port is what it just wrote there…
But can’t tell if wiring from the output port to the outside world is correct
Another work in progress: my proposal to do the I/O wrapback backwards(!)
Smalley3 would copy its verified input data to the corresponding output ports
Test equipment can remember what it put in the input ports, and could compare that to what it later reads from the output ports
On that point, the test equipment decides pass/fail

Testing of External Interrupts:
Test equipment commands interrupts in a regular pattern, but their arrival looks truly random to Smalley3
Primary objective: verify that progression of machine states commanded by Smalley3 is not corrupted by interrupt
Secondary objective: verify that each interrupt used its correct target location and saved PC correctly
In this architecture, interrupt is functionally the same as CALL: resume address is in the stack
Can’t use PRE/POST/FOUND paradigm for interrupts

RAM Corruption Detector:
Principle is X-Y arrays of XOR checksums
65k is conceived as 256 columns of 256 rows
256 column sums of rows 0-253 form row 254
256 row sums form row 255
Coverage of these checksums is all of RAM!
System and Smalley3 scratch registers
Smalley3 executable code
Leftover general-purpose RAM
Identifies any single bad byte, and the bad bit if there is only one
Also identifies bad row or column in some multi-byte errors

RAM Corruption Detector Design:
Routine to check checksums uses only central registers, no RAM, for itself
Separate indirect-address registers H & L are crucial on this point
Time consumption restricts construction of checksums to just once (beginning of run)
All Smalley3’s scratch locations are quadruply allocated, 1 prime and 3 shadow locations
1 shadow in same row, 1 shadow in same column, and 1 at intersection of those 2 shadows
When shadows are up to date, values don’t affect checksums

A True Confession:
The considerable effort to include all those shadowed variables may not have been worthwhile
Since the shadowing prevents those variables from affecting the checksum, coverage is only marginally better than if checksums excluded Smalley3 variables
Still, the prologue to checksum checking does get some coverage from verifying that all 3 shadows are equal before updating them

Smalley3’s Achievements:
Caught a bug in implementation of one op
CMP B (compare Accumulator vs. B register)
Identified a weak point elsewhere in chip
Fan-out excesses caused low-voltage tests to affect one particular 80k85 instruction
Smalley3’s alarm induced design engineer to scan non-80k85 parts of chip for the problem
Solution to that greatly increased undervolt toleration
Motivated upgrade of CPU-memory slew rate (speed) for additional margin during high-temperature operation

Summary:
Smalley3 occupies ~ 9.3 kbytes, of which:
5.8 kbytes are executable code
2.7 kbytes are tables
1.1 kbytes are variables
Some executable code gets overlaid with variables
Smalley3 Test Cycle takes 14 sec if no RAM corruption checking (assuming 4 MIPS)
If RAM checking done at max frequency: 1.8 hours
Remember, the policy is to run many test cycles

Conclusions:
Rigorous & thorough testing of a small 8-bit microprocessor with no complicated instructions wasn’t that all-fired simple
See later slide on Scalability to 16, 32, or 64 bits
Testing of Smalley3 itself was sort of easy
Simulator allowed it to be developed on a PC
Smalley3 bugs looked like 80k85 faults: convenient!
But fidelity of an 8085 simulator to the 80k85 was less than complete
Also, a full-bore run overwhelms simulator capacity

Conclusions Continued:
Design of 80k85 and its test equipment could have reduced some complexities
Allow normal output-to-input wrapback
Eliminate possibility of reading half-baked port
Give self-check program more control over when inputs occur
But … would that be less realistic?
Greater fidelity of 80k85 instructions to 8085 where it doesn’t hurt 80k85 function

Smalley3 is Valuable:
Smalley3 will occupy one of four pages of EEPROM
Enables in-flight testing of processor
Flight version will include more systematic RAM testing
Rapid bit reversals may detect “fatigue” modes
Pattern-Sensitive Fault testing “by the book”
Smalley3 is going to the Moon!

Scalability Considerations:
How would 16-bit, 32-bit, and 64-bit processors be more of a challenge?
More, and more complex, instructions
Multiply, divide, floating point, trig, decimal, etc.
Base and index registers add much complexity
Vastly greater amounts of RAM to check
Even so, table look-up methods get impractical
OK, how might the challenge be eased?
Larger processors in hi-rel applications might have some fancier hardware
Built-In Self-Test
Around 1970, we at MIT designed a SIRU (strapdown inertial reference unit) controller whose instructions calculated & compared direct and complement results

A Minority Opinion (Just My Own) on One Design Point:
Allowing interrupts to do everything they do in large multi-tasking systems adds complexity to both mission and test code
Embedded control processors don’t need that much flexibility
They can sample inputs periodically to obtain the same information that interrupts provide
Ideally, that would use a program-readable clock
It’s not as if any of the code is raw product from a raw and loop-vulnerable beginner
Space Shuttle GPC synchronization took out most of the flexibility even at that level
Response to interrupts restricted to programmed sync points

QUESTIONS?
That’s why we reserved the room ’til 1 PM
Of course, there is the trade-off vs. lunch!
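
Sketch of the PRNG idea from the Pseudo-Random Data Sets slide, in Python rather than 80k85 code. The slides give only "8-bit LFSR plus special-case logic to avoid the lockup state"; everything else here is an assumption made for illustration: the tap positions (bits 7, 5, 4, 3), the left-shift direction, and all function names. The lockup fix shown is one common trick (invert the feedback when the low 7 bits are all zero, which splices the all-zeros state into the 255-state cycle) and may not be the special-case logic Smalley3 actually uses.

```python
def lfsr_next(state):
    """Advance an 8-bit left-shifting LFSR by one step (taps are assumed)."""
    # Feedback is the XOR of bits 7, 5, 4 and 3 of the current state.
    fb = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
    # Special case: when the 7 bits that survive the shift are all zero,
    # invert the feedback. This sends 0x80 -> 0x00 and 0x00 -> 0x01, so the
    # all-zeros "lockup" state becomes an ordinary member of the cycle.
    if state & 0x7F == 0:
        fb ^= 1
    return ((state << 1) | fb) & 0xFF


def next_pair(state):
    """Use the PRNG twice to deal out 16 bits, as the slide describes."""
    hi = lfsr_next(state)
    lo = lfsr_next(hi)
    return lo, (hi << 8) | lo   # (new PRNG state, 16-bit value)


def cycle_states(seed):
    """List every state visited, starting at seed, until seed comes round again."""
    states = [seed]
    s = lfsr_next(seed)
    while s != seed:
        states.append(s)
        s = lfsr_next(s)
    return states
```

Calling `cycle_states` with any seed should return all 256 byte values exactly once, which is the property the slide claims for the real generator.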
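
Sketch of the X-Y XOR checksum principle from the RAM Corruption Detector slides, again in Python for illustration only. Assumptions not taken from the slides: the row is the high address byte and the column the low byte; all names are hypothetical; and only rows 0-253 get row sums here, because the slides do not say how the sums for the two checksum rows themselves are handled. The real routine also runs in central registers without touching RAM, which this sketch makes no attempt to imitate.

```python
SIDE = 256            # RAM viewed as 256 rows x 256 columns
DATA_ROWS = 254       # rows 0-253 hold data
COL_SUM_ROW = 254     # row 254 holds the 256 column sums
ROW_SUM_ROW = 255     # row 255 holds the row sums (entries 0-253 used here)


def xor_row(ram, r):
    acc = 0
    for c in range(SIDE):
        acc ^= ram[r * SIDE + c]
    return acc


def xor_col(ram, c):
    acc = 0
    for r in range(DATA_ROWS):
        acc ^= ram[r * SIDE + c]
    return acc


def build_checksums(ram):
    """Done once, at the beginning of a run."""
    for c in range(SIDE):
        ram[COL_SUM_ROW * SIDE + c] = xor_col(ram, c)
    for r in range(DATA_ROWS):
        ram[ROW_SUM_ROW * SIDE + r] = xor_row(ram, r)


def find_mismatches(ram):
    """Return ({row: syndrome}, {column: syndrome}) for every failing checksum."""
    bad_rows, bad_cols = {}, {}
    for r in range(DATA_ROWS):
        syndrome = xor_row(ram, r) ^ ram[ROW_SUM_ROW * SIDE + r]
        if syndrome:
            bad_rows[r] = syndrome
    for c in range(SIDE):
        syndrome = xor_col(ram, c) ^ ram[COL_SUM_ROW * SIDE + c]
        if syndrome:
            bad_cols[c] = syndrome
    return bad_rows, bad_cols


def locate_single_byte(ram):
    """Return (row, column, wrong_bits) if exactly one data byte is bad, else None."""
    bad_rows, bad_cols = find_mismatches(ram)
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        (r, row_syn), = bad_rows.items()
        (c, col_syn), = bad_cols.items()
        if row_syn == col_syn:
            return r, c, row_syn
    return None


def single_bit_index(syndrome):
    """Bit number if exactly one bit is wrong, else None."""
    if syndrome and syndrome & (syndrome - 1) == 0:
        return syndrome.bit_length() - 1
    return None
```

One failing row sum and one failing column sum with the same syndrome pin down a single bad byte at their intersection, and the syndrome itself shows the wrong bits. Several bad bytes in one row show up as one bad row with several bad columns, which matches the slide's "identifies bad row or column in some multi-byte errors".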