# Randomised testing of a HOL 4 microprocessor model

Brian Campbell

REMS project

LFCS, University of Edinburgh

31st January 2014

Why are we testing a processor model?

Want a good model for:

Cost-lifting decompilation



- 1. Compile C code normally
- 2. Simple timing analysis on ARM code for basic blocks
- 3. Use decompilation to
  - check functional equivalence
  - attach timing annotations on to C source

Based on [Sewell, Myreen, Klein, PLDI'13]

#### Need a simple timing model...



With just 56 instructions, it is possible to master quickly the entire Cortex-M0 instruction and its C friendly architecture, making development simple and fast. The option for fully deterministic instruction and interrupt timing makes it easy to calculate response times.

#### Optimized connectivity

Designed to support low power connectivity such as Bluetooth Low Energy (BLE), IEEE 802.15 and Z-wave, particularly in analog devices that are increasing their digital functionality to pre-process and ARM Twitter

#### Need a simple timing model...



4/41

#### Need a simple timing model...

Programmers Model

#### 3.3 Instruction set summary

The processor implements the ARMv6-M Thumb instruction set, including a number of 32-bit instructions that use Thumb-2 technology. The ARMv6-M instruction set comprises:

- all of the 16-bit Thumb instructions from ARMv7-M excluding CBZ, CBNZ and IT
- the 32-bit Thumb instructions BL, DMB, DSB, ISB, MRS and MSR.

Table 3-1 shows the Cortex-M0 instructions and their cycle counts. The cycle counts are based on a system with zero wait-states.

| Operation | Description      | Assembler                  | Cycles |  |
|-----------|------------------|----------------------------|--------|--|
| Move      | 8-bit immediate  | MOVS Rd, # <imm></imm>     | 1      |  |
|           | Lo to Lo         | MOVS Rd, Rm                | 1      |  |
|           | Any to Any       | MOV Rd, Rm                 | 1      |  |
|           | Any to PC        | MOV PC, Rm                 | 3      |  |
| Add       | 3-bit immediate  | ADDS Rd, Rn, # <imm></imm> | 1      |  |
|           | All registers Lo | ADDS Rd, Rn, Rm            | 1      |  |
|           | Any to Any       | ADD Rd, Rd, Rm             | 1      |  |
|           | Any to PC        | ADD PC, PC, Rm             | 3      |  |
|           | 8-bit immediate  | ADDS Rd, Rd, # <imm></imm> | 1      |  |

#### Table 3-1 Cortex-M0 instruction summary

#### Is the simple timing model real?

#### ARM Cortex-M0 processors

- + nice timing table in reference manual
- + none of the usual caveats in the manual
- $+\,$  no cache / write-back buffer concerns
- + implementations have fast enough SRAM

#### Is the simple timing model real?

#### ARM Cortex-M0 processors

- + nice timing table in reference manual
- + none of the usual caveats in the manual
- $+\,$  no cache / write-back buffer concerns
- + implementations have fast enough SRAM
- manufacturers very quiet about timing
- three stage pipeline

(fetch, decode, execute)

#### Overview

- Have a HOL 4 model of ARM Cortex-M0 processor
- which includes timing

Want to check

- 1. suitable for verification (sound)
- 2. timing is correct

So we take random sequences of instructions.

Test predictions the model makes against real chip(s).

## A real chip



 $\begin{array}{c} \mathsf{L3} \\ \textit{(automated)} & \downarrow \\ \mathsf{Model} \to \mathsf{Step} \to \mathsf{Prog} \end{array}$ 

- Written in L3 DSL [Fox, ITP'12]
- Automatic translation to HOL 4
- Model step function
- Step per-instruction behaviour of step fn
- Prog separation-logic-like triples

Main purpose: program verification Not modelled: target memory, exceptions, self-modifying code, ...

⋆ Includes simple timing model

```
L3
                                 \downarrow
                   (automated)
                                \mathsf{Model} \to \mathsf{Step} \to \mathsf{Prog}
instruction DecodeARM(w::word) = {
. . .
      case '1010 : imm24' =>
         if Take (cond, true) then
         { imm32 = SignExtend (imm24 : '00');
             Branch (BranchTarget (imm32))
         }
         else
             Skip ()
. . .
define Branch > BranchTarget
   ( imm32 :: bits(32) )
   =
   BranchWritePC (PC + imm32)
```

. . .

(if

```
L3
                                            \downarrow
                          (automated)
                                            \mathsf{Model} \to \mathsf{Step} \to \mathsf{Prog}
DecodeThumb2 h =
 else if \neg b'14 \land \neg b'12 then
```

FST (ConditionPassed (v2w [b'25; b'24; b'23; b'22]) state) then

```
Branch (BranchTarget
```

```
(sw2sw
        (v2w [b'26] @@ v2w [b'11] @@ v2w [b'13] @@
         v2w [b'21; b'20; b'19; b'18; b'17; b'16] @@
         v2w [b'10; b'9; b'8; b'7; b'6; b'5; b'4; b'3;
            b'2: b'1: b'0] @@ Ow)))
else NoOperation (),state)
```





 $\begin{array}{c} \mathsf{L3} \\ (\textit{automated}) & \downarrow \\ \mathsf{Model} \to \mathsf{Step} \to \mathsf{Prog} \end{array}$ 

We choose to work with Step

- Would need to do equivalent work anyway
  - To specialise to specific instructions
  - To isolate preconditions
- Prog requires up-front decisions about separation

Small danger to validity:

Prog may not use Step in the same way as us

Several distinct stages:

- 1. Instruction sequence generation
- 2. Combining step theorems
- 3. Constructing suitable pre-state
- 4. Instantiate theorem to get prediction
- 5. Run sequence on hardware and compare

#### Instruction sequence generation

Want

- to pick randomly
- but bias selection of instructions, registers, values
- Could reuse L3's knowledge of instructions, but
  - Small instruction set, so
  - opportunity to cross-check

#### Instruction sequence generation

Data structure for instruction formats

```
datatype instr_format =
   Lit of int list
| Reg3
| Reg4NotPC
| ...
val instrs = [
   (1, ([Lit [0,0,0,1,1,1,0], Imm 3, Reg3, Reg3], "ADD (imm) T1")),
   (14,([Lit [1,1,0,1], Cond, Imm 8], "B T1")),
   (1, ([Lit [0,1,0,0,0,1,1,1,1], Reg4NotPC, Lit [0,0,0]], "BLX")),
```

#### Instruction sequence generation

Data structure for instruction formats

```
datatype instr_format =
   Lit of int list
| Reg3
| Reg4NotPC
| ...
val instrs = [
   (1, ([Lit [0,0,0,1,1,1,0], Imm 3, Reg3, Reg3], "ADD (imm) T1")),
   (14,([Lit [1,1,0,1], Cond, Imm 8], "B T1")),
   (1, ([Lit [0,1,0,0,0,1,1,1,1], Reg4NotPC, Lit [0,0,0]], "BLX")),
```

Sanity checks:

- Supported instructions have Prog triples
- Unsupported ones don't

LDRSB was missing from Step!

#### Combining step theorems

. . .

Get Step theorem for each instruction

- randomly picking whether to take a conditional branch
- ... |- NextStateMO s = SOME (s with |< ... >|)
  ... |- NextStateMO s = SOME (s with |< ... >|)

#### Combining step theorems

Get Step theorem for each instruction

randomly picking whether to take a conditional branch

```
... |- NextStateMO s = SOME (s with |< ... >|)
... |- NextStateMO s = SOME (s with |< ... >|)
```

Progressively instantiate each s with previous step theorem.

Simplify as we go

. . .

- Record symbolic memory accesses, instruction locations
  - otherwise we forget about accesses whose value is discarded

```
...
instr_start 1 = s.REG RName_PC + 2w,
memory_address 0 = s.REG RName_PC + 8w,
s.MEM (s.REG RName_PC) = v2w [F; F; T; F; T; F; T; F],
...
|- NStatesMO 5 s =
    s with <| ... |>
```

Finding a pre-state — requirements

Model

- only tells us about successful executions
- gives preconditions
- doesn't cover everything

We need more:

- Memory accesses in range
  - must hit 8kB memory in 4GB address space
- no self-modification
- add test harness (BKPT instruction)
- align stack pointer registers

Add these to model's preconditions as HOL terms

#### Constraints may be complex

| 0: | 5e88 | ldrsh | r0, [r1, r2] | ; load r0 from r1+r2 (16 bits) |
|----|------|-------|--------------|--------------------------------|
| 2: | 4090 | lsl   | r0, r0, r2   | ; shift r0 left by r2          |
| 4: | 1880 | add   | r0, r0, r2   | ; add r2 to r0                 |
| 6: | 6803 | ldr   | r3, [r0, #0] | ; load r2 from r0              |

Constraints involving bitvector adds, shifts, sign extension, repeated variables and inequalities.

#### Constraints may be complex

| 0: | 5e88 | ldrsh | r0, [r1, r2] | ; load r0 from r1+r2 (16 bits) |
|----|------|-------|--------------|--------------------------------|
| 2: | 4090 | lsl   | r0, r0, r2   | ; shift r0 left by r2          |
| 4: | 1880 | add   | r0, r0, r2   | ; add r2 to r0                 |
| 6: | 6803 | ldr   | r3, [r0, #0] | ; load r2 from r0              |

Constraints involving bitvector adds, shifts, sign extension, repeated variables and inequalities.

- Requirements fit SMT solving well
- Existing HolSmtLib targets Yices, Z3
  - but for proving (via negation)
  - adapted Yices part for constraint solving

Constraint solving with SMT appears to be unusual for interactive theorem proving.

The adapted HolSmtLib will translate subset of HOL into Yices format.

- Need to fit preconditions into HOL subset
- Sound rewriting of unsupported definitions Alignment, shifts, add\_with\_carry
- 2. ensure supported form of bitvectors is used
- 3. some mixed bitvector/nat operations unsupported rewrite away, or implement limited version (e.g., 8-bit)

The adapted HolSmtLib will translate subset of HOL into Yices format.

- Need to fit preconditions into HOL subset
- Sound rewriting of unsupported definitions Alignment, shifts, add\_with\_carry
- 2. ensure supported form of bitvectors is used
- some mixed bitvector/nat operations unsupported rewrite away, or implement limited version (e.g., 8-bit)

Discovering what to do isn't easy:

- Tried preconditions for every instruction type to detect all unsupported
- Have to be careful not to undo rewrites when simplifying

#### Instantiate theorem to get prediction

Translating the SMT results into HOL terms gives us a partial state

- Fill in the blanks with random choices
- Instantiating the theorem derived earlier should
  - Discharge all hypotheses
  - predict final state

#### Instantiate theorem to get prediction

Translating the SMT results into HOL terms gives us a partial state

- Fill in the blanks with random choices
- Instantiating the theorem derived earlier should
  - Discharge all hypotheses
  - predict final state

HOL isn't entirely happy with a list of 8192 8-bit bitvectors. Careful handling required.

#### Run sequence on hardware and compare

HOL state extract memory, registers, flags IPC OpenOCD debugger driver USB STMF0-Discovery board USB OpenOCD debugger driver **IPC** Final state

Check memory, registers, flags and onboard SysTick timer.

#### Run sequence on hardware and compare

HOL state extract memory, registers, flags IPC OpenOCD debugger driver USB STMF0-Discovery board USB OpenOCD debugger driver **IPC** Final state

Check memory, registers, flags and onboard SysTick timer.

If processor goes off-sequence, end up in Fault state with huge time.

# Scaling it up

Add logging:

- what did we run
- what happened
- enough to reproduce each case exactly

Categorise by outcome:

- Impossible sequence (e.g., branching opposite ways on a flag)
- ► No suitable pre-state exists (e.g., SMT returned UNSAT)
- Unable to find pre-state (SMT returned UNKNOWN)
- The testing code threw an exception
- 'Proper' failure post-state did not match prediction
- Success

# Scaling it up

Add logging:

- what did we run
- what happened
- enough to reproduce each case exactly

Categorise by outcome:

- Impossible sequence (e.g., branching opposite ways on a flag)
- ► No suitable pre-state exists (e.g., SMT returned UNSAT)
- Unable to find pre-state (SMT returned UNKNOWN)
- The testing code threw an exception
- 'Proper' failure post-state did not match prediction
- Success

Future: gather statistics on coverage.

Surprised SMT solver isn't returning UNKNOWN in practice.

Surprised SMT solver isn't returning UNKNOWN in practice.

Some bugs:

- 1. Missing LDRSB in Step (plus minor issues)
- 2. Inverted check for BX in Model

Both would be found by single-instruction testing.

Surprised SMT solver isn't returning UNKNOWN in practice.

Some bugs:

- 1. Missing LDRSB in Step (plus minor issues)
- 2. Inverted check for BX in Model

Both would be found by single-instruction testing.

Some sequences take one cycle too long:

- add pc, r0 at end
- Some mixtures of branch and memory operation

Surprised SMT solver isn't returning UNKNOWN in practice.

Some bugs:

- 1. Missing LDRSB in Step (plus minor issues)
- 2. Inverted check for BX in Model

Both would be found by single-instruction testing.

Some sequences take one cycle too long:

- add pc, r0 at end
- Some mixtures of branch and memory operation

Appear to be hitting some subtle PolyML GC / HOL incompatibility..!

#### Future

Near future:

- Investigate timing anomalies
  - Different timing harnesses, add padding, ...
- Longer runs
- Longer instruction sequences
- Investigate coverage

#### Future

Near future:

- Investigate timing anomalies
  - Different timing harnesses, add padding, ....
- Longer runs
- Longer instruction sequences
- Investigate coverage

Further ahead:

- New REMS DSL for processor models, SAIL
  - Integrate testing?
  - Reproducible test cases for weak memory models

How much of this can we commoditise?

#### Future

Near future:

- Investigate timing anomalies
  - Different timing harnesses, add padding, ....
- Longer runs
- Longer instruction sequences
- Investigate coverage

Further ahead:

- New REMS DSL for processor models, SAIL
  - Integrate testing?
  - Reproducible test cases for weak memory models

How much of this can we commoditise?

Other opportunities:

- Test bigger processors with proper WCET analyses?
- Can we choose pre-states more randomly?

#### Conclusion

- 1. Took a HOL processor model
- 2. Test sequences of instructions for functional and timing bugs
- 3. Used SMT solving to ensure successful executions
- 4. Preliminary signs of success

Main technical difficulty:

- Getting preconditions into SMT friendly form.
- \* Formal system makes doing this soundly easier

Sort out timing anomalies

 $\Rightarrow$  sound basis for cost-preserving decompilation