As of August 2020 the site you are on (wiki.newae.com) is deprecated, and content is now at rtfm.newae.com.

Tutorial B2 Viewing Instruction Power Differences

2,323 bytes added, 14:44, 9 October 2018
For the rest of this tutorial, we'll be focusing on the STM32F3, which is the microcontroller on the CW303 Arm target (though other targets should demonstrate the same principles). Since the STM32F3 is an Arm Cortex M4 device, we'll need to refer to the Cortex M4 Instruction Set and the Cortex M4 Instruction Set Summary.
The first thing we'll do is replace the <code>nop</code> instructions, since from its documentation page we can see the processor may not execute them. Instead, let's add some <code>add.w</code> instructions (<code>add.w</code> is the 32-bit-wide version of the add instruction). We're doing this because the <code>mul</code> instruction is always 32 bits wide, and a 16-bit Thumb instruction has a different power profile than a 32-bit one. From the earlier links, we can see that both <code>add</code> and <code>mul</code> take 1 cycle each to complete.
Now we should have 10 <code>add.w</code> instructions and 10 <code>mul</code> instructions:<syntaxhighlight lang="c">
trigger_high();
asm volatile(
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
::
);
asm volatile(
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
::
);
trigger_low();
</syntaxhighlight>Build and program the new firmware onto the device. Now hit the ''Run 1'' [[File:Capture One Button.PNG|image]] button and capture a single trace. You should now have something that looks like this:
[[File:B2 STM Addmul.PNG|frameless|1374x1374px]]
If you need a closer look, click and drag over the area you want to zoom into; right click > View All returns to the original view.
We can see the <code>add.w</code> and <code>mul</code> instructions near the beginning, starting about 10 samples in and ending about 90 samples in. There's not really any difference that we can see between the two, but we can see that they take up about 80 samples (20 microcontroller clock cycles) as we expect.
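Writing the same instruction ten times by hand is error-prone, so the repetition can also be generated by the preprocessor. The following is a minimal sketch, not part of the tutorial firmware: the macro names <code>REP2</code>, <code>REP5</code>, <code>REP10</code>, <code>ADD_BLOCK</code> and <code>MUL_BLOCK</code>, and the helper <code>count_occurrences</code>, are hypothetical and exist only for illustration. It relies on the compiler concatenating adjacent string literals, which the tutorial's own code already depends on.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper macros (not from the ChipWhisperer sources).
 * Adjacent string literals are concatenated by the compiler, so
 * repeating a literal repeats the instruction in the asm template. */
#define REP2(s)  s s
#define REP5(s)  s s s s s
#define REP10(s) REP5(REP2(s))

/* One instruction per line, terminated with "\n\t" as in the tutorial. */
#define ADD_BLOCK REP10("add.w r0, r0\n\t")
#define MUL_BLOCK REP10("mul r0, r0\n\t")

/* Host-side sanity check: count occurrences of needle in haystack. */
static unsigned count_occurrences(const char *haystack, const char *needle)
{
    unsigned n = 0;
    size_t len = strlen(needle);
    const char *p;

    for (p = strstr(haystack, needle); p != NULL; p = strstr(p + len, needle))
        n++;
    return n;
}

/* On the Arm target the blocks would be used as:
 *     asm volatile(ADD_BLOCK ::);
 *     asm volatile(MUL_BLOCK ::);
 */
```

Changing the number of instructions then means editing a single macro rather than a column of strings, which may make later experiments in this tutorial quicker to set up.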
Next, let's insert some <code>udiv</code> instructions. From the Cortex M4 Instruction Set Summary, we can see that <code>udiv</code> (unsigned divide) instructions take between 2 and 12 cycles to complete (effectively depending on how big the numbers we're dividing are). We'll be dividing <code>r0</code> by <code>r0</code>, meaning we expect that every instruction after the first should take 2 cycles. It should have higher power consumption too, since dividing is typically a fairly complex operation:<syntaxhighlight lang="c">
trigger_high();
asm volatile(
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
"add.w r0, r0" "\n\t"
::
);
asm volatile(
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
"mul r0, r0" "\n\t"
::
);
asm volatile(
"udiv r0, r0" "\n\t"
"udiv r0, r0" "\n\t"
"udiv r0, r0" "\n\t"
"udiv r0, r0" "\n\t"
"udiv r0, r0" "\n\t"
"udiv r0, r0" "\n\t"
"udiv r0, r0" "\n\t"
"udiv r0, r0" "\n\t"
"udiv r0, r0" "\n\t"
"udiv r0, r0" "\n\t"
::
);
trigger_low();
</syntaxhighlight>Capture another trace and you should get something like:
[[File:B2 STM Addmuldiv.PNG|frameless|1377x1377px]]
As we expected, we can see periods of high power consumption measuring about 80 samples in total right after the <code>add.w</code> and <code>mul</code> instructions. Interestingly, the <code>udiv</code> instructions seem to be split into 2 sets of operations. As a final check, we can add some more <code>mul</code> instructions and see the <code>udiv</code> instructions move down (and also break into more sections):
[[File:B2 STM Addmulmuldiv.PNG|frameless|1365x1365px]]
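The sample counts quoted in this section follow from simple arithmetic, which the sketch below reproduces. It assumes 4 scope samples per target clock cycle, a figure inferred from the "80 samples (20 microcontroller clock cycles)" statement rather than read from any capture setting, and the name <code>expected_samples</code> is hypothetical.

```c
#include <assert.h>

/* Assumption: 4 scope samples per target clock cycle, inferred from
 * "80 samples (20 microcontroller clock cycles)" in the text. */
#define SAMPLES_PER_CYCLE 4u

/* Expected width, in samples, of a run of identical instructions. */
static unsigned expected_samples(unsigned instructions, unsigned cycles_each)
{
    return instructions * cycles_each * SAMPLES_PER_CYCLE;
}
```

Under that assumption, <code>expected_samples(20, 1)</code> gives 80 for the twenty single-cycle <code>add.w</code> and <code>mul</code> instructions, and <code>expected_samples(10, 2)</code> gives 80 for ten <code>udiv</code> instructions at their 2-cycle minimum, matching the widths seen in the traces. At the 12-cycle maximum the same ten instructions would span 480 samples.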
== Clock Phase Adjustment ==
== Conclusion ==
In this tutorial you have learned how power analysis can tell you the operations being performed on a microcontroller. In future work we will move towards using this for breaking various forms of security on devices. In particular, [[Tutorial B3-1 Timing Analysis with Power for Password Bypass]] will examine how we can use this information to exploit a password check.
== Links ==