As of August 2020 the site you are on (wiki.newae.com) is deprecated, and content is now at rtfm.newae.com.

Difference between revisions of "Tutorial B2 Viewing Instruction Power Differences"

 
(7 intermediate revisions by 2 users not shown)
Line 1: Line 1:
{{Warningbox|This tutorial has been updated for ChipWhisperer 4.0.0 release. If you are using 3.x.x see the "V3" link in the sidebar.}}
+
{{Warningbox|This tutorial has been updated for ChipWhisperer 5 release. If you are using 4.x.x or 3.x.x see the "V4" or "V3" link in the sidebar.}}
  
 
{{Infobox tutorial
 
{{Infobox tutorial
Line 13: Line 13:
 
}}
 
}}
  
This tutorial will introduce you to measuring the power consumption of a device under attack. It will demonstrate how you can view the difference between assembly instructions.
+
<!-- To edit this, edit Template:Tutorial_boilerplate -->
== Prerequisites ==
+
{{Tutorial boilerplate}}
  
You should have already completed [[Tutorial_B1_Building_a_SimpleSerial_Project]]. This tutorial assumes you are capable of building firmware for the target, programming the code, and connecting to the ChipWhisperer.
+
* Jupyter file: '''PA_Intro_2-Instruction_Differences.ipynb'''
  
== Setting Up the Example ==
 
  
In this tutorial, we will once again be working from the <code>simpleserial-base</code> firmware. As with the previous tutorial, you'll need a copy of the firmware you want to modify, and you'll need to be able to build it for your platform. The instructions are repeated in the drop-down menus below, but if you're comfortable with the previous example, feel free to skip them and just build your new firmware. Alternatively, if you're not too attached to your code, you can modify your firmware from [[Tutorial B1 Building a SimpleSerial Project]] and rebuild it from the same directory.
+
== XMEGA Target ==
{{CollapsibleSection
+
|intro = === Building for CWLite with XMEGA Target ===
+
|content= Building for XMEGA}}
+
  
{{CollapsibleSection
+
See the following for using:
|intro = === Building for CWLite with Arm Target ===
+
* ChipWhisperer-Lite Classic (XMEGA)
|content= Building for Arm}}
+
* ChipWhisperer-Lite Capture + XMEGA Target on UFO Board (including NAE-SCAPACK-L1/L2 users)
 +
* ChipWhisperer-Pro + XMEGA Target on UFO Board
  
{{CollapsibleSection
+
https://chipwhisperer.readthedocs.io/en/latest/tutorials/pa_intro_2-openadc-cwlitexmega.html#tutorial-pa-intro-2-openadc-cwlitexmega
|intro = === Building for Other Targets ===
+
|content= Building for Other Targets}}
+
  
<h2> Modifying the Basic Example </h2>
+
== ChipWhisperer-Lite ARM / STM32F3 Target ==
<ol style="list-style-type: decimal;">
+
<li><p>At this point we want to modify the system to perform a number of operations. We won't actually use the input data. To do so, open the file <code>simpleserial-base.c</code> with a text editor such as Programmer's Notepad (which ships with WinAVR).</p></li>
+
<li><p>Find the following code block towards the end of the file, which may look different if you just completed [[Tutorial_B1_Building_a_SimpleSerial_Project]].</p>
+
<source lang="c">/**********************************
+
* Start user-specific code here. */
+
trigger_high();
+
  
//16 hex bytes held in 'pt' were sent
+
See the following for using:
//from the computer. Store your response
+
* ChipWhisperer-Lite 32-bit (STM32F3 Target)
//back into 'pt', which will send 16 bytes
+
* ChipWhisperer-Lite Capture + STM32F3 Target on UFO Board (including NAE-SCAPACK-L1/L2 users)
//back to computer. Can ignore of course if
+
* ChipWhisperer-Pro + STM32F3 Target on UFO Board
//not needed
+
  
trigger_low();
+
https://chipwhisperer.readthedocs.io/en/latest/tutorials/pa_intro_2-openadc-cwlitearm.html#tutorial-pa-intro-2-openadc-cwlitearm
/* End user-specific code here. *
+
********************************/</source></li>
+
<li><p>Modify it to do some work with no-ops and multiplication instructions:</p>
+
<source lang="c">/**********************************
+
* Start user-specific code here. */
+
trigger_high();
+
  
//16 hex bytes held in 'pt' were sent
+
== ChipWhisperer Nano Target ==
//from the computer. Store your response
+
//back into 'pt', which will send 16 bytes
+
//back to computer. Can ignore of course if
+
//not needed
+
  
asm volatile(
+
See the following for using:
"nop"      "\n\t"
+
* ChipWhisperer-Nano
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
::
+
);
+
  
asm volatile(
+
https://chipwhisperer.readthedocs.io/en/latest/tutorials/pa_intro_2-cwnano-cwnano.html#tutorial-pa-intro-2-cwnano-cwnano
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"         
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
::
+
);
+
 
+
trigger_low();
+
/* End user-specific code here. *
+
********************************/</source></li>
+
<li><p>Change the terminal to the directory with your source, and run the same <code>make</code> command you did earlier to build the firmware. Remember you can press the up arrow on the keyboard to get recently typed commands in most OSes:</p></ol>
+
 
+
== Hardware Setup ==
+
The hardware setup is the same as in [[Tutorial B1 Building a SimpleSerial Project|Tutorial B1 Building a SimpleSerial Project.]] The setup is repeated in the drop down menus below, but if you've already done that, skip to the next section.
+
{{CollapsibleSection
+
|intro = === CW1173 (Lite) Hardware Setup ===
+
|content= CWLite HW Setup}}
+
 
+
{{CollapsibleSection
+
|intro = === CW1200 (Pro) Hardware Setup ===
+
|content= CW1200 HW Setup}}
+
 
+
{{CollapsibleSection
+
|intro = === CW308 (UFO) Hardware Setup ===
+
|content= CW308 HW Setup}}
+
 
+
== Programming the Target ==
+
Programming the target is the same as in previous tutorials. The steps are repeated in the drop down menus below.
+
 
+
{{CollapsibleSection
+
|intro = === Programming the XMEGA Target ===
+
|content= Programming XMEGA}}
+
 
+
{{CollapsibleSection
+
|intro = === Programming the STM32F3 (CW303 Arm) Target ===
+
|content= Programming Arm}}
+
 
+
{{CollapsibleSection
+
|intro = === Programming Other Targets ===
+
|content= Programming Other}}
+
 
+
 
+
== Capturing Power Traces ==
+
 
+
The basic steps to connect to the ChipWhisperer device are described in [[Tutorial_B1_Building_a_SimpleSerial_Project]]. They are repeated here as well; however, see [[Tutorial_B1_Building_a_SimpleSerial_Project]] for pictures &amp; more details.
+
 
+
<ol style="list-style-type: decimal;">
+
<li>Start ChipWhisperer-Capture</li>
+
<li>Under the ''Python Console'' tab, find the ''connect_cwlite_simpleserial.py'' script and double-click.</li>
+
<li>Check there are no errors on the connection.</li>
+
<li>Under the ''Python Console'' tab, find the relevant setup script for your target (such as setup_cwlite_xmega.py) and double-click.</li>
+
<li>Both the Target &amp; Scope should switch to ''CON'' and be green circles.</li>
+
<li>Open the status monitor (<i>Tools > Encryption Status Monitor</i>).</li>
+
<li>Hit the ''Run 1'' [[File:Capture One Button.PNG|image]] button. You may have to hit it a few times, as the very first serial data is often lost. You should see data populate in the ''Text Out'' field of the monitor window. The ''Text In'' and ''Text Out'' aren't actually used in this example, so you can close the ''Monitor'' dialog.</li>
+
 
+
At this point you've covered the same ground as the previous tutorial. The following section describes how to set up the analog capture hardware, which is new. All of the following is done in the ''Scope Settings'' tab:
+
 
+
[[File:04_ADC_Clock_2_1.png|image]]</ol>
+
 
+
<ol start="8" style="list-style-type: decimal;">
+
<li>The ''ADC Freq'' should show 4x the clock speed of your device (typically 29.5MHz), and the ''DCM Locked'' checkbox __MUST__ be checked. If the ''DCM Locked'' checkbox is NOT checked, try hitting the ''Reset ADC DCM'' button again.</li>
+
<li><p>At this point you can hit the ''Capture 1'' button, and see if the system works! You should end up with a window looking like this:</p>
+
<p>[[File:05_Low_Gain.PNG|image|1250px]]</p>
+
<p>Whilst there is a waveform, you need to adjust the capture settings. There are two main settings of importance, the analog gain and number of samples to capture.</p></li>
+
 
+
[[File:06_high_gain.PNG|image|1250px]]</ol>
+
 
+
<ol start="16" style="list-style-type: decimal;">
+
<li>Under ''Gain Setting'' set the ''Mode'' to ''high''. Increase the ''Gain Setting'' to about 25. You'll be able to adjust this further during experimentation; you may need to increase this depending on your hardware and target device.</li>
+
<li>Under ''Trigger Setup'' set the ''Total Samples'' to ''500''.</li>
+
<li>Try a few more ''Capture 1'' traces, and you should see a 'zoomed-in' waveform.</li></ol>
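The ''ADC Freq'' check in the steps above is a simple multiplication, which can be sketched in Python (the language of the ChipWhisperer 5 notebooks). The function names, the 7.37 MHz device clock, and the 1% tolerance are illustrative assumptions, not part of the ChipWhisperer API; only the 4x ratio and the "typically 29.5MHz" figure come from the text.

```python
def expected_adc_freq(device_clock_hz, multiplier=4):
    """ADC sample rate expected when the ADC runs at `multiplier` x the device clock."""
    return device_clock_hz * multiplier


def adc_freq_ok(measured_hz, device_clock_hz, multiplier=4, tolerance=0.01):
    """True if the measured ADC frequency is within `tolerance` (fractional) of the expected value."""
    expected = expected_adc_freq(device_clock_hz, multiplier)
    return abs(measured_hz - expected) <= tolerance * expected


# Assumed device clock of 7.37 MHz; 29.5 MHz is the typical reading quoted above.
DEVICE_CLOCK_HZ = 7.37e6
print(expected_adc_freq(DEVICE_CLOCK_HZ) / 1e6, "MHz expected")
print("29.5 MHz plausible:", adc_freq_ok(29.5e6, DEVICE_CLOCK_HZ))
```

If the reading is far from the expected value, the ADC clock is probably not locked to the device clock, which is what the ''DCM Locked'' checkbox indicates.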
+
 
+
== Modifying the Target ==
+
 
+
=== Background on Setup (XMEGA) ===
+
 
+
The rest of this tutorial will focus on the ATxmega128D4 (the CW303 XMEGA target), since correlating instructions to power consumption is typically simpler on it. We are comparing the power consumption of two different instructions, the <code>MUL</code> (multiply) instruction and the <code>NOP</code> (no operation) instruction. Some information on these two instructions:
+
 
+
; mul
+
* Multiplies two 8-bit numbers together.
+
* Takes 2 clock cycles to complete
+
* Intuitively expect fairly large power consumption due to complexity of operation required
+
; nop
+
* Does nothing
+
* Takes 1 clock cycle to complete
+
* Intuitively expect low power consumption due to core doing nothing
+
 
+
Note that the capture clock is running at 4x the device clock. Thus a single <code>mul</code> instruction should span 8 samples on our output graph, since it takes 4 samples to cover a complete clock cycle.
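The sample arithmetic above can be sketched in a few lines of Python. The function name and its defaults are hypothetical helpers for illustration only; the cycle counts and the 4x capture clock are the ones stated in the text.

```python
def samples_per_instruction(cycles, samples_per_clock=4):
    """Number of ADC samples a single instruction spans on the captured trace."""
    if cycles < 1 or samples_per_clock < 1:
        raise ValueError("cycles and samples_per_clock must be positive")
    return cycles * samples_per_clock


# mul takes 2 clock cycles, nop takes 1 (XMEGA figures from the lists above).
print("mul:", samples_per_instruction(2), "samples")
print("nop:", samples_per_instruction(1), "samples")
```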
+
 
+
==== Initial Code ====
+
 
+
The initial code has a power signature something like this (yours will vary based on various physical considerations, and depending on whether you are using an XMEGA or AVR device):
+
 
+
[[File:cap_nop_mul.png|image]]
+
 
+
Note that the 10 <code>mul</code> instructions would be expected to take 80 samples to complete (10 instructions x 2 cycles x 4 samples per cycle), and the 10 <code>nop</code> instructions should take 40 samples to complete. By modifying the code we can determine exactly which portion of the trace corresponds to which operation.
+
 
+
==== Increase number of NOPs ====
+
 
+
We will then modify the code to have twenty NOP operations in a row instead of ten. The modified code looks like this:
+
 
+
<blockquote><source lang="c">/**********************************
+
* Start user-specific code here. */
+
trigger_high();
+
 
+
asm volatile(
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
::
+
);
+
 
+
asm volatile(
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
::
+
);
+
 
+
asm volatile(
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"         
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
::
+
);
+
 
+
trigger_low();
+
/* End user-specific code here. *
+
********************************/</source></blockquote>
+
Note that the <code>mul</code> operation takes 2 clock cycles on the AVR, and the <code>nop</code> operation takes 1 clock cycle. Twenty <code>nop</code> instructions and ten <code>mul</code> instructions therefore each take 20 clock cycles, so we expect to see two areas of the power trace that take approximately the same time. The resulting power trace looks like this:
+
 
+
[[File:cap_doublenop_mul.png|image]]
+
 
+
Pay particular attention to the section between sample number 0 &amp; sample number 180. It is in this section we can compare the two power graphs to see the modified code. We can actually 'see' the change in operation of the device! It would appear the <code>nop</code> is occurring from approximately sample 10 to 90, and the <code>mul</code> from approximately sample 90 to 170.
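The sample ranges read off the trace can be cross-checked against the instruction counts. The following is a minimal Python sketch; the helper name is hypothetical, and the starting offset of 10 samples is simply the approximate value observed on the trace above, not a fixed property of the hardware.

```python
def expected_windows(blocks, start_sample=0, samples_per_clock=4):
    """Return (name, first_sample, end_sample) for consecutive instruction blocks.

    blocks is a list of (name, instruction_count, cycles_per_instruction).
    """
    windows = []
    position = start_sample
    for name, count, cycles in blocks:
        length = count * cycles * samples_per_clock
        windows.append((name, position, position + length))
        position += length
    return windows


# 20 nops (1 cycle each) followed by 10 muls (2 cycles each), starting
# roughly 10 samples after the trigger (assumed from the trace).
windows = expected_windows([("nop", 20, 1), ("mul", 10, 2)], start_sample=10)
for name, begin, end in windows:
    print(f"{name}: samples {begin}-{end}")
```

Both blocks come out 80 samples long, which agrees with the two regions of similar duration visible in the trace.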
+
 
+
==== Add NOP loop after MUL ====
+
 
+
Finally, we will add 10 more NOPs after the 10 MULs. The code should look something like this:
+
 
+
<blockquote><source lang="c">/**********************************
+
* Start user-specific code here. */
+
trigger_high();
+
 
+
asm volatile(
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
::
+
);
+
 
+
asm volatile(
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
::
+
);
+
 
+
asm volatile(
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"         
+
"mul r0,r1" "\n\t"
+
"mul r0,r1" "\n\t"
+
::
+
);
+
 
+
asm volatile(
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
"nop"      "\n\t"
+
::
+
);
+
 
+
trigger_low();
+
/* End user-specific code here. *
+
********************************/</source></blockquote>
+
With an output graph that looks like this:
+
 
+
<blockquote>[[File:cap_doublenop_mul_nop.png|image]]
+
</blockquote>
+
 
+
==== Comparison of All Three ====
+
 
+
The following graph lines the three options up. One can see where adding loops of different operations shows up in the power consumption.
+
 
+
<blockquote>[[File:nop_mul_comparison.png|image]]
+
</blockquote>
+
 
+
=== Background on Setup (Arm) ===
+
For the rest of this tutorial, we'll be focusing on the STM32F3, which is the microcontroller on the CW303 Arm target (though other targets should demonstrate the same principles). Since the STM32F3 is an Arm Cortex M4 device, we'll need to refer to the [http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0553a/CHDJJGFB.html Cortex M4 Instruction Set] and the [http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0439b/CHDDIGAC.html Cortex M4 Instruction Set Summary].
+
 
+
The first thing we'll do is replace the <code>nop</code> instructions, since from its documentation page we can see the processor may not execute them. Instead, let's add some <code>add.w</code> instructions (the 32-bit-wide version of the add instruction). We do this because the <code>mul</code> instruction is always 32 bits wide, and a 16-bit Thumb instruction has a different power profile than a 32-bit instruction. From the earlier links, we can see that both add and mul take 1 cycle each to complete.
+
 
+
Now we should have 10 <code>add.w</code> instructions and 10 <code>mul</code> instructions:<syntaxhighlight lang="c">
+
trigger_high();
+
 
+
 
+
asm volatile(
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
::
+
);
+
 
+
asm volatile(
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
::
+
);
+
 
+
trigger_low();
+
</syntaxhighlight>Now hit the ''Run 1'' [[File:Capture One Button.PNG|image]] button and capture a single trace. You should now have something that looks like this:
+
 
+
[[File:B2 STM Addmul.PNG|frameless|1374x1374px]]
+
 
+
We can see the <code>add.w</code> and <code>mul</code> instructions near the beginning, starting about 10 samples in and ending about 90 samples in. There's not really any difference that we can see between the two, but we can see that they take up about 80 samples (20 microcontroller clock cycles), as we expect.
+
 
+
Next, let's insert some <code>udiv</code> instructions. From the Cortex M4 Instruction Set Summary, we can see that  <code>udiv</code> (unsigned divide) instructions take between 2 and 12 cycles to complete (effectively depending on how big the numbers we're dividing are). We'll be dividing <code>r0</code> by <code>r0</code>, meaning we expect that every instruction after the first should take 2 cycles. It should have higher power consumption too, since dividing is typically a fairly complex operation:<syntaxhighlight lang="c">
+
trigger_high();
+
 
+
asm volatile(
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
"add.w r0, r0"      "\n\t"
+
::
+
);
+
asm volatile(
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
"mul r0, r0"      "\n\t"
+
::
+
);
+
 
+
asm volatile(
+
"udiv r0, r0"      "\n\t"
+
"udiv r0, r0"      "\n\t"
+
"udiv r0, r0"      "\n\t"
+
"udiv r0, r0"      "\n\t"
+
"udiv r0, r0"      "\n\t"
+
"udiv r0, r0"      "\n\t"
+
"udiv r0, r0"      "\n\t"
+
"udiv r0, r0"      "\n\t"
+
"udiv r0, r0"      "\n\t"
+
"udiv r0, r0"      "\n\t"
+
::
+
);
+
 
+
trigger_low();
+
</syntaxhighlight>Capture another trace and you should get something like:
+
 
+
[[File:B2 STM Addmuldiv.PNG|frameless|1377x1377px]]
+
 
+
As we expected, we can see periods of high power consumption measuring about 80 samples in total right after the <code>add.w</code> and <code>mul</code> instructions. Interestingly, the <code>udiv</code> instructions seem to be split into 2 sets of operations. As a final check, we can add some more <code>mul</code> instructions and see the <code>udiv</code> instructions move down (and also break into more sections):
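Because <code>udiv</code> has a variable cycle count, it can help to bound the expected length of the block before looking at the trace. This is a small Python sketch under the figures quoted in this section (2 to 12 cycles per <code>udiv</code>, 4 samples per clock); the helper name is hypothetical.

```python
def sample_bounds(count, min_cycles, max_cycles, samples_per_clock=4):
    """Shortest and longest span, in ADC samples, of `count` identical instructions."""
    shortest = count * min_cycles * samples_per_clock
    longest = count * max_cycles * samples_per_clock
    return shortest, longest


# Ten udiv instructions, each taking between 2 and 12 cycles.
low, high = sample_bounds(10, 2, 12)
print(f"10 udiv instructions: between {low} and {high} samples")
```

The roughly 80 samples observed sit at the lower bound, which is consistent with each division taking about 2 cycles.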
+
 
+
[[File:B2 STM Addmulmuldiv.PNG|frameless|1365x1365px]]
+
 
+
== Clock Phase Adjustment ==
+
 
+
A final area of interest is the clock phase adjustment. The clock phase adjustment is used to shift the ADC sample clock from the actual device clock by small amounts. This will affect the appearance of the captured waveform, and in more advanced methods is used to improve the measurement.
+
 
+
The phase adjustment is found under the ''Phase Adjust'' option of the ''ADC Clock'' setting:
+
 
+
<blockquote>[[File:phasesetting.png|image]]
+
</blockquote>
+
To see the effect this has, first consider an image of the power measured by a regular oscilloscope (at 1.25GS/s):
+
 
+
<blockquote>[[File:scope_real.png|image]]
+
</blockquote>
+
And the resulting waveforms for a variety of different phase shift settings:
+
 
+
[[File:phase_differences.png|image]]
+
 
+
The specifics of the capture are highly dependent on each ChipWhisperer board &amp; target platform. The phase shift allows customization of the capture waveform for optimum performance; however, what constitutes 'optimum performance' is highly dependent on the specifics of your algorithm.
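To build some intuition for why the phase setting changes the captured waveform, the following Python sketch samples a purely synthetic power waveform at 4 samples per clock with different phase offsets. The waveform shape (a spike at each clock edge that decays within the cycle) and all names are assumptions made for illustration; this models neither the real target nor the ChipWhisperer hardware.

```python
import math


def synthetic_power(t):
    """Toy power waveform: a spike at each clock edge decaying over the cycle.

    t is time measured in device clock periods.
    """
    return math.exp(-8.0 * (t % 1.0))


def sample_trace(num_clocks, phase=0.0, samples_per_clock=4):
    """Sample the toy waveform; phase is a fraction of one ADC sample period."""
    step = 1.0 / samples_per_clock
    total = num_clocks * samples_per_clock
    return [synthetic_power((i + phase) * step) for i in range(total)]


for phase in (0.0, 0.25, 0.5):
    trace = sample_trace(2, phase)
    print(f"phase {phase}: peak {max(trace):.3f}")
```

With zero phase offset the samples land exactly on the spikes; shifting the sampling point makes the same underlying signal look smaller and differently shaped, which is the effect visible in the phase comparison image.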
+
 
+
== Conclusion ==
+
 
+
In this tutorial you have learned how power analysis can tell you the operations being performed on a microcontroller. In future work we will move towards using this for breaking various forms of security on devices. In particular, [[Tutorial B3-1 Timing Analysis with Power for Password Bypass]] will examine how we can use this information to exploit a password check.
+
 
+
== Links ==
+
 
+
{{Template:Tutorials}}
+
[[Category:Tutorials]]
+

Latest revision as of 04:19, 29 July 2019

This tutorial has been updated for ChipWhisperer 5 release. If you are using 4.x.x or 3.x.x see the "V4" or "V3" link in the sidebar.

B2: Viewing Instruction Power Differences
Target Architecture XMEGA/Arm/Other
Hardware Crypto No
Software Release V3 / V4 / V5

This tutorial will introduce you to measuring the power consumption of a device under attack. It will demonstrate how you can view the difference between assembly instructions. In ChipWhisperer 5 Release, the software documentation is now held outside the wiki. See links below.

To see background on the tutorials see the Tutorial Introduction on ReadTheDocs, which explains what the links below mean. These wiki pages (that you are reading right now) only hold the hardware setup required, and you have to run the Tutorial via the Jupyter notebook itself. The links below take you to the expected Jupyter output from each tutorial, so you can compare your results to the expected/known-good results.

Running the tutorial uses the referenced Jupyter notebook file.

  • Jupyter file: PA_Intro_2-Instruction_Differences.ipynb


XMEGA Target

See the following for using:

  • ChipWhisperer-Lite Classic (XMEGA)
  • ChipWhisperer-Lite Capture + XMEGA Target on UFO Board (including NAE-SCAPACK-L1/L2 users)
  • ChipWhisperer-Pro + XMEGA Target on UFO Board

https://chipwhisperer.readthedocs.io/en/latest/tutorials/pa_intro_2-openadc-cwlitexmega.html#tutorial-pa-intro-2-openadc-cwlitexmega

ChipWhisperer-Lite ARM / STM32F3 Target

See the following for using:

  • ChipWhisperer-Lite 32-bit (STM32F3 Target)
  • ChipWhisperer-Lite Capture + STM32F3 Target on UFO Board (including NAE-SCAPACK-L1/L2 users)
  • ChipWhisperer-Pro + STM32F3 Target on UFO Board

https://chipwhisperer.readthedocs.io/en/latest/tutorials/pa_intro_2-openadc-cwlitearm.html#tutorial-pa-intro-2-openadc-cwlitearm