Changes

Tutorial A5-Bonus Breaking AES-256 Bootloader

18,445 bytes removed, 13:36, 29 July 2019
{{Warningbox|This tutorial has been updated for the ChipWhisperer 5 release. If you are using 4.x.x or 3.x.x, see the "V4" or "V3" link in the sidebar.}}

{{Infobox tutorial
|name = A5: Breaking AES-256 Bootloader
|image = 
|caption = 
|software versions =
|capture hardware = CW-Lite, CW-Lite 2-Part, CW-Pro
|Target Device = 
|Target Architecture = XMEGA/Arm
|Hardware Crypto = No
|Purchase Hardware = 
}}

<!-- To edit this tutorial, edit Template:Tutorial_boilerplate -->
{{Tutorial boilerplate}}

* Jupyter file: '''PA_Multi_1-Breaking_AES-256_Bootloader.ipynb'''

== XMEGA Target ==

See the following for using:
* ChipWhisperer-Lite Classic (XMEGA)
* ChipWhisperer-Lite Capture + XMEGA Target on UFO Board (including NAE-SCAPACK-L1/L2 users)
* ChipWhisperer-Pro + XMEGA Target on UFO Board

https://chipwhisperer.readthedocs.io/en/latest/tutorials/pa_multi_1-openadc-cwlitexmega.html#tutorial-pa-multi-1-openadc-cwlitexmega

== ChipWhisperer-Lite ARM / STM32F3 Target ==

See the following for using:
* ChipWhisperer-Lite 32-bit (STM32F3 Target)
* ChipWhisperer-Lite Capture + STM32F3 Target on UFO Board (including NAE-SCAPACK-L1/L2 users)
* ChipWhisperer-Pro + STM32F3 Target on UFO Board

https://chipwhisperer.readthedocs.io/en/latest/tutorials/pa_multi_1-openadc-cwlitearm.html#tutorial-pa-multi-1-openadc-cwlitearm

== ChipWhisperer Nano Target ==

This tutorial is not available for ChipWhisperer Nano.

{{Warningbox|This tutorial is an add-on to [[Tutorial A5 Breaking AES-256 Bootloader]]. It continues working on the same firmware, showing how to obtain the hidden IV and signature in the bootloader. '''It is not possible to do this bonus tutorial without first completing the regular tutorial''', so please finish Tutorial A5 first.}}

''This tutorial is under construction! Check back in a few days.''

= Exploring the Bootloader =
In this tutorial, we have the luxury of seeing the source code of the bootloader. This is generally not something we would have access to in the real world, so we'll try not to use it to cheat. (Peeking at <code>supersecret.h</code> counts as cheating.) Instead, we'll use the source to help us identify important parts of the power traces.

== Bootloader Source Code ==
Inside the bootloader's main loop, it does three tasks that we're interested in:
* it decrypts the incoming ciphertext;
* it applies the IV to the decryption's result; and
* it checks for the signature in the resulting plaintext.

This snippet from <code>bootloader.c</code> shows all three of these tasks:
<pre>
// Continue with decryption
trigger_high();
aes256_decrypt_ecb(&ctx, tmp32);
trigger_low();

// Apply IV (first 16 bytes)
for (i = 0; i < 16; i++){
    tmp32[i] ^= iv[i];
}

//Save IV for next time from original ciphertext
for (i = 0; i < 16; i++){
    iv[i] = tmp32[i+16];
}

// Tell the user that the CRC check was okay
putch(COMM_OK);
putch(COMM_OK);

//Check the signature
if ((tmp32[0] == SIGNATURE1) &&
   (tmp32[1] == SIGNATURE2) &&
   (tmp32[2] == SIGNATURE3) &&
   (tmp32[3] == SIGNATURE4)){

   // Delay to emulate a write to flash memory
   _delay_ms(1);
}
</pre>
This gives us a pretty good idea of how the microcontroller is going to do its job. However, we can go one step further and find the exact assembly code that the target will execute. If you have Atmel Studio and its toolchain on your computer, you can get the assembly file from the command line with
<pre>
avr-objdump -m avr -D bootloader.hex > disassembly.txt
</pre>
This will convert the hex file into assembly code, making it more human-readable. The important part of this assembly code is:
<pre>
 344: d3 01        movw r26, r6
 346: 93 01        movw r18, r6
 348: f6 01        movw r30, r12
 34a: 80 81        ld   r24, Z
 34c: f9 01        movw r30, r18
 34e: 91 91        ld   r25, Z+
 350: 9f 01        movw r18, r30
 352: 89 27        eor  r24, r25
 354: f6 01        movw r30, r12
 356: 81 93        st   Z+, r24
 358: 6f 01        movw r12, r30
 35a: ee 15        cp   r30, r14
 35c: ff 05        cpc  r31, r15
 35e: a1 f7        brne .-24          ; 0x348
 360: fe 01        movw r30, r28
 362: b1 96        adiw r30, 0x21     ; 33
 364: 81 91        ld   r24, Z+
 366: 8d 93        st   X+, r24
 368: e4 15        cp   r30, r4
 36a: f5 05        cpc  r31, r5
 36c: d9 f7        brne .-10          ; 0x364
 36e: 84 ea        ldi  r24, 0xA4     ; 164
 370: 0e 94 16 02  call 0x42c         ; 0x42c
 374: 84 ea        ldi  r24, 0xA4     ; 164
 376: 0e 94 16 02  call 0x42c         ; 0x42c
 37a: 89 89        ldd  r24, Y+17     ; 0x11
 37c: 88 23        and  r24, r24
 37e: 09 f0        breq .+2           ; 0x382
 380: 98 cf        rjmp .-208         ; 0x2b2

 382: 8a 89        ldd  r24, Y+18     ; 0x12
 384: 8b 3e        cpi  r24, 0xEB     ; 235
 386: 09 f0        breq .+2           ; 0x38a
 388: 94 cf        rjmp .-216         ; 0x2b2

 38a: 8b 89        ldd  r24, Y+19     ; 0x13
 38c: 82 30        cpi  r24, 0x02     ; 2
 38e: 09 f0        breq .+2           ; 0x392
 390: 90 cf        rjmp .-224         ; 0x2b2

 392: 8c 89        ldd  r24, Y+20     ; 0x14
 394: 8d 31        cpi  r24, 0x1D     ; 29
 396: 09 f0        breq .+2           ; 0x39a
 398: 8c cf        rjmp .-232         ; 0x2b2

 39a: 83 e3        ldi  r24, 0x33     ; 51
 39c: 97 e0        ldi  r25, 0x07     ; 7
 39e: 01 97        sbiw r24, 0x01     ; 1
 3a0: f1 f7        brne .-4           ; 0x39e
 3a2: 87 cf        rjmp .-242         ; 0x2b2
</pre>
We'll use both of the source files throughout the tutorial.

== Power Traces ==
After the bootloader has finished the decryption process, it executes four distinct pieces of code:
* To apply the IV, it uses an XOR operation;
* To store the new IV, it copies the previous ciphertext into the IV array;
* It sends two bytes on the serial port;
* It checks the bytes of the signature one by one.

We should be able to recognize these four parts of the code in the power traces. Let's modify our capture routine to find them. Re-run the capture script and change a few settings:
<ol>
<li> We'd like to skip over all of the decryption process. The source code around this point is:
<pre>
trigger_high();
aes256_decrypt_ecb(&ctx, tmp32); /* encrypting the data block */
trigger_low();
</pre>
so we can skip straight over the AES-256 function by triggering on a falling edge instead of a rising edge. Change this in the scope settings.
<li> We don't need as many samples now. Change the number of samples to 3000.
<li> If we decrypt multiple ciphertexts in a row, only the first one will use the secret IV; all of the others will use the previous ciphertext instead. To avoid this, we'll have to automatically reset the board.
<ol>
<li> In the ''General Settings'' tab, change the Auxiliary Module to ''Reset AVR/XMEGA via CW-Lite''.
<li> In the ''Aux Settings'' tab, change both delays to around 100 ms.
</ol>
<li> Capture one trace and make sure that everything works.
</ol>
If everything worked out, you should be able to see all of the code's features:

[[File:Tutorial-A5-Bonus-Trace-Notes.PNG]]

With all of these things clearly visible, we have a pretty good idea of how to attack the IV and the signature.
We should be able to look at each of the XOR spikes to find each of the IV bytes, since each byte is processed on its own. Then, the signature check uses a short-circuiting comparison: as soon as it finds a byte in error, it stops checking the remaining bytes. This type of check is susceptible to a timing attack.

Let's grab a lot of traces so that we don't have to come back later. Save the project somewhere memorable, set up the capture routine to record 1000 traces, hit ''Capture Many'', and grab a coffee.

= Attacking the IV =
We need to find the IV before we can look at the signature, so the first half of the attack will look at the IV bytes.

== Attack Theory ==
The bootloader applies the IV to the AES decryption result by calculating

<math>\text{PT} = \text{DR} \oplus \text{IV}</math>

where DR is the decrypted ciphertext, IV is the secret vector, and PT is the plaintext that the bootloader will use later. We only have access to one of these: since we know the AES-256 key, we can calculate DR. Specifically, the assembly code to calculate the plaintext is the loop
<pre>
 344: d3 01        movw r26, r6
 346: 93 01        movw r18, r6
 348: f6 01        movw r30, r12
 34a: 80 81        ld   r24, Z
 34c: f9 01        movw r30, r18
 34e: 91 91        ld   r25, Z+
 350: 9f 01        movw r18, r30
 352: 89 27        eor  r24, r25
 354: f6 01        movw r30, r12
 356: 81 93        st   Z+, r24
 358: 6f 01        movw r12, r30
 35a: ee 15        cp   r30, r14
 35c: ff 05        cpc  r31, r15
 35e: a1 f7        brne .-24          ; 0x348
</pre>
This code includes two <code>ld</code> instructions, one <code>eor</code>, and one <code>st</code>: the DR and IV are loaded and XORed to get PT, which is then stored back where DR was. All of these instructions should be visible in the power traces. This is enough information for us to attack a single bit of the IV. Suppose we only wanted to get the first bit (number 0) of the IV.
We could do the following:
* Split all of the traces into two groups: those where bit 0 of DR is 0 (DR[0] = 0), and those where it is 1 (DR[0] = 1).
* Calculate the average trace for both groups.
* Find the difference between the two averages. It should include a noticeable spike during the first iteration of the loop.
* Look at the direction of the spike to decide if the IV bit is 0 (<code>PT[0] = DR[0]</code>) or if the IV bit is 1 (<code>PT[0] = ~DR[0]</code>).
This is effectively a DPA attack on a single bit of the IV. We can repeat this attack 128 times to recover the entire IV.

== A 1-Bit Attack ==
Unfortunately, we can't use the ChipWhisperer Analyzer to attack this XOR function. Instead, we'll write our own Python code. One thing that we ''don't'' need to do is write our own AES-256 implementation: there's some perfectly fine code in the PyCrypto library. [https://pypi.python.org/pypi/pycrypto Install PyCrypto] and make sure you can use its functions:
<pre>
python
Python 2.7.10 (default, May 23 2015, 09:40:32) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from Crypto.Cipher import AES
>>> AES
<module 'Crypto.Cipher.AES' from 'C:\WinPython-32bit-2.7.10.3\python-2.7.10\lib\site-packages\Crypto\Cipher\AES.pyc'>
</pre>

Next, open a new Python script wherever you like and load the data that you recorded earlier. You might want to rename the files to make them easier to work with.
It'll also be helpful to know how many traces we have and how long they are:
<pre>
# Load data
import numpy as np

traces = np.load(r'traces\traces.npy')
textin = np.load(r'traces\textin.npy')
numTraces = len(traces)
traceLen = len(traces[0])

print numTraces
print traceLen
</pre>

It's also a good idea to plot some traces and make sure they look okay:
<pre>
# Plot some traces
import matplotlib.pyplot as plt
for i in range(10):
    plt.plot(traces[i])
plt.show()
</pre>

Since we know the AES-256 key, we can decrypt all of this data and store it in a list of decryption results:
<pre>
# Decrypt ciphertext with the key that we now know
from Crypto.Cipher import AES
knownkey = [0x94, 0x28, 0x5D, 0x4D, 0x6D, 0xCF, 0xEC, 0x08, 0xD8, 0xAC, 0xDD, 0xF6, 0xBE, 0x25, 0xA4, 0x99,
            0xC4, 0xD9, 0xD0, 0x1E, 0xC3, 0x40, 0x7E, 0xD7, 0xD5, 0x28, 0xD4, 0x09, 0xE9, 0xF0, 0x88, 0xA1]
knownkey = str(bytearray(knownkey))
dr = []
aes = AES.new(knownkey, AES.MODE_ECB)
for i in range(numTraces):
    ct = str(bytearray(textin[i]))
    d = aes.decrypt(ct)
    # Convert the 16-byte result string into a list of integers
    d = [bytearray(d)[j] for j in range(16)]
    dr.append(d)
print dr
</pre>

That's a lot of data to print! Now, let's split the traces into two groups by comparing bit 0 of the DR:
<pre>
# Split traces into 2 groups
groupedTraces = [[] for _ in range(2)]
for i in range(numTraces):
    bit0 = dr[i][0] & 0x01
    groupedTraces[bit0].append(traces[i])
print len(groupedTraces[0])
</pre>

If you have 1000 traces, you should expect this to print a number around 500: roughly half of the traces should fit into each group.
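Before averaging real traces, the split-and-subtract idea can be sanity-checked on synthetic data. The Python 3 sketch below assumes a hypothetical leakage model: one sample per trace at the XOR instruction, equal to the Hamming weight of PT = DR ⊕ IV plus Gaussian noise. <code>simulate</code>, <code>diff_of_means</code> and <code>recover_iv_byte</code> are illustrative names, not part of ChipWhisperer. Because the model fixes the sign of the leakage, the polarity of each difference is known here; on real hardware that sign is not known in advance.

```python
# Python 3 sketch: 1-bit difference-of-means on synthetic samples.
# Assumed (hypothetical) leakage: sample = HW(DR ^ IV) + Gaussian noise.
import random


def hamming_weight(x):
    """Number of set bits in an integer."""
    return bin(x).count("1")


def simulate(iv_byte, n_traces=1000, noise=0.5, seed=1):
    """Return (drs, samples) for uniformly random DR bytes."""
    rng = random.Random(seed)
    drs, samples = [], []
    for _ in range(n_traces):
        dr = rng.randrange(256)
        drs.append(dr)
        samples.append(hamming_weight(dr ^ iv_byte) + rng.gauss(0.0, noise))
    return drs, samples


def diff_of_means(drs, samples, bit):
    """Split samples on one bit of DR and return mean(group 1) - mean(group 0)."""
    groups = ([], [])
    for dr, s in zip(drs, samples):
        groups[(dr >> bit) & 0x01].append(s)
    return sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0])


def recover_iv_byte(drs, samples):
    """Read each IV bit from the sign of its difference of means.

    Under this model, IV bit 0 gives PT bit == DR bit (positive difference)
    and IV bit 1 gives PT bit == ~DR bit (negative difference).
    """
    iv = 0
    for bit in range(8):
        if diff_of_means(drs, samples, bit) < 0:
            iv |= 1 << bit
    return iv


if __name__ == "__main__":
    drs, samples = simulate(0x5A)
    print("recovered IV byte: 0x%02x" % recover_iv_byte(drs, samples))
```

The other seven bits of PT do not depend on the bit used for the split, so on average they cancel in the subtraction. Only the targeted bit leaves a difference of roughly ±1.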
Now, NumPy's <code>average</code> function lets us easily calculate the average at each point:
<pre>
# Find averages and differences
means = []
for i in range(2):
    means.append(np.average(groupedTraces[i], axis=0))
diff = means[1] - means[0]
</pre>

Finally, we can plot this difference to see if we can spot the IV:
<pre>
plt.plot(diff)
plt.grid()
plt.show()
</pre>

This makes a plot with some pretty obvious spikes:

[[File:Tutorial-A5-Bonus-Diff-0.PNG]]

However, one of these spikes is meaningless to us. The spike around sample 1600 is caused by the signature check, which we aren't attacking yet. Let's ignore this peak and zoom in on the smaller spikes at the start of the trace:

[[File:A5-Bonus-Diff-0-Zoom.PNG]]

This is it! We've got a pretty clear signal telling us where the decryption result is used. You can make sure this isn't a coincidence by using the second byte instead:

[[File:Tutorial-A5-Bonus-Diff-1.PNG]]

These peaks are about 60 samples later (or 15 cycles, since we're using an ADC clock that's 4 times faster than the microcontroller). Also, most of these peaks are upside down! This is a pretty clear indicator that we can find the IV bits from these differential traces. The only thing that we can't tell is the polarity of these signals; there's no way to tell if right-side-up peaks indicate a bit that's set or cleared. However, that means we can narrow down the IV to two possibilities, which is a lot better than <math>2^{128}</math>.

== The Other 127 ==
The best way to attack the IV would be to repeat the 1-bit conceptual attack for each of the bits. Try to do this yourself! (Really!) If you're stuck, here are a few hints to get you going:
* One easy way of looping through the bits is by using two nested loops, like this:
<pre>
for byte in range(16):
    for bit in range(8):
        # Attack bit number (byte*8 + bit) here
        pass
</pre>
* The sample that you'll want to look at will depend on which byte you're attacking. We had success when we used <code>location = 51 + byte*60</code>, but your mileage will vary.
* The bitshift operator and the bitwise-AND operator are useful for getting at a single bit:
<pre>
# This will either result in a 0 or a 1
checkIfBitSet = (byteToCheck >> bit) & 0x01
</pre>
If you're ''really, really'' stuck, there's a working attack in [[#Appendix A: IV Attack Script]]. You should find that the secret IV is <code>C1 25 68 DF E7 D3 19 DA 10 E2 41 71 33 B0 EB 3C</code>.

= Attacking the Signature =
The last thing we can do with this bootloader is attack the signature. This final section will show how one byte of the signature could be recovered. If you want more of this kind of analysis, a more complete timing attack is shown in [[Tutorial B3-1 Timing Analysis with Power for Password Bypass]].

== Attack Theory ==
Recall from earlier that the signature check in C looks like:
<pre>
if ((tmp32[0] == SIGNATURE1) &&
   (tmp32[1] == SIGNATURE2) &&
   (tmp32[2] == SIGNATURE3) &&
   (tmp32[3] == SIGNATURE4)){
</pre>
or, in assembly,
<pre>
 37a: 89 89        ldd  r24, Y+17     ; 0x11
 37c: 88 23        and  r24, r24
 37e: 09 f0        breq .+2           ; 0x382
 380: 98 cf        rjmp .-208         ; 0x2b2

 382: 8a 89        ldd  r24, Y+18     ; 0x12
 384: 8b 3e        cpi  r24, 0xEB     ; 235
 386: 09 f0        breq .+2           ; 0x38a
 388: 94 cf        rjmp .-216         ; 0x2b2

 38a: 8b 89        ldd  r24, Y+19     ; 0x13
 38c: 82 30        cpi  r24, 0x02     ; 2
 38e: 09 f0        breq .+2           ; 0x392
 390: 90 cf        rjmp .-224         ; 0x2b2

 392: 8c 89        ldd  r24, Y+20     ; 0x14
 394: 8d 31        cpi  r24, 0x1D     ; 29
 396: 09 f0        breq .+2           ; 0x39a
 398: 8c cf        rjmp .-232         ; 0x2b2
</pre>
In C, boolean expressions support ''short-circuiting''. When checking multiple conditions, the program will stop evaluating these booleans as soon as it can tell what the final value will be. In this case, unless all four of the equality checks are true, the result will be false. Thus, as soon as the program finds a single false condition, it's done. The assembly code confirms this short-circuiting operation.
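The timing leak from the <code>&&</code> chain can be imitated in a short Python 3 sketch. <code>SIGNATURE</code> holds hypothetical placeholder bytes, not the real ones, and <code>blocks_run</code> is an illustrative stand-in for execution time: it counts how many load/compare/branch blocks ran before the check gave up. The sketch also assumes the attacker can choose PT directly. With the AES key and IV both known, that is plausible, because a chosen PT can be turned into a ciphertext by encrypting PT ⊕ IV. That encryption step is left out here.

```python
# Python 3 sketch of a short-circuit signature check and its timing leak.
# SIGNATURE is a hypothetical placeholder; the real bytes are in supersecret.h.
SIGNATURE = bytes([0x12, 0x34, 0x56, 0x78])


def signature_check(pt):
    """Compare PT[0..3] in order and stop at the first mismatch.

    Returns (matched, blocks_run). blocks_run is the number of
    compare blocks executed, a stand-in for how long the check took.
    """
    for i, expected in enumerate(SIGNATURE):
        if pt[i] != expected:
            return False, i + 1
    return True, len(SIGNATURE)


def recover_first_byte():
    """Try every value of PT[0] with the other bytes held at zero.

    The correct value is the only one that lets the check reach the
    second compare block, i.e. the only one that takes longer.
    """
    for guess in range(256):
        _, blocks_run = signature_check(bytes([guess, 0, 0, 0]))
        if blocks_run > 1:
            return guess
    return None


if __name__ == "__main__":
    print("first signature byte: 0x%02x" % recover_first_byte())
```

In the real attack, the "longer" check would show up as the signature-check spike moving later in the power trace, rather than as a returned count.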
Each of the four assembly blocks includes a comparison (<code>and</code> or <code>cpi</code>), a ''branch if equal'' (<code>breq</code>), and a relative jump (<code>rjmp</code>). All four of the relative jumps return the program to the same location (the start of the <code>while(1)</code> loop), and all four of the branches try to avoid these relative jumps. If any of the comparisons is false, the relative jump will return the program to the start of the loop. All four branches must succeed to get into the body of the <code>if</code> block.

The short-circuiting conditions are perfect for us. We can use our power traces to watch how long it takes for the signature check to fail. If the check takes longer than usual, then we know that the first byte of our signature was right.

== Finding a Single Byte ==

= Appendix A: IV Attack Script =
This is the author's script to automatically attack the secret IV. If you've completed [[#The Other 127]], you can paste this snippet immediately after it:
<pre>
# Attack!
for byte in range(16):
    location = 51 + byte * 60
    iv = 0
    for bit in range(8):
        # Check if the decrypted bits are 0 or 1
        pt_bits = [((dr[i][byte] >> (7 - bit)) & 0x01) for i in range(numTraces)]

        # Split the traces into two groups
        groupedPoints = [[] for _ in range(2)]
        for i in range(numTraces):
            groupedPoints[pt_bits[i]].append(traces[i][location])

        # Get the means for each bit and subtract them
        means = []
        for i in range(2):
            means.append(np.average(groupedPoints[i]))
        diff = means[1] - means[0]

        # Look in point of interest location
        iv_bit = 1 if diff > 0 else 0
        iv = (iv << 1) | iv_bit

        print iv_bit,

    print "%02x" % iv
</pre>

The output from this script is:
<pre>
1 1 0 0 0 0 0 1 c1
0 0 1 0 0 1 0 1 25
0 1 1 0 1 0 0 0 68
1 1 0 1 1 1 1 1 df
1 1 1 0 0 1 1 1 e7
1 1 0 1 0 0 1 1 d3
0 0 0 1 1 0 0 1 19
1 1 0 1 1 0 1 0 da
0 0 0 1 0 0 0 0 10
1 1 1 0 0 0 1 0 e2
0 1 0 0 0 0 0 1 41
0 1 1 1 0 0 0 1 71
0 0 1 1 0 0 1 1 33
1 0 1 1 0 0 0 0 b0
1 1 1 0 1 0 1 1 eb
0 0 1 1 1 1 0 0 3c
</pre>

= Appendix D AES-256 IV Attack Script =
'''NB: This script works for 0.10 release or later, see local copy in doc/html directory of chipwhisperer release if you need earlier versions'''

Full attack script, copy/paste into a file then add as active attack script:
<pre>
#IV Attack Script
from chipwhisperer.common.autoscript import AutoScriptBase
#Imports from Preprocessing
import chipwhisperer.analyzer.preprocessing as preprocessing
#Imports from Capture
from chipwhisperer.analyzer.attacks.CPA import CPA
from chipwhisperer.analyzer.attacks.CPAProgressive import CPAProgressive
import chipwhisperer.analyzer.attacks.models.AES128_8bit
# Imports from utilList

# Imports for AES256 Attack
from chipwhisperer.analyzer.attacks.models.AES128_8bit import getHW

#Imports for IV Attack
from Crypto.Cipher import AES

class AESIVAttack(object):
    numSubKeys = 16

    @staticmethod
    def leakage(textin, textout, guess, bnum, setting, state):
        knownkey = [0x94, 0x28, 0x5D, 0x4D, 0x6D, 0xCF, 0xEC, 0x08, 0xD8, 0xAC, 0xDD, 0xF6, 0xBE, 0x25, 0xA4, 0x99,
                    0xC4, 0xD9, 0xD0, 0x1E, 0xC3, 0x40, 0x7E, 0xD7, 0xD5, 0x28, 0xD4, 0x09, 0xE9, 0xF0, 0x88, 0xA1]
        knownkey = str(bytearray(knownkey))
        ct = str(bytearray(textin))

        aes = AES.new(knownkey, AES.MODE_ECB)
        pt = aes.decrypt(ct)
        return getHW(bytearray(pt)[bnum] ^ guess)

class userScript(AutoScriptBase):
    preProcessingList = []
    def initProject(self):
        pass

    def initPreprocessing(self):
        self.preProcessingResyncSAD0 = preprocessing.ResyncSAD.ResyncSAD(self.parent)
        self.preProcessingResyncSAD0.setEnabled(True)
        self.preProcessingResyncSAD0.setReference(rtraceno=0, refpoints=(6300,6800), inputwindow=(6000,7200))
        self.preProcessingResyncSAD1 = preprocessing.ResyncSAD.ResyncSAD(self.parent)
        self.preProcessingResyncSAD1.setEnabled(True)
        self.preProcessingResyncSAD1.setReference(rtraceno=0, refpoints=(4800,5100), inputwindow=(4700,5200))
        self.preProcessingList = [self.preProcessingResyncSAD0,self.preProcessingResyncSAD1,]
        return self.preProcessingList

    def initAnalysis(self):
        self.attack = CPA(self.parent, console=self.console, showScriptParameter=self.showScriptParameter)
        self.attack.setAnalysisAlgorithm(CPAProgressive, AESIVAttack, None)
        self.attack.setTraceStart(0)
        self.attack.setTracesPerAttack(100)
        self.attack.setIterations(1)
        self.attack.setReportingInterval(25)
        self.attack.setTargetBytes([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
        self.attack.setTraceManager(self.traceManager())
        self.attack.setProject(self.project())
        self.attack.setPointRange((4800,6500))
        return self.attack

    def initReporting(self, results):
        results.setAttack(self.attack)
        results.setTraceManager(self.traceManager())
        self.results = results

    def doAnalysis(self):
        self.attack.doAttack()
</pre>
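The <code>leakage</code> function above scores each guess with <code>getHW(bytearray(pt)[bnum] ^ guess)</code>, a Hamming-weight model of the XOR result. The same model can be tried outside ChipWhisperer with a small correlation (CPA) sketch in Python 3 on synthetic data. <code>hw</code>, <code>pearson</code>, <code>simulate</code> and <code>cpa_iv_byte</code> are hypothetical helpers written only for this example. The traces are generated under the assumption that the XOR leaks HW(DR ⊕ IV) plus Gaussian noise. The sketch picks the largest ''signed'' correlation, which assumes positive leakage polarity. Because HW(DR ⊕ (guess ⊕ 0xFF)) = 8 − HW(DR ⊕ guess), a guess and its bitwise complement have correlations of equal size and opposite sign. Ranking by absolute value would therefore leave the same two-way ambiguity seen in the 1-bit attack.

```python
# Python 3 sketch of CPA on one IV byte using synthetic single-sample traces.
# Assumed (hypothetical) leakage: sample = HW(DR ^ IV) + Gaussian noise.
import math
import random


def hw(x):
    """Hamming weight (plays the role of getHW() in the script above)."""
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate(iv_byte, n_traces=500, noise=0.5, seed=2):
    """Random DR bytes and one leakage sample per trace."""
    rng = random.Random(seed)
    drs = [rng.randrange(256) for _ in range(n_traces)]
    samples = [hw(dr ^ iv_byte) + rng.gauss(0.0, noise) for dr in drs]
    return drs, samples


def cpa_iv_byte(drs, samples):
    """Return the guess whose HW(DR ^ guess) model has the largest
    positive correlation with the samples."""
    best_guess, best_r = None, -2.0
    for guess in range(256):
        model = [hw(dr ^ guess) for dr in drs]
        r = pearson(model, samples)
        if r > best_r:
            best_guess, best_r = guess, r
    return best_guess


if __name__ == "__main__":
    drs, samples = simulate(0x3C)
    print("best IV guess: 0x%02x" % cpa_iv_byte(drs, samples))
```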