Tutorial A7 Glitch Buffer Attacks

A7: Glitch Buffer Attacks
Target Architecture	XMEGA/Arm
Hardware Crypto	No
Software Release	V3 / V4 / V5

This tutorial has been updated for the ChipWhisperer 4.0.0 release. If you are using 3.x.x, see the "V3" link in the sidebar.

This tutorial discusses a specific type of glitch attack. It shows how a simple printing loop can be abused, causing a target to print some otherwise private information. This attack will be used to recover a plaintext without any knowledge of the encryption scheme being used.

Background

This section introduces the attack concept by showing some real world examples of vulnerable firmware. Then, it describes the victim firmware that will be used in this tutorial.

Real Firmware

Typically, one of the slowest parts of an embedded system is its communication lines. It's pretty common to see a processor running in the MHz range with a serial connection running at only 9600 or 115200 baud. To bridge these two speeds, embedded firmware usually fills a buffer with data and lets a serial driver send it out at its own pace. This setup means we can expect to see code like

for(int i = 0; i < number_of_bytes_to_print; i++)
{
    print_one_byte_to_serial(buffer[i]);
}

This is a pretty vulnerable piece of C. Imagine that we could sneak into the source code and change it to

for(int i = 0; i < really_big_number; i++)
{
    print_one_byte_to_serial(buffer[i]);
}

C compilers don't care that buffer[] has a limited size - this loop will happily print every byte it comes across, which could include other variables, memory-mapped registers, and anything else that sits after the buffer in memory. Although we probably don't have a good way of changing the source code on the fly, we do have glitches: a well-timed clock or power glitch could make the processor skip or mis-evaluate the i < number_of_bytes_to_print check, which would have the same result.
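To make the over-read concrete, here is a minimal Python sketch of the idea. It is only a toy model: memory is one flat byte array, the response and the secret string are made up for illustration, and the "glitch" is modelled simply as calling the loop with a much larger limit.

```python
# Toy model of a C over-read: memory is one flat byte array, and the
# "buffer" is just a starting offset into it. All names are illustrative.
MEMORY = bytearray(b"r0\n" + b"SECRET-PLAINTEXT")
BUFFER_START = 0
NUMBER_OF_BYTES_TO_PRINT = 3  # the length the firmware intends to send


def print_loop(limit):
    """Return the bytes a C-style loop would send for a given loop limit."""
    sent = bytearray()
    for i in range(limit):
        address = BUFFER_START + i
        if address >= len(MEMORY):  # end of our toy memory, not a C check
            break
        sent.append(MEMORY[address])
    return bytes(sent)


normal = print_loop(NUMBER_OF_BYTES_TO_PRINT)
glitched = print_loop(10_000)  # the bound check no longer stops the loop

print(normal)    # b'r0\n'
print(glitched)  # b'r0\nSECRET-PLAINTEXT'
```

The only difference between the two calls is the loop limit, which is exactly what the glitch is meant to defeat on the real target.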

How could this be applied? Imagine that we have an encrypted firmware image that we're going to transmit to a bootloader. A typical communication process might look like:

We send the encrypted image ciphertexts over a serial connection
The bootloader decrypts the ciphertexts and stores the result somewhere in memory
The bootloader sends back a response over the serial port

We have a pretty straightforward attack for this type of bootloader. During the last step, we'll apply a glitch at precisely the right time, causing the bootloader to print all kinds of things to the serial connection. With some luck, we'll be able to find the decrypted plaintext somewhere in this memory dump.

Bootloader Setup

For this tutorial, a very simple bootloader using the SimpleSerial protocol has been set up. The source for this bootloader can be found in chipwhisperer/hardware/victims/firmware/bootloader-glitch. The following commands are used:

pABCD\n: Send an encrypted ciphertext to the bootloader. For example, this message is made up of the two bytes AB and CD.
r0\n: The reply from the bootloader. Acknowledges that a message was received. No other responses are used.
x: Clear the bootloader's received buffer.
k: See x.
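As a rough illustration of the p command format, the following Python sketch converts between raw bytes and the command string. The helper names are hypothetical and not part of ChipWhisperer, and it assumes the bootloader accepts hex digits in either case, since this tutorial shows both.

```python
def build_p_command(payload: bytes) -> str:
    """Format raw bytes as a SimpleSerial-style 'p' command (sketch)."""
    return "p" + payload.hex().upper() + "\n"


def parse_p_command(command: str) -> bytes:
    """Recover the raw bytes from a 'p' command string (sketch)."""
    if not (command.startswith("p") and command.endswith("\n")):
        raise ValueError("expected a command of the form p<hex>\\n")
    return bytes.fromhex(command[1:-1])


print(repr(build_p_command(bytes([0xAB, 0xCD]))))  # 'pABCD\n'
print(parse_p_command("pABCD\n"))                  # b'\xab\xcd'
```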

The bootloader uses triple-ROT-13 encryption (which has the same effect as a single ROT-13 pass) to encrypt/decrypt the messages. To help you send messages to the target, the script private/encrypt.py prints the SimpleSerial command for a given fixed string. For example, the ciphertext for the string Don't forget to buy milk! is

p516261276720736265747267206762206f686c207a76797821\n
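The command above can be reproduced with a few lines of Python. This is a sketch for checking the example, not the contents of private/encrypt.py; the function name is made up, and it relies on Python's built-in rot_13 text codec.

```python
import codecs


def encrypt_to_command(plaintext: str) -> str:
    """ROT-13 the text and wrap the ASCII bytes in a 'p' command (sketch)."""
    ciphertext = codecs.encode(plaintext, "rot_13")
    return "p" + ciphertext.encode("ascii").hex() + "\n"


command = encrypt_to_command("Don't forget to buy milk!")
print(repr(command))
# 'p516261276720736265747267206762206f686c207a76797821\n'
```

Because 3 x 13 = 39 and 39 mod 26 = 13, applying ROT-13 three times gives the same result as applying it once.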

The bootloader folder also contains a Makefile to create a hex file for use with the ChipWhisperer hardware. The build process is the same as in the previous tutorials: run make from the command line and make sure that everything built properly. If all goes well, the Makefile should print something like

----------------
Device: atxmega128d3

Program:    1706 bytes (1.2% Full)
(.text + .data + .bootloader)

Data:        248 bytes (3.0% Full)
(.data + .bss + .noinit)


Built for platform CW-Lite XMEGA

-------- end --------

The Attack Plan

Since we have access to the source code, let's take our time and understand how our attack is going to work before we dive in.

The Sensitive Code

Inside bootloader.c, there are two buffers that are used to store most of the important data. The source code shows:

#define DATA_BUFLEN 40
#define ASCII_BUFLEN (2 * DATA_BUFLEN)

uint8_t ascii_buffer[ASCII_BUFLEN];
uint8_t data_buffer[DATA_BUFLEN];

This tells us that there will be two arrays stored somewhere in the target's memory. The AVR-GCC compiler doesn't usually try too hard to move these around, so we can expect to find them back-to-back in memory; that is, if we can read past the end of the ASCII buffer, we'll probably find the data buffer.

Next, the code used to print a response to the serial port is

if(state == RESPOND)
{
	// Send the ascii buffer back 
	trigger_high();
	
	int i;
	for(i = 0; i < ascii_idx; i++)
	{
		putch(ascii_buffer[i]);
	}
	trigger_low();
	state = IDLE;
}

This looks very similar to the example code given in the previous section, so it should be vulnerable to a glitching attack. The goal is to cause the loop to continue past its regular limit: if the two buffers are stored back-to-back as expected, ascii_buffer[80] is the same memory location as data_buffer[0], so a successful glitch should dump the data buffer for us.
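The index arithmetic can be checked with a small Python model. The sketch below uses a ctypes structure to force the two arrays to sit back-to-back, which is the assumption made above; the real placement is decided by the compiler and linker.

```python
import ctypes

DATA_BUFLEN = 40
ASCII_BUFLEN = 2 * DATA_BUFLEN


class Buffers(ctypes.Structure):
    """Model of the two arrays placed back-to-back (an assumption)."""
    _fields_ = [
        ("ascii_buffer", ctypes.c_uint8 * ASCII_BUFLEN),
        ("data_buffer", ctypes.c_uint8 * DATA_BUFLEN),
    ]


buffers = Buffers()
buffers.data_buffer[0] = ord("D")  # first byte of the decrypted text

raw = bytes(buffers)               # flat view of the memory region
data_offset = Buffers.data_buffer.offset

print(data_offset)                 # 80
print(chr(raw[ASCII_BUFLEN]))      # D
```

Reading index 80 of the flat view lands on the first byte of data_buffer, which is what an out-of-bounds ascii_buffer[80] would do in C under this layout.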

Disassembly

As a final step, let's check the assembly code to see exactly what we're trying to glitch through. In the same folder as the hex file you built, open the *.lss file that corresponds to your target (for example, bootloader-CWLITEARM.lss). This is called a listing file, and it contains a bunch of debug and assembly information. Most importantly, it will allow us to easily match the source code to what's in our hex file. Search the file for something close to the vulnerable loop, such as state == RESPOND.

XMEGA Disassembly

After searching the file, you should see something like this:

 376:	89 91       	ld	r24, Y+
 378:	0e 94 06 02 	call	0x40c	;  0x40c
 37c:	f0 e2       	ldi	r31, 0x20	; 32
 37e:	cf 37       	cpi	r28, 0x7F	; 127
 380:	df 07       	cpc	r29, r31
 382:	c9 f7       	brne	.-14     	;  0x376

This is our printing loop in assembly. It has the following steps in it:

Look at the address Y and put the contents into r24. Increase the address stored in Y. (This is the i++ in the loop.)
Call the function in location 0x40c. Presumably, this is the location of the putch() function.
Compare r28 and r29 (the low and high bytes of Y) to 0x7F and 0x20; in other words, compare Y to the address 0x207F. Unless they're equal, go back to the top of the loop.

There's one quirk to notice in this code. In the C source, the for loop checks whether i < ascii_idx. However, in the assembly code, the check is effectively whether i == ascii_idx! This is even easier to glitch - as long as we can break past the brne instruction once, we'll get to the data buffer.
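The difference between the two exit checks can be seen in a simplified simulation. The sketch below is not a model of the XMEGA itself: it assumes a three-byte response ending at address 0x207F, treats the glitch as one exit check that is ignored, and caps the output at 200 bytes so the run terminates.

```python
END = 0x207F       # address compared against Y in the listing
START = END - 3    # assumption: a three-byte response such as "r0\n"
MAX_BYTES = 200    # safety cap so the simulation always terminates


def bytes_printed(exit_test, glitch_at=None):
    """Count bytes sent when the exit check at glitch_at is ignored once."""
    pointer = START
    printed = 0
    while printed < MAX_BYTES:
        printed += 1                      # ld r24, Y+ and call putch
        pointer = (pointer + 1) & 0xFFFF  # Y is a 16-bit pointer
        if pointer == glitch_at:
            glitch_at = None              # the glitch only works once
            continue
        if exit_test(pointer):
            break
    return printed


def equality_exit(pointer):
    return pointer == END                 # what the brne-based loop does


def less_than_exit(pointer):
    return pointer >= END                 # what i < ascii_idx asks for


print(bytes_printed(equality_exit))                  # 3
print(bytes_printed(equality_exit, glitch_at=END))   # 200 (hits the cap)
print(bytes_printed(less_than_exit, glitch_at=END))  # 4
```

With the equality check, a single missed comparison lets the loop run on until the cap. With a true less-than check, the same glitch leaks only one extra byte.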

Arm Disassembly

After searching the file, you should find:

if(state == RESPOND)
    {
      // Send the ascii buffer back
      trigger_high();
 8000258:	f000 f8da 	bl	8000410 <trigger_high>

      int i;
      for(i = 0; i < ascii_idx; i++)
      {
        putch(ascii_buffer[i]);
 800025c:	7828      	ldrb	r0, [r5, #0]
 800025e:	f000 f8f7 	bl	8000450 <putch>
 8000262:	7868      	ldrb	r0, [r5, #1]
 8000264:	f000 f8f4 	bl	8000450 <putch>
 8000268:	78a8      	ldrb	r0, [r5, #2]
 800026a:	f000 f8f1 	bl	8000450 <putch>
      }

This is our printing "loop", but as we can see, it's not really a loop at all! As it turns out, the compiler has unrolled our loop, which typically speeds up execution at the expense of code size. It will be very difficult to prevent this with such a small loop, so instead we'll increase the loop size by adding some extra newlines at the end of the message:

ascii_buffer[ascii_idx++] = '\n';
ascii_buffer[ascii_idx++] = '\n';
ascii_buffer[ascii_idx++] = '\n';
ascii_buffer[ascii_idx++] = '\n';
ascii_buffer[ascii_idx++] = '\n';

If you want to use the script at the bottom, you should add the 5 newlines above, but other values will work fine with different glitch parameters. Recompile and take another look at the listing file. You should see that our printing loop is actually a loop now:

if(state == RESPOND)
    {
      // Send the ascii buffer back
      trigger_high();
 8000262:	f000 f8d7 	bl	8000414 <trigger_high>

      int i;
      for(i = 0; i < ascii_idx; i++)
 8000266:	2400      	movs	r4, #0
      {
        putch(ascii_buffer[i]);
 8000268:	5d28      	ldrb	r0, [r5, r4]
      for(i = 0; i < ascii_idx; i++)
 800026a:	3401      	adds	r4, #1
        putch(ascii_buffer[i]);
 800026c:	f000 f8f2 	bl	8000454 <putch>
      for(i = 0; i < ascii_idx; i++)
 8000270:	2c08      	cmp	r4, #8
 8000272:	d1f9      	bne.n	8000268 <main+0x64>

We can break this assembly down into the following steps (starting with address 0x8000268):

Load the character we want to print into r0 (r5 contains the address of ascii_buffer, while r4 is our loop index i)
Add 1 to r4 (i)
Call putch
Compare r4 to 8 (8 is always the value of ascii_idx)
Branch back to the beginning of the loop if r4 and 8 aren't equal

There's one quirk to notice in this code. In the C source, the for loop checks whether i < ascii_idx. However, in the assembly code, the loop only stops when i == ascii_idx! This is even easier to glitch - as long as we can glitch past the bne.n check once, the index never matches again and the loop runs on into the data buffer.
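As a cross-check of the listing, the branch at 0x8000272 can be decoded by hand. The sketch below assumes the standard 16-bit Thumb conditional branch layout (top four bits 1101, then a 4-bit condition and a signed 8-bit offset counted in halfwords from the instruction address plus 4); the function name is made up.

```python
def decode_thumb_cond_branch(address, halfword):
    """Decode a 16-bit Thumb conditional branch into (condition, target)."""
    if (halfword >> 12) != 0b1101:
        raise ValueError("not a 16-bit conditional branch")
    condition = (halfword >> 8) & 0xF
    if condition >= 0xE:
        raise ValueError("condition 0xE/0xF is not a conditional branch")
    offset = halfword & 0xFF
    if offset & 0x80:          # sign-extend the 8-bit immediate
        offset -= 0x100
    target = address + 4 + (offset << 1)
    return condition, target


condition, target = decode_thumb_cond_branch(0x8000272, 0xD1F9)
print(condition)    # 1, the "not equal" condition
print(hex(target))  # 0x8000268
```

The decoded target matches the ldrb at the top of the loop, confirming that this one instruction decides whether printing continues.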

Attack Script & Results

XMEGA Results

To speed up the tutorial, the script in Appendix: Setup Script will open the ChipWhisperer Capture software and fill in all of the appropriate settings. Copy this code into a Python script and run it. Then, open the serial terminal and connect to the target, using the ASCII with Hex display mode. If everything is set up correctly, the Capture 1 button should cause the text r0 to appear in the terminal. This is the bootloader's response to a block of ciphertext.

Once this is set up, connect the glitch module's output to the target's clock. Do this by changing the Target HS IO-Out to Glitch Module. Try to Capture 1 again and watch the serial terminal. If you're lucky, a large amount of text will appear in this window:

r0
261276720736265747267206762206f686c207a767978210000000000000000000000000000000000000000000000
00000000000000Don't forget to buy milk!000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
...
<many more lines omitted>

In the middle of this output, the plaintext is clearly visible! The data buffer has successfully been printed to the serial port, allowing us to see the decrypted text with no knowledge of the algorithm.
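When the dump is long, it can help to filter it automatically. The following Python sketch (a hypothetical helper, similar in spirit to the Unix strings tool) pulls out runs of printable ASCII and discards runs made only of hex digits, since the ASCII buffer itself holds the hex text of the command. The sample dump is invented for illustration.

```python
import re


def printable_runs(dump: bytes, min_length: int = 8):
    """Return printable-ASCII runs that are not just hex digits."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    runs = [m.group().decode("ascii") for m in re.finditer(pattern, dump)]
    return [run for run in runs if not re.fullmatch(r"[0-9a-fA-F]+", run)]


# Illustrative dump: response, leftover hex text, padding, then plaintext.
sample = (b"r0\n" + b"2127672073" + b"\x00" * 20
          + b"Don't forget to buy milk!" + b"\x00" * 30)

found = printable_runs(sample)
print(found)  # ["Don't forget to buy milk!"]
```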

If you can't get this to work, remember that glitching is a very sensitive operation - one glitch timing will probably not work for every board on every day. Try sweeping over different Glitch Widths, Glitch Offsets, and Ext Trigger Offsets. The built-in Glitch Explorer will be very useful here - take a read through Tutorial A2 Introduction to Glitch Attacks (including Glitch Explorer) if you need a refresher.
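The search itself is just a loop over parameter combinations. The Python sketch below shows the general shape only: try_glitch is a hypothetical callback that would apply the settings (for example scope.glitch.width, scope.glitch.offset and scope.glitch.ext_offset), send one command, and return the reply. A fake target stands in for the hardware so the sketch can run on its own.

```python
import itertools

NORMAL_RESPONSE = b"r0\n"  # what the bootloader sends without a glitch


def sweep(try_glitch, widths, offsets, ext_offsets):
    """Try every combination and keep those that return extra data."""
    hits = []
    for width, offset, ext_offset in itertools.product(
            widths, offsets, ext_offsets):
        response = try_glitch(width, offset, ext_offset)
        if len(response) > len(NORMAL_RESPONSE):
            hits.append((width, offset, ext_offset))
    return hits


def fake_target(width, offset, ext_offset):
    """Stand-in for real hardware; the 'working' values are made up."""
    if ext_offset == 68 and width >= 3.0:
        return NORMAL_RESPONSE + b"leaked bytes"
    return NORMAL_RESPONSE


hits = sweep(fake_target, widths=[2.0, 3.0], offsets=[-5.0],
             ext_offsets=range(66, 70))
print(hits)  # [(3.0, -5.0, 68)]
```

Treating any reply longer than the normal response as a hit is a crude success test; on real hardware you would also need to handle a target that crashes and stops responding.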

Arm Results

If you added the same number of newlines as shown (5), you should be able to run the script at the end of this page from inside ChipWhisperer. Then, open the serial terminal and connect to the target, using the ASCII with Hex display mode. If everything is set up correctly, the Capture 1 button should cause the text r0 to appear in the terminal. This is the bootloader's response to a block of ciphertext.

Once this is set up, connect the glitch module's output to the target's clock. Do this by changing the Target HS IO-Out to Glitch Module. Try to Capture 1 again and watch the serial terminal. If you're lucky, a very large amount of text will appear in this window.

Near the beginning of this output, the plaintext is clearly visible! The data buffer has successfully been printed to the serial port, allowing us to see the decrypted text with no knowledge of the algorithm.

This might take a few attempts. If the target stops responding, you'll need to reset it manually. You may also want to set up a module to iterate over some glitch parameters (ext_offset should be the most important one), as in Tutorial A2 Introduction to Glitch Attacks (including Glitch Explorer).

Ideas

There's a lot more that can be done with this type of attack...

Safer Assembly Code

You may have been surprised to see that the assembly code uses a brne instruction to check if the loop is finished - after all, we used a less-than comparison in our C source code! Try changing this instruction so that the loop uses a stricter exit condition. Here's how you might do this:

Find a copy of the AVR assembler documentation and find a better instruction to use. You should be able to drop in the brlt instruction without much hassle. Figure out the new op-code for this instruction.
Open the bootloader.hex file and find the instruction you want to change. Swap in your new op-code. Note that each line of the hex file has a checksum at the end, so you'll need to calculate an updated checksum.
Upload your new bootloader onto the target and retry the attack. Does it still work? You might be able to see one extra byte from the ASCII buffer, but it will be very difficult to get to the data buffer. Can you change the glitch settings to complete the attack?
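The opcode and checksum arithmetic in these steps can be sketched in Python. This assumes the standard AVR encodings for brne (1111 01kk kkkk k001) and brlt (1111 00kk kkkk k100), where k is a signed 7-bit word offset from the next instruction, and the usual Intel HEX rule that the checksum is the two's complement of the sum of all other bytes in the record. The function names are made up, and the addresses are the ones from the XMEGA listing above; yours may differ.

```python
BRANCH_TEMPLATES = {"brne": 0xF401, "brlt": 0xF004}  # opcodes with k = 0


def encode_avr_branch(mnemonic, from_address, to_address):
    """Return the two opcode bytes as they appear in flash (little-endian)."""
    k = (to_address - from_address) // 2 - 1  # words, relative to PC + 1
    if not -64 <= k <= 63:
        raise ValueError("branch target out of range")
    word = BRANCH_TEMPLATES[mnemonic] | ((k & 0x7F) << 3)
    return word.to_bytes(2, "little")


def ihex_checksum(record_bytes: bytes) -> int:
    """Checksum over length, address, type and data bytes of one record."""
    return (-sum(record_bytes)) & 0xFF


old = encode_avr_branch("brne", 0x382, 0x376)
new = encode_avr_branch("brlt", 0x382, 0x376)
print(old.hex())  # c9f7, matching the listing
print(new.hex())  # ccf3

# Well-known example record :0300300002337A1E has checksum 0x1E.
print(hex(ihex_checksum(bytes.fromhex("0300300002337A"))))  # 0x1e
```

Reproducing the c9 f7 bytes from the listing is a useful sanity check before patching anything. Note that brlt is a signed comparison, so check that it behaves as intended for the addresses involved.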

Volatile Variables

The original assembly code uses the brne instruction because GCC is an optimizing compiler. The compiler doesn't directly translate the C source code into assembly instructions. Instead, it tries to determine if any of the code can be modified to make it faster or more compact. For instance, consider the loop

for(int i = 0; i < 10; i++)
{
    if(i < 20)
        printf("%s", "Less");
    else
        printf("%s", "Greater");
}

If you take a careful look at this code, you'll notice that the following loop will produce the same output:

for(int i = 0; i < 10; i++)
{
    printf("%s", "Less");
}

However, this second loop is smaller (less code) and faster (no conditional jumps). This is the kind of optimization a compiler can make.

There are several ways we can stop the compiler from making some of these assumptions. One of these methods uses volatile variables, which look like

volatile int i;

A volatile variable is one that could change at any time. There could be many reasons why the value might change on us:

Another thread might have access to the same memory location
Another part of the computer might be able to change the variable's value (example: direct memory access)
The variable might not actually be stored anywhere - it could be a read-only register in an embedded system

In any case, the volatile keyword tells the compiler to make no assumptions about this variable's value: every read and write in the source code must actually be performed.

Try changing the bootloader's source code to use a volatile variable inside the loop. What happens to the disassembly? Is the loop body longer? Connect to the target board and capture a power trace. Does it look different? You'll have to find a new Ext Trigger Offset for the glitch module. Can you still perform the attack? Is it feasible to use this fix to avoid glitching attacks?

Appendix: Setup Script

The following script is used to set up the ChipWhisperer-Lite with all of the necessary settings:

# GUI compatibility
try:
    scope = self.scope
    target = self.target
except NameError:
    pass

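# Glitch module: clocked from CLKGEN, one glitch per arm on the trigger
scope.glitch.clk_src = 'clkgen'
scope.glitch.ext_offset = 68      # clock cycles between trigger and glitch
scope.glitch.width = 3.0          # glitch width, in % of a clock period
scope.glitch.offset = -5.0        # glitch position in the period, in %
scope.glitch.trigger_src = "ext_single"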

scope.gain.gain = 45
scope.adc.samples = 500
scope.adc.offset = 0
scope.adc.basic_mode = "rising_edge"
scope.clock.clkgen_freq = 7370000
scope.clock.adc_src = "clkgen_x4"
scope.trigger.triggers = "tio4"
scope.io.tio1 = "serial_rx"
scope.io.tio2 = "serial_tx"
scope.io.hs2 = "glitch"

target.go_cmd = "p516261276720736265747267206762206f686c207a76797821\\n"
target.key_cmd = ""
target.output_cmd = ""

Appendix: Arm Setup Script

# GUI compatibility
try:
    scope = self.scope
    target = self.target
except NameError:
    pass

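# Glitch module: clocked from CLKGEN, one glitch per arm on the trigger
scope.glitch.clk_src = 'clkgen'
scope.glitch.ext_offset = 16993   # clock cycles between trigger and glitch
scope.glitch.width = -10          # glitch width, in % of a clock period
scope.glitch.offset = -40         # glitch position in the period, in %
scope.glitch.repeat = 2           # glitch this many consecutive clock cycles
scope.glitch.trigger_src = "ext_single"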

scope.gain.gain = 45
scope.adc.samples = 500
scope.adc.offset = 0
scope.adc.basic_mode = "rising_edge"
scope.clock.clkgen_freq = 7370000
scope.clock.adc_src = "clkgen_x4"
scope.trigger.triggers = "tio4"
scope.io.tio1 = "serial_rx"
scope.io.tio2 = "serial_tx"
scope.io.hs2 = "glitch"

target.go_cmd = "p516261276720736265747267206762206f686c207a76797821\\n"
target.key_cmd = ""
target.output_cmd = ""