As of August 2020 the site you are on (wiki.newae.com) is deprecated, and content is now at rtfm.newae.com.

V3:Tutorial A7 Glitch Buffer Attacks

From ChipWhisperer Wiki
Revision as of 19:05, 5 November 2017 by Coflynn


This tutorial discusses a specific type of glitch attack. It shows how a simple printing loop can be abused, causing a target to print some otherwise private information. This attack will be used to recover a plaintext without any knowledge of the encryption scheme being used.

Background

This section introduces the attack concept by showing some real world examples of vulnerable firmware. Then, it describes the victim firmware that will be used in this tutorial.

Real Firmware

Typically, one of the slowest parts of an embedded system is its communication lines. It's pretty common to see a processor running in the MHz range with a serial connection running at only 9600 baud. To make these two different speeds work together, embedded firmware usually fills up a buffer with data and lets a serial driver print on its own time. This setup means we can expect to see code like

for(int i = 0; i < number_of_bytes_to_print; i++)
{
    print_one_byte_to_serial(buffer[i]);
}

This is a pretty vulnerable piece of C. Imagine that we could sneak into the source code and change it to

for(int i = 0; i < really_big_number; i++)
{
    print_one_byte_to_serial(buffer[i]);
}

C performs no bounds checking, so the compiler doesn't care that buffer[] has a limited size - this loop will happily print every byte it comes across, which could include other variables, registers, and, depending on the memory map, even program code. Although we probably don't have a good way of changing the source code on the fly, we do have glitches: a well-timed clock or power glitch could let us skip the i < number_of_bytes_to_print check, which would have the same result.
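
The over-read can be sketched in a few lines of Python. This is only a simulation: Python checks array bounds itself, so a flat bytes object stands in for the target's RAM. The layout, the print_loop name, and the SECRET value are all made up for illustration.

```python
# Hypothetical flat memory: an 8-byte buffer followed directly by a secret.
BUFFER_LEN = 8
memory = b"r0\n\x00\x00\x00\x00\x00" + b"SECRET"


def print_loop(mem: bytes, limit: int) -> bytes:
    """Mimic the C loop: emit mem[0..limit-1] with no regard for BUFFER_LEN."""
    out = bytearray()
    for i in range(limit):
        out.append(mem[i])  # stands in for print_one_byte_to_serial(buffer[i])
    return bytes(out)


normal = print_loop(memory, 3)            # the intended number_of_bytes_to_print
leaked = print_loop(memory, len(memory))  # a "really_big_number" that runs off the buffer

print(normal)
print(leaked)
```

The second call returns the buffer contents followed by the secret, which is exactly the behaviour a successful glitch would provoke on the real target.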

How could this be applied? Imagine that we have an encrypted firmware image that we're going to transmit to a bootloader. A typical communication process might look like:

  1. We send the encrypted image ciphertexts over a serial connection
  2. The bootloader decrypts the ciphertexts and stores the result somewhere in memory
  3. The bootloader sends back a response over the serial port

We have a pretty straightforward attack for this type of bootloader. During the last step, we'll apply a glitch at precisely the right time, causing the bootloader to print all kinds of things to the serial connection. With some luck, we'll be able to find the decrypted plaintext somewhere in this memory dump.

Bootloader Setup

For this tutorial, a very simple bootloader using the SimpleSerial protocol has been set up. The source for this bootloader can be found in chipwhisperer/hardware/victims/firmware/bootloader-glitch. The following commands are used:

  • pABCD\n: Send an encrypted ciphertext to the bootloader. For example, this message is made up of the two bytes AB and CD.
  • r0\n: The reply from the bootloader. Acknowledges that a message was received. No other responses are used.
  • x: Clear the bootloader's received buffer.
  • k: See x.

The bootloader uses triple-ROT-13 encryption to encrypt/decrypt the messages. To help you send messages to the target, the script private/encrypt.py prints the SimpleSerial command for a given fixed string. For example, the ciphertext for the string Don't forget to buy milk! is

p516261276720736265747267206762206f686c207a76797821\n
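
The contents of private/encrypt.py are not shown here, so the following is only a sketch of how such a command could be built, not the actual script. It relies on the fact that applying ROT13 three times gives the same result as applying it once. The make_command name is hypothetical.

```python
import codecs


def make_command(plaintext: str) -> str:
    """Build a SimpleSerial 'p' command: ROT13 the text, then hex-encode it."""
    ciphertext = codecs.encode(plaintext, "rot_13")  # letters rotate, other bytes pass through
    return "p" + ciphertext.encode("ascii").hex() + "\n"


command = make_command("Don't forget to buy milk!")
print(repr(command))
# prints 'p516261276720736265747267206762206f686c207a76797821\n'
```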

This folder also contains a Makefile to create a hex file for use with the ChipWhisperer hardware. The build process is the same as the previous tutorials: run make from the command line and make sure that everything built properly. If all goes well, the Makefile should print something like

----------------
Device: atxmega128d3

Program:    1706 bytes (1.2% Full)
(.text + .data + .bootloader)

Data:        248 bytes (3.0% Full)
(.data + .bss + .noinit)


Built for platform CW-Lite XMEGA

-------- end --------

The Attack Plan

Since we have access to the source code, let's take our time and understand how our attack is going to work before we dive in.

The Sensitive Code

Inside bootloader.c, there are two buffers that are used to store most of the important data. The source code shows:

#define DATA_BUFLEN 40
#define ASCII_BUFLEN (2 * DATA_BUFLEN)

uint8_t ascii_buffer[ASCII_BUFLEN];
uint8_t data_buffer[DATA_BUFLEN];

This tells us that there will be two arrays stored somewhere in the target's memory. The AVR-GCC compiler doesn't usually try too hard to move these around, so we can expect to find them back-to-back in memory; that is, if we can read past the end of the ASCII buffer, we'll probably find the data buffer.
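
As a quick check on the arithmetic, the assumed layout can be modelled with Python's ctypes module. The AssumedLayout structure is hypothetical: it encodes the guess that the two arrays are placed back-to-back in declaration order, which the real linker output may or may not match.

```python
import ctypes

DATA_BUFLEN = 40
ASCII_BUFLEN = 2 * DATA_BUFLEN


class AssumedLayout(ctypes.Structure):
    """Hypothetical back-to-back placement; the real memory map may differ."""
    _fields_ = [
        ("ascii_buffer", ctypes.c_uint8 * ASCII_BUFLEN),
        ("data_buffer", ctypes.c_uint8 * DATA_BUFLEN),
    ]


# Byte offset of data_buffer from the start of ascii_buffer.
data_offset = AssumedLayout.data_buffer.offset
print(data_offset)                   # prints 80
print(ctypes.sizeof(AssumedLayout))  # prints 120
```

Under this assumption, reading 80 bytes past the start of the ASCII buffer lands on the first byte of the data buffer.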

Next, the code used to print a response to the serial port is

if(state == RESPOND)
{
	// Send the ascii buffer back 
	trigger_high();
	
	int i;
	for(i = 0; i < ascii_idx; i++)
	{
		putch(ascii_buffer[i]);
	}
	trigger_low();
	state = IDLE;
}

This looks very similar to the example code given in the previous section, so it should be vulnerable to a glitching attack. The goal is to cause the loop to continue past its regular limit: if the two buffers are adjacent in memory, data_buffer[0] sits at the same address as ascii_buffer[80], so a successful glitch should dump the data buffer for us.

Disassembly

As a final step, let's check the assembly code to see exactly what we're trying to glitch through. Run the command

avr-objdump -m avr -D bootloader.hex > disassembly.txt

and open disassembly.txt. If you know what to look for, you should find a snippet that looks something like:

 376:	89 91       	ld	r24, Y+
 378:	0e 94 06 02 	call	0x40c	;  0x40c
 37c:	f0 e2       	ldi	r31, 0x20	; 32
 37e:	cf 37       	cpi	r28, 0x7F	; 127
 380:	df 07       	cpc	r29, r31
 382:	c9 f7       	brne	.-14     	;  0x376

This is our printing loop in assembly. It has the following steps in it:

  • Look at the address Y and put the contents into r24. Increase the address stored in Y. (This is the i++ in the loop.)
  • Call the function in location 0x40c. Presumably, this is the location of the putch() function.
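  • Compare the Y pointer (the register pair r29:r28) against the address 0x207F: r28 is compared with 0x7F, and r29 with the 0x20 that was just loaded into r31. Unless they're equal, go back to the top of the loop.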

There's one quirk to notice in this code. In the C source, the for loop checks whether i < ascii_idx. However, in the assembly code, the loop only stops when the pointer is exactly equal to its end value - effectively an i != ascii_idx check! This is even easier to glitch - as long as we can get past the brne instruction once at the end of the buffer, the pointer will never match the end value again, and we'll get to the data buffer.
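
The difference between the two exit checks can be illustrated with a toy model. This sketch assumes that a glitch simply forces the loop to continue for one iteration, whereas real glitches may corrupt or skip instructions in other ways. The run_loop function and its parameters are hypothetical, and max_bytes only exists to stop the simulation.

```python
from typing import Optional


def run_loop(end: int, exit_check: str, glitch_on_iteration: Optional[int],
             max_bytes: int = 200) -> int:
    """Return how many bytes the simulated print loop emits before exiting."""
    pointer = 0
    printed = 0
    while printed < max_bytes:
        pointer += 1   # ld r24, Y+ : read a byte and advance the pointer
        printed += 1   # call putch
        if exit_check == "ne":
            keep_going = pointer != end  # brne: loop unless exactly at the end
        else:
            keep_going = pointer < end   # a stricter less-than style check
        if printed == glitch_on_iteration:
            keep_going = True            # assumed glitch effect: exit check fails once
        if not keep_going:
            break
    return printed


print(run_loop(4, "ne", None))  # prints 4: normal operation
print(run_loop(4, "ne", 4))     # prints 200: one glitch and the loop never stops
print(run_loop(4, "lt", 4))     # prints 5: one glitch buys a single extra byte
```

With the not-equal check a single glitch at the last iteration is enough, while the less-than check would need a fresh glitch on every iteration past the end.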

Attack Script & Results

To speed up the tutorial, the script in the Appendix: Setup Script section will open the ChipWhisperer Capture software and fill in all of the appropriate settings. Copy this code into a Python script and run it. Then, open the serial terminal and connect to the target, using the ASCII with Hex display mode. If everything is set up correctly, the Capture 1 button should cause the text r0 to appear in the terminal. This is the bootloader's response to a block of ciphertext.

Once this is set up, connect the glitch module's output to the target's clock. Do this by changing the Target HS IO-Out to Glitch Module. Try to Capture 1 again and watch the serial terminal. If you're lucky, a large amount of text will appear in this window:

r0
261276720736265747267206762206f686c207a767978210000000000000000000000000000000000000000000000
00000000000000Don't forget to buy milk!000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
...
<many more lines omitted> 

In the middle of this output, the plaintext is clearly visible! The data buffer has successfully been printed to the serial port, allowing us to see the decrypted text with no knowledge of the algorithm.
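
When the dump is long, it can help to pick out readable text automatically. The sketch below assumes the raw serial bytes have already been saved into a bytes object. The printable_runs helper and the sample dump are hypothetical, and the sample is much shorter and cleaner than real output.

```python
import re


def printable_runs(dump: bytes, min_length: int = 8) -> list:
    """Return runs of printable ASCII that are at least min_length bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [match.decode("ascii") for match in re.findall(pattern, dump)]


# Hypothetical dump: the normal reply, zero padding, then the leaked data buffer.
sample_dump = b"r0\n" + bytes(20) + b"Don't forget to buy milk!" + bytes(30)
print(printable_runs(sample_dump))
# prints ["Don't forget to buy milk!"]
```

On a real dump the echoed hex digits of the ciphertext are printable too, so expect them to show up alongside the plaintext.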

If you can't get this to work, remember that glitching is a very sensitive operation - one glitch timing will probably not work for every board on every day. Use the built-in Glitch Explorer to sweep different Glitch Widths, Glitch Offsets, and Ext Trigger Offsets - take a read through Tutorial A2 Introduction to Glitch Attacks (including Glitch Explorer) if you need a refresher.
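
If you prefer to plan a sweep by hand, the search space can be enumerated as below. The ranges are arbitrary guesses centred on the values used in the setup script (width 3.0, offset -5.0, Ext Trigger Offset 68), and the glitch_candidates helper is hypothetical. It only lists combinations and does not talk to the hardware or the Glitch Explorer.

```python
from itertools import product

# Hypothetical search ranges around the setup script's starting values.
WIDTHS = [2.0, 3.0, 4.0]            # Glitch Width (as % of period)
OFFSETS = [-10.0, -5.0, 0.0]        # Glitch Offset (as % of period)
EXT_OFFSETS = range(60, 76, 2)      # Ext Trigger Offset: 60, 62, ..., 74


def glitch_candidates(widths, offsets, ext_offsets):
    """Yield every (width, offset, ext trigger offset) combination to try."""
    for width, offset, ext in product(widths, offsets, ext_offsets):
        yield {"width": width, "offset": offset, "ext_offset": ext}


candidates = list(glitch_candidates(WIDTHS, OFFSETS, EXT_OFFSETS))
print(len(candidates))  # prints 72
print(candidates[0])
```

Each candidate would then be entered into the Glitch Module settings and tested with one or more captures.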

Ideas

There's a lot more that can be done with this type of attack...

Safer Assembly Code

You may have been surprised to see that the assembly code uses a brne instruction to check if the loop is finished - after all, we used a less-than comparison in our C source code! Try changing this line to use a more prohibitive loop. Here's how you might do this:

  1. Find a copy of the AVR assembler documentation and find a better instruction to use. You should be able to drop in the brlt instruction without much hassle. Figure out the new op-code for this instruction.
  2. Open the bootloader.hex file and find the instruction you want to change. Swap in your new op-code. Note that each line of the hex file has a checksum at the end, so you'll need to calculate an updated checksum.
  3. Upload your new bootloader onto the target and retry the attack. Does it still work? You might be able to see one extra byte from the ASCII buffer, but it will be very difficult to get to the data buffer. Can you change the glitch settings to complete the attack?
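
For step 2, the checksum of an Intel HEX record is the two's complement of the sum of all the other bytes in the record, keeping only the low 8 bits. The sketch below patches one data byte and recomputes the checksum. The patch_record name is hypothetical and the sample record is a generic example, not a line from bootloader.hex.

```python
def ihex_checksum(record_without_checksum: str) -> int:
    """Checksum for a record given as ':' + hex digits, checksum byte omitted."""
    body = bytes.fromhex(record_without_checksum.lstrip(":"))
    return (-sum(body)) & 0xFF


def patch_record(record: str, data_index: int, new_byte: int) -> str:
    """Replace one data byte in a complete record and fix up its checksum."""
    raw = bytearray(bytes.fromhex(record.strip().lstrip(":")))
    raw[4 + data_index] = new_byte      # skip length, 2 address bytes, record type
    raw[-1] = (-sum(raw[:-1])) & 0xFF   # recompute the trailing checksum byte
    return ":" + raw.hex().upper()


sample = ":10010000214601360121470136007EFE09D2190140"  # generic example record
print(patch_record(sample, 0, 0x22))
# prints :10010000224601360121470136007EFE09D219013F
```

Finding which record and which byte offset hold the branch instruction is still up to you; the disassembly addresses are a good starting point.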

Volatile Variables

The original assembly code used the brne instruction because GCC is an optimizing compiler. The compiler doesn't directly translate the C source code into assembly instructions. Instead, it tries to determine if any of the code can be modified to make it faster or more compact. For instance, consider the loop

for(int i = 0; i < 10; i++)
{
    if(i < 20)
        printf("%s", "Less");
    else
        printf("%s", "Greater");
}

If you take a careful look at this code, you'll notice that the following loop will produce the same output:

for(int i = 0; i < 10; i++)
{
    printf("%s", "Less");
}

However, this second loop is smaller (less code) and faster (no conditional jumps). This is the kind of optimization a compiler can make.

There are several ways we can stop the compiler from making some of these assumptions. One of these methods uses volatile variables, which look like

volatile int i;

A volatile variable is one that could change at any time. There could be many reasons why the value might change on us:

  • Another thread might have access to the same memory location
  • Another part of the computer might be able to change the variable's value (example: direct memory access)
  • The variable might not actually be stored anywhere - it could be a read-only register in an embedded system

In any case, the volatile keyword tells the compiler to make no assumptions about this variable's value: every read and write in the source code must actually be performed.

Try changing the bootloader's source code to use a volatile variable inside the loop. What happens to the disassembly? Is the loop body longer? Connect to the target board and capture a power trace. Does it look different? You'll have to find a new Ext Trigger Offset for the glitch module. Can you still perform the attack? Is it feasible to use this fix to avoid glitching attacks?

Appendix: Setup Script

The following script is used to set up the ChipWhisperer-Lite with all of the necessary settings:

#!/usr/bin/python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2013-2016, NewAE Technology Inc
# All rights reserved.
#
# Authors: Colin O'Flynn, Greg d'Eon
#
# Find this and more at newae.com - this file is part of the chipwhisperer
# project, http://www.assembla.com/spaces/chipwhisperer
#
#    This file is part of chipwhisperer.
#
#    chipwhisperer is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.
#
#    chipwhisperer is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU Lesser General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with chipwhisperer.  If not, see <http://www.gnu.org/licenses/>.
#=================================================

import sys
import chipwhisperer.capture.ui.CWCaptureGUI as cwc
from chipwhisperer.common.api.CWCoreAPI import CWCoreAPI
from chipwhisperer.common.scripts.base import UserScriptBase
from chipwhisperer.common.utils.parameter import Parameter

# Check for PySide
try:
    from PySide.QtCore import *
    from PySide.QtGui import *
except ImportError:
    print("ERROR: PySide is required for this program")
    sys.exit()

class UserScript(UserScriptBase):
    def __init__(self, api):
        super(UserScript, self).__init__(api)

    def run(self):
        #User commands here
        print("***** Starting User Script *****")
    
        # Set up board and target
        self.api.setParameter(['Generic Settings', 'Scope Module', 'ChipWhisperer/OpenADC'])
        self.api.setParameter(['Generic Settings', 'Trace Format', 'ChipWhisperer/Native'])
        self.api.setParameter(['Generic Settings', 'Target Module', 'Simple Serial'])
        self.api.connect()

        # Fill in our other settings
        lstexample = [['OpenADC', 'Gain Setting', 'Mode', 'high'],
                      ['OpenADC', 'Gain Setting', 'Setting', 30],
                      ['OpenADC', 'Trigger Setup', 'Mode', 'rising edge'],
                      ['OpenADC', 'Trigger Setup', 'Total Samples', 500],
                      ['OpenADC', 'Trigger Setup', 'Offset', 0],
                      ['OpenADC', 'Clock Setup', 'CLKGEN Settings', 'Divide', 26],
                      ['OpenADC', 'Clock Setup', 'CLKGEN Settings', 'Multiply', 2],
                      ['OpenADC', 'Clock Setup', 'ADC Clock', 'Source', 'CLKGEN x4 via DCM'],
                      ['OpenADC', 'Clock Setup', 'ADC Clock', 'Reset ADC DCM', None],
                      ['CW Extra Settings', 'Target HS IO-Out', 'CLKGEN'],
                      ['CW Extra Settings', 'Target IOn Pins', 'Target IO2', 'Serial TXD'],
                      ['CW Extra Settings', 'Target IOn Pins', 'Target IO1', 'Serial RXD'],
                      ['Glitch Module', 'Clock Source', 'CLKGEN'],
                      ['Glitch Module', 'Glitch Width (as % of period)', 3.0],
                      ['Glitch Module', 'Glitch Offset (as % of period)', -5.0],
                      ['Glitch Module', 'Glitch Trigger', 'Ext Trigger:Single-Shot'],
                      ['Glitch Module', 'Ext Trigger Offset', 68],
                      
                      ['Simple Serial', 'Go Command', u'p516261276720736265747267206762206f686c207a76797821\\n'],
                      ['Simple Serial', 'Output Format', u''],
                      ['Simple Serial', 'Load Key Command', u''],
                      ]

        # NOTE: For IV: offset = 70000
        #Download all hardware setup parameters
        for cmd in lstexample:
            self.api.setParameter(cmd)

        # Run a single capture
        self.api.capture1()

        print("***** Ending User Script *****")


if __name__ == '__main__':
    # Run the program
    app = cwc.makeApplication()
    Parameter.usePyQtGraph = True 
    api = CWCoreAPI()             
    gui = cwc.CWCaptureGUI(api)                
    gui.show()                                 
    
    # Run our program and let the GUI take over
    api.runScriptClass(UserScript)             
    sys.exit(app.exec_())