Changes

Correlation Power Analysis

2,079 bytes removed, 13:29, 1 May 2018

Changed header levels

This page discusses some of the basics of these steps, describing what models are typically used in CPA attacks and how the Pearson correlation coefficient is calculated.

== Modeling Power Consumption ==

Electronic computers (microcontrollers, FPGAs, etc) have two components to their power consumption. First, static power consumption is the power required to keep the device running. This static power depends on things like the number of transistors inside the device. Secondly, and more importantly, dynamic power consumption depends on the data moving around inside the device. Every time a bit is changed from a 0 to a 1 (or vice versa), some current is required to (dis)charge the data lines. The dynamic power is the part that we're interested in - it can tell us what's happening inside.

</pre>

== Pearson's Correlation Coefficient ==

Once we have a way to model our power consumption, we need a way to compare our power estimate to our measured traces. A helpful tool for finding this relationship is through Pearson's correlation coefficient, which is

* If <math>Y</math> is totally independent of <math>X</math>, it will be 0.

~~= Attacking with Correlation =After taking~~ These equations are referred to as the ''normalized cross-correlation''. There are typically used to pick out patterns in noisy signals. For example, in digital imaging, correlation can be used to find where an object is in a room. In our ~~measurements~~attack, we'll ~~have <math>D</math>~~ be looking for a our model (a pre-calculated pattern) in measured power traces ~~<math>t</math>, and each of these traces will have <math>T</math> data points. Using subscript notation, <math>t_{d, j}</math> will refer to point <math>j</math> in trace <math>d</math>~~ (~~<math>1 \le d \le D, 1 \le j \le T</math>~~noisy signals).

== Attacking with Correlation ==After taking our measurements, we'll have <math>D</math> power traces <math>t</math>, and each of these traces will have <math>T</math> data points. Using subscript notation, <math>t_{d, j}</math> will refer to point <math>j</math> in trace <math>d</math> (<math>1 \le d \le D, 0 \le j < T</math>). We'll also estimate the power consumption in each trace using our model. We'll say that there are <math>I</math> different subkeys that we want to try. Then, <math>h_{d, i}</math> will refer to our power estimate in trace <math>d</math>, assuming that the subkey is <math>i</math> (<math>1 \le d \le D, 1 0 \le i ~~\le~~ < I</math>).

With this data, we can see how well our model and measurements match for each guess <math>i</math> and time <math>j</math>. We'll do this by finding how <math>t</math> and <math>h</math> correlate over the <math>D</math> traces. One way of calculating this is:

Note that these two sums are equivalent.

== Picking a Subkey ==

The last step is to use the values of <math>r_{i,j}</math> to decide which subkey matches our traces most closely. There are two steps to this:

* For each subkey <math>i</math>, find the highest value of <math>|r_{i,j}|</math>. This will discard the time information - we want to know how good our guess was, but we don't care where our guess matched the trace.

Note that we're only working with absolute values here because we don't care about the sign of the relationship. All we need to know is that a linear correlation exists.

== Example: AES-128 ==As an example, consider AES-128 encryption. The pseudo-code for this algorithm is:<pre>// AES-128 Cipher// in: 128 bits (plaintext)// out: 128 bits (ciphertext)// w: 44 words, 32 bits each (expanded key)state = in

~~Here we will assume the attack has a power trace <math>t_{d~~AddRoundKey(state,~~j}</math>~~w[0, ~~where <math>j =~~ Nb-1,2,\cdots,T</math> is the time index in the trace, and <math>d = 1,2,\cdots,D</math> is the trace number. Thus the attacker makes <math>D</math> measurements, each one <math>T</math> points long. If the attacker knew ''exactly'' where a cryptographic operation occurred, they would need to only measure a single point such that <math>T=1</math>. For each trace <math>d</math>, the attacker also knows the plaintext or ciphertext corresponding to that power trace, defined as <math>p_d</math>.])

Assume also the attacker has a model of how the power consumption of the device depends on some intermediate value. For example the attacker could assume the power consumption of a microcontroller was dependent on the hamming weight of the intermediate value. We will define <math>h_{d,i} for round = l1 step 1 to Nr–1 SubBytes( ~~w( p_d, i~~ state))< //~~math>, where~~ <~~math>l~~-- Attack this point in round 1! ShiftRows(xstate)~~</math> is the leakage model for a given intermediate value~~ MixColumns(state) AddRoundKey(state, ~~and <math>~~w[round*Nb, (~~p, i~~round+1)~~</math> generates an intermediate value given the input plaintext and the guess number <math>i =~~ *Nb-1~~,2,\cdots,I</math>.~~])end for

~~This intermediate value will be selected to depend on the input plaintext and a small portion of the secret key. For example with AES~~SubBytes(state)ShiftRows(state)AddRoundKey(state, w[Nr*Nb, ~~each byte of the plaintext is XOR'd with each byte~~ (~~subkey~~Nr+1)*Nb-1]) ~~of the secret key. In this example we would have:~~

out = state<~~blockquote~~/pre>This might look pretty complex, but we only need to worry about one part of the algorithm. After the first <~~math~~code>lSubBytes(~~x) = HammingWeight(x~~)</~~math~~code>step, the state is effectively<~~math~~pre>w// State after AddRoundKey(~~p, i~~) and SubBytes()state = ~~p \oplus i</math>~~sbox[in ^ key]</~~blockquote~~pre>~~This implies that the input plaintext~~ where <~~math~~code>psbox</~~math~~code> is ~~being attacked~~ a ~~single byte at a time, which means~~ lookup table used in AES. This is an effective point to attack. If we ~~are attacking~~ know that a ~~single~~ byte of the ~~AES key at a time. While we still need to enumerate all possibilities for this subkey, we now only have~~ plaintext is <math>~~16 \times 2^8~~p_d</math> ~~instead of <math>2^{128}</math> possibilities for AES-128.~~, then our modeled power consumption will be

~~We will be next using the correlation coefficient to look for a linear relationship between the predicted power consumption~~ <math>~~l(x)</math> and the measured power consumption <math>t_~~h_{d,ji} = \text{Hamming Weight}~~</math>. For this reason it is desirable to have a non-linear relationship between <math>w~~(p, \text{sbox}[p_d \oplus i])</math> and either <math>p</math> or <math>i</math>, as we will otherwise see a linear relationship for all values of <math>i</math>. In this case we take advantage of the non-linear substitution boxes (S-Boxes) in the algorithm, which are simply lookup tables which have been selected to have minimal possible correlation between the input and output. The emphasis on minimum possible correlation between input and output is a requirement to avoid certain linear cryptographic attacks.

~~Finally~~Then, we can ~~calculate~~ proceed with the ~~correlation coefficient for each point~~ attack as described above. Since we are attacking one byte of the key at a time, we will have to try <math>jI = 2^8</math> ~~over all traces~~ values for each subkey. Then, there are 16 bytes in the key, so our attack will take <math>D16 \times 2^8 = 2^{12}</math>~~, for each of the~~ time. This is an enormous improvement over trying every possible ~~subkey values~~ key, which would take <math>I2^{128}</math>~~, as in the following:~~time.

~~<blockquote><math>{r_{i,j}} = \frac{{\sum\nolimits_{d = 1}^D {\left~~[ {\left( {{h_{d,i}} - \overline {{h_i}} } \right)\left( {{t_{d,j}} - \overline {{t_j}} } \right)} \right]} }}{{\sqrt {\sum\nolimits_{d = 1}^D {{{\left( {{h_{d,i}} - \overline {{h_i}} } \right)}^2}} \sum\nolimits_{d = 1}^D {{{\left( {{t_{d,j}} - \overline {{t_j}} } \right)}^2}} } }}</math>~~</blockquote>This is simply Pearson's correlation coefficient, given below, where <math>X = p</math>, and <math>Y = t</math>~~[Category:Theory]]

~~<blockquote><math>{\rho _{X,Y}} = \frac{{{\mathop{\rm cov}} \left( {X,Y} \right)}}{{{\sigma _X}{\sigma _Y}}} = \frac{{E\left~~[ ~~{\left( {X - {\mu _X}} \right)\left( {Y - {\mu _Y}} \right)} \right]}}{{\sqrt {E\left~~[ ~~{{{\left( {X - {\mu _X}} \right)}^2}} \right~~Category:TocTheory]~~} \sqrt {E\left[ {{{\left( {Y - {\mu _Y}} \right)}^2}} \right~~]~~} }}</math></blockquote>~~The problem of finding a known signal in a noisy measurement exists in many other fields beyond side-channel analysis. These two equations are referred to as the ''normalized cross-correlation'', and frequently used in digital imaging for matching a known `template' to an image, e.g. finding the location of some specific item in a photo of a room.

← Older edit

Fheubach

Administrator

458

edits

Changes

Correlation Power Analysis

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Off-Wiki Links

Categories

Tools