As of August 2020 the site you are on (wiki.newae.com) is deprecated, and content is now at rtfm.newae.com.

Changes

Jump to: navigation, search

Correlation Power Analysis

1,199 bytes added, 13:29, 1 May 2018
Changed header levels
This page discusses some of the basics of these steps, describing what models are typically used in CPA attacks and how the Pearson correlation coefficient is calculated.
== Modeling Power Consumption ==
Electronic computers (microcontrollers, FPGAs, etc) have two components to their power consumption. First, static power consumption is the power required to keep the device running. This static power depends on things like the number of transistors inside the device. Secondly, and more importantly, dynamic power consumption depends on the data moving around inside the device. Every time a bit is changed from a 0 to a 1 (or vice versa), some current is required to (dis)charge the data lines. The dynamic power is the part that we're interested in - it can tell us what's happening inside.
</pre>
= Calculating the = Pearson's Correlation Coefficient ==Once we have a way to model our power consumption, we need a way to compare our power estimate to our measured traces. A helpful tool for finding this relationship is through Pearson's correlation coefficient, which is
<math> \rho_{X,Y}= Attacking the Subkeys \frac{\text{cov}(X, Y)}{\sigma_X \sigma_Y}=\frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sqrt{E[(X - \mu_X)^2] E[(Y - \mu_Y)^2]}}</math>
= ExampleThis correlation coefficient will always be in the range [-1, 1]. It describes how closely the random variables <math>X</math> and <math>Y</math> are related: AES* If <math>Y</math> always increases when <math>X</math> increases, it will be 1;* If <math>Y</math> always decreases when <math>X</math> increases, it will be -128 =1;* If <math>Y</math> is totally independent of <math>X</math>, it will be 0.
Here we will assume These equations are referred to as the attack has a power trace <math>t_{d''normalized cross-correlation''. There are typically used to pick out patterns in noisy signals. For example,j}</math>in digital imaging, correlation can be used to find where <math>j = 1,2,\cdots,T</math> an object is the time index in the trace, and <math>d = 1,2,\cdots,D</math> is the trace numbera room. Thus the attacker makes <math>D</math> measurementsIn our attack, each one <math>T</math> points long. If the attacker knew we''exactly'' where ll be looking for a cryptographic operation occurred, they would need to only measure our model (a single point such that <math>T=1</math>. For each trace <math>d</math>, the attacker also knows the plaintext or ciphertext corresponding to that pre-calculated pattern) in measured power trace, defined as <math>p_d</math>traces (noisy signals).
Assume also the attacker has a model of how the == Attacking with Correlation ==After taking our measurements, we'll have <math>D</math> power consumption traces <math>t</math>, and each of the device depends on some intermediate value. For example the attacker could assume the power consumption of a microcontroller was dependent on the hamming weight of the intermediate value. We these traces will define have <math>h_T</math> data points. Using subscript notation, <math>t_{d,ij} = l( w( p_d, i ))</math>, where will refer to point <math>l(x)j</math> is the leakage model for a given intermediate value, and in trace <math>w(p, i)d</math> generates an intermediate value given the input plaintext and the guess number (<math>i = 1,2,\cdotsle d \le D,I0 \le j < T</math>).
This intermediate value will be selected to depend on We'll also estimate the input plaintext and a small portion of the secret keypower consumption in each trace using our model. For example with AESWe'll say that there are <math>I</math> different subkeys that we want to try. Then, each byte of <math>h_{d, i}</math> will refer to our power estimate in trace <math>d</math>, assuming that the plaintext subkey is XOR'd with each byte <math>i</math> (subkey<math>1 \le d \le D, 0 \le i < I</math>) of the secret key. In this example we would have:
<blockquote>With this data, we can see how well our model and measurements match for each guess <math>l(x) = HammingWeight(x)i</math>and time <math>w(p, i) = p \oplus ij</math></blockquote>This implies that the input plaintext . We'll do this by finding how <math>pt</math> is being attacked a single byte at a time, which means we are attacking a single byte of the AES key at a time. While we still need to enumerate all possibilities for this subkey, we now only have and <math>16 \times 2^8h</math> instead of correlate over the <math>2^{128}D</math> possibilities for AES-128traces.One way of calculating this is:
We will be next using the correlation coefficient to look for a linear relationship between the predicted power consumption <math>lr_{i,j}= \frac{{\sum\nolimits_{d = 1}^D {\left[ {\left(x{{h_{d,i}} - \overline {{h_i}} } \right)</math> and the measured power consumption <math>\left( {{t_{d,j}</math>. For this reason it is desirable to have a non} -linear relationship between <math>w\overline {{t_j}} } \right)} \right]} }}{{\sqrt {\sum\nolimits_{d = 1}^D {{{\left(p{{h_{d, i}} - \overline {{h_i}} } \right)</math> and either <math>p</math> or <math>i</math>}^2}} \sum\nolimits_{d = 1}^D {{{\left( {{t_{d, as we will otherwise see a linear relationship for all values of <math>ij}} - \overline {{t_j}} } \right)}^2}} } }}</math>. In this case we take advantage of the non-linear substitution boxes (S-Boxes) in the algorithm, which are simply lookup tables which have been selected to have minimal possible correlation between the input and output. The emphasis on minimum possible correlation between input and output is a requirement to avoid certain linear cryptographic attacks.
Finally, There is an alternative form of the correlation equation that we can calculate the correlation coefficient use for each point <math>j</math> over ''online'' calculations - it allows us to add one trace at a time without re-summing all traces <math>D</math>, for each of the possible subkey values <math>I</math>, as in the followingpast data. This form is:
<blockquote><math>{r_{i,j}} = \frac{{\sumD \nolimits_sum_{d = 1}^D {\left[ {\left( {{h_{d,i}t_{d,j} - \overline sum_{d=1}^D h_{h_i}} d,i} \right)\left( {sum_{d=1}^D t_{d,j}} - \overline {{t_j}} } \right)} \right]} }}{{\sqrt {\sumBig(\big(\nolimits_sum_{d = 1}^D {{{\left( {{h_{d,i}} \big)^2 - D\overline sum_{d=1}^D h_{h_i}} } \right)d,i}^2}} \sumBig)\Big(\big(\nolimits_sum_{d = 1}^D {{{\left( {{t_{d,j}} \big)^2 - D\overline sum_{d=1}^D t_{t_j}} d,j} ^2\rightBig)}^2}} } }}</math></blockquote>This is simply Pearson's correlation coefficient, given below, where <math>X = p</math>, and <math>Y = t</math>:
Note that these two sums are equivalent. == Picking a Subkey ==The last step is to use the values of <blockquotemath>r_{i,j}</math> to decide which subkey matches our traces most closely. There are two steps to this:* For each subkey <math>i</math>, find the highest value of <math>|r_{\rho _{Xi,Y}j} |</math>. This will discard the time information - we want to know how good our guess was, but we don't care where our guess matched the trace. * Looking at the maximum values for each subkey, find the highest value of <math>|r_i|</math>. The location <math>i</math> of this maximum is our best guess: it correlated more closely with the traces than any other guess.Note that we're only working with absolute values here because we don't care about the sign of the relationship. All we need to know is that a linear correlation exists. = \frac{{{\mathop{\rm cov}} \left= Example: AES-128 ==As an example, consider AES-128 encryption. The pseudo-code for this algorithm is:<pre>// AES-128 Cipher// in: 128 bits ( {Xplaintext)// out: 128 bits (ciphertext)// w: 44 words,Y} \right32 bits each (expanded key)}}{{{\sigma _X}{\sigma _Y}}} state = \frac{{E\left[ {\leftin AddRoundKey( {X state, w[0, Nb- {\mu _X}} \right1])\left  for round = 1 step 1 to Nr–1 SubBytes( {Y state) // <- {\mu _Y}} \right- Attack this point in round 1! ShiftRows(state)} \right MixColumns(state) AddRoundKey(state, w[round*Nb, (round+1)*Nb-1]}}{{\sqrt {E\left)end for SubBytes(state)ShiftRows(state)AddRoundKey(state, w[ {{{\leftNr*Nb, ( {X Nr+1)*Nb- {\mu _X}} \right1])} out = state</pre>This might look pretty complex, but we only need to worry about one part of the algorithm. After the first <code>SubBytes()</code> step, the state is effectively<pre>// State after AddRoundKey() and SubBytes()state = sbox[in ^2}} \rightkey]</pre>where <code>sbox</code> is a lookup table used in AES. This is an effective point to attack. If we know that a byte of the plaintext is <math>p_d</math>, then our modeled power consumption will be <math>h_{d, i} = \sqrt text{E\left[ {{{\leftHamming Weight}( \text{Y - {\mu _Y}sbox} [p_d \rightoplus i])}^2}} \right]} }}</math></blockquote>The problem Then, we can proceed with the attack as described above. Since we are attacking one byte of finding the key at a known signal in a noisy measurement exists in many other fields beyond side-channel analysistime, we will have to try <math>I = 2^8</math> values for each subkey. These two equations Then, there are referred to as 16 bytes in the ''normalized cross-correlation''key, and frequently used in digital imaging for matching a known `template' to so our attack will take <math>16 \times 2^8 = 2^{12}</math> time. This is an imageenormous improvement over trying every possible key, e.g. finding the location of some specific item in a photo of a roomwhich would take <math>2^{128}</math> time[[Category:Theory]] [[Category:TocTheory]]

Navigation menu