Changes

Correlation Power Analysis

1,199 bytes added, 13:29, 1 May 2018

Changed header levels

This page discusses some of the basics of these steps, describing what models are typically used in CPA attacks and how the Pearson correlation coefficient is calculated.

== Modeling Power Consumption ==

Electronic computers (microcontrollers, FPGAs, etc) have two components to their power consumption. First, static power consumption is the power required to keep the device running. This static power depends on things like the number of transistors inside the device. Secondly, and more importantly, dynamic power consumption depends on the data moving around inside the device. Every time a bit is changed from a 0 to a 1 (or vice versa), some current is required to (dis)charge the data lines. The dynamic power is the part that we're interested in - it can tell us what's happening inside.

</pre>

= ~~Calculating the~~ = Pearson's Correlation Coefficient ==Once we have a way to model our power consumption, we need a way to compare our power estimate to our measured traces. A helpful tool for finding this relationship is through Pearson's correlation coefficient, which is

<math> \rho_{X,Y}= ~~Attacking the Subkeys~~ \frac{\text{cov}(X, Y)}{\sigma_X \sigma_Y}=\frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sqrt{E[(X - \mu_X)^2] E[(Y - \mu_Y)^2]}}</math>

~~= Example~~This correlation coefficient will always be in the range [-1, 1]. It describes how closely the random variables <math>X</math> and <math>Y</math> are related: ~~AES~~* If <math>Y</math> always increases when <math>X</math> increases, it will be 1;* If <math>Y</math> always decreases when <math>X</math> increases, it will be -~~128 =~~1;* If <math>Y</math> is totally independent of <math>X</math>, it will be 0.

~~Here we will assume~~ These equations are referred to as the ~~attack has a power trace <math>t_{d~~''normalized cross-correlation''. There are typically used to pick out patterns in noisy signals. For example,~~j}</math>~~in digital imaging, correlation can be used to find where ~~<math>j = 1,2,\cdots,T</math>~~ an object is ~~the time index~~ in ~~the trace, and <math>d = 1,2,\cdots,D</math> is the trace number~~a room. ~~Thus the attacker makes <math>D</math> measurements~~In our attack, ~~each one <math>T</math> points long. If the attacker knew~~ we'~~'exactly'' where~~ ll be looking for a ~~cryptographic operation occurred, they would need to only measure~~ our model (a ~~single point such that <math>T=1</math>. For each trace <math>d</math>, the attacker also knows the plaintext or ciphertext corresponding to that~~ pre-calculated pattern) in measured power ~~trace, defined as <math>p_d</math>~~traces (noisy signals).

~~Assume also the attacker has a model of how the~~ == Attacking with Correlation ==After taking our measurements, we'll have <math>D</math> power ~~consumption~~ traces <math>t</math>, and each of ~~the device depends on some intermediate value. For example the attacker could assume the power consumption of a microcontroller was dependent on the hamming weight of the intermediate value. We~~ these traces will ~~define~~ have <math>h_T</math> data points. Using subscript notation, <math>t_{d,ij} ~~= l( w( p_d, i ))~~</math>~~, where~~ will refer to point <math>~~l(x)~~j</math> ~~is the leakage model for a given intermediate value, and~~ in trace <math>~~w(p, i)~~d</math> ~~generates an intermediate value given the input plaintext and the guess number~~ (<math>~~i =~~ 1~~,2,~~\~~cdots~~le d \le D,I0 \le j < T</math>).

~~This intermediate value will be selected to depend on~~ We'll also estimate the ~~input plaintext and a small portion of the secret key~~power consumption in each trace using our model. ~~For example with AES~~We'll say that there are <math>I</math> different subkeys that we want to try. Then, ~~each byte of~~ <math>h_{d, i}</math> will refer to our power estimate in trace <math>d</math>, assuming that the ~~plaintext~~ subkey is ~~XOR'd with each byte~~ <math>i</math> (~~subkey~~<math>1 \le d \le D, 0 \le i < I</math>) ~~of the secret key~~. ~~In this example we would have:~~

~~<blockquote>~~With this data, we can see how well our model and measurements match for each guess <math>~~l(x) = HammingWeight(x)~~i</math>and time <math>~~w(p, i) = p \oplus i~~j</math>~~</blockquote>This implies that the input plaintext~~ . We'll do this by finding how <math>pt</math> ~~is being attacked a single byte at a time, which means we are attacking a single byte of the AES key at a time. While we still need to enumerate all possibilities for this subkey, we now only have~~ and <math>~~16 \times 2^8~~h</math> ~~instead of~~ correlate over the <math>~~2^{128}~~D</math> ~~possibilities for AES-128~~traces.One way of calculating this is:

~~We will be next using the correlation coefficient to look for a linear relationship between the predicted power consumption~~ <math>lr_{i,j}= \frac{{\sum\nolimits_{d = 1}^D {\left[ {\left(x{{h_{d,i}} - \overline {{h_i}} } \right)~~</math> and the measured power consumption <math>~~\left( {{t_{d,j}~~</math>. For this reason it is desirable to have a non~~} -~~linear relationship between <math>w~~\overline {{t_j}} } \right)} \right]} }}{{\sqrt {\sum\nolimits_{d = 1}^D {{{\left(p{{h_{d, i}} - \overline {{h_i}} } \right)~~</math> and either <math>p</math> or <math>i</math>~~}^2}} \sum\nolimits_{d = 1}^D {{{\left( {{t_{d, ~~as we will otherwise see a linear relationship for all values of <math>i~~j}} - \overline {{t_j}} } \right)}^2}} } }}</math>. In this case we take advantage of the non-linear substitution boxes (S-Boxes) in the algorithm, which are simply lookup tables which have been selected to have minimal possible correlation between the input and output. The emphasis on minimum possible correlation between input and output is a requirement to avoid certain linear cryptographic attacks.

~~Finally,~~ There is an alternative form of the correlation equation that we can ~~calculate the correlation coefficient~~ use for ~~each point <math>j</math> over~~ ''online'' calculations - it allows us to add one trace at a time without re-summing all ~~traces <math>D</math>, for each~~ of the ~~possible subkey values <math>I</math>, as in the following~~past data. This form is:

~~<blockquote>~~<math>{r_{i,j}} = \frac{~~{\sum~~D \~~nolimits_~~sum_{d = 1}^D ~~{\left[ {\left( {{~~h_{d,i}t_{d,j} - \~~overline~~ sum_{d=1}^D h_{~~h_i}}~~ d,i} \~~right)\left( {~~sum_{d=1}^D t_{d,j}} ~~- \overline {{t_j}} } \right)} \right]} }}{~~{\sqrt {\~~sum~~Big(\big(\~~nolimits_~~sum_{d = 1}^D ~~{{{\left( {{~~h_{d,i}} \big)^2 - D\~~overline~~ sum_{d=1}^D h_{~~h_i}} } \right)~~d,i}^2}} \~~sum~~Big)\Big(\big(\~~nolimits_~~sum_{d = 1}^D ~~{{{\left( {{~~t_{d,j}} \big)^2 - D\~~overline~~ sum_{d=1}^D t_{~~t_j}}~~ d,j} ^2\~~right~~Big)}^2}~~} } }}</math></blockquote>This is simply Pearson's correlation coefficient, given below, where <math>X = p</math>, and <math>Y = t~~</math>:

Note that these two sums are equivalent. == Picking a Subkey ==The last step is to use the values of <~~blockquote~~math>r_{i,j}</math> to decide which subkey matches our traces most closely. There are two steps to this:* For each subkey <math>i</math>, find the highest value of <math>|r_{~~\rho _{X~~i,Y}j} |</math>. This will discard the time information - we want to know how good our guess was, but we don't care where our guess matched the trace. * Looking at the maximum values for each subkey, find the highest value of <math>|r_i|</math>. The location <math>i</math> of this maximum is our best guess: it correlated more closely with the traces than any other guess.Note that we're only working with absolute values here because we don't care about the sign of the relationship. All we need to know is that a linear correlation exists. = ~~\frac{{{\mathop{\rm cov}} \left~~= Example: AES-128 ==As an example, consider AES-128 encryption. The pseudo-code for this algorithm is:<pre>// AES-128 Cipher// in: 128 bits ( {Xplaintext)// out: 128 bits (ciphertext)// w: 44 words,~~Y} \right~~32 bits each (expanded key)~~}}{{{\sigma _X}{\sigma _Y}}}~~ state = ~~\frac{{E\left[ {\left~~in AddRoundKey( {X state, w[0, Nb- ~~{\mu _X}} \right~~1])~~\left~~ for round = 1 step 1 to Nr–1 SubBytes( {Y state) // <- ~~{\mu _Y}} \right~~- Attack this point in round 1! ShiftRows(state)~~} \right~~ MixColumns(state) AddRoundKey(state, w[round*Nb, (round+1)*Nb-1]~~}}{{\sqrt {E\left~~)end for SubBytes(state)ShiftRows(state)AddRoundKey(state, w[ ~~{{{\left~~Nr*Nb, ( {X Nr+1)*Nb- ~~{\mu _X}} \right~~1])} out = state</pre>This might look pretty complex, but we only need to worry about one part of the algorithm. After the first <code>SubBytes()</code> step, the state is effectively<pre>// State after AddRoundKey() and SubBytes()state = sbox[in ^~~2}} \right~~key]</pre>where <code>sbox</code> is a lookup table used in AES. This is an effective point to attack. If we know that a byte of the plaintext is <math>p_d</math>, then our modeled power consumption will be <math>h_{d, i} = \~~sqrt~~ text{~~E\left[ {{{\left~~Hamming Weight}( \text{~~Y - {\mu _Y}~~sbox} [p_d \~~right~~oplus i])~~}^2}} \right]} }}~~</math>~~</blockquote>The problem~~ Then, we can proceed with the attack as described above. Since we are attacking one byte of ~~finding~~ the key at a ~~known signal in a noisy measurement exists in many other fields beyond side-channel analysis~~time, we will have to try <math>I = 2^8</math> values for each subkey. ~~These two equations~~ Then, there are ~~referred to as~~ 16 bytes in the ~~''normalized cross-correlation''~~key, ~~and frequently used in digital imaging for matching a known `template' to~~ so our attack will take <math>16 \times 2^8 = 2^{12}</math> time. This is an ~~image~~enormous improvement over trying every possible key, ~~e.g. finding the location of some specific item in a photo of a room~~which would take <math>2^{128}</math> time. [[Category:Theory]] [[Category:TocTheory]]

← Older edit

Fheubach

Administrator

458

edits

Changes

Correlation Power Analysis

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Off-Wiki Links

Categories

Tools