3,938 bytes added,
03:16, 29 March 2016
Here we will assume the attack has a power trace <math>t_{d,j}</math>, where <math>j = 1,2,\cdots,T</math> is the time index in the trace, and <math>d = 1,2,\cdots,D</math> is the trace number. Thus the attacker makes <math>D</math> measurements, each one <math>T</math> points long. If the attacker knew ''exactly'' where a cryptographic operation occurred, they would need to only measure a single point such that <math>T=1</math>. For each trace <math>d</math>, the attacker also knows the plaintext or ciphertext corresponding to that power trace, defined as <math>p_d</math>.
Assume also the attacker has a model of how the power consumption of the device depends on some intermediate value. For example the attacker could assume the power consumption of a microcontroller was dependent on the hamming weight of the intermediate value. We will define <math>h_{d,i} = l( w( p_d, i ))</math>, where <math>l(x)</math> is the leakage model for a given intermediate value, and <math>w(p, i)</math> generates an intermediate value given the input plaintext and the guess number <math>i = 1,2,\cdots,I</math>.
This intermediate value will be selected to depend on the input plaintext and a small portion of the secret key. For example with AES, each byte of the plaintext is XOR'd with each byte (subkey) of the secret key. In this example we would have:
<blockquote><math>l(x) = HammingWeight(x)</math><math>w(p, i) = p \oplus i</math>
</blockquote>
This implies that the input plaintext <math>p</math> is being attacked a single byte at a time, which means we are attacking a single byte of the AES key at a time. While we still need to enumerate all possibilities for this subkey, we now only have <math>16 \times 2^8</math> instead of <math>2^{128}</math> possibilities for AES-128.
We will be next using the correlation coefficient to look for a linear relationship between the predicted power consumption <math>l(x)</math> and the measured power consumption <math>t_{d,j}</math>. For this reason it is desirable to have a non-linear relationship between <math>w(p, i)</math> and either <math>p</math> or <math>i</math>, as we will otherwise see a linear relationship for all values of <math>i</math>. In this case we take advantage of the non-linear substitution boxes (S-Boxes) in the algorithm, which are simply lookup tables which have been selected to have minimal possible correlation between the input and output. The emphasis on minimum possible correlation between input and output is a requirement to avoid certain linear cryptographic attacks.
Finally, we can calculate the correlation coefficient for each point <math>j</math> over all traces <math>D</math>, for each of the possible subkey values <math>I</math>, as in the following:
<blockquote><math>{r_{i,j}} = \frac{{\sum\nolimits_{d = 1}^D {\left[ {\left( {{h_{d,i}} - \overline {{h_i}} } \right)\left( {{t_{d,j}} - \overline {{t_j}} } \right)} \right]} }}{{\sqrt {\sum\nolimits_{d = 1}^D {{{\left( {{h_{d,i}} - \overline {{h_i}} } \right)}^2}} \sum\nolimits_{d = 1}^D {{{\left( {{t_{d,j}} - \overline {{t_j}} } \right)}^2}} } }}</math>
</blockquote>
This is simply Pearson's correlation coefficient, given below, where <math>X = p</math>, and <math>Y = t</math>:
<blockquote><math>{\rho _{X,Y}} = \frac{{{\mathop{\rm cov}} \left( {X,Y} \right)}}{{{\sigma _X}{\sigma _Y}}} = \frac{{E\left[ {\left( {X - {\mu _X}} \right)\left( {Y - {\mu _Y}} \right)} \right]}}{{\sqrt {E\left[ {{{\left( {X - {\mu _X}} \right)}^2}} \right]} \sqrt {E\left[ {{{\left( {Y - {\mu _Y}} \right)}^2}} \right]} }}</math>
</blockquote>
The problem of finding a known signal in a noisy measurement exists in many other fields beyond side-channel analysis. These two equations are referred to as the ''normalized cross-correlation'', and frequently used in digital imaging for matching a known `template' to an image, e.g. finding the location of some specific item in a photo of a room.