Missing validation for ML-KEM

Note: this was brought up in https://github.com/PQClean/PQClean/issues/601 while looking at the PQClean code and was then directed here, so I'm opening this issue to see if there's interest in a PR.

FIPS 203 adds a few input validation checks to ML-KEM that are not in the Kyber specification and also seem to be missing from this code. Because repositories such as PQClean use this code as their upstream for ML-KEM, they end up with "non-FIPS ML-KEM" if the code misses the validation checks.

The length checks are essentially done for free by how the code is written, but two other checks seem to be missing:

1. Modulus check: ensure that re-encoding the decoded $\hat{t}$ reproduces the input bytes (this ensures all coefficients are in the canonical range).
2. Hash check: hash the public key and ensure it matches the hash of the public key stored in the secret key.

If these checks are thought to be useful, I don't think it would be too much work to add them, and I'm happy to open a PR for this.

I'll sketch the two changes below and see what the owners of this code think. The general idea is to return `0` when all checks pass and `1` on failure. (Originally I returned `-1`, which seemed more natural, but then I saw that `verify()` returns `1` on mismatch, so I tried to follow that convention.)
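For reference, the `0`-on-match / `1`-on-mismatch convention can be sketched as a branch-free byte comparison. This is only an illustration: `sketch_verify` is a hypothetical name, not the `verify()` in this repository, whose exact implementation may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the repository's verify()):
 * returns 0 if a and b agree on all len bytes, 1 otherwise. */
static int sketch_verify(const uint8_t *a, const uint8_t *b, size_t len)
{
  uint8_t r = 0;

  /* Accumulate every differing bit; no early exit on mismatch. */
  for (size_t i = 0; i < len; i++)
    r |= (uint8_t)(a[i] ^ b[i]);

  /* Collapse any non-zero r to 1 without branching on its value:
   * for r != 0, -(uint64_t)r has its top bit set. */
  return (int)((-(uint64_t)r) >> 63);
}
```

With this convention, the results of several checks can be combined with a bitwise OR, and the caller only needs to test for a non-zero return.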

### Modulus Check

I think the easiest place to put the modulus check is in `unpack_pk()` itself, which would do the necessary decoding check and return `0` on success and `1` on failure. This means taking the `polyvec *pk` vector after decoding, re-encoding it and doing a byte comparison.

```c
static int unpack_pk(polyvec *pk,
                      uint8_t seed[KYBER_SYMBYTES],
                      const uint8_t packedpk[KYBER_INDCPA_PUBLICKEYBYTES])
{
  polyvec_frombytes(pk, packedpk);
  memcpy(seed, packedpk+KYBER_POLYVECBYTES, KYBER_SYMBYTES);

  // Perform the modulus check
  uint8_t modulus_check[KYBER_POLYVECBYTES];
  polyvec_tobytes(modulus_check, pk);

  // if modulus_check == packedpk[..KYBER_POLYVECBYTES] return 0; else 1;
  return verify(modulus_check, packedpk, KYBER_POLYVECBYTES);
}
```
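One caveat: the re-encode comparison only detects out-of-range coefficients if the decode step reduces modulo $q$, as `ByteDecode_12` does in FIPS 203. If decoding is a plain 12-bit unpack and encoding a plain 12-bit pack, the round trip is the identity and the check always passes. It is worth confirming how `polyvec_frombytes()` and `polyvec_tobytes()` behave here before relying on the sketch above. The standalone example below illustrates the idea for a single polynomial; all `sketch_*` / `SKETCH_*` names are hypothetical and not part of this code base.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_Q 3329                         /* ML-KEM modulus q */
#define SKETCH_N 256                          /* coefficients per polynomial */
#define SKETCH_POLYBYTES (SKETCH_N * 12 / 8)  /* 384 bytes */

/* Unpack 12-bit values and reduce them mod q (ByteDecode_12 style).
 * The input is a public key, so the variable-time % is acceptable here. */
static void sketch_decode12(uint16_t coeffs[SKETCH_N],
                            const uint8_t in[SKETCH_POLYBYTES])
{
  for (size_t i = 0; i < SKETCH_N / 2; i++) {
    uint16_t t0 = (uint16_t)(in[3*i] | ((uint16_t)in[3*i+1] << 8)) & 0xFFF;
    uint16_t t1 = (uint16_t)((in[3*i+1] >> 4) | ((uint16_t)in[3*i+2] << 4)) & 0xFFF;
    coeffs[2*i]   = (uint16_t)(t0 % SKETCH_Q);
    coeffs[2*i+1] = (uint16_t)(t1 % SKETCH_Q);
  }
}

/* Pack coefficients (assumed already in [0, q)) into 12 bits each. */
static void sketch_encode12(uint8_t out[SKETCH_POLYBYTES],
                            const uint16_t coeffs[SKETCH_N])
{
  for (size_t i = 0; i < SKETCH_N / 2; i++) {
    uint16_t t0 = coeffs[2*i];
    uint16_t t1 = coeffs[2*i+1];
    out[3*i]   = (uint8_t)t0;
    out[3*i+1] = (uint8_t)((t0 >> 8) | (t1 << 4));
    out[3*i+2] = (uint8_t)(t1 >> 4);
  }
}

/* Returns 0 if every encoded coefficient is canonical (< q), 1 otherwise. */
static int sketch_modulus_check(const uint8_t in[SKETCH_POLYBYTES])
{
  uint16_t coeffs[SKETCH_N];
  uint8_t reenc[SKETCH_POLYBYTES];
  uint8_t diff = 0;

  sketch_decode12(coeffs, in);
  sketch_encode12(reenc, coeffs);
  for (size_t i = 0; i < SKETCH_POLYBYTES; i++)
    diff |= (uint8_t)(in[i] ^ reenc[i]);
  return diff != 0;
}
```

For example, an input whose first coefficient encodes $q - 1 = 3328$ (`0xD00`) passes, while one encoding $q = 3329$ (`0xD01`) is reduced to `0` on decode and therefore fails the comparison.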

This then means `indcpa_enc()` returns `int` rather than `void`:

```c
int indcpa_enc(uint8_t c[KYBER_INDCPA_BYTES],
                const uint8_t m[KYBER_INDCPA_MSGBYTES],
                const uint8_t pk[KYBER_INDCPA_PUBLICKEYBYTES],
                const uint8_t coins[KYBER_SYMBYTES])
{
  // SNIP
  int modulus_check;
  modulus_check = unpack_pk(&pkpv, seed, pk);

  // SNIP
  return modulus_check;
}
```

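The only change to `crypto_kem_enc()` is that it returns the result of `crypto_kem_enc_derand()` instead of `0` (this assumes `crypto_kem_enc_derand()` in turn propagates the return value of `indcpa_enc()`):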

```c
int crypto_kem_enc(uint8_t *ct,
                   uint8_t *ss,
                   const uint8_t *pk)
{
  uint8_t coins[KYBER_SYMBYTES];
  randombytes(coins, KYBER_SYMBYTES);
  return crypto_kem_enc_derand(ct, ss, pk, coins);
}
```

### Hash Check

I think the hash check can be done entirely within `crypto_kem_dec()`. It means performing one additional hash and one byte comparison, and this can be done at effectively any point in the function (we could also add a `perform_hash_check()` function if this is deemed cleaner).


```c
int crypto_kem_dec(uint8_t *ss,
                   const uint8_t *ct,
                   const uint8_t *sk)
{
  // SNIP

  uint8_t hash_check[KYBER_SYMBYTES];
  hash_h(hash_check, pk, KYBER_INDCPA_PUBLICKEYBYTES);
  // if h(ek_indcpa) == sk_hash return 0; else 1;
  return verify(hash_check, pk + KYBER_INDCPA_PUBLICKEYBYTES, KYBER_SYMBYTES);
}
```
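To make the layout assumption explicit, here is a standalone sketch of the hash check. It assumes the secret key is laid out as `dk_pke || ek || H(ek) || z`, as in FIPS 203. All names are hypothetical, and the hash is passed in as a callback so the example stays self-contained; in ML-KEM it would be `H` = SHA3-256. `sketch_toy_hash` is a stand-in for demonstration only and is not a cryptographic hash.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SYMBYTES 32

/* Hypothetical hash callback type; for ML-KEM this would be SHA3-256. */
typedef void (*sketch_hash_fn)(uint8_t out[SKETCH_SYMBYTES],
                               const uint8_t *in, size_t inlen);

/* Toy stand-in so the sketch runs on its own. NOT a real hash. */
static void sketch_toy_hash(uint8_t out[SKETCH_SYMBYTES],
                            const uint8_t *in, size_t inlen)
{
  for (size_t i = 0; i < SKETCH_SYMBYTES; i++)
    out[i] = 0;
  for (size_t i = 0; i < inlen; i++)
    out[i % SKETCH_SYMBYTES] ^= in[i];
}

/* Assumed layout: sk = dk_pke (dk_len) || ek (ek_len) || H(ek) (32) || z (32).
 * Returns 0 if the stored hash matches H(ek), 1 otherwise. */
static int sketch_hash_check(const uint8_t *sk, size_t dk_len, size_t ek_len,
                             sketch_hash_fn h)
{
  const uint8_t *ek = sk + dk_len;      /* embedded public key */
  const uint8_t *stored = ek + ek_len;  /* stored H(ek) */
  uint8_t computed[SKETCH_SYMBYTES];
  uint8_t diff = 0;

  h(computed, ek, ek_len);
  for (size_t i = 0; i < SKETCH_SYMBYTES; i++)
    diff |= (uint8_t)(computed[i] ^ stored[i]);
  return (int)((-(uint64_t)diff) >> 63);
}
```

Keeping the check in a helper like this would let `crypto_kem_dec()` decide separately how to react to a failure, for example returning early versus combining the result with its existing return value.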

