Encryption

Introduction

This specification is a sub-specification of the ODK XForms Specification. It describes how to encrypt an XForms submission in a manner that is compatible with ODK tools for data aggregation and decryption.

Overview

A regular XForms submission consists of one XML file, optionally accompanied by media files from <upload> form controls. When a form has encryption enabled, all of these files are encrypted and an additional submission manifest XML file is added. The files are submitted according to the OpenRosa Form Submission Specification. The original XML file (now encrypted) is treated as a media file, and the submission manifest is treated as the XML file (aka the XForm Part).

Submission Manifest

Below is an example of a valid submission manifest:

<data xmlns="http://opendatakit.org/submissions" id="mysurvey" encrypted="yes" version="2014083101">
    <base64EncryptedKey>sHXUut13/res3S3uJkwgfhABOc74aXGnCTxTcRTplS9kflomxAzK35zcLc0BJu/Dro7FpPia4qU+f3yb3roJi/EUtRkTaHauAYDEX2OHZ4QThoSmbR0NJRw6kLjfkNS5bFaONWEbRn8eSbT7uyOGyvx5ddL3IKIxzu9vGzJX+cMpKKUQsORaXNEL7lRns7tVen93OSlYhSQak/CbAbkpsSpIW+Q13zrGv3n20YOHaun5yhSyZq6LeaHzPWKQv2POyl+N2j3NGbkz+RIvaVBLvTae4zB0iXlfTkYK9HwOKKDS6MI7z4g4L988WlQurkw5jlN5X9ahNhwZN2yLWTsnCQ==</base64EncryptedKey>
    <encryptedXmlFile>submission.xml.enc</encryptedXmlFile>
    <media>
        <file>myimage.jpg.enc</file>
    </media>
    <media>
        <file>myaudio.mp3.enc</file>
    </media>
    <base64EncryptedElementSignature>OU7rbZl0uFy7xv/HnSl1juVrdf2fQpzcfjwetgl+wseOx5yeD3NjoAg978GGclsy38mECEgTkMS1g8J1I/Xrn9uSQCRyaJXgPyFYPP+y24ka+vCNuNfg6SN1h8MYyUDdg7B7/M9oacMixbAtHo9qcesSBykJWJjFjBS7Nl/GnojRIc5ywLwnzKrdjjxeTjFw7kIG3LCt298WBHuj7azbi/DJYPp26Dbho47LlaRbQpi5Q4Oea71y1h7Wdbl4r7ILyRkTo86fvg6HUfWDLWSorgoFCqi1Af9qP2ziF+LLWQzDu3M8SCHX6uWdCRm/8GPaAyUpMAyfy2e8i7KPbMcVsQ==</base64EncryptedElementSignature>
    <meta xmlns="http://openrosa.org/xforms">
        <instanceID>uuid:5b9cf8d1-106f-4004-844f-c072d76762ed</instanceID>
    </meta>
</data>

The following elements and attributes are supported. Unless otherwise specified, all attributes and elements should be in the "http://opendatakit.org/submissions" namespace.

Element or attribute: description

<data>: The required root element.
id: Required attribute of the root element. Its value must be the same as the id attribute in the Primary Instance of the XForm.
encrypted: Required attribute of the root element with the value “yes”, indicating that this is the manifest of an encrypted record.
version: Attribute of the root element. It is required only if the Primary Instance of the XForm has a version attribute, and then it must have the same value.
<base64EncryptedKey>: Required child of the root element containing the encrypted encryption key used to encrypt the record (see Key Encryption).
<encryptedXmlFile>: Required child of the root element containing the filename of the encrypted XML file.
<media>: Child of the root element. One is required for each media file the record contains.
<file>: Single child of each <media> element, containing the filename of the encrypted media file. The order of multiple media/file elements is significant.
<base64EncryptedElementSignature>: Optional child of the root element containing a signature that can be used to verify that the encrypted record has not been tampered with.
<meta>: Required child of the root element, in the "http://openrosa.org/xforms" namespace.
<instanceID>: Required child of the <meta> element, in the "http://openrosa.org/xforms" namespace, containing the same value as the instanceID element in the XML record.
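To make the manifest structure concrete, here is a minimal Java sketch that serializes a manifest with the JDK's DOM and identity-transformer APIs. The class ManifestBuilder and its parameters are hypothetical. It assumes the caller already has the base64-encoded encrypted key, the encrypted filenames (media files listed in encryption order), an optional signature, and the instanceID. The namespace declarations are set explicitly so that they are always serialized.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical helper that builds a submission manifest with the JDK's DOM API. */
final class ManifestBuilder {
    static final String SUBMISSIONS_NS = "http://opendatakit.org/submissions";
    static final String OPENROSA_NS = "http://openrosa.org/xforms";

    /** formVersion and base64Signature may be null when they do not apply. */
    static String build(String formId, String formVersion, String base64EncryptedKey,
                        String encryptedXmlFile, List<String> encryptedMediaFiles,
                        String base64Signature, String instanceId) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().newDocument();

        Element data = doc.createElementNS(SUBMISSIONS_NS, "data");
        data.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns", SUBMISSIONS_NS);
        data.setAttributeNS(null, "id", formId);
        data.setAttributeNS(null, "encrypted", "yes");
        if (formVersion != null) { // only when the form's Primary Instance has a version
            data.setAttributeNS(null, "version", formVersion);
        }
        doc.appendChild(data);

        appendText(data, SUBMISSIONS_NS, "base64EncryptedKey", base64EncryptedKey);
        appendText(data, SUBMISSIONS_NS, "encryptedXmlFile", encryptedXmlFile);
        for (String name : encryptedMediaFiles) { // list order must equal encryption order
            Element media = doc.createElementNS(SUBMISSIONS_NS, "media");
            data.appendChild(media);
            appendText(media, SUBMISSIONS_NS, "file", name);
        }
        if (base64Signature != null) { // the signature element is optional
            appendText(data, SUBMISSIONS_NS, "base64EncryptedElementSignature", base64Signature);
        }
        Element meta = doc.createElementNS(OPENROSA_NS, "meta");
        meta.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns", OPENROSA_NS);
        data.appendChild(meta);
        appendText(meta, OPENROSA_NS, "instanceID", instanceId);

        // Serialize with the identity transformer, without an XML declaration.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Appends a namespaced child element containing the given text. */
    private static void appendText(Element parent, String ns, String name, String text) {
        Element child = parent.getOwnerDocument().createElementNS(ns, name);
        child.setTextContent(text);
        parent.appendChild(child);
    }
}
```

Passing null for formVersion omits the version attribute, mirroring the rule that it is only present when the form defines one.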

Content Encryption

The XML file and the uploaded media files are encrypted with the equivalent of the AES/CFB/PKCS5Padding algorithm as used in Java 8, with initialization vectors generated by the algorithm described under Initialization Vector.

The files should be encrypted in the following order:

  1. all media files, one by one
  2. the XML file

The order of the <file> elements in the submission manifest must match the order in which the corresponding media files were encrypted.
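The encryption step and its ordering can be sketched in Java as follows. This is a minimal sketch, assuming the caller supplies one initialization vector per file in encryption order (how those IVs are derived is specified under Initialization Vector). The class ContentCipher and its methods are hypothetical, and byte arrays are used for brevity; large media files could be streamed through javax.crypto.CipherOutputStream with the same cipher instead.

```java
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper for the per-file AES encryption step. */
final class ContentCipher {
    static final String TRANSFORMATION = "AES/CFB/PKCS5Padding";

    static byte[] encrypt(byte[] plaintext, byte[] aesKey, byte[] iv) throws GeneralSecurityException {
        return run(Cipher.ENCRYPT_MODE, plaintext, aesKey, iv);
    }

    static byte[] decrypt(byte[] ciphertext, byte[] aesKey, byte[] iv) throws GeneralSecurityException {
        return run(Cipher.DECRYPT_MODE, ciphertext, aesKey, iv);
    }

    /** Encrypts every media file (in manifest order), then the XML file, taking one IV per file. */
    static List<byte[]> encryptRecord(List<byte[]> mediaFiles, byte[] xmlFile, byte[] aesKey,
                                      Supplier<byte[]> nextIv) throws GeneralSecurityException {
        List<byte[]> encrypted = new ArrayList<>();
        for (byte[] media : mediaFiles) {
            encrypted.add(encrypt(media, aesKey, nextIv.get()));
        }
        encrypted.add(encrypt(xmlFile, aesKey, nextIv.get())); // the XML file is always last
        return encrypted;
    }

    private static byte[] run(int mode, byte[] input, byte[] aesKey, byte[] iv)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(mode, new SecretKeySpec(aesKey, "AES"), new IvParameterSpec(iv));
        return cipher.doFinal(input);
    }
}
```

A decryptor has to consume IVs in exactly the same order, which is why the media files come first and the XML file last.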

AES Encryption Key

The AES encryption key is a random 256-bit key generated by the client for each record. The submission manifest contains an encrypted version of this key.

After encrypting all files belonging to the record and generating the submission manifest, the raw AES encryption key is thrown away.
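In Java, the per-record key can be generated with the standard KeyGenerator API. The sketch below is illustrative, and the class RecordKey is hypothetical. Zeroing the array is only a best-effort way of discarding the raw key, since the JVM may hold other copies (for example inside SecretKey or Cipher objects). On Java 8 updates before 8u161, 256-bit AES keys also required the unlimited-strength JCE policy files.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.KeyGenerator;

/** Hypothetical helper for the per-record AES key. */
final class RecordKey {
    /** Generates a fresh random 256-bit AES key for a single record. */
    static byte[] newAesKey() throws NoSuchAlgorithmException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256, new SecureRandom());
        return generator.generateKey().getEncoded(); // 32 raw key bytes
    }

    /** Best-effort wipe of the raw key once all files and the manifest are done. */
    static void discard(byte[] rawKey) {
        Arrays.fill(rawKey, (byte) 0);
    }
}
```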

Initialization Vector

The initialization vector (IV) generation algorithm is predetermined and reproducible. Each file is encrypted with a different IV, obtained by incrementing successive bytes of a seed array. Because each IV depends on how many files were encrypted before it, the files must be decrypted in the same order in which they were encrypted.

The following algorithm is used for each record:

calculate the md5 digest of the instanceID (as UTF-8 bytes) followed by the raw AES encryption key
use the 16-byte md5 digest as the seed array
start a counter at 0
for each file in the record, in encryption order (including the first file), do:
    calculate index as the remainder of the counter modulo 16
    increment the byte in the seed array at index by 1 (wrapping from 255 to 0)
    increment the counter
    use the updated seed array as the initialization vector for encrypting this file
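The algorithm above translates directly into a small stateful Java class. This is a sketch under two assumptions: the instanceID is hashed as UTF-8 bytes, and the raw 32-byte key is hashed immediately after it. The class IvSequence is hypothetical. Each call returns a copy, so IVs that have already been handed out are not changed by later increments.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical generator for the per-file IVs of one record. */
final class IvSequence {
    private final byte[] seed;
    private int counter = 0;

    IvSequence(String instanceId, byte[] aesKey) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        md5.update(instanceId.getBytes(StandardCharsets.UTF_8)); // assumed UTF-8
        md5.update(aesKey);
        seed = md5.digest(); // MD5 yields exactly 16 bytes, the AES block size
    }

    /** Returns the IV for the next file; call once per file, in encryption order. */
    byte[] next() {
        seed[counter % seed.length]++; // byte arithmetic wraps from 0xFF to 0x00
        counter++;
        return seed.clone();
    }
}
```

Because the increments accumulate, the second IV differs from the digest in bytes 0 and 1, and the seventeenth IV increments byte 0 a second time.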

Padding

Although padding is unusual with a stream-cipher mode such as CFB, it is required by this specification.

The Java algorithm name “AES/CFB/PKCS5Padding” implies PKCS#5 padding. However, PKCS#5 padding is defined only for 8-byte blocks, while AES always uses 16-byte blocks, so the name is a misnomer. What Java actually applies is the 16-byte equivalent of PKCS#5, which is PKCS#7 padding.
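The padding can be made visible with a short Java experiment. The sketch encrypts with AES/CFB/PKCS5Padding and then decrypts the result with AES/CFB/NoPadding, so the bytes that Java appended are kept. It uses an all-zero key and IV purely for the demonstration, and the class Pkcs7Demo is hypothetical. PKCS#7 always adds between 1 and 16 bytes, each holding the padding length, so a 16-byte file grows to 32 bytes.

```java
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical demonstration that Java's "PKCS5Padding" on AES is PKCS#7 with 16-byte blocks. */
final class Pkcs7Demo {
    static final int AES_BLOCK_BYTES = 16;

    /** Number of padding bytes PKCS#7 adds for a given plaintext length (always 1 to 16). */
    static int paddingLength(int plaintextLength) {
        return AES_BLOCK_BYTES - (plaintextLength % AES_BLOCK_BYTES);
    }

    /** Encrypts with padding, then decrypts without removing it; demo-only zero key and IV. */
    static byte[] revealPadding(byte[] plaintext) throws GeneralSecurityException {
        SecretKeySpec key = new SecretKeySpec(new byte[32], "AES");
        IvParameterSpec iv = new IvParameterSpec(new byte[AES_BLOCK_BYTES]);
        Cipher padded = Cipher.getInstance("AES/CFB/PKCS5Padding");
        padded.init(Cipher.ENCRYPT_MODE, key, iv);
        byte[] ciphertext = padded.doFinal(plaintext);
        Cipher unpadded = Cipher.getInstance("AES/CFB/NoPadding");
        unpadded.init(Cipher.DECRYPT_MODE, key, iv);
        return unpadded.doFinal(ciphertext); // original plaintext followed by the padding bytes
    }
}
```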

Key Encryption

The AES encryption key is encrypted with the equivalent of the RSA/NONE/OAEPWithSHA256AndMGF1Padding algorithm in Java 8, using the RSA public key that is part of the XForm definition. The result is base64-encoded and stored in <base64EncryptedKey>.
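A possible Java rendering of the key encryption step is sketched below. It relies on an assumption to verify against the ODK tools you need to interoperate with: SHA-256 is used for both the OAEP digest and MGF1. The parameters are spelled out with OAEPParameterSpec because the JDK's SunJCE provider, given the name "RSA/ECB/OAEPWithSHA-256AndMGF1Padding", uses SHA-1 for MGF1 by default. The sketch uses the JDK's standard transformation name with ECB mode, which for RSA does not imply any chaining. The class KeyWrap is hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.spec.MGF1ParameterSpec;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.OAEPParameterSpec;
import javax.crypto.spec.PSource;

/** Hypothetical helper for wrapping the AES key with the form's RSA public key. */
final class KeyWrap {
    // Assumption: SHA-256 for both the OAEP digest and MGF1.
    private static final OAEPParameterSpec OAEP_SHA256 = new OAEPParameterSpec(
            "SHA-256", "MGF1", MGF1ParameterSpec.SHA256, PSource.PSpecified.DEFAULT);

    /** Returns the value to store in <base64EncryptedKey>. */
    static String encryptKey(byte[] aesKey, PublicKey rsaPublicKey) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("RSA/ECB/OAEPPadding");
        cipher.init(Cipher.ENCRYPT_MODE, rsaPublicKey, OAEP_SHA256);
        return Base64.getEncoder().encodeToString(cipher.doFinal(aesKey));
    }

    /** Inverse operation, as a decrypting tool holding the private key would perform it. */
    static byte[] decryptKey(String base64EncryptedKey, PrivateKey rsaPrivateKey)
            throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("RSA/ECB/OAEPPadding");
        cipher.init(Cipher.DECRYPT_MODE, rsaPrivateKey, OAEP_SHA256);
        return cipher.doFinal(Base64.getDecoder().decode(base64EncryptedKey));
    }
}
```

With a 2048-bit RSA key the encrypted key is 256 bytes before base64 encoding, regardless of the 32-byte AES key inside it.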

Signature

To calculate a signature of the encrypted record, the following algorithm is used:

join the following strings, in this order, separated by a "\n" newline character:
    - the form ID
    - the form version, if it exists
    - the base64-encoded encrypted symmetric key (the value of <base64EncryptedKey>)
    - the instance ID
    - for each media file and then the XML file, in order of encryption: the original
      (unencrypted) filename + '::' + the md5 hexadecimal digest of the original file
append a final "\n" to the resulting string
calculate the md5 digest of the resulting string as bytes
encrypt this md5 digest using the same RSA algorithm as for the key encryption, and
    base64-encode the result, also as for the key encryption
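The signature algorithm can be sketched in Java as shown here. Several details are assumptions and should be checked against the ODK tools: the md5 hex digests are lower-case and zero-padded to 32 characters, the joined string is converted to bytes as UTF-8, and the RSA step uses OAEP with SHA-256 for both the OAEP digest and MGF1. The class ElementSignature and its methods are hypothetical. Appending "\n" after every field is equivalent to joining with "\n" and adding a final "\n".

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.PublicKey;
import java.security.spec.MGF1ParameterSpec;
import java.util.Base64;
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.spec.OAEPParameterSpec;
import javax.crypto.spec.PSource;

/** Hypothetical helper that computes <base64EncryptedElementSignature>. */
final class ElementSignature {
    /** Lower-case, zero-padded hex md5 of a file's original (unencrypted) contents. */
    static String md5Hex(byte[] contents) throws GeneralSecurityException {
        StringBuilder hex = new StringBuilder(32);
        for (byte b : MessageDigest.getInstance("MD5").digest(contents)) {
            hex.append(String.format("%02x", b)); // %x formats a byte as unsigned
        }
        return hex.toString();
    }

    /** One "filename::md5" entry, using the original filename (without ".enc"). */
    static String fileEntry(String originalName, byte[] originalContents)
            throws GeneralSecurityException {
        return originalName + "::" + md5Hex(originalContents);
    }

    /** Builds the signed string; formVersion may be null when the form has no version. */
    static String source(String formId, String formVersion, String base64EncryptedKey,
                         String instanceId, List<String> fileEntries) {
        StringBuilder sb = new StringBuilder();
        sb.append(formId).append('\n');
        if (formVersion != null) {
            sb.append(formVersion).append('\n');
        }
        sb.append(base64EncryptedKey).append('\n');
        sb.append(instanceId).append('\n');
        for (String entry : fileEntries) { // media files first, XML file last
            sb.append(entry).append('\n');
        }
        return sb.toString();
    }

    /** md5 of the source string, RSA-OAEP encrypted and base64-encoded. */
    static String sign(String source, PublicKey rsaPublicKey) throws GeneralSecurityException {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(source.getBytes(StandardCharsets.UTF_8));
        Cipher cipher = Cipher.getInstance("RSA/ECB/OAEPPadding");
        cipher.init(Cipher.ENCRYPT_MODE, rsaPublicKey, new OAEPParameterSpec(
                "SHA-256", "MGF1", MGF1ParameterSpec.SHA256, PSource.PSpecified.DEFAULT));
        return Base64.getEncoder().encodeToString(cipher.doFinal(digest));
    }
}
```

A receiving tool can verify the record by decrypting the signature with the private key, recomputing the same string from the decrypted files, and comparing the two md5 digests.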