Encryption
Introduction
This specification is a sub-specification of the ODK XForms Specification. It describes how to encrypt an XForms submission in a manner that is compatible with ODK tools for data aggregation and decryption.
Overview
A regular XForms submission consists of one XML file with optional additional media files from <upload> form controls. Forms that have encryption enabled encrypt all of these files and add a submission manifest XML file. The files are submitted according to the OpenRosa Form Submission Specification, where the original XML file (now encrypted) is treated as a media file, and the submission manifest is treated as the XML file (aka the XForm Part).
Submission Manifest
Below is an example of a valid submission manifest:
<data xmlns="http://opendatakit.org/submissions" id="mysurvey" encrypted="yes" version="2014083101">
<base64EncryptedKey>sHXUut13/res3S3uJkwgfhABOc74aXGnCTxTcRTplS9kflomxAzK35zcLc0BJu/Dro7FpPia4qU+f3yb3roJi/EUtRkTaHauAYDEX2OHZ4QThoSmbR0NJRw6kLjfkNS5bFaONWEbRn8eSbT7uyOGyvx5ddL3IKIxzu9vGzJX+cMpKKUQsORaXNEL7lRns7tVen93OSlYhSQak/CbAbkpsSpIW+Q13zrGv3n20YOHaun5yhSyZq6LeaHzPWKQv2POyl+N2j3NGbkz+RIvaVBLvTae4zB0iXlfTkYK9HwOKKDS6MI7z4g4L988WlQurkw5jlN5X9ahNhwZN2yLWTsnCQ==</base64EncryptedKey>
<encryptedXmlFile>submission.xml.enc</encryptedXmlFile>
<media>
<file>myimage.jpg.enc</file>
</media>
<media>
<file>myaudio.mp3.enc</file>
</media>
<base64EncryptedElementSignature>OU7rbZl0uFy7xv/HnSl1juVrdf2fQpzcfjwetgl+wseOx5yeD3NjoAg978GGclsy38mECEgTkMS1g8J1I/Xrn9uSQCRyaJXgPyFYPP+y24ka+vCNuNfg6SN1h8MYyUDdg7B7/M9oacMixbAtHo9qcesSBykJWJjFjBS7Nl/GnojRIc5ywLwnzKrdjjxeTjFw7kIG3LCt298WBHuj7azbi/DJYPp26Dbho47LlaRbQpi5Q4Oea71y1h7Wdbl4r7ILyRkTo86fvg6HUfWDLWSorgoFCqi1Af9qP2ziF+LLWQzDu3M8SCHX6uWdCRm/8GPaAyUpMAyfy2e8i7KPbMcVsQ==</base64EncryptedElementSignature>
<meta xmlns="http://openrosa.org/xforms">
<instanceID>uuid:5b9cf8d1-106f-4004-844f-c072d76762ed</instanceID>
</meta>
</data>
The following elements are supported. Unless otherwise specified all attributes and elements should be in the "http://opendatakit.org/submissions" namespace.
<element> / attribute | description |
---|---|
<data> | The required root element. |
id | Required on the root element, with the same value as the id attribute in the Primary Instance of the XForm. |
encrypted | Required on the root element, with the value “yes” indicating that this is the manifest of an encrypted record. |
version | Required on the root element, with the same value as the version attribute in the Primary Instance of the XForm, but only if it exists there. |
<base64EncryptedKey> | Required child of the root element containing the encrypted form of the encryption key used to encrypt the record. |
<encryptedXmlFile> | Required child of the root element containing the filename of the encrypted XML file. |
<media> | Child of the root element, which is required for each media file the record contains. |
<file> | Single child of the media element, which is required for each media file belonging to the record. The order of multiple media/file elements is significant. |
<base64EncryptedElementSignature> | Optional child of the root element that contains a signature that can be used to validate that the encrypted record does not appear to have been tampered with. |
<meta> | Required child of the root element in the "http://openrosa.org/xforms" namespace. |
<instanceID> | Required child of the meta element in the "http://openrosa.org/xforms" namespace containing the value of the identical element in the XML record. |
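To illustrate how the two namespaces in the manifest interact, here is a minimal sketch of reading a manifest with Python's standard library. The function name `parse_manifest` and the keys of the returned dictionary are hypothetical and are not part of this specification. The sketch assumes that the attributes on the root element are written without a namespace prefix, as in the example above.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes in ElementTree's "{uri}tag" notation.
ODK = "{http://opendatakit.org/submissions}"
ORX = "{http://openrosa.org/xforms}"


def parse_manifest(xml_text):
    """Extract the fields of a submission manifest (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    if root.tag != ODK + "data" or root.get("encrypted") != "yes":
        raise ValueError("not an encrypted submission manifest")
    return {
        "id": root.get("id"),
        "version": root.get("version"),  # None if the form has no version
        "key": root.findtext(ODK + "base64EncryptedKey"),
        "xml_file": root.findtext(ODK + "encryptedXmlFile"),
        # findall returns matches in document order, which matters here.
        "media": [f.text for f in root.findall(ODK + "media/" + ODK + "file")],
        "signature": root.findtext(ODK + "base64EncryptedElementSignature"),
        "instance_id": root.findtext(ORX + "meta/" + ORX + "instanceID"),
    }
```

Because the media filenames are returned in document order, the resulting list can be used directly to determine the order in which the files were encrypted.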
Content Encryption
The XML file and uploaded files are encrypted with the equivalent of the AES/CFB/PKCS5Padding algorithm as used in Java 8 with a specified initialization vector algorithm.
The files should be encrypted in the following order:
- all media files, one by one
- the XML file
The order in which the <file> elements are recorded in the submission manifest has to match the order in which they were encrypted.
AES Encryption Key
The AES encryption key is a random 256-bit key generated by the client for each record. The submission manifest contains an encrypted version of this key.
After encrypting all files belonging to the record and generating the submission manifest, the raw AES encryption key is thrown away.
Initialization Vector
The initialization vector generation algorithm is pre-determined and reproducible. Each file is encrypted with a different initialization vector (through incrementation). This means that the order of files to be encrypted is sequential and important in order to decrypt successfully.
The following algorithm is used for each record:
- calculate the md5 digest of the instanceID and the AES encryption key
- convert the md5 digest to a seed array of 16 bytes
- start a counter at 0
- for each file in the record to be encrypted (including the first) do:
  - calculate the index as the remainder of the counter modulo 16
  - increment the byte in the seed array at that index by 1
  - increment the counter
  - use the updated seed array as the initialization vector for AES encryption
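The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function name `iv_sequence` is hypothetical, the instanceID is assumed to be encoded as UTF-8 and hashed before the key (following the order in which they are named above), and a byte with the value 255 is assumed to wrap around to 0 when incremented.

```python
import hashlib
from typing import List


def iv_sequence(instance_id: str, aes_key: bytes, file_count: int) -> List[bytes]:
    """Return one 16-byte initialization vector per file, in encryption order."""
    # Assumption: instanceID as UTF-8 bytes, followed by the raw AES key.
    digest = hashlib.md5(instance_id.encode("utf-8") + aes_key).digest()
    seed = bytearray(digest)  # an MD5 digest is exactly 16 bytes long

    ivs = []
    for counter in range(file_count):
        index = counter % 16
        # Assumption: the increment wraps around at the byte boundary.
        seed[index] = (seed[index] + 1) % 256
        ivs.append(bytes(seed))  # snapshot of the seed for this file
    return ivs
```

Note that the seed is modified in place, so each initialization vector carries all earlier increments as well. For a record with two media files, `iv_sequence(instance_id, key, 3)` would yield the vectors for the first media file, the second media file and the XML file, in that order.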
Padding
Though it is unusual to use padding with a stream-oriented cipher mode such as CFB, it is required in this specification.
The algorithm name “AES/CFB/PKCS5Padding” in Java implies PKCS#5 padding. However, that padding scheme is actually not defined for AES so it is misnamed in Java. The PKCS#5 padding scheme is only defined for 8 byte blocks and AES always uses 16 byte blocks. What is meant is the equivalent of PKCS#5 for 16 byte blocks which is PKCS#7.
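As an illustration of what PKCS#7 padding means for 16-byte blocks, here is a minimal sketch. The function names `pkcs7_pad` and `pkcs7_unpad` are hypothetical. In practice the padding is normally applied by the cryptographic library itself, so this only shows the byte layout.

```python
def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Append N bytes of value N so the length is a multiple of block_size."""
    pad_len = block_size - (len(data) % block_size)
    # A full block of padding is added when the data is already aligned.
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded: bytes, block_size: int = 16) -> bytes:
    """Remove PKCS#7 padding, raising ValueError if it is malformed."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("invalid padded length")
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError("invalid padding byte")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("inconsistent padding")
    return padded[:-pad_len]
```

For example, padding the 3-byte input `b"abc"` appends 13 bytes of value 0x0d, and padding an input of exactly 16 bytes appends a full block of 16 bytes of value 0x10.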
Key Encryption
The AES encryption key is encrypted using the equivalent of the RSA/NONE/OAEPWithSHA256AndMGF1Padding algorithm in Java 8 using the RSA public key that is part of the XForm definition. The result is base64-encoded.
Signature
To calculate a signature of the encrypted record, the following algorithm is used:
- join together the following strings, in this order, with a "\n" newline character:
  - form ID
  - form version if it exists
  - base64 encrypted symmetric key (value of <base64EncryptedKey>)
  - instance ID
  - for each media file and the XML file, in order of encryption, the concatenation of the original filename + '::' + md5 hexadecimal digest of the original file
- add a final "\n" to the result string
- calculate an md5 digest of the result string as bytes
- encrypt this md5 digest using the same RSA algorithm as used for the key encryption, and base64-encode the result in the same way
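The string assembly and digest steps of the signature can be sketched as follows. The function name `signature_digest` and its parameters are hypothetical. The sketch assumes that the result string is converted to bytes as UTF-8 and that the hexadecimal digests are lowercase and 32 characters long. The final RSA encryption and base64 encoding are left out because they require a cryptographic library beyond Python's standard library.

```python
import hashlib


def signature_digest(form_id, version, b64_encrypted_key, instance_id, files):
    """Return the 16-byte MD5 digest that is subsequently RSA-encrypted.

    `files` is a list of (original_filename, original_bytes) pairs in the
    order of encryption: all media files first, then the XML file.
    """
    parts = [form_id]
    if version is not None:  # the version line is omitted when absent
        parts.append(version)
    parts.append(b64_encrypted_key)
    parts.append(instance_id)
    for name, content in files:
        parts.append(name + "::" + hashlib.md5(content).hexdigest())

    # Join with "\n" and add the final "\n" required by the algorithm.
    text = "\n".join(parts) + "\n"
    # Assumption: the string is encoded as UTF-8 before hashing.
    return hashlib.md5(text.encode("utf-8")).digest()
```

The digests are calculated over the original (unencrypted) file contents, so a receiver can only verify the signature after decrypting every file in the record.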