S3 Encryption Addons
14 September 2015
I wanted to share an overview of a new library named
amazonka-s3-encryption, which was created to supplement
amazonka-s3 with client-side encryption.
Client-side encryption allows transmission and storage of sensitive
information (Data in Motion), whilst ensuring that Amazon never receives any of
your unencrypted data. Previously
amazonka-s3 only supported server-side encryption
(Data at Rest), which requires transmission of unencrypted data to S3. The cryptographic
techniques used within the library are modeled as closely as possible upon the
official AWS SDKs, specifically the Java AWS SDK. Haddock documentation is available.
The version 4 signing algorithm supports two modes for signing requests when communicating
with S3. The first requires a
SHA256 hash of the payload to calculate
the request signature and the second allows incremental signature calculation for
fixed or variable chunks of the payload. Up until now,
amazonka (and all other SDKs excepting Java)
only supported the first method.
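To make the second mode more concrete, here is a minimal sketch of how chunk signatures are chained, following the "string to sign" layout in the S3 signature version 4 streaming documentation. It is not the code used by amazonka: the function names are hypothetical, it assumes the cryptonite package for hashing, and it assumes the derived signing key and the seed signature (the signature of the request headers) have already been computed by the usual version 4 process.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import           Crypto.Hash           (Digest, SHA256, hash)
import           Crypto.MAC.HMAC       (HMAC (..), hmac)
import           Data.ByteString       (ByteString)
import qualified Data.ByteString.Char8 as BS8

-- | Hex-encoded SHA256 digest of a payload.
sha256Hex :: ByteString -> ByteString
sha256Hex bs = BS8.pack (show (hash bs :: Digest SHA256))

-- | Hex-encoded HMAC-SHA256 of a message under the given key.
hmacHex :: ByteString -> ByteString -> ByteString
hmacHex key msg = BS8.pack (show (hmacGetDigest (hmac key msg :: HMAC SHA256)))

-- | Signature of one chunk, chained to the signature that preceded it.
chunkSignature
    :: ByteString -- ^ Derived signing key (assumed to be computed elsewhere).
    -> ByteString -- ^ Request timestamp, e.g. "20130524T000000Z".
    -> ByteString -- ^ Credential scope, e.g. "20130524/us-east-1/s3/aws4_request".
    -> ByteString -- ^ Previous signature (the seed signature for the first chunk).
    -> ByteString -- ^ Chunk data.
    -> ByteString
chunkSignature key time scope prev chunk = hmacHex key stringToSign
  where
    stringToSign = BS8.intercalate "\n"
        [ "AWS4-HMAC-SHA256-PAYLOAD"
        , time
        , scope
        , prev
        , sha256Hex ""     -- hash of the empty string, fixed by the protocol
        , sha256Hex chunk  -- only this chunk is hashed, never the whole payload
        ]

-- | Sign every chunk in order, threading each signature into the next.
signChunks
    :: ByteString -> ByteString -> ByteString -> ByteString
    -> [ByteString] -> [ByteString]
signChunks key time scope seed =
    drop 1 . scanl (chunkSignature key time scope) seed
```

Because each signature depends only on the previous signature and the current chunk, the payload can be signed as it is produced, which is what makes it possible to encrypt and sign in a single pass.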
This poses a problem for encryption, where the need to calculate the SHA256 hash
of the encrypted contents requires the use of a temporary file or another buffering
mechanism. For example, the
aws-sdk-ruby library performs the following procedure
to send an encrypted object:
- Copy and encrypt the payload to a temporary file.
- Obtain the SHA256 hash and file size of the encrypted file.
- Stream the file contents to the socket during transmission.
This means whatever the payload size is, you have to stream/encrypt a complete copy of the payload contents to a temporary file before sending.
To avoid this same pitfall,
amazonka-s3 now uses streaming signature calculation
when sending requests. This removes the need for the pre-calculated SHA256 hash
and allows the encryption and signing to be performed incrementally as the request
is sent.
Unfortunately, despite the documentation claiming that Transfer-Encoding: chunked
is supported, it appears that you need to estimate the encrypted Content-Length
(including metadata) and send this without the
Transfer-Encoding header, otherwise
the signature calculations simply fail with the usual obtuse S3 error response.
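The size estimate can be computed up front, because both the padding and the chunk framing are deterministic. The following is a minimal sketch rather than the library's actual code: the function names are hypothetical, and it assumes AES in CBC mode with PKCS7 padding (16-byte blocks) together with the chunk framing from the S3 streaming signature documentation, where each chunk is sent as its size in hexadecimal, `;chunk-signature=`, a 64-character signature, CRLF, the data, and a trailing CRLF, followed by a final zero-length chunk. The chunk size is assumed to be positive.

```haskell
import Numeric (showHex)

-- | Size of an @n@-byte plaintext after AES-CBC encryption with PKCS7
-- padding, which always adds between 1 and 16 bytes.
encryptedLength :: Integer -> Integer
encryptedLength n = (n `div` 16 + 1) * 16

-- | Bytes on the wire for a single chunk carrying @size@ bytes of data.
chunkLength :: Integer -> Integer
chunkLength size =
      fromIntegral (length (showHex size ""))      -- chunk size in hexadecimal
    + fromIntegral (length ";chunk-signature=")
    + 64                                           -- hex-encoded signature
    + 2                                            -- CRLF after the chunk header
    + size
    + 2                                            -- CRLF after the chunk data

-- | Total Content-Length for an @n@-byte payload sent in chunks of
-- @chunkSize@ bytes, including the terminating zero-length chunk.
chunkedContentLength :: Integer -> Integer -> Integer
chunkedContentLength chunkSize n =
      full * chunkLength chunkSize
    + (if rest > 0 then chunkLength rest else 0)
    + chunkLength 0
  where
    (full, rest) = n `divMod` chunkSize
```

For an encrypted upload the two compose: `chunkedContentLength chunkSize (encryptedLength n)` gives the value to send as Content-Length for an `n`-byte plaintext, under the assumptions above.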
The smart constructors emitted by the generation step for all operations
now take into account streaming signature support, and you’re likely to encounter
the following parameters for operations utilising a streaming request body:
- HashedBody - A request body requiring a pre-calculated SHA256 hash.
- ChunkedBody - A request body which supports streaming signature calculation.
- RqBody - A request body supporting any signing method.
Type classes such as ToBody are provided to make it easy to convert
values such as
ByteString into the appropriate request body.
amazonka itself exports functions such as
chunkedFile and others to assist
in constructing streaming request bodies.
Regular S3 upload operations such as
UploadPart now take advantage of
streaming signature calculation, with the default chunk size set to
128 KB. This seems
to be a decent trade-off between streaming and the expense of incrementally performing
signature calculations, but I’d recommend profiling for your particular use-case
if performance and allocations are a concern.
The above information is available in a more context-sensitive format within the documentation.
Encryption and Decryption
Client-side encryption of S3 objects is used to securely and safely store sensitive data in S3. When using client-side encryption, the data is encrypted before it is sent to S3, meaning Amazon does not receive your unencrypted object data. Unfortunately, the object metadata (headers) still leaks, so any sensitive information should be stored within the payload itself.
The procedure for encryption is as follows:
- A one-time-use symmetric key, a.k.a. a data encryption key (or data key), and an initialisation vector (IV) are generated locally. This data key and IV are used to encrypt the data of a single S3 object using an AES256 cipher in CBC mode, with PKCS7 padding. (For each object sent, a completely separate data key and IV are generated.)
- The generated data encryption key is then encrypted using a symmetric AES256 cipher in ECB mode, asymmetric RSA, or KMS facilities, depending on the client-side master key you provided.
- The encrypted data is uploaded and the encrypted data key and material description are attached as object metadata (either headers or a separate instruction file). If KMS is used, the material description helps determine which client-side master key is later used for decryption; otherwise the configured client-side key at time of decryption is used.
The procedure for decryption is as follows:
- The encrypted object is downloaded from Amazon S3 along with any metadata. If KMS was used to encrypt the data then the master key id is taken from the metadata material description; otherwise the client-side master key in the current environment is used to decrypt the data key, which in turn is used to decrypt the object data.
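The steps above can be sketched for the symmetric master key case as follows. This is an illustration under stated assumptions, not the library's implementation: the `Encrypted` record and the function names are hypothetical, the cryptonite package is assumed for the primitives, the RSA and KMS variants are omitted, and the whole payload is held in memory, whereas the library encrypts incrementally.

```haskell
import           Crypto.Cipher.AES   (AES256)
import           Crypto.Cipher.Types (cbcDecrypt, cbcEncrypt, cipherInit,
                                      ecbDecrypt, ecbEncrypt, makeIV)
import           Crypto.Data.Padding (Format (PKCS7), pad, unpad)
import           Crypto.Error        (maybeCryptoError)
import           Crypto.Random       (getRandomBytes)
import           Data.ByteString     (ByteString)

-- | The wrapped data key, the IV and the ciphertext.
-- (A hypothetical record, not the library's envelope type.)
data Encrypted = Encrypted
    { wrappedKey :: ByteString
    , iv         :: ByteString
    , ciphertext :: ByteString
    } deriving (Eq, Show)

-- | Initialise AES256; fails unless the key is exactly 32 bytes.
aes256 :: ByteString -> Maybe AES256
aes256 = maybeCryptoError . cipherInit

encryptObject :: ByteString -> ByteString -> IO (Maybe Encrypted)
encryptObject masterKey plaintext = do
    dataKey <- getRandomBytes 32  -- fresh 256-bit data key for this object
    ivBytes <- getRandomBytes 16  -- fresh IV for this object (AES block size)
    return $ do
        master <- aes256 masterKey
        cipher <- aes256 dataKey
        civ    <- makeIV ivBytes
        return Encrypted
            { -- the data key is wrapped with the master key in ECB mode
              wrappedKey = ecbEncrypt master (pad (PKCS7 16) dataKey)
            , iv         = ivBytes
              -- the object data is encrypted in CBC mode with PKCS7 padding
            , ciphertext = cbcEncrypt cipher civ (pad (PKCS7 16) plaintext)
            }

decryptObject :: ByteString -> Encrypted -> Maybe ByteString
decryptObject masterKey e = do
    master  <- aes256 masterKey
    dataKey <- unpad (PKCS7 16) (ecbDecrypt master (wrappedKey e))
    cipher  <- aes256 dataKey
    civ     <- makeIV (iv e)
    unpad (PKCS7 16) (cbcDecrypt cipher civ (ciphertext e))
```

Only `wrappedKey` and `iv` would need to travel with the object as its envelope; the master key never leaves the client.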
If you’re unsure about which key mechanism to use, I’d recommend using KMS initially to avoid having to store and manage your own master keys.
By default, the metadata (known as an envelope) required for encryption
(except for the master key itself) is stored as S3 object metadata on the encrypted
object. Because user-defined S3 metadata
is limited to
2 KB when sending a
PUT request (whose headers are limited to 8 KB in total), if you are utilising object
metadata for another purpose which exceeds this limit, an alternative method
of storing the encryption envelope in an adjacent S3 object is provided. This
method removes the metadata overhead at the expense of an additional HTTP request
to perform encryption/decryption. By default the library will store and retrieve
a <your-object-key>.instruction object if the related
functions are used.
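The trade-off between the two storage locations can be illustrated with a small sketch. Everything here is hypothetical and is not part of the library, which leaves the choice to the caller: the `Location` type, the function names and the idea of choosing automatically are all assumptions. The limit is passed in as a parameter, and sizes are counted per character, which only matches the byte count for ASCII names and values.

```haskell
-- | Where the encryption envelope is stored. (Hypothetical type.)
data Location
    = Metadata            -- ^ As metadata on the encrypted object itself.
    | Instruction String  -- ^ In an adjacent object with the given key.
    deriving (Eq, Show)

-- | Key of the adjacent instruction object for a given object key.
instructionKey :: String -> String
instructionKey key = key ++ ".instruction"

-- | Approximate metadata size: the summed lengths of all names and values.
metadataSize :: [(String, String)] -> Int
metadataSize = sum . map (\(name, value) -> length name + length value)

-- | Keep the envelope in object metadata while it fits next to the user's
-- own metadata, otherwise fall back to an instruction object.
chooseLocation
    :: Int                 -- ^ Metadata size limit in bytes.
    -> String              -- ^ Object key.
    -> [(String, String)]  -- ^ The user's own metadata.
    -> [(String, String)]  -- ^ The envelope entries.
    -> Location
chooseLocation limit key userMeta envelope
    | metadataSize (userMeta ++ envelope) <= limit = Metadata
    | otherwise = Instruction (instructionKey key)
```

The fallback costs one extra HTTP request per upload and per download, which is the overhead described above.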
Compatibility and Status
Metadata and instruction envelopes are designed to be compatible with the official Java AWS SDK (both V1 and V2 formats), but only a limited set of the possible encryption options is supported. Therefore, assuming defaults, objects stored with this library should be retrievable by any of the other official SDKs, and vice versa. Support for other cryptographic configurations will be added in future, as needed.
amazonka-s3-encryption can currently be considered an initial preview release.
Despite this, it’s tied to the greater release process for the other
libraries and therefore life will start somewhere after version
It is separated from
amazonka-s3 proper because there are extra dependencies
not desirable within the main S3 package, such as
conduit-combinators. This way those using unencrypted S3 operations do not
inadvertently end up with the extra dependencies.
The library is currently being used in a limited capacity and the release to Hackage will be delayed until I’m confident of correctness, robustness and compatibility aspects. If you’re brave enough to experiment, it’s contained within the greater amazonka project on GitHub. Please open an issue with any problems/suggestions or drop into the Amazonka Gitter chat if you have questions.