S3 Encryption Addons

I wanted to share an overview of a new library named amazonka-s3-encryption, which was created to supplement amazonka-s3 with client-side encryption. Client-side encryption allows transmission and storage of sensitive information (Data in Motion), whilst ensuring that Amazon never receives any of your unencrypted data. Previously amazonka-s3 only supported server-side encryption (Data at Rest), which requires transmission of unencrypted data to S3. The cryptographic techniques used within the library are modeled as closely as possible upon the official AWS SDKs, specifically the Java AWS SDK. Haddock documentation is available here.


Chunked Encoding

The version 4 signing algorithm supports two modes for signing requests when communicating with S3. The first requires a SHA256 hash of the entire payload to calculate the request signature, and the second allows incremental signature calculation over fixed or variable sized chunks of the payload. Up until now, amazonka (and all other SDKs except Java) only supported the first method.

This poses a problem for encryption, where the need to calculate the SHA256 hash of the encrypted contents requires the use of a temporary file or another buffering mechanism. For example, the aws-sdk-ruby library performs the following procedure to send an encrypted PutObject request:

This means whatever the payload size is, you have to stream/encrypt a complete copy of the payload contents to a temporary file before sending.

To avoid this same pitfall, amazonka-s3 now uses streaming signature calculation when sending requests. This removes the need for the pre-calculated SHA256 hash and allows the encryption and signing to be performed incrementally as the request is sent.
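To make the incremental scheme concrete, here is a minimal sketch of the chunk signature chain as described in the AWS Signature Version 4 streaming documentation. It is written in Python with only the standard library for illustration and is not amazonka's Haskell implementation. The seed signature is assumed to come from signing the request headers with x-amz-content-sha256 set to STREAMING-AWS4-HMAC-SHA256-PAYLOAD, and the function names are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable, Iterator, Tuple

# Hash of the empty string, a fixed field in every chunk's string to sign.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()


def signing_key(secret_key: str, date: str, region: str, service: str = "s3") -> bytes:
    """Derive the Signature Version 4 signing key via the HMAC-SHA256 chain."""
    key = ("AWS4" + secret_key).encode("utf-8")
    for part in (date, region, service, "aws4_request"):
        key = hmac.new(key, part.encode("utf-8"), hashlib.sha256).digest()
    return key


def chunk_signature(key: bytes, timestamp: str, scope: str,
                    previous: str, chunk: bytes) -> str:
    """Sign one chunk. `previous` is the seed or the prior chunk's signature."""
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256-PAYLOAD",
        timestamp,                          # e.g. 20130524T000000Z
        scope,                              # e.g. 20130524/us-east-1/s3/aws4_request
        previous,
        EMPTY_SHA256,
        hashlib.sha256(chunk).hexdigest(),  # only this chunk is hashed
    ])
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


def sign_chunks(key: bytes, timestamp: str, scope: str, seed_signature: str,
                chunks: Iterable[bytes]) -> Iterator[Tuple[bytes, str]]:
    """Yield (chunk, signature) pairs lazily, ending with the final empty chunk."""
    previous = seed_signature
    for chunk in chunks:
        if not chunk:
            continue  # the empty chunk is reserved for the terminator below
        previous = chunk_signature(key, timestamp, scope, previous, chunk)
        yield chunk, previous
    yield b"", chunk_signature(key, timestamp, scope, previous, b"")
```

Each signature depends only on the previous signature and the current chunk, so an encrypting producer can feed ciphertext into sign_chunks as it is generated, and nothing needs to be buffered to disk.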

Unfortunately, despite the documentation claiming that Transfer-Encoding: chunked is supported, it appears that you need to estimate the encrypted Content-Length (including the per-chunk signature metadata) up front and send this without the Transfer-Encoding header. Otherwise the signature calculations simply fail with the usual obtuse S3 403 response.
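The Content-Length estimate can be worked out ahead of time, because every chunk is framed in a fixed way. The sketch below assumes the chunk framing from the AWS streaming signature documentation (hex size, ";chunk-signature=", a 64 character signature, CRLF, data, CRLF, then a final zero-length chunk). It is illustrative Python, not code from amazonka. The cbc_padded_length helper additionally assumes an AES-CBC cipher with PKCS padding; other ciphers change the ciphertext length differently.

```python
SIGNATURE_HEX_LENGTH = 64  # hex-encoded HMAC-SHA256
CRLF_LENGTH = 2


def chunk_frame_length(data_length: int) -> int:
    """Bytes on the wire for one chunk: header, CRLF, data, CRLF."""
    header = f"{data_length:x};chunk-signature="
    return len(header) + SIGNATURE_HEX_LENGTH + CRLF_LENGTH + data_length + CRLF_LENGTH


def chunked_content_length(payload_length: int, chunk_size: int) -> int:
    """Total Content-Length for a payload sent in signed chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    full_chunks, remainder = divmod(payload_length, chunk_size)
    total = full_chunks * chunk_frame_length(chunk_size)
    if remainder:
        total += chunk_frame_length(remainder)
    return total + chunk_frame_length(0)  # terminating empty chunk


def cbc_padded_length(plaintext_length: int, block_size: int = 16) -> int:
    """Ciphertext length under the assumed AES-CBC with PKCS padding."""
    return (plaintext_length // block_size + 1) * block_size


# A 66560 byte payload in 64 KB chunks needs a Content-Length of 66824.
print(chunked_content_length(66560, 64 * 1024))
```

For an encrypted upload the payload length passed in would be the ciphertext length, for example chunked_content_length(cbc_padded_length(n), 128 * 1024) under the assumptions above. The unframed length is sent separately in the x-amz-decoded-content-length header.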

The smart constructors emitted by the generation step for all amazonka-* operations now take into account streaming signature support and you’re likely to encounter the following parameters for operations utilising a streaming request body:

The ToHashedBody and ToBody type classes are provided to make it easy to convert values such as JSON, ByteString, etc. into the appropriate request body. amazonka itself exports functions such as hashedFile, chunkedFile and others to assist in constructing streaming request bodies.

All regular S3 PutObject and UploadPart operations now take advantage of streaming signature calculation with the default chunk size set to 128 KB. This seems to be a decent trade-off between streaming and the expense of incrementally performing signature calculations, but I’d recommend profiling for your particular use-case if performance and allocations are a concern.

The above information is available in a more context-sensitive format within the documentation.

Encryption and Decryption

Client-side encryption of S3 objects is used to securely and safely store sensitive data in S3. When using client-side encryption, the data is encrypted before it is sent to S3, meaning Amazon does not receive your unencrypted object data. Unfortunately the object metadata (headers) still leaks, so any sensitive information should be stored within the payload itself.

The procedure for encryption is as follows:

For decryption:

If you’re unsure about which key mechanism to use, I’d recommend using KMS initially to avoid having to store and manage your own master keys.

Instruction Files

By default, the metadata (known as an envelope) required for encryption (except for the master key itself) is stored as S3 object metadata on the encrypted object. S3 limits the PUT request header to 8 KB, and user-defined metadata within it to 2 KB, so if you are already utilising object metadata for another purpose and the envelope would push you over this limit, an alternative method of storing the encryption envelope in an adjacent S3 object is provided. This method removes the metadata overhead at the expense of an additional HTTP request to perform encryption/decryption. By default the library will store and retrieve a <your-object-key>.instruction object if the related *Instruction suffixed functions are used.
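As a rough illustration of the choice between the two storage locations, the following Python sketch decides where an envelope would go based on the size of the combined metadata. It is not how amazonka-s3-encryption decides (in the library you choose by calling the *Instruction suffixed functions). The function names, the JSON encoding of the instruction object and the envelope field names in the usage are assumptions for illustration. Only the .instruction suffix comes from the text above.

```python
import json
from typing import Dict, Optional, Tuple

INSTRUCTION_SUFFIX = ".instruction"


def instruction_key(object_key: str) -> str:
    """Key of the adjacent object that would hold the envelope."""
    return object_key + INSTRUCTION_SUFFIX


def metadata_size(metadata: Dict[str, str]) -> int:
    """Sum of the UTF-8 byte lengths of every key and value."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
               for k, v in metadata.items())


def place_envelope(object_key: str, user_metadata: Dict[str, str],
                   envelope: Dict[str, str], limit_bytes: int
                   ) -> Tuple[Dict[str, str], Optional[Tuple[str, bytes]]]:
    """Return (object metadata, optional (instruction key, instruction body))."""
    combined = {**user_metadata, **envelope}
    if metadata_size(combined) <= limit_bytes:
        return combined, None  # envelope fits alongside the user's metadata
    # Assumed encoding: the envelope serialised as a JSON object.
    body = json.dumps(envelope, sort_keys=True).encode("utf-8")
    return dict(user_metadata), (instruction_key(object_key), body)


# Hypothetical envelope fields; the real names depend on the envelope version.
envelope = {"x-amz-key": "wrapped-key-base64", "x-amz-iv": "iv-base64"}
metadata, instruction = place_envelope("reports/q3.csv", {"owner": "ops"},
                                       envelope, limit_bytes=2 * 1024)
```

When instruction is not None, it would be uploaded as a second object next to the encrypted one, which is where the additional HTTP request comes from.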

Compatibility and Status

Metadata and instruction envelopes are designed to be compatible with the official Java AWS SDK (both V1 and V2 formats), but only a limited set of the possible encryption options are supported. Therefore, assuming defaults, objects stored with this library should be retrievable by any of the other official SDKs, and vice versa. Support for other cryptographic configurations will be added in future, as needed.

amazonka-s3-encryption can currently be considered an initial preview release. Despite this, it’s tied to the greater release process for the other amazonka-* libraries and therefore life will start somewhere after version 1.3.1. It is separated from amazonka-s3 proper because it has extra dependencies that are not desirable within the main S3 package, such as amazonka-kms and conduit-combinators. This way those using unencrypted S3 operations do not inadvertently end up with an amazonka-kms dependency.

The library is currently being used in a limited capacity and the release to Hackage will be delayed until I’m confident of correctness, robustness and compatibility aspects. If you’re brave enough to experiment, it’s contained within the greater amazonka project on GitHub. Please open an issue with any problems/suggestions or drop into the Amazonka Gitter chat if you have questions.