
I turned Amazon Prime Photos into an "unlimited" off-site data backup service via JPEG APP markers

It has been several years since the Amazon Cloud Drive service API went from "okay" to "completely non-functional." Since I haven't used the service for quite a while, I figured it would be an okay time to reveal how I used to use Amazon Prime Photos as an unlimited storage backup service.

It all starts with using the JPEG file format in a way it wasn't really designed for. JPEG is a chunked, structured binary file format. What this means is that the JPEG format reserves the 0xFF byte to introduce "markers," each marker begins a "chunk" (the spec calls them segments), and each chunk has a specific structure. For example, a JPEG image starts with the Start of Image (SOI) marker, which is 0xFF 0xD8, so if we see that sequence, we have found the start of a JPEG image. Similarly, the End of Image (EOI) marker is 0xFF 0xD9, which marks the end of a JPEG image. Many other markers are reserved by JPEG for specific purposes within its own file format.
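To make the marker structure concrete, here is a minimal Python sketch that walks the chunks at the front of a JPEG file. It assumes that every marker before the image data is followed by a 2-byte big-endian length that counts itself (which is how the JPEG spec lays out APPn and the other header segments), and it stops at the Start of Scan marker because the compressed image data that follows is not length-prefixed. The function name `iter_segments` is made up for illustration.

```python
import struct

SOI = b"\xff\xd8"  # Start of Image
EOI = 0xD9         # End of Image (second byte of the marker)
SOS = 0xDA         # Start of Scan (compressed image data follows)


def iter_segments(data: bytes):
    """Yield (marker_code, payload) for each header segment of a JPEG."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = data[pos + 1]
        if code == 0xFF:  # fill byte before a marker; skip it
            pos += 1
            continue
        if code == EOI:
            return
        if pos + 4 > len(data):
            raise ValueError("truncated segment header")
        # The length counts its own two bytes but not the marker.
        # (Length-less markers such as RSTn only occur after SOS.)
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield code, data[pos + 4:pos + 2 + length]
        if code == SOS:
            return  # entropy-coded data is not length-prefixed
        pos += 2 + length


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as fh:
        for code, payload in iter_segments(fh.read()):
            print(f"0xFF 0x{code:02X}  {len(payload)} bytes")
```

Running it against a photo prints one line per segment, which is a quick way to see which APP markers a given file already uses.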

The JPEG file format specification also allows for what it calls custom application markers. These are defined as APP0-APP15 and are encoded as 0xFF 0xE0 through 0xFF 0xEF (that is, 0xFF 0xEn, where n is the APP number in hex). This is where things get interesting. While some APP markers are in fairly common use (APP0 for JFIF and APP1 for Exif, for example), a few are rarely used. APP5, for example, is relatively unused. But instead of just picking an APP marker and irresponsibly claiming it, we're going to start the data with a null-terminated identifier string so that the marker can be extended indefinitely should someone else want to do something with APP5:

0xFF 0xD8 (SOI)
...
0xFF 0xE5  (APP5)
stor-cbdat\x00  (The data coming next is stored, encrypted data for Cloud Backup)
~10MB payload of compressed, encrypted data  (all 0xFF's get replaced with 0xFF 0x00's as per the JPEG spec)
0xFF 0xD9 (EOI)
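One detail the layout above leaves out: in the JPEG spec, every APPn marker is followed by a 2-byte big-endian length that counts itself, which caps a single segment at 65,533 bytes of data. A spec-conformant writer would therefore have to spread a ~10MB payload over many APP5 segments. The following is a minimal Python sketch of that approach under those assumptions. It is not the code from Cloud Backup, the function names are hypothetical, and it relies on the length field rather than on escaping 0xFF bytes.

```python
import struct

SOI = b"\xff\xd8"
APP0 = b"\xff\xe0"
APP5 = b"\xff\xe5"
TAG = b"stor-cbdat\x00"        # identifier string from the layout above
MAX_SEGMENT_DATA = 0xFFFF - 2  # 16-bit length field counts its own 2 bytes


def build_app5_segments(payload: bytes, tag: bytes = TAG) -> bytes:
    """Split payload into APP5 segments, each starting with the tag."""
    chunk_size = MAX_SEGMENT_DATA - len(tag)
    out = bytearray()
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        length = 2 + len(tag) + len(chunk)
        out += APP5 + struct.pack(">H", length) + tag + chunk
    return bytes(out)


def embed(jpeg: bytes, payload: bytes, tag: bytes = TAG) -> bytes:
    """Return a copy of jpeg with the payload stored in APP5 segments."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    # JFIF expects its APP0 segment right after SOI, so keep it first.
    if jpeg[pos:pos + 2] == APP0:
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + length
    return jpeg[:pos] + build_app5_segments(payload, tag) + jpeg[pos:]
```

With an 11-byte identifier, each segment carries 65,522 bytes of payload, so a 10MB payload ends up in roughly 160 segments placed ahead of the real image data.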

Most JPEG libraries will ignore or pass through unknown custom application markers, but most, if not all, image libraries will still verify that the file is a real JPEG image. Shoving 10MB of extra data into APP markers while keeping the image valid therefore almost certainly requires writing a custom image processing library.

The first step was to verify two things. First, did Amazon's servers modify uploaded JPEG images in any way? Second, did they count the JPEG images as real images and flag them as valid "photos" for the "unlimited" side of the service, or did they count against the 5GB data file limit? Those were pretty simple questions to answer: grab a JPEG, stuff some test data into an APP5 marker, verify that Photoshop still opens the file, upload the file, download the uploaded file, and check whether the downloaded file is identical to the original. Obviously, it worked or this post wouldn't exist.
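The "is the downloaded file identical?" check is easy to script. Here is a small Python sketch that compares SHA-256 hashes of the two files. The function names and file names are placeholders, and any byte-for-byte comparison would do the same job.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1MB blocks so large files don't need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def survived_round_trip(original_path, downloaded_path):
    """True if the service returned the upload byte for byte."""
    return sha256_of(original_path) == sha256_of(downloaded_path)


if __name__ == "__main__":
    # Placeholder file names for the test image before and after upload.
    print(survived_round_trip("test-original.jpg", "test-downloaded.jpg"))
```

If the service re-encodes or strips metadata from uploads, the hashes differ and the trick is dead on arrival.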

The implementation used in Cloud Backup went a slight step further. The ApplyPhoto() function in the code went out of its way to make the uploaded images fairly unique but also be real images. First, a single image was retrieved from Unsplash (a free photo site with a simple API). For each payload, the previously retrieved image was then drawn on with the current timestamp so the actual image data portion of the file would be unique. Then it dumped the real compressed, encrypted data payload into the APP5 marker and uploaded the final ~10MB photo to Amazon Prime Photos. At the time, it worked fine.

Cloud Backup used to support Amazon Prime Photos via the Amazon Cloud Drive API until the API stopped working altogether. Let's just say that Amazon isn't all that great at designing APIs and that there are much better alternatives to using their services, especially for something as critical as data backups, and I'll leave it at that. But for a couple of years, I was successfully storing ~150GB of compressed, encrypted data inside JPEG images within Amazon Prime Photos using CubicleSoft Cloud Backup because unlimited photo storage was, well, unlimited! In case you are excited about this, let me burst that bubble: This solution is actually fairly dead because the Amazon Cloud Drive API is broken (by design). For backup, you are better off using Cloud Backup with Remoted API Server to route backup data to a friend's house via a public Digital Ocean instance.

The real purpose of this post is two-fold: First, many chunked, structured binary file formats like JPEG and PNG can be extended and used in unexpected ways, as can be seen in this 23-minute video on structured binary file formats.

Second, I'm hoping this gets your wheels of inspiration rolling. It is, for example, possible to embed audio into a JPEG image using the same APP5 marker like this:

0xFF 0xD8 (SOI)
...
0xFF 0xE5  (APP5)
audio-flac\x00  (The data coming next is audio in the FLAC file format)
Encoded audio data  (all 0xFF's get replaced with 0xFF 0x00's as per the JPEG spec)
0xFF 0xD9 (EOI)

No native software currently exists to play such audio stored inside a JPEG. But that doesn't mean JavaScript inside a web browser couldn't load the file, extract those bits of audio data, and then start playing them. Why would anyone want to do this? Well, it's likely the browser caches the image, so the audio is already available locally instead of being part of a separate HTTP request. It's also a single file instead of multiple files. I dunno. Pick your favorite reason and go with that. The point is that this idea is quite extensible.
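The extraction side is short. Here is a sketch in Python (a browser version would do the same walk over an ArrayBuffer in JavaScript). It assumes the data was written as one or more length-prefixed APP5 segments placed before the image data, each starting with the identifier string. The function name `extract_tagged` is hypothetical.

```python
import struct


def extract_tagged(jpeg: bytes, tag: bytes, marker: int = 0xE5) -> bytes:
    """Concatenate the data of every APPn segment that starts with tag."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    out = bytearray()
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        code = jpeg[pos + 1]
        if code == 0xFF:          # fill byte; skip it
            pos += 1
            continue
        if code in (0xD9, 0xDA):  # EOI or SOS: no more header segments
            break
        # The 2-byte length counts itself but not the marker.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        body = jpeg[pos + 4:pos + 2 + length]
        if code == marker and body.startswith(tag):
            out += body[len(tag):]
        pos += 2 + length
    return bytes(out)


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as fh:
        audio = extract_tagged(fh.read(), b"audio-flac\x00")
    with open("extracted.flac", "wb") as fh:
        fh.write(audio)
```

Because each segment carries the identifier, APP5 segments written by someone else for a different purpose are skipped rather than mixed into the output.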

Another idea I've had before is to use APP5 to embed a PNG or SVG inside a JPEG so that the JPEG image doesn't show visual artifacting around the edges of written text. That is, get the small file size of a JPEG for a photo with the clarity of text and vectors in a PNG/SVG all within one file. I actually came up with the "embed a PNG or some audio into a JPEG" idea BEFORE the "let's cram 10MB of encrypted data into a photo file and see what happens" idea. Really, though, the possibilities are endless.
