
This is my first question here as I'm fairly new to this world! I've spent a few days trying to figure this out for myself, but haven't so far been able to find any useful info.

I'm trying to retrieve a byte range from a file stored in S3, using something like:

S3Key.get_contents_to_file(tempfile, headers={'Range': 'bytes=0-100000'})

The file that I'm trying to retrieve from is a video file, specifically an MXF. When I request a byte range, the temp file ends up with more data than I requested. For example, using one file, I request 100,000 bytes and get back 100,451.
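One detail worth ruling out in any ranged request: HTTP byte ranges are inclusive at both ends, so `bytes=0-100000` asks for 100,001 bytes rather than 100,000. A minimal sketch of a helper that builds the header from a start offset and a length (the name `range_header` is hypothetical and not part of Boto):

```python
def range_header(start, length):
    """Build an HTTP Range header dict covering `length` bytes from `start`.

    HTTP byte ranges are inclusive at both ends, so the last byte
    position is start + length - 1.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return {'Range': 'bytes=%d-%d' % (start, start + length - 1)}


# Requesting the first 100,000 bytes:
print(range_header(0, 100000))  # {'Range': 'bytes=0-99999'}
```

The returned dictionary has the same shape as the `headers` argument already passed to `get_contents_to_file` in this question. An off-by-one here would account for a single extra byte at most, so it does not explain 451 extra bytes.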

One thing to note about MXF files is that they legitimately contain 0x0A (ASCII line feed) and 0x0D (ASCII carriage return).

I had a dig around and it appears that any time a 0A byte is present in the file, the retrieved data contains 0D 0A instead of just 0A, so the download appears to contain more data than requested.

As an example, the original file contains the hex string:

02 03 00 00 00 00 3B 0A 06 0E 2B 34 01 01 01 05

But the file downloaded from S3 has:

02 03 00 00 00 00 3B 0D 0A 06 0E 2B 34 01 01 01 05
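The two dumps differ in exactly the way a newline translation would produce: the single 0A has become 0D 0A. A small sketch, written for Python 3 and using only the sample bytes above, that applies that translation by hand and counts the bytes it adds (the helper name `simulate_text_mode_write` is made up for illustration):

```python
ORIGINAL = bytes.fromhex('02 03 00 00 00 00 3B 0A 06 0E 2B 34 01 01 01 05')
DOWNLOADED = bytes.fromhex('02 03 00 00 00 00 3B 0D 0A 06 0E 2B 34 01 01 01 05')


def simulate_text_mode_write(data):
    """Replace every LF (0x0A) with CR LF (0x0D 0x0A)."""
    return data.replace(b'\n', b'\r\n')


translated = simulate_text_mode_write(ORIGINAL)
extra_bytes = len(translated) - len(ORIGINAL)

print(translated == DOWNLOADED)              # True
print(extra_bytes == ORIGINAL.count(b'\n'))  # True
```

If the same translation is being applied to the whole download, the 451 additional bytes would correspond to 451 bytes with value 0A in the first 100,000 bytes of the file.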

I've tried to debug the code and work my way through the Boto logic, but I'm relatively new at this, so I get lost very easily.

I created this for testing, which shows the issue:

from boto.s3.connection import S3Connection

from boto.s3.connection import Location

from boto.s3.key import Key

import boto

import os

## AWS credentials

AWS_ACCESS_KEY_ID = 'access key'

AWS_SECRET_ACCESS_KEY = 'secret key'

## Bucket name and path to file

bucketName = 'bucket name'

filePath = 'path/to/file.mxf'

#Local temp file to download to

tempFilePath = 'c:/tmp/tempfile'

## Setup the S3 connection and create a Key to access the file specified

## in filePath

conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

bucket = conn.get_bucket(bucketName)

S3Key = Key(bucket)

S3Key.key = filePath

def testRangeGet(bytesToRead=100000): # default read of 100K

    tempfile = open(tempFilePath, 'w')

    rangeString = 'bytes=0-' + str(bytesToRead -1)  #create byte range as string

    rangeDict = {'Range': rangeString} # add this to the dictionary

    S3Key.get_contents_to_file(tempfile, headers=rangeDict) # using Boto

    tempfile.close() # flush buffered data to disk before measuring the file


    bytesRead = os.path.getsize(tempFilePath)

    print 'Bytes requested = ' + str(bytesToRead)

    print 'Bytes received = ' + str(bytesRead)

    print 'Additional bytes = ' + str(bytesRead - bytesToRead)

I guess there is something in the Boto code that is looking out for certain ASCII escape characters and modifying them, and I can't find any way to specify to just treat it as a binary file.

Has anyone had a similar problem and can share a way around it?

1 Answer

0 votes
by (44.4k points)

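The output file needs to be opened in binary mode:

tempfile = open(tempFilePath, 'wb')

This is only necessary on Windows, where a file opened in text mode has every 0x0A (line feed) byte written out as 0x0D 0x0A. Unix systems don't translate anything, whether the file is opened in text or binary mode.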
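To see the difference between the two modes without a Windows machine, here is a rough Python 3 sketch. It assumes that passing `newline='\r\n'` to `open()` is a fair stand-in for what Windows text mode does on write. The payload and file names are invented for the demonstration.

```python
import os
import tempfile

payload = bytes([0x3B, 0x0A, 0x06, 0x0E, 0x0A, 0x2B])  # contains two LF bytes

with tempfile.TemporaryDirectory() as tmp_dir:
    text_path = os.path.join(tmp_dir, 'text_mode.bin')
    binary_path = os.path.join(tmp_dir, 'binary_mode.bin')

    # Text mode with newline='\r\n' rewrites each '\n' as '\r\n' on write.
    # latin-1 maps bytes 0-255 one-to-one, so nothing else is altered.
    with open(text_path, 'w', encoding='latin-1', newline='\r\n') as f:
        f.write(payload.decode('latin-1'))

    # Binary mode writes the bytes exactly as given.
    with open(binary_path, 'wb') as f:
        f.write(payload)

    text_size = os.path.getsize(text_path)
    binary_size = os.path.getsize(binary_path)

print(len(payload), binary_size, text_size)  # 6 6 8
```

The binary-mode file matches the payload length, while the text-mode file grows by one byte per line feed, which is the same pattern as the oversized download in the question.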

asked Jul 23, 2019 in AWS by yuvraj (19.1k points)
