in AWS by (19.1k points)

I have a large local file. I want to upload a gzipped version of that file into S3 using the boto library. The file is too large to gzip efficiently on disk before uploading, so it should be gzipped in a streaming fashion during the upload.

The boto library provides a method set_contents_from_file() which expects a file-like object that it will read from.

The gzip library provides the class GzipFile, which accepts a file-like object via the parameter named fileobj; it will write to this object when compressing.

I'd like to combine these two, but one API wants to read on its own and the other wants to write on its own; neither offers a passive counterpart (being written to or being read from).
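To make the mismatch concrete, here is the write-push side in isolation: GzipFile deposits compressed bytes into a passive BytesIO buffer each time it is written to (a minimal stdlib-only sketch, no S3 involved):

import gzip
from io import BytesIO

# GzipFile never reads on its own here: it pushes compressed
# bytes into the passive BytesIO buffer on each write() call.
buf = BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
    gz.write(b"hello world " * 1000)

compressed = buf.getvalue()
# The buffer now holds a complete gzip stream, smaller than the input.
assert gzip.decompress(compressed) == b"hello world " * 1000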

Does anybody have an idea on how to combine these in a working fashion?

1 Answer

by (44.4k points)

Using Python 3, this would work:

from io import BytesIO
import gzip

def sendFileGz(bucket, key, fileName, suffix='.gz'):
    key += suffix
    mpu = bucket.initiate_multipart_upload(key)
    stream = BytesIO()
    compressor = gzip.GzipFile(fileobj=stream, mode='w')

    def uploadPart(partCount=[0]):
        partCount[0] += 1
        stream.seek(0)
        mpu.upload_part_from_file(stream, partCount[0])
        stream.seek(0)   # rewind and empty the buffer for the next part
        stream.truncate()

    with open(fileName, "rb") as inputFile:
        while True:  # until EOF
            chunk = inputFile.read(8192)
            if not chunk:  # EOF?
                compressor.close()  # flush remaining bytes and the gzip trailer
                uploadPart()
                mpu.complete_upload()
                break
            compressor.write(chunk)
            if stream.tell() > 10<<20:  # min size for multipart upload is 5242880
                uploadPart()

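Note that each uploaded part is just a slice of one continuous gzip stream; S3 concatenates the parts on complete_upload(), so the final object decompresses cleanly. The buffering trick can be sketched with the stdlib alone (gzipInParts and emitPart are illustrative names, no S3 involved):

import gzip
from io import BytesIO

def gzipInParts(data, partSize=5000):
    """Compress data through a BytesIO buffer, cutting off a 'part'
    whenever the buffer exceeds partSize (stands in for uploadPart)."""
    parts = []
    stream = BytesIO()
    compressor = gzip.GzipFile(fileobj=stream, mode='wb')

    def emitPart():
        stream.seek(0)
        parts.append(stream.read())
        stream.seek(0)       # rewind and empty the buffer
        stream.truncate()

    for i in range(0, len(data), 1024):
        compressor.write(data[i:i + 1024])
        if stream.tell() > partSize:
            emitPart()
    compressor.close()       # flush the gzip trailer into the buffer
    emitPart()
    return parts

data = bytes(range(256)) * 400        # ~100 KiB of sample input
parts = gzipInParts(data)
# Concatenating the parts reproduces the full gzip stream.
assert gzip.decompress(b"".join(parts)) == data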
