
I have an S3 bucket, and every 3 hours I get a file in the bucket with a timestamp attached to it. I'm using a Glue job to move the files from S3 to Redshift with some transformations. The Glue job uses the table created in the Data Catalog via a crawler as the input.

First, run: 

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "test", table_name = "employee_623215", transformation_ctx = "datasource0")

After three hours, if I get one more file in the S3 bucket, should I crawl it again?

Is there any way to have a single table in the Data Catalog that gets updated with the latest S3 file and can be used by the Glue job for processing, or do I need to run the crawler every time to get the latest data?

1 Answer


As per my understanding, an alternative approach is to read directly from S3 instead of reading from the catalog, and process the data in the Glue job.

You can use the below method to read directly from S3 in the Glue job:

glueContext.create_dynamic_frame.from_options(connection_type, connection_options={}, format=None, format_options={}, transformation_ctx="")
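
For instance, here is a minimal sketch of reading the files directly from S3 (the bucket path s3://my-bucket/employee/ and the CSV format are assumptions; adjust them to match your data). With job bookmarks enabled on the job, the transformation_ctx lets Glue pick up only files it has not processed on previous runs:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read straight from S3, bypassing the Data Catalog table.
# NOTE: the path and format below are placeholders for illustration.
datasource0 = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/employee/"], "recurse": True},
    format="csv",
    format_options={"withHeader": True},
    # With job bookmarks enabled, this context key lets Glue skip
    # files that were already processed in earlier runs.
    transformation_ctx="datasource0",
)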

For more details, you can read the AWS Glue documentation on create_dynamic_frame.from_options.

