AWS Glue interview questions - 4

16. What are the main components of AWS Glue?
AWS Glue consists of a Data Catalog, which is a central metadata repository; an ETL engine that can automatically generate Scala or Python code; and a flexible scheduler that handles dependency resolution, job monitoring, and retries.

17. How do you process MS Excel files using Glue?
As of now, Glue crawlers don't support MS Excel files. To create a table for an Excel file, first convert it to CSV, JSON, or Parquet, and then run the crawler on the newly created file.
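
The conversion step can be sketched as follows. This is a minimal example, not a prescribed method: it assumes pandas (with openpyxl for .xlsx files) is installed, and the function names and file paths are hypothetical.

```python
from pathlib import Path


def csv_path_for(excel_path):
    """Return the path of the CSV file to write next to the Excel file."""
    return Path(excel_path).with_suffix(".csv")


def excel_to_csv(excel_path, sheet_name=0):
    """Convert one worksheet to CSV so a crawler can read the result.

    Assumption: pandas and openpyxl are available where this runs.
    """
    import pandas as pd  # imported lazily; only needed for the conversion

    frame = pd.read_excel(excel_path, sheet_name=sheet_name)
    target = csv_path_for(excel_path)
    frame.to_csv(target, index=False)  # omit the pandas index column
    return target
```

After uploading the resulting CSV to S3, the crawler can be pointed at that location.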

18. Explain the AWS Glue Data Catalog.
The AWS Glue Data Catalog is a central repository that stores structural and operational metadata for all your data assets. For a given data set, you can store its table definition and physical location, add business-relevant attributes, and track how the data has changed over time.

19. What are AWS Glue triggers?
When fired, a trigger can start specified jobs and crawlers. A trigger fires on demand, based on a schedule, or based on a combination of events. A trigger exists in one of several states: CREATED, ACTIVATED, or DEACTIVATED. There are also transitional states, such as ACTIVATING. To temporarily stop a trigger from firing, you can deactivate it and reactivate it later.
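
As an illustration of a scheduled trigger and of deactivating and reactivating it, here is a hedged sketch using boto3. It assumes boto3 is installed and AWS credentials are configured; the helper names, trigger name, and job name are hypothetical.

```python
def scheduled_trigger_request(name, job_name, cron_expression):
    """Build the keyword arguments for a SCHEDULED trigger that starts one job."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron_expression})",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,  # activate the trigger as soon as it is created
    }


def create_trigger(request):
    """Create the trigger in Glue (assumes boto3 and valid credentials)."""
    import boto3

    boto3.client("glue").create_trigger(**request)


def pause_trigger(name):
    """Deactivate a trigger so that it stops firing."""
    import boto3

    boto3.client("glue").stop_trigger(Name=name)


def resume_trigger(name):
    """Reactivate a previously deactivated trigger."""
    import boto3

    boto3.client("glue").start_trigger(Name=name)
```

For example, scheduled_trigger_request("nightly-load", "my-etl-job", "0 2 * * ? *") describes a trigger that would start the job every day at 02:00 UTC.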

20. Which argument names does AWS Glue use internally that you can't set?
--conf
--debug
--mode
--JOB_NAME
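
A simple guard against passing these names by accident might look like the sketch below. The set contains only the four names listed above, and the helper name is hypothetical.

```python
# Argument names from the list above that Glue reserves for internal use.
RESERVED_ARGUMENTS = frozenset({"--conf", "--debug", "--mode", "--JOB_NAME"})


def find_reserved(arguments):
    """Return the reserved names present in a job-arguments dict, sorted."""
    return sorted(key for key in arguments if key in RESERVED_ARGUMENTS)


def check_job_arguments(arguments):
    """Raise ValueError if any reserved argument name is being set."""
    clashes = find_reserved(arguments)
    if clashes:
        raise ValueError(f"Reserved Glue arguments cannot be set: {clashes}")
    return arguments
```

Running such a check before submitting job arguments catches the mistake early instead of at job start.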
