Which data formats are supported by AWS Glue?


AWS Glue is a fully managed extract, transform, and load (ETL) service that supports a variety of data formats to accommodate different data processing needs. The correct answer lists JSON, CSV, Parquet, and ORC, all of which AWS Glue supports.

JSON is widely used because it can represent nested as well as flat structures, which makes it popular for web applications. CSV stores tabular data as plain text and is commonly used for data exchange. Parquet is a columnar storage format optimized for big data processing frameworks, including those in the AWS ecosystem, and provides efficient compression and encoding schemes. ORC (Optimized Row Columnar) is also a columnar storage format designed for efficient querying; it was originally developed for Apache Hive.
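As a rough illustration of how these four formats show up in a Glue job, the sketch below builds the keyword arguments one might pass to `create_dynamic_frame.from_options`. The helper name `glue_read_options`, the extension-to-format mapping, and the example bucket path are assumptions made for this example; the `format` values (`json`, `csv`, `parquet`, `orc`) and the `withHeader` CSV option are the ones Glue documents for S3 sources.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to the value of Glue's `format` parameter.
GLUE_FORMATS = {
    ".json": "json",
    ".csv": "csv",
    ".parquet": "parquet",
    ".orc": "orc",
}


def glue_read_options(s3_path: str) -> dict:
    """Build keyword arguments for create_dynamic_frame.from_options (illustrative helper)."""
    suffix = PurePosixPath(s3_path).suffix.lower()
    if suffix not in GLUE_FORMATS:
        raise ValueError(f"no Glue format mapped for extension {suffix!r}")
    options = {
        "connection_type": "s3",
        "connection_options": {"paths": [s3_path]},
        "format": GLUE_FORMATS[suffix],
    }
    if suffix == ".csv":
        # Treat the first row as column names; an assumption about the input file.
        options["format_options"] = {"withHeader": True}
    return options


# Inside a Glue job (requires the awsglue library, so not runnable locally):
# frame = glue_context.create_dynamic_frame.from_options(
#     **glue_read_options("s3://example-bucket/sales.csv")
# )
```

The mapping is deliberately limited to the four formats named in the answer, so a document format such as DOCX or PDF raises an error instead of being passed to Glue.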

The other options either cover too narrow a set of formats or include formats that fall outside AWS Glue's ETL focus. For instance, HTML is a markup format for presentation rather than a data format, and although AWS Glue can read XML, it is less common in modern data pipelines than JSON or CSV. Meanwhile, formats like DOCX or PDF are typically used for document storage rather than for the ETL tasks that AWS Glue excels at. Thus, the breadth of supported formats in option B reflects AWS Glue's
