For querying a subset of a large .csv file stored in Amazon S3 Glacier, which method is the most cost-effective?


Querying directly with Amazon S3 Select is the most cost-effective way to retrieve a subset of data from a large .csv file stored in Amazon S3 Glacier. S3 Select lets you retrieve only the data you need from an object stored in S3, which significantly reduces the amount of data scanned and returned and, in turn, lowers query and data transfer costs.

S3 Select applies a SQL expression to the object on the service side and returns only the rows and columns that match, rather than retrieving the entire file and filtering it afterward. This targeted approach keeps costs down, especially with large datasets, because it minimizes the amount of data read and transferred.
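As a rough illustration of that filtering step, here is a minimal sketch in Python. It assumes the caller passes in a client created with `boto3.client("s3")`, that the .csv file has a header row, and that the object is in a state S3 can read. The helper names `build_select_request` and `run_select` are hypothetical, as are any bucket, key, and column names you supply.

```python
def build_select_request(bucket, key, sql):
    """Build keyword arguments for the S3 client's select_object_content call.

    Assumes the object is an uncompressed CSV file whose first line is a header.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        # FileHeaderInfo="USE" lets the SQL expression refer to columns by name.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


def run_select(s3_client, request):
    """Run the query and return (matching_rows_as_text, stats_details).

    s3_client is assumed to be a boto3 S3 client (or an object with the
    same select_object_content method).
    """
    response = s3_client.select_object_content(**request)
    chunks = []
    stats = {}
    # The response payload is a stream of events; only some carry row data.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
        elif "Stats" in event:
            # Details holds BytesScanned, BytesProcessed and BytesReturned.
            stats = event["Stats"]["Details"]
    # Join before decoding so a character split across chunks decodes cleanly.
    return b"".join(chunks).decode("utf-8"), stats
```

Comparing `BytesReturned` with `BytesScanned` in the returned stats gives a sense of how much less data leaves S3 than a full download would transfer.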

Other options may incur higher costs in various ways. Amazon Athena charges for the data it scans in S3, and although it handles large datasets efficiently, it is not as direct or economical as S3 Select for small subset queries. Loading the data with Amazon S3 Glacier Select does not fit this scenario, because working with Glacier directly normally incurs retrieval costs, which can be higher and less efficient than a targeted query through S3 Select. Lastly, querying with Amazon Redshift Spectrum can also increase costs because of its underlying infrastructure and data retrieval methods, which are
