Athena vs. Macie
Introduction
Athena is a query service that analyzes data in Amazon S3 using standard SQL. It is serverless, so there is no infrastructure to manage, and you pay only for the queries you run.
Macie is a security service that uses machine learning to automatically detect, classify, and protect sensitive data in AWS. It helps to protect the personal information you store in S3.
Athena
Amazon Athena is an interactive query service that makes it easy to analyze data in S3 using standard SQL. Its key characteristics are:
- It allows you to quickly query unstructured, semi-structured, and structured data stored in S3
- It is a serverless service
- It is designed to provide fast performance for large data sets
- It inherits the durability of Amazon S3, its underlying data store, which is designed for 99.999999999% (11 nines) durability
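Because Athena charges per query rather than per server, the cost of a query follows from the amount of data it scans. The sketch below illustrates that arithmetic. The price of USD 5.00 per TB scanned and the 10 MB minimum per query are assumptions used for illustration, and `estimate_query_cost` is a hypothetical helper, so check the current pricing page before relying on the numbers.

```python
# Assumed list price: USD 5.00 per TB scanned, with a 10 MB minimum per query.
# These figures are illustrative assumptions, not taken from this article.
PRICE_PER_TB_USD = 5.00
MIN_BYTES_PER_QUERY = 10 * 1024**2
BYTES_PER_TB = 1024**4


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of one query from the bytes it scanned."""
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


# A query that scans 1 TB of raw data.
full_scan = estimate_query_cost(1024**4)
# The same query when a columnar, compressed layout cuts the scan to a tenth.
columnar = estimate_query_cost(1024**4 // 10)

print(f"full scan: {full_scan:.2f} USD, columnar: {columnar:.2f} USD")
```

Under these assumptions, scanning less data (for example by storing it as compressed Parquet or ORC) lowers the bill in direct proportion.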
Features
- Presto and Hive: Athena provides full standard SQL support by using Presto, and it supports DDL by using Hive
- Data Formats: Athena supports CSV, TSV, JSON, text files, ORC, and Parquet. It also supports data compressed with Snappy, Zlib, LZO, and GZIP
- Integration with AWS Glue: AWS Glue is a fully managed ETL (Extract, Transform, and Load) service that can categorize your data, clean it, enrich it, and move it reliably between various data stores. AWS Glue crawlers automatically infer the database and table schema from your data source and store the associated metadata in the AWS Glue Data Catalog. A table created by a crawler can then be queried in Athena
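To make the split between Hive-style DDL and standard SQL concrete, here is a minimal sketch that builds both kinds of statement as strings. The database, table, column, and bucket names are hypothetical, and `build_external_table_ddl` is a helper invented for this example rather than part of any AWS SDK.

```python
def build_external_table_ddl(database, table, columns, location, stored_as="PARQUET"):
    """Return a Hive-style CREATE EXTERNAL TABLE statement for data in S3."""
    column_defs = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_defs}\n"
        f")\n"
        f"STORED AS {stored_as}\n"
        f"LOCATION '{location}'"
    )


# Hypothetical schema for order records stored as Parquet files.
ddl = build_external_table_ddl(
    database="sales_db",
    table="orders",
    columns=[("order_id", "string"), ("amount", "double")],
    location="s3://example-bucket/orders/",
)

# Once the table exists, it is queried with ordinary SQL.
query = "SELECT order_id, amount FROM sales_db.orders WHERE amount > 100 LIMIT 10"

print(ddl)
print(query)
```

Both statements could then be run from the Athena console or submitted programmatically. The DDL only registers metadata, and the files themselves stay where they are in S3.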
Athena vs. Other AWS Services
Let’s compare Athena with Redshift, EMR, S3 Select, and Glacier Select.
Redshift
Redshift is used for enterprise reporting and BI workloads that need fast results from complex SQL queries. It is not serverless, as it needs running nodes. Its data is typically drawn from a number of different sources.
EMR
EMR is used to run distributed processing frameworks such as Hadoop, Spark, and HBase. It is a flexible service: the user can run custom applications and code, choose the amount of compute, memory, and storage required, and take advantage of Spot Instances. EMR can also run machine learning and data transformation workloads.
S3 Select
S3 Select retrieves a subset of data from an object using simple SQL expressions. Because only the requested data is returned, the application transfers and processes less data, which improves its performance.
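As an illustration of what such a SQL expression looks like, the sketch below assembles the parameters for an S3 Select request against a CSV object with a header row. The bucket, key, and column names are hypothetical, and `build_s3_select_request` is a helper written for this example. The dictionary keys follow the parameters of boto3's `select_object_content` call, which is assumed here as the client API.

```python
def build_s3_select_request(bucket, key, expression):
    """Assemble keyword arguments for an S3 Select call on a CSV object."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Treat the first CSV line as column names so they can be used in SQL.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


request = build_s3_select_request(
    bucket="example-bucket",
    key="orders/2020/orders.csv",
    expression=(
        "SELECT s.order_id, s.amount FROM S3Object s "
        "WHERE CAST(s.amount AS FLOAT) > 100"
    ),
)

# With credentials configured, the request could be sent like this (not run here):
#   import boto3
#   response = boto3.client("s3").select_object_content(**request)
print(request["Expression"])
```

Only the rows and columns named in the expression come back over the network, which is where the performance gain comes from.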
Glacier Select
Glacier Select runs standard SQL statements directly against an object archived in Glacier and returns only the matching data, so the object does not have to be restored to S3 first. Depending on the retrieval tier, results can be available within minutes.
Macie
Amazon Macie is a security and compliance service that sits within the Security, Identity & Compliance category of the AWS Management Console. Its main function is to automatically detect, classify, and identify the data stored in your AWS account. Macie uses machine learning to review that data continuously as different actions are taken within the account.
Features
Amazon Macie offers a number of features.
- Amazon Macie automatically detects and classifies new data as it is stored in Amazon S3
- Using machine learning and artificial intelligence, the service learns the access patterns to your data over a period of time
- Amazon Macie also uses Natural Language Processing methods to help classify different data types and content
- Amazon Macie can monitor and discover security changes, and it can identify specific security-centric data, such as access keys held within an S3 bucket
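To give a feel for what "identifying an access key in stored data" means, here is a deliberately simple sketch. It is not how Macie works internally: Macie is a managed, machine-learning-based service, while this example swaps in a plain regular-expression scan. The pattern assumes that long-term AWS access key IDs are 20 characters beginning with `AKIA`, and `find_access_key_ids` is a hypothetical helper.

```python
import re

# Assumption: an access key ID is "AKIA" followed by 16 uppercase letters or digits.
ACCESS_KEY_ID_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_access_key_ids(text):
    """Return every substring of text that looks like an AWS access key ID."""
    return ACCESS_KEY_ID_PATTERN.findall(text)


# Contents of a hypothetical config file that was uploaded to a bucket.
sample = "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\nregion = us-east-1\n"

matches = find_access_key_ids(sample)
if matches:
    print(f"found {len(matches)} possible access key ID(s)")
```

A pattern scan like this only flags candidates by their shape. A service such as Macie adds classification, access-pattern analysis, and alerting on top of detection.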
Application of Amazon Macie Service
- It provides dashboards, reports, and alerts (for example, in case of unauthenticated access)
- It works directly with S3 (support for other data stores is expected in the future)
- It integrates with CloudTrail to analyze CloudTrail logs
- It is useful for PCI DSS compliance and helps to prevent identity theft
Amazon Athena vs. Amazon Macie
Amazon Athena | Amazon Macie |
Amazon Athena is a data analysis tool | Amazon Macie is a managed service for monitoring data access activity, for example in S3 buckets |
It is an interactive query service that analyzes data directly in Amazon S3 | It uses AI to recognize whether S3 contains data such as PII (Personally Identifiable Information) |
Amazon Athena is a serverless, read-only service | Amazon Macie recognizes sensitive data and provides a dashboard with alerts |
Amazon Athena is simple to use: point it at data in S3, define the schema, and query it with SQL | Amazon Macie is a security service that uses machine learning to automatically discover, classify, and protect sensitive data in AWS |
Amazon Athena is a highly available service that can be integrated with Redshift or EMR | Amazon Macie is available in the US East (N. Virginia, us-east-1) and US West (Oregon, us-west-2) Regions. It protects data stored in S3 buckets |
Conclusion
In this blog, the Amazon Athena and Amazon Macie services are discussed. Amazon Athena is a data analysis tool used to analyze raw data stored in S3 buckets. The data can be structured, semi-structured, or unstructured. No data loading or transformation is needed to use Amazon Athena, because it queries the data in place and supports multiple data formats. Amazon Macie, by contrast, is a fully managed service that automatically detects and classifies the data stored in Amazon S3. It uses machine learning and artificial intelligence to recognize sensitive data. With Amazon Macie, users can easily understand where sensitive data is located and how it is accessed inside their environment.