News
AWS Glue Adds Functionality To Detect Data Anomalies
In preview since last November, a new anomaly detection capability in AWS Glue is now generally available.
AWS Glue is Amazon’s automated extract, transform and load (ETL) solution that aims to reduce the amount of time organizations spend refining their data for machine learning and analytics projects. With Glue, organizations can build data integration pipelines that not only transform and move data, but also enforce data quality based on preset rules.
The issue, AWS argues, is that these rules are not easily updated to detect outlier data caused by seasonality or emerging business trends. An organization’s data needs may evolve over time, but the rules governing their ETL processes often remain static.
The new data anomaly detection capability, released earlier this month as part of the AWS Glue Data Quality feature, addresses this problem using machine learning.
“Although data quality static and dynamic rules are very useful, they can’t capture data seasonality and how data changes as your business evolves,” AWS said in a blog post announcing the anomaly detection feature release. “A machine learning model supporting anomaly detection can understand these complex changes and inform you of anomalies in the dataset.”
Anomaly detection analyzes new data as it’s generated, identifies outliers and recommends adjustments to the existing data quality rules to incorporate those outliers.
As this AWS product page explains, the feature “utilizes a machine learning algorithm to learn from past trends and then predict future values. When the actual value does not fall within the predicted range, AWS Glue Data Quality creates an Anomaly Observation. It provides a visual representation of the actual value and the trends.”
More information on AWS Glue is available here.