AWS Glue

AWS Glue is a serverless data integration service for discovering, preparing, and combining data for analytics, machine learning, and application development. It supports ETL workflows, data cataloging, and scalable data processing on AWS.

Starting at: Free

Overview

AWS Glue is a fully managed, serverless data integration service provided by Amazon Web Services that enables organizations to discover, prepare, move, and combine data from multiple sources for analytics, machine learning, and application development. It eliminates the need to provision or manage infrastructure for ETL (Extract, Transform, Load) pipelines, allowing data engineers and analysts to focus on transformation logic rather than cluster management.

At its core, AWS Glue provides several integrated components. The Glue Data Catalog serves as a centralized, persistent metadata repository compatible with Apache Hive Metastore, storing table definitions, schemas, and partition information for data assets across S3, RDS, Redshift, and dozens of other data stores. Glue Crawlers automatically scan data sources, infer schemas, and populate the Data Catalog, reducing manual cataloging effort. Glue ETL Jobs run on a managed Apache Spark or Apache Ray environment, supporting Python (PySpark) and Scala for batch transformations, with auto-scaling that adjusts Data Processing Units (DPUs) based on workload. As of Glue version 4.0, jobs run on an optimized Spark 3.3.0 runtime with up to 2.7x faster start times and improved performance over earlier versions.
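To make the crawler and job components more concrete, here is a minimal sketch of the request parameters that boto3's Glue client accepts for `create_crawler` and `create_job`. The crawler name, job name, database, bucket paths, and role ARN are hypothetical placeholders, and the worker settings are illustrative assumptions rather than recommendations. The requests are built as plain dictionaries so they can be inspected without AWS credentials.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for glue.create_crawler (S3 target only)."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database the crawler writes tables to
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def spark_job_request(name, role_arn, script_s3_path, max_workers=10):
    """Build keyword arguments for glue.create_job (Spark ETL on Glue 4.0)."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        # With auto scaling enabled, this acts as the upper bound on workers.
        "NumberOfWorkers": max_workers,
        "DefaultArguments": {"--enable-auto-scaling": "true"},
    }


def submit(crawler_kwargs, job_kwargs):
    """Send both requests to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builders above work without it

    glue = boto3.client("glue")
    glue.create_crawler(**crawler_kwargs)
    glue.create_job(**job_kwargs)


# Hypothetical names, for illustration only.
ROLE = "arn:aws:iam::123456789012:role/ExampleGlueRole"
crawler = crawler_request("sales-crawler", ROLE, "sales_db", "s3://example-bucket/sales/")
job = spark_job_request("sales-etl", ROLE, "s3://example-bucket/scripts/sales_etl.py")
```

Calling `submit(crawler, job)` would register both resources. Running the crawler and starting the job are separate API calls (`start_crawler`, `start_job_run`).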

AWS Glue also supports streaming ETL for near-real-time data processing from Amazon Kinesis Data Streams and Apache Kafka sources, enabling continuous ingestion pipelines. Glue DataBrew provides a visual, no-code data preparation interface with over 250 built-in transformations, making data cleaning accessible to analysts without programming expertise. Glue Studio offers a visual drag-and-drop interface for authoring, running, and monitoring ETL jobs.

The service integrates natively with the broader AWS ecosystem including Amazon S3, Amazon Redshift, Amazon Athena, Amazon EMR, and AWS Lake Formation. It supports the AWS Glue Schema Registry for managing and enforcing Avro and JSON schemas in streaming applications. Glue handles job bookmarking to process only new data in incremental loads, and supports job triggers and workflows for orchestrating complex multi-step ETL pipelines.
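As a rough illustration of bookmarks and triggers, the sketch below builds the job argument that turns on bookmarking and the request parameters for a scheduled trigger and a conditional trigger, in the shape boto3's `create_trigger` expects. All trigger and job names and the cron expression are hypothetical. The dictionaries are plain Python, so nothing here contacts AWS.

```python
def with_bookmarks(default_arguments):
    """Return a copy of a job's DefaultArguments with job bookmarks enabled."""
    args = dict(default_arguments)
    args["--job-bookmark-option"] = "job-bookmark-enable"
    return args


def scheduled_trigger(name, job_name, cron_expression):
    """Keyword arguments for glue.create_trigger: run a job on a schedule."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": cron_expression,  # Glue uses the cron(...) syntax
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def on_success_trigger(name, upstream_job, downstream_job):
    """Keyword arguments for glue.create_trigger: chain two jobs."""
    return {
        "Name": name,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "JobName": upstream_job,
                    "State": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": downstream_job}],
        "StartOnCreation": True,
    }


# Hypothetical pipeline: ingest nightly at 02:00 UTC, then transform on success.
job_args = with_bookmarks({"--enable-auto-scaling": "true"})
nightly = scheduled_trigger("nightly-ingest", "ingest-job", "cron(0 2 * * ? *)")
chain = on_success_trigger("after-ingest", "ingest-job", "transform-job")
```

Each trigger dictionary can be passed as `boto3.client("glue").create_trigger(**nightly)`. Note that the bookmark argument alone is usually not enough: the job script also has to initialize and commit the job and tag its reads with a transformation context so that Glue can track processed data.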

AWS Glue is used at petabyte scale by organizations ranging from startups to enterprises. It supports JDBC, ODBC, and native connectors to over 70 data sources, including SaaS applications via AWS Glue custom connectors and the AWS Marketplace. The service is available in most commercial AWS Regions, is in scope for AWS's SOC and PCI DSS compliance programs, and is HIPAA eligible, which makes it a candidate for regulated workloads.

Key Features

• Serverless Apache Spark and Apache Ray ETL job execution with auto-scaling
• Centralized Glue Data Catalog compatible with Apache Hive Metastore
• Automatic schema discovery via Glue Crawlers across 70+ data sources
• Visual no-code data preparation with Glue DataBrew (250+ transformations)
• Visual ETL job authoring and monitoring with Glue Studio
• Streaming ETL for Kinesis Data Streams and Apache Kafka sources
• Job bookmarking for incremental data processing
• Workflow orchestration with triggers, schedules, and conditional logic
• Glue Schema Registry for Avro and JSON schema management
• Native integration with S3, Redshift, Athena, EMR, and Lake Formation
• JDBC, ODBC, and AWS Marketplace custom connectors
• Glue 4.0 optimized runtime with faster cold starts and Spark 3.3.0

Pricing Plans

Data Catalog Free Tier: Free

Glue ETL Jobs: From $0.44/DPU-hour

Glue DataBrew: $0.48 per node-hour for jobs, plus $1.00 per 30-minute interactive session

Glue Data Catalog (beyond free tier): $1.00 per 100,000 objects/month

Glue Crawlers: $0.44/DPU-hour
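For a rough sense of how these rates translate into a bill, the sketch below estimates the cost of one ETL job run and of monthly Data Catalog storage. The rates come from the list above. The free-tier size of one million catalog objects, the linear proration of catalog charges, and the absence of any minimum billing duration are simplifying assumptions, and actual rates vary by Region.

```python
ETL_RATE_PER_DPU_HOUR = 0.44          # from the pricing list above
CATALOG_RATE_PER_100K_OBJECTS = 1.00  # from the pricing list above
ASSUMED_FREE_CATALOG_OBJECTS = 1_000_000  # assumption: size of the free tier


def etl_job_cost(dpus, runtime_minutes, rate=ETL_RATE_PER_DPU_HOUR):
    """Estimated cost in USD of one job run: DPUs x hours x rate."""
    return dpus * (runtime_minutes / 60) * rate


def catalog_storage_cost(
    objects,
    free_objects=ASSUMED_FREE_CATALOG_OBJECTS,
    rate=CATALOG_RATE_PER_100K_OBJECTS,
):
    """Estimated monthly cost in USD for catalog objects beyond the free tier."""
    billable = max(0, objects - free_objects)
    return billable / 100_000 * rate


# 10 DPUs for 30 minutes is 5 DPU-hours, so about $2.20 at $0.44 per DPU-hour.
print(f"ETL run: ${etl_job_cost(10, 30):.2f}")
# 1.5 million objects leaves 500,000 billable objects under the assumptions above.
print(f"Catalog: ${catalog_storage_cost(1_500_000):.2f}")
```

The same `etl_job_cost` function applies to crawlers, since they are listed at the same per-DPU-hour rate.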

Pros & Cons

✓ Pros

✓ Fully serverless with no infrastructure to provision, patch, or scale manually
✓ Deep native integration with the AWS ecosystem (S3, Redshift, Athena, Lake Formation)
✓ Always-free Data Catalog tier lowers the barrier for metadata management
✓ Glue 4.0 significantly improved cold start times (up to 2.7x faster) and performance
✓ Supports both batch and streaming ETL in a single service
✓ DataBrew enables non-technical users to participate in data preparation
✓ Auto-scaling adjusts DPUs dynamically to match workload, reducing over-provisioning

✗ Cons

✗ Cold start latency for Spark jobs can reach several minutes, making it unsuitable for low-latency or interactive workloads
✗ Debugging Spark-based jobs can be complex: error messages are often opaque and require Spark expertise
✗ VPC networking configuration for accessing private data sources adds operational complexity
✗ Per-DPU-hour pricing can become expensive for long-running or always-on pipelines compared to reserved EMR clusters
✗ Limited language support: primarily PySpark and Scala, with Ray support still maturing
✗ Job orchestration capabilities are basic compared to dedicated tools like Apache Airflow or AWS Step Functions
✗ Vendor lock-in to AWS; migrating Glue-dependent pipelines to another cloud requires significant rework

Frequently Asked Questions

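How much does AWS Glue cost?

AWS Glue is billed by usage, with no upfront cost. The Data Catalog has a free tier, and storage beyond it costs $1.00 per 100,000 objects per month. ETL jobs start at $0.44 per DPU-hour, crawlers are billed at $0.44 per DPU-hour, and DataBrew is priced separately.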

What are the main features of AWS Glue?

AWS Glue includes serverless Apache Spark and Apache Ray ETL job execution with auto-scaling, a centralized Glue Data Catalog compatible with Apache Hive Metastore, and automatic schema discovery via Glue Crawlers across 70+ data sources, along with 9 other features listed under Key Features.
