AWS Glue is a scalable, serverless tool that helps you accelerate the development and execution of your data integration and ETL workloads. Today we are launching Glue 4.0, with updated engines, support for additional data formats, Ray support, and a lot more.
Before I dive in, just a word about versioning. Unlike most AWS services, where the service team owns and has full control over the APIs, Glue includes a collection of libraries, engines, and tools developed by the open source community. Some of these components don’t maintain strict backward compatibility, often in pursuit of efficiency. To make sure that changes to the components don’t impact your Glue jobs, you must select a specific Glue version when you create the job.
Each version of Glue includes performance and reliability benefits in addition to the added features, and you should plan to upgrade your jobs over time to take advantage of all that Glue has to offer.
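To make the version selection concrete, here is a minimal sketch of a job definition that pins Glue 4.0. It builds the keyword arguments for the boto3 Glue client's `create_job` call. The job name, IAM role ARN, and script location are hypothetical placeholders, and the helper names `build_job_request` and `create_job` are my own, not part of any AWS library.

```python
def build_job_request(name, role_arn, script_location, glue_version="4.0"):
    """Return keyword arguments for the boto3 Glue client's create_job call."""
    return {
        "Name": name,
        "Role": role_arn,
        # The version is fixed per job, so a job keeps running on the
        # engines it was created with until you choose to upgrade it.
        "GlueVersion": glue_version,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
    }


def create_job(request):
    """Submit the request. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("glue").create_job(**request)


# Hypothetical values, for illustration only.
request = build_job_request(
    name="example-etl-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    script_location="s3://example-bucket/scripts/job.py",
)
print(request["GlueVersion"])
```

Keeping the request builder separate from the API call lets you review or test the job definition before anything is sent to AWS. Upgrading an existing job later would mean changing only the `glue_version` value.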
Dive in to Glue
Let’s check out what’s new in Glue 4.0:
Updated Engines – This version of Glue includes Python 3.10 and Apache Spark 3.3.0. Both engines include bug fixes and performance enhancements; Spark includes new features such as row-level runtime filtering, improved error messages, additional built-in functions, and much more. Glue and Amazon EMR use the same optimized Spark runtime, which is tuned to run in the AWS cloud and can be 2-3 times faster than the basic open source version.
New Engine Plugins – Glue 4.0 adds native support for the Cloud Shuffle Service Plugin for Spark to help you scale your disk usage, and Adaptive Query Execution to dynamically optimize your queries as they run.
Pandas Support – Pandas is an open source data analysis and manipulation tool that is built on top of Python. It is easy to learn and includes all kinds of interesting and useful data manipulation functions.
New Data Formats – Whether you are building a data lake or a data warehouse, Glue 4.0 now handles new open source data formats for sources and targets, with support for Apache Hudi, Apache Iceberg, and Delta Lake. To learn more about these new options and formats, read Get Started with Apache Hudi using AWS Glue by Implementing Key Design Concepts.
Everything Else – In addition to the above items, Glue 4.0 also includes the Parquet vectorized reader, with support for additional data types and encodings. It has been upgraded to use log4j 2 and is no longer dependent on log4j 1.
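As an illustration of how the new data formats might be switched on for a job, the sketch below builds the job arguments that enable them. It assumes the `--datalake-formats` job parameter described in the AWS Glue documentation, which this post does not mention; the helper name `datalake_arguments` is hypothetical. Some formats may need additional Spark configuration that is not shown here.

```python
# Format identifiers assumed to be accepted by --datalake-formats.
SUPPORTED_FORMATS = ("hudi", "iceberg", "delta")


def datalake_arguments(*formats):
    """Build Glue job arguments that enable the requested data lake formats."""
    if not formats:
        raise ValueError("specify at least one format")
    unknown = [f for f in formats if f not in SUPPORTED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported format(s): {', '.join(unknown)}")
    # Multiple formats are passed as one comma-separated value.
    return {"--datalake-formats": ",".join(formats)}


args = datalake_arguments("hudi", "iceberg")
print(args)
```

The resulting dictionary could be supplied as the `DefaultArguments` of a job definition, so that the chosen formats are available to every run of that job.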
Glue 4.0 is available today in the US East (Ohio, N. Virginia), US West (N. California, Oregon), Africa (Cape Town), Asia Pacific (Hong Kong, Jakarta, Mumbai, Osaka, Seoul, Singapore, Sydney, Tokyo), Canada (Central), Europe (Frankfurt, Ireland, London, Milan, Paris, Stockholm), Middle East (Bahrain), and South America (Sao Paulo) AWS Regions.