Tuesday, December 6, 2022

Automate Amazon Redshift load testing with the AWS Analytics Automation Toolkit

Amazon Redshift is a fast, fully managed, widely popular cloud data warehouse that powers the modern data architecture and gives you fast, deep insights and machine learning (ML) predictions using SQL across your data warehouse, data lake, and operational databases. A key differentiating factor of Amazon Redshift is its native integration with other AWS services, which makes it easy to build complete, comprehensive, and enterprise-level analytics applications. The AWS Analytics Automation Toolkit enables automatic provisioning and integration of not only Amazon Redshift, but also database migration services like AWS Database Migration Service (AWS DMS) and the AWS Schema Conversion Tool (AWS SCT).

This post discusses new additions to the AWS Analytics Automation Toolkit, which enable you to perform advanced load testing on Amazon Redshift. This is achieved by provisioning Apache JMeter as part of the analytics stack.

Solution overview

Apache JMeter is an open-source load testing application written in Java that you can use to load test web applications, backend server applications, databases, and more. In the database context, it’s an extremely valuable tool for repeating benchmark tests in a consistent manner, simulating concurrency workloads, and scalability testing on different database configurations.

For example, you can use JMeter to simulate a single business intelligence (BI) user or hundreds of BI users simultaneously running various SQL queries on an Amazon Redshift cluster for performance benchmarking, scalability, and throughput testing. Additionally, you can rerun the exact same simulation on a different Amazon Redshift cluster that perhaps has twice as many nodes as the original cluster, to compare the price/performance ratio of each cluster.

Similarly, you can use JMeter to load test and assess the performance and throughput achieved for mixed extract, transform, and load (ETL) and BI workloads running on different Amazon Redshift cluster configurations.

For a deeper discussion of JMeter and its use for benchmarking Amazon Redshift, refer to Building high-quality benchmark tests for Amazon Redshift using Apache JMeter.

Although JMeter installation is a relatively simple process, consisting mostly of downloading and installing a Java virtual machine and JMeter, the thought of having to download, install, and set up any tool for benchmarking purposes can deter many people. Starting from scratch for a test setup can be intimidating.

The AWS Analytics Automation Toolkit now includes the option to automatically deploy JMeter on Amazon Elastic Compute Cloud (Amazon EC2) in the same virtual private cloud (Amazon VPC) as Amazon Redshift. This includes a dedicated Windows instance with all required JMeter dependencies, such as a JVM, and a sample test plan, which makes powerful load testing on Amazon Redshift easy to set up. In this post, we demonstrate the use of the AWS Analytics Automation Toolkit for JMeter load tests on cloud benchmark data, using Amazon Redshift as a target environment.

This solution has the following features:

  • It deploys resources automatically, including JMeter
  • You can point JMeter to an existing Amazon Redshift cluster, or automatically create a new cluster
  • You can bring your own data and queries, or use a sample TPC dataset
  • You can easily customize the test plan into separate threads, each with different workloads and concurrency as needed

To use the AWS Analytics Automation Toolkit to run a JMeter load test, deploy the toolkit with the JMeter option, load data into your Amazon Redshift cluster, and customize the default test plan as you see fit.

The following diagram illustrates the solution architecture:


Prior to deploying the AWS Analytics Automation Toolkit, complete the prerequisites and prepare the config file, user-config.json. For instructions, refer to Automate building an integrated analytics solution with AWS Analytics Automation Toolkit.

The config file has a new parameter called JMETER, in the top section. To provision a JMeter instance, enter the value CREATE for this parameter.

To provision a new Amazon Redshift cluster, enter the value CREATE for the REDSHIFT_ENDPOINT parameter. Then fill in values for the fields in the redshift section of the config file. You can use the sizing calculator on the Amazon Redshift console to recommend the correct cluster configuration based on your data size.

If you’d like to load the industry-standard sample TPC-DS data (3 TB) into your cluster, enter the value “Y” for the “loadTPCdata” parameter, under the redshift section.

To use an existing cluster, enter the endpoint of your cluster for the REDSHIFT_ENDPOINT parameter in the top section of the user-config.json file.
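
The config settings described above can also be applied with a short script. The following is a minimal Python sketch, not part of the toolkit. It assumes that JMETER and REDSHIFT_ENDPOINT are top-level keys and that loadTPCdata sits in a nested redshift object, as the description suggests. The helper name and the placeholder starting values are hypothetical, so check them against your own user-config.json.

```python
import json


def enable_jmeter_load_test(config, redshift_endpoint="CREATE", load_tpc_data=True):
    """Return a copy of the config with the load-testing options set.

    Assumed layout: JMETER and REDSHIFT_ENDPOINT at the top level,
    loadTPCdata inside a nested "redshift" section.
    """
    updated = json.loads(json.dumps(config))  # deep copy via a JSON round trip
    updated["JMETER"] = "CREATE"
    # "CREATE" provisions a new cluster; pass an endpoint to reuse an existing one.
    updated["REDSHIFT_ENDPOINT"] = redshift_endpoint
    if load_tpc_data:
        updated.setdefault("redshift", {})["loadTPCdata"] = "Y"
    return updated


# Hypothetical starting values; a real user-config.json has more fields.
sample_config = {"JMETER": "", "REDSHIFT_ENDPOINT": "", "redshift": {}}
new_config = enable_jmeter_load_test(sample_config)
print(json.dumps(new_config, indent=2))
```

To point at an existing cluster instead, pass its endpoint as redshift_endpoint and set load_tpc_data to False if the data is already loaded.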

Deploy resources using the AWS Analytics Automation Toolkit

To deploy your resources, complete the following steps:

  1. Launch the toolkit as described in Automate building an integrated analytics solution with AWS Analytics Automation Toolkit.
  2. Create tables and ingest your test data into your Amazon Redshift cluster.
    • If you chose to load the sample TPC-DS 3 TB data, it takes some time to load, so allow for this. If you are loading your own data, you can do that at this point.

Launch JMeter

To launch JMeter, complete the following steps:

  1. Using RDP, log in to the JMeter EC2 Windows instance created by the AWS Analytics Automation Toolkit.
  2. Launch the JMeter GUI by choosing (double-clicking) the shortcut JMETER on the Windows Desktop.

In our experience, changing the JMeter Look and Feel option to Windows (instead of dark mode) results in increased JMeter stability, so we highly recommend making that change and choosing Yes to restart the GUI.

Customize the JMeter test plan

To customize the JMeter test plan, we edit the JDBC connection, and optionally modify the thread ramp-up schedule and customize the SQL.

  1. Using the JMeter GUI, open the AWS Analytics Automation Toolkit’s default test plan file c:\JMETER\apache-jmeter-5.4.1\Redshift Load Test.jmx.
  2. Choose the test plan name and edit the JdbcUser value to the correct user name for your Amazon Redshift cluster.

If you used the CREATE cluster option, this value is the same as the master_user_name value in your user-config.json file.

  3. In JDBC connection, edit the DatabaseURL value and password with the correct values for your cluster.

If you used the CREATE cluster option, the password is stored in a secret named <stackname>-RedshiftPassword. You can find the endpoint by choosing the new Amazon Redshift cluster and copying the endpoint value on the upper right.
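
The values for the JDBC connection can be assembled as in the following Python sketch. It is illustrative only and makes several assumptions: that the secret lives in AWS Secrets Manager, that the default port is 5439 and the default database is dev, and that an endpoint copied from the console may already include the port and database. The function names, stack name, and host name are hypothetical.

```python
def redshift_password_secret_name(stack_name):
    """Secret name following the <stackname>-RedshiftPassword pattern."""
    return f"{stack_name}-RedshiftPassword"


def redshift_jdbc_url(endpoint, port=5439, database="dev"):
    """Build a DatabaseURL value for the JDBC connection.

    If the endpoint already carries ':port/database', it is used as is;
    otherwise the assumed default port and database are appended.
    """
    if ":" in endpoint:
        return f"jdbc:redshift://{endpoint}"
    return f"jdbc:redshift://{endpoint}:{port}/{database}"


def fetch_redshift_password(stack_name, region_name=None):
    """Read the secret with boto3 (assumes Secrets Manager and valid credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("secretsmanager", region_name=region_name)
    response = client.get_secret_value(
        SecretId=redshift_password_secret_name(stack_name)
    )
    # The secret may hold a plain string or a JSON document; return it raw.
    return response["SecretString"]


# Hypothetical stack and host names, for illustration only.
print(redshift_password_secret_name("mystack"))
print(redshift_jdbc_url("example-cluster.abc123.us-east-1.redshift.amazonaws.com"))
```

Only fetch_redshift_password calls AWS; the other two helpers are pure string functions, so you can check the resulting DatabaseURL before pasting it into the test plan.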

This test plan already has an initialization command set to turn off the result cache feature of Amazon Redshift: set enable_result_cache_for_session to off. No action is required to configure this.

  4. Optionally, modify the thread ramp-up schedule.

This test plan uses the Ultimate Thread Group, which is automatically installed when you open the test plan. Each thread group contains a ramp-up schedule, as well as a query or set of queries. Modify these according to your testing preferences and dataset. If you loaded the TPC-DS dataset, the queries included by default in the three thread groups will work.

In the following example, the SmallThread Group has four rows, each of which launches the specified number of sessions at staggered timings. The ramp-up time to reach the maximum session count is 45 seconds, because the last row doesn’t start until 15 seconds into the test and has a 30-second startup time. You can adjust the ramp-up schedule, as well as the hold duration and shutdown time, by editing, adding, or deleting rows in the Thread Schedule section. The graph is then automatically adjusted.
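
The ramp-up arithmetic can be checked with a small model of the schedule. The Python sketch below is a simplified approximation, not JMeter code: it assumes each row ramps its threads linearly during startup and shutdown. Only the 15-second delay and 30-second startup time of the last row come from the example; the thread counts, the other delays, and the hold and shutdown times are made-up values.

```python
from dataclasses import dataclass


@dataclass
class ScheduleRow:
    threads: int        # sessions launched by this row
    initial_delay: int  # seconds before the row starts launching
    startup_time: int   # seconds taken to launch all of the row's threads
    hold_time: int      # seconds the row keeps all threads running
    shutdown_time: int  # seconds taken to stop the row's threads


def ramp_up_seconds(rows):
    """Time until every row has finished launching its threads."""
    return max(row.initial_delay + row.startup_time for row in rows)


def active_threads(rows, t):
    """Approximate number of running threads at second t (linear ramps)."""
    total = 0.0
    for row in rows:
        start = row.initial_delay
        full = start + row.startup_time
        hold_end = full + row.hold_time
        end = hold_end + row.shutdown_time
        if t <= start or t >= end:
            continue
        if t < full:
            total += row.threads * (t - start) / row.startup_time
        elif t <= hold_end:
            total += row.threads
        else:
            total += row.threads * (end - t) / row.shutdown_time
    return total


# Hypothetical four-row schedule; the last row starts at 15 s and ramps for 30 s.
rows = [ScheduleRow(5, delay, 30, 120, 10) for delay in (0, 5, 10, 15)]
print(ramp_up_seconds(rows))
print(active_threads(rows, 45))
```

With these rows, the ramp-up time is 15 + 30 = 45 seconds, which matches the example, and all 20 hypothetical sessions are running from that point until the first row begins to shut down.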

  5. Optionally, you can customize the SQL.

In the following example, we choose the JDBC request item under the same SmallThread Group, called SmallSQL, and review the query run by this thread group. If you added your own data, insert the query or queries you want to run for this thread group. Do the same for the medium and large thread groups, or delete or add thread groups as needed.

Run the test

Run the test plan by choosing the green arrow in the GUI.

Alternatively, enter the following command in a Windows command prompt to run JMeter in command line mode:

C:\JMETER\apache-jmeter-5.4.1\bin\jmeter -n -t RedshiftTPCDWTest.jmx -e -l test.out
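
If you want to script repeated runs, the same command line can be built programmatically. The following Python sketch only assembles the argument list from the flags shown in the command (-n, -t, -e, -l); the helper name is hypothetical, and the install path is taken from the command and may differ on your instance.

```python
from pathlib import PureWindowsPath

# Install location as it appears in the command; adjust if yours differs.
JMETER_HOME = PureWindowsPath(r"C:\JMETER\apache-jmeter-5.4.1")


def build_jmeter_command(test_plan, results_file, jmeter_home=JMETER_HOME):
    """Return the argument list for a command line mode (non-GUI) JMeter run."""
    return [
        str(jmeter_home / "bin" / "jmeter"),
        "-n",                # run without the GUI
        "-t", test_plan,     # test plan (.jmx) to run
        "-e",                # generate the report dashboard after the run
        "-l", results_file,  # file that receives the sample results
    ]


command = build_jmeter_command("RedshiftTPCDWTest.jmx", "test.out")
print(" ".join(command))
```

On the JMeter instance, passing this list to subprocess.run would start the test. Because the Windows launcher is a batch file, shell=True is likely needed there. Changing results_file per run keeps the output of each configuration separate for later comparison.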

Use case example: Evaluating concurrency scaling benefits

For our example use case, we want to determine how Amazon Redshift concurrency scaling benefits our workload performance. We can use the JMeter setup outlined in this post to quickly answer this question.

In the following example, we ran the sample test plan as is against the sample TPC-DS data, with 80 concurrent users across three thread groups. We ran the test first with concurrency scaling enabled, then reran it after disabling concurrency scaling on the Amazon Redshift cluster.

To monitor the results, open the Amazon Redshift console, choose Clusters in the navigation pane, and choose the cluster you’re using for the performance test. On the Cluster performance tab, you can monitor the CPU utilization of all the nodes in your cluster, as shown in the following screenshot.

On the Query monitoring tab, you can monitor the queue activity, as well as concurrency scaling activity of your cluster.

The preceding graphs cover two different tests with a 4-node ra3.16xlarge cluster. On the left side, concurrency scaling was enabled for the cluster, and on the right side it was disabled. In the test with concurrency scaling enabled, there is less queueing, and the test completes sooner.

Note that the test duration is the time to run the workload on Amazon Redshift, not the time to download query results.

Finally, to review the actual test result query performance, you can download the file C:\JMETER\apache-jmeter-5.4.1\SummaryReportIndividualRecords.csv from the EC2 instance, and review the query performance for each thread. The following screenshot is a summary plot of the test results for this example, with and without concurrency scaling.
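
One way to review the downloaded file is to average the response time per query label. The Python sketch below assumes the CSV uses JMeter’s usual result columns, in particular elapsed (milliseconds) and label; if the toolkit’s listener is configured differently, adjust the column names. The sample rows and the MediumSQL label are made up for illustration and are not results from this test.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def average_elapsed_by_label(csv_file):
    """Average the 'elapsed' column (milliseconds) for each 'label' value."""
    samples = defaultdict(list)
    for row in csv.DictReader(csv_file):
        samples[row["label"]].append(int(row["elapsed"]))
    return {label: mean(values) for label, values in samples.items()}


# Made-up rows for illustration only.
sample_csv = io.StringIO(
    "timeStamp,elapsed,label,success\n"
    "1670000000000,1200,SmallSQL,true\n"
    "1670000001000,1800,SmallSQL,true\n"
    "1670000002000,9000,MediumSQL,true\n"
)
averages = average_elapsed_by_label(sample_csv)
print(averages)
```

For a real run, open the downloaded file with open(path, newline="") and pass the file object to the same function. Running it once per test (with and without concurrency scaling) gives two dictionaries you can compare label by label.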

As the chart illustrates, concurrency scaling can significantly reduce the latency for your workloads, and is particularly useful for short bursts of activity in your application.


JMeter allows you to create a flexible and powerful load test for your Amazon Redshift clusters to assess the performance of the cluster. With the new capability in the AWS Analytics Automation Toolkit, you can provision and configure JMeter, along with a test plan, in a fraction of the time it would normally take.

About the Authors

Samir Kakli is an Analytics POC Specialist Solutions Architect based out of Florida. He is focused on helping customers quickly and effectively align Amazon Redshift’s capabilities to their business needs.

Asser Moustafa is an Analytics Specialist Solutions Architect at AWS based out of Dallas, Texas. He advises customers in the Americas on their Amazon Redshift and data lake architectures and migrations, starting from the POC stage to actual production deployment and maintenance.


