Many customers have built their IoT platforms on the AWS managed service “IoT Core”. A typical IoT system combines many cloud services, such as IoT Core, Kafka, DynamoDB, S3, and SNS.
The IoT data pipeline spans all of these services, and in production any node can fail and cause a system-wide outage. The environment in which the devices operate is also often highly unpredictable. Before going live, the system needs thorough testing to confirm that it still behaves according to its predefined logic under a sudden surge in traffic, and that it is robust and resilient enough to handle such unexpected events. Stress testing is therefore an indispensable step in the evolution of an IoT platform. In the IoT field today, the device connection layer generally uses the MQTT protocol, so simulating MQTT messages is a crucial part of stress testing.
A typical IoT platform diagram is shown below:
IoT data flow can generally be divided into uplink and downlink. Uplink data usually includes periodic reports of sensor status, device operational data, and similar telemetry. Downlink data typically consists of control commands, broadcast pushes, and the like, and is generally sent at a low frequency. In most scenarios the traffic bottleneck is therefore in the uplink, particularly when the number of devices reaches millions or tens of millions: the aggregated reporting requests can push peak TPS (transactions per second) beyond tens of thousands.
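The arithmetic behind that uplink figure can be sketched as follows. This is a rough estimate only: the function name, the `burst_factor` parameter, and the example fleet sizes and intervals are assumptions for illustration, not values measured in this project.

```python
def estimate_uplink_tps(device_count: int,
                        report_interval_s: float,
                        burst_factor: float = 1.0) -> float:
    """Estimate uplink messages per second for a fleet of devices.

    Assumes every device sends one message per report interval.
    burst_factor (hypothetical) scales the average to approximate a peak,
    for example when many devices report at the same moment.
    """
    if device_count < 0:
        raise ValueError("device_count must not be negative")
    if report_interval_s <= 0:
        raise ValueError("report_interval_s must be positive")
    return device_count / report_interval_s * burst_factor


# 1,000,000 devices, one report every 60 s, evenly spread
print(round(estimate_uplink_tps(1_000_000, 60)))          # 16667

# 10,000,000 devices, one report every 300 s, assumed 3x burst
print(round(estimate_uplink_tps(10_000_000, 300, 3.0)))   # 100000
```

Even with a relaxed reporting interval, a fleet in the tens of millions can reach a peak on the order of 100k TPS, which is why the uplink path is the main target of the load test.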
IoT Core supports several protocols for message publish and subscribe: MQTT, HTTP (publish only), and MQTT over WebSocket.
In most regions, IoT Core applies the following default TPS limits to PublishIn (uplink) and PublishOut (downlink) requests:
- PublishIn TPS limit = 20,000/s
- PublishOut TPS limit = 20,000/s
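When a test is meant to stay at or below one of these limits, the load generator can pace itself on the client side. The following is a minimal token-bucket sketch under stated assumptions: the `TokenBucket` class and its parameters are hypothetical and not part of this project's scripts, only the 20,000/s figure comes from the limits above, and the injected clock exists only to make the demonstration deterministic.

```python
import time


class TokenBucket:
    """Client-side pacing: allow at most rate_per_s sends per second on average,
    with bursts of up to `capacity` sends."""

    def __init__(self, rate_per_s: float, capacity: float, clock=time.monotonic):
        self.rate = float(rate_per_s)
        self.capacity = float(capacity)
        self.tokens = float(capacity)   # start with a full bucket
        self.clock = clock
        self.last = clock()

    def try_acquire(self, n: int = 1) -> bool:
        """Return True if n messages may be sent now, False otherwise."""
        now = self.clock()
        # Refill according to the time elapsed since the last call.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False


# Demonstration with a fake clock, so no real waiting is needed.
fake_now = [0.0]
bucket = TokenBucket(rate_per_s=20_000, capacity=20_000, clock=lambda: fake_now[0])

sent = sum(bucket.try_acquire() for _ in range(25_000))
print(sent)   # 20000: the remaining 5000 attempts are held back

fake_now[0] = 0.5   # half a second later, half the budget is available again
sent = sum(bucket.try_acquire() for _ in range(25_000))
print(sent)   # 10000
```

In a real simulator, a publish loop would call `try_acquire()` before each MQTT publish and sleep briefly when it returns `False`. With several load-generating processes, each one would need its own share of the total rate.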
To reach a higher TPS, customers must request a limit increase:
- Fill in the questionnaire: https://w.amazon.com/bin/view/AWS/Mobile/IoT/CustomerLoadTesting
- Raise a support case in the AWS console at least one month in advance.
- Preferably conduct the load test in one of these regions: IAD (us-east-1), PDX (us-west-2), DUB (eu-west-1), FRA (eu-central-1).
- The IoT Core service team will then review the request and increase the limit.
- Test result with the default limit (20k TPS) for PublishIn and PublishOut.
- Test result with the increased limit (67k TPS) for PublishIn and PublishOut.
- MQTT metrics data:
Because Lambda has a maximum running time of 15 minutes, the client simulation script needs to be deployed to EC2 or EKS in order to reach a PublishIn/PublishOut TPS above 100k.
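Spreading the simulation across EC2 instances or EKS pods means deciding how many workers to run and which simulated devices each one owns. The sketch below shows one simple way to plan that. The function name and the per-worker throughput of 8,000 TPS are assumptions for illustration; the real per-worker capacity depends on instance type, payload size, and QoS, and should be measured.

```python
import math


def plan_workers(target_tps: int, per_worker_tps: int, device_count: int) -> list[range]:
    """Split simulated device IDs 0..device_count-1 across enough workers
    to reach target_tps, assuming each worker sustains per_worker_tps."""
    if target_tps <= 0 or per_worker_tps <= 0:
        raise ValueError("target_tps and per_worker_tps must be positive")
    workers = math.ceil(target_tps / per_worker_tps)
    base, extra = divmod(device_count, workers)
    plan = []
    start = 0
    for i in range(workers):
        # The first `extra` workers take one additional device each.
        size = base + (1 if i < extra else 0)
        plan.append(range(start, start + size))
        start += size
    return plan


# Assumed figures: 100k TPS target, 8k TPS per worker, 1,000,000 simulated devices
plan = plan_workers(100_000, 8_000, 1_000_000)
print(len(plan))                   # 13
print(plan[0].start, plan[0].stop)  # 0 76924
print(plan[-1].stop)               # 1000000
```

Each worker (an EC2 instance or an EKS pod) would then simulate only the devices in its own range, so client IDs and topics do not collide between workers.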
The solution architecture is shown below:
The CloudWatch metrics are shown below:
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.