Load Test #1 - Plan
This document outlines the first load test for the Shannon upgrade. IT IS NOT intended to be an exhaustive evaluation of the entire system's performance. IT IS intended to give visibility into the business logic of the platform, and create a baseline for future load tests.
- All Poktroll loadtest issues on GitHub can be found here.
Table of Contents
- Goals
- Non-Goals
- Origin Document
- Load Profiles
- Measurements
- Architecture / Component Diagram
- Tool Requirements
Goals
- De-risk the network's feasibility to have completely permissionless services & actors
  - This is intended for scalability purposes and does not account for Sybil attacks
- Stress test the SMT (Sparse Merkle Trie) and how it is being used
- Build intuition into the cost of operating the network for all of the stakeholders involved, both on & off chain
- Gain visibility into basic metrics (disk, RAM, CPU, ingress/egress traffic, etc.) for our network actors
- Uncover potential bugs, bottlenecks or concurrency issues in the on-chain & off-chain code
- Document and design a process that'll act as the foundation for future load-testing efforts
Non-Goals
- Exhaustive benchmarking of all traditional performance metrics across our tools & packages (key-value stores, http libs, etc.)
- Sybil attacks or tokenomic considerations
- Performing any of the following tests: smoke tests, spike tests, fuzz testing, chaos testing, soak testing, etc.
- Evaluating the performance or results of proxy services / data nodes
- Selecting new tools or libraries as a direct outcome of these results
- Mimicking the scale of Morse (v0) today
- Accounting for failure cases, since the primary focus is just evaluating happy path scale
- Anything to do with Quality of Service as it is concerned from today's Gateway POV
Origin Document
This forum post from Morse is a good starting point for understanding why load testing is critical in Shannon.
https://forum.pokt.network/t/block-sizes-claims-and-proofs-in-the-multi-gateway-era/5060/9
Load Profiles
Variable Parameters
Metric | Starting Value | Terminal Value | Increment (fuzz + approximate) |
---|---|---|---|
RelaysPerSecond | 1 rps | 10,000 rps | +100 rps every 10 blocks |
GatewayCount | 1 | 10 | +1 gateway every 100 blocks |
ApplicationCount | 5 | 1,000 | +10 apps every 10 blocks |
SupplierCount | 5 | 1,00 | +1 every 100 blocks |
ProxyService / DataNode | 0 / ∞ | 0 / ∞ | Mocked to avoid being a performance bottleneck |
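To make the ramp-up schedule above concrete, here is a minimal Go sketch of how a fuzzed, approximate increment could be derived from the block height. It is illustrative only; `rampTarget` and its parameters are hypothetical and not part of any existing poktroll tooling.

```go
package main

import (
	"fmt"
	"math/rand"
)

// rampTarget is a hypothetical helper: given the current block height, it
// returns the value a parameter (e.g. RelaysPerSecond) should ramp to,
// applying the increment every blocksPerStep blocks plus a small random fuzz.
func rampTarget(blockHeight, start, terminal, increment, blocksPerStep int64, fuzz float64) int64 {
	target := start + (blockHeight/blocksPerStep)*increment

	// "fuzz + approximate": jitter the target by up to ±fuzz (e.g. ±5%).
	jitter := 1 + (rand.Float64()*2-1)*fuzz
	target = int64(float64(target) * jitter)

	if target > terminal {
		target = terminal
	}
	return target
}

func main() {
	// RelaysPerSecond: 1 rps -> 10,000 rps, +100 rps every 10 blocks, ~5% fuzz.
	for _, height := range []int64{0, 10, 100, 1000} {
		fmt.Printf("block %4d -> ~%d rps\n", height, rampTarget(height, 1, 10_000, 100, 10, 0.05))
	}
}
```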
Constant Parameters
- `BlockTime` - A constant block time between 10s and 60s will be selected for the benchmarks in this test
- `RequestType` - We will use a "dummy" backing data node / proxy service that leverages `nginx` to return a `200` or `500` randomly (a minimal stand-in is sketched below)
- `RequestDistribution` - The `RelaysPerSecond` will be evenly distributed
- `VirtualUsers` - For simplicity, we will assume a `1:1` mapping of virtual users (i.e. curl clients) to `Applications`
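For illustration only, the sketch below shows a minimal Go stand-in with the same behavior as the `nginx`-backed dummy service described above: answer every relay with a `200` or a `500` at random. The port and response bodies are assumptions, not part of the plan.

```go
package main

import (
	"log"
	"math/rand"
	"net/http"
)

func main() {
	// Answer every request with a 200 or a 500, chosen at random, mirroring
	// the behavior expected from the nginx-backed dummy data node.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if rand.Intn(2) == 0 {
			w.WriteHeader(http.StatusOK)
			_, _ = w.Write([]byte(`{"result":"ok"}`))
			return
		}
		w.WriteHeader(http.StatusInternalServerError)
		_, _ = w.Write([]byte(`{"error":"dummy failure"}`))
	})

	// The port is an arbitrary choice for this sketch.
	log.Fatal(http.ListenAndServe(":8545", nil))
}
```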
Out-of-scope Parameters
- Governance Parameters have not been implemented yet and are therefore out-of-scope
Measurements
What to measure?
1. Chain State Size
What:
- A pie chart or stacked bar chart of how the data in the Blocks is distributed
- A line chart showing state growth over time
Why:
- Get an estimate of the cost of data publishing (i.e. TIA tokens)
- Get an estimate of data distribution (where to focus short-term optimization efforts)
Example:
2. Validators
What: Multiple line charts to capture Disk (size & iops), RAM, CPU, and Network usage (ingress/egress)
Why:
- Proof Validation - RAM & CPU could be a potential bottleneck
- Block Generation - RAM & CPU could be a bottleneck in preparing new blocks
- Block Publishing - Tx aggregation (ingress) and Block publishing (egress) could be more expensive than expected w.r.t. network usage
- Data Availability State - Disk could be a limiting factor depending on how quickly state grows
| | RAM | CPU | Network | Disk | Time |
|---|---|---|---|---|---|
| Proof Validation | ❓ | ❓ | ❓ | | |
| Block Generation | ❓ | | | | |
| Block Publishing | ❓ | | | | |
| Data Availability State | ❓ | | | | |
3. AppGate Server (Application, Gateway, etc…)
What: Multiple line charts to capture Disk (size & iops), RAM, CPU, and Network usage (ingress/egress)
Why:
- Relay Proxies - Ingress/egress of relays could add up to large networking costs
- Caches & State - All the caching & state can have impact across the board
- Request Processing - Signature generation, request marshaling / unmarshaling, etc.
- Response Handling - Slow supplier responses could increase pending relays at the AppGate level (i.e. RAM)
| | RAM | CPU | Network | Disk | Time |
|---|---|---|---|---|---|
| Relay Proxies | ❓ | ❓ | | | |
| Caches & State | ❓ | ❓ | | | |
| Request Processing | ❓ | | | | |
| ??? | ❓ | | | | |
4. RelayMiner (Supplier, SMT, etc..)
What: Multiple line charts to capture Disk (size & iops), RAM, CPU, and Network usage (ingress/egress)
Why:
- SMT - The SMT is one of the most important parts of the end-to-end flow and has an impact on RAM, CPU & Disk
- Caches & State - All the caching & state can have impact across the board
- Request Processing - Signature generation, request marshaling / unmarshaling, etc.
- Response Generation - Generating the actual response to the request via the dummy service
| | RAM | CPU | Network | Disk | Time |
|---|---|---|---|---|---|
| SMT | ❓ | ❓ | ❓ | ❓ | |
| Caches & State | ❓ | | | | |
| Request Processing | ❓ | | | | |
| Response Generation | ❓ | ❓ | ❓ | | |
Out-of-scope
The exact details of the implementation are out-of-scope and will be developed ad hoc. The following is a non-exhaustive list of items we will figure out along the way:
- Data Collection
- Concrete Analysis Methodology
- Templates for formatting reports
Architecture / Component Diagram
Legend:
- 🔵 Pocket Specific Actors
- 🟣 Pocket Network dependencies
- 🟠 New tooling that needs to be built
- `--` Asynchronous request
- `-` Synchronous request
GitHub does not render colored Mermaid diagrams, so you can also access the image here.
TODO_IMPROVE: Improve the colors for readability purposes per the comment here.
Tool Requirements
Deployment Environment
- Ability to deploy the environment (and tooling) on LocalNet & DevNet
Request Source / Generator
- A script/tool to generate $N$ requests per second
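As a rough illustration of this requirement (not an existing tool), a minimal Go sketch of a fixed-rate request generator might look like the following; the target URL and rate are placeholder values.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	const (
		requestsPerSecond = 10                      // placeholder rate; the plan ramps this up over time
		targetURL         = "http://localhost:8545" // placeholder endpoint (e.g. an AppGate or the dummy service)
	)

	// Fire one request every 1/N seconds; each request runs in its own goroutine
	// so slow responses do not distort the send rate.
	ticker := time.NewTicker(time.Second / requestsPerSecond)
	defer ticker.Stop()

	for range ticker.C {
		go func() {
			resp, err := http.Get(targetURL)
			if err != nil {
				fmt.Println("request failed:", err)
				return
			}
			resp.Body.Close()
			fmt.Println("status:", resp.StatusCode)
		}()
	}
}
```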
Script - Instructions
- Ramp-up & ramp-down strategy
- Instructions on when & how to execute commands (manually) to ramp up & down
Script - Tools
- Commands to periodically trigger manual stake/unstake txs
- Commands to periodically scale up suppliers & gateways
- Commands to periodically add a new virtual user
- Commands to periodically increase the number of requests per second
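Purely as a sketch of how these periodic commands could be orchestrated, the following Go snippet runs placeholder shell commands on fixed intervals. The command strings, intervals, and structure are all hypothetical; the real stake/unstake and scaling commands depend on the poktroll CLI and deployment tooling and are intentionally not spelled out here.

```go
package main

import (
	"log"
	"os/exec"
	"time"
)

func main() {
	// Each entry pairs an interval with a placeholder command. The intervals
	// are arbitrary values chosen for this sketch.
	schedule := []struct {
		every time.Duration
		cmd   []string
	}{
		{10 * time.Minute, []string{"sh", "-c", "echo placeholder: trigger stake/unstake tx"}},
		{30 * time.Minute, []string{"sh", "-c", "echo placeholder: scale up suppliers and gateways"}},
		{5 * time.Minute, []string{"sh", "-c", "echo placeholder: add a new virtual user"}},
	}

	for _, s := range schedule {
		s := s // capture loop variable
		go func() {
			for range time.Tick(s.every) {
				out, err := exec.Command(s.cmd[0], s.cmd[1:]...).CombinedOutput()
				log.Printf("ran %v: %s (err: %v)", s.cmd, out, err)
			}
		}()
	}

	select {} // keep the schedulers running
}
```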