Version: 1.2.4

Network latency simulation across multiple data centers

This document helps you simulate multi-data-center scenarios.

Characteristics of multi-data-center scenarios

The latency between different data centers
The bandwidth limitations between data centers

Note:
Currently, Chaos Mesh cannot simulate bandwidth limitations between data centers, so this document only simulates the latency between different data centers.

Experiment environment

Suppose our application will be deployed across three data centers in production, but those data centers are still under construction. We want to test in advance how such a deployment topology affects the business.

Here we use a TiDB cluster as an example. Suppose the TiDB cluster and Chaos Mesh are already installed in our Kubernetes environment. This TiDB cluster has three TiDB pods, three PD pods, and seven TiKV pods:

kubectl get pod -n tidb-cluster # "tidb-cluster" is the namespace of TiDB cluster

Output:

NAME                               READY   STATUS    RESTARTS   AGE
basic-discovery-7f9f48c465-6pdhn   1/1     Running   0          30m
basic-pd-0                         1/1     Running   0          30m
basic-pd-1                         1/1     Running   0          30m
basic-pd-2                         1/1     Running   0          30m
basic-tidb-0                       2/2     Running   0          29m
basic-tidb-1                       2/2     Running   0          29m
basic-tidb-2                       2/2     Running   0          29m
basic-tikv-0                       1/1     Running   0          29m
basic-tikv-1                       1/1     Running   0          29m
basic-tikv-2                       1/1     Running   0          29m
basic-tikv-3                       1/1     Running   0          29m
basic-tikv-4                       1/1     Running   0          29m
basic-tikv-5                       1/1     Running   0          29m
basic-tikv-6                       1/1     Running   0          29m

Grouping

We will use three simulated data centers, dc-a, dc-b, and dc-c, and assign the pods to them as follows:

dc-a	dc-b	dc-c
basic-pd-0	basic-pd-1	basic-pd-2
basic-tidb-0	basic-tidb-1	basic-tidb-2
basic-tikv-0/1	basic-tikv-2/3	basic-tikv-4/5/6
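
The grouping can also be written down as a small lookup helper, which is handy later when deciding what latency to expect between two pods. This is only a sketch: `dc_of` is a hypothetical helper name, not part of Chaos Mesh or kubectl, and the pod names are the ones from the example cluster above.

```shell
#!/bin/sh
# dc_of POD: print the simulated data center a pod is assigned to,
# following the grouping table. Hypothetical helper for illustration only.
dc_of() {
  case "$1" in
    basic-pd-0|basic-tidb-0|basic-tikv-0|basic-tikv-1) echo dc-a ;;
    basic-pd-1|basic-tidb-1|basic-tikv-2|basic-tikv-3) echo dc-b ;;
    basic-pd-2|basic-tidb-2|basic-tikv-4|basic-tikv-5|basic-tikv-6) echo dc-c ;;
    *) echo unknown; return 1 ;;  # pods outside the table, e.g. the discovery pod
  esac
}

# Print the group of a few pods as "pod<TAB>data center".
for pod in basic-pd-0 basic-tidb-1 basic-tikv-6; do
  printf '%s\t%s\n' "$pod" "$(dc_of "$pod")"
done
```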

Latency between three data centers

	latency
dc-a <--> dc-b	1ms
dc-a <--> dc-c	2ms
dc-b <--> dc-c	2ms

Inject network latency

Design injection rules

Chaos Mesh provides NetworkChaos to inject network latency, so we can use it to simulate the latency between three data centers.

At present, NetworkChaos has a limitation: only one netem configuration can be in effect for each target pod. We therefore use the same 1ms delay in every rule and obtain the 2ms links by applying a rule in both directions:

source pods	latency	target pods
dc-a	1ms	dc-b
dc-a	1ms	dc-c
dc-b	1ms	dc-c
dc-c	1ms	dc-a
dc-c	1ms	dc-b

According to the rules above, the latency between dc-a and dc-b is 1ms (only dc-a adds a delay toward dc-b), while the latency between dc-a and dc-c and between dc-b and dc-c is 2ms (each side adds 1ms toward the other).
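
The arithmetic behind these figures can be checked with a short script. It is a sketch under one assumption taken from the figures above: each rule adds its delay only to traffic flowing from the source group to the target group, so a round trip (which is what ping measures) is the sum of the delays in both directions. `one_way` and `round_trip` are hypothetical helper names, and the baseline network latency is ignored.

```shell
#!/bin/sh
# Injection rules from the table above, one "source target delay_ms" triple per line.
RULES="dc-a dc-b 1
dc-a dc-c 1
dc-b dc-c 1
dc-c dc-a 1
dc-c dc-b 1"

# one_way SRC DST: print the injected delay in ms from SRC to DST (0 if no rule matches).
one_way() {
  printf '%s\n' "$RULES" |
    awk -v s="$1" -v d="$2" '$1 == s && $2 == d { sum += $3 } END { print sum + 0 }'
}

# round_trip A B: add the injected delays of both directions.
round_trip() {
  echo $(( $(one_way "$1" "$2") + $(one_way "$2" "$1") ))
}

round_trip dc-a dc-b   # prints 1
round_trip dc-a dc-c   # prints 2
round_trip dc-b dc-c   # prints 2
```

There is no rule from dc-b to dc-a, which is why that pair ends up at 1ms while the other two pairs reach 2ms.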

Define the chaos experiment

According to the injection rules, we define the chaos experiments as follows:

apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-delay-a
  namespace: tidb-cluster
spec:
  action: delay # chaos action
  mode: all
  selector: # pods that belong to dc-a
    pods:
      tidb-cluster: # namespace of the target pods
        - basic-tidb-0
        - basic-pd-0
        - basic-tikv-0
        - basic-tikv-1
  delay:
    latency: '1ms'
  direction: to
  target:
    selector: # pods that belong to dc-b and dc-c
      pods:
        tidb-cluster: # namespace of the target pods
          - basic-tidb-1
          - basic-tidb-2
          - basic-pd-1
          - basic-pd-2
          - basic-tikv-2
          - basic-tikv-3
          - basic-tikv-4
          - basic-tikv-5
          - basic-tikv-6
    mode: all

---
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-delay-b
  namespace: tidb-cluster
spec:
  action: delay
  mode: all
  selector: # pods that belong to dc-b
    pods:
      tidb-cluster: # namespace of the target pods
        - basic-tidb-1
        - basic-pd-1
        - basic-tikv-2
        - basic-tikv-3
  delay:
    latency: '1ms'
  direction: to
  target:
    selector: # pods that belong to dc-c
      pods:
        tidb-cluster: # namespace of the target pods
          - basic-tidb-2
          - basic-pd-2
          - basic-tikv-4
          - basic-tikv-5
          - basic-tikv-6
    mode: all

---
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-delay-c
  namespace: tidb-cluster
spec:
  action: delay
  mode: all
  selector: # pods that belong to dc-c
    pods:
      tidb-cluster: # namespace of the target pods
        - basic-tidb-2
        - basic-pd-2
        - basic-tikv-4
        - basic-tikv-5
        - basic-tikv-6
  delay:
    latency: '1ms'
  direction: to
  target:
    selector: # pods that belong to dc-a and dc-b
      pods:
        tidb-cluster: # namespace of the target pods
          - basic-tidb-0
          - basic-tidb-1
          - basic-pd-0
          - basic-pd-1
          - basic-tikv-0
          - basic-tikv-1
          - basic-tikv-2
          - basic-tikv-3
    mode: all
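
The three manifests differ only in their name and pod lists, so they could also be generated from the grouping. The following is a minimal sketch, not an official Chaos Mesh tool: `emit_delay` and the `DC_A`, `DC_B`, and `DC_C` variables are hypothetical names, and the template simply copies the fields used in the experiments above.

```shell
#!/bin/sh
# emit_delay NAME LATENCY "SOURCE PODS" "TARGET PODS"
# Print one NetworkChaos manifest with the same fields as the experiments above.
# Hypothetical helper; pod lists are space-separated strings.
emit_delay() {
  name=$1 latency=$2 sources=$3 targets=$4
  cat <<EOF
---
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: $name
  namespace: tidb-cluster
spec:
  action: delay
  mode: all
  selector:
    pods:
      tidb-cluster:
EOF
  # Source pods: the group whose outgoing traffic is delayed.
  for p in $sources; do printf '        - %s\n' "$p"; done
  cat <<EOF
  delay:
    latency: '$latency'
  direction: to
  target:
    selector:
      pods:
        tidb-cluster:
EOF
  # Target pods: the groups the delay applies toward.
  for p in $targets; do printf '          - %s\n' "$p"; done
  printf '    mode: all\n'
}

# Pod groups from the grouping table.
DC_A="basic-tidb-0 basic-pd-0 basic-tikv-0 basic-tikv-1"
DC_B="basic-tidb-1 basic-pd-1 basic-tikv-2 basic-tikv-3"
DC_C="basic-tidb-2 basic-pd-2 basic-tikv-4 basic-tikv-5 basic-tikv-6"

# One experiment per source group, matching the injection rules.
emit_delay network-delay-a 1ms "$DC_A" "$DC_B $DC_C"
emit_delay network-delay-b 1ms "$DC_B" "$DC_C"
emit_delay network-delay-c 1ms "$DC_C" "$DC_A $DC_B"
```

The script prints the manifests to standard output, so its output can be redirected to a file. The pods are listed in a different order than in the hand-written version, which should not affect which pods are selected.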

Apply the chaos experiment

Save the chaos experiments above as delay.yaml and apply the file:

kubectl apply -f delay.yaml

Check the result

Use the ping command to check the latency between the three data centers.

Check the latency between pods that belong to `dc-a`

kubectl exec -it -n tidb-cluster basic-tidb-0 -c tidb -- ping -c 2 basic-tikv-0.basic-tikv-peer.tidb-cluster.svc

Output:

PING basic-tikv-0.basic-tikv-peer.tidb-cluster.svc (10.244.1.229): 56 data bytes
64 bytes from 10.244.1.229: seq=0 ttl=63 time=0.095 ms
64 bytes from 10.244.1.229: seq=1 ttl=63 time=0.100 ms

From the output, we can see that the latency between pods that belong to dc-a is around 0.1ms.

Check the latency between `dc-a` and `dc-b`

kubectl exec -it -n tidb-cluster basic-tidb-0 -c tidb -- ping -c 2 basic-tidb-1.basic-tidb-peer.tidb-cluster.svc

Output:

PING basic-tidb-1.basic-tidb-peer.tidb-cluster.svc (10.244.3.3): 56 data bytes
64 bytes from 10.244.3.3: seq=0 ttl=62 time=1.193 ms
64 bytes from 10.244.3.3: seq=1 ttl=62 time=1.201 ms

From the output, we can see that the latency between dc-a and dc-b is around 1ms.

Check the latency between `dc-a` and `dc-c`

kubectl exec -it -n tidb-cluster basic-tidb-0 -c tidb -- ping -c 2 basic-tidb-2.basic-tidb-peer.tidb-cluster.svc

Output:

PING basic-tidb-2.basic-tidb-peer.tidb-cluster.svc (10.244.2.27): 56 data bytes
64 bytes from 10.244.2.27: seq=0 ttl=62 time=2.200 ms
64 bytes from 10.244.2.27: seq=1 ttl=62 time=2.251 ms

From the output, we can see that the latency between dc-a and dc-c is around 2ms.
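
With more than a couple of samples, reading the times by eye becomes tedious, so the check can be reduced to a single number. The sketch below averages the `time=` values in ping output; `avg_ping_ms` is a hypothetical helper name, and the script assumes the ping output format shown above.

```shell
#!/bin/sh
# avg_ping_ms: read ping output on stdin and print the mean of its "time=" values in ms.
# Hypothetical helper; assumes lines of the form "... time=1.193 ms".
avg_ping_ms() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^time=/) {
          v = $i
          sub(/^time=/, "", v)   # keep only the numeric part
          sum += v
          n++
        }
      }
    }
    END {
      if (n > 0) {
        printf "%.3f\n", sum / n
      } else {
        print "no samples"
      }
    }
  '
}

# Sample taken from the basic-tidb-0 -> basic-tidb-1 output above.
avg_ping_ms <<'EOF'
PING basic-tidb-1.basic-tidb-peer.tidb-cluster.svc (10.244.3.3): 56 data bytes
64 bytes from 10.244.3.3: seq=0 ttl=62 time=1.193 ms
64 bytes from 10.244.3.3: seq=1 ttl=62 time=1.201 ms
EOF
# prints 1.197
```

Against a live cluster, the same function could be fed directly from the ping command, for example `kubectl exec -n tidb-cluster basic-tidb-0 -c tidb -- ping -c 10 basic-tidb-2.basic-tidb-peer.tidb-cluster.svc | avg_ping_ms`, with `-it` dropped because the output is piped rather than shown in a terminal.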

Delete the network latency

kubectl delete -f delay.yaml
