Usually when we talk about writing tests, the first thing that comes to your mind is some kind of static test where you send an input and you expect an output. For example you send a wrong parameter to a REST service, and you expect that it returns an error message/status code.
But usually applications runs more time than the amount of time it takes to execute all tests. Probably days or months until you update it. And during this time, things happen, for example network starts to go slow, a unknown process eats all CPU or if you are using Docker, a container dies. So you can see that when you run your application for a long time some kind of chaos might appear.
The question is, are you sure your application deals correctly with these situations? You can test it manually, but if you want to apply CI/CD approach then you need some automatic way to execute them.
And this is where Arquillian Cube Q helps you. Arquillian Cube Q is an extension of Arquillian Cube (https://github.com/arquillian/arquillian-cube) that allows you to write chaos tests. Since Arquillian Cube Q is an extension of Cube, it relies on Docker to execute them.
1. Chaos
There are several level of chaos that you might test, from network chaos (latency, bandwidth limitation, …) to operative system chaos (cpu burn, io burn, dill disk, …).
Arquillian Q as all the Arquillian project, it reuses existing chaos frameworks by integrating them into Arquillian philosophy. Let’s see what is supported:
1.1. Network Chaos
To do network chaos Arquillian Q integrates with Toxiproxy project (https://github.com/Shopify/toxiproxy). Toxiproxy is a framework for simulating network conditions. It is a TCP proxy that intercepts communication between two endpoints and adds some chaos before reaching the real endpoint.
Toxiproxy supports next toxics:
- latency
-
Add a delay to all data going through the proxy. The delay is equal to latency +/- jitter.
- down
-
Bringing a service down
- bandwidth
-
Limit a connection to a maximum number of kilobytes per second.
- slow close
-
Delay the TCP socket from closing until delay has elapsed.
- timeout
-
Stops all data from getting through, and closes the connection after timeout. If timeout is 0, the connection won’t close, and data will be delayed until the toxic is removed.
- slicer
-
Slices TCP data up into small bits, optionally adding a delay between each sliced "packet".
Arquillian Q supports Toxiproxy by registering as docker container toxiproxy and then inspecting the cube definitions (in cube format or docker-compose format) and automatically redirect links to toxiproxy.
As an example:
ContainerA -link→ ContainerB
is converted to:
ContainerA -link→ ToxiproxyContainer -link→ ContainerB
After this you are able to program some toxics to Toxiproxy and execute the test.
1.1.1. Adding Dependency
Arquillian Q Toxiproxy is only a jar deployed to Maven central:
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.jboss.arquillian</groupId>
<artifactId>arquillian-bom</artifactId>
<version>${version.arquillian_core}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>org.arquillian.cube.q</groupId>
<artifactId>arquillian-cube-q-toxic</artifactId>
<scope>test</scope>
<version>${version.arquillian_q}</version>
</dependency>
<dependency>
<groupId>org.jboss.arquillian.junit</groupId>
<artifactId>arquillian-junit-standalone</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
</dependency>
</dependencies>
Notice that instead of registering arquillian-junit-container as you usually do in Arquillian test, you are using arquillian-junit-standalone . This is because it has no sense to use in these kind of tests microdeployments feature (method annotated with @Deployment ).
|
1.1.2. Configuration
You don’t need to configure anything else from the point of view of Q apart from Cube configuration file.
<?xml version="1.0" encoding="UTF-8"?>
<arquillian xmlns="http://jboss.org/schema/arquillian"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://jboss.org/schema/arquillian
http://jboss.org/schema/arquillian/arquillian_1_0.xsd">
<extension qualifier="docker">
<property name="machineName">dev</property>
<property name="dockerContainers">
hw:
image: lordofthejars/helloworld
env: ["CATALINA_OPTS=-Djava.security.egd=file:/dev/./urandom"]
portBindings: [8081->8080/tcp]
links:
- pingpong:pingpong
pingpong:
image: jonmorehouse/ping-pong
exposedPorts: [8080/tcp]
</property>
</extension>
</arquillian>
In this case container helloworld
is connecting to pingpong
container.
1.1.3. Test
Then the test looks like:
@RunWith(Arquillian.class)
public class ToxicFuntionalTestCase {
@ArquillianResource
private NetworkChaos networkChaos; (1)
@HostIp
private String ip;
@Test
public void shouldAddLatency() throws Exception {
networkChaos.on("pingpong", 8080).latency(latencyInMillis(4000)) (2)
.exec(() -> { (3)
URL url = new URL("http://" + ip + ":" + 8081 + "/hw/HelloWorld");
final long l = System.currentTimeMillis();
String response = IOUtil.asString(url.openStream());
System.out.println(response);
System.out.println("Time:" + (System.currentTimeMillis() - l));
// assertions
}); (4)
}
}
1 | Enrich the test with NetworkChaos instance to communicate with Toxiproxy. |
2 | Adds a latency of 4 seconds when communication is done to pingpong container through port 8080. |
3 | Executes test logic. Notice that the execution time will be greater than 4 seconds. |
4 | After callback executions, toxics are reseted. |
exec method also supports you pass how many times do you want to execute the test: networkChaos.on("pingpong", 8080).latency(latencyInMillis(4000)).exec(times(2), () → {} or for example the amount of time you want to keep executing the test Q.on("pingpong", 8080).exec(during(15, TimeUnit.SECONDS), () → {} .
|
You can see full example at: https://github.com/arquillian/arquillian-cube-q/tree/master/ftest-toxic
1.1.4. Adding some randomness
Some of the discrete values set in toxics such as slowClose
, bandwidth
, timeout
or slice
can be randomized using mathematical distributions.
At this time two distributions are supported:
-
Uniform Distribution: Distribution that returns values uniformally distributed across a range. You can read about this distribution at https://en.wikipedia.org/wiki/Discrete_uniform_distribution
-
LogNormal Distribution: Returns log normally distributed values. You can use this website https://www.wolframalpha.com/input/?i=lognormaldistribution%28log%2890%29%2C+0.1%29 to play with the values. You can read more about this distribution at https://en.wikipedia.org/wiki/Log-normal_distribution
For example, this is how you can randomize the latency:
networkChaos.on("pingpong", 8080)
.latency(logNormalLatencyInMillis(2000, 0.3))
.exec(times(2), () -> {
URL url = new URL("http://" + ip + ":" + 8081 + "/hw/HelloWorld");
final long l = System.currentTimeMillis();
String response = IOUtil.asString(url.openStream());
System.out.println(response);
System.out.println("Time:" + (System.currentTimeMillis() - l));
});
In the configuration above, latency times are distributed in using a log normal distribution with median of 2 seconds and 0.3 as sigma value. Then for each iteration of the test, a new value is calculated and send to toxiproxy.
1.1.5. Binding Ports Chaos
Sometimes you don’t want to add chaos between containers but in binding ports. That is adding chaos to the communication between host and containers. This is really useful in cases when you want to test what’s happening to your frontend application (javascript) when there is some chaos.
Assuming that A has a port binding, something like:
A → B
is converted to:
Proxy → A → B
Where A has no port binding anymore but only exposed ports and it is the Proxy who has the port binding.
To use this just configure next parameter in arquillian.xml
file:
<extension qualifier="networkChaos">
<property name="toxifyPortBinding">true</property>
</extension>
By defult this flag is false, if you set to true then no chaos can be done between containers, only between host and containers. |
You can see an example at: https://github.com/arquillian/arquillian-cube-q/tree/master/ftest-toxic-frontend
1.2. Container Chaos
To do container chaos Arquillian Q integrates with Pumba project (https://github.com/Shopify/toxiproxy). Pumba is an application that you run it on every Docker host, in your cluster and it, once in a while, will "randomly" stop running containers, matching specified name/s or name patterns. You can even specify the signal, that will be sent to “kill” the container.
It supports:
-
Stop a container.
-
Remove a container.
-
Kill a container process with signal.
Arquillian Q will register a Pumba container inside the configured docker host you set in Arquillian Q.
1.2.1. Adding Dependency
Arquillian Q Pumba is only a jar file deployed in Maven central.
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.jboss.arquillian</groupId>
<artifactId>arquillian-bom</artifactId>
<version>${version.arquillian_core}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>org.arquillian.cube.q</groupId>
<artifactId>arquillian-cube-q-pumba</artifactId>
<scope>test</scope>
<version>${version.arquillian_q}</version>
</dependency>
<dependency>
<groupId>org.jboss.arquillian.junit</groupId>
<artifactId>arquillian-junit-standalone</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
</dependency>
</dependencies>
Notice that instead of registering arquillian-junit-container as you usually do in Arquillian test, you are using arquillian-junit-standalone . This is because it has no sense to use in these kind of tests microdeployments feature (method annotated with @Deployment ).
|
1.2.2. Configuration
You don’t need to configure anything else from the point of view of Q apart from Cube configuration file.
<?xml version="1.0" encoding="UTF-8"?>
<arquillian xmlns="http://jboss.org/schema/arquillian"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="
http://jboss.org/schema/arquillian
http://jboss.org/schema/arquillian/arquillian_1_0.xsd">
<extension qualifier="docker">
<property name="machineName">dev</property>
<property name="dockerContainers">
pingpong:
image: jonmorehouse/ping-pong
exposedPorts: [8080/tcp]
pingpong2:
image: jonmorehouse/ping-pong
exposedPorts: [8080/tcp]
</property>
</extension>
</arquillian>
In this case we are defining two instances of same image.
1.2.3. Test
Then the test looks like:
@RunWith(Arquillian.class)
public class PumbaFunctionalTestCase {
@ArquillianResource (1)
ContainerChaos containerChaos;
@ArquillianResource
DockerClient dockerClient; (2)
@Test
public void shouldKillContainers() throws Exception {
containerChaos
.onCubeDockerHost()
.killRandomly( (3)
ContainerChaos.ContainersType.regularExpression("^pingpong"), (4)
ContainerChaos.IntervalType.intervalInSeconds(4), (5)
ContainerChaos.KillSignal.SIGTERM
)
.exec(); (6)
final List<Container> containers = dockerClient.listContainersCmd().exec();
//Pumba container is not killed by itself
assertThat(containers).hasSize(1);
}
}
1 | Enrich test with container chaos |
2 | Enrich test with DockerClient class to communicate with DockerHost in test |
3 | Kills randomly one by one containers |
4 | Kills only containers with name starting with pingpong |
5 | Time to wait between kill another container |
6 | Starts Pumba. In this case no callback used. |
As happens in Network Chaos you can also specify test as callback and specify times to execute the test or the duration.
You can see full example at: https://github.com/arquillian/arquillian-cube-q/tree/master/ftest-pumba
1.3. Operative System Chaos
To do operative system chaos Arquillian Q uses some modified version scripts of Netflix Simian Army project (). Some scripts have been modified to have sense into Docker world instead of AWS world.
It supports:
-
Block a port using
iptables
command. -
Burn CPU using
dd
command. That is putting CPU to 100%. -
Burn IO using
dd
comomand. -
Fill disk with
dd
command. -
Kill process using
pkill
command. -
Null Route using
ip
command.
Scripts are executed inside the container. This means that the command used in the script must be installed inside the container. Some images might contain them, others not. |
Making chaos with scripts means a whole new kind of possibilities since the only barrier is the commands you need to execute them. Please feel free to contribute with your own scripts. |
1.3.1. Adding Dependency
Arquillian Simian Army is only a jar file deployed in Maven central.
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.jboss.arquillian</groupId>
<artifactId>arquillian-bom</artifactId>
<version>${version.arquillian_core}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<dependency>
<groupId>org.arquillian.cube.q</groupId>
<artifactId>arquillian-cube-q-simianarmy</artifactId>
<scope>test</scope>
<version>${version.arquillian_q}</version>
</dependency>
<dependency>
<groupId>org.jboss.arquillian.junit</groupId>
<artifactId>arquillian-junit-standalone</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
</dependency>
</dependencies>
Notice that instead of registering arquillian-junit-container as you usually do in Arquillian test, you are using arquillian-junit-standalone . This is because it has no sense to use in these kind of tests microdeployments feature (method annotated with @Deployment ).
|
1.3.2. Test
Then the test looks like:
@RunWith(Arquillian.class)
public class SimianArmyFunctionalTestCase {
@ArquillianResource (1)
OperativeSystemChaos operativeSystemChaos;
@HostIp
String dockerHost;
@HostPort(containerName = "pingpong ", value = 8080)
int port;
@Test(expected = Exception.class) @Ignore //Running this test in same machine makes everything screwed
public void shouldExecuteBurnCpuChaos() throws Exception {
operativeSystemChaos.on("pingpong") (2)
.burnCpu(singleCpu()) (3)
.exec(); (4)
//.....
}
1 | Enrich test with operative system chaos |
2 | Sets the container to set the chaos |
3 | Sets burn cpu chaos as if the system had only one cpu |
4 | Starts the burn cpu script. In this case no callback used |
As happens in Network Chaos you can also specify test as callback and specify times to execute the test or the duration.
You can see full example at: https://github.com/arquillian/arquillian-cube-q/tree/master/ftest-simianarmy