JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM.
JMH Basic Concepts
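- Iteration: the smallest unit of a JMH measurement. Each iteration consists of a group of invocations and usually lasts a fixed amount of time.
- Invocation: a single call of the benchmark method.
- Operation: one execution of the measured work inside the benchmark method. If the benchmark method repeats that work in a loop, declare the loop count with `@OperationsPerInvocation`. JMH then reports results per single operation.
- Warmup: iterations run before the actual measurement. Once a method has been called enough times, the JIT compiles it, so warming up brings the measurements closer to real steady-state performance.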
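The relationship between invocations, operations and the reported score comes down to simple arithmetic. The class below is a hypothetical helper (`ScoreMath` is not part of JMH). It sketches, under that assumption, how a per-operation score is normalised when each invocation performs `opsPerInvocation` operations, as declared with `@OperationsPerInvocation`.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not JMH code): per-operation scores from raw timing data. */
final class ScoreMath {
    private ScoreMath() {}

    /** Average time per operation in nanoseconds. */
    static double nanosPerOp(long elapsedNanos, long invocations, int opsPerInvocation) {
        // Each invocation performs opsPerInvocation operations (e.g. a loop of that length).
        return (double) elapsedNanos / ((double) invocations * opsPerInvocation);
    }

    /** Throughput in operations per second. */
    static double opsPerSecond(long elapsedNanos, long invocations, int opsPerInvocation) {
        double ops = (double) invocations * opsPerInvocation;
        double seconds = elapsedNanos / (double) TimeUnit.SECONDS.toNanos(1);
        return ops / seconds;
    }

    public static void main(String[] args) {
        // One 1-second iteration with 1,000,000 invocations, each looping 10 times.
        long elapsed = TimeUnit.SECONDS.toNanos(1);
        System.out.println(nanosPerOp(elapsed, 1_000_000, 10));   // 100.0
        System.out.println(opsPerSecond(elapsed, 1_000_000, 10)); // 1.0E7
    }
}
```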
Getting Started
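JDK 12 added a JMH-based microbenchmark suite to the JDK source tree (JEP 230). JMH itself is not shipped with the JDK, however, so you still need to add it as a dependency, for example with Maven: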
```xml
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <jmh.version>1.27</jmh.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.openjdk.jmh</groupId>
        <artifactId>jmh-core</artifactId>
        <version>${jmh.version}</version>
    </dependency>
    <!-- annotation processor that generates the benchmark harness code at compile time -->
    <dependency>
        <groupId>org.openjdk.jmh</groupId>
        <artifactId>jmh-generator-annprocess</artifactId>
        <version>${jmh.version}</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
```
Annotations
Warmup
```java
// -1 means "not set": JMH then falls back to its defaults (or to command-line options)
@Warmup(iterations = -1, time = -1, timeUnit = TimeUnit.SECONDS, batchSize = -1)
public class BenchmarkTest {}
```
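- iterations: number of warmup iterations.
- time: duration of each warmup iteration.
- timeUnit: unit of `time`; defaults to seconds.
- batchSize: number of benchmark method calls per operation.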
This corresponds to the following part of the output:
```
# Warmup Iteration 1: 3359571890.253 ops/s
# Warmup Iteration 2: 3410327185.841 ops/s
# Warmup Iteration 3: 3397921599.396 ops/s
# Warmup Iteration 4: 3435385874.666 ops/s
# Warmup Iteration 5: 3418061628.223 ops/s
```
Measurement
```java
// -1 means "not set": JMH then falls back to its defaults (or to command-line options)
@Measurement(iterations = -1, time = -1, timeUnit = TimeUnit.SECONDS, batchSize = -1)
public class BenchmarkTest {}
```
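- iterations: number of measurement iterations.
- time: duration of each measurement iteration.
- timeUnit: unit of `time`; defaults to seconds.
- batchSize: number of benchmark method calls per operation.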
`@Measurement` takes the same parameters as `@Warmup`. Unlike warmup, it configures the iterations that are actually measured.
This corresponds to the following part of the output:
```
Iteration 1: 3439847797.303 ops/s
Iteration 2: 3450228067.947 ops/s
Iteration 3: 3440088656.138 ops/s
Iteration 4: 3427225995.315 ops/s
Iteration 5: 3455433144.375 ops/s
```
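Warmup and measurement differ only in whether their results are kept. The class below is a hypothetical, much simplified hand-rolled harness (`MiniHarness` is invented for illustration and is not how JMH is implemented). It runs time-boxed iterations and prints output in the same shape as above. Reading the clock on every call inflates the measured cost, so treat it as a sketch of the loop structure, not a measuring tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

/** Hypothetical, simplified harness: shows the shape of warmup/measurement loops only. */
final class MiniHarness {
    static volatile int sink; // publish results so the work cannot be dropped as dead code

    /** Runs `iterations` time-boxed iterations and returns each iteration's ops/s. */
    static List<Double> runIterations(IntSupplier op, int iterations, long timePerIterationMillis) {
        List<Double> scores = new ArrayList<>();
        long budget = TimeUnit.MILLISECONDS.toNanos(timePerIterationMillis);
        for (int i = 0; i < iterations; i++) {
            long ops = 0;
            long start = System.nanoTime();
            long elapsed;
            do {
                sink = op.getAsInt(); // one invocation == one operation here
                ops++;
                elapsed = System.nanoTime() - start; // per-call clock read: sketch-level accuracy
            } while (elapsed < budget);
            scores.add(ops / (elapsed / 1e9));
        }
        return scores;
    }

    public static void main(String[] args) {
        IntSupplier op = () -> Integer.bitCount(sink + 12345);
        List<Double> warmup = runIterations(op, 5, 100);   // like @Warmup(iterations = 5, ...): discarded
        List<Double> measured = runIterations(op, 5, 100); // like @Measurement(iterations = 5, ...): reported
        for (int i = 0; i < warmup.size(); i++)
            System.out.printf("# Warmup Iteration %d: %.3f ops/s%n", i + 1, warmup.get(i));
        for (int i = 0; i < measured.size(); i++)
            System.out.printf("Iteration %d: %.3f ops/s%n", i + 1, measured.get(i));
    }
}
```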
BenchmarkMode
```java
@BenchmarkMode(Mode.All)
public class BenchmarkTest {}
```
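Benchmark modes:
- Throughput: ops/time, the number of operations per unit of time.
- AverageTime: time/op, the average time one operation takes (for a single thread, roughly the reciprocal of throughput).
- SampleTime: samples the duration of individual invocations. JMH controls the sampling rate automatically, and the results include the time distribution with percentiles such as p90 and p95.
- SingleShotTime: measures the time of a single invocation per iteration. Combined with zero warmup iterations, it is useful for measuring cold-start performance.
- All: runs all of the above modes.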
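The numbers behind these modes can be illustrated with plain arithmetic. The class below is a hypothetical helper (`ModeMath` is not JMH API). It converts a single-threaded throughput into an average time and computes a nearest-rank percentile over a set of invocation samples. JMH's own percentile estimator may differ in detail.

```java
import java.util.Arrays;

/** Hypothetical illustration of the numbers behind Throughput, AverageTime and SampleTime. */
final class ModeMath {
    /** For a single thread, average time per op (ns) is the reciprocal of throughput. */
    static double avgTimeFromThroughput(double opsPerSecond) {
        return 1e9 / opsPerSecond; // ns/op
    }

    /** Nearest-rank percentile, e.g. p = 0.90 for p90. JMH's estimator may differ. */
    static long percentile(long[] samplesNanos, double p) {
        long[] sorted = samplesNanos.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length); // 1-based nearest rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // e.g. 3.44e9 ops/s corresponds to roughly 0.29 ns/op
        System.out.println(avgTimeFromThroughput(3.44e9));
        long[] samples = {10, 12, 11, 50, 13, 12, 11, 10, 14, 100}; // made-up sample times in ns
        System.out.println("p50 = " + percentile(samples, 0.50)); // p50 = 12
        System.out.println("p90 = " + percentile(samples, 0.90)); // p90 = 50
    }
}
```

A few slow outliers (50 and 100 ns here) barely move the median but dominate the upper percentiles, which is why SampleTime reports a distribution rather than a single average.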
Fork
```java
@Fork(1)
public class BenchmarkTest {}
```
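`@Fork` sets how many forked JVM processes the benchmark runs in. The usual value is 1, which runs the benchmark in one separate child JVM. A value greater than 1 runs it in that many fresh processes, one after another. A value of 0 disables forking and runs the benchmark inside the harness JVM, which is generally discouraged.
Note that `@Fork` also accepts `jvmArgs`, `jvmArgsPrepend` and `jvmArgsAppend` for passing JVM options to the forked processes.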
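To make "forking" concrete, the hypothetical sketch below (`ForkSketch` is invented for illustration; JMH handles forking itself) starts a fresh child JVM from the current `java.home`, with extra options appended in the spirit of `jvmArgsAppend`. Each child starts with clean JIT and profiling state, which is the main reason to measure in forks. The child only runs `-version`, so the example needs no extra classes.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: what "forking a JVM with extra arguments" means. Not JMH's implementation. */
final class ForkSketch {
    static int runForkedJvm(List<String> jvmArgsAppend) throws IOException, InterruptedException {
        Path java = Paths.get(System.getProperty("java.home"), "bin", "java");
        List<String> cmd = new ArrayList<>();
        cmd.add(java.toString());
        cmd.addAll(jvmArgsAppend); // analogous to @Fork(jvmArgsAppend = {...})
        cmd.add("-version");       // the child just prints its version and exits
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();        // exit code of the child JVM
    }

    public static void main(String[] args) throws Exception {
        for (int fork = 1; fork <= 2; fork++) { // like @Fork(2): one fresh JVM per fork, sequentially
            int exit = runForkedJvm(List.of("-Xmx64m"));
            System.out.println("fork " + fork + " exited with " + exit);
        }
    }
}
```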
Threads
```java
@Threads(Threads.MAX)
public class BenchmarkTest {}
```
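`@Fork` works at the process level, while `@Threads` works at the thread level. This annotation sets how many threads run the benchmark method concurrently, which enables parallel testing.
`Threads.MAX` uses as many threads as the machine has available processors (`Runtime.getRuntime().availableProcessors()`).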
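A rough plain-Java picture of `@Threads(Threads.MAX)`: the same body runs on `availableProcessors()` threads at once. The sketch sums the per-thread operation counts, which is roughly what a combined throughput figure represents. `ThreadsSketch` is a hypothetical name, and the sketch leaves out the start/stop synchronisation JMH performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of running one benchmark body on N threads and aggregating the counts. */
final class ThreadsSketch {
    /** Runs opsPerThread operations on each of `threads` threads and returns the total. */
    static long runConcurrently(int threads, int opsPerThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> {
                    long done = 0;
                    for (int i = 0; i < opsPerThread; i++) done++; // stand-in for the benchmark body
                    return done;
                });
            }
            long total = 0;
            for (Future<Long> f : pool.invokeAll(tasks)) total += f.get(); // sum per-thread counts
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int max = Runtime.getRuntime().availableProcessors(); // what Threads.MAX resolves to
        System.out.println(max + " threads, total ops = " + runConcurrently(max, 1_000));
    }
}
```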
OutputTimeUnit
```java
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class BenchmarkTest {}
```
Sets the time unit used to report benchmark results, for example seconds, milliseconds, or microseconds.
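Changing the output unit only rescales the reported score. The hypothetical helper below (`UnitConversion` is not JMH API) does the same rescaling with `java.util.concurrent.TimeUnit`. A throughput is multiplied by the ratio of unit lengths, and a time per operation is divided by it.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: re-expressing a score in another output time unit. */
final class UnitConversion {
    /** Converts a throughput given in ops per `from` into ops per `to`. */
    static double convertThroughput(double ops, TimeUnit from, TimeUnit to) {
        return ops * to.toNanos(1) / (double) from.toNanos(1);
    }

    /** Converts an average time given in `from` per op into `to` per op. */
    static double convertTime(double timePerOp, TimeUnit from, TimeUnit to) {
        return timePerOp * from.toNanos(1) / (double) to.toNanos(1);
    }

    public static void main(String[] args) {
        // about 3.44e9 ops/s is about 3440 ops/us
        System.out.println(convertThroughput(3.44e9, TimeUnit.SECONDS, TimeUnit.MICROSECONDS));
        // 7.374 ns/op is about 0.007374 us/op
        System.out.println(convertTime(7.374, TimeUnit.NANOSECONDS, TimeUnit.MICROSECONDS));
    }
}
```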
State
```java
@State(Scope.Thread)
public class BenchmarkTest {}
```
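`@State` marks a class whose fields hold benchmark state. The `Scope` parameter specifies how widely that state is shared.
`Scope` has three values:
- Benchmark: one instance shared by all threads running the same benchmark.
- Thread: one copy per thread. If `@Threads` is configured, every thread gets its own instance, and the instances do not affect each other.
- Group: used together with `@Group`; all threads in the same group share one instance.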
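As a plain-Java analogy (not how JMH injects `@State` objects), `Scope.Thread` behaves like a `ThreadLocal` instance and `Scope.Benchmark` like a single shared object. In the hypothetical `ScopeSketch` below, the per-thread counters never interfere. The shared counter must be thread-safe and sees contention from all threads.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical analogy: Scope.Thread ~ ThreadLocal instance, Scope.Benchmark ~ one shared instance. */
final class ScopeSketch {
    static final class Counter { long value; } // plain, unsynchronised state

    static final ThreadLocal<Counter> PER_THREAD = ThreadLocal.withInitial(Counter::new); // like Scope.Thread
    static final AtomicLong SHARED = new AtomicLong(); // like Scope.Benchmark (must be thread-safe)

    /** Each thread bumps both counters n times; returns every thread's own counter value. */
    static long[] run(int threads, int n) throws InterruptedException {
        SHARED.set(0);
        long[] perThread = new long[threads];
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            ts[t] = new Thread(() -> {
                Counter mine = PER_THREAD.get(); // each new thread gets a fresh Counter
                for (int i = 0; i < n; i++) {
                    mine.value++;             // no interference between threads
                    SHARED.incrementAndGet(); // every thread hits the same object
                }
                perThread[id] = mine.value;
            });
            ts[t].start();
        }
        for (Thread th : ts) th.join(); // join() makes perThread writes visible here
        return perThread;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] perThread = run(4, 10_000);
        System.out.println(java.util.Arrays.toString(perThread)); // [10000, 10000, 10000, 10000]
        System.out.println(SHARED.get());                          // 40000
    }
}
```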
Param
```java
import java.math.BigInteger;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark) // @Param fields must be declared in a @State class
public class BenchmarkTest {
    @Param({"1", "31", "65", "101", "103"})
    public int arg;
    @Param({"0", "1", "2", "4", "8", "16", "32"})
    public int certainty;
    @Benchmark
    public boolean bench() {
        return BigInteger.valueOf(arg).isProbablePrime(certainty);
    }
}
```
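`@Param` is a field annotation. JMH runs the benchmark once for each configured value, injecting that value into the field.
Be aware that many parameter values multiply the number of runs, and the whole run can take a very long time.
For example, if parameter x has m values and parameter y has n values, the benchmark runs m × n times (5 × 7 = 35 runs in the example above). Each combination gets its own full set of warmup and measurement iterations.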
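The cost of the parameter space grows multiplicatively. The hypothetical `ParamSpace` sketch below enumerates the combinations for the two fields above. It then estimates wall-clock time, assuming the defaults of recent JMH versions (5 forks, 5 warmup and 5 measurement iterations of 10 s each); check the defaults for your own version. The enumeration order is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: the parameter space JMH walks for two @Param fields. */
final class ParamSpace {
    static List<String> combinations(String[] argValues, String[] certaintyValues) {
        List<String> out = new ArrayList<>();
        for (String arg : argValues)
            for (String certainty : certaintyValues)
                out.add("arg=" + arg + ", certainty=" + certainty); // one full benchmark run each
        return out;
    }

    /** Wall-clock estimate: every combination runs every fork, warmup and measurement iteration. */
    static long estimateSeconds(int runs, int forks, int warmupIters, int measurementIters, int secondsPerIter) {
        return (long) runs * forks * (warmupIters + measurementIters) * secondsPerIter;
    }

    public static void main(String[] args) {
        String[] argValues = {"1", "31", "65", "101", "103"};             // m = 5
        String[] certaintyValues = {"0", "1", "2", "4", "8", "16", "32"}; // n = 7
        List<String> runs = combinations(argValues, certaintyValues);
        System.out.println(runs.size() + " runs"); // 35 runs
        // assumed defaults: 5 forks, 5 warmup + 5 measurement iterations, 10 s each
        System.out.println(estimateSeconds(runs.size(), 5, 5, 5, 10) + " s"); // 17500 s, almost 5 hours
    }
}
```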
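Group, GroupThreads
```java
@State(Scope.Group) // one instance shared by all threads of a group
public class BenchmarkTest {
    private final AtomicInteger counter = new AtomicInteger();

    @Benchmark @Group("g") @GroupThreads(3) // 3 threads run inc() ...
    public int inc() { return counter.incrementAndGet(); }

    @Benchmark @Group("g") @GroupThreads(1) // ... concurrently with 1 thread running get()
    public int get() { return counter.get(); }
}
```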
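`@Group` can only be applied to methods. It is used for asymmetric benchmarks: `@Benchmark` methods with the same group name run at the same time, each on its own threads, and share any `Scope.Group` state. This lets you measure, for example, several writer threads against one reader thread.
The associated `@GroupThreads` annotation configures threads within the group. It sets how many threads execute each method.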
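In plain Java terms, a group with `@GroupThreads(3)` on an increment method and `@GroupThreads(1)` on a read method looks roughly like the hypothetical `GroupSketch` below. Four threads are released together against one shared counter. This is an analogy only; JMH also measures each method's score separately, which the sketch does not attempt.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical analogy of an asymmetric group: N "inc" threads plus one "get" thread, shared state. */
final class GroupSketch {
    static int runGroup(int incThreads, int incsPerThread) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();  // shared by the whole group (like Scope.Group)
        CountDownLatch start = new CountDownLatch(1); // release all threads at the same moment
        Thread[] workers = new Thread[incThreads + 1];
        for (int t = 0; t < incThreads; t++) {
            workers[t] = new Thread(() -> {
                awaitQuietly(start);
                for (int i = 0; i < incsPerThread; i++) counter.incrementAndGet(); // like inc()
            });
        }
        workers[incThreads] = new Thread(() -> {
            awaitQuietly(start);
            for (int i = 0; i < incsPerThread; i++) counter.get(); // like get(), racing the writers
        });
        for (Thread w : workers) w.start();
        start.countDown();
        for (Thread w : workers) w.join();
        return counter.get();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runGroup(3, 100_000)); // 300000
    }
}
```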
Setup, TearDown
```java
@State(Scope.Benchmark) // @Setup/@TearDown methods must belong to a @State class
public class BenchmarkTest {
    @Setup(Level.Trial)
    public void init() {}
    @TearDown(Level.Trial)
    public void destroy() {}
}
```
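`@Setup` runs initialization before the benchmark, and `@TearDown` runs cleanup after it. They can be used for global configuration.
Both annotations take a `Level` value that specifies when the method runs. It has three values:
- Trial: the default level. Runs once per trial, i.e. around the full set of iterations of a benchmark run.
- Iteration: runs around every iteration.
- Invocation: runs around every call to the benchmark method. This is the finest granularity, and JMH's documentation warns that it adds overhead and timing noise, so it only suits fairly long-running benchmark methods.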
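The firing pattern of the three levels can be shown with a counting loop. The hypothetical `FixtureLevels` sketch below assumes a fixed number of invocations per iteration for simplicity; real JMH iterations are usually time-bound. With 2 iterations of 3 invocations, the trial fixture runs once, the iteration fixture twice, and the invocation fixture six times.

```java
/** Hypothetical sketch of when Level.Trial / Iteration / Invocation fixtures fire. Not JMH code. */
final class FixtureLevels {
    int trialSetups, iterationSetups, invocationSetups, calls;

    void runTrial(int iterations, int invocationsPerIteration) {
        trialSetups++;                            // @Setup(Level.Trial)
        for (int it = 0; it < iterations; it++) {
            iterationSetups++;                    // @Setup(Level.Iteration)
            for (int inv = 0; inv < invocationsPerIteration; inv++) {
                invocationSetups++;               // @Setup(Level.Invocation)
                calls++;                          // the @Benchmark method itself
                // @TearDown(Level.Invocation) would run here
            }
            // @TearDown(Level.Iteration) would run here
        }
        // @TearDown(Level.Trial) would run here
    }

    public static void main(String[] args) {
        FixtureLevels f = new FixtureLevels();
        f.runTrial(2, 3);
        System.out.printf("trial=%d iteration=%d invocation=%d calls=%d%n",
                f.trialSetups, f.iterationSetups, f.invocationSetups, f.calls);
        // trial=1 iteration=2 invocation=6 calls=6
    }
}
```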
Benchmark
```java
public class BenchmarkTest {
    @Benchmark
    public void test() {}
}
```
A method-level annotation that marks the method as a benchmark target, used much like JUnit's `@Test`.
Running the Benchmarks
After creating the class and adding the annotations, you can start the benchmarks directly from a `main` method:
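```java
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkTest {
    public static void main(String[] args) throws RunnerException {
        final Options options = new OptionsBuilder()
                // include() takes a regular expression matched against benchmark names
                .include(BenchmarkTest.class.getSimpleName())
                .build();
        new Runner(options).run();
    }
}
```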
Dead-Code Elimination
```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class DeadCode {
    /*
     * The downfall of many benchmarks is Dead-Code Elimination (DCE): compilers
     * are smart enough to deduce some computations are redundant and eliminate
     * them completely. If the eliminated part was our benchmarked code, we are
     * in trouble.
     */
    private double x = Math.PI;

    private double compute(double d) {
        for (int c = 0; c < 10; c++) {
            d = d * d / Math.PI;
        }
        return d;
    }

    @Benchmark
    public void baseline() {
        // do nothing, this is a baseline
    }

    @Benchmark
    public void measureWrong() {
        // This is wrong: result is not used and the entire computation is optimized away.
        compute(x);
    }

    @Benchmark
    public double measureRight() {
        // This is correct: the result is being used.
        return compute(x);
    }
}
```
```
Benchmark              Mode  Cnt  Score   Error  Units
DeadCode.baseline      avgt    5  0.292 ± 0.008  ns/op
DeadCode.measureRight  avgt    5  7.374 ± 0.213  ns/op
DeadCode.measureWrong  avgt    5  0.293 ± 0.003  ns/op
```
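Compilers are smart. In the code above, `baseline()` and `measureWrong()` run equally fast because the compiler can see that part of the code has no effect. `measureWrong()` returns nothing and never uses the result of its computation, so the compiler eliminates that code for efficiency. That is good for production code, but it ruins the benchmark. Returning the result from the benchmark method prevents the code from being eliminated during compilation.
Another way to defeat DCE is JMH's `Blackhole`: pass the result to the blackhole's `consume()` method.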
```java
@Benchmark
public void measure(final Blackhole blackhole) { // org.openjdk.jmh.infra.Blackhole, injected by JMH
    blackhole.consume(compute(x));
}
```
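Outside JMH, a common crude stand-in for `Blackhole.consume()` is to publish each result to a `volatile` field. The compiler must then assume the value is observed. The hypothetical `Sink` class below shows that pattern. A volatile write has its own cost, and JMH's `Blackhole` is engineered to be cheaper and harder to optimize away, so inside JMH benchmarks use `Blackhole` rather than this.

```java
/** Hypothetical hand-rolled stand-in for Blackhole.consume(): publish results via a volatile write. */
final class Sink {
    private static volatile double sink;

    static void consume(double v) { sink = v; } // the result escapes, so it cannot be treated as dead
    static double last() { return sink; }

    static double compute(double d) {           // same computation as in the DeadCode sample
        for (int c = 0; c < 10; c++) d = d * d / Math.PI;
        return d;
    }

    public static void main(String[] args) {
        double x = 2.0;
        for (int i = 0; i < 1_000_000; i++) consume(compute(x));
        System.out.println(last() == compute(x)); // true
    }
}
```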
Constant Folding
```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ConstantFold {
    /*
     * The flip side of dead-code elimination is constant-folding.
     *
     * If JVM realizes the result of the computation is the same no matter what,
     * it can cleverly optimize it. In our case, that means we can move the
     * computation outside of the internal JMH loop.
     *
     * This can be prevented by always reading the inputs from non-final
     * instance fields of @State objects, computing the result based on those
     * values, and follow the rules to prevent DCE.
     */

    // IDEs will say "Oh, you can convert this field to local variable". Don't. Trust. Them.
    // (While this is normally fine advice, it does not work in the context of measuring correctly.)
    private double x = Math.PI;

    // IDEs will probably also say "Look, it could be final". Don't. Trust. Them. Either.
    // (While this is normally fine advice, it does not work in the context of measuring correctly.)
    private final double wrongX = Math.PI;

    private double compute(double d) {
        for (int c = 0; c < 10; c++) {
            d = d * d / Math.PI;
        }
        return d;
    }

    @Benchmark
    public double baseline() {
        // simply return the value, this is a baseline
        return Math.PI;
    }

    @Benchmark
    public double measureWrong_1() {
        // This is wrong: the source is predictable, and computation is foldable.
        return compute(Math.PI);
    }

    @Benchmark
    public double measureWrong_2() {
        // This is wrong: the source is predictable, and computation is foldable.
        return compute(wrongX);
    }

    @Benchmark
    public double measureRight() {
        // This is correct: the source is not predictable.
        return compute(x);
    }
}
```
```
Benchmark                    Mode  Cnt  Score   Error  Units
ConstantFold.baseline        avgt    5  1.939 ± 0.100  ns/op
ConstantFold.measureRight    avgt    5  7.276 ± 0.042  ns/op
ConstantFold.measureWrong_1  avgt    5  1.896 ± 0.004  ns/op
ConstantFold.measureWrong_2  avgt    5  1.916 ± 0.010  ns/op
```
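Constant folding: in `measureWrong_1` and `measureWrong_2` above, the computations operate on predictable values. The compiler can therefore replace them with the precomputed result, which makes the benchmark meaningless. Note that fields declared `final` can be folded as well.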
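The role of `final` can be seen directly with javac's own compile-time folding of constant expressions (JLS §15.29), using strings because the effect is observable with `==`. This is the source compiler, not the JIT, so treat it as an analogy. The lesson is the same: a `final` variable with a constant initializer lets the compiler compute the result ahead of time. `FoldingDemo` is a hypothetical name.

```java
/** javac constant folding (JLS 15.29): an analogy for why final inputs get folded. */
final class FoldingDemo {
    static boolean foldedWithFinal() {
        final String a = "ja";       // constant variable, so a + "va" is a constant expression
        return (a + "va") == "java"; // folded to the interned literal "java": true
    }

    static boolean notFoldedWithoutFinal() {
        String b = "ja";             // not final: the concatenation happens at run time
        return (b + "va") == "java"; // a newly created String object: false
    }

    public static void main(String[] args) {
        System.out.println(foldedWithFinal());       // true
        System.out.println(notFoldedWithoutFinal()); // false
    }
}
```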
Loops
```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class Loops {
    /*
     * It would be tempting for users to do loops within the benchmarked method.
     * (This is the bad thing Caliper taught everyone). These tests explain why
     * this is a bad idea.
     *
     * Looping is done in the hope of minimizing the overhead of calling the
     * test method, by doing the operations inside the loop instead of inside
     * the method call. Don't buy this argument; you will see there is more
     * magic happening when we allow optimizers to merge the loop iterations.
     */

    /*
     * Suppose we want to measure how much it takes to sum two integers:
     */
    int x = 1;
    int y = 2;

    /*
     * This is what you do with JMH.
     */
    @Benchmark
    public int measureRight() {
        return (x + y);
    }

    /*
     * The following tests emulate the naive looping.
     * This is the Caliper-style benchmark.
     */
    private int reps(int reps) {
        int s = 0;
        for (int i = 0; i < reps; i++) {
            s += (x + y);
        }
        return s;
    }

    /*
     * We would like to measure this with different repetitions count.
     * Special annotation is used to get the individual operation cost.
     */
    @Benchmark
    @OperationsPerInvocation(1)
    public int measureWrong_1() {
        return reps(1);
    }

    @Benchmark
    @OperationsPerInvocation(10)
    public int measureWrong_10() {
        return reps(10);
    }

    @Benchmark
    @OperationsPerInvocation(100)
    public int measureWrong_100() {
        return reps(100);
    }

    @Benchmark
    @OperationsPerInvocation(1_000)
    public int measureWrong_1000() {
        return reps(1_000);
    }

    @Benchmark
    @OperationsPerInvocation(10_000)
    public int measureWrong_10000() {
        return reps(10_000);
    }

    @Benchmark
    @OperationsPerInvocation(100_000)
    public int measureWrong_100000() {
        return reps(100_000);
    }
}
```
```
Benchmark                  Mode  Cnt  Score   Error  Units
Loops.measureRight         avgt    5  1.880 ± 0.014  ns/op
Loops.measureWrong_1       avgt    5  1.880 ± 0.009  ns/op
Loops.measureWrong_10      avgt    5  0.188 ± 0.001  ns/op
Loops.measureWrong_100     avgt    5  0.019 ± 0.001  ns/op
Loops.measureWrong_1000    avgt    5  0.022 ± 0.001  ns/op
Loops.measureWrong_10000   avgt    5  0.019 ± 0.001  ns/op
Loops.measureWrong_100000  avgt    5  0.020 ± 0.001  ns/op
```
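Do not loop inside a benchmark method. Once the optimizer is allowed to merge loop iterations, the apparent cost per operation drops far below the real cost of a single addition measured by `measureRight`, and the results become meaningless.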
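A quick sanity check shows the looped numbers cannot be real. At 3 GHz one CPU cycle is about 0.33 ns, so 0.019 ns/op would mean more than a dozen "additions" per cycle. The loop itself is also semantically equivalent to one multiplication, which gives an optimizer license to collapse or merge its iterations. The hypothetical `LoopCollapse` sketch below only checks that equivalence. It does not show what the JIT actually emitted for the benchmark.

```java
/** Hypothetical check: the benchmarked loop equals one multiplication, so it may be collapsed. */
final class LoopCollapse {
    static int x = 1, y = 2;

    static int reps(int reps) {      // same loop as in the Loops benchmark
        int s = 0;
        for (int i = 0; i < reps; i++) s += (x + y);
        return s;
    }

    static int collapsed(int reps) { // one legal rewrite: hoist x + y and drop the loop
        return reps * (x + y);       // identical even on int overflow (both wrap mod 2^32)
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 100, 1_000, 10_000, 100_000}) {
            System.out.println(n + ": " + reps(n) + " == " + collapsed(n));
        }
    }
}
```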
Blackhole
```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ConsumeCPU {
    /*
     * At times you require the test to burn some of the cycles doing nothing.
     * In many cases, you *do* want to burn the cycles instead of waiting.
     *
     * For these occasions, we have the infrastructure support. Blackholes
     * can not only consume the values, but also the time! Run this test
     * to get familiar with this part of JMH.
     *
     * (Note we use static method because most of the use cases are deep
     * within the testing code, and propagating blackholes is tedious).
     */
    @Benchmark
    public void consume_0000() {
        Blackhole.consumeCPU(0);
    }

    @Benchmark
    public void consume_0001() {
        Blackhole.consumeCPU(1);
    }

    @Benchmark
    public void consume_0002() {
        Blackhole.consumeCPU(2);
    }

    @Benchmark
    public void consume_0004() {
        Blackhole.consumeCPU(4);
    }

    @Benchmark
    public void consume_0008() {
        Blackhole.consumeCPU(8);
    }

    @Benchmark
    public void consume_0016() {
        Blackhole.consumeCPU(16);
    }

    @Benchmark
    public void consume_0032() {
        Blackhole.consumeCPU(32);
    }

    @Benchmark
    public void consume_0064() {
        Blackhole.consumeCPU(64);
    }

    @Benchmark
    public void consume_0128() {
        Blackhole.consumeCPU(128);
    }

    @Benchmark
    public void consume_0256() {
        Blackhole.consumeCPU(256);
    }

    @Benchmark
    public void consume_0512() {
        Blackhole.consumeCPU(512);
    }

    @Benchmark
    public void consume_1024() {
        Blackhole.consumeCPU(1024);
    }
}
```
```
Benchmark                Mode  Cnt     Score   Error  Units
ConsumeCPU.consume_0000  avgt    5     1.976 ± 0.007  ns/op
ConsumeCPU.consume_0001  avgt    5     1.933 ± 0.014  ns/op
ConsumeCPU.consume_0002  avgt    5     2.058 ± 0.007  ns/op
ConsumeCPU.consume_0004  avgt    5     3.182 ± 0.012  ns/op
ConsumeCPU.consume_0008  avgt    5     4.462 ± 0.020  ns/op
ConsumeCPU.consume_0016  avgt    5    10.607 ± 0.043  ns/op
ConsumeCPU.consume_0032  avgt    5    27.758 ± 0.138  ns/op
ConsumeCPU.consume_0064  avgt    5    77.873 ± 0.437  ns/op
ConsumeCPU.consume_0128  avgt    5   190.842 ± 1.602  ns/op
ConsumeCPU.consume_0256  avgt    5   412.204 ± 2.976  ns/op
ConsumeCPU.consume_0512  avgt    5   856.362 ± 3.612  ns/op
ConsumeCPU.consume_1024  avgt    5  1742.743 ± 9.288  ns/op
```
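Besides defeating dead-code elimination, `Blackhole` can also "burn" CPU time.
The argument of `Blackhole.consumeCPU` is a number of tokens, not a time. The time consumed grows roughly linearly with the token count, although for very small counts the fixed call overhead dominates, as the first rows of the results show.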
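The linear part can be quantified from the table above by taking the slope between two rows. The hypothetical `TokenCost` helper below does that arithmetic. The resulting value of about 1.73 ns per token applies only to the machine that produced these scores.

```java
/** Hypothetical back-of-the-envelope: marginal cost per consumeCPU token from two measured rows. */
final class TokenCost {
    static double nanosPerToken(long tokensA, double nsA, long tokensB, double nsB) {
        return (nsB - nsA) / (tokensB - tokensA); // slope between two measurements
    }

    public static void main(String[] args) {
        System.out.printf("%.3f ns/token%n", nanosPerToken(256, 412.204, 512, 856.362));   // 1.735 ns/token
        System.out.printf("%.3f ns/token%n", nanosPerToken(512, 856.362, 1024, 1742.743)); // 1.731 ns/token
    }
}
```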
Profiler
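```java
import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
public class BenchmarkTest {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(BenchmarkTest.class.getSimpleName())
                .addProfiler(GCProfiler.class) // attach the GC/allocation profiler
                .build();
        new Runner(opt).run();
    }
}
```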
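JMH's built-in profilers show where a benchmark spends its time and resources. The built-in profilers include the following (on the command line, `-prof gc` enables a profiler by name, and `-lprof` lists the profilers available):
- ClassloaderProfiler: class loading
- CompilerProfiler: JIT compilation
- GCProfiler: garbage collection, including allocation rate
- StackProfiler: thread stack sampling
- PausesProfiler: pauses
- HotspotThreadProfiler: HotSpot thread statistics
- HotspotRuntimeProfiler: HotSpot runtime statistics
- HotspotMemoryProfiler: HotSpot memory statistics
- HotspotCompilationProfiler: HotSpot compilation statistics
- HotspotClassloadingProfiler: HotSpot class-loading statistics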
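To get a feel for one of the numbers `GCProfiler` reports (allocated bytes per operation), the hypothetical `AllocPerOp` sketch below reads the current thread's allocation counter before and after a run. It relies on the HotSpot-specific `com.sun.management.ThreadMXBean` extension, and the counter is approximate. It is an illustration, not a replacement for the profiler.

```java
import java.lang.management.ManagementFactory;

/** Hypothetical stand-in for GCProfiler's bytes/op figure, using HotSpot's per-thread allocation counter. */
final class AllocPerOp {
    static byte[] sink; // keep allocations reachable so escape analysis cannot remove them

    static double bytesPerOp(Runnable op, int ops) {
        com.sun.management.ThreadMXBean mx =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long tid = Thread.currentThread().getId();
        long before = mx.getThreadAllocatedBytes(tid);
        for (int i = 0; i < ops; i++) op.run();
        long after = mx.getThreadAllocatedBytes(tid);
        return (after - before) / (double) ops;
    }

    public static void main(String[] args) {
        double perOp = bytesPerOp(() -> sink = new byte[1024], 10_000);
        System.out.printf("~%.0f bytes/op%n", perOp); // slightly above 1024 because of the array header
    }
}
```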
Visualizing Results
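```java
import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkTest {
    public static void main(String[] args) throws RunnerException {
        final Options options = new OptionsBuilder()
                .include(BenchmarkTest.class.getSimpleName())
                .result("BenchmarkTest.json")        // output file
                .resultFormat(ResultFormatType.JSON) // export format
                .build();
        new Runner(options).run();
    }
}
```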
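Use `resultFormat` to choose the export format and `result` to set the output file. After the run finishes, upload the exported JSON file to one of the following sites to analyze it: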
- JMH Visualizer
- JMH Visual Chart