Motivation
In the current post, I would like to explore different logging patterns and to identify which one is more efficient in terms of performance. Since logging is spread across almost every application, it is important to be aware of what is the most optimal pattern to rely on when writing logging messages. In this context, I would use a setup that is probably very common to most the business applications: logging text messages using SLF4J, as a facade, and LOG4J as a primary logging framework. In the case of any other logging framework, the majority of the things discussed here should still be relevant.
When it comes to performance there are a lot of additional techniques to take into consideration, which could potentially increase the logging throughput: for example switching from text messages to binary logging, using asynchronous appenders, using ramfs or tempfs, using a low latency logging framework, etc. Nevertheless, just for simplicity, I am going to ignore all of these for now and focus only on different logging alternatives at the Java source code level.
Benchmark
Two important remarks in regards to the current benchmark:
- the log level is set to INFO and DEBUG (see the logLevel variable), which means the messages intended to be logged by the benchmark on debug() will be, first ignored (in case of INFO log level), and then captured (in case of DEBUG log level)
- inside the setup method of the benchmark, I explicitly removed all logger appenders and I have created a custom WritterAppender that uses the Blackhole to sink the messages, hence nothing is physically written (avoiding the IO latency).
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 3, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 3, timeUnit = TimeUnit.SECONDS)
@Fork(value = 3)
@State(Scope.Benchmark)
public class SLF4JLogRecordBenchmark {
private static final Logger LOGGER = LoggerFactory.getLogger(SLF4JLogRecordBenchmark.class);
@Param({"INFO", "DEBUG"})
private static String theLevel;
@Param({"P1"})
private static String aString;
@Param({"42"})
private static int anInt;
@Param({"00.42f"})
private static float aFloat;
@Param({"true"})
private static boolean aBoolean;
@Param({"!"})
private static char aChar;
private Level logLevel;
@Setup
public void setUp(final Blackhole blackhole) {
ByteArrayOutputStream bos = new ByteArrayOutputStream() {
@Override
public void write(int b) {
blackhole.consume(b);
}
@Override
public void write(byte[] b, int off, int len) {
blackhole.consume(b);
}
@Override
public void write(byte[] b) {
blackhole.consume(b);
}
};
Logger logger = Logger.getLogger(LOGGER.getName());
logger.removeAllAppenders();
logger.setAdditivity(false);
Appender appender = new WriterAppender(new PatternLayout(), bos);
logger.addAppender(appender);
logLevel = "INFO".equals(theLevel) ? Level.INFO : Level.DEBUG;
logger.setLevel(logLevel);
}
@Benchmark
public void string_format() {
LOGGER.debug(format("Result [%s], [%s], [%s], [%s], [%s]", aString, ++anInt, aBoolean, aFloat++, aChar));
}
@Benchmark
public void lambda_heap() {
LOGGER.atDebug().log(() -> ("Result [" + aString + "], [" + (++anInt) + "], [" + aBoolean + "], [" + aFloat++) + "], [" + aChar + "]"));
}
@Benchmark
public void lambda_local() {
String localString = aString;
int localInt = ++anInt;
boolean localBoolean = aBoolean;
float localFloat = aFloat++;
char localChar = aChar;
LOGGER.atDebug().log(() -> ("Result [" + localString + "], [" + localInt + "], [" + localBoolean + "], [" + localFloat + "], [" + localChar + "]"));
}
@Benchmark
public void unguarded_parametrized() {
LOGGER.debug("Result [{}], [{}], [{}], [{}], [{}]", aString, ++anInt, aBoolean, aFloat++, aChar);
}
@Benchmark
public void guarded_parametrized() {
if (LOGGER.isDebugEnabled()) {
LOGGER.debug("Result [{}], [{}], [{}], [{}], [{}]", aString, ++anInt, aBoolean, aFloat++, aChar);
}
}
@Benchmark
public void unguarded_unparametrized() {
LOGGER.debug("Result [" + aString + "], [" + (++anInt) + "], [" + aBoolean + "], [" + (aFloat++) + "], [" + aChar + "]");
}
@Benchmark
public void guarded_unparametrized() {
if (LOGGER.isDebugEnabled()) {
LOGGER.debug("Result [" + aString + "], [" + (++anInt) + "], [" + aBoolean + "], [" + (aFloat++) + "], [" + aChar + "]");
}
}
}
Results
Case I – log level is set to INFO. The highlighted results are better.
Benchmark Mode Cnt Score Error Units
string_format avgt 15 1709.122 ± 113.994 ns/op
string_format:·gc.alloc.rate.norm avgt 15 1582.304 ± 3.308 B/op
string_format:·gc.count avgt 15 ≈ 0 counts
lambda_heap avgt 15 2.292 ± 0.115 ns/op
lambda_heap:·gc.alloc.rate.norm avgt 15 ≈ 10⁻⁶ B/op
lambda_heap:·gc.count avgt 15 ≈ 0 counts
lambda_local avgt 15 3.011 ± 0.156 ns/op
lambda_local:·gc.alloc.rate.norm avgt 15 ≈ 10⁻⁶ B/op
lambda_local:·gc.count avgt 15 ≈ 0 counts
unguarded_parametrized avgt 15 8.527 ± 0.377 ns/op
unguarded_parametrized:·gc.alloc.rate.norm avgt 15 32.000 ± 0.001 B/op
unguarded_parametrized:·gc.count avgt 15 260.000 counts
guarded_parametrized avgt 15 2.317 ± 0.184 ns/op
guarded_parametrized:·gc.alloc.rate.norm avgt 15 ≈ 0 counts
unguarded_unparametrized avgt 15 99.336 ± 5.973 ns/op
unguarded_unparametrized:·gc.alloc.rate.norm avgt 15 104.000 ± 0.001 B/op
unguarded_unparametrized:·gc.count avgt 15 153.000 counts
guarded_unparametrized avgt 15 2.343 ± 0.121 ns/op
guarded_unparametrized:·gc.alloc.rate.norm avgt 15 ≈ 10⁻⁶ B/op
guarded_unparametrized:·gc.count avgt 15 ≈ 0 counts
Configuration: CPU: Intel i7-8550U Kaby Lake R; MEMORY: 32GB DDR4 2400 MHz; OS: Ubuntu 19.04; OpenJDK 64-Bit Server VM version (build 13+33)
Case I – analysis
In this scenario, nothing is logged, but there are quite important differences between each approach:
- string_format – creates the final String, passes it to the debug() method, however, due to the fact the log level is INFO the String is not logged. It also triggers the evaluation of the parameters (e.g. ++anInt, aFloat++). As it can be easily spotted, this version is the slowest and has the highest allocation rate.
- lambda_heap and lambda_local – offers very good performance. They both defer the execution (i.e. lazy evaluation), which is never triggered due to the fact the requested log level at the call site (DEBUG) does not fulfill the one explicitly set to the logger (INFO).
- unguarded_parametrized – offers quite acceptable performance and clean code. It does not create the final String, because under the hood it checks anyway if debug is enabled. However, the parameters are evaluated (e.g. ++anInt, aFloat++), since they are passed to the debug() method.
- guarded_parametrized and guarded_unparametrized – are similar and their performance is close to the lambda_heap and lambda_lodal versions. In both cases, the guard (i.e. explicit check) prevents the String creation and any parameter evaluation at the expense of a much uglier code.
- unguarded_unparametrized – creates the final String and also triggers the parameter evaluations (e.g. ++anInt, aFloat++). The difference between string_format and this approach is that, under the hood, the latter concatenates the Strings relying on invokedynamic bytecode op which dispatches the call to StringConcatFactory.makeConcatWithConstants(). Please see the “Further Readings” section below for additional references on this topic.
Case II – log level is set to DEBUG. The highlighted results are better.
Benchmark Mode Cnt Score Error Units
string_format avgt 15 2013.429 ± 109.774 ns/op
string_format:·gc.alloc.rate.norm avgt 15 1916.693 ± 37.644 B/op
string_format:·gc.count avgt 15 140.000 counts
lambda_heap avgt 15 359.730 ± 12.843 ns/op
lambda_heap:·gc.alloc.rate.norm avgt 15 784.000 ± 12.520 B/op
lambda_heap:·gc.count avgt 15 218.000 counts
lambda_local avgt 15 359.355 ± 11.172 ns/op
lambda_local:·gc.alloc.rate.norm avgt 15 768.000 ± 12.520 B/op
lambda_local:·gc.time avgt 15 228.000 counts
unguarded_parametrized avgt 15 508.648 ± 23.161 ns/op
unguarded_parametrized:·gc.alloc.rate.norm avgt 15 1072.000 ± 0.001 B/op
unguarded_parametrized:·gc.count avgt 15 212.000 counts
guarded_parametrized avgt 15 491.054 ± 27.361 ns/op
guarded_parametrized:·gc.alloc.rate.norm avgt 15 984.000 ± 119.429 B/op
guarded_parametrized:·gc.count avgt 15 226.000 counts
unguarded_unparametrized avgt 15 280.183 ± 9.168 ns/op
unguarded_unparametrized:·gc.alloc.rate.norm avgt 15 464.525 ± 1.260 B/op
unguarded_unparametrized:·gc.count avgt 15 227.000 counts
guarded_unparametrized avgt 15 285.560 ± 17.173 ns/op
guarded_unparametrized:·gc.alloc.rate.norm avgt 15 464.385 ± 1.087 B/op
guarded_unparametrized:·gc.count avgt 15 209.000 counts
Configuration: CPU: Intel i7-8550U Kaby Lake R; MEMORY: 32GB DDR4 2400 MHz; OS: Ubuntu 19.04; OpenJDK 64-Bit Server VM version (build 13+33)
Case II – analysis
In this scenario, everything is logged, which means the final String is always created and the parameters are always evaluated (e.g. ++anInt, aFloat++). However, the difference in performance is related to different underlying mechanisms, behind each approach.
- string_format – offers very poor performance in comparison to the others.
- lambda_heap – offers decent performance. In this case, the lambda body points to some instance variables, belonging to the SLF4JLogRecordBenchmark class, which are allocated on the heap. When the JVM starts, a VM anonymous class (implementing the Supplier interface) is generated and instantiated. Inside this class, there is a get() method (added by the Supplier interface), that once is called, dispatches the call back to the SLF4JLogRecordBenchmark class. This VM anonymous class does not capture any instance variable belonging to SLF4JLogRecordBenchmark. If you want to inspect the VM anonymous class you need to start the JVM with the flag -Djdk.internal.lambda.dumpProxyClasses=<path>.
- lambda_local – very similar performance as the previous lambda_heap case. However, there is a slight difference in comparison to lambda_heap, related to the fact that local stack variables are now captured by the VM anonymous class once it is instantiated (i.e. passed to the constructor).
- unguarded_parametrized and guarded_parametrized – are quite similar. The explicit guard does not make any difference, since it is always evaluated to true (i.e. predictable), leveraging on the StringFormatter.basicArrayFormat() method to create the final String.
- unguarded_unparametrized and guarded_unparametrized – offers the best performance. The explicit guard is also negligible (i.e. predictable). In both cases, the final String is created using invokedynamic bytecode op which provides a very efficient way of concatenating Strings using the plus operator (feature added in Java 9 with JEP 280).
Conclusions
- be extremely cautious with String.format() or simply avoid using it. Logging is probably a marginal case, but for example, when used inside loops it might create a lot of temporary objects (triggering more often the Garbage Collector).
- guarded_unparametrized (using the plus operator to concatenate the arguments) seems to be the most efficient way, however, the code is not very nice. Nevertheless, if you want the peak performance this might be the way to go.
- lambda_local and lambda_heap offer a good balance between performance and the clarity of the code. This is, in my opinion, the recommended approach to use in most of the applications.
Further Readings
- Translation of Lambda Expressions
- (slides 35-42)
- JEP 280: Indify String Concatenation
- Support the lambda expression in the Logger
UPDATE: The initial post was updated based on the feedback received from Francesco Nigro and dmitry_vk.
Hi, that’s an interesting benchmark.
The lambda expression in “lambda()” method does not capture any variables. But in real life, lambda for logging would usually capture variables from local function stack. AFAIK, the difference between non-capturing lambda and capturing lambda is in memory allocation rate: compiler/JVM is free to reuse the same non-capturing lambda but it must re-instantiate capturing lambda.
It may be the case that in more real-world-like cases atLambda() still incurs some overhead over plain guards (e.g., isDebugEnabled())
I was going to write exactly the same thing 🙂
Hi!
Thanks for sharing!
My advice about the lambda case: in a real world usage the parameters of the lambda always change (because the user log different things most of the time): it would be nice to see what happen if you save the as local (effective final) parameters the ones used right now (aString, aLong etc etc). It would make the lambda to became capturing and probably will see the real allocation we would have in the real case scenario, where parameters comes from the outside.
Hi Dmitry, Francesco,
Many thanks for the feedback, you are perfectly right. There was a slight mistake in regards to capturing lambda test case. I have updated the benchmark with two cases: lambda_heap and lambda_local, as you suggested.
Based on what I found looking at the bytecode, the only difference between these two is just the parameters that are captured by the VM anonymous class in case of heap_local. Nevertheless, in both cases, the Supplier interface gets instantiated (a fact reflected also by the similar alloc.rate.norm).
LAMBDA_HEAP
===========
public lambda_heap()V
INVOKEDYNAMIC get()Ljava/util/function/Supplier; [
// handle kind 0x6 : INVOKESTATIC
LambdaMetafactory.metafactory(…)LCallSite;
// arguments:
()Ljava/lang/Object;,
// handle kind 0x6 : INVOKESTATIC
SLF4JLogRecordBenchmark.lambda$lambda_heap$0()L;,
()L;
]
// $FF: SYNTHETIC CLASS
final class SLF4JLogRecordBenchmark$$Lambda$8 implements Supplier {
private SLF4JLogRecordBenchmark$$Lambda$8() {
}
@Hidden
public Object get() {
return SLF4JLogRecordBenchmark.lambda$lambda_heap$0();
}
}
LAMBDA_LOCAL
============
// access flags 0x1
public lambda_local()V
ALOAD 1
ILOAD 2
ILOAD 3
FLOAD 4
ILOAD 5
INVOKEDYNAMIC get(L;IZFC)Ljava/util/function/Supplier; [
// handle kind 0x6 : INVOKESTATIC
LambdaMetafactory.metafactory(…)LCallSite;
// arguments:
()Ljava/lang/Object;,
// handle kind 0x6 : INVOKESTATIC
SLF4JLogRecordBenchmark.lambda$lambda_local$1(L;IZFC)L;,
()L;
]
// $FF: SYNTHETIC CLASS
final class SLF4JLogRecordBenchmark$$Lambda$7 implements Supplier {
private final String arg$1;
private final int arg$2;
private final boolean arg$3;
private final float arg$4;
private final char arg$5;
private SLF4JLogRecordBenchmark$$Lambda$7(String var1, int var2, boolean var3, float var4, char var5) {
this.arg$1 = var1;
this.arg$2 = var2;
this.arg$3 = var3;
this.arg$4 = var4;
this.arg$5 = var5;
}
private static Supplier get$Lambda(String var0, int var1, boolean var2, float var3, char var4) {
return new SLF4JLogRecordBenchmark$$Lambda$7(var0, var1, var2, var3, var4);
}
@Hidden
public Object get() {
return SLF4JLogRecordBenchmark.lambda$lambda_local$1(this.arg$1, this.arg$2, this.arg$3, this.arg$4, this.arg$5);
}
}
Thanks
Ionut