Logging patterns and their performance impact

(Last Updated On: 14th December 2019)

Motivation

In the current post, I would like to explore different logging patterns and to identify which one is more efficient in terms of performance. Since logging is spread across almost every application, it is important to be aware of what is the most optimal pattern to rely on when writing logging messages. In this context, I would use a setup that is probably very common to most the business applications: logging text messages using SLF4J, as a facade, and LOG4J as a primary logging framework. In the case of any other logging framework, the majority of the things discussed here should still be relevant.

When it comes to performance there are a lot of additional techniques to take into consideration, which could potentially increase the logging throughput: for example switching from text messages to binary logging, using asynchronous appenders, using ramfs or tempfs, using a low latency logging framework, etc. Nevertheless, just for simplicity, I am going to ignore all of these for now and focus only on different logging alternatives at the Java source code level.

Benchmark

Two important remarks in regards to the current benchmark:

  1. the log level is set to INFO and DEBUG (see the logLevel variable), which means the messages intended to be logged by the benchmark on debug() will be, first ignored (in case of INFO log level), and then captured (in case of DEBUG log level)
  2. inside the setup method of the benchmark, I explicitly removed all logger appenders and I have created a custom WritterAppender that uses the Blackhole to sink the messages, hence nothing is physically written (avoiding the IO latency).
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 3, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 3, timeUnit = TimeUnit.SECONDS)
@Fork(value = 3)
@State(Scope.Benchmark)
public class SLF4JLogRecordBenchmark {

    private static final Logger LOGGER = LoggerFactory.getLogger(SLF4JLogRecordBenchmark.class);

    @Param({"INFO", "DEBUG"})
    private static String theLevel;
    @Param({"P1"})
    private static String aString;
    @Param({"42"})
    private static int anInt;
    @Param({"00.42f"})
    private static float aFloat;
    @Param({"true"})
    private static boolean aBoolean;
    @Param({"!"})
    private static char aChar;
    private Level logLevel;

    @Setup
    public void setUp(final Blackhole blackhole) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream() {
            @Override
            public void write(int b) {
                blackhole.consume(b);
            }

            @Override
            public void write(byte[] b, int off, int len) {
                blackhole.consume(b);
            }

            @Override
            public void write(byte[] b) {
                blackhole.consume(b);
            }
        };

        Logger logger = Logger.getLogger(LOGGER.getName());
        logger.removeAllAppenders();
        logger.setAdditivity(false);

        Appender appender = new WriterAppender(new PatternLayout(), bos);
        logger.addAppender(appender);

        logLevel = "INFO".equals(theLevel) ? Level.INFO : Level.DEBUG;
        logger.setLevel(logLevel);
    }

    @Benchmark
    public void string_format() {
        LOGGER.debug(
           format("Result [%s], [%s], [%s], [%s], [%s]", 
               aString, ++anInt, aBoolean, aFloat++, aChar));
    }

    @Benchmark
    public void lambda_heap() {
        LOGGER.atDebug().log(
            () -> ("Result [" + aString + "], [" + (++anInt) + "], [" + 
                aBoolean + "], [" + (aFloat++) + "], [" + aChar + "]"));
    }

    @Benchmark
    public void lambda_local() {
        String localString = aString;
        int localInt = ++anInt;
        boolean localBoolean = aBoolean;
        float localFloat = aFloat++;
        char localChar = aChar;
        LOGGER.atDebug().log(() -> ("Result [" + localString + "], [" + localInt + "], [" + 
            localBoolean + "], [" + localFloat + "], [" + localChar + "]"));
    }

    @Benchmark
    public void unguarded_parametrized() {
        LOGGER.debug("Result [{}], [{}], [{}], [{}], [{}]", 
            aString, ++anInt, aBoolean, aFloat++, aChar);
    }

    @Benchmark
    public void guarded_parametrized() {
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("Result [{}], [{}], [{}], [{}], [{}]", 
                aString, ++anInt, aBoolean, aFloat++, aChar);
        }
    }

    @Benchmark
    public void unguarded_unparametrized() {
        LOGGER.debug("Result [" + aString + "], [" + (++anInt) + "], [" + 
            aBoolean + "], [" + (aFloat++) + "], [" + aChar + "]");
    }

    @Benchmark
    public void guarded_unparametrized() {
        if (LOGGER.isDebugEnabled()) {
            LOGGER.debug("Result [" + aString + "], [" + (++anInt) + "], [" + 
                aBoolean + "], [" + (aFloat++) + "], [" + aChar + "]");
        }
    }
} 

Results

Case I – log level is set to INFO. The highlighted results are better.

Benchmark                                     Mode  Cnt     Score     Error   Units
string_format                                 avgt   15  1709.122 ± 113.994   ns/op
string_format:·gc.alloc.rate.norm             avgt   15  1582.304 ±   3.308    B/op
string_format:·gc.count                       avgt   15       ≈ 0            counts

lambda_heap                                   avgt   15     2.292 ±   0.115   ns/op
lambda_heap:·gc.alloc.rate.norm               avgt   15       ≈ 10⁻⁶           B/op
lambda_heap:·gc.count                         avgt   15       ≈ 0            counts

lambda_local                                  avgt   15     3.011 ± 0.156     ns/op
lambda_local:·gc.alloc.rate.norm              avgt   15       ≈ 10⁻⁶           B/op
lambda_local:·gc.count                        avgt   15       ≈ 0            counts

unguarded_parametrized                        avgt   15     8.527 ±   0.377   ns/op
unguarded_parametrized:·gc.alloc.rate.norm    avgt   15    32.000 ±   0.001    B/op
unguarded_parametrized:·gc.count              avgt   15   260.000            counts

guarded_parametrized                          avgt   15     2.317 ±   0.184   ns/op
guarded_parametrized:·gc.alloc.rate.norm      avgt   15       ≈ 0            counts

unguarded_unparametrized                      avgt   15    99.336 ±   5.973   ns/op
unguarded_unparametrized:·gc.alloc.rate.norm  avgt   15   104.000 ±   0.001    B/op
unguarded_unparametrized:·gc.count            avgt   15   153.000            counts

guarded_unparametrized                        avgt   15     2.343 ±   0.121   ns/op
guarded_unparametrized:·gc.alloc.rate.norm    avgt   15    ≈ 10⁻⁶              B/op
guarded_unparametrized:·gc.count              avgt   15       ≈ 0            counts

Configuration: CPU: Intel i7-8550U Kaby Lake R; MEMORY: 32GB DDR4 2400 MHz; OS: Ubuntu 19.04; OpenJDK 64-Bit Server VM version (build 13+33)

Case I – analysis

In this scenario, nothing is logged, but there are quite important differences between each approach:

  • string_format – creates the final String, passes it to the debug() method, however, due to the fact the log level is INFO the String is not logged. It also triggers the evaluation of the parameters (e.g. ++anInt, aFloat++). As it can be easily spotted, this version is the slowest and has the highest allocation rate.
  • lambda_heap and lambda_local – offers very good performance. They both defer the execution (i.e. lazy evaluation), which is never triggered due to the fact the requested log level at the call site (DEBUG) does not fulfill the one explicitly set to the logger (INFO).
  • unguarded_parametrized – offers quite acceptable performance and clean code. It does not create the final String, because under the hood it checks anyway if debug is enabled. However, the parameters are evaluated (e.g. ++anInt, aFloat++), since they are passed to the debug() method.
  • guarded_parametrized and guarded_unparametrized – are similar and their performance is close to the lambda_heap and lambda_lodal versions. In both cases, the guard (i.e. explicit check) prevents the String creation and any parameter evaluation at the expense of a much uglier code.
  • unguarded_unparametrized – creates the final String and also triggers the parameter evaluations (e.g. ++anInt, aFloat++). The difference between string_format and this approach is that, under the hood, the latter concatenates the Strings relying on invokedynamic bytecode op which dispatches the call to StringConcatFactory.makeConcatWithConstants(). Please see the “Further Readings” section below for additional references on this topic.

Case II – log level is set to DEBUG. The highlighted results are better.

Benchmark                                     Mode  Cnt     Score     Error   Units
string_format                                 avgt   15  2013.429 ± 109.774   ns/op
string_format:·gc.alloc.rate.norm             avgt   15  1916.693 ±  37.644    B/op
string_format:·gc.count                       avgt   15   140.000            counts

lambda_heap                                   avgt   15   359.730 ±  12.843   ns/op
lambda_heap:·gc.alloc.rate.norm               avgt   15   784.000 ±  12.520    B/op
lambda_heap:·gc.count                         avgt   15   218.000            counts

lambda_local                                  avgt   15   359.355 ± 11.172    ns/op
lambda_local:·gc.alloc.rate.norm              avgt   15   768.000 ± 12.520     B/op
lambda_local:·gc.time                         avgt   15   228.000            counts

unguarded_parametrized                        avgt   15   508.648 ±  23.161   ns/op
unguarded_parametrized:·gc.alloc.rate.norm    avgt   15  1072.000 ±   0.001    B/op
unguarded_parametrized:·gc.count              avgt   15   212.000            counts

guarded_parametrized                          avgt   15   491.054 ±  27.361   ns/op
guarded_parametrized:·gc.alloc.rate.norm      avgt   15   984.000 ± 119.429    B/op
guarded_parametrized:·gc.count                avgt   15   226.000            counts

unguarded_unparametrized                      avgt   15   280.183 ±   9.168   ns/op
unguarded_unparametrized:·gc.alloc.rate.norm  avgt   15   464.525 ±   1.260    B/op
unguarded_unparametrized:·gc.count            avgt   15   227.000            counts

guarded_unparametrized                        avgt   15   285.560 ±  17.173   ns/op
guarded_unparametrized:·gc.alloc.rate.norm    avgt   15   464.385 ±   1.087    B/op
guarded_unparametrized:·gc.count              avgt   15   209.000            counts

Configuration: CPU: Intel i7-8550U Kaby Lake R; MEMORY: 32GB DDR4 2400 MHz; OS: Ubuntu 19.04; OpenJDK 64-Bit Server VM version (build 13+33)

Case II – analysis

In this scenario, everything is logged, which means the final String is always created and the parameters are always evaluated (e.g. ++anInt, aFloat++). However, the difference in performance is related to different underlying mechanisms, behind each approach.

  • string_format – offers very poor performance in comparison to the others.
  • lambda_heap – offers decent performance. In this case, the lambda body points to some instance variables,  belonging to the SLF4JLogRecordBenchmark class, which are allocated on the heap. When the JVM starts, a VM anonymous class (implementing the Supplier interface) is generated and instantiated. Inside this class, there is a get() method (added by the Supplier interface), that once is called, dispatches the call back to the SLF4JLogRecordBenchmark class. This VM anonymous class does not capture any instance variable belonging to SLF4JLogRecordBenchmark. If you want to inspect the VM anonymous class you need to start the JVM with the flag -Djdk.internal.lambda.dumpProxyClasses=<path>.
  • lambda_local – very similar performance as the previous lambda_heap case. However, there is a slight difference in comparison to lambda_heap, related to the fact that local stack variables are now captured by the VM anonymous class once it is instantiated (i.e. passed to the constructor).
  • unguarded_parametrized and guarded_parametrized – are quite similar. The explicit guard does not make any difference, since it is always evaluated to true (i.e. predictable), leveraging on the StringFormatter.basicArrayFormat() method to create the final String.
  • unguarded_unparametrized and guarded_unparametrized – offers the best performance. The explicit guard is also negligible (i.e. predictable). In both cases, the final String is created using invokedynamic bytecode op which provides a very efficient way of concatenating Strings using the plus operator (feature added in Java 9 with JEP 280).

Conclusions

  • be extremely cautious with String.format() or simply avoid using it. Logging is probably a marginal case, but for example, when used inside loops it might create a lot of temporary objects (triggering more often the Garbage Collector).
  • guarded_unparametrized (using the plus operator to concatenate the arguments) seems to be the most efficient way, however, the code is not very nice. Nevertheless, if you want the peak performance this might be the way to go.
  • lambda_local and lambda_heap offer a good balance between performance and the clarity of the code. This is, in my opinion, the recommended approach to use in most of the applications.

Further Readings

UPDATE: The initial post was updated based on the feedback received from Francesco Nigro and dmitry_vk.

4 thoughts on “Logging patterns and their performance impact

  1. Hi, that’s an interesting benchmark.

    The lambda expression in “lambda()” method does not capture any variables. But in real life, lambda for logging would usually capture variables from local function stack. AFAIK, the difference between non-capturing lambda and capturing lambda is in memory allocation rate: compiler/JVM is free to reuse the same non-capturing lambda but it must re-instantiate capturing lambda.

    It may be the case that in more real-world-like cases atLambda() still incurs some overhead over plain guards (e.g., isDebugEnabled())

  2. Hi!

    Thanks for sharing!
    My advice about the lambda case: in a real world usage the parameters of the lambda always change (because the user log different things most of the time): it would be nice to see what happen if you save the as local (effective final) parameters the ones used right now (aString, aLong etc etc). It would make the lambda to became capturing and probably will see the real allocation we would have in the real case scenario, where parameters comes from the outside.

  3. Hi Dmitry, Francesco,

    Many thanks for the feedback, you are perfectly right. There was a slight mistake in regards to capturing lambda test case. I have updated the benchmark with two cases: lambda_heap and lambda_local, as you suggested.
    Based on what I found looking at the bytecode, the only difference between these two is just the parameters that are captured by the VM anonymous class in case of heap_local. Nevertheless, in both cases, the Supplier interface gets instantiated (a fact reflected also by the similar alloc.rate.norm).

    LAMBDA_HEAP
    ===========
    public lambda_heap()V
    INVOKEDYNAMIC get()Ljava/util/function/Supplier; [
    // handle kind 0x6 : INVOKESTATIC
    LambdaMetafactory.metafactory(…)LCallSite;
    // arguments:
    ()Ljava/lang/Object;,
    // handle kind 0x6 : INVOKESTATIC
    SLF4JLogRecordBenchmark.lambda$lambda_heap$0()L;,
    ()L;
    ]

    // $FF: SYNTHETIC CLASS
    final class SLF4JLogRecordBenchmark$$Lambda$8 implements Supplier {
    private SLF4JLogRecordBenchmark$$Lambda$8() {
    }
    @Hidden
    public Object get() {
    return SLF4JLogRecordBenchmark.lambda$lambda_heap$0();
    }
    }

    LAMBDA_LOCAL
    ============
    // access flags 0x1
    public lambda_local()V
    ALOAD 1
    ILOAD 2
    ILOAD 3
    FLOAD 4
    ILOAD 5
    INVOKEDYNAMIC get(L;IZFC)Ljava/util/function/Supplier; [
    // handle kind 0x6 : INVOKESTATIC
    LambdaMetafactory.metafactory(…)LCallSite;
    // arguments:
    ()Ljava/lang/Object;,
    // handle kind 0x6 : INVOKESTATIC
    SLF4JLogRecordBenchmark.lambda$lambda_local$1(L;IZFC)L;,
    ()L;
    ]

    // $FF: SYNTHETIC CLASS
    final class SLF4JLogRecordBenchmark$$Lambda$7 implements Supplier {
    private final String arg$1;
    private final int arg$2;
    private final boolean arg$3;
    private final float arg$4;
    private final char arg$5;

    private SLF4JLogRecordBenchmark$$Lambda$7(String var1, int var2, boolean var3, float var4, char var5) {
    this.arg$1 = var1;
    this.arg$2 = var2;
    this.arg$3 = var3;
    this.arg$4 = var4;
    this.arg$5 = var5;
    }

    private static Supplier get$Lambda(String var0, int var1, boolean var2, float var3, char var4) {
    return new SLF4JLogRecordBenchmark$$Lambda$7(var0, var1, var2, var3, var4);
    }

    @Hidden
    public Object get() {
    return SLF4JLogRecordBenchmark.lambda$lambda_local$1(this.arg$1, this.arg$2, this.arg$3, this.arg$4, this.arg$5);
    }
    }

    Thanks
    Ionut

Leave a Reply

Your email address will not be published. Required fields are marked *