Performance testing an IntelliJ plugin with JMH
Performance testing Fuzzier
The time has finally come. I want to create some performance metrics for Fuzzier.
Fuzzier is a file finder plugin for JetBrains editors inspired by Telescope for Nvim that I've been developing for a little over a year now.
First, I need a plan. Fuzzier is written in Kotlin and I'm a Java developer, so I'll be leaning into the tools that I know.
In this article I set up JMH and focus on a single function, which I measure and improve. I faced some challenges while setting up JMH with Kotlin and IntelliJ plugin dependencies and have documented them here.
The Plan
While the initial idea was to benchmark the actual search, and I had some thoughts on this, I ended up choosing a more contained problem to focus on. I think it was a good choice: I now have a better idea of how to approach performance testing Fuzzier, and I have a working setup for the future.
Here are some questions that need to be answered first.
- What do I want to measure?
- What can I use the results for?
- Will the same setup work in the future?
- What tools should I use?
Let's go over each of these one by one.
What do I want to measure?
Primarily, I want to measure how long Fuzzier takes to process IntelliJ's file index and produce a result. The secondary goal is to ensure that the memory footprint is kept under control, though I don't expect this to be a problem.
The measurements should account for the settings used in the processing, whether they come through the IDE or through a separate integration.
What can I use the results for?
The goal is to create a baseline performance metric, which I can compare future results to.
I cannot account for everything, so these initial results are not perfect and will likely need to be tweaked and measured again later.
Will the same setup work in the future?
I'd like to be able to use the same setup/framework in the future to produce comparable benchmarks.
One large issue is that I do not have static hardware to benchmark on. My personal PC is the primary machine I will run these tests on, but that is likely to change in the future, which would instantly invalidate the previous measurements.
What tools should I use?
I'm not experienced in using benchmarking tools or creating performance metrics, so I'll have to do some exploring. Here are my initial thoughts on the two candidates.
JMeter
I could create a testing endpoint, which could be used to launch full tests using a running IDE. This would allow me to use JMeter to start and measure the actual time taken.
JMH
With JMH I could test smaller parts of the code with more precision, but it would require a lot of work to cover a large part of the program. I would write the tests in code and run them with JMH.
The hard part (setting up JMH)
The hardest part was trying to put JMH, Kotlin and IntelliJ together.
Here I also realized that I have a good function to use, and the initial test can be copy-pasted from my existing unit tests. Here's a link to the pull request, which includes the changes to the build.gradle file.
- How do I create the necessary tests and make sure that they account for everything that I want them to?
- How do I measure the things that I want to?
- What is the output, how is it read, and how will I store it?
Always start simple
Those were some open-ended questions. Instead of trying to consider everything up front, I'll start with whatever I run into.
JMH with Gradle
No more planning, time to go and break some things. I guess I'll start with JMH and see what happens.
Requirements
The `jmh` task doesn't have the same dependencies available as the main project, which caused me some headaches. While trying to run the task I got an error where `com.intellij.ui.JBColor` could not be found.
This took me some time to figure out, because the missing libraries were not available as a public dependency. I had to manually add the jars to the `jmh` task dependencies, because I did not find a solution that would just reuse all of the dependencies and plugins from the main Gradle project.
My solution was to copy the dependencies into the project and use a file dependency to add them to the `jmh` task. This way I could avoid adding an absolute path to the build file (see the sketch after the list below).
- jmh-gradle-plugin - Plugin
  - `id("me.champeau.jmh") version "0.7.2"`
- JMH core - Dependency
  - `implementation("org.openjdk.jmh:jmh-core:1.37")`
- JMH annotation processor - Dependency
  - `annotationProcessor("org.openjdk.jmh:jmh-generator-annprocess:1.37")`
- Kotlin standard library for JMH - Dependency
  - `jmh("org.jetbrains.kotlin:kotlin-stdlib:2.1.0")`
- File dependency for `idea:ideaIC:2024.3` - Copied each `.jar` file from my `.gradle` directory
  - From `ideaIC-2024.3/lib`: `cp *.jar ~/IdeaProjects/fuzzier/libs/`
  - `jmh(files("libs/*"))`
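For the file dependency, the relevant piece of the build file ends up looking roughly like the sketch below. This is only a sketch: it uses Gradle's `fileTree` to pick up every jar in `libs/`, as an alternative to the `files("libs/*")` call listed above.

```kotlin
// Sketch: add the copied IntelliJ jars to the jmh configuration only.
// fileTree("libs") collects every .jar in the directory, an alternative
// to the files("libs/*") call from the list above.
dependencies {
    jmh(fileTree("libs") { include("*.jar") })
}
```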
The default source file location is `src/jmh`, so my tests live in `src/jmh/kotlin/com/mituuz/fuzzier/performance`.
Configuration can be done through the build.gradle file. Here's an example:
```kotlin
jmh {
    fork = 1
    warmupIterations = 1
    iterations = 1
    timeOnIteration = "1s"
    resultFormat = "JSON"
    jvmArgs = listOf("-Xms2G", "-Xmx2G")
    jmhTimeout = "30s"
}
```
Running JMH
Do not use prints when testing with JMH. The terminal can't keep up, and it froze my system twice before I realized what was going on.
Friendly reminder for us Linux folk: Raising Elephants Is So Utterly Boring
After solving the print issue, I was able to run the JMH task both from the terminal and through IntelliJ. The `jmh` task wasn't compatible with Gradle's configuration cache, so I ended up disabling the cache when running the task:
`./gradlew jmh --no-configuration-cache`
Crafting tests to use with JMH
While setting this up, I realized that I have a perfect function for making some actual improvements.
I created an ugly highlighting function that appends several strings together to form an HTML string where certain letters get a different background color (`<font style='background-color: $colorAsHex'>`). I knew that it wasn't the best way to do this and always had the intention to return to it.
Creating the test
To keep this simple, I just want to measure the highlight function for a single string with a varying number of characters to highlight. I can copy my existing test and change some parameters to create variance.
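For reference, here is a rough sketch of the shape of a benchmark class under `src/jmh/kotlin`. This is not the actual test: the `StubHighlighter` below is a stand-in so that the example compiles on its own, while the real benchmarks reuse the setup from my existing unit tests.

```kotlin
package com.mituuz.fuzzier.performance

import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.Setup
import org.openjdk.jmh.annotations.State

// Sketch only: StubHighlighter stands in for the object that owns highlight()
// in Fuzzier, which the real benchmarks build using the existing test setup.
@State(Scope.Benchmark)
open class HighlightBenchmarkSketch {
    private class StubHighlighter(private val indexes: List<Int>) {
        fun highlight(source: String): String {
            val sb = StringBuilder(source)
            var offset = 0
            for (i in indexes.sorted()) {
                if (i < source.length) {
                    // Wrap the character at index i in simple tags
                    sb.insert(i + offset, "<b>")
                    offset += 3
                    sb.insert(i + offset + 1, "</b>")
                    offset += 4
                }
            }
            return sb.toString()
        }
    }

    private lateinit var threeMatches: StubHighlighter

    @Setup
    fun setup() {
        // Three characters of the file name will be highlighted
        threeMatches = StubHighlighter(listOf(0, 4, 8))
    }

    @Benchmark
    fun threeMatchesInFilename(): String = threeMatches.highlight("KotlinIsFun.kt")
}
```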
It's hard to say anything about benchmarking results when you don't have anything to compare to, but here is the initial run.
| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| PerformanceTests.noMatchesInFilename | thrpt | | 36214160.146 | | ops/s |
| PerformanceTests.oneMatchInFilename | thrpt | | 17464086.561 | | ops/s |
| PerformanceTests.threeMatchesInFilename | thrpt | | 6670398.128 | | ops/s |
| PerformanceTests.sevenMatchesInFilename | thrpt | | 2880108.760 | | ops/s |
But before doing any improvements it's time to think about how I can get consistent performance results.
Setting up test environment
I did not end up with a consistent test environment and do not need one at this point, but here is my failed attempt to benchmark on two devices using Docker with resource constraints.
Okay, so I dug around a bit and decided to go with Docker constraints: first building the image from a Dockerfile and then defining the constraints on the docker run command.
Not sure if this is the best way, but you gotta start somewhere. (Using docker-compose and defining the resource limits in the file would be an alternative way to do this.)
Here's my initial run configuration and the results (I decided to increase the measurement time a bit).
```kotlin
jmh {
    fork = 1
    warmupIterations = 2
    iterations = 2
    timeOnIteration = "2s"
    resultFormat = "JSON"
    jvmArgs = listOf("-Xms2G", "-Xmx2G")
    jmhTimeout = "30s"
}
```
```shell
docker build -t fuzzier-perf .
docker run --memory="3G" --cpus="2.0" -d fuzzier-perf
```
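The Dockerfile itself isn't shown here; as a rough sketch, something like the following would work, assuming a JDK base image and the Gradle wrapper checked into the repository.

```dockerfile
# Hypothetical sketch; the actual Dockerfile used for these runs is not shown.
FROM eclipse-temurin:21-jdk
WORKDIR /fuzzier
# Copy the project, including the Gradle wrapper
COPY . .
# Run the benchmarks when the container starts
CMD ["./gradlew", "jmh", "--no-configuration-cache"]
```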
Initial results
Laptop
| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| PerformanceTests.noMatchesInFilename | thrpt | | 20833517.400 | | ops/s |
| PerformanceTests.oneMatchInFilename | thrpt | | 7015488.826 | | ops/s |
| PerformanceTests.threeMatchesInFilename | thrpt | | 3328829.228 | | ops/s |
| PerformanceTests.sevenMatchesInFilename | thrpt | | 1342837.360 | | ops/s |
Desktop
| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| PerformanceTests.noMatchesInFilename | thrpt | | 44547130.894 | | ops/s |
| PerformanceTests.oneMatchInFilename | thrpt | | 13409151.110 | | ops/s |
| PerformanceTests.threeMatchesInFilename | thrpt | | 6207958.712 | | ops/s |
| PerformanceTests.sevenMatchesInFilename | thrpt | | 2635322.828 | | ops/s |
No luck
So, because the cores aren't equal between the devices the results aren't either. In hindsight this makes perfect sense.
I think I'll use a smaller measure -> change -> measure cycle instead of trying to create permanent, comparable results.
This means that my performance testing will be limited to the scope of a single change and will only be valid for a short time. This is fine, because my goal shifted from creating general measurements to improving a single function.
Pivot
With the new cycle defined I can start working on the improvements.
measure -> change -> measure
I'll do all of the work on a single device to ensure that the measurements are valid.
Baseline
First, I'll create the baseline measurements that I can compare against after doing some changes.
Because I don't need docker anymore, I'll be running the JMH task without it for convenience and speed.
Now that I am ready to start measuring, I should take a closer look at which mode I will be using and what I am actually measuring.
First, I need to select the JMH mode that I want to use.
JMH modes
JMH offers several modes, which measure different things.
- Throughput (default)
  - How many times the function completes per second
- AverageTime
  - How long the function takes to run on average
- SampleTime
  - Samples the execution time of individual calls over the measurement period
- SingleShotTime
  - Measures the execution time of a single invocation, without continuous iterations
Because the function in question is run for every file with each search, I could use either average time or throughput. The function is so fast that throughput feels easiest to read: the higher the score, the better.
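Should I later want average time instead, the mode can be selected per benchmark (or per class) with annotations. A minimal, self-contained example, not one of the actual Fuzzier tests:

```kotlin
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.BenchmarkMode
import org.openjdk.jmh.annotations.Mode
import org.openjdk.jmh.annotations.OutputTimeUnit
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State

// Example only: report the average time per call in nanoseconds
// instead of the default throughput.
@State(Scope.Benchmark)
open class ModeExample {
    private val fileName = "KotlinIsFun.kt"

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    fun uppercaseFileName(): String = fileName.uppercase()
}
```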
Results
So, here are the initial results of how many operations complete in a second. There's quite a noticeable decrease in the score as the match count increases, which is to be expected because more matches require more operations.
| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| PerformanceTests.noMatchesInFilename | thrpt | 2 | 44580990.574 | | ops/s |
| PerformanceTests.oneMatchInFilename | thrpt | 2 | 15314862.488 | | ops/s |
| PerformanceTests.threeMatchesInFilename | thrpt | 2 | 6661081.448 | | ops/s |
| PerformanceTests.sevenMatchesInFilename | thrpt | 2 | 2652866.574 | | ops/s |
Optimization time
Here is the function that I would like to optimize. It works by inserting tags into an HTML string, which then highlight the correct characters.
The tag used is a font style which sets the background color. Swing components do not support every HTML tag and I remember trying a lot of different things before ending up with this.
```kotlin
// JBColor comes from the IntelliJ platform; colorAsHex and score are
// defined elsewhere in the plugin.
import com.intellij.ui.JBColor

var startStyleTag = createStartStyleTag()
const val END_STYLE_TAG: String = "</font>"

private fun createStartStyleTag(): String {
    val color = JBColor.YELLOW
    return "<font style='background-color: ${colorAsHex(color)};'>"
}

fun highlight(source: String): String {
    val stringBuilder: StringBuilder = StringBuilder(source)
    var offset = 0
    val hlIndexes = score.highlightCharacters.sorted()
    for (i in hlIndexes) {
        if (i < source.length) {
            // Insert a start tag before the character and an end tag after it,
            // tracking the offset added by previously inserted tags
            stringBuilder.insert(i + offset, startStyleTag)
            offset += startStyleTag.length
            stringBuilder.insert(i + offset + 1, END_STYLE_TAG)
            offset += END_STYLE_TAG.length
        }
    }
    return stringBuilder.toString()
}
```
Going for Copilot
Before even taking a look at what I could do, let's see what Copilot thinks.
I simply asked it to optimize the function, copy-pasted the result, and measured again.
```kotlin
fun highlight(source: String): String {
    val hlIndexes = score.highlightCharacters.sorted()
    val parts = mutableListOf<String>()
    var lastIndex = 0
    for (i in hlIndexes) {
        if (i < source.length) {
            parts.add(source.substring(lastIndex, i))
            parts.add(startStyleTag)
            parts.add(source[i].toString())
            parts.add(END_STYLE_TAG)
            lastIndex = i + 1
        }
    }
    parts.add(source.substring(lastIndex))
    return parts.joinToString("")
}
```
| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| PerformanceTests.noMatchesInFilename | thrpt | 2 | 21140629.430 | | ops/s |
| PerformanceTests.oneMatchInFilename | thrpt | 2 | 12419843.488 | | ops/s |
| PerformanceTests.threeMatchesInFilename | thrpt | 2 | 3177763.124 | | ops/s |
| PerformanceTests.sevenMatchesInFilename | thrpt | 2 | 1366734.188 | | ops/s |
Here's the explanation that CoPilot gave me.
To optimize the highlight function, we can use a StringBuilder more efficiently by collecting all the parts of the string and then joining them at the end. This avoids multiple insertions which can be costly.
This made everything slower. Instead of using a StringBuilder and appending the parts, Copilot wanted to store all of the individual strings in a list. That does not sound like it would be faster or more memory efficient: instead of using a StringBuilder to create a single String object, it creates and stores multiple distinct String objects.
I'll do it myself
Okay, so I know what I can look for. Instead of adding tags for each character, I could group consecutive characters and add the start and end tags once for each group. This of course means that I want to add another test with the same number of characters to tag, but with a different grouping.
I decided to create separate `threeMatchesInFilenameInARow` and `threeMatchesInFilename` tests. Here is a new set of initial benchmarks, which makes it clear that no matter the grouping, three matches take the same time.
I'll also drop the no matches case, because every time this function is called the string must have at least one match.
| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| PerformanceTests.oneMatchInFilename | thrpt | 2 | 15147366.761 | | ops/s |
| PerformanceTests.threeMatchesInFilenameInARow | thrpt | 2 | 6757701.170 | | ops/s |
| PerformanceTests.threeMatchesInFilename | thrpt | 2 | 6751191.472 | | ops/s |
| PerformanceTests.sevenMatchesInFilename | thrpt | 2 | 2703818.535 | | ops/s |
Grouping characters
Here's a comparison of what the output looks like with and without grouping. I think it is pretty clear why this would be an improvement.
Without grouping:
```html
<font style='background-color: #ffff00;'>H</font><font style='background-color: #ffff00;'>e</font>llo
```
With grouping:
```html
<font style='background-color: #ffff00;'>He</font>llo
```
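The actual change lives in the plugin repository, but as a sketch the grouped version can look something like the following: consecutive highlight indexes are merged into one range, and the tags are inserted once per range. The surrounding class (with `score`, `startStyleTag`, and `END_STYLE_TAG`) is the same as in the original function above.

```kotlin
// Sketch of a grouped implementation: adjacent highlight indexes share a
// single start/end tag pair. The actual change in Fuzzier may differ in detail.
fun highlight(source: String): String {
    val stringBuilder = StringBuilder(source)
    var offset = 0
    val hlIndexes = score.highlightCharacters.sorted().filter { it < source.length }
    var i = 0
    while (i < hlIndexes.size) {
        val start = hlIndexes[i]
        var end = start
        // Extend the group while the next index is directly adjacent
        while (i + 1 < hlIndexes.size && hlIndexes[i + 1] == end + 1) {
            end = hlIndexes[i + 1]
            i++
        }
        // Insert one start tag before the group and one end tag after it
        stringBuilder.insert(start + offset, startStyleTag)
        offset += startStyleTag.length
        stringBuilder.insert(end + offset + 1, END_STYLE_TAG)
        offset += END_STYLE_TAG.length
        i++
    }
    return stringBuilder.toString()
}
```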
This had the exact effect that I was looking for: by reducing the number of tags inserted, sequential character processing gets faster. It does slow down when there are no sequential matches, but I believe some sequences are to be expected in the most common cases.
| Benchmark | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|
| PerformanceTests.oneMatchInFilename | thrpt | 2 | 15262031.008 | | ops/s |
| PerformanceTests.threeMatchesInFilenameInARow | thrpt | 2 | 10133267.856 | | ops/s |
| PerformanceTests.threeMatchesInFilename | thrpt | 2 | 5662795.456 | | ops/s |
| PerformanceTests.sevenMatchesInFilename | thrpt | 2 | 3672124.419 | | ops/s |
Here is a better look into the changes between the two implementations. For such a small change, I think it is a good one.
| Test Name | Initial ops/s | Optimized ops/s | Change in ops/s | % Change |
|---|---|---|---|---|
| PerformanceTests.oneMatchInFilename | 15147366.76 | 15262031.01 | 114664.25 | 0.76% |
| PerformanceTests.threeMatchesInFilenameInARow | 6757701.17 | 10133267.86 | 3375566.69 | 49.95% |
| PerformanceTests.threeMatchesInFilename | 6751191.47 | 5662795.46 | -1088396.02 | -16.12% |
| PerformanceTests.sevenMatchesInFilename | 2703818.54 | 3672124.42 | 968305.88 | 35.81% |
I think this has gone on for long enough and this article is getting a bit too long. This is not the only issue that I had with the highlighting and I think it can still be improved, but it is a start.
Final thoughts
Setting up JMH was a considerable amount of work, but honestly I feel like it was a good investment. Fuzzier is a file finder plugin, and it needs to be fast to feel good. Having the framework ready to keep iterating, and being able to see the results in a readable form, will help me in the future.
I didn't even get to JMeter here, but maybe at some point in the future I could look at how to integrate it to measure the scoring and matching.
If you have any tips, questions or anything else feel free to contact me at
mitja@mituuz.com.