Generating UI tests with Copilot

Test generation for file type globbing

Iteration 1

Good

Prompt

Implement tests for this class (included FuzzyFinderComponent.kt)

Result

Copilot generated the following tests:

4 tests verifying the orientation based on the search field location
Default dimensions are followed as expected
Secondary text field listener is added and works as expected

The test setup didn’t work, though the error might have been coming from my own code.

The failure was about which actions the test was allowed to perform: the code read data outside a read action.

ERROR: Read access is allowed from inside read-action only

Iteration 2

Not good

Prompt

Can you solve this issue? This is not happening when actually running the plugin.

Included the stacktrace.

Result

Copilot added an if block to the production code that runs the problematic part inside a read action whenever read access isn’t already allowed. I don’t think adding if blocks to production code just for test-specific cases is a good idea.

The explanation sounded valid, but even though the if branch was hit, it didn’t solve the issue.

This time the issue was with the thread itself.

Access is allowed from Event Dispatch Thread (EDT) only
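
For illustration, the kind of guard described in this iteration might have the shape sketched below. This is not the code Copilot produced and it does not use the IntelliJ API: ReadGate, readProjectFiles, loadWithGuard and loadFromTest are all hypothetical stand-ins, and the sketch only models the read-access rule, not the later EDT failure. It contrasts a guard inside production code with supplying the read context from the test.

```kotlin
// Hypothetical stand-in for the platform's read-access rule; NOT the IntelliJ API.
object ReadGate {
    private val inReadAction = ThreadLocal.withInitial { false }

    val isReadAllowed: Boolean
        get() = inReadAction.get()

    // Runs the block with read access granted on the current thread.
    fun <T> read(block: () -> T): T {
        val previous = inReadAction.get()
        inReadAction.set(true)
        try {
            return block()
        } finally {
            inReadAction.set(previous)
        }
    }
}

// Hypothetical operation that requires read access, like the call that failed in the test.
fun readProjectFiles(): List<String> {
    check(ReadGate.isReadAllowed) { "Read access is allowed from inside read-action only" }
    return listOf("a.kt", "b.kt") // made-up data for the sketch
}

// Shape of a guard in production code: branch on whether read access is already held.
fun loadWithGuard(): List<String> =
    if (ReadGate.isReadAllowed) readProjectFiles() else ReadGate.read { readProjectFiles() }

// Alternative: production code stays unchanged and the test supplies the context.
fun loadFromTest(): List<String> = ReadGate.read { readProjectFiles() }
```

Both variants return the same data in this toy model. The difference is where the workaround lives: the guard stays in the shipped code, while the test-side wrapper does not.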

Iteration 3

Not good

Prompt

The if case is hit, but now we get a different error

Included the stacktrace.

Result

This time Copilot made a huge number of changes to the actual implementation. It renamed variables and reformatted code, which made the diff hard to read.

This time the tests did pass, but I wasn’t happy.

I started commenting out things that I didn’t like and the tests just kept passing.

So, Copilot had tried to fix the issue in both places. It had changed the tests to run the command on the EDT, which to me seems like the correct solution, but it had also made a large number of changes to the source file, all of which were unnecessary.

For example, it created a Runnable in the init block and used it to initialize the component.

I was able to restore the whole FuzzyFinderComponent.kt without issues (30+ changes). The actual fix was only a couple of lines in the test file.
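
The test-side fix can be sketched roughly as follows. This is a minimal illustration using plain Swing only, not the actual test code: EdtOnlyPanel is a hypothetical stand-in for the real component, and onEdt and createPanelInTest are made-up helper names. In an IntelliJ plugin test the platform's own facilities (for example Application.invokeAndWait) would likely be the more natural choice; the sketch only shows the idea of moving the thread handling into the test.

```kotlin
import java.awt.Dimension
import java.util.concurrent.atomic.AtomicReference
import javax.swing.JPanel
import javax.swing.SwingUtilities

// Hypothetical stand-in for a component that, like the one under test,
// may only be touched on the Event Dispatch Thread (EDT).
class EdtOnlyPanel : JPanel() {
    init {
        check(SwingUtilities.isEventDispatchThread()) {
            "Access is allowed from Event Dispatch Thread (EDT) only"
        }
        preferredSize = Dimension(400, 300) // arbitrary illustrative size
    }
}

// Test-side helper (hypothetical name): runs the block on the EDT, waits for it
// to finish, and returns its result. An exception thrown inside the block
// surfaces wrapped in an InvocationTargetException.
fun <T> onEdt(block: () -> T): T {
    if (SwingUtilities.isEventDispatchThread()) return block()
    val result = AtomicReference<T>()
    SwingUtilities.invokeAndWait { result.set(block()) }
    return result.get()
}

// What a test body could look like: only the test changes, not the component.
fun createPanelInTest(): EdtOnlyPanel = onEdt { EdtOnlyPanel() }
```

With a helper like this, the component source needs no Runnable or thread checks added for the benefit of the tests.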

Conclusion

The tests themselves were good, and helpful, because I don’t like writing UI tests.

They weren’t too complex or too granular, but actually tested things which would be useful in the long run.

Even though Copilot did solve the test runner issues, its solution would have introduced unnecessary complexity and technical debt if I hadn’t verified and challenged its work.

I think pointing out that I could’ve prompted it better is a bit unnecessary. We cannot know everything, that’s why these tools are useful.

It is still important, though, that we learn from the solutions they provide.