Generating UI tests with Copilot
Test generation for file type globbing
Iteration 1
Good
Prompt
Implement tests for this class (included FuzzyFinderComponent.kt)
Result
Copilot generated the following tests:
- 4 tests verifying the orientation based on the search field location
- Default dimensions are followed as expected
- Secondary text field listener is added and works as expected
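To give a rough idea of the shape of such tests, here is a minimal sketch in plain Swing. `SearchPanel` is a hypothetical stand-in, not the real `FuzzyFinderComponent`, and the default size and the `lastSecondaryText` property are assumptions made purely for illustration. It only covers the "default dimensions" and "listener works" style of check.

```kotlin
import java.awt.Dimension
import javax.swing.JPanel
import javax.swing.JTextField
import javax.swing.event.DocumentEvent
import javax.swing.event.DocumentListener

// Hypothetical stand-in for the component under test; not the real FuzzyFinderComponent.
class SearchPanel : JPanel() {
    val secondaryField = JTextField()

    // Mirrors the secondary field's text so a test can observe that the listener fired.
    var lastSecondaryText = ""
        private set

    init {
        // Assumed default size, chosen only for this example.
        preferredSize = Dimension(700, 400)
        add(secondaryField)
        secondaryField.document.addDocumentListener(object : DocumentListener {
            override fun insertUpdate(e: DocumentEvent) = sync()
            override fun removeUpdate(e: DocumentEvent) = sync()
            override fun changedUpdate(e: DocumentEvent) = sync()
            private fun sync() { lastSecondaryText = secondaryField.text }
        })
    }
}

fun main() {
    // Plain Swing does not enforce thread checks here; the IntelliJ platform is stricter.
    val panel = SearchPanel()

    // "Default dimensions are followed as expected"
    check(panel.preferredSize == Dimension(700, 400))

    // "Secondary text field listener is added and works as expected"
    panel.secondaryField.text = "*.kt"
    check(panel.lastSecondaryText == "*.kt")

    println("ok")
}
```

In a real plugin project these checks would live in the test framework of choice; `check` is used here only to keep the sketch self-contained.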
The test setup didn’t work, though the error may have been coming from my own code.
The failure was about which actions are allowed in which context: the code read data outside of a read action.
ERROR: Read access is allowed from inside read-action only
Iteration 2
Not good
Prompt
Can you solve this issue? This is not happening when actually running the plugin.
Included the stacktrace.
Result
Copilot added an if block to the production code that runs the problematic part inside a read action when read access isn’t already allowed. I don’t think adding if blocks to production code for test-specific cases is a good idea.
The explanation sounded valid, but even though the if branch was hit, it didn’t solve the issue.
This time the issue was with the thread itself.
Access is allowed from Event Dispatch Thread (EDT) only
Iteration 3
Not good
Prompt
The if case is hit, but now we get a different error
Included the stacktrace.
Result
This time Copilot made a huge number of changes to the actual implementation, renaming variables and reformatting code, which made the diff hard to read.
This time the tests did pass, but I wasn’t happy.
I started commenting out things that I didn’t like and the tests just kept passing.
So, Copilot had tried to fix the issue in both places. It had changed the tests to run the command on the EDT, which to me seems like the correct solution, but it had also made a large number of changes to the source file which were totally unnecessary.
For example, it created a Runnable in the init block, which was used to initialize the component.
I was able to restore the whole FuzzyFinderComponent.kt without issues (30+ changes), and the actual fix was only a couple of lines in the test file.
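The idea of keeping the fix on the test side can be sketched with nothing but the JDK. This is not the code Copilot produced; `updateUi` and `onEdt` are hypothetical names, and `SwingUtilities.invokeAndWait` stands in for whatever the platform offers (in an IntelliJ plugin test, something like `ApplicationManager.getApplication().invokeAndWait { ... }` would presumably take its place).

```kotlin
import javax.swing.SwingUtilities

// Hypothetical stand-in for production code that insists on running on the EDT.
fun updateUi(): String {
    check(SwingUtilities.isEventDispatchThread()) {
        "Access is allowed from Event Dispatch Thread (EDT) only"
    }
    return "updated"
}

// Test-side helper (an assumption for this sketch): run the block on the EDT,
// wait for it to finish, and rethrow whatever it threw.
fun <T> onEdt(block: () -> T): T {
    if (SwingUtilities.isEventDispatchThread()) return block()
    var result: Result<T>? = null
    SwingUtilities.invokeAndWait { result = runCatching(block) }
    return result!!.getOrThrow()
}

fun main() {
    // Calling updateUi() directly from the main thread would fail the check above.
    // Wrapping the call in the test leaves the production code untouched.
    println(onEdt { updateUi() }) // prints "updated"
}
```

The point of the design is that the thread hop lives entirely in the test, so the component needs no test-only branches.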
Conclusion
The tests were actually good, and helpful because I don’t like creating UI tests.
They weren’t too complex or too granular, but actually tested things which would be useful in the long run.
Even though Copilot did solve the test runner issues, it would have added unnecessary complexity and introduced technical debt had I not verified and challenged its work.
I think pointing out that I could’ve prompted it better is a bit unnecessary. We cannot know everything, that’s why these tools are useful.
It is still important, though, that we learn from the solutions they provide.