The first step is to make the problem reproducible from the command line. You also need to print anything that can be used to determine whether the test case passes or fails to standard console output. This makes it easy to run the test case with expect. Follow [2] to set up your VM to use serial output.
Next, modify the expect script and make sure it works by running it on its own. Identify which versions pass and which fail (a coarse bisect). Once you have narrowed the problem down to two release tags, one that passes and one that fails, start a bisect between those tags.
Next run:
    git bisect start
    git bisect good <good tag>
    git bisect bad <bad tag>
    git bisect run ../bisect-run
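The bisect-run script itself is not reproduced in this post. As a rough sketch of what such a wrapper has to do, the function below maps a build step and a test step onto the exit codes that git bisect run understands: 0 marks the revision good, 125 tells git to skip it, and any other code from 1 to 127 marks it bad. The function name bisect_step and the two command arguments are hypothetical and are not taken from the scripts at [3].

```shell
#!/bin/sh
# Hypothetical helper for a bisect-run script. The build and test commands
# are placeholders passed in by the caller, not the author's real commands.

# bisect_step BUILD_CMD TEST_CMD
# Returns the exit code that git bisect run should see for this revision:
#   0   -> good
#   1   -> bad
#   125 -> cannot be tested, skip this revision
bisect_step() {
    build_cmd=$1
    test_cmd=$2

    # A revision that does not build says nothing about the bug, so skip it.
    sh -c "$build_cmd" || return 125

    # The test command (for example an expect script) is assumed to exit 0
    # when the test case passes and non-zero when it fails.
    if sh -c "$test_cmd"; then
        return 0
    fi
    return 1
}
```

A bisect-run script built on this could end with a call such as bisect_step "make" "./test.exp" followed by exit $?, where both command strings are stand-ins for whatever builds and tests your tree.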
I ran into a few gotchas that may be bugs that really need fixing. Occasionally, when running in '-noconsole' mode, I wouldn't see any prompt for a very long time. When re-running with '-vga std', I'd see that it was waiting at GRUB. You may need to adjust timeouts so that you don't hit this issue when running without a VGA console.
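One way to keep a stalled boot from being recorded as a bad revision is to put an outer time limit around the test and report a timeout to git bisect run as "skip" instead of "bad". This is a sketch of that idea and not something the scripts at [3] are known to do. It assumes the GNU coreutils timeout command is available, and the function name run_with_timeout is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit and turn a timeout
# into exit code 125, which git bisect run treats as "skip this revision".

# run_with_timeout SECONDS COMMAND [ARGS...]
run_with_timeout() {
    limit=$1
    shift

    status=0
    timeout "$limit" "$@" || status=$?

    # GNU timeout exits with 124 when the time limit was reached.
    if [ "$status" -eq 124 ]; then
        return 125
    fi

    # Otherwise pass the command's own exit status through unchanged.
    return "$status"
}
```

Skipped revisions make the bisect less precise, so this only helps if stalls are occasional. If every boot is slow without a VGA console, lengthening the timeouts is the better fix.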
Overall, you can find these scripts here [3].
Hopefully they will evolve a bit as they are used more and more.