Current speech recognition delivers excellent results. As long as the recording conditions are good (little or no background noise) and people speak clearly and distinctly, perhaps with a slight accent but certainly not in dialect, the recognition results are almost as good as those of a human listener.
However, it does not always work well, partly because recordings are often less than perfect, but above all because spoken language differs from written language. In a conversation, for example, it is not at all disruptive if you correct yourself: ‘I walked behind the house, um... no, in front of it.’ Listeners understand that the first part can be discarded and that the intended sentence was ‘I walked in front of the house.’ A speech recogniser does not know this and tries to convert everything it hears into text as accurately as possible. This leads to errors and sometimes to strange sentences.
Way of Speaking
Another issue is the way people speak. If someone speaks with a strong accent or even in dialect, recognition often goes wrong. But even correct, contemporary standard Dutch can cause problems. The final n, for instance, is often not pronounced: we say ‘we lope naar huis’ instead of ‘we lopen naar huis’ (‘we walk home’). This particular case can be resolved grammatically (‘we lope’ is incorrect), but with a name you simply cannot tell. I, for one, am often recognised as Arjen van Hesse instead of Arjan van Hessen.
You could, of course, trust the speech recogniser blindly, but for a reliable transcription you will always have to check everything.
Correction Software
For this purpose there is correction software that lets you listen to the recording while reading the transcript, and make corrections where necessary.
Some speech recognisers include such an editor, but usually you will have to use a separate software module.
That is why we created WhisperCorrector: a tool that reads the recognition output and, if available, the speaker labels, and saves everything in chunks of about 500 words. You can then check those chunks one by one and correct them where necessary (see the download).
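To make the idea of working in chunks concrete, here is a minimal sketch of how recognised segments could be grouped into pieces of roughly 500 words. It is not WhisperCorrector's actual code. The `Segment` type and `chunk_segments` function are hypothetical, loosely modelled on the start/end/text segments that Whisper produces, with an optional speaker label assumed to come from a separate diarization step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One recognised segment (hypothetical format, assumed for this sketch)."""
    start: float                   # start time in seconds
    end: float                     # end time in seconds
    text: str                      # recognised text
    speaker: Optional[str] = None  # speaker label, if diarization was available


def chunk_segments(segments: list[Segment], max_words: int = 500) -> list[list[Segment]]:
    """Group consecutive segments into chunks of at most max_words words.

    Segments are never split, so timestamps and speaker labels stay intact.
    A chunk only exceeds max_words when a single segment is longer than that.
    """
    chunks: list[list[Segment]] = []
    current: list[Segment] = []
    count = 0
    for seg in segments:
        n = len(seg.text.split())
        # Close the current chunk if this segment would push it over the limit.
        if current and count + n > max_words:
            chunks.append(current)
            current, count = [], 0
        current.append(seg)
        count += n
    if current:
        chunks.append(current)
    return chunks
```

Keeping whole segments together means each chunk still maps onto a continuous stretch of audio, so a reviewer can play back exactly the part they are reading.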
Speaker Diarization
You can also indicate who is speaking at any given moment. Many speech recognisers already do this, but a) they still make quite a few mistakes with short segments, and b) you do not want the results labelled as speaker_1 and speaker_2, but as “Hans Janssen” and “Marieke Burger”.
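Both problems can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how WhisperCorrector works: segments are assumed to be dicts with `start`, `end`, `text` and `speaker` keys, and `smooth_short_turns` applies a hypothetical heuristic that treats a very short turn sandwiched between two turns of the same speaker as a diarization error. `rename_speakers` then replaces generic labels with real names supplied by the user.

```python
def smooth_short_turns(segments: list[dict], min_duration: float = 1.0) -> list[dict]:
    """Hypothetical heuristic: reassign very short turns that sit between
    two turns of the same speaker to that speaker. Returns copies."""
    result = [dict(seg) for seg in segments]
    for i in range(1, len(result) - 1):
        prev_spk = result[i - 1]["speaker"]
        next_spk = result[i + 1]["speaker"]
        seg = result[i]
        too_short = seg["end"] - seg["start"] < min_duration
        # Only change the label if both neighbours agree and differ from it.
        if too_short and prev_spk == next_spk != seg["speaker"]:
            seg["speaker"] = prev_spk
    return result


def rename_speakers(segments: list[dict], names: dict[str, str]) -> list[dict]:
    """Return copies with generic labels replaced by real names.

    Labels missing from `names` are kept, so unidentified speakers
    remain visible for manual correction.
    """
    return [
        {**seg, "speaker": names.get(seg.get("speaker"), seg.get("speaker"))}
        for seg in segments
    ]
```

The heuristic is deliberately crude: a genuine short interjection such as ‘yes’ from the other speaker would also be reassigned, which is exactly why the result still needs a human check.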
WhisperCorrector is still under development, but we hope (and believe) that the current version already works well.