A few years ago, speech recognition was billed as the ‘next big thing’. So why did this technology not take off?
- Many hours of training are required before a user can operate the system effectively, and concentration on the job in hand can be lost while correcting mistakes.
- The user requires not only skill in using the particular software but also considerable computer knowledge. Even if the user is prepared to put in the time to learn the system, further work is required to train the computer system to recognise the voice and understand the accent. Each user has to have their own “voice profile”, and even then a noisy office, a bad head cold or a hurried speech pattern can make the system virtually unusable.
- The great advantage that human involvement has over computer-automated systems (e.g. voice recognition) is common sense! A letter transcribed by a computer program may contain patently absurd phrases, whilst a human transcriber will check that the content of the transcript makes sense in context.
- The grammar and general English used in many dictations are poor. Whilst human transcribers will endeavour to re-order the letter so that it is clear and easy for the recipient to understand, voice recognition software cannot do so.
- Generally speaking, each new version of the software (and there are new versions every few months) requires a brand new training process. The software is of course incredibly clever and is always pushing against the limits of the hardware. Thus, only the very latest PCs are capable of running the most recent version of the software, which is a considerable expense if more than a few users are involved.
- It is extremely difficult to make corrections to the text. The operator must use set phrases with total precision; any imprecision in the wording leads to the supposedly 'correcting' words appearing on the screen verbatim.



