Wednesday, January 12, 2011

Reliability of Voice Recognition Technology

Voice recognition technology recognizes spoken words and converts them into text.  There are many voice recognition programs on the market, the most popular being Dragon NaturallySpeaking by Nuance.  Here, we will talk about this particular software.
Dragon NaturallySpeaking

Voice recognition software is very helpful to anyone whose keyboard skills are poor.  Dragon is designed so that the user can work with the software comfortably and make full use of its features.

To start with, the software needs to be trained.  Every new user creates an individual profile and then starts the training procedure.  Dragon comes with a module in which the user trains it to recognize the tone of his or her voice.  This module has a series of steps to be followed so that the software gets accustomed to the user's voice.  Once the user is comfortable with the software's commands, he or she can use it on live jobs.

For a transcriber, a live job is an audio file to be transcribed.  With this software, a transcriber can listen to the audio and speak the lines aloud as they are heard.  The software also has some built-in intelligence, which is an added advantage.  When the user dictates a line, the software interprets the content to some extent and uses the context to avoid confusing similar-sounding phrases such as ‘I scream’ and ‘ice cream.’
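The idea of choosing between similar-sounding phrases by context can be illustrated with a toy sketch.  This is not how Dragon works internally; the cue words and the function name pick_phrase are made up for illustration, and a real recognizer relies on statistical language models rather than a hand-written word list.

```python
# Toy illustration only: hand-written context cues (an assumption),
# not Dragon's actual language model.
CONTEXT_CUES = {
    "ice cream": {"eat", "cone", "vanilla", "dessert", "chocolate", "scoop"},
    "I scream": {"when", "loudly", "scared", "fear", "nightmare", "spider"},
}


def pick_phrase(context_words):
    """Return the candidate phrase whose cue words overlap most with the context."""
    words = {w.lower() for w in context_words}
    scores = {phrase: len(cues & words) for phrase, cues in CONTEXT_CUES.items()}
    # On a tie, max() keeps the first candidate listed in CONTEXT_CUES.
    return max(scores, key=scores.get)


print(pick_phrase("we bought a vanilla cone for dessert".split()))
```

Here the surrounding words ‘vanilla,’ ‘cone’ and ‘dessert’ tip the choice towards ‘ice cream.’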

This software is also helpful for those whose English vocabulary is limited.  Difficult and rarely used words such as ‘habiliments’ and ‘sacerdotal,’ if spoken properly and clearly, can be typed out by Dragon even if the transcriber is unsure how to spell them.  It is useful in the same way for place names.  The more the software is used, the more it gets accustomed to the speaker's voice and tone, which helps it grasp the context and subject matter of the file.  This makes it easier to capture words that would otherwise be time-consuming to look up.

Using 'Dragon' for the actual work
At times, a file contains medical terms specific to a particular disease.  Here too Dragon helps to some extent, working out the words from the context of the sentence.  A disease name such as amebiasis is usually spelled more or less correctly by Dragon, provided the user pronounces it correctly.

In general, even ordinary English words that appear frequently in a file and are long and tedious to type, such as ‘differentiation,’ are easily handled by Dragon once it gets used to the speaker’s accent, tone and pronunciation.  All in all, Dragon reduces the time spent typing the file and lets the transcriber devote more time to research, which results in better-quality transcripts.

Wednesday, January 5, 2011

Speaker Identification – Expectations and Limitations

Speaker Identification is the process of identifying different speakers in the audio file/transcript.  Usually, speaker identification is done by one of the following methods.

1.    Reference Material/Agenda – In case of a meeting/conference/symposium, if we have the minute-by-minute agenda along with the names of the speakers, we match each voice to the speaker names given in the agenda.  This is the simplest way to get accurate speaker names.  We therefore always encourage our clients to send us the agenda or draft of the meeting/conference/symposium at the time of job confirmation, so that the speakers in their jobs can be identified accurately.

2.    Video Reference – Speaker identification becomes easy if the client provides professionally recorded video files as references, with the camera focused on the speakers.  In such cases, the transcriber can readily identify the speakers by watching the video.

3.    Googling/Research – In many cases the client provides no reference material or speaker names, which makes the task very challenging for the transcriber.  The transcriber tries to tell the different voices apart, labelling them Speaker1, Speaker2 and so on.  If the speakers introduce themselves while speaking, the transcriber then uses a search engine to look up the name on the internet and checks it against the content of the file to judge whether it is likely to be the same person.  The limitation is a high chance of identifying the wrong person, because a search for a name typically returns many different people with the same name.

4.    Voice differentiation – This is the most difficult method.  If there is no reference material or agenda and there are more than three or four speakers in the audio, it is very challenging to differentiate the speakers by the tone of their voices.  This is especially so in a discussion where the speakers’ voices overlap and one cannot make out what each speaker is saying.  Identifying speakers is almost impossible if the audio quality is bad.  In such cases, we use our listening skills to differentiate the voices as best we can.  The limitation is that the result depends entirely on the listening skills of the transcriber, and there is a high chance of mixing up the speakers.

As our standard service at Cripton, speakers are identified as [Male] and [Female] or as Interviewer and Interviewee.  Beyond that, our transcribers and editors always strive to identify speakers to the best of their ability by applying one of the above-mentioned methods.  But these methods have their limitations: with more than three speakers, it becomes difficult to identify a particular speaker without an agenda as a reference.
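The fallback order described above, from a confirmed name down to a generic label, can be sketched roughly as follows.  This is only an illustration under assumptions: the function label_speaker, its parameters and the sample name are hypothetical and do not describe any actual tool used at Cripton.

```python
def label_speaker(voice_id, reference_names=None, gender=None):
    """Pick a transcript label for one distinguished voice.

    reference_names: optional dict mapping a voice number to a name confirmed
    from client reference material (agenda, video).  Hypothetical structure.
    """
    # Best case: the client's reference material gives a confirmed name.
    if reference_names and voice_id in reference_names:
        return reference_names[voice_id]
    # Standard service: a generic [Male] / [Female] label.
    if gender in ("male", "female"):
        return f"[{gender.capitalize()}]"
    # Last resort: a numbered placeholder such as Speaker1 or Speaker2.
    return f"Speaker{voice_id}"


# "Dr. A. Example" is a made-up name used purely for illustration.
print(label_speaker(1, reference_names={1: "Dr. A. Example"}))
print(label_speaker(2, gender="female"))
print(label_speaker(3))
```

The sketch shows why reference material matters: without it, the label can never be more specific than a generic tag or a numbered placeholder.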

Hence, the only reliable way to get correct speaker identification is to provide proper reference material, such as the agenda or draft of the conference/meeting/symposium, the speaker names, or video files.  It is also very important that the client uploads all the reference material along with the audio job itself and not at a later time, because speaker identification is done at the primary stage, while the transcriber is working on the document.  This will ensure the delivery of a transcript with accurate speaker identification.