Cripton Transcription Blog: 2011

Monday, September 12, 2011

The Challenges of Mixed Audio Transcription

In today’s article, we are going to focus on the challenges of mixed audio transcription.

An audio file that contains two or more languages is called a mixed audio. Here, we will focus on audios containing two languages, for example, the transcription of an audio that contains both English and Japanese.

There are three types of mixed audios. They are as follows:
1. Consecutive interpretation.
2. Simultaneous interpretation.
3. Random mix of both languages.

Consecutive Interpretation: In this case, the two languages (English and Japanese) are spoken one after the other. Here, the same audio gets allocated to both language transcribers. It is important for both transcribers to listen to the entire audio, mainly because in consecutive interpretation, the language changes very frequently. This process takes up a lot of time. Next, the two final transcripts have to be compiled, which adds to the time taken. Frequent speaker interruptions also make it difficult to compile the transcript.

Simultaneous Interpretation: This signifies that two different languages are spoken at the same time. In this case too, the same audio gets allocated to both language transcribers. Compilation is not required as both languages are heard throughout the audio file and can be transcribed in separate documents. In most instances, the two languages are heard in different channels (for example, Japanese in the left channel and English in the right channel). In cases where just one channel is available, it is very difficult for both language transcribers to work on the audio as it is not possible to clearly hear only one language at a time. Since both languages are heard throughout the audio, the transcription process is challenging and takes much more time and effort than a normal single-language audio. Owing to this, clients are charged a premium rate, which they are made aware of before the assignment is taken up.
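When the two languages sit on separate channels, each transcriber can be given a single-language file. The following is only a minimal sketch of that idea, assuming an uncompressed stereo WAV recording; the function names and file paths are hypothetical and not part of our actual workflow. It uses Python's standard `wave` module to write the left and right channels to two mono files.

```python
import wave


def split_interleaved(frames: bytes, sample_width: int) -> tuple[bytes, bytes]:
    """Split interleaved stereo PCM bytes into (left, right) mono byte strings."""
    frame_size = 2 * sample_width  # one left sample followed by one right sample
    left = bytearray()
    right = bytearray()
    for start in range(0, len(frames) - frame_size + 1, frame_size):
        left += frames[start:start + sample_width]
        right += frames[start + sample_width:start + frame_size]
    return bytes(left), bytes(right)


def split_stereo_wav(src_path: str, left_path: str, right_path: str) -> None:
    """Write the left and right channels of a stereo WAV file to two mono WAV files."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a two-channel (stereo) recording")
        sample_width = src.getsampwidth()
        frame_rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    left, right = split_interleaved(frames, sample_width)
    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(sample_width)
            out.setframerate(frame_rate)
            out.writeframes(data)
```

For example, `split_stereo_wav("meeting.wav", "japanese.wav", "english.wav")` would produce one file per language if Japanese is on the left channel and English on the right. A single-channel recording cannot be separated this way, which is why it remains so difficult to transcribe.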

Random mix of both languages: This means that there is a random mix of more than one language in the audio file; there is no pattern in terms of the frequency of language change. One needs to listen to the entire audio carefully to figure out the breakup of both languages. After that, the job can be allocated to both language transcribers. In this case, compilation of the files can be done by someone who is familiar with both the languages.

In all the above scenarios, it is clear that working on a mixed audio transcript takes almost twice as long as a normal English transcript (single-language audio). Allocation of mixed audio transcription jobs also takes significantly more time than a standard transcription job, right from listening to the audios to compiling the final transcripts, as well as passing on specific instructions and references (if any) to both language transcribers.

Wednesday, September 7, 2011

Why are ‘Batch Deliveries’ not feasible/advisable in Transcription Industry

Definition: Whenever the client requires the delivery of transcripts in batches as opposed to a single delivery of the entire job, it is termed as Batch Delivery.
Batch Deliveries are not our standard service and we do not encourage them, the reasons for which will be clear as you read through this blog.

Clients usually favour Batch Deliveries only in the following rare cases:
1. Clients need urgent delivery of the entire job and it is not possible for us to process that job in a day. Thus, sometimes clients request us to split the job up in batches so that each batch will have a separate delivery schedule.

2. In some cases where clients have a very large job, for example, an entire Conference/Symposium which spans over an entire week, the clients upload the audios on a daily basis and hence the delivery is then scheduled in batches as per the turnaround time (TAT) for that particular file.

Disadvantage of Batch Deliveries:
1. Serious Quality Implications: With regards to batch delivery, there are multiple audio files with urgent TAT. These files are usually processed as single entities, not an entire job, and can get allotted to different Editors. This raises the issue of quality/consistency in the overall output of the job. If it is a single delivery, then we always try and give the entire job to a single Editor (owner) so he has a good understanding of the subject matter of the job, the speakers in the files with regard to identification, etc., and hence the final transcripts are accurate and of the best quality.

2. Consistency Issues: In batch deliveries, there are multiple files and multiple Editors work on them at the same time, which usually results in inconsistencies in formatting, speaker identification, etc. For example, there may be instances where the Main Speaker has identified himself/herself at the start of the symposium/interview in the first audio file. As it is a batch delivery, the Editor doing the second or subsequent files might not be the same person who worked on the first file. Hence, the second Editor might miss out on that information given in the first audio. This will result in incorrect or no speaker identification in the subsequent files, which leads to major consistency issues in the job.

3. Delayed Deliveries: When three to four batches of the same job have to be delivered over a span of 3–4 days, there is a high chance that some batches will get delayed. This is because it is much more challenging for the Process Manager to keep track of these batches than of a single delivery for the entire job.

Thus, we always encourage our clients to avoid Batch Delivery Schedules and to keep a single delivery schedule for the entire job so that we can assure our clients an unclear-free, optimal quality output with timely deliveries.

Tuesday, July 5, 2011

Key Factors for an Effective Interview

A conversation between two or more people – the interviewer(s) and the interviewee(s) – where questions are asked by the interviewer to obtain information from the interviewee is called an interview. A good interview is a focused, direct conversation between two or more professionals. Interviews are the best way of understanding a complicated situation and viewing it from someone else’s perspective. From a transcriber’s point of view, an ideal interview should take place in a quiet surrounding without any disturbance. This helps in producing a better quality transcript.

The first and foremost requirement while transcribing an interview would be the introduction of the participants of the interview for speaker identification. This information is not only limited to speaker identification, but also helps in getting additional information on the participants, which in turn helps to produce a good quality transcript.

A good interviewer should always be prepared with questions, facts, and sources of information in a professional manner. Speaking clearly, slowly and repeating keywords also helps a great deal. The pace of an interview should be controlled in accordance with the time allotted.

There are instances where there may be multiple interviewers. In this case, coordination with fellow interviewers is the key. Interviewers should divide the questions to be asked between themselves. If there is no coordination between the interviewers, it would result in a lot of interruptions. Such overlap of communication should be avoided.

On the interviewee front, interviewees should try to be concise while answering the questions posed to them. Digression should be avoided. Avoid interrupting the interviewer and let the interviewer finish questioning before beginning to answer. An interviewee should always keep in mind what information the interviewer is seeking and then give crisp and clear answers. There also can be instances where there are multiple interviewees. Again, it is important to have coordination with fellow interviewees. Interruptions should be avoided and there should be clear segregation of people answering particular types of questions and so on.

However, there are instances when an interview falls below its potential, for example because the participants are not fully prepared, the interviewer has not organized the questions, or no fixed format is followed. When an interviewer speaks in an unclear or hurried manner, the interviewee sometimes fails to comprehend the questions and may repeatedly ask the interviewer to repeat them. This results in deterioration of the transcript quality. The same rule applies to the interviewee, who also has to be prepared for the interview with all research, facts, and sources.

In order to conduct a good interview, distractions such as cell phones, BlackBerries, and meeting room intrusions should be avoided. In case of a phone interview, it is advisable to use a landline; interviews on cell phones should be avoided. All the participants should stay focused on what is required of the interview. Try using an ergonomic keyboard, which is quieter than a normal keyboard, so that the sound of typing doesn’t intrude on the interview.

Following all these general and basic practices will be useful in ensuring an effective interview and subsequently in producing the best quality transcripts.

Monday, June 6, 2011

Cripton's Transcribers: Their Background and Experience

Cripton, a division of Cactus Communications Pvt. Ltd., provides English transcription services to academic and corporate clients. Cripton boasts of a team of highly experienced, qualified in-house transcribers and editors with an average work experience that ranges from 5 to 10 years. The main aim of Cripton is to strive to work closely with the client to understand all the aspects of client needs and provide the best possible transcription services that suit and satisfy the clients’ requirements.

The process of transcription, or the conversion of voice-recorded reports into text format, is carried out by a transcriptionist. The key skills and abilities that go into making a transcriptionist are: Sound knowledge of diverse terminology; usage of correct punctuation, grammar and spelling; appropriate internet search skills; and excellent typing skills.

Cripton has individuals with varied educational backgrounds working as transcriptionists delivering on client documents according to their field of specialization.

There are individuals who are graduates in Science – majoring in either Physics, Chemistry, Biology, Medicine, or Electronics/Engineering – and provide their vast expertise in catering to the transcription needs of clients on subjects related to scientific fields like natural sciences/phenomena, biological life, metaphysics, mathematics, and social sciences; and medical fields like cardiology, chiropractics, gastroenterology, internal medicine, neurology, orthopedics, pediatrics, podiatry, psychiatry, radiology, and a host of other fields.

There are individuals who hold a Bachelors degree in Commerce – majoring in accountancy, business administration, e-commerce, economics, finance, and marketing – and provide their vast expertise in catering to transcription needs of clients on subjects dealing with finance, fiscal and industrial policies, market fluctuations, and trade. The transcripts could be related to board meetings, conferences, interviews, lectures, speeches, roundtable discussions, symposiums, seminars, or teleconferences.

There are individuals who have graduated in Arts or Humanities – majoring in advertising, business management, journalism, mass communication, media studies, performing arts, travel and tourism, and more – and provide their vast expertise in catering to clients on subjects like English being taught as a foreign language, or in accommodating a client who requires a transcript in a specific format, for example, with timestamps.

In a nutshell, Cripton’s transcriptionists are capable of handling transcripts from fields as varied as business, law, market research, and medicine, as well as for any individual with a transcription need. The guiding principle is to understand the client's needs and to provide prompt and flexible services while maintaining confidentiality and regular communication with the client. Thus, one of the main pillars of the success of Cripton and its transcriptionists has been striving to provide accurate transcriptions on time, every time.

Cripton also provides equal opportunities for growth; a transcriptionist can thus graduate to become an Editor, who edits transcripts and checks for quality, and can grow further to the level of a Quality Lead or a Process Manager.

Thursday, May 26, 2011

Why does inserting timestamps necessitate a premium charge?

In today’s article, we will focus on inserting timestamps in transcripts.

· Entering timestamps is largely a manual task performed by transcribers who listen to and transcribe the audio files. This transcript is subsequently reviewed by editors who also need to ensure that the final product is of good quality with careful compliance of instructions.

o Being a manual task, it consumes the transcribers’ time and efforts depending upon the frequency of the required timestamps.

· Inserting timestamps at regular intervals takes the transcribers’ attention away from their basic transcription requirements.

o In order to ensure quality output with timestamps and compliance of instructions, the transcribers spend an unusual amount of time transcribing audio files with such instructions. This ultimately results in loss of productivity for the transcribers.

· The following are some of the examples of the frequency of timestamps, i.e. when to enter a timestamp:

o Every 5 minutes, every 2 minutes, every 1 minute, every sentence (suitable for subtitling), every time there is a change of speaker, etcetera. These are just a snapshot of some requests we have received so far.

· The most time-consuming and exhaustive of the above methods is when one needs to insert timestamps in every sentence. Again, if both the start and end time are required, one cannot expect a transcriber to transcribe more than 20 minutes in a particular day (ideally a transcriber can transcribe more than 50 minutes in an 8-hour shift). This indicates that transcribers need to insert a timestamp at the beginning and the conclusion of a sentence, then again at the beginning and end of sentence2, and so on. This process is repeated until the end of the audio file.

o For example:

§ 0:00:05 – 0:00:20

§ Sentence1

§ 0:00:21 – 0:00:39

§ Sentence2

· Consequently, the higher the frequency of timestamps (that is, the shorter the interval between them), the more time and effort are required.

· Again, inserting timestamps for every speaker change can be complicated. This is because speakers (mostly discussion participants and in some cases interviewers/interviewees) often interrupt each other and thus cut off the flow of the conversation as they tend to speak at the same time. In this case, the process becomes even more difficult.

· These days, there are some software products available which claim to help insert timestamps in a transcript. After exploring those products, we have realized that inserting a timestamp is still a combination of two commands. This means that every time a transcriber wants to enter a timestamp, they need to provide two keystrokes. This instance illustrates that manual efforts have to be taken to maintain the quality of the transcript.

· All in all, inserting timestamps at regular intervals, for example, at intervals shorter than 5 minutes, requires meticulous and intensive effort on the part of the transcribers.
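To make the formats discussed above concrete, here is a minimal sketch in Python. It only illustrates how the start and end times of each sentence can be laid out and how many timestamps a fixed interval implies; the function names are hypothetical, and the times themselves still have to come from a transcriber listening to the audio.

```python
def format_timestamp(seconds: int) -> str:
    """Render a whole number of seconds as H:MM:SS, e.g. 20 -> '0:00:20'."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def stamp_sentences(sentences):
    """Build per-sentence lines with start and end times.

    `sentences` is a list of (start_seconds, end_seconds, text) tuples.
    """
    lines = []
    for start, end, text in sentences:
        lines.append(f"{format_timestamp(start)} – {format_timestamp(end)}")
        lines.append(text)
    return lines


def interval_marks(duration_seconds: int, every_minutes: int):
    """List the timestamps needed when stamping at a fixed interval."""
    step = every_minutes * 60
    return [format_timestamp(t) for t in range(0, duration_seconds + 1, step)]
```

For a one-hour audio file, `interval_marks(3600, 5)` yields 13 timestamps, whereas `interval_marks(3600, 1)` yields 61, which gives a rough sense of how the workload grows as the interval shrinks.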

Wednesday, May 11, 2011

BPO and Transcription Industry in India – An Overview 2/2

When transcription first began in India, various institutes imparted professional transcription training for a fee. The fee varied from as low as 10,000 to as high as 60,000 Indian Rupees. There were also some organizations that imparted on-the-job training and then recruited the candidates in-house or provided placements. The basic training lasts anywhere between 4 and 6 months.

The transcription training program includes different modules or course subjects – depending on business, legal, general or medical – and is designed to help gain the knowledge and skills towards becoming a transcriptionist. Numerous exercises and assessments throughout each module ensure that the candidates master each concept before moving on.

Medical Transcription, a subdivision of transcription, is one of the fastest growing fields in healthcare. In the United States, the entire healthcare industry is based on insurance and detailed medical records are needed to process insurance claims.

Medical transcription is the process of accurately and quickly transcribing medical records dictated by doctors. A medical transcriptionist transcribes, formats, and proof-reads the medical report of a patient.

The ideal qualification for a medical transcriptionist is Graduation and the primary skills required towards becoming an ideal medical transcriptionist are: Good listening and language skills, knowledge of medical terms, fluency in English, and comprehension of various accents. Undergoing a training course in Medical Transcription is an added advantage.

The major end-users of transcription are media houses, universities and institutes, law firms, consultancy services, and business services.

Thursday, May 5, 2011

BPO and Transcription Industry in India – An Overview 1/2

Business Process Outsourcing or BPO started off with the shifting of the manufacturing of goods to countries providing cheap labor, but in today's technology-driven world, the definition of BPO has taken on a new dimension wherein it involves the contracting of a process to a third-party service provider. This process is beneficial to both the outsourcing company and the service provider.

The main advantages of outsourcing are the flexibility it provides to the organization in terms of utilization of its employee strength by focusing on core competencies, rather than performing non-core or administrative processes; increased efficiency, speed, quality, and lower costs in delivering the end product; and real-time access to skilled people.

The work generally involves outsourcing of internal business functions such as human resources, finance, and front office in the form of customer-related services.

The process of outsourcing work to India began in the early 1990s. The major reasons that contributed to India's success in this industry are: Abundant, skilled, English-speaking manpower; high-end telecom and infrastructure which is on par with global standards; strong focus on measuring and monitoring quality targets; fast turnaround times and the ability to offer 24x7 services based on the country's unique geographic location that allows for leveraging time zone differences; and a positive environment which encourages growth of the ITES/BPO industry.

Transcription is one of the sub-segments of BPO. Transcription is the conversion of recorded speech into a written or electronic text format. Transcription services are provided for business – earnings calls; legal – courtroom proceedings; general – podcast, media, IT & engineering transcription, symposiums, interviews, presentations, lectures; or medical – patient summary.

Transcription services are provided for a wide variety of audio and video media, ranging from traditional analog tapes to high-definition digital media of various formats.

High-quality transcription is characterized not only by an ability to listen and comprehend diverse vocabulary, but also by an ability to collect relevant information.

Transcription services are charged on a per-line, per-word, per-minute, or per-hour basis. Other factors that go into providing a custom quality professional end-product to the clients are: Prompt and professional service, quick turnaround time with highest possible accuracy, and cost effectiveness.

Friday, April 8, 2011

Delivering an Effective Presentation (From a Transcriber’s Point of View) -2/2

(Topic continuing from the previous post)

3. Access to Video Recording: Nowadays, many of the speeches at high level meetings, conferences, and forums are covered by video recordings, generally for archiving purposes. Such recordings are of tremendous help to the transcribers because they help in speaker identification, help resolve confusion on slide identification if the video contains slide footage, and to a lesser extent, help in identifying many words merely through speakers’ body language and gestures. However, this is only true when the video is of good quality and it has been supplemented with PDF or PPT references.

One issue that transcribers sometimes face is that video shots of the PDF and PPT are often sent as references. This should be avoided as much as possible because such video footage, due to light reflection, uneven focusing, and wrong handling of the camera, is of far lower quality than the actual PPT or PDF document. The words written in the PPT slides then cannot be clearly deciphered, leading to ambiguous interpretation on the part of the transcribers.

4. Accent and Spoken English Quality: The main element that the transcriber needs to produce a good quality transcript, and in many instances maybe the only one, is the voice file. Hence, even if the reference quality is not good, it can be made up for by a voice file which is clearly audible, less accented, and without distortion. To deliver his message correctly to the audience, the speaker should speak in clear and correct English, keeping a moderate pace. The correct speech pace becomes even more significant for South Asian speakers, as their speech is generally heavily accented and they sometimes use faulty English sentences, which can lead to wrong interpretations.

5. Proper Placement of Recording Equipment: Generally, the recording equipment used in seminars and conferences is of high quality, but the placement can be faulty. When microphones are wrongly placed, they pick up side conversations, rustling of papers on the dais, and noise from other electronic equipment placed close to the microphone, resulting in distortion of the main speech. Hence, care should be taken that the microphone is placed at a proper place and proper distance to catch the speaker’s speech only and not any other sounds. In this regard, it is important to note that the reverberation quality of the hall should be checked beforehand; otherwise, the recorded speech will have echo effects, making the speech difficult to listen to. If PPTs are being video recorded, the camera should be placed where reflection of lights on the PPT slides is minimal and the words are recognizable.

Keeping in mind the above points while delivering a structured speech such as a presentation or lecture will not only help in making the presentation lively and effective for the audience, but will also help the transcriber deliver a superior quality transcript which ultimately will lead to a satisfied client.

Wednesday, April 6, 2011

Delivering an Effective Presentation (From a Transcriber’s Point of View) -1/2

As transcribers, we come across various types of jobs that we have to transcribe. They can range from telephone conferences, where the speakers speak in very fragmented and unstructured sentences, to speakers delivering presentations in a high level forum or symposium. Although not too much can be done for unstructured speeches for providing a clear transcript, a lot can, however, be done for structured ones like in presentations, which can enhance the quality of the transcript.

While giving a presentation or while recording it, the transcript quality may not be foremost on the minds of the speakers and the organizers; however, for a good quality transcript, a number of factors come into play. These factors can be internal, meaning factors the transcription provider has control over, or external, meaning factors that are beyond the transcription provider’s control. These external factors are in the control of the recording company, which records the audio/video; the organizers of the symposium/forum, who choose a hall with, say, less reverberation; and the speakers themselves, who prepare the PowerPoints and speak fluent English at the correct pace. Today, we will look into some of these external factors and at what can ensure a better quality presentation and, thus, a good quality transcript.

1. Proper PowerPoints: Speakers giving presentations in symposiums and forums usually have a PowerPoint presentation (PPT) to help them present their ideas and thoughts in a very structured manner to the audiences. These PPTs, when made correctly, can greatly enhance the speakers’ thought process and help present a clearer picture to the audience. There are a few pointers that should be kept in mind while making an effective PPT.

a) No. of Slides: The speaker should keep the number of slides in the PPT moderate to make the right impact with the audience. Too few slides will give very little detail about the idea presented, and thus the audience will have to rely mostly on the speaker’s speech for information flow, whereas too many slides will lose the attention of the audience as their focus will be divided between reading the content of the slide and listening to the speaker.

b) Logical Flow of Information: The flow of ideas presented in the slide should be logical. Disjointed flow of information in the slides will create confusion in the minds of the audience.

c) Grammatically Correct English: The sentences used in the slides should be grammatically correct. Many speakers bring with them slides that are marred with improper English which leads to ambiguous sentence formation and, ultimately, ambiguous interpretation on the part of the audience.

These PowerPoints are frequently sent to the transcribers as references along with the voice file. Hence, keeping in mind the above points while making a PPT will not only help the speaker deliver a smooth and logical presentation, but also help the transcriber enhance the accuracy of the transcript.

2. Using the PowerPoint Presentation Correctly: For an effective presentation, the speaker should not only prepare a good PPT, but should also complement it by presenting and elaborating on ideas given in the PPT in a logical sequence and at a correct pace. The speaker should, as much as possible, stick to the PPT and not jump across the slides. Also, the speaker should pick a slide, elaborate on it, provide examples where necessary, and then move on to the next slide. Jumping across slides will lead to confusion for the audience. It will also create confusion for transcribers because most of the time, they have access to audio files of the presentation only without any access to video footage.

(To be continued to 2/2...)

Wednesday, February 9, 2011

Equipments we use - 3 (Software-2)

In this part of the blog, you will see some details about the software used in the actual process of transcription. So far, you have seen how the source (audio/video) is obtained and processed using different software.

Earlier posts in this series:

Equipments we use - 1 (Hardware)
Equipments we use - 2 (Software- 1)

The actual process of transcribing the processed audio/video source begins now.

1) Audio/Video Player: This is the software used to play the source in either audio or video format. The freely available players in the market today can play files only in certain formats like mp3, wav, wma, etc. Transcribers have to choose carefully among the available players according to their requirements. Features include the basic (play, forward, rewind, foot pedal compatibility, hot keys) and the advanced (bookmarks, timestamps).
An important point to remember: choose a player in which video playback while using the foot pedal is smooth; in most software it often is not, which affects sound output.
a. Play, pause, forward, rewind (Basic features).
b. Foot pedal compatibility.
c. Hot-keys (forward, rewind, jump seconds, etc).
d. Bookmarks. You can bookmark an audio at places where you have left an unclear for review. Bookmarks come in very handy when there are multiple unclears in an audio that you want to review later.
e. Timestamps: You can copy timestamps to mark unclear points in an audio.
f. Background noise removal.

Express Scribe Interface

Examples: Express Scribe media player

2) Microsoft Office/Macros: Our transcripts are typed in MS Word documents. Auto-text and auto-correct are used to speed up the typing process. Macros also come in very handy to simplify repeated tasks such as checking gender inconsistencies and spelling errors.

3) Dictionaries/References: Numerous reference dictionaries are freely available in the market, for example, Oxford, American Heritage, and Collins. The latest dictionaries offer advanced features like Wild Card Search, Hot Key functions, etc.

a. Wild Card search. When you can only partially hear a word and do not know its spelling, the wild card feature comes in handy: it gives you a list of the words containing the characters you searched for.
b. Hotkeys. One can use hotkeys to activate the dictionary and search the selected word.
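The idea behind a wild card search can be sketched in a few lines of Python. This is only an illustration, assuming the common convention that `?` stands for any single letter and `*` for any run of letters; the exact syntax in a given dictionary program may differ, and the function name and word list below are hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_search(pattern: str, words):
    """Return the words that match a wild card pattern, ignoring letter case."""
    lowered = pattern.lower()
    return [word for word in words if fnmatchcase(word.lower(), lowered)]


# A tiny stand-in for a dictionary's word list.
WORD_LIST = ["habiliments", "habitat", "sacerdotal", "amebiasis", "differentiation"]
```

If a transcriber hears something like "hab...ments", a search such as `wildcard_search("hab*ments", WORD_LIST)` narrows the candidates to the words that start and end with the parts that were heard clearly.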

Oxford Dictionary interface

WordWeb Pro Interface

Examples: WordWeb Pro

Thus, these are the different software tools that we use to make our transcription process smoother and more efficient, ensuring an optimum-quality transcript.

Wednesday, January 12, 2011

Reliability of Voice Recognition Technology

Voice recognition technology recognizes the spoken word and converts it into text. There are many voice recognition software products in the market, the most popular one being Dragon Naturally Speaking by Nuance. Here, we will talk about this particular voice recognition software.

Dragon Naturally Speaking

Voice recognition software is very helpful to an individual whose keyboard skills are poor. The software ‘Dragon’ is designed so that the user can work with it comfortably and use its features to the fullest extent possible.

To start with, the software needs to be trained. Every new user creates his individual profile and then starts the procedure to train the software. Dragon comes with a module in which the user needs to train it with regards to the tone of the user’s voice. This module has a series of steps to be followed so that the software gets accustomed to his voice. Once the user is comfortable with the commands of the software, he can work with the software on live jobs.

For a transcriber, a live job means the audios to be transcribed. With this software, a transcriber can listen to the audio and speak out the lines as they are heard. The software also has some built-in intelligence, which is an added advantage to the user. If a user narrates a line, the software is able to interpret the content to some extent and, going by the context, does not confuse phrases like ‘I scream’ and ‘ice cream’.

This software is also helpful for those who lack a proper English vocabulary. Difficult and rarely used words like ‘habiliments,’ ‘sacerdotal,’ etcetera, if spoken properly and clearly, can be typed out by Dragon without the transcriber knowing these words. It is also useful in a similar way in case of names of places. The more this software is used, the more it gets accustomed to the voice and tone of the speaker thus enabling it to grasp the context and content matter of the file. This helps to easily get hold of some words which are time-consuming to find in some cases.

Using 'Dragon' for the actual work

At times, a file contains medical terms specific to a particular disease. In these cases too, Dragon helps to some extent by working out the words from the context of the statement. A disease name such as ‘amebiasis’ can be spelt more or less correctly by Dragon, provided the user pronounces it correctly.

In general, even common English words that appear frequently in a file but are long and tedious to type, such as ‘differentiation’, are easily taken care of by Dragon once it gets used to the speaker’s accent, tone and pronunciation. All in all, Dragon reduces the time spent on typing the file and enables a transcriber to devote more time to research. This results in better-quality transcripts.

Wednesday, January 5, 2011

Speaker Identification – Expectations and Limitations

Speaker identification is the process of identifying the different speakers in an audio file or transcript. It is usually done by one of the following methods.

1. Reference Material/Agenda – In the case of a meeting, conference or symposium, if we have the minute-by-minute agenda along with the names of the speakers, we match each voice to the speaker names given in the agenda. This is the simplest way to get accurate speaker names. We therefore always encourage our clients to send us the agenda or draft of the meeting, conference or symposium at the time of job confirmation, so that speakers can be identified accurately in their jobs.

2. Video Reference – Speaker identification becomes easy if the client provides professionally recorded video files as reference, where the focus is completely on the speakers. In such cases, the transcriber can easily identify the speakers by viewing the video.

3. Googling/Research skill based – There are many cases where the client provides no reference material or speaker names. This makes the task very challenging for the transcriber, who tries to tell the different voices apart and labels them Speaker1, Speaker2 and so on. If the speakers identify themselves while speaking, the transcriber uses a search engine to look up the name on the internet and then checks it against the content of the file to judge whether it is likely to be the same person. The limitation here is a high chance of identifying the wrong person: the internet is not a very reliable source, and a search returns a wide variety of results for different people with the same name.

4. Voice differentiation – This is the most difficult method. If there is no reference material or agenda and there are more than three or four speakers in the audio, it is very challenging to differentiate the speakers by the tone of their voices. This is especially so in a discussion where the speakers’ voices overlap and one cannot make out what each speaker is saying. Identifying speakers is almost impossible if the audio quality is bad. In such cases, we use our listening skills to differentiate the voices to the best of our ability. The limitation here is that the result depends on the listening skills of the transcriber, and there is a high chance of mixing up the speakers.
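
As a rough illustration of the first method, matching against an agenda can be sketched in a few lines of Python. This is only a sketch and not a description of how our transcribers actually work: the agenda format, the timings, the speaker names and the function name `label_segments` are all assumptions made up for the example.

```python
from bisect import bisect_right

# Hypothetical agenda: (slot start in seconds from the beginning of the audio,
# speaker name), sorted by start time. Real agendas will look different.
AGENDA = [
    (0, "Chairperson"),
    (300, "Speaker A"),
    (1500, "Speaker B"),
]


def label_segments(segment_starts, agenda, fallback="[Unidentified]"):
    """Label each segment with the agenda speaker whose slot covers its start time."""
    slot_starts = [start for start, _ in agenda]
    labels = []
    for t in segment_starts:
        # Index of the last agenda slot that starts at or before time t.
        i = bisect_right(slot_starts, t) - 1
        labels.append(agenda[i][1] if i >= 0 else fallback)
    return labels


if __name__ == "__main__":
    # Start times (in seconds) of four transcript segments.
    print(label_segments([10, 320, 1499, 1500], AGENDA))
```

Meetings rarely run exactly to schedule, so labels produced this way are only a first guess that the transcriber still has to confirm by listening.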

As our standard service at Cripton, speakers are identified as [Male] and [Female], or as Interviewer and Interviewee. Having said that, our transcribers and editors always strive to identify speakers to the best of their ability by applying one of the above-mentioned methods. But these methods have their limitations: with more than three speakers, it becomes difficult to identify a particular speaker without a set agenda as a reference.

Hence, the only way to ensure correct speaker identification is to provide proper reference material, such as an agenda or draft, the speaker names for the conference, meeting or symposium, or video files. It is also very important that the client uploads all the reference material along with the job itself and not at a later time, because speaker identification is done at the primary stage, while the transcriber is working on the document. This will ensure the delivery of a transcript with accurate speaker identification.