In the area of EVP recording the debate goes on: which is better, analog tape recording or digital? This three-part essay will attempt to answer that question without becoming overly technical. I am not going to go into detail explaining Nyquist points, antialiasing techniques, intermodulation distortion, etc. However, I will demonstrate how these factors enter into the equation. I will be concerned only with the results here, not the methods. If you, the reader, wish to delve into those areas, the information is readily available in technical journals elsewhere.
I have divided this into three parts, each a factor in making the final determination. The first deals with speech itself: its complexities and how words are formed. This section applies equally to all forms of recording, and in fact to human speech in general, whether it originates as an EVP or not. The second will explain how speech is recorded and some of the problems encountered in doing so. Digital recording methods will be covered as well as their analog counterparts. Finally, the last part will compare the two recording methods and detail how false EVPs may be generated by the recorder itself.
Part One - The Complexity of Speech.
In order to understand how speech is recorded it is first necessary to understand exactly how it is generated. Speech is composed of two general components, vocalizations and fricatives. The vocalizations are those generated in the throat, including nasal sounds. Fricatives are those created by placement of the tongue and lips. Every speech pattern is simply a combination of these two components.
For the sake of this discussion we will consider a single word and just a few of the many possible sounds that make up speech. Our example is, appropriately, the word "Ghost". Say the word, and as you do, note exactly how your throat, mouth, and tongue are positioned to make each individual phoneme associated with it. The waveform below is an oscilloscope display of the word with each phoneme designated.
The first sound is the "G". Note that it is formed by placing the tongue at the back of the mouth, building up a little pocket of air behind it, then releasing it quickly. This results in an explosive start followed by the "aaa" sound of the air escaping the throat accompanied by a vocalization. To demonstrate the importance of faithfully reproducing this sound, consider the "P" sound in the word "Post". Post and Ghost have entirely different meanings, yet the only difference between them is the opening sound. And the only difference between the "G" and "P" is where each is formed in the mouth.
"P" is formed by the lips, "G" by the tongue. If one views the sound on an oscilloscope there are only two factors that separate the two sounds: rise time and harmonic content. Because the "G" is created at the back of the mouth, saliva tends to seal the opening between the tongue and the top of the mouth better than the lips, which are drier. This results in the "G" being slightly more "explosive" with a faster rise time on the initial wave. The "P" is more restrained with a slightly slower rise time. The harmonics from the initial burst of the "G" are also shaped by the mouth in front of the tongue; since the "P" is formed at the lips, this is not true in that case. I went into this detail here in order to show the importance of faithfully reproducing a sound and all of its properties, and how easily a failure in even one small area could result in a totally different word and meaning. That importance will become even more apparent later in this discussion.
Next consider the "O" sound in the word "Ghost". It is not a constant sound; it actually begins while the "G" sound is in progress. This is important to allow proper inflections of the phonemes as speech progresses. Failure to do this results in a mechanical sound, robotic in nature. In fact it was one of the early problems with computer synthesized speech.
But even the "O" sound itself changes as it is spoken. It is a vocalization, created by the throat at a nearly constant frequency. But as it is spoken the jaws are brought together and the lips are positioned in a circular manner. This creates a change in resonances in the mouth resulting in the varying sound. If viewed on an oscilloscope, an increase in harmonics is evident as this change occurs. As related to recording speech these harmonics must be faithfully reproduced in order to maintain speech quality.
Progression continues to the "S" sound. This is formed by placing the tongue just behind the upper front teeth and forcing air between the tip of the tongue and the roof of the mouth. There are no vocalizations associated with it; however, as with the transition from "G" to "O", it begins just before the "O" sound has ended in order to allow for more uniform speech patterns. The sound is very rich in harmonics, and since it is a high frequency sound, it requires that an electronic device have a good high frequency response in order to faithfully reproduce it. The frequency is so high that the resolution at the sample rate I used to create the oscilloscope display here is insufficient. For me to have increased the sample rate to display it properly would have resulted in a display too large to fit your screen!
The "T" sound is made by pushing the tongue against the roof of the mouth and blocking the air flow making the "S". A small amount of air is trapped and builds up. The tongue is then released and this air escapes in an explosive manner. Since this occurs near the front of the mouth there is little buffering done to the sound. It has a very fast rise time, followed by the short "aaa" sound of the final air escaping from the mouth. All of this just to say a single word, "Ghost".
If we analyze every word in the language, and consider too the various inflections that dialects place on them, it is apparent just how complex speech becomes. It is also apparent that if one is to use a recording method to capture such speech, the system employed must be capable of capturing all of these characteristics. Failure to do so is going to result in misinterpretations of words or phrases. In the case of EVP, we have all of these considerations plus we are recording under very adverse conditions, which make matters even worse. The next section will deal with recording speech in general.
Part Two - Analog and Digital Recording Methods.
The next step in determining whether to use analog or digital recorders is understanding exactly how each works and what its limitations are. For the sake of this discussion we will assume a single microphone connected to a simple recorder; therefore no consideration will be given to the equalization or mixer boards sometimes used in EVP work.
The Analog Tape Method.
The audio signal is received by the microphone, which converts it to its electrical equivalent. At this point it is very weak and not usable in that form. Analog amplifiers boost the level to that which can be placed onto magnetic tape. It is fed to a recording head where, along with a bias signal, it is used to magnetize particles on a magnetic tape. The magnetic strength of each particle on the tape is a direct representation of the signal level which placed it there. This is the audio signal which was recorded.
When played back, the tape passes a playback head. Often this is the same as the record head, simply reconfigured by the recorder for playback mode. As this tape passes the head, the magnetic domains on the tape induce a small voltage into the head. This voltage is a representation of the signal recorded on the tape. As with the recording process this signal is very weak and needs to be amplified. Analog amplifiers in the recorder perform this function and the output of the amplifier is fed to a speaker or headphones so it can be heard.
Along the way in both recording and playback, certain equalization factors are applied to compensate for the tendency of tape to respond at different levels depending on frequency and type of tape. Some of these appear as switches on some tape decks, such as metal tape, low noise, equalization, Dolby Noise Reduction, etc. The proper setting depends on your conditions; more about that later.
Digital Recording
As with analog, the signal is picked up by a microphone. It is very weak and is amplified by an analog amplifier to get it up to a usable level. At this point it is fed to an Analog to Digital Converter. There it is sampled at a high rate. By "sampled" I mean that the A to D converter simply looks at the instantaneous voltage of the analog signal, measures it to determine what the level is, and assigns a digital value to that point. The value determined is saved in a memory, either built into the recorder or on a removable chip. This is done thousands of times every second, thus a record is built up of the digital values of successive sample points. The speed at which this operation is done is called the sample rate. The higher the sample rate the better the quality of the recording, but at the expense of either a shorter recording time or larger memory requirements.
Playback is just the opposite of recording. The memory is read at a rate equal to that at which it was recorded. The digital value is read out of each successive memory location and fed to a digital to analog converter. There the digital value is converted back to its analog equivalent voltage. This instantaneous voltage is sent to a buffer and power amplifier. The buffer attempts to smooth out irregularities between sample points before the signal is output. Then it goes on to the headphones or speaker where it is heard.
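For readers who want to see the sampling step in concrete terms, here is a minimal sketch in Python. It stands in for the A to D converter by evaluating a pure sine tone at evenly spaced instants; the function name, the 1 kHz tone, and the two sample rates are illustrative assumptions, not taken from any particular recorder.

```python
import math

def sample_signal(freq_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Return the instantaneous value of a sine tone at each sample point."""
    n_samples = round(sample_rate_hz * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

# Hypothetical figures: a 1 kHz tone lasting 10 ms, sampled at two rates.
low = sample_signal(1000, 8000, 0.01)
high = sample_signal(1000, 44100, 0.01)

# The higher rate stores more points for the same stretch of sound.
print(len(low), len(high))  # 80 441
```

The same 10 ms of sound costs more than five times the memory at the higher rate, which is the trade-off described above.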
Some recorders allow for sample rates to be adjusted. This allows one to choose either extended recording time or better quality. Some also allow one to alter the sample rate of playback effectively speeding up or slowing down the recording. Unlike analog, tape hiss is not a problem; consequently noise reduction is not generally required on digital.
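The trade between recording time and quality can be put into numbers. The sketch below assumes uncompressed, single-channel audio; the 64 MB memory size and the 16-bit sample size are hypothetical figures chosen only for illustration.

```python
def recording_minutes(memory_bytes, sample_rate_hz, bits_per_sample):
    """Minutes of uncompressed mono audio that fit in the given memory."""
    bytes_per_second = sample_rate_hz * bits_per_sample / 8
    return memory_bytes / bytes_per_second / 60

MEMORY = 64 * 1024 * 1024  # hypothetical 64 MB recorder

for rate in (8000, 16000, 44100):
    # Doubling the sample rate halves the available recording time.
    print(rate, round(recording_minutes(MEMORY, rate, 16), 1))
```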
Do I Use Analog Or Digital?
Now that you have a basic understanding of how each works, the question becomes which is better? For that you have to determine what you are going to do with the recorder. Since this essay is primarily interested in EVP work, I am going to hold off on discussing that until the next part, giving an entire section to that issue. But here are a few comments regarding each for general recording purposes such as interviews with clients, logging investigations, etc.
Analog has the advantage of simplicity. You can easily pop a tape in or out if the need arises. Should a situation require a quick interview just grab the tape and start recording. Quality of recording is not generally an issue; it’s just a matter of being clear enough to understand. Any good quality analog recorder is more than adequate for that. Consider too that you may want to transfer whatever you record to a more permanent log of the case later; your recorder should also have a line output jack to allow you to do so.
Digital, on the other hand, is compact. Many recorders offer extended recording time, so it is often a practice to simply let them run and record everything going on. Some can also apply a time stamp so that, in the event something happens, you can quickly go back and review it. The quality is generally not a problem here since most recorders are sensitive enough to pick up everything going on around them. One issue to consider, as with analog, is saving your recording. Since it is common to simply record everything if you log in digital, you will likely be transferring the recording elsewhere into a permanent log. Your digital recorder should have an analog line output for this.
In both cases, analog and digital, consider too that you may at some time want to type up a transcript of whatever is recorded. For that you will want to make sure your recorder is easy to pause / start. Generally the digital recorders are better in that regard since there is no mechanical system to operate.
Part Three - Evidence Gathering
So far we have covered speech in general and the methods of recording it. The applications have been limited to general everyday recording under what would be considered normal conditions. As far as the analog vs. digital debate goes for these purposes, there really isn't much difference. It is simply a matter of preference; either will provide satisfactory results.
But when it comes to gathering evidence, such as EVP, the requirements become much stricter. The voice on the tape must not just be understood; it should be recorded with as much accuracy as possible. It should contain all the background harmonics and characteristics of the original, since as evidence it may be subjected to analysis using laboratory standard equipment. It is through such in-depth study that its authenticity may be confirmed as a valid EVP instead of just a strange echo of someone in another room. Accomplishing this degree of accuracy puts tremendous demand on the recorder used to capture the sound.
There are a few ways to improve the quality that apply to both digital and analog recording systems, such as preselectors and multiple microphones. That is a whole topic in itself, and will not be discussed here. Suffice it to say that using such devices will provide a similar improvement to any recorder regardless of the type employed. Since this report is concerned with the recorders themselves, we will only be concerned with the two recording methods, analog and digital.
The typical EVP is a very low level signal, often barely heard above the background hiss and other noise present on the recording. The dynamic range of the recorder is usually much greater than that needed to capture the EVP. So how does each react under those conditions?
The analog recorder by its nature has a certain amount of hiss generated by the movement of the tape across the head. Plus the electronics itself generates some hiss across the audio spectrum. These are the limiting factors to capturing low level audio. As a rule the use of a good quality recorder will minimize the amount of hiss. Plus, using good quality metal tape will improve the response of the analog system.
As explained earlier, the analog system picks up the sound and, after amplifying it, generates the electrical voltage equivalent to the sound. This is an ongoing process; within the dynamic range of the recorder, the voltage can be virtually any level at all. Unlike digital, which assigns a numeric value to the sample point, analog has an almost infinite number of possibilities. This applies even at very low signal levels near the noise threshold.
Digital has a limited number of bits available determined by the particular A to D converter used in your recorder. The more the better, but then cost increases. To explain this it has to get a little technical so bear with me.
Take a converter with only four bits, numbered 0 through 3, which carry the values 1, 2, 4, and 8. Suppose the sample of your sound by the A to D converter returned a value of 6. That means the converter would set bits 1 and 2 (2+4=6). If the next sample was 8, only bit 3 would be set. Thus it can be seen that four bits give 16 possible combinations, or 16 numeric values. That is not nearly enough to record sound, but the concept is the same. 10 bits, typical of low quality audio, would yield 1024 possible combinations.
Now look again at our 4 bit example. The entire range is divided into 16 parts. Suppose the proper conversion of a particular sound sample was 6 1/2. How do you represent it? The answer is you can't! The A to D converter would choose either 6 or 7. Neither is correct, but that is as close as you can get. This is a conversion error (usually called quantization error) and results in distortion or alteration of the signal.
Of course the solution is to add bits, making each represent a smaller portion of the dynamic range. That is exactly what is done; 10 bits divide the range into 1024 parts instead of 16. This allows better representation of any value. But no matter how many bits you add there will always be some values that cannot be properly represented. There will always be conversion errors. Analog does not have this problem since there is no conversion in analog.
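The bit arithmetic can be checked with a few lines of Python. This is only a sketch; the function names are made up for illustration, and it uses zero-based bit numbering, which is the numbering the example implies (bit 1 is worth 2, bit 2 is worth 4, bit 3 is worth 8).

```python
def bits_set(value):
    """Return the zero-based positions of the bits that are 1 in value."""
    return [pos for pos in range(value.bit_length()) if (value >> pos) & 1]

def levels(n_bits):
    """Number of distinct values a converter with n_bits can produce."""
    return 2 ** n_bits

print(bits_set(6))   # [1, 2]  -> 2 + 4 = 6
print(bits_set(8))   # [3]
print(levels(4))     # 16
print(levels(10))    # 1024
```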
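The 6 1/2 example can be sketched as follows. The rounding rule here (round half up, then clamp to the converter's range) is an assumption made for illustration; a real converter might just as well settle on 6, and the size of the error would be the same.

```python
import math

def quantize(value, n_bits):
    """Map a value onto the nearest whole level an n_bits converter can store."""
    top = 2 ** n_bits - 1               # highest level, 15 for 4 bits
    code = math.floor(value + 0.5)      # round half up (assumed rule)
    return max(0, min(top, code))       # clamp to the converter's range

true_value = 6.5
code = quantize(true_value, 4)
error = code - true_value

print(code, error)  # 7 0.5
```

Whichever neighbour the converter picks, the stored number is half a step away from the real level, and that difference cannot be recovered on playback.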
But there is yet another problem with using digital for EVP. Remember most EVP are very low volume levels. Suppose you are using a recorder with 10 bits, capable of 1024 possible combinations. That is over its entire dynamic range. Since EVP is a low level, it will be confined to only the first few lower order bits. Thus the entire EVP is not represented by all 10 bits, maybe only the first 5 or 6. Consequently the resolution is much lower, more along the line of our hypothetical recorder using only 4 bits. The conversion error of placing the entire range of speech into those few bits becomes even greater as was seen in the first example. The accuracy of the data recorded suffers even more.
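To put rough numbers on this, the sketch below estimates how many of the converter's levels a quiet signal actually spans. The peak fractions (1/32 and 1/64 of full scale) are hypothetical values picked to illustrate the point, not measurements of any EVP.

```python
import math

def levels_spanned(n_bits, peak_fraction):
    """Converter levels covered by a signal peaking at peak_fraction of full scale."""
    return max(1, int(2 ** n_bits * peak_fraction))

def effective_bits(n_bits, peak_fraction):
    """Bits of resolution actually exercised by that signal."""
    return math.log2(levels_spanned(n_bits, peak_fraction))

# A 10 bit recorder with a signal at 1/32 of full scale (assumed figure).
print(levels_spanned(10, 1 / 32), effective_bits(10, 1 / 32))  # 32 5.0

# At 1/64 of full scale the same recorder behaves like the 4 bit example.
print(levels_spanned(10, 1 / 64), effective_bits(10, 1 / 64))  # 16 4.0
```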
Consider the other factor mentioned earlier, sample rate. Remember the various sounds that make up speech. Most "intelligence" that makes up words is contained in the fricatives. This is easily proven by simply whispering. The vocalizations are not present, yet most words can be whispered quite well. When we discussed speech we found most fricatives are contained in the upper frequencies of the speech band, which is the region between 1 and 3 kHz. To properly record speech it is imperative these frequencies be faithfully reproduced.
Without going into the mathematics behind it, the Nyquist Point needs to be mentioned here. If a frequency is to be copied reasonably accurately, enough samples must be made on each cycle. Strictly, the Nyquist criterion calls for only a little more than two samples per cycle, but that figure assumes ideal filtering on playback. For a standard sine wave, which is what most compact audio recorders are designed around, I use a working minimum of 7 samples per cycle. If we consider the highest frequency of normal speech fricatives, 3 kHz, that means that 3,000 times 7, or 21,000 samples per second, is the minimum needed to reproduce a sine wave.
But there is another vital consideration: speech is not always a pure sine wave. Remember the explosive sounds of the "G" and "T" in the word Ghost? These are based on a sudden release of energy, a puff of air. For this we need to consider the sound as a square wave, that is, one which has a frequency and rises instantly to its high and low levels on each cycle. A square wave is made up of a sine wave at the fundamental frequency plus an infinite series of its odd harmonics. Of course infinity here would be an unobtainable ideal, so for audio applications we can use what is known as a pseudo-square wave. This will give a reasonably accurate square wave, good enough for audio applications. It can be obtained by using the harmonics up to the seventh. To go back to our example, we take the 21,000 samples and simply multiply by 7. That means a sample rate of 147,000 samples per second.
What does this mean? Simply that in order to faithfully reproduce speech the sample rate must be at least 147,000 samples per second. Most compact recorders do not approach this. Some sample as slowly as 16,000 times per second in their extended play modes. When one tries to record a frequency too high with respect to the sample rate, a heterodyning sound (an alias) may be generated. This may be mistaken for voice, a false EVP. The obvious fix for that problem is to keep the sample rate high on those recorders which allow for various settings. But that will use memory much more quickly, shortening record time.
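The arithmetic above is simple enough to wrap in a small function. This sketch uses this essay's working figure of 7 samples per cycle as the default; the list of recorder sample rates it checks against is a hypothetical selection for illustration.

```python
def required_sample_rate(highest_freq_hz, samples_per_cycle=7):
    """Samples per second needed under the essay's samples-per-cycle rule."""
    return highest_freq_hz * samples_per_cycle

needed = required_sample_rate(3000)
print(needed)  # 21000

# Hypothetical recorder settings, checked against that figure.
for rate in (8000, 16000, 22050, 44100):
    print(rate, "meets it" if rate >= needed else "falls short")
```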
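A pseudo-square wave can be built up numerically to see what the harmonics contribute. The sketch below uses the standard Fourier series of a square wave, which contains only the odd harmonics (1st, 3rd, 5th, 7th and so on), and stops at the 7th. The function name is made up for illustration, and the 3 kHz fundamental is the figure used in this essay.

```python
import math

def partial_square(t, freq_hz, highest_harmonic):
    """Sum the odd harmonics of a unit square wave up to highest_harmonic."""
    total = 0.0
    for k in range(1, highest_harmonic + 1, 2):   # odd harmonics only
        total += math.sin(2 * math.pi * k * freq_hz * t) / k
    return 4 / math.pi * total

FREQ = 3000
quarter = 1 / (4 * FREQ)   # middle of the positive half cycle, ideal value 1.0

fundamental_only = partial_square(quarter, FREQ, 1)
through_seventh = partial_square(quarter, FREQ, 7)
print(round(fundamental_only, 3), round(through_seventh, 3))  # 1.273 0.922

# The highest component is the 7th harmonic; apply the 7 samples per cycle rule.
highest_component_hz = 7 * FREQ
print(highest_component_hz * 7)  # 147000
```

With the harmonics included, the flat top of the wave sits much closer to its ideal level than the bare sine does, at the cost of having to capture a 21 kHz component.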
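The false tone produced by too low a sample rate can be demonstrated directly. This sketch assumes a converter with no anti-aliasing filter ahead of it, and the 3 kHz tone and 4,000 samples per second rate are hypothetical values chosen to make the effect obvious.

```python
import math

def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency that a sampled tone appears to have after folding."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

def sampled_tone(freq_hz, sample_rate_hz, count):
    """First count samples of a unit sine tone."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(count)]

print(alias_frequency(3000, 4000))    # 1000 (a tone that was never there)
print(alias_frequency(3000, 16000))   # 3000 (recorded at its true frequency)

# The stored samples of the 3 kHz tone are the same numbers, sign-flipped,
# as those of a 1 kHz tone, so on playback the two cannot be told apart.
real = sampled_tone(3000, 4000, 8)
ghost = sampled_tone(1000, 4000, 8)
print(all(abs(a + b) < 1e-9 for a, b in zip(real, ghost)))  # True
```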
Another little consideration is the recording method. So far I have assumed that the recorder is creating the file in a .wav format. Most do not. There are several other recording methods such as .MP3 which use compression techniques to make smaller files that supposedly contain the same data. They do not. Compression is another method to extend the record time on digital recorders using the same memory capacity. Because of the compression some bits are manipulated in a manner that allows the recorder to simulate what was recorded in playback. It still may sound good, but the data has been altered in ways which make it difficult if not impossible to analyze using laboratory equipment. As evidence it has been severely compromised.
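For comparison, an uncompressed .wav file stores each sample value exactly as the converter produced it. The sketch below uses Python's standard `wave` and `struct` modules to write a handful of 16-bit samples to an in-memory .wav file and read them back; the sample values and the 44,100 rate are arbitrary choices for illustration.

```python
import io
import struct
import wave

def wav_round_trip(samples, sample_rate_hz=44100):
    """Write 16-bit mono samples to an in-memory .wav file and read them back."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)             # mono
        writer.setsampwidth(2)             # 2 bytes = 16 bits per sample
        writer.setframerate(sample_rate_hz)
        writer.writeframes(struct.pack("<%dh" % len(samples), *samples))
    buffer.seek(0)
    with wave.open(buffer, "rb") as reader:
        raw = reader.readframes(reader.getnframes())
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))

original = [0, 1, -1, 12345, -32768, 32767]
print(wav_round_trip(original) == original)  # True
```

Every value comes back unchanged, which is the property that matters if the recording is later to be analyzed.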
So for those who insist on digital, the answer is to use a recorder that has a high number of bits in its converters, samples at over 150,000 samples per second, and records .wav files directly. Sure, such machines exist when you get into professional grade systems and are willing to spend hundreds if not thousands of dollars. The problem is I don't know of one investigator doing EVP work with that grade of digital equipment. And why would they, when analog stereo recorders capable of meeting the same criteria for reproduction are available for much less money?
One final comment regarding why some want digital instead of analog. I often hear the investigator say, "I use digital so I can put it on my computer." That is fine, but you can put analog on your computer too. All you need to do is connect the line output from your analog player to the line input jack on your sound card (not the mic input, which expects a much weaker signal). Use whatever audio recorder program you want on your computer. You can save it in whatever format you choose, though the same things apply as with digital recorders. Best quality is achieved by sampling at a high rate and recording in the .wav format. But why put it on your computer in the first place? With the exception of wanting to send a representative copy to a friend via e-mail or posting it for others to hear, there is no reason to put it there. If you managed to capture something you obviously should keep the original analog tape to work with. That is your master copy. You can make numerous second generation copies from it, and degradation is not a factor in only 2 generations. If you are going to do any processing, the master copy is analog, so the best thing is to process using analog as well. (That is the subject of another discussion.)
In summary, for EVP work analog is superior to digital because of its better response and linearity at low volume levels. Digital is fine for normal conversation or event logging while doing an investigation. Using digital for EVP will result in a greater number of suspected EVPs; however, most will be false because of artifacts that are created by the recorder itself, unless professional grade equipment is used. The portability advantage of digital is not a major factor since it is not advisable to carry any recorder around while recording. Movement and disturbance will result in false positives or may mask an actual EVP. Hopefully this report has given the reader a few considerations that result in more reliable, even if not as many, EVPs.