The rapid advance of technology in recent years has transformed communication. Nowadays people are increasingly interested in hands-free communication. In such a setting, using a loudspeaker and, often, a high-gain microphone in place of a conventional handset receiver makes good sense. This allows several people to take part in a conversation at the same time, as in teleconferencing. Another benefit is that users have both hands free and can move around the room at ease. However, significant acoustic coupling between the loudspeaker and the microphone generates a loud echo that makes conversation difficult. The remedy is to remove the echo with an echo suppression or echo cancellation algorithm.
However, the echo suppressor has a major drawback: it supports only half-duplex communication, in which only one person can talk at a time. This drawback motivated the development of echo cancellers. An important property of echo cancellers is that they sustain full-duplex communication, which allows both people to talk at the same time. The three basic components of an echo canceller are an adaptive filter, a doubletalk detector and a nonlinear processor (NLP). The adaptive filter generates a close replica of the echo and subtracts it from the microphone signal, which contains both the echo and the near-end user's speech.
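The post does not say which adaptation rule the filter uses. As a minimal sketch, assuming the normalized least-mean-squares (NLMS) algorithm (a common choice for AEC, not necessarily the one used in this work), the replica-and-subtract step can be written in Python with NumPy. The function name `nlms_echo_canceller` and the parameters `num_taps`, `mu` and `eps` are illustrative.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, num_taps=64, mu=0.5, eps=1e-6):
    """Cancel the echo of `far_end` from `mic` with an NLMS adaptive FIR filter.

    Returns (error, weights): `error` is the echo-cancelled signal that would be
    sent to the far end, `weights` is the final estimate of the echo path.
    """
    far_end = np.asarray(far_end, dtype=float)
    mic = np.asarray(mic, dtype=float)
    w = np.zeros(num_taps)        # echo-path estimate (filter taps)
    x_buf = np.zeros(num_taps)    # most recent far-end samples, newest first
    error = np.zeros(len(mic))
    for k in range(len(mic)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[k]
        echo_hat = w @ x_buf                  # replica of the echo
        error[k] = mic[k] - echo_hat          # subtract the replica from the mic signal
        # NLMS update: the step is normalized by the power in the input buffer
        w += mu * error[k] * x_buf / (x_buf @ x_buf + eps)
    return error, w
```

With a white-noise far-end signal and a short synthetic echo path, the taps converge toward the echo path and the residual error shrinks toward zero; with real speech, convergence is slower and depends strongly on the step size `mu`.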
The doubletalk detector senses doubletalk, which occurs when both users speak simultaneously; when doubletalk is detected, adaptation of the filter is halted to prevent it from diverging. To avoid clipping the speech, a noise gate is used as the nonlinear processor. The noise gate allows a threshold to be set, and all signals below the threshold are removed. This ensures that only the low-level residual echo is removed in the last stage. To date, real-time implementations of AEC have used both VLSI and DSP processors. With advances in computing, all the essential algorithms in this work are implemented in MATLAB.
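The post does not name a specific doubletalk detector. One simple and widely used option, assumed here for illustration, is the Geigel detector: it declares doubletalk whenever the microphone sample is larger than a fixed fraction of the recent far-end peak, because echo alone is expected to be weaker than the far-end signal that caused it. The function name, the 128-sample window and the 0.5 threshold (roughly a 6 dB echo-loss assumption) are hypothetical choices.

```python
import numpy as np

def geigel_doubletalk(far_end, mic, window=128, threshold=0.5):
    """Return a boolean array that is True where doubletalk is likely.

    Doubletalk is declared when |mic[k]| exceeds `threshold` times the largest
    |far_end| value in the last `window` samples, i.e. the microphone is louder
    than an echo of the far-end signal alone could plausibly be.
    """
    far_abs = np.abs(np.asarray(far_end, dtype=float))
    mic_abs = np.abs(np.asarray(mic, dtype=float))
    flags = np.zeros(len(mic_abs), dtype=bool)
    for k in range(len(mic_abs)):
        recent_peak = far_abs[max(0, k - window + 1):k + 1].max()
        flags[k] = mic_abs[k] > threshold * recent_peak
    return flags
```

In a full canceller the flags would gate the filter update: whenever a flag is true, the adaptive filter's weights are simply left unchanged for that sample.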
Acoustic Echo in Telephony
The advent of hands-free telephony and teleconferencing has enabled users to communicate without holding the device during a conversation. In such cases, however, several detrimental phenomena can significantly degrade the quality of the transmitted speech, and acoustic echo is perhaps the most troublesome of them. Acoustic echo arises when the loudspeaker and the microphone of the same device become acoustically coupled (fig. 1.2). When a cell phone's hands-free mode is off, the sound wave coming out of the speaker does not have enough power to be picked up by the microphone of the same phone.
In this scenario the power of the speech at the speaker output is so low that we cannot hear the far-end talker without holding the phone's speaker near our ear (fig. 1.1). By the time the sound reaches the microphone of the same phone, it is practically imperceptible. Therefore, if the near-end talker speaks into the microphone, only his/her voice is picked up, and the far-end talker's speech played through the near-end speaker has no substantial effect.
Now assume that the near-end talker has switched the phone to hands-free mode. The sound wave coming out of the phone's speaker now affects the microphone adversely. In hands-free mode the power of the sound emitted by the near-end speaker is several times larger than in the former case. After traveling some distance, the sound wave reaches the microphone and is picked up by it. Therefore, along with the near-end talker's voice, an additional signal caused by the hands-free environment propagates through the channel and is received by the far-end talker. The received signal thus contains two components: the near-end talker's speech, and a delayed version of the far-end talker's own signal arising from the hands-free environment.
Because of this phenomenon, the far-end talker hears his/her own voice as an echo after a considerable delay. Conditions become more severe if the near-end talker is sitting in a room with several reflecting objects or with poor acoustic treatment. It is known that if the delay of an echo does not exceed one tenth of a second (100 ms), it can go unnoticed by the far-end talker. Moreover, since in a mobile communication environment the user can move anywhere in the room, the echoes differ from one location in the room to another. These dynamics prompted engineers to model acoustic echo as time-varying and to estimate the echo path continuously.
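To relate the one-tenth-of-a-second figure to room geometry, the extra delay of a reflection is its extra path length divided by the speed of sound. A small Python sketch, assuming c ≈ 343 m/s (dry air at about 20 °C) and hypothetical helper names:

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for dry air at about 20 degrees C

def reflection_delay_s(direct_path_m, reflected_path_m, c=SPEED_OF_SOUND):
    """Extra delay, in seconds, of a reflected path relative to the direct path."""
    return (reflected_path_m - direct_path_m) / c

def likely_unnoticed(delay_s, limit_s=0.1):
    """Apply the rule of thumb above: echoes within 1/10 s tend to go unnoticed."""
    return delay_s <= limit_s
```

For example, a reflection that travels 10 m against a 1 m direct path arrives about 26 ms late, well within the 0.1 s figure; reaching 0.1 s would require an extra path of about 34 m.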
Acoustic Echo Modeling
Echo is a phenomenon in which a delayed and distorted copy of a sound is reflected back to the source. With rare exceptions, conversations take place in the presence of echoes: our speech is reflected off the floor, the walls and other nearby objects. If a reflected wave arrives very shortly after the direct sound, it is perceived as spectral distortion or reverberation. However, when the leading edge of the reflected wave arrives a few tens of milliseconds after the original sound, it is perceived as a distinct echo. Echoes have been an issue in communication networks since the advent of telephony.
In telephone networks, echoes can also be generated electrically, due to impedance mismatches at various points along the transmission path. The most important parameter for echoes is the end-to-end delay, also known as latency: the time between the generation of a sound at one end of the call and its reception at the other end. The round-trip delay, i.e. the time it takes an echo to return to the talker, is approximately twice the end-to-end delay. Echoes become annoying when the round-trip delay exceeds 30 ms; such an echo is typically heard as a hollow sound. Echoes must also be loud enough to be heard.
Echoes weaker than 30 decibels (dB) are unlikely to be noticed. When the round-trip delay exceeds 30 ms and the echo strength exceeds 30 dB, however, echoes become progressively more severe. Not all echoes degrade voice quality. For telephone conversations to sound natural, callers must be able to hear themselves speaking. For this reason, a short, practically instantaneous echo called sidetone is deliberately inserted: the caller's speech is coupled from the telephone mouthpiece to the earpiece so that the line sounds connected.[9] Mathematically, if $x(t)$ is the original signal, one component of the echo can be represented as $a\,x(t - t_1)$, where $a$ is the attenuation factor and $t_1$ is the delay the sound incurs after reflecting from a surface. When multiple reflection paths are available, the composite signal at the microphone input can be written as
$$c(t) = x(t) + a_1 x(t - t_1) + a_2 x(t - t_2) + a_3 x(t - t_3) + \cdots + a_n x(t - t_n) \qquad (1)$$
where $c(t)$ is the composite signal, $x(t)$ is the original signal, $a_1, a_2, \dots, a_n$ are the attenuations suffered by the sound along the corresponding paths, and $t_1, t_2, \dots, t_n$ are the corresponding delays. Since digital technology now prevails in almost all cases, a sound wave is represented by sampling and quantizing the electrical voltage signal at a rate of at least the Nyquist rate (twice the highest frequency present in the signal), so the composite signal without AEC becomes
$$c(k) = x(k) + a_1 x(k - k_1) + a_2 x(k - k_2) + a_3 x(k - k_3) + \cdots + a_n x(k - k_n) + n(k) \qquad (2)$$
where the additional term $n(k)$ is the noise due to digitization (quantization) of the analog voltage signal, and $k_1, \dots, k_n$ are the path delays expressed in samples.
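Equation (2) can be evaluated directly. The sketch below, in Python with NumPy, builds $c(k)$ from a sampled signal, a list of attenuations $a_i$ and integer sample delays $k_i$. Modelling $n(k)$ as uniform quantization noise with step `q_step` is an assumption made here for illustration, and the function and parameter names are hypothetical.

```python
import numpy as np

def composite_signal(x, gains, delays, q_step=0.0, seed=None):
    """Evaluate Eq. (2): c(k) = x(k) + sum_i a_i x(k - k_i) + n(k).

    x      : original signal samples x(k)
    gains  : attenuation factors a_1..a_n
    delays : non-negative integer delays k_1..k_n in samples (x(k) = 0 for k < 0)
    q_step : quantizer step; n(k) is modelled as uniform noise in
             [-q_step/2, q_step/2) (an assumption; 0 disables the noise)
    """
    x = np.asarray(x, dtype=float)
    c = x.copy()                              # direct component x(k)
    for a, d in zip(gains, delays):
        if d >= len(x):
            continue                          # this echo arrives after the signal ends
        c[d:] += a * x[:len(x) - d]           # adds a_i * x(k - k_i)
    if q_step > 0:
        rng = np.random.default_rng(seed)
        c += rng.uniform(-q_step / 2, q_step / 2, size=len(c))
    return c
```

Setting `q_step=0` gives the noise-free sum, which makes it easy to check the delays and gains by hand: a unit impulse with gains 0.5 and 0.25 at delays 2 and 4 yields 1, 0, 0.5, 0, 0.25, 0.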
Room Impulse Response and Its Estimation
When a person speaks in front of a microphone in the open air with no nearby objects, practically no echo is observed, because sound traveling in open air is not reflected. In a closed room (fig. 2.1), however, the microphone receives multiple signals, including the direct one. We may therefore assume a system whose input is the original speech signal of the near-end talker (NET) and whose output is the signal received by the microphone.
If a person produces an acoustic impulse in front of the microphone, the microphone does not receive that impulse alone; it also receives the signals arriving along different paths after reflecting off different surfaces. The signal sensed by the microphone therefore looks like the one shown in fig. 2.2: a series of time-delayed impulses.
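The post describes probing the room with an acoustic impulse. An alternative often used in practice, assumed here rather than taken from the post, is to play a known broadband excitation $x(k)$, record $y(k)$, and fit an FIR response $h$ by least squares, since $y(k) = \sum_i h(i)\,x(k - i)$ when the room is treated as a linear time-invariant system. A Python/NumPy sketch with a hypothetical `estimate_rir` helper:

```python
import numpy as np

def estimate_rir(x, y, length):
    """Least-squares estimate of an FIR room impulse response.

    x      : known excitation played into the room (same length as y)
    y      : signal recorded by the microphone
    length : number of taps to estimate (must not exceed len(y))
    Model assumed: y(k) = sum_i h(i) x(k - i) + noise.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Column i of X holds x delayed by i samples (zeros shifted in at the start).
    X = np.zeros((n, length))
    for i in range(length):
        X[i:, i] = x[:n - i]
    h, *_ = np.linalg.lstsq(X, y, rcond=None)
    return h
```

With a noiseless synthetic room made of a few delayed impulses, in the spirit of fig. 2.2, the estimate reproduces the true taps; with measurement noise, a longer excitation improves the estimate.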
Noise Gate as an NLP
A noise gate, which is a type of dynamics processor, is used as the NLP. Noise gates belong to the category of expanders. As the name suggests, an expander increases the dynamic range of a signal: low-level signals are attenuated significantly, while higher-level signals are neither attenuated nor amplified. A noise gate takes this expansion to the extreme, heavily attenuating the input or eliminating it completely and leaving only silence. Whereas general expanders can be difficult to use effectively, noise gates are a very simple and effective way of reducing the apparent noise level in audio signals. A noise gate turns down the gain of an audio signal when the signal level falls below a threshold. The threshold must be high enough that the background noise falls below it, but not so high that the wanted audio is cut off unnecessarily. Noise gates are often used to remove noise or hiss that would otherwise be amplified.
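The threshold behaviour described above can be sketched as a frame-based gate in Python with NumPy. Measuring the level as the RMS of fixed-length frames (160 samples, i.e. 20 ms at an assumed 8 kHz sampling rate) and the `floor_gain` parameter are illustrative assumptions; the post only specifies a threshold below which the signal is removed.

```python
import numpy as np

def noise_gate(signal, threshold, frame_len=160, floor_gain=0.0):
    """Frame-based noise gate.

    Frames whose RMS level is below `threshold` are scaled by `floor_gain`
    (0.0 means silence); frames at or above the threshold pass unchanged.
    """
    s = np.asarray(signal, dtype=float)
    out = s.copy()
    for start in range(0, len(s), frame_len):
        frame = s[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        if rms < threshold:
            out[start:start + frame_len] = floor_gain * frame
    return out
```

Practical gates usually add attack, hold and release times so that the gain does not chatter when the level hovers around the threshold; this sketch switches abruptly at frame boundaries.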