A SURVEY ON IMPLEMENTATION OF AUDIO OVER 802.15.4

Streaming audio over wireless networks has become very popular. The IEEE 802.15.4 standard is a widely used protocol for this purpose. It defines a Wireless Personal Area Network (WPAN) with a low data rate and low power consumption, which is needed for long battery life. When streaming audio the most important factor is the available bandwidth, so the speech samples must be compressed. Compressing the audio saves a large share of the bandwidth, which allows more speech samples to be sent over the available bandwidth. In this paper we present a survey on the implementation of audio over 802.15.4, considering different parameters and speech vocoders in view of the limited bandwidth.


INTRODUCTION
IEEE 802.15.4 is a standard protocol designed for low-rate wireless personal area networks (LR-WPAN) [12]. It is designed specifically for low data rate, low power consumption, and low-cost wireless connectivity. It provides a data rate of 250 kbps in the 2.4 GHz band and 40 kbps in the 915 MHz band, and a total of 16 channels are allocated in the 2.4 GHz band [12]. The physical and MAC layers of 802.15.4 provide the basic functions required for audio streaming. The physical layer provides functions such as activation and deactivation of the radio transceiver, link quality indication, clear channel assessment for CSMA/CA, and data transmission and reception. Likewise, the MAC layer provides functions such as synchronizing to beacons, association, and disassociation.
When streaming audio over 802.15.4, the main limitation is bandwidth. An audio signal occupies roughly 15 Hz to 20 kHz. The upper limit of 20 kHz is set by human hearing, since the ear cannot perceive frequencies beyond this range. Sampling must therefore be done at 40 kHz or more according to the Nyquist criterion. For speech communication a sampling rate of 8 kHz is adequate, as the speech typically used lies between 20 and 4000 Hz. Audio with an 8 kHz sampling rate and 12 bits per sample [2] requires a bit rate of 96 kbps, which is quite high for transmission over 802.15.4, whose own bandwidth is limited. Audio is of two types, mono and stereo. In mono, all the audio signals are mixed together and routed through a single channel, while stereo uses two independent audio channels and reproduces sound with a specific level and phase relationship between them. It is important to decide which system is to be designed, mono or stereo.
Sometimes it is better to go with mono than to design a poor two-channel stereo system. Mono is used for lower bit rates (<64 kbps), and stereo for higher bit rates (98 kbps and 128 kbps). Besides the bandwidth limitation, several other factors must be considered, such as packet loss, end-to-end delay, jitter, and latency.
The bandwidth limitation can be overcome by using audio compression techniques. For this purpose various speech vocoders have been released to compress speech. Different researchers use different vocoders, chosen according to their research requirements, as explained in the papers below.
Compressing the audio saves a large share of the bandwidth. In [3] 56.25% of the bandwidth is left free, and in [2] the adoption of a speech vocoder and silence detection reduced bandwidth occupancy from 13% to 5.2%. This free bandwidth can be used for other functions such as sensor, control, and monitoring systems [5]. The reduced speech bandwidth also allows more speech frames to be sent over the available bandwidth. All the factors mentioned above must be examined carefully for successful transmission and reception of audio with a low error rate.
In this paper we present a survey on audio over 802.15.4/ZigBee. Section 2 contains a summary of the papers studied. Section 3 describes the parameters that are important when streaming audio. Section 4 discusses the research gap. In Section 5 we conclude our survey.
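The bit-rate figures quoted above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, and the 250 kbps figure is the nominal 2.4 GHz data rate, so the usable share after protocol overhead would be smaller.

```python
# Nominal 802.15.4 data rate in the 2.4 GHz band, as quoted in the text.
# Real usable throughput is lower once MAC/PHY overhead is included.
PHY_RATE_2G4_BPS = 250_000


def nyquist_rate_hz(max_frequency_hz: float) -> float:
    """Minimum sampling rate for a signal band-limited to max_frequency_hz."""
    return 2.0 * max_frequency_hz


def pcm_bit_rate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Raw (uncompressed) PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


# Full audio band (20 kHz) versus the speech band (4 kHz).
audio_rate_hz = nyquist_rate_hz(20_000)
speech_rate_hz = nyquist_rate_hz(4_000)

# 8 kHz sampling with 12 bits per sample, mono, as in [2].
speech_bps = pcm_bit_rate_bps(8_000, 12)
share_of_link = speech_bps / PHY_RATE_2G4_BPS

print(f"Nyquist rate for audio : {audio_rate_hz:.0f} Hz")
print(f"Nyquist rate for speech: {speech_rate_hz:.0f} Hz")
print(f"Uncompressed speech    : {speech_bps} bps "
      f"({share_of_link:.1%} of the nominal link rate)")
```

Uncompressed mono speech alone takes more than a third of the nominal link rate, and a stereo stream at the same settings would double that, which is why the surveyed papers rely on compression.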

Brief Summary of Papers Studied Regarding our Survey
In [1], the authors developed an embedded system for streaming voice in real time over IEEE 802.15.4. Real-time constraints become evident when bandwidth, memory, and computing resources are limited.

Technology Used:
A well-known psychoacoustic model based on FFT signal decomposition and the Haar wavelet transform is used for audio compression. The compression codecs used are the MPEG-1 model and the public-domain Vorbis codec. The transmitter runs several tasks: an acquisition task, a compression task that compresses the data to reduce capacity overhead, and a packaging task that packs the encoded stream. All these tasks are independent of the nature of the audio signal. Similarly, a decompression task runs on the receiver. The whole implementation runs on top of ERIKA (Embedded Real-Time Kernel Application), a modular open-source operating system.

Results:
The developed system encodes and transmits 512 samples in about 47 ms, so the maximum sampling rate for the audio stream is about 11 kHz. A specific deadline is set for the arrival of packets, and all packets arriving after the deadline are discarded. The compression codec reduces the bandwidth by compressing the speech frames. When compared with the original signal, the generated audio signal is similar in the lower frequency range and differs at higher frequencies.
Limitation: Accuracy is confined to the lower frequencies because the sampling frequency is limited to 11.5 kHz against the optimal value of 44.1 kHz, so the upper part of the spectrum lacks accuracy. This can be overcome if the acquisition task is scheduled at a higher frequency. The developed system is constrained by low processing power, so with more powerful hardware the broadcasting of wireless signals would be easier to manage. The work does not address factors that matter when streaming audio, such as jitter, packet loss, and end-to-end delay.
In [2], 802.15.4 radios are used for audio implementation. The main focus is on silence detection, which improves overall bandwidth usage. The basic idea is that normal conversation contains 50-60% silence.
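The sampling-rate limit follows directly from the block size and the time needed to process one block. A minimal check of that arithmetic is sketched below; the function name is hypothetical, and the calculation assumes that a new block cannot be acquired faster than the previous one is encoded and transmitted.

```python
def max_sample_rate_hz(samples_per_block: int, block_time_s: float) -> float:
    """Highest sustainable sampling rate when one block takes block_time_s to process."""
    return samples_per_block / block_time_s


# 512 samples encoded and transmitted in about 47 ms, as reported for [1].
rate_hz = max_sample_rate_hz(512, 0.047)

# By the Nyquist criterion, only frequencies up to half this rate are captured.
highest_audio_hz = rate_hz / 2

print(f"Maximum sampling rate : {rate_hz / 1000:.1f} kHz")
print(f"Highest audio content : {highest_audio_hz / 1000:.1f} kHz")
```

The result is close to 11 kHz, which leaves only about 5.4 kHz of audio bandwidth. This is consistent with the observation that the reconstructed signal matches the original mainly at lower frequencies.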

Technology Used:
Instead of an external vocoder, inline software ADPCM (Adaptive Differential Pulse Code Modulation) is used for audio compression; [1] does not use an external vocoder either. A range of 0x7F0 to 0x80F is defined for silence data, and any value within this range is treated as silence. Non-silence data is encoded by the ADPCM algorithm. An analog input signal in the range 0-3 V is used, while in [1] either (-3, 3 V) or (-5, 5 V) is used. The implementation is done on a Jennic JN5139 microcontroller kit, a fully functional kit.
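The silence range described above can be illustrated with a short sketch. The two thresholds come from the text; everything else is an assumption made for illustration, in particular the grouping of samples into fixed-size frames and the rule that a frame is dropped only when every sample in it is silent. The function names are hypothetical and this is not the code used in [2].

```python
SILENCE_LOW = 0x7F0   # lower bound of the silence range from the text
SILENCE_HIGH = 0x80F  # upper bound of the silence range from the text


def is_silence(sample: int) -> bool:
    """True if a 12-bit ADC sample lies inside the silence range around mid-scale."""
    return SILENCE_LOW <= sample <= SILENCE_HIGH


def frames_to_send(samples, frame_size=4):
    """Split samples into frames and keep only frames with some non-silent sample.

    The frame-level decision is an assumption of this sketch.
    """
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    return [frame for frame in frames if not all(is_silence(s) for s in frame)]


samples = [0x800, 0x7F5, 0x80F, 0x7F0,   # all inside the range: silent frame
           0x900, 0x800, 0x6FF, 0x801]   # contains speech-level samples
kept = frames_to_send(samples, frame_size=4)
print(f"{len(kept)} of 2 frames would be passed on to ADPCM encoding")
```

Only the frames that are kept would be handed to the ADPCM encoder, which is where the savings in both compression time and bandwidth come from.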

Result:
The addition of silence detection reduced the conversion time from 3 ms to 1 ms. It also reduces the bandwidth used. Audio is streamed with an 8 kHz sampling rate and 12 bits per sample. The two basic advantages gained are a reduced data size and a reduced ADPCM compression time. The results show that jitter is considerably reduced by choosing a proper buffer size, that is, the number of samples buffered in the system.

Limitation:
Silence detection offers little additional advantage for jitter reduction. The compression technique is not as good as it appears, and various newer and improved algorithms are available. Sometimes the range specified for silence contains real audio, which results in an increase in latency.
Like [1] and [2], [3] introduces an innovative method that takes the limited bandwidth of IEEE 802.15.4 into account. An algorithm is introduced in [3] for perceptual selection of voice data, aimed at reducing the bandwidth of the speech flow, together with a voice data protection technique that preserves speech quality against packet loss. The technique is developed with emergency management support in disaster situations as the target application.

Technology Used:
The compression standard adopted is ITU-T G.711. In the proposed approach, only perceptually important packets are sent. Perceptual importance is expressed in terms of the distortion that the loss of a packet introduces in the received speech flow. The higher the energy of a speech segment, the higher the distortion. Each segment is therefore evaluated, and if its energy is higher than a threshold energy, the segment is compressed and sent. Piggybacking-based forward error correction (FEC) is used along with the perceptual selection algorithm for data protection. The main idea is to store and duplicate the sent segments until the MAC data payload reaches its maximum size.
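The two ideas described above, energy-based selection and piggybacked redundancy, can be sketched as follows. This is a simplified illustration rather than the algorithm of [3]: the use of mean-square energy, the threshold value, and a payload limit counted in segments instead of bytes are all assumptions, and the function names are hypothetical.

```python
def segment_energy(segment):
    """Mean-square energy of one speech segment (an assumed energy measure)."""
    return sum(s * s for s in segment) / len(segment)


def select_segments(segments, threshold):
    """Keep only segments whose energy exceeds the threshold, with their indices."""
    return [(i, seg) for i, seg in enumerate(segments)
            if segment_energy(seg) > threshold]


def piggyback(selected, max_segments_per_packet=2):
    """Build packets carrying the current segment plus copies of earlier ones.

    Redundant copies are added until the (assumed) per-packet segment limit
    is reached, so a lost packet can be recovered from the next one.
    """
    packets, history = [], []
    for item in selected:
        spare = max_segments_per_packet - 1
        redundant = history[-spare:] if spare > 0 else []
        packets.append({"current": item, "redundant": list(redundant)})
        history.append(item)
    return packets


segments = [
    [0, 0, 0, 0],          # silence, energy 0
    [10, -10, 10, -10],    # energy 100
    [1, -1, 1, -1],        # low-level noise, energy 1
    [20, 20, -20, -20],    # energy 400
]
selected = select_segments(segments, threshold=50)
packets = piggyback(selected, max_segments_per_packet=2)
for p in packets:
    print("segment", p["current"][0],
          "with redundant copies of", [i for i, _ in p["redundant"]])
```

The redundant field is what enlarges the MAC payload, which is the cost noted in the limitations of this approach.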

Results:
According to the results, with an accepted MOS of 3.5, the bandwidth reduction reaches up to 30.8% at the cost of a MOS (Mean Opinion Score) reduction of 0.88. The proposed protection technique shows better results at low values, when the segments selected for sending give a high MOS. Thus the combined effect of perceptual selection and data protection is a higher MOS while still requiring less bandwidth at the MAC layer [3].

Limitation:
If perceptual selection is done randomly, both silent and active packets can be discarded, which lowers the end-to-end speech quality. Because two fields are introduced (for the current and the redundant speech segments), the MAC data payload size increases.
A different speech compression scheme is used in [4] for real-time speech transmission over 802.15.4. The authors present a detailed analysis of Linear Predictive Coding (LPC) vocoder parameters and a simulation using SystemC.
Technology Used: LPC (Linear Predictive Coding) is suited to low-bandwidth transmission in denser networks. MATLAB is used to optimize the parameters. The algorithm uses an autocorrelation matrix to determine the pitch period and the filter taps. The pitch period is determined by searching for a repetitive pattern inside the matrix. If a pattern is found, the frame is marked as "voiced". Otherwise, the frame is marked as "unvoiced" and the pitch period is zero [4]. If the frame is unvoiced, the decoder creates a white noise signal; otherwise it creates a train of pulses. With LPC compression only 8 filter taps, the gain, the pitch period, and a flag indicating whether the frame is voiced or unvoiced need to be transmitted [4]. With floating-point arithmetic a compression ratio of 6 can be achieved.
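The voiced/unvoiced decision and the matching decoder excitation can be illustrated with a small sketch. It uses a common simplification, searching for the strongest peak of the normalized autocorrelation over a range of lags, and is not the matrix-based search of [4]. The lag range, the voicing threshold, and all function names are assumptions chosen for 8 kHz speech.

```python
import math
import random


def autocorrelation(frame, lag):
    """Autocorrelation of a frame at the given lag."""
    return sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))


def classify_frame(frame, min_lag=20, max_lag=147, voicing_threshold=0.5):
    """Return (voiced, pitch_period); the pitch period is 0 for unvoiced frames."""
    energy = autocorrelation(frame, 0)
    if energy == 0:
        return False, 0
    best_lag, best_value = 0, 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        value = autocorrelation(frame, lag) / energy
        if value > best_value:
            best_lag, best_value = lag, value
    if best_value >= voicing_threshold:
        return True, best_lag
    return False, 0


def excitation(voiced, pitch_period, length, rng):
    """Decoder excitation: a pulse train if voiced, white noise otherwise."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]


SAMPLE_RATE_HZ = 8000
# A 200 Hz tone stands in for a voiced frame; its period is 40 samples.
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE_HZ) for n in range(240)]
voiced, period = classify_frame(tone)
print("voiced:", voiced, "pitch period:", period, "samples")
```

In a full LPC coder the excitation would then be scaled by the gain and passed through the synthesis filter defined by the transmitted taps; that stage is omitted here.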

Results:
The quality of the LPC (Linear Predictive Coding) implementation is evaluated by simulation in MATLAB. Signal quality increases only up to the theoretical limit of 8 taps, after which the MOS saturates. Half-overlapped frames are chosen in order to make windowing easier to implement. The payload size is only 19 bytes, 12 times smaller than without compression [4].
Limitation: A superframe order of 2 is used so that enough GTSs can be allocated to 7 nodes, thereby saving power. However, packet loss occurs when a longer superframe is used. The more speech frames are lost, the greater the corruption of the reconstructed signal.
In [1], [2], [3] and [4] secure speech communication is not discussed, whereas [5] describes a network for high-quality and secure speech communication. Security is implemented and a method for dynamically forming GTSs is proposed.

Results:
Dynamically forming and releasing GTSs reduces GTS wastage. A special data frame is used between the coordinator and the requesting device to form the GTS. An average of 40 ms is required to form a GTS in one direction. An on-board Push to Talk (PTT) control is also provided to form and release the GTS.
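To give a sense of the time scales involved, the sketch below computes the active superframe duration for a given superframe order. The constants are those the IEEE 802.15.4 standard defines for the 2.4 GHz PHY (960 symbols per base superframe, 16 slots, 16 microseconds per symbol); the function names are hypothetical, and the beacon order, which sets the gap between superframes, is not modelled.

```python
BASE_SUPERFRAME_SYMBOLS = 960   # aBaseSuperframeDuration in symbols
SLOTS_PER_SUPERFRAME = 16       # aNumSuperframeSlots
SYMBOL_TIME_US = 16             # symbol period of the 2.4 GHz PHY
MAX_GTS = 7                     # maximum number of GTSs per superframe


def superframe_duration_ms(superframe_order: int) -> float:
    """Active superframe duration in milliseconds for the 2.4 GHz PHY."""
    symbols = BASE_SUPERFRAME_SYMBOLS * (2 ** superframe_order)
    return symbols * SYMBOL_TIME_US / 1000


def slot_duration_ms(superframe_order: int) -> float:
    """Duration of one of the 16 superframe slots in milliseconds."""
    return superframe_duration_ms(superframe_order) / SLOTS_PER_SUPERFRAME


for so in (0, 2, 4):
    print(f"SO={so}: superframe {superframe_duration_ms(so):.2f} ms, "
          f"slot {slot_duration_ms(so):.2f} ms")
```

With a superframe order of 2 the active period is 61.44 ms, so each node with a GTS gets one transmission opportunity per superframe. A longer superframe stretches that interval, which is one way to see why more speech frames are at risk as the order grows.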

Limitations:
Implementing security improves protection but reduces throughput because of the extra header, which reduces the payload size available for data.
Packet loss and latency, the main factors in performance evaluation for finding a suitable operating range, are discussed in [6]. A Link Quality Indication (LQI) based cross-layer scheme is also proposed for adaptive streaming and compared against a non-adaptive streaming approach. The previously discussed works do not consider the effect of switching ACK (acknowledgment) ON or OFF.
Technology Used: Non-beacon-enabled mode is used, with media streaming starting at 10 seconds and stopping at 190 seconds. Performance is evaluated at bit rates ranging from 16 kbps to 44 kbps. For non-adaptive streaming, a CBR (Constant Bit Rate) audio stream with a bit rate of 128 kbps starting at 10 seconds is used. Adaptive streaming uses feedback information from the receiver side. The LQI (Link Quality Indication) measurement is made using receiver energy detection, signal-to-noise ratio (SNR), and other methods. The LQI value is checked against a threshold, and if the value is below the threshold, feedback is sent from the receiver to the sender. The sender then decides whether to lower or increase the bit rate of the audio stream. The stream bit rate can be lowered using transcoding or stream-switching techniques. The sender can increase the media stream bit rate if no LQI feedback is received within the declared time. The proposed system is implemented using the NS2 simulator.
Results: The results show that the percentage of packets dropped at the sender MAC is higher when MAC ACK is set ON than when it is OFF, while at the receiver MAC the percentage of packets dropped is higher when MAC ACK is set OFF. Similarly, average latency is higher when the sender sets MAC ACK ON than when it is OFF. The packet loss is 0.07% with the proposed method, while with the non-adaptive method the loss is 6.28%. The average delay is 7.8 ms with the proposed method, while with non-adaptive streaming it is 519 ms.
Limitation: If the feedback is delayed or the ACK is lost, the receiver does not provide appropriate feedback information to the sender, so the sender continues to send the bit stream in the previous pattern, which may lead to a corrupted signal upon reconstruction.
The feasibility of supporting voice communication over ZigBee/IEEE 802.15.4 is presented in [7]. Two types of voice communication are considered, namely Voice over Internet Protocol (VoIP) and Push to Talk (PTT). Voice quality is characterized using the R-factor, packet loss, delay, and jitter. Voice over ZigBee is investigated through an extensive study of ZigBee technology.
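The feedback loop described above can be sketched as a small state machine. This is a simplified illustration, not the NS2 implementation of [6]: the bit-rate ladder, the quiet period, and the rule of stepping one level at a time are assumptions, and the class and function names are hypothetical.

```python
BIT_RATES_KBPS = [16, 24, 32, 44]  # hypothetical ladder within the 16-44 kbps range


def receiver_sends_feedback(lqi: int, threshold: int) -> bool:
    """The receiver reports to the sender only when the LQI falls below the threshold."""
    return lqi < threshold


class AdaptiveSender:
    """Lowers the bit rate on feedback and raises it after a quiet period."""

    def __init__(self, rates=BIT_RATES_KBPS, quiet_period_s=2.0):
        self.rates = list(rates)
        self.index = len(self.rates) - 1      # start at the highest rate
        self.quiet_period_s = quiet_period_s  # assumed "declared time"
        self.last_change_s = 0.0

    @property
    def bit_rate_kbps(self) -> int:
        return self.rates[self.index]

    def on_feedback(self, now_s: float) -> None:
        """Low-LQI feedback arrived: step down one level if possible."""
        self.index = max(0, self.index - 1)
        self.last_change_s = now_s

    def on_tick(self, now_s: float) -> None:
        """No feedback for the quiet period: step up one level if possible."""
        if now_s - self.last_change_s >= self.quiet_period_s:
            self.index = min(len(self.rates) - 1, self.index + 1)
            self.last_change_s = now_s


sender = AdaptiveSender()
if receiver_sends_feedback(lqi=100, threshold=120):
    sender.on_feedback(now_s=1.0)
print("bit rate after poor link report:", sender.bit_rate_kbps, "kbps")
```

Stepping down could be realized by transcoding or stream switching, as the text notes; the sketch only tracks which rate would be selected.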

Technology Used: Results are obtained using the NS2 simulator for VoIP and PTT. The transmission range is 15 meters and the Carrier Sense Range (CSR) is 15/30 meters.
Results: It is concluded that two G.729a VoIP calls are supported between two directly connected nodes. As the number of hops increases to three, support for VoIP becomes unreliable. About 17 PTT sessions are found to be supported. As with VoIP, no PTT sessions can be supported once the hop count increases to three. To reduce the packet-loss rate, call admission control must be implemented in ZigBee networks, and to reduce delay, the effective bandwidth must be increased by reducing contention.
Limitation: Voice cannot be supported when linear communication extends beyond two hops, so multi-hop operation is lacking. As ZigBee networks are designed for very low traffic levels, support for simultaneous cross traffic is uncertain.
All the papers discussed above use a voice codec for audio compression; [8] presents the evaluation and implementation of a voice codec in the Z-Phone project. It presents the criteria for selecting the most suitable codec and its performance.
Results: Perceptual Evaluation of Speech Quality (PESQ) was used for testing. The input signal is a WAV file containing 51 different male speakers, and the degraded signal is the WAV file that results from encoding and decoding the original signal. The test is not used to evaluate speech quality but only to give an indication of the correctness of the port. Speex uses a 20 ms speech frame, so each task (data transfer/encoding) must execute in less than 20 ms. The execution time exceeds the 20 ms limit for complexity values higher than 2 at 11 kbps and higher than 4 at 8 kbps [8].
Limitations: During codec implementation it is not possible to achieve bit exactness. Differences in arithmetic and rounding between the BelaSigna 300 and the Speex source code affect bit exactness. The BelaSigna 300 rounding mechanism is implemented as it provides better accuracy.
In [9] a comparison between narrowband and wideband coded speech is presented. Narrowband represents a bandwidth of 300-3400 Hz, while wideband represents a bandwidth of 50-7000 Hz. Six listening tests are performed to evaluate the difference between narrowband and wideband coded speech signals. The Adaptive Multi-Rate (AMR) standard is used in order to make the difference easy to interpret.
Results: The results show that the wideband coded speech signal is preferred over the narrowband speech signal. The lowest wideband codec bit rate of 6.6 kbps scored better than the narrowband codec (AMR-NB at 12.2 kbps). Wideband speech coding is thus seen to give better performance.

Limitation:
In [9] the wideband codec is preferred, but narrowband codecs also give good results, which is not discussed there. Considerable further maturing of these codecs is required to achieve better quality.
In [10] a speech recognition algorithm for wireless actuator control is presented. The speech recognition unit is based on a 16-bit Texas Instruments microcontroller. It is an ultra-low-power microcontroller that is sufficient for the speech recognition algorithm. The reliability of the algorithm depends on many factors, such as environmental conditions and the distance between the speaker and the microphone. The actuator used simply consists of three LEDs. In real-life applications it can be used in buildings, air conditioning, and household appliances.
In [11], simulation results for voice over IEEE 802.15.4 obtained with the ns-2 simulator are presented. First, node performance is analyzed in terms of maximum throughput, the maximum number of coexisting voice and data nodes, etc. The total simulation time is 500 ms. The sensors generate CBR traffic of 0.05 kbps, 0.1 kbps, and 0.4 kbps. The simulation results show that the loss rate and delay of a voice node increase once the number of coexisting data sensor nodes exceeds 79 [10]. For a 16 kbps codec, up to two voice codecs can coexist with multiple data sensor nodes [10].
Thus, various voice transmission applications can be implemented on devices using the IEEE 802.15.4 LR-WPAN protocol. The quality of streaming audio over IEEE 802.15.4 is also verified by the simulation results. The implementations show the possible uses of voice transmission.

Parameters Considered while Audio Streaming
From all the papers it can be concluded that the following parameters must be taken care of for successful transmission of audio over IEEE 802.15.4. The first is the sampling rate: the audio signal must be sampled at the correct frequency (the sampling frequency must be at least twice the highest frequency in the signal). Instead of sending a whole audio file, it is better to transmit the sampled data, as this saves a lot of processing. The sampling rate is therefore a very important factor. If it is not selected properly, important information may be lost, which leads to a corrupted received signal. It is best selected according to the Nyquist criterion.
The second factor responsible for good voice transmission is latency. In [6] the main focus is on latency and packet loss. Latency has a strong effect on VoIP calls and can be called their worst enemy. Acknowledgment (round-trip latency) also affects system performance; depending on conditions, switching ACK ON or OFF can lower or raise latency. Excessive latency results in delay; latency and delay are interchangeable terms in some contexts. Queuing of packets also adds latency. This can be reduced by adopting a proper management control method, such as the feedback method employed in [6] to control latency.
The third factor that hampers audio streaming over IEEE 802.15.4 is jitter. In [2] it is explained that jitter is basically due to dropped audio packets. Because of the dropped packets, important information may be lost and the receiver is deprived of what the sender wants to communicate. Audio packets are dropped because of inadequate buffer size and also because of unsynchronized packet sending. In [2] it is suggested that the buffer size be calculated with care. Buffering also controls jitter: as more samples are buffered, jitter is reduced considerably.
The fourth factor, which is quite important for audio streaming, is bandwidth. In [2] a technique called silence detection is introduced to save bandwidth. The limited bandwidth of the standard motivates saving bandwidth in order to send more data over what is available. All the papers discussed above focus on compression of audio packets to save the limited bandwidth, which leads to another factor, voice codecs.
Voice codecs are used to compress the speech signal. They reduce the bandwidth required for sending audio. Different voice codecs are used in the papers above: in [1] a psychoacoustic model based on FFT signal decomposition is used, while in [4] an LPC vocoder is used. Similarly, other projects use different vocoders. The choice of vocoder depends on the project requirements; in [8], for example, Speex is used because it is open source and satisfies all the requirements of the Z-Phone project. These are not the only parameters that play a significant role in audio streaming over IEEE 802.15.4; other factors include packet loss, echo, hardware noise, and end-to-end delay.
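The link between buffer size and dropped packets can be illustrated with a simple playout-buffer model. The sketch is not taken from [2]: it assumes that playback starts after a fixed number of frames has been buffered, that each packet carries one frame, and that a packet arriving after its playout time is discarded. The arrival times and the function name are hypothetical.

```python
def late_packets(arrival_times_ms, frame_ms, buffered_frames):
    """Return the indices of packets that miss their playout deadline.

    Playback starts buffered_frames * frame_ms after the first packet arrives,
    and packet i is due i * frame_ms after playback starts.
    """
    playout_start = arrival_times_ms[0] + buffered_frames * frame_ms
    late = []
    for i, arrival in enumerate(arrival_times_ms):
        deadline = playout_start + i * frame_ms
        if arrival > deadline:
            late.append(i)
    return late


# Hypothetical arrival times for six 20 ms frames with irregular network delay.
arrivals_ms = [0, 25, 38, 95, 82, 100]

for depth in (0, 1, 2):
    dropped = late_packets(arrivals_ms, frame_ms=20, buffered_frames=depth)
    print(f"buffer depth {depth}: late packets {dropped}")
```

A deeper buffer absorbs more delay variation and so fewer packets are dropped, but every buffered frame adds its duration to the end-to-end delay, so the buffer size is a trade-off between jitter and latency.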

Research Gap
Cost is the most important factor in designing a device. It depends on factors such as the labor employed, the technology used, and market conditions. This factor has not been addressed. Market conditions have not been studied properly to check whether a device is compatible with existing technologies. All the papers focus on different methods of audio streaming, but none of them discusses interoperability.
Environmental conditions are also ignored. To use a device efficiently in every setting, its surroundings must be studied thoroughly. Only [3] considers emergency (disaster) conditions while developing methods for audio streaming. Many other factors, such as climatic conditions (which can damage the device) and natural calamities, are not addressed. The GSM codec is a highly successful codec with light output, but it is rarely used for streaming audio. Other codecs such as Speex, G.729, and G.711 are used instead. Considerable maturing is therefore still needed in voice codecs for speech frame compression.
All the papers discussed above are very good in their own respects, but much discussion is still needed on whether the designed product is market ready.

CONCLUSION
By combining the techniques used in [3] and [5], a more advanced technique can be obtained. [3] focuses on perceptual voice selection designed with emergency management in mind, and [5] develops a secure, high-quality speech transmission method. By merging these two techniques, bandwidth can be saved while security and speech quality are maintained. In [5] a slight modification can be made by replacing the vocoder with another high-standard, open-source codec for better results. For controlling the bit stream, [6] can be used along with [3] and [5]. In conclusion, all the papers discussed leave considerable scope for further work on the many aspects of streaming audio over IEEE 802.15.4.

Acknowledgement
To start with I thank GOD for the wisdom and perseverance that He has bestowed upon me throughout my life due to which I was able to complete this report well in time.
I hereby take the opportunity to express my sincere reverence and gratitude to all those who helped me in completing this work.
Most important of all, it gives me immense pleasure to express my thanks to my Supervisor Mr. Swastik Gupta, Assistant Professor, DECE, who left no stone unturned to help and guide me in every possible and right way. I am deeply grateful to him for his constant supervision as well as for providing the necessary information and support in completing the thesis synopsis.