4800bps Speech codec SDK v.1.0
Programmer's Manual
Introduction
The 4800bps Speech codec Software Development Kit (SDK) is designed for developers of Internet telephony, Web-based voice communication, voice mail, and voice chat applications who want to add a voice codec to their programs easily.
The SDK includes Codec4800.dll, which contains the speech encoder and decoder, implementation examples, and free support. Codec4800.dll has DLL and COM interfaces.
The speech codec compresses a digitized speech signal to an output bitrate of 4800 bps and decompresses it.
A Voice Activity Detector (VAD), which distinguishes voice from pauses, is embedded in the speech codec. The VAD allows channel resources to be used more efficiently and adapts to the noise level.
The 4800bps speech codec is compatible with the Java version of the codec. For more information, please contact us.
Codec performance:
Encoder input speech signal: PCM format, 8000 Hz sampling frequency, 16 bits per sample.
Decoder output speech signal: same format and parameters as the encoder input.
Encoder output bitrate: 4800 bits per second.
Algorithmic delay: < 60ms.
Frame length: 240 samples.
Coded frame length: 18 bytes.
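The figures in this list are mutually consistent, and the arithmetic can be checked with a few lines of C++. The sketch below is only an illustration: the constant and function names are hypothetical and are not part of Codec4800.dll.

```cpp
// Figures taken from the "Codec performance" list.
const int kSampleRateHz    = 8000; // sampling frequency
const int kBitsPerSample   = 16;   // PCM sample width
const int kFrameSamples    = 240;  // frame length in samples
const int kCodedFrameBytes = 18;   // coded frame length in bytes

// Duration of one frame in milliseconds: 240 samples / 8000 Hz = 30 ms.
int FrameDurationMs() { return kFrameSamples * 1000 / kSampleRateHz; }

// Size of one uncompressed frame in bytes: 240 samples * 2 bytes = 480.
int RawFrameBytes() { return kFrameSamples * kBitsPerSample / 8; }

// Bitrate of the coded stream: 18 bytes * 8 bits every 30 ms = 4800 bps.
int CodedBitrateBps() { return kCodedFrameBytes * 8 * 1000 / FrameDurationMs(); }

// Bitrate of the uncompressed PCM input: 8000 Hz * 16 bits = 128000 bps.
int RawBitrateBps() { return kSampleRateHz * kBitsPerSample; }
```

Each 480-byte input frame therefore becomes an 18-byte coded frame, a reduction of roughly 26.7 to 1.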
The SDK includes:
1. COM implementation examples,
2. DLL implementation example.
Both examples are for Visual C++ 6.0.
COM implementation Example1 compresses a digitized voice file and decompresses it. The input file for compression must be in PCM format with an 8000 Hz sampling frequency and 16 bits per sample. The output file format after decompression is the same.
COM implementation Example2 compresses the digitized voice frame by frame. This approach is appropriate for most applications.
The DLL implementation example compresses and decompresses the digitized voice frame by frame.
The speech codec is used in the Voice Recording Applet (http://www.vimas.com/ve_record_applet_sdk.htm) and in Web Voice Mail (http://www.vimas.com/ve_voice_mail.htm). You can test the voice quality online on these sites. Please keep in mind that voice quality depends on the microphone you use, so use a good microphone for testing.
The trial version of the speech codec has the same functionality as the licensed version, but its encoder can process only 10 seconds of digitized speech.
For free support, contact us.
New! Available now!
1. The 16kbps wideband speech codec SDK in C++ and Java.
2. The 4800bps speech codec SDK in Java.
Reference Guide
Codec4800.dll includes 4 public methods:
1. HANDLE WINAPI Init( );
2. void WINAPI DeInit( HANDLE hCodec );
3. short WINAPI FrameEncoder( HANDLE hCodec, const short*, unsigned char*, short );
4. void WINAPI FrameDecoder( HANDLE hCodec, const unsigned char*, short*, short )
Init( )
Prototype:
HANDLE WINAPI Init( )
Description:
Creates the codec object.
Parameters:
None.
Return value:
Pointer to the codec object.
DeInit( )
Prototype:
void WINAPI DeInit( HANDLE hCodec )
Description:
Deletes the codec object.
Parameters:
HANDLE hCodec
Pointer to the codec object.
Return value:
None.
FrameEncoder( )
Prototype:
short WINAPI FrameEncoder( HANDLE hCodec, const short*, unsigned char*, short )
Description:
Encodes a 240-sample speech frame and decides whether the frame is voice or a pause.
Parameters:
HANDLE hCodec
Pointer to the codec object.
const short*
Input data. Pointer to an array that contains 240 samples of the input speech. Each sample is 2 bytes, so the array size is 480 bytes. In each sample the LSB (Least Significant Byte) comes first and the MSB (Most Significant Byte) second.
unsigned char*
Output data. Pointer to an array that receives the 18 bytes of the compressed speech frame.
short
Input data. Parameter for VAD threshold adjustment. Recommended value is 50.
Return value:
short
Output data. VAD decision: 1 if the frame is speech, 0 if the frame is a pause.
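One way an application might use the VAD decision is to skip transmission of frames flagged as pauses. The sketch below estimates the resulting average bitrate from a list of FrameEncoder return values. It assumes that pause frames are simply not sent, which is an application-level choice and not something this manual prescribes; the function and constant names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Bitrate when every frame is sent: 18 bytes * 8 bits per 30 ms frame.
const double kFullBitrateBps = 4800.0;

// Average bitrate in bits per second when only frames flagged as speech
// are transmitted. vadFlags holds one FrameEncoder return value per frame
// (1 = speech, 0 = pause). Returns 0 for an empty list.
double AverageBitrateBps(const std::vector<short>& vadFlags)
{
    if (vadFlags.empty())
        return 0.0;

    std::size_t speechFrames = 0;
    for (std::size_t i = 0; i < vadFlags.size(); ++i) {
        if (vadFlags[i] == 1)
            ++speechFrames;
    }
    // Assumption: pause frames cost no channel bandwidth at all.
    return speechFrames * kFullBitrateBps / vadFlags.size();
}
```

For example, if half of the frames are pauses, the estimate is 2400 bps. A real protocol would probably still need to signal the skipped frames to the receiver, which this estimate ignores.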
FrameDecoder( )
Prototype:
void WINAPI FrameDecoder( HANDLE hCodec, const unsigned char*, short*, short )
Description:
Decodes a 240-sample speech frame.
Parameters:
HANDLE hCodec
Pointer to the codec object.
const unsigned char*
Input data. Array that contains the 18 bytes of compressed speech.
short*
Output data. Array that receives 240 samples of the decoded speech. Each sample is 2 bytes, so the array size is 480 bytes. In each sample the LSB (Least Significant Byte) comes first and the MSB (Most Significant Byte) second.
short
Input data. Reserved for a future lost-frame compensation mechanism.
Return value:
None.
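The four DLL methods can be combined into a frame-by-frame loop. The following is a minimal sketch, not the SDK's own example: the function-pointer typedefs mirror the documented prototypes, while CodecApi, EncodeStream, and DecodeStream are hypothetical names introduced here. It assumes that a null handle from Init means failure and that 0 is acceptable for the reserved decoder argument; neither is stated in this manual. On Windows the pointers would typically be obtained with LoadLibrary and GetProcAddress; that step is omitted because the exported names are not listed here.

```cpp
#include <cstddef>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#else
// Placeholders so the sketch compiles off Windows.
typedef void* HANDLE;
#define WINAPI
#endif

// Function-pointer types mirroring the four documented prototypes.
typedef HANDLE (WINAPI *InitFn)();
typedef void   (WINAPI *DeInitFn)(HANDLE);
typedef short  (WINAPI *FrameEncoderFn)(HANDLE, const short*, unsigned char*, short);
typedef void   (WINAPI *FrameDecoderFn)(HANDLE, const unsigned char*, short*, short);

// Hypothetical holder for the four entry points.
struct CodecApi {
    InitFn         init;
    DeInitFn       deinit;
    FrameEncoderFn encode;
    FrameDecoderFn decode;
};

const std::size_t kFrameSamples    = 240; // samples per input frame
const std::size_t kCodedFrameBytes = 18;  // bytes per coded frame

// Encodes every complete 240-sample frame in pcm and returns the
// concatenated 18-byte frames. A trailing partial frame is dropped.
// If vadFlags is not null, it receives one VAD decision per frame.
std::vector<unsigned char> EncodeStream(const CodecApi& api,
                                        const std::vector<short>& pcm,
                                        short vadThreshold,
                                        std::vector<short>* vadFlags)
{
    std::vector<unsigned char> coded;
    HANDLE h = api.init();
    if (!h)
        return coded; // assumption: null handle means Init failed

    unsigned char frame[kCodedFrameBytes];
    for (std::size_t pos = 0; pos + kFrameSamples <= pcm.size(); pos += kFrameSamples) {
        short isSpeech = api.encode(h, &pcm[pos], frame, vadThreshold);
        coded.insert(coded.end(), frame, frame + kCodedFrameBytes);
        if (vadFlags)
            vadFlags->push_back(isSpeech);
    }
    api.deinit(h);
    return coded;
}

// Decodes every complete 18-byte frame in coded and returns the samples.
std::vector<short> DecodeStream(const CodecApi& api,
                                const std::vector<unsigned char>& coded)
{
    std::vector<short> pcm;
    HANDLE h = api.init();
    if (!h)
        return pcm;

    short frame[kFrameSamples];
    for (std::size_t pos = 0; pos + kCodedFrameBytes <= coded.size(); pos += kCodedFrameBytes) {
        api.decode(h, &coded[pos], frame, 0); // last argument is reserved
        pcm.insert(pcm.end(), frame, frame + kFrameSamples);
    }
    api.deinit(h);
    return pcm;
}
```

A caller would fill CodecApi once, then pass the recommended threshold of 50 to EncodeStream. Dropping the trailing partial frame is a simplification; an application could instead pad the last frame with silence.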
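The COM interface of Codec4800.dll includes 5 public methods:
1. HRESULT Encode( VARIANT Source, VARIANT Dest, short Tresh, VARIANT_BOOL* pIsSpeech )
2. HRESULT Decode( VARIANT Source, VARIANT Dest, VARIANT_BOOL LossFrame )
3. HRESULT Reset( )
4. [propget] HRESULT FrameSize( [out, retval] short* retval )
5. [propget] HRESULT CodedFrameSize( [out, retval] short* retval )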
Encode( )
Prototype:
HRESULT Encode( [in] VARIANT Source, [in] VARIANT Dest, [in] short Tresh, [out,retval] VARIANT_BOOL* pIsSpeech )
Description:
Encodes a 240-sample speech frame and decides whether the frame is voice or a pause.
Parameters:
[in] VARIANT Source
Input data object. It can represent: a) an array that contains 240 samples of the input speech frame in PCM format with an 8000 Hz sampling frequency. Each sample is 2 bytes, so the array size is 480 bytes. In each sample the LSB (Least Significant Byte) comes first and the MSB (Most Significant Byte) second; b) the name of a digitized voice file in PCM format with an 8000 Hz sampling frequency and 16 bits per sample.
[in] VARIANT Dest
Output data object. It can represent: a) an array that receives the 18 bytes of the compressed speech frame; b) the name of the compressed voice file.
[in] short Tresh
Input data. Parameter for Voice Activity Detector (VAD) threshold adjustment. Recommended value is 50.
[out,retval] VARIANT_BOOL* pIsSpeech
Output data. VAD decision: true if the frame is speech, false if the frame is a pause.
Return value:
None.
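The byte order described for the sample arrays (LSB first, MSB second) matters when PCM data arrives as raw bytes, for example from a file or a network buffer. The helpers below are a small sketch of the conversion in both directions; BytesToSamples and SamplesToBytes are hypothetical names and are not part of the SDK.

```cpp
#include <cstddef>
#include <vector>

// Builds signed 16-bit samples from raw bytes stored LSB first, MSB second.
// A trailing odd byte, if any, is ignored.
std::vector<short> BytesToSamples(const std::vector<unsigned char>& bytes)
{
    std::vector<short> samples;
    samples.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        unsigned int value = bytes[i] | (static_cast<unsigned int>(bytes[i + 1]) << 8);
        // Map the 16-bit pattern to the signed range -32768..32767.
        int asSigned = value >= 0x8000u ? static_cast<int>(value) - 0x10000
                                        : static_cast<int>(value);
        samples.push_back(static_cast<short>(asSigned));
    }
    return samples;
}

// Writes samples as raw bytes, LSB first, MSB second.
std::vector<unsigned char> SamplesToBytes(const std::vector<short>& samples)
{
    std::vector<unsigned char> bytes;
    bytes.reserve(samples.size() * 2);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        unsigned int value = static_cast<unsigned short>(samples[i]);
        bytes.push_back(static_cast<unsigned char>(value & 0xFF)); // LSB
        bytes.push_back(static_cast<unsigned char>(value >> 8));   // MSB
    }
    return bytes;
}
```

On little-endian platforms such as x86 Windows, an in-memory array of short already has this layout, so the explicit conversion is mainly useful for portable code.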
Decode( )
Prototype:
HRESULT Decode( [in] VARIANT Source, [in] VARIANT Dest, [in] VARIANT_BOOL LossFrame )
Description:
Decodes a 240-sample speech frame.
Parameters:
[in] VARIANT Source
Input data object. It can represent: a) an array that contains the 18 bytes of compressed speech; b) the name of a compressed speech file, which is a sequence of consecutive 18-byte frames.
[in] VARIANT Dest
Output data object. It can represent: a) an array that receives 240 samples of the decoded speech in PCM format with an 8000 Hz sampling frequency. Each sample is 2 bytes, so the array size is 480 bytes. In each sample the LSB (Least Significant Byte) comes first and the MSB (Most Significant Byte) second; b) the name of the decompressed speech file in PCM format with an 8000 Hz sampling frequency and 16 bits per sample.
[in] VARIANT_BOOL LossFrame
Input data. Reserved for a future lost-frame compensation mechanism.
Return value:
None.
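Because a compressed speech file is a sequence of 18-byte frames, its size determines the frame count, the playback duration, and the size of the decoded PCM data. The sketch below assumes the file contains nothing but frames (no header or trailer), which goes slightly beyond what this manual states; all names are hypothetical.

```cpp
const long kCodedFrameBytes = 18;  // bytes per coded frame
const long kRawFrameBytes   = 480; // bytes per decoded frame (240 samples * 2)
const long kFrameDurationMs = 30;  // 240 samples at 8000 Hz

// Number of complete frames in a compressed file of the given size.
long CodedFrameCount(long fileBytes)
{
    return fileBytes > 0 ? fileBytes / kCodedFrameBytes : 0;
}

// True if the size is a whole number of 18-byte frames.
bool IsWholeFrames(long fileBytes)
{
    return fileBytes >= 0 && fileBytes % kCodedFrameBytes == 0;
}

// Playback duration of the complete frames, in milliseconds.
long DurationMs(long fileBytes)
{
    return CodedFrameCount(fileBytes) * kFrameDurationMs;
}

// Size in bytes of the PCM data produced by decoding the complete frames.
long DecodedBytes(long fileBytes)
{
    return CodedFrameCount(fileBytes) * kRawFrameBytes;
}
```

A size that is not a multiple of 18 may indicate a truncated file or a format that differs from the assumption above.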
Reset( )
Prototype:
HRESULT Reset( )
Description:
Sets the variables in the encoder and decoder to their initial values.
Parameters:
None.
Return value:
None.
FrameSize( )
Prototype:
[propget] HRESULT FrameSize( [out, retval] short* retval )
Description:
Returns the size of the input frame in bytes. This value is always 480.
Parameters:
[out, retval] short* retval
Output data. Size of the input frame in bytes. This value is always 480.
Return value:
None.
CodedFrameSize( )
Prototype:
[propget] HRESULT CodedFrameSize( [out, retval] short* retval )
Description:
Returns the size of the output (coded) frame in bytes. This value is always 18.
Parameters:
[out, retval] short* retval
Output data. Size of the output (coded) frame in bytes. This value is always 18.
Return value:
None.
Copyright © VIMAS Technologies, 2001-2002.