Speex On iPhone Explained Part II


*Update*
In part I I neglected to point out that you should uncomment #define _USE_SSE in config.h, as mentioned below. This preprocessor directive will allow you to run on the device. It was also mentioned that you could get more speed out of Speex if you #define FIXED_POINT instead of FLOATING_POINT. I have not verified this, and Speex runs acceptably in my implementation without it, but it’s worth mentioning.
*Update*

You have a lot of vocal audio data. Maybe it needs to be stored on an iPhone. Maybe it needs to glide effortlessly over the network like a slice of dental floss blowing in the wind. Whatever the case, you need a good compressor and Speex is a great solution. Hi, I’m Cliff. You’re here because you’re looking for an answer to your audio compression needs. I’m here to deliver the secrets to decompressing audio with the Speex codec. That, for what it’s worth, is the only reason I’m still hanging around here. In any other event you’d probably find me on South Street sharing a soda with a cat. I digress . . .

In part I of this series I explained how to get Speex to compile. Today we’ll try to import the OGG container format into our project and move on to Speex decompression. Because not everyone may be aware, a brief explanation of codecs and containers is in order. Audio encoding is typically made of two distinct pieces: a container format and an encoding. The audio container holds the metadata, or descriptive information, for the actual audio file. This includes things like how many channels are in the audio data, what sample rate it was recorded at, and the endianness (not Indianness) of the audio data. The container may hold other data as well, depending on the type of encoding you use. Finally, the metadata will have the location (offset) of the actual audio data in the file.

The encoding is the actual audio data that is ultimately delivered to the output device or speakers. It can be a direct digital dump (that is, the actual audio samples taken over time as the audio was recorded) or it can be a compressed variant of the raw samples. It’s important to note that the encoding and the container are not usually dependent upon one another. That means you can put Speex-encoded audio in a wave container just the same as you can put raw uncompressed samples in an OGG container. It’s just more common to find uncompressed data in a wave container and Speex compressed audio in an OGG container.
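To make the container idea concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C .m file) that reads the metadata out of a wave header. It assumes the canonical 44-byte PCM layout, with the data chunk sitting immediately after a 16-byte fmt chunk. Real wave files can carry extra chunks, and this sketch simply rejects them. The names wav_info and parse_canonical_wav are made up for illustration and are not part of the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Metadata a container typically carries alongside the encoded audio. */
typedef struct {
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
    size_t   data_offset;   /* where the encoded audio begins */
} wav_info;

/* Wave files store multi-byte fields little-endian, whatever the host CPU is. */
static uint16_t read_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out for a canonical 44-byte PCM header, else 0. */
int parse_canonical_wav(const unsigned char *buf, size_t len, wav_info *out) {
    assert(buf != NULL && out != NULL);
    if (len < 44) return 0;
    if (memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0) return 0;
    if (memcmp(buf + 12, "fmt ", 4) != 0 || memcmp(buf + 36, "data", 4) != 0) return 0;
    out->channels        = read_le16(buf + 22);
    out->sample_rate     = read_le32(buf + 24);
    out->bits_per_sample = read_le16(buf + 34);
    out->data_offset     = 44;   /* the samples start right after the header */
    return 1;
}
```

Everything before data_offset is container, and everything after it is encoding. Swap the bytes after the offset for Speex frames and the header could, in principle, stay much the same.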

Let’s take a step back and try some TDD. Following best practices, we need to create a need for the Speex codec and the OGG container. I realize this is cart-before-the-horse style since we’ve already imported Speex, but bear with me as I’m doing this tutorial on my time off. Also, I’ve been out of the TDD habit for a while as I strive to work closely with others who are uncomfortable with the style. We start by creating a “Unit Test Bundle” target in the project. Create a new Objective-C class named “CCCSpeexDecoderTest” using the “New File…” dialog and deselect the “also create .h file” option. Include the following in your new Objective-C class file.

//
//  CCCSpeexDecoderTest.m
//  SpeexLib
//
//  Created by Clifton Craig on 11/13/10.
//  Copyright 2010 Craig Corporation. All rights reserved.
//
#import <SenTestingKit/SenTestingKit.h>

@interface CCCSpeexDecoderTest : SenTestCase
{
	NSString* wavFile;
	NSString* speexFile;
}
@end

@implementation CCCSpeexDecoderTest

-(void) setUp
{
	wavFile = [[NSBundle bundleForClass:[self class]] pathForResource:@"sample" ofType:@"wav"];
	speexFile = [[NSBundle bundleForClass:[self class]] pathForResource:@"sample" ofType:@"spx"];
}

-(void) testFirstTest
{
	STAssertNotNil(wavFile,@"A sample wav file is required to test speex decoding.");
	STAssertNotNil(speexFile,@"A sample speex file is required to test speex decoding.");
}

@end

Running this tells us that we’re going to need some Speex data to operate on. (I’ve taken the liberty of generating a wav file using the “say” command and converting it to a Speex encoded file using the JSpeex API via Groovy. I’ll include both in a download of the project for this lesson.) Next we’ll create a structure to hold our unit tests and test resources. We will be following the “golden copy” testing pattern. You will later learn that using the pattern here is rather fragile; however, a more purist approach would take us through an exercise of rewriting the entire Speex project, which is outside the scope of my tutorial. Using Finder, I created a “Tests” and a “Resources” folder under my src folder in my project. Drag/drop these folders into Xcode to create the corresponding groups. Then drag/drop the sample wave and sample Speex files (named “sample.wav” and “sample.spx” respectively) into the “Resources” group in Xcode. Running the test will now pass.

We now work our way through creating the decoder. I’ll spare the individual steps in TDD as it would make this text overly verbose and I’ll try to summarize instead. We need an actual decoder instance which we’ll be importing. TDD suggests we import what we don’t have so add the import for a CCCSpeexDecoder type which does not exist. Build and fail. (The failure is important as it formalizes the class or feature you are about to add/change/delete.) We also need to be able to create this type and give it some audio to decode. It will also need a place to send the decoded audio data. I’m going to define an abstraction for providing/receiving the audio data so that we don’t necessarily need a file system so I’m adding a test to demonstrate/document the need for an audio source, a test to demonstrate/document the need for an audio sink, and one other test that formalizes how we plug these two abstractions into the decoder.

#import "CCCSpeexDecoder.h"

@interface CCCSpeexDecoderTest : SenTestCase <CCCSpeexAudioSource, CCCSpeexAudioSink>
{
	NSString* wavFile;
	NSString* speexFile;
	CCCSpeexDecoder *decoder;
}
@end

@implementation CCCSpeexDecoderTest

//...

-(void) testAudioSourceIsDefined
{
	id<CCCSpeexAudioSource> anAudioSource = self;
}

-(void) testAudioSinkIsDefined
{
	id<CCCSpeexAudioSink> anAudioSink = self;
}

-(void) testCanCreateDecoder
{
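	CCCSpeexDecoder *aDecoder = [[CCCSpeexDecoder alloc] initWithAudioSource:self andAudioSink:self];
	STAssertNotNil(aDecoder, @"Should be able to create a decoder.");
	[aDecoder release];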
}

And this calls for the additional CCCSpeexDecoder class which defines the abstractions…

#import <Foundation/Foundation.h>

@protocol CCCSpeexAudioSource

@end

@protocol CCCSpeexAudioSink

@end

@interface CCCSpeexDecoder : NSObject {

}

- (id) initWithAudioSource:(id<CCCSpeexAudioSource>) anAudioSource andAudioSink:(id<CCCSpeexAudioSink>) anAudioSink;

@end

#import "CCCSpeexDecoder.h"

@implementation CCCSpeexDecoder

- (id) initWithAudioSource:(id<CCCSpeexAudioSource>) anAudioSource andAudioSink:(id<CCCSpeexAudioSink>) anAudioSink
{
	self = [super init];
	if (self != nil) {

	}
	return self;
}

@end

Now we go back and add one more test that explains what we’re after.

-(void) testCanDecode
{
	[decoder decodeAudio];
}

Build and fail so that we know to define the method.

-(void) decodeAudio
{
}

We have now defined the ability to decode audio. We have to set our expectation for this method. (Test first begins with declaring or expressing a need for a feature or function, then setting an expectation for its behavior.) After invoking decodeAudio we would expect to have collected the decoded audio bytes somewhere. I’ll add a mutable data field in the test for this.

@interface CCCSpeexDecoderTest : SenTestCase <CCCSpeexAudioSource, CCCSpeexAudioSink>
{
	NSString* wavFile;
	NSString* speexFile;
	CCCSpeexDecoder *decoder;
	NSMutableData *decodedAudio;
}
@end

@implementation CCCSpeexDecoderTest

-(void) setUp
{
	wavFile = [[NSBundle bundleForClass:[self class]] pathForResource:@"sample" ofType:@"wav"];
	speexFile = [[NSBundle bundleForClass:[self class]] pathForResource:@"sample" ofType:@"spx"];
	decoder = [[CCCSpeexDecoder alloc] initWithAudioSource:self andAudioSink:self];
	decodedAudio = [[NSMutableData alloc] init];
}

And we add a test to exercise the method and document/verify our expectation:

-(void) testDecodeAudioFillsDecodedAudio
{
	STAssertTrue([decodedAudio length] == 0, @"Should NOT have accumulated data");
	[decoder decodeAudio];
	STAssertTrue([decodedAudio length] > 0, @"Should have accumulated data");
}

Here’s the Oogly part. We are calling a method with no return value. We’ve defined an abstraction around collecting data (an audio sink) and we’ve made our test case adopt the protocol for this abstraction. The protocol defines no methods. The test calls for data to magically arrive in the mutable data field. Indirectly, our test is stating that given a source and a sink, when the decodeAudio message is sent we should have accumulated data in the sink. Running the test fails because we haven’t added the functionality. We step into the decodeAudio implementation and fill in the simplest thing that works.

-(void) decodeAudio
{
	NSString *pretendData = @"pretendData";
	[audioSink audioWasDecoded:
		[NSData dataWithBytes:[pretendData cStringUsingEncoding:NSUTF8StringEncoding] length:[pretendData length]]
	 ];
}

You see we are talking to an audioSink object here. Because we don’t really have an audiosink object in scope (I just made it up b/c it felt right) we need to declare it.

@interface CCCSpeexDecoder : NSObject {
	id<CCCSpeexAudioSink> audioSink;
}

If we run we still won’t get satisfaction because we haven’t ensured that the audiosink given in the constructor is the one we talk to when we decode audio. So we revisit the init method.

- (id) initWithAudioSource:(id<CCCSpeexAudioSource>) anAudioSource andAudioSink:(id<CCCSpeexAudioSink>) anAudioSink
{
	self = [super init];
	if (self != nil) {
		audioSink = [anAudioSink retain];
	}
	return self;
}

We also need to release in our dealloc.

- (void) dealloc
{
	[audioSink release];
	[super dealloc];
}

Let’s be more specific. When decoding audio we will want to discover the meta data or attributes of the audio. This information is usually the first group of bytes in a file and it explains what the rest of the file contains. We’ll declare an expectation to receive a callback in our sink which contains the meta data in an easily navigable NSDictionary.

-(void) testDecodeAudioReturnsHeaderInfoToSink
{
	STAssertNil(headerInfo, @"We should start with no header info.");
	[decoder decodeAudio];
	STAssertNotNil(headerInfo, @"We should now have header info.");
}

And we need to add an NSDictionary field to our test to record the header info.

@interface CCCSpeexDecoderTest : SenTestCase <CCCSpeexAudioSource, CCCSpeexAudioSink>
{
        //Other fields...
	NSDictionary *headerInfo;
}
@end

We add the simplest thing that will work.

-(void) decodeAudio
{
	NSString *pretendData = @"pretendData";
	[audioSink headerWasDecoded:[NSDictionary dictionary]];
	[audioSink audioWasDecoded:
		[NSData dataWithBytes:[pretendData cStringUsingEncoding:NSUTF8StringEncoding] length:[pretendData length]]
	 ];
}

…And this calls for an additional method in our AudioSink protocol.

@protocol CCCSpeexAudioSink <NSObject>

-(void) audioWasDecoded:(NSData*) someDecodedAudio;
-(void) headerWasDecoded:(NSDictionary*) theAudioAttributes;
@end

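This bleeds back into the test, where we store the attributes as our header info. The audioWasDecoded: callback was never shown above; presumably it just appends to the mutable data field, so I’ve included it that way here. Add the following to the test case.

-(void) audioWasDecoded:(NSData*) someDecodedAudio
{
	[decodedAudio appendData:someDecodedAudio];
}

-(void) headerWasDecoded:(NSDictionary*) theAudioAttributes
{
	headerInfo = theAudioAttributes;
}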

Now we’ll look at individual attributes given to the sink during the parse. We set some expectations for numeric values mapped to specific keys in the header info.

-(void) testDecodeAudioHeaderInfoIncludesSpecificValues
{
	[decoder decodeAudio];
	NSNumber *value = [headerInfo valueForKey:@"sampleRate"];
	STAssertNotNil(value, @"Should have returned a number");
	value = [headerInfo valueForKey:@"frameSize"];
	STAssertNotNil(value, @"Should have returned a number");
	value = [headerInfo valueForKey:@"numberOfChannels"];
	STAssertNotNil(value, @"Should have returned a number");
	value = [headerInfo valueForKey:@"decodeBlockSize"];
	STAssertNotNil(value, @"Should have returned a number");
	value = [headerInfo valueForKey:@"framesPerPacket"];
	STAssertNotNil(value, @"Should have returned a number");
}

You’ll notice a pattern here, so we should do some refactoring.

-(void) assertNumericValueInDictionary:(NSDictionary*)aDictionary atKey:(NSString*)aKey
{
	NSNumber *value = [aDictionary valueForKey:aKey];
	STAssertNotNil(value, @"Should have returned a number");
}

-(void) testDecodeAudioHeaderInfoIncludesSpecificValues
{
	[decoder decodeAudio];
	[self assertNumericValueInDictionary:headerInfo atKey:@"sampleRate"];
	[self assertNumericValueInDictionary:headerInfo atKey:@"frameSize"];
	[self assertNumericValueInDictionary:headerInfo atKey:@"numberOfChannels"];
	[self assertNumericValueInDictionary:headerInfo atKey:@"decodeBlockSize"];
	[self assertNumericValueInDictionary:headerInfo atKey:@"framesPerPacket"];
}

Because I forget the attributes of the file provided, I’m going to use a discovery test technique. With this technique we use a dummy expected value in our assert and allow the assertion error message to tell us what the actual value is. I wouldn’t do this in normal testing. It’s only because I already have working code that I’m plugging in, and because this tutorial is getting wordy, that I’m going to take the cheap way out.

-(void) assertIntValue:(int)anInt isInDictionary:(NSDictionary*)aDictionary atKey:(NSString*)aKey
{
	NSNumber *value = [aDictionary valueForKey:aKey];
	STAssertNotNil(value, @"Should have returned a number");
	STAssertEquals([value intValue], anInt, @"Integer value %i should exist for key %@", anInt, aKey);
}

-(void) testDecodeAudioHeaderInfoIncludesSpecificValues
{
	[decoder decodeAudio];
	[self assertIntValue:-999 isInDictionary:headerInfo atKey:@"sampleRate"];
	[self assertIntValue:-999 isInDictionary:headerInfo atKey:@"frameSize"];
	[self assertIntValue:-999 isInDictionary:headerInfo atKey:@"numberOfChannels"];
	[self assertIntValue:-999 isInDictionary:headerInfo atKey:@"decodeBlockSize"];
	[self assertIntValue:-999 isInDictionary:headerInfo atKey:@"framesPerPacket"];
}

Once we implement the actual parsing logic we will start to see the actual values reported in the assertion errors. (I am adapting existing working code rather than developing the code from test cases.) We will pull the values from the errors back into the asserts to make the test pass and document what our expectations actually are.
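For orientation, most of the values those asserts are fishing for live in the Speex header, which is the first packet of a Speex stream. The sketch below is not the code from this series. It is a hand-rolled reader in plain C that assumes the 80-byte, little-endian layout of the SpeexHeader struct from speex/speex_header.h, and the names speex_fields and sketch_parse_speex_header are invented for illustration. libspeex ships its own speex_packet_to_header() for this job, which real code should prefer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The subset of header fields the test's dictionary keys ask about. */
typedef struct {
    int32_t rate;               /* "sampleRate" */
    int32_t nb_channels;        /* "numberOfChannels" */
    int32_t frame_size;         /* "frameSize" */
    int32_t frames_per_packet;  /* "framesPerPacket" */
} speex_fields;

/* Header integers are stored little-endian. */
static int32_t read_le32(const unsigned char *p) {
    return (int32_t)((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}

/* Assumed layout: 8-byte "Speex   " tag, 20-byte version string,
   then 13 consecutive 32-bit integers (80 bytes in total). */
int sketch_parse_speex_header(const unsigned char *pkt, size_t len,
                              speex_fields *out) {
    assert(pkt != NULL && out != NULL);
    if (len < 80) return 0;
    if (memcmp(pkt, "Speex   ", 8) != 0) return 0;
    out->rate              = read_le32(pkt + 36);
    out->nb_channels       = read_le32(pkt + 48);
    out->frame_size        = read_le32(pkt + 56);
    out->frames_per_packet = read_le32(pkt + 64);
    return 1;
}
```

Note that decodeBlockSize is not a header field in this layout, so that key would have to be computed by the decoder rather than read from the packet.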

Now we need to actually start pulling audio from our audio source abstraction. Because we used protocols, our test can pose (using the self-shunt pattern) as the audio source and provide data for the decoder. We step into the decoder and start doing some actual parsing.

-(void) decodeHeader
{
	[audioSink headerWasDecoded:[NSDictionary dictionary]];
}

-(void) decodeAudio
{
	NSString *pretendData = @"pretendData";
	[self decodeHeader];
	[audioSink audioWasDecoded:
		[NSData dataWithBytes:[pretendData cStringUsingEncoding:NSUTF8StringEncoding] length:[pretendData length]]
	 ];
}
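
The source side of the self-shunt arrangement hasn’t been filled in yet, so here is a rough sketch of the shape it could take, written in plain C with function pointers standing in for the two protocols. Everything here (audio_source_fn, audio_sink_fn, pump_audio, shunt) is hypothetical and only mirrors the idea. The “decoder” just copies bytes through, where a real one would decode between the read and the write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C stand-ins for the CCCSpeexAudioSource and
   CCCSpeexAudioSink protocols. */
typedef size_t (*audio_source_fn)(void *ctx, unsigned char *dst, size_t max);
typedef void   (*audio_sink_fn)(void *ctx, const unsigned char *src, size_t len);

/* Stand-in decoder: pulls chunks from the source until it runs dry and
   hands each chunk, untouched, to the sink. */
void pump_audio(audio_source_fn source, audio_sink_fn sink, void *ctx) {
    unsigned char chunk[4];   /* tiny on purpose, to force several reads */
    size_t n;
    assert(source != NULL && sink != NULL);
    while ((n = source(ctx, chunk, sizeof chunk)) > 0) {
        sink(ctx, chunk, n);
    }
}

/* The "test case": one object acting as both source and sink (self-shunt). */
typedef struct {
    const unsigned char *input;
    size_t input_len;
    size_t read_pos;
    unsigned char collected[64];
    size_t collected_len;
} shunt;

/* Source role: serve the next slice of the canned input. */
size_t shunt_read(void *ctx, unsigned char *dst, size_t max) {
    shunt *s = ctx;
    size_t left = s->input_len - s->read_pos;
    size_t n = left < max ? left : max;
    memcpy(dst, s->input + s->read_pos, n);
    s->read_pos += n;
    return n;
}

/* Sink role: accumulate whatever the decoder hands back. */
void shunt_write(void *ctx, const unsigned char *src, size_t len) {
    shunt *s = ctx;
    assert(s->collected_len + len <= sizeof s->collected);
    memcpy(s->collected + s->collected_len, src, len);
    s->collected_len += len;
}
```

Passing the same shunt to pump_audio as both source and sink lets the test feed in canned bytes and then inspect collected, with no file system involved.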

Importing OGG

At this point we have to import OGG for decoding the container so we can read the file meta data. Download and unpack libogg (not liboggz) from the Xiph.org download site.

We need to add the ogg header files to the header search path, so drag/drop the ogg folder from the include folder in the root of the unpacked directory (/path/to/libogg-1.2.1/include/ogg) into your Xcode project. Choose to copy the files from the dialog and select your static lib target before accepting the dialog. Delete config_types.h.in, Makefile.am and Makefile.in from this folder and group. (Also move them to trash.) Double click the project icon in the left tree pane and select the “Build” tab. Type “header search” in the search box at the top to narrow the options to the header search path. You need to add “$(SRCROOT)” as one of your header search path values here. Create an Xcode group for the ogg source code and drag/drop the “bitwise.c” and “framing.c” files from the unpacked libogg source folder (/path/to/libogg-1.2.1/src).
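
If you’re curious what framing.c will be chewing on, an OGG file is a sequence of pages, each starting with a small fixed header. The sketch below reads one such header by hand in plain C, assuming the page layout from the Ogg specification (27 fixed bytes followed by a segment table). The names sketch_page_info and sketch_parse_ogg_page are made up. In the real project libogg’s ogg_sync_pageout() does this work, CRC check included, and this sketch skips the CRC entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  header_type;  /* 0x01 continued, 0x02 first page, 0x04 last page */
    uint32_t serial;       /* identifies the logical stream */
    uint32_t sequence;     /* page counter within that stream */
    size_t   header_len;   /* 27 fixed bytes plus the segment table */
    size_t   body_len;     /* sum of the segment table entries */
} sketch_page_info;

/* Page header integers are stored little-endian. */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out if buf starts with a complete page header. */
int sketch_parse_ogg_page(const unsigned char *buf, size_t len,
                          sketch_page_info *out) {
    size_t i, segments;
    assert(buf != NULL && out != NULL);
    if (len < 27 || memcmp(buf, "OggS", 4) != 0) return 0;
    if (buf[4] != 0) return 0;            /* stream structure version */
    segments = buf[26];
    if (len < 27 + segments) return 0;    /* segment table is cut off */
    out->header_type = buf[5];
    out->serial      = read_le32(buf + 14);
    out->sequence    = read_le32(buf + 18);
    out->header_len  = 27 + segments;
    out->body_len    = 0;
    for (i = 0; i < segments; i++) {
        out->body_len += buf[27 + i];     /* each entry is 0..255 bytes */
    }
    return 1;
}
```

The page body, body_len bytes starting at header_len, is where the packets live, and the first packet of a Speex stream is the Speex header that carries the sample rate and friends.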

At this point building the unit test target should leave you with errors from the latest round of header info asserts, which we will fix in the next part of the series. We have a fully configured project with access to both the Speex and OGG encoding/decoding APIs, which is exciting. In the next part of the series we will tackle calling into these APIs to decode the data. I’m going to upload my part II example project to my box account so it will be in the right-hand pane for your downloading pleasure. Until next time…

(Some of you will have noticed I accidentally published this post the other day before finishing it. This is why I’m publishing it half baked tonight. There’s a lot here and a lot more to cover. Keep checking back for updates!)

10 Comments

  1. Josiah Hoskins

    Hey Cliff. I have been able to build Part II. But the Build and Run icon is grayed out. Is the project not runnable. Inquiring minds want to know.

    • That’s absolutely right! What you downloaded is an API, not an application. I am planning a part three to the tutorial where I will demonstrate how to use the API in an application. Until then, you can still use the API by including it in your own application. I should be specific. The project builds a static lib (a “.a” file) which can be dragged/dropped into an Xcode project. Once it’s dropped in you should be able to make function calls to your heart’s content.

      • yzh2002

        hello Cliff, your part II example project link was broken.
        I wonder if you can mail the project to me (yezehui@gmail.com)
        or repair the link.
        Thank you SoooooooooooMuch.

      • Thanx for the update! I’ve fixed the link and for future reference the file also appears in the right pane under Box.net files along with some of my older junk like mario-programming etc.

  2. Josiah Hoskins

    Great! I am looking forward to your part III.

  3. bob

    [NSData dataWithBytes:[pretendData cStringUsingEncoding:NSUTF8StringEncoding] length:[pretendData length]]
    is not correct because if pretendData contains multibyte characters, then [pretendData length] won’t be the length of the byte sequence

    anyhow you should use
    [pretendData dataUsingEncoding:NSUTF8StringEncoding]
    which will even handle null characters correctly

  4. hello cliff,great! I can’t download resource(part II) code, I wonder if you can mail the project to me (315884562@qq.com)

  5. Vimal Jain

    Hello Cliff,

    Thanks for this article.

    I have tried it and able to build it. I am going to try to make a speex encoder and decoder for iPhone. I have looked other blogs and lists and it seems you have also tried to go ahead with your plan of part 3 (sample example of working speex on iPhone). If you can share your experiences with us, it would be great help.

    Thanks again for providing a direction.

  6. Vimal Jain

    I have made one sample to play a speex file in iPhone app and it works great.

    Thanks again.

  7. Sarab

    Vimal, Would you be able to share your source code? email sarab7_sep@yahoo.com. Appreciate it in advance.
