Voice over mobile… from an agile perspective…

So I did this really spontaneous thing last week because the new lead VP of AOL Local and Maps came down from on high. I love how that sounds… "came down from on high". It's like these executives descend from their cloud cathedrals and bless us mere propeller heads with a visit. As it turns out, these guys are not heavenly at all. They get constipation just like I do when my diet doesn't include an adequate supply of fiber and liquid.

Hi, I'm Cliff. You're here because you don't think it's possible to develop a mobile voice application using agile practices. I mean, you can't honestly test drive something like text-to-speech or voice recognition… can you?

Anyhow, like I was saying, I did a really "out-of-the-blue" thing. I'd been playing with text-to-speech ever since I started mobile development, and I'd been fooling with voice recognition too, trying to get a handle on it. So when my director told us the execs were coming to town, I thought, "This would be an ideal time to throw my work in front of their faces and see what kind of reaction I get." At this point I had already sent voice on a double round trip: voice went from the iPhone to the server, was pinged back as text, then the text was ponged back to the server, which pinged back more voice. The example was simple, but it demonstrated that we could not only do voice output on mobile but also concatenate multiple audio streams in the client and perform recognition. I also had a good amount of confidence in my TTS engine, enough to spend the previous weekend roping an ugly hack into one of our premier products. (We have two of them now… go grab the Navigator from the App Store!)

I received a favorable response from everyone who saw my prototype. So favorable that a follow-up demo was scheduled for the next day on my behalf. (I missed my own follow-up demo.) That energized me even further, and two nights ago I decided to begin officially bundling my idea. It's one thing to hack something together, but I needed a polished solution. I began to get back into my comfort zone with test-driven design. My warm, fuzzy place. (Lately I'd been combing through miles of piles of other people's untested, non-reusable code, regurgitating the bad practices I'd been swamped with.)

It began with a test. No. It began with a spec.
Given the text, “hello Miami!”
When I ask voiceOutput to speak the text
Then I should get an audio response in return.
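To make the spec concrete, here is a minimal sketch in Java (the post leans on JUnit and Java-interface analogies; the original work was Objective-C with OCUnit). The names `VoiceOutput`, `AudioSource`, and `RecordingAudioSource` are hypothetical, not taken from the author's code. The fake source never makes a sound; it only records the text it was handed and returns placeholder bytes, which is all the spec asks for.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical seam for whatever eventually produces audio for a sentence.
interface AudioSource {
    byte[] audioFor(String text);
}

// Hypothetical class standing in for the spec's "voiceOutput".
class VoiceOutput {
    private final AudioSource source;

    VoiceOutput(AudioSource source) {
        this.source = source;
    }

    // Accept text, answer with an audio stream (raw bytes in this sketch).
    byte[] speak(String text) {
        return source.audioFor(text);
    }
}

// Fake source: no speaker, no engine. It remembers each request and
// returns placeholder bytes so the test can check the round trip.
class RecordingAudioSource implements AudioSource {
    final List<String> requests = new ArrayList<>();

    public byte[] audioFor(String text) {
        requests.add(text);
        return ("AUDIO:" + text).getBytes(StandardCharsets.UTF_8);
    }
}
```

A test against it reads almost exactly like the spec: build a `VoiceOutput` around a `RecordingAudioSource`, call `speak("hello Miami!")`, then check that non-empty bytes came back and that the source was asked for exactly that sentence.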

I then started to chip away. Nothing was talking. No sounds or audio files were connected to my code. It was just how I remembered it. The sweet smell of JUnit… err… OCUnit… or SenTest… somethin' or other… I began to dream while I wrote test after test, each followed by implementation logic. "What will the end picture look like?" Lower-level details tried to sneak into my head, but I fought them off… each time refocusing on the test before me. The task was only to accept text input and respond with an audio stream.

At the outset it seems difficult, if not impractical. How does one validate that the voice engine is working? How do you verify the thing speaks… with sound your ears can pick up? How does it all work with unit tests? I tend to get tripped up on these details just as much as anyone who picks up TDD. When you look at the same problem, the same question, from a slightly different angle (in reverse, even), the answer becomes obvious. Put the "D" in front of TDD. We aren't testing, we're designing! It's Design Driven by Tests, or DDT. If you think test, you think things like "how can I plug my eardrum into SenTestCase?" or "how can I wire my optic nerve up to the JUnit test runner?" When you think design, you no longer care whether it works right; you care whether you have the right works. So, back to my original question: "How does one validate that the voice engine is working?" Let's instead ask, "How does one build the right voice engine?" The voice engine is right when it answers the question. The question becomes the point of focus. The question is the specification. So I don't verify audio output. I verify the interface and the logic in between. I didn't write the voice engine code, so verifying it is a wasted exercise. I only care about the code that sits between my phone and the voice engine.

On the outside, I know that my phone code deals with English sentences. I also know that it is rather complicated, so changing it to "fit into" a voice engine would be an exercise in complexity. Now that I'm not testing audio output or the engine that produces it, I can forget about what the engine expects… for now. I continue designing with specs and answers to specs. As long as my code returns an audio stream on the outside, I know I'm finished. The converter code is looking a lot simpler than one would imagine. Next, I know I'll need to get the stream from somewhere, so I write the code to get the audio data from another thing. (Here's where I make a subtle misstep.) I start thinking about caching. Because I'm worried about cell network latency, I want to include an on-device cache.
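The caching design described above might look roughly like the following sketch, in Java with hypothetical names (`TextToAudioConverter`, `AudioDataProvider`, `AudioCache`, `InMemoryAudioCache`, `CountingProvider`); none of these come from the author's code. The converter checks an on-device cache first and falls back to the network provider only on a miss, storing whatever it fetches.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical network data provider (an Objective-C protocol in the
// original work; a Java interface plays the same role here).
interface AudioDataProvider {
    byte[] fetchAudio(String text);
}

// Hypothetical on-device cache keyed by sentence; lookup returns null on a miss.
interface AudioCache {
    byte[] lookup(String text);
    void store(String text, byte[] audio);
}

class InMemoryAudioCache implements AudioCache {
    private final Map<String, byte[]> entries = new HashMap<>();

    public byte[] lookup(String text) {
        return entries.get(text);
    }

    public void store(String text, byte[] audio) {
        entries.put(text, audio);
    }
}

// Text in, audio out. A repeated sentence is served from the cache
// instead of paying for another cell-network round trip.
class TextToAudioConverter {
    private final AudioCache cache;
    private final AudioDataProvider provider;

    TextToAudioConverter(AudioCache cache, AudioDataProvider provider) {
        this.cache = cache;
        this.provider = provider;
    }

    byte[] convert(String text) {
        byte[] audio = cache.lookup(text);
        if (audio == null) {
            audio = provider.fetchAudio(text);
            cache.store(text, audio);
        }
        return audio;
    }
}

// Counts "network" calls so a test can observe the caching behavior.
class CountingProvider implements AudioDataProvider {
    int calls = 0;

    public byte[] fetchAudio(String text) {
        calls++;
        return ("AUDIO:" + text).getBytes(StandardCharsets.UTF_8);
    }
}
```

Converting the same sentence twice should reach the provider only once. This is the cache-first shape the post flags as a possible misstep; one alternative, offered as a suggestion rather than the author's fix, is a caching decorator that implements `AudioDataProvider` and wraps the real one, so the converter never has to know a cache exists.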

I continue by adding a test that states (literally) that I'll probably need a cache. The test asks for text to be converted to audio without a cache configured, then triggers a test failure if an exception isn't thrown. The failure message says something like, "an exception should be thrown if a cache is not configured." I satisfy this requirement and add more tests that anticipate cache inquiries prior to any conversion. I use the self-shunt pattern rather than OCMock objects because I don't want to add a framework (another layer of complexity) to my project just yet. I drive on with more tests that eventually shape the cache and also a network data provider that exists only as a protocol (analogous to a Java interface). I'm feeling good at this point. Not finished, not even close… but I have good progress with a relatively small amount of code. I'm also feeling slightly more comfortable with both Xcode and Objective-C. I feel best about the fact that I can actually test-drive a feature that would likely seem impossible to most developers. Add to that an acceptable demo for our Senior VP team, one that got people giving the thumbs up. I think I can sleep well tonight.
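Here is a sketch of the "cache must be configured" test and the cache-inquiry test using self-shunt, in Java with hypothetical names (`ConverterSpec`, `CacheNotConfiguredException`, `TextToAudioConverter`, `AudioCache`, `AudioDataProvider`). In self-shunt, the test class itself implements the collaborator interfaces, here both the cache and the network provider, and passes itself in, so it can log every call without a mocking framework such as OCMock.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical network data provider (an Objective-C protocol in the original work).
interface AudioDataProvider {
    byte[] fetchAudio(String text);
}

// Hypothetical on-device cache; lookup returns null on a miss.
interface AudioCache {
    byte[] lookup(String text);
    void store(String text, byte[] audio);
}

class CacheNotConfiguredException extends RuntimeException {
    CacheNotConfiguredException() {
        super("no cache configured");
    }
}

class TextToAudioConverter {
    private final AudioDataProvider provider;
    private AudioCache cache; // must be set before convert() is called

    TextToAudioConverter(AudioDataProvider provider) {
        this.provider = provider;
    }

    void setCache(AudioCache cache) {
        this.cache = cache;
    }

    byte[] convert(String text) {
        if (cache == null) {
            throw new CacheNotConfiguredException();
        }
        byte[] audio = cache.lookup(text); // ask the cache before any network work
        if (audio == null) {
            audio = provider.fetchAudio(text);
            cache.store(text, audio);
        }
        return audio;
    }
}

// Self-shunt: the spec class is its own cache and provider, logging each call.
class ConverterSpec implements AudioCache, AudioDataProvider {
    final List<String> calls = new ArrayList<>();

    public byte[] lookup(String text) { calls.add("lookup:" + text); return null; }
    public void store(String text, byte[] audio) { calls.add("store:" + text); }
    public byte[] fetchAudio(String text) { calls.add("fetch:" + text); return new byte[] {42}; }

    // "an exception should be thrown if a cache is not configured"
    boolean throwsWithoutCache() {
        TextToAudioConverter converter = new TextToAudioConverter(this);
        try {
            converter.convert("hello Miami!");
            return false; // no exception, so the spec fails
        } catch (CacheNotConfiguredException expected) {
            return true;
        }
    }

    // The cache is asked first; on a miss the provider is called and the result stored.
    boolean asksCacheBeforeFetching() {
        TextToAudioConverter converter = new TextToAudioConverter(this);
        converter.setCache(this);
        converter.convert("hello Miami!");
        return calls.equals(Arrays.asList(
                "lookup:hello Miami!", "fetch:hello Miami!", "store:hello Miami!"));
    }
}
```

Because the spec object records the order of calls, the "cache inquiry before conversion" requirement becomes a plain list comparison rather than a mock expectation, which keeps the project free of an extra framework for now.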