Networking, sockets, and UDP Streaming fun! Part III


I haven’t posted an update in a while, and when I looked back I realized that I almost forgot about this series I started. Hi, I’m Cliff. I like to post topics of interest, start a random series, and abandon it part way in. If you’ve been here before then you probably already knew that. Today we’re going to take a deep look at the UDP component of my streaming audio experiment. Actually, it never was my experiment; I borrowed it from some forum I found online, but that’s irrelevant. In my last post I explained at a high level how audio is captured on the server and covered the basics of how audio works. I also hinted at two key methods in the program, sendThruUDP() and receiveThruUDP(). These two methods send audio bytes over the network from the server and receive them on the client, respectively.

Rapid audio packets
Going back to my last post I highlighted the following block of code:

byte tempBuffer[] = new byte[1000] ;
int cnt = 0 ;
while( true )
{
   targetDataLine.read( tempBuffer , 0 , tempBuffer.length );
   sendThruUDP( tempBuffer ) ;
}

This is what we, in Silicon Valley, call a tight loop. It is a while statement based on a literal boolean value which will never be false, meaning the loop will continue indefinitely. The reason it is tight is that using a literal removes the need for an additional conditional check that would slow down each iteration. I hinted at the importance of speed when I illustrated this loop. When streaming audio and/or video data in real time you want to do everything possible to reduce the overhead of sending the data over the network. With this in mind, let’s look inside sendThruUDP():

    public static void sendThruUDP( byte soundpacket[] )
    {
       try
       {
          DatagramSocket sock = new DatagramSocket() ;
          sock.send( new DatagramPacket( soundpacket , soundpacket.length , InetAddress.getByName( IP_TO_STREAM_TO ) , PORT_TO_STREAM_TO ) ) ;
          sock.close() ;
       }
       catch( Exception e )
       {
          e.printStackTrace() ;
          System.out.println( " Unable to send soundpacket using UDP " ) ;
       }
    }

There’s a lot happening inside this method even though there are very few lines of code visible. Here we see code which starts by creating a DatagramSocket object. It then creates a DatagramPacket object and stuffs the packet object full of sound packet bytes using the constructor parameters. We also pass the length of the packet byte array along with the IP address and port of the client that we are streaming to. On the same line that we create the DatagramPacket we call send on the DatagramSocket instance, passing this newly created DatagramPacket object. The send() method will take this packet, which contains the raw audio data, and send it to the IP address and port recorded inside the DatagramPacket. We end the method by closing the datagram socket, then control returns to the loop that originally called it.

The work of object construction/destruction
Our first series of major problems is right here in this method. Remember what I said above about reducing overhead? Well, there is a ton of overhead in this method, much of it in the form of constructors. Object constructors often contain some of the most expensive parts of a program, so you want to call them as infrequently as possible. Java attempts to make programming fun and simple by hiding the many details of what your operating system and hardware are doing behind the scenes, but in reality it helps to have a cursory understanding of what happens in general. Start with DatagramSocket(). This isn’t just a magical object. (Let’s try to imagine what ultimately needs to take place for sound to fly from one machine to another.) In reality the object has to establish a communication bridge between your program and the operating system, and eventually with your network card. This work would most likely happen in the constructor. Now consider the DatagramPacket object constructor. It doesn’t have to do as much work, however it does need to set aside (or allocate) a chunk of memory to hold the audio data. You may tend to ignore it, but allocating memory also takes some time. (As a Java programmer you are not supposed to think about memory allocation because it’s done auto-magically for you!) The operating system has to scan the available RAM and sometimes shuffle things a bit to find room for what you want to do. Finally, the call to sock.close() adds even more overhead. The close() call destroys all of the bridge work that was established in the constructor.
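If you want to see this overhead for yourself, a quick and dirty timing sketch like the one below makes the point. (This isn’t part of the streaming program, it’s just an illustration, and the exact numbers will vary by machine and JVM.)

    import java.net.DatagramSocket ;

    public class SocketOverheadDemo
    {
       public static void main( String[] args ) throws Exception
       {
          long start = System.nanoTime() ;
          for( int i = 0 ; i < 1000 ; i++ )
          {
             // build the "bridge" between the program, the OS, and the network stack...
             DatagramSocket sock = new DatagramSocket() ;
             // ...then immediately demolish it, just like the streaming loop does
             sock.close() ;
          }
          long elapsedMillis = ( System.nanoTime() - start ) / 1000000 ;
          System.out.println( "1000 create/close cycles took " + elapsedMillis + " ms" ) ;
       }
    }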

Visualization
To visualize what is happening, imagine you wanted to carry a bunch of wood from Home Depot to your condo. Pretend you needed a truck and that there was a bridge between Home Depot and where you lived across town. Let’s say the truck could only carry so many blocks of wood and required several trips back and forth between your home and Lowes. (Yes, I started the analogy with Home Depot, but work with me, Lowes is easier to type.) The bridge represents the DatagramSocket, the truck would be the DatagramPacket, and the repeat trips would be the while loop calling the method. What this method does is build the bridge, then build the truck, before driving a single load of wood home. It then places dynamite under the bridge and under the truck, completely demolishing them, before returning to Home Depot for the next load. (The sock.close() method is represented by the sticks of dynamite in my analogy.) Hopefully you can imagine how inefficient it is to move all of the wood from Lowes to your condo. If there were a crew of wood workers at your home they would be annoyed by how long it took for each load of wood to arrive, and there would be a lag in their productivity. On each trip they would likely take a coffee break and watch an episode of Judge Judy.

We’ve found one major source of lag in our program but now let’s look at the client logic. Recall how I highlighted the run() method in the client?

public void run()
{
    byte b[] = null ;
    while( true )
    {
       b = receiveThruUDP() ; 
       toSpeaker( b ) ;
    }        
}

This is another tight loop which is intended to be fast. It calls the receiveThruUDP() method on each iteration to receive bytes from the network into a byte array variable, then passes them to the speaker. Inside the receiveThruUDP() method we have the following:

    public static byte[] receiveThruUDP()
    {
       try
       {
          DatagramSocket sock = new DatagramSocket( PORT_TO_STREAM_TO ) ;
          byte soundpacket[] = new byte[1000] ;
          DatagramPacket datagram = new DatagramPacket( soundpacket , soundpacket.length , InetAddress.getByName( IP_TO_STREAM_TO ) , PORT_TO_STREAM_TO ) ;
          sock.receive( datagram ) ;
          sock.close() ;
          return datagram.getData() ; // soundpacket
       }
       catch( Exception e )
       {
          e.printStackTrace() ;
          System.out.println( " Unable to receive soundpacket using UDP " ) ;
          return null ;
       }
    }

This method begins by creating a DatagramSocket. It then creates a byte array of 1000 bytes, and goes on to create a DatagramPacket, passing the empty sound packet byte array, its length, and the IP and port we are streaming to. Next it calls receive on the socket, passing the empty datagram packet. The receive method will fill the packet with the sound data that was sent from the server. Finally, the method ends by calling close on the socket and returning the received data to the calling code. Again, the logic in the method is intended to be fast. However, based on what we learned above we can identify some very similar inefficiencies in this method. Creating a socket establishes a bridge with your operating system and your network card, creating the sound packet array and the DatagramPacket each needs to allocate memory, closing the socket destroys all of the communication bridge that was set up in the beginning, and then the entire process is repeated on each iteration.

How do we optimize away these inefficiencies? The simplest thing would be to move all object construction out of the inefficient methods. You also don’t want to call close() in either the send or receive method. Instead you want to create the objects once when the program starts and reuse them inside the send and receive methods. There are likely other inefficiencies in the program, but these are, by far, the most critical. Like I said earlier, I was amazed the program ran and produced any audio at all, as the logic is terribly inefficient. The fun part of the project was working incrementally through both the client and the server while running the app and hearing the incremental improvements in real time. In other words, I was running both the client and the server and listening from the client. I would then make small optimizations on either the client or server, recompile, and re-run. The compile and run step happens quickly, causing only a brief interruption in the audio. The result felt like I was massaging the sound into the correct shape… truly amazing!
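To make that concrete, here is a rough sketch of the sending side with the construction pulled out of the hot path. The initUDP() method and the field names are mine, not the original program’s; the sketch assumes it lives in the same class as the original, with the same IP_TO_STREAM_TO and PORT_TO_STREAM_TO constants and java.net imports.

    // Build the expensive objects once, up front, and reuse them on every send.
    private static DatagramSocket sock ;
    private static DatagramPacket packet ;

    public static void initUDP() throws Exception
    {
       sock = new DatagramSocket() ;
       // size the packet for our 1000 byte buffer and point it at the client one time
       packet = new DatagramPacket( new byte[1000] , 1000 ,
                   InetAddress.getByName( IP_TO_STREAM_TO ) , PORT_TO_STREAM_TO ) ;
    }

    public static void sendThruUDP( byte soundpacket[] )
    {
       try
       {
          // reuse the existing packet; just swap in the fresh audio bytes
          packet.setData( soundpacket , 0 , soundpacket.length ) ;
          sock.send( packet ) ;
          // no sock.close() here -- the socket lives for the life of the program
       }
       catch( Exception e )
       {
          e.printStackTrace() ;
          System.out.println( " Unable to send soundpacket using UDP " ) ;
       }
    }

The receiver gets the same treatment: construct its DatagramSocket (bound to PORT_TO_STREAM_TO) and a reusable DatagramPacket once at startup, call sock.receive( packet ) inside receiveThruUDP(), and don’t close anything until the program shuts down.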

I’m going to continue on this path of discovering the capabilities and possibilities of network streaming. Stay tuned for additional updates.

Networking, sockets, and UDP Streaming fun! Part II


Yesterday, in my arrogance, I spoke about some old UDP audio streaming logic I found online. I suggested that I could see where the choppiness problems were and fix them. Today I couldn’t help myself. I dove in and ran the code between two Mac computers to hear how it sounded. I was amazed that the code worked at all without any modification! Hi, I’m Cliff. You’re here because you wanna hear how it sounds to break your voice up into tiny little ones and zeroes on one device then reassemble them on another. I’m here because I’m just fortunate enough to be able to do this task. Picking up from yesterday, I can proudly announce that I have indeed solved the mystery of choppy audio! It took about 20-30 minutes of copying/pasting the original code and tweaking it to sound natural. My goal is to give you the answer to the mysterious choppy problem but…! Before I do so, let’s try to understand what the code is actually doing. There are two components, a server and a client.

The Server
We’ll look first at the server. It starts off in the main() method and prints information about the audio system’s connected devices to the console.

    Mixer.Info minfo[] = AudioSystem.getMixerInfo() ;
    for( int i = 0 ; i < minfo.length ; i++ )
    {
       System.out.println( minfo[i] ) ;
    }

It then looks to see if the Microphone audio line is supported before proceeding.

if (AudioSystem.isLineSupported(Port.Info.MICROPHONE))

This code merely detects whether there is any sort of “microphone-like” input attached to the computer.

It then opens a target data line for the microphone and starts it.

      DataLine.Info dataLineInfo = new DataLine.Info( TargetDataLine.class , getAudioFormat() ) ;
      TargetDataLine targetDataLine = (TargetDataLine)AudioSystem.getLine( dataLineInfo  ) ;
      targetDataLine.open( getAudioFormat() );
      targetDataLine.start();

These API calls [above] create a special DataLine.Info object instance which describes the format of the audio we wish to capture from the microphone. We call a getAudioFormat() method where the details of the audio format are encoded and returned in a special AudioFormat object instance. Without going into too much detail, understand that audio can come in many shapes and sizes depending on whether you want stereo, mono, high fidelity, or whatever. The AudioFormat class models things such as sample rate, sample size, number of channels, etc. The format is given to the DataLine.Info constructor, which creates an instance that we pass to the AudioSystem API via the getLine() method. At this point we have a connection to the microphone which we can open() and start() to capture audio samples.
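The getAudioFormat() helper itself isn’t shown in this post, but it’s a tiny method. A typical version looks roughly like the sketch below; the specific sample rate and bit depth are illustrative guesses, and the original forum code may use different values. (It assumes javax.sound.sampled.AudioFormat is imported.)

    public static AudioFormat getAudioFormat()
    {
       float sampleRate = 8000.0F ;     // samples captured per second
       int sampleSizeInBits = 8 ;       // bits per sample
       int channels = 1 ;               // mono
       boolean signed = true ;          // samples are signed values
       boolean bigEndian = true ;       // byte order for multi-byte samples
       return new AudioFormat( sampleRate , sampleSizeInBits , channels , signed , bigEndian ) ;
    }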

How Audio works
For those of you who are very green to programming and/or audio, I’ll explain briefly how audio, or sound in general, works in computer systems. This all may seem pretty complicated, but audio is one of the simplest concepts you can learn and is actually pretty cool when you understand its basic building blocks. Sound is merely the vibration of matter, which our ears detect using tiny little hairs. The speed of the vibration makes the different noises that we hear. The vibrations come in various wave forms with different frequencies. Computers record and create sound using a process called digitization, where frequencies, which are analog in nature, are converted to and from digits. A computer captures digital audio using a microphone, a piece of hardware that measures vibrations, or wave forms, in the air; it takes a series of samples of the intensity of the wave at specific intervals. It creates, or synthesizes, audio by sending recorded samples to a speaker, which includes a paper cone that vibrates based on the size of the values in each sample. In short, the bigger the value of a sample, the more intensely the paper cone vibrates. You can think of a value of 16 as causing a small vibration where a value of 128 would cause a much more intense vibration of the paper cone inside the speaker. If a bunch of samples are sent to the cone quickly the vibrations happen quickly; if they are sent slowly then the vibrations occur slowly. The combination of the speed and intensity of the vibrations of the paper creates the noise or sound that we hear. I’m over-simplifying and glossing over a lot, but that is the basics of sound. The samples and sample speed are the key to sound!
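To make the idea of samples concrete, here’s a toy snippet (not part of the streaming program) that builds one second of a 440 Hz tone as signed 8-bit samples. The bigger a sample’s value, the harder the cone gets pushed; the sample rate controls how quickly those pushes arrive.

    // Toy example: synthesize one second of a 440 Hz sine wave as signed 8-bit samples.
    float sampleRate = 8000.0F ;                      // measurements per second
    byte[] samples = new byte[ (int) sampleRate ] ;   // one second worth of samples
    for( int i = 0 ; i < samples.length ; i++ )
    {
       double t = i / sampleRate ;                    // time of this sample in seconds
       // an amplitude of 100 keeps us inside the signed byte range of -128..127
       samples[i] = (byte) ( 100 * Math.sin( 2 * Math.PI * 440 * t ) ) ;
    }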

Going back to our example, the guts of our server are below:

      byte tempBuffer[] = new byte[1000] ;
      int cnt = 0 ;
      while( true )
      {
         targetDataLine.read( tempBuffer , 0 , tempBuffer.length ) ;
         sendThruUDP( tempBuffer ) ;
      }

Here we see a byte buffer being created (with a strange not-round length of 1000 instead of 1024), and we enter a loop where we continually pass the buffer between two methods, targetDataLine.read() and sendThruUDP(). The first method reads a block of samples from the microphone (which, as described above, measures the vibrations in the air) and writes those samples into the buffer. The second method sends the buffer of samples over UDP to the client.

The Client
We’ll now turn our attention over to the client. The client is a RadioReceiver, which extends Thread. As it turns out, this is unnecessary complexity, as there is no work being done other than playing back the captured audio samples. We can safely ignore the Thread part and pay attention to the code inside the run method, which is invoked indirectly by the call to r.start() in the main method below.

    public static void main( String[] args )
    {
       RadioReceiver r = new RadioReceiver() ;
       r.start() ;
    }

The run method below declares a byte array which is used in a while loop. Inside the while loop we load the byte array variable with the result from the receiveThruUDP() method. This method attempts to capture sample data sent over the network and return it to the caller. In short, the byte array is loaded with the samples received from the network, which were originally captured and sent by the server. We then pass the array of samples to a method called toSpeaker. This method eventually hands the samples to a Java API called SourceDataLine.write(). This Java API will eventually send the samples to the speaker, which causes the paper cone to vibrate and recreate the sound on the client. See the run snippet below (a sketch of what toSpeaker() might look like follows it):

    public void run()
    {
        byte b[] = null ;
        while( true )
        {
           b = receiveThruUDP() ; 
           toSpeaker( b ) ;
        }        
    }
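As promised, here’s a sketch of what a toSpeaker() helper along those lines might look like. The original’s version isn’t reproduced in this post, so treat the details as illustrative; it assumes the same getAudioFormat() helper and the javax.sound.sampled imports used by the server.

    public static void toSpeaker( byte soundbytes[] )
    {
       try
       {
          // ask the audio system for an output line matching our capture format...
          DataLine.Info info = new DataLine.Info( SourceDataLine.class , getAudioFormat() ) ;
          SourceDataLine sourceDataLine = (SourceDataLine) AudioSystem.getLine( info ) ;
          sourceDataLine.open( getAudioFormat() ) ;
          sourceDataLine.start() ;
          // ...then hand it the samples, which ultimately drive the speaker cone
          sourceDataLine.write( soundbytes , 0 , soundbytes.length ) ;
          sourceDataLine.drain() ;
          sourceDataLine.close() ;
       }
       catch( Exception e )
       {
          System.out.println( " Unable to play soundpacket through the speaker " ) ;
          e.printStackTrace() ;
       }
    }

Notice that a helper written this way opens and closes its line on every call; if the original does anything similar, that is exactly the kind of per-call overhead worth hunting for when you go looking for the source of the chop.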

That’s the basics of how the whole operation works. There’s not a terribly large amount of code, and I haven’t described the UDP networking pieces at all. I’ll continue this series by explaining UDP next. In the meantime, keep looking to see if you too can figure out where things are slowing, **ahem**, chopping up. Keep in mind how I explained the key components of audio: sample intensity and the frequency, or speed, of the samples.