

Google Talk & Fun with XMPP



How often, when signing into GTalk (Google Talk), have you wondered what might be happening under the hood? Have you thought about what this little application might be cooking as you type those letters? How can it tell you that your friend is typing just as she has, in fact, started typing? How does it manage to show all that real-time presence information?
Well, one day, I got irresistibly curious and decided to open it up! In this two part article, I share my thrilling adventures as I unravel the way GTalk does what it is best at — Communication.



The basics

First of all, let's remember that GTalk or any such communication application has to be just a socket program at the core. A socket program is a networking program which is usually targeted at a specific protocol. TCP/IP is the most widely used and supported communication protocol for the internet. Most of the protocols we come across in our mundane lives such as HTTP, FTP, SMTP etc, are all based on TCP/IP.
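To make the idea of a "socket program" concrete, here is a minimal sketch in Java (assuming Java 8 or later). It is not GTalk code: the class name LoopbackEcho and its methods are made up for illustration. It starts a tiny echo server on a free local port, connects to it as a client, sends one line of text and reads the reply, all over TCP/IP.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: a client and a one-shot server talking over a local TCP socket.
final class LoopbackEcho {

    // Sends one line to a throwaway echo server and returns the server's reply.
    static String roundTrip(String line) {
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread echo = new Thread(() -> serveOnce(server));
            echo.start();

            String reply;
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                writer(client).println(line);       // request
                reply = reader(client).readLine();  // response
            }
            echo.join();
            return reply;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Accepts a single connection, reads one line and echoes it back.
    private static void serveOnce(ServerSocket server) {
        try (Socket peer = server.accept()) {
            String received = reader(peer).readLine();
            writer(peer).println("echo: " + received);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static BufferedReader reader(Socket socket) throws IOException {
        return new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket socket) throws IOException {
        // autoFlush = true, so println() pushes the line onto the wire immediately
        return new PrintWriter(
                new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello")); // prints "echo: hello"
    }
}
```

A real chat client does the same thing at its core, except that the server lives on another machine and the lines exchanged follow a well-defined protocol.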



Identifying the protocol GTalk uses

Having established that, our next task is to find out which protocol GTalk actually uses. There are two ways to do this. The first one is to simply query Google itself for the answer. The second method is a little more complex and hence, thrilling. I shall stick with the second one.
There are a number of network tracing and analysis tools available on the internet. These are powerful tools capable of revealing magical details about protocols, TCP/IP packets and so on. The one I have chosen is called Ethereal. It is a highly sophisticated tool that can analyze live TCP/IP packets. What's more, it even comes with a UI for doing all this and, best of all, it is free, open source software.
After having set up Ethereal properly, I open the GTalk client and sign in while live capture of TCP/IP packets is in progress. Here's what I get.


Screenshot of Ethereal protocol analyzer live trace showing Jabber packets. Observe rows numbered 120 and 122.
It's a widely known fact, perhaps, that GTalk always connects to its server, talk.google.com. When we ping talk.google.com, the name resolves to the IP address 209.85.137.125 on my computer as of today. When we look back at the above trace, we find rows whose "Destination" column has a value of 209.85.137.125 (observe rows numbered 120 and 124). The value of the "Source" column for these rows is 192.168.200.190. This happens to be my computer's IP address! Thus we can conclude that the rows we are looking at are the TCP/IP packets sent from my computer to the GTalk server, talk.google.com!
Similarly, rows such as 122, 128 and 131 are packets that talk.google.com has sent to my computer in response to those requests. Now that we've identified the important packets, it's a simple task to read the value of the "Protocol" column and say that it is Jabber.



More about Jabber (and XMPP)

Quoting from the home of jabber, www.jabber.org
Jabber is best known as "the Linux of instant messaging" — an open, secure, ad-free alternative to consumer IM services like AIM, ICQ, MSN, and Yahoo. Under the hood, Jabber is a set of streaming XML protocols and technologies that enable any two entities on the Internet to exchange messages, presence, and other structured information in close to real time.
Jabber defines a host of sub-protocols, XMPP being the core of them. XMPP stands for eXtensible Messaging and Presence Protocol. In October 2004, it was adopted by the IETF community and the specification is available as an RFC numbered 3920.
To cover XMPP specifics is out of scope of this article. In summary, XMPP is an XML based communications protocol. This means all requests/responses happen through XML. The GTalk client sends requests as XML messages to its server at talk.google.com and receives responses also as XML messages. In the next section, we shall see what forms the essence of XMPP while we try out some experiments.



Raw XMPP communication with talk.google.com

How about talking to the GTalk server using its native language? Well, this is not as farfetched as it sounds. But before we attempt to do anything like that, we should understand the nature of the XMPP protocol.
XMPP defines two fundamental terms with respect to messaging — Streams and Stanzas. Here's how we can define them:
Stream:  A Stream is an open XML envelope sent before exchanging more XML elements between two entities. These entities can be either the client or the server. These XML elements are known as stanzas as we learn in the next definition. Streams are always the root elements. They start with an optional XML Processing Instruction (Prolog) followed by an unterminated element. The Stream contains other information such as the server it is addressed to, the version of protocol used and various namespace declarations.
Stanza:  A Stanza is a specific, well formed and complete XML element which either of the entities sends within an already open XML Stream. Stanzas are always the first level children in the XML document. XMPP Core defines three types of Stanzas, viz. <message/>, <presence/> and <iq/>.
Entities can send any number of these Stanzas within an open Stream. All other information is sent as nested elements or attributes of these core Stanzas. Further details of these Stanzas are again beyond the scope of this article.
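The relationship between a Stream and its Stanzas can be sketched with a few lines of Java using the StAX parser that ships with the JDK. This is only an illustration under my own assumptions: the class name StanzaScanner and the sample addresses are made up, and a real client would read from a socket rather than from a String. The sketch walks through an XML Stream and collects the names of the first-level children, which are the Stanzas.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: lists the Stanzas (first-level children) found inside a Stream.
final class StanzaScanner {

    static List<String> stanzaNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            int depth = 0; // 1 = the Stream element itself, 2 = a Stanza
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == 2) {
                        names.add(reader.getLocalName());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String stream =
              "<stream:stream to='gmail.com' version='1.0' xmlns='jabber:client'"
            + " xmlns:stream='http://etherx.jabber.org/streams'>"
            + "<presence/>"
            + "<message to='romeo@gmail.com'><body>Hello</body></message>"
            + "<iq type='get' id='r1'><query xmlns='jabber:iq:roster'/></iq>"
            + "</stream:stream>";
        System.out.println(stanzaNames(stream)); // prints [presence, message, iq]
    }
}
```

Note that the nested body and query elements are not reported, because they sit one level deeper than the Stanzas that carry them.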
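Now let's see what happens in a simple session of the client with the server. The following shows a typical interaction between the two, as given in the specification of XMPP Core, RFC 3920, where many more such examples can be found. For ease of reading, lines sent by the client are prefixed with "C:" and those sent by the server with "S:".

    C: <?xml version='1.0'?>
       <stream:stream
           to='example.com'
           xmlns='jabber:client'
           xmlns:stream='http://etherx.jabber.org/streams'
           version='1.0'>
    S: <?xml version='1.0'?>
       <stream:stream
           from='example.com'
           id='someid'
           xmlns='jabber:client'
           xmlns:stream='http://etherx.jabber.org/streams'
           version='1.0'>
    ... encryption, authentication, and resource binding ...
    C:   <message from='juliet@example.com'
                  to='romeo@example.net'
                  xml:lang='en'>
    C:     <body>Art thou not Romeo, and a Montague?</body>
    C:   </message>
    S:   <message from='romeo@example.net'
                  to='juliet@example.com'
                  xml:lang='en'>
    S:     <body>Neither, fair saint, if either thee dislike.</body>
    S:   </message>
    C: </stream:stream>
    S: </stream:stream>

Typical interaction.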
By looking at the patterns of messages, we can tell that there are two separate XML documents involved here: one which the client opens and terminates in the end, and another which the server opens and closes. However, during an interaction, these XML documents are interspersed.
Now let's try some talking with talk.google.com. XMPP will be our language and TCP/IP our medium.
To communicate with any server in its native protocol, we need to be able to open a terminal session on a specific port of the server. We can use any of the available telnet clients such as Microsoft Telnet or Putty to do this. I chose Putty for historical reasons.
Let's first configure Putty to open a raw connection to talk.google.com on port 5222. Note that 5222 is the non-SSL port which the Jabber protocol uses. If we were to use 5223, which is the SSL enabled port, we would have difficulties doing our raw communication due to the encrypted nature of the medium.


Screenshot of Putty showing configuration to talk.google.com on port 5222
Here's another screenshot of the actual raw XMPP communication we've been talking about till now. The first and third XML fragments are sent by the client (us) while the second and fourth are sent by the server, talk.google.com.


Screenshot of Putty showing raw XMPP interaction with talk.google.com.
We first initiate the stream with the "to" attribute of the Stream set to "gmail.com". The server then acknowledges the request by opening a Stream of its own, enumerating the features it supports and the method of encryption that it mandates. The <required/> element indicates that the server requires the client to acknowledge by sending another fragment, the <starttls/> element, indicating that it agrees to start a TLS negotiation.
After this, the server again acknowledges by telling the client to proceed with the TLS negotiation. This is followed by a SASL negotiation. Here again, we reach our scope boundaries.
After the authentication phase, the client and the server can start exchanging XML Stanzas. We, however, can't reach this stage using the raw communication approach with the GTalk server, as the TLS negotiation and the SASL handshake both require understanding of complex encryption mechanisms, and the messages exchanged would no longer be human readable.
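For reference, the fragments a client types during such a raw session can be generated with a small Java sketch. The class and method names below are made up for illustration; the namespaces are the ones defined by RFC 3920. The sketch assumes the domain contains no characters that need XML escaping, and it only builds the text: it does not open a connection.

```java
// Hypothetical helper that builds the text a client sends at the start of a raw XMPP session.
final class StreamHeader {

    // Sent by the client once the server has advertised STARTTLS among its features.
    static final String START_TLS = "<starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'/>";

    // Sent by either side to terminate its own XML document.
    static final String CLOSE = "</stream:stream>";

    // The opening Stream element is deliberately left unterminated.
    static String open(String domain) {
        return "<?xml version='1.0'?>"
             + "<stream:stream to='" + domain + "'"
             + " xmlns='jabber:client'"
             + " xmlns:stream='http://etherx.jabber.org/streams'"
             + " version='1.0'>";
    }

    public static void main(String[] args) {
        System.out.println(open("gmail.com"));
        System.out.println(START_TLS);
        System.out.println(CLOSE);
    }
}
```

Pasting the first line into the raw terminal session is what prompts the server to answer with its own Stream and list of features.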



What next?

So far, we have talked about identifying the protocols applications use and discovered that GTalk is just another Jabber client. We learnt the basics of XMPP. We even successfully tried a preliminary raw XMPP communication with talk.google.com. Next, we shall advance a step higher and see how we can exploit the wealth of features provided by XMPP to play with GTalk!




Acknowledgements

They say, The Early Bird Catches The Worm. Had it not been for friends who pointed out mistakes and helped me fix them soon after it was published, this article would still have been in bad shape. My sincere thanks to my friends Amod Pandey, Bharati K and Hemanth H M.

Second Part

In the first part, we learnt the basics of the XMPP (Jabber) protocol and even a native way of interacting with the Google Talk server, talk.google.com. I am sure it was exciting enough.
Here, in this sequel, I will take you several steps ahead and introduce Application Layer Programming concepts for XMPP. I will also share implementation details of a couple of cool things you can do with your Google Talk, programmatically! In the end, I am sure you will be as enthralled as I was on the day I discovered these possibilities.


More XMPP Concepts

To interact programmatically with XMPP and follow the intricacies which XMPP libraries talk about, one must understand a few terms which XMPP defines.

Roster

Roster is the name given to the list of contacts which appears in one's chat client. In Google Talk terms, it is the list of friends you see once you log in. The elements of the roster are your contacts. Each of these items is identified by a unique identifier called a JID (Jabber Identifier). JIDs have a syntax similar to that of Email addresses, i.e., user@domain. The domain part in this identifier is usually the server one connects to. In the case of Google Talk, it is gmail.com or any of its domain aliases.
Rosters are stored on the server so that the XMPP client can retrieve them each time the user logs in. Rosters are retrieved and updated through IQ Stanzas. (See IQ explanation.)
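The JID syntax lends itself to a short Java sketch. The class below is my own illustration, not part of any library. Besides the user@domain form described above, it also accepts the optional "/resource" suffix that XMPP allows for telling apart several connections of the same user; that part is an addition to what this article covers. It does no validation beyond splitting.

```java
// Hypothetical value class that splits a JID into its parts.
final class Jid {
    final String local;    // the part before '@', or null for a bare domain
    final String domain;   // e.g. gmail.com
    final String resource; // the part after '/', or null when absent

    private Jid(String local, String domain, String resource) {
        this.local = local;
        this.domain = domain;
        this.resource = resource;
    }

    static Jid parse(String text) {
        String rest = text;
        String resource = null;
        int slash = rest.indexOf('/');
        if (slash >= 0) {
            resource = rest.substring(slash + 1);
            rest = rest.substring(0, slash);
        }
        String local = null;
        int at = rest.indexOf('@');
        if (at >= 0) {
            local = rest.substring(0, at);
            rest = rest.substring(at + 1);
        }
        if (rest.isEmpty()) {
            throw new IllegalArgumentException("A JID needs a domain: " + text);
        }
        return new Jid(local, rest, resource);
    }

    // The JID without its resource, which is the form a Roster entry uses.
    String bare() {
        return local == null ? domain : local + "@" + domain;
    }

    public static void main(String[] args) {
        Jid jid = Jid.parse("juliet@gmail.com/balcony");
        System.out.println(jid.bare());     // prints juliet@gmail.com
        System.out.println(jid.resource);   // prints balcony
    }
}
```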

Message

This is one of the three fundamental Stanzas in XMPP. (The other two are IQ and Presence.) As the name suggests, a Message Stanza is used for sending IMs (Instant Messages). A Message Stanza is analogous to an Email, except that the server pushes it straight into the recipient's open connection, which makes delivery practically instantaneous. A user sends a message bearing the address of the intended recipient to the server. The server then dispatches it to the addressee. It is an "I push to the server; the server pushes to you" form of messaging here.
When a user Juliet sends an IM to her pal Romeo, lots of things happen behind the scenes.
Let the JIDs of Romeo and Juliet be romeo@gmail.com and juliet@gmail.com respectively. Juliet would have already logged onto her XMPP client and she would have received her Roster from the server at gmail.com. She sees Romeo online and available. She types a message, "Wherefore art thou, Romeo?" and hits enter. Now, it's the combined responsibility of her XMPP client, the server at gmail.com and Romeo's XMPP client to all work in synchronism and deliver the message.
Juliet's XMPP client constructs a Message Stanza as follows.

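    <message to="romeo@gmail.com"
             from="juliet@gmail.com"
             type="chat"
             xml:lang="en">
        <body>Wherefore art thou, Romeo?</body>
    </message>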

The "to" attribute contains the JID of the intended recipient; Romeo in this case. The "from" attribute is the JID of the sender. A recommended attribute, "type", with a value of "chat" is also set to distinguish this message from other types such as "error", "groupchat" or "headline". However, I shall not discuss these here. The "xml:lang" attribute specifies the language used, English in this case. The actual message itself is passed as a nested "body" element of the "message" Stanza.
The gmail.com server receives this Stanza. It learns that it has to dispatch it to a user with JID "romeo@gmail.com" and that it was sent by "juliet@gmail.com". Juliet sees Romeo online as he has already logged in as well. This means he too has established a connection with the server at gmail.com and has an active, unterminated XML Stream with gmail.com server (talk.google.com). The server simply pushes the message into Romeo's connection channel. Romeo's XMPP client detects this and pops up a window with the message Juliet had typed for him, in an instant! No wonder folks call it Instant Messaging!
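As a rough sketch of what a client does when it constructs such a Stanza, the Java class below assembles a chat message from its parts. The names are hypothetical and the approach (plain string building with manual escaping) is just the simplest thing that shows the idea; a real library would use a proper XML writer.

```java
// Hypothetical helper that builds a chat Message Stanza as text.
final class MessageStanza {

    // Escapes the characters that have a special meaning in XML.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("\"", "&quot;")
                   .replace("'", "&apos;");
    }

    static String chat(String from, String to, String body) {
        return "<message to=\"" + escape(to) + "\""
             + " from=\"" + escape(from) + "\""
             + " type=\"chat\" xml:lang=\"en\">"
             + "<body>" + escape(body) + "</body>"
             + "</message>";
    }

    public static void main(String[] args) {
        System.out.println(
            chat("juliet@gmail.com", "romeo@gmail.com", "Wherefore art thou, Romeo?"));
    }
}
```

Escaping matters here because the Stanza travels inside an open XML Stream: an unescaped "<" in the typed text would corrupt the whole document, not just one message.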

IQ

This is another very important Stanza. IQ stands for Info/Query. It is similar to traditional Request/Response paradigms such as HTTP. An IQ Stanza can be one of four types, "get", "set", "result" or "error". The first two are requests or queries while the last two are responses or information. An IQ Stanza can wrap various kinds of elements. Let's see one such example here.
Juliet logs onto her XMPP client. The client sends an XMPP IQ Stanza of type "get" to the gmail.com server requesting her roster.

    <iq from="juliet@gmail.com" type="get" id="roster_1">
        <query xmlns="jabber:iq:roster"/>
    </iq>

The gmail.com server responds by sending Juliet the roster her XMPP client requested through another XMPP Stanza of type "result".

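    <iq to="juliet@gmail.com" type="result" id="roster_1">
        <query xmlns="jabber:iq:roster">
            <item jid="romeo@gmail.com" name="Romeo" subscription="both">
                <group>Friends</group>
            </item>
            <item jid="mercutio@gmail.com" name="Mercutio" subscription="both">
                <group>Friends</group>
            </item>
            <item jid="benvolio@gmail.com" name="Benvolio" subscription="both">
                <group>Friends</group>
            </item>
        </query>
    </iq>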

As can be seen above, IQ Stanzas can contain nested XML elements. There is no restriction on the length of the response/request enclosed within "iq" tags.
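To illustrate the response side of this exchange, here is a small Java sketch that reads a roster result and lists the JIDs it contains, using the DOM parser bundled with the JDK. The class name and the sample contacts are my own assumptions; the "jabber:iq:roster" namespace and the "item" elements with a "jid" attribute come from the XMPP roster format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper that extracts the contacts from an IQ Stanza of type "result".
final class RosterResult {

    static List<String> contactJids(String iqXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(iqXml)));

            Element iq = document.getDocumentElement();
            if (!"result".equals(iq.getAttribute("type"))) {
                throw new IllegalArgumentException("Expected an IQ of type \"result\"");
            }

            // Every contact is an <item/> inside the roster <query/>.
            NodeList items = document.getElementsByTagNameNS("jabber:iq:roster", "item");
            List<String> jids = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                jids.add(((Element) items.item(i)).getAttribute("jid"));
            }
            return jids;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse the IQ Stanza", e);
        }
    }

    public static void main(String[] args) {
        String iq =
              "<iq type='result' id='roster_1'>"
            + "<query xmlns='jabber:iq:roster'>"
            + "<item jid='romeo@gmail.com'><group>Friends</group></item>"
            + "<item jid='mercutio@gmail.com'><group>Friends</group></item>"
            + "</query></iq>";
        System.out.println(contactJids(iq)); // prints [romeo@gmail.com, mercutio@gmail.com]
    }
}
```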

Presence

This is the third Stanza type specified by XMPP Core. In layman's terms, Presence simply refers to the availability or status of a contact. The green, red and amber dots we see in Google Talk are different types of Presence. A nested element, "show", represents this information. A "show" element can hold one of four values:
  • chat: Indicates that the contact is available and ready to chat. (Green dot in GTalk)
  • dnd: Indicates "Do Not Disturb" and that the contact is busy for chat. (Red dot in GTalk)
  • away: Indicates that the contact is idle and possibly not at seat. (Amber dot in GTalk)
  • xa: Indicates "Extended Away". This means that the contact has been idle for a considerably long duration. GTalk, however, makes no distinction between this and the "away" states and an Amber dot is displayed for this as well.
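The mapping in the list above can be written down as a small Java enum. This is purely an illustration: the type and field names are made up, and the colours are the ones GTalk uses as described above.

```java
// Hypothetical lookup from the value of a "show" element to the dot GTalk displays.
final class PresenceDots {

    enum Show {
        CHAT("chat", "green"),
        DND("dnd", "red"),
        AWAY("away", "amber"),
        XA("xa", "amber"); // GTalk draws "extended away" exactly like "away"

        final String xmlValue;
        final String dotColour;

        Show(String xmlValue, String dotColour) {
            this.xmlValue = xmlValue;
            this.dotColour = dotColour;
        }

        // Finds the constant matching the text inside <show>...</show>.
        static Show fromXml(String value) {
            for (Show show : values()) {
                if (show.xmlValue.equals(value)) {
                    return show;
                }
            }
            throw new IllegalArgumentException("Unknown show value: " + value);
        }
    }

    public static void main(String[] args) {
        System.out.println(Show.fromXml("dnd").dotColour); // prints red
    }
}
```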
Presence works through a very different mechanism from the Message Stanza. Presence information is usually broadcast to all the contacts in a user's Roster. The technique is a special kind of Broadcast Publish/Subscribe Messaging, which is introduced briefly in the next section. If you are not sure how Broadcast Publish/Subscribe Messaging works, you might want to take a look at the next section before continuing with this one. In this kind of Publish/Subscribe Messaging, each contact is both a Publisher and a Subscriber of presence information. Each of the contacts in a certain user's roster would have subscribed (implicitly) to presence changes of the user.
Consider a simple scenario in which the user Romeo is about to sleep and changes his XMPP client (Google Talk, say) status to
"Good night, good night! parting is such sweet sorrow, that I shall say good night till it be morrow."
He also makes it a busy one so that people don't bother him while he's asleep. Now let's see what happens behind the scenes.
His XMPP client constructs a Presence Stanza and sends it to the gmail.com server, as shown here.

    <presence>
        <show>dnd</show>
        <status>Good night, good night! parting is such sweet sorrow, that I shall say good night till it be morrow.</status>
    </presence>

The gmail.com server reacts to this by broadcasting the Presence Stanza to each contact in Romeo's roster. The Stanzas would look like this.

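    <presence from="romeo@gmail.com" to="juliet@gmail.com">
        <show>dnd</show>
        <status>Good night, good night! parting is such sweet sorrow, that I shall say good night till it be morrow.</status>
    </presence>

    <presence from="romeo@gmail.com" to="mercutio@gmail.com">
        <show>dnd</show>
        <status>Good night, good night! parting is such sweet sorrow, that I shall say good night till it be morrow.</status>
    </presence>

    <presence from="romeo@gmail.com" to="benvolio@gmail.com">
        <show>dnd</show>
        <status>Good night, good night! parting is such sweet sorrow, that I shall say good night till it be morrow.</status>
    </presence>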

Each of the above Presence Stanzas is sent individually by the server to a different user's connection. In other words, the Presence with "to" bearing the JID of Juliet is delivered to her Stream. The one with the JID of Mercutio is sent to his Stream, and so on.
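The fan-out performed by the server can be sketched in Java as follows. This is not how the Google Talk server is implemented; it is a toy model with made-up names that takes one presence update and produces one addressed copy per contact in the roster.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a server broadcasting one presence update to a whole roster.
final class PresenceBroadcast {

    // Minimal escaping for text placed inside XML elements or attributes.
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                   .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Returns one Presence Stanza per contact, each carrying its own "to" address.
    static List<String> fanOut(String from, String show, String status, List<String> roster) {
        List<String> stanzas = new ArrayList<>();
        for (String contact : roster) {
            stanzas.add("<presence from=\"" + escape(from) + "\" to=\"" + escape(contact) + "\">"
                      + "<show>" + escape(show) + "</show>"
                      + "<status>" + escape(status) + "</status>"
                      + "</presence>");
        }
        return stanzas;
    }

    public static void main(String[] args) {
        List<String> roster = Arrays.asList("juliet@gmail.com", "mercutio@gmail.com");
        for (String stanza : fanOut("romeo@gmail.com", "dnd", "Good night!", roster)) {
            System.out.println(stanza);
        }
    }
}
```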


Introduction to Publish/Subscribe Messaging

This section has been provided here just as an aid to understanding the way Presence Stanzas work. You may safely skip this section if you are already aware of how Broadcast Publish/Subscribe Messaging works. I shall explain this in a little detail as we will be mostly dealing with Presence manipulations in our examples a little later.
Consider this. You live in a certain part of New York and have subscribed to a couple of dailies, The New York Times and The Times of India. Specifically, you are interested in happenings around New York, stock quotes from the NYSE, Bollywood gossip and you are also an avid fan of cookery articles by Sanjeev Kapoor.
There would be millions of subscribers to these and other dailies, just like you. The originators of these articles, stock quotes and many other types of information would just publish their artifacts to these dailies in order to reach their targeted audience which includes people like you. Periodically, you would get copies of the dailies you have subscribed to. These dailies federate contents from various sources and authors.
In this example, the authors of various artifacts are indirectly communicating with you, albeit one way. This forms the essence of Publish/Subscribe messaging. Have a look at the following representation for a better picture.

A graphical representation of Publish/Subscribe Messaging. The entities on the left represent Publishers whilst those on the right represent Subscribers. A common interaction point for both of these is the Topic Server.
There are four types of entities involved in this kind of Broadcasting Publish/Subscribe Messaging.

Publisher

A Publisher's job is to generate content targeted at a specific or generic audience. A Publisher is usually unaware of its Subscribers. A Publisher can publish to any number of Topics. In the above representation, the red Publisher publishes to two Topics, T1 and T2.

Topic

The core entity here is called a Topic. Topics are entities to which both the Publishers and Subscribers have access. A Publisher publishes to a Topic. A Subscriber subscribes to a certain Topic. Any number of Publishers can publish to any number of topics. A single Topic can be served by multiple Publishers. In the above diagram, Topic T1 is served by both the blue and red Publishers. A Topic can be considered equivalent to a certain news topic in a daily in our example.

Topic Server

A Topic Server aggregates different Topics to form a single, unified entity. This is equivalent to the daily in our example.

Subscriber

These are the end users of messaging. Subscribers subscribe to Topics via Topic Servers. Subscribers are usually aware of the Publishers of the Topics they are subscribing to. Usually there will be more Subscribers than Publishers for a given Topic. It is also possible for Subscribers to be Publishers and vice versa, under special circumstances. In our example, you are a Subscriber. If you happen to work as an Editor for a journal, you become both a Publisher and a Subscriber.
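The four entities described in this section fit into a few lines of Java (Java 8 or later). The sketch below is an in-memory toy with names of my own choosing, not a real messaging system: Topics are plain strings, Subscribers are callbacks, and a Publisher is simply whoever calls publish().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory Topic Server.
final class TopicServer {

    // Topic name -> the Subscribers registered for it
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    void subscribe(String topic, Consumer<String> subscriber) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(subscriber);
    }

    // Delivers the content to every Subscriber of the Topic; returns how many received it.
    int publish(String topic, String content) {
        List<Consumer<String>> targets =
                subscribers.getOrDefault(topic, Collections.<Consumer<String>>emptyList());
        for (Consumer<String> subscriber : targets) {
            subscriber.accept(content);
        }
        return targets.size();
    }

    public static void main(String[] args) {
        TopicServer daily = new TopicServer();
        daily.subscribe("T1", content -> System.out.println("You received: " + content));
        daily.publish("T1", "Stock quotes from the NYSE"); // delivered to one Subscriber
        daily.publish("T2", "Bollywood gossip");            // nobody has subscribed to T2
    }
}
```

Notice that the Publisher never learns who the Subscribers are. In the Presence case, the Google Talk server plays the Topic Server, and each user is effectively a Topic to which the contacts in the roster have subscribed.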


XMPP Libraries for Application Programming

Before we start writing XMPP Application code, it's worth having a look at some of the available libraries which we can use. XMPP libraries are available for different programming languages, with different types of licenses. There are a number of free libraries as well. Besides providing all the specified XMPP functionalities, these libraries often include additional useful components.
The one we shall be using in our application programming is called Smack API. Smack is an open source, pure Java library for working with XMPP (clients only). The API can be downloaded from http://www.igniterealtime.org/downloads/index.jsp#smack. The source code for this can be obtained from http://www.igniterealtime.org/downloads/source.jsp.
Please see the References section for a list of other XMPP libraries, both for Java and other programming languages.


Getting started with Smack API

Now let's try some simple examples using the Smack API. The snippets of Java code shown here are not complete. They might need proper exception handling and configuration depending on the circumstances.

Obtaining XMPP Connection and Enumerating Roster Entries

The following example shows how to get connected to Google Talk server and list all the contacts in your roster.

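    import org.jivesoftware.smack.Roster;
    import org.jivesoftware.smack.RosterEntry;
    import org.jivesoftware.smack.XMPPConnection;
    import org.jivesoftware.smack.XMPPException;

    // Create XMPP connection to gmail.com server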
    XMPPConnection connection = new XMPPConnection("gmail.com");
    
    try
    {
        // Connect
        connection.connect();
        
        // Login with appropriate credentials
        connection.login("username", "password");

        // Get the user's roster
        Roster roster = connection.getRoster();
        
        // Print the number of contacts
        System.out.println("Number of contacts: " + roster.getEntryCount());

        // Enumerate all contacts in the user's roster
        for (RosterEntry entry : roster.getEntries())
        {
            System.out.println("User: " + entry.getUser());
        }
    }
    catch (XMPPException e)
    {
        // Do something better than this!
        e.printStackTrace();
    }

Java code listing to demonstrate use of Smack API to connect to Google Talk server and list out the contact list.

Changing Google Talk Status Programmatically

The following snippet shows how to change your chat status programmatically. However, since Google Talk has the highest priority for chat statuses, we can only override the status of Google Talk as long as our program is running. So, we make the running thread sleep for 20 seconds so that the status change can be observed. Also note that even though the status will have changed, it won't show up in your own Google Talk. Instead, ask a contact of yours to verify this for you.

    // Create the presence object with default availability
    Presence presence = new Presence(Presence.Type.available);
    
    // Set the status message
    presence.setStatus("Online, Programmatically!");
    
    // Set a high priority (XMPP allows values from -128 to 127)
    presence.setPriority(24);
    
    // Set available presence mode
    presence.setMode(Presence.Mode.available);
    
    // Send the presence packet through the connection
    connection.sendPacket(presence);
    
    // Sleep for 20 seconds
    Thread.sleep(20000);

Code snippet to change status message for a limited time duration.
This way, one can do numerous things with Smack API. If you would like to know more about Smack, I recommend having a look at the Smack API Documentation. Kindly see the References section for links.


Scrolling Google Talk Status Message

How about showing something like the following against your name in your friends' Google Talk clients?

Google Talk showing a scrolling status message for contact named "Stanza Stream".
Well, now that you know how to set Presence programmatically, that shouldn't be a hitch. We take the text for the status message, "I ♥ Maggi Noodles" in this case, rotate it by a character to the left, push the packet into the stream, wait for a fraction of a second and repeat all over again.
The following code snippet shows how to accomplish this.

    // The text for the status
    String status = "I \u2665 Maggi Noodles ";
    
    // Construct the presence object
    Presence presence = 
        new Presence(
            Presence.Type.available, 
            status, 
            24, 
            Presence.Mode.available);
    
    // Send 500 packets
    for (int i = 0; i < 500; i++)
    {
        // Rotate left by a character
        status = rotate(status);
        
        // Set the status into presence object
        presence.setStatus(status);
        
        // Send it
        connection.sendPacket(presence);
        
        // Sleep for a tenth of a second
        Thread.sleep(100);
    }

Listing to produce a scrolling effect in Google Talk status messages.
The code for the rotate() function is simple and is as shown below.

    // Rotate left by moving the first character to the end
    private static String rotate(String input)
    {
        return input.substring(1) + input.charAt(0);
    }

Listing to rotate phrases by a character to the left.
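To see what the rotation produces without sending anything to Google Talk, the following stand-alone sketch collects the first few frames of the scrolling text. The class and method names are my own; the rotation itself is the same one-character shift used in the listing above, with an extra guard for empty input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone demo of the scrolling effect.
final class Marquee {

    // Moves the first character to the end of the text.
    static String rotateLeft(String input) {
        return input.isEmpty() ? input : input.substring(1) + input.charAt(0);
    }

    // Returns the successive status texts a contact would see.
    static List<String> frames(String text, int count) {
        List<String> result = new ArrayList<>();
        String current = text;
        for (int i = 0; i < count; i++) {
            current = rotateLeft(current);
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String frame : frames("I \u2665 Maggi Noodles ", 5)) {
            System.out.println("[" + frame + "]");
        }
    }
}
```

The trailing space in the status text matters: without it, the end of the phrase would run straight into its beginning once the text wraps around.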


Displaying Current Music Track from Winamp, your own way!

This is yet another cool application of playing with Google Talk status through Java. In fact, when I first implemented this feature, Google Talk did not yet have the "Show current music track" custom status message which it now provides! Let's see how we can program it.
To get the current track information, we need to first install a small plug-in for Winamp. It is called the Now Playing Plug-in which can be downloaded from http://www.cc.jyu.fi/~ltnevala/nowplaying/.
The plug-in has an option for delivering the title (or any ID3 tag) of the music track being played in Winamp in three ways — FTP, Local Save and an HTTP Post. For simplicity, we shall use the HTTP Post method. Now Playing plug-in needs to be configured as follows to post to a certain Java HTTP Servlet which is hosted at a URL, say, http://localhost:8080/GStatus.

Screenshot showing typical configuration of Now Playing plug-in to perform HTTP Post to a Servlet hosted at http://localhost:8080/GStatus.
Once this is configured, we write our Servlet and deploy it on any J2EE Servlet Container such as Apache Tomcat. Now Playing plug-in posts histories of a configurable number of songs played on Winamp. The title of the song that is currently being played would appear as a request parameter with the name Song[0][Title]. Other values can be fetched too as per the plug-in's documentation.
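A Servlet container decodes the posted parameters for us, but it may help to see what a name like Song[0][Title] looks like on the wire. The sketch below assumes the plug-in sends an ordinary URL-encoded form body, which is my assumption rather than something taken from its documentation; the class name and the sample body are made up as well.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical decoder for a URL-encoded form body such as the plug-in might post.
final class NowPlayingForm {

    static Map<String, String> parse(String body) {
        Map<String, String> parameters = new LinkedHashMap<>();
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int equals = pair.indexOf('=');
            String name = equals < 0 ? pair : pair.substring(0, equals);
            String value = equals < 0 ? "" : pair.substring(equals + 1);
            parameters.put(decode(name), decode(value));
        }
        return parameters;
    }

    private static String decode(String text) {
        try {
            // Turns %5B into '[', %5D into ']' and '+' into a space
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String body = "Song%5B0%5D%5BTitle%5D=Hey+Jude&Song%5B0%5D%5BArtist%5D=The+Beatles";
        Map<String, String> parameters = parse(body);
        System.out.println(parameters.get("Song[0][Title]"));  // prints Hey Jude
        System.out.println(parameters.get("Song[0][Artist]")); // prints The Beatles
    }
}
```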
The following Servlet code shows the details. To reduce network traffic, the connection is obtained only once, when the Servlet starts up, in its init() method. Similarly, the connection should be promptly closed in the destroy() method.

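    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.jivesoftware.smack.XMPPConnection;
    import org.jivesoftware.smack.XMPPException;
    import org.jivesoftware.smack.packet.Presence;

    public class GStatus extends HttpServlet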
    {
        // We should re-use this
        private XMPPConnection connection;
        
        public void init() throws ServletException
        {
            super.init();
            
            try
            {
                // Create the connection
                this.connection = new XMPPConnection("gmail.com");
                
                // Connect
                connection.connect();
                
                // Login with appropriate credentials
                this.connection.login("username", "password");
            }
            catch (XMPPException e)
            {
                throw new RuntimeException(e);
            }
        }
        
        protected void doPost(HttpServletRequest request, 
            HttpServletResponse response) 
            throws ServletException, IOException
        {
            // Create the presence object with default availability 
            Presence presence = new Presence(Presence.Type.available);
            
            // Get the Title and Artist of the track now playing
            String title = request.getParameter("Song[0][Title]");
            String artist = request.getParameter("Song[0][Artist]");
            
            // Set the status message
            presence.setStatus(
                "I'm listening to \u266A "
                + artist
                + " - "
                + title);
            
            // Set a high priority (XMPP allows values from -128 to 127)
            presence.setPriority(24);
            
            // Set available presence mode
            presence.setMode(Presence.Mode.available);
            
            // Send the presence packet through the connection
            this.connection.sendPacket(presence);
        }
        
        public void destroy()
        {
            super.destroy();
            
            // Disconnect the connection
            this.connection.disconnect();
        }
    }

Java Servlet code showing how the information about the music track being played in Winamp can be used to set Google Talk status messages.
Once the Servlet and Now Playing plug-in are properly configured and set up, Winamp hits the Servlet each time the song being played changes. The Servlet picks up the current song being played by reading data from the HttpServletRequest parameters. It then creates a Presence packet and pushes it to the Google Talk server using a pre-established connection. The server in turn, broadcasts this Presence/status to all contacts in your Roster.
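One detail worth guarding against is a post in which the artist or the title is missing, since request parameters that are absent come back as null. The helper below is a sketch of how the status text could be assembled more defensively; the class and method names are hypothetical and it is not part of the Servlet shown above.

```java
// Hypothetical helper that builds the status text from possibly missing track details.
final class NowPlayingStatus {

    private static final String PREFIX = "I'm listening to \u266A ";

    private static boolean present(String value) {
        return value != null && !value.trim().isEmpty();
    }

    // Returns an empty string when there is nothing worth showing.
    static String format(String artist, String title) {
        if (present(artist) && present(title)) {
            return PREFIX + artist.trim() + " - " + title.trim();
        }
        if (present(title)) {
            return PREFIX + title.trim();
        }
        if (present(artist)) {
            return PREFIX + artist.trim();
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(format("The Beatles", "Hey Jude"));
        System.out.println(format(null, "Hey Jude"));
    }
}
```

With something like this in place, the Servlet could skip sending a Presence packet altogether whenever format() returns an empty string.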
If all goes well, the track being played on Winamp would be displayed as shown in the following screenshot.

Google Talk screenshot showing the song currently being played in Winamp.
Note that unlike Google Talk itself, we can customize the message we would like to be displayed while we are listening to our favourite music, not to mention the numerous other possibilities with programmatic status messages.


Conclusion

As XMPP continues to remain an open standard, more and more applications are being developed not only for human communication, but also for machine interaction. Gradually, the terms Jabber and XMPP have become interchangeable. With the advent of XMPP Extensions, features such as Voice Chat, File Transfers, Video Conferencing, Feed Notifications, Gaming, Drawing Canvas and so on are no longer bound to proprietary protocols such as Yahoo or MSN.
All the Java code examples provided in this article have been tested. To be able to run them successfully, Java 5 (Tiger) or higher is required.
A lot of effort has gone into writing this article. All the images, scripts, representations and code samples are original property. Kindly obtain my consent before publishing them elsewhere. However, you are free to provide links to this article or share it with as many people as possible.


References




Acknowledgements

As with any work that requires a lot of patience, I deeply acknowledge all the people in my life who have stood by me, constantly uplifting and pushing me to achieve my goals.
While writing this article, there was a stage when I had to take quick, automated and consecutive screenshots so that I could use them for creating GIF Animations. I stumbled upon a wonderful, feature-packed yet free tool for doing just that. The tool is named MWSnap. I thank the creator of this product, Mirek Wojtowicz.
