Biography | About master's work | Master's portal | DonNTU

Research on the workload of an instant messaging server. Development of software for server workload balancing

Relevance of the master's work topic

Network services such as instant messaging (IM) and IP telephony are becoming more and more popular. The main reason is that these services can save a great deal of money when communicating with people over long distances.

Many people choose IM because it is more "personal" than e-mail but less intrusive than a phone call.

IM is also helpful in emergency situations, since the Internet may keep working when the phone line does not.

IM vs Email

Instant messaging services are starting to replace e-mail because of the e-mail drawbacks listed below:

  • Spam
  • Low level of security
  • Inconvenience in active correspondence
  • Relatively slow message delivery
  • No information about the status of contacts

The main reason for e-mail spam is the lack of a reliable system for identifying the sender of a message. In today's e-mail system, spammers can put any information in the "Sender" field. This is also one of the main reasons for the low level of security.

The disadvantage of active correspondence is the difficulty of keeping track of the entire communication history. As a result, active e-mail users have to organize their own ways of storing logs.

Message delivery by e-mail is slow compared with instant messaging. A delay of 1-10 seconds (the official specification mentions that mail delivery may take up to 24 hours) makes it difficult to communicate in real time. In addition, checking the mail server too often can be interpreted as a network attack, which leads to the user being temporarily blocked.

IM and mobile phone

Experts at the international research company TNS found that SMS and e-mail on mobile phones are gradually losing popularity. "Non-voice" communication is now increasingly based on instant messaging. According to the study, which covered 17 thousand respondents from 30 countries, 61% of mobile phone owners prefer IM, 55% use SMS and only 12% use e-mail.

Choosing IM over SMS is quite logical: "As soon as mobile phone owners can access the Internet and use instant messaging, the cost of text messages decreases rapidly and tends to zero. In that case, users pay only for traffic, and the messages themselves cost nothing. In addition, large numbers of people are used to communicating via instant messaging services over the Internet on personal computers, so they switch more readily to similar communication on their mobile phones. And as more mobile operators offer their customers unlimited Internet access, we can speak with confidence about continued growth in the popularity of instant messaging and a reduced role for SMS and e-mail messages," explain the TNS experts.

Today, 8% of people worldwide communicate over instant messaging on a mobile phone. Such communication is most popular in Hong Kong, where 23% of people use it, followed by China (16%), Saudi Arabia (15%), South Africa (12%), India (11%) and Brazil (10%).

Cons and weaknesses of IM

Despite all their advantages, IM services are not without drawbacks. The main ones are:

  • Spam. IM services are also subject to unwanted advertising (IM spam is sometimes called spim). Spam is the most annoying problem for users of instant messaging services. Many organizations use IM services for "free advertising" by mass-mailing a large number of users. In addition, this kind of distribution has gained popularity among groups of people whose purpose is to steal information or harm a user's system.
  • Redundancy of the transmitted information. This problem is typical for protocols such as XMPP (Jabber). If many employees of a corporate network take part in a chat, it can hurt network load and performance.
  • Insufficient resources. Frequent disconnections from the server happen with protocols such as ICQ. Many users receive messages like "Connection limit exceeded". There are also periodic disconnections from the Skype servers.
  • Low level of network security. This is especially dangerous for corporate networks in which staff are allowed to use instant messaging services. This point requires special attention and is discussed in more detail below.

Network security of instant messaging services

Different IM applications use different proprietary protocols, and a standard firewall configuration cannot detect them. Most IM programs can bypass the authentication system. Some IM clients can use ports other than those associated with IM, including ports that are normally open, such as port 80.

The main security problems of IM services are:

  • P2P sharing. This type of exchange occurs when files are transferred in protocols such as ICQ. P2P sharing makes it possible to find out the client's IP address and to connect directly to the user's computer in order to exploit its vulnerabilities.
  • Data encryption and secure authentication. Many protocols today do not encrypt chat traffic and/or transmit passwords in clear text. These drawbacks are dangerous for corporate networks, where confidential information is sent and confidential conversations take place.

The growing popularity of instant messaging services can be seen in statistical studies. The relevance of this work lies in improving the quality of service by reducing the drawbacks of IM services.

Motivation

The motivation for this work is based on personal experience of using various instant messaging services, as well as the experience of the people with whom the author communicated over these services.

The most annoying issue with the ICQ protocol is spam. Spam is received even from contacts that are in the contact list. This happens because of the low level of security: the ICQ service is vulnerable to hacking and account theft. In addition, the ICQ service suffers from disconnections followed by an inability to reconnect for 5 to 15 minutes.

A similar problem occurs in the Skype service. The Skype client takes longer to connect to the server than clients of other protocols. Periodic connection breaks also happen in the Skype protocol. No disconnections were noticed while using Skype for voice communication.

The final motivating factor for choosing this topic is the desire to study data encryption methods and to gain experience in developing network software that actively uses databases.

Tasks of the master's work

The tasks of the master’s work are:

  • Review of existing solutions and gathering of statistics on the popularity of different IM services;
  • Study of open instant messaging protocols;
  • Development of a decentralized messaging protocol (DMP);
  • Modeling of a multiserver instant messaging network;
  • Building the libraries needed to implement software that uses the DMP protocol;
  • Development of a test server, client and router working over the DMP protocol;
  • Adding encryption support to the protocol;
  • Organization of a database that will store account data;
  • Testing of the developed software and analysis of protocol statistics and results.

Expected scientific novelty

The expected scientific novelty consists in the development and implementation of routing algorithms between servers and in the development of a new messaging protocol.

Several approaches to dealing with spam in networks that use the DMP protocol are also proposed.

A review of research and development on the subject

Review of the popular IM services

Figure 1 shows the popularity of instant messaging services in the CIS countries as of July 2008, in percent.

Figure 1 - Use of instant messaging in the CIS countries.

The most popular IM service in the CIS countries is ICQ, which uses the OSCAR protocol. However, many people receive the error "Connection limit exceeded". A large amount of spam is also sent by bots. Spamming is easy to organize because OSCAR identifies contacts by number. The OSCAR protocol provides a low level of security for use in commercial networks.

Another protocol that is gaining popularity is XMPP, which is used by the Jabber and GTalk services. XMPP is an open, free-to-use, XML-based protocol for near-real-time instant messaging and presence information. It is also decentralized, but it has weaknesses:

  • Redundancy of transmitted data: typically, more than 70% of XMPP inter-server traffic consists of presence notifications, and about 60% of this traffic is redundant.
  • Scalability: XMPP suffers from the same redundancy problems in chat rooms and publish/subscribe features.
  • Inefficient transmission of binary data: because an XMPP session is one long XML document, unmodified binary data cannot be transmitted.

Methods of balancing server workload

There are various ways in which load balancing can be achieved. The deciding factors for choosing one over another depend on the requirements, available features, complexity of implementation, and cost. For example, hardware load balancing equipment is very costly compared to a software solution.

Round Robin DNS Load Balancing

The built-in round-robin feature of the BIND DNS server can be used to balance load across multiple servers. It is one of the earliest load balancing techniques: it cycles through the IP addresses corresponding to a group of servers in a cluster.

Pros: Very simple, inexpensive and easy to implement.

Cons: The DNS server does not have any knowledge of the server availability and will continue to point to an unavailable server. It can only differentiate by IP address, but not by server port. The IP address can also be cached by other name servers and requests may not be sent to the load balancing DNS server.
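
The rotation performed by round-robin DNS can be illustrated with a minimal Python sketch. The class name `RoundRobinResolver` and the addresses are hypothetical and only model the behaviour; they are not BIND's implementation. The sketch also shows the weakness noted above: the rotation knows nothing about server availability.

```python
from itertools import cycle


class RoundRobinResolver:
    """Toy model of round-robin DNS: hands out addresses in a fixed rotation."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("at least one address is required")
        self._rotation = cycle(list(addresses))

    def resolve(self):
        # No health check: an unavailable server is still returned in its turn.
        return next(self._rotation)


# Hypothetical cluster of three servers (documentation-only addresses).
resolver = RoundRobinResolver(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
answers = [resolver.resolve() for _ in range(4)]
```

After the last address has been handed out, the rotation starts again from the first one, so the fourth request in the example goes to the same server as the first.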

Hardware Load Balancing

Hardware load balancers can route TCP/IP packets to various servers in a cluster. These load balancers often provide a robust topology with high availability, but come at a much higher cost.

Pros: Uses a circuit-level network gateway to route traffic.

Cons: Higher costs compared to software versions.

Software Load Balancing

The most commonly used load balancers are software based, and often come as an integrated component of expensive web server and application server software packages.

Pros: Cheaper than hardware load balancers. More configurable based on requirements. Can incorporate intelligent routing based on multiple input parameters.

Cons: Need to provide additional hardware to isolate the load balancer.
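
As an illustration of routing based on several input parameters, the following Python sketch picks the healthy server with the lowest ratio of active connections to capacity. The `Backend` fields, the server names and the selection rule are assumptions made for this example; the source does not prescribe a particular algorithm.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    active_connections: int
    capacity: int          # hypothetical relative capacity of the machine
    healthy: bool = True   # result of a hypothetical health check


def pick_backend(backends):
    """Return the healthy backend with the lowest connections-to-capacity ratio."""
    candidates = [b for b in backends if b.healthy and b.capacity > 0]
    if not candidates:
        raise RuntimeError("no healthy backend available")
    return min(candidates, key=lambda b: b.active_connections / b.capacity)


pool = [
    Backend("srv-a", active_connections=40, capacity=100),
    Backend("srv-b", active_connections=10, capacity=20),
    Backend("srv-c", active_connections=5, capacity=100, healthy=False),
]
chosen = pick_backend(pool)
```

Unlike round-robin DNS, such a balancer can skip an unavailable server (`srv-c` here) and take the current load into account.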

Expected practical results

The research and development are expected to produce the following results:

  • reducing the amount of transmitted traffic;
  • reducing the amount of information processed by the server;
  • improving security compared with existing IM protocols;
  • reducing the potential for spam.

Results available at the time of writing

Solving the problems of low bandwidth and limited capabilities

The DMP protocol (decentralized messaging protocol) is aimed primarily at reducing the workload of the server and the amount of transmitted information.

The protocol is decentralized, i.e. it uses multiple servers and balances the load among them. Statistics show that about 90% of a user's contacts reside within a single administrative unit. It is therefore effective to use one server per administrative unit and to serve the remaining 10% of contacts through cross-server exchange. This solves the problem of a server's limited capabilities.

The bandwidth problem is solved in the same way as in the XMPP protocol. Opening the protocol standards allows anyone to set up their own server, i.e. the servers are run not by a single organization but by many private or corporate entities with different traffic capacities.

Unlike in the XMPP protocol, the router is not a part of the server. The router is a separate program unit that can be installed either on a separate machine or on a machine that runs a DMP server.

Direct connection between servers bypassing the router is also possible. This solution improves the exchange capabilities between the servers (see Fig. 2).

Figure 2 - Possible structure of the DMP service

Figure 2 shows an example of a possible structure of the DMP service. The router can be placed on server 2.

In addition, the protocol provides for data compression; the compressor is determined by the active version of the protocol. For the beta version of the protocol the bzip2 compressor is specified; bzip2 is free, open-source software. More efficient compressors may be added in subsequent versions of the protocol. Compression of the transmitted information must also be reasonable: compressing binary data, for example when transferring files, is not always worthwhile. Whether compression is worthwhile should be determined on the client side.
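
One possible client-side check is sketched below in Python with the standard `bz2` module: the payload is compressed, and the compressed form is kept only if it is actually smaller. The function names `maybe_compress` and `restore` and the "keep whichever is smaller" rule are assumptions for illustration, not part of the DMP specification.

```python
import bz2
import os


def maybe_compress(payload: bytes):
    """Return (compressed_flag, data), compressing only when it saves space."""
    packed = bz2.compress(payload)
    if len(packed) < len(payload):
        return True, packed
    return False, payload


def restore(compressed: bool, data: bytes) -> bytes:
    """Undo maybe_compress using the flag that travelled with the data."""
    return bz2.decompress(data) if compressed else data


# Repetitive chat text compresses well.
text = ("hello, this is a chat message. " * 50).encode("utf-8")
# Random bytes stand in for already-compressed file content.
noise = os.urandom(2048)

text_flag, text_data = maybe_compress(text)
noise_flag, noise_data = maybe_compress(noise)
```

In this sketch the text is sent compressed, while the random block is sent as is, because bzip2 would only add overhead to it.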

A minimalist packet structure is used to reduce traffic. The structure of packets is discussed below.

The use of routers is limited only by their effectiveness.

A drawback of this approach is that the capabilities, reliability and security of a given server are difficult to monitor. This problem is addressed through experience of using a particular server, as well as through feedback from users.

Solving the problem of security

The DMP protocol can exclude P2P exchange when transferring files. Security is increased by allowing files to be exchanged only through the server. This solution puts additional load on the channel bandwidth and on the DMP server as a whole. However, information security is more important, since the protocol is intended for corporate networks, where confidential information may be transferred.

In addition, the protocol includes encryption of transmitted data and secure authentication. Encryption is necessary to prevent interception and eavesdropping of transmitted information.

The attempt to reduce spam is based on limiting the amount of information transmitted per unit of time and on comparing MD5 or SHA fingerprints of a contact's messages or message parts. If the number of identical fingerprints exceeds a certain threshold, the contact can be blocked on the server as a suspected spam bot. However, in the author's opinion this method is experimental and highly questionable, and it requires additional testing and research in practice.
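
The two checks described above, a limit on the volume sent per unit of time and a count of identical message fingerprints, might be combined roughly as in the following Python sketch. The class `SpamGuard`, all thresholds and the choice of SHA-256 from the SHA family are assumptions; the source does not specify concrete values or a concrete algorithm.

```python
import hashlib
from collections import Counter, deque


class SpamGuard:
    """Flags a contact that repeats one message too often or sends too much.

    One instance tracks one contact. All thresholds are hypothetical.
    """

    def __init__(self, max_repeats=3, max_bytes=1000, window_seconds=60.0):
        self.max_repeats = max_repeats
        self.max_bytes = max_bytes
        self.window_seconds = window_seconds
        self._fingerprints = Counter()
        self._recent = deque()  # (timestamp, size) pairs inside the window
        self._recent_bytes = 0

    def is_suspicious(self, message: str, now: float) -> bool:
        data = message.encode("utf-8")

        # Rate limit: forget traffic that has left the time window.
        while self._recent and now - self._recent[0][0] > self.window_seconds:
            _, size = self._recent.popleft()
            self._recent_bytes -= size
        self._recent.append((now, len(data)))
        self._recent_bytes += len(data)

        # Fingerprint comparison: count identical message bodies.
        digest = hashlib.sha256(data).hexdigest()
        self._fingerprints[digest] += 1

        return (self._fingerprints[digest] > self.max_repeats
                or self._recent_bytes > self.max_bytes)


guard = SpamGuard(max_repeats=2, max_bytes=1000, window_seconds=60.0)
verdicts = [guard.is_suspicious("Buy now!", now=t) for t in (0.0, 1.0, 2.0)]
```

With `max_repeats=2`, the third identical message is the first one to be flagged. A server would still have to decide what to do with a flagged contact, for example blocking it as described above.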

General description of the protocol

The main features of the DMP protocol are listed below:

  • The protocol uses accounts of the form nickname@serverdns, or simply a nickname with the server's IP address specified explicitly at connection time (if the server does not have a domain name). This allows a server to be configured and used in local area networks without Internet access.
  • Contact lists with presence status are supported. There are also visibility and ignore lists.
  • Support for audio, video and conferences is planned. These functions are disabled for now because the protocol is still in beta testing.
  • At this stage of development the protocol allows any database to be used. The only requirements are the database structure and adequate security (e.g., passwords must not be stored in clear text).
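
The account format from the first item can be handled as in the Python sketch below, which also shows how a server could tell local contacts from those that need cross-server exchange. The function names, the `default_server` parameter and the example host names are hypothetical and are not taken from the DMP implementation.

```python
def parse_account(account: str, default_server: str = ""):
    """Split 'nickname@server' into (nickname, server).

    A bare nickname falls back to default_server, which stands in for the
    server address given explicitly at connection time.
    """
    nickname, sep, server = account.strip().partition("@")
    nickname, server = nickname.strip(), server.strip()
    if not nickname:
        raise ValueError("nickname must not be empty")
    if not sep:
        server = default_server
    if not server:
        raise ValueError("server is required")
    return nickname, server


def is_local(account: str, local_server: str) -> bool:
    """True when the contact lives on this server (no cross-server exchange)."""
    _, server = parse_account(account, default_server=local_server)
    return server.lower() == local_server.lower()


full = parse_account("alice@im.example.org")
bare = parse_account("bob", default_server="192.168.0.10")
```

A bare nickname combined with an IP address covers the case of a server in a local network that has no domain name.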

Brief protocol specification

This section describes the packet, the basic unit of the protocol. Packet length is not fixed, but must not exceed 1 MB. The structure of the packet is shown in Figure 3.

Figure 3 - Structure of the packet of DMP protocol

As seen in Figure 3, the packet contains 4 fields: signature, encryption, compression, and commands and data.

To minimize traffic, the protocol avoids redundant ways of organizing information, such as XML.

The signature contains the version of the protocol.

The command has the format: command ID: options. The command ID is a 16-bit numeric field, i.e. the maximum number of commands is 65536.

It should be noted that the encryption and compression fields may be empty.
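
To make the four-field layout more concrete, here is a Python sketch that packs and unpacks such a packet with the standard `struct` module. Only the order of the fields, the 16-bit command ID and the 1 MB limit come from the description above. The byte-level encoding (a `DMP` prefix followed by a one-byte version, and one-byte length prefixes for the encryption and compression fields) is purely an assumption, since the actual format is defined by Figure 3.

```python
import struct

MAX_PACKET_SIZE = 1024 * 1024  # 1 MB limit from the specification
MAGIC = b"DMP"                 # hypothetical signature prefix


def _short_field(value: bytes) -> bytes:
    """Encode a field as a one-byte length followed by the bytes (assumed)."""
    if len(value) > 255:
        raise ValueError("field too long")
    return struct.pack("!B", len(value)) + value


def build_packet(version, command_id, options=b"", encryption=b"", compression=b""):
    if not 0 <= command_id <= 0xFFFF:
        raise ValueError("command ID must fit in 16 bits")
    packet = (MAGIC + struct.pack("!B", version)   # signature with version
              + _short_field(encryption)           # may be empty
              + _short_field(compression)          # may be empty
              + struct.pack("!H", command_id) + options)
    if len(packet) > MAX_PACKET_SIZE:
        raise ValueError("packet exceeds 1 MB")
    return packet


def parse_packet(packet: bytes):
    if len(packet) > MAX_PACKET_SIZE or packet[:3] != MAGIC:
        raise ValueError("not a valid packet")
    version = packet[3]
    pos = 4
    fields = []
    for _ in range(2):  # encryption first, then compression
        length = packet[pos]
        fields.append(packet[pos + 1:pos + 1 + length])
        pos += 1 + length
    (command_id,) = struct.unpack_from("!H", packet, pos)
    return version, fields[0], fields[1], command_id, packet[pos + 2:]


raw = build_packet(1, 42, b"hello", compression=b"bzip2")
parsed = parse_packet(raw)
```

In this assumed encoding an empty encryption or compression field costs a single zero byte, and the whole example packet takes 18 bytes. The sketch does not validate truncated input, which a real parser would need to do.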

Software available so far

A client and a server using the DMP protocol have been developed. This software implements the core functionality of the protocol.

This software is implemented for 2 platforms, Win32 and Linux. The client and server were tested on Windows XP SP3 and Mandriva Linux 2009.1.

The following features are also planned:

  • Encryption support;
  • Chat room support;
  • Invisibility and ignore lists.

A router also has to be developed.

Conclusions

The DMP protocol has been developed. It is based on experience of using instant messaging services such as ICQ and XMPP. The protocol mitigates the following drawbacks of instant messaging services:

  • reduces the amount of information transmitted through data compression and a minimalist organization of protocol commands;
  • distributes the load on the server and the channel across a network of servers;
  • increases security through encryption of transmitted information and exclusion of P2P sharing;
  • reduces spam by analyzing message fingerprints, the number of users to whom a message is sent, and the volume of text transmitted per unit of time.
