Biography | About master's work | Master's portal | DonNTU RUS | ENG | UKR |
Research of the workload of an instant messaging server. Development of software for balancing the server workload
Relevance of the master's work topic

Network services such as instant messaging (IM) and IP telephony are becoming increasingly popular in the communications environment, largely because they can offer significant savings when communicating with people over long distances. IM is the preferred means of communication for many people because it is more "personal" than e-mail but less intrusive than a phone call. IM is also helpful in emergencies, since the Internet may keep working when the phone line does not. Instant messaging services are starting to replace e-mail because of the e-mail drawbacks listed below:
The main reason for spam in e-mail is the absence of a reliable system for identifying the sender of a message. In today's e-mail system, spammers can put any information into the "From" field, which is one of the main reasons for its low security. With active correspondence it is difficult to keep track of the entire history of communication, so active e-mail users have to find their own ways to store message logs. E-mail delivery is also slow compared with instant messaging: a delay of 1-10 seconds (and the official specification allows delivery to take up to 24 hours) makes real-time communication difficult. In addition, checking the mail server too often can be interpreted as a network attack and lead to the user being temporarily blocked.

Experts of the international research company TNS found that SMS and e-mail on mobile phones are gradually losing popularity, and "non-voice" communication is increasingly based on instant messaging. According to the study, which surveyed 17,000 respondents from 30 countries, 61% of mobile phone owners prefer IM, 55% use SMS and only 12% use e-mail. Choosing IM over SMS is quite logical: "As soon as mobile phone owners get Internet access and can use instant messaging, the cost of a text message drops rapidly and tends to zero. In that case users pay only for traffic, and the messages themselves cost nothing. In addition, a large number of people are used to communicating via instant messaging services on their personal computers, so they switch more readily to similar communication on their mobile phones. And as more mobile operators offer their customers unlimited Internet access, we can speak with confidence about the continued growth in popularity of instant messaging and the declining role of SMS and e-mail," the TNS experts explain.
Worldwide, 8% of people currently use instant messaging on a mobile phone. This form of communication is especially popular in Hong Kong, where its share reaches 23%, followed by China (16%), Saudi Arabia (15%), South Africa (12%), India (11%) and Brazil (10%). Despite all their advantages, IM services have drawbacks. The main ones are:
Network security of instant messaging services.

Different IM applications use different proprietary protocols, which standard firewall configurations cannot detect. Most IM programs can bypass the authentication system. Some IM clients can use ports other than those associated with IM, including normally open ports such as port 80. The main security problems of IM services are:
The growing popularity of instant messaging services is confirmed by statistical studies. The relevance of this work lies in improving the quality of these services by reducing their drawbacks. The motivation for the work comes from personal experience with various instant messaging services, as well as the experience of the people with whom the author communicated through them. The most annoying problem is spam on ICQ: spam arrives even from contacts on the user's own contact list. This happens because of the low level of security; the ICQ service is vulnerable to hacking and account theft. In addition, ICQ suffers from disconnections followed by an inability to reconnect for 5 to 15 minutes. A similar problem occurs with Skype: the Skype client takes longer to connect to the server than the clients of other protocols, and the connection periodically drops. However, no disconnections were noticed while using Skype for voice calls. The final motivating factor for choosing the topic of the master's work is the desire to study data encryption methods and to gain experience in developing network software that makes active use of databases. The tasks of the master's work are:
The expected scientific novelty consists of the development and implementation of routing algorithms between servers and the development of a new messaging protocol. Several approaches to fighting spam in networks that use the DMP protocol are also proposed.

A review of research and development on the subject

Review of the popular IM services

Figure 1 shows the popularity of instant messaging services in the CIS countries as of July 2008.

Figure 1 - Use of instant messaging in the CIS countries

The most popular IM service in the CIS countries is ICQ, which uses the OSCAR protocol. However, many users receive the error "Connection limit exceeded", and a large amount of spam is sent by bots. Spamming is easy to organize because of the way the OSCAR protocol identifies contacts: by number. The OSCAR protocol offers a low level of security for use in commercial networks. Another protocol that is gaining popularity is XMPP, which is used by the Jabber and GTalk services. XMPP is an open, XML-based, free-to-use protocol for near-real-time instant messaging and presence information. It is also decentralized, but it has weaknesses as well:
Methods of balancing server workload

There are various ways to achieve load balancing. The choice between them depends on the requirements, available features, complexity of implementation and cost. For example, hardware load balancing equipment is very expensive compared with a software solution.

Round Robin DNS Load Balancing

The built-in round-robin feature of the BIND DNS server can be used to balance the load across multiple servers. It is one of the earliest load balancing techniques: the DNS server cycles through the IP addresses of a group of servers in a cluster.

Pros: very simple, inexpensive and easy to implement.

Cons: the DNS server has no knowledge of server availability and will keep pointing to an unavailable server. It can distinguish servers only by IP address, not by port. The IP address can also be cached by other name servers, so requests may not reach the load balancing DNS server at all.

Hardware Load Balancing

Hardware load balancers can route TCP/IP packets to various servers in a cluster. They often provide a robust topology with high availability, but at a much higher cost.

Pros: use a circuit-level network gateway to route traffic.

Cons: higher cost compared with software solutions.

Software Load Balancing

Most commonly used load balancers are software-based and often come as an integrated component of expensive web server and application server software packages.

Pros: cheaper than hardware load balancers; more configurable according to requirements; can incorporate intelligent routing based on multiple input parameters.

Cons: additional hardware is needed to isolate the load balancer.

As a result of the development and research, the following results are expected:
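The difference between round-robin DNS and a software balancer can be illustrated with a minimal Python sketch. It assumes a simple in-memory model of the cluster; the class names and the least-connections rule are hypothetical choices made for illustration, not part of BIND or of any particular balancer product. The round-robin resolver keeps returning an unavailable server, while the software balancer skips it and picks the least-loaded one.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Server:
    """A back-end IM server as seen by the balancer (hypothetical model)."""
    address: str
    healthy: bool = True
    active_connections: int = 0


class RoundRobinDNS:
    """Mimics DNS round robin: hands out addresses in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def resolve(self) -> str:
        # No knowledge of availability: an unhealthy server is still returned.
        return next(self._cycle).address


class LeastConnectionsBalancer:
    """Software balancer: skips unhealthy servers, picks the least loaded one."""

    def __init__(self, servers):
        self.servers = servers

    def pick(self) -> str:
        candidates = [s for s in self.servers if s.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        # min() returns the first server with the fewest active connections.
        best = min(candidates, key=lambda s: s.active_connections)
        best.active_connections += 1
        return best.address


servers = [Server("10.0.0.1"), Server("10.0.0.2", healthy=False), Server("10.0.0.3")]
dns = RoundRobinDNS(servers)
print([dns.resolve() for _ in range(4)])
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
balancer = LeastConnectionsBalancer(servers)
print([balancer.pick() for _ in range(4)])
# ['10.0.0.1', '10.0.0.3', '10.0.0.1', '10.0.0.3']
```

A real software balancer would also update the health flags through periodic checks and decrement the counters when connections close; both are omitted here for brevity.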
Results available at the time of writing

Solving the problems of low bandwidth and limited server capabilities

The DMP protocol (Decentralized Messaging Protocol) is aimed primarily at reducing the server workload and the amount of transmitted information. The protocol is decentralized, i.e. it uses multiple servers and distributes the load among them. Statistics show that about 90% of a user's contacts reside within a single administrative unit. It is therefore efficient to use one server for each administrative unit and to serve the remaining 10% of contacts through cross-server exchange. This solves the problem of limited server capabilities. The bandwidth problem is solved following the practice of the XMPP protocol: its open standards allow anyone to set up their own server, so servers are run not by a single organization but by many private or corporate entities with different traffic capacities. Unlike XMPP, however, in DMP the router is not part of the server. The router is a separate program unit that can be installed on a separate machine or on the machine running the DMP server. Direct connections between servers that bypass the router are also possible. This solution improves the exchange capabilities between servers (see Fig. 2).

Figure 2 - Possible structure of the DMP service (Flash animation, 38 KB, 5.7 seconds)

Figure 2 shows an example of a possible structure of the DMP service; the router can be placed on server 2. In addition, the protocol provides for data compression with different compressors, the format of which is set by the active version of the protocol. For the beta version of the protocol, the bzip2 compressor is specified; bzip2 is free, open-source software. Later versions of the protocol can add more efficient compressors. In addition, compression of the transmitted information must be justified.
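The requirement that compression be justified can be sketched in Python, assuming the sender tries bzip2 (through Python's standard bz2 module) and keeps the result only if it saves at least a given fraction of the size. The numeric compression codes and the 10% threshold are hypothetical; the protocol description only states that bzip2 is used in the beta version.

```python
import bz2
import os

# Hypothetical values for the packet's compression field.
COMPRESSION_NONE = 0
COMPRESSION_BZIP2 = 1


def maybe_compress(payload: bytes, min_gain: float = 0.1):
    """Return (code, data), using bzip2 only if it saves at least min_gain of the size."""
    if not payload:
        return COMPRESSION_NONE, payload
    packed = bz2.compress(payload)
    if len(packed) <= len(payload) * (1 - min_gain):
        return COMPRESSION_BZIP2, packed
    # Incompressible data (e.g. an already compressed file) is sent as-is.
    return COMPRESSION_NONE, payload


def restore(code: int, data: bytes) -> bytes:
    """Undo maybe_compress on the receiving side."""
    if code == COMPRESSION_BZIP2:
        return bz2.decompress(data)
    if code == COMPRESSION_NONE:
        return data
    raise ValueError(f"unknown compression code {code}")


text = ("hello from the DMP client " * 50).encode()
noise = os.urandom(2000)  # stands in for an already compressed file
print(maybe_compress(text)[0])   # 1: repetitive text shrinks well
print(maybe_compress(noise)[0])  # 0: random bytes do not
```

Trying the compressor and discarding the result costs CPU time on the client, which matches the idea that the decision is made on the client side rather than by the server.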
For example, compressing binary information, such as files being transferred, is not always justified; whether compression pays off should be determined on the client side. The packet structure is kept minimal to reduce traffic; it is discussed below. The use of routers is limited only by their effectiveness. There is also a drawback: it is harder to monitor the capabilities, reliability and security of each server. This problem is addressed through experience with a particular server and through feedback from users.

Solving the problem of security

The DMP protocol can exclude P2P exchange when transferring files: security is increased by allowing files to be exchanged only through the server. This loads the channel bandwidth and the DMP service's server as a whole. However, information security is more important, since the protocol is aimed at use in corporate networks, where confidential information can be transferred. In addition, the protocol includes encryption of transmitted data and secure authentication. Encryption is necessary to prevent interception of and eavesdropping on transmitted information. The attempt to reduce spam is based on limiting the amount of information transmitted per unit of time and on comparing MD5 or SHA fingerprints of message parts or whole messages from a contact. If the same fingerprint recurs more than a certain number of times, the contact can be locked on the server as a suspected spam bot. However, in the author's opinion this method is experimental and highly questionable, and it requires additional testing and practical research.

General description of the protocol

The main features provided by the DMP protocol are presented in the following paragraphs:
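The experimental anti-spam method described above (a per-contact limit on transmitted volume plus counting repeated message fingerprints) can be sketched in Python as follows. All thresholds are hypothetical, SHA-256 of the whole message stands in for the "MD5 or SHA fingerprints" mentioned in the text, and locked contacts are kept in memory only.

```python
import hashlib
import time
from collections import defaultdict, deque
from typing import Optional


class SpamGuard:
    """Per-contact volume limit plus repeated-fingerprint counter (sketch)."""

    def __init__(self, max_bytes=4096, window=60.0, max_repeats=5):
        self.max_bytes = max_bytes        # hypothetical volume limit per window
        self.window = window              # window length in seconds
        self.max_repeats = max_repeats    # hypothetical threshold of identical messages
        self.sent = defaultdict(deque)    # contact -> deque of (timestamp, size)
        self.seen = defaultdict(lambda: defaultdict(int))  # contact -> digest -> count
        self.locked = set()

    def accept(self, contact: str, message: bytes, now: Optional[float] = None) -> bool:
        """Return True if the message may be delivered."""
        if contact in self.locked:
            return False
        now = time.monotonic() if now is None else now

        # Volume limit: drop entries older than the window, then check the total.
        history = self.sent[contact]
        while history and now - history[0][0] > self.window:
            history.popleft()
        if sum(size for _, size in history) + len(message) > self.max_bytes:
            return False
        history.append((now, len(message)))

        # Fingerprint check: too many identical messages suggest a spam bot.
        digest = hashlib.sha256(message).hexdigest()
        self.seen[contact][digest] += 1
        if self.seen[contact][digest] > self.max_repeats:
            self.locked.add(contact)  # lock the contact on the server
            return False
        return True


guard = SpamGuard(max_repeats=3)
print([guard.accept("bot", b"Buy now!", now=float(t)) for t in range(5)])
# [True, True, True, False, False]
```

Comparing fingerprints of message parts, as the text also suggests, would catch bots that vary a few words per message; this sketch only hashes whole messages.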
The basic unit of the protocol, the packet, is described in this section. The packet length is not fixed but must not exceed 1 MB. The structure of the packet is shown in Figure 3.

Figure 3 - Structure of a DMP protocol packet

As Figure 3 shows, the packet contains 4 fields: signature, data encryption, compression, and commands and data. To minimize traffic, the protocol avoids verbose ways of organizing information such as XML. The signature contains the protocol version. A command has the form "ID command: options". The command ID is a 16-bit numeric field, i.e. the maximum number of commands is 65536. Note that the encryption and compression fields may be empty.

The software available up to this point

A client and a server that use the DMP protocol have been developed. This software implements the core functionality of the protocol on 2 platforms, Win32 and Linux. The client and server were tested on Windows XP SP3 and Mandriva Linux 2009.1. The implementation of the following features is also scheduled:
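The DMP packet layout can be sketched in Python under explicit assumptions. Only the four fields, the 16-bit command ID and the 1 MB limit come from the description; the 3-byte "DMP" magic plus a 1-byte version as the signature, the one-byte length prefixes for the (possibly empty) encryption and compression fields, big-endian byte order and 1 MB = 2^20 bytes are all hypothetical choices.

```python
import struct

MAX_PACKET_SIZE = 1024 * 1024  # the 1 MB limit, assumed here to mean 2**20 bytes
MAGIC = b"DMP"                 # hypothetical signature prefix, followed by a version byte


def _field(value: bytes) -> bytes:
    # One-byte length prefix; length 0 encodes an empty field.
    if len(value) > 255:
        raise ValueError("field too long")
    return bytes([len(value)]) + value


def build_packet(version: int, command_id: int, options: bytes,
                 encryption: bytes = b"", compression: bytes = b"") -> bytes:
    if not 0 <= command_id <= 0xFFFF:
        raise ValueError("command ID must fit in 16 bits")
    packet = (MAGIC + bytes([version])                    # signature with protocol version
              + _field(encryption) + _field(compression)
              + struct.pack(">H", command_id) + options)  # command ID, then its options
    if len(packet) > MAX_PACKET_SIZE:
        raise ValueError("packet exceeds the 1 MB limit")
    return packet


def parse_packet(packet: bytes):
    """Return (version, encryption, compression, command_id, options)."""
    if len(packet) > MAX_PACKET_SIZE or not packet.startswith(MAGIC):
        raise ValueError("not a DMP packet")
    pos = len(MAGIC)
    version = packet[pos]
    pos += 1
    fields = []
    for _ in range(2):  # encryption field, then compression field
        length = packet[pos]
        fields.append(packet[pos + 1:pos + 1 + length])
        pos += 1 + length
    (command_id,) = struct.unpack_from(">H", packet, pos)
    return version, fields[0], fields[1], command_id, packet[pos + 2:]


pkt = build_packet(1, 0x0010, b"alice: hello", compression=b"bzip2")
print(len(pkt), parse_packet(pkt))
# 25 (1, b'', b'bzip2', 16, b'alice: hello')
```

With empty encryption and compression fields the fixed overhead in this layout is only 8 bytes (4 for the signature, 1 for each empty field, 2 for the command ID), which reflects the protocol's aim of avoiding XML-style verbosity.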
A router also has to be developed.

The DMP protocol has been developed. It is based on experience with instant messaging services such as ICQ and XMPP. The protocol reduces the following drawbacks of instant messaging services:
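The routing decision behind the DMP service structure (local delivery for contacts in the same administrative unit, and cross-server exchange for the rest, either directly or through a separate router) can be sketched as a single Python function. The "user@domain" address form, the peer table and the router address are hypothetical; the protocol description does not specify how DMP addresses or routes are represented.

```python
from typing import Dict, Optional, Tuple


def route(recipient: str, local_domain: str,
          direct_peers: Dict[str, str], router: Optional[str]) -> Tuple[str, str]:
    """Decide how a DMP server delivers a message: ("local" | "direct" | "router", target)."""
    user, sep, domain = recipient.partition("@")
    if not sep or not user or not domain:
        raise ValueError(f"malformed address: {recipient!r}")
    if domain == local_domain:
        # Most contacts live in the same administrative unit: deliver locally.
        return "local", user
    if domain in direct_peers:
        # Direct server-to-server connection that bypasses the router.
        return "direct", direct_peers[domain]
    if router is not None:
        # Let the separate router program find the destination server.
        return "router", router
    raise LookupError(f"no route to {domain}")


peers = {"plant.example": "10.1.0.5:5000"}
for address in ("ivan@office.example", "olga@plant.example", "max@remote.example"):
    print(route(address, "office.example", peers, router="10.0.0.2:6000"))
# ('local', 'ivan')
# ('direct', '10.1.0.5:5000')
# ('router', '10.0.0.2:6000')
```

Checking the local domain first keeps the common case (about 90% of contacts, according to the statistics cited above) free of any lookup in the peer table or contact with the router.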