Vladislav Viazmin | Faculty of computer science and technology (CST) | Department of software engineering (SE) | Speciality
At the time of writing this abstract, the master's work is not yet complete. Expected completion: June 2021. The full text of the work and materials on the topic can be obtained from the author or his supervisor after that date.
Introduction
Voice messengers [1] have gained popularity only recently. The industry has developed at a time when average Internet speeds around the world have become high enough to use such applications comfortably, even on smartphones. A messenger is a program, mobile application or web service for instant messaging, in this case by voice. The area has grown quickly, and software companies and individual programmers around the world have created at least hundreds, if not thousands, of messengers.
Technically, a messenger is a client-server application [2] that reaches the user as a mobile application or a web service. Voice messaging over the Internet is common among private users as well as in government and corporate institutions. Media data, and audio in particular, is transmitted between users with the Real-time Transport Protocol RTP [3]. RTP is insecure: by itself it provides neither encryption nor authentication, so the contents of an RTP packet travel in the clear. For this reason, it is important to make voice messaging protocols more secure.
1. Goals and objectives of the study, planned results
The purpose of the study is to identify and eliminate vulnerabilities in media transmission protocols, as well as suspected vulnerabilities in user registration and authentication mechanisms, when developing applications that use cryptographically protected voice messaging protocols. The practical value lies in the development of a protocol for transmitting voice messages with cryptographic protection, and of a modified method of user authentication by QR code [4] for such applications.
The result of this master's work will be the author's own protocol for transmitting media data with cryptographic protection, together with a modified method of user authentication by QR code.
2. Analysis of problems of multimedia data transmission over the network
To transfer voice messages between users, a connection must be established. There are two main transport protocols for this, UDP [5] and TCP [6]. TCP is reliable because it guarantees delivery; however, it is poorly suited to streaming media, which needs low latency: TCP waits for acknowledgements and retransmits any packet that was not delivered, which delays everything sent after it. UDP provides the necessary speed, but it does not guarantee delivery, so the user may not receive a spoken word, or may receive only part of it. The next problem, therefore, is delivering the data completely, without loss.
3. Literature review on the problem of multimedia data transmission over the network
In the book "Basics of voice data transmission over IP networks" the authors give the same example as described above: for the transmission of voice data over IP networks, TCP guarantees the reliability of the established connection. However, the mechanisms TCP relies on do not allow it to be used to carry the voice data itself (RTP). When voice data is transmitted over IP networks, packet loss is a lesser evil than network latency.
Currently, H.323 uses TCP, while SIP and MGCP use UDP (SIP can also use TCP as its transport).
When voice data is transmitted over IP networks, UDP carries the actual voice traffic (bearer channels). TCP is not used for this purpose, because flow control and retransmission of sound packets are simply not needed here. Since UDP merely carries the audio stream, the transmission itself is not slowed down whether 5% or 50% of packets are lost.
If TCP were used to transmit voice data over IP networks, the network delay, increased by waiting for acknowledgements and retransmissions, would seriously degrade the sound quality. For voice over IP and other real-time applications, controlling network latency is more important than ensuring reliable delivery of every packet.
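The latency argument can be made concrete with a deliberately simplified model, sketched below. Everything in it is an assumption for illustration, not a model of real TCP: one voice frame every 20 ms, a fixed one-way delay, a fixed jitter-buffer depth, and a lost frame resent after a fixed timeout `rto_ms`. In the UDP-style case a lost frame is simply skipped; in the TCP-style case every later frame waits for the retransmission (head-of-line blocking). The hypothetical function `frames_missing_playout` counts frames that are lost or arrive after their playout deadline.

```python
FRAME_MS = 20  # assumed packetisation interval: one voice frame every 20 ms


def frames_missing_playout(n_frames, lost, one_way_ms, buffer_ms, rto_ms, in_order):
    """Count frames that are lost or arrive after their playout deadline.

    Simplified model (assumption): fixed delays, a single retransmission
    after rto_ms, no congestion control or fast retransmit.
    """
    bad = 0
    stalled_until = 0  # in-order delivery: nothing is released before this time
    for i in range(n_frames):
        sent = i * FRAME_MS
        deadline = sent + one_way_ms + buffer_ms  # latest useful arrival time
        if i in lost:
            if not in_order:
                bad += 1  # UDP-style: the frame is gone, playout conceals the gap
                continue
            arrival = sent + rto_ms + one_way_ms  # TCP-style: resent after a timeout
        else:
            arrival = sent + one_way_ms
        if in_order:
            arrival = max(arrival, stalled_until)  # head-of-line blocking
            stalled_until = arrival
        if arrival > deadline:
            bad += 1
    return bad


print(frames_missing_playout(50, {10}, 40, 60, 200, in_order=False))  # 1
print(frames_missing_playout(50, {10}, 40, 60, 200, in_order=True))   # 7
```

With these assumed numbers a single lost packet costs one frame when it is skipped, but seven frames when reliable in-order delivery is enforced, because the retransmission also delays the frames queued behind it.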
On the other hand, most of the signalling protocols used for voice over IP networks rely on TCP to establish connections [7].
The book "Computer Networks. The top-down approach" examines the structure of protocols in more detail and pays much attention to media transmission protocols [8]. It uses the operation of popular services such as YouTube and Skype as examples. Much attention is paid to one of today's developing areas, multimedia networking, in particular the specifics of audio and video transmission. The book contains a detailed discussion of streaming video, in particular adaptive streaming, as well as a completely new section on content delivery networks (CDN), and it describes the Netflix, YouTube and Kankan streaming video systems.
Data transmission, whether of text, voice or video, can take place in real time or not. Multimedia data can belong to either category. Real time means the ability to see and hear the data as it arrives. For example, a video clip that is watched while it is being downloaded to a network workstation is a real-time application. Another example is a camera filming someone's performance, with video servers that use IP and distribute the data to thousands of workstations for real-time viewing. Voice and video require special conditions: real-time applications place particular requirements on data transmission mechanisms, which are discussed in the book "TCP/IP. Illustrated textbook" [9].
3.1 Overview of local sources
The security of data transmitted in messengers was considered by A.I. Krushanov, a DonNTU master's student, in an article for the II International Scientific and Practical Conference (2018). The author pays special attention to key exchange between users and designs his own protocol for it, which also shows the relevance of the problem [10].
Earlier, at the X International Scientific and Technical Conference "Information control systems and computer monitoring" (ICS and CM-2019), I reviewed the existing protocols used in software implementations of voice messengers and concluded that developing a voice messenger is a rather time-consuming process that requires several protocols to work closely together [11].
4. Analysis of voice message transmission protocols
Every voice messenger transmits streaming data over the network in one way or another, using transport protocols designed for streaming media, in this case sound. The two most common protocols for this purpose are RTP and SRTP; both carry streaming data, but each has its disadvantages.
4.1 RTP protocol analysis
RTP (Real-time Transport Protocol) is the transport protocol usually used for this purpose; it provides real-time data transmission. RTP data is usually carried over UDP, an unreliable transport protocol, so at the transport layer there is no guarantee that packets will be delivered, that they will arrive in the order they were sent, or that they will arrive at a constant rate. Sequence numbers and timestamps allow the application receiving RTP packets to restore the sender's packet sequence, detect changes in network conditions and adapt accordingly. Figure 1 shows the scheme of audio data transmission over the network using the RTP protocol.
Figure 1 – Transfer of audio data over the network using the RTP protocol
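To illustrate how sequence numbers let a receiver restore order, the sketch below packs and parses the 12-byte fixed RTP header defined in RFC 3550 and sorts a batch of received packets. It is a minimal illustration, not a production parser: padding, header extensions and 16-bit sequence wraparound are deliberately ignored, and the names `RtpPacket`, `pack`, `unpack` and `restore_order`, as well as the SSRC and payload values, are hypothetical.

```python
import struct
from dataclasses import dataclass

RTP_VERSION = 2


@dataclass
class RtpPacket:
    payload_type: int  # e.g. 0 = PCMU audio
    marker: bool
    seq: int           # 16-bit sequence number, +1 for every packet sent
    timestamp: int     # 32-bit media clock, e.g. +160 per 20 ms frame at 8 kHz
    ssrc: int          # 32-bit identifier of the sending source
    payload: bytes


def pack(p: RtpPacket) -> bytes:
    # Fixed 12-byte header (RFC 3550): V=2, P=0, X=0, CC=0 | M, PT | seq | ts | SSRC
    b0 = RTP_VERSION << 6
    b1 = (int(p.marker) << 7) | (p.payload_type & 0x7F)
    return struct.pack("!BBHII", b0, b1, p.seq, p.timestamp, p.ssrc) + p.payload


def unpack(data: bytes) -> RtpPacket:
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    if b0 >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = b0 & 0x0F  # CSRC identifiers (4 bytes each) follow the fixed header
    # Padding (P bit) and header extensions (X bit) are not handled in this sketch
    return RtpPacket(b1 & 0x7F, bool(b1 >> 7), seq, ts, ssrc, data[12 + 4 * csrc_count:])


def restore_order(datagrams):
    """Sort packets by sequence number and list the gaps (ignores 16-bit wraparound)."""
    packets = sorted((unpack(d) for d in datagrams), key=lambda p: p.seq)
    received = {p.seq for p in packets}
    missing = [s for s in range(packets[0].seq, packets[-1].seq + 1) if s not in received]
    return packets, missing


# Packets 100..104 were sent; 102 was lost and the rest arrived out of order
arrived = [pack(RtpPacket(0, False, s, s * 160, 0x1234ABCD, b"\xff" * 160))
           for s in (103, 100, 104, 101)]
packets, missing = restore_order(arrived)
print([p.seq for p in packets], missing)  # [100, 101, 103, 104] [102]
```

Note that nothing in this header is encrypted or authenticated: anyone who captures the datagram can call `unpack` and read the payload, which is the weakness discussed next.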
Encapsulating RTP packets in UDP brings certain limitations, such as exposure to transmission errors: any lost or damaged part is simply discarded. RTP is used to transmit sound and images but does not check the integrity of the transmitted data in any way, and it does not retransmit missing packets. Moreover, sending data with RTP alone is unreasonable from a security point of view, because the data can be intercepted by third parties. Consequently, messengers as a rule encrypt the transmitted data, and most of them use the SRTP protocol for this purpose [12].
Today most VoIP traffic is sent without any cryptographic protection [13] and is vulnerable to eavesdropping and modification, so applying security mechanisms is an urgent task.
4.2 SRTP protocol analysis
SRTP (Secure Real-time Transport Protocol) is an extension of RTP that adds security features such as message authentication, encryption, integrity verification and replay protection, and is designed mainly for VoIP communications. SRTP is one of the security protocols used in WebRTC. By default, SRTP encrypts with AES in counter mode (AES-CM) [14]. AES-CM was chosen mainly because it does not expand the payload: the encrypted payload has the same length as the original. AES-CM also allows packets to be processed in any order, and therefore in parallel. Here the payload is the part of the transmitted packet that carries the actual message. Thus most voice messengers use RTP to carry streaming data, with various encryption algorithms applied on top of it.
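The two AES-CM properties mentioned above (no payload expansion, independent processing of each packet) follow from the counter-mode construction, sketched below together with an SRTP-style authentication tag. The IV formula is the one given for AES-CM in RFC 3711; the tag is HMAC-SHA1 over header, encrypted payload and rollover counter (ROC), truncated to 80 bits, as in SRTP's default suite. One substitution must be stated loudly: the Python standard library has no AES, so `keystream_block` uses HMAC-SHA256 truncated to 16 bytes as a stand-in block function. This is not AES and the sketch is for illustration only; the function names and key values are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16  # AES block size in bytes


def keystream_block(session_key: bytes, counter: int) -> bytes:
    # STAND-IN for AES_encrypt(session_key, counter): NOT AES. HMAC-SHA256
    # truncated to 16 bytes is used only because the stdlib has no AES.
    return hmac.new(session_key, counter.to_bytes(16, "big"), hashlib.sha256).digest()[:BLOCK]


def cm_iv(session_salt: bytes, ssrc: int, index: int) -> int:
    # RFC 3711, 4.1.1: IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16)
    k_s = int.from_bytes(session_salt, "big")  # 112-bit session salt
    return (k_s << 16) ^ (ssrc << 64) ^ (index << 16)


def cm_xor(session_key, session_salt, ssrc, index, data: bytes) -> bytes:
    # Counter mode: XOR the data with E(IV), E(IV + 1), ...; encrypt == decrypt.
    # The IV depends only on this packet's index, so packets can be handled
    # in any order, and the output is exactly as long as the input.
    iv = cm_iv(session_salt, ssrc, index)
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        ks = keystream_block(session_key, (iv + offset // BLOCK) % (1 << 128))
        out += bytes(a ^ b for a, b in zip(data[offset:offset + BLOCK], ks))
    return bytes(out)


def auth_tag(auth_key: bytes, header: bytes, enc_payload: bytes, roc: int) -> bytes:
    # HMAC-SHA1 over header || encrypted payload || ROC, truncated to 80 bits
    msg = header + enc_payload + roc.to_bytes(4, "big")
    return hmac.new(auth_key, msg, hashlib.sha1).digest()[:10]
```

Decrypting packet 2 before packets 0 and 1 works because each keystream depends only on that packet's own index, and flipping a single ciphertext bit changes the authentication tag, so the receiver can reject the modified packet.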
The cryptographic state information associated with each SRTP stream is called a cryptographic context. It must be maintained by both the sender and the receiver. If an RTP session contains several SRTP streams (for example, audio and video sent simultaneously as separate streams), a separate cryptographic context must be maintained for each of them.
The cryptographic context includes the session keys (used directly to encrypt and authenticate messages) and the master key (a random bit string from which the session keys are derived), as well as other parameters of the working session.
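A cryptographic context can be pictured as a small per-stream record. The sketch below keeps a subset of the fields RFC 3711 lists (master key, master salt, rollover counter ROC, highest received sequence number s_l) and implements the receiver's estimate of the 48-bit packet index from the 16-bit RTP sequence number, following the rule in RFC 3711, section 3.3.1. The class name, the clamping of a negative rollover guess, the dictionary of contexts keyed by SSRC and the SSRC values are assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CryptoContext:
    """Per-stream SRTP state kept by sender and receiver (subset of RFC 3711, 3.2)."""
    master_key: bytes
    master_salt: bytes
    roc: int = 0  # rollover counter: how many times the 16-bit SEQ has wrapped
    s_l: int = 0  # highest sequence number accepted so far
    session_keys: dict = field(default_factory=dict)  # derived keys, filled in elsewhere

    def estimate_index(self, seq: int) -> int:
        # RFC 3711, 3.3.1: estimate the 48-bit packet index i = 2^16 * v + SEQ
        if self.s_l < 32768:
            v = self.roc - 1 if seq - self.s_l > 32768 else self.roc
        else:
            v = self.roc + 1 if self.s_l - 32768 > seq else self.roc
        return (max(v, 0) << 16) + seq  # no earlier rollover exists when ROC = 0

    def update(self, index: int) -> None:
        # Call only after the packet has passed authentication
        if index > (self.roc << 16) + self.s_l:
            self.roc, self.s_l = index >> 16, index & 0xFFFF


# One context per SSRC, e.g. the audio and video streams of one RTP session
# (hypothetical SSRC values and keys)
contexts = {
    0x1111AAAA: CryptoContext(bytes(16), bytes(14)),
    0x2222BBBB: CryptoContext(bytes(range(16)), bytes(range(14))),
}
```

For example, if the last accepted sequence number was 65530 and a packet with SEQ 3 arrives, `estimate_index` assumes the counter wrapped and returns 65539; after `update`, the context holds ROC = 1 and s_l = 3. A late packet with SEQ 65533 is then mapped back to index 65533 and does not move the context backwards.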
Although SRTP does not define a specific key exchange mechanism, it provides several features that simplify key management and improve overall security. The master key supplies the keying material for the key derivation function, which generates the initial session keys [15] and can periodically derive new ones, limiting the amount of ciphertext produced under any single session key. Session keys also provide protection against attacks such as precomputation and time-memory trade-off attacks.
Periodic re-derivation of session keys adds further security. It prevents an attacker in the middle from collecting a large amount of ciphertext encrypted with a single session key, and some attacks are easier when a lot of ciphertext is available. In addition, regular re-keying provides forward and backward security in the sense that a compromised session key does not compromise other session keys derived from the same master key. Even if an attacker obtains one session key, he cannot decrypt messages protected by earlier or later session keys derived from the same master key (although, of course, obtaining the master key reveals all the session keys derived from it).
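The derivation scheme described above can be sketched as follows, using the structure of the SRTP key derivation in RFC 3711, section 4.3: r = index DIV key_derivation_rate, key_id = label || r, x = key_id XOR master_salt, and the session key is a PRF of the master key applied to x. The labels 0x00, 0x01 and 0x02 are the RFC's labels for the SRTP encryption key, authentication key and salt. As in any stdlib-only sketch, the PRF is a stand-in: RFC 3711 uses AES-CM keyed with the master key, whereas `prf` below uses HMAC-SHA256 over successive counter values, and the key-derivation rate of 2^16 is a hypothetical choice.

```python
import hashlib
import hmac

# SRTP key derivation labels (RFC 3711, 4.3.2)
LABEL_ENCRYPTION, LABEL_AUTH, LABEL_SALT = 0x00, 0x01, 0x02


def prf(master_key: bytes, x: int, n_bytes: int) -> bytes:
    # STAND-IN for the RFC 3711 PRF (AES-CM keyed with the master key, IV = x * 2^16):
    # HMAC-SHA256 over successive counter values, used only because stdlib lacks AES.
    out = b""
    counter = x << 16
    while len(out) < n_bytes:
        out += hmac.new(master_key, counter.to_bytes(16, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:n_bytes]


def derive_session_key(master_key, master_salt, label, index, kdr, n_bytes):
    # r = index DIV key_derivation_rate; with kdr = 0 keys are derived only once
    r = index // kdr if kdr else 0
    key_id = (label << 48) | r                       # label || r (r is 48 bits)
    x = key_id ^ int.from_bytes(master_salt, "big")  # 112-bit master salt
    return prf(master_key, x, n_bytes)


mk, ms = bytes(range(16)), bytes(range(14))
kdr = 2 ** 16  # hypothetical: re-derive session keys every 65536 packets
k_a = derive_session_key(mk, ms, LABEL_ENCRYPTION, 0, kdr, 16)
k_b = derive_session_key(mk, ms, LABEL_ENCRYPTION, 65535, kdr, 16)
k_c = derive_session_key(mk, ms, LABEL_ENCRYPTION, 65536, kdr, 16)
print(k_a == k_b, k_a == k_c)  # True False
```

Packets 0 to 65535 share one encryption key and packet 65536 starts a new one. Each session key is a one-way function of the master key, so learning one of them does not reveal the others, while the master key itself yields all of them.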
SRTP relies on an external key exchange protocol to establish the master key. Protocols such as ZRTP [16] and MIKEY [17] are used for this purpose. There are other methods of agreeing on SRTP keys; several vendors offer products that use the SDES key exchange method.
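SDES (RFC 4568) is the simplest of these methods: the offerer generates the SRTP master key and salt and places them, base64-encoded, in an `a=crypto` attribute of the SDP offer. The sketch below builds and parses such an attribute for the AES_CM_128_HMAC_SHA1_80 suite (128-bit master key, 112-bit master salt). It handles only the `inline:` key method and ignores optional lifetime, MKI and session parameters; the function names are hypothetical. Because the key travels inside the signalling message, SDES is only as secure as the signalling channel (for example, SIP over TLS).

```python
import base64
import os

SUITE = "AES_CM_128_HMAC_SHA1_80"
KEY_LEN, SALT_LEN = 16, 14  # 128-bit master key, 112-bit master salt


def make_crypto_attr(tag: int = 1):
    # Offerer side: random master key and salt, base64-encoded together
    key, salt = os.urandom(KEY_LEN), os.urandom(SALT_LEN)
    inline = base64.b64encode(key + salt).decode("ascii")
    return f"a=crypto:{tag} {SUITE} inline:{inline}", key, salt


def parse_crypto_attr(line: str):
    # a=crypto:<tag> <crypto-suite> inline:<key||salt>[|lifetime][|MKI:length] [params]
    parts = line.split()
    if not parts[0].startswith("a=crypto:") or parts[1] != SUITE:
        raise ValueError("unsupported crypto attribute")
    key_param = parts[2]
    if not key_param.startswith("inline:"):
        raise ValueError("only the inline key method is handled")
    material = base64.b64decode(key_param[len("inline:"):].split("|")[0])
    if len(material) != KEY_LEN + SALT_LEN:
        raise ValueError("wrong key length")
    return material[:KEY_LEN], material[KEY_LEN:]


line, key, salt = make_crypto_attr()
print(line)  # e.g. a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:<40 base64 characters>
```

The answerer recovers exactly the same master key and salt with `parse_crypto_attr`, which is also what any party able to read the unprotected SDP could do.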
Conclusions
The study found that most known messengers use the RTP protocol for data streaming, and that well-known cryptographic encryption algorithms are used to protect the information transmitted over it. The analysis of existing protocols also showed that audio transmission protocols are quite vulnerable to eavesdropping and traffic spoofing. The result of the master's thesis will be the developed protocols, which could in theory replace SRTP and standard authentication methods.
Further research on this topic will focus on the following aspects:
List of sources
Э, 2016. – 912 p.
Crypto Messenger / A.I. Krushanov, A.V. Chernyshova // Software Engineering: Methods and Technologies for Developing Information and Computing Systems (PIIVS-2018): Collection of Scientific Papers of the II Scientific and Practical Conference (Student Section), vol. 2 / Donetsk National Technical University. Donetsk, 2017. pp. 116-120.