Abstract

Introduction
1. Goals and objectives of the study, planned results
2. Analysis of problems of multimedia data transmission over the network
3. Literature review on the problem of multimedia data transmission over the network
3.1 Overview of local sources
4. Analysis of voice message transmission protocols
4.1 RTP protocol analysis
4.2 SRTP protocol analysis
Conclusions
List of sources

At the time of writing this abstract, the master's thesis is not yet complete. Final completion: June 2021. The full text of the thesis and materials on the topic can be obtained from the author or his supervisor after that date.

Introduction

Voice messengers [1] have become popular only recently. The industry took off once average Internet speeds around the world became fast enough to use these applications comfortably, even on smartphones. A messenger is a program, mobile application or web service for instant messaging, in this case by voice.

A messenger is a client-server application [2] that the user sees as a mobile application or web service for instant messaging. This area is developing quickly, and software firms and individual programmers around the world have created at least hundreds, if not thousands, of instant messengers. Voice messaging over the Internet is common among private users as well as in government and corporate institutions. Media data, and audio data in particular, is transmitted between users over the network with the real-time transport protocol RTP [3]. RTP is insecure because by default it provides neither encryption nor authentication, so the data in an RTP packet is transmitted in the clear. For this reason, it is important to improve the efficiency of voice messaging protocols with cryptographic protection.

1. Goals and objectives of the study, planned results

The purpose of the study is to identify and eliminate vulnerabilities in media transmission protocols, as well as suspected vulnerabilities in the user registration and authentication mechanisms of applications that use cryptographically protected voice messaging protocols. The practical value lies in the development of a protocol for transmitting voice messages with cryptographic protection and of a modified method of user authentication by means of a QR code [4] for applications that transmit such messages.

The result of this master's thesis will be an original protocol for transferring media data with cryptographic protection and a modified method of user authentication by means of a QR code.

2. Analysis of problems of multimedia data transmission over the network

To transfer voice messages between users, a connection must be established. There are two main transport protocols, UDP [5] and TCP [6]. TCP is a reliable protocol because delivery is guaranteed. However, it is poorly suited to streaming media, where speed is essential: TCP waits for confirmation that each packet has been received and resends the packet if it has not. UDP provides the necessary speed, but it does not guarantee delivery, which means that the user may miss a spoken word or receive it only in part. Therefore, the next problem is complete, lossless data transfer.
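The fire-and-forget nature of UDP described above can be illustrated with a minimal sketch. It assumes a loopback receiver on the same machine; the function name `send_over_udp` and the 160-byte frame size are illustrative choices, not part of the work.

```python
import socket
from typing import Optional


def send_over_udp(payload: bytes, timeout: float = 1.0) -> Optional[bytes]:
    """Send one datagram to a local receiver; return what arrived, or None."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(timeout)
        # No handshake and no acknowledgement: the datagram is simply sent.
        sender.sendto(payload, receiver.getsockname())
        try:
            data, _addr = receiver.recvfrom(2048)
            return data
        except socket.timeout:
            # UDP gives no delivery guarantee; the application must cope.
            return None
    finally:
        sender.close()
        receiver.close()


frame = b"\x00" * 160  # hypothetical audio frame, size chosen for illustration
received = send_over_udp(frame)
```

On a real network the `None` branch is the case the application has to handle itself, since the transport layer will neither report nor repair the loss.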

3. Literature review on the problem of multimedia data transmission over the network

In the book "Basics of Voice Data Transmission over IP Networks" the authors make the same point as above: for voice data transmission over IP networks, TCP guarantees the reliability of the established connection. However, the mechanisms TCP uses to achieve this do not allow it to carry the voice data itself (RTP). When transmitting voice data over IP networks, packet loss is a lesser evil than network latency. Currently, H.323 uses TCP, while SIP and MGCP use UDP (SIP can also use TCP as a transport mechanism).

When transmitting voice data over IP networks, UDP is used to carry the actual voice traffic (bearer channels). TCP is not used for this purpose, because flow control and retransmission of audio packets are simply not needed. Since UDP only transmits the audio stream, it keeps transmitting in the same way whether packet loss is 5% or 50%.

If TCP were used to transmit voice data over IP networks, the network delay, increased by waiting for acknowledgements and by retransmissions, would seriously degrade the sound quality. For voice transmission over IP networks and other real-time applications, controlling network latency is more important than ensuring reliable delivery of each packet.
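A toy model can make this trade-off concrete. The sketch below is not a simulation of real TCP: it assumes one packet every 20 ms, a fixed 40 ms one-way delay, a single 200 ms retransmission timeout per lost packet, and strict in-order delivery for the reliable case. All of these numbers are assumptions chosen only for illustration.

```python
def delivery_delays(lost, interval_ms=20, one_way_ms=40, rto_ms=200, reliable=True):
    """Return the delay (ms) between sending and delivering each packet.

    lost[i] is True when the first transmission of packet i is dropped.
    reliable=True models retransmission with in-order delivery (TCP-like);
    reliable=False models plain datagrams (UDP-like), where a dropped
    packet never arrives and is reported as None.
    """
    delays = []
    last_delivery = 0
    for i, dropped in enumerate(lost):
        sent = i * interval_ms
        if not reliable:
            delays.append(None if dropped else one_way_ms)
            continue
        arrival = sent + one_way_ms + (rto_ms if dropped else 0)
        # In-order delivery: a packet cannot be handed over before its predecessor.
        delivery = max(arrival, last_delivery)
        last_delivery = delivery
        delays.append(delivery - sent)
    return delays


pattern = [False, True, False, False, False]  # only the second packet is lost
tcp_like = delivery_delays(pattern, reliable=True)
udp_like = delivery_delays(pattern, reliable=False)
```

Under these assumptions a single loss delays every following packet in the reliable case, while the datagram case loses one frame and keeps the delay of the others unchanged.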

On the other hand, most of the signaling protocols used in voice transmission over IP networks rely on TCP to establish a connection [7].

The book "Computer Networking: A Top-Down Approach" examines the structure of protocols in more detail and pays much attention to media data transmission protocols [8]. It uses popular services such as YouTube and Skype as examples. Considerable attention is given to one of today's developing areas, multimedia networking, and in particular to the specifics of audio and video transmission. The book includes a detailed discussion of streaming video, in particular adaptive streaming, a completely new section on content delivery networks (CDN), and case studies of the streaming video systems Netflix, YouTube and Kankan.

Data transmission, whether of text, voice or video, can take place in real time or in model time, and multimedia data can belong to either category. Real time means that the data can be seen and heard dynamically, as it arrives. For example, a video clip that is viewed while it is being downloaded to a network station is a real-time application. Another example is a camera that films someone's performance and, through video servers that use the IP protocol, distributes the data to thousands of workstations for real-time viewing. Voice and video require special conditions; more precisely, real-time applications place certain requirements on data transmission mechanisms, which are discussed in the book "TCP/IP. Illustrated Textbook" [9].

3.1 Overview of local sources

The security of data transmitted in messengers was considered in an article by A.I. Krushanov, a master's student at DonNTU, presented at the II International Scientific and Practical Conference in 2018. The author pays special attention to key exchange between users and designs his own protocol for this purpose, which also indicates the urgency of the problem [10].

Earlier, at the X International Scientific and Technical Conference "Information Control Systems and Computer Monitoring" (ICS and CM-2019), I reviewed the existing protocols for the software implementation of voice messengers and concluded that developing a voice messenger is a rather time-consuming process that involves close interaction of different protocols [11].

4. Analysis of voice message transmission protocols

Every voice messenger transmits streaming data over the network in one way or another, and the voice is carried by transport protocols designed for streaming media, in this case audio. The two most common protocols for this purpose are RTP and SRTP. Both help to transmit streaming data, but each has its disadvantages.

4.1 RTP protocol analysis

As a rule, the transport protocol RTP (Real-time Transport Protocol) is used for this purpose. It provides data transmission in real time. RTP data is usually delivered over UDP, which is an unreliable transport protocol. Therefore, the transport layer gives no guarantee that packets will be delivered, that they will arrive in the order they were sent, or that they will arrive at a constant rate. Sequence numbers and timestamps allow the application receiving the RTP packets to restore the sender's packet sequence, detect changes in network conditions and adjust accordingly. Figure 1 shows the scheme of audio data transmission over the network using the RTP protocol.

Figure 1 – Transfer of audio data over the network using the RTP protocol
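How a receiver uses sequence numbers to restore the sender's order can be sketched as follows. The 12-byte fixed header layout follows RFC 3550 [3]; the helper names `pack_rtp` and `parse_rtp`, the SSRC value and the payloads are illustrative assumptions, and header extensions and padding are deliberately not handled.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    payload_type: int
    marker: bool
    sequence: int
    timestamp: int
    ssrc: int


# Fixed RTP header: two flag bytes, 16-bit sequence number,
# 32-bit timestamp, 32-bit SSRC, all in network byte order.
_FIXED = struct.Struct("!BBHII")


def pack_rtp(payload_type, sequence, timestamp, ssrc, payload, marker=False):
    byte0 = 2 << 6  # version 2, no padding, no extension, zero CSRC entries
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = _FIXED.pack(byte0, byte1, sequence & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload


def parse_rtp(packet):
    byte0, byte1, sequence, timestamp, ssrc = _FIXED.unpack_from(packet)
    header = RtpHeader(byte0 >> 6, byte1 & 0x7F, bool(byte1 >> 7), sequence, timestamp, ssrc)
    csrc_count = byte0 & 0x0F  # each CSRC entry occupies 4 bytes after the fixed header
    return header, packet[_FIXED.size + 4 * csrc_count:]


# Three packets that arrive out of order are re-sorted by sequence number.
packets = [pack_rtp(0, seq, seq * 160, 0x1234ABCD, b"frame%d" % seq) for seq in (1, 2, 3)]
arrived = [packets[2], packets[0], packets[1]]
ordered = sorted((parse_rtp(p) for p in arrived), key=lambda item: item[0].sequence)
payloads = [payload for _header, payload in ordered]
```

Sorting by the raw sequence number is enough for this short example; a real receiver also has to account for the 16-bit counter wrapping around.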

Using the UDP protocol to encapsulate RTP packets brings certain limitations, such as transmission errors: any lost or damaged packet is simply ignored. RTP is used to transmit audio and video, but it does not itself monitor the integrity of the transmitted data and does not provide automatic retransmission of lost packets. Moreover, transferring data using only RTP is unreasonable from a security standpoint, because the data can be intercepted by third parties. Consequently, messengers as a rule encrypt the transmitted data, and most of them use the SRTP protocol for this purpose [12].
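Since RTP itself does not retransmit anything, the most a receiver can do is notice that packets are missing. A minimal sketch of such gap detection is shown below; it assumes that packets are examined in arrival order, that reordering has already been dealt with, and that the sequence number is the 16-bit counter from the RTP header. The function name is illustrative.

```python
def missing_sequence_numbers(received):
    """Return the sequence numbers skipped between consecutive packets.

    The RTP sequence number is 16 bits wide, so differences are taken
    modulo 2**16 to survive the wrap from 65535 back to 0.
    """
    missing = []
    previous = None
    for seq in received:
        if previous is not None:
            gap = (seq - previous) & 0xFFFF
            # gap == 1 means nothing was lost; a larger gap means gap - 1 losses.
            for offset in range(1, gap):
                missing.append((previous + offset) & 0xFFFF)
        previous = seq
    return missing


lost = missing_sequence_numbers([65533, 65534, 1, 2, 5])
```

What to do with the detected gaps (play silence, repeat the previous frame, interpolate) is left to the application.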

Today most VoIP traffic is sent without any cryptographic protection [13] and is vulnerable to eavesdropping and modification, so the use of security mechanisms is an urgent task.

4.2 SRTP protocol analysis

SRTP (Secure Real-time Transport Protocol) is an extension of the RTP protocol that adds security features such as message authentication, encryption, integrity verification and replay protection. It is designed mainly for VoIP communications and is one of the security protocols used in WebRTC. By default SRTP uses AES-CM (AES in counter mode) for encryption [14]. The main reason for choosing AES-CM was the absence of payload expansion (the encrypted payload has the same length as the original). Another property of AES-CM is that packets can be processed out of order, which makes it possible to process them in parallel. By payload we mean the part of the transmitted packet that holds the actual message. We can therefore conclude that voice messengers, as a rule, use the RTP protocol to transmit streaming data, with various encryption algorithms applied on top of it.
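The two counter-mode properties mentioned above (no payload expansion, independent processing of packets) can be demonstrated with a small sketch. Important assumption: the Python standard library has no AES, so HMAC-SHA256 is used here as a stand-in keystream generator. This is not AES-CM and not the SRTP keystream construction; the key, SSRC and payloads are placeholders.

```python
import hashlib
import hmac


def keystream(key: bytes, ssrc: int, packet_index: int, length: int) -> bytes:
    """Counter-mode style keystream (HMAC-SHA256 stands in for AES here)."""
    out = bytearray()
    block = 0
    while len(out) < length:
        # Each keystream block depends only on (ssrc, packet index, block counter),
        # never on other packets.
        counter = ssrc.to_bytes(4, "big") + packet_index.to_bytes(6, "big") + block.to_bytes(2, "big")
        out += hmac.new(key, counter, hashlib.sha256).digest()
        block += 1
    return bytes(out[:length])


def crypt(key: bytes, ssrc: int, packet_index: int, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    stream = keystream(key, ssrc, packet_index, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


key = bytes(range(16))  # placeholder session key, for illustration only
plain = {7: b"hello, ", 8: b"voice ", 9: b"world"}
cipher = {i: crypt(key, 0xCAFE, i, p) for i, p in plain.items()}
# Packets are decrypted in a different order than they were encrypted.
decrypted = {i: crypt(key, 0xCAFE, i, cipher[i]) for i in (9, 7, 8)}
```

Because the keystream for a packet is computed from its index alone, a late or reordered packet can be decrypted without touching its neighbours, and the ciphertext is exactly as long as the payload.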

The cryptographic state information associated with each SRTP stream is called a cryptographic context. This state must be maintained by both the sender and the receiver. If an RTP session contains several SRTP streams (for example, audio and video sent simultaneously in different streams), a separate cryptographic context must be maintained for each of them.

The cryptographic context includes the session keys (the keys used directly for message encryption and authentication) and the master key (a random bit string from which the session keys are derived), as well as other parameters of the session.
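One possible way to hold this state in a program is sketched below. The field set is a simplified assumption rather than the full list from RFC 3711 [12], and contexts are looked up by SSRC alone, which is a simplification made for this example. The packet index formula (rollover counter times 2^16 plus the sequence number) follows RFC 3711.

```python
from dataclasses import dataclass, field


@dataclass
class CryptoContext:
    """Simplified per-stream SRTP state (illustrative field set)."""
    ssrc: int
    master_key: bytes
    master_salt: bytes
    rollover_counter: int = 0  # counts wraps of the 16-bit sequence number
    session_keys: dict = field(default_factory=dict)

    def packet_index(self, sequence: int) -> int:
        # 48-bit packet index: 2**16 * ROC + SEQ
        return (self.rollover_counter << 16) | (sequence & 0xFFFF)


contexts = {}


def context_for(ssrc: int, master_key: bytes, master_salt: bytes) -> CryptoContext:
    """Return the context of a stream, creating it on first use."""
    if ssrc not in contexts:
        contexts[ssrc] = CryptoContext(ssrc, master_key, master_salt)
    return contexts[ssrc]


# Audio and video in one session get separate contexts (placeholder keys).
audio = context_for(0x1111, b"k" * 16, b"s" * 14)
video = context_for(0x2222, b"k" * 16, b"s" * 14)
audio.rollover_counter = 1
index = audio.packet_index(5)
```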

Although SRTP does not define a specific key exchange mechanism, it provides several features that simplify key management and improve overall security. The master key supplies the key material for the key derivation function.

This function generates the initial session keys [15] and can periodically produce new session keys, which limits the amount of ciphertext produced under any single key. Session keys are also used to provide protection against attacks such as precomputation and time-memory trade-off attacks.

Periodically re-running the key derivation function provides additional security. As a rule, it prevents a man-in-the-middle from collecting a large amount of ciphertext encrypted with a single session key; some attacks are easier to carry out when a large amount of ciphertext is available. In addition, repeated key derivation provides forward and backward security in the sense that a compromised session key does not jeopardize other session keys derived from the same master key. This means that even if an attacker manages to obtain a certain session key, he cannot decrypt messages protected with earlier or later session keys derived from the same master key (although, of course, a compromised master key reveals all the session keys derived from it).
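The idea of deriving separate and periodically refreshed session keys from one master key can be sketched as follows. The label values (0x00 for encryption, 0x01 for authentication, 0x02 for the salt) and the division of the packet index by the key derivation rate follow RFC 3711 [12]. Everything else is an assumption: HMAC-SHA256 is used as a stand-in pseudo-random function instead of AES-CM, the input encoding is simplified, and the key values are placeholders.

```python
import hashlib
import hmac

LABEL_ENCRYPTION, LABEL_AUTHENTICATION, LABEL_SALT = 0x00, 0x01, 0x02


def derive_session_key(master_key: bytes, master_salt: bytes, label: int,
                       packet_index: int, derivation_rate: int = 0,
                       length: int = 16) -> bytes:
    """Derive a session key (stand-in PRF: HMAC-SHA256, not AES-CM)."""
    # A rate of 0 means the keys are derived once and never refreshed.
    epoch = packet_index // derivation_rate if derivation_rate else 0
    message = master_salt + bytes([label]) + epoch.to_bytes(6, "big")
    return hmac.new(master_key, message, hashlib.sha256).digest()[:length]


master_key, master_salt = bytes(16), bytes(14)  # placeholders, not real keys
enc_first = derive_session_key(master_key, master_salt, LABEL_ENCRYPTION, 10, 1024)
enc_same_epoch = derive_session_key(master_key, master_salt, LABEL_ENCRYPTION, 1000, 1024)
enc_next_epoch = derive_session_key(master_key, master_salt, LABEL_ENCRYPTION, 2048, 1024)
auth_first = derive_session_key(master_key, master_salt, LABEL_AUTHENTICATION, 10, 1024, length=20)
```

With a one-way derivation function, knowing `enc_first` gives no practical way to compute `enc_next_epoch` or `auth_first`, whereas anyone holding `master_key` and `master_salt` can recompute all of them.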

SRTP relies on an external key exchange protocol to establish the master key. Protocols such as ZRTP [16] and MIKEY [17] are used for this purpose. There are other methods of agreeing on SRTP keys as well: several manufacturers offer products that use the SDES key exchange method.
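As an illustration of the SDES approach, the sketch below builds and parses an SDP `a=crypto` attribute in the basic form `a=crypto:<tag> <suite> inline:<base64 of key and salt>`. It assumes the AES_CM_128_HMAC_SHA1_80 suite with a 16-byte master key and a 14-byte master salt, and it ignores the optional lifetime, MKI and session parameters. The function names are illustrative.

```python
import base64
import os


def make_crypto_line(tag: int, master_key: bytes, master_salt: bytes,
                     suite: str = "AES_CM_128_HMAC_SHA1_80") -> str:
    """Build a basic SDES crypto attribute carrying key || salt in base64."""
    inline = base64.b64encode(master_key + master_salt).decode("ascii")
    return f"a=crypto:{tag} {suite} inline:{inline}"


def parse_crypto_line(line: str, key_len: int = 16):
    """Return (tag, suite, master_key, master_salt) from a crypto attribute."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not an a=crypto attribute")
    tag, suite, key_params = line[len(prefix):].split()[:3]
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("only the inline key method is handled in this sketch")
    # Optional "|lifetime|MKI:length" suffixes are dropped here.
    material = base64.b64decode(key_info.split("|")[0])
    return int(tag), suite, material[:key_len], material[key_len:]


key, salt = os.urandom(16), os.urandom(14)
line = make_crypto_line(1, key, salt)
tag, suite, parsed_key, parsed_salt = parse_crypto_line(line)
```

Note that with SDES the key material travels inside the signaling message itself, so the signaling channel has to be protected separately.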

Conclusions

In the course of the study it was found that most well-known messengers use the RTP protocol for data streaming. To protect information transmitted over RTP, well-known encryption algorithms are used. The result of the master's thesis will be the developed protocols, which in theory can replace SRTP and standard authentication methods. The analysis of existing protocols showed that audio data transmission protocols are quite vulnerable to eavesdropping and traffic spoofing.

Further research on this topic will focus on the following aspects:

Improving the level of privacy when streaming audio data.
Minimization of transmitted traffic when streaming audio data.
Increasing the level of confidentiality of user's personal data by improving existing mechanisms of user authentication.
Development of a fully functional model of the developed protocols.

List of sources

1. Орлова Н.В. Голосовые сообщения как источник сведений о коммуникативных нормах и ценностях / Н.В. Орлова // Языкознание и литературоведение. – 2018. – С. 57-66. – URL: https://cyberleninka.ru/articl… (accessed: 20.11.2020).
2. Танатканова А.К. Построение клиент-серверных приложений / А.К. Танатканова, А.К. Жамбаева // Компьютерные и информационные науки. – 2019. – С. 1-2. – URL: https://cyberleninka.ru/articl… (accessed: 20.11.2020).
3. Schulzrinne H. RTP: A Transport Protocol for Real-Time Applications / H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson // RFC 3550. – 2003. – pp. 1-89. – URL: https://www.rfc-editor.org/rfc… (accessed: 25.11.2020).
4. Ткачева М.В. Оценка допустимых преобразований Qr Code / М.В. Ткачева // Дизайн и технология полиграфического и упаковочного производства. – 2013. – С. 151-156. – URL: https://cyberleninka.ru/articl… (accessed: 20.11.2020).
5. Postel J. User Datagram Protocol / J. Postel // RFC 768. – 1980. – pp. 1-3. – URL: https://docbox.etsi.org/Refere… (accessed: 22.11.2020).
6. Transmission Control Protocol // RFC 793. – 1981. – pp. 1-91. – URL: https://www.rfc-editor.org/pdf… (accessed: 22.11.2020).
7. Джонатан Д. Основы передачи голосовых данных по сетям IP / Д. Джонатан, Д. Питерс, М. Бхатия, С. Калидини, С. Мукхержи. – М.: Видьямс, 2007. – 400 с.
8. Куроуз Д. Компьютерные сети : Нисходящий подход / Д. Куроуз, К. Росс. – М.: Издательство Э, 2016. – 912 с.
9. Ногл М. TCP/IP. Иллюстрированный учебник / М. Ногл. – М.: ДМК Пресс, 2001. – 480 с.
10. Крушанов А.И. Разработка защищенного протокола обмена сообщениями между пользователями и клиент-серверного программного обеспечения Crypto Messenger / А.И. Крушанов, А.В. Чернышова // Программная инженерия: методы и технологии разработки информационно-вычислительных систем (ПИИВС–2018): сборник научных трудов II научно-практической конференции (студенческая секция), том 2 / Донец.национал.техн.ун-т; – Донецк, 2017. – С. 116-120.
11. Вязмин В.И. Обзор существующих протоколов для программной реализации голосового мессенджера / В.И. Вязмин, А.В. Чернышова // Информатика, управляющие системы, математическое и компьютерное моделирование (ИУСМКМ–2019): материалы X Международной научно-технической конференции (студенческая секция), том 5 / Донец.национал.техн.ун-т; – Донецк, 2019. – С. 55-59.
12. Baugher M. The Secure Real-time Transport Protocol (SRTP) / M. Baugher, D. McGrew, M. Naslund, E. Carrara, K. Norrman // RFC 3711. – 2004. – pp. 1-56. – URL: https://www.rfc-editor.org/rfc… (accessed: 21.11.2020).
13. Афанасьева Д.В. Средства криптографической защиты информации / Д.В. Афанасьева, А.А. Абидарова // Известия Тульского государственного университета. Технические науки. – 2019. – С. 67-71. – URL: https://cyberleninka.ru/articl… (accessed: 23.11.2020).
14. Lakmal S. AES-CM implementation in VoIP achieve media transport confidentiality / S. Lakmal. – 2017. – pp. 1-6. – URL: https://www.researchgate.net/p… (accessed: 25.11.2020).
15. Кунегин С.В. Обмен ключами [Электронный ресурс]. – URL: http://kunegin.com/ref1/atmsec… (accessed: 25.11.2020).
16. Zimmermann P. ZRTP: Media Path Key Agreement for Unicast / P. Zimmermann, A. Johnston, J. Callas // RFC 6189. – 2011. – pp. 1-69. – URL: https://philzimmermann.com/doc… (accessed: 21.11.2020).
17. Abdmeziem M. Distributed and Compressed MIKEY Mode to Secure End-to-End Communications in the Internet of Things / M. Abdmeziem, D. Tandjaoui, I. Romdhani. – 2015. – pp. 1-8. – URL: https://www.napier.ac.uk/~/med… (accessed: 24.11.2020).
18. Вязмин В.И. Повышение эффективности протоколов передачи голосовых сообщений для мессенджеров с криптографической защитой / В.И. Вязмин, А.В. Чернышова // Программная инженерия: методы и технологии разработки информационно-вычислительных систем (ПИИВС–2020): Сборник материалов III Международной научно-практической конференции (студенческая секция), том 3 / Донец.национал.техн.ун-т; – Донецк, 2020. – С. 105-109.
19. Мичурин А. Иллюстрация работы RSA на примере // Алексей Мичурин [Электронный ресурс]. – URL: http://www.michurin.net/comput… (accessed: 21.11.2020).
20. Дудин Д. Что такое base64 и зачем он нужен в веб разработке? // html5.by [Электронный ресурс]. – URL: https://html5.by/blog/wtf-base… (accessed: 24.11.2020).

Vladislav Viazmin

Faculty of computer science and technology (CST)

Department of software engineering (SE)

Speciality Software engineering

Improving the efficiency of voice messaging protocols for cryptographically protected messengers

Scientific adviser: Professor, Doctor of Technical Sciences, Sergey Zori

Consultant: senior lecturer Alla Chernyshova
