Abstract

Introduction

During software development, measures are needed to identify errors and defects both in the program's behavior and in its source code. Various software analysis methods serve this purpose, and they fall into two categories: static and dynamic.

Static analysis detects errors in source code without actually executing the program. Dynamic analysis, by contrast, executes the program code and therefore detects errors in the software's runtime behavior.

This research focuses on static analysis of program code. One of the oldest methods of detecting defects in source code is code review, an activity in which one or several people carefully read the source code together and make recommendations for improvement [1]. Because there is a desire to automate this process, other methods have been devised and implemented in various software analysis tools.

The following basic forms of static software analysis exist:

In this paper, metrics and visualization (one of the reverse engineering methods) are treated as forms of static analysis. Not everyone shares this classification, but in any case metrics and visualization are used in combination with static analysis.

1. Relevance of the topic

As mentioned in the previous section, various software analysis tools have been created. These tools are usually implemented as standalone applications that analyze locally stored code. However, static analysis does not require the program to be executed, so it can be applied to source code stored in any way. The whole integration problem therefore comes down to how the static analysis tool obtains the project's source code.

Today there are many ways to store the source code of a project: version control hosting, private (closed) repositories, self-hosted repositories, etc. The most popular of these are version control hosting web services. These web services are built on various version control systems (VCS), so they offer all the features of a VCS plus additional features unique to each service.

Because these web services have all the features of a VCS, they can store the source code of a project remotely. Given their popularity and their ability to store code, it is relevant to consider integrating static analysis with version control hosting web services.

The article "Integrated static analysis for software project using architectural style REST" [2] was presented at the III International scientific-practical conference "Software engineering: methods and technologies for development of Information Computer Systems" (SEICS-2020). It describes this integrable approach to static analysis of a software project in detail.

2. Goal and tasks of the research

The goal of the research is to develop an approach to static analysis that can be integrated into an existing software project stored on a version control hosting web service. The approach is designed with the advantages and disadvantages of existing approaches in mind.

The study focuses on static analysis of imperative programming languages, specifically the analysis of projects written in C. By the end of the research, a software model should also be developed: a static analysis system for C projects that integrates with a version control hosting service.

Core research objectives:

Research objects: static software analysis and the REST architectural approach.

Research subject: integrating static software analysis into an existing software project using REST.

3. Literature review

The research is related to the following fields of software development: software metrics, visualization of metrics and program code, distributed systems, and the REST architectural approach to creating web services. Let's look at studies in each of these areas.

3.1 Overview of software metrics studies

Many authors have studied software metrics. This paper considers only metrics for imperative programming languages, but even for these languages there are different sets of metrics.

Maurice Halstead proposed a complete set of software metrics that displays the characteristics of an existing program. [3] These metrics do not rely on the traditional Lines of Code (LOC), but rather on operands and operators. The set of metrics is divided into: basic metrics, different metrics of program volume, quality level, labor and cost.
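The basic Halstead measures can be illustrated with a minimal sketch. It assumes the operator and operand counts have already been extracted from the source code, and it uses the commonly cited formulas for vocabulary, length, volume, difficulty, and effort. The class and function names are hypothetical, and how tokens are split into operators and operands is a convention the analyzer has to fix.

```python
import math
from dataclasses import dataclass


@dataclass
class HalsteadCounts:
    distinct_operators: int  # n1
    distinct_operands: int   # n2
    total_operators: int     # N1
    total_operands: int      # N2


def halstead_metrics(c: HalsteadCounts) -> dict:
    """Compute basic Halstead measures from token counts."""
    if c.distinct_operands == 0:
        raise ValueError("difficulty is undefined without operands")
    vocabulary = c.distinct_operators + c.distinct_operands   # n = n1 + n2
    length = c.total_operators + c.total_operands             # N = N1 + N2
    volume = length * math.log2(vocabulary)                   # V = N * log2(n)
    difficulty = (c.distinct_operators / 2) * (c.total_operands / c.distinct_operands)
    effort = difficulty * volume                              # E = D * V
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": effort,
    }


# Assumed tokenization of the C statement `z = x + y;`:
# operators are `=`, `+`, `;` and operands are `z`, `x`, `y`.
example = halstead_metrics(HalsteadCounts(3, 3, 3, 3))
print(example)
```

None of these values depends on the number of lines, which is the property that distinguishes this family of metrics from LOC.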

Thomas McCabe described the cyclomatic complexity metric [4]. The metric is based on analysis of the control flow from one statement to another and thus takes the logic of the program into account. The program is represented as a directed graph, called a control graph, control flow graph, or program control graph. The McCabe metric is the cyclomatic number of this graph, V(G) = E − N + 2P, where E is the number of edges, N is the number of vertices, and P is the number of connected components. Various authors have proposed modifications of this metric; the two best known are the Myers and Hansen metrics [5].

These metrics are examined in the work of DonNTU master's student D.K. Zhukova, published on her personal site. The work describes the Halstead metrics and McCabe's cyclomatic metric. In addition, it analyzes various metrics for object-oriented languages: the number of remote methods called (NORM), response for a class (RFC), weighted message passing coupling (WMPC1, WMPC2), and others. Quantitative metrics are also presented: Lines of Code (LOC), Number of Classes (NOC), Number of Descendant Classes (NDC), Number of Local Methods (NLM), etc.

Victor Basili and Richard Selby compared different testing strategies [6]. Their work shows that the range of testing strategies is wide. It is therefore useful to have metrics that estimate the number of test cases a given strategy requires for a given program; for example, the McCabe metric indicates the number of test cases required for structural testing.

Most modern languages support modularity. One purpose of splitting a program into modules is to limit the impact that a change in one module has on another. To measure this relationship, Norman Fenton suggested classifying modules into six classes of coupling [7]. The coupling of a program can then be depicted as a coupling graph, which is a good way to visualize the coupling between modules.
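As an illustration, the cyclomatic number can be computed directly from a control flow graph with the standard formula V(G) = E − N + 2P (E edges, N vertices, P connected components). The sketch below assumes the graph is already available as an adjacency list; the function name and the node labels are hypothetical.

```python
def cyclomatic_number(graph: dict[str, list[str]], components: int = 1) -> int:
    """Return V(G) = E - N + 2P for a control flow graph.

    `graph` maps each node to the list of its successor nodes.
    `components` is P, the number of connected components
    (1 for a single function).
    """
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)          # include nodes that only appear as targets
    edges = sum(len(successors) for successors in graph.values())
    return edges - len(nodes) + 2 * components


# Control flow graph of a function with a single if/else statement.
if_else = {
    "entry": ["cond"],
    "cond": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}
print(cyclomatic_number(if_else))  # 5 edges - 5 nodes + 2 = 2
```

A straight-line function yields 1, and each additional binary decision adds 1, which is why the value is often read as the number of independent paths through the code.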

Typically, modular programs consist of modules with data flowing between them; this is called information flow. Information flows can reveal complex interactions and complex algorithmic behavior in a program. Sally Henry and Dennis Kafura proposed a metric for modules based on the number of flows that terminate in a module (fan-in) and the number that originate from it (fan-out) [8]. This metric shows how each module affects the program's information flow and thus helps identify what makes changing and testing the program more complex.
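A simplified sketch of this idea follows. It assumes that information flows have already been extracted as (source module, destination module) pairs, counts only these direct flows, and uses the commonly cited form of the Henry and Kafura complexity, length × (fan-in × fan-out)². The module names and the length value are hypothetical.

```python
from collections import Counter


def fan_in_out(flows):
    """Count flows terminating in (fan-in) and originating from (fan-out) each module."""
    unique_flows = set(flows)                      # ignore duplicate flows
    fan_in = Counter(dst for _, dst in unique_flows)
    fan_out = Counter(src for src, _ in unique_flows)
    return fan_in, fan_out


def information_flow_complexity(length: int, fan_in: int, fan_out: int) -> int:
    """Commonly cited Henry-Kafura form: length * (fan_in * fan_out) ** 2."""
    return length * (fan_in * fan_out) ** 2


# Hypothetical flows between four modules.
flows = [
    ("lexer", "parser"),
    ("parser", "analyzer"),
    ("parser", "report"),
    ("analyzer", "report"),
]
fan_in, fan_out = fan_in_out(flows)
parser_complexity = information_flow_complexity(
    length=100, fan_in=fan_in["parser"], fan_out=fan_out["parser"]
)
print(fan_in["parser"], fan_out["parser"], parser_complexity)  # 1 2 400
```

Because the flow term is squared, a module that sits in the middle of many flows dominates the result even when it is short.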

3.2 Overview of software visualization studies

Diana Sidarkeviciute claims that the most commonly used visualization tools are code viewers [9]. Code viewers provide a fixed set of graphical views of the input program: graph structures, forward and backward slicers, dicers, and dead (unreachable) code views.

However, because modern programs are so large, graph structures and other code viewer views are not always practical: during visualization such views can become heavily cluttered.

Peter Young and Malcolm Munro described a visualization based on three-dimensional geometric forms such as cubes and cylinders [10]. Their system consists of two parts, CallStax and FileVis, which can be combined into one visualization system. There are other types of visualization that use abstract structures: real-world metaphors (e.g., cities), polymetric views, hyperbolic trees, treemaps, evolution radar, etc.

3.3 Overview of distributed systems and REST studies

An integrated static analyzer can be built in different ways. For example, it can be built as a utility, but this complicates development for multiple platforms. It is therefore better to build the analyzer as a web service. To make it easier to integrate the analyzer into the development process, the REST architectural style can be used for the server. However, before considering REST and the constraints it places on the application architecture, the concept of middleware should be introduced.

S.A. Shelokov highlights the following requirements for distributed systems: openness, scalability, stability and security. He argues that to meet these requirements, the system must include an additional layer of software that is located between the application and transport layers of the OSI model, called middleware [11].

The master's research of V.A. Alekseeva, M.A. Sednevets, and N.V. Vorotyntsev also analyzes these requirements for distributed systems. The works of V.A. Alekseeva and N.V. Vorotyntsev consider the use of client-server architecture in designing a distributed application.

Middleware should provide data exchange between the components of a distributed system. The concept of middleware is detailed in the research of M.A. Sednevets. S.V. Gorin and V.A. Krischenko argue that there are two concepts of interaction between software components: messaging, and remote procedure call (RPC) or remote object method call [12]. Master's student S.A. Prihodko considers RPC and other mechanisms of component interaction in a distributed system, gives examples of architectures that use these mechanisms, and presents the advantages and disadvantages of each architecture.

On top of Internet protocols such as TCP and HTTP, higher-level protocols can be built to implement remote procedure calls. A remote procedure call can be represented as an ordinary HTTP request (GET or POST); such a request is called a REST request. The REST architectural style, one of the ways to exchange data between the components of a distributed system, is based on this idea.

The REST architectural style was proposed by Roy Fielding in his dissertation "Architectural Styles and the Design of Network-based Software Architectures" at the University of California, Irvine in 2000 [13]. In essence, REST is not an architecture but a set of constraints imposed on a system's architecture. The constraints are: a client-server model, statelessness (no state kept between requests), caching, a uniform interface, a layered system (intermediate servers), and code on demand (the only optional constraint). The uniform interface is a compound constraint consisting of: resource identification in requests; resource manipulation through representations; self-descriptive messages; and hypermedia as the engine of application state (HATEOAS) [14].

So REST is a set of principles for designing a system architecture, not an architecture or a protocol. A system does not have to satisfy every constraint; a system that does not fulfill all of them is called REST-like. REST makes developing distributed web services simple and flexible.

4. Static analysis approach with integration with project hosting web services

As argued in the previous section, a static code analysis tool is best built as a web service. A web service integrates more easily with other tools for building, deploying, and testing the project, which simplifies its use in continuous integration (CI) [15] and continuous deployment (CD) [16]. A web service also makes the static analysis system cross-platform and reduces the load on the system into which static analysis is integrated.

4.1 Web Service Requirements for Static Code Analysis

A REST API in a web service simplifies integration with version control hosting web services. All popular web services for version control hosting, such as GitHub [17] or GitLab [18], have a REST API. The static analysis web service should therefore be designed using the REST architectural approach.

Static analysis exists in the following forms: software metrics, software visualization, and code error detection. It is necessary to determine which metrics and visualizations the static analysis system will use. Because the web service has a REST interface, requirements for that interface must be specified as well.

4.1.1 Requirements for metrics used by the web service

The following software metrics should be computed by the static analysis web service:

In addition to the set of metrics, requirements can be placed on their display, filtering, and export. Filtering should be possible by a set of functions and by a set of files. Metrics should be calculated at three levels: project, file, and function. Recommended values for metrics (upper and lower limits) should be displayed. It should be possible to compare metrics between project releases.

The most important format for exporting metrics is CSV [19], as it allows better integration with other utilities. There should also be at least one reader-friendly format, such as HTML [20].
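A minimal sketch of CSV export is shown below. The column names (`level`, `path`, `metric`, `value`) and the sample row are assumptions made for illustration; the text does not define the export layout.

```python
import csv
import io

# Hypothetical column layout for exported metrics.
FIELDS = ["level", "path", "metric", "value"]


def metrics_to_csv(rows: list[dict]) -> str:
    """Serialize metric records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"level": "function", "path": "src/main.c:main",
     "metric": "cyclomatic_complexity", "value": 4},
]
print(metrics_to_csv(rows))
```

One record per (path, metric) pair keeps the file easy to filter by level or by file set in other utilities.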

4.1.2 Requirements for the visualizations used by the web service

The following visualization approaches should be used by the web service:

It is also worth adding filtering using dynamic queries by metric value during software visualization.

4.1.3 Requirements for REST interface and web interface

REST API requests can be divided into two types: write and read. Read requests are used to get information about the project or its calculated metrics. Write requests create a project or change its content. Write requests are based on the idea that all metrics of the affected files (or of the project as a whole) are calculated while the request is processed. Afterwards, the client simply retrieves the calculated metrics with read requests.

All requests manipulate the project through representations obtained from a particular resource. Resources can be nested within each other, so they form a tree-like structure. This resource structure is shown in Figure 1.

Figure 1 — REST API resource tree of the web service

The following resources descend from the root: register and :user_name (a parameter specified in the request, e.g. /omegadog). A register request creates an account in the system. Then, using the username of that account, the account's projects can be managed. Thus, the resource /:user_name/:project_name (:project_name is the parameter that identifies the project in the account with username :user_name) is the root resource for all requests that manage the project.

The 'contents' resource is used to manage the project contents: to add or delete files, or to get information about a file or a project directory. The metrics of a file, directory, or project are obtained using the same path as under the contents resource. That is, if a client has a path under the 'contents' resource, for example /contents/directory1/file1, then the metrics resource with the same path (/metrics/directory1/file1) returns the metrics for that path. The same approach applies to the 'visualization' resource, which returns a visualization of the specified path.

The resource /:user_name/settings is used to change the settings of the account with the given username.

Service requests take different parameters, which can be passed in two ways: as request parameters or in the request body. Parameters passed in the request body are usually those that are difficult to represent as request parameters, for example an HTTP [21] link to another resource. The request body is in JSON format, and the parameters inside it are represented as key-value pairs (fields) of a JSON object [22].

The following HTTP methods are used in requests: GET, PUT, and POST. The POST method is used to create a representation or resource, for example a project. To create a project, the client sends a POST request to the resource /:user_name/:project_name and passes the parameters needed for project creation. The GET method is used to obtain representations: a JSON object (the resource representation) is returned to the client. If the client only needs to change an existing resource, it proceeds as with POST but uses the PUT method.
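The path correspondence between the contents, metrics, and visualization resources can be sketched as a small helper. The function name is hypothetical, and the paths are shown relative to the project root resource, as in the example above.

```python
def sibling_resource(contents_path: str, resource: str) -> str:
    """Map a path under /contents to the same path under /metrics or /visualization."""
    prefix = "/contents"
    if contents_path != prefix and not contents_path.startswith(prefix + "/"):
        raise ValueError("path is not under the contents resource")
    if resource not in ("metrics", "visualization"):
        raise ValueError("unknown resource: " + resource)
    # Keep everything after the /contents prefix unchanged.
    return "/" + resource + contents_path[len(prefix):]


print(sibling_resource("/contents/directory1/file1", "metrics"))
# /metrics/directory1/file1
```

Calling it with `/contents` alone yields `/metrics`, the metrics of the project as a whole.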

Write requests must be authenticated, so the request must include a field for transmitting an HMAC [23]. If a request is not authenticated, the service should return a 403 Forbidden response. Requests are signed with HMAC to distinguish between clients. Secret keys should be generated for trusted clients, and each key is shared between the service and the client.

Trusted users must be able to create accounts in the system so that secret keys can be bound to accounts. An account can be created through the REST interface, but it is easier to do through a graphical or command-line interface. A web interface is the most suitable graphical interface for the service, so the service should provide one. A command-line interface can be added as the service is developed further.

The web interface must at minimum allow creating and managing accounts and generating and retrieving the secret keys used with the REST API. Other features of the service can be added to the interface during further development.
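A minimal sketch of HMAC signing and verification follows. The text does not specify the hash function or which parts of the request are signed, so SHA-256 and the method, resource, and body joined by newlines are assumptions, as are all function names and the sample key.

```python
import hashlib
import hmac


def sign(secret_key: bytes, method: str, resource: str, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature over the method, resource and body."""
    message = method.encode("utf-8") + b"\n" + resource.encode("utf-8") + b"\n" + body
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def verify(secret_key: bytes, method: str, resource: str, body: bytes, signature: str) -> bool:
    """Check a signature using a constant-time comparison."""
    expected = sign(secret_key, method, resource, body)
    return hmac.compare_digest(expected, signature)


def status_for_write(secret_key: bytes, method: str, resource: str,
                     body: bytes, signature: str) -> int:
    """Return 403 for an unauthenticated write request, 200 otherwise."""
    return 200 if verify(secret_key, method, resource, body, signature) else 403


key = b"shared-secret-of-a-trusted-client"   # hypothetical key
body = b'{"description": "C project"}'
signature = sign(key, "POST", "/omegadog/demo", body)

print(status_for_write(key, "POST", "/omegadog/demo", body, signature))         # 200
print(status_for_write(key, "POST", "/omegadog/demo", b"tampered", signature))  # 403
```

Including the method and resource in the signed message means a signature captured for one request cannot be reused for a different resource.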

4.2 An example of a static analysis web service integration with a version control hosting

Let's look at an example of integrating a static analysis web service with GitHub, currently the largest version control hosting service. GitHub, like many similar services, supports subscription to events that occur in a project repository. When an event occurs, a POST request is sent to a specified URL of the REST interface; this feature is called web hooks [24]. The service supports different events:

Of these events, push is the most important for the static analysis service, because the service must track changes in a repository branch and then calculate metrics and visualizations for the new changes. The REST interface of the static analysis service should also have a resource for web hook processing, i.e. each project should have a resource allocated for this purpose. For example, the following resource can be allocated for the GitHub web hook handler: /:user_name/:project_name/webhook/github, where :user_name and :project_name are parameters that depend on the project owner's account and the project name.

The push web hook has many parameters in its request body, but the most important is 'after', which stores the SHA hash [25] of the last commit pushed to the branch. With this hash value, the service can then process the changes in the repository branch using the GitHub REST API.

When the static analysis service receives the push web hook, it processes the changes in the repository branch in the following steps:
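Extracting this parameter from the web hook body can be sketched as follows. The function name is hypothetical, and the payload is a heavily shortened stand-in for a real push event with made-up hash values.

```python
import json


def last_commit_sha(payload_text: str) -> str:
    """Return the 'after' field of a push web hook body."""
    payload = json.loads(payload_text)
    sha = payload.get("after")
    if not isinstance(sha, str) or not sha:
        raise ValueError("push payload has no 'after' field")
    return sha


# Shortened example body; real push events carry many more fields.
payload_text = json.dumps({
    "ref": "refs/heads/main",
    "before": "1111111111111111111111111111111111111111",
    "after": "2222222222222222222222222222222222222222",
})
print(last_commit_sha(payload_text))
```

The returned value is what the service substitutes for :commit_hash when it requests the commit through the GitHub REST API.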

  1. Having obtained the SHA hash of the last commit, the service requests information about the commit through the GitHub REST API: /repos/:user_name/:project_name/git/commits/:commit_hash. The reply contains many parameters, but the important one is 'tree'. It is an object whose 'sha' property stores the SHA hash of the BLOB tree [26] that the commit points to.
  2. The request /repos/:user_name/:project_name/git/trees/:hash_tree returns the tree structure. It is stored in the 'tree' parameter as an array of JSON objects. The important parameters of each object are: 'path' (path to a file or directory), 'type' (BLOB or tree) and 'sha' (SHA hash of the BLOB or subtree).
  3. If the tree contains new paths, the corresponding resources must be added to the project.
  4. For each already existing path, the hash value must be compared with the value stored in the static analysis service. If the values differ and the path is of BLOB type, the content must be obtained with the request /repos/:user_name/:project_name/git/blobs/:hash_blob. The content must also be obtained for each new path of BLOB type. The two relevant fields of the reply are 'content' (BLOB content) and 'encoding' (encoding of the content field, e.g. base64 [27]).
  5. If the hash values differ and the path is a subtree (tree type), steps 2–5 are repeated recursively for the subtree. The same is done for each new path of tree type.
  6. .
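Steps 2–5 can be sketched as a recursive comparison of tree hashes. To keep the sketch runnable without network access, the tree request is replaced by a lookup in an in-memory dictionary. All names, the hash values, and the shape of the stored hash table are hypothetical, and handling of deleted paths is left out because the steps do not describe it.

```python
import base64


def changed_blobs(fetch_tree, tree_sha, stored, prefix=""):
    """Return (path, sha) pairs of new or changed BLOBs and update `stored` in place.

    `fetch_tree(sha)` stands in for the tree request and returns the 'tree' array.
    `stored` maps each known path to the hash saved by the analysis service.
    """
    changed = []
    for entry in fetch_tree(tree_sha):                    # step 2: read the tree
        path = prefix + entry["path"]
        if stored.get(path) == entry["sha"]:              # step 4: hash matches, skip
            continue
        stored[path] = entry["sha"]                       # step 3: new path, or new hash
        if entry["type"] == "blob":
            changed.append((path, entry["sha"]))          # step 4: content must be fetched
        elif entry["type"] == "tree":                     # step 5: descend into the subtree
            changed.extend(changed_blobs(fetch_tree, entry["sha"], stored, path + "/"))
    return changed


def decode_blob(reply: dict) -> bytes:
    """Decode the 'content' field of a blob reply according to its 'encoding'."""
    if reply["encoding"] == "base64":
        return base64.b64decode(reply["content"])
    return reply["content"].encode("utf-8")


# A push that changes file1.c and adds file2.c; the lib directory is untouched.
TREES = {
    "root-new": [
        {"path": "file1.c", "type": "blob", "sha": "b1-new"},
        {"path": "file2.c", "type": "blob", "sha": "b2"},
        {"path": "lib", "type": "tree", "sha": "t-lib"},
    ],
}
stored = {"file1.c": "b1-old", "lib": "t-lib"}
print(changed_blobs(TREES.__getitem__, "root-new", stored))
# [('file1.c', 'b1-new'), ('file2.c', 'b2')]
```

Because a tree's hash changes whenever anything beneath it changes, an unchanged subtree such as `lib` is skipped without a single request, which keeps the number of API calls proportional to the size of the change.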

Figure 2 shows a concrete example of how the static analysis service processes changes in a repository branch when it receives a push web hook. In this example, a commit that changes file1.c and adds file2.c is pushed to the repository branch.

Figure 2 — Example of push web hook processing (animation: 6 frames, 65.3 Kb)

Conclusions

In this paper we consider an approach to static analysis of a software project that simplifies integration with version control hosting web services and other software development tools. This approach makes the software analysis process more flexible and simplifies its interaction with continuous integration (CI) and continuous deployment (CD). In the future, it is planned to extend the static analysis web service to support not only imperative programming languages but also languages of other paradigms.

As part of the research performed:

Further research focuses on the following aspects:

At the time of writing this abstract, the master's thesis is not yet complete. Expected completion: June 2021. The full text of the research and materials on the topic can be obtained from the author or his adviser after that date.

References

  1. Static analysis as part of the Unreal Engine development process // Habr [Electronic resource]. Mode of access: https://habr.com/ru/company/pvs-studio/blog/331724
  2. Chernyshova A.V., Mazalov R.A. Integrated static analysis for software project using architectural style REST // III International scientific-practical conference "Software engineering: methods and technologies for development of Information Computer Systems" (SEICS-2020). Donetsk, 2020. P. 125–128.
  3. Halstead M.H. Elements of Software Science. 1st ed. New York: Elsevier, 1977. 127 p.
  4. McCabe T.J. A complexity measure // IEEE Transactions on Software Engineering. 1976. P. 308–320.
  5. Hansen W.J. Measurement of program complexity by the pair (cyclomatic number, operator count) // ACM SIGPLAN Notices. 1978. Vol. 13, no. 3. P. 29–33.
  6. Basili V.R., Selby R.W. Comparing the effectiveness of software testing strategies // IEEE Transactions on Software Engineering. 1987. P. 1278–1296.
  7. Fenton N.E. Software Metrics: A Rigorous Approach. 1st ed. London: Chapman & Hall, 1992.
  8. Henry S., Kafura D. Software structure metrics based on information flow // IEEE Transactions on Software Engineering. 1981. P. 510–518.
  9. Sidarkeviciute D. Program analysis and visualisation: Towards a declarative approach // Informatica. 1997. P. 153–175.
  10. Young P., Munro M. Visualising software in virtual reality // Proceedings of the IEEE 6th International Workshop on Program Comprehension. IEEE Computer Society. Ischia, 1998. P. 19–26.
  11. Shelokov S.A., Chernoprudova E.N. Design of distributed information systems: lecture course for the discipline "Design of distributed information systems". Orenburg: Orenburg State University, 2012. 195 p. (in Russian).
  12. Krischenko V.A., Gorin S.V. Support for distributed application development in Microsoft .NET Framework. 2nd ed. Moscow: National Open University INTUIT, 2016. 249 p. (in Russian).
  13. Fielding R.T. Architectural Styles and the Design of Network-based Software Architectures. Doctoral dissertation. University of California, Irvine, 2000. https://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
  14. Mazalov R.A., Chernyshova A.V., Gilmanova R.R. Overview of REST architectural style // VI International scientific-technical conference "Modern information technologies in education and scientific research" (SITONI-2019). Donetsk, 2019. P. 442–448.
  15. Continuous Integration for beginners [Electronic resource] // Habr. 2018. Mode of access: https://habr.com/ru/post/352282
  16. What is the difference between Continuous Delivery, Continuous Deployment and Continuous Integration [Electronic resource] // Qaat. 2017. Mode of access: https://qaat.ru/kakaya-raznica-mezhdu-continuous-delivery-continuous-deployment-i-continuous-integration
  17. GitHub: Where the world builds software [Electronic resource] // GitHub. 2020. Mode of access: https://github.com
  18. The first single application for the entire DevOps [Electronic resource] // GitLab. 2020. Mode of access: https://gitlab.com
  19. Description of the CSV format // FilesReview [Electronic resource]. Mode of access: https://filesreview.com/ru/info/csv
  20. What is HTML? // W3C HTML [Electronic resource]. Mode of access: https://www.w3.org/html/
  21. HTTP [Electronic resource] // MDN. 2020. Mode of access: https://developer.mozilla.org/ru/docs/Web/HTTP
  22. Working with JSON [Electronic resource] // MDN. 2020. Mode of access: https://developer.mozilla.org/ru/docs/Learn/JavaScript/Объекты/JSON
  23. Signing data: HMAC in practice in APIs and web forms [Electronic resource] // Habr. 2015. Mode of access: https://habr.com/ru/post/262341
  24. About webhooks [Electronic resource] // GitHub Docs. 2020. Mode of access: https://docs.github.com/en/free-pro-team@latest/developers/webhooks-and-events/about-webhooks
  25. A step-by-step explanation of how the SHA-2 (SHA-256) hashing algorithm works [Electronic resource] // Tproger. 2020. Mode of access: https://tproger.ru/translations/sha-2-step-by-step
  26. Blob [Electronic resource] // MDN. 2020. Mode of access: https://developer.mozilla.org/ru/docs/Web/API/Blob
  27. Base64 encoding and decoding [Electronic resource] // MDN. 2020. Mode of access: https://developer.mozilla.org/ru/docs/Web/API/WindowBase64/Base64_encoding_and_decoding