DonNTU   Masters' portal

Abstract

Introduction

A complaint is the document through which a consumer presents a claim to a supplier of goods or services. It is made in writing and serves as the basis for measures that eliminate the identified shortcomings, defects and other violations.

In the modern world, companies still pay undeservedly little attention to customer-service issues, and in particular to the resolution of complaints, forgetting that their reputation is at stake.

In order to learn how to manage complaints and use them to grow a business, one needs to go beyond the current understanding of a customer complaint as simply an expression of dissatisfaction. A rational resolution of a complaint that satisfies both parties can only be reached in a friendly environment. A complaint should be seen as a manifestation of the client's highest trust and as a way to improve the quality of the goods and services provided.

A complaint allows the buyer of a product or the recipient of a service to claim that it was provided under improper conditions. The claim can concern the quality, quantity, assortment or weight of any inventory items, a unilateral change in their cost, delivery time and other parameters.

A complaint can be filed on behalf of an individual or an organization. In the latter case, the letter can be written by any employee of the company who is authorized to create such claims and has a sufficient level of knowledge, qualifications and familiarity with the law.

Today this document has no unified template that is mandatory for use, so it can be drawn up in any form.

An important task when working with claims is to classify them by type and to determine which department or specific employee should receive each claim in order to analyze and prevent the described errors in the future.

To solve this problem, it is proposed to create a decision support system (DSS) for the production documentation management process: a computer-based automated system, an intelligent instrument used by decision makers in difficult conditions for a complete and objective analysis of the subject activity. The DSS is designed to support multicriteria decisions in a complex information environment, where "multicriteria" means that the results of the decisions made are evaluated not by one indicator but by a set of many indicators (criteria) considered simultaneously.

1. Relevance of the topic

Due to the increased volume of electronic document management, it has become difficult for sales-department employees to process such a large amount of information.

Today the complaint has no unified template that is mandatory for use, so it can be drawn up in any form and is therefore an unstructured document. There is a need to extract useful information from it and subsequently to classify complaints according to various criteria (for example, by type of complaint) and to identify the department responsible for the defect. This raises the task of developing a modern intelligent system to support managerial decision making in the sales department.

The main activity of the enterprise in question is the production and marketing of cosmetic products. In the chain from company to consumer, problems with the product may arise: an incorrectly pasted label, defective packaging, damage to goods during transportation, etc. In such cases, the client can contact the manufacturer to resolve the situation by preparing and submitting a claim.

2. Purpose and objectives of the study, planned results

The purpose of creating an intelligent system for processing and classifying complaint texts at the enterprise is to increase the efficiency of the complaint-handling process by reducing the time employees spend on information analysis.

To do this, you need to complete the following tasks:

The object of research is the process of processing complaints in the sales department.

The subject of the work is the classification of complaint texts by type of problem using document text preprocessing, a knowledge representation model and text classification methods.

Expected scientific novelty:

3. Overview of existing tools

Let's consider several well-known tools related to the theme of the system being developed:

The tools reviewed have the following advantages:

The tools also have their drawbacks:

Next, consider the models and methods used in existing software solutions.

4. Formalized problem statement

Let D be a set of documents, C a set of categories, and F an unknown objective function that, for a given pair [di , cj], tells whether the document di belongs to the category cj or not.

The task of classification is to build a classifier that approximates the function F as closely as possible.

The task is posed as exact classification, i.e. each document belongs to exactly one category.
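The exact-classification statement above can be sketched in Python. This is only an illustration: the objective function F, the sample document and the category names here are hypothetical placeholders, not part of the actual system.

```python
def classify_exact(document, categories, F):
    """Exact (single-label) classification: return the one category cj
    for which F(document, cj) is true."""
    matches = [c for c in categories if F(document, c)]
    assert len(matches) == 1, "exact classification: exactly one category"
    return matches[0]

# Toy objective function: a category matches if its name occurs in the text.
F = lambda d, c: c in d

print(classify_exact("label defect on packaging", ["defect", "delivery"], F))
# -> defect
```

In the real task F is unknown and must be approximated by a trained classifier; the methods reviewed below are different ways of building that approximation.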

5. An overview of the document text preprocessing model

The process of obtaining an indexed representation of the body of a document is called document indexing. Indexing is performed in two steps (see Figure 1) [4]:

  1. Term extraction – at this stage, the most significant terms across the entire set of documents are found and selected. The result of this stage is the set of terms T used to obtain the weight characteristics of the documents.
  2. Weighting – the significance of each term for a given document is determined. The weights of the terms are given by a special weight function.

Figure 1 - Term extraction stage
(animation: 12 frames; 3 loops; 116 kilobytes)
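One common choice of weight function for the weighting step is TF-IDF (the source does not fix a particular function, so this is an assumption). A minimal sketch with hypothetical term lists:

```python
import math

def tf_idf(corpus):
    """Weigh each term of each document by TF-IDF.
    corpus: list of documents, each given as a list of terms."""
    n = len(corpus)
    # Document frequency: in how many documents each term occurs.
    df = {}
    for doc in corpus:
        for t in set(doc):
            df[t] = df.get(t, 0) + 1
    weights = []
    for doc in corpus:
        w = {}
        for t in doc:
            tf = doc.count(t) / len(doc)      # term frequency in the document
            idf = math.log(n / df[t])         # inverse document frequency
            w[t] = tf * idf
        weights.append(w)
    return weights

docs = [["defective", "packaging"], ["damaged", "packaging"]]
w = tf_idf(docs)
print(w[0]["defective"] > 0)   # a term unique to one document gets positive weight
print(w[0]["packaging"])       # a term present everywhere gets weight 0.0
```

Note how the IDF factor gives zero weight to terms that occur in every document, since they carry no discriminating information.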

Let's take a closer look at the term extraction stage:

  1. Graphematic analysis – all characters that are not letters (for example, HTML tags and punctuation marks) are filtered out.
  2. Lemmatization – when building a text classifier, it makes no sense to distinguish between the forms (conjugation, declension) of a word, since this leads to excessive growth of the dictionary, increases resource consumption and reduces the speed of the algorithms. Lemmatization is the reduction of each word to its normal form.
  3. Reducing the dimension of the feature space – words that are not useful to the classifier are removed.
  4. Highlighting key terms – usually the single words found in the document are used as terms. This can lead to distortion or loss of meaning, for example the meaning carried by phraseological units, which are indivisible lexical units from the linguistic point of view. Therefore, when processing texts, phrases (key terms) specific to the given subject area are extracted instead of individual words.
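The first three stages above can be sketched as a small pipeline. This is a minimal illustration, not the system's actual implementation: the stop-word list is a made-up sample, and lemmatization is left as a pluggable function (a real system would use a morphological analyser).

```python
import re

STOPWORDS = {"the", "a", "of", "is", "was"}  # illustrative stop-word list

def preprocess(text, lemmatize=lambda w: w):
    """Sketch of the indexing stages: graphematic analysis,
    lemmatization, and dimensionality reduction (stop-word removal)."""
    # 1. Graphematic analysis: strip markup and non-letter characters.
    text = re.sub(r"<[^>]+>", " ", text)                 # drop HTML tags
    words = re.findall(r"[a-zA-Zа-яА-ЯёЁ]+", text.lower())
    # 2. Lemmatization: reduce each word to its normal form.
    words = [lemmatize(w) for w in words]
    # 3. Dimensionality reduction: remove uninformative words.
    return [w for w in words if w not in STOPWORDS]

print(preprocess("<b>The label of the bottle was damaged!</b>"))
# -> ['label', 'bottle', 'damaged']
```

The output is the cleaned term list that the weighting stage then turns into a weight vector.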

6. An Overview of Knowledge Representation Models

A knowledge representation model (KRM) is a way of organizing knowledge (information extracted from documents) for storage, easy access and interaction, suited to the task of an intelligent system [5].

Four main KRMs are common:

1. Production model – based on its constructive unit, the production (rule):

IF Condition THEN Action
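The rule form above can be sketched as a tiny forward-chaining engine. The complaint-routing rules and fact names below are hypothetical, invented only to illustrate the IF-THEN mechanism:

```python
def run_productions(rules, facts):
    """Forward chaining: apply IF-condition THEN-action rules
    until no rule can add a new fact."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for condition, action in rules:
            # Fire the rule if its condition holds and its action is new.
            if condition <= facts and action not in facts:
                facts.add(action)
                changed = True
    return facts

# Hypothetical complaint-routing rules: (set of conditions, derived fact).
rules = [
    ({"damaged_packaging"}, "quality_claim"),
    ({"quality_claim"}, "route_to_quality_dept"),
]
print(run_productions(rules, {"damaged_packaging"}))
```

Starting from the single fact `damaged_packaging`, the engine chains both rules and derives the routing decision.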

Pros of production models:

Cons of the production system:

2. Semantic network – based on a directed graph: the vertices of the graph are concepts, and the arcs are relationships between concepts.

Pros of semantic networks:

Cons of semantic networks:

3. Frame model – based on the frame: a template that describes an object of the subject area using slots. A slot is an attribute of the object; it has a name, a value, a stored data type and a daemon. A daemon is a procedure executed automatically under certain conditions.

Pros of the knowledge frame model include:

Cons of the frame system are:

4. Formal logical model – based on first-order predicate logic. It is assumed that the subject area contains a finite, non-empty set of objects. Interpreting functions establish links between the objects of this set, and on the basis of these links all the laws and rules of the subject area are built.

Pros of the logical model:

Cons of the logical model:

Recently, a new way of representing knowledge in intelligent systems has been gaining popularity: the ontology. An ontology is understood as a system of concepts (entities), the relations between them and the operations on them in the subject area under consideration; in other words, an ontology is a specification of the content of the domain [6].

Using ontologies avoids wasting computing time on the analysis of concepts that are not part of the subject area.

7. An overview of text classification models

7.1 Bayes Method

This algorithm is based on the maximum a posteriori probability principle. For the object being classified, the likelihood function of each class is computed, and from these the a posteriori probabilities of the classes are obtained. The object is assigned to the class whose posterior probability is maximal.
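A minimal from-scratch sketch of a multinomial naive Bayes classifier with Laplace smoothing illustrates the principle. The training samples and class labels below are invented toy data, not the enterprise's real complaints:

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (list_of_terms, class_label) pairs."""
    class_docs = Counter(lbl for _, lbl in samples)   # documents per class
    term_counts = defaultdict(Counter)                # class -> term frequencies
    vocab = set()
    for terms, lbl in samples:
        term_counts[lbl].update(terms)
        vocab.update(terms)
    return class_docs, term_counts, vocab

def classify_nb(doc, model):
    """Return the maximum a posteriori class for doc (a list of terms)."""
    class_docs, term_counts, vocab = model
    total = sum(class_docs.values())
    best, best_lp = None, float("-inf")
    for c in class_docs:
        lp = math.log(class_docs[c] / total)                 # log prior
        denom = sum(term_counts[c].values()) + len(vocab)    # Laplace smoothing
        for t in doc:
            lp += math.log((term_counts[c][t] + 1) / denom)  # log likelihood
        if lp > best_lp:
            best, best_lp = c, lp
    return best

model = train_nb([
    (["broken", "bottle"], "packaging"),
    (["torn", "box"], "packaging"),
    (["late", "delivery"], "logistics"),
])
print(classify_nb(["broken", "box"], model))  # -> packaging
```

Log-probabilities are summed instead of multiplying raw probabilities to avoid numeric underflow on long documents.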

Pros:

Cons:

7.2 Support Vector Machine (SVM)

It is used to solve classification problems. The main idea of the method is to construct a hyperplane that separates the sample objects in an optimal way. The algorithm works under the assumption that the greater the distance (margin) between the separating hyperplane and the objects of the separable classes, the smaller the average classifier error will be [7, 10].
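A linear SVM can be trained with a simple sub-gradient scheme (Pegasos-style); the sketch below is one possible illustration under that assumption, on made-up two-dimensional toy data rather than real text vectors:

```python
import random

def train_linear_svm(xs, ys, dim, lam=0.01, epochs=200):
    """Pegasos-style sub-gradient training of a linear SVM.
    xs: list of feature vectors, ys: labels in {-1, +1}."""
    w = [0.0] * dim
    t = 0
    rng = random.Random(0)
    for _ in range(epochs):
        for i in rng.sample(range(len(xs)), len(xs)):
            t += 1
            eta = 1.0 / (lam * t)                      # decaying step size
            margin = ys[i] * sum(wj * xj for wj, xj in zip(w, xs[i]))
            # Regularization shrink, then a hinge-loss step if the
            # example violates the margin.
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Toy 2-D data: class +1 on one side, class -1 on the other.
xs = [[2, 1], [3, 2], [-2, -1], [-3, -2]]
ys = [1, 1, -1, -1]
w = train_linear_svm(xs, ys, dim=2)
print(predict(w, [2.5, 1.5]))   # -> 1
```

In text classification the feature vectors would be the term-weight vectors produced at the indexing stage, typically high-dimensional and sparse.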

Pros:

Cons:

7.3 k-nearest neighbors

In order to find the rubrics relevant to a document, the document is compared with all documents in the training set. For each document in the training sample, a distance is computed as the cosine of the angle between the feature vectors. The k documents of the training sample closest to the given one are then selected, and a relevance score is calculated for each category. Categories whose relevance exceeds a given threshold are considered relevant to the document [8, 11].
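The procedure above can be sketched as follows; here term-weight vectors are dictionaries and the relevance score is a simple vote count among the neighbours (the training data is an invented toy example):

```python
import math

def cosine(a, b):
    """Cosine of the angle between two term-weight vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(doc, train, k=3):
    """train: list of (vector, category) pairs. Vote among the
    k nearest neighbours by cosine similarity."""
    nearest = sorted(train, key=lambda p: cosine(doc, p[0]), reverse=True)[:k]
    votes = {}
    for _, cat in nearest:
        votes[cat] = votes.get(cat, 0) + 1
    return max(votes, key=votes.get)

train = [
    ({"broken": 1, "bottle": 1}, "packaging"),
    ({"torn": 1, "box": 1}, "packaging"),
    ({"late": 1, "delivery": 1}, "logistics"),
]
print(knn_classify({"broken": 1, "box": 1}, train, k=1))  # -> packaging
```

Note that k-NN defers all work to classification time: there is no training phase, but every query must be compared with the whole training set.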

Pros:

Cons:

All of the previously listed methods except the Bayesian one use a vector representation of the document, in which the content is represented as a vector of the terms contained in the document. The classifier itself is a special document whose vector is formed at the training stage and consists of the average weights of the terms in the documents of the training sample. These methods have a lot in common and differ only in the way the classifier vector is trained and compiled. Classification itself amounts to computing the angle between two vectors as the degree of their similarity.
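The "average classifier vector" idea above corresponds to a centroid (Rocchio-style) classifier; a minimal sketch on invented toy vectors:

```python
import math

def centroid(vectors):
    """Average the term weights of a class's training documents."""
    c = {}
    for v in vectors:
        for t, w in v.items():
            c[t] = c.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in c.items()}

def cos_sim(a, b):
    """Cosine of the angle between two term-weight vectors (dicts)."""
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid_classify(doc, class_vectors):
    """Pick the class whose centroid makes the smallest angle with doc."""
    return max(class_vectors, key=lambda c: cos_sim(doc, class_vectors[c]))

classes = {
    "packaging": centroid([{"broken": 1, "bottle": 1}, {"torn": 1, "box": 1}]),
    "logistics": centroid([{"late": 1, "delivery": 1}]),
}
print(centroid_classify({"broken": 1, "box": 1}, classes))  # -> packaging
```

Unlike k-NN, the training documents can be discarded after the centroids are built, so classification requires only one comparison per class.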

If a domain ontology is used for classification, the document vector can be compared with the vector of the ontology itself. This implies two important differences from classical machine learning methods [9]:

  1. The description of the subject area in the form of an ontology is itself a classifier; thus, no time or computing resources are spent on building an average document from the training sample.
  2. With this approach, only the terms included in the considered ontology enter the document vector, so concepts outside the set of ontology concepts drop out of the term-weight calculation.
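The second point can be sketched as a filtering step before vectorization. The mini-ontology of complaint concepts below is hypothetical, made up only to show how out-of-ontology terms drop out:

```python
# Hypothetical mini-ontology of complaint-domain concepts.
ONTOLOGY_CONCEPTS = {"packaging", "label", "delivery", "damage", "defect"}

def ontology_vector(terms):
    """Build a term-frequency vector keeping only terms that are
    concepts of the domain ontology; everything outside the ontology
    is excluded from the weight calculation."""
    vec = {}
    for t in terms:
        if t in ONTOLOGY_CONCEPTS:
            vec[t] = vec.get(t, 0) + 1
    return vec

print(ontology_vector(["the", "label", "and", "packaging", "damage", "label"]))
# -> {'label': 2, 'packaging': 1, 'damage': 1}
```

The resulting vector can then be compared with the ontology's own vector by the same cosine measure used by the classical methods.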

Conclusions

At this stage of the master's work, the goal and objectives of the system have been determined, and similar tools have been studied and analyzed with respect to the subject of the work. Existing methods of knowledge representation and text preprocessing have been described and analyzed.

At the time of writing this abstract, the master's work is not yet complete. Final completion: May 2023. The full text of the work and materials on the topic can be obtained from the author or his supervisor after that date.

List of used sources

  1. RCO Text Categorization Engine [Electronic resource]. – Access mode: [Link]
  2. OpenText Auto-Classification [Electronic resource]. – Access mode: [Link]
  3. ABBYY FlexiCapture. Универсальная платформа для интеллектуальной обработки информации [Electronic resource]. – Access mode: [Link]
  4. Леонова Ю. В., Федотов А. М., Федотова О. А. О подходе к классификации авторефератов диссертаций по темам // Вестн. НГУ. Серия: Информационные технологии. 2017. Т. 15, № 1. С. 47–58.
  5. Представления знаний в интеллектуальных системах, экспертные системы [Electronic resource]. – Access mode: [Link]
  6. Грушин М.А. Автоматическая классификация текстовых документов с помощью онтологий // ФГБОУ ВПО МГТУ им. Н.Э. Баумана. Эл No. ФС77-51038
  7. К. В. Воронцов. Лекции по методу опорных векторов [Electronic resource]. – Access mode: [Link]
  8. Классификация данных методом k-ближайших соседей [Electronic resource]. – Access mode: [Link]
  9. Данченков С.И., Поляков В.Н. Классификация текстов в системе узлов лексической онтологии // Физико-математические науки. Том 152, кн.1, 2010 г.
  10. Машина опорных векторов [Electronic resource]. – Access mode: [Link]
  11. Метод k взвешенных ближайших соседей (пример) [Electronic resource]. – Access mode: [Link]