Abstract

Content

Introduction
1. Theme urgency
2. Goal and tasks of the research
3. Analysis of the existing test systems of English language knowledge
4. Methods of language knowledge assessment
4.1 Fuzzy search in vocabulary
4.2 Morphological analysis
4.3 Morphemic analysis
Conclusion
References

Introduction

Today, information technology is rapidly entering our lives, including the sphere of education and knowledge control. The concept of distance learning is encountered more and more often. As a result, the problem of rapid and objective automated monitoring of students' knowledge is becoming more urgent. Nevertheless, almost all existing automated methods for assessing knowledge are significantly inferior to expert assessment.

1. Theme urgency

Most systems for evaluating language skills do not allow flexible assessment that depends on the severity of errors. These systems require the answer to be identical to the correct one, and points are awarded on an "all or nothing" basis. Hence, the development of a flexible knowledge evaluation system is relevant, especially now that distance learning courses are becoming more and more popular.

2. Goal and tasks of the research

The goal of the research is to develop a system for testing knowledge of a foreign language with a subsystem of error analysis.

To achieve this goal it is necessary to perform a number of tasks:

The analysis of methods for assessing language skills.
Creating a test set including correct answers, and variants of answers containing misprints and grammar mistakes.
The development of the language skills testing system.
The development of a subsystem of intellectual analysis of mistakes.
System testing.

The subject of the research is a language skills assessment system that approximates expert evaluation.

The object of this research is intellectual knowledge evaluation.

Research methods: analysis and generalization of information, fuzzy methods, and methods of information theory and computational linguistics.

The system can be used in language schools and in distance learning systems where knowledge of foreign words and grammar is assessed through a set of different tests.

3. Analysis of the existing test systems of English language knowledge

A variety of online systems for testing language skills has been analyzed. The main criterion for choosing the systems was their accessibility.

1. http://www.englisch-hilfen.de/en/

Among the English language portals analyzed, this portal offers the widest range of language tests. Despite the rich variety of tests, the answers to the tasks are mainly chosen from a list, which makes guessing possible. Where the word has to be entered manually, the answers are checked on the "all or nothing" principle, which is a disadvantage of these tasks.

2. http://englex.ru/

The online English language school INGLEX makes it possible to determine the level of language knowledge with a comprehensive test in four areas: grammar, vocabulary, reading and listening comprehension. Despite the variety of tasks, this testing system always offers a choice of answers, which is its disadvantage.

3. http://englishteststore.net/

This portal offers a rich set of English tests and exercises targeted at different levels of knowledge. Here you can check your reading skills, listening comprehension, vocabulary and grammar level. However, all these tasks share the same drawback: the answer options are provided.

4. http://www.study.ru/test/

The tests provided by this portal differ from the rest in that they test knowledge of the various parts of speech (pronouns, prepositions, articles, and others) as well as the ability to build interrogative sentences. The tasks provide a choice of answers (the disadvantage described earlier).

On the Web you can find a huge number of English language tests that offer neither a variety or choice of test subjects nor differentiated assessment of language skills. Here are some of such systems:

Despite the abundance of tests, it was not possible to find any that give differentiated scores for the answers.

In addition to online systems for testing language skills, there is PC software designed for the same purpose. Such software can be characterized using the following criteria:

The number of task types.
Ability to work online.
Ability to load tests from a file and ease of test preparation.
Ability to start on a signal from the teacher.
Ability to edit the program.
Cost.

In the course of the analysis, the following systems were noted: OpenTest2, MyTestXPro, Indigo, Airen. Two of the most promising systems are singled out, each having its advantages and disadvantages:

OpenTest2. Advantages: free distribution, open source code, Web orientation. Disadvantages: a small number of task types.
MyTestXPro. Advantages: rich choice of task types, low price. Disadvantages: proprietary code, inconvenient setup for working online.

4. Methods of language knowledge assessment

Karpova identifies the most common methods of knowledge assessment [1]:

Undifferentiated assessment, the simplest one. If the student's response coincides with the standard, he/she receives the maximum score, otherwise the minimum. There is no such thing as a partially correct answer.
Systems that let the student select an answer from a list containing correct, incorrect and partially correct answers. The points for them can vary. The obvious disadvantage of this approach is the need to enumerate all the possible answers and invent plausible but incorrect ones.
Systems that give the student several attempts to select the correct answer; each wrong answer reduces the number of points.

Since these methods are far from the way a teacher assesses knowledge, we need to develop a system that evaluates the knowledge of test subjects in the same way as a teacher does. One possible system is as follows: the student is given a sentence with a missing key word, for example a verb, which has to be entered manually in the required form. If the entered word coincides with the standard, the answer is correct and the student receives the maximum score for the task. If not, the system has to determine what kind of mistake was made and then how many points are to be given for the answer. An important point is that experts do not have to enumerate all possible errors; all that is required is to specify once the types of errors and the number of points that each of them is "worth". This system can also be considered universal, as it is designed to determine errors in any part of speech and on any topic.

Error identification can generally be accomplished in several stages:

Figure 1 – Stages of error identification (animation: 7 frames, 1 frame per second, unlimited number of repeat cycles, size 42.5 KB)

Let us consider each step in detail.

1. The answer to the task is compared with the standard. If the answer matches the standard, the maximum number of points is given.
2. If the answer does not coincide with the standard, a fuzzy search in the dictionary is performed to obtain information about the answer. This information includes the word form, person, etc.
3. If the answer is not found, or if it differs too much from the words it is compared with, no points are given for the task and the answer is considered completely wrong.
4. The morphological features of the word that the student most likely meant are compared with those of the standard.
5. Morphemic analysis determines the part of the word in which the error was made. Depending on the place of the error, the score is decreased by a different number of points.
6. In case of non-critical errors (based on steps 4 and 5), the number of points to be subtracted is determined.
7. Based on steps 1, 3 and 6, the points for the task are given.

The key stages of the system under development are fuzzy search in the dictionary, morphological and morphemic analysis.
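
The stages listed above can be illustrated with a minimal Python sketch. It is not the system under development: the word-form dictionary WORD_FORMS, the point values and the function name grade are hypothetical, the fuzzy search is replaced by the similarity ratio from Python's standard difflib module, and the morphemic stage (locating the erroneous part of the word) is left out for brevity.

```python
from difflib import get_close_matches

# Hypothetical word-form dictionary: form -> (base form, grammatical form).
WORD_FORMS = {
    "go": ("go", "base form"),
    "goes": ("go", "present simple, 3rd person singular"),
    "went": ("go", "past simple"),
    "going": ("go", "present participle"),
}

# Hypothetical point values that an expert would specify once.
MAX_SCORE = 10
PENALTY_MISPRINT = 2
PENALTY_WRONG_FORM = 5


def grade(answer: str, standard: str) -> int:
    """Return the points for one answer, following the stages in simplified form."""
    answer = answer.strip().lower()
    standard = standard.strip().lower()

    # Stage 1: exact comparison with the standard.
    if answer == standard:
        return MAX_SCORE

    # Stage 2: fuzzy search in the dictionary (difflib used as a stand-in metric).
    candidates = get_close_matches(answer, list(WORD_FORMS), n=1, cutoff=0.7)

    # Stage 3: nothing similar enough was found, so the answer is fully wrong.
    if not candidates:
        return 0
    meant = candidates[0]

    penalty = 0
    if meant != answer:
        penalty += PENALTY_MISPRINT  # the word had to be corrected

    # Stage 4: compare morphological features of the intended word and the standard.
    meant_lemma, meant_form = WORD_FORMS[meant]
    standard_entry = WORD_FORMS.get(standard)
    if standard_entry is None or standard_entry[0] != meant_lemma:
        return 0  # assumption: a different word is treated as a critical error
    if meant_form != standard_entry[1]:
        penalty += PENALTY_WRONG_FORM

    # Stages 6 and 7: subtract the accumulated penalties from the maximum.
    return max(0, MAX_SCORE - penalty)


print(grade("wnet", "went"))  # misprint only
print(grade("goes", "went"))  # correct spelling, wrong form
```

With these assumed values a misprint costs fewer points than a wrong grammatical form, which mirrors the idea that experts only assign a "price" to each error type once.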

4.1 Fuzzy search in vocabulary

As the developed system is designed to test language skills, it is expected that the entered words will contain errors, so it is necessary to provide a way to search for misspelled words. This requires fuzzy search, which makes it possible to find similar words. The degree of similarity is determined using a metric. By a fuzzy search metric we mean a function of the distance between two words that makes it possible to assess the degree of their similarity. The Hamming, Levenshtein and Damerau-Levenshtein distances are commonly used as such metrics [2]. The developed system uses a modification of the Levenshtein distance, the Damerau-Levenshtein distance. The essence of this modification is that the transposition of two adjacent characters is added to the operations of character insertion, deletion and substitution defined in the Levenshtein distance.

Here are examples of such distances.

Right – Rigth: 1

Rabbit – Rabit: 1

Fly – Flai: 2

Thus, when the answer does not match the standard, fuzzy search yields the word that the student most likely meant, despite possible errors, and the analysis continues with it.
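
As an illustration, the distance and the dictionary lookup can be sketched in Python as follows. The sketch implements the restricted variant of the Damerau-Levenshtein distance (also known as optimal string alignment), which is sufficient for the examples given in this section. The function names, the sample vocabulary and the max_distance threshold are assumptions made for the example, not part of the described system.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def closest_word(answer, vocabulary, max_distance=2):
    """Return the vocabulary word nearest to the answer, or None if all are too far."""
    if not vocabulary:
        return None
    best = min(vocabulary, key=lambda word: osa_distance(answer, word))
    return best if osa_distance(answer, best) <= max_distance else None


print(osa_distance("right", "rigth"))   # 1: one transposition
print(osa_distance("rabbit", "rabit"))  # 1: one deletion
print(osa_distance("fly", "flai"))      # 2: one substitution and one insertion
print(closest_word("rigth", ["right", "rabbit", "fly"]))  # right
```

Returning None when every word is farther than the threshold corresponds to the case in which the answer is considered completely wrong.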

4.2 Morphological analysis

The aim and result of morphological analysis is to determine the morphological characteristics of the word and its basic word form [3].

There are three main approaches to morphological analysis. The first approach, often called "clear" morphology, relies on a dictionary of word forms. The second approach is based on a system of rules that determine the morphological characteristics of a given word. In contrast to the first approach, it is called "fuzzy" morphology. The third, probabilistic approach is based on the compatibility of words with specific morphological characteristics [3].

The first approach is the most suitable for our purposes because, like the second approach, it takes single words as input, while the third (probabilistic) approach takes a part of a sentence as input, which is not acceptable for a language knowledge testing system. The first approach is also easier to implement; however, it has a major drawback: the input words may not be included in the dictionary of word forms. Such a situation may arise due to errors in the input string.
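
A minimal Python sketch of the dictionary-based approach and of its drawback is given below. The Analysis record, the tiny WORD_FORMS dictionary and the analyse function are hypothetical and serve only as an illustration; a real dictionary of word forms would be far larger.

```python
from typing import NamedTuple, Optional


class Analysis(NamedTuple):
    lemma: str           # basic word form
    part_of_speech: str
    features: str        # morphological characteristics of this form


# Hypothetical dictionary of word forms.
WORD_FORMS = {
    "cry": Analysis("cry", "verb", "base form"),
    "cries": Analysis("cry", "verb", "present simple, 3rd person singular"),
    "cried": Analysis("cry", "verb", "past simple"),
    "crying": Analysis("cry", "verb", "present participle"),
}


def analyse(word: str) -> Optional[Analysis]:
    """Look the word up in the dictionary; unknown or misspelled words give None."""
    return WORD_FORMS.get(word.lower())


print(analyse("cried"))    # found: lemma "cry", past simple
print(analyse("ccrying"))  # None: a misspelled form is not in the dictionary
```

The second call shows why a misspelled answer has to be corrected by fuzzy search before a dictionary lookup of this kind can return anything.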

Gashkov offers a solution to this problem in his paper "Increasing the accuracy of morphological features determination of unknown words by analogy method with the help of fuzzy sets" [4]. The author suggests that using the analogy method combined with fuzzy sets can improve the quality of the analysis. The experiments showed that the accuracy of determining the features of words not included in the dictionary rises to 50%, which the author considers a satisfactory result [4].

4.3 Morphemic analysis

The aim of morphemic analysis is to divide a word into individual morphemes: prefixes, roots, suffixes and endings [3].

Methods of morphemic analysis can be broadly divided into two groups.

The first group includes methods that perform analysis only on the basis of a dictionary. It is logical to combine such a method with the previous step of answer analysis, the morphological analysis. This significantly reduces the time needed to analyze the answer, since the morphemic constituents of the word are obtained immediately when its morphological analysis is performed. This group has an obvious drawback: if the word is not in the dictionary, morphemic analysis cannot be performed. The second group does not have this problem.

The second group includes methods that do not refer to a dictionary when splitting words into morphemes. It comprises a variety of statistical and probabilistic methods, which require prior training. The obvious advantage of this group is that morphemic analysis is always performed, even though it may be erroneous in some cases. Studies by Xuri TANG show good results [5]. The author deals with probabilistic methods. His analysis method is based on the transition probabilities from the (n-1)-th letter to the n-th letter. If the probability is below a threshold value defined in advance, the two letters are considered to belong to different morphemes and a morpheme boundary is placed between them.
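
The idea of thresholding letter transition probabilities can be illustrated with a simplified Python sketch. It is not a reproduction of the method from [5]: it uses only first-order letter transitions estimated from a toy word list, and the training words, the threshold value and the function names are assumptions chosen for the example.

```python
from collections import Counter


def train(words):
    """Estimate P(next letter | previous letter) from a list of words."""
    pair_counts = Counter()
    prev_counts = Counter()
    for word in words:
        for prev, nxt in zip(word, word[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    return {pair: n / prev_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(word, probs, threshold):
    """Split the word wherever the transition probability falls below the threshold."""
    if not word:
        return []
    parts = [word[0]]
    for prev, nxt in zip(word, word[1:]):
        if probs.get((prev, nxt), 0.0) < threshold:
            parts.append(nxt)   # unlikely transition: start a new segment
        else:
            parts[-1] += nxt    # likely transition: stay in the same segment
    return parts


# Toy training data (hypothetical); a real model needs a large corpus.
probs = train(["walk", "walks", "walked", "talk", "talks", "talked"])

print(segment("walked", probs, threshold=0.6))  # ['walk', 'ed']
print(segment("talks", probs, threshold=0.6))   # ['talk', 's']
```

In this toy data the letter "k" is followed by "s" and by "e" equally often, so both transitions fall below the assumed threshold of 0.6 and a boundary is placed after the root. On real data the quality of the split depends heavily on the training corpus and on the chosen threshold.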

In general, high-quality morphemic analysis requires a combination of these two groups of methods, especially if other types of analysis are also required, as in our case. That will significantly reduce the overall time of analysis.

Conclusion

The developed testing system will be in demand, as widely used assessment methods consider an answer wrong even if it contains a simple, non-critical misprint. For example, the word «ccrying» contains an extra letter, which could appear because of a trembling hand or simply a long press on the key. This error is not a critical one, so it would be incorrect to subtract all the points.

This master's work can serve as a basis for further development: a knowledge assessment system using syntactic and semantic analysis, which would not be possible without well-functioning morphological and morphemic analysis. Adding these types of analysis will allow the system being developed to cover the assessment of foreign language knowledge completely.

At the time of writing this abstract, the master's work is not yet complete.
Final completion: May 2017. The full text of the work and materials on the topic can be obtained from the author or his adviser after that date.

References

1. Karpova I.P. Some aspects of the qualitative assessment of test-takers' answers in knowledge control systems [Electronic resource]. Available at: http://cat.convdocs.org/docs/index-194365.html
2. Mosalev P.M. A review of methods of fuzzy search of textual information. Bulletin of the Moscow State University of Printing Arts, No. 2, 2013.
3. Seleznev K. Natural language text processing. Open Systems, No. 12, 2003.
4. Gashkov A.V. Increasing the accuracy of morphological features determination of unknown words by analogy method with the help of fuzzy sets. Bulletin of Chelyabinsk State University, 2014, No. 7 (336), Philology. Art Studies, Issue 89, pp. 20–23.
5. Xuri TANG. Dept. Foreign Languages Wuhan University of Science and Engineering, 430073, Wuhan, P. R. China
6. Karpova I.P. Analysis of learner answers in automated learning systems. Information Technologies, 2001, No. 11, pp. 49-55.
7. Belonogov G.G. Computational linguistics and advanced information technologies. Moscow, 2004. 248 p.

Denis Shulyanskyy

Faculty of computer science and technology (CST)

Department ACS

Speciality "Specialized computer systems"

Intellectual system of natural language knowledge assessment

Scientific adviser: Ph.D., Hmelevoy Sergey Vladimirovich