DonNTU   Masters' portal

Abstract

Introduction

There is currently no universal, commonly accepted definition of the term «dictionary».

There are several reasons for this:

Therefore, different sources have different definitions of the term «dictionary»:

With the development of computer technology, electronic and online dictionaries are becoming more and more common.

1. Topic relevance

Everyone needs dictionaries, from beginners to professional linguists, translators and interpreters. In fact, every dictionary has a right to exist. Computers are becoming increasingly important not only to programmers and engineers but also to a wide variety of users, including linguists, translators and specialists who need prompt translation of foreign-language information. For them, computer dictionaries have become a convenient tool that saves time and makes foreign-language information easier to understand. In addition, there is now translation software that can produce a more or less adequate translation of foreign-language texts and can assist experts in various fields. [14]

This work addresses these problems and also considers some linguistic analysis software designed to automate the translation process.

2. The purpose and objectives of the study, expected results

The goal of this work is to create an electronic dictionary with the help of ontologies.

The main objectives of the work:

  1. Analysis of the relevance of dictionaries today.
  2. Comparison of paper and electronic dictionaries.
  3. Analysis of the possibilities of using ontologies to create an online dictionary.
  4. Study of existing analogs, using WordNet as an example.
  5. Development of an online dictionary.

3. Concept and types of dictionaries

The main feature of a dictionary is that it mainly reports information relevant to the interpretation, use or replacement of the signs listed on the left-hand side of its entries. The qualifier «mainly» is introduced to ensure a smooth transition from «unconditional» dictionaries to intermediate types, and to leave the compilers of a dictionary some discretion regarding optional information.

Specific features of the dictionary:

3.1. The typology of dictionaries

Dictionaries can be divided into two main types: linguistic and encyclopedic.

The object of description in linguistic dictionaries is linguistic units (words, word forms, morphemes).

In such a dictionary a word (word form, morpheme) may be characterized from different sides (multidimensionally):

Depending on how many features of a word are described, dictionaries are one-dimensional or multidimensional.

Synchronic linguistic dictionaries reflect a cross-section of the language at a certain period (e.g., the language of the 18th century or the modern language).

Diachronic (etymological) dictionaries reflect the development of the language over time.

Encyclopedic dictionaries contain extralinguistic information about the linguistic units they describe: scientific concepts, terminology, historical events, personalities, geography and so on. An encyclopedic dictionary gives no grammatical information about a word; the information it gives is about the object denoted by the word.

A terminological dictionary contains the terms of a particular field of knowledge or subject, together with their interpretation.

The distinction between linguistic (especially explanatory) and encyclopedic dictionaries deserves special attention. It lies, first of all, in the fact that encyclopedic dictionaries describe concepts (giving more or less detailed scientific information, depending on the size and purpose of the dictionary), while explanatory dictionaries describe linguistic meanings.

In encyclopedic dictionaries, many entries have a proper name as their headword.

Encyclopedias, reference books and dictionaries, as well as research materials, are used in everyday life to get information on a variety of issues. [1]

3.2. The main components of a dictionary

Before a dictionary is created, the following components must be established; without them the dictionary cannot exist.

1. Vocabulary: a list of words compiled in the process of working on a dictionary.

In encyclopedias, the vocabulary is a complete list of article names (terms), usually with a short summary and an indication of the size of each article (in printed characters).

In language dictionaries, the vocabulary is an alphabetical list (register) of vocabulary items (words, phraseological units, etc.) subject to interpretation or translation.

An encyclopedic edition usually starts with the creation of thematic vocabularies for the different branches of knowledge, ranging from general concepts to specific terms. On the basis of the consolidated thematic vocabularies, a general alphabetical vocabulary of the whole edition is compiled.

During the creation of a vocabulary:

Work on the vocabulary is closely tied to planning the bibliography, illustrations, maps and other supplementary material.

2. Glossary: a dictionary of highly specialized terms in a field of knowledge, with interpretations and sometimes with translations into another language, comments and examples.

Collections of glosses and glossaries proper became the predecessors of the dictionary.

A gloss is a foreign or unfamiliar word in the text of a book together with its interpretation, placed above or below the word or in the margins.

Originally, the foreign or unfamiliar expression itself was called a gloss.

A glossary is a list of commonly used expressions.

3. Alphabet vocabulary: a list of unfamiliar words with brief explanations (usually glosses to a text). Alphabet vocabularies were compiled in the 17th century in Belarus, Russia and Ukraine.

The entries were arranged alphabetically (usually only the first letter was taken into account), hence the name.
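The difference between ordering by the first letter only and full alphabetical ordering can be illustrated with a short Python sketch. The word list is a hypothetical example and is not taken from any historical alphabet vocabulary.

```python
# Hypothetical headwords in the order a compiler might have collected them.
headwords = ["gloss", "alphabet", "glossary", "abbey", "book"]

# Ordering by the first letter only: Python's sort is stable, so words that
# share a first letter keep the order in which they were collected.
by_first_letter = sorted(headwords, key=lambda word: word[0])

# Full alphabetical ordering, as in a modern dictionary.
fully_sorted = sorted(headwords)

print(by_first_letter)
print(fully_sorted)
```

In the first result «alphabet» still precedes «abbey», because only the initial letter was compared.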

4. Thesaurus: a collection of data (a corpus, a store) that covers as fully as possible the concepts, definitions and terms of a specific field of knowledge or activity, with examples of their use in texts.

In modern linguistics, a thesaurus is a special kind of dictionary of general or specialized vocabulary that indicates the semantic relations (synonyms, antonyms, paronyms, hyponyms, hypernyms, etc.) between lexical units.
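A thesaurus in this sense can be thought of as a mapping from terms to typed semantic relations. The following Python sketch shows one possible layout; the terms, the relation names and the `related` helper are assumptions made for illustration only.

```python
# Hypothetical thesaurus fragment: term -> relation -> related terms.
THESAURUS = {
    "dictionary": {
        "synonym": ["lexicon"],
        "hypernym": ["reference book"],
        "hyponym": ["glossary", "thesaurus"],
    },
    "glossary": {
        "hypernym": ["dictionary"],
    },
}

# Each relation paired with the relation that holds in the opposite direction.
INVERSE = {
    "synonym": "synonym",
    "antonym": "antonym",
    "hypernym": "hyponym",
    "hyponym": "hypernym",
}

def related(term, relation):
    """Return terms linked to `term` by `relation`, using inverse links too."""
    found = set(THESAURUS.get(term, {}).get(relation, []))
    inverse = INVERSE[relation]
    for other, relations in THESAURUS.items():
        if term in relations.get(inverse, []):
            found.add(other)
    return sorted(found)

print(related("lexicon", "synonym"))
print(related("dictionary", "hyponym"))
```

Storing each link once and deriving the inverse on lookup keeps the data small; a real thesaurus would more likely store both directions explicitly.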

The main structural elements of the dictionary design:

3.3. Electronic dictionaries and online dictionaries

Computers are becoming increasingly important not only to programmers and engineers but also to a wide variety of users, including linguists, translators and specialists who need prompt translation of foreign-language information. In this context, computer dictionaries are a convenient tool that saves time and makes foreign-language information easier to understand. In addition, there is now translation software that can produce a more or less adequate translation of foreign-language texts and can assist experts in various fields. [14]

An electronic dictionary is a dictionary stored in a computer or other electronic device. It lets the user quickly find the right word, often taking morphology into account, search for phrases (usage examples), and change the direction of translation (for example, English–Russian and Russian–English). Structurally, an electronic dictionary is a database of dictionary entries.
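The features listed above (a store of entries, lookup that tolerates word forms, and a reversible translation direction) can be sketched in a few lines of Python. The sample entries, the function names and the deliberately naive plural-stripping rule are all assumptions for illustration; a real dictionary would use a proper morphological analyzer and a database.

```python
from collections import defaultdict

# Hypothetical sample entries: English headword -> Russian translations.
EN_RU = {
    "dictionary": ["словарь"],
    "word": ["слово"],
    "language": ["язык"],
}

def reverse_direction(entries):
    """Build the opposite translation direction from the same entries."""
    reversed_entries = defaultdict(list)
    for headword, translations in entries.items():
        for translation in translations:
            reversed_entries[translation].append(headword)
    return dict(reversed_entries)

def normalize(word):
    """Very naive English morphology: lowercase and strip a plural ending."""
    word = word.lower()
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def lookup(entries, query):
    """Try the query as typed, then its normalized form."""
    key = query.lower()
    if key in entries:
        return entries[key]
    return entries.get(normalize(query), [])

RU_EN = reverse_direction(EN_RU)

print(lookup(EN_RU, "Dictionaries"))
print(lookup(RU_EN, "язык"))
```

The reverse direction is derived from the same entries rather than maintained separately, which is one simple way to keep both directions consistent.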

Electronic dictionaries should not be confused with computer dictionaries, which are intended not for users but for computer programs that process natural-language texts. [3]

Electronic dictionaries have now come out of the shadow of paper dictionaries and become independent players in the language field, players that in the near future might turn other dictionaries into museum exhibits. After all, electronic dictionaries have a number of obvious and significant advantages over conventional dictionaries. Their only drawback is that they are tied to a personal computer and are therefore of limited availability. However, this drawback will soon be overcome, if not completely then at least to a large extent, as computerization accelerates and portable laptop computers become more widely available. [14]

Popular electronic dictionaries:

Today, electronic dictionaries are as relevant as ever.

After all, even the best, most fundamental paper dictionaries are inevitably outdated.

This is especially true of colloquial language, in particular offensive vocabulary. In this area, the classic Russian dictionaries appear not only outdated but simply hypocritical. The function of recording the current state of the language is taken over by small dictionaries that spring up like mushrooms after rain and are usually very opportunistic and superficial. New meanings in them are cut off from their linguistic roots and are explained poorly or arbitrarily.

Electronic dictionaries are mass-market software products, and such products are characterized by frequent version changes and ongoing feedback from thousands of users. Computer lexicography is therefore inevitably up-to-date lexicography. An electronic dictionary should live the same hard life as other software systems: on the one hand, especially meticulous users obsessively hunt for the next error or gap; on the other hand, it is both possible and necessary to fix the problem now rather than decades later. [14]

An online dictionary is an electronic dictionary published on the Internet. Online dictionaries are quickly gaining popularity and are hosted on many search portals.

There are three types of online dictionaries:

Known examples of online dictionaries:

4. Ontologies

In computer science, an ontology is an attempt at a comprehensive and detailed formalization of a field of knowledge by means of a conceptual schema. Typically, such a schema consists of a data structure containing all the relevant classes of objects, their relationships and the rules (theorems, constraints) accepted in the field. [16]

4.1. The definition of ontologies

Ontologies are used in programming as a form of representing knowledge about the real world or a part of it. Key applications are business process modeling, the Semantic Web and artificial intelligence.

Modern ontologies are built in largely the same way regardless of the language in which they are written. They generally consist of instances, concepts, attributes and relations.

Instances, or individuals, are the basic low-level components of an ontology. Instances may be physical objects (people, houses, planets) or abstract ones (numbers, words).

Strictly speaking, an ontology can do without specific objects. However, one of the main purposes of an ontology is to classify such objects, so they are usually included.

Concepts, or classes, are abstract groups, collections or sets of objects. They may include instances, other classes, or a combination of both. Example: the concept «people» with the nested concept «person». Whether «person» is a nested concept or an instance (individual) depends on the ontology.

Another example: the concept «individuals» with the instance «individual».

The classes of an ontology form a taxonomy, a hierarchy of concepts ordered by the nesting relation.

Objects in an ontology may have attributes. Each attribute has at least a name and a value and is used to store information that is specific to the object and attached to it. The value of an attribute can be of a complex data type. An important role of attributes is to define relations (dependencies) between the objects of the ontology. Typically, a relation is an attribute whose value is another object.
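The components described above (concepts arranged in a taxonomy, instances, attributes, and relations expressed as attributes whose value is another object) can be modeled roughly as follows. This is a minimal sketch in Python; the class names `Concept` and `Instance`, the helper `is_a` and the sample data are hypothetical and do not come from any particular ontology toolkit.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Concept:
    """A class of the ontology; `parent` places it in the taxonomy."""
    name: str
    parent: Optional["Concept"] = None

@dataclass
class Instance:
    """An individual with a concept and named attributes."""
    name: str
    concept: Concept
    attributes: dict = field(default_factory=dict)

def is_a(instance, concept):
    """Check membership by walking up the taxonomy from the instance's concept."""
    current = instance.concept
    while current is not None:
        if current is concept:
            return True
        current = current.parent
    return False

# Hypothetical example data.
living_being = Concept("living being")
person = Concept("person", parent=living_being)
planet = Concept("planet")

earth = Instance("Earth", planet, {"moons": 1})
# A relation is an attribute whose value is another object.
anna = Instance("Anna", person, {"age": 30, "lives_on": earth})

print(is_a(anna, living_being))
print(is_a(anna, planet))
```

Here «Anna» is a «living being» only through the taxonomy, which is the kind of conclusion a classification hierarchy is meant to support.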

Ontologies can be general or specialized. General ontologies represent concepts that are common to a large number of domains. Such an ontology contains a basic set of terms, a glossary or thesaurus, used to describe the terms of subject areas. Specialized (domain-oriented) ontologies represent a particular field of knowledge or a part of the real world and contain the term meanings specific to that field. As a system that uses specialized ontologies grows, the ontologies may need to be merged. A subtask of ontology merging is ontology mapping, and for ontology engineers this is a serious problem. Even ontologies of closely related domains may be incompatible with each other. The differences may arise from local culture, ideology, or the use of a different description language. Ontologies are merged both manually and semi-automatically. In general, this is a time-consuming, slow and expensive process. Using a base ontology, that is, a common glossary, simplifies the work. There are research papers on merging techniques, but they are mostly theoretical.
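How a common glossary can simplify mapping is easiest to see on a toy example. In the Python sketch below, two hypothetical domain ontologies each link their local concepts to terms of a shared base glossary, and candidate correspondences are found through those shared terms. All names and data are assumptions; this is not a published merging algorithm.

```python
# Hypothetical local ontologies: local concept label -> term of a shared base glossary.
LIBRARY_ONTOLOGY = {"Book": "publication", "Reader": "person", "Shelf": "storage"}
PUBLISHER_ONTOLOGY = {"Title": "publication", "Author": "person", "Print run": "edition"}

def map_concepts(first, second):
    """Pair concepts from two ontologies that point to the same base term."""
    by_base_term = {}
    for label, base_term in second.items():
        by_base_term.setdefault(base_term, []).append(label)
    mapping = []
    for label, base_term in first.items():
        for other_label in by_base_term.get(base_term, []):
            mapping.append((label, other_label))
    return mapping

def unmapped(first, second):
    """Concepts of `first` with no counterpart; these need manual review."""
    base_terms = set(second.values())
    return [label for label, base_term in first.items() if base_term not in base_terms]

print(map_concepts(LIBRARY_ONTOLOGY, PUBLISHER_ONTOLOGY))
print(unmapped(LIBRARY_ONTOLOGY, PUBLISHER_ONTOLOGY))
```

The pairs are only candidates: «Reader» and «Author» share the base term «person» but are not the same concept, which illustrates why merging remains at best semi-automatic.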

In recent years the development of ontologies, that is, explicit formal descriptions of the terms of a domain and the relationships between them, has been moving from artificial intelligence laboratories to the desktops of domain experts. On the World Wide Web ontologies have become commonplace. An ontology defines a common vocabulary for researchers who need to share information in a domain. It includes machine-interpretable formulations of the basic concepts of the domain and the relationships between them.

Ontologies of Web page content are needed by search programs to improve the quality of search on the Web. The idea of building a conceptual specification of Web page content is at the heart of the so-called Intelligent Web, or Semantic Web.

A formal specification of the content of a Web document lets a search program judge whether the document matches a query not only from syntactic information obtained from the text, but also from the semantics of its content. This can dramatically improve the quality of Web search, since a description of a Web page that the search program can understand provides significantly more information than the program can extract from unstructured text. [16]

4.2. Ontology description languages

An ontology description language is a formal language used to encode an ontology. There are several such languages: [16]

The Resource Description Framework (RDF) is a system for describing Web resources, designed to describe the content of the Web. In the Semantic Web, the entities of the Web that are being described are called resources, and RDF is a language for describing such resources. Since the description of document semantics must be understandable to computers, special agent programs have to be developed to read it, and different software agents must be able to exchange this information. Thus RDF means not only the language itself but also the additional software modules needed to read and exchange information recorded in this language. This is emphasized by the word «Framework» in the name of RDF.

The main element of the RDF language is the triple (or triplet). A triple is a combination of three entities:

  1. Subject.
  2. Predicate.
  3. Object. [4]

Predicates are often called properties. A triple can also be represented as a graph of the form subject, predicate, object, in which the subject and object are nodes and the predicate is the edge that connects them.
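The triple model is simple enough to sketch directly. In the following Python fragment triples are plain tuples and a query is a pattern in which `None` acts as a wildcard. The `ex:` identifiers and the helper names are hypothetical; real systems would use full URIs and an RDF library.

```python
# Each triple is (subject, predicate, object). The identifiers are hypothetical.
TRIPLES = [
    ("ex:WordNet", "ex:type", "ex:Thesaurus"),
    ("ex:WordNet", "ex:language", "English"),
    ("ex:WordNet", "ex:developedAt", "ex:PrincetonUniversity"),
    ("ex:PrincetonUniversity", "ex:type", "ex:University"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

def outgoing_edges(triples, node):
    """View the triples as a graph: list (edge label, target node) pairs."""
    return [(p, o) for (s, p, o) in triples if s == node]

print(match(TRIPLES, predicate="ex:type"))
print(outgoing_edges(TRIPLES, "ex:PrincetonUniversity"))
```

Because the object of one triple can be the subject of another, a set of triples forms a directed labeled graph that can be traversed edge by edge.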

RDF Schema (RDFS) is an extension of RDF that makes it possible to describe simple ontologies for data held in RDF stores. Just as a database schema describes the structure of a database in terms of tables, columns and the links between them, an RDF schema describes the structure of an RDF store in terms of types and the relationships between them. In fact, RDF Schema can describe only a classification with some additional relationships. More complex types of relationships require more powerful tools, such as OWL. In RDFS one can define classes, which in description logic correspond to unary relations. [17]
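The kind of classification RDFS supports can be shown with a small inference sketch: if a resource has a type, it also belongs to every superclass of that type. In the Python code below, `rdf:type` and `rdfs:subClassOf` are the standard property names, while the `ex:` resources and the helper functions are hypothetical.

```python
# Hypothetical schema and data; only rdf:type and rdfs:subClassOf are standard terms.
TRIPLES = {
    ("ex:Thesaurus", "rdfs:subClassOf", "ex:Dictionary"),
    ("ex:Dictionary", "rdfs:subClassOf", "ex:ReferenceWork"),
    ("ex:WordNet", "rdf:type", "ex:Thesaurus"),
}

def superclasses(triples, cls):
    """All classes reachable from `cls` through rdfs:subClassOf."""
    found = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                stack.append(o)
    return found

def types_of(triples, resource):
    """Declared types plus the types inferred from the class hierarchy."""
    declared = {o for s, p, o in triples if s == resource and p == "rdf:type"}
    inferred = set(declared)
    for cls in declared:
        inferred |= superclasses(triples, cls)
    return inferred

print(sorted(types_of(TRIPLES, "ex:WordNet")))
```

Only one type is stated for `ex:WordNet`; the other two follow from the subclass hierarchy.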

OWL (Web Ontology Language) is a W3C standard language for semantic statements, developed as an extension of RDF and RDFS. At the core of the language is a representation of reality in the «object, property» data model. OWL is suitable for describing not only Web pages but any objects of reality. Every element of a description in this language (including the properties that link objects) is assigned a URI.

KIF (Knowledge Interchange Format) is a language for logic based on S-expression syntax. KIF is similar to frame-based languages such as KL-ONE and LOOM, but unlike them its primary role is not to serve as a framework for expressing or using knowledge but to support the interchange of knowledge between systems. The developers of KIF have compared it to PostScript. PostScript was designed primarily not as a language for storing and processing documents but as an interchange format that lets systems and devices share documents. Likewise, KIF is designed to facilitate the exchange of knowledge between systems that use different languages, formalisms, platforms, etc.
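S-expression syntax is what makes such a format easy to exchange: a few lines of code are enough to read it into a nested structure. The Python sketch below parses a KIF-style sentence into nested lists. The sentence is an illustrative example written for this sketch, and the parser handles only parentheses and whitespace-separated atoms, not the full KIF grammar (for instance, it does not handle quoted strings).

```python
def tokenize(text):
    """Split an S-expression into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Parse one expression from the token list into nested Python lists."""
    if not tokens:
        raise ValueError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        expression = []
        while tokens and tokens[0] != ")":
            expression.append(parse(tokens))
        if not tokens:
            raise ValueError("missing closing parenthesis")
        tokens.pop(0)  # drop the closing ")"
        return expression
    if token == ")":
        raise ValueError("unexpected closing parenthesis")
    return token

# A KIF-style sentence (illustrative): every dictionary is a reference work.
sentence = "(forall (?x) (=> (dictionary ?x) (reference-work ?x)))"
tree = parse(tokenize(sentence))
print(tree)
```

The resulting nested lists mirror the logical structure of the sentence, so a receiving system can translate them into its own internal representation.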

Common Logic (CL) is the successor of KIF (standardized as ISO/IEC 24707:2007). The definition of CL permits and encourages the development of many different syntactic forms, called dialects. A dialect may use any desired syntax, but it must be possible to show how the concrete syntax of the dialect corresponds to the abstract CL semantics, which is based on a model-theoretic interpretation. Each dialect can then be treated as a formal language. Once syntactic conformance is established, the dialect gets the CL semantics for free, because the semantics is defined only in terms of the abstract syntax and is therefore inherited by any conforming dialect. In addition, all CL dialects are equivalent (they can be mechanically translated into each other), although some may be more expressive than others.

CycL is the ontology language used in the Cyc project. It is based on predicate calculus with some higher-order extensions. CycL is used to represent the knowledge stored in the Cyc knowledge base, available from Cycorp. The CycL source released with the OpenCyc system is distributed under an open-source license to increase its usefulness in supporting the Semantic Web.

Several kinds of tools work with ontology languages: ontology editors (for creating ontologies), ontology databases (for storing and querying ontologies) and ontology repositories (for working with multiple ontologies). [16]

4.3. Lexical ontologies

Lexical (or linguistic) ontologies are a special type of ontology. Their distinguishing feature is that a single resource contains concepts (words) together with their linguistic properties. The main source of concepts in an ontology of this type is the meanings of linguistic units. Such ontologies are also distinguished by a set of relations that is typical of linguistic elements: synonymy, hyponymy, meronymy and others. The linguistic ontologies include:

The range of problems solved by these ontologies is closely linked to natural language processing. The main characteristic of linguistic ontologies is that their units are tied to the meanings of linguistic expressions (words, noun phrases, etc.), which matters both when new ontologies are created and when existing ones are lexicalized. The best-known ontologies (SUMO, OpenCyc and others) have been mapped onto WordNet. [5]

To apply an ontology to automatic text processing, in particular to information retrieval, each concept of the ontology must be associated with a set of linguistic expressions (words and phrases) by which the concept can be expressed in text. [6]
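Associating concepts with expressions can be sketched as a lookup table that is then matched against text. In the Python example below the concept names, the expressions and the `find_concepts` function are hypothetical; a real system would add morphological analysis and disambiguation instead of the plain whole-word matching used here.

```python
import re

# Hypothetical concepts with the words and phrases that can express them.
LEXICALIZATIONS = {
    "ElectronicDictionary": ["electronic dictionary", "e-dictionary"],
    "MachineTranslation": ["machine translation", "automatic translation"],
    "Ontology": ["ontology", "ontologies"],
}

def find_concepts(text):
    """Return concepts whose expressions occur in the text as whole words."""
    lowered = text.lower()
    found = []
    for concept, expressions in LEXICALIZATIONS.items():
        for expression in expressions:
            pattern = r"\b" + re.escape(expression) + r"\b"
            if re.search(pattern, lowered):
                found.append(concept)
                break
    return found

print(find_concepts("Ontologies help build an electronic dictionary."))
```

A query and a document that use different wording can then be compared at the level of concepts rather than strings.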

4.4. The WordNet electronic dictionary as an example of a lexical ontology

WordNet is an electronic thesaurus and semantic network for English, developed at Princeton University and released together with the accompanying software under a non-copyleft free license.

The dictionary consists of four networks for the main content parts of speech: nouns, verbs, adjectives and adverbs. The basic unit of WordNet is not a single word but a so-called synonym set (a «synset»), which combines words with similar meanings and serves as a node of the network. A word or phrase may appear in more than one synset and belong to more than one part of speech. Each synset contains a list of synonyms or synonymous phrases and pointers that describe the relations between it and other synsets. Words with several meanings are included in several synsets and can be assigned to different syntactic and lexical classes.
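The synset organization can be sketched with a small in-memory structure. The Python example below uses hand-made toy data; the synset identifiers, glosses and function names are hypothetical and are not taken from the WordNet database or from any WordNet API.

```python
from collections import defaultdict

# Hypothetical synsets in a WordNet-like layout: id -> part of speech, words, gloss.
SYNSETS = {
    "board.n.01": {"pos": "noun", "words": ["board", "plank"],
                   "gloss": "a long flat piece of sawn wood"},
    "board.n.02": {"pos": "noun", "words": ["board", "committee"],
                   "gloss": "a group of people who manage an organization"},
    "board.v.01": {"pos": "verb", "words": ["board", "get on"],
                   "gloss": "to go onto a ship, train or aircraft"},
}

def build_index(synsets):
    """Map every word to the ids of all synsets that contain it."""
    index = defaultdict(list)
    for synset_id, synset in synsets.items():
        for word in synset["words"]:
            index[word].append(synset_id)
    return index

def synonyms(word, index, synsets):
    """Collect the other members of every synset the word belongs to."""
    result = set()
    for synset_id in index.get(word, []):
        result.update(synsets[synset_id]["words"])
    result.discard(word)
    return sorted(result)

INDEX = build_index(SYNSETS)

print(INDEX["board"])
print(synonyms("plank", INDEX, SYNSETS))
```

The word «board» appears in three synsets and in two parts of speech, while «plank» is a synonym of only one of its senses.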

Synsets in WordNet are connected to each other by various semantic relations:

There are also various other links: lexical links, antonymy, context links (word «x» is related to word «y») and others. Among the relations, hyponymy plays a special role: it makes it possible to organize the synsets into semantic networks.
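The organizing role of hyponymy can be illustrated with a toy hierarchy. In the Python sketch below each synset is reduced to a single label and points to its more general synset; the data and function names are hypothetical, and the sketch assumes that the hierarchy contains no cycles.

```python
# Hypothetical hypernym links: synset -> its more general synset.
HYPERNYM = {
    "thesaurus": "dictionary",
    "glossary": "dictionary",
    "dictionary": "reference book",
    "encyclopedia": "reference book",
    "reference book": "book",
}

def hypernym_chain(synset):
    """Walk from a synset up to the most general concept (assumes no cycles)."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

def hyponyms(synset):
    """Return the direct hyponyms, i.e. the inverse of the hypernym link."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == synset)

def is_kind_of(specific, general):
    """True if `general` appears above `specific` in the hierarchy."""
    return general in hypernym_chain(specific)[1:]

print(hypernym_chain("thesaurus"))
print(hyponyms("dictionary"))
print(is_kind_of("glossary", "book"))
```

Following the links upward answers «is a kind of» questions, and following them downward lists the more specific terms.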

WordNet can be freely used for commercial and scientific purposes. Several programs, many interfaces and APIs, implemented in most popular languages, are available for working with it, including access through the DICT protocol and the GoldenDict program. WordNet packages are also present in the software repositories of some GNU/Linux distributions. [18]

WordNet was originally created as a model of human memory. Many of the decisions about how words are described in WordNet were motivated by psycholinguistic experiments.

It should be noted, however, that WordNet has attracted much more interest from computational linguists than from psycholinguists.

The basic hypothesis underlying the development of WordNet is as follows:

The basic relation in WordNet is the relation of synonymy. Sets of synonyms — synsets — are the basic structural elements of WordNet.

Synonymy is defined by the following criterion: two expressions are synonymous if replacing one of them with the other in a sentence does not change the truth value of that sentence.

The definition of synonymy used in WordNet does not require synonyms to be substitutable in all contexts; by that criterion natural language would have too few synonyms. A much weaker requirement is used: WordNet synonyms must be interchangeable in at least some set of contexts. For example, replacing «plank» with «board» rarely changes the truth value in carpentry contexts, but there are contexts where such a replacement is not acceptable.

Defining synonymy in terms of substitutability is what makes it necessary to divide WordNet into separate subnets by part of speech.

The dictionary includes lexemes belonging to four parts of speech: adjectives, nouns, verbs and adverbs. Lexemes of different parts of speech are stored separately, and the descriptions for each part of speech have different structures.

Synsets can be regarded as a representation of a lexicalized concept of the English language.

The authors believe that noun synsets represent nominal concepts, verb synsets express verbal concepts, adjective synsets express adjectival concepts, and so on.

The authors also believe that this division is consistent with psycholinguistic experiments, which indicate that adjectival, nominal, verbal and adverbial concepts are organized differently in human memory.

Most synsets are supplied with a gloss similar to the definitions in traditional dictionaries; this gloss is treated as shared by all the synonyms of the synset. If a word has several meanings, it belongs to several different synsets. [19]

Conclusions

The master's work is devoted to a topical scientific problem: creating a dictionary on the basis of ontologies. In the course of the research:

  1. The basic concept of the dictionary was explored in general, and electronic and online dictionaries in particular.
  2. Ontologies were studied in general, together with their constituent parts and components.
  3. Lexical ontologies were explored as a separate category of ontologies that use the word as a resource.
  4. Existing lexical ontologies were analyzed to confirm that they can be used to create an electronic dictionary, with WordNet studied as an example of such an ontology.

At the time of writing this abstract, the master's work is not yet complete. Final completion: May 2017. The full text of the work and materials on the topic can be obtained from the author or the author's supervisor after that date.

Source list

  1. Dictionary. Wikipedia [electronic resource]. Access: https://ru.wikipedia.org/wiki/Словарь
  2. The definition of "Dictionary" [electronic resource]. Access: http://lab314.brsu.by/kmp-lite/kmp2/OTT/tLecture/tDict.htm
  3. Electronic dictionary. Wikipedia [electronic resource]. Access: https://ru.wikipedia.org/wiki/Электронный_словарь
  4. Ontologies in computer systems [electronic resource]. Access: https://rsdn.ru/article/philosophy/what–is–onto.xml
  5. WordNet lexical ontology in Semantic Web technologies [electronic resource]. Access: http://www.interface.ru/home.asp?artId=36209
  6. Ontologies for processing texts in natural language. Lexical ontologies [electronic resource]. Access: http://www.intuit.ru/studies/courses/1078/270/lecture/6847?page=3
  7. Ontology components. Wikipedia [electronic resource]. Access: https://en.wikipedia.org/wiki/Ontology_components
  8. New Collegiate Dictionary. M., 2000. 320 p.
  9. Dal V. I. Explanatory Dictionary of the Russian Language.
  10. Ozhegov S. I., Shvedova N. Yu. Dictionary of the Russian Language.
  11. Russian Academy Dictionary. SPb., 1806–1822.
  12. Dictionary of the Modern Russian Literary Language, in 17 vols., 1948–1965.
  13. Explanatory Dictionary of the Russian Language, in 4 vols., ed. D. N. Ushakov.
  14. Electronic dictionaries and their application to traditional machine translation [electronic resource]. Access: http://ref.by/refs/29/39596/1.html
  15. Online dictionary. Wikipedia [electronic resource]. Access: https://ru.wikipedia.org/wiki/Онлайн-словарь
  16. Ontology (computer science). Wikipedia [electronic resource]. Access: https: //ru.wikipedia.org/wiki/Ontology_ (computer science)
  17. What is ontology [electronic resource]. Access: http://belyaev-sw1m3r2011.narod.ru/index/0–12
  18. WordNet. Wikipedia [electronic resource]. Access: https://ru.wikipedia.org/wiki/WordNet
  19. Linguistic ontology WordNet [electronic resource]. Access: http://www.intuit.ru/studies/courses/1078/270/lecture/6859