Artyom Shelyuk

Faculty: Institute of Informatics and Artificial Intelligence

Department of Software for Intelligent Systems

Speciality "Software systems"

Research and development of a predictive browser model

Scientific supervisor: Professor Anatoly Barashko

Scientific advisor: Serge Nekrashevich

Abstract

  Introduction
  1. Relevance of the topic
  2. The purpose and objectives of the study
  3. Expected results
  4. Overview of Semantic Web technologies
  4.1 The principles of constructing a model of RDF
  4.2 Using dictionaries: RDF Schema
  Conclusion
  List of sources

Introduction

Intelligent data management is currently one of the most pressing problems in the data storage industry. The growing volume of data that industrial systems accumulate during operation makes it harder to manage and analyze that data with traditional models.

The volume of digital information is projected to approach 1,800 exabytes in 2012, ten times the volume in 2006. At least 95% of this data is hard-to-manage unstructured data, such as e-mail, Word documents, and video, and 90% of it will never be read.

Almost none of the information on the Internet carries semantics, so finding information relevant to a user's query within a particular domain is a serious problem. To search and manage data effectively, a web application must clearly understand the semantics of the documents published on the network.

Relevance of the topic

Information plays a leading role in today's world. Sometimes it is well founded, sometimes speculative, and sometimes completely unfounded, but it always has a huge impact on the decisions people make. The total amount of information keeps growing, and new sources of it appear all the time. Obtaining and singling out the necessary information is becoming increasingly difficult.

The main problem is therefore making information processing more intelligent, along with related tasks such as data integration, high-quality search, and the integration of Web services. The Semantic Web approach offers effective tools for these tasks.

The purpose and objectives of the study

The aim of this work is to develop tools and methods for creating a repository of structured and semi-structured data based on a semantic representation, and for managing that data at a high level of abstraction.

To achieve this goal, the following tasks must be solved:

  • consider the purpose and classification of ontologies;
  • review Semantic Web technologies;
  • study the conceptual model of UML diagrams;
  • review and analyze existing software products for browsing web data;
  • study existing techniques for visualizing web data, based on tools that already exist;
  • study the methods and principles of creating Eclipse plug-ins;
  • implement the predictive browser model.

Expected results

The graphical part of the semantic browser tools will be implemented in Java using the object-oriented libraries of the extensible Eclipse IDE framework:

  • Plug-in Development Environment (PDE), tooling for extending the Eclipse platform;
  • SWT, a portable toolkit of graphical widgets;
  • GEF, a framework for displaying graphics.

The work should result in tools for viewing the internal structure of databases and semi-structured files and for performing conversions on objects based on the relationships between them, with support for different multiplicities (one-to-one, one-to-many) and relationship types.

The tools will include: 1) a modeling tool for designing a subject area (UML diagrams are the primary means of describing the subject area); 2) tools for binding data to elements of the conceptual model (so that data is managed through the corresponding concepts); and 3) the browser itself, for viewing concepts, the links associated with them, data samples, and so on.

The implemented tools will make it possible to work with information at a qualitatively different level, and the semantic browser can serve as a basis for building information and analytical systems.

Overview of Semantic Web technologies

The main idea of the Semantic Web is to make information transmitted on the Web more formalized and easier for machines to process, in particular so that it can be identified and classified. According to the authors of the Semantic Web, this can be achieved by introducing metadata that accompanies any piece of information and describes its origin, format, and many other properties, which should radically simplify finding and processing information on the Web.

Based on open standards, Semantic Web technologies make it possible to describe and supply meaningful information (semantics) for arbitrary data, in particular document content or application code. Saying that a machine understands the semantics of a document means not only that it interprets the set of characters in the document, but also that it understands the meaning of the document as a whole. The main Semantic Web technologies are:

  • a global naming scheme (URI);
  • a model for describing data (RDF);
  • a language for describing vocabularies (RDFS);
  • means of describing relationships between data objects (ontologies and OWL, the language for describing them).

A key element of Semantic Web technologies is a system for uniquely identifying objects. A URI (Uniform Resource Identifier) is the identifier of an object (resource) in the global network. Every element of a schema or of a semantic network data model must have its own unique address (URI). There are currently two types of identifiers.

1. A Uniform Resource Locator (URL) is a URI that, in addition to identifying a resource, indicates how to access it by describing the access mechanism or the resource's location in the network.

2. A Uniform Resource Name (URN) is a URI that identifies a resource by name within a namespace. This makes it possible to refer to the resource without using information about its location.
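The distinction between the two kinds of identifier can be illustrated with a minimal Python sketch. It is not part of the planned Java tooling; the function name `classify_uri` and the sample identifiers are assumptions chosen for illustration, and the rule used (a `urn` scheme means URN, a scheme plus a network location means URL) is a deliberate simplification.

```python
from urllib.parse import urlparse


def classify_uri(uri: str) -> str:
    """Roughly classify an identifier as 'URN', 'URL', or 'other URI'.

    Simplified rule (an assumption for this sketch): the 'urn' scheme marks
    a name, while a scheme together with a network location marks a locator.
    """
    parts = urlparse(uri)
    if parts.scheme.lower() == "urn":
        return "URN"
    if parts.scheme and parts.netloc:
        return "URL"
    return "other URI"


# A locator says where the resource is and how to reach it.
print(classify_uri("http://www.w3.org/RDF/"))
# A name identifies the resource inside a namespace (here: isbn).
print(classify_uri("urn:isbn:0451450523"))
```

The URL carries an access scheme and a host, whereas the URN only names the resource within the `isbn` namespace and says nothing about where a copy can be found.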

The second basic component of the Semantic Web is a data model, the Resource Description Framework (RDF), which makes it possible to combine information from arbitrary sources. RDF is most useful for exchanging information whose meaning must be interpreted in the same way by different software agents. What is specific to the RDF data model is that resources and properties are identified by global identifiers (URIs). RDF describes a subject area in terms of resources, resource properties, and property values. RDF data can be viewed as a set of statements, each consisting of a subject, a predicate, and an object, and represented as a directed graph formed by these statements.

The next level in the pyramid of Semantic Web technologies is RDF Schema (RDFS), a language for describing RDF vocabularies of terms. RDFS provides the foundation for richer languages for describing domain ontologies, which allow systems adapted to the Web to perform logical and semantic processing. RDF Schema provides a type system for the Semantic Web: it makes it possible to define classes of resources and properties as vocabulary elements and, in particular, to specify which properties can be used with which classes [17].
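As a rough illustration of statements forming a directed graph, the following Python sketch stores a few triples and groups them by subject. The `http://example.org/` namespace and all resource names are hypothetical, and a real system would normally rely on an RDF library rather than plain tuples.

```python
from collections import defaultdict

EX = "http://example.org/"  # hypothetical namespace, not from the source

# Each statement is a (subject, predicate, object) triple.
TRIPLES = [
    (EX + "report", EX + "author", EX + "alice"),
    (EX + "report", EX + "title", "Semantic browser"),
    (EX + "alice", EX + "memberOf", EX + "department"),
]


def build_graph(triples):
    """Map each subject (vertex) to its outgoing (predicate, object) arcs."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(TRIPLES)
for predicate, obj in graph[EX + "report"]:
    print(predicate, "->", obj)
```

Subjects and objects act as vertices and predicates as labeled arcs, so following `author` from the report leads to a vertex that has outgoing arcs of its own.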

The principles of constructing a model of RDF

The basic building block of the RDF data model is a statement, which is a triple: a resource, a named property, and the property's value. In RDF terminology, these three parts of a statement are called the subject, the predicate, and the object, respectively [5]. A resource here is anything that is described by means of RDF. It may be an ordinary Web page or any part of one, for example a single HTML element. A resource may also be a collection of pages, for example a whole Web site. Finally, a resource may be something that is not directly accessible through the Internet, for example an arbitrary real-world object.

In RDF, a property is an aspect, characteristic, attribute, or relation used to describe a resource. Each property has its own specific meaning, permitted values, types of resources to which it can be applied, and relationships with other properties. To keep property names unique, they follow the URI concept; that is, a property is itself a potential target for description by means of RDF, characterized separately from any existing resource and value.

Thus, each property in RDF is itself a resource and may have its own attributes. This turns the data model from a tree, which is what XML markup is, into a directed graph. The vertices of this graph are subjects and objects, and the arcs are named properties. Since a property, and likewise a whole statement, can in turn be the subject of another statement, graphs may be linear or nested; for example, we can express doubt about or acceptance of a claim, or indicate the source of the information [3].
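Nested statements of this kind can be sketched in Python by letting a statement appear as the subject of another one. This is a simplification: standard RDF expresses the same idea through its reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The `Statement` type, the `depth` helper, and the `example.org` names are hypothetical.

```python
from typing import NamedTuple, Union

EX = "http://example.org/"  # hypothetical namespace


class Statement(NamedTuple):
    """A triple whose subject or object may itself be a statement."""
    subject: Union[str, "Statement"]
    predicate: str
    obj: Union[str, "Statement"]


def depth(node) -> int:
    """Nesting depth: 0 for a plain identifier, 1 for a flat statement."""
    if isinstance(node, Statement):
        return 1 + max(depth(node.subject), depth(node.obj))
    return 0


# A flat claim, and a second statement that names the source of that claim.
claim = Statement(EX + "report", EX + "author", EX + "alice")
provenance = Statement(claim, EX + "statedBy", EX + "library-catalog")

print(depth(claim), depth(provenance))
```

The second statement does not repeat the claim; it points at it, which is what makes it possible to record who asserted something without asserting it oneself.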

One of the universally applicable properties is "type", which belongs to the namespace defined by the RDF specification itself. It specifies the class of the described resource. This could be a car, a person, a book, and so on, or a sequence of objects (to express this there is a special value, "Seq", which also belongs to the RDF namespace). According to the specification [4], the value of a property can be of one of two types.

The first is a resource identified by a URI. The second is a literal, a text value with certain characteristics. A literal can express a value of any primitive data type available in XML. Its text can also contain markup, for example XML, but the distinguishing feature of such markup is that the RDF processor does not handle it and treats it as an ordinary string [17].
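The two kinds of property value can be sketched with two small Python classes. The class names `Resource` and `Literal` and the helper `describe` are assumptions made for this illustration; the `rdf:` and `xsd:` namespace URIs are the standard ones.

```python
from dataclasses import dataclass
from typing import Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_NS = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Resource:
    """A value identified by a URI."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A text value, optionally tagged with a datatype URI."""
    text: str
    datatype: Optional[str] = None


def describe(value) -> str:
    """Render a property value so the two kinds are easy to tell apart."""
    if isinstance(value, Resource):
        return f"resource <{value.uri}>"
    return f'literal "{value.text}"'


# The value of rdf:type is a resource, here the rdf:Seq class.
print(describe(Resource(RDF_NS + "Seq")))
# A typed literal: the text "42" read as an XML Schema integer.
print(describe(Literal("42", XSD_NS + "integer")))
# Markup inside a literal is kept as an ordinary string and not interpreted.
print(describe(Literal("<b>semantic</b> browser", RDF_NS + "XMLLiteral")))
```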

Using dictionaries: RDF Schema

The data model by itself is only a skeleton. For descriptions to acquire meaning, vocabularies must be used; these are defined with a complementary technology, RDF Schema, which plays the same role for RDF that a schema plays for XML. A vocabulary is a set of resources used as properties to describe other resources, the classes of resources that can be described with those properties, and restrictions on property values or sets of permitted values. Classes can be related by the "subclass" relation, and similarly properties can be related by the "subproperty" relation [6].

A data model built with the appropriate vocabularies gives a meaningful description of resources, but this is still not sufficient for machines to understand the Web. Just as one person cannot pass knowledge to another when both speak the same language but use different vocabularies, programs cannot use descriptions of facts until a common vocabulary for describing those facts has been developed.

The real value of RDF cannot be appreciated while it is used only for the internal purposes of a single application. The benefits of RDF appear when it becomes a means of collaboration and data exchange between programs, when machines gain the ability to combine information obtained from various sources and thereby derive new information.

The more applications on the Internet are able to work with the data, the more valuable the data becomes [5]. At the same time, RDF is well suited to representing the data itself, its structure, and its relationships. Thus, with specially developed RDF Schema vocabularies (as a means of describing ontologies), Semantic Web technology can be used to express information about particular areas of knowledge in a form that a variety of Internet applications can understand correctly [17].
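The "subclass" and "subproperty" relations are transitive, which a short Python sketch can make concrete. The vocabulary below (`Student`, `Person`, `Agent`, `supervisor`, `knows`) is hypothetical, and the helper `ancestors` is an illustrative assumption rather than part of any RDF Schema tool.

```python
# Hypothetical vocabulary: each term maps to its direct parents.
SUBCLASS_OF = {"Student": {"Person"}, "Person": {"Agent"}}
SUBPROPERTY_OF = {"supervisor": {"knows"}}


def ancestors(term, parents):
    """Return every term reachable by following parent links transitively."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:  # also protects against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


# A Student is also a Person and, through Person, an Agent.
print(sorted(ancestors("Student", SUBCLASS_OF)))
# Anything stated with "supervisor" also holds for the broader "knows".
print(sorted(ancestors("supervisor", SUBPROPERTY_OF)))
```

The same traversal serves both relations, because in each case a vocabulary only records the direct links and leaves the rest to be derived.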
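A small Python sketch shows how combining statements from two sources can yield an answer that neither source contains on its own. The `ex:` names and the helper `follow` are hypothetical, and set union is only a stand-in for a real RDF graph merge.

```python
# Two independent sources, each a set of (subject, predicate, object) triples.
source_a = {("ex:report", "ex:author", "ex:alice")}
source_b = {("ex:alice", "ex:worksAt", "ex:university")}

# Because both sources use the same identifiers, merging is a set union.
merged = source_a | source_b


def follow(graph, start, *predicates):
    """Follow a chain of predicates from `start`; return the nodes reached."""
    current = {start}
    for predicate in predicates:
        current = {o for (s, p, o) in graph if s in current and p == predicate}
    return current


# Where does the author of the report work?
print(follow(source_a, "ex:report", "ex:author", "ex:worksAt"))  # empty set
print(follow(merged, "ex:report", "ex:author", "ex:worksAt"))
```

The merge works only because both sources refer to the author by the same identifier, which is the practical reason for global URIs and shared vocabularies.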

Conclusion

In this paper we examined the basic classification of ontologies and their description languages. We reviewed Semantic Web technologies, the principles of constructing an RDF model, and the use of vocabularies with RDF Schema. We also studied methods of developing applications for the extensible Eclipse IDE framework, and the GEF (Graphical Editing Framework) library for implementing the graphical part of the tools.

List of sources

  1. Official Eclipse IDE website [Electronic resource]. – Access mode: http://www.eclipse.org/
  2. Clayberg E. Eclipse Plug-ins / Eric Clayberg, Dan Rubel. – Addison-Wesley, 2005. – 852 p.
  3. Official Plug-in Development Environment (PDE) website [Electronic resource]. – Access mode: http://www.eclipse.org/pde/
  4. JAR file specification [Electronic resource]. – Access mode: http://java.sun.com/javase/6/docs/technotes/guides/jar/jar.html
  5. Gamma E. Eclipse. Plug-ins. Third edition / Eric Gamma. – Addison Wesley, 2008. – 633 p.
  6. Daum B. Professional Eclipse 3 for Java Developers / Berthold Daum. – Wrox, 2009. – 548 p.
  7. Valcarsel C. Eclipse 3.0 Kick Start / Carlos Valcarsel. – Sams, 2005. – 389 p.
  8. Eclipse Plugin Development Tutorial [Electronic resource]. – Access mode: http://www.eclipsepluginsite.com/
  9. The Eclipse project [Electronic resource]. – Access mode: http://www.rsdn.ru/article/devtools/eclipse.xml#EDB
  10. Eclipse: a development environment [Electronic resource]. – Access mode: http://www.eclipsepluginsite.com/
  11. JAR files [Electronic resource]. – Access mode: http://ru.wikipedia.org/wiki/JAR
  12. Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Recommendation [Electronic resource]. – Access mode: http://www.w3.org/TR/rdf-concepts/
  13. RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation [Electronic resource]. – Access mode: http://www.w3.org/TR/rdf-schema/
  14. Resource Description Framework (RDF) [Electronic resource]. – Access mode: http://www.w3.org/RDF/
  15. Melnik S. Semantic Web: the roles of XML and RDF. Open Systems / S. Melnik, S. Decker. – Kogito-Center, 2001. – 96 p.
  16. Noy N. Ontology Development: A Guide to Creating Your First Ontology / N. Noy, D. McGuinness. – Stanford Knowledge Systems Laboratory, KSL-01-05, 2001. – 78 p.
  17. Sergeev E.V. Master's thesis "Implementation of a Semantic Approach to Building a Thematic Rubricator of Information Resources". – Moscow, 2007.
  18. CLAIM scientific and educational cluster. Classification of ontologies.

Notice

At the time of writing this abstract, the master's thesis is not yet complete. Estimated completion date: December 2012. The full text of the thesis and related materials can be obtained from the author or his supervisor only after that date.