Abstract of the graduation work
Study of a precedent-based data mining method for predicting meteorological parameters
Relevance
The need to anticipate likely future scenarios in Ukraine has never been as important as it is now. Decisions made today are based on the observed signs of a phenomenon and, in turn, affect to a greater or lesser extent how that phenomenon develops in the future. That is why studying models of time series prediction under a lack of information will help avoid fundamental errors in decision-making. The study of this problem is important for both theory and practice.
General formulation of the problem
At the present level of development of information technologies, and of decision support systems in particular, there are two directions in the development of knowledge inference:
- development of rule-based logical inference systems;
- development of precedent-based logical inference systems.
Virtually all early expert systems modeled an expert's decision-making as a purely deductive process using rule-based logical inference. This meant that the system was loaded with a set of rules of the form "if ... then ...", by which a conclusion on a particular question was derived from the input data. This model was the basis for the first generation of expert systems, which were convenient enough for both developers and expert users. Over time, however, it became clear that the deductive model reproduces one of the rarer approaches an expert takes to solving problems.
In fact, instead of solving every problem from first principles, an expert often examines the situation as a whole and recalls what decisions were taken earlier in similar situations. The expert then either uses these solutions directly or adapts them to the changed circumstances of the specific problem.
Modeling this approach to problem solving, which relies on experience of past situations, led to the technology of precedent-based inference (Case-Based Reasoning, or CBR), and later to software products that implement this technology.
In some situations, precedent-based inference has significant advantages over rule-based inference, and it is especially effective when:
- the main source of knowledge about the problem is experience, not theory;
- solutions are not unique to a particular situation and can be reused in other cases;
- the aim is not a guaranteed correct solution but the best possible one.
Thus, precedent-based inference is a method of building expert systems that draw conclusions about a given problem or situation by searching for analogies stored in a database of precedents.
Precedent-based inference systems show very good results on a wide variety of problems, but they have some significant drawbacks.
First, they generally do not create any models or rules that generalize previous experience. The choice of a solution is based on the entire array of available historical data, so it is impossible to say which specific factors a precedent-based system relied on to produce a particular answer.
There are two main problems faced by such systems: finding the most suitable precedent and then adapting the solution found.
All approaches to selecting precedents rest on some way of measuring how close a precedent is to the current case. Such a measurement yields the numerical value of a proximity measure, which determines the set of precedents that must be processed to achieve a satisfactory classification or prediction. The main disadvantage of such systems is the arbitrariness they allow in choosing the proximity measure. In addition, applying a single overall proximity measure to the entire data sample appears unjustified.
Another drawback of the method is associated with constructing the relevant precedents and assigning weights to attributes, which reduces its applicability (universality).
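The effect of an arbitrarily chosen proximity measure can be illustrated with a minimal sketch. It assumes a weighted Euclidean distance, which is only one of many possible measures; the function names, the two attributes (temperature and pressure), and all numeric values are hypothetical and chosen purely for illustration.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two attribute vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def nearest_precedent(case_base, query, weights):
    """Return the name of the stored precedent closest to the query."""
    return min(
        case_base,
        key=lambda name: weighted_distance(case_base[name], query, weights),
    )


# Hypothetical precedents: (temperature in deg C, pressure in hPa).
cases = {"A": (20.0, 1000.0), "B": (15.0, 1012.0)}
query = (16.0, 1001.0)

# Equal weights: the pressure difference dominates, so "A" is nearest.
print(nearest_precedent(cases, query, (1.0, 1.0)))   # prints A
# Down-weighting pressure makes temperature decisive, so "B" is nearest.
print(nearest_precedent(cases, query, (1.0, 0.01)))  # prints B
```

With the same precedents and the same query, changing only the attribute weights changes which precedent is retrieved, which is the arbitrariness described above.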
In most cases, methods of searching for precedents reduce to the induction of decision trees or to "nearest neighbor" algorithms, possibly supplemented by the use of domain knowledge. As for adapting and applying the solution found, this problem is still not formalized and depends strongly on the domain.
Both problems, finding precedents and adapting the chosen solution, are solved (in whole or in part) with the help of background knowledge, in other words, domain knowledge. There are different ways of obtaining information about the domain:
- eliciting expert knowledge. It can be expressed, for example, as restrictions on the ranges within which object attributes may vary, or as a set of rules for splitting the base of precedents into classes;
- extracting the required knowledge from the available data with data mining techniques. This includes all methods for identifying relationships in data, in particular clustering, regression, and association search. Data mining can narrow down the set of indicators that determine the characteristics of interest to the researcher and can present the discovered patterns in analytical form;
- building knowledge from a training sample provided by an expert (supervised learning). This method combines the first two.
Originally, the sources of background knowledge for precedent-based inference systems were experts (highly qualified specialists in the subject area), textual materials ranging from textbooks to protocols, and, of course, databases. The role of the expert was verbalization, that is, translating such sources into an explicit form. Since the most important task in formalizing knowledge extraction is to minimize the role of the expert, that role should be taken over by data mining tools.
Among the patterns extracted in practice, the most frequent are equivalence relations and order relations. The former are characteristic, in particular, of classification, diagnosis, and pattern recognition problems. The latter are inherent in problems of scaling, forecasting, and so on.
Idea of the algorithm
In outline, the algorithm involves the following steps:
- input the time series for a variable;
- choose the "current date" and the number k of nearest neighbors;
- compute the distance from the value at the current date to the values at previous dates;
- sort the distances in ascending order;
- select the first k distances after sorting;
- form an array of the values that correspond to them;
- adapt and analyze these values, for example by finding their arithmetic mean, minimum, or maximum;
- output the resulting value as the forecast.
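The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation developed in the work: the distance is assumed to be the absolute difference between single values, and the value associated with each neighbor is assumed to be the observation that immediately followed it. The function name knn_forecast, its parameters, and the sample series are hypothetical.

```python
from statistics import mean


def knn_forecast(series, current, k, adapt=mean):
    """Forecast the value that follows position `current` in `series`.

    `adapt` is the adaptation step: mean, min, max, or any other
    function that reduces a list of numbers to one number.
    """
    if not 0 < current < len(series):
        raise ValueError("current must index a value that has predecessors")
    if not 1 <= k <= current:
        raise ValueError("k must be between 1 and the number of earlier values")

    target = series[current]
    # Distance from the current value to every earlier value.
    distances = [(abs(series[i] - target), i) for i in range(current)]
    # Ascending sort by distance (ties resolved by earlier position).
    distances.sort()
    # Keep the k nearest neighbors.
    nearest = distances[:k]
    # Assumption: the precedent's "solution" is the value that followed it.
    successors = [series[i + 1] for _, i in nearest]
    # Adaptation: reduce the collected values to a single forecast.
    return adapt(successors)


# Hypothetical daily temperatures; the last entry is the "current date".
temps = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 12.0]
print(knn_forecast(temps, current=6, k=2))             # prints 12.5
print(knn_forecast(temps, current=6, k=2, adapt=max))  # prints 14.0
```

Passing the adaptation function as a parameter keeps the search for neighbors separate from the way their values are combined, so the mean, minimum, and maximum variants share the same code path.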
Figure 1 shows a diagram of the modules of the system being developed.
Figure 1 - Animated diagram of the modules of the system being developed (animation size: 58.8 KB; number of frames: 5; number of cycles: 5)
Objective of the algorithm
To predict the value of the time series for a future time period based on the patterns and relationships identified in a database containing the values (measurements) of the series in previous periods.
Conclusion
The advantage of the developed algorithm is that it can be used to predict dynamic indicators and factors from any field of knowledge and human activity. Examples include forecasting meteorological parameters and weather conditions, the dynamics of stock prices and exchange rates, consumer demand, lending for the next reporting period, crop yields, and much more.