Koshelyeva Viktoriya Andriivna
Materials on the theme of final work:
Abstract | Library | References |
Report on the search | Individual work |
Clustering is one of the main approaches in Data Mining. It is used to partition large volumes of data into groups (clusters) such that the elements within each cluster are more similar to one another than to the elements of other clusters. In general, all clustering methods can be divided into hierarchical and non-hierarchical. Non-hierarchical methods are mostly used in the analysis of large amounts of data, because they are faster. [8]

Cluster analysis reveals previously unknown regularities in the data, which are virtually impossible to find in other ways, and presents them in a user-friendly form. Cluster analysis methods are used both as independent tools and as part of other Data Mining techniques (for example, neural networks). Cluster analysis is used to process large amounts of data, from 10 thousand to millions of records, each of which may contain hundreds of attributes, and it is widely used in pattern recognition, finance, insurance, demographics, trade, marketing research, medicine, chemistry, biology, etc.

To date, a large number of clustering techniques have been developed for numerical data. For non-numerical (categorical) data there are far fewer generally accepted methods. (ROCK, DBSCAN, BIRCH, CP, CURE, etc.) Processing of mixed-type data currently causes great difficulty and remains an area of research.

In general, all phases of cluster analysis are interrelated, and the decisions taken at one phase determine the actions at subsequent phases. [9]

Deciding whether to use all the observations or to exclude certain data or samples from the data set.
Choice of the metric and of the method for standardizing the initial data.
Determination of the number of clusters (for iterative cluster analysis).
Determination of the clustering method (rules of association or linkage).
According to many experts, the choice of clustering method is decisive in determining the shape and the specifics of the clusters.
Analysis of the clustering results. This phase addresses the following questions: whether the resulting partition into clusters is random; whether the partition is reliable and stable on sub-samples of the data; whether there is a relationship between the clustering results and variables that were not involved in the clustering process; and whether the clustering results can be interpreted.
Testing of the clustering results. Clustering results should also be tested by formal and informal methods. Formal methods depend on the method that was used for clustering. Informal methods include the following procedures for checking the quality of clustering:
One of the ways of checking clustering quality is to apply several methods and compare the results. A lack of similarity does not mean that the results are incorrect, but the presence of similar groups is considered a sign of clustering quality.

Like any other method, cluster analysis has certain weaknesses, that is, difficulties, problems and constraints. It is important to take into account that the clustering results depend on the criteria used to partition the original data. Reducing the dimension of the data may introduce some distortion. Some individual characteristics of objects may be lost because of generalization. There are a number of complications that should be considered before clustering.
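The quality check described above, applying several methods and comparing the results, can be sketched as follows. This is only a minimal illustration under stated assumptions, not a method taken from the cited sources: the toy data set (age and income), the function names (standardize, kmeans, single_linkage, rand_index), the choice of k-means as the non-hierarchical method and single-linkage agglomeration as the hierarchical one, and the use of the Rand index as the similarity measure are all chosen for the example.

```python
import math
from itertools import combinations

def standardize(rows):
    """Z-score every column so that no attribute dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def kmeans(points, k, iters=100):
    """Non-hierarchical method (assumed here): k-means with a deterministic start."""
    # Farthest-point initialisation: start from the first point, then repeatedly
    # add the point that is farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append([sum(c) / len(members) for c in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:  # assignments can no longer change
            break
        centers = new_centers
    return labels

def single_linkage(points, k):
    """Hierarchical method (assumed here): merge the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: min(math.dist(points[i], points[j])
                                      for i in clusters[ab[0]]
                                      for j in clusters[ab[1]]))
        clusters[a] += clusters.pop(b)  # a < b, so index a stays valid
    labels = [0] * len(points)
    for lab, members in enumerate(clusters):
        for i in members:
            labels[i] = lab
    return labels

def rand_index(labels_a, labels_b):
    """Share of object pairs on which two partitions agree (1.0 = identical)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)

# Hypothetical records: [age, yearly income]; the scales differ, hence standardization.
data = [[25, 30000], [27, 32000], [24, 29000],
        [52, 81000], [55, 85000], [50, 79000]]
scaled = standardize(data)
km = kmeans(scaled, k=2)
hc = single_linkage(scaled, k=2)
agreement = rand_index(km, hc)
print(f"Rand index between the two partitions: {agreement:.2f}")
```

On this well-separated toy data both methods find the same two groups, so the index equals 1. On real data a lower value would not by itself show that either partition is wrong; it would only indicate that the methods disagree and that the result deserves a closer look.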