Abstract

Contents

Introduction

1 Problem Statement

2 Forming Recommendations and Preferences

3 Analysis of Google Query Tools

4 Personalization of Information

Conclusions

References

Introduction

Most search, social, news, and advertising services on the Internet now try to attract people with simple, easy-to-use resources, so that users can find what interests them as quickly as possible. Personalization of information has become one of the tools for this. Suppose a user has a favorite page or runs a blog on a social network. If the site has a personalization algorithm, everything displayed in the RSS feed or in advertising newsletters will match that user's interests and hobbies. According to developers, Google among them, personalization offers many advantages. Chief among them is the speed of information retrieval: almost a single click. Looked at in more detail, the search engine itself uses semantic analysis to decide what to show and what to hide, because information about the Internet resources visited from a specific IP address is stored in the search engine's memory. Thanks to this automatic selection, the user receives the information they need first. Many may still wonder how an online resource can "know" what someone needs. Modern information technology makes this possible. Today any user who opens a search engine and types in a key phrase is likely to say, "What could be easier!" Any site found this way can be added to the browser bookmarks and opened directly, without searching again.

1 Problem Statement

The aim of this work is to study a student model for computer-based training systems built from search history, and to analyze systems that use queries to gather information about the user. To achieve this aim, material on heuristic algorithms and methods of information analysis was studied, and the most popular systems that rely on the analysis of user queries were examined.

2 Forming Recommendations and Preferences

Probably every Internet user has encountered recommender systems without realizing it, for example when shopping on sites such as Amazon. Amazon tracks the buying habits of all its visitors and, when a user returns to the site, uses the collected information to offer products that might interest them. Amazon may even suggest movies the user might like, although the user has previously bought only books. Some sites that sell concert tickets analyze which events the user has attended before and announce upcoming concerts that may interest them. Sites like reddit.com let users vote for links to other sites and then, based on those votes, offer other links that may be of interest. Yandex provides statistics on search queries. These statistics can be broken down in various ways, from the user's geographical location to age. For example, a travel agency can see in which city, in which month, and who most often searches for vacations in, say, "Turkey", and then advertise in that region. Data taken only from users, with virtually no algorithmic processing, can thus yield very valuable information that is available to a wide audience. These examples show that preference information can be collected in various ways. Sometimes the data are the goods a visitor bought, and opinions about those goods are expressed as a "yes/no" vote or as a rating on a scale; sometimes it is just the words entered into the search box. All this information benefits both the user and the system itself.

Such analysis makes the following possible:

Collection of preference information;
Finding similar users;
Selection of information;
Selection and filtering by similarity and amount of information;
Forecasting;
Formation of a thematic focus.

And this is only a small part of the possibilities that the analysis of user actions provides.
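One of the options listed above, finding similar users, can be illustrated with a minimal sketch. It assumes that preferences are stored as numeric ratings per item and uses cosine similarity, which is one common measure; the source does not say which measure the surveyed systems use. The names `cosine_similarity` and `most_similar` and the sample ratings are hypothetical.

```python
from math import sqrt


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse rating vectors keyed by item."""
    shared = set(a) & set(b)
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no ratings is similar to nobody
    return dot / (norm_a * norm_b)


def most_similar(target: str, ratings: dict) -> list:
    """Return (user, similarity) pairs for all other users, best match first."""
    pairs = [
        (user, cosine_similarity(ratings[target], user_ratings))
        for user, user_ratings in ratings.items()
        if user != target
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data: user -> {item: rating}
ratings = {
    "alice": {"book_a": 5, "book_b": 3, "film_c": 4},
    "bob": {"book_a": 4, "book_b": 3, "film_c": 5},
    "carol": {"film_d": 5, "book_b": 1},
}
```

With this data, `most_similar("alice", ratings)` places `bob` first, because the two users rate the same items in a similar way.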

3 Analysis of Google Query Tools

Hardly anyone would question the convenience and superiority of specialized keyword research tools and services over the scarce information that search engines provide directly. Nevertheless, search engines are usually the primary source, and all the various applications merely process their results. In addition, the "raw" information from search engines can be quite interesting and informative, especially in the case of Google. Unlike Yandex, Google gives more interesting query statistics, which are self-sufficient and need no "wrapper". The only problem is that these "branded services" do not support keyword analysis on an "industrial" scale. But even without the most advanced keyword analysis applications, anyone can quickly assess the potential of a niche and determine an approximate range of keywords. Google offers three services: Google Trends, Google KeywordTool (AdWords), and Google InSights. Google InSights, although long out of beta, is for some reason not very well known. In essence it is the familiar Google Trends, only somewhat more informative than its predecessor (see Fig. 3.1). Although Google Insights for Search and Google Trends use the same data, Insights is intended mostly for users (researchers or advertisers) who can benefit from the advanced features of this service.

Figure 3.1 - Search statistics for a given word according to Google InSights

Why is this service needed if it does not show the exact number of possible clicks? It is primarily a marketing tool that indicates the level of interest in a concept. For example, the term "blogspot" was evaluated. Interest in the term is growing (both in Runet and globally) and the forecast is also encouraging, which means that a user will not lose out by starting a blog on this platform. The added value of the service is that it provides information for any geographical area, shows which news stories caused a surge of interest in the term, allows concepts to be compared, and so on. In general, before starting a new project one should first study the niche, at least with Google InSights.

Google KeywordTool, in contrast to Insights, is a keyword tool for AdWords campaigns, but it is also quite good for SEO. It can be used to compile at least an indicative list of keywords, although this requires more than one run with different keywords. Targeting and other options are also available. KeywordTool is also useful for exploring the potential of niches in contextual advertising (specifically AdSense), because the user can see the average CPC and the number of competitors (that is, ads on AdWords).

4 Personalization of Information

When a recommender system operates on a large amount of content, its main task becomes filtering and ranking that content. Take news: hundreds of thousands of articles are published every day, and thousands of them may touch on the interests of any given reader. Yet most people read no more than 5-10 articles per day. The task, therefore, is to show the right information first. To solve this problem, articles arriving from the Internet are analyzed to extract additional information:

The system recognizes named entities in the text, such as the main participants in the events described (people, companies, brands) and the places where the events occur. For this, an algorithm was implemented that is based on a grammar approach to finding entity patterns in text.
The system classifies news using several different approaches. Articles are assigned to popular headings, such as sports, business, or politics, with the support vector machine method.
Smaller and narrower topics in the text are isolated with a simple rule-based classification.
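The rule-based step for narrow topics could look roughly like the following sketch. The rule table, the tag names, and the function `tag_article` are illustrative assumptions; the source does not describe the actual rules.

```python
import re

# Hypothetical rules: tag -> regular expressions that signal the topic.
RULES = {
    "Cloud Computing": [r"\bcloud\b", r"\bsaas\b"],
    "Startups": [r"\bstartups?\b", r"\bseed round\b"],
    "iPhone": [r"\biphone\b"],
}


def tag_article(text: str, rules: dict = RULES) -> set:
    """Return every tag for which at least one pattern matches the text."""
    lowered = text.lower()
    return {
        tag
        for tag, patterns in rules.items()
        if any(re.search(pattern, lowered) for pattern in patterns)
    }
```

A call such as `tag_article("A startup moves its backend to the cloud")` would attach both the `Startups` and the `Cloud Computing` tags. Rules of this kind are easy to audit and extend, which is why they suit narrow topics for which little training data exists.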

For simplicity, named entities, themes, categories, and all other knowledge about an article are called article tags. The system expresses the user's interests in the form of the same tags, either by analyzing the articles the user likes or when the user explicitly states their interests (see Fig. 4.1).

Figure 4.1 - Query personalization system

To further optimize the news feed, the system groups articles from different sources about the same event, so that the user does not see repeats in the main feed but, once immersed in a story, can choose which point of view to read. This clustering of content is done by a special graph-based mechanism. When the user reads an article, the system determines what the user likes most about it. In this way the system is trained for each user, forming the user's "portrait", and uses this portrait to choose the news that, in its view, is most interesting to the user. A weight is the system's confidence that a topic will be interesting to the user (see Table 1). This weight is calculated from how actively the user "interacts" with the topic.

Table 1 - Category weights relative to user preferences
Category Name	Weight
Cloud Computing	0.95
API	0.72
Steve Jobs	0.62
Microsoft	0.44
Facebook	0.40
iPhone	0.24
Startups	0.18
Manu Ginobili	0.17
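
How such weights might be derived from user activity can be sketched as follows. The source only states that a weight reflects how actively the user interacts with a topic; the event types, their scores, and the normalization by the strongest topic are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical scores: how strongly each kind of interaction signals interest.
EVENT_SCORE = {"view": 1.0, "like": 3.0, "share": 5.0}


def build_portrait(events: list) -> dict:
    """Turn (tag, event_type) pairs into tag weights in the range 0..1.

    Weights are scaled so that the most active tag receives 1.0.
    Unknown event types contribute nothing.
    """
    raw = defaultdict(float)
    for tag, event_type in events:
        raw[tag] += EVENT_SCORE.get(event_type, 0.0)
    top = max(raw.values(), default=0.0)
    if top == 0.0:
        return {}
    return {tag: round(score / top, 2) for tag, score in raw.items()}
```

Under these assumptions, a user who shared and liked a "Cloud Computing" article and liked and viewed an "API" article would receive weights of 1.0 and 0.5 for the two tags.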

This approach spares the user news that does not interest them, but given the present abundance of content it does not guarantee that the user learns about all the most important events in their areas of interest (see Fig. 4.2); that is, it does not solve the problem of information overload.

Figure 4.2 - Filtering by user preferences (animation: 6 frames, 761x298, 149 KB)

That is, the system filters content by users' interests. Introducing the notion of "the importance of a news item for the user" brings in a comparative characteristic (some news may be more important to the user, other news less), which makes it necessary to rank the news by this characteristic individually for each user. This technique is called "content-based recommendation" and is widely used in various products, such as the recommender system of imdb.com. For each document a set of attributes is extracted, and each attribute is weighted relative to the user to determine how important the news item may be for that user. For example, the following parameters can be used:

Freshness of the content.
Number of the news item's tags that are present in the user's portrait.
Likelihood that the user likes the news item's relevant tags (the weight from Table 1).
Resonance: the number of sources that covered the news, i.e. the number of sources whose articles belong to the current cluster.
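
The four parameters above can be combined into a single score, as in the following sketch. The scaling of each criterion to the range 0..1, the 24-hour half-life, the cap of 10 sources, and the plain average are assumptions chosen for illustration; the source names the parameters but not the formula. A criterion that cannot be computed is marked `None` and filled with either the average of the other criteria or 0.

```python
from statistics import mean
from typing import Optional


def criteria_for(article: dict, portrait: dict,
                 half_life_hours: float = 24.0, resonance_cap: int = 10) -> dict:
    """Compute the four criteria, each scaled to 0..1 (None if unavailable)."""
    tags = article["tags"]
    matched = [tag for tag in tags if tag in portrait]
    return {
        # Freshness halves every `half_life_hours` (assumed decay law).
        "freshness": 0.5 ** (article["age_hours"] / half_life_hours),
        # Share of the article's tags that occur in the user's portrait.
        "tag_overlap": len(matched) / len(tags) if tags else None,
        # Average portrait weight of the matched tags.
        "tag_affinity": mean(portrait[tag] for tag in matched) if matched else None,
        # Number of sources in the cluster, capped and scaled.
        "resonance": min(article["sources"] / resonance_cap, 1.0),
    }


def score_article(criteria: dict, missing: str = "average") -> float:
    """Average the criteria; fill missing ones with the mean of the rest or 0."""
    present = [value for value in criteria.values() if value is not None]
    if not present:
        return 0.0
    fill: Optional[float] = mean(present) if missing == "average" else 0.0
    return mean(value if value is not None else fill for value in criteria.values())
```

For instance, a day-old article tagged "API" and "Microsoft", covered by five sources, would be scored against the weights 0.72 and 0.44 from Table 1 by calling `score_article(criteria_for(article, portrait))`.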

By evaluating articles in this way against the user's identified preferences, it can be determined which articles the system will choose. The more evaluation criteria there are, the more likely it is that the user will receive interesting information. If some criterion is absent for an article, it can be replaced by the average score over all criteria or assigned a score of 0. Ranking clusters brings three advantages:

1) The resulting ranking immediately yields a feed that can be shown to the user.

2) There are fewer items to rank (a cluster contains many articles at once), so the necessary work is done faster.

3) At no additional cost we obtain a parameter such as the resonance of an event (i.e. how many sources have written about it).

But this approach has a problem that led us to move away from ranking clusters and begin ranking individual articles. The problem is that many of the attributes chosen for a cluster cannot be matched against those of the user.

For example, if a cluster contains five articles, its resonance is taken as 5, but that does not mean that all five articles are of interest to the user. That is, when a particular cluster is ranked for a particular user, each parameter must take the user's interests into account. In this case resonance is computed as the number of articles in the cluster that are of interest to the user (that match the user's interests), instead of the total number of articles in the cluster.
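
The difference between total and user-specific resonance can be shown in a few lines. This is a sketch that assumes an article "matches the user's interests" when it shares at least one tag with the user's portrait; the function name `user_resonance` and the sample cluster are hypothetical.

```python
def user_resonance(cluster: list, interests: set) -> int:
    """Count the articles in a cluster that share a tag with the user's interests.

    `cluster` is a list of articles, each given as a list of its tags.
    """
    return sum(1 for article_tags in cluster if interests & set(article_tags))


# Hypothetical cluster of five articles about the same event.
cluster = [
    ["Microsoft", "Cloud Computing"],
    ["Microsoft", "Finance"],
    ["Finance"],
    ["Cloud Computing"],
    ["Politics"],
]
interests = {"Cloud Computing"}
```

Here the total resonance is `len(cluster)`, which is 5, while `user_resonance(cluster, interests)` is 2, because only two of the five articles carry a tag the user cares about.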

At the same time, the user must be shown stories (clusters), not individual articles: first, because the user does not want to see several different articles about the same thing in the feed, even if they come from different sources; second, because ranking necessarily requires a parameter such as the resonance of an event.

A system in which articles are ranked, while event resonance is still taken into account and stories are shown to the user, appears more promising than one in which only articles are ranked.

In addition to using the tag weights from the user's portrait, the system can also weigh various article parameters differently for different tags. The parameters are the article date, the number of sources, the amount of text, the index of influence in social networks, and similar article attributes. For example, a small amount of text is bad for an analytical article with the tag Politics, but exactly the same amount is acceptable for a photoblog. Thus one and the same article will have different weights for different tags. After normalization by ranking functions developed within the system, these parameters are aggregated into the article's weight for a tag.

Treating the user's profile as a desire to see articles with particular tags, the article's weights are then aggregated over the tags present in the user's portrait, yielding the final total weight of the article relative to the user.
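
The two aggregation steps can be sketched as follows. The source does not give the aggregation functions, so a weighted average per tag and a weighted sum over portrait tags are assumed here; the parameter names, coefficients, and function names are hypothetical, and the parameters are taken to be already normalized to 0..1.

```python
def article_tag_weight(params: dict, coefficients: dict) -> float:
    """Weighted average of normalized article parameters for one tag."""
    total = sum(coefficients.values())
    if total == 0:
        return 0.0
    return sum(coefficients[name] * params[name] for name in coefficients) / total


def final_weight(article_tags: list, params: dict,
                 portrait: dict, per_tag_coefficients: dict) -> float:
    """Sum the per-tag article weights, scaled by the user's interest in each tag."""
    return sum(
        portrait[tag] * article_tag_weight(params, per_tag_coefficients[tag])
        for tag in article_tags
        if tag in portrait and tag in per_tag_coefficients
    )


# Hypothetical coefficients: text volume matters for Politics, not for Photoblog.
per_tag_coefficients = {
    "Politics": {"freshness": 1.0, "text_volume": 3.0},
    "Photoblog": {"freshness": 1.0, "text_volume": 0.0},
}
params = {"freshness": 0.8, "text_volume": 0.2}
```

With these numbers the same short article weighs 0.35 under Politics and 0.8 under Photoblog, which mirrors the example in the text: little text hurts an analytical article but not a photoblog.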

Conclusions

Personalization systems and methods for analyzing user queries were analyzed, and the considerable potential and benefits of these systems were identified. Many kinds of user data analysis exist today; some work well, while others still need improvement.

Many systems already use information about users of which the users themselves are unaware. This is a great resource that brings convenience to users and material gain and popularity to the services.

Most systems aim to provide popular information that interests the user, but very few try to predict which other subjects would interest the user. Such a system would not lock the user into their existing preferences but would keep offering new information that will interest the user, yet that the user does not know about or has never searched for.

References

1. Сегаран Т. Программируем коллективный разум. – Пер. с англ. – СПб: Символ-Плюс, 2008. – С. 368.
2. В.А. Лексин Персонализация контента на основе оценок сходства пользователей и ресурсов сети интернет. - 49-я научная конференция МФТИ.
3. Система персонализации News360: ранжирование кластеров информации [Электронный ресурс] Режим доступа: http://habrahabr.ru/post/191528/
4. Traboulsi, H. N. (2006). Named entity recognition: A local grammar-based approach. PhD thesis, Department of Computing, School of Electronics and Physical Sciences, University of Surrey, Guildford, Surrey, U.K. Retrieved from: scribd.com
5. Boser, Bernhard E.; Guyon, Isabelle M.; and Vapnik, Vladimir N.; A training algorithm for optimal margin classifiers. In Haussler, David (editor); 5th Annual ACM Workshop on COLT, pages 144–152, Pittsburgh, PA, 1992. ACM Press. Retrieved from: citeseer.ist.psu.edu
6. Chang, C., & Lin, C. (n.d.). Libsvm — a library for support vector machines.
7. Дмитрий Ночевнов. Методы и средства сегментации пользователей web-сайтов
8. Kornfein, M. M., Goldfarb, H. (2007, July). In M.M. Kornfein (Chair). A comparison of classification techniques for technical text passages. WCE 2007, London, U.K. Retrieved from: citeseerx.ist.psu.edu
9. Мини проект «Vizitator» — дознаватель пользовательских предпочтений [Электронный ресурс] Режим доступа:http://habrahabr.ru/post/46784/
10. Анализ данных и процессов: учеб. пособие / А. А. Барсегян, М. С. Куприянов, И. И. Холод, М. Д. Тесс, С. И. Елизаров. – 3-е изд., перераб. и доп. – СПб.: БХВ-Петербург, 2009. – С. 512.

Ignatov Philip

Faculty of computer science and technology (CST)

Department of computer engineering (CE)

Speciality "Software systems"

Construction and study of the student model for computer-based training systems based on search history

Scientific adviser: Ph.D., Professor Anatoliy Ivanovich Shevchenko
