DonNTU   Masters' portal

Abstract

Introduction

This master's work addresses a topical scientific problem: satisfying user requests through data analysis. Algorithms of this kind are also called recommendation algorithms.

The main objective of recommendation algorithms is to analyze users' requests and, based on them, predict users' actions.

1. Relevance of the topic

Many services that sell goods or content face the long tail problem. This problem resembles the Pareto distribution (the 80/20 ratio) shown in Figure 1. For example, 20% of the product types in an online store generate 80% of its revenue. Customers mostly buy popular products and know nothing about the remaining 80% of goods, which are not popular. Recommendation algorithms were created to solve this problem: they personalize offers based on user data [1].


Figure 1. Long tail problem.
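To make the 80/20 ratio concrete, the sketch below computes what share of total revenue comes from the top 20% of product types. The revenue figures are invented purely for illustration, and the function name top_share is hypothetical.

```python
def top_share(revenues: list[float], top_fraction: float = 0.2) -> float:
    """Share of total revenue produced by the top `top_fraction` of items."""
    total = sum(revenues)
    if total == 0:
        return 0.0
    ranked = sorted(revenues, reverse=True)
    # Size of the "head" of the distribution; always at least one item.
    k = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:k]) / total


# Hypothetical revenue per product type in a small store (10 types).
revenues = [400, 400, 50, 40, 30, 25, 20, 15, 12, 8]
print(f"{top_share(revenues):.0%}")  # → 80%
```

The remaining eight product types form the "long tail" that a recommendation algorithm tries to bring to customers' attention.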

This area faces a number of open issues:

  • Privacy. Access to users' private data must be secured. In particular, it must be impossible to identify a user from the output of the algorithm [2];
  • Malicious attacks. The algorithm must be able to defend against attacks that feed it false input data in order to reduce its effectiveness [3];
  • Importance of different user actions. By viewing or buying a product, a user shows interest in it, but these actions carry different weight: for example, a purchase is much more important than a simple view [4];
  • Sociality. The active development of social networks makes it possible to learn about the interests of a user's friends, not only of the individual user [5];
  • Mobile platforms. The development of mobile operating systems has made it possible to collect more information about users and has significantly increased the amount of data about the user's current context [6];
  • Optimization. New technologies make it possible to collect large amounts of information. Because of this, algorithms must be optimized to use less memory, CPU time, etc. [7];
  • Merging algorithms. Over 15 years of development, a large number of algorithms have been created in this field; they use different source data about users and recommended objects and take different approaches to prediction. The challenge is to merge several methods into one hybrid approach that gives more accurate predictions [8].
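The point about the different importance of user actions can be illustrated by turning raw events into implicit interest scores. This is a minimal sketch: the three action types and their weights (view = 1, add to cart = 3, purchase = 10) are hypothetical and would have to be tuned on real data.

```python
from collections import defaultdict

# Hypothetical weights: a purchase signals far more interest than a view.
ACTION_WEIGHTS = {"view": 1.0, "add_to_cart": 3.0, "purchase": 10.0}


def implicit_scores(events):
    """Aggregate (user, item, action) events into per-user, per-item interest scores."""
    scores = defaultdict(lambda: defaultdict(float))
    for user, item, action in events:
        # Unknown action types contribute nothing.
        scores[user][item] += ACTION_WEIGHTS.get(action, 0.0)
    return {user: dict(items) for user, items in scores.items()}


events = [
    ("u1", "phone", "view"),
    ("u1", "phone", "view"),
    ("u1", "case", "purchase"),
    ("u2", "phone", "purchase"),
]
print(implicit_scores(events))
# → {'u1': {'phone': 2.0, 'case': 10.0}, 'u2': {'phone': 10.0}}
```

Two views of the phone still count for less than a single purchase of the case, which is the asymmetry described in the list above.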

Recommendation algorithms are currently developing rapidly because it has become possible to collect large amounts of information about users, ranging from a user's current location to the list of purchases in a store [9].

Many regularly held competitions also stimulate new research in this area.

2. Goal and tasks of the research

The main task is to develop an algorithm that predicts which mobile applications will interest a user, based on the applications already installed on the user's device. To improve the resulting recommendations, the algorithm should also take into account the properties of applications and user feedback.
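A very simple baseline for this task is item-to-item co-occurrence ("users who installed X also installed Y"). The sketch below only illustrates that baseline and is not the algorithm being developed in this work; the app names and the functions co_install_counts and recommend are hypothetical, and app properties and feedback are ignored.

```python
from collections import Counter
from itertools import permutations


def co_install_counts(devices):
    """Count how often each ordered pair of apps is installed on the same device."""
    counts = Counter()
    for apps in devices:
        for a, b in permutations(set(apps), 2):
            counts[(a, b)] += 1
    return counts


def recommend(installed, counts, k=3):
    """Rank apps not yet installed by how often they co-occur with installed ones."""
    installed = set(installed)
    scores = Counter()
    for (a, b), n in counts.items():
        if a in installed and b not in installed:
            scores[b] += n
    # Sort by score (descending), then by name for a deterministic order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [app for app, _ in ranked[:k]]


# Hypothetical sets of installed apps on four devices.
devices = [
    {"maps", "taxi", "weather"},
    {"maps", "taxi"},
    {"maps", "weather", "news"},
    {"chess", "news"},
]
counts = co_install_counts(devices)
print(recommend({"maps"}, counts))  # → ['taxi', 'weather', 'news']
```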

Additional objectives:

  • Analysis of approaches to building systems that satisfy user requests;
  • Evaluation of recommendation algorithms;
  • Study of technologies for analyzing large data sets;
  • Combining multiple recommendation algorithms into one hybrid algorithm;
  • Optimization of algorithm performance.

Research object: systems that satisfy user requests.

Subject of research: combining different recommendation algorithms to satisfy user requests when recommending mobile applications.

The master's work aims to obtain new scientific results in the following areas:

  1. Optimization of algorithms based on user activity for mobile application recommendations.
  2. Optimization of algorithms based on object properties for mobile application recommendations.
  3. Optimization of knowledge-based algorithms for mobile application recommendations.
  4. Optimization of combining different approaches into a hybrid algorithm.

To evaluate the theoretical results experimentally and to lay a foundation for further research, the planned practical result is a recommendation system for mobile applications with the following properties:

  • Information about the predicted objects is available (genre, category, etc.);
  • Information about the users and their data set is available;
  • Predicted objects are rated by users on a scale from 0 to 5;
  • The algorithm should indicate which data most strongly influenced a prediction, so that it is easier for the user to make a choice;
  • The algorithm should work in real time and deliver predictions to the user within a few seconds;
  • The algorithm must take user feedback into account (whether the user liked a prediction, which information matters for predictions, etc.);
  • The algorithm has to collect new data and feedback from users during prediction and use this data to update the base distance matrix (not necessarily in real time, for example once a day).
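The text does not specify what the "base distance matrix" contains. Assuming, for illustration only, that it is an item-to-item cosine similarity matrix over 0 to 5 ratings, a batch job could rebuild it periodically, and predictions could then be read from it cheaply at request time. All data and function names below are hypothetical.

```python
import math

# ratings[user][app] on the 0..5 scale (hypothetical data).
ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5},
    "u3": {"a": 1, "c": 5},
}


def item_similarity(ratings):
    """Cosine similarity between item rating vectors; a missing rating counts as 0.

    Simplification: an explicit 0 rating is therefore treated like no rating.
    Meant to be rebuilt by a batch job (e.g. once a day), not on every request.
    """
    items = sorted({item for r in ratings.values() for item in r})
    norms = {i: math.sqrt(sum(r.get(i, 0) ** 2 for r in ratings.values())) for i in items}
    sim = {}
    for x in items:
        for y in items:
            if x != y and norms[x] and norms[y]:
                dot = sum(r.get(x, 0) * r.get(y, 0) for r in ratings.values())
                sim[(x, y)] = dot / (norms[x] * norms[y])
    return sim


def predict(user, item, ratings, sim):
    """Similarity-weighted average of the user's ratings of other items."""
    num = den = 0.0
    for other, r in ratings[user].items():
        s = sim.get((item, other), 0.0)
        if s > 0:  # only positively similar items contribute
            num += s * r
            den += s
    return num / den if den else None  # None: nothing to base a prediction on


sim = item_similarity(ratings)  # e.g. the nightly batch job
print(round(predict("u3", "b", ratings, sim), 2))  # → 1.45
```

The prediction is low because app b is most similar to app a, which u3 rated 1. Keeping item_similarity out of the request path is one way to meet the few-second response requirement.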

3. Approaches to building systems that satisfy user requests

3.1 User-based algorithms.

Such a system uses only information about user actions. The basic idea: if users U1 and U2 both bought book B1, and user U2 also bought book B2, then we can predict that book B2 will also interest user U1.

Note that these systems do not take into account any knowledge or data about the properties of users and recommended objects. That is, it does not matter what genre the purchased book belongs to, nor that the user has stated in his profile an interest in books by a particular author.
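The U1/U2 example can be written as a small user-based collaborative filtering sketch. Jaccard similarity between purchase sets is one common choice and is an assumption here, as are the function names. As described above, the code looks only at who bought what, never at genres or profile data.

```python
def jaccard(a, b):
    """Overlap between two sets of purchased items."""
    return len(a & b) / len(a | b) if a | b else 0.0


def user_based_recommend(target, purchases, k=5):
    """Score items bought by similar users but not yet by `target`."""
    mine = purchases[target]
    scores = {}
    for other, theirs in purchases.items():
        if other == target:
            continue
        s = jaccard(mine, theirs)
        if s == 0:
            continue
        # Each similar user "votes" for the items the target has not bought yet.
        for item in theirs - mine:
            scores[item] = scores.get(item, 0.0) + s
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:k]


# The example from the text: U1 and U2 both bought B1, U2 also bought B2.
purchases = {"U1": {"B1"}, "U2": {"B1", "B2"}, "U3": {"B3"}}
print(user_based_recommend("U1", purchases))  # → ['B2']
```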

3.2 Item-based algorithms.

Systems of this type do not use data about user actions; they analyze only the properties of the recommended objects and of the current user.

These properties can be entered manually (for example, the user specifies which book genres interest him, or the site administrator enters information about a book: genre, author, year of publication, etc.), extracted automatically from the recommended items (for example, by analyzing the length of book texts and dividing them into groups: short, medium, long), or discovered as new features by applying clustering algorithms to the raw data.
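A minimal sketch of this property-based approach represents each book and the user's profile as sparse feature vectors and ranks books by cosine similarity. The feature names, length thresholds and data are hypothetical, and clustering-derived features are left out for brevity.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse feature vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def length_group(pages):
    """Automatically derived feature; the thresholds are hypothetical."""
    if pages < 150:
        return "length:short"
    return "length:medium" if pages < 400 else "length:long"


# Features entered manually (genre, author) plus one derived feature (length).
books = {
    "B1": {"genre:sci-fi": 1.0, "author:Lem": 1.0, length_group(220): 1.0},
    "B2": {"genre:sci-fi": 1.0, "author:Asimov": 1.0, length_group(600): 1.0},
    "B3": {"genre:romance": 1.0, length_group(120): 1.0},
}
# Profile built from what the user stated: interested in sci-fi by Lem.
profile = {"genre:sci-fi": 1.0, "author:Lem": 1.0}

ranked = sorted(books, key=lambda b: cosine(profile, books[b]), reverse=True)
print(ranked)  # → ['B1', 'B2', 'B3']
```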

3.3 Knowledge-based algorithms.

Systems of this group use knowledge about the domain of the recommended products. They are very similar to systems based on the properties of users and recommended objects, and some sources treat them as a subtype of property-based systems.

The basic principle of these systems is active interaction with the user (receiving feedback) combined with knowledge of the recommendation domain. This type of system works best when types 1 and 2 cannot be applied because there is too little data. An example is a camera recommendation system. As a rule, people buy a new camera only once every few years, so an individual store does not have enough data to make recommendations based on user actions. Cameras also have many characteristics, which differ in importance for different users. For photographers who work in a studio, photo quality matters most, whereas for photographers who shoot on the street and lead an active lifestyle, the camera's weight, portability, memory size and battery life are also important.

A recommendation system of this type interacts actively with the user but does not ask the user to enter all parameters manually. In the camera example, the system asks the user where the shooting takes place and whether the user prefers portrait or landscape shooting. Based on the answers, the system identifies the characteristics most important to the user and their optimal values. The initial knowledge about the recommendation domain is supplied by the system administrator.
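The camera example could be sketched as follows: administrator-supplied rules turn the user's answers into attribute weights, and cameras are scored against those weights. Every rule, attribute and camera here is invented for illustration; a real system would need a much richer knowledge base and would also suggest optimal attribute values, which this sketch omits.

```python
# Domain knowledge supplied by the administrator (hypothetical rules):
# each answer to a question raises the weight of some camera attributes.
RULES = {
    ("where", "studio"): {"image_quality": 3.0},
    ("where", "street"): {"image_quality": 1.0, "low_weight": 2.0, "battery": 2.0},
    ("style", "portrait"): {"image_quality": 1.0},
    ("style", "landscape"): {"memory": 1.0, "battery": 1.0},
}

# Attribute scores normalized to 0..1 (hypothetical catalogue).
CAMERAS = {
    "studio_pro": {"image_quality": 1.0, "low_weight": 0.1, "battery": 0.5, "memory": 0.6},
    "travel_zoom": {"image_quality": 0.6, "low_weight": 0.9, "battery": 0.9, "memory": 0.8},
}


def attribute_weights(answers):
    """Translate the user's answers into importance weights per attribute."""
    weights = {}
    for question, answer in answers.items():
        for attr, w in RULES.get((question, answer), {}).items():
            weights[attr] = weights.get(attr, 0.0) + w
    return weights


def best_camera(answers):
    """Pick the camera with the highest weighted attribute score."""
    weights = attribute_weights(answers)
    return max(CAMERAS, key=lambda c: sum(w * CAMERAS[c].get(a, 0.0) for a, w in weights.items()))


print(best_camera({"where": "studio", "style": "portrait"}))  # → studio_pro
print(best_camera({"where": "street", "style": "landscape"}))  # → travel_zoom
```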

3.4 Hybrid algorithms.

The systems above use different approaches to data analysis and prediction, each addressing the same key task in its own way. In practice, no single method can recommend products or content with high accuracy. Therefore, real recommendation systems always combine several approaches to analysis and recommendation.

Hybrid systems merge these approaches into a single algorithm and achieve the highest efficiency. They are also the most difficult to design and implement. The challenge is to adapt the algorithms to the specific application of the recommender system. For example, in domains where every user rates the proposed content (such as a movie catalog with ratings), the analysis of ratings and user actions is the most important part. In a recommendation system for an online shop, the administrator's main goal is to increase profit, so the recommendation service must take into account product prices and the need to sell particular goods.
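One simple way to combine approaches is a weighted hybrid: each sub-recommender's scores are rescaled to a common range and summed with weights, and for an online shop a business term such as item margin can be added. The weights, the margin term and all names are hypothetical; in practice they would be tuned for the specific application, as the paragraph above stresses.

```python
def minmax(scores):
    """Rescale one recommender's scores to 0..1 so they can be combined."""
    lo, hi = min(scores.values()), max(scores.values())
    return {k: (v - lo) / (hi - lo) if hi > lo else 1.0 for k, v in scores.items()}


def hybrid(sources, weights, margin=None, margin_weight=0.0):
    """Weighted sum of normalized scores, optionally boosted by item margin."""
    combined = {}
    for name, scores in sources.items():
        for item, s in minmax(scores).items():
            combined[item] = combined.get(item, 0.0) + weights[name] * s
    if margin:
        for item in combined:
            combined[item] += margin_weight * margin.get(item, 0.0)
    return sorted(combined, key=lambda i: (-combined[i], i))


sources = {
    "collaborative": {"x": 4.5, "y": 2.5, "z": 1.0},  # e.g. predicted ratings
    "content": {"x": 0.2, "y": 0.9, "z": 0.5},  # e.g. profile similarity
}
weights = {"collaborative": 0.7, "content": 0.3}
print(hybrid(sources, weights))  # → ['x', 'y', 'z']
print(hybrid(sources, weights, margin={"y": 0.5}, margin_weight=0.4))  # → ['y', 'x', 'z']
```

The second call shows how a shop's interest in selling a high-margin item can change the final ranking without touching the sub-recommenders.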

3.5 Comparison of algorithms.

A comparison of the algorithms is presented in Table 1.

| Algorithm / Property | "Cold start" for objects | "Cold start" for users | Recommends unpopular objects | Takes individuality into account | Uses knowledge | Uses object properties | Uses user actions | Explanation | Requires administrator |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. User-based algorithms | + | - | - | +/- | - | - | + | +/- | - |
| 2. Item-based algorithms | - | +/- | + | + | - | + | - | + | +/- |
| 3. Knowledge-based algorithms | - | + | + | + | + | - | - | + | + |
| 4. Hybrid algorithms | + | + | + | +/- | + | + | + | +/- | + |

Table 1. Comparing recommendation algorithms.

Structure of an abstract recommendation system

The structure of an abstract recommendation service is shown in Figure 2.


Figure 2. Abstract recommendation service structure.

(Animation Size: 27 Kb; Number of frames: 9; Number of cycles: 5)

Conclusion

There are several ways to implement a recommendation algorithm. However, no single method gives good results on its own. Therefore, in practice it is always necessary to use a hybrid system that combines several approaches.

At the first stage, the recommendation algorithm will consist of several sub-modules that work independently of each other. Therefore, these sub-modules can be treated as independent algorithms and developed separately.
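The planned modular structure might look like the sketch below: every sub-module implements the same hypothetical score interface, so each can be developed and tested as a stand-alone algorithm. The class names and data are illustrative assumptions, and how the outputs are merged is deliberately left to a separate stage.

```python
from typing import Protocol


class SubModule(Protocol):
    """Common interface every independently developed sub-module implements."""

    def score(self, user: str, candidates: list[str]) -> dict[str, float]: ...


class Popularity:
    """Sub-module 1: scores apps by global install count."""

    def __init__(self, installs: dict[str, int]):
        self.installs = installs

    def score(self, user: str, candidates: list[str]) -> dict[str, float]:
        return {c: float(self.installs.get(c, 0)) for c in candidates}


class CategoryMatch:
    """Sub-module 2: favours apps from categories the user already uses."""

    def __init__(self, app_category: dict[str, str], user_categories: dict[str, set[str]]):
        self.app_category = app_category
        self.user_categories = user_categories

    def score(self, user: str, candidates: list[str]) -> dict[str, float]:
        liked = self.user_categories.get(user, set())
        return {c: 1.0 if self.app_category.get(c) in liked else 0.0 for c in candidates}


def run_all(modules: dict[str, SubModule], user: str, candidates: list[str]):
    """Each module runs on its own; combining the outputs is a later stage."""
    return {name: m.score(user, candidates) for name, m in modules.items()}


modules = {
    "popularity": Popularity({"maps": 900, "chess": 50}),
    "category": CategoryMatch({"maps": "travel", "chess": "games"}, {"u1": {"games"}}),
}
print(run_all(modules, "u1", ["maps", "chess"]))
```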

The most difficult part of developing the algorithm is adapting generic algorithms to the recommendation context and combining the modules into a single whole. The context in which the algorithm is applied is the main criterion for selecting both the prediction algorithms and the metrics used to measure their performance.
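As an illustration of how context drives the choice of metric: if users rate apps from 0 to 5, a rating-prediction error such as RMSE is natural, whereas if the system shows a short list of apps, precision at k may matter more. Both functions below are the standard definitions written as a minimal sketch; which metrics this work will actually use is not stated in the text.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error for rating predictions (e.g. on the 0..5 scale)."""
    pairs = [(predicted[k], actual[k]) for k in actual if k in predicted]
    if not pairs:
        raise ValueError("no overlapping predictions")
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations the user actually found relevant."""
    top = recommended[:k]
    return sum(1 for item in top if item in relevant) / k


print(rmse({"a": 4.0, "b": 2.0}, {"a": 5, "b": 2}))  # sqrt(0.5), about 0.707
print(precision_at_k(["x", "y", "z"], {"x", "z"}, 2))  # → 0.5
```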

Before creating the algorithm, it is necessary to study the context of its application and to examine the available data set with statistical tools (R, Weka, etc.). This helps determine the most effective algorithms for a given situation.

Open source frameworks can be used to implement the algorithm, which significantly reduces development time. When choosing a framework and programming language, the most important factor is the context in which the recommendation algorithm will be applied. Frameworks provide basic implementations of algorithms that can be adapted to the context. However, frameworks also impose significant restrictions on the structure of the algorithm. If complete freedom in designing the algorithm is needed, it is best to write the recommendation system without frameworks.


It is also important to explain the results produced by the recommendation algorithm. To build the user's trust in the algorithm, it is desirable to show which data a given result was based on. For example, the user may be offered a choice of two films: the first prediction is based on the feature films the user has reviewed, and the second on documentaries. The user can then choose which kind of film is more interesting at the moment.
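A minimal way to produce such explanations is to keep, for every recommended item, the item from the user's history that contributed most to its score. The sketch assumes a precomputed similarity lookup and a hypothetical explain function; the titles and similarity values are invented.

```python
def explain(candidates, history, sim, top_n=2):
    """Recommend unseen items and name the history item that contributed most.

    `sim[(candidate, seen)]` is a precomputed similarity (hypothetical input).
    """
    results = []
    for cand in candidates:
        if cand in history:
            continue
        contributions = {seen: sim.get((cand, seen), 0.0) for seen in history}
        if not contributions:
            continue
        reason = max(contributions, key=contributions.get)
        total = sum(contributions.values())
        if total > 0:
            results.append((total, cand, reason))
    results.sort(key=lambda t: (-t[0], t[1]))
    return [f"{cand}: because you watched {reason}" for _, cand, reason in results[:top_n]]


history = ["Alien", "Planet Earth"]
sim = {
    ("Aliens", "Alien"): 0.9,
    ("Aliens", "Planet Earth"): 0.1,
    ("Blue Planet", "Planet Earth"): 0.8,
    ("Blue Planet", "Alien"): 0.05,
}
for line in explain(["Aliens", "Blue Planet"], history, sim):
    print(line)
# → Aliens: because you watched Alien
# → Blue Planet: because you watched Planet Earth
```

Showing the reason next to each recommendation lets the user pick the kind of film he wants right now, as in the example above.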

References

  1. Leskovec J., Rajaraman A., Ullman J. Mining of Massive Datasets // The MOOC. – 2014. [URL]: http://www.mmds.org/
  2. Dean J., Ghemawat S. MapReduce: Simplified Data Processing on Large Clusters // Google, Inc. – 2004. [URL]: http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf
  3. Aranda J., Givoni I., Handcock J., Tarlow D. An Online Social Network-based Recommendation System // University of Toronto. – 2007. [URL]: http://www.cs.toronto.edu/syslab/courses/csc2231/07au/projects/2/aranda.pdf
  4. Koren Y. Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model // Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). – 2008. [URL]: http://research.yahoo.com/files/kdd08koren.pdf
  5. Hahsler M. recommenderlab: A Framework for Developing and Testing Recommendation Algorithms // Southern Methodist University. – 2008. [URL]: http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
  6. Salakhutdinov R., Mnih A., Hinton G. Restricted Boltzmann Machines for Collaborative Filtering // Proceedings of the 24th International Conference on Machine Learning (ICML). – 2007.
  7. Silva N.B. et al. A graph-based friend recommendation system using Genetic Algorithm // IEEE Congress on Evolutionary Computation. – 2010.
  8. Gong S. Learning User Interest Model for Content-based Filtering in Personalized Recommendation System // Zhejiang Business Technology Institute. – 2010.
  9. Husain W., Dih L.Y. A Framework of a Personalized Location-based Traveler Recommendation System in Mobile Application // Universiti Sains Malaysia. – 2012.

Notes

At the time this abstract was written, the master's work was not yet complete. Final completion: December 2015. The full text and materials on the topic can be obtained from the author or his scientific adviser after that date.