Songjie Gong
Zhejiang Business Technology Institute, Ningbo 315012, China
E-mail: eizolz@163.com
Original: http://www.aicit.org/JDCTA/ppl/JDCTA%20Vol6%20No11_part20.pdf
Abstract
With the emergence and evolution of networks, the amount of information on the Internet has increased greatly, and retrieving useful information from this large volume has become a key technology in the information field. Personalized recommendation has effectively improved Internet services, especially e-commerce services. Traditional search engines do not take individual users' interests into account, so the results they retrieve cannot satisfy users' specific needs. To address this problem, this paper presents a personalized recommendation system that employs a user interest model for content-based filtering. The system is analyzed in terms of five components: document information extraction, document vector representation, user interest model representation, matching algorithms, and user feedback update. The system can describe both the type and the degree of a user's interests, and can improve the efficiency of personalized information services.
Keywords: Personalized Service, Information Retrieval, Information Filtering, Recommender System, Content-Based Filtering, User Interest Model
1. Introduction
With the popularization of the Internet and the development of e-commerce, e-commerce systems have become more complex as they offer users more and more choices [1,2,3]. Recommender systems can alleviate this information overload [4,5,6], and many personalized recommendation systems have been proposed in a variety of fields. Two main techniques are usually adopted in personalized recommendation systems: content-based filtering and collaborative filtering [7,8,9].
Traditional search engines do not take individual users' interests into account, so the results they retrieve cannot satisfy users' specific needs [10,11]. To address this problem, this paper presents a personalized recommendation system that employs a user interest model for content-based filtering. The system is analyzed in terms of five components: document information extraction, document vector representation, user interest model representation, matching algorithms, and user feedback update. The system can describe both the type and the degree of a user's interests, and can improve the efficiency of personalized information services.
2. Content-based Filtering
There exist two main approaches in information filtering: collaborative and content-based. In collaborative filtering, the system selects and rank-orders items for a user based on the similarity of the user to other users who read/liked similar items in the past. In content-based filtering, the system selects and rank-orders items based on the similarity of the user's profile and the items' profiles.
2.1. Framework of content-based filtering
Content-based filtering has five parts: document information extraction, document vector representation, user interest model representation, matching algorithms, and user feedback update. Figure 1 shows the framework of content-based filtering.
Figure 1. Framework of content-based filtering
3. Extracting features of items
Representing a text as a feature vector first requires extracting feature items from it. Feature extraction selects, from all candidate vocabulary, the subset of feature items that represents the text most expressively. This serves two main purposes. First, it improves the efficiency of processing: operations are simplified and run faster. Second, words contribute very differently to the meaning of a text; across tens of thousands of texts, some common terms occur widely but contribute little to any particular text. To improve the accuracy of the recommendation system, weakly expressive vocabulary should therefore be removed and the optimal set of feature items selected [12,13,14,15].
The best feature items are the words that have the largest mutual information with the relevant text set rel(Q). The mutual information between a word and the relevant text set is calculated as follows:
$$\mathrm{MI}(w_i, \mathrm{rel}(Q)) = \log \frac{P(w_i \mid w_i \in \mathrm{rel}(Q))}{P(w_i)}$$
where $w_i$ is the i-th word in the text, $P(w_i \mid w_i \in \mathrm{rel}(Q))$ is the proportion of occurrences of $w_i$ in the relevant text set rel(Q), and $P(w_i)$ is the proportion of occurrences of $w_i$ in the whole processed text collection.
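The selection rule above can be sketched in Python as follows. This is a minimal illustration, assuming documents are already tokenised into word lists and that the whole processed collection (`all_docs`) contains the relevant set rel(Q); the function names `mutual_information` and `select_features`, and keeping the top `k` words as the cut-off, are hypothetical choices for illustration.

```python
import math
from collections import Counter


def mutual_information(relevant_docs, all_docs):
    """Score each word w in rel(Q) by log(P(w | w in rel(Q)) / P(w)).

    relevant_docs: token lists forming the relevant text set rel(Q).
    all_docs: token lists for the whole processed collection; assumed to
              contain rel(Q), so every scored word has a non-zero P(w).
    """
    rel_counts = Counter(w for doc in relevant_docs for w in doc)
    all_counts = Counter(w for doc in all_docs for w in doc)
    rel_total = sum(rel_counts.values())
    all_total = sum(all_counts.values())
    scores = {}
    for word, count in rel_counts.items():
        p_rel = count / rel_total               # P(w_i | w_i in rel(Q))
        p_all = all_counts[word] / all_total    # P(w_i) over the whole collection
        scores[word] = math.log(p_rel / p_all)
    return scores


def select_features(relevant_docs, all_docs, k):
    """Keep the k words with the highest mutual information (ties broken alphabetically)."""
    scores = mutual_information(relevant_docs, all_docs)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


relevant = [["apple", "pie"], ["apple", "tart"]]
collection = relevant + [["pie", "car"], ["car", "road"]]
print(select_features(relevant, collection, 2))  # ['apple', 'tart']
```

In this toy collection, "apple" and "tart" are concentrated in rel(Q) and score log 2, while "pie", which is equally frequent inside and outside rel(Q), scores 0 and is dropped.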
4. Item representation model
4.1. Vector space model
The traditional way to represent both information resources and user interests is the vector space model (VSM), a text representation model. In the VSM, the basic units are feature items (terms), and all terms together form the term set. Each item can be expressed as a vector whose dimension is the size of the term set; this size is generally not fixed, although a fixed size can also be specified. Because the frequency of a term in a document reflects, to some extent, the document's theme, each component of a document's feature vector is the frequency of the corresponding term in that document. In this way, every resource in the collection can be expressed as a vector over the term set [16,17].
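A minimal sketch of this representation follows, assuming documents are already tokenised and that the term set is ordered alphabetically (both are illustrative choices; `build_term_set` and `to_vector` are hypothetical names). Each component is the raw frequency of a term in the document, as described above.

```python
from collections import Counter


def build_term_set(docs):
    """Collect every distinct term in the collection; its order fixes the vector dimensions."""
    return sorted({term for doc in docs for term in doc})


def to_vector(doc, term_set):
    """Represent a tokenised document by the frequency of each term in the term set."""
    counts = Counter(doc)
    return [counts[term] for term in term_set]


docs = [["data", "mining", "data"], ["web", "mining"]]
term_set = build_term_set(docs)                   # ['data', 'mining', 'web']
vectors = [to_vector(d, term_set) for d in docs]
print(vectors)                                    # [[2, 1, 0], [0, 1, 1]]
```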
4.2. Probability model
The probability model first establishes a classification model for the domain and then computes, for every document and for the user's interests, a probability distribution over the classes. Representing documents and user interests as probability distributions better reflects the diversity of user interests and is easy to implement. The classification model is trained with a Bayesian method, and user interests and documents are represented in the same way.
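The paper does not specify which Bayesian classifier is used, so the sketch below assumes a multinomial naive Bayes model with add-one (Laplace) smoothing, one common choice. It trains on labelled token lists and returns a probability distribution over categories for any token list; following the principle that users and documents are represented the same way, a user's interests could be represented by applying `class_distribution` to the user's interest documents. All names and the toy categories are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(labelled_docs):
    """Train a multinomial naive Bayes model from (tokens, category) pairs."""
    doc_counts = Counter(category for _, category in labelled_docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, category in labelled_docs:
        word_counts[category].update(tokens)
        vocab.update(tokens)
    priors = {c: n / len(labelled_docs) for c, n in doc_counts.items()}
    return priors, word_counts, vocab


def class_distribution(tokens, model):
    """Return P(category | tokens) for every category, using add-one smoothing."""
    priors, word_counts, vocab = model
    log_scores = {}
    for category, prior in priors.items():
        denom = sum(word_counts[category].values()) + len(vocab)
        score = math.log(prior)
        for token in tokens:
            score += math.log((word_counts[category][token] + 1) / denom)
        log_scores[category] = score
    # Normalise in log space to avoid underflow on long documents.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


model = train_nb([
    (["goal", "match", "team"], "sports"),
    (["team", "score"], "sports"),
    (["stock", "market"], "finance"),
    (["market", "price"], "finance"),
])
print(class_distribution(["team", "goal"], model))  # sports gets about 0.83
```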
4.3. Improved probabilistic model
The vector space model can only express which keywords a user is interested in; it cannot distinguish between the user's different interests. The probability model can distinguish between interests and so captures their diversity, but it cannot express how strongly the user is interested in each of them. The improved probability model addresses both limitations: it expresses the user's interest keywords as well as the user's degree of interest.
5. User Interest Model
To compute the match between a user's interests and candidate documents, we first need computable representations of both the user's interests and the candidate documents [18,19,20]. We represent a candidate document with the classic VSM, so that a candidate document D can be expressed as follows:
$$D = (w_1, w_2, \ldots, w_m)$$
where $w_i$ is the weight of the i-th feature term in document D and m is the number of feature terms. We select words as feature items and use relative term frequency as the term weight. The relative term frequency of a word is given by the term-frequency (tf) component of the tf-idf scheme:
$$tf = \frac{n}{N}$$
where n is the number of times the word occurs in the document and N is the total number of words in that document.
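5.1. User Interest Model based on Interest Document Vector
In the simplest approach, user U is represented as the set of the user's interest documents, namely:
$$U = \{I_1, I_2, \ldots, I_n\}$$
The match between the user's interests and a candidate document can then be expressed as the sum of the matches between each interest document $I_i$ and the candidate document D, namely:
$$\mathrm{match}(U, D) = \sum_{i=1}^{n} \mathrm{sim}(I_i, D)$$
where n is the number of interest documents. The match between an interest document I and a candidate document D can be computed with the VSM similarity (cosine) formula, namely:
$$\mathrm{sim}(I, D) = \frac{\sum_{k=1}^{m} w_{I,k}\, w_{D,k}}{\sqrt{\sum_{k=1}^{m} w_{I,k}^2}\,\sqrt{\sum_{k=1}^{m} w_{D,k}^2}}$$
where $w_{I,k}$ and $w_{D,k}$ are the weights of the k-th feature term in I and D, respectively, and m is the number of feature terms.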
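The interest-document model can be sketched as follows, assuming every document has already been converted to a vector over one shared term set and taking the VSM similarity to be the cosine measure. The helper names (`cosine`, `relative_tf`, `match_interest_documents`) are hypothetical.

```python
import math


def cosine(a, b):
    """VSM cosine similarity between two equal-length weight vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relative_tf(counts):
    """Relative term frequency n / N for a vector of raw term counts."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def match_interest_documents(interest_docs, candidate):
    """match(U, D): sum of sim(I_i, D) over every interest document I_i of user U."""
    return sum(cosine(doc, candidate) for doc in interest_docs)


interest_docs = [relative_tf([3, 0, 1]), relative_tf([0, 2, 2])]
candidate = relative_tf([1, 1, 0])
score = match_interest_documents(interest_docs, candidate)
```

Because the cosine measure is invariant to scaling, using relative frequencies instead of raw counts does not change sim(I, D) for a single pair of documents. Note that the cost of storing and matching grows linearly with the number n of interest documents.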
5.2. User Interest Model based on Interest Vector
In this model, user U is represented by a single interest vector:
$$U = (u_1, u_2, \ldots, u_m)$$
where $u_i$ is the weight of the i-th feature term of user U.
In the user interest model U, the words contained in the model are selected as feature items, and relative term frequency is used as the term weight. We define the set K of words contained in U as follows:
Here I denotes an interest document included in the user's interest model.
The relative term frequency of word k is defined as follows:
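Once the user interest vector has been defined, the VSM similarity formula can be used to calculate the match between the user's interests and a candidate document D, as follows:
$$\mathrm{sim}(U, D) = \frac{\sum_{k=1}^{m} u_k\, w_k}{\sqrt{\sum_{k=1}^{m} u_k^2}\,\sqrt{\sum_{k=1}^{m} w_k^2}}$$
where $u_k$ and $w_k$ are the weights of the k-th feature term in U and D, respectively.
It can be seen that, for a user whose interest model contains n interest documents, this model reduces both the storage space and the matching time to 1/n of those of the interest-document model, without reducing the accuracy of matching.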
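Because the formula for the relative frequency of a word in U is not reproduced here, the sketch below assumes one natural reading: pool the raw counts of all interest documents and divide by the total number of pooled words. Under that assumption the n interest documents collapse into a single vector U, so only one vector is stored and one similarity (assumed to be the cosine measure) is computed per candidate. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """VSM cosine similarity between two equal-length weight vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def interest_vector(interest_counts):
    """Collapse n raw-count vectors (one per interest document) into one vector U.

    ASSUMPTION: u_k = occurrences of term k in all interest documents / total pooled words.
    """
    totals = [sum(column) for column in zip(*interest_counts)]
    grand_total = sum(totals)
    if grand_total == 0:
        return [0.0] * len(totals)
    return [t / grand_total for t in totals]


def match_interest_vector(user_vector, candidate):
    """sim(U, D) computed once against the single stored interest vector."""
    return cosine(user_vector, candidate)


U = interest_vector([[2, 0, 1], [0, 1, 1]])  # [0.4, 0.2, 0.4]
score = match_interest_vector(U, [1, 0, 1])
```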
5.3. User Interest Model based on Multi-Interest Vector
Because the user's query Q usually reflects the interest the user is currently concerned with, we extend the model so that each user maintains multiple interests. When matching a candidate document, we first compute the match between the query Q and each of the user's interests V. Only when this match exceeds a threshold L do we regard V as an interest relevant to the current query; in that case, the match between V and the candidate document D, weighted by the match between V and Q, is added to the overall match between user U and document D. The matching algorithm used in this model is as follows [21,22,23]:
$$\mathrm{match}(U, D) = \sum_{V \in U,\ \mathrm{sim}(V, Q) > L} \mathrm{sim}(V, Q)\,\mathrm{sim}(V, D)$$
where $\mathrm{sim}(\cdot,\cdot)$ denotes the VSM similarity between two vectors.
With this matching algorithm, the model can identify which of the user's interests the current query concerns and perform the matching according to those interests, so that the final results returned to the user reflect the user's current interests.
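The multi-interest matching rule described above can be sketched as follows, assuming each interest V, the query Q and the candidate document D are weight vectors over one term set, and that the VSM similarity is the cosine measure; the parameter `threshold` plays the role of L. The function names are hypothetical.

```python
import math


def cosine(a, b):
    """VSM cosine similarity between two equal-length weight vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_multi_interest(interests, query, candidate, threshold):
    """Sum sim(V, Q) * sim(V, D) over the interests V whose match with Q exceeds L."""
    total = 0.0
    for v in interests:
        weight = cosine(v, query)          # how relevant interest V is to the current query
        if weight > threshold:             # ignore interests unrelated to the query
            total += weight * cosine(v, candidate)
    return total


interests = [[1, 0], [0, 1]]               # e.g. "programming" and "climbing" interests
score = match_multi_interest(interests, query=[1, 0], candidate=[1, 1], threshold=0.5)
```

Here only the first interest passes the threshold, so the second interest's match with D is ignored even though D overlaps with it.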
5.4. User Interest Model based on Role
In real life, each user belongs to one or several roles. For example, if Joe Smith works as a programmer and his hobby is mountain climbing, then his roles are programmer and climber. If a user has some interest, the role to which the user belongs also includes that interest. Conversely, the interests associated with a role can be more accurate than those of an individual user, because users sometimes cannot express their interests accurately; the roles a user belongs to can therefore be used to correct the user's interests.
Let user U belong to the roles $R_1, R_2, \ldots, R_n$. The matching degree between a candidate document D and the user's interests is then calculated as follows:
where $P_1$ is the model's matching formula, and α and β are weight coefficients. A multi-level role model could also be defined, but in practical applications a single layer of roles is sufficient to identify user interests effectively.
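The combination formula itself is not reproduced in this section, so the following is only a hypothetical sketch. It assumes the score is α·P1(U, D) plus β times the average of P1(R_j, D) over the user's roles, and takes P1 to be the cosine interest-vector match. Averaging over roles (rather than summing) is our assumption, chosen so that users with many roles are not automatically favoured; all names are illustrative.

```python
import math


def cosine(a, b):
    """VSM cosine similarity, used here as the assumed matching formula P1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_with_roles(user_vec, role_vecs, candidate, alpha, beta):
    """HYPOTHETICAL: alpha * P1(U, D) + beta * mean over roles of P1(R_j, D)."""
    user_part = cosine(user_vec, candidate)
    if role_vecs:
        role_part = sum(cosine(r, candidate) for r in role_vecs) / len(role_vecs)
    else:
        role_part = 0.0                    # a user with no roles relies on P1(U, D) alone
    return alpha * user_part + beta * role_part


# Joe Smith: personal interest vector plus "programmer" and "climber" role vectors.
score = match_with_roles([1, 0, 0], [[1, 1, 0], [0, 0, 1]], [1, 0, 0], alpha=0.7, beta=0.3)
```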
6. Matching Algorithm
Similarity measures provide a metric of relevance between two vectors. A similarity measure can be used to weight the significance of ratings in a prediction algorithm and thereby improve accuracy [24,25,26].
Several similarity measures have been used in recommendation algorithms: Pearson correlation, cosine vector similarity, adjusted cosine vector similarity, mean-squared difference, and Spearman correlation.
Pearson’s correlation measures the linear correlation between two vectors of ratings.
The cosine measure looks at the angle between two vectors of ratings where a smaller angle is regarded as implying greater similarity.
The adjusted cosine is used in some filtering methods to take into account the difference in each user's use of the rating scale: each user's mean rating is subtracted from that user's ratings before the cosine is computed.
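The three measures described above can be sketched as follows. Pearson's correlation is computed as the cosine of the two mean-centred vectors, which is mathematically equivalent; the adjusted cosine follows the common item-based form, in which each user's mean rating is subtracted before taking the cosine between two item columns. The sketch assumes a dense rating matrix (every user has rated every item) and does not handle missing ratings.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pearson(a, b):
    """Pearson correlation: the cosine of the two mean-centred rating vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


def adjusted_cosine(ratings, i, j):
    """Similarity of items i and j after subtracting each user's mean rating.

    ratings: one row per user, one column per item (dense matrix assumed).
    """
    num = sum_i = sum_j = 0.0
    for row in ratings:
        mean = sum(row) / len(row)
        di, dj = row[i] - mean, row[j] - mean
        num += di * dj
        sum_i += di * di
        sum_j += dj * dj
    return num / math.sqrt(sum_i * sum_j) if sum_i and sum_j else 0.0


print(pearson([1, 2, 3], [3, 2, 1]))              # close to -1.0
print(adjusted_cosine([[5, 3], [4, 2]], 0, 1))    # -1.0
```

Note that plain cosine on raw ratings treats [1, 2, 3] and [3, 2, 1] as fairly similar (all ratings positive), whereas Pearson and the adjusted cosine remove the per-vector or per-user offset first.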
7. Feedback and Update of User Interest Model
Once the user interest model has been built, users can update it themselves, or it can be updated dynamically by tracking the user's behavior. In the latter case, different user actions produce different updates [27,28,29]. User actions include adding a bookmark, downloading a document, viewing a summary, ignoring a document, and deleting a bookmark. These actions reflect different levels of user interest and therefore carry different meanings [30,31,32,33], as shown in Table 1.
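Table 1 is not reproduced in this section, so the action weights below are hypothetical placeholders, not the paper's values. The sketch assumes the profile and each document are dictionaries mapping terms to weights, and that each action adds a scaled copy of the acted-on document's term weights to the profile (negative action weights subtract), with weights clipped at zero. `ACTION_WEIGHTS`, `apply_feedback` and `rate` are illustrative names.

```python
# HYPOTHETICAL action weights: positive actions reinforce terms, negative ones weaken them.
ACTION_WEIGHTS = {
    "add_bookmark": 1.0,
    "download": 0.8,
    "view_summary": 0.3,
    "ignore": -0.2,
    "delete_bookmark": -0.8,
}


def apply_feedback(profile, doc_terms, action, rate=0.1):
    """Return a new profile with doc_terms added in proportion to the action's weight."""
    if action not in ACTION_WEIGHTS:
        raise ValueError(f"unknown action: {action}")
    delta = ACTION_WEIGHTS[action] * rate
    updated = dict(profile)                 # leave the caller's profile unchanged
    for term, weight in doc_terms.items():
        # Add (or subtract) the scaled document weight; never let a weight go negative.
        updated[term] = max(0.0, updated.get(term, 0.0) + delta * weight)
    return updated


profile = {"python": 0.5}
doc = {"python": 1.0, "web": 0.5}
after_bookmark = apply_feedback(profile, doc, "add_bookmark")
after_delete = apply_feedback(profile, doc, "delete_bookmark")
```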
7.1. Short-term interest
The user's short-term interest is shown in Figure 2.
The short-term interest $P_s$ has two parts, $P(s_1)$ and $P(s_2)$. $P(s_1)$ is the part of the user's interest obtained from the search records accumulated since the first day, and $P(s_2)$ is the latest part of the user's interest, obtained from the current search. $P_s$ is defined as:
$$P_s = x\,P(s_1) + y\,P(s_2)$$
where x + y = 1.
7.2. Long-term interest
The user's long-term interest is shown in Figure 3.
Figure 3. User long-term interest
The long-term interest $P_l$ is defined as:
7.3. Updating user interest
The user interest model can then be updated as follows:
where a + b = 1 and x + y = 1.
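The sketch below assumes the short-term interest is $P_s = x\,P(s_1) + y\,P(s_2)$ with x + y = 1, and, because the update formula is not reproduced here, it further assumes the updated model is $P = a\,P_s + b\,P_l$ with a + b = 1, consistent with the constraints stated above. The long-term interest $P_l$ is taken as a given term-weight dictionary since its definition is not reproduced. `blend` and `update_interest` are hypothetical names.

```python
def blend(p, q, wp, wq):
    """Weighted combination wp * p + wq * q of two term-weight dicts (requires wp + wq = 1)."""
    if abs(wp + wq - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    terms = set(p) | set(q)
    return {t: wp * p.get(t, 0.0) + wq * q.get(t, 0.0) for t in terms}


def update_interest(p_s1, p_s2, p_long, x, y, a, b):
    """P_s = x*P(s1) + y*P(s2); ASSUMED overall update P = a*P_s + b*P_l."""
    p_short = blend(p_s1, p_s2, x, y)
    return blend(p_short, p_long, a, b)


updated = update_interest(
    p_s1={"web": 1.0},         # interest from the accumulated search history
    p_s2={"java": 1.0},        # interest from the current search
    p_long={"web": 0.5},       # long-term interest, treated as given
    x=0.5, y=0.5, a=0.6, b=0.4,
)
```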
8. Conclusions
Personalized recommendation on the Internet has effectively improved e-commerce services [34,35,36,37]. To address the problem that traditional search engines ignore individual users' interests, this paper presented a personalized recommendation system that employs a user interest model for content-based filtering. The system was analyzed in terms of five components: document information extraction, document vector representation, user interest model representation, matching algorithms, and user feedback update. The system can describe both the type and the degree of a user's interests, and can improve the efficiency of personalized information services.
9. Acknowledgment
This work was supported by the Scientific Research Fund of Zhejiang Provincial Education Department (Grant No. Y201121981).
10. References