Collaborative filtering systems that are based on user feedback have two limitations. First, they rely heavily on explicit feedback, which has several drawbacks. Second, since the user models are static, recommendations become inaccurate as the user models age. These systems are therefore only employed in subjective domains where the user’s interests stay relatively unchanged and where the user perceives a direct benefit from rating items.
A number of approaches to mining website data have been developed that collaborative filtering systems can use to make the recommendation task both automatic and dynamic. The data that these web mining techniques use consists of records of users navigating through a web site.
A transaction is a list of web pages a user has visited on a web site during a continuous period of time. There are several ways in which the list of web pages can be chosen. The list could contain all the content pages a user has visited during a user session, that is, a time interval in which the user browses through the web site. A list could also consist of one content page and all the navigation pages the user needed to reach that content page. Here it is assumed that a transaction is the list of all the web pages a user has visited during a user session.
Transactions can be represented as vectors, much like documents are represented in the vector space model. If there are m web pages, a transaction T is represented as an m-dimensional vector, where component $w_i$ is the weight assigned to the i-th page. Hence, a transaction T is written as $T = (w_1, w_2, \ldots, w_m)$.
Several different weighting schemes can be used. One scheme uses binary weights, where the weight is 1 if the user has visited the page during a session and 0 otherwise. Other schemes use the time the user spends reading a page or the number of times a page is accessed during a session.
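As a minimal sketch of the binary and access-count weighting schemes, a session can be turned into a transaction vector as follows. The function name `transaction_vector`, the `scheme` parameter, and the example page identifiers are assumptions made for illustration; they do not come from any particular system.

```python
from collections import Counter


def transaction_vector(session, pages, scheme="binary"):
    """Turn one user session into an m-dimensional weight vector.

    session -- list of page ids in the order the user requested them
    pages   -- ordered list of all m page ids (fixes the vector dimensions)
    scheme  -- "binary" (visited or not) or "count" (number of accesses)
    """
    counts = Counter(session)  # missing pages count as 0
    if scheme == "binary":
        return [1 if counts[p] > 0 else 0 for p in pages]
    if scheme == "count":
        return [counts[p] for p in pages]
    raise ValueError(f"unknown weighting scheme: {scheme}")


# Hypothetical site with four pages and one recorded session.
pages = ["home", "products", "faq", "contact"]
session = ["home", "products", "home", "contact"]

binary_T = transaction_vector(session, pages)
count_T = transaction_vector(session, pages, scheme="count")
```

A time-based scheme would follow the same shape, but it needs per-request timestamps, which this sketch does not model.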
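The cosine measure can be used to calculate the similarity between two transactions $T_1 = (w_{1,1}, \ldots, w_{1,m})$ and $T_2 = (w_{2,1}, \ldots, w_{2,m})$:
$$\mathrm{sim}(T_1, T_2) = \frac{\sum_{i=1}^{m} w_{1,i}\, w_{2,i}}{\sqrt{\sum_{i=1}^{m} w_{1,i}^2}\ \sqrt{\sum_{i=1}^{m} w_{2,i}^2}}$$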
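The cosine measure between two transaction vectors can be sketched in a few lines. The function name `cosine_similarity` is an assumption, and so is the choice to return 0.0 for an all-zero vector, where the measure is undefined.

```python
import math


def cosine_similarity(t1, t2):
    """Cosine of the angle between two equal-length transaction vectors."""
    if len(t1) != len(t2):
        raise ValueError("transaction vectors must have the same length")
    dot = sum(a * b for a, b in zip(t1, t2))
    norm1 = math.sqrt(sum(a * a for a in t1))
    norm2 = math.sqrt(sum(b * b for b in t2))
    if norm1 == 0 or norm2 == 0:
        # Assumption: an empty session is treated as similar to nothing.
        return 0.0
    return dot / (norm1 * norm2)


# Two sessions sharing two of the three pages visited in the first one.
similarity = cosine_similarity([1, 1, 0, 1], [1, 0, 0, 1])
```

For non-negative weights the result lies between 0 (no pages in common) and 1 (identical access pattern up to scaling).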
Examples of algorithms that can be used for transaction clustering are the k-means algorithm and the single pass algorithm. Regardless of which algorithm is used, the input is always a set of transaction vectors and the output is always a set of n clusters. A cluster is a set of transaction vectors that represents a group of users with similar access patterns. Transaction clusters themselves, however, are not suitable for predicting future access patterns, since each cluster may contain hundreds of transactions and web pages. After the transactions are grouped into clusters, the centroid of every cluster is therefore computed. Pages that occur infrequently in a cluster are sometimes left out by setting their value in the centroid to zero when that value is below a certain threshold.
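The clustering and centroid steps can be sketched as follows, assuming a simple single pass variant: each transaction joins the existing cluster whose centroid is most similar to it, or starts a new cluster if no similarity reaches a cut-off. The names `single_pass`, `centroid`, `min_similarity`, and `min_weight`, as well as the example data and threshold values, are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(cluster, min_weight=0.0):
    """Mean vector of a cluster; weights below min_weight are set to zero."""
    size = len(cluster)
    mean = [sum(column) / size for column in zip(*cluster)]
    return [w if w >= min_weight else 0.0 for w in mean]


def single_pass(transactions, min_similarity):
    """Assign each transaction, in order, to its most similar cluster.

    A new cluster is started when no existing centroid is at least
    min_similarity close to the transaction.
    """
    clusters = []
    for t in transactions:
        best, best_sim = None, min_similarity
        for c in clusters:
            sim = cosine(t, centroid(c))
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([t])
        else:
            best.append(t)
    return clusters


# Hypothetical binary transactions over four pages.
transactions = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
clusters = single_pass(transactions, min_similarity=0.5)
centroids = [centroid(c, min_weight=0.5) for c in clusters]
```

With this data the third page appears in only one of the three transactions of the first cluster, so its centroid weight falls below the threshold and is zeroed. Note that the result of a single pass algorithm depends on the order in which transactions are processed.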
Note: This representation does not take into account the order in which pages are accessed. Although methods exist that use page order, they seem to play a more significant role in systems that improve the structure of the web site than in systems that provide dynamic recommendations.
The clustering technique described in the previous section provides a set of clusters that represent similar access patterns among users of a web site. These clusters can be compared with the session of an active user in order to find web pages that the user has not visited yet but that other users with similar access patterns have. The active user session is maintained as a transaction vector P, which has to be updated whenever the user requests a new page. To recommend pages, vector P is compared with the centroids of all the clusters, and the pages in the most similar clusters that have not been visited are selected. The selected pages can be ordered by using the similarity values and the centroid weights.
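The recommendation step can be sketched as follows. The text does not fix how similarity values and centroid weights are combined, so the score used here (similarity multiplied by centroid weight, keeping the best score per page) is an assumption, as are the names `recommend`, `min_similarity`, and `top_n` and the example centroids.

```python
import math


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(active, centroids, pages, min_similarity=0.3, top_n=5):
    """Rank unvisited pages taken from clusters similar to the active session."""
    scores = {}
    for c in centroids:
        sim = cosine(active, c)
        if sim < min_similarity:
            continue  # cluster is not similar enough to the active session
        for page, visited, weight in zip(pages, active, c):
            if visited or weight == 0:
                continue  # already seen, or filtered out of the centroid
            # Assumed scoring rule: similarity times centroid weight.
            scores[page] = max(scores.get(page, 0.0), sim * weight)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical pages and cluster centroids.
pages = ["home", "products", "faq", "contact"]
centroids = [
    [1.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.5, 1.0],
    [1.0, 0.0, 0.6, 0.0],
]

P = [1, 0, 0, 0]                      # active user has only seen "home"
first = recommend(P, centroids, pages)

P[pages.index("products")] = 1        # user requests a new page: update P
second = recommend(P, centroids, pages)
```

In this example the first call ranks "products" ahead of "faq"; once the user has requested "products", only "faq" remains as a recommendation.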