Documentation ¶
Overview ¶
Package model provides models for item rating and ranking.
There are two kinds of models: rating models and ranking models. Although a rating model could be used for ranking (and vice versa), its performance is not guaranteed and the results might not even make sense.
- Item rating models include: Random, Baseline, SVD(optimizer=Regression), SVD++, NMF, KNN, SlopeOne, CoClustering
- Item ranking models include: ItemPop, WRMF, SVD(optimizer=BPR)
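A minimal usage sketch follows. The CSV loader call and the parameter-name constants are assumptions about the core and base packages; NewCoClustering, Fit and Predict are documented below, and the fmt, core and base imports are assumed.

func ExampleCoClustering() {
    // The loader signature and the parameter-name constants below are
    // assumptions; adjust them to the actual core and base APIs.
    data := core.LoadDataFromCSV("ratings.csv", ",", false)
    coc := NewCoClustering(base.Params{
        base.NEpochs:       20,
        base.NUserClusters: 3,
        base.NItemClusters: 3,
    })
    coc.Fit(data, &base.RuntimeOptions{})
    fmt.Println(coc.Predict("42", "100")) // estimated rating of item "100" by user "42"
}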
Index ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type BPR ¶
type BPR struct {
    Base
    // Model parameters
    UserFactor [][]float64 // p_u
    ItemFactor [][]float64 // q_i
    // Fallback model
    UserRatings []*base.MarginalSubSet
    ItemPop     *ItemPop
    // contains filtered or unexported fields
}
BPR stands for Bayesian Personalized Ranking, a pairwise learning algorithm for matrix factorization models with implicit feedback. The pairwise ranking between item i and item j for user u is estimated by:
p(i >_u j) = \sigma( p_u^T (q_i - q_j) )
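For illustration, the estimated preference probability can be computed from the learned factors exactly as in this formula. The sketch below is not the package's internal code and assumes the "math" import.

// pairwiseProb mirrors p(i >_u j) = σ(p_u^T (q_i - q_j)).
func pairwiseProb(pu, qi, qj []float64) float64 {
    x := 0.0
    for k := range pu {
        x += pu[k] * (qi[k] - qj[k])
    }
    return 1.0 / (1.0 + math.Exp(-x)) // logistic sigmoid
}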
Hyper-parameters:
Reg        - The regularization parameter of the cost function that is optimized. Default is 0.01.
Lr         - The learning rate of SGD. Default is 0.05.
NFactors   - The number of latent factors. Default is 10.
NEpochs    - The number of iterations of the SGD procedure. Default is 100.
InitMean   - The mean of initial random latent factors. Default is 0.
InitStdDev - The standard deviation of initial random latent factors. Default is 0.001.
func (*BPR) Fit ¶
func (bpr *BPR) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the BPR model.
type Base ¶
type Base struct {
    Params      base.Params   // Hyper-parameters
    UserIndexer *base.Indexer // Users' ID set
    ItemIndexer *base.Indexer // Items' ID set
    // contains filtered or unexported fields
}
The Base model must be embedded by every recommendation model. Hyper-parameters, ID indexers, the random generator and fitting options are managed by the Base model.
func (*Base) Fit ¶
func (model *Base) Fit(trainSet core.DataSet, options *base.RuntimeOptions)
Fit has not been implemented.
func (*Base) Init ¶
func (model *Base) Init(trainSet core.DataSetInterface)
Init the Base model. The method must be called at the beginning of Fit.
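For illustration, a model defined in this package would embed Base and call Init at the beginning of its Fit, roughly as below (myModel is hypothetical):

// myModel embeds Base, as every recommendation model must.
type myModel struct {
    Base
    // model-specific parameters go here
}

// Fit calls Init first so that hyper-parameters, ID indexers and the
// random generator are set up before the training loop runs.
func (m *myModel) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions) {
    m.Init(trainSet)
    // ... training loop ...
}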
type BaseLine ¶
type BaseLine struct {
    Base
    UserBias   []float64 // b_u
    ItemBias   []float64 // b_i
    GlobalBias float64   // mu
    // contains filtered or unexported fields
}
BaseLine predicts the rating for a given user and item by
\hat{r}_{ui} = b_{ui} = μ + b_u + b_i
If user u is unknown, then the bias b_u is assumed to be zero. The same applies for item i with b_i.
Hyper-parameters:
Reg         - The regularization parameter of the cost function that is optimized. Default is 0.02.
Lr          - The learning rate of SGD. Default is 0.005.
NEpochs     - The number of iterations of the SGD procedure. Default is 20.
RandomState - The random seed. Default is 0.
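The prediction rule above, including the zero-bias fallback for unknown users or items, reads as follows (an illustrative sketch, not the package's code; a negative index stands for an unknown ID):

// predictBaseline mirrors \hat{r}_{ui} = μ + b_u + b_i, treating b_u (b_i)
// as zero when the user (item) is unknown, signalled here by index < 0.
func predictBaseline(mu float64, userBias, itemBias []float64, u, i int) float64 {
    r := mu
    if u >= 0 {
        r += userBias[u]
    }
    if i >= 0 {
        r += itemBias[i]
    }
    return r
}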
func NewBaseLine ¶
NewBaseLine creates a baseline model.
func (*BaseLine) Fit ¶
func (baseLine *BaseLine) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the BaseLine model.
type CoClustering ¶
type CoClustering struct {
    Base
    GlobalMean       float64     // A^{global}
    UserMeans        []float64   // A^{R}
    ItemMeans        []float64   // A^{C}
    UserClusters     []int       // ρ(i)
    ItemClusters     []int       // γ(j)
    UserClusterMeans []float64   // A^{RC}
    ItemClusterMeans []float64   // A^{CC}
    CoClusterMeans   [][]float64 // A^{COC}
    // contains filtered or unexported fields
}
CoClustering [5] is a novel collaborative filtering approach based on a weighted co-clustering algorithm that involves simultaneous clustering of users and items.
Let U={u_i}^m_{i=1} be the set of users such that |U|=m and P={p_j}^n_{j=1} be the set of items such that |P|=n. Let A be the m x n ratings matrix such that A_{ij} is the rating of the user u_i for the item p_j. The approximate matrix \hat{A}_{ij} is given by
\hat{A}_{ij} = A^{COC}_{gh} + (A^R_i - A^{RC}_g) + (A^C_j - A^{CC}_h)
where g=ρ(i), h=γ(j) and A^R_i, A^C_j are the average ratings of user u_i and item p_j, and A^{COC}_{gh}, A^{RC}_g and A^{CC}_h are the average ratings of the corresponding co-cluster, user-cluster and item-cluster respectively.
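The approximation rule translates directly into code over the fields of the CoClustering struct (illustrative only, not the package's code):

// approximate mirrors \hat{A}_{ij} = A^{COC}_{gh} + (A^R_i - A^{RC}_g) + (A^C_j - A^{CC}_h),
// where g and h are the clusters assigned to user i and item j.
func approximate(c *CoClustering, i, j int) float64 {
    g, h := c.UserClusters[i], c.ItemClusters[j]
    return c.CoClusterMeans[g][h] +
        (c.UserMeans[i] - c.UserClusterMeans[g]) +
        (c.ItemMeans[j] - c.ItemClusterMeans[h])
}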
Hyper-parameters:
NEpochs       - The number of iterations of the optimization procedure. Default is 20.
NUserClusters - The number of user clusters. Default is 3.
NItemClusters - The number of item clusters. Default is 3.
RandomState   - The random seed. Default is 0.
func NewCoClustering ¶
func NewCoClustering(params base.Params) *CoClustering
NewCoClustering creates a CoClustering model.
func (*CoClustering) Fit ¶
func (coc *CoClustering) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the CoClustering model.
func (*CoClustering) Predict ¶
func (coc *CoClustering) Predict(userId, itemId string) float64
Predict by the CoClustering model.
func (*CoClustering) SetParams ¶
func (coc *CoClustering) SetParams(params base.Params)
SetParams sets hyper-parameters for the CoClustering model.
type FM ¶
type FM struct {
    Base
    UserFeatures []*base.SparseVector
    ItemFeatures []*base.SparseVector
    // Model parameters
    GlobalBias float64     // w_0
    Bias       []float64   // w_i
    Factors    [][]float64 // v_i
    // Fallback model
    UserRatings []*base.MarginalSubSet
    ItemPop     *ItemPop
    // contains filtered or unexported fields
}
FM is the implementation of the factorization machine [12]. The prediction is given by
\hat y(x) = w_0 + \sum^n_{i=1} w_i x_i + \sum^n_{i=1} \sum^n_{j=i+1} <v_i, v_j>x_i x_j
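Read literally on a dense feature vector x, the prediction is the global bias plus linear terms plus factor-weighted pairwise interactions. The sketch below is a direct O(n²k) transcription of the formula; the package itself operates on sparse feature vectors.

// fmPredict mirrors \hat{y}(x) = w_0 + Σ_i w_i x_i + Σ_i Σ_{j>i} <v_i, v_j> x_i x_j.
func fmPredict(w0 float64, w []float64, v [][]float64, x []float64) float64 {
    y := w0
    for i := range x {
        y += w[i] * x[i]
    }
    for i := 0; i < len(x); i++ {
        for j := i + 1; j < len(x); j++ {
            dot := 0.0
            for k := range v[i] {
                dot += v[i][k] * v[j][k]
            }
            y += dot * x[i] * x[j]
        }
    }
    return y
}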
Hyper-parameters:
Reg        - The regularization parameter of the cost function that is optimized. Default is 0.02.
Lr         - The learning rate of SGD. Default is 0.005.
NFactors   - The number of latent factors. Default is 100.
NEpochs    - The number of iterations of the SGD procedure. Default is 20.
InitMean   - The mean of initial random latent factors. Default is 0.
InitStdDev - The standard deviation of initial random latent factors. Default is 0.1.
func (*FM) Fit ¶
func (fm *FM) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the factorization machine.
type ItemPop ¶
ItemPop recommends items by their popularity. The popularity of an item is defined as its occurrence frequency in the training data set.
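Conceptually, fitting reduces to counting item occurrences in the training set and ranking items by that count. The helper below is hypothetical and only illustrates the idea:

// itemPopularity counts how often each item ID occurs in the feedback data.
func itemPopularity(itemIDs []string) map[string]int {
    pop := make(map[string]int)
    for _, id := range itemIDs {
        pop[id]++
    }
    return pop
}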
func (*ItemPop) Fit ¶
func (pop *ItemPop) Fit(set core.DataSetInterface, options *base.RuntimeOptions)
Fit the ItemPop model.
type KNN ¶
type KNN struct {
    Base
    GlobalMean   float64
    SimMatrix    [][]float64
    LeftRatings  []*base.MarginalSubSet
    RightRatings []*base.MarginalSubSet
    UserRatings  []*base.MarginalSubSet
    LeftMean     []float64 // Centered KNN: user (item) mean
    StdDev       []float64 // KNN with Z-Score: user (item) standard deviation
    Bias         []float64 // KNN Baseline: bias
    // contains filtered or unexported fields
}
KNN for collaborative filtering.
Hyper-parameters:
Type       - The type of KNN ('Basic', 'Centered', 'ZScore', 'Baseline'). Default is 'basic'.
Similarity - The similarity function. Default is MSD.
UserBased  - User-based or item-based? Default is true.
K          - The maximum number of neighbors used to predict the rating. Default is 40.
MinK       - The minimum number of neighbors required to predict the rating. Default is 1.
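For the 'Basic' variant, the usual neighborhood rule is a similarity-weighted average over the (at most K) most similar users or items that rated the target. The sketch below shows that standard rule under the assumption that it matches this implementation; the real code also honours MinK and the Centered/ZScore/Baseline variants, and the "math" import is assumed.

// knnBasicPredict returns the similarity-weighted average of neighbor ratings.
// neighbors holds (similarity, rating) pairs for the selected nearest neighbors.
func knnBasicPredict(neighbors [][2]float64, fallback float64) float64 {
    num, den := 0.0, 0.0
    for _, n := range neighbors {
        num += n[0] * n[1]
        den += math.Abs(n[0])
    }
    if den == 0 {
        return fallback // no usable neighbors: fall back to e.g. the global mean
    }
    return num / den
}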
func (*KNN) Fit ¶
func (knn *KNN) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the KNN model.
type KNNImplicit ¶
type KNNImplicit struct {
    Base
    Matrix [][]float64
    Users  []*base.MarginalSubSet
}
KNNImplicit is the KNN model for implicit feedback.
func NewKNNImplicit ¶
func NewKNNImplicit(params base.Params) *KNNImplicit
NewKNNImplicit creates a KNN model for implicit feedback.
func (*KNNImplicit) Fit ¶
func (knn *KNNImplicit) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the KNN model.
func (*KNNImplicit) Predict ¶
func (knn *KNNImplicit) Predict(userId, itemId string) float64
Predict by the KNN model.
type NMF ¶
type NMF struct {
    Base
    GlobalMean float64     // the global mean of ratings
    UserFactor [][]float64 // p_u
    ItemFactor [][]float64 // q_i
    // contains filtered or unexported fields
}
NMF [3] is matrix factorization with non-negative latent factors. During the factorization process, the non-negativity constraint, which ensures good representativeness of the learnt model, is critically important.
Hyper-parameters:
Reg      - The regularization parameter of the cost function that is optimized. Default is 0.06.
NFactors - The number of latent factors. Default is 15.
NEpochs  - The number of iterations of the SGD procedure. Default is 50.
InitLow  - The lower bound of initial random latent factors. Default is 0.
InitHigh - The upper bound of initial random latent factors. Default is 1.
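Non-negativity is commonly preserved with multiplicative updates rather than signed gradient steps. The sketch below shows one standard form of the user-factor update; it is a general NMF scheme and is only assumed, not guaranteed, to match this implementation.

// updateUserFactor applies one multiplicative update to p_u; factors stay
// non-negative as long as they start non-negative. ratings maps item index
// to the observed rating r_ui, and estimate returns the current \hat{r}_ui.
func updateUserFactor(pu []float64, q [][]float64, ratings map[int]float64,
    estimate func(i int) float64, reg float64) {
    nu := float64(len(ratings))
    for f := range pu {
        num, den := 0.0, 0.0
        for i, r := range ratings {
            num += q[i][f] * r
            den += q[i][f] * estimate(i)
        }
        den += reg * nu * pu[f]
        if den > 0 {
            pu[f] *= num / den
        }
    }
}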
func (*NMF) Fit ¶
func (nmf *NMF) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the NMF model.
type SVD ¶
type SVD struct {
    Base
    // Model parameters
    UserFactor [][]float64 // p_u
    ItemFactor [][]float64 // q_i
    UserBias   []float64   // b_u
    ItemBias   []float64   // b_i
    GlobalMean float64     // mu
    // Fallback model
    UserRatings []*base.MarginalSubSet
    ItemPop     *ItemPop
    // contains filtered or unexported fields
}
SVD is the matrix factorization algorithm popularized by Simon Funk during the Netflix Prize. The prediction \hat{r}_{ui} is set as:
\hat{r}_{ui} = μ + b_u + b_i + q_i^Tp_u
If user u is unknown, then the bias b_u and the factors p_u are assumed to be zero. The same applies for item i with b_i and q_i.
Hyper-parameters:
UseBias    - Whether to use bias terms in the SVD model. Default is true.
Reg        - The regularization parameter of the cost function that is optimized. Default is 0.02.
Lr         - The learning rate of SGD. Default is 0.005.
NFactors   - The number of latent factors. Default is 100.
NEpochs    - The number of iterations of the SGD procedure. Default is 20.
InitMean   - The mean of initial random latent factors. Default is 0.
InitStdDev - The standard deviation of initial random latent factors. Default is 0.1.
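One SGD step on a single observed rating follows the classic Funk-SVD update of biases and factors. The sketch below shows that standard rule; it is assumed to approximate what Fit does internally, not copied from it.

// sgdStep updates the biases and factors for one rating r of user u on item i.
func sgdStep(mu float64, bu, bi []float64, p, q [][]float64,
    u, i int, r, lr, reg float64) {
    pred := mu + bu[u] + bi[i]
    for k := range p[u] {
        pred += p[u][k] * q[i][k]
    }
    e := r - pred // prediction error
    bu[u] += lr * (e - reg*bu[u])
    bi[i] += lr * (e - reg*bi[i])
    for k := range p[u] {
        puk, qik := p[u][k], q[i][k]
        p[u][k] += lr * (e*qik - reg*puk)
        q[i][k] += lr * (e*puk - reg*qik)
    }
}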
func (*SVD) Fit ¶
func (svd *SVD) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the SVD model.
type SVDpp ¶
type SVDpp struct {
    Base
    TrainSet   core.DataSetInterface
    UserFactor [][]float64 // p_u
    ItemFactor [][]float64 // q_i
    ImplFactor [][]float64 // y_i
    UserBias   []float64   // b_u
    ItemBias   []float64   // b_i
    GlobalMean float64     // mu
    // contains filtered or unexported fields
}
SVDpp (SVD++) [10] is an extension of SVD taking into account implicit interactions. The predicted \hat{r}_{ui} is:
\hat{r}_{ui} = \mu + b_u + b_i + q_i^T\left(p_u + |I_u|^{-\frac{1}{2}} \sum_{j \in I_u}y_j\right)
where the y_j terms are a new set of item factors that capture implicit interactions. Here, an implicit rating describes the fact that a user u rated an item j, regardless of the rating value. If user u is unknown, then the bias b_u and the factors p_u are assumed to be zero. The same applies for item i with b_i, q_i and y_i.
Hyper-parameters:
Reg        - The regularization parameter of the cost function that is optimized. Default is 0.02.
Lr         - The learning rate of SGD. Default is 0.007.
NFactors   - The number of latent factors. Default is 20.
NEpochs    - The number of iterations of the SGD procedure. Default is 20.
InitMean   - The mean of initial random latent factors. Default is 0.
InitStdDev - The standard deviation of initial random latent factors. Default is 0.1.
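The new ingredient relative to SVD is the implicit term in parentheses; it can be computed from the y_j factors of the items rated by the user. This is an illustrative sketch, not the package's code, and assumes the "math" import.

// implicitUserFactor returns p_u + |I_u|^{-1/2} * Σ_{j∈I_u} y_j.
func implicitUserFactor(pu []float64, y [][]float64, ratedItems []int) []float64 {
    out := make([]float64, len(pu))
    copy(out, pu)
    if len(ratedItems) == 0 {
        return out
    }
    norm := 1.0 / math.Sqrt(float64(len(ratedItems)))
    for _, j := range ratedItems {
        for k := range out {
            out[k] += norm * y[j][k]
        }
    }
    return out
}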
func (*SVDpp) Fit ¶
func (svd *SVDpp) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the SVD++ model.
type SlopeOne ¶
type SlopeOne struct {
    Base
    GlobalMean  float64                // Mean of ratings in the training set
    UserRatings []*base.MarginalSubSet // Ratings by each user
    UserMeans   []float64              // Mean of each user's ratings
    Dev         [][]float64            // Deviations
}
SlopeOne [4] predicts ratings with predictors of the form f(x) = x + b, precomputing the average difference between the ratings of one item and another for users who rated both.
First, deviations between pairs of items are computed. Given a training set χ, and any two items j and i with ratings u_j and u_i respectively in some user evaluation u (annotated as u∈S_{j,i}(χ)), the average deviation of item i with respect to item j is computed by:
dev_{j,i} = \sum_{u∈S_{j,i}(χ)} \frac{u_j - u_i}{card(S_{j,i}(χ))}
The computation on deviations could be parallelized.
In the prediction stage, given that dev_{j,i} + u_i is a prediction for u_j given u_i, a reasonable predictor is the average of all such predictions:
P(u)_j = \frac{1}{card(R_j)} \sum_{i∈R_j}(dev_{j,i} + u_i)
where R_j = {i | i ∈ S(u), i ≠ j, card(S_{j,i}(χ)) > 0} is the set of all relevant items, and S(u) is the subset of items that are rated in the user evaluation u.
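Both stages translate directly into code. The sketch below is illustrative only; the package parallelizes the deviation computation and works on its own marginal-subset structures.

// itemDeviation computes dev_{j,i}: the average of (u_j - u_i) over users who
// rated both items. ratingsJ and ratingsI map user index to rating.
func itemDeviation(ratingsJ, ratingsI map[int]float64) (dev float64, card int) {
    for u, rj := range ratingsJ {
        if ri, ok := ratingsI[u]; ok {
            dev += rj - ri
            card++
        }
    }
    if card > 0 {
        dev /= float64(card)
    }
    return dev, card
}

// slopeOnePredict computes P(u)_j as the average of dev_{j,i} + u_i over the
// relevant items i, i.e. the items rated in u for which a deviation exists.
func slopeOnePredict(userRatings map[int]float64, devJ map[int]float64) float64 {
    sum, n := 0.0, 0
    for i, ui := range userRatings {
        if d, ok := devJ[i]; ok {
            sum += d + ui
            n++
        }
    }
    if n == 0 {
        return 0
    }
    return sum / float64(n)
}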
func NewSlopOne ¶
NewSlopOne creates a SlopeOne model.
func (*SlopeOne) Fit ¶
func (so *SlopeOne) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)
Fit the SlopeOne model.
type WRMF ¶
type WRMF struct {
    Base
    // Model parameters
    UserFactor *mat.Dense // p_u
    ItemFactor *mat.Dense // q_i
    // Fallback model
    UserRatings []*base.MarginalSubSet
    ItemPop     *ItemPop
    // contains filtered or unexported fields
}
WRMF [7] is Weighted Regularized Matrix Factorization, which exploits the unique properties of implicit feedback datasets. It treats the data as an indication of positive and negative preference associated with vastly varying confidence levels. This leads to a factor model especially tailored for implicit feedback recommenders. The authors also proposed a scalable optimization procedure that scales linearly with the data size.
Hyper-parameters:
NFactors   - The number of latent factors. Default is 10.
NEpochs    - The number of training epochs. Default is 50.
InitMean   - The mean of initial latent factors. Default is 0.
InitStdDev - The standard deviation of initial latent factors. Default is 0.1.
Reg        - The strength of regularization.
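In the formulation of [7], every raw feedback value is turned into a binary preference plus a confidence that grows with the value. The α below is the paper's confidence weight; it is an assumption about this implementation and is not listed among the hyper-parameters above.

// preferenceAndConfidence maps raw implicit feedback r_ui to the binary
// preference p_ui and the confidence c_ui = 1 + α·r_ui used in [7].
func preferenceAndConfidence(r, alpha float64) (pref, conf float64) {
    if r > 0 {
        pref = 1
    }
    return pref, 1 + alpha*r
}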
func (*WRMF) Fit ¶
func (mf *WRMF) Fit(set core.DataSetInterface, options *base.RuntimeOptions)
Fit the WRMF model.