Recommender system

Chapter 3. SYSTEM IMPLEMENTATION

3.1. Overview of the smart shopping system

3.1.5. Recommender system

a. Content-based

• Loading the data required for Content-based

import numpy as np
import pandas as pd

u_cols = ['user_id', 'email']
users = pd.read_csv('E:/SSISData/Account.user', sep='|', names=u_cols, encoding='utf8')
n_users = users.shape[0]

r_cols = ['user_id', 'pro_id', 'rating']
ratings_base = pd.read_csv('E:/SSISData/User_Rated.base', sep='|', names=r_cols, encoding='utf8')
ratings_test = pd.read_csv('E:/SSISData/User_Test.base', sep='|', names=r_cols, encoding='utf8')

# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
rate_train = ratings_base.to_numpy()
rate_test = ratings_test.to_numpy()

• Building item profiles

The key task in a Content-based Recommendation System is building the profile of each item, that is, a feature vector for each item. First, we load all the item information into the variable items:

i_cols = ['Number', 'ProductID', 'ImageURL',
          'Dienthoai', 'Maytinhbang', 'Laptop', 'Apple', 'Nokia', 'Samsung',
          'Oppo', 'Panasonic', 'Honor', 'Xiaomi', 'Sony', 'Asus', 'Huawei',
          'Blackberry', 'HTC', 'Vivo', 'Philips', 'Masstel',
          'Bluboo', 'Ulefone', 'Ivvi', 'Oukitel', 'Lenovo', 'Kindle', 'ONYX',
          'cutePad', 'Itel', 'Bibox', 'Kobo', 'HP', 'Dell', 'MSI', 'Micro']
items = pd.read_csv('D:/Download/ml-100k/ml-100k/hoang.txt', sep='|', names=i_cols, encoding='utf8')
n_items = items.shape[0]

Since we build the content from the product's category, we only use the last 33 binary values of each row (the 3 product-type flags and 30 brand flags listed in i_cols, out of 36 columns in total), because those values describe the product's content:

X0 = items.to_numpy()  # matrix of products (as_matrix() was removed in pandas 1.0)
X_train_counts = X0[:, -33:]  # the 33 binary category/brand columns

Next, we build the feature vector of each item from the product category matrix, using TF-IDF features.

from sklearn.feature_extraction.text import TfidfTransformer
transformer = TfidfTransformer(smooth_idf=True, norm='l2')
tfidf = transformer.fit_transform(X_train_counts.tolist()).toarray()

After this step, each row of tfidf is the feature vector of one product.

Next, for each user, we need to find the products that the user has rated and the values of those ratings.
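To make the transformation concrete, the following minimal sketch reproduces by hand what TfidfTransformer(smooth_idf=True, norm='l2') computes on a small binary matrix. The toy matrix and the helper name tfidf_by_hand are hypothetical and only for illustration; the formula idf = ln((1 + n) / (1 + df)) + 1 followed by row-wise L2 normalization is the smoothed variant documented for scikit-learn.

```python
import numpy as np

def tfidf_by_hand(counts):
    """Smoothed TF-IDF with L2-normalized rows (illustrative helper, not thesis code)."""
    counts = np.asarray(counts, dtype=float)
    n_samples = counts.shape[0]
    df = (counts > 0).sum(axis=0)                        # document frequency per column
    idf = np.log((1.0 + n_samples) / (1.0 + df)) + 1.0   # smoothed inverse document frequency
    weighted = counts * idf
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                              # leave all-zero rows unchanged
    return weighted / norms

# Hypothetical toy data: 3 products, 4 binary flags
toy_counts = [[1, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 1]]
toy_tfidf = tfidf_by_hand(toy_counts)
print(toy_tfidf.round(2))
```

In the first row, the flag shared by fewer products (third column) receives a larger weight than the more common one (first column), which is the effect TF-IDF is meant to have on the feature vectors.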

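def get_items_rated_by_user(rate_matrix, user_id):
    # user_id is 0-based here, while the ids stored in the data files are 1-based
    y = rate_matrix[:, 0]
    ids = np.where(y == user_id + 1)[0]
    # cast to int so the ids can index rows of tfidf, then shift to 0-based
    item_ids = rate_matrix[ids, 1].astype(int) - 1
    scores = rate_matrix[ids, 2]
    return (item_ids, scores)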
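As an illustration of the index conventions, the sketch below applies the same lookup to a hypothetical three-column rating matrix. The toy values are invented, and the function mirrors the one above (with an explicit integer cast for the item ids) so that the snippet runs on its own.

```python
import numpy as np

def get_items_rated_by_user(rate_matrix, user_id):
    """Return 0-based item ids and ratings for a 0-based user_id."""
    y = rate_matrix[:, 0]                       # stored user ids are 1-based
    ids = np.where(y == user_id + 1)[0]
    item_ids = rate_matrix[ids, 1].astype(int) - 1
    scores = rate_matrix[ids, 2]
    return item_ids, scores

# Hypothetical rows of [user_id, pro_id, rating]
toy_ratings = np.array([[1, 3, 5],
                        [2, 1, 4],
                        [1, 2, 2]])
item_ids, scores = get_items_rated_by_user(toy_ratings, 0)
print(item_ids, scores)
```

Passing 0 selects the rows whose stored user id is 1, and the returned item ids are shifted down by one so they can index the rows of tfidf directly.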

• Fitting a model for each user

Now we can find the Ridge Regression coefficients for each user:

from sklearn.linear_model import Ridge
from sklearn import linear_model

d = tfidf.shape[1]  # data dimension
W = np.zeros((d, n_users))
b = np.zeros((1, n_users))

# Determine the coefficients w and b for each user
for n in range(n_users):
    try:
        ids, scores = get_items_rated_by_user(rate_train, n)
        clf = Ridge(alpha=0.01, fit_intercept=True)
        Xhat = tfidf[ids, :]
        clf.fit(Xhat, scores)
        W[:, n] = clf.coef_
        b[0, n] = clf.intercept_
        # the try block continues with the prediction step listed further below

Once the coefficients W and b have been computed, the ratings for each item are predicted by:

Yhat = tfidf.dot(W) + b
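The fit-and-predict step can be illustrated on toy data without the real files. The sketch below is a stand-in under stated assumptions: rather than calling scikit-learn, it uses the closed-form ridge solution with an unpenalized intercept, which minimizes the same objective as Ridge(alpha=0.01, fit_intercept=True). The names ridge_fit, toy_features, and toy_scores are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha=0.01):
    """Closed-form ridge regression with an unpenalized intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean        # centering removes the intercept from the problem
    d = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(d), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

# Hypothetical feature vectors of 3 items rated by one user
toy_features = np.array([[1.0, 0.0],
                         [0.0, 1.0],
                         [0.6, 0.8]])
toy_scores = np.array([5.0, 1.0, 2.0])

w, b = ridge_fit(toy_features, toy_scores)
predicted = toy_features @ w + b           # one column of Yhat = tfidf.dot(W) + b
print(predicted.round(2))
```

Each user gets their own (w, b) pair, so stacking the w vectors as columns of W and the intercepts in b gives the matrix form used above.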

• Running to obtain the predictions

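        # continuation of the per-user loop body (inside the try block opened in the model-fitting step)
        np.set_printoptions(precision=2)  # 2 digits after the decimal point
        ids, scores = get_items_rated_by_user(rate_test, n)
        write_file(n, Yhat[ids, n], ids)
        print('Rated movies ids :', ids)
        print('True ratings :', scores)
        print('Predicted ratings:', Yhat[ids, n])
        print('RMSE for training:', evaluate(Yhat, rate_train, W, b))
        print('RMSE for test :', evaluate(Yhat, rate_test, W, b))
    except Exception:
        pass  # skip users for whom fitting or prediction fails (e.g. no ratings)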

Figure 3.3: Output of the recommender system

• Writing the results to a .txt file that is passed through SSIS to add recommendation data to the comparison website
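The evaluate helper called in the loop above is not listed in this section. A minimal sketch, assuming it computes the root-mean-square error over all (user, item) pairs of a rating matrix, could look like the following; the name evaluate_rmse, its two-argument signature, and the toy numbers are assumptions rather than the thesis code.

```python
import numpy as np

def evaluate_rmse(Yhat, rates):
    """RMSE between predicted and true ratings.

    Yhat  : (n_items, n_users) array of predicted ratings
    rates : (n_ratings, 3) array of 1-based [user_id, item_id, rating] rows
    """
    users = rates[:, 0].astype(int) - 1   # 0-based column index into Yhat
    items = rates[:, 1].astype(int) - 1   # 0-based row index into Yhat
    errors = rates[:, 2] - Yhat[items, users]
    return float(np.sqrt(np.mean(errors ** 2)))

# Hypothetical data: 2 items, 2 users
toy_Yhat = np.array([[4.0, 2.0],
                     [3.0, 5.0]])
toy_rates = np.array([[1, 1, 5.0],    # user 1 rated item 1 with 5 (predicted 4.0)
                      [2, 2, 4.0]])   # user 2 rated item 2 with 4 (predicted 5.0)
print(evaluate_rmse(toy_Yhat, toy_rates))
```

Both toy predictions are off by exactly one rating point, so the RMSE is 1.0.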

def write_file(user_id, data, product_ids, filename="predited_data1.txt"):
    # Append one "user_id|product_id|rating" line per prediction (ids are written 1-based)
    with open(filename, "a+", encoding="UTF-8") as f:
        for (rate, pro_id) in zip(data, product_ids):
            msg = "%d|%d|%.2f\n" % (user_id + 1, pro_id + 1, rate)
            f.write(msg)
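The pipe-delimited layout can be checked with a short round trip. The sketch below is illustrative only: it writes to a temporary file, the helper name write_predictions and the file name are hypothetical, and the sample ratings are made up.

```python
import os
import tempfile

def write_predictions(user_id, ratings, product_ids, filename):
    """Append 'user|product|rating' lines, converting 0-based ids to 1-based."""
    with open(filename, "a+", encoding="UTF-8") as f:
        for rate, pro_id in zip(ratings, product_ids):
            f.write("%d|%d|%.2f\n" % (user_id + 1, pro_id + 1, rate))

tmp_dir = tempfile.mkdtemp()
tmp_path = os.path.join(tmp_dir, "predicted_sample.txt")
write_predictions(0, [4.256, 3.1], [9, 14], tmp_path)

# Read the file back to inspect the exact lines a downstream loader would see
with open(tmp_path, encoding="UTF-8") as f:
    lines = f.read().splitlines()
print(lines)
```

Because the file is opened in append mode, repeated runs keep adding lines; a downstream import would need either a fresh file per run or de-duplication.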

Figure 3.4: File containing the results

b. Collaborative filtering

We create a class CF whose input is a rating matrix stored with three columns (the user, the product, and the rating that user gave the product). k is the number of nearest neighbors used to predict a rating in the rating-prediction formula, and dist_func is the function that measures the similarity between two vectors; by default it is cosine_similarity from sklearn.metrics.pairwise.

The class carries out each step of the theory above:
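The constructor itself is not listed in this section. A minimal sketch of what it might look like is given below, under several assumptions: the class name CFSketch is hypothetical, ids are taken to be 1-based (so the largest id equals the count), the user and item columns are swapped for item-item CF, and a small NumPy cosine function stands in for sklearn's cosine_similarity to keep the snippet self-contained.

```python
import numpy as np

def cosine_sim_rows(A, B):
    """Cosine similarity between the rows of A and the rows of B (dense arrays)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    a_norm = np.linalg.norm(A, axis=1, keepdims=True)
    b_norm = np.linalg.norm(B, axis=1, keepdims=True)
    a_norm[a_norm == 0] = 1.0   # avoid division by zero for empty rows
    b_norm[b_norm == 0] = 1.0
    return (A / a_norm) @ (B / b_norm).T

class CFSketch:
    """Hypothetical skeleton of the CF class described in the text."""
    def __init__(self, Y_data, k, dist_func=cosine_sim_rows, uuCF=1):
        self.uuCF = uuCF
        # Assumption: for item-item CF the user and item columns are swapped
        self.Y_data = Y_data if uuCF else Y_data[:, [1, 0, 2]]
        self.k = k
        self.dist_func = dist_func
        # Assumption: ids are 1-based, so the largest id equals the count
        self.n_users = int(np.max(self.Y_data[:, 0]))
        self.n_items = int(np.max(self.Y_data[:, 1]))

# Hypothetical rows of [user_id, item_id, rating]
toy_Y = np.array([[1, 1, 5], [1, 2, 3], [2, 1, 4], [3, 2, 1]])
cf = CFSketch(toy_Y, k=2)
print(cf.n_users, cf.n_items)
```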

• Computing the normalized Utility Matrix and the Similarity matrix

def normalize_Y(self):
    users = self.Y_data[:, 0]  # first column holds the (1-based) user ids
    self.Ybar_data = self.Y_data.copy()
    self.mu = np.zeros((self.n_users + 1,))
    for n in range(1, self.n_users + 1):
        # rows of the ratings given by user n
        ids = np.where(users == n)[0].astype(np.int32)
        item_ids = self.Y_data[ids, 1]
        ratings = self.Y_data[ids, 2]
        m = np.mean(ratings)
        if np.isnan(m):
            m = 0  # user n has no ratings
        self.mu[n] = m  # store the mean; without this line nothing is subtracted
        self.Ybar_data[ids, 2] = ratings - self.mu[n]
    # sparse item-by-user matrix of mean-centered ratings
    self.Ybar = sparse.coo_matrix((self.Ybar_data[:, 2],
        (self.Ybar_data[:, 1], self.Ybar_data[:, 0])))
    self.Ybar = self.Ybar.tocsr()

def similarity(self):
    self.S = self.dist_func(self.Ybar.T, self.Ybar.T)
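The effect of these two steps can be seen on a small dense example. This is a simplified sketch: it uses a dense NumPy array with NaN for missing ratings instead of the sparse matrix above, and the toy utility matrix is invented.

```python
import numpy as np

# Hypothetical utility matrix: rows are items, columns are users, NaN = not rated
Y = np.array([[5.0, 4.0, np.nan],
              [3.0, np.nan, 1.0],
              [np.nan, 2.0, 5.0]])

rated = ~np.isnan(Y)
# Mean rating of each user, computed over the items that user rated
mu = np.array([Y[rated[:, u], u].mean() for u in range(Y.shape[1])])
# Subtract each user's mean; unrated entries become 0, as in a sparse matrix
Ybar = np.where(rated, Y - mu, 0.0)

# Cosine similarity between user columns
norms = np.linalg.norm(Ybar, axis=0)
S = (Ybar.T @ Ybar) / np.outer(norms, norms)
print(mu)
print(S.round(2))
```

After centering, a positive entry means "rated above this user's average", so users who agree on what is above or below average end up with positive similarity even if their rating scales differ.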

• Predicting the result:

The __pred function predicts the rating that user u gives item i in the User-user CF case; its input arguments are the user and the product.

def __pred(self, u, i, normalized=1):
    # users who rated item i
    ids = np.where(self.Y_data[:, 1] == i)[0].astype(np.int32)
    users_rated_i = (self.Y_data[ids, 0]).astype(np.int32)
    # similarity between u and those users; keep the k most similar
    sim = self.S[u, users_rated_i]
    a = np.argsort(sim)[-self.k:]
    nearest_s = sim[a]
    r = self.Ybar[i, users_rated_i[a]]
    if normalized:
        return (r * nearest_s)[0] / (np.abs(nearest_s).sum() + 1e-8)
    return (r * nearest_s)[0] / (np.abs(nearest_s).sum() + 1e-8) + self.mu[u]

def pred(self, u, i, normalized=1):
    if self.uuCF:
        return self.__pred(u, i, normalized)
    return self.__pred(i, u, normalized)
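A worked instance of the prediction formula helps to check the arithmetic: the predicted normalized rating is the similarity-weighted average of the normalized ratings given by the k most similar users. All numbers below are hypothetical.

```python
import numpy as np

k = 2
# Similarities between user u and the users who rated item i (hypothetical)
sim = np.array([0.1, 0.8, -0.3, 0.5])
# Mean-centered ratings those users gave item i (hypothetical)
r = np.array([1.0, 0.5, -2.0, 1.5])

nearest = np.argsort(sim)[-k:]           # indices of the k largest similarities
nearest_s = sim[nearest]
pred_normalized = (r[nearest] * nearest_s).sum() / (np.abs(nearest_s).sum() + 1e-8)

mu_u = 3.2                               # user u's mean rating (hypothetical)
pred_rating = pred_normalized + mu_u     # back on the original rating scale
print(round(pred_normalized, 3), round(pred_rating, 3))
```

The small constant 1e-8 in the denominator only guards against division by zero when all selected similarities are zero; it does not noticeably change the result otherwise.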

• Finding all items that should be recommended to user u in the User-user CF case

def recommend(self, u, normalized=1):
    ids = np.where(self.Y_data[:, 0] == u)[0]
    items_rated_by_u = self.Y_data[ids, 1].tolist()
    recommended_items = []
    for i in range(1, self.n_items + 1):
        if i not in items_rated_by_u:
            rating = self.__pred(u, i)
            if rating > 0:  # predicted above the user's average
                recommended_items.append(i)
    return recommended_items

• Printing all results:

def print_recommendation(self):
    print('Recommendation: ')
    for u in range(1, self.n_users + 1):
        recommended_items = self.recommend(u)
        # write_file is called with (user, items) here, unlike the three-argument version in the Content-based part
        write_file(u, recommended_items)
        if self.uuCF:
            print(' Recommend item(s):', recommended_items, 'to user', u)
        else:
            print(' Recommend item', u, 'to user(s): ', recommended_items)

The output of Collaborative Filtering is a data file that serves as input to the Content-based model.

Website data management

Price comparison website