Collaborative filtering: Difference between revisions

[[File:Collaborative filtering.gif|300px|thumb|An example of predicting a user's rating using [[Collaborative software|collaborative]] filtering. First, people rate different items (such as videos, images, and games). The system then makes [[prediction]]s about a user's rating for an item that the user has not yet rated. These predictions are built upon the existing ratings of other users whose ratings are similar to those of the active user. In this case, the system predicts that the active user will not like the video.]]
{{Recommender systems}}
'''Collaborative filtering''' ('''CF''') is a technique used by [[recommender system]]s.<ref name="handbook">Francesco Ricci and Lior Rokach and Bracha Shapira, [http://www.inf.unibz.it/~ricci/papers/intro-rec-sys-handbook.pdf Introduction to Recommender Systems Handbook] {{Webarchive|url=https://web.archive.org/web/20160602175633/http://www.inf.unibz.it/~ricci/papers/intro-rec-sys-handbook.pdf |date=2 June 2016 }}, Recommender Systems Handbook, Springer, 2011, pp. 1–35</ref> Collaborative filtering has two senses, a narrow one and a more general one.<ref name=recommender>{{cite web|title=Beyond Recommender Systems: Helping People Help Each Other|url=http://www.grouplens.org/papers/pdf/rec-sys-overview.pdf|publisher=Addison-Wesley|access-date=16 January 2012|page=6|year=2001|last1=Terveen|first1=Loren|last2=Hill|first2=Will|author-link1=Loren Terveen}}</ref>
 
In the newer, narrower sense, collaborative filtering is a method of making automatic [[prediction]]s (filtering) about the interests of a [[End user|user]] by collecting preferences or [[taste (sociology)|taste]] information from [[crowdsourcing|many users]] (collaborating). The underlying assumption of the collaborative filtering approach is that if a person ''A'' has the same opinion as a person ''B'' on an issue, A is more likely to have B's opinion on a different issue than that of a randomly chosen person. For example, a collaborative filtering recommendation system for preferences in [[television]] programming could make predictions about which television show a user should like given a partial list of that user's tastes (likes or dislikes).<ref>[http://www.redbeemedia.com/insights/integrated-approach-tv-vod-recommendations An integrated approach to TV & VOD Recommendations] {{webarchive |url=https://web.archive.org/web/20120606225352/http://www.redbeemedia.com/insights/integrated-approach-tv-vod-recommendations |date=6 June 2012 }}</ref> These predictions are specific to the user, but use information gleaned from many users. This differs from the simpler approach of giving an [[average]] (non-specific) score for each item of interest, for example based on its number of [[vote]]s.
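
The assumption above can be illustrated with a minimal sketch of user-based prediction. It is not taken from the cited sources: the function name <code>predict_rating</code>, the toy rating matrix, the use of Pearson correlation as the similarity measure, and the neighborhood size <code>k</code> are all assumptions made for illustration.

```python
import numpy as np

def predict_rating(ratings, user, item, k=2):
    """Predict ratings[user, item] from the k most similar users who rated the item.

    ratings: 2-D float array (users x items); np.nan marks a missing rating.
    Hypothetical helper for illustration only.
    """
    target = ratings[user]
    candidates = []
    for other in range(ratings.shape[0]):
        if other == user or np.isnan(ratings[other, item]):
            continue
        # Compare the two users only on items that both have rated.
        common = ~np.isnan(target) & ~np.isnan(ratings[other])
        if common.sum() < 2:
            continue
        a, b = target[common], ratings[other][common]
        if a.std() == 0 or b.std() == 0:
            continue  # Pearson correlation is undefined for constant vectors
        sim = np.corrcoef(a, b)[0, 1]
        if sim > 0:  # keep only like-minded users
            candidates.append((sim, ratings[other, item]))
    if not candidates:
        # Fall back to the non-specific item average.
        return float(np.nanmean(ratings[:, item]))
    top = sorted(candidates, reverse=True)[:k]
    sims = np.array([s for s, _ in top])
    vals = np.array([v for _, v in top])
    # Similarity-weighted average of the neighbours' ratings.
    return float(sims @ vals / sims.sum())

# Toy data (assumed): rows are users, columns are items, nan = not rated.
ratings = np.array([
    [5.0, 4.0, 1.0, np.nan],  # active user, item 3 unknown
    [5.0, 5.0, 1.0, 1.0],     # similar taste, dislikes item 3
    [1.0, 2.0, 5.0, 5.0],     # opposite taste, likes item 3
    [4.0, 4.0, 2.0, 2.0],     # similar taste, dislikes item 3
])
predicted = predict_rating(ratings, user=0, item=3)
item_average = float(np.nanmean(ratings[:, 3]))
```

In this toy example the user-specific prediction falls below the plain item average, because only the like-minded users contribute to it.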
 
===Hybrid===
A number of applications combine memory-based and model-based CF algorithms. These hybrids overcome the limitations of native CF approaches and improve prediction performance. Importantly, they overcome CF problems such as sparsity and loss of information. However, they are more complex and expensive to implement.<ref>{{cite journal | doi=10.1016/j.ins.2012.04.012 | volume=208 | title=Kernel-Mapping Recommender system algorithms | journal=Information Sciences | pages=81–104| year=2012 | last1=Ghazanfar | first1=Mustansar Ali | last2=Prügel-Bennett | first2=Adam | last3=Szedmak | first3=Sandor | citeseerx=10.1.1.701.7729 | s2cid=20328670 }}</ref> Most commercial recommender systems are hybrid; one example is the Google News recommender system.<ref>{{cite book|chapter=Google news personalization|doi=10.1145/1242572.1242610|title=Proceedings of the 16th international conference on World Wide Web – WWW '07|pages=271|year=2007|last1=Das|first1=Abhinandan S.|last2=Datar|first2=Mayur|last3=Garg|first3=Ashutosh|last4=Rajaram|first4=Shyam|isbn=9781595936547|s2cid=207163129}}</ref>
 
===Deep-learning===
In recent years, a number of neural and deep-learning techniques have been proposed. Some generalize traditional [[Matrix factorization (recommender systems)|matrix factorization]] algorithms via a non-linear neural architecture,<ref>{{cite book |last1=He |first1=Xiangnan |last2=Liao |first2=Lizi |last3=Zhang |first3=Hanwang |last4=Nie |first4=Liqiang |last5=Hu |first5=Xia |last6=Chua |first6=Tat-Seng |title=Proceedings of the 26th International Conference on World Wide Web |chapter=Neural Collaborative Filtering |date=2017 |pages=173–182 |doi=10.1145/3038912.3052569 |chapter-url=https://dl.acm.org/citation.cfm?id=3052569 |access-date=16 October 2019 |publisher=International World Wide Web Conferences Steering Committee|isbn=9781450349130 |arxiv=1708.05031 |s2cid=13907106 }}</ref> or leverage new model types like Variational [[Autoencoder]]s.<ref>{{cite book |last1=Liang |first1=Dawen |last2=Krishnan |first2=Rahul G. |last3=Hoffman |first3=Matthew D. |last4=Jebara |first4=Tony |title=Proceedings of the 2018 World Wide Web Conference on World Wide Web – WWW '18 |chapter=Variational Autoencoders for Collaborative Filtering |date=2018 |pages=689–698 |doi=10.1145/3178876.3186150 |url=https://dl.acm.org/citation.cfm?id=3186150 |publisher=International World Wide Web Conferences Steering Committee|arxiv=1802.05814 |isbn=9781450356398 |doi-access=free }}</ref>
While deep learning has been applied to many different scenarios (context-aware, sequence-aware, social tagging, etc.), its real effectiveness when used in a simple collaborative recommendation scenario has been called into question. A systematic analysis of publications applying deep learning or neural methods to the top-k recommendation problem, published in top conferences (SIGIR, KDD, WWW, RecSys), showed that on average less than 40% of articles are reproducible, with as little as 14% in some conferences. Overall, the study identified 18 articles; only 7 of them could be reproduced, and 6 of those could be outperformed by much older and simpler, properly tuned baselines. The article also highlights a number of potential problems in today's research scholarship and calls for improved scientific practices in that area.<ref>{{cite book |last1=Ferrari Dacrema |first1=Maurizio |last2=Cremonesi |first2=Paolo |last3=Jannach |first3=Dietmar |title=Proceedings of the 13th ACM Conference on Recommender Systems |chapter=Are we really making much progress? A worrying analysis of recent neural recommendation approaches |date=2019 |pages=101–109 |doi=10.1145/3298689.3347058 |hdl=11311/1108996 |chapter-url=https://dl.acm.org/authorize?N684126 |access-date=16 October 2019 |publisher=ACM|arxiv=1907.06902 |isbn=9781450362436 |s2cid=196831663 }}</ref> Similar issues have been reported by others<ref>{{cite journal |last1=Anelli |first1=Vito Walter |last2=Bellogin |first2=Alejandro |last3=Di Noia |first3=Tommaso |last4=Jannach |first4=Dietmar |last5=Pomo |first5=Claudio |title=Top-N Recommendation Algorithms: A Quest for the State-of-the-Art |journal=Proceedings of the 30th ACM Conference on User Modeling, Adaptation and Personalization |date=2022 |pages=121–131 |doi=10.1145/3503252.3531292 |url=https://doi.org/10.1145/3503252.3531292 |access-date=1 March 2022 |publisher=ACM |arxiv=2203.01155 |isbn=9781450392075 |s2cid=247218662 }}</ref> and also in sequence-aware recommender systems.<ref>{{cite book |last1=Ludewig |first1=Malte |last2=Mauro |first2=Noemi |last3=Latifi |first3=Sara |last4=Jannach |first4=Dietmar |title=Proceedings of the 13th ACM Conference on Recommender Systems |chapter=Performance comparison of neural and non-neural approaches to session-based recommendation |date=2019 |pages=462–466 |doi=10.1145/3298689.3347041 |url=https://dl.acm.org/citation.cfm?id=3347041 |access-date=16 October 2019 |publisher=ACM|isbn=9781450362436 |doi-access=free }}</ref>
 
== Context-aware collaborative filtering ==
Many recommender systems simply ignore the contextual information that exists alongside a user's rating when providing item recommendations.<ref>{{Cite book|title=Recommender Systems Handbook|last1=Adomavicius|first1=Gediminas|last2=Tuzhilin|first2=Alexander|author2-link= Alexander Tuzhilin |date=2015-01-01|publisher=Springer US|isbn=9781489976369|editor-last=Ricci|editor-first=Francesco|pages=191–226|language=en|doi=10.1007/978-1-4899-7637-6_6|url=https://repositorio.unal.edu.co/handle/unal/79598|editor-last2=Rokach|editor-first2=Lior|editor-last3=Shapira|editor-first3=Bracha}}</ref> However, with the pervasive availability of contextual information such as time, location, social information, and the type of device the user is using, it is becoming more important than ever for a successful recommender system to provide context-sensitive recommendations. According to Charu Aggarwal, "Context-sensitive recommender systems tailor their recommendations to additional information that defines the specific situation under which recommendations are made. This additional information is referred to as the context."<ref name=":0" />
 
Taking contextual information into consideration adds a dimension to the existing user-item rating matrix. For instance, assume a music recommender system that provides different recommendations depending on the time of day. In this case, a user may have different preferences for a piece of music at different times of day. Thus, instead of a user-item matrix, one may use a [[tensor]] of order 3 (or higher, to account for other contexts) to represent context-sensitive user preferences.<ref>{{cite journal|last1=Bi|first1=Xuan|last2=Qu|first2=Annie|last3=Shen|first3=Xiaotong|year=2018|title=Multilayer tensor factorization with applications to recommender systems.|url=https://projecteuclid.org/euclid.aos/1536631275|journal=Annals of Statistics|volume=46|issue=6B|pages=3303–3333|doi=10.1214/17-AOS1659|arxiv=1711.01598|s2cid=13677707}}</ref><ref>{{cite arXiv|last1=Zhang|first1=Yanqing|last2=Bi|first2=Xuan|last3=Tang|first3=Niansheng|last4=Qu|first4=Annie|year=2020|title=Dynamic tensor recommender systems.|eprint=2003.05568v1|class=stat.ME}}</ref><ref>{{cite journal|last1=Bi|first1=Xuan|last2=Tang|first2=Xiwei|last3=Yuan|first3=Yubai|last4=Zhang|first4=Yanqing|last5=Qu|first5=Annie|year=2021|title=Tensors in Statistics.|url=https://www.annualreviews.org/doi/abs/10.1146/annurev-statistics-042720-020816#:~:text=Abstract,are%20useful%20data%20representation%20architectures.|journal=[[Annual Review of Statistics and Its Application]]|volume=8|issue=1|pages=annurev|doi=10.1146/annurev-statistics-042720-020816|bibcode=2021AnRSA...842720B|s2cid=224956567|doi-access=free}}</ref>
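
As a rough illustration of the order-3 representation described above, the following sketch stores ratings in a user × item × context array. The sizes, the two time-of-day contexts, and the observations are made-up assumptions, not data from the cited works.

```python
import numpy as np

# Assumed toy sizes; context 0 = morning, context 1 = evening.
n_users, n_items, n_contexts = 3, 4, 2

# Order-3 tensor of ratings; np.nan marks "not rated in this context".
ratings = np.full((n_users, n_items, n_contexts), np.nan)

# Hypothetical observations: (user, item, context, rating).
observations = [(0, 1, 0, 5.0), (0, 1, 1, 2.0), (2, 3, 1, 4.0)]
for u, i, c, r in observations:
    ratings[u, i, c] = r

# The same user can rate the same item differently in different contexts.
morning, evening = ratings[0, 1, 0], ratings[0, 1, 1]

# Averaging over the context axis collapses the tensor back to an ordinary
# user-item matrix, which discards the context-specific preference.
counts = (~np.isnan(ratings)).sum(axis=2)
sums = np.nansum(ratings, axis=2)
user_item = np.where(counts > 0, sums / np.maximum(counts, 1), np.nan)
```

Here user 0 rates item 1 highly in the morning and poorly in the evening, while the collapsed matrix only retains the average of the two.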
 
To take advantage of collaborative filtering, and particularly neighborhood-based methods, approaches can be extended from a two-dimensional rating matrix to a tensor of higher order{{Citation needed|reason=unsupported claim, sounds like personal thoughts|date=May 2017}}. The approach is to find the users most similar to (like-minded with) a target user: one can extract the slice (e.g. the item-time matrix) corresponding to each user and compute the similarity between slices. Unlike the context-insensitive case, in which the similarity of two rating vectors is calculated, in [[context awareness|context-aware]] approaches the similarity of the rating matrices corresponding to each user is calculated using [[Pearson correlation coefficient|Pearson coefficients]].<ref name=":0" /> After the most like-minded users are found, their corresponding ratings are aggregated to identify the set of items to be recommended to the target user.
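
The neighborhood-based procedure described above might be sketched as follows. This is one possible reading rather than the method of the cited source: the helper names <code>slice_similarity</code> and <code>recommend</code>, the restriction to co-rated cells, the positive-similarity cutoff, and the similarity-weighted average used for aggregation are all assumptions.

```python
import numpy as np

def slice_similarity(a, b):
    """Pearson correlation of two item x context slices over co-rated cells."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if mask.sum() < 2:
        return 0.0
    x, y = a[mask], b[mask]
    if x.std() == 0 or y.std() == 0:
        return 0.0  # correlation undefined for constant ratings
    return float(np.corrcoef(x, y)[0, 1])

def recommend(tensor, user, context, k=1, n=1):
    """Return up to n (item, predicted rating) pairs for user in a context.

    tensor: float array (users x items x contexts); np.nan = not rated.
    """
    sims = [(slice_similarity(tensor[user], tensor[v]), v)
            for v in range(tensor.shape[0]) if v != user]
    # Keep the k most similar users, dropping non-positive similarities.
    neighbours = [(s, v) for s, v in sorted(sims, reverse=True)[:k] if s > 0]
    scores = {}
    for item in range(tensor.shape[1]):
        if not np.isnan(tensor[user, item, context]):
            continue  # already rated in this context
        num = den = 0.0
        for s, v in neighbours:
            r = tensor[v, item, context]
            if not np.isnan(r):
                num += s * r
                den += s
        if den > 0:
            scores[item] = num / den  # similarity-weighted average
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

nan = np.nan
# Toy tensor (assumed): 3 users x 3 items x 2 contexts (morning, evening).
tensor = np.array([
    [[5.0, 1.0], [4.0, 2.0], [nan, nan]],  # target user, item 2 unrated
    [[5.0, 1.0], [5.0, 2.0], [1.0, 5.0]],  # like-minded user
    [[1.0, 5.0], [2.0, 4.0], [5.0, 1.0]],  # opposite preferences
])
evening_recs = recommend(tensor, user=0, context=1)
morning_recs = recommend(tensor, user=0, context=0)
```

With this toy data the predicted rating for the unrated item differs between the morning and evening contexts, because the single like-minded neighbour rated it differently in each.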
|title=Recommender Systems Handbook|date=2015|publisher=Springer US|isbn=978-1-4899-7637-6|edition=2
|chapter=Novelty and Diversity in Recommender Systems|chapter-url = https://link.springer.com/chapter/10.1007/978-1-4899-7637-6_26|doi=10.1007/978-1-4899-7637-6_26|pages=881–918
}}</ref> by recommending novel,<ref>{{Cite arXiv |last1=Choi |first1=Jeongwhan |last2=Hong |first2=Seoyong |last3=Park |first3=Noseong |last4=Cho |first4=Sung-Bae|title=Blurring-Sharpening Process Models for Collaborative Filtering |year=2022 |class=cs.IR |eprint=2211.09324}}</ref> unexpected,<ref>{{cite journal| last1= Adamopoulos | first1= Panagiotis | first2= Alexander |last2= Tuzhilin | title=On Unexpectedness in Recommender Systems: Or How to Better Expect the Unexpected|journal=ACM Transactions on Intelligent Systems and Technology | volume= 5 | issue= 4 | pages= 1–32 |date=January 2015| doi = 10.1145/2559952| s2cid= 15282396 }}</ref> and serendipitous items.<ref>{{cite book| last1= Adamopoulos | first1= Panagiotis | title=Proceedings of the 7th ACM conference on Recommender systems | chapter=Beyond rating prediction accuracy | pages= 459–462 |date=October 2013| doi = 10.1145/2507157.2508073| isbn= 9781450324090 | s2cid= 1526264 }}</ref>
 
==Innovations==
==Auxiliary information==
 
The user-item matrix is the basic foundation of traditional collaborative filtering techniques, and it suffers from the data sparsity problem (i.e. [[Cold start (computing)|cold start]]). As a consequence, researchers are trying to gather auxiliary information beyond the user-item matrix to help boost recommendation performance and develop personalized recommender systems.<ref>{{Cite journal|last1=Shi|first1=Yue|last2=Larson|first2=Martha|last3=Hanjalic|first3=Alan|year=2014|title=Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges|journal=ACM Computing Surveys |volume=47|pages=1–45|doi=10.1145/2556270|s2cid=5493334}}</ref> Generally, there are two popular kinds of auxiliary information: attribute information and interaction information. Attribute information describes a user's or an item's properties. For example, user attributes might include a general profile (e.g. gender and age) and social contacts (e.g. followers or friends in [[social networks]]); item attributes are properties such as category, brand or content. Interaction information, in turn, refers to implicit data showing how users interact with items. Widely used interaction information includes tags, comments or reviews, and browsing history. Auxiliary information plays a significant role in a variety of aspects. Explicit social links, as a reliable indicator of trust or friendship, are commonly employed in similarity calculation to find similar persons who share interests with the target user.<ref>{{cite book|last1=Massa|first1=Paolo|last2=Avesani|first2=Paolo|title=Computing with social trust|date=2009|publisher=Springer|location=London|pages=259–285}}</ref><ref>{{cite conference|author1=Groh Georg|author2=Ehmig Christian|title=Recommendations in taste related domains: collaborative filtering vs. social filtering|conference=Proceedings of the 2007 international ACM conference on Supporting group work|pages=127–136|citeseerx=10.1.1.165.3679}}</ref> Interaction-associated information such as tags is taken as a third dimension (in addition to user and item) in advanced collaborative filtering to construct a 3-dimensional tensor structure for exploring recommendations.<ref>{{Cite book|last1=Symeonidis|first1=Panagiotis|last2=Nanopoulos|first2=Alexandros|last3=Manolopoulos|first3=Yannis|title=Proceedings of the 2008 ACM conference on Recommender systems |chapter=Tag recommendations based on tensor dimensionality reduction |year=2008|pages=43–50|doi=10.1145/1454008.1454017|citeseerx=10.1.1.217.1437|isbn=9781605580937|s2cid=17911131}}</ref>
 
==See also==
*[http://www.cs.utexas.edu/users/ml/papers/cbcf-aaai-02.pdf Content-Boosted Collaborative Filtering for Improved Recommendations.] Prem Melville, [[Raymond J. Mooney]], and Ramadass Nagarajan. Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI-2002), pp.&nbsp;187–192, Edmonton, Canada, July 2002.
*[http://agents.media.mit.edu/projects.html A collection of past and present "information filtering" projects (including collaborative filtering) at MIT Media Lab]
*[http://www.ieor.berkeley.edu/~goldberg/pubs/eigentaste.pdf Eigentaste: A Constant Time Collaborative Filtering Algorithm. Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. Information Retrieval, 4(2), 133–151. July 2001.]
*[http://downloads.hindawi.com/journals/aai/2009/421425.pdf A Survey of Collaborative Filtering Techniques] Su, Xiaoyuan and Khoshgoftaar, Taghi M.
*[http://dl.acm.org/citation.cfm?id=1242610 Google News Personalization: Scalable Online Collaborative Filtering] Abhinandan Das, Mayur Datar, Ashutosh Garg, and Shyam Rajaram. International World Wide Web Conference, Proceedings of the 16th international conference on World Wide Web
*[https://web.archive.org/web/20101023032716/http://research.yahoo.com/pub/2435 Factor in the Neighbors: Scalable and Accurate Collaborative Filtering] Yehuda Koren, Transactions on Knowledge Discovery from Data (TKDD) (2009)
*[https://web.archive.org/web/20110224164938/http://webpages.uncc.edu/~asaric/ISMIS09.pdf Rating Prediction Using Collaborative Filtering]
*[http://www.cis.upenn.edu/~ungar/CF/ Recommender Systems] {{Webarchive|url=https://web.archive.org/web/20130211033515/http://www.cis.upenn.edu/~ungar/CF/ |date=11 February 2013 }}
*[http://www2.sims.berkeley.edu/resources/collab/ Berkeley Collaborative Filtering]