{{short description|Algorithm}}
{{Use dmy dates|date=April 2023}}
[[File:Collaborative filtering.gif|300px|thumb|This image shows an example of predicting a user's rating using [[Collaborative software|collaborative]] filtering. First, people rate different items (like videos, images, games). Then, the system makes [[prediction]]s about a user's rating for an item, which the user
{{Recommender systems}}
'''Collaborative filtering''' ('''CF''') is a technique used by [[recommender system]]s.<ref name="handbook">Francesco Ricci and Lior Rokach and Bracha Shapira, [http://www.inf.unibz.it/~ricci/papers/intro-rec-sys-handbook.pdf Introduction to Recommender Systems Handbook] {{Webarchive|url=https://web.archive.org/web/20160602175633/http://www.inf.unibz.it/~ricci/papers/intro-rec-sys-handbook.pdf |date=2 June 2016 }}, Recommender Systems Handbook, Springer, 2011, pp.
In the newer, narrower sense, collaborative filtering is a method of making automatic [[prediction]]s (filtering) about the interests of a [[End user|user]] by collecting preferences or [[taste (sociology)|taste]] information from [[crowdsourcing|many users]] (collaborating). The underlying assumption of the collaborative filtering approach is that if a person ''A'' has the same opinion as a person ''B'' on an issue, A is more likely to have B's opinion on a different issue than that of a randomly chosen person. For example, a collaborative filtering recommendation system for preferences in [[television]] programming could make predictions about which television show a user should like given a partial list of that user's tastes (likes or dislikes).<ref>[http://www.redbeemedia.com/insights/integrated-approach-tv-vod-recommendations An integrated approach to TV & VOD Recommendations] {{webarchive |url=https://web.archive.org/web/20120606225352/http://www.redbeemedia.com/insights/integrated-approach-tv-vod-recommendations |date=6 June 2012 }}</ref>
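The assumption above can be illustrated with a minimal user-based sketch in Python. It assumes ratings are stored in a hypothetical dictionary of dictionaries (user to item to rating), measures agreement between two users with the Pearson correlation over the items both have rated, and predicts a missing rating from the mean-centred ratings of the most similar users. The names `pearson` and `predict` and the toy data are illustrative and do not come from any particular library.

```python
import math
from typing import Dict, Optional

# Hypothetical toy format: user -> {item: rating}
Ratings = Dict[str, Dict[str, float]]


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation between two users over the items both have rated."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0  # too little overlap to measure agreement
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = math.sqrt(sum((a[i] - ma) ** 2 for i in common)
                    * sum((b[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0


def predict(ratings: Ratings, user: str, item: str, k: int = 2) -> Optional[float]:
    """Predict `user`'s rating of `item` from the k most similar users who rated it.

    Each neighbour contributes its deviation from its own mean rating, weighted
    by similarity; only positively correlated neighbours are used.
    """
    target = ratings[user]
    user_mean = sum(target.values()) / len(target)
    candidates = [(pearson(target, r), r) for v, r in ratings.items()
                  if v != user and item in r]
    neighbours = sorted((c for c in candidates if c[0] > 0),
                        key=lambda c: c[0], reverse=True)[:k]
    if not neighbours:
        return None  # no similar user has rated the item
    num = sum(s * (r[item] - sum(r.values()) / len(r)) for s, r in neighbours)
    den = sum(s for s, _ in neighbours)
    return user_mean + num / den
```

For example, if a user "bob" rated items a, b and c exactly as "alice" did and also rated item d with 5, the predicted rating for Alice on d is her mean (4) plus Bob's deviation on d from his own mean (5 - 4.25), i.e. 4.75. Users whose ratings are negatively correlated with the target are ignored in this sketch.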
In the more general sense, collaborative filtering is the process of filtering for information or patterns using techniques involving collaboration among multiple agents, viewpoints, data sources, etc.<ref name="recommender" /> Applications of collaborative filtering typically involve very large data sets. Collaborative filtering methods have been applied to many different kinds of data, including sensing and monitoring data, such as in mineral exploration or environmental sensing over large areas or with multiple sensors; financial data, such as at financial service institutions that integrate many financial sources; and user data in electronic commerce and web applications. The remainder of this discussion focuses on collaborative filtering for user data, although some of the methods and approaches may apply to the other major applications as well.
==Overview==
The [[internet growth|growth]] of the [[Internet]] has made it much more difficult to effectively [[information extraction|extract useful information]] from all the available [[online information]].{{according to whom
The motivation for collaborative filtering comes from the idea that people often get the best recommendations from someone with tastes similar to themselves.{{citation needed|date=March 2021}} Collaborative filtering encompasses techniques for matching people with similar interests and making [[recommender system|recommendations]] on this basis.
In this approach, models are developed using [[data mining]] and [[machine learning]] algorithms to predict users' ratings of unrated items. There are many model-based CF algorithms, including [[Bayesian networks]], [[Cluster Analysis|clustering models]], [[Latent Semantic Indexing|latent semantic models]] such as [[singular value decomposition]], [[probabilistic latent semantic analysis]], multiple multiplicative factor, [[latent Dirichlet allocation]] and [[Markov decision process]]-based models.<ref name="Suetal2009">Xiaoyuan Su, Taghi M. Khoshgoftaar, [http://www.hindawi.com/journals/aai/2009/421425/ A survey of collaborative filtering techniques], Advances in Artificial Intelligence archive, 2009.</ref>
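As one illustration of a model-based method, the sketch below uses latent factor matrix factorization, which is not one of the specific algorithms listed above. It learns a k-dimensional vector per user and per item by stochastic gradient descent so that the global mean plus their dot product approximates each observed rating; the learned factors then predict the unrated cells. The function names and the hyperparameter values (`k`, `lr`, `reg`, `epochs`) are arbitrary assumptions chosen for a toy example.

```python
import random
from typing import Iterable, List, Tuple

Matrix = List[List[float]]


def dot(p: List[float], q: List[float]) -> float:
    return sum(x * y for x, y in zip(p, q))


def train_mf(triples: Iterable[Tuple[int, int, float]], n_users: int, n_items: int,
             k: int = 2, lr: float = 0.05, reg: float = 0.02, epochs: int = 200,
             seed: int = 0) -> Tuple[float, Matrix, Matrix]:
    """Fit r_ui ~ mu + P[u] . Q[i] to the observed ratings by SGD.

    triples holds (user_index, item_index, rating) for observed cells only.
    Returns the global mean mu and the user/item factor matrices P and Q.
    """
    data = list(triples)
    mu = sum(r for _, _, r in data) / len(data)
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - (mu + dot(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularisation.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q


def predict_mf(mu: float, P: Matrix, Q: Matrix, u: int, i: int) -> float:
    """Predicted rating for any (user, item) cell, including unrated ones."""
    return mu + dot(P[u], Q[i])
```

After training on the observed cells of a small matrix, `predict_mf` can be called for cells that were never rated; the learned user and item factors are themselves a low-dimensional representation of the rating matrix.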
In this approach, [[dimensionality reduction]] methods are mostly used as a complementary technique to improve the robustness and accuracy of the memory-based approach. In this sense, methods such as [[singular value decomposition]] and [[principal component analysis]], known as latent factor models, compress the user-item matrix into a low-dimensional representation in terms of latent factors. One advantage of this approach is that, instead of a high-dimensional matrix with a large number of missing values, one works with a much smaller matrix in a lower-dimensional space. The reduced representation can be used by either the user-based or the item-based neighborhood algorithms presented in the previous section. This paradigm has several advantages: it handles the [[Sparse matrix|sparsity]] of the original matrix better than memory-based methods, and comparing similarity on the resulting matrix is much more scalable, especially for large sparse datasets.<ref name=":0">{{Cite book|url=https://www.springer.com/us/book/9783319296579|title=Recommender Systems
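A sketch of the dimensionality-reduction step described above, written in plain Python so that it needs no linear-algebra library. It assumes missing ratings are marked with `None` and that every user has rated at least one item; each user's ratings are centred on that user's mean, and the top-k right singular vectors are obtained by power iteration with deflation on the item-item matrix X^T X rather than by calling an SVD routine. The resulting low-dimensional item vectors can then feed an item-based neighborhood method, here through cosine similarity. The function names are hypothetical.

```python
import math
import random
from typing import List, Optional


def latent_item_vectors(R: List[List[Optional[float]]], k: int = 2,
                        iters: int = 200, seed: int = 0) -> List[List[float]]:
    """Compress a user-item matrix into k latent factors per item.

    Ratings are centred on each user's mean (missing cells become 0). The
    top-k eigenvectors of C = X^T X are the right singular vectors of X, so
    item i's latent vector is (sigma_1 * v_1[i], ..., sigma_k * v_k[i]),
    as in a rank-k truncated SVD. Assumes each user rated at least one item.
    """
    n = len(R[0])
    X = []
    for row in R:
        rated = [r for r in row if r is not None]
        mean = sum(rated) / len(rated)
        X.append([r - mean if r is not None else 0.0 for r in row])
    C = [[sum(x[i] * x[j] for x in X) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    factors = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random start vector
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in C
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(n) for j in range(n))
        factors.append((math.sqrt(max(lam, 0.0)), v))
        # Deflate so the next pass converges to the next-largest eigenvalue.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return [[sigma * v[i] for sigma, v in factors] for i in range(n)]


def cosine(a: List[float], b: List[float]) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0
```

On a small toy matrix, two items that the same users like and dislike end up with a higher latent cosine similarity than two items rated in opposite ways, even though the comparison uses only two numbers per item.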
===Hybrid===
A number of applications combine the memory-based and the model-based CF algorithms. These hybrid approaches overcome the limitations of native CF approaches and improve prediction performance. Importantly, they overcome CF problems such as sparsity and loss of information. However, they have increased complexity and are expensive to implement.<ref>{{cite journal | doi=10.1016/j.ins.2012.04.012 | volume=208 | title=Kernel-Mapping Recommender system algorithms | journal=Information Sciences | pages=81–104| year=2012 | last1=Ghazanfar | first1=Mustansar Ali | last2=Prügel-Bennett | first2=Adam | last3=Szedmak | first3=Sandor | citeseerx=10.1.1.701.7729 | s2cid=20328670 }}
</ref> Most commercial recommender systems are hybrid, for example, the Google News recommender system.<ref>{{cite book|chapter=Google news personalization|doi=10.1145/1242572.1242610|title=Proceedings of the 16th international conference on World Wide Web
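One simple way to combine the two families, a "weighted" hybrid, is sketched below. It assumes that a memory-based and a model-based predictor are already available as functions returning a rating, or `None` when they cannot answer. The blending weight `alpha` is a hypothetical parameter that would normally be tuned on held-out data; falling back to the other component when one has no answer is one way a hybrid can soften the sparsity problem.

```python
from typing import Callable, Optional

# A predictor maps (user, item) to a rating, or None when it cannot answer.
Predictor = Callable[[str, str], Optional[float]]


def weighted_hybrid(memory: Predictor, model: Predictor, alpha: float = 0.5) -> Predictor:
    """Combine a memory-based and a model-based predictor into one.

    alpha weights the memory-based prediction; (1 - alpha) weights the model.
    If one component returns None (e.g. no co-rating neighbours because the
    data are sparse), the other component's prediction is used on its own.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    def predict(user: str, item: str) -> Optional[float]:
        m, f = memory(user, item), model(user, item)
        if m is None:
            return f
        if f is None:
            return m
        return alpha * m + (1 - alpha) * f

    return predict
```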
===Deep-learning===
In recent years a number of neural and deep-learning techniques have been proposed. Some generalize traditional [[Matrix factorization (recommender systems)|
While deep learning has been applied to many different scenarios (context-aware, sequence-aware, social tagging, etc.), its real effectiveness in a simple collaborative recommendation scenario has been questioned. A systematic analysis of publications applying deep learning or neural methods to the top-k recommendation problem, published in top conferences (SIGIR, KDD, WWW, RecSys), has shown that on average fewer than 40% of the articles are reproducible, with as few as 14% in some conferences. Overall, the study identified 18 articles, of which only 7 could be reproduced, and 6 of those 7 could be outperformed by much older and simpler, properly tuned baselines. The article also highlights a number of potential problems in today's research scholarship and calls for improved scientific practices in that area.<ref>{{cite
|access-date=16 October 2019 |publisher=ACM|arxiv=1907.06902 |isbn=9781450362436 |s2cid=196831663 }}</ref> Similar issues have been spotted
|access-date=1 March 2022 |publisher=ACM |arxiv=2203.01155 |isbn=9781450392075 |s2cid=247218662 }}</ref> and also in sequence-aware recommender systems.<ref>{{cite book |last1=Ludewig |first1=Malte |last2=Mauro |first2=Noemi |last3=Latifi |first3=Sara |last4=Jannach |first4=Dietmar |title=Proceedings of the 13th ACM Conference on Recommender Systems |chapter=Performance comparison of neural and non-neural approaches to session-based recommendation |date=2019 |pages=462–466 |doi=10.1145/3298689.3347041 |publisher=ACM|isbn=9781450362436 |doi-access=free }}</ref>
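To show what a simple baseline in such top-k comparisons can look like, the sketch below implements a basic item-based k-nearest-neighbour recommender over implicit feedback (cosine similarity between item columns of a binary user-item matrix) together with a hit-rate@k metric that holds out one item per user. It is an illustrative assumption and does not reproduce the exact baselines, tuning or evaluation protocol of the cited studies; all function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set


def item_knn_scores(interactions: Dict[str, Set[str]], user: str,
                    n_neighbours: int = 20) -> Dict[str, float]:
    """Score the items `user` has not interacted with, using item-based kNN.

    Similarity is the cosine between item columns of the binary user-item
    matrix; a candidate's score is the sum of its n_neighbours largest
    similarities to items the user already interacted with.
    """
    grouped = defaultdict(set)
    for u, items in interactions.items():
        for i in items:
            grouped[i].add(u)
    users_of = dict(grouped)

    def sim(i: str, j: str) -> float:
        common = len(users_of[i] & users_of[j])
        return common / math.sqrt(len(users_of[i]) * len(users_of[j])) if common else 0.0

    seen = interactions[user]
    scores = {}
    for cand in users_of:
        if cand not in seen:
            sims = sorted((sim(cand, j) for j in seen), reverse=True)[:n_neighbours]
            scores[cand] = sum(sims)
    return scores


def top_k_items(scores: Dict[str, float], k: int) -> List[str]:
    # Highest score first; ties are broken by item id for a deterministic order.
    return [i for i, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def hit_rate_at_k(train: Dict[str, Set[str]], held_out: Dict[str, str], k: int = 10) -> float:
    """Fraction of users whose single held-out item appears in their top-k list."""
    hits = sum(held_out[u] in top_k_items(item_knn_scores(train, u), k) for u in held_out)
    return hits / len(held_out)
```

In practice, the neighbourhood size and any similarity shrinkage would be tuned on validation data; the point of the cited studies is that baselines of this kind can be competitive when they are properly tuned.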
== Context-aware collaborative filtering ==
Many recommender systems simply ignore other contextual information that exists alongside a user's rating when providing item recommendations.<ref>{{Cite book|title=Recommender Systems Handbook|last1=Adomavicius|first1=Gediminas|last2=Tuzhilin|first2=Alexander|author2-link= Alexander Tuzhilin |date=2015-01-01|publisher=Springer US|isbn=9781489976369|editor-last=Ricci|editor-first=Francesco|pages=191–226|language=en|doi=10.1007/978-1-4899-7637-6_6
Taking contextual information into consideration adds a dimension to the existing user-item rating matrix. For instance, assume a music recommender system which
In order to take advantage of collaborative filtering, and particularly of neighborhood-based methods, approaches can be extended from a two-dimensional rating matrix into a tensor of higher order{{Citation needed|reason=unsupported claim, sounds like personal thoughts|date=May 2017}}. To find the users most similar to (most like-minded with) a target user, one can extract the slice of the tensor corresponding to each user (e.g. an item-time matrix) and compute the similarity between slices. Unlike the context-insensitive case, in which the similarity of two rating vectors is calculated, in [[context awareness|context-aware]] approaches the similarity of the rating matrices corresponding to each user is calculated by using [[Pearson correlation coefficient|Pearson coefficients]].<ref name=":0" /> After the most like-minded users are found, their corresponding ratings are aggregated to identify the set of items to be recommended to the target user.
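The procedure can be sketched as follows, assuming a hypothetical rating tensor indexed as `tensor[user][item][context]` with `None` for missing entries. Each user's slice is an item-context matrix; the Pearson correlation is computed over the cells that two users have both filled in, and the ratings of the most correlated users in the requested context are averaged, weighted by similarity, to score the target user's unrated items. The function names and the choice of a similarity-weighted average as the aggregation step are assumptions.

```python
import math
from typing import List, Optional

# Hypothetical layout: tensor[user][item][context] -> rating, or None if missing.
Slice = List[List[Optional[float]]]


def pearson_slices(A: Slice, B: Slice) -> float:
    """Pearson correlation between two users' item x context rating matrices,
    computed over the cells that both users have filled in."""
    pairs = [(a, b) for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b)
             if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    ma = sum(a for a, _ in pairs) / len(pairs)
    mb = sum(b for _, b in pairs) / len(pairs)
    num = sum((a - ma) * (b - mb) for a, b in pairs)
    den = math.sqrt(sum((a - ma) ** 2 for a, _ in pairs)
                    * sum((b - mb) ** 2 for _, b in pairs))
    return num / den if den else 0.0


def recommend_in_context(tensor: List[Slice], target: int, context: int,
                         n_neighbours: int = 2, n_items: int = 1) -> List[int]:
    """Rank the target's unrated items in `context` by the similarity-weighted
    average rating of the most like-minded (positively correlated) users."""
    sims = sorted(((pearson_slices(tensor[target], tensor[u]), u)
                   for u in range(len(tensor)) if u != target), reverse=True)
    neighbours = [(s, u) for s, u in sims[:n_neighbours] if s > 0]
    scores = {}
    for item, row in enumerate(tensor[target]):
        if row[context] is not None:
            continue  # already rated in this context
        votes = [(s, tensor[u][item][context]) for s, u in neighbours
                 if tensor[u][item][context] is not None]
        if votes:
            scores[item] = sum(s * r for s, r in votes) / sum(s for s, _ in votes)
    return sorted(scores, key=lambda i: (-scores[i], i))[:n_items]
```

Comparing whole slices means two users only count as similar if they rate items alike in the same contexts, for example both preferring upbeat music on weekends, which a single rating vector per user cannot capture.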
===Problems===
A collaborative filtering system does not necessarily succeed in automatically matching content to one's preferences. Unless the platform achieves unusually good diversity and independence of opinions, one point of view will always dominate another in a particular community. As in the personalized recommendation scenario, the introduction of new users or new items can cause the [[Cold start (recommender systems)|cold start]] problem, as there will be insufficient data on these new entries for the collaborative filtering to work accurately. In order to make appropriate recommendations for a new user, the system must first learn the user's preferences by analysing past voting or rating activities. The collaborative filtering system requires a substantial
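One simple way to sidestep cold start for new users is to fall back to a non-personalised list until enough ratings have been collected. The sketch below assumes a hypothetical `cf_recommend(user, n)` callable for the collaborative part and an arbitrary threshold `min_ratings`; below that threshold it returns the most frequently rated items that the user has not rated yet.

```python
from collections import Counter
from typing import Callable, Dict, List

Ratings = Dict[str, Dict[str, float]]


def recommend_with_fallback(ratings: Ratings, user: str,
                            cf_recommend: Callable[[str, int], List[str]],
                            n: int = 3, min_ratings: int = 5) -> List[str]:
    """Use collaborative filtering once `user` has at least `min_ratings`
    ratings; otherwise return the n most frequently rated items that the
    user has not rated yet (a non-personalised popularity fallback)."""
    history = ratings.get(user, {})
    if len(history) >= min_ratings:
        return cf_recommend(user, n)
    popularity = Counter(item for r in ratings.values() for item in r)
    ranked = sorted(popularity, key=lambda i: (-popularity[i], i))
    return [i for i in ranked if i not in history][:n]
```

This does not solve cold start for new items, which by definition have no ratings and therefore never appear in the popularity list.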
==Challenges==
===Diversity and the long tail===
Collaborative filters are expected to increase diversity because they help us discover new products. Some algorithms, however, may unintentionally do the opposite. Because collaborative filters recommend products based on past sales or ratings, they cannot usually recommend products with limited historical data. This can create a rich-get-richer effect for popular products, akin to [[positive feedback]]. This bias toward popularity can prevent what are otherwise better consumer-product matches. A [[Wharton School of the University of Pennsylvania|Wharton]] study details this phenomenon along with several ideas that may promote diversity and the "[[long tail]]."<ref>{{cite journal| last1= Fleder | first1= Daniel | first2= Kartik |last2= Hosanagar | title=Blockbuster Culture's Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity|journal=Management Science | volume= 55 | issue= 5 | pages= 697–712 |date=May 2009|ssrn=955984 | doi = 10.1287/mnsc.1080.0974 | url= http://archive.nyu.edu/handle/2451/28488 }}</ref> Several collaborative filtering algorithms have been developed to promote diversity and the "[[long tail]]"<ref name="castells2015">{{cite book
|author1=
|editor1-last=Ricci|editor1-first=Francesco|editor2-last=Rokach|editor2-first=Lior|editor3-last=Shapira |editor3-first=Bracha
|title=Recommender Systems Handbook|date=2015|publisher=Springer US|isbn=978-1-4899-7637-6|edition=2
|chapter=Novelty and Diversity in Recommender Systems|chapter-url = https://link.springer.com/chapter/10.1007/978-1-4899-7637-6_26|doi=10.1007/978-1-4899-7637-6_26|pages=881–918
}}</ref> by recommending novel,<ref>{{Cite arXiv |last1=Choi |first1=Jeongwhan |last2=Hong |first2=Seoyong |last3=Park |first3=Noseong |last4=Cho |first4=Sung-Bae|title=Blurring-Sharpening Process Models for Collaborative Filtering |year=2022 |class=cs.IR |eprint=2211.09324}}</ref> unexpected,<ref>{{cite journal| last1= Adamopoulos | first1= Panagiotis | first2= Alexander |last2= Tuzhilin | title=On Unexpectedness in Recommender Systems: Or How to Better Expect the Unexpected|journal=ACM Transactions on Intelligent Systems and Technology | volume= 5 | issue= 4 | pages= 1–32 |date=January 2015| doi = 10.1145/2559952| s2cid= 15282396 }}</ref> and serendipitous items.<ref>{{cite book| last1= Adamopoulos | first1= Panagiotis | title=
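One way to push recommendations toward the long tail is to re-rank candidates with a novelty bonus. The sketch below measures novelty as the self-information -log2 p(i) of an item, where p(i) is the fraction of users who have rated it; the linear combination with weight `lam` and the function name are assumptions for illustration, not a specific algorithm from the cited works.

```python
import math
from typing import Dict, List


def rerank_for_novelty(scores: Dict[str, float], n_raters: Dict[str, int],
                       n_users: int, lam: float = 0.1, n: int = 10) -> List[str]:
    """Re-rank candidate items by relevance plus a weighted novelty bonus.

    Novelty is the self-information -log2 p(i), where p(i) is the fraction of
    users who rated item i, so rarely rated (long-tail) items get larger bonuses.
    Items with no recorded raters are treated as having one, to avoid log2(0).
    """
    def novelty(item: str) -> float:
        p = max(n_raters.get(item, 0), 1) / n_users
        return -math.log2(p)

    adjusted = {i: s + lam * novelty(i) for i, s in scores.items()}
    return sorted(adjusted, key=lambda i: (-adjusted[i], i))[:n]
```

With 1,024 users, an item rated by half of them has novelty 1 bit while an item rated by two users has 9 bits, so with `lam=0.1` a niche item scored 0.8 overtakes a popular item scored 0.9; with `lam=0` the original relevance order is kept.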
==Innovations==
* New algorithms have been developed for CF as a result of the [[Netflix prize]].
* Cross-system collaborative filtering, where user profiles across multiple [[recommender systems]] are combined in a multitask manner; in this way, preference patterns are shared across models.
* [[Robust collaborative filtering]], where recommendations remain stable in the face of manipulation attempts. This research area is still active and not completely solved.<ref>{{cite book|doi=10.1145/1297231.1297240 |publisher=Portal.acm.org |date=19 October 2007 |title=Proceedings of the 2007 ACM conference on Recommender systems
==Auxiliary information==
The user-item matrix is the basic foundation of traditional collaborative filtering techniques, and it suffers from the data sparsity problem (i.e. [[Cold start (computing)|cold start]]). As a consequence, in addition to the user-item matrix, researchers are trying to gather more auxiliary information to help boost recommendation performance and develop personalized recommender systems.<ref>{{Cite journal|last1=Shi|first1=Yue|last2=Larson|first2=Martha|last3=Hanjalic|first3=Alan|year=2014|title=Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges|journal=ACM Computing Surveys |volume=47|pages=1–45|doi=10.1145/2556270|s2cid=5493334}}</ref> Generally, there are two popular kinds of auxiliary information: attribute information and interaction information. Attribute information describes a user's or an item's properties. For example, user attributes might include a general profile (e.g. gender and age) and social contacts (e.g. followers or friends in [[social networks]]); item attributes are properties such as category, brand or content. Interaction information, in contrast, refers to implicit data showing how users interact with items. Widely used interaction information includes tags, comments or reviews, and browsing history. Auxiliary information plays a significant role in a variety of aspects. Explicit social links, as a reliable representative of trust or friendship, are commonly employed in similarity calculation to find similar people who share interests with the target user.<ref>{{cite book|last1=Massa|first1=Paolo|last2=Avesani|first2=Paolo|title=Computing with social trust|date=2009|publisher=Springer|location=London|pages=259–285}}</ref><ref>{{cite conference|author1=Groh Georg|author2=Ehmig Christian|title=Recommendations in taste related domains: collaborative filtering vs. social filtering|conference=Proceedings of the 2007 international ACM conference on Supporting group work|pages=127–136|citeseerx=10.1.1.165.3679}}</ref> The interaction-associated information
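The use of explicit social links in similarity calculation can be sketched as a blend of rating-based similarity and a binary friendship indicator. The weighting `beta`, the symmetric treatment of links, and the function names are assumptions made for illustration; published trust-aware methods are generally more elaborate.

```python
from typing import Dict, Iterable, List, Set


def social_similarity(rating_sim: float, friends: Dict[str, Set[str]],
                      u: str, v: str, beta: float = 0.7) -> float:
    """Blend rating similarity with an explicit social link between u and v.

    The link is treated as symmetric: it counts if either user lists the other.
    beta = 1 ignores the social network; beta = 0 uses only the links.
    """
    linked = v in friends.get(u, set()) or u in friends.get(v, set())
    return beta * rating_sim + (1 - beta) * (1.0 if linked else 0.0)


def top_neighbours(target: str, candidates: Iterable[str], rating_sims: Dict[str, float],
                   friends: Dict[str, Set[str]], k: int = 2, beta: float = 0.7) -> List[str]:
    """Return the k candidates most similar to `target` under the blended score."""
    scored = {v: social_similarity(rating_sims[v], friends, target, v, beta)
              for v in candidates}
    return sorted(scored, key=lambda v: (-scored[v], v))[:k]
```

With `beta=0.7`, a friend whose ratings correlate at 0.5 (blended score 0.65) is preferred over a stranger whose ratings correlate at 0.6 (blended score 0.42); with `beta=1.0` the ranking reduces to plain rating similarity.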
==See also==
{{div col|colwidth=
* [[Attention Profiling Mark-up Language|Attention Profiling Mark-up Language (APML)]]
* [[Cold start (computing)|Cold start]]
==References==
{{Reflist
==External links==
*[http://www.grouplens.org/papers/pdf/rec-sys-overview.pdf ''Beyond Recommender Systems: Helping People Help Each Other''], page 12, 2001
*[http://www.prem-melville.com/publications/recommender-systems-eml2010.pdf Recommender Systems.] Prem Melville and Vikas Sindhwani. In Encyclopedia of Machine Learning, Claude Sammut and Geoffrey Webb (Eds), Springer, 2010.
*[https://arxiv.org/abs/1203.4487 Recommender Systems in industrial contexts
*[http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1423975 Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions]{{dead link|date=May 2020|bot=medic}}{{cbignore|bot=medic}}. Adomavicius, G. and Tuzhilin, A. IEEE Transactions on Knowledge and Data Engineering 06.2005
*[https://web.archive.org/web/20060527214435/http://ectrl.itc.it/home/laboratory/meeting/download/p5-l_herlocker.pdf Evaluating collaborative filtering recommender systems] ([http://www.doi.org/ DOI]: [https://dx.doi.org/10.1145/963770.963772 10.1145/963770.963772])
*[http://www.cs.utexas.edu/users/ml/papers/cbcf-aaai-02.pdf Content-Boosted Collaborative Filtering for Improved Recommendations.] Prem Melville, [[Raymond J. Mooney]], and Ramadass Nagarajan. Proceedings of the Eighteenth National Conference on Artificial Intelligence (AAAI-2002), pp. 187–192, Edmonton, Canada, July 2002.
*[http://agents.media.mit.edu/projects.html A collection of past and present "information filtering" projects (including collaborative filtering) at MIT Media Lab]
*[http://www.ieor.berkeley.edu/~goldberg/pubs/eigentaste.pdf Eigentaste: A Constant Time Collaborative Filtering Algorithm. Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. Information Retrieval, 4(2),
*[http://downloads.hindawi.com/journals/aai/2009/421425.pdf A Survey of Collaborative Filtering Techniques] Su, Xiaoyuan and Khoshgoftaar, Taghi M.
*[http://dl.acm.org/citation.cfm?id=1242610 Google News Personalization: Scalable Online Collaborative Filtering] Abhinandan Das, Mayur Datar, Ashutosh Garg, and Shyam Rajaram. International World Wide Web Conference, Proceedings of the 16th international conference on World Wide Web
*[https://web.archive.org/web/20101023032716/http://research.yahoo.com/pub/2435 Factor in the Neighbors: Scalable and Accurate Collaborative Filtering] Yehuda Koren, Transactions on Knowledge Discovery from Data (TKDD) (2009)
*[https://web.archive.org/web/20110224164938/http://webpages.uncc.edu/~asaric/ISMIS09.pdf Rating Prediction Using Collaborative Filtering]
*[http://www.cis.upenn.edu/~ungar/CF/ Recommender Systems] {{Webarchive|url=https://web.archive.org/web/20130211033515/http://www.cis.upenn.edu/~ungar/CF/ |date=11 February 2013 }}
*[http://www2.sims.berkeley.edu/resources/collab/ Berkeley Collaborative Filtering]