My question: How to train a classifier with only positive and neutral data?

I am building a personalized article recommendation system for educational purposes. The data I use is from Instapaper.

Datasets

I only have positive data:
- Articles that I have "liked", regardless of read/unread status

And neutral data (I have expressed interest in these articles, but I may not end up liking them):
- Articles that are unread
- Articles that I have marked as read but did not "like"

The data I do not have is negative data:
- Articles that I browsed but did not send to Instapaper to read later (I am not interested in them)
- Articles that I might not even have clicked into, whether or not I archived them

My problem

In such a problem, negative data is basically missing. I have thought of the following solutions but have not settled on either yet:

1) Feed a number of negative examples to the classifier
Pros: Immediate negative data to teach the classifier
Cons: As the number of articles I like increases, the effect of the negative data on the classifier fades

2) Turn the "neutral" data into negative data
Pros: Now I have all the positive and (new) negative data I need
Cons: Although the neutral data is of only mild interest to me, I'd still like to get recommendations for such articles, perhaps as a lower-value class.

1 Answer

The Spy EM algorithm might help with this problem.

It is a text classification technique that learns from a set of positive and unlabeled examples. It combines a "spy" technique, a naive Bayes classifier, and the EM algorithm.

The basic idea is to merge your positive set with a large number of random documents, holding out a few true positives as "spies". Initially treat all the random documents as the negative class and train a naive Bayes classifier on that set. Some of the random documents will actually be positive, so you relabel as positive any document that scores higher than the lowest-scoring held-out true positive. Then repeat this process until the labels stabilize.
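As a rough illustration, one round of the relabeling step described above could look like the sketch below. It is not a full Spy EM implementation: the EM refinement is left out, the naive Bayes model is a small hand-rolled multinomial version rather than a library one, and the names `train_nb`, `positive_log_odds`, and `spy_step` are made up for this example. It assumes each document is already tokenized into a list of words, and it mixes the held-out positives (the "spies") into the unlabeled set during training so that their scores can set the threshold.

```python
import math
import random
from collections import Counter


def train_nb(docs, labels, vocab):
    """Fit a multinomial naive Bayes model with add-one smoothing.

    Returns {label: (log_prior, {word: log_likelihood})} for labels 0 and 1.
    """
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter(labels)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    model = {}
    for label in (0, 1):
        total = sum(word_counts[label].values()) + len(vocab)
        log_prior = math.log(doc_counts[label] / len(docs))
        log_like = {w: math.log((word_counts[label][w] + 1) / total) for w in vocab}
        model[label] = (log_prior, log_like)
    return model


def positive_log_odds(model, tokens):
    """Log-odds that a document is positive; words outside the vocabulary are skipped."""
    score = model[1][0] - model[0][0]
    for w in tokens:
        if w in model[1][1]:
            score += model[1][1][w] - model[0][1][w]
    return score


def spy_step(positives, unlabeled, spy_fraction=0.2, seed=0):
    """One spy round: split `unlabeled` into likely positives and reliable negatives."""
    if len(positives) < 2 or not unlabeled:
        raise ValueError("need at least two positives and one unlabeled document")
    rng = random.Random(seed)
    shuffled = positives[:]
    rng.shuffle(shuffled)
    # Hold out some true positives as spies, always keeping at least one positive.
    n_spies = min(len(shuffled) - 1, max(1, int(len(shuffled) * spy_fraction)))
    spies, kept = shuffled[:n_spies], shuffled[n_spies:]

    # Train with the unlabeled documents AND the spies treated as negative.
    docs = kept + unlabeled + spies
    labels = [1] * len(kept) + [0] * (len(unlabeled) + len(spies))
    vocab = {w for d in docs for w in d}
    model = train_nb(docs, labels, vocab)

    # The lowest-scoring spy sets the cut-off for relabeling.
    threshold = min(positive_log_odds(model, s) for s in spies)
    likely_pos, reliable_neg = [], []
    for d in unlabeled:
        if positive_log_odds(model, d) >= threshold:
            likely_pos.append(d)
        else:
            reliable_neg.append(d)
    return likely_pos, reliable_neg
```

Calling `spy_step(liked_docs, random_docs)` returns the unlabeled documents that look positive and those that look reliably negative. Moving the first group into the positive set and calling it again is one way to approximate the "repeat until it stabilizes" loop.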

Hope this answer helps.
