in Data Science by (18.4k points)

Consider a regression model. I have a dataset named ‘a’ with 10k rows, 10 features, and a target variable, on which I train and test the model. I also have another dataset named ‘b’ with 100k rows but no target variable. I want to use the model trained on ‘a’ to predict on ‘b’, but I do not know whether ‘b’ follows a similar distribution to ‘a’. Even once the regression model is trained, my concern is confidence: whether the predictions on dataset ‘b’ are good enough or not.

I am aware of the KS test and Earth Mover’s distance, but they compare individual features, not an entire dataset.
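For reference, the per-feature comparison mentioned above might look like the following minimal sketch. It assumes both datasets are available as NumPy arrays with the same feature columns; the synthetic arrays `a` and `b` and the helper `compare_features` are hypothetical stand-ins, not part of the original question. It uses SciPy's `ks_2samp` and `wasserstein_distance` (the one-dimensional Earth Mover's distance).

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance


def compare_features(a, b):
    """Compare two 2-D arrays column by column.

    Returns one (KS statistic, KS p-value, Wasserstein distance)
    tuple per feature. Each feature is tested in isolation, so
    differences in the joint distribution are not detected.
    """
    results = []
    for j in range(a.shape[1]):
        ks = ks_2samp(a[:, j], b[:, j])
        emd = wasserstein_distance(a[:, j], b[:, j])
        results.append((ks.statistic, ks.pvalue, emd))
    return results


# Hypothetical data standing in for the features of 'a' and 'b'.
rng = np.random.default_rng(0)
a = rng.normal(size=(1000, 3))
b = rng.normal(size=(5000, 3))
b[:, 2] += 1.0  # deliberately shift the last feature of 'b'

results = compare_features(a, b)
for j, (stat, pvalue, emd) in enumerate(results):
    print(f"feature {j}: KS={stat:.3f}  p={pvalue:.3g}  EMD={emd:.3f}")
```

In this sketch only the shifted feature should stand out with a large KS statistic and distance. Because every column is checked on its own, a change in how features relate to each other would go unnoticed, which is the limitation raised in the question.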

1 Answer

by (36.8k points)

The important point here is to understand what you want to solve and why.

Because dataset ‘b’ has no target variable, you can use unsupervised learning such as clustering, which groups the rows into two or more clusters, depending on your requirements. The model labels each row with a cluster identifier. Alternatively, you can group or label the rows manually based on the patterns you see across the dataset, and automate that step later. Once that is done, you can predict on dataset ‘b’ using the model trained on dataset ‘a’.
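One way to read the clustering suggestion is sketched below, under stated assumptions: fit the clusters on the features of ‘a’, assign each row of ‘b’ to its nearest cluster, and compare how the rows are spread across clusters. The arrays `X_a` and `X_b`, the choice of k-means, and the value of four clusters are all illustrative assumptions, not something the answer specifies.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrices standing in for 'a' (without its
# target column) and 'b'; replace these with the real data.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(1000, 10))
X_b = rng.normal(size=(5000, 10))

# Scale with statistics from 'a' only, so 'b' is measured
# against the data the regression model was trained on.
scaler = StandardScaler().fit(X_a)

# The number of clusters is an arbitrary choice for illustration.
n_clusters = 4
kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
kmeans.fit(scaler.transform(X_a))

# Cluster identifiers for both datasets.
labels_a = kmeans.labels_
labels_b = kmeans.predict(scaler.transform(X_b))

# Fraction of rows that falls into each cluster.
share_a = np.bincount(labels_a, minlength=n_clusters) / len(labels_a)
share_b = np.bincount(labels_b, minlength=n_clusters) / len(labels_b)

for k in range(n_clusters):
    print(f"cluster {k}: a={share_a[k]:.3f}  b={share_b[k]:.3f}")
```

If the cluster shares for ‘b’ differ sharply from those for ‘a’, that may be a hint that ‘b’ covers regions the model saw little of during training. This is a rough diagnostic rather than a formal test.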
