Consider a regression model. I have a dataset named ‘a’ with 10k rows, 10 features, and a target variable, on which I train and test the model. Now consider another dataset named ‘b’ with 100k rows but no target variable. I want to use the model trained on ‘a’ to predict the target for ‘b’, but I do not know whether ‘b’ follows a distribution similar to that of ‘a’. Even if the model does well on ‘a’, my concern is confidence: how can I tell whether the predictions for ‘b’ are good enough?
I am aware of the Kolmogorov-Smirnov (KS) test and the Earth Mover’s distance, but they compare individual features only, not the entire dataset.
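To make the per-feature limitation concrete, here is a minimal sketch of the kind of check I mean. It uses only the standard library; the helper names (`ks_statistic`, `earth_movers_distance`, `compare_features`) and the toy two-feature data are made up for illustration and are not my real datasets. In practice `scipy.stats.ks_2samp` and `scipy.stats.wasserstein_distance` compute the same quantities.

```python
import bisect
import random


def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def earth_movers_distance(x, y):
    """1-D Earth Mover's (Wasserstein-1) distance: area between the empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    grid = sorted(xs + ys)
    total = 0.0
    for lo, hi in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [lo, hi), so evaluate them at lo.
        fx = bisect.bisect_right(xs, lo) / len(xs)
        fy = bisect.bisect_right(ys, lo) / len(ys)
        total += abs(fx - fy) * (hi - lo)
    return total


def compare_features(a_rows, b_rows):
    """Return (feature index, KS statistic, EMD) for each column, one column at a time."""
    report = []
    for j in range(len(a_rows[0])):
        col_a = [row[j] for row in a_rows]
        col_b = [row[j] for row in b_rows]
        report.append((j, ks_statistic(col_a, col_b), earth_movers_distance(col_a, col_b)))
    return report


# Hypothetical toy data: feature 0 has the same distribution in both sets,
# feature 1 is shifted in 'b'.
random.seed(0)
a = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(500)]
b = [[random.gauss(0, 1), random.gauss(2, 1)] for _ in range(500)]

report = compare_features(a, b)
for j, ks, emd in report:
    print(f"feature {j}: KS={ks:.3f}  EMD={emd:.3f}")
```

Each column is scored in isolation, so this only compares marginal distributions. Two datasets whose features have matching marginals but different correlations between features would look identical here, which is exactly the gap I am asking about.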