Let's suppose our training data set is represented by T and that each record has M features:
T = {(X1,y1), (X2,y2), ... (Xn, yn)}
and
Xi is an input vector {xi1, xi2, ..., xiM}
Here, yi is the actual label.
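As a quick, purely illustrative Python sketch of this notation (the sizes and values below are made up, just to make the shapes concrete):

```python
import numpy as np

# Hypothetical toy data: n records, each with M features, plus a label per record.
n, M = 6, 4
X = np.random.rand(n, M)          # X[i] plays the role of the input vector Xi
y = np.random.randint(0, 2, n)    # y[i] plays the role of the label yi
# T is then the pairing of each row of X with its label in y.
```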
The Random Forests algorithm is a classifier based primarily on two methods:
Bagging
Random subspace method.
Suppose we want S trees in our random forest. We first create S datasets, each of "the same size as the original", by randomly resampling the data in T with replacement (n samples for each dataset). This gives {T1, T2, ..., TS}. Each of these is called a bootstrap dataset.
Due to the "with-replacement" parameter, every dataset Ti can have duplicate data records and Ti can be missing several data records from original datasets. This is called Bootstrapping.
Bagging is the process of taking bootstrap samples and then aggregating the models learned on each bootstrap.
Random Forest builds S trees and, when growing each tree, considers only m features chosen at random from the M possible features at each split (commonly m = sqrt(M) or m = floor(log2(M) + 1)). This is called the random subspace method.
So, for each bootstrap dataset Ti, you build a tree Ki. To classify some input D = {x1, x2, ..., xM}, you pass it through each of the S trees, producing S outputs Y = {y1, y2, ..., yS}. The final prediction is the majority vote over this set.
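Putting bagging, the random subspace method, and the majority vote together, here is a rough sketch, not a reference implementation: it leans on scikit-learn's DecisionTreeClassifier with max_features="sqrt" to get the per-split feature subsampling, and the data is synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = np.random.rand(200, 9)                            # toy data: n=200, M=9
y = (X[:, 0] + X[:, 1] > 1).astype(int)               # toy labels

S = 25                                                 # number of trees
trees = []
for _ in range(S):
    idx = rng.integers(0, len(X), size=len(X))         # bootstrap dataset Ti
    tree = DecisionTreeClassifier(max_features="sqrt") # random subspace: m = sqrt(M) per split
    tree.fit(X[idx], y[idx])
    trees.append(tree)

def predict(D):
    """Majority vote of the S trees on the input rows D."""
    votes = np.array([t.predict(D) for t in trees])    # shape (S, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])

print(predict(X[:5]), y[:5])                           # predictions vs. known labels
```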
Out-of-bag error:
After building the classifiers (the S trees), for each (Xi, yi) in the original training set T, select all the bootstrap datasets Tk that do not include (Xi, yi). Note that this is a set of bootstrap datasets which do not contain a particular record from the original dataset; that record is said to be "out of bag" for the trees built on them. There are n such subsets (one for each record in T). The OOB classifier for (Xi, yi) is the aggregation of votes ONLY over the trees whose Tk does not contain (Xi, yi).
The out-of-bag estimate of the generalization error is the error rate of the out-of-bag classifier on the training set (its predictions compared against the known yi's).
The study of error estimates for bagged classifiers gives empirical evidence to show that the out-of-bag estimate is as accurate as using a test set of the same size as the training set. Therefore, using the out-of-bag error estimate removes the need for a set-aside test set.
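scikit-learn can report this OOB estimate directly via oob_score=True. A small sketch on synthetic data, just to illustrate that the OOB estimate tracks a held-out test estimate (exact numbers will vary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration; any labelled dataset would do.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
clf.fit(X_train, y_train)

print("OOB accuracy:     ", clf.oob_score_)             # estimated from the training set alone
print("Held-out accuracy:", clf.score(X_test, y_test))  # typically very close to the OOB figure
```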
Hope this answer helps.