As discussed in my previous blog, imbalanced data poses serious challenges in machine learning. One approach to combating this imbalance is to alter the training set so that its class distribution becomes more balanced; the resulting sampled data set can then be used with traditional data-mining algorithms. This can be achieved through:
Under-sampling, where the size of the majority class is reduced using techniques such as reducing redundancy or removing boundary candidates.
Over-sampling, where the size of the minority class is increased by adding more candidates that augment the data set.
A hybrid approach, where over-sampling of the minority class is combined with under-sampling of the majority class.
Each of these techniques is discussed below.
Random Over Sampling
In random over-sampling, the minority class instances are duplicated in the data set until a more balanced distribution is reached. As an illustration,
consider a data set of 100 it…
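Random over-sampling can be sketched in a few lines of Python with NumPy. This is only a minimal illustration, not a reference implementation: the function name `random_oversample`, its parameters, and the toy data (16 majority rows, 4 minority rows) are all assumptions made up for this example, and it assumes a two-class problem in which the minority class has at least one instance.

```python
import numpy as np

def random_oversample(X, y, minority_label, rng=None):
    """Duplicate randomly chosen minority rows until both classes are the same size.

    Hypothetical helper for illustration; assumes a binary problem.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X)
    y = np.asarray(y)

    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)

    # Number of duplicates needed to match the majority class size.
    n_extra = len(majority_idx) - len(minority_idx)
    if n_extra <= 0:
        return X.copy(), y.copy()  # already balanced, nothing to add

    # Draw minority rows with replacement, so a row may be copied more than once.
    extra_idx = rng.choice(minority_idx, size=n_extra, replace=True)

    # Keep every original row and append the duplicated minority rows.
    keep = np.concatenate([np.arange(len(y)), extra_idx])
    return X[keep], y[keep]

# Toy data (assumed for this sketch): 16 majority rows (label 0), 4 minority rows (label 1).
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 16 + [1] * 4)

X_res, y_res = random_oversample(X, y, minority_label=1, rng=0)
print(np.bincount(y_res))  # [16 16]
```

Because the added rows are exact copies, no new information enters the data set; only the relative weight of the minority class changes. In practice this step should be applied to the training split only, so that duplicated rows do not leak into the evaluation data.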