In many real-world machine learning problems the data are imbalanced: one class is underrepresented relative to the others. This leads to misclassification of elements between classes. The cost of misclassification is often unknown at learning time and can be very high. Imbalanced classification scenarios of this kind are common in fraud and intrusion detection, medical diagnosis and monitoring, bioinformatics, text categorization, and other domains. To better understand the problem, consider the “Mammography Data Set,” a collection of images acquired from a series of mammography examinations performed on a set of distinct patients. For such a data set, the natural classes that arise are “Positive” or “Negative” for an image representative of a “cancerous” or “healthy” patient, respectively. From experience, one would expect the number of noncancerous patients to greatly exceed the number of cancerous patients; indeed, this data set contains 10,923 “Negative” (majority class) and...