In the real-world, dataset collection is loosely controlled, noisy, unreliable, redundant, and incomplete. This makes data pre-processing an integral stage in the machine learning pipeline. In our previous blogs, we introduced how to improve data quality and how to handle imbalanced data. …