Universität Rostock, 2021
Abstract: Imbalanced datasets for classification problems, characterised by unequal distribution of samples, are abundant in practical scenarios. Oversampling algorithms generate synthetic data to enrich classification performance for such datasets. In this thesis, I discuss two algorithms LoRAS & ProWRAS, improving on the state-of-the-art as shown through rigorous benchmarking on publicly available datasets. A biological application for detection of rare cell-types from single-cell transcriptomics data is also discussed. The thesis also provides a better theoretical understanding behind oversampling.
doctoral thesis free access