This study compares three pretrained deep learning models for bat call and general audio classification: BatDetect2, Bioacoustic Transformer (BAT), and Patchout faSt Spectrogram Transformer (PaSST). The models are evaluated with and without further training on a three-class multilabel dataset contaminated with drone noise. Without retraining, BatDetect2 and BAT showed minimal differentiation between noisy and clean datasets. After transfer learning, with resampling and augmentation explored to address class imbalance, the PaSST model with oversampling achieved the best performance, with an F1-score of 94.9% on binary classification, and micro and macro F1-scores of 90.6% and 78.5%, respectively, for multilabel classification.
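The gap between the micro and macro F1-scores reported above is typical of imbalanced multilabel data: micro averaging pools true positives, false positives, and false negatives over all classes, so frequent classes dominate, while macro averaging gives each class equal weight. The following is a minimal sketch of the two averages, not the evaluation code used in the study; the function names and the toy three-class labels are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: toy labels and helper names are assumptions,
# not data or code from the study.

def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for multilabel 0/1 indicator rows."""
    n_classes = len(y_true[0])
    counts = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        counts.append((tp, fp, fn))

    # Macro: average the per-class F1 values, each class weighted equally.
    macro = sum(f1_from_counts(*c) for c in counts) / n_classes

    # Micro: pool the counts over all classes, then compute a single F1.
    total_tp = sum(c[0] for c in counts)
    total_fp = sum(c[1] for c in counts)
    total_fn = sum(c[2] for c in counts)
    micro = f1_from_counts(total_tp, total_fp, total_fn)
    return micro, macro


# Hypothetical toy data: one frequent class (column 0), two rare classes.
y_true = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [0, 0, 0], [1, 0, 0]]

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In this toy case the classifier handles the frequent class perfectly but misses both rare classes, so the micro average stays high while the macro average drops sharply.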