Selected Research Topics
Learning with noisy labels
Learning with noisy labels has become an increasingly important topic. In the era of big data, datasets keep growing, and large-scale datasets are often infeasible to annotate accurately because of the cost and time involved, which naturally leaves us with cheap datasets that have noisy labels. However, noisy datasets can severely degrade the performance of machine learning models, especially deep neural networks, which easily memorize and eventually fit label noise. There are normally two ways to deal with label noise. One is to extract confident examples, whose labels are correct with high probability. The other is to model the noise and then remove its side effects, i.e., to obtain the optimal classifier defined by the clean data by exploiting the noisy data.
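As a minimal sketch of the second approach, assume the label noise is class-dependent and described by a known, invertible transition matrix. This is only one common way to model the noise. The matrix values, the function names, and the toy posteriors below are all hypothetical and chosen for illustration.

```python
import numpy as np

# Hypothetical transition matrix for 3 classes (an assumption, not from the text):
# T[i, j] = P(noisy label = j | clean label = i). Each row sums to 1.
T = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.0, 0.2, 0.8],
])

def noisy_posterior(clean_post, T):
    """Map P(clean label | x) to P(noisy label | x) under the noise model."""
    return clean_post @ T

def recover_clean_posterior(noisy_post, T):
    """Undo the noise model by solving p @ T = q for p (T must be invertible)."""
    return np.linalg.solve(T.T, noisy_post.T).T

# Toy clean posteriors for two instances (rows) over three classes.
clean = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.10, 0.80],
])
noisy = noisy_posterior(clean, T)
recovered = recover_clean_posterior(noisy, T)
```

In practice the transition matrix is usually unknown and has to be estimated from the noisy data; the sketch only shows why knowing it is enough to relate the noisy and clean class posteriors.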
Relevant Work/Publications:
Domain adaptation and transfer learning
Just like humans, machines can find common knowledge between tasks and transfer it from one task to another. In machine learning, we can exploit training examples drawn from related tasks (source domains) to improve performance on the target task (target domain). Two terms in machine learning are relevant here: domain adaptation and transfer learning. Domain adaptation refers to reducing the difference between the distributions of source and target domain data. Transfer learning refers to extracting knowledge from source tasks and applying it to improve the learning performance on a target task. We are interested in studying domain adaptation and transfer learning problems from a causal perspective.
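To make "the difference between the distributions of source and target domain data" concrete, here is a rough sketch that measures it with a (biased) squared maximum mean discrepancy estimate under an RBF kernel. The choice of MMD, the kernel bandwidth, and the synthetic Gaussian data are assumptions made for illustration; this is not the causal approach described above.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dists = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

def mmd2(X, Y, gamma=0.5):
    """Biased estimate of the squared MMD between samples X and Y."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

# Synthetic data (hypothetical): one source domain and two candidate target domains.
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 2))
target_same = rng.normal(0.0, 1.0, size=(200, 2))     # same distribution as the source
target_shifted = rng.normal(2.0, 1.0, size=(200, 2))  # mean-shifted distribution

gap_same = mmd2(source, target_same)
gap_shifted = mmd2(source, target_shifted)
```

A larger value indicates a larger gap between the two samples, so `gap_shifted` should clearly exceed `gap_same`. A domain adaptation method could, for example, learn a feature map that makes such a discrepancy small.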
Relevant Work/Publications:
Robust/Adversarial learning
We are also interested in how to reduce the side effects of noise on the instance, which may be caused by sensor failures or even malicious attacks. We humans can correctly recognise objects even in the presence of noise (e.g., we can easily recognise human faces under extreme illumination conditions, when partially occluded, or even with heavy makeup), while current machine learning algorithms may not. Recent studies also show that imperceptible noise on the instance can lead machines to make wrong decisions. All of this suggests that humans and machines use different feature extraction mechanisms for making decisions. What are the differences? And how can we align them? Answering these questions is very important for building robust and trustworthy machine learning algorithms.
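The effect of a small, worst-case perturbation can be illustrated with a signed-gradient step (in the style of the fast gradient sign method) against a toy logistic-regression model. The weights, the input, and the budget `eps` below are hypothetical, and a three-dimensional toy input is of course far from the image setting where such noise is imperceptible.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    """Predicted label (0 or 1) of a linear logistic model."""
    return int(sigmoid(w @ x + b) > 0.5)

def signed_gradient_attack(x, y, w, b, eps):
    """Move x by eps in the sign of the loss gradient, increasing the logistic loss."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w  # gradient of the cross-entropy loss w.r.t. the input x
    return x + eps * np.sign(grad_x)

# Hypothetical model and input.
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.2, -0.1, 0.1])
y = 1.0  # true label

x_adv = signed_gradient_attack(x, y, w, b, eps=0.15)
clean_pred = predict(x, w, b)
adv_pred = predict(x_adv, w, b)
```

Here every coordinate changes by at most `eps`, yet the prediction flips from the correct class to the wrong one.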
Relevant Work/Publications:
Towards defending against adversarial examples via attack-invariant features [ICML’20]
Efficient gradient approximation for black boxes [ICML’20]
Understanding adversarial attacks via maximum mean discrepancy [ICML’21]
Learning diverse-structured networks for adversarial robustness [ICML’20]
Robust nonnegative matrix factorisation algorithms [TNNLS’17, AAAI’20]
Comparing the robustness of different loss functions [ICIST’14, TPAMI’19]
Statistical (deep) learning theory
Deep learning algorithms have delivered exciting performance, e.g., painting pictures, beating Go champions, and driving cars autonomously, showing that they have very good generalisation abilities (small differences between training and test errors). These empirical achievements have astounded yet confounded their human creators. Why do deep learning algorithms generalise so well on unseen data? Deep learning still lacks mathematical elegance: we do not know the underlying principles that guarantee its success, let alone how to interpret or purposefully strengthen its generalisation ability. We are interested in analysing error bounds, e.g., generalisation error bounds and excess risk bounds, by measuring the complexity of the predefined (or algorithmic) hypothesis class. An algorithmic hypothesis class is a subset of the predefined hypothesis class that a learning algorithm will (or is likely to) output.
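For readers unfamiliar with these quantities, the following is a sketch of the standard textbook form such bounds take, using Rademacher complexity as the complexity measure. The notation is assumed for illustration and is not taken from the publications listed here: a loss \ell bounded in [0,1], a sample S of n i.i.d. examples, a predefined hypothesis class \mathcal{H}, and an empirical risk minimiser \hat{h}.

```latex
% Assumed notation: R is the expected risk, \hat{R}_S the empirical risk on S,
% and \mathfrak{R}_n the Rademacher complexity of the loss class \ell \circ \mathcal{H}.
\begin{align*}
R(h) &= \mathbb{E}_{(x,y)}\,\ell(h(x), y), \qquad
\hat{R}_S(h) = \frac{1}{n}\sum_{i=1}^{n} \ell(h(x_i), y_i), \\
% Generalisation error bound: with probability at least 1-\delta, for all h in \mathcal{H}
R(h) - \hat{R}_S(h) &\le 2\,\mathfrak{R}_n(\ell \circ \mathcal{H}) + \sqrt{\frac{\log(1/\delta)}{2n}}, \\
% Excess risk of \hat{h} relative to the best hypothesis in the class
R(\hat{h}) - \inf_{h \in \mathcal{H}} R(h) &\le 2 \sup_{h \in \mathcal{H}} \bigl| R(h) - \hat{R}_S(h) \bigr|.
\end{align*}
```

Because the complexity term is monotone in the class, replacing \mathcal{H} by a smaller algorithmic hypothesis class \mathcal{H}_A \subseteq \mathcal{H} can only shrink it, provided \mathcal{H}_A can be fixed independently of the sample. This is one way to see why measuring the complexity of the algorithmic class may give tighter bounds.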
Relevant Work/Publications:
The relationship between algorithmic stability and algorithmic hypothesis complexity [ICML’17]
Control batch size and learning rate to generalize well [NeurIPS’19]
On the Rates of Convergence from Surrogate Risk Minimizers to the Bayes Optimal Classifier [TNNLS’accepted]
Understanding the generalisation of ResNet [TNNLS’20]
Understanding the generalisation of orthogonal deep neural networks [TPAMI’accepted]
Understanding the generalisation of multitask learning [TPAMI’17]
Understanding how feature structure transfers in transfer learning [IJCAI’17]
Understanding the generalisation of nonnegative matrix factorisation [NECO’16, TNNLS’16, TNNLS’17]
