1. About the Speaker
Kai Wang (王锴)
Kai Wang is a first-year Ph.D. student at the National University of Singapore (NUS), advised by Presidential Young Professor Yang You. He received his bachelor's degree from Beijing Normal University, Zhuhai and his master's degree from the University of Chinese Academy of Sciences. His research interests include Resource-Efficient AI and Affective Computing. He has published 20 papers, including the following at CCF-A venues: 6 at CVPR, 1 at ECCV, 1 at AAAI, and 1 in TIP. He has worked as a research intern at Huawei and Alibaba DAMO Academy, and has won 5 first places and 2 second places in international competitions including PEER-ImageNet, EmotiW, the Huawei Cloud Garbage Classification Challenge, the National AI Challenge (NAIC), ICCV Masked Face Recognition, and CVPR AU Detection. He is a recipient of the AI Singapore Ph.D. Fellowship, Singapore's top doctoral scholarship. His work has received 598 Google Scholar citations, with an h-index of 10.
2. This talk will share two papers.
Time: next Monday (March 14)
Paper 1
CVPR-2022 Paper: CAFE: Learning to Condense Dataset by Aligning Features
Abstract
Dataset condensation aims at reducing the network training effort by condensing a cumbersome training set into a compact synthetic one. State-of-the-art approaches largely rely on learning the synthetic data by matching the gradients between the real and synthetic data batches. Despite the intuitive motivation and promising results, such gradient-based methods, by nature, easily overfit to a biased set of samples that produce dominant gradients, and thus lack global supervision of the data distribution. In this paper, we propose a novel scheme to Condense dataset by Aligning FEatures (CAFE), which explicitly attempts to preserve the real-feature distribution as well as the discriminant power of the resulting synthetic set, lending itself to strong generalization capability across various architectures. At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales, while accounting for the classification of real samples. Our scheme is further backed up by a novel dynamic bi-level optimization, which adaptively adjusts parameter updates to prevent over-/under-fitting. We validate the proposed CAFE across various datasets and demonstrate that it generally outperforms the state of the art: on the SVHN dataset, for example, the performance gain is up to 11%. Extensive experiments and analysis verify the effectiveness and necessity of the proposed designs.
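For intuition, the following is a minimal PyTorch sketch of the feature-alignment idea (our own illustration, not the paper's implementation: the function names, the use of per-layer feature means, and the class-center form of the discrimination term are all simplifying assumptions):

import torch
import torch.nn.functional as F

def feature_alignment_loss(real_feats, syn_feats):
    # real_feats / syn_feats: lists of [B, D_l] feature tensors, one per
    # layer/scale of the same backbone, extracted from real and synthetic
    # batches. Matching per-layer mean statistics pushes the synthetic set
    # to mimic the real feature distribution at every scale.
    loss = torch.zeros(())
    for fr, fs in zip(real_feats, syn_feats):
        loss = loss + F.mse_loss(fr.mean(dim=0), fs.mean(dim=0))
    return loss

def discrimination_loss(real_last, real_labels, syn_last, syn_labels, num_classes):
    # Preserve discriminant power: use per-class centers of the synthetic
    # last-layer features as a linear classifier for the real samples.
    # Assumes every class appears in the synthetic batch.
    centers = torch.stack([syn_last[syn_labels == c].mean(dim=0)
                           for c in range(num_classes)])  # [C, D]
    logits = real_last @ centers.t()                      # [B, C]
    return F.cross_entropy(logits, real_labels)

In CAFE-style training, the synthetic images would then be optimized by backpropagating a combination of such terms through the feature extractor.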
Paper 2
ICML-2022 Manuscript: Reliable Label Correction is a Good Booster When Learning with Extremely Noisy Labels
Abstract
Learning with noisy labels has attracted much research interest, since data annotations, especially for large-scale datasets, may inevitably be imperfect. Recent approaches recast the task as a semi-supervised learning problem by dividing training samples into clean and noisy sets. This paradigm, however, is prone to significant degeneration under heavy label noise, as the number of clean samples is too small for conventional methods to behave well. In this paper, we introduce a novel framework, termed LC-Booster, to explicitly tackle learning under extreme noise. The core idea of LC-Booster is to incorporate label correction into the sample selection, so that more purified samples can, through reliable label correction, be utilized for training, thereby alleviating the confirmation bias. Experiments show that LC-Booster advances state-of-the-art results on several noisy-label benchmarks, including CIFAR-10, CIFAR-100, Clothing1M and WebVision. Remarkably, under the extreme 90% noise ratio, LC-Booster achieves 93.5% and 48.4% accuracy on CIFAR-10 and CIFAR-100, surpassing the state of the art by 1.6% and 7.2%, respectively.
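As a rough illustration of this idea, here is a self-contained PyTorch sketch that combines small-loss sample selection with confidence-based label correction (our own simplification with hypothetical thresholds, not the authors' released code):

import torch
import torch.nn.functional as F

@torch.no_grad()
def select_and_correct(logits, labels, clean_thresh=0.5, correct_thresh=0.9):
    # Small-loss selection: samples the network fits with low loss are
    # more likely to carry clean labels.
    losses = F.cross_entropy(logits, labels, reduction="none")
    w = 1.0 - (losses - losses.min()) / (losses.max() - losses.min() + 1e-8)
    clean_mask = w > clean_thresh

    # Reliable label correction: relabel noisy samples that the model
    # predicts with high confidence, so they rejoin the supervised set
    # instead of being discarded, which matters under extreme noise.
    probs = F.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)
    correct_mask = (~clean_mask) & (conf > correct_thresh)

    new_labels = labels.clone()
    new_labels[correct_mask] = pred[correct_mask]
    supervised_mask = clean_mask | correct_mask
    return new_labels, supervised_mask

Under a 90% noise ratio the initially clean set is tiny, so recovering corrected samples this way helps keep the supervised signal from collapsing; the remaining uncorrected samples can still be used without labels, as in semi-supervised learning.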