Machine Learning from Weak Supervision

An Empirical Risk Minimization Approach

About

Fundamental theory and practical algorithms of weakly supervised classification, emphasizing an approach based on empirical risk minimization.

Standard machine learning techniques require large amounts of labeled data to work well. When machine learning is applied to problems in the physical world, however, collecting labeled data in such quantities is extremely difficult. In this book Masashi Sugiyama, Han Bao, Takashi Ishida, Nan Lu, Tomoya Sakai, and Gang Niu present theory and algorithms for weakly supervised learning, a paradigm of machine learning from weakly labeled data. Emphasizing an approach based on empirical risk minimization and drawing on state-of-the-art research in weakly supervised learning, the book covers both the fundamentals of the field and the advanced mathematical theory underlying them. It can serve as a reference for practitioners and researchers as well as a textbook for the classroom.

The book first formulates classification problems mathematically, defines common notation, and reviews various algorithms for supervised binary and multiclass classification. It then explores binary weakly supervised classification, including positive-unlabeled (PU), positive-negative-unlabeled (PNU), and unlabeled-unlabeled (UU) classification. Next, it turns to multiclass classification, discussing complementary-label (CL) and partial-label (PL) classification. Finally, the book addresses more advanced issues, including class-prior estimation and a family of correction methods that improve the generalization performance of weakly supervised learning.

Table of Contents

Preface
I Machine Learning from Weak Supervision
1 Introduction
2 Formulation and Notation
3 Supervised Classification
II Weakly Supervised Learning for Binary Classification
4 Positive-Unlabeled (PU) Classification
5 Positive-Negative-Unlabeled (PNU) Classification
6 Positive-Confidence (Pconf) Classification
7 Pairwise-Constraint Classification
8 Unlabeled-Unlabeled (UU) Classification
III Weakly Supervised Learning for Multi-class Classification
9 Complementary-Label Classification
10 Partial-Label Classification
IV Advanced Topics and Perspectives
11 Non-Negative Correction for Weakly Supervised Classification
12 Class-Prior Estimation
13 Conclusions and Prospects
Notes
Bibliography
Index

Authors

Masashi Sugiyama is Director of the RIKEN Center for Advanced Intelligence Project and Professor of Computer Science at the University of Tokyo. Han Bao is a PhD student in the Department of Computer Science at the University of Tokyo and Research Assistant at the RIKEN Center for Advanced Intelligence Project. Takashi Ishida is a Lecturer at the University of Tokyo and Visiting Scientist at the RIKEN Center for Advanced Intelligence Project. Nan Lu is a PhD student in the Department of Complexity Science and Engineering at the University of Tokyo and Research Assistant at the RIKEN Center for Advanced Intelligence Project. Tomoya Sakai is Senior Researcher at NEC Corporation and Visiting Scientist at the RIKEN Center for Advanced Intelligence Project. Gang Niu is Research Scientist in the Imperfect Information Learning Team at the RIKEN Center for Advanced Intelligence Project.