We are interested in developing a safe semi-supervised learning method that works in any situation. Semi-supervised learning postulates that n' unlabeled data are available in addition to n labeled data. However, almost all previous semi-supervised methods require additional assumptions (beyond the availability of unlabeled data) to improve on supervised learning. If such assumptions are not met, the methods may perform worse than supervised learning. Sokolovska, Cappé, and Yvon (2008) proposed a semi-supervised method based on a weighted likelihood approach. They proved that this method asymptotically never performs worse than supervised learning (i.e., it is safe) without any assumption. Their method is attractive because it is easy to implement and is potentially general. Moreover, it is deeply related to a certain statistical paradox. However, the method of Sokolovska et al. (2008) assumes a very limited situation, i.e., classification, discrete covariates, n' → ∞, and a maximum likelihood estimator. In this paper, we extend their method by modifying the weight. We prove that our proposal is safe in a significantly wider range of situations as long as n ≤ n'. Further, we give a geometrical interpretation of the proof of safety through the relationship with the above-mentioned statistical paradox. Finally, we show that the above proposal is asymptotically safe even when n' < n, given a further modification of the weight. Numerical experiments illustrate the performance of these methods.
All Science Journal Classification (ASJC) codes
- Cognitive Neuroscience
- Artificial Intelligence