NeurIPS Meetup Japan 2021

December 13–14, 2021 | Online

An online meetup for students, researchers, and engineers in machine learning, artificial intelligence, and related disciplines to share ideas and learn from each other.



Welcome to NeurIPS Meetup Japan 2021! NeurIPS is a flagship conference in machine learning and, more generally, artificial intelligence. This virtual NeurIPS meetup aims to bring together students, researchers, and engineers who work on or are interested in these topics to connect, exchange ideas, and learn from each other. Underrepresented minorities and undergraduates interested in machine learning are encouraged to attend.

This meetup is going to be held on December 13th (10:00–16:30) and 14th (09:35–16:30), 2021 (Japan Standard Time/UTC+9). The official languages for this meetup are English and Japanese.

For this year’s meetup, we will

  • stream NeurIPS keynote talks, as well as talks by our invited speakers and authors from institutions in Japan;
  • provide a social venue for researchers and practitioners to meet fellow participants and build local communities;
  • host a mentoring session for participants to discuss current research trends and career choices with our invited mentors from industry and academia;
  • run a panel discussion on the applications and future of machine learning;
  • organize a Women in Machine Learning (WiML) event for women researchers and engineers to interact with each other, find support, and build more technical confidence.

Please join us!

NeurIPS Meetup Japan 2021へようこそ! NeurIPSは機械学習及び人工知能の分野における最重要国際会議の一つです. NeurIPSのオンラインミートアップとして,機械学習・人工知能分野に関連する学生,研究者,開発者をつなぎ,交流の機会を提供することを目指します. マイノリティグループの皆様や機械学習に関心のある学部生の積極的なご参加もお待ちしております.



  • 講演:NeurIPS本会議の招待講演の配信,本ミートアップの招待講演,日本の研究機関・企業等から採択された著者による講演を開催します.
  • 交流会:研究者や開発者の皆様の交流会を開催します.ぜひ交流の輪を広げるためにご活用ください.
  • メンターセッション:企業やアカデミアからメンターを招待し,メンターとともに,参加者は最近の研究動向や人生キャリアについてディスカッションを行います.
  • パネルディスカッション:機械学習の応用や将来展望についてディスカッションするイベントを開催します.
  • 機械学習の女性コミュニティ(WiML):女性研究者や女性エンジニアがお互いに交流,議論,相談し合えるためのイベントを開催します.



Registration is free but mandatory. Please register before December 12 to participate in NeurIPS Meetup Japan 2021.
We will send you a Zoom link via email after you register.
Please read and follow the Code of Conduct.

参加にあたってはCode of Conductの遵守をお願いします.

If you have any questions, please contact us at


All dates and times are in Japan Standard Time (JST/UTC+9).


Reliability is a fundamental concern when applying machine learning models to real-world problems. One typical concern is the black-box nature of complex machine learning models, which hinders us from understanding how the models make decisions. In the last few years, several methods for explaining the models’ decisions have been developed to overcome this black-box problem. However, recent studies have revealed that these explanation methods themselves may not be fully reliable. In this talk, I introduce some reliability issues of existing explanation methods, from both technical and social perspectives.


Satoshi Hara is an associate professor at SANKEN of Osaka University, Osaka, Japan. His current research interests include interpretable and explainable machine learning and the applications of these technologies to scientific and industrial problems. Before joining Osaka University, he worked as a researcher at IBM Research Tokyo (2013-2016) and as a project researcher at the National Institute of Informatics (2016-2017) in Tokyo, Japan. He received his PhD degree in engineering from Osaka University in 2013.




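Tarin Clanuwat holds a PhD in literature and specializes in medieval commentaries on The Tale of Genji. Currently a Research Scientist at Google Brain. Developed the KuroNet kuzushiji recognition model in 2018, hosted the Kaggle kuzushiji recognition competition in 2019, and released the AI kuzushiji recognition app 「みを」 in 2021. Awards include the Best Paper Award of the IPSJ Symposium on Humanities and Computers, the IPSJ Yamashita SIG Research Award, and the Academic Award of the Japan Society for Digital Archive.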

Topic: Applications and Futures of Machine Learning


  • Takashi Ishida 石田 隆 (The University of Tokyo 東京大学)


  • Satoshi Hara 原 聡 (Osaka University 大阪大学)
  • Hongyi Ding (Rakuten 楽天)
  • Nontawat Charoenphakdee (Preferred Networks, Inc.)
  • Kenny Song (Citadel AI Inc.)

[OpenReview] [Conference] [Proceeding]

Bowen Zhang, Yidong Wang, Wenxin Hou, Hao Wu, Jindong Wang, Manabu Okumura, Takahiro Shinozaki

The recently proposed FixMatch achieved state-of-the-art results on most semi-supervised learning (SSL) benchmarks. However, like other modern SSL algorithms, FixMatch uses a pre-defined constant threshold for all classes to select unlabeled data that contribute to the training, thus failing to consider different learning status and learning difficulties of different classes. To address this issue, we propose Curriculum Pseudo Labeling (CPL), a curriculum learning approach to leverage unlabeled data according to the model’s learning status. The core of CPL is to flexibly adjust thresholds for different classes at each time step to let pass informative unlabeled data and their pseudo labels. CPL does not introduce additional parameters or computations (forward or backward propagation). We apply CPL to FixMatch and call our improved algorithm FlexMatch. FlexMatch achieves state-of-the-art performance on a variety of SSL benchmarks, with especially strong performances when the labeled data are extremely limited or when the task is challenging. For example, FlexMatch outperforms FixMatch by 14.32% and 24.55% on CIFAR-100 and STL-10 datasets respectively, when there are only 4 labels per class. CPL also significantly boosts the convergence speed, e.g., FlexMatch can use only 1/5 training time of FixMatch to achieve even better performance. Furthermore, we show that CPL can be easily adapted to other SSL algorithms and remarkably improve their performances. We open-source our code at
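The idea of class-wise flexible thresholds described above can be illustrated with a minimal sketch. It assumes that the learning status of a class is estimated by counting the unlabeled examples that the model confidently assigns to that class, normalized by the largest count; the function names and the fallback for the no-confident-prediction case are hypothetical, and the details in the paper may differ.

```python
from typing import List, Tuple


def class_thresholds(probs: List[List[float]], base_threshold: float = 0.95) -> List[float]:
    """Scale a fixed threshold per class by an estimate of that class's learning status.

    probs holds the model's predicted class probabilities for each unlabeled example.
    Assumption of this sketch: learning status = number of confident predictions
    for the class, divided by the largest such count over all classes.
    """
    num_classes = len(probs[0])
    counts = [0] * num_classes
    for p in probs:
        top = max(range(num_classes), key=lambda c: p[c])
        if p[top] > base_threshold:
            counts[top] += 1
    largest = max(counts)
    if largest == 0:
        # No confident prediction yet: fall back to the fixed threshold (sketch assumption).
        return [base_threshold] * num_classes
    return [base_threshold * n / largest for n in counts]


def select_pseudo_labels(probs: List[List[float]], thresholds: List[float]) -> List[Tuple[int, int]]:
    """Return (example index, pseudo label) pairs whose confidence passes the class threshold."""
    selected = []
    for i, p in enumerate(probs):
        top = max(range(len(p)), key=lambda c: p[c])
        if p[top] > thresholds[top]:
            selected.append((i, top))
    return selected
```

With a constant threshold, only examples of well-learned classes pass; with the scaled thresholds, classes that the model has not yet learned well get a lower bar, so more of their unlabeled examples contribute to training.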

[OpenReview] [Conference] [Proceeding]

Yihan Du, Yuko Kuroki, Wei Chen

In this paper, we study the Combinatorial Pure Exploration problem with the Bottleneck reward function (CPE-B) under the fixed-confidence (FC) and fixed-budget (FB) settings. In CPE-B, given a set of base arms and a collection of subsets of base arms (super arms) following a certain combinatorial constraint, a learner sequentially plays a base arm and observes its random reward, with the objective of finding the optimal super arm with the maximum bottleneck value, defined as the minimum expected reward of the base arms contained in the super arm. CPE-B captures a variety of practical scenarios such as network routing in communication networks, and its unique challenges fall on how to utilize the bottleneck property to save samples and achieve the statistical optimality. None of the existing CPE studies (most of them assume linear rewards) can be adapted to solve such challenges, and thus we develop brand-new techniques to handle them. For the FC setting, we propose novel algorithms with optimal sample complexity for a broad family of instances and establish a matching lower bound to demonstrate the optimality (within a logarithmic factor). For the FB setting, we design an algorithm which achieves the state-of-the-art error probability guarantee and is the first to run efficiently on fixed-budget path instances, compared to existing CPE algorithms. Our experimental results on the top-k, path and matching instances validate the empirical superiority of the proposed algorithms over their baselines.
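To make the bottleneck objective concrete, here is a minimal sketch of the quantity the learner is trying to identify. It assumes the expected rewards are already known, which is not the case in CPE-B (there they must be estimated by sampling); the arm names and values are hypothetical, and this is not the paper's algorithm.

```python
from typing import Dict, Sequence, Tuple


def bottleneck_value(super_arm: Sequence[str], mean_reward: Dict[str, float]) -> float:
    """Bottleneck value: the minimum expected reward among the base arms in the super arm."""
    return min(mean_reward[arm] for arm in super_arm)


def best_super_arm(
    super_arms: Sequence[Sequence[str]], mean_reward: Dict[str, float]
) -> Tuple[Sequence[str], float]:
    """Return the super arm with the maximum bottleneck value, together with that value."""
    best = max(super_arms, key=lambda s: bottleneck_value(s, mean_reward))
    return best, bottleneck_value(best, mean_reward)


# Hypothetical routing example: base arms are links, super arms are paths.
link_bandwidth = {"a": 0.9, "b": 0.2, "c": 0.6, "d": 0.5}
paths = [("a", "b"), ("c", "d")]
path, value = best_super_arm(paths, link_bandwidth)
```

In this example the path ("a", "b") contains the best single link but is limited by its weakest one, so the path ("c", "d") is optimal.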

[OpenReview] [Conference] [Proceeding]

Toru Hishinuma, Kei Senda

This paper discusses model estimation in offline model-based reinforcement learning (MBRL), which is important for subsequent policy improvement using an estimated model. From the viewpoint of covariate shift, a natural idea is model estimation weighted by the ratio of the state-action distributions of offline data and real future data. However, estimating such a natural weight is one of the main challenges for off-policy evaluation, which is not easy to use. As an artificial alternative, this paper considers weighting with the state-action distribution ratio of offline data and simulated future data, which can be estimated relatively easily by standard density ratio estimation techniques for supervised learning. Based on the artificial weight, this paper defines a loss function for offline MBRL and presents an algorithm to optimize it. Weighting with the artificial weight is justified as evaluating an upper bound of the policy evaluation error. Numerical experiments demonstrate the effectiveness of weighting with the artificial weight.

[OpenReview] [Conference] [Proceeding]

Futoshi Futami, Tomoharu Iwata, Naonori Ueda, Issei Sato, Masashi Sugiyama

Bayesian model averaging, obtained as the expectation of a likelihood function by a posterior distribution, has been widely used for prediction, evaluation of uncertainty, and model selection. Various approaches have been developed to efficiently capture the information in the posterior distribution; one such approach is the optimization of a set of models simultaneously with interaction to ensure the diversity of the individual models in the same way as ensemble learning. A representative approach is particle variational inference (PVI), which uses an ensemble of models as an empirical approximation for the posterior distribution. PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models. However, despite its promising performance, a theoretical understanding of this repulsion and its association with the generalization ability remains unclear. In this paper, we tackle this problem in light of PAC-Bayesian analysis. First, we provide a new second-order Jensen inequality, which has the repulsion term based on the loss function. Thanks to the repulsion term, it is tighter than the standard Jensen inequality. Then, we derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of models. Finally, we derive a new PVI that optimizes the generalization error bound directly. Numerical experiments demonstrate that the proposed PVI compares favorably with existing methods.

The WiML session aims to help women and gender minority individuals working in data science or machine learning build their technical confidence and their voice, and to create opportunities for participants to engage in professional conversations in a positive, supportive environment. We invite guests from both academia and industry to talk about their experiences, and we hope more people can join, share their stories, and hear others’ stories. While all guest speakers identify as women, people of all gender identities are invited to attend.


  • Nan Lu (The University of Tokyo 東京大学)


  • Yuka Hashimoto 橋本 悠香 (NTT Network Service Systems Laboratories NTTネットワークサービスシステム研究所)
  • Zhengyan Gao (Preferred Networks, Inc.)
  • Ayaka Kume 久米 絢佳 (Preferred Networks, Inc.)
  • Yuko Kuroki 黒木 祐子 (The University of Tokyo 東京大学)
  • Xiaodan Shi (The University of Tokyo 東京大学)
  • Ayana Niwa 丹羽彩奈 (Tokyo Institute of Technology 東京工業大学)


Two-sample tests ask, “given samples from each, are these two populations the same?” For instance, one might wish to know whether a treatment and control group differ. With very low-dimensional data and/or strong parametric assumptions, methods such as t-tests or Kolmogorov-Smirnov tests are widespread. Recent work in statistics and machine learning has sought tests that cover situations not well-handled by these classic methods, providing tools useful in machine learning for domain adaptation, causal discovery, generative modeling, fairness, adversarial learning, and more. In this talk, I will introduce two advances in the two-sample testing field: two-sample testing under high dimensionality and with few observations. I will also present how to use advanced two-sample tests to defend against adversarial attacks, which demonstrates the significance of two-sample testing in the AI security area.
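For readers new to the topic, the basic question can be illustrated with a minimal sketch of a classic permutation two-sample test. It assumes one-dimensional data and uses the absolute difference of sample means as the test statistic; it is not one of the advanced tests covered in the talk, and the function name and defaults are hypothetical.

```python
import random
from typing import List


def permutation_test(x: List[float], y: List[float],
                     num_permutations: int = 2000, seed: int = 0) -> float:
    """Estimate a p-value for the null hypothesis that x and y come from the same population.

    Statistic (sketch assumption): absolute difference of the sample means.
    """
    def statistic(a: List[float], b: List[float]) -> float:
        return abs(sum(a) / len(a) - sum(b) / len(b))

    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    at_least_as_extreme = 0
    for _ in range(num_permutations):
        # Under the null hypothesis, group labels are exchangeable, so reshuffle them.
        rng.shuffle(pooled)
        if statistic(pooled[:len(x)], pooled[len(x):]) >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (num_permutations + 1)
```

A small p-value suggests that the two groups differ. A mean-difference statistic only detects shifts in the mean, which is one reason richer statistics are needed for high-dimensional data.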


Dr. Feng Liu is a machine learning researcher with research interests in hypothesis testing and trustworthy machine learning. His long-term goal is to develop trustworthy intelligent systems that can automatically learn reliable knowledge from massive related but different domains. Currently, he is a Lecturer at the Australian Artificial Intelligence Institute (AAII), University of Technology Sydney (UTS), Australia, and a Visiting Scientist at RIKEN-AIP, Japan. He was the recipient of an Australian Laureate postdoctoral fellowship and received his Ph.D. degree in computer science from UTS-AAII in 2020, advised by Dist. Prof. Jie Lu and Prof. Guangquan Zhang.

He was a research intern with the AI Residency Program at RIKEN Center for Advanced Intelligence Project (RIKEN-AIP), working on the trustworthy domain adaptation project with Prof. Masashi Sugiyama, Dr. Gang Niu, and Dr. Bo Han. He visited Gatsby Computational Neuroscience Unit at UCL and worked on the hypothesis testing project with Prof. Arthur Gretton, Dr. Danica J. Sutherland and Wenkai Xu.

He has served as a program committee (PC) member for NeurIPS, ICML, ICLR, AISTATS, and ACML. He also serves as a reviewer for many academic journals, such as IEEE-TPAMI, IEEE-TNNLS, IEEE-TFS, and AMM. He has received the Outstanding Reviewer Award of NeurIPS (2021), the Outstanding Reviewer Award of ICLR (2021), the UTS-FEIT HDR Research Excellence Award (2019), the Best Student Paper Award of FUZZ-IEEE (2019), and the UTS Research Publication Award (2018).


Since their very inception [1], AI and ML have relied on geometry and computational geometric methods (e.g., MLPs, kernel SVMs). In this talk, I will present a few geometric structures [3,4] and discuss the statistical invariance principle of information geometry [2] with some applications, opportunities, and challenges for ML [5].


[1] Minsky and Papert, Perceptrons: An introduction to computational geometry, MIT Press, 2017. (first edition 1969)
[2] An Elementary Introduction to Information Geometry, Entropy, 2020.
[3] An information-geometric characterization of Chernoff information, IEEE Signal Processing Letters 20.3 (2013): 269-272.
[4] Clustering in Hilbert simplex geometry (with Sun), Springer 2019.
[5] Relative Fisher information and natural gradient for learning large modular models (with Sun (first author)), ICML 2017.
[6] Tractable structured natural-gradient descent using local parameterizations (with Lin (first author), Khan, and Schmidt), ICML 2021.


Frank Nielsen was awarded his PhD on adaptive computational geometry (1996) from INRIA/University of Cote d’Azur (France). He is a fellow of Sony Computer Science Laboratories Inc. (Sony CSL, Tokyo), where he currently conducts research on the fundamentals and practice of geometric machine learning and intelligence. He taught visual computing (Charles River Media textbook, 2005) and high-performance computing for data science (Springer UTICS textbook, 2016) at Ecole Polytechnique (France), and currently serves the following journals: Springer Information Geometry, MDPI Entropy, and IEEE Transactions on Information Theory. Frank Nielsen co-organizes with Frederic Barbaresco the biennial conference Geometric Science of Information (GSI LNCS proceedings).

The following two sessions will be held simultaneously.

Session 1: Machine Learning in Industry


  • Hikari Amano 天野 光 (MathWorks Japan)


  • Takashi Nakano 中野 嵩士 (Acute Medical Innovation AMI株式会社)
  • Mengyuan Shi (Indeed Tokyo)


  1. Can you show us some interesting industrial applications of machine learning?
  2. I don’t have a machine learning background from school. Can I find an internship or a job in machine learning?
    私は学校で機械学習の勉強をしていません. 機械学習のインターンシップや仕事を見つけることはできますか?
  3. I’m a software engineer. What should I know before starting machine learning?
    私はソフトウェアエンジニアです. 機械学習を始める前に知っておくべきことは何ですか?
  4. Do you prefer state-of-the-art models or widely used models in deployment?
  5. In what situations should we use machine learning, and in what situations do we not need to, had better not, or should not use it?
  6. How do you balance work and life, especially during the current COVID-19 situation?

Feel free to ask any questions!

Session 2: Machine Learning in Academia


  • Yivan Zhang (The University of Tokyo 東京大学)


  • Feng Liu (University of Technology Sydney)
  • Kaito Fujii 藤井 海斗 (National Institute of Informatics 国立情報学研究所)
  • Yoshihiro Nagano 長野 祥大 (The University of Tokyo 東京大学)


  1. What do you think is good research?
  2. How do you find an appropriate research topic?
  3. How do you plan for the long term during your 3-5 years of Ph.D. student life?
  4. How do you choose between impactful research and incremental contribution?
  5. Where do you find good sources of information?
  6. Do you have any tips for daily research life?

Feel free to ask any questions!

[OpenReview] [Conference] [Proceeding]

Yujin Tang, David Ha

In complex systems, we often observe complex global behavior emerge from a collection of agents interacting with each other in their environment, with each individual agent acting only on locally available information, without knowing the full picture. Such systems have inspired development of artificial intelligence algorithms in areas such as swarm optimization and cellular automata. Motivated by the emergence of collective behavior from complex cellular systems, we build systems that feed each sensory input from the environment into distinct, but identical neural networks, each with no fixed relationship with one another. We show that these sensory networks can be trained to integrate information received locally, and through communication via an attention mechanism, can collectively produce a globally coherent policy. Moreover, the system can still perform its task even if the ordering of its inputs is randomly permuted several times during an episode. These permutation invariant systems also display useful robustness and generalization properties that are broadly applicable. Interactive demo and videos of our results:
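The permutation invariance described above can be illustrated with a toy sketch: every input goes through the same function, and the results are combined by attention with a fixed query, so the output does not depend on the input order. The specific key and value functions here are hypothetical placeholders, and the actual system in the paper is considerably richer.

```python
import math
from typing import List


def permutation_invariant_readout(observations: List[float], query: float = 1.0) -> float:
    """Attention-weighted pooling over inputs that are processed by one shared function."""
    # The same (hypothetical) "sensory" function is applied to every input independently.
    keys = [math.tanh(o) for o in observations]
    values = [o * o for o in observations]
    # Softmax attention weights from a fixed query; subtract the max for numerical stability.
    scores = [query * k for k in keys]
    largest = max(scores)
    weights = [math.exp(s - largest) for s in scores]
    total = sum(weights)
    # A weighted sum over all inputs is unchanged when the inputs are reordered.
    return sum(w * v for w, v in zip(weights, values)) / total
```

Shuffling the observations gives the same output up to floating-point rounding, because both the softmax normalization and the weighted sum range over the whole set of inputs.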

[OpenReview] [Conference] [Proceeding]

Takanori Maehara, Hoang NT

Theoretical analyses for graph learning methods often assume a complete observation of the input graph. Such an assumption might not be useful for handling any-size graphs due to the scalability issues in practice. In this work, we develop a theoretical framework for graph classification problems in the partial observation setting (i.e., subgraph samplings). Equipped with insights from graph limit theory, we propose a new graph classification model that works on a randomly sampled subgraph and a novel topology to characterize the representability of the model. Our theoretical framework contributes a theoretical validation of mini-batch learning on graphs and leads to new learning-theoretic results on generalization bounds as well as size-generalizability without assumptions on the input.

[OpenReview] [Conference] [Proceeding]

Kento Nozawa, Issei Sato

Instance discriminative self-supervised representation learning has attracted attention thanks to its unsupervised nature and informative feature representation for downstream tasks. In practice, it commonly uses a larger number of negative samples than the number of supervised classes. However, there is an inconsistency in the existing analysis; theoretically, a large number of negative samples degrades classification performance on a downstream supervised task, while empirically, it improves the performance. We provide a novel framework to analyze this empirical result regarding negative samples using the coupon collector’s problem. Our bound can implicitly incorporate the supervised loss of the downstream task in the self-supervised loss by increasing the number of negative samples. We confirm that our proposed analysis holds on real-world benchmark datasets.
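As background for the coupon collector’s problem mentioned above, the following minimal sketch computes how many random draws are needed before every class has appeared at least once. It assumes classes are drawn uniformly and independently, which is a simplification; it illustrates the classic problem only and is not the bound derived in the paper.

```python
from math import comb


def expected_draws_to_see_all(num_classes: int) -> float:
    """Expected number of uniform draws until every class has appeared: C * H_C."""
    return num_classes * sum(1.0 / k for k in range(1, num_classes + 1))


def prob_all_classes_drawn(num_classes: int, num_draws: int) -> float:
    """Probability that num_draws uniform draws cover all classes (inclusion-exclusion)."""
    return sum(
        (-1) ** j * comb(num_classes, j) * (1 - j / num_classes) ** num_draws
        for j in range(num_classes + 1)
    )
```

For example, with 10 equally likely classes, about 29.3 draws are needed on average to see all of them, which gives some intuition for why the number of negative samples is often much larger than the number of classes.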

[OpenReview] [Conference] [Proceeding]

Hiroki Furuta, Tadashi Kozuno, Tatsuya Matsushima, Yutaka Matsuo, Shixiang (Shane) Gu

Recently, many algorithms have been devised for reinforcement learning (RL) with function approximation. While they have clear algorithmic distinctions, they also have many implementation differences that are algorithm-independent and sometimes under-emphasized. Such mixing of algorithmic novelty and implementation craftsmanship makes rigorous analyses of the sources of performance improvements across algorithms difficult. In this work, we focus on a series of off-policy inference-based actor-critic algorithms (MPO, AWR, and SAC) to decouple their algorithmic innovations and implementation decisions. We present unified derivations through a single control-as-inference objective, where we can categorize each algorithm as based on either Expectation-Maximization (EM) or direct Kullback-Leibler (KL) divergence minimization and treat the rest of the specifications as implementation details. We performed extensive ablation studies and identified substantial performance drops whenever implementation details are mismatched for algorithmic choices. These results show which implementation or code details are co-adapted and co-evolved with algorithms, and which are transferable across algorithms: as examples, we identified that tanh Gaussian policy and network sizes are highly adapted to algorithmic types, while layer normalization and ELU are critical for MPO’s performance but also transfer to noticeable gains in SAC. We hope our work can inspire future work to further demystify sources of performance improvements across multiple algorithms and allow researchers to build on one another’s algorithmic and implementational innovations.

Our social session will be held in Gather. The Gather link will be sent to you via email after your registration.
If you have any questions, please contact us at


Organizing Committee

  • Nan Lu (The University of Tokyo 東京大学)
  • Yivan Zhang (The University of Tokyo 東京大学, RIKEN AIP 理研AIP)
  • Takashi Ishida 石田 隆 (The University of Tokyo 東京大学, RIKEN AIP 理研AIP)
  • Hoang NT (Tokyo Institute of Technology 東京工業大学, RIKEN AIP 理研AIP)
  • Kai Arulkumaran (Araya)
  • Suzana Ilic (MLTokyo, Hugging Face)
  • Hikari Amano 天野 光 (MathWorks Japan)
  • Satoko Shimada 島田 聡子 (MathWorks Japan)

Advisory Committee

  • Masashi Sugiyama 杉山 将 (RIKEN AIP 理研AIP, The University of Tokyo 東京大学)
