Blog – Shimodaira Lab

2022-03-24

Young Researcher Encouragement Award at the 28th Annual Meeting of the Association for Natural Language Processing (NLP2022)

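Momose Oyama, a fourth-year undergraduate (B4), received the Young Researcher Encouragement Award (12 awardees out of 280 eligible presentations) for research on word embeddings, which represent the meanings of words as vectors. The awarded work, “The Length of a Word Vector Represents the Strength of Its Meaning” (「単語ベクトルの長さは意味の強さを表す」), is joint research by Momose Oyama (Kyoto University, RIKEN AIP), Sho Yokoi (Tohoku University, RIKEN AIP), and Hidetoshi Shimodaira (Kyoto University, RIKEN AIP).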

ANLP website: https://www.anlp.jp/nlp2022/award.html

2022-03-24

Seminar on selective inference via multiscale bootstrap (2021/12/15)

Public YouTube video

EPFL CIS-RIKEN AIP Joint Seminar #6 20211215

Date and Time: December 15, 2021, 6:00pm – 7:00pm (JST)
10:00am – 11:00am (CET)
Venue: Zoom webinar

Language: English

Speaker: Hidetoshi Shimodaira, RIKEN AIP

Title: Selection bias may be adjusted when the sample size is negative in hierarchical clustering, phylogeny, and variable selection

Abstract:
For computing p-values, you should specify hypotheses before looking at data. However, people tend to use datasets twice for hypothesis selection and evaluation, leading to inflated statistical significance and more false positives than expected. Recently, a new statistical method, called selective inference or post-selection inference, has been developed for adjusting this selection bias. On the other hand, we also face biased p-values in multiple testing, although it is a different type of selection bias. In this talk, I present a bootstrap resampling method with a “negative sample size” for adjusting these two types of selection bias. The theory is based on a geometric idea in the data space, which bridges Bayesian posterior probability to the frequentist p-value. Examples are shown for the confidence interval of regression coefficients after model selection and significance levels of trees and edges in hierarchical clustering and phylogenetic inference.
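The “negative sample size” idea can be illustrated with a minimal sketch, assuming the scaling law used in multiscale bootstrap: if BP(σ²) is the bootstrap probability obtained by resampling n′ = n/σ² observations, then ψ(σ²) = σ·Φ̄⁻¹(BP(σ²)) ≈ v + cσ², where v is a signed distance, c a curvature term, and Φ̄ the upper-tail standard normal probability. Fitting this line at positive scales and evaluating it at σ² = −1 (that is, n′ = −n) gives the bias-adjusted p-value Φ̄(v − c), often called the approximately unbiased (AU) p-value, while σ² = 1 recovers the ordinary BP, Φ̄(v + c). The function names and the synthetic BP values are hypothetical, and the selective-inference adjustment discussed in the talk is not included.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def z_upper(p):
    """Upper-tail quantile: the z with P(Z > z) = p for Z ~ N(0, 1)."""
    return STD_NORMAL.inv_cdf(1.0 - p)


def p_upper(z):
    """Upper-tail probability P(Z > z) for Z ~ N(0, 1)."""
    return 1.0 - STD_NORMAL.cdf(z)


def fit_scaling_law(scales, bps):
    """Least-squares fit of psi(s) = v + c * s.

    s = sigma^2 = n / n' is the scale (n' = bootstrap sample size), and
    psi(s) = sigma * z_upper(BP(s)) is computed from the bootstrap
    probability BP(s) observed at that scale.
    """
    xs = list(scales)
    ys = [math.sqrt(s) * z_upper(bp) for s, bp in zip(xs, bps)]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    c = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    v = my - c * mx
    return v, c


def extrapolated_pvalue(v, c):
    """Evaluate the fitted psi at s = -1 (n' = -n, a "negative sample size")."""
    return p_upper(v - c)


# Hypothetical example: synthetic BP values generated from assumed v = 1.0 and
# c = 0.3 (not real data), "observed" at five scales around sigma^2 = 1.
true_v, true_c = 1.0, 0.3
scales = [0.5, 0.75, 1.0, 1.25, 1.5]
bps = [p_upper((true_v + true_c * s) / math.sqrt(s)) for s in scales]

v_hat, c_hat = fit_scaling_law(scales, bps)
ordinary_bp = p_upper(v_hat + c_hat)             # psi(1): the usual bootstrap probability
adjusted_p = extrapolated_pvalue(v_hat, c_hat)   # psi(-1): bias-adjusted p-value
print(f"v={v_hat:.3f} c={c_hat:.3f} BP={ordinary_bp:.3f} adjusted p={adjusted_p:.3f}")
```

In real use, each BP(σ²) would come from counting how often the hypothesis is supported among bootstrap replicates of size n′; a sample of size −n cannot actually be drawn, so σ² = −1 is reached only by extrapolating the fitted line.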

Bio:
Hidetoshi Shimodaira is a professor at Kyoto University and a team leader at RIKEN AIP. He has been working on theory and methods of statistics and machine learning. His multiscale bootstrap method is used in genomics for evaluating the statistical significance of trees and clusters. His “covariate shift” setting for transfer learning is popular in machine learning.

2021-10-09

Change of the laboratory's division name (2021/10/08)

The division name of our laboratory at the Graduate School of Informatics, Kyoto University, has been changed.

Former division name: 数理システム論 (Mathematical System Theory)

New division name: 統計知能 (Statistical Intelligence)

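Aiming to explore “understanding and thinking” through statistical methodology, the laboratory conducts mathematical research on the methods that underlie artificial intelligence and data science. We call data-driven inductive inference that integrates statistics and machine learning “Statistical Intelligence,” and we have adopted it as the laboratory's division name.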

2021-07-07 (updated 2022-03-24)

AIP Open Seminar (2021/07/07)

A seminar introducing the research of the Mathematical Statistics Team.

Public YouTube video

2021/07/07 15:00-17:00, streamed online via Zoom (registration required)

Mathematical Statistics Team (https://aip.riken.jp/labs/generic_tech/math_stat/) at RIKEN AIP

Speaker 1: Hidetoshi Shimodaira (30 mins)
Title: Statistical Intelligence for Advanced Artificial Intelligence
Abstract: Our goal is to develop a data-driven methodology with statistical inference for artificial intelligence, which may be called “statistical intelligence.” In the first half of the talk, I give an overview of our research topics: (1) representation learning via graph embedding for multimodal relational data, (2) valid inference via bootstrap resampling for many hypotheses with selection bias, and (3) statistical estimation of growth mechanisms from complex networks. In the second half of the talk, I discuss a generalization of the “additive compositionality” of word embeddings in natural language processing. I show how to compute distributed representations for logical operations, including AND, OR, and NOT, which could serve as a basis for implementing “advanced thinking” in future AI.
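Additive compositionality refers to the well-known observation that adding and subtracting word vectors often composes meanings, as in king − man + woman ≈ queen. The sketch below shows only this classic analogy, using hand-made, hypothetical 2-D vectors and a cosine nearest-neighbour search; it does not reproduce the AND/OR/NOT construction from the talk, which the abstract does not specify.

```python
import math

# Hypothetical 2-D toy embeddings: axis 0 ~ "royalty", axis 1 ~ "male (+) / female (-)".
EMB = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "apple": (-1.0, 0.0),
}


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(a, b, c, emb=EMB):
    """Solve a : b = c : ? by returning the word w (other than a, b, c)
    whose vector is most cosine-similar to vec(b) - vec(a) + vec(c)."""
    target = tuple(eb - ea + ec for ea, eb, ec in zip(emb[a], emb[b], emb[c]))
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, emb[w]))


answer = analogy("man", "king", "woman")  # king - man + woman
print(answer)
```

Excluding the query words is a common convention in analogy evaluation; without it, the nearest neighbour of the composed vector is often one of the inputs.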

Speaker 2: Akifumi Okuno (30 mins)
Title: Approximation Capability of Graph Embedding using Siamese Neural Network
Abstract: In this talk, we present our studies on the approximation capability of graph embedding using the Siamese neural network (NN). Previous work has typically applied the inner product similarity (IPS) to the neural network outputs, which limits the overall Siamese NN to approximating only positive-definite similarities. To overcome this limitation, we propose novel similarities for the Siamese NN, called shifted inner product similarity (SIPS) and weighted inner product similarity (WIPS). We theoretically prove and empirically demonstrate their improved approximation capabilities.
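The positive-definiteness limitation can be seen on a two-item example. Below, IPS is the plain inner product, while SIPS and WIPS follow our reading of the proposed similarities (SIPS adds per-item bias terms u(x) + u(y); WIPS weights each coordinate by λ_k, which may be negative); treat these forms as assumptions. The embeddings and weights are hypothetical values chosen by hand, not outputs of a trained Siamese NN.

```python
def ips(fx, fy):
    """Inner product similarity <f(x), f(y)>."""
    return sum(a * b for a, b in zip(fx, fy))


def sips(fx, fy, ux, uy):
    """Shifted IPS (assumed form): <f(x), f(y)> + u(x) + u(y)."""
    return ips(fx, fy) + ux + uy


def wips(fx, fy, weights):
    """Weighted IPS (assumed form): sum_k lambda_k * f_k(x) * f_k(y);
    the weights lambda_k may be negative."""
    return sum(w * a * b for w, a, b in zip(weights, fx, fy))


# Target similarity between two items: self-similarity 0, cross-similarity 1.
# Its 2x2 matrix [[0, 1], [1, 0]] has determinant -1 < 0, so it is not positive
# semi-definite, and no IPS (whose similarity matrix is a Gram matrix) can match it.
target = [[0.0, 1.0], [1.0, 0.0]]
det = target[0][0] * target[1][1] - target[0][1] * target[1][0]

# Hypothetical embeddings and coordinate weights chosen by hand for illustration.
s = 2 ** -0.5
f = [(s, s), (s, -s)]
lam = (1.0, -1.0)
reproduced = [[wips(f[i], f[j], lam) for j in range(2)] for i in range(2)]
print(det, reproduced)
```

Since every IPS similarity matrix is positive semi-definite regardless of the embedding dimension, a target with a negative determinant is out of reach for IPS, whereas a WIPS with one negative weight matches it here up to floating-point rounding.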

Speaker 3: Yoshikazu Terada (30 mins)
Title: Selective inference via multiscale bootstrap and its application
Abstract: We consider a general approach to selective inference for hypothesis testing of the null hypothesis represented as an arbitrarily shaped region in the parameter space of the multivariate normal model. This approach is useful for hierarchical clustering, where confidence levels of clusters are calculated only for those appearing in the dendrogram, subject to heavy selection bias. Our computation is based on a raw confidence measure, called bootstrap probability, which is easily obtained by counting how many times the same cluster appears in bootstrap replicates of the dendrogram. We adjust the bias of the bootstrap probability by utilizing the scaling law in terms of geometric quantities of the region in the abstract parameter space, namely, signed distance and mean curvature. Although this idea has been used for non-selective inference of hierarchical clustering, its selective inference version has not been discussed in the literature. Our bias-corrected p-values are asymptotically second-order accurate in the large sample theory of smooth boundary surfaces of regions, and they are also justified for nonsmooth surfaces such as polyhedral cones. Moreover, the p-values are asymptotically equivalent to those of the iterated bootstrap but with less computation.
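The raw bootstrap probability of a cluster can be sketched as follows, assuming single-linkage clustering with Euclidean distance between items (the abstract does not specify the linkage or distance) and hypothetical toy data. Observations (rows) are resampled with replacement, a dendrogram is rebuilt for each replicate, and the BP of a cluster is the fraction of replicates whose dendrogram contains it. This is only the uncorrected BP; the scaling-law bias adjustment described in the abstract is not included.

```python
import math
import random
from itertools import combinations


def euclid(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def single_linkage_clusters(columns):
    """Agglomerative single-linkage clustering of the given item vectors.

    Returns the set of non-singleton clusters (frozensets of item indices)
    that appear in the dendrogram, including the root.
    """
    clusters = [frozenset([i]) for i in range(len(columns))]
    found = set()
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # single linkage: distance between the closest pair of members
            d = min(euclid(columns[i], columns[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = clusters[a] | clusters[b]
        found.add(merged)
        clusters = [cl for k, cl in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return found


def bootstrap_probabilities(data, n_boot=200, seed=0):
    """BP of each cluster = fraction of bootstrap dendrograms containing it.

    data is a list of observations (rows); each row holds one value per item.
    Items are clustered, and rows are resampled with replacement.
    """
    rng = random.Random(seed)
    n = len(data)
    counts = {}
    for _ in range(n_boot):
        rows = [data[rng.randrange(n)] for _ in range(n)]
        columns = list(zip(*rows))  # one vector per item
        for cluster in single_linkage_clusters(columns):
            counts[cluster] = counts.get(cluster, 0) + 1
    return {cl: k / n_boot for cl, k in counts.items()}


# Hypothetical data: items 0 and 1 share one latent signal, items 2 and 3 another.
gen = random.Random(1)
data = []
for _ in range(30):
    s1, s2 = gen.gauss(0, 1), gen.gauss(0, 1)
    data.append([s1 + gen.gauss(0, 0.1), s1 + gen.gauss(0, 0.1),
                 s2 + gen.gauss(0, 0.1), s2 + gen.gauss(0, 0.1)])

bp = bootstrap_probabilities(data)
for cluster, p in sorted(bp.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(cluster), p)
```

With clearly separated groups, as here, the raw BP of each true cluster is close to 1; the selection bias the talk addresses matters most for borderline clusters, which this toy example deliberately avoids.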

Speaker 4: Thong Pham (30 mins)
Title: Some recent progress in modeling preferential attachment of growing complex networks
Abstract: Preferential attachment (PA) is a network growth mechanism commonly invoked to explain the heavy-tailed degree distributions characteristic of growing networks that represent diverse real-world phenomena. In this talk, I will review some of our recent PA-related work, including a new method for estimating the nonparametric PA function from a single snapshot and a new condition for Bose-Einstein condensation in complex networks.
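Under preferential attachment, a node with degree k receives a new edge with probability proportional to a PA function A_k; A_k = k gives the linear PA of the Barabási–Albert model. The sketch below simulates network growth for a user-supplied A_k, with hypothetical function and parameter names. It does not implement the single-snapshot estimation method or the condensation condition from the talk.

```python
import random


def grow_network(n_nodes, m, attach_fn, seed=0):
    """Grow an undirected network by preferential attachment.

    Starts from a complete graph on m + 1 nodes. Each new node then adds m
    edges to distinct existing nodes, picking node i with probability
    proportional to attach_fn(degree[i]) (the PA function A_k at k = degree).
    """
    rng = random.Random(seed)
    degree = [m] * (m + 1)
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            candidates = [i for i in range(new) if i not in targets]
            weights = [attach_fn(degree[i]) for i in candidates]
            targets.add(rng.choices(candidates, weights=weights, k=1)[0])
        degree.append(0)  # degree[new]
        for t in targets:
            edges.append((t, new))
            degree[t] += 1
            degree[new] += 1
    return degree, edges


# Linear PA, A_k = k (the Barabasi-Albert choice); e.g. k ** 0.5 would give sublinear PA.
degree, edges = grow_network(n_nodes=500, m=2, attach_fn=lambda k: k, seed=42)
print(len(degree), len(edges), max(degree))
```

Swapping in a different `attach_fn` changes how strongly high-degree nodes attract new edges, which is the quantity that nonparametric PA estimation tries to recover from observed networks.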

2021-03-25

Excellent Paper Award and Committee Special Award at the 27th Annual Meeting of the Association for Natural Language Processing (NLP2021)

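Our research on word embeddings, which represent the meanings of words as vectors, received two awards. This is joint research by Sho Yokoi (Tohoku University, RIKEN AIP), Masahiro Naito (Kyoto University, RIKEN AIP), and Hidetoshi Shimodaira (Kyoto University, RIKEN AIP).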

Excellent Paper Award: Sho Yokoi, Hidetoshi Shimodaira, “Stochastic Isotropization of Word Embeddings” (「単語埋め込みの確率的等方化」)

Committee Special Award: Masahiro Naito, Sho Yokoi, Hidetoshi Shimodaira, “Logical Operations with Word Embeddings” (「単語埋め込みによる論理演算」)

ANLP website: https://www.anlp.jp/nlp2021/award.html

Congratulations to Naito (B4), our new lab member!

2020-05-15 (updated 2020-05-18)

Gave a talk at “みんなのPython勉強会” (Minna no Python Study Group)

I gave a talk at the event “みんなのPython勉強会#57 祝5周年 – データサイエンス祭り！！” (Minna no Python Study Group #57: 5th Anniversary – Data Science Festival!!).

(1:30 to 2:03 in the YouTube video)

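“How to Create New Methods in Statistics and Machine Learning” (『統計学・機械学習における新しい手法のつくりかた』)
Prof. Hidetoshi Shimodaira (Kyoto University, RIKEN, @hshimodaira)
Data science puts a wide range of statistical and machine learning methods into practice. When implementing a textbook method's algorithm in Python or another language does not solve your problem, you have to create a new method yourself. I have previously devised methods such as one used in the phylogenetic analysis of the novel coronavirus and one related to batch normalization in deep learning. In this talk, I will discuss how to create such new methods.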

2020-02-24 (updated 2023-04-20)

Lab visits for students considering lab assignment

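We welcome visits and consultations at any time from students interested in being assigned to this laboratory. Please contact the faculty members (Shimodaira, Honda) by email. You are welcome to visit more than once.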

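Your visit will be more worthwhile if you come with some idea of the research done in the lab, since you can then ask concrete questions. Please look at the following items on the Research page before visiting.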

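Research introduction links: The research digest and research introduction pages give a sense of what kind of research we do and what the lab is like. If something there interests you, ask specific questions about it during your visit.
Citation counts: Google Scholar lists our papers in descending order of citation count (a kind of popularity ranking), which shows which past research has been well received.
Recent papers: These are recent papers written in the lab. They may be a bit difficult, but please click on a few and skim them. They summarize the lab's activities; papers could be called the lab's “products.”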

A list of bachelor's and master's theses is available here for reference.

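The lab has no core hours, but it is better to come to the lab as often as possible and interact with the other members. Doing so makes it easier to settle on a research topic and often leads to better research results. The lab sometimes organizes events such as tea parties, but participation is optional. For study, the lab seminar, individual meetings, and reading groups are required activities. The books for the reading group are chosen by a vote among candidates proposed by students.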

2020-02-13