- New versions of
*pvclust*(>=2.2-0) and*scaleboot*(>=1.0-1) can compute selective inference (si) \(p\)-values as well as the conventional approximately unbiased (au) \(p\)-values and the bootstrap probability (bp). - We recommend si values in most cases instead of au and bp values.
- Several estimating methods of \(p\)-values are implemented in the software. The default
*pvclust*setting (e.g.,*lung.pvclust*below) is a simple yet good enough method; this can be computed without*scaleboot*. - More accurate, less biased versions of \(p\)-values (i.e., \(k=3\)) can be computed by
*scaleboot*. The setting of the wide range of scales (e.g.,*lung73.k3*below) is then recommended.

Our method for computing selective inference \(p\)-values via multiscale bootstrap is explained in Shimodaira and Terada (2019). The theory for the selective inference behind the method is given in Terada and Shimodaira (2017). The *pvclust* package is originally described in Suzuki and Shimodaira (2006) for non-selective inference, and the multiscale bootstrap method of *scaleboot* is originally described in Shimodaira (2008).

The selective inference is also known as post-selection inference. This takes account of the fact that the clusters in the dendrogram are selected by looking at data. This contradicts the basic assumption of the conventional statistical hypothesis testing, which requires null hypotheses are selected before looking at the data. The newly proposed si values are adjusting the bias of this effect, but the conventional au values suffer from the selection bias.

Hidetoshi Shimodaira and Yoshikazu Terada. Selective Inference for Testing Trees and Edges in Phylogenetics. Front. Ecol. Evol., 2019. https://doi.org/10.3389/fevo.2019.00174

Yoshikazu Terada and Hidetoshi Shimodaira. Selective inference for the problem of regions via multiscale bootstrap. arXiv:1711.00949, 2017. https://arxiv.org/abs/1711.00949

Hidetoshi Shimodaira. Testing regions with nonsmooth boundaries via multiscale bootstrap. Journal of Statistical Planning and Inference 138 (5), 1227-1241, 2008. https://doi.org/10.1016/j.jspi.2007.04.001

Ryota Suzuki and Hidetoshi Shimodaira. Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics 22 (12), 1540-1542, 2006. https://doi.org/10.1093/bioinformatics/btl117

We use the following two packages here. Both packages implement the multiscale bootstrap method. Older versions of *pvclust* and *scaleboot* compute only BP and AU, but newer versions (pvclust>=2.2-0, scaleboot>=1.0-1) compute SI as well.

```
library(pvclust) # computing p-values for clusters in hierarchical clustering
library(scaleboot) # computing p-values for general setttings
```

If your pc has cpu with many cores, then we can speed up bootstrap computation.

```
### dont run
library(parallel)
length(cl <- makeCluster(detectCores()))
```

We use the sample data of microarray expression profiles. It is \(n\times m\) matrix for \(n=916\) genes and \(m=73\) tumors. We compute clusters of tumors.

```
data(lung) # in pvclust
dim(lung)
```

`## [1] 916 73`

We may run pvclust as follows. The default scale is specifed as *r=seq(.5,1.4,by=.1)* in pvlcust. It is equivalent to \(\sigma^{-2} = 0.5, 0.6, \ldots, 1.4\) for multiscale bootstrap.

```
### dont run
lung.pvclust <- pvclust(lung, nboot=10000, parallel=cl)
```

The above code takes a long time. You can simply download the preveously performed result. The following code loads *lung.pvclust*, *lung73.pvclust*, *lung.sb*, and *lung73.sb* (see *help(lung73)*).

`data(lung73)`

The default plot of cluster dendrogram shows two types of \(p\)-values (au, bp), and the significant clusters with au > 0.95 are shown as red boxes. au is the approximately unbiased \(p\)-value, and bp is the boostrap probability.

```
plot(lung.pvclust, cex=0.5, cex.pv=0.5)
pvrect(lung.pvclust) # au > 0.95
```