scaleboot Home Page

scaleboot: Approximately Unbiased P-values via Multiscale Bootstrap

Contact

Hidetoshi Shimodaira
Graduate School of Informatics
Kyoto University
Jointly affiliated with RIKEN AIP
Lab website

What is scaleboot?

scaleboot is an add-on package for R. It calculates approximately unbiased (AU) p-values for a hypothesis from a set of multiscale bootstrap probabilities. Scaling is equivalent to changing the sample size of the data set in bootstrap resampling. We calculate bootstrap probabilities at several scales, and from these a very accurate p-value is calculated. This multiscale bootstrap method has been implemented in the CONSEL software and the pvclust package. The main aim of the scaleboot package is to calculate an improved version of AU p-values that is justified even for hypotheses with nonsmooth boundaries, by taking care of the singularity.
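As a rough illustration of the idea, the following base-R sketch computes bootstrap probabilities at several scales and extrapolates them to an AU p-value. It is not the algorithm used inside scaleboot: it assumes the simple linear model of Shimodaira (2002), fits it by ordinary least squares, and does not handle nonsmooth boundaries. The toy data, the hypothesis region, and the names `in_region` and `boot_prob` are hypothetical and chosen only for this example.

```r
# Minimal base-R sketch of multiscale bootstrap; NOT the scaleboot implementation.
set.seed(1)

# Toy data (hypothetical): n two-dimensional observations whose sample mean
# is exactly (0.2, 0).
n <- 100
z <- matrix(rnorm(2 * n), ncol = 2)
z <- sweep(z, 2, colMeans(z))        # center the columns at zero
x <- sweep(z, 2, c(0.2, 0), "+")     # shift the sample mean to (0.2, 0)

# Hypothetical hypothesis region: the mean lies at distance >= 0.3 from the origin.
in_region <- function(m) sqrt(sum(m^2)) >= 0.3

# Bootstrap probability of the region when resampling n_prime rows.
boot_prob <- function(x, n_prime, nb) {
  hits <- replicate(nb, {
    idx <- sample.int(nrow(x), n_prime, replace = TRUE)
    in_region(colMeans(x[idx, , drop = FALSE]))
  })
  mean(hits)
}

n_prime <- round(n / 3^seq(-1, 1, length.out = 7))  # bootstrap sample sizes n'
sa <- n / n_prime                                   # scales sigma^2 = n / n'
nb <- 2000                                          # replicates per scale
bp <- sapply(n_prime, function(m) boot_prob(x, m, nb))

# Linear model: sigma * qnorm(1 - BP) = v + c * sigma^2, fitted by least squares.
ok <- bp > 0 & bp < 1                # drop scales where qnorm would be infinite
d <- data.frame(psi = sqrt(sa[ok]) * qnorm(1 - bp[ok]), s2 = sa[ok])
beta <- unname(coef(lm(psi ~ s2, data = d)))
v <- beta[1]
cc <- beta[2]

au <- 1 - pnorm(v - cc)    # extrapolation to sigma^2 = -1
bp1 <- 1 - pnorm(v + cc)   # fitted bootstrap probability at sigma^2 = 1
print(round(c(AU = au, BP = bp1), 3))
```

The scale is varied only through the bootstrap sample size n', which is why no extra data are needed. In this sketch the AU p-value is the fitted model evaluated at the formal scale sigma^2 = -1, whereas the ordinary bootstrap probability corresponds to sigma^2 = 1.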

The scaleboot package includes an interface to the pvclust package for R, which bootstraps hierarchical clustering. We use pvclust to calculate multiscale bootstrap probabilities, from which we calculate an improved version of AU p-values using scaleboot.
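The workflow just described might look like the sketch below. It assumes that both packages are installed from CRAN and uses the `lung` example data shipped with pvclust. The column subset, the choice of scales, and the number of replicates are illustrative assumptions that keep the run short and are not recommended settings. The code is skipped when either package is missing.

```r
# Sketch only: requires the CRAN packages pvclust and scaleboot.
if (requireNamespace("pvclust", quietly = TRUE) &&
    requireNamespace("scaleboot", quietly = TRUE)) {
  library(pvclust)
  library(scaleboot)

  data(lung)                 # example data shipped with pvclust
  x <- lung[, 1:10]          # small subset of columns (assumption, for speed)

  # scaleboot works with scales sigma^2 = n/n'; pvclust takes r = n'/n = 1/sigma^2.
  sa <- 9^seq(-1, 1, length.out = 13)
  set.seed(1)
  fit_pv <- pvclust(x, r = 1 / sa, nboot = 1000)   # multiscale bootstrap probabilities

  fit_sb <- sbfit(fit_pv)               # fit models to the bootstrap probabilities
  fit_k3 <- sbpvclust(fit_pv, fit_sb)   # pvclust object with p-values from scaleboot

  print(fit_k3)
}
```

The result of `sbpvclust` can be printed and plotted like an ordinary pvclust result. See the package documentation for the available arguments before relying on the defaults used here.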

scaleboot has a front end for phylogenetic inference, and it can replace the CONSEL software for testing phylogenetic trees. Currently, scaleboot cannot convert the output files of the various phylogenetic software packages, so CONSEL must be used for this conversion before applying scaleboot to calculate an improved version of AU p-values for trees and edges.

The package vignette "Multiscale Bootstrap Using Scaleboot Package" (usesb.pdf) explains the methodology. It includes a simple example for illustration, as well as real applications in hierarchical clustering and phylogenetic inference. Further description is given in Shimodaira (2008), which may be cited when using scaleboot.

New in December 2019. A new method, "Selective Inference" (SI), is now implemented in the stable versions of scaleboot (1.0-1) and pvclust (2.2-0), which are available on CRAN. Previously, SI was implemented in the development versions of scaleboot (1.0-0) and pvclust (2.1-0), available only at the github sites (January 2019). SI may replace the AU implemented in pvclust. See "Computing selective inference p-values of clusters using pvclust and scaleboot" (pvclust.pdf pvclust.html). SI is also implemented for phylogenetic inference. See "Phylogenetic Tree Selection" (phylo.pdf phylo.html) and "Model Map in Phylogenetics" (modelmap.pdf modelmap.html).

The method of SI (also known as Post-Selection Inference) is explained in Shimodaira and Terada (2019), which may be cited when using SI values. The theory of SI is described in Terada and Shimodaira (2017).

Hosts

The stable version is available on CRAN; SI is included in this version. (Use this version as of December 2019.)

The development version, as well as the stable version, is available at github. (No need to use this as of December 2019.)

scaleboot at github. The development version is on the develop branch. The stable version is on the master branch.
pvclust at github. The development version is on the develop branch. The stable version is on the master branch.

Install

Both scaleboot and pvclust are easily installed from CRAN. RStudio users can install the package by choosing "scaleboot" from the pull-down menu. Otherwise, run R on your computer and type


		> install.packages("scaleboot")

		> install.packages("pvclust")
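After installation, one way to confirm that the installed versions include SI is to compare them with the version numbers stated above (scaleboot 1.0-1 and pvclust 2.2-0). The helper name `has_si_version` is hypothetical and used only for this sketch.

```r
# Minimum versions that include SI, as stated on this page.
required <- c(scaleboot = "1.0-1", pvclust = "2.2-0")

# TRUE/FALSE for an installed package, NA if the package is not installed.
has_si_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(NA)
  packageVersion(pkg) >= min_version
}

status <- mapply(has_si_version, names(required), required)
print(status)
```

A value of FALSE suggests updating the package from CRAN, and NA means that the package is not installed.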

The development versions of scaleboot and pvclust are installed from github. You need the devtools package to install from github.


		> install.packages("devtools")  # binary version is ok

		> library(devtools)

		> install_github("shimo-lab/scaleboot", ref = "develop", subdir = "src") 

		> install_github("shimo-lab/pvclust", ref = "develop", subdir = "src")

Dataset files

Supplementary dataset files for phylogenetic inference are available at dataset/mam15-files directory of the github site.

References

Shimodaira, H. (2002). An approximately unbiased test of phylogenetic tree selection. Systematic Biology, 51, 492-508.
Shimodaira, H. (2004). Approximately unbiased tests of regions using multistep-multiscale bootstrap resampling. Annals of Statistics, 32, 2616-2641. [PDF] [SUPPLEMENT]
Shimodaira, H. (2006). Approximately Unbiased Tests for Singular Surfaces via Multiscale Bootstrap Resampling. Research Reports B-430. Department of Mathematical and Computing Sciences, Tokyo Institute of Technology, Japan. [PDF]
Shimodaira, H. (2006). Technical Details of Multiscale Bootstrap for Singular Surfaces. Research Reports B-431. Department of Mathematical and Computing Sciences, Tokyo Institute of Technology, Japan. [PDF]
Shimodaira, H. (2008). Testing Regions with Nonsmooth Boundaries via Multiscale Bootstrap. Journal of Statistical Planning and Inference, 138, 1227-1241. http://dx.doi.org/10.1016/j.jspi.2007.04.001
Terada, R. and Shimodaira, H. (2017). Selective inference for the problem of regions via multiscale bootstrap. arXiv:1711.00949
Shimodaira, H. and Terada, R. (2019). Selective Inference for Testing Trees and Edges in Phylogenetics. arXiv:1902.04964, Front. Ecol. Evol., 24 May 2019. https://doi.org/10.3389/fevo.2019.00174