Aim of this workshop

In this workshop we will use two published phosphoproteomics datasets to illustrate some key components of the PhosR package, including:

  1. Phosphoproteomic data preprocessing (normalisation, imputation, and quality control),
  2. Knowledge-based kinase perturbation analysis (using direction analysis),
  3. Kinase substrate predictions (using positive unlabelled learning).

At the end of this workshop you should have a basic understanding of the key steps in phosphoproteomics data analysis and of some computational and statistical methods that can be applied at each step.

Prerequisites

Ideally, you should be somewhat familiar with R. If you haven’t used R before, you can still pick up the key elements of phosphoproteomics data analysis by running the code provided.

Running examples using Google Cloud

To avoid issues that depend on your own computer setup, we provide a Google Cloud-based virtual machine for running all examples in this tutorial. To use this option, follow the instructions to initiate a Google Cloud virtual machine.

Local installation

If you have experience with R and would like to install the packages on your local computer, so that you can use them to analyse data in the future, please first download and install R and RStudio if you don’t already have them, and then install the PhosR package as below:

devtools::install_github("PengyiYang/PhosR")

You will also need to install the CRAN packages directPA and ClueR to run the examples provided in the tutorial.
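The CRAN installation step can be sketched as follows. This is a minimal sketch rather than part of the original workshop material: the helper name `missing_pkgs` is hypothetical, and the `interactive()` guard is an assumption added so that knitting or sourcing the file never triggers a download. Only the package names directPA, ClueR, and devtools come from the text above.

```r
# CRAN packages named in the tutorial text.
cran_pkgs <- c("directPA", "ClueR")

# Hypothetical helper (not part of PhosR): return the subset of `pkgs`
# whose namespace cannot be loaded, i.e. packages that are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# devtools is included because install_github() is called through it.
to_install <- missing_pkgs(c(cran_pkgs, "devtools"))

# Assumption: install only in an interactive session (e.g. the RStudio
# console), so that non-interactive runs never start a download.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Checking first and installing only what is missing avoids reinstalling packages that are already present, which can otherwise change package versions in the middle of a workshop.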

Finally, you can find all the data and materials here.

References

Methodologies

  1. Knowledge-Based Analysis for Detecting Key Signaling Events from Time-Series Phosphoproteomics Data, Pengyi Yang et al. PLoS Computational Biology, 2015.

  2. KinasePA: Phosphoproteomics data annotation using hypothesis driven kinase perturbation analysis, Pengyi Yang et al. Proteomics, 2016.

  3. Positive-unlabeled ensemble learning for kinase substrate prediction from dynamic phosphoproteomics data, Pengyi Yang et al. Bioinformatics, 2016.

Data

  1. Multi-omic Profiling Reveals Dynamics of the Phased Progression of Pluripotency, Pengyi Yang et al. Cell Systems, 2019.

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.6.1  magrittr_1.5    tools_3.6.1     htmltools_0.4.0
##  [5] yaml_2.2.0      Rcpp_1.0.3      stringi_1.4.3   rmarkdown_1.18 
##  [9] knitr_1.26      stringr_1.4.0   xfun_0.11       digest_0.6.23  
## [13] rlang_0.4.2     evaluate_0.14