Detection of polymorphisms from human NGS by PED


This section describes installing Linux (Ubuntu), setting up PED, and browsing the detected polymorphisms, using the Platinum human sequence data provided by Illumina.

The following is a PED analysis of the ERR194146 and ERR194147 sequence data.
Project title of the short reads: Whole genome sequencing and variant calls for the Coriell CEPH/UTAH 1463 family to create a "platinum" standard comprehensive set for variant calling improvement (PRJEB3381).

An example of a genome view in IGV

Indels, as well as SNPs, are detected exactly.
Nucleotide sequences of the primer pairs that amplify the target mutations are also shown.

Polymorphisms of ERR194146 detected by PED with ERR194158 (father) and ERR194159 (mother) as controls

ERR194146vsERR194159 shows the polymorphisms specific to the father, and ERR194146vsERR194158 shows the polymorphisms specific to the mother.
Segments inherited from the father and from the mother are clearly shown.
Links to the analyzed vcf files:
ERR194146vsERR194158.vcf (Mother specific)
ERR194146vsERR194159.vcf (Father specific)
View of polymorphisms in the IGV web application.
Ambiguous loci that map to two or more positions on the reference genome have been filtered out.

Installing Ubuntu

Download Ubuntu 20.04 LTS from the Ubuntu download site.
The Ubuntu installation tutorial is a helpful guide.
PED uses all available CPU power, so a machine dedicated to PED analysis is recommended.
PED is multithreaded. A CPU with 4 or 8 cores is recommended.
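Before starting an analysis, the number of available cores can be compared with the recommendation above. The following is a minimal sketch, assuming GNU coreutils `nproc` is available (it is on Ubuntu). The function name `check_cores` and its messages are illustrative and not part of PED.

```shell
# Hypothetical helper (not part of PED): compare the number of available
# CPU cores with a minimum, 4 by default as recommended above.
check_cores() {
    local min cores
    min=${1:-4}
    cores=$(nproc)    # nproc is provided by GNU coreutils
    if [ "$cores" -ge "$min" ]; then
        echo "OK: $cores cores available (minimum $min)"
    else
        echo "WARNING: only $cores cores available (minimum $min)"
        return 1
    fi
}
```

For example, `check_cores 4` prints an OK line on a 4-core or larger machine and returns a non-zero status otherwise.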

Setup and analysis of PED

Installing software required by PED

$ sudo apt update
$ sudo apt upgrade
$ sudo apt install curl
$ sudo apt install git
$ git clone https://github.com/akiomiyao/ped.git

To download sequence data, fastq-dump from the NCBI SRA Toolkit is required.
The toolkit can be downloaded from https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software
Setup of fastq-dump is described in detail at https://akiomiyao.github.io/ped/sratoolkit/index.html
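Whether the tools mentioned above are installed can be checked before starting the download. This is a minimal sketch; the function name `missing_tools` is illustrative and not part of PED, and it only tests that each command is found on the PATH, not that it works.

```shell
# Hypothetical helper (not part of PED): print each named command that
# cannot be found on the PATH; prints nothing when all are available.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, `missing_tools curl git perl fastq-dump` prints `fastq-dump` if the SRA Toolkit has not yet been added to the PATH.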
Download the fastq files of the Platinum genome sequences (ERR194146 and ERR194147) from the NCBI Sequence Read Archive.
```
$ cd ped
$ git pull
$ perl download.pl accession=ERR194146
$ perl download.pl accession=ERR194147
```
The git pull command updates the scripts to the latest version.
The fastq files are downloaded into ERR194146/read and ERR194147/read.
Downloading is slow and takes one or two days.
Error messages sometimes appear, but the script tries to reconnect. To confirm that the download is progressing, check that the files being downloaded keep growing.
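The size check described above can be scripted. The following is a minimal sketch; `read_bytes` and `is_growing` are illustrative names, not part of PED or download.pl, and the sketch only compares total file sizes at two points in time.

```shell
# Hypothetical helpers (not part of PED or download.pl).

# Print the total size in bytes of all regular files under directory $1.
read_bytes() {
    find "$1" -type f -exec wc -c {} + 2>/dev/null |
        awk '$NF != "total" { sum += $1 } END { printf "%.0f\n", sum }'
}

# Succeed if the files under directory $1 grew during $2 seconds.
is_growing() {
    local before after
    before=$(read_bytes "$1")
    sleep "$2"
    after=$(read_bytes "$1")
    [ "$after" -gt "$before" ]
}
```

For example, `is_growing ERR194146/read 60 && echo "still downloading"` reports whether the read directory grew within one minute.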
Detect polymorphisms in the ERR194146 and ERR194147 sequences against the human reference genome hg19.
```
$ perl ped.pl target=ERR194146,ref=hg19
$ perl ped.pl target=ERR194147,ref=hg19
```
Because the hg19 reference is already configured, the reference data are downloaded and processed automatically.
When all processes are done, ERR194146.vcf and ERR194147.vcf are created in the ERR194146 and ERR194147 directories, respectively.
Primer pair sequences are also written to the vcf files.
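Once a vcf file has been created, a quick summary of its contents can be useful. The sketch below counts SNPs, insertions and deletions by comparing the lengths of the REF and ALT alleles. It assumes the standard tab-separated VCF layout with REF in column 4 and ALT in column 5; the function name `count_variants` is illustrative and not part of PED.

```shell
# Hypothetical helper (not part of PED): count variant types in a VCF file.
# Assumes tab-separated records with REF in column 4 and ALT in column 5.
count_variants() {
    awk -F '\t' '
        /^#/ || NF < 5 { next }                  # skip headers and short lines
        $5 ~ /^</ || index($5, ",") > 0 { other++; next }   # symbolic or multiple ALT
        length($4) == 1 && length($5) == 1 { snp++; next }
        length($4) > length($5) { del++; next }
        length($4) < length($5) { ins++; next }
        { other++ }                              # same length, longer than 1 base
        END {
            printf "snp=%d insertion=%d deletion=%d other=%d\n", snp, ins, del, other
        }
    ' "$1"
}
```

For example, `count_variants ERR194146/ERR194146.vcf` prints one summary line for the file.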

If you have control sequences for the target, specify them with the control option:

$ perl ped.pl target=ERR194146,control=ERR194158,ref=hg19
$ perl ped.pl target=ERR194146,control=ERR194159,ref=hg19

Polymorphisms specific to the target are output.
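For a child whose parents are both sequenced, the two control runs differ only in the control accession. The sketch below prints both commands without running them, so they can be reviewed first. The function name `print_parent_runs` is illustrative and not part of PED; the options target, control and ref are the ones used above.

```shell
# Hypothetical helper (not part of PED): print, without running them, the
# two control runs for a child with both parents sequenced.
# $1: child accession, $2: father, $3: mother, $4: reference name
print_parent_runs() {
    # Father as control leaves the polymorphisms specific to the mother.
    echo "perl ped.pl target=$1,control=$2,ref=$4"
    # Mother as control leaves the polymorphisms specific to the father.
    echo "perl ped.pl target=$1,control=$3,ref=$4"
}
```

For example, `print_parent_runs ERR194146 ERR194158 ERR194159 hg19` prints the two commands shown above.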

For Docker users

Setting up Docker
$ sudo apt install docker.io
Example analysis:
$ sudo docker pull akiomiyao/ped
$ sudo docker run -w /ped -v `pwd`:/work akiomiyao/ped perl download.pl accession=ERR194146,wd=/work
$ sudo docker run -w /ped -v `pwd`:/work akiomiyao/ped perl ped.pl target=ERR194146,ref=hg19,wd=/work
Copy and paste the commands above into your terminal window, change the values of accession, target and clipping as needed, and run them.
Both the ped.pl script and the Docker container can be run on Google Cloud Platform and Amazon Web Services.
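The Docker commands above share the same structure, so they can be generated from the script name and its options. This is a minimal sketch that only prints the command; the function name `ped_docker_cmd` is illustrative and not part of PED, and the comments on /ped and wd=/work are inferred from the commands above rather than from PED documentation.

```shell
# Hypothetical helper (not part of PED): print the docker command for one
# PED script, mounting the current directory as /work in the container.
# $1: script name (download.pl or ped.pl), $2: comma-separated options
ped_docker_cmd() {
    # -w /ped      : run in /ped inside the container (assumed script location)
    # -v DIR:/work : make DIR on the host visible as /work in the container
    # wd=/work     : passed to the script so that results go to the mounted directory (assumed)
    printf 'sudo docker run -w /ped -v %s:/work akiomiyao/ped perl %s %s,wd=/work\n' \
        "$PWD" "$1" "$2"
}
```

For example, `ped_docker_cmd ped.pl target=ERR194147,ref=hg19` prints the analysis command for the second accession.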

Visualization of detected polymorphisms using IGV

Integrative Genomics Viewer (IGV) is a genome browser developed at the Broad Institute.
The IGV web application at https://igv.org/app/ is convenient.
- Download the pre-analyzed vcf files linked below.
- Import the files by selecting 'Local file' in the 'Tracks' pull-down menu. Because the vcf files are large, importing takes several minutes.
Alternatively, download the desktop version of IGV from http://software.broadinstitute.org/software/igv/download.
Drag ERR194146.vcf and ERR194147.vcf onto the main display panel of IGV to import the data.
Because the vcf files are large, IGV asks to create an index. Click OK to create it.
Creating the index requires enough memory (8 GB is enough on Linux); otherwise your computer may freeze.
If the index is created but the data are not displayed, restart IGV and the data will be shown.
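On Linux, the installed memory can be checked before creating the index. The following is a minimal sketch that reads the MemTotal entry of /proc/meminfo; the function name `mem_total_gb` is illustrative and not part of IGV or PED.

```shell
# Hypothetical helper (not part of IGV or PED): print total memory in GB,
# rounded to the nearest integer, from a Linux meminfo file.
# $1: path of the meminfo file (default: /proc/meminfo)
mem_total_gb() {
    # MemTotal is reported in kB; divide twice by 1024 to convert to GB.
    awk '$1 == "MemTotal:" { printf "%.0f\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}
```

For example, `[ "$(mem_total_gb)" -ge 8 ] || echo "less than 8 GB of memory"` warns before index creation is attempted.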
Click a chromosome number to show the map and polymorphisms of that chromosome.
Click a position of interest to zoom in on the region.
Hover the mouse over the vertical line of a polymorphism to show detailed information.
The primer pair sequences for the target polymorphism and the PCR product size, as well as the type and frequency of the polymorphism, are also shown.
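The records behind a region viewed in IGV can also be listed on the command line. The sketch below prints the vcf records of one chromosome within a position range. It assumes the standard tab-separated VCF layout with the chromosome in column 1 and the position in column 2; the function name `vcf_region` is illustrative and not part of IGV or PED, and the chromosome name must be given exactly as written in the file.

```shell
# Hypothetical helper (not part of IGV or PED): print the VCF records of
# one chromosome whose position lies between start and end, inclusive.
# $1: vcf file, $2: chromosome name as written in the file, $3: start, $4: end
vcf_region() {
    awk -F '\t' -v chrom="$2" -v start="$3" -v end="$4" '
        /^#/ { next }    # skip header lines
        $1 == chrom && $2 + 0 >= start + 0 && $2 + 0 <= end + 0
    ' "$1"
}
```

For example, `vcf_region ERR194146.vcf chr1 1000000 1100000` lists the records in a 100 kb window, assuming the file names chromosomes in the `chr1` style.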
The following files contain pre-analyzed data:
ERR194146.vcf
ERR194146.vcf.idx
ERR194147.vcf
ERR194147.vcf.idx
ERR194146vsERR194158.vcf (Mother specific)
ERR194146vsERR194159.vcf (Father specific)

Links for PED

Github: https://github.com/akiomiyao/ped/
Docker: sudo docker pull akiomiyao/ped
Paper: Polymorphic edge detection (PED): two efficient methods of polymorphism detection from next-generation sequencing data (2019) BMC Bioinformatics 20(1):362

Contact

Akio Miyao Ph.D (miyao@affrc.go.jp)
Institute of Crop Science / NARO
2-1-2, Kannondai, Tsukuba, Ibaraki, 305-8518, Japan