
DeepVariant: DeepVariant is a deep learning-based variant caller that takes alig ...


Open-source software name:

DeepVariant

Open-source software URL:

https://gitee.com/openvinotoolkit-prc/deepvariant

Open-source software description:

DeepVariant


DeepVariant is a deep learning-based variant caller that takes aligned reads (in BAM or CRAM format), produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports the results in a standard VCF or gVCF file.

DeepVariant supports:

  • Germline variant-calling in diploid organisms.
    • For somatic data or any other samples where the genotypes go beyond two copies of DNA, DeepVariant will not work out of the box, because the only genotypes supported are hom-alt, het, and hom-ref.
    • The models included with DeepVariant are trained only on human data. For other organisms, see the blog post on non-human variant-calling for some possible pitfalls and how to handle them.
  • Calling from NGS and long-read sequencing data.
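In VCF terms, the three supported genotypes correspond to GT values such as 0/0 (hom-ref), 0/1 (het) and 1/1 (hom-alt). As a rough sketch for inspecting a finished run, the hypothetical shell function `count_genotypes` below (not part of DeepVariant) tallies these classes in the first sample column of an uncompressed VCF read from stdin, counting missing or non-diploid GT values as "other". It assumes you feed it the VCF output rather than the gVCF, whose reference blocks would inflate the hom-ref count.

```bash
# Hypothetical helper (not part of DeepVariant): tally genotype classes in the
# first sample column of an uncompressed VCF read from stdin.
count_genotypes() {
  awk -F '\t' '
    /^#/ { next }                       # skip meta-information and header lines
    {
      split($10, fmt, ":")              # GT is the first FORMAT sub-field when present
      n = split(fmt[1], al, "[/|]")     # handles unphased (/) and phased (|) calls
      if (n != 2 || al[1] == "." || al[2] == ".") c["other"]++
      else if (al[1] != al[2]) c["het"]++   # includes multiallelic hets such as 1/2
      else if (al[1] == "0") c["hom-ref"]++
      else c["hom-alt"]++
    }
    END {
      printf "hom-ref %d\nhet %d\nhom-alt %d\nother %d\n",
             c["hom-ref"], c["het"], c["hom-alt"], c["other"]
    }'
}
```

If the output file is compressed, decompress it on the fly, for example `gzip -cd YOUR_OUTPUT_VCF | count_genotypes`; otherwise use `count_genotypes < YOUR_OUTPUT_VCF`.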

How to run

We recommend using our Docker solution. The command will look like this:

```bash
BIN_VERSION="1.0.0"
# --model_type: replace WGS with exactly one of [WGS,WES,PACBIO,HYBRID_PACBIO_ILLUMINA].
# --num_shards=$(nproc): uses all your cores to run make_examples. Feel free to change it.
docker run \
  -v "YOUR_INPUT_DIR":"/input" \
  -v "YOUR_OUTPUT_DIR":"/output" \
  google/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WGS \
  --ref=/input/YOUR_REF \
  --reads=/input/YOUR_BAM \
  --output_vcf=/output/YOUR_OUTPUT_VCF \
  --output_gvcf=/output/YOUR_OUTPUT_GVCF \
  --num_shards=$(nproc)
```
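Because `--model_type` must be exactly one of the four listed values, a wrapper script may want to reject a typo before starting a long run. A minimal sketch, using a hypothetical helper `is_valid_model_type` (not part of DeepVariant) that performs an exact, case-sensitive match against the list given for `--model_type` in the command above:

```bash
# Hypothetical helper (not part of DeepVariant): succeed only if $1 is exactly
# one of the model types listed for --model_type.
is_valid_model_type() {
  case "$1" in
    WGS|WES|PACBIO|HYBRID_PACBIO_ILLUMINA) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_valid_model_type "$MODEL_TYPE" || { echo "unsupported model type: $MODEL_TYPE" >&2; exit 1; }` placed before `docker run` stops the script early; `MODEL_TYPE` is a hypothetical variable you would then pass as `--model_type="$MODEL_TYPE"`.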

To see all flags you can use, run:

```bash
docker run google/deepvariant:"${BIN_VERSION}" --help
```

If you're using GPUs, or want to use Singularity instead, see Quick Start for more details, or see all the setup options available, including solutions on external platforms.

For more information, also see:

How to cite

If you're using DeepVariant in your work, please cite:

A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology 36, 983–987 (2018).
Ryan Poplin, Pi-Chuan Chang, David Alexander, Scott Schwartz, Thomas Colthurst, Alexander Ku, Dan Newburger, Jojo Dijamco, Nam Nguyen, Pegah T. Afshar, Sam S. Gross, Lizzie Dorfman, Cory Y. McLean, and Mark A. DePristo.
doi: https://doi.org/10.1038/nbt.4235

Additionally, if you are generating multi-sample calls using our DeepVariant and GLnexus Best Practices, please cite:

Accurate, scalable cohort variant calls using DeepVariant and GLnexus. bioRxiv 10.1101/2020.02.10.942086v1 (2020).
Taedong Yun, Helen Li, Pi-Chuan Chang, Michael F. Lin, Andrew Carroll, and Cory Y. McLean.
doi: https://doi.org/10.1101/2020.02.10.942086

Why Use DeepVariant?

  • High accuracy - In 2016 DeepVariant won the PrecisionFDA Truth Challenge for best SNP performance. DeepVariant maintains high accuracy across data from different sequencing technologies, prep methods, and species. For lower-coverage data, using DeepVariant makes an especially large difference. See metrics for the latest accuracy numbers on each of the sequencing types.
  • Flexibility - Out-of-the-box use for PCR-positive samples and low-quality sequencing runs, and easy adjustments for different sequencing technologies and non-human species.
  • Ease of use - No filtering is needed beyond setting your preferred minimum quality threshold.
  • Cost effectiveness - With a single non-preemptible n1-standard-16 machine on Google Cloud, it costs ~$9.11 to call a 30x whole genome and ~$0.39 to call an exome. With preemptible pricing, the cost is $2.19 for a 30x whole genome and $0.09 for a whole exome (not accounting for preemption).
  • Speed - On a 64-core CPU-only machine, DeepVariant completes a 50x WGS in 5 hours and an exome in 16 minutes (1). Multiple options for acceleration exist, bringing the WGS pipeline down to as little as 40 minutes (see External Solutions).
  • Usage options - DeepVariant can be run via Docker or binaries, on either on-premise hardware or in the cloud, with support for hardware accelerators like GPUs and TPUs.

(1): Time estimates do not include mapping.
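The cost and speed figures in the list above can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted there, via a small hypothetical helper `ratio` built on awk: preemptible pricing works out to roughly 24% of the on-demand price for a 30x genome and about 23% for an exome, and 5 hours (300 minutes) against 40 minutes is a 7.5x speedup, the same factor quoted for NVIDIA Clara Parabricks under External Solutions.

```bash
# Hypothetical helper: print a / b rounded to two decimal places.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 2.19 9.11  # preemptible vs. on-demand, 30x whole genome: prints 0.24
ratio 0.09 0.39  # preemptible vs. on-demand, whole exome: prints 0.23
ratio 300 40     # 5 h CPU-only vs. 40 min accelerated WGS: prints 7.50
```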

How DeepVariant works

[Figure: diagram of stages in DeepVariant]

For more information on the pileup images and how to read them, please see the "Looking through DeepVariant's Eyes" blog post.

DeepVariant relies on Nucleus, a library of Python and C++ code for reading and writing data in common genomics file formats (like SAM and VCF), designed for painless integration with the TensorFlow machine learning framework. Nucleus was built with DeepVariant in mind and open-sourced separately so it can be used by anyone in the genomics research community for other projects. See this blog post on Using Nucleus and TensorFlow for DNA Sequencing Error Correction.

DeepVariant Setup

Prerequisites

  • Unix-like operating system (cannot run on Windows)
  • Python 2.7

Official Solutions

Below are the official solutions provided by the Genomics team in Google Health.

| Name | Description |
|---|---|
| Docker | This is the recommended method. |
| Build from source | DeepVariant comes with scripts to build it on Ubuntu 14 and 16, with Ubuntu 16 recommended. To build and run on other Unix-based systems, you will need to modify these scripts. |
| Prebuilt Binaries | Available at gs://deepvariant/. These are compiled to use SSE4 and AVX instructions, so you will need a CPU (such as Intel Sandy Bridge) that supports them. You can check the /proc/cpuinfo file on your computer, which lists these features under "flags". |
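On Linux, the /proc/cpuinfo check mentioned above can be scripted before downloading the prebuilt binaries. The helper below is a hypothetical sketch, not an official tool. It assumes that "SSE4" means both the `sse4_1` and `sse4_2` flags, reads only the first `flags` line (x86 Linux kernels print one per logical CPU, normally with identical contents), and takes an optional file path so it can be tried against a saved copy of cpuinfo. Systems without /proc/cpuinfo, such as macOS, need a different check.

```bash
# Hypothetical helper: check a cpuinfo-style file (default /proc/cpuinfo) for
# the SSE4 and AVX flags. Prints "ok: ..." or the first missing flag.
has_sse4_avx() {
  local cpuinfo="${1:-/proc/cpuinfo}" flags f
  # Take the first "flags" line; fail if the file is missing or has none.
  flags="$(grep -m1 '^flags' "$cpuinfo" 2>/dev/null)" || return 1
  for f in sse4_1 sse4_2 avx; do
    # Pad with spaces so "avx" does not match "avx2" or "avx512f".
    case " $flags " in
      *" $f "*) ;;
      *) echo "missing: $f"; return 1 ;;
    esac
  done
  echo "ok: sse4_1 sse4_2 avx"
}
```

Running `has_sse4_avx` with no argument inspects the local machine and returns non-zero if any of the three flags is absent.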

External Solutions

The following pipelines are not created or maintained by the Genomics team in Google Health. Please contact the relevant teams if you have any questions or concerns.

| Name | Description |
|---|---|
| Running DeepVariant on Google Cloud Platform | Docker-based pipelines optimized for cost and speed. Code can be found here. |
| DeepVariant-on-spark from ATGENOMIX | A germline short variant calling pipeline that runs DeepVariant on Apache Spark at scale, with support for multi-GPU clusters (e.g. NVIDIA DGX-1). |
| NVIDIA Clara Parabricks | An accelerated DeepVariant pipeline with multi-GPU support that runs our WGS pipeline in just 40 minutes, at a cost of $2-$3 per sample. This provides a 7.5x speedup over a 64-core CPU-only machine at lower cost. |
| DNAnexus DeepVariant App | Offers parallelized execution with a GUI (requires a platform account). |
| Nextflow Pipeline | Offers parallel processing of multiple BAMs and Docker support. |
| DNAstack Pipeline | Cost-optimized DeepVariant pipeline (requires a platform account). |

Contribution Guidelines

Please open a pull request if you wish to contribute to DeepVariant. Note that we have not set up the infrastructure to merge pull requests externally. If you agree, we will test and submit the changes internally and mention your contributions in our release notes. We apologize for any inconvenience.

If you have any difficulty using DeepVariant, feel free to open an issue. If you have general questions not specific to DeepVariant, we recommend that you post on a community discussion forum such as BioStars.

License

BSD-3-Clause license

Acknowledgements

DeepVariant happily makes use of many open source packages. We would like to specifically call out a few key ones:

We thank all of the developers and contributors to these packages for their work.

Disclaimer

This is not an official Google product.

