Bioinformatics tools

  1.  A collection of tools for genome analysis

Genome assembly (de novo)

  1. ALLPATHS-LG http://software.broadinstitute.org/allpaths-lg/blog/
    Gnerre, S., MacCallum, I., Przybylski, D., Ribeiro, F.J., Burton, J.N., Walker, B.J., Sharpe, T., Hall, G., Shea, T.P., Sykes, S. and Berlin, A.M., 2011. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proceedings of the National Academy of Sciences, 108(4), pp.1513-1518.

  2. SOAPdenovo2 http://soap.genomics.org.cn/soapdenovo.html
    Luo, R., Liu, B., Xie, Y., Li, Z., Huang, W., Yuan, J., He, G., Chen, Y., Pan, Q., Liu, Y. and Tang, J., 2012. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience, 1(1), p.1.

  3. Platanus http://platanus.bio.titech.ac.jp
    Kajitani, R., Toshimoto, K., Noguchi, H., Toyoda, A., Ogura, Y., Okuno, M., Yabana, M., Harada, M., Nagayasu, E., Maruyama, H. and Kohara, Y., 2014. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome research, 24(8), pp.1384-1395.

  4. SPAdes http://bioinf.spbau.ru/spades
    Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., Lesin, V.M., Nikolenko, S.I., Pham, S., Prjibelski, A.D. and Pyshkin, A.V., 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology, 19(5), pp.455-477.

  5. Canu https://github.com/marbl/canu/releases
    Berlin, K., Koren, S., Chin, C.S., Drake, P.J., Landolin, J.M. and Phillippy, A.M., 2015. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nature Biotechnology.

Quality assessment of assemblies and sequencing reads

  1. FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
    Andrews S. (2010). FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc

  2. Quast http://bioinf.spbau.ru/quast
    Gurevich, A., Saveliev, V., Vyahhi, N. and Tesler, G., 2013. QUAST: quality assessment tool for genome assemblies. Bioinformatics, 29(8), pp.1072-1075.
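QUAST reports contiguity statistics such as N50 among its assembly metrics. The minimal sketch below shows what that number means. It is not QUAST's own code. It sorts contig lengths in decreasing order and returns the first length at which the running sum reaches half of the total assembly size. The function name `n50` is a hypothetical helper.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    # Longest contigs first; non-positive lengths are ignored.
    lengths = sorted((length for length in contig_lengths if length > 0), reverse=True)
    total = sum(lengths)
    if total == 0:
        return 0  # empty assembly: N50 is undefined, report 0
    running = 0
    for length in lengths:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    return 0  # not reached for a non-empty assembly


if __name__ == "__main__":
    print(n50([100, 80, 70, 50, 30, 20]))
```

With these lengths the total is 350 bp, and 100 + 80 = 180 already covers half of it, so the result is 80. QUAST also filters out short contigs before computing its statistics, which this sketch does not do.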

FASTQ file processing

  1. Trimmomatic http://www.usadellab.org/cms/?page=trimmomatic

  2. FASTX-Toolkit http://hannonlab.cshl.edu/fastx_toolkit/
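Both Trimmomatic and FASTX-Toolkit operate on FASTQ records. Each record spans four lines: a header, the sequence, a `+` separator and a per-base quality string. The sketch below parses such records and trims low-quality bases from the 3' end, roughly in the spirit of Trimmomatic's TRAILING step. It is a simplified illustration, not the implementation of either tool. It assumes Phred+33 quality encoding and unwrapped (single-line) sequences, and `read_fastq` and `trim_trailing` are hypothetical helper names.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input (or a blank line, treated as end)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        # Basic sanity checks on record structure.
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header, seq, qual


def trim_trailing(seq: str, qual: str, min_q: int, offset: int = 33) -> Tuple[str, str]:
    """Remove bases from the 3' end while their Phred score is below min_q.

    Assumes Phred+33 encoding by default (score = ord(char) - 33).
    """
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    fastq = io.StringIO("@r1\nACGTAC\n+\nIIII#!\n")
    for header, seq, qual in read_fastq(fastq):
        print(header, *trim_trailing(seq, qual, min_q=3))
```

Under Phred+33, `I` encodes Q40, `#` encodes Q2 and `!` encodes Q0. With `min_q=3` the last two bases are removed, and the read becomes `ACGT`.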

Read alignment

  1. Bowtie2 http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
    Langmead, B. and Salzberg, S.L., 2012. Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), pp.357-359.

  2. BWA http://bio-bwa.sourceforge.net
    Li, H. and Durbin, R., 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14), pp.1754-1760.

  3. mrfast http://mrfast.sourceforge.net
    Alkan, C., Kidd, J.M., Marques-Bonet, T., Aksay, G., Antonacci, F., Hormozdiari, F., Kitzman, J.O., Baker, C., Malig, M., Mutlu, O. and Sahinalp, S.C., 2009. Personalized copy number and segmental duplication maps using next-generation sequencing. Nature genetics, 41(10), pp.1061-1067.

  4. mrsfast http://sfu-compbio.github.io/mrsfast/
    Hach, F., Hormozdiari, F., Alkan, C., Hormozdiari, F., Birol, I., Eichler, E.E. and Sahinalp, S.C., 2010. mrsFAST: a cache-oblivious algorithm for short-read mapping. Nature methods, 7(8), pp.576-577.

Nucleotide variant calling and annotation

  1. SAMtools http://samtools.github.io
    Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G. and Durbin, R., 2009. The sequence alignment/map format and SAMtools. Bioinformatics, 25(16), pp.2078-2079.

  2. Bcftools https://samtools.github.io/bcftools/
    Li, H., 2011. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics, 27(21), pp.2987-2993.

  3. VCFtools https://vcftools.github.io
    Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A., Handsaker, R.E., Lunter, G., Marth, G.T., Sherry, S.T. and McVean, G., 2011. The variant call format and VCFtools. Bioinformatics, 27(15), pp.2156-2158.

  4. GATK https://software.broadinstitute.org/gatk/
    McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M. and DePristo, M.A., 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research, 20(9), pp.1297-1303.

  5. BreakDancer http://breakdancer.sourceforge.net
    Chen, K., Wallis, J.W., McLellan, M.D., Larson, D.E., Kalicki, J.M., Pohl, C.S., McGrath, S.D., Wendl, M.C., Zhang, Q., Locke, D.P. and Shi, X., 2009. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nature methods, 6(9), pp.677-681.

  6. VariationHunter http://compbio.cs.sfu.ca/software-variation-hunter
    Alkan, C., Kidd, J.M., Marques-Bonet, T., Aksay, G., Antonacci, F., Hormozdiari, F., Kitzman, J.O., Baker, C., Malig, M., Mutlu, O. and Sahinalp, S.C., 2009. Personalized copy number and segmental duplication maps using next-generation sequencing. Nature genetics, 41(10), pp.1061-1067.

  7. Picard https://github.com/broadinstitute/picard
    DePristo, M.A., Banks, E., Poplin, R., Garimella, K.V., Maguire, J.R., Hartl, C., Philippakis, A.A., Del Angel, G., Rivas, M.A., Hanna, M. and McKenna, A., 2011. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature genetics, 43(5), pp.491-498.

  8. FreeBayes https://github.com/ekg/freebayes
    Garrison, E. and Marth, G., 2012. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907. Available from: https://arxiv.org/abs/1207.3907

Gene annotation

  1. AUGUSTUS http://bioinf.uni-greifswald.de/augustus/
    Stanke, M., Schöffmann, O., Morgenstern, B. and Waack, S., 2006. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC bioinformatics, 7(1), p.62.

  2. Genewise http://www.ebi.ac.uk/Tools/psa/genewise/
    Birney, E., Clamp, M. and Durbin, R., 2004. GeneWise and genomewise. Genome research, 14(5), pp.988-995.

  3. Exonerate http://www.ebi.ac.uk/about/vertebrate-genomics/software/exonerate
    Slater, G.S. and Birney, E., 2005. Automated generation of heuristics for biological sequence comparison. BMC bioinformatics, 6(1), p.31.

  4. MAKER http://www.yandell-lab.org/software/maker.html
    Campbell, M.S., Holt, C., Moore, B. and Yandell, M., 2014. Genome annotation and curation using MAKER and MAKER‐P. Current Protocols in Bioinformatics, pp.4-11.

  5. HMMER http://hmmer.org
    Johnson, L.S., Eddy, S.R. and Portugaly, E., 2010. Hidden Markov model speed heuristic and iterative HMM search procedure. BMC bioinformatics, 11(1), p.1.

Repeat content of the genome

  1. RepeatMasker http://www.repeatmasker.org
    Tarailo‐Graovac, M. and Chen, N., 2009. Using RepeatMasker to identify repetitive elements in genomic sequences. Current Protocols in Bioinformatics, pp.4-10.

  2. WindowMasker http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/
    Morgulis, A., Gertz, E.M., Schäffer, A.A. and Agarwala, R., 2006. WindowMasker: window-based masker for sequenced genomes. Bioinformatics, 22(2), pp.134-141.

  3. Tandem Repeats Finder http://tandem.bu.edu/trf/trf.html
    Benson, G., 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic acids research, 27(2), p.573.

  4. DustMasker http://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/src/
    Morgulis, A., Gertz, E.M., Schäffer, A.A. and Agarwala, R., 2006. A fast and symmetric DUST implementation to mask low-complexity DNA sequences. Journal of Computational Biology, 13(5), pp.1028-1040.

Search for endogenous retrovirus-like elements in the genome

  1. RetroTector http://retrotector.neuro.uu.se
    Sperber, G.O., Airola, T., Jern, P. and Blomberg, J., 2007. Automated recognition of retroviral sequences in genomic data—RetroTector©. Nucleic acids research, 35(15), pp.4964-4976.

  2. LTR_FINDER http://tlife.fudan.edu.cn/ltr_finder/
    Xu, Z. and Wang, H., 2007. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic acids research, 35(suppl 2), pp.W265-W268.

  3. LTRharvest http://www.zbh.uni-hamburg.de/?id=206
    Ellinghaus, D., Kurtz, S. and Willhoeft, U., 2008. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC bioinformatics, 9(1), p.1.

Detection of copy number variation and segmental duplications

  1. Dupmasker http://www.repeatmasker.org/DupMaskerDownload.html
    Jiang, Z., Tang, H., Ventura, M., Cardone, M.F., Marques-Bonet, T., She, X., Pevzner, P.A. and Eichler, E.E., 2007. Ancestral reconstruction of segmental duplications reveals punctuated cores of human genome evolution. Nature genetics, 39(11), pp.1361-1368.

  2. mrcanavar http://mrcanavar.sourceforge.net
    Alkan, C., Kidd, J.M., Marques-Bonet, T., Aksay, G., Antonacci, F., Hormozdiari, F., Kitzman, J.O., Baker, C., Malig, M., Mutlu, O. and Sahinalp, S.C., 2009. Personalized copy number and segmental duplication maps using next-generation sequencing. Nature genetics, 41(10), pp.1061-1067.

Search for microRNA genes

  1. MiRFinder http://www.bioinformatics.org/mirfinder/
    Huang, T.H., Fan, B., Rothschild, M.F., Hu, Z.L., Li, K. and Zhao, S.H., 2007. MiRFinder: an improved approach and software implementation for genome-wide fast microRNA precursor scans. BMC bioinformatics, 8(1), p.1.

  2. miRBase http://www.mirbase.org
    Griffiths-Jones, S., Grocock, R.J., Van Dongen, S., Bateman, A. and Enright, A.J., 2006. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic acids research, 34(suppl 1), pp.D140-D144.

  3. ViennaRNA http://www.tbi.univie.ac.at/RNA/index.html
    Lorenz, R., Bernhart, S.H., Zu Siederdissen, C.H., Tafer, H., Flamm, C., Stadler, P.F. and Hofacker, I.L., 2011. ViennaRNA Package 2.0. Algorithms for Molecular Biology, 6(1), p.1.

Gene family evolution

  1. CAFE http://www.indiana.edu/~hahnlab/software.html
    Han, M.V., Thomas, G.W., Lugo-Martinez, J. and Hahn, M.W., 2013. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Molecular biology and evolution, 30(8), pp.1987-1997.

Evolutionarily conserved elements

  1. phastCons http://compgen.cshl.edu/phast/phastCons-HOWTO.html
    Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A.S., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S. and Weinstock, G.M., 2005. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome research, 15(8), pp.1034-1050.

  2. SiPhy http://portals.broadinstitute.org/genome_bio/siphy/index.html
    Garber, M., Guttman, M., Clamp, M., Zody, M.C., Friedman, N. and Xie, X., 2009. Identifying novel constrained elements by exploiting biased substitution patterns. Bioinformatics, 25(12), pp.i54-i62.

  3. GERP++ http://mendel.stanford.edu/SidowLab/downloads/gerp/
    Davydov, E.V., Goode, D.L., Sirota, M., Cooper, G.M., Sidow, A. and Batzoglou, S., 2010. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput Biol, 6(12), p.e1001025.

Multiple genome alignment and synteny block detection

  1. HAL tools https://github.com/glennhickey/hal
    Hickey, G., Paten, B., Earl, D., Zerbino, D. and Haussler, D., 2013. HAL: a hierarchical format for storing and analyzing multiple genome alignments. Bioinformatics, p.btt128.

  2. Progressive cactus https://github.com/glennhickey/progressiveCactus
    Paten, B., Diekhans, M., Earl, D., John, J.S., Ma, J., Suh, B. and Haussler, D., 2011. Cactus graphs for genome comparisons. Journal of Computational Biology, 18(3), pp.469-481.

  3. Ragout-maf2synteny http://fenderglass.github.io/Ragout/
    Kolmogorov, M., Raney, B., Paten, B. and Pham, S., 2014. Ragout—a reference-assisted assembly tool for bacterial genomes. Bioinformatics, 30(12), pp.i302-i309.

  4. GRIMM synteny http://grimm.ucsd.edu/DIST/
    Pevzner, P. and Tesler, G., 2003. Genome rearrangements in mammalian evolution: lessons from human and mouse genomes. Genome research, 13(1), pp.37-45.

  5. GRIMM http://grimm.ucsd.edu/GRIMM/
    Tesler, G., 2002. Efficient algorithms for multichromosomal genome rearrangements. Journal of Computer and System Sciences, 65(3), pp.587-609.

Data processing

  1. Kent utils https://github.com/ENCODE-DCC/kentUtils
    Karolchik, D., Baertsch, R., Diekhans, M., Furey, T.S., Hinrichs, A., Lu, Y.T., Roskin, K.M., Schwartz, M., Sugnet, C.W., Thomas, D.J. and Weber, R.J., 2003. The UCSC genome browser database. Nucleic acids research, 31(1), pp.51-54.

  2. Samtools https://samtools.github.io
    Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G. and Durbin, R., 2009. The sequence alignment/map format and SAMtools. Bioinformatics, 25(16), pp.2078-2079.

  3. Bedtools http://bedtools.readthedocs.io/en/latest/
    Quinlan, A.R. and Hall, I.M., 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6), pp.841-842.

Genome alignment

  1. NCBI BLAST+ https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download
    Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K. and Madden, T.L., 2009. BLAST+: architecture and applications. BMC bioinformatics, 10(1), p.1.

  2. LASTZ http://www.bx.psu.edu/~rsharris/lastz/
    Harris, R.S., 2007. Improved pairwise alignment of genomic DNA. ProQuest.

Population and phylogenetic analysis

  1. PSMC https://github.com/lh3/psmc
    Li, H. and Durbin, R., 2011. Inference of human population history from individual whole-genome sequences. Nature, 475(7357), pp.493-496.

  2. MrBayes http://mrbayes.sourceforge.net
    Ronquist, F., Teslenko, M., van der Mark, P., Ayres, D.L., Darling, A., Höhna, S., Larget, B., Liu, L., Suchard, M.A. and Huelsenbeck, J.P., 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic biology, 61(3), pp.539-542.

  3. Beast http://beast.bio.ed.ac.uk
    Drummond, A.J., Suchard, M.A., Xie, D. and Rambaut, A., 2012. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Molecular biology and evolution, 29(8), pp.1969-1973.

  4. DaDi https://bitbucket.org/gutenkunstlab/dadi/
    Gutenkunst, R.N., Hernandez, R.D., Williamson, S.H. and Bustamante, C.D., 2009. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet, 5(10), p.e1000695.

  5. RAXML http://sco.h-its.org/exelixis/web/software/raxml/index.html
    Stamatakis, A., 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics.

Data visualization

  1. Circos http://circos.ca/
    Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., Jones, S.J. and Marra, M.A., 2009. Circos: an information aesthetic for comparative genomics. Genome research, 19(9), pp.1639-1645.

  2. FigTree http://beast.bio.ed.ac.uk/figtree

  3. UCSC Genome Browser https://genome.ucsc.edu/admin/mirror.html
    Kent, W.J., Sugnet, C.W., Furey, T.S., Roskin, K.M., Pringle, T.H., Zahler, A.M. and Haussler, D., 2002. The human genome browser at UCSC. Genome research, 12(6), pp.996-1006.

Key web resources for protein structural biology

  1. http://www.uniprot.org/  UniProt, a "GenBank" for proteins.
  2. http://www.rcsb.org/pdb/home/home.do  PDB, Protein Data Bank, a repository of protein tertiary structures.
  3. http://web.expasy.org/docs/swiss-prot_guideline.html  An introductory UniProt/Swiss-Prot page with additional links.

Hierarchical protein classifications

  1. http://www.cathdb.info/  CATH.
  2. http://scop.mrc-lmb.cam.ac.uk/scop/  and http://scop2.mrc-lmb.cam.ac.uk/  SCOP and SCOP2

Domain structure and its analysis

  1. http://aquaria.ws/  Aquaria, the successor to SRS 3D.
  2. http://prosite.expasy.org/  Prosite
  3. http://prodom.prabi.fr/prodom/current/html/home.php  ProDom
  4. https://www.ebi.ac.uk/interpro/  InterPro - protein analysis, classification and search.
  5. http://polyview.cchmc.org/  Polyview.


Protein structure prediction

  1. https://salilab.org/modeller/  Modeller
  2. http://www.proteinmodelportal.org/  Protein Modeling Portal
  3. https://www.predictprotein.org/  PredictProtein
  4. https://www.bakerlab.org/  The Baker Lab - Rosetta, Robetta, The Institute for protein design.

Protein mobility

  1. http://wishart.biology.ualberta.ca/moviemaker/  MovieMaker
  2. https://enm.lobos.nih.gov/start_path.html ENM modeling server
  3. https://github.com/gtamazian/PROMPT  PROMPT (G.Tamazian, E.Stepanov, Yu.Porozov)
  4. http://www.molmovdb.org/  Database of macromolecular motions
  5. http://lorentz.dynstr.pasteur.fr/joel/index.php  MinActionPath server

Docking and scoring

  1. http://www.ebi.ac.uk/intact/  IntAct
  2. https://cluspro.bu.edu/login.php  ClusPro
  3. http://proteinsplus.zbh.uni-hamburg.de/  SIENA
  4. http://blaster.docking.org/start.shtml  DOCK Blaster
  5. https://structure.bu.edu/content/our-servers
  6. http://ibis.tau.ac.il/wiki/nir_bental/index.php/Programs_and_DataBases  Nir Ben-Tal group
  7. http://bioinfo3d.cs.tau.ac.il/SymmDock/  SymmDock

Competitions

  1. http://predictioncenter.org/  CASP
  2. http://www.ebi.ac.uk/msd-srv/capri/ CAPRI

Proteopedia

  1. http://www.proteopedia.org/wiki/index.php/Main_Page

 



To Be Continued...