Fully automated pipeline for detection of sex linked genes using RNA-Seq data

M. Michalovova, Z. Kubat, R. Hobza, B. Vyskot, E. Kejnovsky

BMC Bioinformatics. 2015 ; 16 (-) : 78. [pub] 20150311

Language English Country England, Great Britain

Document type Journal Article, Research Support, Non-U.S. Gov't

BACKGROUND: Sex chromosomes represent a genomic region that, to some extent, differs between the sexes of a single species. Reliable high-throughput methods for detecting sex chromosome-specific markers are needed, especially in species where genome information is limited. Next-generation sequencing (NGS) opens the door to identifying unique sequences and to searching for nucleotide polymorphisms between datasets. Combining classical genetic segregation analysis with RNA-Seq data can provide an ideal tool to map and identify sex chromosome-specific expressed markers. To address this challenge, we established a genetic cross of the dioecious plant Rumex acetosa and generated RNA-Seq data from both the parental generation and male and female offspring.

RESULTS: We present a pipeline for detection of sex-linked genes based on nucleotide polymorphism analysis. In our approach, nucleotide polymorphisms are tracked through a cross of preferably distant populations. For this reason, only four datasets are needed: reads from high-throughput sequencing platforms for the parental generation (mother and father) and the F1 generation (male and female progeny). The pipeline uses custom scripts together with external assembly, mapping and variant-calling software. Because the computation is resource-intensive, high-capacity servers are required. Therefore, to keep the pipeline easily accessible and reproducible, we implemented it in Galaxy, an open, web-based platform for data-intensive biomedical research. Our tools are available in the Galaxy Tool Shed, from which they can be installed on any local Galaxy instance. As output, the user gets a FASTA file with candidate transcriptionally active sex-linked genes, sorted by relevance, together with a BAM file containing the identified genes and the read alignments. Thus, polymorphisms that follow the segregation pattern can be easily visualized, which significantly facilitates primer design and subsequent steps of wet-lab verification.

CONCLUSIONS: Our pipeline is a simple and freely accessible software tool for identifying sex chromosome-linked genes in species without an existing reference genome. Based on a combination of genetic crosses and RNA-Seq data, we have designed a high-throughput, cost-effective approach for a broad community of scientists focused on sex chromosome structure and evolution.
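The idea of tracking polymorphisms through a cross can be illustrated with a minimal Python sketch. It is not the published pipeline: the Site record, the function names and the specific rule (an allele carried by the father but not the mother, found in male progeny and absent from female progeny, as expected for Y-linked variants in an XY system) are assumptions made for illustration only. Contigs are ranked by the number of sites that follow the pattern, which is one plausible reading of "sorted by relevance".

```python
from typing import Dict, FrozenSet, List, NamedTuple, Tuple


class Site(NamedTuple):
    """Alleles observed at one position in each of the four datasets (hypothetical record)."""
    contig: str
    pos: int
    mother: FrozenSet[str]
    father: FrozenSet[str]
    daughters: FrozenSet[str]
    sons: FrozenSet[str]


def is_y_linked_candidate(site: Site) -> bool:
    """Assumed rule: a father-only allele is seen in sons but never in daughters."""
    paternal_only = site.father - site.mother
    return any(a in site.sons and a not in site.daughters for a in paternal_only)


def rank_contigs(sites: List[Site]) -> List[Tuple[str, int]]:
    """Count pattern-following sites per contig; most supported contigs first."""
    counts: Dict[str, int] = {}
    for site in sites:
        if is_y_linked_candidate(site):
            counts[site.contig] = counts.get(site.contig, 0) + 1
    # Sort by descending count, then by contig name for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

In practice the per-dataset alleles would come from the variant-calling step, and a real implementation would also need to handle coverage thresholds and X-linked patterns, which this sketch omits.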

000      
00000naa a2200000 a 4500
001      
bmc15031385
003      
CZ-PrNML
005      
20151008115728.0
007      
ta
008      
151005s2015 enk f 000 0|eng||
009      
AR
024    7_
$a 10.1186/s12859-015-0509-0 $2 doi
035    __
$a (PubMed)25884927
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a enk
100    1_
$a Michalovova, Monika $u Department of Plant Developmental Genetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, Kralovopolska 135, CZ-61200, Brno, Czech Republic. biomonika@psu.edu. Current address: Department of Biology, Pennsylvania State University, University Park, PA, 16802, USA. biomonika@psu.edu.
245    10
$a Fully automated pipeline for detection of sex linked genes using RNA-Seq data / $c M. Michalovova, Z. Kubat, R. Hobza, B. Vyskot, E. Kejnovsky
650    _2
$a Female $7 D005260
650    12
$a Genes, X-Linked $7 D050172
650    12
$a Genes, Y-Linked $7 D050173
650    _2
$a Genetic Markers $x genetics $7 D005819
650    _2
$a Genome, Human $7 D015894
650    _2
$a High-Throughput Nucleotide Sequencing $x methods $7 D059014
650    _2
$a Humans $7 D006801
650    _2
$a Male $7 D008297
650    _2
$a Polymerase Chain Reaction $7 D016133
650    _2
$a Polymorphism, Single Nucleotide $x genetics $7 D020641
650    _2
$a RNA $x genetics $7 D012313
650    _2
$a Sequence Analysis, RNA $x methods $7 D017423
650    12
$a Software $7 D012984
655    _2
$a Journal Article $7 D016428
655    _2
$a Research Support, Non-U.S. Gov't $7 D013485
700    1_
$a Kubat, Zdenek $u Department of Plant Developmental Genetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, Kralovopolska 135, CZ-61200, Brno, Czech Republic. kubat@ibp.cz.
700    1_
$a Hobza, Roman $u Department of Plant Developmental Genetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, Kralovopolska 135, CZ-61200, Brno, Czech Republic. hobza@ibp.cz. Centre of the Region Hana for Biotechnological and Agricultural Research, Institute of Experimental Botany, Slechtitelu 31, 78371, Olomouc, Czech Republic. hobza@ibp.cz.
700    1_
$a Vyskot, Boris $u Department of Plant Developmental Genetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, Kralovopolska 135, CZ-61200, Brno, Czech Republic. vyskot@ibp.cz.
700    1_
$a Kejnovsky, Eduard $u Department of Plant Developmental Genetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, Kralovopolska 135, CZ-61200, Brno, Czech Republic. kejnovsk@ibp.cz.
773    0_
$w MED00008167 $t BMC bioinformatics $x 1471-2105 $g Vol. 16, No. - (2015), p. 78
856    41
$u https://pubmed.ncbi.nlm.nih.gov/25884927 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y a $z 0
990    __
$a 20151005 $b ABA008
991    __
$a 20151008115914 $b ABA008
999    __
$a ok $b bmc $g 1092261 $s 914511
BAS    __
$a 3
BAS    __
$a PreBMC
BMC    __
$a 2015 $b 16 $c - $d 78 $e 20150311 $i 1471-2105 $m BMC bioinformatics $n BMC Bioinformatics $x MED00008167
LZP    __
$a Pubmed-20151005
