De novo transcriptome assembly databases for the butterfly orchid Phalaenopsis equestris

Shan-Ce Niu; Qing Xu; Guo-Qiang Zhang; Yong-Qiang Zhang; Wen-Chieh Tsai; Jui-Ling Hsu; Chieh-Kai Liang; Yi-Bo Luo; Zhong-Jian Liu

doi:10.1038/sdata.2016.83

Sci Data. 2016 Sep 27:3:160083. doi: 10.1038/sdata.2016.83.

Authors

Shan-Ce Niu¹,², Qing Xu³, Guo-Qiang Zhang³, Yong-Qiang Zhang³, Wen-Chieh Tsai⁴,⁵,⁶, Jui-Ling Hsu³,⁵, Chieh-Kai Liang⁴, Yi-Bo Luo¹, Zhong-Jian Liu³,⁷,⁸,⁹

Affiliations

¹ State Key Laboratory of Systematic and Evolutionary Botany, Institute of Botany, Chinese Academy of Sciences, Beijing 100093, China.
² University of Chinese Academy of Sciences, Beijing 100049, China.
³ Shenzhen Key Laboratory for Orchid Conservation and Utilization, The National Orchid Conservation Centre of China and The Orchid Conservation and Research Centre of Shenzhen, Shenzhen 518114, China.
⁴ Institute of Tropical Plant Sciences, National Cheng Kung University, Tainan 701, Taiwan.
⁵ Orchid Research and Development Center, National Cheng Kung University, Tainan 701, Taiwan.
⁶ Department of Life Sciences, National Cheng Kung University, Tainan 701, Taiwan.
⁷ The Centre for Biotechnology and BioMedicine, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China.
⁸ College of Forestry and Landscape Architecture, South China Agricultural University, Guangzhou 510640, China.
⁹ College of Arts, College of Landscape Architecture, Fujian Agriculture and Forestry University, Fuzhou 350002, China.

Abstract

Orchids are renowned for their spectacular flowers and ecological adaptations. After the sequencing of the genome of the tropical epiphytic orchid Phalaenopsis equestris, we combined Illumina HiSeq2000 for RNA-Seq and Trinity for de novo assembly to characterize the transcriptomes for 11 diverse P. equestris tissues representing the root, stem, leaf, flower buds, column, lip, petal, sepal and three developmental stages of seeds. Our aims were to contribute to a better understanding of the molecular mechanisms driving the analysed tissue characteristics and to enrich the available data for P. equestris. Here, we present three databases. The first dataset is the RNA-Seq raw reads, which can be used to execute new experiments with different analysis approaches. The other two datasets allow different types of searches for candidate homologues. The second dataset includes the sets of assembled unigenes and predicted coding sequences and proteins, enabling a sequence-based search. The third dataset consists of the annotation results of the aligned unigenes versus the Nonredundant (Nr) protein database, Kyoto Encyclopaedia of Genes and Genomes (KEGG) and Clusters of Orthologous Groups (COG) databases with low e-values, enabling a name-based search.