BinaryCIF and CIFTools-Lightweight, efficient and extensible macromolecular data management

D. Sehnal, S. Bittrich, S. Velankar, J. Koča, R. Svobodová, SK. Burley, AS. Rose

PLoS computational biology. 2020 ; 16 (10) : e1008247. [pub] 20201019

Language: English   Country: United States

Document type: Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.

Persistent link: https://www.medvik.cz/link/bmc21012014

Grant support
R01 GM133198 NIGMS NIH HHS - United States
104948 Wellcome Trust - United Kingdom

3D macromolecular structural data is growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working with 3D macromolecular structural data for archiving, analyses, and visualization are central to facilitating interoperability and reusability in compliance with the FAIR Principles. We address two challenges posed by growth in data size and complexity. First, data size is reduced by bespoke compression techniques. Second, complexity is managed through improved software tooling and fully leveraging available data dictionary schemas. To this end, we introduce BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas, such as PDBx/mmCIF, while reducing file sizes by more than a factor of two versus gzip-compressed CIF files. Moreover, for the largest structures, BinaryCIF provides even better compression: factors of ten and four versus CIF files and gzipped CIF files, respectively. Herein, we describe CIFTools, a set of libraries in Java and TypeScript for generic and typed handling of CIF and BinaryCIF files. Together, BinaryCIF and CIFTools enable lightweight, efficient, and extensible handling of 3D macromolecular structural data.

000      
00000naa a2200000 a 4500
001      
bmc21012014
003      
CZ-PrNML
005      
20210507104148.0
007      
ta
008      
210420s2020 xxu f 000 0|eng||
009      
AR
024    7_
$a 10.1371/journal.pcbi.1008247 $2 doi
035    __
$a (PubMed)33075050
040    __
$a ABA008 $b cze $d ABA008 $e AACR2
041    0_
$a eng
044    __
$a xxu
100    1_
$a Sehnal, David $u CEITEC, Central European Institute of Technology, Masaryk University, Brno, Czech Republic $u National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic $u Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
245    10
$a BinaryCIF and CIFTools-Lightweight, efficient and extensible macromolecular data management / $c D. Sehnal, S. Bittrich, S. Velankar, J. Koča, R. Svobodová, SK. Burley, AS. Rose
520    9_
$a 3D macromolecular structural data is growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working with 3D macromolecular structural data for archiving, analyses, and visualization are central to facilitating interoperability and reusability in compliance with the FAIR Principles. We address two challenges posed by growth in data size and complexity. First, data size is reduced by bespoke compression techniques. Second, complexity is managed through improved software tooling and fully leveraging available data dictionary schemas. To this end, we introduce BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas, such as PDBx/mmCIF, while reducing file sizes by more than a factor of two versus gzip-compressed CIF files. Moreover, for the largest structures, BinaryCIF provides even better compression: factors of ten and four versus CIF files and gzipped CIF files, respectively. Herein, we describe CIFTools, a set of libraries in Java and TypeScript for generic and typed handling of CIF and BinaryCIF files. Together, BinaryCIF and CIFTools enable lightweight, efficient, and extensible handling of 3D macromolecular structural data.
650    _2
$a Crystallography $x methods $7 D003461
650    _2
$a Data Compression $x methods $7 D044962
650    _2
$a Databases, Chemical $7 D062126
650    _2
$a Macromolecular Substances $x chemistry $x ultrastructure $7 D046911
650    12
$a Models, Molecular $7 D008958
650    12
$a software $7 D012984
655    _2
$a Journal Article $7 D016428
655    _2
$a Research Support, N.I.H., Extramural $7 D052061
655    _2
$a Research Support, Non-U.S. Gov't $7 D013485
655    _2
$a Research Support, U.S. Gov't, Non-P.H.S. $7 D013486
700    1_
$a Bittrich, Sebastian $u RCSB Protein Data Bank, San Diego Supercomputer Center University of California, San Diego, La Jolla, CA 92093, USA
700    1_
$a Velankar, Sameer $u Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
700    1_
$a Koča, Jaroslav $u CEITEC, Central European Institute of Technology, Masaryk University, Brno, Czech Republic $u National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic
700    1_
$a Svobodová, Radka $u CEITEC, Central European Institute of Technology, Masaryk University, Brno, Czech Republic $u National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic
700    1_
$a Burley, Stephen K $u RCSB Protein Data Bank, San Diego Supercomputer Center University of California, San Diego, La Jolla, CA 92093, USA $u RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers The State University of New Jersey, Piscataway, NJ 08854, USA $u Cancer Institute of New Jersey, Rutgers The State University of New Jersey, New Brunswick, NJ 08903, USA $u Skaggs School of Pharmacy and Pharmaceutical Sciences University of California, San Diego, La Jolla, CA 92093, USA
700    1_
$a Rose, Alexander S $u RCSB Protein Data Bank, San Diego Supercomputer Center University of California, San Diego, La Jolla, CA 92093, USA
773    0_
$w MED00008919 $t PLoS computational biology $x 1553-7358 $g Vol. 16, No. 10 (2020), p. e1008247
856    41
$u https://pubmed.ncbi.nlm.nih.gov/33075050 $y Pubmed
910    __
$a ABA008 $b sig $c sign $y p $z 0
990    __
$a 20210420 $b ABA008
991    __
$a 20210507104147 $b ABA008
999    __
$a ok $b bmc $g 1650402 $s 1132393
BAS    __
$a 3
BAS    __
$a PreBMC
BMC    __
$a 2020 $b 16 $c 10 $d e1008247 $e 20201019 $i 1553-7358 $m PLoS computational biology $n PLoS Comput Biol $x MED00008919
GRA    __
$a R01 GM133198 $p NIGMS NIH HHS $2 United States
GRA    __
$a 104948 $p Wellcome Trust $2 United Kingdom
LZP    __
$a Pubmed-20210420
