BinaryCIF and CIFTools-Lightweight, efficient and extensible macromolecular data management
D. Sehnal, S. Bittrich, S. Velankar, J. Koča, R. Svobodová, SK. Burley, AS. Rose
Language: English; Country: United States
Document type: Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.
Grant support
R01 GM133198
NIGMS NIH HHS - United States
104948
Wellcome Trust - United Kingdom
NLK
Directory of Open Access Journals
from 2005
Free Medical Journals
from 2005
Public Library of Science (PLoS)
from 2005
PubMed Central
from 2005
Europe PubMed Central
from 2005
ProQuest Central
from 2005-06-01
Open Access Digital Library
from 2005-06-01
Open Access Digital Library
from 2005-01-01
Medline Complete (EBSCOhost)
from 2005-06-01
Health & Medicine (ProQuest)
from 2005-06-01
ROAD: Directory of Open Access Scholarly Resources
from 2005
- MeSH
- Databases, Chemical MeSH
- Data Compression / methods MeSH
- Crystallography / methods MeSH
- Macromolecular Substances / chemistry / ultrastructure MeSH
- Models, Molecular * MeSH
- Software * MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
- Research Support, U.S. Gov't, Non-P.H.S. MeSH
3D macromolecular structural data is growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods, including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working with 3D macromolecular structural data for archiving, analyses, and visualization are central to facilitating interoperability and reusability in compliance with the FAIR Principles. We address two challenges posed by growth in data size and complexity. First, data size is reduced by bespoke compression techniques. Second, complexity is managed through improved software tooling and by fully leveraging available data dictionary schemas. To this end, we introduce BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas, such as PDBx/mmCIF, while reducing file sizes by more than a factor of two versus gzip-compressed CIF files. Moreover, for the largest structures, BinaryCIF provides even better compression: factors of ten and four versus CIF files and gzipped CIF files, respectively. Herein, we describe CIFTools, a set of libraries in Java and TypeScript for generic and typed handling of CIF and BinaryCIF files. Together, BinaryCIF and CIFTools enable lightweight, efficient, and extensible handling of 3D macromolecular structural data.
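The abstract does not name the compression techniques it refers to. As a rough illustration only, the sketch below assumes column-wise delta encoding followed by run-length encoding, two encodings listed in the BinaryCIF specification, applied to a column of sequential atom identifiers. All function names are hypothetical and are not part of the CIFTools API.

```python
from typing import List


def delta_encode(values: List[int]) -> List[int]:
    """Keep the first value, then store differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    """Invert delta_encode with a running sum."""
    out: List[int] = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def run_length_encode(values: List[int]) -> List[int]:
    """Flatten runs of equal values into [value, count, value, count, ...]."""
    encoded: List[int] = []
    for v in values:
        if encoded and encoded[-2] == v:
            encoded[-1] += 1  # extend the current run
        else:
            encoded.extend([v, 1])  # start a new run
    return encoded


def run_length_decode(encoded: List[int]) -> List[int]:
    """Invert run_length_encode by expanding each (value, count) pair."""
    out: List[int] = []
    for i in range(0, len(encoded), 2):
        out.extend([encoded[i]] * encoded[i + 1])
    return out


# Hypothetical column of sequential atom identifiers 1..1000.
atom_ids = list(range(1, 1001))
packed = run_length_encode(delta_encode(atom_ids))
restored = delta_decode(run_length_decode(packed))
print(len(atom_ids), "values packed into", len(packed), "integers")
```

A sequential identifier column collapses to a single run under this scheme, which suggests why column-oriented encodings can outperform general-purpose compression on highly regular structural data. Real files mix many column types, so the gain shown here is not representative of the figures reported in the abstract.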
Cancer Institute of New Jersey Rutgers The State University of New Jersey New Brunswick NJ 08903 USA
CEITEC Central European Institute of Technology Masaryk University Brno Czech Republic
National Centre for Biomolecular Research Faculty of Science Masaryk University Brno Czech Republic
Protein Data Bank in Europe Wellcome Genome Campus Hinxton UK
- 000
- 00000naa a2200000 a 4500
- 001
- bmc21012014
- 003
- CZ-PrNML
- 005
- 20210507104148.0
- 007
- ta
- 008
- 210420s2020 xxu f 000 0|eng||
- 009
- AR
- 024 7_
- $a 10.1371/journal.pcbi.1008247 $2 doi
- 035 __
- $a (PubMed)33075050
- 040 __
- $a ABA008 $b cze $d ABA008 $e AACR2
- 041 0_
- $a eng
- 044 __
- $a xxu
- 100 1_
- $a Sehnal, David $u CEITEC, Central European Institute of Technology, Masaryk University, Brno, Czech Republic $u National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic $u Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- 245 10
- $a BinaryCIF and CIFTools-Lightweight, efficient and extensible macromolecular data management / $c D. Sehnal, S. Bittrich, S. Velankar, J. Koča, R. Svobodová, SK. Burley, AS. Rose
- 520 9_
- $a 3D macromolecular structural data is growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods, including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working with 3D macromolecular structural data for archiving, analyses, and visualization are central to facilitating interoperability and reusability in compliance with the FAIR Principles. We address two challenges posed by growth in data size and complexity. First, data size is reduced by bespoke compression techniques. Second, complexity is managed through improved software tooling and by fully leveraging available data dictionary schemas. To this end, we introduce BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas, such as PDBx/mmCIF, while reducing file sizes by more than a factor of two versus gzip-compressed CIF files. Moreover, for the largest structures, BinaryCIF provides even better compression: factors of ten and four versus CIF files and gzipped CIF files, respectively. Herein, we describe CIFTools, a set of libraries in Java and TypeScript for generic and typed handling of CIF and BinaryCIF files. Together, BinaryCIF and CIFTools enable lightweight, efficient, and extensible handling of 3D macromolecular structural data.
- 650 _2
- $a krystalografie $x metody $7 D003461
- 650 _2
- $a komprese dat $x metody $7 D044962
- 650 _2
- $a chemické databáze $7 D062126
- 650 _2
- $a makromolekulární látky $x chemie $x ultrastruktura $7 D046911
- 650 12
- $a molekulární modely $7 D008958
- 650 12
- $a software $7 D012984
- 655 _2
- $a časopisecké články $7 D016428
- 655 _2
- $a Research Support, N.I.H., Extramural $7 D052061
- 655 _2
- $a práce podpořená grantem $7 D013485
- 655 _2
- $a Research Support, U.S. Gov't, Non-P.H.S. $7 D013486
- 700 1_
- $a Bittrich, Sebastian $u RCSB Protein Data Bank, San Diego Supercomputer Center University of California, San Diego, La Jolla, CA 92093, USA
- 700 1_
- $a Velankar, Sameer $u Protein Data Bank in Europe (PDBe), European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK
- 700 1_
- $a Koča, Jaroslav $u CEITEC, Central European Institute of Technology, Masaryk University, Brno, Czech Republic $u National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic
- 700 1_
- $a Svobodová, Radka $u CEITEC, Central European Institute of Technology, Masaryk University, Brno, Czech Republic $u National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic
- 700 1_
- $a Burley, Stephen K $u RCSB Protein Data Bank, San Diego Supercomputer Center University of California, San Diego, La Jolla, CA 92093, USA $u RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers The State University of New Jersey, Piscataway, NJ 08854, USA $u Cancer Institute of New Jersey, Rutgers The State University of New Jersey, New Brunswick, NJ 08903, USA $u Skaggs School of Pharmacy and Pharmaceutical Sciences University of California, San Diego, La Jolla, CA 92093, USA
- 700 1_
- $a Rose, Alexander S $u RCSB Protein Data Bank, San Diego Supercomputer Center University of California, San Diego, La Jolla, CA 92093, USA
- 773 0_
- $w MED00008919 $t PLoS computational biology $x 1553-7358 $g Roč. 16, č. 10 (2020), s. e1008247
- 856 41
- $u https://pubmed.ncbi.nlm.nih.gov/33075050 $y Pubmed
- 910 __
- $a ABA008 $b sig $c sign $y p $z 0
- 990 __
- $a 20210420 $b ABA008
- 991 __
- $a 20210507104147 $b ABA008
- 999 __
- $a ok $b bmc $g 1650402 $s 1132393
- BAS __
- $a 3
- BAS __
- $a PreBMC
- BMC __
- $a 2020 $b 16 $c 10 $d e1008247 $e 20201019 $i 1553-7358 $m PLoS computational biology $n PLoS Comput Biol $x MED00008919
- GRA __
- $a R01 GM133198 $p NIGMS NIH HHS $2 United States
- GRA __
- $a 104948 $p Wellcome Trust $2 United Kingdom
- LZP __
- $a Pubmed-20210420