BinaryCIF and CIFTools: Lightweight, efficient and extensible macromolecular data management
Language: English. Country: United States. Medium: electronic-ecollection
Document type: Journal Article; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.
Grant support
- R01 GM133198, NIGMS NIH HHS, United States
- 104948, Wellcome Trust, United Kingdom
PubMed: 33075050
PubMed Central: PMC7595629
DOI: 10.1371/journal.pcbi.1008247
PII: PCOMPBIOL-D-20-00814
MeSH terms
- Databases, Chemical
- Data Compression / methods
- Crystallography / methods
- Macromolecular Substances / chemistry / ultrastructure
- Models, Molecular *
- Software *
Publication type
- Journal Article
- Research Support, Non-U.S. Gov't
- Research Support, N.I.H., Extramural
- Research Support, U.S. Gov't, Non-P.H.S.
Substance names
- Macromolecular Substances
3D macromolecular structural data is growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods, including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working with 3D macromolecular structural data for archiving, analysis, and visualization are central to facilitating interoperability and reusability in compliance with the FAIR Principles. We address two challenges posed by growth in data size and complexity. First, data size is reduced by bespoke compression techniques. Second, complexity is managed through improved software tooling and by fully leveraging available data dictionary schemas. To this end, we introduce BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas, such as PDBx/mmCIF, while reducing file sizes by more than a factor of two versus gzip-compressed CIF files. Moreover, for the largest structures, BinaryCIF provides even better compression: a factor of ten versus CIF files and a factor of four versus gzipped CIF files. Herein, we describe CIFTools, a set of libraries in Java and TypeScript for generic and typed handling of CIF and BinaryCIF files. Together, BinaryCIF and CIFTools enable lightweight, efficient, and extensible handling of 3D macromolecular structural data.
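The abstract credits the size reduction to bespoke compression techniques but does not spell them out here. As an illustration only, the sketch below chains three simple column encodings (fixed-point scaling, delta encoding, run-length encoding) of the kind that column-oriented binary serializations such as BinaryCIF build on. The function names and the flat `[value, count, ...]` run-length layout are assumptions for this sketch, not the CIFTools API or the exact BinaryCIF specification.

```python
"""Simplified sketch of column encodings in the spirit of BinaryCIF.

Assumption: function names and the flat [value, count, ...] run-length layout
are illustrative only; this is not the CIFTools API or the exact BinaryCIF spec.
"""
from itertools import accumulate
from typing import List


def fixed_point_encode(values: List[float], factor: int) -> List[int]:
    # Scale floats to integers, e.g. factor 1000 keeps 3 decimal places.
    return [round(v * factor) for v in values]


def fixed_point_decode(values: List[int], factor: int) -> List[float]:
    # Inverse of fixed_point_encode (exact up to the chosen precision).
    return [v / factor for v in values]


def delta_encode(values: List[int]) -> List[int]:
    # Keep the first value, then store differences between neighbours.
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: List[int]) -> List[int]:
    # A running sum restores the original values.
    return list(accumulate(deltas))


def run_length_encode(values: List[int]) -> List[int]:
    # Collapse runs of equal values into flat [value, count, value, count, ...].
    out: List[int] = []
    for v in values:
        if out and out[-2] == v:
            out[-1] += 1
        else:
            out.extend([v, 1])
    return out


def run_length_decode(pairs: List[int]) -> List[int]:
    # Expand each (value, count) pair back into a run.
    out: List[int] = []
    for value, count in zip(pairs[0::2], pairs[1::2]):
        out.extend([value] * count)
    return out


if __name__ == "__main__":
    # A sequential id column (1..10000) becomes a single run of deltas equal to 1.
    ids = list(range(1, 10001))
    print(run_length_encode(delta_encode(ids)))  # [1, 10000]

    # Coordinates: fixed point, then delta, leaves small integers.
    xs = [12.345, 12.401, 13.002]
    encoded = delta_encode(fixed_point_encode(xs, 1000))
    print(encoded)  # [12345, 56, 601]
    assert fixed_point_decode(delta_decode(encoded), 1000) == xs
```

In this sketch, encoding whole columns rather than rows is what pays off: a sequential identifier column of 10,000 values collapses to two integers, and coordinate deltas stay small enough that a further integer-packing or general-purpose compression step could store them in fewer bytes.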
Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08903, USA
CEITEC, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
National Centre for Biomolecular Research, Faculty of Science, Masaryk University, Brno, Czech Republic
Protein Data Bank in Europe, Wellcome Genome Campus, Hinxton, UK