BinaryCIF and CIFTools: Lightweight, efficient and extensible macromolecular data management

PLoS Computational Biology. 2020 Oct;16(10):e1008247. Epub 2020 Oct 19.

Language: English; Country: United States; Medium: electronic-ecollection

Document type: Journal Article; Research Support, N.I.H., Extramural; Grant-supported work; Research Support, U.S. Gov't, Non-P.H.S.

Persistent link: https://www.medvik.cz/link/pmid33075050

Grant support
R01 GM133198 NIGMS NIH HHS - United States
104948 Wellcome Trust - United Kingdom

Links

PubMed 33075050
PubMed Central PMC7595629
DOI 10.1371/journal.pcbi.1008247
PII: PCOMPBIOL-D-20-00814
Knihovny.cz E-resources

3D macromolecular structural data is growing ever more complex and plentiful in the wake of substantive advances in experimental and computational structure determination methods, including macromolecular crystallography, cryo-electron microscopy, and integrative methods. Efficient means of working with 3D macromolecular structural data for archiving, analysis, and visualization are central to facilitating interoperability and reusability in compliance with the FAIR Principles. We address two challenges posed by the growth in data size and complexity. First, data size is reduced by bespoke compression techniques. Second, complexity is managed through improved software tooling and by fully leveraging available data dictionary schemas. To this end, we introduce BinaryCIF, a serialization of Crystallographic Information File (CIF) format files that maintains full compatibility with related data schemas, such as PDBx/mmCIF, while reducing file sizes by more than a factor of two relative to gzip-compressed CIF files. Moreover, for the largest structures, BinaryCIF provides even better compression: a factor of ten relative to CIF files and a factor of four relative to gzipped CIF files. Herein, we also describe CIFTools, a set of libraries in Java and TypeScript for generic and typed handling of CIF and BinaryCIF files. Together, BinaryCIF and CIFTools enable lightweight, efficient, and extensible handling of 3D macromolecular structural data.
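The abstract attributes BinaryCIF's size reduction to "bespoke compression techniques" without detailing them. As a hedged illustration only, the sketch below shows two column encodings of the general kind a columnar binary format can chain together. The first is delta encoding followed by run-length encoding, applied to an integer column such as per-atom residue numbers. The second is fixed-point scaling of coordinates to integers. All function names, the example column, and the scaling factor of 1000 are assumptions made for this sketch. It is not the CIFTools API and not the exact BinaryCIF encoding specification.

```python
from typing import List, Tuple


def delta_encode(values: List[int]) -> Tuple[int, List[int]]:
    """Keep the first value as the origin and store each element as the
    difference from its predecessor (first delta is 0 in this sketch)."""
    if not values:
        return 0, []
    deltas = [0] + [values[i] - values[i - 1] for i in range(1, len(values))]
    return values[0], deltas


def delta_decode(origin: int, deltas: List[int]) -> List[int]:
    """Invert delta_encode by accumulating differences from the origin."""
    if not deltas:
        return []
    out = [origin]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out


def run_length_encode(values: List[int]) -> List[int]:
    """Flatten runs of equal values into [value, count, value, count, ...]."""
    out: List[int] = []
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        out.extend([values[i], j - i])
        i = j
    return out


def run_length_decode(pairs: List[int]) -> List[int]:
    """Expand [value, count, ...] pairs back into the full column."""
    out: List[int] = []
    for k in range(0, len(pairs), 2):
        out.extend([pairs[k]] * pairs[k + 1])
    return out


def fixed_point_encode(values: List[float], factor: int) -> List[int]:
    """Scale floats to integers; precision is limited to 1/factor."""
    return [round(v * factor) for v in values]


def fixed_point_decode(values: List[int], factor: int) -> List[float]:
    """Divide scaled integers back to floats."""
    return [v / factor for v in values]


if __name__ == "__main__":
    # Hypothetical residue-number column: 3 residues with 8 atoms each.
    seq_ids = [1] * 8 + [2] * 8 + [3] * 8
    origin, deltas = delta_encode(seq_ids)
    encoded = run_length_encode(deltas)
    print(len(seq_ids), "->", len(encoded), encoded)
    assert delta_decode(origin, run_length_decode(encoded)) == seq_ids

    # Hypothetical x coordinates scaled to integers with an assumed factor.
    xs = [12.345, 12.351, 12.360]
    ints = fixed_point_encode(xs, 1000)
    print(ints, fixed_point_decode(ints, 1000))
```

In this toy column, the 24 residue numbers shrink to 10 integers, because consecutive atoms mostly share a residue and the deltas are long runs of zeros. Fixed-point scaling trades a bounded loss of precision for integer data that can then be delta-encoded as well. A real serializer would still have to pack the resulting integers into bytes, and that step is omitted here.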


