Automated shape-based clustering of 3D immunoglobulin protein structures in chronic lymphocytic leukemia
Language: English; Country: England, Great Britain; Medium: electronic
Document type: Journal Article
PubMed: 30453883
PubMed Central: PMC6245605
DOI: 10.1186/s12859-018-2381-1
PII: 10.1186/s12859-018-2381-1
Knihovny.cz E-resources
- Keywords
- 3D protein descriptors, CLL protein clustering, descriptor fusion,
- MeSH
- Molecular Sequence Annotation MeSH
- Automation MeSH
- Leukemia, Lymphocytic, Chronic, B-Cell / metabolism MeSH
- Databases, Protein MeSH
- Immunoglobulins / chemistry MeSH
- Humans MeSH
- Amino Acid Sequence MeSH
- Imaging, Three-Dimensional * MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Substance Names
- Immunoglobulins MeSH
BACKGROUND: Although the etiology of chronic lymphocytic leukemia (CLL), the most common adult leukemia, remains unclear, strong evidence implicates antigen involvement in disease ontogeny and evolution. Analyses of both primary and 3D structure have been used to search for indications of antigenic pressure. 3D structure analysis has mostly relied on models built from the amino acid sequences of the clonotypic B cell receptor immunoglobulins (BcR IG), so its accuracy depends directly on the quality of the model construction algorithms and on the methods used to compare the resulting models. Thus far, reliable and robust methods for grouping IG 3D models by their structural characteristics have been missing.
RESULTS: Here we propose a novel method for clustering a set of proteins based on their 3D structure, focusing on the 3D structures of BcR IG from a large series of patients with CLL. The method combines techniques from bioinformatics, 3D object recognition and machine learning. Clustering is based on the extraction of 3D descriptors that encode various properties of the local and global geometry of the proteins. The descriptors are extracted from aligned pairs of proteins. A combination (fusion) of individual 3D descriptors is also evaluated as an additional method. Compared with manual annotation by experts, the automatically generated clusters are more accurate when 3D descriptors are used than with plain bioinformatics-based comparison, and accuracy increases further when the 3D descriptors are combined.
CONCLUSIONS: The experimental results show that 3D descriptors commonly used for 3D object recognition can be effectively applied to distinguish structural differences between proteins.
The proposed approach can provide hints about the existence of structural groups in large sets of unannotated BcR IG protein files, both in CLL and, by logical extension, in other contexts where characterizing BcR IG structural similarity is relevant. The method is not specific to BcR IG and can be extended to other types of proteins.
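The abstract does not state which 3D descriptors, fusion rule or clustering algorithm the authors use. The sketch below only illustrates the general shape of such a pipeline under loudly stated assumptions. Each structure is assumed to be already superposed and reduced to a list of (x, y, z) coordinates. Two simple stand-in descriptors replace the authors' descriptors: the D2 shape distribution (a histogram of pairwise point distances) and a radial histogram of distances to the centroid. Fusion is done by averaging max-normalized distance matrices, clustering uses average-linkage agglomeration, and agreement with expert labels is measured as cluster purity. All function names, bin counts, distance ranges and the toy data are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical stand-in descriptors; structures are assumed to be pre-aligned
# point clouds of (x, y, z) coordinates.

def d2_histogram(points, bins=8, max_dist=20.0):
    """D2 shape distribution: normalized histogram of all pairwise distances."""
    counts = [0] * bins
    for p, q in combinations(points, 2):
        idx = min(int(math.dist(p, q) / max_dist * bins), bins - 1)
        counts[idx] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def centroid_histogram(points, bins=8, max_dist=10.0):
    """Normalized histogram of point distances to the cloud's centroid."""
    n = len(points)
    center = tuple(sum(p[i] for p in points) / n for i in range(3))
    counts = [0] * bins
    for p in points:
        idx = min(int(math.dist(p, center) / max_dist * bins), bins - 1)
        counts[idx] += 1
    return [c / n for c in counts]

def distance_matrix(descriptors):
    """Pairwise L1 distances between descriptor vectors."""
    return [[sum(abs(x - y) for x, y in zip(a, b)) for b in descriptors]
            for a in descriptors]

def fuse(matrices):
    """Late fusion: scale each matrix by its maximum, then average them."""
    n = len(matrices[0])
    fused = [[0.0] * n for _ in range(n)]
    for m in matrices:
        top = max(max(row) for row in m) or 1.0
        for i in range(n):
            for j in range(n):
                fused[i][j] += m[i][j] / top / len(matrices)
    return fused

def average_linkage(dist, k):
    """Agglomerative clustering (average linkage) until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # a < b, so index a is unaffected
        clusters[a].extend(merged)
    return clusters

def purity(clusters, labels):
    """Fraction of items that share their cluster's majority expert label."""
    correct = 0
    for members in clusters:
        votes = {}
        for i in members:
            votes[labels[i]] = votes.get(labels[i], 0) + 1
        correct += max(votes.values())
    return correct / len(labels)

# Toy data: three compact "cube" clouds and three elongated "rod" clouds.
compact = [[(x, y, z) for x in (0.0, s) for y in (0.0, s) for z in (0.0, s)]
           for s in (2.0, 2.2, 2.4)]
elongated = [[(i * step, 0.0, 0.0) for i in range(8)] for step in (2.0, 2.2, 2.4)]
proteins = compact + elongated
labels = ["compact"] * 3 + ["elongated"] * 3

d2 = distance_matrix([d2_histogram(p) for p in proteins])
radial = distance_matrix([centroid_histogram(p) for p in proteins])
clusters = average_linkage(fuse([d2, radial]), k=2)
print(sorted(sorted(c) for c in clusters), purity(clusters, labels))
# expected: [[0, 1, 2], [3, 4, 5]] 1.0
```

On this toy data the fused distances separate the compact and elongated clouds completely. Averaging max-normalized matrices is only one simple late-fusion rule; weighted or rank-based fusion would slot into the same place.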
Carlsberg Research Laboratory Copenhagen Denmark
Center for Biological Sequence Analysis Technical University of Denmark Copenhagen Denmark
Department of Informatics Ionian University Corfu Greece
Hematology Department and HCT Unit G Papanikolaou Hospital Thessaloniki Greece
Masaryk University Central European Institute of Technology Brno Czech Republic