BACKGROUND: Structure search is one of the valuable capabilities of small-molecule databases. Fingerprint-based screening methods are usually employed to enhance the search performance by reducing the number of calls to the verification procedure. In substructure search, fingerprints are designed to capture important structural aspects of the molecule to aid the decision about whether the molecule contains a given substructure. Currently available cartridges typically provide acceptable search performance for processing user queries, but do not scale satisfactorily with dataset size. RESULTS: We present Sachem, a new open-source chemical cartridge that implements two substructure search methods: The first is a performance-oriented reimplementation of substructure indexing based on the OrChem fingerprint, and the second is a novel method that employs newly designed fingerprints stored in inverted indices. We assessed the performance of both methods on small, medium, and large datasets containing 1, 10, and 94 million compounds, respectively. Comparison of Sachem with other freely available cartridges revealed improvements in overall performance, scaling potential and screen-out efficiency. CONCLUSIONS: The Sachem cartridge allows efficient substructure searches in databases of all sizes. The sublinear performance scaling of the second method and the ability to efficiently query large amounts of pre-extracted information may together open the door to new applications for substructure searches.
- Keywords
- Inverted indices, Molecule cartridges, Small molecule databases, Substructure search
- Publication type
- Journal Article
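The screening idea in the abstract above (fingerprints stored in inverted indices, used to cut down the number of calls to the verification procedure) can be illustrated with a minimal sketch. This is not Sachem's implementation: the feature labels, molecule identifiers, and function names below are hypothetical, and the fingerprint is reduced to a plain set of string features. Molecules that pass the screen would still need an exact substructure check.

```python
from collections import defaultdict


def build_inverted_index(fingerprints):
    """Map each fingerprint feature to the set of molecule ids that contain it."""
    index = defaultdict(set)
    for mol_id, features in fingerprints.items():
        for feature in features:
            index[feature].add(mol_id)
    return index


def screen_candidates(index, query_features, all_ids):
    """Return ids of molecules whose fingerprint contains every query feature.

    Survivors are only candidates; a real system would run an exact
    substructure verification on each of them afterwards.
    """
    if not query_features:
        return set(all_ids)
    # Intersect posting sets, shortest first, so intermediate results stay small.
    postings = sorted((index.get(f, set()) for f in query_features), key=len)
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= posting
        if not candidates:
            break
    return candidates


# Toy data: feature labels are hypothetical, not real fingerprint bits.
fingerprints = {
    "mol_a": {"C-C", "C=O", "ring6"},
    "mol_b": {"C-C", "C-N"},
    "mol_c": {"C-C", "C=O"},
}
index = build_inverted_index(fingerprints)
candidates = screen_candidates(index, {"C-C", "C=O"}, fingerprints)
```

Here `candidates` holds `mol_a` and `mol_c`; `mol_b` is screened out without ever reaching the verification step, which is the effect the abstract refers to as screen-out efficiency.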
MOTIVATION: The existing connections between large databases of chemicals, proteins, metabolites and assays offer valuable resources for research in fields ranging from drug design to metabolomics. Transparent search across multiple databases provides a way to efficiently utilize these resources. To simplify such searches, many databases have adopted semantic technologies that allow interoperable querying of the datasets using the SPARQL query language. However, the interoperable interfaces of the chemical databases still lack the functionality of structure-driven chemical search, which is a fundamental method of data discovery in the chemical search space. RESULTS: We present a SPARQL service that augments existing semantic services by making interoperable substructure and similarity searches in small-molecule databases possible. The service thus offers new possibilities for querying interoperable databases, and simplifies the writing of heterogeneous queries that include chemical-structure search terms. AVAILABILITY: The service is freely available and accessible using a standard SPARQL endpoint interface. The service documentation and user-oriented demonstration interfaces that allow quick explorative querying of datasets are available at https://idsm.elixir-czech.cz.
- Keywords
- Interoperability, Linked data, Small molecule databases, Substructure search
- Publication type
- Journal Article
A search in an all-jet final state for new massive resonances decaying to WW, WZ, or ZZ boson pairs using a novel analysis method is presented. The analysis is performed on data corresponding to an integrated luminosity of 77.3 fb^{-1} recorded with the CMS experiment at the LHC at a centre-of-mass energy of 13 TeV. The search is focussed on potential narrow-width resonances with masses above 1.2 TeV, where the decay products of each W or Z boson are expected to be collimated into a single, large-radius jet. The signal is extracted using a three-dimensional maximum likelihood fit of the two jet masses and the dijet invariant mass, yielding an improvement in sensitivity of up to 30% relative to previous search methods. No excess is observed above the estimated standard model background. In a heavy vector triplet model, spin-1 Z' and W' resonances with masses below 3.5 and 3.8 TeV, respectively, are excluded at 95% confidence level. In a bulk graviton model, upper limits on cross sections are set between 27 and 0.2 fb for resonance masses between 1.2 and 5.2 TeV, respectively. The limits presented in this paper are the best to date in the dijet final state.
- Keywords
- CMS, Diboson resonances, Physics, Substructure
- Publication type
- Journal Article
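One of the three observables in the fit described above is the dijet invariant mass. As a rough illustration only, the sketch below computes it from two jets given as (p_T, eta, phi, mass) in GeV, using the standard four-vector relation m^2 = E^2 - |p|^2. The function names and the example kinematics are assumptions made for this sketch; it does not reproduce the analysis code or the likelihood fit.

```python
import math


def four_vector(pt, eta, phi, mass):
    """Convert (pt, eta, phi, mass) in GeV to Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def dijet_invariant_mass(jet1, jet2):
    """Invariant mass of the two-jet system, m^2 = E^2 - |p|^2."""
    e1, px1, py1, pz1 = four_vector(*jet1)
    e2, px2, py2, pz2 = four_vector(*jet2)
    energy = e1 + e2
    px, py, pz = px1 + px2, py1 + py2, pz1 + pz2
    m2 = energy * energy - (px * px + py * py + pz * pz)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(m2, 0.0))


# Hypothetical example: two massless, back-to-back central jets of 1000 GeV each.
jet_a = (1000.0, 0.0, 0.0, 0.0)
jet_b = (1000.0, 0.0, math.pi, 0.0)
m_jj = dijet_invariant_mass(jet_a, jet_b)
```

For this back-to-back configuration the momenta cancel, so `m_jj` equals the summed jet energies, 2000 GeV up to rounding.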
Results are reported from a search for new particles that decay into a photon and two gluons, in events with jets. Novel jet substructure techniques are developed that allow photons to be identified in an environment densely populated with hadrons. The analyzed proton-proton collision data were collected by the CMS experiment at the LHC, in 2016 at sqrt[s]=13 TeV, and correspond to an integrated luminosity of 35.9 fb^{-1}. The spectra of total transverse hadronic energy of candidate events are examined for deviations from the standard model predictions. No statistically significant excess is observed over the expected background. The first cross section limits on new physics processes resulting in such events are set. The results are interpreted as upper limits on the rate of gluino pair production, utilizing a simplified stealth supersymmetry model. The excluded gluino masses extend up to 1.7 TeV, for a neutralino mass of 200 GeV and exceed previous mass constraints set by analyses targeting events with isolated photons.
- Publication type
- Journal Article
A search is reported for pairs of light Higgs bosons (H1) produced in supersymmetric cascade decays in final states with small missing transverse momentum. A data set of LHC pp collisions collected with the CMS detector at sqrt[s]=13 TeV and corresponding to an integrated luminosity of 138 fb^{-1} is used. The search targets events where both H1 bosons decay into pairs that are reconstructed as large-radius jets using substructure techniques. No evidence is found for an excess of events beyond the background expectations of the standard model (SM). Results from the search are interpreted in the next-to-minimal supersymmetric extension of the SM, where a "singlino" of small mass leads to squark and gluino cascade decays that can predominantly end in a highly Lorentz-boosted singlet-like H1 and a singlino-like neutralino of small transverse momentum. Upper limits are set on the product of the squark or gluino pair production cross section and the square of the branching fraction of the H1 in a benchmark model containing almost mass-degenerate gluinos and light-flavour squarks. Under the assumption of an SM-like branching fraction, H1 bosons with masses in the range 40-120 GeV arising from the decays of squarks or gluinos with a mass of 1200-2500 GeV are excluded at 95% confidence level.
- Publication type
- Journal Article
A search for narrow low-mass resonances decaying to quark-antiquark pairs is presented. The search is based on proton-proton collision events collected at 13 TeV by the CMS detector at the CERN LHC. The data sample corresponds to an integrated luminosity of 35.9 fb^{-1}, recorded in 2016. The search considers the case where the resonance has high transverse momentum due to initial-state radiation of a hard photon. To study this process, the decay products of the resonance are reconstructed as a single large-radius jet with two-pronged substructure. The signal would be identified as a localized excess in the jet invariant mass spectrum. No evidence for such a resonance is observed in the mass range 10 to 125 GeV. Upper limits at the 95% confidence level are set on the coupling strength of resonances decaying to quark pairs. The results obtained with this photon trigger strategy provide the first direct constraints on quark-antiquark resonance masses below 50 GeV obtained at a hadron collider.
- Publication type
- Journal Article
A search is presented for narrow heavy resonances X decaying into pairs of Higgs bosons ([Formula: see text]) in proton-proton collisions collected by the CMS experiment at the LHC at [Formula: see text]. The data correspond to an integrated luminosity of 19.7[Formula: see text]. The search considers [Formula: see text] resonances with masses between 1 and 3[Formula: see text], having final states of two b quark pairs. Each Higgs boson is produced with large momentum, and the hadronization products of the pair of b quarks can usually be reconstructed as single large jets. The background from multijet and [Formula: see text] events is significantly reduced by applying requirements related to the flavor of the jet, its mass, and its substructure. The signal would be identified as a peak on top of the dijet invariant mass spectrum of the remaining background events. No evidence is observed for such a signal. Upper limits obtained at 95 % confidence level for the product of the production cross section and branching fraction [Formula: see text] range from 10 to 1.5[Formula: see text] for the mass of X from 1.15 to 2.0[Formula: see text], significantly extending previous searches. For a warped extra dimension theory with a mass scale [Formula: see text] [Formula: see text], the data exclude radion scalar masses between 1.15 and 1.55[Formula: see text].
- Publication type
- Journal Article
An inclusive search for the standard model Higgs boson (H) produced with large transverse momentum (p_{T}) and decaying to a bottom quark-antiquark pair (b\bar{b}) is performed using a data set of pp collisions at sqrt[s]=13 TeV collected with the CMS experiment at the LHC. The data sample corresponds to an integrated luminosity of 35.9 fb^{-1}. A highly Lorentz-boosted Higgs boson decaying to b\bar{b} is reconstructed as a single, large-radius jet, and it is identified using jet substructure and dedicated b tagging techniques. The method is validated with Z→b\bar{b} decays. The Z→b\bar{b} process is observed for the first time in the single-jet topology with a local significance of 5.1 standard deviations (5.8 expected). For a Higgs boson mass of 125 GeV, an excess of events above the expected background is observed (expected) with a local significance of 1.5 (0.7) standard deviations. The measured cross section times branching fraction for production via gluon fusion of H→b\bar{b} with reconstructed p_{T}>450 GeV and in the pseudorapidity range -2.5<η<2.5 is 74±48(stat)_{-10}^{+17}(syst) fb, which is consistent within uncertainties with the standard model prediction.
- Publication type
- Journal Article
A search for a massive [Formula: see text] gauge boson decaying to a top quark and a bottom quark is performed with the ATLAS detector in [Formula: see text] collisions at the LHC. The dataset was taken at a centre-of-mass energy of [Formula: see text] and corresponds to [Formula: see text] of integrated luminosity. This analysis is done in the hadronic decay mode of the top quark, where novel jet substructure techniques are used to identify jets from high-momentum top quarks. This allows for a search for high-mass [Formula: see text] bosons in the range 1.5-3.0 [Formula: see text]. [Formula: see text]-tagging is used to identify jets originating from [Formula: see text]-quarks. The data are consistent with Standard Model background-only expectations, and upper limits at 95 % confidence level are set on the [Formula: see text] cross section times branching ratio ranging from [Formula: see text] to [Formula: see text] for left-handed [Formula: see text] bosons, and ranging from [Formula: see text] to [Formula: see text] for [Formula: see text] bosons with purely right-handed couplings. Upper limits at 95 % confidence level are set on the [Formula: see text]-boson coupling to [Formula: see text] as a function of the [Formula: see text] mass using an effective field theory approach, which is independent of details of particular models predicting a [Formula: see text] boson.
- Publication type
- Journal Article
This Letter presents the results of a search for pair-produced particles of masses above 100 GeV that each decay into at least four quarks. Using data collected by the CMS experiment at the LHC in 2015-2016, corresponding to an integrated luminosity of 38.2 fb^{-1}, reconstructed particles are clustered into two large jets of similar mass, each consistent with four-parton substructure. No statistically significant excess of data over the background prediction is observed in the distribution of average jet mass. Pair-produced squarks with dominant hadronic R-parity-violating decays into four quarks and with masses between 0.10 and 0.72 TeV are excluded at 95% confidence level. Similarly, pair-produced gluinos that decay into five quarks are also excluded with masses between 0.10 and 1.41 TeV at 95% confidence level. These are the first constraints that have been placed on pair-produced particles with masses below 400 GeV that decay into four or five quarks, bridging a significant gap in the coverage of R-parity-violating supersymmetry parameter space.
- Publication type
- Journal Article