A general, multi-purpose data structure for the efficient representation of conforming unstructured homogeneous meshes for scientific computations on CPU- and GPU-based systems is presented. The data structure is provided as open-source software as part of the TNL library (https://tnl-project.org/). The abstract representation supports almost any cell shape; the common 2D quadrilateral, 3D hexahedron, and arbitrary-dimensional simplex shapes are currently built into the library. The implementation is highly configurable via C++ templates, which makes it possible to avoid storing unnecessary dynamic data. The internal memory layout is based on state-of-the-art sparse-matrix storage formats, which are optimized for different hardware architectures in order to provide high-performance computations. The proposed data structure is also suitable for meshes decomposed into several subdomains and for distributed computing using the Message Passing Interface (MPI). The efficiency of the implemented data structure on CPU and GPU hardware architectures is demonstrated on several benchmark problems and by a comparison with another library. Its applicability to advanced numerical methods is demonstrated on an example problem of two-phase flow in porous media using a numerical scheme based on the mixed-hybrid finite element method (MHFEM). We show GPU speed-ups above 20 in 2D and 50 in 3D compared to sequential CPU computations, and above 2 in 2D and 9 in 3D compared to 12-threaded CPU computations.
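The sparse-matrix-style memory layout mentioned in the abstract can be illustrated with a minimal sketch. This is plain Python and not TNL's actual interface; the names and the tiny two-triangle mesh are invented for illustration. The idea is to store cell-to-vertex incidences in a CSR-like layout: one flat index array plus an offset array, instead of per-cell objects.

```python
# Illustrative sketch (not TNL's API): CSR-style storage of the
# cell-to-vertex incidence data of an unstructured mesh.
# A small 2D mesh of two triangles sharing an edge:
#   vertices 0-3, cells (0,1,2) and (1,3,2).
cell_vertex_indices = [0, 1, 2, 1, 3, 2]  # flat "column index" array
cell_offsets = [0, 3, 6]                  # "row pointer" array

def vertices_of_cell(cell):
    """Return the vertex indices of the given cell."""
    begin, end = cell_offsets[cell], cell_offsets[cell + 1]
    return cell_vertex_indices[begin:end]

print(vertices_of_cell(0))  # [0, 1, 2]
print(vertices_of_cell(1))  # [1, 3, 2]
```

Such flat arrays are contiguous in memory, which is what makes the layout amenable to the hardware-specific optimizations (coalesced GPU access, vectorized CPU access) that sparse-matrix formats provide.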
- Publication Type
- journal articles MeSH
The computational performance of the graphics processing unit (GPU) in current personal computers often considerably exceeds that of the CPU. Even more powerful GPUs specialized in intensive computations are available on the market. A GPU consists of tens to hundreds of computational cores. Exploiting this computational power requires suitable parallelization of the task to be solved. The speed-up achievable with a GPU varies considerably depending on the task: some tasks can be accelerated up to a hundredfold, while many others cannot be parallelized at all and would actually run slower on a GPU. This contribution aims to show the potential of GPUs for general-purpose computations and to outline the GPU architecture and programming on the CUDA platform. GPU use is demonstrated on a specific application. For clarity, many details are omitted or simplified; the contribution does not seek to teach the reader how to program in CUDA. A detailed description of GPUs and CUDA can be found, e.g., in [1,2].
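The parallelization requirement described above can be shown with a minimal sketch. This is plain Python using a thread pool as a stand-in for GPU cores, not CUDA code, and the function names are invented for illustration: an embarrassingly parallel task applies one independent computation to each element, so the work can be distributed across workers.

```python
# CPU-based analogy of data parallelism (not CUDA): each element is
# processed independently, so the map can be split over a worker pool --
# on a GPU, over hundreds of cores.
from concurrent.futures import ThreadPoolExecutor

def f(x):
    # Independent per-element work; no element depends on another,
    # which is exactly what makes the task parallelizable.
    return x * x

def parallel_map(data):
    with ThreadPoolExecutor() as pool:
        return list(pool.map(f, data))

print(parallel_map(range(5)))  # [0, 1, 4, 9, 16]
```

A task with sequential dependencies, where each step needs the previous result, offers no such decomposition, which is why such tasks gain nothing from a GPU and may even slow down.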
... Graphics card 137 -- 9.1 GPU 138 -- 9.1.1 Graphics cards with two GPUs 139 -- 9.1.2 New graphics chips ... 141 -- 9.4 TV tuners 142 -- 9.5 Two graphics cards 142 -- 9.6 Graphics card diagnostics 142 -- 9.6.1 GPU-Z ...
1st electronic edition. 1 online resource (224 pages)
The book is intended primarily for beginners, but it also offers useful information for intermediate users. Its chapters cover obtaining, basic setup, and installation (and uninstallation) of programs; the individual components of a computer (such as the motherboard or the graphics card); peripherals (such as the monitor or the printer); and the configuration of the computer's basic firmware, the BIOS. At the very end, the reader learns to use comprehensive diagnostic tools and to obtain information about their computer via Windows. After reading the book, the reader will also know what kind of computer is suitable for office work, what kind for home entertainment, and so on. The book is by a renowned author of more than ten publications who has worked in this field for more than fifteen years and lectures for several well-known Czech companies, such as Computer Help, PC-DIR, Nicom, AIT Consult, and others.
... Running a Jupyter notebook on an EC2 GPU instance 314 -- C. ... options 71 -- 3.3.3 Running deep learning in the cloud: pros and cons 71 -- 3.3.4 Which GPU is best ... Running a Jupyter notebook on an EC2 GPU instance 314 -- B.1 What are Jupyter notebooks? ... Why run a GPU on AWS? ... 315 -- B.3 Setting up an AWS GPU instance 315 -- B.3.1 Configuring Jupyter 317 -- B.4 Installing Keras 318 ...
1st electronic edition. 1 online resource (328 pages)
The advent of cryo-electron microscopy (cryo-EM) and cryo-electron tomography (cryo-ET), coupled with computational modeling, has enabled the creation of integrative 3D models of viruses, bacteria, and cellular organelles. These models, composed of thousands of macromolecules and billions of atoms, have historically posed significant challenges for manipulation and visualization without specialized molecular graphics tools and hardware. With the recent advancements in GPU rendering power and web browser capabilities, it is now feasible to interactively render large molecular scenes directly on the web. In this work, we introduce Mesoscale Explorer, a web application built using the Mol* framework, dedicated to the visualization of large-scale molecular models ranging from viruses to cell organelles. Mesoscale Explorer provides unprecedented access and insight into the molecular fabric of life, enhancing perception, streamlining exploration, and simplifying visualization of diverse data types, showcasing the intricate details of these models with unparalleled clarity.
- MeSH
- cryo-electron microscopy * methods MeSH
- molecular models * MeSH
- software * MeSH
- viruses chemistry ultrastructure MeSH
- Publication Type
- journal articles MeSH
In cryo-electron microscopy, accurate particle localization and classification are imperative. Recent deep learning solutions, though successful, require extensive training datasets. The protracted generation time of physics-based models, often employed to produce these datasets, limits their broad applicability. We introduce FakET, a method based on neural style transfer, capable of simulating the forward operator of any cryo transmission electron microscope. It can be used to adapt a synthetic training dataset according to reference data producing high-quality simulated micrographs or tilt-series. To assess the quality of our generated data, we used it to train a state-of-the-art localization and classification architecture and compared its performance with a counterpart trained on benchmark data. Remarkably, our technique matches the performance, boosts data generation speed 750×, uses 33× less memory, and scales well to typical transmission electron microscope detector sizes. It leverages GPU acceleration and parallel processing. The source code is available at https://github.com/paloha/faket/.
Designing a cranial implant to restore the protective and aesthetic function of the patient's skull is a challenging process that requires a substantial amount of manual work, even for an experienced clinician. While computer-assisted approaches with various levels of required user interaction exist to aid this process, they are usually only validated on either a single type of simple synthetic defect or a very limited sample of real defects. The work presented in this paper aims to address two challenges: (i) to design a fully automatic 3D shape reconstruction method that can address diverse shapes of real skull defects in various stages of healing and (ii) to provide an open dataset for optimization and validation of anatomical reconstruction methods on a set of synthetically broken skull shapes. We propose an application of the multi-scale cascade architecture of convolutional neural networks to the reconstruction task. Such an architecture is able to tackle the trade-off between the output resolution and the receptive field of the model imposed by GPU memory limitations. Furthermore, we experiment with both generative and discriminative models and study their behavior during the task of anatomical reconstruction. The proposed method achieves an average surface error of 0.59 mm for our synthetic test dataset, with as low as 0.48 mm for unilateral defects of the parietal and temporal bone, matching state-of-the-art performance while being completely automatic. We also show that the model trained on our synthetic dataset is able to reconstruct real patient defects.
- MeSH
- skull diagnostic imaging MeSH
- humans MeSH
- neural networks * MeSH
- image processing, computer-assisted * MeSH
- prostheses and implants MeSH
- Check Tag
- humans MeSH
- Publication Type
- journal articles MeSH
- work supported by a grant MeSH
Three-dimensional structure models refined using low-resolution data from crystallographic or electron cryo-microscopy experiments can benefit from high-quality restraints derived from quantum-chemical methods. However, nonperiodic atom-centered quantum-chemistry codes do not inherently account for nearest-neighbor interactions of crystallographic symmetry-related copies in a satisfactory way. Here, these nearest-neighbor effects have been included in the model by expanding to a super-cell and then truncating the super-cell to only include residues from neighboring cells that are interacting with the asymmetric unit. In this way, the fragmentation approach can adequately and efficiently include nearest-neighbor effects. It has previously been shown that a moderately sized X-ray structure can be treated using quantum methods if a fragmentation approach is applied. In this study, a target protein (PDB entry 4gif) was partitioned into a number of large fragments. The use of large fragments (typically hundreds of atoms) is tractable when a GPU-based package such as TeraChem is employed or cheaper (semi-empirical) methods are used. The QM calculations were run at the HF-D3/6-31G level. The models refined using a recently developed semi-empirical method (GFN2-xTB) were compared and contrasted. To validate the refinement procedure for a non-P1 structure, a standard set of crystallographic metrics were used. The robustness of the implementation is shown by refining 13 additional protein models across multiple space groups and a summary of the refinement metrics is presented.
... -- First steps with Colaboratory 98 -- Installing packages with PIP 100 -- Using the GPU runtime ... 13.1.2 Combining models 461 -- 13.2 Scaling up model training 463 -- 13.2.1 Speeding up training on a GPU ... mixed precision in practice 467 -- 13.2.2 Training with multiple GPUs 467 -- Getting two or more GPUs ...
1st electronic edition. 1 online resource (528 pages)
Machine learning has made remarkable progress in recent years, from nearly unusable speech and image recognition to superhuman accuracy, and from programs that could not beat even a slightly experienced Go player to one that defeated the world champion. Behind this progress in learning programs stands so-called deep learning.
Nowadays, advanced computational chemistry methods offer various strategies for revealing prospective hit structures in drug development, essentially through accurate binding free energy predictions. After the era of molecular docking and quantitative structure-activity relationships, much interest has lately been oriented towards perturbed molecular dynamics approaches such as replica exchange with solute tempering and free energy perturbation (REST/FEP) and the potential of mean force with adaptive biasing and accelerated weight histograms (PMF/AWH). Both of these receptor-based techniques can exploit exascale CPU & GPU supercomputers to achieve high-throughput performance. In this fundamental study, we have compared the predictive power of a panel of supercomputerized molecular modelling methods to distinguish the major binding modes and the corresponding binding free energies of promising tacrine-related potential anti-Alzheimer agents in human acetylcholinesterase. The binding free energies were estimated using flexible molecular docking, molecular mechanics/generalized Born surface area/Poisson-Boltzmann surface area (MM/GBSA/PBSA), transmutation REST/FEP with 12 × 5 ns/λ windows, annihilation FEP with 20 × 5 ns/λ steps, PMF with the weighted histogram analysis method (WHAM) and 40 × 5 ns samples, and PMF/AWH with 10 × 100 ns replicas. Confronting classical approaches such as canonical molecular dynamics and molecular docking with alchemical calculations and steered molecular dynamics enabled us to show how large errors in ΔG predictions can be expected if these in silico methods are employed in the elucidation of a common case of enzyme inhibition. Communicated by Ramaswamy H. Sarma.