Background: Administrative health care databases are increasingly used as research tools, but they generally contain only health insurance claims data, which alone are insufficient for epidemiological research. Creating a dataset appropriate for a specific analysis requires technical expertise and familiarity with data analysis. The aim of our research is to develop a data warehouse (DW) accessible to epidemiology researchers who lack this expertise. Methods: First, we added attributes commonly used in epidemiology to the National Database of Health Insurance Claims of Japan (NDB) to construct a Research Question Oriented DB. Second, we developed a versatile analysis unit schema by which the Research Question Oriented DW was reconstructed as per-patient units covering demographics such as sex and age group. We then proposed a pattern relational calculus by which research-specific attributes can be added without expert knowledge of SQL. Finally, we applied the DW in two epidemiological studies. Results: In both studies, the coverage of attributes constructed by the versatile analysis unit schema alone was limited: the schema covered 12% (3/25) of the attributes used in one study and 15% (3/20) in the other. The pattern relational calculus we proposed covered all remaining attributes that the researchers used in their studies. Conclusion: The versatile analysis unit schema and the pattern relational calculus together covered all attributes used in the two epidemiological studies. This shows that, even within a limited scope, our method allows researchers with little knowledge of SQL to carry out their epidemiological studies.
Abbreviations and Terminologies: NDB-SD: NDB Sampling Data set; DW: data warehouse; Schema: design of attributes in relations in relational model theory; Relation: table with no duplicate tuples; Attribute: column name or variable name in a relation; Primary key: one or more attributes that uniquely identify each tuple in a relation; Tuple: combination of attribute values in a relation, almost the same meaning as row; Tuple relational calculus: logical expression used in relational model theory; SQL: database language based on relational model theory.
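The coverage figures reported in the Results can be checked with a few lines of arithmetic. The sketch below only reproduces the fractions stated in the abstract (3/25 and 3/20); the labels `study_1` and `study_2` and the function name are placeholders introduced here, not names used by the authors.

```python
from fractions import Fraction


def coverage_percent(covered: int, total: int) -> Fraction:
    """Share of attributes covered, as an exact percentage."""
    return Fraction(covered, total) * 100


# (covered by the versatile analysis unit schema, total attributes used)
# Study labels are placeholders; the counts come from the abstract.
studies = {"study_1": (3, 25), "study_2": (3, 20)}

for name, (covered, total) in studies.items():
    pct = coverage_percent(covered, total)
    remaining = total - covered  # left for the pattern relational calculus
    print(f"{name}: {covered}/{total} = {float(pct):.0f}% covered, "
          f"{remaining} attributes remaining")
```

Under these numbers, 22 and 17 attributes respectively were left for the pattern relational calculus.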
- MeSH
- Data Analysis * MeSH
- Big Data MeSH
- Databases as Topic MeSH
- Epidemiologic Studies * MeSH
- Humans MeSH
- Delivery of Health Care MeSH
- Universal Health Insurance MeSH
- Check Tag
- Humans MeSH
- Geographicals
- Japan MeSH
Diamond-Blackfan Anemia (DBA) is characterized by a defect of erythroid progenitors and, clinically, by anemia and malformations. DBA exhibits an autosomal dominant pattern of inheritance with incomplete penetrance. Currently, nine genes, all encoding ribosomal proteins (RP), have been found to be mutated in approximately 50% of patients. Experimental evidence supports the hypothesis that DBA is primarily the result of defective ribosome synthesis. By means of a large collaboration among six centers, we report here a mutation update that includes nine genes and 220 distinct mutations, 56 of which are new. The DBA Mutation Database now includes data from 355 patients. Of those where inheritance has been examined, 125 patients carry a de novo mutation and 72 an inherited mutation. Mutagenesis may be ascribed to slippage in 65.5% of indels, whereas CpG dinucleotides are involved in 23% of transitions. Using bioinformatic tools, we show that gene conversion is not a common mechanism of mutagenesis in RP genes, notwithstanding the abundance of RP pseudogenes. Genotype-phenotype analysis reveals that malformations are more frequently associated with mutations in RPL5 and RPL11 than in the other genes. All currently reported DBA mutations, together with their functional and clinical data, are included in the DBA Mutation Database. 2010 Wiley-Liss, Inc.
- MeSH
- Databases, Genetic * MeSH
- Anemia, Diamond-Blackfan diagnosis genetics MeSH
- Genetic Association Studies MeSH
- Humans MeSH
- Molecular Sequence Data MeSH
- Mutation * genetics MeSH
- Mutagenesis genetics MeSH
- Ribosomal Proteins genetics MeSH
- Ribosomes * genetics MeSH
- Base Sequence MeSH
- Check Tag
- Humans MeSH
- Publication type
- Research Support, Non-U.S. Gov't MeSH
- Research Support, N.I.H., Extramural MeSH
Do epidemiologickej prierezovej štúdie prebiehajúcej v roku 1992 bolo zahrnutých 722 rodičiek z I. gynekologicko-pôrodníckej kliniky FNsP v Košiciach. Databáza súboru bola tvorená v softwari SOR (Správa o rodičke) a SON (Správa o novorodencovi). Analýza pôsobenia vybraných sociálnych faktorov poukázala na ich významný vplyv na niektoré reprodukčné ukazovatele žien. Vzdelanie matky významne ovplyvňovalo počet pôrodov. Štatisticky vysoko významné rozdiely boli zistené u žien s tromi a viac pôrodmi medzi ženami s VS a základným vzdelaním, ale aj celková štatistická analýza súboru chí-kvadrátovým testom poukazuje na významnosť vzdelania matiek. Vplyv vzdelanosti matiek nebol dokázaný pri sledovaní počtu spontánnych potratov a interrupcií. Rodinný stav matky mal štatisticky významný vplyv na gestačný vek. Gestačný vek 38 a viac týždňov malo 90,4 % vydatých rodičiek, zatiaľ čo u rodičiek bez partnera to bolo len 77,2 % (p < 0,001).
The epidemiological cross-sectional study carried out in 1992 comprised 722 parturient women from the 1st Gynaecological and Obstetric Clinic of the Faculty Hospital in Košice. The database of the group was created in the software SOR (report on the parturient) and SON (report on the newborn). Analysis of the action of selected social factors revealed their important influence on some reproductive parameters of the women. The mother's education had a significant effect on the number of deliveries. Highly significant differences were found among women with three or more deliveries between those with university education and those with elementary education, and the overall chi-square analysis of the group also indicates the importance of the mother's education. No influence of education was found on the number of spontaneous and induced abortions. The marital status of the mother had a statistically significant impact on gestational age. A gestational age of 38 weeks or longer was recorded in 90.4% of married mothers, while in mothers without a partner it was only 77.2% (p < 0.001).
The impact of the classification method and feature selection on speech emotion recognition accuracy is discussed in this paper. Selecting the correct parameters in combination with the classifier is an important part of reducing the computational complexity of the system. This step is especially necessary for systems that will be deployed in real-time applications. Speech emotion recognition systems are being developed and improved because of their wide usability in today's automatic voice-controlled systems. The Berlin database of emotional recordings was used in this experiment. The classification accuracy of artificial neural networks, k-nearest neighbours, and Gaussian mixture models is measured with respect to the selection of prosodic, spectral, and voice quality features. The purpose was to find an optimal combination of methods and feature groups for stress detection in human speech. The research contribution lies in the design of a speech emotion recognition system with regard to its accuracy and efficiency.
- MeSH
- Algorithms * MeSH
- Databases, Factual MeSH
- Emotions physiology MeSH
- Voice Quality MeSH
- Humans MeSH
- Neural Networks, Computer MeSH
- Signal Processing, Computer-Assisted instrumentation MeSH
- Speech physiology MeSH
- ROC Curve MeSH
- Pattern Recognition, Automated * MeSH
- Pattern Recognition, Physiological physiology MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
One of the aims of high-throughput gene/protein profiling experiments is the identification of biological processes altered between two or more conditions. Pathway analysis is an umbrella term for a multitude of computational approaches used for this purpose. While in the beginning pathway analysis relied on enrichment-based approaches, a newer generation of methods is now available, exploiting pathway topologies in addition to gene/protein expression levels. However, little effort has been invested in their critical assessment with respect to their performance in different experimental setups. Here, we assessed the performance of seven representative methods identifying differentially expressed pathways between two groups of interest based on gene expression data with prior knowledge of pathway topologies: SPIA, PRS, CePa, TAPPA, TopologyGSA, Clipper and DEGraph. We performed a number of controlled experiments that investigated their sensitivity to sample and pathway size, threshold-based filtering of differentially expressed genes, ability to detect target pathways, ability to exploit the topological information and the sensitivity to different pre-processing strategies. We also verified type I error rates and described the influence of overexpression of single genes, gene sets and topological motifs of various sizes on the detection of a pathway as differentially expressed. The results of our experiments demonstrate a wide variability of the tested methods. We provide a set of recommendations for an informed selection of the proper method for a given data analysis task.
- MeSH
- Databases, Genetic MeSH
- Datasets as Topic MeSH
- Humans MeSH
- Metabolic Networks and Pathways * MeSH
- Gene Expression Profiling methods MeSH
- High-Throughput Nucleotide Sequencing MeSH
- Check Tag
- Humans MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Comparative Study MeSH
Silene vulgaris possesses ecotype-specific tolerance to high levels of copper in the soil. Although this was reported a few decades ago, little is known about this trait on a molecular level. The aim of this study was to analyze the transcription response to elevated copper concentrations in two S. vulgaris ecotypes originating from copper-contrasting soil types: copper-tolerant Lubietova and copper-sensitive Stranska skala. To reveal whether the plants are transcriptionally affected, we first analyzed the HMA7 gene, a known key player in copper metabolism. Based on BAC library screening, we identified a BAC clone containing a SvHMA7 sequence with all the structural properties specific for plant copper-transporting ATPases. The functionality of the gene was tested using heterologous complementation in yeast mutants. Analyses of SvHMA7 transcription patterns showed that both ecotypes studied up-regulated SvHMA7 transcription after the copper treatment. Our data are supported by analysis of appropriate reference genes based on RNA-Seq databases. To identify genes specifically involved in copper response in the studied ecotypes, we analyzed transcription profiles of genes encoding Cu-transporting proteins and genes involved in the prevention of copper-induced oxidative stress in both ecotypes. Our data show that three genes (APx, POD and COPT5) differ in their transcription pattern between the ecotypes, with constitutively increased transcription in Lubietova. Taken together, we have identified transcription differences between metalliferous and non-metalliferous ecotypes of S. vulgaris, and we have suggested candidate genes participating in metal tolerance in this species.
- MeSH
- Adenosine Triphosphatases genetics metabolism MeSH
- Databases, Nucleic Acid MeSH
- Ecotype MeSH
- Gene Library MeSH
- Plant Roots drug effects genetics growth & development physiology MeSH
- Copper metabolism pharmacology MeSH
- Organ Specificity MeSH
- Cation Transport Proteins genetics metabolism MeSH
- Gene Expression Regulation, Plant * MeSH
- RNA, Plant chemistry genetics MeSH
- Plant Proteins genetics metabolism MeSH
- Sequence Analysis, RNA MeSH
- Silene drug effects genetics growth & development physiology MeSH
- Transcriptome * MeSH
- Plant Shoots drug effects genetics growth & development physiology MeSH
- Publication type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
Tento článek se zabývá automatickou lokalizací objektů (očí, úst) ve dvourozměrných (2D) černobílých obrazech obličejů. Je motivován praktickým problémem v genetice člověka a výstup lokalizace objektů v dané databázi obrazů je zapotřebí pro řešení dalších úloh v genetickém výzkumu. V článku se aplikuje robustní filtr na obrazy s cílem odstranit šum. Hlavní metodou jsou šablony. Ústa a obě oči se lokalizují současně za použití váženého Pearsonova korelačního koeficientu nebo jeho robustní analogie založené na robustních regresních metodách. V databázi s 212 obrazy obličejů tato metoda správně nalezne ústa a oči ve 100 % případů. Také robustní korelační koeficient založený na regresní metodě nejmenších vážených čtverců lokalizuje ústa a oči ve 100 % obrazů uvažované databáze. Článek studuje robustní aspekty této metody vzhledem k otočení, šumu, okluzi a asymetrii v obraze. Současná lokalizace úst i obou očí je invariantní vůči libovolnému otočení obličeje. Tato studie využívá speciální vlastnosti daných obrazů obličejů vzhledem k očekávanému použití v genetických aplikacích.
This paper is devoted to the automatic localization of objects (eyes, mouth) in two-dimensional (2D) grey scale images of faces. It is motivated by a practical problem in human genetics: the output of the localization of objects in the given database of images is needed for further tasks in genetic research. A robust filter is applied to the images to remove noise. Templates are used as the main method. The mouth and both eyes are localized jointly using the weighted Pearson product-moment correlation coefficient or its robust analogy based on robust regression methods. In the database of 212 images of faces, the method locates the mouth and eyes correctly in 100 % of cases. The robust correlation coefficient based on least weighted squares regression also localizes the mouth and both eyes in 100 % of the images of the given database. Robustness aspects of the method are examined with respect to rotation, noise, occlusion and asymmetry in the image. The joint localization of the mouth and both eyes makes the method invariant to rotation of the face by any angle. This work exploits special properties of the given images in view of the expected usage of the methods in genetic applications.
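The weighted Pearson product-moment correlation coefficient mentioned above can be sketched as follows. This is a minimal illustration of the standard formula, assuming flattened pixel intensities for a template and an image patch; the function name, the example values and the weights are hypothetical and are not taken from the paper, which does not state its weights or templates in the abstract.

```python
import math
from typing import Sequence


def weighted_pearson(x: Sequence[float], y: Sequence[float],
                     w: Sequence[float]) -> float:
    """Weighted Pearson correlation between x and y with non-negative weights w."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted means of both signals.
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variances (common normalisation cancels out).
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(vx * vy)


# Hypothetical flattened template and image patch (grey levels).
template = [10.0, 20.0, 30.0, 40.0]
patch = [12.0, 22.0, 32.0, 42.0]      # same pattern, uniformly brighter
weights = [1.0, 2.0, 2.0, 1.0]        # hypothetical: centre pixels count more

r_match = weighted_pearson(template, patch, weights)
r_inverse = weighted_pearson(template, patch[::-1], weights)
```

In template matching, a patch would be accepted as the location of the object where this coefficient is highest; a uniform brightness shift, as in `patch`, does not lower the score.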
- Keywords
- object localization, templates, eye and mouth detection, robust correlation analysis, noise reduction
- MeSH
- Biometry methods MeSH
- Contrast Sensitivity physiology MeSH
- Databases as Topic standards MeSH
- Photography methods MeSH
- Genetic Research MeSH
- Image Interpretation, Computer-Assisted methods MeSH
- Humans MeSH
- Face MeSH
- Eye MeSH
- Image Processing, Computer-Assisted methods MeSH
- Regression Analysis MeSH
- Reproducibility of Results MeSH
- Pattern Recognition, Physiological physiology MeSH
- Subtraction Technique standards MeSH
- Mouth MeSH
- Image Enhancement methods MeSH
- Check Tag
- Humans MeSH
... Contents -- Contributors vii -- Acknowledgements ix -- 1 GIS and spatial analysis: introduction and overview ... -- 2 A review of statistical spatial analysis in geographical information systems 13 -- Trevor C. ... Haining -- 4 Spatial analysis and GIS 65 -- Morton E. ... pattern analysers relevant to GIS 83 -- Stan ... Ralston -- PART III GIS AND SPATIAL ANALYSIS: APPLICATIONS 187 -- 10 Urban analysis in a GIS environment ...
[1st ed.] 281 p.
- Keywords
- geographic information system
- Conspectus
- Medical sciences. Medicine
- NML Fields
- environmental sciences
- medical informatics
BACKGROUND: Seasonality at the clinical onset of type 1 diabetes (T1D) has been suggested by different studies; however, the results are conflicting. This study aimed to evaluate the presence of seasonality at clinical onset of T1D based on the SWEET database comprising data from 32 different countries. METHODS: The study cohort included 23 603 patients (52% males) recorded in the international multicenter SWEET database (48 centers), with T1D onset ≤20 years, year of onset between 1980 and 2015, and documented gender, year and month of birth, and T1D diagnosis. Data were stratified according to four age groups (<5, 5-<10, 10-<15, 15-20 years) at T1D onset, the latitude of the European center (Northern Europe ≥50°N and Southern Europe <50°N) and the year of onset (≤ or >2009). RESULTS: Analysis by month revealed significant seasonality, with January being the month with the highest and June the month with the lowest percentage of incident cases (P < .001). Winter, early spring and late autumn months had a higher percentage of incident cases than late spring and summer months. Stratification by age showed similar seasonality patterns in all four age groups (P ≤ .003 each), but not in children <24 months of age. There was no gender or latitude effect on the seasonality pattern; however, the pattern differed by the year of onset (P < .001). Seasonality of diagnosis conformed to a sinusoidal model for all cases, females and males, age groups, and northern and southern European countries. CONCLUSIONS: Seasonality at T1D clinical onset is documented by the large SWEET database, with no effect of gender or latitude (Europe only) but with an effect of the year of onset.
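As an illustration of what conforming to a sinusoidal model can mean, the sketch below fits a single annual harmonic to twelve monthly values by least squares. The abstract does not describe the fitting procedure, so this single-harmonic form is an assumption, and the monthly values are synthetic numbers generated from a known sinusoid, not SWEET data.

```python
import math
from typing import Sequence, Tuple


def fit_annual_sinusoid(monthly_values: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = mean + amplitude * cos(angle - phase).

    Returns (mean level, amplitude, peak position in months, 0 = January).
    With 12 equally spaced months, the cosine and sine regressors are
    orthogonal, so the coefficients reduce to simple sums.
    """
    n = 12
    if len(monthly_values) != n:
        raise ValueError("expected exactly 12 monthly values")
    angles = [2 * math.pi * m / n for m in range(n)]
    mean_level = sum(monthly_values) / n
    a = 2 / n * sum(y * math.cos(t) for y, t in zip(monthly_values, angles))
    b = 2 / n * sum(y * math.sin(t) for y, t in zip(monthly_values, angles))
    amplitude = math.hypot(a, b)
    peak_month = (math.atan2(b, a) / (2 * math.pi) * n) % n
    return mean_level, amplitude, peak_month


# Synthetic series: mean 100, amplitude 20, peak in month 0 (January).
synthetic = [100 + 20 * math.cos(2 * math.pi * m / 12) for m in range(12)]
mean_level, amplitude, peak_month = fit_annual_sinusoid(synthetic)
```

Because the peak position is circular, a fitted value just below 12 is equivalent to a value just above 0 (both mean January).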
- MeSH
- Diabetes Mellitus, Type 1 epidemiology MeSH
- Child MeSH
- Cohort Studies MeSH
- Infant MeSH
- Humans MeSH
- Adolescent MeSH
- Young Adult MeSH
- Child, Preschool MeSH
- Seasons * MeSH
- Check Tag
- Child MeSH
- Infant MeSH
- Humans MeSH
- Adolescent MeSH
- Young Adult MeSH
- Male MeSH
- Child, Preschool MeSH
- Female MeSH
- Publication type
- Journal Article MeSH
- Multicenter Study MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographicals
- Europe epidemiology MeSH
The availability of a great range of prior biological knowledge about the roles and functions of genes and gene-gene interactions allows us to simplify the analysis of gene expression data to make it more robust, compact, and interpretable. Here, we objectively analyze the applicability of functional clustering for the identification of groups of functionally related genes. The analysis is performed in terms of gene expression classification and uses predictive accuracy as an unbiased performance measure. Features of biological samples that originally corresponded to genes are replaced by features that correspond to the centroids of the gene clusters and are then used for classifier learning. Using 10 benchmark data sets, we demonstrate that functional clustering significantly outperforms random clustering without biological relevance. We also show that functional clustering performs comparably to gene expression clustering, which groups genes according to the similarity of their expression profiles. Finally, the suitability of functional clustering as a feature extraction technique is evaluated and discussed.
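The feature replacement described above, where gene-level features are swapped for cluster centroids, can be sketched as follows. This is a minimal illustration under stated assumptions: the gene identifiers, cluster names and expression values are hypothetical, and the clusters are simply given as input, since the abstract does not specify which prior knowledge source defines them.

```python
from typing import Dict, List


def centroid_features(sample: Dict[str, float],
                      clusters: Dict[str, List[str]]) -> Dict[str, float]:
    """Replace per-gene expression values with one mean value per gene cluster.

    Genes missing from the sample are skipped; clusters with no measured
    member gene produce no feature.
    """
    features: Dict[str, float] = {}
    for cluster_name, genes in clusters.items():
        values = [sample[g] for g in genes if g in sample]
        if values:
            features[cluster_name] = sum(values) / len(values)
    return features


# Hypothetical sample (gene -> expression) and functional clusters.
sample = {"g1": 2.0, "g2": 4.0, "g3": 9.0}
clusters = {"cluster_a": ["g1", "g2"], "cluster_b": ["g3"], "cluster_c": ["g9"]}

reduced = centroid_features(sample, clusters)
```

The reduced feature vectors would then be passed to any classifier in place of the original gene-level vectors, which is what makes predictive accuracy usable as the comparison measure.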