Observer Variation
BACKGROUND AND OBJECTIVE: Biparametric magnetic resonance imaging (bpMRI), which omits dynamic contrast-enhanced (DCE) magnetic resonance imaging (MRI), is a potential replacement for multiparametric MRI (mpMRI) in diagnosing clinically significant prostate cancer (csPCa). An extensive international multireader multicase observer study was conducted to assess the noninferiority of bpMRI to mpMRI in csPCa diagnosis.
METHODS: An observer study was conducted with 400 mpMRI examinations from four European centers, excluding examinations with prior prostate treatment or csPCa (Gleason grade [GG] ≥2) findings. Readers assessed bpMRI and mpMRI sequentially, assigning lesion-specific Prostate Imaging Reporting and Data System (PI-RADS) scores (3-5) and a patient-level suspicion score (0-100). The noninferiority of patient-level bpMRI versus mpMRI csPCa diagnosis was evaluated using the area under the receiver operating characteristic curve (AUROC) alongside the sensitivity and specificity at PI-RADS ≥3 with a 5% margin. The secondary outcomes included insignificant prostate cancer (GG1) diagnosis, diagnostic evaluations at alternative risk thresholds, decision curve analyses (DCAs), and subgroup analyses considering reader expertise. Histopathology and ≥3 yr of follow-up were used for the reference standard.
KEY FINDINGS AND LIMITATIONS: Sixty-two readers (45 centers and 20 countries) participated. The prevalence of csPCa was 33% (133/400); bpMRI and mpMRI showed similar AUROC values of 0.853 (95% confidence interval [CI], 0.819-0.887) and 0.859 (95% CI, 0.826-0.893), respectively, with a noninferior difference of -0.6% (95% CI, -1.2% to 0.1%, p < 0.001). At PI-RADS ≥3, bpMRI and mpMRI had sensitivities of 88.6% (95% CI, 84.8-92.3%) and 89.4% (95% CI, 85.8-93.1%), respectively, with a noninferior difference of -0.9% (95% CI, -1.7% to 0.0%, p < 0.001), and specificities of 58.6% (95% CI, 52.3-63.1%) and 57.7% (95% CI, 52.3-63.1%), respectively, with a noninferior difference of 0.9% (95% CI, 0.0-1.8%, p < 0.001). At alternative risk thresholds, mpMRI increased sensitivity at the expense of reduced specificity. DCA demonstrated the highest net benefit for an mpMRI pathway in cancer-averse scenarios, whereas a bpMRI pathway showed greater benefit for biopsy-averse scenarios. A subgroup analysis indicated limited additional benefit of DCE MRI for nonexperts. Limitations included that biopsies were conducted based on mpMRI, and that reading was performed in a sequential order.
CONCLUSIONS AND CLINICAL IMPLICATIONS: bpMRI was found to be noninferior to mpMRI in csPCa diagnosis in terms of AUROC and of sensitivity and specificity at PI-RADS ≥3, showing its value in individuals without prior csPCa findings or prostate treatment. Additional randomized prospective studies are required to investigate the generalizability of outcomes.
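The noninferiority criterion used in this abstract can be sketched in a few lines of Python. This is only an illustration, assuming the common rule that noninferiority holds when the lower bound of the 95% CI of the bpMRI minus mpMRI difference lies above the negative margin. The function name `is_noninferior` is hypothetical; the numbers are the differences reported in the abstract.

```python
# Minimal sketch (not the study's analysis code): a difference is treated as
# noninferior when the lower 95% CI bound stays above -margin.
MARGIN = 5.0  # noninferiority margin in percentage points, from the abstract


def is_noninferior(ci_lower: float, margin: float = MARGIN) -> bool:
    """Return True if the lower CI bound of (bpMRI - mpMRI) exceeds -margin."""
    return ci_lower > -margin


# (difference, CI lower, CI upper) in percentage points, as reported above
reported = {
    "AUROC": (-0.6, -1.2, 0.1),
    "sensitivity at PI-RADS >=3": (-0.9, -1.7, 0.0),
    "specificity at PI-RADS >=3": (0.9, 0.0, 1.8),
}

results = {name: is_noninferior(lo) for name, (_, lo, _) in reported.items()}
for name, ok in results.items():
    print(f"{name}: noninferior = {ok}")
```

All three reported lower bounds lie well inside the 5% margin, which is consistent with the noninferiority conclusion stated in the abstract.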
- MeSH
- Middle Aged MeSH
- Humans MeSH
- Magnetic Resonance Imaging MeSH
- Multiparametric Magnetic Resonance Imaging * MeSH
- Prostatic Neoplasms * diagnostic imaging, pathology MeSH
- Observer Variation MeSH
- Aged MeSH
- Neoplasm Grading MeSH
- Check Tag
- Middle Aged MeSH
- Humans MeSH
- Male MeSH
- Aged MeSH
- Publication Type
- Journal Article MeSH
- Multicenter Study MeSH
- Comparative Study MeSH
- Geographic Names
- Europe MeSH
The aim of this work was to compare the success rate of a texture classifier with that of examining physicians (radiologists) in diagnosing autoimmune thyroiditis from B-mode sonographic images, and to determine the inter- and intra-observer variability of the physicians. A data set of 161 examined subjects was divided into three groups according to the overall clinical examination: normal (H, healthy), border state (BS), and autoimmune thyroiditis (AT). The data set was then classified into these groups by four examining physicians and by a Bayesian classifier based on texture features. Two physicians achieved a higher success rate when evaluating subjects from the normal group (74.4% and 83.3%), and two physicians evaluated subjects with autoimmune thyroiditis better (59.0% and 77.4%). The classifier achieved a relatively high and balanced success rate for both of these groups (100.0% for normal and 87.5% for thyroiditis). The differing success of the individual physicians resulted in higher inter-observer variability, that is, low agreement among them. No significant difference was found in the intra-observer variability of the individual physicians. Given the weak agreement among examining physicians in diagnosing autoimmune thyroiditis from sonographic images and the high and balanced success rate of the classifier, a combination of automatic image classification and the physicians' clinical experience appears to be the most advantageous way to establish the final diagnosis.
The objective was to compare the success rate of a texture classifier and of human observers in diagnosing autoimmune thyroiditis from B-mode ultrasound images, and to determine inter- and intra-observer variability. A data set of 161 subjects was classified by four human observers and by a Bayes classifier based on texture features into three classes (healthy, border state, autoimmune thyroiditis). Two observers had a higher success rate when classifying the healthy class (74.4% and 83.3%); the other two observers classified cases with autoimmune thyroiditis better (59.0% and 77.4%). The classifier gave a relatively high and balanced success rate for both classes (100.0% for healthy and 87.5% for thyroiditis). The observers' differing success rates resulted in high inter-observer variability, showing only fair agreement among the human observers. There was no significant difference among human observers in intra-observer variability. Given the fair agreement among observers in diagnosing autoimmune thyroiditis from ultrasound images and the good results of the classifier, the best way to establish a diagnosis is computer-aided diagnosis combined with the observers' clinical experience.
- Keywords
- sonographic image, B-mode sonography, texture analysis, computer-aided diagnosis, inter-observer variability, kappa coefficient, weighted kappa coefficient
- MeSH
- Thyroiditis, Autoimmune diagnosis, ultrasonography MeSH
- Financing, Organized MeSH
- Image Interpretation, Computer-Assisted instrumentation, utilization MeSH
- Data Interpretation, Statistical MeSH
- Humans MeSH
- Observer Variation MeSH
- Reproducibility of Results MeSH
- Sensitivity and Specificity MeSH
- Thyroid Gland MeSH
- Ultrasonography methods, statistics & numerical data MeSH
- Check Tag
- Humans MeSH
Aim. To determine the inter-observer reproducibility of 15 tests used for predicting difficult tracheal intubation (DI).
Material and methods. Following local ethics committee approval and informed consent, 101 volunteers were examined by two assessors using 15 tests for predicting DI. The two assessors, who were blinded to each other's results, examined each volunteer independently. Cohen's kappa (κ) or the first-order agreement coefficient (AC1) was used to measure agreement between assessor ratings on a qualitative scale. Agreement between two quantitative outcomes was described using the intraclass correlation coefficient (ICC) and Pearson's (PCC) or Spearman's (SCC) correlation coefficients. The following interpretation of the coefficients was used: poor (< 0.20), fair (0.21–0.40), satisfactory (0.41–0.60), good (0.61–0.80), and excellent (0.81–1.00).
Results. Respective coefficients of inter-rater agreement and correlation coefficients were determined for the following parameters: pathologies associated with DI (κ=0.662, AC1=0.990), clinical impression (κ=-0.013, AC1=0.969), modified Mallampati test (κ=0.503, AC1=0.861), upper lip bite test (κ=0.370, AC1=0.897), temporo-mandibular joint movement (κ=0.088, AC1=0.797), max. anteroflexion of C-spine (ICC=0.136, SCC=0.391), max. retroflexion of C-spine (ICC=0.020, SCC=0.284), mandibular length (ICC=0.301, SCC=0.553), neck circumference (ICC=0.832, SCC=0.928), hyo-mental distance (ICC=0.378, SCC=0.472), thyro-mental distance (ICC=-0.002, PCC=0.265), sternomental distance (ICC=0.674, PCC=0.815), and finally, inter-incisor gap (ICC=0.695, PCC=0.785). Two tests (positive history of DI and retrogenia) were excluded from the calculation because no positive cases were found.
Conclusion. The best inter-rater agreement was found for the assessment of neck circumference, while the largest discrepancies between raters were in goniometrically measured mobility of the C-spine. Many of the pre-operative airway tests had only fair inter-observer reproducibility. This may be one reason why models for predicting difficult intubation are not universally reliable.
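The gap between κ and AC1 for a finding that is almost always rated negative (for example, κ=-0.013 versus AC1=0.969 for clinical impression) follows from how each coefficient estimates chance agreement. The Python sketch below computes both from a two-rater contingency table using the standard definitions of Cohen's kappa and Gwet's AC1. The function name and the 2×2 counts are hypothetical: the abstract does not report the underlying table, and the counts were chosen only because they are consistent with the reported pair of values.

```python
def cohen_kappa_and_ac1(table):
    """Return (Cohen's kappa, Gwet's AC1) for a square two-rater table.

    table[i][j] = number of subjects placed in category i by rater A
    and in category j by rater B.
    """
    q = len(table)
    n = sum(sum(row) for row in table)
    row_p = [sum(table[i]) / n for i in range(q)]                        # rater A marginals
    col_p = [sum(table[i][j] for i in range(q)) / n for j in range(q)]   # rater B marginals
    p_obs = sum(table[k][k] for k in range(q)) / n                       # observed agreement

    # Cohen: chance agreement from the product of the two raters' marginals.
    pe_kappa = sum(row_p[k] * col_p[k] for k in range(q))
    kappa = (p_obs - pe_kappa) / (1 - pe_kappa)

    # Gwet: chance agreement from the average marginal of each category.
    pi = [(row_p[k] + col_p[k]) / 2 for k in range(q)]
    pe_ac1 = sum(p * (1 - p) for p in pi) / (q - 1)
    ac1 = (p_obs - pe_ac1) / (1 - pe_ac1)
    return kappa, ac1


# Hypothetical counts for 101 volunteers (rows: rater A, columns: rater B;
# category 0 = negative, category 1 = positive). Not taken from the study.
table = [[98, 2],
         [1, 0]]
kappa, ac1 = cohen_kappa_and_ac1(table)
print(round(kappa, 3), round(ac1, 3))  # -0.013 0.969
```

With 98 of 101 ratings in agreement, kappa is still near zero because the skewed marginals make its chance-agreement term almost as large as the observed agreement, whereas the AC1 chance term stays small.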
- MeSH
- Anthropometry MeSH
- Human Experimentation MeSH
- Financing, Organized MeSH
- Intubation, Intratracheal methods, standards, adverse effects MeSH
- Humans MeSH
- Observer Variation MeSH
- Likelihood Functions MeSH
- Preoperative Care methods MeSH
- Reproducibility of Results MeSH
- Risk Factors MeSH
- Sex Factors MeSH
- Check Tag
- Humans MeSH
Excessive analysis time may impede microcirculatory studies that involve large amounts of video data. Engaging more personnel in the analyses seems a rational approach in that scenario and could shorten the interval between capturing images and obtaining results. Our hypothesis was that novice users would be able to determine standard microcirculatory parameters using semi-automated software with an acceptable degree of variability after participating in a standardized interactive training session. Fourteen volunteers were included in the study. After the training, all volunteers independently analyzed the same sample video. The kappa statistic was calculated for the primary outcome parameter, the microvascular flow index (MFI) within small and large vessels, and indicated a fair level of agreement among the novice users' results. A standardized interactive tutorial can be useful for teaching microcirculatory analysis to previously untrained subjects.
- MeSH
- Video Recording MeSH
- Automation MeSH
- Cardiology education MeSH
- Humans MeSH
- Microcirculation * MeSH
- Observer Variation MeSH
- Software MeSH
- Models, Statistical MeSH
- Mouth blood supply MeSH
- Check Tag
- Humans MeSH
- Publication Type
- Journal Article MeSH
- Multicenter Study MeSH
- Research Support, Non-U.S. Gov't MeSH
- Geographic Names
- Europe MeSH
- Canada MeSH
The European LeukemiaNet MDS (EUMDS) registry collects data on myelodysplastic syndrome (MDS) patients in the IPSS low or intermediate-1 category, newly diagnosed by local cytologists. The diagnosis of MDS can be challenging, and some data report inter-observer variability with regard to the assessment of the MDS subtype. In order to ensure that correct diagnoses were made by the participating centres, blood and bone marrow slides of 10% of the first 1000 patients were reviewed by an 11-person panel of cytomorphologists. All slides were rated by at least 3 panel members (median 8 panel members; range 3-9). Marrow slides from 98 out of 105 patients were of good quality and could therefore be rated properly according to the WHO 2001 classification, including assessment of dysplastic lineages. The agreement between the reviewers on whether the diagnosis was MDS or non-MDS was strong, with an intra-class correlation coefficient (ICC) of 0.85. Six cases were found not to fit the entry criteria of the registry, because they were diagnosed uniformly as CMML or AML by the panel members. The agreement on the WHO 2001 classification was strong as well (ICC = 0.83). The concordance of the assessment of dysplastic lineages was substantial for megakaryopoiesis and myelopoiesis and moderate for erythropoiesis. Our data show that, in general, the inter-observer agreement was high and a very low percentage of misdiagnosed cases had been entered into the EUMDS registry. Further studies including histomorphology are warranted.
- MeSH
- Cytodiagnosis methods, standards MeSH
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Young Adult MeSH
- Myelodysplastic Syndromes blood, diagnosis MeSH
- Observer Variation * MeSH
- Registries statistics & numerical data MeSH
- Reproducibility of Results MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Sensitivity and Specificity MeSH
- Bone Marrow Examination methods, standards MeSH
- Check Tag
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Young Adult MeSH
- Male MeSH
- Aged, 80 and over MeSH
- Aged MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
Stromal tumour infiltrating lymphocytes (sTILs) are a strong prognostic marker in triple negative breast cancer (TNBC). Consistency in scoring sTILs is good and was excellent when an internet-based scoring aid developed by the TIL-WG was used to score cases in a reproducibility study. This study aimed to evaluate the reproducibility of sTILs assessment using this scoring aid in cases from routine practice and to explore the potential of the tool to overcome variability in scoring. Twenty-three breast pathologists scored sTILs in digitized slides of 49 TNBC biopsies using the scoring aid. Subsequently, fields of view (FOV) from each case were selected by one pathologist and scored by the group using the tool. Inter-observer agreement was good for absolute sTILs (ICC 0.634, 95% CI 0.539-0.735, p < 0.001) but was poor to fair using binary cutpoints. sTILs heterogeneity was the main contributor to disagreement. When pathologists scored the same FOV from each case, inter-observer agreement was excellent for absolute sTILs (ICC 0.798, 95% CI 0.727-0.864, p < 0.001) and good for the 20% (ICC 0.657, 95% CI 0.561-0.756, p < 0.001) and 40% (ICC 0.644, 95% CI 0.546-0.745, p < 0.001) cutpoints. However, there was a wide range of scores for many cases. Reproducibility of sTILs scoring is good when the scoring aid is used. Heterogeneity is the main contributor to variance and will need to be overcome for analytic validity to be achieved.
- Publication Type
- Journal Article MeSH
This paper briefly summarizes the results of the studies of the EFLM working group on biological variation (EuBIVAS). Methods for calculating intra- and inter-individual variation are described. The paper also covers applications of biological variation for setting analytical quality criteria (calculation of APS values and measurement uncertainties) and for determining the significance of the change between two consecutive measurements (RCV) as a criterion of the ability to monitor the course of therapy in patients. The EFLM biological variation database is described, together with trends in using clinical laboratory measurement results to personalize and individualize their interpretation (individual reference intervals, machine learning approaches).
This publication briefly summarizes the results of the EFLM working group for biological variation studies (EuBIVAS). It describes the calculation of within- and between-subject biological variation and their application to the calculation of analytical quality criteria (APS values) and of reference change values (RCV), which indicate whether a change between measurements is significant when monitoring the patient's state. We describe the EFLM biological variation database. We also show the necessity of biological variation for personalized medicine and an individualized approach to laboratory results (individual reference intervals, machine learning).
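The two applications named in this abstract can be illustrated with a short Python sketch. It assumes the widely used textbook formulas: RCV = √2 · Z · √(CV_A² + CV_I²), and the "desirable" performance specifications CV_A ≤ 0.5 · CV_I for imprecision and bias ≤ 0.25 · √(CV_I² + CV_G²). The abstract does not state which formulas the paper uses, so these, the function names, and the example CV values are all assumptions made for illustration.

```python
import math


def reference_change_value(cv_a: float, cv_i: float, z: float = 1.96) -> float:
    """Two-sided RCV (%) from analytical CV and within-subject CV (both in %)."""
    return math.sqrt(2) * z * math.sqrt(cv_a ** 2 + cv_i ** 2)


def desirable_aps(cv_i: float, cv_g: float) -> tuple[float, float]:
    """Desirable limits (%) for imprecision and bias.

    cv_i = within-subject CV, cv_g = between-subject CV (both in %).
    """
    max_cv_a = 0.5 * cv_i
    max_bias = 0.25 * math.sqrt(cv_i ** 2 + cv_g ** 2)
    return max_cv_a, max_bias


# Hypothetical analyte: CV_A = 3 %, CV_I = 4 %, CV_G = 3 % (illustrative only).
rcv = reference_change_value(cv_a=3.0, cv_i=4.0)
max_cv_a, max_bias = desirable_aps(cv_i=4.0, cv_g=3.0)
print(f"RCV = {rcv:.1f} %")  # RCV = 13.9 %
print(f"desirable CV_A <= {max_cv_a:.2f} %, bias <= {max_bias:.2f} %")
```

Under these assumptions, two consecutive results for this hypothetical analyte would need to differ by more than about 13.9% before the change is considered significant at the 95% level.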
Although three-dimensional (3D) coordinates for human intra-skeletal landmarks are among the most important data that anthropologists have to record in the field, little is known about the reliability of various measuring techniques. We compared the reliability of three techniques used for 3D measurement of human remains in the field: grid technique (GT), total station (TS), and MicroScribe (MS). We measured 365 field osteometric points on 12 skeletal sequences excavated at the Late Medieval/Early Modern churchyard in Všeruby, Czech Republic. We compared intra-observer, inter-observer, and inter-technique variation using mean difference (MD), mean absolute difference (MAD), standard deviation of difference (SDD), and limits of agreement (LA). All three measuring techniques can be used when accepted error ranges can be measured in centimeters. When a range of accepted error measurable in millimeters is needed, MS offers the best solution. TS can achieve the same reliability as MS, but only when the laser beam is accurately pointed into the center of the prism. When the prism is not accurately oriented, TS produces unreliable data. TS is more sensitive to initialization than MS. GT measures the human skeleton with acceptable reliability for general purposes but insufficiently when highly accurate skeletal data are needed. We observed high inter-technique variation, indicating that just one technique should be used when spatial data from one individual are recorded. Subadults are measured with slightly lower error than adults. The effect of maximum excavated skeletal length has little practical significance in field recording. When MS is not available, we offer practical suggestions that can help to increase reliability when measuring the human skeleton in the field.
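The four agreement statistics named in this abstract can be computed from paired measurements as in the sketch below. It assumes the conventional Bland-Altman definition of the limits of agreement (MD ± 1.96 · SDD), which the abstract does not spell out. The function name and the paired coordinate values are hypothetical.

```python
import statistics


def agreement_stats(first, second, z=1.96):
    """Return MD, MAD, SDD and limits of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    md = statistics.mean(diffs)                    # mean difference (systematic bias)
    mad = statistics.mean(abs(d) for d in diffs)   # mean absolute difference
    sdd = statistics.stdev(diffs)                  # sample SD of the differences
    la = (md - z * sdd, md + z * sdd)              # assumed 95% limits of agreement
    return md, mad, sdd, la


# Hypothetical repeated measurements of one coordinate (cm) at four landmarks.
session_1 = [10.0, 12.0, 11.0, 13.0]
session_2 = [9.0, 12.5, 11.5, 12.0]

md, mad, sdd, la = agreement_stats(session_1, session_2)
print(f"MD={md:.2f}  MAD={mad:.2f}  SDD={sdd:.3f}  LA=({la[0]:.2f}, {la[1]:.2f})")
```

MD captures systematic offset between two sessions, observers, or techniques, while MAD and SDD describe the size and spread of the disagreement. A small MD with a large SDD would still mean wide limits of agreement.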
This study was conducted to determine the incidence of grey-white matter abnormalities (GWMAs) on magnetic resonance images (MRIs) in patients with hippocampal sclerosis (HS), to assess the inter-observer reliability of this finding, and to establish a possible relationship between GWMA and histopathological findings in the anterior part of the temporal lobe, as well as its relation to clinical variables. We established a group of 55 patients with histologically proven HS. Three observers independently reviewed the MRIs to assess whether GWMA was present. Substantial independent inter-observer agreement was reached for 44 of the 55 patients (80%) (Fleiss' kappa 0.732; p<0.0001). GWMAs were present in 38% of patients (HS+GWMA). Focal cortical dysplasia (FCD) of type IIIa (ILAE classification) was present in 31% of patients. FCD type IIIa was present in 52.4% of patients with HS+GWMA and in 17.6% of those without GWMA (HS-GWMA) (p=0.007). We did not find any statistically significant differences in the postoperative outcomes between HS+GWMA and HS-GWMA. We did not find any statistically significant differences in the presence or absence of GWMA and FCD of the temporal pole in relation to the onset of epilepsy, the duration of epilepsy, or the presence of potential epileptogenic insults. GWMA in the anterior part of the temporal lobe in patients with HS is a reliable assessment sign for observers who are experienced in evaluating the MRIs of epilepsy patients. The presence of GWMA is significantly associated with the presence of FCD type IIIa in these patients. The presence or absence of GWMA and FCD type IIIa does not influence the postoperative outcome of HS patients.
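Fleiss' kappa, the agreement measure reported for the three observers, can be computed from per-subject category counts as sketched below. The function follows the standard definition of the statistic. Its name and the small example table are hypothetical and do not reproduce the study data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned subject i to category j (same number of raters per subject)."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Per-subject agreement: proportion of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall share of ratings in each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical example: 4 patients, 3 observers, categories (absent, present).
ratings = [
    [3, 0],  # all three observers: absent
    [0, 3],  # all three observers: present
    [2, 1],  # split decision
    [3, 0],
]
kappa_fleiss = fleiss_kappa(ratings)
print(round(kappa_fleiss, 3))  # 0.625
```

Unlike the raw percentage of patients with unanimous ratings (80% in the abstract), the kappa value discounts the agreement that would be expected by chance given how often each category is used.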
- MeSH
- Anatomic Landmarks pathology MeSH
- Child MeSH
- Adult MeSH
- Epilepsy, Temporal Lobe pathology, surgery MeSH
- Hippocampus pathology MeSH
- Laboratory Medicine methods, standards, statistics & numerical data MeSH
- Middle Aged MeSH
- Humans MeSH
- Magnetic Resonance Imaging methods, standards, statistics & numerical data MeSH
- Malformations of Cortical Development pathology, surgery MeSH
- Adolescent MeSH
- Young Adult MeSH
- Observer Variation MeSH
- Retrospective Studies MeSH
- Sclerosis pathology MeSH
- Temporal Lobe pathology MeSH
- Check Tag
- Child MeSH
- Adult MeSH
- Middle Aged MeSH
- Humans MeSH
- Adolescent MeSH
- Young Adult MeSH
- Male MeSH
- Female MeSH
- Publication Type
- Journal Article MeSH
- Research Support, Non-U.S. Gov't MeSH
- Validation Study MeSH