Early prediction for fatty liver disease with eigenvector-based feature selections for model performance enhancement
Ji-Han Liu, Kuo-Chin Huang, Feipei Lai
Status: minimal; Language: English; Country: Czech Republic
Objectives: This study aims to rapidly optimize the input feature subset so that it satisfies the expert's point of view, and to enhance the performance of an early prediction model for fatty liver disease (FLD). Methods: We explore a large-scale, high-dimensional dataset from a Health Screening Center in northern Taipei, Taiwan. The dataset covers 12,707 male and 10,601 female patients, processed from around 500,000 records from 2009 to 2016. We propose three eigenvector-based feature selection methods that use the Intersection over Union (IoU) and the Coverage to automatically determine the sub-optimal feature subset with the highest IoU and Coverage, use various long short-term memory (LSTM) related classifiers for FLD prediction, and evaluate model performance by test accuracy and the Area Under the Receiver Operating Characteristic Curve (AUROC). Results: Our eigenvector-based feature selection method EFS-TW has the highest IoU and Coverage and the shortest total computing time. For comparison, the highest IoU, the Coverage, and the computing time for female patients are 30.56%, 45.83%, and 260 seconds, whereas those of the benchmark, sequential forward selection (SFS), are 9.09%, 16.67%, and 380,350 seconds. The AUROCs with LSTM, biLSTM, Gated Recurrent Unit (GRU), Stack-LSTM, and Stack-biLSTM are 0.85, 0.86, 0.86, 0.86, and 0.87, respectively, for male patients, and all 0.9 for female patients. Conclusion: Our method explores a large-scale, high-dimensional FLD dataset, implements three efficient and automatic eigenvector-based feature selection methods, and efficiently develops a model for early prediction of FLD.
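The abstract scores a selected feature subset against an expert's feature set using IoU and Coverage but does not define them. The following is a minimal sketch under assumed definitions: IoU as the size of the intersection divided by the size of the union, and Coverage as the share of the expert's features that the selection recovers. The function names and feature names are hypothetical and are not taken from the study.

```python
def iou(selected, reference):
    """Intersection over Union (Jaccard index) of two feature sets."""
    selected, reference = set(selected), set(reference)
    union = selected | reference
    return len(selected & reference) / len(union) if union else 0.0


def coverage(selected, reference):
    """Share of the reference features present in the selection.

    Assumed definition; the record does not state the paper's formula.
    """
    selected, reference = set(selected), set(reference)
    return len(selected & reference) / len(reference) if reference else 0.0


# Hypothetical feature names, for illustration only.
expert = {"bmi", "waist", "alt", "triglycerides"}
chosen = {"bmi", "alt", "age"}

print(f"IoU = {iou(chosen, expert):.2%}")
print(f"Coverage = {coverage(chosen, expert):.2%}")
```

Under these assumed definitions, IoU penalizes selected features that fall outside the expert's set, while Coverage only rewards recovering the expert's features, so a larger selection can raise Coverage while lowering IoU.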
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Department of Family Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan
Includes bibliographical references
- 000
- 00000naa a2200000 a 4500
- 001
- bmc21023542
- 003
- CZ-PrNML
- 005
- 20211101115736.0
- 007
- cr|cn|
- 008
- 210928s2021 xr ad fs 000 0|eng||
- 009
- eAR
- 024 7_
- $2 doi $a 10.24105/ejbi.2021.17.8.18-34
- 040 __
- $a ABA008 $d ABA008 $e AACR2 $b cze
- 041 0_
- $a eng
- 044 __
- $a xr
- 100 1_
- $a Liu, Ji-Han $u Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan
- 245 10
- $a Early prediction for fatty liver disease with eigenvector-based feature selections for model performance enhancement / $c Ji-Han Liu, Kuo-Chin Huang, Feipei Lai
- 504 __
- $a Includes bibliographical references
- 520 9_
- $a Objectives: This study aims to rapidly optimize the input feature subset so that it satisfies the expert's point of view, and to enhance the performance of an early prediction model for fatty liver disease (FLD). Methods: We explore a large-scale, high-dimensional dataset from a Health Screening Center in northern Taipei, Taiwan. The dataset covers 12,707 male and 10,601 female patients, processed from around 500,000 records from 2009 to 2016. We propose three eigenvector-based feature selection methods that use the Intersection over Union (IoU) and the Coverage to automatically determine the sub-optimal feature subset with the highest IoU and Coverage, use various long short-term memory (LSTM) related classifiers for FLD prediction, and evaluate model performance by test accuracy and the Area Under the Receiver Operating Characteristic Curve (AUROC). Results: Our eigenvector-based feature selection method EFS-TW has the highest IoU and Coverage and the shortest total computing time. For comparison, the highest IoU, the Coverage, and the computing time for female patients are 30.56%, 45.83%, and 260 seconds, whereas those of the benchmark, sequential forward selection (SFS), are 9.09%, 16.67%, and 380,350 seconds. The AUROCs with LSTM, biLSTM, Gated Recurrent Unit (GRU), Stack-LSTM, and Stack-biLSTM are 0.85, 0.86, 0.86, 0.86, and 0.87, respectively, for male patients, and all 0.9 for female patients. Conclusion: Our method explores a large-scale, high-dimensional FLD dataset, implements three efficient and automatic eigenvector-based feature selection methods, and efficiently develops a model for early prediction of FLD.
- 590 __
- $a NOT INDEXED
- 700 1_
- $a Huang, Kuo-Chin $u Department of Family Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
- 700 1_
- $a Lai, Feipei $u Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
- 773 0_
- $t European journal for biomedical informatics $x 1801-5603 $g Vol. 17, No. 8 (2021), pp. 18-34 $w MED00173462
- 856 41
- $u http://www.ejbi.org/ $y journal homepage - full text freely accessible
- 910 __
- $a ABA008 $b online $y 0 $z 0
- 990 __
- $a 20210927121633 $b ABA008
- 991 __
- $a 20211101120601 $b ABA008
- 999 __
- $a min $b bmc $g 1702515 $s 1144035
- BAS __
- $a 3 $a 4
- BMC __
- $a 2021 $b 17 $c 8 $d 18-34 $i 1801-5603 $m European Journal for Biomedical Informatics $n Eur. J. Biomed. Inform. (Praha) $x MED00173462
- LZP __
- $a NLK 2021-40/dk