Using Entropy in Web Usage Data Preprocessing
Status: PubMed-not-MEDLINE; Language: English; Country: Switzerland; Medium: electronic
Document type: journal articles
Grant support
APVV-14-0336
Slovak Research and Development Agency
VEGA 1/0776/18
Scientific Grant Agency of the Ministry of Education of the Slovak Republic (ME SR) and of Slovak Academy of Sciences (SAS)
PubMed
33265164
PubMed Central
PMC7512266
DOI
10.3390/e20010067
PII: e20010067
- Keywords
- Reference Length, data preprocessing, information entropy, session identification, web usage mining
- Publication type
- journal articles (MeSH)
This paper examines the use of entropy in web usage mining. Entropy offers an alternative way to determine the ratio of auxiliary pages for session identification with the Reference Length method. The experiment was conducted on two different web portals. The first log file came from a course in a virtual learning environment web portal. The second log file came from a web portal with anonymous access. A comparison of the entropy-based estimate and the sitemap-based estimate of the ratio of auxiliary pages showed that, in the case of sitemap abundance, entropy could be a full-fledged substitute for the sitemap estimate of the ratio of auxiliary pages.
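To make the two quantities named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the Shannon entropy of a page-access distribution and the cut-off time of the Reference Length method as that method is commonly described, that is, assuming exponentially distributed time on auxiliary pages with rate equal to the reciprocal of the mean reference length. The function names, the toy page list, and the example values are hypothetical, and the abstract does not state how the entropy value is mapped to the ratio of auxiliary pages, so no such mapping is attempted here.

```python
import math
from collections import Counter


def shannon_entropy(pages):
    """Shannon entropy (in bits) of the empirical page-access distribution."""
    counts = Counter(pages)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(pages):
    """Entropy divided by its maximum, log2(number of distinct pages)."""
    distinct = len(set(pages))
    if distinct <= 1:
        return 0.0  # a single page carries no uncertainty
    return shannon_entropy(pages) / math.log2(distinct)


def reference_length_cutoff(ratio_auxiliary, mean_reference_length):
    """Cut-off time separating auxiliary from content pages.

    Assumption: time spent on auxiliary pages is exponentially distributed
    with rate 1 / mean_reference_length, so the cut-off is the quantile
    -ln(1 - ratio) / rate.
    """
    if not 0.0 < ratio_auxiliary < 1.0:
        raise ValueError("ratio_auxiliary must lie strictly between 0 and 1")
    if mean_reference_length <= 0:
        raise ValueError("mean_reference_length must be positive")
    rate = 1.0 / mean_reference_length
    return -math.log(1.0 - ratio_auxiliary) / rate


# Hypothetical toy data: page identifiers taken from a cleaned log file.
visited = ["index", "course", "index", "course"]
entropy_bits = shannon_entropy(visited)
cutoff_seconds = reference_length_cutoff(0.5, 60.0)
```

Under these assumptions, a page view longer than `cutoff_seconds` would be treated as a content page, which ends a session in Reference Length session identification, while shorter views would be treated as auxiliary (navigational) pages.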