Using Entropy in Web Usage Data Preprocessing

2018 Jan 22; 20(1). [Epub] 20180122

Status: PubMed-not-MEDLINE; Language: English; Country: Switzerland; Medium: electronic

Document type: journal article

Persistent link: https://www.medvik.cz/link/pmid33265164

Grant support
APVV-14-0336 Slovak Research and Development Agency
VEGA 1/0776/18 Scientific Grant Agency of the Ministry of Education of the Slovak Republic (ME SR) and of Slovak Academy of Sciences (SAS)

The paper examines the use of entropy in web usage mining. Entropy offers an alternative way of determining the ratio of auxiliary pages needed for session identification with the Reference Length method. The experiment was conducted on two different web portals. The first log file came from a course in a virtual learning environment web portal; the second came from a web portal with anonymous access. A comparison of the entropy-based estimate of the ratio of auxiliary pages with the sitemap-based estimate showed that, in the case of sitemap abundance, entropy could be a full-valued substitute for the estimate of the ratio of auxiliary pages.
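A minimal Python sketch may help make the abstract concrete. It assumes the usual Reference Length formulation, in which the cut-off time is C = -ln(1 - p) / lambda, with p the ratio of auxiliary pages and lambda the reciprocal of the mean observed reference length (time spent on a page). The function names are hypothetical, and using the normalized Shannon entropy of the page-access distribution as the estimate of p is an illustrative assumption, not necessarily the exact estimator used in the paper.

```python
import math
from collections import Counter


def normalized_entropy(page_hits):
    """Shannon entropy of the page-access distribution, scaled to [0, 1].

    ASSUMPTION: treating this value as the ratio of auxiliary pages is
    an illustration only; the record does not state the exact estimator.
    """
    counts = Counter(page_hits)
    total = sum(counts.values())
    if len(counts) < 2:
        return 0.0  # a single distinct page carries no uncertainty
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(len(counts))  # divide by the maximum entropy


def cutoff_time(reference_lengths, aux_ratio):
    """Reference Length cut-off time: C = -ln(1 - p) / lambda."""
    if not 0.0 < aux_ratio < 1.0:
        raise ValueError("aux_ratio must lie strictly between 0 and 1")
    mean_rl = sum(reference_lengths) / len(reference_lengths)
    lam = 1.0 / mean_rl  # rate of the assumed exponential distribution
    return -math.log(1.0 - aux_ratio) / lam


# Hypothetical data: requested pages and seconds spent on each page.
hits = ["index", "course", "index", "quiz", "course", "index"]
times = [5, 40, 8, 120, 35, 6]
p = normalized_entropy(hits)
print(f"estimated ratio p = {p:.3f}, cut-off C = {cutoff_time(times, p):.1f} s")
```

Under this formulation, pages viewed for less than C seconds are treated as auxiliary and pages viewed longer as content pages, so any estimate of p (from a sitemap or from entropy) shifts where sessions are split.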


