Data Science and Machine Learning Lab

Muhammad Usman Shahid Khan
[email protected]
https://sites.google.com/site/ushahidkhan/home

 

Current Research Projects: 2
Current Funding: 2 Million PKR
Current PhD Students: 2
Current MS Students: 4
Current Research Assistants: 1

The DSML lab aims to bring the benefits of data science and machine learning to ordinary people. To this end, we conduct research that spans natural language processing (NLP), recommender systems, classification techniques, big data, machine learning, and security. Although these topics may seem disjointed, they are interconnected in bringing the benefits of computer science to ordinary people. All our work involves design, experimentation, quantitative and qualitative analysis, modeling, and simulation to answer the question of how the field can help people.

The PI’s previous work in these domains has appeared in 60 research publications, including many reputable journals such as IEEE Transactions on Dependable and Secure Computing, IEEE Systems Journal, IEEE Transactions on Services Computing, IEEE Transactions on Cloud Computing, and Future Generation Computer Systems. We now describe some of our recently published and ongoing research.

Previous Works and Research Interests:

  1. Network traffic analysis

Our recent work analyzes network traffic to identify the videos being watched by users connected to the network. Our results show that an ISP can identify the titles of the videos being watched even if the traffic is encrypted by a VPN and HTTPS [KhB22] [AfB22].

  1. Social media analysis

A major portion of the PI’s research concerns the analysis of data collected from social media, such as Twitter. In [KhA21], we proposed a convolutional neural network-based framework called “HateClassify” for labelling social media content as hate speech, offensive, or non-offensive. In [KhA18], we proposed a methodology to separate spammers and unsolicited bloggers from genuine experts on Twitter. The approach employs a modified Hyperlink-Induced Topic Search (HITS) algorithm that works on tweets: it considers domain-specific keywords in the tweets together with several tweet characteristics to identify the unsolicited bloggers.
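As a rough illustration of the link-analysis idea behind this line of work, the following is a minimal sketch of the standard HITS iteration in Python. It is not the modified variant from [KhA18]; the function name `hits`, the account names, and the toy "mention" graph are hypothetical and chosen only for illustration.

```python
from math import sqrt


def hits(edges, iterations=50):
    """Standard HITS on a directed graph given as (source, target) pairs.

    Returns (hub, authority) score dictionaries. This is the textbook
    algorithm, not the modified version described in [KhA18].
    """
    nodes = {node for edge in edges for node in edge}
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the nodes pointing here.
        auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub score: sum of authority scores of the nodes pointed to.
        hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        # Normalize both score vectors to unit Euclidean length.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for node in scores:
                scores[node] /= norm
    return hub, auth


# Hypothetical graph: an edge (a, b) means account a mentions account b.
toy_edges = [
    ("alice", "expert"),
    ("bob", "expert"),
    ("carol", "expert"),
    ("bob", "spammer"),
]
hub_scores, auth_scores = hits(toy_edges)
```

In this toy graph the account mentioned by several good hubs ends up with the highest authority score, which is the property that expert-ranking approaches build on.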

  1. IoT and Sensors data analysis

Several of the PI’s works involve the analysis of data from IoT devices. In [KhA18b], we proposed a methodology that uses the energy expenditure of human activities and reduces the dimensionality of the feature space to differentiate among human activities. In [KhA18c], a convolutional neural network was used to identify human activities. The PI also co-edited the book “Big Data-Enabled Internet of Things” with Dr. Samee U. Khan and Dr. Albert Y. Zomaya. The book covers the challenges and opportunities in the field of big data-enabled IoT [KhK19].

  1. Recommender Systems

Recommender systems remain one of the PI’s main research interests. The work [KhK17] was published in IEEE Transactions on Services Computing. In that paper, the PI proposed a scalable emergency evacuation service, termed MacroServ, that recommends to evacuees the most preferred routes toward safe locations during a disaster. In [KhK14], we proposed a venue recommendation system based on ant colony optimization.

  1. Machine learning and optimization

The success of machine learning has always inspired us, and the field is among the PI’s favorite research interests. In [KhJ21], the PI proposed an optimization technique named adaptive diff-batch (Adadb) that addresses the overshooting gradient problem in Adam and the slow convergence of diffGrad, and combines the two methods with an adaptive batch size to further increase the convergence rate. Similarly, in [KhK20], the PI proposed an adaptive batch size methodology to overcome the slow convergence of the diffGrad optimizer. Moreover, many of the PI’s research works fall under applied AI and machine learning. Those works involve the classification of malware, cloud computing, and security.
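For readers unfamiliar with the optimizers mentioned here, the following is a minimal sketch of a diffGrad-style update for a single scalar parameter. It shows only the baseline idea, an Adam step scaled by a "friction" term computed from the change between consecutive gradients. It does not implement Adadb or the adaptive batch size of [KhJ21] and [KhK20]; the function name, the hyperparameter defaults, and the quadratic test objective are assumptions made for illustration.

```python
from math import exp, sqrt


def diffgrad_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999,
                      eps=1e-8, steps=500):
    """Minimize a 1-D function with a diffGrad-style update rule.

    grad: callable returning the gradient at x (hypothetical objective).
    """
    x, m, v, prev_g = x0, 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        # Adam-style exponential moving averages of g and g^2.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction for the zero-initialized averages.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Friction term: sigmoid of the absolute gradient change.
        # It approaches 0.5 when the gradient barely changes.
        friction = 1.0 / (1.0 + exp(-abs(prev_g - g)))
        x -= lr * friction * m_hat / (sqrt(v_hat) + eps)
        prev_g = g
    return x


# Hypothetical objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_min = diffgrad_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

The friction term lies between 0.5 and 1, so it only ever shrinks the Adam step, and it shrinks it most where the gradient is changing slowly.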

Ongoing and Future Works:

  1. Concept Drift Problem

Concept drift is one of the major problems in the practical deployment of machine learning algorithms in the real world. Because real-world data changes relative to the training data, what a model has learned becomes outdated and its accuracy decreases with the passage of time. Currently, in the DSML lab, we are working on developing methods for handling concept drift in network traffic data.
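To make the accuracy decay concrete, here is a minimal sketch of one simple way to flag drift: compare the error rate in a sliding window of recent predictions against a baseline measured on the first window. This is a generic illustration, not the lab's method; the function name, window size, tolerance, and the synthetic error stream are all assumptions.

```python
from collections import deque


def first_drift_index(errors, window=50, tolerance=0.15):
    """Return the first index at which the recent error rate exceeds the
    baseline error rate by more than `tolerance`, or None if it never does.

    errors: sequence of 0/1 flags (1 = the model's prediction was wrong).
    """
    # Baseline error rate measured on the first `window` predictions.
    baseline = sum(errors[:window]) / window
    recent = deque(maxlen=window)
    for i in range(window, len(errors)):
        recent.append(errors[i])
        # Only compare once the sliding window is full.
        if len(recent) == window and sum(recent) / window > baseline + tolerance:
            return i
    return None


# Synthetic stream: 20% errors for 200 predictions, then 50% errors.
stream = [0, 0, 0, 0, 1] * 40 + [1, 0] * 100
drift_at = first_drift_index(stream)
```

On this synthetic stream the detector fires some time after index 200, once enough post-change errors have entered the window. The delay is the usual trade-off of window-based detectors: a larger window gives fewer false alarms but slower detection.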

  1. Out-of-distribution Problem

Machine learning models are highly biased toward the classes they were trained on and predict a wrong class with high confidence for unseen and unknown data. This is known as the out-of-distribution (OOD) problem in the machine learning community. We are also working on developing methods to handle this problem.
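A common baseline for this problem is to reject inputs whose maximum softmax probability falls below a threshold. The sketch below shows that baseline only, not a method developed in the lab; the function names, the threshold of 0.8, and the example logits are hypothetical.

```python
from math import exp


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_reject(logits, threshold=0.8):
    """Return the predicted class index, or None when the top softmax
    probability is below `threshold` (input treated as possibly OOD)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


confident = predict_or_reject([5.0, 0.5, 0.1])   # one class clearly dominates
uncertain = predict_or_reject([1.0, 0.9, 1.1])   # scores are nearly tied
```

Because models can assign high softmax confidence even to unknown inputs, as described above, this baseline misses many OOD samples, which is part of what motivates dedicated OOD methods.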

  1. Explainable Machine Learning in Health and Bioinformatics Data

Our lab is currently working on Explainable Machine Learning techniques applied to health and bioinformatics data. The focus is on developing models that not only achieve high predictive accuracy but also provide interpretable insights into complex biological and clinical datasets. By integrating explainability, we aim to enhance trust and usability of machine learning solutions for healthcare professionals, enabling better diagnosis, personalized treatment, and data-driven decision-making.

  1. Facial Recognition-based Attendance System for Classroom Settings

Our lab is actively developing a Facial Recognition-based Attendance System tailored for classroom settings. This system leverages advanced computer vision and machine learning techniques to accurately identify students and record their attendance in real time. The project emphasizes efficiency, privacy, and scalability, ensuring a seamless integration into educational environments. By automating the attendance process, we aim to save valuable instructional time and provide educators with actionable insights into student participation.

References:

[KhB22] M. U. S. Khan, S. M. A. H. Bukhari, T. Maqsood, M. A. B. Fayyaz, D. Dancey, and R. Nawaz, “SCNN-Attack: A Side-Channel Attack to Identify YouTube Videos in a VPN and Non-VPN Network Traffic,” Electronics, vol. 11, p. 350, 2022

[AfB22] W. Afandi, S. M. A. H. Bukhari, M. U. S. Khan, T. Maqsood, S. U. Khan, “Fingerprinting Technique for YouTube Videos Identification in Network Traffic,” IEEE Access, vol. 10, pp. 76731-76741, 2022

[KhA21] M. U. S. Khan, A. Abbas, A. Rehman, and R. Nawaz, “HateClassify: A Service Framework for Hate Speech Identification on Social Media,” IEEE Internet Computing, vol. 25, no. 1, pp. 40-49, Jan.-Feb. 2021

[KhA18] M. U. S. Khan, M. Ali, A. Abbas, S. U. Khan, and A. Y. Zomaya, “Segregating spammers and unsolicited bloggers from genuine experts on Twitter,” IEEE Transactions on Dependable and Secure Computing, vol. 15, no. 4, pp. 551–560, 2018

[KhA18b] M. U. S. Khan, A. Abbas, M. Ali, M. Jawad, S. U. Khan, K. Li, and A. Y. Zomaya, “On the correlation of sensor location and human activity recognition in body area networks (BANs),” IEEE Systems Journal, vol. 12, no. 1, pp. 82–91, 2018

[KhA18c] M. U. S. Khan, A. Abbas, M. Jawad, M. Ali, and S. U. Khan, “Convolutional Neural Networks as Means to Identify Apposite Sensor Combination for Human Activity Recognition,” in 3rd IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), Washington D.C., USA, September 2018

[KhK19] M. U. S. Khan, S. U. Khan, and A. Y. Zomaya, Big Data-Enabled Internet of Things, IET Press, London, UK, 2019, XIII, 488 p., ISBN 978–1–78561–636–5.

[KhK17] M. U. S. Khan, O. Khalid, Y. Huang, F. Zhang, R. Ranjan, S. U. Khan, J. Cao, K. Li, B. Veeravalli, and A. Zomaya, “MacroServ: A Route Recommendation Service for Large-Scale Evacuations,” IEEE Transactions on Services Computing, vol. 10, no. 4, pp. 589-602, July-Aug. 2017

[KhK14] O. Khalid, M. U. S. Khan, S. U. Khan, and A. Y. Zomaya, “OmniSuggest: A Ubiquitous Cloud based Context Aware Recommendation System for Mobile Social Networks,” IEEE Transactions on Services Computing, vol. 7, no. 3, pp. 401-414, 2014

[KhJ21] M. U. S. Khan, M. Jawad, and S. U. Khan, “Adadb: Adaptive Diff-batch Optimization Technique for Gradient Descent,” IEEE Access, vol. 9, pp. 99581-99588, 2021

[KhK20] W. Khan, S. Ali, M. U. S. Khan, M. Jawad, M. Ali and R. Nawaz, “AdaDiffGrad: An Adaptive Batch Size Implementation Technique for DiffGrad Optimization Method,” 2020 14th International Conference on Innovations in Information Technology (IIT), Al Ain, pp. 209-214, 2020