A Meta Learning-Based Framework for Automated Selection and Hyperparameter Tuning for Clustering

Novel technologies in automated machine learning ease the complexity of algorithm selection and hyperparameter optimization. However, these are usually restricted to supervised learning tasks such as classification and regression, while unsupervised learning remains a largely unexplored problem. In this project, we offer a domain-agnostic solution for automating machine learning in the unsupervised setting of clustering. This is achieved through a combination of state-of-the-art processes based on meta-learning for algorithm and evaluation-criteria selection, and an evolutionary algorithm for hyperparameter tuning.
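
To make the combination concrete, here is a minimal sketch of the evolutionary half of such a pipeline: a small (mu + lambda)-style loop that tunes KMeans hyperparameters against an internal validity index (silhouette here, one of several criteria such a framework might select). The dataset, search ranges, and population settings are illustrative assumptions, not the settings used in cSmartML.

```python
import random
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

def fitness(params):
    # Score a configuration with an internal cluster validity index.
    labels = KMeans(n_clusters=params["n_clusters"], init=params["init"],
                    n_init=5, random_state=0).fit_predict(X)
    return silhouette_score(X, labels)

def random_config():
    return {"n_clusters": random.randint(2, 10),
            "init": random.choice(["k-means++", "random"])}

def mutate(cfg):
    # Perturb one hyperparameter at a time.
    child = dict(cfg)
    if random.random() < 0.5:
        child["n_clusters"] = max(2, child["n_clusters"] + random.choice([-1, 1]))
    else:
        child["init"] = random.choice(["k-means++", "random"])
    return child

# Minimal evolutionary loop: keep the best half, mutate survivors to refill.
population = [random_config() for _ in range(8)]
for generation in range(10):
    population.sort(key=fitness, reverse=True)
    survivors = population[:4]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(4)]

best = max(population, key=fitness)
print("best configuration:", best, "silhouette:", round(fitness(best), 3))
```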

Depending on AutoML frameworks as black boxes can leave machine learning practitioners without insight into the inner workings of the AutoML process and hence undermine their trust in the models produced. In addition, excluding humans from the loop creates several limitations. For example, most current AutoML frameworks ignore user preferences for defining or controlling the search space, which can consequently impact the performance of the models produced and the acceptance of these models by end-users. The research area of transparency and controllability of AutoML has attracted much interest lately, both in academia and industry. However, existing tools are usually restricted to supervised learning tasks such as classification and regression, while unsupervised learning, particularly clustering, remains a largely unexplored problem. Motivated by these shortcomings, this project focuses on an interactive visualization tool that enables users to refine the search space of AutoML and analyze the results. The project also applies meta-learning techniques to recommend a time budget that is likely adequate for a new dataset to obtain a well-performing pipeline.
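
The visualization tool itself is interactive and not reproduced here, but the budget-recommendation idea can be sketched as a nearest-neighbour lookup over dataset meta-features: find past datasets that resemble the new one and reuse the budgets that sufficed for them. The specific meta-features, the stored meta-dataset, and the neighbour count below are illustrative assumptions, not the project's actual design.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def meta_features(X):
    # Cheap meta-features: (log) size, dimensionality, basic statistics.
    return np.array([np.log(X.shape[0]), X.shape[1], X.std(),
                     np.abs(np.corrcoef(X, rowvar=False)).mean()])

# Hypothetical meta-dataset: meta-features of past datasets and the budget
# (in minutes) that sufficed to reach a well-performing pipeline on each.
past_meta = np.array([[6.2, 10, 1.0, 0.3],
                      [9.1, 50, 2.5, 0.6],
                      [7.4, 20, 0.8, 0.4]])
past_budget = np.array([5, 60, 15])

def recommend_budget(X, k=2):
    # Average the budgets of the k most similar past datasets.
    nn = NearestNeighbors(n_neighbors=k).fit(past_meta)
    _, idx = nn.kneighbors(meta_features(X).reshape(1, -1))
    return past_budget[idx[0]].mean()

X_new = np.random.RandomState(0).randn(800, 15)
print("recommended budget (minutes):", recommend_budget(X_new))
```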

Project Publications:

  • R. El Shawi, H. Lekunze, S. Sakr. cSmartML: A Meta Learning-Based Framework for Automated Selection and Hyperparameter Tuning for Clustering. IEEE BigData 2021. link
  • R. El Shawi, S. Sakr. cSmartML-Glassbox: Increasing Transparency and Controllability in Automated Clustering. ICDMW 2022. [To appear]
  • R. El Shawi, S. Sakr. TPE-AutoClust: A Tree-based Pipeline Ensemble Framework for Automated Clustering. ICDMW 2022. [To appear]

Automated Selection and Hyperparameter Optimization for Supervised Tasks

A major obstacle to developing machine learning models on big data is the challenging and time-consuming process of identifying and training an adequate predictive model. Machine learning model building is therefore a highly iterative, exploratory process in which practitioners work hard to find the model or algorithm that meets their performance requirements. In practice, there is no one-model-fits-all solution: no single model or algorithm can handle all varieties of datasets and the changes in data that may occur over time. In addition, all machine learning algorithms require user-defined inputs, called hyperparameters, that must be tuned to balance accuracy and generalizability; this process is referred to as hyperparameter optimization. This iterative and exploratory building process becomes prohibitively expensive on big datasets. In this project, we addressed several aspects of the hyperparameter optimization problem, including scalability and controllability.
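
As a concrete illustration of the accuracy-versus-generalizability trade-off, the sketch below runs a randomized hyperparameter search with cross-validation using scikit-learn. The model, search space, and budget are illustrative assumptions rather than choices made in the project.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Search over hyperparameters that trade off fit quality against overfitting:
# deeper trees fit the training data better but may generalize worse.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": randint(50, 300),
                         "max_depth": randint(2, 15),
                         "min_samples_leaf": randint(1, 10)},
    n_iter=20, cv=5, random_state=0)  # cross-validation guards generalizability
search.fit(X, y)
print("best hyperparameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
```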

Although the research area of automated feature engineering has attracted much interest lately, both in academia and industry, the scalability and efficiency of existing systems and tools remain practically unsatisfactory. This project focuses on scalable and interpretable automated feature engineering that optimizes the quality of the input features to maximize predictive performance according to a user-defined metric.
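
At its simplest, this kind of feature engineering can be framed as a greedy search: propose candidate features and keep one only if it improves the user-defined metric under cross-validation. The sketch below does exactly that with pairwise products; the dataset, estimator, and candidate generator are illustrative assumptions and far simpler than what a system like BigFeat would do.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

def score(features):
    # User-defined metric: cross-validated R^2 of a simple linear model.
    return cross_val_score(Ridge(), features, y, cv=5).mean()

# Greedy expansion: propose pairwise-product features and keep a candidate
# only if it improves the metric over the current feature set.
best_score, kept = score(X), X
for i in range(X.shape[1]):
    for j in range(i + 1, X.shape[1]):
        candidate = np.column_stack([kept, X[:, i] * X[:, j]])
        s = score(candidate)
        if s > best_score:
            best_score, kept = s, candidate

print(f"features: {X.shape[1]} -> {kept.shape[1]}, CV R^2: {best_score:.3f}")
```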

Project Publications:

  • S. Amashukeli, R. Elshawi, S. Sakr. iSmartML: An Interactive and User-Guided Framework for Automated Machine Learning. In HILDA 2020: Workshop on Human-In-the-Loop Data Analytics. link
  • A. Abd Elrahman, M. El Helw, R. Elshawi, S. Sakr. D-SmartML: A Distributed Automated Machine Learning Framework. In 2020 IEEE 40th International Conference on Distributed Computing Systems (ICDCS) 2020 Nov 1 (pp. 1215-1218). link
  • S. Dyrmishi, R. Elshawi, S. Sakr. A decision support framework for AutoML systems: a meta-learning approach. In 2019 International Conference on Data Mining Workshops (ICDMW) 2019 Nov 8 (pp. 97-106). IEEE. link
  • R. Elshawi, S. Sakr. Automated Machine Learning: Techniques and Frameworks. In European Big Data Management and Analytics Summer School 2019 Jun 30 (pp. 40-69). Springer, Cham. link
  • R. Elshawi, M. Maher, S. Sakr. Automated machine learning: State-of-the-art and open challenges. arXiv preprint arXiv:1906.02287. 2019 Jun 5. link
  • H. Eldeeb, S. Amashukeli, R. El Shawi. BigFeat: Scalable and Interpretable Automated Feature Engineering Framework. IEEE BigData 2022. [To appear]

Machine Learning Interpretability in Healthcare

Although complex machine learning models (e.g., random forests, neural networks) commonly outperform traditional, simple interpretable models (e.g., linear regression, decision trees), clinicians in the healthcare domain find it hard to understand and trust these complex models due to the lack of intuition behind, and explanation of, their predictions. With the General Data Protection Regulation (GDPR), plausibility and verifiability of the predictions made by machine learning models have become essential. Hence, interpretability techniques for machine learning models are a focus area of this project. In general, the main aim of these techniques is to shed light on the prediction process of machine learning models and to explain how a given prediction was generated. The project focuses on the following:

  • Proposing fundamental quantitative measures for assessing the quality of interpretability techniques, together with a comprehensive experimental evaluation of state-of-the-art local model-agnostic interpretability techniques.
  • Proposing a novel local model-agnostic explanation framework for learning a set of high-level, transparent concept definitions in high-dimensional tabular data, using clinician-labeled concepts rather than raw features. The framework explains the prediction for an instance using concepts that align with the clinician's knowledge of what a concept means, through an interpretable model built on the concepts the black-box model deems important for that prediction; a minimal sketch of this idea follows the list.
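
To illustrate the second point, the sketch below trains a stand-in black-box model on raw tabular features, derives a few binary concepts playing the role of clinician-labeled judgements, and fits a shallow decision tree that mimics the black box in concept terms. The synthetic data, the concept definitions, and the surrogate choice are all illustrative assumptions, not the framework from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.RandomState(0)
X = rng.randn(1000, 30)                     # raw tabular features
y = (X[:, 0] + X[:, 5] > 0).astype(int)     # synthetic outcome

black_box = RandomForestClassifier(random_state=0).fit(X, y)

# Hypothetical clinician-defined concepts: each is a binary judgement derived
# from raw features (in the real setting these would be clinician-labeled).
concepts = np.column_stack([
    X[:, 0] > 0,     # e.g. "elevated blood pressure"
    X[:, 5] > 0,     # e.g. "abnormal ECG"
    X[:, 10] > 1,    # e.g. "high BMI"
]).astype(int)

# Interpretable surrogate: predict the black box's output from concepts, so
# each explanation is phrased in concept terms rather than raw features.
surrogate = DecisionTreeClassifier(max_depth=2, random_state=0)
surrogate.fit(concepts, black_box.predict(X))
print(export_text(surrogate,
                  feature_names=["high_bp", "abnormal_ecg", "high_bmi"]))
```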

Project Publications:

  • R. El Shawi, M. Al-Mallah. Interpretable Local Concept-based Explanation with Human Feedback to Predict All-cause Mortality. Journal of Artificial Intelligence Research. 2022;75:833-855. link
  • R. El Shawi, K. Kilanava, S. Sakr. An interpretable semi-supervised framework for patch-based classification of breast cancer. Scientific Reports. 2022;12(1):1-15. link
  • R. ElShawi, Y. Sherif, M. Al-Mallah, S. Sakr. Interpretability in healthcare: A comparative study of local machine learning interpretability techniques. Computational Intelligence. 2021 Nov;37(4):1633-50. link
  • R. Elshawi, M.H. Al-Mallah, S. Sakr. On the interpretability of machine learning-based model for predicting hypertension. BMC Medical Informatics and Decision Making. 2019 Dec;19(1):1-32. link
  • R. ElShawi, Y. Sherif, M. Al-Mallah, S. Sakr. ILIME: Local and Global Interpretable Model-Agnostic Explainer of Black-Box Decision. In European Conference on Advances in Databases and Information Systems 2019 Sep 8 (pp. 53-68). Springer, Cham. link

Interpretability of Black-Box Models

In this project, we focus on different kinds of interpretability: model-specific and model-agnostic techniques. We developed ILIME, a novel technique that explains the prediction of any supervised learning-based model through an interpretation mechanism based on the instances that most influence the prediction of the instance being explained. We demonstrate the effectiveness of our approach by explaining different models on different datasets. In addition, we present a global attribution technique that aggregates the local explanations generated by ILIME into a few global explanations that mimic the behaviour of the black-box model globally in a simple way.
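
ILIME's exact influence mechanism is not reproduced here, but the general shape of such a local explainer can be sketched as follows: perturb the instance, query the black box, and fit a distance-weighted linear surrogate whose coefficients serve as local attributions. The dataset, kernel, and perturbation scheme below are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import Ridge

X, y = load_breast_cancer(return_X_y=True)
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)

def explain_locally(x, n_samples=500, kernel_width=2.0):
    # Perturb the instance, query the black box, and fit a weighted linear
    # surrogate; samples near x (the most "influential" ones under this
    # kernel) receive larger weights.
    rng = np.random.RandomState(0)
    Z = x + rng.randn(n_samples, x.size) * X.std(axis=0) * 0.3
    preds = black_box.predict_proba(Z)[:, 1]
    dists = np.linalg.norm((Z - x) / X.std(axis=0), axis=1)
    weights = np.exp(-(dists ** 2) / kernel_width ** 2)
    surrogate = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)
    return surrogate.coef_  # local feature attributions

coefs = explain_locally(X[0])
top = np.argsort(np.abs(coefs))[::-1][:3]
print("top local features:", top, coefs[top])
```

A global attribution in this spirit could then, for instance, aggregate the absolute local coefficients over many explained instances into a small set of global summaries.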

Additionally, this project focuses on developing Automated Concept-based Decision Tree Explanations, a technique that provides human-understandable, concept-based explanations for classification networks. It gives end-users the flexibility to customise the model's explanations by choosing the concepts of interest from a set of automatically extracted, visually human-understandable concepts inferred from the hidden-layer activations. These concepts are then interpreted through a shallow decision tree that includes the concepts deemed important to the model; a rough sketch of this pipeline follows.
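
The sketch below imitates that pipeline under heavy simplification: random vectors stand in for hidden-layer activations, k-means clusters play the role of discovered concepts, and a shallow decision tree maps concept proximity to the network's decisions. Everything here (the data, the clustering step, the tree depth) is an illustrative assumption rather than the method from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.RandomState(0)

# Stand-in for hidden-layer activations of a classification network on
# 1000 inputs (in the real setting these come from the CNN itself).
activations = rng.randn(1000, 64)
network_preds = (activations[:, :8].mean(axis=1) > 0).astype(int)

# Step 1: discover "concepts" as clusters in activation space.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(activations)

# Step 2: encode each input by its proximity to every concept centroid
# (negated distance, so higher means closer to the concept).
concept_scores = -kmeans.transform(activations)

# Step 3: a shallow decision tree over concept scores mimics the network's
# decisions using only the concepts that matter to the model.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(concept_scores, network_preds)
print(export_text(tree, feature_names=[f"concept_{i}" for i in range(5)]))
```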

Project Publications:

  • R. ElShawi, Y. Sherif, M. Al-Mallah, S. Sakr. ILIME: Local and Global Interpretable Model-Agnostic Explainer of Black-Box Decision. In European Conference on Advances in Databases and Information Systems 2019 Sep 8 (pp. 53-68). Springer, Cham. link
  • R. El Shawi, Y. Sherif, S. Sakr. Towards Automated Concept-based Decision Tree Explanations for CNNs. In EDBT 2021 (pp. 379-384). link