Publications
2022
- N-1 Experts: Unsupervised Anomaly Detection Model Selection. Constantin Le Clei, Yasha Pushak, Fatjon Zogaj, and 6 more authors. In First Conference on Automated Machine Learning (Late-Breaking Workshop), 2022.
Manually finding the best combination of machine learning training algorithm, model and hyperparameters can be challenging. In supervised settings, this burden has been alleviated with the introduction of automated machine learning (AutoML) methods. However, similar methods are noticeably absent for fully unsupervised applications, such as anomaly detection. We introduce one of the first such methods, N-1 Experts, which we compare to a recent state-of-the-art baseline, MetaOD, and show favourable performance.
2021
- Portfolio Optimization on Classical and Quantum Computers Using PortFawn. Moein Owhadi-Kareshk and Pierre Boulanger. 2021.
Portfolio diversification is one of the most effective ways to minimize investment risk. Individuals and fund managers aim to create a portfolio of assets that not only have high returns but are also uncorrelated. This goal can be achieved by comparing the historical performance, fundamentals, predictions, news sentiment, and many other parameters that can affect the portfolio’s value. One of the best-known approaches to managing and optimizing portfolios is the mean-variance (Markowitz) portfolio. The algorithm’s inputs are the expected returns and risks (volatility), and its output is the optimized weights for each asset in the target portfolio. Its original version relied on simplified, unrealistic assumptions and constraints, which prevent its use in practical cases. One way to improve its usability is to alter the parameters and constraints to match investment goals and requirements. This paper introduces PortFawn, an open-source Python library to create and backtest mean-variance portfolios. PortFawn provides simple-to-use APIs to create and evaluate mean-variance optimization algorithms using classical computing (real-valued asset weights) as well as quantum annealing computing (binary asset weights). The tool has many parameters for customizing the target portfolios according to the investment goals. The paper covers the background and limitations of the mean-variance portfolio optimization algorithm, as well as the architecture and functionalities of PortFawn. We also show how one can use this tool in practice using a simple investment scenario.
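As a rough illustration of the mean-variance quantities described in the abstract above, the following sketch computes a portfolio's expected return and variance from asset weights, along with the closed-form minimum-variance weights for the two-asset case. It is not PortFawn's API: the function names and the numbers are hypothetical, chosen only for illustration, and short positions are allowed (weights may fall outside [0, 1]).

```python
from typing import Sequence, Tuple


def portfolio_return(weights: Sequence[float],
                     expected_returns: Sequence[float]) -> float:
    """Expected portfolio return: sum_i w_i * mu_i."""
    return sum(w * r for w, r in zip(weights, expected_returns))


def portfolio_variance(weights: Sequence[float],
                       cov: Sequence[Sequence[float]]) -> float:
    """Portfolio variance: sum_i sum_j w_i * w_j * cov[i][j]."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j]
               for i in range(n) for j in range(n))


def min_variance_two_assets(var_a: float, var_b: float,
                            cov_ab: float) -> Tuple[float, float]:
    """Weights (w_a, w_b), with w_a + w_b = 1, that minimize variance.

    Setting the derivative of the two-asset variance to zero gives
    w_a = (var_b - cov_ab) / (var_a + var_b - 2 * cov_ab).
    """
    denom = var_a + var_b - 2.0 * cov_ab
    if denom <= 0.0:
        raise ValueError("assets are perfectly correlated with equal variance")
    w_a = (var_b - cov_ab) / denom
    return w_a, 1.0 - w_a


if __name__ == "__main__":
    # Hypothetical inputs: two uncorrelated assets.
    cov = [[0.04, 0.0], [0.0, 0.09]]
    expected = [0.10, 0.06]
    weights = min_variance_two_assets(cov[0][0], cov[1][1], cov[0][1])
    print("weights:", weights)
    print("expected return:", portfolio_return(weights, expected))
    print("variance:", portfolio_variance(weights, cov))
```

Because the two hypothetical assets are uncorrelated, the minimum-variance mix has lower variance than either asset held alone, which is the diversification effect the abstract refers to.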
2020
- Predicting Textual Merge Conflicts. Moein Owhadi Kareshk. 2020.
During collaborative software development, developers often use branches to add features or fix bugs. When merging changes from two branches, conflicts may occur if the changes are inconsistent. Developers need to resolve these conflicts before completing the merge, which is an error-prone and time-consuming process. Early detection of merge conflicts, which warns developers about resolving conflicts before they become large and complicated, is among the ways of dealing with this problem. Existing techniques do this by continuously pulling and merging all combinations of branches in the background to notify developers as soon as a conflict occurs, which is a computationally expensive process. One potential way of reducing this cost is to use a machine-learning-based conflict predictor that filters out the merge scenarios that are not likely to have conflicts, i.e., safe merge scenarios. In this thesis, we assess whether conflict prediction is feasible. We employ binary classifiers to predict merge conflicts based on 9 light-weight Git feature sets. We train and test predictors for each repository separately. To evaluate our predictors, we perform a large-scale study on 147,967 merges from 105 GitHub repositories in seven programming languages. Our results show that decision trees can achieve high f1-scores, varying from 0.93 to 0.95 across the seven programming languages, when predicting safe merges. The f1-score is between 0.45 and 0.71 for the conflicting merges. Our results indicate that predicting conflicts is feasible, which suggests it may successfully be used as a pre-filtering criterion for speculative merging.
2019
- Scalable software merging studies with merganser. Moein Owhadi-Kareshk and Sarah Nadi. In 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), 2019.
Software merging researchers constantly need empirical data of real-world merge scenarios to analyze. Such data is currently extracted through individual and isolated efforts, often with non-systematically designed scripts that may not easily scale to large studies. This hinders replication and proper comparison of results. In this paper, we introduce MERGANSER, a scalable and easy-to-use tool for extracting and analyzing merge scenarios in Git repositories. In addition to extracting basic information about merge scenarios from Git history, our tool also replays each merge to detect conflicts and stores the corresponding information of conflicting files and regions. We design a normalized and extensible SQL data schema to store the information of the analyzed repositories, merge scenarios and involved commits, and merge replays and conflicts. By running only one command, our proposed tool clones the target repositories, detects their merge scenarios, and stores their information in a SQL database. MERGANSER is written in Python and released under the MIT license. In this tool paper, we describe MERGANSER’s architecture and provide guidance for its usage in practice.
- Predicting Merge Conflicts in Collaborative Software Development. Moein Owhadi-Kareshk, Sarah Nadi, and Julia Rubin. In 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2019.
Background. During collaborative software development, developers often use branches to add features or fix bugs. When merging changes from two branches, conflicts may occur if the changes are inconsistent. Developers need to resolve these conflicts before completing the merge, which is an error-prone and time-consuming process. Early detection of merge conflicts, which warns developers about resolving conflicts before they become large and complicated, is among the ways of dealing with this problem. Existing techniques do this by continuously pulling and merging all combinations of branches in the background to notify developers as soon as a conflict occurs, which is a computationally expensive process. One potential way for reducing this cost is to use a machine-learning based conflict predictor that filters out the merge scenarios that are not likely to have conflicts, i.e., safe merge scenarios. Aims. In this paper, we assess if conflict prediction is feasible. Method. We design a classifier for predicting merge conflicts, based on 9 light-weight Git feature sets. To evaluate our predictor, we perform a large-scale study on 267,657 merge scenarios from 744 GitHub repositories in seven programming languages. Results. Our results show that we achieve high f1-scores, varying from 0.95 to 0.97 for different programming languages, when predicting safe merge scenarios. The f1-score is between 0.57 and 0.68 for the conflicting merge scenarios. Conclusions. Predicting merge conflicts is feasible in practice, especially in the context of predicting safe merge scenarios as a pre-filtering step for speculative merging.
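The per-class f1-scores reported above are the harmonic mean of precision and recall, computed separately for the safe and the conflicting class. A minimal sketch, using hypothetical confusion counts that are not taken from the paper, shows how one set of predictions yields two different scores depending on which class is treated as positive:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * P * R / (P + R), which simplifies to 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts for illustration only (not results from the paper).
safe_as_safe = 900          # safe merges predicted safe
safe_as_conflict = 50       # safe merges predicted conflicting
conflict_as_safe = 20       # conflicting merges predicted safe
conflict_as_conflict = 30   # conflicting merges predicted conflicting

# Treat "safe" as the positive class.
f1_safe = f1_score(tp=safe_as_safe,
                   fp=conflict_as_safe,
                   fn=safe_as_conflict)

# Treat "conflicting" as the positive class.
f1_conflict = f1_score(tp=conflict_as_conflict,
                       fp=safe_as_conflict,
                       fn=conflict_as_safe)

if __name__ == "__main__":
    print("f1 (safe):", f1_safe)
    print("f1 (conflicting):", f1_conflict)
```

When one class is much rarer than the other, as in these made-up counts, its f1-score tends to be lower because each misclassification weighs against far fewer true positives.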
- Entropy-based Consensus for Distributed Data Clustering. M. Owhadi-Kareshki and M.R. Akbarzadeh-T. Journal of AI and Data Mining, 2019.
The increasingly large scale of available data and the more restrictive concerns about their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e., it is the negotiations among local cluster centers that are used in the consensus process, hence no private data are transferred. With the proposed use of entropy as an internal measure of consensus clustering validation at each machine, the cluster centers of the local machines with higher expected clustering validity have more influence on the final consensus centers. We also employ the relative cost function of the local Fuzzy C-Means (FCM) and the number of data points in each machine as measures of a machine's validity relative to other machines and of its reliability, respectively. The utility of the proposed consensus strategy is examined on 18 datasets from the UCI repository in terms of clustering accuracy and speed-up compared with the centralized version of FCM. Several experiments confirm that the proposed approach yields higher speed-up and accuracy while maintaining data security due to its protected and distributed processing approach.
2017
- Pre-training of an artificial neural network for software fault prediction. Moein Owhadi-Kareshk, Yasser Sedaghat, and Mohammad-R. Akbarzadeh-T. In 2017 7th International Conference on Computer and Knowledge Engineering (ICCKE), 2017.
Software fault prediction is one of the significant stages in the software testing process. At this stage, the probability of fault occurrence is predicted based on the documented information of the software systems that are already tested. Using this prior knowledge, developers and testing teams can better manage the testing process. There are many efforts in the field of machine learning to solve this classification problem. We propose to use a pre-training technique for a shallow Artificial Neural Network (ANN), i.e., one with fewer hidden layers. While this method is usually employed to prevent over-fitting in deep ANNs, our results indicate that even in a shallow network, it improves the accuracy by escaping from local minima. We compare the proposed method with four SVM-based classifiers and a regular ANN without pre-training on seven datasets from NASA codes in the PROMISE repository. Results confirm that the pre-training improves accuracy, achieving the best overall ranking of 1.43. Of the seven datasets, our method has the highest accuracy on four, while the ANN and the support vector machine are the best on two and one datasets, respectively.
2016
- Control of elastic joint robot based on electromyogram signal by pre-trained Multi-Layer Perceptron. Mahdi Souzanchi-K, Moein Owhadi-Kareshk, and Mohammad-R Akbarzadeh-T. In 2016 International Joint Conference on Neural Networks (IJCNN), 2016.
Nowadays, humans can play an important role in the control of robots. Some studies have used signals coming directly from humans as control interfaces. In this paper, electromyogram (EMG) signals from the muscles of the human upper limb are used as the control interface between the user and a robot arm. A Multi-Layer Perceptron (MLP) is trained with additional unsupervised pre-training to decode upper limb motion from kinematic data and EMG recordings. In addition, the control structure differs from previous ones because it uses a voltage control strategy instead of a torque control strategy. The common control structure for elastic-joint robots employs two control loops, whereas this controller has only one control loop, and the actuators are considered in the dynamic equation of the robot. The proposed control design is verified by stability analysis, and experimental results demonstrate the effectiveness of this controller.
2015
- Spectral Clustering-based Classification. Moein Owhadi-Kareshk and Mohammad-R. Akbarzadeh-T. In 2015 5th International Conference on Computer and Knowledge Engineering (ICCKE), 2015.
Multi-class classification is a challenging problem in pattern recognition. Clustering-based Classification (CC) is one of the most effective classification methods: it first divides data into several clusters, and each cluster is then described by a One-Class Classifier (OCC). Scalability and accuracy are two key advantages of this clustering-enhanced approach. Continuing this strategy, in this paper we propose Spectral Clustering-based Classification (SCC). In contrast to many other clustering algorithms, Spectral Clustering (SC) aims to put the more mutually interconnected data points in one cluster, hence producing output clusters with smoother borders. A simpler border is easier for an OCC to describe, leading to higher accuracy. Application to seven UCI data sets of various natures and sizes confirms this improved performance in terms of higher accuracy, while preserving scalability.
- Representation learning by Denoising Autoencoders for Clustering-based Classification. Moein Owhadi-Kareshk and Mohammad-R. Akbarzadeh-T. In 2015 5th International Conference on Computer and Knowledge Engineering (ICCKE), 2015.
Representation learning is a fast-growing approach in machine learning that aims to improve the quality of the input data, instead of insisting on designing complex subsequent learning algorithms. In this paper, we propose to use Denoising AutoEncoders (DAEs), one of the most effective representation learning methods, in Clustering-based Classification (CC). CC is a multi-class classification solution for large-scale and complicated data sets. In this approach, data are divided into small and simple clusters, which are described by One-Class Classifiers (OCCs). In the proposed Representation Learning for Clustering-based Classification (RLCC), the new representation of each cluster is generated locally to increase the performance of OCCs in terms of accuracy. This method still preserves scalability, one of the significant advantages of CC methods. RLCC is evaluated on six different data sets from UCI. The results of the experiments show that RLCC has higher generalization power than the standard version of CC.