1 Introduction

Machine learning (ML) is increasingly being employed to tackle, and delve deeper into, the harder problems in quantum information science. In recent years, it has been applied to state classification [1,2,3], state reconstruction [4], parameter estimation [5], and many other tasks [6,7,8,9,10,11,12,13]. The motivation behind using ML in quantum information is to gain insight into problems where the usual numerical techniques either fail or demand excessive resources, e.g., optimization tasks in highly constrained or non-convex scenarios.

Deciding whether an arbitrary quantum state is entangled or not is an NP-hard problem [14]. It is one of the long-standing fundamental issues in entanglement theory. A state of a composite system \(\rho _{AB}\) is said to be separable if \(\rho _{AB} = \sum _i p_i\rho _{A}^i \otimes \rho _{B}^i\) for the two subsystems A and B, where \(p_i\) \((\ge 0)\) represents a classical mixing probability with \(\sum _ip_i=1\); otherwise, it is an entangled state. There exist numerous criteria to detect bipartite entanglement; however, these criteria are less reliable for higher-dimensional systems. For example, the popular Peres-Horodecki criterion states that separable states have positive partial transpose (PPT) [15, 16], meaning that for separable states \(\rho _{AB}^{T_A}\ge 0\), where \(T_A\) denotes transposition on system A. The criterion is necessary and sufficient for \(d_Ad_B\le 6\), where \(d_A\) and \(d_B\) denote the subsystem dimensions. Other extant methods include entanglement witnesses, the reduction criterion, and the cross-norm or realignment criterion, to name a few [17]. The most powerful technique is the k-extension hierarchy, but it is notoriously hard to compute because its complexity grows exponentially with k [18, 19]. Recently, Ref. [1] showed that ML techniques are instrumental in probing the separability-entanglement classification, and established that the ML-based technique is more efficient in terms of speed and accuracy than the extant methods. Further ML-based techniques for quantum separable-entangled classification using artificial neural networks have also been studied [20, 21].

Ref. [1] employed the convex hull approximation (CHA) to probe the separability-entanglement boundary using a supervised learning scheme. To reduce the classification error of CHA, the bagging method [22] was invoked; the resulting method is known as bagging CHA (BCHA). Bagging increases the speed and accuracy of data manipulation, as it divides the whole process into smaller units that run in parallel. Ref. [1] demonstrated results for two-qubit and two-qutrit systems with fairly high accuracy.

In this work, building on the approach of Ref. [1], we propose an alternative method that addresses some important issues and further improves accuracy for ML-based separability-entanglement classification. We notice that the earlier work a) does not address the issue of data imbalance, and b) does not explore all extant performance measures. In what follows, we find that some performance measures are particularly relevant to the study of the separability problem. We also show that a proper ML classifier with boosting can handle the class imbalance issue by optimally balancing between the bagging and boosting methods.

2 Setting up the stage

2.1 Supervised learning

Supervised learning is a method of developing artificial intelligence in which a computer algorithm is trained on input data that has been labeled with the desired output [23]. The model is trained until it discovers the underlying patterns and relationships between the input data and the output labels, allowing it to produce accurate classification results when applied to new data.

In supervised learning, the system is supplied with labeled datasets throughout its training phase, which tell it what output is associated with each specific input [24]. The trained model is then evaluated on test data, i.e., labeled data whose labels are hidden from the algorithm, to determine how well it performs the classification task [25].

To create the learning dataset, we consider bipartite quantum states \(\rho _{AB}\) of dimension \(d_A\times d_B\) in \(\mathcal {H}_A\otimes \mathcal {H}_B\). Since \(\rho ^\dagger =\rho \) and \(\textrm{Tr}[\rho ]=1\), an arbitrary density matrix \(\rho _{AB}\) \(\in \mathcal {H}_A\otimes \mathcal {H}_B\) can be represented by a real vector \({\varvec{x}_i}\) \(\in \mathcal {V}\) (\(=\mathbb {R}^{d_A^2d_B^2-1}\)). We call such a vector the feature vector [see Appendix A for details]. The training dataset is then defined as \(\Omega _\textrm{train}=\{(\varvec{x}_i,y_i)|i=1,\dots ,n\}\), where \(\varvec{x}_i\) is the \(i^{th}\) sample and \(y_i\) its corresponding class label: \(y_i=1(0)\) if the state is separable (entangled). Data labeling for \(d_Ad_B\le 6\) is performed using the PPT criterion; for higher dimensions, the labeling is done as per Appendix C of Ref. [1].
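As a concrete illustration, below is a minimal sketch of one standard parameterization, the generalized Gell-Mann expansion; the exact construction of Appendix A may differ, so treat this as an assumption rather than the paper's own code:

```python
import numpy as np

def gell_mann_basis(d):
    """Orthogonal, traceless, Hermitian basis of d x d matrices (d^2 - 1 elements)."""
    basis = []
    for j in range(d):                         # off-diagonal (symmetric/antisymmetric)
        for k in range(j + 1, d):
            sym = np.zeros((d, d), complex); sym[j, k] = sym[k, j] = 1.0
            asym = np.zeros((d, d), complex); asym[j, k] = -1j; asym[k, j] = 1j
            basis += [sym, asym]
    for l in range(1, d):                      # diagonal generators
        diag = np.zeros((d, d), complex)
        diag[np.diag_indices(d)] = [1.0] * l + [-float(l)] + [0.0] * (d - l - 1)
        basis.append(np.sqrt(2.0 / (l * (l + 1))) * diag)
    return basis

def feature_vector(rho):
    """Map a density matrix to a real vector in R^{d^2 - 1}."""
    d = rho.shape[0]
    return np.array([np.trace(rho @ G).real for G in gell_mann_basis(d)])
```

For a two-qubit state (d = 4) this yields the 15-dimensional vector, and for two qutrits (d = 9) the 80-dimensional one, matching \(d_A^2d_B^2-1\).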

In supervised learning, the main aim is to find a classifier (indicator function) \(\Theta : \mathcal {V}\rightarrow \{0,1\}\) which best fits the training data among a class of functions \(\mathcal {F}\). As the present quantum entanglement problem is a binary classification problem, the error expresses the misclassification rate over the two classes. For any training data \(\Omega _\textrm{train}\) consisting of n samples, each associated with a feature vector \(\varvec{x}_i\in \mathcal {V}\) and a target class label \(y_i\) (\(\in \{0,1\}\)), the loss function \(\mathbb {L}\) for any binary classifier \(\Theta \) can be represented as

$$\begin{aligned} \mathbb {L}(\Theta ,\Omega _\textrm{train})=\frac{1}{n}\sum _{i=1}^n \mathbbm {1}[y_i \ne \Theta (\varvec{x}_i)], \end{aligned}$$

where \(\mathbbm {1}[\cdot ]\) is a truth function of its argument. For any test data \(\Omega _\textrm{test}\), the value of function \(\mathbb {L}(\Theta ,\Omega _\textrm{test})\) depicts the generalization error from \(\Omega _\textrm{train}\) to \(\Omega _\textrm{test}\).
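In code, this loss is simply the misclassification fraction; a minimal sketch:

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    """Empirical 0-1 loss: fraction of samples with y_i != Theta(x_i)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))
```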

It was found that numerous extant supervised learning algorithms, e.g., support vector machines (SVM) [26], decision trees [27], boosting [28], etc., do not provide acceptable accuracy for the separability problem [1]. This is due to the complex structure of the set of separable states, and it led the authors of Ref. [1] to the following consideration.

2.2 Combining CHA with supervised learning

The set of all separable states, \(\Omega _1\), is convex and compact, and its extreme points are pure product states. Using this fact, one can approximate \(\Omega _1\) by the convex hull (\(\mathbb {C}\)) of m pure product states \(\{\varvec{c}_i\}\subset \mathcal {V}\), i.e., \(\mathbb {C}:=\textrm{conv}\{\varvec{c}_i~|i=1,\dots , m\}\). The set \(\mathbb {C}\) is the CHA of \(\Omega _1\), and one can decide whether an unknown state \(\rho \) is separable by examining whether its feature vector \(\varvec{x}\) lies in \(\mathbb {C}\). Equivalently, this is the solution of the following linear program:

$$\begin{aligned}&\max ~ \alpha ~~~ \mathrm{s.t.} ~~~\alpha \varvec{x} \in \mathbb {C},~~~\mathrm{i.e.},\nonumber \\&\alpha \varvec{x} = \sum _{i=1}^{|\mathbb {C}|}\lambda _i\varvec{c}_i, ~~ \lambda _i\ge 0, ~~\sum _{i}\lambda _i=1, \end{aligned}$$
(1)

where \(\alpha \) depends on both \(\mathbb {C}\) and \(\varvec{x}\). If \(\varvec{x}\) is in \(\mathbb {C}\), then the corresponding state \(\rho \) is separable; otherwise \(\rho \) is entangled with high probability. More specifically, \(\rho \) is declared separable when \(\alpha \ge 1\) and entangled otherwise. We denote the maximal \(\alpha \) for a chosen m as \(\alpha _{\max }^m\). Increasing m makes \(\mathbb {C}\) a better approximation of \(\Omega _1\) and therefore yields better classification; adding more extreme points to the convex approximation evidently increases the accuracy of the above algorithm, but it is computationally expensive. To overcome this, Ref. [1] used CHA in combination with supervised learning. The training data are now defined as \(\Omega _\textrm{train}=\{(\varvec{x}_i,\alpha _i,y_i)|i=1,\dots , n\}\) and the loss function of classifier \(\Theta \) is redefined as

$$\begin{aligned} \mathbb {L}(\Theta ,\Omega _\textrm{train})=\frac{1}{n}\sum _{i=1}^n \mathbbm {1}[y_i \ne \Theta (\varvec{x}_i,\alpha _i)]. \end{aligned}$$
(2)

where \(\alpha _i\) is the outcome of CHA for the i-th random density matrix, obtained by solving the linear program that tests whether \(\varvec{x}_i\) lies in \(\mathbb {C}\). Note that CHA uses the threshold \(\alpha \ge 1\) to assign the label 1 (0). The value of \(\alpha \) acts as an additional feature from which the classifier learns the model. In Ref. [1], bagging-based classification is performed on this feature space; the result is known as bagging CHA (BCHA). The bagging and boosting approaches are discussed further below.
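The linear program of Eq. (1) is compact enough to sketch directly. Below is a minimal illustration using scipy.optimize.linprog; the solver choice and the matrix C of sampled product-state feature vectors are our assumptions, as Ref. [1] does not specify an implementation:

```python
import numpy as np
from scipy.optimize import linprog

def cha_alpha(x, C):
    """Eq. (1): max alpha s.t. alpha*x = C @ lam, lam >= 0, sum(lam) = 1.

    x : (D,) feature vector of the state under test (D = d^2 - 1)
    C : (D, m) matrix whose columns are feature vectors c_i of pure product states
    """
    D, m = C.shape
    # Decision variables z = [lam_1, ..., lam_m, alpha]; linprog minimizes,
    # so we minimize -alpha in order to maximize alpha.
    cost = np.zeros(m + 1)
    cost[-1] = -1.0
    # Equalities: C @ lam - alpha * x = 0, and sum(lam) = 1.
    A_eq = np.zeros((D + 1, m + 1))
    A_eq[:D, :m] = C
    A_eq[:D, -1] = -x
    A_eq[D, :m] = 1.0
    b_eq = np.zeros(D + 1); b_eq[D] = 1.0
    bounds = [(0, None)] * (m + 1)             # lam_i >= 0, alpha >= 0
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    return res.x[-1] if res.success else None  # alpha_max^m; >= 1 => separable
```

For the maximally mixed state (\(\varvec{x}=0\)) the program is unbounded; in practice, the sampled states avoid this corner case.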

2.3 Overview of bagging and boosting classifiers

A bagging classifier is an ensemble meta-estimator that fits base classifiers independently on random subsets of the original dataset and then aggregates the individual predictions (either by voting or by averaging) to produce a final prediction. By adding randomization to the process of building a black-box estimator (such as a decision tree), a meta-estimator of this kind can often be used to lower the variance of the estimator.

A training set is created by randomly selecting M instances (or pieces of data), with replacement, from the original training dataset (of size N) and is used to train each base classifier in parallel. Each base classifier's training set is drawn independently of the others; in the resulting training set, some of the original data may be replicated while others may be absent. An overview of the bagging classifier is presented in Fig. 1.

Fig. 1 Overview of the bagging classifier: multiple learners are created by resampling the data. The new data points are drawn randomly with uniform probability, with replacement. The N learners are trained in parallel, and their errors are averaged to obtain the final learning error \(e=\frac{1}{N}\sum _{i=1}^{N} e_i\)

Boosting is a broad ensemble approach in which a number of weak classifiers are combined to produce a strong classifier. To do this, a first model is constructed from the training data, and a second model is then built in an effort to fix the errors of the first. Models are added until the training set is predicted exactly or a predetermined maximum number of models is reached, whichever comes first. AdaBoost [29] was the first really successful boosting algorithm developed for binary classification. An overview of the boosting classifier is presented in Fig. 2.

Fig. 2 Overview of the boosting classifier: like bagging, boosting also generates multiple resampled datasets. But, unlike the parallel learners in bagging, boosting sequentially learns from the errors of the previous learner: misclassified data are assigned higher weights, and random sampling with weighted replacement is carried out. Another set of weights, assigned to the learners themselves, is accumulated to obtain the final weighted-average error \(e=\sum _{i=1}^{N} w_i e_i\)

Both boosting and bagging fall under the category of "ensemble learning", which combines many weak learners into a single strong classification system. Most often, the weak learners are shallow decision trees.
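A minimal sketch of the two strategies with scikit-learn (version 1.2 or later, where the base-learner keyword is named estimator) may make the contrast concrete; the toy data here merely stand in for our feature vectors:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy imbalanced data; in our setting X would hold the feature vectors x_i
# (optionally with alpha appended) and y the separable/entangled labels.
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)

stump = DecisionTreeClassifier(max_depth=1)
# Bagging: base learners fit in parallel on bootstrap resamples, then vote.
bagger = BaggingClassifier(estimator=stump, n_estimators=100).fit(X, y)
# Boosting: learners fit sequentially, upweighting misclassified samples.
booster = AdaBoostClassifier(estimator=stump, n_estimators=100).fit(X, y)
```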

2.4 Imbalanced dataset

An imbalanced dataset is one with an unequal distribution of class samples. Such an unequal distribution reduces the training performance of a classifier, and hence the classification results on the testing data are also affected.

In the present context, the volume of entangled states is far larger than that of separable states, making the dataset imbalanced. For more details on the experimented datasets, see Sect. 4.1, from which we can observe that the prevalence differences are high and hence both datasets are imbalanced, the two-qubit dataset highly so.

This demands a classifier that can handle the data imbalance issue and is thus better suited to the quantum separability-entanglement classification problem; such a classifier is discussed in the next section.

Also, for such imbalanced datasets, the learning performance of any ML approach is greatly affected [30] and needs a careful performance evaluation. Such performance measures are discussed in Sect. 4.2.

2.5 Ensemble classifiers for imbalanced dataset

It has been well studied that, for imbalanced data, the SVM classifier may be biased toward the majority class [31]. A modification of SVM incorporating random under-sampling (RUS), which randomly removes samples from the training set, has already been presented for unbalanced datasets [32]. For highly unbalanced data, the synthetic minority oversampling technique (SMOTE) [33, 34] has been applied to classification; it over-samples the minority class by creating synthetic data points, and incorporating SMOTE into a boosting approach may therefore be effective. However, when oversampling is performed by duplicating examples, it may lead to over-fitting [35], so a further modification incorporating under-sampling may help improve classifier performance. Instead of over-sampling the minority class, under-sampling the majority class may also improve the results: RUS randomly removes examples from the majority class until the desired class distribution is reached [36]. The integration of such under-sampling with boosting is RUSBoost [36], a hybrid approach combining random under-sampling with the adaptive boosting (AdaBoost) classifier.

For ensemble learning, bagging and boosting are generally applied (see Figs. 1 and 2). The bagging-based CHA (BCHA) has already been proposed [1], reporting higher accuracy than CHA. But, as the data are highly unbalanced, the accuracy evaluation should be twofold: 1) overall accuracy (OA) and 2) average accuracy (AA); for more details on these measures, see Sect. 4.2. OA is the number of correctly classified test samples divided by the total number of test samples, while AA is the mean of the per-class accuracies. Hence, although the reported OA of BCHA is higher [1], we evaluated the AA of BCHA and found it to fall below that of the CHA approach. This demands a further improved classifier that can take care of both OA and AA for separability-entanglement classification.

As the experimented dataset is highly unbalanced (refer to Sect. 4.1), the RUSBoost approach is explored for separability-entanglement classification and is validated against the state-of-the-art approaches. The subsequent section describes the RUSBoost-ensembled CHA classifier.

3 RUSBoost CHA (RUSBCHA)

Initially, all examples in the training dataset are assigned equal weights. During each iteration of AdaBoost, a weak hypothesis is formed by the base learner. The error associated with the hypothesis is calculated, and the weight of each example is adjusted such that wrongly classified examples have their weights increased while correctly classified examples have their weights decreased. Subsequent boosting iterations therefore generate hypotheses that are more likely to correctly classify the previously misclassified examples. After all iterations are completed, a weighted vote of all hypotheses is used to assign a class to the unlabeled samples.

Data sampling techniques attempt to alleviate the problem of class imbalance by adjusting the class distribution of the training dataset. This can be accomplished by either removing examples from the majority class (under-sampling) or adding examples to the minority class (oversampling).

SMOTE adds new artificial minority examples by interpolating between preexisting minority instances rather than simply duplicating original examples. The newly created instances make the minority regions of the feature space fuller and more general.
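The interpolation at the heart of SMOTE fits in a few lines; a conceptual sketch of ours (omitting the nearest-neighbor search that selects the neighbor):

```python
import numpy as np

def smote_point(x, neighbor, rng):
    """Synthesize a minority sample on the segment between a minority
    instance x and one of its minority-class nearest neighbors."""
    gap = rng.uniform(0.0, 1.0)      # random position along the segment
    return x + gap * (neighbor - x)
```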

RUSBoost takes advantage of these data-sampling and boosting approaches by combining them. A detailed discussion of the RUSBoost approach can be found in [36].
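Since neither Ref. [36] nor the present work fixes an implementation, a readily available option is the RUSBoostClassifier of the imbalanced-learn package (version 0.12 or later, where the keyword is named estimator). A minimal RUSBCHA-style sketch on toy data, with feature sizes and labels chosen purely for illustration:

```python
import numpy as np
from imblearn.ensemble import RUSBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy stand-ins: X plays the role of the feature vectors x_i and alpha
# that of the CHA outputs alpha_max^m (both assumptions, for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 15))                  # d^2 - 1 = 15 for two qubits
alpha = rng.uniform(0.0, 1.2, size=(1000, 1))
y = (alpha[:, 0] >= 1.0).astype(int)             # ~17% "separable": imbalanced

X_aug = np.hstack([X, alpha])                    # the d^2-dimensional feature space
clf = RUSBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=200, random_state=0).fit(X_aug, y)
```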

Although significant classifier performance improvement over standalone CHA is observed for BCHA [1], some limitations exist, as discussed in Sect. 1. The approach can be further improved in two ways: 1) by replacing the classifier, and 2) by enlarging the feature space through a proper feature extraction technique. Presently, the first option is explored by incorporating the RUSBCHA classifier for possible improvement in the classification results, leaving the feature extraction techniques as future work.

4 Experimental setup

All classifications were carried out on two kinds of feature spaces: 1) the vector representation of \(\rho \) (a \(d^2-1\)-dimensional feature space), and 2) the vector representation of \(\rho \) together with the CHA-calculated \(\alpha _{\max }^m\) for a specific m (a \(d^2\)-dimensional feature space). The experiments are carried out for both the two-qubit and two-qutrit systems. Five different techniques were tested: bagging and boosting on the raw \(d^2-1\)-dimensional feature vector \(\varvec{x}\) (with d=4 for the two-qubit system and d=9 for the two-qutrit system), CHA with only \(\alpha _{\max }^m\), and BCHA and RUSBCHA trained with both \(\varvec{x}\) and \(\alpha _{\max }^m\). Their associated feature spaces are presented in Table 1.

Table 1 Various experimented classifiers with their associated feature space (dimensions)

The dataset details and the performance evaluators are presented below.

4.1 Dataset preparation

The total data space \(\Omega \) is a combination of the separable subspace \(\Omega _{1}\) and entangled subspace \(\Omega _{0}\), such that \(\Omega =\Omega _{1}\cup \Omega _{0}\) and \(\Omega _{1}\cap \Omega _{0}=\emptyset \) (see Fig. 3). Two datasets, representing the feature vectors of random density matrices for the two-qubit and two-qutrit systems, respectively, are supplied with their class labels in [37]. The procedure for creating the random separable and entangled states can be found in the BCHA manuscript [1]. The total and class-specific training and testing sample counts for the two experimented datasets, namely the two-qubit and two-qutrit systems, are presented in Tables 2 and 3, respectively. Approximately 50% of the samples are randomly selected for training and the remaining 50% are used for testing to evaluate the performance of the ML algorithms.

The maximized parameter \(\alpha _{\max }^m\) for CHA (with varying m) was also obtained from [1, 37], for 1) the two-qubit system with \(m=[1000,2000,\dots ,10000]\), and 2) the two-qutrit system with \(m=[10000,20000,\dots ,100000]\). The maximization was performed by solving the linear program defined in Eq. (1).
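For readers who wish to regenerate comparable data, a standard way to draw random mixed states is the Ginibre (Hilbert-Schmidt) ensemble; this is an assumption of ours, the exact sampling procedure being that of Ref. [1]:

```python
import numpy as np

def random_density_matrix(d, rng):
    """Random d x d state from the Hilbert-Schmidt (Ginibre) ensemble."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T                 # positive semidefinite by construction
    return rho / np.trace(rho)           # normalize to unit trace
```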

Fig. 3 Data space \(\Omega \) as a combination of entangled \(\Omega _{0}\) and separable \(\Omega _{1}\) subspaces. \(c_i\) represents the pure product states

Table 2 Dataset description of experimented training, testing, and total samples for two-qubit systems
Table 3 Dataset description of experimented training, testing, and total samples for two-qutrit systems

From Tables 2 and 3, we can observe that the class samples are unequally distributed within each dataset. For binary classification, the prevalence difference represents the degree of imbalance in the dataset. The dataset-specific prevalence differences of the class samples are:

  • Two-qubit dataset (Table 2): \(\left| \frac{2814}{40000} - \frac{37186}{40000}\right| =0.8593\).

  • Two-qutrit dataset (Table 3): \(\left| \frac{6751}{20000} - \frac{13249}{20000}\right| =0.3249\).

For a balanced dataset, the prevalence difference approaches 0. Here, the prevalence difference for the two-qubit dataset is high (0.86), while for the two-qutrit dataset it is comparatively low (0.32). This clearly signifies that the experimented datasets are imbalanced, highly so in the two-qubit case. For such imbalanced datasets, the learning performance of any ML approach is greatly affected [30], demanding a careful performance evaluation. Such performance measures are discussed further.
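The prevalence difference is a one-line computation; the snippet below reproduces the two figures above:

```python
def prevalence_difference(n_separable, n_entangled):
    """Absolute difference between the two class prevalences."""
    total = n_separable + n_entangled
    return abs(n_separable - n_entangled) / total

print(prevalence_difference(2814, 37186))   # 0.8593 (two-qubit dataset)
print(prevalence_difference(6751, 13249))   # 0.3249 (two-qutrit dataset)
```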

4.2 Performance measures

For ease of understanding the binary classification, the confusion matrix is presented in Fig. 4. In the figure, the columns represent the original class labels (supplied with the data) as true and false; similarly, each row represents the outcome of the classifier.

Fig. 4 Confusion matrix for binary classification

True positives (TP) and true negatives (TN) are the cases where the original (ground-truth) and the obtained (classified) class labels agree, being true and false, respectively. The disagreements are the false positives (FP) and false negatives (FN), which sit off-diagonal in the confusion matrix. Let N be the number of tested samples, i.e., \(N=\textrm{TP}+\textrm{TN}+\textrm{FP}+\textrm{FN}\). Higher TP and TN values lead to better accuracy; on the contrary, higher FP and FN values disqualify the classifier.

Now we can define overall accuracy (OA) as

$$\begin{aligned} {\text {OA}} = \frac{{{\text {TP}} + {\text {TN}}}}{N}, \end{aligned}$$

and the overall error (OE) as OE=1-OA.

For binary classification, let, out of the N tested samples, \(N_1\) and \(N_2\) samples be labeled as true and false, respectively (so \(N=N_1+N_2\)). The average accuracy (AA) is the mean of the per-class accuracies and is defined as

$$\begin{aligned} {\text {AA}} = \frac{1}{2}\left( {\frac{{{\text {TP}}}}{{N_{1} }} + \frac{{{\text {TN}}}}{{N_{2} }}} \right) , \end{aligned}$$

and the average error (AE) as AE=1-AA.

Similarly, other important measures such as sensitivity (\({s}=\frac{\hbox {TP}}{\hbox {TP}+\hbox {FN}}\)), specificity (\(r=\frac{\textrm{TN}}{\mathrm{TN+FP}}\)), precision (\(k= \frac{\textrm{TP}}{\mathrm{TP+FP}}\)), the F-measure, and the G-mean can be incorporated to validate the classification results. We will use the following two in our analysis:

$$\begin{aligned} \text{ F-measure } = 2\left( \frac{k \times s}{k + s}\right) ,\,\,\text{ and }\,\, \text{ G-mean } = \sqrt{s \times r}. \end{aligned}$$

Higher values of OA, AA, F-measure, and G-mean are desirable for evaluating the performance of a classifier.
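All of these measures follow directly from the confusion-matrix counts; a compact reference implementation (our sketch, not code from Ref. [1]):

```python
import math

def measures(tp, tn, fp, fn):
    """OA, AA, F-measure, and G-mean from confusion-matrix counts."""
    n = tp + tn + fp + fn
    n1, n2 = tp + fn, tn + fp          # actual true- and false-class sizes
    s = tp / (tp + fn)                 # sensitivity
    r = tn / (tn + fp)                 # specificity
    k = tp / (tp + fp)                 # precision
    return {"OA": (tp + tn) / n,
            "AA": 0.5 * (tp / n1 + tn / n2),
            "F-measure": 2 * k * s / (k + s),
            "G-mean": math.sqrt(s * r)}
```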

5 Results and discussion

We used both datasets (see Sect. 4.1) and all the performance measures described in Sect. 4.2 to compare the proposed RUSBCHA with other state-of-the-art classifiers; the comparisons are presented as figures. For a robust representation of performance on the experimented data, all classification performance measures are averaged over 30 independent evaluations.

The bagging and boosting classifiers incorporate only the \(d^2-1\)-dimensional feature vector \(\varvec{x}\). The classification performances (AE, F-measure, G-mean, and OE) for the two-qubit and two-qutrit systems are presented in Fig. 5a, b, respectively. For the two-qubit system (Fig. 5a), the boosting approach outperforms the bagging approach in terms of F-measure, G-mean, and AE, while only a marginal deviation is observed for OE. Similarly, for the two-qutrit system (Fig. 5b), improvement is observed for G-mean and AE.

Fig. 5 Classification results of the raw data without considering the CHA (\(\alpha \)) for a two-qubit and b two-qutrit system

According to both the CHA and BCHA approaches, if \(\alpha _{\max }^m \ge 1\), \(\varvec{x}\) is separable; otherwise, \(\varvec{x}\) is highly likely to be an entangled state. Hence, our proposed RUSBCHA classifier also incorporates both the feature vector \(\varvec{x}\) and \(\alpha _{\max }^m\). To find the trade-off between the state-of-the-art BCHA and the proposed RUSBCHA approach, further experiments are performed on both the two-qubit and two-qutrit datasets. These experiments include:

  • Experiment 1: Performance evaluation of classifiers over varying m.

  • Experiment 2: Performance evaluation of classifiers over varying percentages of training and testing samples.

  • Experiment 3: Performance evaluation of classifiers on varying prevalence difference of dataset.

5.1 Experiment 1

In this experiment, the CHA, BCHA, and proposed RUSBCHA classifiers are compared over varying m for both two-qubit and two-qutrit datasets. Experimental results are shown in Figs. 6 and  7.

For the two-qubit system, Fig. 6b shows that the AE of BCHA is higher for all values of m compared to the CHA and RUSBCHA approaches; BCHA reaches almost 40% error for low values of m. It can also be observed that, for low values of m, the performances of CHA and RUSBCHA are similar, while for high values of m RUSBCHA attains lower AE values. This clearly signifies that the proposed RUSBCHA is less biased toward the majority class, and hence its average accuracy is higher in comparison with the other state-of-the-art approaches. A similar interpretation applies to Fig. 6d.

From Fig. 6a, it can be observed that the OE of BCHA has lower values, and hence its performance is better for low values of m in comparison with the RUSBCHA and CHA approaches, while the proposed RUSBCHA shows intermediate performance. In Fig. 6c, however, the F-measure performances are similar for all approaches.

Fig. 6 Classification results of the two-qubit system, considering the CHA (\(\alpha \))

On the other hand, for the two-qutrit system (Fig. 7), both BCHA and RUSBCHA perform similarly over varying m, with significant performance improvements as compared to the state-of-the-art CHA approach.

Fig. 7 Classification results of the two-qutrit system, considering the CHA (\(\alpha \))

In this experiment, we observe better performance of the proposed RUSBCHA approach on the two-qubit dataset in comparison with the BCHA and CHA approaches, while RUSBCHA and BCHA perform similarly on the two-qutrit dataset. To find the rationale for the performance difference between these two datasets, further experiments are carried out.

5.2 Experiment 2

It is well established in the literature that several machine learning techniques, such as neural networks and deep learning, require a large number of training samples; classifier performance may thus be sensitive to the percentage of training samples. In Experiment 1, 50% of the samples were used for training and the rest for testing. Hence, the approaches are further validated with varying training (10-50%) and testing (90-50%) fractions, and the performances are presented in Figs. 8 and 9 for the two-qubit and two-qutrit systems, respectively. Note that, for this experiment, the total samples are the same as in Tables 2 and 3 for the respective datasets, and m is set to 2000 and 20000 for the two-qubit and two-qutrit data, respectively.
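The split schedule of this experiment can be sketched as follows; the stratified splitting and the toy data are our assumptions, as the paper only fixes the percentages:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Toy data mimicking the ~7% separable prevalence of the two-qubit dataset.
X, y = make_classification(n_samples=40000, weights=[0.93], random_state=0)
for train_frac in (0.1, 0.2, 0.3, 0.4, 0.5):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_frac, stratify=y, random_state=0)
    # fit BCHA / RUSBCHA on (X_tr, y_tr); record OA and AA on (X_te, y_te)
```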

From Fig. 8a, it can be observed that the OA of BCHA is about 2.5% higher than that of RUSBCHA, while in Fig. 8b the AA of RUSBCHA is more than 15% higher than that of BCHA. However, the results of these classifiers do not vary with the training percentage; the performance of both classifiers is therefore not sensitive to the number of training samples. Similar results are observed for the two-qutrit data in Fig. 9a and b. The AA performances in Figs. 8b and 9b suggest that RUSBCHA performs better than BCHA, specifically for the two-qubit dataset. Note in this respect that the prevalence difference of the two-qutrit dataset (0.3249) is comparatively low relative to that of the two-qubit dataset (0.8593). This suggests that further experiments testing both classifiers over varying prevalence differences might provide a clue as to how these classifiers behave on imbalanced datasets.

Fig. 8 Obtained overall accuracy and average accuracy for the two-qubit system over varying percentage (%) of training samples (m=2000)

Fig. 9 Obtained overall accuracy and average accuracy for the two-qutrit system over varying percentage (%) of training samples (m=20000)

5.3 Experiment 3

The above experiments were performed with the two-qubit and two-qutrit datasets of Table 2 and Table 3, respectively. From these tables, one can observe that the separable samples constitute only 7% and 33% of the total samples for the two-qubit and two-qutrit datasets, respectively. To test the performance of the classifiers under different prevalence differences, we created imbalanced datasets with varying prevalence differences for both systems.

Table 4 Description of imbalanced datasets created from the original two-qubit dataset of Table 2

Table 4 describes the imbalanced datasets created for the two-qubit system. Each row of the table describes a dataset that is a subset of the dataset described in Table 2, listing its numbers of separable, entangled, and total samples together with the prevalence difference of the respective dataset. The prevalence difference values range approximately from 0 to 0.9, where 0 represents a balanced dataset and 0.9 a highly imbalanced one. A similar interpretation holds for the two-qutrit datasets in Table 5.

Table 5 Description of imbalanced datasets created from the original two-qutrit dataset of Table 3

Figure 10 shows the classifier performances over the varying prevalence difference of the two-qubit data. The performances are averaged over 30 iterations; in each iteration, a new subset of the dataset is created with the prescribed prevalence difference (Table 4). For this experiment, we fixed m=2000 and used 50% training samples.
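One way to realize such subsets (our sketch; the sampling code is not given in the paper) is to under-sample the majority class until a target prevalence difference is reached:

```python
import numpy as np

def subsample_to_prevalence(X, y, target_pd, rng):
    """Subset with prevalence difference ~ target_pd, keeping all minority
    samples (label 1) and under-sampling the majority class (label 0).

    With minority fraction p, the prevalence difference is 1 - 2p,
    hence p = (1 - target_pd) / 2; assumes enough majority samples exist.
    """
    p = (1.0 - target_pd) / 2.0
    idx_min, idx_maj = np.where(y == 1)[0], np.where(y == 0)[0]
    n_maj = int(round(len(idx_min) * (1.0 - p) / p))
    keep = np.concatenate([idx_min, rng.choice(idx_maj, n_maj, replace=False)])
    return X[keep], y[keep]
```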

It is observed from Fig. 10a that the OA of BCHA and RUSBCHA are similar up to a prevalence difference of 0.6; beyond that, there is a minor improvement in OA for the BCHA approach in comparison with RUSBCHA. From Fig. 10b, it can be observed that the AA of both classifiers is similar up to a prevalence difference of 0.5; beyond that, the AA of BCHA declines sharply in comparison with RUSBCHA.

Fig. 10 Overall accuracy and average accuracy of BCHA and RUSBCHA over varying prevalence difference of the two-qubit data

Figure 11 shows the classifier performances over the varying prevalence difference of the two-qutrit data. The performances are again averaged over 30 iterations, with a new subset created in each iteration (Table 5). For this experiment, we fixed m=20000 and used 50% training samples.

It is observed from Fig. 11a that the OA of BCHA and RUSBCHA are similar up to a prevalence difference of 0.3; beyond that, there is a minor improvement in OA for the BCHA approach in comparison with RUSBCHA. From Fig. 11b, it can be observed that the AA of both classifiers is similar up to a prevalence difference of 0.25; beyond that, the AA of BCHA declines sharply in comparison with RUSBCHA.

Fig. 11 Overall accuracy and average accuracy of BCHA and RUSBCHA over varying prevalence difference of the two-qutrit data

From the results in Figs. 10 and 11, it can be observed that the performance of the proposed RUSBCHA approach is consistent (almost a straight line) over varying prevalence differences of the data. We can therefore conclude that the performance of RUSBCHA is not heavily affected by data imbalance.

Referring to our earlier observations, the good AA of the proposed RUSBCHA relative to BCHA in Fig. 6, and the similar performances of RUSBCHA and BCHA in Fig. 7, can now be justified using Fig. 10 and Fig. 11, respectively. Since the prevalence difference of the two-qubit data is 0.8593, our proposed RUSBCHA performs better than BCHA; since the prevalence difference of the two-qutrit data is only 0.3249, both BCHA and RUSBCHA perform similarly.

Hence, we conclude that RUSBCHA is an alternative to the BCHA approach and a better classifier for dealing with highly imbalanced datasets. Overall, ensemble learning is helpful for a better understanding of the separability-entanglement problem when compared to the stand-alone CHA approach.

6 Conclusion

The necessity of a separability-entanglement classifier is well known in the quantum information community. Although various criteria like PPT, necessary and sufficient only in low dimensions, have been proposed in the past, they cannot be generalized to higher dimensions. ML approaches are vastly exploited in the general data-mining perspective, while their discussion and application remain limited in quantum information processing. Similar to BCHA, we proposed RUSBCHA as an alternative ML-based solution for the quantum separability problem. The proposed RUSBCHA approach shows improvements in AE for the two-qubit system, while responding similarly to CHA for the two-qutrit system. As the data are highly unbalanced, standard performance measures such as OE, AE, F-measure, and G-mean are evaluated. The results suggest incorporating a proper ML approach, together with proper performance measures, for separability-entanglement classification. Moreover, the proposed RUSBCHA is an alternative to CHA that can deal with unbalanced datasets and may reduce the over-fitting error of the classifier.

To isolate the effectiveness of the classifier, feature extraction is left unexploited here; exploring it is a further direction of research to improve the classification performance. Other ML approaches can also be exploited and validated in future work.