Cluster HDBSCAN and GMM

Uploaded by

FITRI INDAH ANGGREANI

DETERMINATION OF MEMBERSHIP, CLUSTER AND CLUSTER CENTRE FOR M67 USING HDBSCAN, PLOTTING COLOR MAGNITUDE DIAGRAM AND FITTING ISOCHRONE

1. Introduction

Open clusters have long been regarded as powerful tools for studying the Galactic disk and the evolution of stars (Chen, 2003). Membership determination is the first step in studying an open cluster, and it directly influences the estimation of the cluster's physical parameters. Membership has been determined from proper motions, radial velocities, photometric data and combinations of these. Machine-learning methods applied to this problem include DBSCAN (Gao, 2014), the Gaussian Mixture Model (Gao, 2020), k-means (El Aziz et al., 2016), k-th nearest neighbor (Gao, 2016), ML-MOC (Agarwal et al., 2021) and many more.

2. Objectives

The aims of the workshop are:
1. to determine the center of the open cluster; and
2. to determine the membership probability.

3. Theoretical Background

Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) is a natural evolution of DBSCAN released in the past few years, almost 20 years after DBSCAN (Ester et al., 1996). DBSCAN identifies clusters as overdensities in a multidimensional space in which the number of sources within a neighborhood of a particular linking length ε exceeds the required minimum number of points (minPts).

Fig 1. Illustration of DBSCAN (source: medium.com/@agarwalvibhor84). Red: core points. Yellow: border points, still part of the cluster because they lie within ε of a core point, although they do not meet the min_points criterion themselves. Blue: noise points, not assigned to a cluster.

HDBSCAN works in a similar way, except that the user only needs to set a minimum cluster size. It does not depend on ε; instead it condenses the minimum spanning tree by pruning off the nodes that do not meet the minimum number of sources in a cluster, and reanalyzing the nodes that do (Kounkel & Covey, 2019). Not only does it determine the density threshold automatically, it also does so locally, which means that clusters of different densities can be recovered in different regions of a dataset.

4. Data

The data used are from Gaia Early Data Release 3 (Gaia eDR3, Gaia Collaboration 2016b; 2020a). The third early data release (eDR3, Gaia Collaboration et al. 2018) of the ESA Gaia space mission (Gaia Collaboration et al. 2016b) is by far the deepest and most precise astrometric catalogue ever obtained, with nominal proper motion uncertainties a hundred times smaller than those of UCAC4 and PPMXL.

We download sources from Gaia eDR3 in a cone around the cluster centre, with a radius greater than the tidal radius of the cluster. Although our algorithm is quite robust to the choice of this initial radius, we download sources within a radius of 180 arcmin from the cluster centre. Next, we select the sources that satisfy the following criteria (Agarwal et al. 2021):
1. Each source must have the five astrometric parameters (positions, proper motions, and parallax) as well as valid measurements in the three photometric passbands G, GBP, and GRP in the Gaia eDR3 catalogue.
2. Their parallax values must be non-negative.
3. To eliminate sources with high uncertainty while still retaining a fraction of sources down to G ~ 21 mag, the errors in their G-mag must be less than 0.005.

You can download NGC 752 data here.

5. Workshop Structure

The workshop is made up of two Jupyter Notebooks. The layout of the workshop is as follows:
1. Determine the center of the open cluster.
2. Determine the membership probability.
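The selection criteria of Section 4 can be illustrated on a few made-up rows before working with the real catalogue. The sketch below is only an illustration under stated assumptions: the dictionary keys mimic the Gaia archive column names, the values are invented, the helper `g_mag_error` is a hypothetical name, and the G-magnitude error is propagated from the flux error as (2.5 / ln 10) · σ_F / F. Criterion 1 is represented only by a finite parallax, and criterion 2 is applied as a strictly positive parallax.

```python
import numpy as np

# Made-up sample values; the keys mimic Gaia archive column names.
sources = {
    "parallax": np.array([1.15, -0.20, 0.90, np.nan, 1.10]),
    "phot_g_mean_flux": np.array([5.0e4, 3.0e4, 2.0e2, 4.0e4, 1.0e5]),
    "phot_g_mean_flux_error": np.array([20.0, 15.0, 5.0, 10.0, 30.0]),
}


def g_mag_error(flux, flux_error):
    """Propagate a flux error into a magnitude error: (2.5 / ln 10) * sigma_F / F."""
    return np.abs(2.5 / np.log(10.0) * flux_error / flux)


e_gmag = g_mag_error(sources["phot_g_mean_flux"],
                     sources["phot_g_mean_flux_error"])

# Criterion 1 (complete astrometry), reduced here to a finite parallax.
has_astrometry = np.isfinite(sources["parallax"])
# Criterion 2, applied as a strictly positive parallax.
positive_parallax = sources["parallax"] > 0
# Criterion 3: G-magnitude error below 0.005 mag.
precise_g = e_gmag < 0.005

keep = has_astrometry & positive_parallax & precise_g
print(keep)
```

Only the first and last rows survive: the second has a negative parallax, the third has too large a magnitude error, and the fourth has no parallax at all.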
Our membership assignment relies on the astrometric solution, and we only used the Gaia eDR3 photometry to manually confirm that the groups identified matched the expected aspect of a cluster in a color-magnitude diagram.

Part 1: Determine the Center of the Open Cluster

To determine the membership of open cluster NGC 752, we will use a Python module called hdbscan (McInnes et al. 2017). If this notebook is run on Google Colab, we first need to install some libraries.

Import the required packages

    !pip install hdbscan

    from __future__ import print_function  # no effect on Python 3; must come first
    import math
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import hdbscan
    from sklearn.preprocessing import StandardScaler
    from astropy.coordinates import SkyCoord
    import astropy.units as u
    from sklearn.mixture import GaussianMixture
    import arviz as az
    from patsy import dmatrix
    import statsmodels.formula.api as smf
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split
    import statsmodels.api as sm
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

Output: Requirement already satisfied (hdbscan and its dependencies cython, scikit-learn, joblib, scipy, numpy, six, threadpoolctl).

Set some plotting configurations

    SMALL_SIZE = 12
    MEDIUM_SIZE = 14
    BIGGER_SIZE = 20

    plt.rc('font', size=SMALL_SIZE)          # controls default text sizes
    plt.rc('axes', titlesize=SMALL_SIZE)     # fontsize of the axes title
    plt.rc('axes', labelsize=MEDIUM_SIZE)    # fontsize of the x and y labels
    plt.rc('xtick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
    plt.rc('ytick', labelsize=SMALL_SIZE)    # fontsize of the tick labels
    # plt.rc('legend', fontsize=10)          # legend fontsize

    %matplotlib inline

Data Preparation

Import data file

    FILENAME = "gaiaedr3_150_M67.csv"
    # the delimiter value is cut off in the source; a comma is assumed for this .csv file
    datafile = pd.read_csv(FILENAME, delimiter=",")
    datafile

    datafile.info()

Output:

    RangeIndex: 115670 entries, 0 to 115669
    Data columns (total 24 columns):
     #   Column                      Non-Null Count    Dtype
     0   source_id                   115670 non-null   int64
     1   ra                          115670 non-null   float64
     2   ra_error                    115670 non-null   float64
     3   dec                         115670 non-null   float64
     4   dec_error                   115670 non-null   float64
     5   parallax                    95584 non-null    float64
     6   parallax_error              95584 non-null    float64
     7   parallax_over_error         95584 non-null    float64
     8   pm                          95584 non-null    float64
     9   pmra                        95584 non-null    float64
     10  pmra_error                  95584 non-null    float64
     11  pmdec                       95584 non-null    float64
     12  pmdec_error                 95584 non-null    float64
     13  phot_g_mean_flux            115551 non-null   float64
     14  phot_g_mean_flux_error      115551 non-null   float64
     15  phot_g_mean_mag             115551 non-null   float64
     16  phot_bp_mean_flux           114016 non-null   float64
     17  phot_bp_mean_flux_error     114016 non-null   float64
     18  phot_bp_mean_mag            114016 non-null   float64
     19  phot_rp_mean_flux           114350 non-null   float64
     20  phot_rp_mean_flux_error     114350 non-null   float64
     21  phot_rp_mean_mag            114350 non-null   float64
     22  dr2_radial_velocity         1571 non-null     float64
     23  dr2_radial_velocity_error   1571 non-null     float64
    dtypes: float64(23), int64(1)
    memory usage: 21.2 MB

Handling the missing values

    datafile.dropna(subset=['pmra', 'pmdec', 'parallax']).reset_index()

Output (first columns only):

           index   source_id            ra          ra_error  dec        dec_error  parallax
    0      0       603712422876575360   135.110808  0.179466  10.726941  0.144282   0.286789
    1      1       603712427171528448   135.118378  0.223748  10.723134  0.181216   0.051501
    2      2       603712457236324224   135.121338  0.523702  10.738075  0.395677   -0.012734
    3      3       603712660315544576   135.130346  0.903004  10.744922  0.761642   0.781963
    4      4       603713041351907200   135.140797  0.336336  10.772432  0.245219   0.951278
    ...
    95579  115665  603622950118859520   135.003350  1.049273  10.782234  0.879750   0.584927
    95580  115666  603623018837269632   134.996200  0.473971  10.786565  0.302199   -0.134166
    95581  115667  603623018837270016   134.986138  0.489632  10.790086  0.348764   1.001567
    95582  115668  603623053197008512   134.998781  0.346450  10.791484  0.283039   0.518023

To eliminate sources with high uncertainty while still retaining a fraction of sources down to G ~ 21 mag, we keep only sources whose G-mag error is less than 0.005. Calculate the errors of G (|\sigma_G|), G_BP (|\sigma_{G_{BP}}|) and G_RP (|\sigma_{G_{RP}}|) from the flux errors:

    |\sigma_G| = \left| -\frac{2.5}{\ln 10} \frac{\sigma_{F_G}}{F_G} \right|
    |\sigma_{G_{BP}}| = \left| -\frac{2.5}{\ln 10} \frac{\sigma_{F_{BP}}}{F_{BP}} \right|
    |\sigma_{G_{RP}}| = \left| -\frac{2.5}{\ln 10} \frac{\sigma_{F_{RP}}}{F_{RP}} \right|

Add the columns e_Gmag, e_BPmag, e_RPmag and bp_rp (to plot the color-magnitude diagram easily), and recompute parallax_over_error:

    datafile['e_Gmag'] = abs(-2.5 * datafile['phot_g_mean_flux_error'] / math.log(10) / datafile['phot_g_mean_flux'])
    datafile['e_BPmag'] = abs(-2.5 * datafile['phot_bp_mean_flux_error'] / math.log(10) / datafile['phot_bp_mean_flux'])
    datafile['e_RPmag'] = abs(-2.5 * datafile['phot_rp_mean_flux_error'] / math.log(10) / datafile['phot_rp_mean_flux'])
    datafile['bp_rp'] = datafile['phot_bp_mean_mag'] - datafile['phot_rp_mean_mag']
    datafile['parallax_over_error'] = datafile['parallax'] / datafile['parallax_error']

Select data with a positive parallax (\varpi > 0) and a G-magnitude error |\sigma_G| < 0.005:

    pprocessdata = datafile[(datafile['parallax'] > 0) & (datafile['e_Gmag'] < 0.005)].reset_index(drop=True)
    pprocessdata

Output (first columns only):

           source_id            ra          ra_error  dec        dec_error  parallax
    0      603712422876575360   135.110808  0.179466  10.726941  0.144282   0.286789
    1      603712427171528448   135.118378  0.223748  10.723134  0.181216   0.051501
    2      603715588267458176   135.105567  0.031968  10.754417  0.023291   0.644799
    3      603715618332282624   135.119962  0.312882  10.770758  0.246001   1.489228
    4      603715622627196288   135.120392  0.016787  10.761850  0.012744   2.166628
    ...
    70360  603622847038577280   134.983284  0.238657  10.768340  0.175209   0.577788
    70361  603622881398315520   134.968256  0.477930  10.767039  0.351956   0.068398
    70362  603622881398315776   134.965021  0.359085  10.770651  0.256683   0.671293
    70363  603623018837270016   134.986138  0.489632  10.790086  0.348764   1.001567
    70364  603623053197008512   134.998781  0.346450  10.791484  0.283039   0.518023
    70365 rows x 28 columns

Select pmra, pmdec and parallax for clustering:

    df = pprocessdata[["pmra", "pmdec", "parallax"]]
    df = df.to_numpy().astype("float32", copy=False)

Visualization I

Spatial Distribution

    fig = plt.figure(figsize=(6, 6))
    plt.plot(pprocessdata['ra'], pprocessdata['dec'], ',')
    plt.xlabel(r'$\alpha$ (deg) Right Ascension')
    plt.ylabel(r'$\delta$ (deg) Declination')
    plt.title('Spatial Distribution of all stars')
    plt.show()

[Figure: Spatial Distribution of all stars]

Vector Point Diagram

    fig = plt.figure(figsize=(6, 6))
    plt.plot(pprocessdata['pmra'], pprocessdata['pmdec'], ',')
    plt.xlabel(r'$\mu_{\alpha*}$ (mas/yr)')
    plt.ylabel(r'$\mu_{\delta}$ (mas/yr)')
    plt.title('Plotting Proper Motion as Vector Point Diagram')
    plt.ylim(-50, 50)
    plt.xlim(-50, 50)
    plt.show()

[Figure: Plotting Proper Motion as Vector Point Diagram]

Color Magnitude Diagram

    fig = plt.figure(figsize=(6, 8))
    plt.plot(pprocessdata['bp_rp'], pprocessdata['phot_g_mean_mag'], ',')
    ax = plt.gca()
    ax.invert_yaxis()
    # plt.xlim(0., 3.)
    plt.title("Color Magnitude Diagram")
    plt.xlabel('bp - rp')
    plt.ylabel('g')
    plt.show()

[Figure: Color Magnitude Diagram]

Normalize the data and run HDBSCAN

    stscaler_df = StandardScaler().fit(df)
    df_ = stscaler_df.transform(df)

    clus_size = 2 * df_.shape[1]
    clusterer = hdbscan.HDBSCAN(min_cluster_size=clus_size)
    cluster_labels = clusterer.fit_predict(df_)
    pprocessdata['hdbscan'] = cluster_labels

Vector Point Diagram for every HDBSCAN cluster

    fig, ax = plt.subplots()  # figsize=(6, 6)
    plot = ax.scatter(pprocessdata['pmra'], pprocessdata['pmdec'], s=5, c=pprocessdata['hdbscan'])
    fig.colorbar(plot, ax=ax)
    ax = plt.gca()
    ax.invert_yaxis()
    plt.xlim(-50, 50)
    plt.ylim(-50, 50)
    plt.title('Vector Point Diagram for every HDBSCAN cluster')
    plt.xlabel(r'$\mu_{\alpha*}$ (mas/yr)')
    plt.ylabel(r'$\mu_{\delta}$ (mas/yr)')
    plt.show()

[Figure: Vector Point Diagram for every HDBSCAN cluster, coloured by cluster label]

Distribution of stars inside each cluster and the number of members from each clustering result.

    plt.figure(figsize=(6, 4))
    plt.hist(pprocessdata['hdbscan'])
    plt.xlabel('Label of Cluster')
    plt.ylabel('Number of Sources')
    plt.title('Distribution of stars in each HDBSCAN cluster')
    plt.show()
    plt.close()

    pprocessdata['hdbscan'].value_counts()

[Figure: Distribution of stars in each HDBSCAN cluster; x-axis: Label of Cluster, y-axis: Number of Sources]

Output:

    -1      (count illegible in the source)
    537     1422
    677     84
    1143    70
    176     53
    ...
    249     6
    491     6
    140     6
    569     6
    Name: hdbscan, Length: 1188, dtype: int64

Separate the clustered data from the background data, which HDBSCAN marks with the label -1.
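HDBSCAN assigns the label -1 to sources it treats as noise and non-negative integers to clusters. As a minimal sketch of this step, using made-up labels rather than the workshop output, one can drop the -1 entries, count the remaining labels, and pick the most populated one. The array `labels` and the variable names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical cluster labels, one per source; -1 marks noise/background.
labels = np.array([-1, 0, 2, 2, -1, 2, 1, 0, 2, -1])

# Discard the noise sources.
clustered = labels[labels >= 0]

# Count the members of each remaining cluster.
ids, counts = np.unique(clustered, return_counts=True)

# The most populated cluster is taken as the stellar cluster.
largest = ids[np.argmax(counts)]
member_mask = labels == largest

print(dict(zip(ids.tolist(), counts.tolist())))
print(int(largest), int(member_mask.sum()))
```

Taking the largest group is only reasonable under the assumption stated in the workshop: the field contains the background plus one dominant stellar cluster.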
    result_hdbscan = pprocessdata[pprocessdata['hdbscan'] >= 0].reset_index(drop=True)
    c = result_hdbscan['hdbscan'].value_counts()
    print(c)

Output:

    537     1422
    677     84
    1143    70
    176     53
    1080    51
    ...
    221     6
    259     6
    285     6
    293     6
    1124    6

Find the most populated cluster, assuming that the data consist only of the background and one stellar cluster.

    n_max = c.index[np.argmax(c)]
    result = result_hdbscan[result_hdbscan['hdbscan'] == n_max]
    result

Output (first columns only):

           source_id            ra          ra_error  dec        dec_error  parallax
    119    603785987076155392   134.065048  0.079601  10.469566  0.045647   1.005484
    372    603848521800034176   134.004906  0.020841  10.788134  0.011424   1.187326
    889    604003037543393920   134.634072  0.294349  11.514641  0.211293   1.021565
    970    604024585394575616   135.082808  0.582022  11.718627  0.392696   1.092838
    1159   604612549237529600   133.692016  0.015389  11.018895  0.007980   1.097347
    ...
    14532  597664700902078976   132.323039  0.021296  9.606395   0.012219   1.135177
    14777  597712426578737792   133.589252  0.013260  9.631878   0.006607   1.151925
    14825  597724757429410048   133.852332  0.481212  9.910795   0.264615   1.162087
    14900  597743311687984768   133.498198  0.143220  10.020892  0.084263   1.389898
    15297  597830722862488064   133.851594  0.097605  10.549505  0.047888   1.095713
    1422 rows x 29 columns

Visualization II (Result)

Spatial Distribution

    fig = plt.figure(figsize=(6, 6))
    ax = plt.subplot()
    plt.plot(pprocessdata['ra'], pprocessdata['dec'], '.', mec='silver', mfc='darkgray', label='All sources')
    plt.plot(result['ra'], result['dec'], 'o', mfc='tab:orange', markersize=2., label='HDBSCAN')
    plt.xlabel(r'$\alpha$ (deg)')
    plt.ylabel(r'$\delta$ (deg)')
    plt.legend()
    plt.show()

[Figure: spatial distribution of all sources and the HDBSCAN cluster]

Vector Point Diagram

    fig = plt.figure(figsize=(6, 6))
    plt.plot(pprocessdata['pmra'], pprocessdata['pmdec'], '.', mec='silver', mfc='darkgray', label='All sources')
    plt.plot(result['pmra'], result['pmdec'], 'o', mfc='tab:orange', mec='None', markersize=5., label='HDBSCAN')
    plt.xlabel(r'$\mu_{\alpha*}$ (mas/yr)')
    plt.ylabel(r'$\mu_{\delta}$ (mas/yr)')
    plt.xticks()
    plt.yticks()
    plt.xlim(-30, 30)
    plt.ylim(-14, 30)
    plt.legend()
    plt.show()

[Figure: vector point diagram of all sources and the HDBSCAN cluster]

Color Magnitude Diagram

    plt.figure(figsize=(6, 8))
    plt.plot(pprocessdata['bp_rp'], pprocessdata['phot_g_mean_mag'], '.', mec='silver', mfc='darkgray', label='All sources')
    plt.plot(result['bp_rp'], result['phot_g_mean_mag'], 'o', color='tab:orange', markersize=2., label='HDBSCAN')
    plt.xlabel(r'$G_{BP}-G_{RP}$')
    plt.ylabel(r'$G$ (mag)')
    plt.xlim(0., 3.)
    plt.gca().invert_yaxis()
    plt.legend()
    plt.show()

[Figure: color magnitude diagram; legend: All sources, HDBSCAN]

Parallax Distribution

    bins_all = np.arange(pprocessdata['parallax'].min(), pprocessdata['parallax'].max(), .01)
    bins_sam = np.arange(result['parallax'].min(), result['parallax'].max(), .01)

    plt.figure(figsize=(6, 4))
    pprocessdata.parallax.hist(bins=bins_all, color='gray', label='All Sources')
    result.parallax.hist(bins=bins_sam, color='orange', label='HDBSCAN')
    plt.xlabel(r'$\omega$ (mas)')
    plt.ylabel('Number of Sources')
    plt.xlim(0, 5)
    plt.xticks()
    plt.yticks()
    plt.legend()
    plt.show()

[Figure: parallax distribution; x-axis: ω (mas), y-axis: Number of Sources]

1. Determine the center of the stellar cluster

    ra_c = np.mean(result['ra'])
    dec_c = np.mean(result['dec'])
    pmra_c = np.mean(result['pmra'])
    pmdec_c = np.mean(result['pmdec'])
    parallax_mean = np.mean(result['parallax'])
    distance = 1000 / parallax_mean   # distance in pc for a parallax in mas
    print(ra_c, dec_c, pmra_c, pmdec_c, parallax_mean, distance)

    fig = plt.figure(figsize=(6, 6))
    ax = plt.subplot()
    plt.plot(pprocessdata['ra'], pprocessdata['dec'], '.', mec='silver', mfc='darkgray', label='All sources')
    plt.plot(result['ra'], result['dec'], 'o', mfc='tab:orange', markersize=2., label='HDBSCAN')
    plt.plot(ra_c, dec_c, 'o', markersize=5, c='green', label='centre of cluster')
    plt.xlabel(r'$\alpha$ (deg)')
    plt.ylabel(r'$\delta$ (deg)')
    plt.legend()
    plt.show()

Output (cut off in the source after the parallax):

    132.85333382840096 11.833583454686082 -10.960577721346255 -2.905743785149157 1.15488663...

    fig = plt.figure(figsize=(6, 6))
    plt.plot(pprocessdata['pmra'], pprocessdata['pmdec'], '.', mec='silver', mfc='darkgray', label='All sources')
    plt.plot(result['pmra'], result['pmdec'], 'o', mfc='tab:orange', mec='None', markersize=5., label='HDBSCAN')
    plt.plot(pmra_c, pmdec_c, 'o', markersize=5, c='green', label='centre of cluster')
    plt.xlabel(r'$\mu_{\alpha*}$ (mas/yr)')
    plt.ylabel(r'$\mu_{\delta}$ (mas/yr)')
    plt.xticks()
    plt.yticks()
    plt.xlim(-15,)
    plt.ylim(-5, 0)
    plt.legend()
    plt.show()

    len(result)

Output:

    1422

Selecting some parameters to be calculated for all stars

    allsource = pprocessdata[['ra', 'ra_error', 'dec', 'dec_error',
                              'parallax', 'parallax_error',
                              'pmra', 'pmra_error',
                              'pmdec', 'pmdec_error',
                              'phot_g_mean_mag', 'bp_rp']]
    allsource.head()

Output (first columns only):

       ra          ra_error  dec        dec_error  parallax  parallax_error  pmra
    0  135.110808  0.179466  10.726941  0.144282   0.286789  0.189191        -0.675509
    1  135.118378  0.223748  10.723134  0.181216   0.051501  0.257076        -6.506968
    2  135.105567  0.031968  10.754417  0.023291   0.644799  0.041918        -0.661945

Sample Sources Selection

To select the sample sources, we cut all sources to a range of proper motions and parallax chosen so that the mean of the enclosed values is close to the HDBSCAN mean proper motions (\mu_{\alpha*}, \mu_{\delta}) and mean parallax (\varpi).

    HDBSCAN_MEAN_PMRA = pmra_c
    HDBSCAN_MEAN_PMDEC = pmdec_c
    HDBSCAN_MEAN_PARALLAX = parallax_mean

    PMRA_RANGE = 3.
    PMDEC_RANGE = 3.
    PARALLAX_RANGE = 0.4

    # keep sources inside a box centred on the HDBSCAN means
    samplesource = allsource[
        (allsource['pmra'] >= HDBSCAN_MEAN_PMRA - (PMRA_RANGE / 2.)) & (allsource['pmra'] <= HDBSCAN_MEAN_PMRA + (PMRA_RANGE / 2.)) &
        (allsource['pmdec'] >= HDBSCAN_MEAN_PMDEC - (PMDEC_RANGE / 2.)) & (allsource['pmdec'] <= HDBSCAN_MEAN_PMDEC + (PMDEC_RANGE / 2.)) &
        (allsource['parallax'] >= HDBSCAN_MEAN_PARALLAX - (PARALLAX_RANGE / 2.)) & (allsource['parallax'] <= HDBSCAN_MEAN_PARALLAX + (PARALLAX_RANGE / 2.))
    ].reset_index(drop=True)

Vector Point Diagram

    fig = plt.figure(figsize=(6, 6))
    plt.plot(allsource['pmra'], allsource['pmdec'], '.', color='gray', markersize=2., label='All Sources')
    plt.plot(samplesource['pmra'], samplesource['pmdec'], '.', color='blue', markersize=2., label='Sample Sources')
    plt.xlabel(r"$\mu_{\alpha*}$ (mas/yr)")
    plt.ylabel(r"$\mu_{\delta}$ (mas/yr)")
    plt.title("Vector Point Diagram")
    plt.xticks()
    plt.yticks()
    plt.xlim(-25, 25)
    plt.ylim(-25, 25)
    plt.legend()
    plt.show()

[Figure: Vector Point Diagram; legend: All Sources, Sample Sources]

Parallax Distribution

    bins_all = np.arange(allsource['parallax'].min(), allsource['parallax'].max(), .01)
    bins_sam = np.arange(samplesource['parallax'].min(), samplesource['parallax'].max(), .01)

    plt.figure(figsize=(6, 4))
    allsource['parallax'].hist(bins=bins_all, color='gray', label='All Sources')
    samplesource['parallax'].hist(bins=bins_sam, color='b', label='Sample Sources')
    plt.xlabel(r"$\omega$ (mas)")
    plt.ylabel("Number of Sources")
    plt.xlim([0, 5])
    plt.xticks()
    plt.yticks()
    plt.legend()
    plt.show()

[Figure: parallax distribution; legend: All Sources, Sample Sources]

Color Magnitude Diagram

    plt.figure(figsize=(6, 8))
    plt.plot(allsource['bp_rp'], allsource['phot_g_mean_mag'], '.', color='gray', label='All Sources')
    plt.plot(samplesource['bp_rp'], samplesource['phot_g_mean_mag'], '.', color='blue', label='Sample Sources')
    plt.xlabel(r"$G_{BP}-G_{RP}$")
    plt.ylabel(r"$G$ (mag)")
    plt.xlim([0., 3.5])
    plt.ylim(8, 20)
    plt.gca().invert_yaxis()
    plt.legend()
    plt.show()

    print('All Sources = %d \nSample Sources = %d' % (len(allsource), len(samplesource)))

Output:

    All Sources = 70365
    Sample Sources = 1714

Normalize the data

    df = samplesource[["pmra", "pmdec", "parallax"]]
    df = df.to_numpy().astype("float32", copy=False)

    stscaler_df = StandardScaler().fit(df)
    df_ = stscaler_df.transform(df)

    norm_pmra = df_[:, 0]
    norm_pmde = df_[:, 1]
    norm_para = df_[:, 2]

Select some parameters to be calculated

    sample_data_dict = {
        'norm_pmra': norm_pmra,
        'norm_pmde': norm_pmde,
        'norm_para': norm_para,
    }
    sample_data = pd.DataFrame(sample_data_dict)

Train a Gaussian Mixture Model (GMM) on the whole sample with two Gaussian components (field and cluster)

    gmm = GaussianMixture(n_components=2, max_iter=1000, covariance_type='full', random_state=None).fit(sample_data)

Calculate the means, covariances and weights of the fitted model

    gmm.means_, gmm.covariances_, gmm.weights_

Output (entries that are illegible in the source are shown as ...; the value at row 2, column 3 of the first covariance matrix is taken from its symmetric counterpart):

    (array([[ 0.04856147,  0.03984163,  0.00229148],
            [-0.01871317, -0.01535297, -0.00088304]]),
     array([[[ 2.99284569,  0.12718294, -0.09479567],
             [ 0.12718294,  3.024422..., -0.16097453],
             [-0.09479567, -0.16097453,  1.76752092]],
            [[ 0.23079895,  0.01806545,  0.02431149],
             [ 0.01806545,  0.21904246,  ...],
             [ 0.02431149,  ...,         0.70423423]]]),
     array([0.27816086, 0.72183914]))

Calculate the probabilities for the whole sample

    pred_data = gmm.predict_proba(sample_data)
    pred_data

Output:

    array([[8.51605686e-02, 9.14839439e-01],
           [1.00000000e+00, 1.22427721e-18],
           [1.00000000e+00, 1.06197402e-13],
           ...,
           [1.00000000e+00, 2.55563579e-10],
           [9.91619509e-01, 8.38049093e-03],
           [1.00000000e+00, 1.22142457e-12]])

Check the calculated probabilities

    plt.hist(pred_data[:, 0], bins=[0., .1, .2, .3, .4, .5, .6, .7, .8, .9, 1.])
    plt.xlim([0., 1.])
    plt.xlabel("Probability for mu_alpha (mas/yr)")
    plt.ylabel("Number of sources")
    plt.show()

[Figure: histogram of the first-component probabilities; x-axis: Probability for mu_alpha (mas/yr), y-axis: Number of sources]

    plt.hist(pred_data[:, 1], bins=[0., .1, .2, .3, .4, .5, .6, .7, .8, .9, 1.])
    plt.xlim([0., 1.])
    plt.xlabel(r"Probability for $\mu_{\delta}$ (mas/yr)")
    plt.ylabel("Number of sources")
    plt.show()

[Figure: histogram of the second-component probabilities; y-axis: Number of sources]

The Probabilities

    samplesource['prob'] = pred_data[:, 0]
    print(samplesource['prob'])

Output in the source notebook:

    NameError: name 'pred_data' is not defined

This error means that pred_data did not exist in the running session, so the predict_proba cell above must be run before this one.

Determine the probability member classes. According to Agarwal et al. (2021), there are three main classes:
- member_high: high probability members (P(x) > 0.6);
- member_moder: moderate probability members (0.2 < P(x) < 0.6); and
- member_low: low probability members (P(x) < 0.2).

There is also one additional class:
- member_ultra: ultra-high probability members (P(x) > 0.8).

    member_ultra = samplesource[samplesource['prob'] >= .8].reset_index(drop=True)
    member_high = samplesource[samplesource['prob'] >= .6].reset_index(drop=True)
    member_moder = samplesource[(samplesource['prob'] > .2) & (samplesource['prob'] < .6)].reset_index(drop=True)
    member_low = samplesource[samplesource['prob'] <= .2].reset_index(drop=True)
    print(member_ultra)

Stars with high probability values are automatically considered members of the cluster. Stars with moderate probability values are considered cluster members (member_incl) if their parallax lies within the parallax range of the ultra-high probability members.

    member_incl = member_moder[(member_moder['parallax'] >= member_ultra['parallax'].min()) &
                               (member_moder['parallax'] <= member_ultra['parallax'].max())].reset_index(drop=True)
    # the moderate-probability part of this message is cut off in the source
    print('Sample Sources = %d \nHigh probability member sources (p >= 0.6) = %d' % (len(samplesource), len(member_high)))

Combine member_high and member_incl to get all members.

    member_all = pd.concat([member_high, member_incl]).sort_values(by=['prob'], ascending=False)
    len(member_all)

Calculate some important parameters

    mean_para_val = np.mean(member_all['parallax'])
    mean_para_std = np.std(member_all['parallax'])
    member_dist = 1000. / (member_all['parallax'])
    mean_pmra_val = np.mean(member_all['pmra'])
    mean_pmra_std = np.std(member_all['pmra'])
    mean_pmde_val = np.mean(member_all['pmdec'])
    mean_pmde_std = np.std(member_all['pmdec'])
    mean_dist_val = np.mean(member_dist)
    mean_dist_std = np.std(member_dist)

    mean_pmra_val, mean_pmra_std, mean_pmde_val, mean_pmde_std, mean_para_val, mean_para_std, mean_dist_val, mean_dist_std

Visualization II (Result)

Probability Distribution

    # the step and the upper bound of the moderate class are cut off in the source;
    # .1 and < .6 are assumed from the definitions above
    high_mask = samplesource['prob'] >= .6
    mode_mask = ((samplesource['prob'] >= .2) & (samplesource['prob'] < .6) &
                 (samplesource['parallax'] >= member_ultra['parallax'].min()) &
                 (samplesource['parallax'] <= member_ultra['parallax'].max()))
    bins_samp = np.arange(samplesource['prob'].min(), samplesource['prob'].max(), .1)
    bins_high = np.arange(samplesource['prob'][high_mask].min(), samplesource['prob'][high_mask].max(), .1)
    bins_mode = np.arange(samplesource['prob'][mode_mask].min(), samplesource['prob'][mode_mask].max(), .1)

    plt.figure(figsize=(6, 4))
    plt.hist(samplesource['prob'], bins=[0., .1, .2, .3, .4, .5, .6, .7, .8, .9, 1.], color='darkgray', label='Sample Sources')
    plt.hist(member_high['prob'], bins=[.6, .7, .8, .9, 1.], color='tab:orange', rwidth=.975, label='High Probability Members')
    plt.hist(member_incl['prob'], bins=[.2, .3, .4, .5, .6], color='tab:green', rwidth=.975, label='Moderate Probability Members')
    plt.xlabel("Probability")
    plt.ylabel("Number of Sources")
    plt.xlim([0., 1.])
    plt.xticks()
    plt.yticks()
    plt.legend()
    plt.show()

[Figure: probability distribution; legend: Sample Sources, High Probability Members, Moderate Probability Members; x-axis: Probability, y-axis: Number of Sources]

Vector Point Diagram

    fig = plt.figure(figsize=(6, 6))
    plt.plot(samplesource['pmra'], samplesource['pmdec'], 'o', mec='silver', mfc='darkgray', label='Sample Sources')
    plt.plot(member_high['pmra'], member_high['pmdec'], 'o', mfc='tab:orange', mec='None', label='High probability (p >= 0.6)')
    plt.plot(member_incl['pmra'], member_incl['pmdec'], 'o', mfc='tab:green', mec='None', label='Moderate probability (0.2 <= p <= 0.6)')
    plt.xlabel(r"$\mu_{\alpha*}$ (mas/yr)")
    plt.ylabel(r"$\mu_{\delta}$ (mas/yr)")
    plt.xticks()
    plt.yticks()
    plt.title("Vector Point Diagram")
    plt.legend()
    plt.show()

[Figure: Vector Point Diagram; legend: Sample Sources, High probability (p >= 0.6), Moderate probability (0.2 <= p <= 0.6)]

Parallax and proper motions distribution

    bins_samp = np.arange(samplesource['parallax'].min(), samplesource['parallax'].max(), .05)
    bins_high = np.arange(member_high['parallax'].min(), member_high['parallax'].max(), .05)
    bins_mode = np.arange(member_incl['parallax'].min(), member_incl['parallax'].max(), .05)

    plt.figure()  # figsize=(6, 4)
    samplesource['parallax'].hist(bins=bins_samp, color='silver', rwidth=.85, label="Sample Sources")
    member_high['parallax'].hist(bins=bins_high, color='tab:orange', rwidth=.85, label="High probability (p >= 0.6)")
    member_incl['parallax'].hist(bins=bins_mode, color='tab:green', label="Moderate probability (0.2 <= p <= 0.6)")
    plt.xlabel(r"$\omega$ (mas)")
    plt.ylabel("Number of Sources")
    plt.xticks()
    plt.yticks()
    plt.legend()
    plt.show()

[Figure: parallax distribution of the sample sources and the high and moderate probability members; x-axis: ω (mas), y-axis: Number of Sources]

Spatial distribution

    fig = plt.figure(figsize=(6, 6))
    plt.plot(samplesource['ra'], samplesource['dec'], 'o', mec='silver', mfc='darkgray', label='Sample Sources')
    plt.plot(member_high['ra'], member_high['dec'], 'o', mfc='tab:orange', label='High probability (p >= 0.6)')
    plt.plot(member_incl['ra'], member_incl['dec'], 'o', mfc='tab:green', label='Moderate probability (0.2 <= p <= 0.6)')
    plt.xlabel(r'$\alpha$ (deg)')
    plt.ylabel(r'$\delta$ (deg)')
    plt.legend()
    # ax.set_xticklabels([358.25, 358.5, 358.75, 359.0, 359.25, 359.5, 359.75, 0.00, 0.25], fontsi...
    plt.show()

[Figure: spatial distribution; legend: Sample sources, High probability (p >= 0.6), Moderate probability (0.2 <= p <= 0.6)]

Color Magnitude Diagram

    plt.figure(figsize=(6, 8))
    plt.plot(samplesource['bp_rp'], samplesource['phot_g_mean_mag'], 'o', mec='silver', mfc='darkgray', label='Sample Sources')
    plt.plot(member_high['bp_rp'], member_high['phot_g_mean_mag'], 'o', color='tab:orange', label='High probability (p >= 0.6)')
    plt.plot(member_incl['bp_rp'], member_incl['phot_g_mean_mag'], 'o', color='tab:green', label='Moderate probability (0.2 <= p <= 0.6)')
    plt.xlabel(r"$G_{BP}-G_{RP}$")
    plt.ylabel(r"$G$ (mag)")
    plt.xlim([0., 3.])
    plt.gca().invert_yaxis()
    plt.legend()
    plt.show()

[Figure: color magnitude diagram of the sample sources and the high and moderate probability members]

    plt.figure(figsize=(6, 8))
    plt.plot(member_all['bp_rp'], member_all['phot_g_mean_mag'], 'o', color='tab:blue', label='All members')
    plt.xlabel(r"$G_{BP}-G_{RP}$")
    plt.ylabel(r"$G$ (mag)")
    plt.xlim([0., 3.])
    plt.ylim([10, 20])
    plt.gca().invert_yaxis()
    plt.legend()
    plt.show()

[Figure: color magnitude diagram of all members]

Convert the apparent magnitudes of the members redder than bp_rp = 0.5 to absolute magnitudes, using the cluster distance from Part 1 (distance = 1000 / parallax_mean, in pc):

    y = member_all['phot_g_mean_mag']
    x = member_all['bp_rp']
    y = y[x > 0.5]
    xa = x[x > 0.5]
    ya = y + 5 - 5 * np.log10(distance)   # distance modulus
    print(len(x), len(y))

    ax = az.plot_kde(ya, rug=True)
    plt.show()
    plt.close()

    ax = az.plot_kde(xa, rug=True)
    plt.show()
    plt.close()

    ax = az.plot_kde(xa, values2=ya, contour=False, pcolormesh_kwargs={"cmap": "inferno"})
    ax.invert_yaxis()
    plt.show()
    plt.close()

Fit B-splines of different degrees to the color-magnitude sequence and compare the training and test errors:

    # test_size is illegible in the source; scikit-learn's default split is used here
    X_train, X_test, y_train, y_test = train_test_split(xa, ya)

    rmse_list = []
    r2_list = []
    for i in range(7, 17):
        for j in range(17):
            knots = 4
            degree = j
            # try different knots and degree values
            spec = ', df = ' + str(knots) + ', degree = ' + str(degree) + ', include_intercept=False)'
            try:
                # the data arguments of the dmatrix calls are cut off in the source;
                # the dictionaries below are assumed
                X_spline = dmatrix('bs(x' + spec, {'x': X_train}, return_type='dataframe')
                spline_fit = sm.GLM(y_train, X_spline).fit()

                y_pred_train = spline_fit.predict(dmatrix('bs(test' + spec, {'test': X_train}, return_type='dataframe'))
                rmse_train = np.sqrt(mean_squared_error(y_train, y_pred_train))
                print("root mean square error for training set ", rmse_train)
                print("r2 score for training set ", r2_score(y_train, y_pred_train))

                y_pred = spline_fit.predict(dmatrix('bs(test' + spec, {'test': X_test}, return_type='dataframe'))
                rmse_test = np.sqrt(mean_squared_error(y_test, y_pred))
                print("root mean square error for test set ", rmse_test)
                print("r2 score for test set ", r2_score(y_test, y_pred))

                rmse_list.append([rmse_train, rmse_test])
                r2_list.append([r2_score(y_train, y_pred_train), r2_score(y_test, y_pred)])

                range_pred = np.linspace(np.min(X_train), np.max(X_train), 50)
                prediction = spline_fit.predict(dmatrix('bs(xp' + spec, {'xp': range_pred}, return_type='dataframe'))

                plt.figure(figsize=(7, 7))
                plt.plot(range_pred, prediction, color='r', label='Specifying degree = ' + str(degree))
                plt.scatter(xa, ya, color='blue', alpha=0.3, edgecolor='k')
                plt.xlabel('Color')
                plt.ylabel('G')
                plt.legend()
                # plt.scatter(member_all['bp_rp'].tolist(), member_all['phot_g_mean_mag'].tolist(), face...
                ax = plt.gca()
                ax.invert_yaxis()
                plt.show()
                plt.close()
            except Exception:
                print("fail")

    print(rmse_list)
    print(r2_list)
    rmse_list = np.array(rmse_list)
    r2_list = np.array(r2_list)
    # print(np.max(range_pred), np.min(range_pred))
    print(min(rmse_list[:, 1]))

Output (first two fits shown in the source):

    root mean square error for training set  0.1359167441682956
    r2 score for training set  0.9470239280007002
    root mean square error for test set  0.17200044938890915
    r2 score for test set  0.9199368439062416

[Figure: spline fit to the color-magnitude sequence; legend: Specifying degree = 1 with 8 knots; x-axis: Color]

    root mean square error for training set  0.13177983847404448
    r2 score for training set  0.9501997216524329
    root mean square error for test set  0.1597061776926086
    r2 score for test set  0.9309733217975995

[Figure: spline fit to the color-magnitude sequence; legend: Specifying degree = 2 with 8 knots; x-axis: Color]
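The color-magnitude fit above works with absolute magnitudes, which follow from the apparent G magnitude and the cluster distance. A minimal sketch of that conversion is given here, assuming a parallax in mas, a simple inversion of the parallax (d = 1000/ϖ pc), and no extinction correction, so that M_G = G + 5 − 5 log10(d). The function names are hypothetical and the input values are made up.

```python
import numpy as np


def parallax_to_distance_pc(parallax_mas):
    """Distance in parsec from a parallax in milliarcseconds (simple inversion)."""
    return 1000.0 / np.asarray(parallax_mas, dtype=float)


def absolute_magnitude(apparent_mag, distance_pc):
    """Distance modulus without extinction: M = m + 5 - 5 log10(d / pc)."""
    return np.asarray(apparent_mag, dtype=float) + 5.0 - 5.0 * np.log10(distance_pc)


# Made-up example: a cluster with a mean parallax of 10 mas lies at 100 pc.
distance = parallax_to_distance_pc(10.0)
g_abs = absolute_magnitude(np.array([12.0, 15.5]), distance)
print(distance, g_abs)
```

At 100 pc the distance modulus is exactly 5 mag, so the two stars end up at absolute magnitudes 7.0 and 10.5. Inverting a mean parallax ignores the parallax uncertainty, which is acceptable only as a first estimate.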

224 pages
Brescia 2015 Automated Physical Classification in SDSS DR10
No ratings yet
Brescia 2015 Automated Physical Classification in SDSS DR10
13 pages
Stellar Classification via ML Models
No ratings yet
Stellar Classification via ML Models
6 pages
Dark Matter Halo Parameters From Overheated Explonet
No ratings yet
Dark Matter Halo Parameters From Overheated Explonet
14 pages
Surveying The Reach and Maturity of Machine Learning and Artificial Intelligence in Astronomy
No ratings yet
Surveying The Reach and Maturity of Machine Learning and Artificial Intelligence in Astronomy
40 pages
Tidal Tails and Their Dynamics in Open Clusters Using Gaia DR3
No ratings yet
Tidal Tails and Their Dynamics in Open Clusters Using Gaia DR3
4 pages
Page 4
No ratings yet
Page 4
1 page
A COMPASS To Model Comparison and Simulation-Based Inference in Galactic Chemical Evolution
No ratings yet
A COMPASS To Model Comparison and Simulation-Based Inference in Galactic Chemical Evolution
18 pages
Page 5
No ratings yet
Page 5
2 pages
AstroMethods Assignement1 2024
No ratings yet
AstroMethods Assignement1 2024
2 pages
3 Comparisson
No ratings yet
3 Comparisson
14 pages
Galaxies in Xray Selected Clusters and Groups in Dark Energy Survey Data II. Hierarchical Bayesian Modelling of The Red Sequence Galaxy Luminosity Function
No ratings yet
Galaxies in Xray Selected Clusters and Groups in Dark Energy Survey Data II. Hierarchical Bayesian Modelling of The Red Sequence Galaxy Luminosity Function
17 pages
Machine Learning PDF
No ratings yet
Machine Learning PDF
27 pages
Michelle Lochner, Jason D. Mcewen, Hiranya V. Peiris, Ofer Lahav, Max K. Winter
No ratings yet
Michelle Lochner, Jason D. Mcewen, Hiranya V. Peiris, Ofer Lahav, Max K. Winter
15 pages
20bec100 MV Sa Report..
No ratings yet
20bec100 MV Sa Report..
4 pages
Density-Based Clustering Guide
No ratings yet
Density-Based Clustering Guide
21 pages
ASTR 404 - Worksheet 5 - Jason Seo
No ratings yet
ASTR 404 - Worksheet 5 - Jason Seo
20 pages
The Dawes Review 10 The Impact of Deep Learning For The Analysis of Galaxy Surveys - 2023 - Cambridge University Press
No ratings yet
The Dawes Review 10 The Impact of Deep Learning For The Analysis of Galaxy Surveys - 2023 - Cambridge University Press
53 pages
Lecture 6
No ratings yet
Lecture 6
55 pages
PosterSAB2018 PHBarchi
No ratings yet
PosterSAB2018 PHBarchi
1 page
Interstellar Medium Clustering SEO
No ratings yet
Interstellar Medium Clustering SEO
27 pages
Kreidberg 2015 PASP 127 1161
No ratings yet
Kreidberg 2015 PASP 127 1161
5 pages
BSC Thesis Extract
No ratings yet
BSC Thesis Extract
32 pages
2024 Summer Projects Descriptions MPIA
No ratings yet
2024 Summer Projects Descriptions MPIA
14 pages
Sef Stars
No ratings yet
Sef Stars
50 pages
Healpy Skymap Analysis in Python
No ratings yet
Healpy Skymap Analysis in Python
9 pages
Angular Correlation Functions of Bright Lyman-Break Galaxies at 3 Z 5
No ratings yet
Angular Correlation Functions of Bright Lyman-Break Galaxies at 3 Z 5
18 pages
Classification of Stars, Galaxies and Quasars
No ratings yet
Classification of Stars, Galaxies and Quasars
8 pages
Astrostatistics: 09 Mar 2020
No ratings yet
Astrostatistics: 09 Mar 2020
21 pages
V5 N1 3 - Lima Hetem Compactado
No ratings yet
V5 N1 3 - Lima Hetem Compactado
16 pages
Gal 2003
No ratings yet
Gal 2003
21 pages
Benítez 2000 ApJ 536 571
No ratings yet
Benítez 2000 ApJ 536 571
13 pages
Time-Domain Astronomy: ML & Data Mining
No ratings yet
Time-Domain Astronomy: ML & Data Mining
37 pages
Ulisse:: Determination of Star-Formation Rate and Stellar Mass Based On The One-Shot Galaxy Imaging Technique
No ratings yet
Ulisse:: Determination of Star-Formation Rate and Stellar Mass Based On The One-Shot Galaxy Imaging Technique
22 pages
Es2018 2
No ratings yet
Es2018 2
7 pages
Research Paper
No ratings yet
Research Paper
29 pages
4.6 Dbscan
No ratings yet
4.6 Dbscan
27 pages
Deepshadows: Separating Low Surface Brightness Galaxies From Artifacts Using Deep Learning
No ratings yet
Deepshadows: Separating Low Surface Brightness Galaxies From Artifacts Using Deep Learning
22 pages

Cluster Hdbscan Dan GMM

Uploaded by

Cluster Hdbscan Dan GMM

Uploaded by

You might also like