Page 152 - Kỷ yếu hội thảo quốc tế: Ứng dụng công nghệ mới trong công trình xanh - lần thứ 9 (ATiGB 2024)
P. 152

th
               HỘI THẢO QUỐC TẾ ATiGB LẦN THỨ CHÍN - The 9  ATiGB 2024                                 143

               When the value of k  is re-added and the new cluster
               alike  the  antecedent  cluster  is  formed  with  the  error
               value not changing notably, which is the found value of
                k . The relation plot of  S  and k  looks like an arm, so
               “elbow” is the point of inflection on the curve and is
               considered as the best clustering number of the data.
                  3.2. k-means algorithm
                  The  k-means  is  an  unsupervised  learning
               algorithm designed to segregate unlabeled data into a
               certain number of distinct groups. In other words, k-
               means  classifies  observed  multivariate  data  into
               several groups, without any supervision, on the basis
               of similar features, common patterns or characteristics
               hidden inside the datasets.
                  The working flow of the k-means algorithm that is
                                                (1)
               used to group n observations including x , x , ..., x (3)
                                                   (2)
               into clusters is explained in the below steps [11]:
                  • Step 1: Choose the value of to decide the number
               of clusters to be formed.                             Fig. 3. Flowchart of proposed method.
                  • Step 2:  Randomly select points that will act as   4. NUMERICAL RESULTS
               centroids.  Means  of  clusters  are  symbolized
                             .
               as,  ,   ...,                                  4.1. Conducted method implementation
                            k
                      2
                   1
                                                                 The data used to feed the proposed approach are
                  •  Step  3:  Assign  each  datapoint,  based  on  their   accumulated  from  the  TWAMP  server  system.  Data
               Euclidean distance from the randomly centroids to the   are integer values of the number of PL and percentage
               closest  centroids  that  will  form  the  predefined   of TWAMP setup success rate overtime. The amount
               clusters.                                      of Twamp messages lost during the feedback process
                                          2
                     c ( ) i  =    ( ) i  − argmin x   ( ) k                           (2)   to the Twamp server is indicated in the data column
                                k
                                          2                   labeled  "Lost  Twamp  Msg".  Furthermore,  the  data
                  • Step 4: Calculate a new centroid of each cluster.     column  identified  as  "TWAMP  msg  transm  SR"
                                                              represents  the  successful  establishment  rate  of
                           1   ( )   ( )      ( )
                      =      x   1 k  +  x  2 k  + +  k m               (3)   TWAMP  message  transmission.  Over  2,747  BSs,  or
                                         ... x
                       k
                          m                                 2,747 observed samples, were gathered in a single day
                                                              to  provide  the  data  used  in  the  optimal  k-means
                  • Step 5: Repeat step number 3, which reassigns   model. Fig. 4 displays a  snapshot of a piece of data
               each  data  point  to  the  new  nearest  centroid  of  each   utilized in this paper.
               cluster.
                                                                 The implementation of our recommended method
                  • Step 6: If any re-assignment occurs, go to step 4;   is  done  in  Python  programming  language  with  the
               otherwise, go to step 7.                       most  known  up-to-date  libraries  for  ML  that  are
                  •  Step  7:  The  iteration  concludes  when  the   NumPy,  Scikit-Learn,  Pandas,  Matplotlib  and
               algorithm converges, signifying that the cost function   Seaborn.
               has reached its minimum value.
                                    (
                  argmin  ( ) 1  ( ) n  J c ( ) 1  ,...,  ( ) n  , c  ,...,   )     (4)
                        c  ,...,c  , 1 ,...,  k  1  k
                              
                  3.3. Flowchart of proposed method
                  The  proposed  method  is  principally  divided  into
               two  steps.  The  first  step  is  to  identify  the  optimal
               value number of clusters by the Elbow method. The
               second one is using k-means algorithm with optimal
               to  analyze  the  studied  dataset  in  order  to  discover
               concealed performance patterns of MBH. All of these
               steps are summarized as Fig. 3 via pseudo-code.



                                                                                   ISBN: 978-604-80-9779-0
   147   148   149   150   151   152   153   154   155   156   157