Page 152 - Kỷ yếu hội thảo quốc tế: Ứng dụng công nghệ mới trong công trình xanh - lần thứ 9 (ATiGB 2024)
P. 152
th
HỘI THẢO QUỐC TẾ ATiGB LẦN THỨ CHÍN - The 9 ATiGB 2024 143
When the value of k is re-added and the new cluster
alike the antecedent cluster is formed with the error
value not changing notably, which is the found value of
k . The relation plot of S and k looks like an arm, so
“elbow” is the point of inflection on the curve and is
considered as the best clustering number of the data.
3.2. k-means algorithm
The k-means is an unsupervised learning
algorithm designed to segregate unlabeled data into a
certain number of distinct groups. In other words, k-
means classifies observed multivariate data into
several groups, without any supervision, on the basis
of similar features, common patterns or characteristics
hidden inside the datasets.
The working flow of the k-means algorithm that is
(1)
used to group n observations including x , x , ..., x (3)
(2)
into clusters is explained in the below steps [11]:
• Step 1: Choose the value of to decide the number
of clusters to be formed. Fig. 3. Flowchart of proposed method.
• Step 2: Randomly select points that will act as 4. NUMERICAL RESULTS
centroids. Means of clusters are symbolized
.
as, , ..., 4.1. Conducted method implementation
k
2
1
The data used to feed the proposed approach are
• Step 3: Assign each datapoint, based on their accumulated from the TWAMP server system. Data
Euclidean distance from the randomly centroids to the are integer values of the number of PL and percentage
closest centroids that will form the predefined of TWAMP setup success rate overtime. The amount
clusters. of Twamp messages lost during the feedback process
2
c ( ) i = ( ) i − argmin x ( ) k (2) to the Twamp server is indicated in the data column
k
2 labeled "Lost Twamp Msg". Furthermore, the data
• Step 4: Calculate a new centroid of each cluster. column identified as "TWAMP msg transm SR"
represents the successful establishment rate of
1 ( ) ( ) ( )
= x 1 k + x 2 k + + k m (3) TWAMP message transmission. Over 2,747 BSs, or
... x
k
m 2,747 observed samples, were gathered in a single day
to provide the data used in the optimal k-means
• Step 5: Repeat step number 3, which reassigns model. Fig. 4 displays a snapshot of a piece of data
each data point to the new nearest centroid of each utilized in this paper.
cluster.
The implementation of our recommended method
• Step 6: If any re-assignment occurs, go to step 4; is done in Python programming language with the
otherwise, go to step 7. most known up-to-date libraries for ML that are
• Step 7: The iteration concludes when the NumPy, Scikit-Learn, Pandas, Matplotlib and
algorithm converges, signifying that the cost function Seaborn.
has reached its minimum value.
(
argmin ( ) 1 ( ) n J c ( ) 1 ,..., ( ) n , c ,..., ) (4)
c ,...,c , 1 ,..., k 1 k
3.3. Flowchart of proposed method
The proposed method is principally divided into
two steps. The first step is to identify the optimal
value number of clusters by the Elbow method. The
second one is using k-means algorithm with optimal
to analyze the studied dataset in order to discover
concealed performance patterns of MBH. All of these
steps are summarized as Fig. 3 via pseudo-code.
ISBN: 978-604-80-9779-0