Visitor clustering uses customer characteristics to categorize visitors dynamically and to generate cluster sets from selected data inputs. The resulting clusters identify groups of visitors with similar interests and behaviors, which you can use for customer analysis and targeting.

The clustering process requires you to identify the metrics and dimension elements to use as inputs, and lets you choose a specific target population to which those inputs are applied when creating the specified clusters. When you run the clustering process, the system uses the metric and dimension inputs to determine appropriate initial centers for the specified number of clusters. These centers are then used as the starting point for the K-Means algorithm.

1. The initial centers are chosen with a Canopy Clustering pass.
2. Data clusters are created by assigning every data point to its nearest center.
3. The mean of each of the K clusters becomes the new center of that cluster.
4. Steps 2 and 3 are repeated until convergence is reached. This can take multiple passes.
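The Canopy Clustering pass in step 1 can be illustrated with the minimal Python sketch below. It is not the product's implementation: the loose and tight thresholds `t1` and `t2`, the deterministic choice of each canopy seed, and the rule of keeping the k most populated canopies (centered on their means) are all assumptions made for illustration, and `canopy_centers` is a hypothetical name.

```python
import math
from typing import List, Sequence


def canopy_centers(points: Sequence[Sequence[float]], k: int,
                   t1: float, t2: float) -> List[List[float]]:
    """Pick up to k initial centers with one canopy-clustering pass.

    Hypothetical thresholds: t1 (loose) must be greater than t2 (tight) > 0.
    """
    if not t1 > t2 > 0:
        raise ValueError("thresholds must satisfy t1 > t2 > 0")
    remaining = list(range(len(points)))   # indices still allowed to seed a canopy
    canopies = []                          # each canopy is a list of member indices
    while remaining:
        seed = remaining[0]                # deterministic here; often random in practice
        # every point within the loose threshold joins this canopy (canopies may overlap)
        members = [i for i in range(len(points))
                   if math.dist(points[i], points[seed]) < t1]
        canopies.append(members)
        # points within the tight threshold (including the seed) can no longer seed a canopy
        remaining = [i for i in remaining
                     if math.dist(points[i], points[seed]) >= t2]
    # assumption: keep the k most populated canopies and use their means as centers
    canopies.sort(key=len, reverse=True)
    dim = len(points[0])
    return [[sum(points[i][d] for i in members) / len(members) for d in range(dim)]
            for members in canopies[:k]]
```

Because canopy membership only needs cheap distance checks against each seed, a single pass gives centers that already sit near dense regions, which is why it is a reasonable starting point for the K-Means iterations.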

The **Maximum Iterations** option in the **Options** menu lets the analyst specify the maximum number of iterations the clustering algorithm performs. Setting this cap can make the clustering process finish sooner, at the expense of the cluster centers not fully converging.
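The following Python sketch shows how steps 2 through 4 interact with a maximum-iterations cap. It is a minimal, standard Lloyd-style K-Means loop, not the product's code; the function name `kmeans`, the `max_iterations` and `tol` parameters, and the choice to leave an empty cluster's center in place are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]],
           centers: List[List[float]],
           max_iterations: int = 100,
           tol: float = 1e-9) -> Tuple[List[List[float]], List[int], int, bool]:
    """Lloyd-style K-Means from the given initial centers.

    Returns (centers, labels, iterations_run, converged).
    """
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for iteration in range(1, max_iterations + 1):
        # Step 2: assign every point to its nearest current center
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Step 3: move each center to the mean of the points assigned to it
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # assumption: an empty cluster keeps its center
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        # Step 4: stop once the centers no longer move
        if shift <= tol:
            return centers, labels, iteration, True
    # the iteration cap was reached before the centers stopped moving
    return centers, labels, max_iterations, False
```

With `max_iterations=1` on a small data set, the loop can return before confirming convergence, which mirrors the trade-off described above: a faster finish in exchange for centers that may not have settled.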

Note: Once the clusters have been defined, the Cluster Dimension can be
saved for use just like any other dimension. It can also be loaded into the
Cluster Explorer to examine the separation of cluster centers.

In the Cluster Builder, select **Options** > **Algorithm** to choose the algorithm used when defining clusters:

- KMeans. This algorithm uses canopy clustering to define the initial cluster centers.
- KMeans++. This algorithm speeds up cluster building when running against large data sets.

KMeans++ is an improved variant of the KMeans clustering algorithm because it
provides a better initialization of the k initial centers. (The original KMeans
algorithm chooses its initial centers randomly.) KMeans++ selects the first
center at random. The remaining k-1 centers are then chosen one by one, based on
each data point's distance to the closest existing center: data points far from
every existing center are more likely to be chosen as a new center than nearby
data points. Once all k initial centers are chosen, the procedure is exactly the
same as in the original KMeans clustering.
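The seeding step described above can be sketched in Python as follows. This uses the standard k-means++ rule, in which a point's chance of becoming the next center is proportional to its squared distance to the closest existing center; the documentation only says that farther points are more likely to be chosen, so treat the exact weighting, the function name `kmeans_pp_seeds`, and the uniform fallback when all distances are zero as assumptions.

```python
import math
import random
from typing import List, Sequence


def kmeans_pp_seeds(points: Sequence[Sequence[float]], k: int,
                    rng: random.Random) -> List[List[float]]:
    """Choose k initial centers with k-means++ (squared-distance) seeding."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # first center: chosen uniformly at random
    centers = [list(rng.choice(points))]
    # squared distance from each point to its closest chosen center so far
    d2 = [math.dist(p, centers[0]) ** 2 for p in points]
    while len(centers) < k:
        total = sum(d2)
        if total == 0:
            # every point coincides with an existing center; fall back to uniform choice
            idx = rng.randrange(len(points))
        else:
            # farther points get proportionally larger weights
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(list(points[idx]))
        # update each point's distance to its nearest center
        d2 = [min(old, math.dist(p, centers[-1]) ** 2) for old, p in zip(d2, points)]
    return centers
```

The returned centers can be passed directly to an ordinary K-Means loop, which matches the statement that the rest of the procedure is unchanged.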

The workflow for KMeans++ is exactly the same as the workflow for KMeans
clustering, except that you select **Options** > **Algorithm** > **KMeans++**
in the Cluster Builder.

Note: Each DPU runs its own KMeans++ procedure on its own portion of the data.
If the DPU has enough available memory (the memory ratio is configurable in the
PAServer.cfg file), the data for the variables involved in the clustering is
loaded into memory. Selection of the remaining k-1 initial centers and the
converging iterations then all happen in memory, which is faster than the
previous KMeans clustering process.