
Name:

Description:

An artificial data set consisting of 3000 points in 3 quite well-separated clusters.

Variables:

A data frame with 3000 observations on 2 numeric variables (named V1 and V2) giving the x and y coordinates of the points, respectively.

 

Note

Our version of xclara is slightly more rounded than the one obtained from read.table("xclara.dat"). The relative difference measured by all.equal is 1.15e-7 for V1 and 1.17e-7 for V2, which suggests that our version is the result of formatting with options(digits = 7).
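The effect described above can be illustrated with a minimal sketch. It does not use the original xclara.dat file; the vector x is synthetic stand-in data, and signif(x, 7) is used here only as an assumed approximation of what formatting with 7 significant digits does to the stored values.

```r
# Synthetic stand-in for one coordinate column (NOT the real xclara data).
set.seed(1)
x <- runif(3000, min = 10, max = 100)

# Assumption: rounding to 7 significant digits mimics the effect of
# writing the values out under options(digits = 7) and reading them back.
x7 <- signif(x, 7)

# Mean relative difference computed by hand: sum|x - x7| / sum|x|.
rel_diff <- sum(abs(x - x7)) / sum(abs(x))
print(rel_diff)

# all.equal() with tolerance = 0 reports a mean relative difference
# instead of returning TRUE whenever the vectors are not identical.
print(all.equal(x, x7, tolerance = 0))
```

For values of this magnitude the reported difference is tiny, which is the kind of evidence the note relies on when it attributes the discrepancy to formatting rather than to different underlying data.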

Previously (before May 2017), it was claimed that the three clusters were each of size 1000, which is clearly wrong. pam(*, 3) gives cluster sizes of 899, 1149, and 952, which, apart from seven “outliers” (or “mislabellings”), correspond to observation indices 1:900, 901:2050, and 2051:3000; see the example.
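A short sketch of how the cluster sizes could be checked follows. It assumes the cluster package is installed and that its bundled xclara data frame matches the data described here; the variable names fit, sizes, and nominal are illustrative only.

```r
library(cluster)

# Assumption: the xclara data frame shipped with the 'cluster' package
# is the same data set as the one described on this page.
data(xclara, package = "cluster")
str(xclara)

# Partition around medoids with k = 3 clusters.
fit <- pam(xclara, k = 3)

# Cluster sizes as found by pam().
sizes <- table(fit$clustering)
print(sizes)

# Nominal grouping by observation index: 1:900, 901:2050, 2051:3000.
nominal <- rep(1:3, times = c(900, 1150, 950))

# Cross-tabulation: off-diagonal counts are the points that pam()
# assigns differently from the index blocks.
print(table(nominal, fit$clustering))
```

The cross-tabulation makes the “mislabellings” visible directly, since any observation that pam places outside its index block shows up off the main diagonal.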

Link To Google Sheets:

Rows:

Columns:

License Type:

References/Notes/Attributions:

Source

Sample data set accompanying the reference below (file ‘xclara.dat’ inside ‘clus_examples.tar.gz’).

References

Anja Struyf, Mia Hubert & Peter J. Rousseeuw (1996) Clustering in an Object-Oriented Environment. Journal of Statistical Software 1. doi: 10.18637/jss.v001.i04

R Dataset Upload:

Use the following R code to directly access this dataset in R.

d <- read.csv("https://www.key2stats.com/Bivariate_Data_Set_with_3_Clusters_174_38.csv")
