Issue

I'm using the following data set to perform a cluster analysis on categorical data - link to data set - using the following packages in R:

library(cluster)
library(dplyr)
library(ggplot2)
library(readr)

With the following code, I get to observe what the profile of clients is within 5 clusters (NB: I'm picking 5 clusters instead of 7 or 8 to keep things more or less simple):

df.torun <- subset(df.bank, select = -c(loan, contact, day, month, duration, campaign, pdays, previous, poutcome, y))

gower_dist <- daisy(df.torun, metric = "gower")

gower_mat <- as.matrix(gower_dist)

sil_width <- c(NA)
for(i in 2:8){
  pam_fit <- pam(gower_dist, diss = TRUE, k = i)
  sil_width[i] <- pam_fit$silinfo$avg.width
}

plot(1:8, sil_width,
     xlab = "Number of clusters",
     ylab = "Silhouette width")
lines(1:8, sil_width)

k <- 5
pam_fit <- pam(gower_dist, diss = TRUE, k)
pam_results <- df.torun %>% 
  mutate(cluster = pam_fit$clustering) %>% 
  group_by(cluster) %>% 
  do(the_summary = summary(.))
pam_results$the_summary

If you run this script on the data I shared, you'll get a lot of information about the profile of clients in the following categories: age, job, marital, education, default, balance and housing. Here's a screenshot of the results I get for cluster 1:

[Screenshot: results I get for cluster 1]

As can be seen in the image above, under the job column, some of the results are "hiding" under the category (Other).

My question: what code can I use to list all the words from the job column that are "hiding" under (Other)?

Thank you very much for your help!

Solution

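You may use the maxsum= argument of summary(). It sets how many factor levels are listed; when a factor has more levels than maxsum, the least frequent ones are lumped into (Other). The default for data frames is 7, which is why only six levels plus (Other) are shown. Example: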

d <- data.frame(x=gl(10, 5), y=rnorm(50))

summary(d)
#       x            y          
# 1      : 5   Min.   :-1.7459  
# 2      : 5   1st Qu.:-0.8480  
# 3      : 5   Median :-0.2293  
# 4      : 5   Mean   :-0.1439  
# 5      : 5   3rd Qu.: 0.4109  
# 6      : 5   Max.   : 2.5951  
# (Other):20            

summary(d, maxsum=11)
#  x           y          
# 1 :5   Min.   :-1.7459  
# 2 :5   1st Qu.:-0.8480  
# 3 :5   Median :-0.2293  
# 4 :5   Mean   :-0.1439  
# 5 :5   3rd Qu.: 0.4109  
# 6 :5   Max.   : 2.5951  
# 7 :5                    
# 8 :5                    
# 9 :5                    
# 10:5
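Applied to a column like job, the same idea might look like the sketch below. The data frame here is a made-up stand-in (the names df, jobs and the 12 levels are assumptions, not the asker's data); in the original pipeline the equivalent change would be passing maxsum inside the summary(.) call. A plain table() on the column is another way to see every level with its count.

```r
# Hypothetical stand-in for one cluster's rows: 'job' is a factor with 12 levels
set.seed(1)
jobs <- paste0("job_", 1:12)
df <- data.frame(
  job     = factor(sample(jobs, 200, replace = TRUE), levels = jobs),
  balance = rnorm(200)
)

# With the default maxsum, 12 levels do not fit, so some are lumped into (Other).
# Setting maxsum to the number of levels lists every level.
s <- summary(df, maxsum = nlevels(df$job))
print(s)

# Alternative: full counts for a single column, most frequent first
job_counts <- sort(table(df$job), decreasing = TRUE)
print(job_counts)
```

If several factor columns are involved, maxsum needs to be at least the largest number of levels among them.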

Answered By - jay.sf

Answer Checked By - David Goodson (PHPFixing Volunteer)

Issue

I'm looking for a way to handle a warning message from the Kmeans function of the amap package. The warning message is the following:

empty cluster: try a better set of initial centers.

Is there any way I could get a signal, so that I know when this warning is thrown and can then handle the problem (e.g. by rerunning the algorithm until the result has no empty cluster)?

It is quite hard for me to make a nice reproducible example, but I came up with this ugly but functional one:

library(amap)

numberK = 20
ts.len = 7

time.series <- rep(sample(1:8000, numberK, replace = TRUE),ts.len)
time.series <- rep(rbind(time.series, time.series), 30)
time.series <- matrix(time.series, ncol = ts.len)

centers <- matrix( sample(1:3000, numberK*ts.len), ncol = ts.len)

Kmeans((time.series), centers = centers, iter.max = 99)

If you run this in your terminal, it might produce the warning message I'm talking about.

Note: My idea for solving this problem is to catch the warning signal and then execute the fix. However, I have no idea how to do that.

Solution

From ?options (scrolling down a long ways to find warn...):

sets the handling of warning messages. If warn is negative all warnings are ignored. If warn is zero (the default) warnings are stored until the top–level function returns. If 10 or fewer warnings were signalled they will be printed otherwise a message saying how many were signalled. An object called last.warning is created and can be printed through the function warnings. If warn is one, warnings are printed as they occur. If warn is two or larger all warnings are turned into errors.

So using tryCatch you can specify a warning handler function to do stuff upon catching a warning:

> tryCatch(expr = {Kmeans((time.series), centers = centers, iter.max = 99)},
         warning = function(e) "Caught warning")
[1] "Caught warning"

Or you can set all warnings to be escalated to errors via:

options(warn = 2)

as described in the docs. Then,

> tryCatch(expr = {Kmeans((time.series), centers = centers, iter.max = 99)},
           error = function(e) "Caught error")
[1] "Caught error"
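Since options(warn = 2) is a global setting, it may be worth restoring the previous value once the call is done. A minimal sketch, using a hypothetical noisy() function as a stand-in for the Kmeans call so that it runs without amap:

```r
# Hypothetical stand-in for a call that signals the warning
noisy <- function() {
  warning("empty cluster: try a better set of initial centers.")
  "finished"
}

old <- options(warn = 2)   # options() returns the previous settings
res <- tryCatch(noisy(), error = function(e) "Caught error")
options(old)               # restore the previous warning level

res
```

Inside a function, on.exit(options(old)) achieves the same restoration even if something fails along the way.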

Although many people seem to prefer tryCatch, I often like the explicitness of try, which feels easier to me if I want to do some sort of if...else block after running the expression:

> options(warn = 2)
> attempt <- try(expr = {Kmeans((time.series), centers = centers, iter.max = 99)}, silent = TRUE)
> class(attempt)
[1] "try-error"

So then you can check class(attempt) in an if statement (the preferred way is to check inherits(attempt,"try-error")) and do stuff accordingly.
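To get the "run until there is no empty cluster" behaviour asked about, the warning handler can be combined with a loop. The following is only a sketch: flaky_fit() and retry_on_warning() are hypothetical names, and flaky_fit() merely imitates a clustering call that warns on its first two attempts. In practice the function passed in would draw a fresh set of initial centers and call Kmeans.

```r
# Hypothetical stand-in for a clustering call that sometimes warns
flaky_fit <- function(attempt) {
  if (attempt < 3) {
    warning("empty cluster: try a better set of initial centers.")
  }
  paste("fit from attempt", attempt)
}

# Call fit_fun repeatedly until it completes without signalling a warning
retry_on_warning <- function(fit_fun, max_tries = 10) {
  for (attempt in seq_len(max_tries)) {
    # If a warning is signalled, tryCatch returns the condition object itself
    result <- tryCatch(fit_fun(attempt), warning = function(w) w)
    if (!inherits(result, "warning")) {
      return(result)
    }
    message("attempt ", attempt, " warned: ", conditionMessage(result))
  }
  stop("no warning-free fit after ", max_tries, " tries")
}

fit <- retry_on_warning(flaky_fit)
fit
```

The max_tries cap is there so the loop cannot run forever if every set of centers produces an empty cluster. To retry only on this particular warning, the handler could also inspect conditionMessage(w) before deciding.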

Answered By - joran

Answer Checked By - Clifford M. (PHPFixing Volunteer)

Monday, August 15, 2022

[FIXED] How to see the elements hiding under "Other" in the output of a summary in R?

Sunday, July 17, 2022

[FIXED] How can I treat a "empty cluster" warning, in K-means function?
