# [Fixed] Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)

If you attempt to perform k-means clustering on data that contains NA (missing), NaN, or Inf values, R raises the following error: Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1).

The kmeans() function in R cannot handle data with NA, NaN, or Inf values. When these values are present, the cluster means and the distances from each point to the cluster centers are no longer well defined, so the algorithm cannot determine which cluster center is closest.
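To see why these values are a problem, note that arithmetic involving NA or NaN propagates the missing value. The snippet below is a minimal base R illustration, not the internals of kmeans(); the names x, point, and center are made up for the example.

```r
# Hypothetical toy values, for illustration only
x <- c(1, NA, 3)
mean(x)                    # NA: a single missing value makes the mean undefined

point  <- c(2, NaN)
center <- c(1, 1)
sum((point - center)^2)    # NaN: the squared distance to a center is undefined

Inf - Inf                  # NaN: arithmetic on infinite values can also produce NaN
```

Since k-means repeatedly computes means and squared distances like these, one such value is enough to break the computation.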

You can fix this error by replacing the Inf values with NA and then removing the rows with missing values with na.omit. Alternatively, the missing values can be imputed.

This tutorial will go over the error in detail and show you how to fix it using code examples.

## Example

Consider the following data frame, which contains a number of NaN, NA, and Inf values.

```
df <- data.frame(var1=c(2, NaN, 4, 6, 7, Inf, 8, 6, 10, 12),
                 var2=c(NaN, 14, 14, 7, 7, 15, 10, 9, 9, Inf),
                 var3=c(22, NA, 19, 23, 25, 21, 19, 16, 12, 15))
df
```
```
   var1 var2 var3
1     2  NaN   22
2   NaN   14   NA
3     4   14   19
4     6    7   23
5     7    7   25
6   Inf   15   21
7     8   10   19
8     6    9   16
9    10    9   12
10   12  Inf   15
```

Let’s try k-means clustering on the data frame with the kmeans() function:

```
km <- kmeans(df, centers=3)
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
```

Because the data frame contains NA, NaN, and Inf values, the error occurs.
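Before cleaning, it can help to find out where the problematic values are. The following is a small sketch using base R's is.finite(), which returns FALSE for NA, NaN, Inf, and -Inf alike; the names bad_counts and bad_rows are hypothetical and chosen only for this example.

```r
df <- data.frame(var1=c(2, NaN, 4, 6, 7, Inf, 8, 6, 10, 12),
                 var2=c(NaN, 14, 14, 7, 7, 15, 10, 9, 9, Inf),
                 var3=c(22, NA, 19, 23, 25, 21, 19, 16, 12, 15))

# Count the non-finite values (NA, NaN, Inf, -Inf) in each column
bad_counts <- sapply(df, function(x) sum(!is.finite(x)))
bad_counts

# Positions of the rows that contain at least one non-finite value
bad_rows <- which(rowSums(!is.finite(as.matrix(df))) > 0)
bad_rows
```

For this data frame, the counts are 2, 2, and 1 for var1, var2, and var3, and the affected rows are 1, 2, 6, and 10.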

## Solution To This Error

### 1. Remove Rows

We must remove the values that kmeans() cannot handle from the data frame. First, we replace the Inf values with NA using lapply() inside do.call().

```
df_noinf <- do.call(data.frame, lapply(df, function(x) replace(x, is.infinite(x), NA)))
df_noinf
```

Here, lapply() applies replace() to each column, swapping every infinite value for NA, and do.call(data.frame, ...) reassembles the columns into a data frame. Let’s take a look at the new data frame.

```
   var1 var2 var3
1     2  NaN   22
2   NaN   14   NA
3     4   14   19
4     6    7   23
5     7    7   25
6    NA   15   21
7     8   10   19
8     6    9   16
9    10    9   12
10   12   NA   15
```

Then we’ll use the na.omit() function to remove the rows with NA and NaN values.

```
df_clean <- na.omit(df_noinf)
df_clean
```

The clean data frame looks like this:

```
  var1 var2 var3
3    4   14   19
4    6    7   23
5    7    7   25
7    8   10   19
8    6    9   16
9   10    9   12
```
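The same six rows can also be obtained in a single filtering step. The sketch below is an alternative formulation rather than the approach used above: it keeps only the rows in which every value is finite, and the names keep and df_clean_alt are hypothetical.

```r
df <- data.frame(var1=c(2, NaN, 4, 6, 7, Inf, 8, 6, 10, 12),
                 var2=c(NaN, 14, 14, 7, 7, 15, 10, 9, 9, Inf),
                 var3=c(22, NA, 19, 23, 25, 21, 19, 16, 12, 15))

# TRUE for rows with no NA, NaN, Inf, or -Inf in any column
keep <- rowSums(!is.finite(as.matrix(df))) == 0

# Subset the original data frame to the fully finite rows
df_clean_alt <- df[keep, ]
df_clean_alt
```

This relies on is.finite() treating NA, NaN, and infinite values the same way, so no separate Inf-to-NA replacement is needed.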

With a clean data frame, we can now run the k-means clustering algorithm to obtain cluster information.

```
km <- kmeans(df_clean, centers=3)
km
```

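Let’s run the code to see the result. Because kmeans() chooses its initial centers at random, the cluster numbering may differ from run to run: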

```
K-means clustering with 3 clusters of sizes 3, 1, 2

Cluster means:
  var1 var2 var3
1  6.0   11   18
2 10.0    9   12
3  6.5    7   24

Clustering vector:
3 4 5 7 8 9
1 3 3 1 1 2

Within cluster sum of squares by cluster:
 28.0  0.0  2.5
(between_SS / total_SS =  81.4 %)
```
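Since kmeans() starts from randomly chosen centers, repeated runs can label or even group the rows differently. One common way to make the result repeatable is to fix the random seed and try several random starts through the nstart argument. The sketch below rebuilds the clean data frame by hand so it is self-contained; the seed value 42 and nstart=25 are arbitrary choices, not recommendations from the original example.

```r
# The six clean rows from above, re-entered so the snippet runs on its own
df_clean <- data.frame(var1=c(4, 6, 7, 8, 6, 10),
                       var2=c(14, 7, 7, 10, 9, 9),
                       var3=c(19, 23, 25, 19, 16, 12),
                       row.names=c("3", "4", "5", "7", "8", "9"))

set.seed(42)                                  # arbitrary seed for repeatability
km <- kmeans(df_clean, centers=3, nstart=25)  # keep the best of 25 random starts

km$cluster        # cluster assignment for each row
km$centers        # cluster means
km$tot.withinss   # total within-cluster sum of squares
```

With nstart greater than 1, kmeans() keeps the run with the lowest total within-cluster sum of squares.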

### 2. Impute Values

If we want to keep the number of rows the same, we can substitute values for the NA and NaN values.

```
# is.na() is TRUE for both NA and NaN, so both are replaced by the column mean
df_noinf$var1[is.na(df_noinf$var1)] <- mean(df_noinf$var1, na.rm=TRUE)
df_noinf$var2[is.na(df_noinf$var2)] <- mean(df_noinf$var2, na.rm=TRUE)
df_noinf$var3[is.na(df_noinf$var3)] <- mean(df_noinf$var3, na.rm=TRUE)
df_noinf
```

The preceding code uses logical subsetting to replace the missing values in each column with that column's mean, computed with na.rm=TRUE so that the missing values are ignored. Because is.na() returns TRUE for both NA and NaN, both kinds of value are imputed. Let’s take a look at the new data frame.

```
     var1   var2     var3
1   2.000 10.625 22.00000
2   6.875 14.000 19.11111
3   4.000 14.000 19.00000
4   6.000  7.000 23.00000
5   7.000  7.000 25.00000
6   6.875 15.000 21.00000
7   8.000 10.000 19.00000
8   6.000  9.000 16.00000
9  10.000  9.000 12.00000
10 12.000 10.625 15.00000
```
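With more than a few columns, repeating the assignment for each one becomes tedious. The same mean imputation can be sketched as a small helper applied to every column; impute_mean and df_imputed are hypothetical names introduced for this example, and the data frame is re-entered so the snippet runs on its own.

```r
# df_noinf as it looked before imputation (Inf already replaced by NA)
df_noinf <- data.frame(var1=c(2, NaN, 4, 6, 7, NA, 8, 6, 10, 12),
                       var2=c(NaN, 14, 14, 7, 7, 15, 10, 9, 9, NA),
                       var3=c(22, NA, 19, 23, 25, 21, 19, 16, 12, 15))

# Replace NA and NaN in a numeric vector with the mean of the remaining values
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm=TRUE)
  x
}

# Assigning into df_imputed[] keeps the data frame structure
df_imputed <- df_noinf
df_imputed[] <- lapply(df_noinf, impute_mean)
df_imputed
```

The result should match the table above, with 6.875, 10.625, and roughly 19.11 filled in for var1, var2, and var3. Swapping mean() for median() inside the helper would give median imputation instead.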

With the missing values imputed, we can now run the k-means clustering algorithm on all ten rows.

```
km <- kmeans(df_noinf, centers=3)
km
```

Let’s run the code to see what happens:

```
K-means clustering with 3 clusters of sizes 3, 4, 3

Cluster means:
      var1      var2     var3
1 9.333333  9.541667 14.33333
2 6.437500 13.250000 19.52778
3 5.000000  8.208333 23.33333

Clustering vector:
 3 2 2 3 3 2 2 1 1 1

Within cluster sum of squares by cluster:
 29.09375 26.41377 27.42708
(between_SS / total_SS =  70.8 %)
```