Question:

Hello stackoverflowers,

I wonder whether I can apply the `%like%` operator row-wise between two columns of the same data.table.

The following reproducible example should make this clearer.

First, prepare the data:

```r
library(data.table)

iris <- as.data.table(iris)
iris <- iris[seq.int(from = 1, to = 150, length.out = 5)]
iris[, Species2 := c("set", "set|vers", "setosa", "nothing", "virginica")]
```

The dataset then looks as follows:

```
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species  Species2
1:          5.1         3.5          1.4         0.2     setosa       set
2:          4.9         3.6          1.4         0.1     setosa  set|vers
3:          6.4         2.9          4.3         1.3 versicolor    setosa
4:          6.4         2.7          5.3         1.9  virginica   nothing
5:          5.9         3.0          5.1         1.8  virginica virginica
```

I would like to use something like the following command row-wise.

```r
iris[Species %like% Species2]
```

However, this does not match each `Species` value against the `Species2` pattern in the same row. Is that possible?
The result should be rows 1, 2, and 5.

Answers:

Answer 1:

One way would be to group by row:

```r
iris[, .SD[Species %like% Species2], by = 1:5]
#   : Sepal.Length Sepal.Width Petal.Length Petal.Width   Species  Species2
#1: 1          5.1         3.5          1.4         0.2    setosa       set
#2: 2          4.9         3.6          1.4         0.1    setosa  set|vers
#3: 5          5.9         3.0          5.1         1.8 virginica virginica
```

Or, as per @docendodiscimus's comment, if there are duplicate entries you can group by the two columns instead:

```r
iris[, .SD[Species[1L] %like% Species2[1L]], by = .(Species, Species2)]
```
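The `by = 1:5` above hard-codes the number of rows. A minimal sketch of the same by-row idea for a table of any size, assuming `seq_len(nrow(...))` as the grouping vector, might look like this. The table `dt` is a hypothetical stand-in that keeps only the two relevant columns, stored as character:

```r
library(data.table)

# Hypothetical stand-in for the two columns of the example data
dt <- data.table(
  Species  = c("setosa", "setosa", "versicolor", "virginica", "virginica"),
  Species2 = c("set", "set|vers", "setosa", "nothing", "virginica")
)

# One group per row, so %like% only ever receives a single pattern
res <- dt[, .SD[Species %like% Species2], by = seq_len(nrow(dt))]
res
```

The first column of `res` holds the original row numbers of the matching rows. Grouping by row means one call per row, so on large tables this is likely to be slower than a vectorized approach.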

Comments on the answer:

In case there are duplicate entries, I'd go for `iris[, .SD[Species[1L] %like% Species2[1L]], by = .(Species, Species2)]` instead of by-row grouping.
Very nice solution.
Good call @docendodiscimus, thanks. I'll add this.

Answer 2:

You can't pass a vector to the `pattern` argument of `%like%`, because it calls `grepl`/`grep`, which are vectorized over the strings being searched but not over `pattern`. You can use `mapply` to call `%like%` once per row to get what you want:

```r
iris[mapply(function(x, y) x %like% y, Species, Species2)]

#   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species  Species2
#1:          5.1         3.5          1.4         0.2    setosa       set
#2:          4.9         3.6          1.4         0.1    setosa  set|vers
#3:          5.9         3.0          5.1         1.8 virginica virginica
```

Answer 3:

`%like%` is just a wrapper around `grepl`, so the pattern (right-hand side) can only be length 1. You should be seeing a warning about this.
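To see what that means in practice, here is a small base-R sketch. The vectors `species` and `patterns` are hypothetical stand-ins for the two columns, and the behaviour shown (a warning rather than an error) is what I would expect from `grepl`, so treat it as an assumption for your R version:

```r
species  <- c("setosa", "setosa", "versicolor", "virginica", "virginica")
patterns <- c("set", "set|vers", "setosa", "nothing", "virginica")

# Only patterns[1] ("set") is used; the warning is silenced here for brevity
first_only <- suppressWarnings(grepl(patterns, species))

# Same result as passing the first pattern on its own
identical(first_only, grepl(patterns[1], species))
```

Every row is tested against `"set"` only, so row 5 is missed even though its own pattern would match.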

The `stringi` package lets you vectorize the `pattern` argument.

```r
library(stringi)

iris[stri_detect_regex(Species, Species2)]
```

If you prefer operator syntax to a function call, you can define your own:

```r
`%vlike%` <- function(x, y) {
  stri_detect_regex(x, y)
}

iris[Species %vlike% Species2]
#    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species  Species2
# 1:          5.1         3.5          1.4         0.2    setosa       set
# 2:          4.9         3.6          1.4         0.1    setosa  set|vers
# 3:          5.9         3.0          5.1         1.8 virginica virginica
```
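If adding `stringi` as a dependency is not an option, a base-R variant of the same operator could be sketched as follows. This is not from the answer above: it assumes `mapply` over `grepl` is acceptable performance-wise, and the test vectors are hypothetical stand-ins for the two columns:

```r
# Hypothetical base-R variant of %vlike% (no stringi needed)
`%vlike%` <- function(x, pattern) {
  # as.character() so factor columns such as Species are matched by label;
  # mapply pairs pattern[i] with x[i]
  mapply(grepl, pattern, as.character(x), USE.NAMES = FALSE)
}

species  <- factor(c("setosa", "setosa", "versicolor", "virginica", "virginica"))
patterns <- c("set", "set|vers", "setosa", "nothing", "virginica")

matches <- species %vlike% patterns
which(matches)
```

With this definition in place of the `stringi` one, `iris[Species %vlike% Species2]` should be callable in the same way. It makes one `grepl` call per row, so the `stringi` version is probably the better choice for large tables.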

Original source:

https://stackoverflow.com/questions/47755000/filter-by-using-like-between-two-columns-of-the-data-table