Multiple matches and multiple excludes in regular expressions in R using PCRE

Question:

I am pretty new to regular expressions in R, and I am trying to match a vector of strings, including some patterns and excluding others. I searched Stack Overflow and it seems no similar question has been asked. Here is the vector of strings, mystring, to be matched:

mystring <- c("fhwjantdesd", "unwanted", "fdedsifrfed", "undesired", "sdsyessd", "yedsfd")

For each string in mystring, I want to find out whether it contains a permutation of the 6 letters of “wanted” (the letters need not be adjacent), excluding strings that contain “wanted” itself. Similarly, I want to detect permutations of the 7 letters of “desired” and of the 3 letters of “yes”, excluding strings that contain “desired” or “yes”.

So the expected output of grepl(pattern, mystring, perl = TRUE) should be:

[1]  TRUE FALSE  TRUE FALSE FALSE  TRUE
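One way to express this requirement as a single `perl = TRUE` pattern is with lookaheads instead of listing permutations. This is a minimal sketch, assuming the words contain only letters (no regex metacharacters). `word_pattern` is a hypothetical helper, and a string that contains one word literally can still match through another word's alternative:

```r
mystring <- c("fhwjantdesd", "unwanted", "fdedsifrfed", "undesired", "sdsyessd", "yedsfd")
words <- c("wanted", "desired", "yes")

# Hypothetical helper: build a lookahead-only PCRE pattern for one word.
#   ^               anchor, so the lookaheads are tried once, at the start
#   (?!.*word)      fail if the literal word occurs anywhere
#   (?=(?:.*c){n})  require at least n occurrences of letter c
word_pattern <- function(word) {
  counts <- table(strsplit(word, "")[[1]])
  ahead <- sprintf("(?=(?:.*%s){%d})", names(counts), as.integer(counts))
  paste0("^(?!.*", word, ")", paste(ahead, collapse = ""))
}

# Join the per-word patterns as alternatives: a string matches if any word fits
pattern <- paste0("(?:", vapply(words, word_pattern, character(1)), ")",
                  collapse = "|")

grepl(pattern, mystring, perl = TRUE)
# [1]  TRUE FALSE  TRUE FALSE FALSE  TRUE
```

For example, `word_pattern("yes")` builds `^(?!.*yes)(?=(?:.*e){1})(?=(?:.*s){1})(?=(?:.*y){1})`. The `{n}` counts take care of the repeated d and e in “desired”.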

I want to use the perl option of grepl, which could speed up the function. Could anyone give me some clues about this pattern? And could you explain what each part of the pattern means, since I am just starting out with PCRE? Thanks
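Whether `perl = TRUE` is actually faster depends on the pattern and the data, so it is worth timing both engines on your own input. This is a rough sketch using base R's `system.time()`. The replication factor and the lookaround-free pattern are arbitrary choices for illustration. Lookarounds such as `(?!...)` require `perl = TRUE`, so they cannot be timed on the default engine:

```r
mystring <- c("fhwjantdesd", "unwanted", "fdedsifrfed", "undesired", "sdsyessd", "yedsfd")
# Repeat the sample data to get a measurable workload (size chosen arbitrarily)
big <- rep(mystring, 1e5)

# A pattern both engines understand (no lookarounds)
pat <- "wanted|desired|yes"

t_tre  <- system.time(r_tre  <- grepl(pat, big))              # default (TRE) engine
t_pcre <- system.time(r_pcre <- grepl(pat, big, perl = TRUE)) # PCRE engine

identical(r_tre, r_pcre)   # both engines should agree on the result
c(TRE = t_tre[["elapsed"]], PCRE = t_pcre[["elapsed"]])  # wall-clock seconds
```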

Question comments:

    
Re “I want to use the perl option of grepl, which could speed up the function”: it could also slow it down, which I find more likely.
    
Write a program that generates all the permutations of the words. Join the letters of each permutation with .*. ([^L]* where L is the next letter should be more efficient than .*.) Join each of those strings with |. Once you’ve used this pattern to find the matches, filter out any matches that contain the original words. (That assumes you want FALSE for wantedanted.)
    
Does this solve your issue?
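The second comment's suggestion can be sketched in base R: join the letters of every permutation with `.*`, join the permutations with `|`, then filter out strings that contain the original word. `perms`, `perm_pattern` and `scrambled` are hypothetical helper names. For simplicity the sketch uses `.*` rather than the `[^L]*` refinement:

```r
# Hypothetical helper: all orderings of a character vector (recursive, base R)
perms <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (k in seq_along(x)) {
    for (p in perms(x[-k])) out[[length(out) + 1]] <- c(x[k], p)
  }
  out
}

# Join each ordering's letters with ".*", then join the orderings with "|"
perm_pattern <- function(word) {
  orderings <- perms(strsplit(word, "")[[1]])
  alts <- unique(vapply(orderings, paste, character(1), collapse = ".*"))
  paste(alts, collapse = "|")
}

# Match any ordering, then drop strings that contain the word itself
scrambled <- function(s, word) {
  grepl(perm_pattern(word), s, perl = TRUE) & !grepl(word, s, fixed = TRUE)
}

mystring <- c("fhwjantdesd", "unwanted", "fdedsifrfed", "undesired", "sdsyessd", "yedsfd")
scrambled(mystring, "yes")
# [1] FALSE FALSE FALSE FALSE FALSE  TRUE
scrambled(mystring, "wanted")
# [1]  TRUE FALSE FALSE FALSE FALSE FALSE
```

For “desired”, the same helper enumerates 7! = 5040 orderings (1260 distinct, because d and e each appear twice). The pattern therefore grows quickly with word length.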

Answers:

Answer 1:

The code below works, with some limitations:

grepl("(^((?!yes|wanted|desired).)*$)", mystring, perl=TRUE)

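It only excludes strings that contain one of the words above. It never checks whether the letters are present, so it reproduces the expected output only for this particular data.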
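Because the question asks what each part means, here is the same pattern spread out in PCRE's free-spacing mode `(?x)`. In that mode whitespace is ignored and `#` starts a comment. The sketch uses non-capturing `(?:...)` groups in place of the answer's capturing groups, which makes no difference to `grepl()`. The last call illustrates the limitation: the pattern only rejects the literal words and never checks the letters.

```r
mystring <- c("fhwjantdesd", "unwanted", "fdedsifrfed", "undesired", "sdsyessd", "yedsfd")

# Same logic as the one-line pattern, written in free-spacing mode (?x)
pattern <- "(?x)
  ^                          # start of the string
  (?:                        # a group, repeated below, that consumes one character:
    (?!yes|wanted|desired)   #   lookahead: none of the words may start here
    .                        #   then consume that single character
  )*                         # repeat for every character
  $                          # all the way to the end of the string
"

grepl(pattern, mystring, perl = TRUE)
# [1]  TRUE FALSE  TRUE FALSE FALSE  TRUE

# The letters are never checked: a string with none of them still matches
grepl(pattern, "abc", perl = TRUE)
# [1] TRUE
```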


Answer 2:

You can try something like this:

mystring <- c("fhwjantdesd", "unwanted", "fdedsifrfed", "undesired", "sdsyessd", "yedsfd")
words <- c("wanted", "desired", "yes")
Status <- logical(length(mystring))  # one entry per string, all FALSE to start

for (index in seq_along(mystring)) {
  i <- mystring[index]
  for (j in words) {
    # only consider words that do not occur literally in the string
    if (!grepl(j, i, fixed = TRUE)) {
      # every letter of j must appear somewhere in i
      # (how many times each letter appears is not checked)
      if (all(unlist(strsplit(j, "")) %in% unlist(strsplit(i, "")))) {
        Status[index] <- TRUE
        break
      }
    }
  }
}

Status

> Status
[1]  TRUE FALSE  TRUE FALSE FALSE  TRUE
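The `%in%` test above only checks that each letter of the word occurs somewhere in the string, not how many times. For example, it accepts "fdesir" for "desired" even though that string has only one d and one e. Below is a count-aware variant, sketched with base R's `table()`. `contains_scrambled` is a hypothetical helper name:

```r
# Hypothetical helper: TRUE if `s` contains every letter of `word` at least as
# many times as `word` does, but does not contain `word` itself
contains_scrambled <- function(s, word) {
  need <- table(strsplit(word, "")[[1]])
  # count only the letters that `word` uses, in the same order as `need`
  have <- table(factor(strsplit(s, "")[[1]], levels = names(need)))
  all(have >= need) && !grepl(word, s, fixed = TRUE)
}

mystring <- c("fhwjantdesd", "unwanted", "fdedsifrfed", "undesired", "sdsyessd", "yedsfd")
words <- c("wanted", "desired", "yes")

# A string passes if it fits any of the words
Status <- vapply(mystring, function(i) {
  any(vapply(words, contains_scrambled, logical(1), s = i))
}, logical(1), USE.NAMES = FALSE)
Status
# [1]  TRUE FALSE  TRUE FALSE FALSE  TRUE

contains_scrambled("fdesir", "desired")
# [1] FALSE
```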


Original URL:

https://stackoverflow.com/questions/47746724/multiple-matches-and-multiple-excludes-in-regular-expressions-in-r-using-pcre
