Ruby look-behind with Cyrillic and the phrase ‘magist’

Question:

I would like to strip all URLs from a text except those pointing to my own site.

I wrote a regex like this:

 r = 'https?:\/\/[\da-z\.-]+(?<!MY_DOMAIN)\.[a-z\.]{2,6}\/?[^\s\]\[\<\>]*'
 e = Regexp.new(r, Regexp::IGNORECASE | Regexp::MULTILINE)
 text.gsub e, '***'

It works when the ‘text’ variable contains no Cyrillic characters, or when the ‘r’ variable does not contain the string ‘magist’.
When ‘r’ contains ‘magist’ and ‘text’ contains Cyrillic characters, I get an error.

Examples:

r = 'https?:\/\/[\da-z\.-]+(?<!magist)\.[a-z\.]{2,6}\/?[^\s\]\[\<\>]*'
e = Regexp.new(r, Regexp::IGNORECASE | Regexp::MULTILINE)
text = 'Text https://magist.net/ text https://google.com text text'
text.gsub e, '***'
=> "Text https://magist.net/ text *** text text"

Add ‘тест’ at any position:

text = 'Text https://magist.net/ text https://google.com text тест text'
text.gsub e, '***'
=> RegexpError: invalid pattern in look-behind: /https?:\/\/[\da-z\.-]+(?<!magist)\.[a-z\.]{2,6}\/?[^\s\]\[\<\>]*/mi

Replacing ‘magist’ with ‘magiat’ (s => a) makes it work fine with Cyrillic:

r = 'https?:\/\/[\da-z\.-]+(?<!magiat)\.[a-z\.]{2,6}\/?[^\s\]\[\<\>]*'
e = Regexp.new(r, Regexp::IGNORECASE | Regexp::MULTILINE)
text.gsub e, '***'
=> "Text *** text *** text тест text"
text = 'Text https://magiat.net/ text https://google.com text тест text'
=> "Text https://magiat.net/ text https://google.com text тест text"

If I rewrite the regex like this:

r = 'https?:\/\/(?!magis)[\da-z\.-]+\.[a-z\.]{2,6}\/?[^\s\]\[\<\>]*'

It works fine for all strings.
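
A minimal sketch of that look-ahead variant wrapped in a helper, for illustration only. The method name strip_foreign_urls and its parameters are hypothetical and not part of the original code; the pattern is the one above, with the domain prefix passed in and escaped via Regexp.escape.

```ruby
# Hypothetical helper (name and parameters are illustrative, not from the original post).
# Replaces every URL with a placeholder unless its host starts with own_host_prefix.
def strip_foreign_urls(text, own_host_prefix, placeholder = '***')
  # The negative look-ahead (?!...) is checked right after the scheme,
  # so no look-behind is involved.
  source = 'https?:\/\/(?!' + Regexp.escape(own_host_prefix) + ')' +
           '[\da-z\.-]+\.[a-z\.]{2,6}\/?[^\s\]\[\<\>]*'
  pattern = Regexp.new(source, Regexp::IGNORECASE | Regexp::MULTILINE)
  text.gsub(pattern, placeholder)
end

text = 'Text https://magist.net/ text https://google.com text тест text'
puts strip_foreign_urls(text, 'magist')
# prints: Text https://magist.net/ text *** text тест text
```

Note that the look-ahead only checks how the host begins, so a prefix such as ‘magist’ would also spare any other host starting with those letters, and a host like www.magist.net would not be spared.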

Why doesn’t look-behind work with Cyrillic text and the string ‘magist’? 🙂
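
Independent of the answer, the same goal can be reached without any look-around. The sketch below is an alternative technique, not the code from the question: it matches every URL, captures the host, and decides in a gsub block whether to keep it. The names URL_PATTERN and mask_foreign_urls are hypothetical, and the host is compared as a whole rather than by prefix.

```ruby
# Hypothetical sketch: names are illustrative, not from the original post.
# The host is captured in group 1 so it can be compared in plain Ruby.
URL_PATTERN = /https?:\/\/([\da-z.-]+\.[a-z.]{2,6})\/?[^\s\]\[<>]*/i

def mask_foreign_urls(text, own_host, placeholder = '***')
  text.gsub(URL_PATTERN) do |url|
    host = Regexp.last_match(1)
    # Keep the URL only when its host equals own_host (case-insensitive).
    host.casecmp(own_host).zero? ? url : placeholder
  end
end

text = 'Text https://magist.net/ text https://google.com text тест text'
puts mask_foreign_urls(text, 'magist.net')
# prints: Text https://magist.net/ text *** text тест text
```

Because the comparison happens in Ruby rather than inside the pattern, the regex contains no look-behind for the engine to reject.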

Comments:

    
What ruby version are you on? I have no issues running your code: everything runs smoothly.
    
ruby -v reports: ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin16]
– DrobyshevAlex
5 hours ago
    
Huh, darwin. There might be some platform-induced problem, or it’s macOS screwing things up as it always does. The code above runs without a glitch even on Ruby 1.9 on my local machine.
    
Thank you! I will use the code on Debian. I hope it will work 🙂
– DrobyshevAlex
5 hours ago
Seems to be related: bugs.ruby-lang.org/issues/13671
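
Since the commenters see different behaviour on different setups, a small script can show whether a given Ruby build is affected. This is only a sketch: the method name look_behind_outcome is hypothetical, and the pattern and sample strings are copied from the question. A likely cause, stated here as an assumption rather than a confirmed diagnosis, is that case-insensitive matching against non-ASCII UTF-8 text lets ‘st’ also match the single-character ligature ‘ﬆ’, which would make the look-behind variable-length; ‘magiat’ contains no such pair.

```ruby
# Hypothetical check (method name is illustrative, not from the original post).
# Returns :ok when gsub succeeds and :regexp_error when this Ruby build
# raises RegexpError for the look-behind pattern from the question.
def look_behind_outcome(text)
  pattern = Regexp.new(
    'https?:\/\/[\da-z\.-]+(?<!magist)\.[a-z\.]{2,6}\/?[^\s\]\[\<\>]*',
    Regexp::IGNORECASE | Regexp::MULTILINE
  )
  text.gsub(pattern, '***')
  :ok
rescue RegexpError
  :regexp_error
end

puts RUBY_DESCRIPTION
ascii_text    = 'Text https://magist.net/ text https://google.com text text'
cyrillic_text = 'Text https://magist.net/ text https://google.com text тест text'
puts "ASCII-only text:    #{look_behind_outcome(ascii_text)}"
puts "Text with Cyrillic: #{look_behind_outcome(cyrillic_text)}"
```

The result depends on the Ruby version and build, so no expected output is given here.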

Original source:

https://stackoverflow.com/questions/47750941/ruby-look-behind-with-cyrillic-and-phrase-magist
