Newline between delimiters in awk

Question:

I’m parsing a file that contains nginx GET request bodies. Sometimes a request has line breaks between two parts of the same request, so I can’t parse it with awk.

I have two delimiters, set with awk -F'delimiter1: |delimiter2'. Can I somehow tell awk that there may be a line break between those delimiters, so that it processes the two lines as one?

Thanks in advance.

Sample input:

[2017-12-04 20:53:07] [ERROR] [ID-XX] Get: sr=342x487&c64=(not set)&c1=Phones, MP3s, GPS&v=1&c33=427&d28=
Like
&je=0&s4d=4-b&c32=(not set)&ua=Opera/9.80 (Android; Opera Mini/32.0.2254/77.161; U; uk) Presto/2.12.423 Version/12.16&time=04/Dec/2017:20:52:02 +0200&qtype=get

[2017-12-04 21:03:07] [ERROR] [ID-YY] Get: sr=342x487&c64=(not set)&c1=Phones, MP3s, GPS&v=1&em=Exception: Error: [$sc:ind] Aborting!&ua=Opera/9.80 (Android; Opera Mini/30.0.2254/77.161; U; ru) Presto/2.12.423 Version/12.16&time=04/Dec/2017:21:03:07 +0200&qtype=get

[2017-12-04 19:40:02] [ERROR] [ID-ZZ] Get: el=search&dl=https://market.com/?dt=Market – Electronics Store | Web Store (Market.com)&id=104777577&a=770227875&t=pageview&ua=Mozilla/5.0 (Linux; Android 7.0; RNE-L21 Build/HUAWEIRNE-L21) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.36&time=04/Dec/2017:19:39:04 +0200&qtype=get

Desired output (print the ID and the body, in double quotes, on one line, and replace & with _&_):

ID-XX "sr=342x487_&_c64=(not set)_&_c1=Phones, MP3s, GPS_&_v=1_&_c33=427_&_d28=Like_&_je=0_&_s4d=4-b_&_c32=(not set)_&_ua=Opera/9.80 (Android; Opera Mini/32.0.2254/77.161; U; uk) Presto/2.12.423 Version/12.16_&_time=04/Dec/2017:20:52:02 +0200_&_qtype=get"
ID-YY "sr=342x487_&_c64=(not set)_&_c1=Phones, MP3s, GPS_&_v=1_&_em=Exception: Error: [$sc:ind] Aborting!_&_ua=Opera/9.80 (Android; Opera Mini/30.0.2254/77.161; U; ru) Presto/2.12.423 Version/12.16_&_time=04/Dec/2017:21:03:07 +0200_&_qtype=get"
ID-ZZ "el=search_&_dl=https://market.com/?dt=Market – Electronics Store | Web Store (Market.com)_&_id=104777577_&_a=770227875_&_t=pageview_&_ua=Mozilla/5.0 (Linux; Android 7.0; RNE-L21 Build/HUAWEIRNE-L21) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.36_&_time=04/Dec/2017:19:39:04 +0200_&_qtype=get"

There aren’t many of those torn request bodies; most of them are on one line, as expected. Also, the file contains only GET requests with an error, so the search pattern doesn’t have to include Get.

Comments on the question:

Show a short section of sample input and your corresponding desired output.
    
Thanks for your attention, @John1024. Hope my clarification is clear.
– cardinal-gray
8 hours ago
    
@cardinal-gray: Are there always three sets of lines like these?
    
@Inian No, there aren’t many of those torn request bodies; most of them are on one line, as expected.
– cardinal-gray
7 hours ago
@cardinal-gray, a single posted item is not enough; post a few Get requests … a mix of single-line and multiline ones.

Answers:

Answer 1:

Awk solution:

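awk 'f {                                  # still inside a request body
         if (/^\[/) { printf "\042\n"; f=0 }               # a new entry starts: close the quote
         else { gsub("&", "_&_"); printf "%s", $0 }        # continuation line: rewrite & here too
     }
     / Get:/ {
         f=1; gsub(/[\[\]]/, "", $4); id=$4     # ID without brackets, e.g. ID-XX
         sub(/^.* Get: /, "")                   # keep only the request body
         gsub("&", "_&_"); printf "%s \042%s", id, $0
     }
     END { if (f) printf "\042\n" }' file
  • / Get:/ – on encountering a “Get request” line
    • f=1 – f is a marker indicating subordinate/inner processing
    • id=$4 – capturing the ID field (e.g. ID-XX)
  • while f is set, a line that does not start with [ is a continuation of the current body; it gets the same & replacement and is appended without a newline
  • \042 is the octal escape for the double quote

The output:

ID-XX "sr=342x487_&_c64=(not set)_&_c1=Phones, MP3s, GPS_&_v=1_&_c33=427_&_d28=Like_&_je=0_&_s4d=4-b_&_c32=(not set)_&_ua=Opera/9.80 (Android; Opera Mini/32.0.2254/77.161; U; uk) Presto/2.12.423 Version/12.16_&_time=04/Dec/2017:20:52:02 +0200_&_qtype=get"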
ID-YY "sr=342x487_&_c64=(not set)_&_c1=Phones, MP3s, GPS_&_v=1_&_em=Exception: Error: [$sc:ind] Aborting!_&_ua=Opera/9.80 (Android; Opera Mini/30.0.2254/77.161; U; ru) Presto/2.12.423 Version/12.16_&_time=04/Dec/2017:21:03:07 +0200_&_qtype=get"
ID-ZZ "el=search_&_dl=https://market.com/?dt=Market – Electronics Store | Web Store (Market.com)_&_id=104777577_&_a=770227875_&_t=pageview_&_ua=Mozilla/5.0 (Linux; Android 7.0; RNE-L21 Build/HUAWEIRNE-L21) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.84 Mobile Safari/537.36_&_time=04/Dec/2017:19:39:04 +0200_&_qtype=get"
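
The asker's original idea, treating a torn request as a single line, can also be sketched by gluing the lines together first and formatting afterwards. The version below is only a sketch under a few assumptions: every log entry starts with "[", the ID is the fourth whitespace-separated field, and the body follows the first " Get: ". The names join_requests and emit are hypothetical and do not come from the answer above.

```shell
#!/bin/sh
# join_requests is a hypothetical helper name; it reads the log on stdin.
join_requests() {
    awk '
    # Print the buffered request as: ID "body", with every & wrapped as _&_.
    function emit(    id, body, parts, pos) {
        pos = index(rec, " Get: ")
        if (rec == "" || pos == 0) return       # nothing buffered, or not a Get entry
        split(rec, parts, " ")                  # parts[4] is the [ID-..] field
        id = parts[4]
        gsub(/^\[|\]$/, "", id)                 # drop the surrounding brackets
        body = substr(rec, pos + 6)             # text after the first " Get: "
        gsub(/&/, "_\\&_", body)                # \\& keeps the ampersand literal
        printf "%s \"%s\"\n", id, body
    }
    /^\[/ { emit(); rec = $0; next }            # "[" opens a new entry: flush the old one
    rec != "" && NF { rec = rec $0 }            # glue a non-empty continuation line on
    END { emit() }
    '
}
```

It would be called as join_requests < file. Because the whole request is buffered before it is formatted, the & replacement and any further field splitting see the complete body, not one physical line at a time.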

Original source:

https://stackoverflow.com/questions/47748055/newline-between-delimiters-in-awk
