How to search groups of 3 lines with a certain pattern?

What I want to do is simply search and print groups of 3 consecutive lines in the following file:
C30 1.86494717 7.48500210 9.88662475
O86 1.23405589 6.84423578 21.24967645
O88 5.28196032 8.12576842 21.24967645
O90 3.01950053 8.12576842 3.03566806
C32 8.01630633 7.48500210 15.95796089
O92 1.07505084 8.12576842 9.10700419
O94 7.22641001 8.12576842 15.17834032
O96 6.07185664 6.20346947 22.02929701
xxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxxx
O111 3.82376560 6.83952632 25.21182108
H29 3.45376598 7.57952642 25.95182118
H30 4.93376561 6.83952632 25.21182108
O112 2.46658853 6.91893543 28.05848681
H31 2.09658891 7.65893553 28.79848692
H32 3.57658854 6.91893543 28.05848681
O113 6.25457469 6.74244996 26.28735053
H33 5.88457507 7.48245006 27.02735064
H34 7.36457470 6.74244996 26.28735053

In this case I want to find the groups of lines whose first characters follow the pattern “O”, “H”, “H”:
Ox
Hx
Hx

I tried something with grep but it didn’t work properly.
Any suggestions?
Many thanks in advance.

Solutions/Answers:

Solution 1:

If I understand what you want, this sed command should work:

sed '/^O/{N;/\nH/{N;/\nH[^\n]*$/p}};d' file

O111               3.82376560          6.83952632         25.21182108
H29                3.45376598          7.57952642         25.95182118
H30                4.93376561          6.83952632         25.21182108
O112               2.46658853          6.91893543         28.05848681
H31                2.09658891          7.65893553         28.79848692
H32                3.57658854          6.91893543         28.05848681
O113               6.25457469          6.74244996         26.28735053
H33                5.88457507          7.48245006         27.02735064
H34                7.36457470          6.74244996         26.28735053

Edit

I messed up: the above won’t work if two or more O lines appear consecutively.

The version below handles that case, although it’s quite a bit longer…

sed '/^O/{:1;N;/\nH/{N;/\nH[^\n]*$/p};/\nO[^\n]*/{s/.*\n//;b1}};d' file
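
To make the difference between the two commands concrete, here is a small check, assuming GNU sed (both commands rely on GNU extensions such as `;` after a label). The four-line sample is made up for illustration and is not taken from the question's file.

```shell
# Made-up sample: two O lines in a row, followed by an H pair.
sample='O1 a
O2 b
H1 c
H2 d'

# First version: "O1" and "O2" are read together and discarded,
# so the O2/H1/H2 group is never tested.
first=$(printf '%s\n' "$sample" | sed '/^O/{N;/\nH/{N;/\nH[^\n]*$/p}};d')

# Revised version: when the appended line starts with O, the older
# O line is dropped and matching restarts from the newer one.
second=$(printf '%s\n' "$sample" | sed '/^O/{:1;N;/\nH/{N;/\nH[^\n]*$/p};/\nO[^\n]*/{s/.*\n//;b1}};d')

printf 'first:  [%s]\n' "$first"
printf 'second: [%s]\n' "$second"
```

With GNU sed, the first command should print nothing for this input, while the second should print the O2/H1/H2 group.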

Solution 2:

Using a newer version of GNU grep, which has the -z option for matching multiline input:

$ grep -Pzo 'O[^\n]+\nH[^\n]+\nH[^\n]+' file.txt
O111               3.82376560          6.83952632         25.21182108
H29                3.45376598          7.57952642         25.95182118
H30                4.93376561          6.83952632         25.21182108
O112               2.46658853          6.91893543         28.05848681
H31                2.09658891          7.65893553         28.79848692
H32                3.57658854          6.91893543         28.05848681
O113               6.25457469          6.74244996         26.28735053
H33                5.88457507          7.48245006         27.02735064
H34                7.36457470          6.74244996         26.28735053
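
One thing to be aware of: with -z, GNU grep terminates each output item with a NUL byte instead of a newline, so consecutive matches can run together on a terminal. A minimal sketch of one way around this, assuming a GNU grep built with -P support, is to translate the NULs with tr. The sample data and the variable names are invented for the example.

```shell
# Made-up sample with two O/H/H groups separated by other lines.
sample=$(printf 'C1 x\nO1 a\nH1 b\nH2 c\nxxx y\nO2 d\nH3 e\nH4 f')

# grep -z ends each match with a NUL byte; tr turns those into newlines
# so that every matched line ends up on its own output line.
matched=$(printf '%s\n' "$sample" \
    | grep -Pzo 'O[^\n]+\nH[^\n]+\nH[^\n]+' \
    | tr '\0' '\n')

printf '%s\n' "$matched"
```

Note that the pattern is not anchored to the start of a line, so an "O" in the middle of a line could also start a match; the sample above avoids that situation.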

You can also use the -M option of pcregrep to match multiline input:

$ pcregrep -M 'O[^\n]+\nH[^\n]+\nH[^\n]+' file.txt 
O111               3.82376560          6.83952632         25.21182108
H29                3.45376598          7.57952642         25.95182118
H30                4.93376561          6.83952632         25.21182108
O112               2.46658853          6.91893543         28.05848681
H31                2.09658891          7.65893553         28.79848692
H32                3.57658854          6.91893543         28.05848681
O113               6.25457469          6.74244996         26.28735053
H33                5.88457507          7.48245006         27.02735064
H34                7.36457470          6.74244996         26.28735053

Solution 3:

gawk -vRS='(^|\n)O[^\n]*\nH[^\n]*\nH[^\n]*' '{print RT}'

^ matches the beginning of the file, not the beginning of any line (this may be a dark corner).
RT is the text that matched RS.
You need GNU Awk for this; standard Awk doesn’t allow regex record separators.

Solution 4:

You can use this awk:

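awk '/^O/ { oline=NR; a=$0; next }   # remember the most recent O line
     # first H, directly after the remembered O line
     /^H/ && oline && NR==(oline+1) { hline=NR; a=a RS $0; next }
     # second H, directly after the first H: print the group and reset
     /^H/ && hline && NR==(hline+1) {
       print a ORS $0;
       oline=hline=0
}' file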

O111               3.82376560          6.83952632         25.21182108
H29                3.45376598          7.57952642         25.95182118
H30                4.93376561          6.83952632         25.21182108
O112               2.46658853          6.91893543         28.05848681
H31                2.09658891          7.65893553         28.79848692
H32                3.57658854          6.91893543         28.05848681
O113               6.25457469          6.74244996         26.28735053
H33                5.88457507          7.48245006         27.02735064
H34                7.36457470          6.74244996         26.28735053

Solution 5:

awk '
{ k = substr($0,1,1) }
(k=="H") && (prevNR["H"]==(NR-1)) && (prevNR["O"]==(NR-2)) {
    print prevRec["O"] ORS prevRec["H"] ORS $0
}
{ prevNR[k]=NR; prevRec[k]=$0 }
' file
O111               3.82376560          6.83952632         25.21182108
H29                3.45376598          7.57952642         25.95182118
H30                4.93376561          6.83952632         25.21182108
O112               2.46658853          6.91893543         28.05848681
H31                2.09658891          7.65893553         28.79848692
H32                3.57658854          6.91893543         28.05848681
O113               6.25457469          6.74244996         26.28735053
H33                5.88457507          7.48245006         27.02735064
H34                7.36457470          6.74244996         26.28735053
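
If the pattern might change later (say "OHHH" or "CO"), the same idea can be generalised. The following is only a sketch, not one of the answers above: `find_groups` is a hypothetical helper name, and it assumes the pattern is given as a string of first characters, one per line. It should work with any POSIX awk.

```shell
# find_groups PATTERN [FILE...]
# PATTERN is a string of first characters, one per line, e.g. "OHH".
# Reads standard input when no FILE is given.
find_groups() {
    pat=$1
    shift
    awk -v pat="$pat" '
    BEGIN { n = length(pat) }
    {
        k = substr($0, 1, 1)
        if (k == "") k = " "        # placeholder so empty lines still occupy one slot
        buf[NR % n] = $0            # ring buffer holding the last n lines
        keys = keys k               # first characters of the last n lines
        if (length(keys) > n) keys = substr(keys, 2)
        if (keys == pat)
            for (i = NR - n + 1; i <= NR; i++) print buf[i % n]
    }' "$@"
}
```

For the file in the question, `find_groups OHH file` would be the call to try. For patterns that can overlap with themselves (such as "HH"), this sketch prints every overlapping window.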
