The field separator of the split
function is a regular expression, so you can split on =
OR ;
. If you know that $9
begins with "ID=", then
awk -v OFS='\t' '
$3 == "gene" {
split($9, id, /[=;]/)
print $1, $4, $5, id[2], $6, $7
}
' genes.gff3
If "ID=" is not necessarily at the beginning of the field, then there's a little more work to do:
awk -v OFS='\t' '
$3 == "gene" {
id = ""
len = split($9, f, /[=;]/)
for (i=1; i<len; i++) {
if (f[i] == "ID") {
id = f[i+1]
break
}
}
print $1, $4, $5, id, $6, $7
}
' genes.gff3