Bug fix: perfect match hits (no indels or mismatches) were skipped #873

Merged
daviesrob merged 1 commit into samtools:develop from nh13:nh_blast2sam_perfect_matches
Jun 20, 2018

Conversation

@nh13
Member

nh13 commented Jun 15, 2018

No description provided.

@nh13
Member Author

nh13 commented Jun 15, 2018

Perhaps this tool should be removed, as newer versions of blast support SAM output?

@daviesrob
Member

blastn's SAM output seems very odd. When I try this query sequence:

>test
CTTGGTATTTACTCAAAGGAGTTGAAGACATGTCCACAAAAAAACCTGCACACAGATATTCATAGCAGCTTTATTCGTAA
TTGCCAAAACTTGAAAGCAACCAAGATACCCTTCAGTAGGTGAATGGAGAAATAAACTGTGGTACATCCAGATAATAGAA

against GCA_000001405.15_GRCh38_full_analysis_set.fna I get output like this:

@HD	VN:1.2	SO:coordinate	GO:reference
@SQ	SN:Query_1	LN:160
@PG	ID:0	VN:2.7.1+	CL:blastn -db GCA_000001405.15_GRCh38_full_analysis_set.fna -query /tmp/q -outfmt 17 	PN:blastn
chr9	0	Query_1	1	255	72373920H160M66020637H	*	0	0	*	*	AS:i:160	EV:f:2.21948e-78	NM:i:0	PI:f:100.00	BS:f:296.584
chr4	16	Query_1	1	255	18102142H29M2I60M1D70M172112252H	*	0	0	*	*	AS:i:94	EV:f:1.08505e-41	NM:i:3	PI:f:88.05	BS:f:174.705
[ ... rest of output snipped ... ]

So it's changed my query name from test to Query_1 and then used it as the reference instead. Meanwhile, it's taken the reference names from the blast database and used them as query names. I wonder if they really intended it to work that way?

Given that oddness, there is possibly some utility in keeping this script around even though it has its own problems (like it doesn't add any header lines).
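The column swap described above can be checked mechanically. The following is a minimal Python sketch, not the logic of the script this PR fixes: it parses the two records quoted in this comment, shows which names land in the QNAME and RNAME columns, and flags a "perfect match" hit. The helper names `parse_record` and `is_perfect_match` are made up for illustration, and the definition of a perfect match used here (only `M` or `=` operations once clipping is ignored, plus `NM:i:0`) is an assumption based on the PR title.

```python
import re

# Two records copied from the blastn output quoted above (tab-separated).
RECORDS = [
    "chr9\t0\tQuery_1\t1\t255\t72373920H160M66020637H\t*\t0\t0\t*\t*"
    "\tAS:i:160\tEV:f:2.21948e-78\tNM:i:0\tPI:f:100.00\tBS:f:296.584",
    "chr4\t16\tQuery_1\t1\t255\t18102142H29M2I60M1D70M172112252H\t*\t0\t0\t*\t*"
    "\tAS:i:94\tEV:f:1.08505e-41\tNM:i:3\tPI:f:88.05\tBS:f:174.705",
]

# One CIGAR element: a length followed by a single operation character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_record(line):
    """Split one SAM alignment line into the fields this sketch needs."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for tag in fields[11:]:
        name, _type, value = tag.split(":", 2)
        tags[name] = value
    return {
        "qname": fields[0],   # column 1: query name in standard SAM
        "flag": int(fields[1]),
        "rname": fields[2],   # column 3: reference name in standard SAM
        "cigar": fields[5],
        "tags": tags,
    }


def is_perfect_match(record):
    """Assumed definition: only M/= operations after dropping clips, and NM is 0."""
    ops = [op for _length, op in CIGAR_OP.findall(record["cigar"]) if op not in "HS"]
    return bool(ops) and set(ops) <= {"M", "="} and record["tags"].get("NM") == "0"


for rec in map(parse_record, RECORDS):
    print(rec["qname"], rec["rname"], rec["cigar"], is_perfect_match(rec))
```

With the default `-outfmt 17` output shown above, the QNAME column holds the database sequence names (`chr9`, `chr4`) and the RNAME column holds `Query_1`, which is the reverse of what a SAM consumer would expect.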

daviesrob merged commit 475464c into samtools:develop Jun 20, 2018
@nh13
Member Author

nh13 commented Jun 20, 2018

@daviesrob try -outfmt "17 SR SQ". The SQ will include query sequence (bases), while SR will swap query and reference, making it as you'd expect with SAM (header lines, query). The issue I have seen is that the blastn header will only include the references to which there are mappings, not all references. I'd vote to remove the script as it isn't well tested, and really we should patch blast.

I'd be happy to patch blast, but I only see source tarballs.
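For anyone scripting the suggestion above, here is a small Python sketch that builds the blastn command line without running it. The `-db`, `-query` and `-outfmt` options and the paths are taken from the `@PG` line quoted earlier in the thread; the function name `blastn_sam_command` is hypothetical. The point it illustrates is that `17 SR SQ` has to reach blastn as a single argument, so it needs quoting in a shell.

```python
import shlex


def blastn_sam_command(db, query):
    """Build (but do not run) a blastn argument list with the SR and SQ modifiers."""
    # "17 SR SQ" is one argument: format 17 (SAM) plus the two modifiers
    # suggested in the comment above.
    return ["blastn", "-db", db, "-query", query, "-outfmt", "17 SR SQ"]


cmd = blastn_sam_command("GCA_000001405.15_GRCh38_full_analysis_set.fna", "/tmp/q")

# shlex.join quotes the format string so it can be pasted into a shell.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd)` on a machine where blastn and the database are available, which avoids shell quoting altogether.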

nh13 deleted the nh_blast2sam_perfect_matches branch June 20, 2018 16:18
@daviesrob
Member

Ah yes, that improves it. It still doesn't give the right query name, though.

I might add a note to its manual page to say there are better alternatives. I doubt many people use it given that you're the first to report a fairly obvious bug.
