Posts: 4,665
Threads: 1,502
Joined: Sep 2016
i am looking for a script (or function to code with) that can detect if a file of code given to it is or is not Python code. it does not need to be exact. it should generally come up with the same answer as a human programmer who knows enough Python to get some code to work. it should not depend on any specific ways to cause the Python interpreter to run (such as a Unix hash-bang). it does not need to work in case someone is trying to fool it (e.g. Java code made to look Pythonic). it should not try to test compile its test script.
has anyone seen such a script?
Tradition is peer pressure from dead people
What do you call someone who speaks three languages? Trilingual. Two languages? Bilingual. One language? American.
Posts: 4,900
Threads: 79
Joined: Jan 2018
(Apr-09-2026, 02:12 AM)Skaperen Wrote: it should not try to test compile its test script.
Why is that so? Trying to compile looks like the most obvious solution.
You can already run the Python tokenizer in order to eliminate some languages. Then one way or another you need to perform syntax analysis. One problem with Python's own parser is that the result may depend on which version of Python runs the check: code written for a newer version can be rejected by an older parser. Anyway, you must run some parser. You could perhaps run a function such as pygments.lexers.guess_lexer() to guess the language.
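To illustrate the tokenizer idea, here is a minimal sketch using only the standard library's tokenize module. The function name tokenizes_as_python is made up for this example. Note that this only eliminates some inputs: plenty of non-Python text (for example, C code with balanced braces) still tokenizes cleanly, so a pass here means "not ruled out", not "is Python".

```python
import io
import tokenize


def tokenizes_as_python(source: str) -> bool:
    """Return True if the stdlib tokenizer accepts the whole text.

    Hypothetical helper: a coarse filter, not a language detector.
    """
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            # Some interpreter versions report characters they cannot
            # tokenize as ERRORTOKEN instead of raising, so treat that
            # as a failure too.
            if tok.type == tokenize.ERRORTOKEN:
                return False
    except (tokenize.TokenError, SyntaxError):
        # IndentationError is a subclass of SyntaxError.
        return False
    return True
```

Unclosed brackets and inconsistent dedents are the kind of thing this catches. Anything finer needs a parser or a guesser on top.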
« We can solve any problem by introducing an extra level of indirection »
Apr-09-2026, 09:57 PM
(Apr-09-2026, 04:57 AM)Gribouillis Wrote: Trying to compile looks like the most obvious solution.
i agree that it looks like the most obvious solution. however, it is possible to do a compile afterwards if the code looks like it is Python code. IOW, this would be a front end test where compiling is possibly next (no compile attempted if it does not look like it is Python). i don't need to know what language it is.
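A rough sketch of that two-stage idea, assuming a simple line-scoring heuristic as the front end. The patterns and the names looks_like_python and check are invented for this example and would need tuning on real files. compile() only builds a code object and does not run anything.

```python
import re

# Hypothetical, hand-picked patterns; adjust them for your own files.
NON_PYTHON_HINTS = [
    re.compile(r"[;{]\s*$"),                # statement terminator or block opener
    re.compile(r"^\s*(#include|//|/\*)"),   # C-family preprocessor and comments
]
PYTHON_HINTS = [
    re.compile(r"^\s*(def|class)\s+\w+.*:\s*(#.*)?$"),
    re.compile(r"^\s*(import\s+\w+|from\s+[\w.]+\s+import\s+)"),
    re.compile(r"^\s*(if|elif|for|while|with|try|except|else|finally)\b.*:\s*(#.*)?$"),
]


def looks_like_python(text: str) -> bool:
    """Front-end test: score each line, no parsing or compiling."""
    score = 0
    for line in text.splitlines():
        # Check the non-Python hints first so that a line such as
        # "import java.util.List;" counts against Python.
        if any(p.search(line) for p in NON_PYTHON_HINTS):
            score -= 1
        elif any(p.search(line) for p in PYTHON_HINTS):
            score += 1
    return score > 0


def check(text: str) -> str:
    """Only attempt a compile when the front end says it looks like Python."""
    if not looks_like_python(text):
        return "not python"
    try:
        compile(text, "<candidate>", "exec")
    except (SyntaxError, ValueError):
        return "looks like python, does not compile"
    return "python"
```

A heuristic like this will misjudge some files (a Python file made mostly of dict literals, say), which seems acceptable given that the answer does not need to be exact.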