Ask HN: How do systems (or people) detect when a text is written by an LLM?
Hello guys, just curious about how people or systems (computers) can detect when a text was written by an LLM. My question is mainly about whether there is some API or similar to detect if a text was written by an LLM. Thanks!!!
Discussion Highlights (19 comments)
dipb
Humans detect them mostly through pattern matching. For systems, my guess is that an ML model is trained on AI-generated texts to tell them apart from human-written ones.
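A minimal sketch of the classifier idea dipb describes. This is a toy, not a real detector: actual systems train on large labeled corpora, while the two "corpora", the smoothing value, and the scoring rule here are all made up for illustration.

```python
from collections import Counter

# Tiny made-up "labeled corpora" standing in for real training data.
ai_corpus = "we delve into the vibrant landscape additionally we delve deeper"
human_corpus = "i tried it last week and it broke twice so i gave up honestly"

def word_freqs(corpus):
    # Relative frequency of each word in the corpus.
    words = corpus.split()
    counts = Counter(words)
    total = len(words)
    return {w: c / total for w, c in counts.items()}

ai_freqs = word_freqs(ai_corpus)
human_freqs = word_freqs(human_corpus)

def ai_score(text):
    # Positive score: the text's words are more typical of the "AI" corpus;
    # negative: more typical of the "human" one. 0.001 is arbitrary smoothing.
    score = 0.0
    for w in text.split():
        score += ai_freqs.get(w, 0.001) - human_freqs.get(w, 0.001)
    return score

print(ai_score("delve into the vibrant landscape"))  # > 0: leans "AI"
print(ai_score("i gave up honestly"))                # < 0: leans "human"
```

A real system would use a trained model (e.g. a fine-tuned transformer) rather than raw word frequencies, but the principle is the same: learn a decision boundary from labeled examples of both classes.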
moonu
Pangram is probably the best-known example of a detector with low false positives; they have a research paper here: https://arxiv.org/pdf/2402.14873 . They do have an API, but I'm not sure if you need to request access for it. For humans, I think it just comes down to interacting with LLMs enough to recognize their quirks, but that's not really foolproof.
Someone1234
They cannot. Unfortunately many believe they can, and it is impossible to disprove. So now real people need to write avoiding certain styles, because a lot of other people have decided those are "LLM clues": bullets, em dashes, certain common English phrases or words (e.g. "delve", "vibrant", "additionally", etc.)[0]. Basically, you need to sprinkle in subtle mistakes, or lower the quality of your written communication, to avoid accusations that will side-track whatever you're writing into a "you're a witch" argument. Ironically, LLM accusations are now a sign of high-quality writing. [0] https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
PufPufPuf
I "detect" them through overuse of some patterns, like "It's not X. It's Y." This is an artifact of the default LLM writing style, cross-poisoned through training on outputs -- not an "universal" property.
mjlee
People Look For: Specific language tells, such as: unusual punctuation, including em-dashes and semicolons; hedged, safe statements, but not always; and text that showcases certain words such as “delve”. Here’s the kicker. If you happen to include any of these words or symbols in your post they’ll stop reading and simply comment “AI slop”. This adds even less to the conversation than the parent, who may well be using an LLM to correct their second or third language and have a valid point to make.
booleandilemma
I'm not going to tell you. I don't want that information going into the dark forest :)
m_w_
I don’t think there’s a reliable system or API for doing so, and it's unclear the arms race will ever favor the side of the detectors. As for how I / other people do it: there are some obvious styles that reek of LLMs (ChatGPT in particular, I think). There’s a very common structure of “nice post, the X to Y is real. miscellaneous praise — blah blah blah. Also curious about how you asjkldfljaksd?" From today: this comment is almost certainly AI-generated: https://news.ycombinator.com/item?id=47658796 . And I'm suspicious of this one too - https://news.ycombinator.com/item?id=47660070 - it reads just a bit too glazebot-9000 to believe it's written by a person.
dezgeg
For HN comments, the LLMs seem to really like responses that are 2 or 3 paragraphs long. It's pretty obvious when you click a profile's comments and see every comment following that exact same structure.
RestartKernel
People look for tells, systems detect word distributions. Though neither is as reliable as active fingerprinting using an encoded watermark.
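The "encoded watermark" RestartKernel mentions can be sketched concretely. This toy follows the spirit of green-list watermarking: the generator pseudorandomly splits the vocabulary into "green" and "red" halves seeded by the previous token and biases sampling toward green; the detector just counts green tokens. Everything here (the hash-based split, the sample text) is an illustration, not any vendor's actual scheme.

```python
import hashlib

def is_green(prev_token, token):
    # Deterministically assign each (prev, token) pair to the green or red
    # half of the vocabulary; the hash stands in for the generator's seeded RNG.
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return h[0] % 2 == 0

def green_fraction(tokens):
    # Fraction of tokens that landed in the green list for their context.
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

# Unwatermarked text should hover near 0.5. Text from a watermarked
# generator would push this fraction well above 0.5, which a detector
# flags with a statistical test.
sample = "the cat sat on the mat".split()
print(green_fraction(sample))
```

The key point: detection needs no access to the model, only to the (secret) seeding scheme, which is why active watermarking can be far more reliable than post-hoc stylistic guessing.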
rcxdude
There are some systems that use LLMs themselves to detect machine writing (basically, if the text matches what the LLM would predict too well, it's probably LLM-generated), but they are far from infallible (with both false positives and false negatives). There are also certain tropes and quirks that LLMs tend to overuse, which can be fairly obvious tells, but they can be suppressed, and they do represent how some people actually write.
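The "matches the LLM's predictions too well" test is usually framed as perplexity. A toy sketch, with made-up per-token probabilities standing in for a real language model's output (a real detector would get these from the model itself):

```python
import math

def perplexity(token_probs):
    # token_probs: the model's probability for each token in the text.
    # Perplexity is the exponential of the average negative log-likelihood;
    # lower means the model found the text more predictable.
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

# A model finds text in "its own" style highly probable...
llm_like = [0.9, 0.8, 0.85, 0.9, 0.7]
# ...while quirky human prose contains lower-probability word choices.
human_like = [0.3, 0.05, 0.4, 0.1, 0.2]

print(perplexity(llm_like))   # low perplexity: suspiciously predictable
print(perplexity(human_like)) # higher perplexity: reads as human
```

This is also why the method has false positives: formulaic human writing (boilerplate, legal prose) scores as predictable too, and an LLM prompted toward an unusual style scores as unpredictable.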
block_dagger
Em dashes, “it’s x, not y”, excessive emojis and arrows.
blanched
I don't think there's any reliable way to tell. To me, it often feels like the text version of the uncanny valley. But again, that's just "feels", I don't have proof or anything.
mghackerlady
Overuse of "it's not X, it's Y" writing, strange shifts in writing or thinking patterns, and excessive formatting (or, especially on Wikipedia, ineffective formatting, such as using Markdown where it isn't supported).
rwc
Contrastive negation continues to be a dead giveaway.
leumon
You can try an AI detector; here is a leaderboard of the best ones according to this benchmark: https://raid-bench.xyz/leaderboard . Results should of course always be taken with a grain of salt, but in most cases the detectors are quite good, in my opinion.
gwbas1c
I don't think you can 100% detect AI content, because at some point someone will just prompt the AI not to sound like AI. I think the better question to ask is: what are your goals? Is it to prevent AI spam, or to discourage people from copy-pasting AI? Those are two very different problems: in the case of AI spam you look for patterns of usage (i.e., unusually high interaction from a single IP, timing patterns around when things are read and when the response comes in), and in the other case it all comes down to cultural norms.
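The usage-pattern side of this can be sketched with a sliding-window rate check per IP. The window size and threshold below are made-up illustration values; a production system would combine many more signals (read-to-reply timing, account age, etc.).

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60       # look at activity in the last minute
MAX_POSTS_PER_WINDOW = 5  # more than this looks automated

recent_posts = defaultdict(deque)  # ip -> timestamps of recent posts

def is_suspicious(ip, timestamp):
    # Record the post, drop timestamps outside the window, and flag the
    # IP if it has posted faster than a human plausibly could.
    q = recent_posts[ip]
    q.append(timestamp)
    while q and timestamp - q[0] > WINDOW_SECONDS:
        q.popleft()
    return len(q) > MAX_POSTS_PER_WINDOW

# Six posts within ten seconds from one IP trips the threshold on post six.
flags = [is_suspicious("10.0.0.1", t) for t in range(0, 12, 2)]
print(flags)  # [False, False, False, False, False, True]
```

Note this detects automation, not LLM authorship specifically, which is exactly gwbas1c's point: the right mechanism depends on which problem you are actually trying to solve.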
Havoc
You don't really. There are a couple of tells like em dashes and similar patterns but you should be able to suppress that with even a simple prompt.
noufalibrahim
It's a lot easier to detect when you mostly interact with non-native English speakers. I asked an LLM to rewrite this comment to make it nicer and got the following. I'd flag the first because I don't usually hear "majority of your interactions" in conversation, but I might miss it. The second will probably get by me. As for the third, I never say "considerably easier" unless I'm trying to sound artificially posh.
1. It becomes much more noticeable when the majority of your interactions are with non-native English speakers.
2. It tends to stand out more when most of the people you interact with speak English as a second language.
3. It's considerably easier to identify when most of your interactions involve people whose primary language isn't English.
sigotirandolas
I don't look at whether the text is written by an LLM, but at whether it has substance and whether the writer understands what they are doing and is respecting my time. If the text is full of punchy three-word phrases or nonsense GenAI images, then that's an obvious sign. But so is it if the other person has some revolutionary project with great results but can't really explain why their solution works where presumably many failed in the past (or it's a word salad, or some lengthy writing that doesn't show any signs of getting you to an "aha, that's some great insight" moment). A good sign is also if the author had something interesting going before 2022, and they didn't fall into the earliest low-quality LLM waves. Unfortunately, some genuinely talented people have nowadays started using LLMs to turbocharge their output while leaving some quality on the table, so I don't really know. I'm becoming a lot more sceptical of the Internet, to be honest.