mosselman 17 minutes ago [-]
I feel like the audience of the file is me, the reader, rather than the LLM.
> Add this file to your AI assistant's system prompt or context to help it avoid
common AI writing patterns.
So if I put this into my LLM's conversation, it's as if I'm instructing it to put this into its AI assistant's system prompt: the AI assistant's AI assistant.
The alternative is to say:
"Here is a list of common AI tropes for you to avoid"
All the tropes are described for me, so that I understand what the AIs do wrong:
> Overuse of "quietly" and similar adverbs to convey subtle importance or understated power.
But this in fact instructs the assistant to start overusing the word 'quietly' rather than stop overusing it.
This is then counteracted a bit by the 'avoid the following...' framing, but it means the file is full of contradictions.
Instead you'd need to say:
"Don't overuse 'quietly', use ... instead"
So while this is a great idea and a great list, I feel the execution is muddled by the explanation of what it is. I'd separate the presentation for us, the users of assistants, from the version for the intended consumer, the actual assistants.
I've had Claude rewrite it and put it in this gist: https://gist.github.com/abuisman/05c766310cae4725914cd414639...
I work on research studying LLM writing styles, so I am going to have to steal this. I've seen plenty of lists of LLM style features, but this is the first one I noticed that mentions "tapestry", which we found is GPT-4o's second-most-overused word (after "camaraderie", for some reason).[1] We used a set of grammatical features in our initial style comparisons (like present participles, which GPT-4o loved so much that they were a pretty accurate classifier on their own), but it shouldn't be too hard to pattern-match some of these other features and quantify them.
If anyone who works on LLMs is reading, a question: When we've tried base models (no instruction tuning/RLHF, just text completion), they show far fewer stylistic anomalies like this. So it's not that the training data is weird. It's something in instruction-tuning that's doing it. Do you ask the human raters to evaluate style? Is there a rubric? Why is the instruction tuning pushing such a noticeable style shift?
[1] https://www.pnas.org/doi/10.1073/pnas.2422455122, preprint at https://arxiv.org/abs/2410.16107. Working on extending this to more recent models and other grammatical features now.
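Quantifying these features really is mostly pattern-matching. A minimal sketch, using hypothetical marker words and a crude "-ing" heuristic for present participles (not the study's actual feature set, which is curated and uses a proper tagger):

```python
import re
from collections import Counter

# Hypothetical marker words for illustration; a real feature set would
# be curated, and a proper participle count needs a POS tagger.
MARKERS = {"tapestry", "camaraderie", "delve", "quietly"}

def style_features(text: str) -> dict:
    """Per-1000-word rates for marker words and for '-ing' forms,
    a crude proxy for present participles."""
    words = re.findall(r"[a-z]+", text.lower())
    n = len(words) or 1
    counts = Counter(words)
    marker_rate = 1000 * sum(counts[w] for w in MARKERS) / n
    ing_rate = 1000 * sum(1 for w in words if len(w) > 4 and w.endswith("ing")) / n
    return {"marker_per_1k": marker_rate, "ing_per_1k": ing_rate}

print(style_features("Quietly weaving a rich tapestry of camaraderie, the team kept delving."))
```

On a real corpus you'd compare these rates between human and model text rather than reading single-sample numbers.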
The RLHF is what creates these anomalies. See "delve" and its link to RLHF annotators in Kenya and Nigeria.
Interestingly, because perplexity is the optimization objective, the pretrained models should reflect the least surprising outputs of all.
astrange 19 minutes ago [-]
The newer Claude models constantly use the word "genuinely", because Anthropic seems to have forcibly trained them to claim to be "genuinely uncertain" about anything Anthropic doesn't want them to be too certain about, like whether or not they're sentient.
I wonder if the style shift has anything to do with training for conversation (i.e. tuning models to respond well in a chat situation)?
red_hare 3 hours ago [-]
I wonder if it has to do with how meaning is tied to the tokens: "camaraderie" splits as c + amara + derie (using the official GPT-5 tokenizer).
There's also just that weird thing where they're obsessed with emoji which I've always assumed is because they're the only logograms in english and therefore have a lot of weight per byte.
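For what it's worth, the byte side of that intuition is easy to check. A quick sketch (token counts are tokenizer-specific, so this only shows the UTF-8 byte weight, not how any particular model tokenizes emoji):

```python
# UTF-8 byte lengths: an emoji carries four times the bytes of an
# ASCII letter, which fits the "weight per byte" intuition above.
samples = ["e", "é", "→", "🔥"]
for s in samples:
    print(f"{s!r}: {len(s.encode('utf-8'))} byte(s)")  # 1, 2, 3, 4 respectively
```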
astrange 19 minutes ago [-]
OAI puts instructions in the system prompt to use or not use emoji depending on your style settings.
albert_e 7 hours ago [-]
There is an organization named Tapestry (parent of Coach Inc).
I wonder how they can avoid the trope while not censoring themselves out.
Jordan-117 10 hours ago [-]
Wikipedia also has an exhaustive guide, though it's not fun finding tropes you use yourself (I'm very guilty of the false range "from X to Y" thing):
Another one that seems impossible for LLMs to avoid: splitting an article title into a title and a subtitle, separated by a colon. Even if you explicitly tell it not to, it'll do it.
malfist 8 hours ago [-]
That's the thing about AI writing, though. Those tropes are things humans do too, but maybe once or twice in an article. Not every single freaking paragraph.
glenstein 7 hours ago [-]
I also think you can easily get overzealous with it and diagnose increasingly large percentages of ordinary human language as "tropified" due to being part of recognizable cadences. I think most of the things on the list are legit but I think it starts to get to a gray area where it's borrowing ordinary mannerisms of speech that aren't necessarily egregious.
lucumo 3 hours ago [-]
Yes, and it's a detection loop without feedback. You can never verify that a piece of work in the wild is actually AI. The poster is the only one who really knows, and they'll always say it's not.
This is a problem, because you can easily get stuck in a self-reinforcing loop. You feel strengthened in your convictions that you're good at ferreting out LLM-speak because you've found so much of it. And you find so much of it because you feel confident you're good at it. Nobody ever corrects you when you're wrong.
Combine that with general overconfidence and you get threads where every other post with correct grammar gets "called out" as AI generated. It's pretty boring.
There's a similar effect with contentious subjects. You get reams and reams of posts calling the other side out for being part of a Russian/Israeli/Iranian/Chinese troll network. There's no independent falsification or verification of that, so people just get strengthened in their existing beliefs.
grey-area 1 hours ago [-]
At this point it’s pretty easy to detect unaltered LLM output because it is such bad writing. That will change over time with training I would hope. At some point I imagine it will be hard to tell.
I honestly don’t know what sites like this will do when that happens and the only way of detecting LLMs is that they are subtly wrong or post too much, we’d be overrun with them.
Not sure if we should be hopeful or fearful that they will improve to be undetectable, but I suspect they will.
sebastiennight 20 minutes ago [-]
I wouldn't say it's "bad writing", but rather that the sheer volume of it allows the attentive reader to quickly identify the tropes and get bored of them.
Similar to how you can watch one fantastic western/vampire/zombie/disaster/superhero movie and love it, but once Hollywood has decided that this specific style is what brings in the money, they flood the zone with westerns, or superhero movies or whatever, and then the tropes become obvious and you can't stand watching another one.
If (insert your favorite blogger) had secret access to ChatGPT and was the only person in the world with access to it, you would just assume that it's their writing style now, and be ok with it as long as you liked the content.
lucumo 31 minutes ago [-]
> At this point it’s pretty easy to detect unaltered LLM output because it is such bad writing.
And yet people seem to still be terrible at that. Someone uses an em-dash and there's always a moron calling it out as AI.
> I honestly don’t know what sites like this will do when that happens and the only way of detecting LLMs is that they are subtly wrong or post too much, we’d be overrun with them.
My personal take is that it doesn't really matter. Most posts are already knee-jerk reactions with little value. Speaking just to be talking. If LLMs make stupid posts, it'll be basically the same as now: scroll a bit more. And if they chance upon saying something interesting then that's a net gain.
The very first heading in this doc was a giveaway even after your de-LLM pass: 'The Em-Dash Pivot: "Not X—but Y"'. The title is so AI-like. I think it's the "The" in the title that's putting me off; it comes off as assigning unnecessary importance, which is mentioned in the wiki.
matusp 4 hours ago [-]
I tried using Gemini for some light historical research. It could not stop using tech metaphors. Lords were the CEOs of their time, pope was the most important influencer, vassal uprisings were job interviews, etc. The metaphors were almost comically useless and imprecise, and Gemini kept using them even when I explicitly asked it to not do that.
lucumo 3 hours ago [-]
I think that's Gemini trying to personalize the answer specifically for you. It really leans heavily into that to the point of being galling.
You can give it additional instructions in the settings, but you have to be careful with that too. I've put my tech stack and code preferences in there to get better code examples. A while later I asked it about binary executable formats and it started ending every answer with "but the JVM and v8 take care of that for you."
Which is both funny in an "I, Robot" kind of way, and irritating. So I told it to ignore my tech stack. I have a master's in CS and can handle a bit of technical detail.
Turns out, Gemini learned sarcasm. Every following answer in that thread got a paragraph that started with something like "But for your master brain, this means..."
joshvm 10 hours ago [-]
No mention of Claude/ChatGPT's favourite new word "genuine" and friends? They also like using "real" and "honest" when giving advice. As far as I can tell this is a new-ish change.
> Honestly? We should address X first. It's a genuine issue and we've found a real bug here.
Honorable mention: "no <thing you told me not to do>". I guess this helps reassure adherence to the prompt? I see that one all the time in vibe coded PRs.
pixelmelt 2 hours ago [-]
I found a new one in claude recently with "Fair enough, ..."
glenstein 7 hours ago [-]
There are some subreddits where this trope is completely out of control. For better or worse I follow the NBA subreddit and in the comment sections the number of people who throw in honestly as a qualifier is like way more than you would assume from natural conversation.
stingraycharles 3 hours ago [-]
I really don’t understand what’s wrong with people using LLMs for these types of mundane conversations. There’s nothing to gain, and it destroys the value of online discourse.
wisemang 6 hours ago [-]
I’ve noticed the honestly thing for sure.
But I feel like I’ve noticed an uptick in people using the adverb “genuinely” in what I genuinely believe to not be AI generated comments, articles, etc. Maybe it’s just me, I got similar vibes about the word efficacy a few years ago, before the ascent of GenAI (but after the pandemic — again, maybe just me).
pinum 9 hours ago [-]
Similarly, "X that actually works"
layer8 9 hours ago [-]
...and half of the time still doesn't do what you want.
nprateem 2 hours ago [-]
And the "final version" statement. Irrelevant, as it obviously has no idea how many iterations you'll go through.
thih9 6 hours ago [-]
> no <thing you told me not to do>
I see this so often. Sometimes it’s just “no react hooks”, other times it gets literal and extra unnatural, like: “here’s <your thing>, no unnecessary long text explanation”. Perhaps we’re past AGI and this is passive aggressiveness ;)
If this bugs you, open ChatGPT's personality settings, choose the “efficient” base style, and turn off the enthusiasm and warmth sliders.
It makes a tremendous difference. Almost everything on this list is the emotional fluff ChatGPT injects to simulate a personality.
FiniteIntegral 2 hours ago [-]
A subtle tell for generated text is just how damn flat it is to read. Not that technical documentation requires some form of grand prose, but it's striking how unspecific the text can get. A high school persuasive essay can have more detail, and those are often written just for a grade.
I can understand someone needing help with writing but getting an agent to do the job for you feels like a personal defeat.
mvkel 11 hours ago [-]
Weirdly, LLMs seem to break with these instructions. They simply ignore them, almost as if the pretraining/RL weights are so heavy that no amount of system prompting can override them.
RandomWorker 11 hours ago [-]
It's a beauty. It's easy to spot these issues with YouTubers who generate their scripts this way. When I notice these tropes, after 30 seconds I remove, block, and "do not recommend" any further. I hope to train the algorithm to detect AI scripts and stop recommending me those videos. It's honestly turned me off YouTube so much that I find myself going to my "subscribed" tab and to content creators who still believe in the craft.
antinomicus 9 hours ago [-]
I’ve taken it one step further. YouTube as a front end is awful, and I’ve had enough. Tons of little dark patterns made to keep you on the site, annoying algorithms taking you places you never want to go, shitty ai slop, the whole nine yards. But I still like certain channels. As a result I’m doing everything self hosted now - not just YouTube but literally every single piece of digital media I consume. For YouTube I had to create a rotating pool of 5 residential ISP proxies - replaced as soon as YouTube download bot restrictions kick in - and rotated weekly either way.
With this I am able to get all my favorite subs onto my actual hard drive, with some extra awesome features as a result: I vibe coded a little helper app that lets me query the transcript of the video and ask questions about what they say, using cheap haiku queries. I can also get my subs onto my jellyfin server and be able to view it in there on any device. Even comments get downloaded.
All these streamers have gone too far trying to maximize engagement and have broken the social contract, so I see this as totally fair game.
duskwuff 9 hours ago [-]
IIRC, it's well documented that negative instructions tend to be ineffective - possibly through some sort of LLM analogue to the "pink elephant paradox", or simply because the language models are unable to recognize clichés until they've already been generated.
esperent 8 hours ago [-]
That was definitely true with early LLMs, but I don't know if that's still the case; certainly not as strongly as it used to be. I think most negative instructions are now followed quite well, but there are still a few things, probably deeply embedded from pretraining, that are harder to avoid: these specific annoying phrasings, for example.
esperent 8 hours ago [-]
I assume it'll work better as a review pass than as something you expect good results from outright. For all kinds of things like this where I feel like I'm fighting the LLM, doing the initial work and then auditing it seems to be the best approach (the other one is writing all kinds of tests; LLMs, including Opus 4.6, love to fudge tests just as much as they love telling you how insightful you are).
carleverett 11 hours ago [-]
> The "It's not X -- it's Y" pattern, often with an em dash. The single most commonly identified AI writing tell. Man I f*cking hate it. AI uses this to create false profundity by framing everything as a surprising reframe. One in a piece can be effective; ten in a blog post is a genuine insult to the reader. Before LLMs, people simply did not write like this at scale.
This one hit home... the first time I ever saw Claude do it I really liked it. It's amazing how quickly it became the #1 most aggravating thing it does just through sheer overuse. And of course now it's rampant in writing everywhere.
zahlman 3 hours ago [-]
I would say that the constant attempt to create false profundity (as you call it), itself, is more of a tell than any of the rhetorical constructs used to do it.
bitwize 10 hours ago [-]
If you sound like a car ad from Road & Track, I'm going to flag you as a bot.
"No rough handling. No struggles to accelerate. Just pure performance. The new Toyota GT. It's not just a car—it's a revolution."
Most of the tropes listed on this page give text a more "car ad" (or sometimes "movie trailer") quality. I wonder if magazine scans and press releases unduly weighted the training set.
Retr0id 10 hours ago [-]
I think it's more likely that car ads and chatbots are both optimizing for the same thing i.e. grabbing the audience's attention.
nh23423fefe 11 hours ago [-]
Weird to care about a harmless construction along with punctuation.
andrew_lettuce 7 hours ago [-]
Construction paired with punctuation is literally the entire point of written communication.
vntok 2 hours ago [-]
No it's just the medium. The point is to communicate.
You can test this quite easily, by checking and hopefully realizing that you in fact can understand written documents with syntax errors, emails with typos and road signs with improper casing or sentence construction.
ashivkum 10 hours ago [-]
weirder still to immerse your brain in sewage and take pride in your lack of discernment.
mapmeld 10 hours ago [-]
If you participate in certain online communities where posts used to generally share real ideas and ask real beginner questions, you get tired of it. I am especially tired of seeing "it's not X - it's Y" on /r/MachineLearning posts, claiming that they've found some "geometry" or basic PyTorch code which they think will solve AI hallucinations. And it's becoming clear these people are not just doing this sort of a thing on a whim, but spending days in delusional conversations with the AI.
jimmis 3 hours ago [-]
Isn't that just the state of every ai-related subreddit at this point?
newAccount2025 6 hours ago [-]
Great list. Invented Concept Labels is the one I think I get most frustrated by. When exploring new areas, I’ll read its paragraphs of acronyms and weird words and think I just don’t know some term of art, and as soon as I ask for a definition it’s like, “I just made that up, that’s not a formal term, blah blah blah.”
Honestly, you need a tailored one of these for each of the major LLM model/version pairs. Claude and Gemini don't exhibit all of the same tropes in the same severities as OpenAI's GPT series, and within each of those, each revision sometimes exhibits substantial variance from the stylistic propensities of its immediate predecessor.
layer8 9 hours ago [-]
As the article points out at the end, these aren't bad per se. The issue is that LLMs overuse them, and we're all getting the same(-ish) LLM. It's not so different from how people sometimes have their idiosyncratic phrasings they use all the time.
FartyMcFarter 10 hours ago [-]
The article has been slashdotted so I don't know if this one is in there but:
One I've seen Gemini using a lot is the "I'll shoot straight with you" preamble (or similar phrasing), when it's about to tell me it can't answer the question.
1970-01-01 10 hours ago [-]
What we really need is a browser plugin underlining these patterns, especially for comments.
Can someone explain why LLMs write like this when most humans don't?
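A plugin like that could start with a simple regex pass over page text. A rough sketch with a few hypothetical patterns (a real extension would need a much larger, tuned list, plus DOM highlighting):

```python
import re

# A few illustrative tells; the patterns and labels are made up for this sketch.
TELLS = [
    (r"\b(?:it'?s|this is) not (?:just )?\w[\w\s]*?[—–-]{1,2}\s*(?:it'?s|but)", "em-dash pivot"),
    (r"\bfrom \w+ to \w+\b", "false range"),
    (r"\b(?:delve|tapestry|camaraderie)\b", "overused word"),
]

def find_tells(text: str):
    """Return (label, matched_text) pairs for every tell found."""
    hits = []
    for pattern, label in TELLS:
        for m in re.finditer(pattern, text, re.IGNORECASE):
            hits.append((label, m.group(0)))
    return hits

print(find_tells("It's not a bug—it's a feature. Let's delve in."))
```

Underlining the matches in comments would then just be a matter of wrapping each matched span in the page.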
MDWolinski 2 hours ago [-]
I asked ChatGPT about that and it gave a nicely reasoned explanation on what AI produces compared to humans.
But that being said, the problem I think is that people treat the output from LLMs as final.
It should be treated more as idea generation or early draft to get over the “staring at a blank page” and get the creative juices flowing and creating your own content.
Purely AI-generated content ends up feeding the algorithms, and soon enough everything sounds the same (it already does in a lot of places).
freetonik 3 hours ago [-]
Most humans don’t, but maybe “most humans” do? As in, on average, as a collective, regressed to the mean of mediocrity and devoid of personality, we write like this? It’s not self-deprecating, it’s humbling.
dtf 3 hours ago [-]
I suppose it might be because humans that use LLMs write like this.
Don't forget "The Ludlum Delusion": every header in an article or readme reads like a Robert Ludlum novel title, i.e. "The [Noun:0.9|Adjective:0.1] [Noun]".
bryanrasmussen 9 hours ago [-]
This makes me think of the attractiveness of overly bad writing to writers, as a challenge, the most obvious example being the bulwer-lytton award, or the instinctive ignoring of instructions from fiction magazines that might say "we don't want any stories about murderous grandparents, French bashing, bestiality, bank robbers from the future, or kind-hearted Nazis - and especially do not try to be super brilliant and funny and send us your story about kind-hearted Nazi bank-robbing french-bashing grandparents that like killing people and having sexy fun times with barnyard animals! Because every original thinker like you thinks they are the first to have come up with that idea!" and then as a writer you feel challenged to do exactly what they say they don't want because what a glorious triumph if you manage to outdo everyone and get your dreck published because it's dreck that is so bad it's good!
It does not seem like there are lots of people who are perversely inclined to write a story with all these tropes and words in it, but surely there must be some, because if you make something that beats the LLM (by being creatively good) using all the crap the LLM uses, it would seem some sort of John Henry triumph (discounting the final end of John Henry of course, which is a real downer)
lorenzk 4 hours ago [-]
This changes everything.
vntok 2 hours ago [-]
You're absolutely right!
nprateem 2 hours ago [-]
What do people expect? You use an LLM, don't tell it your preferred writing style and get annoyed when it falls back to defaults.
All those tropes have their place in certain contexts. AI overusing them is because they have no memory across all they've written.
Each conversation is a new chat, so it's like "I haven't used delve in a while, think I'll roll out that bad boy".
And then you try to fix this by telling it what not to do which doesn't work very well, so...
netsec_burn 9 hours ago [-]
Another trope: longer README.md files than anyone would write, or want.
NewsaHackO 9 hours ago [-]
Yes, to me this is a huge tell. Especially when it goes into detail about pros and cons (using a table) on the most superficial points.
verdverm 5 hours ago [-]
and all those emoji... sometimes to the point they are on most lines and commit messages.
xgulfie 9 hours ago [-]
If only we could fix how it writes like garbage
bitwize 10 hours ago [-]
You know how no one ever wrote their own software and then generative AI came along and suddenly we could have app meals home-cooked by barefoot developers? (The use of such cottagecore terminology for a process that requires being an ongoing client of a hundred-gigabuck, planet-burning megacorporation rubs me in many wrong ways.)
If AI finally gets rid of the thing that drove me nuts for years, "leverage" as a verb meaning roughly "to use", when no human intervention seemed to work, then I shall be over-the-moon happy. I once worked at a place where this particular word was lever—er, used all the damn time, and I'd never encountered anything so NPC-ish. I felt like I was in The Twilight Zone. I could've told you way back then that you sounded like a bot doing that; now people might actually believe me, and thank god.
I will stick by the em dashes, however. And I might just start using arrows too. Compose, -, > gives → (right arrow). Not even difficult.
crabmusket 6 hours ago [-]
> (The use of such cottagecore terminology for a process that requires being an ongoing client of a hundred-gigabuck, planet-burning megacorporation rubs me in many wrong ways.)
I hadn't noticed this - great point. To be fair the "home cooked meal" metaphor comes from 2020, predating genAI coding[1]. But even then, CPUs themselves are so normalised that we just kind of... forget how vertiginously complex the entire supply chain is.
At least with personal computers and your own programming skills, you could live off-grid and hack, and be kinda cottagecore, like Paul Lutus or those 100rabbits people. But if you depend on plugging yourself into the sloppotron to do anything, that's many things but self-sufficient isn't one. And self-hosted sloppotrons aren't there yet and require technical skills to set up besides.
cyanydeez 11 hours ago [-]
This kills the headline baiting tech blogger.
charlieflowers 10 hours ago [-]
This list reads like, "AIs are not your typical braindead person on the street. They actually use a decent but not crazily advanced vocabulary."
I mean, "tapestry" is a great word for something that is interconnected. Why not use it?
agnishom 9 hours ago [-]
> (let's play cat and mouse!).
No thanks, I hate this large scale social experiment
tiahura 10 hours ago [-]
Many of these are standard fare in legal writing.
Negative parallelism is a staple of briefs. "This case is not about free speech. It is about fraud." It does real work when you're contesting the other side's framing.
Tricolons and anaphora are used as persuasion techniques for closing arguments and appellate briefs.
Short punchy fragments help in persuasive briefs where judges are skimming. "The statute is unambiguous."
As with the em dash - let's not throw the baby out with the bath water.
grey-area 48 minutes ago [-]
They can work well when sparingly used and well thought-out, unfortunately LLM use is more on a par with:
‘It’s not mashed potato. It’s potatoes lovingly mixed to perfection with butter and milk which quietly dominate the carrots beside them.’
The words are in the right order and the grammar is OK, but the subject is so banal as to undermine the melodramatic style chosen, and they often insert several of these per paragraph.
cubefox 2 hours ago [-]
Another popular one is ending headlines with a remark/alternative in parentheses. Especially "(why this matters)".
More generally, it's interesting that many different LLMs have differences in their favorite tropes but converge on broadly similar patterns. Of course ChatGPT and its default persona (you can choose others in the settings, but most people don't do that) is overrepresented in these examples. For example, the article doesn't mention the casual/based tone of Grok that often feels somewhat forced.
xpe 7 hours ago [-]
> ... But prose? That's from human to human, it's sacred and meant for other people. Using AI for that is deceitful.
I understand the sentiment. Meaning I think I understand some of the underlying frustration. But I don't care for the tone or the framing or the depth of analysis (for there isn't much there; I've seen the "if you didn't write it, why should I read it" cliché before *, and it ain't the only argument in town). Now for my detailed responses:
1. In the same way the author wants people to respect other people, I want the author to respect the complexity of the universe. I'm not seeing that.
2. If someone says "I wrote this without any LLM assistance" but used one anyway, THAT is clearly deceptive.
3. If you read a page that was created with LLM assistance, it isn't reasonable for you to say the creator was being deceptive just because you assumed it was human-written. It takes two to achieve deception: both the sender and the receiver.
4. If you read a page on the internet, it is increasingly likely there was no human in the loop for the article at all. Good luck tracing the provenance of who made the call to make it happen. It might well be downstream of someone's job. (Yes, we can talk about diffusion of responsibility, etc., that's fair game -- but if you want to get into the realm of moral judgments, this isn't going to be a quick and tidy conversation)
5. I think the above comment puts too much of a "oh the halcyon days!" spin on this. Throughout history, many humans, much of the time, are largely repackaging things we had heard before. Unfortunately (or just "in reality") more of us are catching on to just how memetically-driven people are. We are both individuals and cogs. It is an uncomfortable truth. That brainwashed uncle you have is almost certainly a less reliable source of information than Claude.
6. The web has crappy incentives. It sucks. Yes, I want people to behave better. That would be nice, but I can't realistically expect people to behave better on the web unless there are incentives and consequences that align with what I want. The Web is a dumpster fire, not because of bad individuals, but because of system dynamics. Incentives. Feedback.
7. If people communicate more clearly, with fewer errors, that's at least a narrow win. One has to at least factor this in.
8. People accusing other people of being LLMs has a cost. Especially when people do it overconfidently or in a crude or mean manner. I've been on the receiving end. Why? Because I write in a way that sometimes triggers people because it resembles how LLMs write.
* I want to read high quality things. I actually care less if you wrote it as bullet points, with the help of an LLM, on a napkin, on a posterboard ... my goal is to learn from something suited to some purpose. I'm happy reading a computer-generated chart. I don't need a human to do that by hand.
The previous paragraph attempts to gesture at some of the conceptual holes in the common arguments behind "if you want a human to read it, a human should write it": they aren't systematically nor rigorously "wargamed" or "thought-experimented"; they are mostly just "knee-jerked".
I am quite interested in many things, including: (1) connecting with real people; (2) connecting with real people that don't merely regurgitate an information source they just ingested; (3) having an intelligent process generating the things I read. As an example of the third, I want "intelligent" organizations that synthesize contributions from their constituent parts. I want "intelligent" algorithms to help me focus on what matters to me. &c.
If a machine does that well, I'm not intrinsically bothered. If a human collaborates with an LLM to do that, fine. Whatever. We have bigger problems! Much bigger ones.
Yes, I want to live in a world where humans are valued for what they write and their intrinsic qualities, even as machines encroach on what used to be our biggest differentiator: intelligence itself. But wanting this and morally shaming people for not doing it doesn't seem like a good way to actually make it happen. Getting to that world, to my eye, requires public sense-making, grappling with the reality of how the world works, forming coalitions, organizing society, and passing laws.
Yes, I understand that HN has a policy that people write their own stuff, and I do. (See #8 above as well as my about page.)
Thank you to the approximately zero or maybe one person who made it this far. I owe you a beer. You can easily find me. I'm serious. But then we have to find a way to have a discussion while enjoying a beer on a video call. Alas.
I expect better from people -- and unfortunately a lot of people's output is lower quality than what I get from Claude. THIS is what pisses me off: that machine-curated output is actually more useful to me than the vast majority of what people say, at least when I have particular questions to ask. This is one of many uncomfortable realities I would like people to not flinch away from. As far as intelligent output is concerned, humans are losing a lot of ground, and fast. Don't shoot the messenger. If you don't recognize this, you might have a rather myopic view of intelligence that somehow assumes it must be biological, or you just keep moving the goalposts. Or that somehow (but how?) humans "have it" but machines can't.
iFire 4 hours ago [-]
Agree.
I don't write for sentimentality. I write so that my code designs can survive longer than my work on it.
No documentation is worse than deceit.
The emptiness and vastness of the void (entropy) is much deeper than humans or machines.
At the Ise Jingu, the shrine is not built to last; it is built to be reconstructed from scratch every twenty years.
If we want our systems to last, we would need the "process knowledge"—the actual mastery of the craft—to be in human hands rather than decaying in a dead system.
I don't think we can afford to process-knowledge-transfer many of our essential systems... without machine assistance.
> Add this file to your AI assistant's system prompt or context to help it avoid common AI writing patterns.
So if I put this into my LLM's conversation it is like I am instructing it to put this into its AI assistant's system prompt, so the AI assistant's AI assistant.
The alternative is to say:
"Here is a list of common AI tropes for you to avoid"
All tropes are described for me to understand what the AIs do wrong:
> Overuse of "quietly" and similar adverbs to convey subtle importance or understated power.
But this in fact instructs the assistant to start overusing the word 'quietly' rather than stop overusing it.
This is then counteracted a bit with the 'avoid the following...' but this means the file is full of contradictions.
Instead you'd need to say:
"Don't overuse 'quietly', use ... instead"
So while this is a great idea and list, I feel the execution is muddled by the explanation of what it is. I'd separate the presentation for us, the users of assistants, from the intended consumer, the actual assistants.
I've had claude rewrite it and put it in this gist:
https://gist.github.com/abuisman/05c766310cae4725914cd414639...
If anyone who works on LLMs is reading, a question: When we've tried base models (no instruction tuning/RLHF, just text completion), they show far fewer stylistic anomalies like this. So it's not that the training data is weird. It's something in instruction-tuning that's doing it. Do you ask the human raters to evaluate style? Is there a rubric? Why is the instruction tuning pushing such a noticeable style shift?
[1] https://www.pnas.org/doi/10.1073/pnas.2422455122, preprint at https://arxiv.org/abs/2410.16107. Working on extending this to more recent models and other grammatical features now
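The pattern-matching the parent describes is indeed easy to sketch. Here is a hypothetical feature counter; the word list and the "-ing" heuristic for present participles are purely illustrative, not taken from the paper:

```python
import re
from collections import Counter

# Hypothetical list of LLM-overused words; illustrative only, not from the study.
OVERUSED = ["tapestry", "camaraderie", "delve", "testament", "vibrant"]

def style_features(text: str) -> dict:
    """Count crude stylistic features, normalized per 1000 tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    # Rough present-participle proxy: longish tokens ending in -ing
    # (over-counts gerunds and nouns like "evening"; fine for a sketch).
    participles = sum(1 for t in tokens if t.endswith("ing") and len(t) > 4)
    return {
        "overused_per_1k": 1000 * sum(counts[w] for w in OVERUSED) / total,
        "ing_per_1k": 1000 * participles / total,
    }

sample = "A vibrant tapestry of camaraderie, weaving and delving through the text."
print(style_features(sample))
```

A real classifier would of course want part-of-speech tagging rather than a suffix heuristic, but even counts this crude separate model outputs surprisingly well.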
Interestingly, because perplexity is the optimization objective, the pretrained models should reflect the least surprising outputs of all.
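For anyone unfamiliar with the term: perplexity is just the exponentiated mean negative log-probability the model assigns to the observed tokens, so minimizing it rewards the least surprising continuations. A minimal illustration with made-up token probabilities:

```python
import math

def perplexity(token_probs):
    """exp of the mean negative log-probability of each observed token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model assigning uniform probability 0.25 to each of four tokens is
# "as surprised as" a uniform 4-way guess: perplexity of about 4.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```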
> Why is the instruction tuning pushing such a noticeable style shift?
Gwern Branwen has been covering this: https://gwern.net/doc/reinforcement-learning/preference-lear....
There's also just that weird thing where they're obsessed with emoji, which I've always assumed is because they're the only logograms in English and therefore carry a lot of weight per byte.
Wonder how they can avoid the trope while not censoring themselves out.
https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
Another one that seems impossible for LLMs to avoid: breaking article into a title and a subtitle, separated by a colon. Even if you explicitly tell it not to, it'll do it.
This is a problem, because you can easily get stuck in a self-reinforcing loop. You feel strengthened in your convictions that you're good at ferreting out LLM-speak because you've found so much of it. And you find so much of it because you feel confident you're good at it. Nobody ever corrects you when you're wrong.
Combine that with general overconfidence and you get threads where every other post with correct grammar gets "called out" as AI generated. It's pretty boring.
There's a similar effect with contentious subjects. You get reams and reams of posts calling the other side out for being part of a Russian/Israeli/Iranian/Chinese troll network. There's no independent falsification or verification for that, so people just get strengthened in their existing beliefs.
I honestly don’t know what sites like this will do when that happens. If the only way of detecting LLMs is that they are subtly wrong or post too much, we’d be overrun with them.
Not sure if we should be hopeful or fearful that they will improve to be undetectable, but I suspect they will.
Similar to how you can watch one fantastic western/vampire/zombie/disaster/superhero movie and love it, but once Hollywood has decided that this specific style is what brings in the money, they flood the zone with westerns, or superhero movies or whatever, and then the tropes become obvious and you can't stand watching another one.
If (insert your favorite blogger) had secret access to ChatGPT and was the only person in the world with access to it, you would just assume that it's their writing style now, and be ok with it as long as you liked the content.
And yet people seem to still be terrible at that. Someone uses an em-dash and there's always a moron calling it out as AI.
> I honestly don’t know what sites like this will do when that happens and the only way of detecting LLMs is that they are subtly wrong or post too much, we’d be overrun with them.
My personal take is that it doesn't really matter. Most posts are already knee-jerk reactions with little value. Speaking just to be talking. If LLMs make stupid posts, it'll be basically the same as now: scroll a bit more. And if they chance upon saying something interesting then that's a net gain.
You can give it additional instructions in the settings, but you have to be careful with that too. I've put my tech stack and code preferences in there to get better code examples. A while later I asked it about binary executable formats and it started ending every answer with "but the JVM and v8 take care of that for you."
Which is both funny in an "I, Robot" kind of way, and irritating. So I told it to ignore my tech stack. I have a master's in CS and can handle a bit of technical detail.
Turns out, Gemini learned sarcasm. Every following answer in that thread got a paragraph that started with something like "But for your master brain, this means..."
> Honestly? We should address X first. It's a genuine issue and we've found a real bug here.
Honorable mention: "no <thing you told me not to do>". I guess this helps reassure adherence to the prompt? I see that one all the time in vibe coded PRs.
But I feel like I’ve noticed an uptick in people using the adverb “genuinely” in what I genuinely believe to not be AI generated comments, articles, etc. Maybe it’s just me, I got similar vibes about the word efficacy a few years ago, before the ascent of GenAI (but after the pandemic — again, maybe just me).
I see this so often. Sometimes it’s just “no react hooks”, other times it gets literal and extra unnatural, like: “here’s <your thing>, no unnecessary long text explanation”. Perhaps we’re past AGI and this is passive aggressiveness ;)
It makes a tremendous difference. Almost everything on this list is the emotional fluff ChatGPT injects to simulate a personality.
I can understand someone needing help with writing but getting an agent to do the job for you feels like a personal defeat.
With this I am able to get all my favorite subs onto my actual hard drive, with some extra awesome features as a result: I vibe coded a little helper app that lets me query the transcript of the video and ask questions about what they say, using cheap haiku queries. I can also get my subs onto my jellyfin server and be able to view it in there on any device. Even comments get downloaded.
All these streamers have gone too far trying to maximize engagement and have broken the social contract, so I see this as totally fair game.
This one hit home... the first time I ever saw Claude do it I really liked it. It's amazing how quickly it became the #1 most aggravating thing it does just through sheer overuse. And of course now it's rampant in writing everywhere.
"No rough handling. No struggles to accelerate. Just pure performance. The new Toyota GT. It's not just a car—it's a revolution."
Most of the tropes listed on this page give text a more "car ad" (or sometimes "movie trailer") quality. I wonder if magazine scans and press releases unduly weighted the training set.
You can test this quite easily, by checking and hopefully realizing that you in fact can understand written documents with syntax errors, emails with typos and road signs with improper casing or sentence construction.
One I've seen Gemini using a lot is the "I'll shoot straight with you" preamble (or similar phrasing), when it's about to tell me it can't answer the question.
https://en.wikipedia.org/wiki/Snowclone
But that being said, the problem I think is that people treat the output from LLMs as final.
It should be treated more as idea generation or an early draft: something to get over the "staring at a blank page" stage, get the creative juices flowing, and create your own content.
Purely AI-generated content ends up feeding the algorithms, and soon enough everything sounds the same (it already does in a lot of places).
https://news.ycombinator.com/item?id=47260028
It does not seem like there are lots of people who are perversely inclined to write a story with all these tropes and words in it, but surely there must be some. If you made something that beats the LLM (by being creatively good) while using all the crap the LLM uses, that would seem like some sort of John Henry triumph (discounting the final end of John Henry, of course, which is a real downer).
All those tropes have their place in certain contexts. AI overusing them is because they have no memory across all they've written.
Each conversation is a new chat so it's like "I haven't used delve in a while, think I'll roll out that bad boy"
And then you try to fix this by telling it what not to do which doesn't work very well, so...
If AI finally gets rid of the thing that drove me nuts for years: "leverage" as a verb meaning roughly "to use"—when no human intervention seems to work, then I shall be over-the-moon happy. I once worked at a place where this particular word was lever—er, used all the damn time and I'd never encountered something so NPC-ish. I felt like I was on The Twilight Zone. I could've told you way back then that you sounded like a bot doing that; now people might actually believe me, and thank god.
I will stick by the em dashes however. And I might just start using arrows too: Compose, -, > produces →. Not even difficult.
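For anyone curious, that sequence should already be in the default X11 Compose table; a custom entry in ~/.XCompose would look like this (assuming a Compose/Multi key is configured in your keyboard layout):

```
<Multi_key> <minus> <greater> : "→" U2192
```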
I hadn't noticed this - great point. To be fair the "home cooked meal" metaphor comes from 2020, predating genAI coding[1]. But even then, CPUs themselves are so normalised that we just kind of... forget how vertiginously complex the entire supply chain is.
[1] https://www.robinsloan.com/notes/home-cooked-app/
I mean, "tapestry" is a great word for something that is interconnected. Why not use it?
No thanks, I hate this large scale social experiment
Negative parallelism is a staple of briefs. "This case is not about free speech. It is about fraud." It does real work when you're contesting the other side's framing.
Tricolons and anaphora are used as persuasion techniques for closing arguments and appellate briefs.
Short punchy fragments help in persuasive briefs where judges are skimming. "The statute is unambiguous."
As with the em dash - let's not throw the baby out with the bath water.
‘It’s not mashed potato. It’s potatoes lovingly mixed to perfection with butter and milk which quietly dominate the carrots beside them.’
The words are in the right order, the grammar is OK, but the subject is so banal as to undermine the melodramatic style chosen, and they often insert several per paragraph.
More generally, it's interesting that many different LLMs have differences in their favorite tropes but converge on broadly similar patterns. Of course ChatGPT and its default persona (you can choose others in the settings, but most people don't do that) is overrepresented in these examples. For example, the article doesn't mention the casual/based tone of Grok that often feels somewhat forced.
I understand the sentiment. Meaning I think I understand some of the underlying frustration. But I don't care for the tone or the framing or the depth of analysis (for there isn't much there; I've seen the "if you didn't write it, why should I read it" cliché before *, and it ain't the only argument in town). Now for my detailed responses:
1. In the same way the author wants people to respect other people, I want the author to respect the complexity of the universe. I'm not seeing that.
2. If someone says "I wrote this without any LLM assistance" but used one anyway, THAT is clearly deceptive.
3. If you read a page that was created with LLM assistance, it isn't reasonable for you to say the creator was being deceptive just because you assumed. It takes two to achieve deception: both the sender and the receiver.
4. If you read a page on the internet, it is increasingly likely there was no human in the loop for the article at all. Good luck tracing the provenance of who made the call to make it happen. It might well be downstream of someone's job. (Yes, we can talk about diffusion of responsibility, etc., that's fair game -- but if you want to get into the realm of moral judgments, this isn't going to be a quick and tidy conversation)
5. I think the above comment puts too much of an "oh, the halcyon days!" spin on this. Throughout history, many humans, much of the time, have largely been repackaging things they heard before. Unfortunately (or just "in reality") more of us are catching on to just how memetically driven people are. We are both individuals and cogs. It is an uncomfortable truth. That brainwashed uncle you have is almost certainly a less reliable source of information than Claude.
6. The web has crappy incentives. It sucks. Yes, I want people to behave better. That would be nice, but I can't realistically expect people to behave better on the web unless there are incentives and consequences that align with what I want. The Web is a dumpster fire, not because of bad individuals, but because of system dynamics. Incentives. Feedback.
7. If people communicate more clearly, with fewer errors, that's at least a narrow win. One has to at least factor this in.
8. People accusing other people of being LLMs has a cost. Especially when people do it overconfidently or in a crude or mean manner. I've been on the receiving end. Why? Because I write in a way that sometimes triggers people because it resembles how LLMs write.
* I want to read high quality things. I actually care less if you wrote it as bullet points, with the help of an LLM, on a napkin, on a posterboard ... my goal is to learn from something suited to some purpose. I'm happy reading a computer-generated chart. I don't need a human to do that by hand.
The previous paragraph attempts to gesture at some of the conceptual holes in the common arguments behind "if you want a human to read it, a human should write it": they aren't systematically or rigorously "wargamed" or "thought-experimented"; they are mostly just "knee-jerked".
I am quite interested in many things, including: (1) connecting with real people; (2) connecting with real people that don't merely regurgitate an information source they just ingested; (3) having an intelligent process generating the things I read. As an example of the third, I want "intelligent" organizations that synthesize contributions from their constituent parts. I want "intelligent" algorithms to help me focus on what matters to me. &c.
If a machine does that well, I'm not intrinsically bothered. If a human collaborates with an LLM to do that, fine. Whatever. We have bigger problems! Much bigger ones.
Yes, I want to live in a world where humans are valued for what they write and their intrinsic qualities, even as machines encroach on what used to be our biggest differentiator: intelligence itself. But wanting this and morally shaming people for not doing it doesn't seem like a good way to actually make it happen. Getting to that world, to my eye, requires public sense-making, grappling with the reality of how the world works, forming coalitions, organizing society, and passing laws.
Google search says this philosophy is called https://plato.stanford.edu/entries/content-externalism/