ChatGPT appears to hallucinate or outright lie about everything

Buttflapper@lemmy.world · 8 months ago

Dave. · edit-2 8 months ago

Most times what I get when asking it coding questions is a half-baked response that has a logic error or five in it.

When I query it about one of those errors, it replies with, “You’re right, X should be Y because of (technical reason Z). Here’s the updated code that fixes it.”

It will then give me some code that does actually work, but does dumb things, like recalculating complex but static values inside a loop. When I ask if there are any performance improvements it can make, suddenly it’s full of helpful ways to improve the code that can make it run 10 to 100 times faster and fix those issues. Apparently if I want performant code, I have to explicitly ask for it.
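To make the "static values inside a loop" complaint concrete, here is a minimal sketch of the pattern. The function names and the made-up "expensive" calculation are hypothetical, purely for illustration; this is not actual ChatGPT output.

```python
import math


def slow_scale(values, exponent):
    """Works, but recomputes a loop-invariant factor on every iteration."""
    out = []
    for v in values:
        # This result never changes between iterations, yet it is redone each time.
        factor = math.sqrt(sum(i ** exponent for i in range(1000)))
        out.append(v * factor)
    return out


def fast_scale(values, exponent):
    """Same result, with the invariant computation hoisted out of the loop."""
    factor = math.sqrt(sum(i ** exponent for i in range(1000)))
    return [v * factor for v in values]
```

Both functions return the same list. The second one does the expensive part once instead of once per element, which is the kind of fix that tends to show up only after asking about performance.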

For some things it will offer solutions that don’t solve the issue I raise, no matter how many different ways I phrase the issue and try to coax it towards a solution. At that point, it basically can’t, and it gets bogged down in minor alterations that don’t really achieve anything.

Sometimes when it hits that point I can say “start again, and use (this methodology)” and it will suddenly hit upon a solution that’s workable.

So basically, right now it’s good for regurgitating some statistically plausible information that can be further refined with a couple of good questions from your side.

Of course, for that to work you have to know the domain you’re working in fairly well already, otherwise you’re shit out of luck.

orclev@lemmy.world · edit-2 8 months ago

LLMs are basically just really fancy search engines. The reason the initial code is garbage is that it’s cut and pasted together from random crap the LLM found on the net under various keywords. It gets more performant when you ask because then the LLM is running a different search. The first search was “assemble some pieces of code to accomplish X”, while the second search was “given this sample of code find parts of it that could be optimized”, two completely different queries.

As noted in another comment, the true fatal flaw of LLMs is that they don’t really have a threshold for just saying “I don’t know that”, as they are inherently probabilistic in nature. When asked something they can’t find an answer for, they assemble a lexically probable response from similar search results, even in cases where it’s wildly wrong. The more uncommon and niche your search is, the more likely this is to happen. In other words, they work well for finding very common information, and increasingly worse the less common that information is.
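A toy sketch of the "no threshold" point, assuming a tiny made-up vocabulary and made-up scores. The names `pick`, `pick_or_abstain`, and `min_prob` are hypothetical, and this is not how any real LLM is implemented. It only shows that turning scores into probabilities always yields some "most likely" answer, and that saying "I don't know" needs an explicit cutoff that plain sampling doesn't have.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that always sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick(tokens, logits):
    """Always returns the most probable token, however weak the evidence."""
    probs = softmax(logits)
    best = max(range(len(tokens)), key=lambda i: probs[i])
    return tokens[best], probs[best]


def pick_or_abstain(tokens, logits, min_prob=0.5):
    """Hypothetical variant: abstain when the top probability is below a cutoff."""
    token, p = pick(tokens, logits)
    return token if p >= min_prob else "I don't know"


tokens = ["Paris", "Lyon", "Nice"]
confident = [5.0, 1.0, 0.5]   # one clear winner
unsure = [1.0, 0.9, 0.95]     # nearly flat: the scores barely distinguish the options
```

With the `unsure` scores, `pick` still hands back "Paris" at roughly one-in-three probability, while `pick_or_abstain` returns "I don't know". With the `confident` scores both return "Paris".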