This is a paper from an MIT study. Three groups of participants were tasked with writing an essay; one of them was allowed to use an LLM. These were the results:

The participants' mental activity was also checked repeatedly via EEG. As per the paper's abstract:

EEG revealed significant differences in brain connectivity: Brain-only participants exhibited the strongest, most distributed networks; Search Engine users showed moderate engagement; and LLM users displayed the weakest connectivity. Cognitive activity scaled down in relation to external tool use.

  • utopiah@lemmy.world · 15 hours ago

    The biggest flaw in this study is that the LLM group wasn’t allowed to edit their essays

    I didn’t read the whole thing, only skimmed through the protocol. All I spotted was

    “participants were instructed to pick a topic among the proposed prompts, and then to produce an essay based on the topic’s assignment within a 20 minutes time limit. Depending on the participant’s group assignment, the participants received additional instructions to follow: those in the LLM group (Group 1) were restricted to using only ChatGPT, and explicitly prohibited from visiting any websites or other LLM bots. The ChatGPT account was provided to them. They were instructed not to change any settings or delete any conversations.”

    which I don’t interpret as no editing. Can you please share where you found that out?

    • Zozano · 15 hours ago

      Lol, oops, I got poo brain right now. I inferred they couldn’t edit because the methodology doesn’t say whether revisions were allowed.

      What is clear is that they weren’t permitted to edit the prompt or add personalization details, which seems to imply the researchers weren’t interested in understanding how a participant might use it in a real setting, just in passive output. This alone undermines the premise.

      It makes it hard to assess whether the observed cognitive deficiency was due to LLM assistance itself or to the method by which it was applied.

      The extent of our understanding of the methodology is that they couldn’t delete chats. If participants were only permitted a one-shot generation per prompt, then there’s something wrong.

      But just as concerning is the fact that it isn’t explicitly stated.