• quixotic120@lemmy.world
    4 hours ago

    I never know what to tell people like you. DeepL is the best one at translating Japanese, and its results are still mixed.

    Japanese is a contextual language. It is difficult for machines to translate because it is not a language where a word simply means this or that.

    Take a very short sentence:

    がくせいです - gakusei desu

    Gakusei is “student”

    A literal translation of this would be “am student” or “is student”

    But if I say this to you, you will infer things based on context, and です/desu takes on a different role. If I am clearly referring to myself, then gakusei desu becomes “I’m a student” (though to be fair I would probably have said it with a first-person pronoun and particle, like わたしは (watashi wa)).

    But if I’m referring to another single person in the room, this would be inferred through context, eg “he’s a student.” This is where it starts getting confusing: the copula (desu) doesn’t differentiate singular or plural, so context is also used to derive plural forms, eg “they are students”.

    This copula also applies to situations outside of he/she, like “it”, eg コンビニです, konbini desu, konbini being “convenience store”: “it’s a convenience store”.
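To make the examples above concrete, here is a toy sketch in Python. It is not how any real translator works; the context labels and the `render_desu` helper are made up for illustration. The point is only that the same "X desu" needs outside context before it can be turned into an English sentence.

```python
# Toy illustration: the Japanese sentence never states the subject,
# so the caller has to supply it as "context". All names are hypothetical.
TEMPLATES = {
    "self": "I'm a {noun}",
    "he": "He's a {noun}",
    "she": "She's a {noun}",
    "they": "They are {plural}",
    "it": "It's a {noun}",
}


def render_desu(noun, plural, context):
    """Render '<noun> desu' in English for a given context label."""
    template = TEMPLATES[context]
    # str.format ignores keyword arguments the template does not use.
    return template.format(noun=noun, plural=plural)


# The same source sentence, three different English outputs.
for ctx in ("self", "he", "they"):
    print(ctx, "->", render_desu("student", "students", ctx))
```

Without the `context` argument there is nothing in the source sentence to pick between these outputs, which is the problem a translation tool faces.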

    This is a very very basic idea of why. It gets more complex obviously once you move past these extremely basic examples but honestly someone more knowledgeable at Japanese should explain at that point, I’m self taught and mediocre (thus my use of 0 kanji, I’m pretty sure at a minimum there’s kanji for gakusei and watashi but I suuuuck at kanji. I at least know meat. 肉肉肉 although that’s mostly thanks to anime and the local Asian grocer lmao).

    I think AI can probably do it eventually, but it will need to do a much better job of understanding what the source material is actually talking about, and that’s the challenge to overcome. And that’s why it will probably never be able to really accurately translate a paragraph copy-pasted into it.

    As to the sanitization, that’s a separate issue about corporate control of AI. If they don’t want their translation services to sound “vulgar” that’s their prerogative I suppose, but it also means they sound less human and realistic, because people are gross and ugly when they speak. Vernacular is ugly, and lexicon adapts quickly in ways that people don’t always love. But to be clear, I don’t mean it’s just about slurs and bad words, I mean it’s about slang in general: words that are generally inoffensive but not considered proper. English equivalents would be what we consider zoomer speak, hella sus yeah bruh type shit (I’m not good at this part).

    • kalleboo@lemmy.world
      4 hours ago

      ChatGPT (and presumably the other chat LLMs) is way, way better at this than DeepL or Google Translate, because you can give it context beforehand. Like “A is an X and B is a Y; when A says Z, what does that mean?”
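A minimal sketch of that kind of prompt, in Python. It only builds the text you would paste or send to a chat model; it makes no API call, and the function name and wording of the instructions are assumptions for illustration, not a tested recipe.

```python
def build_translation_prompt(context_lines, speaker, utterance):
    """Put the background facts first, then the line to translate.

    Hypothetical helper: the exact prompt wording is an assumption.
    """
    context = "\n".join(f"- {line}" for line in context_lines)
    return (
        "Context:\n"
        f"{context}\n\n"
        f"{speaker} says: {utterance}\n"
        "Translate what the speaker says into natural English, "
        "using the context to choose the subject."
    )


prompt = build_translation_prompt(
    ["A is a university student.", "B is a shop clerk asking who A is."],
    "A",
    "がくせいです",
)
print(prompt)
```

The source sentence is passed through untouched; the only thing that changes is that the model sees who is speaking and to whom before it has to pick a subject.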