cross-posted from: https://lemmy.intai.tech/post/72919

Parameter count:

GPT-4 is more than 10x the size of GPT-3. We believe it has a total of ~1.8 trillion parameters across 120 layers. Mixture Of Experts - Confirmed.

OpenAI was able to keep costs reasonable by using a mixture-of-experts (MoE) model. It uses 16 experts, each with ~111B parameters for the MLP. Two of these experts are routed to per forward pass.
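For intuition, here is a minimal sketch of top-2 MoE routing in plain Python/NumPy. The sizes and names (`NUM_EXPERTS`, `TOP_K`, `D_MODEL`, `moe_forward`) are toy stand-ins for illustration, not OpenAI's actual implementation; the point is that only the two selected experts run per token:

```python
import numpy as np

# Toy sizes -- the post claims 16 experts of ~111B parameters each;
# here each "expert" is a single small weight matrix standing in for an MLP.
NUM_EXPERTS = 16
TOP_K = 2
D_MODEL = 8  # tiny hidden size, purely for illustration

rng = np.random.default_rng(0)
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS))  # router scores tokens against experts

def moe_forward(x):
    """Route a single token vector x through its top-2 experts."""
    logits = x @ router_w                     # one score per expert
    top = np.argsort(logits)[-TOP_K:]         # indices of the 2 best-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                      # softmax over just the chosen 2
    # Only the selected experts execute, so per-token compute scales with
    # TOP_K, not NUM_EXPERTS -- this is how MoE keeps inference cheap.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.standard_normal(D_MODEL)
print(moe_forward(token))
```

The arithmetic lines up with the post's figures: 16 × ~111B ≈ 1.78T expert parameters, close to the claimed ~1.8T total, while only 2 × ~111B ≈ 222B of those are active on any given forward pass.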

Related Article: https://lemmy.intai.tech/post/72922

  • grabyourmotherskeys@lemmy.world · 1 year ago

    And to curb enthusiasm for integrating stuff like this into production systems.

    There are currently lawsuits over copyright violations in training data. They will either be settled for large sums of money, or the models will be retrained without those data sources. Either outcome could significantly affect the business model, the business itself, and the models in use, especially over the long term.

    There’s a lot to figure out before this is a stable product.