Sorry I keep posting about Mistral, but check out: https://chat.mistral.ai/chat
I dunno how they do it, but some of these answers are lightning fast:
Fast inference dramatically improves the user experience for chat and code generation – two of the most popular use cases today. In the example above, Mistral Le Chat completes a coding prompt instantly while other popular AI assistants take up to 50 seconds to finish.
For this initial release, Cerebras will focus on serving text-based queries for the Mistral Large 2 model. When using Cerebras Inference, Le Chat will display a “Flash Answer ⚡” icon on the bottom left of the chat interface.
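The speed claim above boils down to two numbers people usually measure for streaming chat APIs: time to first token and tokens per second. Here's a minimal, self-contained sketch of how you could measure both yourself. Note `fake_token_stream` is a stand-in generator I made up for illustration, not Mistral's or Cerebras's actual API:

```python
import time

def fake_token_stream(n_tokens=20, delay=0.005):
    # Hypothetical stand-in for a streaming chat API;
    # yields one token at a time with an artificial delay.
    for i in range(n_tokens):
        time.sleep(delay)
        yield f"tok{i}"

def measure_latency(stream):
    """Return (time_to_first_token, tokens_per_second) for a token stream."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            # First token arrived: record time-to-first-token.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return ttft, count / total

ttft, tps = measure_latency(fake_token_stream())
print(f"time to first token: {ttft*1000:.1f} ms, throughput: {tps:.0f} tok/s")
```

Swap the fake generator for a real streaming client and you can compare Le Chat's "Flash Answer" responses against other assistants on the same prompt.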
Is this a big deal? It definitely sounds like a big deal.