Gemma 3n includes the following key features:

Audio input: Process sound data for speech recognition, translation, and audio data analysis.

Visual and text input: Multimodal capabilities let you handle vision, sound, and text to help you understand and analyze the world around you.

PLE caching: The Per-Layer Embedding (PLE) parameters in these models can be cached to fast, local storage to reduce the memory cost of running the model.

MatFormer architecture: The Matryoshka Transformer architecture allows selective activation of the model's parameters per request to reduce compute cost and response times.

Conditional parameter loading: Bypass loading of vision and audio parameters in the model to reduce the total number of loaded parameters and save memory resources.

Wide language support: Wide linguistic capabilities, trained in over 140 languages.

32K token context: Substantial input context for analyzing data and handling processing tasks.
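
To make the MatFormer item above more concrete, the following is a minimal toy sketch of Matryoshka-style nesting, assuming the nesting is applied to a feed-forward layer so that a smaller sub-model reuses the first hidden units of the full layer. The function names, shapes, and weights are hypothetical and chosen only for illustration; this is not Gemma 3n's implementation or API.

```python
# Toy illustration of Matryoshka-style nesting (assumed design, not Gemma 3n code):
# a smaller sub-model reuses the first `width` hidden units of the full layer.

def relu(value):
    return value if value > 0.0 else 0.0

def ffn(x, w_in, w_out, width):
    """Apply a feed-forward layer using only the first `width` hidden units.

    x:     input vector of length d
    w_in:  hidden x d matrix (one row per hidden unit)
    w_out: d x hidden matrix (one column per hidden unit)
    """
    hidden = [relu(sum(w * xi for w, xi in zip(w_in[h], x))) for h in range(width)]
    return [sum(row[h] * hidden[h] for h in range(width)) for row in w_out]

def active_params(width, d):
    """Number of weights touched when only `width` hidden units are active."""
    return 2 * width * d

# Hypothetical full layer: 2 inputs, 4 hidden units.
W_IN = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
W_OUT = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.5, 1.0]]

x = [2.0, 1.0]
full = ffn(x, W_IN, W_OUT, width=4)   # uses all 16 weights
small = ffn(x, W_IN, W_OUT, width=2)  # uses the nested 8-weight sub-layer

print(full, active_params(4, 2))
print(small, active_params(2, 2))
```

In this sketch both calls share one set of weights; choosing a smaller `width` per request halves the weights touched, which is the kind of compute saving the feature description refers to.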