Google DeepMind has unveiled Gemini, the company's most capable and general AI model to date. Gemini stands out for its multimodal nature: it can process and understand many types of information, including text, code, audio, images, and video. This lets it operate seamlessly across different data types, making it a versatile tool in the AI landscape.
Gemini is also notably flexible: it runs efficiently on a range of platforms, from large data centers to mobile devices. This adaptability makes it a valuable asset for developers and enterprise customers looking to deploy AI in diverse environments.
The first version, Gemini 1.0, comes in three sizes:

- Gemini Ultra – the largest and most capable model, for highly complex tasks.
- Gemini Pro – the best model for scaling across a wide range of tasks.
- Gemini Nano – the most efficient model, for on-device tasks.
In testing, Gemini Ultra has shown remarkable performance, surpassing human experts on MMLU (massive multitask language understanding) with a score of 90.0%. This benchmark uses a mix of 57 subjects, from math to ethics, to assess world knowledge and problem-solving skills. Gemini Ultra also excels on the new MMMU benchmark, which involves multimodal tasks requiring deliberate reasoning.
Notably, Gemini Ultra has demonstrated superior capabilities in image benchmarks without relying on OCR systems, indicating its advanced native multimodal abilities. This proficiency in handling complex reasoning tasks is a testament to its sophisticated design.
Unlike previous multimodal models that were built by combining separate components for different modalities, Gemini was designed from the ground up to be natively multimodal. This approach allows it to understand and reason about various inputs more effectively than existing models.
Gemini 1.0’s sophisticated reasoning capabilities enable it to process and make sense of complex written and visual information, making it particularly adept at extracting insights from large volumes of data. Its proficiency extends to understanding and explaining complex subjects like math and physics, as well as generating high-quality code in popular programming languages.
In terms of software development, Gemini Ultra has shown exceptional performance in benchmarks like HumanEval and Natural2Code. It also serves as the foundation for advanced coding systems like AlphaCode 2, which excels in solving competitive programming problems involving complex math and theoretical computer science.
The development of Gemini was supported by Google’s AI-optimized infrastructure, including the latest Tensor Processing Units (TPUs) v4 and v5e. These TPUs have played a crucial role in enhancing the speed and efficiency of Gemini, making it a reliable and scalable model for various applications.
Safety and responsibility have been central to the new Gemini model’s development. Extensive safety evaluations, including assessments for bias and toxicity, have been conducted. Google has also engaged with external experts to identify and address potential risks, ensuring that Gemini adheres to robust safety standards.
Gemini 1.0 is now being integrated into various Google products and platforms. For instance, Bard will use a version of Gemini Pro for advanced reasoning and understanding. Gemini is also making its way into Google’s hardware, with the Pixel 8 Pro being the first smartphone to run Gemini Nano.
Developers and enterprise customers will soon have access to Gemini Pro through the Gemini API in Google AI Studio or Google Cloud Vertex AI. Additionally, Android developers will be able to utilize Gemini Nano via AICore in Android 14, starting with Pixel 8 Pro devices.
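As a rough illustration of what that API access might look like, the sketch below builds a request for a text-generation call. It is a hypothetical sketch only: the `generativelanguage.googleapis.com` host, the `gemini-pro` model name, and the `generateContent` method are assumptions modeled on Google's announcement, not details confirmed in this article; consult the official Gemini API documentation once access opens.

```python
import json

# Assumed REST shape for the announced Gemini API (hypothetical; verify
# against the official docs before use).
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"

def build_generate_request(model: str, prompt: str) -> tuple[str, str]:
    """Return the (url, json_body) pair for a text generateContent call."""
    url = f"{API_ROOT}/models/{model}:generateContent"
    # A single-turn text prompt wrapped in the assumed contents/parts schema.
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]})
    return url, body

url, body = build_generate_request("gemini-pro", "Explain TPUs in one sentence.")
# The request would then be POSTed with an API key header, e.g.:
#   requests.post(url, data=body, headers={"x-goog-api-key": API_KEY})
```

Keeping request construction separate from transport like this makes the payload easy to inspect and test before any credentials or network access are involved.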
As for Gemini Ultra, it is currently undergoing extensive safety checks and refinements. It will soon be available to select customers and partners for early feedback before a broader rollout.
The introduction of Gemini marks a new era in AI development at Google, showcasing a commitment to innovation and responsible AI advancement. The team at Google DeepMind is excited about the future possibilities and the transformative impact Gemini could have on fields ranging from science to everyday life.