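MMLM: Gemini, a translation and commentary on "Introducing Gemini: our largest and most capable AI model"
Overview: On December 6, 2023, Google released Gemini, a large-scale multimodal model that marks a new stage in the development of Google's language models. Its multimodal and general capabilities are clearly stronger than those of most mainstream large models today, and it is Google's largest and most capable AI model to date. Gemini is built from the ground up to be multimodal, so it can generalize and seamlessly understand, operate across and combine different types of information, including text, images, audio, video and code. This gives it sophisticated multimodal reasoning and advanced coding capabilities. It can power Google products, support more advanced customer-service interactions, and be used for content creation and marketing campaigns, and it performs well on natural language, code generation and competitive programming tasks.
Background: As AI technology advances, language models keep improving, but existing models still fall short in multimodal processing and in consistency.
Pain points addressed: Gemini targets the knowledge and abilities a future AI assistant should have: multimodal, general and reliable.
Solution:
>> Gemini is trained as a multimodal model from the start, so it can understand and reason about many kinds of input naturally.
>> Gemini exceeds the current SOTA on a range of language, image and knowledge benchmarks, which demonstrates its strong multimodal capability.
>> Gemini also performs well on natural language, code generation and competitive programming tasks.
>> Gemini's three versions are optimized for different scenarios and run efficiently on servers and on devices.
>> The Gemini series is developed with an emphasis on responsibility and safety, using multiple mechanisms to improve model safety.
>> Gemini will be used in several Google products and will also be opened to developers through an API.
In short, Gemini substantially raises the level of Google's models in multimodal capability, generality and runtime efficiency, addresses the shortcomings of earlier models in these areas, and is likely to help advance AI assistants.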
Translation and commentary on "Introducing Gemini: our largest and most capable AI model"
Source | Introducing Gemini: Google’s most capable AI model yet | Date | December 6, 2023 | Authors | Sundar Pichai, CEO of Google and Alphabet; Demis Hassabis, CEO and Co-Founder, Google DeepMind |
Note from Sundar

A note from Google and Alphabet CEO Sundar Pichai:
Every technology shift is an opportunity to advance scientific discovery, accelerate human progress, and improve lives. I believe the transition we are seeing right now with AI will be the most profound in our lifetimes, far bigger than the shift to mobile or to the web before it. AI has the potential to create opportunities, from the everyday to the extraordinary, for people everywhere. It will bring new waves of innovation and economic progress and drive knowledge, learning, creativity and productivity on a scale we haven’t seen before.
That’s what excites me: the chance to make AI helpful for everyone, everywhere in the world.
Nearly eight years into our journey as an AI-first company, the pace of progress is only accelerating: Millions of people are now using generative AI across our products to do things they couldn’t even a year ago, from finding answers to more complex questions to using new tools to collaborate and create. At the same time, developers are using our models and infrastructure to build new generative AI applications, and startups and enterprises around the world are growing with our AI tools.
This is incredible momentum, and yet, we’re only beginning to scratch the surface of what’s possible.
We’re approaching this work boldly and responsibly. That means being ambitious in our research and pursuing the capabilities that will bring enormous benefits to people and society, while building in safeguards and working collaboratively with governments and experts to address risks as AI becomes more capable. And we continue to invest in the very best tools, foundation models and infrastructure and bring them to our products and to others, guided by our AI Principles.
Now, we’re taking the next step on our journey with Gemini, our most capable and general model yet, with state-of-the-art performance across many leading benchmarks. Our first version, Gemini 1.0, is optimized for different sizes: Ultra, Pro and Nano. These are the first models of the Gemini era and the first realization of the vision we had when we formed Google DeepMind earlier this year. This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company. I’m genuinely excited for what’s ahead, and for the opportunities Gemini will unlock for people everywhere.
- Sundar
Introducing Gemini
By Demis Hassabis, CEO and Co-Founder of Google DeepMind, on behalf of the Gemini team
AI has been the focus of my life's work, as for many of my research colleagues. Ever since programming AI for computer games as a teenager, and throughout my years as a neuroscience researcher trying to understand the workings of the brain, I’ve always believed that if we could build smarter machines, we could harness them to benefit humanity in incredible ways.
This promise of a world responsibly empowered by AI continues to drive our work at Google DeepMind. For a long time, we’ve wanted to build a new generation of AI models, inspired by the way people understand and interact with the world. AI that feels less like a smart piece of software and more like something useful and intuitive: an expert helper or assistant.
Today, we’re a step closer to this vision as we introduce Gemini, the most capable and general model we’ve ever built.
Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research. It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across and combine different types of information including text, code, audio, image and video.
Introducing Gemini: our largest and most capable AI model
Gemini is also our most flexible model yet, able to efficiently run on everything from data centers to mobile devices. Its state-of-the-art capabilities will significantly enhance the way developers and enterprise customers build and scale with AI.
We’ve optimized Gemini 1.0, our first version, for three different sizes:
>> Gemini Ultra: our largest and most capable model for highly complex tasks.
>> Gemini Pro: our best model for scaling across a wide range of tasks.
>> Gemini Nano: our most efficient model for on-device tasks.
State-of-the-art performance
We've been rigorously testing our Gemini models and evaluating their performance on a wide variety of tasks. From natural image, audio and video understanding to mathematical reasoning, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development.
With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities.
Our new benchmark approach to MMLU enables Gemini to use its reasoning capabilities to think more carefully before answering difficult questions, leading to significant improvements over just using its first impression.
Gemini Ultra also achieves a state-of-the-art score of 59.4% on the new MMMU benchmark, which consists of multimodal tasks spanning different domains requiring deliberate reasoning.
With the image benchmarks we tested, Gemini Ultra outperformed previous state-of-the-art models, without assistance from optical character recognition (OCR) systems that extract text from images for further processing. These benchmarks highlight Gemini’s native multimodality and indicate early signs of Gemini's more complex reasoning abilities.
See more details in our Gemini technical report.
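The "think more carefully before answering" approach to MMLU can be illustrated with a small sketch. My understanding is that the technical report calls this uncertainty-routed chain-of-thought: sample several reasoned answers, take the majority vote when the samples agree strongly enough, and otherwise fall back to the single greedy ("first impression") answer. The function name `route_answer`, the `threshold` value and the sample lists below are hypothetical, chosen only for illustration; this is not Google's implementation.

```python
from collections import Counter
from typing import Sequence


def route_answer(sampled_answers: Sequence[str], greedy_answer: str,
                 threshold: float = 0.6) -> str:
    """Return the majority-vote answer if consensus is high, else the greedy answer.

    sampled_answers: final answers extracted from several sampled reasoning chains.
    greedy_answer:   the answer produced without sampling (the "first impression").
    threshold:       assumed consensus cut-off; a real system would tune it on held-out data.
    """
    if not sampled_answers:
        return greedy_answer
    top_answer, votes = Counter(sampled_answers).most_common(1)[0]
    consensus = votes / len(sampled_answers)
    return top_answer if consensus >= threshold else greedy_answer


# Six of eight samples agree on "B" (consensus 0.75), so the vote wins.
print(route_answer(["B", "B", "B", "C", "B", "B", "A", "B"], greedy_answer="C"))
# No answer has more than 2 of 8 votes (consensus 0.25), so the greedy answer is kept.
print(route_answer(["A", "B", "C", "D", "A", "B", "C", "D"], greedy_answer="C"))
```

The design point is that voting only helps when the samples are informative; when they scatter, the sketch trusts the greedy answer instead of a weak plurality.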
Gemini surpasses state-of-the-art performance on a range of benchmarks including text and coding.

Gemini surpasses state-of-the-art performance on a range of multimodal benchmarks.

Next-generation capabilities
Until now, the standard approach to creating multimodal models involved training separate components for different modalities and then stitching them together to roughly mimic some of this functionality. These models can sometimes be good at performing certain tasks, like describing images, but struggle with more conceptual and complex reasoning.
We designed Gemini to be natively multimodal, pre-trained from the start on different modalities. Then we fine-tuned it with additional multimodal data to further refine its effectiveness. This helps Gemini seamlessly understand and reason about all kinds of inputs from the ground up, far better than existing multimodal models, and its capabilities are state of the art in nearly every domain.
Learn more about Gemini’s capabilities and see how it works.
Sophisticated reasoning
Gemini 1.0’s sophisticated multimodal reasoning capabilities can help make sense of complex written and visual information. This makes it uniquely skilled at uncovering knowledge that can be difficult to discern amid vast amounts of data.
Its remarkable ability to extract insights from hundreds of thousands of documents through reading, filtering and understanding information will help deliver new breakthroughs at digital speeds in many fields from science to finance.
Gemini unlocks new scientific insights
Understanding text, images, audio and more
Gemini 1.0 was trained to recognize and understand text, images, audio and more at the same time, so it better understands nuanced information and can answer questions relating to complicated topics. This makes it especially good at explaining reasoning in complex subjects like math and physics.
Gemini explains reasoning in math and physics
Advanced coding
Our first version of Gemini can understand, explain and generate high-quality code in the world’s most popular programming languages, like Python, Java, C++, and Go. Its ability to work across languages and reason about complex information makes it one of the leading foundation models for coding in the world.
Gemini Ultra excels in several coding benchmarks, including HumanEval, an important industry-standard for evaluating performance on coding tasks, and Natural2Code, our internal held-out dataset, which uses author-generated sources instead of web-based information.
Gemini can also be used as the engine for more advanced coding systems. Two years ago we presented AlphaCode, the first AI code generation system to reach a competitive level of performance in programming competitions.
Using a specialized version of Gemini, we created a more advanced code generation system, AlphaCode 2, which excels at solving competitive programming problems that go beyond coding to involve complex math and theoretical computer science.
When evaluated on the same platform as the original AlphaCode, AlphaCode 2 shows massive improvements, solving nearly twice as many problems, and we estimate that it performs better than 85% of competition participants, up from nearly 50% for AlphaCode. When programmers collaborate with AlphaCode 2 by defining certain properties for the code samples to follow, it performs even better.
We’re excited for programmers to increasingly use highly capable AI models as collaborative tools that can help them reason about the problems, propose code designs and assist with implementation, so they can release apps and design better services, faster.
Gemini excels at coding and competitive programming
See more details in our AlphaCode 2 technical report.
Scalable and efficient
More reliable, scalable and efficient
We trained Gemini 1.0 at scale on our AI-optimized infrastructure using Google’s in-house designed Tensor Processing Units (TPUs) v4 and v5e. And we designed it to be our most reliable and scalable model to train, and our most efficient to serve.
On TPUs, Gemini runs significantly faster than earlier, smaller and less-capable models. These custom-designed AI accelerators have been at the heart of Google's AI-powered products that serve billions of users like Search, YouTube, Gmail, Google Maps, Google Play and Android. They’ve also enabled companies around the world to train large-scale AI models cost-efficiently.
Today, we’re announcing the most powerful, efficient and scalable TPU system to date, Cloud TPU v5p, designed for training cutting-edge AI models. This next generation TPU will accelerate Gemini’s development and help developers and enterprise customers train large-scale generative AI models faster, allowing new products and capabilities to reach customers sooner.
A row of Cloud TPU v5p AI accelerator supercomputers in a Google data center.

Responsibility and safety
Built with responsibility and safety at the core
At Google, we’re committed to advancing bold and responsible AI in everything we do. Building upon Google’s AI Principles and the robust safety policies across our products, we’re adding new protections to account for Gemini’s multimodal capabilities. At each stage of development, we’re considering potential risks and working to test and mitigate them.
Gemini has the most comprehensive safety evaluations of any Google AI model to date, including for bias and toxicity. We’ve conducted novel research into potential risk areas like cyber-offense, persuasion and autonomy, and have applied Google Research’s best-in-class adversarial testing techniques to help identify critical safety issues in advance of Gemini’s deployment.
To identify blindspots in our internal evaluation approach, we’re working with a diverse group of external experts and partners to stress-test our models across a range of issues.
To diagnose content safety issues during Gemini’s training phases and ensure its output follows our policies, we’re using benchmarks such as Real Toxicity Prompts, a set of 100,000 prompts with varying degrees of toxicity pulled from the web, developed by experts at the Allen Institute for AI. Further details on this work are coming soon.
To limit harm, we built dedicated safety classifiers to identify, label and sort out content involving violence or negative stereotypes, for example. Combined with robust filters, this layered approach is designed to make Gemini safer and more inclusive for everyone. Additionally, we’re continuing to address known challenges for models such as factuality, grounding, attribution and corroboration.
Responsibility and safety will always be central to the development and deployment of our models. This is a long-term commitment that requires building collaboratively, so we’re partnering with the industry and broader ecosystem on defining best practices and setting safety and security benchmarks through organizations like MLCommons, the Frontier Model Forum and its AI Safety Fund, and our Secure AI Framework (SAIF), which was designed to help mitigate security risks specific to AI systems across the public and private sectors. We’ll continue partnering with researchers, governments and civil society groups around the world as we develop Gemini.
Availability
Making Gemini available to the world
Gemini 1.0 is now rolling out across a range of products and platforms:
Gemini Pro in Google products
We’re bringing Gemini to billions of people through Google products.
Starting today, Bard will use a fine-tuned version of Gemini Pro for more advanced reasoning, planning, understanding and more. This is the biggest upgrade to Bard since it launched. It will be available in English in more than 170 countries and territories, and we plan to expand to different modalities and support new languages and locations in the near future.
We’re also bringing Gemini to Pixel. Pixel 8 Pro is the first smartphone engineered to run Gemini Nano, which is powering new features like Summarize in the Recorder app and rolling out in Smart Reply in Gboard, starting with WhatsApp, with more messaging apps coming next year.
In the coming months, Gemini will be available in more of our products and services like Search, Ads, Chrome and Duet AI.
We’re already starting to experiment with Gemini in Search, where it's making our Search Generative Experience (SGE) faster for users, with a 40% reduction in latency in English in the U.S., alongside improvements in quality.
Try Gemini online
Product demo: https://bard.google.com/
Building with Gemini
Starting on December 13, developers and enterprise customers can access Gemini Pro via the Gemini API in Google AI Studio or Google Cloud Vertex AI.
Google AI Studio is a free, web-based developer tool to prototype and launch apps quickly with an API key. When it's time for a fully-managed AI platform, Vertex AI allows customization of Gemini with full data control and benefits from additional Google Cloud features for enterprise security, safety, privacy and data governance and compliance.
Android developers will also be able to build with Gemini Nano, our most efficient model for on-device tasks, via AICore, a new system capability available in Android 14, starting on Pixel 8 Pro devices. Sign up for an early preview of AICore.
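To make the "access Gemini Pro via the Gemini API with an API key" step concrete, here is a minimal sketch that builds a text-generation request using only the Python standard library. The endpoint root, the `gemini-pro` model name, the `x-goog-api-key` header and the request body shape are assumptions based on the public REST quickstart as I remember it and may have changed; `build_request` and the `GEMINI_API_KEY` environment variable are hypothetical names used for illustration. The request is only sent if a key is actually set.

```python
import json
import os
import urllib.request

# Assumed endpoint root and model name; check the current Gemini API docs before use.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "gemini-pro"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    url = f"{API_ROOT}/models/{MODEL}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


if __name__ == "__main__":
    key = os.environ.get("GEMINI_API_KEY")  # hypothetical variable name
    request = build_request("Explain multimodal models in one sentence.",
                            key or "YOUR_API_KEY")
    print(request.get_method(), request.full_url)
    if key:
        # Only contact the service when a real key is available.
        with urllib.request.urlopen(request) as response:
            print(json.load(response))
```

Passing the key in a header rather than in the URL keeps it out of logged request lines; the official client libraries wrap the same call and are likely the better choice for real applications.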
Gemini Ultra coming soon
For Gemini Ultra, we’re currently completing extensive trust and safety checks, including red-teaming by trusted external parties, and further refining the model using fine-tuning and reinforcement learning from human feedback (RLHF) before making it broadly available.
As part of this process, we’ll make Gemini Ultra available to select customers, developers, partners and safety and responsibility experts for early experimentation and feedback before rolling it out to developers and enterprise customers early next year.
Early next year, we’ll also launch Bard Advanced, a new, cutting-edge AI experience that gives you access to our best models and capabilities, starting with Gemini Ultra.
The Gemini era: enabling a future of innovation
This is a significant milestone in the development of AI, and the start of a new era for us at Google as we continue to rapidly innovate and responsibly advance the capabilities of our models.
We’ve made great progress on Gemini so far and we’re working hard to further extend its capabilities for future versions, including advances in planning and memory, and increasing the context window for processing even more information to give better responses.
We’re excited by the amazing possibilities of a world responsibly empowered by AI: a future of innovation that will enhance creativity, extend knowledge, advance science and transform the way billions of people live and work around the world.