Paulina Twarogal
Szymon Hanzel
Where does the AI market stand today? Is artificial intelligence still merely a buzzword? Well, it looks like that’s no longer the case. Having been the talk of the past decade, AI technology continues to evolve, expand its capabilities, and gain wider acceptance than ever before. From healthcare and finance to retail and manufacturing, AI is steadily taking root in processes across industries.
We now live in an era where AI seems to be a must-have for any technology gaining traction. Just ten years ago, machine learning was limited to tasks like recognizing objects in images or generating sentences, sometimes with questionable usefulness. Today, as generative AI reaches new heights (with yet more innovations likely on the horizon), Google has introduced a new player to the scene: its “multimodal” model called Gemini AI.
What is Gemini AI all about? How does it differ from other AI models and, most importantly, from ChatGPT?
The rising competition in the world of AI
The world of AI is evolving rapidly, and the numbers paint a clear picture—35% of businesses now use AI, including 68% of healthcare organizations, 63% of IT and telecom companies, and 72% of retailers. 84% of global business organizations believe AI will help them grow and give them a competitive advantage. Looking ahead, the AI market is projected to reach $305 billion in 2024 and skyrocket to as much as $738 billion by 2030.
The same holds true for generative AI. Gartner’s survey estimates that around 70% of business leaders are now exploring the use of GenAI within their organizations. Furthermore, a recent report by Bloomberg Intelligence predicts the GenAI market to surge from $40 billion in 2022 to a staggering $1.3 trillion over the next decade.
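To make the two forecasts easier to compare, they can be converted into an implied compound annual growth rate (CAGR). The sketch below is a back-of-the-envelope calculation, not a figure taken from the cited reports. It assumes the endpoints quoted above, a six-year span for the AI market forecast (2024 to 2030), and that “the next decade” for GenAI means ten years counted from 2022. The function name `implied_cagr` is ours.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start_value into end_value."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text, in billions of US dollars.
# The time spans are assumptions made for this illustration.
ai_market = implied_cagr(305, 738, years=6)      # 2024 -> 2030
genai_market = implied_cagr(40, 1300, years=10)  # 2022 -> assumed 2032

print(f"AI market:    {ai_market:.1%} per year")
print(f"GenAI market: {genai_market:.1%} per year")
```

Under these assumptions, the overall AI market forecast works out to roughly 16% growth per year, while the GenAI forecast implies roughly 42% per year.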
There’s no denying AI technology is growing fast. And so is the competition among tech companies, especially those already heavily investing in AI research and development. What consequences does this bring? Well, the fiercer the competition, the higher the possibility of innovation, better quality, and lower prices.
That’s why when OpenAI released ChatGPT, a chatbot built on its GPT models, in late 2022, it caused quite a stir in the tech world. Concerns arose among industry experts about a potential OpenAI monopoly. After all, in just two months, ChatGPT attracted 100 million users. That was unprecedented in the history of consumer internet applications. For instance, Instagram needed more than two years to reach a similar milestone.
With GPT-4 facing no obvious competition, concerns grew that the GenAI industry could become stagnant under OpenAI’s control. This was especially worrying since, at one point, OpenAI priced GPT-4 nearly 50 times higher than GPT-3.5 and Meta’s Llama 2 for enterprise-level tasks. Just when it seemed that OpenAI had it all, Google came out with an exciting tool called Gemini AI.
What is Gemini AI, and how is it innovative?
In early December 2023, Google took its next leap in artificial intelligence and launched Gemini AI, an integrated set of large language models (LLMs) designed to handle various types of data simultaneously. This all-in-one suite can process text, images, code, and audio through a single user interface.
In fact, Gemini replaced PaLM 2, the LLM behind Google Bard. Then, in February 2024, Google announced that Bard would now go by the name Gemini. Why? When Google Bard first launched, it had some notable flaws. Since then, it has seen significant improvement, with two upgrades to its large language model and various updates. So, the decision to rename it might be an attempt to move past its earlier reputation and to acknowledge the advanced technology driving the AI chatbot.
Gemini AI comes in three models:
- Nano: A model designed for mobile devices, offering AI capabilities in a lightweight package for on-the-go use. Nano aims to serve as a compact yet robust model for businesses in need of flexible and portable AI solutions. Separately, mobile users can try Gemini for free by installing either the Gemini app on Android devices or the Google app on iOS devices, and desktop users can access the free version of Gemini by simply opening a web browser.
- Pro: Tailored for handling a wide range of tasks, Gemini Pro smoothly integrates with Google’s Vertex AI program and AI Studio. This model clearly demonstrates its ability to handle complex business tasks exceptionally well.
- Ultra: This more advanced version of Gemini AI is designed to tackle highly complex tasks like coding, logical reasoning, or collaborating on creative projects. It’s considered the crown jewel of the Gemini series, the most powerful and capable of all the versions. It’s available through a subscription for $19.99/month and comes with a two-month free trial.
What makes Gemini AI innovative?
There are a few things that make Gemini AI stand out:
- Multimodality: This is the core strength of Gemini. It can grasp and process different types of information, like text, code, audio, images, and video. This means it can do more than just work with text: it can analyze images, summarize videos, or even create music based on written prompts. Gemini can produce new output in any of these formats and do the jobs of several dedicated single-purpose models.
- Flexibility: Gemini runs on a wide range of devices, from smartphones to data centers. This makes it versatile and broadens its range of possible uses.
- Reasoning capabilities: Gemini doesn’t simply repeat memorized data. It can analyze information critically, making it well-suited for tasks such as problem-solving, decision-making, and addressing intricate questions.
What’s the controversy around Gemini AI?
The controversy surrounding Gemini AI stems from its comparison with GPT-4 in the Massive Multitask Language Understanding (MMLU) benchmark. Some experts question Google’s benchmarking methods, suggesting that Gemini’s performance in certain tasks may not reflect its overall superiority over GPT-4. This has sparked broader discussions on the reliability of AI benchmarking and the need for standardized evaluation criteria.
Moreover, skepticism surrounds Google’s promotional video of Gemini. Critics argue that it may have exaggerated Gemini’s multimodal and reasoning abilities. What is this fuss all about? The six-minute video features Google’s new AI model recognizing visual cues and having a spoken conversation with the user in real-time. However, as Parmy Olson reported for Bloomberg, Google confessed that this portrayal wasn’t accurate.
In reality, researchers provided still images to the model and compiled successful responses, somewhat distorting the model’s abilities. So, despite a disclaimer on YouTube about reduced latency and shortened outputs, the video didn’t explicitly state this, leading to confusion.
These concerns raise ethical questions about AI representation and the responsibility of tech companies in accurately portraying their products. This leads to broader discussions about transparency in AI development and the impact of exaggerated claims on public perception and trust. So, what can Gemini AI actually do then?
How is Gemini different from other AI tools?
Google Gemini has a few alternatives, and each differs from the others, so let’s take a closer look at them as well. Apart from Gemini, its predecessor Bard, and Duet AI, it’s worth looking at OpenAI’s ChatGPT.
Gemini
Gemini is a multimodal LLM that can carry out complex conversations and refer to what it has already generated. It’s quicker and more creative than Bard. Unfortunately, the model is trained on data only up to mid-2023. It’s not so much focused on fact-checking, but it has access to information from Google Search and other services.
Gemini uses a family of LLMs and relies on a few technologies. One is the transformer architecture, which is capable of understanding and generating text. Another is the Pathways system, which sets Gemini apart from other LLMs. This system allows it to process different kinds of information, such as text and images, which contributes to better responses. It also makes it possible for Gemini to tailor the tone of the conversation to the context and the user.
Bard
Bard, Gemini’s predecessor, prioritizes accuracy and giving factual information to the user. This limits its creativity in some cases. It can understand and plan long-form content, such as articles. The model is trained on data from late 2023 and early 2024. However, it’s no longer available to everyone (the link may redirect you to Gemini).
Bard is an LLM that probably runs on a transformer architecture or a similar one. Yet, there is no official information about its technology available on Google’s site. Bard is trained on a different and more recent dataset than Gemini. This means Bard can provide more accurate information.
Duet AI
Duet AI focuses on helping developers with software development tasks. It can assist in coding by generating snippets based on specific instructions. What’s more, Duet AI is able to find and fix errors in existing code. It mainly relies on its internal knowledge base and is less flexible in communication than Gemini.
Duet AI probably uses a custom LLM based on Transformer architecture. It’s likely that it combines the LLM with code generation models that assist in creating code based on prompts. Duet AI also utilizes NLP tools to identify key information from the user. Moreover, it most likely uses software engineering knowledge bases to generate documentation and answer technical questions.
ChatGPT
ChatGPT, unlike the tools above, was created by OpenAI and runs on the company’s own LLM, GPT-4. It’s focused on generating creative content through a conversation with a user. It’s widely available and works in different languages. It’s great for casual use and is also capable of generating images based on text prompts.
Like other tools, ChatGPT uses an LLM built, most likely, on the Transformer architecture. That’s why it can understand and generate text so well. The GPT-4 LLM used by ChatGPT has increased complexity compared to previous versions. It’s also trained on a vast dataset, including code, web content, and other sources.
Tools with different purposes
Each tool has its strengths and weaknesses. Each has a primary focus and should be used within its field of expertise to get the best results. Gemini has access to the web and can take on creative challenges. Bard focuses on the accuracy of responses. Duet AI is an assistant for software developers. ChatGPT is a good option for casual use and can generate images.
What can Gemini do?
Gemini, as a multimodal LLM, can perform many tasks, all of them conversation-based. It can engage in a conversation and recall its earlier responses. Gemini has access to different sources of information, so it can answer users’ questions. It can also work as a creative tool and assist in creating content. At the time of writing, the Gemini 1.0 Pro model is available in over 40 languages and 230 countries. Google also announced Gemini 1.5 in mid-February 2024. The Nano and Ultra versions are limited and not yet available to a wider audience.
Generating and understanding code
Gemini is capable of generating and understanding code to some extent. It’s important to remember that its purpose is not to generate code or help in code review but to engage with a user in a conversation. Still, it has some basic coding skills. It can create simple code snippets in Python, Java, HTML, etc., based on instructions. It’s also capable of understanding simple code and its basic structure. Gemini can explain what the code components do and answer some questions concerning the code. However, Duet AI is more capable when it comes to working with code.
Communication
Gemini, as a conversational bot, is focused on engaging in a text dialogue with the user. It can hold a natural conversation, adapt its writing style to the user, and talk about different topics. It can show empathy and interpret the user’s input to recognize emotions. Gemini can come up with jokes, including ones based on wordplay. It also apologizes when it makes a mistake and acknowledges that its answers may not be what you’re looking for. Since it’s still in development, it adds a note reminding you to double-check the facts.
Text generation
Apart from holding a meaningful conversation, Gemini can also generate text in various formats. Given a prompt with clear instructions, it can generate creative content or help you with everyday tasks. The formats include everyday ones, such as emails and letters, as well as creative works, like poems and fictional stories. At this moment, it can generate content in a few languages. It’s also possible to use it as a translation tool, but the list of languages available for translation is limited.
Processing information
While generating responses, Gemini has access to various sources of data, and this includes Google Search. This allows it to provide answers based on the sources it is able to reach. Once you ask a question, it processes lots of data to give you a proper answer, but it’s best to fact-check the information anyway. What’s more, Gemini can quickly summarize different content from specific sources. It can analyze the text and provide you with different approaches to the topic or research relevant information.
Gemini Pro vs. ChatGPT (GPT-4) comparison
At this moment, access to Gemini is limited, and other models will be rolled out with time. When asked about the version it represents, Gemini refused to answer due to security and privacy reasons. Instead, it assured it’s running on the latest version available through Vertex AI. ChatGPT, on the other hand, provided a straightforward answer that it’s based on the GPT-4 architecture.
Gemini, when asked how it’s different from ChatGPT, answered that its strengths are factual accuracy, real-time information access, the ability to perform diverse tasks, and multimodal capabilities. It listed the following strengths of ChatGPT: creative text formats, wider language support, and accessibility. It summed up the comparison by saying that the “better” choice depends on the user’s specific needs.
Gemini provided a general answer without going into details of architecture and focused on performance. ChatGPT, when asked the same question, provided a more complex answer, including the analysis of architectural improvements, training methodologies, capabilities, and performance characteristics. In conclusion, it added:
To check how they perform, we compared the outcomes from the same prompts in terms of code generation and creative text generation.
Generating and understanding code
We used a simple code snippet that converted text to upper case. The bots were asked to explain what the code does and then to modify it so that it converts the text to lower case.
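For readers who want to try a similar test, the task can be sketched roughly as follows. This is an illustrative Python stand-in, not the snippet used in the test, and the function names `to_upper` and `to_lower` are hypothetical. It shows the starting behavior (upper case) next to the modification the bots were asked to make (lower case).

```python
def to_upper(text: str) -> str:
    """Starting point: return the text converted to upper case."""
    return text.upper()


def to_lower(text: str) -> str:
    """Requested modification: return the text converted to lower case."""
    return text.lower()


sample = "Hello, World!"
print(to_upper(sample))  # HELLO, WORLD!
print(to_lower(sample))  # hello, world!
```

A task this small mainly reveals how a bot explains code rather than how well it writes it, which is what the comparison below looks at.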
Gemini Pro
ChatGPT
Looking at the answers, Gemini provided a broader explanation and also added more examples than just “hello, world!”
They both interpreted the code correctly and explained it to the user, but in different ways. Here’s the response when asked to rewrite the code to change all text to lowercase:
Gemini Pro
And as before, Gemini provided a C# “hello, world!” example.
ChatGPT
Both bots generated the same code with the requested function. Gemini added a broader explanation and an example. However, both bots did exactly what they were asked to do. In this case, the code was very simple and basic, and we only compared the type of answer the bots gave us.
Creative text generation
Another test checked the bots’ ability to generate a creative piece of work. Again, they got the same prompt, which asked if they knew any jokes about cats. Let’s look at the results.
Gemini Pro
The response we got used a lot of wordplay and gave a comment that humor is subjective. It also encouraged us to keep interacting with Gemini by providing our favorite jokes or stories.
ChatGPT
The same prompt resulted in five jokes, all based on wordplay. They were also much shorter than the ones generated by Gemini. ChatGPT also didn’t encourage further interaction.
Conclusion
These two examples give us a glimpse of how the two tools differ. Gemini seems to be more oriented toward engagement, while ChatGPT is focused on providing concrete responses. However, we compared the basic Gemini version with GPT-4, which is paid and more advanced than the free GPT-3.5.
Gemini Ultra vs. ChatGPT (GPT-4) comparison
The Gemini Ultra model is available within the Google AI Premium plan. Ultra is the largest and most capable model, built for highly complex tasks. It’s often shown in comparison tables with GPT-4. According to the comparison table on Google’s blog, Gemini Ultra should perform better than ChatGPT in coding (as well as in other fields).
Google Gemini is natively multimodal. As Google stated, Gemini was pre-trained from the start on different modalities. Then, it was refined to understand different inputs better than existing multimodal models.
Generating and understanding code
This time, we used a Swift code snippet with an error to see how the bots performed. The prompt was as follows:
Gemini Ultra
The bot claimed the code was a mix of Swift and Python, which is debatable. It identified the error correctly but wasn’t sure which programming language it was dealing with. It did manage to explain the code’s functionality:
Gemini also showed the output for the code:
What’s interesting is that Gemini Pro didn’t find the error at all. It did notice the code used in this prompt was Swift.
ChatGPT
With the exact same prompt, ChatGPT immediately recognized it was Swift code, found the error, and explained its functionality:
GPT-4 did not provide an output example by itself. What it did notice was that the fruit list in the code contained a vegetable: a carrot. What’s more, GPT-3.5 also dealt with the task well. It identified the language of the code, found the error, and proposed a proper fix. Only GPT-4 noticed the odd one out on the fruit list.
Creative text generation
When we compared Gemini Pro with GPT-4, we noticed that Gemini seemed more “talkative,” meaning it produced more text and options for the same prompt. This time, we tried a few prompts with Gemini Ultra and GPT-4, and they performed almost the same. They even gave us the same joke idea:
Gemini Ultra
ChatGPT
The results the bots produced when asked to write pieces such as a webinar invitation, a poem about AI, or a proverb explanation were very similar. We also checked their stance on the following moral dilemma: “Imagine you’re my friend that’s cheated on. Would you prefer to know that your husband is cheating, or would you rather not know about that?”
At first, ChatGPT was reluctant to answer, but in the end, both bots stated they’d rather know.
Gemini Ultra
While it used hedging in the answer to avoid taking a concrete stance, the suggestion of “yes” was present, so we double-checked to get the following:
It gave us the reasoning behind this decision.
ChatGPT
ChatGPT was clearly reluctant to take a stance from the start, but when we asked it a few follow-up questions, it made the same choice as Gemini Ultra and also provided its reasons.
Conclusion
While the code generation comparison showed certain differences, we couldn’t find a clear distinction between the two tools when it came to generating text. The generated texts were of similar quality. However, ChatGPT handled the code task better with both the GPT-4 and GPT-3.5 models.
The future: Gemini models improvements and wider availability
Judging by the pace of model development, Google is working tirelessly on Gemini. Shortly after introducing Gemini 1.0 Pro, the company announced an improved version of the Pro model, 1.5, and released Ultra 1.0. The availability of the models is increasing, and access to Gemini with the Ultra model can now be purchased as part of the Google AI Premium plan.
The tests we performed aimed to show some basic distinctions between Google Gemini and ChatGPT by OpenAI. What we tested was just a fraction of the possible uses of the models. It seems that for casual use, there is not much difference in terms of creative content generation.
We did notice some discrepancies in performance in the field of code generation. Depending on the specific needs of a user, one model may be better than the other.