Gemini Ultra 1.0 vs ChatGPT 4 (My First Impression)
There’s been a lot of anticipation over whether Google could step up and compete with ChatGPT, the extremely capable AI assistant from OpenAI that took the world by storm in late 2022.
I finally got access to try out Google’s newest AI model, Gemini Ultra 1.0. And I’ll share my first impressions of Gemini after running it through my standard test suite that I use whenever I get access to a new large language model.
We’ll take a look at its performance in coding, reasoning about the physical world, and image captioning.
I’ll also compare some of the responses directly to what ChatGPT produces to get a sense of how they stack up.
My Initial Impressions of the Gemini UI
Taking a quick tour of the Gemini UI, I like what I see so far. It has a clean, dark mode design that will be familiar to ChatGPT users. The real-time response is great to see, and the ability to upload images directly for captioning is handy.
It does mention having extensions as well, though I won’t dive into those just yet.
Overall, my first impressions are positive – it looks well-designed and very capable as Google’s “most advanced AI assistant.” Now let’s see if it lives up to that billing.
Test #1 Logical Reasoning
I like to start my evaluations with some logical reasoning questions to test how well these models can take in a word problem, extract the key details, and reason step-by-step to an answer.
Here was the first one I tried:
Let’s see what Gemini came up with:
Unfortunately, this first try missed the mark. When pushed to explain its reasoning in its draft responses, it doubled down:
I gave it one more chance to get this right:
Better, but still not precise enough to fully get the reasoning. As a comparison, here is ChatGPT’s response:
ChatGPT nails the reasoning here. So point one goes to ChatGPT in this first round.
Test #2 Physical World Reasoning
For my next test, I wanted to see how Gemini handles a more complex word problem involving reasoning about objects interacting in the physical world:
I then describe placing the ball in the bag, walking to my office, and dropping the bag into a box, which I seal and mail to my friend in London. The question is:
The right reasoning is that since the hole is bigger than the ball, it likely dropped out in my office before I mailed the box. Let’s see Gemini’s response:
Unfortunately, Gemini fails to account for the hole in the bag and incorrectly assumes the ball made it to London. Here is ChatGPT’s response for comparison:
Another miss for Gemini. It seems like ChatGPT is doing better at this more complex line of deductive reasoning.
Test #3 Coding Test
I like to test these models on their ability to generate runnable code, so my next test was to prompt Gemini to:
Let’s jump right to Gemini’s final successful attempt after a couple of failed tries:
# Code for snake game
import pygame
import random
import sys

pygame.init()
screen = pygame.display.set_mode((400, 400))
snake = [[200, 200], [210, 200], [220, 200]]
snake_skin = pygame.Surface((10, 10))
apple_pos = [random.randint(0, 390), random.randint(0, 390)]
apple = pygame.Surface((10, 10))
my_direction = 'RIGHT'
change_to = my_direction
clock = pygame.time.Clock()

while True:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            pygame.quit()
            sys.exit()
        if event.type == pygame.KEYDOWN:
            if event.key == pygame.K_UP:
                change_to = 'UP'
            # And so on for other keys
    # Rest of game code
This snake game code works! Gemini got there on the third try. As a comparison, ChatGPT served up the working code on the first attempt.
So there’s a consistency gap between the two, but it’s good to see that Gemini can generate runnable Python code after a few tries.
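One detail the truncated key handling glosses over: most snake implementations block a 180-degree reversal, since turning straight back would make the snake collide with its own body. Here’s a minimal, pygame-free sketch of that guard (the function name and lookup table are my own, not from Gemini’s output):

```python
# Opposite of each direction; turning straight into it is disallowed.
OPPOSITE = {'UP': 'DOWN', 'DOWN': 'UP', 'LEFT': 'RIGHT', 'RIGHT': 'LEFT'}

def apply_direction(current, requested):
    """Return the new direction, ignoring a direct reversal."""
    if requested == OPPOSITE.get(current):
        return current  # keep going; the reversal is ignored
    return requested
```

In the game loop above, change_to would pass through a guard like this before my_direction is updated each tick.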
Test #4 Explaining Python Code
I wanted to test how Gemini handles explaining Python code, so I threw this short function at it:
def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)
Surprisingly, Gemini refused:
However, ChatGPT had no issues providing a concise explanation of how the recursive factorial function operates.
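ChatGPT’s explanation boils down to this: each call multiplies n by the factorial of n - 1 until the base case n == 0 returns 1, at which point the pending multiplications collapse into the answer. A quick sketch of that call expansion (my own illustration, not output from either model):

```python
def factorial(n):
    """Recursive factorial: base case n == 0 returns 1."""
    if n == 0:
        return 1
    return n * factorial(n - 1)

# The call chain for factorial(4):
#   factorial(4) -> 4 * factorial(3)
#                -> 4 * 3 * factorial(2)
#                -> 4 * 3 * 2 * factorial(1)
#                -> 4 * 3 * 2 * 1 * factorial(0)
#                -> 4 * 3 * 2 * 1 * 1 = 24
assert factorial(4) == 24
```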
Gemini’s refusal was concerning – this seems like exactly the sort of task it should be capable of. I’ll have to dig into this more.
Test #5 Image Captioning
As one final test, I wanted to verify that Gemini could actually caption images as advertised.
I uploaded a picture of some Nvidia merch I’ll be giving away and asked it to describe the contents:
This caption is perfectly accurate, so the image feature appears to be working. Thumbs up here for Gemini!
Conclusion and Next Steps
So those were my first impression tests of Google’s new Gemini Ultra AI assistant.
I saw some positives like the fast response time, ability to generate code, and accurate image captions. However, it still doesn’t seem up to the consistency and reasoning ability I’ve come to expect from ChatGPT.
I need to dig into Gemini a bit more and also hopefully get access to the actual API which may reveal more capabilities.
Some of the refusal behaviors had me questioning if I properly had access to Ultra, so I need to validate that as well.
But I hope this overview was helpful if you’ve been curious how Google’s latest offering stacks up against the current AI leader. While promising in areas, I don’t think it has caught up just yet based on my early testing.
What other tests would you want to see run to continue evaluating Gemini Ultra? Let me know in the comments! I plan to revisit this and share more thorough benchmarking once I get API access.