Evaluating Gemini AI: Doubts Remain Over Its Efficacy in Performing Assignments

Key Takeaways

Google aims to push generative AI as a virtual assistant for various tasks, but the current technology has limitations.
Google demonstrated using Gemini to compare home repair contractor bids and other tasks.
Gemini and other generative AI tools do not have human-like reasoning and logic.

Google had a lot of AI features to talk about today at Google I/O, supposedly capable of everything from solving math problems to managing home repairs. The more I watched the presentation, the more I wondered: how could you ever trust AI for these tasks?

The first wave of generative AI chatbots and models was focused almost entirely on parsing and creating text. You could ask ChatGPT to write a personalized poem, summarize a classic novel, or write some JavaScript or C++ code. That later expanded to other media formats, like images and video. Generative AI is still not the perfect tool for many of those use cases, but it can definitely be useful.

Google is now trying to push generative AI as a more complete virtual assistant, capable of completing some tasks with minimal or no human intervention. Google Assistant, Siri, and other earlier virtual assistants could handle some basic tasks, like finding a nearby restaurant and starting navigation directions. Google wants to push that even further with its Gemini AI technology, taking over tasks that would normally require human oversight and judgment.

When Summaries Go Wrong

There were a lot of AI feature demonstrations on stage at Google I/O, but importantly, most of them were pre-recorded or simulated. Gemini summarized all recent emails in Gmail from a specific school. It also created a recap of a meeting after parsing the call’s audio recording. Those are the same use cases that Microsoft has promoted for its Copilot assistant, and even though basic summarization has fewer steps that can go wrong, I would still be worried about using it for important meetings or messages. What happens when Gemini mishears comments from your boss about what project should be the priority?

Google also showed off its vision for “Agents,” which were described as AI-powered helpers that can “think multiple steps ahead” and “work across software and systems.” The first example was asking Gemini to help return a pair of shoes ordered online—the AI found the shipping label, contacted Converse to start the return, and created a calendar event for a package pickup with UPS. Another example involved asking Gemini, “I just moved to Chicago, anything I should be thinking about?” Gemini then suggested updating the user’s address in various services, and it completed the task automatically in their DoorDash account after a simple confirmation.

The demo that shocked me the most was a segment about using Gemini in Gmail. The person in the example needs the roof of their house repaired, and they ask Gemini to find and summarize the three bids sent to their email address. The person decides which contractor to pick based entirely on the summary, then sends a reply email that is also AI-generated.


The repair quotes in the example email ranged from $875 to $1,500. If Gemini got one detail wrong, or if it failed to include some important information in the summary, that could be a very expensive mistake. I can’t imagine ever using AI assistance to make that kind of decision. I could very well make a mistake in the same situation, like forgetting to read one message in an email chain or mixing up numbers, but at least then it would be my own fault. How is Gemini an improvement?

Google won’t cover the cost of my broken roof because Gemini misread an email. It also won’t refund me when it sends my returned shoes to the wrong address. Google has promised to cover legal fees when its AI tools inevitably create content too close to copyrighted materials, but that’s about it.

I have used generative AI tools over the past year to help me with coding work, or writing bash scripts, or converting data between formats. The key phrase there is “help me,” because I’m still looking at the input and checking the results. Google increasingly wants generative AI to completely take over tasks, and that’s not something I feel comfortable about, given the current state of the technology.

There have been other attempts at using generative AI to accomplish real-life tasks. OpenTable was one of the first plugins for ChatGPT, potentially allowing you to book dinner reservations through the AI chatbot. You can find a few reports of that not working, though it’s unclear if the AI or the underlying API integrations are to blame.

Don’t Trust It

Google, and many other tech companies, want to sell you the idea that generative AI can be logical and make correct decisions with enough information. Generative AI cannot do that. There is no AI in existence right now that can think like you, me, or any other human. That’s why Gemini, ChatGPT, Copilot, and every other AI assistant can write programs but still fail at basic logic problems. Gemini made a mistake in a demo video today at Google I/O, and it’s not even the first time that happened.

Why would I trust Gemini to make important financial decisions for me, or any other real-life task? Is saving a few seconds or minutes with mundane tasks really worth the potential problems? The idea that a chatbot unable to play tic-tac-toe could somehow automate my life isn’t appealing at all, no matter how excited Google might be about it.
