Windows Copilot Needs to Break Free from the Shackles of a Chatbox

In Short
  • Windows Copilot is the headline AI feature of Windows 11, but it's just another AI chatbot, much like Edge Copilot.
  • The text-based chatbot experience is not very intuitive, it's a downgrade from Cortana, and Windows Copilot is not deeply integrated into the OS.
  • Microsoft should add support for vision models and focus on performing actions to deliver meaningful AI experiences.

With the release of OpenAI’s ChatGPT, chatbots have become the face of Artificial Intelligence (AI). It seems as if talking to an AI chatbot is the only way to interact with AI models and intelligent systems. I agree that a chatbot offers a simple, user-friendly interface for most users to interact with an AI model. But that doesn't mean every possibility for interacting with an intelligent system has to be confined to the four corners of a text chatbox.

Microsoft has been caught up in the frenzy of integrating AI chatbots into many of its products. Most notably, it has integrated Windows Copilot, an AI chatbot powered by OpenAI’s models, into Windows 11 with much hype and pomp. It's also worth noting that Microsoft replaced Cortana with Windows Copilot on Windows 11, and the tech giant has since brought Windows Copilot to Windows 10 as well, replacing Cortana there too.

Clearly, Microsoft believes that AI chatbots are the future. But is that really the vision for intelligent, AI-powered computing? Or is Microsoft just pandering to the AI hype and integrating chatbots to show investors that it has skin in the game? Whatever the answer, AI chatbots in their current form have limited applications, and trying to get any meaningful help from them feels restrictive, especially at the OS level.

Windows Copilot: A Downgrade Over Cortana?

Microsoft decided to wind down Cortana, a 9-year-old product, in favor of Windows Copilot. But is Copilot a suitable replacement, especially when it's still in preview?

Let’s go through the comparison point by point. First off, Cortana was primarily a voice assistant, whereas Windows Copilot is a text-based AI chatbot. It does support voice input, but not by default.

Simply put, Windows Copilot is not designed for a voice-first experience, so it feels disjointed, unlike Cortana, which felt more personal. When it comes to UI approachability, I think many people prefer voice input over text input for its sheer ease of use and intuitiveness. So Windows Copilot fails a vital user-experience test at the very outset.

Now, coming to features: Cortana had become a fleshed-out product and could perform a lot of system-level actions. It could create a timer, set an alarm, add reminders, compose an email, find definitions, open apps, and much more. In essence, Cortana was deeply integrated into the Windows OS and understood the system very well.

In comparison, Copilot is powered by general-purpose large language models (LLMs) that are not well-tuned for performing local actions on Windows. When I ask Windows Copilot to set a timer, it tells me to use an online service instead. It can’t even set an alarm. When I ask it to play music, it simply opens the Spotify app. I can’t find any AI magic here.


Of course, Windows Copilot is still in preview, and these features will likely be added in the future (some are already being tested in Insider builds). But why the tearing hurry to replace Cortana with a barely working AI chatbot?

It appears to me that Microsoft is in a rush to board the AI hype train. The company missed the smartphone race, which it now regrets, and it doesn't want to repeat the same mistake.

What irks me is that Microsoft doesn't seem to have given much thought to Windows Copilot. It has simply integrated a chatbot and called it a day, at least for now. The tech giant hasn’t even tried to achieve feature parity between Copilot and Cortana before retiring a nearly decade-old product.

It’s especially disappointing because Microsoft is adding a Copilot key to Windows keyboards, which it calls “the first significant change to the Windows PC keyboard in nearly three decades”, yet so little thought has gone into the feature itself.

Where is the AI Magic in Windows Copilot?

Now, let’s come to what Windows Copilot can do. You can ask questions about any topic and get answers right away. You can also switch to Creative mode to talk to the more powerful GPT-4 model.

It can summarize a webpage, find key insights, plan an itinerary, etc. Microsoft has also added a screenshot tool to Copilot that uses the GPT-4V model for visual analysis. You can use it to perform OCR or find information about an image.

As for Windows-specific features, you can say, “I am having issues with audio,” and Copilot will open the audio troubleshooter for you. This works for other Windows issues as well. Besides that, you can turn dark mode on or off, take a screenshot, and snap windows through Copilot.

While these features are decent for a preview, most of them, apart from the Windows-specific ones, work in Edge Copilot as well. Moreover, Windows Copilot can’t access webpages open in Chrome or other browsers. Because Windows Copilot runs on Edge’s engine, it can’t access content from other windows, be it a browser, Notepad, or Office apps.

This is another major gap in Windows Copilot’s implementation. Copilot isn't built with the WinUI 3 framework, which would deliver a native experience; instead, it runs as an extension of the Edge browser. As a result, Windows Copilot isn't deeply integrated into key elements of the OS.

For example, you can’t right-click a file in File Explorer and ask Windows Copilot to explain it, convert its format, or perform any other action. It would have been so cool if you could throw an Excel file at Copilot from the context menu and have it perform data analysis right there. Currently, apart from images, there is simply no way to interact with files using Windows Copilot on Windows 11.

Windows Copilot: A Case of Overpromising and Under-delivering

Of late, Microsoft has been very good at announcing and marketing new features, but when you go to use them, you can’t seem to find them. When Windows Copilot was announced three months back, Microsoft promised several new features; however, they are either not available yet or don’t function as marketed.

For example, when you ask Windows Copilot to snap your windows, it asks your permission and then snaps just one window, leaving you to do the rest. Similarly, it doesn’t play mood-specific music when you ask it to play something while you work. Instead, Copilot simply throws up links from YouTube and other sources. That’s not what you'd expect from an intelligent, AI-powered Copilot, is it?

Next, the much-anticipated contextual menu for Copilot has not arrived yet. Rewrite, Explain, and Summarize are not available for any active window. Draft with Copilot is also nowhere to be found, even three months after release. On top of that, you can’t remove the background of images using Copilot, and extension support has not been added yet.

So the heavily marketed, hyped-up features are simply not there. It’s a clear case of Microsoft overpromising and under-delivering, as it has with many of its products.

What Could Be the Vision for Windows Copilot?

Now, let’s look at what Windows Copilot could do. In the open-source community, there's an interesting tool called Open Interpreter that can interact with your local files, convert them to other formats, process various file types, create charts, and much more. It can also interact with various system settings and tools and perform actions on Windows.

Just recently, a new version of Open Interpreter (0.2.0) was released with a fascinating OS mode that lets you operate your computer with simple natural-language prompts. Open Interpreter uses vision models like GPT-4V to understand the GUI environment and perform actions on your computer.

For example, you can ask it to turn on dark mode, and it opens the appropriate Settings page and flips the toggle using the vision model.

Ask it to play some lo-fi music, and it opens the browser, goes to YouTube, finds a good lo-fi playlist, and plays it for you. These are basic examples of what vision models are capable of, yet Windows Copilot is stuck throwing text at you in a chatbox.

A truly intelligent Copilot should be able to send an email, tweak Windows settings, interact with the OS at the system level, and do so much more. The use cases are limitless, and such a Copilot could greatly improve accessibility in Windows 11 24H2.

Of course, calling the GPT-4V API would cost Microsoft a lot of money, but it could build a small vision model specifically for Windows, much like CogVLM. That would reduce latency and let everything run locally, even when your PC is offline.

Since upcoming chipsets from Intel and Qualcomm's Snapdragon X Elite come with dedicated NPUs, running smaller models on-device should be possible. Even if Microsoft runs its in-house vision model in the cloud, it would cost much less.

To give another example, we've just seen a demo of the Rabbit R1, an AI-first hardware device that can perform actions for you. It’s powered by what Rabbit calls a Large Action Model (LAM). From ordering pizza to sending emails and booking flights, it can intelligently handle tasks for you with just voice input.


If a small startup like Rabbit can pull it off, so can a tech giant like Microsoft with humongous resources on its side. So far, we have seen Microsoft build its own Phi-2 model, a small LLM, for research purposes only. If Microsoft really wants us to experience AI PCs in 2024, it needs to build Windows-specific vision models for running agents locally with near-zero latency. Microsoft needs to come up with something like a LAM that is designed to perform actions, not just for chatting.

Windows Copilot Needs a Fresh Approach

To conclude, Windows Copilot in its current chatbot form has extremely limited use cases, and those are already covered by countless browser extensions and Edge Copilot. Microsoft needs a fresh approach to make AI PCs a reality.

Microsoft’s fiercest competitor, Apple, is known for building a product thoroughly and releasing it to the public only when it’s ready. Microsoft does the opposite: it rushes products out before they have functional, meaningful features at launch.

It’s symptomatic of how Microsoft is approaching AI without much thought. The company has already started calling Edge an AI browser just for integrating a chatbot. It’s also working to add AI features to Notepad and continues to bring AI-powered features to MS Paint, Snipping Tool, Office apps, and other first-party apps.


While these in-app AI features can help some users, to make Windows an intelligent, AI-powered OS, Microsoft needs to get over its obsession with integrating a chatbot and start afresh with novel ideas and approaches.
