# ChatGPT, Bard, and the battle to become the "everything app"

[//]: # (title: ChatGPT, Bard, and the battle to become the "everything app")
[//]: # (description: AI-powered search assistants are crucial to ensure that the Web starts and ends on your platform)
[//]: # (image: /img/robot-voice.png)
[//]: # (author: Fabio Manganiello <fabio@manganiello.tech>)
[//]: # (published: 2023-02-07)

Everybody in Silicon Valley is crazy about chatbots lately.

[ChatGPT](https://chat.openai.com/chat) has taken the world by storm with its
ability to communicate effortlessly about anything - up to the point that it can
generate content that [can pass the final
exam](https://edition.cnn.com/2023/01/26/tech/chatgpt-passes-exams/index.html)
for some business and law schools.

In the meantime, some folks (including myself) believe that the current wave of
AI, albeit surprisingly advanced in some applications, is still in the
[_stochastic parrots_](https://dl.acm.org/doi/10.1145/3442188.3445922) phase
rather than real artificial intelligence. That's just because of how these
models are trained (build a network with billions of units, train it with the
whole Internet, and see what comes out of it). Sure, they are very good at
spotting and replicating statistical patterns, but they still need input from
[human moderation in the form of explicit, old-fashioned if-else-like
rules](https://futurism.com/amazing-jailbreak-chatgpt) to prevent their answers
from going completely wild/immoral - and it's even [possible to jailbreak
them](https://www.cnbc.com/2023/02/06/chatgpt-jailbreak-forces-it-to-break-its-own-rules.html)
to bypass those restrictions.

## Does Big Tech even care about AI?

Anyway, the fact that these models are still in a stochastic parrot stage
doesn't seem to matter much for Big Tech. After all, their only purpose is to
make their investors rich in the shortest possible time. Not to build a truly
intelligent machine with human-like ethics and reasoning skills. That would be
stuff that requires actual long-term R&D investments, and VCs nowadays aren't
that patient - especially after many of them [lost big money in the crypto
scam](https://www.bloomberg.com/news/articles/2022-11-11/ftx-fueled-by-tech-cash-sam-bankman-fried-s-crash-exposes-vc-flaws),
and they are craving quick ways to recover from their losses.

No, Big Tech really doesn't care about AI.

Had they really cared about progress in AI, they wouldn't have fired or
repurposed entire AI ethics teams for producing research that showed the
problems with the AI strategy pursued by the top management.

And they wouldn't have spent the past decade iterating again and again over the
same technology (neural networks) to grab all the possible low-hanging fruit,
even though much of academia doubts that human-specific skills and behaviours
(like ethics and logical reasoning) can really emerge from today's neural
networks alone. Sure, they are cheap to scale: add more hidden layers, feed
them more quality data, and their performance improves like magic. And they are
good at imitating the human brain in a specific set of activities - mostly
those based on associative memory that is reinforced through repeated exposure
to examples, mostly from sensory input/output. But human intelligence isn't
only about associative memory that learns to classify and predict things
through repeated exposure. Otherwise we would only need the visual neurons in
our occipital cortex to process optical signals from our eyes, and all the rest
of the brain would be redundant.
Unfortunately, research in the field has become dominated by hammer experts who
think that the whole world is made of nails.

The past decade has seen many other fields in AI (like Bayesian techniques and
symbolic reasoning) languish, as [most of the AI conferences became dominated by
a handful of large companies](https://arxiv.org/pdf/2205.01039.pdf) which are
narrowly-focused on advancing research in the applications that are profitable
for their business, rather than pursuing high-hanging fruits that would benefit
the field as a whole. And, of course, since their business strategy is often at
odds with what a wise and gradual development of such a sensitive field would
require, discussions about ethical AI are often held at the fringe of the field,
far from the places and the people that are actually forging the technology.

That, of course, is also very deliberate: in a highly competitive market you
want to take everybody by surprise by showing people that you are one step
ahead. That can only happen if you throw something as a beta to the world, and
grab all the headlines before others do. AI ethics teams, peer reviews, or even
just giving people the time to let the changes sink in and think over the
possible consequences, are all things that stand in the way of being the first
at everything and of making investors as happy as a bunch of kids in a candy
store.
In the end, the development of technologies that can drastically change our
societies is left solely to the technical teams in charge of the
implementation, the interaction with the scientific community revolves only
around topics such as "_is it better to use this or that model architecture for
this class of problems?_", and everything else is just an obstacle in the way
of the business' vision. The strategy is akin to shipping racing cars without
seat belts and airbags: of course somebody is likely to get hurt, but at least
you'll have shown your investors that you can build a fast car before your
competitors.

[ChatGPT caused a "Code Red" at
Google](https://www.cnet.com/tech/services-and-software/chatgpt-caused-code-red-at-google-report-says/),
not because of concerns about generative AI being suddenly deployed to everyone
without much supervision, or an ethical framework everybody agrees on. Their
only concern was "_how the hell did OpenAI manage to roll this out before us,
and which corners can we cut to deploy a competitor as soon as possible and show
our investors that we're still in the game?_".

If companies like Google and Microsoft are going all big on AI lately, it's
definitely because of reasons other than genuine interest in advancing the
field academically.

## All you need is hype

As mentioned earlier, many VCs have been heavily burned by the collapse of the
crypto scam. The recent big layoffs in tech are, at least in part, motivated by
investors tightening their purse strings after losing a lot of money in a bad
gamble.

Now that there aren't big profits left to make in crypto speculation, they need
a new hype wave to get rich, preferably without waiting too long. And who cares
if everybody is still far from building anything resembling a human (or even a
more primitive) form of true intelligence. If it looks intelligent, then it is
intelligent, and you can market it as AI. Something like ChatGPT was the rabbit
pulled out of the hat at the right time.

But there's an even darker reason behind the current hype. That has to do with
the real long-term mission of these tech giants.

## The path to becoming the Alpha and the Omega of the Web

Both Google and Microsoft are focused on a specific application of
conversational AI: search assistants. Turn their search engine experience into
something more human, where you get answers by conversing with a bot, instead of
scrolling through an endless list of links to other websites. This is not a
coincidence.

Their real mission, at least for the past decade, has been to build a platform
that isn't "just" a gateway to the Web. It should be **THE** Web. Google's
initial mission was to minimize the time that users spent on their website - you
find what you're looking for as fast as possible, you click on it, and then
you're out. Their initial success metric was simple: provide relevant results
for the user at the top of the page, so users don't have to spend time
scrolling and clicking around. The shorter the time the user spends on the
website before clicking on what they're looking for, the better.

Over the past 10 years, that mission has been flipped on its head:
now Google wants to be the beginning and the end of your Internet journey.

If you can get all the answers you want directly from your search engine, then
you will never need to open any other website. All the efforts on Google's
side (from going all-in with voice assistants, to [adding related
questions/answers](https://searchengineland.com/google-puts-some-contextual-search-features-in-context-344111)
on the search results, to more and more information provided in inline boxes on
the search page, to [scraping and showing lyrics directly in the search
engine](https://www.theverge.com/2014/12/23/7439833/google-now-shows-song-lyrics-with-search-results),
to showing rates for flights and hotels directly in the results) have gone in
that direction. From Google's perspective, a user that spends more time on the
website (because all the information that they need is already there) is much
more profitable than a user that uses them just as a gateway to the Web - even
if that's supposed to be the whole purpose of a search engine.

Silicon Valley isn't that explicit about it (well, [except for Elon
Musk](https://www.forbes.com/sites/madelinehalpert/2022/10/05/if-musk-turns-twitter-into-x-his-everything-app---heres-what-it-might-look-like/)
of course), but the dream of the giants is still to build THE ultimate platform
(the Chinese way) where your Internet experience starts and ends. "Everything"
platforms like Tencent's WeChat are actually a source of inspiration in the
valley. AI bots are just a nice shortcut to get to that end. Why would you want
to visit the websites of several outlets to stay up-to-date with the news about
an event, when you can just ask a question to your search assistant, and it will
provide you with everything it digested on that topic from hundreds or thousands
of articles? The AI already read them all for you, so you won't have to read
any of them!

## Have all the cakes and eat them

What kind of world would this be in the long run? I've given it some thought
lately, and I think it's a world that will eventually leave everybody worse
off - including the tech giants themselves.

Think of it for a moment. Models like ChatGPT exist thanks to the Web. Thanks to
a network of billions of websites that they can crawl and scrape, and that
content is eventually pre-digested into something that can be fed to humongous
models.

The relationship is also very asymmetric: the whole business model of these
platforms is based on scraping the Web, but they'll do their best to prevent you
from scraping THEIR platforms (and, if you manage to do so, [they will often take
you to court](https://www.bloomberg.com/news/articles/2023-02-02/meta-was-scraping-sites-for-years-while-fighting-the-practice)).

If, for most of the users, the Internet journey starts and ends on the search
engine, and there's no way of making the information from the search engine
"trickle down" to other websites and platforms, then all the other websites will
starve.

News outlets today complain about Google News putting a summary of their
articles in its feed (albeit with the original link still attached)? Then
imagine a world
where you just ask a bot about the news on a specific topic, and the bot answers
you with what it digested from other news outlets - no links attached.

Not only that: since you spend a lot of time on the search engine, the search
engine, unlike the Guardian or the NYT, also knows a lot about you. It can
provide you with a personalized experience based on the content you're most
likely interested in, from the ideological perspective you're most likely to
lean towards, thereby amplifying the creation of ideological bubbles. What's
worse is that, unlike articles written by journalists or other specialists
(where the same information is publicly available to everyone for scrutiny),
customized answers from a search assistant aren't subject to any form of
external scrutiny, nor are they accountable for accuracy, since they are just
passing around content digested from somewhere else.

At some point, what will be the incentive to invest time and money into
building your own website or platform, when people already have the
"everything app"?
Publishing e.g. a blog, or some technical documentation, or a news outlet, or
building an e-commerce platform with an open API, means cooking some food that
these large models can feast on. They will digest all of your content, summarize
it, spit it out to users, and you may rarely see a single organic visit.

If other websites eventually die off, these AI models will have less and less
content to be trained on, unless the companies behind them also take on the
organic business of the websites that they are replacing instead of just
scraping them (something not very likely to happen, given their resistance to
scaling in fields that require hiring more humans).

Eventually, the performance of their models will degrade and they will start to
provide outdated information.

I'm pretty sure that there are smart people at these companies who have reached
the same conclusions. And yet they are still pushing at breakneck speed with
this strategy, because short-term returns are much more important than the
long-term damage inflicted to everyone (including themselves).

## Search assistants are a flawed version of what the semantic Web could have been

What infuriates me the most, however, is that these businesses aren't inventing
anything new.

We already envisioned something similar to what these companies are working
on: a "Web of meaning" and connected information that can easily be parsed by
machines as well as human eyes. But it was better, less costly to run, open
and decentralized.

Actually, somebody already envisioned it more than 20 years ago - I mean, [Tim
Berners-Lee wrote an article about it as early as
2001](https://www.scientificamerican.com/article/the-semantic-web/). It was
called "semantic Web" - a.k.a. the "real" Web 3.0.

The idea is based on a curated layer of "meaning" built on top of the Web. A Web
page shouldn't be only HTML that can be rendered on a screen. A Web page
actually contains information, and, from the perspective of a machine, that
information is best digested when provided in a structured format.

Consider this simple HTML snippet for example:

```html
<div>
  Alan Turing was born in Maida Vale, London, on June 23rd, 1912.
</div>
```

What can a machine infer from this snippet? Well, actually not much - unless it
scrapes text from the HTML, discards all the tags that aren't relevant, filters
out all the text from menus, ads etc., and feeds the extracted text to a
language model that converts it to a structured representation. It's a snippet
that is meant to be rendered by a browser as text that obeys specific style
rules. It's a presentation layer for the human eye: the machine is just an
intermediary.
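To make the contrast concrete, here is a minimal sketch, using only Python's
standard `html.parser` module, of what naive scraping of the snippet above
yields: a flat string. Who was born, where, and when would still have to be
inferred downstream by an NLP model.

```python
from html.parser import HTMLParser

# Strip the markup from the plain snippet: all the machine is left with is a
# flat string, with no structured notion of who was born, where, or when.
class TextStripper(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

stripper = TextStripper()
stripper.feed("""
<div>
  Alan Turing was born in Maida Vale, London, on June 23rd, 1912.
</div>
""")
text = " ".join(chunk.strip() for chunk in stripper.chunks).strip()
print(text)
# Alan Turing was born in Maida Vale, London, on June 23rd, 1912.
```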

What if, however, you were writing HTML like this?

```html
<div vocab="https://schema.org/" typeof="Person">
  <span property="name">Alan Turing</span>
  was born in
  <span property="birthPlace">Maida Vale, London</span>
  on
  <span property="birthDate">June 23rd, 1912</span>.
</div>
```

The HTML would still be perfectly valid and rendered in the browser the same
way. However, just adding a schema/vocabulary makes it easy to model an
_ontology_ on top of HTML that can also be understood by a machine. Now a
machine must no longer rely on complex scrapers and NLP models to get the
meaning of a Web page. It just needs to go through the tags, extract the schemas,
read the properties and the predicates, and it can already provide you with a
summary of the page worthy of ChatGPT.

This is exactly what these language models, at the end of the day, do: extract
meaning out of natural language, and generate content in natural language. But,
instead of having grammar and semantics baked into the markup of the Web, they
are scraping whatever they can find on the Web, basically "brute-forcing" the
meaning out of it through expensive, huge statistical models. It's like looking
for somebody's house in a city that you don't know and, instead of getting the
address and pulling out a map, walking all the roads and ringing all the
doorbells until you find the right one. Or like putting thousands of monkeys in
a room filled with typewriters and waiting until one of them ends up
replicating a Shakespeare sonnet. You want to use the right tools for the right
purpose, and stochastic methods just aren't the most efficient solution for all
the problems.

Think of it for a moment. In the annotated HTML snippet above, extracting
meaning from a page and summarizing it doesn't require a big model with
billions of units, trained with billions of documents on machines collectively
worth millions of dollars (with all the connected environmental concerns). A
simple DOM parser written by a junior developer would suffice to extract all
the information you need from any website. Anybody can do it, anybody could
build their own search engine, anybody could make their own network of crawlers
that extracts clean information from web pages. Sure, Web creators will also
have to be in charge of modelling the information in their websites in a
structured format. But, if that was the requirement for the content on their
websites to be indexed and searchable, would it be much different from the craze
they already go through for SEO optimization? Or would it be much different
from an Instagram influencer curating their posts with lists of hashtags so they
can be easily searched?
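As an illustration, a parser along those lines fits in a few lines of Python,
again using only the standard library's `html.parser`. This is a toy sketch
tailored to the RDFa-annotated snippet above, not a real RDFa processor: it
ignores nested items, `typeof` resolution, and the rest of the spec.

```python
from html.parser import HTMLParser

# Toy sketch: extract schema.org properties from RDFa-annotated HTML using
# only the standard library - no billion-parameter model required.
class PropertyExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.properties = {}   # property name -> extracted text
        self._current = None   # property currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "property" in attrs:
            self._current = attrs["property"]
            self.properties.setdefault(self._current, "")

    def handle_data(self, data):
        if self._current:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        self._current = None

parser = PropertyExtractor()
parser.feed("""
<div vocab="https://schema.org/" typeof="Person">
  <span property="name">Alan Turing</span>
  was born in
  <span property="birthPlace">Maida Vale, London</span>
  on
  <span property="birthDate">June 23rd, 1912</span>.
</div>
""")
print(parser.properties)
# {'name': 'Alan Turing', 'birthPlace': 'Maida Vale, London', 'birthDate': 'June 23rd, 1912'}
```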

So, if this idea was so amazing, if it's been pushed by the inventor of the Web
himself, how come we haven't embraced it yet?

Well, exactly because it would empower _anybody_ to build algorithms that could
extract meaning from the Web!

Data is the new oil, and large companies don't want a world where anybody can
extract it. Such a world would dilute their value proposition. Technologies
that lower the barriers to access structured information, like the semantic
Web, are existential threats to their business model. In a world where anybody
can build a crawler that extracts clean information from the Web, products like
Google search would either have a fierce competition (and monopolies hate
competition), or no reason to exist at all.

So their strategy can be summarized as follows:

1. Keep entry barriers as high as possible, so that the only competitors who can
   access the market are those who can invest millions/billions on language
   models - and there aren't many of them around.

2. Use your position of advantage to scrape the whole Web and feed it to a huge
   model, while discouraging other people from scraping.

3. Teach your model how to reproduce/mimic whatever it has learned.

4. Your model becomes the last Web you will ever need.