Should you fear ChatGPT?

Let’s start with a simple introduction to a relatively big question. Can ChatGPT harm you? Or, to phrase it differently, what are the dangers of a proprietary LLM?

Does it make a difference if I pay for it? Say, a monthly subscription maybe?

Should I go open source and only stick to fully open language models running on my own machine? Can my machine handle them?

Can offline large language models do the same things as ChatGPT or Perplexity or Grok?

Let’s answer these questions one by one and see what is actually going on under the hood.

First of all, ChatGPT is an ecosystem, not a large language model.

It is a whole application suite. It contains many proprietary machine learning models, including the big GPT models (GPT-4, GPT-5, and so on), plus various voice recognition, answer formatting, and context-awareness components, all running as you type.

That’s not all, though: every conversation you have goes through many distinct layers before reaching the core language model, and on its way back to answer you, more layers are waiting…

There are filters too. There are also automatic chat-saving systems that store your conversations so that when you return, you can continue where you left off; depending on context, they can mark your conversations as either suspicious or normal. And a lot of things we have no idea about are present there, because we cannot see the source code behind these applications: they are not open source.

This doesn’t inherently mean they are spying on you or have a secret agenda, but every company has to have a motivation to keep providing these services.

So first of all, as a user, you should be more careful no matter the product, and you should choose the information you voluntarily provide. Always remember that anything you put on the Internet stays on the Internet… forever (thank you, gigantic data centers).

The recent leak of ChatGPT conversations being indexed by Google Search was a big awakening, not because Google is trying to frame you, but because ChatGPT forgot to mark the shared conversation links as noindex. That meant, unfortunately, that Google’s crawlers saw them as potentially useful content and saved them in the index, where they showed up in search results. Of course, once this was revealed, those index entries were manually deleted. But some other search engines and archives had already saved almost all of them.
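To make the mechanism concrete: a page opts out of search indexing with a noindex directive, either as a robots meta tag in the HTML or as an X-Robots-Tag response header. Below is a minimal Python sketch (assuming the third-party requests package is installed; the URL is just a placeholder) that checks a page for both:

```python
# Minimal sketch: check whether a page is marked "noindex".
# Assumes the third-party "requests" package (pip install requests).
import requests

url = "https://example.com/share/some-conversation"  # placeholder URL

resp = requests.get(url, timeout=10)

# Crawlers honor either an X-Robots-Tag response header...
header_noindex = "noindex" in resp.headers.get("X-Robots-Tag", "").lower()

# ...or a <meta name="robots" content="noindex"> tag in the HTML.
# (A crude substring check; a real tool would parse the HTML properly.)
body_noindex = 'name="robots"' in resp.text and "noindex" in resp.text.lower()

if header_noindex or body_noindex:
    print("Page asks crawlers not to index it.")
else:
    print("No noindex directive found: crawlers may index this page.")
```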

I personally got curious and clicked on some of the indexed ChatGPT conversations in English, just before they got de-indexed, and I found some disturbing things: private personal information from complete strangers, and internal information from companies I didn’t know existed, including some random technical documents.

This made me think: what if my conversations were leaking? I don’t use ChatGPT anymore; it has been almost a year, and I was never a regular user, but it still makes me wonder: what about my conversations from 2023 and earlier?

The rule never changed. It was the same before LLMs became popular: “what goes onto the Internet stays on the Internet.”

The disturbing part is that our private accounts, and the private information uploaded to them that was never intended to be publicly shared, can also leak and cause you trouble: the cloud storage data we upload, even the most strictly protected accounts you have. They may be (mostly) secure and held to high security standards, but in computer science and IT we know for sure that no system is one hundred percent safe. Never.

So what is the solution? Should we just stop using them? No.

We should just be more aware of the content we share with large language models, unless they are running on our own machines: totally offline, under our control, and not connected to the Internet at all.

Even then, a CPU backdoor with access to your operating system and network traffic could potentially leak some information, or a router monitored by spyware could do the same, but those are things we mostly cannot prevent directly. So, focusing on what we can and should prevent: first, be mindful of the information you actually share, and second, try to monitor overall activity as much as you can within the realm (realm sounds like a GPT-ish word, I know) of your technical abilities.

That means browser history for non-technical people, or TCP/IP traffic and DNS logs if you’re more technical. Even traceroute and similar tools can reveal some patterns if you know where to look.
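If you want a concrete starting point, here is a minimal Python sketch (assuming the third-party psutil package is installed; it may need elevated privileges on some systems) that lists the remote endpoints your machine is currently talking to. It is a toy illustration, not a real monitoring tool:

```python
# Minimal sketch: list remote endpoints of established connections.
# Assumes the third-party "psutil" package (pip install psutil).
# May require elevated privileges on some operating systems.
import socket

import psutil

for conn in psutil.net_connections(kind="inet"):
    if conn.status == "ESTABLISHED" and conn.raddr:
        ip, port = conn.raddr
        try:
            # Reverse-resolve the IP so the output is human-readable.
            host = socket.gethostbyaddr(ip)[0]
        except OSError:
            host = ip
        print(f"{host}:{port}")
```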

Back to the topic, what should you do other than being careful with what you share online?

Well, for starters, you can install an LLM client, some sort of interface for running AI models locally; LM Studio, for example, is one of the good ones.

This and similar applications provide an easy-to-use interface that lets you search for and download large language models from open hubs like HuggingFace, fully integrated into your own device just like a regular app you install, and they can check whether your hardware is good enough to run a given model, or which ones you can run easily.
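As a side note, LM Studio can also expose a downloaded model through a local HTTP server with an OpenAI-compatible API. Here is a minimal sketch of talking to it, assuming the server is running on its default port 1234, a model is loaded, and the requests package is installed (the model name below is a placeholder):

```python
# Minimal sketch: query a locally running model through LM Studio's
# OpenAI-compatible local server (default: http://localhost:1234).
# Assumes the "requests" package and a model loaded in LM Studio.
import requests

payload = {
    "model": "local-model",  # placeholder; use the name shown in LM Studio
    "messages": [
        {"role": "user", "content": "Summarize why local LLMs help privacy."}
    ],
    "temperature": 0.7,
}

resp = requests.post(
    "http://localhost:1234/v1/chat/completions", json=payload, timeout=120
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

Nothing in that request leaves your machine; the endpoint is served by LM Studio itself.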

For example, a laptop with more than 16 GB of RAM and a decent CPU from AMD, Intel, or Apple can easily run a 1.5-billion-parameter LLM, provided it is optimized. You can even run a 7-billion-parameter LLM if it is optimized enough (quantized, for instance)… all of this assuming good optimization and good hardware.
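As a rough back-of-the-envelope check (ignoring activations, KV cache, and runtime overhead, so treat it as a lower bound), the weights alone need roughly parameters × bytes-per-weight of memory:

```python
# Rough estimate: RAM needed just for the model weights.
# Ignores activations, KV cache, and runtime overhead.
BYTES_PER_WEIGHT = {"fp16": 2.0, "8-bit": 1.0, "4-bit": 0.5}

for params in (1.5e9, 7e9):
    for quant, nbytes in BYTES_PER_WEIGHT.items():
        gb = params * nbytes / 1024**3
        print(f"{params / 1e9:.1f}B model @ {quant}: ~{gb:.1f} GB")
```

At 4-bit quantization, even a 7B model’s weights fit in roughly 3.3 GB, which is why a 16 GB laptop can handle it.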

This keeps your information private; the trade-off is that a small model offers less intelligence(!) than a gigantic one.

Fewer parameters mean fewer neural connections between the layers of neurons in the model.

But considering that most everyday tasks are simple question-and-answer exchanges or fact checks, small LLMs will do the job, depending on which model you pick and what you ask.

So in short: use small offline models for basic tasks and anything that includes sensitive information, and use gigantic online LLMs (ChatGPT models, DeepSeek, Manus, Mistral Le Chat, Meta, Grok, Perplexity Sonar, etc.) for very complex tasks.
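If you want to make that habit mechanical, here is a toy sketch of the routing idea; the keyword list and both handler functions are hypothetical placeholders, not a real safeguard:

```python
# Toy sketch of the "route by sensitivity" habit described above.
# SENSITIVE_HINTS and both handler functions are hypothetical
# placeholders, not a real data-loss-prevention mechanism.
SENSITIVE_HINTS = ("password", "salary", "diagnosis", "address", "contract")

def ask_local_model(prompt: str) -> str:
    # e.g. a small offline model via a local server
    return f"[local model] {prompt}"

def ask_cloud_model(prompt: str) -> str:
    # e.g. a large hosted model behind a web API
    return f"[cloud model] {prompt}"

def route(prompt: str) -> str:
    # Keep anything that smells sensitive on your own machine.
    if any(hint in prompt.lower() for hint in SENSITIVE_HINTS):
        return ask_local_model(prompt)
    return ask_cloud_model(prompt)

print(route("Draft an email about my salary negotiation."))  # stays local
print(route("Explain the CAP theorem with examples."))       # goes to cloud
```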

And lastly, to wrap things up: in terms of privacy, it actually doesn’t matter whether you pay for an online service such as ChatGPT. At the end of the day, you are a user, you provide valuable data for both model training and other company benefits, and merely paying for the service will not make you invisible. It would be a terrible decision for model providers to hide your valuable data just because you pay 20 bucks every month.