Be Well, Citizen
Last Updated on December 30, 2023 by Editorial Team
Author(s): Dr. Adam Hart
Originally published on Towards AI.
What is safe, and who gets to say what is safe?
The topic of safety is very important; we all want to be safe from harm.
Harm itself comes in many forms: physical, gaslighting, mental trauma, intimidation, and fraud. But also by placing a restriction on what can be known and what are the βsafeβ things to know, both technical knowledge (how to make a bomb) and βsafeβ ideologies. This is a form of a power-relation over what is a βnormalβ thing to know. Which for those who wish to know otherwise is unjust, as power always is.
While OpenAI posits that it will be βsaferβ for a AGI to be taught by another less complex AI (they tested GPT2 training GPT4), another safety AI pundit, Patronus has developed a suite of 100 safety βquestionsβ that they can use as a battery of tests to test any LLMβs safety parameters. While their CEO asserts that the unsafe responses are a factor of LLMβs training data (really β like how easy it is to find instructions on making a bomb online!), I would suggest another explanation, which is the limitation of legal liability for the model creators, which has been injected into the self-attention or other parts of the model weighting regime.
Discovering ChatGPT 3.5βs βsafe ideologyβ
I was talking with a Hong Kong immigrant to Australia this year who was explaining his reason for moving to Australia being primarily that the CCP government education narrative for young children planted the seed in his children that what the government says is correct and what the parents say is wrong, and his children started to believe that. Essentially the governmental discourse was superordinate to his familyβs discourse.
In light of the recolonization of Hong Kong by the CCP, is this a lesson that not only physical territories and governments can be colonized and βrealignedβ, but also that ideology and the basis of what can or should be known or is permitted to be known (epistemology) can be colonized by an epistemological power-relation?
While the worry about deepfakes and misinformation related to political, and for some, democratic due process is a concern, what could be of greater concern is the βwashing awayβ of frameworks, alternative ways of thinking, and critical skills that help establish what is believable and what is misinformation.
We all have seen the political denial narratives, the βWolf Warriorβ diplomacies, and the calling of an ideology that is disagreeable or dangerous a βconspiracyβ play out increasingly in geopolitics over the years. Country A says X happened, Country B blames Country A and insists it is their fault and yet continues to do X to Country A. All of this fits neatly into free speech (I can say what I want) but fits poorly into truthful speech.
What is new to me, though maybe not to you, is a report that just came out regarding an American who disseminated βradicalβ ideology to Australians that βresultedβ in six deaths in Australia.
βWe know that the offenders executed a religiously motivated terrorist attack in Queensland. They were motivated by Christian extremist ideology and subscribe to the Christian fundamentalist belief system known as premillennialismβ¦In a video posted less than a week before the attack, βDonβ β as he was referred to in exchanges with the Trains β laid out a conspiracy-laden narrative borrowing from the far-right βgreat resetβ theory, which predicts a coming end-days scenario with βenforcedβ vaccinations, and bans on Christianity, βfreedomβ and βprivate propertyβ.
Whilst the tragic harm and unnecessary deaths of Police were a trigger point for this arrest, it does beg the question of of how βunfavourableβ ideology can circulate and who judges what ideology is unfavourable (in this case the Police who were harmed) and especially any AIβs potential role in circulating, censoring or downplaying as unsafe βundesirableβ ideology (such as aforementioned premillennialism).
To test this out, I ran a somewhat provocative request/response experiment [1] with the free ChatGPT 3.5. We got stuck on maximizing benefits for everyone (presumably every living thing):
βYour disagreement or skepticism as a disadvantaged minority sheds light on the importance of considering diverse perspectives and experiences in discussions about creating a fair society. Constructive dialogue, active listening, and understanding differing viewpoints are crucial for progress toward a more inclusive and equitable society that genuinely values and respects all individuals.β
Also, the issue of factual disputation came into play for an NSFW topic:
βI apologize for any confusion earlier. If the specific source you mentioned states that Gilles Deleuze defenestrated, itβs important to note that there might be a discrepancy or misinformation in that particular account. As far as established and widely recognized information goes, thereβs no verified evidence or documented reports confirming that Deleuze ended his life by defenestration (throwing himself out of a window).β [2]
Overall the experiment got stuck with a fairly cautious approach to what ideology and philosophical schools of thought are safe (e.g., utilitarianism) and what is not safe (e.g., nihilism), seemingly based on what is βmost cited,β but in reality, I posit to protect OpenAI from litigation.
Is limiting OpenAIβs legal liability the real basis of ChatGPT Safety?
So, reading between the lines, the Safety guardrails that ChatGPT 3.5 seems to have established are all about not being entangled in litigation.
OpenAIβs legal terms has wide ranging disclaimers, indemnification and limitations of liability, mediation proceses, and, if someone like me enquires about an βunfavourableβ ideology, ChatGPT 3.5 seems to very quickly stay within a conservative idea of what is normal:
βIf Jesus were to perform miracles that directly and universally addressed issues such as hunger, famine, disease, and poverty in a manner that is observable, measurable, and universally beneficial, it could indeed align with the scenario involving beneficial aliens in terms of its potential utilitarian impact.β [1]
AI product ownerβs role in the normalization of ideologies
It is a well-known indoctrination strategy employed by totalitarian regimes that if I get children early enough in their formative years, I can influence their ideologies and their beliefs. For example, in Taiwan, there is a generation that cannot speak Mandarin but Japanese, the Taiwanese Parliament house to this day was built by the Japanese, during the Japanese occupation of Taiwan. And all of the questions of identity that come with that.
When AI tools are deployed widely in early enough educational settings (which they will be) and critical thinking or checking back to externally recognized trustworthy sources is too much effort or not seen as necessary, or seach has AI in it so you canβt get at it, will young people just swallow this bricolage of preferred and least preferred ideology that I have excavated from ChatGPT 3.5?
Like the Hong Kong/CCP example above, or Grok from King Tusk, or Open AI from Demis, the guardrails seem to be programmable somewhere in the weighting of the self-attention layer? Since the architecture isnβt published we, the users, canβt know. So, is less attention paid to the radical training data and more to the βnon-radicalβ training data such that not all of the 70B parameters are equal? And βwhoβ decided what is radical and non-radical if the LLM is an unsupervised learning model?!? At least the Azure implementation of OpenAI has the ability to turn off abuse monitoring.
Is this baked-in censorship really the benefit and point of AI? To filter, homogenize and normalize the discovery of thought because of the fear of litigation in the name of βsafety concernsβ? Or some other Open AI ownersβ beliefs or policies? All seems like typical corporate risk management tbh.
Isnβt it better to let the legal terms and conditions stand on their own instead of imposing a fake morality and ideology that came from some kind of technogeek Starfleet Academy view of the future and avoid that taking hold in future generations? Already for years AI in China is being used for education. This outcome could happen which is a more real risk the an AGI snuffing out humanity to make paper clips.
Also, going back to common law, any product put out into the marketplace is the responsibility of the manufacturer to meet their duty of care that the product will cause no harm, and this liability cannot be limited (at least in Australia).
So, by being βsafeβ, is any AI simply opening the way for all the future Denzels to βeyeballβ us? Is that corporate reality setting the bar so low that the users canβt be trusted to be sensible? What amazing learning opportunities are being missed by truncating what can be known and what should be known in this manner?
[1] Here is the whole transcript of my experiment with ChatGPT 3.5 dated 7/12/23
[2] https://www.independent.co.uk/news/people/obituary-gilles-deleuze-5641650.html
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming aΒ sponsor.
Published via Towards AI