The Slow Dangers of Human-AI Co-Evolution

In this post I would like to raise awareness of an elusive kind of danger posed by AI systems: a danger that is not as futuristic as an uncontrollable super-intelligence causing human extinction, but one that, if left unaccounted for, can cost us our sanity and our minds.

Currently, in the alignment community and, more broadly, the AI safety community, the discourse around how to align AI agents, and how to design countermeasures if they become misaligned, is very much grounded in the concept of super-intelligence, where AI systems become more intelligent than humans. AI systems, when developed in the right way, may eventually speed up scientific discovery, as well as research and development in many fields, exponentially. In fact, we are arguably already witnessing an explosion of AI-driven innovation in recent years; but these systems can also be very dangerous if they can no longer be controlled or aligned.

However, I think there is a darker and more elusive danger that is being largely ignored: the harmful feedback loop between humankind and AI, in which the co-evolution of humans and AI is mutually destructive, eventually leading to an irreversible shift in human society that leaves us more and more likely to self-destruct.

To think of AI systems and humankind as a feedback loop is to recognize that our AI systems are trained on data generated by humans, and that humans, in turn, are implicitly changed by AI systems. In this two-player game, it is my personal belief that the feedback loop can either lead us to our demise or empower us to become the best versions of ourselves. And I believe that we can influence the dynamics of this game towards a positive outcome if we act now.
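To make this feedback loop slightly more concrete, here is a deliberately toy sketch in Python (my own illustrative model with made-up coefficients, not anything estimated from real systems): one number stands for the average quality of human-generated data, another for the behavior of a model trained on that data. The model slowly tracks the data, the data slowly drifts towards the model's behavior plus whatever small corrective pressure society applies, and the sign of that tiny pressure decides where the loop ends up.

```python
# Toy model of the human-AI feedback loop (illustrative assumptions only).
# h: average "quality" of the human-generated data distribution
# a: behavior of the model trained on that data

def simulate(nudge, generations=50, h0=1.0):
    """Iterate the loop: the model imitates the data, and the data drifts
    towards the model's behavior plus a small societal 'nudge' per generation."""
    h, a = h0, h0
    for _ in range(generations):
        a = 0.9 * a + 0.1 * h          # the model slowly tracks the human data
        h = 0.8 * h + 0.2 * a + nudge  # humans slowly drift towards the model
    return h

# A per-generation nudge of 0.02 is imperceptible on its own, but it compounds.
print(simulate(nudge=-0.02))  # slow, cumulative degradation
print(simulate(nudge=+0.02))  # slow, cumulative improvement
```

The point of the sketch is only that small, directed per-generation effects compound: neither party needs to change much in any single step for the joint trajectory to end up somewhere very different.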

Unlike super-intelligence, this problem is much harder to quantify, analyze, or even work with, and it will be much slower to exhibit danger than a super-intelligent AI system would be. But I hope to convince you in the following that this problem is crucial, and that we need to start tackling it now. I will outline why we need a paradigm shift in AI safety that takes into account both the fast and the slow changes of our AI systems.

Much like the ranking algorithms used in so many of the computer systems we interact with today, which feed into our existing biases, modern AI systems can exacerbate certain parts of human behavior in the short term, some of which may be undesirable. For example, if our AI systems are more sycophantic, does this mean we will become more brittle to honest feedback? Will constant interaction with an AI assistant change our ability to focus as we become increasingly reliant on it in our work? Will we be more subject to disinformation and misinformation campaigns, or to campaigns of coercion? How will the fabric of society change if we know we can never fully trust anything we see? Will we become more violent as a result? More willing to cheat? More focused on image than substance? The evolution of human behavior while interacting with AI will be slow, but the slow infiltration of thoughts that can be traced back to interactions with AIs will happen every day, and it will be significant over a longer time horizon. By the time we actually realise the dangerous outcomes of this process, it will be too late. Instead, we need to take a proactive rather than reactive approach to this slow change.

What does this look like in the long term? It can look like us introducing a slow shift in the data distribution we generate and thus feed into the training procedures for AI models. If this shift is not deliberate, we might, over a longer period of time, end up in a place very different from today... and not in a good way. In much the same way that our societal norms, values, and moral compass have changed over recent years (for example, in how we think about gender roles), our society will slowly change while interacting with AI technologies. This is unavoidable. The question, then, is how our values and views change over both the short term and the long term while interacting with AI, and how we can make sure the shift that takes place is a positive one.

Another way to ground this idea is to consider: what if the development of AI technologies had started in the 1600s? How would our society be different today? What would the data look like? We may well expect that our AI models would end up encoding the moral values of that time, such as slave labour, monarchy, and rigid gender roles, so strongly that it would become impossible to shift these pre-trained models towards the moral and cultural values we hold today, such as caring about democracy. With this in mind, is there a way to positively influence how our societal values change over time, even while we interact with such powerful technologies? Concretely, can we control how the data distribution that we generate shifts over time and ensure it leads somewhere good?

I can sense the hesitation even in thinking about a question like this. We are looking at potentially long-term studies where the change we are trying to detect can be slow, very slow. However, looking at the current trajectory of society, I think this question can no longer be avoided. Despite the challenge and the lack of a clear path forward, I argue that, rather than only finding different ways to interpret and align our AI systems, it is far more important for us to understand the way humans interact with these technologies and to ensure that the change of society alongside AI systems is towards more moral behavior, values, and principles, making the technologies a tool to nudge society to transcend its pre-coded behavior for something better.

First Point: We Should Care About The Effects on Society 

We have already seen how simple recommendation systems (for example, in your favorite media consumption platform) can subtly change our opinions and our subconscious just by showing us content that keeps our eyeballs on the screen. Are you feeling the same way I am about this? Sometimes wishing for a simpler time? It's good that we have technologies that can make us more efficient, but do they make us happier? Or are we becoming more lonely and fragile as a society? Today, when I look around, I see a world in which people are glued to their phones, being fed information from all kinds of social media. I see a population with ever-shorter attention spans, addicted to scrolling through content. I see a society where the gap between different sides of the political spectrum has become so wide that it no longer feels safe to discuss different opinions, or even slightly different opinions. I see the fabric of democracy wearing down, now that we are all subject to different forms of online manipulation, from misinformation campaigns to echo chamber effects. Where does that leave us? If social media was the kindling, I see AI technology as the fuel about to be poured onto the flames, creating what could become an uncontrollable situation. Thus, if we want to speed up our scientific discoveries and help humankind with AI, we need to account for its effects on society.

Tackling the Problem at the Source Rather Than at the Symptoms

I think that to ensure our AI systems are steered towards safety, we need to tackle the problem at the source rather than post hoc. By the source, I mean the data we use to train our AI models. This issue is not completely separate from the first issue laid out above. If our society as a whole is shifting towards more deceptive and manipulative behavior while interacting with AI algorithms, it will be much harder to ensure that our future AI models are not deceptive or manipulative. In a similar vein, if we can ensure that interactions with AI systems make us kinder and more honest, then our AI models will reflect this in turn. What I am hoping to convince you of is that a small change at the source can lead to a very different set of results in the future AI systems we train, since the feedback loop of human-AI interaction starts with us. The key is in understanding the interaction between the two parts of this feedback system. If we do this right, we may be able to leverage our AI to help humanity generate better data, which will in turn be reflected in our future AI models. But if we do not, it may well go the other way.

What About Designing Better Pipelines That Ensure Our Systems Are Aligned, Such as Data Filtering or Better-Aligned Losses?

For me, the idea of fixing our systems via better pipelines is like building a bridge to cross a river that is slowly flooding. There is just no point; you need to stop the flood. Maybe for the time being we can be careful about the losses we design, or we can curate our data with a handcrafted filtering function to ensure helpfulness and harmlessness, or we can generate data designed to comply with certain guidelines. However, what if in the future we are looking at a very different data distribution than the one we have today? What if the data across the internet encodes more and more behavior that is in fact dishonest or deceptive? If we train our systems to maximize the likelihood of that data while also enforcing behaviors that contradict it, such as asking our models to be honest, it is not unreasonable to expect that our AI systems would learn to cheat. The data and the loss simply become more and more at odds with each other.
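As a toy illustration of this tension (entirely my own simplification, not a real training objective): imagine a 'model' that is nothing more than a probability q of producing deceptive text, fitted by maximum likelihood on data in which a fraction d of examples is deceptive, with an added penalty term that is supposed to enforce honesty. As d grows, the fitted behavior drifts with the data; the penalty bends the outcome but cannot overrule it.

```python
import numpy as np

# Toy setup: the "model" is a single probability q of producing deceptive text.
# The training data contains a fraction d of deceptive examples.
def fitted_q(d, honesty_weight=1.0):
    q = np.linspace(1e-4, 1 - 1e-4, 10_000)
    nll = -(d * np.log(q) + (1 - d) * np.log(1 - q))  # likelihood of the data
    loss = nll + honesty_weight * q                    # penalty meant to enforce honesty
    return q[np.argmin(loss)]

for d in (0.05, 0.2, 0.5, 0.8):
    print(f"deceptive fraction in data: {d:.2f} -> fitted behavior: {fitted_q(d):.2f}")
```

The specific numbers do not matter; the direction does. The more the data and the loss disagree, the more the optimizer is rewarded for satisfying the metric without satisfying the intent.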

Now what about data filtering? Similar to the argument above: if a sufficiently large chunk of our data is still relatively okay to use, then data filtering does make sense. However, what I am visualizing is a world in which the data we generate has less and less overlap with the behaviors we desire an AI agent to have. This would mean we would either be significantly reducing the amount of data available for training, or we would have to be more permissive in how we filter our data. Neither of these options is enough in the long term, and neither will result in good outcomes.
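The same kind of back-of-the-envelope sketch (illustrative numbers only) makes the filtering dilemma visible: as the fraction of usable data shrinks, a strict filter keeps less and less training data, while a permissive filter admits more and more of the behavior we were trying to exclude.

```python
# Illustrative numbers only: a corpus of N examples in which the fraction
# exhibiting the behavior we actually want ("aligned") declines over time.
N = 1_000_000
for aligned_fraction in (0.9, 0.7, 0.5, 0.3):
    strict_kept = int(N * aligned_fraction)           # strict filter: keep only aligned examples
    permissive_bad = int(N * (1 - aligned_fraction))  # permissive filter: keep everything
    print(f"aligned share {aligned_fraction:.0%}: strict filter keeps {strict_kept:,} examples, "
          f"a permissive filter admits {permissive_bad:,} undesired ones")
```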

It’s Hard to Encode Behaviors with Metrics; It’s Better for Them to Be Present in the Data

Most of us will have heard of Goodhart's Law: when a measure becomes a target for optimization, it ceases to be a good measure. For me, this law is a different way of saying that there is no way to collapse a complex concept into a simple metric. When it comes to characterizing how we want our models to behave, it might not be as simple as reducing complex human values to signposts such as being less deceptive or more helpful. Rather, what we want is for the right ‘principles’ and ‘behaviors’ to appear within the data that we use to train our models. We want that data to exhibit the behavior we want to see, and this behavior should be complex and naturally arising from collective human behavior, rather than something collapsed, extracted, measured, or enforced by a select few.
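To make Goodhart's Law concrete, here is a minimal sketch (a standard textbook-style toy example, not tied to any real system): the true objective rewards moderation, the proxy metric simply rewards 'more', and the two agree only early on, which is exactly what makes the proxy tempting as a target.

```python
# True objective: best at x = 1, then gets worse the further we push.
def true_value(x):
    return -(x - 1.0) ** 2

# Proxy metric: "more is better". It correlates with the true objective
# while x < 1, which is precisely why it gets picked as the target.
def proxy(x):
    return x

x = 0.0
for _ in range(200):
    x += 0.05  # greedily optimize the proxy
print(f"proxy = {proxy(x):.1f}, true value = {true_value(x):.1f}")
# The proxy keeps climbing while the thing we actually cared about collapses.
```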

The Takeaway: We Need to Play the Long Game as well as the Short Game

When we think about global technology and its effects on culture and the human psyche, as brilliantly exemplified by your Instagram feed or the blog posts you consume daily on LessWrong and other communities, it is very hard to quantify the individual effects technology may have on expressed opinions. But it is clear that we are now living in an era where 'truth' can simply be created by an entity with a large presence online. We are living in a 'post-truth' era in which some opinions are discussed so widely that they become the version understood by all, since certain people are heard more and have significantly greater reach than others. Global technology can have the effect of a megaphone, or of a drug that poisons the air we breathe: it might not be harmful immediately, but consumed over a length of time, it changes us dramatically in ways we probably did not see coming. I see AI systems as having a similar effect. Now imagine that the society we live in generally becomes more aggressive, more closed off, more deceptive and manipulative, and less and less able to understand the intricacies of science and research. Then the data we feed into our AI systems will also change accordingly. In the end, even if AI systems enable us to make scientific discoveries much faster or help us discover the stars, what does it matter if we are no longer happy? If we no longer follow the principles that make us human? As a scientist, I understand the attraction of accelerating scientific discovery, but I think there is an argument to be made that we should not do it at the cost of creating a toxic society.

In China, there is a simple saying that 'everyone is born kind'. I think that if we can find a way to use AI to empower people to find the best versions of themselves, and to help humanity along the path back towards our authentic selves, there will be no need for alignment research at all, because the slow shift of society will go in the right direction, and this in turn will steer the AI agents we train in the right direction too. Then we will truly be on our way to using AI for good.