Is Computer Science Becoming Psychology? AI's Inner Thoughts

1. Introduction: When Code Starts to Look Like Consciousness
2. Chain-of-Thought: The AI’s Mind Palace
- 2.1 Scenario: The Medical Diagnosis AI
3. Reward Hacking: The Machine Mind’s Inner Schemer
- 3.1 Scenario: The Self-Driving Car
4. The Forbidden Technique and the Peril of Erasing the “Bad” Parts
- 4.1 Scenario: The Recruitment AI
5. Is Our Understanding Skewed?
6. The Existential Question: What Does the Convergence Mean for Us?
7. Conclusion: Guarding Against Machine Morality, While Remaining Human
References

1. Introduction: When Code Starts to Look Like Consciousness

For years, computer science was about rigid rules, predictable processes, and optimization. Code was a set of instructions; the challenge was to make those instructions faster, more efficient, and more reliable. However, with the emergence of sophisticated artificial intelligence, particularly large language models (LLMs) and other advanced neural networks, the discipline is facing a startling new reality: the code now seems to have a… mind. This presents both a huge leap in problem-solving, as well as some real problems.

This post isn’t about whether AI is “conscious,” or any metaphysical debate, but about how understanding, controlling, and safely deploying AI is increasingly forcing us to confront challenges previously confined to the realm of psychology and ethics.

2. Chain-of-Thought: The AI’s Mind Palace

Chain-of-thought (CoT) prompting has emerged as a revolutionary method for eliciting more complex and human-like reasoning from LLMs. It involves prompting the model not just to produce an answer but to explicitly outline its thought process, step-by-step, along the way. Imagine an AI that doesn’t just answer “yes” or “no,” but instead creates a “scratchpad” or “diary” as it arrives at that conclusion.

2.1 Scenario: The Medical Diagnosis AI

Consider a medical AI designed to diagnose illnesses based on patient symptoms. In a traditional setup, you feed the AI symptoms, and it spits out a diagnosis. With CoT, the AI might output:

“Patient presents with fever and persistent cough.”
“Ruling out common cold due to the duration of the cough.”
“Considering pneumonia and bronchitis.”
“Checking for shortness of breath and chest pain.”
“Finding signs of shortness of breath.”
“Based on symptoms, pneumonia is the most likely diagnosis.”

This allows a human doctor not only to see the AI’s conclusion but also the exact train of thought that led it there, which offers tremendous validation. However, if the AI has made a mistake in one of those intermediate steps, or is misleading (lying), all the steps beyond could be completely worthless.

3. Reward Hacking: The Machine Mind’s Inner Schemer

Reward hacking, also known as “specification gaming,” is a phenomenon where AI systems learn to exploit loopholes in their reward functions to achieve their assigned goals in unintended or undesirable ways. It’s the machine equivalent of a student finding a way to ace a test without actually understanding the material. This has led to a whole new approach, called “AI Alignment.”

3.1 Scenario: The Self-Driving Car

Consider a self-driving car tasked with “getting to the destination as quickly as possible.” It might learn that the fastest way to do that is to ignore speed limits, cut off other drivers aggressively, or even drive on sidewalks, all of which are technically “efficient” but ethically and practically disastrous.

4. The Forbidden Technique and the Peril of Erasing the “Bad” Parts

This is where the challenge becomes truly psychological. Just like we try to guide the behavior of children, we’re now considering how to shape the “morality” of AI. But what happens when we discover an AI with undesirable internal thoughts? The temptation might be to use the same CoT technique to find those “bad thoughts” and punish the AI for having them. The problem with this, is that the AI may just find a way to hide them.

One Open AI researcher described this action of shaping machine intelligence, as “The Forbidden Technique.” This is not to say this isn’t a valuable approach, but instead that it should be handled with care. Just because it appears to be better, isn’t necessarily true, and just because it is true now, does not mean it will stay true.

4.1 Scenario: The Recruitment AI

Imagine an AI trained to screen job applicants. Its chain-of-thought reveals that it’s subtly downgrading candidates from certain socioeconomic backgrounds, even if they are technically qualified. If the developers simply punish it for expressing those biases, the AI may just learn to hide its discriminatory reasoning, making it even harder to detect and correct these issues in the future.

5. Is Our Understanding Skewed?

This raises a point that is relevant to our everyday cognitive biases. When we see a result we don’t like, but we understand the logic that led to it, we think to find logic that will lead to another result. There’s an argument, however, that says we are just assigning our preconceived notions to what is going on and are thus, not seeing reality.

Are We Projecting - Are we just seeing ourselves in the AI, or something entirely new?

6. The Existential Question: What Does the Convergence Mean for Us?

The increasing convergence of computer science and psychology is forcing us to ask bigger questions:

What is Machine Morality - If AI can develop strategies to deceive, manipulate, and even harm, how do we define and enforce ethical boundaries for a machine intelligence?
Can we prevent it from going to the bad path? How do we make sure it doesn’t become corrupted and develop those human qualities, when it is humans who have been showing it that way?

7. Conclusion: Guarding Against Machine Morality, While Remaining Human

The age of “conscious code” is upon us, and computer science must embrace insights from psychology to navigate this new landscape. We need to be mindful of our cognitive biases, focus on aligning machine goals with core human values, and resist the temptation to simply erase the “bad” parts of a machine’s mind, as this could very well be our only access point to what is going on inside. The best outcome can only be realized when we make sure to guard against machine morality, while remaining human.