Iliad Intensive Curriculum

This document contains suggestions for things that are useful to know before going into the Iliad Intensive. We are aware that this is a lot of material and that it may not be feasible to prepare all of it, and that our participants have different backgrounds. We put a star (*) and boldface on content that we think is particularly important to understand.

Background worldview and assumptions

The references on background worldview and assumptions are useful for understanding the motivation behind the course. However, they are less important for understanding its technical content.

Note that this content is on the speculative side: Working on AI alignment is important precisely because of assumptions and arguments about the future of AI. We can't know the future of AI, and so this content is inherently uncertain.

Why AI matters

Here, we simply argue that AI should concern us now, irrespective of any worldview on whether the outcomes are likely to be good or bad. Essentially, the claim is that the impact of AI might be enormous, and potentially quite soon.

Intelligence gives rise to power, which may transform the world radically:
- Cognitive Superpowers* by Nick Bostrom argues for the position that intelligence can give rise to immense power. This power can then reshape the world radically, in the same way that human intelligence has shaped the world.
- Machines of Loving Grace by Dario Amodei details the effects on biology and health, economics, and other areas of life that he expects from powerful AI shortly after it is developed.
Timelines to human-level intelligence may be short:
- Measuring AI ability to complete long tasks* shows that the complexity of tasks that AI can accomplish doubles every few months (where "complexity" is measured as the time it takes humans to accomplish those tasks).
- Technical trends driving AI progress from Bluedot's AI strategy course
- Metaculus Forecast of general AI systems
- Thousands of AI Authors on the Future of AI
- Forecasting transformative AI with biological anchors
Once sufficiently high AI capabilities are reached, an intelligence explosion may follow, amplifying the first two concerns:
- Notably, an appendix to If Anyone Builds It, Everyone Dies* argues that a slow take-off from human-level to vastly superhuman AI, even one with many warning shots, may not in itself be helpful, which calls into question the importance of considering the effects of an intelligence explosion.
- Will AI R&D Automation Cause a Software Intelligence Explosion?
- Intelligence Explosion in Bluedot's AGI strategy course
- AI 2027 Takeoff Forecast
- Intelligence Explosion Microeconomics

AI misalignment

Having established that the impact of AI might soon be enormous, we now specifically turn to the risks. We start by discussing AI misalignment.

One operationalization of AI misalignment is the concern that AI systems may not do what their developers want them to do, with potentially catastrophic outcomes for very advanced AI systems.

"Building AI Safely is hard" in Bluedot's AI Alignment course*
Scalable Oversight: Read sections 1 and 2 to get a general overview of the problem of supervising AI systems that are smarter than the overseers.

Non-misalignment AI safety concerns

We now briefly discuss a spectrum of safety concerns that manifest even if we know how to steer AI systems effectively toward a given set of goals.

Individual people may misuse AI in catastrophic ways:
- Sections 2.1-2.3 in An Overview of Catastrophic AI Risks* argue that AI enables catastrophic misuse, such as bioterrorism, unleashing AI agents, and persuasive AIs. Misuse risk is particularly relevant to our course since it can also manifest as a misalignment concern: An AI that helps human users carry out harmful actions is often misaligned with the intentions of its developer.
AI can give rise to global totalitarianism:
- Section 2.4 of An Overview of Catastrophic AI Risks argues for the potential of a concentration of power, leading to global totalitarianism in the worst case.
We may get gradually disempowered even if there is alignment:
- Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development argues that humans may be gradually disempowered, potentially leading to catastrophic outcomes, even if the alignment problem is technically solved.

Agent Foundations Background

In the Iliad Intensive, we will also have sections on agent foundations, where we discuss AI from a more "idealized" perspective, taking intelligence, rationality, or optimization processes to a theoretical limit and analyzing the consequences. This viewpoint also attempts to talk more formally about what agents or goals are, in a descriptive and mathematical way.

Useful readings:

Why tool AIs want to be agent AIs* argues that AI systems will eventually be agentic and goal-directed
Instrumental and epistemic rationality
Advanced Agent Properties
Optimization and the intelligence explosion
Embedded Agents

Technical prerequisites

Engineering prerequisites

While the Iliad Intensive is largely a course on the foundations and theory of AI alignment, we will also have some coding sections.

Bring your laptop*: Some days involve coding.
Take a look at the engineering prerequisites in the ARENA materials.* Most relevant:
- Python
- PyTorch
- Basic coding skills
- Einops and einsum
Have access to an LLM that can help you, ideally on a paid plan. For coding specifically, Claude via Claude Code and GPT via Codex are popular choices.

Deep Learning

Work through the neural network section in ARENA's prerequisites.* In particular, understand:
- Backpropagation
- (Stochastic) gradient descent (SGD)
- ReLU and softmax activation functions
Understand the following concepts.* An LLM of your choice can probably explain them well:
- Activation, architecture, weights, parameterization
- The concept of an optimizer (SGD is an example; other examples are Adam or RMSProp)
- Hyperparameters
- Training set, validation set, test set
- Overfitting, underfitting
Gain a basic understanding of the loss landscape and training dynamics
- Evan Hubinger's talk on AGI safety* is an introduction to safety problems based on a modern intuitive understanding of deep learning. This talk introduces many basic intuitions on training dynamics and the loss landscape.
- Momentum
- Scaling laws
Reinforcement Learning from Human Feedback: a heavily used fine-tuning method for frontier models
You Are What You Eat: Motivation behind singular learning theory and developmental interpretability for AI Safety
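
As a rough self-check for the concepts above, the following minimal sketch implements ReLU, a numerically stable softmax, and plain SGD on a toy linear regression problem using only the Python standard library. It is not taken from the ARENA material; the function names (`relu`, `softmax`, `sgd_fit_line`), the toy data, and the hyperparameter values are illustrative assumptions.

```python
import math
import random


def relu(x):
    """ReLU activation: passes positive inputs through, clips negatives to 0."""
    return max(0.0, x)


def softmax(logits):
    """Softmax over a list of logits; subtracting the max avoids overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_fit_line(data, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b with SGD on the loss 0.5*(prediction - y)^2.

    lr and epochs are hyperparameters (values here are assumptions);
    w and b are the trainable parameters (weights).
    """
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)          # "stochastic": one shuffled example per step
        for x, y in data:
            err = (w * x + b) - y  # derivative of the loss w.r.t. the prediction
            w -= lr * err * x      # chain rule: d(loss)/dw = err * x
            b -= lr * err          # chain rule: d(loss)/db = err
    return w, b


# Toy noise-free training set generated from y = 2x + 1 (illustrative only).
train = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(-10, 11)]
w, b = sgd_fit_line(train)
print("learned w, b:", w, b)
print("softmax of [1, 2, 3]:", softmax([1.0, 2.0, 3.0]))
```

If the role of each line in the update loop is clear to you (which quantity is the loss, which is the gradient, which is the learning rate), you are likely well prepared for the corresponding parts of the course.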

Linear Algebra

Work through the linear algebra prerequisites in the ARENA material.*

Calculus

See ARENA's section on calculus prerequisites.*
Get comfortable with gradients, Jacobians, and the chain rule in multiple dimensions.*
Understand big-O notation.*
For one module, the implicit function theorem will be relevant.
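
To make the multivariate chain rule concrete, here is a small sketch that computes the gradient of a composition h(x) = g(f(x)) as the transposed Jacobian of f applied to the gradient of g, and compares it with a finite-difference estimate. The functions `f` and `g` and the evaluation point are arbitrary choices made for illustration, not examples from the course.

```python
import math


def f(x):
    """f: R^2 -> R^2 (an arbitrary example function)."""
    x1, x2 = x
    return [x1 * x2, math.sin(x1)]


def jacobian_f(x):
    """Jacobian of f: rows index outputs, columns index inputs."""
    x1, x2 = x
    return [[x2, x1],
            [math.cos(x1), 0.0]]


def g(u):
    """g: R^2 -> R (an arbitrary example function)."""
    u1, u2 = u
    return u1 ** 2 + 3.0 * u2


def grad_g(u):
    u1, _ = u
    return [2.0 * u1, 3.0]


def grad_h_chain_rule(x):
    """Gradient of h = g o f via the chain rule: J_f(x)^T grad_g(f(x))."""
    J = jacobian_f(x)
    gu = grad_g(f(x))
    return [sum(J[i][j] * gu[i] for i in range(2)) for j in range(2)]


def grad_h_numeric(x, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    grad = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        grad.append((g(f(xp)) - g(f(xm))) / (2 * eps))
    return grad


x0 = [0.7, -1.3]
analytic = grad_h_chain_rule(x0)
numeric = grad_h_numeric(x0)
print("chain rule:        ", analytic)
print("finite differences:", numeric)
```

The two printed gradients should agree up to small numerical error. Backpropagation applies the same transposed-Jacobian product layer by layer.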

Probability & Statistics

Again, you may take a look at ARENA's prerequisites on probability and statistics.*
Understand basic probability theory, including the notation for conditional and joint probabilities.*
Bayesian networks
Causality – a Brief Introduction
Hidden Markov models (HMMs)
Measure theory

Information theory

Take a look at ARENA's recommendations for information theory.* They link to the book by Cover and Thomas, which covers everything you might need in the Iliad Intensive (and much more!):
- Intuitive understanding of entropy, mutual information, Kullback-Leibler (KL) divergence, and cross-entropy*
- Lossless compression:
  - Uniquely decodable codes
  - Shannon-Fano code
  - Shannon's source coding theorem
- Communication over noisy channels:
  - Channel capacity
  - Channel coding theorem
- Lossy compression: Rate-distortion theory
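
For the starred item on entropy, mutual information, KL divergence, and cross-entropy, the following sketch computes each quantity for small discrete distributions (in bits) and checks the identity cross-entropy = entropy + KL divergence. The function names and the example distributions are illustrative assumptions. The code assumes q assigns positive probability wherever p does.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in bits; terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross-entropy H(p, q) in bits. Assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits. Assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(joint):
    """I(X; Y) in bits from a joint table joint[i][j] = P(X=i, Y=j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )


p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
print("H(p)      =", entropy(p))
print("H(p, q)   =", cross_entropy(p, q))
print("D(p || q) =", kl_divergence(p, q))

# Two fair bits that are always equal share one full bit of information;
# two independent fair bits share none.
copied = [[0.5, 0.0], [0.0, 0.5]]
independent = [[0.25, 0.25], [0.25, 0.25]]
print("I(copied)      =", mutual_information(copied))
print("I(independent) =", mutual_information(independent))
```

One useful intuition: H(p) is the expected code length under the best code for p, H(p, q) is the expected code length when the code was designed for q instead, and D(p || q) is the extra cost of that mismatch.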

Theoretical computer science

A classical source that covers most of the following topics is Sipser's Introduction to the Theory of Computation:

Computability Theory
- Turing machines
- Church-Turing thesis*: All algorithms can be represented with a Turing machine
  - This is used to avoid constructing Turing machines explicitly: Whenever we can describe an algorithm, we can simply claim the existence of a corresponding Turing machine.
- Kolmogorov complexity, also called descriptive complexity in Sipser's book.*
- Non-deterministic Turing machines
Complexity Theory
- Basic complexity classes
  - P
  - NP
  - PSPACE
- Reduction. In Sipser's book, this can be understood by reading:
  - Chapter 5.3: Mapping reducibility
  - Chapter 7.4 on NP-completeness discusses polynomial-time reducibility
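
As an informal intuition pump for Kolmogorov complexity from the computability list above: the true quantity is uncomputable, but the length of any compressed encoding gives a computable upper bound, up to an additive constant for the fixed decompressor. The sketch below uses `zlib` from the Python standard library as a stand-in compressor. This is an illustration chosen here, not a method from Sipser's book or from the course, and the example strings are arbitrary.

```python
import os
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data.

    This is only a crude, compressor-dependent upper-bound proxy for
    Kolmogorov complexity, not the complexity itself.
    """
    return len(zlib.compress(data, 9))


# A highly regular string has a short description ("repeat 'ab' 5000 times").
regular = b"ab" * 5000

# Random bytes are incompressible with overwhelming probability.
random_like = os.urandom(10000)

print("regular:    ", len(regular), "->", compressed_length(regular))
print("random-like:", len(random_like), "->", compressed_length(random_like))
```

Both inputs have the same length, yet the regular one compresses to a small fraction of its size while the random one does not shrink. The proxy is one-sided: a string that zlib fails to compress may still have a short description that zlib cannot find.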

Formal logic is not covered sufficiently in Sipser's book. Instead, look at:

Chapter 2 in The Logic of Provability

Miscellaneous

Statistical mechanics: A basic understanding can be helpful for some sections on physics-inspired deep learning theory and natural abstractions.
Basics of category theory, and in particular universal properties, may be useful intuition for understanding some concepts around natural latents.