Iliad

Prerequisites


A curated reading list of background worldview material and technical prerequisites (math, CS, deep learning) recommended before the Iliad Intensive.

This document contains suggestions for things that are useful to know before going into the Iliad Intensive. We are aware that this is a lot of material and that it may not be feasible to prepare all of it, and that our participants have different backgrounds. We put a star (*) and boldface on content that we think is particularly important to understand.

Background worldview and assumptions

The references on background worldview and assumptions are helpful for understanding the motivation behind the course. They are less important for understanding its technical content, however.

Note that this content is on the speculative side: Working on AI alignment is important precisely because of assumptions and arguments about the future of AI. We can't know the future of AI, and so this content is inherently uncertain.

Why AI matters

Here, we simply argue that AI should concern us now, irrespective of any worldview on whether the outcomes are likely to be good or bad. Essentially, the claim is that the impact of AI might be enormous, and potentially quite soon.

AI misalignment

Having established that the impact of AI might soon be enormous, we now specifically turn to the risks. We start by discussing AI misalignment.

One operationalization of AI misalignment is the concern that AI systems may not do what their developers want them to do, with potentially catastrophic outcomes for very advanced AI systems.

Non-misalignment AI safety concerns

We now briefly discuss a spectrum of safety concerns that manifest even if we know how to steer AI systems effectively toward a given set of goals.

  • Individual people may misuse AI in catastrophic ways:
    • Sections 2.1-2.3 in An Overview of Catastrophic AI Risks* argue that AI enables catastrophic misuse such as bioterrorism, unleashing AI agents, and persuasive AIs. Misuse risk is particularly relevant to our course since it can also manifest as a misalignment concern: An AI that assists human users in carrying out such harms is often misaligned with the AI's developer.
  • AI can give rise to global totalitarianism
    • Section 2.4 of the same paper argues that AI could enable a concentration of power, leading to global totalitarianism in the worst case.
  • We may get gradually disempowered even if there is alignment

Agent Foundations Background

In the Iliad Intensive, we will also have sections on agent foundations, where we discuss AI from a more "idealized" perspective, taking intelligence, rationality, or optimization processes to a theoretical limit in order to analyze the consequences. This viewpoint also attempts to talk more formally about what agents or goals are, in a descriptive and mathematical way.

Useful readings:

Technical prerequisites

Engineering prerequisites

While the Iliad Intensive is largely a course on the foundations and theory of AI alignment, we will also have some coding sections.

  • Bring your laptop*: Some days involve coding.
  • Take a look at the engineering prerequisites in the ARENA materials.* Most relevant:
    • Python
    • PyTorch
    • Basic coding skills
    • Einops and einsum
  • Have access to an LLM that can help you, ideally on a paid plan. For coding specifically, Claude via Claude Code and GPT via Codex are popular choices.

Deep Learning

  • Work through the neural network section in ARENA's prerequisites.* In particular, understand:
    • Backpropagation
    • (Stochastic) gradient descent (SGD)
    • ReLU and softmax activation functions
  • Understand the following concepts.* An LLM of your choice can probably explain them well:
    • Activation, architecture, weights, parameterization
    • The concept of an optimizer (SGD is an example; other examples are Adam or RMSProp)
    • Hyperparameters
    • Training set, validation set, test set
    • Overfitting, underfitting
  • Gain a basic understanding of the loss landscape and training dynamics
  • Reinforcement Learning from Human Feedback: a heavily used fine-tuning method for frontier models
  • You Are What You Eat: Motivation behind singular learning theory and developmental interpretability for AI Safety
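
To make the starred concepts above (backpropagation, SGD, ReLU) concrete, here is a minimal sketch in plain Python. It is not taken from the ARENA materials: the network shape (one scalar input, two ReLU hidden units, one output), the parameter values, the squared-error loss, and the names `loss_and_grads` and `sgd_step` are all assumptions chosen for illustration. In practice you would let PyTorch compute the gradients.

```python
def relu(z):
    """ReLU activation: pass positive values through, clamp the rest to zero."""
    return z if z > 0.0 else 0.0


def loss_and_grads(params, x, y):
    """Forward pass, then backpropagation, for a tiny one-hidden-layer network."""
    w1, b1, w2, b2 = params["w1"], params["b1"], params["w2"], params["b2"]

    # Forward pass: pre-activations z, hidden activations h, prediction y_hat.
    z = [w * x + b for w, b in zip(w1, b1)]
    h = [relu(zi) for zi in z]
    y_hat = sum(w * hi for w, hi in zip(w2, h)) + b2
    loss = (y_hat - y) ** 2

    # Backward pass: apply the chain rule from the loss back to each parameter.
    d_y_hat = 2.0 * (y_hat - y)
    d_w2 = [d_y_hat * hi for hi in h]
    d_b2 = d_y_hat
    d_h = [d_y_hat * w for w in w2]
    # The derivative of ReLU is 1 where z > 0 and 0 elsewhere.
    d_z = [dh if zi > 0.0 else 0.0 for dh, zi in zip(d_h, z)]
    d_w1 = [dz * x for dz in d_z]
    d_b1 = list(d_z)

    grads = {"w1": d_w1, "b1": d_b1, "w2": d_w2, "b2": d_b2}
    return loss, grads


def sgd_step(params, grads, lr):
    """One gradient descent update: move each parameter against its gradient."""
    new_params = {}
    for name, value in params.items():
        g = grads[name]
        if isinstance(value, list):
            new_params[name] = [v - lr * gi for v, gi in zip(value, g)]
        else:
            new_params[name] = value - lr * g
    return new_params


# Illustrative (made-up) parameters and a single training example.
params = {"w1": [0.5, -0.3], "b1": [0.1, 0.2], "w2": [1.0, -2.0], "b2": 0.0}
x, y = 1.0, 2.0

loss_before, grads = loss_and_grads(params, x, y)
params_after = sgd_step(params, grads, lr=0.1)
loss_after, _ = loss_and_grads(params_after, x, y)
print(loss_before, loss_after)
```

With these particular values, the second hidden unit has a negative pre-activation, so ReLU blocks its gradient and only the first unit's weights change. One update lowers the loss from about 1.96 to about 0.13. "Stochastic" gradient descent repeats this step on randomly sampled examples or mini-batches rather than on a single fixed example.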

Linear Algebra

Work through the linear algebra prerequisites in the ARENA material.*

Calculus

Probability & Statistics

Information theory

  • Take a look at ARENA's recommendations for information theory.* They link to the book by Cover and Thomas, which covers everything you might need in the Iliad Intensive (and much more!):
    • Intuitive understanding of entropy, mutual information, Kullback-Leibler (KL) divergence, and cross-entropy*
    • Lossless compression:
      • Uniquely decodable codes
      • Shannon-Fano code
      • Shannon's source coding theorem
    • Communication over noisy channels
      • Channel capacity
      • Channel coding theorem
    • Lossy compression: Rate-distortion theory
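
As a quick self-check for the starred quantities above, the following sketch computes entropy, cross-entropy, KL divergence, and mutual information for small discrete distributions, in bits. The function names and the example distributions are assumptions chosen for illustration, and the code assumes that `q` is nonzero wherever `p` is.

```python
import math


def entropy(p):
    """H(p) = -sum_i p_i log2 p_i, with the convention 0 * log 0 = 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log2 q_i. Assumes q_i > 0 wherever p_i > 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i log2(p_i / q_i). Assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(joint):
    """I(X; Y) from a joint probability table joint[i][j] = P(X=i, Y=j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )


p = [0.5, 0.25, 0.25]      # "true" distribution
q = [1 / 3, 1 / 3, 1 / 3]  # model distribution

print(entropy(p))           # 1.5 bits
print(cross_entropy(p, q))  # log2(3), about 1.585 bits
print(kl_divergence(p, q))  # the gap between the two, about 0.085 bits

independent = [[0.25, 0.25], [0.25, 0.25]]  # two independent fair bits
copied = [[0.5, 0.0], [0.0, 0.5]]           # Y is an exact copy of X
print(mutual_information(independent), mutual_information(copied))
```

The first three numbers illustrate the identity H(p, q) = H(p) + KL(p || q): encoding samples from p with a code designed for q costs KL(p || q) extra bits per symbol on average. The entropy of 1.5 bits is achieved exactly by a prefix code with codeword lengths 1, 2, and 2, which connects to the source coding theorem listed above.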

Theoretical computer science

A classical source that covers most of the following topics is Sipser's Introduction to the Theory of Computation:

  • Computability Theory
    • Turing machines
    • Church-Turing thesis*: All algorithms can be represented with a Turing machine
      • This is used to avoid constructing Turing machines explicitly: Whenever we can describe an algorithm, we can simply claim the existence of a corresponding Turing machine.
    • Kolmogorov complexity, also called descriptive complexity in Sipser's book.*
    • Non-deterministic Turing machines
  • Complexity Theory
    • Basic complexity classes
      • P
      • NP
      • PSPACE
    • Reduction. In Sipser's book, this can be understood by reading:
      • Chapter 5.3: Mapping reducibility
      • Chapter 7.4 on NP-completeness discusses polynomial-time reducibility
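
For some intuition on Kolmogorov complexity from the list above, one common illustration (not taken from Sipser's book) uses an off-the-shelf compressor: the compressed length of a string, plus the constant size of the decompressor, is an upper bound on its Kolmogorov complexity. The sketch below uses Python's `zlib`; the helper names and the two example strings are assumptions chosen for illustration.

```python
import hashlib
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (highest compression level)."""
    return len(zlib.compress(data, 9))


def pseudo_random_bytes(n: int) -> bytes:
    """Deterministic, random-looking bytes: SHA-256 hashes of 0, 1, 2, ..."""
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


repetitive = b"ab" * 5000               # 10,000 bytes with an obvious pattern
scrambled = pseudo_random_bytes(10000)  # 10,000 bytes that look patternless

print(len(repetitive), compressed_length(repetitive))
print(len(scrambled), compressed_length(scrambled))
```

The repetitive string shrinks to a tiny fraction of its length, while the hash output does not shrink at all. Note that the second result is only an upper bound, and a loose one: `scrambled` is generated by the few lines of `pseudo_random_bytes`, so its true Kolmogorov complexity is small, but zlib cannot discover that. No general-purpose program can, since Kolmogorov complexity is uncomputable.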

Formal logic is not covered sufficiently in Sipser's book. Instead, look at:

Miscellaneous

  • Statistical mechanics: For some sections on physics-inspired deep learning theory and natural abstractions, a basic understanding of statistical mechanics can be helpful.
  • Basics of category theory, and in particular universal properties, may be useful intuition for understanding some concepts around natural latents.