
Chapter 1

How It All Started

Four unrelated events that created the AI revolution.

The story of AI is not a straight line. It's four separate threads — running in parallel for years, sometimes decades — that nobody thought were connected.

A gaming graphics card. A young professor labeling millions of photos. A Netflix competition. And a handful of researchers the academic establishment had mostly written off.

Each one, on its own, looked like a dead end or a curiosity. Together, they detonated.

1.1 The Long Road — AI Never Really Stopped

There's a popular narrative that AI "died" twice and was magically resurrected. It's a great story. It's also mostly wrong.

The idea of a machine that thinks is older than most people realize. In 1943, McCulloch and Pitts published the first mathematical model of an artificial neuron. In 1957, Rosenblatt built the perceptron — and the New York Times said it would one day "walk, talk, see, write, reproduce itself, and be conscious of its existence."

The hype cycle is not a new invention.

Then Minsky and Papert proved single-layer perceptrons couldn't solve the XOR problem (1969). Funding dried up — the first "AI winter." The second came in the late 1980s when expert systems turned out to be brittle and useless the moment they encountered anything unexpected.
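Minsky and Papert's point can be checked directly: no single linear threshold unit reproduces XOR, while AND is easy. Here is a minimal brute-force sketch — the weight grid and the threshold form are illustrative choices of mine, not anything from the original proof:

```python
import itertools

def linearly_separable(truth_table):
    # Search a small grid of weights and biases for a single threshold
    # unit: predict 1 exactly when w1*x1 + w2*x2 + b > 0.
    grid = [v / 2 for v in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 + b > 0) == bool(y)
               for (x1, x2), y in truth_table.items()):
            return True
    return False

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

print(linearly_separable(AND))  # True: e.g. w1=w2=1, b=-1.5 works
print(linearly_separable(XOR))  # False: no line separates the classes
```

The failure on XOR is not a defect of the small grid — no weights exist at all, which is exactly why multi-layer networks (and backpropagation to train them) mattered.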

But research never stopped. Rumelhart, Hinton, and Williams published backpropagation in 1986. Yann LeCun built the first practical convolutional neural networks in 1989. Support vector machines and random forests became workhorses in the 2000s. By the early 2010s, a practical ML ecosystem was thriving — Kaggle had turned ML into a global sport, and tools like Scikit-learn made algorithms accessible in a few lines of Python.

The people who call 2012 a "sudden breakthrough" weren't paying attention to the 20 years of work that made it possible.

1.2 The GPU Trick

Around the year 2000, a Stanford PhD student named Ian Buck had an insight that would eventually transform how we use computers. He wasn't thinking about AI. He was thinking about graphics cards.

Here's the problem he noticed. CPUs do things one at a time — versatile and precise, but sequential. A GPU is a factory floor — thousands of simple workers all doing the same task at once. Buck realized that matrix multiplication — the core operation behind neural networks — is embarrassingly parallel: every element can be computed independently. What took days on a CPU could take hours on a GPU.
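The independence Buck exploited is visible in the definition of matrix multiplication itself. A plain-Python sketch (the real speedup comes from a GPU kernel launching one thread per output element; this sequential loop only mimics the structure):

```python
def matmul(A, B):
    """Naive matrix multiply: C[i][j] = sum over p of A[i][p] * B[p][j].
    Each output element reads only row i of A and column j of B and
    shares no writes with any other element -- so all of them can be
    computed in parallel, one GPU thread per element."""
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k))
             for j in range(m)]
            for i in range(n)]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```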

This was an order-of-magnitude improvement, and it came from hardware being mass-produced for teenagers playing video games. NVIDIA was subsidizing the future of artificial intelligence with Call of Duty revenue, and nobody at NVIDIA knew it yet.

The catch: programming a GPU for general math in 2000 was brutal — you had to disguise your calculations as fake graphics operations. That changed in 2006 when NVIDIA released CUDA, which let programmers write normal code that ran on GPUs.

The hardware foundation was in place. It was waiting for data.

1.3 The Data Revolution

In 2007, Fei-Fei Li was an assistant professor at Princeton with a conviction that most of her colleagues thought was misguided.

Her argument was simple. The bottleneck in computer vision wasn't better algorithms. It was better data. Models were data-hungry, and nobody was feeding them.

The datasets researchers were using at the time had a few thousand images. Li wanted to build something with millions. Across thousands of categories. Labeled by hand.

She called it ImageNet.

The scale she was proposing was considered borderline absurd. Colleagues told her it was a waste of time. One senior researcher said the project would take decades to complete using graduate students.

Li had a different idea. She used Amazon Mechanical Turk — the platform that lets you pay humans small amounts to do simple tasks — to label images at scale. Her team developed consensus mechanisms, redundant labeling, and quality filters that achieved 97%+ accuracy across the dataset.
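The core of redundant labeling is simple to sketch. The function below is a generic majority-vote consensus, not Li's actual pipeline (her team tuned the number of votes per image by category); the 0.75 agreement threshold is an illustrative choice:

```python
from collections import Counter

def consensus_label(votes, min_agreement=0.75):
    # Several workers label the same image. Keep the majority label
    # only when enough of them agree; otherwise return None, meaning
    # the image goes back into the queue for more votes.
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None

print(consensus_label(["cat", "cat", "cat", "dog"]))  # cat (3/4 agree)
print(consensus_label(["cat", "dog", "bird"]))        # None -> relabel
```

Redundancy plus a quality threshold is what let cheap, noisy individual labels aggregate into a dataset accurate enough to train on.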

By the time it was done, ImageNet contained over 14 million images across more than 21,000 categories. And in 2010, Li launched the ImageNet Large Scale Visual Recognition Challenge. Every year, research teams would compete to build the best image classifier.

The first two years were incremental. Error rates dropped slowly from 28% to 26%.

The fuel was ready. It just needed a spark.

1.4 The Netflix Prize

In October 2006, Netflix offered one million dollars to anyone who could beat their recommendation algorithm by 10%. Over 40,000 teams registered. The competition ran for nearly three years.

Geoffrey Hinton's team at the University of Toronto entered with Restricted Boltzmann Machines, a type of neural network. They didn't win, but their approach was competitive with methods that had been refined for decades. That was the signal.

The winning team eventually crossed the threshold in 2009 — but their solution was so complex that Netflix never actually deployed it.

The real lesson was subtler. Neural networks — the technology the establishment had dismissed twice — could compete with decades of specialized feature engineering. Given enough data and enough compute, learning beat knowing. The pattern that would define the next fifteen years was already visible, if you knew where to look.

Most people weren't looking.

1.5 The Inflection Point — AlexNet (2012)

By 2012, the four threads were in place. GPUs were fast and programmable. ImageNet had assembled the largest labeled image dataset ever created. Neural network research had quietly continued in a handful of labs despite two decades of skepticism. And the Netflix Prize had shown that deep learning could compete.

Nobody had put them all together. Not yet.

Alex Krizhevsky was a graduate student at the University of Toronto, working under Geoffrey Hinton. His colleague Ilya Sutskever — who would later co-found OpenAI — was a PhD student in the same lab. Together, they built a deep convolutional neural network and entered the 2012 ImageNet challenge.

The results were not an improvement. They were a discontinuity.

Previous winners achieved error rates around 26%. AlexNet achieved 15.3%. The second-place team, using traditional computer vision techniques, scored 26.2%.

The gap was so large that people initially thought there was an error. There wasn't.

Within a few years, every major tech company had a deep learning team. GPT and its descendants were still a decade away, but the direction was set. The question was no longer whether deep learning would work. It was how far it would go.

Key Takeaways

  • AI never actually stopped — the "winters" were funding gaps, not dead ends.
  • The GPU made deep learning practical, and it happened by accident (gaming).
  • ImageNet proved that scale of data matters as much as algorithm quality.
  • AlexNet 2012 wasn't a breakthrough — it was 20 years of work becoming visible.

Enjoyed this chapter?

There are 21 more. Each one builds on the last.

PDF + EPUB · DRM-free · 30-day refund