Introduction
What is a neuron? A specialized cell, often nicknamed a brain cell, that gives the human brain both its structure and the machinery behind its intelligent capabilities. Roughly 86 billion of them wire together through trillions of connections into a network so dense that it gives rise to perception, memory, language, reasoning, consciousness itself. Understanding the neuron is the first step toward understanding the mind.
Humans have always been obsessed with this question. How does tissue produce thought? One of the most powerful approaches we have developed is building mathematical models of biology. You take a system you cannot fully observe, strip it down to its essential behavior, express that behavior in mathematics, and test whether the model reproduces what the real system does. If it does, you have captured something true about the mechanism. If it does not, the gap tells you what you missed.
In 1943, neurophysiologist Warren McCulloch and mathematician Walter Pitts applied exactly this method to the neuron in their paper A Logical Calculus of the Ideas Immanent in Nervous Activity. They asked a deceptively simple question: can you describe what a single neuron does using math? They studied the biology. A neuron receives signals through its input branches, sums them in its cell body, and if the total exceeds a threshold, fires a signal down its output line to the next neuron. Receive, sum, decide, output. They realized this behavior maps directly to a mathematical function. Take a set of binary inputs, count the excitatory ones, check if any inhibitory input vetoes firing, and if the excitatory total crosses a threshold, output a 1. If not, output a 0. They threw away the chemistry, the timing, the cell's shape, the ions flowing through membranes. They kept only the logic.
That distillation (binary inputs, excitatory summation, inhibitory veto, threshold activation, binary output) turned out to be one of the most consequential simplifications in the history of computing. Every neural network, every deep learning model, every transformer powering today's LLMs can trace its lineage back to that 1943 paper.
I wanted to put myself in their shoes. Not just learn the model, but understand what they were looking at when they built it. So I asked myself the same question McCulloch and Pitts asked: what does a neuron actually do, and what is the simplest mathematical abstraction that captures it? The rest of this post is that chain of thought. The biology they started with, the mathematics they extracted, and the gap between the two.
The Neuron: Basic Anatomy
If I want to model a neuron mathematically, I first need to understand what it actually looks like and what its parts do. Despite enormous variation in shape and size across the nervous system, every neuron shares the same functional components.
Figure 1: Complete neuron anatomy and signal mechanics. Neurotransmitter vesicles (white dots) cross the input synaptic cleft from upstream terminals (faint yellow lines) to dendritic spines. The spines flash on binding, initiating graded potentials that decay as they passively propagate toward the soma. The soma integrates all inputs (stroke glow). If the summed voltage crosses threshold, the axon hillock fires sharply. The action potential jumps node-to-node via saltatory conduction: each Node of Ranvier flashes white (depolarization) then dims purple (refractory period), preventing backward propagation. At all six boutons, Ca2+ influx triggers vesicle release across the output synaptic cleft (faint green postsynaptic membrane). The animation loops continuously, connecting output vesicle release to input synaptic crossing, simulating signal propagation through a chain of neurons.
Dendrites are the input structures. They branch outward from the cell body like tree roots, forming a dense receiving network. A single neuron can have thousands of dendritic branches, each receiving signals from different upstream neurons. The point where an upstream neuron's axon terminal meets a dendrite is called a synapse, and it is the fundamental unit of neural communication. Already I can see a pattern forming: multiple inputs feeding into a single cell.
The soma (cell body) is the integration center. It contains the nucleus and the metabolic machinery that keeps the cell alive, but its computational role is to combine all incoming signals from the dendrites. The soma does not simply add these signals. It integrates them over time and space, with signals arriving at different moments and different locations on the dendritic tree contributing differently to the total. This is the first hint of summation.
The axon hillock sits at the junction between the soma and the axon. This small region has the highest concentration of voltage-gated sodium channels in the entire neuron, making it the trigger zone. When the integrated signal at the hillock exceeds a threshold (approximately -55mV from a resting potential of -70mV), the neuron fires an action potential. A threshold that determines whether the cell fires or stays silent. That is a decision boundary.
The axon is the output fiber. It carries the action potential away from the soma toward other neurons. Axons can be extremely long, up to a meter in motor neurons running from the spinal cord to the foot. Many axons are wrapped in a fatty insulating layer called the myelin sheath, with periodic gaps called Nodes of Ranvier where the signal is regenerated. This insulation dramatically increases signal speed.
Axon terminals (synaptic boutons) are the branching endpoints of the axon. When an action potential arrives, the terminal releases neurotransmitter molecules into the synaptic gap, carrying the signal to the next neuron's dendrites. One output, broadcast to many downstream neurons.
So the anatomy alone gives me a blueprint: multiple inputs, integration, a threshold decision, and a single output. The question now is how the signal itself behaves.
The Action Potential: All-or-Nothing
Now I need to understand the signal itself. What actually happens when a neuron fires?
At rest, the neuron's interior sits at approximately -70 millivolts relative to the outside, maintained by ion pumps that actively transport sodium (Na+) out and potassium (K+) in. This is the resting potential.
When excitatory signals from dendrites depolarize the axon hillock past the threshold (around -55mV), voltage-gated sodium channels snap open. Na+ rushes in, driving the voltage sharply positive (to about +40mV). This rapid depolarization is the rising phase.
Within a millisecond, sodium channels inactivate and potassium channels open. K+ flows out, repolarizing the cell back past resting potential to about -80mV (the undershoot or hyperpolarization). The ion pumps then restore the resting state.
Figure 2: The action potential. Membrane voltage sits at the resting potential (-70mV) until depolarization crosses the threshold (-55mV), triggering a rapid spike to +40mV. The signal is all-or-nothing: same amplitude every time, regardless of stimulus strength.
Here is the critical insight for building a model: this is a binary event. The neuron either fires a full action potential or it does not fire at all. There is no partial spike, no half-signal. The amplitude of every action potential is the same regardless of stimulus strength. Stimulus intensity is encoded in firing rate (how many spikes per second), not spike amplitude.
This is the property that makes mathematical modeling tractable. If I ignore firing rate and look at a single moment in time, the neuron's output is binary: 1 or 0, fire or do not fire. That maps directly to a mathematical function with a binary output.
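A toy sketch makes this concrete. The millivolt values below come from the text, but the `response` function itself is an illustration I made up, not a biophysical model. Whatever the stimulus size, the output is either a full spike or nothing:

```python
# All-or-nothing firing: the spike amplitude never scales with the stimulus.
# REST, THRESHOLD, PEAK are the millivolt values quoted in the text.
REST, THRESHOLD, PEAK = -70.0, -55.0, 40.0

def response(stimulus_mv):
    """Return the peak membrane voltage for a given depolarizing stimulus."""
    v = REST + stimulus_mv  # depolarization from resting potential
    return PEAK if v >= THRESHOLD else REST

# Below threshold: nothing. At or above it: the same full-height spike.
print([response(s) for s in (5.0, 14.0, 15.0, 30.0)])
# [-70.0, -70.0, 40.0, 40.0]
```

Notice that a 15 mV stimulus and a 30 mV stimulus produce identical spikes: intensity would have to be carried by firing rate, which this snapshot ignores.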
Synaptic Transmission: How Neurons Talk
I have inputs, summation, a threshold, and a binary output. But not all inputs are equal. The next piece of the puzzle is how neurons communicate across the gaps between them, and why some connections matter more than others.
When an action potential reaches an axon terminal, it triggers the release of neurotransmitter molecules into the synaptic cleft (the gap between neurons, roughly 20 nanometers wide). These molecules bind to receptors on the receiving neuron's dendrite, opening ion channels that either depolarize or hyperpolarize the receiving cell.
Excitatory synapses (using neurotransmitters like glutamate) push the receiving neuron toward firing. They depolarize the membrane, bringing it closer to threshold.
Inhibitory synapses (using neurotransmitters like GABA) push the receiving neuron away from firing. They hyperpolarize the membrane, making it harder to reach threshold.
Not all synapses are equal. Some connections are strong (more neurotransmitter released, more receptors available, larger effect on membrane potential) and some are weak. The strength of a synapse determines how much influence it has on the receiving neuron's decision to fire. This variable synaptic strength is a real and important feature of biology. But as I will see shortly, McCulloch and Pitts chose to throw it away in their model, treating all excitatory inputs as equal and giving inhibitory inputs an absolute veto. That simplification is what makes their math tractable.
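A quick sketch of the push and pull (the PSP sizes are invented for illustration; real values vary widely): excitatory postsynaptic potentials add depolarization, inhibitory ones subtract it, and a synapse's strength is simply the size of its contribution.

```python
REST, THRESHOLD = -70.0, -55.0  # mV, from the text

def membrane_after(epsps, ipsps):
    """Sum graded postsynaptic potentials (in mV) on top of the resting potential."""
    return REST + sum(epsps) - sum(ipsps)

# Three excitatory inputs of different strengths reach threshold together...
print(membrane_after([6.0, 5.0, 4.0], []))     # -55.0: at threshold, fires
# ...but one strong inhibitory input holds the cell below threshold.
print(membrane_after([6.0, 5.0, 4.0], [8.0]))  # -63.0: stays silent
```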
Furthermore, synaptic strength is not fixed. Synaptic plasticity, the ability of synapses to strengthen or weaken over time, is the biological basis of learning. Donald Hebb described this principle in 1949: when a presynaptic neuron consistently contributes to causing the postsynaptic neuron to fire, the synapse between them strengthens. This is often paraphrased as "neurons that fire together wire together" (a summary coined decades later by Carla Shatz, not Hebb's own words, but a faithful distillation of his idea). This is worth noting for later: the connection strengths can change. The brain learns by adjusting them.
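Hebb's principle fits in one line. This is a sketch of the simplest textbook form of the rule (it is not part of the 1943 model, and the learning rate is an arbitrary choice): the weight between two units grows only when both are active together.

```python
def hebb_update(w, pre, post, lr=0.1):
    """Strengthen the connection only when pre- and postsynaptic units co-fire."""
    return w + lr * pre * post

w = 0.0
for pre, post in [(1, 1), (1, 1), (1, 0), (0, 1)]:
    w = hebb_update(w, pre, post)
print(w)  # grew only on the two co-active steps: 0.2
```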
Spatial and Temporal Integration
I said the soma sums its inputs, but the reality is more nuanced than simple addition. The soma performs two distinct types of integration, and understanding the difference matters for deciding what to keep in the model and what to throw away.
Spatial summation occurs when signals arrive from multiple synapses at the same time. Each synapse produces a small voltage change (an excitatory or inhibitory postsynaptic potential). These changes propagate through the dendrites to the soma, where they combine. If enough excitatory inputs arrive simultaneously to push the hillock past threshold, the neuron fires. In reality, each synapse contributes a different amount based on its strength. But for the simplest model, I can treat each excitatory input as contributing equally and ask: did enough of them fire at once to cross the threshold? That is the counting operation I am building toward.
Temporal summation occurs when signals arrive from the same synapse in rapid succession. Each individual signal might be too weak to trigger firing, but if they arrive fast enough, the voltage changes accumulate before the previous one decays. This integrates information over time.
For the purposes of a minimal model, spatial summation is the mechanism that matters most. It captures the core computation: take all excitatory inputs arriving at this moment, count how many are active, check if any inhibitory input vetoes, and compare the count against the threshold. Temporal summation adds a time dimension that makes the math significantly harder. If I want the simplest possible abstraction, I can set it aside and treat each computation as a single snapshot in time.
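Temporal summation is easy to sketch even though the model discards it. Here is a leaky-accumulator toy (all constants are invented for illustration, not measured biology): each pulse adds a small EPSP, the accumulated depolarization decays between pulses, and only a rapid pulse train reaches threshold.

```python
def temporal_sum(pulse_times, epsp=4.0, decay=0.8, rest=-70.0, threshold=-55.0):
    """Leaky accumulator: EPSPs of size `epsp` mV, decaying by `decay` per ms."""
    v = rest
    t_prev = 0.0
    for t in pulse_times:
        v = rest + (v - rest) * decay ** (t - t_prev)  # decay since last pulse
        v += epsp                                      # new pulse arrives
        t_prev = t
        if v >= threshold:
            return f"fires at t={t:.1f}"
    return "never fires"

print(temporal_sum([0, 10, 20, 30]))  # pulses too sparse, each decays away
print(temporal_sum(list(range(8))))   # rapid train accumulates past threshold
```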
The McCulloch-Pitts Neuron (1943)
At this point I have all the biological pieces: inputs (excitatory and inhibitory), summation, a threshold, and a binary output. This is exactly where McCulloch and Pitts were in 1943. They looked at the same biology and asked the same question I have been working toward: what is the minimum abstraction that preserves the computational behavior?
Their answer was deliberately reductive.
What They Kept
- Multiple binary inputs: A neuron receives signals from many sources (dendrites from many upstream neurons). Each input is either active (1) or inactive (0).
- Excitatory and inhibitory classes: Inputs are either excitatory (contributing toward firing) or inhibitory (vetoing firing entirely). In the original model, all excitatory inputs contributed equally (+1 each), and any single active inhibitory input could prevent the neuron from firing regardless of excitatory count.
- Threshold firing: If the total excitatory count exceeds a threshold (and no inhibitory input is active), the neuron fires (all-or-nothing).
- Binary output: The neuron either fires (1) or does not fire (0).
What They Threw Away
- Temporal dynamics: Real neurons integrate signals over time. The McCulloch-Pitts model computes instantaneously.
- Analog voltage: Real membrane potentials are continuous values. The model uses binary.
- Spatial structure: Dendritic geometry matters in real neurons. The model treats all inputs as arriving at a single point.
- Refractory period: Real neurons cannot fire again immediately after an action potential. The model has no memory of previous states.
- Neurotransmitter chemistry: The model reduces the complex molecular machinery of synaptic transmission to a binary active/inactive signal.
- Variable synaptic strength: Real synapses have different strengths. The original model treats all excitatory inputs equally (+1 each).
- Firing rate: Real neurons encode information in spike frequency. The model produces a single binary value.
The Mathematical Formulation
In the original 1943 model, the neuron computes:
y = 1 if no inhibitory input is active AND (x1 + x2 + ... + xn) >= threshold
y = 0 otherwise
Where:
- xi are the binary excitatory inputs (0 or 1), each with equal influence
- Any single active inhibitory input forces the output to 0 regardless of excitatory count
- The threshold is a fixed integer: the minimum number of active excitatory inputs needed to fire
This is the strict McCulloch-Pitts formulation. It is deliberately simple: count the active excitatory inputs, check for any inhibitory veto, compare against a threshold. Later work would generalize this by introducing variable real-valued weights and a learning rule, but that is the next chapter in the story.
Computing with Threshold Logic
The model raises an immediate question: is this abstraction powerful enough to do anything useful? McCulloch and Pitts proved something profound: networks of their idealized neurons can compute any logical function. A single McCulloch-Pitts neuron can implement the basic Boolean logic gates.
AND Gate
An AND gate outputs 1 only when both inputs are 1. Two excitatory inputs, no inhibitory inputs, threshold of 2:
2 excitatory inputs, threshold: 2
x1=0, x2=0 -> count 0 < 2 -> output 0
x1=1, x2=0 -> count 1 < 2 -> output 0
x1=0, x2=1 -> count 1 < 2 -> output 0
x1=1, x2=1 -> count 2 >= 2 -> output 1
OR Gate
An OR gate outputs 1 when at least one input is 1. Two excitatory inputs, no inhibitory inputs, threshold of 1:
2 excitatory inputs, threshold: 1
x1=0, x2=0 -> count 0 < 1 -> output 0
x1=1, x2=0 -> count 1 >= 1 -> output 1
x1=0, x2=1 -> count 1 >= 1 -> output 1
x1=1, x2=1 -> count 2 >= 1 -> output 1
NOT Gate
A NOT gate inverts the input. The single input is inhibitory, there are no excitatory inputs, and the threshold is 0 (the neuron fires by default unless inhibited):
0 excitatory inputs, 1 inhibitory input, threshold: 0
x1=0 (inactive) -> no veto, count 0 >= 0 -> output 1
x1=1 (active) -> inhibitory veto -> output 0
Toward Turing-Completeness
Since AND, OR, and NOT gates are sufficient to compute any Boolean function (they form a functionally complete set), this means networks of McCulloch-Pitts neurons can, in principle, compute anything that Boolean circuits can compute.
They went further. The model already operates in discrete time steps (each neuron's output at time t depends on its inputs at time t-1). By adding feedback loops (outputs feeding back as inputs to the same network), they showed that networks of binary threshold units can simulate any finite automaton. This linked neural computation to the formal theory of computation that Turing had developed just seven years earlier. Finite automata are not Turing-complete (they lack unbounded memory), but the result was still remarkable: a model derived from biology could replicate any fixed-state computational process.
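To see how feedback buys state, here is a minimal sketch of my own construction (not an example from the 1943 paper): a single McCulloch-Pitts neuron whose previous output loops back as an excitatory input acts as a set/reset latch, which is a two-state finite automaton.

```python
def neuron(excitatory, inhibitory, threshold):
    """McCulloch-Pitts unit: inhibitory veto, then count-vs-threshold."""
    if any(i == 1 for i in inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0

def run_latch(inputs):
    """Feed (set, reset) pairs through one neuron with a self-loop."""
    state, history = 0, []
    for set_, reset in inputs:
        # The neuron's own previous output is an excitatory input (the
        # feedback loop), so one 'set' pulse is remembered until 'reset'
        # vetoes it on a later time step.
        state = neuron([set_, state], [reset], threshold=1)
        history.append(state)
    return history

print(run_latch([(0, 0), (1, 0), (0, 0), (0, 0), (0, 1), (0, 0)]))
# [0, 1, 1, 1, 0, 0]: the single set pulse is held until reset
```

The network remembers one bit with no change to any threshold: the memory lives entirely in the loop, exactly the mechanism McCulloch and Pitts used to reach finite automata.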
But there is a glaring limitation: the thresholds and connections have to be set by hand. McCulloch and Pitts provided no mechanism for a network to learn the right configuration from data. Their model was a proof of computational capability, not a learning algorithm. Remember the synaptic plasticity I noted earlier, the brain's ability to strengthen and weaken connections? That is the biological mechanism for learning, and it is entirely absent from this model. The question of how to find the right configuration automatically would take another 15 years to answer.
Biology vs. Model: A Side-by-Side View
Now I can step back and see exactly what was kept and what was thrown away. This comparison matters because every subsequent development in neural networks recovered some piece of biology that this first model discarded.
| Biological Property | McCulloch-Pitts | Later Models |
|---|---|---|
| Multiple inputs | Yes | Yes |
| Variable connection strength | No (equal excitatory, absolute inhibitory) | Yes (real-valued learnable weights) |
| Threshold firing | Yes (step function) | Yes (activation functions) |
| Binary output | Yes | No (continuous activations) |
| Temporal dynamics | No | RNNs, LSTMs |
| Analog voltage | No | Continuous-valued neurons |
| Spatial dendritic structure | No | Still mostly ignored |
| Synaptic plasticity | No | Backpropagation |
| Refractory period | No | Spiking neural networks |
| Firing rate coding | No | Rate-coded networks |
| Neurotransmitter diversity | No | Still mostly ignored |
The original model was deliberately minimal. It captured the essence of neural computation, enough to compute any Boolean function, but not enough to learn. The gaps in the right-hand column are a roadmap for the next 80 years of AI research. The next step in the lineage, the perceptron, would add exactly what was missing: a rule for adjusting the weights automatically.
The McCulloch-Pitts Neuron in Code
Everything above reduces to a handful of lines of Python. The function below is the complete McCulloch-Pitts neuron: count the active excitatory inputs, check for any inhibitory veto, compare against the threshold.
def neuron(excitatory, inhibitory, threshold):
"""McCulloch-Pitts neuron (1943).
excitatory: list of binary inputs (0 or 1), each contributing +1
inhibitory: list of binary inputs (0 or 1), any active one vetoes
threshold: minimum excitatory count needed to fire
"""
if any(i == 1 for i in inhibitory):
return 0
return 1 if sum(excitatory) >= threshold else 0
The three logic gates from earlier, each a single neuron with the right threshold:
def AND(a, b): return neuron([a, b], [], threshold=2)
def OR(a, b): return neuron([a, b], [], threshold=1)
def NOT(x): return neuron([], [x], threshold=0)
>>> AND(1, 1), AND(1, 0), AND(0, 0)
(1, 0, 0)
>>> OR(1, 0), OR(0, 1), OR(0, 0)
(1, 1, 0)
>>> NOT(0), NOT(1)
(1, 0)
A single McCulloch-Pitts neuron cannot compute XOR: a threshold unit draws a single linear decision boundary, and XOR's outputs are not linearly separable. But a network of them can. XOR is just AND(OR(a, b), NOT(AND(a, b))):
def XOR(a, b):
return AND(OR(a, b), NOT(AND(a, b)))
>>> XOR(0, 0), XOR(0, 1), XOR(1, 0), XOR(1, 1)
(0, 1, 1, 0)
Four neurons, hand-wired, no learning. Every threshold set manually. That is the McCulloch-Pitts model in its entirety: powerful enough to compute any Boolean function, but every connection and threshold must be designed by the engineer. The question of how to make the network figure out the right configuration on its own is the subject of the next post.
Key Takeaways
I started with a question: what does a neuron actually do, and what is the simplest mathematical abstraction that captures it? The answer, the same answer McCulloch and Pitts arrived at in 1943, is: binary inputs, excitatory summation with inhibitory veto, a threshold, and a binary output. A sophisticated electrochemical computer stripped down to its computational essence. That abstraction was powerful enough to prove that networks of simple threshold units can compute any Boolean function. But the thresholds and connections had to be hand-designed, and the biology I set aside along the way, variable synaptic strength, temporal dynamics, analog voltage, synaptic plasticity, is exactly what later generations of models would recover. The path from here to modern deep learning is the story of adding those pieces back, one by one.