Understanding the IPSec Protocol Suite

Alcatel, March 2000

Protocol Negotiation and Key Management

The password and its numerical analogue, the PIN, are among the simplest ways available of guaranteeing security. Easy to implement and generally suitable to a range of security needs, they're everywhere now. The concept is simple: a password or PIN is something you know that no one else does, that therefore identifies you. But they're a pain to remember. Probably everyone has at least one embarrassing incident to recount - the day they drew a blank at the bank machine and couldn't remember their PIN. Or the time building security almost called the police.

The same basic difficulty faces anyone trying to implement security measures based on keyed encryption techniques. Keys are like passwords, and you've got to keep track of them. And the more keys you have, the harder it is to keep track. Factor in that keys often change regularly, and you've got the makings of a real nightmare on your hands.

How Many Keys? An Illustration

Imagine 20 stock traders have interests that are sometimes compatible and sometimes competing. They want to talk to each other privately and safely, through a public network. If they use a symmetric encryption scheme to do this, they will need a total of 190 keys among them: one key for every pair of traders, or 20 x 19 / 2. Each of them has to keep track of 19 keys - one to communicate with each of the other group members. Add to this that they want to keep changing those keys, at least monthly, and that they have to do it in person (which is safest). So each trader has to meet with each of the 19 other traders monthly, for a total of 190 two-person meetings each month. Obviously, this is not practical.

The first thing you can do to improve this situation is to use asymmetric encryption instead of symmetric. With asymmetric encryption, each trader issues a public key that all the other traders use. Each trader also has his own private key. So each trader actually has to remember 20 keys (the 19 public keys of the other traders, plus his own private key). But exchanging them is far simpler. They could all meet at one big meeting, and each publicly announce his public key. We're now down from 190 meetings to one meeting a month. Still complicated, but getting easier.

This artificial example illustrates two important points:

Key exchange is fundamentally a complicated business.
Key exchange gets more complicated as the group of communicating players expands.
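
The arithmetic behind the trader example can be checked with a few lines of Python. This is only an illustrative sketch; the function names are invented here and are not part of IPSec.

```python
def symmetric_keys_total(n: int) -> int:
    """Pairwise secret keys needed so that every pair of n parties shares one."""
    return n * (n - 1) // 2


def keys_held_symmetric(n: int) -> int:
    """Keys each party must track under the symmetric scheme."""
    return n - 1


def keys_held_asymmetric(n: int) -> int:
    """Keys each party must track under the asymmetric scheme:
    the other parties' public keys plus the party's own private key."""
    return (n - 1) + 1


for n in (20, 100, 1000):
    print(n, symmetric_keys_total(n), keys_held_symmetric(n), keys_held_asymmetric(n))
```

For 20 traders this gives 190 pairwise keys in total, which is also the number of monthly two-person meetings. The total grows roughly with the square of the group size, which is why the problem gets worse so quickly as the group expands.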

So just because a system says it does encryption, that alone doesn't mean it's going to be appropriate for your needs, even if the system does that encryption well. Any proposed secure VPN solution is only as good as its method of key exchange, especially if it serves large enterprise needs. IPSec does support large enterprise needs, and its industrial strength key exchange and protocol negotiation scheme sets it apart from all other security systems.

Key Management and Exchange

To communicate with someone using authentication and encryption services (like those provided by IPSec's ESP and AH), you need to do three things:

Negotiate with other people the protocols, encryption algorithms, and keys to use.
Exchange keys easily (this might include changing them often).
Keep track of all these agreements.

IPSec provides mechanisms to do all three.

The Security Association

The first problem IPSec's designers solved was actually number three, how to keep track of all the details, and which keys and algorithms to use. They did it by bundling everything together in something called the Security Association (SA). The SA groups together all the things you need to know about how you communicate securely with someone else. The SA, under IPSec, specifies:

the mode of the authentication algorithm used in the AH and the keys to that authentication algorithm
the ESP encryption algorithm mode and the keys to that encryption algorithm
the presence and size of (or absence of) any cryptographic synchronization to be used in that encryption algorithm
how you authenticate your communications (using what protocol, what encrypting algorithm and what key)
how you make your communications private (again, what algorithm and what key)
how often those keys are to be changed
the authentication algorithm, mode and transform for use in ESP plus the keys to be used by that algorithm
the key lifetimes
the lifetime of the SA itself
the SA source address
a sensitivity level descriptor
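
One way to picture the SA is as a single record that bundles these settings. The sketch below is an assumption made for illustration: the field names, algorithm labels, and example values are invented here and do not come from the IPSec specifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SecurityAssociation:
    """Hypothetical record bundling the items listed above."""
    source_address: str
    ah_auth_algorithm: Optional[str]   # None if AH is not used
    ah_auth_key: Optional[bytes]
    esp_cipher: Optional[str]          # None if ESP encryption is not used
    esp_cipher_key: Optional[bytes]
    esp_auth_algorithm: Optional[str]
    esp_auth_key: Optional[bytes]
    crypto_sync_bytes: int             # 0 means no cryptographic synchronization
    key_lifetime_seconds: int
    sa_lifetime_seconds: int
    sensitivity_level: Optional[str] = None


# Two SAs with different levels of caution (all values are made up).
routine = SecurityAssociation(
    source_address="192.0.2.10",
    ah_auth_algorithm=None, ah_auth_key=None,
    esp_cipher="DES-CBC", esp_cipher_key=bytes(8),
    esp_auth_algorithm="HMAC-MD5", esp_auth_key=bytes(16),
    crypto_sync_bytes=8, key_lifetime_seconds=3600, sa_lifetime_seconds=28800,
)
cautious = SecurityAssociation(
    source_address="192.0.2.20",
    ah_auth_algorithm=None, ah_auth_key=None,
    esp_cipher="3DES-CBC", esp_cipher_key=bytes(24),
    esp_auth_algorithm="HMAC-SHA1", esp_auth_key=bytes(20),
    crypto_sync_bytes=8, key_lifetime_seconds=600, sa_lifetime_seconds=3600,
    sensitivity_level="confidential",
)
```

The all-zero keys are placeholders only. A real SA would hold keys produced by the key exchange.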

You can think of the SA as your secure channel through the public network to a certain person, group of people, or network resource. It's like a contract with whoever is at the other end. The SA also has the advantage that it lets you construct classes of security channels. If you need to be a little more careful talking to one party than another, the rules of your SA with that party can reflect extra caution - specifying stronger encryption, for example.

The SA applied to business - an example

SAs are good for building multiple secure VPNs. Imagine your company has its own secure VPN. You develop a business relationship with another company that also has a secure VPN. You're perfectly happy to give them some access to your network by linking the two secure VPNs, as this facilitates business, but you don't want them to have full access. Some things do not belong outside the company, or even outside Human Resources' filing cabinets. So you have specific SAs between your secure VPN and theirs, controlling who has what access to which resources. And you have another set of specific SAs within your secure VPN. And another within Human Resources. And so on. You can layer your secure VPN like an onion using the SA concept. That's what the SA is for.

How the SA works with the SPI

The SA is a concept, whereas the SPI is more concrete. The SPI is a number that uniquely identifies an SA. The SPI, together with the SA concept, makes keeping track of keys and protocols easy and automatic. Specifically, the SPI is the 32-bit number we mentioned earlier when describing the ESP and AH packet formats (see How IPSec Embeds Encryption in the ESP and AH - Authentication Without Confidentiality in this paper). The SPI is an arbitrary 32-bit number your system picks to represent that SA whenever someone negotiates an SA with you. The SPI cannot be encrypted in the packet because you use it to keep track of how to decrypt the packet.

It works like this. When you negotiate an SA, the recipient node assigns an SPI it isn't already using, and preferably, one it hasn't used in a while. It then communicates this SPI to the node with which it negotiated the SA. From then until that SA expires, whenever that node wishes to communicate with yours using that SA, it uses that SPI to specify it.

Your node, on receipt, would look at the SPI to determine which SA it needs to use. Then it authenticates and/or decrypts the packet according to the rules of that SA, using the agreed-upon keys and algorithms to verify (depending on the terms of the SA) that the data really did come from the node it claims, that the data has not been modified, and that no one between those nodes has read the data.

The next thing we need is to work out how to negotiate those SAs in the first place.
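
The SPI bookkeeping described above can be sketched as a small lookup table. The class and method names below are hypothetical, and the SA itself is reduced to a plain dictionary. The sketch also assumes that an ESP packet starts with the 32-bit SPI in network byte order, and it skips values 0 through 255, which the IPSec specifications set aside as reserved.

```python
import secrets


class SATable:
    """Hypothetical receiver-side table mapping SPIs to SAs."""

    def __init__(self):
        self._active = {}          # SPI -> SA currently in use
        self._recently_used = set()  # SPIs of expired SAs, avoided for a while

    def allocate(self, sa: dict) -> int:
        """Pick a random 32-bit SPI that is neither active nor recently used."""
        while True:
            spi = secrets.randbits(32)
            if spi > 255 and spi not in self._active and spi not in self._recently_used:
                self._active[spi] = sa
                return spi

    def expire(self, spi: int) -> None:
        """Retire an SA and remember its SPI so it is not reused immediately."""
        del self._active[spi]
        self._recently_used.add(spi)

    def lookup(self, esp_packet: bytes) -> dict:
        """Read the cleartext SPI from the front of an ESP packet and find its SA."""
        spi = int.from_bytes(esp_packet[:4], "big")
        return self._active[spi]   # KeyError means no such SA: drop the packet


table = SATable()
sa = {"esp_cipher": "3DES-CBC", "peer": "192.0.2.20"}
spi = table.allocate(sa)
packet = spi.to_bytes(4, "big") + b"\x00\x00\x00\x01" + b"encrypted payload"
print(table.lookup(packet)["esp_cipher"])
```

Because the lookup has to happen before any decryption, the SPI travels in the clear, as noted above.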

IKE - industrial strength key exchange

IKE, the IPSec group's answer to protocol negotiation and key exchange through the Internet, is actually a hybrid protocol. It integrates the Internet Security Association and Key Management Protocol (ISAKMP) with a subset of the Oakley key exchange scheme. IKE provides a way to:

agree on which protocols, algorithms, and keys to use (negotiation services)
ensure from the beginning of the exchange that you're talking to whom you think you're talking to (primary authentication services)
manage those keys after they've been agreed upon (key management)
exchange material for generating those keys safely

Key exchange is a closely related service to SA management. When you need to create an SA, you need to exchange keys. So IKE wraps them both up together, and delivers them as an integrated package.

Manual key exchange

There is one other way to exchange keys. IPSec specifies that compliant systems support manual keying as well. That means if you wish to use manual (face-to-face) key exchange for certain situations, you still can. But IPSec's designers also assume that in most situations, for most large enterprises, this would be impractical. So IKE, the safe way to negotiate SAs and exchange keys through public networks, will probably do most of the work for most of the world.

IKE phases

IKE functions in two phases. In phase one, two IKE peers establish a secure channel for doing IKE (called the IKE SA). In phase two, those two peers negotiate general purpose SAs. An IKE peer is an IPSec-compliant node capable of establishing IKE channels and negotiating SAs. It might be the computer on your desktop, or something called a security gateway that negotiates security services for you.

IKE modes

Oakley provides three modes of exchanging keying information and setting up SAs - two for IKE phase one exchanges, and one for phase two exchanges.

Main mode accomplishes a phase one IKE exchange by establishing a secure channel.
Aggressive mode is another way of accomplishing a phase one exchange - it's a little simpler and a little faster than main mode, but does not provide identity protection for the negotiating nodes, as they must transmit their identities before having negotiated a secure channel through which to do so.
Quick mode accomplishes a phase two exchange by negotiating an SA for general purpose communications.

Establishing a secure channel for negotiation

To establish an IKE SA, the initiating node proposes six things:

encryption algorithms (to protect data)
hash algorithms (to reduce data for signing)
an authentication method (for signing data)
information about a group over which to do a Diffie-Hellman exchange (see below)
a pseudo-random function (PRF) used to hash certain values during the key exchange for verification purposes (this is optional, you can also just use the hash algorithm)
the type of protection to use (ESP or AH)
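
As a rough illustration, the six proposed items can be held in one record, and a responder can pick the first offer it is willing to accept. Everything in this sketch is an assumption made for illustration: the field names, the algorithm labels, and the first-match selection rule are not taken from the IKE specification.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class IkeProposal:
    """Hypothetical record of the six items listed above."""
    encryption: str
    hash: str
    auth_method: str
    dh_group: int
    protection: str              # "ESP" or "AH", as listed above
    prf: Optional[str] = None    # None means: just use the hash algorithm


def choose(offered: Iterable[IkeProposal],
           acceptable: Set[IkeProposal]) -> Optional[IkeProposal]:
    """Return the first offered proposal the responder accepts, or None."""
    for proposal in offered:
        if proposal in acceptable:
            return proposal
    return None


strong = IkeProposal("3DES-CBC", "SHA1", "RSA-signatures", 2, "ESP")
weak = IkeProposal("DES-CBC", "MD5", "pre-shared-key", 1, "ESP")

# The initiator lists its offers in order of preference.
selected = choose([strong, weak], acceptable={weak})
print(selected.encryption if selected else "no common proposal")
```

If no offered proposal is acceptable, the negotiation fails and no IKE SA is created.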

Perfect forward secrecy

One of the most annoying things about passing encrypted data around a public network is the number of opportunities an attacker has to get hold of encrypted material. You can reduce the risk of their ever deciphering it by using larger and larger keys. But the larger the key, the slower and more complex the encryption - and this can impair network performance. A good compromise solution is to use reasonably large keys, and to keep changing them. But this presents difficulties too. You need ways to generate those new keys so that the person at the other end can agree on them as well. But, to generate your new key, you can't use either the key you're changing from, or material used to generate the key you're changing from. The point is that if you do, and then someone gets hold of the current key, that person can easily deduce your new key. What you need is a method of generating a new key that is in no way dependent on the value of the current key. So if someone gets hold of your current key, it only gives them a small part of the overall picture, and they would have to break yet another entirely unrelated key to get the next part. Cryptographers call this concept "perfect forward secrecy". IKE uses a scheme called Diffie-Hellman to do this.

Diffie-Hellman

A Diffie-Hellman exchange works like this: two people independently and randomly generate values much like a public/private key pair. Each sends its public value to the other (using authentication to close out the man-in-the-middle). Each then combines the public value just received with the private value just generated, using the Diffie-Hellman combination algorithm. The resulting value is the same on both sides, and therefore can be used for fast symmetric encryption by both parties. But no one else in the world can come up with the same value from the two public values passed through the net, since the final value also depends on the private values, which remain secret. You can use the derived Diffie-Hellman key either as a session key for subsequent exchanges, or to encrypt yet another randomly generated key, which you can then pass through the net quite safely.

Note that yes, you do need authentication to protect even Diffie-Hellman exchanges against the man-in-the-middle. Diffie-Hellman alone does not solve this problem. It would be complicated, but without authentication a man-in-the-middle could use an active attack to get in on the action and plant his own keys. But if the key exchange mechanism you're using is protected by an authentication scheme, Diffie-Hellman allows you to generate new shared keys to use for symmetric encryption which are independent of older keys - providing perfect forward secrecy. And since symmetric encryption techniques are a lot faster, this can be quite useful in network communications.

The two parties need to agree on a few things before they can do a Diffie-Hellman exchange in the first place. That's what the Diffie-Hellman parameter in the IKE SA is for. The parameter contains information on a group over which to perform the Diffie-Hellman exchange. The group consists of generation material used for coming up with keys.
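
The exchange can be sketched with modular exponentiation. The prime and generator below are toy values chosen only so the example runs; they are not one of the groups IKE defines and must not be used for real traffic. The function names are also invented for this sketch.

```python
import secrets

# Toy group parameters (assumption for illustration only).
P = 2**127 - 1   # a prime modulus
G = 3            # the base both sides agree on


def dh_keypair():
    """Generate a random private value and the matching public value."""
    private = secrets.randbelow(P - 3) + 2   # somewhere in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def dh_shared(their_public: int, my_private: int) -> int:
    """Combine the peer's public value with our own private value."""
    return pow(their_public, my_private, P)


# Each side generates its own pair and sends only the public value.
a_private, a_public = dh_keypair()
b_private, b_public = dh_keypair()

a_secret = dh_shared(b_public, a_private)
b_secret = dh_shared(a_public, b_private)
print(a_secret == b_secret)
```

Both sides arrive at the same value without it ever crossing the network. Repeating the exchange with fresh random private values yields a new shared value that does not depend on the previous one, which is the property perfect forward secrecy relies on. The sketch does nothing about authentication, so on its own it is still open to the man-in-the-middle attack described above.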

The pseudo-random function

The pseudo-random function is also worth explaining briefly. In IKE, a PRF is essentially a keyed hash function: it takes a key and some data, and produces a value that looks random to anyone who does not hold the key. You can use the PRF both for authentication purposes and to generate additional key material (as a randomizer).
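
A common way to build such a keyed hash is HMAC. The sketch below assumes HMAC-SHA1 as the PRF, which is one possible choice rather than the only one, and the key and message values are made up.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """Keyed hash used as a pseudo-random function (HMAC-SHA1 assumed here)."""
    return hmac.new(key, data, hashlib.sha1).digest()


message = b"payloads to be authenticated"

# Authentication: only a holder of the key can produce or check this tag.
tag = prf(b"shared authentication key", message)
print(hmac.compare_digest(tag, prf(b"shared authentication key", message)))

# Key generation: the same input under a different key gives unrelated output.
print(prf(b"key one", message) != prf(b"key two", message))
```

A plain, unkeyed hash of the message could be recomputed by anyone, which is why the key matters for both uses.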

Main mode

Main mode provides a mechanism for establishing the first phase IKE SA, which is used to negotiate future communications. Remember, the object here is to agree on enough things (authentication and confidentiality algorithms, hashes and keys) to be able to communicate securely long enough to set up an SA for future communication. The steps in full will be:

Use main mode to bootstrap an IKE SA.
Use quick mode within that IKE SA to negotiate a general SA.
Use that SA to communicate from now until it expires.

The first step, securing an IKE SA using main mode, occurs in three two-way exchanges between the SA initiator and the recipient (see Figure 7). In the first exchange (1 and 2 in the illustration below), the two agree on basic algorithms and hashes. In the second (3 and 4 in the illustration below), they exchange public keys for a Diffie-Hellman exchange, and pass each other nonces - random numbers the other party must sign and return to prove its identity. In the third (5 and 6 in the illustration below), they verify those identities.

Figure 7: IKE main mode

Cert: a certification payload

ID: an identification payload

Key: a key exchange payload

Nonce: a nonce payload

SA: a Security Association/proposal payload

Sig: a signature payload

Figure 7 shows how IKE establishes its own IKE SA, step by step, using main mode with digital signatures for authentication. Each of the pieces is carried in its own payload, but you can pack any number of these payloads into a single IKE packet. The parties actually use the generated shared Diffie-Hellman value in three permutations, once they derive it. Both parties have to hash it three times - generating first a derivation key (to be used later for generating additional keys in quick mode), then an authentication key (for authentication), and then, finally, the encryption key to be used for the IKE SA.
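
The three-step hashing can be sketched as a chain of PRF calls. The sketch follows the chaining pattern of the IKE specification (RFC 2409) for signature authentication, with HMAC-SHA1 assumed as the PRF. The cookies are identifiers taken from the ISAKMP header, which this paper does not otherwise describe, and all input values below are made up.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA1 assumed as the negotiated PRF."""
    return hmac.new(key, data, hashlib.sha1).digest()


def derive_ike_keys(dh_shared: bytes, nonce_i: bytes, nonce_r: bytes,
                    cookie_i: bytes, cookie_r: bytes):
    """Derive the derivation, authentication, and encryption keys in turn."""
    # Seed value: the nonces key a hash of the shared Diffie-Hellman value.
    seed = prf(nonce_i + nonce_r, dh_shared)
    tail = dh_shared + cookie_i + cookie_r
    derivation_key = prf(seed, tail + b"\x00")
    authentication_key = prf(seed, derivation_key + tail + b"\x01")
    encryption_key = prf(seed, authentication_key + tail + b"\x02")
    return derivation_key, authentication_key, encryption_key


# Made-up inputs; in practice these come from the main mode exchange.
keys = derive_ike_keys(
    dh_shared=b"\x11" * 16,
    nonce_i=b"initiator nonce",
    nonce_r=b"responder nonce",
    cookie_i=b"\xaa" * 8,
    cookie_r=b"\xbb" * 8,
)
for name, key in zip(("derivation", "authentication", "encryption"), keys):
    print(name, key.hex())
```

Each key feeds into the next, so both parties must run the steps in the same order to end up with matching keys.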

Aggressive mode

Aggressive mode provides the same services as main mode. It establishes the original IKE SA. It looks much the same as main mode except that it is accomplished in two exchanges, rather than three, with only one round trip, and a total of three packets rather than six.

In aggressive mode, the proposing party generates a Diffie-Hellman pair at the beginning of the exchange, and does as much as is practical with that first packet - proposing an SA, passing the Diffie-Hellman public value, sending a nonce for the other party to sign, and sending an ID payload that the responder can use to check its identity with a third party. The responder then sends back everything needed to complete the exchange - really an amalgamation of all three response steps in main mode, and all that's left for the initiator to do is to confirm the exchange (see Figure 8).

The end result is that an aggressive mode exchange attains the same goal as a main mode exchange, except that aggressive mode does not provide identity protection for the communicating parties. That is to say, in aggressive mode, the parties exchange identification information prior to establishing a secure SA in which to encrypt it. So someone monitoring an aggressive exchange can actually identify who has just formed a new SA. The advantage of aggressive mode, however, is speed.

Figure 8: IKE aggressive mode

Cert: a certification payload

ID: an identification payload

Key: a key exchange payload

Nonce: a nonce payload

SA: a Security Association/proposal payload

Sig: a signature payload
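
The two phase one exchanges can be compared side by side by listing which payloads travel in each packet. The layout below is a sketch for signature authentication, reconstructed from the descriptions above and from the IKE specification (RFC 2409). The Cert payload is optional in practice, and the tuple format and function name are invented for this illustration.

```python
# Each entry: (sender, sent encrypted?, payloads carried)
MAIN_MODE = [
    ("initiator", False, ["SA"]),
    ("responder", False, ["SA"]),
    ("initiator", False, ["Key", "Nonce"]),
    ("responder", False, ["Key", "Nonce"]),
    ("initiator", True, ["ID", "Cert", "Sig"]),
    ("responder", True, ["ID", "Cert", "Sig"]),
]

AGGRESSIVE_MODE = [
    ("initiator", False, ["SA", "Key", "Nonce", "ID"]),
    ("responder", False, ["SA", "Key", "Nonce", "ID", "Cert", "Sig"]),
    ("initiator", False, ["Cert", "Sig"]),
]


def identity_protected(flow) -> bool:
    """True if every packet that carries an ID payload is sent encrypted."""
    return all(encrypted for _, encrypted, payloads in flow if "ID" in payloads)


for name, flow in (("main", MAIN_MODE), ("aggressive", AGGRESSIVE_MODE)):
    print(name, len(flow), "packets, identity protected:", identity_protected(flow))
```

The comparison shows the trade-off directly: aggressive mode halves the packet count, but the ID payloads go out before any encryption key exists.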

Quick mode

Once two communicating parties have established an IKE SA using aggressive mode or main mode, they can use quick mode. Quick mode has two purposes - negotiating general IPSec security services and generating fresh keying material. Quick mode is less complex than either main or aggressive mode. Since it's already inside a secure tunnel (every packet is encrypted), it can also afford to be a little more flexible. Quick mode packets are always encrypted, and always start with a hash payload. The hash payload is composed using the agreed-upon PRF and the derived authentication key for the IKE SA. The hash payload is used to authenticate the rest of the packet. Quick mode defines which parts of the packet are included in the hash (see Figure 9).

Figure 9: IKE quick mode

ID: an identification payload

Key: a key exchange payload

Nonce: a nonce payload

SA: a Security Association/proposal payload

Key refreshing can be done in one of two ways. If you don't want or need perfect forward secrecy, quick mode can just refresh the keying material already generated (in main or aggressive mode) with additional hashing. The two communicating parties can exchange nonces through the secure channel, and use these to hash the existing keys. If you do want perfect forward secrecy, you can still request an additional Diffie-Hellman exchange through the existing SA and change the keys that way.

Basic quick mode is a three packet exchange, like aggressive mode. If the parties do not require perfect forward secrecy, the initiator sends a packet with the quick mode hash, with proposals and a nonce. The respondent then replies with a similar packet, but generates its own nonce and includes the initiator's nonce in the quick mode hash for confirmation. The initiator then sends back a confirming quick mode hash of both nonces, completing the exchange. Finally, both parties perform a hash of a concatenation of the nonces, the SPI, and the protocol values from the ISAKMP header that initiated the exchange, using the derivation key as the key for the hash. The resulting hash becomes the new key for that SA.

If the parties do require perfect forward secrecy, the initiator first generates a public/private key pair, and sends the public key along with the initiation packet (along with the hash and nonce). The recipient then responds with its own public key and nonce, and both parties then generate the shared key using a Diffie-Hellman exchange - again fully protected by the quick mode hashes, and by full encryption within the IKE SA.
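
The final hashing step can be sketched as a single PRF call keyed with the derivation key. The sketch follows the pattern in the IKE specification (RFC 2409), assumes HMAC-SHA1 as the PRF, and simplifies the byte encodings. The function name and all input values are made up, and protocol number 3 is used here as an example value standing for ESP.

```python
import hashlib
import hmac


def quick_mode_key(derivation_key: bytes, protocol: int, spi: int,
                   nonce_i: bytes, nonce_r: bytes,
                   dh_shared: bytes = b"") -> bytes:
    """Hash the protocol, SPI, and both nonces under the derivation key.

    When perfect forward secrecy is requested, the fresh Diffie-Hellman
    shared value from quick mode is mixed in ahead of the other inputs.
    """
    data = dh_shared + bytes([protocol]) + spi.to_bytes(4, "big") + nonce_i + nonce_r
    return hmac.new(derivation_key, data, hashlib.sha1).digest()


derivation_key = b"\x42" * 20   # produced earlier, in phase one
without_pfs = quick_mode_key(derivation_key, 3, 0x1234ABCD, b"nonce-i", b"nonce-r")
with_pfs = quick_mode_key(derivation_key, 3, 0x1234ABCD, b"nonce-i", b"nonce-r",
                          dh_shared=b"\x77" * 16)
print(without_pfs.hex())
print(with_pfs.hex())
```

Because the SPI is part of the hashed input, each SA negotiated in quick mode gets its own key even when the nonces and derivation key are the same. Without the extra Diffie-Hellman value, every key produced this way still depends on the phase one derivation key, which is why that variant does not give perfect forward secrecy.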