UnboundID LDAP SDK for Java

LDAPv3 Wire Protocol Reference

The ASN.1 Basic Encoding Rules

LDAP is a binary protocol, which helps make it compact and efficient to parse. The particular binary encoding that it uses is based on ASN.1 (Abstract Syntax Notation One), which is a framework for representing structured data. ASN.1 actually defines a family of encoding rules, each with its own pros and cons for different situations. For example, you might use the Packed Encoding Rules (PER) if you want to make sure that the encoded representation is as small as possible, or you might use the Octet Encoding Rules (OER) if you favor encode/decode performance over size. LDAP uses the Basic Encoding Rules (BER), which find a good middle ground between the two.

The complete BER specification has a lot of flexibility and ambiguity, and there are several special cases to consider. Covering all of it in depth would make for a somewhat daunting task, both for me trying to explain everything, and for someone trying to take it all in. If you’re interested in all the gory details, there are already some good books that tackle that much better than I can. I highly recommend Professor John Larmouth’s excellent book ASN.1 Complete, which you can get online as a free PDF download, or you can buy an honest-to-goodness paper copy if you’d prefer a physical copy. And you can always look at the official ASN.1 specifications for the authoritative source, although they can be dense and they’re not always all that easy to interpret.

Fortunately, LDAP uses a pretty well-defined subset of BER that has less ambiguity and fewer special cases. We should be able to cover all the BER that you need to understand the LDAP wire protocol without too much difficulty.

In ASN.1 BER, each piece of data is called an element, and each BER element has three parts: a type, a length, and a value. Let’s take a closer look at each of these components.

BER Types

A BER element’s type is used to indicate what kind of information that element can hold, not unlike declaring the data type (string, integer, boolean, etc.) for a variable in a computer program. There are a lot of different kinds of BER types, but if we’re just talking about LDAP’s use of BER, then there are really only seven basic data types that we need to know about:

Null elements don’t have a value.
Boolean elements have a value that is either true or false.
Integer elements have a value that is a whole number, with no fractional component.
Octet string elements have a value that is a collection of zero or more bytes. An octet string’s value may represent a text string, but it could also just be a binary blob.
Enumerated elements have a predefined set of values in which each value has a particular meaning.
Sequence elements encapsulate a collection of zero or more other elements in which the order of those elements is considered significant.
Set elements encapsulate a collection of zero or more other elements in which the order of those elements is not considered significant.

Using these seven types, we can construct any kind of LDAP request or response.

Because BER is a compact binary encoding, it uses a compact binary representation for an element’s type. Although general-purpose BER supports types that span multiple bytes, it is highly unlikely that you’ll ever encounter a BER element in an LDAP message that uses more than one byte for its type. That byte is laid out as follows:

Bits	8	7	6	5	4	3	2	1
Purpose	Class		Primitive or Constructed?	Tag Number

The BER Type Class

The two most significant bits in this byte (i.e., the two leftmost bits in the big-endian representation of that byte) represent the class for the type. You can also think of this as the scope for the type, which lets you know how likely it is for the same BER type to have the same meaning in two different settings. Since the class is encoded as two bits, there are four possible values:

00 — This is the universal class. BER types in the universal class always mean the same thing, regardless of where you see it. For example, if you see a BER element with a type of 00000010 binary (0x02 hex, which means universal class, primitive, tag number two), then the value of that element will always be an integer.
01 — This is the application class. BER types in the application class always mean the same thing within one application but might mean something completely different in another application. And here “application” doesn’t necessarily mean a computer program; in the case of LDAP, it means the complete protocol specification. For example, if you see a BER element in an LDAP message with a type of 01100011 binary (0x63 hex, which means application class, constructed, tag number three), then the value of that element will always be an LDAP search request protocol op.
10 — This is the context-specific class. BER types in the context-specific class can have different meanings from one element to another, and you need to have an understanding of how it’s being used to be able to determine what it means. For example, if you see a BER element in an LDAP message with a type of 10100011 binary (0xa3 hex, which means context-specific class, constructed, tag number three), then it could represent a set of referral URLs if it appears in an LDAPResult sequence, or it could represent an equality filter component in a search request, or it could mean something completely different somewhere else in some other context.
11 — This is the private class. It’s intended to be something in between the universal and application classes, where an organization could define its own set of types that have the same meaning across all of their applications, but the use of the private class is discouraged, and it’s highly unlikely that you’ll ever encounter it in LDAP.

The BER Type Primitive/Constructed Bit

The third most significant bit in a BER type is used to indicate whether the element is primitive or constructed. If this bit is set to one, then it means that the element is constructed and that its value is comprised of a concatenation of zero or more encoded BER elements. Sequences and sets, which encapsulate elements, are constructed. On the other hand, if this third bit is set to zero, then it means that the element is primitive and that its value should not be assumed to be comprised of encoded elements. Null, Boolean, integer, octet string, and enumerated elements are all primitive.

The BER Type Tag Number

The remaining five bits in a BER type are used to encode the tag number, which is used to differentiate between different kinds of elements within the same class. The tag number is encoded using the binary representation of that number, so 00000 represents a tag number of zero, 00001 is a tag number of one, 00010 a tag number of two, and so on. Since there are only five bits used for the tag number, you can only have tag numbers up to thirty encoded in a single byte. Fortunately, it’s extremely unlikely that you’ll ever encounter a tag number that is greater than thirty in LDAP (the highest tag number I’m aware of is twenty-five, used for the LDAP intermediate response protocol op), so you probably don’t need to worry about multi-byte types.
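
To make the layout of the type byte concrete, here is a minimal Python sketch that splits a single-byte BER type into its class, primitive/constructed flag, and tag number. The function name describe_type_byte and the CLASS_NAMES tuple are made up for this illustration and don’t come from any particular library.

```python
# Class names indexed by the two most significant bits of the type byte.
CLASS_NAMES = ("universal", "application", "context-specific", "private")


def describe_type_byte(type_byte):
    """Split a single-byte BER type into (class, constructed?, tag number)."""
    ber_class = CLASS_NAMES[(type_byte >> 6) & 0b11]  # bits 8 and 7
    constructed = bool(type_byte & 0b0010_0000)       # bit 6
    tag_number = type_byte & 0b0001_1111              # bits 5 through 1
    if tag_number == 0b11111:
        # All five tag bits set to one signals a multi-byte type, which this
        # sketch does not attempt to handle.
        raise ValueError("multi-byte BER types are not supported")
    return ber_class, constructed, tag_number


print(describe_type_byte(0x02))  # ('universal', False, 2)
print(describe_type_byte(0x63))  # ('application', True, 3)
print(describe_type_byte(0xA3))  # ('context-specific', True, 3)
```

The three sample bytes are the same ones used in the class descriptions above: a universal integer, the LDAP search request protocol op, and a context-specific constructed element with a tag number of three.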

The Universal BER Types Used in LDAP

The following are the BER types in the universal class that you’re likely to encounter in LDAP:

Element Type	Binary Encoding	Hex Encoding
Boolean	00000001	0x01
Integer	00000010	0x02
Octet String	00000100	0x04
Null	00000101	0x05
Enumerated	00001010	0x0a
Sequence	00110000	0x30
Set	00110001	0x31

The Application BER Types Used in LDAP

The following are the BER types in the application class that are defined for LDAP:

Element Type	Binary Encoding	Hex Encoding
Bind Request Protocol Op	01100000	0x60
Bind Response Protocol Op	01100001	0x61
Unbind Request Protocol Op	01000010	0x42
Search Request Protocol Op	01100011	0x63
Search Result Entry Protocol Op	01100100	0x64
Search Result Done Protocol Op	01100101	0x65
Modify Request Protocol Op	01100110	0x66
Modify Response Protocol Op	01100111	0x67
Add Request Protocol Op	01101000	0x68
Add Response Protocol Op	01101001	0x69
Delete Request Protocol Op	01001010	0x4a
Delete Response Protocol Op	01101011	0x6b
Modify DN Request Protocol Op	01101100	0x6c
Modify DN Response Protocol Op	01101101	0x6d
Compare Request Protocol Op	01101110	0x6e
Compare Response Protocol Op	01101111	0x6f
Abandon Request Protocol Op	01010000	0x50
Search Result Reference Protocol Op	01110011	0x73
Extended Request Protocol Op	01110111	0x77
Extended Response Protocol Op	01111000	0x78
Intermediate Response Protocol Op	01111001	0x79

The unbind request, delete request, and abandon request protocol op types are primitive, while all the rest are constructed. This explains why their hexadecimal representations are so out-of-line with their neighboring values. The unbind request protocol op is a null element, the delete request protocol op is an octet string element, the abandon request protocol op is an integer element, and all other types are sequence elements.
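
If you’d like to see where those out-of-line hexadecimal values come from, the following Python sketch assembles a type byte from its three parts. The helper name make_type_byte and the class constants are assumptions made for this example; the tag numbers are the ones from the table above.

```python
# Two-bit class values, as described in the section on the BER type class.
UNIVERSAL, APPLICATION, CONTEXT_SPECIFIC, PRIVATE = 0b00, 0b01, 0b10, 0b11


def make_type_byte(ber_class, constructed, tag_number):
    """Combine class, primitive/constructed flag, and tag number into one byte."""
    if not 0 <= tag_number <= 30:
        raise ValueError("tag number does not fit in a single-byte type")
    return (ber_class << 6) | (0x20 if constructed else 0x00) | tag_number


# Constructed protocol ops have the 0x20 bit set; primitive ones do not.
print(hex(make_type_byte(APPLICATION, True, 1)))    # 0x61, bind response
print(hex(make_type_byte(APPLICATION, False, 2)))   # 0x42, unbind request
print(hex(make_type_byte(APPLICATION, False, 10)))  # 0x4a, delete request
print(hex(make_type_byte(APPLICATION, False, 16)))  # 0x50, abandon request
```

Had the delete request been constructed, its type would have been 0x6a, which would fit neatly between 0x69 and 0x6b. Clearing the primitive/constructed bit is what drops it down to 0x4a.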

BER Lengths

A BER element’s length specifies the number of bytes in the value. There are two ways to encode the length: a single-byte representation for values of up to 127 bytes, and a multi-byte representation for values of any size.

In the single-byte representation, the length is just encoded using the binary representation of the number of bytes in the value. For example, if the value is zero bytes long (which will be the case for a null element, for a zero-byte octet string, or an empty sequence or set), then the length is encoded as 00000000 binary or 0x00 hex. If the value is five bytes long, then the length is encoded as 00000101 binary or 0x05 hex. And a value that is 123 bytes long would be encoded as 01111011 binary or 0x7b hex.

In the multi-byte representation, the first byte has its most significant bit set to one, and the lower seven bits are used to indicate how many bytes are required to represent the length. For example, let’s say that you want to encode the length for a value that is 1234 bytes long. The binary representation of 1234 is 10011010010 (0x4d2 hex), which is large enough that it will require two bytes. And then we’ll need to precede those two bytes with a third byte that has its leftmost bit set to one and the right seven bits used to hold the binary representation of the number two. So the full binary representation of a BER length for a 1234-byte-long value is 100000100000010011010010 (0x8204d2 hex).
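
Both length forms can be sketched in a few lines of Python. This is only an illustration under the rules described above (the function name encode_length is made up), and it always picks the shortest possible encoding.

```python
def encode_length(length):
    """Encode a BER length using the fewest bytes possible."""
    if length < 0:
        raise ValueError("a length cannot be negative")
    if length <= 127:
        # Single-byte form: the byte is simply the length itself.
        return bytes([length])
    # Multi-byte form: a leading byte with the high bit set and the number of
    # length bytes in the lower seven bits, followed by the big-endian length.
    num_bytes = (length.bit_length() + 7) // 8
    return bytes([0x80 | num_bytes]) + length.to_bytes(num_bytes, "big")


print(encode_length(0).hex())     # 00
print(encode_length(5).hex())     # 05
print(encode_length(123).hex())   # 7b
print(encode_length(1234).hex())  # 8204d2
```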

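Although the above describes the two basic ways to encode a BER length, there are a few additional points worth noting.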

Note 1: Encoding BER Lengths with More Bytes than Necessary

BER doesn’t require you to encode the length in the smallest possible number of bytes. You can use a multi-byte representation for lengths that could be encoded in just a single byte, and you can use more bytes than necessary in a multi-byte representation. For example, all of the following hexadecimal encodings are valid ways to represent a BER length of ten bytes:

0a
81 0a
82 00 0a
84 00 00 00 0a
8a 00 00 00 00 00 00 00 00 00 0a

Some BER libraries choose to always use multi-byte encodings for certain types of elements (especially sequences and sets). When looking at encoded LDAP traffic, it’s relatively common to see encoded lengths that start with 0x84, followed by four more bytes that actually hold the encoded length. This is usually done for efficiency, because it allows the library to just directly copy the bytes that make up the 32-bit integer representation of the length, and because it makes it possible for the library to go back and fill in the length for a sequence or set once it knows how many elements that sequence or set contains and how big those elements are.

Although it’s technically valid to use any number of bytes to encode a BER length, many libraries impose a limit on the number of bytes that they will support in multi-byte lengths. In most cases, that limit is four bytes, not counting the one extra byte used to indicate that it’s a multi-byte length, so it’s probably best to avoid generating multi-byte lengths that start with anything larger than 0x84.

Note 2: Imposing Upper Bounds on BER Lengths for Safety

Most BER libraries impose an upper limit on the size of the elements that they will accept. This is a safety feature that is intended to mitigate the risk of a malicious peer claiming that it’s going to send a very large element in the hopes that the receiving application will allocate enough memory to hold that element, which could cause that application to crash or the system to start swapping. If you’re thinking about writing a BER decoder, it’s a very good idea to ensure you have some way of rejecting elements that are unreasonably large.

Note 3: The Indefinite Length Form

BER actually offers a third way to represent the length of an element. This is called the indefinite form, and it uses a special token at the beginning to indicate the start of a value that uses the indefinite form, and then another special token after the end of the value. This is potentially useful for cases in which the size of the element may not be known in advance (for example, when starting a sequence without knowing how many elements will be added to that sequence). However, you won’t encounter the indefinite length form in LDAP because RFC 4511 section 5.1 explicitly forbids its use, so I won’t go into any more detail about it here.
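
Here is a hedged Python sketch of a length decoder that takes all three notes into account. It accepts lengths that use more bytes than necessary, refuses more than four length bytes, rejects the indefinite form (which BER signals with a length byte of 0x80), and enforces a maximum element size. The names decode_length, MAX_LENGTH_BYTES, and MAX_ELEMENT_SIZE are made up for this example, and the one-mebibyte ceiling is an arbitrary assumption rather than a value taken from any specification.

```python
MAX_LENGTH_BYTES = 4            # assumed cap, matching the common 0x84 limit
MAX_ELEMENT_SIZE = 1024 * 1024  # assumed ceiling of one mebibyte


def decode_length(data, offset=0):
    """Return (value length, offset at which the value starts)."""
    if offset >= len(data):
        raise ValueError("no length byte available")
    first = data[offset]
    if first < 0x80:
        # Single-byte form.
        length, value_offset = first, offset + 1
    elif first == 0x80:
        raise ValueError("the indefinite length form is not allowed in LDAP")
    else:
        # Multi-byte form: the lower seven bits count the length bytes.
        count = first & 0x7F
        if count > MAX_LENGTH_BYTES:
            raise ValueError("too many length bytes")
        value_offset = offset + 1 + count
        if value_offset > len(data):
            raise ValueError("truncated length")
        length = int.from_bytes(data[offset + 1:value_offset], "big")
    if length > MAX_ELEMENT_SIZE:
        raise ValueError("element is unreasonably large")
    return length, value_offset


for encoded in ("0a", "810a", "82000a", "840000000a"):
    print(encoded, decode_length(bytes.fromhex(encoded)))
```

Each of the four sample encodings decodes to a length of ten, and only the offset of the value differs. The eleven-byte encoding from Note 1 is valid BER, but this sketch rejects it because of the four-byte cap.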

BER Values

A BER element’s value holds an encoded representation of the data for that element. The way that the value is encoded depends on the type of element, so we’ll cover each kind of value separately.

Null Values

A null element is one that doesn’t have a value. Or, more accurately, it always has a value with a length of zero bytes. Null elements are typically used in cases where an element is needed, but the value for that element isn’t important. For example, the LDAP unbind request protocol op is a null element because an unbind request doesn’t have any parameters.

Null elements are always primitive, and the value is always empty, so the length is always zero bytes. The universal BER type for a null element is 0x05, so the full hexadecimal encoding for a universal null element is:

05 00

In LDAP, the unbind request protocol op is encoded as a null element in the application class with a tag number of two (as per RFC 4511 section 4.3). The hexadecimal representation of that element is:

42 00

Boolean Values

A Boolean element is one whose value represents the Boolean condition of either true or false. The value of a Boolean element is always encoded as a single byte, with 0xff representing true and 0x00 representing false.

LDAP is more restrictive than general-purpose BER is when it comes to encoding Boolean values of true. In general BER, a value of false is always represented as a single byte with all bits set to zero (hex 0x00), while a value in which at least one bit is set to one represents true. But RFC 4511 section 5.1 states that LDAP messages should always encode true values with all bits set to one, which is 0xff hex.

Boolean elements are always primitive, and they always have a one-byte value. The universal BER type for a Boolean element is 0x01, so the encoding for a universal Boolean element with a value of true is:

01 01 ff

And the encoding for a universal Boolean element with a value of false is:

01 01 00
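
The difference between LDAP’s rule and general BER shows up mostly on the decoding side. The following Python sketch (with made-up function names and a hypothetical strict_ldap flag) always encodes true as 0xff, and can either insist on that when decoding or accept any nonzero byte as general BER would.

```python
def encode_boolean(value, type_byte=0x01):
    """Encode a Boolean element, using 0xff for true as LDAP expects."""
    return bytes([type_byte, 0x01, 0xFF if value else 0x00])


def decode_boolean(element, strict_ldap=True):
    """Decode a complete Boolean element (type, length, and value bytes)."""
    if len(element) != 3 or element[1] != 0x01:
        raise ValueError("a Boolean element must have a one-byte value")
    value_byte = element[2]
    if strict_ldap and value_byte not in (0x00, 0xFF):
        raise ValueError("LDAP expects true to be encoded as 0xff")
    # In general BER, any nonzero byte means true.
    return value_byte != 0x00


print(encode_boolean(True).hex(" "))   # 01 01 ff
print(encode_boolean(False).hex(" "))  # 01 01 00
print(decode_boolean(bytes.fromhex("010101"), strict_ldap=False))  # True
```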

Octet String Values

An octet is a byte, and an octet string is simply zero or more bytes strung together. Those bytes can represent text (in LDAP, it’s usually the bytes that comprise the UTF-8 representation of that text), or they can just make up some arbitrary blob of binary data. LDAP uses octet strings all over the place, including for DNs, attribute names and values, diagnostic messages, and to hold the encoded values of controls, extended requests and responses, and SASL credentials.

In LDAP, octet strings are always primitive (BER allows for the possibility of constructed octet strings, but RFC 4511 section 5.1 forbids that use in LDAP). The universal BER type for an octet string element is 0x04, and the hexadecimal bytes that correspond to the UTF-8-encoded text string “Hello!” are: 48 65 6c 6c 6f 21, so the encoding for a universal octet string element meant to hold the text string “Hello!” is:

04 06 48 65 6c 6c 6f 21
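
As a quick illustration, this Python sketch builds that same encoding. The function name encode_octet_string is made up for the example, and to keep things short it only supports values of up to 127 bytes (that is, the single-byte length form).

```python
def encode_octet_string(value, type_byte=0x04):
    """Encode a primitive octet string; text is converted to UTF-8 first."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    if len(value) > 127:
        raise ValueError("this sketch only handles single-byte lengths")
    return bytes([type_byte, len(value)]) + value


print(encode_octet_string("Hello!").hex(" "))  # 04 06 48 65 6c 6c 6f 21
print(encode_octet_string(b"").hex(" "))       # 04 00
```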

Integer Values

An integer is a whole number, without any decimal point or fractional portion. Integer values can be positive, negative, or zero.

In BER, integer values are encoded using the two’s complement representation of the desired numeric value, using the smallest number of bytes that can hold the specified value. The process for coming up with the two’s complement representation varies a little based on whether the value is negative or not.

An integer value of zero is always encoded as a single byte, and that byte is 00000000 binary or 00 hexadecimal. Integer elements are always primitive, and the BER type for universal integer elements is 0x02, so the hexadecimal encoding for a universal integer element with a value of zero is:

02 01 00

Positive integer values are encoded in the smallest number of bytes needed to hold the big-endian binary representation of that number, with the caveat that the most significant bit of the first byte cannot be set to one. If the binary representation of the desired integer value (without any leading zeros) requires an exact multiple of eight bits, then its leading one would land in that most significant position, so you should prepend an extra byte with all bits set to zero. For example, the binary representation of the integer value 50 is 00110010 (32 hex), so the hex encoding for a universal integer element with a value of 50 is:

02 01 32

But the binary representation of the integer value 50,000 is 11000011 01010000 (c3 50 hex), which does have its most significant bit set to one so we need to pad it with an extra byte of all zeros to get 00000000 11000011 01010000 binary (00 c3 50 hex), and the hex encoding for a universal integer element with a value of 50,000 is:

02 03 00 c3 50

Negative integer values are more difficult to understand in the two’s complement notation. If the most significant bit of the first byte is a one, then it indicates that the value is negative, but it’s not sufficient to just flip that bit from zero to one in order to turn a positive value into its negative equivalent. To compute the two’s complement representation for a negative integer, you need to use the following process:

Start with the big-endian binary representation of the absolute value for the desired negative number. For example, if you want to find the two’s complement representation for the number -12345, start by finding the big-endian binary representation of positive 12345, which is 00110000 00111001 (30 39 hex).
Flip all of the bits in the value that you just computed so that all the ones become zeros, and the zeros become ones. So 00110000 00111001 would become 11001111 11000110 (cf c6 hex).
Add one to the resulting value, so 11001111 11000110 binary would become 11001111 11000111 (cf c7 hex).

So the hex encoding for a universal integer element with a value of -12,345 is:

02 02 cf c7
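
The following Python sketch shows both the flip-and-add-one process and a more direct way to get the same bytes. The function names are made up for this illustration. Python’s built-in int.to_bytes method already knows how to produce a two’s complement representation, so the only real work is figuring out the smallest number of bytes to use.

```python
def twos_complement_by_hand(negative_number, num_bytes):
    """Follow the steps above: take the absolute value, flip bits, add one."""
    mask = (1 << (8 * num_bytes)) - 1
    flipped = abs(negative_number) ^ mask
    return ((flipped + 1) & mask).to_bytes(num_bytes, "big")


def encode_integer(number, type_byte=0x02):
    """Encode an integer element with the smallest two's complement value."""
    # For negative numbers, ~number (which is -number - 1) has the same bit
    # count as the magnitude portion of the two's complement representation.
    magnitude_bits = (number if number >= 0 else ~number).bit_length()
    num_bytes = magnitude_bits // 8 + 1  # always leaves room for the sign bit
    value = number.to_bytes(num_bytes, "big", signed=True)
    return bytes([type_byte, len(value)]) + value


print(twos_complement_by_hand(-12345, 2).hex(" "))  # cf c7
for number in (0, 50, 50000, -12345):
    print(number, encode_integer(number).hex(" "))
```

The loop prints the same four encodings worked out by hand above: 02 01 00, 02 01 32, 02 03 00 c3 50, and 02 02 cf c7.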

Technically, BER does not impose any limits on the magnitude of the positive or negative integer values that it can represent. However, many BER libraries do define their own bounds for the sizes of integer values that they can handle. It’s probably a safe assumption that a BER library can work with signed 32-bit integer values (that is, numbers between -2,147,483,648 and 2,147,483,647), which is the range that LDAP requires, but if you’re writing your own BER library, or are looking for a library to use with LDAP, then it’s probably better to ensure that it has support for at least signed 64-bit integer values (between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807).

Enumerated Values

An enumerated element is like an integer in that its value is numeric, but each number is associated with a particular meaning. For example, LDAP result messages use an enumerated element to encode the result code (for example, a numeric value of 0 means that the operation completed successfully, 32 means that the operation targeted an entry that didn’t exist, 49 means that an authentication attempt failed because the user provided invalid credentials, etc.). LDAP also uses enumerated elements for things like modification types, search scopes, and alias dereferencing behaviors.

Enumerated elements are encoded in exactly the same way as integer elements. They’re always primitive, and the value is the two’s complement representation of the integer value that it holds. The universal BER type for an enumerated element is 0x0a, so the hexadecimal encoding for a universal enumerated element that represents the LDAP “success” result code (integer value zero) is:

0a 01 00

Although they often are, the allowed set of numeric values for an enumerated element do not have to fall in a contiguous range. For example, there are gaps in the defined set of values for LDAP result codes.

Enumerated elements should not have negative numeric values. However, values are still encoded using the two’s complement representation, so it may be necessary to add a leading byte in which all bits are set to zero if the binary representation of the numeric value would otherwise have caused the most significant bit in the first byte to be set to one.
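
Going in the other direction, here is a minimal Python sketch that decodes a universal enumerated element and looks up a name for a few LDAP result codes. The function name decode_enumerated and the RESULT_CODE_NAMES dictionary are made up for the example, the dictionary is deliberately incomplete, and the value 128 in the last line is just a hypothetical number chosen to show the leading zero byte.

```python
# A small, deliberately incomplete sample of LDAP result codes.
RESULT_CODE_NAMES = {0: "success", 32: "noSuchObject", 49: "invalidCredentials"}


def decode_enumerated(element):
    """Decode a complete universal enumerated element into its numeric value."""
    if len(element) < 3 or element[0] != 0x0A:
        raise ValueError("not a universal enumerated element")
    length = element[1]
    if length > 127 or len(element) != 2 + length:
        raise ValueError("unexpected length for this sketch")
    # The value uses the same two's complement encoding as an integer.
    return int.from_bytes(element[2:], "big", signed=True)


for encoded in ("0a0100", "0a0120", "0a0131"):
    code = decode_enumerated(bytes.fromhex(encoded))
    print(code, RESULT_CODE_NAMES.get(code, "unrecognized"))

# A leading zero byte keeps a value like 128 from being read as negative.
print(decode_enumerated(bytes.fromhex("0a020080")))  # 128
```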

Sequence Values

A sequence element is a container that holds a list of zero or more other elements. The order in which elements appear in a sequence is considered significant. The value of a sequence element is simply a concatenation of the encoded representations of all of the elements contained in the sequence.

Sequences are always constructed. The universal BER type for a sequence is 0x30, so a BER sequence that contains a universal octet string with a value of “Hello!”, a Boolean value of true, and an integer value of five would be encoded as follows:

30 0e 04 06 48 65 6c 6c 6f 21 01 01 ff 02 01 05

This encoding is easier to understand if you break it up into its components, like:

30 0e -- The type and length of the sequence
   04 06 48 65 6c 6c 6f 21 -- The encoded octet string "Hello!"
   01 01 ff -- The encoded Boolean true
   02 01 05 -- The encoded integer five
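
The same breakdown can be done programmatically. This Python sketch (the function name read_elements is made up for the example) walks a run of concatenated BER elements and returns each one’s type byte and value. It assumes single-byte types and definite lengths, and it leaves out the size limits that a real decoder should enforce.

```python
def read_elements(data):
    """Return a list of (type byte, value bytes) for concatenated elements."""
    elements = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated element header")
        type_byte = data[offset]
        length = data[offset + 1]
        offset += 2
        if length & 0x80:
            # Multi-byte length: the lower seven bits count the length bytes.
            count = length & 0x7F
            if count == 0 or offset + count > len(data):
                raise ValueError("unsupported or truncated length")
            length = int.from_bytes(data[offset:offset + count], "big")
            offset += count
        if offset + length > len(data):
            raise ValueError("truncated element value")
        elements.append((type_byte, data[offset:offset + length]))
        offset += length
    return elements


encoded = bytes.fromhex("300e040648656c6c6f210101ff020105")
[(sequence_type, sequence_value)] = read_elements(encoded)
print(hex(sequence_type), len(sequence_value))
for type_byte, value in read_elements(sequence_value):
    print(f"{type_byte:02x}", value.hex(" "))
```

Because the value of a sequence is just a concatenation of encoded elements, the same function that reads the outer sequence can be applied again to its value to get at the three elements inside.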

LDAP makes heavy use of sequence elements. Every LDAP request and response is encapsulated in an element called an LDAP message, which is a sequence that contains a message ID (which is an integer), a protocol operation (which varies, but is often a sequence), and an optional list of controls (which, if present, is a sequence of sequences).

Set Values

A set element is also a container that holds zero or more other elements, and it’s encoded in exactly the same way as a sequence. The only real differences between a sequence and a set are that the order of elements in a set is not considered significant and that the universal BER type for a set is 0x31 instead of the 0x30 type used for a universal sequence.

LDAP does not use sets nearly as much as it does sequences. The only places that sets are used in the core LDAP protocol specification (RFC 4511) are to hold a collection of values for an attribute, and to hold the components inside an AND or OR search filter.

The String Representation of ASN.1 Elements

The X.680 standard, titled “Information technology — Abstract Syntax Notation One (ASN.1): Specification of basic notation”, defines a syntax for representing ASN.1 elements as strings. As with many things related to ASN.1, the complete syntax is long and complicated, but if you just constrain yourself to what you need to understand to get by in LDAP, it’s pretty manageable.

The string representation of an ASN.1 element is comprised of the following components:

An optional set of whitespace characters and comments. What exactly constitutes whitespace and comments will be described below.
The name of the element. This is technically called a “type reference”. This must start with a letter (an uppercase one, according to X.680), and it must consist of one or more letters, digits, and hyphens. It must not end with a hyphen, and it must not contain consecutive hyphens.
An optional set of whitespace characters and comments.
The assignment operator “::=”.
An optional set of whitespace characters and comments.
An indication of the type of the element. This can be as simple as the name of the value type (for example, BOOLEAN or OCTET STRING), but it can be substantially more involved. We’ll get into this in more detail below.
An optional set of whitespace characters and comments.

For example, a simple ASN.1 element definition might look like:

AttributeValue ::= OCTET STRING
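
The naming rules are easy to express as a regular expression. The Python sketch below checks a name against the rules as listed above and splits a simple definition around the assignment operator. The names NAME_PATTERN, is_valid_name, and split_definition are made up for this example, and the sketch makes no attempt to handle comments or anything beyond a one-line definition.

```python
import re

# A letter, followed by letters or digits that may each be preceded by a
# single hyphen. This rules out a trailing hyphen and consecutive hyphens.
NAME_PATTERN = re.compile(r"[A-Za-z](?:-?[A-Za-z0-9])*")


def is_valid_name(name):
    """Check a name against the rules listed above."""
    return NAME_PATTERN.fullmatch(name) is not None


def split_definition(text):
    """Split 'Name ::= Type' into its name and type portions."""
    name, operator, type_text = text.partition("::=")
    name = name.strip()
    if not operator or not is_valid_name(name):
        raise ValueError("not a simple ASN.1 element definition")
    return name, type_text.strip()


print(split_definition("AttributeValue ::= OCTET STRING"))
print(is_valid_name("Bad--Name"), is_valid_name("Trailing-"))  # False False
```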

Whitespace in the String Representation of ASN.1 Elements

In the string representation of ASN.1 elements, whitespace consists of one or more of the following characters:

Description	UTF-8 Encoding (hexadecimal)
Regular space	20
Non-breaking space	c2 a0
Horizontal tab	09
Vertical tab	0b
Line feed (aka newline)	0a
Form feed	0c
Carriage return	0d

Comments in the String Representation of ASN.1 Elements

There are two ways to specify comments in the string representation of ASN.1 elements:

A comment can start with two consecutive hyphen characters, “--”, and it will continue either until the next occurrence of “--”, or until the end of the line, whichever comes first.
As in a number of programming languages, such as C and Java, a comment can start with “/*”, and it will continue until it is closed with “*/”. These comments can span multiple lines.

Specifying the BER Type in the String Representation of ASN.1 Elements

You can specify the BER type for an ASN.1 element by enclosing it in square brackets in front of the name of the value type. The square brackets should include at least the tag number for the BER type, but may also contain a string that indicates the class for the type.

To indicate that an element should have a BER type in the universal class, you can use the string “UNIVERSAL” inside the square brackets, followed by whitespace and the tag number for that type of element. For example:

AttributeValue ::= [UNIVERSAL 4] OCTET STRING

But this is a rare occurrence because you can omit the type specification if the element is in the universal class. So the above is equivalent to:

AttributeValue ::= OCTET STRING

To indicate that an element should have a BER type in the application class, use the string “APPLICATION” inside the square brackets, followed by whitespace and the tag number. For example:

UnbindRequest ::= [APPLICATION 2] NULL

To indicate that an element should have a BER type in the context-specific class, simply place the tag number inside the square brackets without any other text. For example:

HypotheticalContextSpecificElement ::= [0] INTEGER

And although you’ll probably never encounter it in LDAP, if you want to indicate that an element should have a BER type in the private class, use the string “PRIVATE” inside the square brackets before the tag number, like:

HypotheticalPrivateElement ::= [PRIVATE 5] BOOLEAN
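
To tie the bracket notation back to the encoded type byte, here is a hedged Python sketch that converts a type specification like “[APPLICATION 2]” into a single-byte BER type. The names TAG_SPEC, CLASS_BITS, and type_byte_for are made up for this example. Whether the element is primitive or constructed depends on the kind of value rather than on anything inside the brackets, so the caller has to supply that separately.

```python
import re

# An optional class keyword followed by the tag number, all in brackets.
TAG_SPEC = re.compile(r"\[\s*(?:(UNIVERSAL|APPLICATION|PRIVATE)\s+)?(\d+)\s*\]")

# No class keyword at all means the context-specific class.
CLASS_BITS = {"UNIVERSAL": 0b00, "APPLICATION": 0b01, None: 0b10, "PRIVATE": 0b11}


def type_byte_for(tag_spec, constructed):
    """Convert a bracketed type specification into a single-byte BER type."""
    match = TAG_SPEC.fullmatch(tag_spec.strip())
    if match is None:
        raise ValueError("unrecognized type specification")
    class_name, tag_text = match.groups()
    tag_number = int(tag_text)
    if tag_number > 30:
        raise ValueError("tag number does not fit in a single-byte type")
    return (CLASS_BITS[class_name] << 6) | (0x20 if constructed else 0) | tag_number


print(hex(type_byte_for("[UNIVERSAL 4]", constructed=False)))    # 0x4
print(hex(type_byte_for("[APPLICATION 2]", constructed=False)))  # 0x42
print(hex(type_byte_for("[0]", constructed=False)))              # 0x80
print(hex(type_byte_for("[PRIVATE 5]", constructed=False)))      # 0xc5
```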

Specifying Null Values

Since null elements don’t have values, there isn’t much variation in the string representation of null values. You just use the string “NULL”, optionally preceded by the type specification in square brackets. For example:

UnbindRequest ::= [APPLICATION 2] NULL

Specifying Boolean Values

Unlike null elements, Boolean elements do have values. But since a Boolean value is so simple, there aren’t any constraints that you can impose, so the string representation of a Boolean value is just the string “BOOLEAN”, optionally including the type in square brackets. For example:

HypotheticalBooleanElement ::= [1] BOOLEAN

Specifying Octet String Values

The string representation of an octet string element uses the string “OCTET STRING”, optionally preceded by the BER type specification. For example:

AttributeValue ::= OCTET STRING

Octet string elements can have any kind of value since the value is just a collection of zero or more bytes. However, just because a general-purpose octet string can have any kind of value, that doesn’t mean that every octet string element should be treated as a free-for-all. A particular octet string element might be intended to hold a particular kind of value, and therefore you might want to indicate that there should be certain constraints on the value of that element.

If an octet string’s value should have a specific size, you can indicate that with a SIZE constraint, with the allowed number of bytes specified in parentheses. In standard ASN.1 notation, the constraint as a whole is also enclosed in parentheses, like:

FiveByteOctetString ::= OCTET STRING (SIZE(5))

And if the value’s size should be within a specified range, you can indicate that range by separating the lower and upper bounds with two periods, like:

FiveToTenByteOctetString ::= OCTET STRING (SIZE(5..10))

If you need to specify a constraint that is more complex than just restricting the number of bytes that can be in the value, then you can just use a comment to specify what that constraint is. For example:

LDAPString ::= OCTET STRING -- UTF-8 encoded,
                            -- [ISO10646] characters

Specifying Integer Values

To indicate that an element has a value that is an integer, use the string “INTEGER”, like:

SomeNumber ::= INTEGER

You can specify a range of valid values by separating the upper and lower bounds with two periods and enclosing that range in parentheses, like:

NumberBetweenOneAndTen ::= INTEGER (1..10)

You can also define an integer constant, which is a named representation of a fixed value. For example, the LDAP specification defines a maxInt constant with a value of 2147483647, and it uses that constant in various places, such as the upper bound for a message ID:

MessageID ::= INTEGER (0..maxInt)
maxInt INTEGER ::= 2147483647 -- (2^^31 - 1) --

Specifying Enumerated Values

An enumerated element has exactly the same encoded representation as an integer element, but they have very different string representations. That’s because each of the possible numeric values for an enumerated element has a specific name that indicates its meaning, and the string representation correlates the name with its numeric value.

The string representation of an enumerated element starts with the string “ENUMERATED” (optionally preceded by the type specification in square brackets), followed by an opening curly brace. It then includes a number of name-value pairs in which the name for each pair follows nearly the same syntax as a type reference (it must start with a letter, although X.680 calls for a lowercase one here, must not contain consecutive hyphens, must not end with a hyphen, and must contain only letters, digits, and hyphens), and the numeric value follows that name in parentheses. Each name-value pair except for the last one is followed by a comma, and the last one is followed by a closing curly brace. For example:

TrafficLightColor ::= ENUMERATED {
     red        (0),
     yellow     (1),
     green      (2) }

The string representation of an enumerated element typically lists the values in ascending order, but those values don’t have to represent a contiguous range, and there is no set minimum or maximum value. For example:

SparseValues ::= ENUMERATED {
     smallestValue     (5),
     middleValue       (10),
     largestValue      (17) }

There may also be cases in which you want to define a given set of allowed values now, but also permit defining additional values that can be used in the future. For example, the LDAP protocol specification uses an enumerated element to define a number of possible result code values, but it also allows for other result codes to be defined in other specifications or by specific vendors. To indicate that this should be allowed, use three periods to create an ellipsis, typically at the end of the list of possible values, like:

MayIncludeAdditionalValues ::= ENUMERATED {
     first      (1),
     second     (2),
     third      (3),
     ... }

This indicates that the three specified values are known at the time the specification was created, but that an application should be prepared to encounter other values. The application may not necessarily be able to interpret those values correctly, and it may return an error if it encounters an unrecognized value, but at least that error shouldn’t result from an inability to decode the element.

For example, RFC 4511 section 4.5.1 defines three possible search scope values but uses an ellipsis to indicate that additional scopes may be defined in the future. It does this like:

scope ::= ENUMERATED {
     baseObject       (0),
     singleLevel      (1),
     wholeSubtree     (2),
     ... }

And in fact the draft-sermersheim-ldap-subordinate-scope specification does propose a fourth scope, subordinateSubtree, with a numeric value of 3.
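
To make the extensibility point concrete, here is a minimal Python sketch of a decoder that tolerates enumerated values it doesn't know by name. The SearchScope class and the decode_enumerated and interpret_scope functions are hypothetical names invented for this illustration, and the sketch assumes a universal enumerated element (BER type 0x0A) with a single-byte length.

```python
from enum import IntEnum


class SearchScope(IntEnum):
    """The three scopes named in RFC 4511 section 4.5.1."""
    BASE_OBJECT = 0
    SINGLE_LEVEL = 1
    WHOLE_SUBTREE = 2


def decode_enumerated(element):
    """Decode a universal enumerated element (type 0x0A) into an int.

    Assumes the short (single-byte) form of the length.
    """
    if element[0] != 0x0A:
        raise ValueError("not a universal enumerated element")
    length = element[1]
    return int.from_bytes(element[2:2 + length], "big", signed=True)


def interpret_scope(element):
    """Return a SearchScope member if the value is known, else the raw int."""
    value = decode_enumerated(element)
    try:
        return SearchScope(value)
    except ValueError:
        # The element decoded fine; we just don't have a name for the value.
        return value


known = interpret_scope(bytes.fromhex("0a0102"))    # the WHOLE_SUBTREE member
unknown = interpret_scope(bytes.fromhex("0a0103"))  # a plain int, not a member
```

The decoding step succeeds either way. Whether the caller then rejects the unrecognized value is a separate decision.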

Specifying Sequence Values

There are two basic kinds of sequences: those that have a well-defined set of elements, and those that have an arbitrary number of elements that are all of the same type. The first is primarily used as a data structure to represent some entity with multiple components, while the second is primarily used to hold a bunch of the same kind of thing.

Specifying Sequences with Predefined Element Types

The string representation of a sequence element with a specific number and type of elements is similar to that of an enumerated element. It starts with the “SEQUENCE” keyword (optionally preceded by the type specification in square brackets), followed by an opening curly brace, a comma-delimited list of the allowed elements, and a closing curly brace. Each item in the comma-delimited list of elements consists of a name, some whitespace, and the value specifier. For example:

Date ::= SEQUENCE {
     year           INTEGER,
     month          ENUMERATED {
          january       (1),
          february      (2),
          march         (3),
          april         (4),
          may           (5),
          june          (6),
          july          (7),
          august        (8),
          september     (9),
          october       (10),
          november      (11),
          december      (12) },
     dayOfMonth     INTEGER (1..31) }

All of the constraints that you can define for the elements on their own are also available for those elements in a sequence (for example, the above constraint that only allows the dayOfMonth value to be between 1 and 31). But there are also additional constraints that you can define for elements in a sequence. These include the OPTIONAL and DEFAULT constraints.

The OPTIONAL constraint indicates that the specified element is optional and doesn’t have to be present. For example, the following sequence defines a data structure for specifying the time of the day in which the hour and minute are required, but the second is optional:

TimeOfDay ::= SEQUENCE {
     hour       INTEGER (0..23),
     minute     INTEGER (0..59),
     second     INTEGER (0..60) OPTIONAL }

The DEFAULT constraint is like the OPTIONAL constraint in that the specified element doesn’t have to be present, but it also specifies the value that should be assumed when the element is absent. To use it, follow the DEFAULT keyword with whitespace and the default value. For example:

Control ::= SEQUENCE {
     controlType      LDAPOID,
     criticality      BOOLEAN DEFAULT FALSE,
     controlValue     OCTET STRING OPTIONAL }
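
As a rough illustration of how a decoder might honor OPTIONAL and DEFAULT, the following Python sketch decodes the Control sequence above. The read_element and decode_control names are hypothetical, the OID 1.2.3.4 is a made-up placeholder, and the sketch assumes single-byte BER types and well-formed input.

```python
def read_element(data, pos=0):
    """Return (ber_type, value, next_pos) for the element starting at pos.

    Assumes a single-byte type; handles short and long length forms.
    """
    ber_type = data[pos]
    first = data[pos + 1]
    pos += 2
    if first < 0x80:
        length = first
    else:
        count = first & 0x7F
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    return ber_type, data[pos:pos + length], pos + length


def decode_control(data):
    """Decode a Control into (controlType, criticality, controlValue)."""
    seq_type, seq_value, _ = read_element(data)
    if seq_type != 0x30:
        raise ValueError("expected a universal sequence")
    elements = []
    pos = 0
    while pos < len(seq_value):
        ber_type, value, pos = read_element(seq_value, pos)
        elements.append((ber_type, value))
    if not elements or elements[0][0] != 0x04:
        raise ValueError("controlType is required")
    control_type = elements[0][1].decode("ascii")
    remaining = elements[1:]
    criticality = False  # DEFAULT FALSE applies when the element is absent
    if remaining and remaining[0][0] == 0x01:
        criticality = remaining[0][1] != b"\x00"
        remaining = remaining[1:]
    control_value = None  # OPTIONAL: absent simply means no value
    if remaining and remaining[0][0] == 0x04:
        control_value = remaining[0][1]
        remaining = remaining[1:]
    if remaining:
        raise ValueError("unexpected trailing elements")
    return control_type, criticality, control_value


# Only the required controlType is present.
minimal = decode_control(bytes.fromhex("30090407312e322e332e34"))
```

Because criticality is a BOOLEAN and controlValue is an OCTET STRING, the BER type alone tells the decoder which of the two non-required elements it is looking at.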

As with an enumerated element, you may want to define a sequence that has a defined set of elements right now, but that may also have additional elements in the future. In that case, you can use the ellipsis (...) at the end of the sequence before the closing curly brace, just like you can in an enumerated element. For example:

ExtendableSequence ::= SEQUENCE {
     element1     OCTET STRING,
     element2     OCTET STRING,
     element3     OCTET STRING,
     ... }

Because the order of elements in a sequence is significant, you can often use the position of each element to determine what it represents. In the ExtendableSequence defined above, the first element in the sequence corresponds to element1 in the definition, the second corresponds to element2, and the third corresponds to element3. However, this may not work if a sequence contains non-required elements. For example, consider the following:

AnInvalidSequenceDefinition ::= SEQUENCE {
     element1     OCTET STRING OPTIONAL,
     element2     OCTET STRING OPTIONAL,
     element3     OCTET STRING OPTIONAL }

The above sequence is not valid because, if any of the elements is omitted, it’s not possible to determine which one it was. To deal with this, you need to ensure that all of the elements (or at least all of the elements starting with the first non-required element) have a unique BER type so that you can use the type to determine which elements are present and which are absent. For example, the following is valid because even if one or two elements are missing, you can use the BER types of the elements that are present to figure out which ones they are:

AValidSequenceDefinition ::= SEQUENCE {
     element1     OCTET STRING OPTIONAL,
     element2     BOOLEAN DEFAULT TRUE,
     element3     INTEGER DEFAULT 1234 }

But what if you want a sequence to have multiple elements with the same data type? This is when you specify an explicit BER type (usually in the context-specific class) so that you can use it to tell the difference between them. So the following is valid:

AnotherValidSequenceDefinition ::= SEQUENCE {
     element1     [1] OCTET STRING OPTIONAL,
     element2     [2] OCTET STRING OPTIONAL,
     element3     [3] OCTET STRING OPTIONAL }
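
Here is a small Python sketch of how a decoder could use those explicit types to work out which elements are present. It assumes the three elements are encoded as context-specific primitive types 0x81, 0x82, and 0x83; the read_element and decode_tagged_optionals names are hypothetical.

```python
def read_element(data, pos=0):
    """Return (ber_type, value, next_pos) for the element starting at pos.

    Assumes a single-byte type; handles short and long length forms.
    """
    ber_type = data[pos]
    first = data[pos + 1]
    pos += 2
    if first < 0x80:
        length = first
    else:
        count = first & 0x7F
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    return ber_type, data[pos:pos + length], pos + length


# Context-specific primitive types for [1], [2], and [3].
FIELD_BY_TYPE = {0x81: "element1", 0x82: "element2", 0x83: "element3"}


def decode_tagged_optionals(data):
    """Map each element name to its value, or to None if it is absent."""
    seq_type, seq_value, _ = read_element(data)
    if seq_type != 0x30:
        raise ValueError("expected a universal sequence")
    result = {name: None for name in FIELD_BY_TYPE.values()}
    last_type = 0
    pos = 0
    while pos < len(seq_value):
        ber_type, value, pos = read_element(seq_value, pos)
        name = FIELD_BY_TYPE.get(ber_type)
        # Sequence order is significant, so the types must appear in order.
        if name is None or ber_type <= last_type:
            raise ValueError("unexpected element type 0x%02x" % ber_type)
        result[name] = value
        last_type = ber_type
    return result


# Only element2 is present, holding the bytes for "hi".
only_second = decode_tagged_optionals(bytes.fromhex("300482026869"))
```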

Specifying Sequences with an Arbitrary Number of Elements of the Same Kind

Sometimes you want to have a sequence that is just a list containing some number of elements of a given kind, and you may or may not know how many elements should be in that list. You can indicate this with “SEQUENCE OF” followed by the type of element that should be contained in the list. For example:

ListOfIntegers ::= SEQUENCE OF listItem INTEGER

If you want to restrict the number of elements in the sequence, you can use the SIZE constraint. In this case, the word SIZE comes immediately after the word SEQUENCE and is followed by a parenthesized value: either a single number (to indicate exactly how many elements should be present) or a pair of numbers separated by two periods (to indicate that the number of elements should fall within a specified range). For example:

ListOfThreeIntegers ::= SEQUENCE SIZE (3) OF listItem INTEGER

ListOfFiveOrSixIntegers ::= SEQUENCE SIZE (5..6) OF listItem INTEGER

If there is a lower bound on the number of items but no upper bound, you can use the word MAX in place of the upper bound in the range, like:

NonEmptyListOfIntegers ::= SEQUENCE SIZE (1..MAX) OF listItem INTEGER
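
A SIZE constraint isn't visible in the encoded bytes, so it is up to the decoder to enforce it. The following Python sketch shows one way that check might look; read_element and decode_integer_list are hypothetical names, and a max_size of None stands in for MAX.

```python
def read_element(data, pos=0):
    """Return (ber_type, value, next_pos) for the element starting at pos.

    Assumes a single-byte type; handles short and long length forms.
    """
    ber_type = data[pos]
    first = data[pos + 1]
    pos += 2
    if first < 0x80:
        length = first
    else:
        count = first & 0x7F
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    return ber_type, data[pos:pos + length], pos + length


def decode_integer_list(data, min_size=0, max_size=None):
    """Decode a SEQUENCE OF INTEGER and enforce a SIZE (min..max) range."""
    seq_type, seq_value, _ = read_element(data)
    if seq_type != 0x30:
        raise ValueError("expected a universal sequence")
    items = []
    pos = 0
    while pos < len(seq_value):
        ber_type, value, pos = read_element(seq_value, pos)
        if ber_type != 0x02:
            raise ValueError("every list item must be a universal integer")
        items.append(int.from_bytes(value, "big", signed=True))
    if len(items) < min_size:
        raise ValueError("too few items: %d" % len(items))
    if max_size is not None and len(items) > max_size:
        raise ValueError("too many items: %d" % len(items))
    return items


three = bytes.fromhex("3009020101020102020103")  # the integers 1, 2, and 3

# ListOfThreeIntegers: SIZE (3)
exactly_three = decode_integer_list(three, min_size=3, max_size=3)

# NonEmptyListOfIntegers: SIZE (1..MAX)
non_empty = decode_integer_list(three, min_size=1)
```

Passing the same bytes with min_size=5 and max_size=6 (the ListOfFiveOrSixIntegers constraint) raises a ValueError, even though the sequence itself decodes cleanly.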

Inheriting from an Existing Sequence

Sometimes, you may want to create one sequence that contains all of the elements of another sequence, but that also allows additional elements not in the original sequence. For example, most response messages for LDAP operations allow for a result code, matched DN, diagnostic message, and a list of referral URLs, and these are all contained in an LDAPResult sequence, which is defined as follows:

LDAPResult ::= SEQUENCE {
     resultCode         ENUMERATED {
          success                      (0),
          operationsError              (1),
          protocolError                (2),
          timeLimitExceeded            (3),
          sizeLimitExceeded            (4),
          compareFalse                 (5),
          compareTrue                  (6),
          authMethodNotSupported       (7),
          strongerAuthRequired         (8),
               -- 9 reserved --
          referral                     (10),
          adminLimitExceeded           (11),
          unavailableCriticalExtension (12),
          confidentialityRequired      (13),
          saslBindInProgress           (14),
          noSuchAttribute              (16),
          undefinedAttributeType       (17),
          inappropriateMatching        (18),
          constraintViolation          (19),
          attributeOrValueExists       (20),
          invalidAttributeSyntax       (21),
               -- 22-31 unused --
          noSuchObject                 (32),
          aliasProblem                 (33),
          invalidDNSyntax              (34),
               -- 35 reserved for undefined isLeaf --
          aliasDereferencingProblem    (36),
               -- 37-47 unused --
          inappropriateAuthentication  (48),
          invalidCredentials           (49),
          insufficientAccessRights     (50),
          busy                         (51),
          unavailable                  (52),
          unwillingToPerform           (53),
          loopDetect                   (54),
               -- 55-63 unused --
          namingViolation              (64),
          objectClassViolation         (65),
          notAllowedOnNonLeaf          (66),
          notAllowedOnRDN              (67),
          entryAlreadyExists           (68),
          objectClassModsProhibited    (69),
               -- 70 reserved for CLDAP --
          affectsMultipleDSAs          (71),
               -- 72-79 unused --
          other                        (80),
          ...  },
     matchedDN          LDAPDN,
     diagnosticMessage  LDAPString,
     referral           [3] Referral OPTIONAL }

Referral ::= SEQUENCE SIZE (1..MAX) OF uri URI

URI ::= LDAPString     -- limited to characters permitted in
                       -- URIs

But an LDAP bind response can include all of these LDAPResult elements, plus an additional octet string element used to hold server SASL credentials. And an LDAP extended response can include all of the LDAPResult elements, plus an additional octet string for the response OID and an additional octet string for the response value.

Rather than duplicating the entire LDAPResult element and making the desired changes, you can use the “COMPONENTS OF” keyword followed by the name of the sequence whose elements you want to import. For example:

BindResponse ::= [APPLICATION 1] SEQUENCE {
     COMPONENTS OF LDAPResult,
     serverSaslCreds    [7] OCTET STRING OPTIONAL }

ExtendedResponse ::= [APPLICATION 24] SEQUENCE {
     COMPONENTS OF LDAPResult,
     responseName     [10] LDAPOID OPTIONAL,
     responseValue    [11] OCTET STRING OPTIONAL }
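
One consequence of COMPONENTS OF that is easier to see in bytes is that the imported elements sit directly inside the outer sequence rather than in a nested LDAPResult sequence. The Python sketch below encodes a bind response under some simplifying assumptions: result codes between 0 and 127, no referral element, and hypothetical helper names (encode_element, ldap_result_components, encode_bind_response).

```python
def encode_length(n):
    """Encode a BER length in the short form, or the long form if needed."""
    if n < 128:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_element(ber_type, value):
    """Encode one element with a single-byte type."""
    return bytes([ber_type]) + encode_length(len(value)) + value


def ldap_result_components(result_code, matched_dn="", diagnostic_message=""):
    """Encode the required LDAPResult elements with no sequence wrapper.

    Assumes 0 <= result_code <= 127 and omits the optional referral.
    """
    return (encode_element(0x0A, bytes([result_code]))
            + encode_element(0x04, matched_dn.encode("utf-8"))
            + encode_element(0x04, diagnostic_message.encode("utf-8")))


def encode_bind_response(result_code, server_sasl_creds=None):
    """Encode a BindResponse: [APPLICATION 1] constructed is type 0x61."""
    value = ldap_result_components(result_code)
    if server_sasl_creds is not None:
        # serverSaslCreds is [7] OCTET STRING: context-specific primitive 0x87
        value += encode_element(0x87, server_sasl_creds)
    return encode_element(0x61, value)


# A successful bind response with empty matched DN and diagnostic message.
success = encode_bind_response(0)
assert success.hex() == "61070a010004000400"
```

The enumerated result code (0a 01 00) appears immediately after the 61 07 header, with no intervening 30 sequence header.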

Specifying Set Values

The string representation of set elements is virtually identical to that of sequence elements. Just replace “SEQUENCE” with “SET”, and “SEQUENCE OF” with “SET OF”. However, given that the order of elements in a set is not considered significant, you are more likely to encounter the “SET OF” variant.

Specifying Choice Values

There may be cases in which you want to allow for one of several elements in a given slot in a sequence or set. You can accomplish that with a choice. The string representation of a choice element is very much like a sequence or a set, except that the encoded element can only contain one of the elements. For example:

NameValuePair ::= SEQUENCE {
     name      OCTET STRING,
     value     CHOICE {
          booleanValue     BOOLEAN,
          integerValue     INTEGER,
          stringValue      OCTET STRING } }

Most of the time, the encoded representation of the choice element is just the encoded representation of the element that is selected. For example, the encoded representation of the above sequence with a name of “age” and an integer value of 35 would be:

30 08 -- Begin a universal sequence with a total value size of 8 bytes
   04 03 61 67 65 -- The universal octet string age
   02 01 23 -- The universal integer 35 (0x23)
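
If you want to reproduce those bytes yourself, the following Python sketch encodes a NameValuePair by picking the choice alternative from the Python type of the value. The function names are hypothetical, and the mapping from Python types to alternatives is an assumption made only for this illustration.

```python
def encode_length(n):
    """Encode a BER length in the short form, or the long form if needed."""
    if n < 128:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_element(ber_type, value):
    """Encode one element with a single-byte type."""
    return bytes([ber_type]) + encode_length(len(value)) + value


def encode_integer_value(n):
    """Encode an int as a minimal big-endian two's complement value."""
    bits = n.bit_length() if n >= 0 else (~n).bit_length()
    return n.to_bytes(bits // 8 + 1, "big", signed=True)


def encode_name_value_pair(name, value):
    """Encode NameValuePair; the choice is just the selected element."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        chosen = encode_element(0x01, b"\xff" if value else b"\x00")
    elif isinstance(value, int):
        chosen = encode_element(0x02, encode_integer_value(value))
    elif isinstance(value, str):
        chosen = encode_element(0x04, value.encode("utf-8"))
    else:
        raise TypeError("value must be a bool, an int, or a str")
    name_element = encode_element(0x04, name.encode("utf-8"))
    return encode_element(0x30, name_element + chosen)


# Matches the encoding shown above for the name "age" and the value 35.
assert encode_name_value_pair("age", 35).hex() == "30080403616765020123"
```

Notice that nothing in the output marks the second element as a choice. The decoder only sees a universal integer in the slot where a choice is allowed.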

This even works for most choice elements with custom element types. For example:

NameAndOctetStringValue ::= SEQUENCE {
     name      OCTET STRING,
     value     CHOICE {
          stringValue     [0] OCTET STRING,
          binaryValue     [1] OCTET STRING } }

In this case, if you have a name of “hello” and a string value of “there” then the encoded representation would be:

30 0e -- Begin a universal sequence with a total value size of 14 bytes
   04 05 68 65 6c 6c 6f -- The universal octet string hello
   80 05 74 68 65 72 65 -- The context-specific primitive zero octet string there
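
On the decoding side, the BER type of the second element is what tells you which alternative was chosen. Here is a minimal Python sketch for NameAndOctetStringValue, using the ASCII bytes for “hello” and “there”; the function names are hypothetical, and the sketch does not guard against truncated input.

```python
def read_element(data, pos=0):
    """Return (ber_type, value, next_pos) for the element starting at pos.

    Assumes a single-byte type; handles short and long length forms.
    """
    ber_type = data[pos]
    first = data[pos + 1]
    pos += 2
    if first < 0x80:
        length = first
    else:
        count = first & 0x7F
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    return ber_type, data[pos:pos + length], pos + length


# Context-specific primitive types for [0] and [1].
CHOICE_BY_TYPE = {0x80: "stringValue", 0x81: "binaryValue"}


def decode_name_and_octet_string_value(data):
    """Return (name, chosen alternative, raw value bytes)."""
    seq_type, seq_value, _ = read_element(data)
    if seq_type != 0x30:
        raise ValueError("expected a universal sequence")
    name_type, name, pos = read_element(seq_value)
    if name_type != 0x04:
        raise ValueError("expected a universal octet string for the name")
    choice_type, choice_value, _ = read_element(seq_value, pos)
    if choice_type not in CHOICE_BY_TYPE:
        raise ValueError("unrecognized choice type 0x%02x" % choice_type)
    return name.decode("utf-8"), CHOICE_BY_TYPE[choice_type], choice_value


encoded = bytes.fromhex("300e040568656c6c6f80057468657265")
assert decode_name_and_octet_string_value(encoded) == (
    "hello", "stringValue", b"there")
```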

However, there is a case in which this doesn’t work, and that is the case in which the choice element itself is defined with a custom BER type. For example:

NameAndOptionalValue ::= SEQUENCE {
     name      [0] OCTET STRING,
     value     [1] CHOICE {
          booleanValue     [2] BOOLEAN,
          integerValue     [3] INTEGER,
          stringValue      [4] OCTET STRING,
          binaryValue      [5] OCTET STRING } OPTIONAL }

In this case, the choice element is encoded as a constructed element, with a value that is the full encoding of the selected element inside that choice. For example, if you have a name of “state” and a string value of “Texas”, the encoding would be:

30 10 -- Begin a universal sequence with a total value size of 16 bytes
   80 05 73 74 61 74 65 -- The context-specific primitive zero octet string state
   A1 07 -- Begin a context-specific constructed one element with a value size of 7 bytes
      84 05 54 65 78 61 73 -- The context-specific primitive four octet string Texas
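
The wrapping step can be sketched in a few lines of Python. This example only handles the stringValue alternative of NameAndOptionalValue, and the function names are hypothetical.

```python
def encode_length(n):
    """Encode a BER length in the short form, or the long form if needed."""
    if n < 128:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_element(ber_type, value):
    """Encode one element with a single-byte type."""
    return bytes([ber_type]) + encode_length(len(value)) + value


def encode_name_and_optional_value(name, string_value=None):
    """Encode NameAndOptionalValue with an optional stringValue choice."""
    # name is [0] OCTET STRING: context-specific primitive 0x80
    value = encode_element(0x80, name.encode("utf-8"))
    if string_value is not None:
        # stringValue is [4] OCTET STRING: context-specific primitive 0x84
        inner = encode_element(0x84, string_value.encode("utf-8"))
        # The choice itself is [1], so the complete inner element becomes
        # the value of a context-specific constructed 0xA1 element.
        value += encode_element(0xA1, inner)
    return encode_element(0x30, value)


# Matches the encoding shown above for "state" and "Texas".
assert (encode_name_and_optional_value("state", "Texas").hex()
        == "301080057374617465a10784055465786173")

# With the optional choice omitted, only the name remains.
assert encode_name_and_optional_value("state").hex() == "300780057374617465"
```

The outer A1 header is what lets a decoder recognize the slot as the value element, and the inner 84 header is what identifies the selected alternative.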
