The split-ldif Command-Line Tool

Splits a single LDIF file into multiple sets by separating the entries below a specified base DN into mutually exclusive collections. Several algorithms are available to determine how entries should be split, and entries outside the split base DN may either be included in all sets or written to a dedicated LDIF file.
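The inside-versus-outside decision can be pictured with a minimal bash sketch. It assumes DNs can be compared after simple lowercasing (the tool presumably applies schema-aware normalization, given its --schemaPath argument) and that the split base DN entry itself counts as outside, since it is not below the base. The `route_entry` function and its `all`/`dedicated` mode argument are hypothetical stand-ins for the choice between --addEntriesOutsideSplitBaseDNToAllSets and a dedicated LDIF file; they are not split-ldif options.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not split-ldif code. Assumes DNs can be compared after
# lowercasing and that the split base DN entry itself is "outside" the split.

lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

# Usage: route_entry DN SPLIT_BASE_DN MODE   (MODE: all | dedicated)
route_entry() {
  local dn base
  dn=$(lower "$1"); base=$(lower "$2")
  if [ "${dn%,"$base"}" != "$dn" ]; then
    echo "split"             # below the base: goes to exactly one set
  elif [ "$3" = "all" ]; then
    echo "all-sets"          # outside the base: copied into every set
  else
    echo "dedicated-file"    # outside the base: written to a separate LDIF file
  fi
}

base="ou=People,dc=example,dc=com"
route_entry "uid=jdoe,$base" "$base" all                       # split
route_entry "dc=example,dc=com" "$base" all                    # all-sets
route_entry "ou=Groups,dc=example,dc=com" "$base" dedicated    # dedicated-file
```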

Usage

split-ldif {subCommand} {arguments}

This tool uses subcommands to indicate which function you want to perform.

Global Arguments

Exclusive Argument Sets

Examples

    split-ldif split-using-hash-on-rdn --sourceLDIF whole.ldif \
         --targetLDIFBasePath split.ldif \
         --splitBaseDN ou=People,dc=example,dc=com --numSets 4 \
         --schemaPath config/schema --addEntriesOutsideSplitBaseDNToAllSets
    split-ldif split-using-hash-on-attribute --sourceLDIF whole.ldif \
         --targetLDIFBasePath split.ldif \
         --splitBaseDN ou=People,dc=example,dc=com --attributeName uid \
         --numSets 4 --schemaPath config/schema \
         --addEntriesOutsideSplitBaseDNToAllSets
    split-ldif split-using-fewest-entries --sourceLDIF whole.ldif \
         --targetLDIFBasePath split.ldif \
         --splitBaseDN ou=People,dc=example,dc=com --numSets 4 \
         --schemaPath config/schema --addEntriesOutsideSplitBaseDNToAllSets
    split-ldif split-using-filter --sourceLDIF whole.ldif \
         --targetLDIFBasePath split.ldif \
         --splitBaseDN ou=People,dc=example,dc=com \
         --filter "(timeZone=Eastern)" --filter "(timeZone=Central)" \
         --filter "(timeZone=Mountain)" --filter "(timeZone=Pacific)" \
         --schemaPath config/schema --addEntriesOutsideSplitBaseDNToAllSets


Available Subcommands


Usage For Subcommand split-using-hash-on-rdn

Splits the data by computing a hash of the normalized representation of the DN component immediately below the split base DN and using a modulus operation to determine the set in which to place the entry. Because every entry in a branch shares that same DN component, this split algorithm does not require any caching to ensure that a subordinate entry is placed in the same set as its parent.
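A minimal bash sketch of this selection follows, under stated assumptions: lowercasing stands in for normalization, the CRC printed by `cksum` stands in for whatever hash split-ldif actually uses, and escaped commas in DNs are ignored. The helper names are hypothetical. The last two lines illustrate why no cache is needed: a subordinate entry shares its parent's DN component below the base, so both hash to the same set.

```shell
#!/usr/bin/env bash
# Sketch of hash-on-RDN set selection. NOT split-ldif's implementation:
# - normalization is approximated by lowercasing
# - cksum's CRC stands in for the tool's actual hash function
# - escaped commas inside RDN values are not handled

lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

# Print the RDN immediately below base DN $2 for entry DN $1;
# fail if the entry is not strictly below the base.
rdn_below_base() {
  local dn base rest
  dn=$(lower "$1"); base=$(lower "$2")
  rest=${dn%,"$base"}
  if [ "$rest" = "$dn" ]; then return 1; fi
  printf '%s\n' "${rest##*,}"        # component closest to the base DN
}

# Print a 1-based set number for entry DN $1, base DN $2, and $3 sets.
hash_rdn_set() {
  local rdn crc
  rdn=$(rdn_below_base "$1" "$2") || return 1
  crc=$(printf '%s' "$rdn" | cksum | cut -d' ' -f1)
  echo $(( crc % $3 + 1 ))           # modulus picks the set
}

base="ou=People,dc=example,dc=com"
hash_rdn_set "uid=jdoe,$base" "$base" 4
hash_rdn_set "cn=phone,uid=JDoe,$base" "$base" 4   # same set as its parent
```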

Arguments

Examples

    split-ldif split-using-hash-on-rdn \
         --sourceLDIF whole.ldif --targetLDIFBasePath split.ldif \
         --splitBaseDN ou=People,dc=example,dc=com --numSets 4 \
         --schemaPath config/schema --addEntriesOutsideSplitBaseDNToAllSets


Usage For Subcommand split-using-hash-on-attribute

Splits the data by computing a hash of the value(s) of a specified attribute in entries immediately below the split base DN and using a modulus operation to determine the set in which to place the entry. If an entry does not contain the target attribute, the set will be chosen based on a hash of the DN component immediately below the split base DN. Unless the --assumeFlatDIT argument is provided, a cache will be used to ensure that subordinate entries are added to the same set as their parents.
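The interplay between the attribute hash, the RDN fallback, and the parent cache might look like the following bash sketch (bash 4 or later, for associative arrays). It is an illustration, not the tool's code: it assumes entries arrive one at a time with parents before children, hashes a single attribute value, lowercases values and DNs in place of real normalization, and uses `cksum` as the hash. `hash_attr_set`, `BRANCH_SET`, and `CHOSEN_SET` are hypothetical names. The cache is the part that --assumeFlatDIT would let the tool skip, since a flat DIT has no subordinate entries.

```shell
#!/usr/bin/env bash
# Sketch of hash-on-attribute set selection with a parent cache (bash 4+).
# Assumptions, not split-ldif internals:
# - callers pass one entry at a time, parents before children (LDIF order)
# - only one attribute value is hashed; values and DNs are just lowercased
# - cksum's CRC stands in for the tool's real hash function
# - the cache is keyed by the RDN immediately below the split base DN

declare -A BRANCH_SET      # cache: RDN below the base -> set chosen for it
CHOSEN_SET=""              # result of the last hash_attr_set call

lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

hash_mod() {               # 1-based set number for string $1 across $2 sets
  local crc
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( crc % $2 + 1 ))
}

# Usage: hash_attr_set DN ATTR_VALUE BASE_DN NUM_SETS  (ATTR_VALUE may be "")
# Sets CHOSEN_SET instead of printing, so the cache survives (no subshell).
hash_attr_set() {
  local dn base rest rdn chosen
  dn=$(lower "$1"); base=$(lower "$3")
  rest=${dn%,"$base"}
  if [ "$rest" = "$dn" ]; then return 1; fi     # not below the split base DN
  rdn=${rest##*,}
  if [ "$rest" = "$rdn" ]; then
    # Entry immediately below the base: hash the attribute value if present,
    # otherwise fall back to the RDN, then remember the choice.
    if [ -n "$2" ]; then
      chosen=$(hash_mod "$(lower "$2")" "$4")
    else
      chosen=$(hash_mod "$rdn" "$4")
    fi
    BRANCH_SET[$rdn]=$chosen
  else
    # Subordinate entry: reuse its branch's set (parent must come first).
    chosen=${BRANCH_SET[$rdn]:-}
    if [ -z "$chosen" ]; then return 1; fi
  fi
  CHOSEN_SET=$chosen
}

base="ou=People,dc=example,dc=com"
hash_attr_set "uid=jdoe,$base" "jdoe" "$base" 4 && echo "uid=jdoe -> set $CHOSEN_SET"
hash_attr_set "cn=phone,uid=jdoe,$base" "" "$base" 4 && echo "child -> set $CHOSEN_SET"
```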

Arguments

Examples

    split-ldif split-using-hash-on-attribute \
         --sourceLDIF whole.ldif --targetLDIFBasePath split.ldif \
         --splitBaseDN ou=People,dc=example,dc=com --attributeName uid \
         --numSets 4 --schemaPath config/schema \
         --addEntriesOutsideSplitBaseDNToAllSets


Usage For Subcommand split-using-fewest-entries

Splits the data by selecting the set with the fewest entries. When processing data in a flat DIT, this essentially results in a round-robin distribution. When processing data in a DIT with branches of varying sizes, this can help ensure that the resulting LDIF files have roughly the same number of entries (although running the tool with multiple threads may make this less accurate). Unless the --assumeFlatDIT argument is provided, a cache will be used to ensure that subordinate entries are added to the same set as their parents.
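Here is a hedged bash sketch of fewest-entries placement (bash 4 or later). The tie-breaking rule (lowest-numbered set wins) is an assumption that happens to reproduce the round-robin behavior described above for a flat DIT; the names `init_sets`, `place_entry`, `COUNTS`, and `BRANCH_SET` are hypothetical, DNs are compared after lowercasing only, and threading is ignored.

```shell
#!/usr/bin/env bash
# Sketch of "fewest entries" set selection (bash 4+). Assumptions: ties go to
# the lowest-numbered set; DNs are compared after lowercasing; escaped commas
# are not handled. Not the actual split-ldif implementation.

declare -a COUNTS        # COUNTS[i] = entries already placed in set i (1-based)
declare -A BRANCH_SET    # cache: RDN below the base -> set of that branch
CHOSEN_SET=""

lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

init_sets() {            # reset all state for $1 empty sets
  local i
  COUNTS=(); BRANCH_SET=()
  for (( i = 1; i <= $1; i++ )); do COUNTS[i]=0; done
}

# Place entry DN $1 below split base DN $2; result goes to CHOSEN_SET.
place_entry() {
  local dn base rest rdn i best=1
  dn=$(lower "$1"); base=$(lower "$2")
  rest=${dn%,"$base"}
  if [ "$rest" = "$dn" ]; then return 1; fi   # outside the split base DN
  rdn=${rest##*,}
  if [ "$rest" = "$rdn" ]; then
    # Directly below the base: pick the set with the fewest entries so far.
    for i in "${!COUNTS[@]}"; do
      if (( COUNTS[i] < COUNTS[best] )); then best=$i; fi
    done
    BRANCH_SET[$rdn]=$best
  else
    # Deeper entry: follow its branch (the parent must have been placed first).
    best=${BRANCH_SET[$rdn]:-}
    if [ -z "$best" ]; then return 1; fi
  fi
  COUNTS[best]=$(( COUNTS[best] + 1 ))
  CHOSEN_SET=$best
}

base="ou=People,dc=example,dc=com"
init_sets 3
for u in a b c d; do   # sets 1, 2, 3, 1 (round-robin in a flat DIT)
  place_entry "uid=$u,$base" "$base"; echo "uid=$u -> set $CHOSEN_SET"
done
place_entry "cn=child,uid=b,$base" "$base"   # follows its parent into set 2
place_entry "uid=e,$base" "$base"; echo "uid=e -> set $CHOSEN_SET"   # set 3
```

Without the `cn=child` entry, `uid=e` would have gone to set 2; the subordinate entry made branch 2 larger, so the new branch is steered to set 3.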

Arguments

Examples

    split-ldif split-using-fewest-entries \
         --sourceLDIF whole.ldif --targetLDIFBasePath split.ldif \
         --splitBaseDN ou=People,dc=example,dc=com --numSets 4 \
         --schemaPath config/schema --addEntriesOutsideSplitBaseDNToAllSets


Usage For Subcommand split-using-filter

Splits the data by using search filters to select the appropriate set. The filters will be evaluated in the order in which they were provided on the command line, and the entry will be added to the first set for which a matching filter is found. If the entry does not match any of the provided filters, the appropriate set will be determined by computing a hash on the RDN. Unless the --assumeFlatDIT argument is provided, a cache will be used to ensure that subordinate entries are added to the same set as their parents.
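The first-match rule and the RDN-hash fallback can be sketched in bash as follows. This is a deliberately reduced model: it only understands equality filters of the form (attr=value), compares them case-insensitively against the `attr: value` lines of a single LDIF entry, assumes set N corresponds to the Nth --filter argument, uses `cksum` in place of the tool's real hash, and omits the subordinate-entry cache. `filter_set` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of "first matching filter wins", not split-ldif code.
# Assumptions: only equality filters like (attr=value) are understood, matched
# case-insensitively against "attr: value" lines of one LDIF entry; set N is
# the set for the Nth filter; the no-match fallback hashes the lowercased RDN
# with cksum; the subordinate-entry cache is left out.

lower() { printf '%s' "$1" | tr '[:upper:]' '[:lower:]'; }

# Usage: filter_set ENTRY_TEXT RDN FILTER...   (prints a 1-based set number)
filter_set() {
  local entry="$1" rdn="$2" f wanted line crc i=0
  shift 2
  if [ $# -eq 0 ]; then return 1; fi        # at least one filter is required
  for f in "$@"; do
    i=$(( i + 1 ))
    f=${f#"("}; f=${f%")"}                  # drop the outer parentheses
    wanted=$(lower "${f%%=*}: ${f#*=}")     # "(timeZone=Eastern)" -> "timezone: eastern"
    while IFS= read -r line; do
      if [ "$(lower "$line")" = "$wanted" ]; then
        echo "$i"; return 0                 # first matching filter decides
      fi
    done <<< "$entry"
  done
  # No filter matched: fall back to a hash of the RDN across all sets.
  crc=$(printf '%s' "$(lower "$rdn")" | cksum | cut -d' ' -f1)
  echo $(( crc % $# + 1 ))
}

entry=$'dn: uid=jdoe,ou=People,dc=example,dc=com\nuid: jdoe\ntimeZone: Central'
filter_set "$entry" "uid=jdoe" "(timeZone=Eastern)" "(timeZone=Central)" \
  "(timeZone=Mountain)" "(timeZone=Pacific)"    # prints 2
```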

Arguments

Examples

    split-ldif split-using-filter --sourceLDIF whole.ldif \
         --targetLDIFBasePath split.ldif \
         --splitBaseDN ou=People,dc=example,dc=com \
         --filter "(timeZone=Eastern)" --filter "(timeZone=Central)" \
         --filter "(timeZone=Mountain)" --filter "(timeZone=Pacific)" \
         --schemaPath config/schema --addEntriesOutsideSplitBaseDNToAllSets