NIH Architecture Review and Guidelines

Draft Architecture Review

Recommendations

for an

NIH Enterprise Directory Service

Final Draft

November 2, 1998

Draft Report with input from the NIH AMG Directory Team

Prepared by The Burton Group
Executive Summary

The National Institutes of Health (NIH) is developing architectural guidelines to leverage Directory Services standards, including the ITU X.500 specification and the IETF Lightweight Directory Access Protocol (LDAP). These guidelines are intended to provide enterprise-wide directory capabilities and lower the cost of managing this information. NIH has developed an enterprise directory schema and a design for deploying these services.

Directories are used to manage information about people and resources, or "objects". End-users and computer applications use the directory to locate features and characteristics about the object in question. A number of directories exist today, including host and network operating systems (NOS), messaging and database applications, and directories within the network itself. Organizations like NIH are striving to consolidate and standardize the directory information in order to increase its accuracy and effectiveness, to eliminate redundancy, and to reduce the costs of its administration.

Intensifying directory consolidation efforts are beginning to leverage meta-directory technology to provide an overall perspective on information about objects in an organization, as well as a flexible mechanism to consolidate, refine and distribute directory-related information to and from distributed repositories. Distributed repositories include human resources databases, NOS, and telephone and security directories. Several vendors have announced or are delivering meta-directory products; however, the market is just beginning to see viable solutions. In addition, organizational issues, ownership issues, data inconsistency and incompatibilities make the deployment of a meta-directory solution a significant effort, often taking a year or longer to fully deploy.

Through its directory database survey, NIH has identified nine major sources of directory information to consolidate. These include human resources, parking and security, NOS, telecommunications, and messaging repositories. The information and the technologies used to manage and maintain these sources vary widely, including relational databases as well as proprietary applications. These disparities indicate the need for meta-directory services that will simplify the overall information management effort by replacing some of the existing directory environments, and coexisting with others.

NIH has also made considerable progress in defining a logical and physical directory design. This design includes the use of "meta-directory" technology, or services that can consolidate and integrate directory information from multiple directory sources. Meta-directory services can provide new directory functionality, such as LDAP and web access to consolidated directory information, without disturbing business processes in certain "connected directories" such as the HRDB. NIH also plans to integrate a Unique ID (UID) Generator system with the meta-directory in order to track and identify all people using NIH facilities. Finally, NIH has defined a "schema" or data model that describes the information to be stored and presented in the directory.

This report comments on the work done to date, offering suggestions on some of the issues that remain to be solved. In particular, NIH's "Rich DIT" (Directory Information Tree) schema design, while appropriate to NIH needs for multiple views of directory information, represents some degree of risk in terms of the capability of the directory vendors to implement it, as well as the ability of applications to properly interpret it. In addition, the report notes additional work that must occur, such as the integration of the NIH Electronic Directory into the NIH business processes.

Due to the size and scope of this effort, it is critical for NIH to carefully examine the capabilities of the respective meta-directory providers in view of its prioritized requirements. As its next step, NIH must conduct a Proof of Concept of its design with selected vendors in order to ensure that the schema design is feasible and to identify any issues it might raise with other applications, such as security, flexibility, performance, or compatibility. This document identifies the issues that NIH should seek to resolve in its Proof of Concept effort. At the same time, The Burton Group is conducting a Meta-Directory Technology assessment that will recommend solutions to be used for Proof of Concept testing and pilot operation.

As NIH realizes its directory vision, it will need to employ additional standards and develop procedures to ensure compatibility and to ensure the integrity and long-term viability of the information. As the meta-directory becomes ubiquitous, mechanisms to arbitrate authoritative sources of information, management and directory access techniques will need to be defined and implemented. Those relating to directory applications are identified in the report as requirements, in addition to procedural guidelines and recommendations for the NIH organization.

NIH shares its goal of an enterprise directory with many other organizations. While there are many caveats to be considered, the technology is clearly capable of meeting the majority of NIH needs. A prudent course of action is to investigate the ability of products to satisfy requirements carefully in a pilot, and to prioritize the implementation in a manner that isolates and minimizes elements of risk.

The potential benefits of employing a meta-directory in a well-planned implementation are many. These include lower cost of administration, accuracy of information, and improved functionality of applications. Specific NIH initiatives which will benefit include the NIH Public Key Infrastructure, messaging and application development. The NIH has conducted its efforts to date in a manner that will ensure success, and recognizes the hurdles it must overcome to reach its goals. As plans to move forward become more concrete, NIH participants must organize and commit the resources needed to satisfy NIH directory project objectives.

Table of Contents

Executive Summary

1 Introduction

1.1 NIH Directory Project Background

1.2 Industry Background

2 NIH Current Situation Inventory

2.1.1 Feedback on Current Situation Inventory

3 Review of NIH Enterprise Directory Architecture

3.1 NIH Schema

3.1.1 Feedback on Schema

3.2 DIT Naming Recommendations

3.3 NIH DIT Structure

3.3.1 Feedback on Rich DIT Design

3.4 NIH Logical/Physical Design

3.4.1 Feedback on Logical/Physical Directory Design

4 Other NIH Directory Architecture Guidelines

4.1 Recommended NIH Directory Clients and Applications Guidelines

4.2 Recommended NIH Directory Server Guidelines

4.3 Recommended NIH Enterprise Directory Guidelines

4.3.1 Enterprise Directory Interfaces

4.3.2 Enterprise Directory Management and Administration

4.3.3 Enterprise Directory Security

1 Introduction
1.1 NIH Directory Project Background

The National Institutes of Health (NIH) is developing architectural guidelines to leverage Directory Services standards such as the ITU X.500 specification and the IETF Lightweight Directory Access Protocol (LDAP) RFCs. The intent is to provide enterprise-wide directory services and lower long-run costs by reducing the number of supported directory administration interfaces. NIH has developed guidelines for an enterprise directory schema, and a logical/physical design for deploying directory services.

NIH has retained The Burton Group to review its enterprise directory DIT design, schema, and logical/physical design.

1.2 Industry Background

Directories are databases used to locate objects on a network. Information in a directory can include user information (people), organizational information (relationships, or ownership), and system information. Broadly speaking, directories are used by directory enabled applications, directory enabled network infrastructure components, and directory enabled management applications.

Directory enabled applications potentially include any application on an enterprise network that has a need to obtain information about users, either for the purposes of profiling user preferences, or determining user authorization. A NIH directory service can support several different applications. The architecture of the directory will be driven by the long-term support of a variety of applications, phased in over time.

Directory enabled networks depend heavily on directories already in the network operating system (NOS) area, where directories are essential to login, authorization, and even file and print capabilities. In addition, the network routing infrastructure (routers, hubs, and switches) will soon come to depend on directories to contain per-user policy information governing per-user quality of service and security decisions for intelligent network and multimedia applications.

Today's directories are usually proprietary and tightly bound to some application component, such as a message store for email. There is little or no integration with directories of other applications. The closest approximation of an enterprise directory that most users have at present resides in their email system(s), but the most they can take for granted is that directory synchronization will be available between servers from the same vendor. There are also many third party directory synchronization tools that can synchronize across the directories of different email systems.

Directory synchronization, however, usually provides only a solution for the messaging application, leaving many other directory namespaces untouched. Also, directory synchronization almost always limits users to a distributed update model, where changes must be made in email directories, rather than centrally to an enterprise directory, for example, at the time of employee hiring.

Modern email systems and Intranet applications must be built on modern directories. Modern directories will consolidate information - such as user names, contact information, Certificates - that today is duplicated across many applications. With modern directories, common fields can be shared across applications. Advanced directories also give users more choices on how to manage directory information. Some attributes may be updated in a distributed manner by an Institute or Center (IC) administrator, or by end users themselves. Other attributes could be updated in a centralized manner, at either the enterprise or the IC level.

To allow increased integration and flexibility, modern directories will be based on Lightweight Directory Access Protocol (LDAP), DNS, X.500, NOS directory, and meta-directory capabilities. Each of these directory standards or technology categories has a great deal to offer, but none is sufficient (on its own) to resolve the whole directory problem.

LDAP is a replacement for the X.500 Directory Access Protocol (DAP). It is based on the X.500 naming and data models, but replaces X.500's native client/server access protocol with a lighter-weight, Internet-based alternative that is designed to be easier to implement and to use less memory on the PC client. Version 3 of LDAP has now been published by the IETF as a Request for Comments (RFC), adding referrals, optional strong authentication, and other functions. In the meantime, all major email and network operating system vendors are implementing LDAP access (primarily for read access) to their products.

Today, however, LDAP is only a client/server access protocol to diverse directories. It doesn't provide the directory-to-directory communication required to pull together information from different directory repositories in a manner that is seamless and transparent to the end user. Future versions of LDAP will add additional capabilities, such as replication, to remove some of LDAP's limitations. But this will take at least two years, perhaps longer, to emerge.

Nevertheless, after years of relative neglect, directories are becoming accepted as a key Intranet infrastructure component. Enterprises are moving forward towards consolidated directories, making the best possible use of the tools at hand. These tools consist of currently available LDAP add-ons to popular directories, NOS directories, and meta-directories.

NOS directories are provided with the Microsoft NT server and Novell NetWare operating systems. Because it serves as the repository for configuration, addressing, and access control information in the NOS environment, the NOS directory is critical for enabling NOS services, such as login, access control, file sharing, printing, as well as applications such as e-mail, ideally providing single signon and a single point of administration.

Some analysts, vendors, and users believe that the NOS will become the center of the enterprise directory universe as offerings such as Novell's NDS, NetWare 5.0, and Novell's Zero Effort Networks (ZENworks) mature - and especially when Microsoft finally rolls out its NT 5.0 operating system, which will include the LDAP-based Active Directory service. Already, many independent software vendors (ISVs) are writing applications that integrate directly into the NOS instead of creating new embedded directories of their own. Microsoft, Novell, and others will provide numerous incentives for ISVs to rely on NOS based directories.

However, there are various reasons why NOS directories won't scale to become the whole enterprise directory solution. Specifically, many enterprises use multiple NOSes. Even if an enterprise uses a single NOS, it may need to merge with, acquire, or just interact closely with other enterprise directories. In addition, the NOS directory may prove to be an inflexible alternative as an enterprise directory due to its dependency on the operating system itself and other reasons.

The Burton Group generally recommends that users take full advantage of NOS directory consolidation opportunities where they make sense, but deploy meta-directories as a higher-level integration strategy. Some directory enabled applications built to work with NetWare NDS or Microsoft ADS can be more easily deployed at the NOS level, others can be best deployed at the meta-directory level.

Meta-directories are tools and technologies enabling users to deploy a consolidated directory designed expressly to interwork with their existing email, NOS, human resources (HR) and other directories through sophisticated multi-master replication and object merging capabilities. Figure 1 shows how an enterprise can deploy a meta-directory as a higher-level solution while still leveraging NOS and other directories to their fullest. Note that a meta-directory strategy is consistent with the desire to support LDAP, X.500, or both.

Because the meta-directory supports both multi-directional entry synchronization and attribute level synchronization between directories, it offers great flexibility. For example, the authoritative source for the email address could be the email system; the network logon id could be provided by the NOS; a phone number by the HR department. These attributes arriving from the three authoritative sources could be automatically synchronized to the others in turn, increasing the wealth of information in all directory repositories without requiring manual effort. The HR department could also act as the authoritative source for entry existence, so that an employee leaving the company would automatically be deleted from all directory databases. In addition, the meta-directory could create new entries and attributes in some environments if centralized administration was desired. In essence, the meta-directory allows for bottom up, top down, or mixed models of administration with different authoritative sources for entry existence or attribute values.
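
The attribute-level model described above can be illustrated with a minimal sketch. It is not drawn from any vendor product: the attribute names, the source names ("email", "nos", "hr"), the "p1" join key, and the sample values are all hypothetical, and the sketch assumes that records from the connected directories have already been joined to the same person.

```python
from typing import Dict, Optional

Record = Dict[str, str]
# source name -> person key -> attributes held by that connected directory
Sources = Dict[str, Dict[str, Record]]

# Hypothetical policy: which connected directory is authoritative per attribute.
AUTHORITATIVE_SOURCE = {
    "mail": "email",            # email address comes from the email system
    "logonId": "nos",           # network logon id comes from the NOS
    "telephoneNumber": "hr",    # phone number comes from the HR department
}

# Hypothetical policy: HR is authoritative for whether an entry exists at all.
EXISTENCE_SOURCE = "hr"


def build_meta_entry(person_key: str, sources: Sources) -> Optional[Record]:
    """Merge one person's attributes, taking each value from its authoritative source."""
    if person_key not in sources.get(EXISTENCE_SOURCE, {}):
        # The person is gone from HR, so the merged entry should not exist either.
        return None
    entry: Record = {}
    for attribute, source_name in AUTHORITATIVE_SOURCE.items():
        record = sources.get(source_name, {}).get(person_key, {})
        if attribute in record:
            entry[attribute] = record[attribute]
    return entry


sample_sources: Sources = {
    # The email system also holds a phone number, but it is not authoritative for it.
    "email": {"p1": {"mail": "jdoe@example.org", "telephoneNumber": "555-0000"}},
    "nos": {"p1": {"logonId": "jdoe"}},
    "hr": {"p1": {"telephoneNumber": "555-0100"}},
}

merged = build_meta_entry("p1", sample_sources)
```

In this sketch the merged entry takes the phone number from HR even though the email system holds a different value, and removing the person from the HR source makes build_meta_entry return None, which corresponds to deleting the entry everywhere.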

Figure 1: The Enterprise Meta-directory

Meta-directory products (both released and announced) are available through vendors such as CDS, ISOCOR, Siemens-Nixdorf, and Zoomit. Other vendors offering X.500, directory synchronization, and some level of meta-directory capability through scripting environments should also be considered. Zoomit currently has the most advanced meta-directory product offering, but is still a small company that lacks the ability to deliver global 24x7 support. CDS is a larger company with some meta-directory tools and the ability to address directory problems through its systems integration services.

NIH should be aware, however, that although meta-directory technology is making good progress, it is not yet very mature. There will be update performance issues and some functionality issues dealing with "people data", and still more issues when the meta-directory is used to store computer resources information. Also, meta-directory implementations will require a significant degree of project coordination across ICs or enterprise support groups and customization of the vendor-provided software. They will in many cases require large-scale efforts to "clean up" the existing directory data. Meta-directories are sophisticated but complex solutions, and can take a year or longer to fully deploy.

More information on meta-directory product alternatives for NIH will be provided in The Burton Group's forthcoming "NIH Directory Technology Assessment" deliverable.

2 NIH Current Situation Inventory

The NIH AMG Directory Team has surveyed the following directory databases:

NIH Email Directory and Forwarding Service: A meta-directory and forwarding service for 23 NIH email systems using a freeware CSO server on Solaris. Contains over 28,700 entries, >700 duplicates. Can provide to NIH Directory PH alias, Preferred email address, Nickname list.

NIH Human Resources Database: Hosted on DB2/MVS, it contains current and historical HR data (from 1982) on all NIH Civil Service and PHS Commissioned Corps employees. Contains 17,000 current employees. Most data provided by DHHS/OPM, though ICs enter the timekeeper name. Can provide to NIH Directory First name, Last name, Middle Initial, SSN, Position code and free form text, Tenure code, Home Postal address, Timekeeper, SAC (Standard Administration Code), SAC/IC table. However, the SACs are believed to be outdated, incomplete, and inconsistent and might better be sourced from the NIH Scientific Directory. The NIH Directory could provide to HRDB the information that is currently obtained from NIH Telephone and Services Directory, such as buildingName, houseIdentifier, roomNumber, telephoneNumber, FacsimileTelephoneNumber, and Timekeeper information currently obtained from ICDs.

NIH Scientific Directory: Runs on Oracle on Digital UNIX (EOS). Contains consolidated, historical information about over 9,000 researchers who have worked in an NIH lab for more than 2 months, and NIH organizational information on about 4,000 organizational units. It is initialized with data from HRDB (scientific series), OFM, and JEFIC.

Parking/ID Badge/Transhare (PAID) DB: Uses dBASE IV on NT. Tracks permits issued to persons using NIH parking facilities (about 18,800), participants in the NIH Transhare program (about 2,000), and assignment of NIH ID badges to (non FDA) workers at MD sites (about 31,800). Can provide to NIH Directory the digitized photo, ID, NIH status (NIH, Guest, Contractor, Volunteer), and updates to work and home address and telephone number. The PAID system is to be redesigned as part of the directory project, and integration with a new card key system is under investigation. Note: there are additional ID badge systems at RTP/NC, Waltham/MA, RML/MT (not yet surveyed).

NIH Telecommunications DB: On-line information system for NIH switchboard operators and code blue service, and production of the NIH Telephone and Service Directory. Uses FoxPro on NT. Lists Permanent Federal employees, Temporary Federal employees (>1 year), Temporary Federal physicians (>6 months), and Other non-Federal employees. Includes SSN for Federal employees (60% of all "white pages" entries). Could be used to initially load NIH directory with telephone, FAX, building/room/MSC. NIH Directory should replace form NIH 433, and NIH should consider direct update by users via the Web. Performing other "ad hoc" updates to directory of on-call personnel ("gray pages") via Web is also highly desirable.

NCI/NHLBI NDS: Directory supporting Novell NOS. Contains about 4,000 NCI users and 1,200 NHLBI users. Can provide: Name, Title, Organization, Work phone, Work FAX, Work location. Information in NDS is entered/updated by system administrators. Unlike PAID, Telecom, Scientific Directory and others whose administrators wish to update attributes in the meta-directory, NDS administrators wish to update these attributes in NDS. Conflicts could therefore occur, so it would be helpful if the meta-directory supports multi-mastered attributes.

NIH Data Warehouse: Read-only, "transformed and cleaned" data from HRDB. Contains job description data, and there is risk of overlap with Scientific Directory work. Could be used to support ODBC/SQL access to NIH Enterprise Directory data after taking a daily dump of the data, as well as possibly maintaining historical views of directory information.

CIT NT & Exchange: There are 171 NT domains at NIH, about 55 of which are linked in an NIH-wide, centrally administered NT infrastructure with established trust relationships. There are approximately 8,000 entries containing logon id, First/Last Name, and BUILDING/ROOM WORK-TELEPHONE IC-BRANCH in the description field. NIH could integrate this information with the meta-directory and add UID to the description field. The NIH MS Exchange email system provides directory lookup and delivery service for users and contains approximately 15,000 entries. It is connected to the NIH Email Directory and Forwarding Service, and contains attributes for non-Exchange entries in its Global Address List (GAL).

Fogarty DB: Tracks information on visiting scientists from foreign countries. It consists of TSO and custom COBOL/ASM applications on MVS with read-only access via DB2 and SILK. It contains 17,000 individuals, 2,200 active, and about 300 matches to records in HRDB. Fields include name, date and place of birth, home address, and sex. It does not contain the passport for the foreign scientists. Data entry is done by JEFIC immigration specialists.

2.1.1 Feedback on Current Situation Inventory

For those directory databases that have been surveyed so far, the level of data collected has been very good, and even more impressive has been the widespread willingness on the part of most directory stakeholders to change the status quo. Note that while a meta-directory strategy is designed to foster coexistence and flexibility ("live and let live") the organization as a whole can benefit from replacing at least some of the outdated databases and procedures with the meta-directory, provided stakeholders buy into the idea. The AMG Directory Team seems to have considerable expertise, and to have the benefit of both upper management support and broad lateral support from directory stakeholders.

The Team should be aware, however, that verbal assurances of lateral support from directory database stakeholders can fade when the crunch comes and lateral resources are required from other projects to change connected directories and procedures (for example, to install a UID in an existing database). Good program management is needed to ensure that all stakeholders understand their roles, deadlines, resource requirements, and accountability.

In addition to the above caution, it is also important to note that an approved list of prioritized requirements (high, medium, and low) developed by the team should be in the inventory. Also, the directory strategy should be coordinated with:

Planning to consolidate the number of diverse email environments at NIH in order to reduce email maintenance costs (23 email systems is too many for an organization of 25,000)

A public key infrastructure (PKI) strategy to promote the use of smartcards and Certificates with secure messaging, workflow, extranets, and electronic commerce

An application development strategy appropriate for NIH's heterogeneous environment, possibly involving use of Java, DCOM, and CORBA/IIOP

Once constructed, the directory will help enable messaging, PKI, and application development initiatives. It would be best, however, if the collateral strategies were more clearly defined at this stage in order to fine tune the directory schema, maintenance, and product selection specifications.

3 Review of NIH Enterprise Directory Architecture

The AMG proposed architecture includes a proposed DIT structure design, schema definitions for NIH organizational person and other objects, and a logical/physical design of the enterprise directory system. This section describes each component of the AMG proposed architecture insofar as it has been defined, and then provides our feedback and recommendations.

3.1 NIH Schema

NIH plans to focus initially on person, organization, role and group objects.

The nihInetOrgPerson is used to represent persons using NIH facilities, including employees, contractors, visiting scientists, etc. Instances of nihInetOrgPerson are defined to contain the following attributes, which are drawn from the LDAPv3 schema RFC 2256, the Lightweight Internet Person Schema (LIPS), Internet Drafts for the LDAPv3 pilot schema, and the earlier RFC 1274. NIH-defined attributes are denoted with the "nih" prefix. The attributes currently defined for the NihInetOrgPerson are shown in the box below. For a detailed description of these contents, see the "NIH Person Schema v4" document.

NihInetOrgPerson:

cn, sn, personalTitle, givenName, middleName, generationQualifier, nihSuffixQualifier, initial, uniqueIdentifier, o, ou, title, organizationalStatus, businessCategory, mail, nihUniqueMail, userCertificate, labeledURI, description, homePhone, homeFax, personalMobile, personalPager, homePostalAddress, nihHomeMail, thumbnailPhoto, jpegPhoto, postalAddress, postalCode, nihDeliveryAddress, l, st, c, street, roomNumber, buildingName, nihPhysicalAddress, dhhsMailstop, telephoneNumber, facsimileTelephoneNumber, mobileTelephoneNumber, pager, secretary, manager, nihSac, userPassword, createTimestamp, modifyTimestamp, creatorsName, modifiersName

The nihOrganizationalRole will be used to represent organizational functions, such as secretary, that are independent of the person filling the role. This object class is based on the organizationalRole defined in X.500, and contains additional attributes defined for use by NIH. To use the role function, for example, a person could send email to the role, or look up the roleOccupant and send email to his/her address. Roles get the context from their placement in the NIH Directory Information Tree (DIT) organizational hierarchy. The role can also be used for NIH "green pages" construction. Searching for objectclass=nihOrganizationalRole at one particular organizational unit level can return all roles. The role also allows one person to have multiple representations of their contact attributes (e.g., phone #, address, etc.).

The attributes currently defined for the nihOrganizationalRole are shown in the box below. For a detailed description of these contents, see the "NIH Organizational Role Schema v1" document.

NihOrganizationalRole: cn, description, ou, nihSac, roleOccupant, seeAlso, l, st, street, roomNumber, buildingName, nihDeliveryAddress, postalAddress, postalCode, dhhsMailstop, telephoneNumber, facsimileTelephoneNumber, mobileTelephoneNumber, pager, userPassword, userCertificate, createTimestamp, modifyTimestamp, creatorsName, modifiersName
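
The role lookup described above (searching for objectclass=nihOrganizationalRole at one organizational unit level) can be emulated with a small in-memory sketch. It does not call a real LDAP server or client library; the DNs, organization names, and role names are hypothetical, and DN handling is deliberately naive (exact string comparison, no escaped commas).

```python
from typing import Dict, List

Entry = Dict[str, List[str]]

# Hypothetical directory contents keyed by DN; names are illustrative only.
ENTRIES: Dict[str, Entry] = {
    "ou=Lab A,o=NIH": {"objectClass": ["nihOrganizationalUnit"]},
    "cn=Secretary,ou=Lab A,o=NIH": {
        "objectClass": ["nihOrganizationalRole"],
        "roleOccupant": ["uid=1234567,o=NIH"],
    },
    "cn=Timekeeper,ou=Lab A,o=NIH": {
        "objectClass": ["nihOrganizationalRole"],
        "roleOccupant": ["uid=7654321,o=NIH"],
    },
    "cn=Secretary,ou=Lab B,o=NIH": {
        "objectClass": ["nihOrganizationalRole"],
        "roleOccupant": ["uid=1111111,o=NIH"],
    },
}


def parent_dn(dn: str) -> str:
    """Return the DN of the immediate superior (naive: assumes no escaped commas)."""
    return dn.split(",", 1)[1] if "," in dn else ""


def roles_one_level(base_dn: str) -> List[str]:
    """Emulate a one-level search under base_dn for nihOrganizationalRole entries."""
    return sorted(
        dn
        for dn, attrs in ENTRIES.items()
        if parent_dn(dn) == base_dn
        and "nihOrganizationalRole" in attrs.get("objectClass", [])
    )


lab_a_roles = roles_one_level("ou=Lab A,o=NIH")
```

Here lab_a_roles contains only the two roles placed directly under the hypothetical "Lab A" unit, which is how a role takes its context from its position in the DIT. A client could then read roleOccupant from a returned entry to find the person currently filling the role.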

The NIH AMG Directory Team has also defined an nihOrganizationalUnit object class. This class is intended to represent actual NIH organizations, enabling users to identify attributes about the organization, such as its hours of operation, fax number, or contact person.

The attributes currently defined for the nihOrganizationalUnit are shown in the box below. For a detailed description of these contents, see the "NIH Organizational [Unit] Schema v2" document.

NihOrganizationalUnit: o, ou, nihSac, nihVisible, description, businessCategory, seeAlso, secretary, manager, keyWord, programList, hoursOfOperation, primaryPOC, secondaryPOC, organizationalChart, informationRequestForm, postalAddress, postalCode, l, st, street, nihDeliveryAddress, nihPhysicalAddress, telephoneNumber, facsimileTelephoneNumber, mobileTelephoneNumber, pager, dhhsMailstop, roomNumber, rfc822mailbox

3.1.1 Feedback on Schema

NIH should define the data definitions for people, roles, and other objects of interest on paper using new object class definitions, such as nihInetOrgPerson, and nihOrganizationalRole. These object class definitions should continue to be developed on paper in an abstract way, and then input into a product's schema configuration modules in a manner that minimizes changes to the off the shelf schema provided by the product. Testing should be done to determine the best way to implement new classes, either by defining nihInetOrgPerson as an auxiliary class or using "local extensions" features of products (such as "zcPerson" in Zoomit VIA).

The schema description should indicate which attributes are NIH-defined, and which are standards-defined. For the most part this has been done through the use of the "nih" prefix on attribute names. The schema should identify the authoritative source, and planned implementation phasing, behavior, and access control rules for each attribute as requirements become more fully defined.
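
One way to record this per-attribute information is sketched below. The field names, the phase numbers, the source labels, and the access groups are placeholders chosen for illustration and are not taken from the NIH schema documents; the only rule borrowed from the schema is that NIH-defined attributes carry the "nih" prefix.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class AttributeSpec:
    """Planning metadata for one schema attribute (all fields are illustrative)."""
    name: str
    nih_defined: bool              # True when the name carries the "nih" prefix
    authoritative_source: str      # connected directory that owns the value
    phase: int                     # planned implementation phase
    writable_by: Tuple[str, ...]   # groups allowed to update the value


def make_spec(name: str, authoritative_source: str, phase: int,
              writable_by: Iterable[str]) -> AttributeSpec:
    """Build a spec, deriving the NIH-defined flag from the naming convention."""
    return AttributeSpec(
        name=name,
        nih_defined=name.startswith("nih"),
        authoritative_source=authoritative_source,
        phase=phase,
        writable_by=tuple(writable_by),
    )


# Placeholder assignments; the real values would come from the AMG's requirements.
SPECS = [
    make_spec("telephoneNumber", "Telecommunications DB", 1, ["telecom-admins"]),
    make_spec("nihSac", "Scientific Directory", 2, ["directory-admins"]),
]

nih_defined_names = [spec.name for spec in SPECS if spec.nih_defined]
```

Keeping this information in one table per object class would let the team review authoritative sources and access rules alongside the attribute list as requirements firm up.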

Slight inconsistencies in the schema should be resolved. For example, the nihOrganizationalUnit uses the "rfc822mailbox" attribute for the email address, while the nihInetOrgPerson uses "mail". The "rfc822mailbox" name is used in the COSINE and Government Electronic Directory schemas; "mail" is used in the LIPS schema and is recognized as the mail attribute of choice by many clients (e.g., Outlook, MSIE, Netscape). The AMG should use the "mail" attribute name in preference to "rfc822mailbox".

Also, the AMG should determine whether the nihUniqueMail attribute (holding the unique email alias assigned by the CSO-based NIH Directory and Email Forwarding Service) should be used in addition to the rfc822mailbox or mail attributes, or whether multiple values of the standard "mail" or "rfc822mailbox" attributes should be used.
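
If the multi-valued "mail" approach were chosen, entries loaded from connected directories could be normalized along the following lines. This is a sketch under stated assumptions: entries are modeled as plain dictionaries rather than objects from any LDAP library, attribute names are compared case-insensitively, and address values are compared as exact strings.

```python
from typing import Dict, List

Entry = Dict[str, List[str]]

# Names treated as equivalent to "mail" (compared in lower case).
MAIL_NAMES = ("mail", "rfc822mailbox")


def normalize_mail(entry: Entry) -> Entry:
    """Return a copy whose email addresses all live in one multi-valued "mail" attribute."""
    result: Entry = {}
    mail_values: List[str] = []
    for name, values in entry.items():
        if name.lower() in MAIL_NAMES:
            for value in values:
                if value not in mail_values:   # drop exact duplicates, keep order
                    mail_values.append(value)
        else:
            result[name] = list(values)
    if mail_values:
        result["mail"] = mail_values
    return result


normalized = normalize_mail({
    "ou": ["Example Unit"],
    "rfc822mailbox": ["unit@example.org"],
    "mail": ["unit@example.org", "unit-alias@example.org"],
})
```

The input entry is left unchanged, and an entry with no email attribute at all comes back without a "mail" key.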

NIH may wish to make provisions for additional person attributes, such as badge number, and additional role attributes, such as Mail.

3.2 DIT Naming Recommendations

The Relative Distinguished Name (RDN) is the criterion for ensuring that a directory entry at any level of the directory tree is unique among all entries under the same superior (parent) entry. The RDN is not necessarily used in directory search requests or displayed by browsers, but it may be used by clients for correlation purposes, and it is used by the server to maintain uniqueness of entries.

There are four options for RDN structure. Option 2, storing the UID value in the UID attribute, is recommended.

1) CN: Use the person's common name ("Keith Gorlen")

Description: The CN is the RDN

Advantages

Most compatible with clients

CN is what users want displayed

No or few changes required to connected directories

Disadvantages

Multiple hits occur on searches (while users can pick the right entry from a list most of the time, applications cannot)

Does not work in a flat DIT structure, or CNs must contain numbers, etc.

Change of CN changes RDN, DN, and referential attributes, Certificate revocation may be required

Difficult to join

2) UID=1234567: Use a person's Unique ID (UID) in the "uniqueID" or "uid" attribute of the directory.

Description: The UID is the RDN. CN still contains the user's name and is usually used for display. (Browsers ignore the RDN during user-driven search or display, but use it for correlation when building offline address books, groups, etc.)

Advantages

Unambiguous

Allows use of either flat or deep structures

UID RDN is increasingly becoming a common approach

Never changes

Disadvantages

A browser may display the DN as the value of an attribute such as RoleOccupant or group Member, which will not be very meaningful to the user.

Display of "meaningless" UIDs as part of the names shown in Group objects may make the management of the directory more difficult.

3) Use both a UID and a common name together to build the RDN

Description: Both CN and UID are used in the RDN.

Advantages

Only useful to combine these two if it increases probability that a client will display CN and UID (to overcome the disadvantage of Option 2)

Disadvantages

Multi-valued RDN support in products and future LDAP standards is questionable

Change of CN changes RDN, DN, and referential attributes, possible certificate revocation

4) CN=123-456789: Use the UID as the value of the common name.

Description:

UID is the value of the CN, CN is the RDN.

Advantages

None known. It is believed that most implementations support UID.

Disadvantages

Many off the shelf clients will display UID value rather than the user's name.

It is recommended that NIH use a person's Unique ID (UID) in the "uniqueID" or "uid" attribute of the directory. Thus, a typical Option 2 RDN might look like "UID=123-456789".
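The stability argument for Option 2 can be illustrated with a minimal Python sketch. It assumes the flat "O=NIH, C=US" base used elsewhere in this report; the function names and the sample person are hypothetical, and the escaping follows the general rules for the string form of a DN rather than any particular product.

```python
def escape_rdn_value(value: str) -> str:
    """Escape characters that are special in the string form of a DN."""
    special = ',+"\\<>;'
    out = []
    last = len(value) - 1
    for i, ch in enumerate(value):
        needs_escape = (
            ch in special
            or (i == 0 and ch in " #")   # leading space or '#'
            or (i == last and ch == " ")  # trailing space
        )
        out.append("\\" + ch if needs_escape else ch)
    return "".join(out)


def person_dn(uid: str, base: str = "O=NIH, C=US") -> str:
    """Option 2: the UID attribute alone forms the RDN."""
    return f"UID={escape_rdn_value(uid)}, {base}"


# Hypothetical entry: the display name changes, the DN does not.
entry = {"uid": "123-456789", "cn": "Jane Doe"}
dn_before = person_dn(entry["uid"])
entry["cn"] = "Jane Smith"
dn_after = person_dn(entry["uid"])
```

Because the DN is derived only from the UID, references such as certificates and group memberships that point at the DN are unaffected by the name change.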

3.3 NIH DIT Structure

NIH has a large population of "organizationally mobile" People in its directory, including temporary staff such as Visiting Scientists, Contractors, and Exchange Students in addition to employees. NIH believes that a high degree of stability in directory Distinguished Names (DNs) is necessary in order to maintain referential integrity, and to avoid frequent revocation of Certificates due to DN changes.

At the same time, NIH has rich organizational data available in its HR systems. NIH wishes to use this rich organizational data to search for and organize People and Role objects.

For these reasons, NIH proposes to use a hybrid flat+deep DIT as shown in Figure 2. In this design, which we refer to as the "Rich DIT" structure, the master copy of user entries would be held in a flat namespace. The typical person would have a distinguished name such as UID=123456789, O=NIH, C=US. The figure below diagrams the proposed NIH DIT structure.

Figure 2: Rich DIT Design

3.3.1 Feedback on Rich DIT Design

We concur that the C=US, O=NIH approach for the top level of the NIH DIT is appropriate given that NIH is essentially located entirely in the US.

We also concur, with reservations, that the Rich DIT Design approach may be appropriate for the inner structure of the NIH DIT. However, while we understand the benefits of the Rich DIT approach, we note that it is not yet a "mainstream" solution supported off the shelf by most products. Today, LDAP/X.500 and meta-directory products are limited to supporting the original hierarchical definition of X.500 and, in their off the shelf configuration, generally limit users to choosing a single hierarchical view of directory information. It is not easy to view an entry in terms of its real world involvement in multiple hierarchies and relationships, such as geographic and organizational. Creating a Rich DIT can be done (using intelligent applications, multiple copies of entries, or aliases), but this approach has its costs.

Because of these concerns we conducted an RFI against a number of directory vendors, including CDS, IBM/Telstra, ISOCOR, OpenDirectory, Siemens, and Zoomit. The response from these vendors suggests that it may be practical to build the Rich DIT, but at the same time confirms some of our concerns.

We begin our feedback by noting that the Rich DIT design is one of four basic approaches that we have identified for the inner structure of an enterprise DIT. These approaches are:

1) Organizational DIT: Use organizational unit structures (such as ICs and departments within ICs) under O=NIH

2) Geographical DIT: Use locality or organizational unit structures containing site or building names under O=NIH.

3) Flat DIT Only: People entries are stored in a flat list under "OU=People" under "O=NIH"

4) Rich DIT: This is the hybrid flat+deep tree with multiple views, possibly both organizational and geographical. A flat database would contain the "master" names for people. Certificates, roles and other references point into the flat namespace. From the flat namespace or other sources, deep views are generated.

The advantage of the Rich DIT is that it may provide the best of all possible worlds for both searching and browsing. Users can essentially view the directory data through many prisms -- flat, organizational, geographical, or others. Administrators may also potentially be able to use deep views to define access controls for those attributes that are updated from the enterprise directory rather than from a connected directory. Another advantage of the Rich DIT is that it might enable NIH to go into the pilot phase with inaccurate or incomplete organizational information, use flat people data, and "play around" with the best organizational information available until the Rich DIT is ready for production rollout.

Among the disadvantages of the Rich DIT are that NIH may encounter referential problems between the flat and deep views. If aliases to the flat DIT are used, they must be kept in synchronization, that is, when an entry is deleted or added, the alias must also be simultaneously deleted or added. Likewise, if copies are used, all the attributes in the copy of the person in the deep DIT must be added, deleted, or changed simultaneously in the flat view. Maintaining this referential integrity may degrade performance if a great many updates must be processed during a reorganization (or reload) of the directory. In addition, NIH may face difficulty defining access controls for updates that are done to the enterprise directory. This is because X.500 and X.500-like products typically depend on access controls defined at the container, or naming context level and do not provide means of easily customizing access controls in a flat namespace. In conclusion, the combined complexity of maintaining referential integrity, assuring adequate performance, and managing access controls will result in increased development costs and complexity for NIH over deploying a more traditional, if simpler and less functional, structure.
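The synchronization burden described above can be made concrete with a small Python sketch of an integrity check between the two views. It assumes the alias approach, with each alias in the deep view recorded as a mapping from the alias DN to the flat DN it points at; the function name and the sample DNs are hypothetical.

```python
def find_alias_drift(flat_dns, aliases):
    """Compare the flat master namespace with the aliases in the deep view.

    flat_dns: iterable of DNs of person entries in the flat namespace.
    aliases:  dict mapping alias DN (deep view) -> target DN (flat view).
    Returns (dangling, unaliased):
      dangling  - aliases whose target no longer exists in the flat view
      unaliased - flat entries that have no alias in the deep view
    """
    flat = set(flat_dns)
    dangling = sorted(a for a, target in aliases.items() if target not in flat)
    unaliased = sorted(flat - set(aliases.values()))
    return dangling, unaliased


flat_view = {"UID=1001, O=NIH, C=US", "UID=1002, O=NIH, C=US"}
deep_aliases = {
    "UID=1001, OU=CFB, OU=CIT, O=NIH, C=US": "UID=1001, O=NIH, C=US",
    # Person 1003 was deleted from the flat view but the alias was left behind.
    "UID=1003, OU=CFB, OU=CIT, O=NIH, C=US": "UID=1003, O=NIH, C=US",
}
dangling, unaliased = find_alias_drift(flat_view, deep_aliases)
```

A check of this kind would have to run, or be enforced transactionally, on every add and delete, which is the source of the performance concern during a reorganization or reload.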

The Burton Group therefore recommended that NIH conduct a Request for Information (RFI) with major vendors and a Proof of Concept for the Rich DIT prior to fully committing to this approach. Vendor RFI responses can be summarized as follows:

CDS: Can provide a Rich DIT through a customized solution using its Global User Agent. Defers most issues to a consulting process. Providing granular access control in a Rich DIT would be costly.

IBM and Telstra: Does not recommend a Rich DIT structure. Could do it and guarantee referential integrity, but not granular access control.

ISOCOR: Does not recommend the Rich DIT, but could do it once their Meta-Connect product is available.

OpenDirectory: Can provide the solution using a single DSA, expressed some performance concerns.

Netscape: Does not recommend Rich DIT. Recommends flat structure with OU attributes in entries. Can provide granular access control in the flat DIT.

Siemens/Nixdorf: Has not done Rich DITs in the past, but could try. Might have performance issues.

Zoomit: Could support the Rich DIT. Recommends storing master data in the deep part of the Rich DIT.

Based on the RFI responses to the Rich DIT, The Burton Group remains concerned about the access control and performance implications of the Rich DIT. We have thought of two alternative "Rich DIT simulation" approaches below (rich attributes or saved searches) to address performance concerns.

On access control, first note that there are four possible sources of updates to the enterprise directory: 1) connected directory initiated (indirect), 2) end user initiated, 3) central support group initiated, and 4) IC (or other organizational grouping) initiated. In the case of IC-initiated updates, it will be difficult to define granular access controls against the flat namespace which is part of the Rich DIT or Flat DIT options. Of all products, only Netscape provides access control features dealing with contingency. Thus, with the Rich DIT or the Flat DIT, NIH faces the possible requirement to develop custom enterprise directory update software. This is not necessarily a bad thing, however, since it may in any case be desirable to embed directory-related business processes into the custom update logic.

The Burton Group continues to recommend the Proof of Concept activity, and also recommends that four (4) approaches to the Rich DIT be evaluated by the Proof of Concept technical team. These approaches are described as follows:

Rich DIT using Copies: One of the Rich DIT areas (Deep or Flat) contains copies of the People entries in the other, which is the "master." Maintenance issues ensue, and search bases must be separated so that clients don't get duplicate results on simple searches. However, the respective Rich DIT areas are navigable by off the shelf clients supporting "walk the tree" style user interfaces.

Rich DIT using Aliases: The Deep DIT contains alias pointers to the Flat DIT people entries. Maintenance issues ensue, and there are concerns about diminishing industry support for aliases.

Saved Search: The Deep DIT is populated with Organizational Unit entries, each containing an LDAP URL attribute that, when selected (clicked) by the user, causes a search into the Flat DIT for People entries containing all the OU attribute values in the current OU's Distinguished name path. This dynamic approach minimizes maintenance, and performance impacts are likely minor. It could be implemented as web server-based logic triggered by browser based exploration of the Deep DIT. However, off the shelf LDAP clients attempting "walk the tree" style user interface access to the Deep DIT would not automatically invoke the saved searches or display them appropriately for the user.

Rich Attributes: People entries in the Flat DIT would contain a textual nihOrganizationalPath attribute value comprised of all ordered, slash-delimited organizational unit values representing the organizations to which the person belongs. This approach minimizes maintenance and performance impacts. A "walk the tree" function using the rich attributes could be implemented as web server-based logic triggered by browser based exploration of the Flat DIT. However, off the shelf LDAP clients attempting "walk the tree" style user interface access to the Flat DIT would not use the rich attributes.
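The two "Rich DIT simulation" approaches (Saved Search and Rich Attributes) can be sketched in Python as follows. This is illustrative only: the host name, the requested attributes, the use of nihInetOrgPerson as an object class in the filter, and all function names are assumptions, and the naive comma split does not handle escaped commas or special characters in OU values.

```python
from urllib.parse import quote


def ou_values_from_dn(ou_dn: str) -> list:
    """Return the OU values in an OU entry's DN, most specific first."""
    values = []
    for rdn in ou_dn.split(","):
        attr, _, value = rdn.strip().partition("=")
        if attr.upper() == "OU":
            values.append(value)
    return values


def saved_search_url(host: str, ou_dn: str, flat_base: str = "O=NIH,C=US") -> str:
    """Saved Search: LDAP URL that finds flat People entries carrying
    every OU value on the current OU's DN path."""
    clauses = "".join(f"(ou={v})" for v in ou_values_from_dn(ou_dn))
    ldap_filter = f"(&(objectClass=nihInetOrgPerson){clauses})"
    safe = "=,()"  # left unencoded for readability
    return (f"ldap://{host}/{quote(flat_base, safe=safe)}"
            f"?cn,mail?one?{quote(ldap_filter, safe=safe)}")


def org_path(ous_top_down) -> str:
    """Rich Attributes: ordered, slash-delimited organizational path."""
    return "/".join(ous_top_down)


def in_subtree(person_path: str, node_path: str) -> bool:
    """True if the person belongs to the node or to any unit beneath it."""
    return person_path == node_path or person_path.startswith(node_path + "/")


url = saved_search_url("ldap.example.org", "OU=CFB,OU=CIT,O=NIH,C=US")
path = org_path(["CIT", "CFB"])
```

In both cases the "walk the tree" behavior lives in application logic (for example, on a web server) rather than in the directory itself, which is why off the shelf LDAP clients would not benefit from it.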

Among the questions to be answered in the Proof of Concept are:

Will SACs be sufficient to construct a deep organizational tree?

Will SACs in HRDB records be sufficient to populate entries in deep organizational tree?

What might be the best format for "ou" attributes in nihOrgPerson schema (e.g., OU=CIT/CFB vs. OU=CIT,OU=CFB)?

Is it better to use aliases, copies, rich attributes, or saved searches to provide the Rich DIT?

How should abbreviations of long, unwieldy organizational unit names be implemented?

What RDN issues are involved with COTS products? Should UID=123-456789 be used as the RDN format for the nihInetOrgPerson?

Can we update the DIT structure after it has been constructed; i.e., add/delete/move people and organizational units?

Is the performance of such updates acceptable?

Does the DIT provide sufficient security and access control flexibility?

Do COTS clients work correctly and conveniently with this DIT structure, especially aliases and the RDN structure?

Do searches of the flat people directory perform acceptably well under load?

Do searches of the deep organizational view perform acceptably well under load?

What procedures or search base setting should be used to ensure that clients don't see two copies of the information (with or without aliases)? How are these search base settings distributed?

Can NIH replicate both the flat and deep view and what is the performance of such replication?

How much performance improvement results from replication?

Is the performance of replica synchronization acceptable?

It is important to amplify several of these questions to ensure that testing is sufficient to really prove the Rich DIT concept. The key areas that MUST be stressed during the proof of concept are interoperability and performance.

Interoperability must be stressed, particularly if aliases are used, by testing the Windows LDAP client, IE LDAP client, Outlook Express and Outlook 98 LDAP clients, Netscape Navigator LDAP client, and Notes LDAP access against the Rich DIT once a prototype has been constructed. It is also important to test COTS server products (e.g. Netscape) against the rich DIT.

There is some doubt about the performance capabilities of Zoomit or other meta-directory products that perform batched merge-and-compare operations against connected directories. NIH should classify the nihInetOrgPerson attributes according to the update quality required, and stress test the meta-directory during proof of concept by generating heavy streams of updates against the meta-directory once it 1) includes links to several connected directories, 2) contains the management agents to construct the Rich DIT, and 3) replicates the Rich DIT content to three (3) or more servers. Benchmarking tools should be developed to ensure the solution is robust enough to stand up to worst-case production scenarios. The Burton Group is familiar with specialized resources that can assist with this type of benchmarking task if needed.

3.4 NIH Logical/Physical Design

NIH plans an enterprise directory logical/physical design incorporating the following elements:

LDAP query and web browser client support

Meta-Directory services integration with connected directories operated at the enterprise or IC level. Users will continue to access the connected directories for local application needs. Administrators may in some cases continue to update the connected directories.

Directory updates via the Web, performed by either end users (for certain attributes) or administrators

Relational DBMS support for applications using SQL or ODBC; a "minimum schema" image of the directory will be maintained in relational table format.

Rich DIT structure: An organizationally structured DIT will be created using the SAC and organization affiliation table. Then a flat person namespace will be created using the person entries from the HRDB tables. Aliases or copies of the person data will be created in the organizational DIT for each person entry in the flat person namespace under each person's respective organization as indicated in their HRDB entry via the SAC. Note, however, that this only defines a simple proof of concept and is not indicative of the design of the ultimate NIH directory (e.g., the UID DB will instigate the creation of new entries).

UID generator to generate a unique id for each person that uses NIH facilities, whether employee, contractor, visiting scientist, etc.

This overall model is shown in Figure 3.

Figure 3: Directory Logical/Physical Design
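The Rich DIT construction steps listed above (organizational tree from the SAC table, flat person namespace from HRDB, then aliases under each person's organization) can be sketched in Python. The SAC codes, the shape of the SAC-to-organization table, and the record fields shown here are hypothetical placeholders, not the actual NIH data formats.

```python
def ou_dn(path, base):
    """Build an OU DN from a top-down path, e.g. ("CIT", "CFB")."""
    return ", ".join(f"OU={ou}" for ou in reversed(path)) + f", {base}"


def build_rich_dit(sac_to_org_path, hr_records, base="O=NIH, C=US"):
    """Return (org_units, flat_people, aliases) for a proof-of-concept DIT."""
    # Step 1: organizational tree, one OU entry per level of every path.
    org_units = set()
    for path in sac_to_org_path.values():
        for depth in range(1, len(path) + 1):
            org_units.add(ou_dn(path[:depth], base))

    # Steps 2 and 3: flat person namespace, plus an alias under each
    # person's organization as indicated by the SAC in the HR record.
    flat_people, aliases = {}, {}
    for rec in hr_records:
        person = f"UID={rec['uid']}, {base}"
        flat_people[person] = {"cn": rec["cn"]}
        path = sac_to_org_path.get(rec["sac"])
        if path is None:
            continue  # unknown SAC: person stays in the flat namespace only
        aliases[f"UID={rec['uid']}, {ou_dn(path, base)}"] = person
    return org_units, flat_people, aliases


sac_table = {"SAC-A": ("CIT", "CFB")}
hr = [
    {"uid": "1001", "cn": "Jane Doe", "sac": "SAC-A"},
    {"uid": "1002", "cn": "John Roe", "sac": "SAC-Z"},
]
org_units, flat_people, aliases = build_rich_dit(sac_table, hr)
```

How records with an unknown or missing SAC should be treated is one of the questions the Proof of Concept would need to settle; the sketch simply leaves them out of the deep view.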

3.4.1 Feedback on Logical/Physical Directory Design

The design in Figure 3 as described by NIH represents a good, sophisticated approach to building comprehensive enterprise directory services. However, the NIH AMG Directory team should ensure that appropriate flexibility is maintained during the Proof of Concept/Pilot design phase. All update scenarios should be considered, not just the ones implied by the directions of arrows in Figure 3. For example, it might be preferable for the UID Generator to be embedded in the meta-directory services, which then create the minimum schema database. It may also be appropriate for the meta-directory services to be provided by a different product from the one used to satisfy LDAP queries in production during the heavy 8:00 AM email "rush hour."

We present the remainder of our feedback on the logical/physical NIH directory design in the form of a set of architecture guidelines below.

4 Other NIH Directory Architecture Guidelines

The following section provides a set of architecture guidelines for directory clients, servers, and enterprise directory components. Note that while only the enterprise directory components are currently being actively developed by the AMG during the Proof of Concept and Pilot phases, the enterprise directory design implies certain architectural characteristics and operational guidelines for directory client and server components. These recommendations and guidelines for directory client and server components should be used as a basis for evaluating new or updated directory clients, connected directory servers, and the enterprise directory services.

4.1 Recommended NIH Directory Clients and Applications Guidelines

New or updated NIH directory clients, such as e-mail and locator applications, should utilize the LDAP protocol and standard directory object/attribute schemas for user and organizational information. This will make more directory information accessible to end users, and help NIH move towards a single logical directory for user information.

New or updated NIH applications and operating environments should be capable of accessing required user information in the NIH Enterprise Directory via LDAP. Over the long term, some applications should store system configuration information in the enterprise directory or in a standard NOS directory such as ADS or NDS rather than in an application-specific repository.

It is recognized that for some time some NIH applications will continue to require their own embedded directories (connected directories) and that many users will access these connected directories, for example, from within a proprietary e-mail system. NIH users of connected directories still indirectly benefit from the NIH enterprise directory, whose meta-directory services may feed the connected directory additional attributes, such as telephone number, job title, etc.

In addition, NIH applications may use SQL or ODBC forms of relational access against the minimum schema database maintained in relational table format as part of the NIH enterprise directory. Note, however, that within the Microsoft Active Directory Service Interfaces (ADSI) framework, ADO and OLE DB interfaces will enable applications to look at LDAP-retrieved data in row-column format. As ADSI use proliferates in NIH, or as other vendors adopt similar approaches, the need to maintain a relational copy of the directory can be periodically reviewed.

In general, NIH LDAP directory clients and applications should adhere to the following guidelines:

Protocols: Access the NIH Enterprise Directory via LDAP Version 3 (RFCs 2251-2256) over TCP/IP.

Portability: LDAP program libraries compliant with the LDAP API (RFC 1823) should be installed on workstations. However, this interface is somewhat too low level for some applications. The Microsoft Messaging API (MAPI) or Active Directory Service Interfaces (ADSI) will be appropriate for some email applications and script-based applications in the Windows NT environment. Java LDAP clients should employ the Java Naming and Directory Interface (JNDI).

Schema: Support the common X.500, IETF, and LIPS defined attributes off the shelf, and enable other attributes for the nihInetOrgPerson schema to be configured. Clients should be configurable to support new attribute definitions, which should be settable at the administrator level, for example, via Java classes.

Security: Use Secure Sockets Layer (SSL) or Kerberos for strong LDAP authentication to the directory for administrator privileges or access to sensitive information. Use simple passwords, Kerberos, or SSL for LDAP access to non-sensitive information within the NIH Intranet. Use the X.509 Version 3 Certificate for end to end, public key based services.

User interface: Avoid using typed names containing "/" or "=" characters. Support simple filters and wildcard search as well as subtree navigation.

Caching and failover: Enterprise applications should implement caching of any system information that is stored in the directory, and without which they could not operate. Other clients may implement caching as a value-added feature, but cached information should be refreshed periodically. Clients should be capable of failover to an alternate directory server should their primary source become unavailable.

Advanced features: Advanced LDAP/X.500 features such as referrals and manipulation of various LDAP/X.500 service parameters should be handled transparently in the client as much as possible, with minimum involuntary involvement of end users. Mechanisms for administrative feature default control should be investigated.
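The caching and failover guideline above can be sketched in Python. This is a minimal illustration, not a client implementation: the class name is hypothetical, the `lookup` callable stands in for whatever LDAP library performs the actual query, and serving a stale cached value when every server is unreachable is a design assumption made here so that an application can keep operating.

```python
import time


class CachingDirectoryClient:
    """Directory lookups with a periodically refreshed cache and failover."""

    def __init__(self, servers, lookup, ttl_seconds=300.0, clock=time.monotonic):
        self._servers = list(servers)  # primary first, then alternates
        self._lookup = lookup          # callable(server, key) -> value
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}               # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]              # fresh cached value
        last_error = None
        for server in self._servers:   # fail over in order
            try:
                value = self._lookup(server, key)
            except ConnectionError as exc:
                last_error = exc
                continue
            self._cache[key] = (now + self._ttl, value)
            return value
        if hit is not None:
            return hit[1]              # all servers down: serve stale value
        raise ConnectionError("no directory server reachable") from last_error
```

The time-to-live controls how often cached information is refreshed; a suitable value would depend on how quickly the attributes concerned are expected to change.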

4.2 Recommended NIH Directory Server Guidelines

New or updated NIH directory server systems operating at the enterprise or IC level should comply with the following guidelines:

Portability: LDAP program libraries compliant with the LDAP API (RFC 1823), JNDI, or ADSI should be installed on the server.

Access Protocols: Enable client access via LDAP Version 2 (RFC 1777) or LDAP Version 3 (RFCs 2251-2256) over TCP/IP.

Security: Support Kerberos or SSL for strong LDAP authentication to the directory for administrator privileges or access to sensitive information. Use simple passwords, Kerberos, or SSL for LDAP access to non-sensitive information within the NIH Intranet. Use the X.509 Version 3 Certificate for end to end, public key based services.

Connected directory update: Servers should write-enable their connected directories via LDAP or the LDAP Data Interchange Format (LDIF) batch file format so that they can be updated either by an administrator, or by the NIH meta-directory services, in a standard manner.

Management: LDAP access or LDAP update should support NIH's choice of propagation or synchronization update strategies.

Schema: Support the common X.500, IETF, and LIPS defined attributes off the shelf, and enable other attributes for the nihInetOrgPerson schema to be configured. Servers should be configurable to support new attribute definitions, for example, via Java classes.

Distributed operation: When queried for remote information (i.e., not within their portion of the NIH Enterprise Directory namespace), servers should be configurable to refer LDAP callers to the NIH Enterprise Directory. In some cases, the NIH Enterprise Directory may likewise refer queries to the directory server for information that is within its namespace.

Caching, replication, and failover: If necessary for performance and availability, the enterprise applications and operating environments should implement local directories. Caching can be used for building a local directory, but cached information should be refreshed periodically from the NIH directory. As LDAP replication becomes available, enterprise applications with local directories can obtain copies of the directory through replication and thereby be automatically updated. Servers without local directory caches or replicas should be capable of failover to an alternate directory server should their primary source become unavailable.
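The distributed operation guideline above (answer locally for the server's own portion of the namespace, otherwise refer the caller to the NIH Enterprise Directory) can be sketched in Python. The enterprise directory URL and the local suffix are hypothetical, and the naive comma split does not handle escaped commas in DNs.

```python
from urllib.parse import quote


def normalize_dn(dn: str) -> list:
    """Split a DN into lower-cased RDNs with surrounding spaces removed."""
    return [rdn.strip().lower() for rdn in dn.split(",")]


def is_within(dn: str, suffix: str) -> bool:
    """True if dn is the suffix itself or lies beneath it in the tree."""
    d, s = normalize_dn(dn), normalize_dn(suffix)
    return len(d) >= len(s) and d[-len(s):] == s


def route_query(base_dn: str, local_suffixes, enterprise_url: str):
    """Answer locally if the search base is held here, else return a referral."""
    for suffix in local_suffixes:
        if is_within(base_dn, suffix):
            return ("local", suffix)
    return ("referral", f"{enterprise_url}/{quote(base_dn, safe='=,')}")


local = ["OU=CIT,O=NIH,C=US"]
enterprise = "ldap://directory.example.org"
held = route_query("UID=1001, OU=CFB, OU=CIT, O=NIH, C=US", local, enterprise)
remote = route_query("UID=123-456789,O=NIH,C=US", local, enterprise)
```

Whether a given off the shelf client follows such referrals transparently is one of the interoperability points to confirm in testing.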

4.3 Recommended NIH Enterprise Directory Guidelines

As noted above, the NIH Enterprise Directory includes LDAP query support, meta-directory services to communicate with connected directories, Rich DIT support, the minimum schema database with relational query support, and the UID generator.

Initially the NIH Enterprise Directory may function almost as a "data warehouse" that collects information from connected directories and provides read access to that information for users and applications. Over time NIH should transition the NIH Enterprise Directory into an "operational database" that is updated by administrators and/or by end users themselves. This transition is expected to begin with the UID Generator; next, the NIH Enterprise Directory could become the centralized update location for parking and badge information, and later it might replace the PH database used in e-mail forwarding.

As part of its enterprise directory development, the AMG Directory Team will need to determine which directory information attributes are:

Updated through connected directories

Updated centrally in the enterprise directory by centralized support groups

Updated centrally in the enterprise directory by IC administrators

Updated centrally in the enterprise directory by facilities staff

Updated centrally in the enterprise directory by end users

Deciding which update methods to use for which directory information attributes combines economic, business process, and political/organizational issues. The enterprise directory must provide the flexibility to enable central or distributed mastering of information, making itself accessible to both secure online updates and batch updates.

4.3.1 Enterprise Directory Interfaces

The enterprise directory should enable the following interfaces:

Client or server access via LDAP Version 2 (RFC 1777) or LDAP Version 3 (RFCs 2251-2256) over TCP/IP. Note that it is possible for the target LDAP repository to be provided by a different product than the one employed for meta-directory services.

Synchronization or propagation interfaces to connected directories such as NT, NDS, PH, HRDB, NIH Telecommunications Directory, the Scientific Directory, etc.

Optional X.500 Directory System Protocol (DSP) in compliance with the 1988 and 1993 X.500 standards, and X.500 Directory Information Shadowing Protocol (DISP) in compliance with the 1993 standards (deployed in the event of a business requirement)

Server based LDAP API, RFC 1823. Scripting access.

4.3.2 Enterprise Directory Management and Administration

The enterprise directory should enable meta-directory functions, including both synchronization (where it acts as an active data warehouse, merging and reflecting attributes from connected directories), and propagation (where it functions as a creator of attributes and propagates them to connected directory environments).

The following administration and management capabilities are required from the enterprise directory:

UID Generator as the Source for Existence of a Person in the NIH Enterprise Directory

Generate Rich DIT view and minimum schema database view

Email directory synchronization: Synchronize with off the shelf e-mail packages, such as cc:Mail, Lotus Notes, Microsoft Exchange, Novell GroupWise, etc.

General flat file synchronization: Load or update the directory from configurable flat file formats. Generate flat files that can be used to update other environments. Flat files will most likely be used to interchange information between the meta-directory services and databases such as HRDB, PAID, NIH Telecommunications Directory, and others.

NT Synchronization: Synchronization and use of NT registry and SAM data

NDS/NetWare Synchronization: Synchronization with NetWare 3.x and 4.x directory services.

Join information: Create a directory that merges information from various connected directories

Synchronization with the UID Generator: The UID generator is the authoritative source for entry existence.

Handle multiple namespaces with overlapping people information: For example, some contractors may be registered in PAID, but not in any of the email directories. Other contractors may be registered in an email directory but not in PAID. Many contractors will be registered in both.

Entry Inclusion, exclusion rules: Filter entries selectively to and from environments; e.g., may not want Notes server records in Rich DIT view

Attribute Inclusion, exclusion rules: Filter attributes selectively; may not want NT account management parameters in Rich DIT view

Create entries/attributes: Use meta-directory as authoritative source that propagates to other environments. For example, enable end users to update their phone numbers, security officers to update badge numbers.

Bi-directional sync: Allow some information to be changed either in a connected directory or in the meta-directory. Resolve conflicts among multiple databases that could potentially act as the source of an attribute.

Synchronization Scheduling: Schedule synchronization to occur at "slow times" of day for non-critical information, and more frequently for attributes used in security sensitive processes.

Exception handling: Handle identical names, duplicate names, multiple mailboxes

Callouts/exits: Allow customization by adding code

Statistical reports
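Several of the capabilities listed above (join, the UID Generator as the authoritative source for entry existence, attribute exclusion rules, and conflict resolution among multiple sources) can be illustrated together in a short Python sketch. The function name, the data shapes, and the sample attribute names and values are hypothetical; real meta-directory products express these rules in their own configuration.

```python
def join_entries(known_uids, sources, precedence, excluded_attrs=frozenset()):
    """Merge per-source entries into one joined entry per known UID.

    known_uids:     UIDs issued by the UID Generator (entry existence).
    sources:        {source_name: {uid: {attr: value}}}
    precedence:     {attr: [source names, most authoritative first]};
                    attributes without a rule use the order of `sources`.
    excluded_attrs: attributes filtered out of the joined view.
    Returns (joined, orphans); orphans are (source, uid) pairs found in a
    connected directory but unknown to the UID Generator.
    """
    joined = {uid: {} for uid in known_uids}
    orphans = sorted(
        (name, uid)
        for name, entries in sources.items()
        for uid in entries
        if uid not in joined
    )
    for uid, merged in joined.items():
        attrs = {a for entries in sources.values() for a in entries.get(uid, {})}
        for attr in sorted(attrs - set(excluded_attrs)):
            for name in precedence.get(attr, list(sources)):
                value = sources.get(name, {}).get(uid, {}).get(attr)
                if value is not None:
                    merged[attr] = value  # first authoritative value wins
                    break
    return joined, orphans


sources = {
    "HRDB": {"1001": {"cn": "Jane Doe", "telephoneNumber": "555-0100"}},
    "Exchange": {
        "1001": {"mail": "jane.doe@example.org",
                 "telephoneNumber": "555-0199",
                 "ntPasswordAge": "30"},
        "9999": {"mail": "ghost@example.org"},
    },
}
joined, orphans = join_entries(
    known_uids={"1001", "1002"},
    sources=sources,
    precedence={"telephoneNumber": ["Exchange", "HRDB"]},
    excluded_attrs={"ntPasswordAge"},
)
```

Orphans correspond to the exception-handling case: entries that exist in a connected directory but have not been registered with the UID Generator would need review rather than automatic inclusion.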

4.3.3 Enterprise Directory Security

The following guidelines should be followed for enterprise directory security:

Use SASL/SSL or Kerberos for strong LDAP authentication to the directory for administrator privileges or access to sensitive information. Use simple passwords, SSL, or Kerberos for LDAP access to non-sensitive information within the NIH Intranet. Use the X.509 Version 3 Certificate for end to end, public key based services.

Create prescriptive access controls that limit access to certain entries, attributes, and portions of the namespace based on whether a user has been authenticated or not, and whether an authenticated user belongs to NIH, or to a privileged function (i.e., administrator) within NIH. Strong authentication is required for access to administrator update privilege levels, but not (at least initially) for end user access. Access controls should allow IC information in the enterprise directory to be updated by IC authorized administrators only, if desired.

Create a "firewall DSA" that has no access privileges into NIH and maintains a "sanitized" replica of the NIH directory content. Position that DSA on the firewall where it can safely handle external access. Note: Once strongly authenticated LDAP is in place, the "firewall DSA" might be reconsidered as the same effect could be achieved through access control.
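The "sanitized" replica for the firewall DSA can be thought of as a filter over entries and attributes. The Python sketch below is illustrative only: the list of public attributes, the publishability rule, and the sample entries are assumptions, and deciding what may actually be exposed externally is a policy matter for NIH.

```python
# Hypothetical set of attributes considered safe for external access.
PUBLIC_ATTRS = frozenset({"cn", "mail", "telephonenumber", "ou"})


def sanitize_entry(entry, public_attrs=PUBLIC_ATTRS):
    """Keep only public attributes (attribute names compared case-insensitively)."""
    return {a: v for a, v in entry.items() if a.lower() in public_attrs}


def sanitized_replica(entries, is_publishable, public_attrs=PUBLIC_ATTRS):
    """Build the content of the firewall DSA from the internal directory."""
    return {
        dn: sanitize_entry(entry, public_attrs)
        for dn, entry in entries.items()
        if is_publishable(dn, entry)
    }


internal = {
    "UID=1001, O=NIH, C=US": {
        "cn": "Jane Doe",
        "mail": "jane.doe@example.org",
        "badgeNumber": "B-17",
    },
    "UID=1002, O=NIH, C=US": {"cn": "John Roe", "internalOnly": "TRUE"},
}
external = sanitized_replica(
    internal, is_publishable=lambda dn, e: e.get("internalOnly") != "TRUE"
)
```

The same entry and attribute rules could later be expressed as access controls on a single directory, which is the alternative the note above anticipates once strongly authenticated LDAP is in place.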

Additional guidelines must be developed for using system information stored in the directory in order to enable an authenticated, location-independent binding to computing resources. The directory could, for example, act as repository for Kerberos Principal information, public key Certificates, and access controls to be used by single sign on software and/or applications that act as gatekeepers, but consult the users’ access control privileges as centrally configured in the NIH Enterprise Directory. This approach may prove effective in facilitating specialized applications (such as transitive logon from NT to NDS, or vice versa) that must access multiple environments. Some applications may use the NIH Enterprise Directory as a repository for their user profiles, others may use a NOS directory, such as the Microsoft Active Directory, or even a local directory maintained by the application.

Security procedures must also be built into the UID Generator registration and deregistration processes. Adequate audit trails must be provided.