FVC-APP Help for the CIHM:  Please click [To Be Done] on a field name to go to a description of the data element for that field.

Details: FVC Application Client Identification Hashing Module (in progress)

Terms listed below in ALL CAPS correspond to the field names on the CIHM screen of the FVC-APP.
 

 DRAFT DRAFT DRAFT

Garrett Pilot Data Mart

CDI-10 – CIHM Product Analysis

August 31, 1999

AUTHOR: Corey Ellsworth

REVISION: 1 [Note this document is in the process of being updated,
as there have been some revisions to the CIHM since 8/99. Updates will be posted when available.]

Table of Contents:

1 Introduction

2 Product Resolution

2.1 CIHM Requirements

2.1.A Hashing Function

2.1.B Naming Convention

2.1.C Aliases

2.1.D Missing IDs

2.2 CIHM Solutions

2.2.A Hashing Function

2.2.B Naming Convention

2.2.C Aliases

2.2.D Missing IDs

3 Conclusion



1 Introduction – The Client Identification Hashing Module (CIHM)

The Garrett Pilot Data Mart project will consist of two OLTP databases and one OLAP database. True names will not be stored in any of the databases. To ensure data continuity and individual anonymity across the three databases, a standard data entry protocol will need to be implemented. The CIHM incorporates this protocol into a set of functions that will be used throughout the Pilot Data Mart project. This document describes the CIHM in detail.

2 Product Resolution

2.1 CIHM Requirements

The following sections will detail the requirements of the CIHM as defined in the document “Performance Requirements Specification – Section 4”.

2.1.A Hashing Function

Because the databases will not store true names, the CIHM needs a way to preserve individual anonymity while keeping the data useful. To achieve this, the CIHM will implement the SHA-1 hashing algorithm, published by the National Institute of Standards and Technology (NIST). SHA-1 will hash an individual’s name, date of birth, and gender into a 160-bit message digest. Storing this digest in place of the true name will preserve individual anonymity while still allowing valuable analytical data to be retrieved from the OLAP database.
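As a rough illustration of the hashing step, the following Python sketch uses the standard hashlib module rather than the Visual Basic prototype. The function name sha1_digest, the UTF-8 encoding, and the sample identification string are assumptions made for this example only.

```python
import hashlib


def sha1_digest(message: str) -> bytes:
    """Return the raw SHA-1 digest (20 bytes, 160 bits) of a text message."""
    # Assumption: the identification string is encoded as UTF-8 before hashing.
    return hashlib.sha1(message.encode("utf-8")).digest()


# Hypothetical identification string: name, date of birth, gender.
digest = sha1_digest("TERRANCE COREY ELLSWORTH 04/09/1975 M")

print(len(digest) * 8)  # digest size in bits: 160
print(digest.hex())     # the same digest as 40 hexadecimal characters
```

The same input always produces the same digest, which is what allows a person to be matched across databases without storing the name itself.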

2.1.B Naming Convention

For the CIHM to identify two separate entries as the same person, a standard naming convention will have to be adopted and enforced. The naming convention that has been chosen is First Name, Middle Name, Last Name, suffix (Jr, Sr, 2nd, etc.), Date of Birth, and Gender. These data items will be concatenated into one string, for example: Terrance Corey Ellsworth 4/9/75 M. This string will be hashed with the SHA-1 algorithm. To enforce data integrity, the data being fed to the CIHM will need to be run through stringent error-checking code. This code will probably not be part of the CIHM itself, but it is mentioned here because of the importance of enforcing data entry protocols.

2.1.C Aliases

A person’s name may change over time. Similarly, one department entering data into the data mart might not have access to the person’s first, middle, and last name, while another does. To account for this, the CIHM will need to gather and store (in the appropriate database) a list of alias names for each individual. These aliases will allow for more accurate reporting and more reliable statistical output.

2.1.D Missing IDs

During the transfer of legacy data to the respective databases, the CIHM will be required to handle incomplete identification input. The CIHM will accept this input and handle it in a consistent fashion.

2.2 CIHM Solutions

The following sections will describe the proposed solutions to each of the requirements of the CIHM. Features and functionality discussed in these sections will be contained in the prototype unless specifically stated otherwise.

2.2.A Hashing Function

The prototype CIHM will be implemented as a Visual Basic class called “CIHM”. The CIHM class will also contain instances of two other classes, the “ALIAS” class and the “HASH” class. This section details the interface that the “HASH” class provides to the CIHM class.

The HASH class will provide the CIHM with one method and two properties. The method is called Hash() and requires one parameter, strMessage, a string containing the information to be hashed. The Hash() method calls various private functions that hash the message into the 160-bit message digest (hash).

Once the Hash() method has been executed, the two properties of the HASH class become available. The first and most commonly used is the HexDigest property, which returns the message digest in hexadecimal (base 16). In this format, the hashed message can be inserted into the SQL Server binary data field. The second is the BinaryDigest property, which returns the message digest in binary and is provided for easier debugging of CIHM applications. Both the HexDigest and BinaryDigest properties are read-only.
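The interface described above (one method, two read-only properties) might look roughly like the following Python sketch. It is not the Visual Basic HASH class: it delegates to hashlib instead of private hashing functions, and it assumes that BinaryDigest is a string of 160 "0"/"1" characters and that HexDigest uses upper-case letters, neither of which the source specifies.

```python
import hashlib


class Hash:
    """Illustrative stand-in for the HASH class interface."""

    def __init__(self):
        self._digest = None  # raw digest bytes, set by hash()

    def hash(self, str_message: str) -> None:
        # Assumption: the message is encoded as UTF-8 before hashing.
        self._digest = hashlib.sha1(str_message.encode("utf-8")).digest()

    def _require_digest(self) -> bytes:
        if self._digest is None:
            raise RuntimeError("call hash() before reading a digest")
        return self._digest

    @property
    def hex_digest(self) -> str:
        # Read-only: 40 hexadecimal characters (base 16).
        return self._require_digest().hex().upper()

    @property
    def binary_digest(self) -> str:
        # Read-only: 160 characters, one per bit (assumed representation).
        return "".join(f"{byte:08b}" for byte in self._require_digest())


hasher = Hash()
hasher.hash("abc")
print(hasher.hex_digest)
print(len(hasher.binary_digest))
```

Because the properties are defined without setters, assigning to hex_digest or binary_digest raises AttributeError, which mirrors the read-only behavior described above.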

2.2.B Naming Convention

To ensure that correct data is being fed to the CIHM, a naming convention will be enforced on the data input screens of each application employing the CIHM. The data input screens will ensure that at least the first and last names of the individual are entered. They will also convert all letters in the name to upper case. This will eliminate the possibility that one person might type John McAfee and another might type John Mcafee.

In addition to checking name input, the input forms will verify that the date of birth is a valid date. All dates will be converted to a standard MM/DD/YYYY format to eliminate inconsistencies in date input.

Finally, the gender field will be a dropdown box. This will ensure that gender input will be standard throughout each database.

Implementing these input-cleansing routines will better enable the anonymous tracking of cases throughout each database.
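The input-cleansing rules in this section can be sketched as a single Python function. The function name build_id_string, the gender codes "M" and "F", and the single-space separator between fields are assumptions for illustration; the source only states that the items are concatenated into one string.

```python
from datetime import date

# Assumption: the gender dropdown offers exactly these two codes.
GENDER_CODES = ("M", "F")


def build_id_string(first: str, middle: str, last: str, suffix: str,
                    dob: date, gender: str) -> str:
    """Build the normalized string that would be passed to the hash."""
    # At least the first and last name must be present.
    if not first.strip() or not last.strip():
        raise ValueError("first and last name are required")
    # A dropdown can only supply one of the fixed codes.
    if gender not in GENDER_CODES:
        raise ValueError("gender must be one of " + ", ".join(GENDER_CODES))

    # Upper-case every name part and skip the parts left empty.
    parts = [p.strip().upper() for p in (first, middle, last, suffix) if p.strip()]

    # A date object is already a valid date; format it as MM/DD/YYYY.
    return f"{' '.join(parts)} {dob.strftime('%m/%d/%Y')} {gender}"


print(build_id_string("Terrance", "Corey", "Ellsworth", "", date(1975, 4, 9), "M"))
# TERRANCE COREY ELLSWORTH 04/09/1975 M
```

With this normalization, "John McAfee" and "John Mcafee" produce the same string and therefore the same hash.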

2.2.C Aliases

The ALIAS class contained in the CIHM will provide two methods. The first is the Initialize() method, which will initialize the class for use with a specific database based on the parameters passed to it. The parameters passed to the Initialize() method are as follows:

adoActiveConn : ADO Connection Type : Provides the class with a reference to an active ADO connection object (database interface)

strTName : String Type : Contains the name of the table to manipulate

strFNID : String Type : Contains the DB field name of the field that stores IDs

strFNHash : String Type : Contains the DB field name of the field that stores hashed identification information

The second method will be the ID() method. The ID() method will return the ID number of the person who is being hashed. This method will take the following parameters:

strName : String Type : Contains the true name of the individual

datDOB : Date Type : Contains the birth date of the individual

strGender : String Type : Contains the gender of the individual

The ID() method will first hash the individual’s identification and check whether the individual already exists in the database. If the individual exists, ID() will simply return the ID of that individual to the calling procedure. If the individual does not exist, ID() will invoke a private function that will prompt for aliases of the individual. Each alias will be hashed and checked against the database for existing instances. If none exist, the hashed alias will be inserted into the database. If a hashed alias does exist in the database, the existing ID will be used for all entries for this individual. If the individual is female, ID() will also prompt for the maiden name, and the same procedure will be applied to the hashed maiden name.
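The lookup flow of ID() can be sketched in Python with an in-memory SQLite database standing in for the ADO connection. This is a sketch under several assumptions, not the prototype: aliases are passed in as a list instead of being prompted for, new IDs are sequential integers, the primary hash is stored alongside the alias hashes, and the table and field names (client_hash, client_id, id_hash) are hypothetical.

```python
import hashlib
import sqlite3
from datetime import date


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of text as 40 hexadecimal characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


class Alias:
    """Illustrative stand-in for the ALIAS class; not the VB prototype."""

    def initialize(self, conn, table_name, id_field, hash_field):
        # Table and field names are interpolated into SQL text, so they must
        # come from trusted configuration, never from user input.
        self.conn = conn
        self.table = table_name
        self.id_field = id_field
        self.hash_field = hash_field

    def _find(self, digest):
        sql = f"SELECT {self.id_field} FROM {self.table} WHERE {self.hash_field} = ?"
        row = self.conn.execute(sql, (digest,)).fetchone()
        return row[0] if row else None

    def _insert(self, person_id, digest):
        sql = f"INSERT INTO {self.table} ({self.id_field}, {self.hash_field}) VALUES (?, ?)"
        self.conn.execute(sql, (person_id, digest))

    def _next_id(self):
        # Assumption: IDs are sequential integers; the source does not say.
        sql = f"SELECT COALESCE(MAX({self.id_field}), 0) + 1 FROM {self.table}"
        return self.conn.execute(sql).fetchone()[0]

    def id(self, name, dob, gender, aliases=()):
        def digest_for(person_name):
            return sha1_hex(f"{person_name} {dob.strftime('%m/%d/%Y')} {gender}")

        primary = digest_for(name)
        person_id = self._find(primary)
        if person_id is not None:
            return person_id  # already known under this name

        alias_digests = [digest_for(alias) for alias in aliases]
        for digest in alias_digests:
            person_id = self._find(digest)
            if person_id is not None:
                break  # an alias is already on file: reuse its ID
        if person_id is None:
            person_id = self._next_id()

        # Store every hash not yet on file under the chosen ID.
        for digest in [primary] + alias_digests:
            if self._find(digest) is None:
                self._insert(person_id, digest)
        return person_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client_hash (client_id INTEGER, id_hash TEXT)")
alias = Alias()
alias.initialize(conn, "client_hash", "client_id", "id_hash")

dob = date(1970, 1, 2)
first = alias.id("JANE ANN SMITH", dob, "F", aliases=["JANE ANN DOE"])
again = alias.id("JANE ANN DOE", dob, "F")   # maiden name resolves to the same ID
other = alias.id("JOHN MCAFEE", date(1960, 5, 6), "M")
```

In this sketch one ID can own several rows, one per hashed name, which is how a later entry under a maiden name or alias resolves to the existing individual.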

2.2.D Missing IDs

During the transfer of legacy data to the respective databases, the CIHM will be required to handle incomplete identification input. To account for this, the CIHM will treat each unidentifiable legacy record as new input and assign an ID accordingly.

The hash field for unidentifiable records will contain the hashed value of the current date and time in MM/DD/YYYY HH:MM:SS format. This will give each such record a unique hash value, provided no two records are stamped within the same second. An identifying database field will distinguish all legacy records from current records.

Legacy records with partial or full identification will be hashed. For tracking and reporting purposes, these records will be marked as legacy data as well.
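The placeholder hash for unidentifiable legacy records can be sketched as follows in Python. The function name legacy_placeholder_hash is hypothetical, and the timestamp is passed in explicitly so that the behavior is easy to check.

```python
import hashlib
from datetime import datetime


def legacy_placeholder_hash(stamp: datetime) -> str:
    """Hash a timestamp formatted as MM/DD/YYYY HH:MM:SS."""
    text = stamp.strftime("%m/%d/%Y %H:%M:%S")
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


a = legacy_placeholder_hash(datetime(1999, 8, 31, 14, 30, 0))
b = legacy_placeholder_hash(datetime(1999, 8, 31, 14, 30, 1))
c = legacy_placeholder_hash(datetime(1999, 8, 31, 14, 30, 1, 500000))

print(a != b)  # True: different seconds give different hashes
print(b == c)  # True: the format drops fractions of a second
```

Because the format has one-second resolution, a bulk import that stamps two records within the same second would give both the same hash; the sketch makes that limit visible but does not resolve it.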

3 Conclusion

This document was generated from the working CIHM prototype. All features covered in this document have been implemented in the CIHM prototype. Revisions to this document will be reflected in the CIHM prototype as well.
