Reference data sets contain relatively static data values that are commonly used across an organization. In Pentaho Data Catalog, you can create reference data sets that contain valid data values for your organization to reference.
Some examples of common reference data sets include:
Branch numbers
Country codes
Currencies
Exchange codes
Language codes
Measurement units
Postal codes
Product codes
Regions
Transaction codes
Import the 'Medical' Reference Dataset
Ensure you have logged in as: Data Steward.
Username
data_steward@hv.com
Password
Welcome123!
Click Reference Data in the left navigation menu and select Import from the Actions drop-down menu.
The parent category, Medical, is added together with a data set named Antibiotics.
Import Antibiotics Data Set
Highlight the 'Antibiotics Data Set'.
Click on: Data Values (currently there are no data values).
Note the schema: Sr, ID, Antibiotic Name.
Scroll along and select Import.
Proceed with the import.
Complete the following steps:
Choose file type.
Import Data Set.
Review & assign version.
Check Data Values.
Commit Data Values.
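Before starting the import, it can help to confirm that the CSV file header matches the schema noted on the Data Values tab (Sr, ID, Antibiotic Name). The following is a minimal local sketch that does not call Data Catalog; the function name check_header and the sample row are hypothetical and used for illustration only.

```python
import csv
import io

# Schema noted on the Data Values tab of the Antibiotics data set.
EXPECTED_COLUMNS = ["Sr", "ID", "Antibiotic Name"]


def check_header(csv_text, expected=EXPECTED_COLUMNS):
    """Return a list of problems found in the CSV header (empty if it matches)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader, [])]
    problems = []
    missing = [column for column in expected if column not in header]
    extra = [column for column in header if column not in expected]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    return problems


# Hypothetical sample content, for illustration only.
sample = "Sr,ID,Antibiotic Name\n1,A001,Amoxicillin\n"
print(check_header(sample))  # an empty list means the header matches
```

A non-empty result suggests correcting the file before you choose the file type in the wizard.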
Create a reference data set to categorize enterprise data and maintain organizational consistency.
If you need a new category to contain the reference data set, you must create the category before creating the reference data set.
Click Reference Data in the left navigation menu.
In the Reference Data menu, click Actions -> Add New Data Set.
In the Data Set Name box, enter a name for the data set.
In the Parent list, select the category or reference data set that you want to be the parent of the new reference data set.
Select a reference data set as a parent only for organizational purposes. Reference data sets do not inherit any properties or information from parent reference data sets.
Click Create. A new, empty reference data set is created and the Summary tab for the new reference data set opens.
In the Description box, enter a description for the reference data set.
In the Purpose box, enter an explanation of the purpose for the reference data set.
(Optional) In the Properties box, update one or more of the following properties:
Property: Value options
Sensitivity: Unknown (default), Low, Medium, High
Status: Info (default), Valid, Warning, Expired
Version: 1.0 (default)
Note: The version number can only be increased.
Click Save. The reference data set is created.
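The property rules above can be sketched as a small local check. This is an illustrative sketch only, not Data Catalog code: the function names are hypothetical, and it assumes that version numbers compare numerically part by part, which the source does not state.

```python
# Value options listed for the Properties box.
SENSITIVITY_OPTIONS = {"Unknown", "Low", "Medium", "High"}
STATUS_OPTIONS = {"Info", "Valid", "Warning", "Expired"}


def parse_version(text):
    """Turn a dotted version string such as '1.0' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def check_properties(sensitivity, status, current_version, new_version):
    """Return a list of problems with the proposed property values."""
    problems = []
    if sensitivity not in SENSITIVITY_OPTIONS:
        problems.append(f"unknown Sensitivity option: {sensitivity}")
    if status not in STATUS_OPTIONS:
        problems.append(f"unknown Status option: {status}")
    # Assumption: versions compare numerically, part by part.
    if parse_version(new_version) < parse_version(current_version):
        problems.append("Version can only be increased")
    return problems


print(check_properties("Low", "Valid", "1.0", "1.1"))  # no problems expected
print(check_properties("Low", "Valid", "1.1", "1.0"))  # version was decreased
```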
Add a schema for a reference data set so that you can maintain data quality by standardizing and controlling what data values can be entered in the reference data set.
For example, you can add a schema to specify that the value for a type of information is selected from a pre-defined list, and then specify the list of valid values.
A schema can be added that has the same values in all columns as an existing schema, but has a unique identifier assigned to it in the system. If the duplicate schemas are used in different parts of an organization and one schema is updated, then the reference data values that the schemas are meant to control might no longer be consistent across the organization.
Verify that a schema with all the same values does not already exist before adding a new schema.
You can also import reference data schema and values in a CSV file or from a Data Catalog table by clicking Import to open the Import Reference Data wizard.
Perform the following steps to add a schema to a reference data set:
Click Reference Data in the left navigation menu.
In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.
Click the Schema tab.
In the Reference Data Schema table, click + Add Row.
In the new table row, update the following fields:
Field: Description
Column Name: A column name that represents the type of data that the schema controls.
Data Type: The type of data that can be entered as a value. Data Type options include Text, String, Integer, Float, and Binary.
Length: The number of characters that can be entered for the value.
Input Type: The input method that can be used to enter a value. Input Type options include Pre-defined and Free text.
Valid Value: A comma-separated list of values that are valid as input. You must update the Valid Value field when the schema Input Type is Pre-defined. For example, to create a list of colors that a user can select from, you might enter the following list of valid values: red, yellow, blue.
Editable: A switch that can be toggled to specify whether the schema can be edited. Editable options are no and yes. You must have the Admin user role to specify whether a schema can be edited.
On the right side of the new table row, click Save.
The new schema is saved to the Reference Data Schema table and is added as a column to the Reference Data Values table on the Data Values tab.
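To illustrate how the schema fields control what can be entered, the following sketch models one schema row and checks a candidate value against it. It is a local illustration under stated assumptions, not how Data Catalog implements validation: the SchemaColumn class and validate function are hypothetical, Length is treated as a maximum character count, and values are passed as strings.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaColumn:
    """Hypothetical model of one row in the schema table."""
    column_name: str
    data_type: str   # "Text", "String", "Integer", "Float", or "Binary"
    length: int      # assumed to be the maximum number of characters
    input_type: str  # "Pre-defined" or "Free text"
    valid_values: list = field(default_factory=list)


def validate(column, value):
    """Return a list of problems with a candidate value (empty if acceptable)."""
    problems = []
    if len(value) > column.length:
        problems.append(f"longer than {column.length} characters")
    if column.data_type == "Integer":
        try:
            int(value)
        except ValueError:
            problems.append("not an integer")
    elif column.data_type == "Float":
        try:
            float(value)
        except ValueError:
            problems.append("not a float")
    if column.input_type == "Pre-defined" and value not in column.valid_values:
        problems.append(f"not one of {column.valid_values}")
    return problems


# The color list from the Valid Value example.
color = SchemaColumn("Color", "String", 10, "Pre-defined", ["red", "yellow", "blue"])
print(validate(color, "red"))    # acceptable value
print(validate(color, "green"))  # not in the pre-defined list
```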
Populate a reference data set with values to serve as authoritative lookup references for fields that are governed by the reference data set.
A reference data value can be added that has the same values in all columns as an existing reference data value, but has a unique identifier assigned to it in the system. If the duplicate values are used in different parts of an organization and one value is updated, then the reference data is no longer consistent across the organization.
Verify that a reference data value with all the same values does not already exist before adding a new reference data value.
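The duplicate check described above can be sketched locally, for example against rows exported from the data set. The sketch compares every column except the system-assigned identifier. The function name find_duplicates, the id key name, and the sample country code rows are hypothetical.

```python
def find_duplicates(existing_rows, new_row, id_key="id"):
    """Return existing rows whose non-identifier columns all match new_row."""
    def content(row):
        # Ignore the system-assigned identifier when comparing rows.
        return {key: value for key, value in row.items() if key != id_key}

    target = content(new_row)
    return [row for row in existing_rows if content(row) == target]


# Hypothetical sample rows, for illustration only.
existing = [
    {"id": 1, "Code": "DE", "Name": "Germany"},
    {"id": 2, "Code": "FR", "Name": "France"},
]
candidate = {"Code": "FR", "Name": "France"}
print(find_duplicates(existing, candidate))
```

An empty result means that no existing row has the same values in all columns, so the new value can be added without creating a duplicate.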
Perform the following steps to add values to a reference data set:
Click Reference Data in the left navigation menu. The Reference Data page opens.
In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.
Click the Data Values tab.
Click + Add Row.
Note: If the value already exists in a row that is disabled, you can re-enable that row by toggling the Status switch to the Enabled position.
A row is added to the Reference Data Values table. Columns in the table correspond to the schemas that are defined on the Schema tab.
Update the new table row with values that adhere to the schema that controls each column.
On the right side of the new table row, click Save.
The new values are saved to the Reference Data Values table.
If you made multiple modifications to the Reference Data Values table, consider committing a new version of the reference data set.
Add a business term to a reference data set to clarify the context for using the data and to enhance organizational understanding of the data.
Perform the following steps to add a business term to a reference data set:
Click Reference Data in the left navigation menu. The Reference Data page opens.
In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.
Click the Business Terms tab.
In the Business Terms tab, click Add Terms. The Add Business Terms dialog box opens.
Navigate to the business term that you want to add to the reference data set and select it.
Click Add.
The business term is added to the reference data set and appears in the Business Terms table.