Pentaho Data Catalog

Reference Data


Last updated 1 year ago

Reference data sets contain relatively static, unchanging data values that are commonly used by an organization. In Pentaho Data Catalog, you can create reference data sets that contain valid data values for your organization to reference.

Some examples of common reference data sets include:

  • Branch Numbers

  • Country codes

  • Currencies

  • Exchange codes

  • Language codes

  • Measurement units

  • Postal codes

  • Product codes

  • Regions

  • Transaction codes

Import the 'Medical' Reference Dataset

Ensure you are logged in as the Data Steward user:

  • Username: data_steward@hv.com

  • Password: Welcome123!

  1. Click Reference Data in the left navigation menu, and then select Import from the Actions drop-down menu.

  2. Browse to:

~/Workshop--Pentaho-Data-Catalog/Reference Dataset/import_assets_1710938718263.csv

  3. Click Open.

  4. Click Submit.

The parent Medical category is added, together with the Antibiotics data set.


Import Antibiotics Data Set

  1. Select the Antibiotics data set.

  2. Click Data Values (currently there are no data values).

Note the schema: Sr, ID, Antibiotic Name.

  3. Scroll across and click Import.

  4. Proceed with the import and complete the wizard steps:

     1. Choose the file type.

     2. Import the data set.

     3. Review and assign a version.

     4. Check the data values.

     5. Commit the data values.
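Before running the wizard, it can help to confirm that a data values file matches the Sr, ID, Antibiotic Name schema noted above. The following is a minimal Python sketch under stated assumptions: the file is a comma-separated CSV with a header row, and the helper `header_matches_schema` and the sample rows are hypothetical illustrations, not part of Pentaho Data Catalog or the workshop files.

```python
import csv
import io

# Column names of the Antibiotics schema, as shown on the Data Values tab.
EXPECTED_COLUMNS = ["Sr", "ID", "Antibiotic Name"]


def header_matches_schema(csv_text: str, expected: list[str]) -> bool:
    """Return True if the CSV header row matches the expected columns exactly.

    Hypothetical pre-import check; surrounding spaces in header cells are ignored.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return False  # empty file: there is no header to compare
    return [cell.strip() for cell in header] == expected


# Hypothetical sample content (illustrative values, not the workshop file).
sample = "Sr,ID,Antibiotic Name\n1,AB-001,Amoxicillin\n2,AB-002,Doxycycline\n"
print(header_matches_schema(sample, EXPECTED_COLUMNS))  # True
```

A header check like this only catches mismatched column names; the wizard's Check Data Values step remains the place to review the actual values.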

Add a Reference Data Set

Create a reference data set to categorize enterprise data and maintain organizational consistency.

If you need a new category to contain the reference data set, you must create the category before creating the reference data set.

  1. Click Reference Data in the left navigation menu.

  2. In the Reference Data menu, click Actions > Add New DataSet.

  3. In the DataSet Name box, enter a name for the data set.

  4. In the Parent list, select the category or reference data set that you want to be the parent of the new reference data set.

Note: Select a reference data set as a parent only for organizational purposes. Reference data sets do not inherit any properties or information from their parent.

  5. Click Create. A new, empty reference data set is created and its Summary tab opens.

  6. In the Description box, enter a description for the reference data set.

  7. In the Purpose box, enter an explanation of the purpose of the reference data set.

  8. (Optional) In the Properties box, update one or more of the following properties:

    • Sensitivity: Unknown (default), Low, Medium, or High

    • Status: Info (default), Valid, Warning, or Expired

    • Version: 1.0 (default). Note: The version number can only be increased.

  9. Click Save. The reference data set is saved.
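The Summary properties above can be pictured as a small data model. This Python sketch is hypothetical: the class, enum, and method names are illustrative and are not a Pentaho Data Catalog API, and it assumes versions use a "major.minor" format. It encodes the listed defaults for Sensitivity, Status, and Version, the rule that a version number can only be increased, and a parent that is kept for organization only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Sensitivity(Enum):
    UNKNOWN = "Unknown"  # default
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


class Status(Enum):
    INFO = "Info"  # default
    VALID = "Valid"
    WARNING = "Warning"
    EXPIRED = "Expired"


def parse_version(version: str) -> tuple[int, int]:
    # Assumes a "major.minor" string such as "1.0"; anything else raises ValueError.
    major, minor = version.split(".")
    return int(major), int(minor)


@dataclass
class ReferenceDataSet:
    """Hypothetical model of a reference data set's Summary properties."""

    name: str
    parent: Optional[str] = None  # organizational only: nothing is inherited from it
    description: str = ""
    purpose: str = ""
    sensitivity: Sensitivity = Sensitivity.UNKNOWN
    status: Status = Status.INFO
    version: str = "1.0"

    def set_version(self, new_version: str) -> None:
        """Accept the new version only if it is strictly higher than the current one."""
        if parse_version(new_version) <= parse_version(self.version):
            raise ValueError(f"version {new_version} is not higher than {self.version}")
        self.version = new_version


ds = ReferenceDataSet("Antibiotics", parent="Medical")
ds.set_version("1.1")
print(ds.version, ds.sensitivity.value, ds.status.value)  # 1.1 Unknown Info
```

Comparing parsed integer tuples rather than raw strings keeps "1.10" correctly ordered above "1.9".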

Add Schema to a Reference Data Set

Add a schema to a reference data set to maintain data quality by standardizing and controlling which data values can be entered in the reference data set.

For example, you can add a schema column that specifies that a value must be selected from a pre-defined list, and then specify the list of valid values.

A schema can be added that has the same values in all columns as an existing schema but is assigned a different unique identifier in the system. If such duplicate schemas are used in different parts of an organization and one of them is updated, the reference data values they are meant to control might no longer be consistent across the organization.

Verify that a schema with all the same values does not already exist before adding a new schema.
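One way to picture this check is to compare every field of a schema row except its system-assigned identifier. The Python sketch below is hypothetical: the `SchemaColumn` fields mirror the Schema tab columns, `row_id` stands in for the unique identifier, and values are compared exactly, without trimming or case folding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SchemaColumn:
    """Hypothetical schema row; row_id stands in for the system-assigned identifier."""

    row_id: str
    column_name: str
    data_type: str
    length: int
    input_type: str
    valid_value: str


def content_key(col: SchemaColumn) -> tuple:
    # Every field except row_id: rows with equal keys differ only in identity.
    return (col.column_name, col.data_type, col.length, col.input_type, col.valid_value)


def find_duplicate(existing: list[SchemaColumn], candidate: SchemaColumn) -> Optional[SchemaColumn]:
    """Return the first existing row whose values all match the candidate, or None."""
    key = content_key(candidate)
    return next((col for col in existing if content_key(col) == key), None)


existing = [SchemaColumn("s1", "Color", "String", 10, "Pre-defined", "red, yellow, blue")]
candidate = SchemaColumn("s2", "Color", "String", 10, "Pre-defined", "red, yellow, blue")
match = find_duplicate(existing, candidate)
print(match.row_id if match else "no duplicate")  # s1
```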

You can also import reference data schema and values from a CSV file or a Data Catalog table by clicking Import, which opens the Import Reference Data wizard.

Perform the following steps to add a schema to a reference data set:

  1. Click Reference Data in the left navigation menu.

  2. In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.

  3. Click the Schema tab.

  4. In the Reference Data Schema table, click + Add Row.

  5. In the new table row, update the following fields:

    • Column Name: A column name that represents the type of data that the schema controls.

    • Data Type: The type of data that can be entered as a value: Text, String, Integer, Float, or Binary.

    • Length: The maximum number of characters that can be entered for the value.

    • Input Type: The input method used to enter a value: Pre-defined or Free text.

    • Valid Value: A comma-separated list of values that are valid as input. You must update this field when the Input Type is Pre-defined. For example, to create a list of colors that a user can select from, you might enter the following valid values: red, yellow, blue.

    • Editable: A switch that specifies whether the schema can be edited (yes or no). You must have the Admin user role to specify whether a schema can be edited.

  6. On the right side of the new table row, click Save.

The new schema is saved to the Reference Data Schema table and is added as a column to the Reference Data Values table on the Data Values tab.
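To make the schema fields concrete, here is a minimal Python sketch of how a single value might be checked against one schema row. It is hypothetical: `ColumnRule`, `allowed_values`, and `is_valid` are illustrative names, Length is assumed to be a maximum character count, and Text, String, and Binary values are not type-checked. Pentaho Data Catalog performs its own validation, which may differ.

```python
from dataclasses import dataclass


@dataclass
class ColumnRule:
    """Hypothetical stand-in for one row of the Reference Data Schema table."""

    data_type: str         # "Text", "String", "Integer", "Float", or "Binary"
    length: int            # assumed to be the maximum number of characters
    input_type: str        # "Pre-defined" or "Free text"
    valid_value: str = ""  # comma-separated list; required for "Pre-defined"


def allowed_values(rule: ColumnRule) -> list[str]:
    # Split "red, yellow, blue" into ["red", "yellow", "blue"], dropping blanks.
    return [v.strip() for v in rule.valid_value.split(",") if v.strip()]


def is_valid(rule: ColumnRule, value: str) -> bool:
    """Return True if a raw input string satisfies the column rule."""
    if len(value) > rule.length:
        return False
    if rule.input_type == "Pre-defined" and value not in allowed_values(rule):
        return False
    # Only numeric types are parsed; Text, String, and Binary pass through unchecked.
    parsers = {"Integer": int, "Float": float}
    parse = parsers.get(rule.data_type)
    if parse is not None:
        try:
            parse(value)
        except ValueError:
            return False
    return True


color = ColumnRule("String", 10, "Pre-defined", "red, yellow, blue")
print(is_valid(color, "yellow"), is_valid(color, "green"))  # True False
```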

Add Values to a Reference Data Set

Populate a reference data set with values to serve as authoritative lookup references for fields that are governed by the reference data set.

A reference data value can be added that has the same values in all columns as an existing reference data value, but has a unique identifier assigned to it in the system. If the duplicate values are used in different parts of an organization and one value is updated, then the reference data is no longer consistent across the organization.

Verify that a reference data value with all the same values does not already exist before adding a new reference data value.

Perform the following steps to add values to a reference data set:

  1. Click Reference Data in the left navigation menu. The Reference Data page opens.

  2. In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.

  3. Click the Data Values tab.

  4. Click + Add Row. A row is added to the Reference Data Values table. Columns in the table correspond to the schemas that are defined on the Schema tab.

     Note: If the value already exists in a disabled row, you can re-enable that row by toggling its Status switch to the Enabled position instead of adding a new row.

  5. Update the new table row with values that adhere to the schema that controls each column.

  6. On the right side of the new table row, click Save.

The new values are saved to the Reference Data Values table. If you made multiple modifications to the Reference Data Values table, consider committing a new version of the reference data set.
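The duplicate-value rule and the disabled-row note can be combined into one "add or re-enable" behavior. This Python sketch is a hypothetical in-memory model, not how Pentaho Data Catalog stores values: `row_id` stands in for the system-assigned identifier, `enabled` mirrors the Status switch, and the sample antibiotic rows and IDs are illustrative values, not taken from the workshop file.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class ValueRow:
    row_id: int              # stands in for the system-assigned identifier
    values: tuple[str, ...]  # one entry per schema column
    enabled: bool = True     # mirrors the Status switch


@dataclass
class ValuesTable:
    """Hypothetical in-memory model of the Reference Data Values table."""

    rows: list[ValueRow] = field(default_factory=list)
    _next_id: count = field(default_factory=lambda: count(1))

    def add_or_enable(self, values: tuple[str, ...]) -> ValueRow:
        """Re-enable an identical row if one exists; otherwise append a new row."""
        for row in self.rows:
            if row.values == values:
                row.enabled = True  # reuse the existing row instead of duplicating it
                return row
        row = ValueRow(next(self._next_id), values)
        self.rows.append(row)
        return row


table = ValuesTable()
first = table.add_or_enable(("1", "AB-001", "Amoxicillin"))
first.enabled = False                                        # row disabled later
again = table.add_or_enable(("1", "AB-001", "Amoxicillin"))  # same values re-entered
print(again is first, again.enabled, len(table.rows))        # True True 1
```

Reusing the existing row keeps a single identifier per distinct value, which is what prevents the consistency problem described above.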

Add a Business Term to a Reference Data Set

Add a business term to a reference data set to clarify the context for using the data and to enhance organizational understanding of the data.

Perform the following steps to add a business term to a reference data set:

  1. Click Reference Data in the left navigation menu. The Reference Data page opens.

  2. In the Reference Data menu, navigate to the reference data set that you want to update, and then select the reference data set.

  3. Click the Business Terms tab.

  4. In the Business Terms tab, click Add Terms. The Add Business Terms dialog box opens.

  5. Navigate to the business term that you want to add to the reference data set and select it.

  6. Click Add.

The business term is added to the reference data set and appears in the Business Terms table.
