Pentaho Data Catalog

Getting Started

A walk-through of Pentaho Data Catalog.


Last updated 2 months ago

Introduction

Pentaho Data Catalog is a powerful tool that enables data engineers, data scientists, and business users to accelerate their data intelligence journey. It automatically discovers, classifies, and contextualizes structured and unstructured data. Here are some key features:

  • Powerful Business Glossary: Contextualize data with business vocabulary based on governance policies and business rules. This helps activate metadata and ensures alignment with business language.

  • Data Lineage and Trust: Track data lineage with OpenLineage, building trust as data flows through your organization. Enable data quality and remediation activities.

  • Observability and Monitoring: A robust observability stack captures popular assets, popular searches, and trends. This helps stewardship organizations focus their energy on the right data.

  • Integration and Scalability: API-powered integrations with various platforms (NetApp, SAP HANA, S3, SQL views) ensure interoperability. The modern architecture design scales seamlessly.

  • Enterprise Security: Features include role-based access control (RBAC), password vault support, minimum privileges, multifactor authentication, secure cloud deployments, and no data duplication.

Discover, understand, and govern your data with Pentaho Data Catalog. It offers faster discovery, lower total cost of ownership (TCO), and improved data quality.


Accessing Your Catalog

To access your catalog, follow these steps:

  1. Open the Google Chrome web browser and navigate to https://pdc.pdc.lab/.

  2. Enter one of the following email and password combinations, then click Sign In.

| Username | Password | Role |
|---|---|---|
| system_admin@hv.com | Welcome123! | All roles combined |
| admin@hv.com | Welcome123! | Community & User Administrator |
| business_steward@hv.com | Welcome123! | Manage Business Glossary |
| business_user@hv.com | Welcome123! | View Business Glossary |
| data_user@hv.com | Welcome123! | Add & Delete content |
| data_developer@hv.com | Welcome123! | Manage Business Rules & Domain Assets |
| data_steward@hv.com | Welcome123! | Manage most features except Glossary |

Security Advisory: Handling Login Credentials

For enhanced security, it is strongly recommended that users avoid saving their login details directly in web browsers. Browsers may inadvertently autofill these credentials in unrelated fields, posing a security risk.

Best Practice

  • Disable Autofill: To mitigate potential risks, users should disable the autofill functionality for login credentials in their browser settings. This preventive measure ensures that sensitive information is not unintentionally exposed or misused.

User Interface

You can access your user profile via the top menu bar or navigate to various features using the left menu bar.

The Home page serves as a central hub for accessing business tools relevant to your role, including data canvas, business glossary, management tools, and worker resources.


Top Menu

Access the top user menu to manage your user profile, manage assigned data sources, and log out.

View the following table for details about these features:

| Name | Function |
|---|---|
| Apps | Click to explore all apps associated with Data Catalog, including Dashboard, that extend the visual discovery and relationship discovery capabilities of Data Catalog. |
| Profile | Click the Profile icon to open the User Profile and Data Sources, where you can manage your details and assign data sources with the required access levels. |
| More | Click the More icon and select Log Out to log out of Data Catalog. |
| Edit | Click Edit to open the Landing Page Options window, where you can configure the landing page with the options available in Shortcuts and Tables. In Layout, you can also choose a vertical or stacked layout. |


Left Menu Options

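| Name | Feature |
|---|---|
| Home | Returns you to the Home page from your current location in Data Catalog. |
| Data Canvas | Explore your data in the Data Canvas. For more information, see Data Canvas. |
| Glossary | Opens the Business Glossary page where you can create, organize, and curate business terms to help you navigate your data. For more information, see Business Glossary. |
| Reference Data | Opens the Reference Data page. Reference data sets contain relatively static, unchanging data values that are commonly used by an organization. For more information, see Reference Data. |
| Management | Manage your data sources, users, user roles, workers, business rules, schedules, dictionaries, and more on the Manage Your Environment page. |
| Workers | Monitor the progress of data activities on the Workers screen. |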

Explore Your Data with the Data Canvas

Dive into the Data Canvas to uncover and analyze your data in depth. This powerful tool provides extensive insights into resource metadata, enhancing your comprehension and illustrating real-world applications. Discover the potential of your data through this intuitive platform.

Once you have processed a dataset, explore it in the Data Canvas:

  1. Click Data Canvas in the left navigation menu to open the Data Canvas view.

| Item | Name | Description |
|---|---|---|
| 1 | Top Navigation | Navigation path. Navigate the tree of data entities to find the one you want to explore in the canvas. |
| 2 | | Displays information about the selected entity / resource. |
| 3 | Data Lineage | Data lineage refers to the ability to track the origin and movement of data throughout its lifecycle. Data lineage helps to ensure data accuracy, troubleshoot issues, and meet compliance requirements. |
| 4 | Key Metrics | Metrics that indicate the overall Data Quality (pulled from Pentaho Data Quality) of the resource. You can set the Sensitivity & Trust Score. |
| 5 | Business Terms | You create business terms to standardize definitions of business concepts so that your data is described in a uniform and easily understood way across your enterprise. Business terms can describe the contents of the data, the sensitivity of the data, or other aspects of the data, such as the subject or purpose of the data. You can assign one or more business terms to individual columns in relational data sets, to other governance artifacts, or to data assets. |
| 6 | Properties | Metadata about the asset / resource, for example: 'Last Update' & |
| 7 | Tags | |
| 8 | Custom Properties | |

Using Galaxy View for Advanced Data Searches

Galaxy view offers an intuitive approach to navigating complex data structures, empowering users to conduct precise searches across databases. It's an invaluable tool for roles such as information security officers who need to pinpoint sensitive information efficiently, like credit card data within expansive databases.

Key Features:

  • Search Flexibility: Easily search for terms like "credit" with the ability to filter results. Filters such as Columns allow users to identify specific columns containing credit card information, while the Tables filter returns tables explicitly named with "credit".

  • Scope Definition: Tailor your search scope using filters to streamline the process of locating pertinent information. This ensures that you only get relevant results matching your search criteria.

  • Data Visualization: The Galaxy view provides a comprehensive overview, highlighting data relationships at a glance. This bird's eye view is particularly useful for understanding the structure and interconnections of your data beyond what a traditional navigation tree offers.

  • Drill-Down Capability: Once in the Galaxy view, users can delve deeper into specific data points for detailed information, ensuring a thorough analysis of the data structure and content.

Galaxy view is especially recommended for those who require a macro yet detailed perspective on data relationships, making it easier to manage and analyze vast databases effectively.
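The difference between the Columns and Tables search scopes can be illustrated with a small sketch. This is not how Data Catalog implements search; it is a toy model that assumes a simple substring match on each asset's own name, and the `Asset` class, the `search` function, and the sample paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical stand-in for a catalog entry; not a Data Catalog type."""
    kind: str  # "table" or "column"
    path: str  # dotted path, e.g. "sales.customers.credit_limit"

    @property
    def name(self) -> str:
        # The asset's own name is the last segment of its path.
        return self.path.rsplit(".", 1)[-1]


def search(assets, keyword, scope=None):
    """Return assets whose own name contains keyword (case-insensitive).

    scope=None searches everything; "table" or "column" narrows the
    result to that kind, mirroring the Tables and Columns filters.
    """
    needle = keyword.lower()
    return [
        a for a in assets
        if needle in a.name.lower() and (scope is None or a.kind == scope)
    ]


# Made-up sample inventory for illustration only.
ASSETS = [
    Asset("table", "sales.credit_cards"),
    Asset("column", "sales.credit_cards.card_number"),
    Asset("table", "sales.customers"),
    Asset("column", "sales.customers.credit_limit"),
    Asset("column", "sales.customers.email"),
]

tables = search(ASSETS, "credit", scope="table")
columns = search(ASSETS, "credit", scope="column")
print([a.path for a in tables])   # ['sales.credit_cards']
print([a.path for a in columns])  # ['sales.customers.credit_limit']
```

Note that in this toy model `card_number` is not returned for "credit" because only the column's own name is matched. Whether the product also matches on parent names, tags, or profiled content is not stated here.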

Before you use Galaxy view, the data must be processed and Business Glossaries & Terms added.

  1. To access Galaxy view from the Data Canvas, select a folder, for example 'synthea'.

  2. Click 'Actions' and select 'View Galaxy'.

Here are the key tasks you can perform in Galaxy view:

| Task | Description |
|---|---|
| Search | Enter a keyword and select Search to find specific information within the resources. For example, enter "patients" to show only those sources, tables, and columns containing patient information. |
| View Details | Right-click on a selected data resource or column in Galaxy view and select View Details. The details panel appears. Depending on your selection, you can view different information, such as properties, tags, and custom properties. You can also view and add business terms to the resource. |
| View Items | Right-click on a data resource in Galaxy view and select View Items. You can view the associated parent and child data assets in a tree view. |
| Select and Select Tree | To select a single data resource, right-click on a data resource in Galaxy view and click Select. You can also select associated data resources by clicking Select Tree. When an item is selected, you can right-click and Deselect the item. |
| Focus | Right-click on a selected data resource or table and select Focus. Only the resource and its children appear. Continue to drill down using the Focus option as needed or select Leave Focus to return to the full view. |
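The Focus task can be thought of as restricting the view to one node of the resource tree plus everything beneath it. The sketch below models that idea only. The `CHILDREN` mapping and the `focus` function are hypothetical, and apart from the 'synthea' folder and the "patients" keyword used on this page, the table and column names are made up.

```python
# Hypothetical parent -> children mapping for a small resource tree.
CHILDREN = {
    "synthea": ["patients", "encounters"],
    "patients": ["patients.id", "patients.birthdate"],
    "encounters": ["encounters.id", "encounters.patient"],
}


def focus(node, children=CHILDREN):
    """Return the node followed by all of its descendants, depth-first."""
    visible = [node]
    for child in children.get(node, []):
        visible.extend(focus(child, children))
    return visible


# Focusing on a table shows only that table and its columns.
print(focus("patients"))  # ['patients', 'patients.id', 'patients.birthdate']

# Focusing on the folder is equivalent to the full view in this toy tree.
full_view = focus("synthea")
print(len(full_view))  # 7
```

Under this model, Leave Focus simply means going back to the list produced for the root, and repeated Focus calls on a child narrow the visible set further.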

If you want to reduce the amount of data displayed, you can filter the level of detail in your view by columns or tables.

  1. In Galaxy view, click Filters to open the Filters dialog box and select one or more of the following options:

| Filter Option | Description |
|---|---|
| Level of Detail | By default, Galaxy view shows down to the table level as a reduced set of data. Click Columns and apply to have a detailed view down to the column level. |
| Show Relationships | Limits the results in the view to data resources that have Declared Foreign Key and Discovered Foreign Key relationships. |
| Show only Related Items | When Show Relationships is active, you can choose a threshold number to refine the results further. |
| Show only Tagged Items | Select this check box to limit the results in the view to data resources that have associated tags. You can further refine your view by selecting specific tags. |
| Show only Items with Business Terms | Select this check box to further limit the results in the view. You can further refine your view by selecting specific business terms. |
| Show Data Elements | Select this check box to show the data elements. |
| Reset | Discard your filters. |
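How several of these filter options combine can be sketched as a chain of checks applied to each item. The following is an illustrative model, not the product's logic: `Item`, `GalaxyFilters`, and `apply_filters` are hypothetical names, the sample data is made up, and only the Level of Detail, Show only Tagged Items, and Show only Items with Business Terms options are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical view item; not a Data Catalog type."""
    name: str
    level: str  # "table" or "column"
    tags: set = field(default_factory=set)
    terms: set = field(default_factory=set)


@dataclass
class GalaxyFilters:
    level_of_detail: str = "table"  # default view stops at table level
    only_tagged: bool = False
    tags: set = field(default_factory=set)  # optional specific tags
    only_with_terms: bool = False


def apply_filters(items, f):
    """Keep the items that pass every active filter."""
    shown = []
    for item in items:
        if item.level == "column" and f.level_of_detail != "column":
            continue  # columns are hidden unless column detail is on
        if f.only_tagged and not item.tags:
            continue  # untagged items are hidden
        if f.only_tagged and f.tags and not (item.tags & f.tags):
            continue  # none of the selected tags is present
        if f.only_with_terms and not item.terms:
            continue  # items without business terms are hidden
        shown.append(item)
    return shown


ITEMS = [
    Item("patients", "table"),
    Item("patients.ssn", "column", tags={"PII"}),
    Item("patients.city", "column"),
    Item("encounters", "table", tags={"clinical"}),
]

default_view = apply_filters(ITEMS, GalaxyFilters())
pii_columns = apply_filters(
    ITEMS, GalaxyFilters(level_of_detail="column", only_tagged=True, tags={"PII"})
)
print([i.name for i in default_view])  # ['patients', 'encounters']
print([i.name for i in pii_columns])   # ['patients.ssn']
```

In this model, Reset corresponds to going back to `GalaxyFilters()` with its defaults. Whether the real view keeps parent tables visible when only their columns match a tag filter is an open assumption here; the sketch hides them.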


Catalog URL: https://pdc.pdc.lab/

Related pages: Business Glossary, Reference Data, Manage Your Environment, Data Lineage, Data Canvas, Business Terms.

Figures on this page: Pentaho Data Catalog - Workflows; PDC Login; Pentaho Data Catalog - UI; Data Canvas; Actions - View Galaxy; View Galaxy - synthea; Filters; Filter to Column level that have PII tags.