
Pentaho Data Catalog ..

Why Pentaho Data Catalog ..

Last updated 2 months ago

This course is a work in progress!

This course is in early access. The content will be updated continually over the next few months.

These Workshops are not intended for production environments.

Introduction

Pentaho Data Catalog serves as a comprehensive metadata management solution that helps organizations document, organize, and understand their data assets. It provides a centralized repository where data professionals can discover, understand, and govern data across the enterprise.

One of the primary use cases for Pentaho Data Catalog is data discovery and lineage tracking. Organizations with complex data ecosystems can use it to map relationships between different data sources, transformations, and outputs. This capability is particularly valuable for regulatory compliance, as it enables teams to trace how sensitive data moves through systems and who has access to it.

Another key application is business glossary management, where Pentaho Data Catalog helps bridge the gap between technical metadata and business terminology. This creates a common language across the organization, allowing business users to find and understand relevant data without requiring deep technical knowledge of underlying systems. For data governance initiatives, this capability ensures consistent definitions and usage of critical business terms.

Pentaho Data Catalog also supports impact analysis, helping teams understand how changes to data sources might affect downstream reports and applications. This proactive approach to change management reduces the risk of disruptions when modifying databases, ETL processes, or reporting structures.
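
The idea behind impact analysis can be illustrated with a minimal sketch. It assumes lineage is held as a simple mapping from each asset to the assets it feeds; the asset names, the `LINEAGE` dictionary, and the `downstream_impact` function are hypothetical and are not part of Pentaho Data Catalog or its API. The sketch walks the graph breadth-first to list everything downstream of a changed source.

```python
from collections import deque

# Hypothetical lineage graph: each key feeds the assets listed in its value.
# These names are illustrative only and do not come from Pentaho Data Catalog.
LINEAGE = {
    "crm.customers": ["etl.clean_customers"],
    "etl.clean_customers": ["dw.dim_customer"],
    "dw.dim_customer": ["report.churn", "report.revenue"],
    "erp.orders": ["dw.fact_orders"],
    "dw.fact_orders": ["report.revenue"],
}


def downstream_impact(graph, changed_asset):
    """Return every asset reachable downstream of changed_asset, in BFS order."""
    seen = {changed_asset}  # never report the changed asset itself
    order = []
    queue = deque([changed_asset])
    while queue:
        current = queue.popleft()
        for child in graph.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # List what might be affected if the CRM customer table changes.
    for asset in downstream_impact(LINEAGE, "crm.customers"):
        print(asset)
```

In this toy graph, a change to `crm.customers` reaches the cleaning step, the customer dimension, and both reports, whereas a change to `erp.orders` reaches only the orders fact table and the revenue report.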

This series of workshops introduces Pentaho Data Catalog and its capabilities for managing both structured and unstructured data efficiently. Through a combination of automated processes and machine learning, the workshops will guide you through the essential functions of data ingestion, profiling, and curation of multiple data sources.

By the end of the workshops, you will have a comprehensive understanding of:

Key Concepts & Terminology

Familiarize yourself with the foundational terminology and concepts used within the Pentaho Data Catalog environment.

Connecting to various Data Sources

Learn how to establish connections to a wide range of data sources to enable data ingestion.

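Ingesting & Profiling Data

Discover the methods used for ingesting data and how profiling assists in understanding your data's structure and quality.

Business Glossary & Terms

Understand the significance of maintaining a business glossary and how it aids in aligning data with business terminology.

Rules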

Explore how metadata rules are applied to data within the Pentaho Data Catalog to ensure consistency and relevance.


Overview

The following video introduces the main topics covered in the Workshops.

To listen to the videos, copy and paste the website URL into Chrome on your host machine, as there's no sound card in the Lab environment.

Placeholder img for video ..

The following content has been automatically generated by an AI system and should be used for informational purposes only. We cannot guarantee the accuracy, completeness, or timeliness of the information provided. Any actions taken based on this content are at your own risk. We recommend seeking qualified expertise or conducting further research to validate and supplement the information provided.


Lab Environment ..

The video highlights the menu options which will help you get the best Lab experience ..

Placeholder img for video ..

Login OS
  • Username: pdc
  • Password: password

PDC - System Admin role
  • Username: system_admin@hv.com
  • Password: Welcome123!

MinIO
  • Username: minioadmin
  • Password: minioadmin

Portainer
  • Username: admin
  • Password: Portainer123


FAQs

VM is unresponsive!
  • Refresh the browser session to reconnect.

  • Try another browser. The recommended browser is Google Chrome.

  • If you're connecting via a corporate VPN, this may cause issues. Contact your IT department to get the URL whitelisted.

When does the Lab expire?

The initial duration is 5 days. You will receive an email asking if you wish to extend your time limit.

Is there sound?

Yes, but not inside the Lab. There's no sound card attached to the Lab, so you'll need to copy and paste the Lab Guide URL into a browser on your host machine. 😊

Where can I get a copy of the workshop files?

All the collateral can be found at ~/Workshop--Data-Integration. You can also clone or fork the Git repository:

gh repo clone jporeilly/Workshop--Pentaho-Data-Catalog

Corporate Headquarters Regional Contact Information

© Hitachi Vantara LLC 2025. All rights reserved. HITACHI is a trademark or registered trademark of Hitachi, Ltd. VSP is the trademark or registered trademark of Hitachi Vantara Corporation. All other trademarks, service marks and company names are properties of their respective owners.

Americas: +1 866 374 5822 or info@hitachivantara.com

Europe, Middle East and Africa: +44 (0) 1753 618000 or info.emea@hitachivantara.com

Asia Pacific: +852 3189 7900 or info.marketing.apac@hitachivantara.com