Process the data
Collect statistical and informative summaries about the data.
After successfully ingesting the metadata from our database schemas, our next step focuses on the data itself, starting with the 'Patient' table. This phase is essential for evaluating the data's structure, quality, and integrity, so that we can ensure the database remains efficient and effective. So far the catalog holds only database metadata; Data Profiling will offer deeper insights, which we will explore later in this workshop.
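To make "statistical and informative summaries" more concrete, here is a minimal Python sketch of the kind of per-column statistics a profiling step typically reports: row count, null count, distinct count, minimum, maximum, and most frequent value. It is a generic illustration under stated assumptions, not how Pentaho Data Catalog computes its profiles; the sample rows, column names, and the helper names `profile_column` and `profile_table` are hypothetical.

```python
from collections import Counter
from typing import Any

# Hypothetical sample rows standing in for a few records of the patient table.
# Column names are illustrative assumptions, not the real Synthea schema.
ROWS: list[dict[str, Any]] = [
    {"id": "p1", "birthdate": "1980-02-14", "gender": "F", "passport": "X123"},
    {"id": "p2", "birthdate": "1975-07-01", "gender": "M", "passport": None},
    {"id": "p3", "birthdate": "1992-11-30", "gender": "F", "passport": "X456"},
]


def profile_column(values: list[Any]) -> dict[str, Any]:
    """Summarize one column: row count, nulls, distinct values, min/max, most common value."""
    non_null = [v for v in values if v is not None]
    most_common = Counter(non_null).most_common(1)
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "top": most_common[0][0] if most_common else None,
    }


def profile_table(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Profile every column that appears in at least one row (missing keys count as nulls)."""
    columns = {key for row in rows for key in row}
    return {col: profile_column([row.get(col) for row in rows]) for col in sorted(columns)}


if __name__ == "__main__":
    for column, stats in profile_table(ROWS).items():
        print(column, stats)
```

A null count on a field such as 'passport' is exactly the kind of quality signal profiling surfaces and that metadata alone cannot show.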
To access your catalog, please follow these steps:
Open the Google Chrome web browser and click the bookmark, or
Navigate to: https://pdc.pentaho.example/
Enter the following email and password, then click Sign In.
Username
data_steward@hv.com
Password
Welcome123!
For enhanced security, it is strongly recommended that users avoid saving their login details directly in web browsers. Browsers may inadvertently autofill these credentials in unrelated fields, posing a security risk.
Best Practice
• Disable Autofill: To mitigate potential risks, users should disable the autofill functionality for login credentials in their browser settings. This preventive measure ensures that sensitive information is not unintentionally exposed or misused.
Select 'Data Canvas' from the left-hand menu.
Click the checkbox to select the 'synthea' schema.
For optimal performance, keep your selection to a practical size: processing a very large number of tables (for example, 100,000) can drastically slow down the job. The default settings on the Configure Data Profiling page meet most needs and are recommended.
Click 'Process'.
When managing both structured and unstructured data, two critical steps stand out: Metadata Ingest and Data Profiling. Understanding the distinction between them is essential for ensuring data quality and accessibility.
Metadata ingest is a foundational process in data management within a Data Catalog. It involves the automatic collection of metadata (data about data) from a database schema, file, or object. This step is crucial for understanding and organizing the data, making it easily accessible for further analysis and data profiling.
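The sketch below illustrates what collecting metadata from a database schema means in practice, assuming a SQLite database stands in for the source system (other engines expose similar information through `information_schema`). It reads only table and column definitions, never row values, which is what separates metadata ingest from data profiling. The function name `ingest_metadata` and the demo `patients` table are hypothetical; Pentaho Data Catalog performs its own ingest through the steps described in this section.

```python
import sqlite3


def ingest_metadata(conn: sqlite3.Connection) -> dict[str, list[dict[str, object]]]:
    """Collect table and column metadata (name, declared type, nullability, PK flag)."""
    catalog: dict[str, list[dict[str, object]]] = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = table.replace('"', '""')  # escape the identifier for the PRAGMA
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        catalog[table] = [
            {"name": name, "type": col_type, "nullable": not notnull, "primary_key": bool(pk)}
            for _cid, name, col_type, notnull, _default, pk in columns
        ]
    return catalog


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical, cut-down 'patients' table; a real schema has many more columns.
    conn.execute(
        "CREATE TABLE patients (id TEXT PRIMARY KEY, birthdate DATE NOT NULL, passport TEXT)"
    )
    for table, columns in ingest_metadata(conn).items():
        print(table, columns)
```

Because no rows are read, this kind of scan stays fast even on large tables, which is one reason catalogs typically ingest metadata first and profile data afterwards.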
Navigate to the metadata ingest section of your Data Catalog tool and initiate the process by clicking the Start button.
Users can select specific tables or datasets for metadata ingestion. For example, if you are interested in patient information, you might expand the 'patients' table and select relevant fields such as 'passport'.
After starting the ingest process, monitor its progress on the Manage Workers page. This page provides real-time updates on the ingestion task.