Data Sources

Adding Data Sources

Data Source Configuration Guide

Before integrating your data sources, it's essential to collect all the necessary configuration details. This guide outlines the key pieces of information required to establish a connection to your data sources. Your database administrator (DBA) will be a valuable resource in providing this configuration information.

  • URI (Uniform Resource Identifier): The address used to locate your data source. A username and password are typically also needed to authenticate the connection.

  • Driver: Ensure you have the appropriate driver for your data source. This is crucial for enabling your application to communicate with the database.

Link to download database drivers
  • Credentials: Username and password are fundamental for authentication.

  • Host Name and Port Number: These are required to point your application towards the right server and port where your database is running.

  • Key Store and Trust Store Configuration:

* Key Store Type and Location: Specifies the type (e.g., JKS) and location of the key store file.

* Key Store Password: The password to access the key store.

* Trust Store Type and Location: Defines the type and location of the trust store file.

* Trust Store Password: The password required to access the trust store.

* Cipher Suite: A list of encryption algorithms supported for SSL/TLS connections.

  • Encryption Type:

* Encryption Only: Data is encrypted in transit, but authentication is not required.

* Encryption with Server and Client Authentication: The client and server authenticate each other, which provides a higher level of security.

  • SSL Configuration: Secure connections require SSL configuration details, including the cipher suite and the key store and trust store information.

  • Data Source Type: Identifying the type of data source (e.g., SQL database, NoSQL database, file system) is crucial for selecting the correct driver and configuration settings.

  • Configuration Method: Decide on the approach for configuring your data source. This can be via direct credentials, SSL, or a URI.

Gathering this information beforehand will streamline the process of adding your data sources. Your DBA is the best source for most of these configuration details.
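The checklist above can be captured in a small structure and checked for gaps before you open the Add Data Source dialog. The following is only a sketch: the class, field, and method names are hypothetical and are not part of Data Catalog, and the rule that mutual authentication needs both a trust store and a key store is an assumption based on the descriptions above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class EncryptionType(Enum):
    NONE = "none"
    ENCRYPTION_ONLY = "encryption_only"
    MUTUAL_AUTH = "server_and_client_authentication"


@dataclass
class DataSourceChecklist:
    """Hypothetical container for the details gathered from your DBA."""
    data_source_type: str
    host: str
    port: int
    username: str
    password: str
    driver: str
    encryption: EncryptionType = EncryptionType.NONE
    trust_store_location: Optional[str] = None
    key_store_location: Optional[str] = None

    def missing_items(self) -> List[str]:
        """Return the names of required items that are empty or invalid."""
        missing = []
        for name in ("data_source_type", "host", "username", "password", "driver"):
            if not getattr(self, name):
                missing.append(name)
        if not 1 <= self.port <= 65535:
            missing.append("port")
        # Assumption: mutual authentication needs a trust store (to verify
        # the server) and a key store (to present the client certificate).
        if self.encryption is EncryptionType.MUTUAL_AUTH:
            if not self.trust_store_location:
                missing.append("trust_store_location")
            if not self.key_store_location:
                missing.append("key_store_location")
        return missing
```

An empty list from `missing_items()` suggests that everything on the checklist has been collected. It does not prove that the values are correct.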


Accessing Your Catalog

To access your catalog, please follow these steps:

  1. Open the Google Chrome web browser and click the bookmark, or navigate to: https://pdc.pdc.lab/

  2. Enter the following email and password, then click Sign In.

Password: Welcome123!

Security Advisory: Handling Login Credentials

  3. Click on: Management -> Resources tile.

Resources
  4. Click on: Add Data Source.

  5. Specify the following basic information for the connection to your data source (you'll find the connection details in the table below these descriptions):

| Field | Description |
| --- | --- |
| Data Source Name | Specify the name of your data source. This name is used in the Data Catalog interface. It should be something your Data Catalog users recognize. Names must start with a letter, and must contain only letters, digits, and underscores. White spaces in names are not supported. |
| Data Source ID (Optional) | Specify a permanent identifier for your data source. If you leave this field blank, Data Catalog generates a permanent identifier for you. You cannot modify Data Source ID for this data source after you specify or generate it. |
| Description (Optional) | Specify a description of your data source. |
| Data Source Type | Select the database type of your source. You are then prompted to specify additional connection information based on the file system or database type you are trying to access. |
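The naming rule for Data Source Name (start with a letter, then only letters, digits, and underscores) can be checked before you type a name into the form. This is a minimal sketch, assuming "letters" means ASCII letters; the function name is hypothetical and the product may apply additional rules.

```python
import re

# Assumption: "letters" means ASCII A-Z and a-z.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_data_source_name(name: str) -> bool:
    """Return True if the whole name matches the stated naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_data_source_name("synthea_db")` passes the check, while a name containing a space or starting with a digit does not.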

After you have specified the basic information, specify the following additional connection information based on the file system or database type you are trying to access.

Field
Description

Affinity

This default setting specifies which agents should be associated with the data source in a multi-agent deployment.

Configuration Method: Select Credentials or URI as a configuration method.

Configuration Method: Credentials

• Username/Password: Credentials that provide access to the specified database.

• Host: The address of the machine where the Microsoft SQL database server is running. It can be an IP address or a domain name.

• Port: The port number on which the database server is listening for incoming connections. The default port for Microsoft SQL Server is 1433; PostgreSQL, used in the demo data source, defaults to 5432.

Configuration Method: URI

• Username/Password: Credentials that provide access to the specified database.

• Service URI: The connection string for the data source. For example: Server=myServerAddress;Database=myDatabase;User Id=myUsername;Password=myPassword;Port=1433;Integrated Security=False;Connection Timeout=30;

Driver

Select an existing driver or upload a new driver to ensure that the communication between the application and the database is efficient, secure, and follows the required standards.

Database Name

The name of the database within the Microsoft SQL server that you want to connect with.
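To see how the Service URI example relates to the individual Credentials fields, the sketch below splits a semicolon-separated `Key=Value` string into its parts. It is illustrative only: the function name is hypothetical, and it does not handle quoted values that contain semicolons.

```python
from typing import Dict


def parse_connection_string(conn: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value;' pairs into a dictionary."""
    settings: Dict[str, str] = {}
    for part in conn.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after a trailing semicolon
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"Expected Key=Value, got: {part!r}")
        settings[key.strip()] = value.strip()
    return settings
```

Applied to the example string, the `Server`, `Port`, and `Database` entries correspond to the Host, Port, and Database Name fields of the Credentials method.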


Connect to Demo Data Sources

Follow the steps below to connect to one of the demo datasets. In this workshop we're going to connect to the Synthea dataset, stored on a PostgreSQL database:

  1. To install the 'Synthea' demo datasource, click on the PostgreSQL tab below:

Synthea™ is an open-source tool that generates synthetic patient data, simulating individuals' complete medical histories. This encompasses medications, allergies, encounters, and social health determinants for each mock patient.

Synthea

The generated data is free from legal and privacy concerns.

synthea - ERD

Create a connection to the Synthea dataset, then ingest the database schema.

Synthea dataset

Follow the steps below to connect and ingest the schema metadata:

Test Connection and Ingest Metadata Schema

After you have specified the detailed information according to your data source type, test the connection to the data source and add the data source.

  1. Enter the following details to connect to: PostgreSQL business_apps_db (Synthea) database.

| Field | Setting |
| --- | --- |
| Data Source Name | postgresql:synthea |
| Data Source ID | Leave Blank to autogenerate |
| Description | Demo dataset of patients medical records |
| Data Source Type | PostgreSQL |
| Affinity | Default |
| Configuration Method | Credentials |
| Username | sqlreader |
| Password | 2Petabytes |
| *Host | pdc.pdc.lab |
| Port | 5432 |
| **Driver | postgresql-42.7.1.jar |
| Database Name | business_apps_db |


  2. Click Test Connection to test your connection to the specified data source.

  3. Take a look at the 'workers' to check for any issues.

Worker - Test Connection
  4. Click Ingest Schema, select the 'synthea' schema, and then click Ingest Schemas.

Select schemas

While you have the option to select all schemas, it is advisable to exclude system-related schemas that are not relevant to your requirements.

Ingesting Schemas
  5. (Optional) Enter a Note for any information you need to share with others who might access this data source.

  6. Click: Create Data Source to establish your data source connection.
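If Test Connection fails, it can help to confirm that the host and port are reachable from your machine at all. The sketch below is a plain TCP check using Python's standard library. It is not how Data Catalog tests connections, the function name is hypothetical, and a successful check says nothing about credentials or the driver.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

In the workshop environment you might call `can_reach("pdc.pdc.lab", 5432)`. A `False` result points to a network, DNS, or firewall issue rather than a problem with the data source settings.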

postgresql:synthea connection
Connection details
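The advice to exclude system-related schemas when ingesting can be expressed as a simple filter. This sketch assumes a PostgreSQL source, where `information_schema` and schemas whose names begin with `pg_` are system schemas. The function name is hypothetical, and other database types will need a different exclusion list.

```python
from typing import Iterable, List

# PostgreSQL system schemas; names starting with "pg_" are reserved as well.
SYSTEM_SCHEMAS = {"information_schema", "pg_catalog"}


def user_schemas(names: Iterable[str]) -> List[str]:
    """Keep only schemas that are not PostgreSQL system schemas."""
    return [
        n for n in names
        if n not in SYSTEM_SCHEMAS and not n.startswith("pg_")
    ]
```

Given a schema list that contains `synthea` alongside `pg_catalog` and `information_schema`, only `synthea` and any other user schemas remain.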
