Pentaho Data Catalog
Observability

Last updated 8 months ago

Observability Stack Overview

Our observability infrastructure comprises various tools designed to capture and analyze telemetry data, ensuring the health and performance of our systems. The stack is structured around three primary data types: traces, logs, and metrics.

Metrics

Metrics collection uses the OpenTelemetry (OTel) Collector along with a suite of exporters included with Pentaho Data Catalog (PDC). These include the Node Exporter, MongoDB Exporter, and cAdvisor, among others. Together, they provide a comprehensive view of our system's performance and usage statistics.

Traces

For tracing, we employ the OpenTelemetry (Otel) Collector, which facilitates capturing and managing trace data across our distributed systems. This component is key for understanding request lifecycles and inter-service dependencies.

Logs

Log aggregation and management are handled by Fluent Bit, a lightweight log processor. Enhancements to log analysis and storage are planned as future work.

cAdvisor

cAdvisor (short for container Advisor) analyzes and exposes resource usage and performance data from running containers. It exposes Prometheus metrics out of the box.

cAdvisor is not enabled by default.

To enable cAdvisor:

  1. Navigate to the deployment folder and open the monitoring compose file:

cd
cd /opt/pentaho/pdc-docker-deployment/vendor
sudo nano docker-compose.mon.yml
  2. Scroll down to the mon-cadvisor: section.

  3. Uncomment all the lines in that section.

  4. Save and exit nano:

CTRL + o
ENTER
CTRL + x

Note that the service belongs to the profile: mon_enhanced


Enable the OTel Collector to scrape the cAdvisor metrics.

  1. Navigate to the collector configuration folder and open the configuration file:

cd
cd /opt/pentaho/pdc-docker-deployment/vendor/mon/otel_col
sudo nano otelcol-config.yml
  2. Uncomment the cadvisor entry.

  3. Save and exit nano:

CTRL + o
ENTER
CTRL + x

COMPOSE_PROFILES

Ensure the mon_enhanced profile has been enabled.

  1. Navigate to the deployment folder and open the environment file:

cd
cd /opt/pentaho/pdc-docker-deployment/vendor
sudo nano .env.default
  2. Add mon_enhanced to the COMPOSE_PROFILES value.

  3. Save and exit nano:

CTRL + o
ENTER
CTRL + x
  4. Restart PDC to deploy the cAdvisor container.

cd
cd /opt/pentaho/pdc-docker-deployment
./pdc.sh restart
  5. Check that the cAdvisor container is up and running.

docker ps -n 1

  1. Log into Portainer, either by clicking on the bookmark or by opening its URL in your browser.

  2. Enter the credentials:

Username: admin
Password: Portainer123

  3. Click on the 'Live Connect' option.

  4. Make a note of the mon_cadvisor-1 container IP address & port.

  5. In your browser, enter: http://[mon_cadvisor IP]:8080

Be aware that the IP & port are exposed.

OpenObserve is an observability platform that provides insight into the health and performance of IT systems. It integrates with a suite of tools for collecting metrics, traces, and logs, the three primary pillars of observability.

This section is for reference only.

OpenObserve has been enabled by default.

  1. Open the .env.default file:

cd
cd /opt/pentaho/pdc-docker-deployment/vendor
sudo nano .env.default
  2. Check that the following parameter has been added.

COMPOSE_PROFILES=mongodb,collab,pdso,mon_enhanced
  3. Save and exit nano:

CTRL + o
ENTER
CTRL + x
  4. Restart PDC to deploy the container.

cd
cd /opt/pentaho/pdc-docker-deployment
./pdc.sh up
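
Editing COMPOSE_PROFILES by hand makes it easy to add mon_enhanced twice or to leave a stray comma. As a minimal sketch, the following Python helper appends a profile only when it is missing. The name add_profile is hypothetical and not part of PDC, and the code assumes the value is a plain comma-separated list like the one shown above.

```python
def add_profile(env_line: str, profile: str) -> str:
    """Return the COMPOSE_PROFILES line with `profile` appended if absent.

    Hypothetical helper for illustration; assumes a plain comma-separated value.
    """
    key, sep, value = env_line.strip().partition("=")
    if key != "COMPOSE_PROFILES" or not sep:
        raise ValueError("not a COMPOSE_PROFILES line")
    # Split on commas and drop empty entries such as a trailing comma.
    profiles = [p.strip() for p in value.split(",") if p.strip()]
    if profile not in profiles:
        profiles.append(profile)
    return f"{key}={','.join(profiles)}"


print(add_profile("COMPOSE_PROFILES=mongodb,collab,pdso", "mon_enhanced"))
```

Running the helper on a line that already contains mon_enhanced returns the line unchanged, so it is safe to apply more than once.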

OpenObserve

  1. Log in using the credentials provided below:

Username: root@example.com
Password: Complexpass123

  2. Ensure that the organisation at the top right of the home page is set to: pdc

Logs

OpenObserve provides a centralized log management interface that allows users to easily search, filter, and analyze log data from various sources. This aids in troubleshooting issues and understanding the system's behavior over time.

  1. Select: Logs from the left-hand menu.

You can filter logs by specific fields or keywords and visualize the log events in chronological order. This enables quick identification of patterns and anomalies in the log data.

As you type the SELECT statement, you will be prompted with suggestions.

  2. Ensure you're in SQL mode and enter the following query.

SELECT * FROM "default" WHERE container_name='/pdc-mongodb-1' AND attr_principalname='root'

A simpler method is to select the field/value from the list.

  3. Delete the query and select the required fields.

This will also give us an idea of the number of records.
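
When the same kind of filter is needed for several containers or users, the query text can be generated instead of typed. The following Python sketch builds the statement shown above from a stream name and a set of field/value pairs. The function name build_log_query is an illustrative assumption, and the sketch only produces the query string; it does not connect to OpenObserve.

```python
def build_log_query(stream: str, filters: dict) -> str:
    """Build a SELECT statement with equality filters joined by AND.

    Illustrative helper only; single quotes in values are doubled, which is
    the standard SQL way to escape them.
    """
    query = f'SELECT * FROM "{stream}"'
    conditions = []
    for field, value in filters.items():
        escaped = str(value).replace("'", "''")
        conditions.append(f"{field}='{escaped}'")
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


print(build_log_query("default", {
    "container_name": "/pdc-mongodb-1",
    "attr_principalname": "root",
}))
```

The printed statement can be pasted into the query editor in SQL mode.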

Metrics

Each exporter utilizes a specific naming convention for its metrics, facilitating the identification of their sources.

For instance, metrics from the node exporter, responsible for server-specific data, start with node_*, while metrics collected by cAdvisor, which targets metrics from all Docker containers, begin with container_*
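
The prefix convention can be turned into a quick lookup when sorting through a long list of metric names. This is a minimal Python sketch that uses only the two prefixes mentioned above. The mapping and the function name exporter_for are illustrative assumptions, and other exporters would need their own entries.

```python
# Prefixes taken from the text above; other exporters would need their own entries.
EXPORTER_PREFIXES = {
    "node_": "Node Exporter",
    "container_": "cAdvisor",
}


def exporter_for(metric_name: str) -> str:
    """Return the exporter a metric most likely came from, based on its prefix."""
    for prefix, exporter in EXPORTER_PREFIXES.items():
        if metric_name.startswith(prefix):
            return exporter
    return "unknown"


print(exporter_for("node_memory_MemAvailable_bytes"))
```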

  1. Select: Metrics from the left-hand menu.

In this example: node_memory_MemAvailable_bytes

The screenshot displays the total memory (bytes) available for the last 4 days.

Choose any metric and confirm that:

  • A time series graph can be produced for any metric stored in Prometheus

  • Metric values are being recorded in real time

  • The PromQL editor can be used to drill down on metric values with a specific parameter.

Traces

Traces are one of the key components of OpenObserve.

Traces help you understand the flow of requests across multiple services in a distributed system. This is crucial for identifying bottlenecks and optimizing performance.

By linking errors to specific traces, you can quickly identify the root cause of issues and the context in which they occurred.

Traces reveal how different services interact, helping you understand and manage dependencies in your system.

With trace data, you can identify areas for potential optimization, such as reducing unnecessary API calls or improving database queries.

Enable Tracing

  1. Log into OpenObserve and select Ingestion -> Traces (OpenTelemetry)

  2. Make a note of the OTLP gRPC settings (you will have a different Auth token)

endpoint: localhost
headers:
  Authorization: "Basic cm9vdEBleGFtcGxlLmNvbTpKeFJHYTh6UEFSZUZUSUlt"
  organization: pdc
  stream-name: default
tls:
  insecure: false
  3. Edit the following OTel configuration file.

cd
sudo nano /opt/pentaho/pdc-docker-deployment/vendor/mon/otelcol.config.yml
  4. Restart PDC

cd
cd /opt/pentaho/pdc-docker-deployment
./pdc.sh restart
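
The Authorization value noted from the Ingestion page follows the standard HTTP Basic scheme: the word Basic followed by the Base64 encoding of user:secret. The Python sketch below shows how such a value is built and how the user part can be read back, which is a quick way to confirm that a copied token belongs to the expected account. The function names are illustrative, and "example-token" is a placeholder rather than a real credential.

```python
import base64


def basic_auth_header(user: str, secret: str) -> str:
    """Encode user:secret as an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{user}:{secret}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def user_from_header(header: str) -> str:
    """Decode a Basic header value and return only the user part."""
    scheme, _, token = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(token).decode("utf-8")
    return decoded.split(":", 1)[0]


# "example-token" is a placeholder; use the value shown on your Ingestion page.
header = basic_auth_header("root@example.com", "example-token")
print(user_from_header(header))
```

Because Base64 is an encoding and not encryption, anyone who can read the configuration file can recover the secret, so treat the file accordingly.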

Searching for a Specific Trace

To locate a specific trace by its traceID, you can refine your search by editing the query in the query editor. Use the field name trace_id to direct your search to a particular trace.

On the left side of the page, you'll find a list of field names that assist in filtering traces. For instance, to explore traces originating from the front end of the PDC, you can input the following query into the editor:

str_match(service_name, 'pdc-web-client')

This query retrieves all traces associated with the pdc-web-client service, allowing for a focused analysis of front-end activities.

Query Functions

OpenObserve supports a variety of functions to manipulate and analyze data effectively. These functions can be used within queries to perform operations like aggregations, calculations, and transformations on collected metrics, traces, and logs.

Aggregation Functions: Functions such as SUM(), AVG(), and COUNT() allow for the aggregation of data points over a specified interval.

Transformation Functions: Functions like TOPK(), PERCENTILE(), and RATE() help in transforming raw data into useful insights.

Math Functions: Basic arithmetic functions (+, -, *, /) can be applied to metrics for custom calculations.

String Functions: Functions such as str_match() and str_replace() aid in manipulating text-based log and trace data.

Combining these functions within queries can help in deriving meaningful and actionable insights from your telemetry data.
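
To make the aggregation idea concrete, here is a conceptual Python sketch of what SUM, AVG, and COUNT over a fixed interval compute. It does not call OpenObserve or reproduce its query engine. The function name and sample data are invented for illustration.

```python
from collections import defaultdict


def aggregate_by_interval(points, interval_seconds):
    """Group (timestamp, value) pairs into fixed buckets and aggregate each.

    Returns {bucket_start: {"sum": ..., "avg": ..., "count": ...}}.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each timestamp to the start of its interval.
        bucket_start = timestamp - (timestamp % interval_seconds)
        buckets[bucket_start].append(value)
    return {
        start: {
            "sum": sum(values),
            "avg": sum(values) / len(values),
            "count": len(values),
        }
        for start, values in sorted(buckets.items())
    }


# Invented sample: (unix_seconds, value) pairs, aggregated per 60 seconds.
samples = [(0, 10.0), (30, 20.0), (60, 5.0), (90, 15.0), (119, 10.0)]
for start, stats in aggregate_by_interval(samples, 60).items():
    print(start, stats)
```

Each bucket corresponds to one point on a time-series panel when a query aggregates over that interval.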

VRL Functions

OpenObserve integrates VRL (Vector Remap Language) for complex data transformations and log manipulations. VRL functions provide a powerful way to convert, enrich, and process logs, metrics, and traces with ease.

Basic Functions: Use functions like parse_json(), to_string(), and to_int() for basic data type conversions.

Conditional Logic: Implement conditional logic with if, else if, and else expressions.

String Functions: Manipulate text data using functions such as upcase(), downcase(), strip_whitespace(), and slice().

Log Enrichment: Enrich logs with metadata or additional context by assigning new fields directly or by combining objects with merge().

As a simple example we're going to add a timestamp to the logs.

Let's take a look at the different fields in the default log stream.

  1. Click on a row and select table to view the fields.

We're going to add a formatted_date field to the logs to help with selecting date ranges in our dashboards.

  1. The following VRL function adds a formatted_date field to the logs.

# Create a timestamp (current time)
current_timestamp = now()

# Format the current timestamp into a standard date format
.formatted_date, err = format_timestamp(current_timestamp, "%Y-%m-%d %H:%M:%S")

# Return the modified event
.
  2. Copy and paste the following into the VRL Function Editor.

current_timestamp = now()
.formatted_date, err = format_timestamp(current_timestamp, "%Y-%m-%d %H:%M:%S")
.
  3. Execute the VRL function and check the log records for the new field.

  4. Save the function so that you can apply it to all incoming pdc logs.

  5. Click on the function option in the sidebar and select: Stream Association.

  6. Highlight the default logs stream and associate the formatted_date function.

The function will now be applied to all incoming pdc logs.
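
The format string passed to format_timestamp uses strftime-style specifiers. As a rough illustration of the value that ends up in formatted_date, this Python sketch applies the same specifiers to a fixed, invented timestamp. Python's strftime is only a stand-in here, and it is an assumption that the VRL output matches it for these particular specifiers.

```python
from datetime import datetime, timezone

# Same specifiers as in the VRL function above.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def format_like_vrl(moment: datetime) -> str:
    """Format a datetime using the specifiers from the VRL example."""
    return moment.strftime(DATE_FORMAT)


# Invented timestamp for illustration; the VRL function uses now() instead.
example = datetime(2024, 3, 5, 14, 7, 9, tzinfo=timezone.utc)
print(format_like_vrl(example))  # 2024-03-05 14:07:09
```

Because the VRL function calls now(), formatted_date records when the log entry was processed, which may differ slightly from when the event originally occurred.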

Alerts in OpenObserve

Alerts in OpenObserve enable proactive monitoring by notifying users of anomalies or performance issues in real-time.

  1. Click on the "Alerts" option from the left-hand menu.

  2. Click on 'Create Template'.

Dashboards in OpenObserve

Dashboards in OpenObserve offer a consolidated view of metrics, logs, and traces, enabling a holistic perspective on system health and performance.

Let's create a simple dashboard that monitors OS resources.

  1. Click on the "Dashboards" option from the left-hand menu.

  2. Click on 'New Folder' and enter the folder details.

  3. Click 'Save'.

  4. Click on 'New Dashboard' and enter the dashboard details.

  5. Click 'Save'.

  6. Start by clicking on 'ADD PANEL'.

  7. Use the editor to add widgets for visualizing telemetry data.

  • Graphs: Plot time-series data for real-time monitoring.

  • Tables: Display logs or metrics in tabular form.

  • Heatmaps: Identify patterns and anomalies.

Community Dashboards

Explore and import pre-built dashboards shared by the OpenObserve community. These can serve as a quick start for common monitoring scenarios.

  1. To import a dashboard, browse to:

~/Workshop--Pentaho-Data-Catalog/Dashboards

Open your web browser and navigate to the following URLs:

Portainer: https://localhost:9443/#!/auth
OpenObserve: http://localhost/internal/openobserve/web