Observability
Our observability stack is a set of tools that capture and analyze telemetry data so we can monitor the health and performance of our systems. It is organized around three types of telemetry: metrics, traces, and logs.
Metrics are collected by the OpenTelemetry (OTel) Collector together with a set of exporters bundled in our Platform Data Collection (PDC) framework. These include the Node Exporter (host and OS metrics), the MongoDB Exporter (database metrics), and cAdvisor (container metrics), among others. Together, they provide performance and usage statistics across our hosts, databases, and containers.
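Each of these exporters serves its metrics over HTTP in the Prometheus text exposition format, which a scraper polls on an interval. The minimal sketch below shows what such an endpoint produces. It assumes the Collector scrapes the exporters through its Prometheus receiver (which receiver PDC uses is not stated here); the metric name `app_requests_total`, its `method` label, and the `serve` helper are hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label_value(value: str) -> str:
    # Label values must escape backslash, double quote and newline (backslash first).
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name: str, help_text: str, samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one counter family in the Prometheus text exposition format."""
    # HELP text escapes only backslash and newline.
    help_escaped = help_text.replace("\\", "\\\\").replace("\n", "\\n")
    lines = [f"# HELP {name} {help_escaped}", f"# TYPE {name} counter"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical in-process counters reported by the endpoint.
REQUEST_COUNTS = {"GET": 0, "POST": 0}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_counter(
            "app_requests_total",
            "Total HTTP requests handled.",
            [({"method": m}, c) for m, c in REQUEST_COUNTS.items()],
        ).encode("utf-8")
        self.send_response(200)
        # Content type of the classic Prometheus text format.
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int) -> None:
    # Blocks forever; run it in its own process or thread.
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve(9400)` (an arbitrary port) and fetching `http://localhost:9400/metrics` returns the same kind of plain-text payload that the Node Exporter, MongoDB Exporter, and cAdvisor expose, just with far fewer series.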
For tracing, we also use the OpenTelemetry Collector, which captures and manages trace data across our distributed systems. This is key to understanding request lifecycles and inter-service dependencies.
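Stitching spans from different services into one request lifecycle depends on every hop forwarding a shared trace ID. OpenTelemetry SDKs propagate it by default in the W3C Trace Context `traceparent` header. The stdlib-only sketch below illustrates the header's version-00 format; it is not our services' instrumentation (in practice an OpenTelemetry SDK does this automatically), and the helper names are hypothetical.

```python
import os
import re
from typing import NamedTuple, Optional

# Version 00 of the W3C traceparent header: version-traceid-parentid-flags.
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    flags: str


def _random_hex(num_bytes: int) -> str:
    # All-zero IDs are invalid per the spec, so draw again in that (unlikely) case.
    while True:
        value = os.urandom(num_bytes)
        if any(value):
            return value.hex()


def parse_traceparent(header: str) -> Optional[TraceParent]:
    match = _TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceParent(trace_id, parent_id, flags)


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace, e.g. at the first service a request reaches."""
    return f"00-{_random_hex(16)}-{_random_hex(8)}-{'01' if sampled else '00'}"


def child_traceparent(incoming: str) -> Optional[str]:
    """Header for an outgoing call: same trace ID, new span ID, flags carried over."""
    parent = parse_traceparent(incoming)
    if parent is None:
        return None  # invalid input: the caller should start a new trace instead
    return f"00-{parent.trace_id}-{_random_hex(8)}-{parent.flags}"
```

Because every downstream call reuses the trace ID, the Collector's backend can group all spans of one request and reconstruct which service called which.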
Log aggregation and management are handled by Fluent Bit, a lightweight log processor and forwarder. Fluent Bit is also central to our planned work on enhancing log analysis and storage; updates will follow in this area.
cAdvisor (short for Container Advisor) analyzes resource usage and performance data from running containers and exposes it as Prometheus metrics out of the box.
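Because the output is plain Prometheus text, it can be inspected without extra tooling. The sketch below parses it with the standard library and reports per-container memory from `container_memory_usage_bytes`, keyed by the `name` label cAdvisor attaches to container series. It is a simplified parser (it skips HELP/TYPE comments and expects one sample per line, with the optional trailing timestamp the format allows), and the URL in `fetch_metrics` is an assumption based on cAdvisor's default port 8080; the port PDC publishes may differ.

```python
import re
import urllib.request
from typing import Dict, Iterator, Tuple

# One sample per line: name{labels} value [timestamp]
_SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>.*)\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+(?P<ts>-?\d+))?\s*$"
)
# key="value" pairs; values may contain escaped quotes or backslashes.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def _unescape(value: str) -> str:
    # Undo the format's escaping: \n becomes a newline, \\ and \" become \ and ".
    return re.sub(r"\\(.)", lambda m: "\n" if m.group(1) == "n" else m.group(1), value)


def parse_samples(text: str) -> Iterator[Tuple[str, Dict[str, str], float]]:
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = {k: _unescape(v) for k, v in _LABEL_RE.findall(match.group("labels") or "")}
        yield match.group("name"), labels, float(match.group("value"))


def memory_by_container(text: str) -> Dict[str, float]:
    usage: Dict[str, float] = {}
    for name, labels, value in parse_samples(text):
        # Series with an empty name label (e.g. the root cgroup id="/") are skipped.
        if name == "container_memory_usage_bytes" and labels.get("name"):
            usage[labels["name"]] = value
    return usage


def fetch_metrics(url: str = "http://localhost:8080/metrics", timeout: float = 5.0) -> str:
    # Assumed URL: 8080 is cAdvisor's default listening port.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `memory_by_container(fetch_metrics())` returns a mapping from container name to current memory usage in bytes.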
cAdvisor is not enabled by default.
To enable cAdvisor:
1. Navigate to the deployment folder:
2. Scroll down to the mon-cadvisor: section.
3. Uncomment all the lines:
4. Save the file.

Note the profile: mon_enhanced.
Next, uncomment the cadvisor entry:
1. Navigate to the deployment folder:
2. Uncomment cadvisor.
3. Save the file.
Then ensure the mon_enhanced profile is enabled:
1. Navigate to the deployment folder:
2. Add mon_enhanced.
3. Save the file.
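If the enabled profiles are kept in a Docker Compose `.env` file as a comma-separated `COMPOSE_PROFILES` entry (an assumption; the file PDC actually uses is the one referenced in the step above), adding mon_enhanced can be scripted idempotently. In this sketch, `add_profile` and `add_profile_to_file` are hypothetical helpers, and quoted values or `export` prefixes are not handled.

```python
from pathlib import Path


def add_profile(env_text: str, profile: str, key: str = "COMPOSE_PROFILES") -> str:
    """Return env_text with `profile` added to the comma-separated `key` entry."""
    lines = env_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(f"{key}="):
            profiles = [p.strip() for p in line[len(key) + 1:].split(",") if p.strip()]
            if profile not in profiles:  # idempotent: never add a duplicate
                profiles.append(profile)
            lines[i] = f"{key}={','.join(profiles)}"
            break
    else:
        # No existing entry: create one.
        lines.append(f"{key}={profile}")
    return "\n".join(lines) + "\n"


def add_profile_to_file(path: str, profile: str = "mon_enhanced") -> None:
    # Hypothetical path argument; point it at the file used by your deployment.
    env_file = Path(path)
    text = env_file.read_text() if env_file.exists() else ""
    env_file.write_text(add_profile(text, profile))
```

Running it twice leaves the file unchanged the second time, so it is safe to include in a setup script.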
Restart PDC to deploy the cAdvisor container, then check that the cAdvisor container is up and running.
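One way to script the final check is to query cAdvisor's `/healthz` endpoint, which responds with `ok` when the process is healthy, and confirm that `/metrics` is serving container series. The sketch below assumes cAdvisor is reachable at `http://localhost:8080` (cAdvisor's default port; the port PDC publishes may differ). The function names are hypothetical, and `fetch` is injectable so the logic can be exercised without a live container.

```python
import urllib.request
from typing import Callable, Optional, Tuple

Fetch = Callable[[str], Tuple[int, str]]


def _http_get(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", "replace")


def _get(base_url: str, path: str, fetch: Optional[Fetch]) -> Optional[Tuple[int, str]]:
    url = base_url.rstrip("/") + path
    try:
        return (fetch or _http_get)(url)
    except OSError:
        # Connection refused, timeouts and HTTP error statuses (urllib's
        # URLError/HTTPError subclass OSError) all count as "not up".
        return None


def cadvisor_is_healthy(base_url: str = "http://localhost:8080", fetch: Optional[Fetch] = None) -> bool:
    result = _get(base_url, "/healthz", fetch)
    return result is not None and result[0] == 200 and result[1].strip() == "ok"


def cadvisor_exports_metrics(base_url: str = "http://localhost:8080", fetch: Optional[Fetch] = None) -> bool:
    result = _get(base_url, "/metrics", fetch)
    # cAdvisor's per-container series share the container_ prefix.
    return result is not None and result[0] == 200 and "container_" in result[1]
```

After the restart, `cadvisor_is_healthy() and cadvisor_exports_metrics()` should return `True` once the container is serving.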