Monitoring API
The Monitoring API provides endpoints for tracking application performance, health status, and error monitoring. These endpoints are designed for operational observability and debugging.
Overview
The Monitoring API tracks:
- Request performance metrics (duration, success rate, percentiles)
- Slowest operations (top 20)
- Recent errors with full context
- System health (uptime, memory, CPU usage)
Key Features:
- Automatic performance tracking for all API endpoints
- In-memory metrics buffer (last 1000 requests)
- Slow request detection (>200ms threshold)
- Real-time health monitoring
- No authentication required (public endpoints)
Base URL
https://api.boardapi.io/api/v1/monitoring
For development:
http://localhost:4000/api/v1/monitoring
Endpoints Overview
| Method | Endpoint | Description |
|---|---|---|
| GET | /monitoring/performance/stats | Get performance statistics |
| GET | /monitoring/performance/slowest | Get slowest operations |
| GET | /monitoring/performance/errors | Get recent errors |
| GET | /monitoring/health | Health check endpoint |
Get Performance Statistics
Returns aggregated performance metrics for the application.
Request
GET /api/v1/monitoring/performance/stats
Authentication: None (public endpoint)
Response
{
"timestamp": "2025-11-19T10:30:45.123Z",
"stats": {
"totalRequests": 1543,
"successRate": 98.5,
"averageDuration": 45.3,
"p50": 32,
"p95": 180,
"p99": 350,
"slowRequests": 23,
"errors": 12
}
}
Response Fields
| Field | Type | Description |
|---|---|---|
| timestamp | string | ISO 8601 timestamp of the response |
| stats.totalRequests | number | Total number of tracked requests in buffer |
| stats.successRate | number | Success rate as percentage (0-100) |
| stats.averageDuration | number | Average request duration in milliseconds |
| stats.p50 | number | 50th percentile (median) duration in ms |
| stats.p95 | number | 95th percentile duration in ms |
| stats.p99 | number | 99th percentile duration in ms |
| stats.slowRequests | number | Number of slow requests (>200ms) |
| stats.errors | number | Number of failed requests |
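The percentile fields summarize the request durations currently held in the buffer. The following is a minimal illustration of what p50/p95/p99 mean; the service's exact computation may differ.
// Illustrative percentile calculation over a list of request durations (ms)
function percentile(durations, p) {
  const sorted = [...durations].sort((a, b) => a - b);
  const index = Math.max(0, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[index];
}

const durations = [12, 20, 32, 33, 45, 180, 350];
console.log(percentile(durations, 50)); // 33 (median)
console.log(percentile(durations, 95)); // 350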
Example
curl http://localhost:4000/api/v1/monitoring/performance/stats
Use Cases:
- Dashboard monitoring
- Performance trend analysis
- SLA compliance verification
- Capacity planning
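For SLA compliance verification, a small script can compare these statistics against target values. A minimal sketch, assuming Node.js 18+ (global fetch) and example thresholds; adjust the base URL and limits to your deployment.
// Sketch: compare /performance/stats against example SLA targets
const { stats } = await (await fetch('http://localhost:4000/api/v1/monitoring/performance/stats')).json();

const violations = [];
if (stats.successRate < 99) violations.push(`success rate ${stats.successRate}% below 99%`);
if (stats.p95 > 500) violations.push(`p95 ${stats.p95}ms above 500ms`);

console.log(violations.length ? `SLA violations: ${violations.join('; ')}` : 'SLA targets met');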
Get Slowest Operations
Returns the 20 slowest successful operations tracked in the buffer.
Request
GET /api/v1/monitoring/performance/slowest
Authentication: None (public endpoint)
Response
{
"timestamp": "2025-11-19T10:30:45.123Z",
"operations": [
{
"requestId": "1732012245123-abc123xyz",
"className": "BoardsController",
"handlerName": "create",
"method": "POST",
"url": "/api/v1/boards",
"duration": 485,
"success": true,
"timestamp": "2025-11-19T10:25:30.000Z"
},
{
"requestId": "1732012200456-def456uvw",
"className": "WebhooksService",
"handlerName": "deliverWebhook",
"method": "POST",
"url": "/api/v1/webhooks/subscriptions",
"duration": 420,
"success": true,
"timestamp": "2025-11-19T10:20:15.000Z"
}
]
}
Response Fields
| Field | Type | Description |
|---|---|---|
| timestamp | string | ISO 8601 timestamp of the response |
| operations | array | Array of performance metrics (max 20) |
| operations[].requestId | string | Unique request identifier |
| operations[].className | string | NestJS controller/service class name |
| operations[].handlerName | string | Method/handler name |
| operations[].method | string | HTTP method (GET, POST, etc.) or "WS" for WebSocket |
| operations[].url | string | Request URL or handler name |
| operations[].duration | number | Request duration in milliseconds |
| operations[].success | boolean | Whether request succeeded |
| operations[].timestamp | string | ISO 8601 timestamp of request |
Example
curl http://localhost:4000/api/v1/monitoring/performance/slowest
Use Cases:
- Identifying performance bottlenecks
- Optimization prioritization
- Database query analysis
- API endpoint optimization
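Because the list is limited to 20 entries, grouping it by class and handler is a quick way to see which operations keep reappearing. A minimal sketch, assuming Node.js 18+ and the response shape shown above:
// Sketch: group the slowest operations by handler to find recurring bottlenecks
const { operations } = await (await fetch('http://localhost:4000/api/v1/monitoring/performance/slowest')).json();

const byHandler = {};
for (const op of operations) {
  const key = `${op.className}.${op.handlerName}`;
  byHandler[key] = byHandler[key] || { count: 0, maxDuration: 0 };
  byHandler[key].count += 1;
  byHandler[key].maxDuration = Math.max(byHandler[key].maxDuration, op.duration);
}

console.table(byHandler); // handlers that appear repeatedly are good optimization targets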
Get Recent Errors
Returns the 20 most recent failed requests with error details.
Request
GET /api/v1/monitoring/performance/errors
Authentication: None (public endpoint)
Response
{
"timestamp": "2025-11-19T10:30:45.123Z",
"errors": [
{
"requestId": "1732012300789-ghi789rst",
"className": "BoardsController",
"handlerName": "findOneByToken",
"method": "GET",
"url": "/api/v1/boards/invalid-uuid",
"duration": 15,
"success": false,
"error": "Board not found",
"timestamp": "2025-11-19T10:28:30.000Z"
},
{
"requestId": "1732012150234-jkl012mno",
"className": "AuthController",
"handlerName": "validateBoardToken",
"method": "POST",
"url": "/api/v1/auth/validate-board-token",
"duration": 8,
"success": false,
"error": "Invalid or expired token",
"timestamp": "2025-11-19T10:15:45.000Z"
}
]
}
Response Fields
| Field | Type | Description |
|---|---|---|
| timestamp | string | ISO 8601 timestamp of the response |
| errors | array | Array of error metrics (max 20) |
| errors[].requestId | string | Unique request identifier |
| errors[].className | string | NestJS controller/service class name |
| errors[].handlerName | string | Method/handler name where error occurred |
| errors[].method | string | HTTP method or "WS" for WebSocket |
| errors[].url | string | Request URL or handler name |
| errors[].duration | number | Request duration before failure (ms) |
| errors[].success | boolean | Always false for errors |
| errors[].error | string | Error message |
| errors[].timestamp | string | ISO 8601 timestamp of error |
Example
curl http://localhost:4000/api/v1/monitoring/performance/errors
Use Cases:
- Error rate monitoring
- Debugging production issues
- Error pattern detection
- Alerting and incident response
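For error pattern detection, the returned errors can be grouped by message. A minimal sketch, assuming Node.js 18+:
// Sketch: count recent errors by message to spot recurring failure patterns
const { errors } = await (await fetch('http://localhost:4000/api/v1/monitoring/performance/errors')).json();

const byMessage = {};
for (const e of errors) {
  byMessage[e.error] = (byMessage[e.error] || 0) + 1;
}

for (const [message, count] of Object.entries(byMessage).sort((a, b) => b[1] - a[1])) {
  console.log(`${count}x ${message}`);
}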
Health Check
Returns the current health status of the application, including uptime, memory usage, and CPU metrics.
Request
GET /api/v1/monitoring/health
Authentication: None (public endpoint)
Response
{
"status": "healthy",
"timestamp": "2025-11-19T10:30:45.123Z",
"uptime": 86400.5,
"memory": {
"rss": 125829120,
"heapTotal": 83296256,
"heapUsed": 52428800,
"external": 2097152,
"arrayBuffers": 1048576
},
"cpu": {
"user": 1500000,
"system": 500000
}
}
Response Fields
| Field | Type | Description |
|---|---|---|
| status | string | Health status (always "healthy" if responding) |
| timestamp | string | ISO 8601 timestamp of the response |
| uptime | number | Process uptime in seconds |
| memory | object | Memory usage statistics (in bytes) |
| memory.rss | number | Resident Set Size (total memory allocated) |
| memory.heapTotal | number | Total heap size |
| memory.heapUsed | number | Heap memory currently in use |
| memory.external | number | Memory used by C++ objects bound to JS |
| memory.arrayBuffers | number | Memory allocated for ArrayBuffers and SharedArrayBuffers |
| cpu | object | CPU usage statistics (in microseconds) |
| cpu.user | number | CPU time spent in user mode |
| cpu.system | number | CPU time spent in system mode |
Example
curl http://localhost:4000/api/v1/monitoring/health
Use Cases:
- Load balancer health checks
- Uptime monitoring
- Resource usage tracking
- Kubernetes/Docker liveness probes
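For resource usage tracking, the memory figures can be reduced to a simple heap-usage ratio. A minimal sketch, assuming Node.js 18+ and the 80% threshold suggested under Best Practices below:
// Sketch: report uptime and warn when heap usage crosses 80% of heapTotal
const health = await (await fetch('http://localhost:4000/api/v1/monitoring/health')).json();
const heapRatio = health.memory.heapUsed / health.memory.heapTotal;

console.log(`status=${health.status} uptime=${Math.round(health.uptime)}s heap=${(heapRatio * 100).toFixed(1)}%`);
if (heapRatio > 0.8) {
  console.warn('heapUsed above 80% of heapTotal');
}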
Implementation Details
Performance Tracking
The monitoring system automatically tracks all HTTP requests and WebSocket events through a NestJS interceptor:
- Buffer Size: 1000 most recent requests (circular buffer)
- Slow Request Threshold: 200ms
- Metrics Collected: Duration, success/failure, error messages, timestamps
- Auto-cleanup: Old metrics are automatically removed when buffer is full
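The interceptor itself is internal to the application, but the buffering behaviour described above can be illustrated with a small fixed-size buffer. The class below is a hypothetical sketch, not the actual implementation:
// Illustrative fixed-size FIFO metrics buffer with a slow-request filter (for explanation only)
class MetricsBuffer {
  constructor(maxSize = 1000) {
    this.maxSize = maxSize;
    this.entries = [];
  }
  record(metric) {
    this.entries.push(metric);
    if (this.entries.length > this.maxSize) {
      this.entries.shift(); // evict the oldest entry when the buffer is full
    }
  }
  slowRequests(thresholdMs = 200) {
    return this.entries.filter((m) => m.success && m.duration > thresholdMs);
  }
}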
Log Levels
// Debug: All successful requests (if debug logging enabled)
Logger.debug('BoardsController.create - 45ms [POST /api/v1/boards]')
// Warning: Slow requests (>200ms)
Logger.warn('Slow request: BoardsController.create - 485ms [POST /api/v1/boards]')
// Error: Failed requests
Logger.error('Failed request: BoardsController.findOneByToken - 15ms [GET /api/v1/boards/invalid-uuid] - Board not found')
Memory Considerations
- The metrics buffer has a fixed size of 1000 entries
- Older metrics are automatically evicted (FIFO)
- No persistent storage - metrics are lost on restart
- Typical memory usage: ~500KB for full buffer
Integration Examples
Prometheus/Grafana
// Fetch stats and expose them as Prometheus gauges (requires the prom-client package)
import client from 'prom-client';
const stats_requests_total = new client.Gauge({ name: 'stats_requests_total', help: 'Tracked requests in buffer' });
const stats_success_rate = new client.Gauge({ name: 'stats_success_rate', help: 'Success rate (%)' });
const stats_duration_average = new client.Gauge({ name: 'stats_duration_average', help: 'Average duration (ms)' });
const stats_duration_p95 = new client.Gauge({ name: 'stats_duration_p95', help: 'p95 duration (ms)' });
const stats_slow_requests_total = new client.Gauge({ name: 'stats_slow_requests_total', help: 'Slow requests (>200ms)' });
const stats_errors_total = new client.Gauge({ name: 'stats_errors_total', help: 'Failed requests' });
const response = await fetch('http://localhost:4000/api/v1/monitoring/performance/stats');
const { stats } = await response.json();
// Expose as Prometheus metrics
stats_requests_total.set(stats.totalRequests);
stats_success_rate.set(stats.successRate);
stats_duration_average.set(stats.averageDuration);
stats_duration_p95.set(stats.p95);
stats_slow_requests_total.set(stats.slowRequests);
stats_errors_total.set(stats.errors);
Uptime Monitoring (UptimeRobot, Pingdom)
# Simple health check URL
https://api.boardapi.io/api/v1/monitoring/health
Error Alerting
// Check for recent errors every minute
setInterval(async () => {
const response = await fetch('http://localhost:4000/api/v1/monitoring/performance/errors');
const { errors } = await response.json();
if (errors.length > 10) {
// Alert: High error rate detected
sendAlert(`High error rate: ${errors.length} errors in last 1000 requests`);
}
}, 60000);
Dashboard Widget
// Real-time performance dashboard
async function updateDashboard() {
  const { stats } = await fetch('/api/v1/monitoring/performance/stats').then(r => r.json());
  document.getElementById('totalRequests').textContent = stats.totalRequests;
  document.getElementById('successRate').textContent = stats.successRate.toFixed(2) + '%';
  document.getElementById('p95Duration').textContent = stats.p95 + 'ms';
  document.getElementById('errorCount').textContent = stats.errors;
}
setInterval(updateDashboard, 5000); // Update every 5 seconds
Best Practices
1. Regular Monitoring
- Poll /performance/stats every 30-60 seconds for dashboards
- Use /health for load balancer health checks (every 10-30 seconds)
- Check /performance/errors when error rates spike
2. Alerting Thresholds
- Success Rate: Alert if < 95%
- P95 Duration: Alert if > 500ms
- Error Count: Alert if > 5% of total requests
- Memory Usage: Alert if heapUsed > 80% of heapTotal
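These thresholds can be evaluated with a periodic check against /performance/stats and /health. A minimal sketch, assuming Node.js 18+ and a hypothetical sendAlert helper (as in the Error Alerting example above):
// Sketch: evaluate the alerting thresholds above once per minute
const BASE = 'http://localhost:4000/api/v1/monitoring';

async function checkThresholds() {
  const { stats } = await (await fetch(`${BASE}/performance/stats`)).json();
  const health = await (await fetch(`${BASE}/health`)).json();

  if (stats.successRate < 95) sendAlert(`Success rate ${stats.successRate}% is below 95%`);
  if (stats.p95 > 500) sendAlert(`p95 duration ${stats.p95}ms is above 500ms`);
  if (stats.totalRequests > 0 && stats.errors / stats.totalRequests > 0.05) {
    sendAlert('Errors exceed 5% of tracked requests');
  }
  if (health.memory.heapUsed > 0.8 * health.memory.heapTotal) {
    sendAlert('heapUsed is above 80% of heapTotal');
  }
}

setInterval(checkThresholds, 60000);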
3. Performance Optimization
- Identify slow operations via /performance/slowest
- Focus on endpoints with p95 > 200ms
- Investigate patterns in error messages
4. Production Deployment
- Consider restricting monitoring endpoints to internal network
- Use reverse proxy authentication for sensitive metrics
- Export metrics to external monitoring systems (Prometheus, DataDog)
Security Considerations
Public Endpoints: All monitoring endpoints are currently public (no authentication required).
Production Recommendations:
- Restrict access via reverse proxy (nginx, API Gateway)
- Use IP whitelisting for monitoring systems
- Consider adding API key authentication
- Do not expose sensitive error messages to external users
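If API key authentication is added, one possible approach (not part of the current codebase) is a NestJS guard on the monitoring controller; the header name and environment variable below are hypothetical:
// Hypothetical NestJS guard protecting monitoring endpoints with an API key
import { CanActivate, ExecutionContext, Injectable, UnauthorizedException } from '@nestjs/common';

@Injectable()
export class MonitoringApiKeyGuard implements CanActivate {
  canActivate(context: ExecutionContext): boolean {
    const request = context.switchToHttp().getRequest();
    // Compare the incoming header against a secret configured on the server
    if (request.headers['x-monitoring-key'] !== process.env.MONITORING_API_KEY) {
      throw new UnauthorizedException('Invalid monitoring API key');
    }
    return true;
  }
}
Such a guard would be applied with @UseGuards(MonitoringApiKeyGuard) on the monitoring controller.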
Example nginx configuration:
# Restrict monitoring endpoints to internal IPs
location /api/v1/monitoring {
allow 10.0.0.0/8; # Internal network
allow 192.168.0.0/16; # Local network
deny all;
proxy_pass http://backend:4000;
}
Troubleshooting
No Metrics Available
Problem: /performance/stats returns all zeros
Solution:
- The metrics buffer is empty (no requests have been processed yet)
- The application may have recently restarted (metrics are not persisted)
- Make a few API requests to populate the buffer
High Memory Usage
Problem: Application memory increasing over time
Solution:
- Metrics buffer has fixed size (1000 entries)
- Check for memory leaks elsewhere in application
- Monitor heapUsed via the /health endpoint
Missing Slow Requests
Problem: Known slow endpoint not appearing in /performance/slowest
Solution:
- Only successful requests are tracked
- Buffer size is limited to 1000 entries
- Slow requests may have been evicted if buffer is full
- Check if request is actually >200ms (threshold)
Related Documentation
- Error Codes Reference - HTTP status codes and error formats
- Webhooks API - Webhook delivery monitoring
- Authentication API - Token validation performance
Support
For monitoring-related issues:
- Check application logs for errors
- Verify endpoints are accessible via curl
- Review nginx/proxy configuration
- Contact DevOps team for production access