Monitoring and Logging

Overview

Proper monitoring and logging are essential for maintaining a healthy production system. This guide covers how BookWish tracks errors, performance, and system health.

Logging Strategy

Log Levels

BookWish uses structured logging with these levels:

Level | Usage | Example
debug | Development details | Function entry/exit, variable values
info | Normal operations | User login, order created
warn | Unusual but handled | Slow query, rate limit approached
error | Errors that were caught | API timeout, validation failure
fatal | Critical system failure | Database unreachable, service crash
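
The level table above can be made concrete with a small logger. The following is a minimal sketch, not the actual BookWish logger: `createLogger`, `LEVEL_RANK`, and the `Sink` type are hypothetical names introduced for illustration, and the flat JSON-per-line output format is an assumption.

```typescript
// Hypothetical sketch; the real '../lib/logger' module is not shown in this guide.
type LogLevel = 'debug' | 'info' | 'warn' | 'error' | 'fatal';
type LogContext = Record<string, unknown>;
type Sink = (line: string) => void;

// Numeric ranks make the levels comparable.
const LEVEL_RANK: Record<LogLevel, number> = {
  debug: 10,
  info: 20,
  warn: 30,
  error: 40,
  fatal: 50,
};

function createLogger(minLevel: LogLevel, sink: Sink = (line) => console.log(line)) {
  const write = (level: LogLevel, event: string, context: LogContext = {}): void => {
    // Drop entries below the configured minimum level.
    if (LEVEL_RANK[level] < LEVEL_RANK[minLevel]) return;
    // Context is spread first so the reserved fields cannot be overwritten by it.
    sink(JSON.stringify({ ...context, level, event, timestamp: new Date().toISOString() }));
  };

  return {
    debug: (event: string, context?: LogContext) => write('debug', event, context),
    info: (event: string, context?: LogContext) => write('info', event, context),
    warn: (event: string, context?: LogContext) => write('warn', event, context),
    error: (event: string, context?: LogContext) => write('error', event, context),
    fatal: (event: string, context?: LogContext) => write('fatal', event, context),
  };
}

const exampleLogger = createLogger('info');
exampleLogger.debug('cache.lookup', { key: 'book:42' }); // suppressed: below 'info'
exampleLogger.info('user.login', { userId: 'u_123' }); // emitted as one JSON line
```

Passing the sink as a parameter keeps the sketch testable and leaves the destination (stdout, a file, a log shipper) open.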

Backend Logging

Using the Logger Utility:

import { logger } from '../lib/logger';

// Info level - normal operations
logger.info('user.login', { userId, email });

// Warning level - concerning but handled
logger.warn('payment.retry', {
  orderId,
  attempt: 3,
  reason: 'timeout'
});

// Error level - something went wrong
logger.error('database.query.failed', {
  query: 'SELECT * FROM users',
  error: error.message,
  stack: error.stack
});

Never Use Console:

// ❌ BAD - No structured logging
console.log('User logged in:', userId);
console.error('Error:', error);

// ✅ GOOD - Structured logging
logger.info('user.login', { userId });
logger.error('auth.login.failed', {
  error: error.message,
  userId
});

Frontend Logging

Flutter App:

// Production - no logging
// Development - use debugPrint sparingly

// For errors in production, report to error tracking
try {
  await someOperation();
} catch (e) {
  // Report to error tracking service
  ErrorTracker.report(e);
}

Error Tracking

Sentry Integration (Placeholder)

Backend Setup:

import * as Sentry from '@sentry/node';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  // 1.0 traces every transaction; consider a lower rate under heavy traffic
  tracesSampleRate: 1.0,
});

// Capture errors
try {
  await processPayment(order);
} catch (error) {
  Sentry.captureException(error, {
    tags: {
      feature: 'payment',
      orderId: order.id
    }
  });
  // Rethrow so callers still see the failure
  throw error;
}

Flutter Setup:

import 'package:sentry_flutter/sentry_flutter.dart';

await SentryFlutter.init(
  (options) {
    options.dsn = 'your-dsn-here';
    options.environment = 'production';
  },
  appRunner: () => runApp(MyApp()),
);

// Capture errors
try {
  await fetchData();
} catch (error, stackTrace) {
  await Sentry.captureException(
    error,
    stackTrace: stackTrace,
  );
}

Error Context

Always include relevant context with errors:

logger.error('payment.process.failed', {
  orderId: order.id,
  userId: user.id,
  amount: order.total,
  paymentMethod: order.paymentMethod,
  error: error.message,
  stack: error.stack,
  timestamp: new Date().toISOString()
});
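
In TypeScript a caught value is typed `unknown`, so reading `error.message` directly may not compile under strict settings. The helper below is a minimal sketch of one way to build the error fields shown above; `toErrorContext` is a hypothetical name and is not part of the BookWish logger.

```typescript
// Hypothetical helper: normalises any thrown value into loggable fields.
function toErrorContext(
  err: unknown,
  extra: Record<string, unknown> = {},
): Record<string, unknown> {
  // Only real Error instances carry a message and a stack trace.
  const base =
    err instanceof Error
      ? { error: err.message, stack: err.stack }
      : { error: String(err) };

  return { ...extra, ...base, timestamp: new Date().toISOString() };
}

// Example: merge business context with the normalised error fields.
const exampleContext = toErrorContext(new Error('card declined'), {
  orderId: 'order_1',
  userId: 'user_1',
});
```

The result can be passed as the second argument to `logger.error('payment.process.failed', ...)`.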

Performance Monitoring

Application Performance Monitoring (APM)

Placeholder for APM Service (e.g., DataDog, New Relic)

// Track transaction performance
const transaction = apm.startTransaction('process-order');

try {
  await validateOrder(order);
  await chargePayment(order);
  await createShipment(order);

  transaction.result = 'success';
} catch (error) {
  transaction.result = 'error';
  throw error;
} finally {
  transaction.end();
}

Database Query Monitoring

// Time every query; only log slow ones in development
const start = Date.now();
const result = await db.query(sql);
const duration = Date.now() - start;

if (process.env.NODE_ENV === 'development' && duration > 1000) {
  logger.warn('slow.query', { sql, duration });
}
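
The timing logic above tends to be repeated at every call site. A minimal sketch of a reusable wrapper follows; `timed` and `TimedResult` are hypothetical names, and the 1000 ms default simply mirrors the threshold used above.

```typescript
interface TimedResult<T> {
  result: T;
  durationMs: number;
  slow: boolean;
}

// Runs an async operation and reports how long it took.
// The clock is injectable so the function can be tested deterministically.
async function timed<T>(
  operation: () => Promise<T>,
  slowThresholdMs = 1000,
  now: () => number = Date.now,
): Promise<TimedResult<T>> {
  const start = now();
  const result = await operation();
  const durationMs = now() - start;
  return { result, durationMs, slow: durationMs > slowThresholdMs };
}
```

A caller could then write `const { result, durationMs, slow } = await timed(() => db.query(sql));` and emit `logger.warn('slow.query', { sql, duration: durationMs })` when `slow` is true.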

API Response Times

// Middleware to track response times
app.use((req, res, next) => {
  const start = Date.now();

  res.on('finish', () => {
    const duration = Date.now() - start;

    logger.info('http.request', {
      method: req.method,
      path: req.path,
      status: res.statusCode,
      duration
    });

    if (duration > 3000) {
      logger.warn('slow.request', {
        method: req.method,
        path: req.path,
        duration
      });
    }
  });

  next();
});

System Metrics

Health Checks

Basic Health Endpoint:

app.get('/health', async (req, res) => {
  // Run the dependency checks first so the reported status reflects them
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    storage: await checkStorage()
  };

  const isHealthy = Object.values(checks)
    .every(check => check.status === 'ok');

  res.status(isHealthy ? 200 : 503).json({
    status: isHealthy ? 'healthy' : 'unhealthy',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    checks
  });
});

async function checkDatabase() {
  try {
    await prisma.$queryRaw`SELECT 1`;
    return { status: 'ok' };
  } catch (error) {
    return { status: 'error', message: error.message };
  }
}
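
The endpoint above calls `checkRedis()` and `checkStorage()`, which are not shown. A dependency that hangs would also make the health endpoint hang, so one option is to wrap each probe in a timeout. The following is a sketch under that assumption; `runCheck`, `CheckResult`, and the 2000 ms default are hypothetical and not taken from the BookWish codebase.

```typescript
interface CheckResult {
  status: 'ok' | 'error';
  message?: string;
}

// Runs a probe and converts failure or timeout into an 'error' result.
async function runCheck(
  probe: () => Promise<unknown>,
  timeoutMs = 2000,
): Promise<CheckResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${timeoutMs}ms`)), timeoutMs);
  });

  try {
    // Whichever settles first wins: the probe or the timeout.
    await Promise.race([probe(), timeout]);
    return { status: 'ok' };
  } catch (error) {
    return {
      status: 'error',
      message: error instanceof Error ? error.message : String(error),
    };
  } finally {
    // Always clear the timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}
```

With this helper, a Redis check could be written as `runCheck(() => redis.ping())`, assuming the Redis client in use exposes a `ping()` method.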

Resource Metrics

Track these key metrics (placeholder for monitoring service):

  • CPU usage - Alert if > 80% for 5 minutes
  • Memory usage - Alert if > 85%
  • Disk usage - Alert if > 90%
  • Network I/O - Track bandwidth usage
  • Database connections - Alert if pool exhausted
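
A rule such as "CPU above 80% for 5 minutes" needs a sustained-breach check rather than a single-sample comparison. The function below is a minimal sketch of one interpretation: every sample in the window must exceed the threshold, and the history must reach back to the start of the window. `Sample` and `sustainedBreach` are hypothetical names; a real monitoring service will have its own evaluation rules.

```typescript
interface Sample {
  timestampMs: number;
  value: number;
}

// True when the metric stayed above `threshold` for the whole window ending at `nowMs`.
function sustainedBreach(
  samples: Sample[],
  threshold: number,
  windowMs: number,
  nowMs: number,
): boolean {
  const windowStart = nowMs - windowMs;

  // Without a sample at or before the window start, the full duration is unproven.
  const coversWindow = samples.some((s) => s.timestampMs <= windowStart);

  const inWindow = samples.filter(
    (s) => s.timestampMs >= windowStart && s.timestampMs <= nowMs,
  );

  return coversWindow && inWindow.length > 0 && inWindow.every((s) => s.value > threshold);
}
```

For the CPU rule above, the call would look like `sustainedBreach(cpuSamples, 80, 5 * 60_000, Date.now())`.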

Alerting

Alert Levels

Level | Response Time | Examples
Critical | Immediate (PagerDuty) | Database down, API unreachable
High | Within 15 minutes | High error rate, payment failures
Medium | Within 1 hour | Slow queries, cache misses
Low | Next business day | Deprecation warnings, low disk space

Alert Channels

Placeholder Configuration:

# alerts.yml
alerts:
  - name: database-down
    condition: database.connection.failed
    severity: critical
    notify: pagerduty

  - name: high-error-rate
    condition: error.rate > 10/min
    severity: high
    notify: slack

  - name: slow-response
    condition: response.time.p95 > 3000ms
    severity: medium
    notify: email
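
A condition like `error.rate > 10/min` implies counting events over a sliding one-minute window. The class below is a minimal in-memory sketch of that idea; `SlidingWindowCounter` is a hypothetical name, and it assumes events are recorded in non-decreasing time order. An alerting service would normally evaluate this for you.

```typescript
// Counts events that occurred within the last `windowMs` milliseconds.
class SlidingWindowCounter {
  private readonly windowMs: number;
  private timestamps: number[] = [];

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Record one event at time `nowMs`.
  record(nowMs: number): void {
    this.timestamps.push(nowMs);
    this.prune(nowMs);
  }

  // Number of events still inside the window at time `nowMs`.
  count(nowMs: number): number {
    this.prune(nowMs);
    return this.timestamps.length;
  }

  // Drop events that are one full window old or older.
  private prune(nowMs: number): void {
    const cutoff = nowMs - this.windowMs;
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift();
    }
  }
}
```

The `high-error-rate` rule would then correspond to `errorCounter.count(Date.now()) > 10` for a counter created with `new SlidingWindowCounter(60_000)`.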

Log Aggregation

Centralized Logging (Placeholder)

Example with CloudWatch/DataDog/Logstash:

// All logs automatically shipped to central service
logger.info('event.name', { data });
// → CloudWatch Logs → DataDog → Alert if needed

Log Queries

Common queries to have ready:

-- Find all errors for a user
SELECT * FROM logs
WHERE userId = 'xxx'
  AND level = 'error'
  AND timestamp > NOW() - INTERVAL '1 day';

-- Find slow queries
SELECT * FROM logs
WHERE event = 'database.query'
  AND duration > 1000
ORDER BY duration DESC;

-- Error rate by endpoint
SELECT path, COUNT(*) AS errors
FROM logs
WHERE level = 'error'
  AND timestamp > NOW() - INTERVAL '1 hour'
GROUP BY path
ORDER BY errors DESC;

Dashboards

Key Metrics Dashboard (Placeholder)

Create dashboards showing:

Application Health:

  • Request rate (requests/minute)
  • Error rate (errors/minute)
  • Response time (p50, p95, p99)
  • Uptime percentage

Business Metrics:

  • Orders per hour
  • New user signups
  • Active users
  • Revenue (if applicable)

Infrastructure:

  • CPU and memory usage
  • Database connection pool
  • Cache hit rate
  • API rate limit usage
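
The p50, p95, and p99 response times listed under Application Health can be derived from the `duration` values logged per request. The sketch below uses the nearest-rank definition; `percentile` is a hypothetical helper, and monitoring services often use interpolated or approximate percentiles instead, so results may differ slightly from a dashboard.

```typescript
// Nearest-rank percentile: the smallest value such that at least p% of samples are <= it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) {
    throw new Error('cannot compute a percentile of an empty sample');
  }
  if (p <= 0 || p > 100) {
    throw new RangeError('p must be in the range (0, 100]');
  }

  // Sort a copy numerically; the default sort would compare as strings.
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Example: request durations in milliseconds.
const durations = [120, 80, 300, 95, 2000, 150, 110, 90, 100, 130];
const p50 = percentile(durations, 50); // 110
const p95 = percentile(durations, 95); // 2000
```

The example shows why percentiles are tracked alongside averages: one 2000 ms outlier dominates p95 while leaving p50 untouched.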

Example Dashboard Widgets

// Placeholder dashboard configuration
{
  "dashboard": "BookWish Production",
  "widgets": [
    {
      "type": "timeseries",
      "title": "Requests per Minute",
      "query": "SELECT COUNT(*) FROM logs WHERE event = 'http.request'"
    },
    {
      "type": "number",
      "title": "Active Users (24h)",
      "query": "SELECT COUNT(DISTINCT userId) FROM logs WHERE timestamp > NOW() - INTERVAL '24 hours'"
    },
    {
      "type": "gauge",
      "title": "Error Rate",
      "query": "SELECT (errors / total) * 100 FROM request_stats"
    }
  ]
}

Production Checklist

Before launching to production, confirm that monitoring is in place:

  • Error tracking configured (Sentry or similar)
  • APM tool installed (DataDog, New Relic, etc.)
  • Health check endpoint implemented
  • Critical alerts configured
  • Dashboards created and shared
  • On-call rotation established
  • Runbooks documented for common issues
  • Log retention policy defined
  • Backup monitoring in place

Troubleshooting

High Error Rates

  1. Check error tracking dashboard
  2. Identify most common error types
  3. Review recent deployments
  4. Check external service status
  5. Review resource usage

Slow Performance

  1. Check APM traces
  2. Identify slow endpoints
  3. Review database query performance
  4. Check cache hit rates
  5. Look for resource constraints

Missing Logs

  1. Verify logger is configured correctly
  2. Check log shipping configuration
  3. Verify network connectivity
  4. Check log retention settings
  5. Review IAM permissions

Best Practices

Do's

✅ Use structured logging with context
✅ Set up alerts for critical failures
✅ Monitor business metrics, not just technical ones
✅ Keep dashboards focused and actionable
✅ Document how to respond to alerts
✅ Review logs regularly for patterns

Don'ts

❌ Log sensitive data (passwords, credit cards)
❌ Use console.log in production
❌ Create alerts without clear actions
❌ Ignore warning-level logs
❌ Log excessively in hot paths
❌ Ship logs without sanitization

Security Considerations

Log Sanitization

// Remove sensitive data before logging
function sanitize(data: any) {
  // Shallow copy: nested objects are not sanitized by this function
  const sanitized = { ...data };

  // Remove sensitive fields
  delete sanitized.password;
  delete sanitized.creditCard;
  delete sanitized.ssn;

  // Mask email addresses
  if (sanitized.email) {
    sanitized.email = maskEmail(sanitized.email);
  }

  return sanitized;
}

logger.info('user.created', sanitize(userData));
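
The `sanitize` function above relies on a `maskEmail` helper that is not shown, and it only inspects top-level fields. The following is a sketch of one possible `maskEmail` plus a recursive variant for nested data. `sanitizeDeep`, `SENSITIVE_KEYS`, and the masking format are assumptions for illustration, and the sketch targets plain JSON-like data only (class instances such as `Date` are not preserved).

```typescript
// Hypothetical list; extend it to match the fields your payloads actually carry.
const SENSITIVE_KEYS = new Set(['password', 'creditCard', 'ssn']);

// Keeps the first character and the domain: jane@example.com -> j***@example.com
function maskEmail(email: string): string {
  const at = email.indexOf('@');
  if (at <= 0) return '***'; // not a usable address, so hide it entirely
  return `${email[0]}***${email.slice(at)}`;
}

// Returns a sanitized deep copy; the input is never mutated.
function sanitizeDeep(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => sanitizeDeep(item));
  }

  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      if (SENSITIVE_KEYS.has(key)) continue; // drop sensitive fields at any depth
      out[key] =
        key === 'email' && typeof inner === 'string'
          ? maskEmail(inner)
          : sanitizeDeep(inner);
    }
    return out;
  }

  // Primitives are returned unchanged.
  return value;
}
```

Removing keys by name is a denylist approach; an allowlist of fields that are known to be safe is stricter and may be preferable for high-risk payloads.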

Access Control

  • Limit log access to authorized personnel
  • Use role-based access for dashboards
  • Audit log access
  • Encrypt logs at rest and in transit

References