Monitoring and Logging

Overview

Proper monitoring and logging are essential for maintaining a healthy production system. This guide covers how BookWish tracks errors, performance, and system health.

Logging Strategy

Log Levels

BookWish uses structured logging with these levels:

Level	Usage	Example
`debug`	Development details	Function entry/exit, variable values
`info`	Normal operations	User login, order created
`warn`	Unusual but handled	Slow query, rate limit approached
`error`	Errors that were caught	API timeout, validation failure
`fatal`	Critical system failure	Database unreachable, service crash
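
The level table above implies a minimum-level filter plus a machine-readable output format. As a rough illustration only, here is a minimal sketch of how such a logger could work. It assumes JSON-lines output and an injectable sink; `createLogger`, `LogSink`, and `LEVEL_RANK` are hypothetical names, and the actual `../lib/logger` module may be built quite differently.

```typescript
// Hypothetical sketch: the real '../lib/logger' module is not shown in this guide.
type LogLevel = 'debug' | 'info' | 'warn' | 'error' | 'fatal';
type LogContext = Record<string, unknown>;
type LogSink = (line: string) => void;

// Numeric ranks turn "is this level at or above the minimum?" into a comparison.
const LEVEL_RANK: Record<LogLevel, number> = {
  debug: 10,
  info: 20,
  warn: 30,
  error: 40,
  fatal: 50,
};

function createLogger(minLevel: LogLevel, sink: LogSink) {
  const write = (level: LogLevel, event: string, context: LogContext = {}): void => {
    if (LEVEL_RANK[level] < LEVEL_RANK[minLevel]) return; // below the threshold: drop
    // Fixed fields come last so caller-supplied context cannot overwrite them.
    sink(JSON.stringify({ ...context, level, event, timestamp: new Date().toISOString() }));
  };

  return {
    debug: (event: string, context?: LogContext) => write('debug', event, context),
    info: (event: string, context?: LogContext) => write('info', event, context),
    warn: (event: string, context?: LogContext) => write('warn', event, context),
    error: (event: string, context?: LogContext) => write('error', event, context),
    fatal: (event: string, context?: LogContext) => write('fatal', event, context),
  };
}

// Usage: lines are collected in memory here; a real sink might write to stdout.
const capturedLines: string[] = [];
const sketchLogger = createLogger('info', (line) => {
  capturedLines.push(line);
});
sketchLogger.debug('cache.lookup', { key: 'book:42' }); // dropped: below 'info'
sketchLogger.warn('payment.retry', { orderId: 'ord_1', attempt: 3 }); // emitted as one JSON line
```

Injecting the sink keeps the sketch testable and leaves the choice of transport (stdout, file, log shipper) open.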

Backend Logging

Using the Logger Utility:

import { logger } from '../lib/logger';

// Info level - normal operations
logger.info('user.login', { userId, email });

// Warning level - concerning but handled
logger.warn('payment.retry', {
  orderId,
  attempt: 3,
  reason: 'timeout'
});

// Error level - something went wrong
logger.error('database.query.failed', {
  query: 'SELECT * FROM users',
  error: error.message,
  stack: error.stack
});

Never Use Console:

// ❌ BAD - No structured logging
console.log('User logged in:', userId);
console.error('Error:', error);

// ✅ GOOD - Structured logging
logger.info('user.login', { userId });
logger.error('auth.login.failed', {
  error: error.message,
  userId
});

Frontend Logging

Flutter App:

// Production - no logging
// Development - use debugPrint sparingly

// For errors in production, report to error tracking
try {
  await someOperation();
} catch (e) {
  // Report to error tracking service
  ErrorTracker.report(e);
}

Error Tracking

Sentry Integration (Placeholder)

Backend Setup:

import * as Sentry from '@sentry/node';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  tracesSampleRate: 1.0, // 1.0 traces every transaction; consider a lower rate for high-traffic production
});

// Capture errors
try {
  await processPayment(order);
} catch (error) {
  Sentry.captureException(error, {
    tags: {
      feature: 'payment',
      orderId: order.id
    }
  });
  throw error;
}

Flutter Setup:

import 'package:sentry_flutter/sentry_flutter.dart';

await SentryFlutter.init(
  (options) {
    options.dsn = 'your-dsn-here';
    options.environment = 'production';
  },
  appRunner: () => runApp(MyApp()),
);

// Capture errors
try {
  await fetchData();
} catch (error, stackTrace) {
  await Sentry.captureException(
    error,
    stackTrace: stackTrace,
  );
}

Error Context

Always include relevant context with errors:

logger.error('payment.process.failed', {
  orderId: order.id,
  userId: user.id,
  amount: order.total,
  paymentMethod: order.paymentMethod,
  error: error.message,
  stack: error.stack,
  timestamp: new Date().toISOString()
});
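
In TypeScript a caught value is not guaranteed to be an `Error`, so reading `error.message` directly can fail to type-check or produce `undefined` at runtime. The sketch below shows one possible way to normalize whatever was thrown into the `error` and `stack` fields used above. `toErrorFields` is a hypothetical helper, not part of the BookWish codebase.

```typescript
// Hypothetical helper: turns anything that was thrown into loggable fields.
function toErrorFields(err: unknown): { error: string; stack: string | undefined } {
  if (err instanceof Error) {
    return { error: err.message, stack: err.stack };
  }
  // Strings, numbers, or plain objects can also be thrown; stringify them.
  return { error: String(err), stack: undefined };
}

// Usage with two different kinds of thrown values.
const fromErrorObject = toErrorFields(new Error('card declined'));
const fromPlainString = toErrorFields('timeout');
```

With a helper like this, the call above could spread the result into the context object, for example `logger.error('payment.process.failed', { orderId: order.id, ...toErrorFields(error) })`.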

Performance Monitoring

Application Performance Monitoring (APM)

Placeholder for APM Service (e.g., DataDog, New Relic)

// Track transaction performance
const transaction = apm.startTransaction('process-order');

try {
  await validateOrder(order);
  await chargePayment(order);
  await createShipment(order);

  transaction.result = 'success';
} catch (error) {
  transaction.result = 'error';
  throw error;
} finally {
  transaction.end();
}

Database Query Monitoring

// Time every query, but only log slow ones in development
const start = Date.now();
const result = await db.query(sql);
const duration = Date.now() - start;

// Queries slower than 1000 ms are reported at warn level
if (process.env.NODE_ENV === 'development' && duration > 1000) {
  logger.warn('slow.query', { sql, duration });
}
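
Repeating the timing code around every query gets noisy, so the same idea can be wrapped in a helper. This is a sketch under assumptions: `timeQuery`, its `report` callback, and the injectable `now` clock are hypothetical names introduced for illustration, and the 1000 ms default simply mirrors the threshold above.

```typescript
type SlowQueryReporter = (
  event: string,
  context: { label: string; durationMs: number },
) => void;

// Runs `run`, measures how long it took, and reports it if it exceeded the threshold.
async function timeQuery<T>(
  label: string,
  run: () => Promise<T>,
  report: SlowQueryReporter,
  thresholdMs = 1000,
  now: () => number = Date.now, // injectable clock, useful in tests
): Promise<T> {
  const start = now();
  try {
    return await run();
  } finally {
    // The finally block also runs when the query throws, so slow failures are reported too.
    const durationMs = now() - start;
    if (durationMs > thresholdMs) {
      report('slow.query', { label, durationMs });
    }
  }
}
```

A call site might then look like `await timeQuery('users.byId', () => db.query(sql), (event, ctx) => logger.warn(event, ctx))`, where `db` and `logger` are the objects used elsewhere in this guide.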

API Response Times

// Middleware to track response times
app.use((req, res, next) => {
  const start = Date.now();

  res.on('finish', () => {
    const duration = Date.now() - start;

    logger.info('http.request', {
      method: req.method,
      path: req.path,
      status: res.statusCode,
      duration
    });

    if (duration > 3000) {
      logger.warn('slow.request', {
        method: req.method,
        path: req.path,
        duration
      });
    }
  });

  next();
});

System Metrics

Health Checks

Basic Health Endpoint:

app.get('/health', async (req, res) => {
  // Run the dependency checks first so the reported status reflects them
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    storage: await checkStorage()
  };

  const isHealthy = Object.values(checks)
    .every(check => check.status === 'ok');

  const health = {
    status: isHealthy ? 'healthy' : 'unhealthy',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    checks
  };

  res.status(isHealthy ? 200 : 503).json(health);
});

async function checkDatabase() {
  try {
    await prisma.$queryRaw`SELECT 1`;
    return { status: 'ok' };
  } catch (error) {
    return { status: 'error', message: error.message };
  }
}
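
A dependency that hangs rather than failing would leave the health endpoint waiting with it. One possible mitigation, sketched here, is to wrap each probe in a timeout and map the outcome to the same `{ status, message }` shape that `checkDatabase` returns. `runCheck` and the 2000 ms default are assumptions made for illustration.

```typescript
type CheckResult = { status: 'ok' } | { status: 'error'; message: string };

// Runs a probe and reports 'error' if it rejects or does not settle within timeoutMs.
async function runCheck(
  probe: () => Promise<unknown>,
  timeoutMs = 2000,
): Promise<CheckResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });

  try {
    // Whichever settles first decides the outcome.
    await Promise.race([probe(), timeout]);
    return { status: 'ok' };
  } catch (err) {
    return { status: 'error', message: err instanceof Error ? err.message : String(err) };
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

The database check could then be expressed as ``runCheck(() => prisma.$queryRaw`SELECT 1`)``, with similar one-liners for Redis and storage.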

Resource Metrics

Track these key metrics (placeholder for monitoring service):

CPU usage - Alert if > 80% for 5 minutes
Memory usage - Alert if > 85%
Disk usage - Alert if > 90%
Network I/O - Track bandwidth usage
Database connections - Alert if pool exhausted
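
A rule such as "CPU above 80% for 5 minutes" needs to remember when the breach started, not just the latest sample. Monitoring services usually handle this for you. Purely to illustrate the logic, here is a small sketch; `createSustainedThreshold` is a hypothetical name and timestamps are passed in explicitly as milliseconds.

```typescript
// Returns an observer that yields true once a value has stayed above `limit`
// for at least `holdMs` without dipping back to or below it.
function createSustainedThreshold(limit: number, holdMs: number) {
  let breachStartedAt: number | null = null;

  return function observe(value: number, nowMs: number): boolean {
    if (value <= limit) {
      breachStartedAt = null; // back to normal: reset the timer
      return false;
    }
    if (breachStartedAt === null) {
      breachStartedAt = nowMs; // first sample above the limit
    }
    return nowMs - breachStartedAt >= holdMs;
  };
}

// Usage: CPU above 80% for 5 minutes.
const cpuSustained = createSustainedThreshold(80, 5 * 60 * 1000);
const cpuAlertStates = [
  cpuSustained(85, 0),       // breach starts
  cpuSustained(90, 240_000), // 4 minutes in: not yet sustained
  cpuSustained(88, 300_000), // 5 minutes in: alert
  cpuSustained(70, 310_000), // recovered: reset
  cpuSustained(95, 320_000), // new breach starts from zero
];
```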

Alerting

Alert Levels

Level	Response Time	Examples
Critical	Immediate (PagerDuty)	Database down, API unreachable
High	Within 15 minutes	High error rate, payment failures
Medium	Within 1 hour	Slow queries, cache misses
Low	Next business day	Deprecation warnings, low disk space

Alert Channels

Placeholder Configuration:

# alerts.yml
alerts:
  - name: database-down
    condition: database.connection.failed
    severity: critical
    notify: pagerduty

  - name: high-error-rate
    condition: error.rate > 10/min
    severity: high
    notify: slack

  - name: slow-response
    condition: response.time.p95 > 3000ms
    severity: medium
    notify: email

Log Aggregation

Centralized Logging (Placeholder)

Example with CloudWatch/DataDog/Logstash:

// All logs automatically shipped to central service
logger.info('event.name', { data });
// → CloudWatch Logs → DataDog → Alert if needed

Log Queries

Common queries to have ready:

-- Find all errors for a user
SELECT * FROM logs
WHERE userId = 'xxx'
AND level = 'error'
AND timestamp > NOW() - INTERVAL '1 day';

-- Find slow queries
SELECT * FROM logs
WHERE event = 'database.query'
AND duration > 1000
ORDER BY duration DESC;

-- Error rate by endpoint
SELECT path, COUNT(*) as errors
FROM logs
WHERE level = 'error'
AND timestamp > NOW() - INTERVAL '1 hour'
GROUP BY path
ORDER BY errors DESC;

Dashboards

Key Metrics Dashboard (Placeholder)

Create dashboards showing:

Application Health:

Request rate (requests/minute)
Error rate (errors/minute)
Response time (p50, p95, p99)
Uptime percentage

Business Metrics:

Orders per hour
New user signups
Active users
Revenue (if applicable)

Infrastructure:

CPU and memory usage
Database connection pool
Cache hit rate
API rate limit usage
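
The response-time percentiles listed under Application Health (p50, p95, p99) are typically computed by the metrics backend. For readers unfamiliar with them, this sketch uses the nearest-rank method, which is one of several common definitions; `percentile` is a hypothetical helper and the sample durations are made up.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('cannot take a percentile of no samples');
  if (p <= 0 || p > 100) throw new RangeError('p must be in the range (0, 100]');

  const sorted = [...samples].sort((a, b) => a - b); // copy so the input is not mutated
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1] as number;
}

// Usage: ten illustrative request durations in milliseconds, with one slow outlier.
const sampleDurationsMs = [120, 80, 3100, 95, 110, 105, 90, 100, 85, 115];
const p50DurationMs = percentile(sampleDurationsMs, 50);
const p95DurationMs = percentile(sampleDurationsMs, 95);
```

Here the median stays at 100 ms while the p95 lands on the 3100 ms outlier, which is why tail percentiles are worth charting alongside the median.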

Example Dashboard Widgets

// Placeholder dashboard configuration
{
  "dashboard": "BookWish Production",
  "widgets": [
    {
      "type": "timeseries",
      "title": "Requests per Minute",
      "query": "SELECT COUNT(*) FROM logs WHERE event = 'http.request'"
    },
    {
      "type": "number",
      "title": "Active Users (24h)",
      "query": "SELECT COUNT(DISTINCT userId) FROM logs WHERE timestamp > NOW() - INTERVAL '24 hours'"
    },
    {
      "type": "gauge",
      "title": "Error Rate",
      "query": "SELECT (errors / total) * 100 FROM request_stats"
    }
  ]
}

Production Checklist

Before going to production, confirm that monitoring is in place:

Error tracking configured (Sentry or similar)
APM tool installed (DataDog, New Relic, etc.)
Health check endpoint implemented
Critical alerts configured
Dashboards created and shared
On-call rotation established
Runbooks documented for common issues
Log retention policy defined
Backup monitoring in place

Troubleshooting

High Error Rates

Check error tracking dashboard
Identify most common error types
Review recent deployments
Check external service status
Review resource usage

Slow Performance

Check APM traces
Identify slow endpoints
Review database query performance
Check cache hit rates
Look for resource constraints

Missing Logs

Verify logger is configured correctly
Check log shipping configuration
Verify network connectivity
Check log retention settings
Review IAM permissions

Best Practices

Do's

✅ Use structured logging with context
✅ Set up alerts for critical failures
✅ Monitor business metrics, not just technical
✅ Keep dashboards focused and actionable
✅ Document how to respond to alerts
✅ Review logs regularly for patterns

Don'ts

❌ Log sensitive data (passwords, credit cards)
❌ Use console.log in production
❌ Create alerts without clear actions
❌ Ignore warning-level logs
❌ Log excessively in hot paths
❌ Ship logs without sanitization

Security Considerations

Log Sanitization

// Remove sensitive data before logging
function sanitize(data: any) {
  const sanitized = { ...data };

  // Remove sensitive fields
  delete sanitized.password;
  delete sanitized.creditCard;
  delete sanitized.ssn;

  // Mask email addresses
  if (sanitized.email) {
    sanitized.email = maskEmail(sanitized.email);
  }

  return sanitized;
}

logger.info('user.created', sanitize(userData));
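
The `sanitize` function above calls `maskEmail` without defining it, and it only inspects top-level fields. The sketch below offers one possible `maskEmail` plus a recursive variant for nested payloads. Both are illustrative assumptions: the masking format, the `SENSITIVE_KEYS` list, and the name `sanitizeDeep` are not taken from the BookWish codebase, and the recursion assumes plain JSON-like data (not `Date`, `Map`, or class instances).

```typescript
// Hypothetical implementation: keeps the first character and the domain.
function maskEmail(email: string): string {
  const at = email.lastIndexOf('@');
  if (at <= 0) return '***'; // not a recognizable address: hide it entirely
  return `${email.charAt(0)}***@${email.slice(at + 1)}`;
}

// Field names to drop wherever they appear (mirrors the fields deleted above).
const SENSITIVE_KEYS = new Set(['password', 'creditCard', 'ssn']);

// Recursively copies JSON-like data, dropping sensitive keys and masking emails.
function sanitizeDeep(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => sanitizeDeep(item));
  }
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      if (SENSITIVE_KEYS.has(key)) continue; // drop the field entirely
      out[key] =
        key === 'email' && typeof inner === 'string' ? maskEmail(inner) : sanitizeDeep(inner);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}

// Usage with a nested payload.
const sanitizedPayload = sanitizeDeep({
  user: { email: 'reader@example.com', password: 'hunter2' },
  cards: [{ creditCard: '4111111111111111', last4: '1111' }],
});
```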

Access Control

Limit log access to authorized personnel
Use role-based access for dashboards
Audit log access
Encrypt logs at rest and in transit
