Monitoring and Logging
Overview
Proper monitoring and logging are essential for maintaining a healthy production system. This guide covers how BookWish tracks errors, performance, and system health.
Logging Strategy
Log Levels
BookWish uses structured logging with these levels:
| Level | Usage | Example |
|---|---|---|
| debug | Development details | Function entry/exit, variable values |
| info | Normal operations | User login, order created |
| warn | Unusual but handled | Slow query, rate limit approached |
| error | Errors that were caught | API timeout, validation failure |
| fatal | Critical system failure | Database unreachable, service crash |
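The logger utility itself is not shown in this guide. As a rough illustration of how the levels in the table could be ordered and filtered, here is a minimal sketch. The names `createLogger`, `SEVERITY`, and the `sink` parameter are assumptions made for this example and are not taken from BookWish's actual `../lib/logger`:

```typescript
// Hypothetical sketch only; BookWish's real '../lib/logger' may differ.
type Level = 'debug' | 'info' | 'warn' | 'error' | 'fatal';
type Fields = Record<string, unknown>;

// Numeric severity so levels can be compared; order matches the table above.
const SEVERITY: Record<Level, number> = { debug: 10, info: 20, warn: 30, error: 40, fatal: 50 };

// `sink` receives one JSON line per entry (stdout, a file, a log shipper, ...).
function createLogger(minLevel: Level, sink: (line: string) => void) {
  const write = (level: Level, event: string, fields: Fields = {}): void => {
    if (SEVERITY[level] < SEVERITY[minLevel]) return; // below the configured threshold
    sink(JSON.stringify({ ...fields, level, event, timestamp: new Date().toISOString() }));
  };
  return {
    debug: (event: string, fields?: Fields) => write('debug', event, fields),
    info: (event: string, fields?: Fields) => write('info', event, fields),
    warn: (event: string, fields?: Fields) => write('warn', event, fields),
    error: (event: string, fields?: Fields) => write('error', event, fields),
    fatal: (event: string, fields?: Fields) => write('fatal', event, fields),
  };
}

// Example: with a minimum level of 'info', debug entries are dropped.
const captured: string[] = [];
const exampleLogger = createLogger('info', (line) => { captured.push(line); });
exampleLogger.debug('cache.lookup', { key: 'book:42' }); // dropped
exampleLogger.info('user.login', { userId: 'u_123' });   // emitted as one JSON line
```

Passing the sink in as a parameter keeps the sketch testable; a real service would more likely write to stdout or hand entries to a log shipper.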
Backend Logging
Using the Logger Utility:
import { logger } from '../lib/logger';
// Info level - normal operations
logger.info('user.login', { userId }); // log the ID rather than the raw email address
// Warning level - concerning but handled
logger.warn('payment.retry', {
  orderId,
  attempt: 3,
  reason: 'timeout'
});
// Error level - something went wrong
logger.error('database.query.failed', {
  query: 'SELECT * FROM users',
  error: error.message,
  stack: error.stack
});
Never Use Console:
// ❌ BAD - No structured logging
console.log('User logged in:', userId);
console.error('Error:', error);
// ✅ GOOD - Structured logging
logger.info('user.login', { userId });
logger.error('auth.login.failed', {
  error: error.message,
  userId
});
Frontend Logging
Flutter App:
// Production - no logging
// Development - use debugPrint sparingly
// For errors in production, report to error tracking
try {
  await someOperation();
} catch (e) {
  // Report to error tracking service
  ErrorTracker.report(e);
}
Error Tracking
Sentry Integration (Placeholder)
Backend Setup:
import * as Sentry from '@sentry/node';
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  environment: process.env.NODE_ENV,
  // 1.0 traces every transaction; use a lower rate in production
  tracesSampleRate: 1.0,
});
// Capture errors
try {
  await processPayment(order);
} catch (error) {
  Sentry.captureException(error, {
    tags: {
      feature: 'payment',
      orderId: order.id
    }
  });
  throw error;
}
Flutter Setup:
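import 'package:flutter/widgets.dart';
import 'package:sentry_flutter/sentry_flutter.dart';

// Initialize Sentry before the app starts so startup errors are captured
Future<void> main() async {
  await SentryFlutter.init(
    (options) {
      options.dsn = 'your-dsn-here';
      options.environment = 'production';
    },
    appRunner: () => runApp(MyApp()),
  );
}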
// Capture errors
try {
  await fetchData();
} catch (error, stackTrace) {
  await Sentry.captureException(
    error,
    stackTrace: stackTrace,
  );
}
Error Context
Always include relevant context with errors:
logger.error('payment.process.failed', {
  orderId: order.id,
  userId: user.id,
  amount: order.total,
  paymentMethod: order.paymentMethod,
  error: error.message,
  stack: error.stack,
  timestamp: new Date().toISOString()
});
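In strict TypeScript, a caught value is typed `unknown`, so reading `error.message` directly may not type-check, and code can throw values that are not `Error` instances at all. The following sketch shows one way to normalize a thrown value into loggable fields. The helper name `errorFields` and its field names are assumptions for illustration only:

```typescript
// Hypothetical helper: normalizes any thrown value into loggable fields.
interface ErrorFields {
  error: string;
  errorName: string;
  stack?: string | undefined;
}

function errorFields(err: unknown): ErrorFields {
  if (err instanceof Error) {
    return { error: err.message, errorName: err.name, stack: err.stack };
  }
  // Strings, numbers, or plain objects can be thrown too; fall back to String().
  return { error: String(err), errorName: 'NonError' };
}

// Usage with the logger from this guide (identifiers as in the example above):
// logger.error('payment.process.failed', { orderId: order.id, ...errorFields(error) });
```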
Performance Monitoring
Application Performance Monitoring (APM)
Placeholder for APM Service (e.g., DataDog, New Relic)
// Track transaction performance
const transaction = apm.startTransaction('process-order');
try {
  await validateOrder(order);
  await chargePayment(order);
  await createShipment(order);
  transaction.result = 'success';
} catch (error) {
  transaction.result = 'error';
  throw error;
} finally {
  transaction.end();
}
Database Query Monitoring
// Time every query, but only log slow ones (> 1000 ms) in development
const start = Date.now();
const result = await db.query(sql);
const duration = Date.now() - start;
if (process.env.NODE_ENV === 'development' && duration > 1000) {
  logger.warn('slow.query', { sql, duration });
}
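The inline timing above can be factored into a reusable wrapper so that every call site measures duration the same way. This is a minimal sketch, assuming a hypothetical helper named `timed`; the `onSlow` callback and the injectable `now` clock are illustrative choices, not part of BookWish's codebase:

```typescript
// Hypothetical helper; `now` and `onSlow` are injected so it can run without a real database.
interface SlowCallReport {
  label: string;
  durationMs: number;
}

async function timed<T>(
  label: string,
  thresholdMs: number,
  run: () => Promise<T>,
  onSlow: (report: SlowCallReport) => void,
  now: () => number = Date.now,
): Promise<T> {
  const startedAt = now();
  try {
    return await run();
  } finally {
    // Runs on success and on failure, so slow failing calls are reported too.
    const durationMs = now() - startedAt;
    if (durationMs > thresholdMs) onSlow({ label, durationMs });
  }
}

// Usage (identifiers `db`, `sql`, and `logger` as in the example above):
// const rows = await timed('users.list', 1000, () => db.query(sql),
//   (report) => logger.warn('slow.query', report));
```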
API Response Times
// Middleware to track response times
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = Date.now() - start;
    logger.info('http.request', {
      method: req.method,
      path: req.path,
      status: res.statusCode,
      duration
    });
    if (duration > 3000) {
      logger.warn('slow.request', {
        method: req.method,
        path: req.path,
        duration
      });
    }
  });
  next();
});
System Metrics
Health Checks
Basic Health Endpoint:
app.get('/health', async (req, res) => {
  const checks = {
    database: await checkDatabase(),
    redis: await checkRedis(),
    storage: await checkStorage()
  };
  const isHealthy = Object.values(checks)
    .every(check => check.status === 'ok');
  // Report the aggregate result instead of a hard-coded 'healthy'
  const health = {
    status: isHealthy ? 'healthy' : 'unhealthy',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    checks
  };
  res.status(isHealthy ? 200 : 503).json(health);
});
async function checkDatabase() {
  try {
    await prisma.$queryRaw`SELECT 1`;
    return { status: 'ok' };
  } catch (error) {
    return { status: 'error', message: error.message };
  }
}
Resource Metrics
Track these key metrics (placeholder for monitoring service):
- CPU usage - Alert if > 80% for 5 minutes
- Memory usage - Alert if > 85%
- Disk usage - Alert if > 90%
- Network I/O - Track bandwidth usage
- Database connections - Alert if pool exhausted
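A rule such as "alert if > 80% for 5 minutes" differs from a single-sample check: every sample in the window has to be above the threshold, and the data has to cover the whole window. A monitoring service would normally evaluate this for you. The sketch below only illustrates the logic, and the `Sample` shape and the `breachedFor` name are assumptions:

```typescript
// Hypothetical sketch of a "sustained breach" rule; samples must be sorted by time, oldest first.
interface Sample {
  atMs: number;  // sample time in milliseconds
  value: number; // e.g. CPU usage in percent
}

function breachedFor(samples: Sample[], threshold: number, windowMs: number): boolean {
  const first = samples[0];
  const latest = samples[samples.length - 1];
  if (!first || !latest) return false; // no data, no alert
  const windowStart = latest.atMs - windowMs;
  // Without data reaching back to the window start, a sustained breach cannot be claimed.
  if (first.atMs > windowStart) return false;
  return samples
    .filter((s) => s.atMs >= windowStart)
    .every((s) => s.value > threshold);
}

// Example: one sample per minute; CPU above 80% for a full 5 minutes.
const minuteMs = 60_000;
const cpuSamples: Sample[] = [0, 1, 2, 3, 4, 5].map((m) => ({ atMs: m * minuteMs, value: 90 }));
const cpuAlert = breachedFor(cpuSamples, 80, 5 * minuteMs); // true
```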
Alerting
Alert Levels
| Level | Response Time | Examples |
|---|---|---|
| Critical | Immediate (PagerDuty) | Database down, API unreachable |
| High | Within 15 minutes | High error rate, payment failures |
| Medium | Within 1 hour | Slow queries, cache misses |
| Low | Next business day | Deprecation warnings, low disk space |
Alert Channels
Placeholder Configuration:
# alerts.yml
alerts:
  - name: database-down
    condition: database.connection.failed
    severity: critical
    notify: pagerduty
  - name: high-error-rate
    condition: error.rate > 10/min
    severity: high
    notify: slack
  - name: slow-response
    condition: response.time.p95 > 3000ms
    severity: medium
    notify: email
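A condition like `error.rate > 10/min` implies counting events over a sliding time window. Alerting services typically handle this internally. The following sketch only illustrates the idea, and the `RateWindow` class and its method names are hypothetical:

```typescript
// Hypothetical sliding-window counter; timestamps are passed in so the logic stays deterministic.
class RateWindow {
  private timestamps: number[] = [];
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Record one event (for example, one logged error) at the given time.
  record(atMs: number): void {
    this.timestamps.push(atMs);
  }

  // Number of events inside the window that ends at `nowMs`.
  countAt(nowMs: number): number {
    const cutoff = nowMs - this.windowMs;
    this.timestamps = this.timestamps.filter((t) => t > cutoff); // drop expired events
    return this.timestamps.length;
  }

  // True when the count inside the window is above `limit`.
  exceeds(nowMs: number, limit: number): boolean {
    return this.countAt(nowMs) > limit;
  }
}

// Example: 11 errors within the first 10 seconds breach a limit of 10 per minute.
const errorWindow = new RateWindow(60_000);
for (let second = 0; second <= 10; second++) errorWindow.record(second * 1000);
const highErrorRate = errorWindow.exceeds(30_000, 10); // true
```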
Log Aggregation
Centralized Logging (Placeholder)
Example with CloudWatch/DataDog/Logstash:
// All logs automatically shipped to central service
logger.info('event.name', { data });
// → CloudWatch Logs → DataDog → Alert if needed
Log Queries
Common queries to have ready:
-- Find all errors for a user
SELECT * FROM logs
WHERE userId = 'xxx'
AND level = 'error'
AND timestamp > NOW() - INTERVAL '1 day';
-- Find slow queries
SELECT * FROM logs
WHERE event = 'slow.query'
AND duration > 1000
ORDER BY duration DESC;
-- Error count by endpoint (last hour)
SELECT path, COUNT(*) as errors
FROM logs
WHERE level = 'error'
AND timestamp > NOW() - INTERVAL '1 hour'
GROUP BY path
ORDER BY errors DESC;
Dashboards
Key Metrics Dashboard (Placeholder)
Create dashboards showing:
Application Health:
- Request rate (requests/minute)
- Error rate (errors/minute)
- Response time (p50, p95, p99)
- Uptime percentage
Business Metrics:
- Orders per hour
- New user signups
- Active users
- Revenue (if applicable)
Infrastructure:
- CPU and memory usage
- Database connection pool
- Cache hit rate
- API rate limit usage
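The response-time percentiles listed under Application Health (p50, p95, p99) can be computed from the logged request durations. The sketch below uses the nearest-rank method, which is only one of several percentile definitions; monitoring tools may interpolate and report slightly different values. The function name `percentile` is an assumption for this example:

```typescript
// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('percentile requires at least one sample');
  if (p <= 0 || p > 100) throw new RangeError('p must be in the range (0, 100]');
  const sorted = [...values].sort((a, b) => a - b); // numeric sort, input left untouched
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  const value = sorted[rank - 1];
  if (value === undefined) throw new Error('rank out of bounds');
  return value;
}

// Example: ten request durations in milliseconds, with one slow outlier.
const durationsMs = [120, 80, 3100, 450, 200, 95, 610, 150, 330, 90];
const p50 = percentile(durationsMs, 50); // 150
const p95 = percentile(durationsMs, 95); // 3100
```

The example also shows why dashboards track p95 and p99 alongside the median: the single 3100 ms outlier is invisible at p50 but dominates the tail.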
Example Dashboard Widgets
// Placeholder dashboard configuration
{
  "dashboard": "BookWish Production",
  "widgets": [
    {
      "type": "timeseries",
      "title": "Requests per Minute",
      "query": "SELECT COUNT(*) FROM logs WHERE event = 'http.request'"
    },
    {
      "type": "number",
      "title": "Active Users (24h)",
      "query": "SELECT COUNT(DISTINCT userId) FROM logs WHERE timestamp > NOW() - INTERVAL '24 hours'"
    },
    {
      "type": "gauge",
      "title": "Error Rate",
      "query": "SELECT (errors / total) * 100 FROM request_stats"
    }
  ]
}
Production Checklist
Before launching to production, confirm that the following are in place:
- Error tracking configured (Sentry or similar)
- APM tool installed (DataDog, New Relic, etc.)
- Health check endpoint implemented
- Critical alerts configured
- Dashboards created and shared
- On-call rotation established
- Runbooks documented for common issues
- Log retention policy defined
- Backup monitoring in place
Troubleshooting
High Error Rates
- Check error tracking dashboard
- Identify most common error types
- Review recent deployments
- Check external service status
- Review resource usage
Slow Performance
- Check APM traces
- Identify slow endpoints
- Review database query performance
- Check cache hit rates
- Look for resource constraints
Missing Logs
- Verify logger is configured correctly
- Check log shipping configuration
- Verify network connectivity
- Check log retention settings
- Review IAM permissions
Best Practices
Do's
- ✅ Use structured logging with context
- ✅ Set up alerts for critical failures
- ✅ Monitor business metrics, not just technical
- ✅ Keep dashboards focused and actionable
- ✅ Document how to respond to alerts
- ✅ Review logs regularly for patterns
Don'ts
- ❌ Log sensitive data (passwords, credit cards)
- ❌ Use console.log in production
- ❌ Create alerts without clear actions
- ❌ Ignore warning-level logs
- ❌ Log excessively in hot paths
- ❌ Ship logs without sanitization
Security Considerations
Log Sanitization
// Remove sensitive data before logging
function sanitize(data: any) {
  const sanitized = { ...data };
  // Remove sensitive fields
  delete sanitized.password;
  delete sanitized.creditCard;
  delete sanitized.ssn;
  // Mask email addresses
  if (sanitized.email) {
    sanitized.email = maskEmail(sanitized.email);
  }
  return sanitized;
}
logger.info('user.created', sanitize(userData));
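The `maskEmail` helper is referenced above but not defined in this guide, and `sanitize` only inspects top-level fields. The sketch below shows one possible `maskEmail` and a recursive variant that also handles nested objects and arrays. The masking format, the `SENSITIVE_KEYS` set, and the `sanitizeDeep` name are assumptions, and the code expects plain JSON-like data:

```typescript
// Hypothetical masking: keep the first character of the local part and the full domain.
function maskEmail(email: string): string {
  const at = email.lastIndexOf('@');
  if (at <= 0) return '***'; // not a recognizable address: hide it entirely
  return `${email.charAt(0)}***@${email.slice(at + 1)}`;
}

// Field names dropped at any depth; same list as in sanitize() above.
const SENSITIVE_KEYS = new Set(['password', 'creditCard', 'ssn']);

// Recursive variant for plain JSON-like data (objects, arrays, primitives).
function sanitizeDeep(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => sanitizeDeep(item));
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      if (SENSITIVE_KEYS.has(key)) continue; // drop sensitive fields entirely
      out[key] = key === 'email' && typeof inner === 'string'
        ? maskEmail(inner)
        : sanitizeDeep(inner);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}

const maskedExample = maskEmail('jane.doe@example.com'); // 'j***@example.com'
```

Applying the sanitizer inside the logger, rather than at each call site, would make it harder to forget; whether BookWish's logger does this is not stated here.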
Access Control
- Limit log access to authorized personnel
- Use role-based access for dashboards
- Audit log access
- Encrypt logs at rest and in transit