Incident Case Study: PHP-FPM OOM Kill
This case study describes a real failure mode, a PHP-FPM worker killed by the Linux OOM killer, and the rules that prevent it.
Summary
- The Linux kernel triggered the OOM killer
- A php-fpm process was killed
- Nginx logged:
recv() failed (104: Connection reset by peer) while reading response header from upstream
What Happened
- Some requests held PHP-FPM workers for long durations
- Memory use grew until the system ran out of RAM
- The kernel killed a php-fpm process to free memory, which reset Nginx's upstream connection to that worker mid-request
Evidence Pattern
- Kernel logs show:
php invoked oom-killer
Out of memory: Killed process (php-fpm...)
- Nginx shows:
Connection reset by peer
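The kernel-side half of this pattern can be checked mechanically. The sketch below assumes OOM kill lines contain the text "Killed process <pid> (<name>)"; the exact wording varies between kernel versions, and the sample log is made up for illustration, not taken from the incident.

```python
import re

# Matches the "Killed process <pid> (<name>)" part of an OOM-killer log line.
# Assumption: the exact wording differs between kernel versions.
OOM_KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return (pid, process_name) for every OOM kill found in log_text."""
    kills = []
    for line in log_text.splitlines():
        match = OOM_KILL_RE.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


# Hypothetical sample, shaped like `dmesg -T` output.
SAMPLE_LOG = """\
[Mon Jan  1 10:00:00 2024] php-fpm invoked oom-killer: gfp_mask=0x0, order=0
[Mon Jan  1 10:00:00 2024] Out of memory: Killed process 4242 (php-fpm) total-vm:900000kB
[Mon Jan  1 10:00:05 2024] eth0: link up
"""

if __name__ == "__main__":
    for pid, name in find_oom_kills(SAMPLE_LOG):
        print(f"OOM kill: pid={pid} process={name}")
```

Matching the kill timestamps against the Nginx "Connection reset by peer" entries is what ties the two logs together.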
Root Cause Categories
Common reasons for PHP-FPM OOM:
- Unlimited runtime overrides in code
ini_set('memory_limit', '-1')
ini_set('max_execution_time', 100000)
- Unbounded DB fetches
- huge ->get() or Model::all()
- Exports/reports in web request
- Large uploads processed inline
- Too many concurrent FPM workers for available RAM
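The unbounded-fetch category is the easiest to illustrate. The sketch below shows the general pattern of reading a result set in fixed-size batches, so memory use depends on the batch size and not on the table size. It uses Python's built-in sqlite3 module with a hypothetical orders table; in a Laravel codebase the same idea is usually expressed with the query builder's chunk() or cursor() methods.

```python
import sqlite3


def iter_rows_in_batches(conn, query, batch_size=500):
    """Yield rows batch by batch so at most batch_size rows are held at once."""
    cursor = conn.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


def build_demo_db(row_count):
    """Create an in-memory table with row_count rows (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
    conn.executemany(
        "INSERT INTO orders (total) VALUES (?)",
        [(n,) for n in range(row_count)],
    )
    return conn


def sum_totals(conn, batch_size=500):
    """Aggregate without ever materializing the full result set."""
    running_total = 0
    largest_batch = 0
    query = "SELECT total FROM orders ORDER BY id"
    for batch in iter_rows_in_batches(conn, query, batch_size):
        largest_batch = max(largest_batch, len(batch))
        running_total += sum(row[0] for row in batch)
    return running_total, largest_batch


if __name__ == "__main__":
    demo = build_demo_db(1200)
    total, largest = sum_totals(demo, batch_size=500)
    print(f"total={total} largest_batch={largest}")
```

The second return value is there only to show that no batch ever exceeds the configured size.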
Lessons Learned
- Web requests must be bounded:
- ≤ 120 seconds
- ≤ 256MB memory
- Heavy work must be moved to queues
- FPM must have a safe pm.max_children cap
- Add slowlog to detect slow routes early
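A rough way to pick a safe worker cap is to divide the RAM left for FPM by the worst-case memory of one worker. The sketch below uses the 256 MB per-request bound from this section as that worst case. The 8 GB host and the 2 GB reserved for the OS, database, and cache are hypothetical figures, and measured per-worker memory on the real host should replace them.

```python
def safe_max_children(total_ram_mb, reserved_mb, per_worker_mb):
    """Largest worker count whose worst-case memory fits in the RAM left for FPM."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    available_mb = total_ram_mb - reserved_mb
    if available_mb <= 0:
        return 0
    return available_mb // per_worker_mb


if __name__ == "__main__":
    # Hypothetical host: 8 GB RAM, 2 GB reserved for everything that is not FPM.
    cap = safe_max_children(total_ram_mb=8192, reserved_mb=2048, per_worker_mb=256)
    print(f"suggested pm.max_children = {cap}")
```

With these assumed figures the result is 24 workers, since 24 × 256 MB = 6144 MB, which is exactly the RAM left after the reservation.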
Preventive Controls
- Server-enforced timeouts:
request_terminate_timeout = 120s
- Code review checklist enforcement
- Queue workers with:
--timeout=120 --memory=256
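Part of the code review checklist can be automated. The sketch below flags the runtime overrides named under Root Cause Categories, plus set_time_limit() calls, in PHP source text. The regular expressions are illustrative and will miss overrides built dynamically or split across lines. The sample source is hypothetical.

```python
import re

# Patterns for runtime overrides that lift the per-request limits.
RISKY_PATTERNS = {
    "unlimited memory_limit": re.compile(
        r"ini_set\(\s*['\"]memory_limit['\"]\s*,\s*['\"]?-1['\"]?\s*\)"
    ),
    "max_execution_time override": re.compile(
        r"ini_set\(\s*['\"]max_execution_time['\"]\s*,"
    ),
    "set_time_limit call": re.compile(r"\bset_time_limit\s*\("),
}


def find_risky_overrides(php_source):
    """Return (line_number, label) for each risky override in php_source."""
    findings = []
    for line_number, line in enumerate(php_source.splitlines(), start=1):
        for label, pattern in RISKY_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings


# Hypothetical controller fragment used only to exercise the scanner.
SAMPLE_PHP = """<?php
ini_set('memory_limit', '-1');
ini_set('max_execution_time', 100000);
$rows = Report::query()->limit(100)->get();
"""

if __name__ == "__main__":
    for line_number, label in find_risky_overrides(SAMPLE_PHP):
        print(f"line {line_number}: {label}")
```

A check like this could run in CI so that reviewers only have to judge the flagged lines.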
What To Do If You See This Again
- Check kernel logs for OOM:
dmesg -T | grep -Ei 'oom|killed process'
- Identify heavy endpoints (slowlog / access logs)
- Move heavy logic to queue
- Reduce concurrency pressure (FPM caps + caching)
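For the step of identifying heavy endpoints from access logs, the sketch below groups slow requests by path. It assumes a custom Nginx log_format that ends each line with $request_time, which the default combined format does not include. The 5-second threshold and the sample lines are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: each line has a quoted request and ends with $request_time.
LINE_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*".* (\d+\.\d+)$')


def slowest_paths(log_lines, threshold_seconds=5.0):
    """Return [(path, slow_hits, max_seconds)], most frequent slow paths first."""
    stats = defaultdict(lambda: [0, 0.0])
    for line in log_lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue
        path = match.group(1).split("?", 1)[0]  # group query-string variants
        seconds = float(match.group(2))
        if seconds >= threshold_seconds:
            stats[path][0] += 1
            stats[path][1] = max(stats[path][1], seconds)
    return sorted(
        ((path, hits, worst) for path, (hits, worst) in stats.items()),
        key=lambda item: (item[1], item[2]),
        reverse=True,
    )


# Hypothetical access log lines in the assumed format.
SAMPLE_LINES = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /reports/export?year=2023 HTTP/1.1" 200 512 98.400',
    '10.0.0.2 - - [01/Jan/2024:10:00:01 +0000] "GET /dashboard HTTP/1.1" 200 2048 0.120',
    '10.0.0.3 - - [01/Jan/2024:10:00:02 +0000] "POST /reports/export HTTP/1.1" 200 512 110.050',
    '10.0.0.4 - - [01/Jan/2024:10:00:03 +0000] "GET /students/all HTTP/1.1" 200 90000 45.000',
]

if __name__ == "__main__":
    for path, hits, worst in slowest_paths(SAMPLE_LINES):
        print(f"{path}: {hits} slow requests, worst {worst:.1f}s")
```

Paths that top this list are the first candidates for moving to a queue.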
Campus On Click