Bloopist has been crashing lately. Occasionally, a request to Bloopist would come back with a 502 Bad Gateway error. Since an error page was being returned, I felt pretty confident that Nginx was still running and that the true culprit was Phusion Passenger.
I checked my recent Nginx errors with tail /opt/nginx/logs/error.log -n 20000, and I found this:
App 8973 stderr: sh: error while loading shared libraries: libc.so.6: failed to map segment from shared object: Cannot allocate memory
terminate called after throwing an instance of 'Passenger::SystemException'
what(): Cannot fork a new process: Cannot allocate memory (errno=12)
ERROR: cannot fork a process for executing 'tee'
[ pid=2095, timestamp=1443343763 ] Process aborted! signo=SIGABRT(6), reason=SI_TKILL, signal sent by PID 2095 with UID 0, si_addr=0x82f, randomSeed=1443035040
[ pid=2095 ] Could not create crash log file, so dumping to stderr only.
[ pid=2095 ] Could fork a child process for dumping diagnostics: fork() failed with errno=12
[ 2015-09-27 08:49:23.7784 2092/7f59515f9700 age/Wat/AgentWatcher.cpp:96 ]: Passenger core (pid=2095) crashed with signal SIGABRT, restarting it...
[ 2015-09-27 08:49:23.7855 2092/7f595163c7c0 age/Wat/WatchdogMain.cpp:323 ]: Error in Passenger core watcher:
Cannot fork a new process: Cannot allocate memory (errno=12)
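With 20,000 lines of log to read, it can help to filter for just the out-of-memory failures. The following is a minimal sketch, not what I actually ran: the helper name oom_lines and the pattern are my own assumptions, based only on the messages shown in the excerpt above.

```ruby
# Hypothetical helper: returns only the log lines that point at an
# out-of-memory failure. The pattern is an assumption derived from the
# messages in the excerpt above, not an official Passenger format.
OOM_PATTERN = /Cannot allocate memory|errno=12/

def oom_lines(log_text)
  log_text.each_line.map(&:chomp).grep(OOM_PATTERN)
end

# Three lines copied from the excerpt above, used as sample input.
sample = <<~LOG
  what(): Cannot fork a new process: Cannot allocate memory (errno=12)
  ERROR: cannot fork a process for executing 'tee'
  [ pid=2095 ] Could fork a child process for dumping diagnostics: fork() failed with errno=12
LOG

puts oom_lines(sample)
```

To run it against the real log, pass in File.read("/opt/nginx/logs/error.log") instead of the sample text.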
Digging a little deeper into the error log, I found that there was an error when trying to send an exception notification email from another Ruby on Rails application of mine, BarStack:
App 2139 stdout: An error occurred when sending a notification using 'email' notifier. Net::ReadTimeout: Net::ReadTimeout
App 2139 stdout: /usr/local/lib/ruby/2.2.0/net/protocol.rb:158:in `rescue in rbuf_fill'
Suspicious of my error handling code, I made a dummy view and controller and raised an error inside the dummy controller. Sure enough, when I hit that page, the server crashed and started returning 502 Bad Gateway errors again.
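A throwaway controller for this kind of test can be as small as the sketch below. The names CrashTestController and index are hypothetical, not the ones from my app, and the stand-in ApplicationController only exists so the snippet runs outside Rails.

```ruby
# Stand-in so this sketch runs outside Rails; inside a real app,
# Rails already provides ApplicationController and this is skipped.
unless defined?(ApplicationController)
  class ApplicationController; end
end

# Hypothetical throwaway controller: every request to it raises, which
# should trigger whatever exception notification the app has configured.
class CrashTestController < ApplicationController
  def index
    raise "Deliberate error to exercise the exception notifier"
  end
end

# A matching route would also be needed in config/routes.rb, for example
# (an assumption, adjust to your app):
#   get "crash_test" => "crash_test#index"
```

Remember to delete the controller, view, and route once the test is done, since anyone who finds the URL can trigger the error.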
I Googled the error above and found an issue on GitHub for the 'mail' gem. It was specifically for version 2.5.4 of the mail gem. I checked my Gemfile.lock, and yes, that was the exact version of the gem that I was using.
That version of mail is a dependency of actionmailer (as shown in my Gemfile.lock):
    actionmailer (3.2.16)
      actionpack (= 3.2.16)
      mail (~> 2.5.4)
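The ~> 2.5.4 constraint is what ties the mail version to actionmailer: it accepts 2.5.4 and later 2.5.x releases, but nothing from 2.6 onward. As a rough illustration, RubyGems' own Gem::Requirement class can check this. The version numbers in the list are just sample inputs, not necessarily real releases of the mail gem.

```ruby
require "rubygems"

# The constraint actionmailer 3.2.16 places on mail, per the Gemfile.lock above.
requirement = Gem::Requirement.new("~> 2.5.4")

# Sample version strings chosen only to show where the boundaries fall.
%w[2.5.3 2.5.4 2.5.9 2.6.0].each do |version|
  allowed = requirement.satisfied_by?(Gem::Version.new(version))
  puts "mail #{version}: #{allowed ? 'allowed' : 'rejected'}"
end
```

So Bundler cannot move mail to a 2.6 or later release while actionmailer stays at 3.2.16.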
Here, I had a choice to make. I could either upgrade actionmailer, which in practice means upgrading Rails itself, or I could use the fix in the GitHub issue I found. Since the jump from Rails 3.2 to 4.0 is a major version change, I decided to take the safe route and use the fix from the GitHub issue. I can always upgrade later, and the fix shouldn't have any negative impact in the meantime.
I applied the fix, restarted my server, hit my error page, and everything worked!
Yes, I know I should have better tests in place to prevent this sort of thing. Had this been a professional project, that's certainly what I would have done. This is just a personal project of mine that I seldom work on, so I'm not really concerned with writing bulletproof code for it.