This morning (Friday, Sept 11, 2020) we suffered over 2 hours of downtime. Folks were not able to access CozyCal's booking pages or the logged-in interface.
It has been fixed now, and we're sorry for the disruption caused to our customers, especially to those in Europe who were affected the most.
What happened:
- Recently, we upgraded our servers to use a different Linux distribution. We changed from Ubuntu to Debian.
- One thing we overlooked during the migration is that Debian Server has a low default file descriptor limit of 1024.
- On Friday morning, there was an increase in traffic, and our Caddy proxy server went down after maxing out the file descriptor limit.
- We could have been alerted about this earlier, but our website monitoring service (Varys.io) had gone out of business without us being aware 🥺.
- Note: we do monitor many things, such as high CPU or memory usage, daily backups, and system updates. However, none of those monitors were triggered by our proxy server being unresponsive.
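For readers who want to watch for the same failure mode, here is a minimal sketch (not our actual monitoring code) of how a process can compare its open file descriptor count against its limit. It assumes Linux, since it reads `/proc/self/fd`; the function names and the 80% threshold are made up for illustration.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # On Linux, /proc/self/fd holds one entry per open descriptor.
    # The count is approximate: listing the directory briefly opens one more.
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft, hard


def is_near_limit(open_fds, soft_limit, threshold=0.8):
    """True when usage reaches the given fraction of the soft limit."""
    if soft_limit == resource.RLIM_INFINITY:
        return False  # no limit configured, so nothing to run into
    return open_fds >= threshold * soft_limit


if __name__ == "__main__":
    used, soft, hard = fd_usage()
    print(f"open fds: {used}, soft limit: {soft}, hard limit: {hard}")
    if is_near_limit(used, soft):
        print("warning: close to the file descriptor limit")
```

A check like this only covers the process that runs it. To watch a separate server such as a proxy, the same idea would have to read that process's own `/proc/<pid>/fd` and `/proc/<pid>/limits`.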
What we did to fix it:
- We bumped up our ulimit from the default 1024 to 16384.
- We switched to Uptime Robot for our website monitoring service. It has been in business for a long time, and hopefully it will continue to be around for a lot longer.
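To illustrate what raising the limit means, here is a hedged sketch of a process bumping its own soft file descriptor limit. This is not how we changed the setting on our servers (that is normally done in system or service configuration); the function name `raise_nofile_limit` is hypothetical, and the target of 16384 simply mirrors the value mentioned above.

```python
import resource


def raise_nofile_limit(target=16384):
    """Raise the soft open-file limit toward `target`; return the new soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process cannot raise the soft limit above the hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Nothing to do if the current soft limit is already high enough.
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


if __name__ == "__main__":
    print("soft limit is now", raise_nofile_limit())
```

The change applies only to the calling process and any children it starts afterwards, so a limit that should persist for a long-running server still belongs in that server's service configuration.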
