We’ve now extended the virtual cloud server twice, so we’re at the max for the current configuration. And with this crazy growth (almost 12k users!!) the server is already getting closer and closer to capacity.
So I decided to order a dedicated server, the same type as used for mastodon.world.
So the bad news… we will need some downtime. Hopefully not too much. I will prepare the new server, copy (rsync) stuff over, stop Lemmy, do a last rsync and change the DNS. If all goes well it should take maybe 10 minutes of downtime, 30 at most. (With mastodon.world it took 20 minutes, mainly because of a typo :-) )
For those who would like to donate to cover server costs, you can do so at our OpenCollective or Patreon.
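For anyone curious what that cutover looks like in practice, here is a rough sketch of the steps as a dry-run plan. It only prints commands and runs nothing. The paths, the hostname `new-server.example`, and the `systemctl stop lemmy` service name are all made up for illustration and are not the actual setup; the DNS change happens at the DNS provider, so it is not a command here.

```python
import shlex


def migration_plan(src_dir, dest_host, dest_dir, service="lemmy"):
    """Return the cutover steps as a list of (description, argv) pairs.

    All arguments are hypothetical placeholders, not the real server layout.
    """
    # -a preserves permissions, ownership and timestamps; --delete removes
    # files on the destination that no longer exist on the source.
    rsync = [
        "rsync", "-a", "--delete",
        f"{src_dir.rstrip('/')}/",
        f"{dest_host}:{dest_dir.rstrip('/')}/",
    ]
    return [
        ("initial copy while the site is still up", rsync),
        ("stop the service so the data stops changing", ["systemctl", "stop", service]),
        ("final rsync, which only transfers what changed since the first pass", rsync),
    ]


if __name__ == "__main__":
    for description, argv in migration_plan("/srv/lemmy", "new-server.example", "/srv/lemmy"):
        print(f"# {description}")
        print(shlex.join(argv))
```

The point of running rsync twice is that the first pass moves the bulk of the data with the site still online, so the downtime window only has to cover the small delta copied by the second pass.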
Thanks!
Update: The server was migrated. It took around 4 minutes of downtime. For those who asked, it now uses a dedicated server with an AMD EPYC 7502P 32-core “Rome” CPU and 128GB RAM. Should be enough for now.
I will be tuning the database a bit, so that will cause a few extra seconds of downtime, but just refresh and it’s back. After that I’ll investigate the cause of the slow posting further. Thanks @[email protected] for assisting with that.
@[email protected] DM me if you need help setting up monitoring/alerting on server health. IRL I’m on an SRE team, so happy to help where I can!
Be interesting to see. I’d assume the db is the bottleneck, would be nice to get stats on that. Should point the way to scaling more economically, i.e. well spec’d DB server and cheaper app server(s).
Could also use something like https://github.com/awslabs/pgbouncer-rr-patch to route read-only queries to a replica without any application changes, although there are some nuances, e.g. it might need some finessing if the app writes and then reads in two sessions, expecting to see the results of the write in the read.
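To make the read/write split concrete, here is a minimal sketch of a routing function in Python. As far as I recall, the patch loads a user-supplied Python function that takes `(username, query)` and returns the name of a configured database, but treat that signature as an assumption and check the project README first. The pool names `lemmy_primary` and `lemmy_replica` are hypothetical, and the query matching is deliberately conservative: anything that doesn't look like a plain `SELECT` goes to the primary.

```python
import re

# Hypothetical pool names; they would have to match entries in the pgbouncer config.
PRIMARY = "lemmy_primary"
REPLICA = "lemmy_replica"

# Keywords suggesting a statement writes or takes row locks, even if it
# starts with SELECT (e.g. SELECT ... FOR UPDATE, SELECT nextval(...)).
_WRITE_HINTS = re.compile(
    r"\b(insert|update|delete|for\s+update|for\s+share|nextval|setval)\b",
    re.IGNORECASE,
)


def routing_rules(username, query):
    """Send plain SELECTs to the replica and everything else to the primary."""
    stripped = query.lstrip()
    if stripped[:6].lower() == "select" and not _WRITE_HINTS.search(stripped):
        return REPLICA
    return PRIMARY
```

This does nothing about the write-then-read problem mentioned above: a read routed to the replica right after a write can still see stale data because of replication lag, and the sketch does not track transactions either, so those cases would need extra handling.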