MattScott 15245 Posted May 8, 2019

Hi all,

I'm not interested in sugarcoating or spinning this last week. I am sorry for all the downtime. Many players have asked what happened, so I'm going to do my best to summarize the events.

On Monday the 29th, we took the servers down to move data centers and migrate to all new hardware for our backend systems. This move was necessary for a couple of reasons, but mainly because a lot of the hardware was more than 5 years old, and we were already seeing failures and system performance problems. It was only a matter of time before something critical failed. The timing also happened to line up with the end of our last legacy (ludicrously overpriced) hosting contract, and we needed to make some network architecture changes to facilitate some of the features coming after the Engine Upgrade. I figured we could kill three birds with one stone.

The core challenge was managing all the information needed to drive APB, since all the hardware was new. That meant backing everything up and then hand-carrying a series of large hard drives from one location to another. The hardware had been prepped and configured in advance, so while we knew this would be challenging, we had a fairly detailed plan and felt we could manage any issues that popped up.

Problem #1: Unfortunately, during the move we unearthed some buried issues that delayed our ability to bring servers back online for quite a while. The team did a solid job of working through those problems and even recompiling code in a couple of places to remove landmines that we stumbled over. Once we got the servers back online, I felt we had done a decent job.

Problem #2: Shortly after we went live, the brand new RAID controller in our new primary database server quickly degraded in performance. To make the situation worse, we had rushed to get servers back online and decided it would be okay to let players back in while the secondary database server finished syncing. The hardware failure hit so quickly that the secondary wasn't ready, so we couldn't fail over. At this point we made an effort to keep the servers online through the weekend, and while our jury-rigged fix allowed some players to get on, it also led to many other players being unable to log in (Error 9). The team decided the quickest way to fix the issue would be to build an entirely new primary database server and then swap everything over on Monday. We didn't want to risk moving damaged drives to the new servers, so we needed a complete backup to make sure we didn't lose anything.

Problem #3: Once we shut off all the servers and started the backup, we found that the faulty RAID controller could only copy files at a rate of 1 GB per minute. After 18+ hours, we were finally able to complete the backup and get the new server finalized and back online.

There are a lot of things that went wrong, but in the end, I should have planned better. With that much new hardware, we were bound to have an issue somewhere.

To make it up to everyone, on Friday we will be turning on 2 weeks of free Premium for all players. For anyone who has existing Premium, this will add to it.

I never want to have to make this kind of apology again. Little Orbit can and will do better in the future.

EDIT: We just started awarding the Premium (4pm Friday Pacific time). There are a lot of players to gift, so we're doing this in waves. To start, we will hit everyone who has logged in within the last 30 days, sorted by most recent login first. Then we'll go back 30 days, etc.
Sorry, Matt
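[Editor's note: a rough sense of scale for Problem #3 above. The post gives the degraded copy rate (1 GB per minute) and the resulting 18+ hour backup, but not the dataset size or what a healthy controller would have managed. A minimal sketch, assuming an illustrative ~1 TB dataset and a hypothetical 500 MB/s healthy sequential rate, shows why the degraded controller stretched the maintenance window from under an hour to most of a day.]

```python
# Back-of-the-envelope backup timing (illustrative figures only).
# The 1 GB/min degraded rate comes from the post; the dataset size and
# the "healthy" rate are assumptions made for the sake of the example.

def backup_hours(data_gb: float, rate_gb_per_min: float) -> float:
    """Hours needed to copy data_gb at a sustained rate_gb_per_min."""
    return data_gb / rate_gb_per_min / 60.0

DATA_GB = 1080                  # assumed: ~18 h * 60 min * 1 GB/min
DEGRADED_RATE = 1.0             # GB/min, as reported in the post
HEALTHY_RATE = 500 / 1024 * 60  # GB/min, assuming ~500 MB/s sequential reads

print(f"degraded controller: {backup_hours(DATA_GB, DEGRADED_RATE):.1f} h")  # ~18.0 h
print(f"healthy controller:  {backup_hours(DATA_GB, HEALTHY_RATE):.1f} h")   # ~0.6 h
```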
VanilleKeks 746 Posted May 8, 2019 (edited) I'm not really familiar with servers, so pardon me asking: is the new hardware only for back-end, database stuff? If so, are there any plans for upgrading the servers that run the actual districts too? Edit: What I mean is, I'm not sure what back-end means in this case. I suppose handling logins and saving player data, etc.? Edited May 8, 2019 by VanilleKeks
Fyre 38 Posted May 8, 2019 Big ups to my man Matt for the detailed breakdown and the unconditional apology. Feels good to not be treated like an idiot. The compensation is also a nice touch. 10/10 good PR moves.
demonshinta 44 Posted May 8, 2019 (>Oo)> (Golf Clap)
TurboBRCrim 29 Posted May 8, 2019 Thanks for the clarification. I also appreciate that you guys are concerned about server performance after the engine upgrade hits. It must be really difficult to migrate such obsolete hardware, so props to you guys.
SKay 207 Posted May 8, 2019

26 minutes ago, VanilleKeks said: I'm not really familiar with servers, so pardon me asking: is the new hardware only for back-end, database stuff? If so, are there any plans for upgrading the servers that run the actual districts too? Edit: What I mean is, I'm not sure what back-end means in this case. I suppose handling logins and saving player data, etc.?

The backend is the blanket term for all services not maintained by the end user, such as user information storage, networking infrastructure and CDN, server capacity and server hardware. It's the equivalent of going behind the counter of a shop and being the cashier. The client should, under optimal circumstances, be able to use the software as it is designed and without hassle such as downtime.

There are multiple layers in which you can distribute server hardware, from creating database VMs to bare-metal game servers. Each server is given one or more jobs, such as being a network entry point or being a database instance. They could theoretically set it up so that everything is hosted on one server (which is not a good idea at all).

A RAID controller in this case is mainly used for storage servers and database instances where backups are extremely critical, and where losing records willy-nilly is simply not an option. A RAID controller is a specialised controller that can push more read/write IOPS (effectively, how fast data can be read or written) across the set of disks under its control. You can get RAID cards these days without much expense, but for arrays controlling two dozen drives, it's still quite costly.

Also, this has been said many times in the past, not just here but in a global context, but it's always worth repeating - throwing hardware at a software problem only gets you so far in regards to server performance. You would need to enable the software to take advantage of newer hardware instruction sets and architectures, not just bank on IPC and clock speed increases.
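[Editor's note: to make the IOPS/throughput point above a bit more concrete, here is a minimal, idealized sketch of how usable capacity and rough aggregate read throughput scale with disk count across common RAID levels. The per-disk capacity, per-disk read rate, and disk count are assumptions for illustration, not anything Little Orbit has stated, and the figures ignore controller overhead, caching, and random-write penalties.]

```python
# Idealized RAID scaling sketch -- real controllers, caches, and random
# write penalties change these numbers considerably. All inputs are assumed.

DISK_TB = 2.0           # assumed per-disk capacity
DISK_READ_MBPS = 200.0  # assumed per-disk sequential read throughput
N = 8                   # assumed number of disks in the array

layouts = {
    # name: (usable capacity in TB, rough aggregate sequential read MB/s)
    "RAID 0":  (N * DISK_TB,       N * DISK_READ_MBPS),        # pure striping, no redundancy
    "RAID 1":  (DISK_TB,           N * DISK_READ_MBPS),        # N-way mirror (unusual at N=8, shown for contrast)
    "RAID 5":  ((N - 1) * DISK_TB, (N - 1) * DISK_READ_MBPS),  # one disk of parity; conservative read estimate
    "RAID 10": (N / 2 * DISK_TB,   N * DISK_READ_MBPS),        # striped mirrors
}

for name, (capacity_tb, read_mbps) in layouts.items():
    print(f"{name:>7}: ~{capacity_tb:4.1f} TB usable, ~{read_mbps:6.0f} MB/s reads")
```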
VanilleKeks 746 Posted May 8, 2019 1 minute ago, SKay said: The backend is the blanket term for all services not maintained by the end user, such as user information storage, networking infrastructure and CDN, server capacity and server hardware. [...] Thanks for the info. Still, knowing G1, I'm pretty sure a hardware upgrade alongside the software changes going on wouldn't hurt. Then again, we don't know how much of the server issues are related to either side.
ZacAttackLeader 3 Posted May 8, 2019 (edited) This has to be the best clarification I have ever seen on these forums. Spot on, Matt, keep up this communication. I'm hurt I didn't get to play, since I'm slowly coming back, but wow, I am impressed. Thank you so much. Didn't G1 give us server upgrades 2 years ago? And they still weren't up to date? Any ETA on when we can actually start playing again? Edited May 8, 2019 by ZacAttackLeader
Pedroxin 107 Posted May 8, 2019 I'd clap for you but sadly everyone's asleep here. Really feels good to be respected tbh, thanks, Matt.
Akito 29 Posted May 8, 2019 More transparency than G1 ever showed, I'll give you props for that @MattScott
vsb 6174 Posted May 8, 2019 Watch as 2 weeks of premium makes people magically forget they were ready to lynch Orbit less than 24 hours ago. That aside, continued transparency continues to be great (shocker), keep up the good work.
MattScott 15245 Posted May 8, 2019 20 minutes ago, ZacAttackLeader said: Any ETA on when we can actually start playing again? All servers are back online now. We're monitoring things to see if there are any straggler issues.
ZakoN 0 Posted May 8, 2019 Well, it's actually great that we're finally back online, but I still have an issue without any kind of error code, so I just get disconnected, and it's sad ;c
Fortune Runner 796 Posted May 8, 2019 16 minutes ago, MattScott said: All servers are back online now. We're monitoring things to see if there are any straggler issues. Is there any way to do crowdfunding to supply Little Orbit with coffee? Honestly, I think you guys could use it with all the hard work done on a regular basis, and that's before this challenge happened. Thank you for everything.
MACKxBOLAN 435 Posted May 8, 2019 We appreciate the efforts to improve the equipment, and the updates throughout the outage. Thank you to Matt and Lixil.
Guest Posted May 8, 2019 (edited)

1 hour ago, MattScott said: Hi all, I'm not interested in sugarcoating or spinning this last week. I am sorry for all the downtime. [...] I never want to have to make this kind of apology again. Little Orbit can and will do better in the future. Sorry, Matt

This is the first time in many years I've seen someone from a company explain perfectly what happened. You couldn't add more detail than this already. You are a beast, and I am honored to have you as the owner of this game.
I do understand, old things can cause issues, but so can new ones if they're not set up correctly. With these changes, are there any improvements to the servers, such as performance and temperature? Or just the same type of parts but new? I'm curious about that.

Wow, 2 weeks of free premium for everyone... I wish G1 had been at this level when bad things happened in the past. At least it's something, and thank you very much for everything, really (not only you, but Lixil and all the others back there). Little Orbit is growing up together with us and of course will get better. No sorry needed. Forgiven.

PS: I've seen Amaii talking about changes to BE coming soon. What's that about? Are you going to do another thread to explain? (I know it's a separate question.) Edited May 8, 2019 by Guest
AlexDhigh 0 Posted May 8, 2019 I appreciate the communication done here. Unfortunately, I've seen countless posts like this one. I've been waiting for years for the engine upgrade to get back into APB. I feel like the game is cursed and everything always goes wrong. I don't want to add fuel to the fire, but seriously guys... if you can't even migrate to new equipment, how will you upgrade the engine smoothly?
_chain 176 Posted May 8, 2019 What a team of absolute troopers. Kudos to the knights behind the keyboards at LO.
Snubnose 641 Posted May 8, 2019 I hope I'll see a post as honest and open as this sometime in the future, with Matt admitting that RIOT mode with seasons + season pass is not the right way of bringing APB back to life... Sorry for mentioning something somewhat off-topic, and I know some people will disagree*, but I just really felt like saying this now... Didn't mean to bring the mood down. *brace for impact of downvotes
AntiCross 12 Posted May 8, 2019 Hey man, great job. You guys taking a piece of garbage and creating a non-garbage version of it is awesome and challenging. Y'all are doing great, and setbacks are normal. Keep up the good work, and thanks for the free premium.
ffejrxx 3 Posted May 8, 2019 it could have gone worse
killerskull 111 Posted May 8, 2019 I've not played video games in months because I think I've hit gaming menopause, but hearing about free prem and new hardware might make me log in this weekend just to check the game out.
Rubbermade 36 Posted May 8, 2019 How long does it usually take for FE to initialize after it crashes?