Content Count
1297 -
Everything posted by MattScott
-
Apology to the community for this last week
MattScott replied to MattScott's topic in General Discussion Archive
For all the PS4 players, we are looking at the login issue. -
Hi all, Looks like the PS4 services on our end are having trouble communicating with Sony’s end. We are investigating the issue. Thanks, Matt
-
Apology to the community for this last week
MattScott replied to MattScott's topic in General Discussion Archive
All servers are back online now. We're monitoring things to see if there are any straggler issues. -
Hi all,

I'm not interested in sugarcoating or spinning this last week. I am sorry for all the downtime. Many players have asked what happened, so I'm going to do my best to summarize the events.

On Monday the 29th, we took the servers down to move data centers and migrate to all new hardware for our backend systems. This move was necessary for a couple of reasons, but mainly because a lot of the hardware was more than 5 years old, and we were already seeing failures and system performance problems. It was only a matter of time before something critical failed. The timing also happened to line up with the end of our last legacy (ludicrously overpriced) hosting contract, and we needed to make some network architecture changes to facilitate some of the features coming after the Engine Upgrade. I figured we could kill three birds with one stone.

The core challenge was managing all the information needed to drive APB since all the hardware was new. That meant backing everything up and then hand-carrying a series of large hard drives from one location to another. The hardware had been prepped and configured in advance, so while we knew this would be challenging, we had a fairly detailed plan and felt we could manage any issues that popped up.

Problem #1: Unfortunately, during the move we unearthed some buried issues that delayed our ability to bring servers back online for quite a while. The team did a solid job of working through those problems and even recompiling code in a couple of places to remove landmines that we stumbled over. But once we got the servers back online, I felt we did a decent job.

Problem #2: Shortly after we went live, the brand new RAID controller in our new primary database server quickly degraded in performance. To make the situation worse, we rushed to get servers back online and decided it would be okay to let players back in while the secondary database server finished syncing. The hardware failure hit so quickly that the secondary wasn't ready, so we couldn't fail over. At this point we made an effort to keep the servers online through the weekend, and while our jury-rigged fix allowed some players to get on, it also led to many other players being unable to log in (Error 9). The team decided the quickest way to fix the issue would be to build an entirely new primary database server and then swap everything over on Monday. We didn't want to risk moving damaged drives to the new servers, so we needed a complete backup to make sure we didn't lose anything.

Problem #3: Once we shut off all the servers and started the backup, we found that the faulty RAID controller could only copy files at the rate of 1 GB per minute. After 18+ hours, we were finally able to complete the backup, and then finally get the new server finalized and back online.

There are a lot of things that went wrong, but in the end, I should have planned better. With that much new hardware, we were bound to have an issue somewhere.

To make it up to everyone, on Friday we will be turning on 2 weeks of free Premium for all players. For anyone who has existing Premium, this will add to it.

I never want to have to make this kind of apology again. Little Orbit can and will do better in the future.

EDIT: We just started awarding the Premium (4pm Friday Pacific time). There are a lot of players to gift, so we're doing this in waves. To start, we will hit everyone who has logged in within the last 30 days, sorted by most recent login first. Then we'll go back 30 days, etc.

Sorry, Matt
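For a rough sense of scale, the backup figures quoted in Problem #3 (1 GB per minute, 18+ hours) can be turned into a quick estimate. This is only a back-of-the-envelope sketch: the post does not state the actual size of the data, the function names are made up for illustration, and it assumes the copy rate stayed constant the whole time.

```python
def gb_copied(hours: float, gb_per_minute: float = 1.0) -> float:
    """Data copied in `hours` at a steady rate (assumed constant)."""
    return hours * 60 * gb_per_minute


def backup_hours(total_gb: float, gb_per_minute: float = 1.0) -> float:
    """Hours needed to copy `total_gb` at a steady rate (assumed constant)."""
    if gb_per_minute <= 0:
        raise ValueError("transfer rate must be positive")
    return total_gb / gb_per_minute / 60


# 18 hours at 1 GB per minute is roughly a terabyte of data.
print(gb_copied(18))       # 1080.0
# The same relationship in reverse: 1080 GB takes 18 hours at that rate.
print(backup_hours(1080))  # 18.0
```

Since the post says "18+ hours", about 1 TB is a lower bound under these assumptions, not a reported figure.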
- 138 replies
-
Hi everyone, Unfortunately, we are still working on backing things up properly. The file transfer from the bad server is very slow. We’ve done the math on the remaining files at their current rate and added repair time, and we feel the outage is going to run another 6 hours to roughly 6am Pacific time. We are working through it as fast as we can. Sorry, Matt
- 5 replies
-
Hi everyone, We're running a little long on the hardware maintenance. We're currently estimating another 6 hours with servers coming online around midnight Pacific time. We apologize for the delay, but we also feel it is critical to get this piece of equipment working properly for everyone. I'll post as soon as servers are back online. Sorry, Matt
- 5 replies
-
login issue + report you name and server you play
MattScott replied to Donjae's topic in General Discussion Archive
Hi all, I’m going to go ahead and close this thread. I appreciate the OP’s effort to raise awareness. We are working on the issue. Apologies. Thanks, Matt -
Hi there, Lixil has updated her post on this issue. We are hoping to fix the issue tomorrow (5/6). Thanks, Matt
-
Log In / Error 9 Discussion Thread
MattScott replied to Synezisia's topic in General Discussion Archive
Hi all, Lixil has updated her original post. We are waiting on replacement hardware, but we should be able to do some maintenance tomorrow (5/6) to fix the issue. Thanks, Matt -
login issue + report you name and server you play
MattScott replied to Donjae's topic in General Discussion Archive
Hi all, Lixil has updated her original post with more details. Our brand new primary login server has some bad hardware. We have a replacement arriving soon, and we are hoping to schedule maintenance for tomorrow (5/6). Thanks, Matt -
Log In / Error 9 Discussion Thread
MattScott replied to Synezisia's topic in General Discussion Archive
Hi all, Lixil has updated her original post with more information. We have some faulty equipment that is hopefully ready to be fixed tomorrow (5/6). Sorry, Matt -
Log In / Error 9 Discussion Thread
MattScott replied to Synezisia's topic in General Discussion Archive
Hi all, We are working on the issue. Lixil has updated her original post with more information. We moved many servers last week. Unfortunately, one of them has some bad hardware and is malfunctioning. Logging in is hit or miss right now. We are hoping to fix the hardware tomorrow (5/6). Thanks, Matt -
Hi there, Lixil just updated her post in Social. Essentially, we have a bad piece of equipment in the new data center that is causing a problem. We have already secured a replacement part and we hope to schedule downtime tomorrow (5/6) to fix the issue. Sorry, Matt
-
Extended Server Maintenance (Monday 4/29/2019)
MattScott replied to Lixil's topic in General Discussion Archive
Hi Wastelanders, Servers are up. My full post about the incident is here: We will schedule Commander early this week so everyone gets notice and can enjoy it properly. Thanks, Matt -
Hi all,

On Monday, April 29th, we scheduled downtime to physically move from one data center to a new one. This move was unavoidable because a very expensive legacy Reloaded contract had ended. Even though we asked for an extension, we were forced to move out before May 1st, 2019.

As you know, Fallen Earth is an extremely old game. The servers run on an OS that is no longer available to download, it takes more than an hour to even reboot, and it requires many different databases that all have to be synced to operate correctly. We made backups and planned to move each system one by one into the new data center.

However, despite many precautions, some data was lost. The engineers have spent their waking hours attempting to find the right mix of files to get everything restored properly. We did eventually get the system back online, but it appears roughly 2 weeks of progress was lost. Having exhausted all other options, we are going to be putting the servers back online and moving forward.

In the meantime, we'll be doing the following to help players recover:
- We'll be giving out Commander to all players for 2 weeks to help them get caught back up
- Anyone who lost purchases due to the rollback can open a trouble ticket at http://support.gamersfirst.com, and we'll escalate getting those taken care of as quickly as we can

We're not going to start the Commander for another couple of days, so that all the players can get up to speed on what has happened. I want everyone to be able to take full advantage of the boost over the coming weeks.

Please know that the team worked very hard to get us to this point, and we are committed to getting the back end re-written so it can be properly supported in the future.

EDIT: We will be turning on 2 weeks of Commander for the Fallen Earth players.

EDIT: We waited an extra couple of days to try and make sure server performance was better. Effective 5/16, we have activated a 4-week Commander code as compensation to the players in the hopes of helping them catch up on lost progress. The code is: FallenNot4gotten

Apologies, Matt
- 136 replies
-
Extended Server Maintenance (Monday 4/29/2019)
MattScott replied to Lixil's topic in General Discussion Archive
Hi all, I think we've reached the limits of what we can do, and the downtime has already been excessive. We did some tests and the newer data is too incomplete to work. With that in mind, the servers are going to be put back online shortly, and I'll make a public post about the missing data for the rest of the players. Moving forward:
- We'll be giving out Commander to all players for 2 weeks to help them get caught back up
- Anyone who lost purchases due to the rollback can open a trouble ticket at http://support.gamersfirst.com, and we'll escalate getting those taken care of as quickly as we can
The team is committed to getting the back end re-written so it can be properly supported. Apologies, Matt -
Extended Server Maintenance (Monday 4/29/2019)
MattScott replied to Lixil's topic in General Discussion Archive
Hi all, We are going to temporarily take down FE so that players don't keep playing. The hope is to quickly test a more up-to-date database and then put the game back online. Thanks, Matt -
Extended Server Maintenance (Monday 4/29/2019)
MattScott replied to Lixil's topic in General Discussion Archive
Hi all, We are going to try getting our secondary environment up and running. Then we can test a new set of data. This won't be a super fast process, and many of my team are recuperating this weekend. Bear with us. My advice for current players would be to not go crazy. If we can confirm a solid way to restore the 2 weeks of missing progress, we will. But that will mean losing anything since the servers came back online. Thanks, Matt -
Extended Server Maintenance (Monday 4/29/2019)
MattScott replied to Lixil's topic in General Discussion Archive
Hi all, I am looking at the data issue with my team, and there is no easy answer. Right now we have two choices.
1) Leave it the way it is and work on fixing the accounts that lost paid items. We have records on the payment side, so it should be easy to restore those. The team is exhausted, but over time we can also try restoring some of the new data in a separate area which would allow us to verify some of the bigger lost progression items / in-game items in order to grant those back to players.
or
2) Take the servers down and try another round of restoring data from a different set of possibly newer backups. We already know one of the databases from this newer set of backups is significantly out of sync and older. So there is risk that we will introduce a whole bunch of problems with incompatible data.
Personally, as painful as it is, I'm going to recommend that we plow forward and escalate the support tickets for the players who lost real money purchases. Apologies for the issues. Sorry, Matt -
Extended Server Maintenance (Monday 4/29/2019)
MattScott replied to Lixil's topic in General Discussion Archive
Hi all, We have the servers working now. We need to keep them locked for some QA to make sure everything in-game looks okay. We appreciate your patience. Thanks, Matt -
Extended Server Maintenance (Monday 4/29/2019)
MattScott replied to Lixil's topic in General Discussion Archive
We can't upgrade the hardware. And the code for the servers runs on an OS that isn't available any more. For the move, we made images of everything. Backed up everything. And then moved the hardware 1 to 1, so as not to disrupt anything. Even with all of our precautions, we still have problems. My approach is to upgrade the code first and then we can update the hardware properly. That effort has been underway for quite a while now. -
Extended Server Maintenance (Monday 4/29/2019)
MattScott replied to Lixil's topic in General Discussion Archive
Hi all, I can’t know how frustrating it is for everyone right now. I wish I had more details to give out. Fallen Earth is a collection of many databases and many different servers that all have to mesh together properly for us to unlock the doors and let players in. There were problems in the move and in the restore process that we are still working out. To give you an idea of how old and large the game is, it takes over an hour to fully reboot everything and come back online. That means we make some changes, but then wait significant amounts of time just to see the results. Then we make more changes and then wait some more. It’s been extremely frustrating. The team is going to continue working on the servers this weekend till they are online properly. Stay tuned. Thanks, Matt -
Extended Server Maintenance (Monday 4/29/2019)
MattScott replied to Lixil's topic in General Discussion Archive
Hi everyone, The team is still working through issues, but progress is being made. Thanks, Matt -
Extended Server Maintenance (Monday 4/29/2019)
MattScott replied to Lixil's topic in General Discussion Archive
Hi all, We're in a weird position at this point. Technically, all the servers are working properly. All the databases are restored and up. However, the system overall is unbearably slow. It takes hours to boot and load all the data. Once it is up, the timeouts between servers make the system unstable and unplayable. We took everything down and restarted it all from scratch about an hour ago, and not everything is fully booted. I'll keep you posted. Thanks, Matt -
Extended Server Maintenance (Monday 4/29/2019)
MattScott replied to Lixil's topic in General Discussion Archive
Hi all, Just posting another status. We are currently working through a database speed issue and an issue with the login server. The engineers on my end are troubleshooting. Thanks, Matt