
MattScott

CEO
  • Content Count: 1286

Everything posted by MattScott

  1. Hi all, We just started awarding the Premium (4pm Friday Pacific time). There are a lot of players to gift, so we're doing this in waves. To start, we will hit everyone who has logged in within the last 30 days, sorted by most recent login first. Then we'll go back 30 days, etc. Your Premium won't start until it is awarded to the account. I'll keep you posted. Thanks, Matt
  2. I can appreciate your position, but I've already gone on record to say we are not wiping data. We can and will get the server back up. The requirements are just a little tricky to nail down for something like this. Having a server up for years collecting and indexing data over time is a different spec from starting a server empty and loading it with years of data. Once we get stabilized, we have a way of moving old data out of the active tables dynamically without wiping anything. This just requires some coding and testing in our PTS environment, which we don't have time for right now. Thanks, Matt
  3. We can't change the software OS, or the size and structure of the database. There is no way to avoid some of the "dirty tricks" to keep the game online until we re-write the backend. However, we are running on new hardware across the board. The issue we are solving right now relates to the problematic database server (used for pulling all character data). Originally we got everything restored, but clearly players were having trouble retrieving characters on login. At this point, we have determined that this specific database is so large (and poorly structured) that we can no longer simply restore it. Loading that much data results in unusable table indexes, which causes the problems we were seeing before we took it down. It takes 6 hours each attempt to restore and then 7 hours each attempt to repair. We have tried 3 different ways to get the database set up, and all have yielded the same result. We are shifting tactics and moving to a different set of expanded hardware. Then we'll give it a 4th attempt. Everyone is working the weekend to get this back online ASAP. Thanks, Matt
  4. Hi all, This is my last update for the evening. I'm managing the handoff between engineers and then going to call it a night. We got entirely through the restore and most of the repair, and then the database errored. It cost us hours of work. Instead of proceeding on a shaky server, I'm having the team tear it down and rebuild it, so we know it's stable. Unfortunately, there is a lot more work to do. I don't have a good ETA, but likely 12 more hours. I'll update everyone on where we are at in the morning. Sorry, Matt
  5. Hi all, We're nearly done with the partial restore to get back to a position to repair things. The backup was done right after we took the servers offline, so there should be no rollback. The restore should be done around 1am, and then I'll have a better estimate for the repair time. Unfortunately, we won't be back online at 1am like I had hoped. Sorry, Matt
  6. Sure thing. I can see the confusion. It was both. The contract was up, and we had already put our notice in with the old hosting facility. However, at the very end there was a delay with the new hardware arriving and getting set up on time at the new facility. My preference would have been to get 2 more weeks at the old facility, but they denied our request to stay a bit longer.
  7. Hi all, Small update. I have updated the main post, but we hit a serious snag and are working around it now. Unfortunately, we are going to have to extend the maintenance window. Thanks, Matt
  8. Hi all, We spent some time yesterday looking at the character login issues and found that 6 tables across our various databases needed heavy re-indexing after they were restored during the move. The tables are all over 5GB in size, with the largest being 550MM rows at 55GB. This largest table contains all character attributes for the more than 2.5MM characters made in Fallen Earth to date. The short term fix is to re-index and stabilize things. The long term fix will be to move old character data to an archive database where it can be restored on the fly if those older players choose to log back in (see the sketch after this list of posts). Since 1am we have spent a couple of hours backing everything up, and then about 8 hours re-indexing. We are down to the last large table. It appears that it will take about 5 more hours, and then we need about another hour to turn on servers and run through a quick QA test. That puts our earliest ETA for letting players back in at ~6pm Pacific time. With a little luck we should have things running much better for the weekend. I'll keep monitoring this and updating the community. EDIT: We hit a serious snag and ran out of drive space on the volume while doing the repair. The team is working around it, but we are going to have to extend the maintenance. EDIT: 5/10 1am Pacific. Nearly done restoring and adding more space for the repair. No ETA yet. EDIT: 5/10 3:20am Pacific. The server dumped. We've decided to scrap it and replace it due to ongoing issues with it. The team is working on replacing it now. EDIT: 5/10 12:00pm Pacific. New hardware is online. Reinstalling everything, and then we'll start the database restore / repair process. EDIT: 5/10 3:20pm Pacific. Current estimate is 9:30pm for the restore to finish. Then the repair will start. EDIT: 5/11 3:30am Pacific. Still waiting on various parts to finish. EDIT: 5/11 12pm Pacific. Finishing one last task, and then we'll be bringing servers online for internal QA testing. To be clear, backups are intact. No rollback is expected. Thanks, Matt
  9. Out of curiosity, did you create your account in April?
  10. There is no intention to change the core mechanics of the game. Specifically, we are not turning Fallen Earth into a first-person game. Most of this effort is just to upgrade the tech so it is more supportable.
  11. Hmm. Did you lose all your achievements? According to my understanding, we only lost about 2 weeks of progression, so the large majority of your 8.5 years of progress should be preserved.
  12. Hi all, This post went up today. It's been a while since I posted engine update progress for @FallenEarth. One big blocker has been figuring out the animation structure and porting them over to match the new characters and creatures. Looks like we have that solved now. #ThisOldGame Thanks, Matt
  13. Hi all, The team has organized how to make up for the lost progression, and we will be starting 2 weeks of Commander for all players on Friday 5/10. I have edited the original post to reflect this. Thanks, Matt
  14. Hi there, We recently had an issue with our primary database. The situation is fixed, and no data was lost. But we are still restoring the old Hoplon data that is used by the migration system. You should be able to migrate your account by Friday. Thanks, Matt
  15. For all the PS4 players, we are looking at the login issue.
  16. Hi all, Looks like the PS4 services on our end are having trouble communicating with Sony’s end. We are investigating the issue. Thanks, Matt
  17. All servers are back online now. We're monitoring things to see if there are any straggler issues.
  18. Hi all, I'm not interested in sugarcoating or spinning this last week. I am sorry for all the downtime. Many players have asked what happened, so I'm going to do my best to summarize the events. On Monday the 29th, we took the servers down to move data centers and migrate to all new hardware for our backend systems. This move was necessary for a couple of reasons, but mainly because a lot of the hardware was more than 5 years old, and we were already seeing failures and system performance problems. It was only a matter of time before something critical failed. The timing also happened to line up with the end of our last legacy (ludicrously overpriced) hosting contract, and we needed to make some network architecture changes to facilitate some of the features coming after the Engine Upgrade. I figured we could kill three birds with one stone. The core challenge was managing all the information needed to drive APB, since all the hardware was new. That meant backing everything up and then hand-carrying a series of large hard drives from one location to another. The hardware had been prepped and configured in advance, so while we knew this would be challenging, we had a fairly detailed plan and felt we could manage any issues that popped up. Problem #1: Unfortunately, during the move we unearthed some buried issues that delayed our ability to bring servers back online for quite a while. The team did a solid job of working through those problems and even recompiling code in a couple of places to remove landmines that we stumbled over. Once we got the servers back online, I felt we had done a decent job. Problem #2: Shortly after we went live, the brand new RAID controller in our new primary database server quickly degraded in performance. To make the situation worse, we had rushed to get servers back online and decided it would be okay to let players back in while the secondary database server finished syncing. The hardware failure hit so quickly that the secondary wasn't ready, so we couldn't fail over. At this point we made an effort to keep the servers online through the weekend, and while our jury-rigged fix allowed some players to get on, it also led to many other players being unable to log in (Error 9). The team decided the quickest way to fix the issue would be to build an entirely new primary database server and then swap everything over on Monday. We didn't want to risk moving damaged drives to the new servers, so we needed a complete backup to make sure we didn't lose anything. Problem #3: Once we shut off all the servers and started the backup, we found that the faulty RAID controller could only copy files at a rate of 1GB per minute (the rough math on what that implies is sketched after this list of posts). After 18+ hours, we were finally able to complete the backup and then get the new server finalized and back online. There are a lot of things that went wrong, but in the end, I should have planned better. With that much new hardware, we were bound to have an issue somewhere. To make it up to everyone, on Friday we will be turning on 2 weeks of free Premium for all players. For anyone who has existing Premium, this will add to it. I never want to have to make this kind of apology again. Little Orbit can and will do better in the future. EDIT: We just started awarding the Premium (4pm Friday Pacific time). There are a lot of players to gift, so we're doing this in waves. To start, we will hit everyone who has logged in within the last 30 days, sorted by most recent login first. Then we'll go back 30 days, etc. Sorry, Matt
  19. Hi everyone, Unfortunately, we are still working on backing things up properly. The file transfer from the bad server is very slow. We’ve done the math on the remaining files at their current rate and added repair time, and we feel the outage is going to run another 6 hours to roughly 6am Pacific time. We are working through it as fast as we can. Sorry, Matt
  20. Hi everyone, We're running a little long on the hardware maintenance. We're currently estimating another 6 hours with servers coming online around midnight Pacific time. We apologize for the delay, but we also feel it is critical to get this piece of equipment working properly for everyone. I'll post as soon as servers are back online. Sorry, Matt
  21. Hi all, I’m going to go ahead and close this thread. I appreciate the OP’s effort to raise awareness. We are working on the issue. Apologies. Thanks, Matt
  22. (Re: error code 9) Hi there, Lixil has updated her post on this issue. We are hoping to fix the issue tomorrow (5/6). Thanks, Matt
  23. Hi all, Lixil has updated her original post. We are waiting on replacement hardware, but we should be able to do some maintenance tomorrow (5/6) to fix the issue. Thanks, Matt
  24. Hi all, Lixil has updated her original post with more details. Our brand new primary login server has some bad hardware. We have a replacement arriving soon, and we are hoping to schedule maintenance for tomorrow (5/6). Thanks, Matt
  25. Hi all, Lixil has updated her original post with more information. We have some faulty equipment that is hopefully ready to be fixed tomorrow (5/6). Sorry, Matt
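
The archive-and-restore idea mentioned in posts 2 and 8 boils down to copying rows for long-inactive characters into an archive table, deleting them from the hot table in the same transaction, and copying them back on demand when one of those players logs in again. The sketch below only illustrates that pattern; it is not Little Orbit's implementation. It uses Python's built-in sqlite3 module and invented table and column names (character_attributes, archived_character_attributes, character_id, last_login), whereas the real Fallen Earth backend runs different database software with a far larger schema.

    import sqlite3
    from datetime import datetime, timedelta

    # Hypothetical schema; the real Fallen Earth tables and columns are not public.
    SETUP = """
    CREATE TABLE IF NOT EXISTS character_attributes (
        character_id INTEGER PRIMARY KEY,
        attributes   TEXT,
        last_login   TEXT
    );
    CREATE TABLE IF NOT EXISTS archived_character_attributes (
        character_id INTEGER PRIMARY KEY,
        attributes   TEXT,
        last_login   TEXT
    );
    """

    def archive_inactive(conn, inactive_days=365):
        """Move rows for characters not seen in `inactive_days` out of the hot table."""
        cutoff = (datetime.utcnow() - timedelta(days=inactive_days)).isoformat()
        with conn:  # one transaction: copy into the archive, then delete from the hot table
            conn.execute(
                "INSERT OR REPLACE INTO archived_character_attributes "
                "SELECT * FROM character_attributes WHERE last_login < ?",
                (cutoff,),
            )
            conn.execute(
                "DELETE FROM character_attributes WHERE last_login < ?",
                (cutoff,),
            )

    def restore_on_login(conn, character_id):
        """If the character was archived, copy it back before the login proceeds."""
        with conn:
            row = conn.execute(
                "SELECT * FROM archived_character_attributes WHERE character_id = ?",
                (character_id,),
            ).fetchone()
            if row is not None:
                conn.execute(
                    "INSERT OR REPLACE INTO character_attributes VALUES (?, ?, ?)", row
                )
                conn.execute(
                    "DELETE FROM archived_character_attributes WHERE character_id = ?",
                    (character_id,),
                )

    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)

In a real deployment the archive sweep would run in batches to avoid long locks, and the restore step would sit in the login path before character data is read, but the transactional copy-then-delete shape is the core of the "restore on the fly" idea.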
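For scale on the 1GB-per-minute figure in post 18 (and the estimating described in post 19): at that rate, a copy that runs for 18 hours has moved roughly 18 × 60 ≈ 1,080 GB, about a terabyte, which is also how the team could project the remaining outage time from the file sizes still left to copy. A back-of-the-envelope helper, using a made-up ~1 TB volume rather than the actual size of the APB databases:

    def hours_to_copy(total_gb, rate_gb_per_min):
        """Back-of-the-envelope copy time at a fixed throughput."""
        return total_gb / rate_gb_per_min / 60

    # A hypothetical ~1 TB backup at the degraded 1 GB/min rate:
    print(f"{hours_to_copy(1080, 1.0):.1f} hours")   # -> 18.0 hours
    # The same data at, say, 10 GB/min would take:
    print(f"{hours_to_copy(1080, 10.0):.1f} hours")  # -> 1.8 hours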