MattScott

Apology to the community for this last week


Thank you, LO! ❤️

16 minutes ago, Rubbermade said:

How long does it usually take for FE to initialize after it crashes?

 

Good question!

@MattScott 

 

I'd also like to know that.


Thanks for the clarification!! And thanks for the free premium!!


LO reminds me of Eugene from Hey Arnold.

You know that kid who had literally every conceivable bad thing happen to him for no reason, but never let it get him down and kept smiling? He was a literal jinx and kept doing his best.

 

That's Little Orbit. 

 

You go, guys.

*head pat* 

Edited by ChellyBean

For all the PS4 players, we are looking at the login issue. 


 

Quote

I figured we could kill three birds with one stone.

 

 

Bulletproof jacket birds.

Edited by Nettuno


As a guy who was in the IT industry once upon a time, and also one who took down a couple of server racks once, I am aware of how troublesome this kind of stuff can be. I appreciate all your hard work and the heads-up on everything that transpired! Thank you for the free Premium; it is much appreciated! Honestly, I am really impressed with y'all since you took over, and I am very grateful to you for doing so. This past year I was afraid we were going to see the end of APB soon. Since y'all have been on the scene, I've noticed the little things you've added. It has been as though APB has had new life breathed into it!

 

Again, thank you!

John Nails

 


Thanks for the detailed information.


Thanks for being open and honest with us, Mr. Scott.

 

Unfortunately, this kind of thing was always going to be lurking in the background, and terrible, unexpected things are (by their very nature) a pain in the posterior.

 

Best of luck going forward.


You said the old equipment was outdated and that there were problems.

You also said that players should not experience a decrease or increase in performance.

Have you moved to entirely new (different) equipment, or to the same servers as before?

I am also curious: have the login server IP addresses changed? (If they have changed, please tell us. Players from one of the Russian countries experienced connection problems for two years due to an IP blocking error.)

Edited by vfterlife


When do you plan to say that OTW RIOT is postponed again? 😜

P.S. http://apb.com/ has not been working for several days, in case you don't know.

Edited by introlapse

I'm happy that Little Orbit took over APB; otherwise the game would be dead by now. Keep up the good work, seriously. You're the best thing that could have happened to APB ^^

 

For the whole Team... #RESPECT


Thoroughly impressed with the honesty and quick response to players about what happened.  I hope more gaming companies take after you in the future.  I'll definitely be sticking around.


Dude,

the fact that you made a thread to explain this is already reason for praise. Thanks for keeping us up to date, and I appreciate the premium.


Thanks, Matt. Thanks for the work you guys did, thanks @Lixil for keeping us up to date and for having nice chats with us on Discord, but most importantly, thanks for being so honest with us.


Glad that you got it working, but there are a few things I don't really understand.

Why wasn't the database backed up in advance? It seems like a bad idea to rely on the faulty controller to back it up (see the sketch at the end of this post). If it had already been backed up, you'd have wasted less time moving it over to the new, new server.

How do you know that it was the RAID controller failing and not a number of drives? And if you were sure it was the controller, why move the data instead of just putting the old drives in the new server?

I'm happy with the way you are communicating when there are issues, but you should always expect more issues. Maybe you should say it will take longer than what you actually predict. It kinda sucks when the fix is pushed back several times while people are waiting for it. It would've been better to push it back by 12 hours twice than to say "6 hours", "6 hours", "6 hours", "3 hours", "3 hours"; it just makes it look like you don't know what you are doing.
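
A verified backup before the cutover is also the kind of thing that's cheap to script. Purely as a hedged illustration (the paths below are hypothetical, and a real production database would normally be dumped and verified with its own tooling rather than raw file hashes), a checksum pass over the source and the copy is one way to confirm a backup actually matches before anyone touches the hardware:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large database dumps never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list[str]:
    """Return relative paths whose backup copy is missing or differs from the source."""
    mismatches = []
    for src in source_dir.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            mismatches.append(str(rel))
    return mismatches


if __name__ == "__main__":
    # Hypothetical locations; substitute the real dump and backup directories.
    bad = verify_backup(Path("/data/apb_db_dump"), Path("/mnt/backup/apb_db_dump"))
    print("backup verified" if not bad else f"{len(bad)} files differ, e.g. {bad[:5]}")
```

An empty mismatch list would be the green light to proceed with the swap.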

35 minutes ago, CheesyAPB said:

Glad that you got it working, but there are a few things I don't really understand. [...]

"Problem #2: Shortly after we went live, the brand new RAID controller in our new primary database server quickly degraded in performance. To make the situation worse, we rushed to get servers back online and decided it would be okay to let players back in while the secondary database server finished syncing. The hardware failure hit so quickly that the secondary wasn't ready, so we couldn't failover. At this point we made an effort to keep the servers online through the weekend, and while our jury-rigged fix allowed some players to get on, it also lead to many other players being unable to login (Error 9). The team decided the quickest way to fix the issue would be to build an entirely new primary database server and then swapping everything over on Monday. We didn't want to risk moving damaged drives to the new servers, so we needed a complete backup to make sure we didn't lose anything." ~ Matt Scott

 

They basically explained their actions correctly, and what they did was pretty clear. The secondary database is still the "good old one"; they were just syncing it up with the new primary database, but they rushed it and made an oopsie while at it.

Also, they didn't know WHEN they were going to fix it; that's why they kept pushing the dates. It's easy to talk like that from our perspective, but we're not talking about GBs of data, we're talking about TBs, which somehow had to be moved off a damaged database. And then try to find a new RAID controller over the weekend... etc., etc.
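
For a rough sense of that scale, taking the figures from Matt's post at face value (roughly 1 GB per minute for 18+ hours), here is a quick back-of-the-envelope estimate; the "healthy array" throughput below is only an assumed comparison point, not anything Little Orbit has stated:

```python
# Back-of-the-envelope estimate using the figures quoted from Matt's post.
DEGRADED_RATE_GB_PER_MIN = 1.0      # stated: the faulty controller copied ~1 GB per minute
BACKUP_DURATION_HOURS = 18          # stated: the backup took 18+ hours

total_gb = DEGRADED_RATE_GB_PER_MIN * BACKUP_DURATION_HOURS * 60
print(f"Data copied: roughly {total_gb:.0f} GB (~{total_gb / 1024:.2f} TB)")

# Assumed comparison point (not an LO figure): a healthy array sustaining ~500 MB/s.
HEALTHY_RATE_GB_PER_MIN = 0.5 * 60  # 500 MB/s is about 30 GB per minute
print(f"Same copy at ~500 MB/s: about {total_gb / HEALTHY_RATE_GB_PER_MIN:.0f} minutes")
```

That works out to roughly a terabyte copied at a crawl that a healthy array would have finished in well under an hour.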

 

Be glad we still aren't under the old G1; in their case it most likely would have ended with the death of the game.

 

But yeah, I kinda warned you about the inherited G1 hardware, LO.

Edited by Mitne

10 hours ago, MattScott said:

Hi all,

 

I'm not interested in sugarcoating or spinning this last week.

I am sorry for all the downtime.

 

Many players have asked what happened, so I'm going to do my best to summarize the events.

 

On Monday the 29th, we took the servers down to move data centers and migrate to all new hardware for our backend systems. This move was necessary for a couple of reasons, but mainly because a lot of the hardware was more than 5 years old, and we were already seeing failures and system performance problems. It was only a matter of time before something critical failed. The timing also happened to line up with the end of our last legacy (ludicrously overpriced) hosting contract, and we needed to make some network architecture changes to facilitate some of the features coming after the Engine Upgrade.

 

I figured we could kill three birds with one stone.

 

The core challenge was managing all the information needed to drive APB since all the hardware was new. That meant backing everything up and then hand-carrying a series of large hard drives from one location to another. The hardware had been prepped and configured in advance, so while we knew this would be challenging, we had a fairly detailed plan and felt we could manage any issues that popped up.

 

Problem #1: Unfortunately, during the move we unearthed some buried issues that delayed our ability to bring servers back online for quite a while. The team did a solid job of working through those problems and even recompiling code in a couple of places to remove landmines that we stumbled over. But once we got the servers back online, I felt we did a decent job.

 

Problem #2: Shortly after we went live, the brand new RAID controller in our new primary database server quickly degraded in performance. To make the situation worse, we rushed to get servers back online and decided it would be okay to let players back in while the secondary database server finished syncing. The hardware failure hit so quickly that the secondary wasn't ready, so we couldn't fail over. At this point we made an effort to keep the servers online through the weekend, and while our jury-rigged fix allowed some players to get on, it also led to many other players being unable to log in (Error 9). The team decided the quickest way to fix the issue would be to build an entirely new primary database server and then swap everything over on Monday. We didn't want to risk moving damaged drives to the new servers, so we needed a complete backup to make sure we didn't lose anything.

 

Problem #3: Once we shut off all the servers and started the backup, we found that the faulty RAID controller could only copy files at a rate of 1 GB per minute. After 18+ hours, we were finally able to complete the backup and get the new server finalized and back online.

 

There are a lot of things that went wrong, but in the end, I should have planned better. With that much new hardware, we were bound to have an issue somewhere.

 

To make it up to everyone, on Friday we will be turning on 2 weeks of free Premium for all players. For anyone who has existing Premium, this will be added on top of it.

 

I never want to have to make this kind of apology again.

Little Orbit can and will do better in the future.

 

Sorry,

Matt

Thanks to the APB team that we can enjoy playing again now. Thanks for this game, for all you've done, and for continuing to work on it.


Glad to know it's back up and running.

 

Thanks Matt!

And thanks to the L.O team, too.

 

 

 

 


We had 17 failovers this year, with half of them failing themselves because they couldn't handle the load, so everything went down and took hours to come back up.

And we're lucky that we can operate comfortably over the weekend without active users.

 

So you have my respect for how you handled migrating the data, with the failovers and downtime involved, and for this much transparency.

Good job @MattScott and the rest of the team.

Edited by Deadpo0l
