Hello!
A network blip earlier today triggered a large number of HA events; the affected instances are being restarted on nodes that were not affected by the outage.
There are hundreds of restarts queued, so it may take some time.
We are investigating the issue to find ways to avoid it in the future, and we are deeply sorry for this unfortunate event.
Update: We have temporarily disabled HA so this does not happen again while we investigate why there was a massive disconnect.
iwstack outage
Re: iwstack outage
The RFO has already been sent to all customers:
Hello!
Today at 10:01 AM CEST the iwStack orchestrator received a disconnection event from several hosts. A faulty network card had flooded the VLAN, which triggered a massive High Availability recovery procedure for more than 600 instances.
At 10:50, while most instances were running again, a couple of hundred were stuck in the starting state, waiting for the network setup to complete.
At 11:10, in an attempt to speed up the process, we forced a network restart (including a VR rebuild), but this turned out to be the wrong solution and caused more delay.
Finally, at 13:00, all the queued instances were started.
If your instances are still in the stopped state, just start them. Please open a ticket if any instance doesn't start.
At present we have disabled the HA setting on all instances while we investigate the incident.
We are sorry for any inconvenience this issue may have caused.