We are experiencing some reboots among the Milano IWStack clusters.
While the orchestrator tries to cope by restarting VMs on other nodes, the large number of events means the restarts take a long time, up to 30-50 minutes.
We are investigating the issue.
Issues with IWStack Milano DC1
Re: Issues with IWStack Milano DC1
After extensive investigation, we believe there is a problem with the Fibre Channel links: one of them is probably causing issues that affect the others or the switches.
When the nodes lose connectivity, they are automatically rebooted to protect the data.
Many such events at the same time create a backlog, so the VMs restart slowly.
Re: Issues with IWStack Milano DC1
We believe we have found the issue: the FC cards were overwhelmed by the number of commands. This caused them to stop accepting new commands while they cleared the backlog, which increased iowait to the point where the orchestrator considered the node dead and started its VMs on another node. That node then hit the same problem in turn, bringing the whole cluster down in a cascading failure, like the ones that take down power grids across countries or regions.
As a result, we will rebalance the clusters to take this issue into account as well, which should hopefully stop the reboots.
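The cascade described above can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not the orchestrator's actual logic: each node has a fixed I/O capacity, a node whose load exceeds it is declared dead, and its load is split evenly across the surviving nodes. The function name `simulate_cascade` and all the numbers are hypothetical.

```python
def simulate_cascade(loads, capacity):
    """Toy model of a cascading failure (hypothetical, for illustration only).

    loads    -- per-node I/O load, in arbitrary units
    capacity -- load above which a node is considered dead
    Returns the sorted indices of the nodes that are still alive at the end.
    """
    loads = list(loads)
    alive = set(range(len(loads)))
    while True:
        # Nodes whose load exceeds capacity are declared dead this round.
        overloaded = [i for i in sorted(alive) if loads[i] > capacity]
        if not overloaded:
            return sorted(alive)
        orphaned = 0
        for i in overloaded:
            alive.discard(i)
            orphaned += loads[i]
            loads[i] = 0
        if not alive:
            return []
        # The dead nodes' load is restarted evenly on the survivors.
        share = orphaned / len(alive)
        for i in alive:
            loads[i] += share


# One overloaded node in a cluster with little headroom takes everything down.
print(simulate_cascade([90, 70, 70, 70], capacity=80))  # prints []

# With enough headroom, the survivors absorb the extra load.
print(simulate_cascade([90, 50, 50, 50], capacity=80))  # prints [1, 2, 3]
```

In this model, rebalancing corresponds to keeping enough spare capacity on every node so that the load of a failed node can be absorbed without pushing the others over the limit.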