Date and time: 2018-11-30 04:00 CET
Expected duration: 120 minutes
Affected servers: vpsadmin.prg, node2.prg, node3.prg, node4.prg, node5.prg, node6.prg, node7.prg, node8.prg, node9.prg, node10.prg, node11.prg, node12.prg, node13.prg, node14.prg, node15.prg, node16.prg, node17.prg, node18.prg, backuper.prg, nasbox.prg, node1.pgnd, node2.pgnd
Outage type: network
Reason: Network unavailable: L2 problem
Handled by: Pavel Šnajdr, Richard Marko
In Prague, the interaction between four switches broke down; it looks like something related to LLDP. As a drastic temporary measure we had to shut down half of the backbone so that traffic reliably takes a single path instead of two optional ones (neither of which was being selected, so neither worked).
I will look into it more closely tonight. It is quite possible that we will move the non-vpsAdminOS production to BGP/10GE as well; we cannot leave things like this if it keeps misbehaving.
ENGLISH:
Date and time: 2018-11-30 04:00 CET
Expected duration: 120 minutes
Affected systems: vpsadmin.prg, node2.prg, node3.prg, node4.prg, node5.prg, node6.prg, node7.prg, node8.prg, node9.prg, node10.prg, node11.prg, node12.prg, node13.prg, node14.prg, node15.prg, node16.prg, node17.prg, node18.prg, backuper.prg, nasbox.prg, node1.pgnd, node2.pgnd
Outage type: network
Reason: Network downtime: L2 issues
Handled by: Pavel Šnajdr, Richard Marko
We've experienced issues with L2 networking; most likely it was some unexpected Cisco "magic".
Currently we're running with half of the gigabit switches down to prevent further LLDP trouble; we are going to investigate further tonight.
-----BEGIN BASE64 ENCODED PARSEABLE JSON----- eyJpZCI6NDgzLCJwbGFubmVkIjpmYWxzZSwiYmVnaW5zX2F0IjoiMjAxOC0x MS0zMFQwNDowMDowMCswMTowMCIsImR1cmF0aW9uIjoxMjAsInR5cGUiOiJu ZXR3b3JrIiwiZW50aXRpZXMiOlt7Im5hbWUiOiJOb2RlIiwiaWQiOjUsImxh YmVsIjoidnBzYWRtaW4ucHJnIn0seyJuYW1lIjoiTm9kZSIsImlkIjoxMDIs ImxhYmVsIjoibm9kZTIucHJnIn0seyJuYW1lIjoiTm9kZSIsImlkIjoxMDMs ImxhYmVsIjoibm9kZTMucHJnIn0seyJuYW1lIjoiTm9kZSIsImlkIjoxMDQs ImxhYmVsIjoibm9kZTQucHJnIn0seyJuYW1lIjoiTm9kZSIsImlkIjoxMDUs ImxhYmVsIjoibm9kZTUucHJnIn0seyJuYW1lIjoiTm9kZSIsImlkIjoxMDYs ImxhYmVsIjoibm9kZTYucHJnIn0seyJuYW1lIjoiTm9kZSIsImlkIjoxMDgs ImxhYmVsIjoibm9kZTcucHJnIn0seyJuYW1lIjoiTm9kZSIsImlkIjoxMDks ImxhYmVsIjoibm9kZTgucHJnIn0seyJuYW1lIjoiTm9kZSIsImlkIjoxMTAs ImxhYmVsIjoibm9kZTkucHJnIn0seyJuYW1lIjoiTm9kZSIsImlkIjoxMTEs ImxhYmVsIjoibm9kZTEwLnByZyJ9LHsibmFtZSI6Ik5vZGUiLCJpZCI6MTEy LCJsYWJlbCI6Im5vZGUxMS5wcmcifSx7Im5hbWUiOiJOb2RlIiwiaWQiOjEx MywibGFiZWwiOiJub2RlMTIucHJnIn0seyJuYW1lIjoiTm9kZSIsImlkIjox MTQsImxhYmVsIjoibm9kZTEzLnByZyJ9LHsibmFtZSI6Ik5vZGUiLCJpZCI6 MTE1LCJsYWJlbCI6Im5vZGUxNC5wcmcifSx7Im5hbWUiOiJOb2RlIiwiaWQi OjExNiwibGFiZWwiOiJub2RlMTUucHJnIn0seyJuYW1lIjoiTm9kZSIsImlk IjoxMTcsImxhYmVsIjoibm9kZTE2LnByZyJ9LHsibmFtZSI6Ik5vZGUiLCJp ZCI6MTE4LCJsYWJlbCI6Im5vZGUxNy5wcmcifSx7Im5hbWUiOiJOb2RlIiwi aWQiOjExOSwibGFiZWwiOiJub2RlMTgucHJnIn0seyJuYW1lIjoiTm9kZSIs ImlkIjoxNjAsImxhYmVsIjoiYmFja3VwZXIucHJnIn0seyJuYW1lIjoiTm9k ZSIsImlkIjoxNzAsImxhYmVsIjoibmFzYm94LnByZyJ9LHsibmFtZSI6Ik5v ZGUiLCJpZCI6MzAwLCJsYWJlbCI6Im5vZGUxLnBnbmQifSx7Im5hbWUiOiJO b2RlIiwiaWQiOjMwMSwibGFiZWwiOiJub2RlMi5wZ25kIn1dLCJoYW5kbGVy cyI6WyJQYXZlbCDFoG5hamRyIiwiUmljaGFyZCBNYXJrbyJdLCJ0cmFuc2xh dGlvbnMiOnsiZW4iOnsic3VtbWFyeSI6Ik5ldHdvcmsgZG93bnRpbWU6IEwy IGlzc3VlcyIsImRlc2NyaXB0aW9uIjoiV2UndmUgZXhwZXJpZW5jZWQgaXNz dWVzIHdpdGggTDIgbmV0d29ya2luZywgbW9zdCBsaWtlbHkgaXQgd2FzIHNv bWUgdW5leHBlY3RlZCBDaXNjbyBcIm1hZ2ljXCIuXHJcblxyXG5DdXJyZW50 bHkgd2UncmUgcnVubmluZyB3aXRoIGhhbGYgb2YgdGhlIGdpZ2FiaXQgc3dp dGNoZXMgZG93biB0byBwcmV2ZW50IGZ1cnRoZXIgTExEUCBtZXNzOyBnb2lu 
ZyB0byBpbnZlc3RpZ2F0ZSBmdXJ0aGVyIHRvbmlnaHQuIn0sImNzIjp7InN1 bW1hcnkiOiJTaXRvdmFuaSBuZWRvc3R1cG5lOiBwcm9ibGVtIG5hIEwyIiwi ZGVzY3JpcHRpb24iOiJWIFByYXplIHNlIHJvemxhbWFsYSBpbnRlcmFrY2Ug NCBzd2l0Y2h1IHByb3RpIHNvYmUsIHZ5cGFkYSB0byBuYSBjb3NpIHMgTExE UDsgbXVzZWxpIGpzbWUgamFrbyBkcmFzdGlja2UgcHJvemF0aW1uaSByZXNl bmkgc2hvZGl0IHB1bGt1IHBhdGVyZSwgYWJ5IHNlIHNsbyBwcmVzIGplZG51 IGNlc3R1IG5hIGppc3RvLCBtaXN0byBkdm91IHZvbGl0ZWxueWNoIChrZGUg YW5pIGplZG5hIHNlIG5ldnlicmFsYSBhIG5lZnVuZ292YWxhKS5cclxuXHJc bkRuZXNrYSB2IG5vY2kgdG8gYnVkdSBibGl6IHprb3VtYXQ7IGplIGRvY2Vs YSBtb3puZSwgemUgcHJlaG9kaW1lIG5hIEJHUC8xMEdFIGkgbmUtdnBzQWRt aW5PUyBwcm9kdWtjaSwgdG9obGUgc2UgbmVkYSB0YWtobGUgbmVjaGF2YXQs IGtkeWJ5IHRvIG1lbG8gYmxibm91dCBkYWwuLi4ifX19 -----END BASE64 ENCODED PARSEABLE JSON-----
Resolved at: 2018-12-01 05:00 CET
State: announced -> closed
Summary: Temporary near-fix
We are currently running on half of the switches; if a network device has a hardware failure, we will have to switch over to the other half manually by moving the power connection.
It looks like we are hitting a buggy interaction in Mikrotik<->Cisco bonding that is very hard to debug on a live system, so we will substantially speed up the deployment of 10GE/BGP networking, which we therefore need to roll out for the existing production as well, ideally by the end of the year.
Six servers still do not have a 10GE network card installed; otherwise we could already start connecting them. The cards are ordered; once they arrive and the cabling is done, we plan to convert the servers from OSPF to BGP at night, serially, one server after another. Christmas in the datacenter \o/
But at least we will be ready to join NIX sooner :)
Reported by: Pavel Šnajdr
ENGLISH:
Finished at: 2018-12-01 05:00 CET
State: announced -> closed
Summary: Workaround in effect
Effectively we don't have a backup path now, since enabling the full network configuration seems to trigger a parade of bugs in Mikrotik against Cisco SG500.
In case of a network device problem, we will have to switch power between the two branches of switches.
We will speed up deployment of 10GE networking so that it is done by the end of this year; then we'll redo the gigabit networking from scratch, simplified, with no production traffic on it (management only).
Reported by: Pavel Šnajdr
-----BEGIN BASE64 ENCODED PARSEABLE JSON----- eyJpZCI6MTA5NywiY2hhbmdlcyI6eyJmaW5pc2hlZF9hdCI6eyJmcm9tIjpu dWxsLCJ0byI6IjIwMTgtMTItMDFUMDU6MDA6MDArMDE6MDAifSwic3RhdGUi OnsiZnJvbSI6ImFubm91bmNlZCIsInRvIjoiY2xvc2VkIn19LCJ0cmFuc2xh dGlvbnMiOnsiZW4iOnsic3VtbWFyeSI6Ildvcmthcm91bmQgaW4gZWZmZWN0 IiwiZGVzY3JpcHRpb24iOiJcclxuRWZmZWN0aXZlbHkgd2UgZG9uJ3QgaGF2 ZSBhIGJhY2t1cCBub3csIHNpbmNlIGVuYWJsaW5nIGZ1bGwgbmV0d29yayBj b25maWd1cmF0aW9uIHNlZW1zIHRvIHRyaWdnZXIgYnVnIHBhcmFkZSBpbiBN aWtyb3RpayBhZ2FpbnN0IENpc2NvIFNHNTAwLlxyXG5cclxuSW4gY2FzZSBv ZiBhIG5ldHdvcmsgZGV2aWNlIHByb2JsZW0sIHdlIHdpbGwgaGF2ZSB0byBz d2l0Y2ggcG93ZXIgYmV0d2VlbiB0aGUgdHdvIGJyYW5jaGVzIG9mIHN3aXRj aGVzO1xyXG5cclxud2Ugd2lsbCBzcGVlZCB1cCBkZXBsb3ltZW50IG9mIDEw R0UgbmV0d29ya2luZywgc28gdGhhdCB3ZSBoYXZlIHRoYXQgZG9uZSBieSB0 aGUgZW5kIG9mIHRoaXMgeWVhciwgdGhlbiB3ZSdsbCByZS1kbyB0aGUgZ2ln YWJpdCBuZXR3b3JraW5nIGZyb20gc2NyYXRjaCwgc2ltcGxpZmllZCwgd2l0 aG91dCBwcm9kdWN0aW9uIHRyYWZmaWMgb24gdGhlbSAoanVzdCBmb3IgbWFu YWdlbWVudCkuIn0sImNzIjp7InN1bW1hcnkiOiJEb2Nhc25lIHNrb3JvcmVz ZW5pIiwiZGVzY3JpcHRpb24iOiJBa3R1YWxuZSBiZXppbWUgbmEgcG9sb3Zp bnUgc3dpdGNodSwgdiBwcmlwYWRlIEhXIHZ5cGFka3Ugc2l0b3ZhY2lobyB6 YXJpemVuaSBidWRlIHBvdHJlYmEgcHJlcG9qaXQgbmEgZHJ1aG91IHB1bGt1 IHBvbW9jaSBwcmVwb2plbmkgbmFwYWplbmkgbWFudWFsbmUuXHJcblxyXG5W eXBhZGEgdG8sIHplIG5hcmF6aW1lIG5hIGJ1Z292b3UgaW50ZXJha2NpIGJv bmRpbmd1IE1pa3JvdGlrXHUwMDNjLVx1MDAzZUNpc2NvLCBrdGVyYSBzZSB2 ZWxtaSBzcGF0bmUgZGVidWd1amUgemEgYmVodTsgY2lsaSBwb2RzdGF0bmUg dXJ5Y2hsaW1lIGRlcGxveSAxMEdFL0JHUCBzaXRvdmFuaSwga3RlcmUgbXVz aW1lIHRpbSBwYWRlbSBzdGlobm91dCB6YXZlc3QgaWRlYWxuZSBkbyBrb25j ZSByb2t1IGkgcHJvIHN0YXZhamljaSBwcm9kdWtjaS5cclxuXHJcbkplc3Rl IGplIDYgc2VydmVydSwga3RlcmUgbmVtYWppIG5haW5zdGFsb3Zhbm91IDEw R0Ugc2l0b3ZrdSwgamluYWsgdXogYnljaG9tIG1vaGxpIHphcG9qb3ZhdDsg amUgdG8gb2JqZWRuYW5lLCBqYWttaWxlIHRvIGRvcmF6aSBhIGJ1ZGUgbmFr YWJsb3Zhbm8sIHBsYW51amVtZSBwbyBub2NpY2ggc2VydmVyIHBvIHNlcnZl cnUga29udmVydG92YXQgeiBPU1BGIG5hIEJHUCwgc2VyaW92ZSBwbyBqZWRu b20gc2VydmVydSB6YSBzZWJvdS4gVmFub2NlIHYgZGF0YWNlbnRydSBcXG8v 
XHJcblxyXG5BbGUgYXNwb24gYnVkZW1lIHByaXByYXZlbmkgZHJpdiBuYSB2 c3R1cCBkbyBOSVh1IDopIn19fQ== -----END BASE64 ENCODED PARSEABLE JSON-----
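The BASE64 ENCODED PARSEABLE JSON blocks in this message are meant for machine consumption. A minimal sketch of how a reader might decode them, assuming each block is delimited by the BEGIN/END marker lines shown above and the payload is whitespace-wrapped base64 of UTF-8 JSON. The function name `decode_blocks` and the small sample payload are hypothetical and stand in for the real blocks:

```python
import base64
import json
import re

BEGIN = "-----BEGIN BASE64 ENCODED PARSEABLE JSON-----"
END = "-----END BASE64 ENCODED PARSEABLE JSON-----"


def decode_blocks(text):
    """Return one parsed JSON object per marker-delimited block in text.

    Hypothetical helper; assumes the payload between the markers is
    base64 that may be wrapped with spaces or newlines.
    """
    pattern = re.escape(BEGIN) + r"(.*?)" + re.escape(END)
    results = []
    for payload in re.findall(pattern, text, flags=re.DOTALL):
        compact = "".join(payload.split())  # remove wrapping whitespace
        raw = base64.b64decode(compact)
        results.append(json.loads(raw.decode("utf-8")))
    return results


# Demo with a small stand-in payload (not the full announcement data).
sample = {"id": 483, "type": "network", "duration": 120}
encoded = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")
# Wrap the base64 text into short space-separated chunks, as in the message.
wrapped = " ".join(encoded[i:i + 20] for i in range(0, len(encoded), 20))
message = f"{BEGIN} {wrapped} {END}"

print(decode_blocks(message))
```

Feeding the full text of this announcement to `decode_blocks` should yield one object per block, provided the payload survived mail transport intact.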