Datum a čas: 2022-06-02 10:45 CEST
Očekávaná délka: 360 minut
Oznámení se týká serverů: Praha, Playground, Praha Storage, Staging
Typ výpadku: vps_reset
Důvod: Výpadek napájení v MasterDC Praha
Výpadek řeší: Pavel Šnajdr, Tomáš Srnka, Jakub Skokan, Martin Myška
V DC v Praze vypadly obě větve napájení. Od obnovy napájení a konektivity jsme pracovali na zprovoznění všech VPS. Aktuálně by měly běžet všechny nody a VPS.
Omlouváme se, pokud jsme nezvedli telefon nebo nenapsali dříve, naším cílem bylo co nejdříve zprovoznit všechny systémy.
ENGLISH:
Date and time: 2022-06-02 10:45 CEST
Expected duration: 360 minutes
Affected systems: Praha, Playground, Praha Storage, Staging
Outage type: vps_reset
Reason: Power outage in MasterDC Praha
Handled by: Pavel Šnajdr, Tomáš Srnka, Jakub Skokan, Martin Myška
Both power lines went down in MasterDC Praha. We have been working on recovery since power and network connectivity were restored. This meant restarting all of our hardware. Right now, all nodes and VPS should be up and running.
We apologize for not picking up the phone or writing sooner; all of our efforts went into recovering our systems.
-----BEGIN BASE64 ENCODED PARSEABLE JSON----- eyJpZCI6OTA0LCJwbGFubmVkIjpmYWxzZSwiYmVnaW5zX2F0IjoiMjAyMi0w Ni0wMlQxMDo0NTowMCswMjowMCIsImR1cmF0aW9uIjozNjAsInR5cGUiOiJ2 cHNfcmVzZXQiLCJlbnRpdGllcyI6W3sibmFtZSI6IkxvY2F0aW9uIiwiaWQi OjMsImxhYmVsIjoiUHJhaGEifSx7Im5hbWUiOiJMb2NhdGlvbiIsImlkIjo1 LCJsYWJlbCI6IlBsYXlncm91bmQifSx7Im5hbWUiOiJMb2NhdGlvbiIsImlk Ijo2LCJsYWJlbCI6IlByYWhhIFN0b3JhZ2UifSx7Im5hbWUiOiJMb2NhdGlv biIsImlkIjo3LCJsYWJlbCI6IlN0YWdpbmcifV0sImhhbmRsZXJzIjpbIlBh dmVsIMWgbmFqZHIiLCJUb23DocWhIFNybmthIiwiSmFrdWIgU2tva2FuIiwi TWFydGluIE15xaFrYSJdLCJ0cmFuc2xhdGlvbnMiOnsiZW4iOnsic3VtbWFy eSI6IlBvd2VyIG91dGFnZSBpbiBNYXN0ZXJEQyBQcmFoYSIsImRlc2NyaXB0 aW9uIjoiQm90aCBwb3dlciBsaW5lcyB3ZW50IGRvd24gaW4gTWFzdGVyREMg UHJhaGEuIFdlJ3ZlIGJlZW4gd29ya2luZyBvbiByZWNvdmVyeSBhcyBzb29u IGFzIHBvd2VyIGFuZCBuZXR3b3JrIGNvbm5lY3Rpdml0eSB3ZXJlIHJlbmV3 ZWQuIFRoaXMgbWVhbnQgcmVzdGFydGluZyBhbGwgb2Ygb3VyIGhhcmR3YXJl LiBSaWdodCBub3csIGFsbCBub2RlcyBzaG91bGQgYmUgdXAgYW5kIHJ1bm5p bmcuXHJcblxyXG5XZSBhcG9sb2dpZXMgZm9yIG5vdCBwaWNraW5nIHVwIHRo ZSBwaG9uZSBvciB3cml0aW5nIHNvb25lciwgYWxsIG9mIG91ciBlZmZvcnRz IHdlbnQgdG8gcmVjb3ZlciBvdXIgc3lzdGVtcy4ifSwiY3MiOnsic3VtbWFy eSI6IlbDvXBhZGVrIG5hcMOhamVuw60gdiBNYXN0ZXJEQyBQcmFoYSIsImRl c2NyaXB0aW9uIjoiViBEQyB2IFByYXplIHZ5cGFkbHkgb2LEmyB2xJt0dmUg bmFww6FqZW7DrS4gT2Qgb2Jub3Z5IG5hcMOhamVuw60gYSBrb25la3Rpdml0 eSBqc21lIHByYWNvdmFsaSBuYSB6cHJvdm96bsSbbsOtIHbFoWVjaCBWUFMu IEFrdHXDoWxuxJsgYnkgbcSbbHkgYsSbxb5ldCB2xaFlY2hueSBub2R5IGEg VlBTLlxyXG5cclxuT21sb3V2w6FtZSBzZSBwb2t1ZCBqc21lIG5lenZlZGxp IHRlbGVmb24gbmVibyBuZW5hcHNhbGkgZMWZw612ZSwgbmHFocOtbSBjw61s ZW0gYnlsbyBjbyBuZWpkxZnDrXZlIHpwcm92b3puaXQgdsWhZWNobnkgc3lz dMOpbXUuIn19fQ== -----END BASE64 ENCODED PARSEABLE JSON-----
Stav: -> closed
Popis: Pár slov k výpadku
Na našem blogu jsme uvedli více informací ke čtvrtečnímu výpadku napájení v Praze. Blog obsahuje také vyjádření, které jsme obdrželi od MasterDC:
https://blog.vpsfree.cz/post-mortem-par-slov-k-vypadku/
Nahlásil: Jakub Skokan
ENGLISH: State: -> closed
Summary: A few words regarding the outage
We have published a blog post with more information about the power outage on Thursday. It also contains a communication we've received from MasterDC.
https://blog.vpsfree.cz/post-mortem-par-slov-k-vypadku/
Since the blog post is in Czech, I include a short translation in English.
The outage was connected to a power loss in the city of Prague, not just the datacenter. While the DC has backup power, it failed and the first power line went down. At that point, we continued to run on the second power line. As the DC was reconnecting selected devices to the second power line, a short circuit occurred and the second power line went down as well.
Everything we have in Prague was thus shut down, including our email support and the tools that we otherwise use to communicate with our members. We immediately set out for the DC to sort out the issues on the spot. The outage was further complicated by slowly booting switches. Most of our nodes boot from PXE, and since the nodes were up faster than the network, they failed to boot. This led to an additional delay, as we had to reset them again. We also lost one 10G switch during the outage.
We're working on making the PXE server available faster to avoid the boot issues in the future. We appreciate MasterDC's response, as they were open about the situation.
Reported by: Jakub Skokan
-----BEGIN BASE64 ENCODED PARSEABLE JSON----- eyJpZCI6MjM3MywiY2hhbmdlcyI6eyJzdGF0ZSI6eyJmcm9tIjpudWxsLCJ0 byI6ImNsb3NlZCJ9fSwidHJhbnNsYXRpb25zIjp7ImVuIjp7InN1bW1hcnki OiJBIGZldyB3b3JkcyByZWdhcmRpbmcgdGhlIG91dGFnZSIsImRlc2NyaXB0 aW9uIjoiV2UgaGF2ZSBwdWJsaXNoZWQgYSBibG9nIHBvc3Qgd2l0aCBtb3Jl IGluZm9ybWF0aW9uIGFib3V0IHRoZSBwb3dlciBvdXRhZ2Vcclxub24gVGh1 cnNkYXkuIEl0IGFsc28gY29udGFpbnMgYSBjb21tdW5pY2F0aW9uIHdlJ3Zl IHJlY2VpdmVkIGZyb20gTWFzdGVyREMuXHJcblxyXG5odHRwczovL2Jsb2cu dnBzZnJlZS5jei9wb3N0LW1vcnRlbS1wYXItc2xvdi1rLXZ5cGFka3UvXHJc blxyXG5TaW5jZSB0aGUgYmxvZyBwb3N0IGlzIGluIEN6ZWNoLCBJIGluY2x1 ZGUgYSBzaG9ydCB0cmFuc2xhdGlvbiBpbiBFbmdsaXNoLlxyXG5cclxuVGhl IG91dGFnZSB3YXMgY29ubmVjdGVkIHRvIHBvd2VyIGxvc3MgaW4gUHJhZ3Vl IGNpdHksIG5vdCBqdXN0IHRoZSBkYXRhY2VudGVyLlxyXG5XaGlsZSB0aGUg REMgaGFzIHBvd2VyIGJhY2t1cHMsIHRoZXkgZmFpbGVkIGFuZCB0aGUgZmly c3QgcG93ZXIgbGluZSB3ZW50IGRvd24uXHJcbkF0IHRoaXMgdGltZSwgd2Un dmUgY29udGludWVkIHRvIHJ1biBvbiB0aGUgc2Vjb25kIHBvd2VyIGxpbmUu IEFzIHRoZSBEQyB3YXNcclxucmVjb25uZWN0aW5nIHNlbGVjdGVkIGRldmlj ZXMgdG8gdGhlIHNlY29uZCBwb3dlciBsaW5lLCBhIHNob3J0IGNpcmN1aXQg aGFzXHJcbm9jY3VycmVkIGFuZCB0aGUgc2Vjb25kIHBvd2VyIGxpbmUgd2Vu dCBkb3duIGFzIHdlbGwuXHJcblxyXG5FdmVyeXRoaW5nIHdlIGhhdmUgaW4g UHJhZ3VlIHdhcyB0aHVzIHNodXQgZG93biwgaW5jbHVkaW5nIG91ciBlbWFp bCBzdXBwb3J0XHJcbmFuZCB0b29scyB0aGF0IHdlIG90aGVyd2lzZSB1c2Ug dG8gY29tbXVuaWNhdGUgd2l0aCBvdXIgbWVtYmVycy4gV2UndmUgaW1tZWRp YXRlbHlcclxuc2V0IG91dCB0byB0aGUgREMgdG8gc29ydCBvdXQgdGhlIGlz c3VlcyBvbiB0aGUgc3BvdC4gVGhlIG91dGFnZSB3YXMgZnVydGhlclxyXG5j b21wbGljYXRlZCBieSBzbG93bHkgYm9vdGluZyBzd2l0Y2hlcy4gTW9zdCBv ZiBvdXIgbm9kZXMgYXJlIGJvb3RlZCBmcm9tIFBYRVxyXG5hbmQgc2luY2Ug dGhlIG5vZGVzIHdlcmUgdXAgZmFzdGVyIHRoYW4gdGhlIG5ldHdvcmssIHRo ZXkgZmFpbGVkIHRvIGJvb3QuIFRoaXNcclxubGVkIHRvIGFuIGFkZGl0aW9u YWwgZGVsYXksIGFzIHdlJ3ZlIGhhZCB0byByZXNldCB0aGVtIGFnYWluLiBX ZSd2ZSBhbHNvIGxvc3Rcclxub25lIDEwRyBzd2l0Y2ggZHVyaW5nIHRoZSBv dXRhZ2UuXHJcblxyXG5XZSdyZSB3b3JraW5nIG9uIG1ha2luZyB0aGUgUFhF IHNlcnZlciBhdmFpbGFibGUgZmFzdGVyIHRvIGF2b2lkIHRoZSBib290IGlz 
c3Vlc1xyXG5pbiB0aGUgZnV0dXJlLiBXZSBhcHByZWNpYXRlIE1hc3RlckRD J3MgcmVzcG9uc2UsIGFzIHRoZXkgd2VyZSBvcGVuIGFib3V0IHRoZVxyXG5z aXR1YXRpb24uXHJcbiJ9LCJjcyI6eyJzdW1tYXJ5IjoiUMOhciBzbG92IGsg dsO9cGFka3UiLCJkZXNjcmlwdGlvbiI6Ik5hIG5hxaFlbSBibG9ndSBqc21l IHV2ZWRsaSB2w61jZSBpbmZvcm1hY8OtIGtlIMSNdHZydGXEjW7DrW11IHbD vXBhZGt1IG5hcMOhamVuw61cclxudiBQcmF6ZS4gQmxvZyBvYnNhaHVqZSB0 YWvDqSB2eWrDoWTFmWVuw60sIGt0ZXLDqSBqc21lIG9iZHLFvmVsaSBvZCBN YXN0ZXJEQzpcclxuXHJcbmh0dHBzOi8vYmxvZy52cHNmcmVlLmN6L3Bvc3Qt bW9ydGVtLXBhci1zbG92LWstdnlwYWRrdS9cclxuIn19fQ== -----END BASE64 ENCODED PARSEABLE JSON-----
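For readers who want to process these notices automatically, the blocks between the BEGIN and END markers can be decoded with a few lines of Python. This is only a minimal sketch: the function name `extract_payloads` is hypothetical, and it assumes the payload is standard base64 of UTF-8 JSON that has been wrapped with spaces or newlines, as it appears in this message.

```python
import base64
import json
import re

# Marker strings copied from this notice.
BEGIN = "-----BEGIN BASE64 ENCODED PARSEABLE JSON-----"
END = "-----END BASE64 ENCODED PARSEABLE JSON-----"


def extract_payloads(text):
    """Return a list of decoded JSON objects, one per marked block in text.

    Hypothetical helper: assumes standard base64 of UTF-8 encoded JSON.
    """
    pattern = re.compile(re.escape(BEGIN) + r"(.*?)" + re.escape(END), re.DOTALL)
    payloads = []
    for match in pattern.finditer(text):
        # The base64 body is wrapped with spaces or newlines; drop all whitespace.
        encoded = "".join(match.group(1).split())
        decoded = base64.b64decode(encoded).decode("utf-8")
        payloads.append(json.loads(decoded))
    return payloads
```

Calling `extract_payloads(message_text)` on the full text of a notice returns one dictionary per block. In this message the first block appears to carry fields such as `id`, `duration` and `type`, but the exact schema is not documented here and should be checked against the decoded output.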