Datum a čas: 2020-09-03 16:22 CEST
Očekávaná délka: 0 minut
Oznámení se týká serverů: Brno
Typ výpadku: network
Důvod: Sekundarni router mrtev
Výpadek řeší: Pavel Šnajdr
Klasika Mikrotiku, mrtvy zdroj - objednali jsme novejsi verzi s dvema zdroji. Traffic na NASbox z Brna musi byt omezen (do doby, nez router nahradime, par dni max).
ENGLISH:
Date and time: 2020-09-03 16:22 CEST
Expected duration: 0 minutes
Affected systems: Brno
Outage type: network
Reason: Secondary router down
Handled by: Pavel Šnajdr
A Mikrotik classic (tm): dead PSU. We have ordered a newer dual-PSU version. Traffic to NASbox from Brno will be throttled until the router is replaced (a few days at most).
-----BEGIN BASE64 ENCODED PARSEABLE JSON-----
eyJpZCI6NjgxLCJwbGFubmVkIjpmYWxzZSwiYmVnaW5zX2F0IjoiMjAyMC0w
OS0wM1QxNjoyMjowMCswMjowMCIsImR1cmF0aW9uIjowLCJ0eXBlIjoibmV0
d29yayIsImVudGl0aWVzIjpbeyJuYW1lIjoiTG9jYXRpb24iLCJpZCI6NCwi
bGFiZWwiOiJCcm5vIn1dLCJoYW5kbGVycyI6WyJQYXZlbCDFoG5hamRyIl0s
InRyYW5zbGF0aW9ucyI6eyJlbiI6eyJzdW1tYXJ5IjoiU2Vjb25kYXJ5IHJv
dXRlciBkb3duIiwiZGVzY3JpcHRpb24iOiJNaWtyb3RpayBjbGFzc2ljcyAo
dG0pIC0gZGVhZCBQU1UsIG9yZGVyZWQgYSBuZXdlciBkdWFsLVBTVSB2ZXJz
aW9uLiBUcmFmZmljIHRvIE5BU2JveCB3aWxsIGJlIHRocm90dGxlZCB1bnRp
bCB0aGUgcm91dGVyIGlzIHJlcGxhY2VkIChmZXcgZGF5cyBhdCBtYXgpLiJ9
LCJjcyI6eyJzdW1tYXJ5IjoiU2VrdW5kYXJuaSByb3V0ZXIgbXJ0ZXYiLCJk
ZXNjcmlwdGlvbiI6IktsYXNpa2EgTWlrcm90aWt1LCBtcnR2eSB6ZHJvaiAt
IG9iamVkbmFsaSBqc21lIG5vdmVqc2kgdmVyemkgcyBkdmVtYSB6ZHJvamku
IFRyYWZmaWMgbmEgTkFTYm94IHogQnJuYSBtdXNpIGJ5dCBvbWV6ZW4gKGRv
IGRvYnksIG5leiByb3V0ZXIgbmFocmFkaW1lLCBwYXIgZG5pIG1heCkuIn19
fQ==
-----END BASE64 ENCODED PARSEABLE JSON-----
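The machine-readable block above can be decoded with a few lines of code. The following is a minimal sketch, not the announcement system's own tooling: the function name `parse_outage_block` is made up for illustration, and the sketch assumes only what is visible in the notice, namely the BEGIN/END marker lines and a base64 payload wrapped across several lines. The payload is copied verbatim from the notice above.

```python
import base64
import json

BEGIN = "-----BEGIN BASE64 ENCODED PARSEABLE JSON-----"
END = "-----END BASE64 ENCODED PARSEABLE JSON-----"


def parse_outage_block(text):
    """Return the JSON object embedded between the BEGIN/END markers.

    Hypothetical helper: the name and behaviour are illustrative only.
    """
    start = text.index(BEGIN) + len(BEGIN)
    stop = text.index(END, start)
    # The payload is wrapped over several lines; drop all whitespace first.
    payload = "".join(text[start:stop].split())
    return json.loads(base64.b64decode(payload).decode("utf-8"))


# Payload copied verbatim from the notice above.
NOTICE = """\
-----BEGIN BASE64 ENCODED PARSEABLE JSON-----
eyJpZCI6NjgxLCJwbGFubmVkIjpmYWxzZSwiYmVnaW5zX2F0IjoiMjAyMC0w
OS0wM1QxNjoyMjowMCswMjowMCIsImR1cmF0aW9uIjowLCJ0eXBlIjoibmV0
d29yayIsImVudGl0aWVzIjpbeyJuYW1lIjoiTG9jYXRpb24iLCJpZCI6NCwi
bGFiZWwiOiJCcm5vIn1dLCJoYW5kbGVycyI6WyJQYXZlbCDFoG5hamRyIl0s
InRyYW5zbGF0aW9ucyI6eyJlbiI6eyJzdW1tYXJ5IjoiU2Vjb25kYXJ5IHJv
dXRlciBkb3duIiwiZGVzY3JpcHRpb24iOiJNaWtyb3RpayBjbGFzc2ljcyAo
dG0pIC0gZGVhZCBQU1UsIG9yZGVyZWQgYSBuZXdlciBkdWFsLVBTVSB2ZXJz
aW9uLiBUcmFmZmljIHRvIE5BU2JveCB3aWxsIGJlIHRocm90dGxlZCB1bnRp
bCB0aGUgcm91dGVyIGlzIHJlcGxhY2VkIChmZXcgZGF5cyBhdCBtYXgpLiJ9
LCJjcyI6eyJzdW1tYXJ5IjoiU2VrdW5kYXJuaSByb3V0ZXIgbXJ0ZXYiLCJk
ZXNjcmlwdGlvbiI6IktsYXNpa2EgTWlrcm90aWt1LCBtcnR2eSB6ZHJvaiAt
IG9iamVkbmFsaSBqc21lIG5vdmVqc2kgdmVyemkgcyBkdmVtYSB6ZHJvamku
IFRyYWZmaWMgbmEgTkFTYm94IHogQnJuYSBtdXNpIGJ5dCBvbWV6ZW4gKGRv
IGRvYnksIG5leiByb3V0ZXIgbmFocmFkaW1lLCBwYXIgZG5pIG1heCkuIn19
fQ==
-----END BASE64 ENCODED PARSEABLE JSON-----
"""

report = parse_outage_block(NOTICE)
print(report["id"], report["type"], report["begins_at"])
print(report["translations"]["en"]["summary"])
```

The decoded object carries the same fields as the human-readable part (`begins_at`, `duration`, `type`, `entities`, `handlers`) plus per-language `summary` and `description` under `translations`.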
Datum a čas: 2020-10-25 21:26 CET
Očekávaná délka: 35 minut
Oznámení se týká serverů: node1.stg
Typ výpadku: restart
Důvod: Test Linuxu 5.9.1
Výpadek řeší: Pavel Šnajdr
+ opravu pro funkci availableProcessors v Jave
ENGLISH:
Date and time: 2020-10-25 21:26 CET
Expected duration: 35 minutes
Affected systems: node1.stg
Outage type: restart
Reason: Testing Linux 5.9.1
Handled by: Pavel Šnajdr
+ a fix for the Java availableProcessors function
-----BEGIN BASE64 ENCODED PARSEABLE JSON-----
eyJpZCI6Njk4LCJwbGFubmVkIjpmYWxzZSwiYmVnaW5zX2F0IjoiMjAyMC0x
MC0yNVQyMToyNjowMCswMTowMCIsImR1cmF0aW9uIjozNSwidHlwZSI6InJl
c3RhcnQiLCJlbnRpdGllcyI6W3sibmFtZSI6Ik5vZGUiLCJpZCI6NDAwLCJs
YWJlbCI6Im5vZGUxLnN0ZyJ9XSwiaGFuZGxlcnMiOlsiUGF2ZWwgxaBuYWpk
ciJdLCJ0cmFuc2xhdGlvbnMiOnsiZW4iOnsic3VtbWFyeSI6IlRlc3Rpbmcg
TGludXggNS45LjEiLCJkZXNjcmlwdGlvbiI6IisgZml4IGZvciBKYXZhIGF2
YWlsYWJsZVByb2Nlc3NvcnMgZnVuY3Rpb24ifSwiY3MiOnsic3VtbWFyeSI6
IlRlc3QgTGludXh1IDUuOS4xIiwiZGVzY3JpcHRpb24iOiIrIG9wcmF2dSBw
cm8gZnVua2NpIGF2YWlsYWJsZVByb2Nlc3NvcnMgdiBKYXZlIn19fQ==
-----END BASE64 ENCODED PARSEABLE JSON-----
Datum a čas: 2020-10-23 01:05 CEST
Očekávaná délka: 45 minut
Oznámení se týká serverů: node15.prg, node19.prg, node20.prg, node1.pgnd
Typ odstávky: restart
Důvod: Aktualizace na Linux 5.9
Odstávku řeší: Pavel Šnajdr
Nase jadro od ted virtualizuje /proc/meminfo a pametovou cast sysinfo(2), co znamena, ze kontejnerizovane aplikace dodrzuji pametove limity VPSky a tak se redukuje velka cast OOM situaci.
ENGLISH:
Date and time: 2020-10-23 01:05 CEST
Expected duration: 45 minutes
Affected systems: node15.prg, node19.prg, node20.prg, node1.pgnd
Maintenance type: restart
Reason: Update to Linux 5.9
Handled by: Pavel Šnajdr
Our kernel now virtualizes /proc/meminfo and the memory part of sysinfo(2), which makes applications running in nested containers respect the memory limits of the VPS and greatly reduces the number of OOM situations.
-----BEGIN BASE64 ENCODED PARSEABLE JSON-----
eyJpZCI6Njk3LCJwbGFubmVkIjp0cnVlLCJiZWdpbnNfYXQiOiIyMDIwLTEw
LTIzVDAxOjA1OjAwKzAyOjAwIiwiZHVyYXRpb24iOjQ1LCJ0eXBlIjoicmVz
dGFydCIsImVudGl0aWVzIjpbeyJuYW1lIjoiTm9kZSIsImlkIjoxMTYsImxh
YmVsIjoibm9kZTE1LnByZyJ9LHsibmFtZSI6Ik5vZGUiLCJpZCI6MTIwLCJs
YWJlbCI6Im5vZGUxOS5wcmcifSx7Im5hbWUiOiJOb2RlIiwiaWQiOjEyMSwi
bGFiZWwiOiJub2RlMjAucHJnIn0seyJuYW1lIjoiTm9kZSIsImlkIjozMDAs
ImxhYmVsIjoibm9kZTEucGduZCJ9XSwiaGFuZGxlcnMiOlsiUGF2ZWwgxaBu
YWpkciJdLCJ0cmFuc2xhdGlvbnMiOnsiZW4iOnsic3VtbWFyeSI6IlVwZGF0
ZSB0byBMaW51eCA1LjkiLCJkZXNjcmlwdGlvbiI6Ik91ciBrZXJuZWwgbm93
IGluY2x1ZGVzIHZpcnR1YWxpemVkIC9wcm9jL21lbWluZm8gYW5kIHRoZSBt
ZW1vcnkgYml0IG9mIHN5c2luZm8oMiksIHdoaWNoIG1ha2UgYXBwbGljYXRp
b25zIGluIG5lc3RlZCBjb250YWluZXJpemF0aW9uIHJlc3BlY3QgdGhlIG1l
bW9yeSBsaW1pdHMgb2YgdGhlIFZQUywgcmVkdWNpbmcgT09NIHNpdHVhdGlv
bnMgYnkgYSBsb3QuIn0sImNzIjp7InN1bW1hcnkiOiJBa3R1YWxpemFjZSBu
YSBMaW51eCA1LjkiLCJkZXNjcmlwdGlvbiI6Ik5hc2UgamFkcm8gb2QgdGVk
IHZpcnR1YWxpenVqZSAvcHJvYy9tZW1pbmZvIGEgcGFtZXRvdm91IGNhc3Qg
c3lzaW5mbygyKSwgY28gem5hbWVuYSwgemUga29udGVqbmVyaXpvdmFuZSBh
cGxpa2FjZSBkb2RyenVqaSBwYW1ldG92ZSBsaW1pdHkgVlBTa3kgYSB0YWsg
c2UgcmVkdWt1amUgdmVsa2EgY2FzdCBPT00gc2l0dWFjaS4ifX19
-----END BASE64 ENCODED PARSEABLE JSON-----
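To illustrate what the notice above means for applications: many programs size their caches and heaps from the `MemTotal` field of /proc/meminfo. The sketch below shows that kind of read. The helper name `parse_meminfo` and the sample values are made up for illustration; whether the live file reports the VPS limit rather than the host's memory depends on the kernel, as the notice describes.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: first numeric value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    Hypothetical helper, for illustration only.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


# Made-up sample resembling what a VPS with a 4 GiB limit might report.
SAMPLE = """\
MemTotal:        4194304 kB
MemFree:         1048576 kB
MemAvailable:    2097152 kB
"""

info = parse_meminfo(SAMPLE)
total_mib = info["MemTotal"] // 1024
print(f"MemTotal: {total_mib} MiB")

# On a Linux system, the same parser works on the live file.
try:
    with open("/proc/meminfo") as f:
        live = parse_meminfo(f.read())
    print("live MemTotal:", live.get("MemTotal"), "kB")
except OSError:
    pass  # not Linux, or /proc is not mounted
```

An application that derives its memory budget this way stays within the VPS limit only if the value it reads reflects that limit, which is the point of the virtualization described in the notice.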
Datum a čas: 2020-10-19 14:42 CEST
Očekávaná délka: 35 minut
Oznámení se týká serverů: node16.prg
Typ výpadku: reset
Důvod: Zaseknute jadro
Výpadek řeší: Pavel Šnajdr
Resetuju pro nabeh s novejsim Linuxem 5.9, ktery ma v sobe spoustu fixu ohledne memory cgroup, cimz by tenhle node mel bezet stabilneji.
ENGLISH:
Date and time: 2020-10-19 14:42 CEST
Expected duration: 35 minutes
Affected systems: node16.prg
Outage type: reset
Reason: Kernel stuck
Handled by: Pavel Šnajdr
Resetting to boot with the newer Linux 5.9, which contains a lot of fixes for memory cgroups; this should make the node run more stably.
-----BEGIN BASE64 ENCODED PARSEABLE JSON-----
eyJpZCI6Njk1LCJwbGFubmVkIjpmYWxzZSwiYmVnaW5zX2F0IjoiMjAyMC0x
MC0xOVQxNDo0MjowMCswMjowMCIsImR1cmF0aW9uIjozNSwidHlwZSI6InJl
c2V0IiwiZW50aXRpZXMiOlt7Im5hbWUiOiJOb2RlIiwiaWQiOjExNywibGFi
ZWwiOiJub2RlMTYucHJnIn1dLCJoYW5kbGVycyI6WyJQYXZlbCDFoG5hamRy
Il0sInRyYW5zbGF0aW9ucyI6eyJlbiI6eyJzdW1tYXJ5IjoiS2VybmVsIHN0
dWNrIiwiZGVzY3JpcHRpb24iOiJSZXNldGluZyB0byBib290IHdpdGggbmV3
ZXIgTGludXggNS45LCB3aGljaCBjb250YWlucyBhIGxvdCBvZiBmaXhlcyBm
b3IgbWVtb3J5IGNncm91cHMsIHRoYXQgc2hvdWxkIG1ha2UgdGhlIG5vZGUg
cnVuIG1vcmUgc3RhYmxlLiJ9LCJjcyI6eyJzdW1tYXJ5IjoiWmFzZWtudXRl
IGphZHJvIiwiZGVzY3JpcHRpb24iOiJSZXNldHVqdSBwcm8gbmFiZWggcyBu
b3ZlanNpbSBMaW51eGVtIDUuOSwga3RlcnkgbWEgdiBzb2JlIHNwb3VzdHUg
Zml4dSBvaGxlZG5lIG1lbW9yeSBjZ3JvdXAsIGNpbXogYnkgdGVuaGxlIG5v
ZGUgbWVsIGJlemV0IHN0YWJpbG5lamkuIn19fQ==
-----END BASE64 ENCODED PARSEABLE JSON-----
Datum a čas: 2020-10-09 10:09 CEST
Očekávaná délka: 35 minut
Oznámení se týká serverů: node15.prg
Typ výpadku: reset
Důvod: Sluzby jadra prilis pomale
Výpadek řeší: Pavel Šnajdr
Pravdepodobne kvuli vzrustajicimu poctu zombie procesu v par kontejnerech :(
ENGLISH:
Date and time: 2020-10-09 10:09 CEST
Expected duration: 35 minutes
Affected systems: node15.prg
Outage type: reset
Reason: Kernel services too slow to respond
Handled by: Pavel Šnajdr
Probably caused by an increasing number of zombie processes in a few containers, which slowed the node's responses to a crawl.
-----BEGIN BASE64 ENCODED PARSEABLE JSON-----
eyJpZCI6NjkxLCJwbGFubmVkIjpmYWxzZSwiYmVnaW5zX2F0IjoiMjAyMC0x
MC0wOVQxMDowOTowMCswMjowMCIsImR1cmF0aW9uIjozNSwidHlwZSI6InJl
c2V0IiwiZW50aXRpZXMiOlt7Im5hbWUiOiJOb2RlIiwiaWQiOjExNiwibGFi
ZWwiOiJub2RlMTUucHJnIn1dLCJoYW5kbGVycyI6WyJQYXZlbCDFoG5hamRy
Il0sInRyYW5zbGF0aW9ucyI6eyJlbiI6eyJzdW1tYXJ5IjoiS2VybmVsIHNl
cnZpY2VzIHRvbyBzbG93IHRvIHJlc3BvbmQiLCJkZXNjcmlwdGlvbiI6Iklu
Y3JlYXNpbmcgbnVtYmVyIG9mIHpvbWJpZSBwcm9jZXNzZXMgaW4gZmV3IGNv
bnRhaW5lcnMgY2F1c2VkIHRoZSBub2RlIHRvIHNsb3cgZG93biBpbiByZXNw
b25zZXMgdG8gYSBjcmF3bC4ifSwiY3MiOnsic3VtbWFyeSI6IlNsdXpieSBq
YWRyYSBwcmlsaXMgcG9tYWxlIiwiZGVzY3JpcHRpb24iOiJQcmF2ZGVwb2Rv
Ym5lIGt2dWxpIHZ6cnVzdGFqaWNpbXUgcG9jdHUgem9tYmllIHByb2Nlc3Ug
diBwYXIga29udGVqbmVyZWNoIDooIn19fQ==
-----END BASE64 ENCODED PARSEABLE JSON-----
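For readers who want to check their own containers for the zombie build-up mentioned in the notice above, here is a rough sketch that counts processes in state `Z` by reading /proc/<pid>/stat. The function names and the sample stat lines are made up for illustration; this is not the tooling used on the node.

```python
import os


def process_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so the state is the first field after the
    LAST ')'.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def count_zombies(proc_root="/proc"):
    """Count processes whose state is 'Z' (zombie) under proc_root."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            path = os.path.join(proc_root, entry, "stat")
            with open(path, errors="replace") as f:
                if process_state(f.read()) == "Z":
                    count += 1
        except (OSError, IndexError):
            continue  # process exited mid-scan, or the file is unreadable
    return count


# Made-up stat lines (truncated to the first few fields).
SAMPLES = [
    "1234 (sleep) S 1 1234 1234 0 -1 4194304",
    "1235 (my (odd) name) Z 1 1235 1235 0 -1 4194304",
]
states = [process_state(s) for s in SAMPLES]
print(states)

if os.path.isdir("/proc"):
    print("zombies:", count_zombies())
```

A zombie goes away only once its parent collects its exit status, so a steadily growing count usually points at a parent process that never calls wait().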