Tasks #6349

Problem: Zabbix agent on Servidor zabbix en SeCIU is unreachable for 5 minutes

Added by Victor Alem 11 months ago. Updated 11 months ago.

Status: Resolved
Start date: 10/28/2019
Priority: Normal
Due date:
Assignee: Andrés Pías
% Done: 100%
Category: -
Spent time: -
Target version: -

Description

Throughout the weekend we kept receiving alerts like this one:

The CURE Zabbix detected the following:
The problem started at 09:22:59 on 2019.10.28
Problem: Zabbix agent on Servidor zabbix en SeCIU is unreachable for 5 minutes
Host: Servidor zabbix en SeCIU
Severity: Average

Let's verify that everything is OK.

History

#1 Updated by Victor Alem 11 months ago

  • Assignee changed from Cielito - Monitoreo TI to Victor Torterola
  • % Done changed from 0 to 100

We ran agent.ping manually from the console on achiras, the CURE Zabbix server:

root@achiras:~# zabbix_get -k agent.ping -s 164.73.98.18
zabbix_get [24431]: Check access restrictions in Zabbix agent configuration
root@achiras:~# zabbix_get -k agent.ping -s 164.73.98.18
zabbix_get [24516]: Check access restrictions in Zabbix agent configuration
root@achiras:~# zabbix_get -k agent.ping -s tacuara.interior.edu.uy
zabbix_get [24546]: Check access restrictions in Zabbix agent configuration
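
All three attempts above fail with the same message. As a minimal sketch, assuming such checks were scripted in Python, the output of `zabbix_get` could be sorted into coarse categories. The function name `classify_zabbix_get` and the category labels are hypothetical; only the two error messages quoted in this ticket and the successful reply `1` are recognized.

```python
def classify_zabbix_get(output: str) -> str:
    """Map raw zabbix_get output to a coarse category (labels are hypothetical)."""
    text = output.strip()
    if text == "1":
        # agent.ping answers 1 when the agent is reachable
        return "ok"
    if "Check access restrictions" in text:
        # the agent refused the request from this source address
        return "access_restricted"
    if "timed out" in text:
        # for example "ZBX_TCP_READ() timed out"
        return "timeout"
    return "unknown"


print(classify_zabbix_get(
    "zabbix_get [24431]: Check access restrictions in Zabbix agent configuration"
))
```

Separating "access restricted" from "timeout" matters here because the first points at the agent's allow list and the second at the network path.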

We checked the Zabbix versions on both machines:

root@achiras:~# dpkg -l | grep zabbix
ii  zabbix-agent                   1:4.0.10-1+stretch             amd64        Zabbix network monitoring solution - agent
ii  zabbix-frontend-php            1:4.0.10-1+stretch             all          Zabbix network monitoring solution - PHP front-end
ii  zabbix-get                     1:3.4.15-1+stretch             amd64        Zabbix network monitoring solution - get
ii  zabbix-release                 1:4.0-2+stretch                all          Zabbix official repository configuration
ii  zabbix-server-pgsql            1:4.0.10-1+stretch             amd64        Zabbix network monitoring solution - server (PostgreSQL)

root@tacuara:~# dpkg -l | grep zabbix
ii  zabbix-agent                    1:4.2.3-2+stretch              amd64        Zabbix network monitoring solution - agent
ii  zabbix-frontend-php             1:4.2.1-1+stretch              all          Zabbix network monitoring solution - PHP front-end
ii  zabbix-get                      1:4.2.4-1+stretch              amd64        Zabbix network monitoring solution - get
ii  zabbix-server-mysql             1:4.2.1-1+stretch              amd64        Zabbix network monitoring solution - server (MySQL)
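
The listings above show that achiras mixes release series (zabbix-get 3.4.x next to 4.0.x packages), while tacuara is on 4.2.x throughout. The following sketch, assuming the `dpkg -l` output is available as text, extracts the upstream versions and reports which series are present. The helper names are hypothetical, and the ticket does not establish that the mismatch caused the alert.

```python
import re

# Lines copied from the achiras listing in this ticket.
ACHIRAS_DPKG = """\
ii  zabbix-agent                   1:4.0.10-1+stretch             amd64        Zabbix network monitoring solution - agent
ii  zabbix-frontend-php            1:4.0.10-1+stretch             all          Zabbix network monitoring solution - PHP front-end
ii  zabbix-get                     1:3.4.15-1+stretch             amd64        Zabbix network monitoring solution - get
ii  zabbix-release                 1:4.0-2+stretch                all          Zabbix official repository configuration
ii  zabbix-server-pgsql            1:4.0.10-1+stretch             amd64        Zabbix network monitoring solution - server (PostgreSQL)
"""


def zabbix_versions(dpkg_output: str) -> dict:
    """Return {package: upstream_version} for installed zabbix-* packages."""
    versions = {}
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Installed packages carry the "ii" status flag in the first column.
        if len(fields) >= 3 and fields[0] == "ii" and fields[1].startswith("zabbix-"):
            # Drop the epoch ("1:") and the Debian revision ("-1+stretch").
            match = re.match(r"(?:\d+:)?([^-]+)", fields[2])
            if match:
                versions[fields[1]] = match.group(1)
    return versions


def release_series(version: str) -> str:
    """Reduce an upstream version such as '4.0.10' to its series, '4.0'."""
    return ".".join(version.split(".")[:2])


def series_present(versions: dict) -> set:
    """Set of release series found among the given package versions."""
    return {release_series(v) for v in versions.values()}


print(sorted(series_present(zabbix_versions(ACHIRAS_DPKG))))  # ['3.4', '4.0']
```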

We tried upgrading achiras; there is an update available in the repos:

root@achiras:~# apt list --upgradable

[...]

zabbix-agent/desconocido 1:4.0.14-1+stretch amd64 [actualizable desde: 1:4.0.10-1+stretch]
zabbix-frontend-php/desconocido 1:4.0.14-1+stretch all [actualizable desde: 1:4.0.10-1+stretch]
zabbix-get/desconocido 1:4.0.14-1+stretch amd64 [actualizable desde: 1:3.4.15-1+stretch]
zabbix-release/desconocido 1:4.0-3+stretch all [actualizable desde: 1:4.0-2+stretch]
zabbix-server-pgsql/desconocido 1:4.0.14-1+stretch amd64 [actualizable desde: 1:4.0.10-1+stretch]

We upgraded...

It seems to be fixed:

root@achiras:~# zabbix_get -k agent.ping -s tacuara.interior.edu.uy
1
root@achiras:~# zabbix_get -k agent.ping -s 164.73.98.18
1

Passing this on for verification.

#2 Updated by Victor Alem 11 months ago

  • Status changed from New to In progress
  • Assignee changed from Victor Torterola to Victor Alem

Get value from agent failed: ZBX_TCP_READ() timed out

#3 Updated by Victor Alem 11 months ago

Victor Alem wrote:

[...]

It was not fixed, so we moved on to analyzing the logs:

   704:20191028:095239.809 resuming Zabbix agent checks on host "tacuara.interior.edu.uy": connection restored
   704:20191028:095243.811 Zabbix agent item "system.cpu.util[,steal]" on host "tacuara.interior.edu.uy" failed: first network error, wait for 15 seconds
   704:20191028:095302.921 Zabbix agent item "proc.num[,,run]" on host "tacuara.interior.edu.uy" failed: another network error, wait for 15 seconds
   704:20191028:095321.925 Zabbix agent item "system.cpu.switches" on host "tacuara.interior.edu.uy" failed: another network error, wait for 15 seconds
   704:20191028:095340.130 temporarily disabling Zabbix agent checks on host "tacuara.interior.edu.uy": host unavailable
   704:20191028:095657.566 enabling Zabbix agent checks on host "tacuara.interior.edu.uy": host became available
   704:20191028:095701.598 Zabbix agent item "system.cpu.switches" on host "tacuara.interior.edu.uy" failed: first network error, wait for 15 seconds
   704:20191028:095716.629 resuming Zabbix agent checks on host "tacuara.interior.edu.uy": connection restored
   704:20191028:095720.631 Zabbix agent item "system.cpu.switches" on host "tacuara.interior.edu.uy" failed: first network error, wait for 15 seconds
   704:20191028:095739.640 Zabbix agent item "proc.num[]" on host "tacuara.interior.edu.uy" failed: another network error, wait for 15 seconds
   704:20191028:095758.668 Zabbix agent item "system.cpu.intr" on host "tacuara.interior.edu.uy" failed: another network error, wait for 15 seconds
   704:20191028:095813.709 resuming Zabbix agent checks on host "tacuara.interior.edu.uy": connection restored
   704:20191028:095817.711 Zabbix agent item "proc.num[,,run]" on host "tacuara.interior.edu.uy" failed: first network error, wait for 15 seconds
   704:20191028:095832.932 resuming Zabbix agent checks on host "tacuara.interior.edu.uy": connection restored
   704:20191028:095836.962 Zabbix agent item "system.cpu.util[,iowait]" on host "tacuara.interior.edu.uy" failed: first network error, wait for 15 seconds
   704:20191028:095852.020 resuming Zabbix agent checks on host "tacuara.interior.edu.uy": connection restored
   704:20191028:095856.022 Zabbix agent item "vm.memory.size[available]" on host "tacuara.interior.edu.uy" failed: first network error, wait for 15 seconds
   704:20191028:095917.061 resuming Zabbix agent checks on host "tacuara.interior.edu.uy": connection restored
   704:20191028:095921.063 Zabbix agent item "system.cpu.util[,iowait]" on host "tacuara.interior.edu.uy" failed: first network error, wait for 15 seconds
   704:20191028:095936.566 resuming Zabbix agent checks on host "tacuara.interior.edu.uy": connection restored
   704:20191028:095940.568 Zabbix agent item "proc.num[]" on host "tacuara.interior.edu.uy" failed: first network error, wait for 15 seconds
   704:20191028:095955.622 resuming Zabbix agent checks on host "tacuara.interior.edu.uy": connection restored
   704:20191028:095959.656 Zabbix agent item "system.cpu.load[percpu,avg1]" on host "tacuara.interior.edu.uy" failed: first network error, wait for 15 seconds
   704:20191028:100018.659 Zabbix agent item "system.cpu.util[,softirq]" on host "tacuara.interior.edu.uy" failed: another network error, wait for 15 seconds
   704:20191028:100036.708 resuming Zabbix agent checks on host "tacuara.interior.edu.uy": connection restored
   704:20191028:100040.710 Zabbix agent item "system.cpu.util[,interrupt]" on host "tacuara.interior.edu.uy" failed: first network error, wait for 15 seconds
   704:20191028:100055.765 resuming Zabbix agent checks on host "tacuara.interior.edu.uy": connection restored
   704:20191028:100059.884 Zabbix agent item "proc.num[]" on host "tacuara.interior.edu.uy" failed: first network error, wait for 15 seconds
   704:20191028:100114.915 resuming Zabbix agent checks on host "tacuara.interior.edu.uy": connection restored
   704:20191028:100118.916 Zabbix agent item "system.cpu.util[,idle]" on host "tacuara.interior.edu.uy" failed: first network error, wait for 15 seconds
   704:20191028:100205.061 resuming Zabbix agent checks on host "tacuara.interior.edu.uy": connection restored
   704:20191028:100209.062 Zabbix agent item "system.cpu.util[,interrupt]" on host "tacuara.interior.edu.uy" failed: first network error, wait for 15 seconds
   704:20191028:100228.097 Zabbix agent item "system.swap.size[,pfree]" on host "tacuara.interior.edu.uy" failed: another network error, wait for 15 seconds

These look like network problems...
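
To put numbers on the flapping seen in the log excerpt, the entries can be counted by type. This is a minimal sketch, assuming the server log uses the `pid:YYYYMMDD:HHMMSS.mmm message` layout shown in this ticket. The category labels and function names are hypothetical, and the sample is a handful of lines copied from the excerpt.

```python
import re
from collections import Counter

# A few lines copied from the log excerpt in this ticket.
SAMPLE = """\
   704:20191028:095239.809 resuming Zabbix agent checks on host "tacuara.interior.edu.uy": connection restored
   704:20191028:095243.811 Zabbix agent item "system.cpu.util[,steal]" on host "tacuara.interior.edu.uy" failed: first network error, wait for 15 seconds
   704:20191028:095302.921 Zabbix agent item "proc.num[,,run]" on host "tacuara.interior.edu.uy" failed: another network error, wait for 15 seconds
   704:20191028:095340.130 temporarily disabling Zabbix agent checks on host "tacuara.interior.edu.uy": host unavailable
   704:20191028:095657.566 enabling Zabbix agent checks on host "tacuara.interior.edu.uy": host became available
"""

# pid : date : time.milliseconds, then the free-text message
LINE_RE = re.compile(r"^\s*\d+:\d{8}:\d{6}\.\d{3} (.*)$")


def classify(message: str) -> str:
    """Assign a log message to one of a few hypothetical categories."""
    if "first network error" in message:
        return "first_error"
    if "another network error" in message:
        return "repeat_error"
    if "connection restored" in message:
        return "restored"
    if "temporarily disabling" in message:
        return "disabled"
    if "host became available" in message:
        return "enabled"
    return "other"


def summarize(log_text: str) -> Counter:
    """Count log entries per category, skipping lines that do not parse."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[classify(match.group(1))] += 1
    return counts


for kind, count in sorted(summarize(SAMPLE).items()):
    print(kind, count)
```

Many `first_error` entries each followed quickly by `restored` would suggest intermittent failures rather than a hard outage, which matches the pattern in the excerpt.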

#4 Updated by Victor Alem 11 months ago

  • Status changed from In progress to Resolved
  • Assignee changed from Victor Alem to Andrés Pías

I left it configured with IPv4 only, using the addresses of the CURE server and localhost. It no longer seems to be having network problems.
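
The ticket does not say which setting was edited. Assuming it is a comma-separated address list such as the agent's `Server` parameter, a quick check that every entry is an IPv4 address could look like the sketch below. The function names are hypothetical, and `192.0.2.10` is a documentation placeholder, not the real address of the CURE server.

```python
import ipaddress


def entry_family(entry: str) -> str:
    """Classify one list entry as 'ipv4', 'ipv6' or 'hostname'."""
    try:
        # strict=False also accepts CIDR entries whose host bits are set
        network = ipaddress.ip_network(entry.strip(), strict=False)
    except ValueError:
        return "hostname"
    return "ipv4" if network.version == 4 else "ipv6"


def is_ipv4_only(value: str) -> bool:
    """True when every non-empty entry in the list is an IPv4 address or network."""
    entries = [e for e in value.split(",") if e.strip()]
    return all(entry_family(e) == "ipv4" for e in entries)


# Placeholder addresses, not taken from the ticket.
print(is_ipv4_only("192.0.2.10,127.0.0.1"))
print(is_ipv4_only("192.0.2.10,::1"))
```

A hostname such as `localhost` is reported as `hostname` and so fails the check, because a name may resolve to either address family.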
