Tasks #3731

Missing backups for godel

Added by Victor Alem over 4 years ago. Updated over 3 years ago.

Status: Closed
Start date: 12/01/2014
Priority: Immediate
Due date:
Assignee: Victor Alem
% Done: 70%
Category: -
Spent time: 4.00 hours
Target version: -

Description

I received this email:


-------- Forwarded message --------
Subject:     BackupPC: no recent backups of godel.csic.edu.uy
Date:     Mon, 01 Dec 2014 01:08:50 -0200
From:     BackupPC
To:     victor

Dear victor,

No backup of your PC (godel.csic.edu.uy) has completed successfully for
13.1 days.
Your PC has been backed up successfully 16 times, from
193.3 days ago until 13.1 days ago.
Backups should run automatically when your PC is
connected to the network.

If your PC has been connected to the network for a few hours during the last
13.1 days, you should contact your technical support to find out why backups
are not working properly.

Otherwise, if you are out of the office, there is not much that can be done
apart from manually copying especially critical files to other
media. You should be aware that any files you have
created or changed in the last 13.1 days (including all new email
and attachments) cannot be restored if your disk fails.

Regards,
BackupPC Agent
http://backuppc.sourceforge.net

Since I don't have access to dalembert, I couldn't see what the problem is. I'm marking this as immediate.


Related issues

Related to Backups - Errors #3656: No backups for Godel (Closed, 11/10/2014)
Related to Server platform - Errors #3981: Problems in Zimbra Dirac (Resolved, 01/28/2015)
Related to Server platform - Tasks #5665: New CPU problems on Barrán (In progress, 05/03/2017)
Related to Cloud computing for the Interior - Tasks #6051: Migration of godel (In progress, 10/24/2018)

History

#1 Updated by Andrés Pías over 4 years ago

This is part of the error log shown in BackupPC:

2014-12-01 01:00:47 Output from DumpPreUserCmd:   /dev/dm-3: read failed after 0 of 4096 at 0: Error de entrada/salida
2014-12-01 01:00:47 Output from DumpPreUserCmd:   Logical volume "opt_snap" already exists in volume group "godel-r5" 
2014-12-01 01:00:47 Output from DumpPreUserCmd: Creating a mountpoint for the LV...
2014-12-01 01:00:47 Output from DumpPreUserCmd: Mounting the LV...
2014-12-01 01:00:47 Output from DumpPreUserCmd: mount: tipo fs incorrecto, opción incorrecta, superbloque incorrecto en /dev/mapper/godel--r5-opt_snap,
2014-12-01 01:00:47 Output from DumpPreUserCmd:        falta página de código o programa ayudante, u otro error
2014-12-01 01:00:47 Output from DumpPreUserCmd:        En algunos casos se encuentra información en syslog, pruebe
2014-12-01 01:00:47 Output from DumpPreUserCmd:    dmesg | tail   o algo parecido
2014-12-01 06:02:48 Output from DumpPostUserCmd: umount: /opt.snap/: no montado
2014-12-01 06:02:48 Output from DumpPostUserCmd:   /dev/dm-3: read failed after 0 of 4096 at 0: Error de entrada/salida
2014-12-01 06:02:49 Output from DumpPostUserCmd:   /dev/dm-3: read failed after 0 of 4096 at 214748299264: Error de entrada/salida
2014-12-01 06:02:49 Output from DumpPostUserCmd:   /dev/dm-3: read failed after 0 of 4096 at 214748356608: Error de entrada/salida
2014-12-01 06:02:49 Output from DumpPostUserCmd:   /dev/dm-3: read failed after 0 of 4096 at 4096: Error de entrada/salida
2014-12-01 06:02:49 Output from DumpPostUserCmd:   Volume group "godel" not found
2014-12-01 06:02:49 Output from DumpPostUserCmd:   Skipping volume group godel
2014-12-01 06:02:49 Output from DumpPostUserCmd: Backup ended at lun dic 1 06:02:48 UYST 2014.
2014-12-01 06:02:49 Output from DumpPostUserCmd: Fin del script post-respaldo por backuppc..
2014-12-01 06:02:49 Got fatal error during xfer (No files dumped for share /opt.snap)
2014-12-01 06:02:54 Backup aborted (No files dumped for share /opt.snap)

#2 Updated by Andrés Pías over 4 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 80

To see what was happening, I started by creating a new volume, opt_snap2, the same way the script does, because opt_snap already existed:

root@godel:/dev/godel-r5# lvcreate -L6G -s -n opt_snap2 /dev/godel-r5/opt
  /dev/dm-3: read failed after 0 of 4096 at 0: Error de entrada/salida
  Logical volume "opt_snap2" created

I tried to mount it, but it would not let me because the mount point was already in use (I later unmounted the previous one in order to mount this one):

mount -t ext4 -o ro /dev/godel-r5/opt_snap2 /opt.snap/
Logical volume "opt_snap" already exists in volume group "godel-r5" 

I tried creating the volume again and got the same error that BackupPC was getting:

root@godel:/dev/godel-r5# lvcreate -L6G -s -n opt_snap2 /dev/godel-r5/opt
  /dev/dm-3: read failed after 0 of 4096 at 0: Error de entrada/salida
  Logical volume "opt_snap2" already exists in volume group "godel-r5" 

At that point these were the existing volumes:

root@godel:/dev/godel-r5# ls -altr
total 0
drwxr-xr-x  2 root root  100 dic  1 11:34 .
drwxr-xr-x 16 root root 4280 dic  1 11:34 ..
lrwxrwxrwx  1 root root    7 dic  1 11:34 opt_snap -> ../dm-3
lrwxrwxrwx  1 root root    7 dic  1 11:34 opt_snap2 -> ../dm-6
lrwxrwxrwx  1 root root    7 dic  1 11:34 opt -> ../dm-1

I saw that the previous snapshots had not been removed because the finalization script had not been updated. I made the changes in fin_respaldo.sh as well and removed the volumes manually:

lvremove --force /dev/godel-r5/opt_snap
lvremove --force /dev/godel-r5/opt_snap2

Then I created the volume manually and mounted it, and no read errors appeared:

root@godel:/dev/godel-r5# lvcreate -L6G -s -n opt_snap2 /dev/godel-r5/opt
  Logical volume "opt_snap2" created
root@godel:/dev/godel-r5# mount -t ext4 -o ro /dev/godel-r5/opt_snap2 /opt.snap/
root@godel:/dev/godel-r5# cd /opt.snap/
root@godel:/opt.snap# ls
lost+found  openldap  problemas  scripts  zimbra  zimbra_uptade
root@godel:/# umount /opt.snap/
root@godel:/# lvremove --force /dev/godel-r5/opt_snap2
  Logical volume "opt_snap2" successfully removed

I'm leaving this open for monitoring, but the backups should now be generated correctly.
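
The manual steps above (unmount, then remove every leftover snapshot) could be folded into a post-backup cleanup along these lines. This is only a sketch, not the actual fin_respaldo.sh: the function names, the DRY_RUN switch and the opt_snap* wildcard are assumptions added for illustration, while the volume group path and the mount point are the ones shown in this ticket.

```sh
#!/bin/sh
# Hypothetical cleanup sketch; DRY_RUN=1 (the default) only prints the commands.
DRY_RUN=${DRY_RUN:-1}
DEV_ROOT=${DEV_ROOT:-/dev/godel-r5}     # volume group directory, as in the listing above
MOUNTPOINT=${MOUNTPOINT:-/opt.snap}     # where the snapshot gets mounted

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi
}

cleanup_snapshots() {
    # Unmount first; a failure because nothing is mounted is not fatal here.
    run umount "$MOUNTPOINT" || true
    # Remove every leftover opt_snap* volume, not only the expected one
    # (assumption: no unrelated volume uses that name prefix).
    for lv in "$DEV_ROOT"/opt_snap*; do
        [ -e "$lv" ] || continue    # the glob matched nothing
        run lvremove --force "$lv"
    done
}

cleanup_snapshots
```

Running it with the default DRY_RUN=1 shows what would be removed before anything is touched; DRY_RUN=0 executes the same commands.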

#3 Updated by Andrés Pías over 4 years ago

  • Status changed from Resolved to In progress
  • Assignee changed from Cielito - adminsys to Cielito - Coord. regional
  • % Done changed from 80 to 70

We are still having problems with the backups. This is the reason the mail service goes down. Godel is back up now; I had to hard-reset it because it had lost its network connection.

There were a couple of hung BackupPC processes running for Godel on D'alembert, which I had to kill.

I saw in the BackupPC interface that the last successful backup was the one from December 1st.

Anyway... these are the logs of what happened with the backups today:

2014-12-06 02:00:19 Output from DumpPreUserCmd: Empieza el script pre-respaldo por backuppc..
2014-12-06 02:00:19 Output from DumpPreUserCmd: Backup started at sáb dic 6 02:00:20 UYST 2014.
2014-12-06 02:00:19 Output from DumpPreUserCmd: Stopping the Zimbra services...
2014-12-06 02:00:19 Output from DumpPreUserCmd:  This may take several minutes.
2014-12-06 02:00:20 Output from DumpPreUserCmd: Host godel.csic.edu.uy
2014-12-06 02:00:20 Output from DumpPreUserCmd:     Stopping vmware-ha...skipped.
2014-12-06 02:00:20 Output from DumpPreUserCmd:         /opt/zimbra/bin/zmhactl missing or not executable.
2014-12-06 02:00:22 Output from DumpPreUserCmd:     Stopping zmconfigd...Done.
2014-12-06 02:00:25 Output from DumpPreUserCmd:     Stopping stats...Done.
2014-12-06 02:00:30 Output from DumpPreUserCmd:     Stopping mta...Done.
2014-12-06 02:00:31 Output from DumpPreUserCmd:     Stopping spell...Done.
2014-12-06 02:00:34 Output from DumpPreUserCmd:     Stopping snmp...Done.
2014-12-06 02:00:34 Output from DumpPreUserCmd:     Stopping cbpolicyd...Done.
2014-12-06 02:00:40 Output from DumpPreUserCmd:     Stopping archiving...Done.
2014-12-06 02:00:46 Output from DumpPreUserCmd:     Stopping opendkim...Done.
2014-12-06 02:00:49 Output from DumpPreUserCmd:     Stopping amavis...Done.
2014-12-06 02:01:38 Output from DumpPreUserCmd:     Stopping antivirus...Done.
2014-12-06 02:01:40 Output from DumpPreUserCmd:     Stopping antispam...Done.
2014-12-06 02:01:40 Output from DumpPreUserCmd:     Stopping proxy...Done.
2014-12-06 02:01:40 Output from DumpPreUserCmd:     Stopping memcached...Done.
2014-12-06 02:05:22 Output from DumpPreUserCmd:     Stopping mailbox...Failed.
2014-12-06 02:05:31 Output from DumpPreUserCmd: Stopping mailboxd...done.
2014-12-06 02:05:31 Output from DumpPreUserCmd: Stopping mysqld...failed.
2014-12-06 02:05:31 Output from DumpPreUserCmd: 
2014-12-06 02:05:31 Output from DumpPreUserCmd: 
2014-12-06 02:05:32 Output from DumpPreUserCmd:     Stopping logger...Done.
2014-12-06 02:05:38 Output from DumpPreUserCmd:     Stopping ldap...Done.
2014-12-06 02:05:38 Output from DumpPreUserCmd: Creating a LV called ZimbraBackup:
2014-12-06 02:05:58 Output from DumpPreUserCmd:   Logical volume "opt_snap" created
2014-12-06 02:05:58 Output from DumpPreUserCmd: Creating a mountpoint for the LV...
2014-12-06 02:05:58 Output from DumpPreUserCmd: Mounting the LV...
2014-12-06 02:05:59 Output from DumpPreUserCmd: Starting the Zimbra services...
2014-12-06 02:05:59 Output from DumpPreUserCmd: Fin del script pre-respaldo, acá sigue backuppc..
2014-12-06 02:06:01 Output from DumpPreUserCmd: Host godel.csic.edu.uy
2014-12-06 02:06:07 Output from DumpPreUserCmd:     Starting ldap...Done.
2014-12-06 02:08:49 Output from DumpPreUserCmd:     Starting zmconfigd...Done.
2014-12-06 02:09:14 Output from DumpPreUserCmd:     Starting logger...Done.
2014-12-06 02:11:34 Output from DumpPreUserCmd:     Starting mailbox...Done.
2014-12-06 02:11:51 Output from DumpPreUserCmd:     Starting amavis...Done.
2014-12-06 02:11:52 Output from DumpPreUserCmd:     Starting antispam...Done.
2014-12-06 02:18:57 Output from DumpPreUserCmd:     Starting antivirus...Done.
2014-12-06 02:22:15 Output from DumpPreUserCmd:     Starting opendkim...Done.
2014-12-06 02:22:15 Output from DumpPreUserCmd:     Starting snmp...Done.
2014-12-06 02:22:15 Output from DumpPreUserCmd:     Starting spell...Done.
2014-12-06 02:22:15 Output from DumpPreUserCmd:     Starting mta...Done.
2014-12-06 02:22:16 Output from DumpPreUserCmd:     Starting stats...Done.
2014-12-06 02:22:22 incr backup started back to 2014-12-01 23:00:13 (backup #168) for directory /etc
2014-12-06 02:25:59 incr backup started back to 2014-12-01 23:00:13 (backup #168) for directory /home
2014-12-06 02:26:42 incr backup started back to 2014-12-01 23:00:13 (backup #168) for directory /var
2014-12-06 02:36:48 incr backup started back to 2014-12-01 23:00:13 (backup #168) for directory /usr/local
2014-12-06 02:36:52 incr backup started back to 2014-12-01 23:00:13 (backup #168) for directory /opt.snap
2014-12-06 09:27:43 Aborting backup up after signal PIPE
2014-12-06 09:27:46 Got fatal error during xfer (aborted by signal=PIPE)
2014-12-06 09:27:53 no ping response
2014-12-06 10:00:14 no ping response
2014-12-06 11:00:11 no ping response
2014-12-06 12:00:08 no ping response
2014-12-06 13:00:07 no ping response
2014-12-06 14:00:07 no ping response

Focusing on the final part of the logs:

2014-12-06 02:36:52 incr backup started back to 2014-12-01 23:00:13 (backup #168) for directory /opt.snap
2014-12-06 09:27:43 Aborting backup up after signal PIPE
2014-12-06 09:27:46 Got fatal error during xfer (aborted by signal=PIPE)
2014-12-06 09:27:53 no ping response
2014-12-06 10:00:14 no ping response

The logs clearly show that about 7 hours were spent backing up the 61 GB LVM volume, and that even this was not enough.
Then the signal PIPE error appears. From what I found searching, it means the ssh client died for some reason, such as lack of disk space or memory. D'alembert looks fine. I don't know whether this is a BackupPC configuration issue, but before the migration this worked.

I removed the opt.snap volume (because the script failed before it got to do so) and started another incremental backup to see how long it takes and where it gets stuck.

#4 Updated by Andrés Pías over 4 years ago

As has happened on other occasions, the fix for these errors is to increase ClientTimeout, as explained here:

Short version: it's probably a timeout issue. Try increasing
$Conf{ClientTimeout}. Mine is set to 604800, which is one week.

It was set to 72000; I set it to 604800 for Godel only (in the client's config).
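
To double-check which value a given config file actually sets, something like the following could be used. It is a rough sketch: the helper name client_timeout is made up, and because the location of the per-host config file depends on the installation, the demo reads a temporary file that stands in for it.

```sh
#!/bin/sh
# Print the last ClientTimeout value assigned in a BackupPC config file, if any.
client_timeout() {
    sed -n 's/^[[:space:]]*\$Conf{ClientTimeout}[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Demo on a throwaway file standing in for Godel's per-host config (assumption).
demo=$(mktemp)
printf '%s\n' '$Conf{ClientTimeout} = 604800;' > "$demo"
secs=$(client_timeout "$demo")
echo "ClientTimeout = ${secs}s ($(( secs / 86400 )) days)"
rm -f "$demo"
```

For comparison, the previous value of 72000 seconds corresponds to 20 hours.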

#5 Updated by Andrés Pías over 4 years ago

The backup finally finished... and it finished successfully! ;)

2014-12-06 15:41:34 incr backup started back to 2014-12-01 23:00:13 (backup #168) for directory /opt.snap
2014-12-06 16:37:49 Output from DumpPostUserCmd: Empieza el script post-respaldo por backuppc..
2014-12-06 16:37:49 Output from DumpPostUserCmd: Unmounting and removing the LV.
2014-12-06 16:37:50 Output from DumpPostUserCmd:   Logical volume "opt_snap" successfully removed
2014-12-06 16:37:50 Output from DumpPostUserCmd: Backup ended at sáb dic 6 16:37:52 UYST 2014.
2014-12-06 16:37:50 Output from DumpPostUserCmd: Fin del script post-respaldo por backuppc..
2014-12-06 16:37:50 incr backup 172 complete, 19002 files, 92120949337 bytes, 0 xferErrs (0 bad files, 0 bad shares, 0 other)

It took only about an hour. I don't rule out that the hung backups were related to a CSIC network problem at that time of day.
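
The duration of the two /opt.snap runs can be worked out directly from the log timestamps. The small helper below is just an illustration (the function names are made up, and it assumes both timestamps fall on the same day and use two-digit fields):

```sh
#!/bin/sh
# Convert an HH:MM:SS timestamp to seconds since midnight.
to_seconds() (
    t=$1
    h=${t%%:*}; rest=${t#*:}
    m=${rest%%:*}; s=${rest#*:}
    # Drop one leading zero so that "09" is not parsed as an invalid octal number.
    h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( h * 3600 + m * 60 + s ))
)

# Elapsed time between two same-day timestamps, printed as HhMMmSSs.
elapsed() (
    d=$(( $(to_seconds "$2") - $(to_seconds "$1") ))
    printf '%dh%02dm%02ds\n' $(( d / 3600 )) $(( d % 3600 / 60 )) $(( d % 60 ))
)

elapsed 02:36:52 09:27:43   # aborted run: start of /opt.snap until signal PIPE
elapsed 15:41:34 16:37:49   # successful run: start of /opt.snap until the post-backup script
```

This prints 6h50m51s for the aborted run and 0h56m15s for the successful one.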

#6 Updated by Andrés Pías over 4 years ago

Godel went down again today and I restarted it manually.

The problem is not in the backup configuration but in Godel itself.

virt-manager shows heavy CPU usage, and its kernel is also crashing:

Dec  7 16:03:00 godel kernel: [91209.173199] BUG: soft lockup - CPU#1 stuck for 29s! [rsync:3355]
Dec  7 16:03:00 godel kernel: [91209.173209] BUG: soft lockup - CPU#0 stuck for 29s! [zmstatuslog:23807]
Dec  7 16:03:00 godel kernel: [91209.174680] Modules linked in: xt_tcpudp xt_LOG xt_state iptable_filter ip_tables
Dec  7 16:03:00 godel kernel: [91209.174681] Modules linked in: xt_tcpudp xt_LOG
Dec  7 16:03:00 godel kernel: [91209.174684]  nf_conntrack_netlink nfnetlink nf_conntrack_h323 nf_conntrack_proto_udplite nf_conntrack_tftp xt_state iptable_filter ip_$
Dec  7 16:03:00 godel kernel: [91209.174691]  nf_conntrack_sip nf_conntrack_proto_dccp
Dec  7 16:03:00 godel kernel: [91209.174694]  nf_conntrack_netlink
Dec  7 16:03:00 godel kernel: [91209.174697]  nf_conntrack_sane nf_conntrack_netbios_ns nf_conntrack_pptp
Dec  7 16:03:00 godel kernel: [91209.174698]  nfnetlink
Dec  7 16:03:00 godel kernel: [91209.174701]  nf_conntrack_proto_gre nf_conntrack_ftp ts_kmp
Dec  7 16:03:00 godel kernel: [91209.174702]  nf_conntrack_h323
Dec  7 16:03:00 godel kernel: [91209.174705]  nf_conntrack_amanda nf_conntrack_snmp nf_conntrack_broadcast
Dec  7 16:03:00 godel kernel: [91209.174705]  nf_conntrack_proto_udplite
Dec  7 16:03:00 godel kernel: [91209.174709]  nf_conntrack_irc nf_conntrack_proto_sctp xt_conntrack
Dec  7 16:03:00 godel kernel: [91209.174709]  nf_conntrack_tftp
Dec  7 16:03:00 godel kernel: [91209.174713]  x_tables nf_conntrack_ipv4 nf_conntrack
Dec  7 16:03:00 godel kernel: [91209.174713]  nf_conntrack_sip
Dec  7 16:03:00 godel kernel: [91209.174716]  nf_defrag_ipv4 ext2 cirrus
Dec  7 16:03:00 godel kernel: [91209.174717]  nf_conntrack_proto_dccp
Dec  7 16:03:00 godel kernel: [91209.174720]  ttm drm_kms_helper drm
Dec  7 16:03:00 godel kernel: [91209.174720]  nf_conntrack_sane
Dec  7 16:03:00 godel kernel: [91209.174723]  sysimgblt sysfillrect lp
Dec  7 16:03:00 godel kernel: [91209.174724]  nf_conntrack_netbios_ns
Dec  7 16:03:00 godel kernel: [91209.174727]  syscopyarea microcode i2c_piix4
Dec  7 16:03:00 godel kernel: [91209.174727]  nf_conntrack_pptp
Dec  7 16:03:00 godel kernel: [91209.174731]  joydev psmouse serio_raw
Dec  7 16:03:00 godel kernel: [91209.174731]  nf_conntrack_proto_gre nf_conntrack_ftp ts_kmp nf_conntrack_amanda nf_conntrack_snmp nf_conntrack_broadcast nf_conntrack_$
Dec  7 16:03:00 godel kernel: [91209.179401]  parport mac_hid virtio_balloon
Dec  7 16:03:00 godel kernel: [91209.177160]  mac_hid virtio_balloon
Dec  7 16:03:00 godel kernel: [91209.179401]  hid_generic hid_generic usbhid dm_snapshot hid floppy
Dec  7 16:03:00 godel kernel: [91209.177160]
Dec  7 16:03:00 godel kernel: [91209.179401]  usbhid dm_snapshot hid floppy
Dec  7 16:03:00 godel kernel: [91209.177160] CPU 1
Dec  7 16:03:00 godel kernel: [91209.177160] Pid: 3355, comm: rsync Not tainted 3.8.0-44-generic #66~precise1-Ubuntu Bochs Bochs
Dec  7 16:03:00 godel kernel: [91209.177160] RIP: 0010:[<ffffffff8136279f>]  [<ffffffff8136279f>] memset+0x1f/0xb0
Dec  7 16:03:00 godel kernel: [91209.179401] CPU 0
Dec  7 16:03:00 godel kernel: [91209.179401] Pid: 23807, comm: zmstatuslog Not tainted 3.8.0-44-generic #66~precise1-Ubuntu Bochs Bochs
Dec  7 16:03:01 godel kernel: [91209.179401] RIP: 0010:[<ffffffff81360de7>]  [<ffffffff81360de7>] clear_page_c+0x7/0x10
Dec  7 16:03:01 godel kernel: [91209.179401] RSP: 0000:ffff88006f7ef9e0  EFLAGS: 00010246
Dec  7 16:03:01 godel kernel: [91209.179401] RAX: 0000000000000000 RBX: ffffea00034fe6c0 RCX: 0000000000000200
Dec  7 16:03:01 godel kernel: [91209.179401] RDX: 00000000034ff980 RSI: 0000000000000001 RDI: ffff8800d3fe6000
Dec  7 16:03:01 godel kernel: [91209.179401] RBP: ffff88006f7efa38 R08: 0000000000000000 R09: 000000000001c4b4
Dec  7 16:03:01 godel kernel: [91209.179401] R10: 000000000000396c R11: 0000000000000000 R12: 0000000000000001
Dec  7 16:03:01 godel kernel: [91209.179401] R13: 0000000000000000 R14: 00000000000280da R15: ffff88006f7ee000
Dec  7 16:03:01 godel kernel: [91209.179401] FS:  00007fa068600700(0000) GS:ffff88031fc00000(0000) knlGS:0000000000000000
Dec  7 16:03:01 godel kernel: [91209.179401] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec  7 16:03:01 godel kernel: [91209.179401] CR2: 00007fa0677ab3b0 CR3: 00000000368d7000 CR4: 00000000000006f0
Dec  7 16:03:01 godel kernel: [91209.177160] RSP: 0018:ffff88030d599920  EFLAGS: 00010206
Dec  7 16:03:01 godel kernel: [91209.177160] RAX: 0000000000000000 RBX: ffffffff8124023a RCX: 0000000000000200
Dec  7 16:03:01 godel kernel: [91209.179401] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec  7 16:03:01 godel kernel: [91209.177160] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800d3fd2000
Dec  7 16:03:01 godel kernel: [91209.179401] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec  7 16:03:01 godel kernel: [91209.177160] RBP: ffff88030d5999e8 R08: 00000000001306c0 R09: ffff8800d3fd2000
Dec  7 16:03:01 godel kernel: [91209.177160] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffde
Dec  7 16:03:01 godel kernel: [91209.177160] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8801034939b0
Dec  7 16:03:01 godel kernel: [91209.177160] FS:  00007f2d84db8700(0000) GS:ffff88031fc80000(0000) knlGS:0000000000000000
Dec  7 16:03:01 godel kernel: [91209.177160] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec  7 16:03:01 godel kernel: [91209.177160] CR2: 000000000ce90001 CR3: 0000000111182000 CR4: 00000000000006e0
Dec  7 16:03:01 godel kernel: [91209.179401] Process zmstatuslog (pid: 23807, threadinfo ffff88006f7ee000, task ffff88030c9fc5c0)
Dec  7 16:03:01 godel kernel: [91209.177160] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Dec  7 16:03:01 godel kernel: [91209.179401] Stack:
Dec  7 16:03:01 godel kernel: [91209.177160] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec  7 16:03:01 godel kernel: [91209.177160] Process rsync (pid: 3355, threadinfo ffff88030d598000, task ffff880111285d00)
Dec  7 16:03:01 godel kernel: [91209.177160] Stack:
Dec  7 16:03:01 godel kernel: [91209.177160]  ffffffff8113dab5 0000000000000000 00000000034ff980 ffff88006f7effd8
Dec  7 16:03:01 godel kernel: [91209.179401]
Dec  7 16:03:01 godel kernel: [91209.179401]  ffffffff811db3b3 ffff88030d599988 0000000000000000
Dec  7 16:03:01 godel kernel: [91209.177160]  0000000000000040 ffff88031fffbd80 ffff88031fc17170 ffffea00034ff980
Dec  7 16:03:01 godel kernel: [91209.177160]  0000000000000000
Dec  7 16:03:01 godel kernel: [91209.179401]
Dec  7 16:03:01 godel kernel: [91209.179401]  ffff88030d599a88 0000001b0d599958
Dec  7 16:03:01 godel kernel: [91209.177160]  ffff88031fffb6c0 0000000000000000 00000000000280da ffff88006f7efb08
Dec  7 16:03:01 godel kernel: [91209.177160]  ffff88030d599a90 ffff8801034939b0
Dec  7 16:03:01 godel kernel: [91209.179401]
Dec  7 16:03:01 godel kernel: [91209.179401] Call Trace:
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff8113dab5>] ? prep_new_page+0x145/0x1e0
Dec  7 16:03:01 godel kernel: [91209.177160]  0000000000000000 ffffea00034ff480
Dec  7 16:03:01 godel kernel: [91209.179401]  0000000c00000000 00000000001306a5
Dec  7 16:03:01 godel kernel: [91209.177160] Call Trace:
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff811db3b3>] ? do_mpage_readpage+0x293/0x550
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff8113dd37>] get_page_from_freelist+0x1e7/0x5a0
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff8113e782>] __alloc_pages_nodemask+0x152/0x990
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff81193149>] ? memcg_check_events+0x29/0x50
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff81137d88>] ? filemap_fault+0xd8/0x430
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff8113637d>] ? add_to_page_cache_locked+0x7d/0x90
....
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff81186fb1>] ? kmem_cache_alloc+0x31/0x140
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff811db7c6>] mpage_readpages+0xe6/0x130
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff8117dc33>] alloc_pages_vma+0xa3/0x150
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff81240390>] ? ext4_get_block_write+0x20/0x20
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff8115becb>] do_anonymous_page.isra.37+0x7b/0x2f0
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff81240390>] ? ext4_get_block_write+0x20/0x20
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff810b8298>] ? get_futex_value_locked+0x28/0x40
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff81160189>] handle_pte_fault+0x209/0x230
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff81161390>] handle_mm_fault+0x2a0/0x3e0
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff8123bce5>] ext4_readpages+0x45/0x60
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff81141528>] read_pages+0x48/0x100
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff816fbf0f>] __do_page_fault+0x1af/0x560
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff8114173b>] __do_page_cache_readahead+0x15b/0x170
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff811bc929>] ? mntput_no_expire+0x49/0x160
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff81141ab1>] ra_submit+0x21/0x30
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff811bca64>] ? mntput+0x24/0x40
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff81141bd5>] ondemand_readahead+0x115/0x230
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff8119e179>] ? __fput+0x189/0x240
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff81141d70>] page_cache_async_readahead+0x80/0xa0
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff810bb60c>] ? do_futex+0x7c/0x1b0
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff81136a79>] do_generic_file_read.constprop.34+0x269/0x440
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff8107bf3d>] ? task_work_run+0xcd/0xf0
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff81137941>] generic_file_aio_read+0xe1/0x220
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff816fc2ce>] do_page_fault+0xe/0x10
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff8119c753>] do_sync_read+0xa3/0xe0
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff816fb9a5>] do_async_page_fault+0x35/0x90
Dec  7 16:03:01 godel kernel: [91209.179401]  [<ffffffff816f86c8>] async_page_fault+0x28/0x30
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff8119ce90>] vfs_read+0xb0/0x180
Dec  7 16:03:01 godel kernel: [91209.179401] Code: 0f 1f 40
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff8119cfb2>] sys_read+0x52/0xa0
Dec  7 16:03:01 godel kernel: [91209.177160]  [<ffffffff81700c1d>] system_call_fastpath+0x1a/0x1f
Dec  7 16:03:01 godel kernel: [91209.177160] Code: 1e 44 88 1f c3 90 90 90 90 90 90 90 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01$
Dec  7 16:03:01 godel kernel: [91209.179401] 00 4c 8d 6d c0 48 89 d6 4c 89 ef e8 11 d0 ff ff 44 29 e8 eb 9b e8 17 8c cf ff 90 90 90 90 90 90 90 b9 00 02 00 00 31 c0 <f$

This is a kernel bug that should be fixed by upgrading to a newer version:
http://serverfault.com/questions/336591/how-to-fix-bug-soft-lockup-cpu0-stuck-for-17163091968s
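
To see at a glance which CPUs and processes the soft lockup reports name, the kernel log can be filtered with a short sed expression. This is a sketch only: the function name is made up, and the sample input is the first two lines of the trace above rather than a live log file.

```sh
#!/bin/sh
# Reduce each "soft lockup" line to: CPU, stuck time, process name.
soft_lockups() {
    sed -n 's/.*BUG: soft lockup - \(CPU#[0-9]*\) stuck for \([0-9]*s\)! \[\(.*\):[0-9]*]$/\1 \2 \3/p'
}

# Sample input copied from the trace above; on the server the input
# would be the kernel log itself.
soft_lockups <<'EOF'
Dec  7 16:03:00 godel kernel: [91209.173199] BUG: soft lockup - CPU#1 stuck for 29s! [rsync:3355]
Dec  7 16:03:00 godel kernel: [91209.173209] BUG: soft lockup - CPU#0 stuck for 29s! [zmstatuslog:23807]
EOF
```

For the two sample lines this prints "CPU#1 29s rsync" and "CPU#0 29s zmstatuslog", that is, both virtual CPUs were stuck at the same moment, one of them in the rsync process.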

#7 Updated by Andrés Pías over 4 years ago

  • Assignee changed from Cielito - Coord. regional to Victor Alem

The backup task is resolved; I'm passing it on for verification. I'm opening another task for the general server issues.

#8 Updated by Andrés Pías over 4 years ago

  • Status changed from In progress to Resolved

#9 Updated by Andrés Pías over 3 years ago

  • Status changed from Resolved to Closed

#10 Updated by Andrés Pías almost 2 years ago

  • Related to Tasks #5665: New CPU problems on Barrán added

#11 Updated by Andrés Pías about 2 months ago
