Hardware Log

29 Sep 2015

  • K80 Failure, Vesta2: GPUs at PCI 0000:86:00.0 and 0000:87:00.0 reported as lost
volker@vesta2:~$ nvidia-smi -L
GPU 0: Tesla K80 (UUID: GPU-0c5ab30c-edd7-5783-863f-1e7456dfe380)
GPU 1: Tesla K80 (UUID: GPU-850dfaee-cd37-0da0-8a0e-89fd2dc4195c)
GPU 2: Tesla K80 (UUID: GPU-53da1a60-cbd9-aaae-7b7a-0486f7a68956)
GPU 3: Tesla K80 (UUID: GPU-1b2c7b2e-b895-a4eb-bac7-ae5887eec894)
GPU 4: Tesla K80 (UUID: GPU-2195f8d4-a646-3ae5-9867-bcbca2cd41b0)
GPU 5: Tesla K80 (UUID: GPU-c0a7616e-6af3-da34-47cb-4ae0a32b3fac)
GPU 6: Tesla K80 (UUID: GPU-aaf87699-406d-a1f0-0d59-d73efb6b906c)
GPU 7: Tesla K80 (UUID: GPU-986eef1c-8c4f-45ac-42af-42bf5792dd92)
Unable to determine the product name for gpu 0000:86:00.0: GPU is lost
Unable to determine the product name for gpu 0000:87:00.0: GPU is lost
GPU 10: Tesla K80 (UUID: GPU-36ef7210-3faa-a757-9e53-3cdc2b973aeb)
GPU 11: Tesla K80 (UUID: GPU-10756d7f-f8d9-3711-a8b9-fd3d30db5e88)
GPU 12: Tesla K80 (UUID: GPU-12ff95e8-3277-e9d1-d87e-02b83bd55431)
GPU 13: Tesla K80 (UUID: GPU-4457aa27-9352-3308-199c-86bc3fbe31de)
GPU 14: Tesla K80 (UUID: GPU-143afd37-0bbd-ba30-8e5f-59eb0a169eda)
GPU 15: Tesla K80 (UUID: GPU-c7fdb877-88ab-3b85-4d4f-ed80cf5cdd7b)
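A minimal Python sketch for spotting this kind of failure in captured `nvidia-smi -L` output. It assumes the line formats shown above stay stable. The sketch does three things:

* collects the healthy GPUs by index and UUID,
* records the PCI bus IDs from any "GPU is lost" lines,
* lists the indices missing from the enumeration (here 8 and 9).

`parse_gpu_list` and `missing_indices` are hypothetical helpers, not part of any NVIDIA tool. The code reports lost bus IDs and missing indices separately rather than guessing which index belongs to which bus ID.

```python
import re

# Captured `nvidia-smi -L` output from the 29 Sep 2015 entry above.
SAMPLE = """\
GPU 0: Tesla K80 (UUID: GPU-0c5ab30c-edd7-5783-863f-1e7456dfe380)
GPU 1: Tesla K80 (UUID: GPU-850dfaee-cd37-0da0-8a0e-89fd2dc4195c)
GPU 2: Tesla K80 (UUID: GPU-53da1a60-cbd9-aaae-7b7a-0486f7a68956)
GPU 3: Tesla K80 (UUID: GPU-1b2c7b2e-b895-a4eb-bac7-ae5887eec894)
GPU 4: Tesla K80 (UUID: GPU-2195f8d4-a646-3ae5-9867-bcbca2cd41b0)
GPU 5: Tesla K80 (UUID: GPU-c0a7616e-6af3-da34-47cb-4ae0a32b3fac)
GPU 6: Tesla K80 (UUID: GPU-aaf87699-406d-a1f0-0d59-d73efb6b906c)
GPU 7: Tesla K80 (UUID: GPU-986eef1c-8c4f-45ac-42af-42bf5792dd92)
Unable to determine the product name for gpu 0000:86:00.0: GPU is lost
Unable to determine the product name for gpu 0000:87:00.0: GPU is lost
GPU 10: Tesla K80 (UUID: GPU-36ef7210-3faa-a757-9e53-3cdc2b973aeb)
GPU 11: Tesla K80 (UUID: GPU-10756d7f-f8d9-3711-a8b9-fd3d30db5e88)
GPU 12: Tesla K80 (UUID: GPU-12ff95e8-3277-e9d1-d87e-02b83bd55431)
GPU 13: Tesla K80 (UUID: GPU-4457aa27-9352-3308-199c-86bc3fbe31de)
GPU 14: Tesla K80 (UUID: GPU-143afd37-0bbd-ba30-8e5f-59eb0a169eda)
GPU 15: Tesla K80 (UUID: GPU-c7fdb877-88ab-3b85-4d4f-ed80cf5cdd7b)
"""

# Healthy line: "GPU <index>: <product> (UUID: GPU-...)"
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-f-]+)\)$")
# Failed line: "... for gpu <pci bus id>: GPU is lost"
LOST_LINE = re.compile(r"for gpu ([0-9A-Fa-f:.]+): GPU is lost$")


def parse_gpu_list(text):
    """Hypothetical helper: return (healthy, lost).

    healthy maps GPU index -> UUID; lost lists PCI bus IDs of lost GPUs.
    """
    healthy, lost = {}, []
    for line in text.splitlines():
        line = line.strip()
        m = GPU_LINE.match(line)
        if m:
            healthy[int(m.group(1))] = m.group(3)
            continue
        m = LOST_LINE.search(line)
        if m:
            lost.append(m.group(1))
    return healthy, lost


def missing_indices(healthy):
    """Hypothetical helper: indices absent between 0 and the highest reported one."""
    if not healthy:
        return []
    return [i for i in range(max(healthy) + 1) if i not in healthy]


if __name__ == "__main__":
    healthy, lost = parse_gpu_list(SAMPLE)
    print(len(healthy), "healthy; lost bus IDs:", lost,
          "; missing indices:", missing_indices(healthy))
```

On a live node the same parser could be fed `subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout` instead of the pasted sample.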

24 Sep 2015

  • Retired Pages: two pages retired for Double Bit ECC errors on one K80 GPU
volker@vesta2:~$ nvidia-smi --query-retired-pages=gpu_uuid,retired_pages.address,retired_pages.cause --format=csv
gpu_uuid, retired_pages.address, retired_pages.cause
GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000004fc2, Double Bit ECC
GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000003d62, Double Bit ECC
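A small Python sketch that tallies retired pages per GPU and cause from the CSV that `nvidia-smi --query-retired-pages=... --format=csv` prints. It assumes the header and ", " separators shown above. Both entries carry the UUID that appears as GPU 13 in the 29 Sep listing. `retired_pages_by_gpu` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Captured output from the 24 Sep 2015 entry above.
SAMPLE_CSV = """\
gpu_uuid, retired_pages.address, retired_pages.cause
GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000004fc2, Double Bit ECC
GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000003d62, Double Bit ECC
"""


def retired_pages_by_gpu(text):
    """Hypothetical helper: count retired pages per (gpu_uuid, cause).

    skipinitialspace=True drops the space nvidia-smi puts after each comma,
    in the header row as well as the data rows.
    """
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    counts = Counter()
    for row in reader:
        counts[(row["gpu_uuid"], row["retired_pages.cause"])] += 1
    return counts


def retired_addresses(text):
    """Hypothetical helper: retired page addresses as integers, in file order."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    # int(..., 16) accepts the "0x" prefix that nvidia-smi prints.
    return [int(row["retired_pages.address"], 16) for row in reader]


if __name__ == "__main__":
    for (uuid, cause), n in retired_pages_by_gpu(SAMPLE_CSV).items():
        print(f"{uuid}: {n} page(s) retired, cause: {cause}")
```

A Counter keyed on (UUID, cause) keeps single-bit and double-bit retirements apart if both show up on the same GPU.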
