This shows you the differences between two versions of the page.
hwlog [2015/09/29 17:44] volker created |
hwlog [2015/09/29 20:48] (current) volker [29 Sep 2015] |
||
---|---|---|---|
Line 2: | Line 2: | ||
===== 29 Sep 2015 ===== | ===== 29 Sep 2015 ===== | ||
+ | |||
+ | * K80 Failure, Vesta1 | ||
+ | |||
+ | <code> | ||
+ | volker@vesta1:~$ nvidia-smi -L | ||
+ | GPU 0: Tesla K80 (UUID: GPU-a4c1ed90-a3e2-22be-1739-bf836ea157fc) | ||
+ | GPU 1: Tesla K80 (UUID: GPU-f090063b-9435-87a5-0ba5-34975b5fc981) | ||
+ | GPU 2: Tesla K80 (UUID: GPU-f76f6bfa-a99f-5196-7624-24b8a0e84269) | ||
+ | GPU 3: Tesla K80 (UUID: GPU-dc4b8a71-f635-ea72-a054-973021340366) | ||
+ | GPU 4: Tesla K80 (UUID: GPU-eb698ccb-c615-ab6c-dfc6-74561a6098ea) | ||
+ | GPU 5: Tesla K80 (UUID: GPU-c2ecd9ae-8fed-5172-e7e3-6dc6c770e2a4) | ||
+ | Unable to determine the product name for gpu 0000:14:00.0: GPU is lost | ||
+ | Unable to determine the product name for gpu 0000:15:00.0: GPU is lost | ||
+ | GPU 8: Tesla K80 (UUID: GPU-63eb6611-c38f-3146-07e4-17b0dd45beec) | ||
+ | GPU 9: Tesla K80 (UUID: GPU-3acb2e5f-5862-e59e-237a-24affdbb65dd) | ||
+ | GPU 10: Tesla K80 (UUID: GPU-c6ea8d80-24bc-6a47-91ac-616db5b680eb) | ||
+ | GPU 11: Tesla K80 (UUID: GPU-27266a29-25e5-c902-4f6b-e7018ee2d145) | ||
+ | GPU 12: Tesla K80 (UUID: GPU-3c7c64c7-9ea9-809a-92e9-f78b30174375) | ||
+ | GPU 13: Tesla K80 (UUID: GPU-bff19037-7ff9-8e92-41ee-4a1b72196287) | ||
+ | GPU 14: Tesla K80 (UUID: GPU-3bfef1d4-9262-f902-84fa-809fde093c1a) | ||
+ | GPU 15: Tesla K80 (UUID: GPU-d6784c9c-c990-1d47-da07-30a341630bb6) | ||
+ | </code> | ||
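The listing above skips indices 6 and 7 and reports two "GPU is lost" lines in their place. As a minimal sketch, assuming the output of `nvidia-smi -L` has been captured as text, the lost devices could be picked out like this. The sample is abbreviated to four lines copied from the listing; the function name `parse_gpu_list` and both regular expressions are illustrative choices, not part of any NVIDIA tool.

```python
import re

# Four lines copied from the vesta1 listing above (abbreviated sample).
SAMPLE = """\
GPU 5: Tesla K80 (UUID: GPU-c2ecd9ae-8fed-5172-e7e3-6dc6c770e2a4)
Unable to determine the product name for gpu 0000:14:00.0: GPU is lost
Unable to determine the product name for gpu 0000:15:00.0: GPU is lost
GPU 8: Tesla K80 (UUID: GPU-63eb6611-c38f-3146-07e4-17b0dd45beec)
"""

# Hypothetical patterns, written to match the two line shapes seen in this log.
OK_LINE = re.compile(r"^GPU (\d+): (.+) \(UUID: (GPU-[0-9a-f-]+)\)$")
LOST_LINE = re.compile(r"gpu ([0-9a-fA-F:.]+): GPU is lost$")


def parse_gpu_list(text):
    """Split `nvidia-smi -L` text into {index: uuid} and a list of lost PCI bus IDs."""
    healthy = {}
    lost = []
    for line in text.splitlines():
        line = line.strip()
        match = OK_LINE.match(line)
        if match:
            healthy[int(match.group(1))] = match.group(3)
            continue
        match = LOST_LINE.search(line)
        if match:
            lost.append(match.group(1))
    return healthy, lost


healthy, lost = parse_gpu_list(SAMPLE)
print("healthy indices:", sorted(healthy))
print("lost bus IDs:", lost)
```

Run against the full listing, the same function would report 14 healthy indices and the two bus IDs `0000:14:00.0` and `0000:15:00.0`.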
* K80 Failure, Vesta2 | * K80 Failure, Vesta2 | ||
Line 25: | Line 47: | ||
</code> | </code> | ||
+ | * Newly Retired Pages (WTF?) | ||
+ | |||
+ | <code> | ||
+ | volker@vesta2:~$ nvidia-smi --query-retired-pages=gpu_uuid,retired_pages.address,retired_pages.cause --format=csv | ||
+ | gpu_uuid, retired_pages.address, retired_pages.cause | ||
+ | GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000004fc2, Double Bit ECC | ||
+ | GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000003d62, Double Bit ECC | ||
+ | GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000006a56, Double Bit ECC | ||
+ | </code> | ||
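To see which page is actually new, the vesta2 listing above can be compared with the vesta2 listing recorded under 24 Sep 2015. The following is a rough sketch, assuming both CSV outputs were saved as text; the helper name `parse_retired_pages` is hypothetical, and the two samples are copied from the log entries.

```python
import csv
import io

# vesta2 output from the 24 Sep 2015 entry.
OLD = """\
gpu_uuid, retired_pages.address, retired_pages.cause
GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000004fc2, Double Bit ECC
GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000003d62, Double Bit ECC
"""

# vesta2 output from the 29 Sep 2015 entry.
NEW = """\
gpu_uuid, retired_pages.address, retired_pages.cause
GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000004fc2, Double Bit ECC
GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000003d62, Double Bit ECC
GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000006a56, Double Bit ECC
"""


def parse_retired_pages(text):
    """Return a set of (gpu_uuid, address, cause) tuples from the CSV text."""
    # skipinitialspace drops the blank that follows each comma in this output.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return {
        (row["gpu_uuid"], row["retired_pages.address"], row["retired_pages.cause"])
        for row in reader
    }


# Set difference: entries present on 29 Sep that were not there on 24 Sep.
newly_retired = parse_retired_pages(NEW) - parse_retired_pages(OLD)
for uuid, address, cause in sorted(newly_retired):
    print(uuid, address, cause)
```

For these two samples the difference is a single entry, address `0x0000000000006a56` on the same GPU that already had two retired pages.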
+ | ===== 24 Sep 2015 ===== | ||
+ | |||
+ | * Retired Pages | ||
+ | |||
+ | <code> | ||
+ | volker@vesta1:~$ nvidia-smi --query-retired-pages=gpu_uuid,retired_pages.address,retired_pages.cause --format=csv | ||
+ | gpu_uuid, retired_pages.address, retired_pages.cause | ||
+ | GPU-3acb2e5f-5862-e59e-237a-24affdbb65dd, 0x00000000000065a1, Double Bit ECC | ||
+ | GPU-3acb2e5f-5862-e59e-237a-24affdbb65dd, 0x00000000000065e2, Double Bit ECC | ||
+ | </code> | ||
+ | |||
+ | <code> | ||
+ | volker@vesta2:~$ nvidia-smi --query-retired-pages=gpu_uuid,retired_pages.address,retired_pages.cause --format=csv | ||
+ | gpu_uuid, retired_pages.address, retired_pages.cause | ||
+ | GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000004fc2, Double Bit ECC | ||
+ | GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000003d62, Double Bit ECC | ||
+ | </code> |