hwlog [2015/09/29 17:46]
hwlog [2015/09/29 20:48] (current)
volker [29 Sep 2015]
===== 29 Sep 2015 =====
  * K80 Failure, Vesta1
<code>
volker@vesta1:~$ nvidia-smi -L
GPU 0: Tesla K80 (UUID: GPU-a4c1ed90-a3e2-22be-1739-bf836ea157fc)
GPU 1: Tesla K80 (UUID: GPU-f090063b-9435-87a5-0ba5-34975b5fc981)
GPU 2: Tesla K80 (UUID: GPU-f76f6bfa-a99f-5196-7624-24b8a0e84269)
GPU 3: Tesla K80 (UUID: GPU-dc4b8a71-f635-ea72-a054-973021340366)
GPU 4: Tesla K80 (UUID: GPU-eb698ccb-c615-ab6c-dfc6-74561a6098ea)
GPU 5: Tesla K80 (UUID: GPU-c2ecd9ae-8fed-5172-e7e3-6dc6c770e2a4)
Unable to determine the product name for gpu 0000:14:00.0: GPU is lost
Unable to determine the product name for gpu 0000:15:00.0: GPU is lost
GPU 8: Tesla K80 (UUID: GPU-63eb6611-c38f-3146-07e4-17b0dd45beec)
GPU 9: Tesla K80 (UUID: GPU-3acb2e5f-5862-e59e-237a-24affdbb65dd)
GPU 10: Tesla K80 (UUID: GPU-c6ea8d80-24bc-6a47-91ac-616db5b680eb)
GPU 11: Tesla K80 (UUID: GPU-27266a29-25e5-c902-4f6b-e7018ee2d145)
GPU 12: Tesla K80 (UUID: GPU-3c7c64c7-9ea9-809a-92e9-f78b30174375)
GPU 13: Tesla K80 (UUID: GPU-bff19037-7ff9-8e92-41ee-4a1b72196287)
GPU 14: Tesla K80 (UUID: GPU-3bfef1d4-9262-f902-84fa-809fde093c1a)
GPU 15: Tesla K80 (UUID: GPU-d6784c9c-c990-1d47-da07-30a341630bb6)
</code>
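The following is a minimal Python sketch for picking out this failure mode from `nvidia-smi -L` output. It assumes that the two line formats shown in the vesta1 listing are the only ones that matter and ignores everything else. The helper name `parse_gpu_list` is hypothetical. On a live node the text could be captured with `subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True).stdout`.

```python
import re

# The two line formats seen in the vesta1 `nvidia-smi -L` output (assumed to be
# the only ones of interest; any other line is ignored).
GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: (GPU-[0-9a-f-]+)\)$")
LOST_LINE = re.compile(
    r"^Unable to determine the product name for gpu (\S+): GPU is lost$"
)


def parse_gpu_list(text):
    """Return ({index: (product, uuid)}, [lost PCI bus IDs]) from `nvidia-smi -L`."""
    healthy = {}
    lost = []
    for raw in text.splitlines():
        line = raw.strip()
        match = GPU_LINE.match(line)
        if match:
            healthy[int(match.group(1))] = (match.group(2), match.group(3))
            continue
        match = LOST_LINE.match(line)
        if match:
            lost.append(match.group(1))
    return healthy, lost


if __name__ == "__main__":
    # Excerpt from the vesta1 listing of 29 Sep.
    sample = "\n".join([
        "GPU 5: Tesla K80 (UUID: GPU-c2ecd9ae-8fed-5172-e7e3-6dc6c770e2a4)",
        "Unable to determine the product name for gpu 0000:14:00.0: GPU is lost",
        "Unable to determine the product name for gpu 0000:15:00.0: GPU is lost",
        "GPU 8: Tesla K80 (UUID: GPU-63eb6611-c38f-3146-07e4-17b0dd45beec)",
    ])
    healthy, lost = parse_gpu_list(sample)
    print(sorted(healthy), lost)  # [5, 8] ['0000:14:00.0', '0000:15:00.0']
```

A non-empty `lost` list is the signal to look at the node.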
  * K80 Failure, Vesta2
GPU 15: Tesla K80 (UUID: GPU-c7fdb877-88ab-3b85-4d4f-ed80cf5cdd7b)
</code>
  * Newly Retired Pages (WTF?)
<code>
volker@vesta2:$ nvidia-smi --query-retired-pages=gpu_uuid,retired_pages.address,retired_pages.cause --format=csv
gpu_uuid, retired_pages.address, retired_pages.cause
GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000004fc2, Double Bit ECC
GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000003d62, Double Bit ECC
GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000006a56, Double Bit ECC
</code>
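One way to tell which retired pages are new is to diff two saved `--query-retired-pages ... --format=csv` snapshots. The minimal sketch below keys each entry on the (GPU UUID, address) pair. The helper names `parse_retired_pages` and `newly_retired` and the snapshot file names are hypothetical. It assumes that each snapshot is saved verbatim, header line included.

```python
import csv
import io
import sys


def parse_retired_pages(text):
    """Map (gpu_uuid, address) -> cause from `--query-retired-pages ... --format=csv` output."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(rows, None)  # skip the "gpu_uuid, retired_pages.address, ..." header
    pages = {}
    for row in rows:
        if len(row) != 3:
            continue  # blank or unexpected line
        uuid, address, cause = row
        # int() with base 16 accepts the leading "0x"
        pages[(uuid, int(address, 16))] = cause
    return pages


def newly_retired(before_text, after_text):
    """List (uuid, address, cause) entries present only in the later snapshot."""
    before = parse_retired_pages(before_text)
    after = parse_retired_pages(after_text)
    return [
        (uuid, address, cause)
        for (uuid, address), cause in sorted(after.items())
        if (uuid, address) not in before
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python newly_retired.py old.csv new.csv
    with open(sys.argv[1]) as old, open(sys.argv[2]) as new:
        for uuid, address, cause in newly_retired(old.read(), new.read()):
            print(f"{uuid}, {address:#018x}, {cause}")
```

The addresses are parsed as integers so that they sort numerically. Each new entry is printed back in the same zero-padded `0x` form that `nvidia-smi` uses.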
===== 24 Sep 2015 =====
  * Retired Pages
<code>
volker@vesta1:~$ nvidia-smi --query-retired-pages=gpu_uuid,retired_pages.address,retired_pages.cause --format=csv
gpu_uuid, retired_pages.address, retired_pages.cause
GPU-3acb2e5f-5862-e59e-237a-24affdbb65dd, 0x00000000000065a1, Double Bit ECC
GPU-3acb2e5f-5862-e59e-237a-24affdbb65dd, 0x00000000000065e2, Double Bit ECC
</code>
<code>
GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000003d62, Double Bit ECC
</code>
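For a quick per-GPU overview of a retired-pages snapshot, a minimal sketch that counts entries by GPU UUID and cause is below. The helper name `summarize_retired_pages` is hypothetical, and the sketch assumes the same CSV layout as the queries in this log. The UUID in the vesta1 snapshot, GPU-3acb2e5f-5862-e59e-237a-24affdbb65dd, matches GPU 9 in the 29 Sep `nvidia-smi -L` listing for vesta1.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize_retired_pages(text):
    """Count retired pages per GPU UUID and cause from nvidia-smi CSV output."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(rows, None)  # skip the header line
    summary = defaultdict(Counter)
    for row in rows:
        if len(row) != 3:
            continue  # blank or unexpected line
        uuid, _address, cause = row
        summary[uuid][cause] += 1
    return {uuid: dict(counts) for uuid, counts in summary.items()}


if __name__ == "__main__":
    # The vesta1 snapshot from 24 Sep.
    vesta1 = """gpu_uuid, retired_pages.address, retired_pages.cause
GPU-3acb2e5f-5862-e59e-237a-24affdbb65dd, 0x00000000000065a1, Double Bit ECC
GPU-3acb2e5f-5862-e59e-237a-24affdbb65dd, 0x00000000000065e2, Double Bit ECC
"""
    for uuid, counts in summarize_retired_pages(vesta1).items():
        print(uuid, counts)
```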
hwlog.1443541615.txt.gz · Last modified: 2015/09/29 17:46 by volker