hwlog

Differences

This shows you the differences between two versions of the page.

hwlog [2015/09/29 17:46]
volker
hwlog [2015/09/29 20:14]
volker
Line 2: Line 2:
  
 ===== 29 Sep 2015 =====
 +
 +  * K80 Failure, Vesta1
 +
 +<code>
 +volker@vesta1:~$ nvidia-smi -L
 +GPU 0: Tesla K80 (UUID: GPU-a4c1ed90-a3e2-22be-1739-bf836ea157fc)
 +GPU 1: Tesla K80 (UUID: GPU-f090063b-9435-87a5-0ba5-34975b5fc981)
 +GPU 2: Tesla K80 (UUID: GPU-f76f6bfa-a99f-5196-7624-24b8a0e84269)
 +GPU 3: Tesla K80 (UUID: GPU-dc4b8a71-f635-ea72-a054-973021340366)
 +GPU 4: Tesla K80 (UUID: GPU-eb698ccb-c615-ab6c-dfc6-74561a6098ea)
 +GPU 5: Tesla K80 (UUID: GPU-c2ecd9ae-8fed-5172-e7e3-6dc6c770e2a4)
 +Unable to determine the product name for gpu 0000:14:00.0: GPU is lost
 +Unable to determine the product name for gpu 0000:15:00.0: GPU is lost
 +GPU 8: Tesla K80 (UUID: GPU-63eb6611-c38f-3146-07e4-17b0dd45beec)
 +GPU 9: Tesla K80 (UUID: GPU-3acb2e5f-5862-e59e-237a-24affdbb65dd)
 +GPU 10: Tesla K80 (UUID: GPU-c6ea8d80-24bc-6a47-91ac-616db5b680eb)
 +GPU 11: Tesla K80 (UUID: GPU-27266a29-25e5-c902-4f6b-e7018ee2d145)
 +GPU 12: Tesla K80 (UUID: GPU-3c7c64c7-9ea9-809a-92e9-f78b30174375)
 +GPU 13: Tesla K80 (UUID: GPU-bff19037-7ff9-8e92-41ee-4a1b72196287)
 +GPU 14: Tesla K80 (UUID: GPU-3bfef1d4-9262-f902-84fa-809fde093c1a)
 +GPU 15: Tesla K80 (UUID: GPU-d6784c9c-c990-1d47-da07-30a341630bb6)
 +</code>
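
The Vesta1 listing above reports two devices as lost by PCI bus ID rather than by index. A minimal sketch for pulling those bus IDs out of saved or live `nvidia-smi -L` output follows. The function name `lost_gpus` is hypothetical, and the pattern assumes the message wording shown in this log; other driver versions may phrase it differently.

```shell
# Print the PCI bus ID of every device reported as lost.
# Reads `nvidia-smi -L` output on stdin, so it also works on saved logs.
# Assumption: the message format matches the one recorded in this log.
lost_gpus() {
    sed -n 's/^Unable to determine the product name for gpu \([0-9A-Fa-f:.]*\): GPU is lost.*/\1/p'
}
```

On a live node this could be run as `nvidia-smi -L | lost_gpus`; empty output would mean no line matched the assumed message format.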
  
   * K80 Failure, Vesta2
Line 24: Line 46:
 GPU 15: Tesla K80 (UUID: GPU-c7fdb877-88ab-3b85-4d4f-ed80cf5cdd7b)
 </code>
 +
 +===== 24 Sep 2015 =====
  
   * Retired Pages
 +
 +<code>
 +volker@vesta1:~$ nvidia-smi --query-retired-pages=gpu_uuid,retired_pages.address,retired_pages.cause --format=csv
 +gpu_uuid, retired_pages.address, retired_pages.cause
 +GPU-3acb2e5f-5862-e59e-237a-24affdbb65dd, 0x00000000000065a1, Double Bit ECC
 +GPU-3acb2e5f-5862-e59e-237a-24affdbb65dd, 0x00000000000065e2, Double Bit ECC
 +</code>
  
 <code>
 volker@vesta2:~$ nvidia-smi --query-retired-pages=gpu_uuid,retired_pages.address,retired_pages.cause --format=csv
 gpu_uuid, retired_pages.address, retired_pages.cause
-[GPU is lost], [GPU is lost], Single Bit ECC
-[GPU is lost], [GPU is lost], Double Bit ECC
-[GPU is lost], [GPU is lost], Single Bit ECC
-[GPU is lost], [GPU is lost], Double Bit ECC
 GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000004fc2, Double Bit ECC
 GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000003d62, Double Bit ECC
 </code>
- 
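
The retired-pages CSV above lists one row per retired page, which makes per-GPU totals hard to read at a glance. A minimal sketch that tallies rows per GPU UUID and cause follows. The function name `retired_summary` is hypothetical, and it assumes the three-column layout (`gpu_uuid`, `retired_pages.address`, `retired_pages.cause`) used in this log.

```shell
# Count retired pages per GPU UUID and cause.
# Reads the CSV printed by the --query-retired-pages command on stdin.
# Assumption: three comma-separated columns with a single header row.
retired_summary() {
    awk -F', *' '
        NR == 1 { next }                       # skip the CSV header row
        NF >= 3 {
            cause = $3
            sub(/[ \t\r]+$/, "", cause)        # drop trailing whitespace
            count[$1 ", " cause]++
        }
        END { for (k in count) print k ", " count[k] }
    ' | LC_ALL=C sort
}
```

For example, the query shown in the log could be piped into it by appending `| retired_summary` to the `nvidia-smi` command. Rows for a lost device would be grouped under the literal `[GPU is lost]` key, since no UUID is available for them.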
hwlog.txt · Last modified: 2015/09/29 20:48 by volker