hwlog

Differences

This shows you the differences between two versions of the page.

hwlog [2015/09/29 17:48]
volker
hwlog [2015/09/29 20:14]
volker
Line 2: Line 2:
  
 ===== 29 Sep 2015 ===== ===== 29 Sep 2015 =====
 +
 +  * K80 Failure, Vesta1
 +
 +<code>
 +volker@vesta1:~$ nvidia-smi -L
 +GPU 0: Tesla K80 (UUID: GPU-a4c1ed90-a3e2-22be-1739-bf836ea157fc)
 +GPU 1: Tesla K80 (UUID: GPU-f090063b-9435-87a5-0ba5-34975b5fc981)
 +GPU 2: Tesla K80 (UUID: GPU-f76f6bfa-a99f-5196-7624-24b8a0e84269)
 +GPU 3: Tesla K80 (UUID: GPU-dc4b8a71-f635-ea72-a054-973021340366)
 +GPU 4: Tesla K80 (UUID: GPU-eb698ccb-c615-ab6c-dfc6-74561a6098ea)
 +GPU 5: Tesla K80 (UUID: GPU-c2ecd9ae-8fed-5172-e7e3-6dc6c770e2a4)
 +Unable to determine the product name for gpu 0000:14:00.0: GPU is lost
 +Unable to determine the product name for gpu 0000:15:00.0: GPU is lost
 +GPU 8: Tesla K80 (UUID: GPU-63eb6611-c38f-3146-07e4-17b0dd45beec)
 +GPU 9: Tesla K80 (UUID: GPU-3acb2e5f-5862-e59e-237a-24affdbb65dd)
 +GPU 10: Tesla K80 (UUID: GPU-c6ea8d80-24bc-6a47-91ac-616db5b680eb)
 +GPU 11: Tesla K80 (UUID: GPU-27266a29-25e5-c902-4f6b-e7018ee2d145)
 +GPU 12: Tesla K80 (UUID: GPU-3c7c64c7-9ea9-809a-92e9-f78b30174375)
 +GPU 13: Tesla K80 (UUID: GPU-bff19037-7ff9-8e92-41ee-4a1b72196287)
 +GPU 14: Tesla K80 (UUID: GPU-3bfef1d4-9262-f902-84fa-809fde093c1a)
 +GPU 15: Tesla K80 (UUID: GPU-d6784c9c-c990-1d47-da07-30a341630bb6)
 +</code>
  
   * K80 Failure, Vesta2   * K80 Failure, Vesta2
Line 25: Line 47:
 </code> </code>
  
-==== 24 Sep 2015 ==== +===== 24 Sep 2015 =====
  
   * Retired Pages   * Retired Pages
 +
 +<code>
 +volker@vesta1:~$ nvidia-smi --query-retired-pages=gpu_uuid,retired_pages.address,retired_pages.cause --format=csv
 +gpu_uuid, retired_pages.address, retired_pages.cause
 +GPU-3acb2e5f-5862-e59e-237a-24affdbb65dd, 0x00000000000065a1, Double Bit ECC
 +GPU-3acb2e5f-5862-e59e-237a-24affdbb65dd, 0x00000000000065e2, Double Bit ECC
 +</code>
  
 <code> <code>
Line 35: Line 64:
 GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000003d62, Double Bit ECC GPU-4457aa27-9352-3308-199c-86bc3fbe31de, 0x0000000000003d62, Double Bit ECC
 </code> </code>
- 
hwlog.txt · Last modified: 2015/09/29 20:48 by volker