Support Summary for Aug 13-19

Dominant issues or repeated problems and their causes, if known. Suggestions for solutions would also be nice.

 

  • Privileged operations:
  • Boot manager:
  1. My node fails to boot with the error 'unable to contact any boot servers' -- This problem has come up often, sometimes because high load on the db and web servers makes them fail requests with a 500 Internal Server Error. Faiyaz and Tony have recently resolved this issue. In general, /tmp/bm.log should contain the exact error.
  • General networking:
  1. When you traceroute PL nodes, they don't return the ICMP port unreachable message that the last hop is supposed to send -- I captured such a traceroute with tcpdump and confirmed that the nodes do emit the message, so the assumption is that it gets firewalled off somewhere along the path.
  • Sensors:
  1. I would like to implement a sensor for PL. Can I get more details about how CoMon, Netflow etc. implement it? -- The sensor API has been dormant for a long time, and Netflow and CoMon do not actually implement it. Andy is the person to contact about this API.
  • Node manager:
  • GUI:
  1. I can't upload my public key because the GUI rejects it. -- It turns out that the regex we use to validate keys is quite restrictive and rejects anything not generated by ssh-keygen (e.g. Solaris keys). Short answer: use ssh-keygen.
  • Misc:
  1. Does PL have a blacklist of IP addresses belonging to administrators who rant? -- No, but Neil Spring @ UMD does.
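
On the GUI key-upload item above: a validator does not need a tight regex to catch malformed keys. The following is a minimal sketch in Python, not the check the GUI actually uses; the function name looks_like_openssh_public_key is hypothetical. It assumes the key is in the one-line OpenSSH public key format (type, base64 blob, optional comment) and only checks that the blob decodes and that the type name embedded at its start matches the first field.

```python
import base64
import binascii
import struct


def looks_like_openssh_public_key(line):
    """Return True if `line` is structurally a one-line OpenSSH public key.

    Hypothetical helper for illustration; it does not verify the key
    material itself, only the outer framing.
    """
    # Fields are: key type, base64 blob, optional free-form comment.
    fields = line.strip().split(None, 2)
    if len(fields) < 2:
        return False
    key_type, blob_b64 = fields[0], fields[1]

    try:
        blob = base64.b64decode(blob_b64, validate=True)
        expected_name = key_type.encode("ascii")
    except ValueError:
        # Covers bad base64 (binascii.Error) and non-ASCII input.
        return False

    # The blob begins with a 4-byte big-endian length, then the type name.
    if len(blob) < 4:
        return False
    (name_len,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + name_len] == expected_name
```

Keys in other formats would still fail this check and need converting first; the sketch only avoids rejecting well-formed one-line keys over details such as the comment field.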