Meeting Notes for 12 Mar

Present: Bob, Liz, TomJ

Problems during Edit's Vacation

The goal of this discussion was to analyze the problems we had and how we could take care of them without disturbing Edit.
  • e-Labs, Drupal and quarknet.us were unreachable after Argonne changed the upstream network connection (March 3).
    • Argonne did not remount the servers properly after the change.
    • We were not aware of the problem until users complained. We need to ask Argonne about monitoring. Later Tom discovered they use a tool called Nagios. He asked to be added to the notification list. Note that in this particular situation, the Argonne people claimed to be able to see our servers, so the tool might not have helped. This is a rare occurrence. Tom also asked them to add data4 to the list of servers to monitor.
  • Certificates and Phong's email: Phong was ready to fix this, but he does not have sudo permissions.
    • This was complicated by the email he sent us being bounced, so he could not communicate with us.
    • Another issue was that the e-Lab help desk listserv was blacklisted because it was getting a huge number of bounces. Once we unsubscribed Phong's fnal account, they could turn it on again.
  • Cron job - one of Phong's cleanup scripts on data2 was causing an error. Edit fixed it; could we have done this without her?
  • Problem with LIGO taking huge amounts of memory and taking down the servers. As a temporary fix, Mihael removed a LIGO class from WEB-INF, which disabled the LIGO e-Lab. We needed Mihael for this one.
  • Analysis problem - Kevin Martz
    • The Swift code that tries to run wire delay was not properly formed (munge issue?).
    • An analysis was rerun on a file without a .thresh file. Why did this happen?
    • Files were uploaded without .thresh files being made. (When are they not there? Are there any diagnostics for this? They should be made at upload time. Is it a rare problem or something that goes unreported?) Can Tom look for .thresh files for recent uploads? We logged in as admin and looked at the listings of processes, the notifications and also directly at the detector directories on the server.
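
One possible way to check for missing .thresh files on recent uploads is sketched below. This is only an illustration, not the e-Lab's own tooling: the function name find_missing_thresh, the directory layout, and the naming rule (a data file named X has its threshold file at X.thresh in the same directory) are all assumptions that would need to be checked against the real detector directories. Any other derived files sitting next to the data files would also be reported, so the filter may need tightening.

```python
import time
from pathlib import Path


def find_missing_thresh(root, max_age_days=None, now=None):
    """Return files under `root` that have no companion `<name>.thresh` file.

    ASSUMPTION: a data file `X` is paired with `X.thresh` in the same
    directory. The real e-Lab naming convention may differ.
    """
    root = Path(root)
    now = time.time() if now is None else now
    missing = []
    for path in sorted(root.rglob("*")):
        # Skip directories and the threshold files themselves.
        if not path.is_file() or path.suffix == ".thresh":
            continue
        # Optionally restrict the scan to recently modified files.
        if max_age_days is not None:
            age_seconds = now - path.stat().st_mtime
            if age_seconds > max_age_days * 86400:
                continue
        companion = path.with_name(path.name + ".thresh")
        if not companion.exists():
            missing.append(path)
    return missing
```

For example, find_missing_thresh("/path/to/detector/directories", max_age_days=14) would list uploads from the last two weeks that lack a threshold file (the path here is a placeholder). Running something like this periodically would also show whether the problem is rare or simply unreported.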

Stickiness Issues (Retaining our users)

  • We discussed the Excel spreadsheets of teachers and research groups that Edit made.

Action Items:

  • Asking Argonne about monitoring tools (Tom) - done (see above)
  • Testing the help form to make sure it’s going through (Tom)
  • Changing the i2u2 help to go to Phong’s gmail (Bob) - done
  • More analysis of Excel spreadsheets (Tom) - Tom is going to try to compare the number of logins for research groups associated with a teacher with the number of logins the teacher has made.

Notes for Edit:

  • We have some files that are not creating .thresh files (e.g., Kevin Martz, March 12). Bob was able to upload today and get threshold files. Why are these not getting written?
  • Please add the detector id in the output when you click on ProcessUpload from the admin list-all.jsp page.

-- Main.LizQuigg - 2014-03-13
Topic revision: r1 - 2014-03-13, liz