Meeting Notes for 2 January 2008

Present: Mihael, Liz, Tom L., Tom J., Nick
Absent: Eric, Bob, Mike W., Randy, Dan K., K. Whelan, Joao

Refactored Code

Testing Refactored Code

It is hard to compare analysis results between production and the refactored code because the new version takes the clock frequency into account.

[Mihael] It occurred to me that the VDL code is slightly different between the two cases, so production may also need to be updated before both can run from the same VDC.

Tom J. does not think we have to change production to check this. He suggested running the analyses by hand on a set of files, the way production would, and comparing the output with the results from the refactored code. These would be command-line tests.

Mihael fixed some bugs related to the SSH provider. He also discovered that he no longer had access to Teraport. His account had expired, and he had not noticed because his email still worked. Mike Wilde needs to renew the account for Mihael. Access still does not work.

[Mihael] It turns out my account is OK. The problem is caused by a software upgrade on Teraport, and it affects everybody.

The refactored code eliminates the RLS problems, so execution on the local machine and the cluster should be quite reliable. SSH works and does not crash. It is a good mechanism ([Mihael] but not as heavily tested; some problems still exist).

We need to build a stress test to get a good idea of how robust the e-Lab is. We have a better chance of understanding problems with our own test than by asking Tom L.'s students to use the e-Lab and report problems. JMeter is one way to do this.
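JMeter drives load against a web application. The same idea can be sketched in standard-library Python: submit many jobs concurrently, count failures, and record the slowest response. This is a hypothetical harness, not part of the e-Lab. `submit_job` stands for whatever call would submit one e-Lab analysis (for example, an HTTP request), and that interface is not specified here. `simulated_job` is a made-up stand-in that fails on every tenth job.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable


@dataclass
class StressResult:
    submitted: int
    succeeded: int
    failed: int
    max_latency_s: float  # slowest successful job, in seconds


def stress_test(submit_job: Callable[[int], None], n_jobs: int, concurrency: int) -> StressResult:
    """Run submit_job(i) for i in range(n_jobs), at most `concurrency` at a time.

    Any exception raised by submit_job is counted as a failure.
    """
    def timed(i: int) -> float:
        start = time.perf_counter()
        submit_job(i)
        return time.perf_counter() - start

    succeeded = failed = 0
    max_latency = 0.0
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed, i) for i in range(n_jobs)]
        for fut in as_completed(futures):
            try:
                latency = fut.result()
            except Exception:
                failed += 1
            else:
                succeeded += 1
                max_latency = max(max_latency, latency)
    return StressResult(n_jobs, succeeded, failed, max_latency)


def simulated_job(i: int) -> None:
    """Hypothetical stand-in for one analysis submission: sleeps, fails every 10th job."""
    time.sleep(0.01)
    if i % 10 == 0:
        raise RuntimeError(f"simulated failure in job {i}")


if __name__ == "__main__":
    print(stress_test(simulated_job, n_jobs=50, concurrency=8))
```

To push the system harder, raise `concurrency` step by step and watch where failures or latency start to climb. That same comparison could be run separately against the local machine, the cluster, and grid execution.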

The refactored code works decently. The Swift code and everything except grid execution works. Nick thinks that Bob is not ready to see it in production.

Action Item: Bob, Tom and Nick need to put their heads together to decide the criteria for moving the refactored code into production, including whether there are any show stoppers. This blessing of the refactored code could include Tom J.'s offer to run command-line tests of the production code and compare the results with those of the refactored code.

Grid Execution

Mihael noted that there are still reliability issues with the grid. Our workhorse should be the local machine and the cluster. Our research project's goal has been to give students access to the grid, but grid sites don't seem to be very reliable, so it may be too early in the development of this technology to rely so heavily on the grid. We should talk to the e-Lab fellows about how much they would want to rely on this tool. Do we want the students to give up on it? Is reliability the responsibility of OSG and/or TeraGrid? Both have reliability issues, so the problem is not specific to either organization. It is hard to coordinate a large group of researchers, and none of them has a financial interest in reliability the way users of Amazon do, so we have to use other mechanisms. This is a research project, and we cannot claim to provide reliable grid execution yet.

[Mihael] There are ways to deal with low reliability, and Swift employs some of them. However, we're dealing with a stochastic process, so those techniques must necessarily rely on multiple sites to achieve acceptable reliability in a given time frame. I think we can get there, but not with the current configuration/code.
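A back-of-envelope calculation shows why multiple sites matter. Assume each site fails a job independently with probability p. This is an assumption, because real grid failures are often correlated. Submitting to n sites then succeeds with probability 1 - p^n. The sketch below computes this value and the smallest n that reaches a target success rate. The figures used in the example are illustrative only and were not measured on OSG or TeraGrid.

```python
def success_probability(p_fail: float, n_sites: int) -> float:
    """Probability that at least one of n_sites independent attempts succeeds."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be in [0, 1]")
    if n_sites < 1:
        raise ValueError("n_sites must be at least 1")
    return 1.0 - p_fail ** n_sites


def sites_needed(p_fail: float, target: float, max_sites: int = 1000) -> int:
    """Smallest number of independent sites whose combined success rate reaches target."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    for n in range(1, max_sites + 1):
        if success_probability(p_fail, n) >= target:
            return n
    raise ValueError("target not reachable within max_sites")


if __name__ == "__main__":
    # Illustrative: a site that fails 30% of the time needs 4 independent sites for >= 99%.
    print(sites_needed(0.3, 0.99))
```

Under this model, adding sites raises reliability quickly. The independence assumption is exactly what a shared outage, such as a software upgrade on one site, would break.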

We had a long discussion of how we should provide guidance to students about using the grid.

Liz suggested telling the students that "the grid is a cutting-edge tool and, like all cutting-edge tools, it sometimes does not work or is flaky. Students are encouraged to use it because it offers tangible benefits."

But here's the rub with making such a statement: we have to convince ourselves that there really is a tangible benefit.

Proving this is difficult because earlier problems with production were caused by disk-space shortages, which we have since fixed. We also have to be confident that all the servers manage disk space adequately, because it is used in different ways (e.g., as temporary space when sorting data). Some of the analysis improvements in the refactored code come from being able to submit a job without waiting for the previous one to complete. This feature does not require grid execution.

Some benefits of the grid will not show up until there are many simultaneous users, although having one user submit many jobs can help emulate this.

Action Item: Nick will try a series of tests that push the local machine and the cluster to their limits and then compare the results with grid execution. He can submit multiple jobs at a time, one after another.

Tom L. suggested testing with the St. Joe's data from November and December, comparing how it runs on the cluster and on the grid.

We also touched on making our code use parallelism. Our analysis is not parallelized because Mihael did not want to change how the analyses were done until we were sure the current ones worked. We could be using ten different grid nodes to create threshold files. Is parallelism essential to making the grid useful to us? We also have a scalability issue when many users are active at once.
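Building threshold files lends itself to parallelism because each threshold file depends only on its own raw file. The work is therefore an independent map over the inputs. The sketch below shows that pattern using a local thread pool. `make_threshold` is a hypothetical stand-in for the real threshold step, and the ten workers simply mirror the "ten grid nodes" mentioned above. On the grid, a workflow system would send each call to a separate node instead of a local thread.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def build_thresholds(
    raw_files: Iterable[str],
    make_threshold: Callable[[str], str],
    workers: int = 10,
) -> Dict[str, str]:
    """Apply make_threshold to each raw file independently; return {raw_file: threshold_file}.

    The calls share no state, so they can run in any order or in parallel.
    """
    files = list(raw_files)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so they line up with `files`.
        results = list(pool.map(make_threshold, files))
    return dict(zip(files, results))


if __name__ == "__main__":
    # Hypothetical stand-in that only derives an output file name.
    print(build_thresholds(["a.raw", "b.raw"], lambda f: f.replace(".raw", ".thresh")))
```

Because the tasks share no state, adding users adds independent tasks, not contention inside any single task. Scalability then mostly depends on how many nodes or workers are available.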

Added Functionality to Cosmic – plotting flux data against other data

  • Proposal 1: Temperature and pressure data can be collected with the DAQ board. We would grab all of that and plot temperature or pressure against the flux. Nick has done local tests. Currently the Split code automatically strips out everything that is not raw data, including any pressure or temperature data, so we would have to add something to the workflow to save this information. Students would need to calibrate the pressure: on the old cards, you have to tell the card the multiplier that converts the reading to a correct pressure. The temperature is on the GPS. The pressure is in millibars.

  • Proposal 2: Students upload their own data, for example solar data, and look for correlations between the flux and particles output by the sun. We would choose a protocol for uploading data files (Tom specifically avoided the term "spreadsheet"): columnar, time-related data. Tom also talked about plotting flux vs. pressure.
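To make Proposal 2 concrete, here is a sketch of one possible upload convention and the correlation step. Every detail of the convention is an assumption, since no protocol has been chosen yet. It assumes a CSV file with a `time,value` header, time in Unix seconds, and flux already binned into 5-minute (300 s) bins keyed by bin start time. The uploaded values are averaged into the same bins, and a Pearson correlation is computed over the bins the two series share. The same steps would apply to flux vs. pressure.

```python
import csv
import io
import math
from typing import Dict, List, Tuple


def read_columnar(text: str) -> List[Tuple[float, float]]:
    """Parse assumed 'time,value' CSV text (time in Unix seconds) into (time, value) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["time"]), float(row["value"])) for row in reader]


def bin_average(rows: List[Tuple[float, float]], bin_width: int) -> Dict[int, float]:
    """Average values into fixed-width time bins keyed by bin start time."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for t, v in rows:
        b = int(t // bin_width) * bin_width
        sums[b] = sums.get(b, 0.0) + v
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sums}


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def correlate_with_flux(flux: Dict[int, float], uploaded_text: str, bin_width: int = 300) -> float:
    """Correlate binned flux with uploaded columnar data over their common time bins."""
    other = bin_average(read_columnar(uploaded_text), bin_width)
    common = sorted(set(flux) & set(other))
    return pearson([flux[b] for b in common], [other[b] for b in common])


if __name__ == "__main__":
    flux = {0: 10.0, 300: 12.0, 600: 14.0}
    upload = "time,value\n10,1\n20,3\n310,4\n650,6\n"
    print(correlate_with_flux(flux, upload))
```

Working through this sketch exposes the conventions that would need to be decided: column names, time units and time zone, bin width, and how to handle gaps where one series has no data.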

[Nick] There is also solar data, which Bob pointed out, that records the flux and energies of protons and electrons every 5 minutes throughout the day. This could be interesting to look at. Those files are found here. Here are the local plots I made: Temperature, Pressure, Protons. Proposal 1 would be easier to implement, but Proposal 2 sounds more useful in general. We would need to decide on a lot of conventions and protocols before we could start working on it.

[Mihael] Eric and I chatted a while ago about tools and mechanisms that would allow more generic data processing/plotting. We spoke about it in the context of LIGO, which is par excellence a project where correlating multiple data sources can yield meaningful "scientific results" (hehe, sounds like "business processes"). We should keep this in mind, and we should also keep in mind that flexibility is hard.


CMS: Danielle McDermott and Dan K. will be working on the e-Lab for a month. We can discuss this further next week.

-- Main.LizQuigg - 02 Jan 2008
Topic revision: r14 - 2008-01-15, LizQuigg