Opened 9 years ago

Closed 7 years ago

Last modified 5 years ago

#867 closed enhancement (duplicate)

clsim - propagation server

Reported by: David Schultz Owned by: David Schultz
Priority: major Milestone:
Component: combo simulation Keywords: clsim
Cc: Melanie Day, Alex Olivas, Claudio Kopper

Description

Melanie has finally found a need for a "clsim server": a service you can send things to propagate, which then returns the results.

The problem:

  • Muongun results in really, really large files, and the only way to reduce file size is to run them through detector simulation (to throw out events that don't trigger, I guess).
  • This means running lots of cpu modules on a gpu machine, which is very inefficient.
  • Worse, these are often memory-intensive; gpu jobs should not be using large amounts of memory in the current NPX setup.

Solution:

  • Run the job on a regular cpu slot, with a remote call to a clsim server for the photon propagation.

Implementation:

  • Prototype: just send the whole frame to the gpu machine, which returns the completed frame.
  • Better: send bunches of photons, return raw results
    • helps for high energy events where a single frame is huge
  • RPC communication: json over http? ZeroMQ?
  • Choosing the gpu machine: need a central distribution server
    • registers gpu servers and can load balance client requests
    • if gpu server is open to world, can give client its address to push work directly
    • if gpu server is behind firewall, can act as intermediary for pull-based connections
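As a rough sketch of the "better" option, here is one way a client could split a large photon list into bounded batches, wrap each batch in a JSON message (one of the RPC options listed above), and put the per-batch results back in order. The photon record fields, the message keys (`event`, `seq`, `total`, `photons`) and the function names are all hypothetical; nothing here reflects the real clsim data structures.

```python
import json
from typing import Dict, Iterator, List

# Hypothetical photon record, e.g. {"t": 12.5}; real clsim types differ.
Photon = Dict[str, float]


def batch_photons(photons: List[Photon], max_batch: int) -> Iterator[List[Photon]]:
    """Yield consecutive slices of at most max_batch photons."""
    if max_batch <= 0:
        raise ValueError("max_batch must be positive")
    for start in range(0, len(photons), max_batch):
        yield photons[start:start + max_batch]


def encode_batches(event_id: int, photons: List[Photon], max_batch: int) -> List[str]:
    """Wrap each batch in a JSON message with enough metadata to reassemble it."""
    batches = list(batch_photons(photons, max_batch))
    return [
        json.dumps({"event": event_id, "seq": i, "total": len(batches), "photons": b})
        for i, b in enumerate(batches)
    ]


def reassemble(results: List[str]) -> List[Photon]:
    """Merge per-batch results (which may arrive out of order) into one list."""
    decoded = sorted((json.loads(r) for r in results), key=lambda m: m["seq"])
    if decoded and len(decoded) != decoded[0]["total"]:
        raise ValueError("missing batches")
    merged: List[Photon] = []
    for msg in decoded:
        merged.extend(msg["photons"])
    return merged
```

Bounding the batch size caps the memory a single request needs on either end, which is the point for high-energy events; the sequence numbers let batches be processed by different gpu servers and still be merged correctly.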

Control flow diagram (direct):

Client I3Module     Central Server       GPU Server
       |                   |      register    |
       |      request      | <--------------- |
       | ----------------> |                  |
       |    try direct     |                  |
       | <---------------- |                  |
       |        data       |                  |
       | -----------------------------------> |
       |                   |      result      |
       | <----------------------------------- |
       |      release      |                  |
       | ----------------> |                  |
       |                   |      remove      |
       |                   | <--------------- |
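A minimal in-memory sketch of the central server's bookkeeping for this flow, assuming a least-loaded policy for load balancing. The `register`, `request`, `release` and `remove` methods mirror the arrows in the diagram; the class names, the address format and the `reachable` flag (which decides between "try direct" and "indirect") are illustrative assumptions, and a real server would sit behind an RPC layer rather than be called in-process.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class GpuServer:
    address: str            # hypothetical, e.g. "tcp://gpu01:5555"
    reachable: bool         # True if clients can connect to it directly
    active_clients: int = 0


@dataclass
class CentralServer:
    servers: Dict[str, GpuServer] = field(default_factory=dict)
    leases: Dict[str, str] = field(default_factory=dict)  # client id -> server name

    def register(self, name: str, address: str, reachable: bool) -> None:
        """GPU server -> central: announce availability."""
        self.servers[name] = GpuServer(address, reachable)

    def request(self, client: str) -> Tuple[str, Optional[str]]:
        """Pick the least-loaded server; answer ("direct", addr) or ("indirect", None)."""
        if not self.servers:
            raise RuntimeError("no GPU servers registered")
        name = min(self.servers, key=lambda n: self.servers[n].active_clients)
        server = self.servers[name]
        server.active_clients += 1
        self.leases[client] = name
        if server.reachable:
            return "direct", server.address
        return "indirect", None

    def release(self, client: str) -> None:
        """Client -> central: done with its assigned server."""
        name = self.leases.pop(client)
        if name in self.servers:
            self.servers[name].active_clients -= 1

    def remove(self, name: str) -> None:
        """GPU server -> central: unregister."""
        self.servers.pop(name, None)
```

Least-loaded is only one possible policy; round-robin or weighting by GPU capacity would slot into `request` the same way.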

Control flow diagram (intermediary):

Client I3Module     Central Server       GPU Server
       |                   |      register    |
       |      request      | <--------------- |
       | ----------------> |                  |
       |     indirect      |                  |
       | <---------------- |                  |
       |       data        |                  |
       | ----------------> |                  |
       |                   |       poll       |
       |                   | <--------------- |
       |                   |       data       |
       |                   | ---------------> |
       |                   |      result      |
       |      result       | <--------------- |
       | <---------------- |                  |
       |      release      |                  |
       | ----------------> |                  |
       |                   |      remove      |
       |                   | <--------------- |
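A minimal sketch of the pull-based relay, with in-process queues standing in for the network. This assumes the central server keeps one pending-work queue per registered GPU server and a table of finished results keyed by a job id; `Relay`, `gpu_poll_once` and the job-id scheme are hypothetical, and the register/release bookkeeping, timeouts and retries are omitted.

```python
import queue
from typing import Any, Callable, Dict, Optional, Tuple

WorkItem = Tuple[str, Any]  # (job id, payload)


class Relay:
    """Central-server side of the intermediary flow."""

    def __init__(self) -> None:
        self.pending: Dict[str, "queue.Queue[WorkItem]"] = {}
        self.results: Dict[str, Any] = {}

    def register(self, server: str) -> None:
        self.pending.setdefault(server, queue.Queue())

    def submit(self, server: str, job_id: str, payload: Any) -> None:
        # client -> central: "data"
        self.pending[server].put((job_id, payload))

    def poll(self, server: str) -> Optional[WorkItem]:
        # GPU server -> central: "poll"; None means no work is waiting
        try:
            return self.pending[server].get_nowait()
        except queue.Empty:
            return None

    def post_result(self, job_id: str, result: Any) -> None:
        # GPU server -> central: "result"
        self.results[job_id] = result

    def fetch_result(self, job_id: str) -> Optional[Any]:
        # central -> client: "result"; None if it is not ready yet
        return self.results.pop(job_id, None)


def gpu_poll_once(relay: Relay, server: str, propagate: Callable[[Any], Any]) -> bool:
    """One iteration of a GPU server's poll loop; returns True if it did any work."""
    item = relay.poll(server)
    if item is None:
        return False
    job_id, payload = item
    relay.post_result(job_id, propagate(payload))
    return True
```

Because the GPU server only ever makes outbound requests (poll, result), it can stay behind a firewall; the cost is an extra hop for every payload through the central server.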

Change History (8)

comment:1 Changed 9 years ago by Alex Olivas

Is there simply a need/use or is this critical? There are mixed priorities in this ticket. There was talk back in the day about doing the same with photonics, but that never really got much traction. A "need" was identified, but it remained low priority.

Is this part of simulation or possibly an iceprod 2 feature?

comment:2 Changed 9 years ago by David Schultz

This is critical in the sense that Melanie needs this soon or she'll be kicked off the gpus in Madison. We haven't done muongun much in production, but we should in the future. So production will likely encounter a similar issue as well.

This is more part of simulation than iceprod.

comment:3 Changed 9 years ago by David Schultz

  • Status changed from new to accepted

Prototype for direct communication finished in r129456/IceCube. It uses ZeroMQ to send the frame hierarchy (GCDQ) to the gpu and get the resulting Q frame back.
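As a stdlib-only sketch of the same request/reply pattern (not the r129456 code), the block below sends a GCDQ hierarchy over a socket pair and gets back only the Q frame. Plain sockets with a 4-byte length prefix stand in for ZeroMQ, which does its own message framing, and JSON dicts stand in for serialized I3Frames; the function names and the `photons_propagated` marker are hypothetical.

```python
import json
import socket
import struct
import threading
from typing import Any, Dict, List

# Hypothetical stand-in for an I3Frame: a dict whose "stop" key is G, C, D or Q.
Frame = Dict[str, Any]


def send_msg(sock: socket.socket, obj: Any) -> None:
    """Send one JSON message, prefixed with its 4-byte big-endian length."""
    data = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack("!I", len(data)) + data)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a stream socket may deliver fewer per recv()."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_msg(sock: socket.socket) -> Any:
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def serve_one(sock: socket.socket) -> None:
    """GPU side: take the G, C, D, Q frames and reply with only the processed Q frame."""
    frames: List[Frame] = recv_msg(sock)
    if not frames or frames[-1].get("stop") != "Q":
        raise ValueError("expected the hierarchy to end with a Q frame")
    q_frame = dict(frames[-1])
    # Placeholder: a real server would run clsim here, using the G, C, D context.
    q_frame["photons_propagated"] = True
    send_msg(sock, q_frame)


def propagate_remote(sock: socket.socket, frames: List[Frame]) -> Frame:
    """Client side: send the whole frame hierarchy, block until the Q frame returns."""
    send_msg(sock, frames)
    return recv_msg(sock)


if __name__ == "__main__":
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_one, args=(server,))
    worker.start()
    hierarchy = [{"stop": s} for s in "GCDQ"]
    print(propagate_remote(client, hierarchy))
    worker.join()
    client.close()
    server.close()
```

Sending the full hierarchy on every request keeps the server stateless, at the cost of resending the G, C and D frames each time; caching them per client would be the obvious next step.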

Note that Geant4 doesn't play very nicely with this, for some reason. Or maybe that's a problem with clsim + Geant4 in general? Either way, compiling without Geant4 makes it work.

comment:4 Changed 9 years ago by David Schultz

  • Milestone set to IceSim 4.3
  • Priority changed from critical to major

comment:5 Changed 9 years ago by Alex Olivas

  • Milestone changed from IceSim 4.3 to IceSim 5.1

comment:6 Changed 8 years ago by David Schultz

  • Milestone changed from IceSim 5.1 to IceSim 6

comment:7 Changed 7 years ago by David Schultz

  • Resolution set to duplicate
  • Status changed from accepted to closed

The distribute project by Claudio does this and a lot more. Defer to that, and close this out.

comment:8 Changed 5 years ago by Alex Olivas

  • Milestone IceSim 6 deleted

