Presentation Transcript

AMUN: 

AMUN: A Practical Application Using the Nile Distributed Operating System
Authors:
  R. Baker (Cornell University, Ithaca, NY USA)
  L. Zhou (University of Florida, Gainesville, FL USA)
  J. Duboscq (Ohio State University, Columbus, OH USA)
Presented by:
  D. Mimnagh (University of Texas, Austin, TX USA)

Overview: 

What is Nile?
What is AMUN?
Results
Conclusions

What is Nile?: 

Nile: distributed computing solution for CLEO
  fault-tolerant (recovers from resource failure)
  self-managing (sophisticated resource scheduling)
  heterogeneous (will run anything anywhere)
Designed for HEP
  track reconstruction
  data analysis
  simulation
But very generic

Nile Architecture: 

Nile Architecture

What is AMUN?: 

Advanced Monte Carlo Under Nile
CLEO II.V signal Monte Carlo
  τ lepton pair events
Testbed
  Nile control system using RMI (see E272)
  Borrowed workstation program

Managing Loaned Workstations: 

Prototype
  csh scripts
  list of machine owners
Must be easy and honest
  simple creation of configuration files
  monitor usage remotely and locally
  allow preemption for unexpected usage
  need local space for intermediate results
Will be integrated with Nile in Java
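The loan rules listed on this slide (preempt when the owner needs the machine, require local space for intermediate results) could be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class `LoanPolicy`, its `Workstation` record of fields, and both methods are hypothetical names invented here, not part of Nile or of the prototype csh scripts.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a loan policy for borrowed workstations; not Nile's actual code. */
final class LoanPolicy {

    /** One loaned machine, with values an owner might state in a simple configuration file. */
    static final class Workstation {
        final String host;
        final String owner;
        final long freeScratchMb;   // local space available for intermediate results
        final boolean ownerActive;  // true when the owner is currently using the machine

        Workstation(String host, String owner, long freeScratchMb, boolean ownerActive) {
            this.host = host;
            this.owner = owner;
            this.freeScratchMb = freeScratchMb;
            this.ownerActive = ownerActive;
        }
    }

    /** A machine may accept a subjob only if its owner is idle and enough scratch space is free. */
    static boolean mayRun(Workstation w, long requiredScratchMb) {
        return !w.ownerActive && w.freeScratchMb >= requiredScratchMb;
    }

    /** Hosts whose running subjobs should be preempted because the owner has returned. */
    static List<String> hostsToPreempt(List<Workstation> running) {
        List<String> hosts = new ArrayList<>();
        for (Workstation w : running) {
            if (w.ownerActive) {
                hosts.add(w.host);
            }
        }
        return hosts;
    }
}
```

Keeping the policy as two side-effect-free checks means the same rules can be evaluated both locally on the workstation and remotely by a monitor.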

Results: 

Very stable
  weeks of uninterrupted use
Heterogeneity
  as many as 60 machines, Alpha Linux + Unix
  SpecInt ranging from 1 to 25
Scaling
  linear
  network topology issues can break linearity
Nile performance
  1-3 seconds to reschedule a CPU
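If throughput scales linearly with total SpecInt, one natural way to keep a heterogeneous pool evenly busy is to hand each machine a share of events proportional to its rating. The sketch below illustrates that idea only; the class `SpecIntShare` and the method `split` are hypothetical, and the slides do not say that Nile's scheduler works this way.

```java
/** Hypothetical sketch: divide events among machines in proportion to their SpecInt rating. */
final class SpecIntShare {

    /** Returns one event count per machine; the counts always sum to totalEvents. */
    static long[] split(long totalEvents, double[] specInt) {
        double totalRating = 0.0;
        for (double s : specInt) {
            totalRating += s;
        }
        if (totalRating <= 0.0) {
            throw new IllegalArgumentException("need at least one machine with a positive rating");
        }

        long[] share = new long[specInt.length];
        long assigned = 0;
        int fastest = 0;
        for (int i = 0; i < specInt.length; i++) {
            // Round down so the pool is never over-committed.
            share[i] = (long) Math.floor(totalEvents * specInt[i] / totalRating);
            assigned += share[i];
            if (specInt[i] > specInt[fastest]) {
                fastest = i;
            }
        }
        // Give the rounding remainder to the fastest machine (first one on ties).
        share[fastest] += totalEvents - assigned;
        return share;
    }
}
```

For example, ratings of 1, 25 and 14 (total 40) would split 1000 events as 25, 625 and 350.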

Scaling with Total SpecInt: 

Scaling with Total SpecInt

Events Generated: 

Job construction requirements
  choose subjob size
  collection script
25 million τ events generated
  as many as 1 million a day
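The arithmetic behind choosing a subjob size can be sketched as a ceiling division. The class `SubjobPlan` and its method names are hypothetical, and the subjob size of 10,000 events used in the example is an assumption for illustration; the slides give only the totals (25 million events, up to 1 million a day).

```java
/** Hypothetical sketch of subjob bookkeeping; the names and the example size are assumptions. */
final class SubjobPlan {

    /** Ceiling division: how many chunks of chunkSize are needed to cover total. */
    static long chunks(long total, long chunkSize) {
        if (total < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and chunkSize must be > 0");
        }
        return (total + chunkSize - 1) / chunkSize;
    }

    /** Number of subjobs needed to generate totalEvents at eventsPerSubjob each. */
    static long subjobCount(long totalEvents, long eventsPerSubjob) {
        return chunks(totalEvents, eventsPerSubjob);
    }

    /** Lower bound on running time in days if every day reached the peak rate. */
    static long minDays(long totalEvents, long peakEventsPerDay) {
        return chunks(totalEvents, peakEventsPerDay);
    }
}
```

With the figures from the slide, `minDays(25_000_000, 1_000_000)` gives a lower bound of 25 days, and an assumed subjob size of 10,000 events would mean 2,500 subjobs for the collection script to gather.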

Conclusion: 

Successful implementation of Nile in RMI
CPU resources used efficiently
  loaned CPU
To do
  rewrite scripts in Java
  admin tools
  GUI tools