Energy-SFE banner - Based on the public domain photo by Simon: https://pixabay.com/en/wind-energy-wind-power-pinwheel-722360/

EnergySFE

Energy-aware Scheduling and Fault Tolerance Techniques for the Exascale Era - STIC-AmSud Project Grant 99999.007556/2015-02

Scientific Missions

  • Dec. 2 - Dec. 7, 2017 – Patricia Plentz went to Quito, Ecuador, where she presented the research activities developed in LaPeSD and had meetings with local researchers.
  • Nov. 23 - Nov. 27, 2017 – Vanessa Vargas and Pablo Ramos went to Florianópolis, Brazil, where they visited LaPeSD and ECL and gave talks on topics related to fault tolerance.
  • Sept. 24 - Oct. 9, 2017 – Paolo Rech went to Grenoble, France, where he visited local researchers, gave talks on topics related to the project [slides] and presented a paper at RADECS.
  • July 30 - Aug. 7, 2017 – Jean-François Méhaut went to Florianópolis, Brazil, where he visited LaPeSD and ECL at UFSC and had meetings with researchers.
  • Jan. 29 - Feb. 21, 2017 – Márcio Castro went to Grenoble, France, where he worked on a scheduling algorithm with other researchers and used the local infrastructure for experiments.
  • Nov. 29 - Dec. 3, 2016 – Laércio Pilla went to Quito, Ecuador, where he presented EnergySFE at the IV REDU [slides] and had meetings with researchers.
  • Sept. 25 - Oct. 4, 2016 – Jean-François Méhaut went to Florianópolis, Brazil, where he visited LaPeSD and ECL at UFSC and had meetings with researchers.
  • Sept. 1, 2016 – First EnergySFE International Workshop [Click here for more information]
  • Aug. 21 - Sept. 6, 2016 – Márcio Castro went to Grenoble, France, where he organized the first workshop of the project, visited the neutron accelerator Genepi-2 and participated in Euro-Par 2016.

Project Objectives

The main goal of the EnergySFE research project is to propose fast and scalable energy-aware scheduling and fault tolerance techniques and algorithms for large-scale highly parallel architectures. To achieve this goal, it will be crucial to answer the following research questions:

Research Questions

  • How to schedule tasks and threads that compete for resources with different constraints while considering the complex hierarchical organization of future Exascale supercomputers?
  • How to tolerate faults without incurring too much overhead in future Exascale supercomputers?
  • How can scheduling and fault tolerance approaches be adapted to be energy-aware?

Another important goal of the project is to establish a lasting collaboration between international partners from Brazil (UFSC and UFRGS), France (CNRS) and Ecuador (ESPE), as well as to promote knowledge transfer between them.

Project Members

UFSC: Federal University of Santa Catarina, Brazil

LIG/CNRS: National Center for Scientific Research, France

ESPE: Ecuadorian Armed Forces University, Ecuador

UFRGS: Federal University of Rio Grande do Sul, Brazil

Scientific Production

  • A.D. Pereira, R.C.O. Rocha, M. Castro, L.F.W. Góes, M.A.R. Dantas. “Extending OpenACC for Efficient Stencil Code Generation and Execution by Skeleton Frameworks”. In: HPCS, 2017. (to appear)

  • D. Oliveira, L.L. Pilla, M. Hanzich, V. Fratin, F. Fernandes, C. Lunardi, J.M. Cela, P.O.A. Navaux, L. Carro, P. Rech. “Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators”. In: HPCA, 2017.

  • D. Oliveira, L.L. Pilla, N. DeBardeleben, S. Blanchard, H. Quinn, I. Koren, P.O.A. Navaux, P. Rech. “Experimental and Analytical Study of Xeon Phi Reliability”. In: SC, 2017. (to appear)

  • P.H. Penna, E.C. Inacio, M. Castro, P.D.M. Plentz, H.C. Freitas, F. Broquedis, J.-F. Méhaut. “Assessing the Performance of the SRR Loop Scheduler with Irregular Workloads”. In: ICCS, 2017.

  • P. Ramos, V. Vargas, M. Baylac, F. Villa, S. Rey, N.E. Zergainoh, R. Velazco. “Error-rate prediction for applications implemented in Multi-core and Many-core processors”. In: RADECS, 2017. (to appear)

  • V. Vargas, P. Ramos, V. Ray, C. Jalier, R. Stevens, B.D. de Dinechin, M. Baylac, F. Villa, S. Rey, N.E. Zergainoh, J.-F. Méhaut, R. Velazco. “Radiation Experiments on a 28nm Single-Chip Many-core Processor and SEU error-rate prediction”. In: IEEE Transactions on Nuclear Science, 2017.

  • E.H.M. Cruz, M. Diener, L.L. Pilla, P.O.A. Navaux. “A Sharing-Aware Memory Management Unit for Online Mapping in Multi-core Architectures”. In: Euro-Par, 2016.

  • M. Castro, E. Francesquini, F. Dupros, H. Aochi, P.O.A. Navaux, J.-F. Méhaut. “Seismic Wave Propagation Simulations on Low-power and Performance-centric Manycores”. In: Parallel Computing, 2016.

  • M.A. Souza, P.H. Penna, M.M. Queiroz, L.F.W. Góes, H.C. Freitas, M. Castro, P.O.A. Navaux, J.-F. Méhaut. “CAP Bench: A Benchmark Suite for Performance and Energy Evaluation of Low-Power Many-Core Processors”. In: Concurrency and Computation: Practice and Experience, 2016.

  • P.H. Penna, M. Castro, H.C. Freitas, F. Broquedis, J.-F. Méhaut. “Design Methodology for Workload-Aware Loop Scheduling Strategies Based on Genetic Algorithm and Simulation”. In: Concurrency and Computation: Practice and Experience, 2016.

  • P. Ramos, V. Vargas, M. Baylac, F. Villa, S. Rey, J. Clemente, N.-E. Zergainoh, J.-F. Méhaut, R. Velazco. “Evaluating the SEE sensitivity of a 45nm SOI Multi-core Processor due to the 14 MeV Neutrons”. In: IEEE Transactions on Nuclear Science, 2016.