[Apologies if you got multiple copies of this email. This message is
sent to . If you'd like to opt out of these
announcements, information on how to unsubscribe is available at the
bottom of this email.]
********************************************************************************
https://lists.mcs.anl.gov/mailman/listinfo/hpc-announce
If you do not remember your password (which is needed to change these options), you can reset it using the "Unsubscribe or Edit Options" button at the bottom of the page.
********************************************************************************
Special Issue of Sustainable Computing: Informatics and Systems (SUSCOM) on Resilience and/or Energy-aware techniques for High-Performance Computing (RE-HPC)
SCOPE:
Resilience and energy consumption have become two important concerns for high-performance computing (HPC) systems. With increasing core counts and technology miniaturization, today's large computing platforms (datacenters, clusters, supercomputers, etc.) are increasingly prone to failures. Faults are becoming the norm rather than the exception. Besides the classical fail-stop errors (such as hardware failures), soft errors (such as silent data corruptions, or SDCs) constitute another threat that can no longer be ignored by the HPC community.
Another concern is energy. Presently, large computing centers are among the largest consumers of energy, hence measures must be taken to reduce energy consumption. Energy is needed not only to power the individual cores but also to provide cooling for the system. In today's datacenters, a large proportion of energy is spent on cooling and thermal-related activities. It is anticipated that the power dissipated to perform communications and I/O transfers will also make up a much larger share of the overall power consumption. The relative cost of communication is expected to increase dramatically, both in terms of latency/overhead and of consumed energy. Re-designing algorithms for HPC systems to ensure resilience and to reduce energy consumption will be crucial to achieving sustained performance.
The link between resilience and energy must also be carefully tackled. Better resilience often requires redundancy (replication and/or checkpointing, rollback and recovery), which consumes extra energy. Hot cores may lead to less resilient computing or increase the probability of individual failures. On the other hand, reducing energy consumption via voltage/frequency scaling techniques will increase the application running time, and hence the expected number of failures during execution.
This Special Issue will encompass a broad range of topics related to resilience and energy efficiency for HPC. Its objective is to facilitate the exchange of valuable information and ideas among researchers and practitioners. Topics of interest include (but are not limited to):
● Fault-tolerant algorithms, tools, and protocols
● Checkpointing, replication, and recovery techniques
● Detection and prediction of soft errors and SDCs
● System reliability, testing, and verification
● Resilience models, algorithms, and simulations
● Energy-efficient scheduling and resource management
● Power-aware runtime systems
● Energy-efficient I/O, storage, and networking
● Thermal behavior modeling, control, and management
● Cooling-aware optimizations and evaluations
● Tradeoffs between performance, reliability, energy, and temperature
SUBMISSION DETAILS:
General information for submitting papers to SUSCOM can be found at http://www.journals.elsevier.com/sustainable-computing (please note the “Guide for Authors” link). Submissions to this Special Issue (SI) should be made using Elsevier's editorial system at the journal website (under the “submit your paper” link). Please make sure to select the “SI: RE-HPC” option for the type of the paper during the submission process. All submissions must be original and may not be under review elsewhere. A submission based on one or more papers that appeared elsewhere must include major value-added extensions over what appeared previously (at least 30% new conceptual material). Authors of such submissions are requested to attach the earlier articles and a summary document explaining the enhancements made in the journal version. All submitted papers will be peer-reviewed using the normal standards of SUSCOM.
IMPORTANT DATES:
● Manuscript due date: May 1, 2017
● First decision notification: August 1, 2017
● Tentative publication schedule: December 2017
GUEST EDITORS:
Anne Benoit, ENS-Lyon, France
Jean-Marc Pierson, University of Toulouse, France
Hongyang Sun, Vanderbilt University, USA
Questions may be sent to hongyang.sun@vanderbilt.edu