1. This paper studies the resilient scheduling of moldable parallel jobs on high-performance computing (HPC) platforms.
2. Two resilient scheduling algorithms, Lpa-List and Batch-List, are introduced to cope with silent errors.
3. An extensive set of simulations is conducted to evaluate different variants of the two algorithms, and the results show that they consistently outperform some baseline heuristics.
The article studies the resilient scheduling of moldable parallel jobs on high-performance computing (HPC) platforms. The authors introduce two resilient scheduling algorithms, Lpa-List and Batch-List, both of which use the List strategy to schedule jobs so as to cope with silent errors. An extensive set of simulations evaluates different variants of the two algorithms, and the results show that they consistently outperform some baseline heuristics.
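To make the List strategy mentioned above more concrete, here is a minimal Python sketch of generic greedy list scheduling with re-execution after a failed attempt. It is not the paper's Lpa-List or Batch-List. The function name `list_schedule`, the job tuple layout, the fixed per-job processor counts, the `failed_attempts` counter standing in for silent errors, and the choice to re-queue a failed job at the end of the list are all assumptions made for illustration.

```python
import heapq


def list_schedule(jobs, total_procs):
    """Greedy list scheduling of jobs whose processor counts are already fixed.

    jobs: list of (name, procs, duration, failed_attempts), in priority order.
    failed_attempts is a hypothetical stand-in for silent errors: the job is
    re-executed that many extra times before it finally succeeds.
    Returns (makespan, starts), where starts maps name -> list of start times.
    """
    pending = [list(job) for job in jobs]  # mutable copies, in list order
    running = []                           # min-heap of (finish_time, seq, job)
    free = total_procs
    now = 0.0
    seq = 0                                # tie-breaker so jobs are never compared
    starts = {}

    while pending or running:
        # Scan the list in order and start every job that fits right now.
        waiting = []
        for job in pending:
            name, procs, duration, _ = job
            if procs <= free:
                free -= procs
                heapq.heappush(running, (now + duration, seq, job))
                seq += 1
                starts.setdefault(name, []).append(now)
            else:
                waiting.append(job)
        pending = waiting

        if not running:
            # Nothing is running and nothing fits: some job is too large.
            raise ValueError("a job requests more processors than available")

        # Advance time to the next completion and release its processors.
        now, _, job = heapq.heappop(running)
        free += job[1]
        if job[3] > 0:
            # Assumed behavior: an error detected at completion forces a rerun,
            # and the job goes back to the end of the list.
            job[3] -= 1
            pending.append(job)

    return now, starts
```

For example, with 4 processors and the jobs `("A", 2, 3.0, 1)`, `("B", 2, 2.0, 0)`, `("C", 4, 1.0, 0)`, job A fails once and is re-executed after C, so the schedule finishes later than it would without the error.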
The article appears to be reliable and trustworthy: it gives a detailed account of its topic and supports its claims with an extensive set of simulations. However, some potential biases should be noted. First, the evidence comes entirely from simulations; the authors provide no real-world examples or case studies that could further support their findings. Second, although they discuss several speedup models (roofline, communication, Amdahl, power, monotonic, and mix) in relation to their proposed algorithms, they do not explore counterarguments or alternative models that could affect their findings. Third, although they give balanced treatment to their proposed algorithms and the baseline heuristics, they do not discuss the risks of using these algorithms or how those risks could be mitigated.
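For readers unfamiliar with the speedup models named above, the following Python sketch shows common textbook forms of three of them (roofline, communication, and Amdahl). The function names, the parameter names (`w` for total work, `p` for processors, `p_max`, `c`, `gamma`), and the exact formulas are assumptions for illustration; the paper's own parameterization may differ.

```python
def roofline_time(w, p, p_max):
    """Roofline: linear speedup up to p_max processors, flat afterwards."""
    return w / min(p, p_max)


def communication_time(w, p, c):
    """Communication: perfectly parallel work plus an overhead growing with p."""
    return w / p + c * (p - 1)


def amdahl_time(w, p, gamma):
    """Amdahl: a fraction gamma of the work is inherently sequential."""
    return w * (gamma + (1.0 - gamma) / p)
```

Under these assumed forms, a job with `w = 100` on 4 processors takes 25 time units with a roofline limit of `p_max = 4`, 28 with a communication cost of `c = 1`, and about 40 with a sequential fraction of `gamma = 0.2`, which shows how strongly the choice of model can change the best processor allocation for a moldable job.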
All in all, this article appears to be reliable and trustworthy, but its potential biases should be kept in mind when considering its content.