How South Africa Is Solving One of HPC’s Biggest Challenges
Last Updated on July 20, 2023 by Editorial Team
Author(s): Jan Rowell
Originally published on Towards AI.
The Future of HPC Leadership | Towards AI
A strategic approach to a student cluster competition nets top prizes and helps address a critical skills shortage
High-performance computing (HPC) drives economic growth and scientific progress around the world. But HPC skills are in short supply, and the problem is growing more critical as HPC workloads increasingly encompass artificial intelligence (AI) and big data as well as modeling and simulation.
The Republic of South Africa is using international student cluster competitions as one way to expand its HPC workforce. Its efforts have been so successful that it has emerged as a global powerhouse, with other teams and national leaders asking how it achieves its results. That success also has South Africa looking toward a more pan-African focus for future competitions. We spoke with South African students and leaders to learn more.
A Global Challenge
South Africa has fielded a national team for the past seven years in the International Supercomputing Conference (ISC) Student Cluster Competition, held in Frankfurt, Germany, in conjunction with what is now known as the ISC High-Performance conference. The nation took top honors in 2019 and in three previous years. When they weren’t taking home the top prize, the South African teams secured two second-place finishes and one third-place finish, giving them major hardware all seven years. The 2019 team also earned an honorable mention for its performance on the LINPACK benchmark.
South Africa’s leaders view the competition as a way to expose undergraduates to HPC, increase their knowledge of the field, and ultimately help fill South Africa’s HPC pipeline. “The paucity of skills is the biggest challenge in HPC, and it is not unique to South Africa,” said Dr. Happy Marumo Sithole, who directs South Africa’s Center for High Performance Computing (CHPC). “The student competition leads to more students getting excited about HPC and developing skills that are critical for South Africa and the rest of the world. It is helping us remove a major roadblock.”
Strategic Goals
South Africa is the continent’s leader in high-performance computing. Along with Australia, it will be one of two countries to host the massive radio telescopes and supercomputers that will comprise the Square Kilometer Array (SKA) project, an international collaboration to field a next-generation radio observatory. In 2016, CHPC installed Africa’s fastest supercomputer, a petaflops Dell cluster powered by Intel Xeon processors. The nation established the CHPC in 2007 under the management of its Council for Scientific and Industrial Research (CSIR). CHPC has since funded major research in areas ranging from astrophysics to climate change to HIV modeling.
“High-performance computing supports South Africa’s systems of research, innovation, and industry,” said Dr. Sithole. “It is essential to industrial modernization and large-scale science projects that will improve social and economic development and scientific understanding in South Africa, across the continent, and around the world.”
The student cluster competition emphasizes practical challenges and applications in high-performance computing. Each team designs and builds a cluster and uses it to run a set of industry-standard HPC benchmarks and widely used applications. Reflecting real-world constraints, the judging is based on the performance achieved without exceeding a power limit of 3 kilowatts. The 2019 teams ran benchmarks that tested floating-point and integer performance, sustainable memory bandwidth, and other critical metrics. The applications were relevant to astrophysics, quantum chemistry, hydrodynamics and gravity, fluid dynamics, deep learning model training, climate modeling, and other areas.
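The power constraint described above can be illustrated with a small sketch. This is not the competition's scoring method. It assumes a run only counts if its peak power draw stays within the cap, and every configuration name, score, and wattage below is a hypothetical placeholder, not a measurement from any team.

```python
from dataclasses import dataclass

POWER_CAP_WATTS = 3000.0  # the 3 kilowatt limit mentioned in the article


@dataclass(frozen=True)
class ClusterConfig:
    name: str
    score: float       # benchmark result in arbitrary units (hypothetical)
    peak_watts: float  # highest power draw during the run (hypothetical)


def best_under_cap(configs, cap=POWER_CAP_WATTS):
    """Return the highest-scoring config whose peak draw is within the cap.

    Returns None when no configuration fits under the cap.
    """
    eligible = [c for c in configs if c.peak_watts <= cap]
    return max(eligible, key=lambda c: c.score, default=None)


def perf_per_watt(config):
    """Score delivered per watt of peak power draw."""
    return config.score / config.peak_watts


# Hypothetical candidates, invented purely for illustration.
CANDIDATES = [
    ClusterConfig("all nodes, full clocks", 100.0, 3400.0),
    ClusterConfig("all nodes, reduced clocks", 92.0, 2950.0),
    ClusterConfig("fewer nodes, full clocks", 80.0, 2800.0),
]

if __name__ == "__main__":
    winner = best_under_cap(CANDIDATES)
    print(winner.name, perf_per_watt(winner))
```

In this made-up data set, the fastest configuration is disqualified by its power draw, so the choice falls to the best performer among those that fit under the cap.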
“When students learn about HPC and see all that it can do, they become interested and excited,” Dr. Sithole said.
National Team, Year-Round Process
To fulfill their strategic objectives, South Africa’s HPC leaders decided to create the country’s team through a nationwide competition, choosing all new members each year. The 2019 team consisted of Kaamilah Desai and Anita de Mello Koch from the University of the Witwatersrand in Johannesburg, and Dillon Heald, Stefan Schroeder, Jehan Singh, and Clara Stassen from the University of Cape Town. They competed against 13 teams from China, Estonia, Germany, Poland, Scotland, Spain, Switzerland, and the United States.
The competition is a year-round activity in South Africa, with the first step toward selection taking place shortly after the Frankfurt competition concludes. In this first phase, CSIR invites universities around the country to send four-person teams to Cape Town, where students learn the basics of building a Linux-based cluster and create a prototype cluster in the cloud. The first-round competition for the 2020 team was held in early July in Cape Town and drew 20 teams from across the country.
The gender balance of the 2019 team came about “accidentally on purpose,” according to CHPC’s David Macleod, who, along with fellow CHPC computer engineer Matthew Cawood, coaches the team. “We bake diversity in at the start,” Macleod said. “We reach out to minorities and give them the encouragement to participate. For 2020, we’re requiring each team at this first level to have at least one woman. After the first level, we go strictly by merit.”
In the second round, held in December, teams build small clusters out of parts supplied by CHPC. They install software and run benchmarks, and are judged on the cluster design and application performance.
Designing a Powerful Cluster
Once the team is chosen, the students spend the next six months designing the cluster for the international competition and optimizing their software stack. Students conduct research and identify the technologies they believe will give them the best chance to win the competition. Then, team advisors work with global technology sponsors to secure hardware. The South African divisions of international enterprises have been supportive of the project from its earliest years, even before the teams started racking up wins.
“To have any chance of winning, you need excellent hardware,” said Macleod. “The final scores are often very close, so getting the best performance per watt and wringing every bit of performance out of the system can make the difference. If you’re not coming with the very, very best hardware you can, you’re going to have a difficult time.”
The 2019 team identified the Intel Xeon Platinum 8180 processor as offering the best capabilities, including performance/watt, for the competition’s diverse workloads and benchmarks. Intel provided the processor, and Dell donated a six-node PowerEdge with two 8180 processors per node. NVIDIA and Mellanox were also sponsors, and the cluster used V100 GPUs for training and inferencing for a climate modeling AI solution and for the HPL portion of the HPC Challenge benchmarks. All other codes ran on the Intel Xeon Platinum processors.
“We found the Intel Xeon Platinum processor had the best performance per watt and the best solutions for managing the power,” said Heald. “We used the model-specific registers to control the thermal design power when we needed to. We also had the libraries and compilers, which worked very well with the Intel processors.”
The processor’s high core count was another plus. “With 28 cores per processor, we had the flexibility to clock the cores lower to get better performance and power efficiency when we needed to,” said Singh.
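Singh's point about clocking cores lower can be sketched with a common textbook approximation. It assumes dynamic power per core grows roughly with the cube of clock frequency while throughput grows roughly linearly with frequency and core count. Only the 28-core figure comes from the article. The power budget, the static power, and the coefficient are hypothetical values chosen for illustration and do not describe the Xeon Platinum 8180.

```python
# Hypothetical model parameters (assumptions, not processor specifications).
POWER_BUDGET_W = 200.0     # power available to one processor
STATIC_W_PER_CORE = 1.0    # power a core draws regardless of clock speed
K_W_PER_GHZ3 = 0.5         # dynamic power coefficient: watts per GHz cubed
F_MAX_GHZ = 3.5            # highest clock the model allows


def max_frequency_ghz(active_cores, budget_w=POWER_BUDGET_W):
    """Highest per-core clock that keeps all active cores within the budget.

    Model: power per core = STATIC_W_PER_CORE + K_W_PER_GHZ3 * f**3.
    """
    dynamic_budget = budget_w / active_cores - STATIC_W_PER_CORE
    if dynamic_budget <= 0:
        return 0.0  # the budget cannot even cover static power
    return min(F_MAX_GHZ, (dynamic_budget / K_W_PER_GHZ3) ** (1.0 / 3.0))


def throughput(active_cores, budget_w=POWER_BUDGET_W):
    """Aggregate throughput in core-GHz, assuming perfectly parallel work."""
    return active_cores * max_frequency_ghz(active_cores, budget_w)


if __name__ == "__main__":
    for cores in (14, 28):
        print(cores, max_frequency_ghz(cores), throughput(cores))
```

Under these assumptions, 28 cores at a lower clock deliver more total throughput within the same power budget than 14 cores at a higher clock. Real workloads scale less cleanly, so the model shows only the direction of the trade-off.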
The students said other elements of the Intel HPC environment delivered further value. Swift, the hydrodynamics and gravity code, benefited from Intel Advanced Vector Extensions 512. CP2K, the quantum chemistry and solid-state physics package, got a boost from Intel OpenMP and the Intel Math Kernel Library. “We used Intel Parallel Studio to compile many of our codes because the Intel compilers gave us such good performance gains,” Singh added.
Beyond the Platform
In addition to donating the students’ desired hardware elements, the team’s sponsors and advisors were generous with advice, encouragement, and insights. Dell brought the students to visit its Texas headquarters, where they discussed design considerations and reviewed their eventual design with HPC experts from Dell and the Texas Advanced Computing Center (TACC). TACC, which houses the world’s fifth most powerful supercomputer, has a longstanding relationship with CHPC and waterfalls its end-of-life clusters to South Africa.
“The conversations with Dell and TACC were extremely helpful when we were learning about different types of processors and how the implementations can affect things within a cluster like latency, maximum throughput, and power limitations,” said Heald.
Fostering teamwork was also important. Each team member had primary responsibility for one application or benchmark, and each shadowed or provided backup for another teammate. Everyone emphasized having fun, keeping the atmosphere light, and encouraging questions. Students practiced on the available hardware and received the final hardware a week or two before they left for the competition. Once they arrived in Frankfurt, they reassembled the cluster, installed the software stack, and performed power profiling. When the starter bell rang, they were ready to go.
“It takes three things to win the competition,” Macleod said in summary. “You need talented students with an aptitude for the tasks. You need the right equipment. And you need to prepare. The team spent six months — from December, when they were chosen, to June, when they went to Frankfurt — and it showed. They were able to go in calm and stay focused. It really is a simple recipe.”
Advancing HPC, Advancing Africa
South Africa’s commitment to the student cluster competition is helping the nation grow a new crop of HPC leaders, expand its supply of human capital, and accelerate its scientific, technological, and economic progress.
Student competitors have gone from being unfamiliar with HPC to being eager to pursue a career in the field.
For example, Koch, who is majoring in electrical engineering, said, “The competition introduced me to HPC. If I hadn’t done the competition, I wouldn’t have been thinking of HPC and maybe never would have looked into it. Now, I can see that this field is going to become even more important as the world changes. It will be important to be part of this field.”
Similarly, Schroeder stated, “Our sponsors and CHPC have made possible an amazing opportunity that has opened my eyes to supercomputing and altered my career path. I hope to work with supercomputers in the future at the SKA project.”
The competition’s impact is poised to reach beyond South Africa. “Other African nations have realized the value of the student challenge and would like to see it in their countries,” said Dr. Sithole. “We have received interest in how they can develop their own competitive teams or participate in our program. Our intention is to move toward a more continental approach, and we have received funding from the South African government to start the process. We expect to see more involvement at ISC next year and over the long term.”
____________________
Jan Rowell is an award-winning writer who covers technology trends and advances in HPC, AI, healthcare, and other areas.
Published via Towards AI