Research

Sheikh Ghafoor has many years of research experience; his current focus is on high-performance computing, cyber security, computational earth science, and computer science education. Much of his research is funded by NSF, NASA, NSA, and DoE, and he presently has 10 active research projects across multiple collaborations, including ORNL, UNC-Chapel Hill, UT-Chattanooga, AU, UA, UW, LSU, UMass Amherst, and UTSA. Sheikh has also had the privilege of working with many outstanding students over the years, and their combined efforts have resulted in numerous publications in conferences and journals. Below is a description of some of the research Sheikh is currently working on.

High-Performance Computing

Sheikh's current research in high-performance computing focuses on advancing support for elastic applications in HPC systems and on addressing the challenges posed by the increasingly heterogeneous nature of these systems.

Elastic parallel applications can change the number of resources they use dynamically at runtime, enabling a new class of highly dynamic and event-driven applications. As HPC systems grow larger and more complex, the mean time between component failures continues to decrease, and elastic applications offer a path toward proactive fault tolerance. However, current resource management systems, middleware, and runtime libraries do not provide adequate support for them. Sheikh's research into elastic applications focuses on three directions: (1) developing programming models for elastic applications and solving real-world problems with these models, (2) developing and analyzing scheduling algorithms for parallel workloads that mix traditional and elastic applications, and (3) developing and prototyping the syntax and semantics of API extensions to MPI, along with extensions to the Slurm resource management system, to support elastic parallel applications. This work is funded partially by NSF and DoE.
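A minimal sketch of what an elastic application's main loop might look like is shown below. The MPIX_Comm_resize call is hypothetical, standing in for the kind of MPI and Slurm extension this work prototypes; it is illustrative only and not part of the MPI standard.

    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        MPI_Comm work = MPI_COMM_WORLD;   /* communicator over the current process set */
        int steps = 100;

        for (int step = 0; step < steps; step++) {
            int rank, size;
            MPI_Comm_rank(work, &rank);
            MPI_Comm_size(work, &size);
            (void)rank; (void)size;       /* placeholders for this rank's real work */

            /* ... compute this rank's share of the current step ... */

            /* Hypothetical extension: ask the resource manager (e.g., an
             * extended Slurm) whether the job should grow or shrink, and
             * obtain a communicator over the new process set.  After a
             * resize, the application redistributes its data before
             * continuing.
             *
             * MPIX_Comm_resize(work, &work);
             */
        }

        MPI_Finalize();
        return 0;
    }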

High-performance computing system architecture undergoes a significant change every decade or so. Large HPC systems are becoming increasingly diverse and complex with the addition of hardware accelerators and other offloading devices. These and other factors, such as growing memory and networking complexity, are pushing HPC systems toward greater heterogeneity. Productivity and performance portability are becoming major challenges in heterogeneous HPC; the cost of porting existing applications to new HPC systems is often much higher than the cost of the system itself, and the performance of the ported applications (e.g., speedup and scalability) often degrades. Sheikh and his research group are developing a performance-portable framework for heterogeneous HPC environments, along with a structured grid cellular automata application that uses this framework. The purpose of the project is to evaluate the framework's performance at scale and determine: (1) How much performance degradation does a framework incur compared to hand-tuned code for a specific architecture? (2) How can we achieve performance portability, and how do applications implemented using different frameworks differ? (3) How can we measure productivity using conventional metrics for stencil computation on heterogeneous architectures, and what is the productivity gain from using a framework? This research is funded partially by ORNL.
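For context, the computational core of such a structured-grid application is a stencil sweep like the one sketched below. Here the loop nest is parallelized with plain OpenMP for a CPU; a performance-portability framework aims to map the same single-source loop to CPU threads, GPUs, or other accelerators. The grid size and coefficients are illustrative.

    #include <stdlib.h>

    #define N 1024    /* illustrative grid dimension */

    /* One 5-point stencil sweep: each interior cell becomes the average of
     * its four neighbors (a Jacobi-style update). */
    void stencil_step(const double *in, double *out) {
        #pragma omp parallel for collapse(2)
        for (int i = 1; i < N - 1; i++) {
            for (int j = 1; j < N - 1; j++) {
                out[i * N + j] = 0.25 * (in[(i - 1) * N + j] + in[(i + 1) * N + j]
                                       + in[i * N + (j - 1)] + in[i * N + (j + 1)]);
            }
        }
    }

    int main(void) {
        double *a = calloc(N * N, sizeof(double));
        double *b = calloc(N * N, sizeof(double));
        for (int step = 0; step < 10; step++) {   /* a few illustrative sweeps */
            stencil_step(a, b);
            double *tmp = a; a = b; b = tmp;      /* swap buffers */
        }
        free(a);
        free(b);
        return 0;
    }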

Cyber Security

Sheikh's current research in cyber security centers on cyber-physical systems, with particular interest in security problems in embedded and SCADA systems and in in-vehicle network security.

Embedded systems often fall victim to malware simply because they were not designed with security in mind. Unlike traditional computing systems, compromised embedded systems cannot be detected with direct forensic methods: they are often deployed in hard-to-reach locations, were not designed to be disassembled and analyzed, operate under real-time constraints, and are vulnerable to fileless malware. Remote attestation is a security protocol designed to help administrators detect malware compromises in embedded systems across the internet, and it can be deployed across a wide variety of embedded system architectures, hardware, and peripherals. Remote attestation generally involves two network entities: the prover and the verifier. The prover is an untrusted device that may have fallen victim to malware, while the verifier is a trusted network entity. The verifier issues a challenge to the prover in order to determine whether the prover is infected. Sheikh and his research group have developed a companion-assisted technique that examines the response time of the prover and decides whether the prover has been compromised. Techniques that examine the executable memory sizes and process scan times of provers to determine whether they have been compromised are also being developed. This project is supported by students working toward their graduate degrees under the CyberCorps: Scholarship for Service program.
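A minimal verifier-side sketch of the timing idea is shown below: issue a challenge, time the prover's response, and flag the device if the response takes longer than expected. The transport function and the thresholds are hypothetical stand-ins, not the group's actual implementation.

    #include <stdio.h>
    #include <time.h>

    /* Hypothetical stand-in for the transport layer: a real verifier would
     * send the challenge nonce over the network and block until the prover's
     * checksum response arrives. */
    static int send_challenge_and_wait(const unsigned char *nonce, size_t len) {
        (void)nonce;
        (void)len;
        return 0;
    }

    /* Returns 1 if the prover's response time exceeds the expected bound,
     * suggesting extra (possibly malicious) code is running on the device. */
    int prover_looks_compromised(const unsigned char *nonce, size_t len,
                                 double expected_s, double tolerance_s) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        send_challenge_and_wait(nonce, len);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        return elapsed > expected_s + tolerance_s;
    }

    int main(void) {
        unsigned char nonce[16] = {0};   /* illustrative challenge */
        printf("compromised? %d\n",
               prover_looks_compromised(nonce, sizeof nonce, 0.010, 0.002));
        return 0;
    }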

The operations of current and emerging vehicles are controlled by hundreds of small, purpose-built embedded computers called electronic control units (ECUs). ECUs communicate over in-vehicle networks using real-time protocols such as the Controller Area Network (CAN) and Local Interconnect Network (LIN). While these protocols are robust and reliable, they were not designed with security in mind and are therefore vulnerable to attacks. The lack of flexible, configurable environments in which researchers can simulate and test existing and emerging systems and protocols poses a serious constraint on in-vehicle security research, and reconfigurable platforms are preferable to statically configured ones because ECU configurations vary from vehicle to vehicle. Sheikh's research group is currently developing XiveNet, an extensible, innovative, and open architecture testbed for in-vehicle network security research.

The CAN protocol is the backbone of communication among devices in modern automobiles. It has many traits desirable in embedded systems, including low cost, low power consumption, reliability, and simplicity. However, CAN suffers from many security vulnerabilities that cannot be addressed by security solutions designed for general-purpose computers and networks: such solutions tend to violate the real-time requirements of automotive networks or run afoul of cost and backward-compatibility constraints. Furthermore, almost all currently proposed solutions violate one or more practical constraints of CAN, making them difficult for the automobile industry to adopt. We have developed a secure CAN protocol, called SecCAN, that uses lightweight encryption and message authentication via segmentation-based shared secret group keys. SecCAN can effectively prevent masquerade and replay attacks, and we are working on improving it to address other security threats, such as denial-of-service attacks. We are also working to validate SecCAN using real-world data on our XiveNet testbed. This work has been funded by NSA and ORNL.
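To make the constraints concrete, the sketch below shows one generic way an authenticated classical CAN frame could be laid out: with only eight data bytes per frame, payload space must be traded for a message counter (replay protection) and a truncated MAC (masquerade protection). This is an illustration of the general idea, not the SecCAN design, and the toy MAC stands in for a real lightweight keyed MAC.

    #include <linux/can.h>
    #include <stdint.h>
    #include <string.h>

    /* Placeholder MAC: stands in for a lightweight keyed MAC computed over
     * the frame ID, counter, and payload with the shared group key.
     * Not secure; illustration only. */
    static uint16_t toy_mac(uint32_t id, uint8_t counter,
                            const uint8_t *payload, size_t n, uint16_t key) {
        uint16_t m = key ^ (uint16_t)id;
        for (size_t i = 0; i < n; i++)
            m = (uint16_t)((m << 3) | (m >> 13)) ^ payload[i];
        return m ^ counter;
    }

    /* Pack 5 payload bytes, a 1-byte message counter, and a 2-byte truncated
     * MAC into a single 8-byte classical CAN frame. */
    void build_authenticated_frame(struct can_frame *f, uint32_t id,
                                   const uint8_t payload[5], uint8_t counter,
                                   uint16_t group_key) {
        uint16_t mac = toy_mac(id, counter, payload, 5, group_key);
        f->can_id  = id;
        f->can_dlc = 8;
        memcpy(f->data, payload, 5);
        f->data[5] = counter;                /* receiver rejects stale counters */
        f->data[6] = (uint8_t)(mac >> 8);    /* truncated MAC, high byte        */
        f->data[7] = (uint8_t)(mac & 0xffu); /* truncated MAC, low byte         */
    }

    int main(void) {
        struct can_frame f;
        uint8_t payload[5] = {0x11, 0x22, 0x33, 0x44, 0x55};
        build_authenticated_frame(&f, 0x123, payload, 7, 0xBEEF);
        return 0;   /* a real sender would now write f to a SocketCAN socket */
    }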

Computational Earth Science

Sheikh is interested in applying high-performance and distributed computing technologies to solving significant earth science and environmental engineering problems.

Despite the importance of lakes in the Earth's ecosystem, we have a limited understanding of how they change over time, how they affect climate and weather systems, and what processes impact lake reservoirs. Scientists at UNC and UW are currently monitoring lake water storage in the United States, France, Bangladesh, Pakistan, India, and Nepal as part of an ongoing project that allows everyday citizens to measure lake water levels using simple and intuitive gauges. In collaboration with UNC-Chapel Hill and UW, our Tennessee Tech team has been developing the computational infrastructure for the lake observational data: a cloud-based system that lets citizens submit lake height measurements via text, app, or web interface. These measurements are curated, validated, stored, combined with satellite measurements, and made available to research scientists. The upcoming NASA SWOT satellite will simultaneously measure water surface elevation and inundation extent, and the Tennessee Tech team is currently prototyping an app for measuring the water/land boundary to validate SWOT inundation extent measurements. This data will then be used to develop machine learning models that provide accurate lake area measurements from satellite imagery when ground data is not available. This project has been funded by NASA since 2017.
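As an illustration of the validation step in such a pipeline, the sketch below applies a simple plausibility check to a citizen-submitted gauge reading before it would be stored; the bounds and jump threshold are hypothetical, not the project's actual rules.

    #include <math.h>
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
        double height_m;       /* reported lake height read from the gauge    */
        double gauge_min_m;    /* lowest mark on this gauge                   */
        double gauge_max_m;    /* highest mark on this gauge                  */
        double last_height_m;  /* most recent accepted reading for this lake  */
    } reading_t;

    /* Returns true if the reading passes basic plausibility checks; readings
     * that fail would be routed to manual curation instead of being stored. */
    bool reading_is_plausible(const reading_t *r) {
        /* Reject values outside the physical range of the gauge. */
        if (r->height_m < r->gauge_min_m || r->height_m > r->gauge_max_m)
            return false;
        /* Flag implausibly large jumps from the last accepted reading
         * (0.5 m is a hypothetical threshold). */
        if (fabs(r->height_m - r->last_height_m) > 0.5)
            return false;
        return true;
    }

    int main(void) {
        reading_t r = { 2.31, 0.0, 4.0, 2.25 };   /* example reading */
        printf("accept? %d\n", reading_is_plausible(&r));
        return 0;
    }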

Sheikh is also working with domain scientists in hydrology, hydraulics, and remote sensing to apply HPC and other modern computing technologies to earth science and environmental engineering problems at scale. One such project is a collaboration between faculty in Tennessee Tech's Civil and Environmental Engineering department and researchers at ORNL to develop high-resolution, multi-model, predictive two-dimensional flood models. Sheikh's group is developing an architecture-agnostic parallel implementation of these flood models for HPC systems. We have simulated Hurricane Harvey, a Category 4 hurricane that made landfall in Texas and Louisiana in August 2017, on TACC's Stampede2 and ORNL's Summit. On Summit, we were able to simulate Hurricane Harvey's landfall at a 5-meter resolution using 768 GPUs in just under an hour. This work is funded by ORNL.
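At the heart of any such distributed 2D grid code is a halo (ghost-cell) exchange between neighboring subdomains before each time step. The sketch below shows this for a simple row-wise decomposition using standard MPI calls; the grid layout and dimensions are illustrative, not the flood model's actual data structures.

    #include <mpi.h>
    #include <stdlib.h>

    #define NX 1024   /* illustrative number of columns per rank */

    /* grid holds (local_rows + 2) * NX values: row 0 and row local_rows + 1
     * are ghost rows filled from the neighboring ranks. */
    void exchange_halos(double *grid, int local_rows, MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Send the first interior row up; receive the bottom ghost row from below. */
        MPI_Sendrecv(&grid[1 * NX], NX, MPI_DOUBLE, up, 0,
                     &grid[(local_rows + 1) * NX], NX, MPI_DOUBLE, down, 0,
                     comm, MPI_STATUS_IGNORE);
        /* Send the last interior row down; receive the top ghost row from above. */
        MPI_Sendrecv(&grid[local_rows * NX], NX, MPI_DOUBLE, down, 1,
                     &grid[0], NX, MPI_DOUBLE, up, 1,
                     comm, MPI_STATUS_IGNORE);
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int local_rows = 256;                               /* illustrative */
        double *grid = calloc((local_rows + 2) * NX, sizeof(double));
        exchange_halos(grid, local_rows, MPI_COMM_WORLD);
        free(grid);
        MPI_Finalize();
        return 0;
    }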

Computer Science Education

The demand for parallel and distributed computing (PDC) and HPC skills is at an all-time high. Multicore and many-core processors are becoming increasingly pervasive, even in areas where computing power has traditionally been limited, such as embedded systems. Modern desktops and laptops ship with processors containing multiple cores and often a programmable GPU, and in industry, server farms, cloud computing resources, and even supercomputers are increasingly common. Many everyday applications take advantage of multiple cores and/or GPUs, so understanding how parallelism and distributed computing affect problem-solving is crucial. However, most undergraduate programs do not teach these skills, and students are typically trained exclusively to think and program sequentially. To meet these growing demands, universities need to present a range of PDC and HPC knowledge and skills across multiple levels in computer science and computer engineering. There are, however, numerous and significant challenges to overcome: curricula are often not up to date, instructors are not typically trained in PDC or HPC concepts, and there is a lack of PDC and HPC resources, tools, and hands-on exercises available to instructors.

Sheikh enjoys sharing PDC and HPC concepts with students and is devoted to improving PDC and HPC education. He is a member of the NSF/IEEE-TCPP Curriculum Initiative on Parallel and Distributed Computing - Core Topics for Undergraduates and is helping lead the pervasive and emerging topics working group for Version 2.0-beta of the curriculum. He helps organize the Edu series of workshops (EduPar, EduHPC, EduHiPC, and Euro-EduPar) dedicated to PDC/HPC education. Sheikh is working to develop self-contained, easily adoptable, hands-on plugged and unplugged curriculum modules for early computer science and engineering classes. As part of this effort, he has conducted week-long faculty development workshops on integrating PDC into early computer science classes every summer since 2017. So far, over 80 faculty from more than 70 U.S. colleges have participated in these training workshops, and the participants have subsequently integrated aspects of PDC into their classes.