Smart Grid

Big data analytics and machine learning are pivotal in integrating renewable energy sources with energy storage and forecasting services while accounting for uncertainties and customer behavior. Mining the big data available in smart grids can help prevent blackouts by uncovering underlying patterns in the highly complex structures and thousands of system variables of power networks, paving the way for timely, accurate prediction of unstable network components.

Our research focuses on developing a real-time big data analytics platform for both current and future smart grids. The project encompasses big data analytics, predictive and forecasting analytics, and cyber-physical security and privacy to enable accurate, efficient energy consumption forecasting. By leveraging the immense computational power of the TAMUQ supercomputer, our researchers can process and analyze vast quantities of data to optimize grid performance [1]. High-performance computing (HPC) facilitates real-time decision-making, enhancing the adaptability of smart grids and contributing to improved energy management, reduced operational costs, and increased sustainability. This research will lead to the development and testing of a dynamic energy management system built on a big data analytics platform, optimizing energy resources and load management for both energy-efficiency and demand-response programs based on real-time processing.

Specifically, we employ a multiprocessing approach to load forecasting, using machine learning models and parallel processing to reduce training time and improve accuracy [3]. The data obtained from our industry partner Iberdrola contain 2.2 billion records from 100,000 transformers. Figure 1 illustrates the proposed big data platform. The prediction models are executed on a supercomputer with Intel Xeon E5-2680 v4 2.40 GHz processors, 64 GB of DDR4 memory, and CentOS 7, using SLURM scheduling, Intel Omni-Path Fabric 100 Series switches, and 2560 MB of memory per node. Additional experiments were conducted on Microsoft Azure using 8 to 32 vCPUs. With 32 processors, the methodology requires only 4 minutes to train models for 1,000 transformers for an hourly day-ahead forecast, handling approximately 24 million records.

In [4], the proposed approach implements a distributed tree-based machine learning technique for short-term load forecasting in a multi-AMI (advanced metering infrastructure) environment using Apache Spark. The method optimizes computational resource allocation across multiple Spark jobs, reducing job completion time while minimizing communication overhead, and incorporates multiple layers of parallel processes executed sequentially.
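The following is a minimal sketch of the per-transformer multiprocessing pattern described above. The model choice (GradientBoostingRegressor), the lagged-feature scheme, and the synthetic load_transformer_data() stub are illustrative assumptions, not the specific pipeline of [3]:

```python
# Sketch: one forecasting model per transformer, trained in parallel across
# worker processes. Model, features, and data stub are illustrative only.
from multiprocessing import Pool

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

HORIZON = 24  # hours used as lagged features / length of day-ahead forecast


def load_transformer_data(tid: int, n_hours: int = 8760) -> np.ndarray:
    """Stand-in for reading one transformer's hourly load history."""
    rng = np.random.default_rng(tid)
    hours = np.arange(n_hours)
    return 50 + 10 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, n_hours)


def make_features(load: np.ndarray):
    """Predict the load at hour t from the preceding HORIZON hours."""
    X = np.stack([load[i : i + HORIZON] for i in range(len(load) - HORIZON)])
    y = load[HORIZON:]
    return X, y


def train_one(tid: int):
    """Train one transformer's model and produce its day-ahead forecast."""
    load = load_transformer_data(tid)
    X, y = make_features(load)
    model = GradientBoostingRegressor().fit(X, y)

    # Recursive hourly forecast for the next 24 hours.
    window = load[-HORIZON:].copy()
    forecast = []
    for _ in range(HORIZON):
        nxt = model.predict(window.reshape(1, -1))[0]
        forecast.append(nxt)
        window = np.append(window[1:], nxt)
    return tid, forecast


if __name__ == "__main__":
    with Pool(processes=32) as pool:  # 32 processors, as in the reported run
        forecasts = dict(pool.map(train_one, range(1000)))  # 1,000 transformers
```

Because each transformer's model is independent, the workload is embarrassingly parallel, which is consistent with training time scaling down nearly linearly as processors are added.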

Fig. 1. Proposed big data platform

Table 1 compares the computational resources used and their respective performance. The TAMUQ supercomputer running Spark with 5 nodes, 20 cores, and 120 GB of memory per node outperforms the other two configurations, achieving the fastest time per iteration at 0.409 s (a speedup of roughly 5.9x over the single-core baseline). For n clusters, the number of iterations required to train the clustered data is n/j, where j is the number of jobs submitted simultaneously; for example, training n = 1,000 clusters with j = 10 concurrent jobs requires 100 iterations. Because the processes access the n clusters repeatedly, the training data for these clusters are cached in memory. The work also examines optimal scheduling algorithms for load prediction across multiple transformers, as sketched below.
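A minimal PySpark sketch of this concurrent-job pattern follows: j Spark jobs run at a time, each training a tree-based model on one cluster's cached data. The input path, column names ("features", "load", cluster_id), and RandomForestRegressor are assumptions; [4] may use a different tree-based learner and feature pipeline:

```python
# Sketch: concurrent Spark jobs, one tree-based model per transformer cluster,
# with each cluster's training data cached in memory for repeated access.
from concurrent.futures import ThreadPoolExecutor

from pyspark.sql import SparkSession
from pyspark.ml.regression import RandomForestRegressor

spark = (
    SparkSession.builder.appName("stlf-multi-ami")
    .config("spark.scheduler.mode", "FAIR")  # interleave concurrent jobs fairly
    .getOrCreate()
)

# Assumed input: one row per training sample, with a cluster_id column, a
# "features" vector column, and a "load" label column.
df = spark.read.parquet("hdfs:///ami/training_data")  # hypothetical path

N_CLUSTERS = 100  # n: transformer clusters
J_JOBS = 10       # j: jobs submitted simultaneously -> n/j = 10 iterations


def train_cluster(cluster_id: int):
    """One Spark job: fit a tree-based forecaster on a single cluster."""
    part = df.filter(df.cluster_id == cluster_id).cache()  # reused -> keep hot
    model = RandomForestRegressor(featuresCol="features", labelCol="load").fit(part)
    part.unpersist()
    return cluster_id, model


# Driver-side thread pool: each thread submits one Spark job, so J_JOBS jobs
# run concurrently and the cluster scheduler packs their stages onto the nodes.
with ThreadPoolExecutor(max_workers=J_JOBS) as pool:
    models = dict(pool.map(train_cluster, range(N_CLUSTERS)))
```

Submitting multiple jobs from driver-side threads keeps all cores busy even when a single cluster's training cannot saturate the allocation, which is the resource-allocation trade-off the scheduling analysis addresses.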

Table 1. Comparison of computational resources and performance

Methodology                               | Capacity                        | Total time  | Time/iteration
TAMUQ supercomputer                       | 1 node, 1 core, 32 GB/node      | 2400.00 s   | 2.400 s
TAMUQ supercomputer (parallel processing) | 8 nodes, 20 cores, 64 GB/node   | 1010.02 s   | 1.010 s
TAMUQ supercomputer (Spark)               | 5 nodes, 20 cores, 120 GB/node  | 409.52 s    | 0.409 s