INTRODUCTION
Background
Advanced materials play a vital role in human well-being and economic security. Across industries, they shape product design, construction, and cost, with direct impact on national security, clean energy, and human welfare. Their significance is paramount in addressing industry-specific challenges and developing effective solutions.
Elastic constants are crucial in defining a material's mechanical properties: they form a matrix of coefficients that describes how a material responds to stress, deforms, and ultimately returns to its original shape. From these constants, key quantities such as the bulk modulus, the Lamé constants, and Young's modulus can be calculated efficiently, all of which are essential for understanding material behavior under various conditions.
Elastic constant coefficients can be estimated through various routes, including manufacturing, testing, calculation, and simulation. However, the selection of advanced materials typically relies on extensive experimental studies, which are both time-consuming and costly. According to a 2000 report by the National Research Council, while bringing a new consumer product to market can take 2 to 5 years, developing a new material may extend to 15 to 20 years, excluding the initial lab-scale invention phase. This significant investment of time and money highlights the need for more efficient strategies for the design, replacement, and optimization of target candidate materials.
The rapid advancement of science and technology has positioned machine learning as a promising solution to these challenges. The availability of large-scale experimental data, cost-effective computing systems, efficient data storage, powerful open-source software, and strong community support has enabled machine learning to contribute significantly to expediting the design and discovery of novel and improved materials.
Research objectives
The objective of this thesis is to investigate the application of machine learning approaches to predicting the elastic constants and other derived mechanical properties of multi-component systems.
Structure of the thesis
To clarify the solution method and present the results appropriately, the thesis is organized as follows:
Introduction: highlights the role and importance of mechanical properties and materials informatics, and especially the disadvantages of existing methods for estimating mechanical properties.
Literature review: surveys previous studies related to the research issues at hand, highlighting the limitations of existing methods and exploring promising solutions; the background context and the objectives of the research are also outlined.
Modeling and methodology: discusses the database and the model for elastic constant prediction.
Results and discussion: checks the reliability and performance of the proposed method through comparison with the work of other authors; the results are presented, visualized, and discussed.
Conclusions: summarizes the results and achievements, and suggests directions for further study.
LITERATURE REVIEW
Recent advancements in materials informatics, particularly through machine learning (ML) techniques, have significantly accelerated the design and discovery of innovative materials. By building on established statistical methods, these approaches extract valuable insights from data and provide a streamlined pathway for material development. Historically, bringing new materials from conception to market could span decades; modern data-driven strategies have the potential to drastically reduce both the time and the cost involved. Given their demonstrated impact, the future of materials science is increasingly driven by big-data-enabled discovery.
In the past decade, various machine learning (ML) approaches have significantly advanced materials design, with supervised, unsupervised, and semi-supervised learning models tackling challenges in materials clustering, classification, and property regression. Generative methods, inverse design, adaptive design, and active learning strategies have enabled efficient experimental design, leading to the autonomous synthesis of materials with targeted properties. The integration of natural language processing has revealed substantial potential for automated knowledge extraction in novel materials discovery. This progress has driven the growth of large-scale open-source materials databases and robust ML toolkits, alongside curated materials datasets and software repositories. As a result, the emerging field of materials informatics is expanding rapidly, supported by technological advancements and community engagement.
Machine learning applications in materials science aim to develop accurate property-prediction models, significantly improving the assessment of material properties such as mechanical behavior, thermodynamic stability, and electronic, magnetic, thermal, transport, and catalytic characteristics. Once established, these ML-based models serve as effective screening tools to identify promising candidate materials for further in-depth computations, simulations, or experimental evaluations, thereby streamlining the research process.
This thesis focuses on the elastic constants of multi-component systems across a vast chemical space. Once validated, such models offer significant advantages for material design challenges, which are crucial in the context of dwindling natural resources. The primary aim of this research is to develop a machine learning model that can quickly and accurately estimate elastic properties, including the elastic constant coefficients, Young's modulus, and hardness.
Numerous studies have focused on the elastic properties of metals, alloys, and crystalline materials. Furmanchuk et al. developed a random forest regressor to predict bulk moduli, demonstrating strong correlations with density functional theory (DFT) calculations and utilizing data from the TE Design Lab database. Similarly, Evans et al. employed a gradient boosting regressor to forecast the bulk and shear moduli of silica zeolites, relying solely on geometric features related to the zeolite's local geometry, structure, and porosity; their model provided accurate estimates for both moduli when compared to DFT predictions. Wang et al. explored various machine learning models, including neural networks and support vector machines, for the prediction of the elastic constants \( C_{ij} \) of binary alloys, training them on DFT calculations benchmarked against experimentally estimated values. In a related study, Wang et al. predicted the bulk (\( B \)) and shear (\( G \)) moduli of Fe-Cr-Al ternary alloys using composition and temperature as key features, yielding accurate predictions of the elastic properties. Wen et al. introduced a machine-learning-assisted approach to identify high-entropy alloys (HEAs) with exceptional hardness within the Al-Co-Cr-Cu-Fe-Ni system, while Chaudry et al. predicted high-performance aluminum alloys with elevated hardness by employing a weighted compositional-average technique for Al-Cu-Mg-x alloys. Furthermore, Zhu et al. explored the mechanical properties of Ti-alloys, examining the influence of Mo and Cr on microstructure, and successfully integrated machine learning models with experimental methods to design Ti-alloys with an excellent balance of strength and other mechanical properties.
Previous literature indicates that existing approaches rely on inadequate attribute sets for structure representation, leading to suboptimal accuracy of the resulting machine learning models. This thesis presents a deep neural network model designed to learn a machine-learning-based mapping between structures and their DFT-computed elastic constants \( C_{ij} \). The model's effectiveness is demonstrated through precise approximations of \( B \), \( G \), and \( \nu \), and its performance is further validated by comparison with experimentally measured elastic constants of multi-component alloys.
MODELING & METHODOLOGY
Data
The Materials Project offers open, web-based access to both known and predicted materials, along with powerful analysis tools for designing innovative materials. Pymatgen, an open-source Python library for materials analysis, facilitates data collection from the Materials Project database through its integration with the Materials Project REST API. The details of the computational methodology are available in the referenced literature: the elastic constants of each system are determined using a stress-strain approach, in which various strains are applied, the full stress tensor is obtained from density functional theory (DFT) calculations, and the elastic matrix is derived through a linear fit of the calculated stresses over the applied strain range.
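For illustration, the sketch below shows how such records might be pulled with pymatgen, assuming the legacy Materials Project REST API (MPRester.query) and a valid API key; the queried field names are indicative and may differ between API versions.

```python
from pymatgen.ext.matproj import MPRester

# Sketch only: pull entries that have elastic data from the Materials Project
# (legacy REST API; requires a personal API key).
with MPRester("YOUR_API_KEY") as mpr:
    entries = mpr.query(
        criteria={"elasticity": {"$exists": True}},  # keep entries with elastic tensors
        properties=["material_id", "pretty_formula", "structure", "elasticity"],
    )

for entry in entries[:3]:
    elastic = entry["elasticity"]
    print(entry["pretty_formula"],
          elastic["K_VRH"],   # Voigt-Reuss-Hill bulk modulus (GPa)
          elastic["G_VRH"])   # Voigt-Reuss-Hill shear modulus (GPa)
```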
Table 3.1 Structure information and the calculated results contained in the database
n – Total atomic number
ν – Poisson's ratio
Figure 3.1 Distribution of the calculated Poisson ratio, bulk modulus, and shear modulus obtained from DFT calculations
Figure 3.1 presents a log-log plot of the averaged bulk modulus against the averaged shear modulus for all systems in the dataset, with color coding representing the Poisson ratio. Bar plots additionally illustrate the distribution of materials with respect to their bulk and shear modulus values. The data show that most materials are concentrated within the range of 80 to 190 GPa for both the shear and bulk moduli. Thus, this diagram distills several well-known results in the field of elasticity and illustrates them with a large amount of data.
The dataset encompasses a wide range of component systems covering nearly all chemical elements of the periodic table. Each sample is defined by its structural information and basic physical properties, including the calculated elastic and compliance tensors. The dataset also contains the calculated bulk and shear moduli, along with averaged values derived from the Voigt-Reuss-Hill equations. Figure 3.2 visualizes the frequency of the elemental species present in the dataset.
Figure 3.2 Elements contained in the dataset. The color represents the frequency with which each element appears in the systems
Monoclinic structures can exhibit up to 13 independent elastic constants; however, we focused on 9 constants for all structures, as monoclinic structures represent only 3.8% of the dataset. The dataset includes DFT-computed values of the bulk modulus (\( B \)), Poisson's ratio (\( \nu \)), the compliance tensor, and the volume, as summarized in Table 3.1.
Figure 3.3 Correlation plot of features in the database
In the data preprocessing phase, outliers in \( C_{ij} \) and \( \nu \) were removed, specifically negative values of \( C_{ij} \) and values of \( \nu \) outside the typical range (0–1) for metallic alloys. The values of \( B \) and \( G \) were then estimated using the Voigt-Reuss-Hill equations.
In the context of elasticity, the Voigt and Reuss approximations are denoted by the subscripts V and R, respectively, with \( S_{ij} \) indicating the components of the elastic compliance tensor. The Voigt-Reuss-Hill values are the averages \( B = (B_V + B_R)/2 \) and \( G = (G_V + G_R)/2 \), from which Poisson's ratio is expressed as

\( \nu = \dfrac{3B - 2G}{2(3B + G)}. \)
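A minimal sketch of this post-processing step in Python (the function names are ours, for illustration only):

```python
def vrh_average(b_voigt, b_reuss, g_voigt, g_reuss):
    """Voigt-Reuss-Hill averages of the bulk and shear moduli (GPa)."""
    b = 0.5 * (b_voigt + b_reuss)
    g = 0.5 * (g_voigt + g_reuss)
    return b, g

def poisson_ratio(b, g):
    """Isotropic Poisson's ratio from bulk and shear moduli."""
    return (3.0 * b - 2.0 * g) / (2.0 * (3.0 * b + g))

# Example with illustrative bounds: B_V = 170, B_R = 160, G_V = 80, G_R = 70 (GPa)
b, g = vrh_average(170.0, 160.0, 80.0, 70.0)
print(round(poisson_ratio(b, g), 3))  # ~0.303
```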
Descriptor
A foundational approach to encoding compounds represents each compound as a vector based solely on the elements present, using elemental attributes without considering the crystal structure or stoichiometry. For instance, a binary compound \( A_x B_y \) may be encoded by properties such as the atomic radius of the pure element \( A \), the atomic radius of the pure element \( B \), and the experimentally known melting points of these elements, or by the compositionally averaged feature \( x F_A + y F_B \).
Here \( F_A \) and \( F_B \) are the selected features of elements \( A \) and \( B \), respectively. Such representations can be limiting, as the properties of pure elements do not always align with those observed in their compounds; a notable example is oxygen gas, whose elemental properties differ significantly from its anionic characteristics in oxide compounds.
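As a toy illustration of the compositionally averaged feature \( x F_A + y F_B \), the sketch below averages two elemental properties for a binary compound; the feature table is illustrative, not from the thesis:

```python
# Illustrative elemental feature table: atomic radius (pm) and melting point (K).
ELEMENT_FEATURES = {
    "Al": (143.0, 933.0),
    "Cu": (128.0, 1358.0),
}

def composition_average(composition):
    """Compositionally averaged feature vector sum_i x_i * F_i,
    where x_i are the normalized atomic fractions."""
    total = sum(composition.values())
    n_features = len(next(iter(ELEMENT_FEATURES.values())))
    averaged = [0.0] * n_features
    for element, amount in composition.items():
        fraction = amount / total
        for k, feature in enumerate(ELEMENT_FEATURES[element]):
            averaged[k] += fraction * feature
    return averaged

# Example: Al2Cu -> x_Al = 2/3, x_Cu = 1/3
print(composition_average({"Al": 2, "Cu": 1}))  # [138.0, 1074.67]
```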
The "atoms-in-compounds" approach enhances the basic technique by focusing on the identity of atoms within a compound while utilizing descriptions derived from existing compound data For instance, when analyzing a compound represented as 𝐴 𝑥 𝐵 𝑦, we can incorporate descriptors like the ionic radius of 𝐴, based on its bonding in selected compounds, or the oxide heat of formation of 𝐵 These specific descriptors can provide more accurate predictions than the properties of the individual pure elements, depending on the compound being modeled.
Significant research is devoted to creating new descriptors for encoding complex materials such as crystal structures. Instead of relying solely on space-group descriptions, crystal-structure prototypes can be used, and various algorithms have been developed and implemented toward probabilistic crystal structure prediction models. In addition to descriptors derived from experimental data and structural analysis, descriptors obtained from DFT computations are becoming increasingly popular.
In this thesis, we propose a descriptor that captures the structural information needed for predicting the elastic properties of a material.
Neural network
An artificial neural network (ANN) is a mathematical model that uses algorithms to identify patterns in data, loosely simulating the functioning of the human brain. The fundamental principles of ANNs have been studied extensively, with applications in data analysis, detection, and recognition. ANNs consist of interconnected neurons, each defined by an input, an output, and an activation function. The structure includes an input layer that receives data, an output layer that produces results, and hidden layers that process information. The architecture of an ANN is determined by the number of layers, the number of neurons per layer, their activation functions, connections, and loss functions, and is typically established through experimentation under various conditions.
Figure 3.4 The general learning circle diagram
The error function estimates the difference between the actual network outputs and the target values in the training set. Training a neural network consists of optimizing the connection weights \( w_k \) to minimize this error function, commonly referred to as the loss function; the backpropagation algorithm is primarily used to determine these weights. To assess the performance of the developed models, metrics such as the mean squared error (MSE), the mean absolute error (MAE), and the coefficient of determination \( R^2 \) are used.
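For reference, over \( P \) samples with targets \( y_p \), predictions \( \hat{y}_p \), and target mean \( \bar{y} \), these metrics take the standard forms

\( \mathrm{MSE} = \dfrac{1}{P}\sum_{p=1}^{P}(y_p - \hat{y}_p)^2, \qquad \mathrm{MAE} = \dfrac{1}{P}\sum_{p=1}^{P}\lvert y_p - \hat{y}_p\rvert, \qquad R^2 = 1 - \dfrac{\sum_{p}(y_p - \hat{y}_p)^2}{\sum_{p}(y_p - \bar{y})^2}. \)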
Methodology
The workflow of modern machine learning (ML) approaches is depicted in Figure 3.5: a chemical structure is decomposed into a descriptor, and a neural network is then used to derive an embedding representation from which the elastic constants are predicted.
Our regression model has vector-valued outputs: each data point \( (x_p, y_p) \) consists of an \( N \)-dimensional input \( x_p \) and a \( C \)-dimensional output \( y_p \),

\( x_p = [x_{1,p}\ x_{2,p}\ \dots\ x_{N,p}]^{T}, \qquad y_p = [y_{1,p}\ y_{2,p}\ \dots\ y_{C,p}], \)

in which the \( N \times 1 \) input column vector and the \( 1 \times C \) output row vector correspond to the feature vector of a structure and the flattened array of elastic constants \( C_{ij} \), respectively. To tune the weights, we invoke a regression cost function and minimize it so that this approximation holds as well as possible.
Figure 3.5 The general workflow for current ML schemes, in which a chemical structure is decomposed to a descriptor and a neural network is adopted to extract embedding representation for predicting elastic constant
In the second part of the model's architecture, a feed-forward neural network with \( C \) output units matching the target \( y_p \) is introduced. The elastic constant vector is modeled as \( y_p = H(x_p) \). For example, with a single hidden layer of \( M \) units, the model output is \( y_p = H(x_p) = W_2 \times g(W_1 \times x_p) \), where \( W_1 \) and \( W_2 \) are the \( M \times N \) and \( C \times M \) weight matrices of the hidden and output layers, respectively, \( g \) is a nonlinear activation function applied elementwise, and \( \times \) denotes the matrix product.
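A minimal NumPy sketch of this single-hidden-layer mapping (the dimensions and random weights are purely illustrative):

```python
import numpy as np

N, M, C = 32, 64, 9           # input, hidden, and output dimensions (illustrative)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(M, N))  # hidden-layer weights, M x N
W2 = rng.normal(size=(C, M))  # output-layer weights, C x M

def g(z):
    """Elementwise nonlinear activation (ReLU)."""
    return np.maximum(z, 0.0)

def H(x):
    """Single-hidden-layer feed-forward map y = W2 . g(W1 . x)."""
    return W2 @ g(W1 @ x)

x_p = rng.normal(size=N)      # feature vector of one structure
y_p = H(x_p)                  # predicted flattened elastic constants
print(y_p.shape)              # (9,)
```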
The feature vector \( x_p \) of a local structure is derived from the set of pairwise terms describing the interactions between the center atom and its neighboring atoms; each pairwise term is represented by a feature vector \( b_{ij} \).
Each \( b_{ij} \) characterizes the interaction between atoms \( i \) and \( j \) and captures the influence of the local structure of the neighboring atoms. Following the Behler approach with symmetry basis functions, we use a neural network to learn the basis functions: a vector-valued function \( f(b_{ij}) \) transforms the feature vector of a pairwise term into an embedding feature vector of specified dimension, denoted \( a_{ij} \). This function is applied uniformly to all atom pairs and is implemented as a deep neural network with three hidden layers, so that the embedding feature vector is \( a_{ij} = w_3 \times g(w_2 \times g(w_1 \times b_{ij})) \), where \( w_1 \), \( w_2 \), and \( w_3 \) are the weights of the hidden layers and \( g \) denotes a nonlinear activation function.
To leverage a common deep learning framework, we construct a representation matrix for each local structure, with the pairwise interaction terms ordered by distance to the center atom: \( B = (b_{i1}, b_{i2}, \dots, b_{in}) \), where \( n \) is the number of atoms surrounding atom \( i \). Because the number of atoms in a chemical environment varies, we use padding to standardize the dimensions, appending zero elements to shorter vectors. This yields a local-structure matrix whose rows and columns correspond to the maximum number of neighboring atoms and the dimension of the pairwise terms, respectively. Finally, we stack these matrices into a 3D input tensor representing the structure, with dimensions given by the number of atoms, the maximum number of neighboring atoms, and the dimension of the pairwise vectors. Our model employs convolutional networks, as illustrated in Figure 3.6, where 2D convolutional layers are used to extract hidden basis functions.
The deep neural network architecture for predicting the elastic constant coefficients first converts input structures into 3D tensors of pairwise terms. Conv2D layers then extract embedding features from these tensors, and the outputs of the convolutional layers are summed to aggregate the pairwise interactions of each atom with its neighbors. Finally, fully connected layers predict the coefficients \( C_{ij} \).
We designed the feature vector describing the pairwise interaction of atoms \( i \) and \( j \) as \( b_{ij} = \left( r_{ij}\, f_c(r_{ij}),\ \tfrac{1}{r_{ij}}\, f_c(r_{ij}), \dots \right) \), where \( r_{ij} \) is the interatomic distance and \( f_c \) is a cutoff function.
To compute the embedding feature vectors of the pairwise atomic interactions, we used a three-layer perceptron network applied identically to all pairwise terms. The embedding feature vector of each atom is obtained by summing the outputs over its interactions with neighboring atoms. In practice, we implemented this with 2D convolutional (Conv2D) layers using the TensorFlow/Keras library.
We used a \( 1 \times 1 \) kernel to extract the embedding vectors \( a_{ij} \) of the pairwise terms, with the number of embedding features determined by the number of filters in the Conv2D layers; specifically, we employed three Conv2D layers with 128 filters each. After passing through a Conv2D layer, the input tensor is transformed into a new tensor with the same dimensions except for the third dimension, which equals the number of filters. To obtain the feature vectors of the local structures, we summed over the third (neighbor) dimension of the output tensor, yielding a matrix \( X \) in which each row is the feature vector of a local structure. Finally, we implemented the network head for the elastic constants using two fully connected layers with 128 neurons each; the final layer has nine neurons representing the elastic constants. The network is trained by minimizing the mean absolute error between the DFT-computed elastic constant vector \( y_{c,p} \) and the prediction \( \hat{y}_{c,p} \) of our model,

\( L = \dfrac{1}{P} \sum_{p=1}^{P} \dfrac{1}{C} \sum_{c=1}^{C} \left| y_{c,p} - \hat{y}_{c,p} \right|. \)
The overall architecture of our model for predicting elastic constants, depicted in Figure 3.6, consists of multiple independent convolutional neural networks (CNNs) followed by two fully connected layers. To mitigate covariate shift, we applied batch normalization after each convolutional layer. L2-norm regularization was used to penalize the kernel parameters, and Dropout was employed to prevent overfitting by randomly disabling connections between layers. Rectified linear unit (ReLU) activations were applied to all hidden layers, with a linear activation in the output layer, and the network was optimized with the Adam optimizer.
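A condensed Keras sketch of this architecture as we read it (the tensor shapes, dropout rate, regularization strength, and learning rate are illustrative assumptions; pooling over atoms to obtain a per-structure vector is left implicit in the text and made explicit here):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

MAX_ATOMS, MAX_NEIGHBORS, PAIR_DIM = 64, 32, 2   # illustrative padded tensor shape

# Input: one 3D tensor per structure (atoms x neighbors x pairwise features).
inputs = layers.Input(shape=(MAX_ATOMS, MAX_NEIGHBORS, PAIR_DIM))

x = inputs
for _ in range(3):                                # three Conv2D layers, 128 filters each
    x = layers.Conv2D(128, kernel_size=1,         # 1x1 kernel acts on each pairwise term
                      kernel_regularizer=regularizers.l2(1e-4))(x)
    x = layers.BatchNormalization()(x)            # mitigate covariate shift
    x = layers.Activation("relu")(x)

# Sum over neighbors -> per-atom embeddings; sum over atoms -> structure vector.
x = layers.Lambda(lambda t: tf.reduce_sum(t, axis=2))(x)   # (batch, atoms, 128)
x = layers.Lambda(lambda t: tf.reduce_sum(t, axis=1))(x)   # (batch, 128)

x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.2)(x)                        # prevent overfitting
x = layers.Dense(128, activation="relu")(x)
outputs = layers.Dense(9, activation="linear")(x) # nine elastic constants C_ij

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="mean_absolute_error")
```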
RESULTS AND DISCUSSION
We assessed the performance of our deep neural network model for predicting the elastic constant coefficients on a dataset of 1,181 structures with DFT-calculated values of \( C_{ij} \). The dataset covers multi-component systems across a wide chemical space, featuring 44 metallic elements of the periodic table. We partitioned it into training and testing sets of 1,063 and 118 structures, respectively.
Figure 4.1 Comparison of predicted \( C_{11} \) (GPa) from the deep neural network against DFT calculations (left) and the mean absolute error learning curves during training (right)
Figure 4.1 shows the mean absolute error (MAE) of the elasticity model, with the learning curves decreasing for both the training and validation sets. The left panel of Figure 4.1 compares the elastic constant \( C_{11} \) predicted by the deep neural network with the values calculated by density functional theory (DFT). The results show strong consistency: most points of the scatter plot lie along a straight line of slope one, indicating that the predictions closely match the DFT results.
Table 4.1 Model performance for elastic constant prediction: root mean squared error, mean absolute error, and coefficient of determination estimated from the validation set
Table 4.1 presents the statistical metrics used to assess the model's performance, specifically the root mean squared error (RMSE) and the mean absolute error (MAE). Using the ReLU activation function helped lower the computational cost, and the model achieved RMSE and MAE values of 16.9324 GPa and 13.4939 GPa on the test set, respectively.
Figure 4.2 Comparison of (a) the bulk modulus (\( B \)) and (b) Poisson's ratio (\( \nu \)) from DFT calculations against the averaged values calculated from the \( C_{ij} \) predicted by the deep neural network
Moreover, our model showed a significant improvement over the RF model, which has an RMSE of 34.61 GPa and a coefficient of determination of 0.779.
Limited training data and the large scale of some elastic constants can lead to noticeable outlier points with relatively poor accuracy; this could be improved by gathering additional training data.
We used K-fold cross-validation to evaluate how well the trained model generalizes to new, unseen data. With K = 5, we obtained an average performance of 0.9767.
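A minimal scikit-learn sketch of this protocol, with model construction abstracted behind a hypothetical build_model helper:

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(X, y, build_model, k=5):
    """Average held-out score of a regression model over k folds."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True,
                                     random_state=0).split(X):
        model = build_model()                 # fresh, untrained model per fold
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))  # e.g. R^2
    return float(np.mean(scores))
```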
Next, the Voigt-Reuss-Hill equations (Equations 1) were used to estimate the bulk modulus (\( B \)) and shear modulus (\( G \)) from the ML-predicted constants for a randomly selected 10% test set, averaging the Voigt and Reuss values. These averages were then used to calculate Poisson's ratio (\( \nu \)) via Equation 2. The findings are shown in Figure 4.2, which compares the values of \( B \) and \( \nu \) derived from the deep neural network predictions with those obtained from DFT calculations.
The performance of the predictions of \( B \) and \( \nu \) is evaluated using the statistical metrics detailed in Table 4.1: the MAE is 6.3895 GPa for \( B \) and 0.0081 for \( \nu \), while the RMSE is 9.4941 GPa for \( B \) and 0.0104 for \( \nu \), indicating good agreement with the DFT computations.
Figure 4.3 Comparison of predicted \( C_{ij} \) (GPa) from the deep neural network against DFT calculations
Figure 4.3 compares the \( C_{ij} \) (GPa) values predicted by the deep neural network with those from DFT calculations. Nearly all components of the elastic tensor show strong agreement with the DFT results.
CONCLUSIONS
This study presents a machine learning approach to efficiently predict the elastic properties of alloys using a comprehensive database of DFT-computed elastic constants. Our method leverages deep neural networks to automatically extract hidden features that serve as structural descriptors, demonstrating significant improvements over previous studies that relied on predefined or compositionally averaged features. Additionally, the neural network models effectively reproduced the bulk modulus and Poisson's ratio, yielding results consistent with the DFT values.
PUBLICATIONS
Nguyen Van Quyen, Nguyen Van Thanh, Tran Quoc Quan, Nguyen Dinh Duc, Nonlinear forced vibration of sandwich cylindrical panel with negative Poisson’s ratio auxetic honeycombs core and CNTRC face sheets, Thin-Walled Structures, Volume 162, 2021, 107571, ISSN 0263-8231, https://doi.org/10.1016/j.tws.2021.107571
Nguyen Van Quyen, Nguyen Dinh Duc, Vibration and nonlinear dynamic response of nanocomposite multi-layer solar panels supported by elastic foundations, Thin-Walled Structures, Volume 177, 2022, 109412, https://doi.org/10.1016/j.tws.2022.109412
Tien-Cuong Nguyen, Van-Quyen Nguyen, Van-Linh Ngo, Quang-Khoat Than, Tien-Lam Pham, Learning hidden chemistry with deep neural networks, Computational Materials Science, Volume 200, 2021, ISSN 0927-0256 (Elsevier)
Van-Quyen Nguyen, Viet-Cuong Nguyen, Tien-Cuong Nguyen, Nguyen-Xuan-Vu Nguyen, Tien-Lam Pham, Pairwise interactions for potential energy surfaces and atomic forces using deep neural networks, Computational Materials Science, Volume 209, 2022, 111379, ISSN 0927-0256, https://doi.org/10.1016/j.commatsci.2022.111379
REFERENCES
[1] K Rajan, Materials informatics, Materials Today 8 (2005) 38–45 https://doi.org/10.1016/S1369-7021(05)71123-8
[2] G Mulholland, S Paradiso, Perspective: Materials informatics across the product lifecycle: Selection, manufacturing, and certification, APL Materials 4 (2016)
[3] D Morgan, R Jacobs, Opportunities and Challenges for Machine Learning in Materials Science, Annual Review of Materials Research 50 (2020) 71–103 https://doi.org/10.1146/annurev-matsci-070218-010015
[4] R Ramprasad, R Batra, G Pilania, A Mannodi-Kanakkithodi, C Kim, Machine Learning and Materials Informatics: Recent Applications and Prospects, Npj Computational Materials 3 (2017) https://doi.org/10.1038/s41524-017-0056-5
[5] K Butler, D Davies, H Cartwright, O Isayev, A Walsh, Machine learning for molecular and materials science, Nature 559 (2018) https://doi.org/10.1038/s41586-018-0337-2
[6] J Schmidt, M.R.G Marques, S Botti, M.A.L Marques, Recent advances and applications of machine learning in solid-state materials science, Npj Computational Materials 5 (2019) 83 https://doi.org/10.1038/s41524-019-0221-0
[7] R Vasudevan, G Pilania, P v Balachandran, Machine learning for materials design and discovery, Journal of Applied Physics 129 (2021) 70401 https://doi.org/10.1063/5.0043300
[8] A Agrawal, A Choudhary, Perspective: Materials informatics and big data: Realization of the “fourth paradigm” of science in materials science, APL Materials 4 (2016) 53208 https://doi.org/10.1063/1.4946894
[9] C Draxl, M Scheffler, NOMAD: The FAIR concept for big data-driven materials science, MRS Bulletin 43 (2018) 676–682 https://doi.org/10.1557/mrs.2018.208
[10] L Ward, C Wolverton, Atomistic calculations and materials informatics: A review, Current Opinion in Solid State and Materials Science 21 (2017) 167–176 https://doi.org/10.1016/j.cossms.2016.07.002
[11] A Jain, G Hautier, S.P Ong, K Persson, New opportunities for materials informatics: Resources and data mining techniques for uncovering hidden relationships, Journal of Materials Research 31 (2016) 977–994 https://doi.org/10.1557/jmr.2016.80
[12] B Sanchez-Lengeling, A Aspuru-Guzik, Inverse molecular design using machine learning: Generative models for matter engineering, Science 361 (2018) 360–365 https://doi.org/10.1126/science.aat2663
[13] L Chen, G Pilania, R Batra, T.D Huan, C Kim, C Kuenneth, R Ramprasad, Polymer Informatics: Current Status and Critical Next Steps, ArXiv abs/2011.00508 (2021)
[14] A Mannodi-Kanakkithodi, G Pilania, T Huan, T Lookman, R Ramprasad, Machine Learning Strategy for Accelerated Design of Polymer Dielectrics, Scientific Reports 6 (2016) https://doi.org/10.1038/srep20952
[15] R Batra, H Dai, T.D Huan, L Chen, C Kim, W.R Gutekunst, L Song, R Ramprasad, Polymers for Extreme Conditions Designed Using Syntax-Directed Variational Autoencoders, Chemistry of Materials 32 (2020) 10489–10500 https://doi.org/10.1021/acs.chemmater.0c03332
[16] T Lookman, P Balachandran, D Xue, R Yuan, Active learning in materials science with emphasis on adaptive sampling using uncertainties for targeted design, Npj Computational Materials 5 (2019) https://doi.org/10.1038/s41524-019-0153-8
[17] F Häse, L.M Roch, A Aspuru-Guzik, Next-Generation Experimentation with Self-Driving Laboratories, Trends in Chemistry 1 (2019) 282–291 https://doi.org/10.1016/j.trechm.2019.02.007
[18] V Tshitoyan, J Dagdelen, L Weston, A Dunn, Z Rong, O Kononova, K Persson, G Ceder, A Jain, Unsupervised word embeddings capture latent knowledge from materials science literature, Nature 571 (2019) 95–98 https://doi.org/10.1038/s41586-019-1335-8
[19] E Kim, K Huang, A Saunders, A McCallum, G Ceder, E Olivetti, Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning, Chemistry of Materials 29 (2017) 9436–9444 https://doi.org/10.1021/acs.chemmater.7b03500
[20] A Jain, S.P Ong, G Hautier, W Chen, W.D Richards, S Dacek, S Cholia, D Gunter, D Skinner, G Ceder, K.A Persson, Commentary: The Materials Project: A materials genome approach to accelerating materials innovation, APL Materials 1 (2013) 011002 https://doi.org/10.1063/1.4812323
[21] S Curtarolo, et al., AFLOWLIB.ORG: A distributed materials properties repository from high-throughput ab initio calculations, Computational Materials Science 58 (2012) 227–235 https://doi.org/10.1016/j.commatsci.2012.02.002
[22] K Choudhary, et al., The joint automated repository for various integrated simulations (JARVIS) for data-driven materials design, Npj Computational Materials 6 (2020) 173
[23] C Draxl, M Scheffler, The NOMAD laboratory: from data sharing to artificial intelligence, Journal of Physics: Materials 2 (2019) 36001 https://doi.org/10.1088/2515-7639/ab13bb
[24] L Ward, et al., Matminer: An open source toolkit for materials data mining, Computational Materials Science 152 (2018) 60–69 https://doi.org/10.1016/j.commatsci.2018.05.018
[25] F Pedregosa, G Varoquaux, A Gramfort, V Michel, B Thirion, O Grisel, M Blondel, P Prettenhofer, R Weiss, V Dubourg, J Vanderplas, A Passos, D Cournapeau, M Brucher, M Perrot, E Duchesnay, Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830
[26] L Himanen, M.O.J Jäger, E.V Morooka, F Federici Canova, Y.S Ranawat, D.Z Gao, P Rinke, A.S Foster, DScribe: Library of descriptors for machine learning in materials science, Computer Physics Communications 247 (2020) 106949 https://doi.org/10.1016/j.cpc.2019.106949
[27] B Blaiszik, L Ward, M Schwarting, J Gaff, R Chard, D Pike, K Chard, I Foster, A data ecosystem to support machine learning in materials science, MRS Communications 9 (2019) 1125–1133 https://doi.org/10.1557/mrc.2019.118
[28] L Ghiringhelli, C Carbogno, S Levchenko, F Mohamed, G Huhs, M Lueders, M Oliveira, M Scheffler, Towards efficient data exchange and sharing for big-data driven materials science: Metadata and data formats, Npj Computational Materials 3 (2017) https://doi.org/10.1038/s41524-017-0048-5
[29] A Talapatra, B.P Uberuaga, C.R Stanek, G Pilania, A Machine Learning Approach for the Prediction of Formability and Thermodynamic Stability of Single and Double Perovskite Oxides, Chemistry of Materials 33 (2021) 845–858 https://doi.org/10.1021/acs.chemmater.0c03402
[30] A.C Rajan, A Mishra, S Satsangi, R Vaish, H Mizuseki, K.-R Lee, A.K Singh, Machine-Learning-Assisted Accurate Band Gap Predictions of Functionalized MXene, Chemistry of Materials 30 (2018) 4031–4038 https://doi.org/10.1021/acs.chemmater.8b00686
[31] A Seko, T Maekawa, K Tsuda, I Tanaka, Machine learning with systematic density-functional theory calculations: Application to melting temperatures of single- and binary-component solids, Phys Rev B 89 (2014) 54303 https://doi.org/10.1103/PhysRevB.89.054303
[32] G Pilania, P v Balachandran, C Kim, T Lookman, Finding New Perovskite Halides via Machine Learning, Frontiers in Materials 3 (2016) https://doi.org/10.3389/fmats.2016.00019
[33] G Pilania, C.N Iverson, T Lookman, B.L Marrone, Machine-Learning-Based Predictive Modeling of Glass Transition Temperatures: A Case of Polyhydroxyalkanoate Homopolymers and Copolymers, Journal of Chemical Information and Modeling 59 (2019) 5013–5025 https://doi.org/10.1021/acs.jcim.9b00807
[34] A Seko, A Togo, H Hayashi, K Tsuda, L Chaput, I Tanaka, Prediction of Low- Thermal-Conductivity Compounds with First-Principles Anharmonic Lattice- Dynamics Calculations and Bayesian Optimization, Phys Rev Lett 115 (2015)
[35] M Andersen, S v Levchenko, M Scheffler, K Reuter, Beyond Scaling Relations for the Description of Catalytic Materials, ACS Catalysis 9 (2019) 2752–2759 https://doi.org/10.1021/acscatal.8b04478
[36] B Weng, Z Song, R Zhu, Q Yan, Q Sun, C Grice, Y Yan, W.-J Yin, Simple descriptor derived from symbolic regression accelerating the discovery of new perovskite catalysts, Nature Communications 11 (2020) 3513 https://doi.org/10.1038/s41467-020-17263-9
[37] L Breiman, Random Forests, Machine Learning 45 (2001) 5–32 https://doi.org/10.1023/A:1010933404324
[38] A Furmanchuk, A Agrawal, A Choudhary, Predictive analytics for crystalline materials: bulk modulus, RSC Adv 6 (2016) 95246–95251 https://doi.org/10.1039/C6RA19284J
[39] P Gorai, D Gao, B Ortiz, S Miller, S.A Barnett, T Mason, Q Lv, V Stevanović, E.S Toberer, TE Design Lab: A virtual laboratory for thermoelectric material design, Computational Materials Science 112 (2016) 368–376 https://doi.org/10.1016/j.commatsci.2015.11.006
[40] N Duffy, D Helmbold, Boosting Methods for Regression, Machine Learning 47
[41] J.D Evans, F.-X Coudert, Predicting the Mechanical Properties of Zeolite Frameworks by Machine Learning, Chemistry of Materials 29 (2017) 7833–7839 https://doi.org/10.1021/acs.chemmater.7b02532
[42] Wang et al., Prediction of elastic constants by integrating density functional theory with machine learning techniques, Computational Materials Science 138 (2017) 135–148
[43] A.K Jain, J Mao, K.M Mohiuddin, Artificial neural networks: a tutorial,
Computer (Long Beach Calif) 29 (1996) 31–44 https://doi.org/10.1109/2.485891
[44] H Drucker, C.J.C Burges, L Kaufman, A Smola, V Vapnik, Support Vector Regression Machines, in: M.C Mozer, M Jordan, T Petsche (Eds.), Advances in Neural Information Processing Systems 9, MIT Press, 1997