Polyconvex anisotropic hyperelasticity with neural networks Dominik K. Klein1,*, Mauricio Fernández2, Robert J. Martin3, Patrizio Neff3 and Oliver Weeger1 1Cyber-Physical Simulation Group, Department of Mechanical Engineering & Centre for Computational Engineering, Technical University of Darmstadt, Dolivostr. 15, 64293 Darmstadt, Germany 2Access e.V., Intzestr. 5, 52072 Aachen, Germany 3Chair for Nonlinear Analysis and Modeling, Faculty of Mathematics, University of Duisburg-Essen, Thea-Leymann-Str. 9, 45127 Essen, Germany *Corresponding author, email: klein@cps.tu-darmstadt.de October 29, 2021 Abstract In the present work, two machine learning based constitutive models for finite deformations are proposed. Using input convex neural networks, the models are hyperelastic, anisotropic and fulfill the polyconvexity condition, which implies ellipticity and thus ensures material stability. The first constitutive model is based on a set of polyconvex, anisotropic and objective invariants. The second approach is formulated in terms of the deformation gradient, its cofactor and determinant, uses group symmetrization to fulfill the material symmetry condition, and data augmentation to fulfill objectivity approximately. The extension of the dataset for the data augmentation approach is based on mechanical considerations and does not require additional experimental or simulation data. The models are calibrated with highly challenging simulation data of cubic lattice metamaterials, including finite deformations and lattice instabilities. A moderate amount of calibration data is used, based on deformations which are commonly applied in experimental investigations. While the invariant-based model shows drawbacks for several deformation modes, the model based on the deformation gradient alone is able to reproduce and predict the effective material behavior very well and exhibits excellent generalization capabilities. In addition, the models are calibrated with transversely isotropic data, generated with an analytical polyconvex potential. For this case, both models show excellent results, demonstrating the straightforward applicability of the polyconvex neural network constitutive models to other symmetry groups. Key words: constitutive modeling, nonlinear elasticity, anisotropic hyperelasticity, polyconvexity, ellipticity, material stability, soft materials, metamaterials, invariants, structural tensors, parameter identification, data-driven modeling, machine learning, input convex neural networks Accepted version of manuscript published in the Journal of the Mechanics and Physics of Solids. Date accepted: October 29, 2021. DOI: 10.1016/j.jmps.2021.104703. License: CC BY-NC-ND 4.0 1 Introduction In the last decades, a vast amount of highly specialised materials has been developed and, with advancing requirements in engineering applications, the trend is growing. In particular, with recent advances in additive and advanced manufacturing technologies, flexible and functional mechanical metamaterials and composites 1 mailto:klein@cps.tu-darmstadt.de https://doi.org/10.1016/j.jmps.2021.104703 https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode are being developed [77]. As a consequence, numerous constitutive models have been formulated, each specifically designed for the mechanical characteristics of a comparatively small class of (soft) materials [15, 84]. However, while the specific formulations may be different, the theoretical considerations that lead to physically sensible and mathematically well-posed models stay the same, i.e., ellipticity, thermodynamic consistency, objectivity, material symmetry, etc. [34]. Recent progress in the field of machine learning (ML) has sparked the development of data-driven nu- merical methods that avoid the explicit formulation of constitutive models and purely operate on discrete stress-strain data [14, 45, 64], as well as of data-driven constitutive models that employ reduced bases [22, 23, 88], polynomials [43, 87], or artificial neural networks (ANNs) [21, 49, 54, 86] for the representations of nonlinear energy potentials or stress-strain relationships. In particular, the latter approach offers a high flexibility and applicability to a wide range of materials due to the universal approximation properties of ANNs [36]. Furthermore, they can also be formulated to fulfill important material theoretical considerations, e.g., the material symmetry condition [20, 52]. In González, Chinesta, and Cueto [27], an analytical model was extended using ML methods. While preserving favorable properties of the analytical model, the ML extension can improve the models performance and, thus, the applicability to different materials, which is shown in González et al. [28] for vascular soft tissues. In Fernández et al. [20], two anisotropic material models for finite deformations are proposed. The first model is based on an analytical model [37], whose function space is extended using ANNs. While preserving favorable analytical properties of the model, the flexibility is significantly improved. For the second approach, the method of group symmetrization is introduced to fulfill the material symmetry. With the six independent components of the right Cauchy-Green tensor C = FTF as input quantity for an ANN, the model offers a highly flexible, objective hyperelastic potential, which is able to reproduce the challenging effective behavior of cubic beam-lattice metamaterials, including lattice insta- bilities. The model proposed by Linka et al. [52] uses structural tensors, c.f. [90], processed through ANNs, to formulate a set of invariants reflecting the intrinsic material anisotropy and, using these invariants as input for another ANN, predicts the potential of the hyperelastic structure. By using a Lagrangian multiplier to enforce incompressibility, c.f. [34, Section 6.3], the model is applicable to elastomeric material behavior, and shows excellent results for Treloar’s experimental data on vulcanised rubber. Both the models introduced in [20] and [52] include the objectivity and material symmetry condition in their formulation. In Heider, Wang, and Sun [31], a neural network based constitutive model for anisotropic elastoplastic materials is proposed, and different methods are examined to fulfill the objectivity condition, e.g., the representation of stress and strain tensors in their spectral decomposition for the input and output data of the neural network. In the context of finite elasticity theory, existence and stability of solutions for boundary-value problems are strongly linked to the notion of polyconvexity introduced by Ball [4, 5]. From a material theory point of view, polyconvexity is advantageous since it implies ellipticity of the corresponding constitutive model, while being more straightforward to include into the model formulation than the ellipticity condition itself. Ellipticity ensures material stability [62, 89], and is thus important for numerical applications such as the finite element method. As will be shown for the model of Fernández et al. [20], the polyconvexity condition is easily violated when using unrestricted ANNs without caution. As this may lead to a loss of ellipticity and thus to material instability, the polyconvexity condition should be treated with special care in the model formulation. The theoretical aspects of polyconvexity are still subject to current research [25, 58, 59], and the formula- tion of polyconvex models remains a challenging task [56, 57]. For a long time after its initial conception, the polyconvexity condition was practically restricted to isotropic material response, as no anisotropic formulation was available that ensured at the same time: polyconvexity, objecticity, material symmetry and a stress-free reference configuration. In fact, the fulfillment of multiple constitutive requirements at the same time can be seen as “the main open problem of the theory of material behavior” (Truesdell and Noll [79, Section 20]). In a landmark paper, Schröder and Neff [70] derived a formulation which fulfills all of the former mentioned requirements, quickly followed by [18, 29, 38, 41, 68, 71, 72]. It was shown that questions of material stability could neatly be avoided while being able to match experimental data [6]. The approaches are based on second and fourth order structural tensors, which can reproduce a wide range of anisotropies. Combining the structural tensors with the right Cauchy-Green deformation tensor, sets of invariants can be derived. By a suitable construction of polynomials of these invariants, constitutive models can be created which fulfill all of the former mentioned restrictions. By using polyconvex invariants instead of non-polyconvex invariants, constitutive models can be improved [11]. Subsequently, also finite strain finite element methods tailored to 2 the discretization of polyconvex material models were developed [10, 66]. It must be emphasized, however, that the polyconvexity condition is a purely mathematical framework that ensures ellipticity, but unlike ellipticity, to the best knowledge of the authors, is not a fundamental physical requirement. For example, polyconvexity imposes restrictions upon arbitrarily large strains that will never occur in actual experiments. Here, we adopt polyconvexity as a suitable means to ensure overall material stability, i.e., to avoid loss of ellipticity. Otherwise, this would be cumbersome to be checked a posteriori. Note that the only real physical requirement is satisfaction of ellipticity in a compact set including the identity. But even this can be very challenging without polyconvexity [61, 62]. Coming back to the field of ML, the difficulty of formulating physically sensible models, e.g., material models which fulfill multiple constitutive requirements, is still an open problem [16, 42, 82, 83]. Several constitutive modeling approaches found in the literature consider convexity properties, however, none of them known to the authors fulfills the polyconvexity condition. In Ghaderi, Morovati, and Dargazany [24], an attempt is made to formulate polyconvex potentials based on ANNs. However, for the neural network core, non-convex activation functions are used such as the hyperbolic tangent, and consequently, the potential proposed in [24] violates the polyconvexity condition, which follows from corollary A.4. In Vlassis, Ma, and Sun [81], a feed-forward neural network (FFNN) based hyperelastic model for anisotropic material behavior is proposed, and the convexity of the resulting potential w.r.t. the right Cauchy-Green tensor C is examined for a special material. However, while the potentials are convex in C for some examined deformation modes, the model only approximates the convexity condition and, consequently, it may be violated for other deformations. Furthermore, convexity in C does not ensure polyconvexity, as not all components of C are convex in the deformation gradient F . Therefore, convexity in C does not imply ellipticity, and exhibits no physical significance at all. The constitutive model proposed in Xu, Huang, and Darve [85] identifies the Cholesky factor of a tangent stiffness matrix to describe the behavior of the material. By using symmetric positive definite neural networks, the formulation is closely linked to convex potentials, and numerical robustness of the model is ensured. While the model was able to predict both time-dependent and plastic problems, it requires a large amount of data for the calibration. A FFNN based constitutive model for nearly incompressible anisotropic hyperelasticity is proposed in Tac et al. [78]. By using additional terms in the objective function which enforce semi positive-definiteness of the FFNN’s Hessian, the neural network is approximately convex. However, the polyconvexity of the invariants which are used as input quantities for the neural network is not examined. For example, the invariant Ī2 = 1 2 ( tr C̄ )2 − 1 2 tr C̄ 2 with C̄ = J−2/3C is not polyconvex [29, Lemma 2.4]. Therefore, in [78], the polyconvexity condition is violated by the choice of input arguments. In the present work, two polyconvex constitutive models are introduced, which are both based on input convex neural networks (ICNNs), see [2], and formulated for hyperelastic, anisotropic material behavior and finite deformations. ICNNs are a special class of FFNNs which are, through a suitable network architecture, constructed as convex functions. As already discussed, there is a variety of data-driven or machine learning methods which are used for constitutive modeling. However, for the construction of polyconvex potentials, convex FFNNs (=̂ ICNNs) are a very natural choice, as the simple mathematical structure of FFNNs offers a straightforward application of analytically received convexity conditions, c.f. Appendix A. While there are several applications for ICNNs in, e.g., convex optimization [2, 12, 13], the application towards polyconvex constitutive models has, to the best of the authors’ knowledge, not been studied yet. The first model developed in this work is based on a set of invariants already introduced in Schröder, Neff, and Ebbing [73], which fulfills the objectivity and material symmetry condition by construction. Using ICNNs, substantially more complex functions for the potential compared to polynomial approaches can be created, while preserving the polyconvexity of the model. While this model can be seen as a straightforward extension of invariant- polynomials to the more flexible function space ANNs offer, the second constitutive model is specifically designed for machine learning. Formulated directly in the deformation gradient, its cofactor and determinant, the objectivity condition is not fulfilled by construction. That objectivity, i.e., invariance of the energy W (QF ) = W (F ) under rotations Q ∈ SO(3), is not automatically satisfied, may be surprising at first glance. Usually, hyperelastic formulations come with objectivity built in a priori. Indeed, objectivity is not derived from an experimental observation, but is a physical law, c.f. sections 17 and 19 in Truesdell and Noll [79]. However, in consideration of a useful model for a compact set of deformation gradients F = Dφ, the possible error in not exactly satisfying objectivity remains controllable. The benefit in giving up exact objectivity is the increased flexibility in combining all the needed constitutive requirements. We remind that the major obstacle in originally extending polyconvexity to the anisotropic setting was coming from exact objectivity 3 combined with the stress-free reference configuration and material symmetry. Here, the objectivity condition is approximated using data augmentation, following [51]. The approach is based on mechanical considerations, and requires no further simulation or experimental data. The anisotropy is taken into account with the group symmetrization already introduced in Fernández et al. [20]. Both models can be calibrated with a moderate amount of data, which is shown for the highly challeng- ing behavior of cubic beam-lattice metamaterials, using synthetic homogenization data. Modern additive and advanced manufacturing methods allow for a variety of tailored materials with specifically designed mi- crostructure, consisting of, e.g., beams and shells, which leads to mechanical characteristics not encountered in classical materials, e.g., due to lattice instabilites within the microstructures [8, 26, 40, 44, 50, 53]. Nonlin- ear multiscale simulations for this class of metamaterials can be executed by homogenizing the microstructure [60] either in a current multiscale setting using the FE2 method [26], or sequentially based on the formulation of an effective constitutive model [20, 39, 43], as applied here. The outline of the manuscript is as follows. In Sect. 2, the present work starts with a short introduction to the basics of constitutive modeling relevant to this work. In Sect. 3, the lack of polyconvexity for models with unrestricted ANN architecture is discussed, and the two polyconvex ML based models are introduced. Finally, in Sect. 4, the models are calibrated to the highly challenging, homogenized behavior of cubic lattice metamaterials and compared with each other and to a conventional polyconvex model. The application to another material symmetry, i.e., transverse isotropy, is demonstrated in Sect. 5. In Sect. 6, some issues raised by the use of ML techniques in nonlinear elasticity theory are discussed. After the conclusion in Sect. 7, some general properties of ICNNs are introduced and discussed in Appendix A. Notation Throughout this work, tensor compositions and contractions are denoted by (AB)ij = AikBkj , a · b = aibi = ⟨a, b⟩, A : B = AijBij and (A : A)ij = AijklAkl, respectively, with vectors a and b, second order tensors A and B and fourth order tensor A. The tensor product is denoted by ⊗, the second order identity tensor by 1. The first Fréchet derivative of a function f w.r.t. X is denoted by DXf , the second Fréchet derivative (or Hessian) is denoted by D2 Xf . For the function composition f(g(x)) the compact notation f ◦ g ◦ x is applied. The set of invertible second order tensors with positive determinant is denoted by GL+(3) := { X ∈ R3×3 | detX > 0 } , the special orthogonal group in R3 by SO(3) := { X ∈ R3×3 | XTX = 1, detX = 1 } . 2 Basics of material theory The hyperelastic potential W : GL+(3) → R , F 7→ W (F ) (1) corresponds to the strain energy density stored in the body Ω ⊂ R3 due to the deformation φ : Ω → R3, and depends on the deformation gradient F = Dφ [34]. To ensure a physically sensible and mathematically well-posed formulation, several restrictions on the potential must be considered. The restrictions relevant to the present work are shortly introduced in the following: The reference configuration of the body must be stress-free, i.e., for the first Piola-Kirchhoff stress S = DFW (F ) (2) it holds that S(1) = 0 [34]. In fact, this is implied by the condition that the potential (1) attains its unique, global minimum at the identity, i.e., W (1) = 0 and W (F ) ≥ 0 [34]. The principle of objectivity [79] states that the material behavior has to be independent of the observer, which means that the potential is invariant under the transformation W (QF ) = W (F ) ∀F ∈ GL+(3) , Q ∈ SO(3) . (3) Following (2), this implies the invariance of the stress tensor under transformations according to S(QF ) = QS(F ) ∀F ∈ GL+(3) , Q ∈ SO(3) . (4) For models formulated in terms of the right Cauchy-Green tensor C = FTF , i.e., as W = W (C), the material objectivity condition is automatically fulfilled, which is an advantage compared to models depending directly 4 on the deformation gradient. For anisotropic materials with symmetry group G ⊂ SO(3), the strain energy must be invariant under the symmetry transformation [30] W (F Q) = W (F ) ∀F ∈ GL+(3) , Q ∈ G ⊂ SO(3) , (5) which implies the stress invariance condition S(F Q) = S(F )Q ∀F ∈ GL+(3) , Q ∈ G ⊂ SO(3) . (6) The growth condition W (F ) → ∞ as detF → 0+ ( ⇔ 1 detF → ∞ ) (7) captures the physical consideration that for infinitely large volumetric compression, an infinite amount of energy is required. There are several other growth conditions known in constitutive modeling, which regard the material behavior for very large deformations and are often referred to as coercivity conditions.1 However, while the case of large volumetric compression (7) is important to consider especially for highly compressible materials, coercivity conditions regard material behavior which usually lies outside the considered deformation modes. Therefore, they are of rather theoretical interest, and will not be considered throughout this work. In finite elasticity theory, the existence of minimizers for the underlying variational functionals is guar- anteed if the energy potential (1) fulfills the polyconvexity condition introduced by Ball [4, 5] and an ad- ditional coercivity condition [47]. The potential W ( F ) is polyconvex if and only if there exists a function P : R3×3 × R3×3 × R → R such that W (F ) = P(F, Cof F, detF ) , (8) so that P is convex in its arguments (F, Cof F, detF ). The function P is in general non-unique [70]. Due to the reason mentioned above, the coercivity condition is not considered in this work. For convex func- tions which are sufficiently smooth, the Hessian matrix is positive semi-definite [75], and the polyconvexity condition can be formulated as δξ ·D2 ξP(ξ) · δξ ≥ 0 ∀ ξ, δξ , (9) with the rearranged arguments ξ = (F, Cof F, detF ) ∈ R19 and δξ ∈ R19 [17]. From this point on, the notation ξ = (F, Cof F, detF ) ∈ R19 will be used to adress the argument in the polyconvexity contexts more compactly. Polyconvexity implies ellipticity, while being more straightforward to include into the model formulation than the ellipticity condition itself. The ellipticity (or rank-one convexity) condition [62, 89] (a⊗ b) : D2 FW (F ) : (a⊗ b) ≥ 0 ∀ a, b ∈ R3 (10) ensures material stability of the constitutive model. 3 Polyconvex constitutive models based on FFNNs Feed-forward neural networks (FFNNs) are universal approximators [36], meaning that they can represent continuous functions of arbitrary complexity. FFNN based constitutive models exploit this important prop- erty by using them as highly flexible functions within the model formulations, e.g., to represent the energy potential W defined in (1). This is in contrast to conventional approaches, which rely on a human choice for the representation of W , thus potentially reducing the possible function space. Furthermore, through the right choice of network architecture, FFNNs can be constructed as convex functions, which is an often required property in the context of constitutive modeling. In finite elasticity theory, the construction of potentials which are convex in the deformation gradient, its cofactor and determinant is of special interest, which is referred to as polyconvexity [5]. By using convex FFNNs, i.e., ICNNs, hyperelastic potentials can be generated which satisfy the polyconvexity condition, see 1For a coercive function, f (x) → ∞ as ∥x∥ → ∞ holds. 5 (8). Polyconvexity implies ellipticity, and thus ensures material stability of the constitutive models. The convexity condition is not trivially fulfilled by arbitrary FFNNs. Consequently, for FFNN based models found in the literature a loss of polyconvexity can often be detected, which may lead to a loss of ellipticity and thus material instability. This is now discussed for the approach of Fernández et al. [20]. Fernández et al. [20] introduced several ML based constitutive models, from which we will examine the potential model. The model uses a FFNN core for the hyperelastic potential, with the right Cauchy-Green tensor C = FTF as input, and fulfills the material symmetry condition (5) with a group symmetrization of the potential, see eq. (20). With the six independent components of C as input, the model fulfills the objectivity condition (3) per construction. The internal FFNN used in this work is built upon multiple compositions of the Softplus function, cf. (A.12), with unrestricted weights. This model violates the polyconvexity condition in two aspects. First of all, only the principal diagonal elements of the right Cauchy-Green tensor C are convex in the deformation gradient F , while the remaining elements of C are neither convex nor concave.2 Additionally, compositions of Softplus functions with arbitrary parameters are not convex, as is discussed in corollaries A.4 and A.8 in App. A. Furthermore, while the metamaterials under consideration in [20] are highly compressible, the volume compression condition (7) is not considered in the model formulation. Therefore the model does not ensure a physical sensible behavior for the limiting case of infinitely large volumetric compression. A brief introduction to convex FFNNs, i.e., ICNNs, and the notation used throughout this work can be found in App. A. In corollaries A.4 and A.6 the standard conditions for the fulfillment of convexity are provided. Activation functions fulfilling these conditions are presented in theorem A.7 and corollary A.8. These activation functions lead to the ICNN cores discussed in proposition A.9. In a nutshell, an ICNN is easily constructed based on standard FFNNs with input vector X based on the following restrictions: • the first hidden layer A1 is neuron-wise convex with respect to X (e.g., by employing convex activation functions as Softplus, s(x) = log(1 + exp(x)), in each neuron) • from the second A2 to the last hidden layer AH , each hidden layer is neuron-wise convex and non- decreasing with respect to the previous hidden layer (e.g., by employing s(x) with non-negative weights in each neuron) • the scalar-valued output layer a is convex and non-decreasing with respect to the last hidden layer. These restrictions imply that the composition a ◦ AH ◦ . . . ◦ A2 ◦ A1 ◦ X = ā(X) is convex in X. These architectures are shortly denoted as ICNNs from now on. For several passages, the abbreviation A = AH ◦ . . . ◦A1 will be used for the core (i.e., the hidden layers) of the ICNN. The collection of all weights and biases of the ICNN will be abbreviated by the internal parameter vector p. We now present two polyconvex ICNN-based approaches for anisotropic hyperelastic constitutive model- ing, extending the works of Schröder, Neff, and Ebbing [73] and Fernández et al. [20]. 3.1 Model based on invariants Polyconvex model based on invariants In Schröder, Neff, and Ebbing [73], a polyconvex potential for trigonal, tetragonal and cubic symmetry groups is proposed, see also [17]. In the following, we will shortly describe the cubic potential. It is formulated in polynomials of invariants, which are based on a combination of the right Cauchy-Green tensor and an anisotropic structural tensor. The fourth order tensor G is called a structural tensor of the material symmetry group G ⊂ SO(3) if G = Q ∗G ∀ Q ∈ G ⊂ SO(3) , (11) where ∗ denotes the Rayleigh product [9]. For some material symmetry groups, second order structural tensors are sufficient to represent the symmetry, while for other symmetries, the introduction of fourth or sixth order tensors is required [90]. For cubic symmetry, a fourth-order structural tensor is required. In [73], the fourth order structural tensor Gcub = 3∑ i=1 ei ⊗ ei ⊗ ei ⊗ ei (12) 2This does not mean that objectivity and polyconvexity contradict each other, it rather shows how challenging it is to combine both requirements. 6 is used for the cubic group G7, with ei being the basis vectors of a Cartesian coordinate system. A description of the cubic group G7 can be found in [17, Section 3.7]. Using the structural tensor (12), a set of five polyconvex invariants can be derived as I1 = trC , I2 = tr (Cof C) , I3 = detC , J7 = C : Gcub : C , J11 = Cof C : Gcub : Cof C . (13) The first three invariants are the well-known isotropic invariants, while the remaining two invariants possess the cubic symmetry. The invariants I1 and J7 are convex in F , the invariants I2 and J11 are convex in Cof F , and the invariant I3 is convex in detF . Since the invariants are formulated in the right Cauchy-Green tensor, they fulfill the principle of objectivity. Based on these invariants, Schröder, Neff, and Ebbing [73] proposed the potential W SNE cub = κ n∑ r=1 ( 1 (αr + 1)3αr I (αr+1) 1 + 3 (βr + 1)3βr I (βr+1) 2 + 1 (ηr + 1)3ηr J (ηr+1) 7 + 9 γr I−γr 3 ) , (14) where we introduced the additional parameter κ. With κ > 0, αr, βr, ηr ≥ 0 and γr ≥ −1/2, γr ̸= 0, the potential is polyconvex, coercive and has a stress-free reference configuration. Extension with neural networks A vector of group specific objective invariants I(ξ) is defined, where each component of I(ξ) is convex in ξ, i.e., I(ξ) is a polyconvex vector-valued function. For instance, I = (I1, I2, I3, J7, J11) can be considered for cubic materials. This motivates the model P0(ξ; p) = a ◦ A ◦ I ◦ ξ , (15) where the layers a ◦ A in (15) are restricted to convex and non-decreasing activation functions, since I represents the first convex layer, see remark A.10 for a discussion. The model (15) is polyconvex due to the usage of ICNNs und fulfills the objectivity and material symmetry conditions due to the considered invariants. Growth condition The volumetric growth condition (7) can be fulfilled with coercive functions. How- ever, since ICNNs are not necessarily coercive, they are not suited to fulfill this condition, and therefore, an analytical term should be added to the potential (15) according to W = P0(ξ; p) +Wvol (detF ) , (16) where we choose the polyconvex term [29] Wvol(detF ) = ( detF + 1 detF − 2 )2 . (17) Computation of stress The scalar-valued function W given by (16), which consists of the neural network P0 from (15) and the volumetric growth term Wvol from (17), is then interpreted as a hyperelastic potential, see (1), and the first Piola-Kirchhoff stress S = DFW (F ) is calculated as its gradient, see (2). In the present work, automatic differentiation [7] is used, which is widely available in modern machine learning libraries. This approach is, of course, not only applicable to the present model but to any differentiable model built in a ML library providing automatic differentiation. Reference state In the reference configuration F = 1, the model (16) is not stress-free by construction. However, elastic stress-strain data received from numerical or experimental investigations will always have the property S(1) = 0. When the model is calibrated with this data, it also approximates the stress behavior for F = 1, thus fulfilling S(1) = 0 in an approximate fashion. In Fernández et al. [20] a projection approach is proposed which fulfills the stress-free reference configuration in an exact way. While this approach preserves polyconvexity when formulated in terms of F instead of C, it is not compatible with the methods of incorporating objectivity and material symmetry which are used in the present work, and therefore cannot be applied. 7 3.2 Model based on the deformation gradient Model formulation Based on the definition of polyconvexity (8), the most straightforward inputs for a polyconvex FFNN-based model are the deformation gradient, its cofactor and determinant itself. Thus, when ξ = (F, Cof F, detF ) ∈ R19 is used as input for an ICNN ā(ξ) = a ◦A ◦ ξ, the output ā(ξ) is convex in ξ and the resulting potential is polyconvex, but does not fulfill the material symmetry condition, in general. Here, the group symmetrization of a function ϕ(F ) with respect to a given finite group G ⊂ SO(3) with #(G) elements, as introduced in Fernández et al. [20] for ML-based models, is of interest ϕG(F ) = 1 #(G) ∑ Q∈G ϕ(F Q) , (18) since the group symmetrized function ϕG(F ) fulfills the material symmetry conditions of the considered group. Application of group elements Q ∈ G on F , i.e, F → F Q, results in the linear transformation of ξ = (F, Cof F, detF ) according to ξ∗sQ = (F Q, (Cof F )Q, detF ), (19) where we introduced the operator ∗s that specifies the transformation of ξ by Q, and exploited that Cof(F Q) = (Cof F )Q [70]. Such a linear transformation does not affect the convexity of the ICNN ā(ξ), see corollary A.6 for an explicit proof. This implies that the following potential model P0(ξ; p) = a ◦ A ◦ ξ , W = 1 #(G) ∑ Q∈G P0(ξ∗sQ; p) +Wvol (20) is polyconvex and fulfills the material symmetry and volume compression conditions. It should be noted that compared to the previous invariant based model, the first hidden layer of the core A in (20) is not restricted to convex and non-decreasing activation functions, but only to convex functions. This allows more flexibility in the first hidden layer. Infinite groups and finite subgroups The anisotropy of many materials can be described by finite groups, e.g., cubic, orthotropic or monoclinic materials, such that the group symmetrization (20) can be carried out for the exact fulfillment of the material symmetry condition (5). For some materials, the symmetry group is infinite, e.g., for transversely isotropic materials. For such cases, as also remarked in Fernández et al. [20], a pragmatic solution for the usage of the group symmetrization (20) can be obtained by consideration of a finite subgroup of the infinite group. For the case of transverse isotropy, a finite subgroup G∗ ti ⊂ Gti for chosen N can be simply constructed by G∗ ti := { Qα x | α = 2π n/N, n = 1, 2, . . . , N } ⊂ Gti := { Qα x | α ∈ [0, 2π) } (21) where Qα x denotes a rotation around the preferred axis x by the angle α. The finite subgroup G∗ ti contains N equidistant elements of Gti. As will be shown in Sect. 5, already N = 6 elements can be sufficient for an excellent approximation of transverse isotropy. Computation of stress As in the invariant-based, differentiable FFNN-based model, the present model uses automatic differentiation for the computation of the stresses. Objectivity While the determinant of the deformation gradient is an invariant quantity with respect to change of observers, the deformation gradient itself and its cofactor depend on the choice of observer. Thus, the model W as defined in (20) is generally not objective. While the objectivity condition could be fulfilled trivially by using the components of the right Cauchy-Green tensor C, this would violate the polyconvexity condition, as only the main diagonal elements of C are convex in the deformation gradient. Consequently, in order to meet both polyconvexity and objectivity, we choose a quantity which is suitable to fulfill the polyconvexity condition, and take further steps to approximate the objectivity condition. Then, the objectivity of the model is approximated with a data augmentation approach, following Ling, Jones, and 8 Templeton [51]. Based on the existing calibration dataset for a single observer D = {F, S, W}, the dataset is extended by a finite amount of additional observers according to D̃ = ⋃ Q∈G {QF, QS, W} , G ⊂ SO(3) . (22) For pragmatic reasons, a finite amount of randomly distributed rotation matrices is chosen. The data augmentation approach can be visualized as follows: Without data augmentation, the constitutive model is calibrated with a single observer. As ab initio, the model does not know how to extrapolate the learned material behavior to other observers, it is only applicable to the observer chosen in the calibration dataset. When the data augmentation approach (22) is applied, the model is calibrated with multiple observers. If the model is then evaluated with an arbitrary observer, the model will yield reasonable results, as the material behavior for any observer can be seen as the interpolation between the observers with which the model was calibrated. With a sufficient number of observers, the model can be trained to behave nearly independent of the observer, as will be demonstrated in the upcoming examples. In fact, for hyperelastic potentials, it is sufficient to apply the objectivity condition (3) to the potential only, which directly implies the transformation rule for the stress tensor. In Ling, Jones, and Templeton [51] only the potential values were extended and no visualization of evaluation of the stress values was provided. However, the model quality can benefit from the additional information the stress tensor provides, and therefore, both quantities are used for the data augmentation in this work. While this approach increases the size of the training data and thus the required calibration time, it is important to note that the time required for the model evaluation is not affected, and that no additional simulation or experimental data is required. Considering (3) and (4), objectivity could be further enforced by adding terms of the form |W (QF ) − W (F )|2 and ∥S(QF )−QS(F )∥2 to the objective function used for calibration of the model. As, in this ap- proach, both W and S are received by the constitutive model, the choice of F is not restricted to deformation states used in the calibration dataset. Consequently, F may be sampled in a sensible range of deformations in which the model is to be applied, e.g., following the sampling strategy as proposed by Kunc and Fritzen [48], together with a finite amount of random rotation matrices Q. This approach takes into account that objectivity is a physical requirement, which must be fulfilled independent of stress-strain data available for a specific material. However, in the present work, no additional term is added to the objective function, as the objectivity can already be approximated very well with the data augmentation approach (22), as will be shown in Section 4.3. 4 Application to cubic metamaterials 4.1 Homogenized behavior of soft beam-lattice metamaterials The performance of the models proposed in the previous section is now examined in application to the homogenized behavior of beam-lattice structures with cubic anisotropy. In Fernández et al. [20], the ho- mogenized behaviors of the cubic BCC cell and the cubic X cell are numerically investigated for sev- eral calibration and test scenarios, with full data availability on the public GitHub repository https: //github.com/CPShub/sim-data. The mechanical behavior of these metamaterials is highly nonlinear and exhibits several challenging characteristics like lattice instabilities, which makes it a good benchmark case for the models. The cubic BCC structure consists of body-centered beams with additional beams along all edges. Identi- fying the smallest unit from which the structure is built, the BCC unit cell is obtained which, regarding the periodicity, contains beams only at three edges, see Fig. 1b. For large structures built from lattice metama- terials, it is convenient to formulate a constitutive model for the homogenized behavior of the cells, instead of simulating every single beam of the structure. To determine the homogenized behavior of the BCC unit cell, finite element simulations are carried out, where effective deformation gradients F are applied on the cell using displacements on the outer nodes of 3Figure from Fernández et al. [20]. 9 https://github.com/CPShub/sim-data https://github.com/CPShub/sim-data (a) F11 = 0.63 (b) F11 = 1.00 (c) F11 = 1.50 Figure 1: Uniaxial deformation in x-direction for the BCC unit cell.3 the beams. With periodic boundary conditions, the behavior of the unit cell within a larger structure can be simulated. The averaged, or effective, strain energy density W can directly be calculated with the strain energy stored in the beams and the size of the cell. For the structures under consideration, the effective stress response S of the cell can be computed with the reaction forces on the outer nodes and the size of the cell [20]. In contrast to physical experiments, this numerical characterization yields not only the stress response of the material, but also the strain energy density, and the resulting dataset D = {(F1, S1, W1) , . . . } (23) consists of triplets for corresponding deformation gradient, effective stress and effective strain energy density for each simulation step and applied deformation. For the calibration data used here, uniaxial, equibiaxial, planar, shear, and volumetric deformations were applied on the unit cell. These deformations can also be applied in physical experiments. The calibration dataset DC consists of 905 triplets. Beside the calibration data, three test cases were examined. For the first two test cases, biaxial deformations with different stretch ratios are applied on the unit cell, while the third test case is a combination of tension and shear deformation. All three test cases exhibit lattice instabilities and complex deformations, which are not included in the calibration data. The test dataset DT consists of 605 datapoints. Further technical details on the simulations can be found in [20]. In the present work, the first test case of [20] is referred to as “biaxial test”, while the third test case is referred to as “mixed test”. Especially the “mixed test” is a good benchmark case, as the combination of tension and shear is a very general deformation. Due to the soft materials and high slenderness of the beams, lattice instabilities occur for several deforma- tion modes. In Fernández et al. [20], the microstructure simulation were carried out with an experimentally validated nonlinear post-buckling analysis approach [39]. Taking a closer look at the uniaxial deformation of the BCC cell, it exhibits instabilities in both compression and tension, which are shown in Fig. 1. Even for small uniaxial compression, the load bearing beams show instabilities, while for tension the less stressed beams on the edges show instabilities, which results in a highly differing behavior of the cell for compression and tension, leading to distinctive changes in the slope of the stress components, see Fig. 2. This highly challenging behavior is observed for all deformation modes, in both the calibration and test cases. The properties of the constitutive models are examined with the BCC cell, the behavior for the cubic X cell is only shortly discussed. The X cell consists of body-centered beams, without additional beams at the edges like the BCC cell. Therefore, less lattice instabilities occur compared to the BCC cell. While uniaxial and equibiaxial deformation gradients are quite similiar, their stress response for the X cell differ by a factor of ten, which is a challenging behavior for constitutive models. Apart from this case, the stress response of all load scenarios has the same order of magnitude. Furthermore, the amount of data points is roughly equal for all load scenarios. Therefore, for the following investigations, no data normalization is applied. If necessary, strategies such as the L2 normalization of deformation cases proposed in [19] could be applied. 10 0.75 1.0 1.25 1.5 0.6 1 1.4 F11 F ij F11 F22 F33 0.75 1.0 1.25 1.5 0 10 20 F11 S ij S11 S22 S33 Figure 2: Deformation (left side) and first Piola-Kirchhoff stress S (right side) for uniaxial deformation of the cubic BCC cell. Lattice instabilities express themselves by nearly horizontal stress values of S11 (compression) and decreasing slope of S11 (tension). Stress in [hPa]. 4.2 Preparation of the models Analytical model (WSNE cub ) The model proposed by Schröder, Neff, and Ebbing [73] is used as an example for conventional polyconvex models formulated in terms of invariants. For the potential of the model (14), n = 2 summands are used. FFNN model by Fernández et al. [20] (WC) The potential model proposed by [20] is used as an example for FFNN based models which are not polyconvex by construction. The model is similar to the one discussed in section 3.2, with the difference that the six independent components of the right Cauchy-Green tensor are used as input, therefore the model is objective by construction. Following [20], the models core is built from three layers with 16 nodes using Softplus functions in each layer, and unrestricted parameters. Polyconvex ICNN model based on invariants (W I) The set of cubic invariants introduced in [73], see eq. (13), is used, with the additional invariant I∗3 = −2 √ I3. This additional polyconvex invariant can be received from the last summand in eq. (14), and is important for the model to represent negative stress values. Using four isotropic invariants and two cubic invariants, the input of the neural network is the vector of invariants I = (I1, I2, I3, I ∗ 3 , J7, J11) ∈ R6. An ICNN core based on Softplus (SP) functions, see (A.12), is used, with the number of layers in {2, 3} and the number of nodes in {8, 16, 32}. The weights in all layers are non-negative. The restrictions on the networks parameters, which are caused by the convexity condition, are discussed in proposition A.9. For a compact notation, the core is referred to as e.g. SP [8, 8] for a core containing two layers with eight nodes with SP functions in each layer. Polyconvex ICNN model based on the deformation gradient (WF) For the second model, the input (F, detF ) ∈ R10 is chosen. Several simulations showed that, for the metamaterials under consideration, the deformation gradient alone is not enough to represent the material behavior, while there is no improvement when using all arguments of (F, Cof F, detF ). The group symmetrization is carried out with the cubic group G7 ⊂ SO(3), which contains 24 orthogonal transformations [17]. The network architectures are chosen as described above for the invariant based model W I, but with arbitrary weights in the first layer, and non- negative weights in all subsequent layers, c.f. proposition A.9. Performance measures While, from experimental investigations of materials, only stress values are available, the numerical evaluation of the unit cells yields both effective energy density and the stress tensor. For a given dataset D, the mean squared error (MSE) of the constitutive model W□, □ ∈ {SNE, C, I, F}, is 11 defined as MSE□ (p) = 1 # (D) ∑ F∈D [ 1 (J/m3)2 ( W (F )−W□ (F ; p) )2 + 1 9Pa2 ∥∥∥S (F )− S□ (F ; p) ∥∥∥2 ] (24) where #(D) denotes the number of datapoints in D. The parameters p of the models are found as the mini- mizers of the corresponding MSE. However, since the minimization of the MSE is a non-convex optimization problem, a local minimum p may strongly depend on the initial guess and the optimization algorithm. Thus, we introduce the mean deviation of two model instances (M̃D) with parameters pi and pj on a set of defor- mation gradients DF as M̃D □ (pi, pj) = 1 # (DF ) ∑ F∈DF [ 1 (J/m3)2 ( W□ (F ; pi)−W□ (F ; pj) )2 + 1 9Pa2 ∥∥∥S□ (F ; pi)− S□ (F ; pj) ∥∥∥2 ] . (25) Then, the overall mean deviation of a model (MD) with n instances is obtained as the averaged M̃D of all possible combinations of model instances: MD□ (p1, . . . , pn) = ( n 2 )−1 n−1∑ i=1 n∑ j=i+1 M̃D □ (pi, pj) . (26) For the dataset used to calibrate the parameters, different sets of experiments may be applied on a material. As long as each dataset contains all the required information, different calibration datasets should yield the same model behavior. However, for models whose parameters have no physical interpretation (which is the case for ML based models), different calibration datasets may lead to a different model behavior. Thus, we introduce the mean deviation (MD) between a single model instance calibrated on the dataset D1 with parameters p0 and multiple model instances trained on another dataset D2 with parameters p1, . . . , pn: MD □ (p0, p1, . . . , pn) = 1 n n∑ i=1 M̃D □ (p0, pi) . (27) Implementation The model W SNE has been implemented in MATLAB R2021a, the machine learning models were implemented in TensorFlow 2.3.0 with Python 3.7. Each machine learning model has been trained with the full batch of training data and the ADAM optimizer using default settings, every architecture was initialized three times. The models WC and W I were trained for 20, 000 epochs using the calibration dataset DC . The model based on the deformation gradient WF was trained for 15, 000 epochs with the calibration dataset DC , and retrained for 2, 000 epochs with the extended calibration dataset D̃C according to eq. (22). From our experience, this strategy provides a good balance between accuracy, speed of convergence and computation time required. Since the material objectivity of the model WF is only approximated, both MSE and MD of the model are evaluated with 1, 024 random observers. For the additional observers, uniformly distributed rotation matrices are generated using the “Spatial Transformation” package provided by SciPy [80]. For the BCC cell, the models are also trained one time with the adapted calibration dataset D∗ C which contains the shear and volumetric deformation cases (from DC), as well as the biaxial and mixed test cases (from DT ), following the training strategies given above. Reproducibility of the models The potentials can be reproduced with the information of the net- work’s architecture and parameters, i.e., the number of layers and neurons, the connection of the neurons, choice of input and output, activation functions, as well as the weights and biases for each neuron. How- ever, the training of the model, i.e., the process of parameter calibration of the weights and biases, is of utmost importance for its performance. As mentioned above, this calibration corresponds to solving a non- convex optimization problem, for which minima are usually local and may strongly depend on the initial 12 0.75 1 1.25 1.5 0 10 20 F11 S ij S11 S22 S33 Figure 3: Evaluation of W SNE cub for the BCC cell, calibrated only for the uniaxial deformation case. Points depict the simulation data, while lines depict the calibrated model; stress in [hPa]. The S11 component is not well-fitted. guess and the optimization algorithm. Thus, for the same network architecture, typically several param- eter sets are determined based on different random initializations, which renders the calibration process non-reproducible. To meet this difficulty, we provide not only the calibration data, but also a compiled version of our TensorFlow model instances as well as their sets of parameters in the public GitHub repository https://github.com/CPShub/sim-data. With the information about the networks architecture and the parameters used for the potential, it can be reproduced in any program, c.f. (A.1). 4.3 Model evaluation for BCC cell The performance of the four models calibrated with the training data for the BCC cell is now discussed. First of all, the evaluation of the analytical model W SNE only in the uniaxial deformation case is shown in Fig. 3. As can be seen, even for this simple case the model fails to represent the material behavior. Thus, this model is not considered for further, more detailed performance evaluations. Before the machine learning models can be compared, the required amount of observers for the data augmentation of the model WF must be examined. The model WF is calibrated with the full calibration dataset D̃C , using 8, 16, 32 and 64 random observers, see eq. (22), and evaluated after calibration with 1, 024 random observers. The corresponding MSEs and relative calibration times are shown in Table 1, and the shear response for each calibration is illustrated in Fig. 4. While, for a calibration with 64 observers, the dataset is eight times as big as the dataset for eight observers, its calibration time is just under six times as long. Although data augmentation increases the calibration time, for increasing datasets, this approach profits from the fast evaluation of ANNs for large batch sizes. If the model response depends on the choice of observer, the evaluation yields an area containing the stress response for all different observers, whereas for sufficiently well approximated objectivity, minimum and maximum model response coincide, and the area practically reduces to a single line. For a calibration with eight observers, the individual stress components for the shear deformation are within a wide range, and the MSEs are very high. Therefore, the material objectivity is not fulfilled. Using more observers for the data augmentation, the approximation quality of the objectivity increases. For 64 observers in the calibration dataset, the shear responses for different observers are indistinguishable from each other, and the MSEs are sufficiently small. This shows that objectivity can be successfully learned using the data augmentation approach. In the following, the model WF is calibrated using 64 observers, and all evaluations for the model are carried out with 1, 024 random observers. In Table 2, the three best MSEs for the three machine learning models trained on DC and D̃C are depicted and sorted for decreasing error on the test dataset DT . The model WF yields far better results than W I for both calibration and test datasets, which may be caused by a smaller function space of the model W I due to the human choice of invariants. While the MSEs of WF are slightly higher than the ones of WC on DC , which is reasonable since the polyconvex model is more restricted in fitting the calibration data than the non-polyconex one, it actually performs slightly better on the test data. This could be rooted in the additional mathematical structure that polyconvexity incorporates into the model, making it more 13 https://github.com/CPShub/sim-data 0 7 14 0 0.25 0.5 0 7 14 0 0.25 0.5 8 observers S ij 16 observers 32 observers F12 S ij 64 observers F12 S11 S12 S13 S21 S22 S23 S31 S32 S33 Figure 4: Variation of observers for WF for the cubic BCC cell. Points depict the simulation data, while lines and shaded areas depict the calibrated model evaluated with 1, 024 observers. Stress tensor QT S(QF ) (4) for shear deformation is shown, with normal components in red colors, shear components in blue colors and less important components in gray; stress in [hPa]. generalizable. As already encountered in [20], different neural network architectures using a vastly differing amount of parameters may lead to models of equal quality, with different results for multiple initializations using the same architecture. Also, the stress error in eq. (24) dominates the overall MSE for all evaluations. For the convex neural networks used in this work, the models need a sufficiently high amount of nodes and layers to counteract the loss of flexibility due to the restrictions on the parameters. For the model WF, SP [16, 16, 16] is the smallest architecture yielding good results, while there were no significant benefits for architectures using more layers or nodes. Furthermore, in Table 2 the mean deviations for the three machine learning models are shown, evaluating the deviations of the different instances of each individual model. The deviations are evaluated for the datasets DF C and DF T , which contain all deformation gradients applied in the calibration dataset DC and test dataset DT , respectively. MD evaluates the deviation for the instances trained on DC , while MD compares the deviation of the single instance trained on the dataset D∗ C to the instances trained on DC . For the model WC the MD on DF C is very low, which may be attributed to the high flexibility of the unrestricted network core, allowing the model to be very similar on DC for every initialization. However, for DF T , the MD of the model WC is worse than its MSE on DT . Furthermore, for both DF T and DF C , the model behavior depends on the dataset used for the calibration, which leads to a high MD. Actually, MD on DF C is even three magnitudes higher than MD. For the model W I, the MD on both datasets is about two magnitudes smaller than its MSEs on the datasets. This may be caused by the additional mathematical structure that polyconvexity and invariants incorporate into the model. While the deviation of multiple instances trained on the same dataset is very low, the training on the adapted dataset D∗ C leads to a different model behavior, which results in a high MD. The reason for this is that the model is not flexible enough to capture the material behavior. The model WF has a flexible input and, due to the polyconvexity condition, a pronounced mathematical structure. This leads to excellent results for both the MSE and the MD values, making it the only model of this comparison which has both excellent approximation qualities and a consistent behavior within multiple 14 Observers for the MSE Relative cali- calibration dataset DT DC bration time 8 1.25 · 104 8.64 · 103 1 16 3.97 · 103 1.65 · 103 1.98 32 3.42 · 103 6.84 · 102 2.95 64 7.61 · 102 2.57 · 102 5.70 Table 1: MSEs of calibrated WF models for the BCC cell with augmented data for learning objectivity. Evaluation with 1, 024 random observers. Calibration times related to calibration time for 8 observers. 0.75 1 1.25 1.5 0 10 20 F11 un ia xi al - S ij S11 S12 S13 S21 S22 S23 S31 S32 S33 0.6 1.0 1.4 0 8 16 F11 m ix ed te st - S ij Figure 5: Evaluation of WC for the BCC cell, shaded areas denote the model’s loss of ellipticity. The ellipticity was checked with the Hessian of the potential, using 500 random unit vectors for each test vector, c.f. (10). Points depict the simulation data, while lines depict the calibrated model. Normal components of the stress tensor in red colors, shear components in blue, less important components in gray; stress in [hPa]. model instances. Furthermore, even for a calibration with the adapted dataset D̃∗ C the model behavior stays consistent. The deviation MD has the same magnitude as MD, and also the MSEs for the instances trained with different datasets have the same magnitude. For the following evaluations, the core SP [16, 16] is used for W I, while for WC and WF the core with the smallest MSE is chosen. In Fig. 5, the uniaxial deformation case and mixed test case for WC are shown to examine the ellipticity of the model. As already encountered in Fernández et al. [20], the model shows excellent approximation prop- erties. However, in both examined cases in Fig. 5, the model loses its ellipticity even for small deformations, which causes material instability and would lead to major drawbacks in numerical applications such as the finite element method. Consequently, for the metamaterials under consideration, it is important to include the polyconvexity condition into the model formulation, as it implies ellipticity, and thus ensures material stability. Typically, for soft materials, one does not expect loss of material stability for a deformation gradient F in a bounded set including the identity deformation F = 1 (even if large elastic strains may occur). This is, e.g., the case for isotropic elastic energies defined in the logarithmic strain tensor [61, 62, 69]. The loss of ellipticity in these models occurs only for extremely large strains that cannot be observed in experiments. In Fig. 6, the behavior of WC and WF is examined for volumetric tension and compression. Regarding the volumetric growth condition (7), the strain energy density should rapidly grow for J = detF → 0+. The model WF contains the term Wvol from (17) and thus fulfills the growth condition. In the formulation of WC, the growth condition is not considered, and consequently it is violated for the extrapolation J → 0+. For the metamaterials under consideration, lattice instabilities may lead to high volumetric compression, therefore the growth condition should be included in the model formulation. It must be emphasized that both the loss of ellipticity and the nonphysical behavior for J → 0+ are no specific drawbacks of the model WC as proposed by Fernández et al. [20]. It is very likely that other ML-based constitutive models would show the same behavior when calibrated to the examined data, as long as ellipticity and the volumetric growth condition are not explicitly considered in the model formulation. 15 Model Param. Calibr. Deviations MSE data DF T DF C DT DC WC with SP [16, 16, 16] 673 DC 1.02 · 103 1.59 · 102 ——"—— 1.40 · 103 1.58 · 102 ——"—— 1.41 · 103 1.63 · 102 SP [16, 16, 16] 673 D∗ C 2.40 · 102 4.18 · 103 MD 2.05 · 103 5.91 · 100 MD 1.47 · 103 4.21 · 103 W I with SP [32, 32, 32] 2, 369 DC 1.33 · 104 1.11 · 104 SP [32, 32] 1, 313 —"— 1.41 · 104 1.10 · 104 SP [16, 16] 401 —"— 1.44 · 104 1.12 · 104 SP [16, 16] 401 D∗ C 1.77 · 105 1.30 · 105 MD 2.37 · 102 2.93 · 102 MD 3.47 · 103 1.84 · 103 WF with SP [16, 16, 16] 737 D̃C 7.47 · 102 2.45 · 102 ——"—— 7.74 · 102 3.43 · 102 ——"—— 8.05 · 102 2.62 · 102 SP [16, 16, 16] 737 D̃∗ C 6.81 · 102 5.42 · 102 MD 1.60 · 102 1.11 · 102 MD 4.36 · 102 2.66 · 102 Table 2: Deviations of model instances and MSEs of calibrated ML models for the BCC cell. MD evaluates the deviation for the instances trained on DC , while MD compares the deviation of the instance trained on D∗ C to the instances trained on DC . 0 0.5 1 1.5 0 2 4 F11 = J1/3 st ra in en er gy de ns it y data WC WF Figure 6: Evaluation of WC and WF for volumetric deformation of the cubic BCC cell; strain energy density in [kJ/m3]. In Fig. 7, a subset of the calibration and test cases for the model W I is shown to examine some model characteristics. The model yields acceptable results for deformation gradients with dominating main diagonal elements. For the uniaxial and equibiaxial calibration case, the component S33 shows a large deviation from the simulation data, which also transfers to the biaxial test case. For shear deformation, the model completely fails to represent the simulation data, consequently, it also fails to represent the mixed test case. While, basically, the set of invariants for the model could be extended, it is unlikely that the model behavior for the cubic metamaterials under considerations can be improved with this approach. While the model shows 16 0.75 1 1.25 1.5 0 10 20 0 0.25 0.5 0 7 14 0.8 1.0 1.2 1.4 0 11 22 0.6 1.0 1.4 0 8 16 F11 un ia xi al - S ij F12 sh ea r - S ij F11 bi ax ia lt es t - S ij F11 m ix ed te st - S ij S11 S12 S13 S21 S22 S23 S31 S32 S33 Figure 7: Evaluation of W I for the cubic BCC cell. Points depict the simulation data, while lines depict the calibrated model. Normal components of the stress tensor in red colors, shear components in blue, less important components in gray; stress in [hPa]. 0.75 1 1.25 1.5 0 10 20 F11 un ia xi al - S ij S11 S12 S13 S21 S22 S23 S31 S32 S33 0 0.25 0.5 0 7 14 F12 sh ea r - S ij 0.8 1.0 1.2 1.4 0 11 22 F11 bi ax ia lt es t - S ij 0.6 1.0 1.4 0 8 16 F11 m ix ed te st - S ij Figure 8: Evaluation of WF for the cubic BCC cell. Points depict the simulation data, lines depict the calibrated model. Normal components of the stress tensor in red colors, shear components in blue, less important components in gray; stress in [hPa]. 17 Model Param. Calibr. Deviations MSE data DF T DF C DT DC WF with SP [16, 16, 16] 737 D̃C 5.41 · 102 2.38 · 102 ——"—— 5.62 · 102 2.61 · 102 ——"—— 8.82 · 102 2.54 · 102 MD 1.47 · 102 5.64 · 101 Table 3: Deviation of model instances and MSEs of calibrated WF model for the X cell. 0.6 1 1.4 0 1.5 3 0.6 1 1.4 0 10 20 F11 un ia xi al - S ij F11 eq ui bi ax ia l- S ij S11 S22 S33 Figure 9: Evaluation of WF for the cubic X cell. Points depict the simulation data, while lines and shaded areas depict the calibrated model; stress in [hPa]. drawbacks especially for shear deformations, the additional invariants can only use main diagonal elements of C in order to be polyconvex, which makes it hard to gain flexibility for shear deformations. However, for a wide range of materials with less challenging behavior, the invariant-based model may still be a good choice, which is demonstrated in Section 5 for transverse isotropy. In Fig. 8, a subset of the calibration and test cases for the model WF is shown to examine the model’s characteristics. The model shows excellent results for every deformation mode of the calibration dataset and the test dataset, with only small deviations of the S12 component for the mixed test case. After recalibration of the model with the concatenation of calibration and test dataset for 1, 000 epochs, the model can perfectly represent the simulation data for both calibration and test data, which is not shown in Fig. 8. The data augmentation approximates the material objectivity so well, that no dependence on the observer can be seen at all. 4.4 Model evaluation for X cell In the following, for the evaluation of the performance of the models calibrated with the training data for the X cell, only the results for the model WF are shortly discussed; the results for the model W I confirmed the observations made for the BCC cell without yielding further insights. The model WF was initialized three times using three layers with 16 nodes in each layer, leading to the MDs and MSEs shown in Table 3. In Fig. 9, the uniaxial and equibiaxial deformation mode for the model with the best DT are shown. While the model shows excellent agreement with the simulation data for almost all calibration cases, in the uniaxial tension regime, the model shows a slight dependence on the choice of observer. The reason for this is that the uniaxial and equibiaxial deformations are similar, while their stress response differs by a factor of ten. Therefore it is challenging for the model to capture the material’s behavior for both uniaxial and equibiaxial deformations. This may be resolved by an adapted training strategy as proposed in Fernández, Fritzen, and Weeger [19], for which the stress responses of the different deformation cases are scaled to a comparable magnitude for the training. We should also remark that approximate satisfaction of objectivity does not conflict with an existence theorem based on polyconvexity and growth or coercivity conditions. 18 5 Application to transverse isotropy After the detailed examinations for cubic lattice metamaterials in Section 4, the application of the polyconvex ML models to transverse isotropy is now briefly discussed. In doing so, we demonstrate the straightforward applicability of our models to other symmetry groups. Also, for the highly challenging behavior of cubic lattice metamaterials, the invariant-based model W I showed poor approximation quality. With the following example we show that, for a wide range of materials, W I can still be an appropriate choice. 5.1 Data generation Analytical transversely isotropic potential For the following investigations, we generate data with the polyconvex model proposed by Schröder, Neff, and Ebbing [72], which is applicable to several symmetry groups, including transverse isotropy. In the following, the preferred axis of the transversely isotropic symmetry group (21) is chosen as the x1-axis, which motivates the second order structural tensor Gti = diag ( β2, 1 β , 1 β ) . (28) Using this structural tensor, the two transversely isotropic invariants J4 = tr (C Gti) , J5 = tr (Cof(C)Gti) (29) can be derived, which are convex in F and Cof F , respectively [72]. Together with the isotropic invariants I1−3, Schröder, Neff, and Ebbing [72] proposed the potential W SNE ti = α1 I1 + α2 I2 + δ1 I3 − δ2 log (√ I3 ) + η1 α4 (trGti) α4 (Jα4 4 + Jα4 5 ) , (30) which is objective by construction, and polyconvex if all parameters are equal to or greater than zero. The parameter δ2 depends on the other parameters and is chosen such that the model is stress-free in the reference configuration. Note that the parameter β of the structural tensor (28) needs to be specified as well. Here, we use the parameter values (β, α1, α2, δ1, δ2, α4, η1) = (2, 8, 0, 10, 56, 2, 10), which were fitted in [72] to referential data of a not further specified real-world material. Using the potential (30), transversely isotropic data can be generated. As was already the case for the numerical homogenization of cubic lattice metamaterials, the analytical potential (30) provides both energy and stress values. However, for the following investigations, we will only make use of the stress values, which will demonstrate that energy values, which are typically not available in experiments, are not necessarily required to calibrate the proposed polyconvex ML models. Given as an analytical function, the potential (30) can directly be evaluated for a given deformation gradient F ∈ GL+(3). This leaves the question open of how to choose F for the generation of calibration and test datasets. Calibration dataset The calibration dataset DC consists of uniaxial and equibiaxial tensile tests, as well as a shear deformation. For the special cases of uniaxial and equibiaxial tensile tests, the corresponding boundary-value problem can be directly formulated as systems of non-linear equations. For uniaxial tension in x1-direction, deformation gradient and stress tensor are given by F = diag (F11, F22, F33) , S = diag (S11(F ), 0, 0) , (31) where F11 is prescribed, F22 = F33 are unknown and S11(F ) = DF11 W SNE ti (F ). For equibiaxial tension in x1, x2-directions, F = diag (F11, F22, F33) , S = diag (S11(F ), S22(F ), 0) (32) holds, where F11 = F22, F33 is unknown and S11(F ) = DF11 W SNE ti (F ), S22(F ) = DF22 W SNE ti (F ). These non- linear systems of equations are then solved with standard functions provided by MATLAB R2021a, which provides the overall deformation gradients for uniaxial and equibiaxial tensile tests. For this, 200 equidistant values F11 ∈ [0.5, 2] are prescribed. The shear deformation F = 1+ γ(e1 ⊗ e2 + e2 ⊗ e1) is evaluated for 250 equidistant values γ ∈ [0, 0.5]. Selected data points are visualized in Fig. 10 and Fig. 11. 19 Test dataset The test dataset DT consists of a biaxial test and a combined tension-shear test. Note that the “biaxial test” and “mixed test” are different from the ones used in Sect. 4. For the biaxial test, the system of nonlinear equations F = diag (F11, F22, F33) , S = diag (S11(F ), S22(F ), 0) (33) with prescribed F11, F22 = 0.5F11 is solved for F33 for 100 equidistant values F11 ∈ [0.5, 2]. The mixed test case F = 1 + 0.2λ 0.2λ 0 0 1 + 0.1λ 0 0 0 1− 0.1λ  (34) is evaluated for 100 equidistant values λ ∈ [−1, 2.5]. After generation of the deformation gradients F for all calibration and test cases, the first Piola-Kirchhoff stress S = DFW SNE ti (F ) is evaluated, and the resulting datasets consist of tuples D = {(F1, S1) , . . . } . (35) Altogether, the calibration dataset DC consists of 650 tuples, while the test dataset DT consists of 200 tuples. 5.2 Model preparation For the invariant-based ML model W I, the two transversely isotropic invariants from (29) together with the three isotropic invariants from (13) and the additional invariant I∗3 = −2 √ I3 form the input I = (I1, I2, I3, I ∗ 3 , J4, J5) ∈ R6 of the neural network. For the deformation gradient based model WF, the input (F, detF ) ∈ R10 is chosen. For the network core of W I, one hidden layer with eight nodes turned out as a sufficiently accurate choice, while for WF three hidden layers with 32 nodes in each layer are chosen. As before, convexity is ensured by choice of the convex Softplus activation function in each node and restrictions on the network parameters, which are discussed in Section 4.2 and Proposition A.9, respectively. The transversely isotropic symmetry group Gti has an infinite number of elements, see (21). In order to apply the group symmetrization approach from (20) on WF, the group is approximated by six rotations around the x1-direction, c.f. eq. (21). For the model calibration, the MSE MSE□ (p) = 1 # (D) ∑ F∈D 1 9Pa2 ∥∥∥S (F )− S□ (F ; p) ∥∥∥2 (36) with W□, □ ∈ {I, F} is applied, c.f. Sect. 4.2. Here, the MSE for the shear deformation is weighted twice, as its stress response is considerably lower than the stress response of the other calibration cases. For the data augmentation for objectivity of WF, see (22), 128 random rotation matrices are used, while for the group symmetrization for transverse isotropy of WF, see (20), six equidistant rotations around the x1-axis are applied. Since both objectivity and material symmetry are only approximated for WF, the model is evaluated with 1, 024 random observers for the verification of objectivity (3) and with 60 equidistant rotations around the x1-axis for the material symmetry (5). Both models are initialized three times and each trained for 5, 000 epochs. Further technical details are discussed in Sect. 4.2. The MATLAB code used to generate calibration and test data, as well as compiled versions of both ML models and their sets of parameters are provided in the public GitHub repository https://github.com/CPShub/sim-data. 5.3 Model evaluation In Table 4, the MSEs of the different model initializations are shown. The ML models show excellent agreement with both the calibration and test dataset. In particular, even though W I has only one layer with eight nodes, it can represent the data almost perfectly, which is most likely caused by the fact that the ML model W I uses the same invariants as the analytical potential (30). A subset of the training dataset 20 https://github.com/CPShub/sim-data 0.5 1 1.5 2 -0.75 0 0.75 1.5 0 0.25 0.5 -0.25 0 0.25 0.5 0.5 1 1.5 2 -0.75 0 0.75 1.5 0.8 1.0 1.2 1.4 -0.3 0 0.3 0.6 F11 un ia xi al - S ij F12 sh ea r - S ij F11 bi ax ia lt es t - S ij F11 m ix ed te st - S ij S11 S12 S13 S21 S22 S23 S31 S32 S33 Figure 10: Evaluation of W I for transverse isotropy. Points depict data from the analytical model W SNE ti , while lines depict the evaluation of the calibrated model W I. Normal components of the stress tensor are shown in red colors, shear components in blue, less important components in gray; stress in [hPa]. 0.5 1 1.5 2 -0.75 0 0.75 1.5 F11 un ia xi al - S ij S11 S12 S13 S21 S22 S23 S31 S32 S33 0 0.25 0.5 -0.25 0 0.25 0.5 F12 sh ea r - S ij 0.5 1 1.5 2 -0.75 0 0.75 1.5 F11 bi ax ia lt es t - S ij 0.8 1.0 1.2 1.4 -0.3 0 0.3 0.6 F11 m ix ed te st - S ij Figure 11: Evaluation of WF for transverse isotropy. Points depict data from the analytical model W SNE ti , while lines depict the evaluation of the calibrated model WF. Normal components of the stress tensor in red colors, shear components in blue, less important components in gray; stress in [hPa]. 21 Model Param. MSE DT DC W I with SP [8] 65 1.52 · 10−1 2.90 · 10−2 ——"—— 7.84 · 10−1 6.41 · 10−2 ——"—— 8.47 · 10−1 3.44 · 10−2 WF with SP [32, 32, 32] 737 3.20 · 100 2.80 · 10−1 ——"—— 3.40 · 100 3.44 · 10−1 ——"—— 4.12 · 100 2.95 · 10−1 Table 4: MSEs of calibrated ML models for the transversely isotropic data. and both test cases are shown for W I and WF in Fig. 10 and 11, respectively. Again, the model W I shows excellent agreement for the transversely isotropic data. For the shear calibration case and the test cases, WF shows a slight dependence on the observer. Overall, both ML models are able to represent the analytical potential (30) very well, and here especially the invariant-based model W I shows excellent results. As such analytical potentials are successfully applied in, e.g., modelling of soft biological tissues [6], this implies that the polyconvex ML models are also applicable to a wide range of real-world materials. 6 A critique of machine learning in nonlinear elasticity theory At this point, we would like to briefly discuss some general issues raised by the use of machine learning techniques in nonlinear elasticity theory. First and foremost, we would like to point out that these methods are not meant to serve as a replacement for classical analytical models, but rather as an addition to the already existing extensive theoretical framework. More specifically, we want to address three interrelated shortcomings of the data-driven approach: • the lack of an intuitive interpretation of the model and its parameters; • the unstable (and, in practice, even non-deterministic) dependence of the parameter values on the experimental data; • the uncertainty of whether the resulting model is applicable to problems outside the range of prior experiments. To a smaller extent, all three of these issues can be observed for a number of analytical models as well, especially some phenomenological models with a large number of parameters, which could be considered a precursor to the modern purely data-driven approaches. 6.1 Analytical models For comparison, we first consider the the classical, isotropic Hencky strain energy WH : GL+(3) → R , WH(F ) = µ ∥dev log √ FTF∥2 + κ 2 [tr(log √ FTF )]2 , (37) which depends solely on the two physical parameters µ (the shear modulus) and κ (the bulk modulus). While it is well known that the elasticity model induced by the Hencky energy does not provide an accurate description of very large deformations [3, 65], it is indeed highly accurate for up to moderate strains of about 20% [3]. Moreover, the relation between the elastic behaviour predicted by the Hencky model and the experimental data used to determine the parameters is clearly accessible to direct interpretation: The shear modulus and the bulk modulus are determined by the material’s response to shear stresses and hydrostatic pressure, respectively, and in turn influence the stress response to certain modes of deformation, namely to simple shear and purely volumetric strain. In particular, this direct correspondence between the 22 parameter values and the model’s behaviour can be used to examine the plausibility of a specific parameter set, even in the absence of additional test data. Moreover, Hencky deduced his material model from a number of simple axiomatic assumptions [32, 33, 61, 63]. The applicability of his model to deformations not included in prior experiments can therefore be based on whether or not (or rather: to what degree) his postulates hold under the new circumstances. It is thereby possible to reasonably assess the limitations of Hencky’s model. This direct correspondence between the mechanical-geometrical interpretation of the model and its pa- rameters can no longer be established for other hyperelastic material models, especially for so-called phe- nomenological (or “heuristic”) models. For example, the Ogden energy, which can be expressed in terms of the singular values λ1, λ2, λ3 of F via WOG : GL+(3) → R , WOG(F ) = M∑ i=1 µi αi (λαi 1 + λαi 2 + λαi 3 − 3) , (38) with 2M parameters µi > 0, αi ∈ R, can provide a much better fit to empirical observations than the Hencky model for very large deformations [65] if the number 2M of material parameters is sufficiently high. However, there is no longer any intuitive relation between these parameters and the predicted material behaviour. Furthermore, the (globally) optimal choice of parameters for fitting the energy to a given dataset is difficult to determine due to the strongly nonlinear dependence of the induced stress-strain relation on the parameter values [65]. Therefore, in practice, the result of the parameter optimization is not fully determined by the measured empirical data, but also affected by the random influence of, for example, the chosen starting points of the optimization algorithm. In particular, two possible outcomes of such an optimization procedure might yield a similar (or even an identical) quality of fit for the Ogden model to the limited experimental data, whereas the material behaviour predicted by those two distinct optimized models for other deformations might differ significantly. Therefore, a high degree of uncertainty must remain about the prediction quality exhibited by material models such as Ogden’s, particularly when applied to deformations which are not included in (or closely related to) the original experimental observations. This problem is closely related to the more general notion of overfitting, i.e. optimizing too specifically to a given dataset, which tends to occur for model functions with a high number of parameters. 6.2 Data-driven models Machine-learning based approaches share and even amplify these shortcomings of highly complex phenomeno- logical models. Due to the general nature of data-driven methods, the number of parameters required for closely fitting such a model to a given specific dataset is necessarily rather high, even when compared to complex models such as the Ogden energy. In addition, for most machine learning methods used today (in- cluding neural networks), the resulting model cannot (easily) be stated explicitly in the form of a closed-form analytical expression. It is therefore extremely difficult, if not impossible, to develop an intuitive understand- ing of the relation between such a data-driven model and its parameters on the one hand and the predicted material behaviour on the other. Of course, these problems have been recognized in many other fields of research where machine learning techniques have been applied, and a number of approaches have been suggested to determine not only the influence of different parameters on the prediction, but also the direct relation between the training data and the resulting output in a humanly comprehensible fashion [55, 74, 76]. However, these techniques still lack the reliability and the direct intuition offered by more traditional models. Similarly, the phenomenon of overfitting is a well known issue in the field of machine learning, and a number of precautions (such as the careful distinction between training and validation data) are taken in order to alleviate this problem. However, since any form of validation or testing is still based on available experimental data, any assumptions about the applicability of these models to circumstances outside the range of prior observations must remain unfounded even in the best of cases. Thus, we consider it as extremely important to inform machine learning models with as much physi- cal and mathematical structure as possible (as done here with the hyperelasticity, anisotropy, objectivity, and polyconvexity properties) and apply them only in cases where classical approaches cannot provide an acceptable accuracy (as is the case here due to the strong nonlinearity of the lattice microstructures). 23 Finally, and perhaps most importantly, even if the resulting trained algorithm does indeed provide an accurate model in all practically relevant situations, it does not offer any insight as to why its predictions are accurate. Again, this can be contrasted with the aforementioned Hencky elasticity model: Although not deduced ab initio, the model has indeed been developed originally by Heinrich Hencky from simple geometrical and mechanical considerations, and a careful study of his deductions can without doubt further the reader’s understanding of continuum mechanics in a way that cannot be matched by inspecting the “black box” that results from training a machine learning algorithm. Of course, the above considerations are not restricted to applications of data-driven models to nonlinear elasticity theory, but equally apply to many other domains of the natural sciences where machine learning has recently been demonstrated to yield promising results. Machine-learning based approaches can (and will) most certainly be employed to improve the accuracy of predictions and thus the quality of models and simulations in the years to come, most likely resulting in considerable technological advancements. However, for the reasons outlined above, machine learning should not be considered as a full-fledged replacement of more traditional, analytical models, now or in the future, even if the outcomes seem to match or even surpass those resulting from more classical approaches. In the past, humanity has found a major motivation for developing a further understanding of nature in the dependence of (practically applicable) scientific techniques on the scientific method [67], and it would be most unfortunate if the success of machine learning in advancing the former would lead us to neglect the latter. 7 Conclusion In the present work, two machine learning based constitutive models are proposed, which fulfill the poly- convexity condition by using input convex neural networks. This implies ellipticity of the constitutive mod- els, which ensures material stability. The hyperelastic models are formulated for finite deformations and anisotropic material behavior. Furthermore, the neural networks yield highly flexible constitutive models, which are adaptable to a wide range of materials. The first model W I is based on a set of polyconvex, anisotropic invariants proposed by Schröder, Neff, and Ebbing [73], see eq. (13) and (15), which fulfill the material objectivity and material symmetry conditions by construction. The neural network core is able to create highly nonlinear functions from the invariants, while conventional models are restricted to comparatively simple polynomials. Depending on the anisotropy class, various sets of invariants can be used. The second model WF fully exploits the approximation capabilities of artificial neural networks. For- mulated in the deformation gradient, its cofactor and determinant, the material objectivity condition is not fulfilled by construction, which would be a major drawback for conventional constitutive models. In the context of machine learning, however, the model is trained to approximate the objectivity condition, using data augmentation of the calibration dataset, cf. (22), which is possible due to the high flexibility of the models, and the high-performing optimization algorithms available in several machine learning libraries, e.g., TensorFlow. The present work not only uses the potential values, as done in Ling, Jones, and Templeton [51], but also the stress values. This is convenient as the stress is the quantity of interest for many applications of the constitutive model, and furthermore, the general approximation quality of the model can benefit from the additional information that the augmented stress values provide. For incorporating the material symmetry, the group symmetrization introduced in Fernández et al. [20] is used, see eq. (20). The model capabilities are examined with synthetic homogenization data of cubic metamaterials used in Fernández et al. [20], and compared to the polyconvex model proposed by Schröder, Neff, and Ebbing [73]. The simulation data offers a highly challenging benchmark case, with characteristics like lattice instabilities for several deformation modes. The analytical model from [73] is not able to capture the behavior of the material, even for the uniaxial deformation case. The model WF shows excellent performance, not only for the calibration data, but also for several test scenarios which were not included in the training of the model. The evaluation of the model W I gave acceptable results for deformation gradients with dominating main diagonal elements, but failed to represent the stress response for shear deformations. However, when fitted to data generated from the analytical, transversely isotropic model from [72], which represents a real-world material, the model W I also delivered excellent results. This shows that both models are applicable to a wide 24 class of anisotropic materials, while WF is preferable for highly challenging metamaterials. Apparently, the polyconvexity conditions greatly improves the generalization capabilities of ANN-based constitutive models, such that they can be trained on fairly small training datasets. Here, deformation modes which are commonly applied in physical experiments were used. Nevertheless, due to their high flexibility, the models can benefit from a wider range of calibration data and yield even better results. The data augmentation approach for the calibration of the model WF does not require additional simulation or experimental data, the extended dataset is created purely from mechanical considerations. The models are formulated as general as possible, and can be adapted to a wide range of anisotropic, hyperelastic materials. It lies in the very nature of machine learning that constitutive models based on neu- ral networks are, to some extent, more complex than their conventional counterparts. As to what extent, the present work suggests that there are two major ways: For a wide range of materials with a moderately challenging behavior, very small ML models can be used, c.f. the excellent performance of the small invariant based model in Sect. 5. The model complexity of this approach is close to the one of analytical potentials, without the need to manually construct a function for the specific material behavior at hand. However, for very complex material behavior, the full flexibility of neural networks can be utilized by using bigger net- works, c.f. the deformation gradient based model in Sect. 4. In both cases, the application to finite element simulations will be important for future research. Considering the infinitely continuously differentiable neural network cores and the ellipticity of the proposed models, they offer a straightforward adaption for this, with favorable numerical behavior and trivial computation of stress and stiffness tensors, if automatic differentiation functionalities are considered. For future work, it would be valuable to investigate the for- mulation of polyconvex FFNNs with a volumetric-deviatoric decomposition of the deformation gradient for (nearly) incompressible materials. Furthermore, the incorporation of parametric dependencies, such as the aspect ratio of a microstructure, into polyconvex FFNNs should be investigated. Conflict of interest. The authors declare that they have no conflict of interest. Acknowledgment. The work of Dominik Klein is supported by the Graduate School CE within the Centre for Computational Engineering at Technical University of Darmstadt. Patrizio Neff acknowledges support in the framework of the DFG-Priority Programme 2256 “Variational Methods for Predicting Complex Phenom- ena in Engineering Structures and Materials”, Neff 902/10-1, Project-No. 440935806. Data availability. The authors provide access to the complete simulation data required to reproduce the results through the public GitHub repository https://github.com/CPShub/sim-data. A Input convex feed-forward neural networks FFNNs are a special class of artificial neural networks, which can be recursively defined as the composition of several vector-valued functions [1, 46]. The components of the vectors are referred to as nodes or neurons, the function in each neuron is referred to as activation function. Definition A.1 (Feed-forward neural networks (FFNNs)). The FFNN with vector-valued input X, H hidden layers and scalar-valued output function a is given by X ∈ Rn[0] A1 = A1 ( W [1]X + b[1] ) ∈ Rn[1] , Ah = Ah ( W [h]Ah−1 + b[h] ) ∈ Rn[h] , h = 2, . . . ,H a = a ( W [H+1]AH + b[H+1] ) ∈ R. (A.1) Weights W [h] ∈ Rn[h]×n[h−1] and bias b[h] ∈ Rn[h] form the set of parameters, which is optimized when the model is calibrated. X and a are referred to as input and output layer, respectively, while the layers Ah are referred to as hidden layers with component-wise applied activation functions according to (A (W X))i = Ai ( ⟨w[i], X⟩ ) , W = ( w[1], . . . , w[n] )T . (A.2) 25 https://github.com/CPShub/sim-data We apply the short notation for feed-forward neural networks a ◦ A ◦X (A.3) with the networks core A = AH ◦ . . . ◦A1. Definition A.2 (Input convex neural networks (ICNNs)). The FFNN a = a◦A◦X is called an ICNN, when the scalar-valued output a is convex w.r.t. the vector-valued input X [2]. Sufficiency conditions for the fulfillment of convexity in the case of function compositions are given in the following theorem. Theorem A.3. The function ā = a ◦B ◦X is convex in X if the function a is convex and non-decreasing in B, and B is component-wise convex in X. Proof. We have to show the positive semi-definiteness of the function’s Hessian [29, 75]: D2 X ā = (DXB) T ·D2 Ba ·DXB +DBa ·D2 XB (A.4) The derivatives DXB are mappings from the vector space X into the vector space B. When a is choosen as a convex function of B, the Hessian of a w.r.t. B is positive semi-definite and hence the first term in eq. (A.4) is positive semi-definite, see also observation 7.1.8 in [35]. The second term in eq. (A.4) is positive semi-definite when a is non-decreasing in every component of B, and B is component-wise convex in X. Corollary A.4. A FFNN is convex, when (i) the first hidden layer of the network’s core is component- wise convex w.r.t. the input, (ii) every following hidden layer is component-wise convex and non-decreasing w.r.t. the previous layer, and (iii) the scalar-valued output function is convex and non-decreasing w.r.t. the last hidden layer. Proof. This follows by recursively applying theorem A.3 to the components of the hidden layers. Theorem A.5. Convexity is preserved under affine transformations. The proof is clear but we provide it for the convenience of the reader. Proof. The Hessian of the convex function a applied on an affine transformation of its argument, i.e., a ( X̃ ) = a (X C +D) with arbitrary, but constant C and D, is positive semi-definite. D2 Xa = ( DXX̃ )T ·D2 X̃ a ·DXX̃ +DX̃a ·D2 XX̃ (A.5) The positive semi-definiteness of the first summand follows equivalent to eq. (A.4), while the second summand vanishes due to the linearity of X̃ in X. Corollary A.6. Due to its linearity, the bias of convex FFNNs can be choosen arbitrarily in every layer. The input of a convex FFNN can be multiplied by any constant matrix. With corollaries A.4 and A.6, convex neural networks can be constructed. In the first step, this requires the choice of a convex and non-decreasing activation function. Furthermore, the activation function should be sufficiently smooth, as the calculation of gradients plays an important role in continuum mechanics. All of the former requirements can be fulfilled by the following functions. Theorem A.7 (Log-Sum-Exp function). The Log-Sum-Exp function is defined as f : Rm → R, f (x) = log m∑ l=1 exl . (A.6) 26 We use the adaption f0 (x) = f (0, x), since it is closely linked to the Softplus function, see corollary A.8. Using the weight matrix W ∈ Rm×n as defined in eq. (A.2) and the bias vector b ∈ Rm, we obtain the adapted Log-Sum-Exp function LSE : Rn → R, LSE(X) = f0 (W X + b) = log [ 1 + m∑ l=1 e⟨w [l],X⟩+bl ] (A.7) for neural networks. The LSE function is convex for arbitrary weights and bias, and non-decreasing when all weights are non-negative. It is smooth for any choice of arguments or parameters, i.e., LSE ∈ C∞(Rn). Proof. We first show the convexity of the adapted softplus function f0 by proving the positive semi-definiteness of the Hessian D2 xf0 (x). For this, we simply observe that the inequality 0 ≤ v ·D2 xf0 · v = 1 (1 + ∑m l=1 e xl) 2 [ m∑ l=1 v2l e xl ][ 1 + m∑ l=1 exl ] − [ m∑ l=1 vle xl ]2  (A.8) holds for any v ∈ Rm since, due to the Cauchy-Schwarz inequality,[ m∑ l=1 vle xl ]2 = [ m∑ l=1 vle xl/2 · exl/2 ]2 ≤ [ m∑ l=1 v2l e xl ][ m∑ l=1 exl ] ≤ [ m∑ l=1 v2l e xl ][ 1 + m∑ l=1 exl ] . (A.9) The LSE function is obtained through the linear transformation LSE(X) = f0 (x) , x = W X + b (A.10) of the convex function f0. Linear transformations preserve convexity, therefore the LSE function is also convex. The first derivative of the LSE function [DXLSE(X)]i = ∑m l=1 w [l] i e⟨w [l],X⟩ 1 + ∑m l=1 e ⟨w[l],X⟩ (A.11) is non-negative for w [l] i ≥ 0 ∀ i, l. The smoothness of the functions follows from the smoothness of the exponential function, and the smoothness of the logarithm on the positive domain. Corollary A.8 (Softplus function). For m = 1 in eq. (A.7), the LSE function is reduced to the Softplus (SP) function SP : Rn → R, SP(X) = log [ 1 + e⟨w,X⟩+b ] (A.12) with weights w ∈ Rn and bias b ∈ R. The SP function is convex for arbitrary weights and bias, and non-decreasing when all weights are non-negative. It is smooth for any choice of arguments or parameters. Proposition A.9 (ICNNs using SP and LSE functions). ICNNs based on SP functions are built from multiple layers, with several nodes using SP activation functions in each layer. For the first hidden layer, the weights of the softplus functions can take arbitrary values, while the other layers must be non-decreasing functions and, therefore, use non-negative weights. The networks output is given as the non-negative weighted sum of the last SP layer. The bias in every layer can be chosen arbitrarily. ICNNs based on LSE functions can be composed of either a single LSE function, or of several LSE functions. In the first case, the LSE function can be understood as a composition of one layer with several nodes using exponential activation functions, which are summed up and logarithmized in the next layer. The function achieves its approximation properties by increasing the amount of nodes in the exponential layer. As the exponential layer is the first hidden layer, the weights can be chosen arbitrarily. Alternatively, the convex neural network can be constructed using several LSE functions. We obtain this approach similar to the one introduced for SP functions, just by replacing the SP functions by LSE functions. In both cases, the bias in every layer can be chosen arbitrarily. 27 Remark A.10. Constitutive models are often formulated in sets of invariants. In doing so, several important properties are fulfilled already by the choice of the input quantity. When a set of invariants is used for a ML model, the networks core must not only be convex, but also non-decreasing in its input, which follows directly from eq. (A.4). Since the invariants are created by nonlinear functions, e.g., I1 = tr(FTF ), the subsequent function processing the invariants must be convex and non-decreasing in order to be convex in (F, Cof F, detF ). For cores based on SP or LSE functions, this can easily be achieved by using non-negative weights in the first layer. Remark A.11. The LSE activation function is not included in the current TensorFlow version, and was manually implemented. While LSE based cores should benefit from the highly flexible activation function and yield excellent results, the convergence behavior during training was very slow, and no satisfying results could be obtained. However, this should be seen as a numerical drawback of the implementation and optimization approaches used in this work rather than a general drawback of the function itself. Thus, only SP based cores were used for the numerical investigations shown in Sect. 4. References [1] C. C. Aggarwal. Neural Networks and Deep Learning. 1st ed. Springer International Publishing, 2018. [2] B. Amos, L. Xu and J. Z. Kolter. “Input convex neural networks”. In: Proceedings of the 34th International Conference on Machine Learning. Ed. by D. Precup and Y. W. Teh. Vol. 70. Proceedings of Machine Learning Research. PMLR, 2017, pp. 146–155. arXiv: 1609.07152. [3] L. Anand. “On H. Hencky’s approximate strain energy function for moderate deformations”. In: Journal of Applied Mechanics 46 (1979), pp. 78–82. doi: 10.1115/1.3424532. [4] J. M. Ball. “Constitutive inequalities and existence theorems in nonlinear elasto-statics”. In: Herriot Watt Symposion: Nonlinear Analysis and Mechanics. Ed. by R. J. Knops. Vol. 1. London: Pitman, 1977, pp. 187– 241. [5] J. M. Ball. “Convexity conditions and existence theorems in nonlinear elasticity”. In: Archive for Rational Mechanics and Analysis 63.4 (1976), pp. 337–403. doi: 10.1007/BF00279992. [6] D. Balzani, P. Neff, J. Schröder and G. A. Holzapfel. “A polyconvex framework for soft biological tissues. Adjustment to experimental data”. In: International Journal of Solids and Structures 43.20 (2006), pp. 6052– 6070. doi: 10.1016/j.ijsolstr.2005.07.048. [7] A. Baydin, B. Pearlmutter, A. Radul and J. Siskind. “Automatic differentiation in machine learning: A survey”. In: Journal of Machine Learning Research 18 (2018), pp. 1–43. arXiv: 1502.05767. [8] K. Bertoldi, V. Vitelli, J. Christensen and M. van Hecke. “Flexible mechanical metamaterials”. In: Nature Reviews Materials 2.11 (2017). Number: 11 Publisher: Nature Pub