Polyconvex anisotropic hyperelasticity with neural networks

Dominik K. Klein1,*, Mauricio Fernández2, Robert J. Martin3,
Patrizio Neff3 and Oliver Weeger1

1Cyber-Physical Simulation Group, Department of Mechanical Engineering & Centre for Computational
Engineering, Technical University of Darmstadt, Dolivostr. 15, 64293 Darmstadt, Germany

2Access e.V., Intzestr. 5, 52072 Aachen, Germany
3Chair for Nonlinear Analysis and Modeling, Faculty of Mathematics, University of Duisburg-Essen,

Thea-Leymann-Str. 9, 45127 Essen, Germany
*Corresponding author, email: klein@cps.tu-darmstadt.de

October 29, 2021

Abstract
In the present work, two machine learning based constitutive models for finite deformations are
proposed. Using input convex neural networks, the models are hyperelastic, anisotropic and fulfill
the polyconvexity condition, which implies ellipticity and thus ensures material stability. The
first constitutive model is based on a set of polyconvex, anisotropic and objective invariants. The
second approach is formulated in terms of the deformation gradient, its cofactor and determinant,
uses group symmetrization to fulfill the material symmetry condition, and data augmentation
to fulfill objectivity approximately. The extension of the dataset for the data augmentation
approach is based on mechanical considerations and does not require additional experimental
or simulation data. The models are calibrated with highly challenging simulation data of cubic
lattice metamaterials, including finite deformations and lattice instabilities. A moderate amount
of calibration data is used, based on deformations which are commonly applied in experimental
investigations. While the invariant-based model shows drawbacks for several deformation modes,
the model based on the deformation gradient alone is able to reproduce and predict the effective
material behavior very well and exhibits excellent generalization capabilities. In addition, the
models are calibrated with transversely isotropic data, generated with an analytical polyconvex
potential. For this case, both models show excellent results, demonstrating the straightforward
applicability of the polyconvex neural network constitutive models to other symmetry groups.

Key words: constitutive modeling, nonlinear elasticity, anisotropic hyperelasticity, polyconvexity,
ellipticity, material stability, soft materials, metamaterials, invariants, structural tensors, parameter
identification, data-driven modeling, machine learning, input convex neural networks

Accepted version of manuscript published in the Journal of the Mechanics and Physics of Solids.
Date accepted: October 29, 2021. DOI: 10.1016/j.jmps.2021.104703. License: CC BY-NC-ND 4.0

1 Introduction
In the last decades, a vast amount of highly specialised materials has been developed and, with advancing
requirements in engineering applications, the trend is growing. In particular, with recent advances in additive
and advanced manufacturing technologies, flexible and functional mechanical metamaterials and composites

1

mailto:klein@cps.tu-darmstadt.de
https://doi.org/10.1016/j.jmps.2021.104703
https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode


are being developed [77]. As a consequence, numerous constitutive models have been formulated, each
specifically designed for the mechanical characteristics of a comparatively small class of (soft) materials [15,
84]. However, while the specific formulations may be different, the theoretical considerations that lead to
physically sensible and mathematically well-posed models stay the same, i.e., ellipticity, thermodynamic
consistency, objectivity, material symmetry, etc. [34].

Recent progress in the field of machine learning (ML) has sparked the development of data-driven nu-
merical methods that avoid the explicit formulation of constitutive models and purely operate on discrete
stress-strain data [14, 45, 64], as well as of data-driven constitutive models that employ reduced bases [22,
23, 88], polynomials [43, 87], or artificial neural networks (ANNs) [21, 49, 54, 86] for the representations
of nonlinear energy potentials or stress-strain relationships. In particular, the latter approach offers a high
flexibility and applicability to a wide range of materials due to the universal approximation properties of
ANNs [36]. Furthermore, they can also be formulated to fulfill important material theoretical considerations,
e.g., the material symmetry condition [20, 52]. In González, Chinesta, and Cueto [27], an analytical model
was extended using ML methods. While preserving favorable properties of the analytical model, the ML
extension can improve the models performance and, thus, the applicability to different materials, which is
shown in González et al. [28] for vascular soft tissues. In Fernández et al. [20], two anisotropic material models
for finite deformations are proposed. The first model is based on an analytical model [37], whose function
space is extended using ANNs. While preserving favorable analytical properties of the model, the flexibility is
significantly improved. For the second approach, the method of group symmetrization is introduced to fulfill
the material symmetry. With the six independent components of the right Cauchy-Green tensor C = FTF as
input quantity for an ANN, the model offers a highly flexible, objective hyperelastic potential, which is able
to reproduce the challenging effective behavior of cubic beam-lattice metamaterials, including lattice insta-
bilities. The model proposed by Linka et al. [52] uses structural tensors, c.f. [90], processed through ANNs, to
formulate a set of invariants reflecting the intrinsic material anisotropy and, using these invariants as input
for another ANN, predicts the potential of the hyperelastic structure. By using a Lagrangian multiplier to
enforce incompressibility, c.f. [34, Section 6.3], the model is applicable to elastomeric material behavior, and
shows excellent results for Treloar’s experimental data on vulcanised rubber. Both the models introduced in
[20] and [52] include the objectivity and material symmetry condition in their formulation. In Heider, Wang,
and Sun [31], a neural network based constitutive model for anisotropic elastoplastic materials is proposed,
and different methods are examined to fulfill the objectivity condition, e.g., the representation of stress and
strain tensors in their spectral decomposition for the input and output data of the neural network.

In the context of finite elasticity theory, existence and stability of solutions for boundary-value problems
are strongly linked to the notion of polyconvexity introduced by Ball [4, 5]. From a material theory point
of view, polyconvexity is advantageous since it implies ellipticity of the corresponding constitutive model,
while being more straightforward to include into the model formulation than the ellipticity condition itself.
Ellipticity ensures material stability [62, 89], and is thus important for numerical applications such as the
finite element method. As will be shown for the model of Fernández et al. [20], the polyconvexity condition
is easily violated when using unrestricted ANNs without caution. As this may lead to a loss of ellipticity
and thus to material instability, the polyconvexity condition should be treated with special care in the model
formulation.

The theoretical aspects of polyconvexity are still subject to current research [25, 58, 59], and the formula-
tion of polyconvex models remains a challenging task [56, 57]. For a long time after its initial conception, the
polyconvexity condition was practically restricted to isotropic material response, as no anisotropic formulation
was available that ensured at the same time: polyconvexity, objecticity, material symmetry and a stress-free
reference configuration. In fact, the fulfillment of multiple constitutive requirements at the same time can
be seen as “the main open problem of the theory of material behavior” (Truesdell and Noll [79, Section 20]).
In a landmark paper, Schröder and Neff [70] derived a formulation which fulfills all of the former mentioned
requirements, quickly followed by [18, 29, 38, 41, 68, 71, 72]. It was shown that questions of material stability
could neatly be avoided while being able to match experimental data [6]. The approaches are based on
second and fourth order structural tensors, which can reproduce a wide range of anisotropies. Combining
the structural tensors with the right Cauchy-Green deformation tensor, sets of invariants can be derived. By
a suitable construction of polynomials of these invariants, constitutive models can be created which fulfill all
of the former mentioned restrictions. By using polyconvex invariants instead of non-polyconvex invariants,
constitutive models can be improved [11]. Subsequently, also finite strain finite element methods tailored to

2


the discretization of polyconvex material models were developed [10, 66].
It must be emphasized, however, that the polyconvexity condition is a purely mathematical framework

that ensures ellipticity, but unlike ellipticity, to the best knowledge of the authors, is not a fundamental
physical requirement. For example, polyconvexity imposes restrictions upon arbitrarily large strains that
will never occur in actual experiments. Here, we adopt polyconvexity as a suitable means to ensure overall
material stability, i.e., to avoid loss of ellipticity. Otherwise, this would be cumbersome to be checked a
posteriori. Note that the only real physical requirement is satisfaction of ellipticity in a compact set including
the identity. But even this can be very challenging without polyconvexity [61, 62].

Coming back to the field of ML, the difficulty of formulating physically sensible models, e.g., material
models which fulfill multiple constitutive requirements, is still an open problem [16, 42, 82, 83]. Several
constitutive modeling approaches found in the literature consider convexity properties, however, none of
them known to the authors fulfills the polyconvexity condition. In Ghaderi, Morovati, and Dargazany [24],
an attempt is made to formulate polyconvex potentials based on ANNs. However, for the neural network
core, non-convex activation functions are used such as the hyperbolic tangent, and consequently, the potential
proposed in [24] violates the polyconvexity condition, which follows from corollary A.4. In Vlassis, Ma, and
Sun [81], a feed-forward neural network (FFNN) based hyperelastic model for anisotropic material behavior is
proposed, and the convexity of the resulting potential w.r.t. the right Cauchy-Green tensor C is examined for
a special material. However, while the potentials are convex in C for some examined deformation modes, the
model only approximates the convexity condition and, consequently, it may be violated for other deformations.
Furthermore, convexity in C does not ensure polyconvexity, as not all components of C are convex in the
deformation gradient F . Therefore, convexity in C does not imply ellipticity, and exhibits no physical
significance at all. The constitutive model proposed in Xu, Huang, and Darve [85] identifies the Cholesky
factor of a tangent stiffness matrix to describe the behavior of the material. By using symmetric positive
definite neural networks, the formulation is closely linked to convex potentials, and numerical robustness of the
model is ensured. While the model was able to predict both time-dependent and plastic problems, it requires
a large amount of data for the calibration. A FFNN based constitutive model for nearly incompressible
anisotropic hyperelasticity is proposed in Tac et al. [78]. By using additional terms in the objective function
which enforce semi positive-definiteness of the FFNN’s Hessian, the neural network is approximately convex.
However, the polyconvexity of the invariants which are used as input quantities for the neural network is
not examined. For example, the invariant Ī2 = 1

2

(
tr C̄

)2 − 1
2 tr C̄

2 with C̄ = J−2/3C is not polyconvex [29,
Lemma 2.4]. Therefore, in [78], the polyconvexity condition is violated by the choice of input arguments.

In the present work, two polyconvex constitutive models are introduced, which are both based on input
convex neural networks (ICNNs), see [2], and formulated for hyperelastic, anisotropic material behavior and
finite deformations. ICNNs are a special class of FFNNs which are, through a suitable network architecture,
constructed as convex functions. As already discussed, there is a variety of data-driven or machine learning
methods which are used for constitutive modeling. However, for the construction of polyconvex potentials,
convex FFNNs (=̂ ICNNs) are a very natural choice, as the simple mathematical structure of FFNNs offers
a straightforward application of analytically received convexity conditions, c.f. Appendix A. While there are
several applications for ICNNs in, e.g., convex optimization [2, 12, 13], the application towards polyconvex
constitutive models has, to the best of the authors’ knowledge, not been studied yet. The first model
developed in this work is based on a set of invariants already introduced in Schröder, Neff, and Ebbing [73],
which fulfills the objectivity and material symmetry condition by construction. Using ICNNs, substantially
more complex functions for the potential compared to polynomial approaches can be created, while preserving
the polyconvexity of the model. While this model can be seen as a straightforward extension of invariant-
polynomials to the more flexible function space ANNs offer, the second constitutive model is specifically
designed for machine learning. Formulated directly in the deformation gradient, its cofactor and determinant,
the objectivity condition is not fulfilled by construction. That objectivity, i.e., invariance of the energy
W (QF ) = W (F ) under rotations Q ∈ SO(3), is not automatically satisfied, may be surprising at first glance.
Usually, hyperelastic formulations come with objectivity built in a priori. Indeed, objectivity is not derived
from an experimental observation, but is a physical law, c.f. sections 17 and 19 in Truesdell and Noll [79].
However, in consideration of a useful model for a compact set of deformation gradients F = Dφ, the possible
error in not exactly satisfying objectivity remains controllable. The benefit in giving up exact objectivity is
the increased flexibility in combining all the needed constitutive requirements. We remind that the major
obstacle in originally extending polyconvexity to the anisotropic setting was coming from exact objectivity

3


combined with the stress-free reference configuration and material symmetry. Here, the objectivity condition
is approximated using data augmentation, following [51]. The approach is based on mechanical considerations,
and requires no further simulation or experimental data. The anisotropy is taken into account with the group
symmetrization already introduced in Fernández et al. [20].

Both models can be calibrated with a moderate amount of data, which is shown for the highly challeng-
ing behavior of cubic beam-lattice metamaterials, using synthetic homogenization data. Modern additive
and advanced manufacturing methods allow for a variety of tailored materials with specifically designed mi-
crostructure, consisting of, e.g., beams and shells, which leads to mechanical characteristics not encountered
in classical materials, e.g., due to lattice instabilites within the microstructures [8, 26, 40, 44, 50, 53]. Nonlin-
ear multiscale simulations for this class of metamaterials can be executed by homogenizing the microstructure
[60] either in a current multiscale setting using the FE2 method [26], or sequentially based on the formulation
of an effective constitutive model [20, 39, 43], as applied here.

The outline of the manuscript is as follows. In Sect. 2, the present work starts with a short introduction
to the basics of constitutive modeling relevant to this work. In Sect. 3, the lack of polyconvexity for models
with unrestricted ANN architecture is discussed, and the two polyconvex ML based models are introduced.
Finally, in Sect. 4, the models are calibrated to the highly challenging, homogenized behavior of cubic lattice
metamaterials and compared with each other and to a conventional polyconvex model. The application to
another material symmetry, i.e., transverse isotropy, is demonstrated in Sect. 5. In Sect. 6, some issues raised
by the use of ML techniques in nonlinear elasticity theory are discussed. After the conclusion in Sect. 7, some
general properties of ICNNs are introduced and discussed in Appendix A.

Notation Throughout this work, tensor compositions and contractions are denoted by (AB)ij = AikBkj ,
a · b = aibi = ⟨a, b⟩, A : B = AijBij and (A : A)ij = AijklAkl, respectively, with vectors a and b, second
order tensors A and B and fourth order tensor A. The tensor product is denoted by ⊗, the second order
identity tensor by 1. The first Fréchet derivative of a function f w.r.t. X is denoted by DXf , the second
Fréchet derivative (or Hessian) is denoted by D2

Xf . For the function composition f(g(x)) the compact
notation f ◦ g ◦ x is applied. The set of invertible second order tensors with positive determinant is denoted
by GL+(3) :=

{
X ∈ R3×3 | detX > 0

}
, the special orthogonal group in R3 by SO(3) :=

{
X ∈ R3×3 |

XTX = 1, detX = 1
}
.

2 Basics of material theory
The hyperelastic potential

W : GL+(3) → R , F 7→ W (F ) (1)

corresponds to the strain energy density stored in the body Ω ⊂ R3 due to the deformation φ : Ω → R3,
and depends on the deformation gradient F = Dφ [34]. To ensure a physically sensible and mathematically
well-posed formulation, several restrictions on the potential must be considered. The restrictions relevant to
the present work are shortly introduced in the following:

The reference configuration of the body must be stress-free, i.e., for the first Piola-Kirchhoff stress

S = DFW (F ) (2)

it holds that S(1) = 0 [34]. In fact, this is implied by the condition that the potential (1) attains its unique,
global minimum at the identity, i.e., W (1) = 0 and W (F ) ≥ 0 [34]. The principle of objectivity [79] states
that the material behavior has to be independent of the observer, which means that the potential is invariant
under the transformation

W (QF ) = W (F ) ∀F ∈ GL+(3) , Q ∈ SO(3) . (3)

Following (2), this implies the invariance of the stress tensor under transformations according to

S(QF ) = QS(F ) ∀F ∈ GL+(3) , Q ∈ SO(3) . (4)

For models formulated in terms of the right Cauchy-Green tensor C = FTF , i.e., as W = W (C), the material
objectivity condition is automatically fulfilled, which is an advantage compared to models depending directly

4


on the deformation gradient. For anisotropic materials with symmetry group G ⊂ SO(3), the strain energy
must be invariant under the symmetry transformation [30]

W (F Q) = W (F ) ∀F ∈ GL+(3) , Q ∈ G ⊂ SO(3) , (5)

which implies the stress invariance condition

S(F Q) = S(F )Q ∀F ∈ GL+(3) , Q ∈ G ⊂ SO(3) . (6)

The growth condition

W (F ) → ∞ as detF → 0+
(
⇔ 1

detF
→ ∞

)
(7)

captures the physical consideration that for infinitely large volumetric compression, an infinite amount of
energy is required. There are several other growth conditions known in constitutive modeling, which regard
the material behavior for very large deformations and are often referred to as coercivity conditions.1 However,
while the case of large volumetric compression (7) is important to consider especially for highly compressible
materials, coercivity conditions regard material behavior which usually lies outside the considered deformation
modes. Therefore, they are of rather theoretical interest, and will not be considered throughout this work.

In finite elasticity theory, the existence of minimizers for the underlying variational functionals is guar-
anteed if the energy potential (1) fulfills the polyconvexity condition introduced by Ball [4, 5] and an ad-
ditional coercivity condition [47]. The potential W

(
F
)

is polyconvex if and only if there exists a function
P : R3×3 × R3×3 × R → R such that

W (F ) = P(F, Cof F, detF ) , (8)

so that P is convex in its arguments (F, Cof F, detF ). The function P is in general non-unique [70]. Due
to the reason mentioned above, the coercivity condition is not considered in this work. For convex func-
tions which are sufficiently smooth, the Hessian matrix is positive semi-definite [75], and the polyconvexity
condition can be formulated as

δξ ·D2
ξP(ξ) · δξ ≥ 0 ∀ ξ, δξ , (9)

with the rearranged arguments ξ = (F, Cof F, detF ) ∈ R19 and δξ ∈ R19 [17]. From this point on, the
notation ξ = (F, Cof F, detF ) ∈ R19 will be used to adress the argument in the polyconvexity contexts more
compactly. Polyconvexity implies ellipticity, while being more straightforward to include into the model
formulation than the ellipticity condition itself. The ellipticity (or rank-one convexity) condition [62, 89]

(a⊗ b) : D2
FW (F ) : (a⊗ b) ≥ 0 ∀ a, b ∈ R3 (10)

ensures material stability of the constitutive model.

3 Polyconvex constitutive models based on FFNNs
Feed-forward neural networks (FFNNs) are universal approximators [36], meaning that they can represent
continuous functions of arbitrary complexity. FFNN based constitutive models exploit this important prop-
erty by using them as highly flexible functions within the model formulations, e.g., to represent the energy
potential W defined in (1). This is in contrast to conventional approaches, which rely on a human choice
for the representation of W , thus potentially reducing the possible function space. Furthermore, through
the right choice of network architecture, FFNNs can be constructed as convex functions, which is an often
required property in the context of constitutive modeling.

In finite elasticity theory, the construction of potentials which are convex in the deformation gradient,
its cofactor and determinant is of special interest, which is referred to as polyconvexity [5]. By using convex
FFNNs, i.e., ICNNs, hyperelastic potentials can be generated which satisfy the polyconvexity condition, see

1For a coercive function, f (x) → ∞ as ∥x∥ → ∞ holds.

5


(8). Polyconvexity implies ellipticity, and thus ensures material stability of the constitutive models. The
convexity condition is not trivially fulfilled by arbitrary FFNNs. Consequently, for FFNN based models
found in the literature a loss of polyconvexity can often be detected, which may lead to a loss of ellipticity
and thus material instability. This is now discussed for the approach of Fernández et al. [20].

Fernández et al. [20] introduced several ML based constitutive models, from which we will examine the
potential model. The model uses a FFNN core for the hyperelastic potential, with the right Cauchy-Green
tensor C = FTF as input, and fulfills the material symmetry condition (5) with a group symmetrization of the
potential, see eq. (20). With the six independent components of C as input, the model fulfills the objectivity
condition (3) per construction. The internal FFNN used in this work is built upon multiple compositions of
the Softplus function, cf. (A.12), with unrestricted weights. This model violates the polyconvexity condition
in two aspects. First of all, only the principal diagonal elements of the right Cauchy-Green tensor C are
convex in the deformation gradient F , while the remaining elements of C are neither convex nor concave.2
Additionally, compositions of Softplus functions with arbitrary parameters are not convex, as is discussed in
corollaries A.4 and A.8 in App. A. Furthermore, while the metamaterials under consideration in [20] are highly
compressible, the volume compression condition (7) is not considered in the model formulation. Therefore
the model does not ensure a physical sensible behavior for the limiting case of infinitely large volumetric
compression.

A brief introduction to convex FFNNs, i.e., ICNNs, and the notation used throughout this work can
be found in App. A. In corollaries A.4 and A.6 the standard conditions for the fulfillment of convexity are
provided. Activation functions fulfilling these conditions are presented in theorem A.7 and corollary A.8.
These activation functions lead to the ICNN cores discussed in proposition A.9. In a nutshell, an ICNN is
easily constructed based on standard FFNNs with input vector X based on the following restrictions:

• the first hidden layer A1 is neuron-wise convex with respect to X (e.g., by employing convex activation
functions as Softplus, s(x) = log(1 + exp(x)), in each neuron)

• from the second A2 to the last hidden layer AH , each hidden layer is neuron-wise convex and non-
decreasing with respect to the previous hidden layer (e.g., by employing s(x) with non-negative weights
in each neuron)

• the scalar-valued output layer a is convex and non-decreasing with respect to the last hidden layer.

These restrictions imply that the composition a ◦ AH ◦ . . . ◦ A2 ◦ A1 ◦ X = ā(X) is convex in X. These
architectures are shortly denoted as ICNNs from now on. For several passages, the abbreviation A =
AH ◦ . . . ◦A1 will be used for the core (i.e., the hidden layers) of the ICNN. The collection of all weights and
biases of the ICNN will be abbreviated by the internal parameter vector p.

We now present two polyconvex ICNN-based approaches for anisotropic hyperelastic constitutive model-
ing, extending the works of Schröder, Neff, and Ebbing [73] and Fernández et al. [20].

3.1 Model based on invariants
Polyconvex model based on invariants In Schröder, Neff, and Ebbing [73], a polyconvex potential
for trigonal, tetragonal and cubic symmetry groups is proposed, see also [17]. In the following, we will shortly
describe the cubic potential. It is formulated in polynomials of invariants, which are based on a combination
of the right Cauchy-Green tensor and an anisotropic structural tensor. The fourth order tensor G is called a
structural tensor of the material symmetry group G ⊂ SO(3) if

G = Q ∗G ∀ Q ∈ G ⊂ SO(3) , (11)

where ∗ denotes the Rayleigh product [9]. For some material symmetry groups, second order structural
tensors are sufficient to represent the symmetry, while for other symmetries, the introduction of fourth or
sixth order tensors is required [90]. For cubic symmetry, a fourth-order structural tensor is required. In [73],
the fourth order structural tensor

Gcub =

3∑
i=1

ei ⊗ ei ⊗ ei ⊗ ei (12)

2This does not mean that objectivity and polyconvexity contradict each other, it rather shows how challenging it
is to combine both requirements.

6


is used for the cubic group G7, with ei being the basis vectors of a Cartesian coordinate system. A description
of the cubic group G7 can be found in [17, Section 3.7]. Using the structural tensor (12), a set of five polyconvex
invariants can be derived as

I1 = trC , I2 = tr (Cof C) , I3 = detC ,

J7 = C : Gcub : C , J11 = Cof C : Gcub : Cof C .
(13)

The first three invariants are the well-known isotropic invariants, while the remaining two invariants possess
the cubic symmetry. The invariants I1 and J7 are convex in F , the invariants I2 and J11 are convex in Cof F ,
and the invariant I3 is convex in detF . Since the invariants are formulated in the right Cauchy-Green tensor,
they fulfill the principle of objectivity. Based on these invariants, Schröder, Neff, and Ebbing [73] proposed
the potential

W SNE
cub = κ

n∑
r=1

(
1

(αr + 1)3αr
I
(αr+1)
1 +

3

(βr + 1)3βr
I
(βr+1)
2 +

1

(ηr + 1)3ηr
J
(ηr+1)
7 +

9

γr
I−γr

3

)
, (14)

where we introduced the additional parameter κ. With κ > 0, αr, βr, ηr ≥ 0 and γr ≥ −1/2, γr ̸= 0, the
potential is polyconvex, coercive and has a stress-free reference configuration.

Extension with neural networks A vector of group specific objective invariants I(ξ) is defined,
where each component of I(ξ) is convex in ξ, i.e., I(ξ) is a polyconvex vector-valued function. For instance,
I = (I1, I2, I3, J7, J11) can be considered for cubic materials. This motivates the model

P0(ξ; p) = a ◦ A ◦ I ◦ ξ , (15)

where the layers a ◦ A in (15) are restricted to convex and non-decreasing activation functions, since I
represents the first convex layer, see remark A.10 for a discussion. The model (15) is polyconvex due to the
usage of ICNNs und fulfills the objectivity and material symmetry conditions due to the considered invariants.

Growth condition The volumetric growth condition (7) can be fulfilled with coercive functions. How-
ever, since ICNNs are not necessarily coercive, they are not suited to fulfill this condition, and therefore, an
analytical term should be added to the potential (15) according to

W = P0(ξ; p) +Wvol (detF ) , (16)

where we choose the polyconvex term [29]

Wvol(detF ) =

(
detF +

1

detF
− 2

)2

. (17)

Computation of stress The scalar-valued function W given by (16), which consists of the neural
network P0 from (15) and the volumetric growth term Wvol from (17), is then interpreted as a hyperelastic
potential, see (1), and the first Piola-Kirchhoff stress S = DFW (F ) is calculated as its gradient, see (2). In
the present work, automatic differentiation [7] is used, which is widely available in modern machine learning
libraries. This approach is, of course, not only applicable to the present model but to any differentiable model
built in a ML library providing automatic differentiation.

Reference state In the reference configuration F = 1, the model (16) is not stress-free by construction.
However, elastic stress-strain data received from numerical or experimental investigations will always have
the property S(1) = 0. When the model is calibrated with this data, it also approximates the stress
behavior for F = 1, thus fulfilling S(1) = 0 in an approximate fashion. In Fernández et al. [20] a projection
approach is proposed which fulfills the stress-free reference configuration in an exact way. While this approach
preserves polyconvexity when formulated in terms of F instead of C, it is not compatible with the methods
of incorporating objectivity and material symmetry which are used in the present work, and therefore cannot
be applied.

7


3.2 Model based on the deformation gradient
Model formulation Based on the definition of polyconvexity (8), the most straightforward inputs for
a polyconvex FFNN-based model are the deformation gradient, its cofactor and determinant itself. Thus,
when ξ = (F, Cof F, detF ) ∈ R19 is used as input for an ICNN ā(ξ) = a ◦A ◦ ξ, the output ā(ξ) is convex in
ξ and the resulting potential is polyconvex, but does not fulfill the material symmetry condition, in general.

Here, the group symmetrization of a function ϕ(F ) with respect to a given finite group G ⊂ SO(3) with
#(G) elements, as introduced in Fernández et al. [20] for ML-based models, is of interest

ϕG(F ) =
1

#(G)
∑
Q∈G

ϕ(F Q) , (18)

since the group symmetrized function ϕG(F ) fulfills the material symmetry conditions of the considered
group. Application of group elements Q ∈ G on F , i.e, F → F Q, results in the linear transformation of
ξ = (F, Cof F, detF ) according to

ξ∗sQ = (F Q, (Cof F )Q, detF ), (19)

where we introduced the operator ∗s that specifies the transformation of ξ by Q, and exploited that Cof(F Q) =
(Cof F )Q [70]. Such a linear transformation does not affect the convexity of the ICNN ā(ξ), see corollary
A.6 for an explicit proof. This implies that the following potential model

P0(ξ; p) = a ◦ A ◦ ξ , W =
1

#(G)
∑
Q∈G

P0(ξ∗sQ; p) +Wvol (20)

is polyconvex and fulfills the material symmetry and volume compression conditions. It should be noted that
compared to the previous invariant based model, the first hidden layer of the core A in (20) is not restricted
to convex and non-decreasing activation functions, but only to convex functions. This allows more flexibility
in the first hidden layer.

Infinite groups and finite subgroups The anisotropy of many materials can be described by finite
groups, e.g., cubic, orthotropic or monoclinic materials, such that the group symmetrization (20) can be
carried out for the exact fulfillment of the material symmetry condition (5). For some materials, the symmetry
group is infinite, e.g., for transversely isotropic materials. For such cases, as also remarked in Fernández et al.
[20], a pragmatic solution for the usage of the group symmetrization (20) can be obtained by consideration
of a finite subgroup of the infinite group. For the case of transverse isotropy, a finite subgroup G∗

ti ⊂ Gti for
chosen N can be simply constructed by

G∗
ti :=

{
Qα

x | α = 2π n/N, n = 1, 2, . . . , N
}

⊂ Gti :=
{
Qα

x | α ∈ [0, 2π)
}

(21)

where Qα
x denotes a rotation around the preferred axis x by the angle α. The finite subgroup G∗

ti contains
N equidistant elements of Gti. As will be shown in Sect. 5, already N = 6 elements can be sufficient for an
excellent approximation of transverse isotropy.

Computation of stress As in the invariant-based, differentiable FFNN-based model, the present model
uses automatic differentiation for the computation of the stresses.

Objectivity While the determinant of the deformation gradient is an invariant quantity with respect
to change of observers, the deformation gradient itself and its cofactor depend on the choice of observer.
Thus, the model W as defined in (20) is generally not objective. While the objectivity condition could
be fulfilled trivially by using the components of the right Cauchy-Green tensor C, this would violate the
polyconvexity condition, as only the main diagonal elements of C are convex in the deformation gradient.
Consequently, in order to meet both polyconvexity and objectivity, we choose a quantity which is suitable
to fulfill the polyconvexity condition, and take further steps to approximate the objectivity condition. Then,
the objectivity of the model is approximated with a data augmentation approach, following Ling, Jones, and

8


Templeton [51]. Based on the existing calibration dataset for a single observer D = {F, S, W}, the dataset
is extended by a finite amount of additional observers according to

D̃ =
⋃
Q∈G

{QF, QS, W} , G ⊂ SO(3) . (22)

For pragmatic reasons, a finite amount of randomly distributed rotation matrices is chosen. The data
augmentation approach can be visualized as follows: Without data augmentation, the constitutive model is
calibrated with a single observer. As ab initio, the model does not know how to extrapolate the learned
material behavior to other observers, it is only applicable to the observer chosen in the calibration dataset.
When the data augmentation approach (22) is applied, the model is calibrated with multiple observers. If the
model is then evaluated with an arbitrary observer, the model will yield reasonable results, as the material
behavior for any observer can be seen as the interpolation between the observers with which the model was
calibrated. With a sufficient number of observers, the model can be trained to behave nearly independent of
the observer, as will be demonstrated in the upcoming examples.

In fact, for hyperelastic potentials, it is sufficient to apply the objectivity condition (3) to the potential
only, which directly implies the transformation rule for the stress tensor. In Ling, Jones, and Templeton [51]
only the potential values were extended and no visualization of evaluation of the stress values was provided.
However, the model quality can benefit from the additional information the stress tensor provides, and
therefore, both quantities are used for the data augmentation in this work. While this approach increases
the size of the training data and thus the required calibration time, it is important to note that the time
required for the model evaluation is not affected, and that no additional simulation or experimental data is
required.

Considering (3) and (4), objectivity could be further enforced by adding terms of the form |W (QF ) −
W (F )|2 and ∥S(QF )−QS(F )∥2 to the objective function used for calibration of the model. As, in this ap-
proach, both W and S are received by the constitutive model, the choice of F is not restricted to deformation
states used in the calibration dataset. Consequently, F may be sampled in a sensible range of deformations
in which the model is to be applied, e.g., following the sampling strategy as proposed by Kunc and Fritzen
[48], together with a finite amount of random rotation matrices Q. This approach takes into account that
objectivity is a physical requirement, which must be fulfilled independent of stress-strain data available for
a specific material. However, in the present work, no additional term is added to the objective function, as
the objectivity can already be approximated very well with the data augmentation approach (22), as will be
shown in Section 4.3.

4 Application to cubic metamaterials

4.1 Homogenized behavior of soft beam-lattice metamaterials
The performance of the models proposed in the previous section is now examined in application to the
homogenized behavior of beam-lattice structures with cubic anisotropy. In Fernández et al. [20], the ho-
mogenized behaviors of the cubic BCC cell and the cubic X cell are numerically investigated for sev-
eral calibration and test scenarios, with full data availability on the public GitHub repository https:
//github.com/CPShub/sim-data. The mechanical behavior of these metamaterials is highly nonlinear and
exhibits several challenging characteristics like lattice instabilities, which makes it a good benchmark case
for the models.

The cubic BCC structure consists of body-centered beams with additional beams along all edges. Identi-
fying the smallest unit from which the structure is built, the BCC unit cell is obtained which, regarding the
periodicity, contains beams only at three edges, see Fig. 1b. For large structures built from lattice metama-
terials, it is convenient to formulate a constitutive model for the homogenized behavior of the cells, instead
of simulating every single beam of the structure.

To determine the homogenized behavior of the BCC unit cell, finite element simulations are carried out,
where effective deformation gradients F are applied on the cell using displacements on the outer nodes of

3Figure from Fernández et al. [20].

9

https://github.com/CPShub/sim-data
https://github.com/CPShub/sim-data


(a) F11 = 0.63 (b) F11 = 1.00 (c) F11 = 1.50

Figure 1: Uniaxial deformation in x-direction for the BCC unit cell.3

the beams. With periodic boundary conditions, the behavior of the unit cell within a larger structure can
be simulated. The averaged, or effective, strain energy density W can directly be calculated with the strain
energy stored in the beams and the size of the cell. For the structures under consideration, the effective stress
response S of the cell can be computed with the reaction forces on the outer nodes and the size of the cell
[20]. In contrast to physical experiments, this numerical characterization yields not only the stress response
of the material, but also the strain energy density, and the resulting dataset

D = {(F1, S1, W1) , . . . } (23)

consists of triplets for corresponding deformation gradient, effective stress and effective strain energy density
for each simulation step and applied deformation.

For the calibration data used here, uniaxial, equibiaxial, planar, shear, and volumetric deformations were
applied on the unit cell. These deformations can also be applied in physical experiments. The calibration
dataset DC consists of 905 triplets. Beside the calibration data, three test cases were examined. For the first
two test cases, biaxial deformations with different stretch ratios are applied on the unit cell, while the third
test case is a combination of tension and shear deformation. All three test cases exhibit lattice instabilities
and complex deformations, which are not included in the calibration data. The test dataset DT consists of
605 datapoints. Further technical details on the simulations can be found in [20]. In the present work, the
first test case of [20] is referred to as “biaxial test”, while the third test case is referred to as “mixed test”.
Especially the “mixed test” is a good benchmark case, as the combination of tension and shear is a very
general deformation.

Due to the soft materials and high slenderness of the beams, lattice instabilities occur for several deforma-
tion modes. In Fernández et al. [20], the microstructure simulation were carried out with an experimentally
validated nonlinear post-buckling analysis approach [39]. Taking a closer look at the uniaxial deformation
of the BCC cell, it exhibits instabilities in both compression and tension, which are shown in Fig. 1. Even
for small uniaxial compression, the load bearing beams show instabilities, while for tension the less stressed
beams on the edges show instabilities, which results in a highly differing behavior of the cell for compression
and tension, leading to distinctive changes in the slope of the stress components, see Fig. 2. This highly
challenging behavior is observed for all deformation modes, in both the calibration and test cases.

The properties of the constitutive models are examined with the BCC cell, the behavior for the cubic X
cell is only shortly discussed. The X cell consists of body-centered beams, without additional beams at the
edges like the BCC cell. Therefore, less lattice instabilities occur compared to the BCC cell. While uniaxial
and equibiaxial deformation gradients are quite similiar, their stress response for the X cell differ by a factor
of ten, which is a challenging behavior for constitutive models. Apart from this case, the stress response
of all load scenarios has the same order of magnitude. Furthermore, the amount of data points is roughly
equal for all load scenarios. Therefore, for the following investigations, no data normalization is applied. If
necessary, strategies such as the L2 normalization of deformation cases proposed in [19] could be applied.

10


0.75 1.0 1.25 1.5

0.6

1

1.4

F11

F
ij

F11 F22 F33

0.75 1.0 1.25 1.5

0

10

20

F11

S
ij

S11 S22 S33

Figure 2: Deformation (left side) and first Piola-Kirchhoff stress S (right side) for uniaxial deformation of the
cubic BCC cell. Lattice instabilities express themselves by nearly horizontal stress values of S11 (compression)
and decreasing slope of S11 (tension). Stress in [hPa].

4.2 Preparation of the models
Analytical model (WSNE

cub ) The model proposed by Schröder, Neff, and Ebbing [73] is used as an
example for conventional polyconvex models formulated in terms of invariants. For the potential of the
model (14), n = 2 summands are used.

FFNN model by Fernández et al. [20] (WC) The potential model proposed by [20] is used as an
example for FFNN based models which are not polyconvex by construction. The model is similar to the one
discussed in section 3.2, with the difference that the six independent components of the right Cauchy-Green
tensor are used as input, therefore the model is objective by construction. Following [20], the models core is
built from three layers with 16 nodes using Softplus functions in each layer, and unrestricted parameters.

Polyconvex ICNN model based on invariants (W I) The set of cubic invariants introduced in
[73], see eq. (13), is used, with the additional invariant I∗3 = −2

√
I3. This additional polyconvex invariant

can be received from the last summand in eq. (14), and is important for the model to represent negative
stress values. Using four isotropic invariants and two cubic invariants, the input of the neural network is the
vector of invariants I = (I1, I2, I3, I

∗
3 , J7, J11) ∈ R6. An ICNN core based on Softplus (SP) functions, see

(A.12), is used, with the number of layers in {2, 3} and the number of nodes in {8, 16, 32}. The weights in
all layers are non-negative. The restrictions on the networks parameters, which are caused by the convexity
condition, are discussed in proposition A.9. For a compact notation, the core is referred to as e.g. SP [8, 8]
for a core containing two layers with eight nodes with SP functions in each layer.

Polyconvex ICNN model based on the deformation gradient (WF) For the second model, the
input (F, detF ) ∈ R10 is chosen. Several simulations showed that, for the metamaterials under consideration,
the deformation gradient alone is not enough to represent the material behavior, while there is no improvement
when using all arguments of (F, Cof F, detF ). The group symmetrization is carried out with the cubic group
G7 ⊂ SO(3), which contains 24 orthogonal transformations [17]. The network architectures are chosen as
described above for the invariant based model W I, but with arbitrary weights in the first layer, and non-
negative weights in all subsequent layers, c.f. proposition A.9.

Performance measures While, from experimental investigations of materials, only stress values are
available, the numerical evaluation of the unit cells yields both effective energy density and the stress tensor.
For a given dataset D, the mean squared error (MSE) of the constitutive model W□, □ ∈ {SNE, C, I, F}, is

11


defined as

MSE□ (p) =
1

# (D)

∑
F∈D

[
1

(J/m3)2

(
W (F )−W□ (F ; p)

)2

+
1

9Pa2
∥∥∥S (F )− S□ (F ; p)

∥∥∥2 ] (24)

where #(D) denotes the number of datapoints in D. The parameters p of the models are found as the mini-
mizers of the corresponding MSE. However, since the minimization of the MSE is a non-convex optimization
problem, a local minimum p may strongly depend on the initial guess and the optimization algorithm. Thus,
we introduce the mean deviation of two model instances (M̃D) with parameters pi and pj on a set of defor-
mation gradients DF as

M̃D
□
(pi, pj) =

1

# (DF )

∑
F∈DF

[
1

(J/m3)2

(
W□ (F ; pi)−W□ (F ; pj)

)2

+
1

9Pa2
∥∥∥S□ (F ; pi)− S□ (F ; pj)

∥∥∥2 ] .

(25)

Then, the overall mean deviation of a model (MD) with n instances is obtained as the averaged M̃D of all
possible combinations of model instances:

MD□ (p1, . . . , pn) =

(
n
2

)−1 n−1∑
i=1

n∑
j=i+1

M̃D
□
(pi, pj) . (26)

For the dataset used to calibrate the parameters, different sets of experiments may be applied on a material.
As long as each dataset contains all the required information, different calibration datasets should yield the
same model behavior. However, for models whose parameters have no physical interpretation (which is the
case for ML based models), different calibration datasets may lead to a different model behavior. Thus,
we introduce the mean deviation (MD) between a single model instance calibrated on the dataset D1 with
parameters p0 and multiple model instances trained on another dataset D2 with parameters p1, . . . , pn:

MD
□
(p0, p1, . . . , pn) =

1

n

n∑
i=1

M̃D
□
(p0, pi) . (27)

Implementation The model W SNE has been implemented in MATLAB R2021a, the machine learning
models were implemented in TensorFlow 2.3.0 with Python 3.7. Each machine learning model has been
trained with the full batch of training data and the ADAM optimizer using default settings, every architecture
was initialized three times. The models WC and W I were trained for 20, 000 epochs using the calibration
dataset DC . The model based on the deformation gradient WF was trained for 15, 000 epochs with the
calibration dataset DC , and retrained for 2, 000 epochs with the extended calibration dataset D̃C according to
eq. (22). From our experience, this strategy provides a good balance between accuracy, speed of convergence
and computation time required. Since the material objectivity of the model WF is only approximated,
both MSE and MD of the model are evaluated with 1, 024 random observers. For the additional observers,
uniformly distributed rotation matrices are generated using the “Spatial Transformation” package provided
by SciPy [80]. For the BCC cell, the models are also trained one time with the adapted calibration dataset
D∗

C which contains the shear and volumetric deformation cases (from DC), as well as the biaxial and mixed
test cases (from DT ), following the training strategies given above.

Reproducibility of the models The potentials can be reproduced with the information of the net-
work’s architecture and parameters, i.e., the number of layers and neurons, the connection of the neurons,
choice of input and output, activation functions, as well as the weights and biases for each neuron. How-
ever, the training of the model, i.e., the process of parameter calibration of the weights and biases, is of
utmost importance for its performance. As mentioned above, this calibration corresponds to solving a non-
convex optimization problem, for which minima are usually local and may strongly depend on the initial

12


0.75 1 1.25 1.5

0

10

20

F11

S
ij

S11 S22 S33

Figure 3: Evaluation of W SNE
cub for the BCC cell, calibrated only for the uniaxial deformation case. Points

depict the simulation data, while lines depict the calibrated model; stress in [hPa]. The S11 component is
not well-fitted.

guess and the optimization algorithm. Thus, for the same network architecture, typically several param-
eter sets are determined based on different random initializations, which renders the calibration process
non-reproducible. To meet this difficulty, we provide not only the calibration data, but also a compiled
version of our TensorFlow model instances as well as their sets of parameters in the public GitHub repository
https://github.com/CPShub/sim-data. With the information about the networks architecture and the
parameters used for the potential, it can be reproduced in any program, c.f. (A.1).

4.3 Model evaluation for BCC cell
The performance of the four models calibrated with the training data for the BCC cell is now discussed.
First of all, the evaluation of the analytical model W SNE only in the uniaxial deformation case is shown in
Fig. 3. As can be seen, even for this simple case the model fails to represent the material behavior. Thus,
this model is not considered for further, more detailed performance evaluations.

Before the machine learning models can be compared, the required amount of observers for the data
augmentation of the model WF must be examined. The model WF is calibrated with the full calibration
dataset D̃C , using 8, 16, 32 and 64 random observers, see eq. (22), and evaluated after calibration with 1, 024
random observers. The corresponding MSEs and relative calibration times are shown in Table 1, and the
shear response for each calibration is illustrated in Fig. 4. While, for a calibration with 64 observers, the
dataset is eight times as big as the dataset for eight observers, its calibration time is just under six times
as long. Although data augmentation increases the calibration time, for increasing datasets, this approach
profits from the fast evaluation of ANNs for large batch sizes. If the model response depends on the choice
of observer, the evaluation yields an area containing the stress response for all different observers, whereas
for sufficiently well approximated objectivity, minimum and maximum model response coincide, and the area
practically reduces to a single line. For a calibration with eight observers, the individual stress components
for the shear deformation are within a wide range, and the MSEs are very high. Therefore, the material
objectivity is not fulfilled. Using more observers for the data augmentation, the approximation quality of the
objectivity increases. For 64 observers in the calibration dataset, the shear responses for different observers
are indistinguishable from each other, and the MSEs are sufficiently small. This shows that objectivity can
be successfully learned using the data augmentation approach. In the following, the model WF is calibrated
using 64 observers, and all evaluations for the model are carried out with 1, 024 random observers.

In Table 2, the three best MSEs for the three machine learning models trained on DC and D̃C are
depicted and sorted for decreasing error on the test dataset DT . The model WF yields far better results
than W I for both calibration and test datasets, which may be caused by a smaller function space of the
model W I due to the human choice of invariants. While the MSEs of WF are slightly higher than the ones
of WC on DC , which is reasonable since the polyconvex model is more restricted in fitting the calibration
data than the non-polyconex one, it actually performs slightly better on the test data. This could be rooted
in the additional mathematical structure that polyconvexity incorporates into the model, making it more

13

https://github.com/CPShub/sim-data


0

7

14

0 0.25 0.5

0

7

14

0 0.25 0.5

8 observers

S
ij

16 observers

32 observers

F12

S
ij

64 observers

F12

S11 S12 S13

S21 S22 S23

S31 S32 S33

Figure 4: Variation of observers for WF for the cubic BCC cell. Points depict the simulation data, while
lines and shaded areas depict the calibrated model evaluated with 1, 024 observers. Stress tensor QT S(QF )
(4) for shear deformation is shown, with normal components in red colors, shear components in blue colors
and less important components in gray; stress in [hPa].

generalizable.
As already encountered in [20], different neural network architectures using a vastly differing amount of

parameters may lead to models of equal quality, with different results for multiple initializations using the
same architecture. Also, the stress error in eq. (24) dominates the overall MSE for all evaluations. For the
convex neural networks used in this work, the models need a sufficiently high amount of nodes and layers to
counteract the loss of flexibility due to the restrictions on the parameters. For the model WF, SP [16, 16, 16]
is the smallest architecture yielding good results, while there were no significant benefits for architectures
using more layers or nodes.

Furthermore, in Table 2 the mean deviations for the three machine learning models are shown, evaluating
the deviations of the different instances of each individual model. The deviations are evaluated for the datasets
DF

C and DF
T , which contain all deformation gradients applied in the calibration dataset DC and test dataset

DT , respectively. MD evaluates the deviation for the instances trained on DC , while MD compares the
deviation of the single instance trained on the dataset D∗

C to the instances trained on DC . For the model
WC the MD on DF

C is very low, which may be attributed to the high flexibility of the unrestricted network core,
allowing the model to be very similar on DC for every initialization. However, for DF

T , the MD of the model
WC is worse than its MSE on DT . Furthermore, for both DF

T and DF
C , the model behavior depends on the

dataset used for the calibration, which leads to a high MD. Actually, MD on DF
C is even three magnitudes

higher than MD. For the model W I, the MD on both datasets is about two magnitudes smaller than its
MSEs on the datasets. This may be caused by the additional mathematical structure that polyconvexity and
invariants incorporate into the model. While the deviation of multiple instances trained on the same dataset
is very low, the training on the adapted dataset D∗

C leads to a different model behavior, which results in
a high MD. The reason for this is that the model is not flexible enough to capture the material behavior.
The model WF has a flexible input and, due to the polyconvexity condition, a pronounced mathematical
structure. This leads to excellent results for both the MSE and the MD values, making it the only model of
this comparison which has both excellent approximation qualities and a consistent behavior within multiple

14


Observers for the MSE Relative cali-
calibration dataset DT DC bration time

8 1.25 · 104 8.64 · 103 1
16 3.97 · 103 1.65 · 103 1.98
32 3.42 · 103 6.84 · 102 2.95
64 7.61 · 102 2.57 · 102 5.70

Table 1: MSEs of calibrated WF models for the BCC cell with augmented data for learning objectivity.
Evaluation with 1, 024 random observers. Calibration times related to calibration time for 8 observers.

0.75 1 1.25 1.5

0

10

20

F11

un
ia

xi
al

-
S
ij

S11 S12 S13

S21 S22 S23

S31 S32 S33

0.6 1.0 1.4

0

8

16

F11

m
ix

ed
te

st
-
S
ij

Figure 5: Evaluation of WC for the BCC cell, shaded areas denote the model’s loss of ellipticity. The
ellipticity was checked with the Hessian of the potential, using 500 random unit vectors for each test vector,
c.f. (10). Points depict the simulation data, while lines depict the calibrated model. Normal components of
the stress tensor in red colors, shear components in blue, less important components in gray; stress in [hPa].

model instances. Furthermore, even for a calibration with the adapted dataset D̃∗
C the model behavior stays

consistent. The deviation MD has the same magnitude as MD, and also the MSEs for the instances trained
with different datasets have the same magnitude. For the following evaluations, the core SP [16, 16] is used
for W I, while for WC and WF the core with the smallest MSE is chosen.

In Fig. 5, the uniaxial deformation case and mixed test case for WC are shown to examine the ellipticity
of the model. As already encountered in Fernández et al. [20], the model shows excellent approximation prop-
erties. However, in both examined cases in Fig. 5, the model loses its ellipticity even for small deformations,
which causes material instability and would lead to major drawbacks in numerical applications such as the
finite element method. Consequently, for the metamaterials under consideration, it is important to include
the polyconvexity condition into the model formulation, as it implies ellipticity, and thus ensures material
stability. Typically, for soft materials, one does not expect loss of material stability for a deformation gradient
F in a bounded set including the identity deformation F = 1 (even if large elastic strains may occur). This
is, e.g., the case for isotropic elastic energies defined in the logarithmic strain tensor [61, 62, 69]. The loss of
ellipticity in these models occurs only for extremely large strains that cannot be observed in experiments.

In Fig. 6, the behavior of WC and WF is examined for volumetric tension and compression. Regarding
the volumetric growth condition (7), the strain energy density should rapidly grow for J = detF → 0+. The
model WF contains the term Wvol from (17) and thus fulfills the growth condition. In the formulation of WC,
the growth condition is not considered, and consequently it is violated for the extrapolation J → 0+. For the
metamaterials under consideration, lattice instabilities may lead to high volumetric compression, therefore
the growth condition should be included in the model formulation. It must be emphasized that both the
loss of ellipticity and the nonphysical behavior for J → 0+ are no specific drawbacks of the model WC as
proposed by Fernández et al. [20]. It is very likely that other ML-based constitutive models would show
the same behavior when calibrated to the examined data, as long as ellipticity and the volumetric growth
condition are not explicitly considered in the model formulation.

15


Model Param. Calibr. Deviations MSE
data DF

T DF
C DT DC

WC with SP [16, 16, 16] 673 DC 1.02 · 103 1.59 · 102
——"—— 1.40 · 103 1.58 · 102
——"—— 1.41 · 103 1.63 · 102

SP [16, 16, 16] 673 D∗
C 2.40 · 102 4.18 · 103

MD 2.05 · 103 5.91 · 100
MD 1.47 · 103 4.21 · 103

W I with SP [32, 32, 32] 2, 369 DC 1.33 · 104 1.11 · 104
SP [32, 32] 1, 313 —"— 1.41 · 104 1.10 · 104
SP [16, 16] 401 —"— 1.44 · 104 1.12 · 104
SP [16, 16] 401 D∗

C 1.77 · 105 1.30 · 105

MD 2.37 · 102 2.93 · 102
MD 3.47 · 103 1.84 · 103

WF with SP [16, 16, 16] 737 D̃C 7.47 · 102 2.45 · 102
——"—— 7.74 · 102 3.43 · 102
——"—— 8.05 · 102 2.62 · 102

SP [16, 16, 16] 737 D̃∗
C 6.81 · 102 5.42 · 102

MD 1.60 · 102 1.11 · 102
MD 4.36 · 102 2.66 · 102

Table 2: Deviations of model instances and MSEs of calibrated ML models for the BCC cell. MD evaluates
the deviation for the instances trained on DC , while MD compares the deviation of the instance trained on
D∗

C to the instances trained on DC .

0 0.5 1 1.5
0

2

4

F11 = J1/3

st
ra

in
en

er
gy

de
ns

it
y

data WC WF

Figure 6: Evaluation of WC and WF for volumetric deformation of the cubic BCC cell; strain energy density
in [kJ/m3].

In Fig. 7, a subset of the calibration and test cases for the model W I is shown to examine some model
characteristics. The model yields acceptable results for deformation gradients with dominating main diagonal
elements. For the uniaxial and equibiaxial calibration case, the component S33 shows a large deviation from
the simulation data, which also transfers to the biaxial test case. For shear deformation, the model completely
fails to represent the simulation data, consequently, it also fails to represent the mixed test case. While,
basically, the set of invariants for the model could be extended, it is unlikely that the model behavior for
the cubic metamaterials under considerations can be improved with this approach. While the model shows

16


0.75 1 1.25 1.5

0

10

20

0 0.25 0.5

0

7

14

0.8 1.0 1.2 1.4

0

11

22

0.6 1.0 1.4

0

8

16

F11

un
ia

xi
al

-
S
ij

F12

sh
ea

r
-
S
ij

F11

bi
ax

ia
lt

es
t

-
S
ij

F11

m
ix

ed
te

st
-
S
ij

S11 S12 S13

S21 S22 S23

S31 S32 S33

Figure 7: Evaluation of W I for the cubic BCC cell. Points depict the simulation data, while lines depict
the calibrated model. Normal components of the stress tensor in red colors, shear components in blue, less
important components in gray; stress in [hPa].

0.75 1 1.25 1.5

0

10

20

F11

un
ia

xi
al

-
S
ij

S11 S12 S13

S21 S22 S23

S31 S32 S33

0 0.25 0.5

0

7

14

F12

sh
ea

r
-
S
ij

0.8 1.0 1.2 1.4

0

11

22

F11

bi
ax

ia
lt

es
t

-
S
ij

0.6 1.0 1.4

0

8

16

F11

m
ix

ed
te

st
-
S
ij

Figure 8: Evaluation of WF for the cubic BCC cell. Points depict the simulation data, lines depict the
calibrated model. Normal components of the stress tensor in red colors, shear components in blue, less
important components in gray; stress in [hPa].

17


Model Param. Calibr. Deviations MSE
data DF

T DF
C DT DC

WF with SP [16, 16, 16] 737 D̃C 5.41 · 102 2.38 · 102
——"—— 5.62 · 102 2.61 · 102
——"—— 8.82 · 102 2.54 · 102

MD 1.47 · 102 5.64 · 101

Table 3: Deviation of model instances and MSEs of calibrated WF model for the X cell.

0.6 1 1.4

0

1.5

3

0.6 1 1.4

0

10

20

F11

un
ia

xi
al

-
S
ij

F11

eq
ui

bi
ax

ia
l-

S
ij

S11 S22 S33

Figure 9: Evaluation of WF for the cubic X cell. Points depict the simulation data, while lines and shaded
areas depict the calibrated model; stress in [hPa].

drawbacks especially for shear deformations, the additional invariants can only use main diagonal elements
of C in order to be polyconvex, which makes it hard to gain flexibility for shear deformations. However, for a
wide range of materials with less challenging behavior, the invariant-based model may still be a good choice,
which is demonstrated in Section 5 for transverse isotropy.

In Fig. 8, a subset of the calibration and test cases for the model WF is shown to examine the model’s
characteristics. The model shows excellent results for every deformation mode of the calibration dataset and
the test dataset, with only small deviations of the S12 component for the mixed test case. After recalibration
of the model with the concatenation of calibration and test dataset for 1, 000 epochs, the model can perfectly
represent the simulation data for both calibration and test data, which is not shown in Fig. 8. The data
augmentation approximates the material objectivity so well, that no dependence on the observer can be seen
at all.

4.4 Model evaluation for X cell
In the following, for the evaluation of the performance of the models calibrated with the training data for the
X cell, only the results for the model WF are shortly discussed; the results for the model W I confirmed the
observations made for the BCC cell without yielding further insights. The model WF was initialized three
times using three layers with 16 nodes in each layer, leading to the MDs and MSEs shown in Table 3. In
Fig. 9, the uniaxial and equibiaxial deformation mode for the model with the best DT are shown. While the
model shows excellent agreement with the simulation data for almost all calibration cases, in the uniaxial
tension regime, the model shows a slight dependence on the choice of observer. The reason for this is that
the uniaxial and equibiaxial deformations are similar, while their stress response differs by a factor of ten.
Therefore it is challenging for the model to capture the material’s behavior for both uniaxial and equibiaxial
deformations. This may be resolved by an adapted training strategy as proposed in Fernández, Fritzen, and
Weeger [19], for which the stress responses of the different deformation cases are scaled to a comparable
magnitude for the training. We should also remark that approximate satisfaction of objectivity does not
conflict with an existence theorem based on polyconvexity and growth or coercivity conditions.

18


5 Application to transverse isotropy
After the detailed examinations for cubic lattice metamaterials in Section 4, the application of the polyconvex
ML models to transverse isotropy is now briefly discussed. In doing so, we demonstrate the straightforward
applicability of our models to other symmetry groups. Also, for the highly challenging behavior of cubic
lattice metamaterials, the invariant-based model W I showed poor approximation quality. With the following
example we show that, for a wide range of materials, W I can still be an appropriate choice.

5.1 Data generation
Analytical transversely isotropic potential For the following investigations, we generate data
with the polyconvex model proposed by Schröder, Neff, and Ebbing [72], which is applicable to several
symmetry groups, including transverse isotropy. In the following, the preferred axis of the transversely
isotropic symmetry group (21) is chosen as the x1-axis, which motivates the second order structural tensor

Gti = diag

(
β2,

1

β
,
1

β

)
. (28)

Using this structural tensor, the two transversely isotropic invariants

J4 = tr (C Gti) , J5 = tr (Cof(C)Gti) (29)

can be derived, which are convex in F and Cof F , respectively [72]. Together with the isotropic invariants
I1−3, Schröder, Neff, and Ebbing [72] proposed the potential

W SNE
ti = α1 I1 + α2 I2 + δ1 I3 − δ2 log

(√
I3

)
+

η1
α4 (trGti)

α4
(Jα4

4 + Jα4
5 ) , (30)

which is objective by construction, and polyconvex if all parameters are equal to or greater than zero. The
parameter δ2 depends on the other parameters and is chosen such that the model is stress-free in the reference
configuration. Note that the parameter β of the structural tensor (28) needs to be specified as well. Here,
we use the parameter values (β, α1, α2, δ1, δ2, α4, η1) = (2, 8, 0, 10, 56, 2, 10), which were fitted in [72] to
referential data of a not further specified real-world material.

Using the potential (30), transversely isotropic data can be generated. As was already the case for the
numerical homogenization of cubic lattice metamaterials, the analytical potential (30) provides both energy
and stress values. However, for the following investigations, we will only make use of the stress values, which
will demonstrate that energy values, which are typically not available in experiments, are not necessarily
required to calibrate the proposed polyconvex ML models. Given as an analytical function, the potential
(30) can directly be evaluated for a given deformation gradient F ∈ GL+(3). This leaves the question open
of how to choose F for the generation of calibration and test datasets.

Calibration dataset The calibration dataset DC consists of uniaxial and equibiaxial tensile tests, as
well as a shear deformation. For the special cases of uniaxial and equibiaxial tensile tests, the corresponding
boundary-value problem can be directly formulated as systems of non-linear equations. For uniaxial tension
in x1-direction, deformation gradient and stress tensor are given by

F = diag (F11, F22, F33) , S = diag (S11(F ), 0, 0) , (31)

where F11 is prescribed, F22 = F33 are unknown and S11(F ) = DF11
W SNE

ti (F ). For equibiaxial tension in
x1, x2-directions,

F = diag (F11, F22, F33) , S = diag (S11(F ), S22(F ), 0) (32)

holds, where F11 = F22, F33 is unknown and S11(F ) = DF11
W SNE

ti (F ), S22(F ) = DF22
W SNE

ti (F ). These non-
linear systems of equations are then solved with standard functions provided by MATLAB R2021a, which
provides the overall deformation gradients for uniaxial and equibiaxial tensile tests. For this, 200 equidistant
values F11 ∈ [0.5, 2] are prescribed. The shear deformation F = 1+ γ(e1 ⊗ e2 + e2 ⊗ e1) is evaluated for 250
equidistant values γ ∈ [0, 0.5]. Selected data points are visualized in Fig. 10 and Fig. 11.

19


Test dataset The test dataset DT consists of a biaxial test and a combined tension-shear test. Note that
the “biaxial test” and “mixed test” are different from the ones used in Sect. 4. For the biaxial test, the system
of nonlinear equations

F = diag (F11, F22, F33) , S = diag (S11(F ), S22(F ), 0) (33)

with prescribed F11, F22 = 0.5F11 is solved for F33 for 100 equidistant values F11 ∈ [0.5, 2]. The mixed test
case

F =

1 + 0.2λ 0.2λ 0
0 1 + 0.1λ 0
0 0 1− 0.1λ

 (34)

is evaluated for 100 equidistant values λ ∈ [−1, 2.5].

After generation of the deformation gradients F for all calibration and test cases, the first Piola-Kirchhoff
stress S = DFW

SNE
ti (F ) is evaluated, and the resulting datasets consist of tuples

D = {(F1, S1) , . . . } . (35)

Altogether, the calibration dataset DC consists of 650 tuples, while the test dataset DT consists of 200 tuples.

5.2 Model preparation
For the invariant-based ML model W I, the two transversely isotropic invariants from (29) together with
the three isotropic invariants from (13) and the additional invariant I∗3 = −2

√
I3 form the input I =

(I1, I2, I3, I
∗
3 , J4, J5) ∈ R6 of the neural network. For the deformation gradient based model WF, the input

(F, detF ) ∈ R10 is chosen. For the network core of W I, one hidden layer with eight nodes turned out as
a sufficiently accurate choice, while for WF three hidden layers with 32 nodes in each layer are chosen. As
before, convexity is ensured by choice of the convex Softplus activation function in each node and restrictions
on the network parameters, which are discussed in Section 4.2 and Proposition A.9, respectively. The
transversely isotropic symmetry group Gti has an infinite number of elements, see (21). In order to apply the
group symmetrization approach from (20) on WF, the group is approximated by six rotations around the
x1-direction, c.f. eq. (21).

For the model calibration, the MSE

MSE□ (p) =
1

# (D)

∑
F∈D

1

9Pa2
∥∥∥S (F )− S□ (F ; p)

∥∥∥2 (36)

with W□, □ ∈ {I, F} is applied, c.f. Sect. 4.2. Here, the MSE for the shear deformation is weighted twice,
as its stress response is considerably lower than the stress response of the other calibration cases. For
the data augmentation for objectivity of WF, see (22), 128 random rotation matrices are used, while for the
group symmetrization for transverse isotropy of WF, see (20), six equidistant rotations around the x1-axis are
applied. Since both objectivity and material symmetry are only approximated for WF, the model is evaluated
with 1, 024 random observers for the verification of objectivity (3) and with 60 equidistant rotations around
the x1-axis for the material symmetry (5). Both models are initialized three times and each trained for 5, 000
epochs.

Further technical details are discussed in Sect. 4.2. The MATLAB code used to generate calibration and
test data, as well as compiled versions of both ML models and their sets of parameters are provided in the
public GitHub repository https://github.com/CPShub/sim-data.

5.3 Model evaluation
In Table 4, the MSEs of the different model initializations are shown. The ML models show excellent
agreement with both the calibration and test dataset. In particular, even though W I has only one layer
with eight nodes, it can represent the data almost perfectly, which is most likely caused by the fact that the
ML model W I uses the same invariants as the analytical potential (30). A subset of the training dataset

20

https://github.com/CPShub/sim-data


0.5 1 1.5 2

-0.75

0

0.75

1.5

0 0.25 0.5

-0.25

0

0.25

0.5

0.5 1 1.5 2

-0.75

0

0.75

1.5

0.8 1.0 1.2 1.4

-0.3

0

0.3

0.6

F11

un
ia

xi
al

-
S
ij

F12

sh
ea

r
-
S
ij

F11

bi
ax

ia
lt

es
t

-
S
ij

F11

m
ix

ed
te

st
-
S
ij

S11 S12 S13

S21 S22 S23

S31 S32 S33

Figure 10: Evaluation of W I for transverse isotropy. Points depict data from the analytical model W SNE
ti ,

while lines depict the evaluation of the calibrated model W I. Normal components of the stress tensor are
shown in red colors, shear components in blue, less important components in gray; stress in [hPa].

0.5 1 1.5 2

-0.75

0

0.75

1.5

F11

un
ia

xi
al

-
S
ij

S11 S12 S13

S21 S22 S23

S31 S32 S33

0 0.25 0.5

-0.25

0

0.25

0.5

F12

sh
ea

r
-
S
ij

0.5 1 1.5 2

-0.75

0

0.75

1.5

F11

bi
ax

ia
lt

es
t

-
S
ij

0.8 1.0 1.2 1.4

-0.3

0

0.3

0.6

F11

m
ix

ed
te

st
-
S
ij

Figure 11: Evaluation of WF for transverse isotropy. Points depict data from the analytical model W SNE
ti ,

while lines depict the evaluation of the calibrated model WF. Normal components of the stress tensor in red
colors, shear components in blue, less important components in gray; stress in [hPa].

21


Model Param. MSE
DT DC

W I with SP [8] 65 1.52 · 10−1 2.90 · 10−2

——"—— 7.84 · 10−1 6.41 · 10−2

——"—— 8.47 · 10−1 3.44 · 10−2

WF with SP [32, 32, 32] 737 3.20 · 100 2.80 · 10−1

——"—— 3.40 · 100 3.44 · 10−1

——"—— 4.12 · 100 2.95 · 10−1

Table 4: MSEs of calibrated ML models for the transversely isotropic data.

and both test cases are shown for W I and WF in Fig. 10 and 11, respectively. Again, the model W I shows
excellent agreement for the transversely isotropic data. For the shear calibration case and the test cases, WF

shows a slight dependence on the observer. Overall, both ML models are able to represent the analytical
potential (30) very well, and here especially the invariant-based model W I shows excellent results. As such
analytical potentials are successfully applied in, e.g., modelling of soft biological tissues [6], this implies that
the polyconvex ML models are also applicable to a wide range of real-world materials.

6 A critique of machine learning in nonlinear elasticity theory
At this point, we would like to briefly discuss some general issues raised by the use of machine learning
techniques in nonlinear elasticity theory. First and foremost, we would like to point out that these methods
are not meant to serve as a replacement for classical analytical models, but rather as an addition to the
already existing extensive theoretical framework. More specifically, we want to address three interrelated
shortcomings of the data-driven approach:

• the lack of an intuitive interpretation of the model and its parameters;

• the unstable (and, in practice, even non-deterministic) dependence of the parameter values on the
experimental data;

• the uncertainty of whether the resulting model is applicable to problems outside the range of prior
experiments.

To a smaller extent, all three of these issues can be observed for a number of analytical models as well,
especially some phenomenological models with a large number of parameters, which could be considered a
precursor to the modern purely data-driven approaches.

6.1 Analytical models
For comparison, we first consider the the classical, isotropic Hencky strain energy

WH : GL+(3) → R , WH(F ) = µ ∥dev log
√
FTF∥2 + κ

2
[tr(log

√
FTF )]2 , (37)

which depends solely on the two physical parameters µ (the shear modulus) and κ (the bulk modulus).
While it is well known that the elasticity model induced by the Hencky energy does not provide an

accurate description of very large deformations [3, 65], it is indeed highly accurate for up to moderate strains
of about 20% [3]. Moreover, the relation between the elastic behaviour predicted by the Hencky model
and the experimental data used to determine the parameters is clearly accessible to direct interpretation:
The shear modulus and the bulk modulus are determined by the material’s response to shear stresses and
hydrostatic pressure, respectively, and in turn influence the stress response to certain modes of deformation,
namely to simple shear and purely volumetric strain. In particular, this direct correspondence between the

22


parameter values and the model’s behaviour can be used to examine the plausibility of a specific parameter
set, even in the absence of additional test data.

Moreover, Hencky deduced his material model from a number of simple axiomatic assumptions [32, 33,
61, 63]. The applicability of his model to deformations not included in prior experiments can therefore be
based on whether or not (or rather: to what degree) his postulates hold under the new circumstances. It is
thereby possible to reasonably assess the limitations of Hencky’s model.

This direct correspondence between the mechanical-geometrical interpretation of the model and its pa-
rameters can no longer be established for other hyperelastic material models, especially for so-called phe-
nomenological (or “heuristic”) models. For example, the Ogden energy, which can be expressed in terms of
the singular values λ1, λ2, λ3 of F via

WOG : GL+(3) → R , WOG(F ) =

M∑
i=1

µi

αi
(λαi

1 + λαi
2 + λαi

3 − 3) , (38)

with 2M parameters µi > 0, αi ∈ R, can provide a much better fit to empirical observations than the Hencky
model for very large deformations [65] if the number 2M of material parameters is sufficiently high. However,
there is no longer any intuitive relation between these parameters and the predicted material behaviour.

Furthermore, the (globally) optimal choice of parameters for fitting the energy to a given dataset is
difficult to determine due to the strongly nonlinear dependence of the induced stress-strain relation on the
parameter values [65]. Therefore, in practice, the result of the parameter optimization is not fully determined
by the measured empirical data, but also affected by the random influence of, for example, the chosen starting
points of the optimization algorithm. In particular, two possible outcomes of such an optimization procedure
might yield a similar (or even an identical) quality of fit for the Ogden model to the limited experimental
data, whereas the material behaviour predicted by those two distinct optimized models for other deformations
might differ significantly. Therefore, a high degree of uncertainty must remain about the prediction quality
exhibited by material models such as Ogden’s, particularly when applied to deformations which are not
included in (or closely related to) the original experimental observations. This problem is closely related to
the more general notion of overfitting, i.e. optimizing too specifically to a given dataset, which tends to occur
for model functions with a high number of parameters.

6.2 Data-driven models
Machine-learning based approaches share and even amplify these shortcomings of highly complex phenomeno-
logical models. Due to the general nature of data-driven methods, the number of parameters required for
closely fitting such a model to a given specific dataset is necessarily rather high, even when compared to
complex models such as the Ogden energy. In addition, for most machine learning methods used today (in-
cluding neural networks), the resulting model cannot (easily) be stated explicitly in the form of a closed-form
analytical expression. It is therefore extremely difficult, if not impossible, to develop an intuitive understand-
ing of the relation between such a data-driven model and its parameters on the one hand and the predicted
material behaviour on the other.

Of course, these problems have been recognized in many other fields of research where machine learning
techniques have been applied, and a number of approaches have been suggested to determine not only the
influence of different parameters on the prediction, but also the direct relation between the training data and
the resulting output in a humanly comprehensible fashion [55, 74, 76]. However, these techniques still lack
the reliability and the direct intuition offered by more traditional models.

Similarly, the phenomenon of overfitting is a well known issue in the field of machine learning, and a
number of precautions (such as the careful distinction between training and validation data) are taken in
order to alleviate this problem. However, since any form of validation or testing is still based on available
experimental data, any assumptions about the applicability of these models to circumstances outside the
range of prior observations must remain unfounded even in the best of cases.

Thus, we consider it as extremely important to inform machine learning models with as much physi-
cal and mathematical structure as possible (as done here with the hyperelasticity, anisotropy, objectivity,
and polyconvexity properties) and apply them only in cases where classical approaches cannot provide an
acceptable accuracy (as is the case here due to the strong nonlinearity of the lattice microstructures).

23


Finally, and perhaps most importantly, even if the resulting trained algorithm does indeed provide an
accurate model in all practically relevant situations, it does not offer any insight as to why its predictions
are accurate. Again, this can be contrasted with the aforementioned Hencky elasticity model: Although not
deduced ab initio, the model has indeed been developed originally by Heinrich Hencky from simple geometrical
and mechanical considerations, and a careful study of his deductions can without doubt further the reader’s
understanding of continuum mechanics in a way that cannot be matched by inspecting the “black box” that
results from training a machine learning algorithm.

Of course, the above considerations are not restricted to applications of data-driven models to nonlinear
elasticity theory, but equally apply to many other domains of the natural sciences where machine learning
has recently been demonstrated to yield promising results. Machine-learning based approaches can (and
will) most certainly be employed to improve the accuracy of predictions and thus the quality of models and
simulations in the years to come, most likely resulting in considerable technological advancements. However,
for the reasons outlined above, machine learning should not be considered as a full-fledged replacement of more
traditional, analytical models, now or in the future, even if the outcomes seem to match or even surpass those
resulting from more classical approaches. In the past, humanity has found a major motivation for developing
a further understanding of nature in the dependence of (practically applicable) scientific techniques on the
scientific method [67], and it would be most unfortunate if the success of machine learning in advancing the
former would lead us to neglect the latter.

7 Conclusion
In the present work, two machine learning based constitutive models are proposed, which fulfill the poly-
convexity condition by using input convex neural networks. This implies ellipticity of the constitutive mod-
els, which ensures material stability. The hyperelastic models are formulated for finite deformations and
anisotropic material behavior. Furthermore, the neural networks yield highly flexible constitutive models,
which are adaptable to a wide range of materials.

The first model W I is based on a set of polyconvex, anisotropic invariants proposed by Schröder, Neff,
and Ebbing [73], see eq. (13) and (15), which fulfill the material objectivity and material symmetry conditions
by construction. The neural network core is able to create highly nonlinear functions from the invariants,
while conventional models are restricted to comparatively simple polynomials. Depending on the anisotropy
class, various sets of invariants can be used.

The second model WF fully exploits the approximation capabilities of artificial neural networks. For-
mulated in the deformation gradient, its cofactor and determinant, the material objectivity condition is not
fulfilled by construction, which would be a major drawback for conventional constitutive models. In the
context of machine learning, however, the model is trained to approximate the objectivity condition, using
data augmentation of the calibration dataset, cf. (22), which is possible due to the high flexibility of the
models, and the high-performing optimization algorithms available in several machine learning libraries, e.g.,
TensorFlow. The present work not only uses the potential values, as done in Ling, Jones, and Templeton [51],
but also the stress values. This is convenient as the stress is the quantity of interest for many applications of
the constitutive model, and furthermore, the general approximation quality of the model can benefit from the
additional information that the augmented stress values provide. For incorporating the material symmetry,
the group symmetrization introduced in Fernández et al. [20] is used, see eq. (20).

The model capabilities are examined with synthetic homogenization data of cubic metamaterials used in
Fernández et al. [20], and compared to the polyconvex model proposed by Schröder, Neff, and Ebbing [73].
The simulation data offers a highly challenging benchmark case, with characteristics like lattice instabilities
for several deformation modes. The analytical model from [73] is not able to capture the behavior of the
material, even for the uniaxial deformation case. The model WF shows excellent performance, not only for
the calibration data, but also for several test scenarios which were not included in the training of the model.
The evaluation of the model W I gave acceptable results for deformation gradients with dominating main
diagonal elements, but failed to represent the stress response for shear deformations. However, when fitted
to data generated from the analytical, transversely isotropic model from [72], which represents a real-world
material, the model W I also delivered excellent results. This shows that both models are applicable to a wide

24


class of anisotropic materials, while WF is preferable for highly challenging metamaterials. Apparently, the
polyconvexity conditions greatly improves the generalization capabilities of ANN-based constitutive models,
such that they can be trained on fairly small training datasets. Here, deformation modes which are commonly
applied in physical experiments were used. Nevertheless, due to their high flexibility, the models can benefit
from a wider range of calibration data and yield even better results. The data augmentation approach for
the calibration of the model WF does not require additional simulation or experimental data, the extended
dataset is created purely from mechanical considerations.

The models are formulated as general as possible, and can be adapted to a wide range of anisotropic,
hyperelastic materials. It lies in the very nature of machine learning that constitutive models based on neu-
ral networks are, to some extent, more complex than their conventional counterparts. As to what extent,
the present work suggests that there are two major ways: For a wide range of materials with a moderately
challenging behavior, very small ML models can be used, c.f. the excellent performance of the small invariant
based model in Sect. 5. The model complexity of this approach is close to the one of analytical potentials,
without the need to manually construct a function for the specific material behavior at hand. However, for
very complex material behavior, the full flexibility of neural networks can be utilized by using bigger net-
works, c.f. the deformation gradient based model in Sect. 4. In both cases, the application to finite element
simulations will be important for future research. Considering the infinitely continuously differentiable
neural network cores and the ellipticity of the proposed models, they offer a straightforward adaption for
this, with favorable numerical behavior and trivial computation of stress and stiffness tensors, if automatic
differentiation functionalities are considered. For future work, it would be valuable to investigate the for-
mulation of polyconvex FFNNs with a volumetric-deviatoric decomposition of the deformation gradient for
(nearly) incompressible materials. Furthermore, the incorporation of parametric dependencies, such as the
aspect ratio of a microstructure, into polyconvex FFNNs should be investigated.

Conflict of interest. The authors declare that they have no conflict of interest.

Acknowledgment. The work of Dominik Klein is supported by the Graduate School CE within the Centre
for Computational Engineering at Technical University of Darmstadt. Patrizio Neff acknowledges support in
the framework of the DFG-Priority Programme 2256 “Variational Methods for Predicting Complex Phenom-
ena in Engineering Structures and Materials”, Neff 902/10-1, Project-No. 440935806.

Data availability. The authors provide access to the complete simulation data required to reproduce the
results through the public GitHub repository https://github.com/CPShub/sim-data.

A Input convex feed-forward neural networks
FFNNs are a special class of artificial neural networks, which can be recursively defined as the composition
of several vector-valued functions [1, 46]. The components of the vectors are referred to as nodes or neurons,
the function in each neuron is referred to as activation function.

Definition A.1 (Feed-forward neural networks (FFNNs)). The FFNN with vector-valued input X,
H hidden layers and scalar-valued output function a is given by

X ∈ Rn[0]

A1 = A1

(
W [1]X + b[1]

)
∈ Rn[1]

,

Ah = Ah

(
W [h]Ah−1 + b[h]

)
∈ Rn[h]

, h = 2, . . . ,H

a = a
(
W [H+1]AH + b[H+1]

)
∈ R.

(A.1)

Weights W [h] ∈ Rn[h]×n[h−1]

and bias b[h] ∈ Rn[h]

form the set of parameters, which is optimized when the
model is calibrated. X and a are referred to as input and output layer, respectively, while the layers Ah are
referred to as hidden layers with component-wise applied activation functions according to

(A (W X))i = Ai

(
⟨w[i], X⟩

)
, W =

(
w[1], . . . , w[n]

)T

. (A.2)

25

https://github.com/CPShub/sim-data


We apply the short notation for feed-forward neural networks

a ◦ A ◦X (A.3)

with the networks core A = AH ◦ . . . ◦A1.

Definition A.2 (Input convex neural networks (ICNNs)). The FFNN a = a◦A◦X is called an ICNN,
when the scalar-valued output a is convex w.r.t. the vector-valued input X [2].

Sufficiency conditions for the fulfillment of convexity in the case of function compositions are given in
the following theorem.

Theorem A.3. The function ā = a ◦B ◦X is convex in X if the function a is convex and non-decreasing in
B, and B is component-wise convex in X.

Proof. We have to show the positive semi-definiteness of the function’s Hessian [29, 75]:

D2
X ā = (DXB)

T ·D2
Ba ·DXB +DBa ·D2

XB (A.4)

The derivatives DXB are mappings from the vector space X into the vector space B. When a is choosen
as a convex function of B, the Hessian of a w.r.t. B is positive semi-definite and hence the first term in eq.
(A.4) is positive semi-definite, see also observation 7.1.8 in [35]. The second term in eq. (A.4) is positive
semi-definite when a is non-decreasing in every component of B, and B is component-wise convex in X.

Corollary A.4. A FFNN is convex, when (i) the first hidden layer of the network’s core is component-
wise convex w.r.t. the input, (ii) every following hidden layer is component-wise convex and non-decreasing
w.r.t. the previous layer, and (iii) the scalar-valued output function is convex and non-decreasing w.r.t. the
last hidden layer.

Proof. This follows by recursively applying theorem A.3 to the components of the hidden layers.

Theorem A.5. Convexity is preserved under affine transformations.

The proof is clear but we provide it for the convenience of the reader.

Proof. The Hessian of the convex function a applied on an affine transformation of its argument, i.e., a
(
X̃
)
=

a (X C +D) with arbitrary, but constant C and D, is positive semi-definite.

D2
Xa =

(
DXX̃

)T ·D2
X̃
a ·DXX̃ +DX̃a ·D2

XX̃ (A.5)

The positive semi-definiteness of the first summand follows equivalent to eq. (A.4), while the second summand
vanishes due to the linearity of X̃ in X.

Corollary A.6. Due to its linearity, the bias of convex FFNNs can be choosen arbitrarily in every layer.
The input of a convex FFNN can be multiplied by any constant matrix.

With corollaries A.4 and A.6, convex neural networks can be constructed. In the first step, this requires
the choice of a convex and non-decreasing activation function. Furthermore, the activation function should
be sufficiently smooth, as the calculation of gradients plays an important role in continuum mechanics. All
of the former requirements can be fulfilled by the following functions.

Theorem A.7 (Log-Sum-Exp function). The Log-Sum-Exp function is defined as

f : Rm → R, f (x) = log

m∑
l=1

exl . (A.6)

26


We use the adaption f0 (x) = f (0, x), since it is closely linked to the Softplus function, see corollary A.8.
Using the weight matrix W ∈ Rm×n as defined in eq. (A.2) and the bias vector b ∈ Rm, we obtain the
adapted Log-Sum-Exp function

LSE : Rn → R, LSE(X) = f0 (W X + b) = log

[
1 +

m∑
l=1

e⟨w
[l],X⟩+bl

]
(A.7)

for neural networks. The LSE function is convex for arbitrary weights and bias, and non-decreasing when
all weights are non-negative. It is smooth for any choice of arguments or parameters, i.e., LSE ∈ C∞(Rn).

Proof. We first show the convexity of the adapted softplus function f0 by proving the positive semi-definiteness
of the Hessian D2

xf0 (x). For this, we simply observe that the inequality

0 ≤ v ·D2
xf0 · v =

1

(1 +
∑m

l=1 e
xl)

2

[
m∑
l=1

v2l e
xl

][
1 +

m∑
l=1

exl

]
−

[
m∑
l=1

vle
xl

]2
 (A.8)

holds for any v ∈ Rm since, due to the Cauchy-Schwarz inequality,[
m∑
l=1

vle
xl

]2

=

[
m∑
l=1

vle
xl/2 · exl/2

]2

≤

[
m∑
l=1

v2l e
xl

][
m∑
l=1

exl

]
≤

[
m∑
l=1

v2l e
xl

][
1 +

m∑
l=1

exl

]
. (A.9)

The LSE function is obtained through the linear transformation

LSE(X) = f0 (x) , x = W X + b (A.10)

of the convex function f0. Linear transformations preserve convexity, therefore the LSE function is also
convex. The first derivative of the LSE function

[DXLSE(X)]i =

∑m
l=1 w

[l]
i e⟨w

[l],X⟩

1 +
∑m

l=1 e
⟨w[l],X⟩ (A.11)

is non-negative for w
[l]
i ≥ 0 ∀ i, l. The smoothness of the functions follows from the smoothness of the

exponential function, and the smoothness of the logarithm on the positive domain.

Corollary A.8 (Softplus function). For m = 1 in eq. (A.7), the LSE function is reduced to the Softplus
(SP) function

SP : Rn → R, SP(X) = log
[
1 + e⟨w,X⟩+b

]
(A.12)

with weights w ∈ Rn and bias b ∈ R. The SP function is convex for arbitrary weights and bias, and
non-decreasing when all weights are non-negative. It is smooth for any choice of arguments or parameters.

Proposition A.9 (ICNNs using SP and LSE functions). ICNNs based on SP functions are built from
multiple layers, with several nodes using SP activation functions in each layer. For the first hidden layer,
the weights of the softplus functions can take arbitrary values, while the other layers must be non-decreasing
functions and, therefore, use non-negative weights. The networks output is given as the non-negative weighted
sum of the last SP layer. The bias in every layer can be chosen arbitrarily.

ICNNs based on LSE functions can be composed of either a single LSE function, or of several LSE
functions. In the first case, the LSE function can be understood as a composition of one layer with several
nodes using exponential activation functions, which are summed up and logarithmized in the next layer. The
function achieves its approximation properties by increasing the amount of nodes in the exponential layer.
As the exponential layer is the first hidden layer, the weights can be chosen arbitrarily. Alternatively, the
convex neural network can be constructed using several LSE functions. We obtain this approach similar to
the one introduced for SP functions, just by replacing the SP functions by LSE functions. In both cases,
the bias in every layer can be chosen arbitrarily.

27


Remark A.10. Constitutive models are often formulated in sets of invariants. In doing so, several important
properties are fulfilled already by the choice of the input quantity. When a set of invariants is used for a
ML model, the networks core must not only be convex, but also non-decreasing in its input, which follows
directly from eq. (A.4). Since the invariants are created by nonlinear functions, e.g., I1 = tr(FTF ), the
subsequent function processing the invariants must be convex and non-decreasing in order to be convex in
(F, Cof F, detF ). For cores based on SP or LSE functions, this can easily be achieved by using non-negative
weights in the first layer.

Remark A.11. The LSE activation function is not included in the current TensorFlow version, and was
manually implemented. While LSE based cores should benefit from the highly flexible activation function and
yield excellent results, the convergence behavior during training was very slow, and no satisfying results could
be obtained. However, this should be seen as a numerical drawback of the implementation and optimization
approaches used in this work rather than a general drawback of the function itself. Thus, only SP based
cores were used for the numerical investigations shown in Sect. 4.

References
[1] C. C. Aggarwal. Neural Networks and Deep Learning. 1st ed. Springer International Publishing, 2018.

[2] B. Amos, L. Xu and J. Z. Kolter. “Input convex neural networks”. In: Proceedings of the 34th International
Conference on Machine Learning. Ed. by D. Precup and Y. W. Teh. Vol. 70. Proceedings of Machine Learning
Research. PMLR, 2017, pp. 146–155. arXiv: 1609.07152.

[3] L. Anand. “On H. Hencky’s approximate strain energy function for moderate deformations”. In: Journal of
Applied Mechanics 46 (1979), pp. 78–82. doi: 10.1115/1.3424532.

[4] J. M. Ball. “Constitutive inequalities and existence theorems in nonlinear elasto-statics”. In: Herriot Watt
Symposion: Nonlinear Analysis and Mechanics. Ed. by R. J. Knops. Vol. 1. London: Pitman, 1977, pp. 187–
241.

[5] J. M. Ball. “Convexity conditions and existence theorems in nonlinear elasticity”. In: Archive for Rational
Mechanics and Analysis 63.4 (1976), pp. 337–403. doi: 10.1007/BF00279992.

[6] D. Balzani, P. Neff, J. Schröder and G. A. Holzapfel. “A polyconvex framework for soft biological tissues.
Adjustment to experimental data”. In: International Journal of Solids and Structures 43.20 (2006), pp. 6052–
6070. doi: 10.1016/j.ijsolstr.2005.07.048.

[7] A. Baydin, B. Pearlmutter, A. Radul and J. Siskind. “Automatic differentiation in machine learning: A survey”.
In: Journal of Machine Learning Research 18 (2018), pp. 1–43. arXiv: 1502.05767.

[8] K. Bertoldi, V. Vitelli, J. Christensen and M. van Hecke. “Flexible mechanical metamaterials”. In: Nature
Reviews Materials 2.11 (2017). Number: 11 Publisher: Nature Pub