Multi-objective model predictive control for microgrids

Abstract Economic model predictive control is applied to a simplified linear microgrid model. Monetary costs and thermal comfort are simultaneously optimized by using Pareto optimal solutions in every time step. The effects of different metrics and normalization schemes for selecting knee points from the Pareto front are investigated. For German industry pricing with nonlinear peak costs, a linear programming trick is applied to reformulate the optimization problem. Thus, together with an efficient weight determination scheme, the Pareto front for a horizon of 48 steps is determined in less than 4 s.


Introduction
Decentralization of the public electricity grid is becoming more and more relevant. While macroscopic solutions will take time, single units, so-called microgrids, already have an interest in self-control of their energy management, e. g. due to rising electricity costs. Model predictive control (MPC) is a fitting control scheme due to its capability of including forecasts of demand, weather conditions and renewable energy production. Economic MPC allows for direct optimization of incentives such as monetary costs, thermal comfort, CO 2 emissions or others. However, these objectives often are contradicting. Thus, a systematic approach to the multi-objective optimization (MOO) problem is necessary. This paper is structured as follows. First, in the remainder of this section, MOO is introduced and a literature review about its connection with MPC and its use in microgrid control is presented. In Section 2, the modeling of a medium-sized company building as a microgrid and its formulation as an optimization problem are discussed. In Section 3, preliminaries of how MOO problems can be solved are given. In Section 4, we present different normalization schemes and metrics to choose solutions of the MOO problem. In Section 5, we analyze the effects of the different selections with long term simulations of the microgrid model with real-world data. Last, we finish with a conclusion in Section 6.

Multi-objective optimization
Consider a MOO problem of the form P MOO : min J(x) = J 1 (x) J 2 (x) ... J q (x) (1) s. t. g j (x) ≤ 0, j = 1, 2, . . . , n ineq (2) h l (x) = 0, l = 1, 2, . . . , n eq , with x ∈ being the decision variable vector, q the number of objectives and n ineq and n eq the numbers of inequality and equality constraints, respectively. In general, there is no unique solution to P MOO . Instead, a hyperplane in ℝ q of non-dominated solution exists. A solution x * is not dominated by any other (feasible) solution x, if and only if there is no x such that J(x) ≤ J(x * ), where ≤ is understood elementwise, and J i (x) < J i (x * ) for at least one i ∈ {1, . . . , q}.
Such a x * is also called Pareto optimal and the set * of all Pareto optimal solutions is called Pareto front. Furthermore, a solution x † is only weakly Pareto optimal if there exists no other solution x such that J(x) < J(x † ).
Many MOO methods further include the Utopia point which is defined by and in general not attainable. Similar, the Nadir point is the combination of all worst Pareto solutions, i. e.

MOO in MPC
The interest in MPC has been rising significantly throughout the last decades. However, despite its optimization nature, no unifying framework for MOO in MPC is available so far. Considering regular MPC with its quadratic objective function, i. e.
the challenge of weight tuning Q and R could be considered an optimization of two competing objectives, i. e. between state control (with Q as its 'weights') and actuator energy (with R as its 'weights'). Literature about this general problem of weight tuning is rich (see [11] and the references therein). However, in most auto-tuning methods, only the system's step or impulse responses are used and evaluated by a different metric. Thus, we do not consider weight tuning for regular MPC itself as an answer to the MOO problem P MOO . A step towards MOO has been made with the introduction of economic MPC, where the quadratic terms in (8) may be replaced by stage costs of arbitrary form. Thus, multiple (competing) objectives such as in (1) may be used. The final cost term V f (x(N p )) is neglected in many cases. Furthermore, frequently multiple objectives are only considered in form of a (weighted) sum without any trade-off consideration.
In [9], a MOO MPC scheme for general nonlinear systems is defined. They consider a finite number of objectives and show that, given some mild assumptions in addition to the usual, the max of all objectives as costs can be used as a Lyapunov function to guarantee stability. In [4], a weighted sum is used. However, the weights are updated in every time step, thus choosing different Pareto solutions. It is shown that, under some conditions on the objectives such as joint convexity, closed-loop stability can be guaranteed. However, for the updating of the weights, a linear programming problem which is not jointly convex in general has to be solved in every time step. An economic MPC scheme with a compromise solution is formulated in [40]. Namely, the authors directly minimize the distance to the Utopia point as in (42) (but without weights). However, they only consider steady-state control and show that, if the objectives satisfy a Lipschitz continuity property and strong duality, stability can be guaranteed.
In [14,15], nonlinear time-discrete systems with an arbitrary number of (contradicting) objectives and without disturbances are considered. Given some assumptions on the system (e. g. an equilibrium point) and terminal costs and a local feedback in the terminal region, the authors propose a stabilizing MPC algorithm for which they show that the infinite-horizon closed loop performance has an upper bound defined by a Pareto optimal control sequence chosen at the beginning of the algorithm. For economic stage costs and similar assumptions, they show that the average closed-loop performance is upperly bounded. While they are able to give statements on the performance of Pareto optimal sequences, the trade-off between the multiple objectives seems to rely on the choice in their algorithms' initialization. Thus, changing conditions over time cannot be respected.
Evolutionary algorithms are used as optimization methods for an economic MPC scheme with a weighted sum of a finite number of objectives in [3]. Furthermore, Smith dynamics are used to dynamically tune the weights such that the solution lies in a pre-defined region of the Pareto front. This so-called management region is defined by the objectives' ratio in a normalized space.

Energy management for microgrids
The control of microgrids, in a broader sense also determined as an energy management system, has experienced huge attention in the research community. Thus, it is impossible to cover all approaches in this work. Several reviews exist focussing on different aspects, see e. g. [18,41] and the references to other reviews therein. Thus, we restrict ourselves to a short survey of the different approaches for both modeling and the (optimal) dispatch control problem of centralized microgrids.
For the modeling of a microgrid, mainly two basic approaches exist. First, ordinary differential or difference equations (ODEs) are determined for all storage systems. Actuators and constraints are incorporated into these ODEs. Depending on the modeling complexity, this results in either linear or non-linear systems, often including binary variables to cover start-up/shut-down decisions. The second approach is to use higher modeling languages or software such as Simulink, Modelica, EnergyPlus or other building performance software (BPS) tools. This way, complex dependencies and rules can be modeled with low effort. However, this comes at the cost of lower insight into the model itself and a generally harder to solve optimization problem. A review of various BPS tools can be found in [26].
While MPC is a popular approach to generate control actions for a microgrid due to the possibility of respecting constraints on states and inputs and forecasts of demands and disturbances, several other methods exist. Not all of them are equally applicable to the two modeling approaches. Despite some heuristic approaches and artificial intelligence methods, an optimization problem in some form is formulated in most cases. Then, the two main options are deterministic or meta-heuristic optimization methods. Deterministic optimization methods include linear, quadratic, mixed integer and nonlinear programming. They might fail in case of harsh nonlinearities modeled in complex BPS models. In this case, meta-heuristic optimization methods including evolutionary strategies, genetic algorithms (GA), Particle Swarm Optimization and others are preferable. These are also the usual combinations: ODE modeling + deterministic optimizers or BPSbased modeling + meta-heuristic optimizers, which can also be described as simulation-based optimization. The MPC framework can be used in both cases.
Examples for a modeling based on ODEs and MPC using deterministic optimization are [16,27,33,39]. The authors of [16] model a simple linear microgrid model with a stationary battery as the only state. As objective, they minimize the unweighted sum of monetary costs and battery lifetime degradation. A time horizon of 12 h with a time step Δt = 20 min is chosen. However, the battery lifetime is a nonlinear function and it is not clear how the resulting optimization problem has been solved (other than with the Matlab optimization toolbox). In [39], a simplified building model with 4 ODEs including an air-handling unit is used. Energy consumption and thermal comfort are considered as objectives. The distance to the Utopia point is directly minimized, which leads to a nonlinear programming problem. However, only one optimization problem per day with Δt = 1 h is solved. In [33], a microgrid in island mode with diesel generators and shiftable loads is considered. The prediction horizon is 24 h. However, to reduce computational expenses, the sampling time varies, i. e. increases from 5 min in the first half hour to 1 h for the last 22 h of the horizon. Fuel and emission costs are minimized. The authors compare different strategies for this MOO, thereby once taking a (fixed) weighted sum and once directly minimize the distance to the Utopia point, which results in a mixed integer nonlinear programming (MINLP) problem, which is solved in reasonable time. However, the results of the one day simulation for the scenario without demand response show that the compromise solution is outperformed by the weighted sum solution. This might be due to non-global optima found for the MINLP. In [27], stochastic MPC with chance constraints and uncertain weather forecasts is used. Namely, constraints for the building temperature are not violated with a likelihood of 1 − α. Samples of the Pareto front (energy use vs. number of constraint violations) are derived by varying α. However, no systematic choice of a trade-off is proceeded.
Examples for complex models based on a BPS and meta-heuristic optimization are [1,19,30], which all consider monetary costs and thermal comfort as objectives. The authors of [19] use a combination of GenOpt [36] and EnergyPlus to optimize temperature setpoints of three building models in an MPC scheme over a time period of 5 days. However, they sample only five Pareto solutions by running their simulation with five different weights on the monetary costs. In [1], a three-room building is modeled with EnergyPlus. Matlab is used to derive optimal control sequences with GA in an MPC framework. Optimization is repeated only every 24 h with Δt = 1 h, possibly due to the long optimization time required (> 1 h), which makes this approach not applicable in real-time. In [30], again the combination of EnergyPlus and GA is used to control the model of a residential building with 6 temperature zones. As decision variables, hourly set points are used. Artificial neural networks are used both to determine a population's fitness as well as to approximate solutions from the GA to reduce optimization time, which allows for online optimization with Δt = 1 h. However, thermal comfort is only defined by being in a certain temperature range in occupied times. If this is violated, a penalty function in the GA algorithm is applied. Thus, no trade-offs in form of an MOO are considered.
Concluding, it is in general a challenging problem to apply sophisticated MOO methods within an MPC framework to microgrid models with increasing complexity. Furthermore, there is a lack in research of how compromises between contradicting objectives are selected best.
Note that this paper extends the work in [32]. It's contributions are as follows. As in [32], we use a fairly simple model of a medium-sized company building based on ODEs with real-world data. Here, we show how industry pricing with peak costs and different costs for buying/selling can be modeled as a convex linear programming optimization problem without the need of binary variables; thus allowing online control with determination of the Pareto front in every time step. In addition to [32], we systematically analyze the effects of three different metrics for automated selections from the Pareto fronts in long term simulations (instead of using only one). Furthermore, we propose a fixed normalization scheme to overcome the issue of highly different objective values due to changing conditions and again analyze its effects. Next to the industry pricing scenario, we also consider participation in the intraday market with time varying costs.

Microgrid control
Note that in the following, sequences of scalars or vectors are typed in bold, e. g.
Furthermore, as is usual in the context of control theory, x denotes the state vector and u the input vector, i. e. (part of) the decision variables for the following optimization problems. The microgrid model presented next has first been introduced in [32].

Modeling
If only one temperature zone is considered, a building's temperature ϑ b can be modeled by an ODE such aṡ where ∑ iQi stands for all thermal powers acting on the building [28]. For the company building considered in this study, these are the heating powersQ chp from a combined heat and power plant (CHP) andQ rad from a gas heating, cooling powerQ cool from an air conditioning system, and heat losses to the groundQ other . C th denotes the building's thermal capacity and H air the heat transfer coefficient to the outside air temperature ϑ air . If (self-discharging) losses are neglected, the change of a battery's energy level E(t) is simply given bẏ where ∑ j P j (t) stands for all electrical powers fed into or drawn from the battery. Here, these are the produced electrical power P chp from the CHP, the power P grid bought from or sold to the grid, the (uncontrollable) renewable energy P ren from the PV system and the (uncontrollable) power demand P dem , e. g. consumption from offices, which must be met at all times. Remark that the CHP can only produce electrical power P chp and thermal powerQ chp at the same time (see Table 1).
Using E(t) and ϑ b (t) as states, a linear state space model of the microgrid is derived, . . .
Note that for the gas heating an efficiency η rad and for the air condition and energy efficiency ratio ε c are considered. By discretizing system (11) with a sampling rate T s , a discrete model in the form of is derived, i. e.
. Constraints x ∈ ⊆ ℝ n on the states, u ∈ ⊂ ℝ m on the inputs, d ∈ ⊂ ℝ q on the disturbances and as well on the rate of change of the states, (x(k + 1) − x(k)) ∈ Δ ⊆ ℝ n , apply. and are compact. Remark that all uncontrollable influences have been modeled as disturbances d. For this study, perfect predictions of d are assumed. Note that input constraints are chosen generously such that all disturbances can be compensated, e. g. P grid ∈ [−1000 kW, 1000 kW] whereas P dem ∈ [−650 kW, 0]. Thus, feasibility can be ensured for any x 0 ∈ . Table 1 gives an overview of the model parameters.

Optimization costs
The optimization costs are formulated in an MPC-manner as usual, i. e.
arg min whereby N p is the length of the prediction horizon, the final cost term V f (x(N p )) = 0 is omitted and ℓ(x, u) are the stage costs, which are described in detail in the following. Please note that, considering the system matrix A in (12), the autonomous system x(k+1) = Ax(k) is Lyapunov stable. Thus, together with the assumption that all disturbances are known, stability guarantees are not further considered in the design of the cost functions. Furthermore, two pricing scenarios are considered in this study, 1) participation in the intraday market with time-varying electricity prices and 2) German industry pricing.

Scenario 1: Intraday market
The stage costs represent our two objectives and thus consist of weighted monetary and comfort costs, The comfort costs are given by the quadratic deviation of the building temperature from a desired setpoint of ϑ set = 21°C, In this first scenario, participation in the electricity intraday market is assumed. Thus, ℓ mon consists of three parts, ℓ market mon (k) = ℓ market mon,grid (k)+ℓ mon,chp (k)+ℓ mon,heat (k). (17) The grid costs depend on the current market price, The costs for using the CHP or the gas heating are fixed and not time-dependent, Summarizing, for t = 0 in the intraday scenario, the optimization problem can then be formulated as , Note that due to the quadratic comfort costs, P intraday is a quadratic programming problem.

Scenario 2: Industry pricing
In the second scenario, German industry pricing is assumed. Then, the monetary costs are given by The costs for the CHP and gas heating are the same as in (19) and (20). However, grid costs are not timedependent, but distinguish between buying from and selling to the grid, ℓ industry mon,grid (k) = c grid,buy ⋅ P + grid (k) . . .
if energy is bought from the grid, and P + grid (k) = 0 if P grid (k) < 0. P − grid (k) is defined accordingly for selling energy to the grid. Energy is bought for c grid,buy = 0.13 e /kWh and sold for c grid,sell = 0.07 e /kWh.
To describe ℓ mon,grid in dependence of our input and decision variable P grid , (24) is formulated as ℓ industry mon,grid (k) = c grid,buy ⋅ max 0, P grid (k) . . .
In addition to the different pricing for buying and selling, the industry pricing scenario includes peak costs, ℓ mon,peak (k) = c grid,peak ⋅ max 0, P grid (k) − P grid,peak (k) . (26) The peak costs refer to the maximum peak within one calendar year which is priced with c grid,peak = 87.38 e /kW. However, since our prediction horizon N p is significantly smaller than one year, we punish every new peak according to the difference to the maximum peak P grid,peak (k) which has occurred until time step k. Furthermore, P grid,peak may change within N p , e. g. if the predicted P grid (k + 2) > P grid,peak (k), then P grid,peak (k + 3) = P grid (k + 2) and for t ≥ k + 3, new peak costs only occur for P grid (t) > P grid,peak (k + 2). A solution to this problem is presented in Section 2.3 Summarizing, for t = 0 in the industry scenario, the optimization problem can then be formulated as s. t. (12), (22a), (22b).
Note that, due to the discontinuities in (25) and (26), P industry is a nonlinear programming problem, which is in general hard to solve.

Problem reformulation
However, using an epigraph reformulation [6,21], it is possible to reformulate this part of the cost function to a relaxed version which still leads to the same optimum. Similar linear programming tricks can e. g. be found in [5].
In (25), the max-term is replaced by an additional decision variable, This is achieved by respecting two addtional constraints, Note that the two sides of (28) are not equivalent, but by replacing the right hand side of (28) in (25) with P pos (k), the new problem formulation is a relaxed version of the old one. Figure 1 illustrates this behaviour. Consequently, if (28) holds, the second max-term in (25) can be expressed as Together, the grid costs can then be expressed as ℓ industry mon,grid (k) = c grid,buy ⋅ P pos (k) . . .
Considering (31) as part of the cost function, it becomes clear that P pos will be minimized to fulfill (28) as long as c grid,buy > c grid,sell , which is true. Note that with this reformulation, N p additional decision variables and 2N p additional constraints are incorporated into the optimization problem.
The same trick can be used to replace the max-term(s) in (26). Namely, it is computationally advantageous to calculate ℓ mon,peak not for every time step k, but over the whole prediction horizon k, . . . , k + N p , since P grid,peak (k + i) depends on P grid,peak (k + i − 1) and only the maximum value of P grid matters. Thus, the peak costs in each optimization step are given by ℓ N p mon,peak (k) = c grid,peak ⋅ max 0, max i=k,...,k+N p P grid (i) −P grid,peak (k) .
Considering (32), our trick has to be applied twice. Namely, first the inner max-term is replaced by an additional deci- Please note that, in contrast to P pos , only one decision variable and N p constraints are added to the optimization problem. Again, (33) will be fulfilled in the optimization due to the minimization of costs.
Considering (33) in (32), the peak costs can be described as ℓ N p mon,peak (k) = c grid,peak ⋅max 0, v(k)−P grid,peak (k) . (35) Again, the max-term in (35) is replaced by and constrained to For replacement (36), one additional decision variable with only two constraints is needed for the entire prediction horizon. Furthermore, (36) will again be fulfilled when minimizing the peak costs, which can now be expressed as ℓ N p mon,peak (k) = c grid,peak ⋅ q(k) (38) and is thus reformulated into a linear programming problem. For ease of presentation, let ℓ industry mon (k) = ℓ industry mon,grid (k)+ℓ mon,chp (k)+ℓ mon,heat (k), (39) i. e. all monetary costs except the peak costs. Then, for t = 0 in the industry scenario, the reformulated optimization problem can then be stated as  Remark that the nonlinear parts of the cost function have been replaced by linear ones, but due to the comfort costs, P reform industry is still a quadratic programming problem.

Preliminaries of MOO
In this section, we summarize how solutions of a MOO problem and how the Pareto front can efficiently be determined.

Choosing solutions
Since for MOO problems no unique solutions exist, usually preferences are used to derive a suitable solution. Thereby, three cases can be distinguished on how to incorporate these, i. e. if they are 1) known a priori, 2) chosen a posteriori or 3) if there are none.

Preferences known a priori
In case of 1), the aim is usually to directly obtain only a single Pareto optimal solution x * instead of the complete set * . Most common, this is achieved by translating the preferences into weights w i for all objectives. Then, different utility functions U can be used and minimized. Most popular is the weighted sum (WS) method It is important to note that, if w i > 0 ∀ i, minimizing (41) is sufficient (but not necessary) for Pareto optimality [22,38]. If w i ≥ 0 ∀ i, the solution may only be weakly Pareto optimal. A different approach is to penalize the distance to the Utopia point [37], The resulting trade-off is called a compromise solution.
Various different methods to include a priori preference in form of weights [2,12,25,34] but also in other, e. g. verbal forms [17], exist. For a more detailed survey, the reader is referred to [22].

Preferences chosen a posteriori
In case of 2), the general idea is to determine the complete Pareto front and manually pick a solution afterwards. This would be done by a human decision maker. Two problems arise. First, visualization of the Pareto front for q > 3 is hard to realize. Second, a dense sampling of the Pareto front has to be determined. This is usually done by repeatingly solving the optimization problem with a method deriving a single Pareto solution such as the WS method. However, major drawbacks of the WS method for this purpose are its incapability of obtaining non-convex parts of the Pareto front and that an even spread of weight sets does not necessarily lead to an even spread of points on the Pareto front [7].
Thus, different methods exist to overcome these issues. Examples are the normal boundary intersection [8] and the normal constraint [24] method. A more recent approach which is used in this work is the adaptive weight determination scheme described in more detail in Section 3.2.
Note that especially for more complex models, metaheuristic optimization methods such as genetic or evolutionary algorithms are often used to approximate the Pareto front. Since for this work only deterministic optimization methods are used, the reader is referred to [13] for a further survey.

No preferences known
In case of 3), i. e. no preferences about the competing objectives are known, basically two options exist.
First, a single solution can directly be obtained without formulating preference such as weights. For example, (41) or (42) can be minimized with w i = 1 ∀ i. Alternatively, game theory approaches can be used [22,35].
The second option is the approach we use in this study. Namely, we choose a solution from the Pareto front based on its form, i. e. so-called knee points. More descriptively, a compromise is sought from which an improvement in one objective would lead to a 'bigger' deterioration in the other objective(s). For two objectives, most common the knee point would be defined as the point where the Pareto front has its biggest curvature (see also [10] for an overview of possible interpretations). However, since usually only samples of the Pareto set are known, choosing a knee point is not trivial. Thus, some metric is necessary to evaluate which solution is considered a knee point. The metrics analyzed in this work are described in detail in Section 4.
Furthermore, before any metric is calculated, the objectives may be transformed, e. g. bỹ which is also called upper-lower bound approach [23] or normalization [29] and a robust transformation. Using (43), all objective values are scaled to [0, 1], which might lead to undesired behaviour in long term control. Thus, we propose an alternative in Section 4.

Determining the Pareto front
The adaptive weight determination scheme (AWDS) [31] uses a geometric interpretation to obtain a well-distributed sampling of the Pareto front. The optimiziation problem is solved multiple times as a weighted sum. However, the different sets of weights are determined iteratively. After q samples of the Pareto front are found, they are used to determine new weights w i . With q = 2, this is equivalent to drawing a straight line through both points and interpreting its gradient m as the ratio between both weights. In the following, J 1 = ℓ mon and J 2 = ℓ comf is assumed. Having found two solutions (ℓ A mon , ℓ A comf ) and (ℓ B mon , ℓ B comf ), new weights w C mon and w C comf are determined by solving Then, the optimization is redone with the new weights w C mon , w C comf . This can be interpreted as shifting the straight line with the gradient m to the origin until it is tangential to the Pareto front, as is illustrated in Figure 2. As a result, we obtain a new Pareto optimal solution (ℓ C mon , ℓ C comf ) which lies between (ℓ A mon , ℓ A comf ) and (ℓ B mon , ℓ B comf ). Moreover, the new solution C is more or less in the middle between A and B -for a circular Pareto front, it would be exactly in the middle. Next, A ↔ C and B ↔ C are used to derive new pairs of weights, respectively, in the same fashion by solving equations such as (44). This is repeated until the distance between two solutions is below a chosen threshold Δd awds . As starting points, the Pareto front's extreme points are used, which are easily determined by solving the optimization problem with w A mon = 1, w A comf = 0 and w B mon = 0, w B comf = 1, respectively. An extension to non-convex problems by using a weighted global criterion method instead of a (linearly) weighted sum is possible [31].
Concluding, the main advantage of AWDS is the efficient determination of the Pareto front with approximately equidistant steps of a desired size without the necessity of reformulating the optimization problem other than as a weighted sum.

Pareto optimization
The optimal control problem described in Section 2 is solved in every time step and the Pareto fronts are determined with the AWDS method described above. In this section, we describe different normalization schemes and metrics to choose solutions (knee points) from them. Furthermore, we illustrate with simulation examples how the different combinations lead to different knee point selections.

Normalizations
We analyze two transformations. 1) we describe the upperlower bound approach (43) as a dynamic normalization adjusted in every time step; 2) we propose a fixed normalization as an adaptation by values chosen a priori.

Dynamic normalization
Let (ℓ i mon (k), ℓ i comf (k)) be the i-th of a total of n samp samples of the Pareto front at time step k. Furthermore, all samples shall be ordered with an increasing ℓ mon , i. e. ℓ i+1 mon > ℓ i mon . Then, the dynamic normalization is given bŷ Namely, the distances from the Utopia to the Nadir point in every dimension, i. e. the width and height of the Pareto front, are scaled to 1. Note that this dynamic normalization respects that both the Utopia and Nadir point change over time due to different initial conditions and forecasted disturbances.

Fixed normalization
Former experiments [32] showed that in the microgrid setting considered here, the differences from the Nadir to the Utopia point especially for ℓ mon can vary extremely over time. Thus, automated decisions based on metrics may not be result in the solutions anticipated before. To overcome this issue, we propose an alternative to the regular upperlower bound approach as in Eqs. (45), (46). In the so-called fixed normalization, the transformed values are given bŷ where Δℓ fix mon and Δℓ fix comf are constant values for every time step. Please note that while different transformation schemes have been analyzed before [23], the authors could not find any reference for the proposed transformation. Usually, only estimates of the maximum possible value of an objective are used.
However, choosing appropriate values is not trivial. Figures 3 and 4 show the proportions of the Pareto fronts'  widths, i. e.
at time step k, for a one year simulation of the intraday scenario. Note that this distribution depends on the solutions which are chosen in the simulation itself. Here, dynamic normalization together with the CUP metric have been used. However, the resulting distributions are similar for all combinations. For the intraday scenario, the mean values are taken, which cover 38.13 % of all Δℓ mon and 46.32 % of all Δℓ comf . The resulting distributions for the industry scenario are presented in Figures 5 and 6. Figure 5 shows that the  mean value of all Δℓ mon is, in comparison to the mean of all Δℓ comf in Figure 6, highly shifted to the right. This is due to the occuring peak costs. Thus, taking the means as Δℓ fix mon and Δℓ fix comf is not appropriate. Instead, we make use of the bend in the distribution of Figure 5. This corresponds to Δℓ fix mon = 272 and lies above 91.80 % of all occuring Δℓ mon . Accordingly, we choose Δℓ fix comf = 153.5, which covers 91.82 % of all Δℓ comf . Table 2 summarizes the constant scaling values for the fixed normalization.

Metrics
We investigate three different metrics for the selection of a knee point: 1) the Euclidean distance to the Utopia point, i. e. the closest to Utopia point(CUP) solution is chosen; 2) the minimum angle to neighbor points (ATN) and 3) the minimum angle to the extreme points (AEP) of the Pareto front is considered best. Note that all metrics are determined after normalization.

CUP
The Euclidean distance to the Utopia point is given by

Comparison on real Pareto fronts
To gain insights on the different selections with each normalization and metric, three typical Pareto fronts in the industry scenario are examined. Figure 7 shows a setting in winter when heating is necessary. Due to the low Δℓ comf , the fixed normalization shifts all selections to the left. However, the ATN solution is in general the least sensitive to this behaviour, as can be seen in the size of its shift in comparison to CUP and AEP. Figure 8 shows the selections for a cooling scenario in summer without possible peak costs. Here, Δℓ mon is scaled Figure 8: Pareto front in the industry scenario and ϑ air > 21°C, i. e. cooling is necessary. Since Δℓ comf ≈ Δℓ fix comf and Δℓ mon < Δℓ fix mon , the CUP and AEP selection are shifted to the right when fixed normalization is used. stronger than Δℓ comf for the fixed normalization. Thus, the CUP and AEP solutions are shifted to the right. While the AEP knee is shifted the most, the ATN metric is not sensitive enough to be shifted.
In Figure 9, a Pareto front with high peak costs is shown. Here, a clear knee point can be deducted. However, only for the dynamic normalization all metrics find it. For the fixed normalization, all selections are shifted to the left due to Δℓ mon >> Δℓ fix mon . This results in trajectories with higher ℓ comf .
Despite the last setting with peak costs, the Pareto fronts and the knee selections for the intraday scenario look similar. The examples show that metrics based on an angle, i. e. ATN or AEP, are more sensitive to the front's curvature than using the CUP. The fixed normalization mainly shifts the solutions to one side, depending on the relation of Δℓ comf Δℓ fix comf to Δℓ mon Δℓ fix mon , i. e. whether it is greater or smaller. However, Figure 8 shows an example where the ATN does not follow this logic and illustrates one possible drawback of it, i. e. its sensitivity to the distance between sample points. In summary, the CUP metric seems to be the most robust one. However, long time simulations are necessary to evaluate the resulting trajectories and whether the shifting from the fixed normalization is beneficial or not.

Simulation results
All simulations have been done in Matlab ® with use of the YALMIP toolbox [20]. We use CPLEX as solver in its demo version, which has a limit of maximum 1000 decision variables and constraints each. With the reformulations from Section 2.3, a prediction horizon of 24 h ∧ = 48 steps results in 291 decision variables and 915 constraints for the industry scenario. The creation of the Pareto front in a single step takes less than 4 s, leading to a simulation time of ≈ 2.5 min for one day and ≈ 15 h for one year on a regular desktop PC with an Intel i5. Figure 10 illustrates the results for the intraday scenario and simulation with data from 2018. Note that, to support interpretation, the model has been simulated with w mon = 1, w comf ≈ 0, i. e. with a focus only on monetary costs to obtain a lower limit on how much money has to be spent only on fulfilling P dem (while making use of the stationary battery). This value of 98341 e has been sub- stracted from all other simulation results for Figure 10. All numerical values without this adaption can be found in the Appendix in Table 3. Note that, due to numerical reasons and to ensure Pareto optimality (and not only weak Pareto optimality), w comf has actually been set to a very small value (10 −5 ). However, the effect on J mon is negligible.

Intraday scenario
In addition to the three metrics for both dynamic and fixed normalization presented in Section 4, we also simulated the entire year with fixed weights (i. e. without Pareto front construction and knee point selection). First, w mon = w comf = 0.5 is the setting for no weighting between the objectives and can be considered as a baseline result. Second, the mean weights from the CUP (dynamic) simulation have been used as fixed weights, i. e. w mon = 0.14, w comf = 0.86. Note that this setting can also be considered as a comparison baseline. However, this (sophisticated) weighting could not have been done a priori. Last, only focus on comfort has been put, i. e. w mon ≈ 0, w comf = 1. Again, due to the reasons described before, w mon = 10 −5 for this setting.
In comparison to the unweighted setting (w mon = w comf = 0.5), all settings with knee point selection significantly reduce J comf (≈ 60 to 80 %) with only a slight increase of J mon (≈ 4 to 8 %). The same is true for the fixed weighting with w mon = 0.14, w comf = 0.86, which results in an even lower J comf but higher J mon than any knee point setting. However, as has been stated before, this setting could not have been chosen a priori. Surprisingly, when the CUP is chosen as metric, the fixed normalization outperforms the dynamic normalization.
If the knee point is chosen by an angle metric (either ATN or AEP), the fixed normalization leads to a stronger focus on costs, i. e. a lower J mon and higher J comf . Note that this behavior might change with different normalization scales, which have been derived from a one-year simulation with CUP (dynamic) as described in Section 4.1.

Industry scenario
The results for the industry scenario are illustrated in Figure 11. Again, for ease of interpretation, the minimum monetary costs have been subtracted and all numerical results are given in the Appendix, Table 4.
Note that (unadapted) J mon is overall higher due to the higher grid (and peak) costs. Furthermore, if no focus is set on comfort (w mon = 1, w comf ≈ 0), J comf is less than half as high as for the intraday scenario. This is due to the stronger use of the CHP, whose electricity is cheaper than buying from grid in the intraday scenario. However, for the other settings, the comfort costs are in general higher, since a stronger focus on minimizing the monetary costs is put due to the higher grid costs. In contrast to the intraday scenario, here the fixed normalization always leads to a lower J mon and higher J comf . This is more severe if an angle metric is used (both ATN and AEP).
Note that the CUP (dynamic) setting is the only one (besides w mon = 1, w comf ≈ 0) which 'voluntarily' accepts new peak costs. Namely, in all other settings, the maximum peak is 384.6 kW and dictated by a high P dem (k = 1081) = −630.55 kW at the 23rd of January. However, the CUP in the dynamic normalized space is associated with new peak costs twice within three time steps on the 23rd of June and goes up to 394.62 kW. This results in 875.347 e higher peak costs and leads to the CUP (dynamic) setting being slightly outperformed by the AEP (dynamic) solutions.
In summary, the results suggest that using the CUP is the most robust metric. For the angle based metrics, the fixed normalization shifts the focus further on reducing monetary costs at the expense of a lower thermal comfort. This effect is more severe for the industry than for the in-traday scenario. The most promising combination seems to be the CUP metric together with the fixed normalization, since in the intraday scenario it even outperforms the CUP with dynamic normalization and in the industry scenario it avoids unnecessary peak costs (in comparison to the dynamic normalization). Alternatively, mean weights resulting from the knee point selections can be used a posteriori.

Conclusion
In this study, economic MPC has been applied to a simplified linear microgrid model of a medium-sized company building with real-world disturbance data. Thereby, two objectives (monetary costs and temperature comfort) have simultaneously been optimized by exploitation of the Pareto front in every time step. The effects of different metrics and normalizations for knee point selection have been investigated for two realistic electricity pricing scenarios, i. e. participation in the intraday market and German industry pricing with peak costs.
The usual obstacle against construction of the whole Pareto set for control purposes, i. e. high computational costs, have been solved by 1) an efficient Pareto front calculation and 2) the reformulation of the originally highly nonlinear peak costs into a linear programming problem. Results showed advantages from choosing the Pareto solution closest to the Utopia point with a proposed fixed normalization scheme. Alternatively, mean weights can be obtained from long time simulations with Pareto optimization. However, further investigations are necessary to assess whether these fixed weights are still appropriate for new data sets. Furthermore, a more sophisticated building model with more temperature zones will be analyzed in the future.