ORIGINAL ARTICLE

Ubiquitous multi-occupant detection in smart environments

Daniel Fährmann1 · Fadi Boutros1 · Philipp Kubon1 · Florian Kirchbuchner1 · Arjan Kuijper1,2 · Naser Damer1,2

Received: 14 December 2022 / Accepted: 20 October 2023 / Published online: 27 November 2023
© The Author(s) 2023

Abstract
Recent advancements in ubiquitous computing have emphasized the need for privacy-preserving occupancy detection in smart environments to enhance security. This work presents a novel occupancy detection solution utilizing privacy-aware sensing technologies. The solution analyzes time-series data to detect not only occupancy as a binary problem, but also determines whether one or multiple individuals are present in an indoor environment. On three real-world datasets, our models outperformed various state-of-the-art algorithms, achieving F1-scores up to 94.91% in single-occupancy detection and a macro F1-score of 91.55% in multi-occupancy detection. This makes our approach a promising solution for improving security in smart environments.

Keywords Human activity recognition · Machine learning · Pattern recognition · Safety

1 Introduction

Occupancy detection (OD) refers to detecting the presence and absence of occupants [1]. OD is not only of scientific interest, but has practical applications, as emphasised by the following examples. Improper operation of heating, ventilation and air-conditioning (HVAC) systems can waste large amounts of energy inside indoor environments [2, 3]. Manually controlled HVAC systems do not properly adapt to changes in the environment. This eventually leads to situations in which the system is running and consuming energy, although nobody is present to benefit from it. High costs for electricity and negative economic and ecological implications are the consequences [3].
OD technologies can enhance human comfort by enabling automated HVAC systems that regulate climate depending on the presence of individuals detected in an indoor environment. Unauthorised access to restricted areas (e.g., office buildings or industrial facilities) can be a threat to infrastructure or the safety of individuals [4]. The ability to detect whether one or multiple individuals are present in restricted areas contributes to more secure smart environments.

Previous work on OD either discussed approaches for simple OD (i.e., the detection of an individual) [5–12] or multi-occupancy detection (i.e., determining whether one or multiple individuals are present) [13, 14]. Privacy-invasive sensors (i.e., cameras and microphones) [15, 16], as well as privacy-aware ambient sensors, have been used to monitor indoor environments [5–14]. Privacy-aware ambient sensors are important because they preserve the privacy of individuals. Detection systems that are based on privacy-aware sensing technologies usually find greater public acceptance. Privacy-aware ambient sensors measure characteristics like temperature, humidity or carbon dioxide (CO2) concentration.
The time series data recorded by ambient sensors enables ubiquitous OD in smart living environments.

Corresponding author: Daniel Fährmann, daniel.faehrmann@igd.fraunhofer.de
Fadi Boutros, fadi.boutros@igd.fraunhofer.de
Philipp Kubon, philipp.kubon@gmail.com
Florian Kirchbuchner, florian.kirchbuchner@igd.fraunhofer.de
Arjan Kuijper, arjan.kuijper@igd.fraunhofer.de
Naser Damer, naser.damer@igd.fraunhofer.de
1 Smart Living and Biometric Technologies, Fraunhofer Institute for Computer Graphics Research IGD, Fraunhoferstraße 5, 64283 Darmstadt, Hesse, Germany
2 Department of Computer Science, Technical University of Darmstadt, Hochschulstraße 10, 64283 Darmstadt, Hesse, Germany
Neural Computing and Applications (2024) 36:2941–2960
https://doi.org/10.1007/s00521-023-09162-z

This work presents a novel solution for OD. Our solution uses time series data recorded only by privacy-aware ambient sensors to detect individuals. It is based on a bidirectional gated recurrent unit (BiGRU) architecture that takes the temporal dependency of successive sensory signals into consideration. Besides general OD, our proposed solution can differentiate whether one or multiple individuals occupy an indoor environment.

We organized the structure of this work as follows. Section 2 presents previous work on OD. Section 3 outlines the problems addressed in this work. Section 4 presents the OD solution proposed in this work, including the underlying Artificial Neural Network (ANN) architecture, as well as the definitions of the OD tasks. Section 5 presents details on the datasets we used for our experiments. Section 6 presents the experiments, including the dataset preprocessing procedures, as well as the ANN hyper-parameter configurations we used. In Sect. 7, we discuss the results and compare the detection performance of our solution to state-of-the-art algorithms. Finally, Sect. 8 presents the conclusion.

2 Related work

This section presents previous work on privacy-aware OD, since the solution presented in this work is also based on privacy-aware sensing technology. We disregard work on privacy-invasive OD (i.e., methodologies that are based on privacy-invasive sensing technology), as this is outside the scope of this work.

Pirttikangas et al. [17] examined the challenges and technologies in occupancy sensing, emphasizing the importance of enhancing location accuracy and considering privacy aspects. Their work also highlighted the potential of sensor fusion techniques and identified emerging research areas in predictive user behavior analysis. Following the work of Pirttikangas et al. [17], which underscored the importance of privacy aspects, a variety of approaches for privacy-aware OD were suggested in previous work [5–14].

Dong et al. [5] created a comparatively large test environment, comprising a larger workplace with many rooms and floors. The researchers installed various types of ambient sensors, including CO2, carbon monoxide (CO), total volatile organic compounds (TVOC), fine particulate matter (PM2.5), acoustic, illumination, motion, temperature and humidity sensors, in the experimental environment. Using the sensors, the researchers recorded a dataset over several weeks and determined the best combination of features based on the relative information gain (RIG). According to their investigation, CO2 and acoustic information are the most important features. The selected features were used to train various machine learning models, including traditional and deep learning models, to determine the number of occupants.

In [13], Han et al. proposed an approach to detect multiple individuals based on CO2 measurements.
The authors assumed that the amount of CO2 generated in an indoor environment scales linearly with the number of occupants. They suggested the use of a dynamic ANN that processes time-delayed information. Their time-delayed neural network (TDNN) enables the extraction of features from time windows instead of just considering a single data point without context. The authors collected CO2 data sampled in one-minute intervals over five days. Their experimental results show that the use of time-delay information enhances detection performance. The size of the time window considered was optimal at approximately 7 min. The authors observed time-delayed responses from their ANN architecture, which they explained by gas dispersion in space. However, the authors did not explore alternative architectures (e.g., recurrent neural networks (RNNs)) to cope with the time dependency of sensory signals. Nor did they investigate the influence of additional environmental sensors (e.g., temperature and humidity sensors) on predictive performance.

Alhamoud et al. [6] investigated OD and indoor localization in a home environment based on Bluetooth technology. The authors established a Bluetooth personal area network (PAN) using three Bluetooth USB sticks (i.e., beacons) in different rooms. They configured a mobile phone to run a Bluetooth discovery search periodically to record the received signal strength indicator (RSSI) data (i.e., signal strength) from all three beacons. By measuring signal strength, their approach recognized whether a person was present in the room. The authors investigated supervised as well as unsupervised models, including k-means clustering, support vector machine (SVM), and an ANN. The authors showed that person detection and localization with Bluetooth is feasible in their setting. However, their approach may not be applicable to every common home environment because it requires isolated rooms to avoid the influence of other Bluetooth signal sources.
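As an illustration of this kind of RSSI-based presence detection, the sketch below classifies readings from three beacons with a simple nearest-centroid rule. This is a deliberate simplification of the k-means, SVM, and ANN models used in [6]; all readings and centroid values are synthetic assumptions, not data from the original study.

```python
# Illustrative nearest-centroid presence detector on Bluetooth RSSI data.
# Each feature vector holds the RSSI (dBm) measured from three beacons.

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def classify(rssi, centroids):
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(rssi, centroids[label]))

# Synthetic training readings: stronger (less negative) RSSI when present.
present_readings = [[-45, -60, -70], [-50, -58, -72], [-47, -62, -68]]
absent_readings = [[-85, -90, -88], [-80, -92, -86], [-83, -89, -90]]

centroids = {
    "present": centroid(present_readings),
    "absent": centroid(absent_readings),
}

print(classify([-48, -61, -71], centroids))  # a strong-signal reading -> "present"
```

In a multi-room deployment such as [6], one centroid (or cluster) per room would allow the same rule to localize the occupant rather than only detect presence.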
In [14], the authors explored OD based on CO2 concentration. The authors did not apply machine learning models, but built a system based on differential equations that is driven by proxy sensing. However, their approach requires explicit knowledge about the location of the sensors.

Candanedo and Feldheim [7] conducted experiments using multiple types of environmental sensors. They investigated different combinations of temperature, humidity, light (i.e., brightness), and CO2 sensory data to detect occupancy in a binary classification scenario. The authors created a dataset and investigated the performance of several classification models, including classification and regression trees (CART), random forests (RF), gradient boosting machine (GBM), and linear discriminant analysis (LDA). However, the authors did not evaluate the performance of more complex ANN models.

In [8], Szczurek et al. proposed an approach to determine the number of occupants. The authors measured CO2 concentration, temperature, and relative humidity using a single combined sensor device in an intermittently used university classroom. The authors compared the performance of k-Nearest Neighbors (k-NN) to LDA. Their experimental results show that OD performance can be improved using a combination of sensors, especially when combining temperature and humidity data.

Wang et al. [9] suggested an approach for determining the number of occupants inside an office room. The authors combined data from several environmental sensors with Wi-Fi information to increase detection performance. The environmental sensors were used to continuously measure CO2 concentration, temperature, and relative humidity over three days. Wi-Fi connection requests and responses were captured based on MAC addresses and RSSI.
The authors evaluated three different machine learning models, including k-NN, SVM, and an ANN, and investigated different combinations of sensory data. Their ANN performed best when using only environmental sensor data. When using only Wi-Fi data, their ANN performed less well.

Hoori et al. [10] proposed the multicolumn radial basis function network (MCRN) mechanism that improves upon the traditional radial basis function network (RBFN) algorithm. Their MCRN mechanism divides the training set of a dataset into smaller subsets, using the k-dimensional tree (k-d tree) algorithm. The authors also reported on the OD performance of several baseline algorithms.

Liu et al. [11] proposed their multivariate convolutional neural network (MVCNN) approach. The authors used their model for the classification of multivariate time series and applied it to the OD task.

In [12], the authors applied the double deep Q-learning (DDQN) reinforcement learning algorithm to multivariate time series classification scenarios, including occupancy and fall detection. The authors extended the DDQN algorithm using a prioritized experience replay (PER) strategy to improve detection performance in rare event classification tasks.

Earlier work in the field of multi-occupancy detection focused heavily on the use of CO2 concentration to detect multiple individuals [13, 14]. However, the use of CO2 concentration has a significant disadvantage, as it is strongly influenced when windows are open or when stoves are used. The solution presented in this work differs because it utilizes data from a variety of different ambient sensors (excluding CO2 concentration) for multi-occupancy detection.

3 Problem statement

OD in indoor environments is a crucial component for enhancing security and enabling smart living applications. However, deployments of OD technologies often face challenges such as invasion of privacy, computational overhead, and limited occupancy state resolution.
This section outlines the problems addressed in this work.

Privacy-friendly sensing Many OD approaches rely on cameras and other invasive sensing technologies such as microphones [15, 16]. Such approaches often face resistance from occupants due to privacy concerns. A primary problem addressed in this work is an OD solution that respects the privacy of the occupants.

Occupancy state resolution Traditional OD approaches mainly focus on binary classification, i.e., whether a space is occupied or not [5–12]. However, in certain scenarios, it is essential to determine whether one or multiple individuals are present in a given space. The second problem this work addresses is the distinction between one and multiple occupants. Addressing it also required preprocessing a dataset in a novel manner to make it suitable for multi-occupancy detection.

To address these problems, this work proposes a novel solution for multi-occupant detection, aiming to preserve the privacy of the occupants. This encompasses the adaptation of a dataset, which we preprocessed to make it suitable for multi-occupant detection experiments.

4 Methodology

This section presents our novel occupancy detector and its components.

Our solution is based on the BiGRU architecture. The BiGRU architecture is special because it considers the time dependency of successive signals from the ambient sensors, making it particularly well-suited to OD, especially for multi-occupancy detection, where the sensory signals caused by multiple individuals may be temporally separated.

In the following, we describe the processing pipeline of our OD solution, as visualized in Fig. 1. The first step involves preprocessing an input dataset. Next, the preprocessed data is fed into our BiGRU model, which accepts consecutive samples as input. In the final step, the occupancy verdict is obtained.
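The final pipeline step produces a class verdict that is one-hot encoded over the possible occupant counts 0, 1, …, P (cf. Fig. 1). A minimal sketch of this encoding is shown below; the function name is an illustrative assumption.

```python
# Sketch of the one-hot class encoding used for the occupancy verdict:
# class p means "p individuals present", for p in 0..P, giving P + 1 classes.

def one_hot(num_individuals, max_individuals):
    """Encode an occupant count as a one-hot vector of length P + 1."""
    vec = [0] * (max_individuals + 1)
    vec[num_individuals] = 1
    return vec

print(one_hot(2, 3))  # two individuals with P = 3 -> [0, 0, 1, 0]
```

The vector [0, 0, 1, 0] matches the example encoding shown in Fig. 1 for two detected individuals.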
Our detector can handle two different tasks in indoor environments, defined as occupancy detection and multi-occupancy detection:

Definition 1 (Occupancy detection) refers to the detection of the presence of at least one person by means of ambient sensors in an indoor environment at a given time or time span.

Definition 2 (Multi-occupancy detection) refers to determining whether only one or multiple individuals are present in an indoor environment at a given time or time span.

4.1 BiGRU model architecture

This section presents the ANN architecture of our BiGRU model. The architecture is visualized in Fig. 2. It comprises bidirectional gated recurrent unit (GRU) layers, as well as fully connected layers with various activation functions. Section 4.2 provides details about the BiGRU layers in our architecture. We created two versions of our BiGRU architecture, which we refer to as BiGRU2 and BiGRU4. Both versions differ in the number of BiGRU layers they contain. The numbers in the model names indicate how many BiGRU layers were used in the respective model instances.

Fig. 1 The processing pipeline of our OD solution. After preprocessing a dataset, a sequence of consecutive samples is fed into our BiGRU model to obtain the occupancy verdict

Fig. 2 The ANN architecture of our BiGRU occupancy detector consists of BiGRU layers, two fully connected layers, and a softmax output layer. The outputs of the BiGRU layers are combined by averaging

Table 1 lists the building blocks of the particular BiGRU models. The input to the architecture consists of time-series data divided into windows. Each window represents k successive samples, where each sample contains n features. Section 6.2 elaborates on the method we used to extract windows from the datasets. The k successive samples in a window are sequentially processed through the BiGRU layers, resulting in a sequence of states h_t, h_{t+1}, …, h_{t+k} for each window. These states are then averaged before the latent vector progresses to the subsequent BiGRU layer. The GRU layers employ Tanh and Sigmoid activation functions for recurrent steps. We introduced dropout layers between the GRU layers with a dropout probability of 30% to aid in regularization. The final BiGRU layer outputs only the last state representation, h_{t+k}, instead of a sequence of states. The dense layers utilize the LeakyReLU [18] activation function. The output layer is a softmax layer with p outputs. In the case of single-occupancy detection, the model is trained using the binary cross-entropy loss function. Conversely, for multi-occupancy detection, where the classification problem is not binary, the cross-entropy loss function is used for model training. The general ANN architecture design remains similar for both OD tasks. Further details regarding the adaptations made, specific hyper-parameter configurations utilized, and the dataset preprocessing procedures applied will be discussed in Sect. 6.

4.2 Bidirectional gated recurrent unit

This section describes the GRU layers we use in our architecture. We use GRU [19] layers to capture the time-dependency between sequential sensory data.
A GRU has an update gate z_t, a reset gate r_t, and a memory ĥ_t. The update gate z_t determines how much impact the memory has on the current hidden state h_t, while the reset gate r_t controls the influence of the previous hidden state h_{t−1} that is stored in the memory. According to [20], a GRU can be mathematically expressed as:

z_t = σ(W_z x_t + U_z h_{t−1} + b_z)   (1)
r_t = σ(W_r x_t + U_r h_{t−1} + b_r)   (2)
ĥ_t = φ(W_h x_t + U_h(r_t ⊙ h_{t−1}) + b_h)   (3)
h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ ĥ_t,   (4)

where x_t is the input at time step t, z_t the update gate, r_t the reset gate, ĥ_t the memory, h_t the hidden state, σ and φ the logistic and hyperbolic tangent activation functions, ⊙ the element-wise product, W_z, W_r, W_h, U_z, U_r, U_h the weight matrices, and b_z, b_r, b_h the biases.

The input that is fed into a GRU layer comprises data points of a time series, where the number of time steps considered matches the size of the GRU layer. GRUs have the limitation that time series data is only considered in a certain temporal direction (e.g., the state h_t is computed based on past observations). To overcome this limitation, we included BiGRU layers in our models.

Table 1 The ANN building blocks of our BiGRU occupancy detectors. The table lists two different versions with 2 GRU layers (BiGRU2) and 4 GRU layers (BiGRU4). The particular architecture blocks, the type of the blocks, and the output shapes are listed, where k denotes the number of time steps, n the number of input variables, p the number of individuals to be detected, and d1, d2, d3, d4 the hidden dimensions of the GRU layers

Model  | Block (type)                 | Input from block | Output shape
BiGRU2 | 1. Input (InputLayer)        | –                | (k, n)
       | 2. Bidirectional layer (GRU) | 1                | (k, d1)
       | 3. Dropout                   | 2                | (k, d1)
       | 4. Bidirectional layer (GRU) | 3                | (d2)
       | 5. Hidden layer (Dense)      | 4                | n
       | 6. Hidden layer (Dense)      | 5                | k
       | 7. Output (Softmax)          | 6                | (p + 1)
BiGRU4 | 1. Input (InputLayer)        | –                | (k, n)
       | 2. Bidirectional layer (GRU) | 1                | (k, d1)
       | 3. Dropout                   | 2                | (k, d1)
       | 4. Bidirectional layer (GRU) | 3                | (k, d2)
       | 5. Dropout                   | 4                | (k, d2)
       | 6. Bidirectional layer (GRU) | 5                | (k, d3)
       | 7. Dropout                   | 6                | (k, d3)
       | 8. Bidirectional layer (GRU) | 7                | (d4)
       | 9. Hidden layer (Dense)      | 8                | n
       | 10. Hidden layer (Dense)     | 9                | k
       | 11. Output (Softmax)         | 10               | (p + 1)

BiGRU layers consist of two separate GRU layers, one processing the input sequence in the forward direction and the other processing it in the reverse direction. This allows our model to capture information from both past and future contexts. The outputs of the forward and backward GRU layers are then combined through averaging to obtain a combined latent state representation.

Traditional RNNs are prone to the challenges of exploding and vanishing gradients, which limits their learning capabilities, especially in capturing long-range dependencies [21]. Pascanu et al. [22] highlighted that LSTMs and GRUs were introduced as innovations specifically aimed at mitigating the vanishing gradient problem. These architectures employ gating mechanisms that regulate the flow of information, making them more adept at learning dependencies over longer sequences. However, Rehmer and Kroll [23] showed that the gradient of the GRU is usually smaller than that of the traditional RNN, at least within the constraints of their experimental setup. According to their insights, these smaller gradient magnitudes in GRUs contribute to a smoother loss function. This in effect counteracts excessively large gradients, providing more stable training dynamics [23]. The BiGRU models, as an extension of the GRU architecture, inherit these properties. It is imperative to note that while the GRU, and consequently BiGRU, architectures have mechanisms that make them more robust than traditional RNNs in handling long-range dependencies, they do not solve the vanishing or exploding gradient problem, which remains an area of ongoing research and development.
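A minimal NumPy sketch of a GRU step following Eqs. (1)–(4), and of combining a forward and a backward pass by averaging as in a BiGRU layer, is given below. The weight initialization and dimensions are illustrative assumptions, not the authors' trained parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU update; W, U, b hold the z/r/h parameter triples."""
    z = sigmoid(W["z"] @ x_t + U["z"] @ h_prev + b["z"])              # update gate, Eq. (1)
    r = sigmoid(W["r"] @ x_t + U["r"] @ h_prev + b["r"])              # reset gate, Eq. (2)
    h_tilde = np.tanh(W["h"] @ x_t + U["h"] @ (r * h_prev) + b["h"])  # memory, Eq. (3)
    return (1 - z) * h_prev + z * h_tilde                             # hidden state, Eq. (4)

def run_gru(xs, h0, W, U, b):
    """Process a window of samples and return all hidden states."""
    h, states = h0, []
    for x in xs:
        h = gru_step(x, h, W, U, b)
        states.append(h)
    return states

rng = np.random.default_rng(0)
n, d, k = 4, 3, 5                                # features, hidden dim, window size
W = {g: rng.normal(size=(d, n)) for g in "zrh"}  # input weight matrices
U = {g: rng.normal(size=(d, d)) for g in "zrh"}  # recurrent weight matrices
b = {g: np.zeros(d) for g in "zrh"}              # biases

window = [rng.normal(size=n) for _ in range(k)]  # k successive samples
fwd = run_gru(window, np.zeros(d), W, U, b)              # forward direction
bwd = run_gru(window[::-1], np.zeros(d), W, U, b)[::-1]  # reverse direction

# A BiGRU layer combines both directions by averaging the states.
combined = [(f + bk) / 2 for f, bk in zip(fwd, bwd)]
```

Note that each combined state mixes information from both past and future samples of the window, which is the property exploited by the BiGRU layers described above.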
4.3 Complexity analysis

This section compares the space complexity between simple RNN models, our BiGRU2 model, and previously proposed RNN architectures. Understanding the storage requirements of these models is important for assessing their feasibility in real-world applications.

Our BiGRU2 model, an advanced variant of an RNN, incorporates BiGRU layers that allow the model to access both future and past context of the input sequence from its current state [24]. While long short-term memory (LSTM) [25] layers are similar to GRUs, we use GRUs in our models for efficiency reasons. GRUs are computationally more efficient than LSTMs, since GRUs have two gates (reset and update gates) [19], whereas LSTMs have three gates (input, output, and forget gates) [25]. GRUs can still efficiently capture dependencies in sequences with only two gates because their reset and update gates work collaboratively to control the flow of information to be remembered or discarded at each time step.

According to [20], the parameter count of a GRU is given by 3(Io + o² + o), where I is the input dimension and o is the output dimension. This is because there are three sets of operations that require weight matrices of this magnitude. In comparison, LSTMs have 4(Io + o² + o) parameters [20]. Therefore, GRUs have a lower space complexity than LSTMs. For the experiments presented in this work, we used the Keras API¹ to implement our models, where the total number of parameters for a simple GRU is 3(Io + o² + 2o), accounting for the inclusion of two separate biases for computational reasons.

Let c denote the context, I the input dimension, d1, d2 the hidden dimensions, and o the output dimension. The numbers of parameters of the BiGRU2 model layers are given by:

1. First bidirectional layer (GRU): 2 × 3(Id1 + d1² + 2d1)
2. Second bidirectional layer (GRU): 2 × 3(d1d2 + d2² + 2d2)
3. Hidden layer (Dense): d2 × I + I
4. Hidden layer (Dense): I × c + c
5. Output layer (Softmax): c × o + o

Therefore, the total parameter count of the BiGRU2 model utilized in this work is given by Eq. (5):

P(c, I, d1, d2, o) = 6(Id1 + d1² + 2d1 + d1d2 + d2² + 2d2) + d2I + I + Ic + c + co + o.   (5)

Considering the importance of space complexity for the practical application of RNN models, we compared the space complexity of the BiGRU2 model against previously proposed RNNs in Table 2. Our BiGRU2 model exhibits a higher space complexity compared to that of a simple BiGRU. However, it is important to note that the hyper-parameter configurations of our model were specifically optimized for the multi-occupant detection application and the corresponding evaluation scenarios presented in this work. As shown in Eq. (5), the space complexity of the BiGRU2 model primarily consists of components corresponding to the two bidirectional recurrent layers. Notably, the parameter counts in BiGRU2 have a linear dependence on the context c, as opposed to some of the other models [26–28] where the dependence is quadratic. The linear dependence on the context c can be beneficial in applications where the context varies, as it ensures a more predictable and scalable growth in parameter counts.

The time complexity, commonly measured in floating point operations (FLOPs) or multiply-accumulate operations (MACs), of the models listed in Table 2 heavily depends on various factors, including implementation details, the choice of activation function, the number of features, and the temporal context considered. The specific implementations of the models may differ depending on the deep learning frameworks or libraries utilized, as different optimizations and parallelization techniques are employed to perform computations efficiently.

¹ https://keras.io/api/layers/recurrent_layers/gru/
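The closed form of Eq. (5) can be cross-checked numerically against the per-layer counts listed above. The sketch below does so; the function names and the example configuration (c = 30, I = 7, d1 = 64, d2 = 32, o = 3) are illustrative assumptions.

```python
# Cross-check of Eq. (5) against the per-layer BiGRU2 parameter counts
# (Keras GRU convention with two separate bias sets per gate).

def gru_params(i, o):
    """Parameters of a single Keras-style GRU layer: 3(io + o^2 + 2o)."""
    return 3 * (i * o + o * o + 2 * o)

def bigru2_params(c, i, d1, d2, o):
    """Layer-by-layer parameter count of the BiGRU2 model."""
    return (2 * gru_params(i, d1)     # first bidirectional GRU layer
            + 2 * gru_params(d1, d2)  # second bidirectional GRU layer
            + d2 * i + i              # first dense hidden layer
            + i * c + c               # second dense hidden layer
            + c * o + o)              # softmax output layer

def eq5(c, i, d1, d2, o):
    """Closed form of Eq. (5)."""
    return (6 * (i * d1 + d1**2 + 2 * d1 + d1 * d2 + d2**2 + 2 * d2)
            + d2 * i + i + i * c + c + c * o + o)

# Both computations agree for any configuration, e.g.:
print(bigru2_params(30, 7, 64, 32, 3) == eq5(30, 7, 64, 32, 3))  # True
```

Expanding the two bidirectional terms 2 × 3(Id1 + d1² + 2d1) and 2 × 3(d1d2 + d2² + 2d2) yields exactly the factor-6 term of Eq. (5), which the check above confirms.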
The actual FLOPs required by the models depend heavily on the implementation details and hyper-parameter choices. Therefore, in Sect. 6, we report the actual number of FLOPs required in a forward pass by our models for each experiment.

5 Datasets

This section presents datasets that contain privacy-aware ambient sensor readings and occupancy annotations, recorded in indoor environments.

5.1 Dataset overview

Many datasets exist that contain privacy-invasive and privacy-aware sensor measurements, some of which are specifically designed for the development and evaluation of methodologies intended to solve the problem of OD. Among these datasets, only very few are appropriate for the development and evaluation of the solution proposed in this work. We consider a dataset to be suitable if it is public, annotated with labels, and recorded by privacy-aware monitoring sensors.

Table 3 summarizes datasets that contain privacy-aware ambient sensor readings and occupancy annotations. Table 4 lists additional details about the datasets. We did not consider datasets that were recorded using cameras or microphones for the development and evaluation of the solution proposed in this work because of their privacy-invasive nature (i.e., they pose a threat to the privacy of individuals).

The office building sensing [5], room occupant estimation [13] and class room occupancy [8] datasets are relevant because they feature ambient characteristics such as CO2, temperature and humidity sensor readings. However, these datasets are not publicly available. The Bluetooth beacons dataset [6] does not contain usual ambient sensor measurements but RSSI recordings. The building fusion set [9] combines ambient information with Wi-Fi data. Both the Bluetooth beacons and the building fusion set are not publicly available. The Harvard ODDs [31] dataset contains ambient measurements, including light and temperature, as well as power meter readings and weather data.
The Harvard ODDs dataset is publicly available. However, it does not contain any labels and therefore cannot be used in supervised learning scenarios. The dataset used in [32] contains state-change sensor measurements and activity labels. The state-change sensors fit into the category of ambient sensors and were taped onto objects. However, the used sensors are outdated. The UCI occupancy detection [7], Kasteren Ubicomp [29], and CASAS HH101-130 [30] datasets are the most important for the development of the solution proposed in this work and have been utilized in the experiments. These datasets and the novel preprocessing procedure we applied will be presented in Sect. 5.2.

5.2 Utilized datasets and preprocessing

This section presents the datasets we utilized for the development and evaluation of our proposed method and how we preprocessed the particular datasets.

5.2.1 UCI occupancy detection dataset

We consider the UCI occupancy detection [7] dataset to be the most appropriate dataset for human OD experiments. It was created specifically for the development of OD algorithms. The dataset contains recordings from ambient sensors like temperature, humidity, light and CO2 that monitored a small office room, as well as the timestamp of each recording. Currently, many smart buildings already contain this composition of sensors. The humidity ratio was calculated for every sample. The dataset also contains ground truth binary labels that indicate whether the room is occupied. The researchers generated the ground truth labels automatically with the help of a video surveillance system. However, the binary labels do not indicate how many people were present in the office environment. The number of samples, as well as the overall class distribution of each partition of the dataset, are listed in Table 5.
Table 2 Comparison of the space complexity between simple RNNs, previously proposed RNN architectures, and our proposed BiGRU2 model. Let I denote the input dimension, d1, d2 the hidden dimensions, c the context, and o the output dimension

Work             | Space complexity
Simple RNN [20]  | 1(Id1 + d1² + d1) [20]
GRU [19]         | 3(Id1 + d1² + d1) [20]
LSTM [25]        | 4(Id1 + d1² + d1) [20]
Simple BiGRU     | 2 × 3(Id1 + d1² + d1)
MCRN [26]        | cd1² + (c + 1)d1o + Id1 + d1 + I [26]
Elman Tower [27] | cd1² + d1(I + o) + d1 + I [26]
AR-MCRN-a [28]   | cd2² + Id2 + 2d2 + Id1 + d1 [28]
AR-MCRN-b [28]   | cd2² + Id2 + 2d2 + d1d2 + d1 [28]
BiGRU2 (Ours)    | Equation (5)

The number of samples in the partitions is rather unusual, as the test set contains the majority of all samples. Candanedo et al. [7] also noted that the door of their experimental office room was closed most of the time while the office room was occupied. It is important to note that the state of the door can significantly influence the readings from the ambient sensors installed in the room. This is reflected in the training and testing partitions. In the case of testing partition 2, the door was mostly open when someone was in the room. An open door could have influenced the sensor values. However, the authors did not specify the fraction of samples affected. For our experiments, we use the training data only to train our proposed models. The test partitions are only used to test the solution proposed in this work. Table 6 lists the features contained in the UCI occupancy detection dataset [7].

Preprocessing: Following [7], we generated features algorithmically by exploiting the timestamps of the sensor measurements, which were recorded in one-minute intervals. Instead of using the raw timestamps, we extracted the number of seconds from midnight for each day and classified the timestamp as either a weekend or a weekday. We did not use the raw timestamps themselves.
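The timestamp-derived features described above can be sketched as follows. This is a minimal illustration; the exact preprocessing is defined in Sect. 6, and the function name is an assumption.

```python
from datetime import datetime

# Sketch of the two timestamp-derived features: seconds since midnight
# and a binary weekend/weekday flag.

def timestamp_features(ts):
    """Return (seconds_from_midnight, is_weekend) for a datetime."""
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    is_weekend = 1 if ts.weekday() >= 5 else 0  # Monday=0, ..., Saturday=5, Sunday=6
    return seconds, is_weekend

print(timestamp_features(datetime(2015, 2, 7, 9, 30, 0)))  # Saturday 09:30 -> (34200, 1)
```

Combined with the five sensor-derived values (temperature, humidity, light, CO2, and humidity ratio), these two features yield the seven features per sample mentioned below.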
This results in seven features per sample. Prior to utilizing the samples for model training, we normalized the samples, as explained in Sect. 6.1. Additionally, we grouped successive samples into windows, as explained in Sect. 6.2. Subsequently, to prevent potential overfitting, we shuffled the windows after grouping.

5.2.2 Kasteren Ubicomp dataset

The publicly available Kasteren Ubicomp [29] dataset was created for the purpose of activity recognition. Although human activities are usually recognized using wearable sensors, this dataset contains ambient sensor readings. Importantly, information about occupancy can be inferred from the activity annotations contained in the dataset.

Table 3  Dataset overview with general information. Only the datasets [7, 29, 30] are publicly available, annotated with labels, and recorded by multiple privacy-aware ambient sensors, which motivated us to use these datasets in our experiments

Work  Dataset                    Public  Labels   Sensor type
[7]   UCI occupancy detection    Yes     Yes      Ambient
[29]  Kasteren Ubicomp dataset   Yes     Yes      Ambient
[30]  CASAS HH101-130            Yes     Partial  Ambient
[31]  Harvard ODDs               Yes     Partial  Ambient + Power
[32]  Home state-change sensors  Yes     Yes      Ambient
[9]   Building fusion set        No      Yes      Ambient + Wi-Fi
[5]   Office building sensing    No      Yes      Ambient
[13]  Room occupant estimation   No      Yes      Ambient
[8]   Class room occupancy       No      Yes      Ambient
[6]   Bluetooth beacons          No      Yes      Bluetooth

Table 4  Dataset overview including the number of subjects, recording duration, temporal resolution, and environment

Work  Subjects  Duration   Temporal resolution  Environment
[7]   2         17 days    1 min                Office room
[29]  1         28 days    1 min                Multi-room home
[30]  1–2       few years  dynamic [ms]         Multi-room home
[31]  –         8 months   15 min               Multi-room home
[32]  2         14 days    15 min               Multi-room home
[9]   25        9 days     30/60 s              Office room
[5]   –         55 days    1/20 min             Multi-room office
[13]  8         5 days     1 min                Office room
[8]   43        16 days    1 min                Class room
[6]   1         –          –                    Multi-room home

Table 5  The overall class distribution of the UCI occupancy detection dataset

Partition  # Samples  Occupied (%)  Not occupied (%)
Training   8143       21            79
Testing 1  2665       36            64
Testing 2  9752       21            79

The dataset is relatively small compared to the other datasets used for the experiments. It was recorded in a multi-room apartment with a single resident. It contains 1319 sensor firing intervals that were recorded by digital state-change sensors. The 14 unique sensors and their object placements are listed in Table 7. Besides the sensor firing intervals, the dataset also contains 245 activity intervals that were created manually with the help of a Bluetooth headset, which was used to record verbal annotations. The different activities and their average durations are listed in Table 8. The sensor firings and the corresponding activity annotations were recorded over a period of 28 days (i.e., 4 weeks from 25.02.2008 to 23.03.2008). The difference between the number of sensor firing intervals and the number of activity intervals indicates that they were recorded independently. Every sample in the dataset comprises a start time, an end time, and an ID. The data formats and examples are listed in Table 9.
The sensor firing intervals vary in length, ranging from one second to over 24 h. The average sensor firing interval length is 44 min, while the average activity length is 171 min. The activities with the longest interval lengths are 'leave house' and 'go to bed,' with over 11 and over 8 h of duration, respectively. The label 'leave house' (i.e., the activity of a person leaving the home environment) is the most important activity for the experiments conducted in this work because it bears the potential to derive occupancy annotations. The activity 'go to bed' (i.e., the person is sleeping) may cause problems if no sensor firings occur during the activity, because an automated OD system may confuse the activity with the absence of the resident.

Preprocessing: We applied two different preprocessing variants to make the dataset suitable for OD experiments. The activity 'leave house' is essential to derive occupancy information, while all other activities indicate that the apartment is occupied. The time interval of the activity 'leave house' implies that the resident is absent from the

Table 6  The features contained in the UCI occupancy detection dataset. The number of seconds since midnight and the differentiation between weekend and weekday are manually engineered features based on the timestamp

Feature                 Unit                 Example
Timestamp               YYYY-MM-DD hh:mm:ss  2015-02-06 11:05:00
1. Sec. since midnight  Seconds              3600
2. Weekend/weekday      Binary               1
3. Temperature          °C                   21.7225
4. Humidity             %                    20.7
5. Light                Lux                  479
6. CO2                  ppm                  848.25
7.
HumidityRatio           (none)               0.0033

Table 7  The IDs of the state-change sensors contained in the Kasteren Ubicomp [29] dataset and the objects the sensors were attached to

Sensor ID  Placement
1          Microwave
5          Hall-Toilet door
6          Hall-Bathroom door
7          Cups cupboard
8          Fridge
9          Plates cupboard
12         Front door
13         Dishwasher
14         Toilet flush
17         Freezer
18         Pans cupboard
20         Washing machine
23         Groceries cupboard
24         Hall-Bedroom door

Table 8  The activity annotations of the Kasteren Ubicomp dataset and the corresponding IDs

Activity ID  Activity           Average duration (min)
1            Leave house        665.7
4            Use toilet         1.8
5            Take shower        9.6
10           Go to bed          485.7
13           Prepare breakfast  3.4
15           Prepare dinner     34.2
17           Get drink          0.9

Table 9  The features contained in the Kasteren Ubicomp dataset

Feature     Format                Example
Start time  DD-Mon-YYYY hh:mm:ss  14-Mar-2008 23:59:37
End time    DD-Mon-YYYY hh:mm:ss  15-Mar-2008 09:53:24
ID          Integer               10

beginning to the end of the interval. Based on the activity 'leave house,' we extracted 34 absence periods. We used the absence periods to preprocess the dataset in two different ways, as described in the following.

Variant 1: We assigned binary occupancy annotations to the sensor intervals. To determine the label of a sample, we computed the overlap between the sensor firing intervals and the absence periods. We assigned the label 'not occupied' if there was an overlap of at least 50% between a sensor interval and the absence periods. Otherwise, we assigned the label 'occupied.' In this fashion, we assigned a label to every sensor interval. Finally, we extracted six features from each sensor interval: the sensor ID, the length of the interval (i.e., the temporal difference between interval start time and end time), and four additional features. The four additional features are obtained by processing the start time and the end time of the intervals, similar to the preprocessing procedure for the UCI occupancy detection dataset. From each timestamp, the number of seconds since midnight on the same day (NSM) and the week status (WS), indicating a weekend or a weekday, are extracted. All features are normalized as described in Sect. 6.1. Preprocessing variant 1 results in 1319 samples with six features and binary occupancy labels. The class distribution for this preprocessing variant is listed in Table 10. For the experiments we conducted, we used 80% of the samples as the training set and 20% as the testing set and grouped successive samples into windows, as explained in Sect. 6.2. Subsequently, to prevent potential overfitting, we shuffled the windows after grouping.

Variant 2: Preprocessing variant 2 involves the extraction of equally sized time slices from the sensor intervals, instead of directly treating the individual sensor intervals as samples. This is feasible because date and time are given for each sensor interval. First, the sensor activations are evaluated at a time resolution of one second. For each second, we identify which of the 14 state-change sensors are active. Over the length of the specified time slice, the activations are then averaged, resulting in floating-point values that represent the sensor activations throughout the time slice. This method preserves information regarding sensor activation within each time slice, which is essential because a sensor might be active at the start of a time slice but not towards its end. The mathematical representation of the average activation of a sensor over a time slice of length T seconds is given by Equation (6).
A_i = (1/T) · Σ_{t=1}^{T} S_{i,t}    (6)

In this equation, A_i refers to the average activation of sensor i over the time slice, T represents the length of the time slice in seconds, and S_{i,t} represents the activation state of sensor i at time t (1 if active, 0 otherwise). The dimensionality of the final samples comprises the average sensor activations of all 14 state-change sensors. The ground truth occupancy labels are again determined by checking whether or not a time slice has at least 50% overlap with any of the absence periods. The resulting class distribution is much more balanced, as listed in Table 10. The reason for this is that the non-occupied intervals are generally much longer than the occupied intervals. Therefore, the label 'non-occupied' is assigned to the majority of time slices. In this way, we created samples for time slice lengths of 15/60 s, resulting in 160,015/40,004 samples, respectively. We selected time slice lengths of 15 s and 60 s strategically to accommodate the differing requirements of the RNNs and the traditional machine learning algorithms that we employed in our study. The 15 s time slice was chosen to facilitate the functioning of RNNs, which are capable of handling sequences of data inputs.
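The per-slice averaging of Equation (6) can be sketched as follows; the function and variable names are illustrative, not taken from the original implementation:

```python
def average_activations(active, T):
    """Average per-second sensor activations over a time slice of T
    seconds, as in Equation (6). `active` maps each second t (1..T)
    to the set of sensor IDs active during that second."""
    # The 14 state-change sensor IDs from Table 7.
    sensor_ids = [1, 5, 6, 7, 8, 9, 12, 13, 14, 17, 18, 20, 23, 24]
    return [sum(1 for t in range(1, T + 1) if i in active.get(t, set())) / T
            for i in sensor_ids]

# Sensor 8 (fridge) active for 30 of 60 seconds -> A_8 = 30/60 = 0.5
acts = {t: {8} for t in range(1, 31)}
feats = average_activations(acts, 60)
print(feats[4])  # index 4 corresponds to sensor ID 8 -> 0.5
```

The returned 14-dimensional vector corresponds to one variant-2 sample before normalization.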
By utilizing a 15-second slice, we were able to feed the RNNs with four sequential inputs (amounting to a total of 60 s), thereby enabling the networks to learn and identify dependencies across a short time series of data points. Conversely, the 60 s time slice was designated to suit the input requirements of the traditional machine learning algorithms utilized in our analysis. By adopting these specified time slice lengths, we aimed to tailor the input data optimally for both types of algorithms, thereby facilitating a more comparative and comprehensive analysis of their performance. We used 70% of the samples as the training set and 30% as the testing set and grouped successive samples into windows, as explained in Sect. 6.2. Subsequently, to prevent potential overfitting, we shuffled the windows after grouping.

Table 10  Class distributions of the Kasteren Ubicomp dataset for preprocessing variant 1 (sensor intervals) and variant 2 (time slice length 15/60 s) with binary occupancy labels

Variant  # Samples       # Features  Occupied (%)  Not occupied (%)
1        1319            6           83.40         16.60
2        160,015/40,004  14          43.59         56.41

Table 11  The ambient sensors contained in the CASAS [30] dataset that we included in our experiments

ID  Sensor type
D   Magnetic door sensor
L   Light switch (binary)
LL  Light switch (dimmable)
LS  Light sensor
M   Infrared motion sensor
MA  Wide-area infrared motion sensor
T   Temperature sensor

5.2.3 CASAS HH101-HH130 datasets

The CASAS HH101-HH130 [30] datasets comprise a collection of different datasets that were created at Washington State University (WSU). WSU sent a selection of smart ambient sensors to volunteers, who installed the sensors in their apartments. The identifiers HH101-HH130 refer to 30 apartments that are mostly different in structure; only a few of the apartments are identical.
All recordings originate from single-resident home environments, except for the HH107 and HH121 apartments, which housed two residents. The apartments have different layouts, so the ambient sensor installations are tailored to them (i.e., the numbers of sensors differ). The ambient sensors used for the measurements are identical, but the locations and numbers of sensors vary between the apartments. The selection of sensors includes light, motion, magnetic and temperature sensors. The sensors we included in our experiments are listed in Table 11. The number of sensor measurements ranges from under 200 thousand to over 16 million, depending on the apartment. The average recording period is 951 days. However, only 4.63% of all measurements have activity annotations. The apartments have different numbers of sensors; thus, the dimensionality of the samples (i.e., the number of features) varies between the particular apartments. The annotations mostly indicate the start or end of an activity; thus, we extracted activity intervals similarly to the preprocessing procedure for the Kasteren Ubicomp dataset.

Preprocessing: The most crucial activity annotations are 'Leave home' and 'Enter home' because we use these annotations to derive occupancy and absence intervals. In

Fig. 3  The labeling procedure we apply to the CASAS dataset (timeline from 'Enter home' through 'Entertain guests (Begin)' and 'Entertain guests (End)' to 'Leave home', with label 0 for no occupant, 1 for one occupant, and 2 for two or more occupants). We differentiate three different cases of occupancy so that we can use the dataset not only for occupancy detection but also for multi-occupancy detection

Table 12  Class distributions of the CASAS HH111 and HH112 datasets.
The number of samples refers to time slice lengths of 15/60 s.

Apartment  # Samples        # Features  Occupied (%)  Not occupied (%)
HH111      351,217/87,805   62          60.82         39.18
HH112      622,076/155,520  45          64.95         35.05

Table 13  Specifications of the architecture building blocks, model parameters, output shapes, and FLOPs for the top-performing MLP model

Block (type)             Output shape  Parameters
1. Input (InputLayer)    (7)           0
2. Hidden layer (Dense)  (21)          168
3. Dropout               (21)          0
4. Hidden layer (Dense)  (21)          462
5. Hidden layer (Dense)  (7)           154
6. Hidden layer (Dense)  (4)           32
7. Output (Softmax)      (2)           10
Total number of parameters             826
Model size                             3.25 KiB
FLOPs (forward pass)                   1607
* The model size is represented in kibibytes (KiB), calculated based on the Float32 data type

addition, we extracted intervals in which guests are present in the apartments. Guest intervals always occur outside absence intervals. We used the datasets recorded in the HH112 and HH111 apartments for occupancy detection and multi-occupancy detection, respectively. Figure 3 shows the labeling procedure we applied to the dataset. In the case of multi-occupancy detection, we used the activity annotation 'Entertain guests' to derive the presence of multiple individuals in an apartment. The state 'more than one occupant' was observed in less than 7% of the intervals. The number of guests is not specified in the CASAS [30] dataset. Therefore, it is not possible to derive the number of occupants using the dataset. However, it is possible to derive whether one or multiple persons are present in intervals with guests. We differentiate between the following occupancy states: (a) no occupant, (b) exactly one occupant, (c) more than one occupant. Similarly to the preprocessing procedure for the Kasteren Ubicomp dataset, we split the datasets into equally sized time slices with a length of 15/60 s and annotated them using the occupancy intervals.
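The 50%-overlap labeling rule applied to both the Kasteren and CASAS time slices can be sketched as follows, with interval bounds as plain epoch seconds; the function names are illustrative:

```python
def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length of the overlap between two intervals, in seconds."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def occupancy_label(slice_start, slice_end, absence_periods):
    """Label a time slice 'not occupied' (0) when at least 50% of it
    falls inside the absence periods, and 'occupied' (1) otherwise."""
    length = slice_end - slice_start
    covered = sum(overlap_seconds(slice_start, slice_end, s, e)
                  for s, e in absence_periods)
    return 0 if covered >= 0.5 * length else 1

# A 60 s slice overlapping an absence period for 45 s -> not occupied.
print(occupancy_label(0, 60, [(15, 300)]))  # 0
```

The same rule, applied per occupancy state, extends to the three-class CASAS labels.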
Table 12 lists the resulting numbers of samples after preprocessing and the class distributions for both apartments. We grouped successive samples into windows, as explained in Sect. 6.2, and shuffled the windows after grouping to prevent potential overfitting.

6 Experimental setup

This section presents the general preprocessing procedures that all utilized datasets have in common and the setups for the experiments conducted.

6.1 Feature normalization

The features (i.e., sensor measurements) contained in the datasets vary significantly in terms of value ranges. To facilitate a smoother gradient descent flow and prevent a bias towards features with higher-magnitude values, we scale these features. The Tanh activation function utilized in our ANN architectures is zero-centered; hence, we scale the values to a range of [0, 1] using Min-Max normalization [33]. Let x represent a value of an input feature, and xmax and xmin represent the maximum and minimum values of this feature, respectively. The normalized value of x, represented as n(x), is given by:

n(x) = (x − xmin) / (xmax − xmin)    (7)

6.2 Window extraction

In this section, we describe the process of converting time-series data into windows, facilitating data processing through our proposed ANN architecture. It is critical to

Table 14  Specifications of the architecture building blocks, model parameters, output shapes, and FLOPs for the top-performing BiGRU models

Model   Block (type)                  Output shape  Parameters
BiGRU2  1. Input (InputLayer)         (1, 7)        0
        2. Bidirectional layer (GRU)  (1, 28)       6216
        3. Dropout                    (1, 28)       0
        4. Bidirectional layer (GRU)  (28)          9744
        5. Hidden layer (Dense)       (7)           203
        6. Hidden layer (Dense)       (1)           8
        7. Output (Softmax)           (2)           4
        Total number of parameters                  16,175
        Model size                                  63.18 KiB
        FLOPs (forward pass)                        32,630
BiGRU4  1. Input (InputLayer)         (1, 7)        0
        2. Bidirectional layer (GRU)  (1, 28)       6216
        3. Dropout                    (1, 28)       0
        4. Bidirectional layer (GRU)  (1, 28)       9744
        5. Dropout                    (1, 28)       0
        6.
Bidirectional layer (GRU)             (1, 14)       3696
        7. Dropout                    (1, 14)       0
        8. Bidirectional layer (GRU)  (10)          1560
        9. Hidden layer (Dense)       (7)           77
        10. Hidden layer (Dense)      (1)           8
        11. Output (Softmax)          (2)           4
        Total number of parameters                  21,305
        Model size                                  83.22 KiB
        FLOPs (forward pass)                        42,950

* The model size is represented in kibibytes (KiB), calculated based on the Float32 data type

Table 15  The hyper-parameter configuration we used to train our ANN models

Hyper-parameter      Value
Learning rate        2e−3
Learning decay β1    0.9
Learning decay β2    0.999
Batch size           100
Dropout probability  30%
Training epochs      12
Window size k        1 (i.e., 1 min)

retain the temporal dependency observed between sequential data points. Consequently, we transformed the time-series data into individual feature vectors, or windows, employing the sliding window method [34]. A window is defined as a tuple denoted by W_i = ⟨s_i, s_{i+1}, …, s_{i+k−1}⟩, where s_i refers to the i-th sample and k represents the window size. During the inference phase, our ANN structure categorizes each window individually. This process involves sliding a fixed-size window across the entire time series, extracting n − k + 1 windows, represented by W_1, W_2, …, W_{n−k+1}, where n denotes the total sample count within a dataset. In the training phase, all windows are extracted from the training data for model training. The ground truth label corresponding to a window is determined by the label of its final contained sample. In the testing phase, all windows from the test data are extracted and utilized for model evaluation. This process ensures that the initial verdict from our occupancy detector is obtained after observing k successive samples.

6.3 Experiment 1: UCI occupancy detection dataset

In this experiment, we evaluate OD using the UCI occupancy detection dataset [7].
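The normalization and window-extraction steps of Sects. 6.1 and 6.2 can be sketched together as a minimal NumPy routine; the function names are illustrative:

```python
import numpy as np

def minmax_normalize(x):
    """Scale each feature column to [0, 1], Equation (7)."""
    xmin, xmax = x.min(axis=0), x.max(axis=0)
    return (x - xmin) / (xmax - xmin)

def extract_windows(samples, labels, k):
    """Slide a window of size k over the series, yielding the
    n - k + 1 windows W_1..W_{n-k+1} of Sect. 6.2; each window takes
    the label of its final contained sample."""
    n = len(samples)
    windows = np.stack([samples[i:i + k] for i in range(n - k + 1)])
    window_labels = labels[k - 1:]
    return windows, window_labels

# Eight samples with three features each, window size k = 4.
x = minmax_normalize(np.arange(24, dtype=float).reshape(8, 3))
w, y = extract_windows(x, np.arange(8), k=4)
print(w.shape, len(y))  # (5, 4, 3) 5
```

Shuffling is then applied to the windows (not the raw samples), so the temporal order inside each window is preserved.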
We compared the performance of various traditional algorithms with RNN variants, including our BiGRU occupancy detector. The traditional algorithms examined are SVM, quadratic discriminant analysis (QDA), and k-NN.

Table 16  Specifications of the architecture building blocks, model parameters, output shapes, and FLOPs for the top-performing BiGRU models

Model   Block (type)                  Shape *V1  Params *V1  Shape *V2  Params *V2
BiGRU2  1. Input (InputLayer)         (4, 6)     0           (4, 14)    0
        2. Bidirectional layer (GRU)  (4, 40)    11,520      (4, 20)    4320
        3. Dropout                    (4, 40)    0           (4, 20)    0
        4. Bidirectional layer (GRU)  (5)        1410        (5)        810
        5. Hidden layer (Dense)       (6)        36          (14)       84
        6. Hidden layer (Dense)       (4)        28          (4)        60
        7. Output (Softmax)           (2)        10          (2)        10
        Total number of parameters               13,004                 5284
        Model size                               50.80 KiB              20.64 KiB
        FLOPs (forward pass)                     26,196                 10,668
BiGRU4  1. Input (InputLayer)         (4, 6)     0           (4, 14)    0
        2. Bidirectional layer (GRU)  (4, 40)    11,520      (4, 20)    4320
        3. Dropout                    (4, 40)    0           (4, 20)    0
        4. Bidirectional layer (GRU)  (4, 30)    12,960      (4, 20)    5040
        5. Dropout                    (4, 30)    0           (4, 20)    0
        6. Bidirectional layer (GRU)  (4, 20)    6240        (4, 10)    1920
        7. Dropout                    (4, 20)    0           (4, 10)    0
        8. Bidirectional layer (GRU)  (5)        810         (5)        510
        9. Hidden layer (Dense)       (6)        36          (14)       84
        10. Hidden layer (Dense)      (4)        28          (4)        60
        11. Output (Softmax)          (2)        10          (2)        10
        Total number of parameters               31,604                 11,944
        Model size                               123.45 KiB             46.66 KiB
        FLOPs (forward pass)                     63,596                 24,108

* V1 and V2 denote annotation variants 1 and 2, respectively
** The model size is represented in kibibytes (KiB), calculated based on the Float32 data type

Table 17  The hyper-parameter configuration we used to train our ANN models

Hyper-parameter      Variant 1  Variant 2
Learning rate        2e−3       2e−3
Learning decay β1    0.9        0.9
Learning decay β2    0.999      0.999
Batch size           50         400
Dropout probability  30%        30%
Training epochs      250        25
Window size k        4          4
The hyperparameters for these algorithms are detailed below:

• SVM (all variants): Shrinking heuristic enabled, 1e−3 tolerance for the stopping criterion, and one-vs-rest decision function.
  – SVM-lin: Linear kernel.
  – SVM-poly: 3rd-degree polynomial kernel, scale gamma.
  – SVM-rbf: Radial basis function (RBF) kernel, scale gamma.
• k-NN (all variants): Uniform weights, automatic algorithm determination, leaf size of 30, and Euclidean distance metric.
  – 4-NN: k = 4.
  – 20-NN: k = 20.
• QDA: Rank estimation threshold set at 1e−4.

Table 13 lists the ANN building blocks of our MLP. The dense layers of our MLP use the Leaky ReLU activation function with α = 0.3. We included a dropout layer with a 50% probability to mitigate overfitting. In this binary classification scenario, the output layer is of size 2 and uses the Softmax activation function. We trained the MLP for 10 epochs with the Adam optimizer [35], a learning rate of 1e−3, and a batch size of 50 samples. Besides the traditional algorithms, we also created instances of RNNs, including our BiGRU models, for this experiment. Layer configurations were optimized through hyperparameter tuning. The architecture, parameter details, and computational requirements of our highest-performing BiGRU models are listed in Table 14. The number of model parameters greatly depends on the number of input features n and the window size k. In this experiment, we set the window size to k = 1 because we want to compare the performance of our RNN models directly to the detection performance reported in [7], although our RNN models cannot unfold their true potential when the inputs consist only of single samples. We used the Nadam [36] optimizer for ANN parameter optimization. Table 15 lists additional hyper-parameters we used to train our ANN models in this experiment.

6.4 Experiment 2: Kasteren Ubicomp dataset

This experiment investigates OD utilizing the Kasteren Ubicomp [29] dataset.
We employed two distinct preprocessing variants on the dataset, detailed in Sect. 5.2.2. Analogous to Experiment 1, we assessed the performance of various traditional algorithms, including SVM, k-NN, and QDA, comparing them with our ANNs. Furthermore, we compare with the LDA and classification and regression tree (CART) algorithms. The hyperparameters defined for the LDA and CART algorithms are as follows:

• LDA: Linear discriminant analysis, implemented using singular value decomposition with a rank estimation threshold set at 1e−4.
• CART: A decision tree classifier employing the Gini index as the split criterion and selecting the optimal split at each decision node. The minimum numbers of samples required to split a node and to be at a leaf node are set to 2 and 1, respectively.

We created instances of both BiGRU versions, BiGRU2 and BiGRU4, for this experiment. The configurations of the top-performing instances are listed in Table 16.

Table 18  Specifications of the architecture building blocks, model parameters, output shapes, and FLOPs for the top-performing MLP models

Block (type)             Shape *V1  Params *V1  Shape *V2  Params *V2
1. Input (InputLayer)    (6)        0           (14)       0
2. Hidden layer (Dense)  (40)       280         (20)       300
3. Dropout               (40)       0           (20)       0
4. Hidden layer (Dense)  (5)        205         (5)        105
5. Hidden layer (Dense)  (6)        36          (14)       84
6. Hidden layer (Dense)  (4)        28          (4)        60
7. Output (Softmax)      (2)        10          (2)        10
Total number of parameters          559                    559
Model size                          2.18 KiB               2.18 KiB
FLOPs (forward pass)                1071                   1083

* V1 and V2 denote annotation variants 1 and 2, respectively
** The model size is represented in kibibytes (KiB), calculated based on the Float32 data type

Table 19  The hyper-parameter configuration we used to train our ANN models

Hyper-parameter      Value
Learning rate        2e−3
Learning decay β1    0.9
Learning decay β2    0.999
Batch size           250
Dropout probability  30%
Training epochs      25
Window size k        4
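As a sanity check, the bidirectional GRU parameter counts in Table 16 can be reproduced in closed form. This small sketch assumes a Keras-style GRU with the reset_after bias convention (hence a 2d bias term, a slight departure from the single d1 bias term in Table 2); the function name is ours:

```python
def bigru_params(I, d):
    """Parameter count of one bidirectional GRU layer with input
    dimension I and d units per direction: each direction holds
    3 * (I*d + d*d + 2*d) weights, and the two directions are
    independent (Keras-style reset_after bias assumed)."""
    return 2 * 3 * (I * d + d * d + 2 * d)

# Reproduce the variant-2 BiGRU2 entries of Table 16:
print(bigru_params(14, 20))  # 4320  (first bidirectional layer)
print(bigru_params(20, 5))   # 810   (second bidirectional layer)
```

The same formula also matches the variant-1 entries (e.g., 11,520 for the first layer with I = 6 and d = 40).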
Depending on the preprocessing variant applied to the dataset, slight adjustments were made to the GRU layer sizes. Details on the hyperparameter configurations used to train our ANN models are listed in Table 17. Similarly, the architecture details of the MLPs we created are summarized in Table 18. Additional details regarding the experimental setups for both preprocessing variants are provided below:

Variant 1: The recurrent models utilized a window size of k = 4 samples. In contrast, all other algorithms, including our MLP, were configured with a window size of k = 1. Each sample comprised n = 6 features. The MLP training was conducted over 250 epochs, employing the Adam optimizer with a learning rate of 1e−3.

Variant 2: The recurrent models utilized a window size of k = 4 samples, where each sample corresponds to a 15-second time slice. In comparison, all other algorithms, including our MLP, were set with a window size of k = 1 sample, representing a 60-second time slice. Each sample in this variant incorporated n = 14 features. The MLP was trained over 10 epochs using the Adam optimizer with a learning rate of 2e−3.

6.5 Experiment 3: CASAS HH112 dataset

This experiment investigates OD using the CASAS HH112 [30] dataset. Table 20 lists the architecture details of our top-performing BiGRU models utilized for this experiment. The hyper-parameters utilized to train our ANN models are listed in Table 19 and are mostly consistent with those used in the other experiments.

6.6 Experiment 4: CASAS HH111 dataset

This experiment investigates multi-occupancy detection utilizing the CASAS HH111 [30] dataset. Table 21 lists the architecture details of our top-performing BiGRU models employed in this experiment. Most of the other hyper-parameters are consistent with those used in Sect.
6.5, with a notable alteration in the output shape of the Softmax layer, which has been adjusted to 3 to distinguish between the three classes in this experiment.

Table 20  Specifications of the architecture building blocks, model parameters, output shapes, and FLOPs for the top-performing BiGRU models

Model   Block (type)                  Output shape  Parameters
BiGRU2  1. Input (InputLayer)         (4, 45)       0
        2. Bidirectional layer (GRU)  (4, 50)       29,100
        3. Dropout                    (4, 50)       0
        4. Bidirectional layer (GRU)  (10)          3720
        5. Hidden layer (Dense)       (45)          495
        6. Hidden layer (Dense)       (4)           184
        7. Output (Softmax)           (2)           10
        Total number of parameters                  33,509
        Model size                                  130.89 KiB
        FLOPs (forward pass)                        67,237
BiGRU4  1. Input (InputLayer)         (4, 45)       0
        2. Bidirectional layer (GRU)  (4, 50)       29,100
        3. Dropout                    (4, 50)       0
        4. Bidirectional layer (GRU)  (4, 80)       22,080
        5. Dropout                    (4, 80)       0
        6. Bidirectional layer (GRU)  (4, 60)       20,160
        7. Dropout                    (4, 60)       0
        8. Bidirectional layer (GRU)  (10)          4320
        9. Hidden layer (Dense)       (45)          495
        10. Hidden layer (Dense)      (4)           184
        11. Output (Softmax)          (2)           10
        Total number of parameters                  76,349
        Model size                                  298.24 KiB
        FLOPs (forward pass)                        297,957

* The model size is represented in kibibytes (KiB), calculated based on the Float32 data type

6.7 Evaluation metrics

In evaluating the performance of our models, we employ several commonly used performance metrics, including accuracy, F1-score, precision, and recall.

7 Results

This section presents the results of the particular experiments. The performance of our BiGRU solution is compared to several state-of-the-art algorithms.

7.1 Experiment 1: UCI occupancy detection dataset

The results of this experiment are listed in Table 22. In [7], the authors analyzed the performance of the RF, GBM, CART and LDA algorithms. To facilitate a direct comparison, we set the window size used to train and test our RNN models to k = 1 samples, aligning with the results reported in [7].
However, this setting prevents our RNN models from unfolding their full potential, as they do not profit from patterns extracted across successive time steps. In addition to our BiGRU models, we trained QDA, k-NN, and SVM models, along with our ANN implementations, on the respective dataset. The SVM models yield accuracies of 97.90%, 97.86%, and 97.94% on test set 1, with the SVM-rbf model showing the highest accuracy reported for this test set in comparison to previous work. Our MLP, BiGRU2, and BiGRU4 models yielded accuracies of up to 97.86%, 97.71%, and 97.60% on test set 1, marking a noticeable improvement over prior work, but without setting new standards. In particular, our BiGRU2 model and MLP surpassed other solutions on test set 2, with the MLP even slightly outperforming our BiGRU models by margins of 0.15 and 0.05 percentage points on test sets 1 and 2, respectively. Additionally, we incorporated well-established RNN architectures such as GRU and LSTM, enhancing the depth of our comparison. However, these showed lower performance compared to our BiGRU models on test set 2. Overall, the ANN models proposed in this work outperform the state-of-the-art on test set 2.

Table 21  Specifications of the architecture building blocks, model parameters, output shapes, and FLOPs for the top-performing BiGRU models

Model   Block (type)                  Output shape  Parameters
BiGRU2  1. Input (InputLayer)         (4, 62)       0
        2. Bidirectional layer (GRU)  (4, 50)       34,200
        3. Dropout                    (4, 50)       0
        4. Bidirectional layer (GRU)  (10)          3720
        5. Hidden layer (Dense)       (62)          682
        6. Hidden layer (Dense)       (4)           252
        7. Output (Softmax)           (3)           15
        Total number of parameters                  38,869
        Model size                                  151.81 KiB
        FLOPs (forward pass)                        77,930
BiGRU4  1. Input (InputLayer)         (4, 62)       0
        2. Bidirectional layer (GRU)  (4, 50)       34,200
        3. Dropout                    (4, 50)       0
        4. Bidirectional layer (GRU)  (4, 80)       63,360
        5. Dropout                    (4, 80)       0
        6. Bidirectional layer (GRU)  (4, 60)       51,120
        7. Dropout                    (4, 60)       0
        8. Bidirectional layer (GRU)  (10)          4320
        9. Hidden layer (Dense)       (62)          682
        10.
Hidden layer (Dense)                  (4)           252
        11. Output (Softmax)          (3)           15
        Total number of parameters                  153,949
        Model size                                  601.36 KiB
        FLOPs (forward pass)                        308,664

* The model size is represented in kibibytes (KiB), calculated based on the Float32 data type

7.2 Experiment 2: Kasteren Ubicomp dataset

In this experiment, the evaluations were conducted on two preprocessing variants, each differing significantly in terms of occupancy annotations and class distribution. The results for each variant are listed in Tables 23 and 24.

Variant 1: In this variant, the CART model was the top-performing instance, with an F1-score of 97.98%, while the k-NN algorithm yielded a 97.48% F1-score. Among the ANN models, our BiGRU2 was the top-performing instance, with an F1-score of 94.91%. Notably, with 97.18%, our BiGRU4 model yielded the highest recall rate among the ANN models.

Variant 2: Analyzing Table 24, it is clear that the ANN models demonstrated quite similar performance across various metrics. Focusing on the F1-score, the MLP, GRU, LSTM, BiGRU2, and BiGRU4 models all recorded scores within a narrow range, oscillating between 77.49% and 77.69%. This similarity is mirrored in their accuracy, precision, and recall metrics as well, indicating a

Table 22  Comparison to previous work using the UCI occupancy detection dataset. The performance metrics reported in previous work are limited; only the accuracy has been reported. For the ANN models, average and top performances are significantly different.
Our BiGRU2 and MLP model outperforms the state-of-the-art Model Test 1 Avg Accuracy (%) Test 2 Avg Accuracy (%) Test 1 Top Accuracy (%) Test 2 Top Accuracy (%) RF [7] N/A N/A 95,53 98,06 GBM [7] N/A N/A 95,76 96,10 CART [7] N/A N/A 94,52 96,52 LDA [7] N/A N/A 97,90 99,33 SVM [10] N/A N/A 97,90 N/A K-NN [10] N/A N/A 95,90 N/A RBFN [10] N/A N/A 97,00 N/A MCRN [10] N/A N/A 97,60/93,20 N/A MVCNN [11] N/A N/A 97,40 97,72 DDQN-PER [12] N/A N/A 96,40 98,20 QDA N/A N/A 97,67 99,02 4-NN 97,00 94,29 97,00 94,29 20-NN 97,34 94,16 97,34 94,16 SVM-lin 97,90 98,95 97,90 98,95 SVM-poly 97,86 97,64 97,86 97,64 SVM-rbf 97,94 98,53 97,94 99,46 MLP 97,49 98,66 97,86 99,44 GRU N/A N/A 97,86 96,86 LSTM N/A N/A 97,71 95,93 BiGRU2 97,22 96,92 97,71 99,39 BiGRU4 97,24 97,83 97,60 98,55 Table 23 Comparison to previous work on occupancy detection, variant 1 (sensor intervals) using the Kasteren Ubicomp dataset Model Accuracy (%) F1-Score (%) Precision (%) Recall (%) 4-NN 95,85 97,48 100,00 95,09 QDA 86,04 92,31 86,38 99,11 LDA 84,91 91,74 85,38 99,11 CART 96,60 97,98 98,64 97,32 SVM-rbf 85,66 92,18 85,50 100,00 MLP 87,55 92,48 89,04 96,21 GRU 89,77 93,79 91,89 95,77 LSTM 91,29 94,69 93,18 96,24 BiGRU2 91,67 94,91 93,61 96,24 BiGRU4 85,23 91,39 86,25 97,18 Neural Computing and Applications (2024) 36:2941–2960 2957 123 consistent performance across all these ANN models. In the case of the traditional algorithms, the CART algorithm performed best, yielding an F1-score of 78.02%. The QDA and LDA algorithms were observed to have underper- formed when compared to the other models. The QDA model, in particular, displayed a noticeably lower perfor- mance, with an F1-score of 19.61%. While the ANN models exhibited a homogeneous performance, the CART model displayed a marginally better output, demonstrating its efficacy in the task. 7.3 Experiment 3: CASAS HH112 dataset In this experiment, we analyzed the performance tested on the CASAS HH112 dataset as presented in Table 25. 
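As a complement to the complexity figures in Table 21, the layer-wise parameter counts can be reproduced with a few lines of arithmetic. The sketch below is a reader's check, not the authors' code: it assumes Keras-style GRU cells (three gates, each with input weights, recurrent weights, and two bias vectors, i.e., the reset_after layout) and bidirectional layers whose forward and backward outputs are combined element-wise rather than concatenated, which is consistent with the output widths reported in the table.

```python
# Reproduce the parameter counts of Table 21 under the stated assumptions.
def gru_params(input_dim, units):
    # 3 gates, each with input weights, recurrent weights, and 2 bias vectors
    return 3 * (input_dim * units + units * units + 2 * units)

def bigru_params(input_dim, units):
    # forward + backward direction, outputs combined element-wise
    return 2 * gru_params(input_dim, units)

def dense_params(input_dim, units):
    return input_dim * units + units  # weights + bias

# BiGRU2: (4, 62) input -> BiGRU(50) -> BiGRU(10) -> Dense(62) -> Dense(4) -> Softmax(3)
bigru2 = (bigru_params(62, 50) + bigru_params(50, 10)
          + dense_params(10, 62) + dense_params(62, 4) + dense_params(4, 3))

# BiGRU4 adds BiGRU(80) and BiGRU(60) layers between the two BiGRU2 recurrent blocks
bigru4 = (bigru_params(62, 50) + bigru_params(50, 80) + bigru_params(80, 60)
          + bigru_params(60, 10)
          + dense_params(10, 62) + dense_params(62, 4) + dense_params(4, 3))

print(bigru2, bigru4)      # 38869 153949, matching Table 21
print(bigru2 * 4 / 1024)   # Float32 size in KiB, close to the reported 151.81
```

The match of both totals (38,869 and 153,949) suggests these assumptions reflect the implementation; the KiB figure agrees up to rounding.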
The CART model proved to be the top-performing instance with regard to both accuracy and F1-score, at 92.97% and 94.72%, respectively. The worst-performing instance, when considering the F1-score as the primary indicator of performance, was the LDA algorithm, which yielded the lowest F1-score at 78.03%. The RNN models (GRU, LSTM, BiGRU2, and BiGRU4) yielded close performance, especially when focusing on the F1-scores, which all fell within the range of 89.53% to 91.12%. The BiGRU2 model was the top-performing instance among the RNN models.

Table 24 Comparison to previous work on occupancy detection, variant 2 (60 s slices for traditional algorithms, 4×15 s slices for RNN models), using the Kasteren Ubicomp dataset. The ANN models achieved a consistent performance

| Model | Accuracy (%) | F1-score (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| 4-NN | 81.29 | 78.01 | 79.38 | 76.70 |
| QDA | 61.26 | 19.61 | 96.18 | 10.92 |
| LDA | 75.76 | 69.20 | 76.86 | 62.92 |
| CART | 81.29 | 78.02 | 79.36 | 76.73 |
| SVM-rbf | 80.89 | 77.41 | 79.27 | 75.63 |
| MLP | 81.39 | 77.69 | 80.23 | 75.31 |
| GRU | 80.90 | 77.65 | 80.12 | 75.32 |
| LSTM | 80.80 | 77.50 | 80.09 | 75.06 |
| BiGRU2 | 80.88 | 77.62 | 80.13 | 75.27 |
| BiGRU4 | 80.79 | 77.49 | 80.08 | 75.05 |

Table 25 Comparison to previous work on occupancy detection (60 s slices for traditional algorithms, 4×15 s slices for RNN models) using the CASAS HH112 dataset. Our BiGRU2 model achieved the highest recall rate compared to the other RNN models

| Model | Accuracy (%) | F1-score (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| 4-NN | 90.76 | 92.81 | 93.92 | 91.72 |
| QDA | 64.97 | 78.77 | 64.97 | 100.00 |
| LDA | 69.72 | 78.03 | 73.80 | 82.78 |
| CART | 92.97 | 94.72 | 92.38 | 97.19 |
| SVM-rbf | 72.87 | 79.28 | 78.67 | 79.89 |
| MLP | 75.61 | 81.97 | 78.44 | 85.82 |
| GRU | 87.51 | 90.56 | 88.81 | 92.38 |
| LSTM | 86.05 | 89.53 | 87.17 | 92.03 |
| BiGRU2 | 88.27 | 91.12 | 89.43 | 92.88 |
| BiGRU4 | 87.29 | 90.32 | 89.19 | 91.49 |

7.4 Experiment 4: CASAS HH111 dataset

This section presents the results of our multi-occupancy detection experiment, which are listed in Table 26. Among the traditional algorithms, 4-NN demonstrated a good performance, with an F1-score of 93.96%. The CART model was the top-performing instance across all metrics, achieving the highest F1-score of 96.16%. The QDA model struggled significantly in this experiment, managing only a 31.64% F1-score. Among the ANN models, the MLP model showed the lowest performance, with an F1-score of 76.93%. The RNN models present a noticeable improvement, with F1-scores above 90%. They maintained a balance between precision and recall, with the LSTM slightly outperforming the GRU and BiGRU models by 0.49 and 1.39 percentage points in F1-score, respectively.

8 Conclusion

In this work, we presented occupancy and multi-occupancy detection experiments. The strength of our proposed BiGRU occupancy detector lies in its ability to harness both historical and future contexts through bidirectional recurrent layers, making it adept at identifying complex patterns within sensory data. Besides the BiGRU models, a major contribution of this work is the sophisticated preprocessing procedure we employed in preparing the Kasteren Ubicomp and CASAS datasets. This procedure not only annotated the CASAS dataset with occupancy labels but also tailored it to multi-occupancy detection experiments. Our preprocessing contributed significantly, as the Kasteren Ubicomp and CASAS datasets were originally developed for activity recognition rather than OD. Our evaluations were performed with three different datasets recorded exclusively with privacy-aware ambient sensors. Notably, with accuracies and F1-scores exceeding 90%, our models not only excelled in single-occupancy detection but also showed great promise in the more challenging task of multi-occupancy detection. In addition, our work presented insightful comparisons between the BiGRU models and other state-of-the-art RNN algorithms. Moreover, our ANN models outperformed previous work on the UCI Occupancy Detection dataset with accuracies up to 99.44%.
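The multi-occupancy results above are summarized via macro-averaged scores (cf. the macro F1-score of 91.55%). As a reading aid, the following minimal sketch shows how a macro F1-score is derived from a multi-class confusion matrix; the matrix values below are purely illustrative and are not taken from our experiments.

```python
# Macro-averaged F1 from a multi-class confusion matrix
# (rows: true class, columns: predicted class).
def per_class_prf(cm, c):
    tp = cm[c][c]
    fp = sum(cm[r][c] for r in range(len(cm))) - tp   # predicted c, actually other
    fn = sum(cm[c]) - tp                              # actually c, predicted other
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def macro_f1(cm):
    # Unweighted mean of the per-class F1-scores, so minority classes
    # (e.g., multi-occupant intervals) count as much as majority classes.
    scores = [per_class_prf(cm, c)[2] for c in range(len(cm))]
    return sum(scores) / len(scores)

# Illustrative 3-class matrix (e.g., empty / single / multiple occupants)
cm = [[50, 3, 2],
      [4, 40, 6],
      [1, 5, 44]]
print(round(macro_f1(cm) * 100, 2))  # prints 86.27
```

Unlike accuracy, this average is not dominated by the most frequent class, which is why it is the primary indicator for the imbalanced multi-occupancy setting.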
This work lays the foundation for further research and innovation in the field of occupancy detection.

Acknowledgements This research work has been funded by the German Federal Ministry of Education and Research and the Hessian Ministry of Higher Education, Research, Science and the Arts within their joint support of the National Research Center for Applied Cybersecurity ATHENE.

Funding Open Access funding enabled and organized by Projekt DEAL. This research work has been funded by the German Federal Ministry of Education and Research and the Hessian Ministry of Higher Education, Research, Science and the Arts within their joint support of the National Research Center for Applied Cybersecurity ATHENE.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Availability of data and materials The UCI Occupancy Detection dataset analyzed during the current study is available in the UCI machine learning repository [37]. The Kasteren Ubicomp dataset analyzed during this study is included in the supplementary information files of the original published article [29].
The CASAS dataset analyzed during the current study is available in the CASAS repository of the Washington State University [38].

Table 26 Comparison to previous work on multi-occupancy detection (60 s slices for traditional algorithms, 4×15 s slices for RNN models) using the CASAS HH111 dataset

| Model | Accuracy (%) | F1-score (%) | Precision (%) | Recall (%) |
|---|---|---|---|---|
| 4-NN | 94.11 | 93.96 | 94.58 | 93.46 |
| QDA | 45.53 | 31.64 | 25.71 | 56.32 |
| LDA | 73.20 | 69.96 | 74.61 | 66.95 |
| CART | 96.09 | 96.16 | 96.69 | 95.66 |
| SVM-rbf | 76.87 | 76.05 | 79.87 | 73.27 |
| MLP | 77.64 | 76.93 | 80.79 | 74.30 |
| GRU | 91.67 | 92.45 | 93.10 | 91.88 |
| LSTM | 92.18 | 92.94 | 93.50 | 92.42 |
| BiGRU2 | 89.44 | 90.41 | 91.50 | 89.43 |
| BiGRU4 | 90.82 | 91.55 | 92.32 | 90.84 |

References

1. Chen Z, Jiang C, Xie L (2018) Building occupancy estimation and detection: a review. Energy Build 169:260–270. https://doi.org/10.1016/j.enbuild.2018.03.084
2. Narayanaswamy B, Balaji B, Gupta R, Agarwal Y (2014) Data driven investigation of faults in HVAC systems with model, cluster and compare (MCC). In: Proceedings of the 1st ACM conference on embedded systems for energy-efficient buildings. BuildSys '14, pp 50–59. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/2674061.2674067
3. Mills E (2011) Building commissioning: a golden opportunity for reducing energy costs and greenhouse gas emissions in the United States. Energ Effi 4(2):145–173
4. Black J, Velastin SA, Boghossian BA (2005) A real time surveillance system for metropolitan railways. In: 2005 IEEE international conference on advanced video and signal based surveillance (AVSS'05), 15–16 September 2005, Como, Italy, pp 189–194. IEEE Computer Society, Washington, DC, USA. https://doi.org/10.1109/AVSS.2005.1577265
5. Dong B, Andrews B, Lam KP, Höynck M, Zhang R, Chiou Y-S, Benitez D (2010) An information technology enabled sustainability test-bed (ITEST) for occupancy detection through an environmental sensing network. Energy Build 42(7):1038–1046
6. Alhamoud A, Nair AA, Gottron C, Böhnstedt D, Steinmetz R (2014) Presence detection, identification and tracking in smart homes utilizing bluetooth enabled smartphones. In: IEEE 39th conference on local computer networks, Edmonton, AB, Canada, 8–11 September 2014 - workshop proceedings, pp 784–789. IEEE Computer Society, Washington, DC, USA. https://doi.org/10.1109/LCNW.2014.6927735
7. Candanedo LM, Feldheim V (2016) Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models. Energy Build 112:28–39
8. Szczurek A, Maciejewska M, Pietrucha T (2017) Occupancy determination based on time series of CO2 concentration, temperature and relative humidity. Energy Build 147:142–154
9. Wang W, Chen J, Hong T (2018) Occupancy prediction through machine learning and data fusion of environmental sensing and wi-fi sensing in buildings. Autom Constr 94:233–243
10. Hoori AO, Motai Y (2018) Multicolumn RBF network. IEEE Trans Neural Netw Learn Syst 29(4):766–778. https://doi.org/10.1109/TNNLS.2017.2650865
11. Liu C, Hsaio W, Tu Y (2019) Time series classification with multivariate convolutional neural network. IEEE Trans Ind Electron 66(6):4788–4797. https://doi.org/10.1109/TIE.2018.2864702
12. Fährmann D, Jorek N, Damer N, Kirchbuchner F, Kuijper A (2022) Double deep q-learning with prioritized experience replay for anomaly detection in smart environments. IEEE Access 10:60836–60848. https://doi.org/10.1109/ACCESS.2022.3179720
13. Han H, Jang K-J, Han C, Lee J (2013) Occupancy estimation based on CO2 concentration using dynamic neural network model. Proc AIVC 13
14. Jin M, Bekiaris-Liberis N, Weekly K, Spanos C, Bayen A (2015) Sensing by proxy: occupancy detection based on indoor CO2 concentration. UBICOMM 2015:14
15. Xia L, Chen C, Aggarwal JK (2011) Human detection using depth information by Kinect. In: IEEE conference on computer vision and pattern recognition, CVPR workshops 2011, Colorado Springs, CO, USA, 20–25 June 2011, pp 15–22. IEEE Computer Society, Washington, DC, USA. https://doi.org/10.1109/CVPRW.2011.5981811
16. Corvee E, Bak S, Brémond F (2012) People detection and re-identification for multi surveillance cameras. In: Proceedings of the international conference on computer vision theory and applications, volume 2: VISAPP (VISIGRAPP 2012), pp 82–88. SciTePress, Setubal, Portugal. https://doi.org/10.5220/0003808600820088
17. Pirttikangas S, Tobe Y, Thepvilojanapong N (2010) Smart environments for occupancy sensing and services. In: Nakashima H, Aghajan HK, Augusto JC (eds) Handbook of ambient intelligence and smart environments, pp 825–849. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-93808-0_31
18. Maas AL, Hannun AY, Ng AY et al (2013) Rectifier nonlinearities improve neural network acoustic models. In: Proc ICML, vol 30, p 3. Atlanta, Georgia, USA
19. Cho K, van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder–decoder for statistical machine translation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1724–1734. Association for Computational Linguistics, Doha, Qatar. https://doi.org/10.3115/v1/D14-1179
20. Dey R, Salem FM (2017) Gate-variants of gated recurrent unit (GRU) neural networks. In: IEEE 60th international midwest symposium on circuits and systems, MWSCAS 2017, Boston, MA, USA, August 6–9, 2017, pp 1597–1600. IEEE, Washington, DC, USA. https://doi.org/10.1109/MWSCAS.2017.8053243
21. Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166
22. Pascanu R, Mikolov T, Bengio Y (2012) Understanding the exploding gradient problem. CoRR arXiv:1211.5063
23. Rehmer A, Kroll A (2020) On the vanishing and exploding gradient problem in gated recurrent units. IFAC-PapersOnLine 53(2):1243–1248
24. Salehinejad H, Baarbe J, Sankar S, Barfett J, Colak E, Valaee S (2018) Recent advances in recurrent neural networks. CoRR arXiv:1801.01078
25. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
26. Huang B, Rashid T, Kechadi M (2007) Multi-context recurrent neural network for time series applications. Int J Comput Inf Eng 1(10):3086–3095
27. Elman JL (1990) Finding structure in time. Cogn Sci 14(2):179–211. https://doi.org/10.1207/s15516709cog1402_1
28. Rashid T, Huang B, Kechadi M, Gleeson B (2006) Auto-regressive recurrent neural network approach for electricity load forecasting. Int J Comput Intell 3(1):1–9
29. van Kasteren T, Noulas A, Englebienne G, Kröse B (2008) Accurate activity recognition in a home setting. In: Proceedings of the 10th international conference on ubiquitous computing. UbiComp '08, pp 1–9. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/1409635.1409637
30. Cook DJ, Crandall AS, Thomas BL, Krishnan NC (2013) CASAS: a smart home in a box. Computer 46(7):62–69. https://doi.org/10.1109/MC.2012.328
31. Makonin S (2015) ODDs: occupancy detection dataset. Harv Dataverse. https://doi.org/10.7910/DVN/2K9FFE
32. Tapia EM, Intille SS, Larson K (2004) Activity recognition in the home using simple and ubiquitous sensors. In: Ferscha A, Mattern F (eds) Pervasive computing. Springer, Berlin, Heidelberg, pp 158–175
33. Han J, Kamber M, Pei J (2011) Data transformation and data discretization. Data mining: concepts and techniques, pp 111–118
34. Dietterich TG (2002) Machine learning for sequential data: a review. In: Caelli T, Amin A, Duin RPW, de Ridder D, Kamel M (eds) Structural, syntactic, and statistical pattern recognition. Springer, Berlin, pp 15–30
35. Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. In: Bengio Y, LeCun Y (eds) 3rd international conference on learning representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, conference track proceedings. arXiv:1412.6980
36. Dozat T (2016) Incorporating Nesterov momentum into Adam
37. Candanedo L (2016) UCI Machine Learning Repository. https://archive.ics.uci.edu/dataset/357/occupancy+detection. Accessed 24 Sep 2023
38. Cook DJ, Crandall AS, Thomas BL, Krishnan NC (2021) CASAS. http://casas.wsu.edu/datasets/. Accessed 12 Jul 2022

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.