KNOWLEDGE ENGINEERING AND MANAGEMENT CONTRIBUTIONS FOR SCIENTIFIC RESEARCH IN THE THERMAL SMART ENERGY CONTEXT

Scientists and researchers have increasingly sought alternatives to solve problems related to power consumption, aiming at environmental conservation and financial savings. In this paper, Smart Energy (SE) is the term used to refer to energy studies in the areas of Smart Grids, Smart Cities, Smart Homes and Smart Buildings. The guiding question of this paper is: how have scientific studies in the Knowledge Engineering and Management (KEM) area contributed to the SE area? The goal of this research is to present a systematic literature review of the scientific research in the KEM area in the SE context, particularly thermal heating in the Smart Building context, over the last 10 years. This systematic literature review, organized as a bibliometric study, uses syntax and content analysis and was conducted in eight stages using three well-known and relevant scientific databases. The study analyzed 61,662 papers across these stages and resulted in 18 papers strongly related to the scope of this research. The analyses point to a set of studies that mainly use Elicitation, Acquisition and Knowledge Discovery as KEM processes. Regarding Information Technology (IT) approaches, the most prominent are Machine Learning, Artificial Intelligence algorithms and, especially, the proposition of the authors' own algorithms.


INTRODUCTION
Due to the high levels of CO2 emissions and the growth in power consumption, it is becoming increasingly important to carry out scientific studies that reduce environmental impacts and contribute to financial savings for end users. In this sense, some approaches and terminologies have been established in the areas of intelligent power consumption, such as Smart Grids (SG), Smart Cities (SC), Smart Homes (SH) and Smart Buildings (SB).
The term SG relates to the use of Information Technology applied to the management and operation of electrical networks. The term SC has many applications; one of them, in particular, is the study of the consumption and production of sustainable electricity. Finally, SH and SB, although distinct terms, have the same focus: both refer to the study of smart environments such as buildings or houses. The term "smart" is commonly used to describe the use of technological innovation and IT for automation purposes and resource savings, Lund et al. (2012). Such areas of study have many points of convergence. In this study the term Smart Energy (SE), Preissler Junior (2015), is used to refer to research and development initiatives in the cited areas of study. Here, however, the terms are applied to research focusing on the energy context. Furthermore, Knowledge Engineering and Management is an interdisciplinary field which over the past 25 years has been using IT as a tool to "operationalize" Knowledge Management (KM), Nonaka (2008), Rus and Lindvall (2002). Assuming that the term "smart" is related to the use of IT and considering that IT is also an important area of KEM studies, this article treats IT as the main point of convergence between SE and KEM.
This study is characterized as a systematic literature review in a bibliometric study format containing quantitative (syntax) and content (semantic) analyses. First, the keywords related to the topics studied were chosen, and then three scientific databases were selected. Next, the direct search was performed and the abstracts were read in order to filter the content. Finally, the papers selected in the previous step were read in full. Partial and final results are analyzed and described throughout this document. The final results, as well as future directions for these themes, are discussed at the end. This paper is structured as follows: section 2 presents the definitions of KEM concepts and foundations for the terms SG, SC, SH, SB and SE. Section 3 presents the hypotheses of this study, followed by the methodology in section 4. Section 5 reports the research process and section 6 the research development, together with the presentation of its results. Section 7 presents a textual content analysis of the final set of papers. Final remarks are presented in section 8.1 and conclusions in section 8.

Knowledge Engineering and Smart Studies
This section provides references and definitions of the key terms used in this research. The relations between studies in the KEM and Smart Energy areas are also presented here.
KEM is an interdisciplinary study area which aims to capitalize on organizations' intellectual capital, Rus and Lindvall (2002), Alavi and Leidner (2001). One important definition given by Davenport (1994) is that "knowledge management is the process of capturing, distributing, and effectively using knowledge". Knowledge Engineering (KE), in turn, was defined by Feigenbaum and McCorduck (1983), who explain that knowledge engineering is inseparably connected with solutions in the IT area: "knowledge engineering involves integrating knowledge into computer systems in order to solve complex problems", Jooß et al. (2015).
Although the KM concept emerged in the mid-1980s, Rus and Lindvall (2002), it was only in the early 1990s that the two disciplines, KM and KE, were able to merge, forming the new subject KEM, Nonaka (2008). This merging made it possible to use the techniques of both parent areas, and so to understand and study problems related to business and the business environment using IT as a means of support. In this paper, the term Smart Energy (SE) refers to the common area among the SG, SC, SH and SB study areas with respect to the energy context. Fig. 1 shows the relationship among the smart areas, where the dark areas represent SE in the energy context. Equation 1 complements Fig. 1 by expressing the concept of the term SE used in this study as a logical expression, where ES represents "Energy Studies".
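One possible way to write this definition in set notation is sketched below. It is only an assumption based on the verbal description above (SE as the part of the four smart areas that falls within Energy Studies); the choice of union and intersection operators is illustrative and is not taken from the source.

```latex
SE = (SG \cup SC \cup SH \cup SB) \cap ES
```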
The SG is designed to integrate advanced communication and networking technologies into electrical power grids to make them "smarter", Gao et al. (2012), Alto (2008). It refers to the application of information technology to power systems. The SG uses unified communications and control systems on the existing power delivery infrastructure to provide the right information to the right entity (...) at the right time to take the right action, Lund et al. (2012). The goal of SG applications is to optimize power supply and delivery while minimizing losses. The SG is also self-healing and enables next-generation energy efficiency and demand response applications, Nonaka (2008), Lund et al. (2012).
Smart grids should not be seen as separate from the other energy sectors, nor from what the integration of the other sectors means for the identification of proper solutions to the integration problem, Lund et al. (2012). In spite of being widely used nowadays, the term SC still lacks a clear and consistent understanding among practitioners and academia, Chourabi et al. (2012).
Nevertheless, an important meaning of SC is that it combines a digital environment with intelligent growth, a type of development based on information and communication technologies, Chourabi et al. (2012), Anttiroiko (2006). A SC can be defined as a community that has made a conscious effort to use information technology to transform, significantly and fundamentally, life and work within its territory, instead of following an incremental path, Communities (2001).
The SC concept can be subdivided into several areas such as planning and management, human and infrastructure, government and agency administration, public safety, social programs, health care, education, transportation and water energy, environmental and smarter buildings and urban planning. Despite this diversity, this study focuses on those related to the energy sector, Townsend et al. (2010). The SH and SB terms share some functional and technical commonalities, Martins et al. (2012). The term SH, however, is mainly used to describe residential homes, while SB refers to tertiary buildings (office buildings, industrial premises, hospitals, schools, etc.), Preissler et al. (2016).
These two terms can be defined as a place equipped with computing and information technology which anticipates and responds to the needs of the occupants, working to promote their comfort, convenience, security and entertainment through the management of technology within the place and connections to the world beyond, Aldrich. The term "smart" is related to SH or SB when one of those has some sort of automation and where there is an interactive technology with the final user. This intelligent automation principle emerges from the needs of the occupants. It can therefore provide a better work or home life experience to occupants, with intuitive user interfaces and without overpowering them with complex technologies. In this study the terms SH and SB are used only in relation to the automation and control of power consumption.
Figure 1: Relationship among the smart technology areas.

Hypotheses
As presented before, SE is a common study area among SG, SC, SH and SB in the energy context. This study also includes the KEM area, a discipline that deals with knowledge management and uses information technology as a way to operationalize it. The main common point between SE and KEM is therefore IT. Based on these statements, this study carries out a systematic literature review in three important scientific databases. With this purpose, the hypotheses h1, h2 and h3 are presented:
Hypothesis h1 - Since KEM uses information technology as the main tool to operate its applications and the SE area also uses information technology as a tool, it is assumed that most studies related to these two themes will belong to the Computer Science area.
Hypothesis h2 - Since the KEM area emerged in the 1990s and the Smart Grid, Smart City, Smart Home and Smart Building terminologies are also recent, it is assumed that most publications relating the two subjects will be found in recent years.
Hypothesis h3 - Since KEM is an interdisciplinary area, it is assumed that a large number of the documents found in the databases will be discarded between the second and the final analysis.

METHODOLOGY
This paper aims to present, through a bibliometric study, a systematic literature review of the proposed theme. The goal is to analyze the results of searches in three important scientific databases, IEEE Xplore (IEEEx), Scopus and Web of Science (WoS), over the past 10 years using quantitative (syntax) and content (semantic) analyses. The tools used for the analysis were Microsoft Excel and EndNote.
The present document is characterized as descriptive, analytical and bibliometric. It is descriptive because it seeks to describe all information collected during the research in all its stages. It is analytical because, at the end of each stage or along its steps, the collected information is analyzed in order to check the hypotheses and to present the content analysis.

Bibliometry
Bibliometry is the study of quantitative aspects of the production, dissemination and use of recorded information, Macias-Chapula (1998). Its purpose is to provide a condensed representation of information for storage and future inquiries, Bardin (1977).
Although they do not have a content analysis character, bibliometry studies are important for generating documented quantitative surveys of indicators such as the number of articles, authors and year of publication, among others. These indicators can be used by the scientific community in future research.

Research Process
This systematic literature review is carried out in stages x, with x ∈ {1, ..., 8}, where Sx is the set of papers selected in stage x. These stages and their results can be seen in Fig. 7. Each stage is conducted in two steps: (a) systematic literature search and (b) analysis and synthesis of bibliometric information. Both steps occur in parallel, i.e. after each completed step Sx(a), an analysis of the results obtained is done in Sx(b).

Literature Search Procedure -step (a)
First, the keywords related to the topics researched were chosen. Then the scientific databases to be used were selected, and after that the direct search was carried out. To execute the direct search, the terms were applied in the search engines to the following fields: titles, abstracts and keywords (metadata only). The search parameters restrict the results to the most recent publications and to publications in English, so that they are readable by the authors.

Analysis and Synthesis Procedure -step (b)
After the direct searches were carried out with the search term settings, the abstracts were read in order to filter the content and verify whether each selected paper was aligned with the search terms. Finally, the report presented in Sec. 6 was generated.
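The alternation between step (a) and step (b) can be pictured as a loop over stages. The sketch below is illustrative only: the function name `run_stages`, the toy records and the two selection criteria (including the year threshold) are assumptions made for this example and are not the criteria used in the review.

```python
def run_stages(records, stages):
    """Run each stage: step (a) selects papers, step (b) records a summary."""
    summary = []
    current = list(records)
    for name, keep in stages:
        current = [r for r in current if keep(r)]  # step (a): search or selection
        summary.append((name, len(current)))       # step (b): analysis of the stage result
    return current, summary


# Toy records and criteria, hypothetical and for illustration only.
papers = [
    {"title": "Smart grid control", "year": 2014},
    {"title": "Smart home heating", "year": 2015},
    {"title": "Smart city planning", "year": 2003},
]
stages = [
    ("recent", lambda r: r["year"] >= 2006),            # assumed year threshold
    ("heat", lambda r: "heat" in r["title"].lower()),
]
selected, summary = run_stages(papers, stages)
print(summary)
```

Keeping the per-stage counts in `summary` mirrors the way each set Sx is reported before the next stage starts.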

RESEARCH DEVELOPMENT
This section presents the steps of the research process. There are eight stages of analysis: first syntax-based stages, in which papers are eliminated by the use of keywords, and then content analysis stages, in which abstracts and full texts are read.
Stage S1
A preliminary search S1(a) with the terms t given in Eq. 2 was conducted. The purpose of this preliminary search was to check the entire sample space of scientific research in SG, SC, SH and SB in the three scientific databases (IEEEx, Scopus and WoS).
Where:
S1 = set of publications found in stage 1
t3 = "smart grid*"
t4 = "smart cit*"
t5 = "smart home*"
t6 = "smart build*"
Table 1 shows the results of the search equivalent to Eq. 2. The total scientific production found was 61,662 publications. The largest number was found in IEEE Xplore, while the lowest was found in WoS.
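As an illustration, the four terms listed above can be combined into a single boolean query string. This is a minimal sketch that assumes the terms are joined with OR, which matches the stated aim of covering the whole sample space; the helper name `build_or_query` is hypothetical, and the field codes and exact syntax of each database are not shown.

```python
# Search terms for stage S1, as listed in the text (t3..t6).
STAGE_1_TERMS = {
    "t3": '"smart grid*"',
    "t4": '"smart cit*"',
    "t5": '"smart home*"',
    "t6": '"smart build*"',
}


def build_or_query(terms):
    """Join quoted, wildcarded terms with OR (assumed combination)."""
    return "(" + " OR ".join(terms) + ")"


stage_1_query = build_or_query(STAGE_1_TERMS.values())
print(stage_1_query)
```

The trailing asterisk is the truncation wildcard, so a term such as "smart cit*" is intended to match both "smart city" and "smart cities".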

Stage S2
Step S2(a) applied the search of Eq. 3, which is equivalent to Eq. 4 without the parameters y and lang. The resulting values are presented in Table 2. The results indicate a significant reduction in the number of publications found after the terms t1 and t2 were added.

Stage S3
Next, the parameters y and lang were added to the searches, resulting in the values shown in Table 3. The search terms used in step S3(a) are presented in Eq. 4, where y is the year of publication and lang is the language. In this case, Eng stands for English. Comparing Tables 2 and 3, there is a total reduction of 91 publications to be analyzed.
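The effect of the parameters y and lang can be sketched as a filter over bibliographic records. The record layout, the function name `apply_parameters` and the year window used in the call are assumptions made for illustration; the exact window used in the searches is not restated here.

```python
def apply_parameters(records, first_year, last_year, lang="English"):
    """Keep records published within [first_year, last_year] in the given language."""
    return [
        r for r in records
        if first_year <= r["year"] <= last_year and r["language"] == lang
    ]


# Toy records (hypothetical, for illustration only).
records = [
    {"title": "A", "year": 2014, "language": "English"},
    {"title": "B", "year": 2001, "language": "English"},
    {"title": "C", "year": 2015, "language": "German"},
]
# The window 2006-2016 is an assumed example of a ten-year span.
kept = apply_parameters(records, first_year=2006, last_year=2016)
print([r["title"] for r in kept])
```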

Stage S4
After this step, the files were imported into the EndNote reference manager. Analysis of the 874 records from the three databases revealed 301 inconsistencies. These inconsistencies consisted of missing titles, author names or abstracts, or of documents duplicated among the databases. That is, the same document was indexed in more than one database and therefore appeared two or three times.
Step S4(a) (Eq. 5) took into consideration the author names, the paper title and the publisher. It resulted in 573 papers. Fig. 2(a) shows the number of scientific works found, grouped by year of publication. The year 2016 presents a significant drop in the number of publications. This is due, in large part, to the fact that this paper was finalized at the beginning of 2016. Fig. 2(b) shows the percentage of each type of document found. More than half (62%) of these documents are Conference Proceedings, followed by Articles (30%), Book Sections (7%) and Books (1%). Fig. 5(a) represents the documents by subject area. The largest shares are from the Computer Science (29%) and Engineering (25%) areas. Although it represents a significant 23%, the category "others" includes many other areas of knowledge which individually account for 1% or less.
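The two kinds of inconsistency described above, incomplete records and documents indexed in more than one database, can be sketched as follows. This is not the EndNote procedure itself: the record layout, the function name `clean_records` and the lowercase normalization of the comparison key are assumptions for illustration.

```python
def clean_records(records):
    """Drop incomplete records, then keep one record per (authors, title, publisher)."""
    seen = set()
    kept = []
    for r in records:
        # Inconsistency type 1: missing title, author names or abstract.
        if not (r.get("title") and r.get("authors") and r.get("abstract")):
            continue
        # Inconsistency type 2: the same document indexed by several databases.
        key = (
            tuple(a.strip().lower() for a in r["authors"]),
            r["title"].strip().lower(),
            (r.get("publisher") or "").strip().lower(),
        )
        if key in seen:
            continue
        seen.add(key)
        kept.append(r)
    return kept


# Toy records (hypothetical, for illustration only).
records = [
    {"authors": ["Doe, J."], "title": "Heating control", "publisher": "X",
     "abstract": "text", "source": "Scopus"},
    {"authors": ["Doe, J."], "title": "Heating Control ", "publisher": "X",
     "abstract": "text", "source": "WoS"},
    {"authors": ["Roe, A."], "title": "Grid study", "publisher": "Y",
     "abstract": "", "source": "IEEEx"},
    {"authors": ["Poe, B."], "title": "Home sensors", "publisher": "Z",
     "abstract": "text", "source": "IEEEx"},
]
cleaned = clean_records(records)
print(len(cleaned))
```

In the review itself, this step reduced 874 records to 573 (874 - 301).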

Stage S5
The next stage of this study consists of adding a new term, t7 = heat, to the search (Eq. 6). This term is added because the study intends to identify research related to KEM in thermal studies, specifically heated environments. Step S5(a) resulted in a total of 48 papers. The abstracts were then read to check the correspondence between the publications found and the purpose of this paper. The words and phrases that might identify works involved with the KEM research area were chosen based on literature relating to the area, Bovo (2011), Ppgegc (2014).
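Restricting a record set to the term t7 can be sketched as a metadata match over title, abstract and keywords. The sketch assumes a case-insensitive substring match, so that "heating" also matches; the matching rules actually applied by each database may differ, and the function name `mentions_term` is hypothetical.

```python
def mentions_term(record, term="heat"):
    """Return True if the term occurs in the title, abstract or keywords."""
    fields = (
        record.get("title", ""),
        record.get("abstract", ""),
        " ".join(record.get("keywords", [])),
    )
    return any(term in field.lower() for field in fields)


# Toy records (hypothetical, for illustration only).
records = [
    {"title": "Smart building control", "abstract": "Heating schedules are learned.",
     "keywords": ["HVAC"]},
    {"title": "Smart grid pricing", "abstract": "Tariff design.",
     "keywords": ["demand response"]},
]
matching = [r for r in records if mentions_term(r)]
print(len(matching))
```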

Stage S6
After the reading analysis, 5 publications were eliminated in step S6(a), as presented by Eq. 7. The criterion used in the readings was to identify whether the abstracts showed evidence of scientific research related to the topics, and more specifically to the KEM and thermal studies areas.

Stage S7
For the 43 papers selected by reading the abstracts, the files were searched for in electronic format. Only 37 papers could be downloaded from the Internet (S7(b)). This occurred because there was trouble in downloading or because the files were not in a readable digital format. Step S7(a) can be seen in Eq. 8.
Table 4 and Table 5 show the specific IT or KEM processes found in step S7(b). In stage S7 the same main author appears in Martirano et al. (2014a) and Martirano et al. (2014b), as well as in Fernandes et al. (2012) and Fernandes et al. (2015). A summary analysis from the complete reading of these 37 papers (step S8(b)) can be seen in Table 5. In these tables, the relationship between the references and a summary of the text classification performed can be verified. The columns Explicit IT and Explicit KEM indicate that an explicit relationship between the work and the IT or KEM processes was found in the reading. The columns Fit IT and Fit KEM relate to the adequacy of the classification, made by these authors, to the themes IT or KEM. The column Keep holds the final analysis criterion for the 37 papers: whether or not they are kept for the next stage S8. The tables also present the methodological approach proposed by each of the 37 papers.
The approaches found are Case study, Concept, Methodology, Model, Short Paper and Survey or Review.
The last column is related to the Context in which the paper is proposed: Smart Buildings or Homes (SB), Smart Cities (SC) or Smart Grids (SG).

Stage S8
Stage S8 is the full-text reading of the papers found in stage S7.
Step S8(a) can be represented by Eq. 9. Stage S8 results in 18 papers. Fig. 4(a) presents a word cloud of the titles of those 18 papers. The most used words are: Smart, Energy, System and Management.
Only in the abstract word cloud (Fig. 4(c)) is it possible to clearly identify the word Knowledge. The most used words are: Energy, Systems, Smart, Management, Algorithm and Knowledge.
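The word frequencies behind a word cloud can be obtained with a simple count. The sketch below is an assumption about how such counts might be produced, not the tool used in this study; the stop-word list and the example titles are hypothetical.

```python
from collections import Counter
import re

# Small illustrative stop-word list (assumption).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def word_frequencies(texts):
    """Count lowercase alphabetic words across texts, ignoring stop words."""
    words = []
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                words.append(word)
    return Counter(words)


# Hypothetical titles, for illustration only.
titles = [
    "Smart energy management system for buildings",
    "Knowledge based energy system",
    "Smart heating management",
]
freq = word_frequencies(titles)
print(freq.most_common(4))
```

Note that this simple count treats "System" and "Systems" as different words; merging them would require stemming or lemmatization.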
The methodological approaches used are represented in Fig. 5(a): 33% of the papers propose a new Methodology, 28% a Case Study, 22% a Model and 17% are Not Classified. The last category is due to an undeclared methodology or because a classification by these authors was not possible.
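These percentages are consistent with whole-number counts over the 18 papers. The counts used below are not stated in the text; they are inferred here as the only integers out of 18 that round to the reported shares, and should be read as an assumption.

```python
# Inferred counts out of 18 papers (assumption, derived from the reported percentages).
counts = {"Methodology": 6, "Case Study": 5, "Model": 4, "Not Classified": 3}

total = sum(counts.values())
shares = {name: round(100 * n / total) for name, n in counts.items()}
print(total, shares)
```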
Fig. 5(b) presents the percentage of each Context classification among the 18 papers. It shows that the majority, 56%, are classified as Smart Buildings or Smart Homes studies. This percentage is followed by 39% in the Smart Grids context and the remaining 5% in the Smart Cities context. Concerning the technological approach, Fig. 6(b) presents a graph relating Frequency to the IT approaches identified in S8(b). The most used are Machine Learning and the proposition of the authors' own algorithm. This information was obtained by reading the explicit declaration of the process used.

Content Description
A resource allocation problem is studied by Anders et al. (2013). This work proposes power plant management that balances energy production and consumption. The authors use a multi-agent IT method to solve a dynamic allocation problem. That work has no explicit fit with a KEM process, but it was categorized as Fit KEM because elements were found in the text demonstrating that it uses old knowledge and untrusted knowledge, as cited by the authors. Chen and Cheng (2014) present a paper on power management for a residential user in a renewable energy production context. It shows a study using a photovoltaic inverter and a battery. The authors propose their own algorithm, based on Lyapunov optimization, for residential users in the Smart Grid context. Even without an explicit KEM process, it demonstrates knowledge reuse throughout its development.
The same main author appears in Fernandes et al. (2012) and Fernandes et al. (2015). In the first, a Supervisory Control and Data Acquisition (SCADA) based system is proposed in order to optimize heat, power and consumption. This algorithm works on Data Acquisition, a process which is frequently used in KEM. The paper also includes a case study using real data. The second proposes a Heating, Ventilation and Air Conditioning (HVAC) management methodology. It is placed in a Smart Building context and has an explicit relationship with a KEM process. This work takes into account the user knowledge and other external parameters such as temperatures. An experimental demand response service implementation for Smart Buildings is presented by Hong et al. (2014). New hardware is proposed in order to control the heating devices. In that work it is possible to find the relationship between KEM and IT, where the authors use the expression "user expected price-based". They also refer, in the development part, to the "user knowledge". In this study, laboratory simulations are performed and their results are presented. Another energy service is proposed by Huang et al. (2013), focused on the disaggregation of heating usage. A Machine Learning method called Non-Intrusive Load Monitoring is used. Those authors cite knowledge transformation as their KEM process. A disaggregation algorithm estimates heating usage, and these characteristics are used to predict the heat power.
An occupancy estimation is proposed by Javed et al. (2015). It is an experimental test of a random neural network. This paper works with a smart controller using a single-zone test, and so in a Smart Building context. That work uses prior knowledge in its methodology and an artificial neural network algorithm as its IT approach. Keshtkar and Arzanpour (2014) and Martirano et al. (2014b) use fuzzy logic systems as the IT approach to solve heating problems. Both work in the Smart Buildings context. Sensors and real-time information are used in Keshtkar and Arzanpour (2014), while Martirano et al. (2014b) works with sensitivity analysis and with a "building knowledge" purpose. In the fuzzy logic rules, Keshtkar and Arzanpour (2014) used a natural framework to include expert knowledge in the form of linguistic rules.
Ontologies are used by Kofler and Kastner (2010). The authors use a knowledge representation and reuse process. The work is presented in a Smart Building context and an algorithm is also proposed. That paper suggests the use of building information models as the source and semantic web technologies as the store of building data. A comparison between the conventional grid and the Smart Grid is presented by Mulyono (2015). This work proposes a two-step methodology using game theory to model electricity usage behavior and mutual support interaction in a smart grid system. The authors show a case study in which constant user interaction is used.
An online learning algorithm is proposed by Pardo et al. (2015). That work uses low-cost wireless sensor network nodes and an artificial neural network back-propagation algorithm. Even though the authors declare that the algorithm does not need prior knowledge, it was classified as Fit KEM because it uses user knowledge and generates new knowledge by learning. An algorithm for energy management in SH using wireless sensors and artificial intelligence is proposed by Qela and Mouftah (2012). The authors call this process the Observe, Learn and Adapt (OLA) method. The proposed algorithm uses adaptable learning system concepts and integrates wireless sensors with artificial intelligence concepts. Knowledge Base and Adaptive Learning are cited in this work.
A contribution for SH called advanced data acquisition and analysis for improved sustainability and decision making is described by Rowley et al. (2013). In this paper a "SmartNet" is developed, and acquisition and knowledge sharing methods are used. She et al. (2013) compare real parameters with parameters obtained from the manufacturer using a system identification method. These parameters are used in a multi-objective control for a fuel cell system. Performance and stability are theoretically guaranteed by a Lyapunov-based proof. Two types of controller are developed. Although a knowledge base of parameters is used, its relationship with KEM is not clear.
Knowledge extraction is used by Viegas et al. (2015), who propose an electricity demand profile prediction based on household characteristics. The proposed model is a methodology for predicting the typical daily load profile of electricity usage based on static data obtained from surveys. It uses the k-means clustering algorithm. This paper has a clear relationship with IT and KEM because it uses data mining and machine learning as IT approaches and knowledge extraction and classification as KEM processes. Wastell et al. (2006) conduct a microworld study of domestic information system design. This paper proposes a GUI and focuses on the design of domestic heating management systems, specifically on the feedback support required to achieve energy savings. This paper uses design knowledge as its KEM process.
The integration of mobile devices is proposed by Welge et al. (2010) using knowledge-based assistance systems. An assistance system is described that adapts to specific user preferences and is distributed across mobile devices and the building automation system. This work uses embedded methods of knowledge processing and knowledge aggregation. It also shows a strong influence of KEM processes, using a semantic knowledge description language and ontologies.

CONCLUSION
This paper aimed to present, through a systematic literature review, a bibliometric study of KEM contributions in the SE area. This research analyzed 61,662 papers (S1) and finally selected 18 (S8) which proved to be strongly related to the proposed theme.

General Discussions
The funnel in Fig. 7 presents the quantitative results obtained in this research (left side of the funnel) as well as the process used in each step (right side of the funnel), where Sx represents the set of publications S obtained in each step x.
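The stage sizes reported throughout the text can be put together as a quick consistency check of the funnel. In the sketch below, the size of S2 is not quoted directly from the text: it is derived as S3 plus the reduction of 91 publications reported between S2 and S3, and should be read as an assumption. The function name `rejection_rate` is hypothetical.

```python
# Stage sizes as reported in the text; S2 is derived (assumption), see comment.
stage_sizes = {
    "S1": 61662,
    "S2": 874 + 91,   # derived: S3 plus the reported reduction of 91
    "S3": 874,
    "S4": 874 - 301,  # 301 inconsistencies removed
    "S5": 48,
    "S6": 48 - 5,     # 5 publications eliminated after reading abstracts
    "S7": 37,
    "S8": 18,
}


def rejection_rate(before, after):
    """Percentage of publications rejected between two stages, to 2 decimals."""
    return round(100 * (1 - after / before), 2)


# Share of publications rejected between the second and the last stage.
h3_rejection = rejection_rate(stage_sizes["S2"], stage_sizes["S8"])
print(stage_sizes["S4"], stage_sizes["S6"], h3_rejection)
```

Under this derivation the rejection share between S2 and S8 comes out at 98.13%, which agrees with the value used to assess hypothesis h3.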
In stage S7 it was possible to verify that 44.44% of those 37 papers also apply their studies to cooling environments and 33.33% propose a controller acting on the heating devices. About two thirds (66.67%) use a parameter identification method to identify the thermal heating characteristics of the building. Two of them use a gray-box method to study heating parameters.
The following approaches were also identified: Hybrid System, Probability Distribution, System Identification and Electrical-Thermal Analogy. Finally, in stage S8 it was identified that 83.33% of those 18 papers were published in the last 4 years, supporting h2. The Parameter Identification method is used by 33.33%, and 27.78% also cover thermal-cooling studies.
Acquisition and Knowledge Discovery, as well as Elicitation, were the KEM processes most used by the 18 finalist papers. Although the IT approaches are well distributed, it is possible to highlight the use of GUIs in the studies as well as the application of Machine Learning algorithms. Among the finalists, 16.67% propose GUIs and 3 of the 18 propose a controller acting on the heating devices. New hardware is presented by 11.11%, and one paper uses Systems Thinking, a method that brings together Computer Science and Engineering Technologies studies in the same field.
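The percentages reported for the 18 finalist papers can be converted back into paper counts. The conversion below is plain arithmetic on the reported figures; the helper name `share_to_count` is hypothetical, and the resulting counts are derived here rather than quoted from the text.

```python
def share_to_count(percent, total=18):
    """Convert a reported percentage of the finalist papers into a paper count."""
    return round(percent * total / 100)


# Percentages reported for stage S8 (18 papers).
reported = {
    "published in the last 4 years": 83.33,
    "parameter identification": 33.33,
    "thermal-cooling studies": 27.78,
    "GUI proposals": 16.67,
    "new hardware": 11.11,
}
paper_counts = {label: share_to_count(p) for label, p in reported.items()}
print(paper_counts)
```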
Several IT approaches have been used to address issues related to the Smart Energy area. Particular attention is given to Machine Learning methods and Artificial Intelligence algorithms. Based on these approaches, 21% of the authors have proposed their own technological solution (algorithm) to solve the problems they studied. Through the non-syntactic analysis, it was possible to identify in the reading stages (S6 and S8) that most of the papers use advanced technical computing language while, on the other hand, they do not make detailed reference to the KEM techniques used. The authors are more concerned with detailing the technical computational results than with describing the KEM methods used.

Final Analysis and Future Works
The main contribution of this paper is the presentation of a profile of the scientific research involving the KEM and SE areas and their features, particularly thermal heating in the Smart Building context.
The preliminary hypotheses were all checked and all of them were satisfied. h1 was satisfied, since Fig. 1 shows that 29% of the studies in S4 refer to the Computer Science area. h2 was satisfied, as Fig. 2(a) shows that the greater number of publications occurred over the past five years, and also because in stage S8 it was identified that 83.33% of the works were published in the last 4 years. h3 was satisfied because the comparison between S2 and S8 shows that 98.13% of the analyzed publications were rejected between the second and the last stage.
One suggestion for future work is the use of different keywords related to the KEM area in order to improve the precision of the results in the second stage. Moreover, for further bibliometric research, it is advisable to use a local computer system with some sort of data mining in order to streamline the process between steps. Several initiatives can be found using KEM with IT applications to solve problems related to the SE context. The identified KEM processes, however, are often used only to describe the technological approaches and not as a complete process per se. This can represent an important opportunity for SE solutions based entirely on KEM processes.
A closer look, from the KEM point of view, at the reality of household users who consume electricity daily can significantly support the decisions of the algorithms and other approaches used by IT. Throughout the research it was verified that the contribution of domestic users to the decision-making process of the algorithms makes the resource saving process even more efficient.
The combination of household feedback (either automatic or without interaction) with technologies such as the Internet of Things and Machine Learning is a trend pointed out in the research, and it can generate accurate results regarding the reduction in electric power consumption and intelligent electrical systems. These combined approaches may represent promising research gaps. In conclusion, the diversity of existing processes in KEM shows that they can be further explored when applied to Smart Energy studies. As KE comes from AI and has a close connection with Computer Science and Engineering, in-depth studies can be carried out connecting these related areas to the problems of the Energy area.