We propose an incremental change detection method for data center (DC) energy efficiency metrics and consider its application to the power usage effectiveness (PUE) metric. In recent years, there has been an increasing focus on the sustainability of DCs, and PUE plays an important role in evaluating a DC's energy efficiency. Publicly reported PUE values are mostly calculated over a whole year, as there are many fluctuations caused by outside influences such as outdoor air temperature (OAT). In this paper, we propose a method to detect short-term changes in DC energy efficiency (e.g., PUE), while accounting for outside influences (e.g., OAT), by observing related daily aggregated DC data. We also conduct a few preliminary experiments for PUE change detection based on real-world DC data, where we have manually labeled changes in the PUE using visualization tools. The experimental results show that the method can detect important major and minor changes in the PUE with a very low false positive rate. However, due to the small number of positive labels, the recall rate is currently between 57% and 70%. Further investigation is necessary to see how representative the current recall rates are and what kind of improvements are necessary to make the change detection method more stable.
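For context, PUE is defined as total facility energy divided by IT equipment energy, so values close to 1.0 indicate high efficiency. The following minimal sketch is an illustration of the general idea only, not the authors' method: it flags days whose OAT-adjusted daily PUE deviates strongly from a trailing window. All function and variable names are hypothetical.

    # Minimal sketch (not the authors' method): detect shifts in daily PUE after
    # removing the influence of outdoor air temperature (OAT) with a linear model.
    import numpy as np

    def pue(total_facility_kwh, it_equipment_kwh):
        # PUE = total facility energy / IT equipment energy (>= 1.0 by definition)
        return total_facility_kwh / it_equipment_kwh

    def detect_pue_changes(daily_pue, daily_oat, window=14, threshold=3.0):
        """Flag days whose OAT-adjusted PUE deviates strongly from a trailing window."""
        # Regress PUE on OAT and work with the residuals.
        slope, intercept = np.polyfit(daily_oat, daily_pue, deg=1)
        residuals = daily_pue - (slope * daily_oat + intercept)
        changes = []
        for t in range(window, len(residuals)):
            ref = residuals[t - window:t]
            z = (residuals[t] - ref.mean()) / (ref.std() + 1e-9)
            if abs(z) > threshold:
                changes.append(t)
        return changes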
Performability is the classic metric for performance evaluation of static systems in the presence of failures. Compared to static systems, Self-Adaptive Systems (SASs) are inherently more complex due to their constantly changing nature. Thus, software architects face more complex design decisions, which are preferably evaluated at design time. Model-Based Quality Analysis (MBQA) provides valuable support by putting software architects in a position to make well-founded design decisions about software system quality attributes throughout the development of a system. We claim that combining methods from MBQA with established performability concepts supports software architects in this decision-making process to design effective fault-tolerant adaptation strategies. Our contribution is a model-based approach to evaluate performability-oriented adaptation strategies of SASs at design time. We demonstrate the applicability of our approach with a proof-of-concept.
Cloud-native systems are dynamic in nature as they always have to react to changes in the environment, e.g., how users utilize the system. Self-adaptive cloud-native systems manage those changes by predicting how future environmental changes will impact the system's service level objectives and how the system can subsequently reconfigure to ensure that the service level objectives stay fulfilled. The farther the predictions look into the future, the higher the chance that good reconfigurations can be identified and applied. However, this requires efficient exploration of potential future system states, particularly exploring alternative futures resulting from alternative system reconfiguration. We present in this paper an extension to the Slingshot simulator for Palladio component models to efficiently explore the future state space induced by environmental changes and reconfigurations. The extension creates snapshots of simulation states and reloads them to explore alternatives. We show that Slingshot's event-based publish-subscribe architecture enables us to extend the simulator easily and without changes to the simulator itself.
Microservices is a cloud-native architecture in which a single application is implemented as a collection of small, independent, and loosely-coupled services. This architecture is gaining popularity in the industry as it promises to make applications more scalable and easier to develop and deploy. Nonetheless, adopting this architecture in practice has raised many concerns, particularly regarding the difficulty of diagnosing performance bugs and explaining abnormal software behaviour. Fortunately, many tools based on distributed tracing were proposed to achieve observability in microservice-oriented systems and address these concerns (e.g., Jaeger). Distributed tracing is a method for tracking user requests as they flow between services. While these tools can identify slow services and detect latency-related problems, they mostly fail to pinpoint the root causes of these issues.
This paper presents a new approach for enacting cross-layer tracing of microservice-based applications. It also proposes a framework for annotating traces generated by most distributed tracing tools with relevant tracing data and metrics collected from the kernel. The information added to the traces aims at helping the practitioner get a clear insight into the operations of the application executing user requests. The framework we present is notably efficient in diagnosing the causes of long tail latencies. Unlike other solutions, our approach for annotating traces is completely transparent as it does not require the modification of the application, the tracer, or the operating system. Furthermore, our evaluation shows that this approach incurs low overhead costs.
Runtime data of software systems is often of multivariate nature, describing different aspects of performance among other characteristics, and it evolves across versions or changes depending on the execution context. This poses a challenge for visualizations, which are typically only two- or three-dimensional. Using dimensionality reduction, we project the multivariate runtime data to 2D and visualize the result in a scatter plot. To show changes over time, we apply the projection to multiple timestamps and connect temporally adjacent points to form trajectories. This allows for cluster and outlier detection, analysis of co-evolution, and finding temporal patterns. While projected temporal trajectories have been applied in other domains before, we use them to visualize software evolution and execution context changes as evolution paths. We experiment with and report results of two application examples: (I) the runtime evolution along different versions of components from the Apache Commons project, and (II) a benchmark suite from scientific visualization comparing different rendering techniques along camera paths.
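The following sketch illustrates the general idea with PCA as one possible dimensionality reduction (the abstract does not prescribe a particular technique); the data shape and component labels are hypothetical.

    # Minimal sketch (illustrative only): project multivariate runtime metrics to 2D
    # with PCA and connect temporally adjacent points into evolution trajectories.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    # Hypothetical data: 3 components/configurations x 10 versions x 5 metrics.
    runs = rng.normal(size=(3, 10, 5)).cumsum(axis=1)

    pca = PCA(n_components=2)
    flat = runs.reshape(-1, runs.shape[-1])
    xy = pca.fit_transform(flat).reshape(runs.shape[0], runs.shape[1], 2)

    for i, path in enumerate(xy):
        plt.plot(path[:, 0], path[:, 1], marker="o", label=f"component {i}")
    plt.legend()
    plt.title("Evolution paths of projected runtime metrics")
    plt.show()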
The performance of distributed applications implemented using a microservice architecture depends heavily on the configuration of various parameters, which are hard to tune due to the large configuration search space and the inter-dependence of parameters. While the information in product manuals and technical documents guides the tuning process, manual collection of meta-data for all application parameters is laborious and not scalable. Prior works have largely overlooked the automated use of product manuals, technical documents, and source code for extracting such meta-data. In the current work, we propose using large language models for automated meta-data extraction and for enhancing the configuration tuning pipeline. We further ideate on building an in-house knowledge system that uses experimental data to learn important parameters in configuration tuning, drawing on historical data on parameter dependence, workload statistics, performance metrics, and resource utilization. We expect that productionizing the proposed system will reduce the total time and experimental iterations required for configuration tuning of new applications, saving an organization both developer time and money.
Serverless platforms have exploded in popularity in recent years, but, today, these platforms are still unsuitable for large classes of applications. They perform well for batch-oriented workloads that perform coarse transformations over data asynchronously, but their lack of clear service level agreements (SLAs), high per-invocation overheads, and interference make deploying online applications with stringent response time demands impractical.
Our assertion is that beyond the glaring issues like cold start costs, a more fundamental shift is needed in how serverless function invocations are provisioned and scheduled in order to support these more demanding applications. Specifically, we propose a platform that leverages the observability and predictability of serverless functions to enforce multi-resource fairness. We explain why we believe interference across a spectrum of resources (CPU, network, and storage) contributes to lower resource utilization and poor response times for latency-sensitive and high-fanout serverless application patterns. Finally, we propose a new distributed and hierarchical function scheduling architecture that combines lessons from multi-resource fair scheduling, hierarchical scheduling, batch-analytics resource scheduling, and statistics to create an approach that we believe will enable tighter SLAs on serverless platforms than has been possible in the past.
SPEC benchmarks have been crucial contributors to the improvement of server efficiency since 2007, given their role in making the power consumption and efficiency of servers transparent to government regulators, customers, and the manufacturers themselves.
As the IT landscape experiences radical transformations, efficiency benchmarks need to be updated accordingly to generate results relevant to government regulators, manufacturers, and customers. In this paper, we outline current challenges efficiency benchmark developers are tackling and highlight recent technological developments the next generation of efficiency benchmarks should take into account.
Data centers are the backbone of our digital society, used by industry, academic researchers, public institutions, and others. To manage resources, data centers make use of sophisticated schedulers. Each scheduler offers a different set of capabilities, and users make use of them through the APIs they offer. However, there is no clear understanding of which programming abstractions they offer, nor why they offer some and not others. Consequently, it is difficult to understand the differences between them and the performance costs imposed by their APIs. In this work, we study the programming abstractions offered by industrial schedulers, their shortcomings, and the performance costs of these shortcomings. We propose a general reference architecture for scheduler programming abstractions. Specifically, we analyze the programming abstractions of five popular industrial schedulers, we analyze the differences in their APIs, we identify the missing abstractions, and finally, we carry out an exemplary experiment to demonstrate that schedulers sacrifice performance by under-implementing programming abstractions. In the experiments, we demonstrate that an API extension can improve task runtime by up to 23%. This work allows scheduler designers to identify shortcomings and points of improvement in their APIs, but most importantly, it provides a reference architecture for existing and future schedulers.
Systematic testing of software performance during development is a persistent challenge, made increasingly important by the magnifying effect of mass software deployment on any savings. In practice, such systematic performance evaluation requires a combination of an efficient and reliable measurement procedure integrated into the development environment, coupled with an automated evaluation of the measurement results and compact reporting of detected performance anomalies.
A realistic evaluation of research contributions to systematic software performance testing can benefit from the availability of measurement data that comes from long term development activities in a well documented context. This paper presents a data artifact that aggregates more than 70 machine time years of performance measurements over 7 years of development of the GraalVM Compiler Project, aiming to reduce the costs of evaluating research contributions in this and similar contexts.
Developers often use microbenchmarking tools to evaluate the performance of a Java program. These tools run a small section of code multiple times and measure its performance. However, this process can be problematic, as Java execution is traditionally divided into two stages: a warmup stage, where the JVM's JIT compiler optimizes frequently used code, and a steady stage, where performance is stable. Measuring performance before reaching the steady stage can give an inaccurate representation of the program's efficiency. The challenge lies in determining when a program should be considered to be in a steady state. In this paper, we propose that call stack sampling data should be considered when conducting steady-state performance evaluations. By analyzing this data, we can generate call graphs for individual microbenchmark executions. Our proposed method of using call stack sampling data and visualizing call graphs intuitively empowers developers to effectively distinguish between warmup and steady-state executions. Additionally, by utilizing machine learning classification techniques, this method can automate steady-state detection, working towards a more accurate and efficient performance evaluation process.
As Java microbenchmarks are well suited to profiling the performance of essential code elements, they are widely adopted for Java performance testing. Performance testing with Java microbenchmarks is composed of two phases: the warmup phase and the steady phase. Usually, results from the warmup phase are discarded because of the highly fluctuating performance caused by the Java Virtual Machine's optimizations. The performance results collected during the steady phase are used for performance evaluation, as they are assumed to be more stable. However, according to our study, severe performance fluctuations also occur during the steady phase, which leads to long tail latencies. Long tail latencies constitute a major problem in modern Java systems (and, of course, in other systems and applications), as they hurt the user experience by prolonging the overall execution time.
In this paper, we extensively evaluated the long tail performance of 586 Java microbenchmarks from 30 Java systems. The evaluation results show that, for 38% of the benchmarks in the steady phase, the 99th-percentile execution times are more than 30% higher than the median execution times. In the worst case, the 99th-percentile execution time is 659 times higher than the median. Furthermore, the 95th-percentile execution times are more than 30% higher than the median execution times for 11% of the steady-phase benchmarks.
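Figures of this kind can be derived from raw per-invocation execution times by comparing upper percentiles against the median, as in this minimal sketch (the data here is synthetic; the 30% threshold mirrors the one mentioned above):

    # Minimal sketch: quantify long-tail behaviour of a steady-phase benchmark by
    # comparing upper percentiles of execution time against the median.
    import numpy as np

    def tail_ratios(exec_times_ns):
        median = np.percentile(exec_times_ns, 50)
        p95 = np.percentile(exec_times_ns, 95)
        p99 = np.percentile(exec_times_ns, 99)
        # A benchmark would be flagged as long-tailed here if, e.g., p99 exceeds
        # the median by more than 30%.
        return {"p95/median": p95 / median, "p99/median": p99 / median}

    samples = np.random.lognormal(mean=0.0, sigma=0.5, size=100_000)  # hypothetical data
    print(tail_ratios(samples))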
This paper compares the performance of R, Python, and Rust in the context of data processing tasks. A real-world data processing task in the form of an aggregation of benchmark measurement results was implemented in each language, and their execution times were measured. The results indicate that while all languages can perform the tasks effectively, there are significant differences in performance. Even the same code showed significant runtime differences depending on the interpreter used for execution. Rust and Python were the most efficient, with R requiring much longer execution times. Additionally, the paper discusses the potential implications of these findings for data scientists and developers when choosing a language for data processing projects.
Source code analysis is an important aspect of software development that provides insight into a program's quality, security, and performance. There are few methods for consistently predicting or determining when a written piece of code will end its warm-up state and proceed to a steady state. In this study, we use the data gathered by the SEALABQualityGroup at the University of L'Aquila and Charles University and extend their research on steady-state analysis to determine whether certain source code features could provide a basis for developers to make more informed predictions about when a steady state will occur. We explore whether there is a direct correlation between source code features and both the time and the ability of a Java microbenchmark to reach a steady state, in order to build a machine learning-based approach for steady-state prediction. We found that the correlation between source code features and the probability of reaching a steady state goes as high as 10.9% for Pearson's correlation coefficient, whereas the correlation between source code features and the time it takes to reach a steady state goes as high as 21.6% for Spearman's correlation coefficient. Our results also show that a K-Nearest-Neighbour classifier with features selected with either Spearman's or Kendall's correlation coefficient achieves an accuracy of 78.6%.
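A minimal sketch of this kind of analysis, assuming a hypothetical feature table (file and column names are illustrative): correlate static source-code features with steady-state outcomes and train a K-Nearest-Neighbour classifier on the most strongly correlated features.

    # Minimal sketch (feature names hypothetical): correlate static source-code
    # features with steady-state behaviour and train a KNN classifier on the
    # features that correlate most strongly.
    import pandas as pd
    from scipy.stats import pearsonr, spearmanr
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv("benchmark_features.csv")   # hypothetical file
    features = ["loc", "num_calls", "num_loops", "num_allocations"]
    y = df["reaches_steady_state"]               # 0/1 label

    for f in features:
        print(f, pearsonr(df[f], y)[0], spearmanr(df[f], df["time_to_steady_s"])[0])

    selected = ["num_calls", "loc"]              # e.g., the top-correlated features
    knn = KNeighborsClassifier(n_neighbors=5)
    print(cross_val_score(knn, df[selected], y, cv=10).mean())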
Stable and repeatable measurements are essential for comparing the performance of different systems or applications, and benchmarks are used to ensure accuracy and replication. However, if the corresponding measurements are not stable and repeatable, wrong conclusions can be drawn. To facilitate the task of determining whether the measurements are similar, we used a data set of 586 micro-benchmarks to (i) analyze the data set itself, (ii) examine our previous approach, and (iii) propose and evaluate a heuristic. To evaluate the different approaches, we perform a peer review to assess the dissimilarity of the benchmark runs. Our results show that this task is challenging even for humans and that our heuristic exhibits a sensitivity of 92%.
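Sensitivity here is the fraction of benchmark-run pairs judged dissimilar by the reviewers that the heuristic also flags, i.e., TP / (TP + FN); a toy sketch with hypothetical labels:

    # Minimal sketch (labels hypothetical): sensitivity of the dissimilarity
    # heuristic against peer-reviewed labels, i.e., TP / (TP + FN).
    def sensitivity(heuristic_flags, reviewer_flags):
        tp = sum(h and r for h, r in zip(heuristic_flags, reviewer_flags))
        fn = sum((not h) and r for h, r in zip(heuristic_flags, reviewer_flags))
        return tp / (tp + fn)

    print(sensitivity([1, 1, 0, 1], [1, 1, 1, 0]))  # -> 0.667 on these toy labels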
The practice of microbenchmarking is very important for observing the performance of code. As such, observing the states and anomalies experienced by the program during a benchmark is equally important. This paper evaluates the effectiveness of the matrix profile method when applied to analysing JMH benchmarks in time-series format, to determine whether it is a viable alternative to proven methods. We observe that, when using the matrix profile method, there is a statistically significant difference between the results of the analysis on steady-state and non-steady-state benchmarks. By comparing the results of the matrix profile method and the proven changepoint analysis method, we show a stronger correlation between the two when the benchmark tested is non-steady state than when it is steady state.
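A minimal sketch of how a matrix profile can be computed for a benchmark time series, here using the stumpy library (one possible implementation, not necessarily the one used in the paper); the file name and window length are illustrative.

    # Minimal sketch: compute the matrix profile of a JMH benchmark time series
    # with the stumpy library and inspect its largest discord (most anomalous
    # subsequence). The window length m is an illustrative choice.
    import numpy as np
    import stumpy

    exec_times = np.loadtxt("benchmark_timeseries.txt")  # hypothetical file, one value per invocation
    m = 50                                               # subsequence (window) length
    mp = stumpy.stump(exec_times, m)                     # column 0 holds the matrix profile

    discord_idx = int(np.argmax(mp[:, 0]))
    print(f"Most anomalous window starts at invocation {discord_idx}, "
          f"profile value {mp[discord_idx, 0]:.3f}")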
Microbenchmarking is a widely used method for evaluating the performance of a piece of code. However, the results of microbenchmarks for applications that utilize the Java Virtual Machine (JVM) are often unstable during the initial phase of execution, known as the warmup phase. This is due to the JVM's just-in-time compiler optimization, which identifies and compiles a "hot set" of important code regions. In this study, we examine the static features of 586 microbenchmarks from 30 Java applications. To do so, we first extract static source code features of the benchmarks and then employ manual and descriptive data mining methods to identify meaningful correlations between these static features and the benchmarks' ability to reach a steady state. Our findings indicate that the number of function calls and the number of lines of code have a considerable influence on whether or not the microbenchmarks reach a steady state.
While systemic failure/overload in recoverable networks with load redistribution is a common phenomenon, the current ability to evaluate, let alone mitigate, the corresponding systemic risk is vastly insufficient due to the complexity of the problem and the reliance on oversimplified models. The framework for systemic risk evaluation proposed in this paper relies on an approximate dimension reduction at the onset of systemic failure. Assuming a general failure/recovery microscopic model, the macro-level system dynamics is approximated by a 2-state Markov process alternating between systemically operational and failed states.
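In generic textbook form (not necessarily the paper's exact parametrization), such a two-state alternating Markov approximation with aggregate failure rate λ and recovery rate μ is characterized by:

    % Generic two-state (operational O / failed F) continuous-time Markov model
    % with aggregate failure rate \lambda and recovery rate \mu.
    \[
    Q = \begin{pmatrix} -\lambda & \lambda \\ \mu & -\mu \end{pmatrix},
    \qquad
    \pi_O = \frac{\mu}{\lambda + \mu},
    \qquad
    \pi_F = \frac{\lambda}{\lambda + \mu},
    \]
    \[
    P(\text{failed at time } t \mid \text{operational at } 0)
      = \frac{\lambda}{\lambda + \mu}\left(1 - e^{-(\lambda + \mu)t}\right).
    \]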
Advances in digital twin technology are creating value for many companies. We look at digital twin design and operation from a sustainability perspective. We identify some challenges related to a digital twin's sustainable design and operation. Finally, we look at some possible approaches, grounded in multi-paradigm modelling, to help us create and deploy more sustainable twins.
In this paper, we present PerfoRT, a tool to ease software performance regression measurement of Java systems. Its main characteristics include: minimal configuration to ease automation and hide complexity from the end user; a broad scope of performance metrics covering system, process, JVM, and tracing data; and presentation of the results from a developer's perspective. We show some of its features in a usage example based on the Apache Commons BCEL project.
This tutorial will introduce participants to the Score-P measurement system and the Vampir trace visualization tool for performance analysis. We will provide examples and hands-on exercises covering the full performance engineering workflow cycle on applications that include MPI, OpenMP, and GPU parallelism. Users will learn the following concepts: 1. How to collect an initial profile of their code with Score-P. 2. How to evaluate that profile and its associated measurement overhead. 3. The concepts of scoring a profile and filtering a measurement. 4. How to control the Score-P measurement system via environment variables. 5. How to collect useful traces with acceptable overhead. 6. How to understand trace visualization in Vampir.
While many developers put a lot of effort into optimizing large-scale parallelism, they often neglect the importance of an efficient serial code. Even worse, slow serial code tends to scale very well, hiding the fact that resources are wasted because no definite hardware performance limit ("bottleneck") is exhausted. This tutorial conveys the required knowledge to develop a thorough understanding of the interactions between software and hardware on the level of a single CPU core and the lowest memory hierarchy level (the L1 cache). We introduce general out-of-order core architectures and their typical performance bottlenecks using modern x86-64 (Intel Ice Lake) and ARM (Fujitsu A64FX) processors as examples. We then go into detail about x86 and AArch64 assembly code, specifically including vectorization (SIMD), pipeline utilization, critical paths, and loop-carried dependencies. We also demonstrate performance analysis and performance engineering using the Open-Source Architecture Code Analyzer (OSACA) in combination with a dedicated instance of the well-known Compiler Explorer. Various hands-on exercises allow attendees to make their own experiments and measurements and identify in-core performance bottlenecks. Furthermore, we show real-life use cases to emphasize how profitable in-core performance engineering can be.
We are pleased to welcome you to the 2023 ACM Workshop on Artificial Intelligence for Performance Modeling, Prediction, and Control - AIPerf'23.
In its first edition, AIPerf intends to foster the usage of AI (such as probabilistic methods, machine learning, and deep learning) to control, model, and predict the performance of computer systems. The relevance of these topics reflects current and future trends toward exploiting AI-based approaches to deal with complex, large, and interconnected systems. Although AI and ML are widely adopted techniques in several mainstream domains, their usage for performance modeling and evaluation is still limited, and their benefit to the performance engineering field remains unclear. AIPerf proposes a meeting venue to promote the dissemination of research works that use or study AI techniques for quantitative analysis of modern ICT systems and to engage academics and practitioners of this field. The workshop focuses on presenting experiences and results of applying AI/ML-based techniques to performance-related problems, as well as on sharing performance datasets and benchmarks with the community to facilitate the development of new and more accurate learning procedures.
Putting together AIPerf'23 was a team effort. We first thank the authors for providing the content of the program. We are grateful to the program committee and the senior program committee, who worked very hard to review papers and provide feedback to the authors. Finally, we thank the ICPE'23 organizers for sponsoring AIPerf within their community.
We hope that you will find this program interesting and thought-provoking and that the symposium will provide you with a valuable opportunity to share ideas with other researchers and practitioners from institutions around the world.
Optimizing the performance of complex systems has always been a central issue for the control theory community. However, ideas and tools from this field often require very precise assumptions and extensive tuning to perform well, making them unsuited for a non-specialist practitioner.
In recent times, however, the influx of the machine learning community has brought a wave of renewal in the field, making many of these powerful methods finally applicable outside academic examples.
In this talk, I will discuss my journey at the border between control theory and machine learning, from classical system identification and model-based control to modern autotuning data-driven techniques. I will also shed light on how this novel generation of much more user-friendly techniques can easily be applied to improve the performance of a large class of systems, including software ones.
Modern distributed systems can benefit from the availability of large-scale and heterogeneous computing infrastructures. However, the complexity and dynamic nature of these environments also call for self-adaptation abilities, as guaranteeing efficient resource usage and acceptable service levels through static configurations is very difficult.
In this talk, we discuss a hierarchical auto-scaling approach for distributed applications, where application-level managers steer the overall process by supervising component-level adaptation managers. Following a bottom-up approach, we first discuss how to exploit model-free and model-based reinforcement learning to compute auto-scaling policies for each component. Then, we show how Bayesian optimization can be used to automatically configure the lower-level auto-scalers based on application-level objectives. As a case study, we consider distributed data stream processing applications, which process high-volume data flows in near real-time and cope with varying and unpredictable workloads.
This paper proposes an auto-profiling tool for OSCAR, an open-source platform able to support serverless computing in cloud and edge environments. The tool, named OSCAR-P, is designed to automatically test a specified application workflow on different hardware and node combinations, obtaining relevant information on the execution time of the individual components. It then uses the collected data to build performance models using machine learning, making it possible to predict the performance of the application on unseen configurations. The preliminary evaluation of the performance models' accuracy is promising, showing a mean absolute percentage error for extrapolation lower than 10%.
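Extrapolation error of this kind is typically measured as the mean absolute percentage error (MAPE) between predicted and measured execution times on configurations not seen during training; a minimal sketch with hypothetical values:

    # Minimal sketch: mean absolute percentage error (MAPE) of a performance model
    # on unseen configurations (extrapolation); the numbers below are hypothetical.
    import numpy as np

    def mape(y_true, y_pred):
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

    measured_s  = [12.4, 8.1, 25.9]   # measured execution times on unseen node combinations
    predicted_s = [13.0, 7.6, 24.1]   # model predictions for the same combinations
    print(f"MAPE = {mape(measured_s, predicted_s):.1f}%")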
This paper presents a novel methodology based on first principles of statistics and statistical learning for anomaly detection in industrial processes and IoT environments. We present a 5-level analytical pipeline that cleans, smooths, and eliminates redundancies from the data, and identifies outliers as well as the features that contribute most to these anomalies. We show how smoothing can make our methodology less sensitive to short-lived anomalies that might be due, e.g., to sensor noise. We validate the methodology on a dataset freely available in the literature. Our results show that we can identify all anomalies in the considered dataset, with the ability to control the number of false positives. This work is the result of a research project co-funded by the Tuscany Region and a company that is a leader in the paper and nonwovens sector. Although the methodology was developed for this domain, we consider here a dataset from a different industrial sector. This shows that our methodology can be generalized to other contexts with similar constraints on limited resources, interpretability, time, and budget.
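A minimal sketch in the spirit of the described pipeline (not the authors' exact five levels), combining deduplication, rolling-median smoothing, robust outlier detection, and per-anomaly feature attribution; the window size and threshold are illustrative.

    # Minimal sketch: deduplicate, smooth with a rolling median, flag outliers with
    # a robust (MAD-based) z-score, and rank the feature that contributes most to
    # each anomaly. Parameters are illustrative, not the paper's values.
    import pandas as pd

    def detect_anomalies(df, window=15, z_thresh=3.5):
        df = df.drop_duplicates().interpolate()          # clean / remove redundancies
        smooth = df.rolling(window, center=True, min_periods=1).median()
        mad = (smooth - smooth.median()).abs().median() + 1e-9
        z = 0.6745 * (smooth - smooth.median()) / mad    # robust z-score per feature
        is_anomaly = (z.abs() > z_thresh).any(axis=1)
        top_features = z.abs().idxmax(axis=1)            # feature contributing most
        return is_anomaly, top_features[is_anomaly]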
It is our great pleasure to welcome you to the first FastContinuum Workshop, held on April 16th, 2023. The goal of the workshop is to foster discussion and collaboration among researchers from the cloud/edge/fog computing and performance analysis communities, and to share the relevant topics and results of the current approaches proposed by industry and academia. FastContinuum solicited full papers as well as demo and short papers, including reports on research activities not yet mature enough for a full paper, new ideas, and vision papers.
The final program includes four full papers and three short ones. They cover some of the most interesting areas of computing continua, from FaaS development and acceleration to the management of heterogeneous datasets to the development of Infrastructure as Code and the automation of deployment through the computing continuum. DevSecOps is also brought to the attendees' attention as one of the crucial ingredients for proper management of the continuum.
The workshop keynote, given by Samuel Kounev, further investigates the area of serverless computing, properly positioning the multiple aspects and approaches developed in this area and highlighting the main challenges related to the performance of these approaches. The keynote is held in collaboration with the eleventh International Workshop on Load Testing and Benchmarking of Software Systems (LTB 2023).
Serverless computing and, in particular, Function as a Service (FaaS) have introduced novel computational approaches with highly elastic capabilities, per-millisecond billing, and scale-to-zero behavior, and are thus of interest for the computing continuum. Services such as AWS Lambda allow efficient execution of event-driven, short-lived, bursty applications, even if there are limitations in terms of the amount of memory and the lack of GPU support for accelerated execution. To this end, this paper analyses the suitability of including GPU support in AWS Lambda through the rCUDA middleware, which provides CUDA applications with remote GPU execution capabilities. A reference architecture for data-driven accelerated processing is introduced, based on elastic queues and event-driven object storage systems to manage resource contention and GPU scheduling. The benefits and limitations are assessed through a use case of sequence alignment. The results indicate that, for certain scenarios, the usage of remote GPUs in AWS Lambda represents a viable approach to reducing the execution time.
The ability to split applications across different locations in the continuum (edge/cloud) creates the need to break applications down into smaller and more distributed chunks. In this realm, the Function as a Service approach appears as a significant enabler. The paper presents a visual function and workflow development environment for complex FaaS (Apache OpenWhisk) applications. The environment offers a library of pattern-based and reusable nodes and flows while mitigating function orchestration limitations in the domain. Generation of the deployable artefacts, i.e., the functions, is performed through embedded DevOps pipelines. A range of annotations is available for dictating diverse options, including QoS needs, function or data locality requirements, function affinity considerations, etc. These are propagated to the deployment and operation stacks to support the cloud/edge interplay. The mechanism is evaluated functionally by creating, registering, and executing functions and orchestrating workflows, adapting typical parallelization patterns and an edge data collection process.
Survival analysis studies time-modeling techniques for an event of interest occurring in a population. It has found widespread applications in healthcare, engineering, and the social sciences. However, the data needed to train survival models are often distributed, incomplete, censored, and confidential. In this context, federated learning can be exploited to tremendously improve the quality of the models trained on distributed data while preserving user privacy. However, federated survival analysis is still in its early development, and there is no common benchmarking dataset to test federated survival models. This work provides a novel technique for constructing realistic heterogeneous datasets starting from existing non-federated datasets in a reproducible way. Specifically, we propose two dataset-splitting algorithms based on the Dirichlet distribution to assign each data sample to a carefully chosen client: quantity-skewed splitting and label-skewed splitting. Furthermore, these algorithms allow for obtaining different levels of heterogeneity by changing a single hyperparameter. Finally, numerical experiments provide a quantitative evaluation of the heterogeneity level using log-rank tests and a qualitative analysis of the generated splits. The implementation of the proposed methods is publicly available in favor of reproducibility and to encourage common practices for simulating federated environments for survival analysis.
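A label-skewed Dirichlet split is a common construction in the federated-learning literature; the sketch below illustrates the idea (the paper's algorithms may differ in details), where a smaller concentration parameter alpha yields more heterogeneous clients.

    # Minimal sketch of a label-skewed Dirichlet split (a common construction in
    # the federated-learning literature; not necessarily the paper's exact
    # algorithm). Smaller alpha -> more heterogeneous (skewed) clients.
    import numpy as np

    def label_skewed_split(labels, num_clients, alpha, seed=0):
        rng = np.random.default_rng(seed)
        client_indices = [[] for _ in range(num_clients)]
        for label in np.unique(labels):
            idx = np.flatnonzero(labels == label)
            rng.shuffle(idx)
            proportions = rng.dirichlet(alpha * np.ones(num_clients))
            cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
            for client, part in enumerate(np.split(idx, cuts)):
                client_indices[client].extend(part.tolist())
        return client_indices

    # e.g., split binary event labels of a survival dataset across 5 clients:
    splits = label_skewed_split(np.random.randint(0, 2, size=1000), num_clients=5, alpha=0.5)
    print([len(s) for s in splits])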
As the next generation of diverse workloads like autonomous driving and augmented/virtual reality evolves, computation is shifting from cloud-based services to the edge, leading to the emergence of a cloud-edge compute continuum. This continuum promises a wide spectrum of deployment opportunities for workloads that can leverage the strengths of cloud (scalable infrastructure, high reliability), edge (energy efficient, low latencies), and endpoints (sensing, user-owned). Designing and deploying software in the continuum is complex because of the variety of available hardware, each with unique properties and trade-offs. In practice, developers have limited access to these resources, limiting their ability to create software deployments. To simplify research and development in the compute continuum, in this paper, we propose Continuum, a framework for automated infrastructure deployment and benchmarking that helps researchers and engineers to deploy and test their use cases in a few lines of code. Continuum can automatically deploy a wide variety of emulated infrastructures and networks locally and in the cloud, install software for operating services and resource managers, and deploy and benchmark applications for users with diverse configuration options. In our evaluation, we show how our design covers these requirements, allowing Continuum to be (i) highly flexible, supporting any computing model, (ii) highly configurable, allowing users to alter framework components using an intuitive API, and (iii) highly extendable, allowing users to add support for more infrastructure, applications, and more. Continuum is available at https://github.com/atlarge-research/continuum.
Infrastructure as Code (IaC) is an approach for automating the deployment, maintenance, and monitoring of environments for online services and applications, tasks that developers usually perform manually. The benefit is a reduction not only in time and effort but also in operational costs. This paper describes our experience in applying IaC to cloud-native applications, mainly discussing the key challenges in modeling and generating IaC faced in the ongoing project Programming Trustworthy Infrastructure-As-Code in a Secure Framework (PIACERE). The concluding insights could spur the wider adoption of IaC by software developers.
Over the last few years, DevOps methodologies have promoted a more streamlined operationalization of software components in production environments. Infrastructure as Code (IaC) technologies play a key role in the lifecycle management of applications, as they promote the delivery of the infrastructural elements alongside the application components. This way, IaC technologies aspire to minimize the problems associated with the environment by providing a repeatable and traceable process. However, there are a large variety of IaC frameworks, each of them focusing on a different phase of the operationalization lifecycle, hence the necessity to master numerous technologies. In this research, we present the IaC Execution Manager (IEM), a tool devoted to providing a unified framework for the operationalization of software components that encompasses the various stages and technologies involved in the application lifecycle. We analyze an industrial use case to improve the current approach and conclude the IEM is a suitable tool for solving the problem as it promotes automation, while reducing the learning curve associated with the required IaC technologies.
Security represents one of the crucial concerns when it comes to DevOps methodology-empowered software development and service delivery processes. With the adoption of Infrastructure as Code (IaC), even minor flaws can have fatal consequences, especially in sensitive domains such as healthcare and maritime applications. However, most existing solutions tackle either Static Application Security Testing (SAST) or run-time behavior analysis in isolation. In this paper, we propose a) the IaC Scan Runner, an open-source solution developed in Python for inspecting a variety of state-of-the-art IaC languages at application design time, and b) LOMOS, a run-time anomaly detection tool. Both tools work in synergy and provide a valuable contribution to a DevSecOps tool set. The proposed approach is demonstrated on various case studies showcasing the capabilities of the static analysis tool IaC Scan Runner combined with LOMOS, an artificial intelligence-enabled log analysis framework.
It is our great pleasure to welcome you to the 2023 ACM/SPEC Workshop on Serverless, Extreme-Scale, and Sustainable Graph Processing Systems. This is the first such workshop, aiming to facilitate the exchange of ideas and expertise in the broad field of high-performance large-scale graph processing.
Graphs and GraphSys - The use, interoperability, and analytical exploitation of graph data are essential for modern digital economies. Today, thousands of computational methods (algorithms) and findable, accessible, interoperable, and reusable (FAIR) graph datasets exist. However, current computational capabilities lag when faced with the complex workflows involved in graph processing, the extreme scale of existing graph datasets, and the need to consider sustainability metrics in graph-processing operations. Needs are emerging for graph-processing platforms to provide multilingual information processing and reasoning based on the massive graph representation of extreme data in the form of general graphs, knowledge graphs, and property graphs. Because graph workloads and graph datasets are strongly irregular, and involve one or several big data "Vs" (e.g., volume, velocity, variability, vicissitude), the community needs to reconsider traditional approaches in performance analysis and modeling, system architectures and techniques, serverless and "as a service" operation, real-world and simulation-driven experimentation, etc., and provide new tools and instruments to address emerging challenges in graph processing.
Graphs or linked data are crucial to innovation, competition, and prosperity and establish a strategic investment in technical processing and ecosystem enablers. Graphs are universal abstractions that capture, combine, model, analyze, and process knowledge about real and digital worlds into actionable insights through item representation and interconnectedness. For societally relevant problems, graphs are extreme data that require further technological innovations to meet the needs of the European data economy. Digital graphs help pursue the United Nations Sustainable Development Goals (UN SDG) by enabling better value chains, products, and services for more profitable or green investments in the financial sector and deriving trustworthy insight for creating sustainable communities. All science, engineering, industry, economy, and society-at-large domains can leverage graph data for unique analysis and insight, but only if graph processing becomes easy to use, fast, scalable, and sustainable.
GraphSys is a cross-disciplinary meeting venue focusing on state-of-the-art and the emerging (future) graph processing systems. We invite experts and trainees in the field, across academia, industry, governance, and society, to share experience and expertise leading to a shared body of knowledge, to formulate together a vision for the field, and to engage with the topics to foster new approaches, techniques, and solutions.
Our society is increasingly digital, and its processes are increasingly digitalized. As an emerging technology for the digital society, graphs provide a universal abstraction to represent concepts and objects, and the relationships between them. However, processing graphs at a massive scale raises numerous sustainability challenges; becoming energy-aware could help graph-processing infrastructure alleviate its climate impact. Graph Greenifier aims to address this challenge in the conceptual framework offered by the Graph Massivizer architecture. We present an early vision of how Graph Greenifier could provide sustainability analysis and decision-making capabilities for extreme graph-processing workloads. Graph Greenifier leverages an advanced digital twin for data center operations, based on the OpenDC open-source simulator, a novel toolchain for workload-driven simulation of graph processing at scale, and a sustainability predictor. The input to the digital twin combines monitoring of the information and communication technology infrastructure used for graph processing with data collected from the power grid. Graph Greenifier thus informs providers and consumers on operational sustainability aspects, requiring mutual information sharing, reducing energy consumption for graph analytics, and increasing the use of electricity from renewable sources.
Knowledge Graphs and semantic technologies allow scientists and domain experts to model complex relations between data in a logically structured and machine readable format. metaphactory is a platform that enables users to build these kinds of semantic graphs easily and efficiently. metaphactory uses standards such as RDF in combination with OWL, SKOS, SHACL, and others to provide a flexible endpoint to interact with graphs of varying complexity and expressivity. As part of the Graph-Massivizer project, metaphactory is supporting integration and infrastructure consolidation for components developed in the project. Part of this work is to develop a toolkit which metaphactory uses to process very large graphs without sacrificing sustainability. In this paper we describe in detail the metaphactory platform and how it supports large-scale graph processing in the Graph-Massivizer project, as well as outlining the current efforts within the project and how they aim to increase capabilities in the present to support future work.
With the ever-increasing volume of data and the demand to analyze and comprehend it, graph processing has become an essential approach for solving complex problems in various domains, like social networks, bioinformatics, and finance. Despite the potential benefits of current graph processing platforms, they often encounter difficulties supporting diverse workloads, models, and languages. Moreover, existing platforms suffer from limited portability and interoperability, resulting in redundant efforts and inefficient resource and energy utilization due to vendor and even platform lock-in. To bridge the aforementioned gaps, the Graph-Massivizer project, funded by the Horizon Europe research and innovation program, conducts research and develops a high-performance, scalable, and sustainable platform for information processing and reasoning based on the massive graph (MG) representation of extreme data. In this paper, we briefly introduce the Graph-Massivizer platform. We explore how the emerging serverless computing paradigm can be leveraged to devise a scalable graph analytics tool over a codesigned computing continuum infrastructure. Finally, we sketch seven crucial research questions in our design and outline three ongoing and future research directions for addressing them.
Serverless computing offers an affordable and easy way to code lightweight functions that can be invoked based on events to perform simple tasks. For more complicated processing, multiple serverless functions can be orchestrated as a directed acyclic graph to form a serverless workflow, a so-called function choreography (FC). Although most major cloud providers offer FC management systems, such as AWS Step Functions, and there are also several open-source FC management systems (e.g., Apache OpenWhisk), their primary focus is on describing the control flow and data flow between serverless functions in the FC. Moreover, the existing FC management systems rarely consider the processed data, which is commonly represented in a graph format. In this paper, we review the capabilities of the existing FC management systems in supporting graph processing applications. We also raise two key research questions related to large-scale graph processing using serverless computing in federated Function-as-a-Service (FaaS). As part of the Graph-Massivizer project, funded by the Horizon Europe research and innovation program, we will research and develop (prototype) solutions that address these challenges.
We explore the potential of the Graph-Massivizer project funded by the Horizon Europe research and innovation program of the European Union to boost the impact of extreme and sustainable graph processing for mitigating existing urgent societal challenges. Current graph processing platforms do not support diverse workloads, models, languages, and algebraic frameworks. Existing specialized platforms are difficult to use by non-experts and suffer from limited portability and interoperability, leading to redundant efforts and inefficient resource and energy consumption due to vendor and even platform lock-in. While synthetic data emerged as an invaluable resource overshadowing actual data for developing robust artificial intelligence analytics, graph generation remains a challenge due to extreme dimensionality and complexity. On the European scale, this practice is unsustainable and, thus, threatens the possibility of creating a climate-neutral and sustainable economy based on graph data. Making graph processing sustainable is essential but needs credible evidence. The grand vision of the Graph-Massivizer project is a technological solution, coupled with field experiments and experience-sharing, for a high-performance and sustainable graph processing of extreme data with a proper response for any need and organizational size by 2030.
In this paper, we explore the use of Graph Neural Networks (GNNs) for anomaly anticipation in high performance computing (HPC) systems. We propose a GNN-based approach that leverages the structure of the HPC system (particularly, the physical proximity of the compute nodes) to facilitate anomaly anticipation. We frame the task of forecasting the availability of the compute nodes as a supervised prediction problem; the GNN predicts the probability that a compute node will fail within a fixed-length future window.
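A minimal sketch of such a model with PyTorch Geometric; the architecture, telemetry features, and proximity-graph construction are assumptions for illustration, not the paper's exact design.

    # Minimal sketch (architecture and features are assumptions): a GCN over the
    # physical-proximity graph of compute nodes that outputs, per node, the
    # probability of failure within the next fixed-length window.
    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import GCNConv

    class NodeFailureGNN(torch.nn.Module):
        def __init__(self, num_features, hidden=64):
            super().__init__()
            self.conv1 = GCNConv(num_features, hidden)
            self.conv2 = GCNConv(hidden, 1)

        def forward(self, x, edge_index):
            # x: [num_nodes, num_features] node telemetry (temperature, load, ...);
            # edge_index: [2, num_edges] physical-proximity edges.
            h = F.relu(self.conv1(x, edge_index))
            return torch.sigmoid(self.conv2(h, edge_index)).squeeze(-1)

    model = NodeFailureGNN(num_features=16)
    # Training would minimize binary cross-entropy against "fails within the next
    # window" labels, e.g.: loss = F.binary_cross_entropy(model(x, edge_index), y)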
We empirically demonstrate the viability of the GNN-based approach by conducting experiments on the production Tier-0 supercomputer hosted at the CINECA datacenter facilities, the largest Italian provider of HPC. The results are extremely promising, showing both anomaly detection capabilities on par with other techniques from the literature (with a special focus on those tested on real, production data) and, more significantly, strong results in terms of anomaly prediction.
This paper describes how we envision classifying events into the United Nations Sustainable Development Goals (SDGs) by applying machine learning techniques to global news data. We propose extracting data from a media intelligence platform using an ontology and a classifier to assign each event to its corresponding SDG. To minimize the labeling effort, a few-shot classification approach is employed. Additionally, a labeling tool is developed to facilitate event analysis and assign labels accurately. We envision that this approach could be used to analyze media events at a large scale and to track progress towards the SDGs.
There is an increasing awareness that strategic foresight is much needed to guide efficient policy-making. The growing digitalization implies a rising amount of digital evidence of many aspects of society (e.g., science, economy, and politics). Artificial intelligence can process massive amounts of data and extract meaningful information. Furthermore, a knowledge graph can be developed to capture significant aspects of reality, and machine learning models can be used to identify patterns and derive insights. This paper describes how we envision artificial intelligence could be used to create and deliver strategic foresight automatically.
In this paper, we present a case addressing the drawbacks of financial market data: its limited volume and history, and its sometimes incomplete and erroneous datasets with variable quality, limited availability, and price barriers. The case aims to enable fast, semi-automated creation of realistic and affordable synthetic (extreme) financial datasets, unlimited in size and accessibility, ready to be commercialized. Peracton Ltd. intends to apply the resulting extreme financial data multiverse to testing and improving artificial intelligence (AI)-enhanced financial algorithms (e.g., using machine learning) focused on green investment and trading. Using synthetic data for testing financial algorithms removes critical biases, such as prior knowledge, overfitting, and indirect contamination due to real-world data scarcity, and ensures data completeness at an affordable cost. The availability of extreme volumes of synthetic data will further consolidate financial algorithms and provide a statistically relevant sample size for advanced back-testing.
Graphs can represent various phenomena and are increasingly used to tackle complex problems. Among the challenges associated with graph processing is the ability to analyze and mine massive-scale graphs. While the massive scale is usually associated with distributed systems, the complex nature of graphs makes them an exception to the rule. Currently, most graph processing is performed within a single computer. In this research, we describe a solution at a conceptual level in the context of the Graph-Massivizer architecture. We use two approaches to provide graph analytics and querying functionalities at scale. First, we leverage graph sampling techniques to obtain relevant samples and avoid processing the whole graph. Second, we support heuristic and neural query execution engines. We envision an interface that will decide which queries to execute with a given engine, given constraints (e.g., execution time boundaries, exactness of results, energy saving requirements).
Graph processing is increasingly popular given the wide range of phenomena represented as graphs (e.g., social media networks, pharmaceutical drug compounds, or fraud networks, among others). The increasing amount of available data requires new approaches to efficiently ingest and process such data. In this research, we describe a solution at a conceptual level in the context of the Graph-Massivizer architecture. Graph-Inceptor aims to bridge the gap among ETL tools by enabling the data transformations required for graph creation and enrichment and by supporting connectors to multiple graph storages at a massive scale. Furthermore, it aims to enhance ETL operations by learning from data content and load and by making decisions based on machine-learning-based predictive analytics.
Graph- and hardware-specific optimisations lead to orders-of-magnitude improvements in performance, energy, and cost over conventional graph processing methods. Typical big data platforms, such as Apache MapReduce and Apache Spark, rely on generic primitives, exhibiting poor performance and high financial and environmental costs. Even optimised basic graph operations (BGOs) lack the tools to combine them into real-world applications. Furthermore, graph topology and dynamics (i.e., changes in the number and content of vertices and edges) lead to high variability in computational needs. Primitive predictive models demonstrate that they can enable algorithm selection and advanced auto-scaling techniques to ensure better performance, but no such models exist for energy consumption.
In this work, we present the Graph-Optimizer tool. Graph-Optimizer uses optimised BGOs and composition rules to capture and model the workload. It combines the workload model with hardware and infrastructure models, predicting performance and energy consumption. Combined with design space exploration, such predictions select codesigned workload implementations to fit a requested performance objective and guarantee their performance bounds during execution.
It is our great pleasure to welcome you to the 2023 edition of the Workshop on Hot Topics in Cloud Computing Performance - HotCloudPerf 2023.
Cloud computing is emerging as one of the most profound changes in the way we build and use IT. The use of global services in public clouds is increasing, and the lucrative and rapidly growing global cloud market already supports over 1 million IT-related jobs. However, it is currently challenging to make the IT services offered by public and private clouds performant (in an extended sense) and efficient. Emerging architectures, techniques, and real-world systems include interactions with the computing continuum, serverless operation, everything as a service, complex workflows, auto-scaling and -tiering, etc. It is unclear to which extent traditional performance engineering, software engineering, and system design and analysis tools can help with understanding and engineering these emerging technologies. The community needs practical tools and powerful methods to address hot topics in cloud computing performance.
Responding to this need, the HotCloudPerf workshop proposes a meeting venue for academics and practitioners, from experts to trainees, in the field of cloud computing performance. The workshop aims to engage this community and to lead to the development of new methodological aspects for gaining a deeper understanding not only of cloud performance, but also of cloud operation and behavior, through diverse quantitative evaluation tools, including benchmarks, metrics, and workload generators. The workshop focuses on novel cloud properties such as elasticity, performance isolation, dependability, and other non-functional system properties, in addition to classical performance-related metrics such as response time, throughput, scalability, and efficiency.
When one hears the word Metaverse, it is automatically associated with millions of users, immersive experiences, and its potential to change our lives. But what enables the Metaverse to function at such a scale? This talk will present the different challenges associated with handling 55 million daily users within the Roblox Metaverse. From addressing user Quality of Experience to distributed architecture, programming models, and scheduling, we will cover the entire underlying stack. In particular, I will pay special attention to the AI-Metaverse relationship. On the one hand, a large proportion of workloads are based on one or multiple ML/DL models, and I will present the challenges of scaling them. On the other hand, I will explore infrastructure and service model challenges that can be addressed with AI, e.g., multi-resource and multi-datacenter scheduling with reinforcement learning.
The ambition of this talk is to seed discussions around how cloud native technologies can help research on performance engineering, but also around which interesting performance engineering challenges can be solved with cloud native technologies.
Cloud native technologies are building blocks for creating a modern environment for hosting containerized applications. Among other concerns, great focus is placed on observability, which allows engineers to collect and analyze massive amounts of performance data in near real time. Take as an example service meshes, which are a layer-7 network platform for containerized applications. Service meshes not only allow traffic engineering, but also add observability on top of a microservice application. Among other things, this allows understanding traffic patterns between microservices, including upstream-downstream relationships, request rates, etc., without writing a single line of code.
This talk discusses how cloud native technologies may help researchers in performance engineering. The benefits are three-fold. They allow researchers - e.g., PhD students - to be more productive, by getting the mechanism of collecting performance data out of the way. They improve collaboration because the effects of changing a parameter can be visualized in near real time. Finally, experiments are based on proven technologies with skills more widely available, which helps reproducibility.
These benefits are illustrated through our research on adaptive service meshes. Indeed, service meshes have many parameters which impact performance. Discussions with practitioners revealed a gap in understanding of how to choose these parameters effectively. We therefore proposed an adaptive controller that configures a service mesh so as to maintain a target tail response time.
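As a rough illustration of the kind of control loop such an adaptive controller could implement (a hypothetical sketch, not the controller from the talk; read_p99_latency and apply_concurrency_limit are placeholder functions standing in for a telemetry query and a mesh configuration call):

```python
# Illustrative sketch only: a simple integral-style controller that adjusts a
# service-mesh concurrency limit so that the observed p99 latency stays near a target.
import time

TARGET_P99_MS = 200.0   # target tail response time (illustrative value)
GAIN = 0.05             # integral gain; would need tuning per deployment
limit = 64              # initial concurrency limit

def read_p99_latency():
    # Placeholder: e.g., query the mesh's telemetry backend for the current p99 (ms).
    raise NotImplementedError

def apply_concurrency_limit(value):
    # Placeholder: e.g., patch the sidecar / destination-level configuration.
    raise NotImplementedError

while True:
    p99 = read_p99_latency()
    error = TARGET_P99_MS - p99            # positive -> headroom, negative -> overloaded
    limit = max(1, limit + GAIN * error)   # tighten the limit when latency exceeds the target
    apply_concurrency_limit(int(limit))
    time.sleep(15)                         # control interval
```

An integral update like this is only one possible design; the actual controller and its tuning may differ substantially.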
Reliable job execution is important in High Performance Computing (HPC) clusters. Understanding the failure distribution and failure patterns of jobs helps HPC cluster managers design better systems, and users design fault-tolerant systems. Machine learning is an increasingly popular workload that HPC clusters are used for. But there is little information on the failure characteristics of machine learning jobs on HPC clusters, and how they differ from the workloads such clusters were previously used for. The goal of our work is to improve the understanding of machine learning job failures in HPC clusters. We collect and analyze job data spanning the whole of 2022 and over 2 million jobs. We analyze basic statistical characteristics, the time pattern of failures, resource waste caused by failures, and their autocorrelation. Some of our findings are that machine learning jobs fail at a higher rate than non-ML jobs, and waste much more CPU-time per job when they fail.
Cloud computing has become the major computational paradigm for the deployment of all kinds of applications, ranging from mobile apps to complex AI algorithms. On the other hand, the rapid growth of the IoT market has led to the need to process the data produced by smart devices using their embedded resources. The computing continuum paradigm aims at solving the issues related to the deployment of applications across edge-to-cloud cyber-infrastructures.
This work considers in-memory data protection to enhance security over the compute continua and proposes a solution for developing distributed applications that handles security transparently for the developer. The proposed framework has been evaluated using an ML application that classifies health data using a pre-trained model. The results show that securing in-memory data incurs no additional effort at development time and that the overheads introduced by the encryption mechanisms do not compromise the scalability of the application.
Cloud-native applications demand increasingly powerful and complex autoscalers to guarantee the applications' quality of service. For software engineers with operational tasks, understanding the autoscalers' behavior and applying appropriate reconfigurations is challenging due to their internal mechanisms, inherent distribution, and decentralized decision-making. Hence, engineers seek appropriate explanations. However, engineers' expectations regarding feedback and explanations of autoscalers are unclear. In this paper, through a workshop with a representative sample of engineers responsible for operating an autoscaler, we elicit requirements for explainability in autoscaling. Based on these requirements, we propose an evaluation scheme for assessing explainability as a non-functional property of the autoscaling process and for guiding software engineers in choosing the best-fitting autoscaler for their scenario. The evaluation scheme is based on a Goal Question Metric approach and contains three goals, nine questions to assess explainability, and metrics to answer these questions. The evaluation scheme should help engineers choose a suitable and explainable autoscaler or guide them in building their own.
Performance analysis of microservices can be a challenging task, as a typical request to these systems involves multiple Remote Procedure Calls (RPC) spanning independent services and machines. Practitioners primarily rely on distributed tracing tools to closely monitor microservices performance. These tools enable practitioners to trace, collect, and visualize RPC workflows and associated events in the context of individual end-to-end requests. While effective for analyzing individual end-to-end requests, current distributed tracing visualizations often fall short in providing a comprehensive understanding of the system's overall performance. To address this limitation, we propose a novel visualization approach that enables aggregate performance analysis of multiple end-to-end requests. Our approach builds on a previously developed technique for comparing structural differences of request pairs and extends it for aggregate performance analysis of sets of requests. This paper presents our proposal and discusses our preliminary ongoing progress in developing this approach.
In this paper, we report our experiences from the migration of an AI model inference process, used in the context of an e-health platform, to the Function-as-a-Service (FaaS) model. To that end, a performance analysis is applied across three available Cloud or Edge FaaS clusters based on the open-source Apache OpenWhisk FaaS platform. The aim is to highlight differences in performance based on the characteristics of each cluster, the request rates, and the parameters of OpenWhisk. The conclusions can be applied for understanding the expected behavior of the inference function in each of these clusters as well as the effect of the OpenWhisk execution model. Key observations and findings are reported on aspects such as function execution duration, function sizing, wait time in the system, network latency, and concurrent container overheads for different load rates. These can be used to detect, in a black-box manner, the capabilities of unknown clusters, to guide or fine-tune performance models, and to inform private cloud FaaS deployment setups.
Extending human societies into virtual space through the construction of a metaverse has been a long-term challenge in both industry and academia. Achieving this challenge is now closer than ever due to advances in computer systems, facilitating large-scale online platforms such as Minecraft and Roblox that fulfill an increasing number of societal needs, and extended reality (XR) hardware, which provides users with state-of-the-art immersive experiences. For a metaverse to succeed, we argue that all involved systems must provide consistently good performance. However, there is a lack of knowledge on the performance characteristics of extended reality devices. In this paper, we address this gap and focus on extended- and virtual-reality hardware. We synthesize a user-centered system model that models common deployments of XR hardware and their trade-offs. Based on this model, we design and conduct real-world experiments with Meta's flagship virtual reality device, the Quest Pro. We highlight two surprising results from our findings which show that (i) under our workload, the battery drains 15% faster when using wireless offloading compared to local execution, and (ii) the outdated 2.4 GHz WiFi4 gives surprisingly good performance, with 99% of samples achieving a frame rate of at least 65 Hz, compared to the 72 Hz performance target. Our experimental setup and data are available at https://github.com/atlarge-research/measuring-the-metaverse.
It is our great pleasure to welcome you to the eleventh edition of the International Workshop on Load Testing and Benchmarking of Software Systems - LTB 2023 (https://ltb2023.github.io/). This one-day workshop brings together software testing and software performance researchers, practitioners, and tool developers to discuss the challenges and opportunities of conducting research on load testing and benchmarking software systems, including theory, applications, and experiences. LTB 2023 included 2 keynote talks, 2 research papers, and 4 industry presentations. The topics cover performance of serverless computing, performance and load testing, performance-driven culture, workload generation, workload tracing, benchmarking, and performance verification.
It is important for developers to understand the performance of a software project as they develop new features, fix bugs, and try to generally improve the product. At MongoDB we have invested in building a performance infrastructure to support our developers. The infrastructure automates provisioning systems under test, running performance tests against those systems, collecting many metrics from the tests and the system under test, and making sense of all the results.
Our performance infrastructure and processes are continually changing. As the system has become more powerful, we have used it more and more: adding new tests, new configurations, and new measurements. Tools and processes that work at one scale of use start to break down at higher scales, and we must adapt and update. If we do a good job, we keep pace with the rising constraints. If we do a great job, we make the system fundamentally better even as we scale it.
In this talk we describe our performance testing environment at MongoDB and its evolution over time. The core of our environment is a focus on automating everything, integrating into our continuous integration (CI) system (Evergreen), controlling as many factors as possible, and making everything as repeatable and consistent as possible. After describing that core, we will discuss the scaling challenges we have faced, before relating what we have done to address those scaling challenges and improve the system overall.
Market analysts agree that serverless computing has strong market potential, with projected compound annual growth rates varying between 21% and 28% through 2028 and a projected market value of 36.8 billion by that time. Although serverless computing has gained significant attention in industry and academia over the past years, there is still no consensus about its unique distinguishing characteristics and no precise understanding of how these characteristics differ from classical cloud computing. For example, there is no wide agreement on whether serverless is solely a set of requirements from the cloud user's perspective or whether it should also mandate specific implementation choices on the provider side, such as implementing an autoscaling mechanism to achieve elasticity. Similarly, there is no agreement on whether serverless covers just the operational part, or whether it should also include specific programming models, interfaces, or calling protocols. In this talk, we seek to dispel this confusion by evaluating the essential conceptual characteristics of serverless computing as a paradigm, while putting the various terms around it into perspective. We examine how the term serverless computing, and related terms, are used today. We explain the historical evolution leading to serverless computing, starting with mainframe virtualization in the 1960s through Grid and cloud computing all the way up to today. We review existing cloud computing service models, including IaaS, PaaS, SaaS, CaaS, FaaS, and BaaS, discussing how they relate to the serverless paradigm. In the second part of the talk, we focus on performance challenges in serverless computing both from the user's perspective (finding the optimal size of serverless functions) and from the provider's perspective (ensuring predictable and fast container start times coupled with fine-granular and accurate elastic scaling mechanisms).
Database management systems (DBMSs) are crucial architectural components of any modern distributed software system. Yet, ensuring smooth, high-performance operation of a DBMS is a black art that requires tweaking many knobs and is heavily dependent on the experienced workload. Misconfigurations in production systems have a heavy impact on the overall delivered service quality and hence should be avoided at all costs. Replaying production workload on test and staging systems to estimate the ideal configuration is a valid approach. Yet, this requires traces from the production systems.
While many DBMSs have built-in support for capturing such traces, these mechanisms have a non-negligible impact on performance. eBPF is a Linux kernel feature claiming to enable low-overhead observability and application tracing. In this paper, we evaluate different eBPF-based approaches to DBMS workload tracing for PostgreSQL and MySQL. The results show that using eBPF causes lower overhead than the built-in mechanisms. Hence, eBPF can be a viable baseline for building a generic tracing framework. Yet, our current results also show that additional optimisation and fine-tuning are needed to further lower the performance overhead.
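To give a flavour of what user-space tracing with eBPF can look like, here is a minimal BCC-based sketch under stated assumptions (it is not the tooling evaluated in the paper): the postgres binary path is deployment-specific, exec_simple_query is PostgreSQL's handler for simple-protocol queries, and a recent kernel with bpf_probe_read_user_str is assumed.

```python
# Hypothetical sketch: attach a uprobe to PostgreSQL's query entry point and
# print each executed SQL statement via the kernel trace pipe.
from bcc import BPF

prog = r"""
#include <uapi/linux/ptrace.h>

int trace_query(struct pt_regs *ctx) {
    char query[128] = {};
    // exec_simple_query(const char *query_string): first argument is the SQL text
    bpf_probe_read_user_str(query, sizeof(query), (void *)PT_REGS_PARM1(ctx));
    bpf_trace_printk("query: %s\n", query);
    return 0;
}
"""

b = BPF(text=prog)
# The binary path below is an assumption; adjust it to the local installation.
b.attach_uprobe(name="/usr/lib/postgresql/15/bin/postgres",
                sym="exec_simple_query", fn_name="trace_query")
b.trace_print()
```

A production-grade tracer would add timestamps, per-query latency, and a ring buffer instead of trace_printk, but even this sketch shows why eBPF needs no changes to the DBMS itself.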
Chaos Engineering is an approach for assessing the resilience of software systems, i.e., their ability to withstand unexpected events, adapt accordingly, and return to a steady state. The traditional Chaos Engineering approach only verifies whether the system is in a steady state and makes no statements about state changes over time or their timing. Thus, Chaos Engineering conceptually does not consider transient behavior hypotheses, i.e., specifications regarding the system behavior during the transition between steady states after a failure has been injected. We aim to extend the Chaos Engineering approach and tooling to support the specification of transient behavior hypotheses and their verification.
We interviewed three Chaos Engineering practitioners to elicit requirements for extending the Chaos Engineering process. Our concept uses Metric Temporal Logic and Property Specification Patterns to specify transient behavior hypotheses. We then developed a prototype that can be used stand-alone or to complement the established Chaos Engineering framework Chaos Toolkit. We successfully conducted a correctness evaluation comprising 160 test cases from the Timescales benchmark and demonstrate the prototype's applicability in Chaos Experiment settings by executing three chaos experiments.
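As an illustrative example of such a specification (the predicate names and time bounds below are ours, not taken from the paper), a transient behavior hypothesis in Metric Temporal Logic could read

\[ \square \big( \mathit{inject} \rightarrow \lozenge_{[0,60]}\, \square_{[0,30]} (\mathit{rt} \le 500\,\mathrm{ms}) \big), \]

i.e., whenever a failure is injected, the response time \(\mathit{rt}\) must drop to at most 500 ms within 60 seconds and remain below that threshold for at least 30 seconds.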
It is our great pleasure to welcome you to the 2023 ACM Practically FAIR workshop - PFAIR 2023. This workshop builds upon the popular FAIR data principles to investigate and share best practices for adopting FAIR principles in practice. The FAIR proposal covers only some computing and science domains, leaving many unconsidered, and it does not prescribe how the ideal standards are to be achieved. As researchers and practitioners begin to meet these standards, we, as a community, need to agree upon what constitutes meeting the principles and how it can be validated. We issued a call for research and experience papers and received one submission that met our peer-review standards; we fill out our program with a keynote as well as a panel and open discussion.
Scientific computing communities often run their experiments using complex data- and compute-intensive workflows that utilize high performance computing (HPC), distributed clusters and specialized architectures targeting machine learning and artificial intelligence. FAIR principles for data and software can be useful enablers for the reproducibility of performance (a key HPC metric) and that of scientific results (a crucial tenet of the scientific method) that are based in part on re-use, the R of FAIR principles. FAIR principles are under-used by HPC and data-intensive communities who have been slow to adopt them. This is due in part to the complexity of workflow life cycles, the numerous workflow management systems, the lack of integration of FAIR within existing technologies, and the specificity of managed systems that include rapidly evolving architectures and software stacks, and execution models that require resource managers and batch schedulers. Numerous challenges emerge for scientists attempting to publish FAIR datasets and software for the purpose of re-use and reproducibility, e.g. what data to publish and where due to sizes, how to "FAIRify" data subsetting, at what level of granularity to attribute persistent identifiers to software, what is the minimal amount of metadata needed to guarantee a certain level of reproducibility, what does reproducible AI actually mean? This talk will focus on such challenges and illustrate the negative impact of not applying FAIR on the reproducibility of experiments. We will introduce the notion of FAIR Digital Objects and present RECUP, a framework for data and metadata services for high performance workflows that proposes micro-solutions for adapting FAIR principles to HPC.
Provenance provides data lineage and history of different transformations applied to a dataset. A complete trace of data provenance can enable the reanalysis, reproducibility, and reusability of features, which are essential for validating results and extending them in many projects. Open time series datasets are readily accessible and discoverable, but their full reproducibility and reusability require clear metadata provenance. This paper introduces an assessment of provenance variables using an algorithm for collecting FAIR (Findable, Accessible, Interoperable, Reusable) characteristics in open time series and generating an associated provenance graph. We have evaluated the FAIRness of provenance traces by automatically mapping their properties to a provenance data model graph for a case study employing open time series from weather stations. Our approach arguably enables researchers to analyse time series datasets with similar characteristics, prompting new research questions, insights, and investigations. As a result, this approach has the potential to promote reusability and reproducibility, which are critical factors in scientific research.
It is our great pleasure to welcome you to the 4th International Workshop on Education and Practice of Performance Engineering - WEPPE 2023. The goal of the Workshop on Education and Practice of Performance Engineering is to bring together university researchers and industry performance engineers to share education and practice experiences. This year's workshop continues its tradition of being a forum for performance engineering educators. We are interested in creating opportunities to share valuable experience between researchers who are actively teaching performance engineering and performance engineers who are applying performance engineering techniques in industry.
ABB is developing a vast range of software services for process automation applications used in chemical production facilities, power plants, and container ships. High responsiveness and resource efficiency are important in this domain both for real-time embedded systems and distributed containerized systems, but performance engineering can be challenging due to system complexity and application domain heterogeneity. This talk provides experiences and lessons learned from several selected case studies on performance engineering. It illustrates testing the performance of OPC UA pub/sub communication, clustered MQTT brokers for edge computing, software container online updates, and lightweight Kubernetes frameworks, while highlighting the applied practices and tools. The talk reports on challenges in workload modeling, performance testing, and performance modeling.
Context. The Software Quality and Architecture group (SQA) at the University of Stuttgart offers the Quantitative Analysis of Software Designs (QASD) course for master students. The goal is to give students the necessary skills to evaluate architecture alternatives of software systems quantitatively. The course offers a combination of the required theoretical skills, such as applying stochastic processes, and practical exercises using suitable tools. The challenge is providing teaching materials that balance necessary theoretical knowledge and appropriate tooling that can be used in practice. As a solution, the course is designed so that one-third is about the formalisms behind quantitative analysis, including stochastic processes and queuing theory. One-third covers modeling languages, such as queuing networks, UML, and UML profiles, including MARTE. The remaining one-third uses tooling to model and analyze example systems. During the Corona pandemic, we provided students with an e-learning module with pre-recorded videos, online quizzes at the end of every chapter, and a virtual machine with all the required tooling for the exercise sheets pre-installed. Final remarks. In the past two years, students' feedback was often positive regarding the balance between theory and tooling. However, it has to be emphasized that the number of students participating in the course has never exceeded ten. Hence, the student feedback has not been collected via the university's survey.
Software engineering and computer science courses are frequently focused on particular areas in a way that neglects such cross-cutting quality attributes as performance, reliability, and security. We will describe the progress we have made in developing enhancements to some of our existing software engineering courses to draw attention and even lay the foundations of an awareness of performance considerations in the software development life cycle. In doing so, we wish to make performance considerations integral to the software engineering mindset while avoiding the need to remove current material from our existing courses. This work is part of an NSF-funded project for undergraduate curriculum development.
This paper reports the experience gained over several years of teaching the course entitled Software Performance and Scalability at Ca' Foscari University of Venice. The course is taken by prospective computer scientists and is taught at the master's level. It covers the topics of modeling and assessment of the performance properties of software systems.
In this paper, we also devote attention to the challenge of online teaching imposed by the pandemic.
Finally, we express some hopes for the community to collect material for structured courses on performance and reliability evaluation topics.
MongoDB has invested in developing a performance infrastructure and a corresponding performance culture. All development engineers are expected to improve MongoDB performance by adding performance tests, optimizing code, and fixing performance regressions. Investing in the infrastructure is clear: we develop and support tools to make it easy to track performance changes and improve performance. Investing in the culture includes formal and informal training. The training must ultimately support both strong developers with very limited performance backgrounds and the development of our future performance experts.
Performance engineering is changing before our eyes, adjusting to current industry trends such as cloud computing, agile development, and DevOps. As systems' scale and sophistication skyrocket, performance gets more attention. While some performance concepts, such as algorithm complexity, have become a must for anybody working in the industry, this still does not result in a consistent view of performance. So it remains an open question what computer professionals should learn about performance - and, even more challenging, what is needed to prepare performance professionals. These questions have actually never been clearly answered, and they are now even less settled than before.
Performance analysis tools are frequently used to support the development of parallel MPI applications. They facilitate the detection of errors, bottlenecks, or inefficiencies but differ substantially in their instrumentation, measurement, and type of feedback. Especially tools that provide visual feedback are helpful for educational purposes. They provide a visual abstraction of program behavior, supporting learners in identifying and understanding performance issues and writing more efficient code. However, existing professional tools for performance analysis are very complex, and their use in beginner courses can be very demanding. Foremost, their instrumentation and measurement require deep knowledge and take a long time. Immediate as well as straightforward feedback is essential to motivate learners. This paper provides an extensive overview of performance analysis tools for parallel MPI applications, which experienced developers broadly use today. It also gives an overview of existing educational tools for parallel programming with MPI and shows their shortcomings compared to professional tools. Using tools for performance analysis of MPI programs in educational scenarios can promote the understanding of program behavior in large HPC systems and support learning parallel programming. At the same time, the complexity of the programs and the lack of infrastructure in educational institutions are barriers. These aspects will be considered and discussed in detail.
It is our great pleasure to welcome you to the 8th Workshop on Challenges in Performance Methods for Software Development - WOSP-C 2023. This year's workshop continues its tradition of being the forum for the discussion of emerging or unaddressed challenges in software and performance, including challenges in developing software to be performant, concurrent programming issues, performance and architecture, performance measurement, cloud performance, and testing. Its purpose is to open new avenues for research on methods to address continuously emerging performance challenges. The software world is changing, and new challenges are to be expected.
We also encourage attendees to attend the keynote and talk presentations. These valuable and insightful talks can and will guide us to a better understanding of the future, starting with the keynote "Non-Volatile Hardware Transactional Memory: Advancements, Challenges, and Future Directions" by Paolo Romano (currently at IST, Lisbon University & INESC-ID).
Putting together WOSP-C'23 was a team effort. We first thank the authors for providing the content of the program. We are grateful to the program committee, who worked very hard in reviewing papers and providing feedback for authors. Finally, we thank the hosting organization and university, as well as our sponsors, the ACM SIGs.
We hope that you will find this program interesting and thought-provoking and that the workshop will provide you with a valuable opportunity to share ideas with other researchers and practitioners from institutions around the world.
Transactional memory (TM) has emerged as a powerful paradigm to simplify concurrent programming. Nowadays, hardware-based TM (HTM) implementations are available in several mainstream CPUs (e.g., by ARM, Intel and IBM). Due to their hardware nature, HTM implementations spare the cost of software instrumentation and can efficiently detect conflicts by extending existing cache-coherency protocols. However, their cache-centric approach also imposes a number of limitations that impact how effectively such systems can be used in practice.
This talk investigates the challenges that arise when leveraging existing HTM systems in conjunction with another recent disruptive hardware technology, namely Non-Volatile Memory (NVM). NVM, such as Intel Optane DC, provides much higher density than existing DRAM, while attaining competitive performance and preserving DRAM's byte addressability. However, the cache-centric approach adopted by existing HTM implementations raises a crucial problem when these are used in conjunction with NVM: since CPU caches are volatile, existing HTM implementations fail to guarantee that data updated by committed transactions are atomically persisted to NVM.
I will overview how this problem has been tackled so far in the literature, with a focus on solutions that do not assume ad-hoc hardware mechanisms not provided by current HTM implementations, but that rather rely on hardware-software co-design techniques to ensure consistency on unmodified existing HTM systems. I will conclude by presenting ongoing research directions that depart from state-of-the-art approaches in a twofold way: (i) they assume the availability of durable caches, i.e., systems equipped with additional power sources that ensure that cache contents can be safely persisted to NVM upon crashes; (ii) they assume a weaker isolation level at the TM level, namely Snapshot Isolation, which, despite being more relaxed than the reference consistency model for TM systems (e.g., opacity), can still ensure correct execution of a wide range of applications while enabling new optimizations to boost the efficiency of HTM applications operating on NVM.
The continuous adoption of embedded systems in the most diverse application domains contributes to the increasing complexity of their development. Hardware/Software Co-Design methodologies are usually employed to tackle the challenges deriving from even more stringent functional and non-functional requirements. Using these methodologies, several validation and verification steps can be carried out early in the design process using a unified, technology-independent system model.
This work investigates the possibility of integrating formal functional verification and timing validation in a Hardware/Software Co-Design flow at the system-level of abstraction. Specifically, we introduce Co-V&V, namely an additional step that consists of two phases: (i) a transformation from UML/MARTE to UPPAAL Timed Automata, and (ii) a preliminary functional verification and timing validation that exploits the UPPAAL verifier.
We describe the Co-V&V step through a case study characterized by a component-based architecture and reactive behavior. The verification and validation conducted with UPPAAL indicate that our approach is particularly effective in discovering design flaws located in the communication protocol as well as those arising from the internal behavior of components.
The examination of performance changes or of the performance behavior of a software system requires measuring the performance. This is done via probes, i.e., pieces of code which obtain and process measurement data and which are inserted into the examined application. The execution of these probes in a single method creates overhead, which distorts the performance measurements of calling methods and slows down the measurement process. Therefore, an important challenge for performance measurement is the reduction of the measurement overhead.
To address this challenge, the overhead should be minimized. Based on an analysis of the sources of performance overhead, we derive the following four optimization options: (1) source instrumentation instead of AspectJ instrumentation, (2) reduction of measurement data, (3) change of the queue, and (4) aggregation of measurement data. We evaluate the effect of these optimization options using the MooBench benchmark. Thereby, we show that these optimization options reduce the monitoring overhead of the monitoring framework Kieker. For MooBench, the execution duration could be reduced from 4.77 µs to 0.39 µs per method invocation on average.
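A conceptual sketch of optimization option (4), aggregation of measurement data, is shown below; it is illustrative Python rather than Kieker's actual Java implementation, but it conveys why aggregating per-operation statistics instead of writing one record per invocation reduces queue traffic and I/O:

```python
# Conceptual sketch: a probe that aggregates durations per operation and flushes a
# summary periodically instead of emitting one record per method invocation.
import time
from collections import defaultdict

class AggregatingProbe:
    def __init__(self, flush_interval_s=1.0):
        self.stats = defaultdict(lambda: [0, 0.0])  # operation -> [count, total duration in s]
        self.last_flush = time.time()
        self.flush_interval_s = flush_interval_s

    def record(self, operation, duration_s):
        entry = self.stats[operation]
        entry[0] += 1
        entry[1] += duration_s
        if time.time() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self):
        for operation, (count, total) in self.stats.items():
            # In a real monitoring framework this summary record would go to a writer queue.
            print(f"{operation}: n={count}, mean={total / count * 1e6:.2f} us")
        self.stats.clear()
        self.last_flush = time.time()

probe = AggregatingProbe()
start = time.perf_counter()
# ... monitored method body ...
probe.record("Example.method()", time.perf_counter() - start)
```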
Fluid approximations are useful for representing the transient behaviour of queueing systems. For layered queues, a fluid model has previously been derived indirectly, via transformation first to a PEPA model, or via recursive neural networks. This paper presents a derivation directly from the layered queueing mechanisms, starting from a transformation to a context-sensitive layered form. The accuracy of the predictions, compared to transient simulations and steady-state solutions, is evaluated, and the approach appears to be useful.
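For intuition only (this is the standard single-station fluid model, not the layered derivation presented in the paper), a fluid approximation tracks the job level \(x(t)\) at a station with \(s\) servers, arrival rate \(\lambda(t)\), and per-server service rate \(\mu\) via

\[ \frac{dx(t)}{dt} = \lambda(t) - \mu\,\min\{x(t), s\}; \]

a layered model couples such equations across layers, since servers at one layer are themselves customers of lower-layer servers.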
The runtime quality of application systems - e.g., performance, reliability, and resilience - directly influences companies' business success. Over the last few years, corresponding analysis measures such as load tests or monitoring have become widespread in practice, and mature commercial and open-source tools have been developed. However, these measures are all at the technical level and not interpreted at the (business) domain level. At the same time, software architecture and software development approaches such as Domain-Driven Design (DDD), which are becoming increasingly widespread, essentially do not consider runtime quality concerns despite their criticality.
Our envisioned dqualizer approach aims to close the gap between the domain-specificity of application systems and the (technical) measures and findings of quality assurance by means of a domain-centric approach. For this purpose, we integrate means to model and monitor runtime quality metrics into DDD-based techniques, e.g., Domain Storytelling, that enable domain experts to describe domain-centric runtime quality concerns. Our preliminary results comprise the prototypical extension of a domain story editor for specifying load and resilience tests and reporting test results. Using the editor, we gathered feedback from domain experts in a qualitative user study. Despite the editor's limitations regarding functionality and usability, the feedback indicated that domain experts are able to model runtime quality analyses.