RoadAI

Team NoDig - Decision Support for Sustainable Worksites

Competition Description

The RoadAI competition aims to reduce emissions in road construction; construction machines account for roughly 1.5% of Norway's CO2 emissions. The Norwegian Artificial Intelligence Research Consortium (NORA) hosts the competition in collaboration with Skanska, Sintef, the Nordic Machine Intelligence Journal, and several other partners. Kicking off in June 2023, the competition ran for three months, ending on 10 September. The overall aim of the competition is to demonstrate how road construction can become more sustainable through the use of data.

Skanska has provided the participants with real machine data, mainly recorded in March, April, and May of 2022. The data stems from the construction of the E16 highway between Bjørum and Skaret, just west of Oslo. The data set has four main components: GPS data, machine data (AEMP), vibration data, and drone photos.

NoDig's Approach

Team NoDig's approach is founded on a desire to develop feasible tools for automated decision support. A necessary first step has been the preprocessing of raw data: any successful data analysis or machine learning model relies on relevant, representative, and usable input data. Part one of the next section elaborates on how we have preprocessed the raw data. Part two demonstrates how we automatically detect the machines' load and dump activities. Finally, part three demonstrates the automated generation of a daily report.

The Algorithm

Part one: Data Preprocessing

The primary data source for the algorithms is GPS information. Initial CSV files have been transformed into user-friendly, extensible classes designed for both analysis and machine learning applications. These classes offer the flexibility to easily incorporate additional features. Subsequent sections of this work rely solely on this refined GPS data. While attempts to integrate machine and vibration data were made, the outcomes were inconsistent. Points in the discussion will highlight instances where the incorporation of additional data sets would have been beneficial.
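
As a sketch of what such a class might look like, the snippet below loads a raw CSV into a small wrapper object. The class name, fields, and column names are illustrative assumptions, not the exact schema of the provided data.

```python
from dataclasses import dataclass
import pandas as pd


@dataclass
class GPSTrack:
    """One machine's GPS points for a single day (illustrative sketch)."""
    machine_id: str
    frame: pd.DataFrame  # assumed columns: Timestamp, Latitude, Longitude, label

    @classmethod
    def from_csv(cls, path: str, machine_id: str) -> "GPSTrack":
        # parse timestamps and keep the points in chronological order
        df = pd.read_csv(path, parse_dates=["Timestamp"])
        df = df.sort_values("Timestamp").reset_index(drop=True)
        return cls(machine_id=machine_id, frame=df)
```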

Part two: Automatic Load and Dump Detection with Light Gradient-Boosting Machine

Introduction

In the GPS data set, loading and dumping activities for trucks and dumpers are manually recorded by drivers. This manual method introduces variations in defining what constitutes a loading or dumping event and is susceptible to human error, affecting data quality. To enhance analytical outcomes and driver efficiency, automating data registration should be a long-term goal. The model employed in this work, therefore, attempts to predict the event at any given time - a classification problem with "Driving", "Loading" and "Dumping" as target labels.

The dataset is assumed to feature a continuous stream of data points from each machine's device. While the frequency of these points is inconsistent, each is categorized as either "Driving," "Loading," or "Dumping." The objective is a real-time classification based on the cumulative data up to each point. By definition, a single trip comprises one loading and one dumping event, with the remainder categorized as "driving."

Given the variations in driver behavior and potential GPS inaccuracies, identifying the exact moments of loading or dumping is challenging. Therefore, an approach that consolidates consecutive points for classification is recommended if one can afford the trade-off in resolution, i.e. a larger time window within which the predicted event occurs. This enhances model robustness and facilitates generalization across different drivers and machines.
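
A minimal sketch of such a consolidation step is shown below: it merges every five consecutive points into one sample and lets a window inherit a "Loading" or "Dumping" label if any of its points carries one. The column names and window size are assumptions, not the exact configuration used.

```python
import pandas as pd


def consolidate(points: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Merge every `window` consecutive GPS points into one sample.

    Assumes `points` is ordered by time, with numeric feature columns and a
    `label` column.
    """
    points = points.reset_index(drop=True)
    group = points.index // window

    # average the numeric features within each window
    consolidated = points.select_dtypes("number").groupby(group).mean()

    def window_label(labels: pd.Series) -> str:
        # a window counts as Loading/Dumping if any point in it does
        for special in ("Loading", "Dumping"):
            if (labels == special).any():
                return special
        return "Driving"

    consolidated["label"] = points["label"].groupby(group).apply(window_label)
    return consolidated
```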

The Machine Learning Algorithm

The machine learning algorithm employed in this project is Light Gradient-Boosting Machine (LightGBM). LightGBM is an ensemble framework that uses decision trees as weak learners and can tackle regression, ranking, and classification tasks. What sets LightGBM apart from other gradient boosting algorithms like XGBoost and CatBoost are its distinctive features - Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). These features effectively reduce the size of the data set and the number of utilized features, all while maintaining the same level of accuracy for the majority of tasks. As a result, LightGBM delivers exceptional time efficiency. More about this algorithm can be found in the developers' LightGBM whitepaper.

Data Pre-processing

The LightGBM model was trained on a dataset derived from GPS data. This enriched dataset included a variety of features for each timestamp, such as speed, acceleration, and movement-related variables like changes in latitude/longitude and speed in both north/south and east/west directions.
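
The snippet below sketches how such movement features could be derived from raw latitude/longitude fixes. The column names and the simple flat-earth distance approximation are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_motion_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive speed, acceleration, and directional movement features."""
    out = df.sort_values("Timestamp").copy()
    dt = out["Timestamp"].diff().dt.total_seconds()

    # positional changes between consecutive GPS fixes
    out["dlat"] = out["Latitude"].diff()
    out["dlon"] = out["Longitude"].diff()

    # approximate metric displacement (equirectangular approximation)
    meters_per_deg_lat = 111_320
    meters_per_deg_lon = 111_320 * np.cos(np.radians(out["Latitude"]))
    out["speed_north"] = out["dlat"] * meters_per_deg_lat / dt
    out["speed_east"] = out["dlon"] * meters_per_deg_lon / dt
    out["speed"] = np.hypot(out["speed_north"], out["speed_east"])
    out["acceleration"] = out["speed"].diff() / dt
    return out
```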

The data was first segmented per machine per day and then split into an 80/20 ratio for training and testing. Splitting on these segments, rather than on individual points, ensured that the model was evaluated on complete machine-days instead of randomly sampled parts of routes.
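
A minimal training sketch under these assumptions is shown below. Variable names are hypothetical and the hyperparameters are plain defaults, not the tuned values from the notebook.

```python
import lightgbm as lgb
from sklearn.model_selection import train_test_split

# `dataset` is assumed to be a DataFrame of consolidated windows with
# "machine_id", "date", and "label" columns plus numeric feature columns
feature_cols = [c for c in dataset.columns if c not in ("machine_id", "date", "label")]

# split on whole (machine, day) segments rather than on individual rows
segments = dataset[["machine_id", "date"]].drop_duplicates()
train_seg, test_seg = train_test_split(segments, test_size=0.2, random_state=42)
train = dataset.merge(train_seg, on=["machine_id", "date"])
test = dataset.merge(test_seg, on=["machine_id", "date"])

model = lgb.LGBMClassifier(objective="multiclass", n_estimators=500)
model.fit(train[feature_cols], train["label"])
```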

Results

It has been observed that the model's predictive accuracy significantly improves when trained on a larger dataset. However, this advantage comes at the cost of increased data loading and training times, which tend to scale linearly with the size of the dataset. The algorithm yields promising results based on metrics such as precision, recall, and F1-score. Table 1 and Figure 2 display the results of training the model on the entire set of available GPS data, a data set containing over one million data points. In this model configuration, five timestamps were consolidated into a single data point. For further information regarding the set of hyperparameters and features employed in this model, please see the notebook load_dump_lightgbm_demo.ipynb.

Table 1: Performance Metrics

Label      Precision   Recall    F1-Score
Driving    0.9969      0.9958    0.9964
Dump       0.8388      0.8867    0.8621
Load       0.8445      0.8647    0.8545
Figure 2: Confusion matrix. The values represent the frequency of each label in the dataset.
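
The per-class metrics in Table 1 and the confusion matrix in Figure 2 can be reproduced along these lines, assuming the `model`, `test`, and `feature_cols` objects from the training sketch above:

```python
from sklearn.metrics import classification_report, confusion_matrix

y_pred = model.predict(test[feature_cols])

# per-class precision, recall, and F1-score, as in Table 1
print(classification_report(test["label"], y_pred, digits=4))

# counts of true vs. predicted labels, as in the confusion matrix of Figure 2
print(confusion_matrix(test["label"], y_pred))
```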

Part three: Daily Report

Taking advantage of the available GPS data, mass quantities, and type of mass moved, we have developed a report displaying the daily progress at the construction site. The report includes an interactive plot of mass moved, idle time statistics, and productivity information per machine per material type. The report is generated automatically for a chosen day.

The daily report is meant to serve as a proof of concept of what could be deployed to an analytics and interactive visualization web application. The solution could be hosted on popular services such as Tableau, Grafana, or Power BI. In this sense, it is not an algorithm that directly proposes solutions, but rather one that enables insight into how machines operate, both at an individual and a group level. We look closer at two main components of the daily report.

Mass Moved

In the interactive map, the user can select the machine type and the number of clusters for the dump and load regions to display. Drone images from Skaret and Nordlandsdalen are overlaid on the map to better visualize how the region changes over time as construction progresses. An agglomerative clustering and convex hull algorithm is used to plot polygons of load and dump regions. The interactive plot informs on the quantity moved between regions, the top workers of the day (tons/hr), and the total mass moved for each material type.
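
A sketch of how the load and dump polygons could be produced with scikit-learn and SciPy is shown below. The function and parameter choices are illustrative, not the exact configuration used in the notebook.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from scipy.spatial import ConvexHull


def cluster_polygons(positions: np.ndarray, n_clusters: int) -> list[np.ndarray]:
    """Cluster load (or dump) positions and return one convex-hull polygon per cluster.

    `positions` is an (n, 2) array of latitude/longitude pairs.
    """
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(positions)
    polygons = []
    for cluster in range(n_clusters):
        pts = positions[labels == cluster]
        if len(pts) >= 3:  # a hull needs at least three non-collinear points
            hull = ConvexHull(pts)
            polygons.append(pts[hull.vertices])
    return polygons
```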

Day Overview, 04-08-2022
Total mass moved for the day by Truck: 11504.0 t
Stone: 11504.0 t, Soil: 0 t, Equipment: 0 t, Other: 0 t
Top 5 mass transfer zones for the day
L2->D0: 2153.0 t
L4->D2: 1113.0 t
L6->D2: 1023.0 t
L0->D4: 900.0 t
L4->D3: 895.5 t
Top 3 workers of the day
Nr.1 ID: 30 moved 150.6 t per hour
Nr.2 ID: 22 moved 148.2 t per hour
Nr.3 ID: 44 moved 139.2 t per hour

Figure 3: Static version of interactive map showcasing mass moved between different load and dump clusters. Click on 'Dx' or 'Lx' markers to display mass moved to/from different clusters. For an interactive version, launch the daily_report_demo.ipynb notebook.

Idle Time and Productivity

Knowing when, where, and why machines are idle throughout the day can be an important insight when considering operational changes. The daily report provides a timeline plot, as can be seen in Figure 4. The red line, named Machines in action, shows the number of machines that have started their day, i.e. have recorded their first load of the day. The blue line shows the number of idle machines; it tracks all machines standing still, including machines that are loading or dumping. Furthermore, one can filter on their status, i.e. whether the next activity is expected to be a load or a dump. We see that the number of idle machines peaks around 09:30 and 13:30.

Figure 4: Plot of idle machines throughout the day.
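
A minimal sketch of how such an idle-machine timeline could be computed from the GPS data is given below. The speed threshold, resampling interval, and column names are assumptions.

```python
import pandas as pd


def idle_timeline(gps: pd.DataFrame, speed_threshold: float = 0.3) -> pd.Series:
    """Count machines standing still per 5-minute interval.

    Assumes `gps` has Timestamp, machine_id, and speed (m/s) columns.
    """
    idle = gps[gps["speed"] < speed_threshold]
    return (
        idle.set_index("Timestamp")
        .groupby(pd.Grouper(freq="5min"))["machine_id"]
        .nunique()
    )
```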

There might be many reasons for machines to be idle: some drivers might be on a break, others are attending to other duties, and some might be waiting for an excavator. To provide more insight into this matter, we plot the positions of idle machines at peak times for the selected day. An example of this can be seen in Figure 5. The map allows the user to zoom in on areas of interest. Machines waiting to load are marked with an excavator icon, while machines waiting to dump are marked with an icon of a dumping truck. This feature could be further developed with vibration data, which would make it possible to distinguish between actual idle time and time when the machine is at complete rest.

Figure 5: Map of idle machines at a peak time.

Finally, we present a heatmap of idle machines for a given day. By collecting all idle positions of all machines for that day, we get an impression of where idle machines are most likely to be found. The heatmap can be seen in Figure 6. Combining the idle timeline, the peak idle time plot, and the heatmap, one should be able to identify potential bottlenecks and areas of reduced efficiency.

Figure 6: Heatmap of idle machines for a given day.
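
A heatmap like this can be produced with folium, sketched below under the assumption that `idle_positions` is a list of (latitude, longitude) pairs collected for the chosen day; the map centre is an approximate location, not taken from the data.

```python
import folium
from folium.plugins import HeatMap

# rough centre of the E16 Bjørum-Skaret area (approximate, for illustration)
m = folium.Map(location=[59.96, 10.42], zoom_start=13)

# overlay all idle positions for the day as a heat layer
HeatMap(idle_positions, radius=12).add_to(m)
m.save("idle_heatmap.html")
```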