The RoadAI competition aims to reduce emissions in road construction: 1.5% of Norwegian CO2 emissions stem from construction machines. The Norwegian Artificial Intelligence Research Consortium (NORA) hosts the competition in collaboration with Skanska, Sintef, Nordic Machine Intelligence Journal, and several other partners. Kicking off in June 2023, the competition has run for three months, ending on the 10th of September. The overall aim of the competition is to demonstrate how road construction can become more sustainable through the use of data.
Skanska has provided the participants with real machine data, mainly recorded in March, April, and May of 2022. The data stems from the construction of the E16 highway between Bjørum and Skaret, just west of Oslo. There are four main components in the data set: GPS data, machine data (AEMP), vibration data, and drone photos.
Team NoDig's approach has been founded on a desire to develop feasible tools for automated decision support. A necessary first step has been the preprocessing of raw data. Any successful data analysis or machine learning model relies on relevant, representative, and usable input data. Part one in the next section elaborates on how we have preprocessed the raw data. Part two demonstrates how we have automated the detection of the machines' load and dump activities. Finally, part three demonstrates the automated generation of a daily report.
The primary data source for the algorithms is GPS information. Initial CSV files have been transformed into user-friendly, extensible classes designed for both analysis and machine learning applications. These classes offer the flexibility to easily incorporate additional features. Subsequent sections of this work rely solely on this refined GPS data. While attempts to integrate machine and vibration data were made, the outcomes were inconsistent. Points in the discussion will highlight instances where the incorporation of additional data sets would have been beneficial.
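As an illustration of what such classes could look like, the following is a minimal sketch, not the team's actual implementation. The class names `GpsPoint` and `MachineTrack`, the function `load_tracks`, and the CSV column names (`machine_id`, `timestamp`, `latitude`, `longitude`) are all assumptions made for this example.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class GpsPoint:
    """One GPS reading (hypothetical structure)."""
    timestamp: datetime
    latitude: float
    longitude: float


@dataclass
class MachineTrack:
    """All readings of one machine, kept in chronological order."""
    machine_id: str
    points: list = field(default_factory=list)


def load_tracks(csv_file):
    """Group CSV rows by machine and sort each track by time.

    The column names used here are assumed, not taken from the real data.
    """
    tracks = {}
    for row in csv.DictReader(csv_file):
        track = tracks.setdefault(row["machine_id"], MachineTrack(row["machine_id"]))
        track.points.append(
            GpsPoint(
                timestamp=datetime.fromisoformat(row["timestamp"]),
                latitude=float(row["latitude"]),
                longitude=float(row["longitude"]),
            )
        )
    for track in tracks.values():
        track.points.sort(key=lambda p: p.timestamp)
    return tracks


# Tiny in-memory example; rows for machine 30 are deliberately out of order.
sample = io.StringIO(
    "machine_id,timestamp,latitude,longitude\n"
    "30,2022-04-08T07:00:10,59.9310,10.3010\n"
    "30,2022-04-08T07:00:00,59.9300,10.3000\n"
    "22,2022-04-08T07:00:05,59.9400,10.3100\n"
)
tracks = load_tracks(sample)
```

New features can then be added as extra fields or methods on the classes without changing the loading code.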
In the GPS data set, loading and dumping activities for trucks and dumpers are manually recorded by drivers. This manual method introduces variations in defining what constitutes a loading or dumping event and is susceptible to human error, affecting data quality. To enhance analytical outcomes and driver efficiency, automating data registration should be a long-term goal. The model employed in this work therefore attempts to predict the event at any given time, which is a classification problem with "Driving", "Loading", and "Dumping" as target labels.
The dataset is assumed to feature a continuous stream of data points from each machine's device. While the frequency of these points is inconsistent, each is categorized as either "Driving", "Loading", or "Dumping". The objective is a real-time classification based on the cumulative data up to each point. By definition, a single trip comprises one loading and one dumping event, with the remainder categorized as "Driving".
Given the variations in driver behavior and potential GPS inaccuracies, identifying exact moments of loading or
dumping is challenging. Therefore, an approach that consolidates consecutive points for classification is
recommended if one can afford the trade-off in resolution, i.e. having a larger timeframe where the predicted
event occurs. This enhances model robustness and facilitates generalization across different drivers and machines.
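The consolidation step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `consolidate`, the dictionary keys, and the rule that a chunk takes the most frequent non-"Driving" label it contains are choices made for this example, and the actual merging rule may differ.

```python
from collections import Counter
from statistics import mean


def consolidate(points, size=5):
    """Merge each run of `size` consecutive points into one data point.

    Assumed labelling rule: a chunk is labelled with its most frequent
    non-"Driving" label if it contains one, otherwise "Driving".
    A trailing chunk shorter than `size` is dropped.
    """
    merged = []
    for start in range(0, len(points) - size + 1, size):
        chunk = points[start:start + size]
        events = [p["label"] for p in chunk if p["label"] != "Driving"]
        label = Counter(events).most_common(1)[0][0] if events else "Driving"
        merged.append({
            "t_start": chunk[0]["t"],
            "t_end": chunk[-1]["t"],
            "mean_speed": mean(p["speed"] for p in chunk),
            "label": label,
        })
    return merged


# Ten points: the machine slows down and one point is recorded as "Loading".
speeds = [10.0, 10.0, 10.0, 10.0, 10.0, 2.0, 1.0, 0.0, 1.0, 1.0]
points = [{"t": i, "speed": s, "label": "Driving"} for i, s in enumerate(speeds)]
points[7]["label"] = "Loading"
chunks = consolidate(points, size=5)
```

The prediction for the second chunk then covers the whole interval from `t_start` to `t_end`, which is the loss of resolution mentioned above.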
The machine learning algorithm employed in this project is Light Gradient-Boosting Machine (LightGBM). LightGBM is an ensemble framework that uses decision trees as weak learners and can tackle regression, ranking, and classification tasks. What sets LightGBM apart from other gradient boosting algorithms like XGBoost and CatBoost are its distinctive features: Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB). These features effectively reduce the size of the data set and the number of utilized features, all while maintaining the same level of accuracy for the majority of tasks. As a result, LightGBM is highly time-efficient. More about this algorithm can be found in the developers' LightGBM whitepaper.
The LightGBM model was trained on a dataset derived from GPS data. This enriched dataset included a variety of features for each timestamp, such as speed, acceleration, and movement-related variables like changes in latitude/longitude and speed in both north/south and east/west directions.
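Features of this kind can be derived from consecutive GPS fixes. The sketch below is one possible way to do it, not the team's feature pipeline: the function name `movement_features`, the input layout, and the use of an equirectangular approximation with a spherical Earth radius are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def movement_features(track):
    """Derive per-step movement features from a time-sorted track.

    `track` is a list of (t_seconds, lat_deg, lon_deg) tuples. Distances use
    an equirectangular approximation, adequate for short steps.
    """
    rows = []
    prev_speed = None
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order timestamps
        d_lat, d_lon = lat1 - lat0, lon1 - lon0
        north_m = math.radians(d_lat) * EARTH_RADIUS_M
        east_m = (math.radians(d_lon) * EARTH_RADIUS_M
                  * math.cos(math.radians((lat0 + lat1) / 2)))
        v_north, v_east = north_m / dt, east_m / dt
        speed = math.hypot(v_north, v_east)
        accel = 0.0 if prev_speed is None else (speed - prev_speed) / dt
        rows.append({
            "t": t1, "d_lat": d_lat, "d_lon": d_lon,
            "v_north": v_north, "v_east": v_east,
            "speed": speed, "accel": accel,
        })
        prev_speed = speed
    return rows


# A machine moves roughly 100 m north in 10 s, then stands still for 10 s.
track = [(0, 60.0000, 10.0), (10, 60.0009, 10.0), (20, 60.0009, 10.0)]
features = movement_features(track)
```

Because the sampling frequency is inconsistent, every rate is divided by the actual time step rather than by a fixed interval.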
The data were first segmented per machine and day, and each segment was then split into an 80/20 ratio for training and testing. This segmentation ensured that the model made predictions on all relevant chunks of data, rather than neglecting randomly selected routes.
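A per-segment split of this kind might look as follows. This is a sketch under assumptions: the function name `split_per_machine_day` and the record keys are hypothetical, and the split inside each segment is taken chronologically here, which the text does not specify.

```python
from collections import defaultdict


def split_per_machine_day(points, train_fraction=0.8):
    """Split every (machine, day) segment into a train and a test part.

    Assumption: within a segment the first 80% of the points (by time) go to
    training and the last 20% to testing.
    """
    segments = defaultdict(list)
    for p in points:
        segments[(p["machine_id"], p["day"])].append(p)

    train, test = [], []
    for key in sorted(segments):
        segment = sorted(segments[key], key=lambda p: p["t"])
        cut = int(len(segment) * train_fraction)
        train.extend(segment[:cut])
        test.extend(segment[cut:])
    return train, test


# Two machines, one day each, ten points per machine.
points = [
    {"machine_id": m, "day": "2022-04-08", "t": t}
    for m in (22, 30)
    for t in range(10)
]
train, test = split_per_machine_day(points)
```

Splitting inside each segment means every machine and day contributes to both sets, in contrast to a split that holds out entire segments.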
It has been observed that the model's predictive accuracy improves significantly when trained on a larger dataset. However, this advantage comes at the cost of increased data loading and training times, which tend to scale linearly with the size of the dataset. The model's results are reported using precision, recall, and F1-score. Table 1 and Figure 2 display the results of training the model on the entire set of available GPS data, a data set containing over one million data points. In this model configuration, five timestamps were consolidated into a single data point. For further information regarding the hyperparameters and features employed in this model, please navigate to the top bar and find the Notebook (load_dump_lightgbm_demo.ipynb).
| Label | Precision | Recall | F1-Score |
|---|---|---|---|
| Driving | 0.9969 | 0.9958 | 0.9964 |
| Dump | 0.8388 | 0.8867 | 0.8621 |
| Load | 0.8445 | 0.8647 | 0.8545 |
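For readers who want to relate the columns of the table, the F1-score is the harmonic mean of precision and recall. The short check below recomputes it from the rounded precision and recall values in the table; the helper name `f1_score` is introduced for this example only. Small differences in the last digit are expected, because the reported scores were presumably computed from unrounded values.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
table = {
    "Driving": (0.9969, 0.9958, 0.9964),
    "Dump": (0.8388, 0.8867, 0.8621),
    "Load": (0.8445, 0.8647, 0.8545),
}

recomputed = {label: f1_score(p, r) for label, (p, r, _) in table.items()}
for label, (_, _, reported) in table.items():
    print(f"{label}: recomputed {recomputed[label]:.4f}, reported {reported:.4f}")
```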
Taking advantage of the available GPS data, mass quantities, and type of mass moved, we have developed a report
displaying the daily progress at the construction site. The report includes an interactive plot of mass moved,
idle time statistics, and productivity information per machine per material type. The report is generated
automatically for a chosen day.
The daily report is meant to serve as a proof of concept of what could be deployed to an analytics and interactive visualization web application. The solution could be hosted on popular services such as Tableau, Grafana, or Power BI. In this sense, it is not an algorithm that directly proposes solutions, but rather a tool that enables insights into how machines operate, both at an individual and group level. We look closer at two main components of the daily report.
In the interactive map, the user can select the machine type and the number of clusters for dump and load regions to display. Drone images from Skaret and Nordlandsdalen are overlaid on the map to better visualize how the region changes over time as construction progresses. Agglomerative clustering and a convex hull algorithm are used to plot polygons of load and dump regions. The interactive plot informs on the quantity moved between regions, the top workers of the day (tons/hr), and the total mass moved for each material type.
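The polygon step can be sketched as follows. This illustration assumes that cluster labels are already available (for example from an agglomerative clustering run) and only shows how each cluster could be turned into a polygon. Andrew's monotone chain is used here as one common convex hull algorithm; the source does not state which one the report uses. The function names and the planar treatment of coordinates are assumptions for this example.

```python
def convex_hull(points):
    """Convex hull via Andrew's monotone chain.

    Returns the hull vertices in counter-clockwise order. Coordinates are
    treated as planar, a simplification for small areas.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive for a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def cluster_polygons(points, labels):
    """Build one hull polygon per cluster label."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return {label: convex_hull(members) for label, members in clusters.items()}


# Cluster 0 is a square with one interior point; cluster 1 is a triangle.
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (5, 5), (6, 5), (5, 6)]
labels = [0, 0, 0, 0, 0, 1, 1, 1]
polygons = cluster_polygons(points, labels)
```

The interior point of cluster 0 is dropped from its polygon, so each load or dump region is drawn as the smallest convex outline around its events.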
Day Overview, 04-08-2022
Total mass moved for the day by Truck: 11504.0 t
Stone: 11504.0 t, Soil: 0 t, Equipment: 0 t, Other: 0 t
Top 5 mass transfer zones for the day
L2->D0: 2153.0 t
L4->D2: 1113.0 t
L6->D2: 1023.0 t
L0->D4: 900.0 t
L4->D3: 895.5 t
Top 3 workers of the day
Nr.1 ID: 30 moved 150.6 t of mass moved per hour
Nr.2 ID: 22 moved 148.2 t of mass moved per hour
Nr.3 ID: 44 moved 139.2 t of mass moved per hour
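A summary like the one above can be produced by simple aggregation over the trips of a day. The sketch below is illustrative only: the function name `summarize_day`, the trip fields, and the made-up numbers are assumptions, and the report may define working hours differently.

```python
from collections import defaultdict


def summarize_day(trips, top_zones=5, top_workers=3):
    """Aggregate trips into total mass, top transfer zones, and top workers.

    Each trip is a dict with hypothetical keys: machine_id, load_zone,
    dump_zone, tons, and hours (time spent on the trip).
    """
    zone_mass = defaultdict(float)
    machine_mass = defaultdict(float)
    machine_hours = defaultdict(float)
    for trip in trips:
        zone_mass[(trip["load_zone"], trip["dump_zone"])] += trip["tons"]
        machine_mass[trip["machine_id"]] += trip["tons"]
        machine_hours[trip["machine_id"]] += trip["hours"]

    # Productivity in tons per hour, skipping machines without recorded time.
    rates = {
        m: machine_mass[m] / machine_hours[m]
        for m in machine_mass
        if machine_hours[m] > 0
    }
    by_value = lambda item: item[1]
    return {
        "total_tons": sum(machine_mass.values()),
        "zones": sorted(zone_mass.items(), key=by_value, reverse=True)[:top_zones],
        "workers": sorted(rates.items(), key=by_value, reverse=True)[:top_workers],
    }


# Made-up trips for illustration; not taken from the competition data.
trips = [
    {"machine_id": 30, "load_zone": "L2", "dump_zone": "D0", "tons": 30.0, "hours": 0.25},
    {"machine_id": 30, "load_zone": "L2", "dump_zone": "D0", "tons": 30.0, "hours": 0.25},
    {"machine_id": 22, "load_zone": "L4", "dump_zone": "D2", "tons": 25.0, "hours": 0.25},
    {"machine_id": 44, "load_zone": "L2", "dump_zone": "D0", "tons": 20.0, "hours": 0.5},
]
summary = summarize_day(trips)
```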
Knowing when, where, and why machines are idle throughout the day can be an important insight when considering operational changes. The daily report provides a timeline plot, as can be seen in Figure 3. The red line, named Machines in action, shows the number of machines that have started their day, i.e. have recorded their first load of the day. The blue line shows the number of idle machines. It tracks all machines standing still, including machines that are loading or dumping. Furthermore, one can filter idle machines on their status, i.e. whether their next activity is expected to be a load or a dump. We see that the number of idle machines peaks around 09:30 and 13:30.
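The idle count behind such a timeline can be approximated by binning GPS samples in time and counting the distinct machines that stand still in each bin. The following is a minimal sketch: the function names, the 30-minute bin width, and the speed threshold of 0.5 m/s for "standing still" are assumptions chosen for this example.

```python
from collections import defaultdict


def idle_timeline(samples, bin_minutes=30, speed_threshold=0.5):
    """Count distinct idle machines per time bin.

    `samples` is an iterable of (machine_id, minutes_since_midnight, speed_m_s).
    A machine counts as idle in a bin if at least one of its samples there is
    below `speed_threshold` (an assumed definition of standing still).
    """
    idle = defaultdict(set)
    for machine_id, minute, speed in samples:
        if speed < speed_threshold:
            bin_start = int(minute // bin_minutes) * bin_minutes
            idle[bin_start].add(machine_id)
    return {start: len(machines) for start, machines in sorted(idle.items())}


def peak_bins(timeline, n=2):
    """Return the start minutes of the `n` bins with the most idle machines."""
    return sorted(timeline, key=timeline.get, reverse=True)[:n]


# Made-up samples: two machines idle around 09:30, three around 13:30.
samples = [
    ("A", 570, 0.0), ("B", 575, 0.1), ("A", 600, 8.0),
    ("C", 815, 0.0), ("A", 820, 0.2), ("B", 825, 0.0),
    ("C", 900, 0.0),
]
timeline = idle_timeline(samples)
peaks = peak_bins(timeline)
```

Bin starts are given in minutes since midnight, so 570 corresponds to 09:30 and 810 to 13:30.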
There might be many reasons for machines to be idle. Some drivers might be on a break, others might be attending to other duties, and some might be waiting for an excavator. To provide more insight into this matter, we plot the position of idle machines at peak times for the selected day. An example of this can be seen in Figure 4. The map allows the user to zoom in on areas of interest. Machines waiting to load are marked by an excavator icon, while machines waiting to dump are marked by an icon of a dumping truck. This feature could be further developed with vibration data, which would make it possible to distinguish between actual idle time and times when the machine is at complete rest.
Finally, we present a heatmap of idle machines for a given day. Collecting all idle positions of all machines for a given day, we get an impression of where we are most likely to find idle machines. The heatmap can be seen in Figure 5. Combining the idle timeline, the peak idle time plot, and the heatmap, one should be able to identify potential bottlenecks and areas of reduced efficiency.
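The data behind such a heatmap is essentially a two-dimensional histogram of idle positions. The sketch below bins positions into a regular latitude/longitude grid. The function name `idle_heatmap` and the cell size of 0.001 degrees are assumptions for this example, and a plotting library would normally add smoothing on top of these counts.

```python
import math
from collections import Counter


def idle_heatmap(positions, cell_deg=0.001):
    """Count idle positions per grid cell.

    `positions` is an iterable of (lat_deg, lon_deg). Each cell is identified
    by its integer (row, column) index on a grid of `cell_deg` degrees.
    """
    counts = Counter()
    for lat, lon in positions:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


# Made-up idle positions: two close together, one further away.
positions = [(59.9312, 10.3004), (59.9313, 10.3006), (59.9352, 10.3104)]
heat = idle_heatmap(positions)
hottest_cell, hottest_count = heat.most_common(1)[0]
```

Cells with the highest counts point to the areas where machines most often stand still.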