Computer Vision for Cargo Inspection at Warehouses
In logistics and transportation, keeping cargo safe is a top priority. Warehouses must ensure that shipments are protected while in storage, since any damage can erode customer trust and lead to costly fines. The answer is thorough cargo inspection and condition reporting, a process that artificial intelligence can greatly simplify.
When ACI (Automatic Cargo Inspection) approached us to develop a system that automates cargo inspection, the primary requirements were:
Documenting the arrival of cargo.
Classifying the type of packaging.
Detecting surface-level defects and damages on the items.
Measuring both the dimensions and weight of the objects.
Counting the total number of cargo units.
Generating detailed reports of received goods, including any noted damages.
Using computer vision and machine learning, we built an automated cargo inspection system that meets all of these requirements. In this article, we describe how it works in detail.
System Overview
Cargo inspection is performed using a rectangular metal structure that we call a 'stand'. A set of scales is positioned in the middle to weigh the cargo. Cameras are mounted at the top of each side of the stand to provide a 360° view.
Upon arrival, the cargo is placed on the scales for weighing, while the cameras capture images from all angles. The stand's display offers real-time status updates such as loading, data collection, and load removal instructions.
Warehouse staff must follow the on-screen instructions for seamless operation. The collected data is then securely stored and linked to a unique QR code assigned to each shipment. Once the data is processed, the cargo is removed from the scales, marking the completion of the inspection process.
After inspection, a shipment card is generated, detailing any detected external defects, packaging type, dimensions, weight, and the total number of cargo units.
The system we developed provides:
360° View: Capturing photos of the cargo from all angles.
Weight Measurement: Precise weighing of each shipment to ensure accurate records.
Defect Detection: Identifying and classifying any visible damages on the packaging or goods.
Dimensional Analysis: Accurate measurements of cargo dimensions (length, width, height) with a high degree of precision.
Packaging Classification: Categorization of the packaging type used for the shipment.
Barcode, Data Matrix and QR Code Scanning: Seamless integration with customer ERP systems through unique identifiers.
Cargo Unit Count: Estimation of the number of items within a consignment.
Shipment Card Generation: Automatic compilation and storage of all cargo-related data in a standardized format for quick retrieval.
Automated Reporting: Creation of detailed reports documenting the status of each shipment.
Technologies and Tools
To achieve these results, we utilized the following technologies:
Label Studio: An open-source annotation tool for data labeling and moderation.
Datapipe: An ETL tool with automatic tracking of data dependencies, used for integrating new data and model retraining.
YOLOv5 & YOLOv8: State-of-the-art computer vision models for object detection and segmentation.
Django Administration: An open-source interface for system management.
The Start: Training the Models
At the planning stage of the project, we analysed the potential issues. The main challenge was managing data as the company expands: ACI is growing rapidly, setting up stands in the warehouses of new customers, and each new customer brings new types of packaging and defects to recognise.
We realised that our models would have to keep getting 'smarter' to do their job properly. In other words, we needed a flexible system that could adapt to constant change by learning from new data.
To solve this problem, we chose the Datapipe ETL library as the basis of our training pipeline because of its unique features: dependency tracking and incremental data processing. We use it to fine-tune our models on new data. Dependency tracking identifies which data has changed, while incremental data processing allows Datapipe to process only new or updated data, reducing the load on the system and speeding up the pipeline.
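The idea behind incremental processing can be sketched in plain Python. This is not the real Datapipe API, just a hypothetical illustration of how change detection via content hashes lets a pipeline touch only new or updated records:

```python
import hashlib
import json

def record_hash(record: dict) -> str:
    """Stable fingerprint of an input record."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def incremental_run(records: dict, seen_hashes: dict, process) -> list:
    """Process only records that are new or whose content changed.

    records:      {record_id: record_dict} -- current inputs
    seen_hashes:  {record_id: hash} -- state left over from the previous
                  run (mutated in place)
    process:      function applied to each new or changed record
    """
    processed = []
    for rec_id, rec in records.items():
        h = record_hash(rec)
        if seen_hashes.get(rec_id) != h:   # new or updated record
            process(rec)
            seen_hashes[rec_id] = h
            processed.append(rec_id)
    return processed
```

On a second run with one changed record, only that record is reprocessed; this is the same property that keeps Datapipe's pipelines fast as the dataset grows.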
Cargo Inspection: How it works
The data processing pipeline of the system looks like this:
Data collection: weighing and photographing the cargo.
The cargo image is uploaded to the system's Pipeline and processed in two independent model blocks:
Segmentation: identifying the type of packaging and the boundaries of the shipment.
Detection: searching for defects.
Combining the results: segmentation and detection results are combined to provide a complete picture of the condition of the cargo.
Data processing: analysing the segmentation and detection results for further calculations.
3D visualisation: the system builds a 3D model of the cargo from its images.
Determining the dimensions of the shipment and counting cargo units. Based on the 3D model, the system calculates the dimensions of the cargo and the number of cargo units.
1. Data collection
When a shipment arrives at the warehouse, it is identified by scanning a QR code, and all associated data (weight, type of packaging, dimensions, defects, and so on) is linked to this code. In this way, we keep track of the cargo at every stage of its handling and transportation.
Once the cargo has been weighed and photographed, the images are fed into the system and displayed on the administrator interface screen. The images are then processed by two independent model blocks: for cargo segmentation and for defect detection.
2. Cargo Segmentation
After the images are loaded into the system's pipeline, they are passed through a segmentation model that determines the class of cargo packaging: cardboard boxes, rolls, wooden crates, etc.
We divide packaging classes into micro classes and macro classes.
Micro classes of packaging:
'Unpackaged'
'Barrels'
'Wooden crate'
'Wooden boxes'
'Other'
'Cardboard boxes'
'Bags'
'Rolls'
Macro classes:
'Film-covered barrels'
'Film-covered wooden crates'
'Film-covered wooden boxes'
'Film-covered cardboard boxes'
'Film-covered bags'
'Film-covered rolls'
Using convolutional neural networks, YOLOv8 breaks the image into small sections and processes them with special filters. These filters look for simple features such as edges, corners or textures that can help recognise what is depicted in a section.
Once the model has analysed all sections of the image and identified important features, it determines which parts of the image belong to different packing classes.
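To illustrate what can happen downstream of segmentation, here is a hedged sketch (the names and data layout are our assumptions, not the production code): given per-class polygons returned by the model, the dominant packaging class of a shipment can be chosen by mask area using the shoelace formula.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula.
    points: list of (x, y) vertices in traversal order."""
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def dominant_packaging_class(segments):
    """Pick the packaging class whose mask covers the largest area.
    segments: list of (class_name, polygon) pairs."""
    best_class, best_area = None, 0.0
    for cls, poly in segments:
        a = polygon_area(poly)
        if a > best_area:
            best_class, best_area = cls, a
    return best_class
```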
3. Defect Detection
In the detection phase, the system looks for defects on the package such as damage, tears or deformations. For this purpose, we use YOLOv5 computer vision technology. We trained the models using more than seven thousand images of different defects.
So far, the system recognises the following defects:
‘Torn film - hole’
‘Torn film - dangling pieces’
‘Open - gap in the cargo’
‘Open - film does not cover the load from above’
‘Torn’
‘Pierced’
‘Crumpled bag’
‘Crumpled box’
‘Suspicion of shortage’
‘Suspicious shortage - non-standard number of loads in the top row’
‘Broken pallet’
‘Wet’
During image processing, YOLOv5 divides the image into a grid of cells and predicts, for each detected defect:
Bounding boxes (bbox) - coordinates of the defect boundaries in the image.
Class - defect type.
Score - probability that the found defect belongs to this class.
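A typical post-processing step for such raw predictions is confidence filtering followed by non-maximum suppression. The sketch below is a generic illustration of that step, not the production code; detection dicts with 'bbox', 'class' and 'score' keys are an assumed layout.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

def filter_detections(dets, score_thr=0.5, iou_thr=0.45):
    """Keep confident detections and suppress duplicates (greedy NMS).
    dets: list of dicts with keys 'bbox', 'class', 'score'."""
    dets = sorted((d for d in dets if d["score"] >= score_thr),
                  key=lambda d: d["score"], reverse=True)
    kept = []
    for d in dets:
        if all(iou(d["bbox"], k["bbox"]) < iou_thr or d["class"] != k["class"]
               for k in kept):
            kept.append(d)
    return kept
```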
The detection and segmentation models work independently of each other. We know what a shipment's packaging looks like, and we know what defects the shipment has. But now we want to know where exactly the defects are located on the surface of the cargo.
To do this, the system merges the predictions of the two models, combining the packaging class information with the coordinates of the detected defects. The defect coordinates are converted into a common coordinate system and synchronised with the segmentation data; by matching coordinates, the system determines exactly where each defect sits on the surface of the load.
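At its simplest, matching a defect to a package region boils down to a point-in-polygon test on the defect's bounding-box centre. A minimal sketch, with the data layout assumed for illustration:

```python
def point_in_polygon(pt, polygon):
    """Ray-casting test: is point (x, y) inside the polygon?"""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the edge cross the horizontal ray to the right of pt?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def locate_defects(defects, packages):
    """Attach each defect to the package whose mask contains its bbox centre.
    defects:  list of dicts with 'bbox' = (x1, y1, x2, y2) and 'class'
    packages: list of (package_class, polygon) pairs from segmentation."""
    located = []
    for d in defects:
        x1, y1, x2, y2 = d["bbox"]
        centre = ((x1 + x2) / 2, (y1 + y2) / 2)
        for pkg_class, poly in packages:
            if point_in_polygon(centre, poly):
                located.append({**d, "package": pkg_class})
                break
    return located
```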
4. Determining cargo dimensions and number of cargo units
To determine the dimensions of the cargo, we create a 3D visualisation of the object based on its image. We have the image after segmentation with the marked outline of the cargo (polygon), as well as depth information for each point, i.e. the distance from the camera to the object. Using this data and the transformation vector, we can build a 3D model of the object in a few steps:
Calibrating the camera. To calibrate the camera, ArUco markers with known 3D coordinates are placed on the 'stand' (for example, markers on the floor have a Z-coordinate of zero). These markers link the 2D coordinates of the image to real-world coordinates. The system finds the ArUco markers in the image, locates them in the image coordinate system, and uses this data to calibrate the camera.
Coordinate transformation. For each point in the image where depth is known, the system converts the 2D coordinates to 3D using a transformation vector that was calculated during camera calibration. The transformation vector is a set of parameters that allows the coordinates to be transformed from the camera coordinate system (camera SC) to the world coordinate system (world SC). If the depth is incorrect (e.g., equal to 0), the point is not used.
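For a pinhole camera, this conversion can be sketched as follows. The intrinsics (fx, fy, cx, cy) and the rotation/translation pair stand in for the calibration parameters and transformation vector described above; this is a simplified illustration, not the production code.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Convert a pixel (u, v) with known depth into a 3D point in the
    camera coordinate system, using a pinhole camera model.
    fx, fy: focal lengths in pixels; cx, cy: principal point."""
    if depth <= 0:           # invalid depth readings are discarded
        return None
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def camera_to_world(point, rotation, translation):
    """Apply a rigid transform (3x3 rotation as nested lists, translation
    as a 3-tuple) taking camera coordinates to world coordinates."""
    x, y, z = point
    return tuple(
        rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
        + translation[i]
        for i in range(3)
    )
```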
Creating a point cloud. Once all points in the image have been converted to 3D coordinates, they form a point cloud - a set of spatial coordinates (X, Y, Z) that represent the 3D shape of the load. This point cloud reflects the external contours of the object and its shape.
By filtering and processing the 3D point cloud, the system determines the dimensions of the load. First, it downsamples the cloud to make the calculations cheaper. Then it removes points that are too far from their neighbours or outside the expected perimeter. Next, it aligns the load with the coordinate axes so it can be measured accurately. Finally, it discards any remaining noise and calculates the dimensions from the outermost points.
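The final measurement step can be sketched as an outlier-robust bounding box over the point cloud; the sigma-based filter below is an assumed simplification of the filtering described above, not the exact production logic.

```python
import statistics

def cargo_dimensions(points, k=2.0):
    """Estimate (length, width, height) from a 3D point cloud,
    discarding statistical outliers independently along each axis.
    points: list of (x, y, z); k: how many standard deviations to keep."""
    dims = []
    for axis in range(3):
        values = [p[axis] for p in points]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        # Keep points within k standard deviations of the mean;
        # fall back to all points if everything gets filtered out.
        kept = [v for v in values if abs(v - mean) <= k * std] or values
        dims.append(max(kept) - min(kept))
    return tuple(dims)
```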
To count the number of cargo units, the system compares and combines data from multiple cameras. It analyses the bounding boxes around the objects and tracks their movement to see which objects in different images are the same cargo. In this way, the system accurately counts the number of loads.
5. Quality control and fine-tuning the model
The logistics and warehousing industry is constantly placing new demands on machine learning models. As a company grows, the number of customers increases, and so do the types of packages and defects that the system needs to detect. Under such conditions, a model trained only on old data starts to get 'lost', producing inaccurate results.
The solution to this problem is to retrain the model. That is, we train the pre-trained model on a new dataset that contains information about new types of defects or packages. This process is called 'fine-tuning'.
We use Datapipe - an ETL library that allows the system to process new data easily and efficiently. Using incremental processing, Datapipe 'sees' new or changed data and adds it to the overall dataset without touching the unchanged parts. It's like updating a photo collection by simply adding new photos, instead of rephotographing the entire album.
To control the quality of our inspection system, we have created a process to monitor and regularly fine-tune the models. It consists of several steps:
Data Collection. Our company has a network of test stands installed both at clients' warehouses and in our lab. These stands continuously collect data for analysis.
Weekly analysis. Every week, our experts review all the collected data and look for any errors and deviations in the performance of the models. This allows us to spot problems in time and take action.
Error identification. During the analysis, we highlight errors that can affect the accuracy of the models. These are usually divided into several main types:
Inaccurate dimensions: the model is wrong in determining the dimensions of objects.
Insufficient defect detection: the model fails to recognise all possible defects.
Cargo segmentation errors: the model confuses different types of cargo and categorises them incorrectly.
Data labelling. We mark all found errors with special tags. This helps to structure the data and prepare it for retraining.
Fine-tuning. After all errors are tagged, we use this data to fine-tune the models.
The fine-tuning process allows the model to adapt to specific conditions or tasks that were not covered in the original training sample. For example, one of ACI's customers uses green shipping pallets to transport goods. The model was not originally trained to recognise them, but fine-tuning makes adding them a fairly simple process: when we add the new data, Datapipe automatically updates the model.
Quality assessment. After fine-tuning, the models are tested to assess improvements and ensure robust performance.
This cyclical process helps to maintain high quality model performance and fix any issues in time.
6. Quality Metrics
The key metrics of a model must be analysed on a regular basis in order to assess its performance. For this purpose, we calculate the following metrics:
Precision and Recall to evaluate the accuracy and completeness of object recognition.
The F1-score, as the harmonic mean of Precision and Recall, shows how accurately and completely the model recognises the data. The closer the value is to 1, the better the model's performance.
Weighted and Macro F1-score for overall evaluation of model performance: Weighted F1-score takes into account the frequency of each class and shows how the model handles each class, while Macro F1-score evaluates the accuracy of each class regardless of its frequency.
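These metrics can be computed from scratch in a few lines; the sketch below illustrates the macro vs weighted distinction on parallel lists of true and predicted class labels (in practice a library such as scikit-learn would typically be used instead).

```python
from collections import Counter

def f1_per_class(true_labels, pred_labels):
    """Per-class F1 computed from parallel label lists."""
    classes = set(true_labels) | set(pred_labels)
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(true_labels, pred_labels) if t == c and p == c)
        fp = sum(1 for t, p in zip(true_labels, pred_labels) if t != c and p == c)
        fn = sum(1 for t, p in zip(true_labels, pred_labels) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores

def macro_f1(true_labels, pred_labels):
    """Unweighted mean F1: every class counts equally."""
    scores = f1_per_class(true_labels, pred_labels)
    return sum(scores.values()) / len(scores)

def weighted_f1(true_labels, pred_labels):
    """F1 averaged with each class weighted by its frequency in the truth."""
    scores = f1_per_class(true_labels, pred_labels)
    counts = Counter(true_labels)
    total = len(true_labels)
    return sum(scores[c] * counts[c] / total for c in scores)
```

A rare class with poor F1 drags macro F1 down sharply while barely moving weighted F1, which is why we track both.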
We use the Metabase platform to monitor metrics. It is easy to use and great for tracking the dynamics of model performance metrics.
Note the growth of the green pallets graph after the fine-tuning phase: the F1-score immediately increased to 0.7-0.8, indicating the effectiveness of the approach and the high performance of the models.
Why choose the ACI system?
1. Objectivity
Human judgment is subjective and influenced by factors like fatigue and inattention, which can compromise inspection quality. The AI system, on the other hand, applies consistent standards and evaluation criteria to every shipment. It analyzes cargo based solely on received data and predefined parameters, reducing the risk of errors or security breaches.
2. Cost-efficiency
Despite the initial investment in equipment and set-up of an automated system, in the long run it proves to be more profitable than using human labour. Its implementation reduces staff training costs, minimises errors and reduces the likelihood of downtime due to human error. The system is also easily scalable.
3. Safety
In some industries, shipments may contain hazardous materials that pose a health risk. This is particularly the case with chemicals and flammable substances. In such cases, the slightest damage to packaging can have serious consequences. Automated cargo inspection is therefore much safer than manual inspection.
Conclusion
Our team has successfully created a system for automated cargo inspection. Using computer vision and machine learning, the system collects all the necessary data about the cargo and generates a report on its condition. And because the models are constantly refreshed, the system stays stable and adapts to new customer requirements.
A big thank you to our developers:
Andrey Tatarinov, CEO and CTO of epoch8.co and AGIMA.AI
Engineering team: Arseniy Koryagin, Alexander Kozlov, Dmitry Lesnichiy, Timur Sheidaev, Renat Shakirov, Lev Evtodienko, Alexander Korotayevsky.