Using ROS for Computer Vision Tasks

Hi! I’m Andrey Tatarinov, and I’ve been developing machine learning solutions for over seven years. In this video, I’ll talk about ROS (Robot Operating System) and how it helps solve computer vision tasks that involve real hardware. We’ll cover:

Why traditional approaches to these tasks face challenges.
What ROS is and how it helps us.
An example of a ROS-based application: a waste-sorting robot.
How to get started with ROS.

Integrating Computer Vision with Real Hardware

In many projects, the goal isn’t just to recognize objects in photos or videos but to integrate the system with real-world equipment. For example:

Capturing video and recognizing people.
Detecting QR codes and sending data via API.
Tracking events in bowling, recording scores and timestamps.
Identifying objects and signaling a robotic arm to pick them up.

The Naive Approach

You might start with OpenCV for video capture and write Python scripts for processing. It seems simple, but soon issues arise:

Tight coupling to data sources. Capturing video from a USB camera differs from processing network streams or video files. Switching sources becomes difficult.
Lack of transparency. In complex pipelines, it’s hard to see what happens between processing steps, like how data is transformed before sending a command.
Monolithic architecture. Even small changes require major rewrites, and more time goes into building tooling than into the core business logic.

The solution to these problems is to use systems that support modularity. For example, ROS allows you to develop applications where components work independently and can be easily combined.

The Solution: Modular Systems with ROS

ROS is not an operating system despite its name. It’s a platform for developing systems with multiple components. ROS provides modularity, introspection, and ready-made solutions for common tasks.

How ROS Works

ROS provides an architecture based on nodes and topics:

Nodes perform independent tasks. For example, one node can capture video, while another processes the image.
Topics handle message exchange between nodes. They act as channels through which nodes share data.
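Message queues (buffers) determine how many unprocessed messages are kept for each subscriber. With a queue size of 1, a slow subscriber always receives only the latest message, which is useful for real-time systems.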
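To make the node/topic/queue idea concrete, here is a minimal plain-Python sketch that mimics the semantics described above. It is not the rospy or rclpy API: the `Topic` class and the `/camera/image` name are hypothetical. In ROS 1, the corresponding setting is the `queue_size` argument of `rospy.Publisher` and `rospy.Subscriber`. Here each subscriber gets its own bounded buffer, and when the buffer is full the oldest message is discarded.

```python
from collections import deque
from typing import Any, Deque, List


class Topic:
    """Hypothetical stand-in for a ROS topic: a named channel with per-subscriber buffers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._queues: List[Deque[Any]] = []

    def subscribe(self, queue_size: int = 1) -> Deque[Any]:
        # deque(maxlen=n) silently drops the oldest item when full, so a
        # subscriber with queue_size=1 only ever holds the newest message.
        q: Deque[Any] = deque(maxlen=queue_size)
        self._queues.append(q)
        return q

    def publish(self, msg: Any) -> None:
        # Every subscriber receives its own copy of the message reference.
        for q in self._queues:
            q.append(msg)


# A "camera node" publishes five frames before the slow "detector node" reads anything.
frames = Topic("/camera/image")
detector_inbox = frames.subscribe(queue_size=1)   # real-time consumer: latest frame only
logger_inbox = frames.subscribe(queue_size=10)    # recorder: keeps a short history

for i in range(5):
    frames.publish(f"frame-{i}")

print(list(detector_inbox))  # ['frame-4']
print(list(logger_inbox))    # ['frame-0', 'frame-1', 'frame-2', 'frame-3', 'frame-4']
```

The same topic serves two consumers with different needs: the detector never falls behind because stale frames are overwritten, while the logger keeps the full short history.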

Example: In a surveillance system, ROS allows separate handling of camera streams, object detection, and device control — all through different nodes.

This approach allows you to:

Replace individual modules, such as swapping a USB camera for another one or using video playback for testing.
Add introspection to easily see what data is passing through the system.
Avoid code duplication with ready-made ROS components for video capture, processing, and control.
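The first two points can be illustrated with a small plain-Python sketch (again not the ROS API; all names are hypothetical). The processing node depends only on a topic, so a live camera node and a playback node are interchangeable. A second subscriber acts as an introspection tap, similar in spirit to `rostopic echo`, and records everything that passes through.

```python
from typing import Callable, Iterable, List, Tuple


class Topic:
    """Minimal synchronous topic: each published message goes to every callback."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._callbacks: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self._callbacks.append(callback)

    def publish(self, msg: str) -> None:
        for cb in self._callbacks:
            cb(msg)


def usb_camera_node(topic: Topic) -> None:
    # Hypothetical stand-in for a live camera driver.
    for i in range(3):
        topic.publish(f"usb-frame-{i}")


def playback_node(topic: Topic, recorded: Iterable[str]) -> None:
    # Replays previously recorded frames, e.g. for testing.
    for frame in recorded:
        topic.publish(frame)


def run_pipeline(source: Callable[[Topic], None]) -> Tuple[List[str], List[str]]:
    topic = Topic("/camera/image")
    processed: List[str] = []
    seen: List[str] = []
    # The processing node knows only the topic, not where frames come from.
    # str.upper() is a placeholder for real image processing.
    topic.subscribe(lambda frame: processed.append(frame.upper()))
    # Introspection tap: records raw messages without touching the pipeline.
    topic.subscribe(seen.append)
    source(topic)
    return processed, seen


live, live_log = run_pipeline(usb_camera_node)
replay, replay_log = run_pipeline(lambda t: playback_node(t, ["rec-a", "rec-b"]))
print(live)        # ['USB-FRAME-0', 'USB-FRAME-1', 'USB-FRAME-2']
print(replay)      # ['REC-A', 'REC-B']
print(replay_log)  # ['rec-a', 'rec-b']
```

Swapping the source required no change to the processing code, and adding the tap required no change to either node.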

Comparison with Other Tools

While there are other tools like GStreamer and MediaPipe, they address more specific tasks:

GStreamer: Focuses on multimedia stream processing.
MediaPipe: Suitable for local video processing tasks, such as on mobile devices.
ROS: Provides a complete ecosystem for building complex modular applications, including device control, data processing, and monitoring.

ROS Limitations

Despite its advantages, ROS has some drawbacks:

Requires significant effort during the setup phase.
Knowledge of specific tools and concepts is essential.
Performance depends on the system configuration and proper node design.

Example: Waste-Sorting Robot

Let’s look at a waste-sorting system. For such a robot, you can define four key nodes:

Video Capture. The node publishes frames to a topic at 10 frames per second.
Object Recognition. The node processes the latest frame and publishes the results.
Robotic Arm Control. The node monitors the arm’s status and receives commands.
Decision-Making. The node combines recognition data with the arm’s status to send action signals.
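The decision-making node is where the inputs meet, so it is worth sketching. The version below is a hypothetical plain-Python illustration, not the actual system: the message types, the waste labels, and the command string format are all assumptions. In ROS, `on_detections` and `on_arm_status` would be subscriber callbacks, and `commands` would be messages published to the arm's command topic. The node keeps only the latest message from each input and issues a pick command only when the arm is known to be idle.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Detection:
    label: str  # hypothetical classes, e.g. "plastic", "glass"
    x: float    # position on the conveyor, hypothetical units
    y: float


@dataclass
class ArmStatus:
    busy: bool


class DecisionNode:
    """Combines the latest recognition result with the latest arm status."""

    def __init__(self, target_labels: List[str]) -> None:
        self.target_labels = set(target_labels)
        self.latest_detections: List[Detection] = []
        self.latest_arm: Optional[ArmStatus] = None
        self.commands: List[str] = []  # would be published to the arm's command topic

    def on_detections(self, detections: List[Detection]) -> None:
        self.latest_detections = detections
        self._decide()

    def on_arm_status(self, status: ArmStatus) -> None:
        self.latest_arm = status
        self._decide()

    def _decide(self) -> None:
        # Wait until the arm has reported in and is idle.
        if self.latest_arm is None or self.latest_arm.busy:
            return
        for det in self.latest_detections:
            if det.label in self.target_labels:
                self.commands.append(f"pick {det.label} at ({det.x:.2f}, {det.y:.2f})")
                # Assume the arm is busy until it reports otherwise, and drop the
                # used detections so the same object is not picked twice.
                self.latest_arm = ArmStatus(busy=True)
                self.latest_detections = []
                return


node = DecisionNode(["plastic"])
node.on_detections([Detection("glass", 0.1, 0.2), Detection("plastic", 0.5, 0.25)])
node.on_arm_status(ArmStatus(busy=False))  # arm idle: issues one command
node.on_arm_status(ArmStatus(busy=False))  # no fresh detections: nothing new
print(node.commands)  # ['pick plastic at (0.50, 0.25)']
```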

ROS supports asynchronous communication between nodes, which means nodes can:

Operate at different speeds.
Process data independently from other modules.
Subscribe to topics only when needed.

For example, the video capture node might publish data at 30 frames per second, while the recognition node processes only every tenth frame.
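A small discrete-time simulation shows how "every tenth frame" falls out of asynchronous communication with a queue size of 1. This is a sketch under stated assumptions, not a measurement: one tick equals one camera frame period (1/30 s at 30 frames per second), and the recognizer is assumed to need a fixed number of ticks per image. Frames that arrive while it is busy overwrite each other, so the recognizer always picks up the newest one.

```python
from collections import deque
from typing import Deque, List


def simulate(num_frames: int, process_ticks: int) -> List[int]:
    """Return the indices of frames the recognizer actually processes.

    One tick = one camera frame period. The recognizer needs `process_ticks`
    ticks per frame and reads from a queue of size 1.
    """
    inbox: Deque[int] = deque(maxlen=1)
    processed: List[int] = []
    busy_until = 0  # tick at which the recognizer becomes free again
    for tick in range(num_frames):
        inbox.append(tick)            # camera publishes frame `tick`, overwriting any unread frame
        if tick >= busy_until:        # recognizer is free: take the newest frame
            processed.append(inbox.popleft())
            busy_until = tick + process_ticks
    return processed


# Recognizer needs ~333 ms (10 ticks) per image: it sees every tenth frame.
print(simulate(30, 10))  # [0, 10, 20]
# A faster recognizer (3 ticks, ~100 ms) sees every third frame.
print(simulate(30, 3))   # [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
```

Neither node has to know the other's speed: the camera keeps publishing at its own rate, and the queue size alone decides which frames the slower consumer sees.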

How to Get Started with ROS

If you want to try ROS:

Install ROS. Choose a distribution, such as ROS Noetic (the final ROS 1 release; for new projects, a current ROS 2 distribution may be a better fit).
Study the Documentation. The official ROS website has excellent resources for beginners.
Build a Simple Project. Start with a basic system: capture video, process it, and display the results.

ROS is ideal for tasks that require modularity and scalability. It reduces integration complexity, allowing you to focus on solving the core problem rather than building tools from scratch.

Conclusion

ROS is a powerful tool for creating modular systems, especially in the field of computer vision. Its architecture supports efficient development, scaling, and maintenance of complex applications. While it takes time to learn and set up, the flexibility and convenience it offers make it worth the effort.

At Epoch8, we develop computer vision solutions based on ROS, creating modular and efficient systems for automation and robotics.