Color and Size Sorting System with an ESP32-Based Robot Arm

Robotics has seen explosive growth in usage in the past few decades owing to its abundance of benefits and automation capabilities. The use of robotics is especially prominent in industries with high demand for precise, automated machines to increase production efficiency. One of the robots commonly used in these industries is the robot arm, a machine designed specifically to mimic some of the human arm's capabilities in order to execute certain tasks repeatedly, autonomously or otherwise. The development of a simple, replicable robot arm capable of autonomously performing certain tasks would benefit the application of robotics throughout the country. This paper discusses the design of a robotic arm with five degrees of freedom capable of picking up and placing objects of three different colors and sorting them by size, from largest to smallest. The additional degree of freedom allows the robot to adjust the rotation of its grip to properly grab rotated objects. With an ESP32-CAM acting as its eyes, the robot is aided by a Python program run remotely on a separate laptop that analyzes the video stream uploaded by the camera to determine the coordinates, color, size, and rotation of objects placed within the detection zone. Inverse kinematics equations run on an ESP32 microcontroller allow the robot to reach arbitrary coordinates specified by the Python program. Testing revealed that the resulting robot is able to perform the tasks it was designed for, namely recognizing and sorting boxes of different colors and sizes, with an accuracy of 100% in recognition tasks and 91.5% in sorting tasks.


INTRODUCTION
The advancement of modern technology has reached the point where humans are able to create machines that mimic human movements to varying degrees in order to help perform certain tasks. The Oxford English Dictionary defines a robot as "a machine resembling a human being and able to replicate certain human movements and functions automatically" [1]. Some of these machines are designed specifically to mimic the versatility of the human arm, allowing the robot, dubbed a "robotic arm," to perform tasks requiring some degree of precision. These robot arms are commonly found in modern factories, where they act as autonomous, tireless workers with extremely high reliability and efficiency compared to their human counterparts. Robot arms commonly move by utilizing servo motors, actuators that are capable of holding their current position and moving to other, specified positions [2]. Given the advantages a robot arm can provide to the efficiency of industrial processes, it is worthwhile to invest time and effort into creating a robot arm with high flexibility and a variety of functions, capable of autonomously sorting objects based on certain characteristics, namely size and color. By implementing inverse kinematics, it is possible for the robot to pick up objects placed in arbitrary locations within its reach. The utilization of computer vision allows the robot to monitor a certain area and act autonomously according to changes that happen in this area [3].
This research references a number of relevant previous studies as a basis upon which to build the robot arm system. Work [4] created an automatic robot arm with the capability to sort objects according to their color using a TCS3200 sensor. A robot arm was designed in work [5] with a single degree of freedom on which it pivoted to grab objects and detect their color before placing them in the appropriate storage zone. Work [6] delved into applying inverse kinematics in order to control a 3-DOF robot arm. Work [7] designed a 5-DOF robot arm capable of distinguishing objects of three different colors and placing them in their respective storage areas. Utilizing inverse kinematics, that robot arm was able to pick up objects placed in arbitrary positions without additional input from the user, as long as the objects were located in the robot's detection area. Work [8] designed a robot arm capable of moving objects placed in arbitrary locations within the robot's reach. The objects used in that research were cylindrical and triangular in shape, between which the robot was also able to properly distinguish. This three-degrees-of-freedom robot used 3-DOF inverse kinematics to define the joint parameters required to move the gripper to grab objects, and was powered by a Raspberry Pi 3 Model B and an OpenCM 9.04.
To further develop the projects created in these previous studies, this research aims to create a robot arm with the capability to accurately sort objects not only by color, but also by size. The project uses an ESP32 as the brain of the robot, a microcontroller with a dual-core processor developed by Espressif Systems [9]. The robot also uses an ESP32-CAM equipped with an OV2640 camera module as its eyes. The robot is designed to pick up objects of three different colors placed arbitrarily on the detection zone, a feature enabled by a 5-DOF inverse kinematics system run on the ESP32 microcontroller. As an added feature, the robot arm is also able to properly pick up objects placed in arbitrary orientations by rotating its gripper to match the object's orientation. This research uses Python, a widely used high-level programming language developed by Guido van Rossum and known for its intuitiveness and ease of use [10], as the primary language for the robot arm's software.

RESEARCH METHOD
This section contains the overview of the system, the robot arm's physical design, the inverse kinematics equations used for the robot arm, and the software running the robot arm.

General Overview of the System
Figure 1 shows the general overview of the system.

Figure 1. General Overview of the System
As seen in Figure 1, the system consists of an ESP32-CAM equipped with an OV2640 camera module, a laptop or computer on which the Python program runs, an ESP32 microcontroller, and three MG90s servos coupled with three MG996R servos to move the robot arm. Upon activation, the camera monitors an area containing the detection zone on which objects are placed. This footage is streamed to the ESP32-CAM's own web server via Wi-Fi. The Python program, running on a laptop or computer, grabs and analyzes each frame of this stream to detect boxes placed on the detection zone; the camera's only function is to provide raw footage for the program to analyze. Once an object is detected, the program finds the characteristics of the object, including its location (X, Y), its orientation (R), its color, and its size, and creates a command to send to the ESP32 based on these characteristics. The ESP32 microcontroller receives commands from the Python program through its own web server, which takes specific URLs and translates them into parameters for the inverse kinematics equations; these output values for the servo motors to move the robot arm into position to grab the object. As an additional feature, it is possible to control the robot arm by manually inputting specific X, Y, and Z values to the ESP32 web server.
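To illustrate the command path from the Python program to the ESP32 web server, the sketch below encodes the detected object characteristics as URL parameters of an HTTP GET request. The endpoint name, parameter names, and IP address here are assumptions for illustration, not the exact URL scheme used in this project.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder address of the ESP32 on the local Wi-Fi network.
ESP32_URL = "http://192.168.1.50"

def build_command_url(x_mm, y_mm, rotation_deg, color):
    """Encode object characteristics as a GET request URL for the ESP32.

    Parameter names (x, y, r, color) are hypothetical; the real firmware
    defines its own URL format.
    """
    params = urlencode({"x": x_mm, "y": y_mm, "r": rotation_deg, "color": color})
    return f"{ESP32_URL}/move?{params}"

def send_command(x_mm, y_mm, rotation_deg, color, timeout=5):
    """Send one pick-and-place command; the ESP32 parses the URL into
    parameters for its inverse kinematics equations."""
    with urlopen(build_command_url(x_mm, y_mm, rotation_deg, color),
                 timeout=timeout) as resp:
        return resp.status == 200
```

Keeping the command a plain GET request means the same interface serves both the Python program and the manual X, Y, Z input mentioned above.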

Software Design
The software handles object detection, which includes detection of the object's size, color, and location. In order to allow the program to accurately determine the location of an object, a detection zone measuring 290 mm by 200 mm is used in this research. The program first detects this zone in the raw footage provided by the camera, then projects it onto an image of 290 by 200 pixels using a perspective transform. This is done so that a distance of one pixel in the image is roughly equal to a distance of one millimeter in the real world. Contour approximation, a method to reduce the number of edges a contour has [11], is applied here in order to extract a four-cornered contour to use for the perspective transform. Figure 2 shows this process.

Figure 2. Detection Zone Extraction Process
After the detection zone has been extracted, the program must be informed of the location of the detection zone relative to the robot arm. As seen in Figure 3, the pixel coordinate system does not match the real-world coordinate system used in the robot arm's inverse kinematics equations. Therefore, it is necessary to provide X and Y translational values to reconcile these two coordinate systems, as stated in Equation 1.

where the subscript p denotes pixel coordinates, t the translational offset, and n real-world coordinates.
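Assuming Equation 1 is a pure translation between the pixel frame and the robot frame (consistent with the 1 px ≈ 1 mm scaling established above), the reconciliation can be sketched as below. The offset values are hypothetical placeholders; in practice they are measured from the physical placement of the detection zone relative to the robot arm.

```python
# Hypothetical translational offsets in millimeters (measured per setup).
X_T, Y_T = 60.0, -145.0

def pixel_to_real(x_p, y_p):
    """Map a pixel coordinate (1 px ~ 1 mm) into the robot-arm frame.

    Subscripts follow the paper's legend: p = pixel, t = translational,
    n = real-world.
    """
    x_n = x_p + X_T
    y_n = y_p + Y_T
    return x_n, y_n
```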
The program then has to detect the characteristics of the objects placed on the detection zone. This process is explained in Figure 4.

Figure 4. Object Detection Process
First, the program detects the dominant color present in the image. The program then applies a mask over the image to isolate one specific color and draws contours over the objects to determine their size. The program also determines where to put each object based on its size and the sizes of objects already in the storage area. After sorting the objects from largest to smallest, the program determines the coordinates of each object's centroid before sending these coordinates to the robot arm through the ESP32 web server for the inverse kinematics equations. This object detection method incorporates the Canny edge detection method, which measures the change in intensity between pixels (gradients) in order to detect edges in an image [12]. It also uses non-maxima suppression and hysteresis thresholding to increase accuracy [13].

Robot Arm Design
The chassis of the robot arm is 3D printed with plastic as its primary material. The robot arm consists of five parts: the base that supports the entire arm, a first link connecting the base and the second link, the second link connecting the first and third links, the third link connecting the second link and the gripper, and the gripper itself. The robot arm has five degrees of freedom and uses six servo motors: three MG996R servos for the base, first joint, and second joint, and three MG90s servos for the third joint, the gripper orientation joint, and the gripper joint. Figure 5 contains the digital design of the robot arm.

Robot Arm Inverse Kinematics
Inverse kinematics is used to allow the robot arm to translate coordinates into movement. The process has two parts, illustrated in Figure 6 and Figure 7. The first part deals with the movement of the base joint seen from a bird's-eye point of view, while the second deals with the movement of the first, second, and third joints seen from the side. The inverse kinematics equations are derived from Figure 6 and Figure 7, resulting in Equation 2 through Equation 12. The resulting values of these equations are then written to their corresponding servo motors in order to achieve the desired movement and robot arm position [14], [15].
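The two-part structure described above can be illustrated with a reduced model: the base angle from the top view, plus a planar two-link arm solved with the law of cosines for the side view. This is a simplified sketch with assumed link lengths, not a reproduction of this robot's actual Equations 2 through 12, which cover all five joints.

```python
import math

# Hypothetical link lengths in millimeters, for illustration only.
L1, L2 = 100.0, 100.0

def inverse_kinematics(x, y, z):
    """Return (base, shoulder, elbow) angles in degrees for a target point,
    with the shoulder at the origin. Elbow angle 0 means a straight arm."""
    # Part 1 (bird's-eye view): rotate the base joint toward the target.
    base = math.degrees(math.atan2(y, x))
    # Part 2 (side view): solve the planar two-link arm.
    r = math.hypot(math.hypot(x, y), z)  # shoulder-to-target distance
    cos_elbow = (r * r - L1 * L1 - L2 * L2) / (2 * L1 * L2)
    if abs(cos_elbow) > 1:
        raise ValueError("target out of reach")
    elbow = math.degrees(math.acos(cos_elbow))  # law of cosines
    # Shoulder angle: elevation to the target plus the triangle correction.
    elev = math.degrees(math.atan2(z, math.hypot(x, y)))
    corr = math.degrees(math.atan2(L2 * math.sin(math.radians(elbow)),
                                   L1 + L2 * math.cos(math.radians(elbow))))
    return base, elev + corr, elbow
```

Writing the three returned angles (after any per-servo offset calibration) to the corresponding servos moves the simplified arm's gripper to the requested point, mirroring how the ESP32 uses its equation outputs.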

TESTING AND ANALYSIS
In order to evaluate the performance of the robot arm, a number of tests were performed. The tests are as follows: the Object Detection and Object Relocation Test, the Object Relocation Test with Arbitrary Object Orientation, and the Object Sorting Test, each explained in its respective section. All of these tests were conducted with the object storage coordinates presented in Table 1. In the table, each color's storage zone is divided into three sections. The X and Y values denote the location of each section, while the R value denotes the desired object placement orientation.