feat: switch backend to PaddleOCR-NCNN, switch project to CMake
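1. The project backend has been migrated entirely to the PaddleOCR-NCNN algorithm and has passed basic compatibility testing. 2. The project is now organized with CMake; to improve compatibility with third-party libraries, the QMake project is no longer provided. 3. The rights declaration files and the code project have been reorganized to minimize the risk of infringement. Log: switch backend to PaddleOCR-NCNN, switch project to CMake Change-Id: I4d5d2c5d37505a4a24b389b1a4c5d12f17bfa38c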
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/images/MachineLearnings.jpg
vendored
Normal file
After Width: | Height: | Size: 3.2 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/images/calib3d_icon.jpg
vendored
Normal file
After Width: | Height: | Size: 3.1 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/images/core.jpg
vendored
Normal file
After Width: | Height: | Size: 4.1 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/images/featureicon.jpg
vendored
Normal file
After Width: | Height: | Size: 5.0 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/images/gui.jpg
vendored
Normal file
After Width: | Height: | Size: 3.2 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/images/imgproc.jpg
vendored
Normal file
After Width: | Height: | Size: 2.6 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/images/intro.png
vendored
Normal file
After Width: | Height: | Size: 3.2 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/images/obj_icon.jpg
vendored
Normal file
After Width: | Height: | Size: 2.9 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/images/photoicon.jpg
vendored
Normal file
After Width: | Height: | Size: 4.9 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/images/videoicon.jpg
vendored
Normal file
After Width: | Height: | Size: 6.3 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_bindings/images/nlm_icon.jpg
vendored
Normal file
After Width: | Height: | Size: 3.6 KiB |
190
3rdparty/opencv-4.5.4/doc/py_tutorials/py_bindings/py_bindings_basics/py_bindings_basics.markdown
vendored
Normal file
@ -0,0 +1,190 @@
|
||||
How OpenCV-Python Bindings Work {#tutorial_py_bindings_basics}
|
||||
=================================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
Learn:
|
||||
|
||||
- How OpenCV-Python bindings are generated?
|
||||
- How to extend new OpenCV modules to Python?
|
||||
|
||||
How OpenCV-Python bindings are generated?
|
||||
-----------------------------------------
|
||||
|
||||
In OpenCV, all algorithms are implemented in C++. But these algorithms can be used from different
|
||||
languages like Python, Java etc. This is made possible by the bindings generators. These generators
|
||||
create a bridge between C++ and Python which enables users to call C++ functions from Python. To get
|
||||
a complete picture of what is happening in background, a good knowledge of Python/C API is required.
|
||||
A simple example on extending C++ functions to Python can be found in official Python
|
||||
documentation[1]. So extending all functions in OpenCV to Python by writing their wrapper functions
|
||||
manually is a time-consuming task. So OpenCV does it in a more intelligent way. OpenCV generates
|
||||
these wrapper functions automatically from the C++ headers using some Python scripts which are
|
||||
located in `modules/python/src2`. We will look into what they do.
|
||||
|
||||
First, `modules/python/CMakeLists.txt` is a CMake script which checks the modules to be extended to
|
||||
Python. It will automatically check all the modules to be extended and grab their header files.
|
||||
These header files contain a list of all classes, functions, constants etc. for that particular
|
||||
module.
|
||||
|
||||
Second, these header files are passed to a Python script, `modules/python/src2/gen2.py`. This is the
|
||||
Python bindings generator script. It calls another Python script `modules/python/src2/hdr_parser.py`.
|
||||
This is the header parser script. This header parser splits the complete header file into small
|
||||
Python lists. So these lists contain all details about a particular function, class etc. For
|
||||
example, a function will be parsed to get a list containing function name, return type, input
|
||||
arguments, argument types etc. Final list contains details of all the functions, enums, structs,
|
||||
classes etc. in that header file.
|
||||
|
||||
But header parser doesn't parse all the functions/classes in the header file. The developer has to
|
||||
specify which functions should be exported to Python. For that, there are certain macros added to
|
||||
the beginning of these declarations which enables the header parser to identify functions to be
|
||||
parsed. These macros are added by the developer who programs the particular function. In short, the
|
||||
developer decides which functions should be extended to Python and which are not. Details of those
|
||||
macros are given in the next section.
|
||||
|
||||
So header parser returns a final big list of parsed functions. Our generator script (gen2.py) will
|
||||
create wrapper functions for all the functions/classes/enums/structs parsed by header parser (You
|
||||
can find these header files during compilation in the `build/modules/python/` folder as
|
||||
pyopencv_generated_\*.h files). But there may be some basic OpenCV datatypes like Mat, Vec4i,
|
||||
Size. They need to be extended manually. For example, a Mat type should be extended to Numpy array,
|
||||
Size should be extended to a tuple of two integers etc. Similarly, there may be some complex
|
||||
structs/classes/functions etc. which need to be extended manually. All such manual wrapper functions
|
||||
are placed in `modules/python/src2/cv2.cpp`.
|
||||
|
||||
So now the only thing left is the compilation of these wrapper files, which gives us the **cv2** module. So
|
||||
when you call a function, say `res = equalizeHist(img1,img2)` in Python, you pass two numpy arrays and
|
||||
you expect another numpy array as the output. So these numpy arrays are converted to cv::Mat and
|
||||
then the equalizeHist() function is called in C++. The final result, res, will be converted back into a Numpy
|
||||
array. So in short, almost all operations are done in C++, which gives us almost the same speed as that
|
||||
of C++.
|
||||
|
||||
So this is the basic version of how OpenCV-Python bindings are generated.
|
||||
|
||||
@note There is no 1:1 mapping of numpy.ndarray on cv::Mat. For example, cv::Mat has channels field,
|
||||
which is emulated as last dimension of numpy.ndarray and implicitly converted.
|
||||
However, such implicit conversion has problem with passing of 3D numpy arrays into C++ code
|
||||
(the last dimension is implicitly reinterpreted as number of channels).
|
||||
Refer to the [issue](https://github.com/opencv/opencv/issues/19091) for workarounds if you need to process 3D arrays or ND-arrays with channels.
|
||||
OpenCV 4.5.4+ has `cv.Mat` wrapper derived from `numpy.ndarray` to explicitly handle the channels behavior.
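The channel convention in this note can be pictured with a small toy model. This is only a sketch of the convention described above, not OpenCV's actual conversion code; the function name `mat_layout` is hypothetical and the model ignores every other rule the real converter applies.

```python
def mat_layout(shape):
    """Toy model: map a NumPy shape to (rows, cols, channels).

    Illustration only, not the real OpenCV conversion logic.
    """
    if len(shape) == 2:
        rows, cols = shape
        return rows, cols, 1          # 2D array: single-channel matrix
    if len(shape) == 3:
        rows, cols, channels = shape  # last dimension is read as channels
        return rows, cols, channels
    raise ValueError("this toy model only handles 2D and 3D shapes")


print(mat_layout((480, 640)))      # grayscale image
print(mat_layout((480, 640, 3)))   # BGR image
print(mat_layout((10, 20, 30)))    # a genuine 3D volume is read as 10x20 with 30 channels
```

The last call shows the pitfall: under this convention a true 3D array is indistinguishable from a 2D array with many channels.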
|
||||
|
||||
|
||||
How to extend new modules to Python?
|
||||
------------------------------------
|
||||
|
||||
The header parser parses the header files based on wrapper macros added to function declarations.
|
||||
Enumeration constants don't need any wrapper macros. They are automatically wrapped. But remaining
|
||||
functions, classes etc. need wrapper macros.
|
||||
|
||||
Functions are extended using `CV_EXPORTS_W` macro. An example is shown below.
|
||||
@code{.cpp}
|
||||
CV_EXPORTS_W void equalizeHist( InputArray src, OutputArray dst );
|
||||
@endcode
|
||||
Header parser can understand the input and output arguments from keywords like
|
||||
InputArray, OutputArray etc. But sometimes, we may need to hardcode inputs and outputs. For that,
|
||||
macros like `CV_OUT`, `CV_IN_OUT` etc. are used.
|
||||
@code{.cpp}
|
||||
CV_EXPORTS_W void minEnclosingCircle( InputArray points,
|
||||
CV_OUT Point2f& center, CV_OUT float& radius );
|
||||
@endcode
|
||||
For large classes also, `CV_EXPORTS_W` is used. To extend class methods, `CV_WRAP` is used.
|
||||
Similarly, `CV_PROP` is used for class fields.
|
||||
@code{.cpp}
|
||||
class CV_EXPORTS_W CLAHE : public Algorithm
|
||||
{
|
||||
public:
|
||||
CV_WRAP virtual void apply(InputArray src, OutputArray dst) = 0;
|
||||
|
||||
CV_WRAP virtual void setClipLimit(double clipLimit) = 0;
|
||||
CV_WRAP virtual double getClipLimit() const = 0;
|
||||
};
|
||||
@endcode
|
||||
Overloaded functions can be extended using `CV_EXPORTS_AS`. But we need to pass a new name so that
|
||||
each function will be called by that name in Python. Take the case of integral function below. Three
|
||||
functions are available, so each one is named with a suffix in Python. Similarly `CV_WRAP_AS` can be
|
||||
used to wrap overloaded methods.
|
||||
@code{.cpp}
|
||||
//! computes the integral image
|
||||
CV_EXPORTS_W void integral( InputArray src, OutputArray sum, int sdepth = -1 );
|
||||
|
||||
//! computes the integral image and integral for the squared image
|
||||
CV_EXPORTS_AS(integral2) void integral( InputArray src, OutputArray sum,
|
||||
OutputArray sqsum, int sdepth = -1, int sqdepth = -1 );
|
||||
|
||||
//! computes the integral image, integral for the squared image and the tilted integral image
|
||||
CV_EXPORTS_AS(integral3) void integral( InputArray src, OutputArray sum,
|
||||
OutputArray sqsum, OutputArray tilted,
|
||||
int sdepth = -1, int sqdepth = -1 );
|
||||
@endcode
|
||||
Small classes/structs are extended using `CV_EXPORTS_W_SIMPLE`. These structs are passed by value
|
||||
to C++ functions. Examples are `KeyPoint`, `DMatch` etc. Their methods are extended by `CV_WRAP` and
|
||||
fields are extended by `CV_PROP_RW`.
|
||||
@code{.cpp}
|
||||
class CV_EXPORTS_W_SIMPLE DMatch
|
||||
{
|
||||
public:
|
||||
CV_WRAP DMatch();
|
||||
CV_WRAP DMatch(int _queryIdx, int _trainIdx, float _distance);
|
||||
CV_WRAP DMatch(int _queryIdx, int _trainIdx, int _imgIdx, float _distance);
|
||||
|
||||
CV_PROP_RW int queryIdx; // query descriptor index
|
||||
CV_PROP_RW int trainIdx; // train descriptor index
|
||||
CV_PROP_RW int imgIdx; // train image index
|
||||
|
||||
CV_PROP_RW float distance;
|
||||
};
|
||||
@endcode
|
||||
Some other small classes/structs can be exported using `CV_EXPORTS_W_MAP` where it is exported to a
|
||||
Python native dictionary. `Moments()` is an example of it.
|
||||
@code{.cpp}
|
||||
class CV_EXPORTS_W_MAP Moments
|
||||
{
|
||||
public:
|
||||
//! spatial moments
|
||||
CV_PROP_RW double m00, m10, m01, m20, m11, m02, m30, m21, m12, m03;
|
||||
//! central moments
|
||||
CV_PROP_RW double mu20, mu11, mu02, mu30, mu21, mu12, mu03;
|
||||
//! central normalized moments
|
||||
CV_PROP_RW double nu20, nu11, nu02, nu30, nu21, nu12, nu03;
|
||||
};
|
||||
@endcode
|
||||
So these are the major extension macros available in OpenCV. Typically, a developer has to put
|
||||
proper macros in their appropriate positions. Rest is done by generator scripts. Sometimes, there
|
||||
may be exceptional cases where generator scripts cannot create the wrappers. Such functions need
|
||||
to be handled manually. To do this, write your own `pyopencv_*.hpp` extending headers and put them into
|
||||
the misc/python subdirectory of your module. But most of the time, code written according to OpenCV
|
||||
coding guidelines will be automatically wrapped by generator scripts.
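The idea that only marked declarations are picked up can be pictured with a toy sketch. This is not the real `hdr_parser.py`: the function name `find_exported`, the regular expression, and the sample declarations `internalHelper` and `notExported` are all assumptions made for illustration, and the sketch only looks at one-line declarations.

```python
import re

# Toy illustration only: NOT the real hdr_parser.py logic.
# A declaration is kept only if it starts with one of these export macros.
EXPORT_MARKERS = re.compile(r"^\s*(CV_EXPORTS_W|CV_EXPORTS_AS|CV_WRAP)\b")


def find_exported(declarations):
    """Return the declarations that start with an export macro."""
    return [d.strip() for d in declarations if EXPORT_MARKERS.match(d)]


header_lines = [
    "CV_EXPORTS_W void equalizeHist( InputArray src, OutputArray dst );",
    "CV_EXPORTS_AS(integral2) void integral( InputArray src, OutputArray sum, OutputArray sqsum );",
    "CV_EXPORTS void internalHelper( int x );",  # hypothetical, no wrapper macro
    "void notExported( int y );",                # hypothetical, no macro at all
]

for decl in find_exported(header_lines):
    print(decl)
```

Only the first two declarations are printed, mirroring the point that the developer, through the macros, decides what reaches Python.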
|
||||
|
||||
More advanced cases involve providing Python with additional features that do not exist
|
||||
in the C++ interface such as extra methods, type mappings, or to provide default arguments.
|
||||
We will take `UMat` datatype as an example of such cases later on.
|
||||
First, to provide Python-specific methods, `CV_WRAP_PHANTOM` is utilized in a similar manner to
|
||||
`CV_WRAP`, except that it takes the method header as its argument, and you would need to provide
|
||||
the method body in your own `pyopencv_*.hpp` extension. `UMat::queue()` and `UMat::context()` are
|
||||
examples of such phantom methods that do not exist in the C++ interface, but are needed to handle
|
||||
OpenCL functionalities at the Python side.
|
||||
Second, if an already-existing datatype(s) is mappable to your class, it is highly preferable to
|
||||
indicate such capacity using `CV_WRAP_MAPPABLE` with the source type as its argument,
|
||||
rather than crafting your own binding function(s). This is the case of `UMat` which maps from `Mat`.
|
||||
Finally, if a default argument is needed, but it is not provided in the native C++ interface,
|
||||
you can provide it for Python side as the argument of `CV_WRAP_DEFAULT`. As per the `UMat::getMat`
|
||||
example below:
|
||||
@code{.cpp}
|
||||
class CV_EXPORTS_W UMat
|
||||
{
|
||||
public:
|
||||
//! Mat is mappable to UMat.
|
||||
// You would need to provide `static bool cv_mappable_to(const Ptr<Mat>& src, Ptr<UMat>& dst)`
|
||||
CV_WRAP_MAPPABLE(Ptr<Mat>);
|
||||
|
||||
//! returns the OpenCL queue used by OpenCV UMat.
|
||||
// You would need to provide the method body in the binder code
|
||||
CV_WRAP_PHANTOM(static void* queue());
|
||||
|
||||
//! returns the OpenCL context used by OpenCV UMat
|
||||
// You would need to provide the method body in the binder code
|
||||
CV_WRAP_PHANTOM(static void* context());
|
||||
|
||||
//! The wrapped method becomes equivalent to `get(int flags = ACCESS_RW)`
|
||||
CV_WRAP_AS(get) Mat getMat(int flags CV_WRAP_DEFAULT(ACCESS_RW)) const;
|
||||
};
|
||||
@endcode
|
8
3rdparty/opencv-4.5.4/doc/py_tutorials/py_bindings/py_table_of_contents_bindings.markdown
vendored
Normal file
@ -0,0 +1,8 @@
|
||||
OpenCV-Python Bindings {#tutorial_py_table_of_contents_bindings}
|
||||
======================
|
||||
|
||||
Here, you will learn how OpenCV-Python bindings are generated.
|
||||
|
||||
- @subpage tutorial_py_bindings_basics
|
||||
|
||||
Learn how OpenCV-Python bindings are generated.
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/images/calibration_icon.jpg
vendored
Normal file
After Width: | Height: | Size: 3.6 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/images/depthmap_icon.jpg
vendored
Normal file
After Width: | Height: | Size: 3.7 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/images/epipolar_icon.jpg
vendored
Normal file
After Width: | Height: | Size: 3.6 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/images/pose_icon.jpg
vendored
Normal file
After Width: | Height: | Size: 3.5 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/py_calibration/images/calib_pattern.jpg
vendored
Normal file
After Width: | Height: | Size: 45 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/py_calibration/images/calib_radial.jpg
vendored
Normal file
After Width: | Height: | Size: 33 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/py_calibration/images/calib_result.jpg
vendored
Normal file
After Width: | Height: | Size: 22 KiB |
225
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/py_calibration/py_calibration.markdown
vendored
Normal file
@ -0,0 +1,225 @@
|
||||
Camera Calibration {#tutorial_py_calibration}
|
||||
==================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In this section, we will learn about
|
||||
|
||||
* types of distortion caused by cameras
|
||||
* how to find the intrinsic and extrinsic properties of a camera
|
||||
* how to undistort images based off these properties
|
||||
|
||||
Basics
|
||||
------
|
||||
|
||||
Some pinhole cameras introduce significant distortion to images. Two major kinds of distortion are
|
||||
radial distortion and tangential distortion.
|
||||
|
||||
Radial distortion causes straight lines to appear curved. Radial distortion becomes larger the farther points are from
|
||||
the center of the image. For example, one image is shown below in which two edges of a chess board are
|
||||
marked with red lines. But, you can see that the border of the chess board is not a straight line and doesn't match with the
|
||||
red line. All the expected straight lines are bulged out. Visit [Distortion
|
||||
(optics)](http://en.wikipedia.org/wiki/Distortion_%28optics%29) for more details.
|
||||
|
||||

|
||||
|
||||
Radial distortion can be represented as follows:
|
||||
|
||||
\f[x_{distorted} = x( 1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \\
|
||||
y_{distorted} = y( 1 + k_1 r^2 + k_2 r^4 + k_3 r^6)\f]
|
||||
|
||||
Similarly, tangential distortion occurs because the image-taking lens
|
||||
is not aligned perfectly parallel to the imaging plane. So, some areas in the image may look nearer than
|
||||
expected. The amount of tangential distortion can be represented as below:
|
||||
|
||||
\f[x_{distorted} = x + [ 2p_1xy + p_2(r^2+2x^2)] \\
|
||||
y_{distorted} = y + [ p_1(r^2+ 2y^2)+ 2p_2xy]\f]
|
||||
|
||||
In short, we need to find five parameters, known as distortion coefficients given by:
|
||||
|
||||
\f[Distortion \; coefficients=(k_1 \hspace{10pt} k_2 \hspace{10pt} p_1 \hspace{10pt} p_2 \hspace{10pt} k_3)\f]
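As a rough sketch, the two formulas above can be combined into a single function that distorts one normalized image point. The function name `distort` is hypothetical, and adding the radial and tangential terms together is an assumption about how the two models are combined; the coefficient order follows the vector above.

```python
def distort(x, y, k1, k2, p1, p2, k3):
    """Apply radial plus tangential distortion to a normalized point (x, y)."""
    r2 = x * x + y * y
    # Radial factor: 1 + k1*r^2 + k2*r^4 + k3*r^6
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    # Tangential offsets
    x_tan = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_tan = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return x * radial + x_tan, y * radial + y_tan


# With all coefficients at zero the point is unchanged.
print(distort(0.5, 0.25, 0, 0, 0, 0, 0))
# A positive k1 pushes points away from the center.
print(distort(1.0, 0.0, 0.1, 0, 0, 0, 0))
```

The image center (0, 0) is never moved by this model, which matches the statement that radial distortion grows with distance from the center.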
|
||||
|
||||
In addition to this, we need some other information, like the intrinsic and extrinsic parameters
|
||||
of the camera. Intrinsic parameters are specific to a camera. They include information like focal
|
||||
length (\f$f_x,f_y\f$) and optical centers (\f$c_x, c_y\f$). The focal length and optical centers can be used to create a camera matrix, which can be used to remove distortion due to the lenses of a specific camera. The camera matrix is unique to a specific camera, so once calculated, it can be reused on other images taken by the same camera. It is expressed as a 3x3
|
||||
matrix:
|
||||
|
||||
\f[camera \; matrix = \left [ \begin{matrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{matrix} \right ]\f]
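To see what the camera matrix does, here is a minimal sketch of the ideal pinhole projection it encodes, ignoring lens distortion. The function name `project` and the example values are hypothetical; the point is assumed to be given in the camera's coordinate frame with Z pointing forward.

```python
def project(point_3d, fx, fy, cx, cy):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    X, Y, Z = point_3d
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    # Equivalent to multiplying by the 3x3 camera matrix and dividing by Z.
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    return u, v


# A point on the optical axis lands on the optical center.
print(project((0.0, 0.0, 2.0), fx=800, fy=800, cx=320, cy=240))
# An off-axis point is shifted in proportion to focal length over depth.
print(project((1.0, 0.5, 2.0), fx=800, fy=800, cx=320, cy=240))
```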
|
||||
|
||||
Extrinsic parameters correspond to rotation and translation vectors which translate the coordinates
|
||||
of a 3D point to a coordinate system.
|
||||
|
||||
For stereo applications, these distortions need to be corrected first. To find these parameters,
|
||||
we must provide some sample images of a well defined pattern (e.g. a chess board). We
|
||||
find some specific points of which we already know the relative positions (e.g. square corners in the chess board). We know the coordinates of these points in real world space and we know the coordinates in the image, so we can solve for the distortion coefficients. For better results, we need at least 10 test patterns.
|
||||
|
||||
Code
|
||||
----
|
||||
|
||||
As mentioned above, we need at least 10 test patterns for camera calibration. OpenCV comes with some
|
||||
images of a chess board (see samples/data/left01.jpg -- left14.jpg), so we will utilize these. Consider an image of a chess board. The important input data needed for calibration of the camera
|
||||
is the set of 3D real world points and the corresponding 2D coordinates of these points in the image. 2D image points
|
||||
are straightforward, since we can easily find them in the image. (These image points are locations where two black
|
||||
squares touch each other in chess boards)
|
||||
|
||||
What about the 3D points from real world space? Those images are taken from a static camera and
|
||||
chess boards are placed at different locations and orientations. So we need to know \f$(X,Y,Z)\f$
|
||||
values. But for simplicity, we can say chess board was kept stationary at XY plane, (so Z=0 always)
|
||||
and camera was moved accordingly. This consideration helps us to find only X,Y values. Now for X,Y
|
||||
values, we can simply pass the points as (0,0), (1,0), (2,0), ... which denotes the location of
|
||||
points. In this case, the results we get will be in the scale of size of chess board square. But if
|
||||
we know the square size, (say 30 mm), we can pass the values as (0,0), (30,0), (60,0), ... . Thus, we get
|
||||
the results in mm. (In this case, we don't know square size since we didn't take those images, so we
|
||||
pass in terms of square size).
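The object point layout described above can be sketched in plain Python. The helper name `make_object_points` is hypothetical; with `square_size=1.0` the points are in units of one square, and with a known size such as 30 they are in millimetres.

```python
def make_object_points(cols, rows, square_size=1.0):
    """Return (X, Y, Z) corner positions on the Z=0 plane, row by row."""
    return [
        (c * square_size, r * square_size, 0.0)
        for r in range(rows)
        for c in range(cols)
    ]


# 7x6 inner corners, in units of one square
unit_points = make_object_points(7, 6)
print(unit_points[:3])

# The same grid with a known square size of 30 mm
mm_points = make_object_points(7, 6, square_size=30)
print(mm_points[:3])
```

Either way the grid has one entry per inner corner, ordered left to right and then top to bottom.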
|
||||
|
||||
3D points are called **object points** and 2D image points are called **image points.**
|
||||
|
||||
### Setup
|
||||
|
||||
To find the pattern in the chess board, we can use the function **cv.findChessboardCorners()**. We also
|
||||
need to pass what kind of pattern we are looking for, like 8x8 grid, 5x5 grid etc. In this example, we
|
||||
use 7x6 grid. (Normally a chess board has 8x8 squares and 7x7 internal corners). It returns the
|
||||
corner points and retval which will be True if pattern is obtained. These corners will be placed in
|
||||
an order (from left-to-right, top-to-bottom)
|
||||
|
||||
@note This function may not be able to find the required pattern in all the images. So, one good option
|
||||
is to write the code such that it starts the camera and checks each frame for the required pattern. Once
|
||||
the pattern is obtained, find the corners and store them in a list. Also, provide some interval before
|
||||
reading next frame so that we can adjust our chess board in different direction. Continue this
|
||||
process until the required number of good patterns are obtained. Even in the example provided here, we
|
||||
are not sure how many images out of the 14 given are good. Thus, we must read all the images and take only the good
|
||||
ones.
|
||||
|
||||
@note Instead of chess board, we can alternatively use a circular grid. In this case, we must use the function
|
||||
**cv.findCirclesGrid()** to find the pattern. Fewer images are sufficient to perform camera calibration using a circular grid.
|
||||
|
||||
Once we find the corners, we can increase their accuracy using **cv.cornerSubPix()**. We can also
|
||||
draw the pattern using **cv.drawChessboardCorners()**. All these steps are included in below code:
|
||||
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
import glob
|
||||
|
||||
# termination criteria
|
||||
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 30, 0.001)
|
||||
|
||||
# prepare object points, like (0,0,0), (1,0,0), (2,0,0) ....,(6,5,0)
|
||||
objp = np.zeros((6*7,3), np.float32)
|
||||
objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)
|
||||
|
||||
# Arrays to store object points and image points from all the images.
|
||||
objpoints = [] # 3d point in real world space
|
||||
imgpoints = [] # 2d points in image plane.
|
||||
|
||||
images = glob.glob('*.jpg')
|
||||
|
||||
for fname in images:
    img = cv.imread(fname)
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

    # Find the chess board corners
    ret, corners = cv.findChessboardCorners(gray, (7,6), None)

    # If found, add object points, image points (after refining them)
    if ret == True:
        objpoints.append(objp)

        corners2 = cv.cornerSubPix(gray,corners, (11,11), (-1,-1), criteria)
        imgpoints.append(corners2)

        # Draw and display the corners
        cv.drawChessboardCorners(img, (7,6), corners2, ret)
        cv.imshow('img', img)
        cv.waitKey(500)
|
||||
|
||||
cv.destroyAllWindows()
|
||||
@endcode
|
||||
One image with pattern drawn on it is shown below:
|
||||
|
||||

|
||||
|
||||
### Calibration
|
||||
|
||||
Now that we have our object points and image points, we are ready to go for calibration. We can
|
||||
use the function, **cv.calibrateCamera()** which returns the camera matrix, distortion coefficients,
|
||||
rotation and translation vectors etc.
|
||||
@code{.py}
|
||||
ret, mtx, dist, rvecs, tvecs = cv.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
|
||||
@endcode
|
||||
|
||||
### Undistortion
|
||||
|
||||
Now, we can take an image and undistort it. OpenCV comes with two
|
||||
methods for doing this. However first, we can refine the camera matrix based on a free scaling
|
||||
parameter using **cv.getOptimalNewCameraMatrix()**. If the scaling parameter alpha=0, it returns
|
||||
undistorted image with minimum unwanted pixels. So it may even remove some pixels at image corners.
|
||||
If alpha=1, all pixels are retained with some extra black pixels. This function also returns an image ROI which
|
||||
can be used to crop the result.
|
||||
|
||||
So, we take a new image (left12.jpg in this case. That is the first image in this chapter)
|
||||
@code{.py}
|
||||
img = cv.imread('left12.jpg')
|
||||
h, w = img.shape[:2]
|
||||
newcameramtx, roi = cv.getOptimalNewCameraMatrix(mtx, dist, (w,h), 1, (w,h))
|
||||
@endcode
|
||||
#### 1. Using **cv.undistort()**
|
||||
|
||||
This is the easiest way. Just call the function and use ROI obtained above to crop the result.
|
||||
@code{.py}
|
||||
# undistort
|
||||
dst = cv.undistort(img, mtx, dist, None, newcameramtx)
|
||||
|
||||
# crop the image
|
||||
x, y, w, h = roi
|
||||
dst = dst[y:y+h, x:x+w]
|
||||
cv.imwrite('calibresult.png', dst)
|
||||
@endcode
|
||||
#### 2. Using **remapping**
|
||||
|
||||
This way is a little bit more difficult. First, find a mapping function from the distorted image to the undistorted image. Then
|
||||
use the remap function.
|
||||
@code{.py}
|
||||
# undistort
|
||||
mapx, mapy = cv.initUndistortRectifyMap(mtx, dist, None, newcameramtx, (w,h), 5)
|
||||
dst = cv.remap(img, mapx, mapy, cv.INTER_LINEAR)
|
||||
|
||||
# crop the image
|
||||
x, y, w, h = roi
|
||||
dst = dst[y:y+h, x:x+w]
|
||||
cv.imwrite('calibresult.png', dst)
|
||||
@endcode
|
||||
Still, both the methods give the same result. See the result below:
|
||||
|
||||

|
||||
|
||||
You can see in the result that all the edges are straight.
|
||||
|
||||
Now you can store the camera matrix and distortion coefficients using write functions in NumPy
|
||||
(np.savez, np.savetxt etc) for future uses.
|
||||
|
||||
Re-projection Error
|
||||
-------------------
|
||||
|
||||
Re-projection error gives a good estimation of just how exact the found parameters are. The closer the re-projection error is to zero, the more accurate the parameters we found are. Given the intrinsic, distortion, rotation and translation matrices,
|
||||
we must first transform the object point to image point using **cv.projectPoints()**. Then, we can calculate
|
||||
the absolute norm between what we got with our transformation and the corner finding algorithm. To
|
||||
find the average error, we calculate the arithmetical mean of the errors calculated for all the
|
||||
calibration images.
|
||||
@code{.py}
|
||||
mean_error = 0
|
||||
for i in range(len(objpoints)):
    imgpoints2, _ = cv.projectPoints(objpoints[i], rvecs[i], tvecs[i], mtx, dist)
    error = cv.norm(imgpoints[i], imgpoints2, cv.NORM_L2)/len(imgpoints2)
    mean_error += error
|
||||
|
||||
print( "total error: {}".format(mean_error/len(objpoints)) )
|
||||
@endcode
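To make the per-image quantity explicit, the following plain-Python sketch reproduces the arithmetic on lists of 2D points: the L2 norm of all residuals divided by the number of points. The function name `per_image_error` and the sample points are hypothetical, and the sketch assumes both lists hold the same number of `(x, y)` pairs.

```python
import math


def per_image_error(detected, projected):
    """L2 norm of all point residuals divided by the number of points."""
    squared_sum = sum(
        (dx - px) ** 2 + (dy - py) ** 2
        for (dx, dy), (px, py) in zip(detected, projected)
    )
    return math.sqrt(squared_sum) / len(projected)


detected = [(0.0, 0.0), (3.0, 4.0)]
projected = [(0.0, 0.0), (0.0, 0.0)]
print(per_image_error(detected, projected))
```

Note that this is the norm of the stacked residual vector divided by the point count, which is not the same as the root-mean-square distance per point.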
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
Exercises
|
||||
---------
|
||||
|
||||
-# Try camera calibration with circular grid.
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/py_depthmap/images/disparity_map.jpg
vendored
Normal file
After Width: | Height: | Size: 18 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/py_depthmap/images/stereo_depth.jpg
vendored
Normal file
After Width: | Height: | Size: 13 KiB |
75
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/py_depthmap/py_depthmap.markdown
vendored
Normal file
@ -0,0 +1,75 @@
|
||||
Depth Map from Stereo Images {#tutorial_py_depthmap}
|
||||
============================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In this session,
|
||||
- We will learn to create a depth map from stereo images.
|
||||
|
||||
Basics
|
||||
------
|
||||
|
||||
In the last session, we saw basic concepts like epipolar constraints and other related terms. We also
|
||||
saw that if we have two images of same scene, we can get depth information from that in an intuitive
|
||||
way. Below is an image and some simple mathematical formulas which prove that intuition. (Image
|
||||
Courtesy :
|
||||
|
||||

|
||||
|
||||
The above diagram contains similar triangles. Writing their equivalent equations yields the
|
||||
following result:
|
||||
|
||||
\f[disparity = x - x' = \frac{Bf}{Z}\f]
|
||||
|
||||
\f$x\f$ and \f$x'\f$ are the distances between the points in the image plane corresponding to the 3D scene point and
|
||||
their camera center. \f$B\f$ is the distance between two cameras (which we know) and \f$f\f$ is the focal
|
||||
length of camera (already known). So in short, the above equation says that the depth of a point in a
|
||||
scene is inversely proportional to the difference in distance of corresponding image points and
|
||||
their camera centers. So with this information, we can derive the depth of all pixels in an image.
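Rearranging the equation above gives \f$Z = Bf / disparity\f$. A minimal sketch of that rearrangement follows; the function name `depth_from_disparity` and the numbers are hypothetical, the baseline and the result share the same length unit, and the focal length and disparity are assumed to be in pixels.

```python
def depth_from_disparity(disparity, baseline, focal_length):
    """Return depth Z from disparity = B * f / Z."""
    if disparity <= 0:
        raise ValueError("disparity must be positive to yield a finite depth")
    return baseline * focal_length / disparity


# Halving the disparity doubles the estimated depth.
print(depth_from_disparity(16.0, baseline=0.5, focal_length=800.0))
print(depth_from_disparity(8.0, baseline=0.5, focal_length=800.0))
```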
|
||||
|
||||
So it finds corresponding matches between two images. We have already seen how the epiline constraint
|
||||
makes this operation faster and more accurate. Once it finds matches, it finds the disparity. Let's see
|
||||
how we can do it with OpenCV.
|
||||
|
||||
Code
|
||||
----
|
||||
|
||||
The code snippet below shows a simple procedure to create a disparity map.
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
from matplotlib import pyplot as plt
|
||||
|
||||
imgL = cv.imread('tsukuba_l.png',0)
|
||||
imgR = cv.imread('tsukuba_r.png',0)
|
||||
|
||||
stereo = cv.StereoBM_create(numDisparities=16, blockSize=15)
|
||||
disparity = stereo.compute(imgL,imgR)
|
||||
plt.imshow(disparity,'gray')
|
||||
plt.show()
|
||||
@endcode
|
||||
Below image contains the original image (left) and its disparity map (right). As you can see, the result
|
||||
is contaminated with a high degree of noise. By adjusting the values of numDisparities and blockSize,
|
||||
you can get a better result.
|
||||
|
||||

|
||||
|
||||
Once you are familiar with StereoBM, you may need to fine-tune its parameters to get better and smoother results. Parameters:
|
||||
- texture_threshold: filters out areas that don't have enough texture for reliable matching
|
||||
- Speckle range and size: Block-based matchers often produce "speckles" near the boundaries of objects, where the matching window catches the foreground on one side and the background on the other. In this scene it appears that the matcher is also finding small spurious matches in the projected texture on the table. To get rid of these artifacts we post-process the disparity image with a speckle filter controlled by the speckle_size and speckle_range parameters. speckle_size is the number of pixels below which a disparity blob is dismissed as "speckle." speckle_range controls how close in value disparities must be to be considered part of the same blob.
|
||||
- Number of disparities: How many pixels to slide the window over. The larger it is, the larger the range of visible depths, but more computation is required.
|
||||
- min_disparity: the offset from the x-position of the left pixel at which to begin searching.
|
||||
- uniqueness_ratio: Another post-filtering step. If the best matching disparity is not sufficiently better than every other disparity in the search range, the pixel is filtered out. You can try tweaking this if texture_threshold and the speckle filtering are still letting through spurious matches.
|
||||
- prefilter_size and prefilter_cap: The pre-filtering phase, which normalizes image brightness and enhances texture in preparation for block matching. Normally you should not need to adjust these.
|
||||
|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
- [Ros stereo img processing wiki page](http://wiki.ros.org/stereo_image_proc/Tutorials/ChoosingGoodStereoParameters)
|
||||
|
||||
Exercises
|
||||
---------
|
||||
|
||||
-# OpenCV samples contain an example of generating disparity map and its 3D reconstruction. Check
|
||||
stereo_match.py in OpenCV-Python samples.
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/py_epipolar_geometry/images/epipolar.jpg
vendored
Normal file
After Width: | Height: | Size: 11 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/py_epipolar_geometry/images/epiresult.jpg
vendored
Normal file
After Width: | Height: | Size: 78 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/py_epipolar_geometry/images/essential_matrix.jpg
vendored
Normal file
After Width: | Height: | Size: 15 KiB |
172
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/py_epipolar_geometry/py_epipolar_geometry.markdown
vendored
Normal file
@ -0,0 +1,172 @@
|
||||
Epipolar Geometry {#tutorial_py_epipolar_geometry}
|
||||
=================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In this section,
|
||||
|
||||
- We will learn about the basics of multiview geometry
|
||||
- We will see what epipoles, epipolar lines, the epipolar constraint, etc. are.
|
||||
|
||||
Basic Concepts
|
||||
--------------
|
||||
|
||||
When we take an image using a pin-hole camera, we lose an important piece of information, i.e. the depth of the
|
||||
image, or how far each point in the image is from the camera, because it is a 3D-to-2D conversion. So
|
||||
it is an important question whether we can find the depth information using these cameras. And the
|
||||
answer is to use more than one camera. Our eyes work in a similar way, where we use two cameras (two
|
||||
eyes) which is called stereo vision. So let's see what OpenCV provides in this field.
|
||||
|
||||
(*Learning OpenCV* by Gary Bradski has a lot of information in this field.)
|
||||
|
||||
Before going to depth images, let's first understand some basic concepts in multiview geometry. In
|
||||
this section we will deal with epipolar geometry. See the image below which shows a basic setup with
|
||||
two cameras taking an image of the same scene.
|
||||
|
||||

|
||||
|
||||
If we are using only the left camera, we can't find the 3D point corresponding to the point \f$x\f$ in
|
||||
the image because every point on the line \f$OX\f$ projects to the same point on the image plane. But
|
||||
consider the right image also. Now different points on the line \f$OX\f$ project to different points
|
||||
(\f$x'\f$) in the right plane. So with these two images, we can triangulate the correct 3D point. This is
|
||||
the whole idea.
|
||||
|
||||
The projections of the different points on \f$OX\f$ form a line on the right plane (line \f$l'\f$). We call it the
|
||||
**epiline** corresponding to the point \f$x\f$. It means, to find the point \f$x\f$ on the right image,
|
||||
search along this epiline. It should be somewhere on this line (Think of it this way, to find the
|
||||
matching point in other image, you need not search the whole image, just search along the epiline.
|
||||
So it provides better performance and accuracy). This is called **Epipolar Constraint**. Similarly
|
||||
all points have their corresponding epilines in the other image. The plane \f$XOO'\f$ is called the
|
||||
**Epipolar Plane**.
|
||||
|
||||
\f$O\f$ and \f$O'\f$ are the camera centers. From the setup given above, you can see that the projection of the
|
||||
right camera \f$O'\f$ is seen on the left image at the point, \f$e\f$. It is called the **epipole**. Epipole
|
||||
is the point of intersection of the line through the camera centers and the image planes. Similarly \f$e'\f$ is
|
||||
the epipole of the left camera. In some cases, you won't be able to locate the epipole in the image,
|
||||
since it may lie outside the image (which means that one camera doesn't see the other).
|
||||
|
||||
All the epilines in an image pass through its epipole. So to find the location of the epipole, we can find many
|
||||
epilines and find their intersection point.
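
The intersection idea can be sketched with homogeneous coordinates: a line \f$ax + by + c = 0\f$ is stored as (a, b, c), and the intersection of two lines is their cross product. The helper names below are hypothetical and use only plain Python. With real, noisy epilines you would typically fit the intersection over many lines rather than just two.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def intersect(line1, line2):
    """Intersect two lines (a, b, c) with a*x + b*y + c = 0.

    Returns (x, y), or None when the lines are parallel
    (the intersection, i.e. the epipole, is at infinity).
    """
    x, y, w = cross(line1, line2)
    if w == 0:
        return None
    return (x / w, y / w)


# Two example lines through the point (1, 2): x = 1 and y = 2.
epipole = intersect((1, 0, -1), (0, 1, -2))
print(epipole)
```

The `None` case corresponds to the situation described above where the epipole cannot be located in the image at all.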
|
||||
|
||||
So in this session, we focus on finding epipolar lines and epipoles. But to find them, we need two
|
||||
more ingredients, **Fundamental Matrix (F)** and **Essential Matrix (E)**. Essential Matrix contains
|
||||
the information about translation and rotation, which describe the location of the second camera
|
||||
relative to the first in global coordinates. See the image below (Image courtesy: Learning OpenCV by
|
||||
Gary Bradski):
|
||||
|
||||

|
||||
|
||||
But we prefer measurements to be done in pixel coordinates, right? Fundamental Matrix contains the
|
||||
same information as Essential Matrix in addition to the information about the intrinsics of both
|
||||
cameras so that we can relate the two cameras in pixel coordinates. (If we are using rectified
|
||||
images and normalize the point by dividing by the focal lengths, \f$F=E\f$). In simple words,
|
||||
Fundamental Matrix F, maps a point in one image to a line (epiline) in the other image. This is
|
||||
calculated from matching points from both the images. A minimum of 8 such points are required to
|
||||
find the fundamental matrix (while using the 8-point algorithm). More points are preferred, together with
|
||||
RANSAC to get a more robust result.
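
To make the "point to line" mapping concrete, here is a small plain-Python sketch. It assumes the convention \f$x'^T F x = 0\f$, and the matrix `F` below is a hypothetical fundamental matrix for an ideal rectified pair (pure horizontal shift between the cameras), not one estimated from real images. The helper names are made up for illustration.

```python
def epiline(F, point):
    """Return the epiline (a, b, c) in the other image for a pixel (x, y)."""
    h = (point[0], point[1], 1.0)  # homogeneous coordinates
    return tuple(sum(F[r][c] * h[c] for c in range(3)) for r in range(3))


def epipolar_residual(F, point_left, point_right):
    """Evaluate x'^T F x; it is zero when the pair satisfies the constraint."""
    a, b, c = epiline(F, point_left)
    return a * point_right[0] + b * point_right[1] + c


# Hypothetical F for an ideal rectified stereo pair.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]

line = epiline(F, (100, 50))                           # the line y' = 50
on_line = epipolar_residual(F, (100, 50), (80, 50))    # same row: consistent
off_line = epipolar_residual(F, (100, 50), (80, 60))   # different row: violates it
print(line, on_line, off_line)
```

For this special F every epiline is horizontal, which is why rectified stereo pairs only need to be searched along image rows.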
|
||||
|
||||
Code
|
||||
----
|
||||
|
||||
So first we need to find as many matches as possible between the two images to find the fundamental matrix.
|
||||
For this, we use SIFT descriptors with FLANN based matcher and ratio test.
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
from matplotlib import pyplot as plt
|
||||
|
||||
img1 = cv.imread('myleft.jpg',0) #queryimage # left image
|
||||
img2 = cv.imread('myright.jpg',0) #trainimage # right image
|
||||
|
||||
sift = cv.SIFT_create()
|
||||
|
||||
# find the keypoints and descriptors with SIFT
|
||||
kp1, des1 = sift.detectAndCompute(img1,None)
|
||||
kp2, des2 = sift.detectAndCompute(img2,None)
|
||||
|
||||
# FLANN parameters
|
||||
FLANN_INDEX_KDTREE = 1
|
||||
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
|
||||
search_params = dict(checks=50)
|
||||
|
||||
flann = cv.FlannBasedMatcher(index_params,search_params)
|
||||
matches = flann.knnMatch(des1,des2,k=2)
|
||||
|
||||
pts1 = []
|
||||
pts2 = []
|
||||
|
||||
# ratio test as per Lowe's paper
|
||||
for i,(m,n) in enumerate(matches):
|
||||
    if m.distance < 0.8*n.distance:
|
||||
        pts2.append(kp2[m.trainIdx].pt)
|
||||
        pts1.append(kp1[m.queryIdx].pt)
|
||||
@endcode
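
The ratio test in the loop above can be looked at in isolation. The sketch below works on plain (best, second best) distance pairs instead of OpenCV match objects; `ratio_test` and the sample distances are hypothetical and only illustrate the filtering rule.

```python
def ratio_test(distance_pairs, ratio=0.8):
    """Return the indices of matches whose best distance is clearly
    smaller than the second-best distance."""
    return [i for i, (best, second) in enumerate(distance_pairs)
            if best < ratio * second]


# (best, second best) descriptor distances for three hypothetical keypoints
pairs = [(10.0, 20.0), (18.0, 20.0), (5.0, 100.0)]
kept = ratio_test(pairs)
print(kept)
```

The middle match is dropped because its two nearest neighbours are almost equally good, which usually indicates an ambiguous, unreliable correspondence.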
|
||||
Now we have the list of best matches from both the images. Let's find the Fundamental Matrix.
|
||||
@code{.py}
|
||||
pts1 = np.int32(pts1)
|
||||
pts2 = np.int32(pts2)
|
||||
F, mask = cv.findFundamentalMat(pts1,pts2,cv.FM_LMEDS)
|
||||
|
||||
# We select only inlier points
|
||||
pts1 = pts1[mask.ravel()==1]
|
||||
pts2 = pts2[mask.ravel()==1]
|
||||
@endcode
|
||||
Next we find the epilines. Epilines corresponding to the points in the first image are drawn on the second
|
||||
image, so passing the images in the correct order is important here. We get an array of lines, so we define a
|
||||
new function to draw these lines on the images.
|
||||
@code{.py}
|
||||
def drawlines(img1,img2,lines,pts1,pts2):
|
||||
    ''' img1 - image on which we draw the epilines for the points in img2
|
||||
        lines - corresponding epilines '''
|
||||
    r,c = img1.shape
|
||||
    img1 = cv.cvtColor(img1,cv.COLOR_GRAY2BGR)
|
||||
    img2 = cv.cvtColor(img2,cv.COLOR_GRAY2BGR)
|
||||
    for line,pt1,pt2 in zip(lines,pts1,pts2):
|
||||
        color = tuple(np.random.randint(0,255,3).tolist())
|
||||
        x0,y0 = map(int, [0, -line[2]/line[1] ])
|
||||
        x1,y1 = map(int, [c, -(line[2]+line[0]*c)/line[1] ])
|
||||
        img1 = cv.line(img1, (x0,y0), (x1,y1), color,1)
|
||||
        img1 = cv.circle(img1,tuple(map(int,pt1)),5,color,-1)
|
||||
        img2 = cv.circle(img2,tuple(map(int,pt2)),5,color,-1)
|
||||
    return img1,img2
|
||||
@endcode
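
The endpoint arithmetic used inside drawlines() can be checked on its own. For a line (a, b, c) meaning \f$ax + by + c = 0\f$, the sketch below solves for y at the left edge (x = 0) and the right edge (x = width). `line_endpoints` is a hypothetical helper, and the explicit check for b == 0 (a vertical line) is an addition that the tutorial code does not make.

```python
def line_endpoints(line, width):
    """Return the integer endpoints of a*x + b*y + c = 0 at x=0 and x=width."""
    a, b, c = line
    if b == 0:
        raise ValueError("vertical line: cannot solve for y")
    y_left = int(-c / b)
    y_right = int(-(c + a * width) / b)
    return (0, y_left), (width, y_right)


diagonal = line_endpoints((1.0, -1.0, 0.0), 100)     # the line y = x
horizontal = line_endpoints((0.0, 1.0, -50.0), 100)  # the line y = 50
print(diagonal, horizontal)
```

The endpoints may fall outside the image; drawing functions such as cv.line clip the segment to the visible area.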
|
||||
Now we find the epilines in both the images and draw them.
|
||||
@code{.py}
|
||||
# Find epilines corresponding to points in right image (second image) and
|
||||
# drawing its lines on left image
|
||||
lines1 = cv.computeCorrespondEpilines(pts2.reshape(-1,1,2), 2,F)
|
||||
lines1 = lines1.reshape(-1,3)
|
||||
img5,img6 = drawlines(img1,img2,lines1,pts1,pts2)
|
||||
|
||||
# Find epilines corresponding to points in left image (first image) and
|
||||
# drawing its lines on right image
|
||||
lines2 = cv.computeCorrespondEpilines(pts1.reshape(-1,1,2), 1,F)
|
||||
lines2 = lines2.reshape(-1,3)
|
||||
img3,img4 = drawlines(img2,img1,lines2,pts2,pts1)
|
||||
|
||||
plt.subplot(121),plt.imshow(img5)
|
||||
plt.subplot(122),plt.imshow(img3)
|
||||
plt.show()
|
||||
@endcode
|
||||
Below is the result we get:
|
||||
|
||||

|
||||
|
||||
You can see in the left image that all epilines converge at a point outside the image on the right
|
||||
side. That meeting point is the epipole.
|
||||
|
||||
For better results, images with good resolution and many non-planar points should be used.
|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
Exercises
|
||||
---------
|
||||
|
||||
-# One important topic is the forward movement of the camera. Then epipoles will be seen at the same
|
||||
locations in both with epilines emerging from a fixed point. [See this
|
||||
discussion](http://answers.opencv.org/question/17912/location-of-epipole/).
|
||||
-# Fundamental Matrix estimation is sensitive to the quality of matches, outliers, etc. It becomes worse
|
||||
when all selected matches lie on the same plane. [Check this
|
||||
discussion](http://answers.opencv.org/question/18125/epilines-not-correct/).
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/py_pose/images/pose_1.jpg
vendored
Normal file
After Width: | Height: | Size: 44 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/py_pose/images/pose_2.jpg
vendored
Normal file
After Width: | Height: | Size: 26 KiB |
127
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/py_pose/py_pose.markdown
vendored
Normal file
@ -0,0 +1,127 @@
|
||||
Pose Estimation {#tutorial_py_pose}
|
||||
===============
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In this section,
|
||||
- We will learn to exploit calib3d module to create some 3D effects in images.
|
||||
|
||||
Basics
|
||||
------
|
||||
|
||||
This is going to be a small section. During the last session on camera calibration, you found
|
||||
the camera matrix, distortion coefficients etc. Given a pattern image, we can utilize the above
|
||||
information to calculate its pose, or how the object is situated in space, like how it is rotated,
|
||||
how it is displaced etc. For a planar object, we can assume Z=0, such that, the problem now becomes
|
||||
how the camera is placed in space to see our pattern image. So, if we know how the object lies in the
|
||||
space, we can draw some 2D diagrams in it to simulate the 3D effect. Let's see how to do it.
|
||||
|
||||
Our problem is, we want to draw our 3D coordinate axis (X, Y, Z axes) on our chessboard's first
|
||||
corner: the X axis in blue, the Y axis in green and the Z axis in red. So, in effect, the Z axis
|
||||
should feel like it is perpendicular to our chessboard plane.
|
||||
|
||||
First, let's load the camera matrix and distortion coefficients from the previous calibration
|
||||
result.
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
import glob
|
||||
|
||||
# Load previously saved data
|
||||
with np.load('B.npz') as X:
|
||||
    mtx, dist, _, _ = [X[i] for i in ('mtx','dist','rvecs','tvecs')]
|
||||
@endcode
|
||||
Now let's create a function, draw(), which takes the corners in the chessboard (obtained using
|
||||
**cv.findChessboardCorners()**) and **axis points** to draw a 3D axis.
|
||||
@code{.py}
|
||||
def draw(img, corners, imgpts):
|
||||
    # drawing functions expect integer pixel coordinates
|
||||
    corner = tuple(map(int, corners[0].ravel()))
|
||||
    imgpts = np.int32(imgpts).reshape(-1,2)
|
||||
    img = cv.line(img, corner, tuple(map(int, imgpts[0])), (255,0,0), 5)
|
||||
    img = cv.line(img, corner, tuple(map(int, imgpts[1])), (0,255,0), 5)
|
||||
    img = cv.line(img, corner, tuple(map(int, imgpts[2])), (0,0,255), 5)
|
||||
    return img
|
||||
@endcode
|
||||
Then as in previous case, we create termination criteria, object points (3D points of corners in
|
||||
chessboard) and axis points. Axis points are points in 3D space for drawing the axis. We draw axis
|
||||
of length 3 (units will be in terms of chess square size since we calibrated based on that size). So
|
||||
our X axis is drawn from (0,0,0) to (3,0,0), and similarly for the Y axis. The Z axis is drawn from (0,0,0) to
|
||||
(0,0,-3). Negative denotes it is drawn towards the camera.
|
||||
@code{.py}
|
||||
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 30, 0.001)
|
||||
objp = np.zeros((6*7,3), np.float32)
|
||||
objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)
|
||||
|
||||
axis = np.float32([[3,0,0], [0,3,0], [0,0,-3]]).reshape(-1,3)
|
||||
@endcode
|
||||
Now, as usual, we load each image and search for the 7x6 grid. If it is found, we refine the corners to
|
||||
sub-pixel accuracy. Then, to calculate the rotation and translation, we use the function
|
||||
**cv.solvePnP()**. Once we have those rotation and translation vectors, we use them to project our **axis
|
||||
points** to the image plane. In simple words, we find the points on the image plane corresponding to
|
||||
each of (3,0,0),(0,3,0),(0,0,-3) in 3D space. Once we get them, we draw lines from the first corner
|
||||
to each of these points using our draw() function. Done !!!
|
||||
@code{.py}
|
||||
for fname in glob.glob('left*.jpg'):
|
||||
    img = cv.imread(fname)
|
||||
    gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
|
||||
    ret, corners = cv.findChessboardCorners(gray, (7,6),None)
|
||||
|
||||
    if ret == True:
|
||||
        corners2 = cv.cornerSubPix(gray,corners,(11,11),(-1,-1),criteria)
|
||||
|
||||
        # Find the rotation and translation vectors.
|
||||
        ret,rvecs, tvecs = cv.solvePnP(objp, corners2, mtx, dist)
|
||||
|
||||
        # project 3D points to image plane
|
||||
        imgpts, jac = cv.projectPoints(axis, rvecs, tvecs, mtx, dist)
|
||||
|
||||
        img = draw(img,corners2,imgpts)
|
||||
        cv.imshow('img',img)
|
||||
        k = cv.waitKey(0) & 0xFF
|
||||
        if k == ord('s'):
|
||||
            cv.imwrite(fname[:6]+'.png', img)
|
||||
|
||||
cv.destroyAllWindows()
|
||||
@endcode
|
||||
See some results below. Notice that each axis is 3 squares long:
|
||||
|
||||

|
||||
|
||||
### Render a Cube
|
||||
|
||||
If you want to draw a cube, modify the draw() function and axis points as follows.
|
||||
|
||||
Modified draw() function:
|
||||
@code{.py}
|
||||
def draw(img, corners, imgpts):
|
||||
    imgpts = np.int32(imgpts).reshape(-1,2)
|
||||
|
||||
    # draw ground floor in green
|
||||
    img = cv.drawContours(img, [imgpts[:4]],-1,(0,255,0),-3)
|
||||
|
||||
    # draw pillars in blue color
|
||||
    for i,j in zip(range(4),range(4,8)):
|
||||
        img = cv.line(img, tuple(map(int,imgpts[i])), tuple(map(int,imgpts[j])),(255,0,0),3)
|
||||
|
||||
    # draw top layer in red color
|
||||
    img = cv.drawContours(img, [imgpts[4:]],-1,(0,0,255),3)
|
||||
|
||||
    return img
|
||||
@endcode
|
||||
Modified axis points. They are the 8 corners of a cube in 3D space:
|
||||
@code{.py}
|
||||
axis = np.float32([[0,0,0], [0,3,0], [3,3,0], [3,0,0],
|
||||
                   [0,0,-3],[0,3,-3],[3,3,-3],[3,0,-3] ])
|
||||
@endcode
|
||||
And look at the result below:
|
||||
|
||||

|
||||
|
||||
If you are interested in graphics, augmented reality, etc., you can use OpenGL to render more
|
||||
complicated figures.
|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
Exercises
|
||||
---------
|
22
3rdparty/opencv-4.5.4/doc/py_tutorials/py_calib3d/py_table_of_contents_calib3d.markdown
vendored
Normal file
@ -0,0 +1,22 @@
|
||||
Camera Calibration and 3D Reconstruction {#tutorial_py_table_of_contents_calib3d}
|
||||
========================================
|
||||
|
||||
- @subpage tutorial_py_calibration
|
||||
|
||||
Let's find how good
|
||||
is our camera. Is there any distortion in images taken with it? If so, how do we correct it?
|
||||
|
||||
- @subpage tutorial_py_pose
|
||||
|
||||
This is a small
|
||||
section which will help you to create some cool 3D effects with the calib3d module.
|
||||
|
||||
- @subpage tutorial_py_epipolar_geometry
|
||||
|
||||
Let's understand
|
||||
epipolar geometry and epipolar constraint.
|
||||
|
||||
- @subpage tutorial_py_depthmap
|
||||
|
||||
Extract depth
|
||||
information from 2D images.
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_core/images/image_arithmetic.jpg
vendored
Normal file
After Width: | Height: | Size: 2.0 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_core/images/maths_tools.jpg
vendored
Normal file
After Width: | Height: | Size: 3.1 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_core/images/pixel_ops.jpg
vendored
Normal file
After Width: | Height: | Size: 4.1 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_core/images/speed.jpg
vendored
Normal file
After Width: | Height: | Size: 2.9 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_core/py_basic_ops/images/border.jpg
vendored
Normal file
After Width: | Height: | Size: 44 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_core/py_basic_ops/images/roi.jpg
vendored
Normal file
After Width: | Height: | Size: 26 KiB |
201
3rdparty/opencv-4.5.4/doc/py_tutorials/py_core/py_basic_ops/py_basic_ops.markdown
vendored
Normal file
@ -0,0 +1,201 @@
|
||||
Basic Operations on Images {#tutorial_py_basic_ops}
|
||||
==========================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
Learn to:
|
||||
|
||||
- Access pixel values and modify them
|
||||
- Access image properties
|
||||
- Set a Region of Interest (ROI)
|
||||
- Split and merge images
|
||||
|
||||
Almost all the operations in this section are mainly related to Numpy rather than OpenCV. A good
|
||||
knowledge of Numpy is required to write better optimized code with OpenCV.
|
||||
|
||||
*( Examples will be shown in a Python terminal, since most of them are just single lines of code )*
|
||||
|
||||
Accessing and Modifying pixel values
|
||||
------------------------------------
|
||||
|
||||
Let's load a color image first:
|
||||
@code{.py}
|
||||
>>> import numpy as np
|
||||
>>> import cv2 as cv
|
||||
|
||||
>>> img = cv.imread('messi5.jpg')
|
||||
@endcode
|
||||
You can access a pixel value by its row and column coordinates. For BGR image, it returns an array
|
||||
of Blue, Green, Red values. For grayscale image, just corresponding intensity is returned.
|
||||
@code{.py}
|
||||
>>> px = img[100,100]
|
||||
>>> print( px )
|
||||
[157 166 200]
|
||||
|
||||
# accessing only blue pixel
|
||||
>>> blue = img[100,100,0]
|
||||
>>> print( blue )
|
||||
157
|
||||
@endcode
|
||||
You can modify the pixel values the same way.
|
||||
@code{.py}
|
||||
>>> img[100,100] = [255,255,255]
|
||||
>>> print( img[100,100] )
|
||||
[255 255 255]
|
||||
@endcode
|
||||
|
||||
**Warning**
|
||||
|
||||
Numpy is an optimized library for fast array calculations. So simply accessing each and every pixel
|
||||
value and modifying it will be very slow and it is discouraged.
|
||||
|
||||
@note The above method is normally used for selecting a region of an array, say the first 5 rows
|
||||
and last 3 columns. For individual pixel access, the Numpy array methods, array.item() and
|
||||
array.itemset() are considered better. They always return a scalar, however, so if you want to access
|
||||
all the B,G,R values, you will need to call array.item() separately for each value.
|
||||
|
||||
Better pixel accessing and editing method :
|
||||
@code{.py}
|
||||
# accessing RED value
|
||||
>>> img.item(10,10,2)
|
||||
59
|
||||
|
||||
# modifying RED value
|
||||
>>> img.itemset((10,10,2),100)
|
||||
>>> img.item(10,10,2)
|
||||
100
|
||||
@endcode
|
||||
|
||||
Accessing Image Properties
|
||||
--------------------------
|
||||
|
||||
Image properties include number of rows, columns, and channels; type of image data; number of pixels; etc.
|
||||
|
||||
The shape of an image is accessed by img.shape. It returns a tuple of the number of rows, columns, and channels
|
||||
(if the image is color):
|
||||
@code{.py}
|
||||
>>> print( img.shape )
|
||||
(342, 548, 3)
|
||||
@endcode
|
||||
|
||||
@note If an image is grayscale, the tuple returned contains only the number of rows
|
||||
and columns, so it is a good method to check whether the loaded image is grayscale or color.
|
||||
|
||||
The total number of elements (rows x columns x channels) is accessed by `img.size`:
|
||||
@code{.py}
|
||||
>>> print( img.size )
|
||||
562248
|
||||
@endcode
|
||||
Image datatype is obtained by `img.dtype`:
|
||||
@code{.py}
|
||||
>>> print( img.dtype )
|
||||
uint8
|
||||
@endcode
|
||||
|
||||
@note img.dtype is very important while debugging because a large number of errors in OpenCV-Python
|
||||
code are caused by invalid datatype.
|
||||
|
||||
Image ROI
|
||||
---------
|
||||
|
||||
Sometimes, you will have to play with certain regions of images. For eye detection in images, first
|
||||
face detection is done over the entire image. When a face is obtained, we select the face region alone
|
||||
and search for eyes inside it instead of searching the whole image. It improves accuracy (because eyes
|
||||
are always on faces :D ) and performance (because we search in a small area).
|
||||
|
||||
ROI is again obtained using Numpy indexing. Here I am selecting the ball and copying it to another
|
||||
region in the image:
|
||||
@code{.py}
|
||||
>>> ball = img[280:340, 330:390]
|
||||
>>> img[273:333, 100:160] = ball
|
||||
@endcode
|
||||
Check the results below:
|
||||
|
||||

|
||||
|
||||
Splitting and Merging Image Channels
|
||||
------------------------------------
|
||||
|
||||
Sometimes you will need to work separately on the B,G,R channels of an image. In this case, you need
|
||||
to split the BGR image into single channels. In other cases, you may need to join these individual
|
||||
channels to create a BGR image. You can do this simply by:
|
||||
@code{.py}
|
||||
>>> b,g,r = cv.split(img)
|
||||
>>> img = cv.merge((b,g,r))
|
||||
@endcode
|
||||
Or
|
||||
@code{.py}
|
||||
>>> b = img[:,:,0]
|
||||
@endcode
|
||||
Suppose you want to set all the red pixels to zero - you do not need to split the channels first.
|
||||
Numpy indexing is faster:
|
||||
@code{.py}
|
||||
>>> img[:,:,2] = 0
|
||||
@endcode
|
||||
|
||||
**Warning**
|
||||
|
||||
cv.split() is a costly operation (in terms of time). So use it only if necessary. Otherwise go
|
||||
for Numpy indexing.
|
||||
|
||||
Making Borders for Images (Padding)
|
||||
-----------------------------------
|
||||
|
||||
If you want to create a border around an image, something like a photo frame, you can use
|
||||
**cv.copyMakeBorder()**. It also has applications in convolution operations, zero
|
||||
padding, etc. This function takes the following arguments:
|
||||
|
||||
- **src** - input image
|
||||
- **top**, **bottom**, **left**, **right** - border width in number of pixels in corresponding
|
||||
directions
|
||||
|
||||
- **borderType** - Flag defining what kind of border is added. It can be one of the following types:
|
||||
    - **cv.BORDER_CONSTANT** - Adds a constant colored border. The value should be given
|
||||
        as the next argument.
|
||||
    - **cv.BORDER_REFLECT** - Border will be a mirror reflection of the border elements,
|
||||
        like this : *fedcba|abcdefgh|hgfedcb*
|
||||
    - **cv.BORDER_REFLECT_101** or **cv.BORDER_DEFAULT** - Same as above, but with a
|
||||
        slight change, like this : *gfedcb|abcdefgh|gfedcba*
|
||||
    - **cv.BORDER_REPLICATE** - Last element is replicated throughout, like this:
|
||||
        *aaaaaa|abcdefgh|hhhhhhh*
|
||||
    - **cv.BORDER_WRAP** - The image is wrapped around (tiled), like this :
|
||||
        *cdefgh|abcdefgh|abcdefg*
|
||||
|
||||
- **value** - Color of border if border type is cv.BORDER_CONSTANT
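
The letter patterns in the list above can be reproduced with a few lines of plain Python, which may help to see how each border type maps an out-of-range index back into the image. This is a one-dimensional sketch with hypothetical helper names (`border_index`, `pad`), not the OpenCV implementation, and the reflect101 mode assumes a row of at least two elements.

```python
def border_index(i, n, mode):
    """Map any integer index i onto the valid range [0, n) for a border mode."""
    if mode == "replicate":
        return min(max(i, 0), n - 1)
    if mode == "wrap":
        return i % n
    if mode == "reflect":          # edge element is repeated
        period = 2 * n
        i %= period
        return i if i < n else period - 1 - i
    if mode == "reflect101":       # edge element is not repeated
        period = 2 * n - 2
        i %= period
        return i if i < n else period - i
    raise ValueError("unknown mode: " + mode)


def pad(row, left, right, mode):
    """Pad a string and mark the original extent with '|' as in the list above."""
    n = len(row)
    padded = "".join(row[border_index(i, n, mode)] for i in range(-left, n + right))
    return padded[:left] + "|" + padded[left:left + n] + "|" + padded[left + n:]


for mode in ("reflect", "reflect101", "replicate", "wrap"):
    print(mode, pad("abcdefgh", 6, 7, mode))
```

Running it on the row `abcdefgh` with 6 elements on the left and 7 on the right gives the same strings as the descriptions of the border types.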
|
||||
|
||||
Below is a sample code demonstrating all these border types for better understanding:
|
||||
@code{.py}
|
||||
import cv2 as cv
|
||||
import numpy as np
|
||||
from matplotlib import pyplot as plt
|
||||
|
||||
BLUE = [255,0,0]
|
||||
|
||||
img1 = cv.imread('opencv-logo.png')
|
||||
|
||||
replicate = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_REPLICATE)
|
||||
reflect = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_REFLECT)
|
||||
reflect101 = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_REFLECT_101)
|
||||
wrap = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_WRAP)
|
||||
constant= cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_CONSTANT,value=BLUE)
|
||||
|
||||
plt.subplot(231),plt.imshow(img1,'gray'),plt.title('ORIGINAL')
|
||||
plt.subplot(232),plt.imshow(replicate,'gray'),plt.title('REPLICATE')
|
||||
plt.subplot(233),plt.imshow(reflect,'gray'),plt.title('REFLECT')
|
||||
plt.subplot(234),plt.imshow(reflect101,'gray'),plt.title('REFLECT_101')
|
||||
plt.subplot(235),plt.imshow(wrap,'gray'),plt.title('WRAP')
|
||||
plt.subplot(236),plt.imshow(constant,'gray'),plt.title('CONSTANT')
|
||||
|
||||
plt.show()
|
||||
@endcode
|
||||
See the result below. (Image is displayed with matplotlib. So RED and BLUE channels will be
|
||||
interchanged):
|
||||
|
||||

|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
Exercises
|
||||
---------
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_core/py_image_arithmetics/images/blending.jpg
vendored
Normal file
After Width: | Height: | Size: 18 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_core/py_image_arithmetics/images/overlay.jpg
vendored
Normal file
After Width: | Height: | Size: 23 KiB |
116
3rdparty/opencv-4.5.4/doc/py_tutorials/py_core/py_image_arithmetics/py_image_arithmetics.markdown
vendored
Normal file
@ -0,0 +1,116 @@
|
||||
Arithmetic Operations on Images {#tutorial_py_image_arithmetics}
|
||||
===============================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
- Learn several arithmetic operations on images, like addition, subtraction, bitwise operations, etc.
|
||||
- Learn these functions: **cv.add()**, **cv.addWeighted()**, etc.
|
||||
|
||||
Image Addition
|
||||
--------------
|
||||
|
||||
You can add two images with the OpenCV function, cv.add(), or simply by the numpy operation
|
||||
res = img1 + img2. Both images should be of the same depth and type, or the second image can just be a
|
||||
scalar value.
|
||||
|
||||
@note There is a difference between OpenCV addition and Numpy addition. OpenCV addition is a
|
||||
saturated operation while Numpy addition is a modulo operation.
|
||||
|
||||
For example, consider the below sample:
|
||||
@code{.py}
|
||||
>>> x = np.uint8([250])
|
||||
>>> y = np.uint8([10])
|
||||
|
||||
>>> print( cv.add(x,y) ) # 250+10 = 260 => 255
|
||||
[[255]]
|
||||
|
||||
>>> print( x+y ) # 250+10 = 260 % 256 = 4
|
||||
[4]
|
||||
@endcode
|
||||
This will be more visible when you add two images. Stick with OpenCV functions, because they will provide a better result.
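
The two behaviours can be written out for a single 8-bit value in plain Python. This is only an illustration of the arithmetic; the function names are hypothetical and neither function is part of OpenCV or Numpy.

```python
def saturating_add(a, b, max_value=255):
    """Clamp the sum at the largest 8-bit value (OpenCV-style behaviour)."""
    return min(a + b, max_value)


def wrapping_add(a, b, modulus=256):
    """Wrap the sum around modulo 256 (uint8 Numpy-style behaviour)."""
    return (a + b) % modulus


saturated = saturating_add(250, 10)
wrapped = wrapping_add(250, 10)
print(saturated, wrapped)
```

In an image, the saturated result stays bright, while the wrapped result turns a nearly white pixel into a nearly black one, which is why the difference is so visible.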
|
||||
|
||||
Image Blending
|
||||
--------------
|
||||
|
||||
This is also image addition, but different weights are given to images in order to give a feeling of
|
||||
blending or transparency. Images are added as per the equation below:
|
||||
|
||||
\f[g(x) = (1 - \alpha)f_{0}(x) + \alpha f_{1}(x)\f]
|
||||
|
||||
By varying \f$\alpha\f$ from \f$0 \rightarrow 1\f$, you can perform a cool transition from one image to
|
||||
another.
|
||||
|
||||
Here I took two images to blend together. The first image is given a weight of 0.7 and the second image
|
||||
is given 0.3. cv.addWeighted() applies the following equation to the image:
|
||||
|
||||
\f[dst = \alpha \cdot img1 + \beta \cdot img2 + \gamma\f]
|
||||
|
||||
Here \f$\gamma\f$ is taken as zero.
|
||||
@code{.py}
|
||||
img1 = cv.imread('ml.png')
|
||||
img2 = cv.imread('opencv-logo.png')
|
||||
|
||||
dst = cv.addWeighted(img1,0.7,img2,0.3,0)
|
||||
|
||||
cv.imshow('dst',dst)
|
||||
cv.waitKey(0)
|
||||
cv.destroyAllWindows()
|
||||
@endcode
|
||||
Check the result below:
|
||||
|
||||

|
||||
|
||||
Bitwise Operations
|
||||
------------------
|
||||
|
||||
This includes the bitwise AND, OR, NOT, and XOR operations. They will be highly useful while extracting
|
||||
any part of the image (as we will see in coming chapters), defining and working with non-rectangular
|
||||
ROIs, etc. Below we will see an example of how to change a particular region of an image.
|
||||
|
||||
I want to put the OpenCV logo above an image. If I add two images, it will change the color. If I blend them,
|
||||
I get a transparent effect. But I want it to be opaque. If it was a rectangular region, I could use
|
||||
ROI as we did in the last chapter. But the OpenCV logo is not a rectangular shape. So you can do it with
|
||||
bitwise operations as shown below:
|
||||
@code{.py}
|
||||
# Load two images
|
||||
img1 = cv.imread('messi5.jpg')
|
||||
img2 = cv.imread('opencv-logo-white.png')
|
||||
|
||||
# I want to put logo on top-left corner, So I create a ROI
|
||||
rows,cols,channels = img2.shape
|
||||
roi = img1[0:rows, 0:cols]
|
||||
|
||||
# Now create a mask of logo and create its inverse mask also
|
||||
img2gray = cv.cvtColor(img2,cv.COLOR_BGR2GRAY)
|
||||
ret, mask = cv.threshold(img2gray, 10, 255, cv.THRESH_BINARY)
|
||||
mask_inv = cv.bitwise_not(mask)
|
||||
|
||||
# Now black-out the area of logo in ROI
|
||||
img1_bg = cv.bitwise_and(roi,roi,mask = mask_inv)
|
||||
|
||||
# Take only region of logo from logo image.
|
||||
img2_fg = cv.bitwise_and(img2,img2,mask = mask)
|
||||
|
||||
# Put logo in ROI and modify the main image
|
||||
dst = cv.add(img1_bg,img2_fg)
|
||||
img1[0:rows, 0:cols ] = dst
|
||||
|
||||
cv.imshow('res',img1)
|
||||
cv.waitKey(0)
|
||||
cv.destroyAllWindows()
|
||||
@endcode
|
||||
See the result below. Left image shows the mask we created. Right image shows the final result. For
|
||||
more understanding, display all the intermediate images in the above code, especially img1_bg and
|
||||
img2_fg.
|
||||
|
||||

|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
Exercises
|
||||
---------
|
||||
|
||||
-# Create a slide show of images in a folder with smooth transition between images using
|
||||
cv.addWeighted function
|
167
3rdparty/opencv-4.5.4/doc/py_tutorials/py_core/py_optimization/py_optimization.markdown
vendored
Normal file
@ -0,0 +1,167 @@
|
||||
Performance Measurement and Improvement Techniques {#tutorial_py_optimization}
|
||||
==================================================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In image processing, since you are dealing with a large number of operations per second, it is mandatory that your code is not only providing the correct solution, but that it is also providing it in the fastest manner.
|
||||
So in this chapter, you will learn:
|
||||
|
||||
- To measure the performance of your code.
|
||||
- Some tips to improve the performance of your code.
|
||||
- You will see these functions: **cv.getTickCount**, **cv.getTickFrequency**, etc.
|
||||
|
||||
Apart from OpenCV, Python also provides a module **time** which is helpful in measuring the time of
|
||||
execution. Another module **profile** helps to get a detailed report on the code, like how much time
|
||||
each function in the code took, how many times the function was called, etc. But, if you are using
|
||||
IPython, all these features are integrated in a user-friendly manner. We will see some important
|
||||
ones, and for more details, check links in the **Additional Resources** section.
|
||||
|
||||
Measuring Performance with OpenCV
|
||||
---------------------------------
|
||||
|
||||
The **cv.getTickCount** function returns the number of clock-cycles after a reference event (like the
|
||||
moment the machine was switched ON) to the moment this function is called. So if you call it before and
|
||||
after the function execution, you get the number of clock-cycles used to execute a function.
|
||||
|
||||
The **cv.getTickFrequency** function returns the frequency of clock-cycles, or the number of
|
||||
clock-cycles per second. So to find the time of execution in seconds, you can do the following:
|
||||
@code{.py}
|
||||
e1 = cv.getTickCount()
|
||||
# your code execution
|
||||
e2 = cv.getTickCount()
|
||||
time = (e2 - e1)/ cv.getTickFrequency()
|
||||
@endcode
|
||||
We will demonstrate with the following example, which applies median filtering with kernels
|
||||
of odd sizes ranging from 5 to 47. (Don't worry about what the result will look like - that is not our
|
||||
goal):
|
||||
@code{.py}
|
||||
img1 = cv.imread('messi5.jpg')
|
||||
|
||||
e1 = cv.getTickCount()
|
||||
for i in range(5,49,2):
|
||||
    img1 = cv.medianBlur(img1,i)
|
||||
e2 = cv.getTickCount()
|
||||
t = (e2 - e1)/cv.getTickFrequency()
|
||||
print( t )
|
||||
|
||||
# Result I got is 0.521107655 seconds
|
||||
@endcode
|
||||
@note You can do the same thing with the time module. Instead of cv.getTickCount, use the time.time() function.
|
||||
Then take the difference of the two times.
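
A minimal sketch of the same pattern with the standard library is shown below. It uses time.perf_counter() rather than time.time(), which is a substitution on my part because perf_counter() is intended for measuring short intervals. The helper name `time_call` and the workload are hypothetical.

```python
import time


def time_call(func, *args):
    """Run func(*args) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Hypothetical workload standing in for an image-processing call.
total, seconds = time_call(sum, range(1000))
print(total, seconds)
```

A single run like this is noisy; repeating the call several times and taking the best or the average time gives a more reliable figure.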
|
||||
|
||||
Default Optimization in OpenCV
|
||||
------------------------------
|
||||
|
||||
Many of the OpenCV functions are optimized using SSE2, AVX, etc. OpenCV also contains the unoptimized code.
|
||||
So if our system supports these features, we should exploit them (almost all modern day processors
|
||||
support them). It is enabled by default while compiling. So OpenCV runs the optimized code if it is
|
||||
enabled, otherwise it runs the unoptimized code. You can use **cv.useOptimized()** to check if it is
|
||||
enabled/disabled and **cv.setUseOptimized()** to enable/disable it. Let's see a simple example.
|
||||
@code{.py}
|
||||
# check if optimization is enabled
|
||||
In [5]: cv.useOptimized()
|
||||
Out[5]: True
|
||||
|
||||
In [6]: %timeit res = cv.medianBlur(img,49)
|
||||
10 loops, best of 3: 34.9 ms per loop
|
||||
|
||||
# Disable it
|
||||
In [7]: cv.setUseOptimized(False)
|
||||
|
||||
In [8]: cv.useOptimized()
|
||||
Out[8]: False
|
||||
|
||||
In [9]: %timeit res = cv.medianBlur(img,49)
|
||||
10 loops, best of 3: 64.1 ms per loop
|
||||
@endcode
|
||||
As you can see, optimized median filtering is \~2x faster than the unoptimized version. If you check its source,
|
||||
you can see that median filtering is SIMD optimized. So you can use this to enable optimization at the
|
||||
top of your code (remember it is enabled by default).
|
||||
|
||||
Measuring Performance in IPython
|
||||
--------------------------------
|
||||
|
||||
Sometimes you may need to compare the performance of two similar operations. IPython gives you a
|
||||
magic command %timeit to perform this. It runs the code several times to get more accurate results.
|
||||
It is best suited to measuring single lines of code.
|
||||
|
||||
For example, do you know which of the following squaring operations is faster: x = 5; y = x\*\*2,
|
||||
x = 5; y = x\*x, x = np.uint8([5]); y = x\*x, or y = np.square(x)? We will find out with %timeit in the
|
||||
IPython shell.
|
||||
@code{.py}
|
||||
In [10]: x = 5
|
||||
|
||||
In [11]: %timeit y=x**2
|
||||
10000000 loops, best of 3: 73 ns per loop
|
||||
|
||||
In [12]: %timeit y=x*x
|
||||
10000000 loops, best of 3: 58.3 ns per loop
|
||||
|
||||
In [15]: z = np.uint8([5])
|
||||
|
||||
In [17]: %timeit y=z*z
|
||||
1000000 loops, best of 3: 1.25 us per loop
|
||||
|
||||
In [19]: %timeit y=np.square(z)
|
||||
1000000 loops, best of 3: 1.16 us per loop
|
||||
@endcode
|
||||
You can see that, x = 5 ; y = x\*x is fastest and it is around 20x faster compared to Numpy. If you
|
||||
consider the array creation also, it may reach up to 100x faster. Cool, right? *(Numpy devs are
|
||||
working on this issue)*
|
||||
|
||||
@note Python scalar operations are faster than Numpy scalar operations. So for operations including
|
||||
one or two elements, Python scalar is better than Numpy arrays. Numpy has the advantage when the size of
|
||||
the array is a little bit bigger.
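
If you are not working in IPython, the standard-library `timeit` module gives a comparable measurement. The following is a minimal sketch of the scalar comparison only; the loop count of 100000 is an arbitrary choice, and the absolute numbers will differ from machine to machine.

```python
import timeit

x = 5
loops = 100_000

# timeit.timeit runs the statement `number` times and returns the total seconds.
t_pow = timeit.timeit("y = x**2", globals={"x": x}, number=loops)
t_mul = timeit.timeit("y = x*x", globals={"x": x}, number=loops)

print("x**2: {:.1f} ns per loop".format(t_pow / loops * 1e9))
print("x*x : {:.1f} ns per loop".format(t_mul / loops * 1e9))
```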
|
||||
|
||||
We will try one more example. This time, we will compare the performance of **cv.countNonZero()**
|
||||
and **np.count_nonzero()** for the same image.
|
||||
|
||||
@code{.py}
|
||||
In [35]: %timeit z = cv.countNonZero(img)
|
||||
100000 loops, best of 3: 15.8 us per loop
|
||||
|
||||
In [36]: %timeit z = np.count_nonzero(img)
|
||||
1000 loops, best of 3: 370 us per loop
|
||||
@endcode
|
||||
See, the OpenCV function is nearly 25x faster than the Numpy function.
|
||||
|
||||
@note Normally, OpenCV functions are faster than Numpy functions. So for the same operation, OpenCV
|
||||
functions are preferred. But, there can be exceptions, especially when Numpy works with views
|
||||
instead of copies.
|
||||
|
||||
More IPython magic commands
|
||||
---------------------------
|
||||
|
||||
There are several other magic commands to measure performance, profiling, line profiling, memory
|
||||
measurement, etc. They are all well documented, so only links to those docs are provided here.
|
||||
Interested readers are recommended to try them out.
|
||||
|
||||
Performance Optimization Techniques
|
||||
-----------------------------------
|
||||
|
||||
There are several techniques and coding methods to exploit maximum performance of Python and Numpy.
|
||||
Only relevant ones are noted here and links are given to important sources. The main thing to be
|
||||
noted here is, first try to implement the algorithm in a simple manner. Once it is working,
|
||||
profile it, find the bottlenecks, and optimize them.
|
||||
|
||||
-# Avoid using loops in Python as much as possible, especially double/triple loops etc. They are
|
||||
inherently slow.
|
||||
2. Vectorize the algorithm/code to the maximum extent possible, because Numpy and OpenCV are
|
||||
optimized for vector operations.
|
||||
3. Exploit the cache coherence.
|
||||
4. Never make copies of an array unless it is necessary. Try to use views instead. Array copying is a
|
||||
costly operation.
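
Points 1, 2 and 4 above can be illustrated with a small Numpy sketch. The array and its size are made up for the illustration; the point is only the difference between a Python-level loop and a vectorized expression, and between a view and a copy.

```python
import numpy as np

a = np.arange(12, dtype=np.float32).reshape(3, 4)

# Loop version: one Python-level iteration per element (slow for large arrays).
total_loop = 0.0
for row in a:
    for value in row:
        total_loop += float(value) * 2.0

# Vectorized version: the multiplication and the sum run in compiled code.
total_vec = float((a * 2.0).sum())

# Basic slicing returns a view that shares memory with `a`; no data is copied.
view = a[:, 1:3]
# .copy() allocates a new buffer and duplicates the data.
copy = a[:, 1:3].copy()

print(total_loop == total_vec)
print(np.shares_memory(a, view))
print(np.shares_memory(a, copy))
```

`np.shares_memory` is a convenient way to check whether an operation gave you a view or silently made a copy.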
|
||||
|
||||
If your code is still slow after doing all of these operations, or if the use of large loops is inevitable, use additional libraries like Cython to make it faster.
|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
-# [Python Optimization Techniques](http://wiki.python.org/moin/PythonSpeed/PerformanceTips)
|
||||
2. Scipy Lecture Notes - [Advanced
|
||||
Numpy](http://scipy-lectures.github.io/advanced/advanced_numpy/index.html#advanced-numpy)
|
||||
3. [Timing and Profiling in IPython](http://pynash.org/2013/03/06/timing-and-profiling/)
|
||||
|
||||
Exercises
|
||||
---------
|
18
3rdparty/opencv-4.5.4/doc/py_tutorials/py_core/py_table_of_contents_core.markdown
vendored
Normal file
@ -0,0 +1,18 @@
|
||||
Core Operations {#tutorial_py_table_of_contents_core}
|
||||
===============
|
||||
|
||||
- @subpage tutorial_py_basic_ops
|
||||
|
||||
Learn to read and
|
||||
edit pixel values, working with image ROI and other basic operations.
|
||||
|
||||
- @subpage tutorial_py_image_arithmetics
|
||||
|
||||
Perform arithmetic
|
||||
operations on images
|
||||
|
||||
- @subpage tutorial_py_optimization
|
||||
|
||||
Getting a solution is
|
||||
important. But getting it in the fastest way is more important. Learn to check the speed of your
|
||||
code, optimize the code etc.
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/images/brief.jpg
vendored
Normal file
After Width: | Height: | Size: 4.7 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/images/fast_icon.jpg
vendored
Normal file
After Width: | Height: | Size: 3.1 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/images/features_icon.jpg
vendored
Normal file
After Width: | Height: | Size: 4.7 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/images/harris_icon.jpg
vendored
Normal file
After Width: | Height: | Size: 2.8 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/images/homography_icon.jpg
vendored
Normal file
After Width: | Height: | Size: 4.6 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/images/matching.jpg
vendored
Normal file
After Width: | Height: | Size: 5.4 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/images/orb.jpg
vendored
Normal file
After Width: | Height: | Size: 7.1 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/images/shi_icon.jpg
vendored
Normal file
After Width: | Height: | Size: 3.7 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/images/sift_icon.jpg
vendored
Normal file
After Width: | Height: | Size: 3.4 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/images/surf_icon.jpg
vendored
Normal file
After Width: | Height: | Size: 3.4 KiB |
92
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_brief/py_brief.markdown
vendored
Normal file
@ -0,0 +1,92 @@
|
||||
BRIEF (Binary Robust Independent Elementary Features) {#tutorial_py_brief}
|
||||
=====================================================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In this chapter
|
||||
- We will see the basics of BRIEF algorithm
|
||||
|
||||
Theory
|
||||
------
|
||||
|
||||
We know SIFT uses a 128-dim vector for descriptors. Since it uses floating point numbers, it takes
basically 512 bytes. Similarly, SURF takes a minimum of 256 bytes (for 64-dim). Creating such a
vector for thousands of features takes a lot of memory, which is not feasible for resource-constrained
applications, especially embedded systems. The larger the memory, the longer the time it takes for
matching.
|
||||
|
||||
But all these dimensions may not be needed for actual matching. We can compress them using several
methods like PCA, LDA etc. Other methods, such as hashing with LSH (Locality Sensitive Hashing), are
used to convert these floating point SIFT descriptors to binary strings. These binary
strings are used to match features using Hamming distance. This provides a better speed-up because
finding the Hamming distance is just applying XOR and bit count, which are very fast in modern CPUs with
SSE instructions. But here, we need to find the descriptors first and only then can we apply hashing,
which doesn't solve our initial problem of memory.
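
To make the "XOR and bit count" remark concrete, here is a small pure-Python sketch of the Hamming distance between two binary descriptors stored as bytes. The function name and the two example descriptors are made up for the illustration; OpenCV's matchers do this internally in optimized code.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    # XOR leaves a 1 wherever the bits differ; counting the 1s gives the distance.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

d1 = bytes([0b10110010, 0b00001111])
d2 = bytes([0b10010011, 0b00001111])
print(hamming_distance(d1, d2))  # -> 2
```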
|
||||
|
||||
BRIEF comes into the picture at this moment. It provides a shortcut to find the binary strings directly
without finding descriptors. It takes a smoothed image patch and selects a set of \f$n_d\f$ (x,y)
location pairs in a unique way (explained in the paper). Then some pixel intensity comparisons are done
on these location pairs. For example, let the first location pair be \f$p\f$ and \f$q\f$. If \f$I(p) < I(q)\f$, then its
result is 1, else it is 0. This is applied to all the \f$n_d\f$ location pairs to get an
\f$n_d\f$-dimensional bitstring.
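
The binary test can be sketched in a few lines of pure Python. This is only an illustration of the idea, not OpenCV's implementation: the patch is assumed to be already smoothed, the location pairs are drawn uniformly at random (the paper discusses specific sampling strategies), and `brief_descriptor` is a hypothetical helper name.

```python
import random

def brief_descriptor(patch, pairs):
    """Build a BRIEF-style descriptor: one bit per intensity comparison."""
    bits = []
    for (px, py), (qx, qy) in pairs:
        # Binary test: 1 if I(p) < I(q), else 0.
        bits.append(1 if patch[py][px] < patch[qy][qx] else 0)
    # Pack the bits into bytes, 8 comparisons per byte.
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)

rng = random.Random(0)
size = 9                                   # hypothetical 9x9 patch
patch = [[rng.randrange(256) for _ in range(size)] for _ in range(size)]
n_d = 256                                  # number of location pairs (bits)
pairs = [((rng.randrange(size), rng.randrange(size)),
          (rng.randrange(size), rng.randrange(size))) for _ in range(n_d)]

desc = brief_descriptor(patch, pairs)
print(len(desc))  # -> 32
```

With 256 comparisons the packed descriptor occupies 32 bytes, compared with the 512 bytes of a SIFT descriptor.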
|
||||
|
||||
This \f$n_d\f$ can be 128, 256 or 512. OpenCV supports all of these, but by default, it would be 256
|
||||
(OpenCV represents it in bytes. So the values will be 16, 32 and 64). So once you get this, you can
|
||||
use Hamming Distance to match these descriptors.
|
||||
|
||||
One important point is that BRIEF is a feature descriptor; it doesn't provide any method to find the
features. So you will have to use another feature detector like SIFT, SURF etc. The paper
recommends using CenSurE, which is a fast detector, and BRIEF works even slightly better for CenSurE
points than for SURF points.
|
||||
|
||||
In short, BRIEF is a faster method for feature descriptor calculation and matching. It also provides
a high recognition rate unless there is large in-plane rotation.
|
||||
|
||||
STAR(CenSurE) in OpenCV
|
||||
------
|
||||
STAR is a feature detector derived from CenSurE.
|
||||
Unlike CenSurE however, which uses polygons like squares, hexagons and octagons to approach a circle,
|
||||
STAR emulates a circle with 2 overlapping squares: 1 upright and 1 45-degree rotated. These polygons are bi-level.
|
||||
They can be seen as polygons with thick borders. The borders and the enclosed area have weights of opposing signs.
|
||||
This has better computational characteristics than other scale-space detectors and it is capable of real-time implementation.
|
||||
In contrast to SIFT and SURF, which find extrema at sub-sampled pixels and so compromise accuracy at larger scales,
|
||||
CenSurE creates a feature vector using full spatial resolution at all scales in the pyramid.
|
||||
BRIEF in OpenCV
|
||||
---------------
|
||||
|
||||
The code below shows the computation of BRIEF descriptors with the help of the CenSurE detector.
|
||||
|
||||
Note that you need [opencv contrib](https://github.com/opencv/opencv_contrib) to use this.
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
from matplotlib import pyplot as plt
|
||||
|
||||
img = cv.imread('simple.jpg',0)
|
||||
|
||||
# Initiate STAR detector
|
||||
star = cv.xfeatures2d.StarDetector_create()
|
||||
|
||||
# Initiate BRIEF extractor
|
||||
brief = cv.xfeatures2d.BriefDescriptorExtractor_create()
|
||||
|
||||
# find the keypoints with STAR
|
||||
kp = star.detect(img,None)
|
||||
|
||||
# compute the descriptors with BRIEF
|
||||
kp, des = brief.compute(img, kp)
|
||||
|
||||
print( brief.descriptorSize() )
|
||||
print( des.shape )
|
||||
@endcode
|
||||
The function brief.descriptorSize() gives the \f$n_d\f$ size used in bytes. By default it is 32. Next one
|
||||
is matching, which will be done in another chapter.
|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
-# Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua, "BRIEF: Binary Robust
|
||||
Independent Elementary Features", 11th European Conference on Computer Vision (ECCV), Heraklion,
|
||||
Crete. LNCS Springer, September 2010.
|
||||
2. [LSH (Locality Sensitive Hashing)](https://en.wikipedia.org/wiki/Locality-sensitive_hashing) on Wikipedia.
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_fast/images/fast_eqns.jpg
vendored
Normal file
After Width: | Height: | Size: 6.2 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_fast/images/fast_kp.jpg
vendored
Normal file
After Width: | Height: | Size: 25 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_fast/images/fast_speedtest.jpg
vendored
Normal file
After Width: | Height: | Size: 17 KiB |
143
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_fast/py_fast.markdown
vendored
Normal file
@ -0,0 +1,143 @@
|
||||
FAST Algorithm for Corner Detection {#tutorial_py_fast}
|
||||
===================================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In this chapter,
|
||||
- We will understand the basics of FAST algorithm
|
||||
- We will find corners using OpenCV functionalities for FAST algorithm.
|
||||
|
||||
Theory
|
||||
------
|
||||
|
||||
We saw several feature detectors and many of them are really good. But when looking from a real-time
|
||||
application point of view, they are not fast enough. One good example is a SLAM (Simultaneous
Localization and Mapping) mobile robot, which has limited computational resources.
|
||||
|
||||
As a solution to this, FAST (Features from Accelerated Segment Test) algorithm was proposed by
|
||||
Edward Rosten and Tom Drummond in their paper "Machine learning for high-speed corner detection" in
|
||||
2006 (later revised in 2010). A basic summary of the algorithm is presented below. Refer to the original
paper for more details (all the images are taken from the original paper).
|
||||
|
||||
### Feature Detection using FAST
|
||||
|
||||
-# Select a pixel \f$p\f$ in the image which is to be identified as an interest point or not. Let its
|
||||
intensity be \f$I_p\f$.
|
||||
2. Select an appropriate threshold value \f$t\f$.
|
||||
3. Consider a circle of 16 pixels around the pixel under test. (See the image below)
|
||||
|
||||

|
||||
|
||||
-# Now the pixel \f$p\f$ is a corner if there exists a set of \f$n\f$ contiguous pixels in the circle (of
|
||||
16 pixels) which are all brighter than \f$I_p + t\f$, or all darker than \f$I_p - t\f$. (Shown as white
dashed lines in the above image). \f$n\f$ was chosen to be 12.
|
||||
5. A **high-speed test** was proposed to exclude a large number of non-corners. This test examines
|
||||
only the four pixels at 1, 9, 5 and 13 (first, 1 and 9 are tested to see whether they are too bright or
too dark; if so, 5 and 13 are then checked). If \f$p\f$ is a corner, then at least three of these must all
be brighter than \f$I_p + t\f$ or darker than \f$I_p - t\f$. If neither of these is the case, then \f$p\f$
|
||||
cannot be a corner. The full segment test criterion can then be applied to the passed candidates
|
||||
by examining all pixels in the circle. This detector in itself exhibits high performance, but
|
||||
there are several weaknesses:
|
||||
|
||||
- It does not reject as many candidates for n \< 12.
|
||||
- The choice of pixels is not optimal because its efficiency depends on ordering of the
|
||||
questions and distribution of corner appearances.
|
||||
- Results of high-speed tests are thrown away.
|
||||
- Multiple features are detected adjacent to one another.
|
||||
|
||||
The first three points are addressed with a machine learning approach. The last one is addressed using
|
||||
non-maximal suppression.
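
The segment test and the high-speed test described above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not OpenCV's implementation: the image is a plain list of rows, the circle offsets are the usual radius-3 ring numbered 1 to 16 clockwise from the top, and `is_fast_corner` is a hypothetical name. The caller must keep \f$p\f$ at least 3 pixels away from the image border.

```python
# Offsets (dx, dy) of the 16 circle pixels, numbered 1..16 clockwise from the top.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(img, x, y, t, n=12):
    """Segment test: True if n contiguous circle pixels are all brighter than
    I_p + t or all darker than I_p - t."""
    ip = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]

    # High-speed test on pixels 1, 9, 5 and 13 (indices 0, 8, 4, 12).
    # It is only a valid rejection test when n >= 12.
    if n >= 12:
        compass = [ring[i] for i in (0, 8, 4, 12)]
        brighter = sum(v > ip + t for v in compass)
        darker = sum(v < ip - t for v in compass)
        if brighter < 3 and darker < 3:
            return False

    # Full segment test; the ring is doubled so that runs can wrap around.
    for sign in (1, -1):          # +1 checks "brighter", -1 checks "darker"
        run = 0
        for v in ring + ring:
            if sign * (v - ip) > t:
                run += 1
                if run >= n:
                    return True
            else:
                run = 0
    return False

# Synthetic 7x7 image: 12 contiguous bright circle pixels around a dark centre.
img = [[10] * 7 for _ in range(7)]
for dx, dy in CIRCLE[:12]:
    img[3 + dy][3 + dx] = 200
print(is_fast_corner(img, 3, 3, t=20))  # -> True
```

Note how the high-speed test rejects a flat patch after looking at only four pixels, which is where most of the speed comes from.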
|
||||
|
||||
### Machine Learning a Corner Detector
|
||||
|
||||
-# Select a set of images for training (preferably from the target application domain)
|
||||
2. Run the FAST algorithm on every image to find feature points.
|
||||
3. For every feature point, store the 16 pixels around it as a vector. Do it for all the images to
|
||||
get feature vector \f$P\f$.
|
||||
4. Each pixel (say \f$x\f$) in these 16 pixels can have one of the following three states:
|
||||
|
||||

|
||||
|
||||
-# Depending on these states, the feature vector \f$P\f$ is subdivided into 3 subsets, \f$P_d\f$, \f$P_s\f$,
|
||||
\f$P_b\f$.
|
||||
6. Define a new boolean variable, \f$K_p\f$, which is true if \f$p\f$ is a corner and false otherwise.
|
||||
7. Use the ID3 algorithm (decision tree classifier) to query each subset using the variable \f$K_p\f$
|
||||
for the knowledge about the true class. It selects the \f$x\f$ which yields the most information
|
||||
about whether the candidate pixel is a corner, measured by the entropy of \f$K_p\f$.
|
||||
8. This is recursively applied to all the subsets until its entropy is zero.
|
||||
9. The decision tree so created is used for fast detection in other images.
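
The steps above can be sketched for a single circle position \f$x\f$. The sketch assumes the three states are the darker / similar / brighter split relative to \f$I_p \pm t\f$ used in the FAST paper, and it uses the normalized Shannon entropy of \f$K_p\f$, which may differ by a scale factor from the formulation in the paper. All function names are hypothetical.

```python
import math

def pixel_state(i_x, i_p, t):
    """Classify a circle pixel relative to the centre intensity i_p."""
    if i_x <= i_p - t:
        return "d"   # darker
    if i_x >= i_p + t:
        return "b"   # brighter
    return "s"       # similar

def entropy(corner_flags):
    """Shannon entropy (in bits) of the boolean corner label K_p over a set."""
    n = len(corner_flags)
    if n == 0:
        return 0.0
    c = sum(corner_flags)          # number of corners in the set
    h = 0.0
    for k in (c, n - c):
        if k:
            h -= (k / n) * math.log2(k / n)
    return h

def information_gain(samples, t):
    """samples: list of (i_x, i_p, is_corner) for one fixed circle position x."""
    subsets = {"d": [], "s": [], "b": []}
    for i_x, i_p, is_corner in samples:
        subsets[pixel_state(i_x, i_p, t)].append(is_corner)
    total = [flag for flags in subsets.values() for flag in flags]
    remainder = sum(len(f) / len(total) * entropy(f)
                    for f in subsets.values() if f)
    return entropy(total) - remainder

# Made-up training samples for one circle position.
samples = [(200, 100, True), (210, 100, True), (105, 100, False), (20, 100, False)]
print(information_gain(samples, t=20))  # -> 1.0
```

A decision tree learner such as ID3 would evaluate this gain for each of the 16 positions, split on the best one, and recurse into the three subsets.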
|
||||
|
||||
### Non-maximal Suppression
|
||||
|
||||
Detecting multiple interest points in adjacent locations is another problem. It is solved by using
|
||||
Non-maximum Suppression.
|
||||
|
||||
-# Compute a score function, \f$V\f$, for all the detected feature points. \f$V\f$ is the sum of the absolute
differences between \f$p\f$ and the 16 surrounding pixel values.
|
||||
2. Consider two adjacent keypoints and compute their \f$V\f$ values.
|
||||
3. Discard the one with lower \f$V\f$ value.
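
A minimal sketch of these three steps is given below. It treats two keypoints as adjacent when they are 8-connected neighbours, which is an assumption made for the illustration, and both function names are hypothetical.

```python
def corner_score(ring, i_p):
    """Score V: sum of absolute differences between p and its 16 circle pixels."""
    return sum(abs(v - i_p) for v in ring)

def suppress_adjacent(keypoints):
    """keypoints: list of (x, y, V). Keep a keypoint only if no 8-connected
    neighbour has a strictly higher score."""
    scores = {(x, y): v for x, y, v in keypoints}
    kept = []
    for (x, y), v in scores.items():
        neighbours = [scores.get((x + dx, y + dy), float("-inf"))
                      for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                      if (dx, dy) != (0, 0)]
        if v >= max(neighbours):
            kept.append((x, y, v))
    return kept

print(corner_score([200] * 12 + [10] * 4, 10))  # -> 2280
keypoints = [(5, 5, 120), (6, 5, 300), (20, 20, 80)]
print(suppress_adjacent(keypoints))  # -> [(6, 5, 300), (20, 20, 80)]
```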
|
||||
|
||||
### Summary
|
||||
|
||||
It is several times faster than other existing corner detectors.
|
||||
|
||||
But it is not robust to high levels of noise. It is dependent on a threshold.
|
||||
|
||||
FAST Feature Detector in OpenCV
|
||||
-------------------------------
|
||||
|
||||
It is called as any other feature detector in OpenCV. If you want, you can specify the threshold,
|
||||
whether non-maximum suppression is to be applied or not, the neighborhood to be used etc.
|
||||
|
||||
For the neighborhood, three flags are defined, cv.FAST_FEATURE_DETECTOR_TYPE_5_8,
|
||||
cv.FAST_FEATURE_DETECTOR_TYPE_7_12 and cv.FAST_FEATURE_DETECTOR_TYPE_9_16. Below is a
|
||||
simple example of how to detect and draw the FAST feature points.
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
from matplotlib import pyplot as plt
|
||||
|
||||
img = cv.imread('blox.jpg',0) # `<opencv_root>/samples/data/blox.jpg`
|
||||
|
||||
# Initiate FAST object with default values
|
||||
fast = cv.FastFeatureDetector_create()
|
||||
|
||||
# find and draw the keypoints
|
||||
kp = fast.detect(img,None)
|
||||
img2 = cv.drawKeypoints(img, kp, None, color=(255,0,0))
|
||||
|
||||
# Print all default params
|
||||
print( "Threshold: {}".format(fast.getThreshold()) )
|
||||
print( "nonmaxSuppression:{}".format(fast.getNonmaxSuppression()) )
|
||||
print( "neighborhood: {}".format(fast.getType()) )
|
||||
print( "Total Keypoints with nonmaxSuppression: {}".format(len(kp)) )
|
||||
|
||||
cv.imwrite('fast_true.png', img2)
|
||||
|
||||
# Disable nonmaxSuppression
|
||||
fast.setNonmaxSuppression(0)
|
||||
kp = fast.detect(img, None)
|
||||
|
||||
print( "Total Keypoints without nonmaxSuppression: {}".format(len(kp)) )
|
||||
|
||||
img3 = cv.drawKeypoints(img, kp, None, color=(255,0,0))
|
||||
|
||||
cv.imwrite('fast_false.png', img3)
|
||||
@endcode
|
||||
See the results. First image shows FAST with nonmaxSuppression and second one without
|
||||
nonmaxSuppression:
|
||||
|
||||

|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
-# Edward Rosten and Tom Drummond, “Machine learning for high speed corner detection” in 9th
|
||||
European Conference on Computer Vision, vol. 1, 2006, pp. 430–443.
|
||||
2. Edward Rosten, Reid Porter, and Tom Drummond, "Faster and better: a machine learning approach to
|
||||
corner detection" in IEEE Trans. Pattern Analysis and Machine Intelligence, 2010, vol 32, pp.
|
||||
105-119.
|
||||
|
||||
Exercises
|
||||
---------
|
After Width: | Height: | Size: 31 KiB |
@ -0,0 +1,110 @@
|
||||
Feature Matching + Homography to find Objects {#tutorial_py_feature_homography}
|
||||
=============================================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In this chapter,
|
||||
- We will combine feature matching and findHomography from the calib3d module to find known
|
||||
objects in a complex image.
|
||||
|
||||
Basics
|
||||
------
|
||||
|
||||
So what did we do in the last session? We used a queryImage, found some feature points in it, we took
|
||||
another trainImage, found the features in that image too and we found the best matches among them.
|
||||
In short, we found locations of some parts of an object in another cluttered image. This information
|
||||
is sufficient to find the object exactly on the trainImage.
|
||||
|
||||
For that, we can use a function from the calib3d module, i.e. **cv.findHomography()**. If we pass the set
|
||||
of points from both the images, it will find the perspective transformation of that object. Then we
|
||||
can use **cv.perspectiveTransform()** to find the object. It needs at least four correct points to
|
||||
find the transformation.
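
To see what the perspective transformation does to a point, here is a pure-Python sketch of applying a 3x3 homography to a list of (x, y) coordinates. The matrix below is made up for the illustration (a scale plus a translation), and `apply_homography` is a hypothetical helper, not an OpenCV function. A homography has 8 degrees of freedom and each point correspondence contributes two equations, which is why four correct points are the minimum.

```python
def apply_homography(H, points):
    """Map (x, y) points through a 3x3 homography H given as nested lists."""
    mapped = []
    for x, y in points:
        xh = H[0][0] * x + H[0][1] * y + H[0][2]
        yh = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        # Divide by the homogeneous coordinate to get back to pixel coordinates.
        mapped.append((xh / w, yh / w))
    return mapped

# Hypothetical homography: scale by 2 and translate by (10, 5).
H = [[2.0, 0.0, 10.0],
     [0.0, 2.0, 5.0],
     [0.0, 0.0, 1.0]]
corners = [(0, 0), (0, 99), (99, 99), (99, 0)]
print(apply_homography(H, corners))
# -> [(10.0, 5.0), (10.0, 203.0), (208.0, 203.0), (208.0, 5.0)]
```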
|
||||
|
||||
We have seen that there can be some possible errors while matching which may affect the result. To
|
||||
solve this problem, the algorithm uses RANSAC or LEAST_MEDIAN (which can be decided by the flags). So
|
||||
good matches which provide correct estimation are called inliers and remaining are called outliers.
|
||||
**cv.findHomography()** returns a mask which specifies the inlier and outlier points.
|
||||
|
||||
So let's do it !!!
|
||||
|
||||
Code
|
||||
----
|
||||
|
||||
First, as usual, let's find SIFT features in images and apply the ratio test to find the best
|
||||
matches.
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
from matplotlib import pyplot as plt
|
||||
|
||||
MIN_MATCH_COUNT = 10
|
||||
|
||||
img1 = cv.imread('box.png',0) # queryImage
|
||||
img2 = cv.imread('box_in_scene.png',0) # trainImage
|
||||
|
||||
# Initiate SIFT detector
|
||||
sift = cv.SIFT_create()
|
||||
|
||||
# find the keypoints and descriptors with SIFT
|
||||
kp1, des1 = sift.detectAndCompute(img1,None)
|
||||
kp2, des2 = sift.detectAndCompute(img2,None)
|
||||
|
||||
FLANN_INDEX_KDTREE = 1
|
||||
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
|
||||
search_params = dict(checks = 50)
|
||||
|
||||
flann = cv.FlannBasedMatcher(index_params, search_params)
|
||||
|
||||
matches = flann.knnMatch(des1,des2,k=2)
|
||||
|
||||
# store all the good matches as per Lowe's ratio test.
|
||||
good = []
|
||||
for m,n in matches:
    if m.distance < 0.7*n.distance:
        good.append(m)
|
||||
@endcode
|
||||
Now we set a condition that at least 10 matches (defined by MIN_MATCH_COUNT) are to be there to
|
||||
find the object. Otherwise simply show a message saying not enough matches are present.
|
||||
|
||||
If enough matches are found, we extract the locations of matched keypoints in both the images. They
|
||||
are passed to find the perspective transformation. Once we get this 3x3 transformation matrix, we use
|
||||
it to transform the corners of queryImage to corresponding points in trainImage. Then we draw it.
|
||||
@code{.py}
|
||||
if len(good)>=MIN_MATCH_COUNT:
    src_pts = np.float32([ kp1[m.queryIdx].pt for m in good ]).reshape(-1,1,2)
    dst_pts = np.float32([ kp2[m.trainIdx].pt for m in good ]).reshape(-1,1,2)

    M, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC,5.0)
    matchesMask = mask.ravel().tolist()

    # img1 was read as grayscale, so its shape is (height, width)
    h,w = img1.shape
    pts = np.float32([ [0,0],[0,h-1],[w-1,h-1],[w-1,0] ]).reshape(-1,1,2)
    dst = cv.perspectiveTransform(pts,M)

    img2 = cv.polylines(img2,[np.int32(dst)],True,255,3, cv.LINE_AA)

else:
    print( "Not enough matches are found - {}/{}".format(len(good), MIN_MATCH_COUNT) )
    matchesMask = None
|
||||
@endcode
|
||||
Finally we draw our inliers (if successfully found the object) or matching keypoints (if failed).
|
||||
@code{.py}
|
||||
draw_params = dict(matchColor = (0,255,0), # draw matches in green color
                   singlePointColor = None,
                   matchesMask = matchesMask, # draw only inliers
                   flags = 2)
|
||||
|
||||
img3 = cv.drawMatches(img1,kp1,img2,kp2,good,None,**draw_params)
|
||||
|
||||
plt.imshow(img3, 'gray'),plt.show()
|
||||
@endcode
|
||||
See the result below. Object is marked in white color in cluttered image:
|
||||
|
||||

|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
Exercises
|
||||
---------
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_features_harris/images/harris_region.jpg
vendored
Normal file
After Width: | Height: | Size: 17 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_features_harris/images/harris_result.jpg
vendored
Normal file
After Width: | Height: | Size: 34 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_features_harris/images/subpixel3.png
vendored
Normal file
After Width: | Height: | Size: 16 KiB |
150
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_features_harris/py_features_harris.markdown
vendored
Normal file
@ -0,0 +1,150 @@
|
||||
Harris Corner Detection {#tutorial_py_features_harris}
|
||||
=======================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In this chapter,
|
||||
|
||||
- We will understand the concepts behind Harris Corner Detection.
|
||||
- We will see the following functions: **cv.cornerHarris()**, **cv.cornerSubPix()**
|
||||
|
||||
Theory
|
||||
------
|
||||
|
||||
In the last chapter, we saw that corners are regions in the image with large variation in intensity in
|
||||
all the directions. One early attempt to find these corners was done by **Chris Harris & Mike
|
||||
Stephens** in their paper **A Combined Corner and Edge Detector** in 1988, so now it is called
|
||||
the Harris Corner Detector. They took this simple idea to a mathematical form. It basically finds the
|
||||
difference in intensity for a displacement of \f$(u,v)\f$ in all directions. This is expressed as below:
|
||||
|
||||
\f[E(u,v) = \sum_{x,y} \underbrace{w(x,y)}_\text{window function} \, [\underbrace{I(x+u,y+v)}_\text{shifted intensity}-\underbrace{I(x,y)}_\text{intensity}]^2\f]
|
||||
|
||||
The window function is either a rectangular window or a Gaussian window which gives weights to pixels
|
||||
underneath.
|
||||
|
||||
We have to maximize this function \f$E(u,v)\f$ for corner detection. That means we have to maximize the
|
||||
second term. Applying Taylor Expansion to the above equation and using some mathematical steps (please
|
||||
refer to any standard textbook you like for the full derivation), we get the final equation as:
|
||||
|
||||
\f[E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}\f]
|
||||
|
||||
where
|
||||
|
||||
\f[M = \sum_{x,y} w(x,y) \begin{bmatrix}I_x I_x & I_x I_y \\
|
||||
I_x I_y & I_y I_y \end{bmatrix}\f]
|
||||
|
||||
Here, \f$I_x\f$ and \f$I_y\f$ are image derivatives in x and y directions respectively. (These can be easily found
|
||||
using **cv.Sobel()**).
|
||||
|
||||
Then comes the main part. After this, they created a score, basically an equation, which
|
||||
determines if a window can contain a corner or not.
|
||||
|
||||
\f[R = \det(M) - k(\operatorname{trace}(M))^2\f]
|
||||
|
||||
where
|
||||
- \f$\det(M) = \lambda_1 \lambda_2\f$
|
||||
- \f$\operatorname{trace}(M) = \lambda_1 + \lambda_2\f$
|
||||
- \f$\lambda_1\f$ and \f$\lambda_2\f$ are the eigenvalues of \f$M\f$
|
||||
|
||||
So the magnitudes of these eigenvalues decide whether a region is a corner, an edge, or flat.
|
||||
|
||||
- When \f$|R|\f$ is small, which happens when \f$\lambda_1\f$ and \f$\lambda_2\f$ are small, the region is
|
||||
flat.
|
||||
- When \f$R<0\f$, which happens when \f$\lambda_1 \gg \lambda_2\f$ or vice versa, the region is an edge.
|
||||
- When \f$R\f$ is large, which happens when \f$\lambda_1\f$ and \f$\lambda_2\f$ are large and
|
||||
\f$\lambda_1 \sim \lambda_2\f$, the region is a corner.
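
The three cases can be checked numerically with a small sketch. It computes \f$R\f$ directly from the entries of the symmetric matrix \f$M\f$, so no eigenvalue decomposition is needed. The value k = 0.04 matches the one used in the OpenCV example in this tutorial, while the `threshold` cut-off and both function names are hypothetical.

```python
def harris_response(sxx, sxy, syy, k=0.04):
    """R = det(M) - k * trace(M)^2 for M = [[sxx, sxy], [sxy, syy]]."""
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

def classify(r, threshold):
    """Label a window from its Harris score; `threshold` is a hypothetical cut-off."""
    if r > threshold:
        return "corner"
    if r < -threshold:
        return "edge"
    return "flat"

# Diagonal M, so the eigenvalues are simply sxx and syy.
print(classify(harris_response(100.0, 0.0, 100.0), 10.0))  # -> corner
print(classify(harris_response(100.0, 0.0, 0.5), 10.0))    # -> edge
print(classify(harris_response(0.5, 0.0, 0.5), 10.0))      # -> flat
```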
|
||||
|
||||
It can be represented in a nice picture as follows:
|
||||
|
||||

|
||||
|
||||
So the result of Harris Corner Detection is a grayscale image with these scores. Thresholding for a
|
||||
suitable score gives you the corners in the image. We will do it with a simple image.
|
||||
|
||||
Harris Corner Detector in OpenCV
|
||||
--------------------------------
|
||||
|
||||
OpenCV has the function **cv.cornerHarris()** for this purpose. Its arguments are:
|
||||
|
||||
- **img** - Input image. It should be grayscale and float32 type.
|
||||
- **blockSize** - It is the size of neighbourhood considered for corner detection
|
||||
- **ksize** - Aperture parameter of the Sobel derivative used.
|
||||
- **k** - Harris detector free parameter in the equation.
|
||||
|
||||
See the example below:
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
|
||||
filename = 'chessboard.png'
|
||||
img = cv.imread(filename)
|
||||
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
|
||||
|
||||
gray = np.float32(gray)
|
||||
dst = cv.cornerHarris(gray,2,3,0.04)
|
||||
|
||||
#result is dilated for marking the corners, not important
|
||||
dst = cv.dilate(dst,None)
|
||||
|
||||
# Threshold for an optimal value, it may vary depending on the image.
|
||||
img[dst>0.01*dst.max()]=[0,0,255]
|
||||
|
||||
cv.imshow('dst',img)
|
||||
if cv.waitKey(0) & 0xff == 27:
|
||||
    cv.destroyAllWindows()
|
||||
@endcode
|
||||
Below are the three results:
|
||||
|
||||

|
||||
|
||||
Corner with SubPixel Accuracy
|
||||
-----------------------------
|
||||
|
||||
Sometimes, you may need to find the corners with maximum accuracy. OpenCV comes with a function
|
||||
**cv.cornerSubPix()** which further refines the corners detected with sub-pixel accuracy. Below is
|
||||
an example. As usual, we need to find the Harris corners first. Then we pass the centroids of these
|
||||
corners (There may be a bunch of pixels at a corner, we take their centroid) to refine them. Harris
|
||||
corners are marked in red pixels and refined corners are marked in green pixels. For this function,
|
||||
we have to define the criteria when to stop the iteration. We stop it after a specified number of
|
||||
iterations or a certain accuracy is achieved, whichever occurs first. We also need to define the size
|
||||
of the neighbourhood it searches for corners.
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
|
||||
filename = 'chessboard2.jpg'
|
||||
img = cv.imread(filename)
|
||||
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
|
||||
|
||||
# find Harris corners
|
||||
gray = np.float32(gray)
|
||||
dst = cv.cornerHarris(gray,2,3,0.04)
|
||||
dst = cv.dilate(dst,None)
|
||||
ret, dst = cv.threshold(dst,0.01*dst.max(),255,0)
|
||||
dst = np.uint8(dst)
|
||||
|
||||
# find centroids
|
||||
ret, labels, stats, centroids = cv.connectedComponentsWithStats(dst)
|
||||
|
||||
# define the criteria to stop and refine the corners
|
||||
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 100, 0.001)
|
||||
corners = cv.cornerSubPix(gray,np.float32(centroids),(5,5),(-1,-1),criteria)
|
||||
|
||||
# Now draw them
|
||||
res = np.hstack((centroids,corners))
|
||||
res = np.intp(res)  # np.int0 was an alias of np.intp and was removed in NumPy 2.0
|
||||
img[res[:,1],res[:,0]]=[0,0,255]
|
||||
img[res[:,3],res[:,2]] = [0,255,0]
|
||||
|
||||
cv.imwrite('subpixel5.png',img)
|
||||
@endcode
|
||||
Below is the result, where some important locations are shown in the zoomed window to visualize:
|
||||
|
||||

|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
Exercises
|
||||
---------
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_features_meaning/images/feature_building.jpg
vendored
Normal file
After Width: | Height: | Size: 49 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_features_meaning/images/feature_simple.png
vendored
Normal file
After Width: | Height: | Size: 1.0 KiB |
@ -0,0 +1,89 @@
|
||||
Understanding Features {#tutorial_py_features_meaning}
|
||||
======================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In this chapter, we will try to understand what features are, why they are important, why
corners are important, etc.
|
||||
|
||||
Explanation
|
||||
-----------
|
||||
|
||||
Most of you will have played jigsaw puzzle games. You get a lot of small pieces of an image,
and you need to assemble them correctly to form one big image. **The question is, how do you do
it?** What about transferring the same idea to a computer program so that a computer can play
jigsaw puzzles? If a computer can play jigsaw puzzles, why can't we give it a lot of real-life images
of natural scenery and tell it to stitch them into one big image?
If a computer can stitch several natural images into one, what about giving it a lot of pictures of a
building or any other structure and telling it to create a 3D model out of them?
|
||||
|
||||
Well, the questions keep coming. But they all depend on the most basic question: How do
you play jigsaw puzzles? How do you arrange lots of scrambled image pieces into one big image?
How can you stitch a lot of natural images into a single image?
|
||||
|
||||
The answer is that we look for specific patterns or features that are unique, can
be easily tracked, and can be easily compared. Such a feature is
difficult to define in words, but we know one when we see it. If someone asks you to point
out one good feature that can be compared across several images, you can do so. That is
why even small children can play these games. We search for these features in an image,
find them, look for the same features in other images, and align them. That's it. (In a jigsaw puzzle,
we look more at the continuity between different pieces.) All these abilities are inherent in us.
|
||||
|
||||
So our one basic question expands into several, more specific ones. **What are these
features?** (The answer should also be understandable to a computer.)
|
||||
|
||||
It is difficult to say how humans find these features. This is already programmed in our brain.
But if we look closely at some pictures and search for different patterns, we will find something
interesting. For example, take the image below:
|
||||
|
||||

|
||||
|
||||
The image is very simple. At the top of the image, six small image patches are given. Your task is to
find the exact location of these patches in the original image. How many correct results can you
find?
|
||||
|
||||
A and B are flat surfaces and they are spread over a lot of area. It is difficult to find the exact
|
||||
location of these patches.
|
||||
|
||||
C and D are much simpler. They are edges of the building. You can find an approximate location,
but the exact location is still difficult, because the pattern is the same everywhere along the edge.
Across the edge, however, it is different. An edge is therefore a better feature than a flat area, but
not good enough (it is good in a jigsaw puzzle for comparing the continuity of edges).
|
||||
|
||||
Finally, E and F are corners of the building, and they can be found easily, because at a
corner the patch looks different wherever you move it. So they can be considered good
features. Now we move to a simpler (and widely used) image for better understanding.
|
||||
|
||||

|
||||
|
||||
Just like above, the blue patch is a flat area and is difficult to find and track. Wherever you move the blue
patch, it looks the same. The black patch has an edge. If you move it in the vertical direction (i.e.
along the gradient), it changes. Moved along the edge (parallel to the edge), it looks the same. The
red patch is a corner. Wherever you move the patch, it looks different, which means it is unique. So
basically, corners are considered to be good features in an image. (Not just corners: in some cases
blobs are also considered good features.)
|
||||
|
||||
So now we have answered our question, "What are these features?". But the next question arises: how do we
find them? Or, how do we find the corners? We answered that intuitively: look for
the regions in an image that show maximum variation when moved (by a small amount) in all directions
around them. This will be translated into computer language in the coming chapters. Finding these image
features is called **Feature Detection**.
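The intuition above can be tried on a synthetic image. The following is a minimal sketch in plain Python, not the detector used in later chapters; the helper names (`patch`, `ssd`, `min_change_when_moved`) and the 20x20 test image are assumptions made for illustration. It measures the smallest change a patch undergoes over all eight one-pixel shifts: zero for a flat area and for an edge (which can slide along itself), greater than zero only for a corner.

```python
def patch(img, y, x, half=2):
    """Square patch of side 2*half+1 centred on (y, x)."""
    return [row[x - half:x + half + 1] for row in img[y - half:y + half + 1]]


def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def min_change_when_moved(img, y, x, half=2):
    """Smallest change over the eight one-pixel shifts of the patch."""
    base = patch(img, y, x, half)
    shifts = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    return min(ssd(base, patch(img, y + dy, x + dx, half)) for dy, dx in shifts)


# Synthetic 20x20 image: dark background with a bright lower-right quadrant.
img = [[1 if (r >= 10 and c >= 10) else 0 for c in range(20)] for r in range(20)]

flat_score = min_change_when_moved(img, 4, 4)      # inside the dark area
edge_score = min_change_when_moved(img, 10, 16)    # on the horizontal edge
corner_score = min_change_when_moved(img, 10, 10)  # at the corner of the quadrant
```

Taking the minimum over the shifts is the design choice that separates edges from corners: an edge changes for most shifts but not for the one that slides along it.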
|
||||
|
||||
We have found the features in the images. Once you have found a feature, you should be able to find the same
one in the other images. How is this done? We take a region around the feature and describe it in our own
words, like "the upper part is blue sky, the lower part is a region of a building, on that building there is
glass, etc.", and then search for the same area in the other images. Basically, you are describing the
feature. Similarly, a computer should also describe the region around the feature so that it can
find it in other images. This description is called **Feature Description**. Once you have the
features and their descriptions, you can find the same features in all images and align them, stitch them together,
or do whatever you want.
|
||||
|
||||
So in this module, we look at different algorithms in OpenCV to find features, describe them,
match them, etc.
|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
Exercises
|
||||
---------
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_matcher/images/matcher_flann.jpg
vendored
Normal file
After Width: | Height: | Size: 34 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_matcher/images/matcher_result1.jpg
vendored
Normal file
After Width: | Height: | Size: 31 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_matcher/images/matcher_result2.jpg
vendored
Normal file
After Width: | Height: | Size: 22 KiB |
217
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_matcher/py_matcher.markdown
vendored
Normal file
@ -0,0 +1,217 @@
|
||||
Feature Matching {#tutorial_py_matcher}
|
||||
================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In this chapter
|
||||
- We will see how to match features in one image with others.
|
||||
- We will use the Brute-Force matcher and FLANN Matcher in OpenCV
|
||||
|
||||
Basics of Brute-Force Matcher
|
||||
-----------------------------
|
||||
|
||||
The Brute-Force matcher is simple. It takes the descriptor of one feature in the first set and matches it
against all features in the second set using some distance calculation. The closest one is
returned.
|
||||
|
||||
For the BF matcher, we first have to create the BFMatcher object using **cv.BFMatcher()**. It takes two
optional params. The first one is normType, which specifies the distance measurement to be used. By
default, it is cv.NORM_L2, which is good for SIFT, SURF, etc. (cv.NORM_L1 is also available). For binary
string-based descriptors like ORB, BRIEF, BRISK, etc., cv.NORM_HAMMING should be used, which uses the
Hamming distance as its measurement. If ORB is using WTA_K == 3 or 4, cv.NORM_HAMMING2 should be
used.
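A brute-force matcher for binary descriptors can be sketched in a few lines of plain Python. This is an illustration of the idea only, not how cv.BFMatcher is implemented; the one-byte descriptors and the helper names `hamming` and `brute_force_match` are made up for the example (real ORB descriptors are 32 bytes long).

```python
def hamming(a, b):
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def brute_force_match(query, train):
    """For each query descriptor, return (query_idx, train_idx, distance) of its closest train descriptor."""
    matches = []
    for qi, q in enumerate(query):
        distances = [hamming(q, t) for t in train]          # compare against every train descriptor
        ti = min(range(len(train)), key=distances.__getitem__)
        matches.append((qi, ti, distances[ti]))
    return matches


# Hypothetical one-byte binary descriptors.
query = [bytes([0b11110000]), bytes([0b00001111])]
train = [bytes([0b00001110]), bytes([0b11110001]), bytes([0b10101010])]
matches = brute_force_match(query, train)
```

Each query descriptor is compared with every train descriptor, which is why the cost grows with the product of the two set sizes.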
|
||||
|
||||
The second param is a boolean variable, crossCheck, which is false by default. If it is true, the matcher
returns only those matches (i,j) such that the i-th descriptor in set A has the j-th descriptor
in set B as its best match and vice versa. That is, the two features in both sets should match each
other. It provides consistent results and is a good alternative to the ratio test proposed by D. Lowe in the
SIFT paper.
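The crossCheck condition can be sketched on a small distance matrix. The code below is plain Python with hypothetical names (`best_match`, `cross_check`) and made-up distances; it only illustrates the mutual-best-match rule and is not OpenCV's implementation.

```python
def best_match(distances):
    """Index of the smallest distance in a list."""
    return min(range(len(distances)), key=distances.__getitem__)


def cross_check(dist):
    """Keep (i, j) only if j is the best match for i and i is the best match for j."""
    columns = [list(col) for col in zip(*dist)]
    kept = []
    for i, row in enumerate(dist):
        j = best_match(row)
        if best_match(columns[j]) == i:
            kept.append((i, j))
    return kept


# dist[i][j]: distance between descriptor i of set A and descriptor j of set B.
dist = [
    [1.0, 5.0, 9.0],
    [2.0, 6.0, 3.0],
    [8.0, 4.0, 7.0],
]
mutual = cross_check(dist)
```

Descriptor 1 of set A prefers descriptor 0 of set B, but that one is closer to descriptor 0 of set A, so the pair is dropped.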
|
||||
|
||||
Once it is created, two important methods are *BFMatcher.match()* and *BFMatcher.knnMatch()*. The first
one returns the best match. The second returns the k best matches, where k is specified by the user.
It may be useful when we need to do additional work on the matches.
|
||||
|
||||
Just as we used cv.drawKeypoints() to draw keypoints, **cv.drawMatches()** helps us draw the
matches. It stacks two images horizontally and draws lines from the first image to the second showing the
best matches. There is also **cv.drawMatchesKnn**, which draws all the k best matches. If k=2, it
will draw two match lines for each keypoint, so we have to pass a mask if we want to draw
them selectively.
|
||||
|
||||
Let's see one example each for ORB and SIFT (they use different distance measurements).
|
||||
|
||||
### Brute-Force Matching with ORB Descriptors
|
||||
|
||||
Here, we will see a simple example on how to match features between two images. In this case, I have
|
||||
a queryImage and a trainImage. We will try to find the queryImage in trainImage using feature
|
||||
matching. (The images are /samples/data/box.png and /samples/data/box_in_scene.png.)
|
||||
|
||||
We are using ORB descriptors to match features. So let's start with loading images, finding
|
||||
descriptors etc.
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
img1 = cv.imread('box.png',cv.IMREAD_GRAYSCALE) # queryImage
|
||||
img2 = cv.imread('box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage
|
||||
|
||||
# Initiate ORB detector
|
||||
orb = cv.ORB_create()
|
||||
|
||||
# find the keypoints and descriptors with ORB
|
||||
kp1, des1 = orb.detectAndCompute(img1,None)
|
||||
kp2, des2 = orb.detectAndCompute(img2,None)
|
||||
@endcode
|
||||
Next we create a BFMatcher object with the distance measurement cv.NORM_HAMMING (since we are using
ORB) and switch on crossCheck for better results. Then we use the Matcher.match() method to get the
best matches between the two images. We sort them in ascending order of distance so that the best matches
(with low distance) come first. Then we draw only the first 10 matches (just for the sake of visibility;
you can increase the number as you like).
|
||||
@code{.py}
|
||||
# create BFMatcher object
|
||||
bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)
|
||||
|
||||
# Match descriptors.
|
||||
matches = bf.match(des1,des2)
|
||||
|
||||
# Sort them in the order of their distance.
|
||||
matches = sorted(matches, key = lambda x:x.distance)
|
||||
|
||||
# Draw first 10 matches.
|
||||
img3 = cv.drawMatches(img1,kp1,img2,kp2,matches[:10],None,flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
|
||||
|
||||
plt.imshow(img3),plt.show()
|
||||
@endcode
|
||||
Below is the result I got:
|
||||
|
||||

|
||||
|
||||
### What is this Matcher Object?
|
||||
|
||||
The result of the line matches = bf.match(des1,des2) is a list of DMatch objects. A DMatch object has
the following attributes:
|
||||
|
||||
- DMatch.distance - Distance between descriptors. The lower, the better it is.
|
||||
- DMatch.trainIdx - Index of the descriptor in train descriptors
|
||||
- DMatch.queryIdx - Index of the descriptor in query descriptors
|
||||
- DMatch.imgIdx - Index of the train image.
|
||||
|
||||
### Brute-Force Matching with SIFT Descriptors and Ratio Test
|
||||
|
||||
This time, we will use BFMatcher.knnMatch() to get k best matches. In this example, we will take k=2
|
||||
so that we can apply the ratio test explained by D. Lowe in his paper.
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
img1 = cv.imread('box.png',cv.IMREAD_GRAYSCALE) # queryImage
|
||||
img2 = cv.imread('box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage
|
||||
|
||||
# Initiate SIFT detector
|
||||
sift = cv.SIFT_create()
|
||||
|
||||
# find the keypoints and descriptors with SIFT
|
||||
kp1, des1 = sift.detectAndCompute(img1,None)
|
||||
kp2, des2 = sift.detectAndCompute(img2,None)
|
||||
|
||||
# BFMatcher with default params
|
||||
bf = cv.BFMatcher()
|
||||
matches = bf.knnMatch(des1,des2,k=2)
|
||||
|
||||
# Apply ratio test
|
||||
good = []
|
||||
for m,n in matches:
    if m.distance < 0.75*n.distance:
        good.append([m])
|
||||
|
||||
# cv.drawMatchesKnn expects list of lists as matches.
|
||||
img3 = cv.drawMatchesKnn(img1,kp1,img2,kp2,good,None,flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
|
||||
|
||||
plt.imshow(img3),plt.show()
|
||||
@endcode
|
||||
See the result below:
|
||||
|
||||

|
||||
|
||||
FLANN based Matcher
|
||||
-------------------
|
||||
|
||||
FLANN stands for Fast Library for Approximate Nearest Neighbors. It contains a collection of
|
||||
algorithms optimized for fast nearest neighbor search in large datasets and for high dimensional
|
||||
features. It works faster than BFMatcher for large datasets. We will repeat the second example
using the FLANN-based matcher.
|
||||
|
||||
For the FLANN-based matcher, we need to pass two dictionaries which specify the algorithm to be used,
its related parameters, etc. The first one is IndexParams. For various algorithms, the information to be
passed is explained in the FLANN docs. As a summary, for algorithms like SIFT, SURF, etc. you can pass
the following:
|
||||
@code{.py}
|
||||
FLANN_INDEX_KDTREE = 1
|
||||
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
|
||||
@endcode
|
||||
While using ORB, you can pass the following. The commented values are recommended as per the docs,
but they didn't provide the required results in some cases. Other values worked fine:
|
||||
@code{.py}
|
||||
FLANN_INDEX_LSH = 6
|
||||
index_params= dict(algorithm = FLANN_INDEX_LSH,
                   table_number = 6, # 12
                   key_size = 12,     # 20
                   multi_probe_level = 1) #2
|
||||
@endcode
|
||||
The second dictionary is SearchParams. It specifies the number of times the trees in the index
should be recursively traversed. Higher values give better precision, but also take more time. If
you want to change the value, pass search_params = dict(checks=100).
|
||||
|
||||
With this information, we are good to go.
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
img1 = cv.imread('box.png',cv.IMREAD_GRAYSCALE) # queryImage
|
||||
img2 = cv.imread('box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage
|
||||
|
||||
# Initiate SIFT detector
|
||||
sift = cv.SIFT_create()
|
||||
|
||||
# find the keypoints and descriptors with SIFT
|
||||
kp1, des1 = sift.detectAndCompute(img1,None)
|
||||
kp2, des2 = sift.detectAndCompute(img2,None)
|
||||
|
||||
# FLANN parameters
|
||||
FLANN_INDEX_KDTREE = 1
|
||||
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
|
||||
search_params = dict(checks=50) # or pass empty dictionary
|
||||
|
||||
flann = cv.FlannBasedMatcher(index_params,search_params)
|
||||
|
||||
matches = flann.knnMatch(des1,des2,k=2)
|
||||
|
||||
# Need to draw only good matches, so create a mask
|
||||
matchesMask = [[0,0] for i in range(len(matches))]
|
||||
|
||||
# ratio test as per Lowe's paper
|
||||
for i,(m,n) in enumerate(matches):
    if m.distance < 0.7*n.distance:
        matchesMask[i]=[1,0]
|
||||
|
||||
draw_params = dict(matchColor = (0,255,0),
                   singlePointColor = (255,0,0),
                   matchesMask = matchesMask,
                   flags = cv.DrawMatchesFlags_DEFAULT)
|
||||
|
||||
img3 = cv.drawMatchesKnn(img1,kp1,img2,kp2,matches,None,**draw_params)
|
||||
|
||||
plt.imshow(img3),plt.show()
|
||||
@endcode
|
||||
See the result below:
|
||||
|
||||

|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
Exercises
|
||||
---------
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_orb/images/orb_kp.jpg
vendored
Normal file
After Width: | Height: | Size: 23 KiB |
98
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_orb/py_orb.markdown
vendored
Normal file
@ -0,0 +1,98 @@
|
||||
ORB (Oriented FAST and Rotated BRIEF) {#tutorial_py_orb}
|
||||
=====================================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In this chapter,
|
||||
- We will see the basics of ORB
|
||||
|
||||
Theory
|
||||
------
|
||||
|
||||
For an OpenCV enthusiast, the most important thing about ORB is that it came from "OpenCV Labs".
This algorithm was introduced by Ethan Rublee, Vincent Rabaud, Kurt Konolige and Gary R. Bradski in
their paper **ORB: An efficient alternative to SIFT or SURF** in 2011. As the title says, it is a
good alternative to SIFT and SURF in computation cost, matching performance and, mainly, patents.
SIFT and SURF were patented when ORB appeared, and you were supposed to pay for their use. ORB is not patented.
|
||||
|
||||
ORB is basically a fusion of the FAST keypoint detector and the BRIEF descriptor with many modifications to
enhance the performance. First it uses FAST to find keypoints, then applies the Harris corner measure to
find the top N points among them. It also uses a pyramid to produce multiscale features. But one problem is
that FAST doesn't compute the orientation. So what about rotation invariance? The authors came up with the
following modification.
|
||||
|
||||
It computes the intensity-weighted centroid of the patch with the located corner at its center. The
direction of the vector from this corner point to the centroid gives the orientation. To improve the
rotation invariance, moments are computed with x and y which should be in a circular region of
radius \f$r\f$, where \f$r\f$ is the size of the patch.
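A minimal sketch of the intensity-centroid orientation, in plain Python. The function name and the 5x5 patches are hypothetical, and the radius of the circular region is assumed here to be half the patch width; OpenCV's actual implementation may differ in these details. The orientation is the angle of the vector of first-order moments \f$(m_{10}, m_{01})\f$ measured from the patch centre.

```python
import math


def intensity_centroid_angle(patch):
    """Orientation (radians) of the vector from the patch centre to its intensity centroid."""
    size = len(patch)
    centre = (size - 1) / 2.0
    radius = size / 2.0                      # assumption: circular region inscribed in the patch
    m10 = m01 = 0.0
    for row_idx, row in enumerate(patch):
        for col_idx, value in enumerate(row):
            x = col_idx - centre
            y = row_idx - centre
            if x * x + y * y <= radius * radius:   # only use pixels inside the circle
                m10 += x * value
                m01 += y * value
    return math.atan2(m01, m10)


# Brighter on the right-hand side: the centroid lies to the right of the centre.
right_bright = [[0, 0, 0, 1, 1] for _ in range(5)]
# Brighter at the bottom: the centroid lies below the centre (image y axis points down).
bottom_bright = [[0] * 5, [0] * 5, [0] * 5, [1] * 5, [1] * 5]

angle_right = intensity_centroid_angle(right_bright)
angle_bottom = intensity_centroid_angle(bottom_bright)
```

Rotating the patch by a quarter turn rotates the computed angle by the same amount, which is the property ORB relies on.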
|
||||
|
||||
Now for descriptors, ORB uses BRIEF descriptors. But we have already seen that BRIEF performs poorly
under rotation. So what ORB does is "steer" BRIEF according to the orientation of the keypoints. For
any feature set of \f$n\f$ binary tests at location \f$(x_i, y_i)\f$, define a \f$2 \times n\f$ matrix, \f$S\f$,
which contains the coordinates of these pixels. Then, using the orientation of the patch, \f$\theta\f$, its
rotation matrix is found and used to rotate \f$S\f$ to get the steered (rotated) version \f$S_\theta\f$.
|
||||
|
||||
ORB discretizes the angle in increments of \f$2 \pi /30\f$ (12 degrees) and constructs a lookup table of
precomputed BRIEF patterns. As long as the keypoint orientation \f$\theta\f$ is consistent across views,
the correct set of points \f$S_\theta\f$ will be used to compute its descriptor.
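The steering step can be sketched as follows. This is plain Python under stated assumptions, not ORB's actual code: `steer` is a hypothetical helper, `S` holds three made-up test locations, and the angle is snapped to the nearest multiple of 12 degrees instead of indexing a precomputed lookup table.

```python
import math


def steer(points, theta, step_deg=12):
    """Rotate BRIEF test locations by theta (radians), snapped to multiples of step_deg degrees."""
    snapped = math.radians(round(math.degrees(theta) / step_deg) * step_deg)
    c, s = math.cos(snapped), math.sin(snapped)
    # Apply the 2x2 rotation matrix to every column (x_i, y_i) of S.
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Columns of the 2 x n matrix S, written as (x_i, y_i) pairs (hypothetical values).
S = [(1.0, 0.0), (0.0, 2.0), (-3.0, 1.0)]

# 93 degrees is snapped to 96 degrees, the nearest multiple of 12.
S_theta = steer(S, math.radians(93))
```

Because only 30 distinct angles can occur, the rotated patterns can be computed once and reused, which is the purpose of the lookup table mentioned above.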
|
||||
|
||||
BRIEF has an important property that each bit feature has a large variance and a mean near 0.5. But
|
||||
once it is oriented along the keypoint direction, it loses this property and becomes more distributed.
|
||||
High variance makes a feature more discriminative, since it responds differentially to inputs.
|
||||
Another desirable property is to have the tests uncorrelated, since then each test will contribute
|
||||
to the result. To resolve all these, ORB runs a greedy search among all possible binary tests to
|
||||
find the ones that have both high variance and means close to 0.5, as well as being uncorrelated.
|
||||
The result is called **rBRIEF**.
|
||||
|
||||
For descriptor matching, multi-probe LSH, which improves on traditional LSH, is used. The paper
says ORB is much faster than SURF and SIFT and that the ORB descriptor works better than SURF. ORB is a good
choice on low-power devices for panorama stitching, etc.
|
||||
|
||||
ORB in OpenCV
|
||||
-------------
|
||||
|
||||
As usual, we have to create an ORB object with the function **cv.ORB_create()** or using the feature2d common
interface. It has a number of optional parameters. The most useful ones are nfeatures, which denotes the
maximum number of features to be retained (by default 500), and scoreType, which denotes whether the Harris
score or the FAST score is used to rank the features (by default, the Harris score). Another parameter, WTA_K,
decides the number of points that produce each element of the oriented BRIEF descriptor. By default it
is two, i.e. it selects two points at a time. In that case, NORM_HAMMING distance is used for matching.
If WTA_K is 3 or 4, which takes 3 or 4 points to produce each BRIEF descriptor element, then the matching distance
is defined by NORM_HAMMING2.
|
||||
|
||||
Below is a simple code which shows the use of ORB.
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
from matplotlib import pyplot as plt
|
||||
|
||||
img = cv.imread('simple.jpg',cv.IMREAD_GRAYSCALE)
|
||||
|
||||
# Initiate ORB detector
|
||||
orb = cv.ORB_create()
|
||||
|
||||
# find the keypoints with ORB
|
||||
kp = orb.detect(img,None)
|
||||
|
||||
# compute the descriptors with ORB
|
||||
kp, des = orb.compute(img, kp)
|
||||
|
||||
# draw only keypoints location,not size and orientation
|
||||
img2 = cv.drawKeypoints(img, kp, None, color=(0,255,0), flags=0)
|
||||
plt.imshow(img2), plt.show()
|
||||
@endcode
|
||||
See the result below:
|
||||
|
||||

|
||||
|
||||
We will do ORB feature matching in another chapter.
|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
-# Ethan Rublee, Vincent Rabaud, Kurt Konolige, Gary R. Bradski: ORB: An efficient alternative to
|
||||
SIFT or SURF. ICCV 2011: 2564-2571.
|
||||
|
||||
Exercises
|
||||
---------
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_shi_tomasi/images/shitomasi_block1.jpg
vendored
Normal file
After Width: | Height: | Size: 14 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_shi_tomasi/images/shitomasi_space.png
vendored
Normal file
After Width: | Height: | Size: 4.5 KiB |
75
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_shi_tomasi/py_shi_tomasi.markdown
vendored
Normal file
@ -0,0 +1,75 @@
|
||||
Shi-Tomasi Corner Detector & Good Features to Track {#tutorial_py_shi_tomasi}
|
||||
===================================================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In this chapter,
|
||||
|
||||
- We will learn about another corner detector: the Shi-Tomasi Corner Detector
|
||||
- We will see the function: **cv.goodFeaturesToTrack()**
|
||||
|
||||
Theory
|
||||
------
|
||||
|
||||
In the last chapter, we saw the Harris Corner Detector. Later, in 1994, J. Shi and C. Tomasi made a small
modification to it in their paper **Good Features to Track**, which shows better results compared to the
Harris Corner Detector. The scoring function in the Harris Corner Detector was given by:
|
||||
|
||||
\f[R = \lambda_1 \lambda_2 - k(\lambda_1+\lambda_2)^2\f]
|
||||
|
||||
Instead of this, Shi-Tomasi proposed:
|
||||
|
||||
\f[R = \min(\lambda_1, \lambda_2)\f]
|
||||
|
||||
If it is greater than a threshold value, it is considered a corner. If we plot it in
\f$\lambda_1 - \lambda_2\f$ space as we did for the Harris Corner Detector, we get an image as below:
|
||||
|
||||

|
||||
|
||||
From the figure, you can see that only when \f$\lambda_1\f$ and \f$\lambda_2\f$ are both above a minimum value,
\f$\lambda_{\min}\f$, is it considered a corner (green region).
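To compare the two scoring functions on concrete numbers, here is a small plain-Python sketch. The structure-tensor entries, the threshold of 10 and the helper names are assumptions made for illustration; `k = 0.04` is the value passed to cv.cornerHarris() in the Harris example.

```python
import math


def eigenvalues_2x2(a, b, c):
    """Eigenvalues (largest first) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    spread = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean + spread, mean - spread


def harris_score(l1, l2, k=0.04):
    """R = l1*l2 - k*(l1 + l2)^2"""
    return l1 * l2 - k * (l1 + l2) ** 2


def shi_tomasi_score(l1, l2):
    """R = min(l1, l2)"""
    return min(l1, l2)


# Hypothetical structure tensors [[a, b], [b, c]] for three kinds of patch.
patches = {"flat": (0.1, 0.0, 0.1), "edge": (50.0, 0.0, 0.1), "corner": (40.0, 5.0, 30.0)}
threshold = 10.0  # assumed value of lambda_min

is_corner = {}
for name, (a, b, c) in patches.items():
    l1, l2 = eigenvalues_2x2(a, b, c)
    is_corner[name] = shi_tomasi_score(l1, l2) > threshold

edge_harris = harris_score(*eigenvalues_2x2(50.0, 0.0, 0.1))
```

Only the patch with two large eigenvalues passes the Shi-Tomasi test; for the edge-like patch the Harris score is negative, which matches the behaviour described in the Harris chapter.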
|
||||
|
||||
Code
|
||||
----
|
||||
|
||||
OpenCV has a function, **cv.goodFeaturesToTrack()**. It finds the N strongest corners in the image by the
Shi-Tomasi method (or Harris Corner Detection, if you specify it). As usual, the image should be a
grayscale image. Then you specify the number of corners you want to find. Then you specify the quality
level, a value between 0 and 1 that denotes the minimum quality below which
every corner is rejected. Then we provide the minimum Euclidean distance between detected corners.
|
||||
|
||||
With all this information, the function finds corners in the image. All corners below the quality
level are rejected. Then it sorts the remaining corners by quality in descending order.
Then the function takes the strongest corner, throws away all nearby corners within the
minimum distance, repeats this for the next remaining corner, and returns the N strongest corners.
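The selection procedure described above can be sketched in plain Python. This is an illustration under assumptions, not the OpenCV implementation: `select_corners` and the candidate list are hypothetical, and the quality level is applied relative to the best score, which is how the OpenCV documentation describes the parameter.

```python
def select_corners(candidates, max_corners, quality_level, min_distance):
    """candidates: list of (x, y, score). Returns up to max_corners (x, y) points."""
    best = max(score for _, _, score in candidates)
    # 1. Reject everything below quality_level times the best score.
    strong = [c for c in candidates if c[2] >= quality_level * best]
    # 2. Sort the survivors by score, strongest first.
    strong.sort(key=lambda c: c[2], reverse=True)
    # 3. Greedily keep corners that are far enough from all corners kept so far.
    kept = []
    for x, y, _ in strong:
        if all((x - kx) ** 2 + (y - ky) ** 2 >= min_distance ** 2 for kx, ky in kept):
            kept.append((x, y))
        if len(kept) == max_corners:
            break
    return kept


# Hypothetical corner candidates: (x, y, score).
candidates = [(10, 10, 0.90), (12, 11, 0.80), (40, 40, 0.50), (70, 15, 0.30), (5, 60, 0.005)]
corners = select_corners(candidates, max_corners=3, quality_level=0.01, min_distance=10)
```

The second candidate is strong but lies too close to the first, so it is discarded, and the last one falls below the quality level.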
|
||||
|
||||
In the example below, we will try to find the 25 best corners:
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
from matplotlib import pyplot as plt
|
||||
|
||||
img = cv.imread('blox.jpg')
|
||||
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
|
||||
|
||||
corners = cv.goodFeaturesToTrack(gray,25,0.01,10)
|
||||
corners = np.intp(corners)  # np.int0 was an alias of np.intp and was removed in NumPy 2.0
|
||||
|
||||
for i in corners:
    x,y = i.ravel()
    cv.circle(img,(x,y),3,255,-1)
|
||||
|
||||
plt.imshow(img),plt.show()
|
||||
@endcode
|
||||
See the result below:
|
||||
|
||||

|
||||
|
||||
This function is more appropriate for tracking. We will see that when the time comes.
|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
Exercises
|
||||
---------
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_sift_intro/images/sift_dog.jpg
vendored
Normal file
After Width: | Height: | Size: 30 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_sift_intro/images/sift_keypoints.jpg
vendored
Normal file
After Width: | Height: | Size: 33 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_sift_intro/images/sift_local_extrema.jpg
vendored
Normal file
After Width: | Height: | Size: 15 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_sift_intro/images/sift_scale_invariant.jpg
vendored
Normal file
After Width: | Height: | Size: 3.3 KiB |
168
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.markdown
vendored
Normal file
@ -0,0 +1,168 @@
|
||||
Introduction to SIFT (Scale-Invariant Feature Transform) {#tutorial_py_sift_intro}
|
||||
========================================================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In this chapter,
|
||||
- We will learn about the concepts of SIFT algorithm
|
||||
- We will learn to find SIFT Keypoints and Descriptors.
|
||||
|
||||
Theory
|
||||
------
|
||||
|
||||
In the last couple of chapters, we saw some corner detectors like Harris. They are
rotation-invariant, which means that even if the image is rotated, we can find the same corners. This is
obvious because corners remain corners in the rotated image too. But what about scaling? A corner may
not be a corner if the image is scaled. For example, check the simple image below. A corner in a small
image within a small window appears flat when it is zoomed in within the same window. So the Harris corner detector is not
scale-invariant.
|
||||
|
||||

|
||||
|
||||
In 2004, **D. Lowe** of the University of British Columbia came up with a new algorithm, Scale
Invariant Feature Transform (SIFT), in his paper **Distinctive Image Features from Scale-Invariant
Keypoints**, which extracts keypoints and computes their descriptors. *(This paper is easy to understand
and is considered to be the best material available on SIFT. This explanation is just a short summary of
this paper.)*
|
||||
|
||||
The SIFT algorithm mainly involves the following steps. We will see them one by one.
|
||||
|
||||
### 1. Scale-space Extrema Detection
|
||||
|
||||
From the image above, it is obvious that we can't use the same window to detect keypoints at
different scales. It is OK for a small corner, but to detect larger corners we need larger windows.
For this, scale-space filtering is used. In it, the Laplacian of Gaussian (LoG) is found for the image with
various \f$\sigma\f$ values. LoG acts as a blob detector which detects blobs of various sizes due to the
change in \f$\sigma\f$. In short, \f$\sigma\f$ acts as a scaling parameter. For example, in the above image,
a Gaussian kernel with low \f$\sigma\f$ gives a high value for the small corner, while a Gaussian kernel with high
\f$\sigma\f$ fits well for the larger corner. So we can find the local maxima across scale and space,
which gives us a list of \f$(x,y,\sigma)\f$ values, meaning there is a potential keypoint at (x,y) at
scale \f$\sigma\f$.
|
||||
|
||||
But this LoG is a little costly, so the SIFT algorithm uses the Difference of Gaussians (DoG), which is an
approximation of LoG. The Difference of Gaussians is obtained as the difference of Gaussian blurrings of
an image with two different \f$\sigma\f$ values, say \f$\sigma\f$ and \f$k\sigma\f$. This process is done for
different octaves of the image in the Gaussian pyramid. It is represented in the image below:
|
||||
|
||||

|
||||
|
||||
Once the DoG images are found, they are searched for local extrema over scale and space. For example, one
pixel in an image is compared with its 8 neighbours as well as 9 pixels in the next scale and 9 pixels
in the previous scale. If it is a local extremum, it is a potential keypoint. It basically means that the
keypoint is best represented at that scale. It is shown in the image below:
|
||||
|
||||

|
||||
|
||||
Regarding the different parameters, the paper gives some empirical data, which can be summarized as
number of octaves = 4, number of scale levels = 5, initial \f$\sigma=1.6\f$, \f$k=\sqrt{2}\f$, etc. as optimal
values.
|
||||
|
||||
### 2. Keypoint Localization
|
||||
|
||||
Once potential keypoint locations are found, they have to be refined to get more accurate results.
The paper uses a Taylor series expansion of scale space to get a more accurate location of each extremum, and if
the intensity at this extremum is less than a threshold value (0.03 as per the paper), it is
rejected. This threshold is called **contrastThreshold** in OpenCV.
|
||||
|
||||
DoG has a higher response for edges, so edges also need to be removed. For this, a concept similar to the
Harris corner detector is used. The paper uses a 2x2 Hessian matrix (H) to compute the principal
curvatures. We know from the Harris corner detector that for edges, one eigenvalue is much larger than the
other. So here a simple function of the ratio between the two eigenvalues is used.

If this ratio is greater than a threshold, called **edgeThreshold** in OpenCV, that keypoint is
discarded. It is given as 10 in the paper.
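In Lowe's paper the function is \f$\mathrm{Tr}(H)^2 / \mathrm{Det}(H)\f$, which is compared against \f$(r+1)^2 / r\f$, where \f$r\f$ is the threshold on the ratio of the two eigenvalues. A minimal plain-Python sketch of that check follows; `passes_edge_test` and the sample second derivatives are hypothetical and do not reproduce OpenCV's code.

```python
def passes_edge_test(dxx, dyy, dxy, edge_threshold=10.0):
    """Keep a keypoint only if its principal-curvature ratio is below edge_threshold."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:                      # curvatures of opposite sign: reject
        return False
    r = edge_threshold
    # Tr(H)^2 / Det(H) grows with the eigenvalue ratio, so no eigenvalues are needed.
    return trace * trace / det < (r + 1) ** 2 / r


blob_like = passes_edge_test(dxx=2.0, dyy=1.5, dxy=0.1)    # similar curvatures
edge_like = passes_edge_test(dxx=20.0, dyy=0.5, dxy=0.0)   # one curvature dominates
```

Working with the trace and determinant avoids computing the eigenvalues explicitly, since only their ratio matters.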
|
||||
|
||||
This eliminates low-contrast keypoints and edge keypoints, and what remains are strong interest
points.
|
||||
|
||||
### 3. Orientation Assignment
|
||||
|
||||
Now an orientation is assigned to each keypoint to achieve invariance to image rotation. A
neighbourhood is taken around the keypoint location depending on the scale, and the gradient
magnitude and direction are calculated in that region. An orientation histogram with 36 bins covering
360 degrees is created (it is weighted by gradient magnitude and by a Gaussian-weighted circular window
with \f$\sigma\f$ equal to 1.5 times the scale of the keypoint). The highest peak in the histogram is taken,
and any peak above 80% of it is also considered when calculating the orientation. This creates keypoints
with the same location and scale, but different directions. It contributes to the stability of matching.
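A simplified sketch of the orientation histogram in plain Python. The function and the sample gradients are hypothetical; the Gaussian weighting and the peak interpolation of the real algorithm are left out, and every bin reaching 80% of the maximum is accepted without checking that it is a local peak.

```python
def dominant_orientations(gradients, bins=36, peak_ratio=0.8):
    """gradients: list of (magnitude, angle_in_degrees). Returns bin-centre angles of accepted bins."""
    width = 360.0 / bins
    hist = [0.0] * bins
    for magnitude, angle in gradients:
        # Each sample votes for its direction bin with a weight equal to its magnitude.
        hist[int((angle % 360.0) // width) % bins] += magnitude
    top = max(hist)
    return [(i + 0.5) * width for i, h in enumerate(hist) if h >= peak_ratio * top]


# Hypothetical samples: strong gradients near 45 and 200 degrees, a weak one near 300.
gradients = [(1.0, 44.0), (1.2, 47.0), (2.0, 203.0), (0.3, 301.0)]
orientations = dominant_orientations(gradients)
```

Two directions pass the 80% rule here, so this location would yield two keypoints that differ only in orientation.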
|
||||
|
||||
### 4. Keypoint Descriptor
|
||||
|
||||
Now the keypoint descriptor is created. A 16x16 neighbourhood around the keypoint is taken. It is
divided into 16 sub-blocks of 4x4 size. For each sub-block, an 8-bin orientation histogram is created,
so a total of 128 bin values are available. They are represented as a vector to form the keypoint
descriptor. In addition to this, several measures are taken to achieve robustness against
illumination changes, rotation, etc.
|
||||
|
||||
### 5. Keypoint Matching
|
||||
|
||||
Keypoints between two images are matched by identifying their nearest neighbours. But in some cases,
the second-closest match may be very near to the first. This may happen due to noise or other
reasons. In that case, the ratio of the closest distance to the second-closest distance is taken. If it is
greater than 0.8, the match is rejected. This eliminates around 90% of false matches while discarding only
5% of correct matches, as per the paper.
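
The ratio test is easy to try out without any library. The sketch below uses tiny 2-D "descriptors" and brute-force Euclidean distances purely for illustration; real SIFT descriptors have 128 values, and the function name `ratio_test_matches` is hypothetical.

```python
import math


def ratio_test_matches(query, train, ratio=0.8):
    """Keep a match only if the best distance is clearly below the second best."""
    matches = []
    for qi, q in enumerate(query):
        # Distances from this query descriptor to every train descriptor.
        dists = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        (best, best_idx), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((qi, best_idx))
    return matches


query = [(0.0, 0.0), (5.0, 5.0)]
train = [(0.1, 0.0), (4.0, 5.0), (6.0, 5.0)]
print(ratio_test_matches(query, train))   # [(0, 0)]
```

The second query point is equally close to two train points, so its match is ambiguous and gets rejected.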
|
||||
|
||||
This is a summary of the SIFT algorithm. For more details and understanding, reading the original
|
||||
paper is highly recommended.
|
||||
|
||||
SIFT in OpenCV
|
||||
--------------
|
||||
|
||||
Now let's see the SIFT functionalities available in OpenCV. Note that these were previously only
available in [the opencv contrib repo](https://github.com/opencv/opencv_contrib), but the patent
expired in 2020, so they are now included in the main repo. Let's start with detecting keypoints
and drawing them. First we have to construct a SIFT object. We can pass different optional
parameters to it; they are well explained in the docs.
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
|
||||
img = cv.imread('home.jpg')
|
||||
gray= cv.cvtColor(img,cv.COLOR_BGR2GRAY)
|
||||
|
||||
sift = cv.SIFT_create()
|
||||
kp = sift.detect(gray,None)
|
||||
|
||||
img=cv.drawKeypoints(gray,kp,img)
|
||||
|
||||
cv.imwrite('sift_keypoints.jpg',img)
|
||||
@endcode
|
||||
The **sift.detect()** function finds the keypoints in the image. You can pass a mask if you want to
search only a part of the image. Each keypoint is a special structure which has many attributes like its
(x,y) coordinates, size of the meaningful neighbourhood, angle which specifies its orientation,
response that specifies strength of keypoints etc.
|
||||
|
||||
OpenCV also provides the **cv.drawKeypoints()** function which draws small circles on the locations
of keypoints. If you pass the flag **cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS** to it, it will
draw a circle with the size of the keypoint and it will even show its orientation. See the example below.
|
||||
@code{.py}
|
||||
img=cv.drawKeypoints(gray,kp,img,flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
|
||||
cv.imwrite('sift_keypoints.jpg',img)
|
||||
@endcode
|
||||
See the two results below:
|
||||
|
||||

|
||||
|
||||
Now to calculate the descriptor, OpenCV provides two methods.
|
||||
|
||||
-#  Since you already found keypoints, you can call **sift.compute()** which computes the
    descriptors from the keypoints we have found. Eg: kp,des = sift.compute(gray,kp)
-#  If you didn't find keypoints, directly find keypoints and descriptors in a single step with the
    function, **sift.detectAndCompute()**.
|
||||
|
||||
We will see the second method:
|
||||
@code{.py}
|
||||
sift = cv.SIFT_create()
|
||||
kp, des = sift.detectAndCompute(gray,None)
|
||||
@endcode
|
||||
Here kp will be a list of keypoints and des is a numpy array of shape
|
||||
\f$\text{(Number of Keypoints)} \times 128\f$.
|
||||
|
||||
So we got keypoints, descriptors etc. Now we want to see how to match keypoints in different images.
|
||||
That we will learn in coming chapters.
|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
Exercises
|
||||
---------
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_surf_intro/images/surf_boxfilter.jpg
vendored
Normal file
After Width: | Height: | Size: 13 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_surf_intro/images/surf_kp1.jpg
vendored
Normal file
After Width: | Height: | Size: 26 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_surf_intro/images/surf_kp2.jpg
vendored
Normal file
After Width: | Height: | Size: 28 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_surf_intro/images/surf_matching.jpg
vendored
Normal file
After Width: | Height: | Size: 12 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_surf_intro/images/surf_orientation.jpg
vendored
Normal file
After Width: | Height: | Size: 7.7 KiB |
163
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_surf_intro/py_surf_intro.markdown
vendored
Normal file
@ -0,0 +1,163 @@
|
||||
Introduction to SURF (Speeded-Up Robust Features) {#tutorial_py_surf_intro}
|
||||
=================================================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
In this chapter,
|
||||
- We will see the basics of SURF
|
||||
- We will see SURF functionalities in OpenCV
|
||||
|
||||
Theory
|
||||
------
|
||||
|
||||
In the last chapter, we saw SIFT for keypoint detection and description. But it was comparatively slow
and people needed a faster version. In 2006, three people, Bay, H., Tuytelaars, T. and Van
Gool, L, published another paper, "SURF: Speeded Up Robust Features" which introduced a new
algorithm called SURF. As the name suggests, it is a speeded-up version of SIFT.
|
||||
|
||||
In SIFT, Lowe approximated the Laplacian of Gaussian with the Difference of Gaussian for finding
scale-space. SURF goes a little further and approximates LoG with a box filter. The image below shows a
demonstration of such an approximation. One big advantage of this approximation is that convolution
with a box filter can be easily calculated with the help of integral images, and it can be done in
parallel for different scales. SURF also relies on the determinant of the Hessian matrix for both scale
and location.
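
To see why integral images make box filters cheap, here is a minimal pure-Python sketch. It is not how OpenCV stores or computes them internally; the helper names `integral_image` and `box_sum` are hypothetical. Once the table is built, the sum over any rectangle needs only four lookups, whatever the size of the box.

```python
def integral_image(img):
    """Summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of the pixels in rows y0..y1-1 and columns x0..x1-1."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 0, 0, 3, 3))   # 45, the whole image
print(box_sum(ii, 1, 1, 3, 3))   # 28 = 5 + 6 + 8 + 9
```

Because the cost does not depend on the box size, larger filters for coarser scales are no more expensive than small ones.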
|
||||
|
||||

|
||||
|
||||
For orientation assignment, SURF uses wavelet responses in the horizontal and vertical directions for a
neighbourhood of size 6s. Adequate Gaussian weights are also applied to it. Then they are plotted in
a space as given in the image below. The dominant orientation is estimated by calculating the sum of all
responses within a sliding orientation window of angle 60 degrees. An interesting point is that the
wavelet response can be found very easily at any scale using integral images. For many
applications, rotation invariance is not required, so there is no need to find this orientation, which
speeds up the process. SURF provides such a functionality called Upright-SURF or U-SURF. It improves
speed and is robust up to \f$\pm 15^{\circ}\f$. OpenCV supports both, depending upon the flag,
**upright**. If it is 0, orientation is calculated. If it is 1, orientation is not calculated and it
is faster.
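
The sliding-window idea can be sketched as follows. This is a simplified illustration under stated assumptions: the responses are given directly as (dx, dy) pairs, the Gaussian weighting is omitted, and the 5 degree step as well as the function name `dominant_orientation` are choices made only for this sketch.

```python
import math


def dominant_orientation(responses, window_deg=60.0, step_deg=5.0):
    """Angle (degrees) of the longest summed response vector over all windows."""
    window = math.radians(window_deg)
    full = 2 * math.pi
    best_len, best_angle = -1.0, 0.0
    for k in range(int(round(360.0 / step_deg))):
        start = math.radians(k * step_deg)
        sx = sy = 0.0
        for dx, dy in responses:
            ang = math.atan2(dy, dx) % full
            # Keep responses whose direction falls inside the window.
            if (ang - start) % full < window:
                sx += dx
                sy += dy
        length = math.hypot(sx, sy)
        if length > best_len:
            best_len, best_angle = length, math.atan2(sy, sx)
    return math.degrees(best_angle) % 360.0


responses = [(1.0, 1.0), (2.0, 2.0), (-0.5, 0.0)]
print(round(dominant_orientation(responses), 1))   # 45.0
```

The weak response pointing the other way never shares a window with the two strong ones, so it does not disturb the result.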
|
||||
|
||||

|
||||
|
||||
For feature description, SURF uses wavelet responses in the horizontal and vertical directions (again,
use of integral images makes things easier). A neighbourhood of size 20sX20s is taken around the
keypoint, where s is the scale of the keypoint. It is divided into 4x4 subregions. For each subregion, horizontal and
vertical wavelet responses are taken and a vector is formed like this,
\f$v=( \sum{d_x}, \sum{d_y}, \sum{|d_x|}, \sum{|d_y|})\f$. Concatenating these vectors over all 16 subregions gives the SURF
feature descriptor with a total of 64 dimensions. The lower the dimension, the higher the speed of computation
and matching, but the lower the distinctiveness of the features.
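
The arithmetic behind the 64 dimensions can be checked with a small sketch. The wavelet responses below are made-up numbers and the helper names are hypothetical; the only point is that 16 subregions times 4 sums gives 64 values.

```python
def subregion_vector(dx, dy):
    """(sum dx, sum dy, sum |dx|, sum |dy|) for one subregion."""
    return (sum(dx), sum(dy), sum(abs(v) for v in dx), sum(abs(v) for v in dy))


def surf_like_descriptor(subregions):
    """Concatenate the 4-value vectors of all subregions."""
    desc = []
    for dx, dy in subregions:
        desc.extend(subregion_vector(dx, dy))
    return desc


# 4x4 = 16 subregions, each with the same made-up responses.
subregions = [([1.0, -2.0, 0.5], [0.0, 1.0, -1.0])] * 16
desc = surf_like_descriptor(subregions)
print(len(desc))    # 64
print(desc[:4])     # [-0.5, 0.0, 3.5, 2.0]
```

Note how the plain sums can cancel out while the sums of absolute values still record how much the intensity changes.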
|
||||
|
||||
For more distinctiveness, the SURF feature descriptor has an extended 128-dimension version. The sums of
\f$d_x\f$ and \f$|d_x|\f$ are computed separately for \f$d_y < 0\f$ and \f$d_y \geq 0\f$. Similarly, the sums of
\f$d_y\f$ and \f$|d_y|\f$ are split up according to the sign of \f$d_x\f$, thereby doubling the number of
features. It doesn't add much computational complexity. OpenCV supports both by setting the value of the
flag **extended** to 0 or 1 for 64-dim and 128-dim respectively (default is 64-dim).
|
||||
|
||||
Another important improvement is the use of the sign of the Laplacian (trace of the Hessian matrix) for the
underlying interest point. It adds no computation cost since it is already computed during
detection. The sign of the Laplacian distinguishes bright blobs on dark backgrounds from the reverse
situation. In the matching stage, we only compare features if they have the same type of contrast
(as shown in the image below). This minimal information allows for faster matching, without reducing the
descriptor's performance.
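
A minimal sketch of this pre-filter, assuming each feature carries the sign of its Laplacian as +1 or -1 (the function name `same_contrast_pairs` and the input lists are hypothetical): only pairs with equal signs are passed on to the more expensive descriptor comparison.

```python
def same_contrast_pairs(query_signs, train_signs):
    """Index pairs whose Laplacian signs agree and are worth comparing."""
    return [(qi, ti)
            for qi, qs in enumerate(query_signs)
            for ti, ts in enumerate(train_signs)
            if qs == ts]


# +1: bright blob on dark background, -1: dark blob on bright background
print(same_contrast_pairs([1, -1], [1, 1, -1]))   # [(0, 0), (0, 1), (1, 2)]
```

Here three of the six possible pairs are skipped before any descriptor distance is computed.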
|
||||
|
||||

|
||||
|
||||
In short, SURF adds a lot of features to improve the speed in every step. Analysis shows it is 3
|
||||
times faster than SIFT while performance is comparable to SIFT. SURF is good at handling images with
|
||||
blurring and rotation, but not good at handling viewpoint change and illumination change.
|
||||
|
||||
SURF in OpenCV
|
||||
--------------
|
||||
|
||||
OpenCV provides SURF functionalities just like SIFT. You initiate a SURF object with some optional
|
||||
conditions like 64/128-dim descriptors, Upright/Normal SURF etc. All the details are well explained
|
||||
in docs. Then as we did in SIFT, we can use SURF.detect(), SURF.compute() etc for finding keypoints
|
||||
and descriptors.
|
||||
|
||||
First we will see a simple demo on how to find SURF keypoints and descriptors and draw them. All
examples are shown in a Python terminal session, since the usage is just the same as for SIFT.
|
||||
@code{.py}
|
||||
>>> import cv2 as cv
>>> from matplotlib import pyplot as plt
>>> img = cv.imread('fly.png',0)
|
||||
|
||||
# Create SURF object. You can specify params here or later.
|
||||
# Here I set Hessian Threshold to 400
|
||||
>>> surf = cv.xfeatures2d.SURF_create(400)
|
||||
|
||||
# Find keypoints and descriptors directly
|
||||
>>> kp, des = surf.detectAndCompute(img,None)
|
||||
|
||||
>>> len(kp)
|
||||
699
|
||||
@endcode
|
||||
699 keypoints are too many to show in a picture. We reduce the number to around 50 to draw them on an image.
While matching, we may need all those features, but not now. So we increase the Hessian Threshold.
|
||||
@code{.py}
|
||||
# Check present Hessian threshold
|
||||
>>> print( surf.getHessianThreshold() )
|
||||
400.0
|
||||
|
||||
# We set it to some 50000. Remember, it is just for representing in picture.
|
||||
# In actual cases, it is better to have a value 300-500
|
||||
>>> surf.setHessianThreshold(50000)
|
||||
|
||||
# Again compute keypoints and check its number.
|
||||
>>> kp, des = surf.detectAndCompute(img,None)
|
||||
|
||||
>>> print( len(kp) )
|
||||
47
|
||||
@endcode
|
||||
That is fewer than 50. Let's draw them on the image.
|
||||
@code{.py}
|
||||
>>> img2 = cv.drawKeypoints(img,kp,None,(255,0,0),4)
|
||||
|
||||
>>> plt.imshow(img2),plt.show()
|
||||
@endcode
|
||||
See the result below. You can see that SURF is more like a blob detector. It detects the white blobs
|
||||
on wings of butterfly. You can test it with other images.
|
||||
|
||||

|
||||
|
||||
Now I want to apply U-SURF, so that it won't find the orientation.
|
||||
@code{.py}
|
||||
# Check upright flag, if it is False, set it to True
|
||||
>>> print( surf.getUpright() )
|
||||
False
|
||||
|
||||
>>> surf.setUpright(True)
|
||||
|
||||
# Recompute the feature points and draw it
|
||||
>>> kp = surf.detect(img,None)
|
||||
>>> img2 = cv.drawKeypoints(img,kp,None,(255,0,0),4)
|
||||
|
||||
>>> plt.imshow(img2),plt.show()
|
||||
@endcode
|
||||
See the results below. All the orientations are shown in the same direction. It is faster than
before. If you are working on cases where orientation is not a problem (like panorama stitching),
this is better.
|
||||
|
||||

|
||||
|
||||
Finally we check the descriptor size and change it to 128 if it is only 64-dim.
|
||||
@code{.py}
|
||||
# Find size of descriptor
|
||||
>>> print( surf.descriptorSize() )
|
||||
64
|
||||
|
||||
# That means flag, "extended" is False.
|
||||
>>> surf.getExtended()
|
||||
False
|
||||
|
||||
# So we make it to True to get 128-dim descriptors.
|
||||
>>> surf.setExtended(True)
|
||||
>>> kp, des = surf.detectAndCompute(img,None)
|
||||
>>> print( surf.descriptorSize() )
|
||||
128
|
||||
>>> print( des.shape )
|
||||
(47, 128)
|
||||
@endcode
|
||||
The remaining part is matching, which we will do in another chapter.
|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
Exercises
|
||||
---------
|
54
3rdparty/opencv-4.5.4/doc/py_tutorials/py_feature2d/py_table_of_contents_feature2d.markdown
vendored
Normal file
@ -0,0 +1,54 @@
|
||||
Feature Detection and Description {#tutorial_py_table_of_contents_feature2d}
|
||||
=================================
|
||||
|
||||
- @subpage tutorial_py_features_meaning
|
||||
|
||||
What are the main
|
||||
features in an image? How can finding those features be useful to us?
|
||||
|
||||
- @subpage tutorial_py_features_harris
|
||||
|
||||
Okay, corners are good
features. But how do we find them?
|
||||
|
||||
- @subpage tutorial_py_shi_tomasi
|
||||
|
||||
We will look into
|
||||
Shi-Tomasi corner detection
|
||||
|
||||
- @subpage tutorial_py_sift_intro
|
||||
|
||||
Harris corner detector
|
||||
is not good enough when the scale of the image changes. Lowe developed a breakthrough method to find
|
||||
scale-invariant features and it is called SIFT
|
||||
|
||||
- @subpage tutorial_py_surf_intro
|
||||
|
||||
SIFT is really good,
|
||||
but not fast enough, so people came up with a speeded-up version called SURF.
|
||||
|
||||
- @subpage tutorial_py_fast
|
||||
|
||||
All the above feature
|
||||
detection methods are good in some way. But they are not fast enough to work in real-time
|
||||
applications like SLAM. There comes the FAST algorithm, which is really "FAST".
|
||||
|
||||
- @subpage tutorial_py_brief
|
||||
|
||||
SIFT uses a feature
|
||||
descriptor with 128 floating point numbers. Consider thousands of such features. It takes lots of
|
||||
memory and more time for matching. We can compress it to make it faster. But still we have to
|
||||
calculate it first. There comes BRIEF which gives the shortcut to find binary descriptors with
|
||||
less memory, faster matching, still higher recognition rate.
|
||||
|
||||
- @subpage tutorial_py_orb
|
||||
|
||||
SIFT and SURF are good in what they do, but what if you have to pay a few dollars every year to use them in your applications? Yeah, they are patented!!! To solve that problem, OpenCV devs came up with a new "FREE" alternative to SIFT & SURF, and that is ORB.
|
||||
|
||||
- @subpage tutorial_py_matcher
|
||||
|
||||
We know a great deal about feature detectors and descriptors. It is time to learn how to match different descriptors. OpenCV provides two techniques, Brute-Force matcher and FLANN based matcher.
|
||||
|
||||
- @subpage tutorial_py_feature_homography
|
||||
|
||||
Now we know about feature matching. Let's mix it up with calib3d module to find objects in a complex image.
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_gui/images/drawing.jpg
vendored
Normal file
After Width: | Height: | Size: 2.5 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_gui/images/image_display.jpg
vendored
Normal file
After Width: | Height: | Size: 2.5 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_gui/images/mouse_drawing.jpg
vendored
Normal file
After Width: | Height: | Size: 1.5 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_gui/images/trackbar.jpg
vendored
Normal file
After Width: | Height: | Size: 2.3 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_gui/images/video_display.jpg
vendored
Normal file
After Width: | Height: | Size: 2.5 KiB |
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_gui/py_drawing_functions/images/drawing_result.jpg
vendored
Normal file
After Width: | Height: | Size: 14 KiB |
113
3rdparty/opencv-4.5.4/doc/py_tutorials/py_gui/py_drawing_functions/py_drawing_functions.markdown
vendored
Normal file
@ -0,0 +1,113 @@
|
||||
Drawing Functions in OpenCV {#tutorial_py_drawing_functions}
|
||||
===========================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
- Learn to draw different geometric shapes with OpenCV
|
||||
- You will learn these functions : **cv.line()**, **cv.circle()** , **cv.rectangle()**,
|
||||
**cv.ellipse()**, **cv.putText()** etc.
|
||||
|
||||
Code
|
||||
----
|
||||
|
||||
In all the above functions, you will see some common arguments as given below:
|
||||
|
||||
-   img : The image where you want to draw the shapes
-   color : Color of the shape. For BGR, pass it as a tuple, eg: (255,0,0) for blue. For
    grayscale, just pass the scalar value.
-   thickness : Thickness of the line or circle etc. If **-1** is passed for closed figures like
    circles, it will fill the shape. *default thickness = 1*
-   lineType : Type of line, whether 8-connected, anti-aliased line etc. *By default, it is
    8-connected.* cv.LINE_AA gives an anti-aliased line which looks great for curves.
|
||||
|
||||
### Drawing Line
|
||||
|
||||
To draw a line, you need to pass starting and ending coordinates of line. We will create a black
|
||||
image and draw a blue line on it from top-left to bottom-right corners.
|
||||
@code{.py}
|
||||
import numpy as np
|
||||
import cv2 as cv
|
||||
|
||||
# Create a black image
|
||||
img = np.zeros((512,512,3), np.uint8)
|
||||
|
||||
# Draw a diagonal blue line with thickness of 5 px
|
||||
cv.line(img,(0,0),(511,511),(255,0,0),5)
|
||||
@endcode
|
||||
### Drawing Rectangle
|
||||
|
||||
To draw a rectangle, you need top-left corner and bottom-right corner of rectangle. This time we
|
||||
will draw a green rectangle at the top-right corner of image.
|
||||
@code{.py}
|
||||
cv.rectangle(img,(384,0),(510,128),(0,255,0),3)
|
||||
@endcode
|
||||
### Drawing Circle
|
||||
|
||||
To draw a circle, you need its center coordinates and radius. We will draw a circle inside the
|
||||
rectangle drawn above.
|
||||
@code{.py}
|
||||
cv.circle(img,(447,63), 63, (0,0,255), -1)
|
||||
@endcode
|
||||
### Drawing Ellipse
|
||||
|
||||
To draw the ellipse, we need to pass several arguments. One argument is the center location (x,y).
The next argument is the axes lengths (half of the major axis length, half of the minor axis length). angle is the angle of rotation
of the ellipse in the anti-clockwise direction. startAngle and endAngle denote the start and end of the
ellipse arc measured in the clockwise direction from the major axis, i.e. giving values 0 and 360 gives the
full ellipse. For more details, check the documentation of **cv.ellipse()**. The example below draws a
half ellipse at the center of the image.
|
||||
@code{.py}
|
||||
cv.ellipse(img,(256,256),(100,50),0,0,180,255,-1)
|
||||
@endcode
|
||||
### Drawing Polygon
|
||||
|
||||
To draw a polygon, first you need the coordinates of the vertices. Make those points into an array of shape
ROWSx1x2 where ROWS is the number of vertices, and it should be of type int32. Here we draw a small
polygon with four vertices in yellow.
|
||||
@code{.py}
|
||||
pts = np.array([[10,5],[20,30],[70,20],[50,10]], np.int32)
|
||||
pts = pts.reshape((-1,1,2))
|
||||
cv.polylines(img,[pts],True,(0,255,255))
|
||||
@endcode
|
||||
|
||||
@note If the third argument is False, you will get a polyline joining all the points, not a closed
shape.
|
||||
|
||||
@note cv.polylines() can be used to draw multiple lines. Just create a list of all the lines you
|
||||
want to draw and pass it to the function. All lines will be drawn individually. It is a much better
|
||||
and faster way to draw a group of lines than calling cv.line() for each line.
|
||||
|
||||
### Adding Text to Images:
|
||||
|
||||
To put text in images, you need to specify the following things.
-   Text data that you want to write
-   Position coordinates of where you want to put it (i.e. bottom-left corner where data starts).
-   Font type (Check **cv.putText()** docs for supported fonts)
-   Font Scale (specifies the size of font)
-   regular things like color, thickness, lineType etc. For better look, lineType = cv.LINE_AA
    is recommended.
|
||||
|
||||
We will write **OpenCV** on our image in white color.
|
||||
@code{.py}
|
||||
font = cv.FONT_HERSHEY_SIMPLEX
|
||||
cv.putText(img,'OpenCV',(10,500), font, 4,(255,255,255),2,cv.LINE_AA)
|
||||
@endcode
|
||||
|
||||
### Result
|
||||
|
||||
So it is time to see the final result of our drawing. As you studied in previous articles, display
|
||||
the image to see it.
|
||||
|
||||

|
||||
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
-#  The angles used in the ellipse function are not our circular angles. For more details, visit [this
    discussion](http://answers.opencv.org/question/14541/angles-in-ellipse-function/).
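
The difference can be illustrated with a little trigonometry. The sketch below assumes that the angle acts as the parameter t of the parametric form x = a*cos(t), y = b*sin(t), where a and b are the half-axis lengths; this is an assumption made for illustration, not a statement of how **cv.ellipse()** is implemented. The helper names are hypothetical.

```python
import math


def ellipse_point(center, axes, param_deg):
    """Point on an unrotated ellipse for a given parametric angle."""
    cx, cy = center
    a, b = axes                       # half-axis lengths
    t = math.radians(param_deg)
    return (cx + a * math.cos(t), cy + b * math.sin(t))


def polar_angle_deg(center, point):
    """Angle of the ray from the center to the point, in degrees."""
    return math.degrees(math.atan2(point[1] - center[1],
                                   point[0] - center[0])) % 360.0


center, axes = (256, 256), (100, 50)
p = ellipse_point(center, axes, 45)
print(round(polar_angle_deg(center, p), 2))   # 26.57, not 45
```

Only for a circle (a equal to b), or at multiples of 90 degrees, do the two angles coincide.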
|
||||
|
||||
Exercises
|
||||
---------
|
||||
|
||||
-# Try to create the logo of OpenCV using drawing functions available in OpenCV.
|
4
3rdparty/opencv-4.5.4/doc/py_tutorials/py_gui/py_image_display/py_image_display.markdown
vendored
Normal file
@ -0,0 +1,4 @@
|
||||
Getting Started with Images {#tutorial_py_image_display}
|
||||
===========================
|
||||
|
||||
Tutorial content has been moved: @ref tutorial_display_image
|
111
3rdparty/opencv-4.5.4/doc/py_tutorials/py_gui/py_mouse_handling/py_mouse_handling.markdown
vendored
Normal file
@ -0,0 +1,111 @@
|
||||
Mouse as a Paint-Brush {#tutorial_py_mouse_handling}
|
||||
======================
|
||||
|
||||
Goal
|
||||
----
|
||||
|
||||
- Learn to handle mouse events in OpenCV
|
||||
- You will learn these functions : **cv.setMouseCallback()**
|
||||
|
||||
Simple Demo
|
||||
-----------
|
||||
|
||||
Here, we create a simple application which draws a circle on an image wherever we double-click on
|
||||
it.
|
||||
|
||||
First we create a mouse callback function which is executed when a mouse event takes place. A mouse
event can be anything related to the mouse like left-button down, left-button up, left-button
double-click etc. It gives us the coordinates (x,y) for every mouse event. With this event and
location, we can do whatever we like. To list all available events, run the following code
in a Python terminal:
|
||||
@code{.py}
|
||||
import cv2 as cv
|
||||
events = [i for i in dir(cv) if 'EVENT' in i]
|
||||
print( events )
|
||||
@endcode
|
||||
Creating a mouse callback function has a specific format which is the same everywhere. It differs only in
what the function does. So our mouse callback function does one thing: it draws a circle where we
double-click. See the code below. The code is self-explanatory from the comments:
|
||||
@code{.py}
|
||||
import numpy as np
import cv2 as cv

# mouse callback function
def draw_circle(event,x,y,flags,param):
    if event == cv.EVENT_LBUTTONDBLCLK:
        cv.circle(img,(x,y),100,(255,0,0),-1)

# Create a black image, a window and bind the function to window
img = np.zeros((512,512,3), np.uint8)
cv.namedWindow('image')
cv.setMouseCallback('image',draw_circle)

while(1):
    cv.imshow('image',img)
    if cv.waitKey(20) & 0xFF == 27:
        break
cv.destroyAllWindows()
|
||||
@endcode
|
||||
More Advanced Demo
|
||||
------------------
|
||||
|
||||
Now we go for a much better application. In this, we draw either rectangles or circles (depending on
the mode we select) by dragging the mouse like we do in the Paint application. So our mouse callback
function has two parts, one to draw rectangles and the other to draw circles. This specific example
will be really helpful in creating and understanding some interactive applications like object
tracking, image segmentation etc.
|
||||
@code{.py}
|
||||
import numpy as np
import cv2 as cv

drawing = False # true if mouse is pressed
mode = True # if True, draw rectangle. Press 'm' to toggle to curve
ix,iy = -1,-1

# mouse callback function
def draw_circle(event,x,y,flags,param):
    global ix,iy,drawing,mode

    if event == cv.EVENT_LBUTTONDOWN:
        drawing = True
        ix,iy = x,y

    elif event == cv.EVENT_MOUSEMOVE:
        if drawing == True:
            if mode == True:
                cv.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
            else:
                cv.circle(img,(x,y),5,(0,0,255),-1)

    elif event == cv.EVENT_LBUTTONUP:
        drawing = False
        if mode == True:
            cv.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
        else:
            cv.circle(img,(x,y),5,(0,0,255),-1)
|
||||
@endcode
|
||||
Next we have to bind this mouse callback function to the OpenCV window. In the main loop, we should set
a keyboard binding for key 'm' to toggle between rectangle and circle.
|
||||
@code{.py}
|
||||
img = np.zeros((512,512,3), np.uint8)
cv.namedWindow('image')
cv.setMouseCallback('image',draw_circle)

while(1):
    cv.imshow('image',img)
    k = cv.waitKey(1) & 0xFF
    if k == ord('m'):
        mode = not mode
    elif k == 27:
        break

cv.destroyAllWindows()
|
||||
@endcode
|
||||
Additional Resources
|
||||
--------------------
|
||||
|
||||
Exercises
|
||||
---------
|
||||
|
||||
-#  In our last example, we drew a filled rectangle. Modify the code to draw an unfilled
    rectangle.
|
27
3rdparty/opencv-4.5.4/doc/py_tutorials/py_gui/py_table_of_contents_gui.markdown
vendored
Normal file
@ -0,0 +1,27 @@
|
||||
Gui Features in OpenCV {#tutorial_py_table_of_contents_gui}
|
||||
======================
|
||||
|
||||
- @ref tutorial_display_image
|
||||
|
||||
Learn to load an
|
||||
image, display it, and save it back
|
||||
|
||||
- @subpage tutorial_py_video_display
|
||||
|
||||
Learn to play videos,
|
||||
capture videos from a camera, and write videos
|
||||
|
||||
- @subpage tutorial_py_drawing_functions
|
||||
|
||||
Learn to draw lines,
|
||||
rectangles, ellipses, circles, etc with OpenCV
|
||||
|
||||
- @subpage tutorial_py_mouse_handling
|
||||
|
||||
Draw stuff with your
|
||||
mouse
|
||||
|
||||
- @subpage tutorial_py_trackbar
|
||||
|
||||
Create trackbar to
|
||||
control certain parameters
|
BIN
3rdparty/opencv-4.5.4/doc/py_tutorials/py_gui/py_trackbar/images/trackbar_screenshot.jpg
vendored
Normal file
After Width: | Height: | Size: 13 KiB |