feat: switch backend to PaddleOCR-NCNN, switch project to CMake

1. Migrated the project backend to the PaddleOCR-NCNN algorithm; it has passed basic compatibility tests.
2. The project is now organized with CMake; to better accommodate third-party libraries, a QMake project will no longer be provided.
3. Reorganized the rights/license declaration files and the code project to minimize the risk of infringement.

Log: switch backend to PaddleOCR-NCNN, switch project to CMake
Change-Id: I4d5d2c5d37505a4a24b389b1a4c5d12f17bfa38c
This commit is contained in:
wangzhengyang
2022-05-10 09:54:44 +08:00
parent ecdd171c6f
commit 718c41634f
10018 changed files with 3593797 additions and 186748 deletions


View File

@@ -0,0 +1,190 @@
How OpenCV-Python Bindings Work {#tutorial_py_bindings_basics}
===============================
Goal
----
Learn:
- How OpenCV-Python bindings are generated
- How to extend new OpenCV modules to Python
How OpenCV-Python bindings are generated
-----------------------------------------
In OpenCV, all algorithms are implemented in C++. But these algorithms can be used from other
languages like Python, Java, etc. This is made possible by the binding generators, which
create a bridge between C++ and Python and enable users to call C++ functions from Python. To get
a complete picture of what happens in the background, a good knowledge of the Python/C API is required.
A simple example of extending C++ functions to Python can be found in the official Python
documentation[1]. Extending all functions in OpenCV to Python by writing their wrapper functions
manually would be a time-consuming task, so OpenCV does it in a more intelligent way: it generates
these wrapper functions automatically from the C++ headers using Python scripts
located in `modules/python/src2`. We will look into what they do.
First, `modules/python/CMakeLists.txt` is a CMake script which checks the modules to be extended to
Python. It automatically checks all the modules to be extended and grabs their header files.
These header files contain the list of all classes, functions, constants, etc. for each particular
module.
Second, these header files are passed to a Python script, `modules/python/src2/gen2.py`. This is the
Python bindings generator script. It calls another Python script, `modules/python/src2/hdr_parser.py`.
This is the header parser script, which splits the complete header file into small
Python lists. These lists contain all the details about a particular function, class, etc. For
example, a function will be parsed to get a list containing the function name, return type, input
arguments, argument types, etc. The final list contains the details of all the functions, enums, structs,
classes, etc. in that header file.
But the header parser doesn't parse all the functions/classes in the header file. The developer has to
specify which functions should be exported to Python. For that, certain macros are added at
the beginning of these declarations, which enable the header parser to identify the functions to be
parsed. These macros are added by the developer who programs the particular function. In short, the
developer decides which functions should be extended to Python and which should not. Details of those
macros are given in the next section.
The header parser thus returns a final big list of parsed functions. Our generator script (gen2.py) will
create wrapper functions for all the functions/classes/enums/structs parsed by the header parser (you
can find these header files during compilation in the `build/modules/python/` folder as
pyopencv_generated_\*.h files). But there may be some basic OpenCV datatypes like Mat, Vec4i, and
Size that need to be extended manually. For example, a Mat type should be extended to a Numpy array,
Size should be extended to a tuple of two integers, etc. Similarly, there may be some complex
structs/classes/functions etc. which need to be extended manually. All such manual wrapper functions
are placed in `modules/python/src2/cv2.cpp`.
Now the only thing left is the compilation of these wrapper files, which gives us the **cv2** module. So
when you call a function, say `res = equalizeHist(img1,img2)` in Python, you pass numpy arrays and
you expect another numpy array as the output. These numpy arrays are converted to cv::Mat, the
equalizeHist() function is called in C++, and the final result res is converted back into a Numpy
array. So, in short, almost all operations are done in C++, which gives us almost the same speed as
that of C++.
So this is the basic version of how OpenCV-Python bindings are generated.
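For instance, a minimal sketch of the Python-side call, using cv.equalizeHist on a grayscale image (the file name is just a placeholder):
@code{.py}
import numpy as np
import cv2 as cv

img = cv.imread('input.jpg', cv.IMREAD_GRAYSCALE)  # numpy array (uint8)
equ = cv.equalizeHist(img)                         # numpy in, numpy out; the work happens in C++
print(type(equ), equ.dtype, equ.shape)
@endcode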
@note There is no 1:1 mapping of numpy.ndarray on cv::Mat. For example, cv::Mat has channels field,
which is emulated as last dimension of numpy.ndarray and implicitly converted.
However, such implicit conversion has problem with passing of 3D numpy arrays into C++ code
(the last dimension is implicitly reinterpreted as number of channels).
Refer to the [issue](https://github.com/opencv/opencv/issues/19091) for workarounds if you need to process 3D arrays or ND-arrays with channels.
OpenCV 4.5.4+ has `cv.Mat` wrapper derived from `numpy.ndarray` to explicitly handle the channels behavior.
How to extend new modules to Python?
------------------------------------
The header parser parses the header files based on wrapper macros added to the function declarations.
Enumeration constants don't need any wrapper macros; they are wrapped automatically. The remaining
functions, classes, etc. need wrapper macros.
Functions are extended using `CV_EXPORTS_W` macro. An example is shown below.
@code{.cpp}
CV_EXPORTS_W void equalizeHist( InputArray src, OutputArray dst );
@endcode
Header parser can understand the input and output arguments from keywords like
InputArray, OutputArray etc. But sometimes, we may need to hardcode inputs and outputs. For that,
macros like `CV_OUT`, `CV_IN_OUT` etc. are used.
@code{.cpp}
CV_EXPORTS_W void minEnclosingCircle( InputArray points,
CV_OUT Point2f& center, CV_OUT float& radius );
@endcode
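On the Python side, arguments marked `CV_OUT` become return values instead of output parameters. A minimal sketch of the resulting call (`pts` is just a small illustrative point set):
@code{.py}
import numpy as np
import cv2 as cv

pts = np.array([[0, 0], [4, 0], [2, 3]], dtype=np.float32)
(cx, cy), radius = cv.minEnclosingCircle(pts)  # CV_OUT center and radius are returned
@endcode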
For large classes also, `CV_EXPORTS_W` is used. To extend class methods, `CV_WRAP` is used.
Similarly, `CV_PROP` is used for class fields.
@code{.cpp}
class CV_EXPORTS_W CLAHE : public Algorithm
{
public:
CV_WRAP virtual void apply(InputArray src, OutputArray dst) = 0;
CV_WRAP virtual void setClipLimit(double clipLimit) = 0;
CV_WRAP virtual double getClipLimit() const = 0;
}
@endcode
Overloaded functions can be extended using `CV_EXPORTS_AS`. But we need to pass a new name so that
each function will be called by that name in Python. Take the case of integral function below. Three
functions are available, so each one is named with a suffix in Python. Similarly `CV_WRAP_AS` can be
used to wrap overloaded methods.
@code{.cpp}
//! computes the integral image
CV_EXPORTS_W void integral( InputArray src, OutputArray sum, int sdepth = -1 );
//! computes the integral image and integral for the squared image
CV_EXPORTS_AS(integral2) void integral( InputArray src, OutputArray sum,
OutputArray sqsum, int sdepth = -1, int sqdepth = -1 );
//! computes the integral image, integral for the squared image and the tilted integral image
CV_EXPORTS_AS(integral3) void integral( InputArray src, OutputArray sum,
OutputArray sqsum, OutputArray tilted,
int sdepth = -1, int sqdepth = -1 );
@endcode
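In Python, the overloads then appear under the names given to `CV_EXPORTS_AS`; a minimal sketch (assuming `img` is a small grayscale uint8 array):
@code{.py}
import numpy as np
import cv2 as cv

img = np.random.randint(0, 256, (4, 4), dtype=np.uint8)
s = cv.integral(img)               # CV_EXPORTS_W             -> cv.integral
s, sq = cv.integral2(img)          # CV_EXPORTS_AS(integral2) -> cv.integral2
s, sq, tilted = cv.integral3(img)  # CV_EXPORTS_AS(integral3) -> cv.integral3
@endcode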
Small classes/structs are extended using `CV_EXPORTS_W_SIMPLE`. These structs are passed by value
to C++ functions. Examples are `KeyPoint`, `Match` etc. Their methods are extended by `CV_WRAP` and
fields are extended by `CV_PROP_RW`.
@code{.cpp}
class CV_EXPORTS_W_SIMPLE DMatch
{
public:
CV_WRAP DMatch();
CV_WRAP DMatch(int _queryIdx, int _trainIdx, float _distance);
CV_WRAP DMatch(int _queryIdx, int _trainIdx, int _imgIdx, float _distance);
CV_PROP_RW int queryIdx; // query descriptor index
CV_PROP_RW int trainIdx; // train descriptor index
CV_PROP_RW int imgIdx; // train image index
CV_PROP_RW float distance;
};
@endcode
Some other small classes/structs can be exported using `CV_EXPORTS_W_MAP` where it is exported to a
Python native dictionary. `Moments()` is an example of it.
@code{.cpp}
class CV_EXPORTS_W_MAP Moments
{
public:
//! spatial moments
CV_PROP_RW double m00, m10, m01, m20, m11, m02, m30, m21, m12, m03;
//! central moments
CV_PROP_RW double mu20, mu11, mu02, mu30, mu21, mu12, mu03;
//! central normalized moments
CV_PROP_RW double nu20, nu11, nu02, nu30, nu21, nu12, nu03;
};
@endcode
So these are the major extension macros available in OpenCV. Typically, a developer only has to put the
proper macros in their appropriate positions; the rest is done by the generator scripts. Sometimes there
may be exceptional cases where the generator scripts cannot create the wrappers. Such functions need
to be handled manually: write your own `pyopencv_*.hpp` extending headers and put them into the
misc/python subdirectory of your module. But most of the time, code written according to the OpenCV
coding guidelines will be wrapped automatically by the generator scripts.
More advanced cases involve providing Python with additional features that do not exist
in the C++ interface, such as extra methods, type mappings, or default arguments.
We will take the `UMat` datatype as an example of such cases below.
First, to provide Python-specific methods, `CV_WRAP_PHANTOM` is used in a similar manner to
`CV_WRAP`, except that it takes the method header as its argument, and you would need to provide
the method body in your own `pyopencv_*.hpp` extension. `UMat::queue()` and `UMat::context()` are
examples of such phantom methods that do not exist in the C++ interface but are needed to handle
OpenCL functionality on the Python side.
Second, if an already-existing datatype is mappable to your class, it is highly preferable to
indicate such capacity using `CV_WRAP_MAPPABLE` with the source type as its argument,
rather than crafting your own binding function(s). This is the case for `UMat`, which maps from `Mat`.
Finally, if a default argument is needed but is not provided in the native C++ interface,
you can provide it for the Python side as the argument of `CV_WRAP_DEFAULT`, as in the `UMat::getMat`
example below:
@code{.cpp}
class CV_EXPORTS_W UMat
{
public:
//! Mat is mappable to UMat.
// You would need to provide `static bool cv_mappable_to(const Ptr<Mat>& src, Ptr<UMat>& dst)`
CV_WRAP_MAPPABLE(Ptr<Mat>);
//! returns the OpenCL queue used by OpenCV UMat.
// You would need to provide the method body in the binder code
CV_WRAP_PHANTOM(static void* queue());
//! returns the OpenCL context used by OpenCV UMat
// You would need to provide the method body in the binder code
CV_WRAP_PHANTOM(static void* context());
//! The wrapped method become equivalent to `get(int flags = ACCESS_RW)`
CV_WRAP_AS(get) Mat getMat(int flags CV_WRAP_DEFAULT(ACCESS_RW)) const;
};
@endcode
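On the Python side, the wrapped class can then be used roughly like this; a minimal sketch in which the construction from a numpy array corresponds to `CV_WRAP_MAPPABLE(Ptr<Mat>)` and the `get()` call to `CV_WRAP_AS(get)` with its `ACCESS_RW` default:
@code{.py}
import numpy as np
import cv2 as cv

um = cv.UMat(np.zeros((3, 3), np.uint8))  # a Mat (numpy array) is mappable to UMat
m = um.get()                              # getMat() exposed as get(), default flags
print(type(um), type(m))
@endcode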

View File

@@ -0,0 +1,8 @@
OpenCV-Python Bindings {#tutorial_py_table_of_contents_bindings}
======================
Here, you will learn how OpenCV-Python bindings are generated.
- @subpage tutorial_py_bindings_basics
Learn how OpenCV-Python bindings are generated.


View File

@@ -0,0 +1,225 @@
Camera Calibration {#tutorial_py_calibration}
==================
Goal
----
In this section, we will learn about
* types of distortion caused by cameras
* how to find the intrinsic and extrinsic properties of a camera
* how to undistort images based off these properties
Basics
------
Some pinhole cameras introduce significant distortion to images. Two major kinds of distortion are
radial distortion and tangential distortion.
Radial distortion causes straight lines to appear curved. Radial distortion becomes larger the farther points are from
the center of the image. For example, one image is shown below in which two edges of a chess board are
marked with red lines. But, you can see that the border of the chess board is not a straight line and doesn't match with the
red line. All the expected straight lines are bulged out. Visit [Distortion
(optics)](http://en.wikipedia.org/wiki/Distortion_%28optics%29) for more details.
![image](images/calib_radial.jpg)
Radial distortion can be represented as follows:
\f[x_{distorted} = x( 1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \\
y_{distorted} = y( 1 + k_1 r^2 + k_2 r^4 + k_3 r^6)\f]
Similarly, tangential distortion occurs because the image-taking lens
is not aligned perfectly parallel to the imaging plane. So, some areas in the image may look nearer than
expected. The amount of tangential distortion can be represented as below:
\f[x_{distorted} = x + [ 2p_1xy + p_2(r^2+2x^2)] \\
y_{distorted} = y + [ p_1(r^2+ 2y^2)+ 2p_2xy]\f]
In short, we need to find five parameters, known as distortion coefficients given by:
\f[Distortion \; coefficients=(k_1 \hspace{10pt} k_2 \hspace{10pt} p_1 \hspace{10pt} p_2 \hspace{10pt} k_3)\f]
In addition to this, we need some other information, like the intrinsic and extrinsic parameters
of the camera. Intrinsic parameters are specific to a camera. They include information like the focal
length (\f$f_x,f_y\f$) and the optical centers (\f$c_x, c_y\f$). The focal length and optical centers can be used to create a camera matrix, which can be used to remove distortion due to the lenses of a specific camera. The camera matrix is unique to a specific camera, so once calculated, it can be reused on other images taken by the same camera. It is expressed as a 3x3
matrix:
\f[camera \; matrix = \left [ \begin{matrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{matrix} \right ]\f]
Extrinsic parameters correspond to the rotation and translation vectors which translate the coordinates
of a 3D point into the camera's coordinate system.
For stereo applications, these distortions need to be corrected first. To find these parameters,
we must provide some sample images of a well-defined pattern (e.g. a chess board). We
find some specific points whose relative positions we already know (e.g. square corners on the chess board). We know the coordinates of these points in real world space and we know their coordinates in the image, so we can solve for the distortion coefficients. For better results, we need at least 10 test patterns.
Code
----
As mentioned above, we need at least 10 test patterns for camera calibration. OpenCV comes with some
images of a chess board (see samples/data/left01.jpg -- left14.jpg), so we will utilize these. Consider an image of a chess board. The important input data needed for calibration of the camera
is the set of 3D real world points and the corresponding 2D coordinates of these points in the image. 2D image points
are easy to find from the image. (These image points are locations where two black
squares touch each other on the chess board.)
What about the 3D points from real world space? Those images are taken from a static camera and
chess boards are placed at different locations and orientations. So we need to know the \f$(X,Y,Z)\f$
values. But for simplicity, we can say the chess board was kept stationary on the XY plane (so Z=0 always)
and the camera was moved accordingly. This consideration helps us find only the X,Y values. For the X,Y
values, we can simply pass the points as (0,0), (1,0), (2,0), ..., which denotes the location of the
points. In this case, the results we get will be in the scale of the chess board square size. But if
we know the square size (say 30 mm), we can pass the values as (0,0), (30,0), (60,0), ..., and we get
the results in mm. (In this case, we don't know the square size since we didn't take those images, so we
pass in terms of square size.)
3D points are called **object points** and 2D image points are called **image points.**
### Setup
So, to find the pattern in a chess board, we can use the function **cv.findChessboardCorners()**. We also
need to pass what kind of pattern we are looking for, like an 8x8 grid, 5x5 grid, etc. In this example, we
use a 7x6 grid. (Normally a chess board has 8x8 squares and 7x7 internal corners.) It returns the
corner points and retval, which will be True if the pattern is found. These corners will be placed in
order (from left-to-right, top-to-bottom).
@note This function may not be able to find the required pattern in all the images. So, one good option
is to write the code such that it starts the camera and checks each frame for the required pattern. Once
the pattern is obtained, find the corners and store them in a list. Also, provide some interval before
reading the next frame so that we can adjust our chess board in a different direction. Continue this
process until the required number of good patterns is obtained. Even in the example provided here, we
are not sure how many of the 14 given images are good. Thus, we must read all the images and take only the good
ones.
@note Instead of a chess board, we can alternatively use a circular grid. In this case, we must use the function
**cv.findCirclesGrid()** to find the pattern. Fewer images are sufficient to perform camera calibration using a circular grid.
Once we find the corners, we can increase their accuracy using **cv.cornerSubPix()**. We can also
draw the pattern using **cv.drawChessboardCorners()**. All these steps are included in below code:
@code{.py}
import numpy as np
import cv2 as cv
import glob

# termination criteria
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 30, 0.001)

# prepare object points, like (0,0,0), (1,0,0), (2,0,0) ....,(6,5,0)
objp = np.zeros((6*7,3), np.float32)
objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)

# Arrays to store object points and image points from all the images.
objpoints = [] # 3d point in real world space
imgpoints = [] # 2d points in image plane.

images = glob.glob('*.jpg')

for fname in images:
    img = cv.imread(fname)
    gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

    # Find the chess board corners
    ret, corners = cv.findChessboardCorners(gray, (7,6), None)

    # If found, add object points, image points (after refining them)
    if ret == True:
        objpoints.append(objp)

        # refine the corner locations and keep the refined points
        corners2 = cv.cornerSubPix(gray, corners, (11,11), (-1,-1), criteria)
        imgpoints.append(corners2)

        # Draw and display the corners
        cv.drawChessboardCorners(img, (7,6), corners2, ret)
        cv.imshow('img', img)
        cv.waitKey(500)

cv.destroyAllWindows()
@endcode
One image with pattern drawn on it is shown below:
![image](images/calib_pattern.jpg)
### Calibration
Now that we have our object points and image points, we are ready to go for calibration. We can
use the function, **cv.calibrateCamera()** which returns the camera matrix, distortion coefficients,
rotation and translation vectors etc.
@code{.py}
ret, mtx, dist, rvecs, tvecs = cv.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
@endcode
### Undistortion
Now, we can take an image and undistort it. OpenCV comes with two
methods for doing this. However first, we can refine the camera matrix based on a free scaling
parameter using **cv.getOptimalNewCameraMatrix()**. If the scaling parameter alpha=0, it returns an
undistorted image with the minimum number of unwanted pixels, so it may even remove some pixels at the image corners.
If alpha=1, all pixels are retained, along with some extra black pixels. This function also returns an image ROI which
can be used to crop the result.
So, we take a new image (left12.jpg in this case. That is the first image in this chapter)
@code{.py}
img = cv.imread('left12.jpg')
h, w = img.shape[:2]
newcameramtx, roi = cv.getOptimalNewCameraMatrix(mtx, dist, (w,h), 1, (w,h))
@endcode
#### 1. Using **cv.undistort()**
This is the easiest way. Just call the function and use ROI obtained above to crop the result.
@code{.py}
# undistort
dst = cv.undistort(img, mtx, dist, None, newcameramtx)
# crop the image
x, y, w, h = roi
dst = dst[y:y+h, x:x+w]
cv.imwrite('calibresult.png', dst)
@endcode
#### 2. Using **remapping**
This way is a little bit more difficult. First, find a mapping function from the distorted image to the undistorted image. Then
use the remap function.
@code{.py}
# undistort
mapx, mapy = cv.initUndistortRectifyMap(mtx, dist, None, newcameramtx, (w,h), 5)
dst = cv.remap(img, mapx, mapy, cv.INTER_LINEAR)
# crop the image
x, y, w, h = roi
dst = dst[y:y+h, x:x+w]
cv.imwrite('calibresult.png', dst)
@endcode
Both methods give the same result. See the result below:
![image](images/calib_result.jpg)
You can see in the result that all the edges are straight.
Now you can store the camera matrix and distortion coefficients using write functions in NumPy
(np.savez, np.savetxt etc) for future uses.
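For example, a minimal sketch of saving and reloading the calibration data (the file name B.npz is just a placeholder, matching the one loaded in the pose estimation tutorial):
@code{.py}
# save the calibration results
np.savez('B.npz', mtx=mtx, dist=dist, rvecs=rvecs, tvecs=tvecs)

# ...later, load them back
with np.load('B.npz') as data:
    mtx, dist = data['mtx'], data['dist']
@endcode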
Re-projection Error
-------------------
Re-projection error gives a good estimation of just how exact the found parameters are. The closer the re-projection error is to zero, the more accurate the parameters we found are. Given the intrinsic, distortion, rotation and translation matrices,
we must first transform the object points to image points using **cv.projectPoints()**. Then, we can calculate
the absolute norm between the points we got with our transformation and those found by the corner finding algorithm. To
find the average error, we calculate the arithmetical mean of the errors calculated for all the
calibration images.
@code{.py}
mean_error = 0
for i in range(len(objpoints)):
    imgpoints2, _ = cv.projectPoints(objpoints[i], rvecs[i], tvecs[i], mtx, dist)
    error = cv.norm(imgpoints[i], imgpoints2, cv.NORM_L2)/len(imgpoints2)
    mean_error += error

print( "total error: {}".format(mean_error/len(objpoints)) )
@endcode
Additional Resources
--------------------
Exercises
---------
-# Try camera calibration with circular grid.


View File

@@ -0,0 +1,75 @@
Depth Map from Stereo Images {#tutorial_py_depthmap}
============================
Goal
----
In this session,
- We will learn to create a depth map from stereo images.
Basics
------
In the last section, we saw basic concepts like epipolar constraints and other related terms. We also
saw that if we have two images of the same scene, we can get depth information from them in an intuitive
way. Below is an image and some simple mathematical formulas which prove that intuition. (Image
Courtesy :
![image](images/stereo_depth.jpg)
The above diagram contains equivalent triangles. Writing their equivalent equations will yield us
following result:
\f[disparity = x - x' = \frac{Bf}{Z}\f]
\f$x\f$ and \f$x'\f$ are the distances between the points in the image plane corresponding to the 3D scene point and
their camera centers. \f$B\f$ is the distance between the two cameras (which we know) and \f$f\f$ is the focal
length of the camera (already known). So, in short, the above equation says that the depth of a point in a
scene is inversely proportional to the difference in distance between the corresponding image points and
their camera centers. With this information, we can derive the depth of all pixels in an image.
To do that, we need to find corresponding matches between the two images. We have already seen how the epipolar
constraint makes this operation faster and more accurate. Once the matches are found, the disparity can be computed. Let's see
how we can do it with OpenCV.
Code
----
Below code snippet shows a simple procedure to create a disparity map.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
imgL = cv.imread('tsukuba_l.png',0)
imgR = cv.imread('tsukuba_r.png',0)
stereo = cv.StereoBM_create(numDisparities=16, blockSize=15)
disparity = stereo.compute(imgL,imgR)
plt.imshow(disparity,'gray')
plt.show()
@endcode
The image below contains the original image (left) and its disparity map (right). As you can see, the result
is contaminated with a high degree of noise. By adjusting the values of numDisparities and blockSize,
you can get a better result.
![image](images/disparity_map.jpg)
Once you are familiar with StereoBM, you may need to fine-tune its parameters to get better and smoother results (a tuning sketch follows this list). Parameters:
- texture_threshold: filters out areas that don't have enough texture for reliable matching
- Speckle range and size: Block-based matchers often produce "speckles" near the boundaries of objects, where the matching window catches the foreground on one side and the background on the other. In this scene it appears that the matcher is also finding small spurious matches in the projected texture on the table. To get rid of these artifacts we post-process the disparity image with a speckle filter controlled by the speckle_size and speckle_range parameters. speckle_size is the number of pixels below which a disparity blob is dismissed as "speckle." speckle_range controls how close in value disparities must be to be considered part of the same blob.
- Number of disparities: How many pixels to slide the window over. The larger it is, the larger the range of visible depths, but more computation is required.
- min_disparity: the offset from the x-position of the left pixel at which to begin searching.
- uniqueness_ratio: Another post-filtering step. If the best matching disparity is not sufficiently better than every other disparity in the search range, the pixel is filtered out. You can try tweaking this if texture_threshold and the speckle filtering are still letting through spurious matches.
- prefilter_size and prefilter_cap: The pre-filtering phase, which normalizes image brightness and enhances texture in preparation for block matching. Normally you should not need to adjust these.
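A minimal sketch of adjusting some of these parameters through the StereoBM setters, continuing from the snippet above (imgL and imgR as loaded there; the values are illustrative, not recommendations):
@code{.py}
stereo = cv.StereoBM_create(numDisparities=16, blockSize=15)
stereo.setTextureThreshold(10)     # drop areas without enough texture
stereo.setSpeckleWindowSize(100)   # speckle filter: maximum blob size to dismiss
stereo.setSpeckleRange(32)         # speckle filter: disparity variation within a blob
stereo.setMinDisparity(0)          # offset from the x-position of the left pixel
stereo.setUniquenessRatio(10)      # post-filter on how unique the best match must be
disparity = stereo.compute(imgL, imgR)
@endcode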
Additional Resources
--------------------
- [Ros stereo img processing wiki page](http://wiki.ros.org/stereo_image_proc/Tutorials/ChoosingGoodStereoParameters)
Exercises
---------
-# OpenCV samples contain an example of generating disparity map and its 3D reconstruction. Check
stereo_match.py in OpenCV-Python samples.


View File

@@ -0,0 +1,172 @@
Epipolar Geometry {#tutorial_py_epipolar_geometry}
=================
Goal
----
In this section,
- We will learn about the basics of multiview geometry
- We will see what is epipole, epipolar lines, epipolar constraint etc.
Basic Concepts
--------------
When we take an image using a pin-hole camera, we lose an important piece of information: the depth of the
image, i.e. how far each point in the image is from the camera, because it is a 3D-to-2D conversion. So
it is an important question whether we can find the depth information using these cameras. And the
answer is to use more than one camera. Our eyes work in a similar way: we use two cameras (two
eyes), which is called stereo vision. So let's see what OpenCV provides in this field.
(*Learning OpenCV* by Gary Bradsky has a lot of information in this field.)
Before going to depth images, let's first understand some basic concepts in multiview geometry. In
this section we will deal with epipolar geometry. See the image below which shows a basic setup with
two cameras taking the image of same scene.
![image](images/epipolar.jpg)
If we are using only the left camera, we can't find the 3D point corresponding to the point \f$x\f$ in the
image because every point on the line \f$OX\f$ projects to the same point on the image plane. But
consider the right image also. Now different points on the line \f$OX\f$ project to different points
(\f$x'\f$) in the right plane. So with these two images, we can triangulate the correct 3D point. This is
the whole idea.
The projection of the different points on \f$OX\f$ forms a line on the right plane (line \f$l'\f$). We call it the
**epiline** corresponding to the point \f$x\f$. It means that to find the point \f$x\f$ on the right image,
we search along this epiline. It should be somewhere on this line (think of it this way: to find the
matching point in the other image, you need not search the whole image, just search along the epiline,
which gives better performance and accuracy). This is called the **Epipolar Constraint**. Similarly,
all points will have their corresponding epilines in the other image. The plane \f$XOO'\f$ is called the
**Epipolar Plane**.
\f$O\f$ and \f$O'\f$ are the camera centers. From the setup given above, you can see that the projection of
the right camera \f$O'\f$ is seen on the left image at the point \f$e\f$. It is called the **epipole**. The epipole
is the point of intersection of the line through the camera centers with the image planes. Similarly, \f$e'\f$ is
the epipole of the left camera. In some cases, you won't be able to locate the epipole in the image;
it may be outside the image (which means one camera doesn't see the other).
All the epilines pass through the epipole. So to find the location of the epipole, we can find many
epilines and compute their intersection point.
So in this section, we focus on finding epipolar lines and epipoles. But to find them, we need two
more ingredients, **Fundamental Matrix (F)** and **Essential Matrix (E)**. Essential Matrix contains
the information about translation and rotation, which describe the location of the second camera
relative to the first in global coordinates. See the image below (Image courtesy: Learning OpenCV by
Gary Bradsky):
![image](images/essential_matrix.jpg)
But we prefer measurements to be done in pixel coordinates, right? Fundamental Matrix contains the
same information as Essential Matrix in addition to the information about the intrinsics of both
cameras so that we can relate the two cameras in pixel coordinates. (If we are using rectified
images and normalize the point by dividing by the focal lengths, \f$F=E\f$). In simple words,
Fundamental Matrix F, maps a point in one image to a line (epiline) in the other image. This is
calculated from matching points in both images. A minimum of 8 such points is required to
find the fundamental matrix (when using the 8-point algorithm). More points are preferred, combined with
RANSAC, to get a more robust result.
Code
----
First we need to find as many matches as possible between the two images in order to find the fundamental matrix.
For this, we use SIFT descriptors with a FLANN-based matcher and the ratio test.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img1 = cv.imread('myleft.jpg',0) #queryimage # left image
img2 = cv.imread('myright.jpg',0) #trainimage # right image
sift = cv.SIFT_create()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
# FLANN parameters
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50)
flann = cv.FlannBasedMatcher(index_params,search_params)
matches = flann.knnMatch(des1,des2,k=2)
pts1 = []
pts2 = []
# ratio test as per Lowe's paper
for i,(m,n) in enumerate(matches):
    if m.distance < 0.8*n.distance:
        pts2.append(kp2[m.trainIdx].pt)
        pts1.append(kp1[m.queryIdx].pt)
@endcode
Now we have the list of best matches from both the images. Let's find the Fundamental Matrix.
@code{.py}
pts1 = np.int32(pts1)
pts2 = np.int32(pts2)
F, mask = cv.findFundamentalMat(pts1,pts2,cv.FM_LMEDS)
# We select only inlier points
pts1 = pts1[mask.ravel()==1]
pts2 = pts2[mask.ravel()==1]
@endcode
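As mentioned above, RANSAC can be used instead of LMedS for more robustness against outliers; a minimal sketch of the same call (the threshold and confidence values are illustrative):
@code{.py}
F, mask = cv.findFundamentalMat(pts1, pts2, cv.FM_RANSAC, 3.0, 0.99)
@endcode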
Next we find the epilines. Epilines corresponding to the points in the first image are drawn on the second
image, so passing the correct images matters here. We get an array of lines, so we define a
new function to draw these lines on the images.
@code{.py}
def drawlines(img1,img2,lines,pts1,pts2):
    ''' img1 - image on which we draw the epilines for the points in img2
        lines - corresponding epilines '''
    r,c = img1.shape
    img1 = cv.cvtColor(img1,cv.COLOR_GRAY2BGR)
    img2 = cv.cvtColor(img2,cv.COLOR_GRAY2BGR)
    for r,pt1,pt2 in zip(lines,pts1,pts2):
        color = tuple(np.random.randint(0,255,3).tolist())
        x0,y0 = map(int, [0, -r[2]/r[1] ])
        x1,y1 = map(int, [c, -(r[2]+r[0]*c)/r[1] ])
        img1 = cv.line(img1, (x0,y0), (x1,y1), color,1)
        img1 = cv.circle(img1,tuple(pt1),5,color,-1)
        img2 = cv.circle(img2,tuple(pt2),5,color,-1)
    return img1,img2
@endcode
Now we find the epilines in both the images and draw them.
@code{.py}
# Find epilines corresponding to points in right image (second image) and
# drawing its lines on left image
lines1 = cv.computeCorrespondEpilines(pts2.reshape(-1,1,2), 2,F)
lines1 = lines1.reshape(-1,3)
img5,img6 = drawlines(img1,img2,lines1,pts1,pts2)
# Find epilines corresponding to points in left image (first image) and
# drawing its lines on right image
lines2 = cv.computeCorrespondEpilines(pts1.reshape(-1,1,2), 1,F)
lines2 = lines2.reshape(-1,3)
img3,img4 = drawlines(img2,img1,lines2,pts2,pts1)
plt.subplot(121),plt.imshow(img5)
plt.subplot(122),plt.imshow(img3)
plt.show()
@endcode
Below is the result we get:
![image](images/epiresult.jpg)
You can see in the left image that all epilines converge at a point outside the image on the right
side. That meeting point is the epipole.
For better results, images with good resolution and many non-planar points should be used.
Additional Resources
--------------------
Exercises
---------
-# One important topic is the forward movement of camera. Then epipoles will be seen at the same
locations in both with epilines emerging from a fixed point. [See this
discussion](http://answers.opencv.org/question/17912/location-of-epipole/).
2. Fundamental Matrix estimation is sensitive to quality of matches, outliers etc. It becomes worse
when all selected matches lie on the same plane. [Check this
discussion](http://answers.opencv.org/question/18125/epilines-not-correct/).


View File

@@ -0,0 +1,127 @@
Pose Estimation {#tutorial_py_pose}
===============
Goal
----
In this section,
- We will learn to exploit calib3d module to create some 3D effects in images.
Basics
------
This is going to be a small section. During the last section on camera calibration, you found
the camera matrix, distortion coefficients, etc. Given a pattern image, we can utilize this
information to calculate its pose, i.e. how the object is situated in space: how it is rotated,
how it is displaced, etc. For a planar object, we can assume Z=0, so that the problem becomes
how the camera is placed in space to see our pattern image. So, if we know how the object lies in
space, we can draw some 2D diagrams on it to simulate the 3D effect. Let's see how to do it.
Our problem is that we want to draw our 3D coordinate axes (X, Y, Z) on our chessboard's first
corner: the X axis in blue, the Y axis in green, and the Z axis in red. So, in effect, the Z axis
should feel like it is perpendicular to our chessboard plane.
First, let's load the camera matrix and distortion coefficients from the previous calibration
result.
@code{.py}
import numpy as np
import cv2 as cv
import glob

# Load previously saved data
with np.load('B.npz') as X:
    mtx, dist, _, _ = [X[i] for i in ('mtx','dist','rvecs','tvecs')]
@endcode
Now let's create a function, draw which takes the corners in the chessboard (obtained using
**cv.findChessboardCorners()**) and **axis points** to draw a 3D axis.
@code{.py}
def draw(img, corners, imgpts):
    corner = tuple(corners[0].ravel())
    img = cv.line(img, corner, tuple(imgpts[0].ravel()), (255,0,0), 5)
    img = cv.line(img, corner, tuple(imgpts[1].ravel()), (0,255,0), 5)
    img = cv.line(img, corner, tuple(imgpts[2].ravel()), (0,0,255), 5)
    return img
@endcode
Then, as in the previous case, we create the termination criteria, object points (3D points of the corners in the
chessboard) and axis points. Axis points are points in 3D space for drawing the axis. We draw axes
of length 3 (units will be in terms of chess square size since we calibrated based on that size). So
our X axis is drawn from (0,0,0) to (3,0,0), and similarly for the Y axis. The Z axis is drawn from (0,0,0) to
(0,0,-3). Negative denotes that it is drawn towards the camera.
@code{.py}
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 30, 0.001)
objp = np.zeros((6*7,3), np.float32)
objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2)
axis = np.float32([[3,0,0], [0,3,0], [0,0,-3]]).reshape(-1,3)
@endcode
Now, as usual, we load each image and search for the 7x6 grid. If it is found, we refine the corners with
sub-pixel accuracy. Then, to calculate the rotation and translation, we use the function
**cv.solvePnP()**. Once we have those transformation vectors, we use them to project our **axis
points** onto the image plane. In simple words, we find the points on the image plane corresponding to
each of (3,0,0), (0,3,0), (0,0,-3) in 3D space. Once we get them, we draw lines from the first corner
to each of these points using our draw() function. Done!
@code{.py}
for fname in glob.glob('left*.jpg'):
    img = cv.imread(fname)
    gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
    ret, corners = cv.findChessboardCorners(gray, (7,6),None)

    if ret == True:
        corners2 = cv.cornerSubPix(gray,corners,(11,11),(-1,-1),criteria)

        # Find the rotation and translation vectors.
        ret,rvecs, tvecs = cv.solvePnP(objp, corners2, mtx, dist)

        # project 3D points to image plane
        imgpts, jac = cv.projectPoints(axis, rvecs, tvecs, mtx, dist)

        img = draw(img,corners2,imgpts)
        cv.imshow('img',img)
        k = cv.waitKey(0) & 0xFF
        if k == ord('s'):
            cv.imwrite(fname[:6]+'.png', img)

cv.destroyAllWindows()
@endcode
See some results below. Notice that each axis is 3 squares long.:
![image](images/pose_1.jpg)
### Render a Cube
If you want to draw a cube, modify the draw() function and axis points as follows.
Modified draw() function:
@code{.py}
def draw(img, corners, imgpts):
    imgpts = np.int32(imgpts).reshape(-1,2)

    # draw ground floor in green
    img = cv.drawContours(img, [imgpts[:4]],-1,(0,255,0),-3)

    # draw pillars in blue color
    for i,j in zip(range(4),range(4,8)):
        img = cv.line(img, tuple(imgpts[i]), tuple(imgpts[j]),(255),3)

    # draw top layer in red color
    img = cv.drawContours(img, [imgpts[4:]],-1,(0,0,255),3)

    return img
@endcode
Modified axis points. They are the 8 corners of a cube in 3D space:
@code{.py}
axis = np.float32([[0,0,0], [0,3,0], [3,3,0], [3,0,0],
[0,0,-3],[0,3,-3],[3,3,-3],[3,0,-3] ])
@endcode
And look at the result below:
![image](images/pose_2.jpg)
If you are interested in graphics, augmented reality etc, you can use OpenGL to render more
complicated figures.
Additional Resources
--------------------
Exercises
---------

View File

@@ -0,0 +1,22 @@
Camera Calibration and 3D Reconstruction {#tutorial_py_table_of_contents_calib3d}
========================================
- @subpage tutorial_py_calibration
Let's find out how good
our camera is. Is there any distortion in images taken with it? If so, how do we correct it?
- @subpage tutorial_py_pose
This is a small
section which will help you to create some cool 3D effects with calib module.
- @subpage tutorial_py_epipolar_geometry
Let's understand
epipolar geometry and epipolar constraint.
- @subpage tutorial_py_depthmap
Extract depth
information from 2D images.


View File

@@ -0,0 +1,201 @@
Basic Operations on Images {#tutorial_py_basic_ops}
==========================
Goal
----
Learn to:
- Access pixel values and modify them
- Access image properties
- Set a Region of Interest (ROI)
- Split and merge images
Almost all the operations in this section are mainly related to Numpy rather than OpenCV. A good
knowledge of Numpy is required to write better optimized code with OpenCV.
*( Examples will be shown in a Python terminal, since most of them are just single lines of code )*
Accessing and Modifying pixel values
------------------------------------
Let's load a color image first:
@code{.py}
>>> import numpy as np
>>> import cv2 as cv
>>> img = cv.imread('messi5.jpg')
@endcode
You can access a pixel value by its row and column coordinates. For BGR image, it returns an array
of Blue, Green, Red values. For grayscale image, just corresponding intensity is returned.
@code{.py}
>>> px = img[100,100]
>>> print( px )
[157 166 200]
# accessing only blue pixel
>>> blue = img[100,100,0]
>>> print( blue )
157
@endcode
You can modify the pixel values the same way.
@code{.py}
>>> img[100,100] = [255,255,255]
>>> print( img[100,100] )
[255 255 255]
@endcode
**Warning**
Numpy is an optimized library for fast array calculations. So simply accessing each and every pixel
value and modifying it will be very slow and it is discouraged.
@note The above method is normally used for selecting a region of an array, say the first 5 rows
and last 3 columns. For individual pixel access, the Numpy array methods, array.item() and
array.itemset() are considered better. They always return a scalar, however, so if you want to access
all the B,G,R values, you will need to call array.item() separately for each value.
Better pixel accessing and editing method :
@code{.py}
# accessing RED value
>>> img.item(10,10,2)
59
# modifying RED value
>>> img.itemset((10,10,2),100)
>>> img.item(10,10,2)
100
@endcode
Accessing Image Properties
--------------------------
Image properties include number of rows, columns, and channels; type of image data; number of pixels; etc.
The shape of an image is accessed by img.shape. It returns a tuple of the number of rows, columns, and channels
(if the image is color):
@code{.py}
>>> print( img.shape )
(342, 548, 3)
@endcode
@note If an image is grayscale, the tuple returned contains only the number of rows
and columns, so it is a good method to check whether the loaded image is grayscale or color.
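A minimal sketch of that check (assuming `img` was loaded with cv.imread):
@code{.py}
if len(img.shape) == 3:
    print('color image with', img.shape[2], 'channels')
else:
    print('grayscale image')
@endcode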
Total number of pixels is accessed by `img.size`:
@code{.py}
>>> print( img.size )
562248
@endcode
Image datatype is obtained by \`img.dtype\`:
@code{.py}
>>> print( img.dtype )
uint8
@endcode
@note img.dtype is very important while debugging because a large number of errors in OpenCV-Python
code are caused by invalid datatype.
Image ROI
---------
Sometimes, you will have to play with certain regions of images. For eye detection in images, first
face detection is done over the entire image. When a face is obtained, we select the face region alone
and search for eyes inside it instead of searching the whole image. It improves accuracy (because eyes
are always on faces :D ) and performance (because we search in a small area).
ROI is again obtained using Numpy indexing. Here I am selecting the ball and copying it to another
region in the image:
@code{.py}
>>> ball = img[280:340, 330:390]
>>> img[273:333, 100:160] = ball
@endcode
Check the results below:
![image](images/roi.jpg)
Splitting and Merging Image Channels
------------------------------------
Sometimes you will need to work separately on the B,G,R channels of an image. In this case, you need
to split the BGR image into single channels. In other cases, you may need to join these individual
channels to create a BGR image. You can do this simply by:
@code{.py}
>>> b,g,r = cv.split(img)
>>> img = cv.merge((b,g,r))
@endcode
Or
@code
>>> b = img[:,:,0]
@endcode
Suppose you want to set all the red pixels to zero - you do not need to split the channels first.
Numpy indexing is faster:
@code{.py}
>>> img[:,:,2] = 0
@endcode
**Warning**
cv.split() is a costly operation (in terms of time). So use it only if necessary. Otherwise go
for Numpy indexing.
Making Borders for Images (Padding)
-----------------------------------
If you want to create a border around an image, something like a photo frame, you can use
**cv.copyMakeBorder()**. But it has more applications for convolution operation, zero
padding etc. This function takes following arguments:
- **src** - input image
- **top**, **bottom**, **left**, **right** - border width in number of pixels in corresponding
directions
- **borderType** - Flag defining what kind of border to be added. It can be following types:
- **cv.BORDER_CONSTANT** - Adds a constant colored border. The value should be given
as next argument.
- **cv.BORDER_REFLECT** - Border will be mirror reflection of the border elements,
like this : *fedcba|abcdefgh|hgfedcb*
- **cv.BORDER_REFLECT_101** or **cv.BORDER_DEFAULT** - Same as above, but with a
slight change, like this : *gfedcb|abcdefgh|gfedcba*
- **cv.BORDER_REPLICATE** - Last element is replicated throughout, like this:
*aaaaaa|abcdefgh|hhhhhhh*
- **cv.BORDER_WRAP** - Can't explain, it will look like this :
*cdefgh|abcdefgh|abcdefg*
- **value** - Color of border if border type is cv.BORDER_CONSTANT
Below is a sample code demonstrating all these border types for better understanding:
@code{.py}
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt
BLUE = [255,0,0]
img1 = cv.imread('opencv-logo.png')
replicate = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_REPLICATE)
reflect = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_REFLECT)
reflect101 = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_REFLECT_101)
wrap = cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_WRAP)
constant= cv.copyMakeBorder(img1,10,10,10,10,cv.BORDER_CONSTANT,value=BLUE)
plt.subplot(231),plt.imshow(img1,'gray'),plt.title('ORIGINAL')
plt.subplot(232),plt.imshow(replicate,'gray'),plt.title('REPLICATE')
plt.subplot(233),plt.imshow(reflect,'gray'),plt.title('REFLECT')
plt.subplot(234),plt.imshow(reflect101,'gray'),plt.title('REFLECT_101')
plt.subplot(235),plt.imshow(wrap,'gray'),plt.title('WRAP')
plt.subplot(236),plt.imshow(constant,'gray'),plt.title('CONSTANT')
plt.show()
@endcode
See the result below. (Image is displayed with matplotlib. So RED and BLUE channels will be
interchanged):
![image](images/border.jpg)
Additional Resources
--------------------
Exercises
---------


View File

@@ -0,0 +1,116 @@
Arithmetic Operations on Images {#tutorial_py_image_arithmetics}
===============================
Goal
----
- Learn several arithmetic operations on images, like addition, subtraction, bitwise operations, and etc.
- Learn these functions: **cv.add()**, **cv.addWeighted()**, etc.
Image Addition
--------------
You can add two images with the OpenCV function, cv.add(), or simply by the numpy operation
res = img1 + img2. Both images should be of the same depth and type, or the second image can just be a
scalar value.
@note There is a difference between OpenCV addition and Numpy addition. OpenCV addition is a
saturated operation while Numpy addition is a modulo operation.
For example, consider the below sample:
@code{.py}
>>> x = np.uint8([250])
>>> y = np.uint8([10])
>>> print( cv.add(x,y) ) # 250+10 = 260 => 255
[[255]]
>>> print( x+y ) # 250+10 = 260 % 256 = 4
[4]
@endcode
This will be more visible when you add two images. Stick with OpenCV functions, because they will provide a better result.
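As a quick check on whole arrays rather than single values (synthetic arrays, just for illustration):
@code{.py}
import numpy as np
import cv2 as cv

a = np.full((2, 2), 250, np.uint8)
b = np.full((2, 2), 10, np.uint8)
print( cv.add(a, b) )  # saturated addition: every element is 255
print( a + b )         # modulo addition: every element wraps to 4
@endcode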
Image Blending
--------------
This is also image addition, but different weights are given to images in order to give a feeling of
blending or transparency. Images are added as per the equation below:
\f[g(x) = (1 - \alpha)f_{0}(x) + \alpha f_{1}(x)\f]
By varying \f$\alpha\f$ from \f$0 \rightarrow 1\f$, you can perform a cool transition between one image to
another.
Here I took two images to blend together. The first image is given a weight of 0.7 and the second image
is given 0.3. cv.addWeighted() applies the following equation to the image:
\f[dst = \alpha \cdot img1 + \beta \cdot img2 + \gamma\f]
Here \f$\gamma\f$ is taken as zero.
@code{.py}
img1 = cv.imread('ml.png')
img2 = cv.imread('opencv-logo.png')
dst = cv.addWeighted(img1,0.7,img2,0.3,0)
cv.imshow('dst',dst)
cv.waitKey(0)
cv.destroyAllWindows()
@endcode
Check the result below:
![image](images/blending.jpg)
Bitwise Operations
------------------
This includes the bitwise AND, OR, NOT, and XOR operations. They will be highly useful while extracting
any part of the image (as we will see in coming chapters), defining and working with non-rectangular
ROI's, and etc. Below we will see an example of how to change a particular region of an image.
I want to put the OpenCV logo above an image. If I add two images, it will change the color. If I blend them,
I get a transparent effect. But I want it to be opaque. If it was a rectangular region, I could use
ROI as we did in the last chapter. But the OpenCV logo is not a rectangular shape. So you can do it with
bitwise operations as shown below:
@code{.py}
# Load two images
img1 = cv.imread('messi5.jpg')
img2 = cv.imread('opencv-logo-white.png')
# I want to put logo on top-left corner, So I create a ROI
rows,cols,channels = img2.shape
roi = img1[0:rows, 0:cols]
# Now create a mask of logo and create its inverse mask also
img2gray = cv.cvtColor(img2,cv.COLOR_BGR2GRAY)
ret, mask = cv.threshold(img2gray, 10, 255, cv.THRESH_BINARY)
mask_inv = cv.bitwise_not(mask)
# Now black-out the area of logo in ROI
img1_bg = cv.bitwise_and(roi,roi,mask = mask_inv)
# Take only region of logo from logo image.
img2_fg = cv.bitwise_and(img2,img2,mask = mask)
# Put logo in ROI and modify the main image
dst = cv.add(img1_bg,img2_fg)
img1[0:rows, 0:cols ] = dst
cv.imshow('res',img1)
cv.waitKey(0)
cv.destroyAllWindows()
@endcode
See the result below. Left image shows the mask we created. Right image shows the final result. For
more understanding, display all the intermediate images in the above code, especially img1_bg and
img2_fg.
![image](images/overlay.jpg)
Additional Resources
--------------------
Exercises
---------
-# Create a slide show of images in a folder with smooth transition between images using
cv.addWeighted function

View File

@@ -0,0 +1,167 @@
Performance Measurement and Improvement Techniques {#tutorial_py_optimization}
==================================================
Goal
----
In image processing, since you are dealing with a large number of operations per second, it is mandatory that your code is not only providing the correct solution, but that it is also providing it in the fastest manner.
So in this chapter, you will learn:
- To measure the performance of your code.
- Some tips to improve the performance of your code.
- You will see these functions: **cv.getTickCount**, **cv.getTickFrequency**, etc.
Apart from OpenCV, Python also provides a module **time** which is helpful in measuring the time of
execution. Another module **profile** helps to get a detailed report on the code, like how much time
each function in the code took, how many times the function was called, etc. But, if you are using
IPython, all these features are integrated in an user-friendly manner. We will see some important
ones, and for more details, check links in the **Additional Resources** section.
Measuring Performance with OpenCV
---------------------------------
The **cv.getTickCount** function returns the number of clock-cycles after a reference event (like the
moment the machine was switched ON) to the moment this function is called. So if you call it before and
after the function execution, you get the number of clock-cycles used to execute a function.
The **cv.getTickFrequency** function returns the frequency of clock-cycles, or the number of
clock-cycles per second. So to find the time of execution in seconds, you can do following:
@code{.py}
e1 = cv.getTickCount()
# your code execution
e2 = cv.getTickCount()
time = (e2 - e1)/ cv.getTickFrequency()
@endcode
We will demonstrate with following example. The following example applies median filtering with kernels
of odd sizes ranging from 5 to 49. (Don't worry about what the result will look like - that is not our
goal):
@code{.py}
img1 = cv.imread('messi5.jpg')
e1 = cv.getTickCount()
for i in range(5,49,2):
    img1 = cv.medianBlur(img1,i)
e2 = cv.getTickCount()
t = (e2 - e1)/cv.getTickFrequency()
print( t )
# Result I got is 0.521107655 seconds
@endcode
@note You can do the same thing with the time module. Instead of cv.getTickCount, use the time.time() function.
Then take the difference of the two times.
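A minimal sketch of the same measurement with the time module:
@code{.py}
import time

t1 = time.time()
# your code execution
t2 = time.time()
print( t2 - t1 )  # elapsed time in seconds
@endcode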
Default Optimization in OpenCV
------------------------------
Many of the OpenCV functions are optimized using SSE2, AVX, etc. OpenCV also contains the unoptimized code paths.
So if our system supports these features, we should exploit them (almost all modern-day processors
support them). Optimization is enabled by default at compile time, so OpenCV runs the optimized code if it is
enabled; otherwise it runs the unoptimized code. You can use **cv.useOptimized()** to check if it is
enabled/disabled and **cv.setUseOptimized()** to enable/disable it. Let's see a simple example.
@code{.py}
# check if optimization is enabled
In [5]: cv.useOptimized()
Out[5]: True
In [6]: %timeit res = cv.medianBlur(img,49)
10 loops, best of 3: 34.9 ms per loop
# Disable it
In [7]: cv.setUseOptimized(False)
In [8]: cv.useOptimized()
Out[8]: False
In [9]: %timeit res = cv.medianBlur(img,49)
10 loops, best of 3: 64.1 ms per loop
@endcode
As you can see, optimized median filtering is \~2x faster than the unoptimized version. If you check its source,
you can see that median filtering is SIMD optimized. So you can use this to enable optimization at the
top of your code (remember it is enabled by default).
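For example, a minimal sketch of forcing the optimized paths on at the top of a script:
@code{.py}
import cv2 as cv

# make sure the optimized code paths are enabled (they already are by default)
if not cv.useOptimized():
    cv.setUseOptimized(True)
@endcode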
Measuring Performance in IPython
--------------------------------
Sometimes you may need to compare the performance of two similar operations. IPython gives you a
magic command %timeit to perform this. It runs the code several times to get more accurate results.
Once again, it is best suited to measuring single lines of code.
For example, do you know which of the following squaring operations is faster: x = 5; y = x\*\*2,
x = 5; y = x\*x, x = np.uint8([5]); y = x\*x, or y = np.square(x)? We will find out with %timeit in the
IPython shell.
@code{.py}
In [10]: x = 5
In [11]: %timeit y=x**2
10000000 loops, best of 3: 73 ns per loop
In [12]: %timeit y=x*x
10000000 loops, best of 3: 58.3 ns per loop
In [15]: z = np.uint8([5])
In [17]: %timeit y=z*z
1000000 loops, best of 3: 1.25 us per loop
In [19]: %timeit y=np.square(z)
1000000 loops, best of 3: 1.16 us per loop
@endcode
You can see that, x = 5 ; y = x\*x is fastest and it is around 20x faster compared to Numpy. If you
consider the array creation also, it may reach up to 100x faster. Cool, right? *(Numpy devs are
working on this issue)*
@note Python scalar operations are faster than Numpy scalar operations. So for operations including
one or two elements, Python scalar is better than Numpy arrays. Numpy has the advantage when the size of
the array is a little bit bigger.
We will try one more example. This time, we will compare the performance of **cv.countNonZero()**
and **np.count_nonzero()** for the same image.
@code{.py}
In [35]: %timeit z = cv.countNonZero(img)
100000 loops, best of 3: 15.8 us per loop
In [36]: %timeit z = np.count_nonzero(img)
1000 loops, best of 3: 370 us per loop
@endcode
See, the OpenCV function is nearly 25x faster than the Numpy function.
@note Normally, OpenCV functions are faster than Numpy functions. So for same operation, OpenCV
functions are preferred. But, there can be exceptions, especially when Numpy works with views
instead of copies.
More IPython magic commands
---------------------------
There are several other magic commands to measure performance, profiling, line profiling, memory
measurement, and etc. They all are well documented. So only links to those docs are provided here.
Interested readers are recommended to try them out.
Performance Optimization Techniques
-----------------------------------
There are several techniques and coding methods to exploit maximum performance of Python and Numpy.
Only relevant ones are noted here and links are given to important sources. The main thing to be
noted here is, first try to implement the algorithm in a simple manner. Once it is working,
profile it, find the bottlenecks, and optimize them.
-# Avoid using loops in Python as much as possible, especially double/triple loops etc. They are
inherently slow.
2. Vectorize the algorithm/code to the maximum extent possible, because Numpy and OpenCV are
optimized for vector operations.
3. Exploit the cache coherence.
4. Never make copies of an array unless it is necessary. Try to use views instead. Array copying is a
costly operation.
If your code is still slow after doing all of these operations, or if the use of large loops is inevitable, use additional libraries like Cython to make it faster.
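As a rough illustration of points 1 and 2 above, here is a hedged sketch of replacing an explicit Python loop with a single vectorized Numpy expression; the array and threshold are arbitrary, and you can time each variant with %timeit.
@code{.py}
import numpy as np

img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)

# Loop version: visits every pixel from Python, which is slow.
thresh_loop = np.empty_like(img)
for i in range(img.shape[0]):
    for j in range(img.shape[1]):
        thresh_loop[i, j] = 255 if img[i, j] > 127 else 0

# Vectorized version: one Numpy expression, orders of magnitude faster.
thresh_vec = np.where(img > 127, 255, 0).astype(np.uint8)

print(np.array_equal(thresh_loop, thresh_vec))   # True
@endcode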
Additional Resources
--------------------
-# [Python Optimization Techniques](http://wiki.python.org/moin/PythonSpeed/PerformanceTips)
2. Scipy Lecture Notes - [Advanced
Numpy](http://scipy-lectures.github.io/advanced/advanced_numpy/index.html#advanced-numpy)
3. [Timing and Profiling in IPython](http://pynash.org/2013/03/06/timing-and-profiling/)
Exercises
---------

View File

@ -0,0 +1,18 @@
Core Operations {#tutorial_py_table_of_contents_core}
===============
- @subpage tutorial_py_basic_ops
Learn to read and
edit pixel values, working with image ROI and other basic operations.
- @subpage tutorial_py_image_arithmetics
Perform arithmetic
operations on images
- @subpage tutorial_py_optimization
Getting a solution is
important. But getting it in the fastest way is more important. Learn to check the speed of your
code, optimize the code etc.

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.7 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.1 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.7 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.8 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.6 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 5.4 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 7.1 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.7 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.4 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.4 KiB

View File

@ -0,0 +1,92 @@
BRIEF (Binary Robust Independent Elementary Features) {#tutorial_py_brief}
=====================================================
Goal
----
In this chapter
- We will see the basics of BRIEF algorithm
Theory
------
We know SIFT uses a 128-dim vector for descriptors. Since it uses floating point numbers, it takes
basically 512 bytes. Similarly, SURF also takes a minimum of 256 bytes (for 64-dim). Creating such a
vector for thousands of features takes a lot of memory, which is not feasible for resource-constrained
applications, especially embedded systems. The larger the memory, the longer the time it takes for
matching.
But all these dimensions may not be needed for actual matching. We can compress them using several
methods like PCA, LDA etc. Even other methods, like hashing using LSH (Locality Sensitive Hashing), are
used to convert these floating-point SIFT descriptors into binary strings. These binary
strings are used to match features using Hamming distance. This provides a better speed-up because
finding the Hamming distance is just applying XOR and a bit count, which are very fast on modern CPUs with
SSE instructions. But here, we still need to find the descriptors first, and only then can we apply hashing,
which doesn't solve our initial problem of memory.
This is where BRIEF comes into the picture. It provides a shortcut to find the binary strings directly
without computing full descriptors. It takes a smoothed image patch and selects a set of \f$n_d\f$ (x,y)
location pairs in a unique way (explained in the paper). Then some pixel intensity comparisons are done
on these location pairs. For example, let the first location pair be \f$p\f$ and \f$q\f$. If \f$I(p) < I(q)\f$, the
result is 1, else it is 0. This is applied to all the \f$n_d\f$ location pairs to get an
\f$n_d\f$-dimensional bitstring.
This \f$n_d\f$ can be 128, 256 or 512. OpenCV supports all of these, but by default it is 256
(OpenCV represents it in bytes, so the values are 16, 32 and 64). Once you have this, you can
use Hamming distance to match these descriptors.
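As a toy illustration of the idea (not the actual sampling pattern from the paper, and not how OpenCV implements it), the binary tests on a smoothed patch could look like this:
@code{.py}
import numpy as np

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, (31, 31))         # a smoothed image patch (toy data)

n_d = 256                                      # descriptor length in bits
# Randomly chosen (x, y) location pairs; the real pattern is fixed and chosen offline.
p = rng.integers(0, 31, (n_d, 2))
q = rng.integers(0, 31, (n_d, 2))

# tau test: the bit is 1 if I(p) < I(q), else 0
bits = (patch[p[:, 1], p[:, 0]] < patch[q[:, 1], q[:, 0]]).astype(np.uint8)

# Hamming distance between two such bitstrings is just XOR + popcount
other = rng.integers(0, 2, n_d, dtype=np.uint8)
hamming = int(np.count_nonzero(bits ^ other))
print(bits[:8], hamming)
@endcode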
One important point is that BRIEF is only a feature descriptor; it doesn't provide any method to find the
features. So you will have to use another feature detector like SIFT, SURF etc. The paper
recommends using CenSurE, which is a fast detector, and BRIEF works even slightly better for CenSurE
points than for SURF points.
In short, BRIEF is a faster method for feature descriptor calculation and matching. It also provides a
high recognition rate unless there is large in-plane rotation.
STAR(CenSurE) in OpenCV
------
STAR is a feature detector derived from CenSurE.
Unlike CenSurE, however, which uses polygons such as squares, hexagons and octagons to approximate a circle,
Star emulates a circle with 2 overlapping squares: 1 upright and 1 rotated by 45 degrees. These polygons are bi-level.
They can be seen as polygons with thick borders. The borders and the enclosed area have weights of opposing signs.
This has better computational characteristics than other scale-space detectors and it is capable of real-time implementation.
In contrast to SIFT and SURF, which find extrema at sub-sampled pixels, compromising accuracy at larger scales,
CenSurE creates a feature vector using full spatial resolution at all scales in the pyramid.
BRIEF in OpenCV
---------------
The code below shows the computation of BRIEF descriptors with the help of the CenSurE (STAR) detector.
Note that you need [opencv contrib](https://github.com/opencv/opencv_contrib) to use this.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('simple.jpg',0)
# Initiate FAST detector
star = cv.xfeatures2d.StarDetector_create()
# Initiate BRIEF extractor
brief = cv.xfeatures2d.BriefDescriptorExtractor_create()
# find the keypoints with STAR
kp = star.detect(img,None)
# compute the descriptors with BRIEF
kp, des = brief.compute(img, kp)
print( brief.descriptorSize() )
print( des.shape )
@endcode
The function brief.descriptorSize() gives the \f$n_d\f$ size used, in bytes. By default it is 32. The next
step is matching, which will be covered in another chapter.
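As a quick preview, since BRIEF produces binary strings, matching would use Hamming distance. Below is a minimal sketch; des1 and des2 are assumed to be descriptors computed as `des` above for two different images.
@code{.py}
# Minimal sketch: match BRIEF descriptors from two images with Hamming distance.
# des1 and des2 are assumed to be computed as `des` above, one per image.
bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)
matches = bf.match(des1, des2)
matches = sorted(matches, key=lambda m: m.distance)
print(len(matches), matches[0].distance if matches else None)
@endcode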
Additional Resources
--------------------
-# Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua, "BRIEF: Binary Robust
Independent Elementary Features", 11th European Conference on Computer Vision (ECCV), Heraklion,
Crete. LNCS Springer, September 2010.
2. [LSH (Locality Sensitive Hashing)](https://en.wikipedia.org/wiki/Locality-sensitive_hashing) at wikipedia.

Binary file not shown.

After

Width:  |  Height:  |  Size: 6.2 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 25 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 17 KiB

View File

@ -0,0 +1,143 @@
FAST Algorithm for Corner Detection {#tutorial_py_fast}
===================================
Goal
----
In this chapter,
- We will understand the basics of FAST algorithm
- We will find corners using OpenCV functionalities for FAST algorithm.
Theory
------
We saw several feature detectors and many of them are really good. But when looking from a real-time
application point of view, they are not fast enough. One good example would be a SLAM (Simultaneous
Localization and Mapping) mobile robot, which has limited computational resources.
As a solution to this, the FAST (Features from Accelerated Segment Test) algorithm was proposed by
Edward Rosten and Tom Drummond in their paper "Machine learning for high-speed corner detection" in
2006 (later revised in 2010). A basic summary of the algorithm is presented below. Refer to the original
paper for more details (all the images are taken from the original paper).
### Feature Detection using FAST
-# Select a pixel \f$p\f$ in the image which is to be identified as an interest point or not. Let its
intensity be \f$I_p\f$.
2. Select appropriate threshold value \f$t\f$.
3. Consider a circle of 16 pixels around the pixel under test. (See the image below)
![image](images/fast_speedtest.jpg)
-# Now the pixel \f$p\f$ is a corner if there exists a set of \f$n\f$ contiguous pixels in the circle (of
16 pixels) which are all brighter than \f$I_p + t\f$, or all darker than \f$I_p - t\f$. (Shown as white
dashed lines in the above image). \f$n\f$ was chosen to be 12.
5. A **high-speed test** was proposed to exclude a large number of non-corners. This test examines
only the four pixels at 1, 9, 5 and 13 (first, 1 and 9 are tested to see whether they are too bright or
too dark; if so, 5 and 13 are checked). If \f$p\f$ is a corner, then at least three of these must all
be brighter than \f$I_p + t\f$ or darker than \f$I_p - t\f$. If neither of these is the case, then \f$p\f$
cannot be a corner. The full segment test criterion can then be applied to the passed candidates
by examining all pixels in the circle. This detector in itself exhibits high performance, but
there are several weaknesses:
- It does not reject as many candidates for n \< 12.
- The choice of pixels is not optimal because its efficiency depends on ordering of the
questions and distribution of corner appearances.
- Results of high-speed tests are thrown away.
- Multiple features are detected adjacent to one another.
The first 3 points are addressed with a machine learning approach. The last one is addressed using
non-maximal suppression.
### Machine Learning a Corner Detector
-# Select a set of images for training (preferably from the target application domain)
2. Run the FAST algorithm on every image to find feature points.
3. For every feature point, store the 16 pixels around it as a vector. Do this for all the images to
get the feature vector \f$P\f$.
4. Each pixel (say \f$x\f$) in these 16 pixels can have one of the following three states:
![image](images/fast_eqns.jpg)
-# Depending on these states, the feature vector \f$P\f$ is subdivided into 3 subsets, \f$P_d\f$, \f$P_s\f$,
\f$P_b\f$.
6. Define a new boolean variable, \f$K_p\f$, which is true if \f$p\f$ is a corner and false otherwise.
7. Use the ID3 algorithm (decision tree classifier) to query each subset using the variable \f$K_p\f$
for the knowledge about the true class. It selects the \f$x\f$ which yields the most information
about whether the candidate pixel is a corner, measured by the entropy of \f$K_p\f$.
8. This is recursively applied to all the subsets until its entropy is zero.
9. The decision tree so created is used for fast detection in other images.
### Non-maximal Suppression
Detecting multiple interest points in adjacent locations is another problem. It is solved by using
Non-maximum Suppression.
-# Compute a score function, \f$V\f$, for all the detected feature points. \f$V\f$ is the sum of the absolute
differences between \f$p\f$ and the values of the 16 surrounding pixels.
2. Consider two adjacent keypoints and compute their \f$V\f$ values.
3. Discard the one with the lower \f$V\f$ value.
### Summary
It is several times faster than other existing corner detectors.
But it is not robust to high levels of noise. It is dependent on a threshold.
FAST Feature Detector in OpenCV
-------------------------------
It is called like any other feature detector in OpenCV. If you want, you can specify the threshold,
whether non-maximum suppression is to be applied or not, the neighborhood to be used etc.
For the neighborhood, three flags are defined: cv.FAST_FEATURE_DETECTOR_TYPE_5_8,
cv.FAST_FEATURE_DETECTOR_TYPE_7_12 and cv.FAST_FEATURE_DETECTOR_TYPE_9_16. Below is a
simple code example on how to detect and draw the FAST feature points.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('blox.jpg',0) # `<opencv_root>/samples/data/blox.jpg`
# Initiate FAST object with default values
fast = cv.FastFeatureDetector_create()
# find and draw the keypoints
kp = fast.detect(img,None)
img2 = cv.drawKeypoints(img, kp, None, color=(255,0,0))
# Print all default params
print( "Threshold: {}".format(fast.getThreshold()) )
print( "nonmaxSuppression:{}".format(fast.getNonmaxSuppression()) )
print( "neighborhood: {}".format(fast.getType()) )
print( "Total Keypoints with nonmaxSuppression: {}".format(len(kp)) )
cv.imwrite('fast_true.png', img2)
# Disable nonmaxSuppression
fast.setNonmaxSuppression(0)
kp = fast.detect(img, None)
print( "Total Keypoints without nonmaxSuppression: {}".format(len(kp)) )
img3 = cv.drawKeypoints(img, kp, None, color=(255,0,0))
cv.imwrite('fast_false.png', img3)
@endcode
See the results. First image shows FAST with nonmaxSuppression and second one without
nonmaxSuppression:
![image](images/fast_kp.jpg)
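If the defaults do not suit your image, the threshold and the neighbourhood type mentioned above can also be changed on the same detector object; a short sketch follows, with arbitrary example values.
@code{.py}
# Adjust parameters on the detector created above (values here are arbitrary examples).
fast.setThreshold(20)                                   # intensity difference t
fast.setNonmaxSuppression(True)
fast.setType(cv.FAST_FEATURE_DETECTOR_TYPE_7_12)        # 12-pixel circle variant
kp = fast.detect(img, None)
print( "Keypoints with 7_12 neighbourhood: {}".format(len(kp)) )
@endcode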
Additional Resources
--------------------
-# Edward Rosten and Tom Drummond, “Machine learning for high speed corner detection” in 9th
European Conference on Computer Vision, vol. 1, 2006, pp. 430-443.
2. Edward Rosten, Reid Porter, and Tom Drummond, "Faster and better: a machine learning approach to
corner detection" in IEEE Trans. Pattern Analysis and Machine Intelligence, 2010, vol 32, pp.
105-119.
Exercises
---------

Binary file not shown.

After

Width:  |  Height:  |  Size: 31 KiB

View File

@ -0,0 +1,110 @@
Feature Matching + Homography to find Objects {#tutorial_py_feature_homography}
=============================================
Goal
----
In this chapter,
- We will mix up the feature matching and findHomography from calib3d module to find known
objects in a complex image.
Basics
------
So what did we do in the last session? We used a queryImage, found some feature points in it, we took
another trainImage, found the features in that image too, and we found the best matches among them.
In short, we found the locations of some parts of an object in another cluttered image. This information
is sufficient to find the object exactly in the trainImage.
For that, we can use a function from the calib3d module, i.e. **cv.findHomography()**. If we pass the set
of points from both the images, it will find the perspective transformation of that object. Then we
can use **cv.perspectiveTransform()** to find the object. It needs at least four correct points to
find the transformation.
We have seen that there can be some possible errors while matching which may affect the result. To
solve this problem, the algorithm uses RANSAC or LEAST_MEDIAN (which can be selected by the flags). The
good matches which provide a correct estimation are called inliers and the remaining ones are called outliers.
**cv.findHomography()** returns a mask which specifies the inlier and outlier points.
So let's do it !!!
Code
----
First, as usual, let's find SIFT features in images and apply the ratio test to find the best
matches.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
MIN_MATCH_COUNT = 10
img1 = cv.imread('box.png',0) # queryImage
img2 = cv.imread('box_in_scene.png',0) # trainImage
# Initiate SIFT detector
sift = cv.SIFT_create()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks = 50)
flann = cv.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1,des2,k=2)
# store all the good matches as per Lowe's ratio test.
good = []
for m,n in matches:
if m.distance < 0.7*n.distance:
good.append(m)
@endcode
Now we set a condition that at least 10 matches (defined by MIN_MATCH_COUNT) have to be there to
find the object. Otherwise we simply show a message saying not enough matches are present.
If enough matches are found, we extract the locations of the matched keypoints in both the images. They
are passed to find the perspective transformation. Once we get this 3x3 transformation matrix, we use
it to transform the corners of the queryImage to the corresponding points in the trainImage. Then we draw it.
@code{.py}
if len(good)>MIN_MATCH_COUNT:
src_pts = np.float32([ kp1[m.queryIdx].pt for m in good ]).reshape(-1,1,2)
dst_pts = np.float32([ kp2[m.trainIdx].pt for m in good ]).reshape(-1,1,2)
M, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC,5.0)
matchesMask = mask.ravel().tolist()
h,w = img1.shape
pts = np.float32([ [0,0],[0,h-1],[w-1,h-1],[w-1,0] ]).reshape(-1,1,2)
dst = cv.perspectiveTransform(pts,M)
img2 = cv.polylines(img2,[np.int32(dst)],True,255,3, cv.LINE_AA)
else:
print( "Not enough matches are found - {}/{}".format(len(good), MIN_MATCH_COUNT) )
matchesMask = None
@endcode
Finally we draw our inliers (if the object was successfully found) or the matching keypoints (if it failed).
@code{.py}
draw_params = dict(matchColor = (0,255,0), # draw matches in green color
singlePointColor = None,
matchesMask = matchesMask, # draw only inliers
flags = 2)
img3 = cv.drawMatches(img1,kp1,img2,kp2,good,None,**draw_params)
plt.imshow(img3, 'gray'),plt.show()
@endcode
See the result below. The object is marked in white in the cluttered image:
![image](images/homography_findobj.jpg)
Additional Resources
--------------------
Exercises
---------

Binary file not shown.

After

Width:  |  Height:  |  Size: 17 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 34 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 16 KiB

View File

@ -0,0 +1,150 @@
Harris Corner Detection {#tutorial_py_features_harris}
=======================
Goal
----
In this chapter,
- We will understand the concepts behind Harris Corner Detection.
- We will see the following functions: **cv.cornerHarris()**, **cv.cornerSubPix()**
Theory
------
In the last chapter, we saw that corners are regions in the image with large variation in intensity in
all the directions. One early attempt to find these corners was made by **Chris Harris & Mike
Stephens** in their paper **A Combined Corner and Edge Detector** in 1988, so it is now called
the Harris Corner Detector. They took this simple idea to a mathematical form. It basically finds the
difference in intensity for a displacement of \f$(u,v)\f$ in all directions. This is expressed as below:
\f[E(u,v) = \sum_{x,y} \underbrace{w(x,y)}_\text{window function} \, [\underbrace{I(x+u,y+v)}_\text{shifted intensity}-\underbrace{I(x,y)}_\text{intensity}]^2\f]
The window function is either a rectangular window or a Gaussian window which gives weights to pixels
underneath.
We have to maximize this function \f$E(u,v)\f$ for corner detection. That means we have to maximize the
second term. Applying Taylor expansion to the above equation and using some mathematical steps (please
refer to any standard textbook you like for the full derivation), we get the final equation as:
\f[E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}\f]
where
\f[M = \sum_{x,y} w(x,y) \begin{bmatrix}I_x I_x & I_x I_y \\
I_x I_y & I_y I_y \end{bmatrix}\f]
Here, \f$I_x\f$ and \f$I_y\f$ are image derivatives in x and y directions respectively. (These can be easily found
using **cv.Sobel()**).
Then comes the main part. After this, they created a score, basically an equation, which
determines if a window can contain a corner or not.
\f[R = \det(M) - k(\operatorname{trace}(M))^2\f]
where
- \f$\det(M) = \lambda_1 \lambda_2\f$
- \f$\operatorname{trace}(M) = \lambda_1 + \lambda_2\f$
- \f$\lambda_1\f$ and \f$\lambda_2\f$ are the eigenvalues of \f$M\f$
So the magnitudes of these eigenvalues decide whether a region is a corner, an edge, or flat.
- When \f$|R|\f$ is small, which happens when \f$\lambda_1\f$ and \f$\lambda_2\f$ are small, the region is
flat.
- When \f$R<0\f$, which happens when \f$\lambda_1 \gg \lambda_2\f$ or vice versa, the region is an edge.
- When \f$R\f$ is large, which happens when \f$\lambda_1\f$ and \f$\lambda_2\f$ are large and
\f$\lambda_1 \sim \lambda_2\f$, the region is a corner.
It can be represented in a nice picture as follows:
![image](images/harris_region.jpg)
So the result of Harris Corner Detection is a grayscale image with these scores. Thresholding for a
suitable score gives you the corners in the image. We will do it with a simple image.
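To make the formula concrete, here is a minimal hand-rolled sketch of the score map \f$R\f$. It is for illustration only; the OpenCV function in the next section does all of this for you, and 'chessboard.png' is just the sample image used below.
@code{.py}
import numpy as np
import cv2 as cv

# Toy Harris response, assuming 'chessboard.png' exists; use cv.cornerHarris() in practice.
gray = cv.imread('chessboard.png', cv.IMREAD_GRAYSCALE).astype(np.float32)
Ix = cv.Sobel(gray, cv.CV_32F, 1, 0, ksize=3)     # image derivative I_x
Iy = cv.Sobel(gray, cv.CV_32F, 0, 1, ksize=3)     # image derivative I_y
# The window function w(x,y): a Gaussian blur sums the products over each neighbourhood.
Ixx = cv.GaussianBlur(Ix*Ix, (5,5), 1)
Iyy = cv.GaussianBlur(Iy*Iy, (5,5), 1)
Ixy = cv.GaussianBlur(Ix*Iy, (5,5), 1)
k = 0.04
detM = Ixx*Iyy - Ixy*Ixy
traceM = Ixx + Iyy
R = detM - k*traceM*traceM   # large positive R -> corner, negative -> edge, small |R| -> flat
@endcode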
Harris Corner Detector in OpenCV
--------------------------------
OpenCV has the function **cv.cornerHarris()** for this purpose. Its arguments are:
- **img** - Input image. It should be grayscale and float32 type.
- **blockSize** - It is the size of neighbourhood considered for corner detection
- **ksize** - Aperture parameter of the Sobel derivative used.
- **k** - Harris detector free parameter in the equation.
See the example below:
@code{.py}
import numpy as np
import cv2 as cv
filename = 'chessboard.png'
img = cv.imread(filename)
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
gray = np.float32(gray)
dst = cv.cornerHarris(gray,2,3,0.04)
#result is dilated for marking the corners, not important
dst = cv.dilate(dst,None)
# Threshold for an optimal value, it may vary depending on the image.
img[dst>0.01*dst.max()]=[0,0,255]
cv.imshow('dst',img)
if cv.waitKey(0) & 0xff == 27:
cv.destroyAllWindows()
@endcode
Below are the three results:
![image](images/harris_result.jpg)
Corner with SubPixel Accuracy
-----------------------------
Sometimes, you may need to find the corners with maximum accuracy. OpenCV comes with a function
**cv.cornerSubPix()** which further refines the corners detected with sub-pixel accuracy. Below is
an example. As usual, we need to find the Harris corners first. Then we pass the centroids of these
corners (There may be a bunch of pixels at a corner, we take their centroid) to refine them. Harris
corners are marked in red pixels and refined corners are marked in green pixels. For this function,
we have to define the criteria when to stop the iteration. We stop it after a specified number of
iterations or a certain accuracy is achieved, whichever occurs first. We also need to define the size
of the neighbourhood it searches for corners.
@code{.py}
import numpy as np
import cv2 as cv
filename = 'chessboard2.jpg'
img = cv.imread(filename)
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
# find Harris corners
gray = np.float32(gray)
dst = cv.cornerHarris(gray,2,3,0.04)
dst = cv.dilate(dst,None)
ret, dst = cv.threshold(dst,0.01*dst.max(),255,0)
dst = np.uint8(dst)
# find centroids
ret, labels, stats, centroids = cv.connectedComponentsWithStats(dst)
# define the criteria to stop and refine the corners
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners = cv.cornerSubPix(gray,np.float32(centroids),(5,5),(-1,-1),criteria)
# Now draw them
res = np.hstack((centroids,corners))
res = np.int0(res)
img[res[:,1],res[:,0]]=[0,0,255]
img[res[:,3],res[:,2]] = [0,255,0]
cv.imwrite('subpixel5.png',img)
@endcode
Below is the result, where some important locations are shown in the zoomed window to visualize:
![image](images/subpixel3.png)
Additional Resources
--------------------
Exercises
---------

Binary file not shown.

After

Width:  |  Height:  |  Size: 49 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 1.0 KiB

View File

@ -0,0 +1,89 @@
Understanding Features {#tutorial_py_features_meaning}
======================
Goal
----
In this chapter, we will just try to understand what features are, why they are important, and why
corners are important.
Explanation
-----------
Most of you will have played jigsaw puzzle games. You get a lot of small pieces of an image,
which you need to assemble correctly to form a big real image. **The question is, how do you do
it?** What about projecting the same theory onto a computer program so that the computer can play
jigsaw puzzles? If the computer can play jigsaw puzzles, why can't we give it a lot of real-life images
of good natural scenery and tell it to stitch all those images into a single big image?
If the computer can stitch several natural images into one, what about giving it a lot of pictures of a
building or any structure and telling it to create a 3D model out of it?
Well, the questions and imaginations continue. But it all depends on the most basic question: How do
you play jigsaw puzzles? How do you arrange lots of scrambled image pieces into a big single image?
How can you stitch a lot of natural images to a single image?
The answer is that we are looking for specific patterns or specific features which are unique, can
be easily tracked and can be easily compared. If we go for a definition of such a feature, we may
find it difficult to express it in words, but we know what they are. If someone asks you to point
out one good feature which can be compared across several images, you can point one out. That is
why even small children can simply play these games. We search for these features in an image,
find them, look for the same features in other images and align them. That's it. (In a jigsaw puzzle,
we look more at the continuity between pieces). All these abilities are present in us inherently.
So our one basic question expands into several, but they become more specific. **What are these
features?** (The answer should be understandable to a computer as well.)
It is difficult to say how humans find these features. This is already programmed in our brain.
But if we look deep into some pictures and search for different patterns, we will find something
interesting. For example, take the image below:
![image](images/feature_building.jpg)
The image is very simple. At the top of the image, six small image patches are given. The question for you is to
find the exact location of these patches in the original image. How many correct results can you
find?
A and B are flat surfaces and they are spread over a lot of area. It is difficult to find the exact
location of these patches.
C and D are much simpler. They are edges of the building. You can find an approximate location,
but the exact location is still difficult. This is because the pattern is the same everywhere along the edge;
it only changes perpendicular to the edge. An edge is therefore a better feature compared to a flat area, but
not good enough (though it is good in a jigsaw puzzle for comparing the continuity of edges).
Finally, E and F are some corners of the building. And they can be easily found. Because at the
corners, wherever you move this patch, it will look different. So they can be considered as good
features. So now we move to a simpler (and widely used) image for better understanding.
![image](images/feature_simple.png)
Just like above, the blue patch is a flat area and is difficult to find and track. Wherever you move the blue
patch, it looks the same. The black patch has an edge. If you move it in the vertical direction (i.e.
along the gradient) it changes. Moved along the edge (parallel to the edge), it looks the same. And the
red patch is a corner. Wherever you move the patch, it looks different, which means it is unique. So,
basically, corners are considered to be good features in an image. (Not just corners, in some cases
blobs are considered good features too).
So now we have answered our question, "what are these features?". But the next question arises: how do we
find them? Or how do we find the corners? We answered that in an intuitive way, i.e., look for
the regions in images which have maximum variation when moved (by a small amount) in all regions
around them. This will be projected into computer language in the coming chapters. So finding these image
features is called **Feature Detection**.
We have found the features in the images. Once you have found them, you should be able to find the same
features in other images. How is this done? We take a region around the feature and describe it in our own
words, like "upper part is blue sky, lower part is region from a building, on that building there is
glass etc", and then search for the same area in the other images. Basically, you are describing the
feature. Similarly, a computer should also describe the region around the feature so that it can
find it in other images. This description is called **Feature Description**. Once you have the
features and their description, you can find the same features in all images and align them, stitch them together,
or do whatever you want.
So in this module, we are looking at different algorithms in OpenCV to find features, describe them,
match them etc.
Additional Resources
--------------------
Exercises
---------

Binary file not shown.

After

Width:  |  Height:  |  Size: 34 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 31 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 22 KiB

View File

@ -0,0 +1,217 @@
Feature Matching {#tutorial_py_matcher}
================
Goal
----
In this chapter
- We will see how to match features in one image with others.
- We will use the Brute-Force matcher and FLANN Matcher in OpenCV
Basics of Brute-Force Matcher
-----------------------------
The Brute-Force matcher is simple. It takes the descriptor of one feature in the first set and matches it
with all other features in the second set using some distance calculation. And the closest one is
returned.
For the BF matcher, first we have to create the BFMatcher object using **cv.BFMatcher()**. It takes two
optional params. The first one is normType. It specifies the distance measurement to be used. By
default, it is cv.NORM_L2. It is good for SIFT, SURF etc (cv.NORM_L1 is also there). For binary
string based descriptors like ORB, BRIEF, BRISK etc, cv.NORM_HAMMING should be used, which uses
Hamming distance as the measurement. If ORB is using WTA_K == 3 or 4, cv.NORM_HAMMING2 should be
used.
The second param is a boolean variable, crossCheck, which is false by default. If it is true, the Matcher
returns only those matches with value (i,j) such that the i-th descriptor in set A has the j-th descriptor
in set B as the best match and vice-versa. That is, the two features in both sets should match each
other. It provides consistent results, and is a good alternative to the ratio test proposed by D. Lowe in
the SIFT paper.
Once it is created, two important methods are *BFMatcher.match()* and *BFMatcher.knnMatch()*. The first
one returns the best match. The second method returns the k best matches, where k is specified by the user.
It may be useful when we need to do additional work on them.
Like we used cv.drawKeypoints() to draw keypoints, **cv.drawMatches()** helps us to draw the
matches. It stacks two images horizontally and draws lines from the first image to the second showing the
best matches. There is also **cv.drawMatchesKnn** which draws all the k best matches. If k=2, it
will draw two match-lines for each keypoint. So we have to pass a mask if we want to draw
them selectively.
Let's see one example for each of SIFT and ORB (Both use different distance measurements).
### Brute-Force Matching with ORB Descriptors
Here, we will see a simple example on how to match features between two images. In this case, I have
a queryImage and a trainImage. We will try to find the queryImage in trainImage using feature
matching. ( The images are /samples/data/box.png and /samples/data/box_in_scene.png)
We are using ORB descriptors to match features. So let's start with loading images, finding
descriptors etc.
@code{.py}
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
img1 = cv.imread('box.png',cv.IMREAD_GRAYSCALE) # queryImage
img2 = cv.imread('box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage
# Initiate ORB detector
orb = cv.ORB_create()
# find the keypoints and descriptors with ORB
kp1, des1 = orb.detectAndCompute(img1,None)
kp2, des2 = orb.detectAndCompute(img2,None)
@endcode
Next we create a BFMatcher object with distance measurement cv.NORM_HAMMING (since we are using
ORB) and crossCheck switched on for better results. Then we use the Matcher.match() method to get the
best matches between the two images. We sort them in ascending order of their distances so that the best matches
(with low distance) come to the front. Then we draw only the first 10 matches (just for the sake of visibility;
you can increase it as you like).
@code{.py}
# create BFMatcher object
bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)
# Match descriptors.
matches = bf.match(des1,des2)
# Sort them in the order of their distance.
matches = sorted(matches, key = lambda x:x.distance)
# Draw first 10 matches.
img3 = cv.drawMatches(img1,kp1,img2,kp2,matches[:10],None,flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
plt.imshow(img3),plt.show()
@endcode
Below is the result I got:
![image](images/matcher_result1.jpg)
### What is this Matcher Object?
The result of the matches = bf.match(des1,des2) line is a list of DMatch objects. A DMatch object has the
following attributes:
- DMatch.distance - Distance between descriptors. The lower, the better it is.
- DMatch.trainIdx - Index of the descriptor in train descriptors
- DMatch.queryIdx - Index of the descriptor in query descriptors
- DMatch.imgIdx - Index of the train image.
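For example, the attributes of the best match found above can be inspected like this; a small sketch continuing the ORB example, where matches, des1/des2 and kp1/kp2 come from the code above.
@code{.py}
best = matches[0]                      # matches were sorted by distance above
print(best.distance)                   # smaller is better
print(best.queryIdx, best.trainIdx)    # indices into des1 / des2 (and kp1 / kp2)
print(kp1[best.queryIdx].pt, kp2[best.trainIdx].pt)   # matched keypoint coordinates
@endcode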
### Brute-Force Matching with SIFT Descriptors and Ratio Test
This time, we will use BFMatcher.knnMatch() to get k best matches. In this example, we will take k=2
so that we can apply ratio test explained by D.Lowe in his paper.
@code{.py}
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
img1 = cv.imread('box.png',cv.IMREAD_GRAYSCALE) # queryImage
img2 = cv.imread('box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage
# Initiate SIFT detector
sift = cv.SIFT_create()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
# BFMatcher with default params
bf = cv.BFMatcher()
matches = bf.knnMatch(des1,des2,k=2)
# Apply ratio test
good = []
for m,n in matches:
if m.distance < 0.75*n.distance:
good.append([m])
# cv.drawMatchesKnn expects list of lists as matches.
img3 = cv.drawMatchesKnn(img1,kp1,img2,kp2,good,None,flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
plt.imshow(img3),plt.show()
@endcode
See the result below:
![image](images/matcher_result2.jpg)
FLANN based Matcher
-------------------
FLANN stands for Fast Library for Approximate Nearest Neighbors. It contains a collection of
algorithms optimized for fast nearest neighbor search in large datasets and for high dimensional
features. It works faster than BFMatcher for large datasets. We will see the second example
with FLANN based matcher.
For the FLANN based matcher, we need to pass two dictionaries which specify the algorithm to be used,
its related parameters etc. The first one is IndexParams. For various algorithms, the information to be
passed is explained in the FLANN docs. As a summary, for algorithms like SIFT, SURF etc. you can pass the
following:
@code{.py}
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
@endcode
While using ORB, you can pass the following. The commented values are recommended as per the docs,
but in some cases they didn't provide the required results. Other values worked fine:
@code{.py}
FLANN_INDEX_LSH = 6
index_params= dict(algorithm = FLANN_INDEX_LSH,
table_number = 6, # 12
key_size = 12, # 20
multi_probe_level = 1) #2
@endcode
The second dictionary is the SearchParams. It specifies the number of times the trees in the index
should be recursively traversed. Higher values give better precision, but also take more time. If
you want to change the value, pass search_params = dict(checks=100).
With this information, we are good to go.
@code{.py}
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
img1 = cv.imread('box.png',cv.IMREAD_GRAYSCALE) # queryImage
img2 = cv.imread('box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage
# Initiate SIFT detector
sift = cv.SIFT_create()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
# FLANN parameters
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50) # or pass empty dictionary
flann = cv.FlannBasedMatcher(index_params,search_params)
matches = flann.knnMatch(des1,des2,k=2)
# Need to draw only good matches, so create a mask
matchesMask = [[0,0] for i in range(len(matches))]
# ratio test as per Lowe's paper
for i,(m,n) in enumerate(matches):
if m.distance < 0.7*n.distance:
matchesMask[i]=[1,0]
draw_params = dict(matchColor = (0,255,0),
singlePointColor = (255,0,0),
matchesMask = matchesMask,
flags = cv.DrawMatchesFlags_DEFAULT)
img3 = cv.drawMatchesKnn(img1,kp1,img2,kp2,matches,None,**draw_params)
plt.imshow(img3,),plt.show()
@endcode
See the result below:
![image](images/matcher_flann.jpg)
Additional Resources
--------------------
Exercises
---------

Binary file not shown.

After

Width:  |  Height:  |  Size: 23 KiB

View File

@ -0,0 +1,98 @@
ORB (Oriented FAST and Rotated BRIEF) {#tutorial_py_orb}
=====================================
Goal
----
In this chapter,
- We will see the basics of ORB
Theory
------
As an OpenCV enthusiast, the most important thing about ORB is that it came from "OpenCV Labs".
This algorithm was brought up by Ethan Rublee, Vincent Rabaud, Kurt Konolige and Gary R. Bradski in
their paper **ORB: An efficient alternative to SIFT or SURF** in 2011. As the title says, it is a
good alternative to SIFT and SURF in computation cost, matching performance and, mainly, the patents.
At the time, SIFT and SURF were patented and you were supposed to pay for their use. But ORB is not!
ORB is basically a fusion of the FAST keypoint detector and the BRIEF descriptor with many modifications to
enhance the performance. First it uses FAST to find keypoints, then applies the Harris corner measure to
find the top N points among them. It also uses a pyramid to produce multiscale features. But one problem is
that FAST doesn't compute the orientation. So what about rotation invariance? The authors came up with the
following modification.
It computes the intensity weighted centroid of the patch with the located corner at the center. The
direction of the vector from this corner point to the centroid gives the orientation. To improve the
rotation invariance, moments are computed with x and y restricted to a circular region of
radius \f$r\f$, where \f$r\f$ is the size of the patch.
Now for descriptors, ORB uses BRIEF descriptors. But we have already seen that BRIEF performs poorly
with rotation. So what ORB does is to "steer" BRIEF according to the orientation of the keypoints. For
any feature set of \f$n\f$ binary tests at locations \f$(x_i, y_i)\f$, define a \f$2 \times n\f$ matrix, \f$S\f$,
which contains the coordinates of these pixels. Then using the orientation of the patch, \f$\theta\f$, its
rotation matrix is found and used to rotate \f$S\f$ to get the steered (rotated) version \f$S_\theta\f$.
ORB discretizes the angle in increments of \f$2 \pi /30\f$ (12 degrees), and constructs a lookup table of
precomputed BRIEF patterns. As long as the keypoint orientation \f$\theta\f$ is consistent across views,
the correct set of points \f$S_\theta\f$ will be used to compute its descriptor.
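A toy NumPy sketch of this steering step follows; the test locations and the angle are arbitrary, only to show the matrix operation.
@code{.py}
import numpy as np

theta = np.deg2rad(24)                 # patch orientation, snapped to a 12-degree bin
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# S is a 2 x n matrix of BRIEF test locations (toy values here)
S = np.array([[3, -5, 8, 0],
              [1,  7, -2, 4]], dtype=float)

S_theta = R @ S                        # steered test locations used for the descriptor
print(np.round(S_theta, 2))
@endcode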
BRIEF has an important property that each bit feature has a large variance and a mean near 0.5. But
once it is oriented along the keypoint direction, it loses this property and becomes more distributed.
High variance makes a feature more discriminative, since it responds differentially to inputs.
Another desirable property is to have the tests uncorrelated, since then each test will contribute
to the result. To achieve all of this, ORB runs a greedy search among all possible binary tests to
find the ones that have both high variance and means close to 0.5, as well as being uncorrelated.
The result is called **rBRIEF**.
For descriptor matching, multi-probe LSH, which improves on the traditional LSH, is used. The paper
says ORB is much faster than SURF and SIFT, and the ORB descriptor works better than SURF. ORB is a good
choice on low-power devices for panorama stitching etc.
ORB in OpenCV
-------------
As usual, we have to create an ORB object with the function **cv.ORB_create()** or using the feature2d common
interface. It has a number of optional parameters. The most useful ones are nfeatures, which denotes the
maximum number of features to be retained (by default 500), and scoreType, which denotes whether the Harris
score or the FAST score is used to rank the features (by default, the Harris score). Another parameter, WTA_K,
decides the number of points that produce each element of the oriented BRIEF descriptor. By default it
is two, i.e. it selects two points at a time. In that case, for matching, NORM_HAMMING distance is used.
If WTA_K is 3 or 4, which takes 3 or 4 points to produce the BRIEF descriptor, then the matching distance
is defined by NORM_HAMMING2.
Below is a simple code which shows the use of ORB.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('simple.jpg',0)
# Initiate ORB detector
orb = cv.ORB_create()
# find the keypoints with ORB
kp = orb.detect(img,None)
# compute the descriptors with ORB
kp, des = orb.compute(img, kp)
# draw only keypoints location,not size and orientation
img2 = cv.drawKeypoints(img, kp, None, color=(0,255,0), flags=0)
plt.imshow(img2), plt.show()
@endcode
See the result below:
![image](images/orb_kp.jpg)
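If you want to deviate from the defaults described above, the parameters can be passed to **cv.ORB_create()**; a short sketch with a few non-default values follows (the numbers are arbitrary).
@code{.py}
# Example of non-default parameters (arbitrary values, shown for illustration).
orb = cv.ORB_create(nfeatures=1000,                # keep up to 1000 keypoints
                    scoreType=cv.ORB_FAST_SCORE,   # rank by FAST score instead of Harris
                    WTA_K=2)                       # 2 points per test -> NORM_HAMMING
kp, des = orb.detectAndCompute(img, None)
print(len(kp), des.shape)
@endcode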
We will cover ORB feature matching in another chapter.
Additional Resources
--------------------
-# Ethan Rublee, Vincent Rabaud, Kurt Konolige, Gary R. Bradski: ORB: An efficient alternative to
SIFT or SURF. ICCV 2011: 2564-2571.
Exercises
---------

Binary file not shown.

After

Width:  |  Height:  |  Size: 14 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 4.5 KiB

View File

@ -0,0 +1,75 @@
Shi-Tomasi Corner Detector & Good Features to Track {#tutorial_py_shi_tomasi}
===================================================
Goal
----
In this chapter,
- We will learn about another corner detector: the Shi-Tomasi Corner Detector
- We will see the function: **cv.goodFeaturesToTrack()**
Theory
------
In the last chapter, we saw the Harris Corner Detector. Later in 1994, J. Shi and C. Tomasi made a small
modification to it in their paper **Good Features to Track**, which shows better results compared to the
Harris Corner Detector. The scoring function in the Harris Corner Detector was given by:
\f[R = \lambda_1 \lambda_2 - k(\lambda_1+\lambda_2)^2\f]
Instead of this, Shi-Tomasi proposed:
\f[R = \min(\lambda_1, \lambda_2)\f]
If it is greater than a threshold value, it is considered a corner. If we plot it in
\f$\lambda_1 - \lambda_2\f$ space as we did for the Harris Corner Detector, we get an image as below:
![image](images/shitomasi_space.png)
From the figure, you can see that only when \f$\lambda_1\f$ and \f$\lambda_2\f$ are above a minimum value,
\f$\lambda_{\min}\f$, is it considered a corner (green region).
Code
----
OpenCV has a function, **cv.goodFeaturesToTrack()**. It finds the N strongest corners in the image by
the Shi-Tomasi method (or Harris Corner Detection, if you specify it). As usual, the image should be a
grayscale image. Then you specify the number of corners you want to find. Then you specify the quality
level, which is a value between 0 and 1 denoting the minimum quality of a corner below which
every corner is rejected. Then we provide the minimum Euclidean distance between detected corners.
With all this information, the function finds corners in the image. All corners below the quality
level are rejected. Then it sorts the remaining corners by quality in descending order.
The function then takes the strongest corner, throws away all the nearby corners within the range of
the minimum distance, and returns the N strongest corners.
In the example below, we will try to find the 25 best corners:
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('blox.jpg')
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
corners = cv.goodFeaturesToTrack(gray,25,0.01,10)
corners = np.int0(corners)
for i in corners:
x,y = i.ravel()
cv.circle(img,(x,y),3,255,-1)
plt.imshow(img),plt.show()
@endcode
See the result below:
![image](images/shitomasi_block1.jpg)
This function is more appropriate for tracking. We will see that when the time comes.
Additional Resources
--------------------
Exercises
---------

Binary file not shown.

After

Width:  |  Height:  |  Size: 30 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 33 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 15 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 3.3 KiB

View File

@ -0,0 +1,168 @@
Introduction to SIFT (Scale-Invariant Feature Transform) {#tutorial_py_sift_intro}
========================================================
Goal
----
In this chapter,
- We will learn about the concepts of SIFT algorithm
- We will learn to find SIFT Keypoints and Descriptors.
Theory
------
In the last couple of chapters, we saw some corner detectors like Harris etc. They are
rotation-invariant, which means that even if the image is rotated, we can find the same corners. It is
obvious because corners remain corners in the rotated image also. But what about scaling? A corner may
not be a corner if the image is scaled. For example, check the simple image below. A corner in a small
image within a small window is flat when it is zoomed in the same window. So the Harris corner detector is not
scale invariant.
![image](images/sift_scale_invariant.jpg)
In 2004, **D. Lowe**, University of British Columbia, came up with a new algorithm, Scale
Invariant Feature Transform (SIFT), in his paper **Distinctive Image Features from Scale-Invariant
Keypoints**, which extracts keypoints and computes their descriptors. *(This paper is easy to understand
and considered to be the best material available on SIFT. This explanation is just a short summary of
that paper)*.
There are mainly four steps involved in SIFT algorithm. We will see them one-by-one.
### 1. Scale-space Extrema Detection
From the image above, it is obvious that we can't use the same window to detect keypoints with
different scales. It is OK for small corners, but to detect larger corners we need larger windows.
For this, scale-space filtering is used. In it, the Laplacian of Gaussian is found for the image with
various \f$\sigma\f$ values. LoG acts as a blob detector which detects blobs of various sizes due to the
change in \f$\sigma\f$. In short, \f$\sigma\f$ acts as a scaling parameter. For example, in the above image, a
Gaussian kernel with low \f$\sigma\f$ gives a high value for a small corner, while a Gaussian kernel with high
\f$\sigma\f$ fits well for a larger corner. So, we can find the local maxima across scale and space,
which gives us a list of \f$(x,y,\sigma)\f$ values, meaning there is a potential keypoint at (x,y) at
scale \f$\sigma\f$.
But this LoG is a little costly, so the SIFT algorithm uses the Difference of Gaussians, which is an
approximation of LoG. The Difference of Gaussian is obtained as the difference of the Gaussian blurring of
an image with two different \f$\sigma\f$, let them be \f$\sigma\f$ and \f$k\sigma\f$. This process is done for
different octaves of the image in the Gaussian Pyramid. It is represented in the image below:
![image](images/sift_dog.jpg)
Once these DoG are found, the images are searched for local extrema over scale and space. For example, one
pixel in an image is compared with its 8 neighbours as well as the 9 pixels in the next scale and the 9 pixels
in the previous scale. If it is a local extremum, it is a potential keypoint. It basically means that the
keypoint is best represented at that scale. It is shown in the image below:
![image](images/sift_local_extrema.jpg)
Regarding different parameters, the paper gives some empirical data which can be summarized as,
number of octaves = 4, number of scale levels = 5, initial \f$\sigma=1.6\f$, \f$k=\sqrt{2}\f$ etc as optimal
values.
### 2. Keypoint Localization
Once the potential keypoint locations are found, they have to be refined to get more accurate results.
They used a Taylor series expansion of the scale space to get a more accurate location of the extrema, and if
the intensity at this extremum is less than a threshold value (0.03 as per the paper), it is
rejected. This threshold is called **contrastThreshold** in OpenCV.
DoG has a higher response for edges, so edges also need to be removed. For this, a concept similar to the
Harris corner detector is used. They used a 2x2 Hessian matrix (H) to compute the principal
curvature. We know from the Harris corner detector that for edges, one eigenvalue is larger than the
other. So here they used a simple function based on the ratio of the two eigenvalues (computed from the
trace and determinant of H).
If this ratio is greater than a threshold, called **edgeThreshold** in OpenCV, that keypoint is
discarded. It is given as 10 in the paper.
So it eliminates any low-contrast keypoints and edge keypoints and what remains is strong interest
points.
### 3. Orientation Assignment
Now an orientation is assigned to each keypoint to achieve invariance to image rotation. A
neighbourhood is taken around the keypoint location depending on the scale, and the gradient
magnitude and direction are calculated in that region. An orientation histogram with 36 bins covering
360 degrees is created (it is weighted by gradient magnitude and by a Gaussian-weighted circular window
with \f$\sigma\f$ equal to 1.5 times the scale of the keypoint). The highest peak in the histogram is taken,
and any peak above 80% of it is also considered to calculate the orientation. This creates keypoints
with the same location and scale, but different directions. It contributes to the stability of matching.
### 4. Keypoint Descriptor
Now the keypoint descriptor is created. A 16x16 neighbourhood around the keypoint is taken. It is
divided into 16 sub-blocks of 4x4 size. For each sub-block, an 8-bin orientation histogram is created.
So a total of 128 bin values are available. They are represented as a vector to form the keypoint
descriptor. In addition to this, several measures are taken to achieve robustness against
illumination changes, rotation etc.
### 5. Keypoint Matching
Keypoints between two images are matched by identifying their nearest neighbours. But in some cases,
the second closest match may be very near to the first. This may happen due to noise or some other
reasons. In that case, the ratio of the closest distance to the second-closest distance is taken. If it is
greater than 0.8, they are rejected. This eliminates around 90% of false matches while discarding only
5% of correct matches, as per the paper.
This is a summary of SIFT algorithm. For more details and understanding, reading the original
paper is highly recommended.
SIFT in OpenCV
--------------
Now let's see SIFT functionalities available in OpenCV. Note that these were previously only
available in [the opencv contrib repo](https://github.com/opencv/opencv_contrib), but the patent
expired in the year 2020. So they are now included in the main repo. Let's start with keypoint
detection and draw them. First we have to construct a SIFT object. We can pass different
parameters to it which are optional and they are well explained in docs.
@code{.py}
import numpy as np
import cv2 as cv
img = cv.imread('home.jpg')
gray= cv.cvtColor(img,cv.COLOR_BGR2GRAY)
sift = cv.SIFT_create()
kp = sift.detect(gray,None)
img=cv.drawKeypoints(gray,kp,img)
cv.imwrite('sift_keypoints.jpg',img)
@endcode
The **sift.detect()** function finds the keypoints in the image. You can pass a mask if you want to
search only a part of the image. Each keypoint is a special structure which has many attributes like its
(x,y) coordinates, the size of the meaningful neighbourhood, the angle which specifies its orientation,
the response that specifies the strength of the keypoint etc.
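For instance, the attributes of the first keypoint found above can be printed like this; a small sketch continuing the example, where kp is the list returned by sift.detect().
@code{.py}
k = kp[0]
print(k.pt)        # (x, y) coordinates
print(k.size)      # diameter of the meaningful neighbourhood
print(k.angle)     # orientation in degrees (-1 if not computed)
print(k.response)  # strength of the keypoint
@endcode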
OpenCV also provides the **cv.drawKeypoints()** function which draws small circles on the locations
of the keypoints. If you pass the flag **cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS** to it, it will
draw a circle with the size of the keypoint and it will even show its orientation. See the example below.
@code{.py}
img=cv.drawKeypoints(gray,kp,img,flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv.imwrite('sift_keypoints.jpg',img)
@endcode
See the two results below:
![image](images/sift_keypoints.jpg)
Now to calculate the descriptor, OpenCV provides two methods.
-# Since you already found keypoints, you can call **sift.compute()** which computes the
descriptors from the keypoints we have found. Eg: kp,des = sift.compute(gray,kp)
2. If you didn't find keypoints, directly find keypoints and descriptors in a single step with the
function, **sift.detectAndCompute()**.
We will see the second method:
@code{.py}
sift = cv.SIFT_create()
kp, des = sift.detectAndCompute(gray,None)
@endcode
Here kp will be a list of keypoints and des is a numpy array of shape
\f$\text{(Number of Keypoints)} \times 128\f$.
So we got keypoints, descriptors etc. Now we want to see how to match keypoints in different images.
That we will learn in coming chapters.
Additional Resources
--------------------
Exercises
---------

Binary file not shown.

After

Width:  |  Height:  |  Size: 13 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 26 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 28 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 12 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 7.7 KiB

View File

@ -0,0 +1,163 @@
Introduction to SURF (Speeded-Up Robust Features) {#tutorial_py_surf_intro}
=================================================
Goal
----
In this chapter,
- We will see the basics of SURF
- We will see SURF functionalities in OpenCV
Theory
------
In the last chapter, we saw SIFT for keypoint detection and description. But it was comparatively slow
and people needed a more speeded-up version. In 2006, three people, Bay, H., Tuytelaars, T. and Van
Gool, L., published another paper, "SURF: Speeded Up Robust Features", which introduced a new
algorithm called SURF. As the name suggests, it is a speeded-up version of SIFT.
In SIFT, Lowe approximated the Laplacian of Gaussian with the Difference of Gaussian for finding
scale-space. SURF goes a little further and approximates LoG with a Box Filter. The image below shows a
demonstration of such an approximation. One big advantage of this approximation is that convolution
with a box filter can be easily calculated with the help of integral images. And it can be done in
parallel for different scales. SURF also relies on the determinant of the Hessian matrix for both scale
and location.
![image](images/surf_boxfilter.jpg)
For orientation assignment, SURF uses wavelet responses in the horizontal and vertical directions for a
neighbourhood of size 6s. Adequate Gaussian weights are also applied to them. Then they are plotted in
a space as given in the image below. The dominant orientation is estimated by calculating the sum of all
responses within a sliding orientation window of angle 60 degrees. The interesting thing is that the
wavelet response can be found very easily using integral images at any scale. For many
applications, rotation invariance is not required, so there is no need to find this orientation, which
speeds up the process. SURF provides such a functionality, called Upright-SURF or U-SURF. It improves
speed and is robust up to \f$\pm 15^{\circ}\f$. OpenCV supports both, depending upon the flag
**upright**. If it is 0, the orientation is calculated. If it is 1, the orientation is not calculated and it
is faster.
![image](images/surf_orientation.jpg)
For feature description, SURF uses wavelet responses in the horizontal and vertical directions (again,
the use of integral images makes things easier). A neighbourhood of size 20s x 20s is taken around the
keypoint, where s is the size. It is divided into 4x4 subregions. For each subregion, horizontal and
vertical wavelet responses are taken and a vector is formed like this:
\f$v=( \sum{d_x}, \sum{d_y}, \sum{|d_x|}, \sum{|d_y|})\f$. Concatenated over the subregions, this gives the SURF
feature descriptor with a total of 64 dimensions. The lower the dimension, the higher the speed of computation
and matching, but at the cost of some distinctiveness of the features.
For more distinctiveness, the SURF feature descriptor has an extended 128-dimension version. The sums of
\f$d_x\f$ and \f$|d_x|\f$ are computed separately for \f$d_y < 0\f$ and \f$d_y \geq 0\f$. Similarly, the sums of
\f$d_y\f$ and \f$|d_y|\f$ are split up according to the sign of \f$d_x\f$, thereby doubling the number of
features. This doesn't add much computational complexity. OpenCV supports both by setting the value of the
flag **extended** to 0 or 1 for 64-dim and 128-dim respectively (the default is 64-dim, as the example
below shows).
Another important improvement is the use of sign of Laplacian (trace of Hessian Matrix) for
underlying interest point. It adds no computation cost since it is already computed during
detection. The sign of the Laplacian distinguishes bright blobs on dark backgrounds from the reverse
situation. In the matching stage, we only compare features if they have the same type of contrast
(as shown in image below). This minimal information allows for faster matching, without reducing the
descriptor's performance.
![image](images/surf_matching.jpg)
In short, SURF adds a lot of features to improve the speed in every step. Analysis shows it is 3
times faster than SIFT while performance is comparable to SIFT. SURF is good at handling images with
blurring and rotation, but not good at handling viewpoint change and illumination change.
SURF in OpenCV
--------------
OpenCV provides SURF functionality just like SIFT. You initialize a SURF object with optional
parameters such as 64/128-dim descriptors, Upright/Normal SURF, etc. All the details are well explained
in the docs. Then, just as we did with SIFT, we can use SURF.detect(), SURF.compute() etc. to find
keypoints and descriptors.
First we will see a simple demo of how to find SURF keypoints and descriptors and draw them. All
examples are shown in a Python terminal, since the usage is just the same as for SIFT.
@code{.py}
>>> import cv2 as cv
>>> img = cv.imread('fly.png',0)
# Create SURF object. You can specify params here or later.
# Here I set Hessian Threshold to 400
>>> surf = cv.xfeatures2d.SURF_create(400)
# Find keypoints and descriptors directly
>>> kp, des = surf.detectAndCompute(img,None)
>>> len(kp)
699
@endcode
699 keypoints are too many to show in a picture. We reduce this to around 50 to draw them on an
image. While matching we may need all those features, but not now, so we increase the Hessian
threshold.
@code{.py}
# Check present Hessian threshold
>>> print( surf.getHessianThreshold() )
400.0
# We set it to 50000. Remember, this is just for drawing the keypoints on a picture.
# In actual use cases, a value of 300-500 is usually better.
>>> surf.setHessianThreshold(50000)
# Again compute the keypoints and check how many we get.
>>> kp, des = surf.detectAndCompute(img,None)
>>> print( len(kp) )
47
@endcode
That is fewer than 50. Let's draw them on the image.
@code{.py}
>>> from matplotlib import pyplot as plt
>>> img2 = cv.drawKeypoints(img,kp,None,(255,0,0),4)
>>> plt.imshow(img2),plt.show()
@endcode
See the result below. You can see that SURF works more like a blob detector: it detects the white
blobs on the wings of the butterfly. You can test it with other images.
![image](images/surf_kp1.jpg)
Now let's apply U-SURF, so that it won't compute the orientation.
@code{.py}
# Check the upright flag; if it is False, set it to True
>>> print( surf.getUpright() )
False
>>> surf.setUpright(True)
# Recompute the keypoints and draw them
>>> kp = surf.detect(img,None)
>>> img2 = cv.drawKeypoints(img,kp,None,(255,0,0),4)
>>> plt.imshow(img2),plt.show()
@endcode
See the results below. All the orientations are shown in the same direction, and it is faster than
before. If you are working on cases where orientation is not a problem (like panorama stitching),
this is the better choice.
![image](images/surf_kp2.jpg)
Finally we check the descriptor size and change it to 128 if it is only 64-dim.
@code{.py}
# Find size of descriptor
>>> print( surf.descriptorSize() )
64
# That means the flag "extended" is False.
>>> surf.getExtended()
False
# So we set it to True to get 128-dim descriptors.
>>> surf.setExtended(True)
>>> kp, des = surf.detectAndCompute(img,None)
>>> print( surf.descriptorSize() )
128
>>> print( des.shape )
(47, 128)
@endcode
The remaining part is matching, which we will cover in another chapter.
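As a quick, hedged preview of that chapter: since SURF descriptors are float vectors, a brute-force matcher with L2 distance and a ratio test could be used roughly as below, reusing the surf object and descriptors des from the demo above; the second image name is just a placeholder for another view of the same scene.
@code{.py}
# Preview sketch only; matching is covered properly in the matcher tutorial.
>>> img2 = cv.imread('fly_scene.png',0)          # placeholder: another view of the scene
>>> kp2, des2 = surf.detectAndCompute(img2,None)
>>> bf = cv.BFMatcher(cv.NORM_L2)
>>> matches = bf.knnMatch(des, des2, k=2)
>>> good = [m for m,n in matches if m.distance < 0.75*n.distance]
>>> len(good)                                     # depends on the images used
@endcode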
Additional Resources
--------------------
Exercises
---------
Feature Detection and Description {#tutorial_py_table_of_contents_feature2d}
=================================
- @subpage tutorial_py_features_meaning
What are the main
features in an image? How can finding those features be useful to us?
- @subpage tutorial_py_features_harris
Okay, corners are good features. But how do we find them?
- @subpage tutorial_py_shi_tomasi
We will look into
Shi-Tomasi corner detection
- @subpage tutorial_py_sift_intro
Harris corner detector
is not good enough when the scale of the image changes. Lowe developed a breakthrough method to find
scale-invariant features, and it is called SIFT.
- @subpage tutorial_py_surf_intro
SIFT is really good,
but not fast enough, so people came up with a speeded-up version called SURF.
- @subpage tutorial_py_fast
All the above feature
detection methods are good in some way. But they are not fast enough to work in real-time
applications like SLAM. There comes the FAST algorithm, which is really "FAST".
- @subpage tutorial_py_brief
SIFT uses a feature
descriptor with 128 floating point numbers. Consider thousands of such features; they take a lot of
memory, and matching takes more time. We can compress the descriptor to make matching faster, but we
still have to compute it first. There comes BRIEF, which gives a shortcut to binary descriptors with
less memory, faster matching, and still a high recognition rate.
- @subpage tutorial_py_orb
SIFT and SURF are good in what they do, but what if you have to pay a few dollars every year to use them in your applications? Yeah, they are patented!!! To solve that problem, OpenCV devs came up with a new "FREE" alternative to SIFT & SURF, and that is ORB.
- @subpage tutorial_py_matcher
We know a great deal about feature detectors and descriptors. It is time to learn how to match different descriptors. OpenCV provides two techniques, Brute-Force matcher and FLANN based matcher.
- @subpage tutorial_py_feature_homography
Now we know about feature matching. Let's mix it up with calib3d module to find objects in a complex image.
Drawing Functions in OpenCV {#tutorial_py_drawing_functions}
===========================
Goal
----
- Learn to draw different geometric shapes with OpenCV
- You will learn these functions : **cv.line()**, **cv.circle()** , **cv.rectangle()**,
**cv.ellipse()**, **cv.putText()** etc.
Code
----
In all the above functions, you will see some common arguments as given below:
- img : The image where you want to draw the shapes
- color : Color of the shape. For BGR, pass it as a tuple, e.g. (255,0,0) for blue. For
    grayscale, just pass the scalar value.
- thickness : Thickness of the line or circle etc. If **-1** is passed for closed figures like
circles, it will fill the shape. *default thickness = 1*
- lineType : Type of line, whether 8-connected, anti-aliased line etc. *By default, it is
    8-connected.* cv.LINE_AA gives an anti-aliased line, which looks great for curves.
### Drawing Line
To draw a line, you need to pass the starting and ending coordinates of the line. We will create a black
image and draw a blue line on it from the top-left to the bottom-right corner.
@code{.py}
import numpy as np
import cv2 as cv
# Create a black image
img = np.zeros((512,512,3), np.uint8)
# Draw a diagonal blue line with thickness of 5 px
cv.line(img,(0,0),(511,511),(255,0,0),5)
@endcode
### Drawing Rectangle
To draw a rectangle, you need the top-left and bottom-right corners of the rectangle. This time we
will draw a green rectangle at the top-right corner of the image.
@code{.py}
cv.rectangle(img,(384,0),(510,128),(0,255,0),3)
@endcode
### Drawing Circle
To draw a circle, you need its center coordinates and radius. We will draw a circle inside the
rectangle drawn above.
@code{.py}
cv.circle(img,(447,63), 63, (0,0,255), -1)
@endcode
### Drawing Ellipse
To draw an ellipse, we need to pass several arguments. One argument is the center location (x,y).
The next argument is the axes lengths (major axis length, minor axis length). angle is the angle of
rotation of the ellipse in the anti-clockwise direction. startAngle and endAngle denote the start and
end of the ellipse arc, measured in the clockwise direction from the major axis; i.e. giving values 0
and 360 draws the full ellipse. For more details, check the documentation of **cv.ellipse()**. The
example below draws a half ellipse at the center of the image.
@code{.py}
cv.ellipse(img,(256,256),(100,50),0,0,180,255,-1)
@endcode
### Drawing Polygon
To draw a polygon, first you need the coordinates of the vertices. Make those points into an array
of shape ROWSx1x2, where ROWS is the number of vertices, and it should be of type int32. Here we draw
a small polygon with four vertices in yellow color.
@code{.py}
pts = np.array([[10,5],[20,30],[70,20],[50,10]], np.int32)
pts = pts.reshape((-1,1,2))
cv.polylines(img,[pts],True,(0,255,255))
@endcode
@note If the third argument is False, you will get a polyline joining all the points, not a closed
shape.
@note cv.polylines() can be used to draw multiple lines. Just create a list of all the lines you
want to draw and pass it to the function. All lines will be drawn individually. It is a much better
and faster way to draw a group of lines than calling cv.line() for each line.
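For instance, a small sketch (continuing with the img from the snippets above) that draws three separate segments with a single call:
@code{.py}
line1 = np.array([[100,100],[200,100]], np.int32).reshape((-1,1,2))
line2 = np.array([[100,150],[200,150]], np.int32).reshape((-1,1,2))
line3 = np.array([[100,200],[200,200]], np.int32).reshape((-1,1,2))
# isClosed=False, so each entry is drawn as an open polyline (here a plain segment)
cv.polylines(img, [line1, line2, line3], False, (255,255,255), 2)
@endcode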
### Adding Text to Images
To put text in images, you need to specify the following things.
- Text data that you want to write
- Position coordinates of where you want to put it (i.e. the bottom-left corner where the text starts).
- Font type (check **cv.putText()** docs for supported fonts)
- Font Scale (specifies the size of the font)
- regular things like color, thickness, lineType etc. For a better look, lineType = cv.LINE_AA
    is recommended.
We will write **OpenCV** on our image in white color.
@code{.py}
font = cv.FONT_HERSHEY_SIMPLEX
cv.putText(img,'OpenCV',(10,500), font, 4,(255,255,255),2,cv.LINE_AA)
@endcode
### Result
So it is time to see the final result of our drawing. As you studied in previous articles, display
the image to see it.
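One way to do that, assuming the img built up in the snippets above, is for example:
@code{.py}
cv.imshow('drawing', img)
cv.waitKey(0)
cv.destroyAllWindows()
@endcode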
![image](images/drawing_result.jpg)
Additional Resources
--------------------
-# The angles used in the ellipse function are not our circular angles. For more details, visit [this
discussion](http://answers.opencv.org/question/14541/angles-in-ellipse-function/).
Exercises
---------
-# Try to create the logo of OpenCV using drawing functions available in OpenCV.
Getting Started with Images {#tutorial_py_image_display}
===========================
Tutorial content has been moved: @ref tutorial_display_image
Mouse as a Paint-Brush {#tutorial_py_mouse_handling}
======================
Goal
----
- Learn to handle mouse events in OpenCV
- You will learn these functions : **cv.setMouseCallback()**
Simple Demo
-----------
Here, we create a simple application which draws a circle on an image wherever we double-click on
it.
First we create a mouse callback function, which is executed when a mouse event takes place. A mouse
event can be anything related to the mouse, like left-button down, left-button up, left-button
double-click etc. It gives us the coordinates (x,y) for every mouse event. With this event and
location, we can do whatever we like. To list all available events, run the following code in a
Python terminal:
@code{.py}
import cv2 as cv
events = [i for i in dir(cv) if 'EVENT' in i]
print( events )
@endcode
The mouse callback function has a specific format, which is the same everywhere; it differs only in
what the function does. Our mouse callback function does one thing: it draws a circle where we
double-click. See the code below; it is self-explanatory from the comments:
@code{.py}
import numpy as np
import cv2 as cv
# mouse callback function
def draw_circle(event,x,y,flags,param):
if event == cv.EVENT_LBUTTONDBLCLK:
cv.circle(img,(x,y),100,(255,0,0),-1)
# Create a black image, a window and bind the function to window
img = np.zeros((512,512,3), np.uint8)
cv.namedWindow('image')
cv.setMouseCallback('image',draw_circle)
while(1):
cv.imshow('image',img)
if cv.waitKey(20) & 0xFF == 27:
break
cv.destroyAllWindows()
@endcode
More Advanced Demo
------------------
Now we go for a much better application. In this one, we draw either rectangles or circles (depending
on the mode we select) by dragging the mouse, like we do in a paint application. So our mouse callback
function has two parts: one to draw rectangles and the other to draw circles. This specific example
will be really helpful for creating and understanding interactive applications like object tracking,
image segmentation etc.
@code{.py}
import numpy as np
import cv2 as cv
drawing = False # true if mouse is pressed
mode = True # if True, draw rectangle. Press 'm' to toggle to curve
ix,iy = -1,-1
# mouse callback function
def draw_circle(event,x,y,flags,param):
global ix,iy,drawing,mode
if event == cv.EVENT_LBUTTONDOWN:
drawing = True
ix,iy = x,y
elif event == cv.EVENT_MOUSEMOVE:
if drawing == True:
if mode == True:
cv.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
else:
cv.circle(img,(x,y),5,(0,0,255),-1)
elif event == cv.EVENT_LBUTTONUP:
drawing = False
if mode == True:
cv.rectangle(img,(ix,iy),(x,y),(0,255,0),-1)
else:
cv.circle(img,(x,y),5,(0,0,255),-1)
@endcode
Next we have to bind this mouse callback function to the OpenCV window. In the main loop, we also set
a keyboard binding for the key 'm' to toggle between rectangle and circle mode.
@code{.py}
img = np.zeros((512,512,3), np.uint8)
cv.namedWindow('image')
cv.setMouseCallback('image',draw_circle)
while(1):
cv.imshow('image',img)
k = cv.waitKey(1) & 0xFF
if k == ord('m'):
mode = not mode
elif k == 27:
break
cv.destroyAllWindows()
@endcode
Additional Resources
--------------------
Exercises
---------
-# In our last example, we drew a filled rectangle. Modify the code so that it draws an unfilled
rectangle instead (one possible approach is sketched below).
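A sketch of one possible approach (an assumption-laden variant of the callback above, not the only solution): use a positive thickness instead of -1, and draw the moving preview on a copy of the image so intermediate rectangles do not pile up while dragging.
@code{.py}
# Sketch: assumes the globals (ix, iy, drawing, img) from the demo above, plus a
# new global "preview" that the main loop displays instead of img.
def draw_rectangle(event,x,y,flags,param):
    global ix,iy,drawing,img,preview
    if event == cv.EVENT_LBUTTONDOWN:
        drawing = True
        ix,iy = x,y
    elif event == cv.EVENT_MOUSEMOVE and drawing:
        preview = img.copy()                       # throwaway copy for the live preview
        cv.rectangle(preview,(ix,iy),(x,y),(0,255,0),2)
    elif event == cv.EVENT_LBUTTONUP:
        drawing = False
        cv.rectangle(img,(ix,iy),(x,y),(0,255,0),2)  # commit the final outline
        preview = img.copy()
@endcode
In the main loop you would then initialise preview with img.copy(), show preview instead of img, and register draw_rectangle as the callback.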
Gui Features in OpenCV {#tutorial_py_table_of_contents_gui}
======================
- @ref tutorial_display_image
Learn to load an
image, display it, and save it back
- @subpage tutorial_py_video_display
Learn to play videos,
capture videos from a camera, and write videos
- @subpage tutorial_py_drawing_functions
Learn to draw lines,
rectangles, ellipses, circles, etc with OpenCV
- @subpage tutorial_py_mouse_handling
Draw stuff with your
mouse
- @subpage tutorial_py_trackbar
Create a trackbar to
control certain parameters