feat: switch backend to PaddleOCR-NCNN, switch project to CMake

1. Migrated the project backend to the PaddleOCR-NCNN algorithm; it has passed basic compatibility tests.
2. The project is now organized with CMake; to better accommodate third-party libraries, a QMake project will no longer be provided.
3. Reorganized the rights/license declaration files and the code project to minimize the risk of infringement.

Log: switch backend to PaddleOCR-NCNN, switch project to CMake
Change-Id: I4d5d2c5d37505a4a24b389b1a4c5d12f17bfa38c
wangzhengyang
2022-05-10 09:54:44 +08:00
parent ecdd171c6f
commit 718c41634f
10018 changed files with 3593797 additions and 186748 deletions

View File

@@ -0,0 +1,92 @@
BRIEF (Binary Robust Independent Elementary Features) {#tutorial_py_brief}
=====================================================
Goal
----
In this chapter
- We will see the basics of BRIEF algorithm
Theory
------
We know SIFT uses a 128-dim vector for descriptors. Since it uses floating point numbers, it takes
basically 512 bytes. Similarly, SURF also takes a minimum of 256 bytes (for 64-dim). Creating such a
vector for thousands of features takes a lot of memory, which is not feasible for resource-constrained
applications, especially for embedded systems. The larger the memory, the longer matching takes.
But all these dimensions may not be needed for actual matching. We can compress them using several
methods like PCA, LDA etc. Even other methods like hashing using LSH (Locality Sensitive Hashing) are
used to convert these SIFT descriptors from floating point numbers to binary strings. These binary
strings are used to match features using Hamming distance. This provides a better speed-up because
finding the Hamming distance is just applying XOR and a bit count, which are very fast in modern CPUs with
SSE instructions. But here, we need to find the descriptors first; only then can we apply hashing,
which doesn't solve our initial problem with memory.
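As a quick illustration (not part of the original tutorial), the sketch below computes the Hamming distance
between two hypothetical 256-bit binary descriptors with nothing more than XOR and a bit count:
@code{.py}
import numpy as np
# two hypothetical 32-byte (256-bit) binary descriptors, e.g. as produced by BRIEF
rng = np.random.default_rng(0)
d1 = rng.integers(0, 256, 32, dtype=np.uint8)
d2 = rng.integers(0, 256, 32, dtype=np.uint8)
# Hamming distance = popcount(d1 XOR d2): count the differing bits
hamming = int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())
print(hamming)
@endcode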
BRIEF comes into the picture at this moment. It provides a shortcut to find the binary strings directly
without finding descriptors. It takes a smoothed image patch and selects a set of \f$n_d\f$ (x,y)
location pairs in a unique way (explained in the paper). Then some pixel intensity comparisons are done
on these location pairs. For example, let the first location pair be \f$p\f$ and \f$q\f$. If \f$I(p) < I(q)\f$, then its
result is 1, else it is 0. This is applied to all the \f$n_d\f$ location pairs to get an
\f$n_d\f$-dimensional bitstring.
This \f$n_d\f$ can be 128, 256 or 512. OpenCV supports all of these, but by default it is 256
(OpenCV represents it in bytes, so the values will be 16, 32 and 64). Once you get this, you can
use Hamming distance to match these descriptors.
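To make the test concrete, here is a toy sketch of building such a bitstring from random point pairs on a
smoothed grayscale patch (an illustration with assumed parameters, not OpenCV's BRIEF implementation, which
uses a fixed sampling pattern):
@code{.py}
import numpy as np
import cv2 as cv
def toy_brief(patch, n_d=256, seed=0):
    # patch: 2D grayscale array; BRIEF assumes a smoothed patch, so blur first
    patch = cv.GaussianBlur(patch, (9, 9), 2)
    h, w = patch.shape
    rng = np.random.default_rng(seed)
    # sample n_d (p, q) location pairs inside the patch
    p = rng.integers(0, [h, w], size=(n_d, 2))
    q = rng.integers(0, [h, w], size=(n_d, 2))
    # bit i is 1 if I(p_i) < I(q_i), else 0
    bits = patch[p[:, 0], p[:, 1]] < patch[q[:, 0], q[:, 1]]
    return np.packbits(bits)  # n_d/8 bytes, matched with Hamming distance
@endcode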
One important point is that BRIEF is only a feature descriptor; it doesn't provide any method to find the
features. So you will have to use another feature detector like SIFT, SURF etc. The paper
recommends using CenSurE, which is a fast detector, and BRIEF even works slightly better for CenSurE
points than for SURF points.
In short, BRIEF is a faster method for feature descriptor calculation and matching. It also provides a
high recognition rate unless there is large in-plane rotation.
STAR(CenSurE) in OpenCV
------
STAR is a feature detector derived from CenSurE.
Unlike CenSurE, however, which uses polygons such as squares, hexagons and octagons to approximate a circle,
STAR emulates a circle with 2 overlapping squares: 1 upright and 1 rotated by 45 degrees. These polygons are bi-level:
they can be seen as polygons with thick borders, where the borders and the enclosed area have weights of opposing signs.
This gives better computational characteristics than other scale-space detectors and is capable of real-time implementation.
In contrast to SIFT and SURF, which find extrema at sub-sampled pixels and thereby compromise accuracy at larger scales,
CenSurE creates a feature vector using full spatial resolution at all scales in the pyramid.
BRIEF in OpenCV
---------------
The code below shows the computation of BRIEF descriptors with the help of the CenSurE (STAR) detector.
Note that you need [opencv contrib](https://github.com/opencv/opencv_contrib) to use this.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('simple.jpg',0)
# Initiate FAST detector
star = cv.xfeatures2d.StarDetector_create()
# Initiate BRIEF extractor
brief = cv.xfeatures2d.BriefDescriptorExtractor_create()
# find the keypoints with STAR
kp = star.detect(img,None)
# compute the descriptors with BRIEF
kp, des = brief.compute(img, kp)
print( brief.descriptorSize() )
print( des.shape )
@endcode
The function brief.descriptorSize() gives the \f$n_d\f$ size used in bytes. By default it is 32. Next comes
matching, which will be done in another chapter.
Additional Resources
--------------------
-# Michael Calonder, Vincent Lepetit, Christoph Strecha, and Pascal Fua, "BRIEF: Binary Robust
Independent Elementary Features", 11th European Conference on Computer Vision (ECCV), Heraklion,
Crete. LNCS Springer, September 2010.
2. [LSH (Locality Sensitive Hashing)](https://en.wikipedia.org/wiki/Locality-sensitive_hashing) at wikipedia.

View File

@@ -0,0 +1,143 @@
FAST Algorithm for Corner Detection {#tutorial_py_fast}
===================================
Goal
----
In this chapter,
- We will understand the basics of FAST algorithm
- We will find corners using OpenCV functionalities for FAST algorithm.
Theory
------
We saw several feature detectors and many of them are really good. But when looking from a real-time
application point of view, they are not fast enough. One good example is a SLAM (Simultaneous
Localization and Mapping) mobile robot, which has limited computational resources.
As a solution to this, the FAST (Features from Accelerated Segment Test) algorithm was proposed by
Edward Rosten and Tom Drummond in their paper "Machine learning for high-speed corner detection" in
2006 (later revised in 2010). A basic summary of the algorithm is presented below. Refer to the original
paper for more details (all the images are taken from the original paper).
### Feature Detection using FAST
-# Select a pixel \f$p\f$ in the image which is to be identified as an interest point or not. Let its
intensity be \f$I_p\f$.
2. Select appropriate threshold value \f$t\f$.
3. Consider a circle of 16 pixels around the pixel under test. (See the image below)
![image](images/fast_speedtest.jpg)
-# Now the pixel \f$p\f$ is a corner if there exists a set of \f$n\f$ contiguous pixels in the circle (of
16 pixels) which are all brighter than \f$I_p + t\f$, or all darker than \f$I_p - t\f$. (Shown as white
dash lines in the above image). \f$n\f$ was chosen to be 12.
5. A **high-speed test** was proposed to exclude a large number of non-corners. This test examines
only the four pixels at 1, 9, 5 and 13 (first, pixels 1 and 9 are tested to see whether they are too bright or
too dark; if so, pixels 5 and 13 are checked). If \f$p\f$ is a corner, then at least three of these must all
be brighter than \f$I_p + t\f$ or darker than \f$I_p - t\f$. If neither of these is the case, then \f$p\f$
cannot be a corner. The full segment test criterion can then be applied to the passed candidates
by examining all pixels in the circle. This detector in itself exhibits high performance, but
there are several weaknesses:
- It does not reject as many candidates for n \< 12.
- The choice of pixels is not optimal because its efficiency depends on ordering of the
questions and distribution of corner appearances.
- Results of high-speed tests are thrown away.
- Multiple features are detected adjacent to one another.
The first 3 points are addressed with a machine learning approach. The last one is addressed using
non-maximal suppression. A toy sketch of the basic segment test is given below.
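For intuition only, here is a minimal, unoptimized sketch of the segment test for a single candidate pixel;
the threshold, \f$n\f$ and the circle offsets are assumptions for illustration, not OpenCV's actual FAST
implementation:
@code{.py}
import numpy as np
# Bresenham circle of radius 3: 16 (dy, dx) offsets around the candidate pixel
CIRCLE = [(-3,0),(-3,1),(-2,2),(-1,3),(0,3),(1,3),(2,2),(3,1),
          (3,0),(3,-1),(2,-2),(1,-3),(0,-3),(-1,-3),(-2,-2),(-3,-1)]
def is_fast_corner(img, y, x, t=10, n=12):
    Ip = int(img[y, x])
    ring = np.array([int(img[y + dy, x + dx]) for dy, dx in CIRCLE])
    brighter = ring > Ip + t
    darker = ring < Ip - t
    # look for n contiguous circle pixels that are all brighter or all darker than p
    for mask in (brighter, darker):
        run = 0
        for v in np.concatenate([mask, mask]):  # doubled to handle wrap-around
            run = run + 1 if v else 0
            if run >= n:
                return True
    return False
@endcode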
### Machine Learning a Corner Detector
-# Select a set of images for training (preferably from the target application domain)
2. Run the FAST algorithm on every image to find feature points.
3. For every feature point, store the 16 pixels around it as a vector. Do it for all the images to
get feature vector \f$P\f$.
4. Each pixel (say \f$x\f$) in these 16 pixels can have one of the following three states:
![image](images/fast_eqns.jpg)
-# Depending on these states, the feature vector \f$P\f$ is subdivided into 3 subsets, \f$P_d\f$, \f$P_s\f$,
\f$P_b\f$.
6. Define a new boolean variable, \f$K_p\f$, which is true if \f$p\f$ is a corner and false otherwise.
7. Use the ID3 algorithm (decision tree classifier) to query each subset using the variable \f$K_p\f$
for the knowledge about the true class. It selects the \f$x\f$ which yields the most information
about whether the candidate pixel is a corner, measured by the entropy of \f$K_p\f$.
8. This is recursively applied to all the subsets until its entropy is zero.
9. The decision tree so created is used for fast detection in other images.
### Non-maximal Suppression
Detecting multiple interest points in adjacent locations is another problem. It is solved by using
Non-maximum Suppression.
-# Compute a score function, \f$V\f$, for all the detected feature points. \f$V\f$ is the sum of the absolute
differences between \f$p\f$ and the 16 surrounding pixel values.
2. Consider two adjacent keypoints and compute their \f$V\f$ values.
3. Discard the one with the lower \f$V\f$ value.
### Summary
It is several times faster than other existing corner detectors.
But it is not robust to high levels of noise. It is dependent on a threshold.
FAST Feature Detector in OpenCV
-------------------------------
It can be called like any other feature detector in OpenCV. If you want, you can specify the threshold,
whether non-maximum suppression is to be applied or not, the neighborhood to be used etc.
For the neighborhood, three flags are defined: cv.FAST_FEATURE_DETECTOR_TYPE_5_8,
cv.FAST_FEATURE_DETECTOR_TYPE_7_12 and cv.FAST_FEATURE_DETECTOR_TYPE_9_16. Below is a
simple code on how to detect and draw the FAST feature points.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('blox.jpg',0) # `<opencv_root>/samples/data/blox.jpg`
# Initiate FAST object with default values
fast = cv.FastFeatureDetector_create()
# find and draw the keypoints
kp = fast.detect(img,None)
img2 = cv.drawKeypoints(img, kp, None, color=(255,0,0))
# Print all default params
print( "Threshold: {}".format(fast.getThreshold()) )
print( "nonmaxSuppression:{}".format(fast.getNonmaxSuppression()) )
print( "neighborhood: {}".format(fast.getType()) )
print( "Total Keypoints with nonmaxSuppression: {}".format(len(kp)) )
cv.imwrite('fast_true.png', img2)
# Disable nonmaxSuppression
fast.setNonmaxSuppression(0)
kp = fast.detect(img, None)
print( "Total Keypoints without nonmaxSuppression: {}".format(len(kp)) )
img3 = cv.drawKeypoints(img, kp, None, color=(255,0,0))
cv.imwrite('fast_false.png', img3)
@endcode
See the results. The first image shows FAST with nonmaxSuppression and the second one without
nonmaxSuppression:
![image](images/fast_kp.jpg)
Additional Resources
--------------------
-# Edward Rosten and Tom Drummond, “Machine learning for high speed corner detection” in 9th
European Conference on Computer Vision, vol. 1, 2006, pp. 430-443.
2. Edward Rosten, Reid Porter, and Tom Drummond, "Faster and better: a machine learning approach to
corner detection" in IEEE Trans. Pattern Analysis and Machine Intelligence, 2010, vol 32, pp.
105-119.
Exercises
---------

View File

@@ -0,0 +1,110 @@
Feature Matching + Homography to find Objects {#tutorial_py_feature_homography}
=============================================
Goal
----
In this chapter,
- We will mix up the feature matching and findHomography from calib3d module to find known
objects in a complex image.
Basics
------
So what did we do in the last session? We used a queryImage, found some feature points in it, we took
another trainImage, found the features in that image too, and we found the best matches among them.
In short, we found the locations of some parts of an object in another cluttered image. This information
is sufficient to find the object exactly in the trainImage.
For that, we can use a function from the calib3d module, i.e. **cv.findHomography()**. If we pass the set
of points from both the images, it will find the perspective transformation of that object. Then we
can use **cv.perspectiveTransform()** to find the object. It needs at least four correct points to
find the transformation.
We have seen that there can be some possible errors while matching which may affect the result. To
solve this problem, the algorithm uses RANSAC or LEAST_MEDIAN (which can be decided by the flags).
Good matches which provide a correct estimation are called inliers and the remaining are called outliers.
**cv.findHomography()** returns a mask which specifies the inlier and outlier points.
So let's do it !!!
Code
----
First, as usual, let's find SIFT features in images and apply the ratio test to find the best
matches.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
MIN_MATCH_COUNT = 10
img1 = cv.imread('box.png',0) # queryImage
img2 = cv.imread('box_in_scene.png',0) # trainImage
# Initiate SIFT detector
sift = cv.SIFT_create()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks = 50)
flann = cv.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1,des2,k=2)
# store all the good matches as per Lowe's ratio test.
good = []
for m,n in matches:
if m.distance < 0.7*n.distance:
good.append(m)
@endcode
Now we set a condition that at least 10 matches (defined by MIN_MATCH_COUNT) have to be present to
find the object. Otherwise we simply show a message saying that not enough matches are present.
If enough matches are found, we extract the locations of matched keypoints in both the images. They
are passed to find the perspective transformation. Once we get this 3x3 transformation matrix, we use
it to transform the corners of queryImage to corresponding points in trainImage. Then we draw it.
@code{.py}
if len(good)>MIN_MATCH_COUNT:
src_pts = np.float32([ kp1[m.queryIdx].pt for m in good ]).reshape(-1,1,2)
dst_pts = np.float32([ kp2[m.trainIdx].pt for m in good ]).reshape(-1,1,2)
M, mask = cv.findHomography(src_pts, dst_pts, cv.RANSAC,5.0)
matchesMask = mask.ravel().tolist()
h,w = img1.shape
pts = np.float32([ [0,0],[0,h-1],[w-1,h-1],[w-1,0] ]).reshape(-1,1,2)
dst = cv.perspectiveTransform(pts,M)
img2 = cv.polylines(img2,[np.int32(dst)],True,255,3, cv.LINE_AA)
else:
print( "Not enough matches are found - {}/{}".format(len(good), MIN_MATCH_COUNT) )
matchesMask = None
@endcode
Finally we draw our inliers (if the object was successfully found) or the matching keypoints (if it failed).
@code{.py}
draw_params = dict(matchColor = (0,255,0), # draw matches in green color
singlePointColor = None,
matchesMask = matchesMask, # draw only inliers
flags = 2)
img3 = cv.drawMatches(img1,kp1,img2,kp2,good,None,**draw_params)
plt.imshow(img3, 'gray'),plt.show()
@endcode
See the result below. Object is marked in white color in cluttered image:
![image](images/homography_findobj.jpg)
Additional Resources
--------------------
Exercises
---------

View File

@@ -0,0 +1,150 @@
Harris Corner Detection {#tutorial_py_features_harris}
=======================
Goal
----
In this chapter,
- We will understand the concepts behind Harris Corner Detection.
- We will see the following functions: **cv.cornerHarris()**, **cv.cornerSubPix()**
Theory
------
In the last chapter, we saw that corners are regions in the image with large variation in intensity in
all the directions. One early attempt to find these corners was made by **Chris Harris & Mike
Stephens** in their paper **A Combined Corner and Edge Detector** in 1988, so now it is called
the Harris Corner Detector. They took this simple idea to a mathematical form. It basically finds the
difference in intensity for a displacement of \f$(u,v)\f$ in all directions. This is expressed as below:
\f[E(u,v) = \sum_{x,y} \underbrace{w(x,y)}_\text{window function} \, [\underbrace{I(x+u,y+v)}_\text{shifted intensity}-\underbrace{I(x,y)}_\text{intensity}]^2\f]
The window function is either a rectangular window or a Gaussian window which gives weights to pixels
underneath.
We have to maximize this function \f$E(u,v)\f$ for corner detection. That means we have to maximize the
second term. Applying Taylor Expansion to the above equation and using some mathematical steps (please
refer to any standard text books you like for full derivation), we get the final equation as:
\f[E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}\f]
where
\f[M = \sum_{x,y} w(x,y) \begin{bmatrix}I_x I_x & I_x I_y \\
I_x I_y & I_y I_y \end{bmatrix}\f]
Here, \f$I_x\f$ and \f$I_y\f$ are image derivatives in x and y directions respectively. (These can be easily found
using **cv.Sobel()**).
Then comes the main part. After this, they created a score, basically an equation, which
determines if a window can contain a corner or not.
\f[R = \det(M) - k(\operatorname{trace}(M))^2\f]
where
- \f$\det(M) = \lambda_1 \lambda_2\f$
- \f$\operatorname{trace}(M) = \lambda_1 + \lambda_2\f$
- \f$\lambda_1\f$ and \f$\lambda_2\f$ are the eigenvalues of \f$M\f$
So the magnitudes of these eigenvalues decide whether a region is a corner, an edge, or flat.
- When \f$|R|\f$ is small, which happens when \f$\lambda_1\f$ and \f$\lambda_2\f$ are small, the region is
flat.
- When \f$R<0\f$, which happens when \f$\lambda_1 \gg \lambda_2\f$ or vice versa, the region is an edge.
- When \f$R\f$ is large, which happens when \f$\lambda_1\f$ and \f$\lambda_2\f$ are large and
\f$\lambda_1 \sim \lambda_2\f$, the region is a corner.
It can be represented in a nice picture as follows:
![image](images/harris_region.jpg)
So the result of Harris Corner Detection is a grayscale image with these scores. Thresholding for a
suitable score gives you the corners in the image. We will do it with a simple image.
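For intuition (an illustrative approximation, not OpenCV's internal implementation), the Harris response map
can be built directly from Sobel derivatives; the block size, aperture and k below are the same example values
used later with **cv.cornerHarris()**:
@code{.py}
import numpy as np
import cv2 as cv
def harris_response(gray, block_size=2, ksize=3, k=0.04):
    gray = np.float32(gray)
    # image derivatives I_x and I_y
    Ix = cv.Sobel(gray, cv.CV_32F, 1, 0, ksize=ksize)
    Iy = cv.Sobel(gray, cv.CV_32F, 0, 1, ksize=ksize)
    # sum the products over the block_size x block_size window w(x,y)
    win = (block_size, block_size)
    Sxx = cv.boxFilter(Ix * Ix, -1, win, normalize=False)
    Syy = cv.boxFilter(Iy * Iy, -1, win, normalize=False)
    Sxy = cv.boxFilter(Ix * Iy, -1, win, normalize=False)
    # R = det(M) - k * trace(M)^2 at every pixel
    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    return det - k * trace * trace
@endcode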
Harris Corner Detector in OpenCV
--------------------------------
OpenCV has the function **cv.cornerHarris()** for this purpose. Its arguments are:
- **img** - Input image. It should be grayscale and float32 type.
- **blockSize** - It is the size of neighbourhood considered for corner detection
- **ksize** - Aperture parameter of the Sobel derivative used.
- **k** - Harris detector free parameter in the equation.
See the example below:
@code{.py}
import numpy as np
import cv2 as cv
filename = 'chessboard.png'
img = cv.imread(filename)
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
gray = np.float32(gray)
dst = cv.cornerHarris(gray,2,3,0.04)
#result is dilated for marking the corners, not important
dst = cv.dilate(dst,None)
# Threshold for an optimal value, it may vary depending on the image.
img[dst>0.01*dst.max()]=[0,0,255]
cv.imshow('dst',img)
if cv.waitKey(0) & 0xff == 27:
cv.destroyAllWindows()
@endcode
Below are the three results:
![image](images/harris_result.jpg)
Corner with SubPixel Accuracy
-----------------------------
Sometimes, you may need to find the corners with maximum accuracy. OpenCV comes with a function
**cv.cornerSubPix()** which further refines the corners detected with sub-pixel accuracy. Below is
an example. As usual, we need to find the Harris corners first. Then we pass the centroids of these
corners (There may be a bunch of pixels at a corner, we take their centroid) to refine them. Harris
corners are marked in red pixels and refined corners are marked in green pixels. For this function,
we have to define the criteria when to stop the iteration. We stop it after a specified number of
iterations or a certain accuracy is achieved, whichever occurs first. We also need to define the size
of the neighbourhood it searches for corners.
@code{.py}
import numpy as np
import cv2 as cv
filename = 'chessboard2.jpg'
img = cv.imread(filename)
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
# find Harris corners
gray = np.float32(gray)
dst = cv.cornerHarris(gray,2,3,0.04)
dst = cv.dilate(dst,None)
ret, dst = cv.threshold(dst,0.01*dst.max(),255,0)
dst = np.uint8(dst)
# find centroids
ret, labels, stats, centroids = cv.connectedComponentsWithStats(dst)
# define the criteria to stop and refine the corners
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners = cv.cornerSubPix(gray,np.float32(centroids),(5,5),(-1,-1),criteria)
# Now draw them
res = np.hstack((centroids,corners))
res = np.int0(res)
img[res[:,1],res[:,0]]=[0,0,255]
img[res[:,3],res[:,2]] = [0,255,0]
cv.imwrite('subpixel5.png',img)
@endcode
Below is the result, where some important locations are shown in the zoomed window to visualize:
![image](images/subpixel3.png)
Additional Resources
--------------------
Exercises
---------

View File

@@ -0,0 +1,89 @@
Understanding Features {#tutorial_py_features_meaning}
======================
Goal
----
In this chapter, we will just try to understand what are features, why are they important, why
corners are important etc.
Explanation
-----------
Most of you will have played jigsaw puzzle games. You get a lot of small pieces of an image,
where you need to assemble them correctly to form a big real image. **The question is, how do you do
it?** What about projecting the same theory onto a computer program so that the computer can play
jigsaw puzzles? If the computer can play jigsaw puzzles, why can't we give a lot of real-life images
of a good natural scenery to the computer and tell it to stitch all those images into a big single image?
If the computer can stitch several natural images into one, what about giving it a lot of pictures of a
building or any structure and telling the computer to create a 3D model out of it?
Well, the questions and imaginations continue. But it all depends on the most basic question: How do
you play jigsaw puzzles? How do you arrange lots of scrambled image pieces into a big single image?
How can you stitch a lot of natural images to a single image?
The answer is, we are looking for specific patterns or specific features which are unique, can
be easily tracked and can be easily compared. If we go for a definition of such a feature, we may
find it difficult to express it in words, but we know what they are. If someone asks you to point
out one good feature which can be compared across several images, you can point out one. That is
why even small children can simply play these games. We search for these features in an image,
find them, look for the same features in other images and align them. That's it. (In jigsaw puzzle,
we look more into continuity of different images). All these abilities are present in us inherently.
So our one basic question expands into several, but they become more specific: **What are these
features?** (The answer should be understandable to a computer as well.)
It is difficult to say how humans find these features. This is already programmed in our brain.
But if we look deep into some pictures and search for different patterns, we will find something
interesting. For example, take below image:
![image](images/feature_building.jpg)
The image is very simple. At the top of the image, six small image patches are given. The question for you is to
find the exact location of these patches in the original image. How many correct results can you
find?
A and B are flat surfaces and they are spread over a lot of area. It is difficult to find the exact
location of these patches.
C and D are much simpler. They are edges of the building. You can find an approximate location, but
the exact location is still difficult, because the pattern is the same everywhere along the edge;
perpendicular to the edge, however, it does change. An edge is therefore a better feature compared to a flat area, but
not good enough (it is good in a jigsaw puzzle for comparing the continuity of edges).
Finally, E and F are some corners of the building, and they can be easily found, because at the
corners, wherever you move this patch, it will look different. So they can be considered good
features. So now we move to a simpler (and widely used) image for better understanding.
![image](images/feature_simple.png)
Just like above, the blue patch is a flat area and is difficult to find and track: wherever you move the blue
patch, it looks the same. The black patch has an edge. If you move it in the vertical direction (i.e.
along the gradient) it changes; moved along the edge (parallel to the edge), it looks the same. And the
red patch is a corner: wherever you move the patch, it looks different, which means it is unique. So
basically, corners are considered to be good features in an image. (Not just corners; in some cases
blobs are considered good features.)
So now we have answered our question, "what are these features?". But the next question arises: how do we
find them? Or how do we find the corners? We answered that in an intuitive way, i.e., look for
the regions in images which have maximum variation when moved (by a small amount) in all regions
around them. This will be projected into computer language in the coming chapters. So finding these image
features is called **Feature Detection**.
We found the features in the images. Once you have found them, you should be able to find the same
features in other images. How is this done? We take a region around the feature and describe it in our own
words, like "the upper part is blue sky, the lower part is a region from a building, on that building there is
glass etc.", and you search for the same area in the other images. Basically, you are describing the
feature. Similarly, a computer should also describe the region around the feature so that it can
find it in other images. This so-called description is called **Feature Description**. Once you have the
features and their description, you can find the same features in all images and align them, stitch them together,
or do whatever you want.
So in this module, we are looking at different algorithms in OpenCV to find features, describe them,
match them etc.
Additional Resources
--------------------
Exercises
---------

View File

@@ -0,0 +1,217 @@
Feature Matching {#tutorial_py_matcher}
================
Goal
----
In this chapter
- We will see how to match features in one image with others.
- We will use the Brute-Force matcher and FLANN Matcher in OpenCV
Basics of Brute-Force Matcher
-----------------------------
The Brute-Force matcher is simple. It takes the descriptor of one feature in the first set and matches it
with all other features in the second set using some distance calculation. And the closest one is
returned.
For the BF matcher, first we have to create the BFMatcher object using **cv.BFMatcher()**. It takes two
optional params. The first one is normType. It specifies the distance measurement to be used. By
default, it is cv.NORM_L2. It is good for SIFT, SURF etc. (cv.NORM_L1 is also there). For binary
string based descriptors like ORB, BRIEF, BRISK etc., cv.NORM_HAMMING should be used, which uses
Hamming distance as the measurement. If ORB is using WTA_K == 3 or 4, cv.NORM_HAMMING2 should be
used.
The second param is a boolean variable, crossCheck, which is false by default. If it is true, the Matcher
returns only those matches with value (i,j) such that the i-th descriptor in set A has the j-th descriptor
in set B as the best match and vice-versa. That is, the two features in both sets should match each
other. It provides consistent results, and is a good alternative to the ratio test proposed by D. Lowe in
the SIFT paper.
Once it is created, two important methods are *BFMatcher.match()* and *BFMatcher.knnMatch()*. The first
one returns the best match. The second method returns the k best matches, where k is specified by the user.
It may be useful when we need to do additional work on them.
Just like we used cv.drawKeypoints() to draw keypoints, **cv.drawMatches()** helps us draw the
matches. It stacks two images horizontally and draws lines from the first image to the second showing the
best matches. There is also **cv.drawMatchesKnn** which draws all the k best matches. If k=2, it
will draw two match-lines for each keypoint. So we have to pass a mask if we want to selectively
draw them.
Let's see one example for each of SIFT and ORB (Both use different distance measurements).
### Brute-Force Matching with ORB Descriptors
Here, we will see a simple example on how to match features between two images. In this case, I have
a queryImage and a trainImage. We will try to find the queryImage in trainImage using feature
matching. ( The images are /samples/data/box.png and /samples/data/box_in_scene.png)
We are using ORB descriptors to match features. So let's start with loading images, finding
descriptors etc.
@code{.py}
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
img1 = cv.imread('box.png',cv.IMREAD_GRAYSCALE) # queryImage
img2 = cv.imread('box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage
# Initiate ORB detector
orb = cv.ORB_create()
# find the keypoints and descriptors with ORB
kp1, des1 = orb.detectAndCompute(img1,None)
kp2, des2 = orb.detectAndCompute(img2,None)
@endcode
Next we create a BFMatcher object with the distance measurement cv.NORM_HAMMING (since we are using
ORB) and crossCheck switched on for better results. Then we use the Matcher.match() method to get the
best matches between the two images. We sort them in ascending order of their distances so that the best matches
(with low distance) come to the front. Then we draw only the first 10 matches (just for the sake of visibility;
you can increase it as you like).
@code{.py}
# create BFMatcher object
bf = cv.BFMatcher(cv.NORM_HAMMING, crossCheck=True)
# Match descriptors.
matches = bf.match(des1,des2)
# Sort them in the order of their distance.
matches = sorted(matches, key = lambda x:x.distance)
# Draw first 10 matches.
img3 = cv.drawMatches(img1,kp1,img2,kp2,matches[:10],None,flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
plt.imshow(img3),plt.show()
@endcode
Below is the result I got:
![image](images/matcher_result1.jpg)
### What is this Matcher Object?
The result of the matches = bf.match(des1,des2) line is a list of DMatch objects. Each DMatch object has the
following attributes (a small inspection sketch follows the list):
- DMatch.distance - Distance between descriptors. The lower, the better it is.
- DMatch.trainIdx - Index of the descriptor in train descriptors
- DMatch.queryIdx - Index of the descriptor in query descriptors
- DMatch.imgIdx - Index of the train image.
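Continuing the ORB example above (so kp1, kp2 and matches already exist), a minimal way to inspect the
best match might look like this:
@code{.py}
# pick the best (lowest-distance) match from the ORB example above
best = min(matches, key=lambda m: m.distance)
print(best.distance, best.queryIdx, best.trainIdx, best.imgIdx)
# coordinates of the matched keypoints in the query and train images
print(kp1[best.queryIdx].pt, kp2[best.trainIdx].pt)
@endcode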
### Brute-Force Matching with SIFT Descriptors and Ratio Test
This time, we will use BFMatcher.knnMatch() to get k best matches. In this example, we will take k=2
so that we can apply ratio test explained by D.Lowe in his paper.
@code{.py}
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
img1 = cv.imread('box.png',cv.IMREAD_GRAYSCALE) # queryImage
img2 = cv.imread('box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage
# Initiate SIFT detector
sift = cv.SIFT_create()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
# BFMatcher with default params
bf = cv.BFMatcher()
matches = bf.knnMatch(des1,des2,k=2)
# Apply ratio test
good = []
for m,n in matches:
if m.distance < 0.75*n.distance:
good.append([m])
# cv.drawMatchesKnn expects list of lists as matches.
img3 = cv.drawMatchesKnn(img1,kp1,img2,kp2,good,None,flags=cv.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
plt.imshow(img3),plt.show()
@endcode
See the result below:
![image](images/matcher_result2.jpg)
FLANN based Matcher
-------------------
FLANN stands for Fast Library for Approximate Nearest Neighbors. It contains a collection of
algorithms optimized for fast nearest neighbor search in large datasets and for high dimensional
features. It works faster than BFMatcher for large datasets. We will see the second example
with FLANN based matcher.
For the FLANN based matcher, we need to pass two dictionaries which specify the algorithm to be used,
its related parameters etc. The first one is IndexParams. For various algorithms, the information to be
passed is explained in the FLANN docs. As a summary, for algorithms like SIFT, SURF etc. you can pass the
following:
@code{.py}
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
@endcode
While using ORB, you can pass the following. The commented values are recommended as per the docs,
but in some cases they didn't provide the required results; other values worked fine:
@code{.py}
FLANN_INDEX_LSH = 6
index_params= dict(algorithm = FLANN_INDEX_LSH,
table_number = 6, # 12
key_size = 12, # 20
multi_probe_level = 1) #2
@endcode
The second dictionary is the SearchParams. It specifies the number of times the trees in the index
should be recursively traversed. Higher values give better precision, but also take more time. If
you want to change the value, pass search_params = dict(checks=100).
With this information, we are good to go.
@code{.py}
import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
img1 = cv.imread('box.png',cv.IMREAD_GRAYSCALE) # queryImage
img2 = cv.imread('box_in_scene.png',cv.IMREAD_GRAYSCALE) # trainImage
# Initiate SIFT detector
sift = cv.SIFT_create()
# find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1,None)
kp2, des2 = sift.detectAndCompute(img2,None)
# FLANN parameters
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50) # or pass empty dictionary
flann = cv.FlannBasedMatcher(index_params,search_params)
matches = flann.knnMatch(des1,des2,k=2)
# Need to draw only good matches, so create a mask
matchesMask = [[0,0] for i in range(len(matches))]
# ratio test as per Lowe's paper
for i,(m,n) in enumerate(matches):
if m.distance < 0.7*n.distance:
matchesMask[i]=[1,0]
draw_params = dict(matchColor = (0,255,0),
singlePointColor = (255,0,0),
matchesMask = matchesMask,
flags = cv.DrawMatchesFlags_DEFAULT)
img3 = cv.drawMatchesKnn(img1,kp1,img2,kp2,matches,None,**draw_params)
plt.imshow(img3,),plt.show()
@endcode
See the result below:
![image](images/matcher_flann.jpg)
Additional Resources
--------------------
Exercises
---------

View File

@@ -0,0 +1,98 @@
ORB (Oriented FAST and Rotated BRIEF) {#tutorial_py_orb}
=====================================
Goal
----
In this chapter,
- We will see the basics of ORB
Theory
------
As an OpenCV enthusiast, the most important thing about ORB is that it came from "OpenCV Labs".
This algorithm was brought up by Ethan Rublee, Vincent Rabaud, Kurt Konolige and Gary R. Bradski in
their paper **ORB: An efficient alternative to SIFT or SURF** in 2011. As the title says, it is a
good alternative to SIFT and SURF in computation cost, matching performance and, mainly, the patents.
Yes, SIFT and SURF are patented and you are supposed to pay for their use. But ORB is not!
ORB is basically a fusion of the FAST keypoint detector and the BRIEF descriptor with many modifications to
enhance the performance. First it uses FAST to find keypoints, then applies the Harris corner measure to
find the top N points among them. It also uses a pyramid to produce multiscale features. But one problem is
that FAST doesn't compute the orientation. So what about rotation invariance? The authors came up with the
following modification.
It computes the intensity-weighted centroid of the patch with the located corner at its center. The
direction of the vector from this corner point to the centroid gives the orientation. To improve the
rotation invariance, the moments are computed with x and y restricted to a circular region of
radius \f$r\f$, where \f$r\f$ is the size of the patch.
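As a rough illustration of the intensity-centroid idea (a sketch assuming a square patch centred on the
keypoint, not ORB's exact implementation):
@code{.py}
import numpy as np
def patch_orientation(patch):
    # intensity-weighted centroid via the image moments m10 and m01 (keypoint at patch centre)
    ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    cy, cx = (patch.shape[0] - 1) / 2.0, (patch.shape[1] - 1) / 2.0
    m10 = np.sum((xs - cx) * patch)
    m01 = np.sum((ys - cy) * patch)
    # orientation = angle of the vector from the corner (centre) to the centroid
    return np.arctan2(m01, m10)
@endcode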
Now for descriptors, ORB uses BRIEF descriptors. But we have already seen that BRIEF performs poorly
with rotation. So what ORB does is "steer" BRIEF according to the orientation of the keypoints. For
any feature set of \f$n\f$ binary tests at location \f$(x_i, y_i)\f$, define a \f$2 \times n\f$ matrix \f$S\f$
which contains the coordinates of these pixels. Then using the orientation of the patch, \f$\theta\f$, its
rotation matrix is found and used to rotate \f$S\f$ to get the steered (rotated) version \f$S_\theta\f$.
ORB discretizes the angle into increments of \f$2 \pi /30\f$ (12 degrees), and constructs a lookup table of
precomputed BRIEF patterns. As long as the keypoint orientation \f$\theta\f$ is consistent across views,
the correct set of points \f$S_\theta\f$ will be used to compute its descriptor.
BRIEF has an important property that each bit feature has a large variance and a mean near 0.5. But
once it is oriented along the keypoint direction, it loses this property and becomes more distributed.
High variance makes a feature more discriminative, since it responds differentially to inputs.
Another desirable property is to have the tests uncorrelated, since then each test will contribute
to the result. To achieve all this, ORB runs a greedy search among all possible binary tests to
find the ones that have both high variance and means close to 0.5, as well as being uncorrelated.
The result is called **rBRIEF**.
For descriptor matching, multi-probe LSH, which improves on the traditional LSH, is used. The paper
says ORB is much faster than SURF and SIFT, and the ORB descriptor works better than SURF. ORB is a good
choice in low-power devices for panorama stitching etc.
ORB in OpenCV
-------------
As usual, we have to create an ORB object with the function **cv.ORB_create()** or using the feature2d common
interface. It has a number of optional parameters. The most useful ones are nFeatures, which denotes the
maximum number of features to be retained (by default 500), and scoreType, which denotes whether the Harris
score or the FAST score is used to rank the features (by default, the Harris score). Another parameter, WTA_K,
decides the number of points that produce each element of the oriented BRIEF descriptor. By default it
is two, i.e. it selects two points at a time. In that case, for matching, NORM_HAMMING distance is used.
If WTA_K is 3 or 4, which takes 3 or 4 points to produce the BRIEF descriptor, then the matching distance
is defined by NORM_HAMMING2.
Below is a simple code which shows the use of ORB.
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('simple.jpg',0)
# Initiate ORB detector
orb = cv.ORB_create()
# find the keypoints with ORB
kp = orb.detect(img,None)
# compute the descriptors with ORB
kp, des = orb.compute(img, kp)
# draw only keypoints location,not size and orientation
img2 = cv.drawKeypoints(img, kp, None, color=(0,255,0), flags=0)
plt.imshow(img2), plt.show()
@endcode
See the result below:
![image](images/orb_kp.jpg)
ORB feature matching will be done in another chapter.
Additional Resources
--------------------
-# Ethan Rublee, Vincent Rabaud, Kurt Konolige, Gary R. Bradski: ORB: An efficient alternative to
SIFT or SURF. ICCV 2011: 2564-2571.
Exercises
---------

View File

@@ -0,0 +1,75 @@
Shi-Tomasi Corner Detector & Good Features to Track {#tutorial_py_shi_tomasi}
===================================================
Goal
----
In this chapter,
- We will learn about another corner detector: the Shi-Tomasi Corner Detector
- We will see the function: **cv.goodFeaturesToTrack()**
Theory
------
In the last chapter, we saw the Harris Corner Detector. Later in 1994, J. Shi and C. Tomasi made a small
modification to it in their paper **Good Features to Track** which shows better results compared to the
Harris Corner Detector. The scoring function in the Harris Corner Detector was given by:
\f[R = \lambda_1 \lambda_2 - k(\lambda_1+\lambda_2)^2\f]
Instead of this, Shi-Tomasi proposed:
\f[R = \min(\lambda_1, \lambda_2)\f]
If it is greater than a threshold value, it is considered a corner. If we plot it in
\f$\lambda_1 - \lambda_2\f$ space as we did for the Harris Corner Detector, we get an image as below:
![image](images/shitomasi_space.png)
From the figure, you can see that only when \f$\lambda_1\f$ and \f$\lambda_2\f$ are above a minimum value,
\f$\lambda_{\min}\f$, is it considered a corner (green region).
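As a side note (not in the original tutorial), OpenCV also exposes this raw response directly:
**cv.cornerMinEigenVal()** computes \f$\min(\lambda_1, \lambda_2)\f$ per pixel, so thresholding it reproduces
the idea above. A minimal sketch, assuming blox.jpg is available:
@code{.py}
import numpy as np
import cv2 as cv
img = cv.imread('blox.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# min(lambda1, lambda2) of the gradient covariance matrix over a small block
min_eig = cv.cornerMinEigenVal(gray, 3)
# mark pixels whose minimum eigenvalue is above a fraction of the maximum
img[min_eig > 0.1 * min_eig.max()] = [0, 0, 255]
cv.imwrite('shitomasi_response.png', img)
@endcode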
Code
----
OpenCV has a function, **cv.goodFeaturesToTrack()**. It finds the N strongest corners in the image by
the Shi-Tomasi method (or Harris Corner Detection, if you specify it). As usual, the image should be a
grayscale image. Then you specify the number of corners you want to find. Then you specify the quality
level, which is a value between 0 and 1 denoting the minimum quality of corner below which
every corner is rejected. Then we provide the minimum Euclidean distance between detected corners.
With all this information, the function finds corners in the image. All corners below the quality
level are rejected. Then it sorts the remaining corners by quality in descending order.
The function then takes the first strongest corner, throws away all the nearby corners within the
minimum distance, and returns the N strongest corners.
In the example below, we will try to find the 25 best corners:
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
img = cv.imread('blox.jpg')
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
corners = cv.goodFeaturesToTrack(gray,25,0.01,10)
corners = np.int0(corners)
for i in corners:
x,y = i.ravel()
cv.circle(img,(x,y),3,255,-1)
plt.imshow(img),plt.show()
@endcode
See the result below:
![image](images/shitomasi_block1.jpg)
This function is more appropriate for tracking. We will see that when its time comes.
Additional Resources
--------------------
Exercises
---------

View File

@@ -0,0 +1,168 @@
Introduction to SIFT (Scale-Invariant Feature Transform) {#tutorial_py_sift_intro}
========================================================
Goal
----
In this chapter,
- We will learn about the concepts of SIFT algorithm
- We will learn to find SIFT Keypoints and Descriptors.
Theory
------
In the last couple of chapters, we saw some corner detectors like Harris etc. They are
rotation-invariant, which means that even if the image is rotated, we can find the same corners. That is
obvious because corners remain corners in the rotated image as well. But what about scaling? A corner may
not be a corner if the image is scaled. For example, check the simple image below. A corner in a small
image within a small window is flat when it is zoomed in the same window. So the Harris corner detector is not
scale invariant.
![image](images/sift_scale_invariant.jpg)
In 2004, **D. Lowe**, University of British Columbia, came up with a new algorithm, Scale
Invariant Feature Transform (SIFT), in his paper **Distinctive Image Features from Scale-Invariant
Keypoints**, which extracts keypoints and computes their descriptors. *(This paper is easy to understand
and is considered to be the best material available on SIFT. This explanation is just a short summary of
the paper.)*
There are mainly four steps involved in SIFT algorithm. We will see them one-by-one.
### 1. Scale-space Extrema Detection
From the image above, it is obvious that we can't use the same window to detect keypoints at
different scales. It is OK for a small corner, but to detect larger corners we need larger windows.
For this, scale-space filtering is used. In it, the Laplacian of Gaussian is found for the image with
various \f$\sigma\f$ values. LoG acts as a blob detector which detects blobs of various sizes due to the
change in \f$\sigma\f$. In short, \f$\sigma\f$ acts as a scaling parameter. For example, in the above image, a
gaussian kernel with low \f$\sigma\f$ gives a high value for the small corner while a gaussian kernel with high
\f$\sigma\f$ fits well for the larger corner. So, we can find the local maxima across scale and space,
which gives us a list of \f$(x,y,\sigma)\f$ values, meaning there is a potential keypoint at (x,y) at
scale \f$\sigma\f$.
But this LoG is a little costly, so the SIFT algorithm uses the Difference of Gaussians, which is an
approximation of LoG. The Difference of Gaussian is obtained as the difference of the Gaussian blurring of
an image with two different \f$\sigma\f$, let them be \f$\sigma\f$ and \f$k\sigma\f$. This process is done for
different octaves of the image in the Gaussian Pyramid. It is represented in the image below:
![image](images/sift_dog.jpg)
Once these DoG images are found, they are searched for local extrema over scale and space. For example, one
pixel in an image is compared with its 8 neighbours as well as 9 pixels in the next scale and 9 pixels
in the previous scale. If it is a local extremum, it is a potential keypoint. It basically means that the
keypoint is best represented at that scale. This is shown in the image below:
![image](images/sift_local_extrema.jpg)
Regarding the different parameters, the paper gives some empirical data which can be summarized as:
number of octaves = 4, number of scale levels = 5, initial \f$\sigma=1.6\f$, \f$k=\sqrt{2}\f$ etc. as optimal
values.
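Purely for illustration (this is not how OpenCV's SIFT is implemented internally), a Difference of Gaussians
for one octave can be sketched as follows, using the paper's \f$\sigma=1.6\f$ and \f$k=\sqrt{2}\f$:
@code{.py}
import numpy as np
import cv2 as cv
def dog_octave(gray, num_levels=5, sigma=1.6, k=np.sqrt(2)):
    gray = np.float32(gray)
    # Gaussian-blurred images at sigma, k*sigma, k^2*sigma, ...
    blurred = [cv.GaussianBlur(gray, (0, 0), sigma * k**i) for i in range(num_levels)]
    # DoG images are the differences of adjacent blur levels
    return [blurred[i + 1] - blurred[i] for i in range(num_levels - 1)]
@endcode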
### 2. Keypoint Localization
Once potential keypoint locations are found, they have to be refined to get more accurate results.
The authors used a Taylor series expansion of the scale space to get a more accurate location of the extrema, and if
the intensity at this extremum is less than a threshold value (0.03 as per the paper), it is
rejected. This threshold is called **contrastThreshold** in OpenCV.
DoG has a higher response for edges, so edges also need to be removed. For this, a concept similar to
the Harris corner detector is used. They used a 2x2 Hessian matrix (H) to compute the principal
curvature. We know from the Harris corner detector that for edges, one eigenvalue is larger than the
other. So here they used a simple function of the trace and determinant of H, which depends only on the
ratio of the eigenvalues. If that eigenvalue ratio is greater than a threshold, called **edgeThreshold** in OpenCV,
the keypoint is discarded. It is given as 10 in the paper.
So it eliminates any low-contrast keypoints and edge keypoints and what remains is strong interest
points.
### 3. Orientation Assignment
Now an orientation is assigned to each keypoint to achieve invariance to image rotation. A
neighbourhood is taken around the keypoint location depending on the scale, and the gradient
magnitude and direction are calculated in that region. An orientation histogram with 36 bins covering
360 degrees is created (it is weighted by the gradient magnitude and by a gaussian-weighted circular window
with \f$\sigma\f$ equal to 1.5 times the scale of the keypoint). The highest peak in the histogram is taken,
and any peak above 80% of it is also considered when calculating the orientation. This creates keypoints
with the same location and scale, but different directions. It contributes to the stability of matching.
### 4. Keypoint Descriptor
Now the keypoint descriptor is created. A 16x16 neighbourhood around the keypoint is taken. It is
divided into 16 sub-blocks of 4x4 size. For each sub-block, an 8-bin orientation histogram is created,
so a total of 128 bin values are available. They are represented as a vector to form the keypoint
descriptor. In addition to this, several measures are taken to achieve robustness against
illumination changes, rotation etc.
### 5. Keypoint Matching
Keypoints between two images are matched by identifying their nearest neighbours. But in some cases,
the second-closest match may be very near to the first. This may happen due to noise or some other
reasons. In that case, the ratio of the closest distance to the second-closest distance is taken. If it is
greater than 0.8, the match is rejected. This eliminates around 90% of the false matches while discarding only
5% of the correct matches, as per the paper.
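In code, the ratio test amounts to something like the following sketch (borrowing the box.png /
box_in_scene.png sample images used elsewhere in these tutorials; the matcher itself is covered in a later
chapter):
@code{.py}
import cv2 as cv
img1 = cv.imread('box.png', cv.IMREAD_GRAYSCALE)           # query image (assumed available)
img2 = cv.imread('box_in_scene.png', cv.IMREAD_GRAYSCALE)  # train image (assumed available)
sift = cv.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
matches = cv.BFMatcher().knnMatch(des1, des2, k=2)
# keep a match only if the closest distance is well below the second-closest one
good = [m for m, n in matches if m.distance < 0.8 * n.distance]
print(len(good), "matches survive the ratio test")
@endcode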
This is a summary of SIFT algorithm. For more details and understanding, reading the original
paper is highly recommended.
SIFT in OpenCV
--------------
Now let's see SIFT functionalities available in OpenCV. Note that these were previously only
available in [the opencv contrib repo](https://github.com/opencv/opencv_contrib), but the patent
expired in the year 2020. So they are now included in the main repo. Let's start with keypoint
detection and draw them. First we have to construct a SIFT object. We can pass different
parameters to it which are optional and they are well explained in docs.
@code{.py}
import numpy as np
import cv2 as cv
img = cv.imread('home.jpg')
gray= cv.cvtColor(img,cv.COLOR_BGR2GRAY)
sift = cv.SIFT_create()
kp = sift.detect(gray,None)
img=cv.drawKeypoints(gray,kp,img)
cv.imwrite('sift_keypoints.jpg',img)
@endcode
The **sift.detect()** function finds the keypoints in the images. You can pass a mask if you want to
search only a part of the image. Each keypoint is a special structure which has many attributes like its
(x,y) coordinates, the size of the meaningful neighbourhood, the angle which specifies its orientation, the
response that specifies the strength of the keypoint, etc.
OpenCV also provides the **cv.drawKeypoints()** function which draws small circles at the locations
of the keypoints. If you pass the flag **cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS** to it, it will
draw a circle with the size of the keypoint and will even show its orientation. See the example below.
@code{.py}
img=cv.drawKeypoints(gray,kp,img,flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv.imwrite('sift_keypoints.jpg',img)
@endcode
See the two results below:
![image](images/sift_keypoints.jpg)
Now to calculate the descriptor, OpenCV provides two methods.
-# Since you already found keypoints, you can call **sift.compute()** which computes the
descriptors from the keypoints we have found. Eg: kp,des = sift.compute(gray,kp)
2. If you didn't find keypoints, directly find keypoints and descriptors in a single step with the
function, **sift.detectAndCompute()**.
We will see the second method:
@code{.py}
sift = cv.SIFT_create()
kp, des = sift.detectAndCompute(gray,None)
@endcode
Here kp will be a list of keypoints and des is a numpy array of shape
\f$\text{(Number of Keypoints)} \times 128\f$.
So we got keypoints, descriptors etc. Now we want to see how to match keypoints in different images.
That we will learn in coming chapters.
Additional Resources
--------------------
Exercises
---------

View File

@@ -0,0 +1,163 @@
Introduction to SURF (Speeded-Up Robust Features) {#tutorial_py_surf_intro}
=================================================
Goal
----
In this chapter,
- We will see the basics of SURF
- We will see SURF functionalities in OpenCV
Theory
------
In the last chapter, we saw SIFT for keypoint detection and description. But it was comparatively slow
and people needed a more speeded-up version. In 2006, three people, Bay, H., Tuytelaars, T. and Van
Gool, L., published another paper, "SURF: Speeded Up Robust Features", which introduced a new
algorithm called SURF. As the name suggests, it is a speeded-up version of SIFT.
In SIFT, Lowe approximated the Laplacian of Gaussian with the Difference of Gaussian for finding the
scale-space. SURF goes a little further and approximates the LoG with a Box Filter. The image below shows a
demonstration of such an approximation. One big advantage of this approximation is that convolution
with a box filter can be easily calculated with the help of integral images, and it can be done in
parallel for different scales. Also, SURF relies on the determinant of the Hessian matrix for both scale
and location.
![image](images/surf_boxfilter.jpg)
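To illustrate why box filters are cheap (a generic sketch of integral images, not SURF itself): with an
integral image, the sum over any rectangle takes just four lookups, regardless of the rectangle's size.
@code{.py}
import numpy as np
import cv2 as cv
gray = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # stand-in image
ii = cv.integral(gray)  # (H+1, W+1) integral image
def box_sum(ii, y0, x0, y1, x1):
    # sum of gray[y0:y1, x0:x1] from four corner lookups, independent of the box size
    return int(ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0])
print(box_sum(ii, 100, 100, 130, 150), gray[100:130, 100:150].sum())
@endcode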
For orientation assignment, SURF uses wavelet responses in the horizontal and vertical directions for a
neighbourhood of size 6s. Adequate gaussian weights are also applied. Then they are plotted in
a space as given in the image below. The dominant orientation is estimated by calculating the sum of all
responses within a sliding orientation window of angle 60 degrees. The interesting thing is that the
wavelet response can be found using integral images very easily at any scale. For many
applications, rotation invariance is not required, so there is no need to find this orientation, which
speeds up the process. SURF provides such a functionality, called Upright-SURF or U-SURF. It improves
speed and is robust up to \f$\pm 15^{\circ}\f$. OpenCV supports both, depending upon the flag
**upright**. If it is 0, the orientation is calculated. If it is 1, the orientation is not calculated and it
is faster.
![image](images/surf_orientation.jpg)
For feature description, SURF uses wavelet responses in the horizontal and vertical directions (again,
the use of integral images makes things easier). A neighbourhood of size 20s x 20s is taken around the
keypoint, where s is the size. It is divided into 4x4 subregions. For each subregion, horizontal and
vertical wavelet responses are taken and a vector is formed like this:
\f$v=( \sum{d_x}, \sum{d_y}, \sum{|d_x|}, \sum{|d_y|})\f$. This, represented as a vector, gives the SURF
feature descriptor with a total of 64 dimensions. The lower the dimension, the higher the speed of computation
and matching, while a higher dimension provides better distinctiveness of features.
For more distinctiveness, the SURF feature descriptor has an extended 128-dimension version. The sums of
\f$d_x\f$ and \f$|d_x|\f$ are computed separately for \f$d_y < 0\f$ and \f$d_y \geq 0\f$. Similarly, the sums of
\f$d_y\f$ and \f$|d_y|\f$ are split up according to the sign of \f$d_x\f$, thereby doubling the number of
features. It doesn't add much computational complexity. OpenCV supports both by setting the value of the
flag **extended** to 0 or 1 for 64-dim and 128-dim respectively (the demo below starts from the 64-dim default).
Another important improvement is the use of sign of Laplacian (trace of Hessian Matrix) for
underlying interest point. It adds no computation cost since it is already computed during
detection. The sign of the Laplacian distinguishes bright blobs on dark backgrounds from the reverse
situation. In the matching stage, we only compare features if they have the same type of contrast
(as shown in image below). This minimal information allows for faster matching, without reducing the
descriptor's performance.
![image](images/surf_matching.jpg)
In short, SURF adds a lot of features to improve the speed in every step. Analysis shows it is 3
times faster than SIFT while performance is comparable to SIFT. SURF is good at handling images with
blurring and rotation, but not good at handling viewpoint change and illumination change.
SURF in OpenCV
--------------
OpenCV provides SURF functionalities just like SIFT. You initiate a SURF object with some optional
conditions like 64/128-dim descriptors, Upright/Normal SURF etc. All the details are well explained
in docs. Then as we did in SIFT, we can use SURF.detect(), SURF.compute() etc for finding keypoints
and descriptors.
First we will see a simple demo on how to find SURF keypoints and descriptors and draw them. All
examples are shown in a Python terminal, since it is just the same as for SIFT.
@code{.py}
>>> img = cv.imread('fly.png',0)
# Create SURF object. You can specify params here or later.
# Here I set Hessian Threshold to 400
>>> surf = cv.xfeatures2d.SURF_create(400)
# Find keypoints and descriptors directly
>>> kp, des = surf.detectAndCompute(img,None)
>>> len(kp)
699
@endcode
699 keypoints are too many to show in a picture. We reduce the number to around 50 to draw on the image.
While matching, we may need all those features, but not now. So we increase the Hessian Threshold.
@code{.py}
# Check present Hessian threshold
>>> print( surf.getHessianThreshold() )
400.0
# We set it to some 50000. Remember, it is just for representing in picture.
# In actual cases, it is better to have a value 300-500
>>> surf.setHessianThreshold(50000)
# Again compute keypoints and check its number.
>>> kp, des = surf.detectAndCompute(img,None)
>>> print( len(kp) )
47
@endcode
It is less than 50. Let's draw it on the image.
@code{.py}
>>> img2 = cv.drawKeypoints(img,kp,None,(255,0,0),4)
>>> plt.imshow(img2),plt.show()
@endcode
See the result below. You can see that SURF is more like a blob detector. It detects the white blobs
on wings of butterfly. You can test it with other images.
![image](images/surf_kp1.jpg)
Now I want to apply U-SURF, so that it won't find the orientation.
@code{.py}
# Check the upright flag; if it is False, set it to True
>>> print( surf.getUpright() )
False
>>> surf.setUpright(True)
# Recompute the feature points and draw it
>>> kp = surf.detect(img,None)
>>> img2 = cv.drawKeypoints(img,kp,None,(255,0,0),4)
>>> plt.imshow(img2),plt.show()
@endcode
See the results below. All the orientations are shown in the same direction. It is faster than
before. If you are working on cases where orientation is not a problem (like panorama stitching),
this is better.
![image](images/surf_kp2.jpg)
Finally we check the descriptor size and change it to 128 if it is only 64-dim.
@code{.py}
# Find size of descriptor
>>> print( surf.descriptorSize() )
64
# That means flag, "extended" is False.
>>> surf.getExtended()
False
# So we set it to True to get 128-dim descriptors.
>>> surf.setExtended(True)
>>> kp, des = surf.detectAndCompute(img,None)
>>> print( surf.descriptorSize() )
128
>>> print( des.shape )
(47, 128)
@endcode
The remaining part is matching, which we will do in another chapter.
Additional Resources
--------------------
Exercises
---------

View File

@@ -0,0 +1,54 @@
Feature Detection and Description {#tutorial_py_table_of_contents_feature2d}
=================================
- @subpage tutorial_py_features_meaning
What are the main
features in an image? How can finding those features be useful to us?
- @subpage tutorial_py_features_harris
Okay, Corners are good
features? But how do we find them?
- @subpage tutorial_py_shi_tomasi
We will look into
Shi-Tomasi corner detection
- @subpage tutorial_py_sift_intro
The Harris corner detector
is not good enough when the scale of the image changes. Lowe developed a breakthrough method to find
scale-invariant features, and it is called SIFT
- @subpage tutorial_py_surf_intro
SIFT is really good,
but not fast enough, so people came up with a speeded-up version called SURF.
- @subpage tutorial_py_fast
All the above feature
detection methods are good in some way. But they are not fast enough to work in real-time
applications like SLAM. There comes the FAST algorithm, which is really "FAST".
- @subpage tutorial_py_brief
SIFT uses a feature
descriptor with 128 floating point numbers. Consider thousands of such features: it takes a lot of
memory and more time for matching. We can compress it to make it faster, but we still have to
calculate it first. Here comes BRIEF, which gives a shortcut to find binary descriptors with
less memory, faster matching, and still a high recognition rate.
- @subpage tutorial_py_orb
SIFT and SURF are good in what they do, but what if you have to pay a few dollars every year to use them in your applications? Yeah, they are patented!!! To solve that problem, OpenCV devs came up with a new "FREE" alternative to SIFT & SURF, and that is ORB.
- @subpage tutorial_py_matcher
We know a great deal about feature detectors and descriptors. It is time to learn how to match different descriptors. OpenCV provides two techniques, Brute-Force matcher and FLANN based matcher.
- @subpage tutorial_py_feature_homography
Now we know about feature matching. Let's mix it up with calib3d module to find objects in a complex image.