feat: switch backend to PaddleOCR-NCNN and migrate the project to CMake
1. The project backend has been fully migrated to the PaddleOCR-NCNN algorithm and has passed basic compatibility tests. 2. The project is now organized with CMake; to better accommodate third-party libraries, a QMake project will no longer be provided. 3. The copyright/notice files and the code layout have been reorganized to minimize the risk of license infringement. Log: switch backend to PaddleOCR-NCNN and migrate the project to CMake Change-Id: I4d5d2c5d37505a4a24b389b1a4c5d12f17bfa38c
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/images/kmeansicon.jpg (new file, 64 KiB)
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/images/knnicon.png (new file, 3.3 KiB)
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/images/svmicon.png (new file, 13 KiB)
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_kmeans/images/kmeans_begin.jpg (new file, 3.2 KiB)
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_kmeans/images/kmeans_demo.jpg (new file, 4.1 KiB)
3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_kmeans/py_kmeans_index.markdown (new file, +10)
@@ -0,0 +1,10 @@
K-Means Clustering {#tutorial_py_kmeans_index}
==================

-   @subpage tutorial_py_kmeans_understanding

    Read to get an intuitive understanding of K-Means Clustering

-   @subpage tutorial_py_kmeans_opencv

    Now let's try K-Means functions in OpenCV
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_kmeans/py_kmeans_opencv/images/oc_1d_clustered.png (new file, 11 KiB)
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_kmeans/py_kmeans_opencv/images/oc_1d_testdata.png (new file, 8.7 KiB)
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_kmeans/py_kmeans_opencv/images/oc_2d_clustered.jpg (new file, 10 KiB)
3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_kmeans/py_kmeans_opencv/py_kmeans_opencv.markdown (new file, +194)
@@ -0,0 +1,194 @@
K-Means Clustering in OpenCV {#tutorial_py_kmeans_opencv}
============================

Goal
----

-   Learn to use the **cv.kmeans()** function in OpenCV for data clustering

Understanding Parameters
------------------------

### Input parameters

-#  **samples** : It should be of **np.float32** data type, and each feature should be put in a
    single column.
-#  **nclusters(K)** : Number of clusters required at the end.
-#  **criteria** : It is the iteration termination criteria. When this criteria is satisfied, the algorithm iteration stops. It should be a tuple of 3 parameters, `( type, max_iter, epsilon )`:
    -#  type of termination criteria. It has 3 flags as below:
        -   **cv.TERM_CRITERIA_EPS** - stop the algorithm iteration if the specified accuracy, *epsilon*, is reached.
        -   **cv.TERM_CRITERIA_MAX_ITER** - stop the algorithm after the specified number of iterations, *max_iter*.
        -   **cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER** - stop the iteration when either of the above conditions is met.
    -#  max_iter - An integer specifying the maximum number of iterations.
    -#  epsilon - Required accuracy.

-#  **attempts** : Flag to specify the number of times the algorithm is executed using different
    initial labellings. The algorithm returns the labels that yield the best compactness. This
    compactness is returned as output.
-#  **flags** : This flag is used to specify how initial centers are taken. Normally two flags are
    used for this: **cv.KMEANS_PP_CENTERS** and **cv.KMEANS_RANDOM_CENTERS**.

### Output parameters

-#  **compactness** : It is the sum of squared distances from each point to its corresponding
    center.
-#  **labels** : This is the label array (same as 'code' in the previous article) where each element is
    marked '0', '1', ...
-#  **centers** : This is the array of cluster centers.

Now we will see how to apply the K-Means algorithm with three examples.

1. Data with Only One Feature
-----------------------------

Consider that you have a set of data with only one feature, i.e. it is one-dimensional. For example, we can take our
t-shirt problem where you use only the height of people to decide the size of the t-shirt.

So we start by creating the data and plotting it in Matplotlib:
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

x = np.random.randint(25,100,25)
y = np.random.randint(175,255,25)
z = np.hstack((x,y))
z = z.reshape((50,1))
z = np.float32(z)
plt.hist(z,256,[0,256]),plt.show()
@endcode
So we have 'z', which is an array of size 50 with values ranging from 0 to 255. I have reshaped 'z'
into a column vector. This will be more useful when more than one feature is present. Then I made the data
of np.float32 type.

We get the following image:



Now we apply the KMeans function. Before that we need to specify the criteria. My criteria are such
that, whenever 10 iterations of the algorithm have run, or an accuracy of epsilon = 1.0 is reached, stop
the algorithm and return the answer.
@code{.py}
# Define criteria = ( type, max_iter = 10 , epsilon = 1.0 )
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)

# Set flags (Just to avoid line break in the code)
flags = cv.KMEANS_RANDOM_CENTERS

# Apply KMeans
compactness,labels,centers = cv.kmeans(z,2,None,criteria,10,flags)
@endcode
This gives us the compactness, labels and centers. In this case, I got centers of 60 and 207. Labels
will have the same size as the test data, where each element is labelled '0', '1', '2' etc.
depending on its centroid. Now we split the data into different clusters depending on their labels.
@code{.py}
A = z[labels==0]
B = z[labels==1]
@endcode
Now we plot A in red, B in blue, and their centroids in yellow.
@code{.py}
# Now plot 'A' in red, 'B' in blue, 'centers' in yellow
plt.hist(A,256,[0,256],color = 'r')
plt.hist(B,256,[0,256],color = 'b')
plt.hist(centers,32,[0,256],color = 'y')
plt.show()
@endcode
Below is the output we got:



2. Data with Multiple Features
------------------------------

In the previous example, we took only height for the t-shirt problem. Here, we will take both height and
weight, i.e. two features.

Remember, in the previous case, we made our data a single column vector. Each feature is arranged in
a column, while each row corresponds to an input test sample.

For example, in this case, we set a test data of size 50x2, which are the heights and weights of 50
people. The first column corresponds to the heights of all 50 people and the second column corresponds to
their weights. The first row contains two elements, where the first one is the height of the first person and the
second one his weight. Similarly the remaining rows correspond to the heights and weights of the other people.
Check the image below:



Now I am moving directly to the code:
@code{.py}
import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

X = np.random.randint(25,50,(25,2))
Y = np.random.randint(60,85,(25,2))
Z = np.vstack((X,Y))

# convert to np.float32
Z = np.float32(Z)

# define criteria and apply kmeans()
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
ret,label,center=cv.kmeans(Z,2,None,criteria,10,cv.KMEANS_RANDOM_CENTERS)

# Now separate the data, Note the flatten()
A = Z[label.ravel()==0]
B = Z[label.ravel()==1]

# Plot the data
plt.scatter(A[:,0],A[:,1])
plt.scatter(B[:,0],B[:,1],c = 'r')
plt.scatter(center[:,0],center[:,1],s = 80,c = 'y', marker = 's')
plt.xlabel('Height'),plt.ylabel('Weight')
plt.show()
@endcode
Below is the output we get:



3. Color Quantization
---------------------

Color quantization is the process of reducing the number of colors in an image. One reason to do so is
to reduce memory usage. Sometimes, some devices may have limitations such that they can produce only a
limited number of colors. In those cases also, color quantization is performed. Here we use k-means
clustering for color quantization.

There is nothing new to be explained here. There are 3 features, say R, G, B. So we need to reshape
the image to an array of Mx3 size (M is the number of pixels in the image). And after the clustering, we
apply the centroid values (they are also R, G, B) to all pixels, such that the resulting image will have the
specified number of colors. And again we need to reshape it back to the shape of the original image.
Below is the code:
@code{.py}
import numpy as np
import cv2 as cv

img = cv.imread('home.jpg')
Z = img.reshape((-1,3))

# convert to np.float32
Z = np.float32(Z)

# define criteria, number of clusters(K) and apply kmeans()
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 8
ret,label,center=cv.kmeans(Z,K,None,criteria,10,cv.KMEANS_RANDOM_CENTERS)

# Now convert back into uint8, and make original image
center = np.uint8(center)
res = center[label.flatten()]
res2 = res.reshape((img.shape))

cv.imshow('res2',res2)
cv.waitKey(0)
cv.destroyAllWindows()
@endcode
See the result below for K=8:



Additional Resources
--------------------

Exercises
---------
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_kmeans/py_kmeans_understanding/images/testdata.jpg (new file, 11 KiB)
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_kmeans/py_kmeans_understanding/images/tshirt.jpg (new file, 8.8 KiB)
@@ -0,0 +1,85 @@
Understanding K-Means Clustering {#tutorial_py_kmeans_understanding}
================================

Goal
----

In this chapter, we will understand the concepts of K-Means Clustering, how it works, etc.

Theory
------

We will deal with this using a commonly used example.

### T-shirt size problem

Consider a company which is going to release a new model of T-shirt to the market. Obviously they will
have to manufacture models in different sizes to satisfy people of all sizes. So the company makes
data of people's heights and weights, and plots them on a graph, as below:



The company can't create t-shirts in all possible sizes. Instead, they divide people into Small, Medium and
Large, and manufacture only these 3 models, which will fit all the people. This grouping of
people into three groups can be done by k-means clustering, and the algorithm provides us the best 3 sizes,
which will satisfy all the people. And if it doesn't, the company can divide people into more groups, maybe
five, and so on. Check the image below:



### How does it work ?

This algorithm is an iterative process. We will explain it step-by-step with the help of images.

Consider a set of data as below (you can consider it as the t-shirt problem). We need to cluster this
data into two groups.



**Step 1** - The algorithm randomly chooses two centroids, \f$C1\f$ and \f$C2\f$ (sometimes, any two data points are
taken as the centroids).

**Step 2** - It calculates the distance from each point to both centroids. If a test data point is
closer to \f$C1\f$, then that data point is labelled with '0'. If it is closer to \f$C2\f$, then it is labelled '1'
(if there are more centroids, they are labelled '2', '3', etc).

In our case, we will color all the '0'-labelled points red and the '1'-labelled points blue. So we get the
following image after the above operations.



**Step 3** - Next we calculate the average of all the blue points and of all the red points separately, and those
will be our new centroids. That is, \f$C1\f$ and \f$C2\f$ shift to the newly calculated centroids. (Remember, the
images shown are not true values and are not to true scale; they are just for demonstration.)

And again, perform Step 2 with the new centroids and label the data '0' and '1'.

So we get the result as below:



Now **Step 2** and **Step 3** are iterated until both centroids converge to fixed points.
*(Or iteration may be stopped depending on the criteria we provide, like a maximum number of iterations, or a
specific accuracy being reached, etc.)* **These points are such that the sum of distances between the test data
and their corresponding centroids is minimum**. Or simply, the sum of distances between
\f$C1 \leftrightarrow Red\_Points\f$ and \f$C2 \leftrightarrow Blue\_Points\f$ is minimum.

\f[minimize \;\bigg[J = \sum_{All\: Red\_Points}distance(C1,Red\_Point) + \sum_{All\: Blue\_Points}distance(C2,Blue\_Point)\bigg]\f]

The final result looks roughly like this:


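For illustration only, the whole Step 2 / Step 3 loop above can be written out in a few lines of plain NumPy. This is just a sketch of the idea with made-up toy data and names (K=2, 1-D data as in the t-shirt example); it is not how **cv.kmeans()** is implemented:
@code{.py}
import numpy as np

# Toy 1-D data (think of it as the t-shirt heights), to be clustered into K=2 groups
X = np.float32(np.concatenate([np.random.randint(25, 100, 25),
                               np.random.randint(175, 255, 25)])).reshape(-1, 1)
K = 2
centroids = X[np.random.choice(len(X), K, replace=False)]   # Step 1: random initial centroids

for _ in range(10):                                          # iterate Step 2 and Step 3
    # Step 2: label each point with the index of its nearest centroid
    distances = np.abs(X - centroids.reshape(1, K))          # shape (N, K)
    labels = np.argmin(distances, axis=1)
    # Step 3: move each centroid to the mean of the points assigned to it
    new_centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                              else centroids[k] for k in range(K)])
    if np.allclose(new_centroids, centroids):                # centroids converged to fixed points
        break
    centroids = new_centroids

print(centroids.ravel())
@endcode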
So this is just an intuitive understanding of K-Means Clustering. For more details and a mathematical
explanation, please read any standard machine learning textbook or check the links in the additional
resources. This is just the top layer of K-Means clustering. There are a lot of modifications to this
algorithm, like how to choose the initial centroids, how to speed up the iteration process, etc.

Additional Resources
--------------------

-#  [Machine Learning Course](https://www.coursera.org/course/ml), Video lectures by Prof. Andrew Ng
    (Some of the images are taken from this)

Exercises
---------
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_knn/images/knn_icon1.jpg (new file, 2.4 KiB)
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_knn/images/knn_icon2.jpg (new file, 4.8 KiB)
3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_knn/py_knn_index.markdown (new file, +10)
@@ -0,0 +1,10 @@
K-Nearest Neighbour {#tutorial_py_knn_index}
===================

-   @subpage tutorial_py_knn_understanding

    Get a basic understanding of what kNN is

-   @subpage tutorial_py_knn_opencv

    Now let's use kNN in OpenCV for digit recognition OCR
3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_knn/py_knn_opencv/py_knn_opencv.markdown (new file, +123)
@@ -0,0 +1,123 @@
OCR of Hand-written Data using kNN {#tutorial_py_knn_opencv}
==================================

Goal
----

In this chapter:
-   We will use our knowledge of kNN to build a basic OCR (Optical Character Recognition) application.
-   We will try our application on the digits and alphabet data that comes with OpenCV.

OCR of Hand-written Digits
--------------------------

Our goal is to build an application which can read handwritten digits. For this we need some
training data and some test data. OpenCV comes with an image digits.png (in the folder
opencv/samples/data/) which has 5000 handwritten digits (500 for each digit). Each digit is
a 20x20 image. So our first step is to split this image into 5000 different digit images. Then for each digit (20x20 image),
we flatten it into a single row with 400 pixels. That is our feature set, i.e. the intensity values of all
pixels. It is the simplest feature set we can create. We use the first 250 samples of each digit as
training data, and the other 250 samples as test data. So let's prepare them first.
@code{.py}
import numpy as np
import cv2 as cv

img = cv.imread('digits.png')
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)

# Now we split the image to 5000 cells, each 20x20 size
cells = [np.hsplit(row,100) for row in np.vsplit(gray,50)]

# Make it into a Numpy array: its size will be (50,100,20,20)
x = np.array(cells)

# Now we prepare the training data and test data
train = x[:,:50].reshape(-1,400).astype(np.float32) # Size = (2500,400)
test = x[:,50:100].reshape(-1,400).astype(np.float32) # Size = (2500,400)

# Create labels for train and test data
k = np.arange(10)
train_labels = np.repeat(k,250)[:,np.newaxis]
test_labels = train_labels.copy()

# Initiate kNN, train it on the training data, then test it with the test data with k=5
knn = cv.ml.KNearest_create()
knn.train(train, cv.ml.ROW_SAMPLE, train_labels)
ret,result,neighbours,dist = knn.findNearest(test,k=5)

# Now we check the accuracy of classification
# For that, compare the result with test_labels and check which are wrong
matches = result==test_labels
correct = np.count_nonzero(matches)
accuracy = correct*100.0/result.size
print( accuracy )
@endcode
So our basic OCR app is ready. This particular example gave me an accuracy of 91%. One option to
improve accuracy is to add more data for training, especially for the digits where we had more errors.

Instead of building this training data every time I start the application, I would rather save it, so that the next time I can directly
read this data from a file and start classification. This can be done with the help of NumPy
functions like np.savetxt, np.savez, np.load, etc. Please check the NumPy docs for more details.
@code{.py}
# Save the data
np.savez('knn_data.npz',train=train, train_labels=train_labels)

# Now load the data
with np.load('knn_data.npz') as data:
    print( data.files )
    train = data['train']
    train_labels = data['train_labels']
@endcode
In my system, it takes around 4.4 MB of memory. Since we are using intensity values (uint8 data) as
features, it would be better to convert the data to np.uint8 first and then save it. It takes only
1.1 MB in this case. Then while loading, you can convert back into float32.
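For example, the saving step above could be changed along these lines (a sketch only, not part of the original sample):
@code{.py}
# Save the features as uint8 to keep the file small...
np.savez('knn_data.npz', train=np.uint8(train), train_labels=train_labels)

# ...and convert back to float32 after loading, since kNN expects float samples
with np.load('knn_data.npz') as data:
    train = data['train'].astype(np.float32)
    train_labels = data['train_labels']
@endcode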
OCR of the English Alphabet
---------------------------

Next we will do the same for the English alphabet, but there is a slight change in the data and feature
set. Here, instead of images, OpenCV comes with a data file, letter-recognition.data, in the
opencv/samples/cpp/ folder. If you open it, you will see 20000 lines which may, at first sight, look
like garbage. Actually, in each row, the first column is a letter, which is our label. The next 16 numbers
following it are its different features. These features were obtained from the [UCI Machine Learning
Repository](http://archive.ics.uci.edu/ml/). You can find the details of these features on [this
page](http://archive.ics.uci.edu/ml/datasets/Letter+Recognition).

There are 20000 samples available, so we take the first 10000 as training samples and the remaining
10000 as test samples. We should convert the letters to numbers (their ASCII codes) because we can't work with
letters directly.
@code{.py}
import cv2 as cv
import numpy as np

# Load the data and convert the letters to numbers
data= np.loadtxt('letter-recognition.data', dtype= 'float32', delimiter = ',',
                    converters= {0: lambda ch: ord(ch)-ord('A')})

# Split the dataset in two, with 10000 samples each for the training and test sets
train, test = np.vsplit(data,2)

# Split trainData and testData into features and responses
responses, trainData = np.hsplit(train,[1])
labels, testData = np.hsplit(test,[1])

# Initiate the kNN, classify, measure accuracy
knn = cv.ml.KNearest_create()
knn.train(trainData, cv.ml.ROW_SAMPLE, responses)
ret, result, neighbours, dist = knn.findNearest(testData, k=5)

correct = np.count_nonzero(result == labels)
accuracy = correct*100.0/10000
print( accuracy )
@endcode
It gives me an accuracy of 93.22%. Again, if you want to increase accuracy, you can iteratively add
more data.

Additional Resources
--------------------
1. [Wikipedia article on Optical character recognition](https://en.wikipedia.org/wiki/Optical_character_recognition)

Exercises
---------
1. Here we used k=5. What happens if you try other values of k? Can you find a value that maximizes accuracy (minimizes the number of errors)? See the sketch after this list for one way to start.
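A possible starting point for this exercise (a sketch only, reusing `train`, `train_labels`, `test`, and `test_labels` from the digits example above):
@code{.py}
# Sweep k and report the accuracy for each value
knn = cv.ml.KNearest_create()
knn.train(train, cv.ml.ROW_SAMPLE, train_labels)
for k in range(1, 11):
    ret, result, neighbours, dist = knn.findNearest(test, k=k)
    accuracy = np.count_nonzero(result == test_labels) * 100.0 / result.size
    print(k, accuracy)
@endcode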
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_knn/py_knn_understanding/images/knn_simple.png (new file, 7.3 KiB)
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_knn/py_knn_understanding/images/knn_theory.png (new file, 6.5 KiB)
@@ -0,0 +1,150 @@
Understanding k-Nearest Neighbour {#tutorial_py_knn_understanding}
=================================

Goal
----

In this chapter, we will understand the concepts of the k-Nearest Neighbour (kNN) algorithm.

Theory
------

kNN is one of the simplest classification algorithms available for supervised learning. The idea
is to search for the closest match(es) of the test data in the feature space. We will look into it with the image
below.



In the image, there are two families: Blue Squares and Red Triangles. We refer to each family as
a **Class**. Their houses are shown in their town map, which we call the **Feature Space**. You can consider
a feature space as a space where all data are projected. For example, consider a 2D coordinate
space. Each datum has two features, an x coordinate and a y coordinate. You can represent this datum in your 2D
coordinate space, right? Now imagine that there are three features: you will need 3D space. Now consider N
features: you need N-dimensional space, right? This N-dimensional space is its feature space.
In our image, you can consider it as a 2D case with two features.

Now consider what happens if a new member comes into the town and creates a new home, which is shown as the green circle. He
should be added to one of these Blue or Red families (or *classes*). We call that process **Classification**. How exactly should this new member be classified? Since we are dealing with kNN, let us apply the algorithm.

One simple method is to check who his nearest neighbour is. From the image, it is clear that it is a member of the Red
Triangle family. So he is classified as a Red Triangle. This method is simply called **Nearest Neighbour** classification, because the classification depends only on the *nearest neighbour*.

But there is a problem with this approach! A Red Triangle may be the nearest neighbour, but what if there are also a lot of Blue
Squares nearby? Then Blue Squares have more strength in that locality than Red Triangles, so
just checking the nearest one is not sufficient. Instead we may want to check some **k** nearest families. Then whichever family is the majority amongst them, the new guy should belong to that family. In our image, let's take k=3, i.e. consider the 3 nearest
neighbours. The new member has two Red neighbours and one Blue neighbour (there are two Blues equidistant, but since k=3, we take only
one of them), so again he should be added to the Red family. But what if we take k=7? Then he has 5 Blue
neighbours and 2 Red neighbours and should be added to the Blue family. The result will vary with the selected
value of k. Note that if k is not an odd number, we can get a tie, as would happen in the above case with k=4. We would see that our new member has 2 Red and 2 Blue neighbours as his four nearest neighbours, and we would need to choose a method for breaking the tie to perform classification. So to reiterate, this method is called **k-Nearest Neighbour** since
the classification depends on the *k nearest neighbours*.

Again, in kNN, it is true we are considering k neighbours, but we are giving equal importance to
all of them, right? Is this justified? For example, take the tied case of k=4. As we can see, the 2
Red neighbours are actually closer to the new member than the other 2 Blue neighbours, so he is more eligible to be
added to the Red family. How do we mathematically explain that? We give some weight to each neighbour
depending on its distance to the new-comer: those who are nearer to him get higher weights, while
those that are farther away get lower weights. Then we add the total weights of each family separately and classify the new-comer as part of whichever family
received the higher total weight. This is called **modified kNN** or **weighted kNN**.
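As a purely illustrative sketch (this is not an OpenCV function; the names and the simple 1/distance weighting are made up for this example), a distance-weighted vote could look like this:
@code{.py}
import numpy as np

def weighted_knn_vote(distances, neighbour_labels):
    """distances, neighbour_labels: 1-D arrays describing the k nearest neighbours."""
    weights = 1.0 / (distances + 1e-9)            # nearer neighbours get higher weights
    classes = np.unique(neighbour_labels)
    totals = [weights[neighbour_labels == c].sum() for c in classes]
    return classes[np.argmax(totals)]             # family with the higher total weight wins

# Tied k=4 case: 2 Red (label 0) are closer than 2 Blue (label 1)
print(weighted_knn_vote(np.array([1.0, 1.5, 3.0, 3.5]),
                        np.array([0, 0, 1, 1])))  # -> 0 (Red)
@endcode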
So what are some important things you see here?

-   Because we have to check the distance from the new-comer to all the existing houses to find the nearest neighbour(s), you need to have information about all of the houses in town, right? If there are plenty of houses and families, it takes a lot of memory, and also more time for calculation.
-   There is almost zero time for any kind of "training" or preparation. Our "learning" involves only memorizing (storing) the data, before testing and classifying.

Now let's see this algorithm at work in OpenCV.

kNN in OpenCV
-------------

We will do a simple example here, with two families (classes), just like above. Then in the next
chapter, we will do an even better example.

So here, we label the Red family as **Class-0** (so denoted by 0) and the Blue family as **Class-1**
(denoted by 1). We create 25 neighbours, i.e. 25 training data points, and label each of them as either part of Class-0 or Class-1.
We can do this with the help of a random number generator from NumPy.

Then we can plot it with the help of Matplotlib. Red neighbours are shown as red triangles and blue
neighbours are shown as blue squares.
@code{.py}
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt

# Feature set containing (x,y) values of 25 known/training data
trainData = np.random.randint(0,100,(25,2)).astype(np.float32)

# Label each one either Red or Blue with numbers 0 and 1
responses = np.random.randint(0,2,(25,1)).astype(np.float32)

# Take Red neighbours and plot them
red = trainData[responses.ravel()==0]
plt.scatter(red[:,0],red[:,1],80,'r','^')

# Take Blue neighbours and plot them
blue = trainData[responses.ravel()==1]
plt.scatter(blue[:,0],blue[:,1],80,'b','s')

plt.show()
@endcode
You will get something similar to our first image. Since you are using a random number generator, you
will get different data each time you run the code.

Next initiate the kNN algorithm and pass the trainData and responses to train the kNN. (Under the hood, it constructs
a search tree: see the Additional Resources section below for more information on this.)

Then we will bring in one new-comer and classify him as belonging to a family with the help of kNN in OpenCV. Before
running kNN, we need to know something about our test data (the data of the new-comers). Our data should be a
floating point array with size \f$number \; of \; testdata \times number \; of \; features\f$. Then we
find the nearest neighbours of the new-comer. We can specify *k*: how many neighbours we want. (Here we used 3.) It returns:

1.  The label given to the new-comer depending upon the kNN theory we saw earlier. If you want the *Nearest
    Neighbour* algorithm, just specify k=1.
2.  The labels of the k nearest neighbours.
3.  The corresponding distances from the new-comer to each nearest neighbour.

So let's see how it works. The new-comer is marked in green.
@code{.py}
newcomer = np.random.randint(0,100,(1,2)).astype(np.float32)
plt.scatter(newcomer[:,0],newcomer[:,1],80,'g','o')

knn = cv.ml.KNearest_create()
knn.train(trainData, cv.ml.ROW_SAMPLE, responses)
ret, results, neighbours ,dist = knn.findNearest(newcomer, 3)

print( "result: {}\n".format(results) )
print( "neighbours: {}\n".format(neighbours) )
print( "distance: {}\n".format(dist) )

plt.show()
@endcode
I got the following results:
@code{.py}
result: [[ 1.]]
neighbours: [[ 1. 1. 1.]]
distance: [[ 53. 58. 61.]]
@endcode
It says that our new-comer's 3 nearest neighbours are all from the Blue family. Therefore, he is labelled as part of the Blue
family. It is obvious from the plot below:



If you have multiple new-comers (test data), you can just pass them as an array. The corresponding results are also
obtained as arrays.
@code{.py}
# 10 new-comers
newcomers = np.random.randint(0,100,(10,2)).astype(np.float32)
ret, results, neighbours, dist = knn.findNearest(newcomers, 3)
# The results also will contain 10 labels.
@endcode
Additional Resources
--------------------

1. [NPTEL notes on Pattern Recognition, Chapter 11](https://nptel.ac.in/courses/106/108/106108057/)
2. [Wikipedia article on Nearest neighbor search](https://en.wikipedia.org/wiki/Nearest_neighbor_search)
3. [Wikipedia article on k-d tree](https://en.wikipedia.org/wiki/K-d_tree)

Exercises
---------
1. Try repeating the above with more classes and different choices of k. Does choosing k become harder with more classes in the same 2D feature space?
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_svm/images/svm_icon1.jpg (new file, 3.9 KiB)
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_svm/images/svm_icon2.jpg (new file, 4.5 KiB)
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_svm/py_svm_basics/images/svm_basics1.png (new file, 7.5 KiB)
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_svm/py_svm_basics/images/svm_basics2.png (new file, 7.8 KiB)
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_svm/py_svm_basics/images/svm_basics3.png (new file, 10 KiB)
3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_svm/py_svm_basics/py_svm_basics.markdown (new file, +135)
@@ -0,0 +1,135 @@
Understanding SVM {#tutorial_py_svm_basics}
=================

Goal
----

In this chapter
-   We will see an intuitive understanding of SVM

Theory
------

### Linearly Separable Data

Consider the image below, which has two types of data, red and blue. In kNN, for a test data point, we used
to measure its distance to all the training samples and take the one with the minimum distance. It takes
plenty of time to measure all the distances and plenty of memory to store all the training samples.
But considering the data given in the image, should we need that much?



Consider another idea. We find a line, \f$f(x)=ax_1+bx_2+c\f$, which divides the data into two
regions. When we get a new test_data \f$X\f$, we just substitute it into \f$f(x)\f$. If \f$f(X) > 0\f$, it belongs
to the blue group, else it belongs to the red group. We can call this line the **Decision Boundary**. It is
very simple and memory-efficient. Such data, which can be divided into two with a straight line (or
a hyperplane in higher dimensions), is called **Linearly Separable**.

So in the above image, you can see that plenty of such lines are possible. Which one should we take? Very
intuitively we can say that the line should pass as far as possible from all the points. Why?
Because there can be noise in the incoming data. This data should not affect the classification
accuracy. So taking the farthest line will provide more immunity against noise. So what SVM does is
find a straight line (or hyperplane) with the largest minimum distance to the training samples. See the
bold line in the image below passing through the center.



So to find this Decision Boundary, you need training data. Do you need all of it? No. Just the samples which
are close to the opposite group are sufficient. In our image, they are the one blue filled circle
and the two red filled squares. We call them **Support Vectors**, and the lines passing through them
are called **Support Planes**. They are adequate for finding our decision boundary. We need not
worry about all the data. It helps in data reduction.

What happens is that first the two hyperplanes which best represent the data are found. For example, the blue data
is represented by \f$w^Tx+b_0 > 1\f$ while the red data is represented by \f$w^Tx+b_0 < -1\f$, where \f$w\f$ is the
**weight vector** ( \f$w=[w_1, w_2,..., w_n]\f$) and \f$x\f$ is the feature vector
(\f$x = [x_1,x_2,..., x_n]\f$). \f$b_0\f$ is the **bias**. The weight vector decides the orientation of the decision
boundary, while the bias decides its location. Now the decision boundary is defined to be midway
between these hyperplanes, so it is expressed as \f$w^Tx+b_0 = 0\f$. The minimum distance from a support vector
to the decision boundary is given by \f$distance_{support \, vectors}=\frac{1}{||w||}\f$. The margin is
twice this distance, and we need to maximize this margin, i.e. we need to minimize a new function
\f$L(w, b_0)\f$ with some constraints, which can be expressed as below:

\f[\min_{w, b_0} L(w, b_0) = \frac{1}{2}||w||^2 \; \text{subject to} \; t_i(w^Tx+b_0) \geq 1 \; \forall i\f]

where \f$t_i\f$ is the label of each class, \f$t_i \in [-1,1]\f$.

### Non-Linearly Separable Data

Consider some data which can't be divided into two with a straight line. For example, consider
one-dimensional data where 'X' is at -3 & +3 and 'O' is at -1 & +1. Clearly it is not linearly
separable. But there are methods to solve these kinds of problems. If we can map this data set with
a function, \f$f(x) = x^2\f$, we get 'X' at 9 and 'O' at 1, which are linearly separable.

Otherwise we can convert this one-dimensional data to two-dimensional data. We can use the \f$f(x)=(x,x^2)\f$
function to map this data. Then 'X' becomes (-3,9) and (3,9) while 'O' becomes (-1,1) and (1,1).
This is also linearly separable. In short, there is a greater chance for non-linearly separable data in a
lower-dimensional space to become linearly separable in a higher-dimensional space.

In general, it is possible to map points in a d-dimensional space to some D-dimensional space
\f$(D>d)\f$ to check the possibility of linear separability. There is an idea which helps to compute the
dot product in the high-dimensional (kernel) space by performing computations in the low-dimensional
input (feature) space. We can illustrate this with the following example.

Consider two points in two-dimensional space, \f$p=(p_1,p_2)\f$ and \f$q=(q_1,q_2)\f$. Let \f$\phi\f$ be a
mapping function which maps a two-dimensional point to three-dimensional space as follows:

\f[\phi (p) = (p_{1}^2,p_{2}^2,\sqrt{2} p_1 p_2)
\phi (q) = (q_{1}^2,q_{2}^2,\sqrt{2} q_1 q_2)\f]

Let us define a kernel function \f$K(p,q)\f$ which does a dot product between two points, shown below:

\f[
\begin{aligned}
K(p,q) = \phi(p).\phi(q) &= \phi(p)^T \phi(q) \\
                         &= (p_{1}^2,p_{2}^2,\sqrt{2} p_1 p_2).(q_{1}^2,q_{2}^2,\sqrt{2} q_1 q_2) \\
                         &= p_{1}^2 q_{1}^2 + p_{2}^2 q_{2}^2 + 2 p_1 q_1 p_2 q_2 \\
                         &= (p_1 q_1 + p_2 q_2)^2 \\
          \phi(p).\phi(q) &= (p.q)^2
\end{aligned}
\f]

It means that a dot product in three-dimensional space can be achieved using the squared dot product in
two-dimensional space. This can be applied to higher-dimensional spaces. So we can calculate higher-dimensional
features from the lower dimensions themselves. Once we map them, we get a higher-dimensional
space.
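As a quick numerical check of the identity above (illustrative NumPy only, with arbitrary example points; nothing OpenCV-specific):
@code{.py}
import numpy as np

p = np.array([2.0, 3.0])
q = np.array([1.0, 4.0])

# the mapping phi from 2-D to 3-D described above
phi = lambda v: np.array([v[0]**2, v[1]**2, np.sqrt(2)*v[0]*v[1]])

print(np.dot(phi(p), phi(q)))   # dot product computed in the mapped 3-D space
print(np.dot(p, q)**2)          # (p.q)^2 computed in the original 2-D space -> same value
@endcode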
In addition to all these concepts, there is also the problem of misclassification. Just finding the
decision boundary with maximum margin is not sufficient. We need to consider the problem of
misclassification errors as well. Sometimes, it may be possible to find a decision boundary with less
margin, but with reduced misclassification. Anyway, we need to modify our model so that it
finds the decision boundary with maximum margin, but with less misclassification. The minimization
criterion is modified as:

\f[min \; ||w||^2 + C(distance \; of \; misclassified \; samples \; to \; their \; correct \; regions)\f]

The image below shows this concept. For each sample of the training data a new parameter \f$\xi_i\f$ is
defined. It is the distance from the corresponding training sample to its correct decision region.
For those which are not misclassified, they fall on their corresponding support planes, so their
distance is zero.



So the new optimization problem is:

\f[\min_{w, b_{0}} L(w,b_0) = ||w||^{2} + C \sum_{i} {\xi_{i}} \text{ subject to } y_{i}(w^{T} x_{i} + b_{0}) \geq 1 - \xi_{i} \text{ and } \xi_{i} \geq 0 \text{ } \forall i\f]

How should the parameter C be chosen? It is obvious that the answer to this question depends on how
the training data is distributed. Although there is no general answer, it is useful to take into
account these rules:

-   Large values of C give solutions with fewer misclassification errors but a smaller margin.
    Consider that in this case it is expensive to make misclassification errors. Since the aim of
    the optimization is to minimize the argument, few misclassification errors are allowed.
-   Small values of C give solutions with a bigger margin and more classification errors. In this
    case the minimization does not consider the sum term that much, so it focuses more on
    finding a hyperplane with a big margin.

A minimal sketch showing where C and the kernel choice appear in OpenCV's API is given after this list.
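This page itself contains no code, so the following is only an illustrative sketch of training an SVM with OpenCV's ml module; the toy data and numbers here are made up, and the point is just the calls to cv.ml.SVM_create(), setKernel(), setC(), train() and predict():
@code{.py}
import numpy as np
import cv2 as cv

# Toy linearly separable data: two 2-D clusters with class labels -1 and +1
trainData = np.float32(np.vstack([np.random.randint(0, 40, (20, 2)),
                                  np.random.randint(60, 100, (20, 2))]))
labels = np.int32(np.hstack([-np.ones(20), np.ones(20)])).reshape(-1, 1)

svm = cv.ml.SVM_create()
svm.setType(cv.ml.SVM_C_SVC)
svm.setKernel(cv.ml.SVM_LINEAR)   # the linear decision boundary discussed above
svm.setC(1.0)                     # the misclassification penalty C from the formula
svm.train(trainData, cv.ml.ROW_SAMPLE, labels)

# Predict the class of a new sample
sample = np.float32([[30, 30]])
print(svm.predict(sample)[1])
@endcode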
Additional Resources
--------------------

-#  [NPTEL notes on Statistical Pattern Recognition, Chapters
    25-29](http://www.nptel.ac.in/courses/106108057/26).

Exercises
---------
3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_svm/py_svm_index.markdown (new file, +10)
@@ -0,0 +1,10 @@
Support Vector Machines (SVM) {#tutorial_py_svm_index}
=============================

-   @subpage tutorial_py_svm_basics

    Get a basic understanding of what SVM is

-   @subpage tutorial_py_svm_opencv

    Let's use SVM functionalities in OpenCV
BIN  3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_svm/py_svm_opencv/images/deskew.jpg (new file, 11 KiB)
3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_svm/py_svm_opencv/py_svm_opencv.markdown (new file, +56)
@@ -0,0 +1,56 @@
OCR of Hand-written Data using SVM {#tutorial_py_svm_opencv}
==================================

Goal
----

In this chapter

-   We will revisit the hand-written data OCR, but, with SVM instead of kNN.

OCR of Hand-written Digits
--------------------------

In kNN, we directly used pixel intensity as the feature vector. This time we will use a [Histogram of
Oriented Gradients](http://en.wikipedia.org/wiki/Histogram_of_oriented_gradients) (HOG) as the feature
vector.

Here, before finding the HOG, we deskew the image using its second-order moments. So we first define
a function **deskew()** which takes a digit image and deskews it. Below is the deskew() function:

@snippet samples/python/tutorial_code/ml/py_svm_opencv/hogsvm.py deskew
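The snippet file itself is not reproduced in this diff; a sketch of what such a deskew() could look like (assuming 20x20 digit cells, i.e. SZ = 20, and shearing based on the second-order moments as described above) is:
@code{.py}
import numpy as np
import cv2 as cv

SZ = 20   # assumed size of each digit cell

def deskew(img):
    m = cv.moments(img)
    if abs(m['mu02']) < 1e-2:
        return img.copy()                       # image is already (almost) upright
    skew = m['mu11'] / m['mu02']                # skew estimated from second-order moments
    M = np.float32([[1, skew, -0.5*SZ*skew],
                    [0, 1, 0]])                 # shear that undoes the estimated skew
    return cv.warpAffine(img, M, (SZ, SZ),
                         flags=cv.WARP_INVERSE_MAP | cv.INTER_LINEAR)
@endcode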
The image below shows the above deskew function applied to an image of a zero. The left image is the original
image and the right image is the deskewed image.



Next we have to find the HOG descriptor of each cell. For that, we find the Sobel derivatives of each
cell in the X and Y directions. Then find the magnitude and direction of the gradient at each pixel. This
gradient direction is quantized to 16 integer values. Divide this image into four sub-squares. For each
sub-square, calculate the histogram of directions (16 bins) weighted with their magnitude. So each
sub-square gives you a vector containing 16 values. Four such vectors (of the four sub-squares) together
give us a feature vector containing 64 values. This is the feature vector we use to train our data.

@snippet samples/python/tutorial_code/ml/py_svm_opencv/hogsvm.py hog
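Again, as a sketch only (assuming the same 20x20 cells and the 16 orientation bins described above), such a hog() function could be written as:
@code{.py}
import numpy as np
import cv2 as cv

bin_n = 16    # number of orientation bins

def hog(img):
    gx = cv.Sobel(img, cv.CV_32F, 1, 0)
    gy = cv.Sobel(img, cv.CV_32F, 0, 1)
    mag, ang = cv.cartToPolar(gx, gy)
    bins = np.int32(bin_n*ang/(2*np.pi))        # quantize the gradient direction into 16 values
    # split the 20x20 cell into four 10x10 sub-squares
    bin_cells = bins[:10,:10], bins[10:,:10], bins[:10,10:], bins[10:,10:]
    mag_cells = mag[:10,:10], mag[10:,:10], mag[:10,10:], mag[10:,10:]
    # one 16-bin, magnitude-weighted histogram per sub-square
    hists = [np.bincount(b.ravel(), m.ravel(), bin_n) for b, m in zip(bin_cells, mag_cells)]
    return np.hstack(hists)                     # 4 x 16 = 64-value feature vector
@endcode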
Finally, as in the previous case, we start by splitting our big dataset into individual cells. For
every digit, 250 cells are reserved for training data and the remaining 250 are reserved for
testing. The full code is given below; you can also download it from [here](https://github.com/opencv/opencv/tree/master/samples/python/tutorial_code/ml/py_svm_opencv/hogsvm.py):

@include samples/python/tutorial_code/ml/py_svm_opencv/hogsvm.py

This particular technique gave me nearly 94% accuracy. You can try different values for the various
parameters of the SVM to check if higher accuracy is possible. Or you can read technical papers on this
area and try to implement them.

Additional Resources
--------------------

-#  [Histograms of Oriented Gradients Video](https://www.youtube.com/watch?v=0Zib1YEE4LU)

Exercises
---------

-#  OpenCV samples contain digits.py, which applies a slight improvement of the above method to get
    an improved result. It also contains the reference. Check it and understand it.
3rdparty/opencv-4.5.4/doc/py_tutorials/py_ml/py_table_of_contents_ml.markdown (new file, +16)
@@ -0,0 +1,16 @@
Machine Learning {#tutorial_py_table_of_contents_ml}
================

-   @subpage tutorial_py_knn_index

    Learn to use kNN for classification.
    Plus learn about handwritten digit recognition using kNN.

-   @subpage tutorial_py_svm_index

    Understand concepts of SVM

-   @subpage tutorial_py_kmeans_index

    Learn to use K-Means Clustering to group data to a number of clusters.
    Plus learn to do color quantization using K-Means Clustering