Drawing on the papers I read this week, here is my own understanding of the Histogram of Oriented Gradients (HOG) method I have been studying:
HOG descriptors are feature descriptors used for object detection in computer vision and image processing. The technique computes statistics of the directions of local image gradients. It is similar in spirit to edge orientation histograms, scale-invariant feature transform (SIFT) descriptors and shape contexts, but it differs from them in that the HOG descriptor is computed on a dense grid of uniformly spaced cells, and uses overlapping local contrast normalization to improve performance.
The authors of the paper, Navneet Dalal and Bill Triggs, were researchers at the French National Institute for Research in Computer Science and Control (INRIA). They first proposed the HOG method in this paper, which was published at CVPR 2005. Their main application was pedestrian detection in static images, but they later also applied it to pedestrian detection in film and video, and to detecting vehicles and common animals in static images.
The key idea behind the HOG descriptor is that, in an image, the local appearance and shape of an object can be well described by the density distribution of gradient or edge directions. The implementation is as follows: first divide the image into small connected regions, called cells; then accumulate a histogram of gradient directions or edge orientations over the pixels of each cell; finally, concatenate these histograms to form the feature descriptor. To improve performance, the local histograms can additionally be contrast-normalized over a larger region of the image, called a block: first compute a measure of the histogram density over the block, then use it to normalize all cells within the block. This normalization gives better invariance to illumination changes and shadowing.
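To make this concrete, here is a small OpenCV sketch that computes such a descriptor for one 64×128 window with the cell/block/bin settings mentioned later in these notes (8×8 cells, 16×16 blocks, 8-pixel block stride, 9 bins). It is only an illustration of the idea, not the authors' original code; the blank test window is a stand-in for a real image patch.

#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

int main()
{
    // 64x128 window, 16x16 blocks stepped by 8 pixels, 8x8 cells, 9 orientation bins,
    // with per-block contrast normalization done internally by HOGDescriptor.
    cv::HOGDescriptor hog(cv::Size(64, 128), cv::Size(16, 16), cv::Size(8, 8),
                          cv::Size(8, 8), 9);

    cv::Mat window(128, 64, CV_8UC1, cv::Scalar(0));   // stand-in for a real image window
    std::vector<float> descriptor;
    hog.compute(window, descriptor);

    // 7x15 = 105 overlapping blocks, 4 cells each, 9 bins per cell -> 3780 values.
    printf("descriptor length = %d\n", (int)descriptor.size());
    return 0;
}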
Compared with other feature description methods, the HOG descriptor has several advantages. First, because HOG operates on local cells of the image, it remains largely invariant to geometric and photometric deformations, since such deformations only show up over larger spatial regions. Second, the authors found experimentally that under coarse spatial sampling, fine orientation sampling and strong local photometric normalization, small limb movements of pedestrians can be ignored without affecting detection, as long as the pedestrian roughly keeps an upright pose. In summary, the HOG method is particularly well suited to pedestrian detection in images.
The figure above shows the authors' pedestrian detection experiment: (a) the average gradient across their training images; (b) and (c) the maximum positive and negative SVM weights, respectively, on each block of the image; (d) a test image; (e) the test image after computing R-HOG; (f) and (g) the R-HOG image weighted by the positive and negative SVM weights, respectively.
Implementation of the algorithm:
Colour and gamma normalization
The authors tried colour and gamma normalization in grayscale, RGB and LAB colour spaces, but the experiments show that this preprocessing has no effect on the final results, probably because the later steps include their own normalization, which subsumes it. In practice this step can therefore be omitted.
Gradient computation
The most common method is simply to apply a 1-D centred, point discrete derivative mask in one direction, or simultaneously in both the horizontal and vertical directions. More precisely, the method filters the colour or intensity data of the image with the following kernels: [-1, 0, 1] and its transpose [-1, 0, 1]^T.
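A minimal sketch of this derivative filtering with OpenCV (my own illustration; the kernel values follow the paper, the function and variable names are just for this example):

#include <opencv2/opencv.hpp>

// Compute per-pixel gradient magnitude and orientation with the [-1, 0, 1] masks.
void computeGradients(const cv::Mat& gray, cv::Mat& magnitude, cv::Mat& angle)
{
    cv::Mat img;
    gray.convertTo(img, CV_32F);

    // 1-D centred derivative masks, no smoothing beforehand.
    cv::Mat kx = (cv::Mat_<float>(1, 3) << -1, 0, 1); // horizontal [-1 0 1]
    cv::Mat ky = (cv::Mat_<float>(3, 1) << -1, 0, 1); // vertical   [-1 0 1]^T

    cv::Mat gx, gy;
    cv::filter2D(img, gx, CV_32F, kx);
    cv::filter2D(img, gy, CV_32F, ky);

    // Magnitude and orientation (in degrees) for the later histogram voting stage.
    cv::cartToPolar(gx, gy, magnitude, angle, /*angleInDegrees=*/true);
}

cartToPolar returns angles in 0°–360°; for the unsigned 0°–180° case used for pedestrians, fold the angle modulo 180° before binning.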
The authors also tried more complex masks, such as the 3×3 Sobel operators or diagonal masks, but in the pedestrian detection experiments these more complex masks all performed worse, so their conclusion was that the simpler the mask, the better the result. They also tried adding a Gaussian smoothing filter before the derivative mask, but this made detection worse; the reason is that much of the useful image information comes from sharp edges, and smoothing before the gradient computation removes them.
Creating the orientation histograms
The third step is to build a gradient orientation histogram for each cell of the image. Every pixel in a cell casts a vote for an orientation-based histogram channel. The voting is weighted: each vote carries a weight computed from the pixel's gradient magnitude. The weight can be the magnitude itself or some function of it; testing shows that using the magnitude itself gives the best results, although its square root, its square, or a clipped version of the magnitude can also be used. Cells can be rectangular or radial. The histogram channels are spread evenly over 0°–180° (unsigned gradient) or 0°–360° (signed gradient). The authors found that unsigned gradients with 9 histogram channels give the best results in the pedestrian detection experiments.
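As a sketch of this voting step (my own illustrative code, not the authors' implementation): 9 unsigned orientation bins for one cell, each pixel voting with its gradient magnitude into the nearest bin.

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

// Build a 9-bin unsigned orientation histogram for one cell.
// magnitude/angle are per-pixel maps (angle in degrees, 0-360), cell is the cell's ROI.
std::vector<float> cellHistogram(const cv::Mat& magnitude, const cv::Mat& angle, cv::Rect cell)
{
    const int nbins = 9;
    const float binWidth = 180.f / nbins;   // unsigned gradient: 0-180 degrees
    std::vector<float> hist(nbins, 0.f);

    for (int y = cell.y; y < cell.y + cell.height; ++y)
        for (int x = cell.x; x < cell.x + cell.width; ++x)
        {
            float mag = magnitude.at<float>(y, x);
            float ang = std::fmod(angle.at<float>(y, x), 180.f); // fold 0-360 into 0-180
            int bin = std::min(nbins - 1, (int)(ang / binWidth));
            hist[bin] += mag;   // vote weighted by the gradient magnitude
        }
    return hist;
}

Note that this sketch uses nearest-bin voting; the paper actually interpolates each vote trilinearly between neighbouring bins in orientation and position, as the thesis excerpts below mention.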
Grouping the cells together into larger blocks
Because of variations in local illumination and in foreground-background contrast, the gradient strengths vary over a very wide range, so they have to be normalized. The authors' approach is to group the cells into larger, spatially connected blocks; the HOG descriptor then becomes the vector of all cell histogram components from all blocks. The blocks overlap, which means that each cell's output contributes several times to the final descriptor. There are two main block geometries: rectangular blocks (R-HOG) and circular blocks (C-HOG).
R-HOG blocks are roughly square grids, characterized by three parameters: the number of cells per block, the number of pixels per cell, and the number of histogram channels per cell. The authors' experiments show that the best settings for pedestrian detection are 3×3 cells per block, 6×6 pixels per cell and 9 histogram channels. They also found it essential to apply a Gaussian spatial window to each block before processing the histograms, because it lowers the weight of the pixels around the edge of the block. R-HOG looks very similar to the SIFT descriptor, but they differ: R-HOG blocks are computed in dense grids at a single scale without orientation alignment, whereas SIFT descriptors are computed at sparse, scale-invariant key image points and are rotated to align orientation. In addition, R-HOG blocks are used in conjunction to encode spatial form information, while SIFT descriptors are used singly.
C-HOG blocks come in two variants, which differ in whether the central cell is whole or divided (see the figure on the right); the authors found that both variants give the same performance. A C-HOG block is characterized by four parameters: the number of angular bins, the number of radial bins, the radius of the centre bin, and the expansion factor for the radius. For pedestrian detection the best settings found were 4 angular bins, 2 radial bins, a centre bin radius of 4 pixels and an expansion factor of 2. As mentioned above, a Gaussian spatial window is essential for R-HOG, but for C-HOG it turns out to be unnecessary. C-HOG looks a lot like the shape-context approach, but differs in that the cells of a C-HOG block contain several orientation channels, whereas shape contexts only use a single edge presence count.
Block normalization
The authors compared four different block normalization schemes. Let v be the unnormalized vector containing all the histograms of a given block, ||v||_k its k-norm for k = 1, 2, and e a small constant. The normalization factors are:
L2-norm: v -> v / sqrt(||v||_2^2 + e^2)
L1-norm: v -> v / (||v||_1 + e)
L1-sqrt: v -> sqrt(v / (||v||_1 + e))
There is a fourth scheme, L2-Hys, obtained by applying L2-norm, clipping the result, and then renormalizing. The authors found that L2-Hys, L2-norm and L1-sqrt perform equally well, while L1-norm is slightly less reliable; however, all four give a significant improvement over unnormalized data.
SVM classifier
The final step is to feed the extracted HOG features into an SVM classifier, which finds an optimal hyperplane as the decision function. The authors used the freely available SVMLight package together with the HOG features to find pedestrians in test images.
(Reposted from http://hi.baidu.com/ykaitao_handsome/blog/item/d7a2c3156e368a0a4b90a745.html, via http://blog.csdn.net/forsiny/archive/2010/03/22/5404268.aspx)
(Repost) Studying peopledetect, from the OpenCV China forum
OpenCV 2.0 ships with a pedestrian detection example based on the method Navneet Dalal first presented at CVPR 2005. I have been studying it recently; below are my notes, posted in the hope of discussing and improving together.
1. Installing OpenCV 2.0 under VC 2008 Express. (You can use 2.1 directly without building it with CMake, to avoid build errors.) This is the basis for everything else; thanks to the moderator for the reference: http://www.opencv.org.cn/index.php/VC_2008_Express... (the page on installing OpenCV 2.0).
2. Trying the program. In a DOS window, go to C:\OpenCV2.0\samples\c and run: peopledetect.exe filename.jpg, where filename.jpg is the image to be tested.
3. Building the program. Create a console project, add peopledetect.cpp from C:\OpenCV2.0\samples\c to it, and configure it as in step 1. It builds, but oddly the EXE generated in DEBUG mode crashes at run time; after switching to RELEASE mode and rebuilding, the EXE runs fine.
4. Brief notes on the code.
1) getDefaultPeopleDetector() returns the 3780-dimensional detector (105 blocks with 4 histograms each and 9 bins per histogram gives 3,780 values). Why 105 blocks? With a 64×128 window, 16×16 blocks and an 8-pixel block stride there are (64-16)/8+1 = 7 block positions horizontally and (128-16)/8+1 = 15 vertically, i.e. 7×15 = 105 blocks, and 105×4×9 = 3780.
2) cv::HOGDescriptor hog; creates the object and initializes its members: winSize(64,128), blockSize(16,16), blockStride(8,8), cellSize(8,8), nbins(9), derivAperture(1), winSigma(-1), histogramNormType(L2Hys), L2HysThreshold(0.2), gammaCorrection(true).
3) The call detectMultiScale(img, found, 0, cv::Size(8,8), cv::Size(24,16), 1.05, 2); takes the image to be tested, the output list of detections, the threshold hitThreshold, the window stride winStride, the image padding margin, the scale factor, and the grouping threshold groupThreshold. Playing with the parameters on one particular image: changing 0 to 0.01 loses the detection, while 0.001 still works; changing 1.05 to 1.1 fails, while 1.06 works; changing 2 to 1 works, but 0.8 or below does not; changing (24,16) to (0,0) also works, and so does (32,32). Internally the function does the following:
(1) Compute the number of pyramid levels. Taking a 530×402 image as an example, lg(402/128)/lg(1.05) ≈ 23.4, so there are 24 levels.
(2) Loop over the levels; each iteration does the following:
HOGThreadData& tdata = threadData[getThreadNum()];
Mat smallerImg(sz, img.type(), tdata.smallerImgBuf.data);
which then calls the core function
detect(smallerImg, tdata.locations, hitThreshold, winStride, padding);
whose arguments are the image at this scale, the list of returned locations, the threshold, the stride and the padding margin. That function does the following:
(a) Compute the padded image size paddedImgSize.
(b) Create the cache object: HOGCache cache(this, img, padding, padding, nwindows == 0, cacheStride); during its construction HOGCache::init runs, which computes the gradients (descriptor->computeGradient), the number of blocks (105) and the number of values per block (36).
(c) Compute the number of windows nwindows. Taking the first level as an example, the window count is ((530+32*2-64)/8+1) * ((402+32*2-128)/8+1) = 67*43 = 2881, where (32,32) is the padding parameter and (8,8) the winStride; (24,16) can also be used as padding.
(d) Loop over every window. Within each window, loop over the 105 blocks; for each block, the getblock function computes the HOG features and normalizes them, and the 36 values are combined with the corresponding values of the detector. If the total s over the 105 blocks satisfies s >= hitThreshold, the window is reported as a detection.
4) That seems to be the gist of the main flow, but many details still need to be worked out.
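The two counts above (24 levels and 2881 windows) are easy to reproduce. Here is a small self-contained sketch with the same numbers; it is only my own illustration of the arithmetic, not the actual OpenCV implementation.

#include <cstdio>
#include <cmath>
#include <algorithm>

int main()
{
    // Example from the notes above: 530x402 image, 64x128 window,
    // scale step 1.05, winStride (8,8), padding (32,32).
    int imgW = 530, imgH = 402, winW = 64, winH = 128;
    double scaleStep = 1.05;
    int strideX = 8, strideY = 8, padX = 32, padY = 32;

    // (1) number of pyramid levels: keep scaling until the window no longer fits.
    double maxScale = std::min(imgW / (double)winW, imgH / (double)winH); // limited by 402/128 here
    int levels = (int)(std::log(maxScale) / std::log(scaleStep)) + 1;     // 23.4 -> 24 levels
    printf("levels = %d\n", levels);

    // (c) number of detection windows at the first (unscaled) level, padding included.
    int nx = (imgW + 2 * padX - winW) / strideX + 1;   // (530+64-64)/8+1 = 67
    int ny = (imgH + 2 * padY - winH) / strideY + 1;   // (402+64-128)/8+1 = 43
    printf("windows = %d x %d = %d\n", nx, ny, nx * ny); // 2881
    return 0;
}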
5. The algorithm flow as written in the original thesis
Figure 5.5 on page 78 of NavneetDalalThesis.pdf describes the complete object detection algorithm. The first two steps are initialization and were basically covered above; the last two steps are as follows.
For each scale Si = [Ss, Ss*Sr, ..., Sn]:
(a) Rescale the input image using bilinear interpolation
(b) Extract features (Fig. 4.12) and densely scan the scaled image with stride Ns for object/non-object detections
(c) Push all detections with t(wi) > c to a list
Non-maximum suppression:
(a) Represent each detection in 3-D position and scale space yi
(b) Using (5.9), compute the uncertainty matrices Hi for each point
(c) Compute the mean shift vector (5.7) iteratively for each point in the list until it converges to a mode
(d) The list of all of the modes gives the final fused detections
(e) For each mode compute the bounding box from the final centre point and scale
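A rough C++/OpenCV sketch of the "for each scale" part of this loop, assuming a 64×128 window and a HOGDescriptor hog whose linear SVM detector has already been set; this is my own illustration, not the thesis code, and the non-maximum suppression step is left out.

#include <opencv2/opencv.hpp>
#include <vector>

// Scan an image over a scale pyramid with a HOG window and a fixed stride.
std::vector<cv::Rect> scanAllScales(const cv::Mat& img, const cv::HOGDescriptor& hog,
                                    double hitThreshold = 0.0, double scaleStep = 1.05)
{
    std::vector<cv::Rect> detections;
    for (double scale = 1.0; ; scale *= scaleStep)
    {
        cv::Size sz(cvRound(img.cols / scale), cvRound(img.rows / scale));
        if (sz.width < hog.winSize.width || sz.height < hog.winSize.height)
            break;                                               // window no longer fits
        cv::Mat scaled;
        cv::resize(img, scaled, sz, 0, 0, cv::INTER_LINEAR);     // (a) bilinear rescale

        // (b) densely scan the rescaled image with an 8-pixel stride
        std::vector<cv::Point> hits;
        hog.detect(scaled, hits, hitThreshold, cv::Size(8, 8));

        // (c) push the detections, mapped back to the original image coordinates
        for (size_t i = 0; i < hits.size(); ++i)
            detections.push_back(cv::Rect(cvRound(hits[i].x * scale), cvRound(hits[i].y * scale),
                                          cvRound(hog.winSize.width * scale),
                                          cvRound(hog.winSize.height * scale)));
    }
    return detections;   // non-maximum suppression / fusion would follow
}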
The following excerpts are taken from NavneetDalalThesis.pdf; I have picked out the important parts and kept the original section numbers so they are easy to look up.
4. Histogram of Oriented Gradients Based Encoding of Images
Default Detector. As a yardstick for the purpose of comparison, throughout this section we compare results to our default detector which has the following properties: input image in RGB colour space (without any gamma correction); image gradient computed by applying [-1, 0, 1] filter along x- and y-axis with no smoothing; linear gradient voting into 9 orientation bins in 0°–180°; 16×16 pixel blocks containing 2×2 cells of 8×8 pixels; Gaussian block windowing with σ = 8 pixels; L2-Hys (Lowe-style clipped L2 norm) block normalisation; blocks spaced with a stride of 8 pixels (hence 4-fold coverage of each cell); 64×128 detection window; and linear SVM classifier. We often quote the performance at 10^-4 false positives per window (FPPW) – the maximum false positive rate that we consider to be useful for a real detector given that 10^3–10^4 windows are tested for each image.
4.3.2 Gradient Computation
The simple [-1, 0, 1] masks give the best performance.
4.3.3 Spatial / Orientation Binning
Each pixel contributes a weighted vote for orientation based on the orientation of the gradient element centred on it.
The votes are accumulated into orientation bins over local spatial regions that we call cells.
To reduce aliasing, votes are interpolated trilinearly between the neighbouring bin centres in both orientation and position.
Details of the trilinear interpolation voting procedure are presented in Appendix D.
The vote is a function of the gradient magnitude at the pixel: either the magnitude itself, its square, its square root, or a clipped form of the magnitude representing soft presence/absence of an edge at the pixel. In practice, using the magnitude itself gives the best results.
4.3.4 Block Normalisation Schemes and Descriptor Overlap
Good normalisation is critical and including overlap significantly improves the performance.
Figure 4.4(d) shows that L2-Hys, L2-norm and L1-sqrt all perform equally well for the person detector. For other object classes such as cars and motorbikes, L1-sqrt gives the best results.
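For reference, a small sketch of L2-Hys block normalisation as I understand it: L2-normalise, clip the values (0.2 is the clipping threshold OpenCV uses as L2HysThreshold), then renormalise. The function is my own illustration, not the authors' code.

#include <vector>
#include <cmath>
#include <algorithm>

// L2-Hys normalisation of one block descriptor v (e.g. 36 values for a 2x2-cell block).
void l2HysNormalize(std::vector<float>& v, float clipThreshold = 0.2f, float eps = 1e-3f)
{
    // First pass: plain L2 normalisation, v <- v / sqrt(||v||_2^2 + eps^2).
    float norm = 0.f;
    for (size_t i = 0; i < v.size(); ++i) norm += v[i] * v[i];
    norm = std::sqrt(norm + eps * eps);
    for (size_t i = 0; i < v.size(); ++i) v[i] /= norm;

    // Clip large values, then renormalise.
    for (size_t i = 0; i < v.size(); ++i) v[i] = std::min(v[i], clipThreshold);
    norm = 0.f;
    for (size_t i = 0; i < v.size(); ++i) norm += v[i] * v[i];
    norm = std::sqrt(norm + eps * eps);
    for (size_t i = 0; i < v.size(); ++i) v[i] /= norm;
}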
4.3.5 Descriptor Blocks
R-HOG. For human detection, 3×3 cell blocks of 6×6 pixel cells perform best with 10.4% miss-rate at 10^-4 FPPW. Our standard 2×2 cell blocks of 8×8 pixel cells are a close second. We find 2×2 and 3×3 cell blocks work best.
4.3.6 Detector Window and Context
Our 64×128 detection window includes about 16 pixels of margin around the person on all four sides.
4.3.7 Classifier
By default we use a soft (C=0.01) linear SVM trained with SVMLight [Joachims 1999]. We modified SVMLight to reduce memory usage for problems with large dense descriptor vectors.
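One practical consequence of using a linear SVM is that the trained classifier collapses into a single weight vector plus a bias, so scoring a window is just a dot product with its HOG descriptor; as I understand it, that is what OpenCV's getDefaultPeopleDetector() vector stores (the 3780 weights, with the bias appended as a final extra element). A hedged sketch of this scoring, with illustrative names:

#include <vector>
#include <cstddef>

// Score one detection window with a trained linear SVM.
// 'weights' are the 3780 HOG weights, 'bias' the SVM offset, 'descriptor' the window's HOG vector.
// A window counts as a person when score >= hitThreshold (0 by default for linear SVMs).
float svmScore(const std::vector<float>& weights, float bias, const std::vector<float>& descriptor)
{
    float score = bias;
    for (size_t i = 0; i < weights.size() && i < descriptor.size(); ++i)
        score += weights[i] * descriptor[i];
    return score;
}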
---------------------------------
5. Multi-Scale Object Localisation
The detector scans the image with a detection window at all positions and scales, running the classifier in each window and fusing multiple overlapping detections to yield the final object detections.
We represent detections using kernel density estimation (KDE) in 3-D position and scale space. KDE is a data-driven process where continuous densities are evaluated by applying a smoothing kernel to observed data points. The bandwidth of the smoothing kernel defines the local neighbourhood. The detection scores are incorporated by weighting the observed detection points by their score values while computing the density estimate. Thus KDE naturally incorporates the first two criteria. The overlap criterion follows from the fact that detections at very different scales or positions are far off in 3-D position and scale space, and are thus not smoothed together. The modes (maxima) of the density estimate correspond to the positions and scales of final detections.
Let x_i = [x_i, y_i] and s_i' denote the detection position and scale, respectively, for the i-th detection. The detections are represented in 3-D space as y = [x, y, s], where s = log(s'). The variable-bandwidth mean shift vector is defined in (5.7).
For each of the n points the mean shift based iterative procedure is guaranteed to converge to a mode.
Detection Uncertainty Matrix Hi. One key input to the above mode detection algorithm is the amount of uncertainty Hi to be associated with each point. We assume isosymmetric covariances, i.e. the Hi's are diagonal matrices. Let diag[H] represent the 3 diagonal elements of H. We use scale dependent covariance matrices such that
diag[Hi] = [(exp(s_i) σ_x)^2, (exp(s_i) σ_y)^2, σ_s^2]   (5.9)
where σ_x, σ_y and σ_s are user supplied smoothing values.
The term t(w_i) provides the weight for each detection. For linear SVMs we usually use threshold = 0.
The smoothing parameters σ_x, σ_y and σ_s are used in the non-maximum suppression stage. These parameters can have a significant impact on performance, so proper evaluation is necessary. For all of the results here, unless otherwise noted, a scale ratio of 1.05, a stride of 8 pixels, and σ_x = 8, σ_y = 16, σ_s = log(1.3) are used as default values.
A scale ratio of 1.01 gives the best performance, but significantly slows the overall process. Scale smoothing of log(1.3)–log(1.6) gives good performance for most object classes.
We group these mode candidates using a proximity measure. The final location is the mode corresponding to the highest density.
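A tiny sketch of how each detection would be mapped into that 3-D position-scale space and given its uncertainty according to (5.9). The struct and default values (σ_x = 8, σ_y = 16, σ_s = log(1.3) from above) are only illustrative; this is not code from the thesis.

#include <cmath>

struct Detection3D { double y[3]; double diagH[3]; double weight; };

// Map a detection at (x, y) with scale s' and SVM score w into 3-D (x, y, log s')
// and attach the scale-dependent diagonal covariance of Eq. (5.9).
Detection3D toScaleSpace(double x, double y, double sPrime, double score,
                         double sx = 8.0, double sy = 16.0, double ss = std::log(1.3))
{
    Detection3D d;
    double s = std::log(sPrime);
    d.y[0] = x;  d.y[1] = y;  d.y[2] = s;
    d.diagH[0] = std::pow(std::exp(s) * sx, 2);   // (exp(s_i) * sigma_x)^2
    d.diagH[1] = std::pow(std::exp(s) * sy, 2);   // (exp(s_i) * sigma_y)^2
    d.diagH[2] = ss * ss;                         // sigma_s^2
    d.weight   = score;                           // t(w_i): detections weighted by their score
    return d;
}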
----------------------------------------------------
Appendix A. INRIA Static Person Data Set
The (centred and normalised) positive windows are supplied by the user, and the initial set of negatives is created once and for all by randomly sampling negative images. A preliminary classifier is thus trained using these. Second, the preliminary detector is used to exhaustively scan the negative training images for hard examples (false positives). The classifier is then re-trained using this augmented training set (user-supplied positives, initial negatives and hard examples) to produce the final detector.
INRIA Static Person Data Set
As images of people are highly variable, to learn an effective classifier, the positive training examples need to be properly normalized and centered to minimize the variance among them. For this we manually annotated all upright people in the original images.
The image regions belonging to the annotations were cropped and rescaled to 64×128 pixel image windows. On average the subjects' height is 96 pixels in these normalised windows to allow for an approximately 16 pixel margin on each side. In practice we leave a further 16 pixel margin around each side of the image window to ensure that flow and gradients can be computed without boundary effects. The margins were added by appropriately expanding the annotations on each side before cropping the image regions.
//<------------------------ The above is excerpted from Dalal's PhD thesis.
For more about the INRIA Person Dataset, see http://pascal.inrialpes.fr/data/human/
Original Images
Folders 'Train' and 'Test' correspond, respectively, to original training and test images. Both folders have three sub folders: (a) 'pos' (positive training or test images), (b) 'neg' (negative training or test images), and (c) 'annotations' (annotation files for positive images in Pascal Challenge format).
Normalized Images
Folders 'train_64x128_H96' and 'test_64x128_H96' correspond to the normalized dataset as used in the above referenced paper. Both folders have two sub folders: (a) 'pos' (normalized positive training or test images centered on the person with their left-right reflections), (b) 'neg' (containing original negative training or test images). Note images in folder 'train/pos' are of 96x160 pixels (a margin of 16 pixels around each side), and images in folder 'test/pos' are of 70x134 pixels (a margin of 3 pixels around each side). This has been done to avoid boundary conditions (thus to avoid any particular bias in the classifier). In both folders, use the centered 64x128 pixels window for the original detection task.
Negative windows
To generate negative training windows from normalized images, a fixed set of 12180 windows (10 windows per negative image) are sampled randomly from 1218 negative training photos providing the initial negative training set. For each detector and parameter combination, a preliminary detector is trained and all negative training images are searched exhaustively (over a scale-space pyramid) for false positives ('hard examples'). All examples with score greater than zero are considered hard examples. The method is then re-trained using this augmented set (initial 12180 + hard examples) to produce the final detector. The set of hard examples is subsampled if necessary, so that the descriptors of the final training set fit into 1.7 GB of RAM for SVM training.
//------------------------------------------------------
The original author has updated the OpenCV 2.0 peopledetect sample twice: https://code.ros.org/trac/opencv/changeset/2314/trunk
The latest version is as follows:
---------------------
#include "cvaux.h"
#include "highgui.h"
#include <stdio.h>
#include <string.h>
#include <ctype.h>
using namespace cv;
using namespace std;

int main(int argc, char** argv)
{
    Mat img;
    FILE* f = 0;
    char _filename[1024];

    if( argc == 1 )
    {
        printf("Usage: peopledetect (<image_filename> | <image_list>.txt)\n");
        return 0;
    }
    img = imread(argv[1]);

    if( img.data )
    {
        strcpy(_filename, argv[1]);
    }
    else
    {
        f = fopen(argv[1], "rt");
        if(!f)
        {
            fprintf( stderr, "ERROR: the specified file could not be loaded\n");
            return -1;
        }
    }

    HOGDescriptor hog;
    hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());

    for(;;)
    {
        char* filename = _filename;
        if(f)
        {
            if(!fgets(filename, (int)sizeof(_filename)-2, f))
                break;
            //while(*filename && isspace(*filename))
            //    ++filename;
            if(filename[0] == '#')
                continue;
            int l = strlen(filename);
            while(l > 0 && isspace(filename[l-1]))
                --l;
            filename[l] = '\0';
            img = imread(filename);
        }
        printf("%s:\n", filename);
        if(!img.data)
            continue;
        fflush(stdout);

        vector<Rect> found, found_filtered;
        double t = (double)getTickCount();
        // run the detector with default parameters. to get a higher hit-rate
        // (and more false alarms, respectively), decrease the hitThreshold and
        // groupThreshold (set groupThreshold to 0 to turn off the grouping completely).
        int can = img.channels();
        hog.detectMultiScale(img, found, 0, Size(8,8), Size(32,32), 1.05, 2);
        t = (double)getTickCount() - t;
        printf("\tdetection time = %gms\n", t*1000./cv::getTickFrequency());

        size_t i, j;
        // keep only rectangles that are not fully contained in another detection
        for( i = 0; i < found.size(); i++ )
        {
            Rect r = found[i];
            for( j = 0; j < found.size(); j++ )
                if( j != i && (r & found[j]) == r)
                    break;
            if( j == found.size() )
                found_filtered.push_back(r);
        }
        for( i = 0; i < found_filtered.size(); i++ )
        {
            Rect r = found_filtered[i];
            // the HOG detector returns slightly larger rectangles than the real objects,
            // so we slightly shrink the rectangles to get a nicer output.
            r.x += cvRound(r.width*0.1);
            r.width = cvRound(r.width*0.8);
            r.y += cvRound(r.height*0.07);
            r.height = cvRound(r.height*0.8);
            rectangle(img, r.tl(), r.br(), cv::Scalar(0,255,0), 3);
        }
        imshow("people detector", img);
        int c = waitKey(0) & 255;
        if( c == 'q' || c == 'Q' || !f)
            break;
    }
    if(f)
        fclose(f);
    return 0;
}
After the update, images can be detected in batch!
Put the images you want to batch-detect into a text file, e.g. filename.txt, with contents like:
1.jpg
2.jpg
......
Then, at the DOS prompt, run peopledetect filename.txt and every image will be detected automatically.
//////////////////////////////////////////////////////////////////------------------------------
Description of Navneet Dalal's OLT workflow
Navneet Dalal provides the INRIA Object Detection and Localization Toolkit at http://pascal.inrialpes.fr/soft/olt/
Wilson Suryajaya Leoputra provides a Windows port of it: http://www.computing.edu.au/~12482661/hog.html
You need to copy all the dll's (boost_1.34.1*.dll, blitz_0.9.dll, opencv*.dll) into "/debug/".
Navneet Dalal provides Linux executables; I borrowed someone's Linux machine and ran them once to get a feel for the overall flow. Below I describe the workflow, based on the files OLTbinaries\readme and OLTbinaries\HOG\record.
1. Download the INRIA person detection database and unpack it into OLTbinaries\; rename 'train_64x128_H96' to 'train' and 'test_64x128_H96' to 'test'.
2. Run the 'runall.sh' script under Linux.
Once the results are out, open MATLAB and run plotdet.m to draw the DET curve. ------ That is the one-shot way. ---------------------------------------------------------
It also provides a step-by-step procedure: -------------------------------------
1. Compute the positive-sample R-HOG features from the images listed in pos.lst; the pos.lst list has the following format:
train/pos/crop_000010a.png
train/pos/crop_000010b.png
train/pos/crop_000011a.png
------ The lines below are the Linux commands to run (same below) ------
./bin//dump_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -s 1 train/pos.lst HOG/train_pos.RHOG
2. Compute the negative-sample R-HOG features:
./bin//dump_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -s 10 train/neg.lst HOG/train_neg.RHOG
3. Train:
./bin//dump4svmlearn -p HOG/train_pos.RHOG -n HOG/train_neg.RHOG HOG/train_BiSVMLight.blt -v 4
4. Create the model file HOG/model_4BiSVMLight.alt:
./bin//svm_learn -j 3 -B 1 -z c -v 1 -t 0 HOG/train_BiSVMLight.blt HOG/model_4BiSVMLight.alt
5. Create a folder:
mkdir -p HOG/hard
6. Classify:
./bin//classify_rhog train/neg.lst HOG/hard/list.txt HOG/model_4BiSVMLight.alt -d HOG/hard/hard_neg.txt -c HOG/hard/hist.txt -m 0 -t 0 --no_nonmax 1 --avsize 0 --margin 0 --scaleratio 1.2 -l N -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys
-------- The false positive/negative classification results are written to HOG/hard/hard_neg.txt
7. Add the hard examples to the negatives and compute the R-HOG features again:
./bin//dump_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -s 0 HOG/hard/hard_neg.txt HOG/train_hard_neg.RHOG --poscases 2416 --negcases 12180 --dumphard 1 --hardscore 0 --memorylimit 1700
8. Train again:
./bin//dump4svmlearn -p HOG/train_pos.RHOG -n HOG/train_neg.RHOG -n HOG/train_hard_neg.RHOG HOG/train_BiSVMLight.blt -v 4
9. Produce the final model:
./bin//svm_learn -j 3 -B 1 -z c -v 1 -t 0 HOG/train_BiSVMLight.blt HOG/model_4BiSVMLight.alt
The 3780 values used in OpenCV should be inside this model file, model_4BiSVMLight.alt. Its format is unknown, so it cannot be read directly, but one could study how the svm_learn program generates it; the model is also loaded by classify_rhog, so studying how that program reads it should give a way to parse the format.
10. Create a folder:
mkdir -p HOG/WindowTest_Negative
11. Detection results on the negative samples:
./bin//classify_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -p 1 --no_nonmax 1 --nopyramid 0 --scaleratio 1.2 -t 0 -m 0 --avsize 0 --margin 0 test/neg.lst HOG/WindowTest_Negative/list.txt HOG/model_4BiSVMLight.alt -c HOG/WindowTest_Negative/histogram.txt
12. Create a folder:
mkdir -p HOG/WindowTest_Positive
13. Detection results on the positive samples:
./bin//classify_rhog -W 64,128 -C 8,8 -N 2,2 -B 9 -G 8,8 -S 0 --wtscale 2 --maxvalue 0.2 --epsilon 1 --fullcirc 0 -v 3 --proc rgb_sqrt --norm l2hys -p 1 --no_nonmax 1 --nopyramid 1 -t 0 -m 0 --avsize 0 --margin 0 test/pos.lst HOG/WindowTest_Positive/list.txt HOG/model_4BiSVMLight.alt -c HOG/WindowTest_Positive/histogram.txt
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
How to make training samples
Having analysed the original author's dataset and combined it with some material found online, here is how the training samples are made.
1. How samples are generated from the original pictures
Comparing INRIAPerson\INRIAPerson\Train\pos (the original pictures) with INRIAPerson\train_64x128_H96\pos (the generated samples), you can see that the author cropped out some standing, unoccluded people from the original pictures, and then left-right reflected each crop. Take the first picture, crop001001, as an example: two unoccluded people were cropped from it, which together with the original photo gives 3 images, and with the left-right mirrors, 6 images in total.
2. Cropping
You can use imageclipper, a program based on OpenCV 1.0, to crop and save; it automatically generates the file names and saves the crops in a newly created imageclipper folder under the same path.
3. Resizing the images
You can use ACDSee: Tools / Open in editor, then the Resize option; Tools / Rotate can also do the left-right reflection. I wrote a small program to batch-resize images; the code is in the next post.
4. Making the pos.lst list
At the DOS prompt, cd into the folder of images you want to list and type dir /b > pos.lst to generate the file list.
/////////////////////////
#include "cv.h"
#include "highgui.h"
#include "cvaux.h"
#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main(int argc, char* argv[])
{
    IplImage* src = 0;
    IplImage* dst = 0;
    CvSize dst_size;
    FILE* f = 0;
    char _filename[1024];
    char outname[1024];
    int l;

    if( argc < 2 )
        return 0;
    f = fopen(argv[1], "rt");
    if(!f)
    {
        fprintf( stderr, "ERROR: the specified file could not be loaded\n");
        return -1;
    }
    for(;;)
    {
        char* filename = _filename;
        if(!fgets(filename, (int)sizeof(_filename)-2, f))
            break;
        if(filename[0] == '#')
            continue;
        l = strlen(filename);
        while(l > 0 && isspace(filename[l-1]))
            --l;
        filename[l] = '\0';
        src = cvLoadImage(filename, 1);
        if(!src)
            continue;

        // resize to the 96x160 normalized window size
        dst_size.width = 96;
        dst_size.height = 160;
        dst = cvCreateImage(dst_size, src->depth, src->nChannels);
        cvResize(src, dst, CV_INTER_LINEAR);

        // build the output name: original name without its extension + "_96x160.jpg"
        // (assumes a 3-letter extension such as .jpg or .png)
        strncpy(outname, filename, l-4);
        outname[l-4] = '\0';
        strcat(outname, "_96x160.jpg");
        cvSaveImage(outname, dst);

        cvReleaseImage( &src );
        cvReleaseImage( &dst );
    }
    if(f)
        fclose(f);
    return 0;
}