A walkthrough of the AprilTags paper

toyiye 2024-08-10 21:28

1.1 Drawbacks of ARToolkit:

A major disadvantage of this approach is the computational cost associated with decoding tags, since each template required a separate, slow correlation operation. A second disadvantage is that it is difficult to generate templates that are approximately orthogonal to each other.

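In short: the first drawback is that decoding is computationally expensive, because each template needs its own slow correlation operation; the second is that it is hard to generate templates that are approximately orthogonal to each other.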

The tag detection scheme used by ARToolkit is based on a simple binarization of the input image based on a user-specified threshold.

In other words, tags are found by binarizing the input image with a single threshold supplied by the user.

This scheme is very fast, but not robust to changes in illumination.

This approach is very fast, but it is not robust to changes in illumination.

In general, ARToolkit’s detections can not handle even modest occlusions of the tag’s border.

In general, ARToolkit cannot handle even modest occlusion of the tag's border.

1.2 ARTag's improvements over ARToolkit:

the detection mechanism was based on the image gradient, making it robust to changes in lighting.

ARTag detects tags from the image gradient, which makes detection robust to changes in lighting.

While the details of the detector algorithm are not public, ARTag’s detection mechanism is able to detect tags whose border is partially occluded.

The details of ARTag's detection algorithm are not public, but it can detect tags whose border is partially occluded.

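ARTag also provided the first coding system based on forward error correction, which made tags easier to generate, faster to correlate, and provided greater orthogonality between tags.

That is, ARTag introduced the first coding system built on forward error correction, which makes tags easier to generate, faster to correlate, and more nearly orthogonal to one another.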

2. Tag detection (Detector)

2.1 Overview:

we describe the detector whose job is to estimate the position of possible tags in an image. Loosely speaking, the detector attempts to find four-sided regions (“quads”) that have a darker interior than their exterior. The tags themselves have black and white borders in order to facilitate this.

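The detector looks for candidate tags in the scene: it tries to find quadrilaterals that are dark inside and light outside. To make them easy to recognise, the tags themselves carry a black-and-white border. The original post illustrates this with a figure.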

2.2 Detecting line segments

Our approach begins by detecting lines in the image. Our approach, similar in basic approach to the ARTag detector, computes the gradient direction and magnitude at every pixel and agglomeratively clusters the pixels into components with similar gradient directions and magnitudes.

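Roughly: as in the ARTag detector, the gradient direction and magnitude are computed at every pixel of the image, and pixels with similar gradient direction and magnitude are clustered into components.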

2.3 Early processing steps

First: The tag detection algorithm begins by computing the gradient at every pixel, computing their magnitudes (the per-pixel gradient yields a magnitude image).

Second: gradient direction (the per-pixel gradient also yields a direction image).

Third: similar gradient directions and magnitude are clustered into components (pixels with similar gradient direction and magnitude are grouped into one component).
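The first two steps can be sketched in a few lines. This is only an illustration: the use of central differences, the function name `gradient_magnitude_direction`, and the tiny test image are assumptions of this sketch, since the text above only says that a gradient is computed at every pixel.

```python
import math

def gradient_magnitude_direction(img):
    """Return (magnitude, direction) grids for a 2D list of grey levels.

    Central differences are an assumption of this sketch; border pixels
    are simply left at zero.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            mag[y][x] = math.hypot(gx, gy)   # gradient magnitude
            ang[y][x] = math.atan2(gy, gx)   # gradient direction in radians
    return mag, ang

# A vertical step edge: dark on the left, light on the right.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
mag, ang = gradient_magnitude_direction(image)
```

On this image the magnitude is non-zero only in the two columns next to the step, and the direction there points along +x, towards the lighter side.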

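Clustering algorithm:

The clustering algorithm is similar to the graph-based method of Felzenszwalb: a graph is created in which each node represents a pixel.

The clustering resembles Felzenszwalb's graph-based segmentation method: a graph is built in which every node represents one pixel.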

Algorithm description:

Edges are added between adjacent pixels with an edge weight equal to the pixels’ difference in gradient direction. These edges are then sorted and processed in terms of increasing edge weight: for each edge, we test whether the connected components that the pixels belong to should be joined together.

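Edges are added between adjacent pixels, weighted by the difference in their gradient directions. The edges are then sorted and processed in order of increasing weight; for each edge, we test whether the connected components that contain its two pixels should be merged.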
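A minimal sketch of this edge-sorted merging, using a union-find structure. The quoted text does not give the merge test, so the criterion used here is an assumption in the spirit of Felzenszwalb's method: two components are joined only if the spread of directions in the merged component stays within the smaller of the two spreads plus `k / size`. The constant `k`, the function name, and the fact that angle wrap-around and gradient magnitude are ignored are all simplifications of this sketch.

```python
def cluster_by_direction(ang, k=1.0):
    """Cluster pixels of a direction grid `ang` (radians) with union-find."""
    h, w = len(ang), len(ang[0])
    n = h * w
    parent = list(range(n))
    size = [1] * n
    lo = [ang[i // w][i % w] for i in range(n)]  # min direction per component
    hi = lo[:]                                   # max direction per component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Edges between 4-connected neighbours, weighted by direction difference.
    edges = []
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w:
                edges.append((abs(ang[y][x] - ang[y][x + 1]), i, i + 1))
            if y + 1 < h:
                edges.append((abs(ang[y][x] - ang[y + 1][x]), i, i + w))
    edges.sort()  # process in order of increasing weight

    for _, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        new_lo, new_hi = min(lo[ra], lo[rb]), max(hi[ra], hi[rb])
        merged = size[ra] + size[rb]
        # Hypothetical merge test: see the lead-in above.
        limit = min(hi[ra] - lo[ra], hi[rb] - lo[rb]) + k / merged
        if new_hi - new_lo <= limit:
            parent[rb] = ra
            size[ra] = merged
            lo[ra], hi[ra] = new_lo, new_hi
    return [[find(y * w + x) for x in range(w)] for y in range(h)]

directions = [[0.0, 0.01, 1.5, 1.51],
              [0.0, 0.02, 1.5, 1.52]]
labels = cluster_by_direction(directions)
```

In the example, the two left columns end up in one component and the two right columns in another, because the large direction jump between them fails the merge test.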

Problem with the algorithm:

This gradient-based clustering method is sensitive to noise in the image: even modest amounts of noise will cause local gradient directions to vary, inhibiting the growth of the components. The solution to this problem is to low-pass filter the image.

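This gradient-based clustering is sensitive to image noise: even a modest amount of noise perturbs the local gradient directions and stops components from growing. The remedy is to low-pass filter the image.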

Unlike other problem domains where this filtering can blur useful information in the image, the edges of a tag are intrinsically large-scale features (particularly in comparison to the data field), and so this filtering does not cause information loss. We recommend a value of σ = 0.8.

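In other problem domains such filtering can blur away useful information, but the edges of a tag are intrinsically large-scale features (especially compared with the data field), so the filtering causes no loss of information. The recommended value is σ = 0.8.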
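As an illustration, a Gaussian low-pass filter with σ = 0.8 can be written as two one-dimensional passes. The kernel radius of 2 pixels, the clamped border handling, and the function names are choices made for this sketch, not values taken from the paper.

```python
import math

def gaussian_kernel(sigma=0.8, radius=2):
    """Normalised 1D Gaussian kernel with 2 * radius + 1 taps."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(img, kernel):
    """Convolve every row with `kernel`, clamping indices at the borders."""
    r = len(kernel) // 2
    out = []
    for row in img:
        n = len(row)
        out.append([
            sum(kernel[j + r] * row[min(max(x + j, 0), n - 1)]
                for j in range(-r, r + 1))
            for x in range(n)
        ])
    return out

def gaussian_blur(img, sigma=0.8):
    """Separable Gaussian blur: rows first, then columns via a transpose."""
    kernel = gaussian_kernel(sigma)
    rows = blur_rows(img, kernel)
    cols = blur_rows([list(c) for c in zip(*rows)], kernel)
    return [list(r) for r in zip(*cols)]
```

With such a small σ most of the kernel weight stays on the centre pixel, so wide structures such as tag borders are barely changed while pixel-level noise is damped.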

Fourth: Using weighted least squares, a line segment is then fit to the pixels in each component. (A line segment is fitted to the pixels of each component by weighted least squares.)

The direction of the line segment is determined by the gradient direction, so that segments are dark on the left, light on the right. The direction of the lines are visualized by short perpendicular “notches” at their midpoint; note that these “notches” always point towards the lighter region.

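The direction of each segment is determined by the gradient direction, so that the dark side lies on its left and the light side on its right. In the paper's figure the direction is visualised by a short perpendicular notch at the midpoint of each segment; these notches always point towards the lighter region.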
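One way to realise the fit and the orientation rule is sketched below. It uses weighted orthogonal regression (the principal axis of the weighted point scatter), which is a stand-in chosen here because it handles lines of any slope; the text does not say which weighted least-squares formulation is used. The weights are assumed to be gradient magnitudes, and coordinates are assumed to have x to the right and y up, so that "right of the line" means the direction (dy, -dx).

```python
import math

def fit_segment(points, weights, mean_gradient):
    """Fit a line to weighted (x, y) points.

    Returns (centroid, unit direction). The direction is flipped if needed
    so that `mean_gradient` (which points towards the lighter side) lies on
    the right-hand side of the line.
    """
    total = sum(weights)
    cx = sum(w * x for w, (x, y) in zip(weights, points)) / total
    cy = sum(w * y for w, (x, y) in zip(weights, points)) / total
    sxx = sum(w * (x - cx) ** 2 for w, (x, y) in zip(weights, points))
    syy = sum(w * (y - cy) ** 2 for w, (x, y) in zip(weights, points))
    sxy = sum(w * (x - cx) * (y - cy) for w, (x, y) in zip(weights, points))
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # principal axis angle
    dx, dy = math.cos(theta), math.sin(theta)
    gx, gy = mean_gradient
    if gx * dy - gy * dx < 0:  # gradient is on the left: reverse the line
        dx, dy = -dx, -dy
    return (cx, cy), (dx, dy)

pts = [(0, 0), (1, 1), (2, 2), (3, 3)]
centre, direction = fit_segment(pts, [1, 2, 2, 1], mean_gradient=(1, -1))
```

Reversing the sign of `mean_gradient` reverses the returned direction, which is how the dark-left, light-right convention is encoded.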

2.4 Summary of line segment detection

The segmentation algorithm is the slowest phase in our detection scheme. As an option, this segmentation can be performed at half the image resolution with a 4x improvement in speed. The sub-sampling operation can be efficiently combined with the recommended low-pass filter. The consequence of this optimization is a modestly reduced detection range, since very small quads may no longer be detected.

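Segmentation is the slowest phase of the detection scheme. Optionally, it can be run at half the image resolution, which makes it about 4x faster, since halving each dimension leaves a quarter of the pixels. The sub-sampling can be combined efficiently with the recommended low-pass filter. The price of this optimisation is a modestly reduced detection range, because very small quads may no longer be detected.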

2.5 Quad detection

Our approach is based on a recursive depth-first search with a depth of four: each level of the search tree adds an edge to the quad. At depth one, we consider all line segments. At depths two through four, we consider all of the line segments that begin “close enough” to where the previous line segment ended and which obey a counter-clockwise winding order.

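The approach is a recursive depth-first search of depth four: each level of the search tree adds one edge to the quad. At depth one, all line segments are considered. At depths two through four, only those segments are considered that begin “close enough” to where the previous segment ended and that obey a counter-clockwise winding order.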

Robustness to occlusions and segmentation errors is handled by adjusting the “close enough” threshold: by making the threshold large, significant gaps around the edges can be handled. Our threshold for “close enough” is twice the length of the line plus five additional pixels. This is a large threshold which leads to a low false negative rate, but also results in a high false positive rate.

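Robustness to occlusion and segmentation errors is controlled by the “close enough” threshold: a large threshold tolerates significant gaps around the edges. The threshold used is twice the length of the line plus five additional pixels. This is a large threshold, which gives a low false negative rate but also a high false positive rate.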
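The search and its threshold can be sketched as follows. Segments are represented as `((x0, y0), (x1, y1))` tuples. Several details are assumptions of this sketch rather than statements from the paper: the winding test is a positive cross product of successive directions (counter-clockwise when x points right and y points up), the fourth edge is additionally required to lead back to the first, and every quad is reported once per starting edge because no de-duplication is done.

```python
import math

def length(seg):
    (x0, y0), (x1, y1) = seg
    return math.hypot(x1 - x0, y1 - y0)

def close_enough(prev, nxt):
    """True if `nxt` starts within 2 * len(prev) + 5 pixels of where `prev` ends."""
    (ex, ey), (sx, sy) = prev[1], nxt[0]
    return math.hypot(sx - ex, sy - ey) <= 2 * length(prev) + 5

def turns_consistently(prev, nxt):
    """Winding test: the cross product of successive directions is positive."""
    ax, ay = prev[1][0] - prev[0][0], prev[1][1] - prev[0][1]
    bx, by = nxt[1][0] - nxt[0][0], nxt[1][1] - nxt[0][1]
    return ax * by - ay * bx > 0

def find_quads(segments):
    quads = []

    def search(path):
        if len(path) == 4:
            # Assumption of this sketch: the last edge must lead back to the first.
            if close_enough(path[-1], path[0]) and turns_consistently(path[-1], path[0]):
                quads.append(tuple(path))
            return
        for seg in segments:
            if (seg not in path and close_enough(path[-1], seg)
                    and turns_consistently(path[-1], seg)):
                search(path + [seg])

    for seg in segments:  # depth one: every segment may be the first edge
        search([seg])
    return quads

square = [((0, 0), (10, 0)), ((10, 0), (10, 10)),
          ((10, 10), (0, 10)), ((0, 10), (0, 0))]
distractor = ((100, 100), (110, 100))
quads = find_quads(square + [distractor])
```

The generous threshold is visible here: for a 10-pixel edge it is 25 pixels, so in this example it is the winding test, not the distance test, that rejects the wrong neighbours.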

We populate a two-dimensional lookup table to accelerate queries for line segments that begin near a point in space.

A two-dimensional lookup table is populated to speed up queries for line segments that begin near a given point.
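A uniform grid keyed by each segment's start point is one simple form such a table could take. The class name `SegmentGrid`, the cell size of 10 pixels, and the query interface are hypothetical; the paper does not describe the table's layout.

```python
from collections import defaultdict

class SegmentGrid:
    """2D lookup table keyed by the grid cell of each segment's start point."""

    def __init__(self, segments, cell=10):
        self.cell = cell
        self.cells = defaultdict(list)
        for seg in segments:
            self.cells[self._key(seg[0])].append(seg)

    def _key(self, point):
        return (int(point[0] // self.cell), int(point[1] // self.cell))

    def near(self, point, radius):
        """Segments whose start lies in a cell overlapping the query square.

        The result is a superset of the true neighbours, so an exact
        distance test is still needed afterwards.
        """
        kx0, ky0 = self._key((point[0] - radius, point[1] - radius))
        kx1, ky1 = self._key((point[0] + radius, point[1] + radius))
        found = []
        for kx in range(kx0, kx1 + 1):
            for ky in range(ky0, ky1 + 1):
                found.extend(self.cells.get((kx, ky), []))
        return found
```

A depth-first search over segments would call `near(end_point, threshold)` instead of scanning every segment at each level.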

3. Computing the tag's distance and orientation relative to the camera

3.1 Homography and extrinsics estimation

3.1.1 Obtaining the homography with the DLT

We compute the 3×3 homography matrix that projects 2D points in homogeneous coordinates from the tag’s coordinate system (in which [0 0 1]^T is at the center of the tag and the tag extends one unit in the x and y directions) to the 2D image coordinate system. The homography is computed using the Direct Linear Transform (DLT) algorithm. Note that since the homography projects points in homogeneous coordinates, it is defined only up to scale.

A 3 × 3 homography matrix is computed that maps 2D points, in homogeneous coordinates, from the tag's coordinate system (where [0 0 1]^T is the tag centre and the tag extends one unit in the x and y directions) to the 2D image coordinate system. It is computed with the Direct Linear Transform (DLT) algorithm. Because the homography acts on homogeneous coordinates, it is defined only up to scale.
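A compact DLT sketch with NumPy is shown below. Each point correspondence contributes two linear equations in the nine entries of H, and the solution is the singular vector with the smallest singular value. The function name, the example corner coordinates, and the final division by `H[2, 2]` (done only to pick one representative of the scale family) are choices made for this sketch.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (3x3, up to scale) with dst ~ H @ src from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The null vector of the stacked system is the last right singular vector.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]  # fix the free scale for readability

# Tag corners in tag coordinates (one unit from the centre in x and y)
# and made-up pixel positions of the detected quad's corners.
tag_corners = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
image_corners = [(100, 100), (200, 110), (190, 210), (95, 200)]
H = homography_dlt(tag_corners, image_corners)
```

Multiplying `H` by any non-zero constant gives the same mapping after the homogeneous division, which is what "defined only up to scale" means in practice.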

3.1.2 Computation

Computation of the tag’s position and orientation requires additional information: the camera’s focal length and the physical size of the tag.

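Computing the tag's position and orientation needs additional information: the camera's focal length and the physical size of the tag.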

The 3 × 3 homography matrix (computed by the DLT) can be written as the product of the 3 × 4 camera projection matrix P (which we assume is known) and the 4 × 3 truncated extrinsics matrix E.

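That is, the 3 × 3 homography matrix (computed by the DLT) can be written as the product of the 3 × 4 camera projection matrix P (assumed known) and the 4 × 3 truncated extrinsics matrix E.

The truncated extrinsics matrix E: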

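Extrinsics matrices are typically 4 × 4, but every position on the tag is at z = 0 in the tag’s coordinate system. Thus, we can rewrite every tag coordinate as a 2D homogeneous point with z implicitly zero, and remove the third column of the extrinsics matrix, forming the truncated extrinsics matrix.

An extrinsics matrix is normally 4 × 4, but every point on the tag has z = 0 in the tag's coordinate system. Each tag coordinate can therefore be written as a 2D homogeneous point with z implicitly zero, and the third column of the extrinsics matrix can be dropped, which gives the truncated extrinsics matrix.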

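We represent the rotation components of P as R_ij and the translation components as T_k. We also represent the unknown scale factor as s.

The rotation components are written R_ij and the translation components T_k (these are the entries of the truncated extrinsics matrix E); the unknown scale factor is written s.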

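Note that we cannot directly solve for E because P is rank deficient. We can expand the right hand side of Eqn. 2, and write the expression for each h_ij as a set of simultaneous equations.

Note that E cannot be solved for directly, because P is rank deficient. Instead, the right-hand side of Eqn. 2 (the scaled product of P and E) is expanded, and the expression for each h_ij is written out as a set of simultaneous equations.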
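The post does not reproduce Eqn. 2 or its expansion. Under the assumption that P is a pinhole projection with focal lengths f_x and f_y and the principal point at the origin (an assumption of this reconstruction, not something stated above), the product and the resulting simultaneous equations can be written as:

```latex
\[
\begin{bmatrix} h_{00} & h_{01} & h_{02} \\ h_{10} & h_{11} & h_{12} \\ h_{20} & h_{21} & h_{22} \end{bmatrix}
= s\,P\,E
= s
\begin{bmatrix} f_x & 0 & 0 & 0 \\ 0 & f_y & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} R_{00} & R_{01} & T_x \\ R_{10} & R_{11} & T_y \\ R_{20} & R_{21} & T_z \\ 0 & 0 & 1 \end{bmatrix}
\]
\[
\begin{array}{lll}
h_{00} = s R_{00} f_x, & h_{01} = s R_{01} f_x, & h_{02} = s T_x f_x, \\
h_{10} = s R_{10} f_y, & h_{11} = s R_{11} f_y, & h_{12} = s T_y f_y, \\
h_{20} = s R_{20},     & h_{21} = s R_{21},     & h_{22} = s T_z .
\end{array}
\]
```

Each equation contains exactly one unknown entry of E multiplied by s, which is why everything except the scale factor can be read off directly.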

These are all easily solved for the elements of R_ij and T_k except for the unknown scale factor s. However, since the columns of a rotation matrix must all be of unit magnitude, we can constrain the magnitude of s. We have two columns of the rotation matrix, so we compute s as the geometric average of their magnitudes. The sign of s can be recovered by requiring that the tag appear in front of the camera, i.e., that T_z < 0. The third column of the rotation matrix can be recovered by computing the cross product of the two known columns, since the columns of a rotation matrix must be orthonormal.

These equations are easy to solve for the elements R_ij and T_k, apart from the unknown scale factor s. Since every column of a rotation matrix has unit magnitude, the magnitude of s can be constrained: two columns of the rotation matrix are available, so s is taken as the geometric mean of their magnitudes. The sign of s is recovered by requiring that the tag lie in front of the camera, i.e. T_z < 0. The third column of the rotation matrix is recovered as the cross product of the two known columns, because the columns of a rotation matrix are orthonormal.
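The recovery steps can be sketched with NumPy. The sketch assumes a projection matrix of the form P = [[fx, 0, 0, 0], [0, fy, 0, 0], [0, 0, 1, 0]] (no principal-point offset), which is an assumption made here for illustration; the function name and arguments are likewise hypothetical.

```python
import numpy as np

def pose_from_homography(H, fx, fy):
    """Recover (R, T) from H = s * P * E under the pinhole assumption above."""
    M = np.array(H, dtype=float)      # work on a copy
    M[0] /= fx                        # undo the focal lengths:
    M[1] /= fy                        # M is now s * [r0 r1 T]
    # |r0| = |r1| = 1, so both column norms equal |s|; use their geometric mean.
    s = np.sqrt(np.linalg.norm(M[:, 0]) * np.linalg.norm(M[:, 1]))
    if M[2, 2] / s > 0:               # enforce T_z < 0 (tag in front of camera)
        s = -s
    M /= s
    r0, r1, T = M[:, 0], M[:, 1], M[:, 2]
    R = np.column_stack([r0, r1, np.cross(r0, r1)])  # third column by cross product
    return R, T
```

With a noisy homography the two column norms differ slightly, which is why a mean of the two is used and why R may need the orthonormalisation described next.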

The DLT procedure and the normalization procedure above do not guarantee that the rotation matrix is strictly orthonormal. To correct this, we compute the polar decomposition of R, which yields a proper rotation matrix while minimizing the Frobenius matrix norm of the error.

The DLT and the normalisation above do not guarantee that the rotation matrix is strictly orthonormal. To correct this, the polar decomposition of R is computed, which yields a proper rotation matrix while minimising the Frobenius norm of the error.
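The orthogonal factor of the polar decomposition can be obtained from a singular value decomposition, as in this short sketch. Computing it via the SVD and the determinant guard against reflections are choices made here; the text does not say how the decomposition is computed.

```python
import numpy as np

def nearest_rotation(R):
    """Orthogonal polar factor of R: the closest rotation in Frobenius norm."""
    U, _, Vt = np.linalg.svd(R)
    Q = U @ Vt
    if np.linalg.det(Q) < 0:   # guard against returning a reflection
        U[:, -1] *= -1
        Q = U @ Vt
    return Q
```

Applied to a slightly non-orthonormal estimate, the result satisfies Q^T Q = I and det(Q) = 1 while staying close to the input.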

3.2 Payload decoding

3.2.1 Overview

The final task is to read the bits from the payload field. We do this by computing the tag-relative coordinates of each bit field, transforming them into image coordinates using the homography, and then thresholding the resulting pixels. In order to be robust to lighting (which can vary not only from tag to tag, but also within a tag), we use a spatially-varying threshold.

The last task is to read the bits of the payload field. The tag-relative coordinates of each bit field are computed, transformed into image coordinates with the homography, and the resulting pixels are thresholded. To be robust to lighting (which can vary not only from tag to tag but also within a single tag), a spatially varying threshold is used.

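We build a spatially-varying model of the intensity of “black” pixels, and a second model for the intensity of “white” models. We use the border of the tag, which contains known examples of both white and black pixels.

A spatially varying model is built for the intensity of “black” pixels, and a second one for the intensity of “white” pixels. Both are fitted on the tag's border, which contains known examples of white and black pixels.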

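A fourth quad is detected around one of the payload bits of the larger tag. These two extraneous detections are eventually discarded because their payload is invalid. The white dots correspond to samples around the tag's border which are used to fit a linear model of intensity of “white” pixels; a model is similarly fit for the black pixels. These two models are used to threshold the data payload bits, shown as yellow dots.

A fourth quad is detected around one of the payload bits of the larger tag. The two extraneous detections are eventually discarded because their payload is invalid. The white dots are samples around the tag's border used to fit a linear model of the intensity of “white” pixels; a model is fitted in the same way for the black pixels. The two models are used to threshold the data payload bits, shown as yellow dots.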

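This model has four parameters which are easily computed using least squares regression. We build two such models, one for black, the other for white. The threshold used when decoding data bits is then just the average of the predicted intensity values of the black and white models.

The model has four parameters, which are easy to compute by least-squares regression. Two such models are built, one for black and one for white. The threshold used to decode a data bit is then simply the average of the intensities predicted by the black and white models.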
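The post does not show the model's formula. The sketch below assumes a bilinear form I(x, y) = A*x*y + B*x + C*y + D, which has the four parameters mentioned above, and fits it with NumPy's least-squares solver. The function names and the convention that a bright pixel decodes to 1 are assumptions of this sketch.

```python
import numpy as np

def fit_intensity_model(samples):
    """Least-squares fit of I(x, y) = A*x*y + B*x + C*y + D.

    `samples` is a sequence of (x, y, intensity) triples taken from
    border pixels of one known colour.
    """
    x, y, i = np.asarray(samples, dtype=float).T
    design = np.column_stack([x * y, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(design, i, rcond=None)
    return coeffs

def predict(coeffs, x, y):
    a, b, c, d = coeffs
    return a * x * y + b * x + c * y + d

def decode_bit(pixel, x, y, black, white):
    """1 if the pixel is brighter than the midpoint of both models at (x, y)."""
    threshold = 0.5 * (predict(black, x, y) + predict(white, x, y))
    return int(pixel > threshold)

# Synthetic border samples with a left-to-right lighting gradient.
grid = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
white = fit_intensity_model([(x, y, 200 + 10 * x) for x, y in grid])
black = fit_intensity_model([(x, y, 20 + 10 * x) for x, y in grid])
```

In this synthetic example the threshold is 100 at x = -1 and 120 at x = 1, so a pixel of intensity 105 decodes to 1 on the dim side of the tag and to 0 on the bright side.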

3.2.2 Coding system (decides whether a detected quad is valid)

The goals of a coding system are to:

· Maximize the number of distinguishable codes

· Maximize the number of bit errors that can be detected or corrected

· Minimize the false positive/inter-tag confusion rate

· Minimize the total number of bits per tag (and thus the size of the tag)

These goals are often in conflict, and so a given code represents a trade-off.

In other words, a code should offer many distinguishable tags, tolerate as many bit errors as possible, rarely be confused with another tag or with background clutter, and use as few bits as possible so that the tag stays small. These goals pull against each other, so any particular code is a compromise.

We describe a new coding system based on lexicodes that provides significant advantages over previous methods. Our procedure can generate lexicodes with a variety of properties, allowing the user to use a code that best fits their needs.

The paper describes a new coding system based on lexicodes that offers significant advantages over earlier methods. The procedure can generate lexicodes with a range of properties, so users can pick the code that best fits their needs.

We use a lexicode system that can generate codes for any arbitrary tag size (e.g., 3x3, 4x4, 5x5, 6x6) and minimum Hamming distance. Our approach explicitly guarantees the minimum Hamming distance for all four rotations of each tag and eliminates tags which are of low geometric complexity. Computing the tags can be an expensive operation, but is done offline. Small tags (5x5) can be easily computed in seconds or minutes, but larger tags (6x6) can take several days of CPU time.

The lexicode system can generate codes for any tag size (for example 3x3, 4x4, 5x5, 6x6) and any minimum Hamming distance. The approach explicitly guarantees the minimum Hamming distance for all four rotations of every tag, and it discards tags of low geometric complexity. Computing the tags is expensive, but it is done offline: small tags (5x5) take seconds or minutes, whereas larger tags (6x6) can take several days of CPU time.
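The greedy idea behind a lexicode can be sketched as follows: candidates are visited in increasing numeric order, and a candidate is accepted only if it keeps the minimum Hamming distance to every accepted code under all four rotations. This sketch is a simplification: it also rejects candidates that are too close to their own rotations (one reading of the rotation guarantee), it omits the geometric-complexity filter entirely, and the bit layout (row-major, bit index `r * d + c`) and function names are assumptions.

```python
def rotate90(code, d):
    """Rotate a d*d-bit code (row-major bit grid) by 90 degrees."""
    out = 0
    for r in range(d):
        for c in range(d):
            if (code >> ((d - 1 - c) * d + r)) & 1:
                out |= 1 << (r * d + c)
    return out

def rotations(code, d):
    """The code followed by its 90, 180 and 270 degree rotations."""
    result = [code]
    for _ in range(3):
        result.append(rotate90(result[-1], d))
    return result

def hamming(a, b):
    return bin(a ^ b).count("1")

def generate_lexicode(d, min_dist):
    """Greedily collect d*d-bit codes whose rotations stay min_dist apart."""
    codes = []
    for cand in range(1 << (d * d)):
        rots = rotations(cand, d)
        # A tag must not be confusable with its own rotated copies...
        if any(hamming(cand, r) < min_dist for r in rots[1:]):
            continue
        # ...nor with any rotation of a tag that was already accepted.
        if all(hamming(c, r) >= min_dist for c in codes for r in rots):
            codes.append(cand)
    return codes

codes = generate_lexicode(3, 3)
```

The exhaustive scan over all 2^(d*d) candidates is what makes larger tags expensive: a 3x3 tag has 512 candidates, while a 6x6 tag has 2^36.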
