bagu-thesis/docs/XThesis.example.xml

<?xml version="1.0" encoding="UTF-8"?>
<article xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="XThesis.xsd">
<!-- ===== Preface ===== -->
<preface>
<section caption="Abstract">
<paragraph>
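<span>This paper proposes a deep-learning-based image recognition method. Experimental results show that, on the </span>
<b>ImageNet</b>
<span> dataset, the method achieves an accuracy of </span>
<inlineeq>92.3\%</inlineeq>
<span>.</span>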
</paragraph>
<paragraph>
<b>Keywords:</b>
<span> deep learning; image recognition; convolutional neural networks</span>
</paragraph>
</section>
</preface>
<!-- ===== Main Body ===== -->
<document>
<section1 caption="Introduction">
<paragraph>
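<span>With the rapid development of deep learning techniques </span>
<cite>he2016, krizhevsky2012</cite>
<span>, the field of image recognition has seen breakthrough progress. However, existing methods still face considerable challenges in computational efficiency </span>
<cite>tan2019</cite>
<span>.</span>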
</paragraph>
<paragraph>
<span>The main contributions of this paper are presented in </span>
<cref>sec:method,fig:arch</cref>
<span>.</span>
</paragraph>
<section2 caption="Background">
<paragraph>
<span>Since convolutional neural networks (CNNs) were introduced </span>
<cite>lecun1998</cite>
<span>, they have become a foundational architecture in computer vision.</span>
</paragraph>
</section2>
<section2 caption="Paper Organization">
<paragraph>
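<span>The remainder of this paper is organized as follows: Section </span>
<ref>sec:method</ref>
<span> describes the proposed method, Section </span>
<ref>sec:experiment</ref>
<span> presents the experimental results, and Section </span>
<ref>sec:conclusion</ref>
<span> concludes the paper.</span>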
</paragraph>
</section2>
</section1>
<section1 caption="Related Work">
<paragraph>
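<span>In recent years, a variety of network architectures have been proposed. Representative works such as </span>
<cite>simonyan2014, szegedy2015, he2016</cite>
<span> have greatly advanced the field.</span>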
</paragraph>
</section1>
<section1 caption="Method">
<paragraph>
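<span>This section describes the architecture of the proposed method in detail, as shown in </span>
<cref>fig:arch</cref>
<span>. The overall pipeline is defined by Equation </span>
<cref>eq:loss</cref>
<span>.</span>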
</paragraph>
<ol>
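<li><span>The input image is normalized by the preprocessing module.</span></li>
<li><span>The feature extraction network (shown in </span><cref>fig:arch</cref><span>) extracts multi-scale features.</span></li>
<li><span>The classification head computes the loss according to </span><cref>eq:loss</cref><span> and outputs the prediction.</span></li>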
</ol>
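<!--
The three steps listed above can be sketched roughly as follows. This is only an
illustrative sketch, not the thesis implementation: the function names
(normalize, extract_features, classify), the window-averaging stand-in for the
feature extraction network, and the made-up weights are all assumptions.

```python
import math

def normalize(pixels):
    """Step 1 (assumed form): scale pixel values to zero mean and unit variance."""
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = math.sqrt(var) or 1.0  # fall back to 1.0 for constant images
    return [(p - mean) / std for p in pixels]

def extract_features(x, scales=(1, 2)):
    """Step 2 (hypothetical stand-in): average non-overlapping windows at each scale."""
    feats = []
    for s in scales:
        feats.extend(sum(x[i:i + s]) / s for i in range(0, len(x) - s + 1, s))
    return feats

def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feats, weights):
    """Step 3 (assumed form): linear classification head followed by softmax."""
    logits = [sum(w * f for w, f in zip(row, feats)) for row in weights]
    return softmax(logits)

pixels = [0, 64, 128, 255]                    # toy one-dimensional "image"
feats = extract_features(normalize(pixels))   # 4 values at scale 1, 2 at scale 2
weights = [[0.1] * len(feats), [-0.1] * len(feats)]  # made-up weights, 2 classes
probs = classify(feats, weights)
```

A real model would replace extract_features with the network in fig:arch; the
sketch only shows how the three stages hand data to one another.
-->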
<figure label="fig:arch" caption="Network architecture diagram">figures/architecture.png</figure>
<section2 caption="Loss Function">
<paragraph>
<span>This paper uses the cross-entropy loss, defined as follows:</span>
</paragraph>
<equation label="eq:loss">\mathcal{L} = -\sum_{i=1}^{N} y_i \log(\hat{y}_i)</equation>
<paragraph>
<span>where </span>
<inlineeq>N</inlineeq>
<span> denotes the number of classes, </span>
<inlineeq>y_i</inlineeq>
<span> is the ground-truth label, and </span>
<inlineeq>\hat{y}_i</inlineeq>
<span> is the predicted probability.</span>
</paragraph>
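<!--
As a worked illustration of eq:loss, the cross-entropy can be evaluated for a
single sample as sketched below. The function name cross_entropy and the eps
clamp are assumptions added for numerical safety; they are not part of the
equation itself.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Return the negative sum over i of y_i * log(y_hat_i) for N classes."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Clamp predictions away from 0 so log() stays finite (eps is an assumption).
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_true, y_pred))

# One-hot label for class 1 and a predicted distribution over N = 3 classes.
loss = cross_entropy([0.0, 1.0, 0.0], [0.2, 0.7, 0.1])
```

With a one-hot label, only the term for the true class contributes, so the loss
here reduces to the negative log of 0.7.
-->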
</section2>
<section2 caption="Optimization Strategy">
<paragraph>
<span>The model is trained with the </span>
<i>Adam</i>
<span> optimizer, with the initial learning rate set to </span>
<inlineeq>10^{-3}</inlineeq>
<span>.</span>
</paragraph>
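<!--
For reference, a single Adam update with the stated initial learning rate of
1e-3 can be sketched as below. The text only gives the learning rate; the
values beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are the commonly used defaults
and are assumed here, as is the toy objective f(theta) = theta^2.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter; return (theta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimise the toy objective f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
```

Because of bias correction, the first step moves the parameter by almost
exactly the learning rate, regardless of the gradient's magnitude.
-->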
</section2>
</section1>
<section1 caption="Experiments">
<paragraph>
<span>This section evaluates the proposed method on three standard datasets. The results are summarized in Table </span>
<ref>tbl:result</ref>
<span>.</span>
</paragraph>
<section2 caption="Datasets">
<ul>
<li><b>CIFAR-10</b><span>: 10 classes, 60,000 color images of 32×32 pixels in total.</span></li>
<li><b>CIFAR-100</b><span>: 100 classes, 60,000 color images of 32×32 pixels in total.</span></li>
<li><b>ImageNet</b><span>: 1,000 classes, about 1.2 million training images.</span></li>
</ul>
</section2>
<section2 caption="Result Analysis">
<paragraph>
<span>The methods are compared in the table below:</span>
</paragraph>
<table label="tbl:result" caption="Accuracy comparison of methods on CIFAR-10">
<thead>
<td>Method</td>
<td>Accuracy (%)</td>
<td>Parameters (M)</td>
</thead>
<tbody>
<tr>
<td>ResNet-18</td>
<td>93.0</td>
<td>11.7</td>
</tr>
<tr>
<td>ResNet-50</td>
<td>93.5</td>
<td>25.6</td>
</tr>
<tr>
<td>DenseNet-121</td>
<td>94.2</td>
<td>8.0</td>
</tr>
<tr>
<td>Proposed method</td>
<td>95.1</td>
<td>9.3</td>
</tr>
</tbody>
</table>
<pagebreaker/>
<paragraph>
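<span>As the table shows, the proposed method achieves the highest accuracy (95.1%) with 9.3 M parameters, fewer than either ResNet baseline.</span>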
</paragraph>
</section2>
</section1>
<section1 caption="Conclusion">
<paragraph>
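<span>This paper proposed an efficient image recognition method and validated its effectiveness on multiple datasets. Future work will explore its application to video analysis.</span>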
</paragraph>
</section1>
</document>
<!-- ===== Appendix ===== -->
<appendix>
<section caption="Acknowledgments">
<paragraph>
<span>This research was supported by the National Natural Science Foundation of China (Grant No. 114514), for which we express our gratitude.</span>
</paragraph>
</section>
<section caption="Appendix A Core Code">
<paragraph>
<span>The following pseudocode implements the core module of the model:</span>
</paragraph>
</section>
</appendix>
<!-- ===== References ===== -->
<reference>references.bib</reference>
</article>