<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>蠢黑通行</title>
  
  <subtitle>One person's whole world</subtitle>
  <link href="https://blackyau.cc/atom.xml" rel="self"/>
  
  <link href="https://blackyau.cc/"/>
  <updated>2021-09-09T15:26:31.000Z</updated>
  <id>https://blackyau.cc/</id>
  
  <author>
    <name>蠢黑通行</name>
    
  </author>
  
  <generator uri="https://hexo.io/">Hexo</generator>
  
  <entry>
    <title>The exchange process when hosts ping each other</title>
    <link href="https://blackyau.cc/27"/>
    <id>https://blackyau.cc/27</id>
    <published>2021-09-09T15:38:48.000Z</published>
    <updated>2021-09-09T15:26:31.000Z</updated>
    
    <content type="html"><![CDATA[<p>Starting from a single ping command, this post describes in detail the path data takes between routers, the ARP request and reply process, and how data is encapsulated and passed between the network layer and the link layer.</p><span id="more"></span><p>The topology is as follows:</p><p><img data-src="https://st.blackyau.net/blog/27/1.png" alt="image.png"></p>
<h1 id="R2-模拟PC1-发送ping"><a href="#R2-模拟PC1-发送ping" class="headerlink" title="R2 (simulating PC1) sends a ping"></a>R2 (simulating PC1) sends a ping</h1><h2 id="应用层"><a href="#应用层" class="headerlink" title="Application layer"></a>Application layer</h2><p>The application asks to send an ICMP echo request (type 8) to 192.168.1.2.</p>
<h2 id="网络层"><a href="#网络层" class="headerlink" title="Network layer"></a>Network layer</h2><p>To send the request to the remote host, the network layer builds an ICMPv4 datagram with source IP 192.168.3.2 and destination IP 192.168.1.2. The type field of the ICMP message is 8, which marks it as an ICMPv4 echo request, and the protocol field of the IP header is 1, which means ICMP.</p>
<h3 id="路由选择"><a href="#路由选择" class="headerlink" title="Route selection"></a>Route selection</h3><p>Pre: route preference; the smaller the value, the higher the priority.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">&lt;R2&gt;dis ip rou</span><br><span class="line">Route Flags: R - relay, D - download to fib</span><br><span class="line">------------------------------------------------------------------------------</span><br><span class="line">Routing Tables: Public</span><br><span class="line">         Destinations : 5        Routes : 5        </span><br><span class="line"></span><br><span class="line">Destination/Mask    Proto   Pre  Cost      Flags NextHop         Interface</span><br><span class="line"></span><br><span class="line">        0.0.0.0/0   Static  60   0          RD   192.168.3.1     GigabitEthernet0/0/0</span><br><span class="line">      127.0.0.0/8   Direct  0    0           D   127.0.0.1       InLoopBack0</span><br><span class="line">      127.0.0.1/32  Direct  0    0           D   127.0.0.1       InLoopBack0</span><br><span class="line">    192.168.3.0/24  Direct  0    0           D   192.168.3.2     GigabitEthernet0/0/0</span><br><span class="line">    192.168.3.2/32  Direct  0    0           D   127.0.0.1       GigabitEthernet0/0/0</span><br></pre></td></tr></table></figure>
<p>When the TCP&#x2F;IP stack needs to communicate with an IP address, it evaluates the routing table to decide how to send the packet.</p><p>For each entry in the routing table, it performs a bitwise AND between the destination IP address and the entry's mask; if the result equals the entry's network address, the entry is recorded as a match. Once every entry has been checked:</p><ul><li>The longest match (the entry whose mask has the most 1 bits) is used to communicate with the destination IP address.</li><li>If there are several longest matches, the route with the highest priority (the smallest Pre value) is chosen.</li><li>If several longest matches share the highest priority, the first one found is chosen (on other devices this usually depends on the connection type: wired &gt; wireless &gt; 4G cellular).</li></ul><p>Here only the default route <code>0.0.0.0/0</code> matches 192.168.1.2, so it is selected, which means the packet is handed to <code>192.168.3.1</code> to be forwarded.</p>
<h2 id="ARP"><a href="#ARP" class="headerlink" title="ARP"></a>ARP</h2><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">&lt;R2&gt;dis arp</span><br><span class="line">IP ADDRESS      MAC ADDRESS     EXPIRE(M) TYPE        INTERFACE   VPN-INSTANCE VLAN/CEVLAN PVC                      </span><br><span class="line">------------------------------------------------------------------------------</span><br><span class="line">192.168.3.2     5489-98da-3e52            I -         GE0/0/0</span><br><span class="line">------------------------------------------------------------------------------</span><br><span class="line">Total:1         Dynamic:0       Static:0     Interface:1</span><br></pre></td></tr></table></figure>
<p>Route selection has determined that the next hop for this packet is <code>192.168.3.1</code>. Before the link layer can wrap the packet in an Ethernet frame, ARP is used to find the MAC address that corresponds to this next-hop IP address.</p><p>R2 looks up its ARP table and finds no matching entry, so it has to send an ARP request to find the MAC address of <code>192.168.3.1</code>.</p>
<h3 id="发送-ARP-请求"><a href="#发送-ARP-请求" class="headerlink" title="Sending the ARP request"></a>Sending the ARP request</h3><p><img data-src="https://st.blackyau.net/blog/27/2.png" alt="image.png"></p><p>R2 broadcasts an ARP request on 192.168.3.0&#x2F;24. The sender MAC address and sender IP address are those of its own outgoing interface, the target IP is 192.168.3.1 (the address being resolved), and the target MAC is all 0s because it is still unknown.</p><p>When the link layer encapsulates this ARP packet, the source address is the MAC address of the interface, the destination address is all 1s (the broadcast address), and the type field is set to 0x0806, the value for ARP.</p>
<h3 id="收到-ARP-请求"><a href="#收到-ARP-请求" class="headerlink" title="Receiving the ARP request"></a>Receiving the ARP request</h3><p>When R1 receives the ARP request, it sees that the target IP is its own. It first updates its own ARP table by writing the sender IP and sender MAC address of the request into it. This saves an ARP request later, when R1 needs to send traffic back to R2, and so improves efficiency.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">&lt;R1&gt;dis arp</span><br><span class="line">IP ADDRESS      MAC ADDRESS     EXPIRE(M) TYPE        INTERFACE   VPN-INSTANCE VLAN/CEVLAN PVC                      </span><br><span class="line">------------------------------------------------------------------------------</span><br><span class="line">192.168.0.1     5489-989b-51f2            I -         Eth0/0/0</span><br><span class="line">192.168.1.1     5489-989b-51f3            I -         Eth0/0/1</span><br><span class="line">192.168.1.2     5489-982b-3b01  20        D-0         Eth0/0/1</span><br><span class="line">192.168.3.1     5489-989b-51f4            I -         GE0/0/0</span><br><span class="line">192.168.3.2     5489-98da-3e52  20        D-0         GE0/0/0  // newly added entry</span><br><span class="line">------------------------------------------------------------------------------</span><br><span class="line">Total:5         Dynamic:2       Static:0     Interface:3    
</span><br></pre></td></tr></table></figure><h3 id="回复-ARP-请求"><a href="#回复-ARP-请求" class="headerlink" title="Replying to the ARP request"></a>Replying to the ARP request</h3><p><img data-src="https://st.blackyau.net/blog/27/3.png" alt="image.png"></p><p>Because the target IP of the request is its own, R1 swaps the sender and target addresses, writes its own MAC address into the sender MAC field, and sends the ARP reply.</p><p>When the link layer encapsulates this ARP packet, the source address is R1's own MAC address and the destination address is the MAC address of the host that sent the request, so the reply is unicast rather than broadcast.</p>
<h3 id="收到-ARP-回复"><a href="#收到-ARP-回复" class="headerlink" title="Receiving the ARP reply"></a>Receiving the ARP reply</h3><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">&lt;R2&gt;dis arp</span><br><span class="line">IP ADDRESS      MAC ADDRESS     EXPIRE(M) TYPE        INTERFACE   VPN-INSTANCE VLAN/CEVLAN PVC                      </span><br><span class="line">------------------------------------------------------------------------------</span><br><span class="line">192.168.3.2     5489-98da-3e52            I -         GE0/0/0</span><br><span class="line">192.168.3.1     5489-989b-51f4  20        D-0         GE0/0/0</span><br><span class="line">------------------------------------------------------------------------------</span><br><span class="line">Total:2         Dynamic:1       Static:0     Interface:1    </span><br></pre></td></tr></table></figure>
<p>R2 writes the MAC address from the reply into its ARP cache.</p><p>R2 then sends the datagram that triggered this ARP request&#x2F;reply exchange. The TTL of this datagram is decremented by 1 by each router that forwards it (R1 here); R2, as the originating host, only sets its initial value.</p>
<h2 id="链路层"><a href="#链路层" class="headerlink" title="Link layer"></a>Link layer</h2><p><img data-src="https://st.blackyau.net/blog/27/4.png" alt="image.png"></p><p>The link layer receives the ICMP packet from the network layer and wraps it in an Ethernet frame whose source address is R2's MAC address and whose destination address is the MAC address just obtained through ARP.</p>
<h1 id="R1-转发数据包"><a href="#R1-转发数据包" class="headerlink" title="R1 forwards the packet"></a>R1 forwards the packet</h1><p>The router receives the frame carrying the ICMP packet from 192.168.3.2 and parses it.</p>
<h2 id="网络层-1"><a href="#网络层-1" class="headerlink" title="Network layer"></a>Network layer</h2><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">[Huawei]dis ip rou</span><br><span class="line">Route Flags: R - relay, D - download to fib</span><br><span class="line">------------------------------------------------------------------------------</span><br><span class="line">Routing Tables: Public</span><br><span class="line">         Destinations : 8        Routes : 8        </span><br><span class="line"></span><br><span class="line">Destination/Mask    Proto   Pre  Cost      Flags NextHop         Interface</span><br><span class="line"></span><br><span class="line">      127.0.0.0/8   Direct  0    0           D   127.0.0.1       InLoopBack0</span><br><span class="line">      127.0.0.1/32  Direct  0    0           D   127.0.0.1       InLoopBack0</span><br><span class="line">    192.168.0.0/24  Direct  0    0           D   192.168.0.1     Ethernet0/0/0</span><br><span class="line">    192.168.0.1/32  Direct  0    0           D   127.0.0.1       Ethernet0/0/0</span><br><span class="line">    192.168.1.0/24  Direct  0    0           D   192.168.1.1     Ethernet0/0/1</span><br><span class="line">    192.168.1.1/32  Direct  0    0           D   127.0.0.1       Ethernet0/0/1</span><br><span class="line">    192.168.3.0/24  Direct  0    0           D   192.168.3.1     GigabitEthernet0/0/0</span><br><span class="line">    192.168.3.1/32  Direct  0    0           D   127.0.0.1       GigabitEthernet0/0/0</span><br></pre></td></tr></table></figure>
<p>R1 looks up its routing table and finds that the destination IP 192.168.1.2 matches the directly connected route 192.168.1.0&#x2F;24, so the packet is sent out of Ethernet0&#x2F;0&#x2F;1.</p>
<h2 id="ARP-1"><a href="#ARP-1" class="headerlink" title="ARP"></a>ARP</h2><p>R1 then checks its ARP table. The MAC address of the source IP is already there, because R1 stored a copy when it answered R2's ARP request.</p><p>If the MAC address of the destination IP 192.168.1.2 is not yet in the ARP table, R1 obtains it through an ARP request&#x2F;reply exactly as before, writes it into the ARP table, and then continues.</p>
<h2 id="链路层-1"><a href="#链路层-1" class="headerlink" title="Link layer"></a>Link layer</h2><p>The link layer wraps the ICMP packet from the network layer in an Ethernet frame whose source address is R1's MAC address (on Ethernet0&#x2F;0&#x2F;1) and whose destination address is the MAC address of 192.168.1.2 obtained through ARP.</p>
<h1 id="PC2-回复ping"><a href="#PC2-回复ping" class="headerlink" title="PC2 replies to the ping"></a>PC2 replies to the ping</h1><p><img data-src="https://st.blackyau.net/blog/27/5.png" alt="image.png"></p><p>PC2 receives the ICMPv4 request sent by 192.168.3.2; since the destination IP is its own, it replies to the ICMP request.</p>
<h2 id="网络层-2"><a href="#网络层-2" class="headerlink" title="Network layer"></a>Network layer</h2><p>It returns the received data to the sender, swapping the source and destination IP addresses; a type of 0 in the ICMP message marks it as an echo reply.</p><p>As before, PC2 first consults its own routing table before sending the reply. Because the default route matches, the packet is handed to the default gateway.</p>
<h2 id="ARP-2"><a href="#ARP-2" class="headerlink" title="ARP"></a>ARP</h2><p>Using the ARP table again, PC2 finds the MAC address of its default gateway 192.168.1.1.</p>
<h2 id="链路层-2"><a href="#链路层-2" class="headerlink" title="Link layer"></a>Link layer</h2><p>When building the Ethernet frame, the destination address is the MAC address of the default gateway, while the destination IP stays 192.168.3.2.</p>
<h1 id="R1-转发数据包-1"><a href="#R1-转发数据包-1" class="headerlink" title="R1 forwards the packet"></a>R1 forwards the packet</h1><ul><li>Look up the routing table: the destination IP 192.168.3.2 matches the directly connected route 192.168.3.0&#x2F;24.</li><li>Look up the ARP table to find the MAC address of the destination IP (learned earlier from R2's ARP request).</li><li>The link layer encapsulates the frame and sends it to R2.</li></ul>
<h1 id="R2-收到应答"><a href="#R2-收到应答" class="headerlink" title="R2 receives the reply"></a>R2 receives the reply</h1><p>On receiving the reply, R2 subtracts the time carried in the reply (copied from the request) from the current time, which gives an estimate of the RTT to the pinged host.</p><p>For ICMPv4 echo reply and request messages (types 0 and 8), a timestamp is optional: it is only carried in the data if the ping implementation puts it there. Timestamp reply and request messages (types 14 and 13) must carry timestamps, but they are rarely used today.</p>
<h1 id="后续"><a href="#后续" class="headerlink" title="What happens next"></a>What happens next</h1><p>If PC1 keeps sending ICMP packets, it increments the sequence number by 1 for each one. Since all the needed ARP entries are now cached, no further ARP request&#x2F;reply is needed: the link layer uses the MAC addresses in the ARP cache directly.</p>
<h1 id="参考"><a href="#参考" class="headerlink" title="References"></a>References</h1><ul><li><a href="https://support.huawei.com/enterprise/zh/doc/EDOC1100087024">Huawei Documentation Center: Routing Protocol Basics</a></li><li><a href="https://item.jd.com/11966296.html">TCP&#x2F;IP Illustrated, Volume 1: The Protocols (2nd Edition, Chinese translation)</a></li><li><a href="https://blog.csdn.net/q1007729991/article/details/72600130">CSDN @–Allen–: 96-ICMP Protocol (Timestamp Request and Reply)</a></li></ul>]]></content>
    
    
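    <summary type="html">&lt;p&gt;Starting from a single ping command, this post describes in detail the path data takes between routers, the ARP request and reply process, and how data is encapsulated and passed between the network layer and the link layer.&lt;/p&gt;</summary>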
    
    
    
    <category term="学习笔记" scheme="https://blackyau.cc/categories/%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B0/"/>
    
    
    <category term="计算机网络" scheme="https://blackyau.cc/tags/%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%BD%91%E7%BB%9C/"/>
    
    <category term="路由器" scheme="https://blackyau.cc/tags/%E8%B7%AF%E7%94%B1%E5%99%A8/"/>
    
    <category term="ARP" scheme="https://blackyau.cc/tags/ARP/"/>
    
    <category term="ICMP" scheme="https://blackyau.cc/tags/ICMP/"/>
    
  </entry>
  
  <entry>
    <title>Study Notes on the Machine Learning Watermelon Book</title>
    <link href="https://blackyau.cc/26"/>
    <id>https://blackyau.cc/26</id>
    <published>2020-10-26T08:43:42.000Z</published>
    <updated>2021-01-19T09:24:55.000Z</updated>
    
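    <content type="html"><![CDATA[<p>These notes record my own learning and also serve as review material for the final exam, although the main point of reviewing is the process rather than the notes themselves. I have always been interested in how machine learning "creates something out of nothing", and that interest is what led me to choose this specialization. So far the heavy mathematical derivations have been a real struggle, but they have at least satisfied my curiosity to get a glimpse of how it works.</p><span id="more"></span>
<h1 id="写在最前面"><a href="#写在最前面" class="headerlink" title="Before we start"></a>Before we start</h1><p>This series follows the <a href="https://github.com/sparanoid/chinese-copywriting-guidelines">Chinese Copywriting Guidelines</a>; may all readers, and I, end up marrying the people we love 😀.</p><p>"Research shows that people who don't like adding spaces between Chinese and English when typing have a hard time in love: seventy percent of them marry someone they don't love at the age of 34, and the remaining thirty percent end up leaving their estate to their cats. After all, both love and writing need some well-placed white space.</p><p>Let us encourage each other." (<a href="https://github.com/vinta/pangu.js">vinta&#x2F;paranoid-auto-spacing</a>)</p>
<h1 id="绪论"><a href="#绪论" class="headerlink" title="Introduction"></a>Introduction</h1><h2 id="基本术语"><a href="#基本术语" class="headerlink" title="Basic terminology"></a>Basic terminology</h2><p>Suppose we have collected some data about watermelons:</p><table><thead><tr><th align="center">Color</th><th align="center">Root</th><th align="center">Knock sound</th></tr></thead><tbody><tr><td align="center">Green</td><td align="center">Curly</td><td align="center">Muffled</td></tr><tr><td align="center">Dark</td><td align="center">Slightly curly</td><td align="center">Dull</td></tr><tr><td align="center">Light</td><td align="center">Straight</td><td align="center">Crisp</td></tr></tbody></table>
<p>Data set: the collection of the records above.</p><p><strong>Sample</strong>: each record (row) is a description of an event or object (here, a watermelon).</p><p><strong>Attribute&#x2F;feature</strong>: something that reflects the behavior or nature of an event or object in some respect, e.g. "color", "root", "knock sound".</p><p>Attribute value: the value taken on an attribute, e.g. "green", "dark".</p><p><strong>Attribute space</strong>&#x2F;sample space: the space spanned by the attributes. For example, taking "color", "root" and "knock sound" as three coordinate axes spans a three-dimensional space for describing watermelons; this space is the attribute space.</p><p><strong>Feature vector</strong>: if each attribute is a coordinate axis, the attributes form a multi-dimensional space in which every instance has its own coordinate position. Since each point corresponds to a coordinate vector, an instance is also called a feature vector.</p><p><strong>Dimensionality</strong>: since each attribute is one coordinate axis, and a space with n axes is called an n-dimensional space, the dimensionality is the number of attributes a sample has.</p><p><strong>Learning&#x2F;training</strong>: the process of learning a model from data, carried out by running some learning algorithm.</p><p>Training data: the data used during training.</p><p><strong>Training sample</strong>: each sample used in training.</p><p><strong>Training set</strong>: the set of training samples.</p><p>Hypothesis: the learned model corresponds to some underlying pattern in the data.</p><p>Label: information about the outcome of a sample, e.g. whether a melon is good or bad (in this example it is boolean).</p><p>Example: a sample that has a label.</p><p>Label space&#x2F;output space: the set of all labels.</p><p><strong>Classification</strong>: a task whose predictions are discrete values.</p><p><strong>Regression</strong>: a task whose predictions are continuous values, e.g. a melon's ripeness of 0.95 or 0.37.</p><p>Binary classification: only two classes are involved, usually one called the positive class and the other the negative class.</p><p>Multi-class classification: more than two classes are involved.</p><p><strong>Testing</strong>: using the learned model to make predictions.</p><p><strong>Testing sample</strong>: a sample being predicted.</p><p>Clustering: dividing the melons in the training set into several groups, e.g. light-colored and dark-colored melons.</p><p>Cluster: each group produced by clustering.</p><p><strong>Supervised learning</strong>: label information is available; e.g. classification and regression.</p><p><strong>Semi-supervised learning</strong>: a combination of supervised and unsupervised learning that uses a large amount of unlabeled data together with labeled data for pattern recognition.</p><p><strong>Unsupervised learning</strong>: no label information; e.g. clustering.</p><p><strong>Generalization ability</strong>: how well the learned model applies to new samples. A model with strong generalization ability works well over the whole sample space.</p><p>Independent and identically distributed (i.i.d.): we assume all samples in the sample space follow some unknown distribution, and each sample we obtain is drawn independently from that distribution. In general, the more training samples we have, the more information we get about the distribution, and the more likely we are to learn a model with strong generalization ability.</p>
<h3 id="术语例题"><a href="#术语例题" class="headerlink" title="Terminology exercise"></a>Terminology exercise</h3><table><thead><tr><th align="center">No.</th><th align="center">Name</th><th align="center">Annual income</th><th align="center">Gender</th><th align="center">Occupation</th><th align="center">Good customer</th></tr></thead><tbody><tr><td align="center">1</td><td align="center">Zhang San</td><td align="center">High</td><td align="center">Male</td><td align="center">Programmer</td><td align="center">Yes</td></tr><tr><td align="center">2</td><td align="center">Li Si</td><td align="center">High</td><td align="center">Male</td><td align="center">Entrepreneur</td><td align="center">Yes</td></tr><tr><td align="center">3</td><td align="center">Wang Wu</td><td align="center">Medium</td><td align="center">Male</td><td align="center">Civil servant</td><td align="center">No</td></tr><tr><td align="center">4</td><td align="center">Zhou Liu</td><td align="center">Low</td><td align="center">Female</td><td align="center">Student</td><td align="center">No</td></tr><tr><td align="center">5</td><td align="center">Qian Qi</td><td align="center">Medium</td><td align="center">Female</td><td align="center">Teacher</td><td align="center">No</td></tr></tbody></table><ul><li>What are the samples in the table above?</li><li>How many samples are there?</li><li>What are the attributes of the samples?</li><li>What is the label of the samples?</li><li>What are the attribute values and the label value of customer "Zhou Liu"?</li></ul><details class="note info no-icon"><summary><p>Answer</p></summary><ul><li>Each customer is a sample.</li><li>5 in total.</li><li>Annual income, gender and occupation.</li><li>Whether the customer is a good customer.</li><li>Attribute values: annual income &#x3D; Low, gender &#x3D; Female, occupation &#x3D; Student; label value: good customer &#x3D; No.</li></ul></details>
<h2 id="假设空间"><a href="#假设空间" class="headerlink" title="Hypothesis space"></a>Hypothesis space</h2><p>The two basic means of scientific reasoning are deduction and induction.</p><p>Deduction: specialization from the general to the specific, i.e. deriving concrete cases from basic principles. For example, in mathematics consistent theorems are derived from a set of axioms by logical reasoning.</p><p>Induction: generalization from the specific to the general, i.e. summarizing general rules from concrete facts. Machine learning relies on this kind of reasoning.</p><p>Inductive learning in the broad sense: learning from examples.</p><p>Inductive learning in the narrow sense: learning concepts from training data.</p><p>Inductive learning in the narrow sense is also called concept learning. The most basic form of concept learning is Boolean concept learning, i.e. learning target concepts of the "yes"&#x2F;"no" kind.</p>
<h3 id="假设的确定"><a href="#假设的确定" class="headerlink" title="Determining the hypothesis"></a>Determining the hypothesis</h3><p>We can view learning as a search through the space made up of all hypotheses. The goal of the search is to find hypotheses that fit the training set, i.e. hypotheses that judge the melons in the training set correctly.</p><p>Once the representation of hypotheses is fixed, the hypothesis space and its size are fixed as well.</p><blockquote><p>Simply put, the hypothesis space consists of all combinations of all possible values of all attributes (not constrained by the data set).</p></blockquote><table><thead><tr><th align="center">No.</th><th align="center">Color</th><th align="center">Root</th><th align="center">Knock sound</th><th align="center">Good melon</th></tr></thead><tbody><tr><td align="center">1</td><td align="center">Green</td><td align="center">Curly</td><td align="center">Muffled</td><td align="center">Yes</td></tr><tr><td align="center">2</td><td align="center">Dark</td><td align="center">Curly</td><td align="center">Muffled</td><td align="center">Yes</td></tr><tr><td align="center">3</td><td align="center">Green</td><td align="center">Straight</td><td align="center">Crisp</td><td align="center">No</td></tr><tr><td align="center">4</td><td align="center">Dark</td><td align="center">Slightly curly</td><td align="center">Dull</td><td align="center">No</td></tr></tbody></table><p>A hypothesis that matches this data set is</p><p>color &#x3D; *; root &#x3D; curly; knock sound &#x3D; muffled</p><p>That is, a good melon is one with a curly root and a muffled knock sound, of any color.</p><p>This hypothesis, which judges every example in the data set correctly, is already very close to the version space. The difference is this: <strong>the version space is obtained by listing all hypotheses and then checking them against the training examples one by one, deleting every hypothesis that is inconsistent with a positive example or consistent with a negative example</strong>. What remains is every hypothesis consistent with the training set. The main issue here is that some attributes cannot be decided, because the data set does not cover all cases, so more than one hypothesis can remain.</p><p>When solving exercises, try to generalize each attribute one level further: check whether the hypothesis still judges every example correctly when that attribute is replaced by * (any value).</p>
<h3 id="版本空间例题1"><a href="#版本空间例题1" class="headerlink" title="Version space exercise 1"></a>Version space exercise 1</h3><table><thead><tr><th align="center">No.</th><th align="center">Name</th><th align="center">Annual income</th><th align="center">Gender</th><th align="center">Occupation</th><th align="center">Good customer</th></tr></thead><tbody><tr><td align="center">1</td><td align="center">Zhang San</td><td align="center">High</td><td align="center">Male</td><td align="center">Programmer</td><td align="center">Yes</td></tr><tr><td align="center">2</td><td align="center">Li Si</td><td align="center">High</td><td align="center">Male</td><td align="center">Entrepreneur</td><td align="center">Yes</td></tr><tr><td align="center">3</td><td align="center">Wang Wu</td><td align="center">Medium</td><td align="center">Male</td><td align="center">Civil servant</td><td align="center">No</td></tr><tr><td align="center">4</td><td align="center">Zhou Liu</td><td align="center">Low</td><td align="center">Female</td><td align="center">Student</td><td align="center">No</td></tr><tr><td align="center">5</td><td align="center">Qian Qi</td><td align="center">Medium</td><td align="center">Female</td><td align="center">Teacher</td><td align="center">No</td></tr></tbody></table><p>Give the version space corresponding to the table above, using the symbol * to mean "any value".</p><details class="note info no-icon"><summary><p>Answer</p></summary><p><img data-src="https://st.blackyau.net/blog/26/1.png" alt="1"></p></details>
<h3 id="版本空间例题2"><a href="#版本空间例题2" class="headerlink" title="Version space exercise 2"></a>Version space exercise 2</h3><table><thead><tr><th align="center">No.</th><th align="center">Color</th><th align="center">Root</th><th align="center">Knock sound</th><th align="center">Good melon</th></tr></thead><tbody><tr><td align="center">1</td><td align="center">Green</td><td align="center">Curly</td><td align="center">Muffled</td><td align="center">Yes</td></tr><tr><td align="center">2</td><td align="center">Dark</td><td align="center">Curly</td><td align="center">Muffled</td><td align="center">Yes</td></tr><tr><td align="center">3</td><td align="center">Green</td><td align="center">Straight</td><td align="center">Crisp</td><td align="center">No</td></tr><tr><td align="center">4</td><td align="center">Dark</td><td align="center">Slightly curly</td><td align="center">Dull</td><td align="center">No</td></tr></tbody></table><p>What is the version space of this data set?</p><details class="note info no-icon"><summary><p>Answer</p></summary><p><img data-src="https://st.blackyau.net/blog/26/2.svg" alt="2"></p></details>
<h1 id="归纳偏好"><a href="#归纳偏好" class="headerlink" title="Inductive bias"></a>Inductive bias</h1><p>From Version space exercise 2 above, we can see that learning gives us 3 hypotheses consistent with the training set (the version space contains 3 different hypotheses). The problem is that, when facing a new sample, they may produce different outputs.</p><p>For example, take this new melon:</p><p>color &#x3D; green; root &#x3D; curly; knock sound &#x3D; dull</p><p>Judged with hypothesis B (expand the answer above), this new melon is a <strong>good melon</strong>.</p><p>But judged with hypothesis C, it is a <strong>bad melon</strong>.</p><p>If, whenever the learning algorithm met such a sample, it picked one of the hypotheses at random, the same melon would sometimes be predicted good and sometimes bad. Such a learning result is clearly meaningless.</p><p>So during learning, the algorithm must prefer certain kinds of hypotheses. This is called the <strong>inductive bias</strong>, or simply the bias.</p><p>For example, if it prefers models that are "as specific as possible", it chooses the hypothesis in the top right of the version space, because that model fixes the most attributes to specific values (root and knock sound).</p><p>Or, if it prefers models that are "as general as possible" and for some reason trusts the root more, it chooses the hypothesis in the top left of the version space, because that model uses * for more attributes while still specifying root &#x3D; curly.</p>
<h2 id="奥卡姆剃刀"><a href="#奥卡姆剃刀" class="headerlink" title="Occam's razor"></a>Occam's razor</h2><p>Occam's razor is a common principle for deciding which hypothesis to prefer: <em>if several hypotheses are consistent with the observations, choose the simplest one</em>.</p><blockquote><p>In short: the simplest is the best.</p></blockquote><p>Here each training sample is a point (x, y) in the figure, and learning a model consistent with the training set amounts to finding a curve that passes through all training samples.</p><p><img data-src="https://st.blackyau.net/blog/26/3.png" alt="Multiple curves are consistent with a finite training set"></p><p>Two such curves are shown: A is the smoother one, with equation $y&#x3D;-x^2+6x+1$, while curve B is much more complex.</p><p>If we take "smoother" to mean "simpler", we naturally prefer the smoother curve A in the figure above.</p><p>However, Occam's razor is not the only feasible principle, and in many problems it is not easy to tell <strong>which hypothesis is simpler</strong>. There is also another issue: what if the unseen samples are tricky and happen to lie exactly on curve B? This is exactly what the theorem below is about.</p>
<h2 id="没有免费的午餐"><a href="#没有免费的午餐" class="headerlink" title="No free lunch"></a>No free lunch</h2><p>First, don't be misled by the theorem's name; set the name aside and look at what the theorem says.</p><blockquote><p>Put simply, the theorem says that no machine learning algorithm suits every situation.</p></blockquote><p>The theorem also agrees with our intuition, as with the two curves in the Occam's razor example above. Is the learning algorithm represented by curve A better than the one represented by curve B? No: the no free lunch theorem proves that, summed over all possible problems, the two learning algorithms have exactly the same expected performance.</p><p>In plain words: over all machine learning problems, every algorithm has the same expected performance.</p><p>But don't lose interest in machine learning because of this theorem, since it rests on a premise: all problems are equally likely to occur, or all problems are equally important. In practice many of our problems are not like that, and you only need to care about the problem you actually have to solve.</p><p>For example, on campus an e-bike is a very convenient way to get around, but to travel to another city or abroad, flying is probably better. If all you want is a way to get around campus, you don't need to worry about buying a plane.</p><p>Back to the theorem: what it says is essentially <strong>every gain comes with a loss</strong>. A learning algorithm that works very well in one domain may not work at all in another.</p><p>In other words, if a pie falls from the sky and you get to eat it now, some bad luck will catch up with you later. That should make the meaning of "no free lunch" clear.</p>
<h1 id="经验误差与过拟合"><a href="#经验误差与过拟合" class="headerlink" title="Empirical error and overfitting"></a>Empirical error and overfitting</h1><p>If $a$ of the $m$ samples are misclassified, the error rate is $E&#x3D;\frac{a}{m}$.</p><p><strong>Accuracy</strong>: $1-\frac{a}{m}$, i.e. <em>accuracy &#x3D; 1 - error rate</em>.</p><p><strong>Error</strong>: the difference between the learner's actual predicted output and the true output of a sample (for a single sample).</p><p><strong>Training error&#x2F;empirical error</strong>: the learner's error on the training set (over the whole data set).</p><p><strong>Generalization error</strong>: the error on new samples.</p><p>We want a learner with small generalization error. However, we don't know in advance what new samples look like, so what we can actually do is try to minimize the empirical error.</p><p><strong>Overfitting</strong>: when the learner learns the training samples "too well", it has probably taken some peculiarities of the training samples as general properties of all potential samples, which degrades generalization performance. This is overfitting.</p><blockquote><p>Overfitting cannot be avoided completely; all we can do is "mitigate" it, i.e. reduce its risk.</p></blockquote><p><strong>Underfitting</strong>: the general properties of the training samples have not yet been learned well.</p><blockquote><p>Underfitting is relatively easy to overcome, e.g. by expanding branches in decision tree learning or adding training epochs in neural network learning.</p></blockquote><p><img data-src="https://st.blackyau.net/blog/26/4.png" alt="An intuitive comparison of overfitting and underfitting"></p>
<h2 id="留出法"><a href="#留出法" class="headerlink" title="Hold-out method"></a>Hold-out method</h2><p>This is one evaluation method: a <strong>test set</strong> is used to test the learner's ability to judge new samples, and the <strong>testing error</strong> on the test set serves as an approximation of the generalization error. The test set should be as mutually exclusive with the training set as possible, i.e. test samples should preferably not appear in the training set or be used during training.</p><p>Suppose the data set contains 1000 samples. Split it into a set S with 700 samples and a set T with 300 samples, train on S, and then test on T. Counting how many test samples are predicted correctly gives the accuracy and the error rate. This is the hold-out method.</p><blockquote><p>The training set is usually larger than the test set: roughly $\frac{2}{3} \sim \frac{4}{5}$ of the samples are used for training and the rest for testing.</p></blockquote><p>When splitting into training and test sets, keep the data distribution as consistent as possible.</p><p>For example, if the whole data set has 500 positive and 500 negative examples, S should contain 350 positives and 350 negatives, and T 150 positives and 150 negatives. This is called <strong>stratified sampling</strong>. If the class proportions in S and T differ a lot, the error estimate will be biased by the difference between the training&#x2F;test distributions.</p><p>Another issue: if the samples in the original data set are ordered in an extreme way, e.g. the first 500 are all positive and the last 500 all negative, splitting them directly in order gives an inaccurate result. More generally, the estimate from a single hold-out split is often not stable or reliable, so one usually performs several random splits, repeats the evaluation, and takes the average as the hold-out result.</p>
<h2 id="交叉检验法"><a href="#交叉检验法" class="headerlink" title="Cross-validation"></a>Cross-validation</h2><p>First split the data set into k mutually exclusive subsets of similar size (no sample appears in more than one subset), each keeping the data distribution as consistent as possible.</p><blockquote><p>Then, in each round, one subset is used as the <strong>test set</strong> and the union of the other k-1 subsets as the <strong>training set</strong>; rotating through the subsets gives k training&#x2F;test splits, so every subset serves as the test set exactly once.</p></blockquote><p>The final result is the mean of these k test results. Clearly, the stability and fidelity of the cross-validation estimate depend heavily on the value of k; to emphasize this, cross-validation is usually called k-fold cross-validation. The most common value of k is 10, which gives 10-fold cross-validation.</p><p><img data-src="https://st.blackyau.net/blog/26/5.png" alt="Illustration of 10-fold cross-validation"></p><p>Suppose data set D contains m samples. Setting k &#x3D; m gives a special case of cross-validation: leave-one-out (LOO). Each subset contains a single sample, so each sample in turn is the test set while the model is trained on the other m-1 samples.</p><p>Leave-one-out estimates are often considered fairly accurate, but the method has drawbacks: on large data sets, the cost of training m models can be unbearable. Moreover, the leave-one-out estimate is not always more accurate than other methods; the "no free lunch" theorem applies to evaluation methods too.</p>
<h2 id="自助法"><a href="#自助法" class="headerlink" title="Bootstrapping"></a>Bootstrapping</h2><p>Given a data set D with m samples, we sample from it to produce a data set D':</p><p>Each time, pick a sample from D at random, put a copy of it into D', and then put the sample back into D so that it can still be picked next time. After repeating this m times we obtain a data set D' with m samples; this is the result of bootstrap sampling.</p><p>The probability that a given sample is never picked in m draws is $(1-\frac{1}{m})^m$, and its limit is</p><p>$$<br>\begin{aligned}<br>P&amp;&#x3D;\lim\limits_{m\to\infty}\left(1-\frac{1}{m}\right)^m \\<br>&amp;&#x3D;\frac{1}{e} \\<br>&amp;\approx 0.368<br>\end{aligned}<br>$$</p><p>So about 36.8% of the samples in D never appear in D'; D' can be used as the training set and the samples not in D' as the test set.</p><p>Bootstrapping is useful when the data set is small and hard to split effectively into training&#x2F;test sets. It can also produce many different training sets from the original data set, which greatly benefits methods such as ensemble learning. However, data sets produced by bootstrapping change the distribution of the original data set, which introduces estimation bias. So when there is enough data, the hold-out method and cross-validation are more commonly used.</p>
<h1 id="性能度量"><a href="#性能度量" class="headerlink" title="Performance measures"></a>Performance measures</h1><p>A performance measure is the criterion used to evaluate a model's generalization ability.</p><p>Different performance measures can lead to different judgments about the same models. This means a model's "goodness" is relative: it depends not only on the algorithm but also on the task requirements.</p><p>For example, the most common performance measure for regression tasks is the mean squared error:</p><p>$$<br>E(f;D) &#x3D; \frac{1}{m}\sum_{i &#x3D; 1}^m (f(x_i) - y_i)^2<br>$$</p><p>$E(f;D)$: the error of model $f$ on data set $D$; here it is the mean of the squared errors, which is why the formula has the factor $\frac{1}{m}$ in front.</p><p>$f$: the model we trained.</p><p>$D$: the data set.</p><p>$m$: the total number of samples.</p><p>$i$: the index of the current sample.</p><p>$x_i$: the $i$-th sample; $f(x_i)$ is the model's prediction for it.</p><p>$y_i$: the true value (label) of the $i$-th sample.</p><p>$\displaystyle\sum_{i&#x3D;1}^m x_i$: adds up all the $x_i$ by index, much like a for loop:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">int</span>[] list = &#123;<span class="number">0</span>, <span class="number">1</span>, <span class="number">2</span>, <span class="number">3</span>, <span class="number">4</span>&#125;;</span><br><span class="line"><span class="type">int</span> <span class="variable">m</span> <span class="operator">=</span> list.length; <span class="comment">// number of terms</span></span><br><span class="line"><span class="type">int</span> <span class="variable">sum</span> <span class="operator">=</span> <span class="number">0</span>;</span><br><span class="line"><span class="comment">// Java arrays start at index 0, so i = 0..m-1 here covers i = 1..m in the formula</span></span><br><span class="line"><span class="keyword">for</span> (<span class="type">int</span> <span class="variable">i</span> <span class="operator">=</span> <span class="number">0</span>; i &lt; m; i++) &#123;</span><br><span class="line">    sum = sum + list[i];</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure>
<p>In short: take the difference between each prediction and its true value, square it, add up all the squares, and finally take the average.</p>
<h2 id="错误率与精度"><a href="#错误率与精度" class="headerlink" title="Error rate and accuracy"></a>Error rate and accuracy</h2><p>Error rate and accuracy are the two most commonly used performance measures in classification tasks (tasks whose predictions are discrete values).</p><p>They apply to both binary and multi-class classification.</p>
<h3 id="错误率"><a href="#错误率" class="headerlink" title="Error rate"></a>Error rate</h3><p>For a set of examples $D$, the classification error rate is defined as</p><p>$$<br>E(f;D) &#x3D; \frac{1}{m}\sum_{i &#x3D; 1}^m \mathbb{I}(f(x_i) \ne y_i)<br>$$</p><p>$E(f;D)$: the error rate of model $f$ on data set $D$.</p><p>$m$: the total number of samples.</p><p>$i$: the index of the current sample.</p><p>$x_i$: the $i$-th sample; $f(x_i)$ is the predicted label for it (good&#x2F;bad melon).</p><p>$y_i$: the true label of the $i$-th sample.</p><p>$\mathbb{I}(\cdot)$: the indicator function, a function just like $f(x)$; it outputs 1 when the condition in parentheses holds and 0 otherwise:</p><p>$$<br>\mathbb{I}(f(x_i) \ne y_i) &#x3D; \begin{cases}<br>1, &amp;\text{if }f(x_i) \ne y_i \\<br>0, &amp;\text{if }f(x_i) &#x3D; y_i<br>\end{cases}<br>$$</p><p>That is:</p><ul><li>If the predicted value and the actual value are <strong>equal</strong>, the function outputs 0.</li><li>If the predicted value and the actual value <strong>differ</strong>, the function outputs 1.</li></ul><p>Finally, all of these outputs are added up and averaged.</p>
<h3 id="精度"><a href="#精度" class="headerlink" title="Accuracy"></a>Accuracy</h3><p>Accuracy is the proportion of correct predictions; having computed the error rate above, just subtract it from 1.</p><p>$$ acc(f;D) &#x3D; 1-E(f;D) $$</p><p>$acc(f;D)$: acc is short for accuracy.</p><p>$E(f;D)$: the error rate above.</p><p>You can also compute accuracy directly by changing the $\mathbb{I}$ function above:</p><p>$$ acc(f;D) &#x3D; \frac{1}{m}\sum\limits_{i &#x3D; 1}^m \mathbb{I}(f(x_i) {\color{blue}&#x3D;} y_i) $$</p><p>Change the $\mathbb{I}$ function to:</p><ul><li>If the predicted value and the actual value are <strong>equal</strong>, the function outputs 1.</li><li>If the predicted value and the actual value <strong>differ</strong>, the function outputs 0.</li></ul><p>$$<br>\mathbb{I}(f(x_i) {\color{blue}&#x3D;} y_i) &#x3D; \begin{cases}<br>1, &amp;\text{if }f(x_i) {\color{blue}&#x3D;} y_i \\<br>0, &amp;\text{if }f(x_i) {\color{blue}\ne} y_i<br>\end{cases}<br>$$</p><blockquote><p>Note that $&#x3D;$ and $\ne$ are swapped compared with the error-rate formula above.</p></blockquote><p>With this change, $\mathbb{I}$ counts the correctly classified samples, and the outer $\frac{1}{m}$ turns that count into the proportion correct, i.e. the accuracy.</p>
<h2 id="查准率与查全率"><a href="#查准率与查全率" class="headerlink" title="Precision and recall"></a>Precision and recall</h2><p><strong>Precision</strong>: of the examples predicted as positive, how many are actually positive.</p><p><strong>Recall</strong>: of all the actually positive examples, how many were found (predicted as positive).</p><table>   <tr  rowspan="2">      <th rowspan="2" align="center">Actual class</th>      <th colspan="2" align="center">Predicted class</th>   </tr>   <tr>      <td  align="center">Positive</td>      <td  align="center">Negative</td>   </tr>   <tr>      <td  align="center">Positive</td>      <td align="center">TP (true positive)</td>      <td align="center">FN (false negative)</td>   </tr>   <tr>      <td  align="center">Negative</td>      <td align="center">FP (false positive)</td>      <td align="center">TN (true negative)</td>   </tr></table><blockquote><p>This is called the confusion matrix.</p></blockquote><p>Precision P and recall R are defined as</p><p>$$ P&#x3D;\frac{TP}{TP+FP} $$</p><p>$$ R&#x3D;\frac{TP}{TP+FN} $$</p><p>Precision and recall are conflicting measures: in general, when one is high the other tends to be low.</p><p>For example, to pick out as many good melons as possible, you can select more melons; if you select all of them, every good melon is certainly selected, but precision will be low. Conversely, selecting only the melons you are most confident about raises precision but lowers recall.</p>
<h2 id="F1-度量"><a href="#F1-度量" class="headerlink" title="F1 measure"></a>F1 measure</h2><p>The F1 measure combines precision and recall into one score that balances the two (it is their harmonic mean).</p><p>$$<br>F1 &#x3D; \frac{2 \times P \times R}{P+R} &#x3D; \frac{2 \times TP}{\text{total number of examples}+TP-TN}<br>$$</p><p>A parameter $\beta$ generalizes F1 so that it leans towards recall or towards precision:</p><p>$$<br>F_\beta &#x3D; \frac{(1+\beta^2) \times P \times R}{(\beta^2 \times P)+R}<br>$$</p><p>$$<br>F_\beta &#x3D; \begin{cases}<br>\text{weights recall more} &amp; \beta&gt;1\\<br>\text{standard } F1 &amp; \beta&#x3D;1\\<br>\text{weights precision more} &amp; \beta&lt;1<br>\end{cases}<br>$$</p>
<h2 id="宏-微F1"><a href="#宏-微F1" class="headerlink" title="Macro&#x2F;micro F1"></a>Macro&#x2F;micro F1</h2><p>Often we have several binary confusion matrices, e.g. from repeated training&#x2F;testing or one per pair of classes in a multi-class task (if you've forgotten what a confusion matrix is, see <a href="/26" title="Study Notes on the Machine Learning Watermelon Book">here</a>), and we want to assess precision and recall jointly over these n matrices.</p><ul><li>Macro: compute precision and recall for each matrix first, then average them.</li><li>Micro: average TP, FP, TN and FN over the matrices first, then compute precision and recall.</li></ul>
<h3 id="宏"><a href="#宏" class="headerlink" title="Macro"></a>Macro</h3><p>First recall the definitions of precision and recall:</p><p>$$ P&#x3D;\frac{TP}{TP+FP} $$</p><p>$$ R&#x3D;\frac{TP}{TP+FN} $$</p><p>Macro precision and macro recall are simply their averages.</p><p>Macro precision:</p><p>$$<br>\text{macro-}P &#x3D; \frac{1}{n}\sum_{i&#x3D;1}^nP_i<br>$$</p><p>Macro recall:</p><p>$$<br>\text{macro-}R &#x3D; 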
\frac{1}{n}\sum_{i&#x3D;1}^nR_i<br>$$</p><p>可以理解为，先把每一个混淆矩阵的 P 和 R 都求出来，然后再把他们求个平均值就行了。</p><p>后面的宏 F1 也就比较好理解，还是回想之前计算 F1 的公式。</p><p>$$<br>F1 &#x3D; \frac{2×P×R}{P+R}<br>$$</p><p>令上面式子的 $P&#x3D;macro-P$ 同时令 $R&#x3D;macro-R$ 那么就可以得到下面的式子了</p><p>$$<br>macro-F1 &#x3D; \frac{2×macro-P×macro-R}{macro-P+macro-R}<br>$$</p><h3 id="微"><a href="#微" class="headerlink" title="微"></a>微</h3><p>介绍完了宏，再来介绍一下微（micro）。还是放一下这个混淆矩阵.</p><table>   <tr  rowspan="2">      <th rowspan="2" align="center">真实情况</th>      <th colspan="2" align="center">预测结果</th>   </tr>   <tr>      <td  align="center">正例</td>      <td  align="center">反例</td>   </tr>   <tr>      <td  align="center">正例</td>      <td align="center">TP（真正例）true positive</td>      <td align="center">FN（假反例）false negative</td>   </tr>   <tr>      <td  align="center">反例</td>      <td align="center">FP（假正例）false positive</td>      <td align="center">TN（真反例）true negative</td>   </tr></table><p>当有多个混淆矩阵的时候，你也会有很多个 TP、FN、FP、TN。那么你可以先分别计算所有混淆矩阵的 TP、FN、FP、TN 的平均值。我们将求过平均值后的数上面画一个横线，那么就有。</p><p>$TP$ 的平均值为 $\overline{TP}$</p><p>$FN$ 的平均值为 $\overline{FN}$</p><p>$FP$ 的平均值为 $\overline{FP}$</p><p>$TN$ 的平均值为 $\overline{TN}$</p><p>有了这四个参数，我们再回头看看之前的查准率和查全率公式：</p><p>$$ P&#x3D;\frac{TP}{TP+FP} $$</p><p>$$ R&#x3D;\frac{TP}{TP+FN} $$</p><p>把里面的变量都用计算平均值过后的替换一下，就可以计算微（micro）查准率：</p><p>$$<br>micro-P &#x3D; \frac{\overline{TP}}{\overline{TP}+\overline{FP}}<br>$$</p><p>微查全率也很简单啦：</p><p>$$<br>micro-R &#x3D; \frac{\overline{TP}}{\overline{TP}+\overline{FN}}<br>$$</p><p>后面的微 F1 也是把每一位的都替换一下就行了：</p><p>$$<br>micro-F1 &#x3D; \frac{2×micro-P×micro-R}{micro-P×micro-R}<br>$$</p><h2 id="P-R-ROC-AUC"><a href="#P-R-ROC-AUC" class="headerlink" title="P-R ROC AUC"></a>P-R ROC AUC</h2><table>   <tr  rowspan="2">      <th rowspan="2" align="center">真实情况</th>      <th colspan="2" align="center">预测结果</th>   </tr>   <tr>      <td  align="center">正例</td>      <td  align="center">反例</td>   </tr>   <tr>      <td  align="center">正例</td>      <td align="center">TP（真正例）true 
positive</td>      <td align="center">FN: false negative</td>   </tr>   <tr>      <td  align="center">Negative</td>      <td align="center">FP: false positive</td>      <td align="center">TN: true negative</td>   </tr></table><h3 id="P-R"><a href="#P-R" class="headerlink" title="P-R"></a>P-R</h3><p>The x-axis of the P-R curve is recall:</p><p>$$ R&#x3D;\frac{TP}{TP+FN} $$</p><p>and the y-axis is precision:</p><p>$$ P&#x3D;\frac{TP}{TP+FP} $$</p><p><img data-src="https://st.blackyau.net/blog/26/6.png" alt="P-R曲线"></p><h3 id="ROC"><a href="#ROC" class="headerlink" title="ROC"></a>ROC</h3><p>The x-axis of the ROC curve is the false positive rate (FPR):</p><p>$$ FPR&#x3D;\frac{FP}{FP+TN} $$</p><p>Both quantities in the x-axis formula come from the second row of the confusion matrix (the actual negatives), and the FP in the numerator is what we do not want.</p><p>The y-axis is the true positive rate (TPR), which is exactly the same formula as recall $R$:</p><p>$$ TPR&#x3D;\frac{TP}{TP+FN} $$</p><p>Both quantities in the y-axis formula come from the first row of the confusion matrix (the actual positives), and the TP in the numerator is what we want.</p><blockquote><p>What we want most is a model with a small x value and a large y value.</p></blockquote><p>The curve in the left plot below has been smoothed for illustration.</p><p>The right plot shows what the curve looks like in practice: because the number of samples is finite, consecutive points are joined by straight segments, so the curve has many jagged steps.</p><p>You can ignore the dashed line in the middle of the left plot. Note that the AUC in the left plot refers to the shaded area and has nothing to do with the dashed line.</p><p><img data-src="https://st.blackyau.net/blog/26/7.png" alt="ROC"></p><h3 id="AUC"><a href="#AUC" class="headerlink" title="AUC"></a>AUC</h3><p>AUC (Area Under ROC Curve) is the area under the ROC curve; the larger this area, the better the model.</p><p>Strictly, the term AUC refers only to the ROC curve, but for the P-R curve a larger area under the curve is also generally better.</p><h1 id="术语中英对照"><a href="#术语中英对照" class="headerlink" title="术语中英对照"></a>Chinese-English Glossary</h1><table><thead><tr><th align="center">Chinese</th><th align="center">English</th><th align="center">Notes</th></tr></thead><tbody><tr><td align="center">数据集</td><td align="center">data set</td><td align="center"></td></tr><tr><td align="center">实例</td><td align="center">instance</td><td align="center">each record</td></tr><tr><td align="center">样本</td><td align="center">sample</td><td align="center">also called an instance</td></tr><tr><td align="center">属性</td><td align="center">attribute</td><td align="center">describes some aspect of a sample</td></tr><tr><td align="center">特征</td><td align="center">feature</td><td align="center">also called an attribute</td></tr><tr><td align="center">属性空间</td><td 
align="center">attribute space</td><td align="center"></td></tr><tr><td align="center">特征向量</td><td align="center">feature vector</td><td align="center">determines a point in the attribute space</td></tr><tr><td align="center">维数</td><td align="center">dimensionality</td><td align="center">number of attributes a sample has</td></tr><tr><td align="center">学习</td><td align="center">learning</td><td align="center"></td></tr><tr><td align="center">训练数据</td><td align="center">training data</td><td align="center"></td></tr><tr><td align="center">训练样本</td><td align="center">training sample</td><td align="center"></td></tr><tr><td align="center">训练集</td><td align="center">training set</td><td align="center"></td></tr><tr><td align="center">假设</td><td align="center">hypothesis</td><td align="center">some pattern assumed to underlie the data</td></tr><tr><td align="center">标记</td><td align="center">label</td><td align="center">the outcome of an example, e.g. true/false</td></tr><tr><td align="center">样例</td><td align="center">example</td><td align="center">a sample together with its label</td></tr><tr><td align="center">标记空间</td><td align="center">label space</td><td align="center">the set of all labels</td></tr><tr><td align="center">分类</td><td align="center">classification</td><td align="center">predicts discrete values</td></tr><tr><td align="center">回归</td><td align="center">regression</td><td align="center">predicts continuous values</td></tr><tr><td align="center">二分类</td><td align="center">binary classification</td><td align="center">only two classes to predict</td></tr><tr><td align="center">多分类</td><td align="center">multi-class classification</td><td align="center">more than two classes to predict</td></tr><tr><td align="center">测试</td><td align="center">testing</td><td align="center"></td></tr><tr><td align="center">测试样本</td><td align="center">testing sample</td><td align="center"></td></tr><tr><td align="center">聚类</td><td align="center">clustering</td><td align="center">divides the training set into groups</td></tr><tr><td align="center">簇</td><td align="center">cluster</td><td align="center">each group is a cluster</td></tr><tr><td align="center">监督学习</td><td align="center">supervised learning</td><td align="center">training data has labels</td></tr><tr><td 
align="center">无监督学习</td><td align="center">unsupervised learning</td><td align="center">training data has no labels</td></tr><tr><td align="center">泛化</td><td align="center">generalization</td><td align="center">how well a learned model applies to new, unseen samples</td></tr><tr><td align="center">归纳</td><td align="center">induction</td><td align="center">from specific facts to general rules</td></tr><tr><td align="center">演绎</td><td align="center">deduction</td><td align="center">from general principles to specific cases</td></tr><tr><td align="center">特化</td><td align="center">specialization</td><td align="center">the general-to-specific process of deduction</td></tr><tr><td align="center">归纳学习</td><td align="center">inductive learning</td><td align="center">learning general rules from specific examples</td></tr><tr><td align="center">概念</td><td align="center">concept</td><td align="center">a description of samples</td></tr></tbody></table><h1 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h1><p><a href="https://imlogm.github.io/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/occam-razor-NFL/">Occam's Razor and the No Free Lunch Theorem @ LogM’s Blog</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;记录自己的学习，以及为期末考试准备一份复习资料，不过复习的主要目的也不是为了那一份资料，而是复习的过程吧。我个人一直都对机器学习的「无中生有」比较感兴趣，正是这样的兴趣推动了我选择了这个专业方向。目前看来，大量的数学公式推导确实让我非常吃力，但是也算是满足了自己一窥端倪的好奇心。&lt;/p&gt;</summary>
    
    
    
    <category term="学习笔记" scheme="https://blackyau.cc/categories/%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B0/"/>
    
    
    <category term="笔记" scheme="https://blackyau.cc/tags/%E7%AC%94%E8%AE%B0/"/>
    
    <category term="学习" scheme="https://blackyau.cc/tags/%E5%AD%A6%E4%B9%A0/"/>
    
    <category term="学习笔记" scheme="https://blackyau.cc/tags/%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B0/"/>
    
    <category term="西瓜书" scheme="https://blackyau.cc/tags/%E8%A5%BF%E7%93%9C%E4%B9%A6/"/>
    
    <category term="机器学习" scheme="https://blackyau.cc/tags/%E6%9C%BA%E5%99%A8%E5%AD%A6%E4%B9%A0/"/>
    
    <category term="周志华" scheme="https://blackyau.cc/tags/%E5%91%A8%E5%BF%97%E5%8D%8E/"/>
    
  </entry>
  
  <entry>
    <title>Algorithm Complexity Analysis (Part 2)</title>
    <link href="https://blackyau.cc/25"/>
    <id>https://blackyau.cc/25</id>
    <published>2020-03-06T12:35:00.000Z</published>
    <updated>2020-03-20T05:28:00.000Z</updated>
    
    <content type="html"><![CDATA[<a href="/24" title="算法复杂度分析(上)">The previous article</a> gave a brief introduction to big O notation and a few analysis techniques. In practice, however, that knowledge alone is not enough to analyze every piece of code.<span id="more"></span><p>Today I will cover four more concepts in complexity analysis: <strong>best case time complexity</strong>, <strong>worst case time complexity</strong>, <strong>average case time complexity</strong> and <strong>amortized time complexity</strong>. Once you have mastered these, complexity analysis should no longer give you much trouble.</p><h2 id="最好、最坏情况时间复杂"><a href="#最好、最坏情况时间复杂" class="headerlink" title="最好、最坏情况时间复杂"></a>Best-Case and Worst-Case Time Complexity</h2><p>Let's start with some code:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// n is the length of array</span></span><br><span class="line"><span class="type">int</span> <span class="title function_">find</span><span class="params">(<span class="type">int</span>[] <span class="built_in">array</span>, <span class="type">int</span> n, <span class="type">int</span> x)</span> &#123;</span><br><span class="line">  <span class="type">int</span> i = <span class="number">0</span>;</span><br><span class="line">  <span class="type">int</span> pos = <span class="number">-1</span>;</span><br><span class="line">  <span class="keyword">for</span> (; i &lt; n; ++i) &#123;</span><br><span class="line">    <span class="keyword">if</span> (<span class="built_in">array</span>[i] == x) pos = i;</span><br><span class="line">  &#125;</span><br><span class="line">  <span class="keyword">return</span> pos;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>This code searches an unordered array (array) for the position where the variable x occurs, and returns -1 if x is not found.</p><p>Analyzed with the method from the previous article, the complexity of this code is $O(n)$, where n 
is the length of the array.</p><p>When searching an array for a value, we don't always need to traverse the whole array, because if the value is found partway through, the loop can end early. The code above is therefore not as efficient as it could be, and we can optimize it like this:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// n is the length of array</span></span><br><span class="line"><span class="type">int</span> <span class="title function_">find</span><span class="params">(<span class="type">int</span>[] <span class="built_in">array</span>, <span class="type">int</span> n, <span class="type">int</span> x)</span> &#123;</span><br><span class="line">  <span class="type">int</span> i = <span class="number">0</span>;</span><br><span class="line">  <span class="type">int</span> pos = <span class="number">-1</span>;</span><br><span class="line">  <span class="keyword">for</span> (; i &lt; n; ++i) &#123;</span><br><span class="line">    <span class="keyword">if</span> (<span class="built_in">array</span>[i] == x) &#123;</span><br><span class="line">       pos = i;</span><br><span class="line">       <span class="keyword">break</span>;</span><br><span class="line">    &#125;</span><br><span class="line">  &#125;</span><br><span class="line">  <span class="keyword">return</span> pos;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Now the question arises: after this optimization, is the time complexity still $O(n)$? Clearly the analysis method from the previous article cannot answer this on its own.</p><p>If the element we are looking for is in the first position, the time complexity is $O(1)$.</p><p>But if the element is not in the array, the loop runs over the entire array, and the time complexity is 
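$O(n)$.</p><p>To describe how the time complexity of the same code differs between situations, we need three concepts: best case time complexity, worst case time complexity and average case time complexity.</p><p>As the name suggests, <strong>best case time complexity is the time complexity of running the code in the most favorable situation</strong>. In the example above, the most favorable situation is when the variable we are looking for is exactly the first element of the array; the time complexity in that case is the best case time complexity.</p><p>Likewise, <strong>worst case time complexity is the time complexity of running the code in the least favorable situation</strong>. In the same example, if the array does not contain the variable, we must traverse the whole array, so the time complexity in this worst situation is the worst case time complexity.</p><h2 id="平均情况时间复杂度"><a href="#平均情况时间复杂度" class="headerlink" title="平均情况时间复杂度"></a>Average Case Time Complexity</h2><p>Best case and worst case time complexities both describe extreme situations, which in practice do not occur very often.</p><p>To better describe the complexity in a typical situation, we introduce another concept: average case time complexity, which I will simply call average time complexity from here on.</p><p>How do we analyze average time complexity? Let's use the example of searching for x again. There are n+1 possibilities for where x is: <strong>at one of the positions 0 to n-1 of the array</strong>, or <strong>not in the array</strong>. If we add up the number of elements that must be traversed in each case and divide by $n+1$, we get the average number of elements traversed. The first n cases contribute $1+2+\dots+n&#x3D;\frac{n(n+1)}{2}$ and the last case contributes another n, so:</p><p>$$ \frac{1+2+3+\dots+n+n}{n+1}&#x3D;\frac{n(n+3)}{2(n+1)} $$</p><p>In big O notation, coefficients, lower-order terms and constants can be dropped, so after simplifying this formula the average time complexity is:</p><p>$$ O(n) $$</p><p>Although this conclusion is correct, the calculation has a small flaw. What is it? The n+1 cases we just listed do not occur with equal probability. (This needs a little probability theory.)</p><p>The variable x we are searching for is either in the array or not in the array.</p><p>The actual probabilities of these two cases are hard to determine, so to keep things simple we assume each has probability $\frac{1}{2}$.</p><p>In addition, x is equally likely to be at any of the n positions 0 to n-1, each with probability $\frac{1}{n}$. So by the multiplication rule of probability, the probability that x is at any particular position from 0 to n-1 is:</p><p>$$ \frac{1}{2}\cdot\frac{1}{n}&#x3D;\frac{1}{2n} $$</p><p>Now that we know the probability of each position, and since the number of elements traversed ranges from 1 to n, we weight each case by its probability. The calculation of the average time complexity becomes:</p><p>$$ 1\times\frac{1}{2n}+ 2\times\frac{1}{2n}+ 3\times\frac{1}{2n}+ \dots+n\times\frac{1}{2n}+n\times\frac{1}{2}&#x3D;\frac{3n+1}{4} $$ </p><p>The last term corresponds to x not being in the array: the loop traverses all n elements, and this case alone has probability $\frac{1}{2}$, so the last term is $n\times\frac{1}{2}$. The first n terms add up to $\frac{n(n+1)}{2}\cdot\frac{1}{2n}&#x3D;\frac{n+1}{4}$, and $\frac{n+1}{4}+\frac{n}{2}&#x3D;\frac{3n+1}{4}$.</p><p>This value is what probability theory calls the <strong>weighted average</strong>, or <strong>expected value</strong>, so the full name of average time complexity is <strong>weighted average time complexity</strong> or <strong>expected time complexity</strong>.</p><p>With probabilities taken into account, the weighted average for this code is $(3n+1)&#x2F;4$. In big O notation, dropping coefficients and constants, the weighted average time complexity of this code is still $O(n)$.</p><p>You might think average time complexity analysis is complicated, since it even involves probability theory. In fact, in most cases we do not need to distinguish between the best, worst and average case time complexities.</p><p>As with the examples in the previous article, a single complexity is usually enough. Only when the time complexity of the same code differs by an order of magnitude in different situations do we use these three notations to tell them apart.</p><h2 id="均摊时间复杂度"><a href="#均摊时间复杂度" class="headerlink" 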
title="均摊时间复杂度"></a>Amortized Time Complexity</h2><p>By now you should have mastered most of algorithm complexity analysis. Next I will introduce a more advanced concept, amortized time complexity, together with its analysis method, amortized analysis.</p><p>Amortized time complexity sounds a bit like average time complexity, and the two concepts are indeed easy to confuse. As I said earlier, in most cases we do not need to distinguish between best, worst and average complexity. Average complexity is only needed in certain special cases, and the scenarios where amortized time complexity applies are even more special and limited.</p><p>As usual, let's use a concrete example. (This example is contrived purely for explanation; nobody would actually write code like this.)</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// array is an array of length n</span></span><br><span class="line"><span class="comment">// array.length in the code equals n</span></span><br><span class="line"><span class="type">int</span>[] <span class="built_in">array</span> = new <span class="type">int</span>[n];</span><br><span class="line"><span class="type">int</span> count = <span class="number">0</span>;</span><br><span class="line"></span><br><span class="line"><span class="type">void</span> <span class="title function_">insert</span><span class="params">(<span class="type">int</span> val)</span> &#123;</span><br><span class="line">   <span class="keyword">if</span> (count == <span class="built_in">array</span>.length) &#123;</span><br><span class="line">      <span class="type">int</span> sum = <span class="number">0</span>;</span><br><span class="line">      <span class="keyword">for</span> (<span class="type">int</span> i = <span class="number">0</span>; i &lt; <span class="built_in">array</span>.length; ++i) &#123;</span><br><span 
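class="line">         sum = sum + <span class="built_in">array</span>[i];</span><br><span class="line">      &#125;</span><br><span class="line">      <span class="built_in">array</span>[<span class="number">0</span>] = sum;</span><br><span class="line">      count = <span class="number">1</span>;</span><br><span class="line">   &#125;</span><br><span class="line"></span><br><span class="line">   <span class="built_in">array</span>[count] = val;</span><br><span class="line">   ++count;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>First, an explanation of this code. It implements inserting data into an array. When the array is full, i.e. when <code>count == array.length</code>, we use a <code>for</code> loop to sum all the elements, empty the array (by resetting <code>count</code>), put the resulting <code>sum</code> in the first position, and then insert the new value. If the array still has free space, the value is simply inserted directly.</p><p>In the best case, the array has free space and we only need to store the value at index <code>count</code>, so the best case time complexity is $O(1)$.</p><p>In the worst case, the array has no free space, so we must first traverse the array to sum it and then insert the value, so the worst case time complexity is $O(n)$.</p><p>The average time complexity can be computed with the method above. Assume the array has length n. Depending on the position where the value is inserted, there are n cases, each with time complexity $O(1)$. In addition, there is one "extra" case: inserting a value when the array has no free space, which takes $O(n)$. These n+1 cases occur with equal probability, $1&#x2F;(n+1)$ each. So by the weighted average method, the average time complexity is:</p><p>$$<br>\begin{align}<br>T(n) &amp;&#x3D; 1\times\frac{1}{n+1}+ 1\times\frac{1}{n+1}+ \dots+ 1\times\frac{1}{n+1}+ n\times\frac{1}{n+1}\\<br>&amp;&#x3D; n\times\frac{1}{n+1}+n\times\frac{1}{n+1}\\<br>&amp;&#x3D; \frac{2n}{n+1}\\<br>&amp;&#x3D; O(1)<br>\end{align}<br>$$</p><p>Here the first n terms are the n $O(1)$ cases, and since $\frac{2n}{n+1}&lt;2$ for every n, the result is bounded by a constant.</p><p>So far, the best, worst and average case calculations should all be easy to follow. But the average complexity analysis for this example does not actually need to be this complicated, or involve probability theory at all. Why? Compare this <code>insert()</code> example with the earlier <code>find()</code> example, and you will see that they differ considerably.</p><p>First, <code>find()</code> has complexity $O(1)$ only in the extreme case, whereas <code>insert()</code> has time complexity $O(1)$ in most cases and only occasionally the higher complexity $O(n)$. This is the <strong>first</strong> way <code>insert()</code> differs from <code>find()</code>.</p><p>Now the <strong>second</strong> difference. For <code>insert()</code>, the $O(1)$ insertions and the $O(n)$ insertions occur in a very regular pattern with a fixed order: typically one $O(n)$ insertion is followed by n-1 $O(1)$ 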
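insertions, and the pattern repeats.</p><p>So for complexity analysis in this special kind of scenario, we do not need to enumerate every input case and its probability and then compute a weighted average, as we did for average complexity.</p><p>For such scenarios we use a simpler method called <strong>amortized analysis</strong>, and the time complexity obtained by amortized analysis is called <strong>amortized time complexity</strong>.</p><p>So how do we use amortized analysis to find an algorithm's amortized time complexity?</p><p>Let's continue with the example of inserting into an array. Every $O(n)$ insertion is followed by n-1 $O(1)$ insertions, so we can spread the cost of the expensive operation over the next n-1 cheap ones. The expensive insertion does roughly n units of work; spread over a group of n insertions, that adds only about one unit to each, so the amortized time complexity of this sequence of operations is $O(1)$. That is the basic idea of amortized analysis.</p><p>Amortized time complexity and amortized analysis apply only in fairly special scenarios, so we will not use them often. To make them easier to understand and remember, here is a brief summary of when they apply; if you come across such a case, it is enough to recognize what is going on.</p><p>When a sequence of consecutive operations is performed on a data structure, most of the operations have low time complexity and only a few have high time complexity, and these operations follow a coherent order, we can analyze the group of operations together and check whether the cost of the expensive operation can be spread over the cheap ones.</p><p>Moreover, in situations where amortized analysis applies, the amortized time complexity is generally equal to the best case time complexity.</p><h2 id="内容小结"><a href="#内容小结" class="headerlink" title="内容小结"></a>Summary</h2><p>Today we learned several concepts related to complexity analysis: best case time complexity, worst case time complexity, average case time complexity and amortized time complexity.</p><p>We introduced these concepts because the same piece of code can have complexities of different orders of magnitude for different inputs.</p><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><p><a href="https://time.geekbang.org/column/intro/100017301">Geek Time @ Wang Zheng - The Beauty of Data Structures and Algorithms</a></p><p><a href="https://katex.org/docs/supported.html">KaTeX Docs</a></p><p><a href="https://zh.wikipedia.org/wiki/%E5%8A%A0%E6%AC%8A%E5%B9%B3%E5%9D%87%E6%95%B8">Wikipedia - Weighted arithmetic mean (Chinese)</a></p><p><a href="https://blog.csdn.net/jizhidexiaoming/article/details/86686933">CSDN @ Mr.Q - Common LaTeX notation for dot products, cross products, division, fractions, etc.</a></p><p><a href="https://www.zhihu.com/question/19994348/answer/170127505">Zhihu @ alven - How do you prove the multiplication rule of probability?</a></p>]]></content>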
    
    
    <summary type="html">&lt;a href=&quot;/24&quot; title=&quot;算法复杂度分析(上)&quot;&gt;上一篇&lt;/a&gt; 文章简单的分析了大 O 表示法，还有几个分析的技巧。当然在实际情况下，仅依靠之前的知识无法解决。</summary>
    
    
    
    <category term="算法" scheme="https://blackyau.cc/categories/%E7%AE%97%E6%B3%95/"/>
    
    
    <category term="大O" scheme="https://blackyau.cc/tags/%E5%A4%A7O/"/>
    
    <category term="算法" scheme="https://blackyau.cc/tags/%E7%AE%97%E6%B3%95/"/>
    
    <category term="时间复杂度" scheme="https://blackyau.cc/tags/%E6%97%B6%E9%97%B4%E5%A4%8D%E6%9D%82%E5%BA%A6/"/>
    
    <category term="空间复杂度" scheme="https://blackyau.cc/tags/%E7%A9%BA%E9%97%B4%E5%A4%8D%E6%9D%82%E5%BA%A6/"/>
    
  </entry>
  
  <entry>
    <title>Algorithm Complexity Analysis (Part 1)</title>
    <link href="https://blackyau.cc/24"/>
    <id>https://blackyau.cc/24</id>
    <published>2020-02-19T15:03:00.000Z</published>
    <updated>2020-03-20T05:23:00.000Z</updated>
    
    <content type="html"><![CDATA[<p>When solving practice problems I often see expressions like O(n<sup>2</sup>) used to describe an algorithm's complexity, yet I could only roughly judge which was larger. I have recently been studying data structures systematically, so I am writing down my understanding here; it would be even better if it also helps others.</p><span id="more"></span><h2 id="为什么需要复杂度分析"><a href="#为什么需要复杂度分析" class="headerlink" title="为什么需要复杂度分析"></a>Why Complexity Analysis Is Needed</h2><p>When writing code I always try consciously to make it run faster, but once I have finished a piece of code I cannot actually tell whether it is fast or slow. I have even spent far too much time chasing what I considered efficiency.</p><p>Analyzing space and time complexity lets me write code with more confidence.</p><h2 id="大-O-复杂度表示法"><a href="#大-O-复杂度表示法" class="headerlink" title="大 O 复杂度表示法"></a>Big O Complexity Notation</h2><p>Let's start with some code:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">static</span> <span class="type">int</span> <span class="title function_">cal</span><span class="params">(<span class="type">int</span> n)</span> &#123;</span><br><span class="line">   <span class="type">int</span> <span class="variable">sum</span> <span class="operator">=</span> <span class="number">0</span>;</span><br><span class="line">   <span class="type">int</span> <span class="variable">i</span> <span class="operator">=</span> <span class="number">1</span>;</span><br><span class="line">   <span class="keyword">for</span> (; i &lt;= n; ++i) &#123;</span><br><span class="line">     sum = sum + i;</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">return</span> sum;</span><br><span class="line"> &#125;</span><br></pre></td></tr></table></figure><p>This simple function computes the sum of a sequence, i.e. the sum of all the numbers <code>1, 2, 3, ... 
, n-1, n</code>.</p><p>The time a computer actually spends running code is harder to calculate than you might expect: every line is different, and there are many uncertain factors. When analyzing complexity, however, we roughly assume that each line of code takes the same unit of time to execute.</p><p>So in this code, lines 2 and 3 each run once, and lines 4 and 5 each run n times. I therefore estimate the running time of this code at roughly $2n+2$ units, which leads to a conclusion: <strong>the execution time T(n) of the code is proportional to the number of times its lines are executed.</strong></p><p>Let's look at another piece of code:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">static</span> <span class="type">int</span> <span class="title function_">cal</span><span class="params">(<span class="type">int</span> n)</span> &#123;</span><br><span class="line">   <span class="type">int</span> <span class="variable">sum</span> <span class="operator">=</span> <span class="number">0</span>;</span><br><span class="line">   <span class="type">int</span> <span class="variable">i</span> <span class="operator">=</span> <span class="number">1</span>;</span><br><span class="line">   <span class="type">int</span> <span class="variable">j</span> <span class="operator">=</span> <span class="number">1</span>;</span><br><span class="line">   <span class="keyword">for</span> (; i &lt;= n; ++i) &#123;</span><br><span class="line">     j = <span class="number">1</span>;</span><br><span class="line">     <span class="keyword">for</span> (; j &lt;= n; ++j) &#123;</span><br><span class="line">       sum = sum +  i * j;</span><br><span class="line">     &#125;</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">return</span> sum;</span><br><span class="line"> &#125;</span><br></pre></td></tr></table></figure><p>In this code, lines 2 to 4 run 3 
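times in total (line 1, the method signature, is not counted), lines 5 and 6 each run n times for $2n$ in total, and lines 7 and 8 each run $n^2$ times for $2n^2$ in total. If one execution of a line takes time RunTime, the total running time T(n) of this code is:</p><p>$$ T(n)&#x3D;(2n^2+2n+3)\times\text{RunTime} $$</p><p>From the analysis of these two pieces of code we get a very important rule: <strong>the execution time T(n) of the code is proportional to f(n), the total number of times its lines are executed</strong>.</p><p>This rule can be written as a formula:</p><p>$$ T(n)&#x3D;O(f(n)) $$</p><table><thead><tr><th>Symbol</th><th>Meaning</th></tr></thead><tbody><tr><td>T(n)</td><td>execution time of the code</td></tr><tr><td>n</td><td>size of the input data</td></tr><tr><td>f(n)</td><td>total number of times all lines of code are executed</td></tr><tr><td>O</td><td>the execution time T(n) is proportional to the expression f(n)</td></tr></tbody></table><p>Writing the first function as $T(n)&#x3D;O(2n+2)$ and the second as $T(n)&#x3D;O(2n^2+2n+3)$ is what we call <strong>big O time complexity notation</strong>.</p><blockquote><p>Big O time complexity does not represent the actual execution time of the code; it represents <strong>the trend of how execution time grows as the data size grows</strong>. It is therefore also called <strong>asymptotic time complexity</strong>, or <strong>time complexity</strong> for short.</p></blockquote><p>When n is large (think 10000 or 100000), the lower-order terms, constants and coefficients in the formula do not determine the growth trend, so they can all be ignored and we only need to keep the highest-order term. In big O notation, the time complexities of the two functions above are therefore written as</p><p>$$ T(n) &#x3D; O(n) $$</p><p>$$ T(n) &#x3D; O(n^2) $$</p><h2 id="时间复杂度分析"><a href="#时间复杂度分析" class="headerlink" title="时间复杂度分析"></a>Time Complexity Analysis</h2><p>Three practical techniques help when analyzing time complexity.</p><h3 id="只关注循环执行次数最多的一段代码"><a href="#只关注循环执行次数最多的一段代码" class="headerlink" title="只关注循环执行次数最多的一段代码"></a>Focus only on the code that is executed the most times</h3><p>Because big O only describes the growth trend, when analyzing the time complexity of an algorithm or a piece of code we only need to focus on the part that runs the most times.</p><p>Take the first function again as an example:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">static</span> <span class="type">int</span> <span class="title function_">cal</span><span class="params">(<span class="type">int</span> n)</span> &#123;</span><br><span class="line">   <span class="type">int</span> <span 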
class="variable">sum</span> <span class="operator">=</span> <span class="number">0</span>;</span><br><span class="line">   <span class="type">int</span> <span class="variable">i</span> <span class="operator">=</span> <span class="number">1</span>;</span><br><span class="line">   <span class="keyword">for</span> (; i &lt;= n; ++i) &#123;</span><br><span class="line">     sum = sum + i;</span><br><span class="line">   &#125;</span><br><span class="line">   <span class="keyword">return</span> sum;</span><br><span class="line"> &#125;</span><br></pre></td></tr></table></figure><p>In this code, lines 2 and 3 take constant time regardless of the data size n, so they can be ignored in time complexity analysis. What we care about are lines 4 and 5, which run the most times. As established above, they run n times, so the time complexity is $O(n)$.</p><h3 id="加法法则：总复杂度等于量级最大的那段代码的复杂度"><a href="#加法法则：总复杂度等于量级最大的那段代码的复杂度" class="headerlink" title="加法法则：总复杂度等于量级最大的那段代码的复杂度"></a>Addition rule: the total complexity equals the complexity of the part with the highest order</h3><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">static</span> <span class="type">int</span> <span class="title function_">cal</span><span class="params">(<span class="type">int</span> n)</span> 
&#123;</span><br><span class="line">   <span class="type">int</span> <span class="variable">sum_1</span> <span class="operator">=</span> <span class="number">0</span>;</span><br><span class="line">   <span class="type">int</span> <span class="variable">p</span> <span class="operator">=</span> <span class="number">1</span>;</span><br><span class="line">   <span class="keyword">for</span> (; p &lt; <span class="number">100</span>; ++p) &#123;</span><br><span class="line">     sum_1 = sum_1 + p;</span><br><span class="line">   &#125;</span><br><span class="line"></span><br><span class="line">   <span class="type">int</span> <span class="variable">sum_2</span> <span class="operator">=</span> <span class="number">0</span>;</span><br><span class="line">   <span class="type">int</span> <span class="variable">q</span> <span class="operator">=</span> <span class="number">1</span>;</span><br><span class="line">   <span class="keyword">for</span> (; q &lt; n; ++q) &#123;</span><br><span class="line">     sum_2 = sum_2 + q;</span><br><span class="line">   &#125;</span><br><span class="line"> </span><br><span class="line">   <span class="type">int</span> <span class="variable">sum_3</span> <span class="operator">=</span> <span class="number">0</span>;</span><br><span class="line">   <span class="type">int</span> <span class="variable">i</span> <span class="operator">=</span> <span class="number">1</span>;</span><br><span class="line">   <span class="type">int</span> <span class="variable">j</span> <span class="operator">=</span> <span class="number">1</span>;</span><br><span class="line">   <span class="keyword">for</span> (; i &lt;= n; ++i) &#123;</span><br><span class="line">     j = <span class="number">1</span>; </span><br><span class="line">     <span class="keyword">for</span> (; j &lt;= n; ++j) &#123;</span><br><span class="line">       sum_3 = sum_3 +  i * j;</span><br><span class="line">     &#125;</span><br><span class="line">   &#125;</span><br><span class="line"> 
 </span><br><span class="line">   <span class="keyword">return</span> sum_1 + sum_2 + sum_3;</span><br><span class="line"> &#125;</span><br></pre></td></tr></table></figure><p>The first loop runs a fixed number of times (p goes from 1 to 99), so its execution time is constant and independent of n.</p><p>The number of iterations of the second loop depends on the input n, so its time complexity is $O(n)$.</p><p>The third part nests another loop inside a loop like the second one, so its time complexity is $O(n^2)$.</p><p>When combining them, we keep only the largest of these complexities, so the time complexity of this code is $O(n^2)$.</p><p>In other words, the total time complexity equals the time complexity of the part with the highest order.</p><h3 id="乘法法则：嵌套代码的复杂度等于嵌套内外代码复杂度的乘积"><a href="#乘法法则：嵌套代码的复杂度等于嵌套内外代码复杂度的乘积" class="headerlink" title="乘法法则：嵌套代码的复杂度等于嵌套内外代码复杂度的乘积"></a>Multiplication rule: the complexity of nested code is the product of the complexities of the outer and inner code</h3> <figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">static</span> <span class="type">int</span> <span class="title function_">cal</span><span class="params">(<span class="type">int</span> n)</span> &#123;</span><br><span class="line">   <span class="type">int</span> <span class="variable">ret</span> <span class="operator">=</span> <span class="number">0</span>; </span><br><span class="line">   <span class="type">int</span> <span class="variable">i</span> <span class="operator">=</span> <span class="number">1</span>;</span><br><span class="line">   <span class="keyword">for</span> (; i &lt; n; ++i) &#123;</span><br><span class="line">     ret = ret + f(i);</span><br><span class="line">   &#125; </span><br><span class="line">   <span class="keyword">return</span> ret;</span><br><span class="line"> &#125; </span><br><span class="line"> 
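</span><br><span class="line"> <span class="keyword">public</span> <span class="keyword">static</span> <span class="type">int</span> <span class="title function_">f</span><span class="params">(<span class="type">int</span> n)</span> &#123;</span><br><span class="line">  <span class="type">int</span> <span class="variable">sum</span> <span class="operator">=</span> <span class="number">0</span>;</span><br><span class="line">  <span class="type">int</span> <span class="variable">i</span> <span class="operator">=</span> <span class="number">1</span>;</span><br><span class="line">  <span class="keyword">for</span> (; i &lt; n; ++i) &#123;</span><br><span class="line">    sum = sum + i;</span><br><span class="line">  &#125; </span><br><span class="line">  <span class="keyword">return</span> sum;</span><br><span class="line"> &#125;</span><br></pre></td></tr></table></figure><p>Look at the cal() function on its own. If f() were just a simple operation, the time complexity of lines 4 to 6 would be $T_1(n) &#x3D; O(n)$. But f() is not a simple operation: its own time complexity is $T_2(n) &#x3D; O(n)$. So the time complexity of the whole cal() function is $T(n) &#x3D; T_1(n) \times T_2(n) &#x3D; O(n \times n) &#x3D; O(n^2)$.</p><h2 id="几种常见时间复杂度实例分析"><a href="#几种常见时间复杂度实例分析" class="headerlink" title="几种常见时间复杂度实例分析"></a>Common Time Complexities with Examples</h2><p>Although code varies enormously, there are only a few common orders of complexity. They fall roughly into two categories.</p><p>Polynomial orders:</p><ul><li>constant $O(1)$</li><li>logarithmic $O(\log n)$</li><li>linear $O(n)$</li><li>linearithmic $O(n\log n)$</li><li>quadratic $O(n^2)$, cubic $O(n^3)$, ..., k-th power $O(n^k)$</li></ul><p>Non-polynomial orders:</p><ul><li>exponential $O(2^n)$</li><li>factorial $O(n!)$</li></ul><p>Problems whose known algorithms have non-polynomial time complexity are often loosely associated with NP (Non-Deterministic Polynomial) problems. Strictly speaking, though, NP is the class of decision problems whose solutions can be verified in polynomial time, which is not the same thing as "solvable only by a non-polynomial algorithm".</p><p>As the data size n grows, the execution time of a non-polynomial algorithm increases explosively and quickly becomes impractical. Algorithms with non-polynomial time complexity are therefore very inefficient, so I will not go into them further. We will focus on several common <strong>polynomial time complexities</strong>.</p><h3 id="O-1"><a href="#O-1" class="headerlink" title="O(1)"></a>O(1)</h3><p>Note that $O(1)$ is just a way of writing constant time complexity; it does not mean that only one line of code is executed. For example, the code below has 3 lines, but its time complexity is $O(1)$, not $O(3)$.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span 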
class="type">int</span> <span class="variable">i</span> <span class="operator">=</span> <span class="number">8</span>;</span><br><span class="line"><span class="type">int</span> <span class="variable">j</span> <span class="operator">=</span> <span class="number">6</span>;</span><br><span class="line"><span class="type">int</span> <span class="variable">sum</span> <span class="operator">=</span> i + j;</span><br></pre></td></tr></table></figure><p>In general, as long as the execution time of the code does not grow as n grows, we write its time complexity as $O(1)$. Put another way, <strong>as a rule of thumb, as long as an algorithm contains no loops or recursion, its time complexity is $O(1)$ even if it has thousands of lines of code</strong>.</p><h3 id="O-logn-、O-nlogn"><a href="#O-logn-、O-nlogn" class="headerlink" title="O(logn)、O(nlogn)"></a>O(logn), O(nlogn)</h3><p>Logarithmic time complexity is very common, and it is also the hardest kind to analyze. Look at this code:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">i=<span class="number">1</span>;</span><br><span class="line"><span class="keyword">while</span> (i &lt;= n)  &#123;</span><br><span class="line">  i = i * <span class="number">2</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>By the analysis method described earlier, line 3 is the line executed the most times. So if we can work out how many times this line runs, we know the time complexity of the whole snippet.</p><p>The variable i starts at 1 and is multiplied by 2 on every iteration; once it exceeds n, the loop ends. Remember geometric sequences from high school? The values taken by i form exactly a geometric sequence. Listed one by one, they are:</p><p>$$ 2^0, 2^1, 2^2, \dots, 2^{x-1}, 2^x &#x3D; n $$</p><p>Each number is the previous one multiplied by 2.</p><p>So once we know the value of x, we know how many times this line runs. Solving $2^x&#x3D;n$ for x gives $x&#x3D;\log_2 n$ (up to rounding), so the time complexity of this code is</p><p>$$ O(\log_2 n) $$</p><p>Here is another snippet:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">i=<span class="number">1</span>;</span><br><span class="line"><span class="keyword">while</span> (i &lt;= n)  
&#123;</span><br><span class="line">  i = i * <span class="number">3</span>;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Following the same reasoning, it is easy to see that this code has time complexity $O(\log_3 n)$.</p><p>In fact, whether the base is 2, 3, or 10, we can write every logarithmic time complexity as $O(\log n)$. Why?</p><p>Logarithms with different bases can be converted into one another, so $\log_3 n$ can be transformed as follows. Every step below uses the change-of-base formula:</p><p>$$ \log_b N &#x3D; \frac{\log_a N}{\log_a b} $$</p><p>First set a &#x3D; 2 to rewrite everything in base 2, then pull the factor containing n out of the fraction. Next, apply the change-of-base formula again to rewrite $\log_2 3$ in base 3. Since $\log_3 3 &#x3D; 1$ in the denominator, the change of base is complete.</p><p>$$<br>\begin{align}<br>\log_3 n &amp;&#x3D; \frac{\log_2 n}{\log_2 3}\\<br>&amp;&#x3D; \frac{1}{\log_2 3} \cdot \log_2 n\\<br>&amp;&#x3D; \frac{1}{\frac{\log_3 3}{\log_3 2}}\cdot \log_2 n\\<br>&amp;&#x3D; \frac{\log_3 2}{\log_3 3}\cdot \log_2 n\\<br>&amp;&#x3D; \frac{\log_3 2}{1}\cdot \log_2 n\\<br>&amp;&#x3D; \log_3 2\cdot \log_2 n<br>\end{align}<br>$$</p><p>Because $\log_3 n$ equals $\log_3 2 \cdot \log_2 n$, we have $O(\log_3 n) &#x3D; O(C \cdot \log_2 n)$, where $C &#x3D; \log_3 2$ is a constant. Recall the earlier rule: <strong>when using big-O notation, constant coefficients can be ignored, i.e. $O(Cf(n)) &#x3D; O(f(n))$</strong>. Therefore $O(\log_2 n)$ equals $O(\log_3 n)$, and for logarithmic time complexities we ignore the base and simply write $O(\log n)$.</p><p>If you understand $O(\log n)$, then $O(n\log n)$ is easy. Remember the product rule we just discussed? If a piece of code runs in $O(\log n)$ and we execute it n times in a loop, the time complexity becomes $O(n\log n)$. $O(n\log n)$ is also a very common complexity: merge sort runs in $O(n\log n)$, and so does quicksort on average (its worst case is $O(n^2)$).</p><h3 id="O-m-n-、O-m-n"><a href="#O-m-n-、O-m-n" class="headerlink" title="O(m+n)、O(m*n)"></a>O(m+n), O(m*n)</h3><p>Next comes a kind of time complexity unlike the ones above: the complexity of the code is <strong>determined by the sizes of two data sets</strong>.</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br></pre></td><td 
class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">static</span> <span class="type">int</span> <span class="title function_">cal</span><span class="params">(<span class="type">int</span> m, <span class="type">int</span> n)</span> &#123;</span><br><span class="line">  <span class="type">int</span> <span class="variable">sum_1</span> <span class="operator">=</span> <span class="number">0</span>;</span><br><span class="line">  <span class="type">int</span> <span class="variable">i</span> <span class="operator">=</span> <span class="number">1</span>;</span><br><span class="line">  <span class="keyword">for</span> (; i &lt; m; ++i) &#123;</span><br><span class="line">    sum_1 = sum_1 + i;</span><br><span class="line">  &#125;</span><br><span class="line"></span><br><span class="line">  <span class="type">int</span> <span class="variable">sum_2</span> <span class="operator">=</span> <span class="number">0</span>;</span><br><span class="line">  <span class="type">int</span> <span class="variable">j</span> <span class="operator">=</span> <span class="number">1</span>;</span><br><span class="line">  <span class="keyword">for</span> (; j &lt; n; ++j) &#123;</span><br><span class="line">    sum_2 = sum_2 + j;</span><br><span class="line">  &#125;</span><br><span class="line"></span><br><span class="line">  <span class="keyword">return</span> sum_1 + sum_2;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>Here m and n represent the sizes of two separate data sets. We cannot know in advance which of m and n is larger, so we cannot simply apply the sum rule and drop one of them. The time complexity of the code above is therefore $O(m+n)$.</p><p>In this situation the original sum rule no longer applies and must be changed to $T1(m) + T2(n) &#x3D; O(f(m) + g(n))$. The product rule, however, still holds: $T1(m) * T2(n) &#x3D; O(f(m) * g(n))$.</p><h2 id="空间复杂度分析"><a href="#空间复杂度分析" class="headerlink" title="空间复杂度分析"></a>Space Complexity Analysis</h2><p>The full name of time complexity is <strong>asymptotic time complexity, which describes how an algorithm's running time grows with the size of its input</strong>. By analogy, the full name of space complexity is <strong>asymptotic space complexity</strong>, <strong>which describes how an algorithm's storage space grows with the size of its input</strong>.</p><p>Let's start with a somewhat silly piece of code:</p><figure class="highlight java"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">static</span> <span class="keyword">void</span> <span class="title function_">print</span><span class="params">(<span class="type">int</span> n)</span> &#123;</span><br><span class="line">  <span class="type">int</span> <span class="variable">i</span> <span class="operator">=</span> <span class="number">0</span>;</span><br><span class="line">  <span class="type">int</span>[] a = <span class="keyword">new</span> <span class="title class_">int</span>[n];</span><br><span class="line">  <span class="keyword">for</span> (; i &lt; n; ++i) &#123;</span><br><span class="line">    a[i] = i * i;</span><br><span class="line">  &#125;</span><br><span class="line"></span><br><span class="line">  <span class="keyword">for</span> (i = n-<span class="number">1</span>; i &gt;= <span class="number">0</span>; --i) &#123;</span><br><span class="line">    System.out.println(a[i]);</span><br><span class="line">  &#125;</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>As with time complexity analysis, line 2 allocates space for the variable i, but that space is constant and independent of the data size n, so we can ignore it. Line 3 allocates an int array of size n, and the rest of the code uses no additional space, so the space complexity of the whole snippet is $O(n)$.</p><p>The most common space complexities are $O(1)$, $O(n)$, and $O(n^2)$; logarithmic ones such as $O(\log n)$ and $O(n\log n)$ come up far less often in everyday code. Space complexity analysis is also much simpler than time complexity analysis, so this is all we need to cover here.</p><h2 id="小结"><a href="#小结" class="headerlink" 
title="小结"></a>Summary</h2><p>Complexity, also called asymptotic complexity, covers both time complexity and space complexity. It describes how an algorithm's efficiency grows with the size of its input and gives a rough ranking: the higher the order of complexity, the less efficient the algorithm. There are not many common complexities; from lowest to highest order they are $O(1)$, $O(\log n)$, $O(n)$, $O(n\log n)$, and $O(n^2)$.</p><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><p><a href="https://time.geekbang.org/column/intro/100017301">极客时间@王争 - 数据结构与算法之美</a></p><p><a href="https://katex.org/docs/supported.html">KaTeX Docs</a></p><p><a href="https://www.jianshu.com/p/8344510def39">简书@胜负55开 - LateX：对数表示</a></p><p><a href="https://blog.csdn.net/PursueLuo/article/details/95647876">CSDN@PursueLuo - Latex 公式等号对齐</a></p><p><a href="https://baike.baidu.com/item/%E6%8D%A2%E5%BA%95%E5%85%AC%E5%BC%8F">百度百科 - 换底公式</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;刷题的时候经常都会看到 O(n&lt;sup&gt;2&lt;/sup&gt;) 之类的公式用来表示某个算法的复杂度，我也只能大概的判断大小。最近正在系统的学习数据结构，这里也记录以下自己的思路，同时能帮助到其他朋友理解到，就更好了。&lt;/p&gt;</summary>
    
    
    
    <category term="算法" scheme="https://blackyau.cc/categories/%E7%AE%97%E6%B3%95/"/>
    
    
    <category term="大O" scheme="https://blackyau.cc/tags/%E5%A4%A7O/"/>
    
    <category term="算法" scheme="https://blackyau.cc/tags/%E7%AE%97%E6%B3%95/"/>
    
    <category term="时间复杂度" scheme="https://blackyau.cc/tags/%E6%97%B6%E9%97%B4%E5%A4%8D%E6%9D%82%E5%BA%A6/"/>
    
    <category term="空间复杂度" scheme="https://blackyau.cc/tags/%E7%A9%BA%E9%97%B4%E5%A4%8D%E6%9D%82%E5%BA%A6/"/>
    
  </entry>
  
  <entry>
    <title>IPTV and Internet Convergence</title>
    <link href="https://blackyau.cc/23"/>
    <id>https://blackyau.cc/23</id>
    <published>2020-01-15T09:10:00.000Z</published>
    <updated>2021-10-04T08:09:01.000Z</updated>
    
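    <content type="html"><![CDATA[<p>My home's low-voltage wiring box has only one cable running to the living-room TV, so in the original setup IPTV and the Internet could not share that line. This tutorial not only lets a single line carry both IPTV and Internet streaming apps, but also lets any device play IPTV channels.</p><span id="more"></span><h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>Preface</h2><h3 id="依赖"><a href="#依赖" class="headerlink" title="依赖"></a>Requirements</h3><ul><li>A router that can run OpenWrt (with <code>port mirroring</code> and <code>udpxy</code> features&#x2F;plugins), used for packet capture and for the final setup. I recommend the Newifi D2: it draws little power and is cheap, going for under 100 yuan on Pinduoduo</li><li>An IPTV set-top box that plays channels normally; if you don't have a working one at home, there is nothing to converge</li></ul><h3 id="环境"><a href="#环境" class="headerlink" title="环境"></a>Environment</h3><blockquote><p>Network setups differ from region to region. I am on Sichuan Telecom, and setups may differ even within the same province, so this may not work as-is elsewhere.</p></blockquote><table><thead><tr><th>Device</th><th>Model</th><th>Software</th></tr></thead><tbody><tr><td>Optical modem (ONT)</td><td>TEWA-500E</td><td>-</td></tr><tr><td>Main router</td><td>Newifi-D2</td><td>Lean OpenWrt R9.6.1</td></tr><tr><td>AP</td><td>Tenda AC6</td><td>-</td></tr></tbody></table><p><img data-src="https://st.blackyau.net/blog/23/1.png" alt="Network topology diagram"></p><p>Without the Telecom IPTV set-top box, any device can play live channels directly via an HTTP link. The screenshots below show channels playing in PotPlayer (PC) and 超级直播 (Android).</p><p><img data-src="https://st.blackyau.net/blog/23/2.jpg" alt="PC PotPlayer"></p><p><img data-src="https://st.blackyau.net/blog/23/3.jpg" alt="Android 超级直播"></p><h3 id="iGMP-与-RTSP"><a href="#iGMP-与-RTSP" class="headerlink" title="iGMP 与 RTSP"></a>IGMP and RTSP</h3><p>IPTV commonly uses two protocols for playing live channels, IGMP and RTSP. They differ as follows.</p><table><thead><tr><th align="center">Protocol</th><th align="center">Content</th><th align="center">Address validity</th><th align="center">Authentication</th></tr></thead><tbody><tr><td align="center">IGMP</td><td align="center">Live</td><td align="center">Long-term</td><td align="center">Mandatory</td></tr><tr><td align="center">RTSP</td><td align="center">Live&#x2F;Replay&#x2F;On demand</td><td align="center">Short-term</td><td align="center">Optional</td></tr></tbody></table><h4 id="IGMP"><a href="#IGMP" class="headerlink" title="IGMP"></a>IGMP</h4><p>The <strong>Internet Group Management Protocol</strong> (IGMP) is a communications protocol used to manage the membership of IP multicast groups; I sometimes just call it <strong>multicast</strong>.</p><p>On China Telecom, multicast addresses rarely change, but the important limitation is that they carry only live TV, not replays. And because they are internal (private) addresses, you must obtain a Telecom internal IP to play them. I still prefer multicast addresses: TV replays aren't worth much (for on-demand viewing I use iQIYI and the like anyway), and above all the addresses rarely change, which saves a lot of trouble for family members who don't tinker.</p><p>In principle, multicast is somewhat like broadcast (sending a message to everyone on the network), but it targets a smaller group. The list of devices in each group is maintained jointly by the clients and the router, and the router forwards each group only the data that group has asked for.</p><p>Suppose that on this network three groups of people are each watching a different channel.</p><p><img data-src="https://st.blackyau.net/blog/23/4.png" alt="IGMP1"></p><p>A viewer watching CCTV-251 says: this is too fake, I want something uplifting and stirring. So they ask to switch to CCTV-1, which is streaming at <code>igmp://239.93.22.133:9260</code>.</p><p><img data-src="https://st.blackyau.net/blog/23/5.png" alt="IGMP2"></p><p>The router hears the request and adds that viewer to the CCTV-1 group.</p><p><img data-src="https://st.blackyau.net/blog/23/6.png" alt="IGMP3"></p><p>This example gives a rough idea of how IGMP works; in short, IGMP is <code>one-to-many</code>. The next protocol is the opposite.</p><h4 id="RTSP"><a href="#RTSP" class="headerlink" title="RTSP"></a>RTSP</h4><p>The <strong>Real Time Streaming Protocol</strong> (RTSP) is a network application protocol designed for entertainment and communication systems to control streaming media servers. It is used to establish and control media sessions between endpoints. Clients of a media server issue VCR-style commands such as play, record, and pause to control, in real time, media streams from the server to the client (video on demand) or from the client to the server (voice recording). I sometimes call it <strong>time-shift</strong>.</p><p>In principle RTSP is similar to the familiar HTTP protocol, i.e. <code>one-to-one</code>; the figure below may help.</p><p>While watching, you can rewind and pause freely and watch whatever you like without joining anyone else's group; the whole stream is yours alone. It is no different from watching iQIYI or Bilibili.</p><p><img data-src="https://st.blackyau.net/blog/23/7.png" alt="RTSP"></p><p>As described above, RTSP's main feature is time-shifting, i.e. you can drag the progress bar. Moreover, in most regions RTSP addresses are public IPs, and once you have an address you can often play it directly without any authorization.</p><p>That is why most IPTV live sources circulating online are RTSP addresses. Sichuan Telecom, however, authenticates RTSP: you must access it from a Telecom internal IP.</p><p>Also, since some IPTV plans do not include replay rights, Telecom presumably also verifies who is playing, which makes watching RTSP streams rather troublesome (this varies by region; it applies only to mine).</p><h2 id="目标"><a href="#目标" class="headerlink" title="目标"></a>Goals</h2><p>With the introduction to IGMP and RTSP above, you should now have a basic understanding of both. This tutorial covers the following:</p><ul><li>Hardware suited to this project</li><li>Software installation and environment setup</li><li>Capturing packets to obtain the IPTV IGMP and RTSP playback addresses</li><li>Using igmpproxy to forward all IGMP traffic to the IPTV interface</li><li>Using udpxy to convert IGMP addresses to HTTP</li><li>Playing on TV boxes, phones, and PCs</li><li>Watching your home IPTV streams while away</li></ul><h2 id="硬件"><a 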
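href="#硬件" class="headerlink" title="硬件"></a>Hardware</h2><p>This section introduces the hardware you need: devices you will inevitably use for packet capture and for the converged network, but which are rarely seen in everyday life.</p><h3 id="路由器"><a href="#路由器" class="headerlink" title="路由器"></a>Router</h3><p>The router is the key device for IPTV and Internet convergence. With a suitable router and a computer, you can complete the whole tutorial. For this tutorial, the ideal router can run <code>OpenWrt</code> or <code>Lede</code>, can install <code>igmpproxy</code> and <code>udpxy</code>, and ideally also supports <code>switch port mirroring</code>. Some tutorials achieve the same with <code>Padavan</code>, but I haven't tried it myself, so I won't comment on it.</p><p><img data-src="https://st.blackyau.net/blog/23/8.jpg" alt="igmpproxy and udpxy screenshot"></p><p><img data-src="https://st.blackyau.net/blog/23/9.jpg" alt="Switch port mirroring screenshot"></p><p>If you don't have a router with these features, I recommend the 新三, also sold as 新路由3, Newifi D2, Newifi 3, or Newifi 3 D2; they are all the same device. Thanks to the so-called mining crash, it can now be had for under 100 yuan on Taobao, Pinduoduo, Zhuanzhuan, and similar sites. When buying, I suggest paying a little extra to have the seller flash <code>Breed</code>; using the same hardware as me will also save you some detours during configuration.</p><p><img data-src="https://st.blackyau.net/blog/23/10.jpg" alt="新三 photo"></p><blockquote><p>Breed is the equivalent of Recovery on Android or PE on Windows: it makes flashing firmware much harder to botch.</p></blockquote><h3 id="抓包工具"><a href="#抓包工具" class="headerlink" title="抓包工具"></a>Packet Capture Tool</h3><p>Capturing packets to obtain the IPTV <a href="https://blackyau.cc/23#iGMP-%E4%B8%8E-RTSP">multicast addresses</a> is also an essential step. If your router lacks <code>switch port mirroring</code>, you will need to buy a separate network tap on Taobao. The picture below shows what you find by searching Amazon for <code>Throwing Star LAN Tap</code>.</p><p><img data-src="https://st.blackyau.net/blog/23/11.jpg" alt="Throwing Star LAN Tap"></p><p>If you are handy, you can follow this Right.com.cn forum post, <a href="https://www.right.com.cn/FORUM/thread-328186-1-1.html">小白的IPTV折腾教程（1）—0元DIY抓包神器</a>, and build an equivalent capture tool from two network cables and four RJ45 connectors.</p><p>Still, I recommend flashing a firmware with <code>switch port mirroring</code>, since it works right away. If you have a popular model, such as a Phicomm router, the Newifi D2 mentioned above, or another MT7621-based router, you should have no trouble finding such a firmware on Right.com.cn.</p><h2 id="软件"><a href="#软件" class="headerlink" title="软件"></a>Software</h2><p>The table below lists all the software needed for this tutorial, the versions I used, and download links.</p><table><thead><tr><th>Device</th><th>Software</th><th>Version</th><th>Download</th></tr></thead><tbody><tr><td>PC</td><td>Wireshark</td><td>Portable 3.2.0</td><td><a href="https://www.wireshark.org/index.html#download">Wireshark 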
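official site</a></td></tr><tr><td>PC</td><td>Notepad++</td><td>7.8.2</td><td><a href="https://notepad-plus-plus.org/downloads/">Notepad++ official site</a></td></tr><tr><td>PC</td><td>Xshell</td><td>6.0.0032</td><td><a href="https://www.netsarang.com/zh/free-for-home-school/">NetSarang Home&#x2F;School edition</a></td></tr><tr><td>Router</td><td>OpenWRT</td><td>R20.2.15</td><td><a href="https://st.blackyau.net/blog/23/newifi-d2.zip">newifi-d2.zip</a></td></tr><tr><td>Router</td><td>igmpproxy</td><td>0.2.1-4</td><td>Bundled with firmware</td></tr><tr><td>Router</td><td>udpxy</td><td>2016-09-18-53e4672a75..4-1</td><td>Bundled with firmware</td></tr><tr><td>Router</td><td>luci-app-udpxy</td><td>git-19.146.62144-fd6fdb2-1</td><td>Bundled with firmware</td></tr></tbody></table><p>What each piece of software is for:</p><table><thead><tr><th>Software</th><th>Purpose</th></tr></thead><tbody><tr><td>Wireshark</td><td>Capture packets to obtain the IPTV playback addresses</td></tr><tr><td>Notepad++</td><td>Clean up the captured data</td></tr><tr><td>Xshell</td><td>SSH into the router</td></tr><tr><td>OpenWRT</td><td>Router firmware</td></tr><tr><td>igmpproxy</td><td>Forward IGMP traffic to a specified interface</td></tr><tr><td>udpxy</td><td>Convert IGMP (multicast) traffic to HTTP</td></tr></tbody></table><p>There is also a firmware build for the Youhua WR1200JS: <a href="https://st.blackyau.net/blog/23/youhua_wr1200js.zip">youhua_wr1200js.zip</a></p><h2 id="抓包"><a href="#抓包" class="headerlink" title="抓包"></a>Packet Capture</h2><p>First, connect the Internet line from the optical modem to the router's WAN port as usual, connect the modem's ITV port to the router's LAN 4 port, connect the IPTV box to LAN 3, and finally connect LAN 1 to your computer.</p><p><img data-src="https://st.blackyau.net/blog/23/12.png" alt="Capture wiring"></p><p>Next, configure the router's traffic mirroring feature (the router IP in the firmware I provide is 192.168.1.1): set LAN 3, where the IPTV box is connected, as the mirror source port (if the capture shows only requests and no replies, mirror LAN 4 instead), and set LAN 1, where the computer is connected, as the mirror monitor port. Leave the other VLAN settings unchanged.</p><p><strong>Set the optical modem to bridge mode and let the router do the dialing. Otherwise the IPTV box will stop working once mirroring is enabled.</strong></p><p><img data-src="https://st.blackyau.net/blog/23/13.jpg" alt="Mirroring settings"></p><p>After saving and applying the settings, capture packets by following these steps exactly:</p><ul><li>Make sure every program on the computer is closed, including Wireshark</li><li>Power off the IPTV box</li><li>Start Wireshark and capture on the Ethernet interface</li><li>Power on the IPTV box</li><li>Using the IPTV remote, choose live TV and play a channel</li><li>Let the channel play normally for 2 to 3 seconds</li><li>Stop the Wireshark capture and save the capture data</li><li>Power off the IPTV box</li></ul><p>Close any other networked programs on the computer; otherwise their traffic is captured too and gets in the way of the analysis.</p><p>Closing and restarting Wireshark makes the IPTV box's initialization traffic easy to locate: it will be in the first few packets.</p><p>Power-cycling the IPTV box ensures that you capture its initialization traffic, such as how it obtains an IP address (DHCP, PPPoE, or something else).</p><p>Once the IPTV box starts, packets should keep appearing in the Wireshark window.</p><blockquote><p>If you see no packets at all, or very few, check whether a cable is in the wrong port or the traffic mirroring settings are wrong.</p></blockquote><p>Playing a channel makes the IPTV box fetch the live channel list, which is the information we need most.</p><h2 id="分析抓包数据"><a href="#分析抓包数据" class="headerlink" title="分析抓包数据"></a>Analyzing the Capture</h2><p><strong>The data format varies considerably between regions. Mine is from Sichuan Telecom; treat it as a reference for other regions.</strong></p><h3 id="获取地址"><a href="#获取地址" class="headerlink" title="获取地址"></a>Finding the Addresses</h3><p>Enter the following in the display filter bar:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">http.request.uri contains &quot;frameset_builder.jsp&quot;</span><br></pre></td></tr></table></figure><p>For Chengdu, Sichuan Telecom, you can also try:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">http.request.uri contains &quot;AuthenticationURL&quot;</span><br></pre></td></tr></table></figure><p>Right-click the first request and choose Follow &gt; HTTP Stream.</p><p><img data-src="https://st.blackyau.net/blog/23/14.jpg" alt="Wireshark 1"></p><p>In the window that pops up, search for:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">igmp://</span><br></pre></td></tr></table></figure><p>If you find a link like <code>igmp://239.93.22.133:9260</code> in the screenshot below, congratulations: you have located the most important part of the capture! (You may also notice time-shift addresses starting with <code>rtsp://</code> next to it.)</p><p><img data-src="https://st.blackyau.net/blog/23/15.jpg" alt="Wireshark 2"></p><p>Click the link and the main window jumps to that request.</p><p><img data-src="https://st.blackyau.net/blog/23/16.jpg" alt="Wireshark 3"></p><p>Expand the full contents of the request and check whether it contains important data such as <code>igmp://</code> links.</p><p><img data-src="https://st.blackyau.net/blog/23/17.jpg" alt="Wireshark 4"></p><p>Right-click Line-based text data, choose Export Packet Bytes, and save the file under any name somewhere you can find it.</p><p><img 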
data-src="https://st.blackyau.net/blog/23/18.jpg" alt="Wireshark 5"></p><p>Open the file in Notepad++ and check that it displays correctly (the first several lines are just line breaks and look blank, so scroll down a bit to see the content).</p><p><img data-src="https://st.blackyau.net/blog/23/19.jpg" alt="Wireshark 6"></p><h3 id="无法获取到地址"><a href="#无法获取到地址" class="headerlink" title="无法获取到地址"></a>If You Cannot Find the Addresses</h3><p>If no packets at all appear in the window, check the mirroring settings and whether the cables are in the right ports.</p><p>If you are in Sichuan, double-check that you copied the filter correctly and completely.</p><p>If you are outside Sichuan, enter the following in the filter:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">http</span><br></pre></td></tr></table></figure><p>Then go through the results one by one; somewhere there will be a response carrying the channel list with <code>igmp://</code> links (that is how I found mine). If you still cannot find it, you can ask me for help at <a href="mailto:&#98;&#x6c;&#97;&#x63;&#x6b;&#121;&#x61;&#117;&#x34;&#x32;&#x36;&#64;&#x67;&#109;&#x61;&#105;&#x6c;&#x2e;&#99;&#x6f;&#x6d;">blackyau426@gmail.com</a></p><h3 id="格式化数据"><a href="#格式化数据" class="headerlink" title="格式化数据"></a>Formatting the Data</h3><p>Before you start formatting, keep a copy of the original file.</p><p>After the replacement, you can delete the channels whose names contain PIP. These are for the set-top box's picture-in-picture feature, which simply means lower resolution; keep only the standard and HD channels.</p><h4 id="M3U8"><a href="#M3U8" class="headerlink" title="M3U8"></a>M3U8</h4><p>A file in this format can be opened and played directly on a PC with VLC media player or PotPlayer.</p><p>In Notepad++, open Replace, set Search Mode to Regular expression, and find:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">.*ChannelName=&quot;(.*)&quot;,UserChannelID=&quot;(.*)&quot;,ChannelURL=&quot;igmp://(.*)&quot;,TimeShift=.*</span><br></pre></td></tr></table></figure><p>Replace with (192.168.10.1 is my router's LAN IP; substitute your own):</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">#EXTINF:-1, \1\r\nhttp://192.168.10.1:8888/udp/\3</span><br></pre></td></tr></table></figure><p><img data-src="https://st.blackyau.net/blog/23/20.jpg" alt="M3U8 replacement in progress"></p><p>Copy the formatted lines into a new document and write the following on its first line:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span 
class="line">#EXTM3U</span><br></pre></td></tr></table></figure><p>The result should look like the screenshot below. Finally, save the file as .m3u8.</p><p><img data-src="https://st.blackyau.net/blog/23/21.jpg" alt="M3U8 replacement done"></p><h4 id="超级直播"><a href="#超级直播" class="headerlink" title="超级直播"></a>超级直播</h4><p>This file can be used in the Android app 超级直播: on your computer, open the URL the app shows when you press Back, and upload the file there as a custom source.</p><p>Find (in regular expression mode):</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">.*ChannelName=&quot;(.*)&quot;,UserChannelID=&quot;(.*)&quot;,ChannelURL=&quot;igmp://(.*)&quot;,TimeShift=.*</span><br></pre></td></tr></table></figure><p>Replace with:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">\1,http://192.168.10.1:8888/udp/\3</span><br></pre></td></tr></table></figure><p>Then find the following to remove blank lines:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">\n\s*\r</span><br></pre></td></tr></table></figure><p>and replace it with nothing.</p><p>Select all the text and copy it, then choose Encoding &gt; Character sets &gt; Chinese &gt; GB2312 from the top menu and confirm the switch. Next, delete all the text and paste it back in. Finally, save the file as .txt.</p><blockquote><p>超级直播 requires the text file to be GB2312-encoded; otherwise Chinese characters will be garbled.</p></blockquote><p><img data-src="https://st.blackyau.net/blog/23/22.jpg" alt="Changing the character set"></p><p>The result should look like this:</p><p><img data-src="https://st.blackyau.net/blog/23/23.jpg" alt="超级直播 result"></p><h2 id="获取-IPTV-内网地址"><a href="#获取-IPTV-内网地址" class="headerlink" title="获取 IPTV 内网地址"></a>Getting an IPTV Internal IP Address</h2><p>Sichuan Telecom assigns the address via DHCP. From what I've seen online, many other regions use PPPoE, in which case you will need to dig the username and password out of your IPTV box's settings.</p><p>This step also uses the capture data, most likely the first few packets. Find the <code>Dynamic Host Configuration Protocol (Request)</code> packet and expand <code>Option: (60) Vendor class identifier</code>, <code>Option: (12) Host Name</code>, and <code>Client MAC address</code>. For each one, right-click and choose Copy &gt; Value.</p><p><img data-src="https://st.blackyau.net/blog/23/24.jpg" alt="DHCP"></p><p>If you cannot find the DHCP packets, check the sticker on the bottom of the IPTV box. On my box, the last entry is the <code>Option: (12) Host 
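Name</code>, and the MAC address is printed there too. The <code>Option: (60) Vendor class identifier</code> value is shown in the screenshot above: <code>SCITV</code>.</p><p><img data-src="https://st.blackyau.net/blog/23/25.jpg" alt="IPTV box"></p><p>Now for the router configuration.</p><p>First go to the router's Network &gt; Switch page and turn off the port mirroring used for capturing. Then set LAN 4 (connected to the ITV port) to <code>off</code> in VLAN 1. Add VLAN 3, set CPU (eth0) to <code>tagged</code>, and set LAN 4 to <code>untagged</code> in VLAN 3. The result should look like this:</p><p><img data-src="https://st.blackyau.net/blog/23/26.jpg" alt="VLAN"></p><p>Go to Network &gt; Interfaces &gt; Add new interface. Name it <code>IPTV</code> (all uppercase), choose <code>DHCP client</code> as the protocol, and include the interface <code>VLAN:eth0.3</code>, as shown below.</p><p><img data-src="https://st.blackyau.net/blog/23/27.jpg" alt="Add new interface"></p><p>Then fill in the interface options: Hostname to send when requesting DHCP is the <code>Option: (12) Host Name</code> obtained earlier; under Advanced Settings, Vendor Class to send when requesting DHCP is the <code>Option: (60) Vendor class identifier</code>, i.e. <code>SCITV</code>; and Override MAC address is the <code>Client MAC address</code>, i.e. your IPTV box's MAC address.</p><p>Also leave Use builtin IPv6-management unchecked, and set Use gateway metric to 20.</p><p><img data-src="https://st.blackyau.net/blog/23/28.jpg" alt="IPTV interface settings"></p><p>After saving and applying, open your WAN interface settings and set its gateway metric to 10. The route with the lower metric is preferred, so this keeps the default route on WAN; <strong>otherwise you will lose normal Internet access</strong>.</p><p>At this point your IPTV interface should obtain an internal IP starting with <code>10</code>. If you are outside Sichuan, your carrier may use PPPoE or different authentication logic. If you are in Sichuan, check that the <code>Option: (12) Host Name</code>, the <code>Option: (60) Vendor class identifier</code>, and the MAC address you took from the capture or from the set-top box are entered correctly. Values from the capture should be correct; if you copied them from the set-top box, your area may use different authentication logic from mine.</p><p>So I strongly recommend analyzing, from a packet capture, the whole process by which the IPTV box obtains its internal IP, since that works whether your carrier uses PPPoE or DHCP.</p><h2 id="配置-igmpproxy-和-udpxy"><a href="#配置-igmpproxy-和-udpxy" class="headerlink" title="配置 igmpproxy 和 udpxy"></a>Configuring igmpproxy and udpxy</h2><h3 id="使用-SSH-连接到路由器"><a href="#使用-SSH-连接到路由器" class="headerlink" title="使用 SSH 连接到路由器"></a>Connecting to the Router via SSH</h3><p>Editing the configuration files requires an SSH connection to the router. Go to System &gt; Administration, set the port to 22 for interface lan, and enable both Password authentication and Allow root logins with password.</p><p><img data-src="https://st.blackyau.net/blog/23/29.jpg" alt="Router SSH settings"></p><p>Download Xshell from <a 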
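href="https://www.netsarang.com/zh/free-for-home-school/">https://www.netsarang.com/zh/free-for-home-school/</a>; the official site offers a free edition for home and school use, which is enough for this tutorial.</p><p>Create a new session with any name and enter the router's IP as the host. Under Connection &gt; Authentication on the left, set Method to Password, the user name to root, and the password to the one you use for the web interface (<code>password</code> by default in the firmware I provide).</p><p><img data-src="https://st.blackyau.net/blog/23/30.jpg" alt="Xshell"></p><h3 id="安装-igmpproxy-和-udpxy"><a href="#安装-igmpproxy-和-udpxy" class="headerlink" title="安装 igmpproxy 和 udpxy"></a>Installing igmpproxy and udpxy</h3><blockquote><p>If your router runs the firmware I provide, there is nothing to install: the packages are bundled with it.</p></blockquote><p>Before installing, I recommend backing up the current configuration under System &gt; Backup&#x2F;Flash Firmware in the web interface. With several firmwares I tried, the web interface broke with many errors after installing udpxy, and only a factory reset fixed it. I finally solved the problem by finding a firmware with udpxy built in.</p><p>After connecting to the router with Xshell, run:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">opkg update &amp;&amp; opkg install igmpproxy udpxy luci-app-udpxy</span><br></pre></td></tr></table></figure><p><code>opkg update</code> refreshes the package list. Connectivity from mainland China to the OpenWrt package mirrors is poor, so it may take a long time or several attempts.</p><p>Check the command output, or look under System &gt; Software for installed packages named <code>igmpproxy</code>, <code>udpxy</code>, and <code>luci-app-udpxy</code>, to confirm the installation succeeded.</p><h3 id="配置-igmpproxy"><a href="#配置-igmpproxy" class="headerlink" title="配置 igmpproxy"></a>Configuring igmpproxy</h3><p>igmpproxy forwards IGMP messages from lan to the IPTV interface, so that multicast UDP data does not flood the lan and hurt network performance. On my network, however, <code>igmp://</code> addresses could not be played from the lan, and I'm not sure why. Moreover, according to this <a href="https://github.com/coolsnowwolf/lede/issues/2841">lede issue</a>, igmpproxy in lede is broken: if anyone on the lan watches a multicast stream or uses an IPTV box, it causes a multicast storm that congests the local network. The real work is therefore done by udpxy below, and <strong>you can skip configuring igmpproxy entirely; playback via HTTP addresses still works</strong>.</p><p>Run the following command. Be sure to copy and paste the whole thing at once before pressing Enter.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 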
class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">echo &quot;config igmpproxy</span><br><span class="line">option quickleave 1</span><br><span class="line"></span><br><span class="line">config phyint</span><br><span class="line">option network IPTV</span><br><span class="line">option direction upstream</span><br><span class="line">list altnet 0.0.0.0/0</span><br><span class="line"></span><br><span class="line">config phyint</span><br><span class="line">option network lan</span><br><span class="line">option zone lan</span><br><span class="line">option direction downstream&quot; &gt; /etc/config/igmpproxy</span><br></pre></td></tr></table></figure><h3 id="配置-udpxy"><a href="#配置-udpxy" class="headerlink" title="配置 udpxy"></a>Configuring udpxy</h3><p>In the router's web interface, go to Services &gt; udpxy and check Enabled, Respawn, and Status. Set the port to <code>8888</code>, and set Source IP&#x2F;Interface to the ifname of the IPTV interface, i.e. the small text under the IPTV interface icon on the Network &gt; Interfaces page. On my router it is <code>eth0.3</code>.</p><p><img data-src="https://st.blackyau.net/blog/23/31.jpg" alt="udpxy"></p><p>If several devices need to play at the same time, adjust the <code>Max clients</code> option, which limits the number of simultaneous clients. It defaults to 3 and can be set as high as 5000.</p><blockquote><p>Thanks to <a href="https://www.right.com.cn/forum/thread-2921260-2-1.html">@xujuntc</a> for pointing this out.</p></blockquote><p>udpxy options:</p><table><thead><tr><th>Option (LuCI)</th><th>Config key</th><th>Description</th><th>Default</th></tr></thead><tbody><tr><td>Enabled</td><td>disabled (inverted)</td><td>Start udpxy</td><td>Off</td></tr><tr><td>Respawn</td><td>respawn</td><td>Restart udpxy automatically if it exits</td><td>Off</td></tr><tr><td>Verbose</td><td>verbose</td><td>Enable verbose log output</td><td>Off</td></tr><tr><td>Status</td><td>status</td><td>Enable the web status page statistics</td><td>Off</td></tr><tr><td>Bind IP&#x2F;Interface</td><td>bind</td><td>Address or interface to listen on</td><td>0.0.0.0</td></tr><tr><td>Port</td><td>port</td><td>Listening port</td><td>Required</td></tr><tr><td>Source IP&#x2F;Interface</td><td>source</td><td>IP or interface on which multicast data is received</td><td>0.0.0.0</td></tr><tr><td>Max clients</td><td>max_clients</td><td>Maximum number of simultaneous clients</td><td>3 (up to 5000)</td></tr><tr><td>Log file</td><td>log_file</td><td>File to write logs to</td><td>stderr (printed to the terminal)</td></tr><tr><td>Buffer size</td><td>buffer_size</td><td>Size of the buffer for inbound multicast data</td><td>2048 bytes (accepts values such as <code>65536</code>, <code>32Kb</code>, <code>1Mb</code>)</td></tr><tr><td>Buffer messages</td><td>buffer_messages</td><td>Maximum number of messages to hold in the buffer (-1 &#x3D; unlimited)</td><td>1</td></tr><tr><td>Buffer time</td><td>buffer_time</td><td>Maximum time data may stay in the buffer (seconds, -1 &#x3D; unlimited)</td><td>1</td></tr><tr><td>Nice increment</td><td>nice_increment</td><td>Increment to the process's nice value (scheduling priority)</td><td>0</td></tr><tr><td>Multicast subscription renew</td><td>mcsub_renew</td><td>Periodically rejoin the multicast group so that a network hiccup does not drop the subscription (seconds, 0 &#x3D; disabled)</td><td>0</td></tr></tbody></table><blockquote><p>For some reason, the web interface rejects values for the last 4 options as invalid when saving. If you need to change them, edit the <code>/etc/config/udpxy</code> configuration file by hand. My configuration is below for reference.</p></blockquote><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">config udpxy</span><br><span class="line">        option respawn &#x27;1&#x27;</span><br><span class="line">        option verbose &#x27;0&#x27;</span><br><span class="line">        option status &#x27;1&#x27;</span><br><span class="line">        option disabled &#x27;0&#x27;</span><br><span class="line">        option port &#x27;8888&#x27;</span><br><span 
class="line">        option source &#x27;eth0.3&#x27;</span><br><span class="line">        option buffer_size &#x27;2097152&#x27;</span><br><span class="line"></span><br></pre></td></tr></table></figure><p>After saving and applying, open http:&#x2F;&#x2F;&lt;router IP&gt;:8888&#x2F;status to check that udpxy is running. While a video is playing, this page also shows each playing client's IP and real-time traffic.</p><p><img data-src="https://st.blackyau.net/blog/23/32.jpg" alt="udpxy status"></p><p>You can now play the links prepared earlier in PotPlayer or VLC media player, either by opening the M3U8 playlist directly or by playing a single address.</p><p>For example, if the address you obtained is <code>igmp://239.93.22.6:6666</code>,</p><p>then the address converted by udpxy is <code>http://192.168.10.1:8888/udp/239.93.22.6:6666</code></p><p>If playback still fails, add the firewall rules below to <code>/etc/config/firewall</code>. They assume a firewall zone named <code>iptv</code> that covers the IPTV interface.</p><p>If you know vim, edit the file directly in Xshell. Otherwise, click New File Transfer (Ctrl+Alt+F) in the Xshell window, download the file, edit it locally with Notepad++, and upload it again. Some of these rules may already exist in your firewall configuration, so check each setting carefully.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line">config rule</span><br><span class="line">        option name &#x27;Allow-IGMP&#x27;</span><br><span class="line">        option target &#x27;ACCEPT&#x27;</span><br><span class="line">        option family &#x27;ipv4&#x27;</span><br><span class="line">        option src &#x27;iptv&#x27;</span><br><span class="line">        option proto 
&#x27;IGMP&#x27;</span><br><span class="line"></span><br><span class="line">config rule</span><br><span class="line">        option name &#x27;Allow-UDP-udpxy&#x27;</span><br><span class="line">        option target &#x27;ACCEPT&#x27;</span><br><span class="line">        option src &#x27;iptv&#x27;</span><br><span class="line">        option proto &#x27;udp&#x27;</span><br><span class="line">        option dest_ip &#x27;224.0.0.0/4&#x27;</span><br><span class="line"></span><br><span class="line">config rule</span><br><span class="line">        option name &#x27;Allow-UDP-igmpproxy&#x27;</span><br><span class="line">        option target &#x27;ACCEPT&#x27;</span><br><span class="line">        option family &#x27;ipv4&#x27;</span><br><span class="line">        option src &#x27;iptv&#x27;</span><br><span class="line">        option proto &#x27;udp&#x27;</span><br><span class="line">        option dest &#x27;lan&#x27;</span><br><span class="line">        option dest_ip &#x27;224.0.0.0/4&#x27;</span><br></pre></td></tr></table></figure><p>Finally, to make sure udpxy comes back after the router loses power, use the hotplug mechanism: every time the IPTV interface comes up, check whether udpxy is running and start it if it is not.</p><p>Connect to the router via SSH in Xshell and create a script file:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vim /etc/hotplug.d/iface/100-udpxy</span><br></pre></td></tr></table></figure><p>Paste in the following content in full:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_">#</span><span 
class="language-bash">!/bin/sh</span></span><br><span class="line"></span><br><span class="line">if [ &quot;$&#123;INTERFACE&#125;&quot; = &quot;IPTV&quot; ]; then # IPTV status change</span><br><span class="line">        if [ &quot;$&#123;ACTION&#125;&quot; = &quot;ifup&quot; ]; then # Interface up</span><br><span class="line">                flag=$(ps | grep udpxy | grep -v &quot;grep&quot; | wc -l) # number of running udpxy processes</span><br><span class="line">                if [ &quot;$flag&quot; -ge 1 ]; then # already running</span><br><span class="line">                        logger -t udpxy -s &quot;udpxy Running&quot;</span><br><span class="line">                else</span><br><span class="line">                        /etc/init.d/udpxy start</span><br><span class="line">                fi</span><br><span class="line">        fi</span><br><span class="line">fi</span><br><span class="line"></span><br></pre></td></tr></table></figure><p>Then save the file with <code>:wq</code>.</p><h2 id="如何在外播放家中-IPTV-源"><a href="#如何在外播放家中-IPTV-源" class="headerlink" title="如何在外播放家中 IPTV 源"></a>Watching your home IPTV streams from outside</h2><p>First, you need a public IP address. In the router's web interface, go to Network - Interfaces and check whether the IP address obtained by WAN matches the one shown at <a href="https://ip.sb/">https://ip.sb/</a>. If they differ, tell a human agent at China Telecom customer service "I need a public IP"; that should be enough.</p><p>Next, set <code>Bind IP/Interface</code> to <code>0.0.0.0</code> in the udpxy configuration.</p><p>Then, in the router's web interface under Network - Firewall - Port Forwards, add a rule with protocol tcp, external zone wan, external port 8888, internal IP address 192.168.10.1 and internal port 8888.</p><p><img data-src="https://st.blackyau.net/blog/23/33.jpg" alt="端口转发"></p><p>To play streams when you are away from home, simply replace the router's IP address with your public IP address.</p><p>For example, if your local playback address is <code>http://192.168.10.1:8888/udp/239.93.22.6:6666</code></p><p>and your public IP address is <code>125.60.90.40</code>,</p><p>then your Internet playback address is <code>http://125.60.90.40:8888/udp/239.93.22.6:6666</code>.</p><p>Because a home public IP address changes from time to time, you can use DDNS (dynamic DNS) to reach it through a domain name instead, using one of the providers built into the router. If, like me, you manage your domain with DNSPod, you can also use <a href="https://github.com/blackyau/DdnsWithDnspod">DdnsWithDnspod</a>, a tool I wrote, to dedicate a subdomain to IPTV playback.</p><h2 id="结语"><a href="#结语" class="headerlink" 
title="结语"></a>Conclusion</h2><p>First of all, many thanks to everyone who wrote about this before me; this article is a summary I put together from reading their tutorials. In nearly 5,000 characters it describes in detail how to merge IPTV with a regular Internet connection, and I hope it helps anyone who needs it. My knowledge is limited, so there are bound to be mistakes; if you spot any, please point them out and I will be very grateful.</p><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><p><a href="https://github.com/coolsnowwolf/lede">GitHub@coolsnowwolf - Lean’s OpenWrt source</a></p><p><a href="https://www.right.com.cn/forum/thread-248400-1-1.html">Right.com.cn forum@鲲翔 - General steps for merging IPTV into a regular network</a></p><p><a href="https://www.right.com.cn/forum/thread-341748-1-1.html">Right.com.cn forum@footlog - K2P&#x2F;K2 padavan dual-line access, broadband + IPTV, detailed udpxy + xupnpd setup</a></p><p><a href="https://www.right.com.cn/FORUM/thread-332086-1-1.html">Right.com.cn forum@lcsuper - A beginner's IPTV tinkering tutorial (3): merging the two networks and sharing IPTV</a></p><p><a href="https://www.right.com.cn/forum/thread-508076-1-1.html">Right.com.cn forum@kangtao022 - Latest Nanchong (Sichuan) Telecom IPTV multicast addresses, and how to compile the address list</a></p><p><a href="https://www.right.com.cn/forum/thread-596583-1-1.html">Right.com.cn forum@橙子_MAX - [Firmware included] First on the web: video tutorial on merging IPTV into the LAN on a Newifi 3 OpenWRT router</a></p><p><a href="https://www.maxlicheng.com/openwrt/113.html">橙子's blog - Merging IPTV into the LAN so that any device can watch IPTV</a></p><p><a href="https://www.right.com.cn/FORUM/thread-421870-1-1.html">Right.com.cn forum@angelkyo - Sichuan Telecom: a DHCP capture gets an IP but no option60 information</a></p><p><a href="https://www.right.com.cn/FORUM/thread-329888-1-1.html">Right.com.cn forum@wengmingao - Simple zero-cost IPTV packet capture</a></p><p><a href="https://www.right.com.cn/forum/thread-2304980-1-1.html">Right.com.cn forum@莫问归期 - Installing udpxy on OpenWrt breaks the theme layout</a></p><p><a href="https://www.right.com.cn/forum/thread-344825-1-1.html">Right.com.cn forum@xtwz - Adding plugin apps under LuCI -&gt; Applications when compiling OpenWrt (Lean's source)</a></p><p><a href="https://www.right.com.cn/forum/thread-1237348-1-1.html">Right.com.cn forum@happyzhang - Getting started compiling OpenWrt: make menuconfig reference notes and an auto-generation script</a></p><p><a href="https://github.com/MitchellJo/DdnsWithDnspod">GitHub@MitchellJo - DdnsWithDnspod</a></p><p><a href="https://www.right.com.cn/forum/thread-52052-1-1.html">Right.com.cn forum@hayate - Running a script automatically after PPP dial-up</a></p><p><a href="http://demon.tw/hardware/openwrt-hotplug.html">Demon’s Blog@Demon - Hotplug scripts in OpenWrt</a></p><p><a 
href="https://kevin0304.pixnet.net/blog/post/227990189">PIXNET@Kai-Cho - [OpenWRT] hotplug</a></p><p><a href="https://www.right.com.cn/forum/thread-107357-1-1.html">Right.com.cn forum@ghostry - What to do when hotplug iface scripts don't run?</a></p><p><a href="https://openwrt.org/docs/guide-user/base-system/hotplug">OpenWrt Documentation - Hotplug</a></p><p><a href="https://openwrt.org/docs/guide-user/services/proxy/udpxy">OpenWrt Documentation - udpxy</a></p><p><a href="https://3mile.github.io/archives/2019/1106111732/">3mile's blog - Compiling a newer version of UDPXY</a></p><p><a href="https://github.com/coolsnowwolf/lede/issues/1075">GitHub@xingsiyue - udpxy cannot save parameters such as the buffer size</a></p><p><a href="https://github.com/openwrt/luci/issues/1494">GitHub@jow- - luci-app-udpxy bug</a></p>]]></content>
    
    
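    <summary type="html">&lt;p&gt;There is only one cable running from the low-voltage distribution box to the TV in the living room, so IPTV and the Internet originally could not share that line. This tutorial not only lets a single line carry both IPTV and Internet video apps, but also lets any device play IPTV channels.&lt;/p&gt;</summary>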
    
    
    
    <category term="教程" scheme="https://blackyau.cc/categories/%E6%95%99%E7%A8%8B/"/>
    
    
    <category term="IPTV" scheme="https://blackyau.cc/tags/IPTV/"/>
    
    <category term="OpenWrt" scheme="https://blackyau.cc/tags/OpenWrt/"/>
    
    <category term="Lean" scheme="https://blackyau.cc/tags/Lean/"/>
    
    <category term="udpxy" scheme="https://blackyau.cc/tags/udpxy/"/>
    
    <category term="IGMP" scheme="https://blackyau.cc/tags/IGMP/"/>
    
    <category term="RTSP" scheme="https://blackyau.cc/tags/RTSP/"/>
    
  </entry>
  
  <entry>
    <title>Extending a Linux LVM Logical Volume</title>
    <link href="https://blackyau.cc/22"/>
    <id>https://blackyau.cc/22</id>
    <published>2019-08-09T03:00:00.000Z</published>
    <updated>2019-08-09T06:13:08.000Z</updated>
    
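    <content type="html"><![CDATA[<p>Following on from the previous post <a href="/10" title="真机安装 CentOS 7">Installing CentOS 7 on a Physical Machine</a>: after that installation I added another hard drive. Because LVM (Logical Volume Manager) was used during installation, the existing partition can be extended without shutting the machine down.</p><span id="more"></span><h2 id="LVM-介绍"><a href="#LVM-介绍" class="headerlink" title="LVM 介绍"></a>About LVM</h2><p><img data-src="https://st.blackyau.net/blog/22/Lvm.svg" alt="Lvm"></p><blockquote><p>Image from <a href="https://commons.wikimedia.org/wiki/File:Lvm.svg#/media/File:Lvm.svg">Wikimedia Logical Volume Manager (Linux)</a></p></blockquote><ul><li>PV (physical volume): the lowest layer of LVM. A PV can be a whole disk, a disk partition, or a device that behaves like one (such as a RAID array); compared with the raw storage medium, it additionally carries LVM management metadata.</li><li>VG (volume group): built on top of PVs and made up of one or more of them. One or more "LVM partitions" (logical volumes) can be created in a VG, which plays the role that a physical disk plays in a non-LVM system.</li><li>LV (logical volume): a chunk of space carved out of a VG whose size can be grown or shrunk after creation. Filesystems (e.g. &#x2F;var, &#x2F;home) are created on LVs.</li><li>PE (physical extent): every PV is divided into basic units called PEs. A uniquely numbered PE is the smallest unit of storage LVM can address; the default size is 4 MiB.</li></ul><p>This expansion therefore consists of turning the new device into a <code>physical volume</code>, adding it to the <code>volume group</code>, and finally allocating the volume group's free space to the <code>logical volume</code>.</p><h2 id="安装硬盘"><a href="#安装硬盘" class="headerlink" title="安装硬盘"></a>Installing the disk</h2><p>Nothing much to say here: just connect the disk to the motherboard.</p><h2 id="系统环境"><a href="#系统环境" class="headerlink" title="系统环境"></a>System environment</h2><table><thead><tr><th>Device</th><th>Size</th><th>Status</th><th>Volume group</th></tr></thead><tbody><tr><td>&#x2F;dev&#x2F;sdc</td><td>500.1 GB</td><td>Newly added disk</td><td>None</td></tr><tr><td>&#x2F;dev&#x2F;sdb</td><td>3000.6 GB</td><td>Existing data disk</td><td>all</td></tr><tr><td>&#x2F;dev&#x2F;sda</td><td>500.1 GB</td><td>Existing data disk</td><td>all</td></tr><tr><td>&#x2F;dev&#x2F;sdd</td><td>320.1 GB</td><td>System disk + data disk</td><td>centos, all</td></tr></tbody></table><p>Here I will add <code>/dev/sdc</code> to the <code>all</code> volume group to increase its usable capacity.</p><p>Check disk space usage (<code>-H</code> reports sizes in powers of 1000):</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">df -H</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 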
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">Filesystem               Size  Used Avail Use% Mounted on</span><br><span class="line">/dev/mapper/centos-root   54G  2.1G   52G   4% /</span><br><span class="line">devtmpfs                 889M     0  889M   0% /dev</span><br><span class="line">tmpfs                    902M     0  902M   0% /dev/shm</span><br><span class="line">tmpfs                    902M   18M  884M   2% /run</span><br><span class="line">tmpfs                    902M     0  902M   0% /sys/fs/cgroup</span><br><span class="line">/dev/mapper/all-home     3.8T  3.6T  252G  94% /home</span><br><span class="line">/dev/sdd1                1.1G  143M  811M  15% /boot</span><br><span class="line">tmpfs                    181M     0  181M   0% /run/user/1000</span><br></pre></td></tr></table></figure><p>Check the physical volumes:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo pvs</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">PV         VG     Fmt  Attr PSize    PFree</span><br><span class="line">/dev/sda1  all    lvm2 a--  &lt;465.76g    0 </span><br><span class="line">/dev/sdb1  all    lvm2 a--    &lt;2.73t    0 </span><br><span class="line">/dev/sdd2  all    lvm2 a--  &lt;245.08g    0 </span><br><span class="line">/dev/sdd3  centos lvm2 a--    52.00g 4.00m</span><br></pre></td></tr></table></figure><p>Check the disk partition tables:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo fdisk -l</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span 
class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br></pre></td><td class="code"><pre><span class="line">Disk /dev/sdc: 500.1 GB, 500107862016 bytes, 976773168 sectors</span><br><span class="line">Units = sectors of 1 * 512 = 512 bytes</span><br><span class="line">Sector size (logical/physical): 512 bytes / 4096 bytes</span><br><span class="line">I/O size (minimum/optimal): 4096 bytes / 4096 bytes</span><br><span class="line">Disk label type: dos</span><br><span class="line">Disk identifier: 0x95fe95fe</span><br><span class="line"></span><br><span class="line">   Device Boot      Start         End      Blocks   Id  System</span><br><span class="line">/dev/sdc1   *        4096   976773119   488384512    7  HPFS/NTFS/exFAT</span><br><span class="line"></span><br><span class="line">Disk /dev/sdd: 320.1 GB, 320072933376 bytes, 625142448 sectors</span><br><span class="line">Units = sectors of 1 * 512 = 512 bytes</span><br><span class="line">Sector size (logical/physical): 512 bytes / 512 bytes</span><br><span class="line">I/O size (minimum/optimal): 512 bytes / 512 bytes</span><br><span class="line">Disk label type: dos</span><br><span class="line">Disk identifier: 0x00084660</span><br><span class="line"></span><br><span class="line">   Device Boot      Start         End      Blocks   Id  System</span><br><span class="line">/dev/sdd1   *        2048     2099199     1048576   83  Linux</span><br><span class="line">/dev/sdd2         2099200   516073471   256987136   8e  Linux LVM</span><br><span class="line">/dev/sdd3       516073472   625141759    54534144   8e  Linux LVM</span><br><span class="line">WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. 
Use at your own discretion.</span><br><span class="line"></span><br><span class="line">Disk /dev/sdb: 3000.6 GB, 3000592982016 bytes, 5860533168 sectors</span><br><span class="line">Units = sectors of 1 * 512 = 512 bytes</span><br><span class="line">Sector size (logical/physical): 512 bytes / 4096 bytes</span><br><span class="line">I/O size (minimum/optimal): 4096 bytes / 4096 bytes</span><br><span class="line">Disk label type: gpt</span><br><span class="line">Disk identifier: BDF3367A-EECA-4A93-BE0B-10D867E87AF1</span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">        Start          End    Size  Type            Name</span></span><br><span class="line"> 1         2048   5860532223    2.7T  Linux LVM       </span><br><span class="line"></span><br><span class="line">Disk /dev/sda: 500.1 GB, 500107862016 bytes, 976773168 sectors</span><br><span class="line">Units = sectors of 1 * 512 = 512 bytes</span><br><span class="line">Sector size (logical/physical): 512 bytes / 4096 bytes</span><br><span class="line">I/O size (minimum/optimal): 4096 bytes / 4096 bytes</span><br><span class="line">Disk label type: dos</span><br><span class="line">Disk identifier: 0x000c3c53</span><br><span class="line"></span><br><span class="line">   Device Boot      Start         End      Blocks   Id  System</span><br><span class="line">/dev/sda1            2048   976773119   488385536   8e  Linux LVM</span><br><span class="line"></span><br><span class="line">Disk /dev/mapper/centos-root: 53.7 GB, 53687091200 bytes, 104857600 sectors</span><br><span class="line">Units = sectors of 1 * 512 = 512 bytes</span><br><span class="line">Sector size (logical/physical): 512 bytes / 512 bytes</span><br><span class="line">I/O size (minimum/optimal): 512 bytes / 512 bytes</span><br><span class="line"></span><br><span 
class="line"></span><br><span class="line">Disk /dev/mapper/centos-swap: 2147 MB, 2147483648 bytes, 4194304 sectors</span><br><span class="line">Units = sectors of 1 * 512 = 512 bytes</span><br><span class="line">Sector size (logical/physical): 512 bytes / 512 bytes</span><br><span class="line">I/O size (minimum/optimal): 512 bytes / 512 bytes</span><br><span class="line"></span><br><span class="line"></span><br><span class="line">Disk /dev/mapper/all-home: 3763.8 GB, 3763842580480 bytes, 7351255040 sectors</span><br><span class="line">Units = sectors of 1 * 512 = 512 bytes</span><br><span class="line">Sector size (logical/physical): 512 bytes / 4096 bytes</span><br><span class="line">I/O size (minimum/optimal): 4096 bytes / 4096 bytes</span><br></pre></td></tr></table></figure><h2 id="开始扩展"><a href="#开始扩展" class="headerlink" title="开始扩展"></a>Extending the volume</h2><p>First, turn <code>/dev/sdc1</code> into a physical volume (PV). <code>pvcreate</code> offers to wipe any existing filesystem signature on the partition; answering <code>y</code> destroys the data on it.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo pvcreate /dev/sdc1</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">WARNING: ext4 signature detected on /dev/sdc1 at offset 1080. Wipe it? [y/n]: y</span><br><span class="line">  Wiping ext4 signature on /dev/sdc1.</span><br><span class="line">  Physical volume &quot;/dev/sdc1&quot; successfully created.</span><br></pre></td></tr></table></figure><p>Check the physical volumes (PV): <code>/dev/sdc1</code> is now a PV but does not belong to any volume group yet.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo pvs</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">PV         VG     Fmt  Attr PSize    PFree   </span><br><span class="line">/dev/sda1  all    lvm2 a--  &lt;465.76g       0 </span><br><span class="line">/dev/sdb1  all    lvm2 a--    &lt;2.73t       0 </span><br><span class="line">/dev/sdc1         lvm2 ---  &lt;465.76g &lt;465.76g</span><br><span class="line">/dev/sdd2  all    lvm2 a--  &lt;245.08g       0 </span><br><span class="line">/dev/sdd3  centos lvm2 a--    52.00g    4.00m</span><br></pre></td></tr></table></figure><p>Add the physical volume to the volume group (VG) <code>all</code>:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo vgextend all /dev/sdc1</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">Volume group &quot;all&quot; successfully extended</span><br></pre></td></tr></table></figure><p>Check the physical volumes (PV) again:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo pvs</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">PV         VG     Fmt  Attr PSize    PFree   </span><br><span class="line">/dev/sda1  all    lvm2 a--  &lt;465.76g       0 </span><br><span class="line">/dev/sdb1  all    lvm2 a--    &lt;2.73t       0 </span><br><span class="line">/dev/sdc1  all    lvm2 a--  &lt;465.76g &lt;465.76g</span><br><span class="line">/dev/sdd2  all    lvm2 a--  &lt;245.08g       0 </span><br><span class="line">/dev/sdd3  centos lvm2 a--    52.00g    4.00m</span><br></pre></td></tr></table></figure><p>Check the logical volumes (LV):</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo lvs</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">LV   VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert</span><br><span class="line">home all    -wi-ao----  3.42t                                                    </span><br><span class="line">root centos -wi-ao---- 50.00g                                                    </span><br><span class="line">swap centos -wi-ao----  2.00g  </span><br></pre></td></tr></table></figure><p>The logical volume has not grown yet, because the free space in the volume group still has to be allocated to the logical volume mounted on <code>/home</code>:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo lvextend -L +465.75g /dev/mapper/all-home</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">Size of logical volume all/home changed from 3.42 TiB (897370 extents) to &lt;3.88 TiB (1016602 extents).</span><br><span class="line">Logical volume all/home successfully resized.</span><br></pre></td></tr></table></figure><p>Check the logical volumes (LV) again: the LV has grown.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo lvs</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">LV   VG     Attr       LSize  Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert</span><br><span class="line">home all    -wi-ao---- &lt;3.88t                                                    </span><br><span class="line">root centos -wi-ao---- 50.00g                                                    </span><br><span class="line">swap centos -wi-ao----  2.00g   </span><br></pre></td></tr></table></figure><p>Check the physical volumes (PV): the free space on <code>/dev/sdc1</code> has dropped to 8.00 MiB, i.e. two 4 MiB extents left unallocated by the fixed <code>-L +465.75g</code> size (<code>lvextend -l +100%FREE</code> would have used them as well).</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sudo pvs</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">PV         VG     Fmt  Attr PSize    PFree</span><br><span class="line">/dev/sda1  all    lvm2 a--  &lt;465.76g    0 </span><br><span class="line">/dev/sdb1  all    lvm2 a--    &lt;2.73t    0 </span><br><span class="line">/dev/sdc1  all    lvm2 a--  &lt;465.76g 8.00m</span><br><span class="line">/dev/sdd2  all    lvm2 a--  &lt;245.08g    0 </span><br><span class="line">/dev/sdd3  centos lvm2 a--    52.00g 4.00m</span><br></pre></td></tr></table></figure><p>Finally, grow the filesystem so that the new space becomes usable. <code>xfs_growfs</code> is for XFS filesystems (an ext4 filesystem would use <code>resize2fs</code> instead); <code>mount</code> lists the mounted filesystems with their types, so you can confirm that <code>/home</code> is XFS first:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">mount</span><br><span class="line">sudo xfs_growfs /dev/mapper/all-home</span><br></pre></td></tr></table></figure><p>Check disk usage again:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">df -h</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">Filesystem               Size  Used Avail Use% Mounted on</span><br><span class="line">/dev/mapper/centos-root   50G  2.0G   49G   4% /</span><br><span class="line">devtmpfs                 848M     0  848M   0% /dev</span><br><span class="line">tmpfs                    860M     0  860M   0% /dev/shm</span><br><span class="line">tmpfs                    860M   17M  843M   2% /run</span><br><span class="line">tmpfs                    860M     0  860M   0% /sys/fs/cgroup</span><br><span class="line">/dev/mapper/all-home     3.9T  3.2T  701G  83% /home</span><br><span class="line">/dev/sdd1                976M  136M  774M  15% /boot</span><br><span class="line">tmpfs                    172M     0  172M   0% /run/user/1000</span><br></pre></td></tr></table></figure><p><code>/home</code> has grown by about 0.46 TiB (from 3.42 TiB to just under 3.88 TiB according to <code>lvs</code>), so the expansion succeeded. The first <code>df</code> run used <code>-H</code> (powers of 1000) while this one uses <code>-h</code> (powers of 1024), which is why the Size and Used columns of the two runs cannot be compared directly.</p><h2 
id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><p><a href="https://blog.csdn.net/yaofengyaofeng/article/details/82353282">https://blog.csdn.net/yaofengyaofeng/article/details/82353282</a></p><p><a href="https://blog.csdn.net/u012439646/article/details/73380197">https://blog.csdn.net/u012439646/article/details/73380197</a></p><p><a href="http://lzw.me/a/linux-lvm.html">http://lzw.me/a/linux-lvm.html</a></p><p><a href="https://blog.csdn.net/youjin/article/details/79137203">https://blog.csdn.net/youjin/article/details/79137203</a></p><p><a href="https://blog.csdn.net/qq_27281257/article/details/81603410">https://blog.csdn.net/qq_27281257/article/details/81603410</a></p><p><a href="https://blog.csdn.net/weixin_42350212/article/details/80570211">https://blog.csdn.net/weixin_42350212/article/details/80570211</a></p>]]></content>
    
    
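    <summary type="html">&lt;p&gt;Following on from the previous post &lt;a href=&quot;/10&quot; title=&quot;真机安装 CentOS 7&quot;&gt;Installing CentOS 7 on a Physical Machine&lt;/a&gt;: after that installation I added another hard drive. Because LVM (Logical Volume Manager) was used during installation, the existing partition can be extended without shutting the machine down.&lt;/p&gt;</summary>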
    
    
    
    <category term="教程" scheme="https://blackyau.cc/categories/%E6%95%99%E7%A8%8B/"/>
    
    
    <category term="CentOS" scheme="https://blackyau.cc/tags/CentOS/"/>
    
    <category term="Linux" scheme="https://blackyau.cc/tags/Linux/"/>
    
    <category term="LVM" scheme="https://blackyau.cc/tags/LVM/"/>
    
  </entry>
  
  <entry>
    <title>Notes on Developing a WeChat Mini Program</title>
    <link href="https://blackyau.cc/21"/>
    <id>https://blackyau.cc/21</id>
    <published>2019-07-24T07:18:02.000Z</published>
    <updated>2019-07-24T09:17:40.000Z</updated>
    
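    <content type="html"><![CDATA[<p>I have just finished developing a WeChat Mini Program and went through the whole process, from development to publishing the live version. Here I record some thoughts and lessons learned along the way.</p><span id="more"></span><h2 id="项目介绍"><a href="#项目介绍" class="headerlink" title="项目介绍"></a>About the project</h2><p>This time I built a tool for looking up the real-time location of buses. The idea came from using an existing app. I ride the same few routes every day, yet the app makes me type in the route and tap Search every single time. Its cold start is painfully slow, too: about 4 s from tapping the icon until the splash ad appears. So I wanted to see whether I could solve the problem myself. Looking at its traffic with a packet-capture tool showed that the app also gets its data from an API, that the API requires no authentication, and that it takes only a few fixed parameters.</p><p>I had wanted to try WeChat Mini Program development for a while. In my own use the experience has been quite good: Mini Programs start reasonably fast, need no separate app download, and are easily cross-platform. For lightweight utilities the platform looks very attractive. On top of that, while chatting with a friend we agreed that today's apps are simply too "heavy" for ordinary users, which made me even more determined to build this as a WeChat Mini Program.</p><h2 id="可行性分析"><a href="#可行性分析" class="headerlink" title="可行性分析"></a>Feasibility analysis</h2><p>First I analysed the captured traffic: what the API expects to receive and what it sends back. A quick test with <code>Postman</code> showed that every API call takes only three parameters:</p><ul><li>Unique route identifier</li><li>Direction (up/down)</li><li>City code</li></ul><p>A quick try with <code>Python</code>'s <code>requests</code> showed that the data was easy to fetch. Then I slacked off for a dozen days or so and only started writing code in earnest on 15 July.</p><h2 id="网络请求"><a href="#网络请求" class="headerlink" title="网络请求"></a>Network requests</h2><p>For network requests, WeChat Mini Programs provide an API that can be called directly. Since this was my first time using <code>JavaScript</code>, I still got into plenty of trouble.</p><p>What confused me most at first was that I could not call a <code>request Object</code> a second time after data had already been passed through it once. The bus API returns stop information and the current status of vehicles separately, and although the vehicle status also contains stop identifiers, I could not link the two together. So the second call always failed with this error:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">Cannot read property &#x27;xxx&#x27; of undefined;at api request success callback function</span><br><span class="line">TypeError: Cannot read property &#x27;xxx&#x27; of undefined</span><br></pre></td></tr></table></figure><p>It was a real headache. In the end I worked around it with what amounts to global variables: every response is stored in the page's data, and other <code>function</code>s read the values from there. Reading them back is a bit fussy, because you have to write <code>this.data.xxx</code> every time. I couldn't tell whether this quirk comes from <code>JavaScript</code> or from WeChat Mini Programs. Wouldn't distinct names be enough? Why insist on this odd <code>this</code>? (A common cause of the error above is that inside a <code>success: function () {...}</code> callback, <code>this</code> no longer refers to the page; an arrow function, or saving <code>const that = this</code> before the request, keeps the reference.)</p><h2 id="逻辑层"><a href="#逻辑层" class="headerlink" 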
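title="逻辑层"></a>Logic layer</h2><p>Displaying the logic layer's data in the front end took quite a while. Static data was easy enough to handle, but I was too lazy to write a separate page for every route (looking back, the amount of redundant code that would have meant is frightening), so in the end I solved it with an API that WeChat provides: the tap handler receives parameters, which are concatenated into a URL carrying all the data, and that URL is handed to the route query page for display.</p><p>When the toggle button switches a route between directions, I also found some odd behaviour in the API. For a few loop routes the start and end stops are identical, so switching direction changes not only the direction value but also the route's unique identifier, and the route name as well (usually alternating between A and B). I noticed that the official app gets this wrong too: it cannot switch direction on loop routes, presumably because it only flips the direction value. Probably the result of the front end and back end not talking to each other?</p><h2 id="界面层"><a href="#界面层" class="headerlink" title="界面层"></a>View layer</h2><p>The way data is written into the view is very similar to <code>Jinja2</code>, which I had used before, though only for some simple data displays together with <code>ECharts</code>. The front end that displays the bus route is very little code, but every line took real thought, probably because I am inexplicably fussy about certain details, which created these problems for me.</p><p>For example, on the route query page I insisted that the bus icon sit on the same line as the stop name, and that the stop names be left-aligned near the centre of the page. When I first wrote the page I used conditional rendering to check whether a stop had a bus, showing the icon if it did and nothing otherwise. With that approach, stops with and without a bus ended up with their names indented differently. Then I had the quirky idea of controlling opacity instead: every position always renders the icon, and its opacity changes depending on whether a bus is there. When I thought of it I wondered how I could be so clever, haha.</p><p><img data-src="https://st.blackyau.net/blog/21/1.png" alt="图标缩进不同解决方法"></p><p>Another problem was that some stops have many buses waiting at once, usually the terminus when buses return for the night. With too many buses the layout broke. At first I wanted to loop over the vehicles with <code>for</code> and handle the <code>&gt;5</code>, <code>&lt;=5</code> and <code>=0</code> cases one by one, but after a long time I still couldn't get it working. There is no way to set a breakpoint there and see from the debug output which iteration the loop has reached, so I processed the data in <code>js</code> instead and capped the number of buses shown at any stop at 5.</p><p>One more problem: the official WeChat DevTools treats icon file names case-insensitively, but real devices are case-sensitive. When I started debugging, the bus icon was <code>Bus.png</code>; it always displayed fine in the DevTools simulator but simply would not appear on a real phone. It finally turned out to be this, which was quite exasperating. <code>Git</code> behaved oddly too: when I changed only the case of the file name, it reported no changes at all (Git ignores case-only renames when <code>core.ignorecase</code> is true, the default on case-insensitive file systems). In the end I renamed the file to something else, made a <code>commit</code>, and then renamed it back in lower case.</p><h2 id="后端"><a href="#后端" class="headerlink" title="后端"></a>Back end</h2><p>The back end involved little technology this time: it simply forwards requests and passes back whatever data comes back. Because WeChat Mini Programs do not allow plain HTTP requests, I had to use my own server as a proxy to relay the traffic. If the API's operator ever bans my server's IP, that will be a problem.</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h2><p>My strongest impression from development is that, because the official documentation is all in Chinese, looking things up was quite convenient. When searching for odd problems, however, Google surprisingly did not turn up good results, and I had to use Baidu and dig through piles of duplicated articles on CSDN for solutions.</p><p>If I have time later I will improve it further, adding a feature that uses the current location to estimate how long the bus will take to reach the nearest stop.</p>]]></content>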
    
    
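    <summary type="html">&lt;p&gt;I have just finished developing a WeChat Mini Program and went through the whole process, from development to publishing the live version. Here I record some thoughts and lessons learned along the way.&lt;/p&gt;</summary>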
    
    
    
    <category term="念念碎" scheme="https://blackyau.cc/categories/%E5%BF%B5%E5%BF%B5%E7%A2%8E/"/>
    
    
    <category term="微信小程序" scheme="https://blackyau.cc/tags/%E5%BE%AE%E4%BF%A1%E5%B0%8F%E7%A8%8B%E5%BA%8F/"/>
    
    <category term="开发" scheme="https://blackyau.cc/tags/%E5%BC%80%E5%8F%91/"/>
    
    <category term="实时公交" scheme="https://blackyau.cc/tags/%E5%AE%9E%E6%97%B6%E5%85%AC%E4%BA%A4/"/>
    
    <category term="JavaScript" scheme="https://blackyau.cc/tags/JavaScript/"/>
    
  </entry>
  
  <entry>
    <title>Spark Configuration</title>
    <link href="https://blackyau.cc/20"/>
    <id>https://blackyau.cc/20</id>
    <published>2019-06-07T06:34:34.000Z</published>
    <updated>2019-06-07T09:21:15.000Z</updated>
    
    <content type="html"><![CDATA[<p><code>Apache Spark</code> is an open-source cluster computing framework that serves a similar purpose to <code>Hadoop</code>'s <code>MapReduce</code>, but it is much faster than <code>MapReduce</code>, largely because it can keep intermediate data in memory instead of writing it to disk between steps.</p><span id="more"></span><h2 id="环境介绍"><a href="#环境介绍" class="headerlink" title="环境介绍"></a>Environment</h2><p>Software versions:</p><table><thead><tr><th>Program</th><th>Version</th><th>URL</th></tr></thead><tbody><tr><td>System</td><td>CentOS-7-x86_64-Minimal-1810</td><td><a href="https://mirrors.tuna.tsinghua.edu.cn/centos-vault/centos/7.9.2009/isos/x86_64/">TUNA Mirrors</a></td></tr><tr><td>Java</td><td>jdk-8u211-linux-x64.tar.gz</td><td><a href="https://www.oracle.com/cn/java/technologies/javase/javase8u211-later-archive-downloads.html">Oracle</a></td></tr><tr><td>Hadoop</td><td>hadoop-2.6.0.tar.gz</td><td><a href="http://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/">Apache Archive</a></td></tr><tr><td>Spark</td><td>spark-2.0.0-bin-hadoop2.6.tgz</td><td><a href="http://archive.apache.org/dist/spark/spark-2.0.0/">Apache Archive</a></td></tr><tr><td>ZooKeeper</td><td>zookeeper-3.4.5.tar.gz</td><td><a href="http://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/">Apache Archive</a></td></tr></tbody></table><h2 id="目标"><a href="#目标" class="headerlink" title="目标"></a>Goals</h2><ul><li>Set up a Standalone cluster</li><li>Run Spark on YARN</li><li>Run Spark on Mesos</li></ul><h2 id="基础环境配置"><a href="#基础环境配置" class="headerlink" title="基础环境配置"></a>Basic environment setup</h2><p>The Hadoop&#x2F;ZooKeeper environment has already been set up following <a href="/16" title="Hadoop HA 搭建">Setting up Hadoop HA</a>.</p><h3 id="下载解压"><a href="#下载解压" class="headerlink" title="下载解压"></a>Download and extract</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">curl -O http://archive.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.6.tgz</span><br><span class="line">tar xf spark-2.0.0-bin-hadoop2.6.tgz -C /usr/local/src/</span><br><span 
class="line">mv /usr/local/src/spark-2.0.0-bin-hadoop2.6 /usr/local/src/spark</span><br></pre></td></tr></table></figure><h2 id="Standalone-Mode"><a href="#Standalone-Mode" class="headerlink" title="Standalone Mode"></a>Standalone Mode</h2><p>使用这种模式搭建，不需要借助其他外部工具(高可用性需要 ZooKeeper)</p><h3 id="手动启动集群"><a href="#手动启动集群" class="headerlink" title="手动启动集群"></a>手动启动集群</h3><p>先搭建一个最简单的</p><table><thead><tr><th>HostName</th><th>Mode</th><th>IP</th></tr></thead><tbody><tr><td>master</td><td>Master</td><td>192.168.66.128</td></tr><tr><td>slave1</td><td>Worker</td><td>192.168.66.129</td></tr><tr><td>slave2</td><td>Worker</td><td>192.168.66.130</td></tr></tbody></table><p>不需要修改任何配置，直接启动即可。</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_"># </span><span class="language-bash">master</span></span><br><span class="line">cd /usr/local/src/spark/sbin</span><br><span class="line">./start-master.sh</span><br></pre></td></tr></table></figure><p>用浏览器打开的 <code>Web UI</code> 看看 <a href="http://master:8080/">http://master:8080/</a></p><p>其中的 <code>URL</code> 是让其他的 <code>Workers</code> 连接到 <code>master</code> 的重要参数</p><p><img data-src="https://st.blackyau.net/blog/20/1.png" alt="1"></p><p>将程序传到另外两台机子上</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">scp -r /usr/local/src/spark slave1:/usr/local/src/</span><br><span class="line">scp -r /usr/local/src/spark slave2:/usr/local/src/</span><br></pre></td></tr></table></figure><p>接下来在另外两台机子上启动 <code>workers</code> 并连接到 <code>master</code></p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td 
class="code"><pre><span class="line"><span class="meta prompt_"># </span><span class="language-bash">两台机子都要运行</span></span><br><span class="line">cd /usr/local/src/spark/sbin</span><br><span class="line">./start-slave.sh spark://master:7077</span><br></pre></td></tr></table></figure><p>如图出现了两个 <code>Worker</code> 且 <code>State</code> 处于 <code>ALIVE</code> 搭建完毕</p><p><img data-src="https://st.blackyau.net/blog/20/2.png" alt="2"></p><h3 id="脚本启动集群"><a href="#脚本启动集群" class="headerlink" title="脚本启动集群"></a>脚本启动集群</h3><p>首先将之前启动的服务都手动关掉</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_"># </span><span class="language-bash">master</span></span><br><span class="line">./stop-master.sh</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_"># </span><span class="language-bash">两个 slave 都需要关闭</span></span><br><span class="line">./stop-slave.sh</span><br></pre></td></tr></table></figure><p>将默认的配置文件改名为正式使用</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">cd /usr/local/src/spark/conf/</span><br><span class="line">cp spark-env.sh.template spark-env.sh</span><br><span class="line">cp slaves.template slaves</span><br></pre></td></tr></table></figure><p>修改 <code>slaves</code> 文件删掉里面的所有内容写入以下内容</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">slave1</span><br><span class="line">slave2</span><br></pre></td></tr></table></figure><p>修改 
<code>spark-env.sh</code> 文件指定 <code>master</code>，写入以下内容</p><figure class="highlight sh"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">SPARK_MASTER_HOST=master</span><br><span class="line">JAVA_HOME=/usr/local/src/jdk1.8.0_211</span><br></pre></td></tr></table></figure><blockquote><p>如果不写 <code>JAVA_HOME</code> 的话，在启动 <code>slave</code> 的时候会报 <code>JAVA_HOME is not set</code> 估计是因为我的 <code>JAVA_HOME</code> 设置的仅对 <code>root</code> 生效的原因吧</p></blockquote><p>将配置文件同步给另外两台机子</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">scp -r /usr/local/src/spark/conf/ slave1:/usr/local/src/spark/</span><br><span class="line">scp -r /usr/local/src/spark/conf/ slave2:/usr/local/src/spark/</span><br></pre></td></tr></table></figure><p>在 <code>master</code> 上面启动 <code>master</code> 和 <code>slaves</code></p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">cd /usr/local/src/spark/sbin/</span><br><span class="line">./start-master.sh</span><br><span class="line">./start-slaves.sh</span><br></pre></td></tr></table></figure><p>启动成功后浏览器打开 <a href="http://master:8080/">http://master:8080/</a> 看看，应该和上面的图片是一样的</p><p>下面的命令可以关闭</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">./stop-master.sh</span><br><span class="line">./stop-slaves.sh</span><br></pre></td></tr></table></figure><h3 id="高可用"><a href="#高可用" class="headerlink" title="高可用"></a>高可用</h3><p>编辑配置文件</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /usr/local/src/spark/conf/spark-env.sh</span><br></pre></td></tr></table></figure><p>在配置文件中新增以下内容</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">SPARK_DAEMON_JAVA_OPTS=&quot;-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master:2181,slave1:2181,slave2:2181 -Dspark.deploy.zookeeper.dir=/sparkha&quot;</span><br></pre></td></tr></table></figure><blockquote><p>注意如果你之前添加了 <code>SPARK_MASTER_HOST=master</code> 要删掉，因为 <code>master</code> 的任命被 <code>ZooKeeper</code> 接管了这个配置没用了</p></blockquote><p>将配置同步到另外两台机器上</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">scp -r /usr/local/src/spark/conf/ slave1:/usr/local/src/spark/</span><br><span class="line">scp -r /usr/local/src/spark/conf/ slave2:/usr/local/src/spark/</span><br></pre></td></tr></table></figure><p>启动 <code>ZooKeeper</code>，如果你还没有配置可以参考这篇文章<a href="https://blackyau.cc/16#ZooKeeper-%E9%85%8D%E7%BD%AE">Hadoop HA 配置 - ZooKeeper 配置</a></p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_"># </span><span class="language-bash">每台机子都要启动</span></span><br><span class="line">zkServer.sh start</span><br></pre></td></tr></table></figure><p>在 <code>mastrt</code> 上启动 <code>master</code> 和 <code>slaves</code></p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">./start-master.sh</span><br><span class="line">./start-slaves.sh</span><br></pre></td></tr></table></figure><p>在 <code>slave1</code> 上启动 <code>备用 
master</code></p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./start-master.sh</span><br></pre></td></tr></table></figure><p><img data-src="https://st.blackyau.net/blog/20/3.png" alt="3"></p><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>参考</h2><p><a href="https://kafka.apache.org/">Apache Kafka</a></p><p><a href="http://kafka.apachecn.org/">Kafka 中文文档</a></p><p><a href="https://www.cnblogs.com/luotianshuai/p/5206662.html">博客园@Mr.心弦 - Kafka【第一篇】Kafka集群搭建</a></p><p><a href="https://blog.csdn.net/weixin_42207486/article/details/80647802">CSDN@运维白菜鹏 - kafka搭建入门（手把手教你搭建）</a></p><p><a href="https://blog.csdn.net/belalds/article/details/80575751">CSDN@360linker - kafka如何彻底删除topic及数据</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;&lt;code&gt;Apache Spark&lt;/code&gt; 是一个开源集群运算框架，作用类似于 &lt;code&gt;Hadoop&lt;/code&gt; 的 &lt;code&gt;MapReduce&lt;/code&gt; 。但是相对于 &lt;code&gt;MapReduce&lt;/code&gt; 来说它的速度要快得多。&lt;/p&gt;</summary>
    
    
    
    <category term="大数据" scheme="https://blackyau.cc/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
    <category term="CentOS" scheme="https://blackyau.cc/tags/CentOS/"/>
    
    <category term="Linux" scheme="https://blackyau.cc/tags/Linux/"/>
    
    <category term="Hadoop" scheme="https://blackyau.cc/tags/Hadoop/"/>
    
    <category term="YARN" scheme="https://blackyau.cc/tags/YARN/"/>
    
    <category term="Spark" scheme="https://blackyau.cc/tags/Spark/"/>
    
  </entry>
  
  <entry>
    <title>Storm Configuration</title>
    <link href="https://blackyau.cc/19"/>
    <id>https://blackyau.cc/19</id>
    <published>2019-06-06T12:50:00.000Z</published>
    <updated>2019-06-06T13:31:10.000Z</updated>
    
    <content type="html"><![CDATA[<p>Storm is a distributed real-time computation system. It is probably the easiest service to set up of all the ones I have worked with recently.</p><span id="more"></span>
<h2 id="环境介绍"><a href="#环境介绍" class="headerlink" title="环境介绍"></a>Environment</h2><p>Software versions:</p><table><thead><tr><th>Program</th><th>Version</th><th>URL</th></tr></thead><tbody><tr><td>System</td><td>CentOS-7-x86_64-Minimal-1810</td><td><a href="https://mirrors.tuna.tsinghua.edu.cn/centos-vault/centos/7.9.2009/isos/x86_64/">TUNA Mirrors</a></td></tr><tr><td>JAVA</td><td>jdk-8u211-linux-x64.tar.gz</td><td><a href="https://www.oracle.com/cn/java/technologies/javase/javase8u211-later-archive-downloads.html">Oracle</a></td></tr><tr><td>ZooKeeper</td><td>zookeeper-3.4.5.tar.gz</td><td><a href="http://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/">Apache Archive</a></td></tr><tr><td>Storm</td><td>apache-storm-1.0.4.tar.gz</td><td><a href="https://archive.apache.org/dist/storm/apache-storm-1.0.4/">Apache Archive</a></td></tr></tbody></table>
<h2 id="目标"><a href="#目标" class="headerlink" title="目标"></a>Goals</h2><ul><li>Start Storm correctly</li><li>Redundancy (more than one Nimbus node)</li></ul>
<h2 id="基础环境配置"><a href="#基础环境配置" class="headerlink" title="基础环境配置"></a>Base Environment</h2><p>ZooKeeper is already set up, following <a href="/16" title="Hadoop HA 搭建">Hadoop HA Setup</a>.</p><table><thead><tr><th>HostName</th><th>IP</th></tr></thead><tbody><tr><td>master</td><td>192.168.66.128</td></tr><tr><td>slave1</td><td>192.168.66.129</td></tr><tr><td>slave2</td><td>192.168.66.130</td></tr></tbody></table>
<h3 id="zoo-cfg"><a href="#zoo-cfg" class="headerlink" title="zoo.cfg"></a>zoo.cfg</h3><p>zoo.cfg is configured as follows:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">tickTime=2000</span><br><span class="line">initLimit=10</span><br><span class="line">syncLimit=5</span><br><span class="line"></span><br><span class="line">dataDir=/usr/local/src/zookeeper-3.4.5/data</span><br><span class="line">dataLogDir=/usr/local/src/zookeeper-3.4.5/logs</span><br><span class="line"></span><br><span class="line">clientPort=2181</span><br><span class="line">server.1=master:2888:3888</span><br><span class="line">server.2=slave1:2888:3888</span><br><span class="line">server.3=slave2:2888:3888</span><br></pre></td></tr></table></figure>
<h3 id="启动-ZooKeeper"><a href="#启动-ZooKeeper" class="headerlink" title="启动 ZooKeeper"></a>Starting ZooKeeper</h3><p>Run this command on every host:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">zkServer.sh start</span><br></pre></td></tr></table></figure><p>Then check their status:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">[root@master ~]# zkServer.sh status</span><br><span class="line">JMX enabled by default</span><br><span class="line">Using config: /usr/local/src/zookeeper-3.4.5/bin/../conf/zoo.cfg</span><br><span class="line">Mode: follower</span><br><span class="line"></span><br><span class="line">[root@slave1 ~]# zkServer.sh status</span><br><span class="line">JMX enabled by default</span><br><span class="line">Using config: /usr/local/src/zookeeper-3.4.5/bin/../conf/zoo.cfg</span><br><span class="line">Mode: leader</span><br><span class="line"></span><br><span class="line">[root@slave2 ~]# zkServer.sh status</span><br><span class="line">JMX enabled by default</span><br><span class="line">Using config: /usr/local/src/zookeeper-3.4.5/bin/../conf/zoo.cfg</span><br><span class="line">Mode: follower</span><br></pre></td></tr></table></figure><p>Startup has succeeded when one host is in <code>leader</code> mode and all the others are in <code>follower</code> mode.</p>
<h2 id="下载解压"><a href="#下载解压" class="headerlink" title="下载解压"></a>Download and Extract</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">curl -O https://archive.apache.org/dist/storm/apache-storm-1.0.4/apache-storm-1.0.4.tar.gz</span><br><span class="line">tar xf apache-storm-1.0.4.tar.gz -C /usr/local/src/</span><br><span class="line">mv /usr/local/src/apache-storm-1.0.4 /usr/local/src/storm</span><br></pre></td></tr></table></figure>
<h2 id="配置"><a href="#配置" class="headerlink" title="配置"></a>Configuration</h2><p>Configuring Storm is quite simple, because there is only one configuration file:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /usr/local/src/storm/conf/storm.yaml</span><br></pre></td></tr></table></figure><p>Edit it as follows:</p><figure class="highlight yaml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="attr">storm.zookeeper.servers:</span></span><br><span class="line">     <span class="bullet">-</span> <span class="string">&quot;master&quot;</span></span><br><span class="line">     <span class="bullet">-</span> <span class="string">&quot;slave1&quot;</span></span><br><span class="line">     <span class="bullet">-</span> <span class="string">&quot;slave2&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="attr">nimbus.seeds:</span> [<span class="string">&quot;master&quot;</span>, <span class="string">&quot;slave1&quot;</span>]</span><br><span class="line"></span><br><span class="line"><span class="attr">supervisor.slots.ports:</span></span><br><span class="line">    <span class="bullet">-</span> <span class="number">6700</span></span><br><span class="line">    <span class="bullet">-</span> <span class="number">6701</span></span><br><span class="line">    <span class="bullet">-</span> <span class="number">6702</span></span><br><span class="line">    <span class="bullet">-</span> <span class="number">6703</span></span><br></pre></td></tr></table></figure>
<p>Pay attention to the <code>yaml</code> syntax, where indentation is significant. It differs from the <code>json</code> and <code>xml</code> formats I usually work with. <code>hexo</code> configuration files use the same format, so it felt familiar enough.</p><p>The commands below assume that <code>/usr/local/src/storm</code>, including this configuration, has also been copied to <code>slave1</code> and <code>slave2</code> (for example with <code>scp -r</code>).</p>
<h2 id="运行"><a href="#运行" class="headerlink" title="运行"></a>Running</h2><p>On <code>master</code>, run:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">cd /usr/local/src/storm/bin</span><br><span class="line">./storm nimbus &amp;</span><br><span class="line">./storm ui</span><br></pre></td></tr></table></figure><p>On <code>slave1</code>, run the following from the same <code>bin</code> directory:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./storm nimbus &amp;</span><br></pre></td></tr></table></figure><p>On <code>slave2</code>, run the following from the same <code>bin</code> directory:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">./storm supervisor &amp;</span><br></pre></td></tr></table></figure>
<p>These processes print no logs to the console and do not move to the background on their own. I therefore append <code>&amp;</code> to run them in the background. Alternatively, press <code>Ctrl+Z</code> to suspend a running process, then use <code>bg</code> to resume it in the background. <code>fg</code> brings it back to the foreground. When several jobs are in the background, list them with <code>jobs</code> and bring one back with <code>fg %N</code>, where <code>N</code> is the job number.</p>
<h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><p><a href="http://storm.apache.org/releases/1.0.6/index.html">Storm 1.0.6 Documentation</a></p><p><a href="https://blog.csdn.net/bbaiggey/article/details/77017230">CSDN@奔跑-起点 - storm1.x支持主节点nimbus高可用 多master集群部署</a></p><p><a href="https://blog.csdn.net/lnho2015/article/details/51143726">CSDN@Lnho - CentOS下Storm 1.0.0集群安装详解</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;Storm 是一个分布式实时计算系统，这应该是我最近遇过的搭建最简单的服务了&lt;/p&gt;</summary>
    
    
    
    <category term="大数据" scheme="https://blackyau.cc/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
    <category term="CentOS" scheme="https://blackyau.cc/tags/CentOS/"/>
    
    <category term="Linux" scheme="https://blackyau.cc/tags/Linux/"/>
    
    <category term="ZooKeeper" scheme="https://blackyau.cc/tags/ZooKeeper/"/>
    
    <category term="Storm" scheme="https://blackyau.cc/tags/Storm/"/>
    
  </entry>
  
  <entry>
    <title>Kafka Configuration</title>
    <link href="https://blackyau.cc/18"/>
    <id>https://blackyau.cc/18</id>
    <published>2019-06-04T10:00:00.000Z</published>
    <updated>2019-06-04T11:51:20.000Z</updated>
    
    <content type="html"><![CDATA[<p>Kafka 和我之前接触的 <a href="/12" title="Flume 配置">Flume</a> 非常相识,不过我关心的是它的搭建方式。</p><span id="more"></span><h2 id="环境介绍"><a href="#环境介绍" class="headerlink" title="环境介绍"></a>环境介绍</h2><p>软件版本如下：</p><table><thead><tr><th>Program</th><th>Version</th><th>URL</th></tr></thead><tbody><tr><td>System</td><td>CentOS-7-x86_64-Minimal-1810</td><td><a href="https://mirrors.tuna.tsinghua.edu.cn/centos-vault/centos/7.9.2009/isos/x86_64/">TUNA Mirrors</a></td></tr><tr><td>JAVA</td><td>jdk-8u211-linux-x64.tar.gz</td><td><a href="https://www.oracle.com/cn/java/technologies/javase/javase8u211-later-archive-downloads.html">Oracle</a></td></tr><tr><td>ZooKeeper</td><td>zookeeper-3.4.5.tar.gz</td><td><a href="http://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/">Apache Archive</a></td></tr><tr><td>Kafka</td><td>kafka_2.11-1.0.0.tgz</td><td><a href="http://archive.apache.org/dist/kafka/1.0.0/">Apache Archive</a></td></tr></tbody></table><h2 id="目标"><a href="#目标" class="headerlink" title="目标"></a>目标</h2><ul><li>正确启动 Kafka</li><li>完成 生产者(producer) 配置</li><li>完成 消费者(consumer) 配置</li></ul><h2 id="基础环境配置"><a href="#基础环境配置" class="headerlink" title="基础环境配置"></a>基础环境配置</h2><p>参考 <a href="/16" title="Hadoop HA 搭建">Hadoop HA 搭建</a> 目前已完成 ZooKeeper 环境搭建</p><table><thead><tr><th>HostName</th><th>broker id</th><th>Config Name</th><th>IP</th></tr></thead><tbody><tr><td>master</td><td>1</td><td>server-1.properties</td><td>192.168.66.128</td></tr><tr><td>slave1</td><td>2</td><td>server-2.properties</td><td>192.168.66.129</td></tr><tr><td>slave2</td><td>3</td><td>server-3.properties</td><td>192.168.66.130</td></tr></tbody></table><h3 id="zoo-cfg"><a href="#zoo-cfg" class="headerlink" title="zoo.cfg"></a>zoo.cfg</h3><p>zoo.cfg 配置如下</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">tickTime=2000</span><br><span class="line">initLimit=10</span><br><span class="line">syncLimit=5</span><br><span class="line"></span><br><span class="line">dataDir=/usr/local/src/zookeeper-3.4.5/data</span><br><span class="line">dataLogDir=/usr/local/src/zookeeper-3.4.5/logs</span><br><span class="line"></span><br><span class="line">clientPort=2181</span><br><span class="line">server.1=master:2888:3888</span><br><span class="line">server.2=slave1:2888:3888</span><br><span class="line">server.3=slave2:2888:3888</span><br></pre></td></tr></table></figure><h3 id="启动-ZooKeeper"><a href="#启动-ZooKeeper" class="headerlink" title="启动 ZooKeeper"></a>启动 ZooKeeper</h3><p>在每台主机上都要执行该命令</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">zkServer.sh start</span><br></pre></td></tr></table></figure><p>执行完毕后查看他们的运行状态</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">[root@master ~]# zkServer.sh status</span><br><span class="line">JMX enabled by default</span><br><span class="line">Using config: /usr/local/src/zookeeper-3.4.5/bin/../conf/zoo.cfg</span><br><span class="line">Mode: follower</span><br><span class="line"></span><br><span 
class="line">[root@slave1 ~]# zkServer.sh status</span><br><span class="line">JMX enabled by default</span><br><span class="line">Using config: /usr/local/src/zookeeper-3.4.5/bin/../conf/zoo.cfg</span><br><span class="line">Mode: leader</span><br><span class="line"></span><br><span class="line">[root@slave2 ~]# zkServer.sh status</span><br><span class="line">JMX enabled by default</span><br><span class="line">Using config: /usr/local/src/zookeeper-3.4.5/bin/../conf/zoo.cfg</span><br><span class="line">Mode: follower</span><br></pre></td></tr></table></figure><p>当有一台主机处于 <code>leader</code> 状态，其他的都处于 <code>follower</code> 时即启动成功</p><h2 id="下载解压"><a href="#下载解压" class="headerlink" title="下载解压"></a>下载解压</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">curl -O http://archive.apache.org/dist/kafka/1.0.0/kafka_2.11-1.0.0.tgz</span><br><span class="line">tar xf kafka_2.11-1.0.0.tgz -C /usr/local/src/</span><br><span class="line">mv /usr/local/src/kafka_2.11-1.0.0 /usr/local/src/kafka</span><br></pre></td></tr></table></figure><h2 id="配置"><a href="#配置" class="headerlink" title="配置"></a>配置</h2><p>这里我直接就使用我自己，已经安装好的 <code>ZooKeeper</code> 作为服务端不使用 <code>Kafka</code> 自带的 <code>ZooKeeper</code> ，如果你需要使用自带的可以参考以下文档:</p><p><a href="https://kafka.apache.org/quickstart">Kafka Quickstart English</a></p><p><a href="http://kafka.apachecn.org/quickstart.html">Kafka Quickstart 中文文档</a></p><h3 id="编写配置文件"><a href="#编写配置文件" class="headerlink" title="编写配置文件"></a>编写配置文件</h3><p>新建 <code>master</code> 配置文件</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /usr/local/src/kafka/config/server-1.properties</span><br></pre></td></tr></table></figure><p>写入以下内容</p><figure class="highlight plaintext"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">broker.id=1</span><br><span class="line">listeners=PLAINTEXT://master:9093</span><br><span class="line">log.dir=/root/kafka/logs</span><br><span class="line">zookeeper.connect=master:2181,slave1:2181,slave2:2181</span><br></pre></td></tr></table></figure><p>新建 <code>slave1</code> 配置文件</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /usr/local/src/kafka/config/server-2.properties</span><br></pre></td></tr></table></figure><p>写入以下内容</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">broker.id=2</span><br><span class="line">listeners=PLAINTEXT://slave1:9093</span><br><span class="line">log.dir=/root/kafka/logs-2</span><br><span class="line">zookeeper.connect=master:2181,slave1:2181,slave2:2181</span><br></pre></td></tr></table></figure><p>新建 <code>slave2</code> 配置文件</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /usr/local/src/kafka/config/server-3.properties</span><br></pre></td></tr></table></figure><p>写入以下内容</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">broker.id=3</span><br><span class="line">listeners=PLAINTEXT://slave2:9093</span><br><span class="line">log.dir=/root/kafka/logs-3</span><br><span 
class="line">zookeeper.connect=master:2181,slave1:2181,slave2:2181</span><br></pre></td></tr></table></figure><h3 id="同步配置"><a href="#同步配置" class="headerlink" title="同步配置"></a>同步配置</h3><p>将程序和配置分发到所有主机上</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">scp /usr/local/src/kafka slave1:/usr/local/src/</span><br><span class="line">scp /usr/local/src/kafka slave2:/usr/local/src/</span><br></pre></td></tr></table></figure><h3 id="设置环境变量"><a href="#设置环境变量" class="headerlink" title="设置环境变量"></a>设置环境变量</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi ~/.bash_profile</span><br></pre></td></tr></table></figure><p>在文件底部新增以下内容</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">export KAFKA_HOME=/usr/local/src/kafka</span><br><span class="line">PATH=$PATH:$KAFKA_HOME/bin</span><br></pre></td></tr></table></figure><p>使其生效</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">source ~/.bash_profile</span><br></pre></td></tr></table></figure><h2 id="运行"><a href="#运行" class="headerlink" title="运行"></a>运行</h2><p>在 <code>master</code> 上执行</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">kafka-server-start.sh /usr/local/src/kafka/config/server-1.properties</span><br></pre></td></tr></table></figure><p>在 <code>slave1</code> 上执行</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">kafka-server-start.sh 
/usr/local/src/kafka/config/server-2.properties</span><br></pre></td></tr></table></figure><p>在 <code>slave2</code> 上执行</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">kafka-server-start.sh /usr/local/src/kafka/config/server-3.properties</span><br></pre></td></tr></table></figure><p>出现类似以下输出内容说明启动成功了</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">[2019-06-04 19:18:04,467] INFO [KafkaServer id=1] started (kafka.server.KafkaServer)</span><br></pre></td></tr></table></figure><h2 id="创建Topic"><a href="#创建Topic" class="headerlink" title="创建Topic"></a>创建Topic</h2><p>在一个新的窗口打开 <code>master</code> 的 shell 执行</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">kafka-topics.sh --create --zookeeper master:2181 --replication-factor 2 --partitions 1 --topic my-replicated-topic</span><br></pre></td></tr></table></figure><p>–replication-factor 2   #复制两份<br>–partitions 1 #创建1个分区<br>–topic #主题为my-replicated-topic</p><p>查看所有 topic</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">kafka-topics.sh --list --zookeeper master:12181</span><br></pre></td></tr></table></figure><p>应输出</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">my-replicated-topic</span><br></pre></td></tr></table></figure><p>查看 my-replicated-topic 详细信息</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">kafka-topics.sh --describe --zookeeper master:2181 --topic 
my-replicated-topic</span><br></pre></td></tr></table></figure><p>应输出</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">Topic:my-replicated-topicPartitionCount:1ReplicationFactor:2Configs:</span><br><span class="line">Topic: my-replicated-topicPartition: 0Leader: 2Replicas: 2,3Isr: 2,3</span><br></pre></td></tr></table></figure><ul><li>“leader”是负责给定分区所有读写操作的节点。每个节点都是随机选择的部分分区的领导者。</li><li>“replicas”是复制分区日志的节点列表，不管这些节点是leader还是仅仅活着。</li><li>“isr”是一组“同步”replicas，是replicas列表的子集，它活着并被指到leader。</li></ul><blockquote><p>不知到为啥有两个 Leader ，而且看起来 <code>master</code> 好像离线了一样。感觉应该是我设置 <code>--replication-factor 2</code> 导致的,也可以能是我刚刚调试的时候没有把这个 <code>topic</code> 删干净导致的</p></blockquote><h2 id="生产者-消费者"><a href="#生产者-消费者" class="headerlink" title="生产者&amp;消费者"></a>生产者&amp;消费者</h2><p>在 <code>master</code> 上面执行</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">kafka-console-producer.sh --broker-list master:9093 --topic my-replicated-topic</span><br></pre></td></tr></table></figure><p>输入任意字符</p><p>在 <code>slave2</code> 上面执行</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">kafka-console-consumer.sh --bootstrap-server master:9093 --from-beginning --topic my-replicated-topic</span><br></pre></td></tr></table></figure><p>看到消息同步出现，即成功。</p><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>参考</h2><p><a href="https://kafka.apache.org/">Apache Kafka</a></p><p><a href="http://kafka.apachecn.org/">Kafka 中文文档</a></p><p><a href="https://www.cnblogs.com/luotianshuai/p/5206662.html">博客园@Mr.心弦 - Kafka【第一篇】Kafka集群搭建</a></p><p><a href="https://blog.csdn.net/weixin_42207486/article/details/80647802">CSDN@运维白菜鹏 - kafka搭建入门（手把手教你搭建）</a></p><p><a 
href="https://blog.csdn.net/belalds/article/details/80575751">CSDN@360linker - How to completely delete a Kafka topic and its data</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;Kafka is very similar to &lt;a href=&quot;/12&quot; title=&quot;Flume 配置&quot;&gt;Flume&lt;/a&gt;, which I worked with earlier, but what I care about here is how to set it up.&lt;/p&gt;</summary>
    
    
    
    <category term="大数据" scheme="https://blackyau.cc/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
    <category term="CentOS" scheme="https://blackyau.cc/tags/CentOS/"/>
    
    <category term="Linux" scheme="https://blackyau.cc/tags/Linux/"/>
    
    <category term="ZooKeeper" scheme="https://blackyau.cc/tags/ZooKeeper/"/>
    
    <category term="Kafka" scheme="https://blackyau.cc/tags/Kafka/"/>
    
  </entry>
  
  <entry>
    <title>HBase Configuration</title>
    <link href="https://blackyau.cc/17"/>
    <id>https://blackyau.cc/17</id>
    <published>2019-06-01T10:09:22.000Z</published>
    <updated>2019-06-02T09:36:14.000Z</updated>
    
    <content type="html"><![CDATA[<p>HBase was the first non-relational database I worked with. I won't go into its concepts here; this post focuses on how to install, configure and use it.</p><span id="more"></span><h2 id="环境介绍"><a href="#环境介绍" class="headerlink" title="环境介绍"></a>Environment</h2><p>Software versions:</p><table><thead><tr><th>Program</th><th>Version</th><th>URL</th></tr></thead><tbody><tr><td>System</td><td>CentOS-7-x86_64-Minimal-1810</td><td><a href="https://mirrors.tuna.tsinghua.edu.cn/centos-vault/centos/7.9.2009/isos/x86_64/">TUNA Mirrors</a></td></tr><tr><td>JAVA</td><td>jdk-8u211-linux-x64.tar.gz</td><td><a href="https://www.oracle.com/cn/java/technologies/javase/javase8u211-later-archive-downloads.html">Oracle</a></td></tr><tr><td>Hadoop</td><td>hadoop-2.6.0.tar.gz</td><td><a href="http://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/">Apache Archive</a></td></tr><tr><td>ZooKeeper</td><td>zookeeper-3.4.5.tar.gz</td><td><a href="http://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/">Apache Archive</a></td></tr><tr><td>HBase</td><td>hbase-1.2.0-bin.tar.gz</td><td><a href="http://archive.apache.org/dist/hbase/1.2.0/">Apache Archive</a></td></tr></tbody></table><p>A note on versions: the combination used here is not one the HBase project recommends. The official documentation describes which Hadoop versions work with which HBase versions; see <a href="https://hbase.apache.org/book.html#hadoop">here</a>.</p><blockquote><p>On the Hadoop 2.6.x line, running HBase on Hadoop 2.6.0 can cause cluster failure and data loss. The docs tie this to running HBase on top of an HDFS encryption zone without HADOOP-11710. Use Hadoop 2.6.1 or later.</p></blockquote><h2 id="目标"><a href="#目标" class="headerlink" title="目标"></a>Goals</h2><ul><li>Set up HBase in standalone mode</li><li>Set up HBase in distributed mode</li><li>Export&#x2F;import HBase data</li></ul><h2 id="基础环境配置"><a href="#基础环境配置" class="headerlink" title="基础环境配置"></a>Base Environment</h2><p>A Hadoop HA environment has already been set up following <a href="/16" title="Hadoop HA 搭建">Hadoop HA Setup</a>:</p><table><thead><tr><th>HostName</th><th>Function</th><th>IP</th></tr></thead><tbody><tr><td>master</td><td>DataNode&#x2F;NameNode&#x2F;ResourceManager</td><td>192.168.66.128</td></tr><tr><td>slave1</td><td>DataNode&#x2F;NameNode&#x2F;JobHistoryServer</td><td>192.168.66.129</td></tr><tr><td>slave2</td><td>DataNode&#x2F;ResourceManager</td><td>192.168.66.130</td></tr></tbody></table><h3 id="下载解压"><a href="#下载解压" class="headerlink" 
title="下载解压"></a>Download and Extract</h3><p>First download and extract HBase</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">curl -O http://archive.apache.org/dist/hbase/1.2.0/hbase-1.2.0-bin.tar.gz</span><br><span class="line">tar xf hbase-1.2.0-bin.tar.gz -C /usr/local/src/</span><br></pre></td></tr></table></figure><h3 id="系统环境变量"><a href="#系统环境变量" class="headerlink" title="系统环境变量"></a>System Environment Variables</h3><p>Configure the HBase environment variables (this only applies to the current user)</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi ~/.bash_profile</span><br></pre></td></tr></table></figure><p>Add the following</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">export HBASE_HOME=/usr/local/src/hbase-1.2.0</span><br><span class="line">export PATH=$PATH:$HBASE_HOME/bin</span><br></pre></td></tr></table></figure><p>Apply the changes</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">source ~/.bash_profile</span><br></pre></td></tr></table></figure><p>Check that it works</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">hbase version</span><br></pre></td></tr></table></figure><p>Output like the following means the configuration is correct</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">HBase 1.2.0</span><br><span class="line">Source code repository git://asf-dev/home/busbey/projects/hbase revision=25b281972df2f5b15c426c8963cbf77dd853a5ad</span><br><span class="line">Compiled by busbey on Thu Feb 18 23:01:49 CST 2016</span><br><span class="line">From source with checksum bcb25b7506ecf5d62c79d8f7193c829b</span><br></pre></td></tr></table></figure>
<h3 id="hbase-env"><a href="#hbase-env" class="headerlink" title="hbase-env"></a>hbase-env</h3><p>Set JAVA_HOME in HBase's environment script</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /usr/local/src/hbase-1.2.0/conf/hbase-env.sh</span><br></pre></td></tr></table></figure><p>Add the following</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">export JAVA_HOME=/usr/local/src/jdk1.8.0_211/</span><br></pre></td></tr></table></figure><h2 id="单机模式"><a href="#单机模式" class="headerlink" title="单机模式"></a>Standalone Mode</h2><h3 id="hbase-site-xml单机模式"><a href="#hbase-site-xml单机模式" class="headerlink" title="hbase-site.xml单机模式"></a>hbase-site.xml (standalone mode)</h3><p>Open the main configuration file</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /usr/local/src/hbase-1.2.0/conf/hbase-site.xml</span><br></pre></td></tr></table></figure><p>With the cursor on the first line, enter the following command to clear the file</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">:.,$d</span><br></pre></td></tr></table></figure><p>Write the following content (taken from <a href="https://hbase.apache.org/1.2/book.html#_get_started_with_hbase">HBase Doc v1.2</a>)</p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span 
class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>hbase.rootdir<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>file:///root/standalone/hbase<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">    <span class="comment">&lt;!-- where HBase stores its data --&gt;</span></span><br><span class="line">  <span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>hbase.zookeeper.property.dataDir<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>/root/standalone/zookeeper<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">    <span class="comment">&lt;!-- ZooKeeper data directory --&gt;</span></span><br><span class="line">  <span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure><h3 id="运行单机模式"><a href="#运行单机模式" class="headerlink" title="运行单机模式"></a>Run in Standalone Mode</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">start-hbase.sh</span><br></pre></td></tr></table></figure><p>Check that the process is running</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">[root@master ~]# jps</span><br><span class="line">10082 HMaster</span><br><span class="line">10346 Jps</span><br></pre></td></tr></table></figure><p>Standalone mode is now set up. Stop it with <code>stop-hbase.sh</code> before switching to distributed mode.</p><h2 id="分布式模式"><a href="#分布式模式" class="headerlink" title="分布式模式"></a>Distributed Mode</h2><h3 id="hbase-site-xml分布式模式"><a href="#hbase-site-xml分布式模式" class="headerlink" title="hbase-site.xml分布式模式"></a>hbase-site.xml (distributed mode)</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /usr/local/src/hbase-1.2.0/conf/hbase-site.xml</span><br></pre></td></tr></table></figure><p>With the cursor on the first line, enter the following command to clear the file</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">:.,$d</span><br></pre></td></tr></table></figure><p>Write the following content (taken from <a href="https://hbase.apache.org/1.2/book.html#example_config">HBase Doc v1.2</a>)</p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span 
class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">&lt;?xml version=<span class="string">&quot;1.0&quot;</span>?&gt;</span></span><br><span class="line"><span class="meta">&lt;?xml-stylesheet type=<span class="string">&quot;text/xsl&quot;</span> href=<span class="string">&quot;configuration.xsl&quot;</span>?&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>hbase.zookeeper.quorum<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>master,slave1,slave2<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">    <span class="comment">&lt;!-- hostnames of the ZooKeeper quorum nodes --&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">description</span>&gt;</span>Comma separated list of servers in the ZooKeeper ensemble.</span><br><span class="line">    <span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>hbase.zookeeper.property.dataDir<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>/export/zookeeper<span class="tag">&lt;/<span 
class="name">value</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">description</span>&gt;</span>Property from ZooKeeper config zoo.cfg.</span><br><span class="line">    The directory where the snapshot is stored.</span><br><span class="line">    <span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>hbase.rootdir<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>hdfs://nscluster:8020/hbase<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">    <span class="comment">&lt;!-- the hdfs:// part must match fs.defaultFS in your Hadoop core-site.xml --&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">description</span>&gt;</span>The directory shared by RegionServers.</span><br><span class="line">    <span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>hbase.cluster.distributed<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>true<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">description</span>&gt;</span>The mode the cluster will be in. Possible values are</span><br><span class="line">      false: standalone and pseudo-distributed setups with managed Zookeeper</span><br><span class="line">      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)</span><br><span class="line">    <span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure>
<h3 id="regionservers"><a href="#regionservers" class="headerlink" title="regionservers"></a>regionservers</h3><p>Edit the file that lists the RegionServer hosts</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /usr/local/src/hbase-1.2.0/conf/regionservers</span><br></pre></td></tr></table></figure><p>Clear its contents and write the following</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">master</span><br><span class="line">slave1</span><br><span class="line">slave2</span><br></pre></td></tr></table></figure><h3 id="拷贝Hadoop配置"><a href="#拷贝Hadoop配置" class="headerlink" title="拷贝Hadoop配置"></a>Copy the Hadoop Configuration</h3><p>My Hadoop cluster runs in HA mode (coordinated through ZooKeeper). Without the Hadoop configuration files, HBase cannot resolve the HDFS address (the <code>nscluster</code> nameservice), so copy them into HBase's conf directory.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">cp $HADOOP_HOME/etc/hadoop/core-site.xml $HBASE_HOME/conf/</span><br><span class="line">cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $HBASE_HOME/conf/</span><br></pre></td></tr></table></figure><h3 id="集群同步配置"><a href="#集群同步配置" class="headerlink" title="集群同步配置"></a>Sync to the Cluster</h3><p>Copy the HBase directory, configuration included, to the other machines in the cluster</p><figure 
class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">scp -r /usr/local/src/hbase-1.2.0 slave1:/usr/local/src/</span><br><span class="line">scp -r /usr/local/src/hbase-1.2.0 slave2:/usr/local/src/</span><br></pre></td></tr></table></figure><p>Then set the same environment variables on the other machines, following <a href="https://blackyau.cc/17#%E7%B3%BB%E7%BB%9F%E7%8E%AF%E5%A2%83%E5%8F%98%E9%87%8F">HBase Configuration - System Environment Variables</a>.</p><h3 id="运行分布式模式"><a href="#运行分布式模式" class="headerlink" title="运行分布式模式"></a>Run in Distributed Mode</h3><blockquote><p>Start Hadoop before running this. The command to start it depends on your own environment.</p></blockquote><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">start-hbase.sh</span><br></pre></td></tr></table></figure><p>Use <code>jps</code> to check that <code>HMaster</code> and <code>HRegionServer</code> are running</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">[root@master ~]# jps</span><br><span class="line">11538 HMaster</span><br><span class="line">10692 DataNode</span><br><span class="line">10884 NodeManager</span><br><span class="line">10517 JournalNode</span><br><span class="line">11221 DFSZKFailoverController</span><br><span class="line">12008 Jps</span><br><span class="line">11657 HRegionServer</span><br><span class="line">10795 ResourceManager</span><br><span class="line">10446 QuorumPeerMain</span><br><span class="line">10574 NameNode</span><br></pre></td></tr></table></figure><p>Web UI</p><p><a 
href="http://master:16010/">http://master:16010</a></p><p><img data-src="https://st.blackyau.net/blog/17/1.png" alt="Web UI screenshot"></p><h2 id="HBase-Shell"><a href="#HBase-Shell" class="headerlink" title="HBase Shell"></a>HBase Shell</h2><p>Start the HBase Shell</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">hbase shell</span><br></pre></td></tr></table></figure><p>If you see the following warning</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">SLF4J: Class path contains multiple SLF4J bindings.</span><br><span class="line">SLF4J: Found binding in [jar:file:/usr/local/src/hbase-1.2.0/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]</span><br><span class="line">SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]</span><br><span class="line">SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.</span><br><span class="line">SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]</span><br></pre></td></tr></table></figure><p>The cause is that the SLF4J binding jar bundled with <code>HBase</code> (<code>slf4j-log4j12-1.7.5.jar</code>) duplicates the one shipped with <code>Hadoop</code>. Removing one of the two conflicting jars makes the message go away.</p><p>Next, create a namespace (HBase's counterpart of a database) and a table, then insert data. If you don't specify a namespace when creating a table, the table is placed in the <code>default</code> namespace.</p><p>The table layout (<code>inside</code> and <code>outside</code> are column families):</p><escape><table>  <tr>    <th rowspan="2" align="center">Row Key</th>    <th colspan="2" align="center">inside</th>    <th colspan="2" align="center">outside</th>  </tr>  <tr>    <td align="center">name</td>    <td align="center">age</td>    <td align="center">slang</td>    <td align="center">zh</td>  </tr>  <tr>    <td align="center">1</td>    <td align="center">tom</td>    <td align="center">3</td>    <td align="center">cat</td> 
   <td align="center">mao</td>  </tr>  <tr>    <td align="center">2</td>    <td align="center">jerry</td>    <td align="center">2</td>    <td align="center">rat</td>    <td align="center">laoshu</td>  </tr></table></escape><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">create_namespace <span class="string">&#x27;test&#x27;</span> <span class="comment"># create a namespace (database)</span></span><br><span class="line"><span class="keyword">create</span> <span class="string">&#x27;test:emp&#x27;</span>, <span class="string">&#x27;inside&#x27;</span>, <span class="string">&#x27;outside&#x27;</span> <span class="comment"># create table emp in namespace test</span></span><br><span class="line">list_namespace_tables <span class="string">&#x27;test&#x27;</span> <span class="comment"># list the tables in namespace test</span></span><br></pre></td></tr></table></figure><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">TABLE</span></span><br><span class="line">emp</span><br><span class="line"><span class="number">1</span> <span class="type">row</span>(s) <span class="keyword">in</span> <span class="number">0.0130</span> seconds</span><br></pre></td></tr></table></figure><p>Describe the table</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">desc</span> <span class="string">&#x27;test:emp&#x27;</span></span><br></pre></td></tr></table></figure><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span 
class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">Table</span> test:emp <span class="keyword">is</span> ENABLED</span><br><span class="line">test:emp</span><br><span class="line"><span class="keyword">COLUMN</span> FAMILIES DESCRIPTION</span><br><span class="line">&#123;NAME <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;inside&#x27;</span>, BLOOMFILTER <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;ROW&#x27;</span>, VERSIONS <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;1&#x27;</span>, IN_MEMORY <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;false&#x27;</span>, KEEP_DELETED_CELLS <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;FALSE&#x27;</span>, DATA_BLOCK_ENCODING <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;NONE&#x27;</span>, TTL <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;FOREVER&#x27;</span>, COMPRESSION <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;NONE&#x27;</span>, MIN_VERSIONS <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;0&#x27;</span>, BLOCKCACHE <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;true&#x27;</span>, BLOCKSIZE <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;65536&#x27;</span>, REPLICATION_SCOPE <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;0&#x27;</span>&#125;</span><br><span class="line">&#123;NAME <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;outside&#x27;</span>, BLOOMFILTER <span 
class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;ROW&#x27;</span>, VERSIONS <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;1&#x27;</span>, IN_MEMORY <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;false&#x27;</span>, KEEP_DELETED_CELLS <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;FALSE&#x27;</span>, DATA_BLOCK_ENCODING <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;NONE&#x27;</span>, TTL <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;FOREVER&#x27;</span>, COMPRESSION <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;NONE&#x27;</span>, MIN_VERSIONS <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;0&#x27;</span>, BLOCKCACHE <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;true&#x27;</span>, BLOCKSIZE <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;65536&#x27;</span>, REPLICATION_SCOPE <span class="operator">=</span><span class="operator">&gt;</span> <span class="string">&#x27;0&#x27;</span>&#125;</span><br><span class="line"><span class="number">2</span> <span class="type">row</span>(s) <span class="keyword">in</span> <span class="number">0.0410</span> seconds</span><br></pre></td></tr></table></figure><p>Insert data</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">put <span class="string">&#x27;test:emp&#x27;</span>, <span 
class="string">&#x27;1&#x27;</span>, <span class="string">&#x27;inside:name&#x27;</span>, <span class="string">&#x27;tom&#x27;</span></span><br><span class="line">put <span class="string">&#x27;test:emp&#x27;</span>, <span class="string">&#x27;1&#x27;</span>, <span class="string">&#x27;inside:age&#x27;</span>, <span class="string">&#x27;3&#x27;</span></span><br><span class="line">put <span class="string">&#x27;test:emp&#x27;</span>, <span class="string">&#x27;1&#x27;</span>, <span class="string">&#x27;outside:slang&#x27;</span>, <span class="string">&#x27;cat&#x27;</span></span><br><span class="line">put <span class="string">&#x27;test:emp&#x27;</span>, <span class="string">&#x27;1&#x27;</span>, <span class="string">&#x27;outside:zh&#x27;</span>, <span class="string">&#x27;mao&#x27;</span></span><br><span class="line">put <span class="string">&#x27;test:emp&#x27;</span>, <span class="string">&#x27;2&#x27;</span>, <span class="string">&#x27;inside:name&#x27;</span>, <span class="string">&#x27;jerry&#x27;</span></span><br><span class="line">put <span class="string">&#x27;test:emp&#x27;</span>, <span class="string">&#x27;2&#x27;</span>, <span class="string">&#x27;inside:age&#x27;</span>, <span class="string">&#x27;2&#x27;</span></span><br><span class="line">put <span class="string">&#x27;test:emp&#x27;</span>, <span class="string">&#x27;2&#x27;</span>, <span class="string">&#x27;outside:slang&#x27;</span>, <span class="string">&#x27;rat&#x27;</span></span><br><span class="line">put <span class="string">&#x27;test:emp&#x27;</span>, <span class="string">&#x27;2&#x27;</span>, <span class="string">&#x27;outside:zh&#x27;</span>, <span class="string">&#x27;laoshu&#x27;</span></span><br></pre></td></tr></table></figure><p>Scan the table</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">scan <span 
class="string">&#x27;test:emp&#x27;</span></span><br></pre></td></tr></table></figure><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="type">ROW</span>    <span class="keyword">COLUMN</span><span class="operator">+</span>CELL</span><br><span class="line"></span><br><span class="line"> <span class="number">1</span>     <span class="keyword">column</span><span class="operator">=</span>inside:age, <span class="type">timestamp</span><span class="operator">=</span><span class="number">1559458876365</span>, <span class="keyword">value</span><span class="operator">=</span><span class="number">3</span></span><br><span class="line"> <span class="number">1</span>     <span class="keyword">column</span><span class="operator">=</span>inside:name, <span class="type">timestamp</span><span class="operator">=</span><span class="number">1559458822328</span>, <span class="keyword">value</span><span class="operator">=</span>tom</span><br><span class="line"> <span class="number">1</span>     <span class="keyword">column</span><span class="operator">=</span>outside:slang, <span class="type">timestamp</span><span class="operator">=</span><span class="number">1559459016767</span>, <span class="keyword">value</span><span class="operator">=</span>cat</span><br><span class="line"> <span class="number">1</span>     <span class="keyword">column</span><span class="operator">=</span>outside:zh, <span class="type">timestamp</span><span class="operator">=</span><span class="number">1559459100544</span>, <span class="keyword">value</span><span 
class="operator">=</span>mao</span><br><span class="line"> <span class="number">2</span>     <span class="keyword">column</span><span class="operator">=</span>inside:age, <span class="type">timestamp</span><span class="operator">=</span><span class="number">1559459253516</span>, <span class="keyword">value</span><span class="operator">=</span><span class="number">2</span></span><br><span class="line"> <span class="number">2</span>     <span class="keyword">column</span><span class="operator">=</span>inside:name, <span class="type">timestamp</span><span class="operator">=</span><span class="number">1559459242949</span>, <span class="keyword">value</span><span class="operator">=</span>jerry</span><br><span class="line"> <span class="number">2</span>     <span class="keyword">column</span><span class="operator">=</span>outside:slang, <span class="type">timestamp</span><span class="operator">=</span><span class="number">1559459336624</span>, <span class="keyword">value</span><span class="operator">=</span>rat</span><br><span class="line"> <span class="number">2</span>     <span class="keyword">column</span><span class="operator">=</span>outside:zh, <span class="type">timestamp</span><span class="operator">=</span><span class="number">1559459347676</span>, <span class="keyword">value</span><span class="operator">=</span>laoshu</span><br><span class="line"><span class="number">2</span> <span class="type">row</span>(s) <span class="keyword">in</span> <span class="number">0.0190</span> seconds</span><br></pre></td></tr></table></figure><p>A table must be disabled before it can be dropped</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">disable <span class="string">&#x27;test:emp&#x27;</span> <span class="comment"># disable the table</span></span><br><span class="line"><span class="keyword">drop</span> <span class="string">&#x27;test:emp&#x27;</span> <span 
class="comment"># drop the table (do not run this yet; drop it only after exporting it)</span></span><br></pre></td></tr></table></figure><p>Exit the <code>HBase Shell</code>:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">quit</span><br></pre></td></tr></table></figure><h2 id="导入导出数据"><a href="#导入导出数据" class="headerlink" title="导入导出数据"></a>Importing and Exporting Data</h2><p>Export the table to binary files (Hadoop SequenceFiles) with the <code>Export</code> class that ships with <code>HBase</code>.</p><blockquote><p>Without the <code>file://</code> prefix the output is written to <code>HDFS</code>. When the table name includes a namespace, wrap it in single quotes <code>&#39;&#39;</code>. Do not use backticks: the shell treats them as command substitution and replaces the table name with an empty string.</p></blockquote><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">hbase org.apache.hadoop.hbase.mapreduce.Export &#x27;test:emp&#x27; file:///root/emp_out</span><br><span class="line">ls /root/emp_out/</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">part-m-00000  _SUCCESS</span><br></pre></td></tr></table></figure><blockquote><p>At first I could not find the exported files. With <code>file://</code>, the output lands on the local disk of whichever node runs the job, and my <code>Hadoop HA</code> cluster had scheduled it on another machine, so the files were not on <code>master</code>. The ResourceManager web UI at <a href="http://master:8088/">http://master:8088</a> shows which node ran the job.</p></blockquote><p><img data-src="https://st.blackyau.net/blog/17/2.png" alt="ResourceManager screenshot"></p><p>Drop the table to prepare for the import:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">hbase shell</span><br></pre></td></tr></table></figure><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">disable <span class="string">&#x27;test:emp&#x27;</span> <span class="comment"># disable the table</span></span><br><span class="line"><span 
class="keyword">drop</span> <span class="string">&#x27;test:emp&#x27;</span> <span class="comment"># drop the table</span></span><br><span class="line"><span class="keyword">create</span> <span class="string">&#x27;test:emp&#x27;</span>, <span class="string">&#x27;inside&#x27;</span>, <span class="string">&#x27;outside&#x27;</span> <span class="comment"># recreate the table (Import needs an existing table with the same name)</span></span><br><span class="line">quit</span><br></pre></td></tr></table></figure><p>Import the previously exported data:</p><blockquote><p>Check which machine your <code>MapReduce</code> job will be scheduled on: with <code>file://</code>, the input files must be on that machine&#39;s local disk.</p></blockquote><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">hbase org.apache.hadoop.hbase.mapreduce.Import &#x27;test:emp&#x27; file:///root/emp_out</span><br></pre></td></tr></table></figure><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><p><a href="https://hbase.apache.org/1.2/book.html">Apache HBase ™ Reference Guide Version 1.2.12</a></p><p><a href="https://blog.csdn.net/y472360651/article/details/79017308">CSDN@奔跑的豆子_ - HBase-单机模式安装</a></p><p><a href="https://www.cnblogs.com/huanlegu0426/p/hbase03.html">博客园@huanlegu0426 - hadoop2.5.1+hbase1.1.2安装与配置</a></p><p><a href="https://blog.csdn.net/jjshouji/article/details/78054556">CSDN@jjshouji - hbase 1.2.6 安装</a></p><p><a href="https://blog.csdn.net/u011444062/article/details/81138861">CSDN@疯子. 
- 大数据系列之数据库Hbase知识整理（三）Hbase的表结构，基本操作，元数据表meta</a></p><p><a href="https://www.yiibai.com/hbase">易白教程@张新发 - HBase教程</a></p><p><a href="https://stackoverflow.com/questions/25909132">Stack Overflow@Nanda - How to import&#x2F;export hbase data via hdfs (hadoop commands)</a></p><p><a href="https://www.jianshu.com/p/a97fbdb2f03f">简书@冯宇Ops - HBase入坑须知(一)</a></p><p><a href="https://blog.csdn.net/u012882134/article/details/52527469">CSDN@芙兰泣露 - hbase导入导出数据</a></p><p><a href="https://blog.csdn.net/helloxiaozhe/article/details/80325212">CSDN@Data_IT_Farmer - Hbase表两种数据备份方法-导入和导出示例</a></p><p><a href="https://blog.csdn.net/weixin_40652340/article/details/78744518">CSDN@weixin_4065234 - Hbase数据库的常用操作命令</a></p><p><a href="https://blog.csdn.net/maligebazi/article/details/79952459">CSDN@niugeblog - hbase 命令详解之namespace与table</a></p><p><a href="https://3nice.cc/2018/10/01/markdowntable/">3NICE - 解决在Markdown中的表格单元格合并的问题</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;HBase is the first non-relational database I have worked with. I won't go over its concepts here; this post focuses on installing, configuring, and using it.&lt;/p&gt;</summary>
    
    
    
    <category term="大数据" scheme="https://blackyau.cc/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
    <category term="CentOS" scheme="https://blackyau.cc/tags/CentOS/"/>
    
    <category term="Linux" scheme="https://blackyau.cc/tags/Linux/"/>
    
    <category term="Hadoop" scheme="https://blackyau.cc/tags/Hadoop/"/>
    
    <category term="HDFS" scheme="https://blackyau.cc/tags/HDFS/"/>
    
    <category term="HBase" scheme="https://blackyau.cc/tags/HBase/"/>
    
    <category term="Hadoop HA" scheme="https://blackyau.cc/tags/Hadoop-HA/"/>
    
    <category term="Hadoop 高可用" scheme="https://blackyau.cc/tags/Hadoop-%E9%AB%98%E5%8F%AF%E7%94%A8/"/>
    
    <category term="ZooKeeper" scheme="https://blackyau.cc/tags/ZooKeeper/"/>
    
  </entry>
  
  <entry>
    <title>Setting Up Hadoop HA</title>
    <link href="https://blackyau.cc/16"/>
    <id>https://blackyau.cc/16</id>
    <published>2019-05-28T07:54:12.000Z</published>
    <updated>2019-05-29T16:09:12.000Z</updated>
    
    <content type="html"><![CDATA[<p>The HDFS NameNode is a single point of failure: when it goes down, the whole of HDFS becomes inaccessible. That makes it especially important to build a highly available (HA) Hadoop cluster on top of ZooKeeper.</p><span id="more"></span><h2 id="环境介绍"><a href="#环境介绍" class="headerlink" title="环境介绍"></a>Environment</h2><p>The goal is a cluster with 3 <code>DataNode</code>s, 2 <code>NameNode</code>s and 2 <code>yarn</code> ResourceManagers, where the two <code>NameNode</code>s fail over to each other automatically, and the two <code>yarn</code> ResourceManagers do the same. The roles are laid out as follows:</p><table><thead><tr><th>HostName</th><th>Function</th><th>IP</th></tr></thead><tbody><tr><td>master</td><td>DataNode&#x2F;NameNode&#x2F;ResourceManager</td><td>192.168.66.128</td></tr><tr><td>slave1</td><td>DataNode&#x2F;NameNode&#x2F;JobHistoryServer</td><td>192.168.66.129</td></tr><tr><td>slave2</td><td>DataNode&#x2F;ResourceManager</td><td>192.168.66.130</td></tr></tbody></table><p>Software versions:</p><table><thead><tr><th>Program</th><th>Version</th><th>URL</th></tr></thead><tbody><tr><td>System</td><td>CentOS-7-x86_64-Minimal-1810</td><td><a href="https://mirrors.tuna.tsinghua.edu.cn/centos-vault/centos/7.9.2009/isos/x86_64/">TUNA Mirrors</a></td></tr><tr><td>Java</td><td>jdk-8u211-linux-x64.tar.gz</td><td><a href="https://www.oracle.com/cn/java/technologies/javase/javase8u211-later-archive-downloads.html">Oracle</a></td></tr><tr><td>Hadoop</td><td>hadoop-2.6.0.tar.gz</td><td><a href="http://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/">Apache Archive</a></td></tr><tr><td>ZooKeeper</td><td>zookeeper-3.4.5.tar.gz</td><td><a href="http://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/">Apache Archive</a></td></tr></tbody></table><p>This post skips the theory. For background on <code>ZooKeeper</code> and <code>Hadoop HA</code>, see <a href="https://segmentfault.com/a/1190000016349824">SegmentFault@Snailclimb - 可能是全网把 ZooKeeper 概念讲的最清楚的一篇文章</a></p><h2 id="基础环境配置"><a href="#基础环境配置" class="headerlink" title="基础环境配置"></a>Basic Environment Setup</h2><p>See <a href="/11" title="Hadoop 伪分布部署">Hadoop Pseudo-Distributed Deployment</a> and <a href="/14" title="Hadoop 完全分布部署">Hadoop Fully Distributed Deployment</a>; I won't repeat those steps here. Stop all related services before starting the configuration.</p><h2 id="ZooKeeper-配置"><a 
href="#ZooKeeper-配置" class="headerlink" title="ZooKeeper 配置"></a>ZooKeeper Configuration</h2><p>First, download and extract ZooKeeper:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">curl -O http://archive.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz</span><br><span class="line">tar xf zookeeper-3.4.5.tar.gz -C /usr/local/src/</span><br><span class="line">vi ~/.bash_profile</span><br></pre></td></tr></table></figure><p>Add the following lines to the file:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">export ZOOKEEPER_HOME=/usr/local/src/zookeeper-3.4.5</span><br><span class="line">export PATH=$PATH:$ZOOKEEPER_HOME/bin</span><br></pre></td></tr></table></figure><p>Reload the profile so the changes take effect:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">source ~/.bash_profile</span><br></pre></td></tr></table></figure><p>Edit the ZooKeeper configuration file:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">cp /usr/local/src/zookeeper-3.4.5/conf/zoo_sample.cfg /usr/local/src/zookeeper-3.4.5/conf/zoo.cfg</span><br><span class="line">vi /usr/local/src/zookeeper-3.4.5/conf/zoo.cfg</span><br></pre></td></tr></table></figure><p>After editing, it looks like this:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span 
class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br></pre></td><td class="code"><pre><span class="line"># The number of milliseconds of each tick</span><br><span class="line">tickTime=2000</span><br><span class="line"># The number of ticks that the initial </span><br><span class="line"># synchronization phase can take</span><br><span class="line">initLimit=10</span><br><span class="line"># The number of ticks that can pass between </span><br><span class="line"># sending a request and getting an acknowledgement</span><br><span class="line">syncLimit=5</span><br><span class="line"># the directory where the snapshot is stored.</span><br><span class="line"># do not use /tmp for storage, /tmp here is just </span><br><span class="line"># example sakes.</span><br><span class="line">dataDir=/usr/local/src/zookeeper-3.4.5/data</span><br><span class="line">dataLogDir=/usr/local/src/zookeeper-3.4.5/logs</span><br><span class="line"># the port at which the clients will connect</span><br><span class="line">clientPort=2181</span><br><span class="line"></span><br><span class="line">server.1=master:2888:3888</span><br><span class="line">server.2=slave1:2888:3888</span><br><span class="line">server.3=slave2:2888:3888</span><br><span class="line"></span><br><span class="line">#</span><br><span class="line"># Be sure to read the maintenance section of the </span><br><span 
class="line"># administrator guide before turning on autopurge.</span><br><span class="line">#</span><br><span class="line"># http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance</span><br><span class="line">#</span><br><span class="line"># The number of snapshots to retain in dataDir</span><br><span class="line">#autopurge.snapRetainCount=3</span><br><span class="line"># Purge task interval in hours</span><br><span class="line"># Set to &quot;0&quot; to disable auto purge feature</span><br><span class="line">#autopurge.purgeInterval=1</span><br></pre></td></tr></table></figure><p>Do not put <code>dataDir</code> under <code>/tmp</code> or <code>$HADOOP_HOME/tmp</code>: neither directory is meant for long-term storage, and <code>ZooKeeper</code> needs its data to persist. The other two machines (slave1&#x2F;slave2) need the same configuration. You can copy the directory over with <code>scp</code>, then edit <code>~/.bash_profile</code> by hand on each machine and create the data and log directories manually.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_"># </span><span class="language-bash">On master</span></span><br><span class="line">scp -r /usr/local/src/zookeeper-3.4.5 slave1:/usr/local/src/</span><br><span class="line">scp -r /usr/local/src/zookeeper-3.4.5 slave2:/usr/local/src/</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_"># </span><span class="language-bash">On every machine</span></span><br><span class="line">mkdir -p /usr/local/src/zookeeper-3.4.5/logs</span><br><span class="line">mkdir -p /usr/local/src/zookeeper-3.4.5/data</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_"># </span><span class="language-bash">Next, create a myid file on each machine containing 1, 2 and 3 respectively (matching server.N in zoo.cfg)</span></span><br><span class="line">[root@master ~]# echo 1 &gt; /usr/local/src/zookeeper-3.4.5/data/myid</span><br><span class="line">[root@slave1 ~]# echo 2 &gt; /usr/local/src/zookeeper-3.4.5/data/myid</span><br><span class="line">[root@slave2 ~]# echo 3 &gt; /usr/local/src/zookeeper-3.4.5/data/myid</span><br></pre></td></tr></table></figure><p>Then start ZooKeeper on every machine and check that it is running correctly:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">zkServer.sh start</span><br><span class="line">zkServer.sh status</span><br></pre></td></tr></table></figure><p>Startup succeeded if every server reports <code>leader</code> or <code>follower</code> mode, as below:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br></pre></td><td class="code"><pre><span class="line">[root@master ~]# zkServer.sh status</span><br><span class="line">JMX enabled by default</span><br><span class="line">Using config: /usr/local/src/zookeeper-3.4.5/bin/../conf/zoo.cfg</span><br><span class="line">Mode: follower</span><br><span class="line"></span><br><span class="line">[root@slave1 ~]# zkServer.sh status</span><br><span class="line">JMX enabled by default</span><br><span class="line">Using config: 
/usr/local/src/zookeeper-3.4.5/bin/../conf/zoo.cfg</span><br><span class="line">Mode: leader</span><br><span class="line"></span><br><span class="line">[root@slave2 ~]# zkServer.sh status</span><br><span class="line">JMX enabled by default</span><br><span class="line">Using config: /usr/local/src/zookeeper-3.4.5/bin/../conf/zoo.cfg</span><br><span class="line">Mode: follower</span><br></pre></td></tr></table></figure><h2 id="Hadoop-HA-配置"><a href="#Hadoop-HA-配置" class="headerlink" title="Hadoop HA 配置"></a>Hadoop HA Configuration</h2><h3 id="core-site-xml"><a href="#core-site-xml" class="headerlink" title="core-site.xml"></a>core-site.xml</h3><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">&lt;?xml version=<span class="string">&quot;1.0&quot;</span> encoding=<span class="string">&quot;UTF-8&quot;</span>?&gt;</span></span><br><span 
class="line"><span class="meta">&lt;?xml-stylesheet type=<span class="string">&quot;text/xsl&quot;</span> href=<span class="string">&quot;configuration.xsl&quot;</span>?&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- default file system points at the nameservice; the name is arbitrary but must be used consistently below --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">name</span>&gt;</span>fs.defaultFS<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">value</span>&gt;</span>hdfs://nscluster<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">name</span>&gt;</span>hadoop.tmp.dir<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">value</span>&gt;</span>/root/hadoop/tmp<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- ZooKeeper server address list --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">name</span>&gt;</span>ha.zookeeper.quorum<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span 
class="name">value</span>&gt;</span>master:2181,slave1:2181,slave2:2181<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- on failover, SSH into the previously active NameNode and kill its process (fencing) --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.ha.fencing.methods<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">value</span>&gt;</span>sshfence<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- SSH private key used for fencing --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.ha.fencing.ssh.private-key-files<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">value</span>&gt;</span>/root/.ssh/id_rsa<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure><h3 id="hdfs-site-xml"><a href="#hdfs-site-xml" class="headerlink" title="hdfs-site.xml"></a>hdfs-site.xml</h3><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span 
class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">&lt;?xml version=<span class="string">&quot;1.0&quot;</span> encoding=<span class="string">&quot;UTF-8&quot;</span>?&gt;</span></span><br><span class="line"><span class="meta">&lt;?xml-stylesheet type=<span class="string">&quot;text/xsl&quot;</span> href=<span class="string">&quot;configuration.xsl&quot;</span>?&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.replication<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">value</span>&gt;</span>3<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!--local directory for HDFS (NameNode) metadata--&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.namenode.name.dir<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span 
class="tag">&lt;<span class="name">value</span>&gt;</span>file:/root/hadoop/tmp/data/nn<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!--local directory for HDFS (DataNode) block data--&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.datanode.data.dir<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>file:/root/hadoop/tmp/data/dn<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!--disable HDFS permission checking --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.permissions.enabled<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>false<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!--HA-related settings below--&gt;</span></span><br><span class="line"><span class="comment">&lt;!-- the HDFS nameservice is named nscluster; it must match the logical name used in core-site.xml --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span 
class="name">name</span>&gt;</span>dfs.nameservices<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>nscluster<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span>    </span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- IDs of the two NameNodes in nscluster: nn1 and nn2. The .nscluster suffix is the custom logical name; if the nameservice were nsc, the suffix would be .nsc. The same applies below --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.ha.namenodes.nscluster<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>nn1,nn2<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span>    </span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- RPC addresses of nn1 and nn2 --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.namenode.rpc-address.nscluster.nn1<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>master:9000<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span 
class="name">name</span>&gt;</span>dfs.namenode.rpc-address.nscluster.nn2<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>slave1:9000<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span>    </span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- HTTP (web UI) addresses of nn1 and nn2 --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.namenode.http-address.nscluster.nn1<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>master:50070<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.namenode.http-address.nscluster.nn2<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>slave1:50070<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span>    </span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- JournalNode location where the NameNodes store their shared edit log --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span 
class="name">name</span>&gt;</span>dfs.namenode.shared.edits.dir<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>qjournal://master:8485;slave1:8485;slave2:8485/nscluster<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span>    </span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- enable automatic failover --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.ha.automatic-failover.enabled<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>true<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span>     </span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- class HDFS clients use to find the active NameNode during failover --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.client.failover.proxy.provider.nscluster<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;/<span 
class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure><h3 id="yarn-site-xml"><a href="#yarn-site-xml" class="headerlink" title="yarn-site.xml"></a>yarn-site.xml</h3><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span 
class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br><span class="line">76</span><br><span class="line">77</span><br><span class="line">78</span><br><span class="line">79</span><br><span class="line">80</span><br><span class="line">81</span><br><span class="line">82</span><br><span class="line">83</span><br><span class="line">84</span><br><span class="line">85</span><br><span class="line">86</span><br><span class="line">87</span><br><span class="line">88</span><br><span class="line">89</span><br><span class="line">90</span><br><span class="line">91</span><br><span class="line">92</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">&lt;?xml version=<span class="string">&quot;1.0&quot;</span>?&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.nodemanager.aux-services<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">value</span>&gt;</span>mapreduce_shuffle<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span 
class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- HA settings below --&gt;</span></span><br><span class="line"><span class="comment">&lt;!-- Enable ResourceManager HA --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.ha.enabled<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>true<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- Cluster ID shared by the YARN HA pair --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.cluster-id<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>nscluster-yarn<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- Enable automatic failover --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.ha.automatic-failover.enabled<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span 
class="name">value</span>&gt;</span>true<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- Logical IDs of the two ResourceManagers --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.ha.rm-ids<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>rm1,rm2<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- Hosts of rm1 and rm2 --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.hostname.rm1<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>master<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.hostname.rm2<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>slave2<span class="tag">&lt;/<span 
class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- Web UI address of each ResourceManager --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.webapp.address.rm1<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>master:8088<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.webapp.address.rm2<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>slave2:8088<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- ZooKeeper quorum addresses --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.zk-address<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>master:2181,slave1:2181,slave2:2181<span class="tag">&lt;/<span 
class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- ZooKeeper znode under which the ResourceManager state is stored --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.zk-state-store.parent-path<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>/rmstore<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- YARN restart settings --&gt;</span></span><br><span class="line"><span class="comment">&lt;!-- Enable ResourceManager restart (state recovery) --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.recovery.enabled<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>true<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- Store the ResourceManager state in ZooKeeper --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.store.class<span class="tag">&lt;/<span 
class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- Enable NodeManager restart (recovery) --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.nodemanager.recovery.enabled<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>true<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- Fixed NodeManager RPC address (NodeManager recovery cannot use an ephemeral port) --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.nodemanager.address<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>0.0.0.0:45454<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure><h3 id="mapred-site-xml"><a href="#mapred-site-xml" class="headerlink" 
title="mapred-site.xml"></a>mapred-site.xml</h3><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">&lt;?xml version=<span class="string">&quot;1.0&quot;</span>?&gt;</span></span><br><span class="line"><span class="meta">&lt;?xml-stylesheet type=<span class="string">&quot;text/xsl&quot;</span> href=<span class="string">&quot;configuration.xsl&quot;</span>?&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">name</span>&gt;</span>mapreduce.framework.name<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">value</span>&gt;</span>yarn<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure><p>Sync the configuration files to all hosts:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">scp -r /usr/local/hadoop-2.6.0/etc/hadoop slave1:/usr/local/hadoop-2.6.0/etc/</span><br><span class="line">scp -r 
/usr/local/hadoop-2.6.0/etc/hadoop slave2:/usr/local/hadoop-2.6.0/etc/</span><br></pre></td></tr></table></figure><h2 id="启动"><a href="#启动" class="headerlink" title="启动"></a>Startup</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_"># </span><span class="language-bash">Run this on every host</span></span><br><span class="line">zkServer.sh start</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_"># </span><span class="language-bash">master</span> </span><br><span class="line">hadoop-daemons.sh start journalnode # Start a JournalNode on every host (the plural "daemons" script covers the whole cluster in one command)</span><br><span class="line">hdfs zkfc -formatZK # Initialize the HA state znode in ZooKeeper</span><br><span class="line">hdfs namenode -format # Format HDFS (replaces the deprecated "hadoop namenode -format")</span><br><span class="line">hadoop-daemon.sh start namenode # Start the NameNode on this host</span><br><span class="line">hadoop-daemons.sh start datanode # Start a DataNode on every host</span><br><span class="line">start-yarn.sh # Start the ResourceManager on this host and the NodeManagers on the slaves</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_"># </span><span class="language-bash">slave1</span></span><br><span class="line">hdfs namenode -bootstrapStandby # Copy the namespace metadata from the NameNode on master</span><br><span class="line">hadoop-daemon.sh start namenode # Start the NameNode on this host</span><br><span class="line">mr-jobhistory-daemon.sh start historyserver # 
Start the JobHistory server</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_"># </span><span class="language-bash">slave2</span></span><br><span class="line">yarn-daemon.sh start resourcemanager # Start the standby ResourceManager (rm2)</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_"># </span><span class="language-bash">master</span></span><br><span class="line">hadoop-daemons.sh start zkfc # Start ZKFC (only the NameNode hosts, master and slave1, keep it running; see jps below)</span><br></pre></td></tr></table></figure><p>Once this last step finishes, one of the two <code>NameNode</code>s becomes <code>active</code>.</p><h2 id="测试"><a href="#测试" class="headerlink" title="测试"></a>Testing</h2><p>NameNode nn1: <a href="http://master:50070/">http://master:50070</a></p><p>ResourceManager rm1: <a href="http://master:8088/">http://master:8088</a></p><p>NameNode nn2: <a href="http://slave1:50070/">http://slave1:50070</a></p><p>JobHistory server: <a href="http://slave1:19888/">http://slave1:19888</a></p><p>ResourceManager rm2: <a href="http://slave2:8088/">http://slave2:8088</a></p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">[root@master ~]# jps</span><br><span class="line">10003 DataNode</span><br><span class="line">10852 QuorumPeerMain</span><br><span class="line">10948 DFSZKFailoverController</span><br><span class="line">9797 JournalNode</span><br><span class="line">13050 Jps</span><br><span class="line">13004 ResourceManager</span><br><span class="line">9870 
NameNode</span><br><span class="line">11150 NodeManager</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">[root@slave1 ~]# jps</span><br><span class="line">7379 DataNode</span><br><span class="line">7301 JournalNode</span><br><span class="line">8070 NodeManager</span><br><span class="line">7975 DFSZKFailoverController</span><br><span class="line">8218 JobHistoryServer</span><br><span class="line">8778 Jps</span><br><span class="line">7902 QuorumPeerMain</span><br><span class="line">7615 NameNode</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">[root@slave2 ~]# jps</span><br><span class="line">7317 JournalNode</span><br><span class="line">7765 QuorumPeerMain</span><br><span class="line">7989 ResourceManager</span><br><span class="line">7880 NodeManager</span><br><span class="line">7385 DataNode</span><br><span class="line">9839 Jps</span><br></pre></td></tr></table></figure><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><p><a href="https://item.jd.com/12109713.html">Hadoop: The Definitive Guide@Tom White</a></p><p><a href="https://blog.51cto.com/sstudent/1388865">51CTO Blog@maisr25 - Automatic active/standby failover in Hadoop 2.0 HA</a></p><p><a href="https://blog.51cto.com/sstudent/1381674">51CTO Blog@maisr25 - Configuring QJM-based HA in Hadoop 2.0</a></p><p><a 
href="https://www.cnblogs.com/learn21cn/p/6184490.html">cnblogs@learn21cn - Setting up a ZooKeeper cluster and configuring Hadoop HA</a></p><p><a href="https://www.cnblogs.com/YellowstonePark/p/7750213.html">cnblogs@黄石公园 - Big data series (Hadoop): building a 3-node highly available Hadoop+ZooKeeper cluster</a></p><p><a href="https://www.cnblogs.com/jingpeng77/p/9652380.html">cnblogs@一蓑烟雨任平生 - Setting up a Hadoop cluster (pseudo-distributed) and computing pi with the bundled example jar</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;The HDFS NameNode is a single point of failure: if it goes down, all of HDFS becomes unavailable. That makes it important to build a highly available (HA) Hadoop cluster on top of ZooKeeper.&lt;/p&gt;</summary>
    
    
    
    <category term="大数据" scheme="https://blackyau.cc/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
    <category term="CentOS" scheme="https://blackyau.cc/tags/CentOS/"/>
    
    <category term="Linux" scheme="https://blackyau.cc/tags/Linux/"/>
    
    <category term="Hadoop" scheme="https://blackyau.cc/tags/Hadoop/"/>
    
    <category term="HDFS" scheme="https://blackyau.cc/tags/HDFS/"/>
    
    <category term="Hadoop HA" scheme="https://blackyau.cc/tags/Hadoop-HA/"/>
    
    <category term="Hadoop 高可用" scheme="https://blackyau.cc/tags/Hadoop-%E9%AB%98%E5%8F%AF%E7%94%A8/"/>
    
    <category term="ZooKeeper" scheme="https://blackyau.cc/tags/ZooKeeper/"/>
    
    <category term="YARN" scheme="https://blackyau.cc/tags/YARN/"/>
    
    <category term="Hadoop 集群管理" scheme="https://blackyau.cc/tags/Hadoop-%E9%9B%86%E7%BE%A4%E7%AE%A1%E7%90%86/"/>
    
  </entry>
  
  <entry>
    <title>Adding and Removing Nodes in Hadoop</title>
    <link href="https://blackyau.cc/15"/>
    <id>https://blackyau.cc/15</id>
    <published>2019-05-07T08:55:43.000Z</published>
    <updated>2019-05-28T08:21:45.000Z</updated>
    
    <content type="html"><![CDATA[<p>Adding and removing nodes in a Hadoop cluster is fairly easy, so this post is just a record of the steps.</p><span id="more"></span><h2 id="环境介绍"><a href="#环境介绍" class="headerlink" title="环境介绍"></a>Environment</h2><p>The cluster runs on virtual machines: one <code>NameNode</code> host, which also runs a <code>DataNode</code>, plus two more <code>DataNode</code> hosts, for three <code>DataNode</code>s in total.</p><table><thead><tr><th>HostName</th><th>Role</th><th>IP</th></tr></thead><tbody><tr><td>master</td><td>NameNode&#x2F;DataNode</td><td>192.168.66.128</td></tr><tr><td>slave1</td><td>DataNode</td><td>192.168.66.129</td></tr><tr><td>slave2</td><td>DataNode</td><td>192.168.66.130</td></tr></tbody></table><p><code>dfs.replication</code> in <code>hdfs-site.xml</code> is left at its default value of <code>3</code>.</p><h2 id="关键配置"><a href="#关键配置" class="headerlink" title="关键配置"></a>Key settings</h2><p>I did not set <code>dfs.hosts</code> or <code>dfs.hosts.exclude</code> when I first deployed the cluster, so here is a brief introduction.</p><p><code>dfs.hosts</code> is an <code>hdfs-site.xml</code> setting that points to a file listing the hosts allowed to connect to the <code>NameNode</code>. By default (when it is not set), all hosts are allowed.</p><p><code>dfs.hosts.exclude</code> is also an <code>hdfs-site.xml</code> setting and does the opposite of <code>dfs.hosts</code>: it points to a file listing the hosts that are not allowed to connect to the <code>NameNode</code>. By default (when it is not set), no hosts are excluded.</p><p>Similarly, YARN's configuration file <code>yarn-site.xml</code> has two ResourceManager settings that mirror these HDFS ones: <code>yarn.resourcemanager.nodes.include-path</code> and <code>yarn.resourcemanager.nodes.exclude-path</code>.</p><p>Usually <code>yarn.resourcemanager.nodes.include-path</code> and <code>dfs.hosts</code> point to the same file, and <code>yarn.resourcemanager.nodes.exclude-path</code> and <code>dfs.hosts.exclude</code> point to another shared file. In some edge cases, however, HDFS and YARN interpret these files slightly differently; this is covered later.</p><h2 id="准备工作"><a href="#准备工作" class="headerlink" title="准备工作"></a>Preparation</h2><p>For convenience, first create the <code>include</code> and <code>exclude</code> files, write the existing cluster hosts into <code>include</code>, and then check that the cluster still runs normally.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">vi $HADOOP_HOME/etc/hadoop/include</span><br><span class="line">master</span><br><span class="line">slave1</span><br><span 
class="line">slave2</span><br></pre></td></tr></table></figure><p>Leave <code>exclude</code> empty for now; it only needs to exist.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">touch $HADOOP_HOME/etc/hadoop/exclude</span><br></pre></td></tr></table></figure><p>Stop the services and edit the configuration files.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">stop-all.sh</span><br><span class="line">vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml</span><br></pre></td></tr></table></figure><p>Add the following settings:</p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line"></span><br><span class="line"> <span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.replication<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="comment">&lt;!-- One of the 3 DataNodes is about to be removed, so lower the replication factor to 2 to avoid errors --&gt;</span></span><br><span class="line">  <span 
class="tag">&lt;<span class="name">value</span>&gt;</span>2<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line">  <span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.hosts<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="comment">&lt;!-- File listing the hosts allowed to connect to HDFS --&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>/usr/local/hadoop-2.6.0/etc/hadoop/include<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line">  <span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.hosts.exclude<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="comment">&lt;!-- File listing the hosts excluded from HDFS --&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>/usr/local/hadoop-2.6.0/etc/hadoop/exclude<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi 
$HADOOP_HOME/etc/hadoop/yarn-site.xml</span><br></pre></td></tr></table></figure><p>Add the following properties inside <code>&lt;configuration&gt;</code>:</p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.nodes.include-path<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="comment">&lt;!-- File listing the allowed hosts --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>/usr/local/hadoop-2.6.0/etc/hadoop/include<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.nodes.exclude-path<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="comment">&lt;!-- File listing the excluded hosts --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>/usr/local/hadoop-2.6.0/etc/hadoop/exclude<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br></pre></td></tr></table></figure><p>Start the services:</p><figure class="highlight 
shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">start-dfs.sh</span><br><span class="line">start-yarn.sh</span><br></pre></td></tr></table></figure><p>Use <code>hdfs dfs -ls /</code> and commands such as <code>get/put</code> to check that <code>HDFS</code> works, and run a query such as a <code>hive</code> <code>select count(*)</code> to check that <code>MapReduce</code> works. If both work, move on to the next step.</p><h2 id="删除节点"><a href="#删除节点" class="headerlink" title="删除节点"></a>Removing a node</h2><p>Whether a node may connect to the <code>ResourceManager</code> is easy to determine: it can connect only if it appears in the <code>include</code> file and does not appear in the <code>exclude</code> file. Note that if the <code>include</code> file is unspecified or empty, all nodes are treated as included.</p><p>For <code>HDFS</code> the rules are slightly different: a <code>DataNode</code> that appears in both <code>include</code> and <code>exclude</code> can still connect, but is scheduled for decommissioning (removal). The table below summarizes the combinations for a <code>DataNode</code> when an <code>include</code> file is in use.</p><table><thead><tr><th>In include?</th><th>In exclude?</th><th>State</th></tr></thead><tbody><tr><td>No</td><td>No</td><td>Cannot connect</td></tr><tr><td>No</td><td>Yes</td><td>Cannot connect</td></tr><tr><td>Yes</td><td>No</td><td>Can connect</td></tr><tr><td>Yes</td><td>Yes</td><td>Can connect, will be decommissioned</td></tr></tbody></table><blockquote><p>Note that the files named by <code>dfs.hosts</code>, <code>yarn.resourcemanager.nodes.include-path</code> and the related properties (<code>include</code> and <code>exclude</code>) are not the same as the <code>slaves</code> file. The former are read by the <code>NameNode</code> and the ResourceManager to decide which nodes may connect. The <code>slaves</code> file is used by the Hadoop control scripts for cluster-wide operations such as restarting the cluster; the Hadoop daemons never read it.</p></blockquote><p>Open the <code>exclude</code> file and add the node to be removed:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi $HADOOP_HOME/etc/hadoop/exclude</span><br></pre></td></tr></table></figure><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span 
class="line">slave2</span><br></pre></td></tr></table></figure><p>Run the following commands to push the updated node lists to the <code>NameNode</code> and the <code>ResourceManager</code>:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">hdfs dfsadmin -refreshNodes</span><br><span class="line">yarn rmadmin -refreshNodes</span><br></pre></td></tr></table></figure><p>In the web UI, the <code>Admin State</code> of the <code>DataNode</code> goes from <code>In Service</code> &gt; <code>Decommission In Progress</code> &gt; <code>Decommissioned</code>.</p><p><img data-src="https://st.blackyau.net/blog/15/1.png" alt="1"></p><p><img data-src="https://st.blackyau.net/blog/15/2.png" alt="2"></p><blockquote><p>If the node stays in <code>Decommission In Progress</code> for a long time, you have probably not lowered <code>dfs.replication</code>. I ran into this myself. Decommissioning only finishes after every block on the node has been copied to other DataNodes up to the replication factor. With <code>dfs.replication</code> at 3 and only two DataNodes left, it can never complete.</p></blockquote><blockquote><p>After I removed one node, HDFS also kept logging the warning below, and I never found a fix. It did not affect normal HDFS operation, so I ignored it: <code>WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to place enough replicas, still in need of 1 to reach 3 (unavailableStorages&#x3D;[DISK, ARCHIVE], storagePolicy&#x3D;BlockStoragePolicy{HOT:7, storageTypes&#x3D;[DISK], creationFallbacks&#x3D;[], replicationFallbacks&#x3D;[ARCHIVE]}, newBlock&#x3D;false) All required storage types are unavailable:  unavailableStorages&#x3D;[DISK, ARCHIVE], storagePolicy&#x3D;BlockStoragePolicy{HOT:7, storageTypes&#x3D;[DISK], creationFallbacks&#x3D;[], replicationFallbacks&#x3D;[ARCHIVE]}</code>. The phrase "still in need of 1 to reach 3" suggests the same cause: a replication factor of 3 with only two DataNodes remaining.</p></blockquote><p>The ResourceManager status looks like this:</p><p><img data-src="https://st.blackyau.net/blog/15/4.png" alt="4"></p><p>Now you can remove the node from the <code>include</code> file entirely:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">vi $HADOOP_HOME/etc/hadoop/include</span><br><span class="line">master</span><br><span 
class="line">slave1</span><br></pre></td></tr></table></figure><p>Refresh the node list:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">hdfs dfsadmin -refreshNodes</span><br></pre></td></tr></table></figure><p><img data-src="https://st.blackyau.net/blog/15/3.png" alt="3"></p><p>As the screenshot shows, the node has now been removed completely.</p><h2 id="添加节点"><a href="#添加节点" class="headerlink" title="添加节点"></a>Adding a node</h2><p>Re-adding a node that was just removed tends to confuse the cluster, so restart it first:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">stop-yarn.sh</span><br><span class="line">stop-dfs.sh</span><br><span class="line">start-dfs.sh</span><br><span class="line">start-yarn.sh</span><br></pre></td></tr></table></figure><p>Add the node back to <code>include</code> and delete it from <code>exclude</code>:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">vi $HADOOP_HOME/etc/hadoop/include</span><br><span class="line">master</span><br><span class="line">slave1</span><br><span class="line">slave2</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">vi $HADOOP_HOME/etc/hadoop/exclude</span><br><span class="line"># in vi, delete the slave2 line (e.g. with dd)</span><br></pre></td></tr></table></figure>
<p>Refresh the node lists:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">hdfs dfsadmin -refreshNodes</span><br><span class="line">yarn rmadmin -refreshNodes</span><br></pre></td></tr></table></figure><p>The node was added successfully.</p><p><img data-src="https://st.blackyau.net/blog/15/5.png" alt="5"></p><p><img data-src="https://st.blackyau.net/blog/15/6.png" alt="6"></p><p>For the other posts in the big-data series, see <a href="https://blackyau.cc/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/">here</a>.</p><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><p><a href="https://item.jd.com/12109713.html">Hadoop: The Definitive Guide@Tom White</a></p><p><a href="http://wenda.chinahadoop.cn/question/3051">小象问答@fish - Removing a DataNode from a Hadoop cluster stays stuck in Decommission in progress</a></p><p><a href="https://issues.apache.org/jira/browse/HDFS-1590">Apache issues@Jim Huang - Decommissioning never ends when node to decommission has blocks that are under-replicated and cannot be replicated to the expected level of replication</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;Adding and removing nodes in a Hadoop cluster is fairly easy; this post records how I did it.&lt;/p&gt;</summary>
    
    
    
    <category term="大数据" scheme="https://blackyau.cc/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
    <category term="CentOS" scheme="https://blackyau.cc/tags/CentOS/"/>
    
    <category term="Linux" scheme="https://blackyau.cc/tags/Linux/"/>
    
    <category term="Hadoop" scheme="https://blackyau.cc/tags/Hadoop/"/>
    
    <category term="HDFS" scheme="https://blackyau.cc/tags/HDFS/"/>
    
    <category term="YARN" scheme="https://blackyau.cc/tags/YARN/"/>
    
    <category term="VMware" scheme="https://blackyau.cc/tags/VMware/"/>
    
    <category term="Hadoop 集群管理" scheme="https://blackyau.cc/tags/Hadoop-%E9%9B%86%E7%BE%A4%E7%AE%A1%E7%90%86/"/>
    
  </entry>
  
  <entry>
    <title>Hadoop Fully Distributed Deployment</title>
    <link href="https://blackyau.cc/14"/>
    <id>https://blackyau.cc/14</id>
    <published>2019-04-24T12:00:29.000Z</published>
    <updated>2019-04-25T08:18:40.000Z</updated>
    
    <content type="html"><![CDATA[<p>I wrote about deploying Hadoop in pseudo-distributed mode before; this time it is fully distributed mode. Overall the configuration differs little from the pseudo-distributed one. The main differences are how the machines connect to each other, plus a few configuration changes.</p><span id="more"></span><p>Unlike my earlier step-by-step tutorial, this post mainly covers the problems I ran into during this setup. It also supplements the earlier pseudo-distributed guide.</p><h2 id="安装-CentOS"><a href="#安装-CentOS" class="headerlink" title="安装 CentOS"></a>Installing CentOS</h2><p>Last time I gave the VM only <code>1G</code> of memory, which is too little; give <code>master</code> at least <code>4G</code>. You can enable networking during installation and set the <code>hostname</code> at the same time, which saves changing it later. With networking enabled, also turn on <code>network time synchronization</code> and set the time zone to match the host machine, which makes debugging easier. Automatic disk partitioning is fine.</p><h2 id="基础环境配置"><a href="#基础环境配置" class="headerlink" title="基础环境配置"></a>Basic environment setup</h2><p>Disabling <code>selinux</code> and the <code>firewall</code> is a must and saves a lot of trouble. This time I also gave each service its own user: hadoop, yarn, hdfs and mapred, all in the <code>hadoop</code> group. Previously I just used <code>root</code> for everything, so this setup requires much closer attention to permissions. <code>Permission denied</code> errors pop up when you least expect them, and other group members can only read newly created files. Each time, I had to create the file manually and then run <code>chmod g+w xxxxxx</code>, which is tedious. Next is <code>ssh</code>. Because <code>hdfs</code> and <code>yarn</code> talk to each other across machines, collect the <code>id_rsa.pub</code> of the <code>hdfs</code> and <code>yarn</code> users on every machine into a single <code>authorized_keys</code> file, one key per line, like this:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">ssh-rsa AAAAB3N......KcSd+lF9EYT7ED9KMWlajl yarn@master</span><br><span class="line">ssh-rsa AAAAB3N......pggVZIG3+ElmBlPqYp3O8D hdfs@master</span><br><span class="line">ssh-rsa AAAAB3N......a2X44NMnhAbcQVDcMwdoIA hdfs@slave1</span><br><span class="line">ssh-rsa AAAAB3N......PCBqtandHgi0CwKsLsCv2d yarn@slave1</span><br><span class="line">ssh-rsa AAAAB3N......GXNfdgrthrtgf0+1DrYHGp hdfs@slave2</span><br><span class="line">ssh-rsa AAAAB3N......PXvHKFTQ2b8Xt8ZAvB/dKy yarn@slave2</span><br></pre></td></tr></table></figure><h2 id="Hadoop配置"><a href="#Hadoop配置" class="headerlink" 
title="Hadoop配置"></a>Hadoop configuration</h2><p>The configuration is not much different from the pseudo-distributed setup; only a few details change. Conveniently, every machine uses the same configuration files. There is no need to write a separate configuration for each machine; just copy the files over. With many machines, tools exist that sync configuration automatically, but I don't need them yet and haven't looked into them, so I'll skip that. Here are the configuration files.</p><h3 id="core-site-xml"><a href="#core-site-xml" class="headerlink" title="core-site.xml"></a>core-site.xml</h3><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line"><span class="comment">&lt;!-- Hadoop Core settings: common I/O settings for HDFS, MapReduce and YARN --&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="comment">&lt;!-- Default file system; the port defaults to 8020 --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>fs.defaultFS<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>hdfs://master/<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="comment">&lt;!-- Base storage directory; by default all HDFS data is stored under it --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>hadoop.tmp.dir<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>/usr/local/hadoop-2.6.0/tmp<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure><h3 id="hdfs-site-xml"><a href="#hdfs-site-xml" class="headerlink" title="hdfs-site.xml"></a>hdfs-site.xml</h3><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line"><span class="comment">&lt;!-- HDFS settings; I have no special requirements, so most directories simply follow hadoop.tmp.dir above --&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="comment">&lt;!-- Default HDFS replication factor; 3 is already the default, so this can be left out --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.replication<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>3<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span 
class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure><h3 id="yarn-site-xml"><a href="#yarn-site-xml" class="headerlink" title="yarn-site.xml"></a>yarn-site.xml</h3><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- Settings for the YARN daemons: the ResourceManager, the web app proxy server and the NodeManagers --&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="comment">&lt;!-- Host that runs the ResourceManager --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.hostname<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>master<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span 
class="name">property</span>&gt;</span></span><br><span class="line"><span class="comment">&lt;!-- Shuffle service for MapReduce. The official docs seemed to say this is the default, but the book says it must be set manually for jobs to run properly; to be looked into later --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.nodemanager.aux-services<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>mapreduce_shuffle<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure><h3 id="mapred-site-xml"><a href="#mapred-site-xml" class="headerlink" title="mapred-site.xml"></a>mapred-site.xml</h3><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">configuration</span>&gt;</span></span><br><span class="line"><span class="comment">&lt;!-- This file does not exist by default: copy mapred-site.xml.template to mapred-site.xml. As the name says, it holds the MapReduce settings --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="comment">&lt;!-- Framework that runs MapReduce jobs; if unset it defaults to local mode, and jobs do not show up in the web UI on port 8088 --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>mapreduce.framework.name<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span 
class="name">value</span>&gt;</span>yarn<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;/<span class="name">configuration</span>&gt;</span></span><br></pre></td></tr></table></figure><h3 id="slaves"><a href="#slaves" class="headerlink" title="slaves"></a>slaves</h3><p>This is a plain-text file in the same directory as the other configuration files. It lists the cluster's worker hosts by <code>hostname</code> or <code>ip</code>, that is, the machines on which the control scripts start a DataNode and a NodeManager. <code>master</code> is listed here too, so it also acts as a worker. I used hostnames.</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">master</span><br><span class="line">slave1</span><br><span class="line">slave2</span><br></pre></td></tr></table></figure><h2 id="运行"><a href="#运行" class="headerlink" title="运行"></a>Running</h2><p>One more thing to watch out for when starting the cluster: I use different users for different daemons, so I have to switch users back and forth. The table below lists which user runs each start command.</p><table><thead><tr><th>User</th><th>Start command</th></tr></thead><tbody><tr><td><code>hdfs</code></td><td><code>hdfs namenode -format</code></td></tr><tr><td><code>hdfs</code></td><td><code>start-dfs.sh</code></td></tr><tr><td><code>yarn</code></td><td><code>start-yarn.sh</code></td></tr><tr><td><code>mapred</code></td><td><code>mr-jobhistory-daemon.sh start historyserver</code></td></tr></tbody></table><p>The JobHistory web UI loads, but it does not show any <code>job</code>. The <code>log</code> showed another permission problem, so grant the permissions with <code>hdfs dfs -chown -R mapred /tmp/hadoop-yarn/staging/history</code>. If that has no effect, open the directory up with <code>hdfs dfs -chmod 777 /tmp/hadoop-yarn/staging/history</code>.</p><p>If startup fails and the data is not synchronized to all nodes, delete everything in <code>hadoop.tmp.dir</code> and <code>/tmp</code> and format again. Avoid formatting too many times, though. Repeated formatting leaves the NameNode and DataNodes with mismatched <code>ID</code>s (clusterID), which is a real headache. <a href="https://blackyau.cc/12#%E6%8E%92%E9%94%99">Earlier</a> this problem cost me one or two days.</p><h2 id="HIVE"><a href="#HIVE" class="headerlink" 
title="HIVE"></a>HIVE</h2><p>When starting Hive, remember again to remove the <code>jar</code> that conflicts with <code>Hadoop</code>. Both jars below provide an SLF4J binding, so deleting either one is enough:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">/usr/local/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar</span><br><span class="line">/usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/hive-jdbc-1.1.0-standalone.jar</span><br></pre></td></tr></table></figure><p>Alternatively, set an environment variable so that the user classpath (Hive's own, newer jars) takes precedence over Hadoop's. With this setting, though, a string of warnings is still printed, which is annoying.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">export HADOOP_USER_CLASSPATH_FIRST=true</span><br></pre></td></tr></table></figure><p>For the other posts in the big-data series, see <a href="https://blackyau.cc/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/">here</a>.</p><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><p><a href="https://item.jd.com/12109713.html">Hadoop: The Definitive Guide@Tom White</a></p>]]></content>
    
    
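    <summary type="html">&lt;p&gt;I wrote about deploying Hadoop in pseudo-distributed mode before; this time it is fully distributed mode. Overall the configuration differs little from the pseudo-distributed one. The main differences are how the machines connect to each other, plus a few configuration changes.&lt;/p&gt;</summary>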
    
    
    
    <category term="大数据" scheme="https://blackyau.cc/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
    <category term="CentOS" scheme="https://blackyau.cc/tags/CentOS/"/>
    
    <category term="Linux" scheme="https://blackyau.cc/tags/Linux/"/>
    
    <category term="Hadoop" scheme="https://blackyau.cc/tags/Hadoop/"/>
    
    <category term="HDFS" scheme="https://blackyau.cc/tags/HDFS/"/>
    
    <category term="YARN" scheme="https://blackyau.cc/tags/YARN/"/>
    
    <category term="VMware" scheme="https://blackyau.cc/tags/VMware/"/>
    
    <category term="MapReduce" scheme="https://blackyau.cc/tags/MapReduce/"/>
    
  </entry>
  
  <entry>
    <title>Hive Configuration</title>
    <link href="https://blackyau.cc/13"/>
    <id>https://blackyau.cc/13</id>
    <published>2019-04-19T12:03:50.000Z</published>
    <updated>2019-06-09T16:46:10.000Z</updated>
    
    <content type="html"><![CDATA[<p>How do you move existing data that lives in a traditional relational database and is queried with Structured Query Language (SQL) onto Hadoop? For the large population of SQL users, Hive is the answer. It provides an SQL dialect called Hive Query Language (HiveQL or HQL for short) for querying data stored in a Hadoop cluster.</p><span id="more"></span><h2 id="目标"><a href="#目标" class="headerlink" title="目标"></a>Goals</h2><ul><li>Create tables in Hive</li><li>Load data into Hive</li><li>Write HQL to query and aggregate data</li><li>Export data to MySQL with Sqoop</li></ul><h2 id="准备工作"><a href="#准备工作" class="headerlink" title="准备工作"></a>Preparation</h2><p>I chose the older Hive 1.1.0 because it is a good match for the <a href="/11" title="Hadoop 伪分布部署">Hadoop I installed earlier</a>.</p><p>This version is so old that the mirrors no longer carry it; only the Apache archive does:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">curl -O https://archive.apache.org/dist/hive/hive-1.1.0/apache-hive-1.1.0-bin.tar.gz</span><br><span class="line">tar -xf apache-hive-1.1.0-bin.tar.gz</span><br></pre></td></tr></table></figure><h2 id="配置环境变量"><a href="#配置环境变量" class="headerlink" title="配置环境变量"></a>Setting environment variables</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /etc/profile</span><br></pre></td></tr></table></figure><p>Append the following at the end of the file:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">export HIVE_HOME=/root/apache-hive-1.1.0-bin</span><br><span class="line">export PATH=$PATH:$HIVE_HOME/bin</span><br></pre></td></tr></table></figure><p>Reload the profile so the current shell picks up the changes:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">source /etc/profile</span><br></pre></td></tr></table></figure><h2 id="修改配置文件"><a href="#修改配置文件" class="headerlink" title="修改配置文件"></a>Editing the configuration file</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span 
class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">cp /root/apache-hive-1.1.0-bin/conf/hive-default.xml.template /root/apache-hive-1.1.0-bin/conf/hive-site.xml # use the default template as the actual config file</span><br><span class="line">vi /root/apache-hive-1.1.0-bin/conf/hive-site.xml</span><br></pre></td></tr></table></figure><p>Add the following near the top of the file, just inside the <code>&lt;configuration&gt;</code> element. Without it, Hive fails at startup with a <a href="https://stackoverflow.com/questions/27099898/java-net-urisyntaxexception-when-starting-hive">java.net.URISyntaxException</a>.</p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>system:java.io.tmpdir<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>/root/apache-hive-1.1.0-bin/tmp<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>system:user.name<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>root<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br></pre></td></tr></table></figure><h2 
id="启动"><a href="#启动" class="headerlink" title="启动"></a>Starting Hive</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">hive</span><br></pre></td></tr></table></figure><p>If your prompt changes to</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_">hive&gt; </span></span><br></pre></td></tr></table></figure><p>then Hive has started successfully.</p><p>If you see this kind of error:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">SLF4J: Class path contains multiple SLF4J bindings.</span><br></pre></td></tr></table></figure><p>Run the following command to move the conflicting <code>jar</code> to your home directory (you can also delete it or add a <code>.bak</code> suffix):</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">mv /root/apache-hive-1.1.0-bin/lib/hive-jdbc-1.1.0-standalone.jar ~</span><br></pre></td></tr></table></figure><p>If you see this error:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">[ERROR] Terminal initialization failed; falling back to unsupported</span><br><span class="line">java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected</span><br><span class="line">at jline.TerminalFactory.create(TerminalFactory.java:101)</span><br><span class="line">at jline.TerminalFactory.get(TerminalFactory.java:158)</span><br><span class="line">at jline.console.ConsoleReader.&lt;init&gt;(ConsoleReader.java:229)</span><br><span class="line">at jline.console.ConsoleReader.&lt;init&gt;(ConsoleReader.java:221)</span><br><span class="line">at jline.console.ConsoleReader.&lt;init&gt;(ConsoleReader.java:209)</span><br><span class="line">at org.apache.hadoop.hive.cli.CliDriver.getConsoleReader(CliDriver.java:773)</span><br><span class="line">at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:715)</span><br><span class="line">at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)</span><br><span class="line">at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)</span><br><span class="line">at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)</span><br><span class="line">at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)</span><br><span class="line">at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)</span><br><span class="line">at java.lang.reflect.Method.invoke(Method.java:498)</span><br><span class="line">at org.apache.hadoop.util.RunJar.run(RunJar.java:221)</span><br><span class="line">at org.apache.hadoop.util.RunJar.main(RunJar.java:136)</span><br></pre></td></tr></table></figure><p>The cause is the old <code>jline-0.9.94.jar</code> shipped with Hadoop, which clashes with the newer jline that Hive uses. Run the following command to move it to your home directory (you can also delete it or add a <code>.bak</code> suffix):</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">mv /usr/local/hadoop-2.6.0/share/hadoop/yarn/lib/jline-0.9.94.jar ~</span><br></pre></td></tr></table></figure><h2 id="建表"><a href="#建表" class="headerlink" title="建表"></a>Creating a table</h2><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span 
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">create</span> database test; <span class="comment">-- create the database</span></span><br><span class="line"><span class="keyword">show</span> databases; <span class="comment">-- list the existing databases</span></span><br><span class="line">use test; <span class="comment">-- switch to this database</span></span><br><span class="line"><span class="keyword">create table</span> test1(<span class="keyword">hold</span> string, city string, url string) <span class="type">row</span> format delimited fields terminated <span class="keyword">by</span> <span class="string">&#x27;\t&#x27;</span>;</span><br><span class="line"><span class="comment">-- create the table, using \t as the field delimiter</span></span><br><span class="line"><span class="keyword">desc</span> test1; <span class="comment">-- show the table structure</span></span><br><span class="line">quit;</span><br></pre></td></tr></table></figure><h2 id="插入数据"><a href="#插入数据" class="headerlink" title="插入数据"></a>Loading data</h2><p>This is the test data I used: <a href="https://st.blackyau.net/blog/13/small">small</a></p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">curl -O https://st.blackyau.net/blog/13/small # download the file locally</span><br><span class="line">hdfs dfs -put ./small / # upload the file to HDFS</span><br></pre></td></tr></table></figure><p>The following commands are run in the Hive CLI:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">use test; <span class="comment">-- use the test database</span></span><br><span class="line">load data inpath &quot;/small&quot; <span class="keyword">into</span> <span 
class="keyword">table</span> test1; <span class="comment">-- move the HDFS file /small into table test1</span></span><br><span class="line"><span class="keyword">select</span> <span class="operator">*</span> <span class="keyword">from</span> test1;</span><br><span class="line"><span class="keyword">select</span> city, url <span class="keyword">from</span> test1;</span><br></pre></td></tr></table></figure><p>If the output looks like this, the import succeeded:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">OK</span><br><span class="line">白银by.58.com/</span><br><span class="line">庆阳qingyang.58.com/</span><br><span class="line">嘉峪关jyg.58.com/</span><br><span class="line">... (omitted)</span><br><span class="line">宜宾yb.58.com/</span><br><span class="line">自贡zg.58.com/</span><br><span class="line">乐山ls.58.com/</span><br><span class="line">NULL</span><br><span class="line">Time taken: 0.062 seconds, Fetched: 51 row(s)</span><br></pre></td></tr></table></figure><h2 id="HIVE-参数设置"><a href="#HIVE-参数设置" class="headerlink" title="HIVE 参数设置"></a>Hive parameter settings</h2><p>The parameters listed below may come in handy:</p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span 
class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br><span class="line">52</span><br><span class="line">53</span><br><span class="line">54</span><br><span class="line">55</span><br><span class="line">56</span><br><span class="line">57</span><br><span class="line">58</span><br><span class="line">59</span><br><span class="line">60</span><br><span class="line">61</span><br><span class="line">62</span><br><span class="line">63</span><br><span class="line">64</span><br><span class="line">65</span><br><span class="line">66</span><br><span class="line">67</span><br><span class="line">68</span><br><span class="line">69</span><br><span class="line">70</span><br><span class="line">71</span><br><span class="line">72</span><br><span class="line">73</span><br><span class="line">74</span><br><span class="line">75</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="comment">&lt;!-- Most HIVE operations launch a MapReduce job. With this set to true, HIVE automatically runs a job in local mode when its input is small enough, reducing resource consumption --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>hive.exec.mode.local.auto<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>true<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="comment">&lt;!-- Path(s) from which HIVE loads auxiliary JARs --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>hive.aux.jars.path<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span 
class="tag">&lt;<span class="name">value</span>/&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="comment">&lt;!-- With the LIMIT optimization enabled: the minimum size to guarantee for each sampled row --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>hive.limit.row.max.size<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>100000<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">description</span>&gt;</span>When trying a smaller subset of data for simple LIMIT, how much size we need to guarantee each row to have at least.<span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="comment">&lt;!-- With the LIMIT optimization enabled: the maximum number of files that may be sampled --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>hive.limit.optimize.limit.file<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>10<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">description</span>&gt;</span>When trying a smaller subset of data for simple LIMIT, maximum number of files we can sample.<span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line"><span 
class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="comment">&lt;!-- Whether to enable the LIMIT optimization. It samples the input data, so some relevant input may never be processed; it is only a sample --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>hive.limit.optimize.enable<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>true<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">description</span>&gt;</span>Whether to enable to optimization to trying a smaller subset of data for simple LIMIT first.<span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="comment">&lt;!-- Parallel execution: a query may consist of several stages that do not all depend on each other, and running the independent ones concurrently can let the job finish sooner. It also raises cluster utilization --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>hive.exec.parallel<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>true<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">description</span>&gt;</span>Whether to execute jobs in parallel<span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span 
class="name">property</span>&gt;</span></span><br><span class="line">  <span class="comment">&lt;!-- Maximum number of jobs that may run in parallel --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>hive.exec.parallel.thread.number<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>8<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">description</span>&gt;</span>How many jobs at most can be executed in parallel<span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="comment">&lt;!-- Strict mode. When set to strict, risky queries are rejected, notably: 1. queries on a partitioned table without a partition filter (scanning every partition is not allowed; the WHERE clause must restrict a partition column) 2. ORDER BY without LIMIT 3. Cartesian products (multi-table queries must use JOIN</span></span><br><span class="line"><span class="comment">  with an ON clause); the description below also lists comparing bigints with strings or doubles --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>hive.mapred.mode<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>nonstrict<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">description</span>&gt;</span></span><br><span class="line">    The mode in which the Hive operations are being performed.</span><br><span class="line">    In strict mode, some risky queries are not allowed to run. 
They include:</span><br><span class="line">      Cartesian Product.</span><br><span class="line">      No partition being picked up for a query.</span><br><span class="line">      Comparing bigints and strings.</span><br><span class="line">      Comparing bigints and doubles.</span><br><span class="line">      Orderby without limit.</span><br><span class="line">  <span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="comment">&lt;!-- Input bytes per reducer. HIVE estimates the number of reducers from the input size divided by this value, so changing it changes how many reducers are used --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>hive.exec.reducers.bytes.per.reducer<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>256000000<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">description</span>&gt;</span>size per reducer.The default is 256Mb, i.e if the input size is 1G, it will use 4 reducers.<span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="comment">&lt;!-- Maximum number of reducers a single query may use --&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>hive.exec.reducers.max<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>1009<span 
class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">description</span>&gt;</span></span><br><span class="line">    max number of reducers will be used. If the one specified in the configuration parameter mapred.reduce.tasks is</span><br><span class="line">    negative, Hive will use this one as the max number of reducers when automatically determine number of reducers.</span><br><span class="line">  <span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br></pre></td></tr></table></figure><h2 id="使用-HIVE-导出数据"><a href="#使用-HIVE-导出数据" class="headerlink" title="使用 HIVE 导出数据"></a>Exporting data with HIVE</h2><h3 id="默认参数导出"><a href="#默认参数导出" class="headerlink" title="默认参数导出"></a>Export with default settings</h3><p>This command is run from the <code>shell</code>. HIVE's <code>-e</code> option executes the given statements and exits immediately, and the result is redirected into a file. Fields in the exported file are separated by tab characters (<code>\t</code>).</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">hive -e &#x27;use data;select * from tenxun&#x27; &gt; /home/hadoop/out</span><br></pre></td></tr></table></figure><p>The next one is an <code>HQL</code> statement that writes the query result as files into the specified <code>local directory</code>, replacing whatever the directory already contains. Unlike the command above, fields are separated by HIVE's default delimiter <code>\001</code> (Ctrl-A) rather than by tabs.</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">insert</span> overwrite <span class="keyword">local</span> directory <span class="string">&#x27;/home/hadoop/out2&#x27;</span></span><br><span class="line"><span class="keyword">select</span> <span class="operator">*</span> <span class="keyword">from</span> tenxun;</span><br></pre></td></tr></table></figure><h3 id="导出-csv-文件"><a href="#导出-csv-文件" class="headerlink" title="导出 csv 文件"></a>Exporting CSV files</h3><p>The first part is the same as above; the output is then piped through 
<code>sed</code>, which replaces every <code>\t</code> with <code>,</code>. The trailing <code>g</code> flag replaces every match on a line rather than only the first. Fields are not quoted, so values that themselves contain commas will break the CSV.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">hive -e &#x27;use data;select * from tenxun&#x27; | sed &#x27;s/[\t]/,/g&#x27; &gt; /home/hadoop/out5.csv</span><br></pre></td></tr></table></figure><p>With the HQL statement, the field delimiter can be set to <code>,</code> directly:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">insert</span> overwrite <span class="keyword">local</span> directory <span class="string">&#x27;/home/hadoop/out4&#x27;</span></span><br><span class="line">  <span class="type">row</span> format delimited fields terminated <span class="keyword">by</span> <span class="string">&#x27;,&#x27;</span></span><br><span class="line">  <span class="keyword">select</span> <span class="operator">*</span> <span class="keyword">from</span> tenxun;</span><br></pre></td></tr></table></figure><h2 id="Sqoop-数据推送"><a href="#Sqoop-数据推送" class="headerlink" title="Sqoop 数据推送"></a>Pushing data with Sqoop</h2><p>First, download and extract Sqoop:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">curl -O https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz</span><br><span class="line">tar -xf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz</span><br></pre></td></tr></table></figure><h3 id="配置环境变量-1"><a href="#配置环境变量-1" class="headerlink" title="配置环境变量"></a>Configure environment variables</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /etc/profile</span><br></pre></td></tr></table></figure><p>Append the following to the end of the file:</p><figure class="highlight 
shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">export SQOOP_HOME=/root/sqoop-1.4.7.bin__hadoop-2.6.0</span><br><span class="line">export PATH=$PATH:$SQOOP_HOME/bin</span><br></pre></td></tr></table></figure><p>Reload the profile so the variables take effect in the current shell, then check that Sqoop runs:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">source /etc/profile</span><br><span class="line">sqoop help</span><br></pre></td></tr></table></figure><p>If the terminal prints output like the following, the configuration works:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line">Warning: /root/sqoop-1.4.7.bin__hadoop-2.6.0/../hbase does not exist! HBase imports will fail.</span><br><span class="line">Please set $HBASE_HOME to the root of your HBase installation.</span><br><span class="line">Warning: /root/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! 
HCatalog jobs will fail.</span><br><span class="line">Please set $HCAT_HOME to the root of your HCatalog installation.</span><br><span class="line">Warning: /root/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.</span><br><span class="line">Please set $ACCUMULO_HOME to the root of your Accumulo installation.</span><br><span class="line">Warning: /root/sqoop-1.4.7.bin__hadoop-2.6.0/../zookeeper does not exist! Accumulo imports will fail.</span><br><span class="line">Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.</span><br><span class="line">19/04/20 18:55:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7</span><br><span class="line">usage: sqoop COMMAND [ARGS]</span><br><span class="line"></span><br><span class="line">Available commands:</span><br><span class="line">  codegen            Generate code to interact with database records</span><br><span class="line">  create-hive-table  Import a table definition into Hive</span><br><span class="line">  eval               Evaluate a SQL statement and display the results</span><br><span class="line">  export             Export an HDFS directory to a database table</span><br><span class="line">  help               List available commands</span><br><span class="line">  import             Import a table from a database to HDFS</span><br><span class="line">  import-all-tables  Import tables from a database to HDFS</span><br><span class="line">  import-mainframe   Import datasets from a mainframe server to HDFS</span><br><span class="line">  job                Work with saved jobs</span><br><span class="line">  list-databases     List available databases on a server</span><br><span class="line">  list-tables        List available tables in a database</span><br><span class="line">  merge              Merge results of incremental imports</span><br><span class="line">  metastore          Run a standalone Sqoop metastore</span><br><span class="line">  version            
Display version information</span><br><span class="line"></span><br><span class="line">See &#x27;sqoop help COMMAND&#x27; for information on a specific command.</span><br></pre></td></tr></table></figure><p>If you don't use <code>HBase</code>, <code>HCatalog</code>, <code>Accumulo</code> or <code>ZooKeeper</code>, you can ignore these warnings. If, like me, you find them annoying, you can skip the checks by commenting out the corresponding code in <code>configure-sqoop</code>:</p><p><code>vi /root/sqoop-1.4.7.bin__hadoop-2.6.0/bin/configure-sqoop</code></p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_">#</span><span class="language-bash"><span class="comment"># Moved to be a runtime check in sqoop.</span></span></span><br><span class="line"><span class="meta prompt_">#</span><span class="language-bash"><span class="keyword">if</span> [ ! -d <span class="string">&quot;<span class="variable">$&#123;HBASE_HOME&#125;</span>&quot;</span> ]; <span class="keyword">then</span></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash"> <span class="built_in">echo</span> <span class="string">&quot;Warning: <span class="variable">$HBASE_HOME</span> does not exist! 
HBase imports will fail.&quot;</span></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash"> <span class="built_in">echo</span> <span class="string">&#x27;Please set $HBASE_HOME to the root of your HBase installation.&#x27;</span></span></span><br><span class="line"><span class="meta prompt_">#</span><span class="language-bash"><span class="keyword">fi</span></span></span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_">#</span><span class="language-bash"><span class="comment"># Moved to be a runtime check in sqoop.</span></span></span><br><span class="line"><span class="meta prompt_">#</span><span class="language-bash"><span class="keyword">if</span> [ ! -d <span class="string">&quot;<span class="variable">$&#123;HCAT_HOME&#125;</span>&quot;</span> ]; <span class="keyword">then</span></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash"> <span class="built_in">echo</span> <span class="string">&quot;Warning: <span class="variable">$HCAT_HOME</span> does not exist! HCatalog jobs will fail.&quot;</span></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash"> <span class="built_in">echo</span> <span class="string">&#x27;Please set $HCAT_HOME to the root of your HCatalog installation.&#x27;</span></span></span><br><span class="line"><span class="meta prompt_">#</span><span class="language-bash"><span class="keyword">fi</span></span></span><br><span class="line"><span class="meta prompt_"></span></span><br><span class="line"><span class="meta prompt_">#</span><span class="language-bash"><span class="keyword">if</span> [ ! 
-d <span class="string">&quot;<span class="variable">$&#123;ACCUMULO_HOME&#125;</span>&quot;</span> ]; <span class="keyword">then</span></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash"> <span class="built_in">echo</span> <span class="string">&quot;Warning: <span class="variable">$ACCUMULO_HOME</span> does not exist! Accumulo imports will fail.&quot;</span></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash"> <span class="built_in">echo</span> <span class="string">&#x27;Please set $ACCUMULO_HOME to the root of your Accumulo installation.&#x27;</span></span></span><br><span class="line"><span class="meta prompt_">#</span><span class="language-bash"><span class="keyword">fi</span></span></span><br><span class="line"><span class="meta prompt_">#</span><span class="language-bash"><span class="keyword">if</span> [ ! -d <span class="string">&quot;<span class="variable">$&#123;ZOOKEEPER_HOME&#125;</span>&quot;</span> ]; <span class="keyword">then</span></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash"> <span class="built_in">echo</span> <span class="string">&quot;Warning: <span class="variable">$ZOOKEEPER_HOME</span> does not exist! 
Accumulo imports will fail.&quot;</span></span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash"> <span class="built_in">echo</span> <span class="string">&#x27;Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.&#x27;</span></span></span><br><span class="line"><span class="meta prompt_">#</span><span class="language-bash"><span class="keyword">fi</span></span></span><br></pre></td></tr></table></figure><h3 id="安装-mysql"><a href="#安装-mysql" class="headerlink" title="安装 mysql"></a>Install MySQL</h3><p>Download the RPM packages:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">curl -O https://mirrors.tuna.tsinghua.edu.cn/mysql/yum/mysql57-community-el7/mysql-community-common-5.7.25-1.el7.x86_64.rpm</span><br><span class="line">curl -O https://mirrors.tuna.tsinghua.edu.cn/mysql/yum/mysql57-community-el7/mysql-community-libs-5.7.25-1.el7.x86_64.rpm</span><br><span class="line">curl -O https://mirrors.tuna.tsinghua.edu.cn/mysql/yum/mysql57-community-el7/mysql-community-client-5.7.25-1.el7.x86_64.rpm</span><br><span class="line">curl -O https://mirrors.tuna.tsinghua.edu.cn/mysql/yum/mysql57-community-el7/mysql-community-server-5.7.25-1.el7.x86_64.rpm</span><br></pre></td></tr></table></figure><p>First remove the bundled MariaDB and any previously installed MySQL. The commands below show whether such packages are installed.</p><p>If any are listed, remove them with a command such as <code>rpm -e mysql-community-xxxx</code>. On CentOS, <code>mariadb</code> must also be removed, for example with <code>yum -y remove maria*</code>.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">rpm -qa | grep mysql</span><br><span class="line">rpm -qa | grep mariadb</span><br></pre></td></tr></table></figure><p>Install</p><blockquote><p>MySQL has many dependencies; running <code>yum install -y mysql</code> 
first is recommended to resolve them.</p></blockquote><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">rpm -ivh mysql-community-common-5.7.25-1.el7.x86_64.rpm</span><br><span class="line">rpm -ivh mysql-community-libs-5.7.25-1.el7.x86_64.rpm</span><br><span class="line">rpm -ivh mysql-community-client-5.7.25-1.el7.x86_64.rpm</span><br><span class="line">rpm -ivh mysql-community-server-5.7.25-1.el7.x86_64.rpm</span><br></pre></td></tr></table></figure><p>Configure</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">mysqld --initialize --user=mysql # initialize the data directory, running as the mysql user</span><br><span class="line">cat /var/log/mysqld.log # look up the generated temporary password</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">2019-04-21T01:53:51.675008Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).</span><br><span class="line">2019-04-21T01:53:52.388438Z 0 [Warning] InnoDB: New log files created, LSN=45790</span><br><span class="line">2019-04-21T01:53:52.484682Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.</span><br><span class="line">2019-04-21T01:53:52.549955Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. 
Generating a new UUID: 5075e10a-63d8-11e9-9cd1-000c2937e0f0.</span><br><span class="line">2019-04-21T01:53:52.552680Z 0 [Warning] Gtid table is not ready to be used. Table &#x27;mysql.gtid_executed&#x27; cannot be opened.</span><br><span class="line">2019-04-21T01:53:52.583134Z 1 [Note] A temporary password is generated for root@localhost: zeAxOg16AZ!7</span><br></pre></td></tr></table></figure><p>In the output above, <code>zeAxOg16AZ!7</code> is the temporary root password.</p><p>Next, start MySQL:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">systemctl start mysqld # start the service</span><br><span class="line">systemctl status mysqld # check the status; it should show active (running)</span><br></pre></td></tr></table></figure><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">mysql <span class="operator">-</span>u root <span class="operator">-</span>p # log in as root with the temporary password above</span><br><span class="line"><span class="keyword">ALTER</span> <span class="keyword">USER</span> <span class="string">&#x27;root&#x27;</span>@<span class="string">&#x27;localhost&#x27;</span> IDENTIFIED <span class="keyword">BY</span> <span class="string">&#x27;your_new_password&#x27;</span>; # the temporary password is expired and must be changed first</span><br><span class="line"><span class="keyword">select</span> version();</span><br></pre></td></tr></table></figure><p>Output like the following means MySQL is working:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="operator">+</span><span class="comment">-----------+</span></span><br><span class="line"><span class="operator">|</span> version() <span class="operator">|</span></span><br><span class="line"><span class="operator">+</span><span 
class="comment">-----------+</span></span><br><span class="line"><span class="operator">|</span> <span class="number">5.7</span><span class="number">.25</span>    <span class="operator">|</span></span><br><span class="line"><span class="operator">+</span><span class="comment">-----------+</span></span><br><span class="line"><span class="number">1</span> <span class="type">row</span> <span class="keyword">in</span> <span class="keyword">set</span> (<span class="number">0.00</span> sec)</span><br></pre></td></tr></table></figure><p>Still in the MySQL client, allow <code>root</code> to log in from any host:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">use mysql;</span><br><span class="line"><span class="keyword">update</span> <span class="keyword">user</span> <span class="keyword">set</span> host <span class="operator">=</span> <span class="string">&#x27;%&#x27;</span> <span class="keyword">where</span> <span class="keyword">user</span> <span class="operator">=</span> <span class="string">&#x27;root&#x27;</span>; <span class="comment">-- allow root to log in from any IP address</span></span><br><span class="line"><span class="keyword">flush</span> privileges; <span class="comment">-- reload the grant tables so the change takes effect</span></span><br></pre></td></tr></table></figure><h4 id="修改-MySQL-配置"><a href="#修改-MySQL-配置" class="headerlink" title="修改 MySQL 配置"></a>Modify the MySQL configuration</h4><p>Open the MySQL configuration file:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /etc/my.cnf</span><br></pre></td></tr></table></figure><p>In the <code>[mysqld]</code> section, add <code>character-set-server=utf8</code>, plus <code>bind-address=0.0.0.0</code> so the server listens on all IPv4 interfaces and accepts remote connections:</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">[mysqld]</span><br><span class="line">character-set-server=utf8</span><br><span class="line">bind-address=0.0.0.0</span><br></pre></td></tr></table></figure><p>Append the following to the end of the file:</p><figure 
class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">[mysql]</span><br><span class="line">no-auto-rehash</span><br><span class="line">default-character-set=utf8</span><br><span class="line"></span><br><span class="line">[client]</span><br><span class="line">default-character-set=utf8</span><br></pre></td></tr></table></figure><p>Restart MySQL:</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">systemctl restart mysqld</span><br></pre></td></tr></table></figure><p>Log in to MySQL and check whether the change took effect:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">mysql <span class="operator">-</span>u root <span class="operator">-</span>p</span><br><span class="line"><span class="keyword">show</span> variables <span class="keyword">like</span> <span class="string">&#x27;char%&#x27;</span>;</span><br></pre></td></tr></table></figure><p>Output like the following means the configuration succeeded:</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="operator">+</span><span class="comment">--------------------------+----------------------------+</span></span><br><span class="line"><span 
class="operator">|</span> Variable_name            <span class="operator">|</span> <span class="keyword">Value</span>                      <span class="operator">|</span></span><br><span class="line"><span class="operator">+</span><span class="comment">--------------------------+----------------------------+</span></span><br><span class="line"><span class="operator">|</span> character_set_client     <span class="operator">|</span> utf8                       <span class="operator">|</span></span><br><span class="line"><span class="operator">|</span> character_set_connection <span class="operator">|</span> utf8                       <span class="operator">|</span></span><br><span class="line"><span class="operator">|</span> character_set_database   <span class="operator">|</span> utf8                       <span class="operator">|</span></span><br><span class="line"><span class="operator">|</span> character_set_filesystem <span class="operator">|</span> <span class="type">binary</span>                     <span class="operator">|</span></span><br><span class="line"><span class="operator">|</span> character_set_results    <span class="operator">|</span> utf8                       <span class="operator">|</span></span><br><span class="line"><span class="operator">|</span> character_set_server     <span class="operator">|</span> utf8                       <span class="operator">|</span></span><br><span class="line"><span class="operator">|</span> character_set_system     <span class="operator">|</span> utf8                       <span class="operator">|</span></span><br><span class="line"><span class="operator">|</span> character_sets_dir       <span class="operator">|</span> <span class="operator">/</span>usr<span class="operator">/</span>share<span class="operator">/</span>mysql<span class="operator">/</span>charsets<span class="operator">/</span> <span class="operator">|</span></span><br><span class="line"><span class="operator">+</span><span 
class="comment">--------------------------+----------------------------+</span></span><br><span class="line"><span class="number">8</span> <span class="keyword">rows</span> <span class="keyword">in</span> <span class="keyword">set</span> (<span class="number">0.01</span> sec)</span><br></pre></td></tr></table></figure><h4 id="创建用于接收数据的数据库"><a href="#创建用于接收数据的数据库" class="headerlink" title="Create the MySQL database that receives the data"></a>Create the MySQL database that receives the data</h4><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">create</span> database target;</span><br><span class="line">use target;</span><br><span class="line"><span class="keyword">CREATE TABLE</span> data (</span><br><span class="line">  name <span class="type">VARCHAR</span>(<span class="number">1000</span>), </span><br><span class="line">  url <span class="type">VARCHAR</span>(<span class="number">1000</span>), </span><br><span class="line">  tag <span class="type">VARCHAR</span>(<span class="number">1000</span>), </span><br><span class="line">  location <span class="type">VARCHAR</span>(<span class="number">1000</span>), </span><br><span class="line">  release_time <span class="type">VARCHAR</span>(<span class="number">1000</span>), </span><br><span class="line">  quantity <span class="type">VARCHAR</span>(<span class="number">1000</span>), </span><br><span class="line">  duties <span class="type">VARCHAR</span>(<span class="number">1000</span>), </span><br><span class="line">  claim <span class="type">VARCHAR</span>(<span class="number">1000</span>));</span><br></pre></td></tr></table></figure><h3 id="导入测试数据"><a 
href="#导入测试数据" class="headerlink" title="Import test data"></a>Import test data</h3><p>Download the test data and upload it to HDFS</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">curl -O https://st.blackyau.net/blog/13/testdata</span><br><span class="line">hdfs dfs -put ./testdata /testdata</span><br><span class="line">hive # enter the Hive CLI</span><br></pre></td></tr></table></figure><p>Create a database in Hive</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">create</span> database testdata;</span><br></pre></td></tr></table></figure><p>Create the table</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">use testdata; <span class="comment">-- switch to this database</span></span><br><span class="line"><span class="keyword">CREATE TABLE</span> test1 (</span><br><span class="line">  `name` string, </span><br><span class="line">  `url` string, </span><br><span class="line">  `tag` string, </span><br><span class="line">  `location` string, </span><br><span class="line">  `release_time` string, </span><br><span class="line">  `quantity` <span class="type">int</span>, </span><br><span class="line">  `duties` string, </span><br><span class="line">  `claim` string)</span><br><span class="line"><span class="type">ROW</span> FORMAT DELIMITED </span><br><span class="line">  FIELDS TERMINATED <span class="keyword">BY</span> <span 
class="string">&#x27;\t&#x27;</span>; <span class="comment">-- fields are separated by tab characters</span></span><br></pre></td></tr></table></figure><p>Load the data. Note that <code>LOAD DATA INPATH</code> moves (rather than copies) the HDFS file <code>/testdata</code> into the table's directory under the Hive warehouse.</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">load data inpath <span class="string">&#x27;/testdata&#x27;</span> <span class="keyword">into</span> <span class="keyword">table</span> test1;</span><br></pre></td></tr></table></figure><p>The following output indicates that the import succeeded</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">Loading data <span class="keyword">to</span> <span class="keyword">table</span> testdata.test1</span><br><span class="line"><span class="keyword">Table</span> testdata.test1 stats: [numFiles<span class="operator">=</span><span class="number">1</span>, totalSize<span class="operator">=</span><span class="number">3401593</span>]</span><br><span class="line">OK</span><br><span class="line"><span class="type">Time</span> taken: <span class="number">0.414</span> seconds</span><br></pre></td></tr></table></figure><p>Run a quick query</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">select</span> <span class="operator">*</span> <span class="keyword">from</span> test1;</span><br><span class="line"><span class="keyword">select</span> <span class="operator">*</span> <span class="keyword">from</span> test1 limit <span class="number">10</span>;</span><br></pre></td></tr></table></figure><p>Output like the following (shown for the <code>limit 10</code> query) means everything works</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span 
class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">OK</span><br><span class="line">PCG10<span class="operator">-</span>浏览器功能前端开发工程师https:<span class="operator">/</span><span class="operator">/</span>hr.tencent.com<span class="operator">/</span>position_detail.php?id<span class="operator">=</span><span class="number">49321</span><span class="operator">&amp;</span>keywords<span class="operator">=</span><span class="operator">&amp;</span>tid<span class="operator">=</span><span class="number">0</span><span class="operator">&amp;</span>lid<span class="operator">=</span><span class="number">0</span>技术类深圳<span class="number">2019</span><span class="number">-04</span><span class="number">-11</span><span class="number">1</span>负责QQ浏览器移动端<span class="operator">/</span>PC端页面和小程序的功能开发和维护;负责浏览器功能的开发和持续优化；负责浏览器功能的web前端页面开发，维护和优化工作，包括前端JS、HTML5以及nodejs等，持续优化前端页面体验和访问速度;&quot;2年以上前端开发工作经验；  对浏览器兼容性、前端安全防范、响应式布局、网络协议优化有实践经验；具备良好的沟通能力和团队协作精神</span><br><span class="line">21227-创新手游产品—市场和平台渠道推广</span><br><span class="line">.......</span><br><span class="line">完善现有运营安全流程及规范;4、持续完善安全组件的监控告警、故障排查5、负责安全平台的网站建设工作1. 3年以上运维开发及运维平台建设经验；2. 有安全相关行业工作经验；3. 具备良好的沟通能力与项目管理能力；4. 熟悉linux下网络编程，熟悉HTML/JS以及HTTP原理，熟悉多种UI框架5. 精通Python、Shell或其他编程语言，有C、Java或PHP开发经验者优先6. 
具备良好的合作精神和快速学习能力；7.有互联网安全运维工作经验者优先。</span><br><span class="line">Time taken: 0.073 seconds, Fetched: 10 row(s)</span><br></pre></td></tr></table></figure><p>Run a more complex query (Hive executes it as MapReduce jobs)</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">select</span> location,<span class="built_in">count</span>(<span class="operator">*</span>) <span class="keyword">as</span> temp_sum <span class="keyword">from</span> test1 <span class="keyword">group</span> <span class="keyword">by</span> location <span class="keyword">order</span> <span class="keyword">by</span> temp_sum <span class="keyword">desc</span>;</span><br></pre></td></tr></table></figure><p>The following output shows that it works</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span 
class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br></pre></td><td class="code"><pre><span class="line">Automatically selecting <span class="keyword">local</span> <span class="keyword">only</span> mode <span class="keyword">for</span> query</span><br><span class="line">Query ID <span class="operator">=</span> root_20190421152828_c7eaeae7<span class="number">-4e76</span><span class="number">-48</span>fb<span class="number">-9780</span><span class="number">-135262</span>ef66e0</span><br><span class="line">Total jobs <span class="operator">=</span> <span class="number">2</span></span><br><span class="line">Launching Job <span class="number">1</span> <span class="keyword">out</span> <span class="keyword">of</span> <span class="number">2</span></span><br><span class="line">Number <span class="keyword">of</span> reduce tasks <span class="keyword">not</span> specified. 
Estimated <span class="keyword">from</span> input data size: <span class="number">1</span></span><br><span class="line"><span class="keyword">In</span> <span class="keyword">order</span> <span class="keyword">to</span> change the average load <span class="keyword">for</span> a reducer (<span class="keyword">in</span> bytes):</span><br><span class="line">  <span class="keyword">set</span> hive.exec.reducers.bytes.per.reducer<span class="operator">=</span><span class="operator">&lt;</span>number<span class="operator">&gt;</span></span><br><span class="line"><span class="keyword">In</span> <span class="keyword">order</span> <span class="keyword">to</span> limit the maximum number <span class="keyword">of</span> reducers:</span><br><span class="line">  <span class="keyword">set</span> hive.exec.reducers.max<span class="operator">=</span><span class="operator">&lt;</span>number<span class="operator">&gt;</span></span><br><span class="line"><span class="keyword">In</span> <span class="keyword">order</span> <span class="keyword">to</span> <span class="keyword">set</span> a constant number <span class="keyword">of</span> reducers:</span><br><span class="line">  <span class="keyword">set</span> mapreduce.job.reduces<span class="operator">=</span><span class="operator">&lt;</span>number<span class="operator">&gt;</span></span><br><span class="line">Job <span class="keyword">running</span> <span class="keyword">in</span><span class="operator">-</span>process (<span class="keyword">local</span> Hadoop)</span><br><span class="line"><span class="number">2019</span><span class="number">-04</span><span class="number">-21</span> <span class="number">15</span>:<span class="number">28</span>:<span class="number">21</span>,<span class="number">976</span> Stage<span class="number">-1</span> map <span class="operator">=</span> <span class="number">100</span><span class="operator">%</span>,  reduce <span class="operator">=</span> <span class="number">100</span><span 
class="operator">%</span></span><br><span class="line">Ended Job <span class="operator">=</span> job_local1573803716_0003</span><br><span class="line">Launching Job <span class="number">2</span> <span class="keyword">out</span> <span class="keyword">of</span> <span class="number">2</span></span><br><span class="line">Number <span class="keyword">of</span> reduce tasks determined <span class="keyword">at</span> compile <span class="type">time</span>: <span class="number">1</span></span><br><span class="line"><span class="keyword">In</span> <span class="keyword">order</span> <span class="keyword">to</span> change the average load <span class="keyword">for</span> a reducer (<span class="keyword">in</span> bytes):</span><br><span class="line">  <span class="keyword">set</span> hive.exec.reducers.bytes.per.reducer<span class="operator">=</span><span class="operator">&lt;</span>number<span class="operator">&gt;</span></span><br><span class="line"><span class="keyword">In</span> <span class="keyword">order</span> <span class="keyword">to</span> limit the maximum number <span class="keyword">of</span> reducers:</span><br><span class="line">  <span class="keyword">set</span> hive.exec.reducers.max<span class="operator">=</span><span class="operator">&lt;</span>number<span class="operator">&gt;</span></span><br><span class="line"><span class="keyword">In</span> <span class="keyword">order</span> <span class="keyword">to</span> <span class="keyword">set</span> a constant number <span class="keyword">of</span> reducers:</span><br><span class="line">  <span class="keyword">set</span> mapreduce.job.reduces<span class="operator">=</span><span class="operator">&lt;</span>number<span class="operator">&gt;</span></span><br><span class="line">Job <span class="keyword">running</span> <span class="keyword">in</span><span class="operator">-</span>process (<span class="keyword">local</span> Hadoop)</span><br><span class="line"><span class="number">2019</span><span 
class="number">-04</span><span class="number">-21</span> <span class="number">15</span>:<span class="number">28</span>:<span class="number">23</span>,<span class="number">199</span> Stage<span class="number">-2</span> map <span class="operator">=</span> <span class="number">100</span><span class="operator">%</span>,  reduce <span class="operator">=</span> <span class="number">100</span><span class="operator">%</span></span><br><span class="line">Ended Job <span class="operator">=</span> job_local1403411610_0004</span><br><span class="line">MapReduce Jobs Launched: </span><br><span class="line">Stage<span class="operator">-</span>Stage<span class="number">-1</span>:  HDFS Read: <span class="number">20435416</span> HDFS Write: <span class="number">1483</span> SUCCESS</span><br><span class="line">Stage<span class="operator">-</span>Stage<span class="number">-2</span>:  HDFS Read: <span class="number">20436422</span> HDFS Write: <span class="number">2021</span> SUCCESS</span><br><span class="line">Total MapReduce CPU <span class="type">Time</span> Spent: <span class="number">0</span> msec</span><br><span class="line">OK</span><br><span class="line">深圳<span class="number">2128</span></span><br><span class="line">北京<span class="number">610</span></span><br><span class="line">上海<span class="number">209</span></span><br><span class="line">广州<span class="number">123</span></span><br><span class="line">成都<span class="number">40</span></span><br><span class="line">武汉<span class="number">20</span></span><br><span class="line">杭州<span class="number">20</span></span><br><span class="line">马来西亚<span class="number">10</span></span><br><span class="line">香港<span class="number">10</span></span><br><span class="line">韩国<span class="number">10</span></span><br><span class="line">美国<span class="number">10</span></span><br><span class="line">日本<span class="number">10</span></span><br><span class="line"><span class="type">Time</span> taken: <span class="number">2.78</span> seconds, 
Fetched: <span class="number">12</span> <span class="type">row</span>(s)</span><br></pre></td></tr></table></figure><h3 id="使用-Sqoop-导出数据到-MySQL"><a href="#使用-Sqoop-导出数据到-MySQL" class="headerlink" title="Export data to MySQL with Sqoop"></a>Export data to MySQL with Sqoop</h3><p>Download <a href="https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.47.tar.gz">mysql-connector-java-5.1.47</a> (the link is a <code>.tar.gz</code> archive; extract the <code>.jar</code> from it) and place the jar in <code>$SQOOP_HOME/lib/</code>. Otherwise Sqoop fails with the following error.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver</span><br><span class="line">java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver</span><br></pre></td></tr></table></figure><p>Download the driver jar and save it to that directory. Note that the command below fetches Connector/J 6.0.3 from my mirror, not the 5.1.47 release linked above.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">curl -o $SQOOP_HOME/lib/mysql-connector-java-6.0.3.jar https://st.blackyau.net/blog/13/mysql-connector-java-6.0.3.jar</span><br></pre></td></tr></table></figure><p>Export the data to MySQL with Sqoop</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sqoop export --connect jdbc:mysql://127.0.0.1/target --driver com.mysql.jdbc.Driver --username root --password SKKfgHfz2AgC --table data --export-dir /user/hive/warehouse/testdata.db/test1 --input-fields-terminated-by &#x27;\t&#x27;</span><br></pre></td></tr></table></figure><p>Parameters</p><p><code>export</code>: export an HDFS directory into a database table</p><p><code>--connect</code>: the JDBC connect string (here, the address of the destination MySQL database)</p><p><code>--driver</code>: the JDBC driver class to use (without it I could not export to MySQL; the jar downloaded above provides this class)</p><p><code>--username</code>: the user name to authenticate with</p><p><code>--password</code>: the password to authenticate with</p><p><code>--table</code>: the name of the table to write into</p><p><code>--export-dir</code>: the HDFS source path of the data to export (in Hive, <code>show create table test1</code> prints it as <code>LOCATION</code>)</p><p><code>--input-fields-terminated-by</code>: the field delimiter of the input data (my source data is tab-separated, <code>\t</code>)</p><p>Check whether the export succeeded</p><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">use target;</span><br><span class="line"><span class="keyword">select</span> <span class="operator">*</span> <span class="keyword">from</span> data limit <span class="number">1</span>;</span><br><span class="line"><span class="keyword">select</span> name,location,tag <span class="keyword">from</span> data limit <span class="number">10</span>;</span><br><span class="line"><span class="keyword">select</span> location,<span class="built_in">count</span>(<span class="operator">*</span>) <span class="keyword">as</span> sum <span class="keyword">from</span> data <span class="keyword">group</span> <span class="keyword">by</span> location <span class="keyword">order</span> <span class="keyword">by</span> sum <span 
class="keyword">desc</span>;</span><br></pre></td></tr></table></figure><p>Output like the following (from the second and third queries) means the export succeeded</p><table><thead><tr><th>name</th><th>location</th><th>tag</th></tr></thead><tbody><tr><td>26787-策略分析经理（深圳）</td><td>深圳</td><td>市场类</td></tr><tr><td>25663-泛互联网售中架构师（北&#x2F;上&#x2F;深）</td><td>上海</td><td>技术类</td></tr><tr><td>TEG06-Mac终端安全运营高级工程师（深圳）</td><td>深圳</td><td>技术类</td></tr><tr><td>15618-游戏客户端开发工程师（上海）</td><td>上海</td><td>技术类</td></tr><tr><td>PCG04-腾讯视频移动终端测试开发工程师（深圳）</td><td>深圳</td><td>技术类</td></tr><tr><td>26787-游戏合作商务</td><td>深圳</td><td>市场类</td></tr><tr><td>TEG06-Mac终端安全运营高级工程师（深圳）</td><td>深圳</td><td>技术类</td></tr><tr><td>15618-游戏客户端开发工程师（上海）</td><td>上海</td><td>技术类</td></tr><tr><td>PCG04-腾讯视频移动终端测试开发工程师（深圳）</td><td>深圳</td><td>技术类</td></tr><tr><td>26787-游戏合作商务</td><td>深圳</td><td>市场类</td></tr></tbody></table><p>10 rows in set (0.00 sec)</p><table><thead><tr><th>location</th><th>sum</th></tr></thead><tbody><tr><td>深圳</td><td>2128</td></tr><tr><td>北京</td><td>610</td></tr><tr><td>上海</td><td>209</td></tr><tr><td>广州</td><td>123</td></tr><tr><td>成都</td><td>40</td></tr><tr><td>武汉</td><td>20</td></tr><tr><td>杭州</td><td>20</td></tr><tr><td>日本</td><td>10</td></tr><tr><td>香港</td><td>10</td></tr><tr><td>美国</td><td>10</td></tr><tr><td>马来西亚</td><td>10</td></tr><tr><td>韩国</td><td>10</td></tr></tbody></table><p>12 rows in set (0.00 sec)</p><h2 id="修改元数据存储数据库为mysql"><a href="#修改元数据存储数据库为mysql" class="headerlink" title="Switch the Hive metastore database to MySQL"></a>Switch the Hive metastore database to MySQL</h2><p>First, edit the configuration file</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi $HIVE_HOME/conf/hive-site.xml</span><br></pre></td></tr></table></figure><p>Change the following properties</p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span 
class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>javax.jdo.option.ConnectionPassword<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>your_password<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">description</span>&gt;</span>password to use against metastore database<span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>javax.jdo.option.ConnectionURL<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>jdbc:mysql://localhost:3306/hive?useSSL=true<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">description</span>&gt;</span>JDBC connect string for a JDBC metastore<span class="tag">&lt;/<span 
class="name">description</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>javax.jdo.option.ConnectionDriverName<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>com.mysql.jdbc.Driver<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">description</span>&gt;</span>Driver class name for a JDBC metastore<span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">name</span>&gt;</span>javax.jdo.option.ConnectionUserName<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">value</span>&gt;</span>root<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">description</span>&gt;</span>Username to use against metastore database<span class="tag">&lt;/<span class="name">description</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br></pre></td></tr></table></figure><p>Download <a href="https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.47.tar.gz">mysql-connector-java-5.1.47</a>, extract the <code>.jar</code> from the archive, and place it in <code>$HIVE_HOME/lib/</code> so that the metastore can load the driver from Hive's classpath. Otherwise you may get the following error, which is typically reported when a 6.x Connector/J jar is used instead.</p><figure 
class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">Column name pattern can not be NULL or empty</span><br></pre></td></tr></table></figure><p>Start <code>hive</code> and run <code>show databases</code> to check that everything works</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">hive</span><br><span class="line">show databases;</span><br></pre></td></tr></table></figure><h2 id="S-HQL-常用命令"><a href="#S-HQL-常用命令" class="headerlink" title="Common S&#x2F;HQL commands"></a>Common S&#x2F;HQL commands</h2><figure class="highlight sql"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">DROP</span> DATABASE test CASCADE; <span class="comment">-- drop the test database together with all the tables in it</span></span><br><span class="line"><span class="keyword">show</span> <span class="keyword">create table</span> test1; <span class="comment">-- print the CREATE TABLE statement for test1, including its LOCATION</span></span><br></pre></td></tr></table></figure><p>For other articles in the Big Data series, see <a href="https://blackyau.cc/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/">here</a></p><h2 id="参考"><a href="#参考" class="headerlink" title="References"></a>References</h2><p><a href="https://book.douban.com/subject/25791255/">Programming Hive</a></p><p><a href="https://stackoverflow.com/questions/27099898/">Stack Overflow@Jonathan L - java.net.URISyntaxException when starting HIVE</a></p><p><a href="https://stackoverflow.com/questions/14024756/">Stack Overflow@user1493140 - SLF4J: Class path contains multiple SLF4J bindings</a></p><p><a href="https://item.jd.com/12109713.html">Hadoop: The Definitive Guide@Tom White</a></p><p><a href="https://www.cnblogs.com/junneyang/p/5850440.html">cnblogs@junneyang - Three sentences on how to control the number of map tasks in MapReduce</a></p><p><a href="https://www.cnblogs.com/xiaodangshan/p/7230111.html">cnblogs@xiaodangshan - 
Installing MySQL 5.7.13 via RPM on CentOS</a></p><p><a href="http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1763114">Sqoop User Guide - Connecting to a Database Server</a></p><p><a href="https://stackoverflow.com/questions/22741183/">Stack Overflow@malatesh - Sqoop: Could not load mysql driver exception</a></p><p><a href="https://blog.csdn.net/qq_32953079/article/details/54629245">CSDN@一介那个书生 - Changing the MySQL database encoding to UTF-8 on CentOS</a></p><p><a href="https://blog.csdn.net/u014695188/article/details/51087456">CSDN@爱笑的T_T - Fixing garbled Chinese text in MySQL on CentOS (Linux)</a></p><p><a href="https://stackoverflow.com/questions/17086642/">Stack Overflow@user1922900 - How to export a Hive table into a CSV file?</a></p><p><a href="https://blog.csdn.net/u012527870/article/details/71633915">CSDN@iXiongYu - Column name pattern can not be NULL or empty</a></p><p><a href="https://blog.csdn.net/u012922838/article/details/73291524">CSDN@鞋带散了的木木 - Hive shows an SSL warning</a></p><p><a href="https://blog.csdn.net/qq_26479655/article/details/52252335">CSDN@shawn_zhu1 - Switching Hive's default metastore database from Derby to MySQL</a></p>]]></content>
    
    
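    <summary type="html">&lt;p&gt;How do you move existing data built on traditional relational databases and SQL onto Hadoop? For the many SQL users, Hive is the answer: it provides a SQL dialect called Hive Query Language (HiveQL or HQL for short) for querying data stored in a Hadoop cluster.&lt;/p&gt;</summary>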
    
    
    
    <category term="大数据" scheme="https://blackyau.cc/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
    <category term="CentOS" scheme="https://blackyau.cc/tags/CentOS/"/>
    
    <category term="Linux" scheme="https://blackyau.cc/tags/Linux/"/>
    
    <category term="Hadoop" scheme="https://blackyau.cc/tags/Hadoop/"/>
    
    <category term="HDFS" scheme="https://blackyau.cc/tags/HDFS/"/>
    
    <category term="HIVE" scheme="https://blackyau.cc/tags/HIVE/"/>
    
    <category term="MySQL" scheme="https://blackyau.cc/tags/MySQL/"/>
    
  </entry>
  
  <entry>
    <title>Flume Configuration</title>
    <link href="https://blackyau.cc/12"/>
    <id>https://blackyau.cc/12</id>
    <published>2019-04-14T12:50:08.000Z</published>
    <updated>2019-04-17T09:30:32.000Z</updated>
    
    <content type="html"><![CDATA[<p>Continuing from the previous post: with the <a href="/11" title="Hadoop Pseudo-Distributed Deployment">Hadoop pseudo-distributed deployment</a> complete, we can now configure Flume.</p><span id="more"></span><h2 id="Flume-介绍"><a href="#Flume-介绍" class="headerlink" title="Introduction to Flume"></a>Introduction to Flume</h2><p>Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of log data. It lets you customize the various data senders in a logging system to collect data, and it can also apply simple processing to the data and write it to various data receivers (such as text files, HDFS, and HBase).</p><p>A Flume agent consists mainly of the following three components:</p><ul><li><strong>Source</strong>: receives data from an external source in a format the <code>Source</code> recognizes and passes it into <code>Flume</code> as events. When the <code>Source</code> receives an event, it stores it in one or more <code>Channel</code>s.</li><li><strong>Channel</strong>: a passive store that holds events until they are consumed by a <code>Sink</code>.</li><li><strong>Sink</strong>: removes events from the <code>Channel</code> and puts them into an external repository (for example <code>HDFS</code>), or forwards them to the <code>Flume Source</code> of the next <code>Flume Agent</code>.</li></ul><p>Within an agent, the <code>Source</code> and the <code>Sink</code> run asynchronously, with events staged in the <code>Channel</code> between them.</p><p><img data-src="https://st.blackyau.net/blog/12/1.png" alt="1"></p><p>Here I mainly use Flume to do the following:</p><ul><li>Listen on network port 5555</li><li>Write the data received on that port to the HDFS directory <code>/raw_data/receive/</code></li><li>Prefix file names with <code>[YYYYMMDD]_</code></li><li>Roll over to a new file for every <code>10M</code> of received data; when less than <code>10M</code> has arrived, roll over to a new file every <code>15 minutes</code>.</li></ul><h2 id="准备工作"><a href="#准备工作" class="headerlink" title="Preparation"></a>Preparation</h2><h3 id="下载解压-Flume"><a href="#下载解压-Flume" class="headerlink" title="Download and extract Flume"></a>Download and extract Flume</h3><p>I use <a href="http://archive.apache.org/dist/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz">Flume 1.6.0</a></p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">curl -O http://192.168.66.1/apache-flume-1.6.0-bin.tar.gz # option 1: download from a host on the local network</span><br><span class="line">curl -O http://archive.apache.org/dist/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz # option 2: download from the Apache archive (requires Internet access)</span><br><span class="line">tar -xf 
apache-flume-1.6.0-bin.tar.gz # 解压</span><br></pre></td></tr></table></figure><h3 id="环境变量"><a href="#环境变量" class="headerlink" title="环境变量"></a>环境变量</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /etc/profile</span><br></pre></td></tr></table></figure><p>在文件末尾添加以下内容</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">export FLUME_HOME=/root/apache-flume-1.6.0-bin</span><br><span class="line">export PATH=$PATH:$FLUME_HOME/bin</span><br></pre></td></tr></table></figure><p>让系统更新配置文件</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">source /etc/profile</span><br></pre></td></tr></table></figure><p>测试配置是否成功</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">flume-ng version</span><br></pre></td></tr></table></figure><p>输出以下信息说明配置成功</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">Flume 1.6.0</span><br><span class="line">Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git</span><br><span class="line">Revision: 2561a23240a71ba20bf288c7c2cda88f443c2080</span><br><span class="line">Compiled by hshreedharan on Mon May 11 11:15:44 PDT 2015</span><br><span class="line">From source with checksum b29e416802ce9ece3269d34233baf43f</span><br></pre></td></tr></table></figure><h3 id="检测依赖"><a href="#检测依赖" class="headerlink" title="检测依赖"></a>检测依赖</h3><figure class="highlight shell"><table><tr><td 
class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">jps</span><br></pre></td></tr></table></figure><p>Output like the following shows that the HDFS daemons, including <code>NameNode</code>, are running</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">33377 Jps</span><br><span class="line">20854 DataNode</span><br><span class="line">20998 SecondaryNameNode</span><br><span class="line">20743 NameNode</span><br></pre></td></tr></table></figure><p>Also check that the NameNode web UI is up</p><p><a href="http://192.168.66.135:50070/">http://192.168.66.135:50070</a></p><p>If it is not working properly, <code>Flume</code> will not be able to write data to <code>HDFS</code>.</p><h2 id="配置示例-Flume"><a href="#配置示例-Flume" class="headerlink" title="Example Flume configuration"></a>Example Flume configuration</h2><h3 id="编辑示例配置"><a href="#编辑示例配置" class="headerlink" title="Edit the example configuration"></a>Edit the example configuration</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /root/apache-flume-1.6.0-bin/conf/example.conf</span><br></pre></td></tr></table></figure><p>Write the following into it. This example configuration comes from the <a href="http://flume.apache.org/releases/content/1.6.0/FlumeUserGuide.html#starting-an-agent">Flume docs</a></p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span 
class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br></pre></td><td class="code"><pre><span class="line"># Example Flume configuration: listen on port 44444, buffer incoming data in memory, and print it to the terminal</span><br><span class="line"></span><br><span class="line"># Name the components of this Agent</span><br><span class="line">a1.sources = r1</span><br><span class="line">a1.sinks = k1</span><br><span class="line">a1.channels = c1</span><br><span class="line"></span><br><span class="line"># Source properties: listen on a network port and turn each line of text into an Event</span><br><span class="line"># http://flume.apache.org/releases/content/1.6.0/FlumeUserGuide.html#netcat-source</span><br><span class="line">a1.sources.r1.type = netcat</span><br><span class="line">a1.sources.r1.bind = master</span><br><span class="line">a1.sources.r1.port = 44444</span><br><span class="line"></span><br><span class="line"># Sink properties: log every Event at INFO level</span><br><span class="line"># http://flume.apache.org/releases/content/1.6.0/FlumeUserGuide.html#logger-sink</span><br><span class="line">a1.sinks.k1.type = logger</span><br><span class="line"></span><br><span class="line"># Channel properties: keep Events in memory</span><br><span class="line"># http://flume.apache.org/releases/content/1.6.0/FlumeUserGuide.html#memory-channel</span><br><span class="line">a1.channels.c1.type = memory</span><br><span class="line">a1.channels.c1.capacity = 1000</span><br><span class="line">a1.channels.c1.transactionCapacity = 100</span><br><span class="line"></span><br><span class="line"># Bind the Source and the Sink to the Channel</span><br><span class="line">a1.sources.r1.channels = c1</span><br><span class="line">a1.sinks.k1.channel = c1</span><br></pre></td></tr></table></figure><h3 id="运行示例配置的-Flume"><a href="#运行示例配置的-Flume" class="headerlink" title="Run Flume with the example configuration"></a>Run Flume with the example configuration</h3><figure 
class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">start-dfs.sh # start HDFS</span><br><span class="line">flume-ng agent --conf-file /root/apache-flume-1.6.0-bin/conf/example.conf --name a1</span><br></pre></td></tr></table></figure><h3 id="发送测试数据"><a href="#发送测试数据" class="headerlink" title="Send test data"></a>Send test data</h3><p>Use <code>telnet</code> on the host machine to connect to the Flume server. Send any string and check whether it shows up in the Flume terminal.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">telnet 192.168.66.135 44444</span><br></pre></td></tr></table></figure><blockquote><p>If Windows reports that 'telnet' is not recognized as an internal or external command, operable program or batch file, go to <code>Control Panel - Programs and Features - Turn Windows features on or off</code> and check <code>Telnet Client</code></p></blockquote><p><img data-src="https://st.blackyau.net/blog/12/2.png" alt="2"></p><h2 id="配置-Flume"><a href="#配置-Flume" class="headerlink" title="Configure Flume"></a>Configure Flume</h2><h3 id="编辑配置"><a href="#编辑配置" class="headerlink" title="Edit the configuration"></a>Edit the configuration</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /root/apache-flume-1.6.0-bin/conf/flume.conf</span><br></pre></td></tr></table></figure><p>Write the following configuration into it</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span 
class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span class="line">39</span><br><span class="line">40</span><br></pre></td><td class="code"><pre><span class="line"># Full Flume configuration: listen on port 5555, stage incoming data in local files, and finally write it to HDFS</span><br><span class="line"></span><br><span class="line"># Name the components of this Agent</span><br><span class="line">a1.sources = r1</span><br><span class="line">a1.sinks = k1</span><br><span class="line">a1.channels = c1</span><br><span class="line"></span><br><span class="line"># Source properties: listen on a network port and turn each line of text into an Event</span><br><span class="line"># http://flume.apache.org/releases/content/1.6.0/FlumeUserGuide.html#netcat-source</span><br><span class="line">a1.sources.r1.type = netcat</span><br><span class="line">a1.sources.r1.bind = master</span><br><span class="line">a1.sources.r1.port = 5555</span><br><span class="line"></span><br><span class="line"># Sink properties: write to HDFS</span><br><span class="line"># http://flume.apache.org/releases/content/1.6.0/FlumeUserGuide.html#hdfs-sink</span><br><span class="line">a1.sinks.k1.type = hdfs</span><br><span class="line"># Target path</span><br><span class="line">a1.sinks.k1.hdfs.path = hdfs://master:9000/raw_data/receive/</span><br><span class="line"># File name prefix</span><br><span class="line">a1.sinks.k1.hdfs.filePrefix = [%Y%m%d]_</span><br><span class="line"># Roll the file once it reaches this size, in bytes (10485760 = 10 MiB)</span><br><span class="line">a1.sinks.k1.hdfs.rollSize = 10485760</span><br><span class="line"># 0 disables rolling based on the number of events</span><br><span class="line">a1.sinks.k1.hdfs.rollCount = 0</span><br><span class="line"># Roll the file after this many seconds (900 s = 15 minutes)</span><br><span class="line">a1.sinks.k1.hdfs.rollInterval = 900</span><br><span class="line"># Use local time instead of the event's timestamp header for %Y%m%d; netcat events carry no timestamp header, so leaving this out causes an error</span><br><span class="line">a1.sinks.k1.hdfs.useLocalTimeStamp = true</span><br><span class="line"></span><br><span class="line"># Channel properties: store Events in files on disk</span><br><span class="line"># http://flume.apache.org/releases/content/1.6.0/FlumeUserGuide.html#file-channel</span><br><span class="line">a1.channels.c1.type = file</span><br><span class="line"># Index data: records which data file (logFileID) each event is in and at what offset</span><br><span class="line">a1.channels.c1.checkpointDir = /root/flume/checkpoint</span><br><span class="line"># Directory holding the actual event data files</span><br><span class="line">a1.channels.c1.dataDirs = /root/flume/data</span><br><span class="line"></span><br><span class="line"># Bind the Source and the Sink to the Channel</span><br><span class="line">a1.sources.r1.channels = c1</span><br><span class="line">a1.sinks.k1.channel = c1</span><br></pre></td></tr></table></figure>
<h3 id="运行"><a href="#运行" class="headerlink" title="Run"></a>Run</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">flume-ng agent --conf-file /root/apache-flume-1.6.0-bin/conf/flume.conf --name a1</span><br></pre></td></tr></table></figure><p>Output similar to the following means the configuration works</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span 
class="line">15</span><br></pre></td><td class="code"><pre><span class="line">19/04/17 16:49:14 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started</span><br><span class="line">19/04/17 16:49:29 INFO hdfs.HDFSSequenceFile: writeFormat = Writable, UseRawLocalFileSystem = false</span><br><span class="line">19/04/17 16:49:29 INFO hdfs.BucketWriter: Creating hdfs://master:9000/raw_data/receive//[20190417]_.1555490969042.tmp</span><br><span class="line">19/04/17 16:49:43 INFO file.EventQueueBackingStoreFile: Start checkpoint for /root/flume/checkpoint/checkpoint, elements to sync = 4</span><br><span class="line">19/04/17 16:49:43 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1555490953596, queueSize: 0, queueHead: 2</span><br><span class="line">19/04/17 16:49:43 INFO file.Log: Updated checkpoint for file: /root/flume/data/log-9 position: 1654 logWriteOrderID: 1555490953596</span><br><span class="line">19/04/17 16:49:43 INFO file.LogFile: Closing RandomReader /root/flume/data/log-7</span><br><span class="line">19/04/17 16:50:13 INFO file.EventQueueBackingStoreFile: Start checkpoint for /root/flume/checkpoint/checkpoint, elements to sync = 1</span><br><span class="line">19/04/17 16:50:13 INFO file.EventQueueBackingStoreFile: Updating checkpoint metadata: logWriteOrderID: 1555490953601, queueSize: 0, queueHead: 2</span><br><span class="line">19/04/17 16:50:13 INFO file.Log: Updated checkpoint for file: /root/flume/data/log-9 position: 1840 logWriteOrderID: 1555490953601</span><br><span class="line">19/04/17 16:50:13 INFO file.Log: Removing old file: /root/flume/data/log-7</span><br><span class="line">19/04/17 16:50:13 INFO file.Log: Removing old file: /root/flume/data/log-7.meta</span><br><span class="line">19/04/17 17:04:30 INFO hdfs.BucketWriter: Closing hdfs://master:9000/raw_data/receive//[20190417]_.1555490969042.tmp</span><br><span class="line">19/04/17 17:04:30 INFO hdfs.BucketWriter: Renaming 
hdfs://master:9000/raw_data/receive/[20190417]_.1555490969042.tmp to hdfs://master:9000/raw_data/receive/[20190417]_.1555490969042</span><br><span class="line">19/04/17 17:04:30 INFO hdfs.HDFSEventSink: Writer callback called.</span><br></pre></td></tr></table></figure><h3 id="排错"><a href="#排错" class="headerlink" title="Troubleshooting"></a>Troubleshooting</h3><p>I ran into the following error earlier</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">There are 0 datanode(s) running and no node(s) are excluded in this operation</span><br></pre></td></tr></table></figure><p>This happens when HDFS has been formatted more than once. Each format gives the NameNode a new cluster ID that no longer matches the one the DataNode stored. As a result, the DataNode cannot register and no files can be written. Deleting the old data and reformatting fixes it. Note that this erases everything stored in HDFS.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">stop-all.sh # stop all services</span><br><span class="line">rm -rf /tmp/* # clear /tmp, where Hadoop keeps data and PID files by default</span><br><span class="line">rm -rf /root/hadoop-2.6.0/tmp # remove the old Hadoop data directory</span><br><span class="line">hdfs namenode -format # reformat the NameNode</span><br><span class="line">start-dfs.sh # start HDFS again</span><br></pre></td></tr></table></figure><p>After that, just start Flume again.</p><h3 id="其他-sources"><a href="#其他-sources" class="headerlink" title="Other sources"></a>Other sources</h3><p>Data can come from many kinds of sources, and each source type is configured quite differently. Follow the official documentation for the one you need.</p><p>Avro Source</p><blockquote><p>Flume to Flume</p></blockquote><p><a href="http://flume.apache.org/releases/content/1.6.0/FlumeUserGuide.html#avro-source">http://flume.apache.org/releases/content/1.6.0/FlumeUserGuide.html#avro-source</a></p><p>HTTP Source</p><blockquote><p>Receives data via HTTP POST and GET</p></blockquote><p><a href="http://flume.apache.org/releases/content/1.6.0/FlumeUserGuide.html#http-source">http://flume.apache.org/releases/content/1.6.0/FlumeUserGuide.html#http-source</a></p><p>For the other posts in the big data series, see <a href="https://blackyau.cc/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/">here</a></p><h2 id="参考"><a href="#参考" 
class="headerlink" title="References"></a>References</h2><p><a href="http://flume.apache.org/releases/content/1.6.0/FlumeUserGuide.html">Flume 1.6.0 User Guide</a></p><p><a href="https://stackoverflow.com/questions/26545524/">Stack Overflow@prayagupd - There are 0 datanode(s) running and no node(s) are excluded in this operation</a></p><p><a href="https://my.oschina.net/u/3747963/blog/1834981">OSChina@海岸线的曙光 - Flume log collection: Logger and HDFS data transfer</a></p><p><a href="http://www.cnblogs.com/xiangyuqi/p/8690902.html">cnblogs@项羽齐 - Big Data 3: Collecting data with Flume and writing it to HDFS</a></p><p><a href="https://www.jianshu.com/p/4f43780c82e9">Jianshu@Woople - An in-depth look at common Flume HDFS Sink settings</a></p><p><a href="http://lxw1234.com/archives/2015/10/527.htm">lxw的大数据田地 - Flume HDFS Sink configuration parameters explained</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;接上文，我们已经完成了 &lt;a href=&quot;/11&quot; title=&quot;Hadoop 伪分布部署&quot;&gt;Hadoop 伪分布部署&lt;/a&gt; 接下来就可以配置 Flume 了。&lt;/p&gt;</summary>
    
    
    
    <category term="大数据" scheme="https://blackyau.cc/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
    <category term="CentOS" scheme="https://blackyau.cc/tags/CentOS/"/>
    
    <category term="Linux" scheme="https://blackyau.cc/tags/Linux/"/>
    
    <category term="Hadoop" scheme="https://blackyau.cc/tags/Hadoop/"/>
    
    <category term="HDFS" scheme="https://blackyau.cc/tags/HDFS/"/>
    
    <category term="Flume" scheme="https://blackyau.cc/tags/Flume/"/>
    
  </entry>
  
  <entry>
    <title>Hadoop Pseudo-Distributed Deployment</title>
    <link href="https://blackyau.cc/11"/>
    <id>https://blackyau.cc/11</id>
    <published>2019-04-04T09:00:17.000Z</published>
    <updated>2019-06-02T07:47:45.000Z</updated>
    
    <content type="html"><![CDATA[<p>Installing Hadoop on a single machine is fairly straightforward, so here I simulate the deployment in VMware. A young person's first trip into big data??</p><span id="more"></span><h2 id="新建虚拟机"><a href="#新建虚拟机" class="headerlink" title="Create a virtual machine"></a>Create a virtual machine</h2><p>Download the image: <a href="https://mirrors.tuna.tsinghua.edu.cn/centos/7/isos/x86_64/CentOS-7-x86_64-Minimal-1810.iso">https://mirrors.tuna.tsinghua.edu.cn/centos/7/isos/x86_64/CentOS-7-x86_64-Minimal-1810.iso</a></p><p><img data-src="https://st.blackyau.net/blog/11/1.png" alt="1"><br><img data-src="https://st.blackyau.net/blog/11/2.png" alt="2"><br><img data-src="https://st.blackyau.net/blog/11/3.png" alt="3"><br><img data-src="https://st.blackyau.net/blog/11/4.png" alt="4"><br><img data-src="https://st.blackyau.net/blog/11/5.png" alt="5"></p><p>Choose the memory and disk size to suit your machine. Also, after creating the VM I usually open its settings and remove the printer before starting it.</p><p>Use the arrow keys to move the focus up to <code>Install CentOS 7</code> and press Enter</p><p><img data-src="https://st.blackyau.net/blog/11/6.png" alt="6"></p><p>Just click <code>Continue</code> to keep English. With an English system it is easier to search online for solutions when problems come up.</p><p><img data-src="https://st.blackyau.net/blog/11/7.png" alt="7"></p><p>First, to make the logs easier to read, open <code>DATE &amp; TIME</code> and set the time zone and time to match the host machine.</p><p>Open <code>NETWORK &amp; HOST NAME</code> and turn on the network connection</p><p>Finally, click Begin Installation</p><p><img data-src="https://st.blackyau.net/blog/11/8.png" alt="8"></p><p>You can set the <code>Root</code> password during installation. When installation finishes, just click <code>Reboot</code> and you are done.</p><p><img data-src="https://st.blackyau.net/blog/11/9.png" alt="9"></p><p>Wait until the following appears</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">CentOS Linux 7 (Core)</span><br><span class="line">Kernel 3.10.0-957.el7.x86_64 on an x86_64</span><br><span class="line"></span><br><span class="line">localhost login:</span><br></pre></td></tr></table></figure><p>Once it does, the installation is complete. Log in as <code>root</code> with the password you just set. After logging in, run <code>ip add</code> to look up the machine's IP address.</p><p><img 
data-src="https://st.blackyau.net/blog/11/10.png" alt="10"></p><p>With the <code>IP</code> address you can switch to an SSH client such as Xshell, PuTTY, or SecureCRT, which makes copying and pasting much more pleasant. I use Xshell as the example here.</p><blockquote><p>In Xshell the copy shortcut is <code>Ctrl+Insert</code> and the paste shortcut is <code>Shift+Insert</code> (on a standard keyboard, Insert sits above Delete)</p></blockquote><p>Create a new session and fill in the <code>Name</code> and <code>IP</code>. Then click <code>Authentication</code> in the sidebar and enter the <code>User Name</code> and <code>Password</code></p><p><img data-src="https://st.blackyau.net/blog/11/11.png" alt="11"></p><p>Next, click <code>Tunneling</code> under <code>SSH</code> in the sidebar and turn off <code>Forward X11 connections to</code> at the bottom.</p><p><img data-src="https://st.blackyau.net/blog/11/12.png" alt="12"></p><p>Click OK. When connecting, if you are shown an <code>Unknown Host Key</code> prompt, choose <code>Accept and Save</code></p><p><img data-src="https://st.blackyau.net/blog/11/13.png" alt="13"></p><h2 id="服务器基本环境配置"><a href="#服务器基本环境配置" class="headerlink" title="Basic server environment setup"></a>Basic server environment setup</h2><h3 id="关闭防火墙"><a href="#关闭防火墙" class="headerlink" title="Turn off the firewall"></a>Turn off the firewall</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">systemctl stop firewalld # stop the firewall</span><br><span class="line">systemctl disable firewalld # keep the firewall from starting at boot</span><br><span class="line">firewall-cmd --state # check the firewall state</span><br></pre></td></tr></table></figure><p>When it prints the following, this step is done</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">not running</span><br></pre></td></tr></table></figure><h3 id="关闭-Selinux"><a href="#关闭-Selinux" class="headerlink" title="Disable SELinux"></a>Disable SELinux</h3><blockquote><p>To learn more about SELinux, search online</p></blockquote><p>First open the configuration file</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi /etc/selinux/config</span><br></pre></td></tr></table></figure><p>Edit the <code>SELINUX</code> field as follows</p><figure 
class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta prompt_"># </span><span class="language-bash">This file controls the state of SELinux on the system.</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">SELINUX= can take one of these three values:</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">    enforcing - SELinux security policy is enforced.</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">    permissive - SELinux prints warnings instead of enforcing.</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">    disabled - No SELinux policy is loaded.</span></span><br><span class="line">SELINUX=disabled # change this to disabled</span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">SELINUXTYPE= can take one of three values:</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">    targeted - Targeted processes are protected,</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">    minimum - Modification of targeted policy. Only selected processes are protected.</span></span><br><span class="line"><span class="meta prompt_"># </span><span class="language-bash">    mls - Multi Level Security protection.</span></span><br><span class="line">SELINUXTYPE=targeted</span><br></pre></td></tr></table></figure>
<blockquote><p>To save and quit vi, press <code>Esc</code> and then type <code>:wq</code>. Search online for more about using vi.</p></blockquote><p>The change takes effect after a reboot. We will reboot once the hostname has been changed as well.</p><h3 id="修改-hostname"><a href="#修改-hostname" class="headerlink" title="Change the hostname"></a>Change the hostname</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">hostnamectl set-hostname master # set the hostname to master</span><br><span class="line">vi /etc/hosts # open the hosts file</span><br></pre></td></tr></table></figure><p>HDFS tends to pick up <code>localhost</code>, so comment out the localhost entries in <code>hosts</code>. After the change, the <code>hosts</code> file looks like this</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"># 127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4</span><br><span class="line"># ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6</span><br><span class="line">192.168.66.135 master # replace with your own IP</span><br></pre></td></tr></table></figure><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">reboot # reboot to apply the changes</span><br></pre></td></tr></table></figure><h2 id="安装-Java"><a href="#安装-Java" class="headerlink" title="Install Java"></a>Install Java</h2><p>Download <a href="https://www.oracle.com/cn/java/technologies/javase/javase8u211-later-archive-downloads.html">https://www.oracle.com/cn/java/technologies/javase/javase8u211-later-archive-downloads.html</a></p><p>I chose the <code>Linux x64 RPM package</code></p>
<p>In the download folder, hold <code>Shift</code>, right-click an empty area, and choose <code>Open PowerShell window here</code> to bring up <code>PowerShell</code>. Since Python is installed on the host, run <code>python -m http.server 80</code> to start a simple <code>HTTP server</code>. Then, in a browser, open the host's address on the VMware virtual network. In VMware NAT mode this is normally the VM's IP with the last octet changed to <code>1</code>, so here I open <code>192.168.66.1</code> and copy the <code>Java JDK link</code>.</p><blockquote><p>Don't forget to turn off the Windows firewall. If you use Python 2 (the version that ships with CentOS 7), the HTTP server command is <code>python -m SimpleHTTPServer 80</code></p></blockquote><p><img data-src="https://st.blackyau.net/blog/11/14.png" alt="14"></p><p>On CentOS, run the following commands to download and install it</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">curl -O http://192.168.66.1/jdk-8u201-linux-x64.rpm # replace with the link you just copied</span><br><span class="line">rpm -ivh jdk-8u201-linux-x64.rpm</span><br></pre></td></tr></table></figure><p>If the console prints something like the following, the installation succeeded</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line">[root@master ~] curl -O http://192.168.66.1/jdk-8u201-linux-x64.rpm</span><br><span class="line"><span class="meta prompt_">  % </span><span class="language-bash">Total    % Received % Xferd  Average Speed   Time    Time     Time  Current</span></span><br><span class="line">                                 Dload  Upload   Total   Spent    Left  Speed</span><br><span class="line">100  168M  100  168M    0     0  7569k      0  0:00:22  0:00:22 --:--:-- 7910k</span><br><span class="line">[root@master ~] rpm -ivh jdk-8u201-linux-x64.rpm</span><br><span class="line">warning: jdk-8u201-linux-x64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY</span><br><span class="line">Preparing...                          ################################# [100%]</span><br><span class="line">Updating / installing...</span><br><span class="line">   1:jdk1.8-2000:1.8.0_201-fcs        ################################# [100%]</span><br><span class="line">Unpacking JAR files...</span><br><span class="line">tools.jar...</span><br><span class="line">plugin.jar...</span><br><span class="line">javaws.jar...</span><br><span class="line">deploy.jar...</span><br><span class="line">rt.jar...</span><br><span class="line">jsse.jar...</span><br><span class="line">charsets.jar...</span><br><span class="line">localedata.jar...</span><br></pre></td></tr></table></figure>
<p>Test it</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">java -version</span><br></pre></td></tr></table></figure><p>Output like the following means the installation succeeded</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">java version &quot;1.8.0_201&quot;</span><br><span class="line">Java(TM) SE Runtime Environment (build 1.8.0_201-b09)</span><br><span class="line">Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)</span><br></pre></td></tr></table></figure><h2 id="安装-Hadoop"><a href="#安装-Hadoop" class="headerlink" title="Install Hadoop"></a>Install Hadoop</h2><p>Download <a 
href="https://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/hadoop-2.6.0.tar.gz">https://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/hadoop-2.6.0.tar.gz</a></p><p>Transfer the file into CentOS the same way as before</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">curl -O http://192.168.66.1/hadoop-2.6.0.tar.gz # replace with the link you copied</span><br><span class="line">tar -xvf hadoop-2.6.0.tar.gz # extract Hadoop</span><br></pre></td></tr></table></figure><p>Next, configure the environment variables for Java and Hadoop. Open the profile with <code>vi /etc/profile</code> and press capital <code>G</code> (Shift+G) to jump to the end of the file. Add the three <code>export</code> lines, after which the end of the file should look like this.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">for i in /etc/profile.d/*.sh /etc/profile.d/sh.local ; do</span><br><span class="line">    if [ -r &quot;$i&quot; ]; then</span><br><span class="line">        if [ &quot;$&#123;-#*i&#125;&quot; != &quot;$-&quot; ]; then</span><br><span class="line">            . &quot;$i&quot;</span><br><span class="line">        else</span><br><span class="line">            . &quot;$i&quot; &gt;/dev/null</span><br><span class="line">        fi</span><br><span class="line">    fi</span><br><span class="line">done</span><br><span class="line"></span><br><span class="line">unset i</span><br><span class="line">unset -f pathmunge</span><br><span class="line"></span><br><span class="line">export JAVA_HOME=/usr/java/jdk1.8.0_201-amd64</span><br><span class="line">export HADOOP_HOME=/root/hadoop-2.6.0</span><br><span class="line">export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin</span><br></pre></td></tr></table></figure>
<p>Run the following to reload the profile</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">source /etc/profile</span><br></pre></td></tr></table></figure><p>Run the following to check that the configuration works</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">hadoop version</span><br></pre></td></tr></table></figure><p>If it prints the following, your configuration succeeded</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">Hadoop 2.6.0</span><br><span class="line">Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1</span><br><span class="line">Compiled by jenkins on 2014-11-13T21:10Z</span><br><span class="line">Compiled with protoc 2.5.0</span><br><span class="line">From source with checksum 18e43357c8f927c0695f1e9522859d6a</span><br><span class="line">This command was run using /root/hadoop-2.6.0/share/hadoop/common/hadoop-common-2.6.0.jar</span><br></pre></td></tr></table></figure><h2 id="配置-SSH-实现免密登陆"><a href="#配置-SSH-实现免密登陆" class="headerlink" title="Set up passwordless SSH login"></a>Set up passwordless SSH login</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh-keygen -t rsa </span><br></pre></td></tr></table></figure><p>Then press Enter at every prompt. The output should look like this</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line">[root@master hadoop-2.6.0] ssh-keygen -t rsa </span><br><span class="line">Generating public/private rsa key pair.</span><br><span class="line">Enter file in which to save the key (/root/.ssh/id_rsa): </span><br><span class="line">Created directory &#x27;/root/.ssh&#x27;.</span><br><span class="line">Enter passphrase (empty for no passphrase): </span><br><span class="line">Enter same passphrase again: </span><br><span class="line">Your identification has been saved in /root/.ssh/id_rsa.</span><br><span class="line">Your public key has been saved in /root/.ssh/id_rsa.pub.</span><br><span class="line">The key fingerprint is:</span><br><span class="line">SHA256:bP4nV0fky1ShxRm07JRVIYIrpdb23O1ha+xIJYONTTY root@master</span><br><span class="line">The key&#x27;s randomart image is:</span><br><span class="line">+---[RSA 2048]----+</span><br><span class="line">|   .      .. .oL+|</span><br><span class="line">|     B   o  . *o*|</span><br><span class="line">|    B o + .  . Bo|</span><br><span class="line">|   o B @ o    ..=|</span><br><span class="line">|    . @ P     oo.|</span><br><span class="line">|     . O      .o.|</span><br><span class="line">|    . . .    . . |</span><br><span class="line">|     . . .. o    |</span><br><span class="line">|          .+     |</span><br><span class="line">+----[SHA256]-----+</span><br></pre></td></tr></table></figure>
<p>Create authorized_keys</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">cat ~/.ssh/id_rsa.pub &gt;&gt; ~/.ssh/authorized_keys</span><br></pre></td></tr></table></figure><p>Test passwordless login with the following command</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ssh master</span><br></pre></td></tr></table></figure><p>If you only had to type <code>yes</code> (to accept the host key) and were logged in without a password prompt, this step succeeded. Type <code>exit</code> to log out, then continue with the next step.</p><h2 id="配置-Hadoop"><a href="#配置-Hadoop" class="headerlink" title="Configure Hadoop"></a>Configure Hadoop</h2><blockquote><p>I recommend taking a snapshot of the VM before you start configuring</p></blockquote><p>Change to Hadoop's configuration directory</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">cd /root/hadoop-2.6.0/etc/hadoop/</span><br></pre></td></tr></table></figure><h3 id="core-site-xml"><a href="#core-site-xml" class="headerlink" title="core-site.xml"></a>core-site.xml</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi core-site.xml</span><br></pre></td></tr></table></figure><p>Add the following between <code>&lt;configuration&gt;</code> and <code>&lt;/configuration&gt;</code></p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span 
class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">&lt;!-- Filesystem URI (scheme) used by Hadoop: the address of the HDFS master (NameNode) --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>fs.defaultFS<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>hdfs://master:9000<span class="tag">&lt;/<span class="name">value</span>&gt;</span> <span class="comment">&lt;!-- hostname of this machine --&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- Directory where Hadoop stores the files it generates at runtime --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>hadoop.tmp.dir<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>/root/hadoop-2.6.0/tmp<span class="tag">&lt;/<span class="name">value</span>&gt;</span> <span class="comment">&lt;!-- storage directory --&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br></pre></td></tr></table></figure><h3 id="hdfs-site-xml"><a href="#hdfs-site-xml" class="headerlink" title="hdfs-site.xml"></a>hdfs-site.xml</h3><p><code>vi hdfs-site.xml</code></p><p>Add the following between <code>&lt;configuration&gt;</code> and <code>&lt;/configuration&gt;</code></p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span
class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">&lt;!-- Number of replicas HDFS keeps of each block --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>dfs.replication<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>1<span class="tag">&lt;/<span class="name">value</span>&gt;</span> <span class="comment">&lt;!-- pseudo-distributed: data is stored in only one place --&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br></pre></td></tr></table></figure><h3 id="mapred-site-xml"><a href="#mapred-site-xml" class="headerlink" title="mapred-site.xml"></a>mapred-site.xml</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">cp mapred-site.xml.template mapred-site.xml</span><br><span class="line">vi mapred-site.xml</span><br></pre></td></tr></table></figure><p>Add the following between <code>&lt;configuration&gt;</code> and <code>&lt;/configuration&gt;</code></p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">&lt;!-- Run MapReduce on YARN --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>mapreduce.framework.name<span class="tag">&lt;/<span
class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>yarn<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br></pre></td></tr></table></figure><h3 id="yarn-site-xml"><a href="#yarn-site-xml" class="headerlink" title="yarn-site.xml"></a>yarn-site.xml</h3><p><code>vi yarn-site.xml</code></p><p>Add the following between <code>&lt;configuration&gt;</code> and <code>&lt;/configuration&gt;</code></p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">&lt;!-- Address of the YARN master (ResourceManager) --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.resourcemanager.hostname<span class="tag">&lt;/<span class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>master<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br><span class="line"></span><br><span class="line"><span class="comment">&lt;!-- How reducers fetch map output (the shuffle auxiliary service) --&gt;</span></span><br><span class="line"><span class="tag">&lt;<span class="name">property</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">name</span>&gt;</span>yarn.nodemanager.aux-services<span class="tag">&lt;/<span
class="name">name</span>&gt;</span></span><br><span class="line">    <span class="tag">&lt;<span class="name">value</span>&gt;</span>mapreduce_shuffle<span class="tag">&lt;/<span class="name">value</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">property</span>&gt;</span></span><br></pre></td></tr></table></figure><h3 id="hadoop-env-sh"><a href="#hadoop-env-sh" class="headerlink" title="hadoop-env.sh"></a>hadoop-env.sh</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">vi hadoop-env.sh</span><br></pre></td></tr></table></figure><p>Append the following line at the end of the file. The daemons are started over SSH and may not pick up the <code>JAVA_HOME</code> set in <code>/etc/profile</code>, so it is set here explicitly.</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">export JAVA_HOME=/usr/java/jdk1.8.0_201-amd64</span><br></pre></td></tr></table></figure><h2 id="运行-Hadoop"><a href="#运行-Hadoop" class="headerlink" title="运行 Hadoop"></a>Running Hadoop</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">hdfs namenode -format # format HDFS (only before the first start; reformatting erases the existing HDFS namespace)</span><br><span class="line">start-dfs.sh # start HDFS</span><br><span class="line">start-yarn.sh # start YARN</span><br><span class="line">mr-jobhistory-daemon.sh start historyserver # start the JobHistory server</span><br><span class="line">hdfs dfs -chmod -R 755 /tmp # grant permissions on /tmp (a stopgap; better to run Hadoop under a separate non-root user)</span><br><span class="line">start-all.sh # deprecated alternative to start-dfs.sh + start-yarn.sh; only use it when you are confident the setup works</span><br></pre></td></tr></table></figure><h2 id="停止-Hadoop"><a href="#停止-Hadoop" class="headerlink" title="停止 Hadoop"></a>Stopping Hadoop</h2><p>After changing the configuration, always stop Hadoop first and then start it again</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span
class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">stop-dfs.sh # stop HDFS</span><br><span class="line">stop-yarn.sh # stop YARN</span><br><span class="line">mr-jobhistory-daemon.sh stop historyserver # stop the JobHistory server</span><br><span class="line">stop-all.sh # deprecated shortcut that stops HDFS and YARN together</span><br></pre></td></tr></table></figure><p>For the other articles in the Big Data series, see <a href="https://blackyau.cc/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/">here</a></p><h2 id="测试"><a href="#测试" class="headerlink" title="测试"></a>Testing</h2><p><a href="http://192.168.66.135:50070/">http://192.168.66.135:50070</a> (NameNode web UI)</p><p><img data-src="https://st.blackyau.net/blog/11/15.png" alt="15"></p><p><a href="http://192.168.66.135:19888/">http://192.168.66.135:19888</a> (JobHistory server web UI)</p><p><img data-src="https://st.blackyau.net/blog/11/16.png" alt="16"></p><p><a href="http://192.168.66.135:8088/">http://192.168.66.135:8088</a> (ResourceManager web UI)</p><p><img data-src="https://st.blackyau.net/blog/11/17.png" alt="17"></p><h2 id="参考"><a href="#参考" class="headerlink" title="参考"></a>References</h2><p><a href="https://item.jd.com/12109713.html">Hadoop: The Definitive Guide@Tom White</a></p><p><a href="https://blog.csdn.net/dancheng1/article/details/78512028">CSDN@dancheng_work - Disabling the firewall on CentOS 7</a></p><p><a href="https://www.cnblogs.com/whoamme/p/4039998.html">Cnblogs@WhoAmMe - Adding environment variables on CentOS</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;Installing Hadoop on a single machine is fairly simple, so here I simulate a deployment in VMware. A young person's first journey into big data?&lt;/p&gt;</summary>
    
    
    
    <category term="大数据" scheme="https://blackyau.cc/categories/%E5%A4%A7%E6%95%B0%E6%8D%AE/"/>
    
    
    <category term="CentOS" scheme="https://blackyau.cc/tags/CentOS/"/>
    
    <category term="Linux" scheme="https://blackyau.cc/tags/Linux/"/>
    
    <category term="Hadoop" scheme="https://blackyau.cc/tags/Hadoop/"/>
    
    <category term="HDFS" scheme="https://blackyau.cc/tags/HDFS/"/>
    
    <category term="YARN" scheme="https://blackyau.cc/tags/YARN/"/>
    
    <category term="VMware" scheme="https://blackyau.cc/tags/VMware/"/>
    
    <category term="MapReduce" scheme="https://blackyau.cc/tags/MapReduce/"/>
    
  </entry>
  
  <entry>
    <title>Installing CentOS 7 on a Physical Machine</title>
    <link href="https://blackyau.cc/10"/>
    <id>https://blackyau.cc/10</id>
    <published>2019-03-27T11:45:17.000Z</published>
    <updated>2025-03-02T12:55:01.000Z</updated>
    
    <content type="html"><![CDATA[<p>I installed CentOS on an ancient desktop at home and ran into some problems I never encountered on cloud servers, so I'm recording them here.</p><div class="note danger"><h4 id="警告"><a href="#警告" class="headerlink" title="警告"></a>Warning</h4><p>CentOS Linux 7 reached end of life (EOL) on June 30, 2024. Continuing to use CentOS 7 is no longer recommended; use <a href="https://rockylinux.org/">Rocky Linux</a> instead.</p></div><span id="more"></span><h2 id="准备工作"><a href="#准备工作" class="headerlink" title="准备工作"></a>Preparation</h2><p>I downloaded the full image directly from the <a href="https://www.centos.org/download/">official website</a> and used UltraISO to write it to a USB drive as a boot disk. Before installing CentOS, I had actually spent two days, on and off, trying to install Ubuntu: the initial installation finished, but I could never boot into the system and couldn't figure out why, so I switched to CentOS.</p><h2 id="进入安装向导"><a href="#进入安装向导" class="headerlink" title="进入安装向导"></a>Entering the installer</h2><p>After booting from the USB drive in <code>UEFI</code> mode, you land on this screen. 👇</p><p><img data-src="https://st.blackyau.net/blog/10/1.png" alt="1"></p><p>Press <code>e</code> to edit the boot entry (if you booted in <code>Legacy BIOS</code> mode, press <code>Tab</code> instead) and find the first line, <code>linuxefi /images/pxeboot/vmlinuz inst.stage2=hd:.......</code> 👇</p><p><img data-src="https://st.blackyau.net/blog/10/2.png" alt="2"></p><p>Change it to <code>linuxefi /images/pxeboot/vmlinuz initrd=initrd.img linux dd quiet</code>. This is done to find out the device name of the boot drive, which the installation needs later. 👇</p><p><img data-src="https://st.blackyau.net/blog/10/3.png" alt="3"></p><p>Press <code>Ctrl+x</code>, and after a moment the boot stops at this screen. It's best to take a photo here, and use <code>LABEL</code> and <code>TYPE</code> to identify and note the device name of the CentOS boot drive (<code>sda4</code> in the screenshot below). 👇</p><p><img data-src="https://st.blackyau.net/blog/10/4.png" alt="4"></p><p>Press the power button to restart the computer and boot from the USB drive again. Press <code>e</code> to edit the entry again, but this time change the first line to <code>linuxefi /images/pxeboot/vmlinuz inst.stage2=hd:/dev/DEVICE quiet</code>, where <code>DEVICE</code> is the device name you noted (<code>sda4</code> in the screenshot below). Then press <code>Ctrl+x</code> to enter the installer.</p><p><img data-src="https://st.blackyau.net/blog/10/5.png" alt="5"></p><p><img data-src="https://st.blackyau.net/blog/10/6.png" alt="6"></p><h2 id="安装向导"><a href="#安装向导" class="headerlink" title="安装向导"></a>Installer</h2><h3 id="选择语言"><a href="#选择语言" class="headerlink" title="选择语言"></a>Choosing the language</h3><p>I chose <code>English</code> without hesitation, since it makes solutions easier to find when you run into problems.</p><p><img data-src="https://st.blackyau.net/blog/10/7.png" alt="7"></p><h3 id="硬盘分区"><a href="#硬盘分区" class="headerlink" title="硬盘分区"></a>Disk partitioning</h3><p>Click <code>SYSTEM - INSTALLATION DESTINATION</code></p><p><img data-src="https://st.blackyau.net/blog/10/8.png" alt="8"></p><p>Select the disk that will be the boot disk and click <code>Set as Boot Device</code> at the bottom left</p><p><img data-src="https://st.blackyau.net/blog/10/9.jpg" alt="9"></p><p>Because I have several physical disks, I partitioned them manually and used LVM to manage them, which makes later expansion easier.</p><p>Click <code>x storage devices selected</code> at the bottom, tick all the disks, and then continue.</p><p><img data-src="https://st.blackyau.net/blog/10/10.jpg" alt="10"></p><p>After adding the partitions, click <code>Volume Group - Modify</code> on the left, create a volume group named <code>all</code> that contains all of the disks, and then adjust the partitions one by one.</p><p>My settings are as follows:</p><p><img data-src="https://st.blackyau.net/blog/10/11.jpg" alt="11"></p><p><img data-src="https://st.blackyau.net/blog/10/12.jpg" alt="12"></p><p>Also note that the <code>bios</code> partition cannot be placed in the <code>LVM</code> volume group, and the boot file system cannot use <code>LVM</code> either. Size the <code>/</code> and <code>swap</code> partitions according to your own needs.</p><h3 id="时间"><a href="#时间" class="headerlink" title="时间"></a>Time</h3><p>After configuring the disks, I recommend setting the date and time. Either match the computer's current local time or pick the location you want.</p><p><img data-src="https://st.blackyau.net/blog/10/13.jpg" alt="13"></p><h3 id="登陆凭证"><a href="#登陆凭证" class="headerlink" title="登陆凭证"></a>Login credentials</h3><p>Once everything is ready, click <code>Begin Installation</code> at the bottom right to start the installation and move on to the last step, setting up the login credentials. I won't go into detail here.</p><p><img data-src="https://st.blackyau.net/blog/10/14.png" alt="14"></p><h2 id="安装完毕"><a href="#安装完毕" class="headerlink" title="安装完毕"></a>Installation complete</h2><p><img data-src="https://st.blackyau.net/blog/10/15.png" alt="15"></p><h2 id="参考"><a href="#参考" class="headerlink" title="参考:"></a>References:</h2><p><a href="http://linux.vbird.org/linux_basic/0157installcentos7.php">VBird's Linux Private Kitchen (鳥哥的 Linux 私房菜) - Chapter 3: Installing CentOS 7.x</a></p><p><a href="https://zhuanlan.zhihu.com/p/35161351">Zhihu@cenbug - Installing CentOS 7&#x2F;Linux on a physical machine</a></p><p><a href="https://www.cnblogs.com/wudongyu/p/6673784.html">Cnblogs@期待某一天 - Installing CentOS 7 on a physical laptop</a></p>]]></content>
    
    
    <summary type="html">&lt;p&gt;I installed CentOS on an ancient desktop at home and ran into some problems I never encountered on cloud servers, so I'm recording them here.&lt;/p&gt;
&lt;div class=&quot;note danger&quot;&gt;&lt;h4 id=&quot;警告&quot;&gt;&lt;a href=&quot;#警告&quot; class=&quot;headerlink&quot; title=&quot;警告&quot;&gt;&lt;/a&gt;Warning&lt;/h4&gt;&lt;p&gt;CentOS Linux 7 reached end of life (EOL) on June 30, 2024. Continuing to use CentOS 7 is no longer recommended; use &lt;a href=&quot;https://rockylinux.org/&quot;&gt;Rocky Linux&lt;/a&gt; instead.&lt;/p&gt;
&lt;/div&gt;</summary>
    
    
    
    <category term="教程" scheme="https://blackyau.cc/categories/%E6%95%99%E7%A8%8B/"/>
    
    
    <category term="CentOS" scheme="https://blackyau.cc/tags/CentOS/"/>
    
    <category term="Linux" scheme="https://blackyau.cc/tags/Linux/"/>
    
  </entry>
  
  <entry>
    <title>Using PGP/GPG for Digital Signatures, Encryption, and Decryption</title>
    <link href="https://blackyau.cc/9"/>
    <id>https://blackyau.cc/9</id>
    <published>2018-06-09T15:27:00.000Z</published>
    <updated>2019-04-06T03:36:24.000Z</updated>
    
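    <content type="html"><![CDATA[<p>PGP is an encryption program (standardized as the OpenPGP protocol) used to encrypt information and verify the origin of software. This post introduces how to use PGP on Windows.</p><span id="more"></span><h2 id="PGP-GPG-GnuPG-OpenPGP-关系和区别"><a href="#PGP-GPG-GnuPG-OpenPGP-关系和区别" class="headerlink" title="PGP&#x2F;GPG&#x2F;GnuPG&#x2F;OpenPGP 关系和区别"></a>How PGP&#x2F;GPG&#x2F;GnuPG&#x2F;OpenPGP relate and differ</h2><p>PGP is an application that Phil Zimmermann developed in 1991 and that later became a commercial product. In July 1997, PGP Inc. and Zimmermann agreed to let the IETF draw up an open Internet standard called OpenPGP, and any program that supports this standard may also use the name OpenPGP. GnuPG (GPG) is an open-source application that implements this standard. Here's a little story:</p><blockquote><p>Back then, the father of PGP (Phil Zimmermann) nearly went to prison at the hands of the US government for inventing the file&#x2F;email encryption tool PGP. US law at the time (the early 1990s) classified "strong encryption" as military technology that could not be exported. Zimmermann later came up with a clever trick: he had MIT Press publish a book containing the complete PGP source code, then invoked the First Amendment to the US Constitution (publications are protected as free speech), and that is how the charges against him were overturned. <a href="https://program-think.blogspot.com/2014/06/truecrypt-dead.html"><sup>source</sup></a></p></blockquote><h2 id="PGP-加解密和数字签名简单介绍"><a href="#PGP-加解密和数字签名简单介绍" class="headerlink" title="PGP 加解密和数字签名简单介绍"></a>A brief introduction to PGP encryption and digital signatures</h2><p>We can generate a key pair, a public key and a private key, that is unique in the world. The public key is used for encryption and the private key for decryption. Usually we publish only the public key; others encrypt data with it and then send it to me through any channel. Because a private key can only decrypt <em>data encrypted with its matching public key</em>, data encrypted with my public key can only be decrypted and read by me. That is the basic principle of PGP encryption. (In practice, PGP encrypts the data itself with a random symmetric session key and encrypts only that session key with the public key.)</p><p>A digital signature, by contrast, is used to verify a file's integrity and provides no encryption. A check field (the signature) is typically appended to the end of the message being sent. The signature is computed from a hash of the file using the sender's private key, so the recipient can use the sender's public key to check whether the message has been tampered with. Most applications that support OpenPGP sign by default when encrypting.</p><h2 id="Windows-下使用-GnuPG-加密-解密-签名-校验"><a href="#Windows-下使用-GnuPG-加密-解密-签名-校验" class="headerlink" title="Windows 下使用 GnuPG 加密&#x2F;解密&#x2F;签名&#x2F;校验"></a>Encrypting&#x2F;decrypting&#x2F;signing&#x2F;verifying with GnuPG on Windows</h2><p>On Windows I chose <a href="https://gpg4win.org/">Gpg4win</a> for this tutorial. It uses GnuPG as the encryption backend and Kleopatra as the certificate manager and general-purpose crypto dialog (see <a href="https://gpg4win.org/about.html">About</a> for more). It has a graphical interface and, most importantly, it and all of its bundled tools are <a href="https://gpg4win.org/about.html">open source</a>.</p><h3 id="下载安装-Gpg4win"><a href="#下载安装-Gpg4win" class="headerlink" title="下载安装 Gpg4win"></a>Downloading and installing Gpg4win</h3><p>Download it from the Gpg4win homepage <a href="https://gpg4win.org/">https://gpg4win.org</a> and install it to any path you like.</p><h3 id="使用-Gpg4win-加解密-签名检验文件"><a href="#使用-Gpg4win-加解密-签名检验文件" class="headerlink" title="使用 Gpg4win 加解密&#x2F;签名检验文件"></a>Encrypting&#x2F;decrypting and signing&#x2F;verifying files with Gpg4win</h3><p>Open Kleopatra from the desktop or the Start menu and click "New Key Pair"</p><p><img data-src="https://st.blackyau.net/blog/9/1.png" alt="1"></p><p>Enter the details you want in the dialog (they will be visible to everyone)</p><p><img data-src="https://st.blackyau.net/blog/9/2.png" alt="2"></p><p>You will be asked for a passphrase, which is used to protect the private key</p><p><img data-src="https://st.blackyau.net/blog/9/3.png" alt="3"></p><p>"Make a Backup Of Your Key Pair" lets you save the private and public keys somewhere else, and "Upload Public Key To Directory Service" uploads your public key to a keyserver so that others can find it by name&#x2F;email or fingerprint (only the fingerprint is unique). Note: uploading a public key to a keyserver is irreversible. Even after the key expires it is not deleted; it is only marked "expired".</p><p>Kleopatra's default keyserver at the time of writing: <a href="http://pool.sks-keyservers.net/">http://pool.sks-keyservers.net/</a> (the SKS keyserver pool has since been shut down)</p><p><img data-src="https://st.blackyau.net/blog/9/4.png" alt="4"></p><p>Once you have a key pair you can encrypt files: click "Sign&#x2F;Encrypt" at the top left. After a file is encrypted, nobody without the private key can decrypt it. To encrypt a file with someone else's public key, first import their public key, then tick "Encrypt for others" when encrypting and type and select their name.</p><p><img data-src="https://st.blackyau.net/blog/9/5.png" alt="5"></p><p>To decrypt, you must first have imported the matching key pair (including the private key). Then enter its passphrase and click "Save All"</p><p><img data-src="https://st.blackyau.net/blog/9/6.png" alt="6"></p><h3 id="使用-Gpg4win-加解密-签名检验文本"><a href="#使用-Gpg4win-加解密-签名检验文本" class="headerlink" title="使用 Gpg4win 加解密&#x2F;签名检验文本"></a>Encrypting&#x2F;decrypting and signing&#x2F;verifying text with Gpg4win</h3><p>Click "Notepad", the last item on the toolbar, to encrypt&#x2F;decrypt and sign&#x2F;verify plain text.</p><p><img data-src="https://st.blackyau.net/blog/9/7.png" alt="7"></p><p>Enter the content in the text box and choose the key for encryption&#x2F;decryption or signing&#x2F;verification on the "Recipients" tab. Then click "Sign&#x2F;Encrypt Notepad" or "Decrypt&#x2F;Verify Notepad" at the top to perform the operation.</p><p><img data-src="https://st.blackyau.net/blog/9/8.png" alt="8"></p><h3 id="导出密钥对"><a href="#导出密钥对" class="headerlink" title="导出密钥对"></a>Exporting the key pair</h3><p>Select your key, right-click it and choose "Export Secret Keys"</p><h2 id="来试试使用PGP解密以下纯文本吧"><a href="#来试试使用PGP解密以下纯文本吧" class="headerlink" title="来试试使用PGP解密以下纯文本吧"></a>Try decrypting the following text with PGP</h2><p>Ciphertext: <a href="https://st.blackyau.net/blog/file/Ciphertext.txt">https://st.blackyau.net/blog/file/Ciphertext.txt</a> (copy its contents out)</p><p>Key pair: <a href="https://st.blackyau.net/blog/file/LetPGPFly.gpg">https://st.blackyau.net/blog/file/LetPGPFly.gpg</a> (can encrypt&#x2F;decrypt)</p><p>Public key: <a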
href="https://st.blackyau.net/blog/file/LetPGPFly.asc">https://st.blackyau.net/blog/file/LetPGPFly.asc</a> (encryption only)</p><h2 id="其他平台"><a href="#其他平台" class="headerlink" title="其他平台"></a>Other platforms</h2><p>Android: OpenKeychain: Easy PGP (when encrypting, you have to type the key's name into the first field, "Encrypt to", yourself)</p><blockquote><p>Google Play:<a href="https://play.google.com/store/apps/details?id=org.sufficientlysecure.keychain">https://play.google.com/store/apps/details?id=org.sufficientlysecure.keychain</a><br>F-Droid:<a href="https://f-droid.org/packages/org.sufficientlysecure.keychain/">https://f-droid.org/packages/org.sufficientlysecure.keychain/</a><br><a href="https://bbs.letitfly.me/d/985">LetITFly BBS@Yongmeng - Managing OpenPGP keys with OpenKeychain</a></p></blockquote><p>Mac: GPG Suite</p><blockquote><p><a href="https://gpgtools.org/">https://gpgtools.org/</a></p></blockquote><p>Linux: GnuPG</p><blockquote><p><a href="https://www.gnupg.org/download/index.html">https://www.gnupg.org/download/index.html</a></p></blockquote><h2 id="外部链接"><a href="#外部链接" class="headerlink" title="外部链接"></a>External links</h2><p><a href="https://program-think.blogspot.com/2014/06/truecrypt-dead.html#head-2">An analysis of TrueCrypt's death (suicide or murder?) and what to do about it - 编程随想</a></p><p><a href="https://program-think.blogspot.com/2013/02/file-integrity-check.html#head-9">A primer on file integrity checks: hashes and digital signatures - 编程随想</a></p><p><a href="https://program-think.blogspot.com/2017/03/Why-Linux-Is-More-Secure-Than-Windows-and-macOS.html#head-7">Why Linux on the desktop can be more secure (compared with Windows &amp; macOS) - 编程随想</a></p><p><a href="https://www.zhihu.com/question/60520344/answer/218561457">Is an expired GPG public key uploaded to a keyserver deleted? Zhihu@胡涵铭</a></p><p><a href="http://www.williamlong.info/archives/3439.html">Tutorial: encrypting messages and creating digital signatures with GnuPG (PGP) - 月光博客</a></p><h2 id="版权声明"><a href="#版权声明" class="headerlink" title="版权声明"></a>Copyright notice</h2><p>Except for content quoted from the Internet, everything in this post is released under the <a href="https://creativecommons.org/publicdomain/zero/1.0/deed.zh">CC0 1.0 Universal (CC0 1.0) Public Domain Dedication</a>. I have dedicated the work to the public domain by waiving all of my rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission or giving attribution.</p><p><img data-src="https://licensebuttons.net/p/zero/1.0/88x15.png" alt="license"></p>]]></content>
    
    
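    <summary type="html">&lt;p&gt;PGP is an encryption program (standardized as the OpenPGP protocol) used to encrypt information and verify the origin of software. This post introduces how to use PGP on Windows.&lt;/p&gt;</summary>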
    
    
    
    <category term="教程" scheme="https://blackyau.cc/categories/%E6%95%99%E7%A8%8B/"/>
    
    
    <category term="PGP" scheme="https://blackyau.cc/tags/PGP/"/>
    
    <category term="GPG" scheme="https://blackyau.cc/tags/GPG/"/>
    
    <category term="数字签名" scheme="https://blackyau.cc/tags/%E6%95%B0%E5%AD%97%E7%AD%BE%E5%90%8D/"/>
    
    <category term="加密" scheme="https://blackyau.cc/tags/%E5%8A%A0%E5%AF%86/"/>
    
    <category term="解密" scheme="https://blackyau.cc/tags/%E8%A7%A3%E5%AF%86/"/>
    
  </entry>
  
  <entry>
    <title>Converting MAT Profiles to IFW in One Click with UltraEdit</title>
    <link href="https://blackyau.cc/8"/>
    <id>https://blackyau.cc/8</id>
    <published>2017-09-17T15:09:00.000Z</published>
    <updated>2017-09-18T07:26:00.000Z</updated>
    
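    <content type="html"><![CDATA[<p>Intent Firewall lets you elegantly get the same effect as disabling components with My Android Tools</p><span id="more"></span><p>My Android Tools, a rising star among Android optimization apps, takes a completely different approach from other optimizers that simply kill background apps and suppress background wake-ups, and it is very popular with Android tinkerers. However, any app can use a public API (<a href="https://developer.android.com/reference/android/content/pm/PackageManager.html?hl=zh-cn#setComponentEnabledSetting">Android Developer</a>) to re-enable its own components. The author of My Android Tools anticipated this and also released an <a href="https://www.coolapk.com/apk/cn.wq.myandroidtoolsxposed">Xposed module</a> that blocks calls to this API.</p><p>Overusing Xposed modules, however, lowers system performance; for the reasons, see <a href="https://blog.nfz.moe/archives/why-xposed-cause-unsmooth-exprience.html">Why installing Xposed causes stutter</a>. So people dug up a new method that, with only root access, prevents disabled Services&#x2F;Receivers&#x2F;Activities from calling the API to re-enable their components. Greenify (绿色守护) also used it for the "Prescription" (处方) feature introduced in V3.0. That method is <strong>Intent Firewall</strong>.</p><h2 id="Intent是什么？"><a href="#Intent是什么？" class="headerlink" title="Intent是什么？"></a>What is an Intent?</h2><p>Activities, services and broadcast receivers are started by messages called Intents. You can think of an Intent as a messenger that requests an action from another component, whether that component belongs to the same app or to another app.<br>For activities and services, an Intent carries the action to perform. For example, an Intent may request an Activity that opens an app's settings screen, or a service that starts downloading a file.<br>For broadcast receivers, an Intent only defines the notification being broadcast; receivers that can respond to the broadcast are started to carry out the next step.</p><h2 id="Intent-Firewall的原理？"><a href="#Intent-Firewall的原理？" class="headerlink" title="Intent Firewall的原理？"></a>How does Intent Firewall work?</h2><p>Sharingan (写轮眼) works by disabling components, whereas Prescription intercepts the paths through which components are started. The Intent Firewall is a component of the Android framework that enforces blocking of intents according to rules defined in XML files. It is not an officially supported feature of the framework and has no official documentation; the only way to learn how it works is to read the source code.<br>Every Intent started in the Android framework, including Intents created by the operating system, passes through the Intent Firewall, which means it can allow or deny any Intent. When deciding what to do with an incoming Intent, the Intent Firewall does not consider the sender, only the details of the Intent and its intended recipient. The rules are XML files written to the <code>/data/system/ifw</code> directory and can be deleted, which lets the Intent Firewall update its rule set dynamically.<br>Greenify builds on this to make Prescription a very powerful feature, but it relies heavily on intent-filtering rules contributed by users, even more so than My Android Tools.</p><h2 id="Intent-Firewall的编写规则？"><a href="#Intent-Firewall的编写规则？" class="headerlink" title="Intent Firewall的编写规则？"></a>How do you write Intent Firewall rules?</h2><p>You can refer to <a href="http://www.cis.syr.edu/~wedu/android/IntentFirewall/">http://www.cis.syr.edu/~wedu&#x2F;android&#x2F;IntentFirewall&#x2F;</a>. Here is an example:</p><figure class="highlight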
xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">rules</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">broadcast</span> <span class="attr">block</span>=<span class="string">&quot;true&quot;</span> <span class="attr">log</span>=<span class="string">&quot;false&quot;</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">component-filter</span> <span class="attr">name</span>=<span class="string">&quot;com.tencent.mobileqq/cooperation.dingdong.DingdongPluginProxyBroadcastReceiver&quot;</span> /&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">component-filter</span> <span class="attr">name</span>=<span class="string">&quot;com.tencent.mobileqq/cooperation.qzone.QzoneProxyReceiver&quot;</span> /&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">component-filter</span> <span class="attr">name</span>=<span class="string">&quot;com.tencent.mobileqq/com.tencent.open.downloadnew.common.DownloadReceiver&quot;</span> /&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">component-filter</span> <span class="attr">name</span>=<span 
class="string">&quot;com.tencent.mobileqq/com.tencent.open.downloadnew.common.DownloadReceiverWebProcess&quot;</span> /&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">component-filter</span> <span class="attr">name</span>=<span class="string">&quot;com.tencent.mobileqq/com.tencent.open.business.base.appreport.AppReportReceiver&quot;</span> /&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">component-filter</span> <span class="attr">name</span>=<span class="string">&quot;com.tencent.mobileqq/cooperation.weiyun.WeiyunProxyBroadcastReceiver&quot;</span> /&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">component-filter</span> <span class="attr">name</span>=<span class="string">&quot;com.tencent.mobileqq/cooperation.weiyun.WeiyunBroadcastReceiver&quot;</span> /&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">component-filter</span> <span class="attr">name</span>=<span class="string">&quot;com.tencent.mobileqq/com.tencent.mobileqq.msf.core.NetConnInfoCenter&quot;</span> /&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">broadcast</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">service</span> <span class="attr">block</span>=<span class="string">&quot;true&quot;</span> <span class="attr">log</span>=<span class="string">&quot;false&quot;</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">component-filter</span> <span class="attr">name</span>=<span class="string">&quot;com.tencent.mobileqq/com.tencent.mobileqq.app.CoreService&quot;</span> /&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">component-filter</span> <span class="attr">name</span>=<span class="string">&quot;com.tencent.mobileqq/com.tencent.mobileqq.app.CoreService$KernelService&quot;</span> /&gt;</span></span><br><span 
class="line">  <span class="tag">&lt;<span class="name">component-filter</span> <span class="attr">name</span>=<span class="string">&quot;com.tencent.mobileqq/cooperation.qzone.remote.logic.QzoneWebPluginProxyService&quot;</span> /&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">component-filter</span> <span class="attr">name</span>=<span class="string">&quot;com.tencent.mobileqq/com.tencent.tmdownloader.TMAssistantDownloadService&quot;</span> /&gt;</span></span><br><span class="line"> <span class="tag">&lt;/<span class="name">service</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">activity</span> <span class="attr">block</span>=<span class="string">&quot;true&quot;</span> <span class="attr">log</span>=<span class="string">&quot;false&quot;</span>&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">component-filter</span> <span class="attr">name</span>=<span class="string">&quot;com.tencent.mobileqq/com.tencent.mobileqq.activity.UpgradeActivity&quot;</span> /&gt;</span></span><br><span class="line">  <span class="tag">&lt;<span class="name">component-filter</span> <span class="attr">name</span>=<span class="string">&quot;com.tencent.mobileqq/com.tencent.mobileqq.activity.UpgradeDetailActivity&quot;</span> /&gt;</span></span><br><span class="line"> <span class="tag">&lt;/<span class="name">activity</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span class="name">rules</span>&gt;</span></span><br></pre></td></tr></table></figure><p><code>broadcast</code> holds broadcast receivers, <code>service</code> holds services and <code>activity</code> holds activities. Each <code>component-filter</code> names one component in the form <code>package/class</code>; <code>block=&quot;true&quot;</code> blocks intents aimed at the listed components, and <code>log=&quot;false&quot;</code> keeps IFW from logging them.</p><h2 id="如何将MAT备份转为IFW方案？"><a href="#如何将MAT备份转为IFW方案？" class="headerlink" title="如何将MAT备份转为IFW方案？"></a>How do I convert a MAT backup into an IFW rule set?</h2><p>First, download and install UltraEdit. I personally recommend the cracked build by zd426. Because the main download site has changed hands, I don't recommend its latest version; below are trusted direct downloads of older versions (served from my personal CDN on Upyun) and a Lanzou Cloud mirror:<br><a href="https://st.blackyau.net/dl/UltraEdit/UltraEdit_v25.0.0.82_x64_zh_CN.7z">UltraEdit_v25.0.0.82_x64_zh_CN</a><br><a href="https://st.blackyau.net/dl/UltraEdit/UltraEdit_v25.0.0.82_x32_zh_CN.7z">UltraEdit_v25.0.0.82_x32_zh_CN</a><br><a href="https://st.blackyau.net/dl/UltraEdit/UltraEdit.exe">UltraEdit v8.20 classic single-file build (Simplified Chinese)</a><br><a href="https://www.lanzous.com/b275916">https://www.lanzous.com/b275916</a> password: hg3y</p><p>Then download <a href="https://st.blackyau.net/blog/8/MAT%E8%BD%ACIFW-V2.0.mac">MAT转IFW.mac</a>.<br><img data-src="https://st.blackyau.net/blog/8/1.png" alt="1"></p><p><strong>Don't use the keyboard or mouse while the conversion is running. With a large backup, UltraEdit may stop responding for a while; this is normal, just wait.</strong><br>The <code>MAT转IFW-new</code> macro uses a newer approach from <a href="https://bbs.letitfly.me/d/100/11">cubesky</a> that converts astonishingly fast. It first adds a prefix and a suffix to both ends of every line of the MAT backup (turning each line into a <code>component-filter</code> entry), then repeats the whole list three times, once inside each of the IFW <code>broadcast</code>, <code>service</code> and <code>activity</code> sections, so every component is listed under all three types:</p><figure class="highlight xml"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag">&lt;<span class="name">rules</span>&gt;</span></span><br><span class="line"> <span class="tag">&lt;<span class="name">broadcast</span> <span class="attr">block</span>=<span class="string">&quot;true&quot;</span> <span 
class="attr">log</span>=<span class="string">&quot;false&quot;</span>&gt;</span></span><br><span class="line">  junk service 1</span><br><span class="line">  junk service 2</span><br><span class="line">  junk broadcast 1</span><br><span class="line">  junk broadcast 2</span><br><span class="line">  junk activity 1</span><br><span class="line"> <span class="tag">&lt;/<span class="name">broadcast</span>&gt;</span></span><br><span class="line"></span><br><span class="line"> <span class="tag">&lt;<span class="name">service</span> <span class="attr">block</span>=<span class="string">&quot;true&quot;</span> <span class="attr">log</span>=<span class="string">&quot;false&quot;</span>&gt;</span></span><br><span class="line">  junk service 1</span><br><span class="line">  junk service 2</span><br><span class="line">  junk broadcast 1</span><br><span class="line">  junk broadcast 2</span><br><span class="line">  junk activity 1</span><br><span class="line"> <span class="tag">&lt;/<span class="name">service</span>&gt;</span></span><br><span class="line"></span><br><span class="line"> <span class="tag">&lt;<span class="name">activity</span> <span class="attr">block</span>=<span class="string">&quot;true&quot;</span> <span class="attr">log</span>=<span class="string">&quot;false&quot;</span>&gt;</span></span><br><span class="line">  junk service 1</span><br><span class="line">  junk service 2</span><br><span class="line">  junk broadcast 1</span><br><span class="line">  junk broadcast 2</span><br><span class="line">  junk activity 1</span><br><span class="line"> <span class="tag">&lt;/<span class="name">activity</span>&gt;</span></span><br><span class="line"><span class="tag">&lt;/<span 
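class="name">rules</span>&gt;</span></span><br></pre></td></tr></table></figure><p>In my tests, IFW blocked every service, broadcast receiver and activity in the list as expected. Whether this has any side effects still needs further checking; so far the only drawback I have found is a larger IFW config file, since every component is listed three times.</p><p><code>MAT转IFW-old</code> relies on the keywords broadcast, service and activity to work out each component's type, so the IFW rules it generates are clean, with no redundant entries. However, any component that does not follow the standard naming is left at the bottom of the file when the macro finishes, and you have to sort it by hand: put the cursor on its line and press ALT+F1, ALT+F2 or ALT+F3 to insert that component into the service, broadcast or activity group respectively.</p><p>I am looking for a faster, cross-platform way to convert MAT backups to IFW. If you have any suggestions, please let me know; it would help me a great deal.</p><p>If for any reason you cannot do the conversion yourself, send your MAT backup to <a href="mailto:&#98;&#108;&#97;&#99;&#x6b;&#x79;&#x61;&#117;&#52;&#x32;&#x36;&#64;&#x67;&#x6d;&#97;&#x69;&#108;&#46;&#x63;&#111;&#109;">blackyau426@gmail.com</a>, message me on Telegram (@Black_Yau), or leave a comment on the blog. I will do my best to help.</p>]]></content>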
    
    
    <summary type="html">&lt;p&gt;With Intent Firewall you can cleanly get the same effect as disabling components in My Android Tools.&lt;/p&gt;</summary>
    
    
    
    <category term="教程" scheme="https://blackyau.cc/categories/%E6%95%99%E7%A8%8B/"/>
    
    
    <category term="MAT" scheme="https://blackyau.cc/tags/MAT/"/>
    
    <category term="IFW" scheme="https://blackyau.cc/tags/IFW/"/>
    
    <category term="My Android Tools" scheme="https://blackyau.cc/tags/My-Android-Tools/"/>
    
    <category term="Intent Firewall" scheme="https://blackyau.cc/tags/Intent-Firewall/"/>
    
    <category term="UltraEdit" scheme="https://blackyau.cc/tags/UltraEdit/"/>
    
    <category term="宏" scheme="https://blackyau.cc/tags/%E5%AE%8F/"/>
    
    <category term="一键" scheme="https://blackyau.cc/tags/%E4%B8%80%E9%94%AE/"/>
    
  </entry>
  
</feed>
