

Fixing a Flood of Googlebot Requests to wp-login.php?redirect_to=

Problem Description

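While reviewing the server logs of this site (powered by WordPress), I noticed that Googlebot was sending a large number of requests for URLs of the form /wp-login.php?redirect_to=xxx, where xxx is the URL of an article page. Every one of these requests simply returns the short /wp-login.php login page, and the response is practically the same no matter how many times it is requested. This is bad in two ways. First, it is unfriendly to search engines, because many distinct URLs map to the same content. Second, these requests do nothing for the site's search ranking, yet they still consume a fair amount of bandwidth. The screenshot below shows the problem:
[Image: wp-login.php redirect_to googlebot]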
The screenshot may be hard to read, so here are a few log entries:

"GET /wp-login.php?redirect_to=https%3A%2F%2Fvimsky.com%2Farticle%2F8140.html HTTP/1.1" 200 4689 "vimsky.com" "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
"GET /wp-login.php?redirect_to=https%3A%2F%2Fvimsky.com%2Farticle%2F8145.html HTTP/1.1" 200 4689 "vimsky.com" "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
"GET /wp-login.php?redirect_to=https%3A%2F%2Fvimsky.com%2Farticle%2F8144.html HTTP/1.1" 200 4688 "vimsky.com" "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
"GET /wp-login.php?redirect_to=https%3A%2F%2Fvimsky.com%2Farticle%2F8142.html HTTP/1.1" 200 9775 "vimsky.com" "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
"GET /wp-login.php?redirect_to=https%3A%2F%2Fvimsky.com%2Farticle%2F8136.html HTTP/1.1" 200 6666 "vimsky.com" "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
"GET /wp-login.php?redirect_to=https%3A%2F%2Fvimsky.com%2Farticle%2F8143.html HTTP/1.1" 200 9781 "vimsky.com" "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
"GET /wp-login.php?redirect_to=https%3A%2F%2Fvimsky.com%2Farticle%2F8129.html HTTP/1.1" 200 4687 "vimsky.com" "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
"GET /wp-login.php?redirect_to=https%3A%2F%2Fvimsky.com%2Farticle%2F8135.html HTTP/1.1" 200 4687 "vimsky.com" "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
"GET /wp-login.php?redirect_to=https%3A%2F%2Fvimsky.com%2Farticle%2F8133.html HTTP/1.1" 200 6008 "vimsky.com" "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"
"GET /wp-login.php?redirect_to=https%3A%2F%2Fvimsky.com%2Farticle%2F8128.html HTTP/1.1" 200 6766 "vimsky.com" "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"

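To get a quick sense of how many such requests a log contains, entries like the ones above can be tallied with a short script. The following is only a minimal sketch: it assumes the log format shown in the excerpt, uses three of the entries as sample data, and the names LOG_LINES, REQUEST_RE and summarize are made up for this illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Three entries copied from the log excerpt above, used as sample input.
LOG_LINES = [
    '"GET /wp-login.php?redirect_to=https%3A%2F%2Fvimsky.com%2Farticle%2F8140.html HTTP/1.1" 200 4689 "vimsky.com" "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"',
    '"GET /wp-login.php?redirect_to=https%3A%2F%2Fvimsky.com%2Farticle%2F8145.html HTTP/1.1" 200 4689 "vimsky.com" "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"',
    '"GET /wp-login.php?redirect_to=https%3A%2F%2Fvimsky.com%2Farticle%2F8142.html HTTP/1.1" 200 9775 "vimsky.com" "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"',
]

# Matches the request line, status code and response size of one entry.
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<target>\S+) HTTP/[\d.]+" (?P<status>\d{3}) (?P<size>\d+)'
)


def summarize(lines):
    """Tally Googlebot requests to /wp-login.php by decoded redirect_to value.

    Returns (Counter of redirect targets, total response bytes).
    Note: this only checks the User-Agent string, which can be spoofed.
    """
    targets = Counter()
    total_bytes = 0
    for line in lines:
        match = REQUEST_RE.search(line)
        if match is None or "Googlebot" not in line:
            continue
        parts = urlsplit(match.group("target"))
        if parts.path != "/wp-login.php":
            continue
        # parse_qs percent-decodes the redirect_to value.
        for url in parse_qs(parts.query).get("redirect_to", []):
            targets[url] += 1
        total_bytes += int(match.group("size"))
    return targets, total_bytes


targets, total_bytes = summarize(LOG_LINES)
for url, hits in targets.most_common():
    print(hits, url)
print("bytes served:", total_bytes)
```

Run against a full access log instead of the sample list, a tally like this gives a rough idea of how much bandwidth the login-page requests are using.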
Solution

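When I first saw this, I assumed someone was attacking the site and trying to brute-force the login account. On closer analysis, however, the source IP pool, the use of GET rather than POST, and the request frequency all suggested that these were legitimate Googlebot requests, and that the real cause was probably links of this form on vimsky's own pages. Following that lead, a bit of digging turned up the culprit, shown below:
[Image: wp-login.php redirect_to]
This site requires users to log in before commenting, so the comment area of each article contains a link to the login page with redirect_to set to the article URL. Googlebot discovers that link and tries to fetch it. The next question is how to stop crawlers such as Googlebot or Baiduspider from fetching these pages.
There are generally two ways to do this: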

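1. Add the rel="nofollow" attribute to the link

Adding the nofollow attribute to a link tells search engines not to follow that link. In WordPress, the link behind the "You must be logged in to post a comment" message is defined in wp-includes/comment-template.php, at around line 2220. After the modification it looks like this: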


        /** This filter is documented in wp-includes/link-template.php */
        'must_log_in'          => '<p class="must-log-in">' . sprintf(
                                      // Modified call below: str_replace() appends rel="nofollow" to the <a href="%s"> tag.
                                      /* translators: %s: login URL */
                                      str_replace( "\">", "\" rel=\"nofollow\">", __( 'You must be <a href="%s">logged in</a> to post a comment.' ) ),
                                      wp_login_url( apply_filters( 'the_permalink', get_permalink( $post_id ) ) )
                                  ) . '</p>',
        /** This filter is documented in wp-includes/link-template.php */

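To avoid breaking the Chinese localization of the original WordPress string (which lives in ./wp-content/languages/zh_CN.po), the code leaves the translatable string untouched and simply runs str_replace on the translated result, so that rel="nofollow" is inserted right after the href attribute. Keep in mind that edits to core files are overwritten when WordPress is updated, so the change has to be reapplied afterwards.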
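
To make the effect of that replacement easier to see, here is a minimal Python sketch that mimics the PHP str_replace call on the message template. The function name add_nofollow, the sample translated string and the sample login URL are assumptions made for this illustration; they are not taken from WordPress or from the zh_CN.po file.

```python
def add_nofollow(template: str) -> str:
    """Mimic str_replace("\">", "\" rel=\"nofollow\">", ...) from the PHP snippet."""
    return template.replace('">', '" rel="nofollow">')


# The English source string used by WordPress for this message.
english = 'You must be <a href="%s">logged in</a> to post a comment.'
# Hypothetical translated string, only to show the replacement works on any language.
translated = '您必須<a href="%s">登錄</a>才能發表評論。'

# Assumed example of what wp_login_url() might return for an article page.
login_url = "https://vimsky.com/wp-login.php?redirect_to=https%3A%2F%2Fvimsky.com%2Farticle%2F8140.html"

for template in (english, translated):
    # The %s placeholder is filled in after the replacement, as sprintf does in PHP.
    print(add_nofollow(template) % login_url)
```

One caveat of this approach: the replacement touches every occurrence of `">` in the string, so if a translation contained a second tag ending in a quoted attribute, that tag would receive rel="nofollow" as well.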

2. Disallow wp-login URLs in the site's robots.txt

Add rules to robots.txt that block the wp-login URLs:

User-agent: *
Disallow: /wp-admin
Disallow: /comments/feed
Disallow: /wp-login
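
Before deploying rules like these, it can be worth checking that they match the intended URLs. The sketch below feeds the rules above to Python's standard urllib.robotparser module and tests a few paths. It is only an approximation: Python's parser is not the one Googlebot uses, and the helper name allowed and the sample paths are chosen for this illustration.

```python
from urllib.robotparser import RobotFileParser

# The rules from the robots.txt fragment above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin
Disallow: /comments/feed
Disallow: /wp-login
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="Googlebot"):
    """Return True if the given user agent may fetch the path under these rules."""
    return parser.can_fetch(agent, "https://vimsky.com" + path)


# Disallow rules are prefix matches, so /wp-login also covers /wp-login.php?...
sample_paths = [
    "/wp-login.php?redirect_to=https%3A%2F%2Fvimsky.com%2Farticle%2F8140.html",
    "/wp-admin/",
    "/article/8140.html",
]
for path in sample_paths:
    print("allowed" if allowed(path) else "blocked", path)
```

With these rules the login and admin URLs should be reported as blocked, while ordinary article pages remain crawlable.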

It is best to use both methods together, which more thoroughly keeps crawlers from requesting wp-login.php URLs.

This article was produced by 純淨天空. Article URL: https://vimsky.com/zh-tw/article/3313.html. Please include a link to the source when reposting.