且构网

SEO - Unreachable Pages

Updated: 2023-12-03 08:14:16

I'm trying to build a personal web analyzer site using PHP. I bought a script to get other SEO data. Now I want an "Unreachable Pages" evaluation like the one shown at the bottom of this page (http://free-website-analysis.net/website-analysis/website-analysis-seo-free/). The problem is that I don't know how these figures are computed, and the script I bought doesn't include this either. I've already googled a lot of websites, but I can't seem to find any site that explains it. Can anyone help me with the computation, or direct me to sites that show this information?

Here's a list of each item referenced in the statistics, what it means, and where it comes from.

Google Page Rank: This is Google's proprietary calculation. Google does not provide an API to get the Page Rank, but there are third-party tools that report one for a given page.

Sitemap: The sitemap statistic is simply whether or not the site has a sitemap.xml file. It can be checked by looking for a domain.tld/sitemap.xml file. You can learn about these sitemaps at sitemaps.org.

Robots.txt: Much like the sitemap check, this just checks for a domain.tld/robots.txt file. The format is documented at robotstxt.org.
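The sitemap and robots.txt checks described above both come down to requesting a fixed path and seeing whether it exists. Here is a minimal sketch in Python (not PHP, though the same idea ports directly). The function names `well_known_urls`, `url_exists`, and `check_site_files` are made up for illustration, and treating "HTTP 200" as "file exists" is an assumption, not something the analysis site documents.

```python
import urllib.error
import urllib.request
from urllib.parse import urlunsplit


def well_known_urls(domain, scheme="https"):
    """Build the two URLs to probe for a bare domain such as 'example.com'."""
    return {
        "sitemap": urlunsplit((scheme, domain, "/sitemap.xml", "", "")),
        "robots": urlunsplit((scheme, domain, "/robots.txt", "", "")),
    }


def url_exists(url, timeout=10):
    """Return True if a GET request for the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors (404, 500, ...), DNS failures, and timeouts.
        return False


def check_site_files(domain, exists=url_exists):
    """Report whether sitemap.xml and robots.txt appear to be present."""
    return {name: exists(url) for name, url in well_known_urls(domain).items()}
```

Calling `check_site_files("example.com")` performs two real requests and returns a dictionary with the keys `"sitemap"` and `"robots"`. The `exists` parameter is only there so the network call can be swapped out when trying the logic offline.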

Page Errors: It's hard to tell what page errors are as they could be any number of things.

Unreachable Pages: These are broken links. There are tools like W3C Link Checker that go through all the links on the page and make sure none return a 404 error. You can write a script that turns all the links on the page into an array or object and then use PHP or a command line tool like wget or curl to get the headers of the resource (link). Each time you count a 404 error you increment Unreachable Pages by one.
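The approach just described (collect the links, request each one's headers, count the 404s) could look roughly like the following. This is a sketch in Python using only the standard library, not the code the analysis site actually runs. `LinkCollector`, `extract_links`, `fetch_status`, and `count_unreachable` are names invented for this example.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html, base_url):
    """Return absolute http(s) URLs from the page, in order, without duplicates."""
    parser = LinkCollector()
    parser.feed(html)
    unique = []
    for href in parser.links:
        absolute = urljoin(base_url, href)  # resolve relative links
        if urlparse(absolute).scheme in ("http", "https") and absolute not in unique:
            unique.append(absolute)
    return unique


def fetch_status(url, timeout=10):
    """Send a HEAD request; return the HTTP status code, or None if no response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 404, 500, ... still carry a status code
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout


def count_unreachable(html, base_url, get_status=fetch_status):
    """Increment the count by one for every link that answers with a 404."""
    return sum(1 for url in extract_links(html, base_url) if get_status(url) == 404)
```

With the page source in `page_html`, `count_unreachable(page_html, "http://example.com/")` gives the number of links that returned 404. A HEAD request fetches only the headers, which matches the "get the headers" step above. Some servers reject HEAD, so falling back to GET for those is a reasonable refinement. Whether links that fail without any status code should also count as unreachable is a judgment call that this sketch leaves out.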

Domain Age & Domain Expiration: These can be found manually or programmatically using a Whois search.
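For the programmatic route, a WHOIS lookup is a plain TCP exchange on port 43: send the domain name followed by CRLF and read until the server closes the connection. The Python sketch below assumes a .com domain queried against `whois.verisign-grs.com`, and it assumes the reply uses the field labels `Creation Date` and `Registry Expiry Date` with ISO-style timestamps. Other registries use different servers, labels, and date formats, so treat the defaults as placeholders. The function names are invented for this example.

```python
import socket
from datetime import datetime, timezone


def whois_query(domain, server="whois.verisign-grs.com", timeout=10):
    """Send one WHOIS query over TCP port 43 and return the reply as text."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_whois_dates(text, created_key="Creation Date",
                      expiry_key="Registry Expiry Date"):
    """Return (created, expires) as UTC datetimes; None for a missing field."""
    found = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in (created_key, expiry_key) and key not in found:
            # Assumed timestamp format, e.g. 1995-08-14T04:00:00Z
            parsed = datetime.strptime(value.strip(), "%Y-%m-%dT%H:%M:%SZ")
            found[key] = parsed.replace(tzinfo=timezone.utc)
    return found.get(created_key), found.get(expiry_key)


def domain_age_days(created, now=None):
    """Whole days between the creation date and now (UTC)."""
    now = now or datetime.now(timezone.utc)
    return (now - created).days
```

A typical call chain would be `created, expires = parse_whois_dates(whois_query("example.com"))` followed by `domain_age_days(created)`. Check both values for `None` first, since a registry with different field labels yields no match.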

Hope this helps and good luck!

EDIT

There is a tutorial on how to write a broken links checker ("unreachable pages") here.