Introducing AI Performance in Bing Webmaster Tools Public Preview
We are happy to introduce AI Performance in Bing Webmaster Tools, a new set of insights that shows how publisher content appears across Microsoft Copilot, AI-generated summaries in Bing, and select partner integrations. For the first time, you can understand how often your content is cited in generative answers, with clear visibility into which URLs are referenced and how citation activity changes over time.
Extending Search Insights to AI Answers
Bing Webmaster Tools has long helped website owners understand indexing, crawl health, and search performance. AI Performance extends those insights to AI-generated answers by showing where and how content from your site is referenced as a source across AI experiences.
As AI becomes a more common way people discover information, visibility is not only about blue links. It is also about whether your content is cited and referenced when AI systems generate answers. This release is an early step toward Generative Engine Optimization (GEO) tooling in Bing Webmaster Tools, helping publishers understand how their content participates in AI-driven experiences.
AI Performance Dashboard: Visibility Across AI Experiences
The AI Performance dashboard provides a consolidated view of when your site is cited in AI answers.
What the dashboard measures
Total Citations
Shows the total number of citations that are displayed as sources in AI-generated answers during the selected time frame. This highlights how often your content is referenced by AI systems, without indicating placement or presentation within a specific answer.
Average Cited Pages
Shows the average number of unique pages from your site that are displayed as sources in AI-generated answers per day over the selected time range. Because the data is aggregated across supported AI surfaces, this metric reflects overall citation patterns and does not indicate ranking, authority, or the role of any page within an individual answer.
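To make the two headline metrics concrete, here is a minimal sketch of how Total Citations and Average Cited Pages could be computed from a list of citation events. The record shape (a date string paired with a URL) and both function names are assumptions for illustration only. They are not part of Bing Webmaster Tools, and the dashboard's real aggregation (for example, how days with no citations are treated) may differ.

```python
from collections import defaultdict


def total_citations(records):
    """Count every citation event in the selected range."""
    return len(records)


def average_cited_pages(records):
    """Average number of unique cited pages per day.

    `records` is a list of (day, url) pairs, one per citation event.
    Only days that appear in the records are counted (an assumption).
    """
    pages_by_day = defaultdict(set)
    for day, url in records:
        pages_by_day[day].add(url)  # a set keeps each page once per day
    if not pages_by_day:
        return 0.0
    unique_per_day = [len(urls) for urls in pages_by_day.values()]
    return sum(unique_per_day) / len(unique_per_day)


# Hypothetical sample data: two days of citation events.
sample = [
    ("2025-01-01", "/guide"),
    ("2025-01-01", "/guide"),
    ("2025-01-01", "/pricing"),
    ("2025-01-02", "/guide"),
]
print(total_citations(sample))      # 4 citation events
print(average_cited_pages(sample))  # (2 + 1) / 2 = 1.5
```

The example shows why the two numbers can diverge: a single page cited many times raises Total Citations without changing Average Cited Pages.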
Grounding queries
Shows the key phrases the AI used when retrieving content that was referenced in AI-generated answers. The data shown represents a sample of overall citation activity. We will continue to refine this metric as additional data is processed.
Page-level citation activity
Shows citation counts for specific URLs from your site, making it easy to see which individual pages are most often referenced across AI-generated answers during the selected date range. This reflects how often pages are cited, not page importance, ranking, or placement.
Visibility trends over time
The timeline shows how citation activity for your site changes over time across supported AI experiences, making it easier to spot trends at a glance.
Important Note: Bing respects all content owner preferences expressed through robots.txt and other supported control mechanisms.
Using AI Performance Insights in Bing Webmaster Tools
Reviewing cited pages and grounding query phrases helps clarify how visible your content is in AI-generated answers.
These insights can help you:
Validate which pages are already being used as references in AI answers.
Identify content that appears frequently across AI answers.
Spot opportunities to improve clarity, structure, or completeness on pages that are indexed but less frequently cited.
Using These Insights to Improve Content
Once you understand which pages and topics are being cited, you can use those signals to guide content improvements.
Strengthen depth and expertise
Pages cited for specific grounding query phrases often reflect clear subject focus and domain expertise. Deepening coverage in related areas can reinforce authority.
Improve structure and clarity
Clear headings, tables, and FAQ sections help surface key information and make content easier for AI systems to reference accurately.
Support claims with evidence
Examples, data, and cited sources help build trust when content is reused in AI-generated answers.
Keep content fresh and accurate
Regular updates help ensure AI systems reference the most current version of your content.
Reduce ambiguity across formats
Align text, images, and video so they consistently represent the same entities, products, or concepts.
For deeper guidance on structuring content to improve inclusion in AI-generated answers, see Optimizing Your Content for Inclusion in AI Search Answers.
Keeping Content Fresh with IndexNow
Accurate, up-to-date content is important for inclusion and citation in AI-generated answers. IndexNow helps keep information fresh across search and AI experiences by notifying participating search engines whenever content is added, updated, or removed.
By enabling faster discovery of content changes, IndexNow helps ensure that AI systems reference the most current version of a page when generating answers. If you’re not already using IndexNow, go to https://www.indexnow.org to get started.
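As a rough illustration of what an IndexNow notification looks like, the sketch below builds the JSON body described in the public IndexNow protocol (`host`, `key`, `urlList`, and an optional `keyLocation`) and posts it to the shared `api.indexnow.org` endpoint. The helper names, the example host, and the key value are placeholders invented for this example. Check the IndexNow documentation for the current endpoint and the key-file requirements before relying on it.

```python
import json
import urllib.request

# Shared endpoint from the IndexNow documentation; individual search
# engines also expose their own endpoints.
INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_payload(host, key, urls, key_location=None):
    """Build the JSON body for a bulk IndexNow submission."""
    payload = {"host": host, "key": key, "urlList": list(urls)}
    if key_location:
        # Only needed when the key file is not at the site root.
        payload["keyLocation"] = key_location
    return payload


def submit_to_indexnow(payload, endpoint=INDEXNOW_ENDPOINT, timeout=10):
    """POST the payload and return the HTTP status code."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder values; replace them with your own host, key, and URLs.
payload = build_indexnow_payload(
    host="www.example.com",
    key="your-indexnow-key",
    urls=["https://www.example.com/updated-page"],
)
print(json.dumps(payload, indent=2))
# submit_to_indexnow(payload)  # uncomment to send a real notification
```

Keeping payload construction separate from the network call makes it easy to inspect exactly what will be sent before anything leaves your server.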
Local Business Information and AI Visibility
For local businesses, accurate business information is especially important when AI experiences surface answers to location-based queries.
In addition to using Bing Webmaster Tools, businesses can register with Bing Places for Business to help ensure that key details such as address, hours, and contact information remain current and eligible for inclusion in AI-generated responses.
Evolving AI Performance with the Webmaster Community
AI Performance in Bing Webmaster Tools marks an important step toward greater transparency between AI systems and the open web. As we expand these insights, we’ll continue working with publishers and the webmaster community to improve inclusion, attribution, and visibility across both search results and AI experiences.
We look forward to partnering with you as we evolve these capabilities and continue building tools that support discovery in the next generation of search and AI experiences.
Krishna Madhavan, Meenaz Merchant, Fabrice Canel, Saral Nigam
Product Managers, Microsoft AI
Batch Indexing Plugin
<?php
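/*
Plugin Name: Batch URL Indexing · Scheduled Generation
Version: 1.8.8
Plugin URL: https://www.emlog.net/plugin/detail/xxx
Description: Standalone scheduled batch-generation plugin. Supports a URL queue, automatic fetching, AI generation, automatic filling of multiple categories/TDK/navigation fields, and Favicon-first cover images. Integrates smart aliases and automatic cache refresh. Adds already-indexed URL detection (ignoring www and trailing slashes) and skip-on-fetch-failure. Supports automatically pushing generated articles to the Bing search engine (IndexNow). Adds a full-category taxonomy, with two taxonomies selectable in parallel. Improves prompts and strengthens GEO/SEO content generation to raise article quality and search-engine friendliness. Adds alias-based duplicate detection so the same title from different domains is not indexed twice. A single cron run processes multiple tasks in sequence and skips failures immediately. Strengthens source-site validity checks to reduce invalid tasks. Adds API-first icon download, domain-redirect detection, and prohibited-content filtering. Optimization: HTTP 403 is marked as failed and skipped immediately. Removes the Google icon download source so timeouts do not slow tasks down. Improves the icon download retry strategy and HTML detection. Raises the cron execution time to 180 seconds, improves content extraction for tool sites, and strengthens AI content-fidelity checks. Adds a hands-on review module that generates first-person experience content from the perspective of "AI创作导航". Fixes alias conflict logic: different domains may be indexed with a numeric suffix. Enhances cURL fetching with HTTP/2, full browser header emulation, and cookie management, greatly improving the success rate on anti-crawler sites. Adds website screenshots, inserted automatically after the tool introduction, with a custom watermark (fixes the black background on transparent PNG watermarks; watermark size is 1/3 of the original).
Author: Your Name
Author URL: https://www.emlog.net/profiles/xxx
*/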
!defined('EMLOG_ROOT') && exit('access denied!');
class ChuangAiLootBatch
{
const ID = 'chuang_ailoot';
const VERSION = '1.8.8';
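// Screenshot feature toggle
const ENABLE_SCREENSHOT = true;
// Screenshot API base URL
const SCREENSHOT_API_URL = 'https://screenshotsnap.com/api/screenshot';
// Watermark image URL (watermark is placed in the top-right corner)
const WATERMARK_URL = 'https://cxgn.cn/apple-touch-icon.webp';
// Watermark margin (pixels)
const WATERMARK_MARGIN = 10;
// Watermark opacity (0-100, where 100 is fully opaque)
const WATERMARK_OPACITY = 10;
// Watermark scale factor: 1/5 shrinks the watermark to one fifth of its original size (the plugin description states one third)
const WATERMARK_SCALE = 1/5;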
private static $_instance;
private $_inited = false;
private $_pinyinDict = null;
private $_mbstringAvailable = false;
const USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
private static $_userAgents = [
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3.1 Safari/605.1.15',
];
private static $badWords = [
'赌博', '赌场', '百家乐', '轮盘', '老虎机', '六合彩', '时时彩', '彩票', '赌球',
'色情', '情色', '成人', 'AV', '淫秽', '三级片', '性交', '裸聊', '约炮', '嫖娼',
'casino', 'gambling', 'porn', 'xxx', 'sex', 'nude', 'erotic'
];
public static function getInstance()
{
if (self::$_instance === null) {
self::$_instance = new self();
}
return self::$_instance;
}
private function __construct()
{
$this->_mbstringAvailable = function_exists('mb_strlen') && function_exists('mb_substr') && function_exists('mb_internal_encoding');
if ($this->_mbstringAvailable) {
mb_internal_encoding('UTF-8');
}
}
// ========== UTF-8 sanitizer ==========
private static function cleanUtf8($str)
{
if (empty($str) || !is_string($str)) return $str;
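// Strip a leading UTF-8 BOM, then ASCII control characters. Both patterns are byte-oriented (no /u modifier),
// so input that is not yet valid UTF-8 still reaches the conversion below instead of becoming null.
$str = preg_replace('/^\xEF\xBB\xBF/', '', $str);
$str = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $str);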
if (mb_check_encoding($str, 'UTF-8')) {
return $str;
}
$encodings = ['GB18030', 'GBK', 'GB2312', 'BIG5', 'ASCII', 'ISO-8859-1'];
foreach ($encodings as $enc) {
if (mb_check_encoding($str, $enc)) {
$converted = mb_convert_encoding($str, 'UTF-8', $enc);
if ($converted !== false) {
return $converted;
}
}
}
return preg_replace('/[\x80-\xFF]/', '', $str);
}
private static function isUnknownValue($str)
{
if (empty($str)) return true;
$str = trim($str);
$pattern = '/未知|未公开|不确定|不详|无|null|none|undefine|unknown|anonymous|官方未公开|官方未提供|暂未提供|暂无|未明确披露|未填写|测试|demo|example|localhost|all rights reserved|保留(?:所有)?权利|版权所有|copyright|©|\(c\)|powered by|designed by|developed by|技术支持|提供技术支持|theme|template|wordpress/i';
return preg_match($pattern, $str) ? true : false;
}
private function convertHtmlToUtf8($html, $url = '')
{
if (empty($html)) return '';
$charset = '';
if (preg_match('/<meta[^>]+charset=["\']?([^"\'\s>]+)/i', $html, $m)) {
$charset = trim($m[1]);
} elseif (preg_match('/<meta[^>]+content=["\'][^"\']*charset=([^"\'\s>]+)/i', $html, $m)) {
$charset = trim($m[1]);
}
if (!empty($charset)) {
$charset = strtoupper($charset);
$charset = preg_replace('/[^A-Z0-9-]/', '', $charset);
if ($charset == 'UTF8') $charset = 'UTF-8';
if (in_array($charset, ['GB2312', 'GBK'])) $charset = 'GB18030';
if ($charset != 'UTF-8') {
$converted = @iconv($charset, 'UTF-8//IGNORE', $html);
if ($converted !== false) return $converted;
}
}
return self::cleanUtf8($html);
}
private function getFallbackPinyinDict()
{
// Simplified dictionary; for real use, supply the complete dictionary file pinyin_dict.php
return [
'的' => 'de', '一' => 'yi', '是' => 'shi', '了' => 'le', '我' => 'wo', '不' => 'bu', '人' => 'ren', '在' => 'zai', '他' => 'ta', '有' => 'you',
'这' => 'zhe', '个' => 'ge', '上' => 'shang', '来' => 'lai', '到' => 'dao', '大' => 'da', '们' => 'men', '说' => 'shuo', '中' => 'zhong', '为' => 'wei',
'子' => 'zi', '和' => 'he', '你' => 'ni', '地' => 'di', '出' => 'chu', '道' => 'dao', '也' => 'ye', '时' => 'shi', '年' => 'nian', '得' => 'de',
'就' => 'jiu', '那' => 'na', '要' => 'yao', '下' => 'xia', '以' => 'yi', '生' => 'sheng', '会' => 'hui', '自' => 'zi', '着' => 'zhe', '去' => 'qu',
'之' => 'zhi', '过' => 'guo', '家' => 'jia', '学' => 'xue', '对' => 'dui', '可' => 'ke', '她' => 'ta', '里' => 'li', '后' => 'hou', '小' => 'xiao',
'么' => 'me', '心' => 'xin', '多' => 'duo', '天' => 'tian', '而' => 'er', '能' => 'neng', '好' => 'hao', '都' => 'dou', '然' => 'ran', '没' => 'mei',
'日' => 'ri', '于' => 'yu', '起' => 'qi', '还' => 'hai', '发' => 'fa', '成' => 'cheng', '事' => 'shi', '只' => 'zhi', '作' => 'zuo', '当' => 'dang',
'想' => 'xiang', '看' => 'kan', '文' => 'wen', '无' => 'wu', '开' => 'kai', '手' => 'shou', '十' => 'shi', '用' => 'yong', '主' => 'zhu', '方' => 'fang',
'前' => 'qian', '如' => 'ru', '进' => 'jin', '样' => 'yang', '从' => 'cong', '同' => 'tong', '工' => 'gong', '也' => 'ye', '面' => 'mian', '又' => 'you',
'马' => 'ma', '动' => 'dong', '而' => 'er', '现' => 'xian', '点' => 'dian', '最' => 'zui', '新' => 'xin', '打' => 'da', '重' => 'zhong', '每' => 'mei',
'但' => 'dan', '身' => 'shen', '些' => 'xie', '高' => 'gao', '已' => 'yi', '此' => 'ci', '实' => 'shi', '书' => 'shu', '部' => 'bu', '其' => 'qi',
'法' => 'fa', '因' => 'yin', '相' => 'xiang', '什' => 'shen', '二' => 'er', '问' => 'wen', '理' => 'li', '美' => 'mei', '点' => 'dian', '月' => 'yue',
'万' => 'wan', '将' => 'jiang', '外' => 'wai', '政' => 'zheng', '义' => 'yi', '安' => 'an', '原' => 'yuan', '女' => 'nv',
'先' => 'xian', '老' => 'lao', '很' => 'hen', '通' => 'tong', '教' => 'jiao', '并' => 'bing', '提' => 'ti', '意' => 'yi', '认' => 'ren',
'件' => 'jian', '计' => 'ji', '决' => 'jue', '公' => 'gong', '特' => 'te', '长' => 'chang', '党' => 'dang', '军' => 'jun', '民' => 'min',
'等' => 'deng', '度' => 'du', '务' => 'wu', '具' => 'ju', '战' => 'zhan', '名' => 'ming', '力' => 'li', '关' => 'guan', '机' => 'ji',
'田' => 'tian', '量' => 'liang', '联' => 'lian', '已' => 'yi', '处' => 'chu', '应' => 'ying', '它' => 'ta', '便' => 'bian', '任' => 'ren',
'记' => 'ji', '北' => 'bei', '男' => 'nan', '西' => 'xi', '买' => 'mai', '卖' => 'mai', '车' => 'che', '红' => 'hong', '光' => 'guang', '东' => 'dong',
'南' => 'nan', '华' => 'hua', '国' => 'guo', '族' => 'zu', '志' => 'zhi', '爱' => 'ai', '护' => 'hu', '保' => 'bao', '持' => 'chi',
'续' => 'xu', '展' => 'zhan', '科' => 'ke', '技' => 'ji', '术' => 'shu', '化' => 'hua', '育' => 'yu', '体' => 'ti', '健' => 'jian', '康' => 'kang',
'卫' => 'wei', '生' => 'sheng', '产' => 'chan', '业' => 'ye', '商' => 'shang', '品' => 'pin', '质' => 'zhi', '标' => 'biao', '准' => 'zhun', '规' => 'gui',
'格' => 'ge', '式' => 'shi', '器' => 'qi', '械' => 'xie', '电' => 'dian', '水' => 'shui', '火' => 'huo', '土' => 'tu', '木' => 'mu', '金' => 'jin',
'石' => 'shi', '山' => 'shan', '川' => 'chuan', '湖' => 'hu', '海' => 'hai', '洋' => 'yang', '空' => 'kong', '气' => 'qi', '风' => 'feng', '雨' => 'yu',
'雪' => 'xue', '雷' => 'lei', '闪' => 'shan', '声' => 'sheng', '音' => 'yin', '乐' => 'yue', '舞' => 'wu', '台' => 'tai', '戏' => 'xi', '影' => 'ying',
'视' => 'shi', '频' => 'pin', '道' => 'dao', '网' => 'wang', '络' => 'luo', '站' => 'zhan', '页' => 'ye', '址' => 'zhi', '域' => 'yu',
'深' => 'shen', '求' => 'qiu', '索' => 'suo', '章' => 'zhang', '成' => 'cheng', '苏' => 'su', '州' => 'zhou', '搜' => 'sou',
'信' => 'xin', '息' => 'xi', '有' => 'you', '限' => 'xian', '司' => 'si', '天' => 'tian', '津' => 'jin', '滴' => 'di', '忆' => 'yi',
];
}
private function loadPinyinDict()
{
if ($this->_pinyinDict === null) {
$dict = null;
$dictFile = __DIR__ . '/pinyin_dict.php';
if (file_exists($dictFile)) {
try {
$dict = include $dictFile;
if (is_array($dict)) {
$this->_pinyinDict = $dict;
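$this->logDebug("Pinyin dictionary loaded: external file, " . count($dict) . " entries");
return $this->_pinyinDict;
} else {
$this->logDebug("Pinyin dictionary file is malformed (not an array); using the built-in dictionary");
}
} catch (Exception $e) {
$this->logDebug("Error loading pinyin dictionary: " . $e->getMessage() . "; using the built-in dictionary");
}
} else {
$this->logDebug("Pinyin dictionary file not found: " . $dictFile . "; using the built-in dictionary");
}
$this->_pinyinDict = $this->getFallbackPinyinDict();
$this->logDebug("Pinyin dictionary loaded: built-in dictionary, " . count($this->_pinyinDict) . " entries");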
}
return $this->_pinyinDict;
}
private function chineseToPinyin($str)
{
if (!$this->_mbstringAvailable) {
$this->logDebug("mbstring extension unavailable; Chinese-to-pinyin conversion is disabled");
return '';
}
$dict = $this->loadPinyinDict();
if (empty($dict)) {
$this->logDebug("Pinyin dictionary is empty; cannot convert to pinyin");
return '';
}
try {
$result = [];
$len = mb_strlen($str, 'UTF-8');
for ($i = 0; $i < $len; $i++) {
$char = mb_substr($str, $i, 1, 'UTF-8');
if (isset($dict[$char])) {
$result[] = $dict[$char];
} else {
$this->logDebug("Character not in dictionary: {$char}; skipped");
}
}
if (empty($result)) {
$this->logDebug("Chinese block '{$str}' produced no pinyin");
return '';
}
return implode('-', $result);
} catch (Exception $e) {
$this->logDebug("Chinese-to-pinyin conversion error: " . $e->getMessage());
return '';
}
}
private function generateAlias($title)
{
$fallback = 'post-' . time();
try {
$title = self::cleanUtf8($title);
if (empty($title)) {
return $fallback;
}
$processed = preg_replace_callback(
'/[\x{4e00}-\x{9fa5}]+/u',
function($matches) {
$chineseBlock = $matches[0];
$pinyin = $this->chineseToPinyin($chineseBlock);
if (!empty($pinyin)) {
return ' ' . $pinyin . ' ';
}
return '';
},
$title
);
$alias = preg_replace('/[^\p{L}\p{N}\s_-]/u', '', $processed);
$alias = preg_replace('/[\s_\.\/\\\\]+/', '-', $alias);
$alias = preg_replace('/-+/', '-', $alias);
$alias = trim($alias, '-');
$alias = strtolower($alias);
if (!empty($alias) && strlen($alias) <= 200) {
$this->logDebug("Alias generated: {$alias}");
return $alias;
}
$this->logDebug("Pinyin conversion produced no usable alias (empty or longer than 200 bytes); using the fallback method");
$title = $this->fullToHalf($title);
$alias = preg_replace('/[^\p{L}\p{N}\s-]/u', '', $title);
$pinyinMap = [
'深度求索' => 'shen-du-qiu-suo',
'文章' => 'wen-zhang',
'生成' => 'sheng-cheng',
'工具' => 'gong-ju',
'AI' => 'ai',
];
foreach ($pinyinMap as $ch => $py) {
$alias = str_replace($ch, $py, $alias);
}
$alias = preg_replace('/[\x{4e00}-\x{9fa5}]/u', '', $alias);
$alias = strtolower($alias);
$alias = preg_replace('/[\s_\.\/\\\\]+/', '-', $alias);
$alias = preg_replace('/-+/', '-', $alias);
$alias = trim($alias, '-');
if (strlen($alias) > 200) {
$alias = substr($alias, 0, 200);
}
if (empty($alias)) {
$alias = $fallback;
}
$this->logDebug("Fallback alias generated: {$alias}");
return $alias;
} catch (Exception $e) {
$this->logDebug("Alias generation error: " . $e->getMessage());
return $fallback;
}
}
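/**
* Check whether the alias is already used by another post, and use the domain of the original URL to decide whether the content is a duplicate.
*/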
private function ensureUniqueAliasWithDomainCheck($alias, $currentUrl)
{
$db = Database::getInstance();
$originalAlias = $alias;
$counter = 1;
while (true) {
$sql = "SELECT gid FROM " . DB_PREFIX . "blog WHERE alias = '" . $db->escape_string($alias) . "'";
$res = $db->query($sql);
if ($res->num_rows == 0) {
return ['alias' => $alias, 'error' => null];
}
$row = $res->fetch_assoc();
$existingPostId = $row['gid'];
$existingUrl = $this->getPostOriginalUrl($existingPostId);
if (empty($existingUrl)) {
$this->logDebug("Could not retrieve the original URL for post ID {$existingPostId} with alias '{$alias}'; appending a numeric suffix");
$alias = $originalAlias . '-' . $counter;
$counter++;
continue;
}
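// parse_url() returns null or false for URLs without a host; cast to string so the comparison below is well defined
$currentDomain = (string)parse_url($currentUrl, PHP_URL_HOST);
$existingDomain = (string)parse_url($existingUrl, PHP_URL_HOST);
$currentDomain = preg_replace('/^www\./i', '', $currentDomain);
$existingDomain = preg_replace('/^www\./i', '', $existingDomain);
// Two empty hosts must not be treated as "the same domain"
if ($currentDomain !== '' && strtolower($currentDomain) === strtolower($existingDomain)) {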
return ['alias' => $alias, 'error' => "别名已存在且属于同一域名 ({$existingDomain}),内容重复"];
}
$this->logDebug("Alias '{$alias}' already exists but the domain differs (current: {$currentDomain}, existing: {$existingDomain}); appending a numeric suffix");
$alias = $originalAlias . '-' . $counter;
$counter++;
}
}
private function getPostOriginalUrl($postId)
{
$db = Database::getInstance();
$navTable = DB_PREFIX . 'chuang_nav';
$tableCheck = $db->query("SHOW TABLES LIKE '{$navTable}'");
if ($db->num_rows($tableCheck) == 0) {
return '';
}
$sql = "SELECT `value` FROM `{$navTable}` WHERE `gid` = " . intval($postId) . " LIMIT 1";
$row = $db->once_fetch_array($sql);
if (!$row) {
$sql = "SELECT `value` FROM `{$navTable}` WHERE `id` = " . intval($postId) . " LIMIT 1";
$row = $db->once_fetch_array($sql);
}
if ($row && !empty($row['value'])) {
$navData = @unserialize($row['value']);
if (is_array($navData) && isset($navData['chuang_url'])) {
return $navData['chuang_url'];
}
}
return '';
}
private function normalizeUrlForComparison($url)
{
$parsed = parse_url($url);
if (!$parsed) {
return $url;
}
$scheme = isset($parsed['scheme']) ? $parsed['scheme'] . '://' : '';
$host = isset($parsed['host']) ? $parsed['host'] : '';
$path = isset($parsed['path']) ? $parsed['path'] : '';
$host = preg_replace('/^www\./i', '', $host);
$path = rtrim($path, '/');
$normalized = $scheme . $host . $path;
return $normalized;
}
private function urlExistsInNav($url)
{
$db = Database::getInstance();
$table = DB_PREFIX . "chuang_nav";
$sql = "SELECT `value` FROM `{$table}`";
$res = $db->query($sql);
$normalizedInput = $this->normalizeUrlForComparison($url);
while ($row = $res->fetch_assoc()) {
$data = @unserialize($row['value']);
if (is_array($data) && isset($data['chuang_url'])) {
$storedUrl = $data['chuang_url'];
$normalizedStored = $this->normalizeUrlForComparison($storedUrl);
if ($normalizedStored === $normalizedInput) {
return true;
}
}
}
return false;
}
private function pushToBing($post_id)
{
// Force an integer so the ID is safe to interpolate into the SQL query below
$post_id = intval($post_id);
$this->logDebug("Preparing to push post ID {$post_id} to Bing (IndexNow only)");
if (!file_exists(EMLOG_ROOT . '/content/plugins/chuang_bing/chuang_bing.php')) {
$this->logDebug("Error: Bing push plugin is not installed (file not found)");
return false;
}
require_once EMLOG_ROOT . '/content/plugins/chuang_bing/chuang_bing.php';
if (!function_exists('chuang_bing_indexnow_push')) {
$this->logDebug("Error: function chuang_bing_indexnow_push does not exist in the Bing push plugin");
return false;
}
$storage = Storage::getInstance('chuang_bing');
$bing_enabled = (int)$storage->getValue('bing_enabled');
if (!$bing_enabled) {
$this->logDebug("Error: IndexNow push is not enabled in the Bing push plugin");
return false;
}
$results = array();
// Initialised here so the log call at the end never reads an undefined variable
$url = '';
$key = $storage->getValue('indexnow_key');
$keyLocation = $storage->getValue('indexnow_key_location');
$host = $storage->getValue('indexnow_host');
// Log only the first four characters of the key
$key_masked = $key ? substr($key, 0, 4) . '****' : 'empty';
$this->logDebug("IndexNow config: key={$key_masked}, keyLocation={$keyLocation}, host={$host}");
if ($key && $keyLocation && $host) {
if (class_exists('Url') && method_exists('Url', 'log')) {
$url = Url::log($post_id);
$this->logDebug("Post URL from Url::log: " . ($url ?: 'empty'));
}
if (empty($url)) {
$url = Option::get('blogurl') . '?post=' . $post_id;
$this->logDebug("Using fallback URL: {$url}");
}
if (empty($url)) {
$this->logDebug("Error: could not build the post URL");
$results[] = "IndexNow: could not build URL";
} else {
$result = chuang_bing_indexnow_push($url, $key, $keyLocation, $host);
$results[] = "IndexNow: " . $result;
$this->logDebug("IndexNow push raw response: " . $result);
}
} else {
$missing = [];
if (empty($key)) $missing[] = 'indexnow_key';
if (empty($keyLocation)) $missing[] = 'indexnow_key_location';
if (empty($host)) $missing[] = 'indexnow_host';
$this->logDebug("Error: IndexNow config is incomplete, missing: " . implode(', ', $missing));
$results[] = "IndexNow: incomplete config";
}
if (!empty($results) && function_exists('chuang_bing_add_log')) {
$db = Database::getInstance();
$sql = "SELECT title FROM " . DB_PREFIX . "blog WHERE gid = {$post_id}";
$row = $db->once_fetch_array($sql);
$title = $row ? $row['title'] : '';
$logResult = 'Batch generation auto-push - ' . implode('; ', $results);
chuang_bing_add_log($title, $url, $logResult);
$this->logDebug("Push result recorded in the Bing plugin log");
}
$finalResult = implode('; ', $results);
$this->logDebug("Push finished: {$finalResult}");
return true;
}
public function init()
{
if ($this->_inited) return;
$this->_inited = true;
if ($this->_mbstringAvailable) {
mb_internal_encoding('UTF-8');
}
$db = Database::getInstance();
$db->query("SET NAMES utf8mb4");
addAction('adm_menu', function() {
echo '
Batch Indexing
';
});
if (isset($_GET['plugin']) && $_GET['plugin'] == self::ID && isset($_GET['batch_action'])) {
$this->handleBatchAction();
}
if (isset($_GET['plugin']) && $_GET['plugin'] == self::ID && isset($_GET['update_action'])) {
$this->handleUpdateBatchAction();
}
if (isset($_GET['ai_cron_old']) && $_GET['ai_cron_old'] == '1') {
$this->handleCron();
exit;
}
if (isset($_GET['ai_cron_update']) && $_GET['ai_cron_update'] == '1') {
$this->handleUpdateCron();
exit;
}
if (isset($_GET['plugin']) && $_GET['plugin'] == self::ID) {
if (!function_exists('plugin_setting_view')) {
require_once __DIR__ . '/chuang_ailoot_setting.php';
}
}
addAction('adm_head', [$this, 'hookHeader']);
}
public function hookHeader()
{
echo '';
}
private function handleBatchAction()
{
$action = Input::getStrVar('batch_action');
$task_index = Input::getIntVar('task_index', -1);
switch ($action) {
case 'run_task': $this->processSingleTask($task_index); break;
case 'run_all': $this->processAllTasks(); break;
case 'delete_task': $this->deleteTask($task_index); break;
case 'retry_failure': $this->retryFailure($task_index); break;
case 'batch_retry_failures': $this->batchRetryFailures(); break;
case 'batch_delete_failures': $this->batchDeleteFailures(); break;
case 'clear_completed': $this->clearCompletedTasks(); break;
case 'clear_pending': $this->clearPendingTasks(); break;
case 'clear_failures': $this->clearFailures(); break;
case 'add_example_tasks': $this->addExampleTasks(); break;
case 'force_pending': $this->forceAllToPending(); break;
case 'reset_processing_to_pending': $this->resetProcessingToPending(); break;
case 'diagnose_tasks': $this->diagnoseTasks(); break;
case 'pause_cron': $this->pauseCron(); break;
case 'resume_cron': $this->resumeCron(); break;
case 'approve_submission': $this->approveSubmission($task_index); break;
case 'reject_submission': $this->rejectSubmission($task_index); break;
case 'run_update_task': $this->processSingleUpdateTask($task_index); break;
case 'run_all_update_tasks': $this->processAllUpdateTasks(); break;
case 'delete_update_task': $this->deleteUpdateTask($task_index); break;
case 'clear_completed_update': $this->clearCompletedUpdateTasks(); break;
case 'clear_all_update': $this->clearAllUpdateTasks(); break;
case 'retry_update_task': $this->retryUpdateTask($task_index); break;
}
header('Location: ' . BLOG_URL . 'admin/plugin.php?plugin=' . self::ID . '&tab=batch&t=' . time());
exit;
}
public function addBatchUrls($urls_text)
{
$storage = Storage::getInstance(self::ID);
$tasks = $storage->getValue('batch_tasks') ?: [];
$urls = array_filter(array_map('trim', explode("\n", $urls_text)));
$added = 0;
foreach ($urls as $url) {
if (empty($url) || !filter_var($url, FILTER_VALIDATE_URL)) continue;
$exists = false;
foreach ($tasks as $task) if ($task['url'] === $url) { $exists = true; break; }
if (!$exists) {
$tasks[] = [
'url' => $url, 'status' => 'pending', 'created_at' => time(), 'updated_at' => time(),
'post_id' => 0, 'detailed' => 1, 'retry_count' => 0
];
$added++;
}
}
if ($added > 0) $storage->setValue('batch_tasks', $tasks, 'array');
return $added;
}
public function getBatchStats()
{
$storage = Storage::getInstance(self::ID);
$tasks = $storage->getValue('batch_tasks') ?: [];
$failures = $storage->getValue('batch_failures') ?: [];
$stats = ['total'=>0, 'pending'=>0, 'processing'=>0, 'completed'=>0, 'failed'=>0, 'exists'=>0, 'failures_count'=>count($failures)];
foreach ($tasks as $task) {
$stats['total']++;
$s = $task['status'] ?? '';
if (isset($stats[$s])) $stats[$s]++;
}
return $stats;
}
public function getBatchTasks() { $storage = Storage::getInstance(self::ID); return $storage->getValue('batch_tasks') ?: []; }
public function getFailures() { $storage = Storage::getInstance(self::ID); return $storage->getValue('batch_failures') ?: []; }
public function getCronLog() { $storage = Storage::getInstance(self::ID); return $storage->getValue('cron_log') ?: []; }
public function addPendingSubmission($url)
{
if (empty($url) || !filter_var($url, FILTER_VALIDATE_URL)) return false;
$storage = Storage::getInstance(self::ID);
$pending = $storage->getValue('pending_submissions') ?: [];
foreach ($pending as $item) {
if ($item['url'] === $url) return false;
}
$pending[] = [
'url' => $url,
'submit_time' => time(),
'status' => 'pending'
];
$storage->setValue('pending_submissions', $pending, 'array');
return true;
}
public function getPendingSubmissions()
{
$storage = Storage::getInstance(self::ID);
return $storage->getValue('pending_submissions') ?: [];
}
public function approveSubmission($index)
{
$storage = Storage::getInstance(self::ID);
$pending = $storage->getValue('pending_submissions') ?: [];
if (isset($pending[$index])) {
$url = $pending[$index]['url'];
$this->addBatchUrls($url);
unset($pending[$index]);
$pending = array_values($pending);
$storage->setValue('pending_submissions', $pending, 'array');
$_SESSION['chuang_ailoot_message'] = "Approved; URL added to the task queue: {$url}";
}
}
public function rejectSubmission($index)
{
$storage = Storage::getInstance(self::ID);
$pending = $storage->getValue('pending_submissions') ?: [];
if (isset($pending[$index])) {
$url = $pending[$index]['url'];
unset($pending[$index]);
$pending = array_values($pending);
$storage->setValue('pending_submissions', $pending, 'array');
$_SESSION['chuang_ailoot_message'] = "Submission rejected: {$url}";
}
}
public function handleCron()
{
header('Content-Type: application/json; charset=utf-8');
set_time_limit(600);
$token = isset($_GET['token']) ? trim($_GET['token']) : '';
if (!$this->verifyCronToken($token)) exit(json_encode(['success'=>false,'message'=>'Token verification failed']));
$storage = Storage::getInstance(self::ID);
$paused = $storage->getValue('cron_paused', false);
if ($paused) {
$this->logDebug('Cron job is paused; skipping this run');
exit(json_encode(['success'=>false,'message'=>'Cron job is paused']));
}
$start = time();
$max_execution_time = 180;
$success_count = 0;
$fail_count = 0;
$results = [];
$this->logDebug('Cron job started (continuous processing mode)');
while (true) {
if (time() - $start >= $max_execution_time) {
$this->logDebug("Maximum execution time reached; stopping");
break;
}
$result = $this->processBatchTask();
if (!$result['success']) {
if ($result['message'] === '无待处理任务') {
$this->logDebug("No pending tasks; ending loop");
break;
}
$fail_count++;
$results[] = $result;
continue;
}
$success_count++;
$results[] = $result;
usleep(500000);
}
$total_time = time() - $start;
$this->logDebug("Cron job finished in {$total_time}s; succeeded: {$success_count}, failed: {$fail_count}");
$summary = [
'success' => true,
'message' => "Cron job completed; succeeded: {$success_count}, failed: {$fail_count}, elapsed: {$total_time}s",
'success_count' => $success_count,
'fail_count' => $fail_count,
'results' => $results,
];
$this->logCronExecution($summary);
echo json_encode($summary);
exit;
}
public function generateCronToken() { return md5(Option::get('site_key') . self::ID . '_cron'); }
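// hash_equals() compares in constant time, so the check does not leak how many leading characters matched
private function verifyCronToken($token) { return hash_equals($this->generateCronToken(), (string)$token); }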
private function logCronExecution($result)
{
$storage = Storage::getInstance(self::ID);
$log = $storage->getValue('cron_log') ?: [];
array_unshift($log, [
'time' => time(),
'success' => $result['success'],
'message' => $result['message'] ?? '',
'success_count' => $result['success_count'] ?? 0,
'fail_count' => $result['fail_count'] ?? 0,
]);
$log = array_slice($log, 0, 50);
$storage->setValue('cron_log', $log, 'array');
}
public function processBatchTask()
{
$storage = Storage::getInstance(self::ID);
$tasks = $storage->getValue('batch_tasks') ?: [];
$index = null;
foreach ($tasks as $i => $task) if ($task['status'] === 'pending') { $index = $i; break; }
if ($index === null) {
foreach ($tasks as $i => $task) if ($task['status'] === 'failed' && ($task['retry_count']??0) < 3) { $index = $i; break; }
}
if ($index === null) return ['success'=>false, 'message'=>'无待处理任务', 'details'=>json_encode($this->getBatchStats())];
$url = $tasks[$index]['url'];
if ($this->urlExistsInNav($url)) {
$tasks[$index]['status'] = 'exists';
$tasks[$index]['error'] = 'URL already indexed (matched after ignoring www and trailing slash); generation skipped';
$tasks[$index]['updated_at'] = time();
$storage->setValue('batch_tasks', $tasks, 'array');
$this->logDebug("Task skipped; URL already exists (matched after normalization): {$url}");
return ['success'=>false, 'message'=>'URL already indexed; generation skipped', 'details'=>"URL: {$url}"];
}
$tasks[$index]['status'] = 'processing';
$tasks[$index]['updated_at'] = time();
$tasks[$index]['retry_count'] = ($tasks[$index]['retry_count']??0) + 1;
$storage->setValue('batch_tasks', $tasks, 'array');
$success = false;
$post_id = null;
$error_msg = '';
try {
$detailed = $tasks[$index]['detailed'] ?? 1;
$this->logDebug("Processing task #{$index}: {$url}");
$result = $this->generateArticleFromUrl($url, $detailed);
if (!$result['success']) throw new Exception($result['error'] ?? 'Generation failed');
$rawAlias = $result['raw_alias'] ?? '';
if (!empty($rawAlias)) {
$aliasCheck = $this->ensureUniqueAliasWithDomainCheck($rawAlias, $url);
if ($aliasCheck['error'] !== null) {
throw new Exception($aliasCheck['error']);
}
$result['alias'] = $aliasCheck['alias'];
}
$post_id = $this->saveArticle($result);
if (!$post_id) throw new Exception('Failed to save the post');
$success = true;
$this->logDebug("Task completed, post ID: {$post_id}");
} catch (Exception $e) {
$error_msg = $e->getMessage();
$this->logDebug("Task failed: " . $error_msg);
}
$tasks = $storage->getValue('batch_tasks') ?: [];
if (isset($tasks[$index])) {
if ($success) {
$tasks[$index]['status'] = 'completed';
$tasks[$index]['post_id'] = $post_id;
$tasks[$index]['error'] = '';
} else {
if (strpos($error_msg, '别名已存在') !== false) {
$tasks[$index]['retry_count'] = 3;
}
$tasks[$index]['status'] = 'failed';
$tasks[$index]['error'] = $error_msg;
$failures = $storage->getValue('batch_failures') ?: [];
$failures[] = [
'url' => $tasks[$index]['url'],
'error' => $error_msg,
'failed_at' => time(),
'detailed' => $tasks[$index]['detailed'] ?? 1,
'retry_count' => $tasks[$index]['retry_count'] ?? 0
];
$storage->setValue('batch_failures', $failures, 'array');
}
$tasks[$index]['updated_at'] = time();
$storage->setValue('batch_tasks', $tasks, 'array');
$this->logDebug("任务状态已更新: index={$index}, status={$tasks[$index]['status']}");
}
if ($success) {
return ['success'=>true, 'message'=>"文章生成成功 ID: {$post_id}", 'post_id'=>$post_id, 'details'=>"URL: {$url}"];
} else {
return ['success'=>false, 'message'=>$error_msg, 'details'=>"URL: {$url}"];
}
}
private function saveArticle($aiData)
{
$db = Database::getInstance();
$db->query("SET NAMES utf8mb4");
$aiData['title'] = self::cleanUtf8($aiData['title']);
$aiData['content'] = self::cleanUtf8($aiData['content']);
$aiData['excerpt'] = self::cleanUtf8($aiData['excerpt'] ?? '');
$aiData['tags'] = self::cleanUtf8($aiData['tags'] ?? '');
$excerpt = '';
if (!empty($aiData['excerpt'])) {
$excerpt = strip_tags($aiData['excerpt']);
$excerpt = html_entity_decode($excerpt, ENT_QUOTES, 'UTF-8');
$excerpt = mb_substr($excerpt, 0, 15, 'UTF-8');
}
$tags = '';
if (!empty($aiData['tags'])) {
$tags = trim($aiData['tags'], ', ');
$tags = preg_replace('/,+/', ',', $tags);
$tags = trim($tags, ', ');
}
if (!empty($aiData['alias'])) {
$alias = $aiData['alias'];
} else {
$alias = $this->generateAlias($aiData['title']);
if (!preg_match('/^[a-zA-Z0-9_-]+$/', $alias)) {
$alias = '';
}
if (empty($alias)) {
$alias = 'post-' . time();
}
}
$logData = [
'title' => $aiData['title'],
'content' => $aiData['content'],
'excerpt' => $excerpt,
'author' => 1,
'date' => time(),
'checked' => 'y',
'allow_remark' => 'y',
'hide' => 'n',
'sortid' => 0,
'alias' => $alias,
];
$log_model = new Log_Model();
$post_id = $log_model->addlog($logData);
if (!$post_id) return false;
if (!empty($aiData['category_ids']) && is_array($aiData['category_ids'])) {
$cat_ids = array_map('intval', $aiData['category_ids']);
$cat_ids = array_unique($cat_ids);
$this->saveMultiCategories($post_id, $cat_ids);
}
if (!empty($tags)) {
$tag_model = new Tag_Model();
$tag_model->addTag($tags, $post_id);
}
if (!empty($aiData['cover_url'])) {
$this->setPostCover($post_id, $aiData['cover_url']);
}
if (!empty($aiData['seo_title']) || !empty($aiData['seo_description'])) {
$this->saveTdk($post_id, $aiData['seo_title'] ?? '', $aiData['seo_description'] ?? '', '');
}
$this->saveNavFields($post_id, $aiData);
if (class_exists('Cache')) {
try {
Cache::getInstance()->updateCache();
$this->logDebug("全站缓存已刷新");
} catch (Exception $e) {
$this->logDebug("缓存刷新失败: " . $e->getMessage());
}
}
$this->pushToBing($post_id);
return $post_id;
}
private function saveNavFields($post_id, $aiData)
{
$db = Database::getInstance();
$navTable = DB_PREFIX . 'chuang_nav';
$tableCheck = $db->query("SHOW TABLES LIKE '{$navTable}'");
if ($db->num_rows($tableCheck) == 0) {
$this->logDebug("导航表 {$navTable} 不存在,无法保存导航字段");
return;
}
$navData = [
'chuang_url' => $aiData['url'] ?? '',
'is_ai' => $aiData['nav_fields']['is_ai'] ?? 'unknown',
'is_featured' => 'no',
'location' => $aiData['nav_fields']['location'] ?? '',
'update_time' => time(),
];
$this->logDebug("准备保存导航字段: " . json_encode($navData, JSON_UNESCAPED_UNICODE));
if (class_exists('ChuangNavClass')) {
try {
$nav = ChuangNavClass::getInstance();
$nav->set_data($post_id, $navData);
$this->logDebug("通过 ChuangNavClass 保存导航字段成功,文章ID: {$post_id}");
return;
} catch (Exception $e) {
$this->logDebug("通过 ChuangNavClass 保存导航字段失败: " . $e->getMessage() . ",将尝试直接数据库操作");
}
}
$value = serialize($navData);
$value = $db->escape_string($value);
$checkSql = "SELECT id FROM {$navTable} WHERE id = {$post_id}";
$checkRes = $db->query($checkSql);
if ($db->num_rows($checkRes) > 0) {
$sql = "UPDATE {$navTable} SET `value` = '{$value}', `update_time` = " . time() . " WHERE id = {$post_id}";
} else {
$sql = "INSERT INTO {$navTable} (id, `value`, `update_time`) VALUES ({$post_id}, '{$value}', " . time() . ")";
}
if ($db->query($sql)) {
$this->logDebug("直接数据库操作保存导航字段成功,文章ID: {$post_id}");
} else {
$this->logDebug("直接数据库操作保存导航字段失败: " . $db->error());
}
}
private function saveMultiCategories($post_id, $category_ids)
{
try {
$db = Database::getInstance();
$table = DB_PREFIX . 'multi_category';
$check = $db->query("SHOW TABLES LIKE '{$table}'");
if ($db->num_rows($check) == 0) return false;
$db->query("DELETE FROM {$table} WHERE gid = {$post_id}");
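$valid_ids = array_values(array_filter(array_map('intval', $category_ids), function ($id) { return $id > 0; })); // drop non-positive IDs and reindex so [0] is the first valid category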
foreach ($valid_ids as $cid) {
if ($cid > 0) {
$db->query("INSERT INTO {$table} (gid, sid) VALUES ({$post_id}, {$cid})");
}
}
if (!empty($valid_ids)) {
$main_cat = intval($valid_ids[0]);
$db->query("UPDATE " . DB_PREFIX . "blog SET sortid = {$main_cat} WHERE gid = {$post_id}");
}
return true;
} catch (Exception $e) {
$this->logDebug("多分类保存失败: " . $e->getMessage());
return false;
}
}
private function saveTdk($post_id, $title, $description, $keywords = '')
{
try {
$db = Database::getInstance();
$table = DB_PREFIX . 'chuang_tdk_data';
$check = $db->query("SHOW TABLES LIKE '{$table}'");
if ($db->num_rows($check) == 0) return false;
$title = $db->escape_string(self::cleanUtf8($title));
$description = $db->escape_string(self::cleanUtf8($description));
$keywords = $db->escape_string(self::cleanUtf8($keywords));
$sql = "INSERT INTO {$table} (gid, t, d, k)
VALUES ({$post_id}, '{$title}', '{$description}', '{$keywords}')
ON DUPLICATE KEY UPDATE t='{$title}', d='{$description}', k='{$keywords}'";
$db->query($sql);
return true;
} catch (Exception $e) {
$this->logDebug("TDK保存失败: " . $e->getMessage());
return false;
}
}
private function setPostCover($post_id, $image_url)
{
if (empty($image_url)) return false;
try {
if (strpos($image_url, 'http') !== 0) {
$cover_url = $image_url;
} else {
$dir_name = gmdate('Ym');
$upload_path = Option::UPLOADFILE_FULL_PATH . $dir_name . '/';
if (!is_dir($upload_path)) mkdir($upload_path, 0755, true);
$path_info = pathinfo(parse_url($image_url, PHP_URL_PATH));
$ext = isset($path_info['extension']) ? preg_replace('/[^a-zA-Z0-9]/', '', $path_info['extension']) : 'jpg';
if (!in_array($ext, ['jpg','jpeg','png','gif','webp','ico'])) $ext = 'jpg';
$filename = substr(md5($image_url . time()), 0, 12) . '_' . time() . '.' . $ext;
$file_path = $upload_path . $filename;
$ch = curl_init($image_url);
curl_setopt_array($ch, [
CURLOPT_RETURNTRANSFER => true, CURLOPT_FOLLOWLOCATION => true, CURLOPT_TIMEOUT => 30,
CURLOPT_USERAGENT => self::USER_AGENT, CURLOPT_SSL_VERIFYPEER => false
]);
$img = curl_exec($ch);
$http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($http_code != 200 || empty($img)) return false;
if (!file_put_contents($file_path, $img)) return false;
$cover_url = Option::UPLOADFILE_PATH . $dir_name . '/' . $filename;
}
$db = Database::getInstance();
$cover_url = $db->escape_string($cover_url);
$db->query("UPDATE " . DB_PREFIX . "blog SET cover = '{$cover_url}' WHERE gid = {$post_id}");
return true;
} catch (Exception $e) {
$this->logDebug("设置封面失败: " . $e->getMessage());
return false;
}
}
// ========== Website screenshot download (with watermark; supports PNG/WebP/JPG; watermark scaled to 1/3; transparent-background fix) ==========
private function downloadScreenshot($url, $width = 1200, $height = 800, $format = 'webp')
{
if (!self::ENABLE_SCREENSHOT) {
$this->logDebug("截图功能已禁用,跳过");
return '';
}
$this->logDebug("开始下载网站截图: {$url}");
$apiUrl = self::SCREENSHOT_API_URL . '?' . http_build_query([
'url' => $url,
'format' => $format,
'width' => $width,
'height' => $height,
]);
try {
$imageData = $this->downloadScreenshotData($apiUrl);
if (empty($imageData)) {
$this->logDebug("截图 API 返回空数据");
return '';
}
if (strlen($imageData) < 1024) {
$this->logDebug("截图数据过小,可能下载失败");
return '';
}
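// Save to a temporary file; tempnam() creates an extensionless placeholder, so remove it once the real name is derived
$tmpBase = tempnam(sys_get_temp_dir(), 'screenshot_');
$tmpFile = $tmpBase . '.' . $format;
@unlink($tmpBase);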
if (!file_put_contents($tmpFile, $imageData)) {
$this->logDebug("临时截图文件写入失败");
return '';
}
// Add the watermark
$watermarkedFile = $this->addWatermarkToImage($tmpFile, $format);
if (!$watermarkedFile) {
$this->logDebug("水印添加失败,将使用原始截图");
$watermarkedFile = $tmpFile;
}
// Copy into the upload directory
$dir = gmdate('Ym');
$fullDir = Option::UPLOADFILE_FULL_PATH . $dir . '/';
if (!is_dir($fullDir)) {
mkdir($fullDir, 0755, true);
}
$filename = 'screenshot_' . md5($url . time()) . '.' . $format;
$filepath = $fullDir . $filename;
if (copy($watermarkedFile, $filepath)) {
$localUrl = Option::UPLOADFILE_PATH . $dir . '/' . $filename;
$this->logDebug("截图保存成功(已添加水印): {$localUrl}");
@unlink($tmpFile);
if ($watermarkedFile !== $tmpFile) @unlink($watermarkedFile);
return $localUrl;
} else {
$this->logDebug("截图文件移动失败");
@unlink($tmpFile);
if ($watermarkedFile !== $tmpFile) @unlink($watermarkedFile);
return '';
}
} catch (Exception $e) {
$this->logDebug("截图下载异常: " . $e->getMessage());
return '';
}
}
private function downloadScreenshotData($apiUrl)
{
$ch = curl_init();
curl_setopt_array($ch, [
CURLOPT_URL => $apiUrl,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_TIMEOUT => 30,
CURLOPT_CONNECTTIMEOUT => 10,
CURLOPT_USERAGENT => self::USER_AGENT,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_HTTPHEADER => [
'Accept: image/webp,image/apng,image/*,*/*;q=0.8',
],
]);
$data = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$error = curl_error($ch);
curl_close($ch);
if ($error) {
$this->logDebug("截图 cURL 错误: {$error}");
return '';
}
if ($httpCode != 200) {
$this->logDebug("截图 API 返回 HTTP {$httpCode}");
return '';
}
return $data;
}
/**
* Detect the image format from the magic bytes in the file header
*/
private function detectImageFormat($data)
{
$header = substr($data, 0, 12);
if (strpos($header, 'RIFF') !== false && strpos($header, 'WEBP') !== false) {
return 'webp';
} elseif (strpos($header, "\x89PNG\r\n\x1a\n") !== false) {
return 'png';
} elseif (strpos($header, "\xff\xd8\xff") !== false) {
return 'jpg';
}
return 'png'; // default when no signature matches
}
/**
* Add a watermark to an image (top-right corner, scaled to 1/3, preserves the PNG alpha channel, fixes the black-background issue)
*/
private function addWatermarkToImage($imagePath, $format)
{
if (!function_exists('imagecreatetruecolor')) {
$this->logDebug("GD库不可用,无法添加水印");
return false;
}
// Download the watermark image
$watermarkData = $this->downloadWatermark();
if (!$watermarkData) {
$this->logDebug("水印图片下载失败");
return false;
}
// Detect the watermark image's actual format
$extension = $this->detectImageFormat($watermarkData);
$this->logDebug("检测到水印图片格式: {$extension}");
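// Save the watermark to a temporary file; tempnam() leaves an extensionless placeholder, so remove it
$watermarkBase = tempnam(sys_get_temp_dir(), 'watermark_');
$watermarkTmp = $watermarkBase . '.' . $extension;
@unlink($watermarkBase);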
if (!file_put_contents($watermarkTmp, $watermarkData)) {
$this->logDebug("临时水印文件写入失败");
return false;
}
try {
// Load the source screenshot
switch (strtolower($format)) {
case 'png':
$srcImage = @imagecreatefrompng($imagePath);
break;
case 'jpg':
case 'jpeg':
$srcImage = @imagecreatefromjpeg($imagePath);
break;
case 'webp':
$srcImage = @imagecreatefromwebp($imagePath);
break;
default:
$srcImage = @imagecreatefromstring(file_get_contents($imagePath));
break;
}
if (!$srcImage) {
$this->logDebug("无法加载原始截图");
@unlink($watermarkTmp);
return false;
}
// Load the watermark image according to its detected format
$watermarkImage = null;
switch ($extension) {
case 'webp':
$watermarkImage = @imagecreatefromwebp($watermarkTmp);
break;
case 'png':
$watermarkImage = @imagecreatefrompng($watermarkTmp);
break;
case 'jpg':
$watermarkImage = @imagecreatefromjpeg($watermarkTmp);
break;
default:
$watermarkImage = @imagecreatefromstring($watermarkData);
break;
}
if (!$watermarkImage) {
$this->logDebug("无法加载水印图片,格式: {$extension}");
imagedestroy($srcImage);
@unlink($watermarkTmp);
return false;
}
// Preserve the watermark's transparency
imagealphablending($watermarkImage, false);
imagesavealpha($watermarkImage, true);
// Read the dimensions
$srcWidth = imagesx($srcImage);
$srcHeight = imagesy($srcImage);
$wmWidth = imagesx($watermarkImage);
$wmHeight = imagesy($watermarkImage);
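// Scale the watermark by WATERMARK_SCALE (intended to be one third of its original size);
// clamp to at least 1px because imagecreatetruecolor() rejects zero dimensions
$targetScale = self::WATERMARK_SCALE;
$newWmWidth = max(1, (int)($wmWidth * $targetScale));
$newWmHeight = max(1, (int)($wmHeight * $targetScale));
// ========== Key fix: create a canvas with a transparent background ==========
$resizedWatermark = imagecreatetruecolor($newWmWidth, $newWmHeight);
// Turn off alpha blending so alpha values are written as-is
imagealphablending($resizedWatermark, false);
// Keep the full alpha channel when saving
imagesavealpha($resizedWatermark, true);
// Fill the whole canvas with a fully transparent color (the crucial step)
$transparent = imagecolorallocatealpha($resizedWatermark, 0, 0, 0, 127);
imagefill($resizedWatermark, 0, 0, $transparent);
// Resample the original watermark onto the transparent canvas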
imagecopyresampled($resizedWatermark, $watermarkImage, 0, 0, 0, 0, $newWmWidth, $newWmHeight, $wmWidth, $wmHeight);
imagedestroy($watermarkImage);
$watermarkImage = $resizedWatermark;
$wmWidth = $newWmWidth;
$wmHeight = $newWmHeight;
// If the watermark is still wider than 30% of the screenshot, shrink it proportionally
if ($wmWidth > $srcWidth * 0.3) {
$scale = ($srcWidth * 0.3) / $wmWidth;
$newWmWidth2 = max(1, (int)($wmWidth * $scale));
$newWmHeight2 = max(1, (int)($wmHeight * $scale));
$resizedWatermark2 = imagecreatetruecolor($newWmWidth2, $newWmHeight2);
// Fill with a transparent background here as well
imagealphablending($resizedWatermark2, false);
imagesavealpha($resizedWatermark2, true);
$transparent2 = imagecolorallocatealpha($resizedWatermark2, 0, 0, 0, 127);
imagefill($resizedWatermark2, 0, 0, $transparent2);
imagecopyresampled($resizedWatermark2, $watermarkImage, 0, 0, 0, 0, $newWmWidth2, $newWmHeight2, $wmWidth, $wmHeight);
imagedestroy($watermarkImage);
$watermarkImage = $resizedWatermark2;
$wmWidth = $newWmWidth2;
$wmHeight = $newWmHeight2;
}
// Position the watermark in the top-right corner, inset by the margin
$destX = $srcWidth - $wmWidth - self::WATERMARK_MARGIN;
$destY = self::WATERMARK_MARGIN;
// Enable alpha blending on the destination image
imagealphablending($srcImage, true);
imagesavealpha($srcImage, true);
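// Copy the watermark onto the screenshot. imagecopymerge() ignores the source alpha channel,
// so transparent areas only survive on the imagecopy() path (opacity of 100)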
$opacity = self::WATERMARK_OPACITY;
if ($opacity < 100) {
imagecopymerge($srcImage, $watermarkImage, $destX, $destY, 0, 0, $wmWidth, $wmHeight, $opacity);
} else {
imagecopy($srcImage, $watermarkImage, $destX, $destY, 0, 0, $wmWidth, $wmHeight);
}
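// Save the watermarked image; remove the extensionless placeholder that tempnam() creates
$outputBase = tempnam(sys_get_temp_dir(), 'watermarked_');
$outputPath = $outputBase . '.' . $format;
@unlink($outputBase);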
switch (strtolower($format)) {
case 'png':
imagepng($srcImage, $outputPath);
break;
case 'jpg':
case 'jpeg':
imagejpeg($srcImage, $outputPath, 90);
break;
case 'webp':
imagewebp($srcImage, $outputPath, 90);
break;
default:
imagewebp($srcImage, $outputPath, 90);
break;
}
imagedestroy($srcImage);
imagedestroy($watermarkImage);
@unlink($watermarkTmp);
return $outputPath;
} catch (Exception $e) {
$this->logDebug("水印添加异常: " . $e->getMessage());
@unlink($watermarkTmp);
return false;
}
}
/**
* Download the watermark image data (enhanced: sends a Referer header to get past hotlink protection)
*/
private function downloadWatermark()
{
$ch = curl_init(self::WATERMARK_URL);
curl_setopt_array($ch, [
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_TIMEOUT => 15,
CURLOPT_USERAGENT => self::USER_AGENT,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_REFERER => 'https://cxgn.cn/',
CURLOPT_HTTPHEADER => [
'Accept: image/webp,image/apng,image/*,*/*;q=0.8',
'Accept-Language: zh-CN,zh;q=0.9',
],
]);
$data = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$error = curl_error($ch);
curl_close($ch);
if ($httpCode != 200 || empty($data)) {
$this->logDebug("水印图片下载失败,HTTP: {$httpCode}, 错误: {$error}");
return false;
}
return $data;
}
private function insertScreenshotIntoContent($content, $screenshotUrl)
{
if (empty($screenshotUrl)) {
return $content;
}
$needle = '### 工具介绍';
$pos = mb_strpos($content, $needle);
if ($pos === false) {
$this->logDebug("未找到「### 工具介绍」标题,截图将插入到内容末尾");
return $content . "\n\n" . $this->buildScreenshotHtml($screenshotUrl);
}
$nextSectionPos = mb_strpos($content, "\n###", $pos + mb_strlen($needle));
if ($nextSectionPos === false) {
$insertPos = mb_strlen($content);
} else {
$insertPos = $nextSectionPos;
}
$screenshotHtml = "\n\n" . $this->buildScreenshotHtml($screenshotUrl) . "\n\n";
$newContent = mb_substr($content, 0, $insertPos) . $screenshotHtml . mb_substr($content, $insertPos);
$this->logDebug("截图已插入到「工具介绍」之后");
return $newContent;
}
private function buildScreenshotHtml($screenshotUrl)
{
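// Assumption: a Markdown image tag; the alt text and the escaping call are reconstructed, as this line was damaged.
return '![Website screenshot](' . htmlspecialchars($screenshotUrl) . ')' . "\n";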
}
private function generateArticleFromUrl($url, $detailed = true, $skip_icon = false, $existing_post_id = 0, $retry_mode = false)
{
try {
$websiteData = $this->fetchWebContent($url);
if (isset($websiteData['effective_url']) && !$this->isSameDomain($url, $websiteData['effective_url'])) {
throw new Exception('目标URL跳转到其他域名,跳过生成(原域名:' . parse_url($url, PHP_URL_HOST) . ',跳转后:' . parse_url($websiteData['effective_url'], PHP_URL_HOST) . ')');
}
$invalidSitePatterns = [
'domain is parked', 'domain for sale', 'buy this domain', 'under construction', 'site can’t be reached',
'暂时无法访问', '域名出售', '该域名已过期', 'this domain is expired', 'parked page', 'sedo parking',
];
$htmlLower = strtolower($websiteData['html']);
foreach ($invalidSitePatterns as $pattern) {
if (strpos($htmlLower, strtolower($pattern)) !== false) {
throw new Exception('目标网站已失效或为停放页面,跳过处理');
}
}
$contentLength = mb_strlen($websiteData['content'] ?? '', 'UTF-8');
$isSimpleStrategy = isset($websiteData['strategy']) && $websiteData['strategy'] === 'simple';
if (!$isSimpleStrategy && $contentLength < 30) {
if (!$retry_mode) {
throw new Exception('抓取到的网页正文过短(' . $contentLength . '字),无法生成有效内容');
}
$this->logDebug("重试模式:内容过短但继续生成(当前长度: {$contentLength}字)");
}
if ($isSimpleStrategy) {
$this->logDebug("使用simple策略,跳过正文长度检查(当前长度: {$contentLength}字)", "抓取策略");
}
if ($retry_mode && $contentLength < 100) {
$this->logDebug("重试模式:内容较少,尝试提取更多元信息...");
$extraInfo = $this->extractExtraMetaInfo($url, $websiteData['html']);
if (!empty($extraInfo)) {
$websiteData = array_merge($websiteData, $extraInfo);
$this->logDebug("重试模式:提取到额外信息 - " . json_encode(array_keys($extraInfo)));
}
}
$cover_url = '';
if (!$skip_icon) {
try {
$cover_url = $this->downloadFavicon($url, $websiteData['html']);
if (empty($cover_url)) {
$this->logDebug("图标下载失败,但任务将继续(无封面)");
}
} catch (Exception $e) {
$this->logDebug("图标下载异常,任务将继续(无封面): " . $e->getMessage());
$cover_url = '';
}
} else {
if ($existing_post_id > 0) {
$db = Database::getInstance();
$sql = "SELECT cover FROM " . DB_PREFIX . "blog WHERE gid = " . (int)$existing_post_id;
$row = $db->once_fetch_array($sql);
if (!empty($row['cover'])) {
$cover_url = $row['cover'];
$this->logDebug("更新任务:保留原封面 {$cover_url}");
}
}
$this->logDebug("更新任务:跳过图标下载");
}
// ========== Download the website screenshot (with watermark) ==========
$screenshot_url = '';
if (self::ENABLE_SCREENSHOT) {
try {
$screenshot_url = $this->downloadScreenshot($url, 1200, 800, 'webp');
if (empty($screenshot_url)) {
$this->logDebug("截图下载失败,文章将不含截图");
}
} catch (Exception $e) {
$this->logDebug("截图下载异常: " . $e->getMessage());
}
}
if (!class_exists('AI')) throw new Exception('AI功能未配置');
$ai_config = AI::getCurrentModelInfo();
if (empty($ai_config['api_key'])) throw new Exception('请先在系统设置中配置AI');
$prompt = $detailed ? $this->generateOriginalPrompt($url, $websiteData, $retry_mode) : $this->generateSimplePrompt($url, $websiteData, $retry_mode);
$content = $this->callAI($prompt);
if (empty($content)) throw new Exception('AI返回内容为空');
if (!$this->isAIContentComplete($content, $detailed)) {
$this->logDebug("AI生成内容不完整,尝试重试一次");
sleep(2);
$content = $this->callAI($prompt);
if (empty($content) || !$this->isAIContentComplete($content, $detailed)) {
throw new Exception('AI生成内容不完整,可能由于超时或限制,已跳过发布');
}
}
$processed = $this->processAIResponse($content, $websiteData);
$coreKeywords = $this->extractCoreKeywordsFromPage($url, $websiteData);
if (!empty($coreKeywords)) {
$combinedText = $processed['title'] . ' ' . $processed['content'] . ' ' . ($processed['seo_title'] ?? '') . ' ' . ($processed['seo_description'] ?? '');
$matchCount = 0;
foreach ($coreKeywords as $keyword) {
if (mb_strpos($combinedText, $keyword) !== false) {
$matchCount++;
}
}
$threshold = ceil(count($coreKeywords) * 0.3);
if ($matchCount < $threshold) {
$this->logDebug("警告:生成的文章可能偏离页面主题,仅匹配到 {$matchCount}/" . count($coreKeywords) . " 个核心关键词");
}
}
if (!$this->isValidArticleContent($processed['content'])) {
throw new Exception('AI生成的内容无效(包含过多占位信息或无实质内容)');
}
// Insert the screenshot into the content
if (!empty($screenshot_url)) {
$processed['content'] = $this->insertScreenshotIntoContent($processed['content'], $screenshot_url);
}
$rawAlias = $this->generateAlias($processed['title']);
$this->logDebug("生成文章数据,URL: {$url}");
return [
'success' => true,
'url' => $url,
'title' => self::cleanUtf8($processed['title']),
'content' => self::cleanUtf8($processed['content']),
'excerpt' => self::cleanUtf8($processed['excerpt']),
'tags' => self::cleanUtf8($processed['tags']),
'category_ids' => $processed['category_ids'],
'cover_url' => $cover_url,
'screenshot_url' => $screenshot_url,
'alias' => $rawAlias,
'raw_alias' => $rawAlias,
'seo_title' => self::cleanUtf8($processed['seo_title'] ?? ''),
'seo_description' => self::cleanUtf8($processed['seo_description'] ?? ''),
'nav_fields' => $processed['nav_fields'],
];
} catch (Exception $e) {
return ['success' => false, 'error' => $e->getMessage()];
}
}
private function isAIContentComplete($content, $detailed = true)
{
if (empty($content)) return false;
if ($detailed) {
$requiredSections = [
'### 工具介绍',
'### 核心功能',
'### 使用场景',
'### 适用人群',
'### 独特优势',
'### 实测体验'
];
$missing = [];
foreach ($requiredSections as $section) {
if (strpos($content, $section) === false) {
$missing[] = $section;
}
}
if (count($missing) > 2) {
$this->logDebug("AI内容缺失关键章节: " . implode(', ', $missing));
return false;
}
$bodyStart = strpos($content, '### 工具介绍');
if ($bodyStart !== false) {
$body = substr($content, $bodyStart);
} else {
$body = $content;
}
if (mb_strlen($body, 'UTF-8') < 500) {
$this->logDebug("AI生成正文过短: " . mb_strlen($body) . " 字符");
return false;
}
} else {
if (mb_strlen($content, 'UTF-8') < 100) {
return false;
}
}
return true;
}
private function extractCoreKeywordsFromPage($url, $websiteData)
{
$keywords = [];
if (!empty($websiteData['title'])) {
$titleWords = preg_split('/[\s,,.。、::]+/u', $websiteData['title'], -1, PREG_SPLIT_NO_EMPTY);
foreach ($titleWords as $word) {
if (mb_strlen($word) > 2 && !in_array($word, $keywords)) {
$keywords[] = $word;
}
}
}
if (!empty($websiteData['desc'])) {
$descWords = preg_split('/[\s,,.。、::]+/u', $websiteData['desc'], -1, PREG_SPLIT_NO_EMPTY);
foreach ($descWords as $word) {
if (mb_strlen($word) > 2 && !in_array($word, $keywords)) {
$keywords[] = $word;
}
}
}
return array_slice($keywords, 0, 5);
}
private function isSameDomain($url1, $url2)
{
$host1 = parse_url($url1, PHP_URL_HOST);
$host2 = parse_url($url2, PHP_URL_HOST);
if (!$host1 || !$host2) return false;
$host1 = preg_replace('/^www\./i', '', $host1);
$host2 = preg_replace('/^www\./i', '', $host2);
return strtolower($host1) === strtolower($host2);
}
private function extractExtraMetaInfo($url, $html)
{
$extra = [];
if (preg_match('/<meta[^>]*property=["\']og:title["\'][^>]*content=["\']([^"\']+)["\'][^>]*>/i', $html, $m)) {
$extra['og_title'] = $m[1];
} elseif (preg_match('/<meta[^>]*content=["\']([^"\']+)["\'][^>]*property=["\']og:title["\'][^>]*>/i', $html, $m)) {
$extra['og_title'] = $m[1];
}
if (preg_match('/<meta[^>]*property=["\']og:description["\'][^>]*content=["\']([^"\']+)["\'][^>]*>/i', $html, $m)) {
$extra['og_desc'] = $m[1];
} elseif (preg_match('/<meta[^>]*content=["\']([^"\']+)["\'][^>]*property=["\']og:description["\'][^>]*>/i', $html, $m)) {
$extra['og_desc'] = $m[1];
}
if (preg_match('/<meta[^>]*name=["\']twitter:title["\'][^>]*content=["\']([^"\']+)["\'][^>]*>/i', $html, $m)) {
$extra['twitter_title'] = $m[1];
}
if (preg_match('/<meta[^>]*name=["\']twitter:description["\'][^>]*content=["\']([^"\']+)["\'][^>]*>/i', $html, $m)) {
$extra['twitter_desc'] = $m[1];
}
if (preg_match('/<meta[^>]*name=["\']application-name["\'][^>]*content=["\']([^"\']+)["\'][^>]*>/i', $html, $m)) {
$extra['app_name'] = $m[1];
}
if (preg_match('/<h1[^>]*>(.*?)<\/h1>/is', $html, $m)) {
$h1 = strip_tags($m[1]);
$h1 = trim(preg_replace('/\s+/', ' ', $h1));
if (!empty($h1)) {
$extra['h1'] = $h1;
}
}
$h2_tags = [];
if (preg_match_all('/<h2[^>]*>(.*?)<\/h2>/is', $html, $matches)) {
foreach ($matches[1] as $h2) {
$h2 = strip_tags($h2);
$h2 = trim(preg_replace('/\s+/', ' ', $h2));
if (!empty($h2) && mb_strlen($h2) > 3) {
$h2_tags[] = $h2;
}
}
}
if (!empty($h2_tags)) {
$extra['h2_tags'] = implode(' | ', array_slice($h2_tags, 0, 3));
}
if (preg_match('/<meta[^>]*property=["\']article:published_time["\'][^>]*content=["\']([^"\']+)["\'][^>]*>/i', $html, $m)) {
$extra['published_time'] = $m[1];
}
return $extra;
}
private function isValidArticleContent($content)
{
if (empty($content)) {
return false;
}
$content = trim($content);
$length = mb_strlen($content, 'UTF-8');
if ($length < 10) {
$this->logDebug("内容过短({$length}字符),判定为无效");
return false;
}
$invalidPhrases = ['页面未找到', '404', 'page not found', 'not found'];
$invalidCount = 0;
foreach ($invalidPhrases as $phrase) {
$count = substr_count(mb_strtolower($content), mb_strtolower($phrase));
$invalidCount += $count;
}
if ($invalidCount > 10) {
$this->logDebug("内容包含过多无效关键词({$invalidCount}次),判定为无效");
return false;
}
$cleanContent = strip_tags($content);
$cleanContent = preg_replace('/\s+/', ' ', $cleanContent);
$cleanContent = trim($cleanContent);
if (mb_strlen($cleanContent) < 5) {
$this->logDebug("清理后的内容过短(" . mb_strlen($cleanContent) . "字符),判定为无效");
return false;
}
$paragraphs = preg_split('/\n+/', $content);
$validParagraphs = 0;
foreach ($paragraphs as $para) {
$para = trim($para);
$para = strip_tags($para);
$para = preg_replace('/[#*]+\s*/', '', $para);
$para = preg_replace('/\s+/', ' ', $para);
$para = trim($para);
if (mb_strlen($para) > 15) {
$validParagraphs++;
}
}
if ($validParagraphs < 1) {
$this->logDebug("有效段落过少({$validParagraphs}个),判定为无效");
return false;
}
return true;
}
private function fetchWebContent($url)
{
$methods = ['curlWithRetry', 'curlRobust', 'curlSimple', 'curlAntiBot', 'fileGet'];
foreach ($methods as $method) {
try {
$result = $this->$method($url);
if (!empty($result['html']) && strlen($result['html']) > 100) {
$html = $this->convertHtmlToUtf8($result['html'], $url);
$simple_data = $this->extractSimpleContent($html, $url);
if ($simple_data !== null) {
$simple_data['html'] = $html;
$simple_data['effective_url'] = $result['effective_url'];
$this->logDebug("使用唐僧插件风格简单提取成功,URL: {$url}", "抓取策略");
return $simple_data;
}
$data = $this->extractWebsiteData($html);
$data['html'] = $html;
$data['effective_url'] = $result['effective_url'];
if (mb_strlen($data['content']) < 20) {
$this->logDebug("抓取到的内容过短(" . mb_strlen($data['content']) . "字符),URL: {$url}");
$fallback_content = $this->extractFallbackContent($html);
if (mb_strlen($fallback_content) > mb_strlen($data['content'])) {
$data['content'] = $fallback_content;
$this->logDebug("使用备用提取方法,内容长度: " . mb_strlen($fallback_content));
}
}
return $data;
}
} catch (Exception $e) {
$this->logDebug("{$method} 失败: " . $e->getMessage());
continue;
}
}
throw new Exception('无法获取网页内容,所有抓取方法均失败');
}
// ========== Enhanced fetch methods (HTTP/2, full browser emulation, cookie handling) ==========
private function curlWithRetry($url, $maxRetries = 2)
{
$lastException = null;
for ($i = 0; $i < $maxRetries; $i++) {
try {
if ($i > 0) {
$this->logDebug("抓取重试第 {$i} 次,URL: {$url}");
sleep(2 * $i);
}
return $this->curlRobust($url);
} catch (Exception $e) {
$lastException = $e;
if (strpos($e->getMessage(), '反爬虫') !== false) {
break;
}
}
}
throw $lastException ?: new Exception('抓取失败,已达最大重试次数');
}
private function curlRobust($url)
{
$userAgents = [
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3 Safari/605.1.15',
];
$userAgent = $userAgents[array_rand($userAgents)];
$this->logDebug("使用User-Agent: " . substr($userAgent, 0, 60) . "...");
$parsedUrl = parse_url($url);
$referer = ($parsedUrl['scheme'] ?? 'https') . '://' . ($parsedUrl['host'] ?? '') . '/';
$cookieFile = sys_get_temp_dir() . '/chuang_cookie_' . md5($url) . '.txt';
$ch = curl_init();
curl_setopt_array($ch, [
CURLOPT_URL => $url,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 45,
CURLOPT_CONNECTTIMEOUT => 20,
CURLOPT_USERAGENT => $userAgent,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_SSL_VERIFYHOST => 0,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_2_0,
CURLOPT_ENCODING => '', // empty string lets libcurl advertise and decode every encoding it was built with
CURLOPT_COOKIEFILE => $cookieFile,
CURLOPT_COOKIEJAR => $cookieFile,
CURLOPT_HTTPHEADER => [
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8',
'Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-US;q=0.7',
'Cache-Control: no-cache',
'Pragma: no-cache',
'Sec-Ch-Ua: ' . $this->getRandomSecChUa(), // the brand list already carries its own quotes
'Sec-Ch-Ua-Mobile: ?0',
'Sec-Ch-Ua-Platform: "Windows"',
'Sec-Fetch-Dest: document',
'Sec-Fetch-Mode: navigate',
'Sec-Fetch-Site: none',
'Sec-Fetch-User: ?1',
'Upgrade-Insecure-Requests: 1',
'DNT: 1',
'Referer: ' . $referer,
],
CURLOPT_AUTOREFERER => true,
CURLOPT_IPRESOLVE => CURL_IPRESOLVE_V4,
]);
$html = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$effectiveUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
$error = curl_error($ch);
curl_close($ch);
@unlink($cookieFile);
if ($error) {
$this->logDebug("cURL错误: {$error}");
throw new Exception("网络请求失败: {$error}");
}
if ($httpCode == 403 || $httpCode == 503) {
if ($this->isAntiBotPage($html)) {
throw new Exception('网站启用了反爬虫验证(Cloudflare/JS挑战),无法抓取');
}
throw new Exception("HTTP {$httpCode} - 访问被拒绝");
}
if ($httpCode != 200 && empty($html)) {
throw new Exception("HTTP {$httpCode} 且无内容返回");
}
if ($httpCode != 200) {
$this->logDebug("HTTP {$httpCode} 但返回了内容,继续尝试提取");
}
if (strlen($html) < 500 && $this->isAntiBotPage($html)) {
throw new Exception('返回内容为反爬虫验证页面');
}
return ['html' => $html, 'effective_url' => $effectiveUrl];
}
private function getRandomSecChUa()
{
$brands = [
'"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"',
'"Chromium";v="123", "Not(A:Brand";v="8", "Google Chrome";v="123"',
'"Chromium";v="121", "Not(A:Brand";v="99", "Google Chrome";v="121"',
];
return $brands[array_rand($brands)];
}
private function isAntiBotPage($html)
{
$anti_bot_indicators = [
'Just a moment', 'Enable JavaScript', 'Enable cookies',
'Checking your browser', 'Verifying you are human',
'Cloudflare', 'DDoS protection', 'Security check',
'请稍候', '正在验证', '需要JavaScript', '需要Cookie',
'Cloudflare Ray ID', 'Checking if the site connection is secure'
];
foreach ($anti_bot_indicators as $indicator) {
if (stripos($html, $indicator) !== false) {
return true;
}
}
return false;
}
private function curlSimple($url)
{
$ch = curl_init();
curl_setopt_array($ch, [
CURLOPT_URL => $url, CURLOPT_RETURNTRANSFER => true, CURLOPT_FOLLOWLOCATION => true,
CURLOPT_TIMEOUT => 30, CURLOPT_USERAGENT => self::USER_AGENT,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_HTTPHEADER => [
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Language: zh-CN,zh;q=0.9,en;q=0.8',
]
]);
$html = curl_exec($ch);
$effective_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
$http = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($http == 403) {
throw new Exception("HTTP 403 Forbidden - 服务器拒绝访问,跳过");
}
if ($this->isAntiBotPage($html)) {
$this->logDebug("检测到反爬虫验证页面,URL: {$url}");
}
if ($http != 200) {
if (empty($html)) {
throw new Exception("HTTP {$http}");
}
$this->logDebug("HTTP {$http} 但有内容返回,将继续尝试提取,长度: " . strlen($html), "反爬虫");
}
return ['html' => $html, 'effective_url' => $effective_url];
}
private function fileGet($url)
{
$ctx = stream_context_create([
'http' => ['timeout'=>30, 'header'=>"User-Agent: ".self::USER_AGENT."\r\n"],
'ssl' => ['verify_peer'=>false, 'verify_peer_name'=>false]
]);
$html = @file_get_contents($url, false, $ctx);
if ($html === false) {
throw new Exception('file_get_contents失败');
}
return ['html' => $html, 'effective_url' => $url];
}
private function curlAntiBot($url)
{
$this->logDebug("尝试激进的反爬虫绕过方法...", "反爬虫");
$userAgent = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
$ch = curl_init();
curl_setopt_array($ch, [
CURLOPT_URL => $url,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_TIMEOUT => 45,
CURLOPT_CONNECTTIMEOUT => 20,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_USERAGENT => $userAgent,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_SSL_VERIFYHOST => 0,
CURLOPT_ENCODING => '',
CURLOPT_HTTPHEADER => [
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.5',
'Accept-Encoding: gzip, deflate',
'Connection: close',
],
CURLOPT_AUTOREFERER => true,
CURLOPT_MAXREDIRS => 3,
CURLOPT_IPRESOLVE => CURL_IPRESOLVE_V4,
]);
$html = curl_exec($ch);
$effective_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
$http = curl_getinfo($ch, CURLINFO_HTTP_CODE);
$error = curl_error($ch);
curl_close($ch);
if ($error) {
$this->logDebug("curlAntiBot cURL错误: {$error}", "反爬虫");
}
if ($http == 403) {
throw new Exception("HTTP 403 Forbidden - 服务器拒绝访问,跳过");
}
if (empty($html) && $http != 200) {
throw new Exception("HTTP {$http}");
}
$this->logDebug("curlAntiBot完成,HTTP: {$http}", "反爬虫");
return ['html' => $html, 'effective_url' => $effective_url];
}
private function extractSimpleContent($html, $url = '')
{
$this->logDebug("尝试唐僧插件风格的简单提取...", "内容策略");
$title = '';
if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
$title = trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
$title = preg_replace('/\s+/', ' ', $title);
$this->logDebug("提取到标题(标准): {$title}", "简单提取");
}
if (empty($title) && preg_match('/<meta[^>]*property=["\']og:title["\'][^>]*content=["\'](.*?)["\'][^>]*>/is', $html, $m)) {
$title = trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
$this->logDebug("提取到标题(og:title): {$title}", "简单提取");
}
if (empty($title) && preg_match('/<meta[^>]*name=["\']twitter:title["\'][^>]*content=["\'](.*?)["\'][^>]*>/is', $html, $m)) {
$title = trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'));
$this->logDebug("提取到标题(twitter:title): {$title}", "简单提取");
}
if (empty($title) && preg_match('/