Introducing AI Performance in Bing Webmaster Tools Public Preview

We are happy to introduce AI Performance in Bing Webmaster Tools, a new set of insights that shows how publisher content appears across Microsoft Copilot, AI-generated summaries in Bing, and select partner integrations. For the first time, you can understand how often your content is cited in generative answers, with clear visibility into which URLs are referenced and how citation activity changes over time.

Extending Search Insights to AI Answers

Bing Webmaster Tools has long helped website owners understand indexing, crawl health, and search performance. AI Performance extends those insights to AI-generated answers by showing where and how content from your site is referenced as a source across AI experiences.

As AI becomes a more common way people discover information, visibility is not only about blue links. It is also about whether your content is cited and referenced when AI systems generate answers.

This release is an early step toward Generative Engine Optimization (GEO) tooling in Bing Webmaster Tools, helping publishers understand how their content participates in AI-driven experiences.

[Image: AI Performance dashboard]

AI Performance Dashboard: Visibility Across AI Experiences

The AI Performance dashboard provides a consolidated view of when your site is cited in AI answers.

What the dashboard measures

Total Citations
Shows the total number of citations that are displayed as sources in AI-generated answers during the selected time frame. This highlights how often your content is referenced by AI systems, without indicating placement or presentation within a specific answer.

Average Cited Pages
Shows the average number of unique pages from your site that are displayed as sources in AI-generated answers per day over the selected time range.
Because the data is aggregated across supported AI surfaces, average cited pages reflects overall citation patterns and does not indicate ranking, authority, or the role of any page within an individual answer.

Grounding queries
Shows the key phrases the AI used when retrieving content that was referenced in AI-generated answers. The data shown represents a sample of overall citation activity. We will continue to refine this metric as additional data is processed.

Page-level citation activity
Shows citation counts for specific URLs from your site, making it easy to see which individual pages are most often referenced across AI-generated answers during the selected date range. This reflects how often pages are cited, not page importance, ranking, or placement.

Visibility trends over time
The timeline shows how citation activity for your site changes over time across supported AI experiences, making it easier to spot trends at a glance.

Important Note: Bing respects all content owner preferences expressed through robots.txt and other supported control mechanisms.

Using AI Performance Insights in Bing Webmaster Tools

Reviewing cited pages and grounding query phrases in AI Performance helps clarify how visible your content is in AI-generated answers. These insights can help you:

- Validate which pages are already being used as references in AI answers.
- Identify content that appears frequently across AI answers.
- Spot opportunities to improve clarity, structure, or completeness on pages that are indexed but less frequently cited.

Using These Insights to Improve Content

Once you understand which pages and topics are being cited, you can use those signals to guide content improvements.

Strengthen depth and expertise
Pages cited for specific grounding query phrases often reflect clear subject focus and domain expertise. Deepening coverage in related areas can reinforce authority.
Improve structure and clarity
Clear headings, tables, and FAQ sections help surface key information and make content easier for AI systems to reference accurately.

Support claims with evidence
Examples, data, and cited sources help build trust when content is reused in AI-generated answers.

Keep content fresh and accurate
Regular updates help ensure AI systems reference the most current version of your content.

Reduce ambiguity across formats
Align text, images, and video so they consistently represent the same entities, products, or concepts.

For deeper guidance on structuring content to improve inclusion in AI-generated answers, see Optimizing Your Content for Inclusion in AI Search Answers.

Keeping Content Fresh with IndexNow

Accurate and up-to-date content is important for inclusion and citation in AI-generated answers. IndexNow helps keep information fresh across search and AI experiences by notifying participating search engines whenever content is added, updated, or removed. By enabling faster discovery of content changes, IndexNow helps ensure that AI systems reference the most current version of a page when generating answers. If you’re not already using IndexNow, go to https://www.indexnow.org to get started.

Local Business Information and AI Visibility

For local businesses, accurate business information is especially important when AI experiences surface answers to location-based queries. In addition to using Bing Webmaster Tools, businesses can register with Bing Places for Business to help ensure that key details such as address, hours, and contact information remain current and eligible for inclusion in AI-generated responses.

Evolving AI Performance with the Webmaster Community

AI Performance in Bing Webmaster Tools marks an important step toward greater transparency between AI systems and the open web.
As we expand these insights, we’ll continue working with publishers and the webmaster community to improve inclusion, attribution, and visibility across both search results and AI experiences. We look forward to partnering with you as we evolve these capabilities and continue building tools that support discovery in the next generation of search and AI experiences.

Krishna Madhavan, Meenaz Merchant, Fabrice Canel, Saral Nigam
Product Managers, Microsoft AI

Batch Indexing Plugin

<?php
/*
Plugin Name: 批量网址收录·定时生成
Version: 1.8.8
Plugin URL: https://www.emlog.net/plugin/detail/xxx
Description: Standalone scheduled batch-generation plugin with a URL queue, automatic fetching, AI generation, automatic filling of multi-category/TDK/navigation fields, and Favicon-first cover selection. Integrates smart aliases and automatic cache refresh. Adds already-indexed URL detection (ignoring www and trailing slashes) and skipping of failed fetches. Pushes generated articles to the Bing search engine automatically (IndexNow). Adds a full-category taxonomy; two taxonomies can be selected in parallel. Improves prompts and strengthens GEO/SEO content generation for better article quality and search-engine friendliness. Adds alias-based duplicate detection to avoid indexing the same title from different domains twice. A single cron run processes multiple tasks in sequence and skips failures immediately. Strengthens source-site validity checks to reduce invalid tasks. Adds API-first icon download, domain-redirect detection, and prohibited-content filtering. Optimization: HTTP 403 responses are marked as failed and skipped immediately. Removes the Google icon download source to avoid timeouts slowing down tasks. Improves the icon-download retry strategy and HTML detection. Raises the cron execution time to 180 seconds, improves content extraction for tool sites, and strengthens AI content-fidelity checks. Adds a hands-on test module that generates first-person experience content from the perspective of "AI创作导航". Fixes alias-conflict logic: different domains may be indexed with a numeric suffix. Enhances cURL fetching with HTTP/2, full browser header emulation, and cookie management, greatly improving the success rate on anti-crawler sites. Adds website screenshots, inserted automatically after the tool introduction, with a custom watermark (fixes the black background on transparent PNG watermarks; watermark size is 1/3 of the original).
Author: Your Name
Author URL: https://www.emlog.net/profiles/xxx
*/
!defined('EMLOG_ROOT') && exit('access denied!');

class ChuangAiLootBatch {
    const ID = 'chuang_ailoot';
    const VERSION = '1.8.8';
    // Screenshot feature switch
    const ENABLE_SCREENSHOT = true;
    // Base URL of the screenshot API
    const SCREENSHOT_API_URL = 'https://screenshotsnap.com/api/screenshot';
    // Watermark image URL (placed in the top-right corner)
    const WATERMARK_URL = 'https://cxgn.cn/apple-touch-icon.webp';
    // Watermark margin in pixels
    const WATERMARK_MARGIN = 10;
    // Watermark opacity (0-100, where 100 is fully opaque)
    const WATERMARK_OPACITY = 10;
    // Watermark scale factor (1/5 shrinks the watermark to one fifth of its original size)
    const WATERMARK_SCALE = 1/5;

    private static $_instance;
    private $_inited = false;
    private $_pinyinDict = null;
    private $_mbstringAvailable = false;

    const USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

    private static $_userAgents = [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3.1 Safari/605.1.15',
    ];

    private static $badWords = [
        '赌博', '赌场', '百家乐', '轮盘', '老虎机', '六合彩', '时时彩', '彩票', '赌球',
        '色情', '情色', '成人', 'AV', '淫秽', '三级片', '性交', '裸聊', '约炮', '嫖娼',
        'casino', 'gambling', 'porn', 'xxx', 'sex', 'nude', 'erotic'
    ];

    public static function getInstance() {
        if (self::$_instance === null) {
            self::$_instance = new self();
        }
        return self::$_instance;
    }

    private function __construct() {
        $this->_mbstringAvailable = function_exists('mb_strlen') && function_exists('mb_substr') && function_exists('mb_internal_encoding');
        if ($this->_mbstringAvailable) {
            mb_internal_encoding('UTF-8');
        }
    }

    // ========== UTF-8 sanitizer ==========
    private static function cleanUtf8($str) {
        if (empty($str) || !is_string($str)) return $str;
        // Strip a leading UTF-8 BOM, then control characters, working on raw bytes.
        // The /u modifier is avoided on purpose: with /u, \xEF\xBB\xBF inside a character
        // class matches U+00EF, U+00BB and U+00BF rather than the BOM bytes, and
        // preg_replace() returns null on invalid UTF-8, which would skip the conversions below.
        if (strncmp($str, "\xEF\xBB\xBF", 3) === 0) {
            $str = substr($str, 3);
        }
        $str = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $str);
        if (mb_check_encoding($str, 'UTF-8')) {
            return $str;
        }
        $encodings = ['GB18030', 'GBK', 'GB2312', 'BIG5', 'ASCII', 'ISO-8859-1'];
        foreach ($encodings as $enc) {
            if (mb_check_encoding($str, $enc)) {
                $converted = mb_convert_encoding($str, 'UTF-8', $enc);
                if ($converted !== false) {
                    return $converted;
                }
            }
        }
        return preg_replace('/[\x80-\xFF]/', '', $str);
    }

    private static function isUnknownValue($str) {
        if (empty($str)) return true;
        $str = trim($str);
        $pattern = '/未知|未公开|不确定|不详|无|null|none|undefine|unknown|anonymous|官方未公开|官方未提供|暂未提供|暂无|未明确披露|未填写|测试|demo|example|localhost|all rights reserved|保留(?:所有)?权利|版权所有|copyright|©|\(c\)|powered by|designed by|developed by|技术支持|提供技术支持|theme|template|wordpress/i';
        return preg_match($pattern, $str) ? true : false;
    }

    private function convertHtmlToUtf8($html, $url = '') {
        if (empty($html)) return '';
        $charset = '';
        // Look for <meta charset="..."> first, then the http-equiv form with charset inside content="...".
        if (preg_match('/<meta[^>]+charset=["\']?([^"\'\s>]+)/i', $html, $m)) {
            $charset = trim($m[1]);
        } elseif (preg_match('/<meta[^>]+content=["\'][^"\']*charset=([^"\'\s>]+)/i', $html, $m)) {
            $charset = trim($m[1]);
        }
        if (!empty($charset)) {
            $charset = strtoupper($charset);
            $charset = preg_replace('/[^A-Z0-9-]/', '', $charset);
            if ($charset == 'UTF8') $charset = 'UTF-8';
            if (in_array($charset, ['GB2312', 'GBK'])) $charset = 'GB18030';
            if ($charset != 'UTF-8') {
                $converted = @iconv($charset, 'UTF-8//IGNORE', $html);
                if ($converted !== false) return $converted;
            }
        }
        return self::cleanUtf8($html);
    }

    private function getFallbackPinyinDict() {
        // Simplified dictionary; in production, use the full dictionary file pinyin_dict.php
        return [
            '的' => 'de', '一' => 'yi', '是' => 'shi', '了' => 'le', '我' => 'wo', '不' => 'bu', '人' => 'ren', '在' => 'zai', '他' => 'ta', '有' => 'you',
            '这' => 'zhe', '个' => 'ge', '上' => 'shang', '来' => 'lai', '到' => 'dao', '大' => 'da', '们' => 'men', '说' => 'shuo', '中' => 'zhong', '为' => 'wei',
            '子' => 'zi', '和' => 'he', '你' => 'ni', '地' => 'di', '出' => 'chu', '道' => 'dao', '也' => 'ye', '时' => 'shi', '年' => 'nian', '得' => 'de',
            '就' => 'jiu', '那' => 'na', '要' => 'yao', '下' => 'xia', '以' => 'yi', '生' => 'sheng', '会' => 'hui', '自' => 'zi', '着' => 'zhe', '去' => 'qu',
            '之' => 'zhi', '过' => 'guo', '家' => 'jia', '学' => 'xue', '对' => 'dui', '可' => 'ke', '她' => 'ta', '里' => 'li', '后' => 'hou', '小' => 'xiao',
            '么' => 'me', '心' => 'xin', '多' => 'duo', '天' => 'tian', '而' => 'er', '能' => 'neng', '好' => 'hao', '都' => 'dou', '然' => 'ran', '没' => 'mei',
            '日' => 'ri', '于' => 'yu', '起' => 'qi', '还' => 'hai', '发' => 'fa', '成' => 'cheng', '事' => 'shi', '只' => 'zhi', '作' => 'zuo', '当' => 'dang',
            '想' => 'xiang', '看' => 'kan', '文' => 'wen', '无' => 'wu', '开' => 'kai', '手' => 'shou', '十' => 'shi', '用' => 'yong', '主' => 'zhu', '方' => 'fang',
            '前' => 'qian', '如'
=> 'ru', '进' => 'jin', '样' => 'yang', '从' => 'cong', '同' => 'tong', '工' => 'gong', '也' => 'ye', '面' => 'mian', '又' => 'you', '马' => 'ma',
            '动' => 'dong', '而' => 'er', '现' => 'xian', '点' => 'dian', '最' => 'zui', '新' => 'xin', '打' => 'da', '重' => 'zhong', '每' => 'mei', '但' => 'dan',
            '身' => 'shen', '些' => 'xie', '高' => 'gao', '已' => 'yi', '此' => 'ci', '实' => 'shi', '书' => 'shu', '部' => 'bu', '其' => 'qi', '法' => 'fa',
            '因' => 'yin', '相' => 'xiang', '什' => 'shen', '二' => 'er', '问' => 'wen', '理' => 'li', '美' => 'mei', '点' => 'dian', '月' => 'yue', '万' => 'wan',
            '将' => 'jiang', '外' => 'wai', '政' => 'zheng', '义' => 'yi', '安' => 'an', '原' => 'yuan', '女' => 'nv',
            // The duplicate entry '么' => 'yao' was dropped here: a later duplicate key overrides
            // the earlier '么' => 'me', so 什么 would have become "shen-yao".
            '先' => 'xian', '老' => 'lao', '很' => 'hen', '通' => 'tong', '教' => 'jiao', '并' => 'bing', '提' => 'ti', '意' => 'yi', '认' => 'ren', '件' => 'jian',
            '计' => 'ji', '决' => 'jue', '公' => 'gong', '特' => 'te', '长' => 'chang', '党' => 'dang', '军' => 'jun', '民' => 'min', '等' => 'deng', '度' => 'du',
            '务' => 'wu', '具' => 'ju', '战' => 'zhan', '名' => 'ming', '力' => 'li', '关' => 'guan', '机' => 'ji', '田' => 'tian', '量' => 'liang', '联' => 'lian',
            '已' => 'yi', '处' => 'chu', '应' => 'ying', '它' => 'ta', '便' => 'bian', '任' => 'ren', '记' => 'ji', '北' => 'bei', '男' => 'nan', '西' => 'xi',
            '买' => 'mai', '卖' => 'mai', '车' => 'che', '红' => 'hong', '光' => 'guang', '东' => 'dong', '南' => 'nan', '华' => 'hua', '国' => 'guo', '族' => 'zu',
            '志' => 'zhi', '爱' => 'ai', '护' => 'hu', '保' => 'bao', '持' => 'chi', '续' => 'xu', '展' => 'zhan', '科' => 'ke', '技' => 'ji', '术' => 'shu',
            '化' => 'hua', '育' => 'yu', '体' => 'ti', '健' => 'jian', '康' => 'kang', '卫' => 'wei', '生' => 'sheng', '产' => 'chan', '业' => 'ye', '商' => 'shang',
            '品' => 'pin', '质' => 'zhi', '标' => 'biao', '准' => 'zhun', '规' => 'gui', '格' => 'ge', '式' => 'shi', '器' => 'qi', '械' => 'xie', '电' => 'dian',
            '水' => 'shui', '火' => 'huo', '土' => 'tu', '木' => 'mu', '金' => 'jin', '石' => 'shi', '山' => 'shan', '川' => 'chuan', '湖' => 'hu', '海' => 'hai',
            '洋' => 'yang', '空' => 'kong', '气' => 'qi', '风' =>
'feng', '雨' => 'yu', '雪' => 'xue', '雷' => 'lei', '闪' => 'shan', '声' => 'sheng', '音' => 'yin', '乐' => 'yue', '舞' => 'wu', '台' => 'tai', '戏' => 'xi', '影' => 'ying', '视' => 'shi', '频' => 'pin', '道' => 'dao', '网' => 'wang', '络' => 'luo', '站' => 'zhan', '页' => 'ye', '址' => 'zhi', '域' => 'yu', '深' => 'shen', '求' => 'qiu', '索' => 'suo', '章' => 'zhang', '成' => 'cheng', '苏' => 'su', '州' => 'zhou', '搜' => 'sou', '信' => 'xin', '息' => 'xi', '有' => 'you', '限' => 'xian', '司' => 'si', '天' => 'tian', '津' => 'jin', '滴' => 'di', '忆' => 'yi', ]; } private function loadPinyinDict() { if ($this->_pinyinDict === null) { $dict = null; $dictFile = __DIR__ . '/pinyin_dict.php'; if (file_exists($dictFile)) { try { $dict = include $dictFile; if (is_array($dict)) { $this->_pinyinDict = $dict; $this->logDebug("拼音字典加载成功: 外部文件,条目数 " . count($dict)); return $this->_pinyinDict; } else { $this->logDebug("拼音字典文件格式错误:不是数组,将使用内置字典"); } } catch (Exception $e) { $this->logDebug("拼音字典加载异常: " . $e->getMessage() . ",将使用内置字典"); } } else { $this->logDebug("拼音字典文件不存在: " . $dictFile . ",将使用内置字典"); } $this->_pinyinDict = $this->getFallbackPinyinDict(); $this->logDebug("拼音字典加载成功: 内置字典,条目数 " . count($this->_pinyinDict)); } return $this->_pinyinDict; } private function chineseToPinyin($str) { if (!$this->_mbstringAvailable) { $this->logDebug("mbstring扩展不可用,汉字转拼音功能已禁用"); return ''; } $dict = $this->loadPinyinDict(); if (empty($dict)) { $this->logDebug("拼音字典为空,无法转换拼音"); return ''; } try { $result = []; $len = mb_strlen($str, 'UTF-8'); for ($i = 0; $i < $len; $i++) { $char = mb_substr($str, $i, 1, 'UTF-8'); if (isset($dict[$char])) { $result[] = $dict[$char]; } else { $this->logDebug("未收录汉字: {$char},已忽略"); } } if (empty($result)) { $this->logDebug("汉字块 '{$str}' 转换结果为空"); return ''; } return implode('-', $result); } catch (Exception $e) { $this->logDebug("汉字转拼音异常: " . $e->getMessage()); return ''; } } private function generateAlias($title) { $fallback = 'post-' . 
time(); try { $title = self::cleanUtf8($title); if (empty($title)) { return $fallback; } $processed = preg_replace_callback( '/[\x{4e00}-\x{9fa5}]+/u', function($matches) { $chineseBlock = $matches[0]; $pinyin = $this->chineseToPinyin($chineseBlock); if (!empty($pinyin)) { return ' ' . $pinyin . ' '; } return ''; }, $title ); $alias = preg_replace('/[^\p{L}\p{N}\s_-]/u', '', $processed); $alias = preg_replace('/[\s_\.\/\\\\]+/', '-', $alias); $alias = preg_replace('/-+/', '-', $alias); $alias = trim($alias, '-'); $alias = strtolower($alias); if (!empty($alias) && strlen($alias) <= 200) { $this->logDebug("别名生成成功: {$alias}"); return $alias; } $this->logDebug("拼音转换结果为空,使用降级方案"); $title = $this->fullToHalf($title); $alias = preg_replace('/[^\p{L}\p{N}\s-]/u', '', $title); $pinyinMap = [ '深度求索' => 'shen-du-qiu-suo', '文章' => 'wen-zhang', '生成' => 'sheng-cheng', '工具' => 'gong-ju', 'AI' => 'ai', ]; foreach ($pinyinMap as $ch => $py) { $alias = str_replace($ch, $py, $alias); } $alias = preg_replace('/[\x{4e00}-\x{9fa5}]/u', '', $alias); $alias = strtolower($alias); $alias = preg_replace('/[\s_\.\/\\\\]+/', '-', $alias); $alias = preg_replace('/-+/', '-', $alias); $alias = trim($alias, '-'); if (strlen($alias) > 200) { $alias = substr($alias, 0, 200); } if (empty($alias)) { $alias = $fallback; } $this->logDebug("降级别名生成成功: {$alias}"); return $alias; } catch (Exception $e) { $this->logDebug("生成别名异常: " . $e->getMessage()); return $fallback; } } /** * 检查别名是否已被其他文章使用,并根据原始URL域名判断是否为重复内容 */ private function ensureUniqueAliasWithDomainCheck($alias, $currentUrl) { $db = Database::getInstance(); $originalAlias = $alias; $counter = 1; while (true) { $sql = "SELECT gid FROM " . DB_PREFIX . "blog WHERE alias = '" . $db->escape_string($alias) . 
"'"; $res = $db->query($sql); if ($res->num_rows == 0) { return ['alias' => $alias, 'error' => null]; } $row = $res->fetch_assoc(); $existingPostId = $row['gid']; $existingUrl = $this->getPostOriginalUrl($existingPostId); if (empty($existingUrl)) { $this->logDebug("无法获取别名 '{$alias}' 对应文章ID {$existingPostId} 的原始URL,将使用数字后缀"); $alias = $originalAlias . '-' . $counter; $counter++; continue; } $currentDomain = parse_url($currentUrl, PHP_URL_HOST); $existingDomain = parse_url($existingUrl, PHP_URL_HOST); $currentDomain = preg_replace('/^www\./i', '', $currentDomain); $existingDomain = preg_replace('/^www\./i', '', $existingDomain); if (strtolower($currentDomain) === strtolower($existingDomain)) { return ['alias' => $alias, 'error' => "别名已存在且属于同一域名 ({$existingDomain}),内容重复"]; } $this->logDebug("别名 '{$alias}' 已存在,但域名不同(当前:{$currentDomain},已有:{$existingDomain}),将使用数字后缀"); $alias = $originalAlias . '-' . $counter; $counter++; } } private function getPostOriginalUrl($postId) { $db = Database::getInstance(); $navTable = DB_PREFIX . 'chuang_nav'; $tableCheck = $db->query("SHOW TABLES LIKE '{$navTable}'"); if ($db->num_rows($tableCheck) == 0) { return ''; } $sql = "SELECT `value` FROM `{$navTable}` WHERE `gid` = " . intval($postId) . " LIMIT 1"; $row = $db->once_fetch_array($sql); if (!$row) { $sql = "SELECT `value` FROM `{$navTable}` WHERE `id` = " . intval($postId) . " LIMIT 1"; $row = $db->once_fetch_array($sql); } if ($row && !empty($row['value'])) { $navData = @unserialize($row['value']); if (is_array($navData) && isset($navData['chuang_url'])) { return $navData['chuang_url']; } } return ''; } private function normalizeUrlForComparison($url) { $parsed = parse_url($url); if (!$parsed) { return $url; } $scheme = isset($parsed['scheme']) ? $parsed['scheme'] . '://' : ''; $host = isset($parsed['host']) ? $parsed['host'] : ''; $path = isset($parsed['path']) ? $parsed['path'] : ''; $host = preg_replace('/^www\./i', '', $host); $path = rtrim($path, '/'); $normalized = $scheme . 
$host . $path;
        return $normalized;
    }

    private function urlExistsInNav($url) {
        $db = Database::getInstance();
        $table = DB_PREFIX . "chuang_nav";
        $sql = "SELECT `value` FROM `{$table}`";
        $res = $db->query($sql);
        $normalizedInput = $this->normalizeUrlForComparison($url);
        while ($row = $res->fetch_assoc()) {
            $data = @unserialize($row['value']);
            if (is_array($data) && isset($data['chuang_url'])) {
                $storedUrl = $data['chuang_url'];
                $normalizedStored = $this->normalizeUrlForComparison($storedUrl);
                if ($normalizedStored === $normalizedInput) {
                    return true;
                }
            }
        }
        return false;
    }

    private function pushToBing($post_id) {
        $this->logDebug("准备推送文章ID {$post_id} 到必应搜索引擎 (仅IndexNow)");
        if (!file_exists(EMLOG_ROOT . '/content/plugins/chuang_bing/chuang_bing.php')) {
            $this->logDebug("错误: 必应推送插件未安装 (文件不存在)");
            return false;
        }
        require_once EMLOG_ROOT . '/content/plugins/chuang_bing/chuang_bing.php';
        if (!function_exists('chuang_bing_indexnow_push')) {
            $this->logDebug("错误: 必应推送插件中的函数 chuang_bing_indexnow_push 不存在");
            return false;
        }
        $storage = Storage::getInstance('chuang_bing');
        $bing_enabled = (int)$storage->getValue('bing_enabled');
        if (!$bing_enabled) {
            $this->logDebug("错误: 必应推送插件未启用IndexNow推送方式");
            return false;
        }
        $results = array();
        // Initialise $url before branching: the push-log call at the end of this method reads it
        // even when the IndexNow configuration is incomplete.
        $url = '';
        $key = $storage->getValue('indexnow_key');
        $keyLocation = $storage->getValue('indexnow_key_location');
        $host = $storage->getValue('indexnow_host');
        $key_masked = $key ? substr($key, 0, 4) . '****' : '空';
        $this->logDebug("IndexNow配置: key={$key_masked}, keyLocation={$keyLocation}, host={$host}");
        if ($key && $keyLocation && $host) {
            if (class_exists('Url') && method_exists('Url', 'log')) {
                $url = Url::log($post_id);
                $this->logDebug("通过Url::log获取文章URL: " . ($url ?: '空'));
            }
            if (empty($url)) {
                $url = Option::get('blogurl') . '?post=' .
$post_id; $this->logDebug("使用备用URL: {$url}"); } if (empty($url)) { $this->logDebug("错误: 无法生成文章URL"); $results[] = "IndexNow: 无法生成URL"; } else { $result = chuang_bing_indexnow_push($url, $key, $keyLocation, $host); $results[] = "IndexNow: " . $result; $this->logDebug("IndexNow推送原始返回: " . $result); } } else { $missing = []; if (empty($key)) $missing[] = 'indexnow_key'; if (empty($keyLocation)) $missing[] = 'indexnow_key_location'; if (empty($host)) $missing[] = 'indexnow_host'; $this->logDebug("错误: IndexNow配置不完整,缺失项: " . implode(', ', $missing)); $results[] = "IndexNow: 配置不完整"; } if (!empty($results) && function_exists('chuang_bing_add_log')) { $db = Database::getInstance(); $sql = "SELECT title FROM " . DB_PREFIX . "blog WHERE gid = {$post_id}"; $row = $db->once_fetch_array($sql); $title = $row ? $row['title'] : ''; $logResult = '批量生成自动推送 - ' . implode('; ', $results); chuang_bing_add_log($title, $url, $logResult); $this->logDebug("已记录推送日志到必应插件"); } $finalResult = implode('; ', $results); $this->logDebug("推送完成: {$finalResult}"); return true; } public function init() { if ($this->_inited) return; $this->_inited = true; if ($this->_mbstringAvailable) { mb_internal_encoding('UTF-8'); } $db = Database::getInstance(); $db->query("SET NAMES utf8mb4"); addAction('adm_menu', function() { echo ''; }); if (isset($_GET['plugin']) && $_GET['plugin'] == self::ID && isset($_GET['batch_action'])) { $this->handleBatchAction(); } if (isset($_GET['plugin']) && $_GET['plugin'] == self::ID && isset($_GET['update_action'])) { $this->handleUpdateBatchAction(); } if (isset($_GET['ai_cron_old']) && $_GET['ai_cron_old'] == '1') { $this->handleCron(); exit; } if (isset($_GET['ai_cron_update']) && $_GET['ai_cron_update'] == '1') { $this->handleUpdateCron(); exit; } if (isset($_GET['plugin']) && $_GET['plugin'] == self::ID) { if (!function_exists('plugin_setting_view')) { require_once __DIR__ . 
'/chuang_ailoot_setting.php'; } } addAction('adm_head', [$this, 'hookHeader']); } public function hookHeader() { echo ''; } private function handleBatchAction() { $action = Input::getStrVar('batch_action'); $task_index = Input::getIntVar('task_index', -1); switch ($action) { case 'run_task': $this->processSingleTask($task_index); break; case 'run_all': $this->processAllTasks(); break; case 'delete_task': $this->deleteTask($task_index); break; case 'retry_failure': $this->retryFailure($task_index); break; case 'batch_retry_failures': $this->batchRetryFailures(); break; case 'batch_delete_failures': $this->batchDeleteFailures(); break; case 'clear_completed': $this->clearCompletedTasks(); break; case 'clear_pending': $this->clearPendingTasks(); break; case 'clear_failures': $this->clearFailures(); break; case 'add_example_tasks': $this->addExampleTasks(); break; case 'force_pending': $this->forceAllToPending(); break; case 'reset_processing_to_pending': $this->resetProcessingToPending(); break; case 'diagnose_tasks': $this->diagnoseTasks(); break; case 'pause_cron': $this->pauseCron(); break; case 'resume_cron': $this->resumeCron(); break; case 'approve_submission': $this->approveSubmission($task_index); break; case 'reject_submission': $this->rejectSubmission($task_index); break; case 'run_update_task': $this->processSingleUpdateTask($task_index); break; case 'run_all_update_tasks': $this->processAllUpdateTasks(); break; case 'delete_update_task': $this->deleteUpdateTask($task_index); break; case 'clear_completed_update': $this->clearCompletedUpdateTasks(); break; case 'clear_all_update': $this->clearAllUpdateTasks(); break; case 'retry_update_task': $this->retryUpdateTask($task_index); break; } header('Location: ' . BLOG_URL . 'admin/plugin.php?plugin=' . self::ID . '&tab=batch&t=' . 
time()); exit; } public function addBatchUrls($urls_text) { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('batch_tasks') ?: []; $urls = array_filter(array_map('trim', explode("\n", $urls_text))); $added = 0; foreach ($urls as $url) { if (empty($url) || !filter_var($url, FILTER_VALIDATE_URL)) continue; $exists = false; foreach ($tasks as $task) if ($task['url'] === $url) { $exists = true; break; } if (!$exists) { $tasks[] = [ 'url' => $url, 'status' => 'pending', 'created_at' => time(), 'updated_at' => time(), 'post_id' => 0, 'detailed' => 1, 'retry_count' => 0 ]; $added++; } } if ($added > 0) $storage->setValue('batch_tasks', $tasks, 'array'); return $added; } public function getBatchStats() { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('batch_tasks') ?: []; $failures = $storage->getValue('batch_failures') ?: []; $stats = ['total'=>0, 'pending'=>0, 'processing'=>0, 'completed'=>0, 'failed'=>0, 'exists'=>0, 'failures_count'=>count($failures)]; foreach ($tasks as $task) { $stats['total']++; $s = $task['status'] ?? 
''; if (isset($stats[$s])) $stats[$s]++; } return $stats; } public function getBatchTasks() { $storage = Storage::getInstance(self::ID); return $storage->getValue('batch_tasks') ?: []; } public function getFailures() { $storage = Storage::getInstance(self::ID); return $storage->getValue('batch_failures') ?: []; } public function getCronLog() { $storage = Storage::getInstance(self::ID); return $storage->getValue('cron_log') ?: []; } public function addPendingSubmission($url) { if (empty($url) || !filter_var($url, FILTER_VALIDATE_URL)) return false; $storage = Storage::getInstance(self::ID); $pending = $storage->getValue('pending_submissions') ?: []; foreach ($pending as $item) { if ($item['url'] === $url) return false; } $pending[] = [ 'url' => $url, 'submit_time' => time(), 'status' => 'pending' ]; $storage->setValue('pending_submissions', $pending, 'array'); return true; } public function getPendingSubmissions() { $storage = Storage::getInstance(self::ID); return $storage->getValue('pending_submissions') ?: []; } public function approveSubmission($index) { $storage = Storage::getInstance(self::ID); $pending = $storage->getValue('pending_submissions') ?: []; if (isset($pending[$index])) { $url = $pending[$index]['url']; $this->addBatchUrls($url); unset($pending[$index]); $pending = array_values($pending); $storage->setValue('pending_submissions', $pending, 'array'); $_SESSION['chuang_ailoot_message'] = "已通过审核,URL已加入任务队列:{$url}"; } } public function rejectSubmission($index) { $storage = Storage::getInstance(self::ID); $pending = $storage->getValue('pending_submissions') ?: []; if (isset($pending[$index])) { $url = $pending[$index]['url']; unset($pending[$index]); $pending = array_values($pending); $storage->setValue('pending_submissions', $pending, 'array'); $_SESSION['chuang_ailoot_message'] = "已拒绝提交:{$url}"; } } public function handleCron() { header('Content-Type: application/json; charset=utf-8'); set_time_limit(600); $token = isset($_GET['token']) ? 
trim($_GET['token']) : '';
        if (!$this->verifyCronToken($token)) exit(json_encode(['success'=>false,'message'=>'Token验证失败']));
        $storage = Storage::getInstance(self::ID);
        $paused = $storage->getValue('cron_paused', false);
        if ($paused) {
            $this->logDebug('定时任务已暂停,跳过执行');
            exit(json_encode(['success'=>false,'message'=>'定时任务已暂停']));
        }
        $start = time();
        $max_execution_time = 180;
        $success_count = 0;
        $fail_count = 0;
        $results = [];
        $this->logDebug('定时任务开始(连续处理模式)');
        while (true) {
            // Stop picking up new tasks once the time budget is used up.
            if (time() - $start >= $max_execution_time) {
                $this->logDebug("达到最大执行时间,停止处理");
                break;
            }
            $result = $this->processBatchTask();
            if (!$result['success']) {
                // An empty queue ends the run; any other failure is counted and skipped.
                if ($result['message'] === '无待处理任务') {
                    $this->logDebug("无待处理任务,结束循环");
                    break;
                }
                $fail_count++;
                $results[] = $result;
                continue;
            }
            $success_count++;
            $results[] = $result;
            usleep(500000);
        }
        $total_time = time() - $start;
        $this->logDebug("定时任务结束,耗时:{$total_time}秒,成功:{$success_count},失败:{$fail_count}");
        $summary = [
            'success' => true,
            'message' => "定时任务执行完毕,成功:{$success_count},失败:{$fail_count},耗时:{$total_time}秒",
            'success_count' => $success_count,
            'fail_count' => $fail_count,
            'results' => $results,
        ];
        $this->logCronExecution($summary);
        echo json_encode($summary);
        exit;
    }

    public function generateCronToken() {
        return md5(Option::get('site_key') . self::ID . '_cron');
    }

    private function verifyCronToken($token) {
        // hash_equals() compares in constant time, so the check does not leak how many
        // leading characters of a guessed token were correct.
        return hash_equals($this->generateCronToken(), (string) $token);
    }

    private function logCronExecution($result) {
        $storage = Storage::getInstance(self::ID);
        $log = $storage->getValue('cron_log') ?: [];
        array_unshift($log, [
            'time' => time(),
            'success' => $result['success'],
            'message' => $result['message'] ?? '',
            'success_count' => $result['success_count'] ?? 0,
            'fail_count' => $result['fail_count'] ??
0, ]); $log = array_slice($log, 0, 50); $storage->setValue('cron_log', $log, 'array'); } public function processBatchTask() { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('batch_tasks') ?: []; $index = null; foreach ($tasks as $i => $task) if ($task['status'] === 'pending') { $index = $i; break; } if ($index === null) { foreach ($tasks as $i => $task) if ($task['status'] === 'failed' && ($task['retry_count']??0) < 3) { $index = $i; break; } } if ($index === null) return ['success'=>false, 'message'=>'无待处理任务', 'details'=>json_encode($this->getBatchStats())]; $url = $tasks[$index]['url']; if ($this->urlExistsInNav($url)) { $tasks[$index]['status'] = 'exists'; $tasks[$index]['error'] = 'URL已收录(忽略www和结尾斜杠后匹配),跳过生成'; $tasks[$index]['updated_at'] = time(); $storage->setValue('batch_tasks', $tasks, 'array'); $this->logDebug("任务跳过,URL已存在(规范化后匹配): {$url}"); return ['success'=>false, 'message'=>'URL已收录,跳过生成', 'details'=>"URL: {$url}"]; } $tasks[$index]['status'] = 'processing'; $tasks[$index]['updated_at'] = time(); $tasks[$index]['retry_count'] = ($tasks[$index]['retry_count']??0) + 1; $storage->setValue('batch_tasks', $tasks, 'array'); $success = false; $post_id = null; $error_msg = ''; try { $detailed = $tasks[$index]['detailed'] ?? 1; $this->logDebug("开始处理任务 #{$index} : {$url}"); $result = $this->generateArticleFromUrl($url, $detailed); if (!$result['success']) throw new Exception($result['error'] ?? '生成失败'); $rawAlias = $result['raw_alias'] ?? ''; if (!empty($rawAlias)) { $aliasCheck = $this->ensureUniqueAliasWithDomainCheck($rawAlias, $url); if ($aliasCheck['error'] !== null) { throw new Exception($aliasCheck['error']); } $result['alias'] = $aliasCheck['alias']; } $post_id = $this->saveArticle($result); if (!$post_id) throw new Exception('文章保存失败'); $success = true; $this->logDebug("任务完成,文章ID: {$post_id}"); } catch (Exception $e) { $error_msg = $e->getMessage(); $this->logDebug("任务失败: " . 
$error_msg); } $tasks = $storage->getValue('batch_tasks') ?: []; if (isset($tasks[$index])) { if ($success) { $tasks[$index]['status'] = 'completed'; $tasks[$index]['post_id'] = $post_id; $tasks[$index]['error'] = ''; } else { if (strpos($error_msg, '别名已存在') !== false) { $tasks[$index]['retry_count'] = 3; } $tasks[$index]['status'] = 'failed'; $tasks[$index]['error'] = $error_msg; $failures = $storage->getValue('batch_failures') ?: []; $failures[] = [ 'url' => $tasks[$index]['url'], 'error' => $error_msg, 'failed_at' => time(), 'detailed' => $tasks[$index]['detailed'] ?? 1, 'retry_count' => $tasks[$index]['retry_count'] ?? 0 ]; $storage->setValue('batch_failures', $failures, 'array'); } $tasks[$index]['updated_at'] = time(); $storage->setValue('batch_tasks', $tasks, 'array'); $this->logDebug("任务状态已更新: index={$index}, status={$tasks[$index]['status']}"); } if ($success) { return ['success'=>true, 'message'=>"文章生成成功 ID: {$post_id}", 'post_id'=>$post_id, 'details'=>"URL: {$url}"]; } else { return ['success'=>false, 'message'=>$error_msg, 'details'=>"URL: {$url}"]; } } private function saveArticle($aiData) { $db = Database::getInstance(); $db->query("SET NAMES utf8mb4"); $aiData['title'] = self::cleanUtf8($aiData['title']); $aiData['content'] = self::cleanUtf8($aiData['content']); $aiData['excerpt'] = self::cleanUtf8($aiData['excerpt'] ?? ''); $aiData['tags'] = self::cleanUtf8($aiData['tags'] ?? ''); $excerpt = ''; if (!empty($aiData['excerpt'])) { $excerpt = strip_tags($aiData['excerpt']); $excerpt = html_entity_decode($excerpt, ENT_QUOTES, 'UTF-8'); $excerpt = mb_substr($excerpt, 0, 15, 'UTF-8'); } $tags = ''; if (!empty($aiData['tags'])) { $tags = trim($aiData['tags'], ', '); $tags = preg_replace('/,+/', ',', $tags); $tags = trim($tags, ', '); } if (!empty($aiData['alias'])) { $alias = $aiData['alias']; } else { $alias = $this->generateAlias($aiData['title']); if (!preg_match('/^[a-zA-Z0-9_-]+$/', $alias)) { $alias = ''; } if (empty($alias)) { $alias = 'post-' . 
time(); } } $logData = [ 'title' => $aiData['title'], 'content' => $aiData['content'], 'excerpt' => $excerpt, 'author' => 1, 'date' => time(), 'checked' => 'y', 'allow_remark' => 'y', 'hide' => 'n', 'sortid' => 0, 'alias' => $alias, ]; $log_model = new Log_Model(); $post_id = $log_model->addlog($logData); if (!$post_id) return false; if (!empty($aiData['category_ids']) && is_array($aiData['category_ids'])) { $cat_ids = array_map('intval', $aiData['category_ids']); $cat_ids = array_unique($cat_ids); $this->saveMultiCategories($post_id, $cat_ids); } if (!empty($tags)) { $tag_model = new Tag_Model(); $tag_model->addTag($tags, $post_id); } if (!empty($aiData['cover_url'])) { $this->setPostCover($post_id, $aiData['cover_url']); } if (!empty($aiData['seo_title']) || !empty($aiData['seo_description'])) { $this->saveTdk($post_id, $aiData['seo_title'] ?? '', $aiData['seo_description'] ?? '', ''); } $this->saveNavFields($post_id, $aiData); if (class_exists('Cache')) { try { Cache::getInstance()->updateCache(); $this->logDebug("全站缓存已刷新"); } catch (Exception $e) { $this->logDebug("缓存刷新失败: " . $e->getMessage()); } } $this->pushToBing($post_id); return $post_id; } private function saveNavFields($post_id, $aiData) { $db = Database::getInstance(); $navTable = DB_PREFIX . 'chuang_nav'; $tableCheck = $db->query("SHOW TABLES LIKE '{$navTable}'"); if ($db->num_rows($tableCheck) == 0) { $this->logDebug("导航表 {$navTable} 不存在,无法保存导航字段"); return; } $navData = [ 'chuang_url' => $aiData['url'] ?? '', 'is_ai' => $aiData['nav_fields']['is_ai'] ?? 'unknown', 'is_featured' => 'no', 'location' => $aiData['nav_fields']['location'] ?? '', 'update_time' => time(), ]; $this->logDebug("准备保存导航字段: " . 
json_encode($navData, JSON_UNESCAPED_UNICODE)); if (class_exists('ChuangNavClass')) { try { $nav = ChuangNavClass::getInstance(); $nav->set_data($post_id, $navData); $this->logDebug("通过 ChuangNavClass 保存导航字段成功,文章ID: {$post_id}"); return; } catch (Exception $e) { $this->logDebug("通过 ChuangNavClass 保存导航字段失败: " . $e->getMessage() . ",将尝试直接数据库操作"); } } $value = serialize($navData); $value = $db->escape_string($value); $checkSql = "SELECT id FROM {$navTable} WHERE id = {$post_id}"; $checkRes = $db->query($checkSql); if ($db->num_rows($checkRes) > 0) { $sql = "UPDATE {$navTable} SET `value` = '{$value}', `update_time` = " . time() . " WHERE id = {$post_id}"; } else { $sql = "INSERT INTO {$navTable} (id, `value`, `update_time`) VALUES ({$post_id}, '{$value}', " . time() . ")"; } if ($db->query($sql)) { $this->logDebug("直接数据库操作保存导航字段成功,文章ID: {$post_id}"); } else { $this->logDebug("直接数据库操作保存导航字段失败: " . $db->error()); } } private function saveMultiCategories($post_id, $category_ids) { try { $db = Database::getInstance(); $table = DB_PREFIX . 'multi_category'; $check = $db->query("SHOW TABLES LIKE '{$table}'"); if ($db->num_rows($check) == 0) return false; $db->query("DELETE FROM {$table} WHERE gid = {$post_id}"); $valid_ids = array_filter(array_map('intval', $category_ids)); foreach ($valid_ids as $cid) { if ($cid > 0) { $db->query("INSERT INTO {$table} (gid, sid) VALUES ({$post_id}, {$cid})"); } } if (!empty($valid_ids)) { $main_cat = intval($valid_ids[0]); $db->query("UPDATE " . DB_PREFIX . "blog SET sortid = {$main_cat} WHERE gid = {$post_id}"); } return true; } catch (Exception $e) { $this->logDebug("多分类保存失败: " . $e->getMessage()); return false; } } private function saveTdk($post_id, $title, $description, $keywords = '') { try { $db = Database::getInstance(); $table = DB_PREFIX . 
'chuang_tdk_data'; $check = $db->query("SHOW TABLES LIKE '{$table}'"); if ($db->num_rows($check) == 0) return false; $title = $db->escape_string(self::cleanUtf8($title)); $description = $db->escape_string(self::cleanUtf8($description)); $keywords = $db->escape_string(self::cleanUtf8($keywords)); $sql = "INSERT INTO {$table} (gid, t, d, k) VALUES ({$post_id}, '{$title}', '{$description}', '{$keywords}') ON DUPLICATE KEY UPDATE t='{$title}', d='{$description}', k='{$keywords}'"; $db->query($sql); return true; } catch (Exception $e) { $this->logDebug("TDK保存失败: " . $e->getMessage()); return false; } } private function setPostCover($post_id, $image_url) { if (empty($image_url)) return false; try { if (strpos($image_url, 'http') !== 0) { $cover_url = $image_url; } else { $dir_name = gmdate('Ym'); $upload_path = Option::UPLOADFILE_FULL_PATH . $dir_name . '/'; if (!is_dir($upload_path)) mkdir($upload_path, 0755, true); $path_info = pathinfo(parse_url($image_url, PHP_URL_PATH)); $ext = isset($path_info['extension']) ? preg_replace('/[^a-zA-Z0-9]/', '', $path_info['extension']) : 'jpg'; if (!in_array($ext, ['jpg','jpeg','png','gif','webp','ico'])) $ext = 'jpg'; $filename = substr(md5($image_url . time()), 0, 12) . '_' . time() . '.' . $ext; $file_path = $upload_path . $filename; $ch = curl_init($image_url); curl_setopt_array($ch, [ CURLOPT_RETURNTRANSFER => true, CURLOPT_FOLLOWLOCATION => true, CURLOPT_TIMEOUT => 30, CURLOPT_USERAGENT => self::USER_AGENT, CURLOPT_SSL_VERIFYPEER => false ]); $img = curl_exec($ch); $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE); curl_close($ch); if ($http_code != 200 || empty($img)) return false; if (!file_put_contents($file_path, $img)) return false; $cover_url = Option::UPLOADFILE_PATH . $dir_name . '/' . $filename; } $db = Database::getInstance(); $cover_url = $db->escape_string($cover_url); $db->query("UPDATE " . DB_PREFIX . 
"blog SET cover = '{$cover_url}' WHERE gid = {$post_id}"); return true; } catch (Exception $e) { $this->logDebug("设置封面失败: " . $e->getMessage()); return false; } } // ========== 网站截图下载功能(含水印,支持PNG/WebP/JPG,缩放比例 1/3,透明背景修复) ========== private function downloadScreenshot($url, $width = 1200, $height = 800, $format = 'webp') { if (!self::ENABLE_SCREENSHOT) { $this->logDebug("截图功能已禁用,跳过"); return ''; } $this->logDebug("开始下载网站截图: {$url}"); $apiUrl = self::SCREENSHOT_API_URL . '?' . http_build_query([ 'url' => $url, 'format' => $format, 'width' => $width, 'height' => $height, ]); try { $imageData = $this->downloadScreenshotData($apiUrl); if (empty($imageData)) { $this->logDebug("截图 API 返回空数据"); return ''; } if (strlen($imageData) < 1024) { $this->logDebug("截图数据过小,可能下载失败"); return ''; } // 保存临时文件 $tmpFile = tempnam(sys_get_temp_dir(), 'screenshot_') . '.' . $format; if (!file_put_contents($tmpFile, $imageData)) { $this->logDebug("临时截图文件写入失败"); return ''; } // 添加水印 $watermarkedFile = $this->addWatermarkToImage($tmpFile, $format); if (!$watermarkedFile) { $this->logDebug("水印添加失败,将使用原始截图"); $watermarkedFile = $tmpFile; } // 移动到正式目录 $dir = gmdate('Ym'); $fullDir = Option::UPLOADFILE_FULL_PATH . $dir . '/'; if (!is_dir($fullDir)) { mkdir($fullDir, 0755, true); } $filename = 'screenshot_' . md5($url . time()) . '.' . $format; $filepath = $fullDir . $filename; if (copy($watermarkedFile, $filepath)) { $localUrl = Option::UPLOADFILE_PATH . $dir . '/' . $filename; $this->logDebug("截图保存成功(已添加水印): {$localUrl}"); @unlink($tmpFile); if ($watermarkedFile !== $tmpFile) @unlink($watermarkedFile); return $localUrl; } else { $this->logDebug("截图文件移动失败"); @unlink($tmpFile); if ($watermarkedFile !== $tmpFile) @unlink($watermarkedFile); return ''; } } catch (Exception $e) { $this->logDebug("截图下载异常: " . 
$e->getMessage()); return ''; } } private function downloadScreenshotData($apiUrl) { $ch = curl_init(); curl_setopt_array($ch, [ CURLOPT_URL => $apiUrl, CURLOPT_RETURNTRANSFER => true, CURLOPT_FOLLOWLOCATION => true, CURLOPT_TIMEOUT => 30, CURLOPT_CONNECTTIMEOUT => 10, CURLOPT_USERAGENT => self::USER_AGENT, CURLOPT_SSL_VERIFYPEER => false, CURLOPT_HTTPHEADER => [ 'Accept: image/webp,image/apng,image/*,*/*;q=0.8', ], ]); $data = curl_exec($ch); $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE); $error = curl_error($ch); curl_close($ch); if ($error) { $this->logDebug("截图 cURL 错误: {$error}"); return ''; } if ($httpCode != 200) { $this->logDebug("截图 API 返回 HTTP {$httpCode}"); return ''; } return $data; } /** * 检测图片格式(通过文件头魔数) */ private function detectImageFormat($data) { $header = substr($data, 0, 12); if (strpos($header, 'RIFF') !== false && strpos($header, 'WEBP') !== false) { return 'webp'; } elseif (strpos($header, "\x89PNG\r\n\x1a\n") !== false) { return 'png'; } elseif (strpos($header, "\xff\xd8\xff") !== false) { return 'jpg'; } return 'png'; // 默认 } /** * 给图片添加水印(右上角,缩放至 1/3,支持PNG透明通道,修复黑色背景问题) */ private function addWatermarkToImage($imagePath, $format) { if (!function_exists('imagecreatetruecolor')) { $this->logDebug("GD库不可用,无法添加水印"); return false; } // 下载水印图片 $watermarkData = $this->downloadWatermark(); if (!$watermarkData) { $this->logDebug("水印图片下载失败"); return false; } // 检测水印图片的实际格式 $extension = $this->detectImageFormat($watermarkData); $this->logDebug("检测到水印图片格式: {$extension}"); // 保存临时水印文件 $watermarkTmp = tempnam(sys_get_temp_dir(), 'watermark_') . '.' . 
$extension; if (!file_put_contents($watermarkTmp, $watermarkData)) { $this->logDebug("临时水印文件写入失败"); return false; } try { // 加载原始图片 switch (strtolower($format)) { case 'png': $srcImage = @imagecreatefrompng($imagePath); break; case 'jpg': case 'jpeg': $srcImage = @imagecreatefromjpeg($imagePath); break; case 'webp': $srcImage = @imagecreatefromwebp($imagePath); break; default: $srcImage = @imagecreatefromstring(file_get_contents($imagePath)); break; } if (!$srcImage) { $this->logDebug("无法加载原始截图"); @unlink($watermarkTmp); return false; } // 根据实际格式加载水印图片 $watermarkImage = null; switch ($extension) { case 'webp': $watermarkImage = @imagecreatefromwebp($watermarkTmp); break; case 'png': $watermarkImage = @imagecreatefrompng($watermarkTmp); break; case 'jpg': $watermarkImage = @imagecreatefromjpeg($watermarkTmp); break; default: $watermarkImage = @imagecreatefromstring($watermarkData); break; } if (!$watermarkImage) { $this->logDebug("无法加载水印图片,格式: {$extension}"); imagedestroy($srcImage); @unlink($watermarkTmp); return false; } // 保留水印图片的透明度 imagealphablending($watermarkImage, false); imagesavealpha($watermarkImage, true); // 获取尺寸 $srcWidth = imagesx($srcImage); $srcHeight = imagesy($srcImage); $wmWidth = imagesx($watermarkImage); $wmHeight = imagesy($watermarkImage); // 强制将水印缩放到原始尺寸的三分之一 $targetScale = self::WATERMARK_SCALE; $newWmWidth = (int)($wmWidth * $targetScale); $newWmHeight = (int)($wmHeight * $targetScale); // ========== 关键修复:创建透明背景的画布 ========== $resizedWatermark = imagecreatetruecolor($newWmWidth, $newWmHeight); // 关闭默认的 alpha 混合,以便独立设置透明度 imagealphablending($resizedWatermark, false); // 保存完整的 alpha 通道信息 imagesavealpha($resizedWatermark, true); // 用完全透明的颜色填充整个画布(关键一步!) 
$transparent = imagecolorallocatealpha($resizedWatermark, 0, 0, 0, 127); imagefill($resizedWatermark, 0, 0, $transparent); // 将原始水印缩放并复制到透明画布上 imagecopyresampled($resizedWatermark, $watermarkImage, 0, 0, 0, 0, $newWmWidth, $newWmHeight, $wmWidth, $wmHeight); imagedestroy($watermarkImage); $watermarkImage = $resizedWatermark; $wmWidth = $newWmWidth; $wmHeight = $newWmHeight; // 如果缩放后仍大于截图宽度的30%,再按比例缩小 if ($wmWidth > $srcWidth * 0.3) { $scale = ($srcWidth * 0.3) / $wmWidth; $newWmWidth2 = (int)($wmWidth * $scale); $newWmHeight2 = (int)($wmHeight * $scale); $resizedWatermark2 = imagecreatetruecolor($newWmWidth2, $newWmHeight2); // 同样填充透明背景 imagealphablending($resizedWatermark2, false); imagesavealpha($resizedWatermark2, true); $transparent2 = imagecolorallocatealpha($resizedWatermark2, 0, 0, 0, 127); imagefill($resizedWatermark2, 0, 0, $transparent2); imagecopyresampled($resizedWatermark2, $watermarkImage, 0, 0, 0, 0, $newWmWidth2, $newWmHeight2, $wmWidth, $wmHeight); imagedestroy($watermarkImage); $watermarkImage = $resizedWatermark2; $wmWidth = $newWmWidth2; $wmHeight = $newWmHeight2; } // 计算水印位置(右上角,留边距) $destX = $srcWidth - $wmWidth - self::WATERMARK_MARGIN; $destY = self::WATERMARK_MARGIN; // 启用 Alpha 混合 imagealphablending($srcImage, true); imagesavealpha($srcImage, true); // 复制水印到原图 $opacity = self::WATERMARK_OPACITY; if ($opacity < 100) { imagecopymerge($srcImage, $watermarkImage, $destX, $destY, 0, 0, $wmWidth, $wmHeight, $opacity); } else { imagecopy($srcImage, $watermarkImage, $destX, $destY, 0, 0, $wmWidth, $wmHeight); } // 保存带水印的图片 $outputPath = tempnam(sys_get_temp_dir(), 'watermarked_') . '.' . 
$format; switch (strtolower($format)) { case 'png': imagepng($srcImage, $outputPath); break; case 'jpg': case 'jpeg': imagejpeg($srcImage, $outputPath, 90); break; case 'webp': imagewebp($srcImage, $outputPath, 90); break; default: imagewebp($srcImage, $outputPath, 90); break; } imagedestroy($srcImage); imagedestroy($watermarkImage); @unlink($watermarkTmp); return $outputPath; } catch (Exception $e) { $this->logDebug("水印添加异常: " . $e->getMessage()); @unlink($watermarkTmp); return false; } } /** * 下载水印图片数据(增强版,添加 Referer 防盗链处理) */ private function downloadWatermark() { $ch = curl_init(self::WATERMARK_URL); curl_setopt_array($ch, [ CURLOPT_RETURNTRANSFER => true, CURLOPT_FOLLOWLOCATION => true, CURLOPT_TIMEOUT => 15, CURLOPT_USERAGENT => self::USER_AGENT, CURLOPT_SSL_VERIFYPEER => false, CURLOPT_REFERER => 'https://cxgn.cn/', CURLOPT_HTTPHEADER => [ 'Accept: image/webp,image/apng,image/*,*/*;q=0.8', 'Accept-Language: zh-CN,zh;q=0.9', ], ]); $data = curl_exec($ch); $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE); $error = curl_error($ch); curl_close($ch); if ($httpCode != 200 || empty($data)) { $this->logDebug("水印图片下载失败,HTTP: {$httpCode}, 错误: {$error}"); return false; } return $data; } private function insertScreenshotIntoContent($content, $screenshotUrl) { if (empty($screenshotUrl)) { return $content; } $needle = '### 工具介绍'; $pos = mb_strpos($content, $needle); if ($pos === false) { $this->logDebug("未找到「### 工具介绍」标题,截图将插入到内容末尾"); return $content . "\n\n" . $this->buildScreenshotHtml($screenshotUrl); } $nextSectionPos = mb_strpos($content, "\n###", $pos + mb_strlen($needle)); if ($nextSectionPos === false) { $insertPos = mb_strlen($content); } else { $insertPos = $nextSectionPos; } $screenshotHtml = "\n\n" . $this->buildScreenshotHtml($screenshotUrl) . "\n\n"; $newContent = mb_substr($content, 0, $insertPos) . $screenshotHtml . 
mb_substr($content, $insertPos); $this->logDebug("截图已插入到「工具介绍」之后"); return $newContent; } private function buildScreenshotHtml($screenshotUrl) { return '

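<figure><img src="' . htmlspecialchars($screenshotUrl, ENT_QUOTES, 'UTF-8') . '" alt="网站截图"><figcaption>网站截图</figcaption></figure>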

'; } private function generateArticleFromUrl($url, $detailed = true, $skip_icon = false, $existing_post_id = 0, $retry_mode = false) { try { $websiteData = $this->fetchWebContent($url); if (isset($websiteData['effective_url']) && !$this->isSameDomain($url, $websiteData['effective_url'])) { throw new Exception('目标URL跳转到其他域名,跳过生成(原域名:' . parse_url($url, PHP_URL_HOST) . ',跳转后:' . parse_url($websiteData['effective_url'], PHP_URL_HOST) . ')'); } $invalidSitePatterns = [ 'domain is parked', 'domain for sale', 'buy this domain', 'under construction', 'site can’t be reached', '暂时无法访问', '域名出售', '该域名已过期', 'this domain is expired', 'parked page', 'sedo parking', ]; $htmlLower = strtolower($websiteData['html']); foreach ($invalidSitePatterns as $pattern) { if (strpos($htmlLower, strtolower($pattern)) !== false) { throw new Exception('目标网站已失效或为停放页面,跳过处理'); } } $contentLength = mb_strlen($websiteData['content'] ?? '', 'UTF-8'); $isSimpleStrategy = isset($websiteData['strategy']) && $websiteData['strategy'] === 'simple'; if (!$isSimpleStrategy && $contentLength < 30) { if (!$retry_mode) { throw new Exception('抓取到的网页正文过短(' . $contentLength . '字),无法生成有效内容'); } $this->logDebug("重试模式:内容过短但继续生成(当前长度: {$contentLength}字)"); } if ($isSimpleStrategy) { $this->logDebug("使用simple策略,跳过正文长度检查(当前长度: {$contentLength}字)", "抓取策略"); } if ($retry_mode && $contentLength < 100) { $this->logDebug("重试模式:内容较少,尝试提取更多元信息..."); $extraInfo = $this->extractExtraMetaInfo($url, $websiteData['html']); if (!empty($extraInfo)) { $websiteData = array_merge($websiteData, $extraInfo); $this->logDebug("重试模式:提取到额外信息 - " . json_encode(array_keys($extraInfo))); } } $cover_url = ''; if (!$skip_icon) { try { $cover_url = $this->downloadFavicon($url, $websiteData['html']); if (empty($cover_url)) { $this->logDebug("图标下载失败,但任务将继续(无封面)"); } } catch (Exception $e) { $this->logDebug("图标下载异常,任务将继续(无封面): " . 
$e->getMessage()); $cover_url = ''; } } else { if ($existing_post_id > 0) { $db = Database::getInstance(); $sql = "SELECT cover FROM " . DB_PREFIX . "blog WHERE gid = {$existing_post_id}"; $row = $db->once_fetch_array($sql); if (!empty($row['cover'])) { $cover_url = $row['cover']; $this->logDebug("更新任务:保留原封面 {$cover_url}"); } } $this->logDebug("更新任务:跳过图标下载"); } // ========== 下载网站截图(含水印) ========== $screenshot_url = ''; if (self::ENABLE_SCREENSHOT) { try { $screenshot_url = $this->downloadScreenshot($url, 1200, 800, 'webp'); if (empty($screenshot_url)) { $this->logDebug("截图下载失败,文章将不含截图"); } } catch (Exception $e) { $this->logDebug("截图下载异常: " . $e->getMessage()); } } if (!class_exists('AI')) throw new Exception('AI功能未配置'); $ai_config = AI::getCurrentModelInfo(); if (empty($ai_config['api_key'])) throw new Exception('请先在系统设置中配置AI'); $prompt = $detailed ? $this->generateOriginalPrompt($url, $websiteData, $retry_mode) : $this->generateSimplePrompt($url, $websiteData, $retry_mode); $content = $this->callAI($prompt); if (empty($content)) throw new Exception('AI返回内容为空'); if (!$this->isAIContentComplete($content, $detailed)) { $this->logDebug("AI生成内容不完整,尝试重试一次"); sleep(2); $content = $this->callAI($prompt); if (empty($content) || !$this->isAIContentComplete($content, $detailed)) { throw new Exception('AI生成内容不完整,可能由于超时或限制,已跳过发布'); } } $processed = $this->processAIResponse($content, $websiteData); $coreKeywords = $this->extractCoreKeywordsFromPage($url, $websiteData); if (!empty($coreKeywords)) { $combinedText = $processed['title'] . ' ' . $processed['content'] . ' ' . ($processed['seo_title'] ?? '') . ' ' . ($processed['seo_description'] ?? ''); $matchCount = 0; foreach ($coreKeywords as $keyword) { if (mb_strpos($combinedText, $keyword) !== false) { $matchCount++; } } $threshold = ceil(count($coreKeywords) * 0.3); if ($matchCount < $threshold) { $this->logDebug("警告:生成的文章可能偏离页面主题,仅匹配到 {$matchCount}/" . count($coreKeywords) . 
" 个核心关键词"); } } if (!$this->isValidArticleContent($processed['content'])) { throw new Exception('AI生成的内容无效(包含过多占位信息或无实质内容)'); } // 插入截图到内容中 if (!empty($screenshot_url)) { $processed['content'] = $this->insertScreenshotIntoContent($processed['content'], $screenshot_url); } $rawAlias = $this->generateAlias($processed['title']); $this->logDebug("生成文章数据,URL: {$url}"); return [ 'success' => true, 'url' => $url, 'title' => self::cleanUtf8($processed['title']), 'content' => self::cleanUtf8($processed['content']), 'excerpt' => self::cleanUtf8($processed['excerpt']), 'tags' => self::cleanUtf8($processed['tags']), 'category_ids' => $processed['category_ids'], 'cover_url' => $cover_url, 'screenshot_url' => $screenshot_url, 'alias' => $rawAlias, 'raw_alias' => $rawAlias, 'seo_title' => self::cleanUtf8($processed['seo_title'] ?? ''), 'seo_description' => self::cleanUtf8($processed['seo_description'] ?? ''), 'nav_fields' => $processed['nav_fields'], ]; } catch (Exception $e) { return ['success' => false, 'error' => $e->getMessage()]; } } private function isAIContentComplete($content, $detailed = true) { if (empty($content)) return false; if ($detailed) { $requiredSections = [ '### 工具介绍', '### 核心功能', '### 使用场景', '### 适用人群', '### 独特优势', '### 实测体验' ]; $missing = []; foreach ($requiredSections as $section) { if (strpos($content, $section) === false) { $missing[] = $section; } } if (count($missing) > 2) { $this->logDebug("AI内容缺失关键章节: " . implode(', ', $missing)); return false; } $bodyStart = strpos($content, '### 工具介绍'); if ($bodyStart !== false) { $body = substr($content, $bodyStart); } else { $body = $content; } if (mb_strlen($body, 'UTF-8') < 500) { $this->logDebug("AI生成正文过短: " . mb_strlen($body) . 
" 字符"); return false; }
        } else {
            if (mb_strlen($content, 'UTF-8') < 100) {
                return false;
            }
        }
        return true;
    }

    // Collect up to five candidate keywords from the page title and description.
    private function extractCoreKeywordsFromPage($url, $websiteData) {
        $keywords = [];
        if (!empty($websiteData['title'])) {
            // Split on whitespace plus ASCII and full-width punctuation.
            $titleWords = preg_split('/[\s,,.。、::]+/u', $websiteData['title'], -1, PREG_SPLIT_NO_EMPTY);
            foreach ($titleWords as $word) {
                if (mb_strlen($word) > 2 && !in_array($word, $keywords)) {
                    $keywords[] = $word;
                }
            }
        }
        if (!empty($websiteData['desc'])) {
            $descWords = preg_split('/[\s,,.。、::]+/u', $websiteData['desc'], -1, PREG_SPLIT_NO_EMPTY);
            foreach ($descWords as $word) {
                if (mb_strlen($word) > 2 && !in_array($word, $keywords)) {
                    $keywords[] = $word;
                }
            }
        }
        return array_slice($keywords, 0, 5);
    }

    // Compare two URLs by host, ignoring case and a leading "www.".
    private function isSameDomain($url1, $url2) {
        $host1 = parse_url($url1, PHP_URL_HOST);
        $host2 = parse_url($url2, PHP_URL_HOST);
        if (!$host1 || !$host2) return false;
        $host1 = preg_replace('/^www\./i', '', $host1);
        $host2 = preg_replace('/^www\./i', '', $host2);
        return strtolower($host1) === strtolower($host2);
    }

    // Pull Open Graph, Twitter card, and heading metadata out of the raw HTML.
    private function extractExtraMetaInfo($url, $html) {
        $extra = [];
        if (preg_match('/<meta[^>]*property=["\']og:title["\'][^>]*content=["\']([^"\']+)["\'][^>]*>/i', $html, $m)) {
            $extra['og_title'] = $m[1];
        } elseif (preg_match('/<meta[^>]*content=["\']([^"\']+)["\'][^>]*property=["\']og:title["\'][^>]*>/i', $html, $m)) {
            $extra['og_title'] = $m[1];
        }
        if (preg_match('/<meta[^>]*property=["\']og:description["\'][^>]*content=["\']([^"\']+)["\'][^>]*>/i', $html, $m)) {
            $extra['og_desc'] = $m[1];
        } elseif (preg_match('/<meta[^>]*content=["\']([^"\']+)["\'][^>]*property=["\']og:description["\'][^>]*>/i', $html, $m)) {
            $extra['og_desc'] = $m[1];
        }
        if (preg_match('/<meta[^>]*name=["\']twitter:title["\'][^>]*content=["\']([^"\']+)["\'][^>]*>/i', $html, $m)) {
            $extra['twitter_title'] = $m[1];
        }
        if (preg_match('/<meta[^>]*name=["\']twitter:description["\'][^>]*content=["\']([^"\']+)["\'][^>]*>/i', $html, $m)) {
            $extra['twitter_desc'] = $m[1];
        }
        if
(preg_match('/<meta[^>]*name=["\']application-name["\'][^>]*content=["\']([^"\']+)["\'][^>]*>/i', $html, $m)) {
            $extra['app_name'] = $m[1];
        }
        if (preg_match('/<h1[^>]*>(.*?)<\/h1>/is', $html, $m)) {
            $h1 = strip_tags($m[1]);
            $h1 = trim(preg_replace('/\s+/', ' ', $h1));
            if (!empty($h1)) {
                $extra['h1'] = $h1;
            }
        }
        $h2_tags = [];
        if (preg_match_all('/<h2[^>]*>(.*?)<\/h2>/is', $html, $matches)) {
            foreach ($matches[1] as $h2) {
                $h2 = strip_tags($h2);
                $h2 = trim(preg_replace('/\s+/', ' ', $h2));
                if (!empty($h2) && mb_strlen($h2) > 3) {
                    $h2_tags[] = $h2;
                }
            }
        }
        if (!empty($h2_tags)) {
            // Keep only the first three headings.
            $extra['h2_tags'] = implode(' | ', array_slice($h2_tags, 0, 3));
        }
        if (preg_match('/<meta[^>]*property=["\']article:published_time["\'][^>]*content=["\']([^"\']+)["\'][^>]*>/i', $html, $m)) {
            $extra['published_time'] = $m[1];
        }
        return $extra;
    }

    // Reject generated articles that are empty, too short, or dominated by error-page phrases.
    private function isValidArticleContent($content) {
        if (empty($content)) {
            return false;
        }
        $content = trim($content);
        $length = mb_strlen($content, 'UTF-8');
        if ($length < 10) {
            $this->logDebug("内容过短({$length}字符),判定为无效");
            return false;
        }
        $invalidPhrases = ['页面未找到', '404', 'page not found', 'not found'];
        $invalidCount = 0;
        foreach ($invalidPhrases as $phrase) {
            $count = substr_count(mb_strtolower($content), mb_strtolower($phrase));
            $invalidCount += $count;
        }
        if ($invalidCount > 10) {
            $this->logDebug("内容包含过多无效关键词({$invalidCount}次),判定为无效");
            return false;
        }
        $cleanContent = strip_tags($content);
        $cleanContent = preg_replace('/\s+/', ' ', $cleanContent);
        $cleanContent = trim($cleanContent);
        if (mb_strlen($cleanContent) < 5) {
            $this->logDebug("清理后的内容过短(" . mb_strlen($cleanContent) .
"字符),判定为无效"); return false; } $paragraphs = preg_split('/\n+/', $content); $validParagraphs = 0; foreach ($paragraphs as $para) { $para = trim($para); $para = strip_tags($para); $para = preg_replace('/[#*]+\s*/', '', $para); $para = preg_replace('/\s+/', ' ', $para); $para = trim($para); if (mb_strlen($para) > 15) { $validParagraphs++; } } if ($validParagraphs < 1) { $this->logDebug("有效段落过少({$validParagraphs}个),判定为无效"); return false; } return true; } private function fetchWebContent($url) { $methods = ['curlWithRetry', 'curlRobust', 'curlSimple', 'curlAntiBot', 'fileGet']; foreach ($methods as $method) { try { $result = $this->$method($url); if (!empty($result['html']) && strlen($result['html']) > 100) { $html = $this->convertHtmlToUtf8($result['html'], $url); $simple_data = $this->extractSimpleContent($html, $url); if ($simple_data !== null) { $simple_data['html'] = $html; $simple_data['effective_url'] = $result['effective_url']; $this->logDebug("使用唐僧插件风格简单提取成功,URL: {$url}", "抓取策略"); return $simple_data; } $data = $this->extractWebsiteData($html); $data['html'] = $html; $data['effective_url'] = $result['effective_url']; if (mb_strlen($data['content']) < 20) { $this->logDebug("抓取到的内容过短(" . mb_strlen($data['content']) . "字符),URL: {$url}"); $fallback_content = $this->extractFallbackContent($html); if (mb_strlen($fallback_content) > mb_strlen($data['content'])) { $data['content'] = $fallback_content; $this->logDebug("使用备用提取方法,内容长度: " . mb_strlen($fallback_content)); } } return $data; } } catch (Exception $e) { $this->logDebug("{$method} 失败: " . 
$e->getMessage());
                continue;
            }
        }
        throw new Exception('无法获取网页内容,所有抓取方法均失败');
    }

    // ========== Enhanced fetch methods (HTTP/2, full browser emulation, cookie handling) ==========
    private function curlWithRetry($url, $maxRetries = 2) {
        $lastException = null;
        for ($i = 0; $i < $maxRetries; $i++) {
            try {
                if ($i > 0) {
                    $this->logDebug("抓取重试第 {$i} 次,URL: {$url}");
                    sleep(2 * $i);
                }
                return $this->curlRobust($url);
            } catch (Exception $e) {
                $lastException = $e;
                // Anti-bot challenges will not clear on retry, so stop early.
                if (strpos($e->getMessage(), '反爬虫') !== false) {
                    break;
                }
            }
        }
        throw $lastException ?: new Exception('抓取失败,已达最大重试次数');
    }

    private function curlRobust($url) {
        $userAgents = [
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36',
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36',
            'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:124.0) Gecko/20100101 Firefox/124.0',
            'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3 Safari/605.1.15',
        ];
        $userAgent = $userAgents[array_rand($userAgents)];
        $this->logDebug("使用User-Agent: " . substr($userAgent, 0, 60) . "...");
        $parsedUrl = parse_url($url);
        $referer = ($parsedUrl['scheme'] ?? 'https') . '://' . ($parsedUrl['host'] ?? '') . '/';
        $cookieFile = sys_get_temp_dir() . '/chuang_cookie_' . md5($url) .
'.txt'; $ch = curl_init(); curl_setopt_array($ch, [ CURLOPT_URL => $url, CURLOPT_RETURNTRANSFER => true, CURLOPT_FOLLOWLOCATION => true, CURLOPT_MAXREDIRS => 10, CURLOPT_TIMEOUT => 45, CURLOPT_CONNECTTIMEOUT => 20, CURLOPT_USERAGENT => $userAgent, CURLOPT_SSL_VERIFYPEER => false, CURLOPT_SSL_VERIFYHOST => 0, CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_2_0, CURLOPT_ENCODING => 'gzip, deflate, br', CURLOPT_COOKIEFILE => $cookieFile, CURLOPT_COOKIEJAR => $cookieFile, CURLOPT_HTTPHEADER => [ 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8', 'Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-US;q=0.7', 'Accept-Encoding: gzip, deflate, br', 'Cache-Control: no-cache', 'Pragma: no-cache', 'Sec-Ch-Ua: "' . $this->getRandomSecChUa() . '"', 'Sec-Ch-Ua-Mobile: ?0', 'Sec-Ch-Ua-Platform: "Windows"', 'Sec-Fetch-Dest: document', 'Sec-Fetch-Mode: navigate', 'Sec-Fetch-Site: none', 'Sec-Fetch-User: ?1', 'Upgrade-Insecure-Requests: 1', 'DNT: 1', 'Referer: ' . 
$referer, ], CURLOPT_AUTOREFERER => true, CURLOPT_IPRESOLVE => CURL_IPRESOLVE_V4, ]); $html = curl_exec($ch); $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE); $effectiveUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); $error = curl_error($ch); curl_close($ch); @unlink($cookieFile); if ($error) { $this->logDebug("cURL错误: {$error}"); throw new Exception("网络请求失败: {$error}"); } if ($httpCode == 403 || $httpCode == 503) { if ($this->isAntiBotPage($html)) { throw new Exception('网站启用了反爬虫验证(Cloudflare/JS挑战),无法抓取'); } throw new Exception("HTTP {$httpCode} - 访问被拒绝"); } if ($httpCode != 200 && empty($html)) { throw new Exception("HTTP {$httpCode} 且无内容返回"); } if ($httpCode != 200) { $this->logDebug("HTTP {$httpCode} 但返回了内容,继续尝试提取"); } if (strlen($html) < 500 && $this->isAntiBotPage($html)) { throw new Exception('返回内容为反爬虫验证页面'); } return ['html' => $html, 'effective_url' => $effectiveUrl]; } private function getRandomSecChUa() { $brands = [ '"Chromium";v="122", "Not(A:Brand";v="24", "Google Chrome";v="122"', '"Chromium";v="123", "Not(A:Brand";v="8", "Google Chrome";v="123"', '"Chromium";v="121", "Not(A:Brand";v="99", "Google Chrome";v="121"', ]; return $brands[array_rand($brands)]; } private function isAntiBotPage($html) { $anti_bot_indicators = [ 'Just a moment', 'Enable JavaScript', 'Enable cookies', 'Checking your browser', 'Verifying you are human', 'Cloudflare', 'DDoS protection', 'Security check', '请稍候', '正在验证', '需要JavaScript', '需要Cookie', 'Cloudflare Ray ID', 'Checking if the site connection is secure' ]; foreach ($anti_bot_indicators as $indicator) { if (stripos($html, $indicator) !== false) { return true; } } return false; } private function curlSimple($url) { $ch = curl_init(); curl_setopt_array($ch, [ CURLOPT_URL => $url, CURLOPT_RETURNTRANSFER => true, CURLOPT_FOLLOWLOCATION => true, CURLOPT_TIMEOUT => 30, CURLOPT_USERAGENT => self::USER_AGENT, CURLOPT_SSL_VERIFYPEER => false, CURLOPT_HTTPHEADER => [ 'Accept: 
text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
                'Accept-Language: zh-CN,zh;q=0.9,en;q=0.8',
            ]
        ]);
        $html = curl_exec($ch);
        $effective_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
        $http = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        if ($http == 403) {
            throw new Exception("HTTP 403 Forbidden - 服务器拒绝访问,跳过");
        }
        if ($this->isAntiBotPage($html)) {
            $this->logDebug("检测到反爬虫验证页面,URL: {$url}");
        }
        if ($http != 200) {
            if (empty($html)) {
                throw new Exception("HTTP {$http}");
            }
            $this->logDebug("HTTP {$http} 但有内容返回,将继续尝试提取,长度: " . strlen($html), "反爬虫");
        }
        return ['html' => $html, 'effective_url' => $effective_url];
    }

    // Last-resort fetch through the stream wrapper; redirects are not tracked here.
    private function fileGet($url) {
        $ctx = stream_context_create([
            'http' => ['timeout'=>30, 'header'=>"User-Agent: ".self::USER_AGENT."\r\n"],
            'ssl' => ['verify_peer'=>false, 'verify_peer_name'=>false]
        ]);
        $html = @file_get_contents($url, false, $ctx);
        if ($html === false) {
            throw new Exception('file_get_contents失败');
        }
        return ['html' => $html, 'effective_url' => $url];
    }

    private function curlAntiBot($url) {
        $this->logDebug("尝试激进的反爬虫绕过方法...", "反爬虫");
        // The closing parenthesis was missing from this User-Agent string.
        $userAgent = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
        $ch = curl_init();
        curl_setopt_array($ch, [
            CURLOPT_URL => $url,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT => 45,
            CURLOPT_CONNECTTIMEOUT => 20,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_USERAGENT => $userAgent,
            CURLOPT_SSL_VERIFYPEER => false,
            CURLOPT_SSL_VERIFYHOST => 0,
            CURLOPT_ENCODING => '',
            CURLOPT_HTTPHEADER => [
                'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
                'Accept-Language: en-US,en;q=0.5',
                'Accept-Encoding: gzip, deflate',
                'Connection: close',
            ],
            CURLOPT_AUTOREFERER => true,
            CURLOPT_MAXREDIRS => 3,
            CURLOPT_IPRESOLVE => CURL_IPRESOLVE_V4,
        ]);
        $html = curl_exec($ch);
        $effective_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
        $http = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        $error = curl_error($ch);
        curl_close($ch);
        if ($error) {
$this->logDebug("curlAntiBot cURL错误: {$error}", "反爬虫"); } if ($http == 403) { throw new Exception("HTTP 403 Forbidden - 服务器拒绝访问,跳过"); } if (empty($html) && $http != 200) { throw new Exception("HTTP {$http}"); } $this->logDebug("curlAntiBot完成,HTTP: {$http}", "反爬虫"); return ['html' => $html, 'effective_url' => $effective_url]; } private function extractSimpleContent($html, $url = '') { $this->logDebug("尝试唐僧插件风格的简单提取...", "内容策略"); $title = ''; if (preg_match('/]*>(.*?)<\/title>/is', $html, $m)) { $title = trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8')); $title = preg_replace('/\s+/', ' ', $title); $this->logDebug("提取到标题(标准): {$title}", "简单提取"); } if (empty($title) && preg_match('//is', $html, $m)) { $title = trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8')); $this->logDebug("提取到标题(og:title): {$title}", "简单提取"); } if (empty($title) && preg_match('//is', $html, $m)) { $title = trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8')); $this->logDebug("提取到标题(twitter:title): {$title}", "简单提取"); } if (empty($title) && preg_match('/]*>(.*?)<\/h1>/is', $html, $m)) { $h1_text = trim(strip_tags(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8'))); if (!empty($h1_text)) { $title = $h1_text; $this->logDebug("提取到标题(h1): {$title}", "简单提取"); } } if (empty($title) && !empty($url)) { $host = parse_url($url, PHP_URL_HOST); if ($host) { $title = ucfirst(str_replace('www.', '', $host)); $this->logDebug("提取到标题(URL fallback): {$title}", "简单提取"); } } $desc = ''; $desc_patterns = [ '//is', '//is', '//is', '//is', '//is' ]; foreach ($desc_patterns as $pattern) { if (preg_match($pattern, $html, $m)) { $desc = trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8')); $this->logDebug("提取到描述,长度: " . 
mb_strlen($desc), "简单提取"); break; } } $keys = ''; $key_patterns = [ '//is', '//is', '//is' ]; foreach ($key_patterns as $pattern) { if (preg_match($pattern, $html, $m)) { $keys = trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5, 'UTF-8')); break; } } $content = $html; $main_content = ''; $content_selectors = [ '/<main[^>]*>(.*?)<\/main>/si', '/<article[^>]*>(.*?)<\/article>/si', '/<div[^>]*class="[^"]*content[^"]*"[^>]*>(.*?)<\/div>/si', '/<div[^>]*id="[^"]*content[^"]*"[^>]*>(.*?)<\/div>/si', '/<section[^>]*>(.*?)<\/section>/si', ]; foreach ($content_selectors as $pattern) { if (preg_match($pattern, $content, $m)) { $main_content = $m[1]; break; } } if (!empty($main_content)) { $content = $main_content; } $content = preg_replace('/<script[^>]*>.*?<\/script>/si', '', $content); $content = preg_replace('/<style[^>]*>.*?<\/style>/si', '', $content); $content = preg_replace('/<!--.*?-->/s', '', $content); if (empty($main_content) && preg_match('/<body[^>]*>(.*?)<\/body>/si', $content, $m)) { $content = $m[1]; } $content = strip_tags($content); $content = preg_replace('/\s+/', ' ', $content); $content = trim($content); if (mb_strlen($content, 'UTF-8') < 50) { $this->logDebug("正文内容过少,尝试从段落标签提取...", "简单提取"); $p_texts = []; if (preg_match_all('/<p[^>]*>(.*?)<\/p>/si', $html, $matches)) { foreach ($matches[1] as $p) { $p_text = trim(strip_tags($p)); if (mb_strlen($p_text) > 10) { $p_texts[] = $p_text; } } } if (!empty($p_texts)) { $content = implode(' ', $p_texts); $this->logDebug("从段落标签提取到内容,长度: " . mb_strlen($content), "简单提取"); } } if (mb_strlen($content, 'UTF-8') < 30 && !empty($url)) { $host = parse_url($url, PHP_URL_HOST); $path = parse_url($url, PHP_URL_PATH); $basic_info = "网站:{$host}"; if (!empty($path) && $path != '/') { $basic_info .= ",页面:{$path}"; } $basic_info .= "。这是一个网站的介绍页面。"; if (mb_strlen($content, 'UTF-8') < 10) { $content = $basic_info; } else { $content = $content . ' ' . $basic_info; } $this->logDebug("正文内容不足,补充基本信息,最终长度: " . 
mb_strlen($content), "简单提取"); } $max_length = 800; if (mb_strlen($content, 'UTF-8') > $max_length) { $content = mb_substr($content, 0, $max_length, 'UTF-8') . '...'; } $this->logDebug("简单提取正文,长度: " . mb_strlen($content), "简单提取"); if (!empty($title)) { $this->logDebug("唐僧插件风格提取成功(title=" . (!empty($title) ? "有" : "无") . ", desc=" . (!empty($desc) ? "有" : "无") . ")", "内容策略"); return [ 'title' => $title, 'description' => $desc, 'desc' => $desc, 'keywords' => $keys, 'keys' => $keys, 'content' => $content, 'strategy' => 'simple' ]; } $this->logDebug("唐僧插件风格提取失败(无title且无desc)", "内容策略"); return null; } private function extractWebsiteData($html) { $title = ''; if (preg_match('/<title>(.*?)<\/title>/is', $html, $m)) $title = trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8')); $desc = ''; if (preg_match('/<meta\s+name=["\']description["\']\s+content=["\'](.*?)["\']/is', $html, $m)) $desc = trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8')); $keys = ''; if (preg_match('/<meta\s+name=["\']keywords["\']\s+content=["\'](.*?)["\']/is', $html, $m)) $keys = trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8')); $spa_data = $this->extractSpaData($html); if (!empty($spa_data['title']) && empty($title)) $title = $spa_data['title']; if (!empty($spa_data['desc']) && empty($desc)) $desc = $spa_data['desc']; $body = ''; if (!empty($spa_data['content'])) { $body = $spa_data['content']; $this->logDebug("从SPA数据提取到内容,长度: " . 
mb_strlen($body)); } $jsonld_content = ''; if (preg_match('/<script\s+type=["\']application\/ld\+json["\'][^>]*>(.*?)<\/script>/is', $html, $m)) { $jsonld_content = trim($m[1]); try { $jsonld_data = json_decode($jsonld_content, true); if (is_array($jsonld_data)) { if (isset($jsonld_data['@type']) && in_array(strtolower($jsonld_data['@type']), ['article', 'blogposting', 'newsarticle', 'techarticle'])) { if (empty($title) && !empty($jsonld_data['headline'])) { $title = self::cleanUtf8(trim($jsonld_data['headline'])); } if (empty($desc) && !empty($jsonld_data['description'])) { $desc = self::cleanUtf8(trim($jsonld_data['description'])); } if (!empty($jsonld_data['articleBody'])) { $jsonld_content = self::cleanUtf8(trim($jsonld_data['articleBody'])); } } if (isset($jsonld_data['@type']) && strtolower($jsonld_data['@type']) === 'website') { if (empty($title) && !empty($jsonld_data['name'])) { $title = self::cleanUtf8(trim($jsonld_data['name'])); } if (empty($desc) && !empty($jsonld_data['description'])) { $desc = self::cleanUtf8(trim($jsonld_data['description'])); } } } } catch (Exception $e) { $this->logDebug("JSON-LD 解析失败: " . 
$e->getMessage()); } } if (preg_match('/<meta\s+property=["\']og:title["\']\s+content=["\'](.*?)["\']/is', $html, $m) && empty($title)) { $title = self::cleanUtf8(trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'))); } if (preg_match('/<meta\s+property=["\']og:description["\']\s+content=["\'](.*?)["\']/is', $html, $m) && empty($desc)) { $desc = self::cleanUtf8(trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'))); } if (empty($body)) { $body = $this->extractBodyContent($html); if (!empty($jsonld_content) && mb_strlen($jsonld_content) > 100 && mb_strlen($jsonld_content) > mb_strlen($body)) { $body = $jsonld_content; $this->logDebug("使用 JSON-LD 文章正文"); } } if (mb_strlen($body) < 50) { $this->logDebug("正文内容过短,尝试备用方案"); $fallback_body = $this->extractFallbackContent($html); if (mb_strlen($fallback_body) > mb_strlen($body)) { $body = $fallback_body; } } if (mb_strlen($body) > 1500) { $body = mb_substr($body, 0, 1500) . '...'; } $title = self::cleanUtf8($title); $desc = self::cleanUtf8($desc); $keys = self::cleanUtf8($keys); $body = self::cleanUtf8($body); return ['html'=>$html, 'title'=>$title, 'desc'=>$desc, 'keys'=>$keys, 'content'=>$body]; } private function extractSpaData($html) { $data = ['title' => '', 'desc' => '', 'content' => '']; if (preg_match('/<script\s+id="__NEXT_DATA__"[^>]*>(.*?)<\/script>/is', $html, $m)) { try { $next_data = json_decode($m[1], true); $this->logDebug("发现 Next.js 数据"); if (isset($next_data['props']['pageProps']['seo']['title'])) { $data['title'] = $next_data['props']['pageProps']['seo']['title']; } if (isset($next_data['props']['pageProps']['seo']['description'])) { $data['desc'] = $next_data['props']['pageProps']['seo']['description']; } $content = ''; if (isset($next_data['props']['pageProps']['content'])) { $content = $this->extractTextFromData($next_data['props']['pageProps']['content']); } elseif (isset($next_data['props']['pageProps']['page']['content'])) { $content = 
$this->extractTextFromData($next_data['props']['pageProps']['page']['content']); } elseif (isset($next_data['props']['pageProps']['data'])) { $content = $this->extractTextFromData($next_data['props']['pageProps']['data']); } if (!empty($content)) { $data['content'] = $content; } } catch (Exception $e) { $this->logDebug("Next.js 数据解析失败: " . $e->getMessage()); } } if (preg_match('/<script\s+id="__NUXT__"[^>]*>(.*?)<\/script>/is', $html, $m)) { try { $nuxt_data = json_decode($m[1], true); $this->logDebug("发现 Nuxt.js 数据"); if (isset($nuxt_data['data'][0]['seoMeta']['title'])) { $data['title'] = $nuxt_data['data'][0]['seoMeta']['title']; } if (isset($nuxt_data['data'][0]['seoMeta']['description'])) { $data['desc'] = $nuxt_data['data'][0]['seoMeta']['description']; } } catch (Exception $e) { $this->logDebug("Nuxt.js 数据解析失败: " . $e->getMessage()); } } if (preg_match_all('/<script[^>]*type=["\']application\/json["\'][^>]*>(.*?)<\/script>/is', $html, $matches)) { foreach ($matches[1] as $json_str) { try { $json_data = json_decode($json_str, true); $extracted = $this->extractTextFromData($json_data); if (mb_strlen($extracted) > mb_strlen($data['content'])) { $data['content'] = $extracted; } } catch (Exception $e) { } } } if (preg_match_all('/window\.__(\w+)\s*=\s*({[^;]+});/is', $html, $matches)) { foreach ($matches[2] as $i => $data_str) { try { $json_data = json_decode($data_str, true); $extracted = $this->extractTextFromData($json_data); if (mb_strlen($extracted) > mb_strlen($data['content'])) { $this->logDebug("从 window.__{$matches[1][$i]} 提取内容"); $data['content'] = $extracted; } } catch (Exception $e) { } } } return $data; } private function extractTextFromData($data, $max_depth = 5, $current_depth = 0) { if ($current_depth > $max_depth) return ''; $text = ''; if (is_string($data)) { $text = strip_tags($data); $text = preg_replace('/\s+/', ' ', $text); $text = trim($text); } elseif (is_array($data)) { foreach ($data as $value) { $extracted = 
$this->extractTextFromData($value, $max_depth, $current_depth + 1); if (!empty($extracted) && mb_strlen($extracted) > 10) { if (!in_array($extracted, ['true', 'false', 'null', '{}', '[]']) && !preg_match('/^(true|false|null|\d+)$/', $extracted)) { $text .= ' ' . $extracted; } } } } elseif (is_object($data)) { $array_data = json_decode(json_encode($data), true); $text = $this->extractTextFromData($array_data, $max_depth, $current_depth + 1); } return trim(preg_replace('/\s+/', ' ', $text)); } private function extractBodyContent($html) { $body = ''; $main_selectors = [ '/<main[^>]*>(.*?)<\/main>/is', '/<article[^>]*>(.*?)<\/article>/is', '/<div[^>]*class=["\'][^"\']*content[^"\']*["\'][^>]*>(.*?)<\/div>/is', '/<div[^>]*class=["\'][^"\']*post[^"\']*["\'][^>]*>(.*?)<\/div>/is', '/<div[^>]*class=["\'][^"\']*article[^"\']*["\'][^>]*>(.*?)<\/div>/is', '/<section[^>]*class=["\'][^"\']*content[^"\']*["\'][^>]*>(.*?)<\/section>/is', '/<div[^>]*class=["\'][^"\']*prose[^"\']*["\'][^>]*>(.*?)<\/div>/is', ]; foreach ($main_selectors as $selector) { if (preg_match($selector, $html, $m)) { $body = $m[1]; $this->logDebug("使用选择器成功提取正文区域"); break; } } if (empty($body)) { if (preg_match('/<body[^>]*>(.*?)<\/body>/is', $html, $m)) { $body = $m[1]; } else { $body = $html; } } $body = $this->cleanBodyContent($body); return $body; } private function extractFallbackContent($html) { $remove_tags = [ '/<script\b[^>]*>.*?<\/script>/is', '/<style\b[^>]*>.*?<\/style>/is', '/<nav\b[^>]*>.*?<\/nav>/is', '/<header\b[^>]*>.*?<\/header>/is', '/<footer\b[^>]*>.*?<\/footer>/is', '/<aside\b[^>]*>.*?<\/aside>/is', '/<form\b[^>]*>.*?<\/form>/is', '/<noscript\b[^>]*>.*?<\/noscript>/is', '/<!--.*?-->/s', '/<svg\b[^>]*>.*?<\/svg>/is', ]; $body = $html; foreach ($remove_tags as $pattern) { $body = preg_replace($pattern, '', $body); } $remove_classes = [ 'sidebar', 'advertisement', 'ad-', 'comment', 'related', 'share', 'social', 'breadcrumb', 'pagination', 'menu', 'navigation', 'footer', 'header', 'cookie', 
'popup', 'modal', 'dialog', 'alert' ]; foreach ($remove_classes as $class) { $pattern = '/<(div|section|aside)[^>]*class=["\'][^"\']*' . preg_quote($class, '/') . '[^"\']*["\'][^>]*>.*?<\/\1>/is'; $body = preg_replace($pattern, '', $body); } $paragraphs = []; if (preg_match_all('/<p[^>]*>(.*?)<\/p>/is', $body, $matches)) { foreach ($matches[1] as $p) { $text = strip_tags($p); $text = preg_replace('/\s+/', ' ', $text); $text = trim($text); if (mb_strlen($text) > 20) { $paragraphs[] = $text; } } } if (!empty($paragraphs)) { $body = implode(' ', $paragraphs); } else { $body = strip_tags($body); $body = preg_replace('/\s+/', ' ', $body); $body = trim($body); } return $body; } private function cleanBodyContent($body) { $remove_tags = [ '/<script\b[^>]*>.*?<\/script>/is', '/<style\b[^>]*>.*?<\/style>/is', '/<nav\b[^>]*>.*?<\/nav>/is', '/<header\b[^>]*>.*?<\/header>/is', '/<footer\b[^>]*>.*?<\/footer>/is', '/<aside\b[^>]*>.*?<\/aside>/is', '/<form\b[^>]*>.*?<\/form>/is', '/<noscript\b[^>]*>.*?<\/noscript>/is', '/<!--.*?-->/s', ]; foreach ($remove_tags as $pattern) { $body = preg_replace($pattern, '', $body); } $remove_classes = [ 'sidebar', 'advertisement', 'ad-', 'comment', 'related', 'share', 'social', 'breadcrumb', 'pagination', 'menu', 'navigation', 'footer', 'header' ]; foreach ($remove_classes as $class) { $pattern = '/<(div|section|aside)[^>]*class=["\'][^"\']*' . preg_quote($class, '/') . '[^"\']*["\'][^>]*>.*?<\/\1>/is'; $body = preg_replace($pattern, '', $body); } $body = preg_replace('/<(p|div|span)[^>]*>\s*<\/\1>/is', '', $body); $body = strip_tags($body); $body = preg_replace('/\s+/', ' ', $body); $body = trim($body); return $body; } private function downloadFavicon($url, $html = '') { $domain = parse_url($url, PHP_URL_HOST); if (empty($domain)) { $this->logDebug("downloadFavicon: 无法解析域名,URL={$url}"); return ''; } $apiUrl = 'https://v2.xxapi.cn/api/ico?url=' . 
urlencode($url); try { $this->logDebug("尝试使用API获取图标: {$apiUrl}"); $apiResult = $this->fetchJsonFromApi($apiUrl); if ($apiResult && isset($apiResult['code']) && $apiResult['code'] == 200 && !empty($apiResult['data'])) { $iconRealUrl = $apiResult['data']; $this->logDebug("API返回真实图标URL: {$iconRealUrl}"); $ext = $this->getFileExtension($iconRealUrl); $file = $this->downloadFile($iconRealUrl, $ext, false); if ($file && $this->isValidImage($file)) { $this->logDebug("通过API下载图标成功: {$file}"); return $file; } else { $this->logDebug("API返回的图标URL下载失败或无效,继续尝试其他方法"); } } else { $this->logDebug("API调用失败或返回无效数据: " . json_encode($apiResult)); } } catch (Exception $e) { $this->logDebug("API调用异常: " . $e->getMessage()); } $candidates = []; $extracted = $this->extractFaviconUrl($url, $html); if ($extracted) { $candidates[] = [ 'url' => $extracted, 'ext' => $this->getFileExtension($extracted), 'source' => 'html_extract' ]; } $candidates[] = [ 'url' => "https://icons.duckduckgo.com/ip3/{$domain}.ico", 'ext' => 'ico', 'source' => 'duckduckgo' ]; $candidates[] = [ 'url' => "https://icon.horse/icon/{$domain}", 'ext' => 'ico', 'source' => 'icon.horse' ]; $candidates[] = [ 'url' => "https://api.faviconkit.com/{$domain}/144", 'ext' => 'png', 'source' => 'faviconkit' ]; $candidates[] = [ 'url' => "https://{$domain}/favicon.ico", 'ext' => 'ico', 'source' => 'direct_https' ]; $candidates[] = [ 'url' => "http://{$domain}/favicon.ico", 'ext' => 'ico', 'source' => 'direct_http' ]; $candidates[] = [ 'url' => "https://{$domain}/favicon.png", 'ext' => 'png', 'source' => 'direct_png' ]; try { $grabberUrl = "https://favicongrabber.com/api/grab/{$domain}"; $grabberData = $this->fetchJsonFromApi($grabberUrl); if ($grabberData && isset($grabberData['icons']) && is_array($grabberData['icons'])) { foreach ($grabberData['icons'] as $icon) { if (!empty($icon['src'])) { $iconUrl = $icon['src']; if (strpos($iconUrl, '//') === 0) { $iconUrl = 'https:' . 
$iconUrl; } elseif (strpos($iconUrl, '/') === 0) { $iconUrl = 'https://' . $domain . $iconUrl; } $candidates[] = [ 'url' => $iconUrl, 'ext' => $this->getFileExtension($iconUrl), 'source' => 'favicongrabber' ]; } } } } catch (Exception $e) { $this->logDebug("favicongrabber 调用失败: " . $e->getMessage()); } $candidates[] = [ 'url' => "https://www.allfavicons.com/{$domain}", 'ext' => 'png', 'source' => 'allfavicons' ]; $candidates[] = [ 'url' => "https://faviconfinder.com/favicon/{$domain}", 'ext' => 'png', 'source' => 'faviconfinder' ]; $seen = []; $uniqueCandidates = []; foreach ($candidates as $c) { $key = $c['url']; if (!isset($seen[$key])) { $seen[$key] = true; $uniqueCandidates[] = $c; } } foreach ($uniqueCandidates as $candidate) { try { $this->logDebug("尝试下载图标: source={$candidate['source']}, url={$candidate['url']}"); $file = $this->downloadFile($candidate['url'], $candidate['ext'], false); if ($file && $this->isValidImage($file)) { $this->logDebug("图标下载成功: source={$candidate['source']}, file={$file}"); return $file; } } catch (Exception $e) { $this->logDebug("图标下载异常 (source={$candidate['source']}): " . 
$e->getMessage()); } usleep(100000); } $this->logDebug("所有图标候选均失败,domain: {$domain},将继续处理任务(无封面)"); return ''; } private function fetchJsonFromApi($url) { $ch = curl_init(); curl_setopt_array($ch, [ CURLOPT_URL => $url, CURLOPT_RETURNTRANSFER => true, CURLOPT_TIMEOUT => 10, CURLOPT_USERAGENT => self::USER_AGENT, CURLOPT_SSL_VERIFYPEER => false, CURLOPT_HTTPHEADER => ['Accept: application/json'] ]); $res = curl_exec($ch); $http = curl_getinfo($ch, CURLINFO_HTTP_CODE); curl_close($ch); if ($http == 200 && !empty($res)) { $data = json_decode($res, true); if (is_array($data)) { return $data; } } return null; } private function extractFaviconUrl($url, $html) { if (empty($html)) return ''; $patterns = [ '/<link[^>]+rel=["\'](?:shortcut\s+)?icon["\'][^>]+href=["\']([^"\']+)["\']/i', '/<link[^>]+href=["\']([^"\']+)["\'][^>]+rel=["\'](?:shortcut\s+)?icon["\']/i', '/<link[^>]+rel=["\']apple-touch-icon["\'][^>]+href=["\']([^"\']+)["\']/i', '/<link[^>]+rel=["\']apple-touch-icon-precomposed["\'][^>]+href=["\']([^"\']+)["\']/i', '/<link[^>]+rel=["\']mask-icon["\'][^>]+href=["\']([^"\']+)["\']/i', ]; foreach ($patterns as $pattern) { if (preg_match($pattern, $html, $m)) { $fav = trim($m[1]); if (strpos($fav, '//') === 0) { $fav = 'https:' . $fav; } elseif (strpos($fav, '/') === 0) { $scheme = parse_url($url, PHP_URL_SCHEME) ?: 'https'; $host = parse_url($url, PHP_URL_HOST); $fav = $scheme . '://' . $host . $fav; } elseif (!preg_match('/^https?:\/\//i', $fav)) { $base = rtrim($url, '/'); $fav = $base . '/' . 
ltrim($fav, '/'); } return $fav; } } return ''; } private function isValidImage($filePath) { if (empty($filePath)) return false; $localPath = str_replace(Option::UPLOADFILE_PATH, Option::UPLOADFILE_FULL_PATH, $filePath); if (!file_exists($localPath)) return false; $size = filesize($localPath); if ($size < 50) { $this->logDebug("图片文件太小: {$localPath}, 大小={$size}"); return false; } $fp = fopen($localPath, 'r'); if (!$fp) return false; $header = fread($fp, 512); fclose($fp); if (stripos($header, '<!DOCTYPE') !== false || stripos($header, '<html') !== false) { $this->logDebug("图片文件包含HTML内容,删除: {$localPath}"); unlink($localPath); return false; } $ext = pathinfo($localPath, PATHINFO_EXTENSION); if (!in_array(strtolower($ext), ['ico', 'svg'])) { $imageInfo = @getimagesize($localPath); if ($imageInfo === false) { $this->logDebug("getimagesize验证失败,删除文件: {$localPath}"); unlink($localPath); return false; } } return true; } private function downloadFile($url, $ext = 'jpg', $is_screenshot = false) { $maxRetries = 2; for ($attempt = 1; $attempt <= $maxRetries; $attempt++) { if ($attempt > 1) { $this->logDebug("下载重试 ({$attempt}/{$maxRetries}): {$url}"); sleep($attempt * 2); } $dir = gmdate('Ym'); $full_dir = Option::UPLOADFILE_FULL_PATH . $dir . '/'; if (!is_dir($full_dir)) mkdir($full_dir, 0755, true); $filename = substr(md5($url . time() . $attempt), 0, 12) . '_' . time() . '.' . $ext; $filepath = $full_dir . $filename; $userAgent = self::$_userAgents[array_rand(self::$_userAgents)]; $parsedUrl = parse_url($url); $referer = isset($parsedUrl['scheme']) && isset($parsedUrl['host']) ? $parsedUrl['scheme'] . '://' . $parsedUrl['host'] . 
'/' : 'https://www.google.com/'; $headers = [ 'Accept: image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8', 'Accept-Language: zh-CN,zh;q=0.9,en;q=0.8', 'Accept-Encoding: gzip, deflate, br', 'Cache-Control: no-cache', 'Connection: keep-alive', 'Pragma: no-cache', 'DNT: 1', 'Sec-Fetch-Dest: image', 'Sec-Fetch-Mode: no-cors', 'Sec-Fetch-Site: same-origin', 'User-Agent: ' . $userAgent, ]; $ch = curl_init($url); curl_setopt_array($ch, [ CURLOPT_RETURNTRANSFER => true, CURLOPT_FOLLOWLOCATION => true, CURLOPT_TIMEOUT => 10, CURLOPT_SSL_VERIFYPEER => false, CURLOPT_SSL_VERIFYHOST => 0, CURLOPT_HTTPHEADER => $headers, CURLOPT_REFERER => $referer, CURLOPT_ENCODING => '', CURLOPT_AUTOREFERER => true, CURLOPT_MAXREDIRS => 5, ]); $data = curl_exec($ch); $code = curl_getinfo($ch, CURLINFO_HTTP_CODE); $error = curl_error($ch); curl_close($ch); if ($code == 200 && !empty($data)) { $isHtml = (stripos($data, '<!DOCTYPE') !== false || stripos($data, '<html') !== false); $contentLength = strlen($data); if ($isHtml) { if ($contentLength > 10240 && stripos($data, '<title>') !== false) { $this->logDebug("下载内容为完整HTML页面,大小 {$contentLength} 字节,可能被反爬,删除文件: {$url}"); @unlink($filepath); continue; } else { $this->logDebug("下载内容包含HTML标签(大小 {$contentLength} 字节),可能被反爬,删除文件: {$url}"); @unlink($filepath); continue; } } if (file_put_contents($filepath, $data)) { $this->logDebug("下载成功: {$url} -> {$filepath}"); return Option::UPLOADFILE_PATH . $dir . '/' . $filename; } else { $this->logDebug("文件保存失败: {$filepath}"); } } else { $this->logDebug("下载失败: HTTP={$code}, error={$error}, url={$url}"); } } $this->logDebug("下载最终失败: {$url}"); return ''; } private function getFileExtension($url) { $path = parse_url($url, PHP_URL_PATH); if ($path && ($pos = strrpos($path, '.')) !== false) { $ext = substr($path, $pos + 1); $ext = preg_replace('/[^a-zA-Z0-9]/', '', $ext); $ext = strtolower($ext); if (in_array($ext, ['ico','png','jpg','jpeg','gif','svg','webp'])) return $ext == 'jpeg' ? 
'jpg' : $ext; } return 'ico'; } private function callAI($prompt) { set_time_limit(300); $ai_config = AI::getCurrentModelInfo(); if (empty($ai_config['api_key']) || empty($ai_config['api_url'])) { throw new Exception('AI配置不完整'); } $apiUrl = $ai_config['api_url']; $apiKey = $ai_config['api_key']; $model = $ai_config['model']; $messages = [ ["role" => "user", "content" => $prompt] ]; $post_data = [ 'model' => $model, 'input' => $messages, 'max_output_tokens' => 8192, 'stream' => false, 'temperature' => 1, ]; $post_json = json_encode($post_data, JSON_UNESCAPED_UNICODE); $headers = [ 'Content-Type: application/json', 'Accept: application/json', 'Authorization: Bearer ' . $apiKey ]; $ch = curl_init(); curl_setopt_array($ch, [ CURLOPT_URL => $apiUrl, CURLOPT_POST => true, CURLOPT_POSTFIELDS => $post_json, CURLOPT_HTTPHEADER => $headers, CURLOPT_RETURNTRANSFER => true, CURLOPT_TIMEOUT => 300, CURLOPT_CONNECTTIMEOUT => 30, CURLOPT_SSL_VERIFYPEER => false, CURLOPT_SSL_VERIFYHOST => 0, CURLOPT_ENCODING => '', ]); $response = curl_exec($ch); $httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE); $error = curl_error($ch); curl_close($ch); if ($error) { $this->logDebug("AI cURL错误: {$error}"); throw new Exception("AI请求失败: {$error}"); } if ($httpCode != 200) { $this->logDebug("AI HTTP错误: {$httpCode}, 响应: " . substr($response, 0, 500)); throw new Exception("AI服务返回错误状态: {$httpCode}"); } $responseLen = strlen($response); $this->logDebug("AI原始响应长度: {$responseLen} 字节"); $decoded = json_decode($response, true); if ($decoded === null) { $this->logDebug("AI响应JSON解析失败,原始内容: " . 
substr($response, 0, 500)); throw new Exception('AI返回无效的JSON'); } $content = ''; if (isset($decoded['output']) && is_array($decoded['output'])) { foreach ($decoded['output'] as $item) { if (isset($item['type']) && $item['type'] === 'message') { if (isset($item['content'][0]['text'])) { $content = $item['content'][0]['text']; } elseif (isset($item['text'])) { $content = $item['text']; } break; } } } elseif (isset($decoded['choices'][0]['message']['content'])) { $content = $decoded['choices'][0]['message']['content']; } elseif (isset($decoded['error']['message'])) { $this->logDebug("AI服务错误: " . $decoded['error']['message']); throw new Exception('AI服务错误: ' . $decoded['error']['message']); } if (empty($content)) { $this->logDebug("无法从AI响应中提取内容,响应结构: " . substr($response, 0, 1000)); throw new Exception('AI返回内容为空或格式异常'); } $len = mb_strlen($content, 'UTF-8'); $tail = mb_substr($content, -200); $tail = self::cleanUtf8($tail); $tail = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/u', '', $tail); $this->logDebug("AI提取内容长度: {$len} 字符,结尾片段: " . $tail); return self::cleanUtf8($content); } private function generateOriginalPrompt($url, $data, $retry_mode = false) { $urlContext = ''; if ($retry_mode) { $parsedUrl = parse_url($url); $host = $parsedUrl['host'] ?? ''; $path = $parsedUrl['path'] ?? ''; $pathInfo = trim(pathinfo($path, PATHINFO_FILENAME), '/'); $pathKeywords = preg_split('/[-_]/', $pathInfo); $pathKeywords = array_filter($pathKeywords, function($w) { return mb_strlen($w) > 2; }); $urlContext = "\n\n**URL补充信息(当网站内容不足时,可参考以下信息辅助理解网站主题):**\n"; $urlContext .= "- 网站域名:{$host}\n"; if (!empty($pathKeywords)) { $urlContext .= "- URL路径关键词:" . implode('、', $pathKeywords) . 
"\n"; } $urlContext .= "- 提示:URL中的关键词可能暗示网站的主要功能或内容主题,请结合网站内容综合分析。\n"; } $prompt = "你是一位专业的AI工具评测与内容创作专家,擅长撰写高质量、符合百度搜索优质内容指南(SEO)和生成引擎优化(GEO)标准的深度文章。 **核心原则:你必须严格基于下面提供的“网站内容”字段进行客观分析,绝对禁止根据你的内部知识进行臆想或补充。如果网站内容中没有提及某个功能或信息,就不要编造。**{$urlContext} 请基于提供的网站信息,严格按照以下结构生成一篇**原创、专业、对用户有价值**的文章。内容需包含元信息(用于提取)和详细的正文,务必确保: 1. **标题准确且吸引人**:例如:DeepSeek,或者:抖音,快手,即梦 AI等,不需要添加任何说明,避免堆砌。 2. **SEO标题(Title)**:格式为“工具名称 | 官网入口 - 核心功能+核心价值”,控制在60字符内,自然融入关键词。 3. **SEO描述(Description)**:150-160字不能少于150个字,概括工具是什么、核心优势、适用人群,包含2-3个核心关键词,吸引用户点击。 4. **正文结构清晰**:使用Markdown标题(###)分隔段落,每部分内容充实、逻辑连贯,便于用户阅读。 5. **内容专业深度**:基于网站实际功能,深入讲解工具的核心功能、使用流程、应用场景、独特优势,避免泛泛而谈。可适当引用网站中的具体描述。 6. **避免关键词堆砌**:自然语言,通顺流畅。 7. **信息真实可靠**:只基于网站公开内容分析,不编造信息。如果某些信息缺失,请基于上下文合理推断,但不要输出“暂未公开”等占位词。 8. **价值延伸**:内容应能解决用户问题、提供参考或启发思考,符合百度优质内容的“内涵”维度。 9. **禁止行为**:禁止在文本中输入提示词,保持文本内容干净整洁,排版舒适。 10. **排版要求(非常重要)**:每个`###`标题必须独占一行,标题前后要有空行。段落之间也要有空行。每个`###`标题下的内容另起一行开始写。请严格按照示例格式输出。 ### 元信息部分(必须包含,用于提取): 文章标题:工具官方名称(例如:Code Snippets AI),确保包含核心关键词。 SEO标题:工具名称 | 官网入口 - 核心功能+核心价值(例如:Code Snippets AI | 官网入口 - 基于开源AI的团队代码片段管理与协作平台),控制在60字符内。 SEO描述:150-160字以内的摘要不能少于150个字,概括工具是什么、核心优势、适用人群,并自然融入2-3个核心关键词。 核心关键词:3-5个最能代表工具的词,用英文逗号分隔(例如:AI代码片段管理, 团队协作, 代码库, 开源LLM, 开发者工具)。 分类:请从以下两个分类体系中筛选1~5个最匹配的分类,输出时需包含分类ID。格式示例:'分类:AI脚本创作 id:16' 或 '分类:通用大模型 id:76'。无需标注上级大类。 **AI创作全链路分类体系(为视频与图文创作与运行相关的新媒体工具)** 1. 策划与创意 - 热点追踪 id:11 - 赛道洞察 id:12 - 关键词挖掘 id:13 - 用户画像 id:14 - 爆款案例库 id:15 2. 内容与创作 - AI脚本创作 id:16 - AI写作 id:97 - AI提示词 id:79 - 文案优化 id:17 - AI视频生成 id:18 - AI图文转视频 id:19 - AI图像生成 id:20 - AI音频生成 id:21 - AI数字人 id:22 - 3D生成 id:23 3. 素材与资产 - 视频素材 id:24 - 图像素材 id:25 - 音频素材 id:26 - AI免抠素材 id:27 - AI元素生成 id:28 - 3D/动态素材 id:29 - 聚合搜索工具 id:30 4. 拍摄与实时 - 运镜指导 id:31 - 智能提词 id:32 - 防抖修复 id:33 - AI辅助拍摄 id:34 - AI构图指导 id:35 - AI布光模拟 id:36 - 实时动捕驱动 id:37 - 实时特效 id:38 - 直播推流工具 id:39 - 虚拟直播设置 id:40 5. 后期与剪辑 - 自动剪辑 id:41 - 多轨剪辑 id:42 - 字幕生成 id:43 - 特效合成 id:44 - AI智能调色 id:45 - AI音频增强 id:46 - AI视频增强 id:47 视频格式转换id:78 6. 
运营与变现 - 平台官方工具 id:48 - 多平台分发 id:49 - 账号与数据管理 id:50 - AI运营策略 id:51 - 流量推广 id:52 - 直播公会平台 id:53 - 带货联盟平台 id:54 - 达人合作平台 id:55 - 商业化变现 id:56 7. 合规与安全 - AI检测与绕过 id:57 - 版权认证 id:58 - 数据安全 id:59 - 跨境合规 id:60 - AIGC水印溯源 id:61 - AI伦理检测 id:62 8. 领域与创作 - 美妆领域 id:63 - 游戏领域 id:64 - 知识科普 id:65 - 时尚穿搭 id:66 - 影视娱乐 id:67 - 三农 id:99 - 美食 id:100 - 亲子 id:101 - 职场 id:102 9. 学习与增长 - 工具教程 id:68 - 行业报告 id:69 - 工具评测 id:70 - 官方课程 id:71 - 学习路径 id:72 - AI学习助手 id:73 - 短视频/直播案例库 id:74 **全品类分类体系(所有AI工具)** - 通用大模型 id:76 - AI智能搜索 id:98 - AI智能体 id:10 - AI 办公助手 id:77 - AI 设计工具 id:75 - AI 编程开发 id:81 - AI 语音交互 id:82 - AI 数据处理 id:83 AI 教育学习(直接选择子分类) - 自适应学习 id:104 - 智能题库 id:105 - AI辅导 id:106 - 智能备课 id:107 - 智能评测 id:108 - 语言学习 id:109 - 学习分析 id:110 - 特教支持 id:112 - 职业培训 id:113 - 家庭教育 id:114 - AI 学术论文 id:85 - AI 医疗健康 id:86 - AI 工业制造 id:87 - AI 营销推广 id:88 - AI 效率工具 id:89 - AI 法律合规 id:90 - AI 金融理财 id:91 - AI 建筑设计 id:92 - AI 智慧农业 id:93 - AI 人力资源 id:94 - AI 游戏开发 id:95 - AI 工具合集 id:96 是否是AI工具:[是/不是/不确定] 地区:[国内/国外] 开发者:如果有明确信息则填写,否则留空(不要输出“暂未公开”等占位词) 收费模式:[多个用英文逗号分隔,例如:免费,订阅制,按次购买] 组合使用推荐链:[留空] 标签:3~5个,每个标签控制在4字左右,多个标签用逗号隔开,需贴合工具核心功能。 简介:15字以内,一句话概括工具核心功能与价值。 ### 正文内容(请使用以下结构,注意换行和空行,确保内容详细、基于网站实际信息,并自然融入核心关键词): ### 工具介绍 详细说明工具是什么、由谁开发(如果有)、核心定位。基于网站实际功能与开发背景,准确提炼核心用途、核心优势及适配场景。开发者信息如有则写,没有则不提。篇幅控制在200-300字,客观精准,可适当加入与同类工具的对比(如有)。 <br> ### 效果展示/案例参考 通过文字描述工具实际生成成果、成品案例与落地应用表现,以文字形式具象化工具能力,可按场景/风格分类说明,直观体现工具输出质量与实用价值。 <br> ### 核心功能 从网站中提取具体的功能点,不要泛泛而谈。基于页面的“Comprehensive Features”、“Some of our great features”等板块,列出至少5-8项最核心、最具特色的功能。每项用简洁的动宾短语描述,并简要说明其价值(例如:智能代码片段管理 - 集中存储、组织和检索团队的代码片段,支持AI驱动的上下文生成)。 - 功能1:描述 - 功能2:描述 - 功能3:描述 - ... <br> ### 使用流程 梳理工具极简操作步骤,通常3-4步即可完成核心使用,清晰说明从打开工具到输出成果的全流程,降低用户上手门槛,突出工具易用性。 - 步骤1:... - 步骤2:... - 步骤3:... - ... <br> ### 使用场景 清晰描述工具的多维度实际应用场景,覆盖不同用户需求。每个场景可用一小段话说明,或列点简述。 - 场景1:... - 场景2:... - 场景3:... - ... 
<br> ### 适用人群 根据工具实际目标用户客观描述,列出多个相关人群(例如:开发者、技术团队负责人、独立开发者、编程学习者、需要管理代码库的企业),表述简洁,可适当说明为什么适合他们。 <br> ### 独特优势 聚焦工具核心竞争力,对比同类产品提炼差异化亮点。可从技术架构、生成效果、操作门槛、版权属性、使用成本等方面展开,不泛泛而谈。 <br> ### 常见问题(FAQ)提炼(如网站有相关内容) 基于网站中的FAQ或用户评价,整理出3-5个最常被问到的问题及其简要回答。 - Q1: ... - A1: ... - Q2: ... - A2: ... - Q3: ... - A3: ... - Q4: ... - A4: ... - Q5: ... - A5: ... <br> ### 实测体验(由AI创作导航实测) 请以“AI创作导航”的第一人称视角撰写一段约150-200字的实测体验。内容应基于网站提供的功能信息,模拟真实使用场景,包括: - 实际使用感受(例如:我用这个工具处理了XX任务,体验如何) - 突出优点(1-2个) - 指出缺点或需要注意的地方(如果有) - 给出推荐建议(适合哪些人群) 要求:语气真实、自然,不要过于官方,体现“我们亲自试用”的感觉。 注意:正文中不要重复SEO标题和SEO描述,这些已在元信息部分输出。 分析依据:基于网站公开内容进行客观分析,确保信息准确。如果某些信息缺失,请基于上下文合理推断,但不要编造,不要输出“暂未公开”等占位词。 目标网址:{$url} 网站标题:{$data['title']} 网站描述:{$data['desc']} 网站关键词:{$data['keys']} 网站内容(已抓取的部分): {$data['content']} 请开始分析并严格按照上述格式输出:"; $prompt .= "\n\n" . "是否是AI工具:[是/不是/不确定]\n"; $prompt .= "是否设为精选:[是/不是]\n"; $prompt .= "地区:[国内/国外]\n"; $prompt .= "开发者:如果有明确信息则填写,否则留空\n"; $prompt .= "收费模式:[多个用英文逗号分隔,例如:免费,订阅制,按次购买]\n"; $prompt .= "组合使用推荐链:[留空]\n"; return $prompt; } private function generateSimplePrompt($url, $data, $retry_mode = false) { $urlContext = ''; if ($retry_mode) { $parsedUrl = parse_url($url); $host = $parsedUrl['host'] ?? ''; $path = $parsedUrl['path'] ?? ''; $pathInfo = trim(pathinfo($path, PATHINFO_FILENAME), '/'); $pathKeywords = preg_split('/[-_]/', $pathInfo); $pathKeywords = array_filter($pathKeywords, function($w) { return mb_strlen($w) > 2; }); $urlContext = "\n\nURL补充信息:网站域名={$host}"; if (!empty($pathKeywords)) { $urlContext .= ",路径关键词=" . 
implode('、', $pathKeywords); } } return "请根据以下网站信息,生成一篇300-500字的简短介绍文章,包含工具名称、主要功能、适用人群和简单评价。{$urlContext}\n网址:{$url}\n标题:{$data['title']}\n描述:{$data['desc']}\n内容:{$data['content']}"; } private function processAIResponse($content, $websiteData) { $content = self::cleanUtf8($content); $title = ''; if (preg_match('/文章标题[::]\s*(.*?)(\n|$)/u', $content, $m)) $title = trim($m[1]); if (empty($title)) $title = $websiteData['title'] ?: 'AI生成文章'; $seo_title = ''; if (preg_match('/SEO标题[::]\s*(.*?)(\n|$)/u', $content, $m)) $seo_title = trim($m[1]); $seo_description = ''; if (preg_match('/SEO描述[::]\s*(.*?)(\n|$)/u', $content, $m)) $seo_description = trim($m[1]); $excerpt = ''; if (preg_match('/简介[::]\s*(.*?)(?=\n|$)/u', $content, $m)) { $excerpt = trim($m[1]); } $excerpt = html_entity_decode($excerpt, ENT_QUOTES, 'UTF-8'); $excerpt = strip_tags($excerpt); $excerpt = mb_substr($excerpt, 0, 15, 'UTF-8'); $tags = ''; if (preg_match('/(?:^|\n)\s*标签[::]\s*(.*?)(?=\n|$)/u', $content, $m)) { $tags = trim($m[1]); $tags = html_entity_decode($tags, ENT_QUOTES, 'UTF-8'); $tags = str_replace([',', '、', ';', ':', ' '], ',', $tags); $tags = preg_replace('/,+/', ',', $tags); $tags = trim($tags, ', '); } $category_ids = []; if (preg_match_all('/id\s*[::]\s*(\d+)/u', $content, $matches)) { $category_ids = array_map('intval', $matches[1]); $category_ids = array_slice(array_unique($category_ids), 0, 5); } $clean = $content; $pos = strpos($clean, '### 工具介绍'); if ($pos !== false) { $clean = substr($clean, $pos); } $lines = explode("\n", $clean); $filteredLines = []; foreach ($lines as $line) { if (preg_match('/^(文章标题|SEO标题|SEO描述|核心关键词|分类|是否是AI工具|是否设为精选|地区|开发者|收费模式|组合使用推荐链|标签|简介)[::]/u', $line)) { continue; } $filteredLines[] = $line; } $clean = implode("\n", $filteredLines); $clean = $this->formatArticleContent($clean); $nav_fields = [ 'is_ai' => 'unknown', 'is_featured' => 'no', 'location' => '', ]; if (preg_match('/是否是AI工具[::]\s*([^\n]+)/u', $content, $m)) { $val = trim($m[1]); if (strpos($val, 
'不是') !== false || strpos($val, '否') !== false || stripos($val, 'no') !== false) { $nav_fields['is_ai'] = 'no'; } elseif (strpos($val, '是') !== false || stripos($val, 'yes') !== false) { $nav_fields['is_ai'] = 'yes'; } } if (preg_match('/地区[::]\s*([^\n]+)/u', $content, $m)) { $val = trim($m[1]); if (strpos($val, '国内') !== false || stripos($val, 'china') !== false || stripos($val, 'cn') !== false) { $nav_fields['location'] = '国内'; } elseif (strpos($val, '国外') !== false || stripos($val, 'foreign') !== false) { $nav_fields['location'] = '国外'; } } $this->logDebug("提取导航字段: is_ai={$nav_fields['is_ai']}, location={$nav_fields['location']}"); return [ 'title' => self::cleanUtf8($title), 'content' => self::cleanUtf8($clean), 'excerpt' => self::cleanUtf8($excerpt), 'tags' => self::cleanUtf8($tags), 'category_ids' => $category_ids, 'seo_title' => self::cleanUtf8($seo_title), 'seo_description' => self::cleanUtf8($seo_description), 'nav_fields' => $nav_fields, ]; } private function formatArticleContent($content) { $content = preg_replace('/([^\n])\n(###\s+[^\n]+)\n/', "$1\n\n$2\n\n", $content); $content = preg_replace('/\n(###\s+[^\n]+)\n([^\n])/', "\n\n$1\n\n$2", $content); $content = preg_replace("/\n{3,}/", "\n\n", $content); $content = str_replace('<br>', "<br>\n", $content); return trim($content); } private function fullToHalf($str) { $arr = [ '0'=>'0','1'=>'1','2'=>'2','3'=>'3','4'=>'4','5'=>'5','6'=>'6','7'=>'7','8'=>'8','9'=>'9', 'A'=>'A','B'=>'B','C'=>'C','D'=>'D','E'=>'E','F'=>'F','G'=>'G','H'=>'H','I'=>'I','J'=>'J', 'K'=>'K','L'=>'L','M'=>'M','N'=>'N','O'=>'O','P'=>'P','Q'=>'Q','R'=>'R','S'=>'S','T'=>'T', 'U'=>'U','V'=>'V','W'=>'W','X'=>'X','Y'=>'Y','Z'=>'Z', 'a'=>'a','b'=>'b','c'=>'c','d'=>'d','e'=>'e','f'=>'f','g'=>'g','h'=>'h','i'=>'i','j'=>'j', 'k'=>'k','l'=>'l','m'=>'m','n'=>'n','o'=>'o','p'=>'p','q'=>'q','r'=>'r','s'=>'s','t'=>'t', 'u'=>'u','v'=>'v','w'=>'w','x'=>'x','y'=>'y','z'=>'z', '('=>'(',')'=>')','〔'=>'[','〕'=>']','【'=>'[','】'=>']','〖'=>'[','〗'=>']', 
'{'=>'{','}'=>'}','《'=>'<','》'=>'>','%'=>'%','+'=>'+','—'=>'-','-'=>'-','~'=>'~',':'=>':', '。'=>'.','、'=>',',','=>',',';'=>';','?'=>'?','!'=>'!','…'=>'-','"'=>'"','''=>"'",'`'=>'`', '|'=>'|',' '=>' ', ]; return strtr($str, $arr); } public function processSingleTask($index) { $r = $this->processBatchTask(); $_SESSION['chuang_ailoot_message'] = $r['success'] ? '任务执行成功:'.$r['message'] : '任务执行失败:'.$r['message']; } public function processAllTasks() { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('batch_tasks') ?: []; $c = 0; $f = 0; $t = 0; foreach ($tasks as $i => $task) { if ($task['status'] === 'pending' || $task['status'] === 'failed') { $t++; $res = $this->processBatchTask(); if ($res['success']) $c++; else $f++; sleep(5); } } $_SESSION['chuang_ailoot_message'] = $t ? "批量执行完成:总计{$t},成功{$c},失败{$f}" : "没有待执行的任务"; } public function deleteTask($index) { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('batch_tasks') ?: []; if (isset($tasks[$index])) { unset($tasks[$index]); $tasks = array_values($tasks); $storage->setValue('batch_tasks', $tasks, 'array'); $_SESSION['chuang_ailoot_message'] = '任务已删除'; } } public function retryFailure($index) { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('batch_tasks') ?: []; $failures = $storage->getValue('batch_failures') ?: []; if (isset($failures[$index])) { $f = $failures[$index]; $tasks[] = ['url'=>$f['url'], 'status'=>'pending', 'created_at'=>time(), 'updated_at'=>time(), 'post_id'=>0, 'detailed'=>$f['detailed']??1, 'retry_count'=>0]; unset($failures[$index]); $failures = array_values($failures); $storage->setValue('batch_tasks', $tasks, 'array'); $storage->setValue('batch_failures', $failures, 'array'); $_SESSION['chuang_ailoot_message'] = '任务已重新加入队列'; } } public function batchRetryFailures() { $indices = Input::getStrVar('indices', ''); if (empty($indices)) { $_SESSION['chuang_ailoot_message'] = '未选择任何记录'; return; } $storage = 
Storage::getInstance(self::ID); $tasks = $storage->getValue('batch_tasks') ?: []; $failures = $storage->getValue('batch_failures') ?: []; $indexArray = array_map('intval', explode(',', $indices)); $count = 0; foreach ($indexArray as $index) { if (isset($failures[$index])) { $f = $failures[$index]; $tasks[] = [ 'url' => $f['url'], 'status' => 'pending', 'created_at' => time(), 'updated_at' => time(), 'post_id' => 0, 'detailed' => $f['detailed'] ?? 1, 'retry_count' => 0 ]; unset($failures[$index]); $count++; } } if ($count > 0) { $failures = array_values($failures); $storage->setValue('batch_tasks', $tasks, 'array'); $storage->setValue('batch_failures', $failures, 'array'); $_SESSION['chuang_ailoot_message'] = "已将 {$count} 条失败记录重新加入队列"; } else { $_SESSION['chuang_ailoot_message'] = '未找到有效的失败记录'; } } public function batchDeleteFailures() { $indices = Input::getStrVar('indices', ''); if (empty($indices)) { $_SESSION['chuang_ailoot_message'] = '未选择任何记录'; return; } $storage = Storage::getInstance(self::ID); $failures = $storage->getValue('batch_failures') ?: []; $indexArray = array_map('intval', explode(',', $indices)); $count = 0; rsort($indexArray); foreach ($indexArray as $index) { if (isset($failures[$index])) { unset($failures[$index]); $count++; } } if ($count > 0) { $failures = array_values($failures); $storage->setValue('batch_failures', $failures, 'array'); $_SESSION['chuang_ailoot_message'] = "已删除 {$count} 条失败记录"; } else { $_SESSION['chuang_ailoot_message'] = '未找到有效的失败记录'; } } public function clearCompletedTasks() { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('batch_tasks') ?: []; $original_count = count($tasks); $new = []; foreach ($tasks as $t) { if (!in_array($t['status'], ['completed', 'failed', 'exists'])) { $new[] = $t; } } $new_count = count($new); $storage->setValue('batch_tasks', $new, 'array'); $removed = $original_count - $new_count; $_SESSION['chuang_ailoot_message'] = "已清除 {$removed} 个已完成/失败/已存在任务(剩余 {$new_count} 个)"; 
$this->logDebug("clearCompletedTasks: 原有 {$original_count} 个,清除 {$removed} 个,剩余 {$new_count} 个"); } public function clearPendingTasks() { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('batch_tasks') ?: []; $original_count = count($tasks); $new = []; $removed = 0; foreach ($tasks as $t) { if ($t['status'] !== 'pending') { $new[] = $t; } else { $removed++; } } $storage->setValue('batch_tasks', $new, 'array'); $_SESSION['chuang_ailoot_message'] = "已清除 {$removed} 个等待中的任务(剩余 " . count($new) . " 个)"; $this->logDebug("clearPendingTasks: 原有 {$original_count} 个,清除 {$removed} 个等待中任务,剩余 " . count($new) . " 个"); } public function clearFailures() { $storage = Storage::getInstance(self::ID); $storage->setValue('batch_failures', [], 'array'); $_SESSION['chuang_ailoot_message'] = '失败记录已清空'; } public function addExampleTasks() { $urls = [ 'https://chat.openai.com/', 'https://www.midjourney.com/', 'https://www.canva.com/', 'https://runwayml.com/', 'https://www.descript.com/', 'https://www.adobe.com/products/firefly.html', 'https://www.copy.ai/', 'https://www.jasper.ai/', 'https://www.grammarly.com/', 'https://www.synthesia.io/' ]; $added = 0; foreach ($urls as $url) $added += $this->addSingleUrlToQueue($url, 1); $_SESSION['chuang_ailoot_message'] = "已添加 {$added} 个示例任务"; } public function forceAllToPending() { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('batch_tasks') ?: []; $cnt = 0; $details = []; $new_tasks = []; foreach ($tasks as $task) { if ($task['status'] !== 'pending') { $old_status = $task['status']; $task['status'] = 'pending'; $task['updated_at'] = time(); $cnt++; $details[] = "{$old_status}→pending"; } $new_tasks[] = $task; } $storage->setValue('batch_tasks', $new_tasks, 'array'); /* array_count_values() returns transition => count; keep the transition label so the summary is not just bare numbers */ $status_parts = []; foreach (array_count_values($details) as $transition => $n) { $status_parts[] = "{$transition}×{$n}"; } $status_detail = implode(', ', $status_parts); $_SESSION['chuang_ailoot_message'] = "已将 {$cnt} 个任务设为等待中({$status_detail})"; $this->logDebug("forceAllToPending: 将 {$cnt} 个任务设为等待中"); } public function resetProcessingToPending() { 
$storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('batch_tasks') ?: []; $cnt = 0; $processing_urls = []; $new_tasks = []; foreach ($tasks as $task) { if ($task['status'] === 'processing') { $processing_urls[] = $task['url']; $task['status'] = 'pending'; $task['updated_at'] = time(); $cnt++; } $new_tasks[] = $task; } $storage->setValue('batch_tasks', $new_tasks, 'array'); $verify_tasks = $storage->getValue('batch_tasks') ?: []; $verify_processing = 0; foreach ($verify_tasks as $vt) { if ($vt['status'] === 'processing') $verify_processing++; } $_SESSION['chuang_ailoot_message'] = "已将 {$cnt} 个进行中的任务重置为等待中(验证后剩余 {$verify_processing} 个进行中)"; $this->logDebug("resetProcessingToPending: 将 {$cnt} 个 processing 任务重置为 pending"); $this->logDebug("resetProcessingToPending: 验证后剩余 {$verify_processing} 个 processing 任务"); } public function diagnoseTasks() { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('batch_tasks') ?: []; $diagnosis = []; foreach ($tasks as $idx => $task) { $status = $task['status'] ?? 'unknown'; if (!isset($diagnosis[$status])) { $diagnosis[$status] = ['count' => 0, 'urls' => []]; } $diagnosis[$status]['count']++; if ($status === 'processing' && count($diagnosis[$status]['urls']) < 5) { $diagnosis[$status]['urls'][] = $task['url']; } } $report = "任务诊断报告:\n"; foreach ($diagnosis as $status => $info) { $report .= " [{$status}] {$info['count']} 个任务\n"; if (!empty($info['urls'])) { $report .= " 示例URL: " . implode(', ', $info['urls']) . "\n"; } } $_SESSION['chuang_ailoot_message'] = $report; $this->logDebug("diagnoseTasks:\n" . 
$report); } public function pauseCron() { $storage = Storage::getInstance(self::ID); $storage->setValue('cron_paused', true); $_SESSION['chuang_ailoot_message'] = '定时任务已暂停,现在可以安全地重置任务状态'; $this->logDebug('pauseCron: 定时任务已暂停'); } public function resumeCron() { $storage = Storage::getInstance(self::ID); $storage->setValue('cron_paused', false); $_SESSION['chuang_ailoot_message'] = '定时任务已恢复'; $this->logDebug('resumeCron: 定时任务已恢复'); } private function addSingleUrlToQueue($url, $detailed = 1) { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('batch_tasks') ?: []; foreach ($tasks as $t) if ($t['url'] === $url) return false; $tasks[] = ['url'=>$url, 'status'=>'pending', 'created_at'=>time(), 'updated_at'=>time(), 'post_id'=>0, 'detailed'=>$detailed, 'retry_count'=>0]; $storage->setValue('batch_tasks', $tasks, 'array'); return true; } private function logDebug($msg) { $msg = self::cleanUtf8((string)$msg); $msg = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/u', '', $msg); $log = "[" . date('Y-m-d H:i:s') . "] " . $msg . "\n"; $file = __DIR__ . 
'/chuang_ailoot_debug.log'; if (file_exists($file) && filesize($file) > 10*1024*1024) { $lines = file($file); $lines = array_slice($lines, -1000); file_put_contents($file, implode('', $lines)); } @file_put_contents($file, $log, FILE_APPEND); } // ========== 更新任务相关方法 ========== public function addUpdateTasks($post_ids, $sortids = []) { if (empty($post_ids)) return 0; if (!is_array($post_ids)) { $post_ids = array_filter(array_map('trim', explode("\n", $post_ids))); } $post_ids = array_map('intval', $post_ids); $post_ids = array_filter($post_ids, function($id) { return $id > 0; }); if (empty($post_ids)) return 0; $storage = Storage::getInstance(self::ID); $update_tasks = $storage->getValue('update_tasks') ?: []; $added = 0; foreach ($post_ids as $post_id) { $exists = false; foreach ($update_tasks as $task) { if ($task['post_id'] == $post_id) { $exists = true; break; } } if ($exists) continue; $db = Database::getInstance(); $sql = "SELECT gid, title FROM " . DB_PREFIX . "blog WHERE gid = {$post_id} AND hide = 'n' LIMIT 1"; $articleRow = $db->once_fetch_array($sql); if (!$articleRow) { $this->logDebug("文章ID {$post_id} 不存在或已隐藏,跳过"); continue; } $nav_url = ''; $navTable = DB_PREFIX . 'chuang_nav'; $sql = "SELECT `value` FROM `{$navTable}` WHERE `gid` = " . intval($post_id) . " LIMIT 1"; $navRow = $db->once_fetch_array($sql); if ($navRow && !empty($navRow['value'])) { $nav_data = @unserialize($navRow['value']); if (is_array($nav_data) && isset($nav_data['chuang_url'])) { $nav_url = $nav_data['chuang_url']; } } if (empty($nav_url)) { $sql2 = "SELECT `value` FROM `{$navTable}` WHERE `id` = " . intval($post_id) . 
" LIMIT 1"; $navRow2 = $db->once_fetch_array($sql2); if ($navRow2 && !empty($navRow2['value'])) { $nav_data = @unserialize($navRow2['value']); if (is_array($nav_data) && isset($nav_data['chuang_url'])) { $nav_url = $nav_data['chuang_url']; } } } if (empty($nav_url)) { $this->logDebug("文章ID {$post_id} 无目标URL,跳过"); continue; } $this->logDebug("文章ID {$post_id} 找到URL: {$nav_url}"); $task_data = [ 'post_id' => $post_id, 'title' => $articleRow['title'], 'url' => $nav_url, 'status' => 'pending', 'created_at' => time(), 'updated_at' => time(), 'detailed' => 1, 'retry_count' => 0 ]; if (!empty($sortids) && is_array($sortids)) { $task_data['sortids'] = array_map('intval', $sortids); } $update_tasks[] = $task_data; $added++; } if ($added > 0) { $storage->setValue('update_tasks', $update_tasks, 'array'); } return $added; } public function getUpdateTasks() { $storage = Storage::getInstance(self::ID); return $storage->getValue('update_tasks') ?: []; } public function getUpdateStats() { $tasks = $this->getUpdateTasks(); $stats = ['total'=>0, 'pending'=>0, 'processing'=>0, 'completed'=>0, 'failed'=>0]; foreach ($tasks as $task) { $stats['total']++; $s = $task['status'] ?? ''; if (isset($stats[$s])) $stats[$s]++; } return $stats; } private function handleUpdateBatchAction() { $action = Input::getStrVar('update_action'); $task_index = Input::getIntVar('task_index', -1); switch ($action) { case 'run_update_task': $this->processSingleUpdateTask($task_index); break; case 'run_all_update_tasks': $this->processAllUpdateTasks(); break; case 'delete_update_task': $this->deleteUpdateTask($task_index); break; case 'clear_completed_update': $this->clearCompletedUpdateTasks(); break; case 'clear_all_update': $this->clearAllUpdateTasks(); break; case 'retry_update_task': $this->retryUpdateTask($task_index); break; } header('Location: ' . BLOG_URL . 'admin/plugin.php?plugin=' . self::ID . '&tab=update&t=' . 
time()); exit; } public function processSingleUpdateTask($index) { $result = $this->processUpdateBatchTask(); $_SESSION['chuang_ailoot_message'] = $result['success'] ? '更新任务执行成功:'.$result['message'] : '更新任务执行失败:'.$result['message']; } public function processAllUpdateTasks() { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('update_tasks') ?: []; $c = 0; $f = 0; $t = 0; foreach ($tasks as $i => $task) { if ($task['status'] === 'pending' || $task['status'] === 'failed') { $t++; $res = $this->processUpdateBatchTask(); if ($res['success']) $c++; else $f++; sleep(5); } } $_SESSION['chuang_ailoot_message'] = $t ? "批量更新执行完成:总计{$t},成功{$c},失败{$f}" : "没有待执行的更新任务"; } public function processUpdateBatchTask($retry = false) { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('update_tasks') ?: []; $index = null; foreach ($tasks as $i => $task) { if ($task['status'] === 'pending') { $index = $i; break; } } if ($index === null) { foreach ($tasks as $i => $task) { if ($task['status'] === 'failed' && ($task['retry_count'] ?? 0) < 3) { $index = $i; break; } } } if ($index === null) return ['success' => false, 'message' => '无待处理更新任务', 'details' => '']; $post_id = $tasks[$index]['post_id']; $url = $tasks[$index]['url']; $detailed = $tasks[$index]['detailed'] ?? 1; $sortids = $tasks[$index]['sortids'] ?? []; $tasks[$index]['status'] = 'processing'; $tasks[$index]['updated_at'] = time(); $tasks[$index]['retry_count'] = ($tasks[$index]['retry_count'] ?? 
0) + 1; $storage->setValue('update_tasks', $tasks, 'array'); $success = false; $error_msg = ''; try { $this->logDebug("开始处理更新任务 #{$index}: 文章ID={$post_id}, URL={$url}"); $result = $this->generateArticleFromUrl($url, $detailed, true, $post_id, $retry); if (!$result['success']) { if (!$retry && strpos($result['error'], '内容不足') !== false) { $this->logDebug("首次抓取内容不足,尝试增强抓取..."); $result = $this->generateArticleFromUrl($url, $detailed, true, $post_id, true); } if (!$result['success']) throw new Exception($result['error'] ?? '生成失败'); } $options = ['sortids' => $sortids]; $updated = $this->updateArticle($post_id, $result, $options); if (!$updated) throw new Exception('文章更新失败'); $success = true; $this->logDebug("更新任务完成,文章ID: {$post_id}"); } catch (Exception $e) { $error_msg = $e->getMessage(); $this->logDebug("更新任务失败: " . $error_msg); } $tasks = $storage->getValue('update_tasks') ?: []; if (isset($tasks[$index])) { if ($success) { $tasks[$index]['status'] = 'completed'; $tasks[$index]['error'] = ''; } else { $tasks[$index]['status'] = 'failed'; $tasks[$index]['error'] = $error_msg; if (strpos($error_msg, '别名已存在') !== false) { $tasks[$index]['retry_count'] = 3; } } $tasks[$index]['updated_at'] = time(); $storage->setValue('update_tasks', $tasks, 'array'); } if ($success) { return ['success' => true, 'message' => "文章更新成功 ID: {$post_id}", 'post_id' => $post_id, 'details' => "URL: {$url}"]; } else { return ['success' => false, 'message' => $error_msg, 'details' => "URL: {$url}"]; } } private function updateArticle($post_id, $aiData, $options = []) { $db = Database::getInstance(); $db->query("SET NAMES utf8mb4"); $aiData['title'] = self::cleanUtf8($aiData['title']); $aiData['content'] = self::cleanUtf8($aiData['content']); $aiData['excerpt'] = self::cleanUtf8($aiData['excerpt'] ?? 
''); $excerpt = ''; if (!empty($aiData['excerpt'])) { $excerpt = strip_tags($aiData['excerpt']); $excerpt = html_entity_decode($excerpt, ENT_QUOTES, 'UTF-8'); $excerpt = mb_substr($excerpt, 0, 15, 'UTF-8'); } $title = $db->escape_string($aiData['title']); $content = $db->escape_string($aiData['content']); $excerpt_escaped = $db->escape_string($excerpt); $sql_get_alias = "SELECT alias, date FROM " . DB_PREFIX . "blog WHERE gid = " . intval($post_id); $row_alias = $db->once_fetch_array($sql_get_alias); $alias = $db->escape_string($row_alias['alias'] ?? ''); $original_date = $row_alias['date'] ?? time(); $sql = "UPDATE " . DB_PREFIX . "blog SET title = '{$title}', content = '{$content}', excerpt = '{$excerpt_escaped}', alias = '{$alias}', date = {$original_date} WHERE gid = {$post_id}"; if (!$db->query($sql)) { $this->logDebug("更新文章基本信息失败: " . $db->error()); return false; } $this->logDebug("跳过标签更新,保留原有标签"); if (!empty($options['sortids']) && is_array($options['sortids'])) { $sortids = array_map('intval', $options['sortids']); $sortids = array_filter($sortids, function($id) { return $id > 0; }); if (!empty($sortids)) { $main_sortid = intval($sortids[0]); $db->query("UPDATE " . DB_PREFIX . "blog SET sortid = {$main_sortid} WHERE gid = " . intval($post_id)); $meta_table = DB_PREFIX . 'meta'; $type = 'sort'; $db->query("DELETE FROM {$meta_table} WHERE gid = " . intval($post_id) . " AND type = 'sort'"); foreach ($sortids as $sid) { $db->query("INSERT INTO {$meta_table} (gid, metaid, type) VALUES (" . intval($post_id) . ", {$sid}, 'sort')"); } $this->logDebug("多分类更新成功: " . implode(',', $sortids)); } } else { $this->logDebug("未指定多分类,保持原有分类"); } if (!empty($aiData['cover_url'])) { $this->setPostCover($post_id, $aiData['cover_url']); } if (!empty($aiData['seo_title']) || !empty($aiData['seo_description'])) { $this->saveTdk($post_id, $aiData['seo_title'] ?? '', $aiData['seo_description'] ?? 
'', ''); } $this->saveNavFields($post_id, $aiData); if (class_exists('Cache')) { try { Cache::getInstance()->updateCache(); $this->logDebug("全站缓存已刷新"); } catch (Exception $e) { $this->logDebug("缓存刷新失败: " . $e->getMessage()); } } return true; } public function deleteUpdateTask($index) { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('update_tasks') ?: []; if (isset($tasks[$index])) { unset($tasks[$index]); $tasks = array_values($tasks); $storage->setValue('update_tasks', $tasks, 'array'); $_SESSION['chuang_ailoot_message'] = '更新任务已删除'; } } public function retryUpdateTask($index) { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('update_tasks') ?: []; if (isset($tasks[$index])) { $tasks[$index]['status'] = 'pending'; $tasks[$index]['retry_count'] = 0; $tasks[$index]['updated_at'] = time(); $storage->setValue('update_tasks', $tasks, 'array'); $_SESSION['chuang_ailoot_message'] = '更新任务已重新加入队列'; } } public function clearCompletedUpdateTasks() { $storage = Storage::getInstance(self::ID); $tasks = $storage->getValue('update_tasks') ?: []; $remaining = []; foreach ($tasks as $task) { if ($task['status'] !== 'completed') { $remaining[] = $task; } } $storage->setValue('update_tasks', $remaining, 'array'); $_SESSION['chuang_ailoot_message'] = '已清理完成的更新任务'; } public function clearAllUpdateTasks() { $storage = Storage::getInstance(self::ID); $storage->setValue('update_tasks', [], 'array'); $_SESSION['chuang_ailoot_message'] = '已清空所有更新任务'; } public function handleUpdateCron() { header('Content-Type: application/json; charset=utf-8'); set_time_limit(600); $token = isset($_GET['token']) ? 
trim($_GET['token']) : ''; if (!$this->verifyCronToken($token)) exit(json_encode(['success' => false, 'message' => 'Token验证失败'])); $storage = Storage::getInstance(self::ID); $paused = $storage->getValue('cron_paused', false); if ($paused) { $this->logDebug('定时任务已暂停,跳过执行'); exit(json_encode(['success' => false, 'message' => '定时任务已暂停'])); } $start = time(); $max_execution_time = 180; $success_count = 0; $fail_count = 0; $results = []; $this->logDebug('更新定时任务开始(连续处理模式)'); while (true) { if (time() - $start >= $max_execution_time) { $this->logDebug("达到最大执行时间,停止处理"); break; } $result = $this->processUpdateBatchTask(); if (!$result['success']) { if ($result['message'] === '无待处理更新任务') { $this->logDebug("无待处理更新任务,结束循环"); break; } $fail_count++; $results[] = $result; continue; } $success_count++; $results[] = $result; usleep(500000); } $total_time = time() - $start; $this->logDebug("更新定时任务结束,耗时:{$total_time}秒,成功:{$success_count},失败:{$fail_count}"); $summary = [ 'success' => true, 'message' => "更新定时任务执行完毕,成功:{$success_count},失败:{$fail_count},耗时:{$total_time}秒", 'success_count' => $success_count, 'fail_count' => $fail_count, 'results' => $results, ]; $this->logCronExecution($summary); echo json_encode($summary); exit; } public function getUpdateCronUrl() { return BLOG_URL . '?ai_cron_update=1&token=' . $this->generateCronToken(); } } ChuangAiLootBatch::getInstance()->init(); <!doctype html> <html lang="zh-CN"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0"> <meta http-equiv="X-UA-Compatible" content="ie=edge"> <meta name="keywords" content="大语言模型,OpenAI,人工智能,智能体开发"/> <meta name="description" content="2026年4月16日,人工智能研发机构OpenAI正式推出智能体软件开发工具包Agents SDK的重大更新,本次更新由刘煜编译、陈骏达编辑,新增原生沙箱执行环境、升级分布式管控框架,实现管控与计算资源分离,解决了长任务整体崩溃的行业痛点,新功能沿用原有API标准定价,目前已通过API面向所有客户开放。"/> <meta name="generator" content="emlog pro"/> <title>OpenAI更新智能体SDK 新增沙箱解决长任务崩溃问题

OpenAI Updates Agents SDK, Adding a Sandbox to Address Long-Task Crashes

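On April 16, 2026, the AI research organization OpenAI officially released a major update to Agents SDK, its software development kit for building agents. The update adds a native sandbox execution environment and upgrades the distributed control framework so that control is separated from compute resources, addressing the industry pain point of long-running tasks crashing as a whole. The new features follow the existing standard API pricing and are now available to all customers through the API. This report was compiled by Liu Yu and edited by Chen Junda.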

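AI agents have become one of the core directions for bringing large models into industry. Developers increasingly use agents for complex, multi-step, long-running tasks, from batch-processing thousands of documents to debugging a code project continuously, and the range of scenarios keeps widening. Yet whole-task crashes have long been a common pain point: if any single step fails during execution, the progress of the entire task is often lost. Running without isolation also carries the security risk of operations exceeding their authority, which has left many enterprises unwilling to hand core business to AI agents.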

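OpenAI's update to Agents SDK targets these pain points directly. The central change is a native sandbox execution environment, which in effect gives an AI agent its own fully controlled, isolated space to run in. All operations complete inside the sandbox, which keeps abnormal operations from affecting external systems and blocks potential security risks.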
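The isolation idea in the paragraph above can be illustrated in miniature. The following is a minimal sketch, not the Agents SDK's sandbox and not its API: `run_isolated` and `StepResult` are hypothetical names introduced here. It assumes that each step runs in a separate Python process inside a throwaway directory with a timeout, so a crashing step reports an error to the caller instead of taking the caller down. A child process is much weaker than a real sandbox, since it places no restrictions on file-system or network access.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass


@dataclass
class StepResult:
    ok: bool
    output: str
    error: str


def run_isolated(code: str, timeout_s: float = 10.0) -> StepResult:
    """Run `code` in a child Python process inside a temporary directory.

    Hypothetical helper for illustration only; a real sandbox would also
    restrict file-system, network, and system-call access.
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                # -I: isolated mode, ignores user site-packages and PYTHON* env vars
                [sys.executable, "-I", "-c", code],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return StepResult(ok=False, output="", error="timed out")
    return StepResult(ok=proc.returncode == 0, output=proc.stdout, error=proc.stderr)


if __name__ == "__main__":
    good = run_isolated("print(21 * 2)")
    bad = run_isolated("raise RuntimeError('step failed')")
    # The failing step is reported as a result; the calling process keeps running.
    print(good.ok, good.output.strip())
    print(bad.ok)
```

The caller inspects `StepResult.ok` after each step and decides whether to retry or continue, which is the property that matters for long tasks: one failed step does not discard the work already done.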

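The update also upgrades the distributed control framework: agents can process files within a designated workspace and can call only the tools the developer has authorized in advance, which further tightens permission boundaries. The more important design decision is the separation of the control framework from compute resources. This preserves security and stability without limiting how far agents can scale, so developers can allocate resources as needed to support larger long-running tasks.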
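The two permission boundaries described above, a designated workspace and a pre-authorized tool list, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the Agents SDK's interface: `Workspace`, `ToolRegistry`, and both exception classes are hypothetical names, and each tool is assumed to take a single file path.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict


class ToolNotAuthorized(Exception):
    """Raised when a tool was requested that was not pre-authorized."""


class PathOutsideWorkspace(Exception):
    """Raised when a path resolves outside the designated workspace."""


class Workspace:
    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()

    def resolve(self, relative: str) -> Path:
        # Resolve ".." and symlinks first, then check containment.
        candidate = (self.root / relative).resolve()
        if candidate != self.root and self.root not in candidate.parents:
            raise PathOutsideWorkspace(relative)
        return candidate


class ToolRegistry:
    """Hypothetical allowlist: only tools registered up front can be called."""

    def __init__(self, workspace: Workspace) -> None:
        self.workspace = workspace
        self._tools: Dict[str, Callable[[Path], str]] = {}

    def authorize(self, name: str, fn: Callable[[Path], str]) -> None:
        self._tools[name] = fn

    def call(self, name: str, relative_path: str) -> str:
        if name not in self._tools:
            raise ToolNotAuthorized(name)
        # Every path argument is confined to the workspace before the tool runs.
        return self._tools[name](self.workspace.resolve(relative_path))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "note.txt").write_text("hello", encoding="utf-8")
        registry = ToolRegistry(Workspace(d))
        registry.authorize("read_text", lambda p: p.read_text(encoding="utf-8"))
        print(registry.call("read_text", "note.txt"))
```

Checking containment after `resolve()` rather than on the raw string means that `..` segments and absolute paths are both rejected by the same test.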

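Information and review statement: Some information in this article is compiled from publicly available Internet sources, and it also includes hands-on testing carried out independently by the AI创作导航 team. We strive for objective and accurate content, but because tool features, pricing, and policies may change at any time, all information is for reference only; please check the official website before use. The views in this article do not constitute advice for any decision, and readers must assess and bear the risks of use themselves. If you find errors or infringing content, please let us know and we will verify and address it promptly.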