Artificial intelligence is entering a new phase in 2025. For years, AI tools mainly answered questions or generated content on command; this year's innovation is about AI actually getting work done. The 2025 Forbes AI 50 list illustrates this pivotal shift: the startups on it mark the transition from AI that merely responds to prompts to AI that solves problems and completes entire workflows. Despite all the discussion around large AI model makers such as OpenAI, Anthropic, and xAI, the biggest change in 2025 lies in the application-layer tools that use AI to produce real business results.
From Chatbots to Completing Workflows
Historically, an AI assistant could chat or provide information, but a human still had to act on the output. In 2025 that is changing. Legal AI startup Harvey, for example, has shown that its software can not only answer legal questions but also handle entire legal workflows, from document review to predictive case analysis. Its platform can draft documents, suggest revisions, and even help automate negotiation, case management, and client outreach: tasks that would normally take a team of junior lawyers. That achievement earned Harvey its place on the Forbes AI 50 list, and it embodies AI's evolution from a helpful tool into a hands-on problem solver.
Enterprise Tools Leading the Way
Many of the standout AI 50 companies are enterprise tools that do real work on the job. Sierra and Cursor (the sole product of startup Anysphere) are emblematic of the new generation of business AI. Sierra automates customer service while dramatically improving the experience, opening the way for companies to assist customers around the clock. Cursor, meanwhile, has taken the software developer community by storm. Its technology not only autocompletes lines of code (like GitHub Copilot) but can also generate entire features and applications from plain-English requests.
Robotics on the Rise
Although we have not yet reached the point where companies deploy robots at scale, robotics-focused startups have made meaningful progress over the past year by integrating models built with transformers (the "t" in ChatGPT) into hardware. Robotics featured prominently in the keynote at Nvidia's recent developer conference, where Jensen Huang claimed that "physical AI for industry and robotics is a $50 trillion opportunity." Figure AI recently announced its BotQ high-volume manufacturing facility, capable of producing 12,000 humanoid robots per year, along with Helix, its new generalist vision-language-action (VLA) model. Another AI 50 company, Skild AI, is taking a different approach: rather than building its own robots, it focuses on Skild Brain, a general-purpose robot foundation model that can be integrated into a wide variety of robots. Skild also plans to sell services to the robotics industry for robots that run Skild Brain.
Consumers on the Cusp of AI Productivity
So far, everyday consumers have mostly encountered advanced AI through chatbots such as OpenAI's ChatGPT or Anthropic's Claude (and newcomers like Grok). Hundreds of millions of people have tried these AI tools, but we have yet to see a truly mainstream application in which AI handles everyday tasks for people. That could change in 2026. As the technology matures and people recognize how much time and money AI can save at work, expect a wave of consumer-facing AI products that can handle entire tasks on a user's behalf. Anthropic, for example, recently launched Claude Code, which lets consumers write software. Capabilities like these raise the prospect of AI that can manage your schedule, book your travel, or organize your files end to end.
Building a Personal AI Knowledge Base with Ollama+DeepSeek+AnythingLLM
DeepSeek has been subjected to a large number of overseas attacks over the past month. Since January 27, the attacks have escalated: in addition to DDoS attacks, analysis has revealed numerous brute-force password attacks, causing frequent service outages. In the comments on my last article, someone suggested that a GUI wrapper like AnythingLLM would make things more comfortable, and the official website's service has also been a bit unstable, so today I decided to try a simple GUI deployment at home. First, DeepSeek needs to be deployed locally; you can refer to my previous article for that. Once the deployment is complete, I'll add the wrapper on top.
The local deployment of DeepSeek has been covered in detail in a previous post, so I won't repeat it here. You can download AnythingLLM from the official website:
https://anythingllm.com/desktop
Just keep clicking "Next" until the installation is complete. Once it's installed, start the configuration: select "Ollama" as the LLM provider and leave the address at its default setting. Then create a workspace, apply the settings as shown, and it should be good to go. After that you can start chatting and asking questions happily!
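Before pointing AnythingLLM at Ollama, it can help to confirm that the locally deployed DeepSeek model actually responds. Below is a minimal sketch that queries Ollama's local REST API; it assumes Ollama is listening on its default port 11434 and that the model was pulled under the tag deepseek-r1:7b (adjust MODEL to whatever tag you used).

```python
import json
import urllib.request

# Assumptions: Ollama is listening on its default address (localhost:11434)
# and the DeepSeek model was pulled as "deepseek-r1:7b" -- change MODEL if
# you used a different tag.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "deepseek-r1:7b"

payload = json.dumps({
    "model": MODEL,
    "prompt": "Reply with the single word: ready",
    "stream": False,  # ask for one complete JSON response instead of a stream
}).encode("utf-8")

req = urllib.request.Request(
    OLLAMA_URL,
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req, timeout=120) as resp:
    body = json.loads(resp.read().decode("utf-8"))

# If this prints a sensible answer, the model is up and reachable.
print(body.get("response", "").strip())
```

If the request succeeds, the same http://localhost:11434 address is what goes into AnythingLLM's Ollama settings.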
Resource Compilation | 32 Python Crawler Projects to Satisfy Your Appetite!
Today, I have compiled a list of 32 Python web scraping projects for everyone.
I’ve gathered these projects because web scraping is a simple and fast way to get started with Python, and it’s also great for beginners to build confidence. All the links point to GitHub, so have fun exploring! O(∩_∩)O~
WechatSogou [1] - A WeChat public account crawler based on Sogou's WeChat search. It can be expanded to a Sogou search-based crawler, returning a list where each item is a dictionary of detailed public account information.
DouBanSpider [2] - A Douban book crawler. It can scrape all books under Douban's book tags, rank them by rating, and store them in Excel for easy filtering, such as finding highly-rated books with over 1,000 reviewers. Different topics can be saved in separate sheets. It uses User Agent spoofing and random delays to mimic browser behavior and avoid being blocked (see the sketch of this anti-blocking pattern after the list).
zhihu_spider [3] - A Zhihu crawler. This project scrapes user information and social network relationships on Zhihu. It uses the Scrapy framework and stores data in MongoDB.
bilibili-user [4] - A Bilibili user crawler. Total data: 20,119,918. Fields include user ID, nickname, gender, avatar, level, experience points, followers, birthday, address, registration time, signature, and more. It generates a Bilibili user data report after scraping.
SinaSpider [5] - A Sina Weibo crawler. It mainly scrapes user personal information, posts, followers, and followings. It uses Sina Weibo cookies for login and supports multiple accounts to avoid anti-scraping measures. It primarily uses the Scrapy framework.
distribute_crawler [6] - A distributed novel download crawler. It uses Scrapy, Redis, MongoDB, and graphite to implement a distributed web crawler. The underlying storage is a MongoDB cluster, distributed via Redis, and the crawler status is displayed using graphite. It mainly targets a novel website.
CnkiSpider [7] - A CNKI (China National Knowledge Infrastructure) crawler. After setting search conditions, it executes src/CnkiSpider.py to scrape data, stored in the /data directory. The first line of each data file contains the field names.
LianJiaSpider [8] - A Lianjia crawler. It scrapes historical second-hand housing transaction records in Beijing. It includes all the code from the Lianjia simulated login article.
scrapy_jingdong [9] - A JD.com crawler based on Scrapy. Data is saved in CSV format.
QQ-Groups-Spider [10] - A QQ group crawler. It batch scrapes QQ group information, including group name, group number, member count, group owner, group description, etc., and generates XLS(X) / CSV result files.
wooyun_public [11] - A WooYun crawler. It scrapes WooYun's public vulnerabilities and knowledge base. All public vulnerabilities are stored in MongoDB, taking up about 2GB. If the entire site, including text and images, is scraped for offline querying, it requires about 10GB of space and 2 hours (10M broadband). The knowledge base takes up about 500MB. Vuln search uses Flask as the web server and Bootstrap for the frontend.
spider [12] - A hao123 website crawler. Starting from hao123, it recursively follows outbound links, collecting URLs and recording the number of internal and external links on each page, along with the page title. Tested on Windows 7 32-bit, it can collect about 100,000 URLs every 24 hours.
findtrip [13] - A flight ticket crawler (Qunar and Ctrip). Findtrip is a Scrapy-based flight ticket crawler, currently integrating data from two major ticket websites in China (Qunar + Ctrip).
163spider [14] - A NetEase client content crawler based on requests, MySQLdb, and torndb.
doubanspiders [15] - A collection of Douban crawlers for movies, books, groups, albums, and more.
QQSpider [16] - A QQ Zone crawler, including logs, posts, personal information, etc. It can scrape 4 million pieces of data per day.
baidu-music-spider [17] - A Baidu MP3 site crawler, using Redis for resumable scraping.
tbcrawler [18] - A Taobao and Tmall crawler. It can scrape page information based on search keywords and item IDs, with data stored in MongoDB.
stockholm [19] - A stock data (Shanghai and Shenzhen) crawler and stock selection strategy testing framework. It scrapes stock data for all stocks in the Shanghai and Shenzhen markets over a selected date range. It supports defining stock selection strategies using expressions and multi-threading. Data is saved in JSON and CSV files.
BaiduyunSpider [20] - A Baidu Cloud crawler.
Spider [21] - A social data crawler. It supports Weibo, Zhihu, and Douban.
proxy pool [22] - A Python crawler proxy IP pool (a sketch of the basic proxy liveness check appears at the end of this list).
music-163 [23] - A crawler for scraping comments on all songs from NetEase Cloud Music.
jandan_spider [24] - A crawler for scraping images from Jiandan.
CnblogsSpider [25] - A Cnblogs list page crawler.
spider_smooc [26] - A crawler for scraping videos from MOOC.
CnkiSpider [27] - A CNKI crawler.
knowsecSpider2 [28] - A Knownsec crawler project.
aiss-spider [29] - A crawler for scraping images from the Aiss app.
SinaSpider [30] - A crawler that uses dynamic IPs to bypass Sina's anti-scraping mechanism for quick content scraping.
csdn-spider [31] - A crawler for scraping blog articles from CSDN.
ProxySpider [32] - A crawler for scraping and validating proxy IPs from Xici.
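Several of the projects above, DouBanSpider [2] for instance, rely on the same basic anti-blocking pattern: rotate the User-Agent header and sleep for a random interval between requests. The sketch below shows that pattern with the requests library; the target URL and the User-Agent strings are placeholders for illustration, not values taken from any of the listed projects.

```python
import random
import time

import requests

# Placeholder User-Agent strings -- swap in whatever browsers you want to mimic.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]

def polite_get(url: str) -> requests.Response:
    """Fetch a page with a randomly chosen User-Agent and a random delay."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    time.sleep(random.uniform(1.0, 3.0))  # random delay so requests look less bot-like
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()
    return resp

if __name__ == "__main__":
    # Hypothetical target page -- replace with whatever you are scraping.
    page = polite_get("https://example.com/books?tag=python")
    print(page.status_code, len(page.text))
```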
Update:
webspider [33] - This system is a job data crawler primarily using Python 3, Celery, and requests. It implements scheduled tasks, error retries, logging, and automatic cookie changes. It uses ECharts + Bootstrap for frontend pages to display the scraped data.
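Projects like proxy pool [22] and ProxySpider [32] revolve around one core step: checking whether a harvested proxy actually works before putting it into rotation. Here is a minimal sketch of that check using requests; the test URL, timeout, and candidate addresses are arbitrary choices for this example, not values from either project.

```python
import requests

def proxy_is_alive(proxy: str, test_url: str = "https://httpbin.org/ip", timeout: float = 5.0) -> bool:
    """Return True if an HTTP request routed through `proxy` succeeds within `timeout` seconds.

    `proxy` is expected in the form "host:port"; the test URL and timeout are
    arbitrary defaults chosen for this sketch.
    """
    proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    try:
        resp = requests.get(test_url, proxies=proxies, timeout=timeout)
        return resp.ok
    except requests.RequestException:
        return False

if __name__ == "__main__":
    # Hypothetical candidates -- in a real pool these would come from a scraper.
    candidates = ["127.0.0.1:8080", "10.0.0.5:3128"]
    alive = [p for p in candidates if proxy_is_alive(p)]
    print("usable proxies:", alive)
```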