20-F
BAIDU, INC. filed this Form 20-F on 03/31/2017

inputted by our users, allowing us to provide more relevant search results to users. Since 2013, we have applied deep learning technology in our search ranking system, and this technology plays an increasingly important role in search.

Information Extraction. We extract information from web pages using high-performance algorithms and information extraction techniques. Our techniques enable us to understand web page content, remove extraneous data, build link structures, identify duplicate and junk pages, and decide whether to include or exclude a web page based on its quality. Our techniques can process millions of web pages quickly. In addition, our anti-spam algorithms and tools can identify and respond to spam web pages quickly and effectively.
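The filing does not say how duplicate pages are identified. As a purely illustrative sketch, one widely used textbook approach compares the sets of overlapping word windows ("shingles") of two pages and flags them as near-duplicates when their Jaccard similarity reaches a threshold. The function names, the shingle size `k` and the `0.9` threshold below are assumptions for illustration, not Baidu's parameters.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (contiguous word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts yield a single shingle made of all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(page_a: str, page_b: str, threshold: float = 0.9, k: int = 3) -> bool:
    """Hypothetical helper: flag two pages whose shingle sets are sufficiently similar."""
    return jaccard(shingles(page_a, k), shingles(page_b, k)) >= threshold
```

Comparing every pair of pages this way does not scale to billions of documents; large systems usually compress shingle sets into compact signatures (for example with MinHash) before comparison.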

Web Crawling. Our powerful computer clusters and intelligent scheduling algorithms allow us to crawl web pages efficiently, and we can easily scale up our system to collect an ever-growing number of Chinese web pages. Our spider technology enables us to refresh web indices at intervals ranging from every few minutes to every few weeks. We set the index refresh frequency based on our knowledge of internet search users' needs and the nature of the information. For example, because timely information is important for news, our news index is typically updated every five minutes throughout the day, and can be updated as frequently as every minute. We also mine multimedia and other forms of files from web page repositories.
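A refresh policy in which each class of content is re-crawled on its own interval can be sketched with a priority queue keyed on each URL's next due time. This is a minimal, hypothetical illustration: the class and method names and the example URLs are invented, and apart from the five-minute news interval taken from the text, any intervals passed to it are assumptions.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class CrawlTask:
    next_due: float                           # time (seconds) at which the URL is due
    url: str = field(compare=False)
    interval: float = field(compare=False)    # refresh interval in seconds


class RefreshScheduler:
    """Hypothetical scheduler: a min-heap that always yields the URL due soonest."""

    def __init__(self) -> None:
        self._heap: list[CrawlTask] = []

    def add(self, url: str, interval: float, now: float = 0.0) -> None:
        if interval <= 0:
            raise ValueError("refresh interval must be positive")
        heapq.heappush(self._heap, CrawlTask(now + interval, url, interval))

    def due(self, now: float) -> list[str]:
        """Pop every URL due at or before `now` and reschedule it one interval later."""
        ready: list[str] = []
        while self._heap and self._heap[0].next_due <= now:
            task = heapq.heappop(self._heap)
            ready.append(task.url)
            # Because interval > 0, the rescheduled task is due after `now`,
            # so this loop always terminates.
            heapq.heappush(self._heap, CrawlTask(now + task.interval, task.url, task.interval))
        return ready
```

For example, a news URL added with `interval=300` (five minutes) is returned by `due(300)` and again by `due(600)`, while a URL added with a weekly interval is not returned until `due(604800)`. Each push and pop on the heap costs O(log n), so finding the next URL to refresh stays cheap as the crawl frontier grows.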

Knowledge Graph. We build our knowledge graph by extracting and aggregating content from multiple sources and classifying it into billions of entities, each of which is a well-defined structured data record consisting of various attributes and operations. We have also developed applied technology based on our knowledge graph that uses existing data to generate rich new knowledge that satisfies users' demands. Our knowledge graph provides powerful connections between entities and online services in a wide range of areas.
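A minimal sketch of an entity structure like the one described above, assuming each entity is a record with an identifier, a type, attribute key/value pairs and named relations to other entities. Aggregating content from multiple sources is modeled here as merging records that share an identifier. All names are hypothetical, and the "operations" mentioned in the text are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity record: a typed node with attributes and links to other entities."""
    entity_id: str
    entity_type: str
    attributes: dict[str, str] = field(default_factory=dict)
    relations: dict[str, list[str]] = field(default_factory=dict)  # relation -> target ids


class KnowledgeGraph:
    def __init__(self) -> None:
        self.entities: dict[str, Entity] = {}

    def merge(self, entity: Entity) -> None:
        """Aggregate a record from another source into any existing entity with the same id."""
        existing = self.entities.get(entity.entity_id)
        if existing is None:
            self.entities[entity.entity_id] = entity
            return
        # Keep attributes already present; add only those the new source contributes.
        for key, value in entity.attributes.items():
            existing.attributes.setdefault(key, value)
        # Union relation targets without introducing duplicates.
        for relation, targets in entity.relations.items():
            bucket = existing.relations.setdefault(relation, [])
            for target in targets:
                if target not in bucket:
                    bucket.append(target)

    def neighbors(self, entity_id: str, relation: str) -> list[Entity]:
        """Follow one named relation from an entity to the entities it points to."""
        entity = self.entities.get(entity_id)
        if entity is None:
            return []
        return [self.entities[t] for t in entity.relations.get(relation, []) if t in self.entities]
```

The first-source-wins rule in `merge` is only one possible conflict policy; a real system would need some way to decide which source to trust when attribute values disagree.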

Natural Language Processing. We analyze and understand user queries and web pages by using various natural language processing techniques, including, among others, word segmentation, named entity recognition, entity linking, syntax and semantic analysis, sentiment analysis, summarization, generation, paraphrasing and language-dependent encoding, all of which enhance the accuracy of our search results. For Q&A type searches, we provide relevant and in-depth answers to search inquiries by using our deep analysis and learning technology to locate, summarize and consolidate relevant information from massive amounts of data. For voice search, we understand user queries via context-aware analysis and provide answers via dialogue management and generation technologies. For feed recommendation, we model both users and content from a variety of semantic perspectives to improve the accuracy and diversity of recommendations.
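Word segmentation is needed because written Chinese does not separate words with spaces. The filing does not describe Baidu's segmenter; as an illustrative baseline only, the classic dictionary-based forward maximum matching algorithm below greedily takes the longest vocabulary word at each position and falls back to a single character when no word matches. The vocabulary and the `max_len` default are assumptions.

```python
def forward_max_match(text: str, vocab: set[str], max_len: int = 4) -> list[str]:
    """Greedy forward maximum matching segmentation over a fixed vocabulary."""
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it until a vocabulary word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted, which guarantees progress.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens
```

With a vocabulary containing 百度, 搜索, 引擎 and 搜索引擎, the text 百度搜索引擎 segments into ['百度', '搜索引擎']. Greedy matching is simple and fast but can split ambiguous strings incorrectly, which is one reason statistical and neural segmenters are generally preferred in practice.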

Multimedia Technologies. We are developing intelligent algorithms and systems to better understand spoken human language, identify audio content, and recognize the meaning of images and videos. These technologies will enable users to access information in the most natural way and help our search engine better organize the vast amount of multimedia content on the web. For example, our speech recognition technology has been applied to our mobile search on smartphones, and our face recognition technology has been applied to return relevant photos when a person is searched for. We have also launched a similar-image search engine, which can recognize the object and scene in the image that a user searches with and return an image containing the most similar object and scene.
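Similar-image search of this kind is commonly framed as nearest-neighbor search over image feature vectors. The sketch below assumes each image has already been converted into a numeric feature vector by some recognition model (not shown, and not described in the filing) and ranks indexed images by cosine similarity to the query vector. The function names, image ids and toy vectors are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query: list[float], index: dict[str, list[float]], top_k: int = 1) -> list[str]:
    """Return ids of the top_k indexed images whose vectors are closest to the query."""
    ranked = sorted(index, key=lambda image_id: cosine(query, index[image_id]), reverse=True)
    return ranked[:top_k]
```

This linear scan compares the query against every indexed image; at web scale a system would typically use an approximate nearest-neighbor index instead.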

Aladdin aims to discover useful information in the "Hidden Web," which usually refers to the invisible databases of numerous websites and the parts of the internet that traditional search engine technology may not be able to index. The resulting Aladdin platform enriches our search index and hence provides richer search results to our users. Our Aladdin platform not only provides a better and faster way to integrate new "hidden web" information into our search index, but also revolutionizes the presentation of the search result page.

MIP (Mobile Instant Pages) is a set of open technical standards for mobile webpages that accelerates their loading by adopting MIP-HTML norms, the MIP-JS operating environment and the MIP-Cache system. When mobile websites use this backend technology, the speed at which they can be visited

 
