Google’s PageRank and Beyond: The Science of Search Engine Rankings

Why doesn’t your home page appear on the first page of search results, even when you query your own name? How do other Web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of Web page rankings, Google’s PageRank and Beyond supplies the answers to these and other questions.

The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research.

The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB codes and links to sample Web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text.

Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided.

  • Many illustrative examples and entertaining asides
  • MATLAB code
  • Accessible and informal style
  • Complete and self-contained section for mathematics review

15 Responses to “Google’s PageRank and Beyond: The Science of Search Engine Rankings”

  • HonestAbe says:

    An excellent introduction to the algorithms and mathematical bases underlying PageRank and other link-based algorithms
    Rating:5 out of 5 stars
    The authors subdivide the book into two main sections: the first few chapters, which are conversational in the manner in which they address PageRank and similar algorithms, and the subsequent chapters, which grow increasingly mathematical. Both authors have strong backgrounds in mathematics, hence that focus. Understanding that, the book is very approachable, lucid and useful in understanding the treated subject matter.

  • Geraldo XEXEO says:

    Good review
    Rating:4 out of 5 stars
    This book is a good review of the mathematics behind PageRank and other algorithms, such as HITS. It can be used as an auxiliary text in both graduate and undergraduate courses.

    The main characteristic of the book is that it covers a subject that is not present in most textbooks in the area.

  • Kevin A. Burton says:

    Great book…
    Rating:5 out of 5 stars
    Great book. It’s nice to have all the recent work on trust metrics in one place.

  • Truyen Tran says:

    A bit disappointed
    Rating:3 out of 5 stars
    I’ve read Langville’s papers as part of my study on link-based ranking techniques. However, the book is only intended to be a very gentle introduction for people with a good maths background who only want to play with the maths behind PageRank. I would have expected more comprehensive material and deeper insights into the technology for search engine ranking.

  • M. Clements says:

    Truly PageRank and beyond
    Rating:5 out of 5 stars
    Great book describing the algorithms that made current search engines so useful and popular. The book describes the math behind the PageRank and HITS algorithms, supported by MATLAB code. Wonderfully written!

    Do not buy this book if you want to know how to use search engines, only if you want to understand them!

  • Carl Cerecke says:

    The maths of Google
    Rating:3 out of 5 stars
    The subtitle “The science of search engine rankings” is a misnomer. This book is primarily about the *mathematics* of PageRank. For non-mathematicians, such as a computer scientist like myself (though I do have undergrad maths), it was pretty slow going and just plain boring.

    I wanted algorithm examples for PageRank calculation of largish (10M) data sets. Not MATLAB code. MATLAB might be great for people who love matrices and don’t mind being locked in to a proprietary language, but it is hardly a sensible choice for a production implementation of the PageRank algorithm. And an algorithm using matrix manipulation, while it might be mathematically nice, is difficult to implement efficiently without fancy matrix compression tricks (as far as I can tell).

    In the end, I discarded the book, and wrote my own shorter, simpler, non-matrix implementation in Python, verified it produced the same results, and then rewrote it in C. It is quite fast enough for 10M pages even without any fancy optimisations. Not a matrix in sight. Yay.

    For mathematicians, this book might deserve more than 3 stars. For computer scientists though, I wouldn’t recommend it.
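
The review above mentions a short, matrix-free PageRank implementation but does not show it. The following is a minimal Python sketch of that general idea, not the reviewer's code. The function name `pagerank`, the dict-of-lists input format, the damping factor of 0.85, the convergence tolerance, and the uniform redistribution of rank from pages without out-links are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Iterate PageRank over adjacency lists, with no explicit matrix.

    links: dict mapping each page to the list of pages it links to
           (hypothetical input format chosen for this sketch).
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}

        # Each page passes an equal share of its rank along each out-link.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share

        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


if __name__ == "__main__":
    toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
    for page, score in sorted(pagerank(toy_web).items()):
        print(page, round(score, 4))
```

Each iteration visits every link once, so the work per pass grows with the number of links rather than with the square of the number of pages.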

  • Golden Lion says:

    iPhone – Google – “Single function buttons hold promise”
    Rating:5 out of 5 stars
    Naisbitt wrote that major trends are the result of innovations discovered and happening locally. A person should be able to read his local newspaper, discover what products and services have come into demand, and reflect on the new emerging wave of productivity generated: a boost in lifestyle quality for the middle class, and a new wave of technology, education, and globalization surrounding the new products and services.

    1. It will take 70 years for lifestyle quality to double at a 1% increase in productivity per year. Computers and technology will change the flat productivity line.

    2. 20 years of prosperity will emerge for the middle class because of a boost in computers and technology. The US middle class will become more productive, wealthy, and skilled. US innovation will create new and better jobs. Jobs will be exported to economies with cheap labor.

    3. Change takes time. It takes time to change people’s minds. Change starts with reorganization of the workplace, allowing more rapid increases in productivity. The increase in productivity matches the equation rate of growth of living standards equals Gross Domestic Product. The reeducation of the workforce into knowledge-skilled workers will be accomplished by community colleges.

    4. Technology and computer software interactions will need to become simpler. The fact that software design is becoming simpler will cause middle-class wealth to surge upward. The army has taken steps to simplify the computer software and hardware running the A1 tank and retrofit upgrades into a new A12 tank. Handling of the tank was simplified using a joystick and spread screens for the driver and the commander, changing the tactic to a hunter/killer pair. “Complex computer technology should be easy to use”, it will “open the door to employees of lesser skills”, and fill jobs involving computers.

    5. Speech recognition software and network computing will become widely used in software. Knowledge-based agents will be programmed using rules that were “picked from the brains of experts and codified”.

    6. iPhone will become a major player. “The measure of computing is whether you can hold in your hand”, says Mark Weiser. Single function button applications hold promise. Dragon Software system has simplified menial tasks for lawyers and medical. Companies will build applications for the iPhone to inventories.

    7. Google will extend its services, allowing people to ask questions that only a knowledge-based agent could answer. The search result answers will be amazingly accurate. Google could provide a mechanism for helping the user refine a search or correlate across domains of information, to help the user drill down into more relevant and comprehensive information. The group intelligence of the Internet will provide the intelligence necessary for identifying the “best information”, the “Wisdom of the crowds”. Google will provide health information similar to a flesh and blood doctor and include descriptive and associative assessment of prescriptions, diagnoses, procedures, and alternatives. Perhaps the Google interface could change to a hybrid Inxight interface with a Doctor Know possibility. The user will ask the Google Doctor Know a question and Google would provide a list of possible categories to limit the data. Additional questions will be asked by the Doctor Know knowledge-based agents in these domains, bringing back information from pieces distributed across the web.

    Large call centers will turn to Google for search results and opinions relating to taxes, medical, legal, financial, and social questions. iPhone will be the tool of the future to return the information.

  • Xin Chen says:

    Google’s PageRank and Beyond: The Science of Search Engine Rankings
    Rating:4 out of 5 stars
    The book is good at explaining Google’s PageRank, and it tries to present rigorous mathematical proofs to demonstrate the idea. It is good; however, the math part is not well organized, and it is not easy for people without linear algebra knowledge to follow. Anyway, it is still a good book for demonstrating Markov chains in PageRank.

  • Semyon Berkovich says:

    Google’s PageRank and Beyond
    Rating:5 out of 5 stars
    “Google’s PageRank and Beyond: The Science of Search Engine Rankings” by Amy N. Langville and Carl D. Meyer is a foremost book presenting the captivating mystery of Google. Never before in the whole technological history of the world has an idea that is so apparently simple received such immediate and overwhelming practical recognition. This cannot be explained solely by an extraordinary ingenuity. In our view, the PageRank approach had inadvertently revealed the basic mechanism in the workings of the brain – the Axiom of Choice, a mathematical peculiarity considered by Georg Cantor as the Fifth Law of Thought. This axiom along with other logical operations must be innate to the organization of the brain. The Google-type ranking process appears nowadays as an indispensable tool for efficient realization of all kinds of searching. The book gives the most comprehensive overview of the current understanding of this situation.

  • Man Kam Tam says:

    Probability Transition Matrix, Markov Chain, and Stationary Vector
    Rating:5 out of 5 stars
    A web search engine has six major components. The components are (1) crawler module, (2) page repository, (3) indexing module, (4) indexes, (5) query module, and (6) ranking module. The ranking module takes the set of relevant pages and ranks them according to both the content score and the popularity score. The popularity score is the focus of Amy N. Langville and Carl D. Meyer’s “Google’s PageRank and Beyond: The Science of Search Engine Rankings.” The popularity score of a web page is determined by the Web’s hyperlink structure.

    Brin and Page’s PageRank philosophy is that a page with more recommendations must be more important than a page with few links. Put another way, a web page is more important if it is pointed to by other important pages. Brin and Page then build a normalized hyperlink matrix (H). With the adjustments named stochasticity and primitivity, a Google matrix (G) is obtained, which is, in fact, a probability transition matrix of a Markov chain. The desired ranking of the web pages is the stationary vector of the matrix G or the solution of the corresponding linear homogeneous system.

    Calculating the ranking vector is not an easy task, for the matrix G has 8.1 billion rows and 8.1 billion columns. The matrix grows every day as the number of web pages grows. The book considers several major large-scale implementation issues such as storage, convergence criterion, accuracy, dangling nodes, and back button modeling. Acceleration methods are presented as well. They are the adaptive power method, extrapolation, and aggregation. Once the ranking vector is calculated, it has to be updated periodically. However, there is no effective and efficient update method available other than calculating from scratch.

    Other ranking methods such as HITS and SALSA are introduced. They are both query dependent. They both have hub and authority scores. They are both easier to spam than PageRank. Several interesting MATLAB programs are provided. One could use them to crawl the web, build the matrices, and accelerate the calculation of the stationary vector.

    This is a wonderful book with timely technical material, entertaining asides, and a cute book cover. Best of all, the primary author is a lady. I am looking forward to reading more books like this.
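
The pipeline summarized in the review above (hyperlink matrix H, a stochasticity fix, a primitivity fix, then the stationary vector of G) can be sketched on a toy graph. This is a rough illustration under stated assumptions, not the book's MATLAB code: the names `google_matrix` and `stationary_vector`, the value 0.85 for the mixing parameter `alpha`, and the choice of uniform rows for both adjustments are assumptions. Dense Python lists are used for readability only and would not scale to a real web graph.

```python
def google_matrix(out_links, n, alpha=0.85):
    """Build a dense Google matrix G for pages numbered 0..n-1.

    out_links: dict mapping a page index to the list of page indices it
               links to (hypothetical input format for this sketch).
    alpha:     assumed mixing parameter between link-following and
               uniform random jumps.
    """
    G = []
    for i in range(n):
        targets = out_links.get(i, [])
        if targets:
            # Row of H: equal probability on each out-link.
            row = [0.0] * n
            for j in targets:
                row[j] += 1.0 / len(targets)
        else:
            # Stochasticity fix: a dangling page jumps anywhere uniformly.
            row = [1.0 / n] * n
        # Primitivity fix: blend every row with a uniform jump.
        G.append([alpha * x + (1.0 - alpha) / n for x in row])
    return G


def stationary_vector(G, tol=1e-12, max_iter=1000):
    """Power method: repeat pi <- pi * G until pi stops changing."""
    n = len(G)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


if __name__ == "__main__":
    toy_links = {0: [1, 2], 1: [2], 2: [0], 3: []}
    G = google_matrix(toy_links, 4)
    print([round(x, 4) for x in stationary_vector(G)])
```

Every row of G sums to 1 and every entry is positive, which is what makes G the transition matrix of a Markov chain with a unique stationary vector that the power method converges to.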

  • Denise Magic says:

    More a math textbook than anything else
    Rating:2 out of 5 stars
    You need a degree in math to comprehend this book. If that is what you are looking for, great. If not, this book is not for web professionals like myself.

  • Rafael Lopez Callejon says:

    not very interesting for promoting web sites
    Rating:2 out of 5 stars
    The book is 90% mathematics. I didn’t find it very practical for promoting web sites.

  • jim says:

    practical and fun
    Rating:5 out of 5 stars
    Great work! I wish I had read it before I started my Ph.D. studies.

    Pros:

    1) Precise and intuitive description of the search algorithm

    2) Plenty of interesting stories making mathematics fully applicable in practice

    3) Sample Matlab code available

    Cons:

    This is actually a perfect book. But one needs to have basic linear algebra to appreciate its value. If you are looking for “SEO”, you are in the wrong spot.

    But if anyone wonders how Page and Brin turned math into treasure, read it!

  • W Boudville says:

    surveys search techniques
    Rating:5 out of 5 stars
    Langville and Meyer have done a superb job describing both Google’s technical foundations, and the broader subject of how search engines rank pages. Over half the book is devoted to explaining the maths and rationales behind PageRank. The level of maths is understandable to those who have done some university level courses on linear algebra (i.e. matrices).

    The book also has considerable value in analysing what other organisations (like search engines) and researchers have cobbled together. It gives a useful summation of the state of the research, circa 2006. Essentially, everyone seems to focus on link analysis, after Google revolutionised the industry in 1998 by using this. It blew away the previous leader, AltaVista.

    It is true, as the authors point out, that most of the material here has already been published. But as discrete events, scattered through various scientific journals and websites. You can certainly get explanations of PageRank on several websites. But the mathematical depth and reliability of those discussions can vary with the site. The book is far handier.

    It is a good starting point if you are interested in devising your own search methods.

  • M. Grant says:

    Good balance
    Rating:4 out of 5 stars
    The book strikes a good balance between the novice and the highly experienced math junkie.
