A Call for a Search Discussion – How Google Works

If you follow me on Twitter, you know that I sometimes complain about the current state of the industry – especially about what passes for research and discussion these days. There is a sense that people want to be handed the fish, with little interest in learning how the person with the fish caught it. Debate and experimentation seem to have been replaced by a desire to be spoon-fed blog posts – often full of assumptions and misinformation hidden behind a single-case graph, or a slick chart paired with an impressive byline.

Separating fact from fiction becomes increasingly difficult as the next generation of our industry grows up following rather than exploring.

At SMX West, Google engineer Paul Haahr gave a presentation on how Google works. In my opinion, it is the most transparent and useful information Google has given us in years.

I spent this morning taking a whole series of notes – as I always do, because I think it helps me retain information better. In my notes I jot down the questions I have, the theories I propose, and the conclusions I draw, good or bad. It's not the sexy work, but it's the work needed to be a formidable opponent in this game.

When I looked back over the notes, I realized how much they lent themselves to the kind of discussion, debate and sharing of experiences this industry is missing.

Part of what I think limits this kind of discussion nowadays is the pressure to be seen as an infallible expert on everything related to Google. The industry is so absorbed in being expert that it is afraid to ask questions or challenge assumptions, for fear of being proven wrong. Unlike many names in this industry, I'm not afraid to be wrong. I relish it. Being proven wrong means I've gained more of the concrete knowledge necessary to win the game. Being questioned or challenged on a theory I hold gives me another theory to test.

So I'm publishing my notes – and personal musings – and making a call for a real, exploratory – to hell with whether I'm right or wrong – search discussion. Whether you are old school with massive experience or new school with untested theories and ideas – bring it to the table and let's see what we can all get out of it.

Presentation Notes: How Google Works – SMX West 16

Speaker – Paul Haahr | Presentation Video | Presentation Slides

To be clear, these are my notes from the presentation and not a transcription (the notes in orange are comments made by me, not the speaker).

General opening remarks

- Google is all about the mobile-first web
- How your site performs on mobile counts a lot in search
- Autocomplete plays a bigger role
- His presentation centers mainly on classic search

Life of a query

Timestamp: 3:38 – Link to timestamp

Haahr explained that this next piece of information is a 20-minute condensed version of the half-day class that every new Google engineer takes.

He begins by explaining the two main parts of the search engine:

1. What happens in advance (before the query):

- Crawling
- Analyzing crawled pages: links, rendering content, semantic annotation
- Building the index: think of it as the index of a book
- The index is made up of shards; shards each hold groups of millions of pages, and there are thousands of shards in Google's index
- Per-document metadata is also stored
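To make the shard idea concrete, here is a minimal sketch – my own illustration, not Google's code – of an inverted index split into shards by document ID. The shard count and assignment scheme are invented for illustration; Google's real index has thousands of shards.

```python
from collections import defaultdict

NUM_SHARDS = 4  # Google has thousands; a handful suffices to illustrate

def build_sharded_index(docs):
    """Build one tiny inverted index per shard.

    docs: dict of doc_id -> text. Each shard holds the postings for
    its own slice of documents, like the index at the back of a book.
    """
    shards = [defaultdict(set) for _ in range(NUM_SHARDS)]
    for doc_id, text in docs.items():
        shard = shards[doc_id % NUM_SHARDS]  # assign each doc to one shard
        for term in text.lower().split():
            shard[term].add(doc_id)
    return shards

docs = {
    0: "honda odyssey review",
    1: "texas farm fertilizer brand",
    2: "honda civic review",
    3: "german cars comparison",
}
shards = build_sharded_index(docs)
# Every shard can answer "which of MY documents contain this term?"
print([sorted(s.get("honda", ())) for s in shards])  # → [[0], [], [2], []]
```

Each shard only knows about its own documents, which is why (as the query-processing notes below describe) a query has to be sent to every shard.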

2. Query processing:

- Understanding the query – what does the query mean: are there known entities? Useful synonyms? He notes that context is important for queries.
- Retrieval and scoring:
  - Send the query to all the shards
  - Each shard finds its matching pages
  - Computes a score for A. the query (relevance) and B. the page (quality)
  - Returns the top pages from each shard by score
  - Combine all the top pages from every shard
  - Sort the combined shard results by score
- Post-retrieval adjustments:
  - Host clustering (note – does this mean using a dedicated server could be an advantage? Do they check shared hosts for sites with similar topics – for networks of related sites? This has since been clarified by an ex-Googler; see this comment for more details. The tl;dr is that host clustering means domain clustering – the term most used in the industry – and "host" does not refer to hosting.)
  - Are sitelinks appropriate?
  - Is there too much duplication?
  - Spam demotions and manual actions are applied
  - Snippets are pulled
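The retrieval-and-scoring steps above – query every shard, score within each shard, take each shard's top pages, then merge and sort – are a classic scatter-gather pattern. Here is a toy sketch of mine; the scoring formula (term overlap plus a stand-in page score) is entirely made up, since the real relevance and quality signals are unpublished:

```python
import heapq

def search_shard(shard_pages, query_terms, k=2):
    """Score this shard's pages and return its top-k (score, page) pairs."""
    scored = []
    for page, terms in shard_pages.items():
        relevance = len(query_terms & terms)  # A. how well the page matches the query
        quality = len(terms) * 0.1            # B. stand-in for a page-quality score
        if relevance:
            scored.append((relevance + quality, page))
    return heapq.nlargest(k, scored)          # each shard returns only its best pages

def search(shards, query, k=2):
    """Scatter the query to every shard, gather top pages, sort the combined set."""
    query_terms = set(query.lower().split())
    results = []
    for shard in shards:
        results.extend(search_shard(shard, query_terms, k))
    return sorted(results, reverse=True)[:k]

# Two shards, each holding term sets for its own pages (invented data)
shards = [
    {"p1": {"honda", "odyssey", "review"}, "p2": {"honda", "civic"}},
    {"p3": {"odyssey", "homer"}, "p4": {"german", "cars"}},
]
top = search(shards, "honda odyssey")
print(top)
```

The key property the talk describes survives even in this toy: no shard ever sees another shard's pages, so the final ranking is a merge of per-shard top lists.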

What Engineers Do

Timestamp: 8:49 – Link to timestamp

- Write code
- Write formulas to compute scoring numbers that find the best match between a query and a page, based on signals:
  - Query-independent ranking factors – page features like PageRank, language, mobile usability
  - Query-dependent ranking factors – page and query features such as keyword hits, synonyms, proximity, etc. (note – does proximity mean keyword proximity within the page, or the user's locale relative to the site's presumed location?)
- Combine signals to produce new algorithms or filters and improve results

Key metrics for rankings

Timestamp: 10:10 – Link to timestamp

- Relevance – does the page answer the user's query in its context? This is the first-and-foremost metric
- Quality – how good are the displayed results relative to the user's query? How good are the individual pages? (note – the emphasis on individual is mine)
- Time to result (faster is better) (note – time for the site to render, or for the user to find the answer on the ranking page? Or a combination? Site render time could be a sub-factor of the time it takes the user to find the answer on the ranking page. Edit > I asked Haahr for clarification on Twitter – he is unable to elaborate. However, there is some likely elaboration from Amit Singhal in this comment.)
- More metrics that go unlisted
- He makes a point of mentioning that the metrics are based on looking at the SERP as a whole, not one result at a time
- Uses the convention that higher results matter more: metrics are reciprocally rank-weighted. Position 1 is worth the most; position 2 is worth half of position 1; position 3 is worth 1/3 of position 1, and so on (note – the premise of reciprocally ranked metrics goes over my head, and I'd appreciate simplified clarification on what he's talking about here.)
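My best reading of the reciprocally-ranked weighting – and this is my interpretation, not Haahr's – is that each result's contribution to the SERP-wide metric is weighted by 1/position, so the aggregate leans heavily on the top results:

```python
def weighted_serp_metric(ratings):
    """Aggregate per-result ratings into one SERP score, weighting
    position i (1-based) by 1/i, per the convention described in the talk.

    ratings: per-result scores, best-ranked first (e.g. needs-met
             ratings mapped to numbers in [0, 1]).
    """
    weighted = sum(r / pos for pos, r in enumerate(ratings, start=1))
    total_weight = sum(1 / pos for pos in range(1, len(ratings) + 1))
    return weighted / total_weight  # normalize so a perfect SERP scores 1.0

# Same three ratings, but burying the best result drags the metric down:
print(weighted_serp_metric([1.0, 0.5, 0.5]))  # great result on top
print(weighted_serp_metric([0.5, 0.5, 1.0]))  # great result buried at #3
```

Under this scheme, swapping a great result from position 3 up to position 1 improves the SERP score even though the set of results is identical – which matches the stated idea that position 1 matters most.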

Optimizing Metrics

Timestamp: 12:00 – Link to timestamp

Ideas and strategies for optimizing metrics are developed through an internal evaluation process that analyzes the results of various experiments:

Live Experiments

Timestamp: 12:33 – Link to timestamp

- Split-test experiments on real traffic
- They look for changes in click patterns (note – there has been a long-running debate about whether click-through rates are counted in rankings. I took his comments here to mean that click patterns are analyzed from the perspective of the quality of the SERP as a whole and to judge the context of the query, rather than to the benefit of a specific site getting more clicks. That said, it's something I still debate internally.)
- Google runs a lot of experiments: almost all queries are in at least one live experiment
- Example experiment – Google tested 41 shades of blue for their result links, trying to determine the optimal performer
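For "almost all queries are in at least one live experiment" to work, traffic has to be diverted into experiment arms deterministically. A common industry approach – my sketch, not anything Google has published – is to hash a stable identifier together with the experiment name:

```python
import hashlib

def experiment_arm(user_id, experiment_name, arms=("control", "treatment")):
    """Deterministically assign a user to an experiment arm.

    The same user always lands in the same arm of a given experiment,
    so changes in click patterns between arms can be compared over time.
    Hashing on (experiment, user) makes different experiments divert
    traffic independently of one another.
    """
    digest = hashlib.sha256(f"{experiment_name}:{user_id}".encode()).digest()
    return arms[digest[0] % len(arms)]

# Stable assignment: repeated calls always agree
assert experiment_arm("user42", "blue-links") == experiment_arm("user42", "blue-links")
print(experiment_arm("user42", "blue-links"), experiment_arm("user42", "snippets"))
```

The identifiers and experiment names here are invented; the point is only that assignment is a pure function of the inputs, which is what lets an experiment run for weeks and still produce comparable click data.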

Example given for interpreting live experiments: page P1 versus page P2

- Both pages, P1 and P2, meet the user's needs
- For P1, the answer appears only on the page
- For P2, the answer appears both on the page and in the search snippet (pulled by the snippeting algorithm – resource on the snippet algorithm)
- Algorithm A ranks P1 before P2; the user clicks P1: from an algorithmic point of view, this looks like a "good" result in the live-experiment analysis
- Algorithm B ranks P2 before P1; no click is generated, because the user sees the answer in the snippet: purely from an algorithmic point of view, this looks like a "bad" result

But in this scenario, was algorithm A really better than algorithm B? The second scenario should be a "good" result, because the user got a good answer – faster – from the snippet. But it is difficult for the algorithm to tell whether the user left the SERP because the answer they needed wasn't there, or because they got their answer from the snippet.

Hence the use of human quality raters.

Human Quality Rater Experiments

Timestamp: 15:21 – Link to timestamp

- Show real people the results of experimental searches
- Have them rate how good the results are; ratings are aggregated across raters
- Published guidelines explain the criteria quality raters should use
- Tooling supports the rating process

Other Notes:

– States they run human rater experiments over large sets of queries to achieve statistical significance, and likens the process to Mechanical Turk-style crowdsourcing

– Mentions that the published rating guidelines are Google's statement of intent for the kinds of results they want to produce (note – very different from a rater scoring their own satisfaction as a user – instead, raters identify whether a query's results meet Google's satisfaction requirements and include the kinds of results Google thinks should be included – or excluded. The quality rating guidelines are the output of Google's dream algorithm.)

– He says that if you ever wonder why Google is doing something, the answer is most often found in the rater guidelines. (note – Haahr reiterated to me on Twitter how important he believes reading the guidelines is for SEOs.)

– Slides showing the human rater tools: slides 33, 34

– Mobile first – there are more mobile queries in the rating samples (2x)

– Raters must pay attention to the user's location when evaluating results that are not on a desktop computer

Types of ratings

Timestamp: 19:04 – Link to timestamp

Rating relevance:

- "Needs Met" ratings: the instructions tell raters to think about the needs of mobile users and how satisfying the result is for mobile users
- The rating scale: Fully Meets, Highly Meets, Moderately Meets, Slightly Meets, Fails to Meet
- Under each "meets" level, a slider bar lets the rater sub-classify the result; for example, a "Highly Meets" result can meet very well, meet somewhat less well, and so on
- There are two sliders for rating results – one for "Needs Met" (relevance) and one for "Page Quality"
- Examples of Fully Meets in the slides – slide 41: the query CNN and its result – Fully Meets. A search for yelp when the user has the Yelp app installed on the phone, so Google serving the app fully satisfies the query. For Fully Meets, they want an unambiguous query whose need the result fully satisfies
- Examples of Highly Meets in the slides – slides 42-44, showing varied sub-classifications of Highly Meets: the result is a great source of information; the site is authoritative; the author has expertise on the subject in question; the result is comprehensive for the query; it displays photos where the user is likely looking for photos
- Examples of Moderately Meets – slide 45: the result has good information; it is interesting and useful, but not everything the query calls for, and not super authoritative; not worthy of the number one answer, but could be good to have on the first page of results
- Slightly Meets – the result contains less than 40% quality information. Example: a search for Honda Odyssey could bring up the KBB page for the 2010 Odyssey. It slightly meets because the topic is right and there is good information, but the page is outdated; the user did not specify the 2010 model, so they are probably looking for newer models. He cites this result as "acceptable but not great"
- Fails to Meet – Example: a search for German cars returning the Subaru website (Subaru is made in Japan). Example: a search for a rodent removal company returning a company on the other side of the world (note – they want to geo-locate specific types of queries that are likely geocentric in need – for example, local service companies. Using quality raters can help identify what those service types are and add them to the geographic-intent list alongside plumbers, electricians, etc.)

Rating page quality:

Timestamp: 23:58 – Link to timestamp

The three most important concepts for quality:

- Expertise – is the author an expert on the subject?
- Authoritativeness – is the page authoritative on the topic?
- Trustworthiness – can you trust it?
- He gives examples of categories where trustworthiness matters most when assessing overall page quality – medical, financial, purchasing a product

The rating scale runs from high quality to low quality:

- Does the page exhibit high-quality signals, as defined in part by: a satisfying amount of high-quality main content; the website demonstrating expertise, authority and trustworthiness on the topic; the site having a good reputation for the page's topic
- Does the page exhibit low-quality signals, as defined in part by: the quality of the content is low; an insufficient amount of main content; the author has no expertise or authority on the subject, or the content is overly bold in its presentation (note – the concept behind author rank lives on, in my opinion. We were the ones who taught them how to connect the dots with authorship markup; they can probably do it algorithmically now and no longer need us to connect those dots manually.); the site has an explicit negative reputation; the secondary content is unhelpful – ads, etc. (note – human input giving them a roadmap to how they calculate and shape the Above the Fold algorithm? This probably refers to the ads ratings in the rater guidelines – see page 10 of Google's guidelines.)

Metric Optimization – Experiments

Timestamp: 25:28 – Link to timestamp

- Someone has an idea for improving the results via metrics and signals, or for solving a problem in the results
- Development and testing iterate on the idea until the feature is ready: code, data, experiments, analysis of the experiment results – this can take weeks or months
- If the idea pans out, final experiments are run and a launch report is written, which undergoes quantitative analysis. He notes this keeps the process objective, since the analysis is done by people who did not work on – and are not emotionally invested in – the idea
- A launch review process is held: on Thursday mornings there is a meeting where leads in the area hear about projects, summaries, reports on experiments, etc.
- Debates follow over whether the change is good for users and for the system architecture, and whether the system can continue to be improved if the change is made (note – Google published a recording of a launch review meeting a few years ago; I believe he is referring to that.)
- If approved, it goes into production. It takes a long time to rewrite the code to make it fast enough, clean, and suited to their architecture – possibly months. He said it has taken almost two years to ship something.

The main objective is to move pages with good ratings up and pages with bad ratings down. (note – I believe he means human ratings, but this was not clarified.)

Two of the core problems they face in building the algorithm:

Timestamp: 50 – Link to timestamp

Systematically bad ratings:

- He gives an example of bad ratings: the query texas farm fertilizer
- The user is searching for a brand of fertilizer
- Google showed a 3-pack of local results and a map to the company's headquarters
- It is unlikely that a user performing this search wants to go to the company's headquarters, since the product is sold in local home-improvement stores
- But raters, on average, rated the result with the headquarters map as nearly Highly Meets
- Google calls this a pattern of losses: in a series of experiments that increased map triggering, human raters rated the maps highly. Google disagreed, so they changed the rating guidelines – creating more examples of these queries and explaining that if the user would not actually go there, a map is a bad result for the query and should be rated Fails to Meet – see slide 61 of the presentation
- When Google sees patterns of losses, they look for what is wrong in the results and create examples for the rater guidelines to correct it

Metrics do not capture everything they care about – AKA missing metrics

- Slide titled "Google News Gamed by a Crappy Content Farm"
- From 2009 to 2011, they received a lot of complaints about low-quality content
- But the ratings looked fine, because low-quality content can sometimes be very relevant – he cites content farms as the example
- In other words, they were not measuring what they needed to measure
- So they defined an explicit quality metric – which is not the same as relevance – and that is why relevance and quality each have their own slider for human raters now
- They determined that quality is not the same as relevance
- They were able to develop quality signals separate from relevance signals
- Now they can work on improving the definitions of the two separately in the algorithm

(note – The emphasis is mine. I think most of the search industry views quality and relevance as one metric, and I think it's important to point out that they are not – and have not been for a long time now.)

So what now?

Contribute. What ideas did you take from the presentation? What were your thoughts on the things I noticed? Are there things I have not noticed and on which you have a comment or theory? Do you disagree with any of Haahr's claims? Do you disagree with mine? Did anything in his presentation surprise you? Has something been confirmed for you? Whatever your thoughts on his presentation, drop them in the comments below.
