#pubcon
@bill_slawski
Keyword Research and
Topic Modeling in a Semantic Web
Presented by:
Bill Slawski
Director of SEO Research
Go Fish Digital
Leo Carrillo Rancho
A historic renovated rancho
Be Careful to Read All The Signs
An Entity Audit Uncovers Surprises
Named entities are specific people, places, and things,
including products and brands.
Paul Haahr: How Google Works
Schema markup, Google My Business
verification, and an entry in
Wikipedia can lead to
knowledge panels, but they are
only the start of adding entity
information…
An Elevator Ride from the DC Metro
There is no clear sign telling people
On the DC Metro, you can connect to:
• 91 stations in MD, VA, and DC
• National Zoo
• 19 Smithsonian Museums
• National Gallery of Art
• Capital One Arena
• FedExField
• Pentagon City Shopping Mall
Identify All Missing Entities
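One way to run an entity audit is to compare the entities on your page with the entities on pages that already rank for the same query. A minimal sketch in Python, assuming the entity lists have already been extracted (by hand or with a named-entity tool); the function name `missing_entities`, the `min_pages` cutoff, and the sample lists are hypothetical:

```python
from collections import Counter

def missing_entities(my_entities, competitor_pages, min_pages=2):
    """Entities found on at least `min_pages` competitor pages but not on ours,
    most widely used first. (Hypothetical helper for illustration.)"""
    counts = Counter()
    for entities in competitor_pages:
        counts.update(set(entities))  # count each entity once per page
    mine = set(my_entities)
    gaps = [(e, n) for e, n in counts.items() if n >= min_pages and e not in mine]
    # Sort by how many pages mention the entity, then by name for stable output.
    return sorted(gaps, key=lambda item: (-item[1], item[0]))

# Hypothetical entity lists for a station page and three competing pages.
my_page = ["Metro", "Elevator"]
competitors = [
    ["Metro", "National Zoo", "Smithsonian"],
    ["National Zoo", "Capital One Arena"],
    ["Smithsonian", "National Zoo", "Elevator"],
]
print(missing_entities(my_page, competitors))  # [('National Zoo', 3), ('Smithsonian', 2)]
```

Raising `min_pages` keeps only entities that most ranking pages agree on; lowering it surfaces more candidates to review by hand.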
Knowing how Google uses context and
semantically related phrases can improve the
content you create and how well you optimize
pages for particular queries.
Keywords & Context Vectors
“For example, a horse to a rancher is an animal. A horse
to a carpenter is an implement of work. A horse to a
gymnast is an implement on which to perform certain
exercises.”
User-context-based search engine
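The quote shows one word carrying different meanings in different contexts. The patent ties that context to the user; the sketch below swaps in a simpler signal, the words around the keyword, and picks the sense whose context terms overlap most with them (Jaccard similarity). The `SENSES` term sets and the `disambiguate` function are illustrative assumptions, not Google's method:

```python
# Hypothetical context vectors: terms assumed to appear around each sense of
# "horse". Knowledge-base pages are one place such terms could come from.
SENSES = {
    "animal": {"ranch", "saddle", "breed", "stable", "rider"},
    "sawhorse": {"carpenter", "lumber", "saw", "workshop", "plank"},
    "vaulting horse": {"gymnast", "vault", "apparatus", "routine", "mat"},
}

def disambiguate(context_terms, senses=SENSES):
    """Return the sense whose context terms best overlap the given words."""
    context = {t.lower() for t in context_terms}

    def jaccard(sense_terms):
        return len(context & sense_terms) / len(context | sense_terms)

    return max(senses, key=lambda s: jaccard(senses[s]))

print(disambiguate(["Carpenter", "needs", "a", "horse", "for", "lumber"]))  # sawhorse
```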
Look to Knowledge Bases
https://en.wikipedia.org/wiki/Horse
For Other Meanings
https://en.wikipedia.org/wiki/Sawhorse
See Disambiguation Pages
https://en.wikipedia.org/wiki/Vault_(gymnastics)
Context Search Results
Context-based filtering of search results
Map Keywords to Pages, then…
• Make sure you add words that indicate context
• Look up the top pages that rank for those keywords
• Find phrases that co-occur for that meaning
• See: Improving semantic topic clustering for search
queries with word co-occurrence and bigraph co-clustering
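Looking up top-ranking pages and finding co-occurring phrases can be sketched as counting which phrases recur across those pages. A minimal sketch, assuming you already have the page text; the two-word phrase length, the `min_share` cutoff, and the small stopword filter (a crude stand-in for dropping phrase fragments) are assumptions:

```python
import re
from collections import Counter

# Hypothetical stopword list: phrases that start or end with one of these
# are treated as fragments rather than meaningful phrases.
STOPWORDS = {"a", "an", "and", "are", "for", "from", "in", "is", "of", "on", "the", "to"}

def phrases(text, n=2):
    """Distinct n-word phrases in a page, skipping ones edged by stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    grams = (words[i:i + n] for i in range(len(words) - n + 1))
    return {" ".join(g) for g in grams
            if g[0] not in STOPWORDS and g[-1] not in STOPWORDS}

def co_occurring_phrases(top_pages, n=2, min_share=0.5):
    """Phrases that appear on at least `min_share` of the top-ranking pages."""
    counts = Counter()
    for page in top_pages:
        counts.update(phrases(page, n))  # each phrase counted once per page
    threshold = min_share * len(top_pages)
    return sorted(p for p, c in counts.items() if c >= threshold)

# Hypothetical snippets from pages ranking for "java" in its coffee sense.
top_pages = [
    "Java coffee beans are roasted dark",
    "Fresh java coffee from Indonesia",
    "Coffee beans from Java are roasted slowly",
]
print(co_occurring_phrases(top_pages))  # ['coffee beans', 'java coffee']
```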
Phrase-Based Indexing
• Look for co-occurring phrases on pages that rank highly
for a query.
• Using these related phrases on a page can boost how it
ranks for that query (body hits).
• Using those related phrases as anchor text can boost how
the targeted page ranks for that query (anchor hits).
Related Words/Phrases
Thematic Modeling Using Related Words in Documents and Anchor Text
Use Complete Phrases
• Incomplete Phrase… “President of the…”
• Complete Phrase… “President of the United States.”
Use Meaningful Phrases
• Some phrases do not add meaning to a page:
Pay the Piper
Out of the Blue
Top of the Morning
Predictive Aspects of Phrases
• Semantically, related phrases will be those that are
commonly used to discuss or describe a given topic or
concept, such as "President of the United States" and
"White House." For a given phrase, the related
phrases can be ordered according to their relevance
or significance based on their respective prediction
measures.
• Integrated external related phrase information into a
phrase-based indexing information retrieval system
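The quoted passage orders related phrases by a prediction measure without defining it here. A minimal sketch, assuming the measure is a lift-style ratio: how often two phrases share a document, divided by how often they would if they occurred independently. The function names, the threshold of 1.0, and the toy corpus are hypothetical, and a corpus this small gives noisy scores:

```python
from collections import Counter
from itertools import combinations

def prediction_scores(docs):
    """Lift for each phrase pair: observed co-occurrence / expected co-occurrence."""
    n_docs = len(docs)
    df = Counter()       # number of documents containing each phrase
    pair_df = Counter()  # number of documents containing both phrases of a pair
    for doc in docs:
        unique = sorted(set(doc))
        df.update(unique)
        pair_df.update(combinations(unique, 2))
    return {
        (a, b): observed / (df[a] * df[b] / n_docs)
        for (a, b), observed in pair_df.items()
    }

def related_phrases(phrase, scores, threshold=1.0):
    """Phrases whose lift with `phrase` exceeds `threshold`, strongest first."""
    related = [
        (b if a == phrase else a, score)
        for (a, b), score in scores.items()
        if phrase in (a, b) and score > threshold
    ]
    return sorted(related, key=lambda item: (-item[1], item[0]))

# Hypothetical corpus: each document reduced to the phrases it contains.
docs = [
    {"president of the united states", "white house"},
    {"president of the united states", "white house", "executive order"},
    {"white house", "guided tour"},
    {"executive order"},
]
scores = prediction_scores(docs)
print(related_phrases("president of the united states", scores))
```

Here "white house" scores above 1.0 for "president of the united states" because the two share documents more often than chance predicts.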
Co-occurring Phrases/High Ranking Pages
Clustered Meanings
• Jaguar: cats, cars, NFL football team
• Java: programming language, island in Indonesia, drink
• Bank: a place to store money, a river’s side, to lean to a side
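Related phrases for an ambiguous word tend to fall into separate groups, one per meaning. A minimal sketch that groups them as connected components of a phrase-relatedness graph using union-find; this grouping method is a swapped-in technique, not necessarily what Google uses, and the phrases and related pairs below are hypothetical (in practice the pairs might come from a co-occurrence measure):

```python
def cluster_meanings(phrases, related_pairs):
    """Group phrases into clusters: connected components of the graph whose
    edges are pairs of phrases judged to be related."""
    parent = {p: p for p in phrases}

    def find(p):
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in related_pairs:
        parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for p in phrases:
        clusters.setdefault(find(p), []).append(p)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical phrases that co-occur with "jaguar", and which pairs are related.
phrases = ["big cat", "rainforest", "luxury car", "xj sedan", "nfl", "jacksonville"]
related_pairs = [("big cat", "rainforest"), ("luxury car", "xj sedan"),
                 ("nfl", "jacksonville")]
for cluster in cluster_meanings(phrases, related_pairs):
    print(cluster)
```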
Ranking Documents Based on Contained Phrases (Body Hits)
“…a ranking stage in which the documents in the search
results are ranked, using the phrase information in each
document's related phrase bit vector, and the cluster
bit vector for the query phrases. This approach ranks
documents according to the phrases that are contained
in the document, or informally ‘body hits.’”
Integrated external related phrase information into a
phrase-based indexing information retrieval system
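The quoted passage compares each document's related-phrase bit vector with a bit vector for the query phrases. A simplified sketch in which each bit marks one related phrase and the score is just the number of bits the two vectors share; the vocabulary, function names, and unweighted count are assumptions, and the patent's scoring is likely richer:

```python
def phrase_bits(present, vocabulary):
    """Bit vector (as an int): bit i is set if vocabulary[i] is present."""
    bits = 0
    for i, phrase in enumerate(vocabulary):
        if phrase in present:
            bits |= 1 << i
    return bits

def rank_by_body_hits(docs, query_related, vocabulary):
    """Order document ids by how many of the query's related phrases they contain."""
    query_bits = phrase_bits(query_related, vocabulary)

    def hits(doc_id):
        shared = phrase_bits(docs[doc_id], vocabulary) & query_bits
        return bin(shared).count("1")  # number of shared related phrases

    return sorted(docs, key=lambda doc_id: (-hits(doc_id), doc_id))

# Hypothetical related-phrase vocabulary and documents.
vocabulary = ["white house", "executive order", "oval office", "press secretary"]
query_related = {"white house", "oval office", "press secretary"}
docs = {
    "a": {"white house"},
    "b": {"white house", "oval office", "press secretary"},
    "c": {"executive order", "oval office"},
}
print(rank_by_body_hits(docs, query_related, vocabulary))  # ['b', 'a', 'c']
```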
Anchor Hits
“Sorting the documents on the outbound score
component makes documents that have many related
phrases to the query as ‘anchor hits,’ rank most highly,
thus representing these documents as ‘expert’
documents”
• Integrated external related phrase information into a
phrase-based indexing information retrieval system
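Anchor hits look at a page's outbound links: which related phrases it uses as anchor text. A minimal sketch, assuming the outbound score is simply the count of distinct query-related phrases found in a page's outbound anchor text; the names and sample pages are hypothetical, and plain substring matching is a simplification:

```python
def outbound_score(anchor_texts, query_related):
    """Distinct query-related phrases used in a page's outbound anchor text."""
    used = set()
    for anchor in anchor_texts:
        text = anchor.lower()
        used.update(p for p in query_related if p in text)
    return len(used)

def rank_experts(pages, query_related):
    """Sort pages so those with the most related-phrase anchors come first."""
    return sorted(pages, key=lambda pid: (-outbound_score(pages[pid], query_related), pid))

# Hypothetical pages mapped to the anchor text of their outbound links.
query_related = {"white house", "oval office", "executive order"}
pages = {
    "blog": ["Tour the White House", "click here"],
    "guide": ["History of the Oval Office", "Every executive order since 1789",
              "White House visitor center"],
    "news": ["Read more"],
}
print(rank_experts(pages, query_related))  # ['guide', 'blog', 'news']
```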
Personalization & Query Classifications
• Depending on which results a searcher selects, the
results they see may fall into a specific category from
a biased document set.
Personalizing Search Results at Google
Which Lincoln?
Look at Knowledge Bases
• Abraham Lincoln
Look at Top Search Results
• Lincoln Town Car
Look at Other Search Entities
• Lincoln, Nebraska
Query Classifications
Search for “Lincoln” and click on the person (Abe), the
place (Nebraska), or the thing (Town Car). What you
click on may determine what you see in the future on
searches for “Lincoln.”
…determining whether to assign the classification to
the first query based upon classifications for the
identified search entities.
• Propagating query classifications
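The quoted claim assigns a classification to a query based on the classifications of the search entities identified for it. A minimal sketch, assuming the entities are the results a searcher clicked and that a class is propagated when more than half of them carry it; the threshold, the function name, and the class labels are hypothetical:

```python
from collections import Counter

def propagate_classifications(entity_classes, threshold=0.5):
    """Classes held by more than `threshold` of the search entities for a query."""
    if not entity_classes:
        return set()
    counts = Counter(c for classes in entity_classes for c in set(classes))
    total = len(entity_classes)
    return {c for c, n in counts.items() if n / total > threshold}

# Hypothetical classes of the results one searcher clicked for "Lincoln".
clicked = [{"place"}, {"place", "travel"}, {"person"}]
print(propagate_classifications(clicked))  # {'place'}
```

With this searcher favoring places, later searches for "Lincoln" could lean toward Lincoln, Nebraska.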
Searches are what we type, and what we say, but
they will also be based upon what we see and
take photos of in the future.
Google Lens Schema
Smart Camera User Interface
Further Reading
• Knowledge-Based Trust: Estimating the
Trustworthiness of Web Sources
• A Review of Relational Machine Learning for
Knowledge Graphs
• Knowledge Curation and Knowledge Fusion:
Challenges, Models, and Applications
• Improving semantic topic clustering for search queries
with word co-occurrence and bigraph co-clustering
Questions? Ask Me At:
• Twitter: https://twitter.com/bill_slawski
• LinkedIn: https://www.linkedin.com/in/slawski/
• Facebook: https://www.facebook.com/bill.slawski
• Google+: https://plus.google.com/+BillSlawski
• SEO by the Sea: http://www.seobythesea.com/
• Go Fish Digital Blog: https://gofishdigital.com/blog/
