COURSE LECTURE: Search and Basic Indexing. Notes taken on June 16, 2014 by Edward Tanguay
based on a 1950s party game in which a person reads typewritten answers to questions and tries to determine whether they were written by a man or a woman
a human tries to determine whether a text was written by a human or a computer
a CAPTCHA is an example of a Turing Test in reverse
examples of successful artificial intelligence
instant translations of hundreds of texts
object recognition, e.g. by Google Goggles
machines recognizing faces e.g. on Facebook
billions of Facebook pages
hundreds of millions of tweets a day
millions of servers, petabytes of data
old-style business intelligence
databases, clean the data, data warehouse, more database, statistics
new-style business intelligence (Google, Facebook, etc.)
this is the heart of big data technology
relationship between data and intelligence
the difference between data and intelligence is that with intelligence, you can predict
applications that use big data to achieve predictive intelligence
online advertising predicting our intent and interest
gauging consumer sentiment and predicting behavior
detecting adverse events and predicting their impact
recognizing places and faces
personalized genomic medicine
medicines actually have different effects on each person
intelligent public services for energy, water
securing ourselves from criminals
fusing social intelligence with business intelligence
data-driven business models and processes
how to predict the future with artificial intelligence and big data
the purpose of looking is to find stuff
finding stuff is essential to intelligent behavior
on the web, finding stuff is about finding documents
we type in words and expect to find documents
if we type in "large bird", we want documents which contain "large" and "bird", sorted so that those containing both words appear on top
we do this via indexing
in a binary tree, looking up a document takes O(log m)
using a hash might be faster
in multiple-term queries, you would also need to sort the results
the time it takes is O(r q), where r = the total number of intermediate results
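
a minimal sketch of the indexing idea above, in Python. It is an illustration under stated assumptions, not the method from the lecture: the sample documents and the names build_index, search and lookup_sorted are made up for this example. A dict stands in for the hash-based index, a binary search over a sorted word list stands in for the binary tree, and documents matching more query terms are sorted to the top.

```python
from bisect import bisect_left
from collections import defaultdict


def build_index(docs):
    """Map each word to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return document ids, those matching the most query terms first."""
    hits = defaultdict(int)
    for term in query.lower().split():
        # Hash lookup of the term, then one step per intermediate result.
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    # Sort by match count (descending), then by id to break ties.
    return sorted(hits, key=lambda d: (-hits[d], d))


def lookup_sorted(words, word):
    """Binary search in a sorted word list: O(log m) comparisons for m words."""
    i = bisect_left(words, word)
    return i if i < len(words) and words[i] == word else -1


# Hypothetical sample documents, invented for this sketch.
docs = {
    1: "the large bird flew south",
    2: "a small bird sang",
    3: "large trucks block the road",
}
index = build_index(docs)
results = search(index, "large bird")
print(results)  # [1, 2, 3]: document 1 contains both words, so it is on top
```

the loop in search does one unit of work per intermediate result it visits, and the final sort is the extra step that multiple-term queries need.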