EDWARD'S LECTURE NOTES:
More notes at http://tanguay.info/learntracker
C O U R S E 
Web Intelligence and Big Data
Gautam Shroff, Indian Institute of Technology Delhi
https://www.coursera.org/course/bigdata
C O U R S E   L E C T U R E 
Search and Basic Indexing
Notes taken on June 16, 2014 by Edward Tanguay
Turing Test
based on a 1950s party game where a person takes type-written answers to questions and tries to determine if they were written by a man or a woman
human tries to determine if text is written by human or computer
a CAPTCHA is an example of a Turing Test in reverse
examples of successful artificial intelligence
instant translations of hundreds of texts
object recognition by e.g. Google Goggles
machines recognizing faces e.g. on Facebook
big data
billions of Facebook pages
hundreds of millions of tweets a day
millions of servers, petabytes of data
old-style business intelligence
databases, clean the data, data warehouse, more database, statistics
new-style business intelligence (Google, Facebook, etc.)
massive parallelism
Map-Reduce paradigm
this is the heart of big data technology
relationship between data and intelligence
the difference between data and intelligence is that with intelligence, you can predict
applications that use big data to achieve predictive intelligence
online advertising predicting our intent and interest
gauging consumer sentiment and predicting behavior
detecting adverse events and predicting their impact
fires
floods
earthquakes
recognizing places and faces
personalized genomic medicine
medicines actually have different effects on each person
intelligent public services for energy, water
intelligent sensors
deep analytics
securing ourselves from criminals
big data analytics
fusing social intelligence with business intelligence
data-driven business models and processes
how to predict the future with artificial intelligence and big data
looking
listening
learning
connecting
predicting
correcting
looking
the purpose of looking is to find stuff
on the web
on one's computer
in one's memories
finding stuff is essential to intelligent behavior
on the web, finding stuff is about finding documents
we type in words and expect to find documents
if we type in "large bird" we want documents which contain "large" and "bird", sorted so that those which have both words are on top
we do this via indexing
in a balanced binary tree, looking up an entry takes O(log m), with m presumably the number of entries in the index
using a hash table might be faster (O(1) on average per lookup)
in multiple-term queries, you would also need to sort the intermediate results
the time it takes is O(r q), where r = number of intermediate results in all (q is presumably the number of query terms)
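a minimal sketch of the indexing idea above, not taken from the lecture
the documents in DOCS and the names build_index, lookup_sorted and search_all are hypothetical, made up for illustration
it builds a word-to-documents index, looks up one word by binary search over the sorted words (O(log m)) and answers a multi-term query with hash lookups
it only returns documents containing every query word; ranking partial matches lower is left out

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical toy corpus: document id -> text.
DOCS = {
    1: "the large bird flew over the lake",
    2: "a small bird sat on a large rock",
    3: "large trucks are loud",
    4: "the bird sang",
}


def build_index(docs):
    """Map each word to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def lookup_sorted(sorted_words, index, word):
    """Binary search over the m sorted index words: O(log m) comparisons."""
    i = bisect_left(sorted_words, word)
    if i < len(sorted_words) and sorted_words[i] == word:
        return index[word]
    return []


def search_all(index, query):
    """Return ids of documents containing every query word.

    Each word is found with a dict (hash) lookup, then the
    intermediate result sets are intersected.
    """
    result = None
    for word in query.lower().split():
        ids = set(index.get(word, []))
        result = ids if result is None else result & ids
    return sorted(result or [])


index = build_index(DOCS)
sorted_words = sorted(index)

print(lookup_sorted(sorted_words, index, "bird"))  # [1, 2, 4]
print(search_all(index, "large bird"))             # [1, 2]
```

the dict plays the role of the hash; the sorted word list with bisect stands in for the binary tree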