Overview
The search engine is the de facto tool for obtaining any kind of information on the web. Every Internet-savvy user has used one at some time or other.
The motivating factors for this tutorial are:
- The relative difficulty of applying advanced operators in basic search
- Lack of awareness among common web searchers of "how to get what he/she wants by giving what he/she knows"
- Educating the user about web search internals
- Exploring the difficulties and possibilities of shifting from keyword-centric search to multimedia-centric search, and
- Analyzing the Cost of Service (CoS) and Quality of Service (QoS) of the present crop of search engines.
From the user's perspective, getting relevant results for a keyword or a phrase in the order of milliseconds is an astounding feat. When a user submits a request, it is assumed that search engines "search the web for the required, relevant information" and respond with a summary sheet, ordered on some criterion (C).
Google has certainly filled the web community with awe and inspiration. Even so, Google and other present-generation search engines still have a long road to tread before they satisfy all the needs of a web user. Some of the open questions are:
- How precise and small can the result set be?
- How can multimedia content be searched?
- How intuitive and user-friendly is the search engine interface?
The search engine's perspective illuminates the underlying web indexing process, which has some concurrent and some non-concurrent stages. Crawling and indexing are vital phases of a search engine. Present search engines typically have to crawl billions of web pages and index them on virtually every possible keyword. Ranking the pages and delivering them to the user is another challenge. It is also remarkable that search engines are coping adequately with a growing web, a changing web and an increasing user base.
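The crawl, index and rank phases described above can be sketched on a toy scale. The following is a minimal illustration, not how any production search engine works: the in-memory `TOY_WEB`, the `.example` URLs, the function names and the term-frequency scoring are all assumptions made for the example.

```python
from collections import defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a.example": ("search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("crawl and index web pages", ["c.example"]),
    "c.example": ("ranking pages for the user", ["a.example"]),
    "d.example": ("an unlinked page about search", []),
}

def crawl(seed, web):
    """Breadth-first crawl from a seed URL; returns URLs in visiting order."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in web[url][1]:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return order

def build_index(urls, web):
    """Inverted index: keyword -> {url: term frequency}."""
    index = defaultdict(dict)
    for url in urls:
        for word in web[url][0].lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index

def search(index, query):
    """Score pages by summed term frequency of the query words, best first."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url, tf in index.get(word, {}).items():
            scores[url] += tf
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scores, key=lambda u: (-scores[u], u))

pages = crawl("a.example", TOY_WEB)
index = build_index(pages, TOY_WEB)
print(pages)
print(search(index, "crawl web"))
```

In this sketch `d.example` is never reached from the seed, so it is never indexed and cannot appear in any result, which mirrors why crawl coverage and re-crawl frequency matter for a growing and changing web.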
In the same vein, the pressing questions are:
- How do search engines cope with the growing web?
- What is the crawl periodicity or re-indexing period?
- What load-balancing strategies are adopted to respond to the web searcher?
- What are the storage, communication and computation costs?
- What are the Return on Investment (ROI) strategies?
After addressing these basic issues, the tutorial explores some pertinent issues such as profiling, which has caused the "personalization - privacy" dilemma in the e-Commerce environment, the indexing of various data formats, and research issues in multimedia content analysis.
Overall, this tutorial gives insight into a) the intricacies of present web search, b) the internals of present search technology and c) the inadequacies of present technology. It also brings to the audience's notice some pertinent research issues, such as a) freshness/recency maintenance with a growing web and b) relevancy on a per-user basis.
Organization and Structure
- Introduction
- Search Engine - System Perspective
  o Crawling and Crawl Strategies
  o Page Ranking and Subject Specific Ranking
  o Indexing and Retrieval
- Search Engine - User Perspective - Intricacies
  o Basic Search
  o Advanced Search
  o Meta Search
  o Profiling (Personalization - Privacy dilemma)
  o Multimedia Search
- Present Search Technologies
  o Computation, Communication and Storage Requirements
  o Hardware and Software Internals
  o Protocols and Formats
- Inadequacies and Amendments
  o Coping with dynamic and growing Web
  o Spider Menace
  o Bandwidth Considerations
  o Clustering and Classification
  o Multimedia Content Analysis
  o CoS and QoS Analysis
- Conclusion
Mr. Sai Prakash is a doctoral student at the Indian Institute of Technology Madras, Chennai, India. His research area is "Search Engine Technologies". He obtained his Master of Science in Mathematics and Computer Science in 1995 from Sri Sathya Sai Institute of Higher Learning, Prasanthinilayam, India. He then went on to complete a Master of Technology in Computer Science at the same institute. He also obtained a Master of Business Administration (specialization in Finance) from Indira Gandhi National Open University, India. After completing his master's degrees and before joining the doctoral program, he worked in academia and industry for a couple of years. His academic and research interests include Knowledge-based Systems, Intelligent Networks, Web Technologies, e-Commerce, and Mobile & Distributed Computing. He has been a member of IEEE and ACM since 2000, and is a life member of CSI (Computer Society of India) and ISCA (Indian Science Congress Association) and a member of IADIS (International Association for the Development of Information Society).