The Bag-of-Words (BOW) model assumes that:
12%
All words have equal importance
12%
Word order is critically important for meaning
0%
Documents should be treated as strings
76%
The topic of a document is determined by the words it contains, not their order
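A minimal sketch of the bag-of-words assumption: each document becomes a multiset of term counts, so two sentences with the same words in a different order get identical representations. The whitespace tokenizer and lowercasing are simplifying assumptions for illustration.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Map a document to term counts, discarding word order."""
    # Naive lowercase + whitespace split; real systems use a fuller pipeline.
    return Counter(text.lower().split())


a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")
print(a == b)  # True: same words, different order, same bag
```

The flip side is visible here too: the model cannot tell who bit whom.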
The Boolean Retrieval Model is called exact-match because it:
6%
Is 100% accurate
11%
Always finds all relevant documents
0%
Uses a precise ranking function
83%
Returns documents that exactly satisfy the Boolean logical condition
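To make "exact match" concrete, here is a small sketch over a hypothetical three-document collection. Each document either satisfies the condition or it does not; the output is an unordered set with no scores, which is why the model does not rank.

```python
# Hypothetical toy collection.
docs = {
    1: "information retrieval systems",
    2: "database systems store records",
    3: "retrieval of information from the web",
}


def satisfies_and(text: str, terms: set[str]) -> bool:
    """Exact match: True iff every query term occurs in the document."""
    return terms <= set(text.split())


# Query: information AND retrieval
result = {d for d, text in docs.items() if satisfies_and(text, {"information", "retrieval"})}
print(result)  # {1, 3}: a set of matches, not a ranked list
```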
An example of a Boolean query is:
0%
cat dog
21%
cat~
5%
black cat
74%
cat AND dog NOT pet
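A query such as `cat AND dog NOT pet` can be evaluated with set operations on postings lists: AND is intersection, OR is union, and NOT (used here as AND NOT) is difference. The sketch below assumes a flat query of the form `term OP term OP term ...`, evaluated strictly left to right with no parentheses or operator precedence; the index contents are made up.

```python
def evaluate(query: str, index: dict[str, set[int]]) -> set[int]:
    """Evaluate a flat Boolean query left to right.

    Assumption: a binary 'NOT' means 'AND NOT', as in 'cat AND dog NOT pet'.
    """
    tokens = query.split()
    # Copy the first postings set so the index itself is never modified.
    result = set(index.get(tokens[0].lower(), set()))
    for i in range(1, len(tokens) - 1, 2):
        op, term = tokens[i], tokens[i + 1].lower()
        postings = index.get(term, set())
        if op == "AND":
            result &= postings      # intersection
        elif op == "OR":
            result |= postings      # union
        elif op == "NOT":
            result -= postings      # difference
        else:
            raise ValueError(f"unknown operator: {op}")
    return result


# Hypothetical postings: term -> set of document IDs.
index = {"cat": {1, 2, 3}, "dog": {1, 3, 4}, "pet": {3}}
print(evaluate("cat AND dog NOT pet", index))  # {1}
```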
The main component of an IR system that matches queries to documents is the:
21%
User Interface
11%
Parser
58%
Search Engine
11%
Database
Which of these is NOT a typical application of IR technology?
5%
Recommender system
5%
Email search
86%
Relational database management system (RDBMS)
5%
Web search engine
The Cranfield Paradigm for evaluation requires:
0%
A live user study
7%
Only a document collection
0%
Only a set of queries
93%
A document collection, queries, and relevance judgments
A test collection in IR is reusable because:
7%
It only uses synthetic data
0%
It never changes
7%
It is very small
86%
It allows different systems to be compared on the same ground truth
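The Cranfield setup and the reusability it buys can be sketched as follows: a fixed set of queries with relevance judgments ("qrels") acts as ground truth, and any number of systems can be scored against it. All data here is hypothetical, and plain precision and recall are used as a simple stand-in for whatever metrics a given study reports.

```python
# Hypothetical test collection: query ID -> set of relevant document IDs.
qrels = {"q1": {2, 5, 7}, "q2": {1, 3}}

# Hypothetical ranked outputs ("runs") of two systems on the same queries.
run_a = {"q1": [5, 1, 7, 9], "q2": [3, 4]}
run_b = {"q1": [1, 9, 5, 8], "q2": [1, 3]}


def precision_recall(retrieved: list[int], relevant: set[int]) -> tuple[float, float]:
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = sum(1 for d in retrieved if d in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def mean_scores(run: dict[str, list[int]], qrels: dict[str, set[int]]) -> tuple[float, float]:
    """Average precision and recall over all judged queries."""
    pairs = [precision_recall(run[q], relevant) for q, relevant in qrels.items()]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


# Both systems are judged against the same ground truth, so scores are comparable.
print(mean_scores(run_a, qrels))
print(mean_scores(run_b, qrels))
```

Because the judgments are stored once, a third system can be evaluated later without collecting new relevance assessments, as long as it retrieves mostly judged documents.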
System-centered evaluation focuses on:
0%
The aesthetics of the interface
8%
How quickly users learn the system
0%
User satisfaction surveys
92%
Measuring performance against a fixed test collection
User-centered evaluation focuses on:
8%
The number of documents in the collection
0%
The compression ratio of the index
8%
The speed of the indexing algorithm
85%
Involving real users who complete tasks, and measuring their success and satisfaction
The unit document problem in indexing refers to:
0%
Removing stop words
0%
Tokenizing the text
0%
Choosing a character encoding
100%
Deciding what constitutes a single document to be indexed (e.g., a book, a chapter, a page)
IR systems are important because they:
7%
Are easier to build than other systems
0%
Can understand the semantic meaning of all text
0%
Are faster than database systems
93%
Help users overcome information overload by finding needed information
The initial step in any IR process is:
0%
Building an index
6%
Displaying the results
18%
Ranking the results
76%
Understanding the user's information need
An inverted index consists of:
0%
A matrix of term frequencies
0%
A list of documents
8%
A graph of document relationships
92%
A dictionary of terms and their corresponding postings lists
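A minimal sketch of that structure: a dictionary of (sorted) terms, each pointing to a sorted postings list of document IDs. Tokenization is a bare lowercase whitespace split, an assumption made to keep the example short.

```python
def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to the sorted list of IDs of documents containing it."""
    postings: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings.setdefault(term, set()).add(doc_id)
    # Dictionary in term order, each postings list in docID order.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


# Hypothetical collection.
docs = {1: "new home sales", 2: "home sales rise", 3: "new home"}
index = build_inverted_index(docs)
print(index["home"])  # [1, 2, 3]
print(index["rise"])  # [2]
```

Sorted postings lists are what allow a Boolean AND to be computed with a linear-time merge of two lists.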
A posting in an inverted index typically contains:
0%
The user who authored the document
0%
The entire text of a document
9%
The relevance score for that term-document pair
91%
A document ID and often the term frequency or positions
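Extending the basic index so that each posting carries a document ID, the term frequency, and the positions of the term in that document gives a positional index. The tuple layout `(doc_id, tf, positions)` below is a hypothetical choice for illustration; real systems encode postings compactly.

```python
def build_positional_index(docs: dict[int, str]) -> dict[str, dict[int, list[int]]]:
    """term -> {doc_id: [positions of the term in that document]}."""
    index: dict[str, dict[int, list[int]]] = {}
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index.setdefault(term, {}).setdefault(doc_id, []).append(pos)
    return index


def postings(index: dict[str, dict[int, list[int]]], term: str) -> list[tuple[int, int, list[int]]]:
    """Return postings as (doc_id, term frequency, positions), in docID order."""
    return [(doc_id, len(pos), pos) for doc_id, pos in sorted(index.get(term, {}).items())]


index = build_positional_index({1: "to be or not to be", 2: "to do"})
print(postings(index, "to"))  # [(1, 2, [0, 4]), (2, 1, [0])]
```

The frequency supports ranking, and the positions support phrase and proximity queries; the relevance score itself is normally computed at query time rather than stored in the posting.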
The indexing process is typically performed:
9%
Only when a document is updated
0%
At query time (online)
9%
Both online and offline
82%
Offline, before any queries are received
The first step in the text preprocessing pipeline is:
0%
Case folding
11%
Stemming
11%
Stopping
78%
Tokenization
Tokenization is the process of:
9%
Removing common words
0%
Translating words into another language
55%
Splitting a text stream into individual tokens (words)
36%
Reducing words to their root form
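The two questions above fit together in one pipeline sketch: tokenization comes first and only splits the character stream into tokens; reducing words to a root form is stemming, a later step. The stop list is a tiny hypothetical one, and `naive_stem` is a toy suffix stripper, not the Porter stemmer. The order shown (tokenize, case-fold, stop, stem) is one common arrangement.

```python
import re

# Hypothetical, deliberately tiny stop list.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "to", "in"}


def tokenize(text: str) -> list[str]:
    """Split the character stream into runs of letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", text)


def naive_stem(token: str) -> str:
    """Toy stemmer (not Porter): strip one common suffix if a stem of 3+ chars remains."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    tokens = tokenize(text)                              # 1. tokenization
    tokens = [t.lower() for t in tokens]                 # 2. case folding
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 3. stopping
    return [naive_stem(t) for t in tokens]               # 4. stemming


print(preprocess("The cats are jumping over the walls!"))  # ['cat', 'jump', 'over', 'wall']
```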
A significant challenge in tokenizing text from social media (like Twitter) is:
9%
The text is too long
0%
The text is in multiple languages
0%
There are no spaces between words
91%
Handling hashtags (#), mentions (@), and emoticons
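A plain word tokenizer such as `re.findall(r"\w+", text)` strips the `#` and `@` markers, drops emoticons, and splits `can't` into two tokens. One common workaround is a regular expression that tries the more specific patterns first. The patterns below are illustrative assumptions and cover only a few simple ASCII emoticons, not a complete social-media tokenizer.

```python
import re

# Alternatives are tried left to right, so specific patterns come before plain words.
TOKEN_PATTERN = re.compile(
    r"""
    https?://\S+            # URLs
    | [#@]\w+               # hashtags and @-mentions
    | [:;=][-']?[()DPp]     # a few simple ASCII emoticons, e.g. :-) ;) :D
    | \w+(?:'\w+)?          # words, keeping an internal apostrophe (can't)
    """,
    re.VERBOSE,
)


def tokenize_tweet(text: str) -> list[str]:
    return TOKEN_PATTERN.findall(text)


tweet = "Loving #NLP with @alice :) can't wait"
print(re.findall(r"\w+", tweet))  # ['Loving', 'NLP', 'with', 'alice', 'can', 't', 'wait']
print(tokenize_tweet(tweet))      # ['Loving', '#NLP', 'with', '@alice', ':)', "can't", 'wait']
```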
The main reason for removing stop words is to:
9%
Make the text easier to read
0%
Improve grammatical correctness
9%
Increase the number of relevant documents
82%
Reduce the size of the index and improve efficiency