A test collection in IR is reusable because:
Anonymous Quiz
7%
It only uses synthetic data
0%
It never changes
7%
It is very small
86%
It allows different systems to be compared on the same ground truth
System-centered evaluation focuses on:
Anonymous Quiz
0%
The aesthetics of the interface
8%
How quickly users learn the system
0%
User satisfaction surveys
92%
Measuring performance against a fixed test collection
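The idea behind the two questions above (a shared ground truth, and measuring against a fixed test collection) can be sketched in a few lines. This is a minimal illustration only: the relevance judgments `qrels`, the two result lists, and the helper `precision_recall` are all hypothetical names and made-up data, not part of any real collection.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ground truth: documents judged relevant for query q1.
qrels = {"q1": {"d1", "d3", "d5"}}

# Hypothetical outputs of two different systems for the same query.
system_a = {"q1": ["d1", "d2", "d3", "d4"]}
system_b = {"q1": ["d1", "d3", "d5", "d7"]}

# Both systems are scored against the same judgments, so the numbers are comparable.
result_a = precision_recall(system_a["q1"], qrels["q1"])
result_b = precision_recall(system_b["q1"], qrels["q1"])
print("A:", result_a)
print("B:", result_b)
```

Because the judgments stay fixed, a third system added later can be scored the same way without collecting new judgments, which is what makes the collection reusable.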
User-centered evaluation focuses on:
Anonymous Quiz
8%
The number of documents in the collection
0%
The compression ratio of the index
8%
The speed of the indexing algorithm
85%
Involving real users to complete tasks and measuring their success/satisfaction
The unit document problem in indexing refers to:
Anonymous Quiz
0%
Removing stop words
0%
Tokenizing the text
0%
Choosing a character encoding
100%
Deciding what constitutes a single document to be indexed (e.g., a book, a chapter, a page)
IR systems are important because they:
Anonymous Quiz
7%
Are easier to build than other systems
0%
Can understand the semantic meaning of all text
0%
Are faster than database systems
93%
Help users overcome information overload by finding needed information
The initial step in any IR process is:
Anonymous Quiz
0%
Building an index
6%
Displaying the results
18%
Ranking the results
76%
Understanding the user’s information need
An inverted index consists of:
Anonymous Quiz
0%
A matrix of term frequencies
0%
A list of documents
8%
A graph of document relationships
92%
A dictionary of terms and their corresponding postings lists
A posting in an inverted index typically contains:
Anonymous Quiz
0%
The user who authored the document
0%
The entire text of a document
9%
The relevance score for that term-document pair
91%
A document ID and often the term frequency or positions
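The structure described in the two questions above (a dictionary of terms, each pointing to a postings list of document IDs with term frequencies or positions) can be sketched as follows. This is a toy, in-memory version under simple assumptions: whitespace tokenization, lowercase folding, and three made-up documents. The function name `build_inverted_index` is hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a postings list of (doc_id, term_frequency, positions)."""
    positions_by_term = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # Assumption: splitting on whitespace is good enough for this sketch.
        for position, token in enumerate(text.lower().split()):
            positions_by_term[token][doc_id].append(position)

    index = {}
    for term, per_doc in positions_by_term.items():
        # Postings are kept sorted by document ID.
        index[term] = [(doc_id, len(pos), pos) for doc_id, pos in sorted(per_doc.items())]
    return index

# Made-up documents, indexed once before any query arrives.
docs = {
    1: "new home sales",
    2: "home sales rise in july",
    3: "new home new prices",
}
index = build_inverted_index(docs)
print(index["new"])
print(index["home"])
```

Note that the postings hold no relevance score and no document text: a score is usually computed at query time from the stored frequencies.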
The indexing process is typically performed:
Anonymous Quiz
9%
Only when a document is updated
0%
At query time (online)
9%
Both online and offline
82%
Offline, before any queries are received
The first step in the text preprocessing pipeline is:
Anonymous Quiz
0%
Case folding
11%
Stemming
11%
Stopping
78%
Tokenization
Tokenization is the process of:
Anonymous Quiz
9%
Removing common words
0%
Translating words into another language
55%
Splitting a text stream into individual tokens (words)
36%
Reducing words to their root form
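Since tokenization and stemming are easy to confuse, a small sketch of the preprocessing steps named above may help keep them apart. Everything here is an assumption made for illustration: the stop list is a tiny hand-picked set, and `naive_stem` is a crude suffix stripper, not a real stemming algorithm such as Porter's.

```python
import re

# Illustrative stop list only; real systems use longer, curated lists.
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "are", "and", "in"}

def tokenize(text):
    """Tokenization: split the character stream into tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)

def case_fold(tokens):
    """Case folding: map every token to lowercase."""
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    """Stopping: drop very common words."""
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    """Stemming (toy version): strip a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

text = "Searching the indexes of documents"
tokens = tokenize(text)
folded = case_fold(tokens)
content = remove_stop_words(folded)
stems = [naive_stem(t) for t in content]
print(tokens, folded, content, stems, sep="\n")
```

Tokenization only decides where tokens begin and end; reducing "searching" to "search" happens later, in the stemming step.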
A significant challenge in tokenizing text from social media (like Twitter) is:
Anonymous Quiz
9%
The text is too long
0%
The text is in multiple languages
0%
There are no spaces between words
91%
Handling hashtags (#), mentions (@), and emoticons
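To see why hashtags, mentions, and emoticons are a problem, compare a naive word tokenizer with one that treats them as single tokens. Both regular expressions below are assumptions written for this sketch; the emoticon pattern covers only a handful of simple forms, and the tweet is made up.

```python
import re

# Naive tokenizer: keeps only runs of word characters.
NAIVE = re.compile(r"\w+")

# Social-media-aware tokenizer (illustrative): tries hashtags/mentions first,
# then a few simple emoticons such as :) ;-( :D, then plain words.
SOCIAL = re.compile(r"[#@]\w+|[:;]-?[)(DP]|\w+")

tweet = "Loving #InfoRetrieval with @alice :) so much"

naive_tokens = NAIVE.findall(tweet)
social_tokens = SOCIAL.findall(tweet)
print(naive_tokens)
print(social_tokens)
```

The naive version silently drops the `#` and `@` markers and loses the emoticon entirely, so a search for the hashtag can no longer be told apart from a search for the plain word.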
The main reason for removing stop words is to:
Anonymous Quiz
9%
Make the text easier to read
0%
Improve grammatical correctness
9%
Increase the number of relevant documents
82%
Reduce the size of the index and improve efficiency
For which type of query is stop word removal particularly problematic?
Anonymous Quiz
0%
Queries with spelling errors
9%
Boolean AND queries
0%
Queries with very rare terms
91%
Phrase queries (e.g., "to be or not to be")
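The trade-off in the two questions above can be shown directly: stopping shrinks what has to be indexed, but a phrase made up entirely of stop words disappears. The stop list and the helper `remove_stop_words` are hypothetical and chosen only for this illustration.

```python
# Illustrative stop list; real lists are longer and language-specific.
STOP_WORDS = {"to", "be", "or", "not", "the", "a", "of"}

def remove_stop_words(query):
    """Lowercase the query, split on whitespace, and drop stop words."""
    return [t for t in query.lower().split() if t not in STOP_WORDS]

phrase_query = "to be or not to be"
topical_query = "the history of information retrieval"

remaining_phrase = remove_stop_words(phrase_query)
remaining_topical = remove_stop_words(topical_query)
print(remaining_phrase)
print(remaining_topical)
```

The topical query keeps its content words, while the phrase query is left with nothing to match against the index.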
What is the main goal of Information Retrieval (IR) systems?
Anonymous Quiz
0%
Store structured data
91%
Find unstructured documents that satisfy the user’s need
0%
Compress data
9%
Translate texts
Which of these applications is an example of IR?
Anonymous Quiz
17%
Searching emails
0%
Spam filtering
25%
Searching images or videos
58%
All of the above
What is the main difference between IR and databases?
Anonymous Quiz
82%
IR deals with free text, while databases handle structured data
9%
Databases are slower
0%
IR is always more accurate
9%
No difference
In IR, which component represents the user’s need?
Anonymous Quiz
20%
Documents
40%
Query
0%
Index
40%
Search engine
What are the main challenges in IR?
Anonymous Quiz
73%
Efficiency and effectiveness
9%
Algorithm design
9%
Text size only
9%
Language accuracy
How is a document defined in IR systems?
Anonymous Quiz
8%
Only a structured element
92%
An element with a unique identifier; it can be text, image, or video
0%
A small database
0%
A fixed text paragraph