What is the problem with the incidence matrix in large collections?
Anonymous Quiz
22%
Very dense
0%
Very small
78%
Sparse, full of zeros
0%
Cannot be created
If we have 1 million documents and 500,000 terms, total cells = ?
Anonymous Quiz
30%
500 million
0%
1 billion
70%
500 billion
0%
1 trillion
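
The cell count in the question above can be checked with one multiplication. This is a minimal sketch that assumes the usual term-document incidence matrix layout, with one cell per (term, document) pair; the variable names are illustrative.

```python
# One cell per (term, document) pair in a term-document incidence matrix.
num_documents = 1_000_000
num_terms = 500_000

total_cells = num_documents * num_terms
print(f"{total_cells:,}")  # 500,000,000,000 (500 billion)
```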
What is the biggest drawback of the Boolean model?
Anonymous Quiz
75%
Does not rank results by relevance
25%
Always slow
0%
Never used
0%
Relies on AI
Which systems still use the Boolean model?
Anonymous Quiz
13%
Modern search engines only
88%
Email systems, libraries, Spotlight
0%
Neural networks
0%
None
What happens if no document matches the query?
Anonymous Quiz
22%
Returns all documents
44%
Returns empty results
22%
Returns half of the documents
11%
Returns random results
What is the purpose of using an inverted index?
Anonymous Quiz
40%
Reduce storage
40%
Speed up search
20%
Translate documents
0%
Remove duplicates
What are the components of an inverted index?
Anonymous Quiz
70%
Dictionary + Postings Lists
20%
Spreadsheet
10%
SQL database
0%
Text files
What does the dictionary contain?
Anonymous Quiz
60%
All unique terms
30%
Full text of documents
10%
Usernames
0%
Titles only
What do postings lists contain?
Anonymous Quiz
89%
List of documents containing the term
0%
Synonyms
11%
Letter frequency
0%
Image locations
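
As a rough illustration of the dictionary and postings lists asked about above, here is a minimal sketch. It assumes whitespace tokenization and lowercasing, and `build_inverted_index` and the sample documents are hypothetical names made up for this example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each unique term to a sorted list of IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: simple lowercasing and whitespace splitting.
        for token in text.lower().split():
            postings[token].add(doc_id)
    # The keys form the dictionary; the values are the postings lists.
    return {term: sorted(ids) for term, ids in sorted(postings.items())}

docs = {1: "wink and drink", 2: "drink water", 3: "wink twice"}
index = build_inverted_index(docs)
print(index["drink"])  # [1, 2]
print(index["wink"])   # [1, 3]
```

Only the (term, document) pairs that actually occur are stored, which is why this structure avoids the zeros of the incidence matrix.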
What is the first step in building an inverted index?
Anonymous Quiz
89%
Tokenization
0%
Normalization
11%
Stemming
0%
Sorting
Tokenization means:
Anonymous Quiz
73%
Splitting text into words/tokens
27%
Removing short words
0%
Compressing text
0%
Translating text
Challenge posed by "Hewlett-Packard" during tokenization:
Anonymous Quiz
44%
One word or two?
33%
Ambiguous meaning
22%
Long length
0%
Not used
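
The hyphen question above comes down to a tokenizer design choice. The sketch below shows both options with Python's `re` module; the two function names and the ASCII-only patterns are assumptions made for illustration, not a standard tokenizer.

```python
import re

def tokenize_split_hyphens(text):
    """Treat the hyphen as a separator: 'Hewlett-Packard' becomes two tokens."""
    return re.findall(r"[A-Za-z0-9]+", text)

def tokenize_keep_hyphens(text):
    """Keep hyphenated words together: 'Hewlett-Packard' stays one token."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)

text = "Hewlett-Packard printers"
print(tokenize_split_hyphens(text))  # ['Hewlett', 'Packard', 'printers']
print(tokenize_keep_hyphens(text))   # ['Hewlett-Packard', 'printers']
```

Whichever rule is chosen has to be applied to both documents and queries, otherwise the terms will not match.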
Query: wink AND drink means:
Anonymous Quiz
33%
Union of lists
56%
Intersection of lists
11%
Subtract lists
0%
Nothing
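
An AND query can be answered by walking the two postings lists in parallel. This is a minimal sketch of that merge, assuming both lists are sorted by document ID; the function name `intersect` and the sample lists are illustrative.

```python
def intersect(p1, p2):
    """Return the document IDs present in both sorted postings lists."""
    result = []
    i = j = 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the list with the smaller ID
        else:
            j += 1
    return result

wink = [1, 3, 7, 9]
drink = [2, 3, 9, 12]
print(intersect(wink, drink))     # [3, 9]
print(intersect([1, 2], [5, 6]))  # [] when no document matches
```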
What distinguishes a positional index from a standard (non-positional) index?
Anonymous Quiz
63%
Stores word positions
38%
Smaller size
0%
Always faster
0%
No difference
Purpose of a positional index?
Anonymous Quiz
100%
Support phrase and proximity queries
0%
Reduce storage
0%
Compress text
0%
No use
Drawback of a positional index?
Anonymous Quiz
30%
Large size
10%
Always slow
30%
Inaccurate
30%
Complicated
A NEAR/3 query means:
Anonymous Quiz
70%
Two words within 3 words of each other
10%
Two words within 3 documents
20%
Two words within 3 seconds
0%
Two identical words
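
To make the positional index and proximity questions above concrete, here is a minimal sketch. It assumes whitespace tokenization and reads "within k words" as a position difference of at most k (systems may define this slightly differently); `build_positional_index`, `near`, and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map term -> doc ID -> list of word positions where the term occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token][doc_id].append(pos)
    return index

def near(index, term_a, term_b, k):
    """Return IDs of documents where the two terms occur at most k positions apart."""
    docs_a = index.get(term_a, {})
    docs_b = index.get(term_b, {})
    hits = []
    for doc_id in sorted(docs_a.keys() & docs_b.keys()):
        if any(abs(pa - pb) <= k
               for pa in docs_a[doc_id] for pb in docs_b[doc_id]):
            hits.append(doc_id)
    return hits

docs = {
    1: "to wink and then drink",
    2: "wink now and much later after a long while drink",
}
index = build_positional_index(docs)
print(near(index, "wink", "drink", 3))  # [1]
```

Storing every position of every term is what lets the index answer phrase and proximity queries, and it is also why a positional index is considerably larger than a non-positional one.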
Purpose of stop word removal?
Anonymous Quiz
100%
Reduce size and focus on important terms
0%
Increase frequency
0%
Improve translation
0%
Compress text
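
Stop word removal can be sketched as a simple filter over the token stream. The stop list below is a small illustrative sample, not a standard list, and `remove_stop_words` is a hypothetical helper.

```python
# Assumption: a tiny hand-picked stop list for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}

def remove_stop_words(tokens):
    """Drop very common words so the index keeps only content-bearing terms."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = "the purpose of an inverted index".split()
print(remove_stop_words(tokens))  # ['purpose', 'inverted', 'index']
```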