Natural Language Processing with Python, Updated Edition
Chapter 2: Basic Text Processing
Section 4 of 8
- What is tokenization in NLP?
  - a) Combining multiple words into a single token.
  - b) Splitting text into smaller units such as words or sentences.
  - c) Removing punctuation from text.
  - d) Encoding text into binary format.
- Which of the following is a technique used to reduce words to their base or root form?
  - a) Tokenization
  - b) Stop word removal
  - c) Stemming
  - d) Vectorization
- What are stop words?
  - a) Words that are frequently used and often removed during text preprocessing.
  - b) Words that are rarely used in any text.
  - c) Words that are essential for the meaning of a sentence.
  - d) Words that appear at the end of a sentence.
- Which Python library can be used to apply regular expressions for text processing?
  - a) re
  - b) numpy
  - c) pandas
  - d) matplotlib
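The preprocessing steps these questions cover (sentence and word tokenization, stop-word removal, and stemming) can be tried out with a minimal sketch that uses only Python's standard-library `re` module. The stop-word set and suffix rules below are hypothetical, illustrative choices, and `naive_stem` is a crude suffix stripper written for this example; it is not the Porter stemmer or any library's stemming algorithm.

```python
import re

# Hypothetical, tiny stop-word set for illustration only; real NLP toolkits
# ship much larger lists.
STOP_WORDS = {"the", "is", "a", "an", "and", "of", "to", "in", "are"}

# Hypothetical suffix rules for a naive stemmer (NOT the Porter algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def sentence_tokenize(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(text):
    """Lowercase the text and return runs of letters, digits, or apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Drop every token that appears in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


text = "The cats are jumping. Tokenization splits text into tokens!"
sentences = sentence_tokenize(text)
tokens = word_tokenize(text)
content = remove_stop_words(tokens)
stems = [naive_stem(t) for t in content]

print(sentences)  # → ['The cats are jumping.', 'Tokenization splits text into tokens!']
print(tokens)     # → ['the', 'cats', 'are', 'jumping', 'tokenization', 'splits', 'text', 'into', 'tokens']
print(content)    # → ['cats', 'jumping', 'tokenization', 'splits', 'text', 'into', 'tokens']
print(stems)      # → ['cat', 'jump', 'tokenization', 'split', 'text', 'into', 'token']
```

Because the stemmer only strips suffixes, its output need not be a dictionary word (for example, `naive_stem("studies")` gives `"studi"`); reducing words to valid dictionary forms is the job of lemmatization instead.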