1. Word embeddings are important for providing input features in downstream language tasks such as text classification.
2. Controlled experiments were conducted to systematically examine classic and contextualized word embeddings for text classification using two encoders, CNN and BiLSTM.
3. The study recommends choosing CNN over BiLSTM for document classification datasets where sequential context is less indicative of class membership, and BERT over ELMo for long-document datasets.
The article titled "A Comparative Study on Word Embeddings in Deep Learning for Text Classification" presents a systematic examination of classic and contextualized word embeddings for text classification. The study uses two encoders, CNN and BiLSTM, to encode sequences of word representations and evaluates their performance on four benchmark classification datasets.
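To make the comparison concrete, the following is a minimal PyTorch sketch of the two encoder families the study compares. The embedding dimension, filter sizes, hidden size, and class count are illustrative assumptions, not the authors' actual configuration.

```python
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    """1D convolutions over the embedded sequence, max-pooled over time."""
    def __init__(self, emb_dim=300, num_filters=100, kernel_sizes=(3, 4, 5), num_classes=4):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, k) for k in kernel_sizes]
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, emb):                       # emb: (batch, seq_len, emb_dim)
        x = emb.transpose(1, 2)                   # Conv1d expects (batch, channels, seq_len)
        pooled = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))

class BiLSTMEncoder(nn.Module):
    """Bidirectional LSTM; final forward/backward hidden states feed the classifier."""
    def __init__(self, emb_dim=300, hidden_size=128, num_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_size, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, emb):                       # emb: (batch, seq_len, emb_dim)
        _, (h_n, _) = self.lstm(emb)              # h_n: (2, batch, hidden_size)
        return self.fc(torch.cat([h_n[0], h_n[1]], dim=1))

# Both encoders consume the same pre-computed word representations
# (classic or contextualized), here stubbed as random 300-d vectors.
embedded_batch = torch.randn(8, 50, 300)          # (batch, seq_len, emb_dim)
print(CNNEncoder()(embedded_batch).shape)         # torch.Size([8, 4])
print(BiLSTMEncoder()(embedded_batch).shape)      # torch.Size([8, 4])
```

The key structural difference the study exploits is visible here: the CNN pools local n-gram features regardless of their position, while the BiLSTM accumulates sequential context across the whole document.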
One potential bias in the study is its reliance on only four benchmark datasets, which may not be representative of all text classification tasks. Additionally, the study does not consider the impact of different hyperparameters or optimization techniques on the performance of the word embeddings.
The article reports that CNN outperforms BiLSTM in most situations, especially on datasets where document context is less indicative of class membership. However, it does not explain in detail why this is the case or explore potential counterarguments to the claim.
The study also recommends choosing BERT over ELMo for long-document datasets, but it offers little evidence for this beyond stating that BERT outperforms ELMo overall. This lack of evidence weakens the credibility of the recommendation.
Furthermore, while the article acknowledges that concatenating multiple classic embeddings, or increasing their size, does not yield a statistically significant difference in performance, it does not explore the potential risks of larger or concatenated embeddings, such as increased computational cost or overfitting.
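As a rough illustration of that computational-cost concern, the sketch below shows how concatenating two hypothetical 300-dimensional classic embeddings (stand-ins for, e.g., GloVe-style and word2vec-style lookup tables, stubbed here with random vectors) doubles the per-token input width that any downstream encoder must process.

```python
import numpy as np

# Hypothetical 300-dimensional lookup tables standing in for two classic
# embedding sources; real tables would be loaded from pretrained files.
vocab = ["the", "cat", "sat"]
glove_like    = {w: np.random.rand(300) for w in vocab}
word2vec_like = {w: np.random.rand(300) for w in vocab}

def concatenated_vector(token):
    # Each additional source adds its full width to every token vector,
    # so the first encoder layer (Conv1d / LSTM input) grows accordingly.
    return np.concatenate([glove_like[token], word2vec_like[token]])

sentence = ["the", "cat", "sat"]
matrix = np.stack([concatenated_vector(t) for t in sentence])
print(matrix.shape)  # (3, 600): twice the width of a single 300-d embedding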
Overall, while the article provides valuable insights into how different word embeddings perform on text classification tasks, it would benefit from a more thorough treatment of these potential biases and limitations.