TY - GEN
T1 - Effect of text color on word embeddings
AU - Ikoma, Masaya
AU - Iwana, Brian Kenji
AU - Uchida, Seiichi
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Number JP17H06100.
Publisher Copyright:
© Springer Nature Switzerland AG 2020.
PY - 2020
Y1 - 2020
AB - In natural scenes and documents, we can find a correlation between text and its color. For instance, the word “hot” is often printed in red, while “cold” is often printed in blue. This correlation can be thought of as a feature that represents the semantic difference between the words. Based on this observation, we propose the idea of using text color for word embeddings. While text-only word embeddings (e.g., word2vec) have been extremely successful, they often represent antonyms as similar because antonyms are often interchangeable in sentences. In this paper, we try two tasks to verify the usefulness of text color in understanding the meanings of words, especially in identifying synonyms and antonyms. First, we quantify the color distribution of words from book cover images and analyze the correlation between the color and meaning of each word. Second, we try to retrain word embeddings with the color distribution of words as a constraint. By observing the changes in the word embeddings of synonyms and antonyms before and after retraining, we aim to understand which kinds of words are positively or negatively affected in their embeddings when text color information is incorporated.
UR - http://www.scopus.com/inward/record.url?scp=85090096490&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090096490&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-57058-3_24
DO - 10.1007/978-3-030-57058-3_24
M3 - Conference contribution
AN - SCOPUS:85090096490
SN - 9783030570576
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 341
EP - 355
BT - Document Analysis Systems - 14th IAPR International Workshop, DAS 2020, Proceedings
A2 - Bai, Xiang
A2 - Karatzas, Dimosthenis
A2 - Lopresti, Daniel
PB - Springer
T2 - 14th IAPR International Workshop on Document Analysis Systems, DAS 2020
Y2 - 26 July 2020 through 29 July 2020
ER -