Effect of text color on word embeddings

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In natural scenes and documents, text and its color are often correlated. For instance, the word “hot” is often printed in red, while “cold” is often printed in blue. This correlation can be regarded as a feature that reflects the semantic difference between the words. Based on this observation, we propose using text color for word embeddings. While text-only word embeddings (e.g., word2vec) have been extremely successful, they tend to represent antonyms as similar, because antonyms are frequently interchangeable in sentences. In this paper, we carry out two tasks to verify the usefulness of text color in understanding the meanings of words, especially in identifying synonyms and antonyms. First, we quantify the color distribution of words in book cover images and analyze the correlation between the color and the meaning of each word. Second, we retrain word embeddings using the color distribution of words as a constraint. By observing how the embeddings of synonyms and antonyms change before and after retraining, we aim to understand which kinds of words are positively or negatively affected when text color information is incorporated into their embeddings.
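The first task, quantifying the color distribution of a word, can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes that the text pixels of each word occurrence have already been extracted from the cover images (for example, by a text detector plus a foreground mask), quantizes RGB values into a coarse 4×4×4 joint histogram, averages the histograms over occurrences, and compares two words with the Jensen-Shannon divergence. The function names (`color_histogram`, `word_color_distribution`, `js_divergence`), the binning, and the choice of divergence are all hypothetical choices made for illustration.

```python
import numpy as np


def color_histogram(pixels, bins_per_channel=4):
    """Hypothetical helper: quantize RGB text pixels (N x 3, values 0-255)
    into a normalized joint color histogram of length bins_per_channel**3."""
    pixels = np.asarray(pixels, dtype=np.int64).reshape(-1, 3)
    if pixels.shape[0] == 0:
        raise ValueError("need at least one text pixel")
    # Map each channel value 0..255 to a bin index 0..bins_per_channel-1.
    idx = (pixels * bins_per_channel) // 256
    # Flatten the (r, g, b) bin triple into a single bin index.
    flat = (idx[:, 0] * bins_per_channel ** 2
            + idx[:, 1] * bins_per_channel
            + idx[:, 2])
    hist = np.bincount(flat, minlength=bins_per_channel ** 3).astype(float)
    return hist / hist.sum()


def word_color_distribution(occurrences, bins_per_channel=4):
    """Hypothetical helper: average the per-occurrence histograms of one word,
    so each occurrence counts equally regardless of its pixel count."""
    hists = [color_histogram(p, bins_per_channel) for p in occurrences]
    return np.mean(hists, axis=0)


def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs (bounded in [0, 1])."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        # Only bins where a > 0 contribute; there b = m > 0 as well.
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Toy pixel data (made up for illustration, not from the paper).
    hot = word_color_distribution([np.array([[220, 30, 30]] * 50),
                                   np.array([[200, 20, 40]] * 30)])
    warm = word_color_distribution([np.array([[230, 60, 20]] * 20)])
    cold = word_color_distribution([np.array([[30, 40, 210]] * 40)])
    print(js_divergence(hot, warm))  # 0.0: same color bin
    print(js_divergence(hot, cold))  # 1.0: disjoint color bins
```

With this coarse binning, the reddish pixels of the toy “hot” and “warm” samples land in the same bin, while the blue “cold” sample lands in a different one, so the divergence separates the antonym pair even though a text-only embedding might place the two words close together. A finer binning or a perceptual color space would be natural alternatives; which one the paper uses is not stated here.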

Original language: English
Title of host publication: Document Analysis Systems - 14th IAPR International Workshop, DAS 2020, Proceedings
Editors: Xiang Bai, Dimosthenis Karatzas, Daniel Lopresti
Publisher: Springer
Pages: 341-355
Number of pages: 15
ISBN (Print): 9783030570576
Publication status: Published - 2020
Event: 14th IAPR International Workshop on Document Analysis Systems, DAS 2020 - Wuhan, China
Duration: Jul 26 2020 → Jul 29 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12116 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 14th IAPR International Workshop on Document Analysis Systems, DAS 2020
Country: China
City: Wuhan
Period: 7/26/20 → 7/29/20

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
