Example: Review Labeling by Topic

In this notebook, we use a subset of the Amazon Product Review data to demonstrate how this work can be used for labeling arguments. The problem addressed here is determining the credibility (reliable/unreliable) of reviews that evaluate a specific aspect of a product (e.g., the size of shoes), while providing reasoning for the results. The dataset can be found here.

[21]:
from orangearg.argument.miner import reader, chunker, processor, miner

fpath = "./example_dataset.json"

Read the input file

[2]:
df_arguments = reader.read_json_file(fpath=fpath)
df_arguments = df_arguments.dropna().reset_index(drop=True)  # remove rows with na

The results of reading the data file are as follows. It can be seen that this dataset contains two aspects of information, namely the text of the reviews (reviewText) and the rating evaluations provided by the purchasers (overall, ranging from 1 to 5 stars).

[3]:
df_arguments
[3]:
reviewText overall
0 I always get a half size up in my tennis shoes... 3
1 Put them on and walked 3 hours with no problem... 5
2 excelente 5
3 The shoes fit well in the arch area. They are ... 4
4 Tried them on in a store before buying online ... 5
... ... ...
365 Favorite Nike shoe ever! The flex sole is exce... 5
366 I wear these everyday to work, the gym, etc. 5
367 Love these shoes! Great fit, very light weight. 5
368 Super comfortable and fit my small feet perfec... 5
369 Love these shoes! 5

370 rows × 2 columns

Split arguments into chunks

Reviews will first be segmented into smaller chunks, i.e. clauses that express complete meanings. This is done to identify the different perspectives from which reviews provide their evaluations, in preparation for the subsequent review labeling step.
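The library's chunker handles this segmentation. As a rough illustration only (not the library's actual implementation), a review can be split into clause-like chunks on sentence punctuation and coordinating conjunctions:

```python
import re

def split_into_chunks(review: str) -> list[str]:
    """Naively split a review into clause-like chunks on sentence
    punctuation and the conjunctions 'and'/'but' (illustration only)."""
    parts = re.split(r"(?:[.!?]+|\band\b|\bbut\b)", review)
    return [p.strip() for p in parts if p.strip()]

print(split_into_chunks("Put them on and walked 3 hours with no problem. Love them!"))
# → ['Put them on', 'walked 3 hours with no problem', 'Love them']
```

A real implementation would use linguistic clause boundaries rather than keyword matching, but the idea is the same: one review yields several chunks, each carrying a single evaluation.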

[4]:
arguments = df_arguments["reviewText"]
arg_scores = df_arguments["overall"]

# Split reviews into chunks
chunk_arg_ids, chunks = chunker.get_chunk(docs=arguments)

# Compute polarity score of chunks
chunk_p_scores = chunker.get_chunk_polarity_score(chunks=chunks)

# Compute topics of chunks
chunk_topics, chunk_embeds, df_topics = chunker.get_chunk_topic(chunks=chunks)

# Compute importance of chunks inside the arguments
chunk_ranks = chunker.get_chunk_rank(arg_ids=chunk_arg_ids, embeds=chunk_embeds)

# Collect everything together as a dataframe
df_chunks = chunker.get_chunk_table(
    arg_ids=chunk_arg_ids,
    chunks=chunks,
    p_scores=chunk_p_scores,
    topics=chunk_topics,
    ranks=chunk_ranks
)

Some explanations of df_chunks:

  • argument_id: the index of the argument the chunk comes from.

  • polarity_score: the sentiment polarity score of a chunk, in the range [-1, 1], where 0 signifies neutrality, positive values indicate positivity, and negative values denote negativity.

  • topic: the index of a topic in the df_topics table below.

  • rank: the importance of a chunk within the argument it comes from, in the range [0, 1]. This is computed as the PageRank of the chunk in the similarity network of chunks. Therefore, the ranks of chunks belonging to the same argument sum to 1.
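The rank computation can be sketched as follows: for the chunks of one argument, build a graph weighted by the cosine similarity of their embeddings and run PageRank on it. This is a sketch of the idea, not the library's exact implementation:

```python
import numpy as np
import networkx as nx

def chunk_ranks(embeds: np.ndarray) -> dict:
    """Rank the chunks of one argument: build a complete graph weighted by
    cosine similarity of chunk embeddings, then run PageRank on it."""
    n = len(embeds)
    # Cosine similarity matrix of the (row-normalized) embeddings
    norm = embeds / np.linalg.norm(embeds, axis=1, keepdims=True)
    sim = norm @ norm.T
    G = nx.Graph()
    for i in range(n):
        for j in range(i + 1, n):
            G.add_edge(i, j, weight=float(sim[i, j]))
    return nx.pagerank(G, weight="weight")

# Three toy chunk embeddings from the same argument
ranks = chunk_ranks(np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]))
print(round(sum(ranks.values()), 6))  # ranks sum to 1
```

The first two (similar) chunks end up with nearly equal ranks, and all ranks of one argument sum to 1, matching the property described above.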

[5]:
df_chunks
[5]:
argument_id chunk polarity_score topic rank
0 0 I always get a half size up in my tennis shoes . -0.166667 4 0.500000
1 0 For some reason these feel to big in the heel ... -0.050000 10 0.500000
2 1 walked 3 hours with no problem 0.000000 7 0.249374
3 1 Put them on and ! 0.000000 2 0.255228
4 1 Love them ! 0.625000 3 0.250344
... ... ... ... ... ...
1192 368 I can wear the shoe all day long and -0.050000 15 0.125961
1193 368 they are easy to clean compared to other shoes... 0.225000 0 0.128238
1194 368 They are light colored so any dirt will be see... 0.342857 23 0.128449
1195 368 Would definitely buy another pair in a differe... 0.000000 13 0.125681
1196 369 Love these shoes ! 0.625000 22 1.000000

1197 rows × 5 columns

And explanations of df_topics:

  • topic: the index of a topic

  • count: the number of chunks in a topic

  • keywords: the top keywords of a topic

  • name: a short name of the topic

[6]:
df_topics.head()
[6]:
topic count name keywords
0 0 147 0_shoes_the_these_for (shoes, the, these, for, shoe, comfortable, ar...
1 1 99 1_fit_perfect_true_perfectly (fit, perfect, true, perfectly, fits, expected...
2 2 87 2_for_them_work_use (for, them, work, use, wear, training, in, gym...
3 3 79 3_love_them_they_are (love, them, they, are, these, cute, really, p...
4 4 74 4_size_ordered_half_big (size, ordered, half, big, large, order, an, a...

Merge chunks back to arguments

By merging the chunks back into reviews and performing the corresponding computations, we will obtain relevant information at the review level, including the topics covered in each review, the sentiment of the review, and its consistency with the overall score. This information will be further used for labeling reviews under different topics.

[7]:
# Compute topics of arguments
arg_topics = processor.get_argument_topics(arg_ids=chunk_arg_ids, topics=chunk_topics)

# Compute sentiment of arguments
arg_sentiments = processor.get_argument_sentiment(arg_ids=chunk_arg_ids, ranks=chunk_ranks, p_scores=chunk_p_scores)

# Compute the coherence between the sentiment and the overall score of arguments
arg_coherences = processor.get_argument_coherence(scores=arg_scores, sentiments=arg_sentiments)

# Collect everything together as a dataframe
df_arguments_processed = processor.update_argument_table(
    df_arguments=df_arguments,
    topics=arg_topics,
    sentiments=arg_sentiments,
    coherences=arg_coherences
)

Some columns are added to the original df_arguments dataframe, which are:

  • topics: the topics that an argument mentions.

  • sentiment: the sentiment score of an argument, in the range [0, 1]; the higher, the more positive.

  • coherence: the coherence between the sentiment and the overall score, in the range [0, 1]; the higher, the more coherent.
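These two quantities can be sketched as follows: the sentiment as the rank-weighted mean of the chunk polarity scores mapped to [0, 1], and the coherence as a Gaussian agreement between the sentiment and the star rating normalized to [0, 1]. The Gaussian form and its variance are assumptions, chosen because they reproduce the example values in the tables here:

```python
import math

def argument_sentiment(ranks, polarity_scores):
    """Rank-weighted mean of chunk polarities, mapped from [-1, 1] to [0, 1]."""
    return sum(r * (p + 1) / 2 for r, p in zip(ranks, polarity_scores))

def argument_coherence(sentiment, overall, variance=0.2):
    """Gaussian agreement between the sentiment and the normalized star
    rating (the Gaussian form and variance are assumptions)."""
    expected = (overall - 1) / 4  # map 1..5 stars to 0..1
    return math.exp(-((sentiment - expected) ** 2) / (2 * variance))

# Review 0 above: two chunks with equal rank, overall = 3 stars
s = argument_sentiment([0.5, 0.5], [-0.166667, -0.05])
print(round(s, 6), round(argument_coherence(s, 3), 6))  # → 0.445833 0.992692
```

A mildly negative review with a middling 3-star rating thus scores as highly coherent, while the same text paired with a 5-star rating would not.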

[8]:
df_arguments_processed.head()
[8]:
reviewText overall topics sentiment coherence
0 I always get a half size up in my tennis shoes... 3 (4, 10) 0.445833 0.992692
1 Put them on and walked 3 hours with no problem... 5 (7, 2, 3, 9) 0.627243 0.706545
2 excelente 5 (6,) 0.500000 0.535261
3 The shoes fit well in the arch area. They are ... 4 (0, 10, 10, 21) 0.524397 0.880521
4 Tried them on in a store before buying online ... 5 (1, 0, 5, 0, 6) 0.712758 0.813614

Review labeling

In this step, looking at the reviews under a specific topic, an attacking network of reviews is built, where nodes are reviews and edges are the attacks between them. Reviews are then labeled based on this network.

These are the rules of generating the network:

  • Edges exist only between reviews with different overall scores.

  • Edges are directed from the review with higher coherence to the one with lower coherence.

  • The weight of an edge is computed as the difference between the coherence values of its endpoints.

  • A node is labeled as supportive (meaning reliable in our case),

    • if no other nodes attack it, or

    • if all attackers of this node are attacked by some other nodes.

  • A node is labeled as defeated (meaning unreliable in our case), if it is not supportive.
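The labeling rules above can be sketched directly (this is an illustration, not the library's implementation of miner.get_node_labels):

```python
def label_nodes(indices, sources, targets):
    """Label each node 'supportive' if it has no attackers, or if every
    one of its attackers is itself attacked; otherwise 'defeated'."""
    attackers = {i: set() for i in indices}
    for s, t in zip(sources, targets):
        attackers[t].add(s)
    labels = {}
    for i in indices:
        # all() is vacuously True when the node has no attackers
        if all(attackers[a] for a in attackers[i]):
            labels[i] = "supportive"
        else:
            labels[i] = "defeated"
    return labels

# 0 attacks 1, 1 attacks 2: node 2's only attacker (1) is itself attacked
print(label_nodes([0, 1, 2], [0, 1], [1, 2]))
# → {0: 'supportive', 1: 'defeated', 2: 'supportive'}
```

Node 0 is supportive (unattacked), node 1 is defeated (attacked by the unattacked node 0), and node 2 is supportive because its only attacker is itself under attack.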

[9]:
from IPython.display import display, HTML

# Select reviews of the last topic
last_topic = df_topics.iloc[-1]["topic"]
print(f"The last topic is topic nr. {last_topic}:")
display(HTML(df_topics[df_topics["topic"] == last_topic].to_html()))
The last topic is topic nr. 24:
topic count name keywords
24 24 19 24_return_returned_returning_ca (return, returned, returning, ca, because, them, can, were, defective, package)

It seems that the arguments under this topic are judgements of the return experience for this product.

[10]:
# select the arguments under the last topic
arg_selection = miner.select_by_topic(data=df_arguments_processed, topic=last_topic)
arg_selection = arg_selection.rename(columns={
    "reviewText": "argument",
    "overall": "score"
})  # rename columns for the following steps
arg_selection
[10]:
argument score topics sentiment coherence argument_id
0 I wore these shoe one time, from the airport i... 1 (16, 14, 23, 24, 24) 0.500000 0.535261 46
1 I usually wear a size 8 and they fit fine. The... 1 (1, 4, 10, 24, 4, 6) 0.496439 0.540030 77
2 Great shoe! Outside arch is kind of high, but ... 5 (21, 2, 19, 14, 4, 4, 4, 1, 24) 0.659098 0.747863 78
3 I bought these for gym training - weight class... 2 (13, 2, 7, 7, 18, 7, 24, 10) 0.516497 0.837317 83
4 Oops! I returned these because I ordered wrong... 1 (14, 24) 0.343750 0.744226 114
5 I loved these shoes...that is until after abou... 1 (13, 7, 24) 0.599393 0.407310 118
6 I returned them...found a Ryka pair I liked be... 3 (24,) 0.775000 0.827735 121
7 I got the impression it's cushiony and comfy b... 3 (0, 2, 24, 11, 9, 18, 6, 24) 0.565749 0.989251 154
8 Ordered 9(m) received 9 Wide for the second ti... 1 (4, 4, 24) 0.491652 0.546454 205
9 Returning these. the pictures on here make the... 1 (24, 23, 23, 23, 3, 15) 0.520525 0.507953 254
10 Tried one in the store and bought it online bu... 2 (24, 24, 23, 1) 0.557920 0.788963 263
11 I returned these as they were not true to size... 2 (24, 4) 0.509821 0.844705 266
12 I bought a pair of these in my size, but they ... 3 (4, 4, 7, 0, 24) 0.440069 0.991061 288
13 Unfortunately, this Flex Supreme does NOT have... 1 (9, 10, 24, 24) 0.494785 0.542248 304
14 After using this shoes seven times for regular... 1 (7, 14, 20, 24) 0.346269 0.741000 305
[11]:
# Compute edges of the attacking network
edges = miner.get_edges(data=arg_selection)
weights = miner.get_edge_weights(data=arg_selection, edges=edges)
df_edges = miner.get_edge_table(edges=edges, weights=weights)

Edges are defined between reviews with different overall scores. They are directed and weighted; source and target are indices of reviews in arg_selection.
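Edge construction follows directly from the rules listed earlier; a sketch (not the library's implementation of miner.get_edges) could look like this:

```python
def build_edges(scores, coherences):
    """Create directed, weighted attack edges between every pair of reviews
    with different overall scores, pointing from the review with higher
    coherence to the one with lower coherence, weighted by the difference."""
    edges, weights = [], []
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if scores[i] != scores[j] and coherences[i] > coherences[j]:
                edges.append((i, j))
                weights.append(round(coherences[i] - coherences[j], 2))
    return edges, weights

# A 5-star review with high coherence attacks two less coherent 1-star reviews;
# the two 1-star reviews share the same score, so no edge between them
edges, weights = build_edges([5, 1, 1], [0.75, 0.54, 0.54])
print(edges, weights)  # → [(0, 1), (0, 2)] [0.21, 0.21]
```

This matches the shape of df_edges below: e.g. the edge 2 → 0 with weight 0.21 is the rounded coherence gap between reviews #2 (0.747863) and #0 (0.535261).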

[12]:
df_edges
[12]:
source target weight
0 2 0 0.21
1 3 0 0.30
2 6 0 0.29
3 7 0 0.45
4 10 0 0.25
... ... ... ...
66 12 11 0.15
67 11 13 0.30
68 11 14 0.10
69 12 13 0.45
70 12 14 0.25

71 rows × 3 columns

[13]:
# Compute labels of reviews in the selection
labels = miner.get_node_labels(
    indices=arg_selection.index.tolist(),
    sources=df_edges["source"].tolist(),
    targets=df_edges["target"].tolist()
)
arg_selection["labels"] = labels
arg_selection
[13]:
argument score topics sentiment coherence argument_id labels
0 I wore these shoe one time, from the airport i... 1 (16, 14, 23, 24, 24) 0.500000 0.535261 46 defeated
1 I usually wear a size 8 and they fit fine. The... 1 (1, 4, 10, 24, 4, 6) 0.496439 0.540030 77 defeated
2 Great shoe! Outside arch is kind of high, but ... 5 (21, 2, 19, 14, 4, 4, 4, 1, 24) 0.659098 0.747863 78 defeated
3 I bought these for gym training - weight class... 2 (13, 2, 7, 7, 18, 7, 24, 10) 0.516497 0.837317 83 defeated
4 Oops! I returned these because I ordered wrong... 1 (14, 24) 0.343750 0.744226 114 defeated
5 I loved these shoes...that is until after abou... 1 (13, 7, 24) 0.599393 0.407310 118 defeated
6 I returned them...found a Ryka pair I liked be... 3 (24,) 0.775000 0.827735 121 supportive
7 I got the impression it's cushiony and comfy b... 3 (0, 2, 24, 11, 9, 18, 6, 24) 0.565749 0.989251 154 supportive
8 Ordered 9(m) received 9 Wide for the second ti... 1 (4, 4, 24) 0.491652 0.546454 205 defeated
9 Returning these. the pictures on here make the... 1 (24, 23, 23, 23, 3, 15) 0.520525 0.507953 254 defeated
10 Tried one in the store and bought it online bu... 2 (24, 24, 23, 1) 0.557920 0.788963 263 defeated
11 I returned these as they were not true to size... 2 (24, 4) 0.509821 0.844705 266 defeated
12 I bought a pair of these in my size, but they ... 3 (4, 4, 7, 0, 24) 0.440069 0.991061 288 supportive
13 Unfortunately, this Flex Supreme does NOT have... 1 (9, 10, 24, 24) 0.494785 0.542248 304 defeated
14 After using this shoes seven times for regular... 1 (7, 14, 20, 24) 0.346269 0.741000 305 defeated

The attacking network of the reviews is visualized below for a better understanding of the output.

[14]:
import networkx as nx
import matplotlib.pyplot as plt

DG = nx.DiGraph()
DG.add_edges_from(edges)

# graph layout
pos = nx.shell_layout(DG)

# draw nodes
reliable_indices = arg_selection[arg_selection["labels"] == "supportive"].index.tolist()
unreliable_indices = arg_selection[arg_selection["labels"] == "defeated"].index.tolist()
nx.draw_networkx_nodes(DG, pos, nodelist=reliable_indices, node_color="tab:green")
nx.draw_networkx_nodes(DG, pos, nodelist=unreliable_indices, node_color="tab:red")

# draw edges
nx.draw_networkx_edges(DG, pos, width=df_edges["weight"])

# draw labels
labels = {i: i for i in arg_selection.index}
nx.draw_networkx_labels(DG, pos, labels, font_size=9, font_color="whitesmoke")

plt.show()
_images/example_25_0.png

It can be seen from the plot above that reviews #6, #7, and #12 are considered reliable, while the others are not. Looking back at the arg_selection table, those reviews indeed show a very high level of coherence, except #6, which is attacked by #3 and #11. But since #3 and #11 are themselves attacked by other reviews, #6 remains supportive.

The edge weights also seem to make sense. For example, the attack \(11\rightarrow6\) is much weaker than \(3\rightarrow6\), because #3 describes the completely opposite return experience from #6 and #11 (unable to return in #3 vs. returned successfully in #6 and #11).