Example: Review Labeling by Topic

In this notebook, we use a subset of the Amazon Product Review data to demonstrate how this work can be used for labeling arguments. The problem addressed here is determining the credibility (reliable/unreliable) of reviews that evaluate a specific aspect of a product (e.g., the size of shoes), while providing reasoning for the results. The dataset can be found here.

[21]:
from orangearg.argument.miner import reader, chunker, processor, miner

fpath = "./example_dataset.json"

Read the input file

[2]:
df_arguments = reader.read_json_file(fpath=fpath)
df_arguments = df_arguments.dropna().reset_index(drop=True)  # remove rows with na

The results of reading the data file are as follows. It can be seen that this dataset contains two aspects of information, namely the text of the reviews (reviewText) and the rating evaluations provided by the purchasers (overall, ranging from 1 to 5 stars).

[3]:
df_arguments
[3]:
reviewText overall
0 I always get a half size up in my tennis shoes... 3
1 Put them on and walked 3 hours with no problem... 5
2 excelente 5
3 The shoes fit well in the arch area. They are ... 4
4 Tried them on in a store before buying online ... 5
... ... ...
365 Favorite Nike shoe ever! The flex sole is exce... 5
366 I wear these everyday to work, the gym, etc. 5
367 Love these shoes! Great fit, very light weight. 5
368 Super comfortable and fit my small feet perfec... 5
369 Love these shoes! 5

370 rows × 2 columns

Split arguments into chunks

Reviews will first be segmented into smaller chunks, i.e. clauses that express complete meanings. This is done to identify the different perspectives from which reviews provide their evaluations, in preparation for the subsequent review labeling step.
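The library's chunker handles this segmentation. As a rough illustration only (not the library's actual implementation), a review can be split into clause-like chunks on sentence punctuation and coordinating conjunctions:

```python
import re

def split_into_chunks(review: str) -> list[str]:
    """Naively split a review into clause-like chunks on sentence
    punctuation and the conjunctions 'and'/'but' (illustration only)."""
    parts = re.split(r"(?:[.!?]+|\band\b|\bbut\b)", review)
    return [p.strip() for p in parts if p.strip()]

print(split_into_chunks("Put them on and walked 3 hours with no problem. Love them!"))
# → ['Put them on', 'walked 3 hours with no problem', 'Love them']
```

A real implementation would use linguistic clause boundaries rather than keyword matching, but the idea is the same: one review yields several chunks, each carrying a single evaluation.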

[4]:
arguments = df_arguments["reviewText"]
arg_scores = df_arguments["overall"]

# Split reviews into chunks
chunk_arg_ids, chunks = chunker.get_chunk(docs=arguments)

# Compute polarity score of chunks
chunk_p_scores = chunker.get_chunk_polarity_score(chunks=chunks)

# Compute topics of chunks
chunk_topics, chunk_embeds, df_topics = chunker.get_chunk_topic(chunks=chunks)

# Compute importance of chunks inside the arguments
chunk_ranks = chunker.get_chunk_rank(arg_ids=chunk_arg_ids, embeds=chunk_embeds)

# Collect everything together as a dataframe
df_chunks = chunker.get_chunk_table(
    arg_ids=chunk_arg_ids,
    chunks=chunks,
    p_scores=chunk_p_scores,
    topics=chunk_topics,
    ranks=chunk_ranks
)

Some explanations of df_chunks:

  • argument_id: the index of the argument the chunk comes from.

  • polarity_score: the sentiment polarity score of a chunk, in the range [-1, 1], where 0 signifies neutrality, positive values indicate positivity, and negative values denote negativity.

  • topic: the index of a topic in the df_topics table below.

  • rank: the importance of a chunk within the argument it comes from, in the range [0, 1]. This is computed as the PageRank of the chunk in the similarity network of chunks. Therefore, the ranks of chunks belonging to the same argument sum to 1.
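The rank computation can be sketched as follows: for the chunks of one argument, build a graph weighted by the cosine similarity of their embeddings and run PageRank on it. This is a sketch of the idea, not the library's exact implementation:

```python
import numpy as np
import networkx as nx

def chunk_ranks(embeds: np.ndarray) -> dict:
    """Rank the chunks of one argument: build a complete graph weighted by
    cosine similarity of chunk embeddings, then run PageRank on it."""
    n = len(embeds)
    # Cosine similarity matrix of the (row-normalized) embeddings
    norm = embeds / np.linalg.norm(embeds, axis=1, keepdims=True)
    sim = norm @ norm.T
    G = nx.Graph()
    for i in range(n):
        for j in range(i + 1, n):
            G.add_edge(i, j, weight=float(sim[i, j]))
    return nx.pagerank(G, weight="weight")

# Three toy chunk embeddings from the same argument
ranks = chunk_ranks(np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0]]))
print(round(sum(ranks.values()), 6))  # ranks sum to 1
```

The first two (similar) chunks end up with nearly equal ranks, and all ranks of one argument sum to 1, matching the property described above.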

[5]:
df_chunks
[5]:
argument_id chunk polarity_score topic rank
0 0 I always get a half size up in my tennis shoes . -0.166667 4 0.500000
1 0 For some reason these feel to big in the heel ... -0.050000 10 0.500000
2 1 walked 3 hours with no problem 0.000000 7 0.249374
3 1 Put them on and ! 0.000000 2 0.255228
4 1 Love them ! 0.625000 3 0.250344
... ... ... ... ... ...
1192 368 I can wear the shoe all day long and -0.050000 15 0.125961
1193 368 they are easy to clean compared to other shoes... 0.225000 0 0.128238
1194 368 They are light colored so any dirt will be see... 0.342857 23 0.128449
1195 368 Would definitely buy another pair in a differe... 0.000000 13 0.125681
1196 369 Love these shoes ! 0.625000 22 1.000000

1197 rows × 5 columns

And explanations of df_topics:

  • topic: the index of a topic

  • count: the number of chunks in a topic

  • keywords: the top keywords of a topic

  • name: a short name of the topic

[6]:
df_topics.head()
[6]:
topic count name keywords
0 0 147 0_shoes_the_these_for (shoes, the, these, for, shoe, comfortable, ar...
1 1 99 1_fit_perfect_true_perfectly (fit, perfect, true, perfectly, fits, expected...
2 2 87 2_for_them_work_use (for, them, work, use, wear, training, in, gym...
3 3 79 3_love_them_they_are (love, them, they, are, these, cute, really, p...
4 4 74 4_size_ordered_half_big (size, ordered, half, big, large, order, an, a...

Merge chunks back to arguments

By merging the chunks back into reviews and performing the corresponding computations, we will obtain relevant information at the review level, including the topics covered in each review, the sentiment of the review, and its consistency with the overall score. This information will be further used for labeling reviews under different topics.

[7]:
# Compute topics of arguments
arg_topics = processor.get_argument_topics(arg_ids=chunk_arg_ids, topics=chunk_topics)

# Compute sentiment of arguments
arg_sentiments = processor.get_argument_sentiment(arg_ids=chunk_arg_ids, ranks=chunk_ranks, p_scores=chunk_p_scores)

# Compute the coherence between the sentiment and the overall score of arguments
arg_coherences = processor.get_argument_coherence(scores=arg_scores, sentiments=arg_sentiments)

# Collect everything together as a dataframe
df_arguments_processed = processor.update_argument_table(
    df_arguments=df_arguments,
    topics=arg_topics,
    sentiments=arg_sentiments,
    coherences=arg_coherences
)

Some columns are added to the original df_arguments dataframe, which are:

  • topics: the topics that an argument mentions.

  • sentiment: the sentiment score of an argument, in the range [0, 1]; the higher, the more positive.

  • coherence: the coherence between the sentiment and the overall score, in the range [0, 1]; the higher, the more coherent.
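These two quantities can be sketched as follows: the sentiment as the rank-weighted mean of the chunk polarity scores mapped to [0, 1], and the coherence as a Gaussian agreement between the sentiment and the star rating normalized to [0, 1]. The Gaussian form and its variance are assumptions, chosen because they reproduce the example values in the tables here:

```python
import math

def argument_sentiment(ranks, polarity_scores):
    """Rank-weighted mean of chunk polarities, mapped from [-1, 1] to [0, 1]."""
    return sum(r * (p + 1) / 2 for r, p in zip(ranks, polarity_scores))

def argument_coherence(sentiment, overall, variance=0.2):
    """Gaussian agreement between the sentiment and the normalized star
    rating (the Gaussian form and variance are assumptions)."""
    expected = (overall - 1) / 4  # map 1..5 stars to 0..1
    return math.exp(-((sentiment - expected) ** 2) / (2 * variance))

# Review 0 above: two chunks with equal rank, overall = 3 stars
s = argument_sentiment([0.5, 0.5], [-0.166667, -0.05])
print(round(s, 6), round(argument_coherence(s, 3), 6))  # → 0.445833 0.992692
```

A mildly negative review with a middling 3-star rating thus scores as highly coherent, while the same text paired with a 5-star rating would not.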

[8]:
df_arguments_processed.head()
[8]:
reviewText overall topics sentiment coherence
0 I always get a half size up in my tennis shoes... 3 (4, 10) 0.445833 0.992692
1 Put them on and walked 3 hours with no problem... 5 (7, 2, 3, 9) 0.627243 0.706545
2 excelente 5 (6,) 0.500000 0.535261
3 The shoes fit well in the arch area. They are ... 4 (0, 10, 10, 21) 0.524397 0.880521
4 Tried them on in a store before buying online ... 5 (1, 0, 5, 0, 6) 0.712758 0.813614

Review labeling

In this step, looking at the reviews under a specific topic, an attacking network of reviews is built, where nodes are reviews and edges are the attacks between them. Reviews are then labeled based on this network.

These are the rules of generating the network:

  • Edges exist only between reviews with different overall scores.

  • Edges are directed from the review with higher coherence to the one with lower coherence.

  • The weight of an edge is computed as the difference between the coherence values of its endpoints.

  • A node is labeled as supportive (meaning reliable in our case),

    • if no other nodes attack it, or

    • if all attackers of this node are attacked by some other nodes.

  • A node is labeled as defeated (meaning unreliable in our case), if it is not supportive.
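The labeling rules above can be sketched directly (this is an illustration, not the library's implementation of miner.get_node_labels):

```python
def label_nodes(indices, sources, targets):
    """Label each node 'supportive' if it has no attackers, or if every
    one of its attackers is itself attacked; otherwise 'defeated'."""
    attackers = {i: set() for i in indices}
    for s, t in zip(sources, targets):
        attackers[t].add(s)
    labels = {}
    for i in indices:
        # all() is vacuously True when the node has no attackers
        if all(attackers[a] for a in attackers[i]):
            labels[i] = "supportive"
        else:
            labels[i] = "defeated"
    return labels

# 0 attacks 1, 1 attacks 2: node 2's only attacker (1) is itself attacked
print(label_nodes([0, 1, 2], [0, 1], [1, 2]))
# → {0: 'supportive', 1: 'defeated', 2: 'supportive'}
```

Node 0 is supportive (unattacked), node 1 is defeated (attacked by the unattacked node 0), and node 2 is supportive because its only attacker is itself under attack.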

[9]:
from IPython.display import display, HTML

# Select reviews of the last topic
last_topic = df_topics.iloc[-1]["topic"]
print(f"The last topic is topic nr. {last_topic}:")
display(HTML(df_topics[df_topics["topic"] == last_topic].to_html()))
The last topic is topic nr. 24:
topic count name keywords
24 24 19 24_return_returned_returning_ca (return, returned, returning, ca, because, them, can, were, defective, package)

It seems that the arguments under this topic are judgements of the return experience for this product.

[10]:
# select the arguments under the last topic
arg_selection = miner.select_by_topic(data=df_arguments_processed, topic=last_topic)
arg_selection = arg_selection.rename(columns={
    "reviewText": "argument",
    "overall": "score"
})  # rename columns for the following steps
arg_selection
[10]:
argument score topics sentiment coherence argument_id
0 I wore these shoe one time, from the airport i... 1 (16, 14, 23, 24, 24) 0.500000 0.535261 46
1 I usually wear a size 8 and they fit fine. The... 1 (1, 4, 10, 24, 4, 6) 0.496439 0.540030 77
2 Great shoe! Outside arch is kind of high, but ... 5 (21, 2, 19, 14, 4, 4, 4, 1, 24) 0.659098 0.747863 78
3 I bought these for gym training - weight class... 2 (13, 2, 7, 7, 18, 7, 24, 10) 0.516497 0.837317 83
4 Oops! I returned these because I ordered wrong... 1 (14, 24) 0.343750 0.744226 114
5 I loved these shoes...that is until after abou... 1 (13, 7, 24) 0.599393 0.407310 118
6 I returned them...found a Ryka pair I liked be... 3 (24,) 0.775000 0.827735 121
7 I got the impression it's cushiony and comfy b... 3 (0, 2, 24, 11, 9, 18, 6, 24) 0.565749 0.989251 154
8 Ordered 9(m) received 9 Wide for the second ti... 1 (4, 4, 24) 0.491652 0.546454 205
9 Returning these. the pictures on here make the... 1 (24, 23, 23, 23, 3, 15) 0.520525 0.507953 254
10 Tried one in the store and bought it online bu... 2 (24, 24, 23, 1) 0.557920 0.788963 263
11 I returned these as they were not true to size... 2 (24, 4) 0.509821 0.844705 266
12 I bought a pair of these in my size, but they ... 3 (4, 4, 7, 0, 24) 0.440069 0.991061 288
13 Unfortunately, this Flex Supreme does NOT have... 1 (9, 10, 24, 24) 0.494785 0.542248 304
14 After using this shoes seven times for regular... 1 (7, 14, 20, 24) 0.346269 0.741000 305
[11]:
# Compute edges of the attacking network
edges = miner.get_edges(data=arg_selection)
weights = miner.get_edge_weights(data=arg_selection, edges=edges)
df_edges = miner.get_edge_table(edges=edges, weights=weights)

Edges are defined between reviews with different overall scores. They are directed and weighted; source and target are indices of reviews in arg_selection.
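Edge construction follows directly from the rules listed earlier; a sketch (not the library's implementation of miner.get_edges) could look like this:

```python
def build_edges(scores, coherences):
    """Create directed, weighted attack edges between every pair of reviews
    with different overall scores, pointing from the review with higher
    coherence to the one with lower coherence, weighted by the difference."""
    edges, weights = [], []
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if scores[i] != scores[j] and coherences[i] > coherences[j]:
                edges.append((i, j))
                weights.append(round(coherences[i] - coherences[j], 2))
    return edges, weights

# A 5-star review with high coherence attacks two less coherent 1-star reviews;
# the two 1-star reviews share the same score, so no edge between them
edges, weights = build_edges([5, 1, 1], [0.75, 0.54, 0.54])
print(edges, weights)  # → [(0, 1), (0, 2)] [0.21, 0.21]
```

This matches the shape of df_edges below: e.g. the edge 2 → 0 with weight 0.21 is the rounded coherence gap between reviews #2 (0.747863) and #0 (0.535261).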

[12]:
df_edges
[12]:
source target weight
0 2 0 0.21
1 3 0 0.30
2 6 0 0.29
3 7 0 0.45
4 10 0 0.25
... ... ... ...
66 12 11 0.15
67 11 13 0.30
68 11 14 0.10
69 12 13 0.45
70 12 14 0.25

71 rows × 3 columns

[13]:
# Compute labels of reviews in the selection
labels = miner.get_node_labels(
    indices=arg_selection.index.tolist(),
    sources=df_edges["source"].tolist(),
    targets=df_edges["target"].tolist()
)
arg_selection["labels"] = labels
arg_selection
[13]:
argument score topics sentiment coherence argument_id labels
0 I wore these shoe one time, from the airport i... 1 (16, 14, 23, 24, 24) 0.500000 0.535261 46 defeated
1 I usually wear a size 8 and they fit fine. The... 1 (1, 4, 10, 24, 4, 6) 0.496439 0.540030 77 defeated
2 Great shoe! Outside arch is kind of high, but ... 5 (21, 2, 19, 14, 4, 4, 4, 1, 24) 0.659098 0.747863 78 defeated
3 I bought these for gym training - weight class... 2 (13, 2, 7, 7, 18, 7, 24, 10) 0.516497 0.837317 83 defeated
4 Oops! I returned these because I ordered wrong... 1 (14, 24) 0.343750 0.744226 114 defeated
5 I loved these shoes...that is until after abou... 1 (13, 7, 24) 0.599393 0.407310 118 defeated
6 I returned them...found a Ryka pair I liked be... 3 (24,) 0.775000 0.827735 121 supportive
7 I got the impression it's cushiony and comfy b... 3 (0, 2, 24, 11, 9, 18, 6, 24) 0.565749 0.989251 154 supportive
8 Ordered 9(m) received 9 Wide for the second ti... 1 (4, 4, 24) 0.491652 0.546454 205 defeated
9 Returning these. the pictures on here make the... 1 (24, 23, 23, 23, 3, 15) 0.520525 0.507953 254 defeated
10 Tried one in the store and bought it online bu... 2 (24, 24, 23, 1) 0.557920 0.788963 263 defeated
11 I returned these as they were not true to size... 2 (24, 4) 0.509821 0.844705 266 defeated
12 I bought a pair of these in my size, but they ... 3 (4, 4, 7, 0, 24) 0.440069 0.991061 288 supportive
13 Unfortunately, this Flex Supreme does NOT have... 1 (9, 10, 24, 24) 0.494785 0.542248 304 defeated
14 After using this shoes seven times for regular... 1 (7, 14, 20, 24) 0.346269 0.741000 305 defeated

The attacking network of the reviews is visualized below for a better understanding of the output.

[14]:
import networkx as nx
import matplotlib.pyplot as plt

DG = nx.DiGraph()
DG.add_edges_from(edges)

# graph layout
pos = nx.shell_layout(DG)

# draw nodes
reliable_indices = arg_selection[arg_selection["labels"] == "supportive"].index.tolist()
unreliable_indices = arg_selection[arg_selection["labels"] == "defeated"].index.tolist()
nx.draw_networkx_nodes(DG, pos, nodelist=reliable_indices, node_color="tab:green")
nx.draw_networkx_nodes(DG, pos, nodelist=unreliable_indices, node_color="tab:red")

# draw edges
nx.draw_networkx_edges(DG, pos, width=df_edges["weight"])

# draw labels
labels = {i: i for i in arg_selection.index}
nx.draw_networkx_labels(DG, pos, labels, font_size=9, font_color="whitesmoke")

plt.show()
_images/example_25_0.png

It can be seen from the plot above that reviews #6, #7, and #12 are considered reliable, while the others are not. Looking back at the arg_selection table, those reviews indeed show a very high level of coherence, except #6, which is attacked by #3 and #11. But since #3 and #11 are themselves attacked by other reviews, #6 remains supportive.

The edge weights also seem to make sense. For example, the attack \(11\rightarrow6\) is much weaker than \(3\rightarrow6\), because #3 describes the completely opposite return experience from #6 and #11 (unable to return in #3 vs. returned successfully in #6 and #11).