Topic Models Applied to Online News and Reviews

>> LIN: Okay, hi. My name is Jimmy Lin. I work
here at Google and I’m happy to introduce Professor Alice Oh. She’s an assistant professor
of Computer Science at the Korea Advanced Institute of Science and Technology. Professor
Oh’s group does research in natural language processing, machine learning and human computer
interaction. And today, she’ll be talking about the work that her group is doing on
Topic Models related to Online News and Reviews.>>OH: Okay. I’m very glad to be here and
thank you, Jimmy, for hosting this talk. So, today I’m going to be talking about Topic
Models and how we apply that to online reviews and online news. And in doing so, I’m going
to be talking about Topic Models a little bit just to make sure that everybody knows
what Topic Models are, so it’ll be a brief introduction to it. And then I’ll go into
the details of our work. These are sort of two mini-talks. The topics are–the second
and the third items are related but they’re not quite the same. And these are from two
papers that my students and I recently submitted to WSDM next year. So I keep my fingers
crossed. Okay, so let’s just dive into the main problem that we’re going to be talking
about. And, recently Google Books has announced this, right? I’m sure many of you read about
this somewhere that Google Books has counted at least 130 million books out there ever
written; and I’m sure there are more. So what we can see, and I’m probably preaching to
the choir here, is that there is a lot of text data out there to be understood, right?
So if you have 130 million books, we can ask this question, it’s a very simple question
when you just look at it, “What are the books about?” But it’s a very challenging problem,
right? So if you know anything about text processing you’ll agree with me that if we
have 130 million books and we’re trying to figure out what actually is in those books
that’s a very difficult question. Topic Models is one answer, one approach to getting at
that answer, okay? So the plate diagram up there is what you normally see when we talk
about Topic Models so I just put it up there. We’ll get back to the plate diagram in a little
bit. But the main purpose of Topic Models is to uncover the underlying semantic structure of the text in your corpus, okay? So let's look
at an example of what a Topic Model could do for you, right? So this is an article from
the New York Times a couple days ago, a few days ago, and the title is “Economic Slowdown
Catches Up With NASCAR.” And as you can see from the headlines it’s talking about the
NASCAR car racing and it’s also talking about the economic recession. And what Topic Models
do for you is it kind of discovers, it uncovers what the latent topics are in an article or
in a corpus, okay? So you can see in these three colors here, green, orange and a little,
I guess, purplish pink color, the three topics; three of the main topics you can see from
the article. So the green one is about NASCAR races and you can see sort of throughout the
document I’ve highlighted the words that are about that topic, okay? So, NASCAR races,
track, raceway, cars, et cetera. And then, the same thing with orange, is about economic
recession, so you would see–or it’s like sales, costs and so on. And the purple is
the general sports topics. So, [INDISTINCT] in a Topic Model, every document is made up of multiple topics, and the words in the document are generated from those multiple topics, okay? So LDA, the Latent Dirichlet Allocation,
is one of the simplest Topic Models and it’s very widely used, so–and it’s a generative
model, which means that it tries to mimic what the writing process is, right? So it
tries to generate a document given the topics, okay? So let's see how that works. So bear
with me if you are experts on LDA. I’m sure some of you are. So, here again we start with
the three topics, the NASCAR races, economic recession, and the general sports topic. And
when you have those topics and notice the topics are made up of words–and I’m just
showing you a subset of the words that have high probabilities in that topic. But actually
the topics are multinomials over the entire vocabulary. So, the NASCAR race topic, it
has–it gives high probabilities to those words, but there are other words in that topic
and they have small probabilities. Okay, so when you have these multinomials over words,
when you want to, say, generate three documents from these three topics, what you would do
is kind of produce or kind of guess these topic distributions of the documents that
you’re trying to generate. So, for example, the middle–the one in the middle, the writer
is thinking, you know, “I’m going to write mostly about the general sports topic and
then I’ll talk a little bit about maybe some of the other topics,” okay? That’s what the
big purple bar means. Okay, so when you have those topic distributions, then what you can
do is–oops, I’m going backwards. Okay, so from, say, the bottom topic distribution,
you can generate the words according to that distribution, okay? So, since we have a lot
of the green, you would see many green words popping up there, okay? And then the same
thing you would do for the other documents. So that’s the generative process of an LDA.
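To make that generative story concrete, here is a minimal sketch in Python of the process just described, using a tiny made-up vocabulary and three hand-written topic multinomials (all of the words and numbers below are purely illustrative, not the talk's actual data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative vocabulary and topics: each topic is a multinomial over the whole vocabulary.
vocab = ["nascar", "race", "track", "sales", "costs", "economy", "sports", "fans", "game"]
topics = np.array([
    [0.30, 0.25, 0.20, 0.02, 0.02, 0.02, 0.07, 0.07, 0.05],  # "NASCAR races"
    [0.02, 0.02, 0.02, 0.30, 0.25, 0.25, 0.05, 0.04, 0.05],  # "economic recession"
    [0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.30, 0.22, 0.30],  # "general sports"
])
alpha = np.ones(3)  # symmetric Dirichlet prior over topic proportions

def generate_document(num_words=15):
    # 1. Draw the document's topic distribution theta ~ Dirichlet(alpha).
    theta = rng.dirichlet(alpha)
    words = []
    for _ in range(num_words):
        # 2. For each word, pick a topic z from theta ...
        z = rng.choice(len(theta), p=theta)
        # 3. ... then pick a word w from that topic's multinomial phi_z.
        w = rng.choice(len(vocab), p=topics[z])
        words.append(vocab[w])
    return theta, words

theta, doc = generate_document()
print(np.round(theta, 2), doc)
```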
Okay, so let’s look at it from the plate diagram perspective. So here up on the left is the
general–is the widely used plate diagram for this. And what you can see here is the
phi’s, okay, next to the beta there, are the topics, which are the multinomials over the
vocabulary. And then up there up to the right corner over there are the thetas which are
the topic distributions, okay? And then from those topic distributions if you want to generate
one document you would generate sort of the set of rectangles I have over there which
are the words or which are the topics of the words that you’re going to write in your document,
okay? So in this example, I’m looking at the first topic distribution and I’ve sort of
picked out the topics that I want to write about. And then, when you have those topics
then you can look up the multinomial topics over here to generate, to actually come up
with the words, okay? According–so, for example, the first word you’re going to write is in
orange, so you’re going to come here and look at the orange topic and say, “Okay, these
words have high probabilities in these topics I’m going to pick one of those words,” okay?
So that’s what you get in the actual–the rectangle down below, where you actually have
the words in your document, okay, and that’s how the document is generated. Okay, but in
reality, what you have is the–you only observe the words. So these are the documents in your
corpus, okay? So, where this is one document in your corpus and all others like the topic
distributions, the topic themselves, they’re latent. We don’t know them. But the purpose
of fitting the model then is to come up with those topic distributions and the topics,
okay? So they–all those others must be discovered by the model. Okay, so that’s what you do
when you, you know, fit an LDA. Okay, so what does an output of an LDA look like? They look
something like this. There are some other outputs that LDA gives you but one of the
major outputs of LDA is these multinomials over words which are the topics. So the NASCAR
topic, for example, has those words with high probabilities. What I put at the very bottom
row is to let you know that actually every topic has every word in the vocabulary. It’s
just that some like the money word here in the NASCAR topic has very low probability
of it. In actuality it’s probably much lower than that, okay? So that’s what the topics
look like. So if we go back to the question I posed earlier, if you have 130 million books
and you want to answer the question, “What are the books about?” then you can imagine
you can sort of feed these books into an LDA to discover topics, right? So if you represent
one book as one document and you run it over 130 million of them, you can discover the
underlying topics, the underlying semantic structure of your corpus. Let’s look at a
smaller problem since it’s very hard to run LDA on 130 million documents. But if we have
news articles and we have about 200,000 of them over the last 12 months then we can ask
the question, “What are the news articles about?” And this is something that we can
try to solve. And one difference here is that time is a very important dimension here, right?
Because news is inherently sequential and temporal and you want to know what happened
when and how long did it last and so on, okay? So we need something that considers time.
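Before getting to the time dimension, note that fitting a single, plain LDA like the one described above is only a few lines with an off-the-shelf library. Here is a minimal sketch using gensim, assuming `docs` is a list of already-tokenized documents; the placeholder documents and parameter values are just for illustration:

```python
from gensim import corpora, models

# docs: list of tokenized documents, e.g. one token list per news article or book.
docs = [["nascar", "race", "track", "fans"], ["sales", "costs", "economy", "recession"]]

dictionary = corpora.Dictionary(docs)                 # word <-> id mapping
bow_corpus = [dictionary.doc2bow(d) for d in docs]    # bag-of-words vectors

# Fit LDA; in the talk's news setting this would be, say, 50 topics per time slice.
lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary,
                      passes=10, random_state=0)

for topic_id, words in lda.show_topics(num_topics=-1, num_words=4, formatted=False):
    print(topic_id, [w for w, _ in words])
```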
Okay, so we proposed what we call Topic Chains, which has the main purpose of uncovering
the underlying semantic structure of a sequential corpus of news, okay? And this is work that
my student did mostly and so if you have any detailed questions about it you can send them
an email, although I’ll try to answer most of them. By the way, if you have any questions,
feel free to raise your hand and ask. Okay, so let’s look at what we have, what kind of
tools that we have available to us now. So if you look at the New York Times, if you
kind of scrolled down to the bottom half of the page, this is what you get. You get a
quick news-at-a-glance type of a thing, right? So you can look at the New York Times, the
front page of that website and kind of figure out what’s been going on in the last couple
of days, okay, in terms of technology, the world, business, arts and so on. And this
is a very good view and I love to look at it. But this, as you can imagine, takes a
lot of intelligence and a lot of work, right? So this is a product of, you know, intelligent
New York Times editors out there who are putting this together. And plus, it doesn't
have the dimension of time because this is a snapshot of the news, right? So here, at
Google, somebody has made this really interesting tool called Google News Timeline. It's still in Google Labs, so you may or may not know about it. But this is where you can
look at the sequential issues and events, right? So, right now, it’s showing the monthly
view. So you can see what was the most important news in March of 2009 and so on.
And you can search, too. So if you search for a certain keyword then you would get articles
that are about that keyword, right, and you can look at the weekly view and the yearly
view as well, I think, and the daily, of course. So this is pretty cool but I think that here
we still have some questions that are unresolved. For example, if you have an article, say,
there’s an important article in March of 2009, are there similar articles that follow that
are talking about the same thing in April of 2009, perhaps a couple or a few months
later, right? And if there are similar articles talking about the same topic over a long period
of time, how long is that period of time? How long did that topic last, right? And if
it's a long-lasting topic, then is it part of a general sort of perpetual topic like the U.S. economy, or was it part of a long-running sort of event or issue such as the
H1N1 issue, right? Or was it part of a very short, temporary topic such as the death of
Michael Jackson, okay? So we would like to know those things when we look at the articles,
but at least what we saw in the previous slide didn’t really show that. And if it’s a general,
sort of long-running topic like the H1N1, for example, the topic itself kind of evolves
through the nine months, or however many months it lasts, okay? First, it was talking about the outbreak, perhaps, then maybe it was talking about travel restrictions and then vaccinations, schools and so on, okay? So we would like to see how the same
topic evolves through time. So, what we propose is something like this. This is a part of
our results, is that you can look at several months of news and kind of look at the topics
and how they’re clustered together in what we call Topic Chains. So you–here you see
that there’s a Topic Chain about labor unions, education, the War in Afghanistan, the swine
flu and so on. And then you would see some events like there was a terror incident in Hong Kong
or something like that and the death of Michael Jackson. So we produced something like this
where you can see the general perpetual topics, you can see the long-running “It” topics and
then you can see sort of the temporary events that happened. So, this is what we call the
Topic Chains and this is the plan that we had to do something like that, right? So what
we did is we took a bunch of articles over a bunch of months and we divided the corpus into time slices, and we just chose time slices of 10 days each, okay? And for each time slice
you would have a bunch of articles, right, and we can find the topics using just the
simple LDA. And when you have the topics from the LDA then you can try to match them up
to see which topics are similar, okay? And when you have the similar topics, you can
sort of link them up into Topic Chains. And once you–yeah?
>>What is the similarity between topics based on?>>OH: Yes, so I'll talk about similarity
metrics in the next slide, I think, or a couple of slides. And then once we have the Topic
Chains we can identify which are the long topics, which are the short topics, and within
the long topics we can sort of see what the topic evolution looks like. Okay, so let me
talk now about sort of each of those steps; except for the first one because that one
is trivial. So we worked with the corpus of nine months of news in Korea. So we took the
websites of three major newspapers and collected documents and articles from all of them. The
corpus looks like this: 130,000 articles, 140,000 unique words and named entities, and we chose 50 as the number of topics per time slice, for a total of 1,400 topics.
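Here is a rough sketch of that per-slice setup, assuming each article arrives as a `(date, tokens)` pair; the `slice_key` helper, the 10-day bucketing, and the parameter values are illustrative assumptions rather than the exact pipeline used in the work:

```python
from collections import defaultdict
from datetime import date
from gensim import corpora, models

def slice_key(d: date) -> tuple:
    # Bucket a date into one of three 10-day slices per month (1-10, 11-20, 21-end).
    return (d.year, d.month, min((d.day - 1) // 10, 2))

def topics_per_slice(articles, num_topics=50):
    # articles: iterable of (date, tokens) pairs.
    slices = defaultdict(list)
    for d, tokens in articles:
        slices[slice_key(d)].append(tokens)

    # One shared dictionary so topic-word distributions are comparable across slices.
    dictionary = corpora.Dictionary(doc for docs in slices.values() for doc in docs)

    slice_topics = {}
    for key, docs in sorted(slices.items()):
        bow = [dictionary.doc2bow(doc) for doc in docs]
        lda = models.LdaModel(bow, num_topics=num_topics, id2word=dictionary, passes=5)
        slice_topics[key] = lda.get_topics()  # array of shape (num_topics, vocab_size)
    return slice_topics
```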
And let me just show you the results of the LDA; finding topics using LDA. And there’s,
since there’s 1,400 of them, I can’t really show you them all, but I’m just showing you
four examples of how the topics turned out pretty good. The first one you can see is
about sports. The second one is about business and then about smartphones and technology.
And the last thing, the last topic is about Academia. So when we have those topics, we
can construct Topic Chains like this where you look for similar topics within certain
window size. And we also–I’ll show you an experiment that we did with increasing or
decreasing the window size and what happens there. But what it means is do you look at
only the time slice before or how many time slices do you go back to find the similar
topics? Okay, so here comes the answer to that question, measuring similarity. This
is kind of an important issue because the major thing about Topic Chains is that we’re
finding similar topics, right? So, remember, the topics look like those, which are multinomials
over words. And so you can imagine various ways to measure similarity and within the
topic modeling research community people have used most of these metrics. Most notably, they usually use KL divergence or cosine similarity. And I've kind of categorized these six similarity metrics by how each metric looks at the topics, okay? So first, as I said, a topic is a multinomial over the vocabulary, right? So if you have two probability distributions and you want
to measure the distance, then KL divergence is the answer, right, or JS divergence, which
is the symmetric version of KL divergence, okay? Or you can look at a topic as a vector
where each dimension is a probability of the word in the topic. So if you take that view
then you can use cosine similarity to measure the distance between two vectors,
right? Or you can use Kendall’s Tau if you look at a topic as a list of ranked words,
a ranked list of words. So if we just ignore the probabilities but just look at, you know,
NASCAR is the first rank and so on then we can use Kendall’s Tau or DCG which is used
I guess a lot in information retrieval. And then lastly, if you look at only the subset
of words that have high probabilities, then we can look at the intersections and unions
of sets, which we can measure with Jaccard’s coefficient. So we wanted to test these metrics
to see which would be the most or the best performing similarity metric, okay? So what
we did is this: we computed the log likelihood of the data, of the corpus, given the topics that LDA found. What that means is if the negative log likelihood of the data given the topics is small, then your topics are explaining your corpus very well, okay? And the higher that value, the bigger the mismatch between your topics and your corpus, okay?
So it’s kind of like perplexity, too. So what we did is we took an original set of 50 topics
that LDA found for each–for one time slice and replaced five of those topics with similar
topics that are found by each of the metrics, each of the six metrics, okay? So, for example, if KL divergence says that, among these 50 topics and another set of 50 topics in the next time slice, these five are the most similar pairs, then we take those topics from the second time slice and kind of put them in the first time slice. So you would have the 45 original topics plus five new ones that KL says are most similar, okay? So then, when we compute the log likelihood of the modified topics or the
log likelihood of the data given the modified set of topics, then we can see which of the
similarity metrics found the most similar topics, okay? So, as I said before, KL divergence
and cosine similarity are the most often used similarity metrics, and we found that JS divergence
actually performs a little bit better. And the asterisks next to the metrics mean that
there is a significant difference, statistically significant difference, between that metric
and JS divergence. And Jaccard’s coefficient performs pretty well too, but we didn’t use
that because you have to have this parameter. There’s a parameter that we had to set and
we thought that that’s probably not as general as just using JS divergence with no parameter.
So, that’s what we chose to use as our similarity metric for constructing the Topic Chains.
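As a rough sketch of that step, this is one way to compute JS divergence between two topic-word multinomials and to link sufficiently similar topics across nearby time slices; the `threshold` and `window` values here are illustrative, not the settings used in the work:

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) is the KL divergence KL(p || q)

def js_divergence(p, q):
    # Jensen-Shannon divergence: a symmetrized, smoothed version of KL divergence.
    m = 0.5 * (p + q)
    return 0.5 * entropy(p, m) + 0.5 * entropy(q, m)

def link_topics(slice_topics, window=1, threshold=0.4):
    """slice_topics: list, in time order, of arrays of shape (num_topics, vocab_size).
    Returns edges ((slice_i, topic_a), (slice_j, topic_b)) between similar topics."""
    edges = []
    for j, current in enumerate(slice_topics):
        for back in range(1, window + 1):
            i = j - back
            if i < 0:
                break
            for a, p in enumerate(slice_topics[i]):
                for b, q in enumerate(current):
                    if js_divergence(p, q) < threshold:
                        edges.append(((i, a), (j, b)))
    return edges
```

Topic Chains would then correspond to connected groups of topics in this similarity graph.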
And let's now talk about the size of the window, the size of the sliding window. So we could take the [INDISTINCT] assumption and just look at one time slice back to find similar topics, but then you wouldn't find that case of, sort of, the long arrow over there, where a topic was kind of an important issue for a while and then it kind of disappears for a few weeks and then comes back again. So, we didn't want to miss that similarity chain there. So, this
is what we did as the experiment to see how the sliding window size affects the resulting
Topic Chains. So, up at the very top, is a set of Topic Chains found when we use the
sliding window of size one looking back just one time cycle, and then at the bottom is
the sliding window of size six. So, it’s kind of an obvious result, but you can see when
you’re looking back only one time slice then the Topic Chains are kind of fragmented and
they’re kind of dispersed all over the place. But as you increase the window size, then
the Topic Chains over there, that had sort of a little gap at the middle–in the middle
and then continued a few weeks later, they kind of merged together, right. So the Topic
Chains has become larger, longer, and you would find these pretty large Topic Chains
at the bottom. What's interesting is, if you look at the middle one, where there are two major
ones and then they come together at the size of five, those Topic Chains are about technology
and business. So one of them is about the technology itself, sort of manufacturing,
research, development, type of topics and then the other one is sort of the business
side of the technology, okay? So, you can see that they kind of merge together at the
window size of five, and it’s kind of hard to interpret that in terms of what it means
for the user, right? So if the user wants those to be kept separate, then that's probably what we should do. But if you want them to be sort of in the same Topic
Chain then you might want to go with the larger window size. But, in general, as you increase
the size of the sliding window, the Topic Chains tend to become more abstract. So at
the end, you would have something that's similar to the sections in your newspaper, right? So business, life, culture, the world news, and so on, right; whereas sort of in the middle you would have more concrete topics. Okay, so
that’s what that shows. Let’s look a little bit closer at the chains themselves, what
they mean. So if we look at the long chains, for example, the swine flu chain, right, you
want to know more than just that there was this big chain of swine flu and we can see
that, you know, in 2009 it was kind of a big issue for most of the year, we want to know
how that topic actually changed, okay? So as I talked about before, first it was talking
about the outbreak and then vaccinations and so on, right? So we call those focus shifts.
So within a Topic Chain, we can look at how the focus shifts in the chain. And I apologize
for the small font. It’s really hard to see. But this Topic Chain is about automobile industry,
okay? So, I'll just read it to you: topic number one has the top words automobiles, Vietnam, Kia Motors, vehicle and sales. And topic number three, which is right below that, is develop,
technology, automobile, investment, and industry, okay? And then the other topics are pretty
similar to that. So you really can’t tell what is going on just by looking at the topics
themselves, okay? So what we wanted to do is look at words that changed the most between
two similar topics. So, what happened between this topic and the next topic? And if we look
at the words that are not common but are most different among the two topics, then you can
sort of figure out what’s been going on. And on top of that, we looked at just the named
entities. Named entities are things like names of organizations, names of people, sort of
specific things like that, because a lot of the news, a lot of the events that happened
in the news are about specific people or organizations and so on, okay? So going from number one to number three, the named entities that we find are green, solar, Japan, energy, and carbon. So that tells you that there was something going on there; those are the words whose probability increased the most from topic number one to topic number three.
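Here is a minimal sketch of that focus-shift idea, assuming two linked topics are given as word distributions over a shared vocabulary; the optional `keep` filter stands in for a named-entity list, and the tiny example data is made up for illustration:

```python
import numpy as np

def focus_shift_words(phi_old, phi_new, vocab, top_n=5, keep=None):
    """Words whose probability increased the most between two linked topics.
    keep: optional set of words (e.g. named entities) to restrict attention to."""
    delta = np.asarray(phi_new) - np.asarray(phi_old)
    order = np.argsort(delta)[::-1]  # indices sorted by largest increase first
    shifted = []
    for idx in order:
        word = vocab[idx]
        if keep is None or word in keep:
            shifted.append((word, float(delta[idx])))
        if len(shifted) == top_n:
            break
    return shifted

# Illustrative usage with a tiny vocabulary.
vocab = ["automobile", "sales", "solar", "japan", "carbon"]
phi_old = np.array([0.40, 0.35, 0.05, 0.10, 0.10])
phi_new = np.array([0.30, 0.20, 0.20, 0.15, 0.15])
print(focus_shift_words(phi_old, phi_new, vocab, top_n=3))
```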
And we–if we just go back and look at the headlines, you see that there was–there were
a few headlines that are talking about Japanese carmakers like Toyota coming up with solar
powered cars, okay? So, if we look at this sort of close-up view of the Topic
Chains and the named entities that changed, then we can have a much deeper understanding
of the evolution of the topics. Okay, now, let’s look at the short chains and this is
pretty interesting. Here, every line is a topic and the left column is the date. So,
0P 07 means the first 10 days of July in 2009; there was a missile launch, and there was a discussion over the North Korea missile launch. And then the next line talks about
the death of Michael Jackson, and then some milk scandal, and then a topic about heightened security at the end of the year, and then some romance involving, you know, entertainment people. April is when Korea has Arbor Day, so there's a topic talking about
trees and stuff like that. And then the last topic is kind of interesting. Obama, Republicans,
Jeju Island is an island in Korea that’s used for resorts and playing golf and you see golf
and Tiger Woods. But, I don't know if Obama went there to play golf or not, with Tiger Woods maybe, but what we can interpret from that is that LDA found a topic that's kind of not about a single topic, and LDA often does that. If you've ever run LDA or any other
Topic Models–oops–you’ll find many of–or some of the results, some of the topics that
you find are not really coherent. So, anyway, these are short Topic Chains, which means they're like one, two, or three topics, and they represent mostly
temporal events, temporal issues or they could be about incoherent topics. And you can kind
of see how, if it’s a coherent topic, then it would more easily find similar topics in
the next time windows, right? Okay, so that’s actually the end of the Topic Chains part
of the talk. How am I doing on time? Okay. I’ll go quickly over the next topic. So, we
propose Topic Chains, which is a framework based on very simple LDA to understand what’s
going on in the news corpus. Okay, now, let me switch gears and talk about sentiment and
aspects and reviews. So, the model is called Aspect Sentiment Unification Model and its
main purpose is to uncover the structure of aspects and sentiments in a review, okay?
And this is another student of mine who worked on this mostly. And the premise is this: if
you go to Amazon–this is a review of a digital camera. It’s a very long review. It’s like
a–it’s like a conference paper almost and this is actually not the end, there’s more.
But it’s a very, you know, detailed review. He talks about or this user talks about a
lot of good things and bad things about this camera and we want to do something like this,
right? Amazon does sort of aspect or attribute based sentiment analysis of the review. So,
in addition to the general, how many stars did this camera get? It also gives you how
many stars for the picture quality and so on. The way Amazon does it–I don’t know exactly
how they do it, but I noticed that–so this is a camera with lots and lots of reviews
like 300 reviews or so. And then there are certain other models of the same Canon digital camera which have very few reviews, and for those we actually don't have these attributes. So
it looks like there’s some manual work and some automated way of looking at what the
attributes are. And we call those attributes aspects, and there are things like this; this
thing is small and it’s light, starts up and turns off fast, the low light performance
is best. And so these are actual sentences from the reviews. And the sentiment is something
like this. The words highlighted in pink are the ones that carry sentiment for this–for
each sentence. Okay. So let’s look a little bit closely at what these sentiment words
are, okay? Some of them are general sort of affective words that express emotion like
love; “I love this,” “I’m satisfied,” “I’m dissatisfied,” “I’m disappointed,” okay? And
then, some of the other ones are general sentiment words like, “Best, excellent, bad.” There–they
evaluate the quality of something, but they’re just general. If something is best, then it’s
best no matter if it’s a coffeemaker or a chair, right? And then there are aspect specific
evaluative words. And this is a little more fine-grained than domain specific evaluative
words. So let me show you what I mean. In the camera domain, okay, if you say, “This
camera is small,” it’s probably a good thing. “The LCD is small,” it’s probably a bad thing,
right? If you’re in the restaurant domain, “The beer was cold,” is good. “Pizza was cold”
is bad. And, “The wine list is long,” is good and “The wait is long,” is bad. So, beyond
the domain, right, we need to go sort of down to each aspect of the review and say whether
the sentiment word there expresses positive or negative sentiment. Okay. So this is the
problem that we’re trying to solve. Okay, we’re trying to discover the aspects automatically
as well as the sentiment and the words that carry the sentiment. So to do that we made
two models, one is called “Sentence LDA,” the other is called “Aspect Sentiment Unification
Model." And we worked with two types of corpora. The first one was Amazon reviews and we took
seven product categories, including digital cameras, coffeemakers, I think, heaters and
things like that. They’re just pretty different electronic products–oops. And we also looked
at the Yelp restaurant reviews over four cities and 328 restaurants. And on average, each
review had 12 sentences. And our observation starts again with the same set of sentences.
What we noticed here is that for many of the sentences in the reviews, one sentence describes
only one aspect, okay? And this is different from the general LDA assumption which is that
each word in the corpus, each word in the document, represents or is generated from
one topic or if you applied it to aspects, one aspect, okay? So we wanted to make this
sentence LDA. If you noticed, the only difference is the box around the W circle, okay? So what that means is that the words (N is the number of words in your document) are each generated as before, but Z, which are the topic assignments, are over M, which is the number of sentences, okay? So we're saying there are only M topic assignments in that document, where M is the number of sentences in that review, okay? And each sentence has one topic or one aspect, okay? So that's the
basic difference between LDA and SLDA. And what we found is that when we run SLDA over our data–these are results from the Amazon reviews, and the last one actually is from the restaurant reviews. So, remember, we ran SLDA over all
sort of seven categories of electronics reviews and we get these aspects. They are similar
to what we saw earlier in the Amazon attribute categories, okay? So portability, quality of photo and ease of use, those are the three for the camera product. And then if you look at
the laptop reviews the first one is about software and OS and then the second one is
about hardware and so on. And what we found when we compared the results of LDA versus SLDA is that SLDA was finding more product-specific aspects, okay? For example, the last one, the liquors topic or aspect, was not found by LDA. Instead, words like beer and wine and martini were kind of spread out over different topics, like wine was maybe with the Italian food aspect
and so on. So, I think it’s important to notice that SLDA, because of that one difference
in the assumption it makes, finds better product specific aspects, details of the reviews.
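To make the SLDA difference concrete, here is a minimal sketch of its generative step under the same kind of toy setup as before: one topic is drawn per sentence, and every word in that sentence is drawn from that single topic's word distribution (the aspects, words, and numbers below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_review(topics, vocab, alpha, sentence_lengths):
    """Sentence LDA: one topic (aspect) assignment per sentence, not per word."""
    theta = rng.dirichlet(alpha)             # the review's aspect proportions
    sentences = []
    for length in sentence_lengths:
        z = rng.choice(len(theta), p=theta)  # one aspect for the whole sentence
        words = [vocab[rng.choice(len(vocab), p=topics[z])] for _ in range(length)]
        sentences.append((z, words))
    return sentences

# Illustrative use: 3 aspects over a tiny review vocabulary.
vocab = ["small", "light", "screen", "battery", "photo", "easy"]
topics = np.array([
    [0.40, 0.30, 0.05, 0.05, 0.10, 0.10],   # "portability"
    [0.05, 0.05, 0.40, 0.10, 0.35, 0.05],   # "photo / screen quality"
    [0.10, 0.05, 0.05, 0.10, 0.05, 0.65],   # "ease of use"
])
print(generate_review(topics, vocab, alpha=np.ones(3), sentence_lengths=[4, 5, 3]))
```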
Okay, we then took SLDA and extended it to form a joint model over aspects and sentiments,
okay? So the right side of the model has the gamma, the pi, and the S; S is the sentiment, and you can see that each word is now generated from a pair of sentiment and aspect, okay? So, with this joint model then, if we run it over the corpus, without using
the labels, any labels of the corpus, of the documents, we can automatically discover the
aspects and the sentiments, okay? But we do use seed words. We took Turney’s Paradigm
words because they’re kind of generic paradigm sort of sentiment words that a lot of people
use, like good and nice, and bad and nasty and then–so that was one set of seed words
we used. We also augmented the paradigm words a little bit with other sort of general sentiment
words that we found from the corpus, okay? And what we do with these sentiment words, which is a little bit different from other prior work on joint modeling of sentiment and aspect, is that we build the seed words right into the model by playing with the priors of the LDA, okay? So, setting asymmetric priors and initializing Gibbs sampling, which is an inference algorithm, to work the seed words in, okay? Let me explain that a little bit better here, although I didn't even talk about Gibbs sampling,
so if you want me to explain that, we can talk about it later after the talk. So beta is the prior for the Dirichlet distribution over the phi's, okay? And what that means is, do we start with a uniform beta, which means that every distribution is equally likely? If we play with the betas and use asymmetric priors, then we're saying some of the distributions are more likely than others. Okay, so with SLDA we just used the uniform priors. What we do with the betas here is we set beta to zero for any negative sentiment seed word in the positive senti-aspects, okay? And we do vice-versa for the positive sentiment seed words, okay? That means if you have a positive seed word like "good," then it's not going to be assigned a non-zero probability in a negative senti-aspect, okay? And also, we initialize Gibbs sampling by setting the positive seed words to have positive sentiment and the negative seed words to have negative sentiment. So that's opposed to randomly assigning sentiment, which is what we usually do for Gibbs sampling, okay? So the combination of those two builds the seed words right into the model without fiddling any more with the words themselves; a rough sketch of that setup is shown below.
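Here is a minimal sketch of that seed-word trick, assuming two sentiments (positive and negative) and a known vocabulary; the seed lists and the base beta value are illustrative, not the model's exact settings:

```python
import numpy as np

def build_asymmetric_beta(vocab, pos_seeds, neg_seeds, base=0.01):
    """Asymmetric Dirichlet priors over words, one row per sentiment.
    Positive seed words get zero prior mass under the negative sentiment,
    and vice versa, so they can never end up in the wrong senti-aspects."""
    word_id = {w: i for i, w in enumerate(vocab)}
    beta = np.full((2, len(vocab)), base)    # row 0: positive, row 1: negative
    for w in pos_seeds:
        beta[1, word_id[w]] = 0.0            # e.g. "good" excluded from negative senti-aspects
    for w in neg_seeds:
        beta[0, word_id[w]] = 0.0            # e.g. "bad" excluded from positive senti-aspects
    return beta

def init_sentiment_assignments(tokens, pos_seeds, neg_seeds, rng):
    """Gibbs initialization: seed words start in their known sentiment (0 = positive,
    1 = negative); all other words start with a random sentiment."""
    pos, neg = set(pos_seeds), set(neg_seeds)
    return [0 if t in pos else 1 if t in neg else int(rng.integers(2)) for t in tokens]

vocab = ["good", "nice", "bad", "nasty", "coffee", "screen"]
beta = build_asymmetric_beta(vocab, pos_seeds=["good", "nice"], neg_seeds=["bad", "nasty"])
print(beta)
```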
Okay, so these are the senti-aspects discovered by ASUM, as we call the model, and now every word in each multinomial is generated jointly by the pair of sentiment and aspect, okay? So there are interesting results here,
like the meat senti aspects here–meat positive and meat negative, the meat aspect was not
found by SLDA. So what this tells you is that for some senti aspects, if there’s strong
sort of sentiment correlated with that aspect, then it comes out better with the ASUM model
than it does for just SLDA without sentiments built-in. So in original SLDA, what happens
is the meat aspect is kind of scattered around again in, you know, in pizza, in burger steak,
right? Those aspects have meat words in them. But because we’ve forced kind of the sentiment
to play a bigger role in finding the aspects, we see aspects like that. And an interesting
case that we see with payment is that we only get a negative aspect for payment. We don’t
get a positive senti aspect for payment. It’s the same thing with parking, too. What that
tells us, just an interesting bit, is that people complain about payment not being able
to use their credit card or they complain about parking situations. But if they have
some satisfaction with it, they don't really write it in the reviews. Okay. And the yummy aspect is kind of funny, too, with the last word being "funny," right? So that aspect is something that LDA doesn't really find. Okay, so let me go on. So what we can do with these topics, with the words in the topics, is try to figure out which are the sentiment words and which are the aspect words, right? So if we have the two meat senti-aspects, we can look at the words that appear common across the two sentiments, like meat and, I don't know what else, sauce, I think, and we say, "Those are the common sort of aspect words for the meat category," okay? Whereas, things like "crispy" or "bland" are the sentiment-carrying words for that aspect, okay? So those are the aspect-specific sentiment words; a rough sketch of that separation is below.
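Here is a small heuristic sketch of that separation, assuming the positive and negative senti-aspects for the same aspect are given as word distributions over a shared vocabulary; the vocabulary and probabilities are made up, and this is only one plausible way to operationalize the idea:

```python
import numpy as np

def split_aspect_and_sentiment_words(phi_pos, phi_neg, vocab, top_n=5):
    """Words ranked high under BOTH senti-aspects are treated as aspect words;
    words ranked high under only one side are aspect-specific sentiment words."""
    phi_pos, phi_neg = np.asarray(phi_pos), np.asarray(phi_neg)
    common = np.minimum(phi_pos, phi_neg)    # high only if probable in both
    pos_only = phi_pos - phi_neg             # high if positive-specific
    neg_only = phi_neg - phi_pos             # high if negative-specific
    pick = lambda scores: [vocab[i] for i in np.argsort(scores)[::-1][:top_n]]
    return {
        "aspect_words": pick(common),
        "positive_sentiment_words": pick(pos_only),
        "negative_sentiment_words": pick(neg_only),
    }

vocab = ["meat", "sauce", "tender", "juicy", "dry", "bland"]
phi_pos = [0.30, 0.20, 0.25, 0.20, 0.02, 0.03]
phi_neg = [0.28, 0.22, 0.03, 0.02, 0.25, 0.20]
print(split_aspect_and_sentiment_words(phi_pos, phi_neg, vocab, top_n=2))
```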
So, what we do is we align the senti-aspects with the similar aspects again, and then we
look at the positive aspects, the negative aspects, and we look at the common words and
the words that have a lot of difference in them to figure out things like this. The screen
aspect, the words, the common words are like screen, glossy LCD and then the sentiment
words are like bright, clear. Those are the positive words and, like, reflect, glare–MacBook,
obviously, it doesn’t have a good screen. So–or apparently. I don’t agree with that
personally, but anyway, so that’s what we can do with those topics. Okay, let’s look
at some other results of this. So here is a result that shows you that sentiment classification
per sentence is done pretty well. So we–so these are two reviews. The first one is about
a coffeemaker and then the second one is about a restaurant. And you can see in green, those
are the positive sentiment sentences. So that’s what the model found as the positive sentences
and the ones in pink are the–where the model found them to be negative. And, of course,
I’m going to show you good examples. But most of the examples are pretty good, okay? So,
another set of results we can do–we can look at is, how well are the aspects assigned to
the sentences, right? So these are four different reviews where the same aspect was found. The
aspect–the senti aspect of parking and the negative sentiment, and you can see that parking
is only validated for three hours and so on. So those are–and these came out pretty well,
right? Here's another example. For some of these things, like "very convenient," how did the model find that to be "coffeemaker easy"? I don't know; it's probably just because "convenient" is up there as one of the top probability words for that senti-aspect. Some of the other, shorter sentences this model has trouble with because there's not enough of a clue, okay? And I do like to show some of the bad examples as well.
So the second one, it took us several uses to understand how much coffee to use. That’s
obviously not a positive sentiment, but the model classified it as that. But, you know,
one out of five is not bad, right? Okay, so those are the senti aspects assigned to sentences
that we can see. Oh, and the last one, I put in there to show you that our assumption that
one sentence carries one aspect may not always be true, right? So the last sentence is talking
about how nice it looks and how easy it is to use. But you can kind of say, you know,
“Are they the same thing or not?” I don’t think they’re really the same thing. Nice
looking is probably–or it should be another aspect, sort of the design aspect of the product
and then the ease of use is the usability of the product, right? So we do see a lot of sentences in our corpus that do not fit our assumption that one sentence equals one aspect, or even that one sentence carries one sentiment. So that's future work for us, right,
to deal with sentences like that. Okay, so for all of the results that I've shown you, because of the way Topic Models produce these topics, it's really hard to evaluate them. There's really no good way to quantitatively evaluate the aspects. We can't ask users to go through 20,000 reviews and find all the aspects and compare them against our results, right, or the sentiments. The sentiment
would be a little bit easier to do. So what we do with the sentiment actually, is we quantitatively
measured how well sentiment classification is done against other generative models that jointly model sentiment and aspect together. And I have to tell you, though, that our model as well as these other models (JST and TSM are the two models that we're comparing against) are all not designed for sentiment classification per se. So they're all trying
to discover aspects and sentiments together and come up with these sentiment words and
so on. They’re not, you know, models to do classification and neither is ours. But we
put this experiment in there to show at least that sentiments are found well, okay? So let
me explain the different things. ASUM, the blue, is our model with the regular paradigm words. I think there's like a dozen of the paradigm words that I showed you. ASUM Plus is the paradigm-plus words, the augmented list of words, and then JST Plus and TSM Plus also use seed words, so we gave them the same set of paradigm-plus words. And we implemented those two models and ran classification over our own corpus to see how well they perform. So you can see the red line, which is ASUM Plus, performs the best. The
next one is the blue, ASUM without the paradigm-plus words, and then the other
models don’t perform as well on the classification task. So, just to tell you, these models are
pretty similar to ours. They both don't have the one-sentence, one-aspect assumption built in, and they don't build the seed words right into the model; they do something else with the seed words. So those are sort of the main differences, I would say, between those two models and our
model. Okay, so let me just wrap up. I think time is probably up, too. I just talked about
SLDA and ASUM, which are the two models, extensions of the basic LDA to discover sentiment and
aspect together. And we found that the specific aspects we discovered were pretty well aligned with the details of the reviews that people actually write, and, by looking at the topics and the words within the topics, we can learn aspect-specific sentiment words.
And lastly, we just tested with sentiment classification and found that it performs
pretty well, okay? So just to wrap up now really, Topic Chains and ASUM are the two
things that I've talked about, and they both work with LDA, right? They're in different domains, one is the news domain and the other is the reviews domain, but we're trying to do the same thing. We're trying to uncover what is latent, what is the hidden
semantic structure within the news corpus and within the reviews corpus. So if you’re
interested in our work, or in further discussions with me and my students, please send us an email or look at our website to see the latest things going on. Okay. Thank
you. Questions?>>Could you comment on what you discovered
when you ran these two models on books? Like you were mentioning the…
>>OH: Yeah.>>…130 million books. These books are a
lot different than…>>OH: Yeah, yeah. So we didn’t–we haven’t
done that.>>You haven’t done that? Oh, okay.
>>OH: Yeah. Well, I should ask Google Books to do that for me. I don’t–I can’t imagine
what the results would be. One thing about LDA, though, and Topic Models in general, is that they are very computationally expensive, as you can imagine, right? As the corpus grows large, the number of unique vocabulary words grows large, and you're doing Gibbs sampling over all the words at each iteration, and you have to do thousands of iterations to converge. So the inference part is difficult. And we would like to maybe use
Google and, you know, use distributed computing and all that to figure out how to do that.
Oh, question?>>So the aspects you found, it seems, are usually not the ones defined by users, like for the camera. Is there any way you can specify, "Okay, I'd like to find those aspects specified by users"?
>>OH: Yeah, that’s a good question. So we are thinking about it. We haven’t done anything.
I don’t know how to do it. Sort of like the sentiment seed words you can have maybe seed
words for aspects too, to say, “I want to find these aspects.” Good question and a good
idea for an extension of this work. Okay. Thank you, again.

6 Replies to "Topic Models Applied to Online News and Reviews"

  1. David Blei has extended the original LDA model to account for topic evolution over time. Imagine that original plate structure repeated several times with time dependence connecting each plate structure together. It's essentially LDA with a Markov Chain structure. This approach seems like a simplification of that model.
