yu-na.log

<Published paper> Analyzing TripAdvisor Application Reviews to Promote Smart Tourism: A Topic Modeling Approach

Fri, 23 May 2025 12:05:53 GMT

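Analyzing TripAdvisor Application Reviews to Promote Smart Tourism: A Topic Modeling Approach

Abstract

Advances in information and communication technology and the development and wider adoption of smart devices have changed the way people travel, giving rise to the concept of smart tourism. Research on smart tourism policy and survey-based studies is under way, but studies of application reviews remain scarce. This study collects review data for TripAdvisor, a representative smart tourism application on the Google Play Store, and applies LDA (Latent Dirichlet Allocation) topic modeling to identify how the application is used and how satisfied its users are. The analysis yielded four topics: two carried positive evaluations and the other two negative evaluations. Users were satisfied with the application's recommendation system for accommodation and tourist attractions, but were inconvenienced when filters set during a search were not applied or when reviews were not posted after an update. Adding more categories to offer users a wider range of experiences is therefore expected to help improve satisfaction. In addition, identifying problems within the application, including the filter function, and then checking the application environment and fixing the affected functions is expected to improve user satisfaction.

###### ■ Keywords: smart tourism; tourism application; user review analysis; LDA; text mining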

https://www.dbpia.co.kr/Journal/articleDetail?nodeId=NODE11783938

<Code> TripAdvisor paper code

Fri, 23 May 2025 11:17:50 GMT

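Differences from the Messenger code

| Aspect | Messenger | TripAdvisor |
| --- | --- | --- |
| Positive/negative split | Splits reviews into positive and negative by review rating (score), then analyzes each set separately | Analyzes all reviews together, with no sentiment split |
| Text preprocessing | Lemmatization with POS tagging (NOUN, ADJ, etc.) | Lemmatization plus bigram/trigram (joined word sequences) generation |
| Choosing the number of topics | Coherence score together with the harmonic mean | Coherence score only |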
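
To make the bigram step concrete, here is a minimal pure-Python sketch of phrase detection. It is not the gensim implementation: the scoring formula is modeled on the count-based score that gensim's `Phrases` is understood to use by default, and the helper names (`find_bigrams`, `merge_bigrams`) and the toy documents are hypothetical.

```python
from collections import Counter

def find_bigrams(docs, min_count=1, threshold=1.0):
    """Return adjacent word pairs whose co-occurrence score exceeds threshold."""
    unigrams = Counter(w for doc in docs for w in doc)
    pairs = Counter(p for doc in docs for p in zip(doc, doc[1:]))
    vocab_size = len(unigrams)
    return {
        (a, b)
        for (a, b), n in pairs.items()
        # Assumed score: (pair count - min_count) / (count_a * count_b) * vocab size
        if (n - min_count) / (unigrams[a] * unigrams[b]) * vocab_size > threshold
    }

def merge_bigrams(doc, bigrams):
    """Join detected pairs with an underscore, scanning left to right."""
    out, i = [], 0
    while i < len(doc):
        if i + 1 < len(doc) and (doc[i], doc[i + 1]) in bigrams:
            out.append(doc[i] + "_" + doc[i + 1])
            i += 2
        else:
            out.append(doc[i])
            i += 1
    return out

# Toy tokenized reviews (hypothetical data)
docs = [
    ["customer", "service", "slow"],
    ["customer", "service", "helpful"],
    ["hotel", "filter", "broken"],
    ["customer", "service", "rude"],
]
bigrams = find_bigrams(docs, min_count=1, threshold=1.0)
merged = [merge_bigrams(d, bigrams) for d in docs]
print(merged[0])
```

Raising `threshold` or `min_count` keeps fewer pairs, which matches the "higher threshold fewer phrase" note in the script below.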

import re
import numpy as np
import pandas as pd
from pprint import pprint

# Gensim
import gensim
import gensim.corpora as corpora
from gensim.utils import simple_preprocess
from gensim.models import CoherenceModel

# spacy for lemmatization
import spacy

# Plotting tools
import pyLDAvis
import pyLDAvis.gensim_models
import matplotlib.pyplot as plt

#Enable logging for gensim
import logging
logging.basicConfig(format='%(asctime)s:%(levelname)s:%(message)s',level=logging.ERROR)

import warnings
warnings.filterwarnings('ignore',category=DeprecationWarning)

# NLTK stop words
from nltk.corpus import stopwords
stop_words=stopwords.words('english')
stop_words.extend(['great','app','good','apps','really','nice'])

import os

df=pd.read_csv("./dataset/tripadvisor.csv" )
df.columns
df=df[['content']]
df=df.dropna()
df['content']=df['content'].astype(str)
df['content']=df['content'].apply(lambda x: x.encode("utf-8").decode("ascii","ignore"))


# Tokenize words and Clean-up text
data= df.content.values.tolist()
def content_to_words(sentences):
    for sentence in sentences:
        yield (gensim.utils.simple_preprocess(str(sentence),deacc=True)) #deacc=True removes punctuations

data_words=list(content_to_words(data))
print(data_words[:1])
count=[len(sublist) for sublist in data_words ]
df['words']=data_words
df['review_len']=count
data_words=[sublist for sublist in data_words if len(sublist)>1]
df=df[df['words'].map(len) >1]

# Creating Bigram and Trigram Models
bigram=gensim.models.Phrases(data_words,min_count=5,threshold=10) #higher threshold fewer phrase
trigram=gensim.models.Phrases(bigram[data_words],threshold=10)

#faster way to get a sentence clubbed as a trigram/bigram
bigram_mod=gensim.models.phrases.Phraser(bigram)
trigram_mod=gensim.models.phrases.Phraser(trigram)


# Remove Stopwords, Make Bigrams and Lemmatize
# Define functions for stopwords, bigrams, trigrams and lemmatization
def remove_stopwords(texts):
    return[[word for word in simple_preprocess(str(doc))if word not in stop_words]for doc in texts]

def make_bigram(texts):
    return [bigram_mod[doc] for doc in texts]

def make_trigrams(texts):
    return[trigram_mod[bigram_mod[doc]] for doc in texts]

def lemmatization(texts, allowed_postags=['NOUN','ADJ','VERB',"ADV"]):
    """https://spacy.io/api/annotation"""
    texts_out=[]
    for content in texts:
        doc=nlp(" ".join(content))
        texts_out.append([token.lemma_ for token in doc if token.pos_ in allowed_postags])
    return texts_out

# remove stopwords
data_words_nostops= remove_stopwords(data_words)

# form bigrams
data_words_bigram= make_bigram(data_words_nostops)
print(data_words_bigram[:1])

data_words_trigram=make_trigrams(data_words_nostops)
print(data_words_trigram[:1])

nlp=spacy.load("en_core_web_sm",disable=['parser','ner'])  # two separate component names, not one 'parser,ner' string
data_lemmatized= lemmatization(data_words_trigram,allowed_postags=['NOUN','ADJ','VERB','ADV'])

print(data_lemmatized[:1])
df['words']=data_lemmatized

data_lemmatized=[sublist for sublist in data_lemmatized if len(sublist)>1]
df=df[df['words'].map(len) >1]

id2word= corpora.Dictionary(data_lemmatized)

#Create Corpus
texts=(data_lemmatized)

#Term Document Frequency
corpus=[id2word.doc2bow(text) for text in texts]

print(corpus[:1])

#If you want to see what word a given id corresponds to, pass the id as a key to the dictionary.
id2word[0]

[[(id2word[id],freq) for id, freq in cp]for cp in corpus[:1]]

def compute_coherence_values(dictionary, corpus, texts, limit, start=2, step=3):
    coherence_values = []
    model_list = []
    for num_topics in range(start, limit, step):
        model = gensim.models.ldamodel.LdaModel(corpus=corpus,id2word=dictionary,
                                         num_topics=num_topics,random_state=100,
                                         alpha='auto',per_word_topics=True)
        model_list.append(model)

        coherencemodel = CoherenceModel(model=model, texts=texts, dictionary=dictionary, coherence='c_v')
        coherence_values.append(coherencemodel.get_coherence())
    return model_list, coherence_values

model_list, coherence_values = compute_coherence_values(dictionary=id2word, corpus=corpus, texts=texts, start=2, limit=20, step=2)

#Show graph
limit=20; start=2; step=2;
x = range(start, limit, step)
plt.plot(x, coherence_values)
plt.xlabel("Num Topics")
plt.ylabel("Coherence score")
plt.legend(["coherence_values"], loc='best')
plt.show()



lda_model= gensim.models.ldamodel.LdaModel(corpus=corpus,id2word=id2word,
                                           num_topics=4,random_state=100,
                                           alpha='auto',per_word_topics=True)

# Print the keywords of the 4 topics
doc_lda=lda_model[corpus]
pprint(lda_model.print_topics())


#compute the log-perplexity bound
lda_perplexity=lda_model.log_perplexity(corpus)
print('\nPerplexity:',lda_perplexity)# gensim returns the per-word likelihood bound; perplexity is 2**(-bound), so a higher (less negative) bound is better.


#compute coherence score (u_mass here, whereas the topic-number search above used c_v)
coherence_model_lda=CoherenceModel(model=lda_model,texts=texts,dictionary=id2word,coherence='u_mass')
coherence_lda=coherence_model_lda.get_coherence()
print('\nCoherence Score:',coherence_lda)

vis=pyLDAvis.gensim_models.prepare(lda_model,corpus,id2word)
pyLDAvis.save_html(vis,"TripAdvisor2.html")

# 1. Wordcloud of Top N words in each topic
from matplotlib import pyplot as plt
from wordcloud import WordCloud, STOPWORDS
import matplotlib.colors as mcolors

cols = [color for name, color in mcolors.TABLEAU_COLORS.items()]  # more colors: 'mcolors.XKCD_COLORS'

cloud = WordCloud(stopwords=stop_words,
                  background_color='white',
                  max_words=50,
                  colormap='tab10',
                  color_func=lambda *args, **kwargs: cols[i],
                  prefer_horizontal=1.0)

topics = lda_model.show_topics(num_words=50,formatted=False)

fig, axes = plt.subplots(2, 2, figsize=(30,30), sharey=True)

for i, ax in enumerate(axes.flatten()):
    fig.add_subplot(ax)
    topic_words = dict(topics[i][1])
    cloud.generate_from_frequencies(topic_words, max_font_size=300)
    plt.gca().imshow(cloud,interpolation='bilinear')
    plt.gca().set_title('Topic ' + str(i), fontdict=dict(size=16))
    plt.gca().axis('off')


plt.subplots_adjust(wspace=5, hspace=5)
plt.axis('off')
plt.margins(x=5, y=5)
plt.tight_layout()
plt.show()

def format_topics_sentences(ldamodel=None, corpus=corpus, texts=data):
    # Collect one row per document and build the DataFrame once
    # (DataFrame.append was removed in pandas 2.0)
    rows = []

    # Get main topic in each document
    for i, row_list in enumerate(ldamodel[corpus]):
        row = row_list[0] if ldamodel.per_word_topics else row_list
        # Sort topics by contribution so the dominant topic comes first
        row = sorted(row, key=lambda x: (x[1]), reverse=True)
        # Get the Dominant topic, Perc Contribution and Keywords for each document
        topic_num, prop_topic = row[0]
        wp = ldamodel.show_topic(topic_num)
        topic_keywords = ", ".join([word for word, prop in wp])
        rows.append([int(topic_num), round(prop_topic,4), topic_keywords])
    sent_topics_df = pd.DataFrame(rows, columns=['Dominant_Topic', 'Perc_Contribution', 'Topic_Keywords'])

    # Add original text to the end of the output
    contents = pd.Series(texts)
    sent_topics_df = pd.concat([sent_topics_df, contents], axis=1)
    return(sent_topics_df)


df_topic_sents_keywords = format_topics_sentences(ldamodel=lda_model, corpus=corpus, texts=texts)

# Format
df_dominant_topic = df_topic_sents_keywords.reset_index()
df_dominant_topic.columns = ['Document_No', 'Dominant_Topic', 'Topic_Perc_Contrib', 'Keywords', 'Text']
df_dominant_topic.head(10)
df[['Dominant_Topic', 'Topic_Perc_Contrib', 'Keywords']]=df_dominant_topic[['Dominant_Topic', 'Topic_Perc_Contrib', 'Keywords']].values

import  os
file_path = os.getcwd()
file_name = 'tripadvisor_topics2.xlsx'
save_file = os.path.join(file_path, file_name)
df.to_excel(save_file,
                 engine='openpyxl',
                 startrow=0,
                 startcol=0,
                 header=True,
                 na_rep='NaN',
                 float_format='%.2f',
                 sheet_name='Sheet1'
                 )


# Save the pyLDAvis output
#pyLDAvis.save_html(vis,"200314_after_neg_lda_model_4.html")

from gensim.test.utils import datapath
#saving model to disk.
temp_file = datapath("tripadvisor_model")
lda_model.save(temp_file)


#loading model from disk

from gensim import  models

lda = models.ldamodel.LdaModel.load(temp_file)

# Load
# neg_4 = datapath("230314_after_neg_lda_model_4")
# neg_lda4= models.ldamodel.LdaModel.load(neg_4)
# vis = pyLDAvis.gensim_models.prepare(neg_lda4, neg_corpus, neg_id2word)
# pyLDAvis.save_html(vis,"200313_after_neg_lda_model_4.html")

all_topics = {}
num_terms = 10  # Adjust number of words to represent each topic
lambd = 1
# Adjust this accordingly based on tuning above
topic_Term = []
topic_relevance = []
for i in range(1, 5):  # Adjust this to reflect number of topics chosen for final LDA model
    topic = vis.topic_info[vis.topic_info.Category == 'Topic' + str(i)].copy()
    topic['relevance'] = topic['loglift'] * (1 - lambd) + topic['logprob'] * lambd
    topic_Term.append(topic['Term'])
    topic_relevance.append(topic['relevance'])
    all_topics['Topic ' + str(i)] = topic.sort_values(by='relevance', ascending=False).Term[:num_terms].values

pd.DataFrame(all_topics).T


wc = WordCloud(width=1000, height=1000, background_color="white")

plt.figure(figsize=(30,30))
for t in range(lda_model.num_topics):
    plt.subplot(2,2,t+1)
    x = dict(zip(topic_Term[t],topic_relevance[t]))
    im = wc.generate_from_frequencies(x)
    plt.imshow(im)
    plt.axis("off")
    plt.title("Topic #" + str(t+1), size=50)

plt.show()
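
The relevance value used for the topic table and word clouds above combines `logprob` and `loglift` with a weight `lambd`. The sketch below shows that formula on hypothetical numbers, and also one possible way (an assumption, not the script's behavior) to turn log-probabilities into positive weights before passing them to a word cloud, since log-probabilities are negative.

```python
import math

def relevance(logprob, loglift, lambd):
    """relevance = lambd * logprob + (1 - lambd) * loglift, as in the script above."""
    return lambd * logprob + (1 - lambd) * loglift

# Hypothetical per-term values for one topic
terms = {
    "hotel":  {"logprob": math.log(0.08), "loglift": math.log(1.2)},
    "filter": {"logprob": math.log(0.03), "loglift": math.log(4.0)},
    "review": {"logprob": math.log(0.05), "loglift": math.log(1.5)},
}

def rank(terms, lambd):
    """Terms sorted from most to least relevant for the given lambd."""
    return sorted(
        terms,
        key=lambda t: relevance(terms[t]["logprob"], terms[t]["loglift"], lambd),
        reverse=True,
    )

by_prob = rank(terms, 1.0)   # lambd = 1: ranking by probability within the topic only
by_mix = rank(terms, 0.3)    # smaller lambd: terms distinctive to the topic move up

# Assumed alternative weighting for word clouds: exponentiate to get positive weights
weights = {t: math.exp(v["logprob"]) for t, v in terms.items()}
print(by_prob, by_mix)
```

With `lambd = 1`, as in the script, the ranking is by in-topic probability alone; lowering it favors terms that are rare elsewhere.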

<Code> Messenger paper code

Fri, 23 May 2025 10:58:58 GMT

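Analysis workflow summary

Crawl Messenger app reviews from the Google Play Store
Split reviews by date into before and after the pandemic
Remove reviews with thumbsUpCount == 0 → keep reviews that drew actual reactions
Classify reviews as positive or negative (by star rating)
Preprocess the text (stopword removal, lemmatization, etc.)
LDA topic modeling with gensim → select the optimal number of topics
Visualize the topics (pyLDAvis, WordCloud)
Summarize the main keywords per topic & extract influential reviews
Capture changes in user response: compare the topics of positive and negative reviews before and after the pandemic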

1. Import libraries

from google_play_scraper import Sort, reviews_all
import pandas as pd
import nltk
import spacy
import warnings
import time
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
import numpy as np
from nltk.corpus import stopwords
import gensim  # needed for the gensim.utils / gensim.models calls in later steps
from gensim.utils import simple_preprocess
import gensim.corpora as corpora
from gensim.models import CoherenceModel, LdaModel
from gensim.test.utils import datapath
from wordcloud import WordCloud
import pyLDAvis
import pyLDAvis.gensim_models
import dataframe_image as dfi
import re

2. Crawl reviews with google_play_scraper

messenger_review=reviews_all('com.facebook.orca',
                           sleep_milliseconds=0,
                           lang='en',
                           country='us',
                           sort=Sort.NEWEST,
                           filter_score_with=None)

df=pd.DataFrame.from_records(messenger_review)
df.head()
df.to_csv("/Users/lyn/messenger/messenger.csv")

3. Set the data period and split it into before and after the pandemic

df['at'] = pd.to_datetime(df['at'])
df['at'] = df['at'].dt.strftime('%Y-%m-%d')
df = df[df['at'] >= '2018-01-01']
df = df[df['at'] <= '2022-09-30']

before = df[df['at'] < '2020-03-11']
after = df[df['at'] >= '2020-03-11']

before.to_excel("/Users/lyn/messenger/before.xlsx")
after.to_excel("/Users/lyn/messenger/after.xlsx")
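
The split above compares ISO-formatted date strings, which works because `YYYY-MM-DD` strings sort in date order. The same logic can be sketched with real `date` objects. The record layout and the function name `split_by_pandemic` are hypothetical, and the boundaries are the ones used in this step.

```python
from datetime import date

START, END = date(2018, 1, 1), date(2022, 9, 30)   # analysis window from this step
PANDEMIC = date(2020, 3, 11)                        # cutoff used in this step

def split_by_pandemic(reviews):
    """Keep reviews inside the window, then split them at the pandemic cutoff."""
    in_range = [r for r in reviews if START <= r["at"] <= END]
    before = [r for r in in_range if r["at"] < PANDEMIC]
    after = [r for r in in_range if r["at"] >= PANDEMIC]
    return before, after

# Hypothetical review records
reviews = [
    {"content": "works fine", "at": date(2017, 12, 31)},   # outside the window
    {"content": "good calls", "at": date(2019, 6, 1)},
    {"content": "family chat", "at": date(2020, 3, 11)},   # the cutoff day counts as "after"
    {"content": "too many updates", "at": date(2022, 10, 1)},  # outside the window
]
before, after = split_by_pandemic(reviews)
print(len(before), len(after))
```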

4. Load the data

before = pd.read_excel('/Users/lyn/messenger/before.xlsx')
before

5. Drop reviews whose thumbsUpCount is 0

data = before[before["thumbsUpCount"] != 0]
data

6. Classify reviews as positive or negative by score

pos = data[(data['score'].isin([4, 5]))]['content']
neg = data[(data['score'].isin([1, 2]))]['content']

print('Positive: ' + str(len(pos)) + ', ' + 'Negative: ' + str(len(neg)))
print('Total: ' + str(len(pos)+len(neg)))
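
Steps 5 and 6 together can be sketched without pandas as follows. The field names follow the scraper output used above; the sample records and the helper `label_sentiment` are hypothetical. Note that 3-star reviews fall into neither group.

```python
def label_sentiment(score):
    """Map a 1-5 star rating to 'pos', 'neg', or None (3 stars is left out)."""
    if score in (4, 5):
        return "pos"
    if score in (1, 2):
        return "neg"
    return None

# Hypothetical review records
reviews = [
    {"content": "love the video calls", "score": 5, "thumbsUpCount": 3},
    {"content": "keeps crashing", "score": 1, "thumbsUpCount": 12},
    {"content": "it is okay", "score": 3, "thumbsUpCount": 4},
    {"content": "nice", "score": 4, "thumbsUpCount": 0},
]

# Step 5: drop reviews nobody reacted to; step 6: split the rest by rating
kept = [r for r in reviews if r["thumbsUpCount"] != 0]
pos = [r["content"] for r in kept if label_sentiment(r["score"]) == "pos"]
neg = [r["content"] for r in kept if label_sentiment(r["score"]) == "neg"]
print(len(pos), len(neg))
```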

7. Remove characters other than English letters and digits, and convert to lists

def remove_non_english(text):
    return re.sub(r'[^a-zA-Z0-9\s]', '', text)

def sent_to_words(sentences):
    for sentence in sentences:
        yield gensim.utils.simple_preprocess(remove_non_english(str(sentence)), deacc=True)

pos_words = list(sent_to_words(pos))
neg_words = list(sent_to_words(neg))
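
To see what this cleaning step does to a single review, the sketch below applies the same regular expression and then a simplified tokenizer. `simple_tokens` is a hypothetical stand-in that only roughly mimics `simple_preprocess` (lowercasing, alphabetic tokens, length between 2 and 15); it is not the gensim function.

```python
import re

def remove_non_english(text):
    """Same pattern as above: keep ASCII letters, digits, and whitespace."""
    return re.sub(r'[^a-zA-Z0-9\s]', '', text)

def simple_tokens(text, min_len=2, max_len=15):
    """Rough stand-in for a tokenizer: lowercase alphabetic runs of bounded length."""
    words = re.findall(r'[a-z]+', text.lower())
    return [w for w in words if min_len <= len(w) <= max_len]

raw = "Great app!! 좋아요 👍 v2.0 :)"   # hypothetical review text
cleaned = remove_non_english(raw)
tokens = simple_tokens(cleaned)
print(repr(cleaned), tokens)
```

Punctuation, Korean characters, and the emoji are removed by the regex, and the leftover fragment `v` is dropped by the length filter.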

8. Remove stopwords

stop_words = stopwords.words('english') + ['app', 'messenger', 'application', 'aap', 'm', 'facebook']

def remove_stopwords(texts):
    return [[word for word in simple_preprocess(str(doc)) if word not in stop_words] for doc in texts]

pos_words_nostops = remove_stopwords(pos_words)
neg_words_nostops = remove_stopwords(neg_words)

9. Lemmatization

nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])

def lemmatization(texts, allowed_postags=['NOUN', 'ADJ', 'VERB', 'ADV']):
    texts_out = []
    for sent in texts:
        doc = nlp(" ".join(sent)) 
        texts_out.append([token.lemma_ for token in doc if token.pos_ in allowed_postags])
    return texts_out

pos_lemmatized = lemmatization(pos_words_nostops)
neg_lemmatized = lemmatization(neg_words_nostops)

10. Create the Dictionary and Corpus

pos_id2word = corpora.Dictionary(pos_lemmatized)
neg_id2word = corpora.Dictionary(neg_lemmatized)

pos_corpus = [pos_id2word.doc2bow(text) for text in pos_lemmatized]
neg_corpus = [neg_id2word.doc2bow(text) for text in neg_lemmatized]
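
The dictionary and corpus built in this step are a token-to-id mapping and a bag-of-words count per document. A minimal pure-Python equivalent is sketched below; `build_vocab` and `to_bow` are hypothetical helpers, and the ids they assign need not match what `corpora.Dictionary` would assign.

```python
from collections import Counter

def build_vocab(docs):
    """Assign an integer id to each distinct token, in first-seen order."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab

def to_bow(doc, vocab):
    """Return the document as a sorted list of (token_id, count) pairs."""
    counts = Counter(vocab[t] for t in doc if t in vocab)
    return sorted(counts.items())

# Hypothetical lemmatized reviews
docs = [["call", "video", "call"], ["update", "video"]]
vocab = build_vocab(docs)
bow = [to_bow(d, vocab) for d in docs]
print(vocab, bow)
```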

11. Choose the number of topics: compute coherence, perplexity, and their harmonic mean

coherencesT=[]
perplexitiesT=[]
ntopicsT=[]
warnings.filterwarnings('ignore')

for ntopics in range(2, 11):  # try 2 to 10 topics
    tic = time.time()
    ldamodel = gensim.models.ldamodel.LdaModel(pos_corpus, id2word=pos_id2word, num_topics=ntopics)
    print('ntopics',ntopics,time.time() - tic)
    ntopicsT.append(ntopics)

    cm = CoherenceModel(model=ldamodel, texts=pos_lemmatized, corpus=pos_corpus, coherence='c_v')
    coherence = cm.get_coherence()
    coherencesT.append(coherence)

    perplexitiesT.append(ldamodel.log_perplexity(pos_corpus))


plt.rcParams['font.size'] = 20  # Set desired font size

plt.plot(ntopicsT, coherencesT)
plt.ylabel("Coherences")
plt.title('Before_POS')
plt.show()

plt.plot(ntopicsT, perplexitiesT)
plt.xlabel("ntopics")
plt.ylabel("Perplexity")
plt.show()

# Calculate harmonic mean
harmonic_mean = 2 * (np.array(coherencesT) * np.array(perplexitiesT)) / (np.array(coherencesT) + np.array(perplexitiesT))

# Visualize harmonic mean
plt.plot(ntopicsT, harmonic_mean)
plt.xlabel("ntopics")
plt.ylabel("Harmonic Mean")
plt.show()
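
The harmonic mean used above is `2ab / (a + b)`. The sketch below applies it to two hypothetical score lists and picks the topic count with the highest value. It assumes both scores are positive and scaled so that higher is better, which is an assumption of this example, not a property of the raw `log_perplexity` values.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two positive numbers."""
    return 2 * a * b / (a + b)

ntopics = [2, 3, 4, 5]
coherence = [0.40, 0.45, 0.52, 0.48]   # hypothetical coherence scores
second = [0.50, 0.55, 0.60, 0.40]      # hypothetical second score, rescaled to be positive

hm = [harmonic_mean(c, s) for c, s in zip(coherence, second)]
best = ntopics[hm.index(max(hm))]
print(best)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a topic count scores well only when both measures are reasonably high.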

12. LDA modeling and pyLDAvis visualization

# Modeling
pos_lda_model_4 = gensim.models.ldamodel.LdaModel(corpus=pos_corpus, id2word=pos_id2word, num_topics=4)

# Visualization
vis = pyLDAvis.gensim_models.prepare(pos_lda_model_4, pos_corpus, pos_id2word)
pyLDAvis.display(vis)

13. Save and load the model

# Save the pyLDAvis output
pyLDAvis.save_html(vis,"200313_before_pos_lda_model_4.html")

# Save
pos_4 = datapath("230313_before_pos_lda_model_4")
pos_lda_model_4.save(pos_4)

# Load
pos_4 = datapath("230313_before_pos_lda_model_4")
pos_lda4= LdaModel.load(pos_4)  # LdaModel is imported from gensim.models in step 1
vis = pyLDAvis.gensim_models.prepare(pos_lda4, pos_corpus, pos_id2word)

14. Build the table

all_topics = {}
num_terms = 10
lambd = 1

topic_Term=[]
topic_relevance=[]
for i in range(1,5):
    topic = vis.topic_info[vis.topic_info.Category == 'Topic'+str(i)].copy()
    topic['relevance'] = topic['loglift']*(1-lambd)+topic['logprob']*lambd
    topic_Term.append(topic['Term'])
    topic_relevance.append(topic['relevance'])
    all_topics['Topic '+str(i)] = topic.sort_values(by='relevance', ascending=False).Term[:num_terms].values

pd.DataFrame(all_topics).T

15. Get the weight of each keyword

for topic in pos_lda4.show_topics():
    print(topic)

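16. Create and save the word clouds

max_words = number of keywords in the word cloud (default: 200)
font_path = path to the font file
background_color = background color (default: black)
width = width of the output image (in pixels; default: 400)
height = height of the output image (in pixels; default: 200)
max_font_size = maximum keyword font size (default: None, i.e. no limit)
min_font_size = minimum keyword font size (default: 4)

# Font setting to keep Korean text from rendering as broken glyphs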
fontpath = "/System/Library/Fonts/Supplemental/AppleGothic.ttf"
font = fm.FontProperties(fname=fontpath, size=12)

wc = WordCloud(font_path=fontpath, width=1000, height=1000, background_color="white")

plt.figure(figsize=(30,30))
for t in range(pos_lda4.num_topics):
    plt.subplot(2,2,t+1)
    x = dict(zip(topic_Term[t],topic_relevance[t]))
    im = wc.generate_from_frequencies(x)
    plt.imshow(im)
    plt.axis("off")
    plt.title("Topic #" + str(t+1), fontproperties=font, size=50)

# Save the image first; calling savefig after plt.show() typically writes an empty figure
plt.savefig('팬데믹 전 긍정 2*2.png')
plt.show()

17. Identify influential reviews per topic

# Use the corpus and dictionary of the pos_lda4 model
corpus = pos_corpus
dictionary = pos_id2word

# Assumption: pos_content is the list of original positive review texts, in the same order as pos_corpus
pos_content = pos.tolist()

# Initialize a dictionary that stores, for each topic, the documents dominated by that topic
top_docs_by_topic = {i: [] for i in range(pos_lda4.num_topics)}

# Compute the topic distribution of every document and record it under its dominant topic
for i, row_list in enumerate(pos_lda4[corpus]):
    row = row_list[0] if pos_lda4.per_word_topics else row_list
    row = sorted(row, key=lambda x: (x[1]), reverse=True)

    top_topic = row[0][0]
    top_value = row[0][1]
    top_docs_by_topic[top_topic].append((i, top_value))

# Save the most influential documents for each topic to a file
with open('before the pandemic pos.txt', 'w', encoding='utf-8') as f:
    # Write the most influential documents for each topic
    for i in range(pos_lda4.num_topics):
        f.write(f"\nTop 10 most influential documents for topic {i}:\n")
        top_docs = sorted(top_docs_by_topic[i], key=lambda x: x[1], reverse=True)[:10]
        for doc_index, value in top_docs:
            doc = pos_content[doc_index]
            f.write(f"- {doc}\n")

    # Save each document's Dominant topic, Perc Contribution, and Topic Keywords to a file
    # (format_topics_sentences is the helper defined in the TripAdvisor code post)
    df_topic_sents_keywords = format_topics_sentences(ldamodel=pos_lda4, corpus=corpus, texts=pos_content)
    df_dominant_topic = df_topic_sents_keywords.reset_index()
    df_dominant_topic.columns = ['Document_No', 'Dominant_Topic', 'Topic_Perc_Contrib', 'Keywords', 'Text']
    df_dominant_topic.to_csv('before the pandemic pos_dominant_topic.csv', index=False)
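
The grouping logic in this step can be tried without a trained model. The sketch below takes per-document topic distributions in the `(topic_id, weight)` form that the loop above iterates over, assigns each document to its dominant topic, and keeps the top `k` documents per topic. The distributions and the helper names are hypothetical.

```python
def dominant_topic(dist):
    """Return the (topic_id, weight) pair with the largest weight."""
    return max(dist, key=lambda pair: pair[1])

def top_docs_per_topic(doc_topics, num_topics, k=2):
    """Group documents by dominant topic and keep the k strongest per topic."""
    buckets = {t: [] for t in range(num_topics)}
    for doc_id, dist in enumerate(doc_topics):
        topic, weight = dominant_topic(dist)
        buckets[topic].append((doc_id, weight))
    return {
        t: sorted(docs, key=lambda d: d[1], reverse=True)[:k]
        for t, docs in buckets.items()
    }

# Hypothetical topic distributions for four documents and two topics
doc_topics = [
    [(0, 0.7), (1, 0.3)],
    [(0, 0.2), (1, 0.8)],
    [(0, 0.9), (1, 0.1)],
    [(0, 0.6), (1, 0.4)],
]
top = top_docs_per_topic(doc_topics, num_topics=2, k=2)
print(top)
```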

<Conference paper> Predictive Modeling of Illegal Parking and Stopping Using a Zero-Inflated Poisson Model

Fri, 23 May 2025 10:54:02 GMT

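Korean Institute of Smart Media (KISM) 2023 Smart Media Symposium

Date: October 26 to October 28, 2023

Venue: Sunchon National University, 70th Anniversary Memorial Hall & Industry-Academic Cooperation Hall

Hosts: Korean Institute of Smart Media / Society for e-Business Studies

Abstract

The growing number of cars is making illegal parking and stopping an increasingly serious problem. Studies have examined the factors behind illegal parking and methods for detecting it, but research on prediction remains scarce. This study therefore predicts illegal parking data for Dalseo-gu, Daegu Metropolitan City, on an hourly basis to identify patterns of occurrence and contribute to prevention. The data come from the illegal parking records for Dalseo-gu, Daegu, provided by the Public Data Portal and the D-Data Hub. To pinpoint where each violation occurred, the two datasets were merged using a similarity threshold of 90%, yielding 182,163 records for analysis. Because illegal parking data are recorded only when an incident is detected, most values are zero. This study therefore used a zero-inflated Poisson model for hourly prediction. Based on this study, locations where congestion is expected can be identified in advance, which is expected to enable more efficient traffic enforcement.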

Proceedings https://www.kism.or.kr/bbs/board.php?bo_table=DA01030000&wr_id=111&sst=wr_hit&sod=desc&sop=and&page=4
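
For readers unfamiliar with the zero-inflated Poisson (ZIP) model named in the abstract, here is a minimal sketch of its probability mass function. It is the textbook definition, not the fitted model from the paper; the parameter values `lam` and `pi` are hypothetical.

```python
import math

def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson model.

    With probability pi the count is a structural zero; otherwise it is
    drawn from a Poisson distribution with rate lam.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson

lam, pi = 2.0, 0.6            # hypothetical rate and zero-inflation probability
p0 = zip_pmf(0, lam, pi)      # probability of observing no violation in an hour
total = sum(zip_pmf(k, lam, pi) for k in range(50))
mean = sum(k * zip_pmf(k, lam, pi) for k in range(50))   # close to (1 - pi) * lam
print(round(p0, 4), round(mean, 4))
```

The extra mass at zero is what lets the model fit hourly counts where most hours record no violation, while the Poisson part describes the hours that do.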

<Published paper> Analyzing User Feedback on a Fan Community Platform 'Weverse': A Text Mining Approach

Fri, 23 May 2025 10:37:25 GMT

* Analyzing User Feedback on a Fan Community Platform 'Weverse': A Text Mining Approach *

Abstract

This study applies topic modeling to uncover user experience and app issues expressed in users' online reviews of a fan community platform, Weverse on Google Play Store. It allows us to identify the features which need to be improved to enhance user experience or need to be maintained and leveraged to attract more users. Therefore, we collect 88,068 first-level English online reviews of Weverse on Google Play Store with Google-Play-Scraper tool. After the initial preprocessing step, a dataset of 31,861 online reviews is analyzed using Latent Dirichlet Allocation (LDA) topic modeling with Gensim library in Python. There are 5 topics explored in this study which highlight significant issues such as network connection error, delayed notification, and incorrect translation. Besides, the result revealed the app's effectiveness in fostering not only interaction between fans and artists but also fans' mutual relationships. Consequently, the business can strengthen user engagement and loyalty by addressing the identified drawbacks and leveraging the platform for user communication.

###### Keywords: Weverse | Topic modeling | LDA | Fan community platform | Communication

Van Ho, T. T., Noh, M. J., Lee, Y. N., & Kim, Y. S. (2024). Analyzing User Feedback on a Fan Community Platform 'Weverse': A Text Mining Approach. Smart Media Journal, 13(6), 62-71.

https://www.dbpia.co.kr/journal/articleDetail?nodeId=NODE11979888

<Published paper> Empowering Agriculture: Exploring User Sentiments and Suggestions for Plantix, a Smart Farming Application

Fri, 23 May 2025 10:27:04 GMT

Empowering Agriculture: Exploring User Sentiments and Suggestions for Plantix, a Smart Farming Application

Abstract

Farming activities are transforming from traditional skill-based agriculture into knowledge-based and technology-driven digital agriculture. The use of intelligent information and communication technology introduces the idea of smart farming that enables farmers to collect weather data, monitor crop growth remotely and detect crop diseases easily. The introduction of Plantix, a pest and disease management tool in the form of a mobile application has allowed farmers to identify pests and diseases of the crop using their mobile devices. Hence, this study collected the reviews of Plantix to explore the response of the users on the Google Play Store towards the application through Latent Dirichlet Allocation (LDA) topic modeling. Results indicate four latent topics in the reviews: two positive evaluations (compliments, appreciation) and two suggestions (plant options, recommendations). We found the users suggested the application to additional plant options and additional features that might help the farmers with their difficulties. In addition, the application is expected to benefit the farmer more by having an early alert of diseases to farmers and providing various substitutes and a list of components for the remedial measures.

###### Keywords: Smart farming | Plant disease detection | User review analytics | LDA | Text Mining

Siow, M. Q., Han, M. M. C., Lee, Y. N., Yu, S. Y., Noh, M. J., & Kim, Y. S. (2023). Empowering Agriculture: Exploring User Sentiments and Suggestions for Plantix, a Smart Farming Application. Smart Media Journal, 12(10), 38-46.

https://www.dbpia.co.kr/Journal/articleDetail?nodeId=NODE11783938

<Conference paper> Rating and Review Analysis of the Messenger Application During the COVID-19 Pandemic

Fri, 23 May 2025 10:17:05 GMT

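Korean Institute of Smart Media (KISM) 2023 General Conference

Date: April 27 to April 29, 2023

Venue: Jeju National University, Ara Convention Hall

Host: Korean Institute of Smart Media

Abstract

With the COVID-19 pandemic making face-to-face communication difficult, communication through messenger applications has been increasing. Studies on remote education, telemedicine, and similar topics are under way, but research on how the pandemic affected messenger application reviews remains scarce. The purpose of this study is therefore to analyze reviews of the Messenger application and derive ways to improve user satisfaction. Reviews of the Messenger application were collected from the Google Play Store with google-play-scraper and classified by rating: 1 and 2 as negative, 4 and 5 as positive. The analysis used 25,266 positive reviews and 43,540 negative reviews. Analysis with the gensim-based LDA (Latent Dirichlet Allocation) technique showed that positive reviews produced topics such as updates, sending features, purpose of use, and user experience. Negative reviews produced topics such as sending features, notification features, chat features, and account management. Based on these results, application developers can raise user satisfaction by focusing on these areas when maintaining and repairing the application.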

Proceedings https://www.kism.or.kr/bbs/board.php?bo_table=DA01030000&wr_id=110&sst=wr_hit&sod=desc&sop=and&page=4

<Published paper> Service Strategy Analysis Based on Changes in Messenger Application Reviews During the Pandemic

Thu, 22 May 2025 12:46:50 GMT

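Service Strategy Analysis Based on Changes in Messenger Application Reviews During the Pandemic

Abstract

With the COVID-19 pandemic making face-to-face communication difficult, studies have examined the effects of non-face-to-face communication, but few have done so through messenger application reviews. This study collects messenger application review data from the Google Play Store and applies LDA (Latent Dirichlet Allocation) topic modeling to identify the impact of the pandemic and propose service strategies in response. The data were classified by the point at which the pandemic began and by the ratings users gave. The analysis showed that Messenger is used mainly by middle-aged and older users, and that communication with family increased after the pandemic began. Users expressed dissatisfaction with application updates and had difficulty adapting to changes. A development approach that adjusts the update cycle and actively incorporates user feedback is therefore needed. In addition, providing an intuitive and simple user interface (UI) is expected to improve user satisfaction.

###### ■ Keywords: messenger application; COVID-19; user review analysis; LDA; user satisfaction

Lee, Y. N., Noh, M. J., & Kim, Y. S. (2023). Service Strategy Analysis Based on Changes in Messenger Application Reviews During the Pandemic. Smart Media Journal, 12(6), 15-26.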

https://www.dbpia.co.kr/Journal/articleDetail?nodeId=NODE11737962