Estimating review score from words

CMPE 545 Artificial Neural Networks. Estimating review score from words. Işık Barış Fidaner.

Presentation Transcript


  1. CMPE 545 Artificial Neural Networks: Estimating review score from words. Işık Barış Fidaner

  2. Metascore: $S = \frac{1}{N} \sum_{i=1}^{N} \mathrm{score}_i$, the average of the $N$ review scores
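The slide's formula is a plain average of the review scores. A minimal Python sketch that follows it literally; the function name `metascore` is hypothetical, and no per-reviewer weighting is assumed.

```python
def metascore(scores):
    """Return S = (1/N) * sum(score_i): the unweighted mean of N review scores."""
    if not scores:
        raise ValueError("need at least one review score")
    return sum(scores) / len(scores)


print(metascore([90, 80, 70]))  # 80.0
```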

  3. Each review has a Score, a Reviewer (the source of this review) and a Quote (a few sentences that summarize this review) • $r_t$ = the rating given to this product • $x_t$ = the bag-of-words representation: the existence of some words in the quote (+affectionate, +exuberant, +embrace)

  4. Purposes • A new database that relates text to score: "(...) An affectionate, exuberant picture that seeks to bring even those who don't know Klingon from Portuguese into the embrace of a pop-culture phenomenon. (...)" → score: 90

  5. Purposes • Quantify meaning with machine learning • Review quote: "An affectionate, exuberant picture that seeks to bring even those who don't know Klingon from Portuguese into the embrace of a pop-culture phenomenon." • Over the words riveting, exhilarating, affectionate, crafted, exuberant, dull, lacking, embrace, the quote gives $x_t = [0\ 0\ 1\ 0\ 1\ 0\ 0\ 1]$ • Estimate $w^T x_t$: 73 ×, 70 ×, 65 ×
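The binary vector on this slide can be reproduced with a short Python sketch. The tokenizer (lower-casing, then a regular expression keeping letters, apostrophes and hyphens) is an assumption, not the author's crawler code; the vocabulary is the slide's eight words, with "exhilarating" spelled correctly here.

```python
import re

# The slide's eight example words, in the slide's order.
VOCAB = ["riveting", "exhilarating", "affectionate", "crafted",
         "exuberant", "dull", "lacking", "embrace"]


def tokenize(text):
    """Lower-case the text and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z'\-]+", text.lower())


def binary_bow(text, vocab=VOCAB):
    """Return a 0/1 vector marking which vocabulary words occur in the text."""
    present = set(tokenize(text))
    return [1 if word in present else 0 for word in vocab]


quote = ("An affectionate, exuberant picture that seeks to bring even those who "
         "don't know Klingon from Portuguese into the embrace of a pop-culture phenomenon.")
print(binary_bow(quote))  # [0, 0, 1, 0, 1, 0, 0, 1]
```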

  6. Purposes • Meta-metacritic deductions, such as: • Positive words: riveting, exhilarating, crafted, superb, extraordinary, brilliant • Negative words: unfunny, tedious, fails, mess, dull, lacking
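One plausible way to read off positive and negative words, assuming each learned weight is the word's contribution to the predicted score, is to sort the vocabulary by weight. The function `extreme_words` and the toy weights below are hypothetical illustrations, not results from the slides.

```python
def extreme_words(weights, k=3):
    """Return the k highest-weighted and k lowest-weighted words of a word->weight dict."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Hypothetical toy weights, for illustration only.
toy = {"superb": 12.0, "riveting": 9.5, "brilliant": 8.0,
       "dull": -7.0, "tedious": -9.0, "unfunny": -11.0}
print(extreme_words(toy))
# (['superb', 'riveting', 'brilliant'], ['unfunny', 'tedious', 'dull'])
```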

  7. Obtaining the database • Developed a PHP web crawler storing the reviews in MySQL • It ran for a few days • TV show reviews: 8,335 records • Music album reviews: 62,293 records • Movie reviews: 113,456 records

  8. Bag of words assumption • Features affect the result independently • Semantic organization does not matter: "An affectionate, exuberant picture that seeks to bring even those who don't know Klingon from Portuguese into the embrace of a pop-culture phenomenon." = "phenomenon from an exuberant picture those into a portuguese don't pop-culture affectionate to embrace bring klingon of who know seeks"

  9. Bag of words assumption • The problem with modifiers: "This is not good." ≠ "Is this not good?" (opposite meanings, same bag of words) • We rely on the information encoded in the vocabulary, not in the grammar • Opinions are often expressed clearly and simply: "Excellent, wonderful!" "This is dreadful."
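Both consequences of the bag-of-words assumption (order does not matter, so modifiers and word order are lost) can be checked in a few lines of Python. The regex tokenizer is an assumption; the set representation matches the "existence of words" encoding used on the slides.

```python
import re


def bag_of_words(text):
    """Set of lower-cased word tokens: word order, punctuation and counts are discarded."""
    return set(re.findall(r"[a-z'\-]+", text.lower()))


# Scrambling the words of a quote leaves its bag of words unchanged...
print(bag_of_words("An affectionate, exuberant picture")
      == bag_of_words("picture exuberant an affectionate"))  # True

# ...so a statement and a rhetorical question with the opposite meaning collide.
print(bag_of_words("This is not good.") == bag_of_words("Is this not good?"))  # True
```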

  10. Word selection • Statistics per word: quote count (QC), product count (PC), score mean (SM), score standard deviation (SS) • Filtering reduces ~20 thousand words to ~300 words: • Meaningful words (SS < SS_max = 20) • Frequently used words (PC > PC_min = 20) • Non-grammatical words (PC < PC_max = 100)
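The three selection rules can be sketched in Python as follows. Assumptions: each record is a `(product_id, score, quote)` tuple, a word counts once per quote, and SS is the population standard deviation; the slides specify none of these. The default thresholds are the slide's values; the usage example lowers them for a four-review toy set.

```python
import re
from collections import defaultdict
from statistics import mean, pstdev


def select_words(reviews, ss_max=20.0, pc_min=20, pc_max=100):
    """Keep words with SS < ss_max and pc_min < PC < pc_max; return their QC/PC/SM/SS."""
    scores = defaultdict(list)   # word -> scores of quotes containing it (len = QC)
    products = defaultdict(set)  # word -> ids of products whose quotes contain it (len = PC)
    for product_id, score, quote in reviews:
        for word in set(re.findall(r"[a-z'\-]+", quote.lower())):
            scores[word].append(score)
            products[word].add(product_id)
    stats = {}
    for word, s in scores.items():
        pc, ss = len(products[word]), pstdev(s)
        if ss < ss_max and pc_min < pc < pc_max:
            stats[word] = {"QC": len(s), "PC": pc, "SM": mean(s), "SS": ss}
    return stats


# Hypothetical toy records, with thresholds scaled down to fit them.
reviews = [(1, 90, "superb and moving"), (2, 85, "superb acting"),
           (3, 20, "dull and moving"), (4, 30, "dull plot")]
print(sorted(select_words(reviews, ss_max=10, pc_min=1, pc_max=4)))  # ['dull', 'superb']
```

"and" and "moving" are dropped because their scores spread too widely (SS = 35), and "acting" and "plot" because they appear for only one product.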

  11. Significant words for TV and movies • casual words! fancy words! • unfunny, disappointment, supposed, fails, waste • TV takes too much time! Movies are overrated!

  12. Significant words for music albums • masterpiece, artists: Music is art • date, modern: Music ages quickly • personality: Albums are attached to the musician’s personality

  13. The input vector and estimation • Example input vector (divided by the number of selected words in the quote): $x_t = [1\ 0\ 0\ 1\ 0\ 0\ 0\ 1\ 0\ 0\ 0\ 0 \ldots 0] / 3$ • Estimation function: $y_t = w_0 + w^T x_t$ • There is a weight for every selected word • $x_t$ chooses the subset of contained words • The estimate is therefore the sum of $w_0$ and the arithmetic mean of the weights of the contained words
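A Python sketch of this estimator, reading the last bullet as $y_t = w_0 + w^T x_t$ with $x_t$ normalized by the number $k$ of selected words found in the quote (so $w^T x_t$ is their mean weight). The function names, selected words and weights are hypothetical illustrations.

```python
def input_vector(quote_words, selected):
    """x_t: 1/k at each selected word found in the quote (k = number found), 0 elsewhere."""
    present = [1 if word in quote_words else 0 for word in selected]
    k = sum(present)
    return [p / k for p in present] if k else present


def estimate(x, w, w0):
    """y_t = w0 + w^T x_t, i.e. w0 plus the mean weight of the contained words."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical selected words and weights, for illustration only.
selected = ["affectionate", "dull", "lacking", "exuberant", "tedious", "embrace"]
w = [12.0, -10.0, -8.0, 9.0, -9.0, 6.0]
x = input_vector({"affectionate", "exuberant", "embrace"}, selected)
print(estimate(x, w, w0=60.0))  # approximately 69.0 (60 + mean of 12, 9 and 6)
```

A quote containing none of the selected words gets the all-zero vector, so its estimate falls back to $w_0$.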

  14. Linear and SVM regression • Linear regression uses the squared-difference error, • which implies these update equations: • SVM regression uses the ε-sensitive error function, • with these simpler update equations:
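The slide's update equations were not preserved. As a hedged sketch, assuming online gradient descent with learning rate $\eta$ on $y_t = w_0 + w^T x_t$, the textbook forms are: squared error $\tfrac12 (r_t - y_t)^2$ gives $\Delta w = \eta (r_t - y_t) x_t$ and $\Delta w_0 = \eta (r_t - y_t)$; ε-sensitive error $\max(0, |r_t - y_t| - \varepsilon)$ gives $\Delta w = \eta\,\mathrm{sign}(r_t - y_t)\,x_t$ when $|r_t - y_t| > \varepsilon$ and no update otherwise. These may differ from the author's exact equations, and the regularization term of a full SVM regression is omitted. `sgd_epoch` is a hypothetical name.

```python
def sgd_epoch(data, w, w0, lr=0.1, loss="squared", eps=5.0):
    """One online pass over (x_t, r_t) pairs; returns the updated (w, w0).

    loss="squared": delta = r - y  (negative gradient of 1/2 (r - y)^2)
    loss="eps":     delta = sign(r - y) if |r - y| > eps, else 0
                    (subgradient of max(0, |r - y| - eps); no regularizer)
    """
    w = list(w)
    for x, r in data:
        err = r - (w0 + sum(wi * xi for wi, xi in zip(w, x)))
        if loss == "squared":
            delta = err
        elif loss == "eps":
            delta = (1.0 if err > 0 else -1.0) if abs(err) > eps else 0.0
        else:
            raise ValueError(f"unknown loss: {loss}")
        w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
        w0 += lr * delta
    return w, w0


# Hypothetical usage: repeated passes over toy (x_t, r_t) pairs.
toy = [([1.0, 0.0], 80.0), ([0.0, 1.0], 40.0)]
w, w0 = [0.0, 0.0], 0.0
for _ in range(200):
    w, w0 = sgd_epoch(toy, w, w0, lr=0.1, loss="squared")
print(w, w0)
```

The ε-sensitive update only moves by a fixed step $\eta$ in the direction of the error, so a single outlying score cannot pull the weights as far as it does under squared error.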

  15. Linear regression learning • Learning is unstable on the validation set • Error of 17 points • Error of 14 points

  16. SVM regression learning • Robustness increased, because the SVM error function grows only linearly with the error and ignores errors smaller than ε • Better results with SVM! • Error of 13 points • Error of 11 points

  17. Possible improvements • Non-linear model that actually weighs the importance of words • Normalization by estimating reviewer parameters • Adding two-word combinations to the input vector
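The third improvement, adding two-word combinations, can be sketched by extending the feature set with adjacent word pairs. The tokenizer and the function name `features` are assumptions. The example shows that bigrams separate the two sentences from slide 9, although both still share the bigram "not good", so this alone does not guarantee the model recovers the intended meaning.

```python
import re


def features(text, use_bigrams=True):
    """Set of unigram features, optionally plus adjacent two-word combinations."""
    tokens = re.findall(r"[a-z'\-]+", text.lower())
    feats = set(tokens)
    if use_bigrams:
        feats |= {f"{a} {b}" for a, b in zip(tokens, tokens[1:])}
    return feats


a, b = "This is not good.", "Is this not good?"
print(features(a, use_bigrams=False) == features(b, use_bigrams=False))  # True
print(sorted(features(a) ^ features(b)))  # ['is not', 'is this', 'this is', 'this not']
```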