
Presentation Transcript


  1. RoloDex Model

TermSpace: dimensions = terms (stems?), points (vectors) = docs, each entry = the term frequency (tf) of the term in the doc. tf(t,d) can be existential (1 iff t exists in d); t-count(d), the number of occurrences of t in d; or t-ratio(d) = t-count(d) / total-term-count(d).

Weighted term frequency: wtf(t,d) = tf(t,d) * idf(t), where idf(t) (inverse document frequency) = log2(N/n), N = total # of docs, n = # of docs containing t.

Association-rule measures on a card: Supp(A) = |{c : c is related to all i in A}|; Conf(A->B) = Supp(A u B) / Supp(A).

[Slide figure: the RoloDex model as a stack of 2-D relationship "cards" sharing common entity axes, e.g. ItemSet-ItemSet card (antecedent axis), Author-Doc card (People axis), Customer-rates-Movie cards (one card per rating value 1-5), Cust-Item card, te card (term exists: 1 iff term is in doc), Gene-Gene card (PPI), Exp-PI card, Exp-Gene card, Doc-Doc card, Term-Term card (shared stem?). The sample 0/1 and rating entries shown in the figure's matrices are omitted here.]
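To make the Supp/Conf definitions above concrete, here is a minimal stand-alone sketch (not the lecture's PTree code) that treats one customer-item card as boolean columns, one per item, and computes Supp and Conf by ANDing columns. All names and data are hypothetical.

#include <cstdio>
#include <vector>

using Column = std::vector<bool>;              // one bit per customer

// Supp(A) = number of customers related to every item in A (AND of the item columns).
static int supp(const std::vector<Column>& cols) {
    if (cols.empty()) return 0;
    int count = 0;
    for (size_t c = 0; c < cols[0].size(); ++c) {
        bool all = true;
        for (const Column& col : cols) all = all && col[c];
        if (all) ++count;
    }
    return count;
}

int main() {
    // Hypothetical card: 6 customers x 3 items.
    Column i0 = {1,1,0,1,0,1}, i1 = {1,0,0,1,0,1}, i2 = {0,1,0,1,0,0};
    int suppA  = supp({i0, i1});           // antecedent A = {i0, i1}
    int suppAB = supp({i0, i1, i2});       // A union B, consequent B = {i2}
    double conf = suppAB / (double)suppA;  // Conf(A->B) = Supp(A u B) / Supp(A)
    std::printf("Supp(A)=%d Supp(AuB)=%d Conf=%.2f\n", suppA, suppAB, conf);
    return 0;
}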

  2. From Wikipedia: The tf–idf weight (term frequency–inverse document frequency) is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus. Variations of the tf–idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query. One of the simplest ranking functions is computed by summing the tf–idf for each query term; many more sophisticated ranking functions are variants of this simple model.
• Motivation: Suppose we have a set of English text documents and wish to determine which document is most relevant to the query "the brown cow." A simple way to start is by eliminating documents that do not contain all three words "the," "brown," and "cow," but this still leaves many documents. To further distinguish them, we might count the number of times each term occurs in each document and sum them; the number of times a term occurs in a document is called its term frequency. However, because the term "the" is so common, this tends to incorrectly emphasize documents that happen to use the word "the" more often, without giving enough weight to the more meaningful terms "brown" and "cow". The term "the" is also not a good keyword for distinguishing relevant from non-relevant documents, while rarer terms like "brown" and "cow" are. Hence an inverse document frequency factor is incorporated, which diminishes the weight of terms that occur very frequently in the collection and increases the weight of terms that occur rarely.
• Mathematical details: The term count in a given document is simply the number of times the term appears in that document. This count is usually normalized to prevent a bias towards longer documents (which may have a higher term count regardless of the actual importance of the term in the document), giving a measure of the importance of term ti within document dj. Thus we have the term frequency, defined as tf_{i,j} = n_{i,j} / Σ_k n_{k,j}, where n_{i,j} is the number of occurrences of term ti in document dj and the denominator is the sum of the occurrence counts of all terms in dj, that is, the size of the document |dj|. The inverse document frequency is a measure of the general importance of the term, obtained by dividing the total number of documents by the number of documents containing the term and then taking the logarithm of that quotient: idf_i = log( |D| / |{d : ti ∈ d}| ), with |D| the total number of documents in the corpus and |{d : ti ∈ d}| the number of documents in which term ti appears. If the term is not in the corpus, this leads to a division by zero; it is therefore common to use 1 + |{d : ti ∈ d}| in the denominator. The tf–idf weight is then the product tf_{i,j} × idf_i.
A high weight in tf–idf is reached by a high term frequency (in the given document) and a low document frequency of the term in the whole collection of documents; the weights hence tend to filter out common terms. The tf–idf value for a term is greater than zero if and only if the ratio inside the idf's log function is greater than 1. Depending on whether a 1 is added to the denominator, a term occurring in all documents will have either a zero or a negative idf, and if the 1 is added to the denominator, a term that occurs in all but one document will have an idf equal to zero. Various (mathematical) forms of the tf–idf term weight can be derived from a probabilistic retrieval model that mimics human relevance decision making.
• Example: Consider a document containing 100 words in which the word cow appears 3 times. Following the formulas above, the term frequency (tf) for cow is 3/100 = 0.03. Now assume we have 10 million documents and cow appears in one thousand of them. The inverse document frequency is then log(10,000,000 / 1,000) = 4. The tf–idf score is the product of these quantities: 0.03 × 4 = 0.12.
• Applications in the vector space model: The tf–idf weighting scheme is often used in the vector space model together with cosine similarity to determine the similarity between two documents.
• See also: Okapi BM25, Noun phrase, Word count, Kullback–Leibler divergence, Mutual information, Latent semantic analysis, Latent semantic indexing, Latent Dirichlet allocation.
• References: Spärck Jones, Karen (1972). "A statistical interpretation of term specificity and its application in retrieval". Journal of Documentation 28 (1): 11–21. doi:10.1108/eb026526. http://www.soi.city.ac.uk/~ser/idfpapers/ksj_orig.pdf. Salton, G. and M. J. McGill (1983). Introduction to Modern Information Retrieval. McGraw-Hill. ISBN 0070544840. Salton, Gerard, Edward A. Fox & Harry Wu (November 1983). "Extended Boolean information retrieval". Communications of the ACM 26 (11): 1022–1036. doi:10.1145/182.358466. http://portal.acm.org/citation.cfm?id=358466. Salton, Gerard and Buckley, C. (1988). "Term-weighting approaches in automatic text retrieval". Information Processing & Management 24 (5): 513–523. doi:10.1016/0306-4573(88)90021-0. Wu, H.C., R.W.P. Luk, K.F. Wong, K.L. Kwok (2008). "Interpreting TF-IDF term weights as making relevance decisions". ACM Transactions on Information Systems 26 (3). doi:10.1145/1361684.1361686.
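A minimal sketch that reproduces the "cow" numbers from the example above (tf = 0.03, idf = 4, tf–idf = 0.12), assuming the base-10 logarithm implied by log(10,000,000/1,000) = 4; the function name is hypothetical.

#include <cmath>
#include <cstdio>

// Plain tf-idf as defined on the slide: tf = term count / document length,
// idf = log10(total docs / docs containing the term).
static double tf_idf(int termCount, int docLength, long totalDocs, long docsWithTerm) {
    double tf  = (double)termCount / (double)docLength;
    double idf = std::log10((double)totalDocs / (double)docsWithTerm);
    return tf * idf;
}

int main() {
    // The slide's example: "cow" appears 3 times in a 100-word document,
    // and in 1,000 of 10,000,000 documents.
    double w = tf_idf(3, 100, 10000000L, 1000L);
    std::printf("tf-idf(cow) = %.2f\n", w);   // prints 0.12
    return 0;
}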

  3. movie-vote.C ARM code

/** Public function. This function implements movie voting.
 * \param pcfg     A pointer to the class containing the parameters which configure voting.
 * \param M        The movie number for which a prediction is to be made.
 * \param supportM The PTree identifying the support for the movie to be predicted.
 * \param U        The identity number of the user for which a prediction is to be made.
 * \param supportU The PTree identifying the support for the user for whom a prediction is being made.
 * \return The recommended prediction. */
extern double movie_vote(PredictionConfig *pcfg, unsigned long int M,
                         PTree & supportM, unsigned long int U, PTree & supportU)
{
  auto double MU=Users.get_rating(U,M)-2;  // for PROBE-run diag print. Take out for QUAL runs.
  auto double VOTE=DEFAULT_VOTE, VOTE_sum=0, VOTE_cnt=0, Nb, Mb, dsSq, UCor=1,
              supportUsize=supportU.get_count(), supportMsize=supportM.get_count();
  struct pruning *internal_prune;
  struct external_prune *external_prune;
  auto PTree supM = supportM, supU = supportU;
  supM.clearbit(U); supU.clearbit(M);

  /* External pruning: Prune Users in supM */
  external_prune = pcfg->get_movie_Prune_Users_in_SupM();
  if (external_prune->enabled)
  { if (supM.get_count() > external_prune->params.Ct)
      do_pruning(external_prune, M, U, supM, supU);
    supM.clearbit(U); supU.clearbit(M);
    if ( (supM.get_count() < 1) || (supU.get_count() < 1) ) return VOTE; }

  /* Reset support if requested. */
  if (pcfg->reset_movie_support()) { supU=supportU; supU.clearbit(M); }

  /* External pruning: Prune Movies in supU */
  external_prune = pcfg->get_movie_Prune_Movies_in_SupU();
  if (external_prune->enabled)
  { if (supU.get_count() > external_prune->params.Ct)
      do_pruning(external_prune, M, U, supM, supU);
    supM.clearbit(U); supU.clearbit(M);
    if ( (supM.get_count() < 1) || (supU.get_count() < 1) ) return VOTE; }

  4. movie-vote.C ARM code 2

/** ARM Code ****
 * First an EXPLANATION of the ratings Ptree implementation:
 * Actual ratings are first translated from 1,2,3,4,5 to 3,4,5,6,7 (e.g.,
 * rating=1 is implemented in Ptrees as rating=3, rating=2 as rating=4,
 * rating=3 as rating=5, rating=4 as rating=6, rating=5 as rating=7).
 * This design decision was made so that rating=0 (which means "not rated"
 * and does NOT mean "the very lowest rating") would be at a maximum separation
 * from the lowest true rating, yet all ratings could still be implemented with
 * 3 Ptrees (all rating values are 3-bit numbers). Thus, Ptrees represent ratings as
 * 0=000 (movie not rated by user), 3=011 (very low or a 1-star rating),
 * 4=100 (low or a 2-star rating), 5=101 (average or a 3-star rating),
 * 6=110 (high or a 4-star rating), 7=111 (very high or a 5-star rating).
 * We also partition (cluster) the movies in the support of the user predictee, U, by rating.
 * This is done so that we can restrict to only those pertinent user predictor voters
 * for each rating value in our ARM code (we don't need to loop through all voters,
 * e.g., for ARM done on the rating=1 relationship, but only users that predict 1 for M). */
auto PTreeSet & U_ptree_set=Users.get_ptreeset(), & M_ptree_set=Movies.get_ptreeset();
supU.clearbit(M); supM.clearbit(U);
auto PTree
  supU_1=supU& (~U_ptree_set[(U*3)+0])& ( U_ptree_set[(U*3)+1])& ( U_ptree_set[(U*3)+2]),
  supU_2=supU& ( U_ptree_set[(U*3)+0])& (~U_ptree_set[(U*3)+1])& (~U_ptree_set[(U*3)+2]),
  supU_3=supU& ( U_ptree_set[(U*3)+0])& (~U_ptree_set[(U*3)+1])& ( U_ptree_set[(U*3)+2]),
  supU_4=supU& ( U_ptree_set[(U*3)+0])& ( U_ptree_set[(U*3)+1])& (~U_ptree_set[(U*3)+2]),
  supU_5=supU& ( U_ptree_set[(U*3)+0])& ( U_ptree_set[(U*3)+1])& ( U_ptree_set[(U*3)+2]),
  supM_1=supM& (~M_ptree_set[(M*3)+0])& ( M_ptree_set[(M*3)+1])& ( M_ptree_set[(M*3)+2]),
  supM_2=supM& ( M_ptree_set[(M*3)+0])& (~M_ptree_set[(M*3)+1])& (~M_ptree_set[(M*3)+2]),
  supM_3=supM& ( M_ptree_set[(M*3)+0])& (~M_ptree_set[(M*3)+1])& ( M_ptree_set[(M*3)+2]),
  supM_4=supM& ( M_ptree_set[(M*3)+0])& ( M_ptree_set[(M*3)+1])& (~M_ptree_set[(M*3)+2]),
  supM_5=supM& ( M_ptree_set[(M*3)+0])& ( M_ptree_set[(M*3)+1])& ( M_ptree_set[(M*3)+2]),
  sou, souM, souU, som, somU, somM, spM, spU;
auto double thr1,expnt1,thr2,expnt2,s,S,ss,sn,sM,sU,c,C,wt,XBalVT,wt_const=16;
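The 1..5 to 3..7 translation above can be checked with a small stand-alone sketch (not part of movie-vote.C; the helper names are hypothetical). Bit 0 below corresponds to the most significant Ptree bit, i.e. M_ptree_set[(M*3)+0] in the code above.

#include <cassert>
#include <cstdio>

// Translate a star rating (1..5) to the 3-bit Ptree code (3..7); 0 stays 0 ("not rated").
static unsigned encode_rating(unsigned stars) { return stars == 0 ? 0 : stars + 2; }

// Recover the star rating from the 3-bit code (as the slides do with "-2").
static unsigned decode_rating(unsigned code)  { return code == 0 ? 0 : code - 2; }

int main() {
    for (unsigned stars = 0; stars <= 5; ++stars) {
        unsigned code = encode_rating(stars);
        std::printf("stars=%u code=%u bits=%u%u%u\n", stars, code,
                    (code >> 2) & 1u, (code >> 1) & 1u, code & 1u);
        assert(decode_rating(code) == stars);
    }
    return 0;   // prints 0=000, 3=011, 4=100, 5=101, 6=110, 7=111
}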

  5. movie-vote.C ARM code 3

/* Association Rule Mining (ARM) to enhance Movie Votes:
 * For each rating value, k=1,2,3,4,5, we consider in turn only movie voters that are rated k by U.
 * We loop through those movies, N, for which rating(N,U)=k, one k=1,2,3,4,5 at a time,
 * looking for strong rules N-->M (the N_arm_k LOOP, k=1,2,3,4,5),
 * then for strong rules N,O-->M (the NO_arm_k LOOP, k=1,2,3,4,5),
 * then for strong rules N,O,P-->M (the NOP_arm_k LOOP, k=1,2,3,4,5),
 * then for strong rules N,O,P,Q-->M (the NOPQ_arm_k LOOP, k=1,2,3,4,5),
 * then for strong rules N,O,P,Q,R-->M (the NOPQR_arm_k LOOP, k=1,2,3,4,5),
 * then for strong rules N,O,P,Q,R,S-->M (the NOPQRS_arm_k LOOP, k=1,2,3,4,5),
 * then for strong rules N,O,P,Q,R,S,T-->M (the NOPQRST_arm_k LOOP, k=1,2,3,4,5).
 * When a strong rule is found, we issue "bak" ballots to the antecedent set to vote for k.
 * We allow "shifting" votes from k with "vsk" (vote shift). Thus the parameters are:
 *   spk (minimum rating=k ARM support threshold),
 *   cfk (minimum rating=k ARM confidence threshold),
 *   bak (number of ballots issued to the rating=k strong-ARM antecedent set),
 *   vsk (vote shift for vote=k).
 * For each rating k=1,2,3,4,5 we do a separate "rating=k" Association Rule Mine for strong rules
 * with consequent={M} and antecedent=a movie set. For each strong rule, extra votes are given and
 * a vote shift is allowed to optimize benefit. Each rule support has its own loop. If individual
 * ARM supports are not allowed, downward closure of ARM support can be applied. */
// SAMPLE-statistic-based dMNsds pruning config parameters hijacked here for ARM parameter use.
internal_prune = pcfg->get_internal_prune(movie_dMNsds);
thr1=internal_prune->threshold; expnt1=internal_prune->exponent;
internal_prune = pcfg->get_internal_prune(movie_Nsds_Msds);
thr2=internal_prune->threshold; expnt2=internal_prune->exponent;
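As a rough illustration of the "strong rule, then issue ballots" idea described above: the sketch below uses the textbook support/confidence test, which is not the exact threshold formulation used in the N-loop code on the next slide; the threshold values and supports are hypothetical, and only the names spk, cfk, bak, vsk come from the slide.

#include <cstdio>

// A rule {antecedent movies rated k} --> {M rated k} is taken as "strong" when its
// support count clears spk and its confidence clears cfk.
//   suppBoth = # users rating every antecedent movie k who also rate M as k
//   suppAnte = # users rating every antecedent movie k
static bool strong_rule(long suppBoth, long suppAnte, long spk, double cfk) {
    if (suppAnte == 0) return false;
    double confidence = (double)suppBoth / (double)suppAnte;  // Conf(A->{M}) = Supp(A u {M}) / Supp(A)
    return suppBoth >= spk && confidence >= cfk;
}

int main() {
    long   spk = 20;     // hypothetical minimum support count for rating-k rules
    double cfk = 0.6;    // hypothetical minimum confidence
    double bak = 4;      // hypothetical number of ballots issued per strong rule
    double vsk = 0.0;    // hypothetical vote shift applied to k
    int    k   = 5;      // the rating value being mined
    double VOTE_sum = 0, VOTE_cnt = 0;
    if (strong_rule(/*suppBoth=*/80, /*suppAnte=*/100, spk, cfk)) {
        VOTE_sum += bak * (k + vsk);   // bak ballots voting for the (possibly shifted) rating k
        VOTE_cnt += bak;
    }
    std::printf("vote so far = %.2f\n", VOTE_cnt > 0 ? VOTE_sum / VOTE_cnt : 0.0);
    return 0;
}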

  6. movie-vote.C ARM code 4

#if 1 // ARM (pre 5/1/10)
#if 1 // Movie_ARM Rating = 1
#if 1 // N Movie_ARM Rating = 1
auto unsigned long long int *supUlist_1=supU_1.get_indexes();
for (unsigned long long int n = 0; n < supU_1.get_count(); ++n)
{ auto unsigned long long int N=supUlist_1[n];
  auto PTree supN = Movies.get_users(N),
    supN_1 = supN & (~M_ptree_set[(N*3)+0]) & ( M_ptree_set[(N*3)+1]) & ( M_ptree_set[(N*3)+2]),
    supN_2 = supN & ( M_ptree_set[(N*3)+0]) & (~M_ptree_set[(N*3)+1]) & (~M_ptree_set[(N*3)+2]),
    supN_3 = supN & ( M_ptree_set[(N*3)+0]) & (~M_ptree_set[(N*3)+1]) & ( M_ptree_set[(N*3)+2]),
    supN_4 = supN & ( M_ptree_set[(N*3)+0]) & ( M_ptree_set[(N*3)+1]) & (~M_ptree_set[(N*3)+2]),
    supN_5 = supN & ( M_ptree_set[(N*3)+0]) & ( M_ptree_set[(N*3)+1]) & ( M_ptree_set[(N*3)+2]),
    csMN_1 = supM_1 & supN_1, csMN_2 = supM_2 & supN_2, csMN_3 = supM_3 & supN_3,
    csMN_4 = supM_4 & supN_4, csMN_5 = supM_5 & supN_5;
#if 1 // vote code
  auto double NU = Users.get_rating(U,N)-2,
    sMNN1= csMN_1.get_count(), sNN1= supN_1.get_count(),
    sMNN2= csMN_2.get_count(), sNN2= supN_2.get_count(),
    sMNN3= csMN_3.get_count(), sNN3= supN_3.get_count(),
    sMNN4= csMN_4.get_count(), sNN4= supN_4.get_count(),
    sMNN5= csMN_5.get_count(), sNN5= supN_5.get_count(),
    sMNNn1= sMNN2+sMNN3+sMNN4+sMNN5, sNNn1= sNN2+sNN3+sNN4+sNN5;
  if ( (sMNNn1 > sNNn1 * expnt1) && (sMNN1 > sNN1 * thr1) )
  { VOTE_sum += UCor * NU; VOTE_cnt += UCor; }
#endif // vote code
#endif // N Rating = 1
}

  7. movie-vote.C ARM code 5

#if 1 // Nearest Neighbor Code
supU.clearbit(M);
auto unsigned long long int *supUlist = supU.get_indexes();
for (unsigned long long int n = 0; n < supU.get_count(); ++n)   // NLOOP (Ns are movie voters)
{ auto unsigned long long int N=supUlist[n];
  if (N == M) continue;
  auto double NU=Users.get_rating(U,N)-2, MAX=0, smN=0, smM=0, MM=0, MN=0, NN=0, denom=0, dm;
  auto PTree supN=Movies.get_users(N), csMN= supM & supN;
  csMN.clearbit(U);
  dm=csMN.get_count();
  if (dm<1) continue;

  /* External pruning: PRUNE USERS in CoSupMN */
  external_prune = pcfg->get_movie_Prune_Users_in_CoSupMN();
  if (external_prune->enabled)
  { if (csMN.get_count()>external_prune->params.Ct)
      do_pruning(external_prune,M,U,csMN,supU);
    csMN.clearbit(U); supU.clearbit(M);
    dm = csMN.get_count();
    if (dm < 1) continue; }

  /* Adjusted Cosine declarations */
  auto double ACCor, Vbar, ACCnum=0, ACCden, ACCdenSum1=0, ACCdenSum2=0;

  /* NV: VLOOP (Vs are user dimensions) */
  auto unsigned long long int *csMNlist = csMN.get_indexes();
  for (unsigned long long int v= 0; v < csMN.get_count(); ++v)
  { auto unsigned long long int V=csMNlist[v];
    auto double MV=Users.get_rating(V,M)-2, NV=Users.get_rating(V,N)-2;
    if (pow(MV-NV,2) > MAX) MAX=pow(MV-NV,2);
    smN+=NV; smM+=MV; MM+=MV*MV; MN+=NV*MV; NN+=NV*NV; ++denom;
    /* Adjusted Cosine code */
    auto PTree supV=Users.get_movies(V);
    Vbar=Users.get_mean(V,supV);
    ACCnum+=(NV-Vbar)*(MV-Vbar);
    ACCdenSum1+=(NV-Vbar)*(NV-Vbar);
    ACCdenSum2+=(MV-Vbar)*(MV-Vbar);
  } // VLOOP ends

  8. movie-vote.C ARM code 6

  /* Adjusted Cosine code */
  ACCden=pow(ACCdenSum1,.5)*pow(ACCdenSum2,.5);
  ACCor=ACCnum/ACCden; UCor=ACCor; dm=csMN.get_count();
  if (denom<1) continue;
  else { Nb=smN/dm; Mb=smM/dm; dsSq=NN-2*MN+MM; VOTE=NU-Nb+Mb; }  // dsSq = sum over V of (NV-MV)^2

  /* force_vote_in_Voter_Loop goes here. */
  if ( pcfg->movie_vote_force_in_loop() )
  { if ((VOTE<1) && (VOTE!=DEFAULT_VOTE)) VOTE=1;
    if ((VOTE>5) && (VOTE!=DEFAULT_VOTE)) VOTE=5; }

  /* SAMPLE-statistic-based pruning thru early exit */
  if ( dm > 1 )
  { internal_prune = pcfg->get_internal_prune(movie_dMNsds);
    if ( internal_prune->enabled )
    { auto double dMNsds, thr = internal_prune->threshold, expnt = internal_prune->exponent;
      dMNsds = pow((dsSq-dm*(Nb-Mb)*(Nb-Mb))/(dm-1), 0.5);
      // if (dMNsds>thr) continue;   hijacking dMNsds expnt as UCor exponent
      if (UCor>=0) UCor=pow(UCor,expnt); }
    internal_prune = pcfg->get_internal_prune(movie_Nsds_Msds);
    if (internal_prune->enabled)
    { auto double Msds, Nsds, thr=internal_prune->threshold;
      Msds = pow((MM-dm*Mb*Mb)/(dm-1), 0.5);
      Nsds = pow((NN-dm*Nb*Nb)/(dm-1), 0.5);
      if ( Nsds > (thr * Msds) ) continue; }
    internal_prune = pcfg->get_internal_prune(movie_DVCors);
    if ( internal_prune->enabled )
    { auto double Msds, Nsds, DVCors, thr=internal_prune->threshold, expnt=internal_prune->exponent;
      Msds=pow(dm*MM-smM*smM,.5)/dm; Nsds=pow(dm*NN-smN*smN,.5)/dm;
      DVCors=exp(expnt*(Nsds-Msds)*(Nsds-Msds));
      if (DVCors<thr) continue;
      if (internal_prune->weight) UCor=DVCors; }
    internal_prune = pcfg->get_internal_prune(movie_VDCors);
    if ( internal_prune->enabled )
    { auto double VDCors, dMNsds, thr=internal_prune->threshold, expnt=internal_prune->exponent;
      dMNsds=pow((dsSq-dm*(Nb-Mb)*(Nb-Mb))/(dm-1), .5);
      VDCors=exp(expnt*dMNsds*dMNsds);
      if ( VDCors < thr ) continue;
      if ( internal_prune->weight ) UCor = VDCors; }
  } // end of SAMPLE-statistic-based pruning through early exit
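A quick sanity check on the streaming formulas used above (a stand-alone hedged sketch, not project code, with hypothetical sample data): dsSq = NN - 2*MN + MM equals the sum of squared rating differences (NV-MV)^2, and dMNsds = sqrt((dsSq - dm*(Nb-Mb)^2)/(dm-1)) is the sample standard deviation of those differences.

#include <cmath>
#include <cstdio>

int main() {
    // Hypothetical co-support: ratings of movies N and M by the same dm users.
    double Nv[] = {5, 3, 4, 2, 4};
    double Mv[] = {4, 3, 5, 1, 4};
    int dm = 5;

    double smN=0, smM=0, NN=0, MM=0, MN=0;
    for (int v = 0; v < dm; ++v) {
        smN += Nv[v]; smM += Mv[v];
        NN += Nv[v]*Nv[v]; MM += Mv[v]*Mv[v]; MN += Nv[v]*Mv[v];
    }
    double Nb = smN/dm, Mb = smM/dm;
    double dsSq   = NN - 2*MN + MM;                               // = sum of (Nv - Mv)^2
    double dMNsds = std::sqrt((dsSq - dm*(Nb-Mb)*(Nb-Mb)) / (dm-1));

    // Direct computation of the same sample standard deviation, for comparison.
    double mean = Nb - Mb, ss = 0;
    for (int v = 0; v < dm; ++v) { double d = Nv[v]-Mv[v]; ss += (d-mean)*(d-mean); }
    std::printf("streaming sd=%.4f  direct sd=%.4f\n", dMNsds, std::sqrt(ss/(dm-1)));
    return 0;   // both print 0.8367
}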

  9. movie-vote.C ARM code 7

  /* POPULATION-statistics pruning thru early exit */
  if (dm>0)
  { internal_prune=pcfg->get_internal_prune(movie_dMNsdp);
    if (internal_prune->enabled)
    { auto double dMNsdp, thr=internal_prune->threshold, expnt=internal_prune->exponent;
      dMNsdp=pow(dm*dsSq-(smN-smM)*(smN-smM),.5)/dm;
      dMNsdp=pow(-expnt*dMNsdp,2);
      if (dMNsdp>thr) continue; }
    internal_prune = pcfg->get_internal_prune(movie_Nsdp_Msdp);
    if (internal_prune->enabled)
    { auto double Nsdp, Msdp, thr=internal_prune->threshold;
      Msdp=pow(dm*MM-smM*smM,.5)/dm; Nsdp=pow(dm*NN-smN*smN,.5)/dm;
      if ( Nsdp > (thr * Msdp) ) continue; }
    internal_prune = pcfg->get_internal_prune(movie_DVCorp);
    if (internal_prune->enabled)
    { auto double DVCorp, Msdp, Nsdp, thr=internal_prune->threshold, expnt=internal_prune->exponent;
      Msdp=pow(dm*MM-smM*smM,.5)/dm; Nsdp=pow(dm*NN-smN*smN,.5)/dm;
      DVCorp=exp(expnt*(Nsdp-Msdp)*(Nsdp-Msdp));
      if ( DVCorp<thr ) continue;
      if ( internal_prune->weight ) UCor = DVCorp; }
    internal_prune = pcfg->get_internal_prune(movie_VDCorp);
    if (internal_prune->enabled)
    { auto double VDCorp, dMNsdp, thr=internal_prune->threshold, expnt=internal_prune->exponent;
      dMNsdp=pow(dm*dsSq-(smN-smM)*(smN-smM),.5)/dm;
      VDCorp=exp(expnt*dMNsdp*dMNsdp);
      if ( VDCorp < thr ) continue;
      if ( internal_prune->weight ) UCor = VDCorp; }
    internal_prune = pcfg->get_internal_prune(movie_SCor);
    if (internal_prune->enabled)
    { auto double SCor, thr=internal_prune->threshold;
      SCor=(MN-dm*Mb*Nb)/(.0001+(pow((MM-dm*pow(Mb,2)),.5))*(.0001+pow((NN-dm*pow(Nb,2)),.5)));
      if ( SCor < thr ) continue;
      if ( internal_prune->weight ) UCor = SCor; }
    internal_prune = pcfg->get_internal_prune(movie_PCor);
    if (internal_prune->enabled)
    { auto double ONEPDS, PCor=1, thr=internal_prune->threshold;
      ONEPDS=dsSq-dm*pow(Nb-Mb,2);
      if ( MAX > 0 ) PCor = exp(-0.1 * ONEPDS / (pow(MAX, .75) * pow(dm,.5)));
      if ( PCor < thr ) continue;
      if ( internal_prune->weight ) UCor = PCor; }
    internal_prune = pcfg->get_internal_prune(movie_DCor);
    if (internal_prune->enabled)
    { auto double DCor, ONEPDS, thr=internal_prune->threshold;
      ONEPDS=dsSq-dm*pow(Nb-Mb,2);
      DCor=exp(-dsSq/100);
      if ( DCor < thr ) continue;
      if ( internal_prune->weight ) UCor = DCor; }
  } // POPULATION-statistics-based pruning through early exit ends here.

  if ( UCor > 0 ) { VOTE_sum+=VOTE*UCor; VOTE_cnt+=UCor; } else continue;

  /* force_vote_in_Voter_Loop goes here. */
  if ( pcfg->movie_vote_force_in_loop() )
  { if ( (VOTE<1) && (VOTE!=DEFAULT_VOTE) ) VOTE=1;
    if ( (VOTE>5) && (VOTE!=DEFAULT_VOTE) ) VOTE=5; }
} /* ends NV NLOOP (movie voter loop) */
#endif // Nearest Neighbor Code

if ( VOTE_cnt > 0 ) VOTE=VOTE_sum/VOTE_cnt; else VOTE = DEFAULT_VOTE;

/* force_vote_after_Voter_Loop goes here. */
if ( pcfg->movie_vote_force_after_loop() )
{ if ( (VOTE < 1) && (VOTE != DEFAULT_VOTE) ) VOTE=1;
  if ( (VOTE > 5) && (VOTE != DEFAULT_VOTE) ) VOTE=5;
  return VOTE; }

  10. movie-vote.C ARM code 8

  if (UCor>0) { VOTE_sum+=VOTE*UCor; VOTE_cnt+=UCor; } else continue;

  /* force_vote_in_Voter_Loop goes here. */
  if ( pcfg->movie_vote_force_in_loop() )
  { if ( (VOTE<1) && (VOTE!=DEFAULT_VOTE) ) VOTE=1;
    if ( (VOTE>5) && (VOTE!=DEFAULT_VOTE) ) VOTE=5; }
} /* ends NV NLOOP (movie voter loop) */
#endif // Nearest Neighbor Code

if ( VOTE_cnt>0 ) VOTE=VOTE_sum/VOTE_cnt; else VOTE=DEFAULT_VOTE;

/* force_vote_after_Voter_Loop goes here. */
if ( pcfg->movie_vote_force_after_loop() )
{ if ( (VOTE < 1) && (VOTE != DEFAULT_VOTE) ) VOTE=1;
  if ( (VOTE > 5) && (VOTE != DEFAULT_VOTE) ) VOTE=5;
  return VOTE; }

  11. createconfigs script in src/mpp-mpred-3.2.0/p95/mu11:

#!/bin/bash
for g in .1 .2 .4 .7 .9
do
  sed -i -e "s/dMNsdsThr=[^ ]*/dMNsdsThr=$g/" t.config
  for h in .1 .2 .4 .7 .9
  do
    sed -i -e "s/dMNsdsExp=[^ ]*/dMNsdsExp=$h/" t.config
    cp t.config configs/a$g$h.config
  done
done

This creates, in src/mpp-mpred-3.2.0/p95/mu11/configs:
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:11 a.1.1.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:11 a.1.2.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:11 a.1.4.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:11 a.1.7.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:11 a.1.9.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:11 a.2.1.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:11 a.2.2.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:11 a.2.4.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:11 a.2.7.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:11 a.2.9.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:12 a.4.1.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:12 a.4.2.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:12 a.4.4.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:12 a.4.7.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:12 a.4.9.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:12 a.7.1.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:12 a.7.2.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:12 a.7.4.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:12 a.7.7.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:12 a.7.9.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:12 a.9.1.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:12 a.9.2.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:12 a.9.4.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:12 a.9.7.config
-rw-r--r-- 1 perrizo faculty 7441 Nov 3 10:12 a.9.9.config

The submit script, run in src/mpp-mpred-3.2.0:

#!/bin/bash
for g in .1 .2 .4 .7 .9
do
  for h in .1 .2 .4 .7 .9
  do
    ./mpp-submit -S -i Data/p95test.txt -c p95/mu11/configs a$g$h.out -t .05 -d ./p95/mu11
  done
done

produces here:
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:15 a.1.1.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:15 a.1.2.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:15 a.1.4.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:15 a.1.7.out
-rw-r--r-- 1 perrizo faculty 3625 Nov 3 10:15 a.1.9.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:15 a.2.1.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:15 a.2.2.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:16 a.2.4.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:16 a.2.7.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:16 a.2.9.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:16 a.4.1.out
-rw-r--r-- 1 perrizo faculty 3625 Nov 3 10:16 a.4.2.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:16 a.4.4.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:16 a.4.7.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:17 a.4.9.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:17 a.7.1.out
-rw-r--r-- 1 perrizo faculty 3625 Nov 3 10:17 a.7.2.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:17 a.7.4.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:17 a.7.7.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:17 a.7.9.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:17 a.9.1.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:18 a.9.2.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:18 a.9.4.out
-rw-r--r-- 1 perrizo faculty 3625 Nov 3 10:18 a.9.7.out
-rw-r--r-- 1 perrizo faculty 3626 Nov 3 10:18 a.9.9.out

which I then copy to src/mpp-mpred-3.2.0/dotouts.

  12. The submit script, run in src/mpp-mpred-3.2.0, also produces these subdirectories in mpp-mpred-3.2.0 (listed below); each run writes files such as p95test.txt.rmse and p95test.txt.predictions, for example:

p95test.txt.rmse:
Movie: 12641:
 0: Answer: 1 Prediction: 1.22 Error: 0.04840
 1: Answer: 4 Prediction: 3.65 Error: 0.12250
 2: Answer: 2 Prediction: 2.55 Error: 0.30250
 3: Answer: 4 Prediction: 4.04 Error: 0.00160
 4: Answer: 2 Prediction: 1.85 Error: 0.02250
 Sum: 0.49750 Total: 5 RMSE: 0.315436
 Running RMSE: 0.315436 / 5 predictions
Movie: 12502:
 0: Answer: 4 Prediction: 4.71 Error: 0.50410
 1: Answer: 5 Prediction: 3.54 Error: 2.13160
 2: Answer: 5 Prediction: 3.87 Error: 1.27690
 3: Answer: 3 Prediction: 3.33 Error: 0.10890
 4: Answer: 2 Prediction: 2.97 Error: 0.94090
 Sum: 4.96240 Total: 5 RMSE: 0.996233
 Running RMSE: 0.738911 / 10 predictions
. . .
Movie: 10811:
 0: Answer: 5 Prediction: 4.05 Error: 0.90250
 1: Answer: 3 Prediction: 3.49 Error: 0.24010
 2: Answer: 4 Prediction: 3.94 Error: 0.00360
 3: Answer: 3 Prediction: 3.39 Error: 0.15210
 Sum: 1.29830 Total: 4 RMSE: 0.569715
 Running RMSE: 0.964397 / 743 predictions
Movie: 12069:
 0: Answer: 4 Prediction: 3.20 Error: 0.64000
 1: Answer: 3 Prediction: 3.48 Error: 0.23040
 Sum: 0.87040 Total: 2 RMSE: 0.659697
Prediction summary: Sum: 691.90610 Total: 745 RMSE: 0.963708

.predictions:
12641: 1.22 3.65 2.55 4.04 1.85
12502: 4.71 3.54 3.87 3.33 2.97
. . .
10811: 4.05 3.49 3.94 3.39
12069: 3.20 3.48

Subdirectories:
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:15 a.1.1
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:15 a.1.2
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:15 a.1.4
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:15 a.1.7
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:15 a.1.9
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:15 a.2.1
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:15 a.2.2
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:16 a.2.4
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:16 a.2.7
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:16 a.2.9
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:16 a.4.1
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:16 a.4.2
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:16 a.4.4
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:16 a.4.7
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:17 a.4.9
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:17 a.7.1
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:17 a.7.2
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:17 a.7.4
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:17 a.7.7
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:17 a.7.9
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:17 a.9.1
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:18 a.9.2
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:18 a.9.4
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:18 a.9.7
drwxr-xr-x 2 perrizo faculty 4096 Nov 3 10:18 a.9.9

and, e.g., a.9.9 contains:
-rw-r--r-- 1 perrizo faculty  7441 Nov 3 10:17 a.9.9.config
-rw-r--r-- 1 perrizo faculty  5191 Nov 3 10:18 hi-a.9.9.txt
-rw-r--r-- 1 perrizo faculty  1808 Nov 3 10:18 hi-a.9.9.txt.answers
-rw-r--r-- 1 perrizo faculty  1465 Nov 3 10:18 lo-a.9.9.txt
-rw-r--r-- 1 perrizo faculty   688 Nov 3 10:18 lo-a.9.9.txt.answers
-rw-r--r-- 1 perrizo faculty  4330 Nov 3 10:18 p95test.txt.predictions
-rw-r--r-- 1 perrizo faculty 46147 Nov 3 10:18 p95test.txt.rmse
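The per-movie and running RMSE figures in p95test.txt.rmse follow the usual definition: squared error per prediction, averaged over the predictions, then square-rooted. A small hedged sketch (stand-alone, with hypothetical container names) reproduces the first movie's numbers from the listing above.

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // Movie 12641 from the listing above: answers vs. predictions.
    std::vector<double> answer     = {1, 4, 2, 4, 2};
    std::vector<double> prediction = {1.22, 3.65, 2.55, 4.04, 1.85};

    double sum = 0;
    for (size_t i = 0; i < answer.size(); ++i) {
        double err = (answer[i] - prediction[i]) * (answer[i] - prediction[i]);
        sum += err;                      // e.g. (1 - 1.22)^2 = 0.04840
    }
    double rmse = std::sqrt(sum / answer.size());
    std::printf("Sum: %.5f Total: %zu RMSE: %.6f\n", sum, answer.size(), rmse);
    // Prints: Sum: 0.49750 Total: 5 RMSE: 0.315436
    return 0;
}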

  13. In dotouts is a script, createtablejob:

#!/bin/bash
for g in .1 .2 .4 .7 .9
do
  for h in .1 .2 .4 .7 .9
  do
    grep Input:\ \ \ lo a$g$h.out >> job
  done
done

In dotouts is a script, createtablermse:

#!/bin/bash
for g in .1 .2 .4 .7 .9
do
  for h in .1 .2 .4 .7 .9
  do
    grep RMSE:\  a$g$h.out >> rmse
  done
done

The resulting rmse table:
Sum: 692.82510 Total: 745 RMSE: 0.964348
Sum: 691.59330 Total: 745 RMSE: 0.963490
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.84690 Total: 745 RMSE: 0.963667
Sum: 690.47330 Total: 745 RMSE: 0.962710
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 693.27970 Total: 745 RMSE: 0.964664
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708
Sum: 691.90610 Total: 745 RMSE: 0.963708

The resulting job table:
Input: lo-a.1.1.txt
Input: lo-a.1.2.txt
Input: lo-a.1.4.txt
Input: lo-a.1.7.txt
Input: lo-a.1.9.txt
Input: lo-a.2.1.txt
Input: lo-a.2.2.txt
Input: lo-a.2.4.txt
Input: lo-a.2.7.txt
Input: lo-a.2.9.txt
Input: lo-a.4.1.txt
Input: lo-a.4.2.txt
Input: lo-a.4.4.txt
Input: lo-a.4.7.txt
Input: lo-a.4.9.txt
Input: lo-a.7.1.txt
Input: lo-a.7.2.txt
Input: lo-a.7.4.txt
Input: lo-a.7.7.txt
Input: lo-a.7.9.txt
Input: lo-a.9.1.txt
Input: lo-a.9.2.txt
Input: lo-a.9.4.txt
Input: lo-a.9.7.txt
Input: lo-a.9.9.txt
