
Group Testing and Coding Theory

Group Testing and Coding Theory. Atri Rudra (U. at Buffalo); joint works with Piotr Indyk (MIT), Hung Ngo (UB), and Ely Porat (Bar-Ilan).


Presentation Transcript


  1. Group Testing and Coding Theory. Atri Rudra (U. at Buffalo); joint works with Piotr Indyk (MIT), Hung Ngo (UB), and Ely Porat (Bar-Ilan).

  2. Main Message: Group Testing, Data Stream Algorithms, Coding Theory.

  3. Group Testing Overview. Test a soldier for a disease. WWII example: syphilis.

  4. Group Testing Overview. Test an army for a disease (WWII example: syphilis). What if only one soldier has the disease? Can pool blood samples and check if at least one soldier has the disease.

  5. Group Testing. Tons of applications. Set of items: (unknown) vector $x \in \{0,1\}^n$. At most $d$ positives: $|x| \le d$. Tests: each test is a subset $S$ of $\{1,\dots,n\}$. Result of a test: OR of the $x_i$'s such that $i \in S$. Non-adaptive tests: all $t$ tests are fixed a priori (the rows of a $t \times n$ 0/1 matrix). Goal 1: figure out $x$ (output the positive items). Goal 2: minimize the number of tests $t$; $t = O(d^2 \log n)$ is possible.
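
The test model on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `run_tests`, the 0-based item indices, and the toy pools are assumptions, not part of the talk.

```python
def run_tests(tests, x):
    """Return the result vector: r[j] is the OR of x[i] over the items i in tests[j]."""
    return [int(any(x[i] for i in pool)) for pool in tests]

# Hypothetical toy instance: n = 6 items, item 3 is the only positive.
x = [0, 0, 0, 1, 0, 0]
# Three non-adaptive pools, fixed before any result is observed.
tests = [{0, 1, 2}, {2, 3}, {3, 4, 5}]
r = run_tests(tests, x)
print(r)
```

Only the pools that contain item 3 report a positive result.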

  6. The Decoding Step. The $t \times n$ test matrix is to be designed, the vector $x = (x_1, \dots, x_n)$ is unknown, and the results $r = (r_1, \dots, r_t)$ are observed. How fast can this step be done?

  7. Our Main Result (slide annotations: "d is O(log n)", "Big savings")
     # tests (t)            Decoding time    Source
     $O(d^2 \log n)$        $O(nt)$          Folklore, [PR08]
     $O(d^4 \log n)$        $O(t)$           [GI04]
     $O(d^2 \log^2 n)$      poly(t)          [GI04, implicit]
     $O(d^2 \log n)$        poly(t)          Our result

  8. An application: Heavy Hitters. Stream items are numbers in the range $\{1,\dots,n\}$. Output all items that occur at least a $1/d$ fraction of the time. One pass, polylog space, polylog update time, polylog report time.

  9. Cormode-Muthukrishnan idea. Use group testing: maintain a counter for each test. Maintain the total count $m$ and, for each test $i$, the count $c_i$ of stream items in that test. Heavy tail property: total frequency of non-heavy items $< 1/d$. Set $r_i = 1$ iff $c_i \ge m/d$, and $x_j = 1$ iff $j$ is a heavy item ($|x| \le d$). Then $r = M \times x$, so reporting the heavy items is just decoding!
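
A minimal sketch of the counter idea on this slide, assuming a toy stream and toy pools (all names and values here are hypothetical). For simplicity it scans every test on each update, which costs O(t) per item; the polylog update time mentioned in the talk would need the tests containing an item to be computed directly from the matrix.

```python
def heavy_hitter_results(stream, tests, d):
    """One pass over the stream; returns the thresholded result vector r."""
    counts = [0] * len(tests)  # c_i: number of stream items that fall in test i
    m = 0                      # total number of stream items seen
    for item in stream:
        m += 1
        for i, pool in enumerate(tests):
            if item in pool:
                counts[i] += 1
    # r_i = 1 iff c_i >= m/d, compared as integers to avoid float rounding
    return [int(c * d >= m) for c in counts]

# Hypothetical stream over items 0..5: item 2 occurs 7 times out of 10.
stream = [2, 2, 5, 2, 0, 2, 2, 2, 1, 2]
tests = [{0, 1, 2}, {2, 3}, {3, 4, 5}]
r = heavy_hitter_results(stream, tests, d=2)
print(r)
```

Here the non-heavy items account for 3 of the 10 stream entries, below the threshold of m/d = 5, so exactly the tests containing item 2 are reported as positive.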

  10. Requirements from Group Testing. Non-adaptiveness is crucial. Minimize $t$ (space). Strongly explicit matrix. Minimize decoding time (report time).

  11. d-disjunct Matrices. Sufficient condition for group testing. For every subset of $d$ columns and every column disjoint from it, there exists a row in which the disjoint column has a 1 and all $d$ columns have a 0. When the $d$ columns are the set of positives, that test has result 0, so every non-positive column has one test with result 0.

  12. Naïve Decoder for d-disjunct Matrices. If $r_j = 0$ then for every column $i$ that is in test $j$, set $x_i = 0$. If $x_i = 1$ then all tests that column $i$ participates in will have result 1. Running time: $O(nt)$ over all $n$ columns, or $O(Lt)$ when only $L$ columns are checked.
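
The naïve decoder can be sketched as follows. The function name, the optional `candidates` argument, and the toy matrix (whose six columns are the 2-element subsets of four rows, which makes it 1-disjunct) are illustrative assumptions.

```python
def naive_decode(matrix, r, candidates=None):
    """matrix[j][i] == 1 iff item i is in test j.
    Keep every candidate item that appears in no test with result 0."""
    n = len(matrix[0])
    cols = range(n) if candidates is None else candidates
    return [i for i in cols
            if all(r[j] == 1 for j in range(len(matrix)) if matrix[j][i] == 1)]

# Toy 1-disjunct matrix: column i is the indicator of one 2-subset of the 4 rows.
M = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
r = [0, 1, 0, 1]                 # results when item 4 is the only positive
print(naive_decode(M, r))        # scans all n columns: O(nt)
print(naive_decode(M, r, [3, 4]))  # scans only L = 2 candidates: O(Lt)
```

Restricting the scan to a short candidate list is what brings the cost down from O(nt) to O(Lt).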

  13. So far… A d-disjunct matrix can be decoded in $O(nt)$ time. Strongly explicit d-disjunct matrix with $t = O(d^2 \log^2 n)$ [Kautz-Singleton 1964]. Deterministic d-disjunct matrix with $t = O(d^2 \log n)$ [Porat-Rothschild 2008]. Lower bound of $\Omega(d^2 \log n / \log d)$ [Dyachkov-Rykov 1982].

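  14. Filter-Evaluate Decoding Paradigm. Filter: decode the results $y_1, \dots, y_{t'}$ of a "filtering" matrix in poly(t') time to narrow the columns down to $L$ candidates that include the $d$ positives. Evaluate: run the naïve decoder on the results $r_1, \dots, r_t$ of a d-disjunct matrix, restricted to those $L$ columns, in $O(Lt)$ time.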

  15. So all we need to do: $o(d^2 \log n / \log d)$ tests.

  16. The filtering matrix. New* object: $(d,L)$-list disjunct matrix. Running the naïve decoder returns $\le L$ bogus columns in addition to the set of positives ($d + L$ columns in total). $(d,d)$-list disjunct matrices exist with $O(d \log n)$ tests. *Independently considered by [Cheraghchi 09].

  17. The rest of the talk. (1) Strongly explicit d-disjunct matrix with $O(d^2 \log^2 n)$ tests. (2) Strongly explicit $(d,d^2)$-list disjunct matrix with $t' = O(d^{1.6} \log n)$ tests that can be decoded in time poly(t').

  18. Coding Theory is the Bridge: Group Testing, Coding Theory, Data Stream Algorithms.

  19. All you ever needed to know about (Reed-Solomon) codes… at least for this talk. $q$ is a prime power. The codewords are $q^{q/(d+1)}$ vectors from $[q]^q$ where every two agree in $< q/(d+1)$ positions. There is a poly(q) time algorithm for list recovery: given lists $S_1, S_2, \dots, S_q$ with each $S_i$ a subset of $[q]$ and $|S_i| \le d$, output all codewords $(c_1, c_2, \dots, c_q)$ that agree with all the input lists.

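  20. Disjunct matrices from RS codes [Kautz, Singleton]. Column $i$ gets the $i$th codeword, so $n = q^{q/(d+1)}$. Code concatenation: each codeword symbol $x$ is replaced by a 0/1 vector of length $q$ with a single 1 in position $x$, giving $q \times q$ rows. The result is a d-disjunct matrix with $t = q^2 = O(d^2 \log^2 n)$.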
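
A small sketch of this construction for prime $q$, where arithmetic modulo $q$ is enough. The function name, the parameter `k` (polynomials of degree less than `k`), and the choice $q = 3$, $k = 2$ are assumptions made for illustration.

```python
from itertools import product

def kautz_singleton(q, k):
    """Binary matrix with q*q rows and q**k columns, for prime q.
    Each column is a Reed-Solomon codeword (a polynomial of degree < k
    evaluated at 0..q-1 modulo q) whose symbols are replaced by
    length-q indicator vectors."""
    columns = []
    for coeffs in product(range(q), repeat=k):
        codeword = [sum(c * pow(a, e, q) for e, c in enumerate(coeffs)) % q
                    for a in range(q)]
        col = [0] * (q * q)
        for pos, sym in enumerate(codeword):
            col[pos * q + sym] = 1   # single 1 inside the block for this position
        columns.append(col)
    # Transpose so that matrix[row][column]
    return [list(row) for row in zip(*columns)]

M = kautz_singleton(3, 2)
print(len(M), len(M[0]))
```

Every column has exactly $q$ ones, and two distinct columns share a 1 in at most $k - 1$ rows, because two distinct polynomials of degree less than $k$ agree in at most $k - 1$ points.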

  21. A q = 3 example. [Figure: Reed-Solomon codewords over {0, 1, 2} and the 0/1 matrix obtained by replacing each symbol with a length-3 indicator vector.]

  22. 1-Agreement between two columns. In the example, two columns agree in $\le 1$ position. Agreement in binary = agreement among RS codewords $< q/(d+1)$. [Figure: two columns of the q = 3 matrix highlighted.]

  23. d-Disjunctness of Kautz-Singleton. A column outside the set agrees with each of the $d$ columns in $< q/(d+1)$ rows, so more than $q - q \cdot d/(d+1)$ rows remain in which it has a 1 and all $d$ columns have a 0.
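
Spelled out as a worked inequality (a sketch that uses only the facts on this slide: the outside column has $q$ ones, one per codeword position, and loses fewer than $q/(d+1)$ of them to each of the $d$ columns):

```latex
\[
\#\{\text{rows where the outside column has a 1 and all } d \text{ columns have a 0}\}
\;>\; q - d \cdot \frac{q}{d+1}
\;=\; \frac{q}{d+1}
\;>\; 0 .
\]
```

So at least one such row exists, which is exactly the d-disjunct condition.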

  24. d-disjunct Matrices. Sufficient condition for group testing. For every subset of $d$ columns and every column disjoint from it, there exists a row in which the disjoint column has a 1 and all $d$ columns (for instance the set of positives) have a 0.

  25. The rest of the talk. (1) Strongly explicit d-disjunct matrix with $O(d^2 \log^2 n)$ tests. (2) Strongly explicit $(d,d^2)$-list disjunct matrix with $t' = O(d^{1.6} \log n)$ tests that can be decoded in time poly(t').

  26. A detour. The Kautz-Singleton matrix is a strongly explicit $(d,d^2)$-list disjunct matrix with $t' = O(d^2 \log^2 n)$ tests that can be decoded in time poly(t').

  27. Back to the example. [Figure: the q = 3 matrix with the positive (+) items marked and the result vector; the symbol sets {1,2}, {2}, {0,2} are shown alongside.]

  28. All you ever needed to know about (Reed-Solomon) codes… at least for this talk. $q$ is a prime power. The codewords are $q^{q/(d+1)}$ vectors from $[q]^q$ where every two agree in $< q/(d+1)$ positions. There is a poly(q) time algorithm for list recovery: given lists $S_1, S_2, \dots, S_q$ with each $S_i$ a subset of $[q]$ and $|S_i| \le d$, output all codewords $(c_1, c_2, \dots, c_q)$ that agree with all the input lists.

  29. Connection to List Recovery. Decoding: output all codewords that match the test results. Split the result vector $r$ into $q$ blocks of $q$ rows; block $j$ gives the set $S_j$ of symbols $x$ whose row has result 1, with $|S_j| \le d$. List recover from $S_1, \dots, S_q$ to get the positive codewords.
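
The decoding-as-list-recovery step can be illustrated with a toy sketch. It assumes prime $q$ and the block layout described on the Kautz-Singleton slide, and it uses exhaustive search over all codewords as a stand-in for the poly(q) list recovery algorithm, which is not reproduced here. All names are hypothetical.

```python
from itertools import product

def symbol_lists(r, q):
    """Split the q*q result vector into q blocks; S_j holds the symbols
    whose row in block j tested positive."""
    return [{x for x in range(q) if r[j * q + x] == 1} for j in range(q)]

def list_recover_brute_force(lists, q, k):
    """All codewords (polynomials of degree < k over GF(q), q prime) whose
    j-th symbol lies in lists[j] for every j. Exhaustive search, not poly(q)."""
    out = []
    for coeffs in product(range(q), repeat=k):
        cw = [sum(c * pow(a, e, q) for e, c in enumerate(coeffs)) % q
              for a in range(q)]
        if all(cw[j] in lists[j] for j in range(q)):
            out.append(cw)
    return out

# q = 3, k = 2: the positives are the codewords [0, 1, 2] and [1, 1, 1].
r = [1, 1, 0,  0, 1, 0,  0, 1, 1]   # OR of the two positive columns
lists = symbol_lists(r, 3)
recovered = list_recover_brute_force(lists, 3, 2)
print(lists, recovered)
```

In this tiny instance list recovery returns exactly the two positive codewords; in general it may also return extra codewords, which a second decoding pass has to rule out.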

  30. What does this imply? Implicit in [Guruswami-Indyk 04]: with the KS matrix, $t = O(d^2 \log^2 n)$. List recovery narrows the columns down to $d^2$ candidates in poly(t) time, and the naïve decoder then finds the $d$ positives among them in $O(d^2 t)$ time.

  31. The rest of the talk. (1) Strongly explicit d-disjunct matrix with $O(d^2 \log^2 n)$ tests. (2) Strongly explicit $(d,d^2)$-list disjunct matrix with $t' = O(d^{1.6} \log n)$ tests that can be decoded in time poly(t').

  32. Revisiting the decoding algorithm. At each of the $q$ codeword positions, use a d-disjunct matrix and run the naïve decoder on that block of $r$ to obtain $S_j$ with $|S_j| \le d$. Works but hits a $d^3$ barrier.

  33. Revisiting the decoding algorithm-II. At each of the $q$ codeword positions, use a $(d,d)$-list disjunct matrix and run the naïve decoder on that block of $r$ to obtain $S_j$ with $|S_j| \le 2d$. Need to change the parameters of the Reed-Solomon codes a bit.

  34. Some number crunching. The RS codeword has $q$ positions, and each position uses a $(d,d)$-list disjunct matrix with $d \log q$ rows. With $n \sim q^{q/d}$: $t = q \times (d \log q) \sim (d \times \log n / \log q) \times (d \log q) = d^2 \log n$.
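
The arithmetic on this slide, written out step by step (a sketch that treats the approximations as equalities up to constant factors):

```latex
\[
n \approx q^{q/d}
\;\Longrightarrow\;
\log n \approx \frac{q}{d}\,\log q
\;\Longrightarrow\;
q \approx \frac{d \log n}{\log q},
\qquad
t = q \cdot (d \log q) \approx \frac{d \log n}{\log q} \cdot d \log q = d^{2} \log n .
\]
```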

  35. What does this imply? $t = O(d^2 \log n)$: matches best known bound! The "filtering" matrix (results $y_1, \dots, y_t$) is decoded in poly(t) time and leaves $d^2$ columns; the d-disjunct matrix (results $r_1, \dots, r_t$) then yields the $d$ positives in $O(d^2 t)$ time.

  36. http://www.impawards.com/2007/are_we_done_yet.html

  37. How we get our hands on… The RS codeword has $q$ positions, and each position uses a $(d,d)$-list disjunct matrix with $d \log q$ rows. With $n \sim q^{q/d}$: $t = q \times (d \log q) \sim (d \times \log n / \log q) \times (d \log q) = d^2 \log n$.

  38. Solution 1 [Indyk, Ngo, R. 10]. Pick the "inner" codes at random: $(d,d)$-list disjunct matrices with $d \log q$ rows.

  39. Can also show d-disjunctness. Use different "inner" matrices for different RS codeword positions: each symbol $x$ is encoded by a random matrix. Can show that, with high probability, all matrices are what they should be.

  40. Solution 2 [Ngo, Porat, R. 11]. Use explicit expanders to get the $(d,d)$-list disjunct matrices with $d \log q$ rows.

  41. (d,d)-list disjunct Matrices. For every two disjoint $d$-subsets of columns, there exists a row in which some column of one subset has a 1 while every column of the other subset (the set of positives) has a 0.
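
For small matrices the condition can be checked by brute force. The sketch below handles the general $(d, L)$ case; the function name and the toy matrix (columns are the 2-element subsets of four rows) are assumptions for illustration, and the search is exponential, so it is only a sanity check.

```python
from itertools import combinations

def is_list_disjunct(matrix, d, L):
    """For all disjoint column sets T (|T| = d) and S (|S| = L), require a row
    with a 1 in some column of S and a 0 in every column of T."""
    n = len(matrix[0])
    support = [{j for j, row in enumerate(matrix) if row[i]} for i in range(n)]
    for T in combinations(range(n), d):
        covered = set().union(*(support[i] for i in T))
        others = [i for i in range(n) if i not in T]
        for S in combinations(others, L):
            # Violation: every column of S is hidden under the union of T
            if all(support[i] <= covered for i in S):
                return False
    return True

M = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
print(is_list_disjunct(M, 1, 1), is_list_disjunct(M, 2, 1))
```

With $L = 1$ the condition coincides with ordinary d-disjunctness, so the toy matrix passes for $d = 1$ and fails for $d = 2$.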

  42. The expander connection. Works if sets of size $2d$ expand by at least $0.75 \times$ degree. [Figure: the matrix viewed as a bipartite graph between columns and rows.]

  43. Solution 2 [Ngo, Porat, R. 10]. Use explicit expanders to get the $(d,d)$-list disjunct matrices with $d \log q$ rows. Some comments: left degree of the expander not important; $d^{1+o(1)} \log q$ rows possible [GUV 07, Cheraghchi 09]; use PV codes instead of RS codes.

  44. The rest of the talk. (1) Strongly explicit d-disjunct matrix with $O(d^2 \log^2 n)$ tests. (2) Strongly explicit $(d,d^2)$-list disjunct matrix with $t' = O(d^{1.6} \log n)$ tests that can be decoded in time poly(t').

  45. Our Main Result
     # tests (t)            Decoding time    Source
     $O(d^2 \log n)$        $O(nt)$          Folklore, [PR08]
     $O(d^4 \log n)$        $O(t)$           [GI04]
     $O(d^2 \log^2 n)$      poly(t)          [GI04, implicit]
     $O(d^2 \log n)$        poly(t)          Our result

  46. Other work/Open Questions. Results generalize to compressed sensing [Ngo, Porat, R. 11]. Other applications of group testing? Complexity theory? Strongly explicit construction of optimal disjunct matrices?

  47. Questions?
