Lecture Notes For All: Information Retrieval



Sunday, May 15, 2011

Information Retrieval


Information Retrieval (IR) is the study of methods for capturing, representing, storing, organizing, and retrieving unstructured or loosely structured information. Its best-known aspect is document retrieval: the process of indexing and retrieving text documents. However, the field covers almost any type of unstructured or semi-structured data, including newswire stories, transcribed speech, email, blogs, images, and even video. When the data consists of material found on the Web, Information Retrieval is a critical component of Web search engines.
CMPSCI 646 is a graduate-level class in Information Retrieval. It covers the basic ideas of IR to give the student an intuition for how search engines work, why they're successful, and to some degree how they fail. The course touches on popular and important approaches to the problem, providing both historical context and state-of-the-art results.

Download Lectures

Class #
Topic

Class canceled on account of graduate orientation.
1
Administrivia [pdf] and introduction [pdf]
2
Evaluation basics [pdf].  Please read CMS 8.1, 8.2, and
8.4 and/or MRS 8.1-8.4 beforehand.  (This assignment was made late, so there
will be no assumption that this material was read in advance.  But you should read it if you can.)
3
Retrieval models [pdf].  Read CMS 7.1-7.3 and/or MRS 6.3, 11.1-11.3, 12.1-12.2.  
4
Retrieval models, wrapping up the sketch of language modeling,
then on to vector space, including LSI [pdf].
5
Retrieval models, binary independence model [pdf].  
Also, the first quiz [Q1,pdf].
6
Retrieval models: complete probabilistic models, also
 inference networks and logic models [pdf].

Programming homework P1 is due at 8:00pm.
7
Text statistics [pdf].  Read CMS 4.1-4.4 and the rest
of the chapter if it grabs your attention.  MRS 5.1 is also useful but less thorough.
8
Guest lecture on relevance models [pdf] by
 Niranjan Balasubramanian.
9
Complete text statistics [pdf], talking about estimating weights.
Start talking about file organization, i.e., some issues involved in how
an IR system is actually implemented [pdf].  Homework H1 is due before class today.

No class today because the University is running a Monday schedule.
10
Complete presentation of file organization, specifically inverted files [pdf].
 A pop quiz [Q2,pdf].  Discussion of class projects.  Readings: For CMS,
 look at Chapter 5, particularly 5.3 for inverted lists.  For MRS the inverted list is
introduced in 1.2 and built on in 2.3 and 2.4; wildcards are in 3.2.
11
Compression.  Handed back HW1 (solution [pdf]).  Readings:
 For CMS, look at Chapter 5, particularly 5.4 for compression.  
For MRS, compression is 5.4 (though you'll need more of chapter 5 for background).  
12
Complete compression [pdf].
13
Clustering for IR, largely from a vector space view [pdf]
14
Clustering for IR, largely from a language modeling (including topic modeling)
perspective [pdf].  Also a pop quiz [Q3,pdf].
15
Web retrieval basics, including NDCG [pdf].  
HW2 [pdf] is due but will be accepted until Thursday.
16
Midterm review with some discussion questions
(but no answers) [pdf].  
HW2 due today (here is the solution [pdf]).
17
In-class, open book midterm exam.  You may bring electronic devices,
 but only if you can convincingly demonstrate that they cannot access the outside world.
18
Optional class.  Project "workshop", helping groups clean up project
specifications and sort out the details of what needs to be done.

By 5:00pm, submit a project description (see Project tab above).  
Also prepare a project pitch presentation (one or two slides) for Tuesday and send the presentation
to the professor before 7:00am tomorrow (Tuesday the 7th).  
19
Project pitches.  Send your presentation (if any) to the professor before 7:00am today.  
Also bring a backup presentation mechanism in case it won't work.  
In particular, if you're a Keynote user there may be issues....
20
Guest lecture by Niranjan
giving an overview of learning to rank [pdf, but note that it's 19 MB].
21
MapReduce approach to massively
distributed computation for IR [pdf].  Also a pop quiz [Q4,pdf].

No class today because it is Thanksgiving break.
22
Cross-language retrieval [pdf]. P2 is due by 8:00pm today.  (Feel free to take an extension of a day or three.)
23
Question answering [pdf].  Draft version of final project writeup due to professor.
24
Final project presentations, part 1 of 2.  (This is National Computer Science Education Week!)
25
Final project presentations, part 2 of 2.  Last class.   (This is National Computer Science Education Week!)

P2, P3, and HW3 final submission deadline.  Remember that you may elect to skip one of them and get full credit for it (but you must explicitly skip it or not bother handing it in).

Final exam available for pickup.  See "exams" tab for details. 

Last possible time to pick up final exam and get a full 48 hours to complete it.

Last possible day and time to hand in final exam.  

