Internet-based Information Extraction Technologies
Teacher: Fang Li
Office: SEIEE Building No. 3, Room 533
Office hours: Tuesday, 10-11 AM
References:
1) Sonit Singh, Natural Language Processing for Information Extraction (2018)
2) Marie-Francine Moens, Information Extraction: Algorithms and Prospects in a Retrieval Context. Published by Springer (P.O. Box 17, 3300 AA Dordrecht, The Netherlands). ISBN-13 978-1-4020-4993-4 (e-book)
3) Jerry R. Hobbs and Ellen Riloff, Information Extraction, chapter 21 of Handbook of Natural Language Processing (2010)
4) Ralph Grishman, Information Extraction: Capabilities and Challenges (2012)
Introduction:
Internet-based information extraction (IE) is the task of deriving structured information from unstructured text and semi-structured web pages. More succinctly, information extraction means finding entity names, relations, and events on the Internet and in free text.
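To make the definition concrete, here is a minimal sketch of turning one free-text sentence into a structured record. It is only an illustration: the regular expression, the function name `extract_employment`, and the example sentence are all hypothetical and are not taken from the course materials or training data.

```python
import re

# Hypothetical surface pattern: "<Person>, <title> of <Organization>".
# Real systems use far richer patterns or learned models.
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)+), "
    r"(?P<title>[a-z ]+?) of "
    r"(?P<org>[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"
)


def extract_employment(text):
    """Return (person, title, organization) triples matched by PATTERN."""
    return [
        (m.group("person"), m.group("title"), m.group("org"))
        for m in PATTERN.finditer(text)
    ]


# Invented example sentence, used only for illustration.
sentence = "Jane Smith, chief executive of Example Corp, spoke on Monday."
print(extract_employment(sentence))
```

A single hand-written pattern like this is brittle; it is shown only to illustrate what "structured information from unstructured text" looks like in practice.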
The course gives an overview of the history and technologies of information extraction. It presents state-of-the-art research methods and focuses on real-world applications.
Readings are based on conference articles. Grades are based on class participation and projects; there is no final examination for this course. Students are encouraged to form groups to complete the projects and write the reports. There are three project tasks. Each group can choose two of them and presents its projects in the class workshop at the end of the semester.
Course Topics and Readings
Weeks | Topics | Slides | Readings & References |
1st | Motivation & Course Introduction | | |
2nd | Basic Knowledge for IE | | |
3rd | IE Concepts | | |
4th | Holiday | | |
5th | Named Entity Extraction | | |
6th | Named Entity Extraction (discussion & group presentation) | | |
7th | Relation Extraction (pattern-based, supervised, semi-supervised) | | |
8th | Relation Extraction (distantly supervised, deep learning) & group discussion | | |
9th | Event Extraction | | |
10th | Opinion Mining | Polarity Embedding Fusion for Robust Sentiment Analysis; SA with Ensemble of Convolutional Neural Networks with Distant Supervision | |
11th | Opinion Mining | Inducing Domain-specific Sentiment Lexicons from Unlabeled Corpora | |
12th | Webpage IE | | |
13th | IE System | | |
14th | Knowledge Graph (new) | | |
15th~16th | Student Workshop | Each group presents its work, which includes: the task, its problems and analysis (2 minutes); the general approach (3 minutes); the results (3 minutes); open questions and challenges (2 minutes); Q&A (5 minutes). |
Note:
The content of each lecture may change from year to year. The slides above give only general information about each lecture; the classroom exercises and discussions are not included in them. The new teaching materials are on Canvas.
Prerequisites
Data Structures, Programming Language, Natural Language Processing
Grading:
1. Attendance & Classroom Discussions (40%)
2. Reading & Writing (20%)
3. Algorithm Design (40%)
Project tasks:
1) Specific Relation Extraction. Please see the training data (for employment, chief of, and location extraction), another training data set (for four kinds of employment relationship extraction), and student work1 and student work2 presented in the last few years for your reference.
2) Positive and Negative Sentiment Analysis. Please see the training data for an example.
3) Web Page Extraction
About the evaluation:
1) Task 1 (employment relation extraction): input file format and output file format.
2) Task 2 (positive and negative sentiment analysis): input file format and output file format.
3) Specification for evaluation and evaluation tools.
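
The evaluation specification and tools themselves are not reproduced on this page. As a rough orientation only, extraction output is commonly scored with precision, recall, and F1 against a gold standard. The sketch below assumes that convention and assumes each extracted relation is compared as an exact-match tuple. The function name `precision_recall_f1` and the sample tuples are hypothetical and do not reflect the official file formats or scoring rules.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted items against gold items by exact match.

    Assumption: items are hashable (e.g. tuples) and duplicates are ignored.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Invented relation tuples, for illustration only.
gold = {
    ("Jane Smith", "employment", "Example Corp"),
    ("John Doe", "employment", "Sample Inc"),
}
predicted = {
    ("Jane Smith", "employment", "Example Corp"),
    ("John Doe", "employment", "Other Ltd"),
}
p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Groups should rely on the official specification and evaluation tools for the actual scoring; this sketch is only useful for quick sanity checks during development.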