The Database Society of Japan

dbjapan mailing list archive (2007)

[dbjapan] NTCIR-7 Call For Participation


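(Apologies if you receive multiple copies of this message.)

This is the call for task participation for the 7th NTCIR Workshop
(NTCIR-7). Any individual or group interested in research on or
development of information access technologies may participate. For
details, please see the English announcement at the end of this message.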


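NTCIR aims to promote research on information access technologies by
providing the large-scale, reusable test data sets that such research
requires. It also provides a forum for discussing research ideas and
appropriate evaluation methods. Specifically, by having multiple
research teams work on common research problems ("tasks"), it aims to
help the research community develop while keeping a balance between
cooperation and competition.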

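The workshop meeting of the first round, NTCIR-1, was held in 1999.
Since then, the variety of tasks and the number of participating teams
have grown with each round, and NTCIR has contributed to progress on a
wide range of themes, including cross-lingual information retrieval,
automatic summarization, question answering, patent information
processing, opinion analysis, trend information analysis, and Web search.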

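At the same time, to keep pace with changing research trends and rising
technical standards, we believe we must not settle for being a routine
event and must constantly take on new challenges. NTCIR-7 builds on the
results obtained through NTCIR-6 while completely renewing the content
and organization of the tasks, and will be run around the following
three clusters.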

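○Advanced Cross-lingual Information Access
・Advanced cross-lingual information retrieval and question answering

○User Generated Contents
・Multilingual opinion analysis, cross-lingual blog retrieval

○Focused Domains
・Machine translation and mining of patent information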

In addition, a task on summarization of trend information is planned.


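Each cluster runs one or more tasks. Eligibility and the procedure for
participation are the same as through NTCIR-6. Any individual or group
interested in research and development of information access
technologies may participate, whether from industry, academia, or
elsewhere. Participants who exchange a memorandum with the National
Institute of Informatics (NII) are provided, free of charge and for
research purposes, with a variety of document data that is normally
difficult to obtain.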


We sincerely look forward to your participation.


---
Noriko Kando
NTCIR Project


*************************************************************
                Call for Participation

             The 7th NTCIR Workshop (2007/2008)
Evaluation of Information Access Technologies: Information Retrieval,
       Question Answering, and Cross-Lingual Information Access

              October 2007 - December 2008
(Final Meeting: December 16-19, 2008, NII, Tokyo, Japan)

                 http://ntcir.nii.ac.jp/
*************************************************************
                    Online Registration:
http://research.nii.ac.jp/ntcir/cgi-bin/ntc7Registration.cgi?lang=en
*************************************************************

We are pleased to announce that the Seventh NTCIR (NTCIR-7)
workshop will start this year, and the concluding Workshop meeting
will be held at NII, Tokyo, Japan on December 16-19, 2008.

Participation is invited from anyone interested in research on
information access technologies and their evaluation, such as
retrieval of documents from various genres, cross-lingual
information retrieval of Asian languages, question answering,
and cross-lingual information access.

NTCIR Workshops are periodic events held once every one and a
half years. All the documents needed for evaluation will be
provided by NII to the participants.

Although the documents used so far have been in East Asian languages,
the workshops have attracted international participation. You are
most welcome to participate!

Each task has a Wiki or mailing list for discussion. Discussion
of task design and evaluation methodologies is welcome.



** Tasks/Clusters for NTCIR-7

NTCIR-7 hosts the following tasks.

Cluster 1. Advanced Cross-lingual Information Access (ACLIA)
* Complex Cross-Lingual Question Answering (CCLQA)
* Information Retrieval for Question Answering (IR4QA)

Cluster 2. Information Access to User Generated Contents (UGC)
* Multi-Lingual Opinion Analysis Task (MOAT)
* Cross-Lingual Information Retrieval over Blog data (CLIR-B)

Cluster 3. Information Access to Focused Domains (PATENT)
* Patent Mining Task
* Patent Translation Task

Cluster Independent
* Multimodal Summarization of Trends (MuST)



** Clusters and Tasks Overview

* Cluster 1: Advanced Cross-lingual Information Access (ACLIA)
http://aclia.lti.cs.cmu.edu/wiki/moin.cgi/Home

This cluster evaluates "Complex CLQA", "CLIR" and "the contribution
of CLIR to CLQA". The documents are Chinese (Simplified and
Traditional) and Japanese news articles published in 1998-2001.
Questions/topics are in English, Chinese, and Japanese, and more
languages may be added.

CCLQA and CLIR share the same set of questions/topics, and
participants are welcome to test (1) end-to-end QA, or (2) a CLIR
or IR module only, using either the original natural language
questions or analyzed queries containing question types. IR modules
will be evaluated both as IR in their own right and for their
effectiveness in QA.

CCLQA and IR focused on specific types of questions are new for
NTCIR. We would like to know "What kind of IR mechanism
would be the best for what kind of QA mechanism?", "What kind
of combination is the best?", etc.



* Cluster 2: Information Access to User-Generated Contents
http://kde.ics.tut.ac.jp/~seki/ntcir_cl2/

CLIR-B: http://ntcir.nii.ac.jp/index.php/CLIRB/
MOAT: http://ntcir.nii.ac.jp/index.php/Table/MOAT/

This cluster consists of the "Cross-Lingual Information Retrieval
for Blog (CLIR-B)" task and the "Multilingual Opinion Analysis
Task (MOAT)". Both use a newly crawled blog corpus in Chinese,
Japanese and English, which includes both blog posts and
the comments on them. Topics will be shared by the two tasks.

CLIR has been investigated in NTCIR from the beginning, but
blogs are a new document genre for NTCIR. CLIR for Blog will be
an informational task: searching for opinionated documents relevant
to each topic. The Opinion Analysis Task tests the ability of
a system to automatically identify the relevance and opinionatedness
of each sentence in the relevant documents, along with the opinion
holder, polarity, and stakeholder (target of the opinion). Compared
to the Opinion Analysis Task using news documents in NTCIR-6,
identifying the "Stakeholder" is new.



* Cluster 3: Information Access for Focused Domains
http://if-lab.slis.tsukuba.ac.jp/fujii/ntcfd/index-en.html

Translation: http://if-lab.slis.tsukuba.ac.jp/fujii/ntc7patmt/index-en.html
Mining: http://www.nlp.its.hiroshima-cu.ac.jp/~nanba/ntcir-7/cfp-en.html

This cluster consists of the "Patent Translation" and "Patent Mining"
tasks. It aims to evaluate technologies that enhance information
access for patents, which were investigated in past NTCIRs.

Patent Translation will conduct both intrinsic and extrinsic
evaluation. Intrinsic evaluation consists of automatic
evaluation using metrics such as BLEU, and human judgments.
Extrinsic evaluation adopts CLIR-task-based evaluation, i.e.,
the contribution of machine translation to CLIR will be tested.

The Patent Mining task targets cross-genre information access
between patents and scientific papers. Abstracts of conference
papers are used as "topics", and systems are requested
to provide appropriate International Patent Classification (IPC)
classes. This can be done either as automatic categorization of
paper abstracts into IPC classes or as cross-genre retrieval from
conference papers to patents.


* Multi-modal Summarization of Trends (MuST): TBA

Automatic identification and extraction of numeric information
related to the trends of a topic, and ways of visualizing it, will
be investigated and evaluated. As a common platform for
visualization, open source visualization software will be provided.



** Important Dates

Registration Due:  November 15, 2007
Documents Release: November 15, 2007
Dry Run:    from November 2007 to April 2008
Formal Run: from November 2007 to August 2008
Task Overview Partial Release: by September 1, 2008
Evaluation Results Return: by September 1, 2008
Papers for Proceedings Due: October 1, 2008
Camera-ready for Proceedings Due: November 1, 2008
Final Meeting: December 16-19, 2008


* Notes

1. Whether there will be a dry run or not depends on each task.
For further information, please consult the web site for each task.

2. The exact dates for the dry and formal runs are decided by each
task. For further information, please contact Noriko Kando


The registration system for NTCIR-7 task participation is online.
Please register for NTCIR-7 at:
http://research.nii.ac.jp/ntcir/cgi-bin/ntc7Registration.cgi?lang=en


The "How to Participate" guide and the "User Agreement Forms" needed
to obtain the data sets will be released soon. They will be broadly
similar to the ones used for NTCIR-6, which are listed here for
your information:

HOW TO PARTICIPATE in NTCIR-6
http://research.nii.ac.jp/ntcir/ntcir-ws6/howto-en.html

USER AGREEMENT FORMS for NTCIR-6
http://research.nii.ac.jp/ntcir/ntcir-ws6/permission/perm-en.html

******************************************************************