Recommended | A list of some 30 open-source algorithm implementations and toolkits recommended by Tsinghua faculty

Highlight Packages

THULAC: An Efficient Lexical Analyzer for Chinese.
[home] http://thulac.thunlp.org/
[Git C++] https://github.com/thunlp/thulac
[Git Java] https://github.com/thunlp/THULAC-Java
[Git Python] https://github.com/thunlp/THULAC-Python
THUCTC: An Efficient Chinese Text Classifier.
[home] http://thuctc.thunlp.org/
[Git Java] https://github.com/thunlp/THUCTC
THUOCL: Open Chinese Lexicon.
[home] http://thuocl.thunlp.org/
OpenKE: An Open-Source Package for Knowledge Embedding (KE).
[home]http://openke.thunlp.org/
[Git] https://github.com/thunlp/OpenKE
OpenNE: An Open-Source Package for Network Embedding (NE).
[Git] https://github.com/thunlp/OpenNE

Knowledge Graph and Relation Extraction

NRE: An Open-Source Package for Neural Relation Extraction.
[Git] https://github.com/thunlp/NRE
[TensorFlow Version] https://github.com/thunlp/TensorFlow-NRE
Neural relation extraction aims to extract relations from plain text with neural models, which have become the state-of-the-art methods for relation extraction. In this package, we provide our implementations of CNN [Zeng et al., 2014] and PCNN [Zeng et al., 2015] and their extended versions with a sentence-level attention scheme [Lin et al., 2016].
JointNRE: Joint Neural Relation Extraction with Text and KGs.
[Git] https://github.com/thunlp/JointNRE
This is the lab code of our AAAI 2018 paper "Neural Knowledge Acquisition via Mutual Attention between Knowledge Graph and Text".
PathNRE: Neural Relation Extraction with Relation Paths.
[Git] https://github.com/thunlp/PathNRE
This is the lab code of our EMNLP 2017 paper "Incorporating Relation Paths in Neural Relation Extraction".
Neural Entity Alignment.
[Git] https://github.com/thunlp/IEAJKE
This is the lab code of our IJCAI 2017 paper "Iterative Entity Alignment via Joint Knowledge Embeddings".
Neural Entity Typing.
[Git] https://github.com/thunlp/KNET
This is the lab code of our AAAI 2018 paper "Improving Neural Fine-Grained Entity Typing with Knowledge Attention".

Knowledge Representation Learning

OpenKE: An Open-Source Package for Knowledge Embedding (KE).
[Git] https://github.com/thunlp/OpenKE
KRLPapers: Must-read papers on knowledge representation learning (KRL) / knowledge embedding (KE).
[Git] https://github.com/thunlp/KRLPapers
TransX: An efficient implementation of TransE and its extended models for Knowledge Representation Learning.
[Git] https://github.com/thunlp/Fast-TransX
[TensorFlow Version] https://github.com/thunlp/TensorFlow-TransX
KB2E: A package of Knowledge Base to Embeddings.
[Git] https://github.com/thunlp/KB2E
The package contains state-of-the-art knowledge representation learning methods including TransE, TransH, TransR and PTransE.
KR-EAR: Knowledge Representation Learning with Entities, Attributes and Relations. [Git]
This is the lab code of our IJCAI 2016 paper "Knowledge Representation Learning with Entities, Attributes and Relations".
CKRL: Confidence-aware Knowledge Representation Learning.
[Git] https://github.com/thunlp/CKRL
This is the lab code of our AAAI 2018 paper "Does William Shakespeare REALLY Write Hamlet? Knowledge Representation Learning with Confidence". The method is expected to support robust knowledge representation learning with noisy triples.
IKRL: Image-embodied Knowledge Representation Learning.
[Git] https://github.com/thunlp/IKRL
This is the lab code of our IJCAI 2017 paper "Image-embodied Knowledge Representation Learning". The method is expected to support knowledge representation learning with entity images.
TKRL: Type-embodied Knowledge Representation Learning.
[Git] https://github.com/thunlp/TKRL
This is the lab code of our IJCAI 2016 paper "Representation Learning of Knowledge Graphs with Hierarchical Types". The method is expected to support knowledge representation learning with hierarchical types of entities.
DKRL: Description-embodied Knowledge Representation Learning.
[Git] https://github.com/thunlp/DKRL
This is the lab code of our AAAI 2016 paper "Representation Learning of Knowledge Graphs with Entity Descriptions". The method is expected to support knowledge representation learning with entity descriptions.
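
Several entries in this section (Fast-TransX, KB2E) implement TransE. As a rough illustration of the idea, and not code taken from any of these packages, TransE models a relation as a translation in embedding space, so that a true triple (h, r, t) should satisfy h + r ≈ t, and it is trained with a margin-based ranking loss against corrupted triples. The sketch below uses hand-picked toy embeddings; the function names `transe_score` and `margin_loss` are hypothetical.

```python
import math


def transe_score(head, relation, tail, norm=1):
    """Dissimilarity ||h + r - t|| under the L1 or L2 norm (lower = more plausible)."""
    diffs = [h + r - t for h, r, t in zip(head, relation, tail)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return math.sqrt(sum(d * d for d in diffs))


def margin_loss(pos_score, neg_score, margin=1.0):
    """Margin-based ranking loss for one (true triple, corrupted triple) pair."""
    return max(0.0, margin + pos_score - neg_score)


# Toy 2-d embeddings chosen by hand for illustration only (not learned).
paris = [1.0, 0.0]
capital_of = [0.0, 1.0]
france = [1.0, 1.0]
germany = [3.0, 1.0]

pos = transe_score(paris, capital_of, france)    # true triple
neg = transe_score(paris, capital_of, germany)   # triple with a corrupted tail
loss = margin_loss(pos, neg)                     # zero once the margin is satisfied
```

In real training the embeddings are learned by gradient descent on this loss over many sampled triple pairs; the extended models (TransH, TransR, PTransE) change how h, r and t are combined, which this sketch does not cover.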

Network Representation Learning

OpenNE: An Open-Source Package for Network Embedding (NE).
[Git] https://github.com/thunlp/OpenNE
NRLPapers: Must-read papers on network representation learning (NRL) / network embedding (NE).
[Git] https://github.com/thunlp/NRLPapers
TransNet: Translation-Based Network Representation Learning.
[Git] https://github.com/thunlp/TransNet
This is the lab code of our IJCAI 2017 paper "TransNet: Translation-Based Network Representation Learning for Social Relation Extraction". The method is expected to model social networks by regarding relations as the translation between vertices.
NEU: Fast Network Embedding.
[Git] https://github.com/thunlp/NEU
This is the lab code of our IJCAI 2017 paper "Fast Network Embedding Enhancement via High Order Proximity Approximation". The method is expected to speed up network embedding with an approximate update algorithm.
CANE: Context-Aware Network Embedding.
[Git] https://github.com/thunlp/CANE
This is the lab code of our ACL 2017 paper "CANE: Context-Aware Network Embedding for Relation Modeling". The method is expected to support context-aware network representation learning and model asymmetric relations.
MMDW: Max-Margin DeepWalk.
[Git] https://github.com/thunlp/MMDW
This is the lab code of our IJCAI 2016 paper "Max-Margin DeepWalk: Discriminative Learning of Network Representation". The method is expected to support discriminative network representation learning with node labels.
TADW: Text-Associated DeepWalk.
[Git] https://github.com/thunlp/TADW
This is the lab code of our IJCAI 2015 paper "Network Representation Learning with Rich Text Information". The method is expected to support network representation learning with rich text information within each node. The code requires a 64-bit Linux machine with MATLAB installed.
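
MMDW and TADW above both build on DeepWalk. As background, DeepWalk samples truncated random walks over the graph and treats each walk as a "sentence" for a skip-gram word embedding model. The sketch below covers only the walk-sampling step, on a small hand-written graph; the skip-gram training step is omitted, and the names `random_walk` and `build_corpus` are hypothetical rather than taken from the packages listed here.

```python
import random


def random_walk(adjacency, start, walk_length, rng):
    """Sample one truncated random walk that starts at `start`."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adjacency[walk[-1]]
        if not neighbors:          # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(adjacency, walks_per_node, walk_length, seed=0):
    """Start `walks_per_node` walks from every node, in shuffled order per pass."""
    rng = random.Random(seed)
    nodes = list(adjacency)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)
        for node in nodes:
            corpus.append(random_walk(adjacency, node, walk_length, rng))
    return corpus


# Toy undirected graph as adjacency lists (illustrative data only).
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
corpus = build_corpus(graph, walks_per_node=2, walk_length=5)
```

Each walk in `corpus` can then be fed to any skip-gram implementation, with node IDs playing the role of words.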

Sememe-Driven NLP

SE-WRL: Improved Word Representation Learning with Sememes.
[Git] https://github.com/thunlp/SE-WRL
This is the lab code of our ACL 2017 paper "Improved Word Representation Learning with Sememes". Sememes are the minimum semantic units of word meanings, and the meaning of each word sense is typically composed of several sememes. We propose an improved word representation learning method that uses the sememe knowledge annotated in HowNet.
Lexical Sememe Prediction.
[Git] https://github.com/thunlp/sememe_prediction
This is the lab code of our IJCAI 2017 paper "Lexical Sememe Prediction via Word Embeddings and Matrix Factorization".
Chinese LIWC Lexicon Expansion.
[Git] https://github.com/thunlp/Auto_CLIWC
This is the lab code of our AAAI 2018 paper "Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention".

Language Representation Learning

CWE: Character Word Embeddings.
[Git] https://github.com/Leonard-Xu/CWE
This is the lab code of our IJCAI 2015 paper "Joint Learning of Character and Word Embeddings". This method is expected to learn Chinese word embeddings by taking the characters within words into consideration. The Chinese analogical reasoning dataset is available in the data folder.
CLWE: Cross-Lingual Word Embeddings.
[home] http://nlp.csai.tsinghua.edu.cn/~lzy/src/acl2015_bilingual.html
This is the lab code of our ACL 2015 short paper "Learning Cross-lingual Word Embeddings via Matrix Co-factorization". This method is expected to learn cross-lingual word embeddings with a matrix co-factorization framework.
OIWE: Online Interpretable Word Embeddings.
[Git] https://github.com/SkTim/OIWE
This is the lab code of our EMNLP 2015 short paper "Online Learning of Interpretable Word Embeddings". This method is expected to learn interpretable word embeddings based on the OIWE-IPG model proposed in our paper.
TWE: Topical Word Embeddings.
[Git] https://github.com/thunlp/topical_word_embeddings
This is the lab code of our AAAI 2015 paper "Topical Word Embeddings". The method is expected to perform representation learning of words with their topic assignments by latent topic models such as Latent Dirichlet Allocation.

General NLP

THUCKE: An Open-Source Package for Chinese Keyphrase Extraction.
[Git] https://github.com/thunlp/THUCKE
The package efficiently extracts Chinese keyphrases by translating from documents to keyphrases, using word alignment models (WAM) that we proposed in [EMNLP] [CoNLL].
TensorFlow-Summarization: An Open-Source Package for Neural Headline Generation.
[Git] https://github.com/thunlp/TensorFlow-Summarization
This is an implementation of a sequence-to-sequence model with a bidirectional GRU encoder and a GRU decoder. The project aims to help people start working on abstractive short-text summarization immediately, and it may also work on machine translation tasks.
THUNSC: An Open-Source Package for Neural Sentiment Classification.
[Git] https://github.com/thunlp/NSC
Neural sentiment classification aims to classify the sentiment of a document with neural models, which have become the state-of-the-art methods for sentiment classification. In this package, we provide our implementations of NSC, NSC+LA and NSC+UPA [Chen et al., 2016], in which user and product information is considered via attention over different semantic levels.
THUTAG: An Open-Source Package for Keyphrase Extraction and Social Tag Suggestion.
[Git] https://github.com/thunlp/THUTag
The package contains several keyphrase extraction methods including TextRank, ExpandRank, Topical PageRank and WAM, and social tag suggestion methods including KNN, PMI, TagLDA, TAM and WTM. The package has supported one of the most popular microblog apps, Weibo Keywords, which has more than 3.5 million registered users.
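
THUTag lists TextRank among its keyphrase extraction methods. As a rough illustration of that method, and not THUTag's own code, TextRank links words that co-occur within a small window and then ranks them with a PageRank-style iteration. The sketch below assumes the text has already been segmented and filtered into a token list; the function name `textrank` and its parameters are hypothetical.

```python
from collections import defaultdict


def textrank(tokens, window=2, damping=0.85, iterations=50):
    """Score words by PageRank-style iteration over a co-occurrence graph."""
    # Link every pair of distinct words that appear within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's current score.
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return scores


# Toy pre-segmented input (illustrative only); "graph" co-occurs with every other word.
tokens = ["graph", "embedding", "graph", "walk", "graph", "text", "graph", "label"]
scores = textrank(tokens, window=1)
top_word = max(scores, key=scores.get)
```

The highest-scoring words are taken as keyphrase candidates; the other methods named above (ExpandRank, Topical PageRank) modify the graph or the ranking step and are not shown here.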
PLDA+: An Open-Source Package for Parallel LDA.
[Git] https://code.google.com/archive/p/plda/
PLDA is a parallel C++ implementation of Latent Dirichlet Allocation (LDA). We present a highly optimized parallel implementation of the Gibbs sampling algorithm for the training/inference of LDA. The carefully designed architecture is expected to support extensions of this algorithm. PLDA+, an enhanced parallel implementation of LDA, further improves the scalability of LDA by significantly reducing the unparallelizable communication bottleneck and achieving good load balancing.
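
For readers unfamiliar with the Gibbs sampling step that PLDA parallelizes, the following is a minimal single-process sketch of collapsed Gibbs sampling for LDA. It is not PLDA's code and says nothing about how PLDA distributes the work; the function name `lda_gibbs`, the hyperparameter defaults and the toy corpus are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iterations=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word IDs."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]                # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                              # n(k)
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


# Toy corpus over a 5-word vocabulary (illustrative only).
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 2], [0, 1, 1, 0], [4, 3, 3, 4]]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2, vocab_size=5)
```

Every token update reads and writes the shared `topic_word` and `topic_total` counts, which is why running this loop on several machines requires the counts to be exchanged between them.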