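Source: 读芯术
This article is about 6,800 characters long; suggested reading time is 10 minutes.
This article introduces 24 Python libraries that cover the end-to-end data science lifecycle.
Ease of use and flexibility
Industry-wide adoption: Python is undoubtedly the most popular language for data science in industry
The sheer number of Python libraries available for data science
Python libraries for different data science tasks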
Beautiful Soup
Scrapy
Selenium
Pandas
PyOD
NumPy
spaCy
Matplotlib
Seaborn
Bokeh
Scikit-learn
TensorFlow
PyTorch
Lime
H2O
Librosa
Madmom
pyAudioAnalysis
OpenCV-Python
Scikit-image
Pillow
Psycopg
SQLAlchemy
Flask
Link:
https://www.crummy.com/software/BeautifulSoup/bs4/doc/
pip install beautifulsoup4
#!/usr/bin/python3
# Anchor extraction from an HTML document
from bs4 import BeautifulSoup
from urllib.request import urlopen
# 'LINK' is a placeholder for the URL of the page to scrape
with urlopen('LINK') as response:
    soup = BeautifulSoup(response, 'html.parser')
    for anchor in soup.find_all('a'):
        # Print each anchor's href, falling back to '/' when it is missing
        print(anchor.get('href', '/'))
"Beginner's Guide: Web Scraping in Python Using BeautifulSoup" link:
https://www.analyticsvidhya.com/blog/2015/10/beginner-guide-web-scraping-beautiful-soup-python/
Link:
https://docs.scrapy.org/en/latest/intro/tutorial.html
pip install scrapy
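import scrapy

class Spider(scrapy.Spider):
    name = 'NAME'          # placeholder spider name
    start_urls = ['LINK']  # placeholder start URL

    def parse(self, response):
        # Yield the title of each post on the current page
        for title in response.css('.post-header>h2'):
            yield {'title': title.css('a ::text').get()}
        # Follow pagination links and parse them with the same callback
        for next_page in response.css('a.next-posts-link'):
            yield response.follow(next_page, self.parse)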
"Web Scraping in Python Using Scrapy (with Multiple Examples)" link:
https://www.analyticsvidhya.com/blog/2017/07/web-scraping-in-python-using-scrapy/
Link:
https://www.seleniumhq.org/
"Data Science Project: Scraping YouTube Data with Python and Selenium to Classify Videos" link:
https://www.analyticsvidhya.com/blog/2019/05/scraping-classifying-youtube-video-data-python-selenium/
Link:
https://pandas.pydata.org/pandas-docs/stable/
pip install pandas
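Joining and merging data sets
Deleting and inserting columns in data structures
Filtering data
Reshaping data sets
Manipulating data with DataFrame objects, and more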
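The operations listed above can be sketched in a few lines. This is only an illustration, not code from the original article: the two tables, their column names, and the 10% tax factor are all made up for the example.

```python
import pandas as pd

# Two small made-up tables that share an "id" key (illustrative data only).
orders = pd.DataFrame({"id": [1, 2, 3], "amount": [250, 90, 400]})
customers = pd.DataFrame({"id": [1, 2, 3], "city": ["Pune", "Delhi", "Pune"]})

# Joining and merging data sets
merged = orders.merge(customers, on="id", how="inner")

# Inserting a column, then deleting it again
merged["amount_with_tax"] = merged["amount"] * 1.1
merged = merged.drop(columns=["amount_with_tax"])

# Filtering rows by a condition
large = merged[merged["amount"] > 100]

# Reshaping: one row per city with the summed amount
by_city = merged.pivot_table(index="city", values="amount", aggfunc="sum")

print(large)
print(by_city)
```

Each step returns a new DataFrame, so the intermediate results can be inspected or chained as needed.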
"12 Useful Pandas Techniques in Python for Data Manipulation" link:
https://www.analyticsvidhya.com/blog/2016/01/12-pandas-techniques-python-data-manipulation/
"Cheat Sheet: Data Exploration in Python Using Pandas" link:
https://www.analyticsvidhya.com/blog/2015/07/11-steps-perform-data-analysis-pandas-python/
Link:
https://pyod.readthedocs.io/en/latest/
pip install pyod
"An Excellent Tutorial for Learning Outlier Detection in Python with the PyOD Library" link:
https://www.analyticsvidhya.com/blog/2019/02/outlier-detection-python-pyod/
Link:
https://www.numpy.org/
$ pip install numpy
Creating arrays
import numpy as np
x = np.array([1, 2, 3])
print(x)
y = np.arange(10)
print(y)
output - [1 2 3]
[0 1 2 3 4 5 6 7 8 9]
Basic operations
a = np.array([1, 2, 3, 6])
b = np.linspace(0, 2, 4)
c = a - b
print(c)
print(a**2)
output - [1. 1.33333333 1.66666667 4. ]
[ 1 4 9 36]
pip install -U spacy
python -m spacy download en_core_web_sm
"Natural Language Processing Made Easy: Using spaCy (in Python)" link:
https://www.analyticsvidhya.com/blog/2017/04/natural-language-processing-made-easy-using-spacy-%e2%80%8bin-python/
Link:
https://matplotlib.org/
$ pip install matplotlib
Histogram
# Jupyter notebook magic; omit the next line when running as a plain script
%matplotlib inline
import matplotlib.pyplot as plt
from numpy.random import normal
x = normal(size=100)
plt.hist(x, bins=20)
plt.show()
3D plot
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
X = np.arange(-10, 10, 0.1)
Y = np.arange(-10, 10, 0.1)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm)
plt.show()
"The Ultimate Guide to Data Exploration in Python Using NumPy, Matplotlib, and Pandas" link:
https://www.analyticsvidhya.com/blog/2015/04/comprehensive-guide-data-exploration-sas-using-python-numpy-scipy-matplotlib-pandas/
Link:
https://seaborn.pydata.org/
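A dataset-oriented API for examining relationships between multiple variables
Convenient views of the overall structure of complex data sets
Tools for choosing color palettes that reveal patterns in the data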
pip install seaborn
import seaborn as sns
sns.set()
tips = sns.load_dataset("tips")
sns.relplot(x="total_bill", y="tip", col="time",
            hue="smoker", style="smoker", size="size",
            data=tips);
import seaborn as sns
sns.catplot(x="day", y="total_bill", hue="smoker",
            kind="violin", split=True, data=tips);
pip install bokeh
"Interactive Data Visualization Using Bokeh (in Python)" link:
https://www.analyticsvidhya.com/blog/2015/08/interactive-data-visualization-library-python-bokeh/
"Scikit-learn in Python: The Most Important Machine Learning Tool I Learned Last Year!" link:
https://www.analyticsvidhya.com/blog/2015/01/scikit-learn-python-machine-learning-tool/
Link:
https://www.tensorflow.org/
Installation link:
https://www.tensorflow.org/install
"TensorFlow 101: Understanding Tensors and Graphs to Get Started with Deep Learning" link:
https://www.analyticsvidhya.com/blog/2017/03/tensorflow-understanding-tensors-and-graphs/
"Getting Started with Deep Learning in R Using Keras and TensorFlow" link:
https://www.analyticsvidhya.com/blog/2017/06/getting-started-with-deep-learning-using-keras-in-r/
Link:
https://pytorch.org/
A replacement for NumPy that can use the power of GPUs
A deep learning research platform that offers maximum flexibility and speed
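As a rough illustration of the two points above, the sketch below repeats the NumPy-style array arithmetic with PyTorch tensors and then computes a gradient automatically. The values are arbitrary, and the code assumes only that the `torch` package is installed; it falls back to the CPU when no GPU is present.

```python
import torch

# Use a GPU when one is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Elementwise arithmetic works much like it does on NumPy arrays.
a = torch.tensor([1.0, 2.0, 3.0, 6.0], device=device)
b = torch.linspace(0, 2, 4, device=device)
c = a - b
squared = a ** 2

# Autograd: differentiate y = sum(x^2) with respect to x, giving 2 * x.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True, device=device)
y = (x ** 2).sum()
y.backward()

print(c.tolist())
print(squared.tolist())
print(x.grad.tolist())
```

The same code runs unchanged on either device because every tensor is created on `device` up front.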
Installation guide link:
https://pytorch.org/get-started/locally/
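Hybrid front-end
Tools and libraries: an active community of researchers and developers has built a rich ecosystem of tools and libraries that extend PyTorch and support development in areas such as computer vision and reinforcement learning
Cloud support: PyTorch runs on the major cloud platforms, offering frictionless development and easy scaling through prebuilt images, large-scale training on GPUs, and the ability to run models in production-scale environments
"An Introduction to PyTorch: A Simple yet Powerful Deep Learning Library" link: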
https://www.analyticsvidhya.com/blog/2018/02/pytorch-tutorial/
"Getting Started with PyTorch: Learn How to Build Fast and Accurate Neural Networks (with 4 Case Studies)" link:
https://www.analyticsvidhya.com/blog/2019/01/guide-pytorch-neural-networks-case-studies/
Link:
https://github.com/marcotcr/lime
pip install lime
"Building Trust in Machine Learning Models (Using LIME in Python)" link:
https://www.analyticsvidhya.com/blog/2017/06/building-trust-in-machine-learning-models/
Link:
https://github.com/h2oai/mli-resources
"Machine Learning Interpretability" link:
https://www.h2o.ai/wp-content/uploads/2018/01/Machine-Learning-Interpretability-MLI_datasheet_v4-1.pdf
Link:
https://librosa.github.io/librosa/
Installation guide link:
https://librosa.github.io/librosa/install.html
"Getting Started with Audio Data Analysis Using Deep Learning (with Case Study)" link:
https://www.analyticsvidhya.com/blog/2017/08/audio-voice-processing-deep-learning/
Link:
https://github.com/CPJKU/madmom
NumPy
SciPy
Cython
Mido
PyTest
PyAudio
PyFFTW
pip install madmom
"Learn Audio Beat Tracking for Music Information Retrieval (with Python Code)" link:
https://www.analyticsvidhya.com/blog/2018/02/audio-beat-tracking-for-music-information-retrieval/
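Classifying unknown sounds
Detecting audio events and excluding silent periods from long recordings
Performing supervised and unsupervised segmentation
Extracting audio thumbnails, and more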
pip install pyAudioAnalysis
Link:
https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_setup/py_intro/py_intro.html
pip3 install opencv-python
"Building a Deep Learning Face Detection Model for Video (Python Implementation)" link:
https://www.analyticsvidhya.com/blog/2018/12/introduction-face-detection-video-deep-learning-python/
"16 OpenCV Functions to Start Your Computer Vision Journey (with Python Code)" link:
https://www.analyticsvidhya.com/blog/2019/03/opencv-functions-computer-vision-python/
Link:
https://scikit-image.org/
Python (>= 3.5)
NumPy (>= 1.11.0)
SciPy (>= 0.17.0)
joblib (>= 0.11)
pip install -U scikit-image
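Link:
https://pillow.readthedocs.io/en/stable/
Per-pixel manipulation
Masking and transparency handling
Image filtering, such as blurring, contouring, smoothing, or edge detection
Image enhancement, such as sharpening and adjusting brightness, contrast, or color
Adding text to images, and more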
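The features listed above might look like this in practice. This is a minimal sketch rather than code from the original article: it builds a synthetic 64x64 image so that no file on disk is needed, and the colors, coordinates, and text are arbitrary choices.

```python
from PIL import Image, ImageDraw, ImageEnhance, ImageFilter

# Synthetic solid-color image, so the sketch needs no input file.
img = Image.new("RGB", (64, 64), color=(200, 30, 30))

# Per-pixel manipulation: paint the top-left pixel blue
img.putpixel((0, 0), (0, 0, 255))

# Filtering: blur and edge detection each return a new image
blurred = img.filter(ImageFilter.BLUR)
edges = img.filter(ImageFilter.FIND_EDGES)

# Enhancement: a factor of 1.0 keeps the original, 0.5 halves the brightness
darker = ImageEnhance.Brightness(img).enhance(0.5)

# Transparency: add an alpha channel at roughly 50% opacity
rgba = img.convert("RGBA")
rgba.putalpha(128)

# Adding text with Pillow's built-in default font (drawn onto img in place)
draw = ImageDraw.Draw(img)
draw.text((5, 25), "Pillow", fill=(255, 255, 255))

print(img.size, darker.getpixel((32, 32)), rgba.getpixel((32, 32)))
```

To work with a real picture, `Image.open(path)` would replace `Image.new(...)`, and `img.save(path)` would write the result.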
pip install Pillow
"AI Comic: Z.A.I.N, Issue 2: Facial Recognition Using Computer Vision" link:
https://www.analyticsvidhya.com/blog/2019/06/ai-comic-zain-issue-2-facial-recognition-computer-vision/
Link:
http://initd.org/psycopg/
Python version 2.7
Python 3 versions 3.4 to 3.7
PostgreSQL server versions 7.4 to 11
PostgreSQL client library version 9.1 or later
pip install psycopg2
Link:
https://www.sqlalchemy.org/
pip install SQLAlchemy
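Once SQLAlchemy is installed, a first query could look like the sketch below. It is an illustration only: it uses an in-memory SQLite database so that no server is required, and the `library` table and its rows are invented for the example.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite keeps the sketch self-contained (assumption: no real
# database is needed for the illustration).
engine = create_engine("sqlite:///:memory:")

# engine.begin() opens a transaction that commits on success
# and rolls back if an exception is raised.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE library (name TEXT, task TEXT)"))
    # Passing a list of dicts inserts several rows with bound parameters.
    conn.execute(
        text("INSERT INTO library (name, task) VALUES (:name, :task)"),
        [
            {"name": "pandas", "task": "data manipulation"},
            {"name": "Flask", "task": "deployment"},
        ],
    )
    rows = conn.execute(
        text("SELECT name FROM library WHERE task = :task"),
        {"task": "deployment"},
    ).fetchall()

names = [row[0] for row in rows]
print(names)
```

Bound parameters such as `:task` keep values out of the SQL string itself, which avoids quoting problems and SQL injection.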
Link:
http://flask.pocoo.org/docs/1.0/
Werkzeug: a WSGI utility library for Python
Jinja: a template engine for Python
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # Response body for requests to the root URL
    return "Hello World!"

if __name__ == "__main__":
    # Start Flask's built-in development server
    app.run()
"Tutorial: Deploying Machine Learning Models as APIs in Production (Using Flask)" link:
https://www.analyticsvidhya.com/blog/2017/09/machine-learning-models-as-apis-using-flask/
Related link:
https://www.analyticsvidhya.com/blog/2019/07/dont-miss-out-24-amazing-python-libraries-data-science/