The previous posts covered using TensorFlow to recognize some standard datasets, but most of the time we need to recognize our own data. Say you have ten thousand cat and dog photos: you need to feed those local images into the network as a dataset. A dataset built from your own images like this is called a self-made (custom) dataset.
In this post we'll run a complete recognition task on a local dataset. I've put the dataset in my blog's resources; go have a look, it should be there.
Step 1: as usual, import the required packages.
import tensorflow as tf
from PIL import Image
import numpy as np
import os
from tensorflow import keras
Since the data lives locally, we first need to define where the dataset sits on disk:
train_path = './数据集/class4/MNIST_FC/mnist_image_label/mnist_train_jpg_60000/'
train_txt = './数据集/class4/MNIST_FC/mnist_image_label/mnist_train_jpg_60000.txt'
x_train_savepath = './数据集/class4/MNIST_FC/mnist_image_label/mnist_x_train.npy'
y_train_savepath = './数据集/class4/MNIST_FC/mnist_image_label/mnist_y_train.npy'
test_path = './数据集/class4/MNIST_FC/mnist_image_label/mnist_test_jpg_10000/'
test_txt = './数据集/class4/MNIST_FC/mnist_image_label/mnist_test_jpg_10000.txt'
x_test_savepath = './数据集/class4/MNIST_FC/mnist_image_label/mnist_x_test.npy'
y_test_savepath = './数据集/class4/MNIST_FC/mnist_image_label/mnist_y_test.npy'
The mnist_image_label folder contains two txt files and two subfolders.
Each txt file lists image filenames together with the digit each image contains.
The two subfolders hold the handwritten-digit images (60,000 training images and 10,000 test images). One note in particular: np.save stores arrays in uncompressed raw binary format in files with the .npy extension.
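As a quick illustration of that raw-binary round trip (the file name demo.npy exists only for this sketch):

```python
import numpy as np

# np.save writes the array in uncompressed raw binary .npy format;
# np.load reads it back with shape and dtype preserved exactly.
arr = np.arange(6, dtype=np.float64).reshape(2, 3)
np.save('demo.npy', arr)
restored = np.load('demo.npy')
print(restored.shape, restored.dtype)  # (2, 3) float64
```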
def generateds(path, txt):
    f = open(txt, 'r')
    contents = f.readlines()  # one "filename label" pair per line
    f.close()
    x, y_ = [], []
    for content in contents:
        value = content.split()      # value[0] = image name, value[1] = label
        img_path = path + value[0]
        img = Image.open(img_path)
        # convert the image to an 8-bit grayscale np.array
        img = np.array(img.convert('L'))
        img = img / 255.0            # scale pixel values to [0, 1]
        x.append(img)
        y_.append(value[1])
        print('loading:' + content)
    x = np.array(x)
    y_ = np.array(y_)
    y_ = y_.astype(np.int64)
    return x, y_
if os.path.exists(x_train_savepath) and os.path.exists(y_train_savepath) and os.path.exists(
        x_test_savepath) and os.path.exists(y_test_savepath):
    print('-' * 20, 'Load Datasets', '-' * 20)
    x_train_save = np.load(x_train_savepath)
    y_train = np.load(y_train_savepath)
    x_test_save = np.load(x_test_savepath)
    y_test = np.load(y_test_savepath)
    x_train = np.reshape(x_train_save, (len(x_train_save), 28, 28))
    x_test = np.reshape(x_test_save, (len(x_test_save), 28, 28))
else:
    print('-' * 20, 'Generate Datasets', '-' * 20)
    x_train, y_train = generateds(train_path, train_txt)
    x_test, y_test = generateds(test_path, test_txt)
    print('-' * 20, 'Save Datasets', '-' * 20)
    x_train_save = np.reshape(x_train, (len(x_train), -1))
    x_test_save = np.reshape(x_test, (len(x_test), -1))
    np.save(x_train_savepath, x_train_save)
    np.save(y_train_savepath, y_train)
    np.save(x_test_savepath, x_test_save)
    np.save(y_test_savepath, y_test)
Start reading this code from the if statement. If all four .npy files (x_train_savepath, y_train_savepath, x_test_savepath, y_test_savepath) already exist, the previously saved data is loaded straight from them; otherwise the else branch runs.
In the if branch, x_train_save, y_train, x_test_save and y_test are loaded, and then x_train_save and x_test_save are reshaped into arrays of length len(x_train_save) and len(x_test_save) with 28x28 pixels per image, producing x_train and x_test.
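The reshape in the if branch can be sketched on a tiny dummy array (three fake flattened images instead of the real cached data):

```python
import numpy as np

# The cached .npy file stores each image flattened to 784 values;
# reshape turns the (N, 784) array back into N images of 28x28 pixels.
flat = np.zeros((3, 784))
imgs = np.reshape(flat, (len(flat), 28, 28))
print(imgs.shape)  # (3, 28, 28)
```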
If the files don't exist, or some of them are missing, we need to generate the .npy files ourselves.
That's what the else branch does: it passes the image paths to the generateds function, which returns arrays already converted to the right data types; those arrays are then saved to the paths defined earlier, producing the .npy files.
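To make the parsing inside generateds concrete, here is how one line of the label txt splits into a filename and a label (the sample line itself is hypothetical, made up for this sketch):

```python
# Each line of the label txt pairs an image filename with its digit,
# separated by whitespace; split() takes the two apart.
line = '28755_0.jpg 0\n'  # hypothetical sample line
value = line.split()
img_name, label = value[0], int(value[1])
print(img_name, label)
```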
Don't worry if this code isn't completely clear; when you use it, just adjust the file paths at the top and it won't hold back the rest of your learning.
What follows is the easy part: build the model, configure it, train it, and call summary to inspect the result.
model = keras.models.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=5, validation_data=(x_test, y_test), validation_freq=1)
model.summary()
When you run it, if the .npy files aren't there yet, the script first reads the images and generates them.
Once the data is loaded, training proceeds epoch by epoch.
Below is the complete code; feel free to try running it on your own machine.
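After training, the last Dense layer outputs ten softmax probabilities per image, one for each digit; the predicted digit is simply the index of the largest one. A minimal numpy sketch (the probability values here are made up):

```python
import numpy as np

# Ten softmax outputs, one probability per digit 0-9 (made-up values).
probs = np.array([0.01, 0.02, 0.05, 0.70, 0.02, 0.05, 0.05, 0.04, 0.03, 0.03])
predicted = int(np.argmax(probs))  # index of the highest probability
print('predicted digit:', predicted)  # -> 3
```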
import tensorflow as tf
from PIL import Image
import numpy as np
import os
from tensorflow import keras

train_path = './数据集/class4/MNIST_FC/mnist_image_label/mnist_train_jpg_60000/'
train_txt = './数据集/class4/MNIST_FC/mnist_image_label/mnist_train_jpg_60000.txt'
x_train_savepath = './数据集/class4/MNIST_FC/mnist_image_label/mnist_x_train.npy'
y_train_savepath = './数据集/class4/MNIST_FC/mnist_image_label/mnist_y_train.npy'
test_path = './数据集/class4/MNIST_FC/mnist_image_label/mnist_test_jpg_10000/'
test_txt = './数据集/class4/MNIST_FC/mnist_image_label/mnist_test_jpg_10000.txt'
x_test_savepath = './数据集/class4/MNIST_FC/mnist_image_label/mnist_x_test.npy'
y_test_savepath = './数据集/class4/MNIST_FC/mnist_image_label/mnist_y_test.npy'

def generateds(path, txt):
    f = open(txt, 'r')
    contents = f.readlines()  # one "filename label" pair per line
    f.close()
    x, y_ = [], []
    for content in contents:
        value = content.split()      # value[0] = image name, value[1] = label
        img_path = path + value[0]
        img = Image.open(img_path)
        # convert the image to an 8-bit grayscale np.array
        img = np.array(img.convert('L'))
        img = img / 255.0            # scale pixel values to [0, 1]
        x.append(img)
        y_.append(value[1])
        print('loading:' + content)
    x = np.array(x)
    y_ = np.array(y_)
    y_ = y_.astype(np.int64)
    return x, y_

if os.path.exists(x_train_savepath) and os.path.exists(y_train_savepath) and os.path.exists(
        x_test_savepath) and os.path.exists(y_test_savepath):
    print('-' * 20, 'Load Datasets', '-' * 20)
    x_train_save = np.load(x_train_savepath)
    y_train = np.load(y_train_savepath)
    x_test_save = np.load(x_test_savepath)
    y_test = np.load(y_test_savepath)
    x_train = np.reshape(x_train_save, (len(x_train_save), 28, 28))
    x_test = np.reshape(x_test_save, (len(x_test_save), 28, 28))
else:
    print('-' * 20, 'Generate Datasets', '-' * 20)
    x_train, y_train = generateds(train_path, train_txt)
    x_test, y_test = generateds(test_path, test_txt)
    print('-' * 20, 'Save Datasets', '-' * 20)
    x_train_save = np.reshape(x_train, (len(x_train), -1))
    x_test_save = np.reshape(x_test, (len(x_test), -1))
    np.save(x_train_savepath, x_train_save)
    np.save(y_train_savepath, y_train)
    np.save(x_test_savepath, x_test_save)
    np.save(y_test_savepath, y_test)

model = keras.models.Sequential([
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss=keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['sparse_categorical_accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=5, validation_data=(x_test, y_test), validation_freq=1)
model.summary()
Original: https://blog.csdn.net/bester_man/article/details/121311473
Author: 可我是joker
Title: tensorflow(10)--自制数据集
Related reading
Title: ModuleNotFoundError: No module named ‘d2l’
Solution
Problem description:
Importing the d2l library in Anaconda raises an error:
import d2l
ModuleNotFoundError: No module named ‘d2l’
Cause:
The d2l library is written by Mu Li himself; it bundles some commonly used libraries along with custom helper functions.
Solution:
1. Download
Option 1: the GitHub repository for Mu Li's Dive into Deep Learning course
Option 2: Baidu Netdisk
Extraction code: 3xgq
2. Find where conda installs external packages
For a conda environment on Windows, we can easily check where conda's external packages (or conda environments) are installed.
2.1 Open the command line
Press Win+R
to open the "Run" dialog
Type cmd
In the command line, run ipython
2.2 Check the path of an installed package
Taking the numpy package as an example:
import numpy
numpy.__file__
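The same lookup can be done in one short script rather than interactively; assuming numpy is installed, going up from its `__init__.py` gives the site-packages directory where d2l should go:

```python
import os
import numpy

# numpy.__file__ points at .../site-packages/numpy/__init__.py,
# so going up two levels gives the site-packages directory itself.
site_packages = os.path.dirname(os.path.dirname(numpy.__file__))
print(site_packages)
```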
3. Move the d2l library into the package installation path
1. Find the installation path you looked up in step 2; for example, mine is D:\Anaconda\Lib\site-packages
2. Move the d2l library into that path
4. Other problems you may run into
If torchtext was not installed along with pytorch, or the installed version is newer than 0.5.0, you may see errors like:
ModuleNotFoundError: No module named 'tqdm'
ModuleNotFoundError: No module named 'torchtext'
Fix:
conda install "package-name"
Original: https://blog.csdn.net/qq_46378251/article/details/122380964
Author: 卡拉比丘流形
Title: ModuleNotFoundError: No module named ‘d2l’