Reading files line by line in Python
Source: Baidu Wenku | Editor: 中财网 | Date: 2024/05/04 06:33:35
Doing it the usual way
The standard idiom consists of an ‘endless’ while loop, in which we repeatedly call the file’s readline method. Here’s an example:
# File: readline-example-1.py

file = open("sample.txt")

while 1:
    line = file.readline()
    if not line:
        break
    pass # do something

This snippet reads the file line by line. If readline reaches the end of the file, it returns an empty string. Otherwise, it returns the line of text, including the trailing newline character.
On my test machine, using a 10 megabyte sample text file, this script reads about 32,000 lines per second.
Using the fileinput module
If you think the while loop is ugly, you can hide the readline call in a wrapper class. The standard fileinput module contains an input class which does exactly that.
# File: readline-example-2.py

import fileinput

for line in fileinput.input("sample.txt"):
    pass

However, adding more layers of Python code doesn’t exactly help. For the same test setup, performance drops to 13,000 lines per second. That’s nearly two and a half times slower!
Speeding up line reading
To speed things up, we obviously need to make sure we spend as little time in Python code (running under the interpreter) as possible.
One way to do this is to tell the file object to read larger chunks of data. For example, if you have enough memory, you can slurp the entire file into memory, using the readlines method. Or you could even use the read method to read the entire file into a single memory block, and then use string.split to chop it up into individual lines.
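As a sketch of that read-and-split approach (this uses splitlines rather than the older string.split, so it also runs on modern Python; the small sample.txt is created here just to keep the sketch self-contained):

```python
# Create a small sample file so the sketch is self-contained.
with open("sample.txt", "w") as f:
    f.write("first line\nsecond line\nthird line\n")

# Read the entire file into a single memory block, then chop it
# into individual lines in one go.
data = open("sample.txt").read()
lines = data.splitlines()

print(len(lines))  # -> 3
```

Note that splitlines drops the trailing newline characters, whereas readline and readlines keep them.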
However, if you’re processing really large files, it would be nice if you could limit the chunk size to something reasonable. For example, if you read a few thousand lines at a time, you probably won’t use up more than 100 kilobytes or so.
The following script uses a nested loop. The outer loop uses readlines to read about 100,000 bytes of text, and the inner loop processes those lines using a simple for-in loop:
# File: readline-example-3.py

file = open("sample.txt")

while 1:
    lines = file.readlines(100000)
    if not lines:
        break
    for line in lines:
        pass # do something

Can this really be faster? You bet. With the same test data, we can now process 96,900 lines of text per second!
Or to put it another way, this solution is three times as fast as the standard solution, and over seven times faster than the fileinput version.
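Figures like these are easy to re-check yourself. A minimal timing sketch comparing the two approaches (the file size and loop counts here are illustrative, not the original 10-megabyte test setup, and the numbers will vary by machine and Python version):

```python
import os
import time

# Build a modest test file.
with open("sample.txt", "w") as f:
    for i in range(50000):
        f.write("line %d\n" % i)

def count_readline(path):
    # One readline() call per line (the standard idiom).
    f = open(path)
    n = 0
    while 1:
        line = f.readline()
        if not line:
            break
        n += 1
    f.close()
    return n

def count_chunked(path):
    # readlines(100000) reads roughly 100 KB worth of lines per call.
    f = open(path)
    n = 0
    while 1:
        lines = f.readlines(100000)
        if not lines:
            break
        n += len(lines)
    f.close()
    return n

t0 = time.time(); a = count_readline("sample.txt"); t1 = time.time()
b = count_chunked("sample.txt"); t2 = time.time()
print(a, b)          # both count the same 50000 lines
print(t1 - t0, t2 - t1)
os.remove("sample.txt")
```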
In Python 2.2 and later, you can loop over the file object itself. This works pretty much like readlines(N) under the covers, but looks much better:
# File: readline-example-5.py

file = open("sample.txt")

for line in file:
    pass # do something

In Python 2.1, you have to use the xreadlines iterator factory instead:
# File: readline-example-4.py

file = open("sample.txt")

for line in file.xreadlines():
    pass # do something
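For completeness, in Python 3 the same loop is usually written inside a with statement, which closes the file automatically even if an error occurs (this idiom postdates the text above; the sample file is created here only to make the sketch runnable):

```python
# Create a tiny sample file so the sketch is self-contained.
with open("sample.txt", "w") as f:
    f.write("alpha\nbeta\n")

# Iterating over the file object yields one line at a time,
# with the trailing newline still attached.
count = 0
with open("sample.txt") as f:
    for line in f:
        count += 1

print(count)  # -> 2
```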