
Efficient method for reading x lines at a time from a very large text file

I have some very large text files (up to 500MB each) which I need to read via PHP and act on the contents, line by line. Using file() here would load the whole file into memory and exceed the memory limit. What I need is some way to read x lines (say 1000), process those lines, read the next 1000 lines, and so on until EOF.
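For reference, the "read x lines, process, repeat" idea can be sketched with a generator built on fgets(). This is only an illustration: it assumes PHP 7 or later, and the function name readLineChunks is made up for the example. Only one chunk is held in memory at a time.

```php
<?php
// Hypothetical helper: yields arrays of up to $chunkSize lines from $path.
function readLineChunks(string $path, int $chunkSize = 1000): Generator
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    try {
        $chunk = [];
        // fgets() returns false at EOF, which ends the loop.
        while (($line = fgets($fh)) !== false) {
            $chunk[] = rtrim($line, "\r\n");
            if (count($chunk) === $chunkSize) {
                yield $chunk;
                $chunk = [];
            }
        }
        // Emit the final, possibly shorter, chunk.
        if ($chunk !== []) {
            yield $chunk;
        }
    } finally {
        // Runs even if the caller stops iterating early.
        fclose($fh);
    }
}
```

A caller would loop with foreach (readLineChunks($path, 1000) as $lines) and process each array of lines in turn.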

At the moment I'm using something like this:

[code=php]
$this->_openFile();

while (!$this->_eof) {

    // Fill $this->_lines with the next 1000 lines.
    $this->_readLines(1000);

    foreach ($this->_lines as $line) {
        // do stuff
    }

}

$this->_closeFile();
[/code]

_readLines uses fgets to populate $this->_lines array with a line per element.

This works OK, however ideally I’d like to split out the processing and retrieval of the data, because I have different types of file and the processing is different for each. I want to make a base class with the functionality for opening files, reading x lines etc, and a child class which handles the actual processing of the most recent x lines.

However my problem is, if I set the latest x lines for the child class to process, how can I then carry on reading from the next line? So for example if I read the first 1000 lines, how do I pick up again at line 1001?
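On the "pick up again at line 1001" point: as long as the same file handle stays open, the next fgets() call continues from where the last read stopped, so nothing extra is needed. If the handle has to be closed between chunks, the byte position can be saved with ftell() and restored with fseek(). A rough sketch, assuming PHP 7 or later (readLinesFrom is a hypothetical name):

```php
<?php
// Hypothetical helper: reads up to $count lines starting at byte $offset and
// returns the lines together with the offset to resume from.
function readLinesFrom(string $path, int $offset, int $count): array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($fh, $offset);

    $lines = [];
    while (count($lines) < $count && ($line = fgets($fh)) !== false) {
        $lines[] = rtrim($line, "\r\n");
    }

    $next = ftell($fh); // byte position just past the last line read
    fclose($fh);

    return [$lines, $next];
}
```

Passing the returned offset back in as $offset on the next call resumes at the following line without re-reading the earlier ones.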

So just to be clear, I want to do the following:

— BASE CLASS —
1. Take an array of files
2. Open the first file
3. Read x lines from the file
— CHILD CLASS —
4. Process the last x lines
— BASE CLASS —
5. Repeat 3-4 until EOF
6. Repeat 2-5 with each file in the array

Any thoughts on the best way to do this? I suppose what I’m really looking for is some way to read chunks of a file at a time but using lines rather than bytes.
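The base/child split listed above could look roughly like the following. It is a sketch under assumptions, not a definitive design: the class and method names (ChunkedFileReader, processLines, LineCounter) are invented for the example, and it assumes PHP 7 or later.

```php
<?php
// Sketch only: all class and method names here are hypothetical.
abstract class ChunkedFileReader
{
    private $files;
    private $chunkSize;

    public function __construct(array $files, int $chunkSize = 1000)
    {
        $this->files = $files;          // step 1: take an array of files
        $this->chunkSize = $chunkSize;
    }

    // Child classes implement the file-type-specific processing (step 4).
    abstract protected function processLines(array $lines);

    public function run()
    {
        foreach ($this->files as $file) {               // steps 2 and 6
            $fh = fopen($file, 'r');
            if ($fh === false) {
                throw new RuntimeException("Cannot open $file");
            }
            // Step 5: an empty array means EOF and ends the loop.
            while ($lines = $this->readLines($fh)) {    // step 3
                $this->processLines($lines);            // step 4
            }
            fclose($fh);
        }
    }

    // Reads up to chunkSize lines; the handle remembers its position.
    private function readLines($fh): array
    {
        $lines = [];
        while (count($lines) < $this->chunkSize && ($line = fgets($fh)) !== false) {
            $lines[] = rtrim($line, "\r\n");
        }
        return $lines;
    }
}

// Example child: counts lines instead of doing real processing.
class LineCounter extends ChunkedFileReader
{
    public $total = 0;

    protected function processLines(array $lines)
    {
        $this->total += count($lines);
    }
}
```

With this shape the base class owns opening, chunking and closing, and each file type only needs its own processLines() implementation.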

@Mindzai (author), May 26, 2009: In case it helps anyone, the following seems to do the job:

[code=php]
protected function _readFiles() {

    // Open the first file on the first call.
    if (!$this->_fh) {
        $this->_openFile(0);
    }

    // Current file exhausted: close it and move on, or stop after the last file.
    if ($this->_eof) {
        $this->_closeFile();
        if ($this->_currentFile == count($this->_files) - 1) {
            return false;
        } else {
            $this->_openFile($this->_currentFile + 1);
        }
    }

    $this->_readLines();
    return $this->_lines;

}

protected function _readLines($numLines = 1000) {

    $this->_lines = array();

    for ($i = 0; $i < $numLines; $i++) {

        if (feof($this->_fh)) {
            $this->_eof = true;
            return;
        }

        // Blank lines are skipped; compare strictly so a line containing "0" is kept.
        $line = $this->_readLine();
        if ($line !== false && $line !== '') {
            $this->_lines[] = $line;
        }

    }

}

private function _readLine() {

    // The file handle keeps its own position, so each call continues from
    // where the previous read stopped; no line counter is needed.
    // Lines longer than 1024 bytes are returned in pieces.
    $line = stream_get_line($this->_fh, 1024, "\n");

    return ($line === false) ? false : trim($line);

}
[/code]


Child:
[code=php]
while ($lines = $this->_readFiles()) {
    foreach ($lines as $line) {
        // do stuff
    }
}
[/code]


I'd still be interested to hear from anyone with a better method, though.
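One alternative that might be worth trying is SplFileObject, which iterates a file line by line and can drop newlines and skip empty lines via flags. The sketch below assumes PHP 7 or later; chunksFromFiles is a hypothetical name, and it has not been measured against the fgets() approach on 500MB files.

```php
<?php
// Hypothetical helper: yields chunks of non-empty lines from each file in turn.
function chunksFromFiles(array $files, int $chunkSize = 1000): Generator
{
    foreach ($files as $path) {
        $file = new SplFileObject($path, 'r');
        // Strip line endings and skip empty lines while iterating.
        $file->setFlags(
            SplFileObject::READ_AHEAD |
            SplFileObject::SKIP_EMPTY |
            SplFileObject::DROP_NEW_LINE
        );

        $chunk = [];
        foreach ($file as $line) {
            $chunk[] = $line;
            if (count($chunk) === $chunkSize) {
                yield $chunk;
                $chunk = [];
            }
        }
        // Emit the remainder before moving on to the next file.
        if ($chunk !== []) {
            yield $chunk;
        }
    }
}
```

Chunks never span two files here, which keeps per-file processing simple at the cost of a short final chunk for each file.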