Last N lines of a text file with Xtreams

Denis Kudryashov

2014-06-30 19:05:44 UTC

Hi

I am investigating the Pharo Xtreams port for the task of selecting the last N lines of a text file. I wrote the following code for this (it works):

s := FileLocator changes reading.
reversed := [ :out | s -= 1. [ s position = 0 ] whileFalse: [ out put: s peek. s -- 1 ] ] reading.
lines := (reversed ending: Character cr asInteger) slicing.
lastLines := (lines limiting: 50) collect: [ :eachReversedLine |
    (eachReversedLine rest reversed reading encoding: #utf8) rest ].
lastLines reversed

And now I have questions for the stream maintainers:

1) Can we add a #reversing transformation? (Maybe it already exists in VW?)
2) Why does "slicing collecting" not work, so that "slicing collect:" has to be used? The latter does not produce a stream; it processes the full source stream with the collect block. I know it is possible to build another reading block for "slicing collecting", but why does it not work out of the box? (It is not clear to me after reading the Google docs.)

s***@gmail.com

2014-07-12 07:16:54 UTC

Hi Denis

I do not know the answer, but I would love to have experience reports about Xtreams so that we can integrate them into Pharo.

Stef

nicolas cellier aka nice

2015-02-21 22:44:15 UTC

Hi Denis, sorry to be so late, but I just discovered this thread.
Though this channel is indicated at https://code.google.com/p/xtreams/, I'm not sure it really is active...
Maybe try again on the vwnc mailing list...
Personally, for the Squeak/Pharo port I'm more on squeak-dev or Pharo-dev.

To answer 1): no, I don't think it's in VW yet, and yes, this is a good idea. Maybe you could come up with an implementation?

To answer 2): I had similar problems...
See below what I asked Martin on 29/01/2014.

Cheers

Nicolas Cellier

-------------------------

Hello Martin,
I tried combining complex Xtreams constructions and got somewhat surprising results.
For example, the four expressions below lead to four different results in the latest Squeak port.
Maybe not all usages are correct, but it is not obvious why...
I would expect all snippets to work the same (the last one meets my expectations).
Unfortunately, Xtreams are a bit hard to debug due to the combination of deep stacks and intensive use of Incomplete exception handling.
Can you tell me if the latest VW version behaves differently? If not, what result would you expect?
There is no urgency, but if you have a little time to look into it, I'm curious to know your opinion :)
Cheers

Nicolas

((1 to: 9) reading limiting: 3) slicing collect: [:e | e inject: 0 into: [:r :n | r + n]].

(((1 to: 9) reading limiting: 3) slicing collecting: [:e | e inject: 0 into: [:r :n | r + n]]) rest.

((((1 to: 9) reading limiting: 3) slicing collect: [:e | e injecting: 0 into: [:r :n | r + n]]) reading stitching transforming: [:in :out | out put: (in get;get;get)]) rest.

((((1 to: 9) reading limiting: 3) slicing collecting: [:e | e injecting: 0 into: [:r :n | r + n]]) stitching transforming: [:in :out | out put: (in get;get;get)]) rest.

-------------------------

The answer was:

-------------------------

Hi Nicolas,

Yes, this is a bit messy. First, here are the results from VW; the Squeak ones should be the same:

((1 to: 9) reading limiting: 3) slicing collect: [:e | e inject: 0 into: [:r :n | r + n]].
#(6 15 24 0)

(((1 to: 9) reading limiting: 3) slicing collecting: [:e | e inject: 0 into: [:r :n | r + n]]) rest.
#(0 0 0 0)

((((1 to: 9) reading limiting: 3) slicing collect: [:e | e injecting: 0 into: [:r :n | r + n]]) reading stitching transforming: [:in :out | out put: (in get;get;get)]) rest.
#()

((((1 to: 9) reading limiting: 3) slicing collecting: [:e | e injecting: 0 into: [:r :n | r + n]]) stitching transforming: [:in :out | out put: (in get;get;get)]) rest.
#(6 15 24)

The first two are due to a particular behavior of CollectReadStream. When you ask a stream for #rest, it takes a large buffer and attempts to fill it. So the collect stream actually receives a read for a large number of elements. Largely for performance reasons, it first attempts to read the same number of elements from the underlying stream (to avoid making lots of tiny calls to read individual elements), and only after that does it run the block over what it gets back. This interacts badly with #slicing, because only a single slice can be valid at any given time. The collect stream basically gets a bunch of exhausted slices, of which only the last one is active, and then runs the collecting block on each of them, thus the surprising zeros. Note that if you use single-element reads instead of block reads (e.g. collect: #yourself, instead of rest), then the second example will give you the same result as the first.
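The interaction described above can be reproduced in a small model outside Xtreams. The following Python sketch is not Xtreams code and makes no claim about how CollectReadStream is implemented; the names Slice, slicing, element_wise and greedy are made up for illustration. It only assumes what the explanation states: all slices share one source, asking for the next slice exhausts the previous one, and slicing stops after an empty slice.

```python
class Slice:
    """A window of at most `size` elements over a source shared by all slices."""

    def __init__(self, source, size):
        self._source = source  # iterator shared with every other slice
        self._left = size      # elements this slice may still hand out
        self.seen = 0          # elements read or drained through this slice

    def __iter__(self):
        return self

    def __next__(self):
        if self._left == 0:
            raise StopIteration
        try:
            item = next(self._source)
        except StopIteration:
            self._left = 0     # the shared source ran dry
            raise
        self._left -= 1
        self.seen += 1
        return item

    def drain(self):
        """Skip whatever is left, as happens when the next slice is requested."""
        for _ in self:
            pass


def slicing(items, size):
    """Yield slices one at a time; only the most recent slice is still valid."""
    source = iter(items)
    previous = None
    while True:
        if previous is not None:
            previous.drain()
            if previous.seen == 0:  # the last slice was empty: end of stream
                return
        previous = Slice(source, size)
        yield previous


def element_wise():
    # Each slice is summed before the next one is requested.
    return [sum(s) for s in slicing(range(1, 10), 3)]


def greedy():
    # All slices are fetched first (a block read), then the block runs over them.
    batch = list(slicing(range(1, 10), 3))
    return [sum(s) for s in batch]


print(element_wise())  # [6, 15, 24, 0]
print(greedy())        # [0, 0, 0, 0]
```

In this model the only difference between the two results is the order of operations: summing each slice while it is still the active one, or fetching a batch of slices first and summing them afterwards.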

The third example also suffers from a self-inflicted variation of the above. The collect: call is going to return a collection of exhausted slices, except the last one (which is empty). Then it adds injecting: on top of those and stitches them all together, but they are all empty at this point. By the time the transforming: block gets to them there's nothing there, so the first #get there will raise Incomplete and that's the end, thus the empty result.

The fourth example is a bit of a mind-bender, but I think the result is as it should be. You're basically stitching together the 3-element slices, but transformed by the injecting: to be progressive sums of the original elements. So the stitched content is basically #(1 3 6 4 9 15 7 15 24). Then you run it through the transform that drops two elements and emits the third. So #(6 15 24) looks correct to me. The #rest is probably not wreaking havoc here because the collect stream is shielded by the stitching layer (or some such).
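The arithmetic of the fourth example can be checked with ordinary collections. This is a Python sketch of the numbers only, not of the Xtreams pipeline; the function names stitched_sums and every_third are made up for illustration.

```python
from itertools import accumulate, chain


def stitched_sums(slices):
    """Progressive sums within each slice, concatenated into one sequence."""
    return list(chain.from_iterable(accumulate(s) for s in slices))


def every_third(items):
    """Drop two elements, keep the third, and repeat."""
    return items[2::3]


slices = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
stitched = stitched_sums(slices)
print(stitched)               # [1, 3, 6, 4, 9, 15, 7, 15, 24]
print(every_third(stitched))  # [6, 15, 24]
```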

Makes sense?

The key thing to remember is that block reads on transforms are generally greedy and substreams cannot coexist at the same time. So any combination of the two usually yields disappointing results.

Martin

-------------------------

So take it as a known limitation...

nicolas cellier aka nice

2015-02-21 22:58:59 UTC

And finally, there's been an effort to document these limitations at https://code.google.com/p/xtreams/wiki/Substreams
