Frozen download via Wanproxy
Ivan Pizhenko
ivan.pizhenko at gmail.com
Mon Mar 19 16:02:54 PDT 2018
Hi Juli,
I have debugged Wanproxy on Linux using the simplest configuration, without
caching and XCodec, just to find out how it behaves when it works correctly.
As I said in my previous letter, I thought that one of the following events
(1) poll_handler->callback(Event::EOS);
(2) poll_handler->callback(Event::Error);
would happen and eventually result in closing the client connection, but I was wrong.
Neither of these happened; instead, there was a zero-length read, which
was translated into Event::EOS somewhere in IOSystem::Handle::read_do().
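The POSIX rule behind this is that read(2) on a stream socket returns 0 once the peer has finished sending, which is distinct from an error (-1 with errno set). The sketch below is not WANProxy's IOSystem::Handle::read_do(). The ReadEvent enum, classify_read() and demo_peer_close() are hypothetical names used only to show how a zero-length read maps to an EOS-style event, with a local socketpair(2) standing in for a real TCP connection.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>

// Hypothetical event kinds mirroring the EOS / Error distinction in the
// thread; this is not WANProxy's Event type.
enum class ReadEvent { Data, EOS, Error, Retry };

// Map the result of read(2) on a stream socket to an event:
//   n > 0  -> bytes were received
//   n == 0 -> peer finished sending (end of stream)
//   n < 0  -> error, unless errno says "try again"
ReadEvent classify_read(ssize_t n, int err) {
    if (n > 0) return ReadEvent::Data;
    if (n == 0) return ReadEvent::EOS;
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK) return ReadEvent::Retry;
    return ReadEvent::Error;
}

// Writes "hello" on one end of a socketpair, closes that end, then reads the
// other end until a non-data event. Returns the data plus "|EOS" or "|other".
std::string demo_peer_close() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return "socketpair failed";
    const std::string msg = "hello";
    ssize_t w = write(sv[0], msg.data(), msg.size());
    close(sv[0]);  // the sending side goes away
    if (w != static_cast<ssize_t>(msg.size())) { close(sv[1]); return "write failed"; }

    std::string got;
    char buf[4096];
    ReadEvent ev = ReadEvent::Retry;
    for (;;) {
        ssize_t n = read(sv[1], buf, sizeof buf);
        ev = classify_read(n, n < 0 ? errno : 0);
        if (ev == ReadEvent::Data) got.append(buf, static_cast<std::size_t>(n));
        else if (ev != ReadEvent::Retry) break;
    }
    close(sv[1]);
    return got + (ev == ReadEvent::EOS ? "|EOS" : "|other");
}
```

demo_peer_close() returns "hello|EOS": the data arrives first, and the following read returns 0, which is the zero-length read seen in the debugger.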
After this I wondered how the EOS could be generated but then actually
lost, and performed more experiments.
I noticed that, among other options, I had turned compression on.
So I repeated the same experiment without compression and with
compression, and surprisingly it works without compression but not
with it. To confirm this, I created another configuration without
XCodec and caching, but with compression only, and with that I could
reproduce the issue again and again. Then I added some logging at the
beginning of the consume() method of the inflate and deflate pipe
classes and got the following log output on the "client" Wanproxy:
1521499142.151618 [/zlib/inflate_pipe] DEBUG: virtual void InflatePipe::consume(Buffer*): InflatePipe: Consuming 65536
1521499143.399245 [/event/poll] DEBUG: virtual void EventPoll::main(): EPOLLIN: 9
1521499143.399660 [/zlib/inflate_pipe] DEBUG: virtual void InflatePipe::consume(Buffer*): InflatePipe: Consuming 0
So the last invocation of InflatePipe::consume() delivers that zero-sized
buffer. But a quick code review shows that the inflate/deflate pipe
consume() methods don't do anything special in this case, and that's
really bad, because it causes the stuck download. I assume the correct
behaviour would be to flush whatever has been compressed so far and then
generate EOS, or just to generate EOS immediately, losing the incomplete
compressed data chunk.
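A minimal sketch of the first option (flush, then signal EOS), assuming the convention described above that an empty buffer passed to consume() means end-of-stream. BufferingPipe, its Sink callback and demo_buffering_pipe() are hypothetical and do not use WANProxy's Pipe or Buffer classes; the fixed-size blocks only stand in for a compressor that holds partial data internally. With zlib, the flush step would correspond to calling deflate() with Z_FINISH.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a compression pipe: like a deflate stream, it
// holds input internally and emits output only in complete blocks.
class BufferingPipe {
public:
    using Sink = std::function<void(const std::string& data, bool eos)>;

    BufferingPipe(std::size_t block, Sink sink) : block_(block), sink_(std::move(sink)) {
        assert(block_ > 0);
    }

    // Assumed convention: an empty buffer means end-of-stream.
    void consume(const std::string& in) {
        assert(!eos_sent_ && "consume() called after EOS");
        if (in.empty()) {
            // Flush whatever is still held, then propagate EOS exactly once
            // so the downstream side can finish and close.
            if (!pending_.empty()) sink_(pending_, false);
            pending_.clear();
            eos_sent_ = true;
            sink_(std::string(), true);
            return;
        }
        pending_ += in;
        // Emit only complete blocks; keep the remainder buffered.
        std::size_t full = pending_.size() - pending_.size() % block_;
        if (full > 0) {
            sink_(pending_.substr(0, full), false);
            pending_.erase(0, full);
        }
    }

    bool eos_sent() const { return eos_sent_; }

private:
    std::size_t block_;
    Sink sink_;
    std::string pending_;
    bool eos_sent_ = false;
};

// Runs a short session; each emitted chunk is shown in brackets.
std::string demo_buffering_pipe() {
    std::string seen;
    BufferingPipe pipe(4, [&seen](const std::string& data, bool eos) {
        seen += eos ? std::string("<EOS>") : "[" + data + "]";
    });
    pipe.consume("abcdef");  // emits "abcd", holds "ef"
    pipe.consume("");        // flushes "ef", then signals EOS
    return seen;
}
```

demo_buffering_pipe() returns "[abcd][ef]<EOS>". Without the empty-buffer branch, the trailing "ef" would stay buffered and the sink would never see EOS, which resembles the stuck download. On the inflate side, an EOS arriving before inflate() has returned Z_STREAM_END would mean truncated input, so the second option (drop the incomplete chunk) may fit better there.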
What do you think?
This is especially important for me to understand, since I am planning
to add more compression methods in the future (for example, I'd like to
have LZ4).
Ivan.
2018-03-17 9:53 GMT+02:00 Ivan Pizhenko <ivan.pizhenko at gmail.com>:
> Hi Juli,
>
> First of all, thank you for finding time to reply to my messages.
>
> So far I have only briefly looked at the XCodec internals, and I have also
> noticed that it has XCODEC_PIPE_OP_EOS and
> XCODEC_PIPE_OP_EOS_ACK. So I am planning to dig into it and see how
> they are handled.
>
> Regarding breaking functionality: I assume XCodec doesn't work with the
> socket directly and should be invariant to what actually happens with
> the socket as long as it gets the correct input sequence of control codes,
> right?
>
> I have seen that there are two possibilities:
> poll_handler->callback(Event::EOS);
> and
> poll_handler->callback(Event::Error);
>
> So I need to understand which one applies in this case and make sure the
> correct action is taken. I think that in both cases the connection on the
> proxy's "interface" should be closed.
> Am I correct?
>
> Ivan.
>
> 2018-03-17 1:47 GMT+02:00 Juli Mallett <juli at clockworksquid.com>:
>> Hi Ivan,
>>
>> I've worked on that code pretty extensively in the past, and I'm pretty sure
>> I tested a wide set of circumstances, but it's certainly possible there's
>> some missing edge case. I'd suggest looking at the code related to
>> XCODEC_PIPE_OP_EOS and XCODEC_PIPE_OP_EOS_ACK. If I were testing this, and
>> trying to reproduce it, my first step would be to put INFO or DEBUG
>> statements throughout the code, and to watch what happens in as simplified
>> of a test case as possible, to determine the correct behaviour.
>>
>> Note that you have to be very slow and methodical in changing these things,
>> as you can easily make a change which will close the connection in your
>> case, but which is wrong in the case where shutdown(2) is being used on a
>> connection which may, in fact, be long-lived. That's the sort of mistake
>> that people tend to make in working on middleboxes and proxies: overfitting.
>> In this case, it sounds like you're reproducing a case where the EOS
>> machinery isn't running properly, but without digging into it, it's hard to
>> be sure what's being too conservative, and how to fix it without breaking
>> other things. If I were reproducing and testing the issue, my expectation
>> would be that it would come out to be a fairly simple fix in most cases, but
>> I've been wrong about that before.
>>
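To illustrate why treating end-of-stream as a full teardown can be wrong, the sketch below shows the half-close semantics of shutdown(2) mentioned above, using a local UNIX-domain socketpair(2) rather than TCP. After one side calls shutdown(SHUT_WR), the other side sees a zero-length read, yet data still flows in the opposite direction. read_until_eos() and demo_half_close() are hypothetical helpers; they do not describe how WANProxy itself maps EOS onto socket calls.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>

// Reads from fd until read(2) returns 0 (end of stream) or fails.
// Returns true if the loop ended on a clean end-of-stream.
static bool read_until_eos(int fd, std::string& out) {
    char buf[256];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) { out.append(buf, static_cast<std::size_t>(n)); continue; }
        if (n < 0 && errno == EINTR) continue;
        return n == 0;
    }
}

// Side A sends "ping" and shuts down only its sending direction;
// side B sees EOS but can still reply.
std::string demo_half_close() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return "socketpair failed";
    auto fail = [&sv](const char* what) {
        close(sv[0]);
        close(sv[1]);
        return std::string(what);
    };

    const std::string req = "ping";
    if (write(sv[0], req.data(), req.size()) != static_cast<ssize_t>(req.size())) return fail("write failed");
    if (shutdown(sv[0], SHUT_WR) != 0) return fail("shutdown failed");  // A: "done sending"

    std::string got;
    bool eos = read_until_eos(sv[1], got);  // B: gets "ping", then a zero-length read

    // The B -> A direction is still open after A's half-close.
    const std::string resp = "pong";
    if (write(sv[1], resp.data(), resp.size()) != static_cast<ssize_t>(resp.size())) return fail("reply failed");
    close(sv[1]);  // B is done as well; A sees EOS after the reply

    std::string reply;
    read_until_eos(sv[0], reply);
    close(sv[0]);
    return got + (eos ? " eos " : " no-eos ") + reply;
}
```

demo_half_close() returns "ping eos pong". A proxy that answered the first EOS by closing both directions would lose the "pong"; forwarding the EOS as shutdown(SHUT_WR) and closing only after both directions have finished keeps it.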
>> I'd be shocked if the polling code for kqueue was wrong, and mildly
>> surprised if it were wrong for epoll, given how extremely widely deployed
>> and tested that code is. Your assessment that it's probably in the XCodec
>> protocol stuff is probably right, and I hope any of this is helpful to you.
>> It sounds like you're an accomplished programmer working on WANProxy, so I'm
>> sure you'll be able to figure it out. If you run in verbose (-v) mode, with
>> debugging compiled in, you should see that there are already some debugging
>> statements around these cases. Where there might be some loss of fidelity
>> would be in how errors, rather than simply ordinary end-of-stream, propagate
>> into the pipe system. There's a lot of testing and work I've done on
>> related things, mostly using libuinet, that aren't part of the open source
>> version of WANProxy, so if I had to guess about a location for an issue
>> outside of XCodec, that's where I'd think about looking. Like, the case
>> where Splice::complete is called with an error: the underlying connections
>> should be torn down, but it's possible that's not happening for some reason.
>>
>> Again, just be careful: when changing this kind of thing, overfitting is
>> extremely easy to do. Good luck, and I look forward to hearing what you
>> find! I wish I had time to take a look and provide either a patch or a more
>> helpful set of suggestions myself.
>>
>> Thanks,
>> Juli.
>>
>> On Fri, Mar 16, 2018 at 4:32 PM, Ivan Pizhenko <ivan.pizhenko at gmail.com>
>> wrote:
>>>
>>> Hi Juli,
>>>
>>> I've started exploring Wanproxy code and found that socket event
>>> polling with epoll(), which I use in Linux, is likely done correctly.
>>> To check this, I performed another experiment: I set "codec" to
>>> None on both the server and the client and tried again.
>>> It started to work correctly, exactly as I expected: when I kill the
>>> "server" Wanproxy, the "client" Wanproxy disconnects its client,
>>> but... without any traffic optimization, which I want Wanproxy to do.
>>> So the issue must be inside XCodec. Can you please help me identify
>>> it and recommend how to fix it?
>>>
>>> Ivan.
>>>
>>>
>>> 2018-03-15 6:43 GMT+02:00 Ivan Pizhenko <ivan.pizhenko at gmail.com>:
>>> > Hi Juli,
>>> >
>>> > I have managed to install a couple of FreeBSD 11-RELEASE VMs (that was
>>> > really tricky, but setting up the second one was easier than the
>>> > first), built Wanproxy on them and ran the same experiment.
>>> > I tried a few combinations: everything local on the same
>>> > Linux/FreeBSD machine, and the client on one Linux/FreeBSD machine
>>> > with the server on a different Linux/FreeBSD machine.
>>> > The result was the same in all cases: when the "server" Wanproxy goes
>>> > down, the "client" Wanproxy does not disconnect its client. So I think
>>> > there must be a major issue in the Wanproxy logic.
>>> > I haven't reviewed the source code in depth yet, but can you please
>>> > confirm: do you really think that the current implementation should
>>> > propagate the connection state correctly inside the "client" Wanproxy?
>>> >
>>> > Also, I got a Wanproxy crash on FreeBSD when I attempted to specify the
>>> > server VM name in the client Wanproxy config.
>>> > I put the following into my client.conf:
>>> >
>>> > create peer peer0
>>> > set peer0.family IP
>>> > set peer0.host "wptest1"
>>> > set peer0.port "3301"
>>> > activate peer0
>>> >
>>> > This gave me the following error (and a crash right after it):
>>> > 1521079851.327281 [/socket/address] ERR: bool socket_address::operator()(int, int, int, const string&): Could not look up [wptest1]:3301: hostname nor servname provided, or not known
>>> > 1521079851.327354 [/socket/handle] ERR: static SocketHandle* SocketHandle::create(SocketAddressFamily, SocketType, const string&, const string&): Invalid hint: [wptest1]:3301
>>> > ./client.sh: line 1: 13501 Segmentation fault (core dumped) ./wanproxy -c client.conf
>>> >
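The log suggests that SocketHandle::create() gave up after the lookup failed and that the missing handle was used afterwards; the crash was not traced in this thread, so that is an assumption. The message "hostname nor servname provided, or not known" appears to be the gai_strerror(3) text for EAI_NONAME on FreeBSD. The sketch below shows the defensive pattern with plain getaddrinfo(3): the failure comes back to the caller as a code and a message instead of as a null result. Resolution and resolve_tcp() are hypothetical names, not WANProxy's socket_address API.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <cassert>
#include <string>

// Outcome of a lookup: rc is 0 or a getaddrinfo EAI_* code.
struct Resolution {
    int rc = 0;
    int count = 0;       // number of addresses returned on success
    std::string error;   // gai_strerror() text on failure
};

// Resolve host:port for a TCP connection. The caller must check rc before
// using any address. numeric_only sets AI_NUMERICHOST, which skips name
// lookup entirely.
Resolution resolve_tcp(const std::string& host, const std::string& port, bool numeric_only) {
    addrinfo hints{};
    hints.ai_family = AF_UNSPEC;     // IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = numeric_only ? AI_NUMERICHOST : 0;

    Resolution r;
    addrinfo* res = nullptr;
    r.rc = getaddrinfo(host.c_str(), port.c_str(), &hints, &res);
    if (r.rc != 0) {
        r.error = gai_strerror(r.rc);
        return r;                    // nothing to free on failure
    }
    for (addrinfo* ai = res; ai != nullptr; ai = ai->ai_next) ++r.count;
    freeaddrinfo(res);
    return r;
}
```

With numeric_only = true, resolve_tcp("wptest1", "3301", true) fails immediately with a non-zero rc and a message, while "192.168.150.11" succeeds; with numeric_only = false the name goes through the system resolver.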
>>> > Note that on Linux this worked fine.
>>> > I had name resolution configured through WINS (Samba), i.e. Samba
>>> > running with a valid config and wins added to
>>> > /etc/nsswitch.conf:
>>> >
>>> > hosts: files wins dns
>>> >
>>> > Note that ping reached that host successfully:
>>> >
>>> > $ ping wptest1
>>> > PING wptest1 (192.168.150.11): 56 data bytes
>>> > 64 bytes from 192.168.150.11: icmp_seq=0 ttl=64 time=0.266 ms
>>> > 64 bytes from 192.168.150.11: icmp_seq=1 ttl=64 time=0.234 ms
>>> > 64 bytes from 192.168.150.11: icmp_seq=2 ttl=64 time=0.381 ms
>>> > 64 bytes from 192.168.150.11: icmp_seq=3 ttl=64 time=0.382 ms
>>> > 64 bytes from 192.168.150.11: icmp_seq=4 ttl=64 time=0.269 ms
>>> > ^C
>>> > --- wptest1 ping statistics ---
>>> > 5 packets transmitted, 5 packets received, 0.0% packet loss
>>> > round-trip min/avg/max/stddev = 0.234/0.306/0.382/0.063 ms
>>> >
>>> > But wanproxy crashed.
>>> > I had to specify the IP address (192.168.150.11) instead of the name
>>> > (wptest1) to work around this.
>>> > On Linux, it works with either the IP address or the host name.
>>> >
>>> > WBW, Ivan.
>>> >
>>> >
>>> > 2018-03-07 5:01 GMT+02:00 Juli Mallett <juli at clockworksquid.com>:
>>> >> Hi Ivan,
>>> >>
>>> >> I don't know the Linux TCP/IP stack, unfortunately, so I can't be any
>>> >> help there. In your case, I think you might want to consider adding, or
>>> >> having someone add, a simple heartbeat mechanism to the xcodec protocol
>>> >> in WANProxy.
>>> >>
>>> >> Thanks,
>>> >> Juli.
>>> >>
>>> >> On Tue, Mar 6, 2018 at 6:15 PM, Ivan Pizhenko <ivan.pizhenko at gmail.com>
>>> >> wrote:
>>> >>>
>>> >>> Hi Juli,
>>> >>>
>>> >>> Thanks for replying to my email.
>>> >>>
>>> >>> I am using Linux. I have set up VirtualBox VM with Xubuntu 16.04 LTS
>>> >>> with latest HWE kernel 4.13 and all latest updates. I have not tuned
>>> >>> any OS options related to networking and TCP/IP protocol. I am not
>>> >>> using libuinet. I am not targeting FreeBSD, I need to have it working
>>> >>> on Linux, primarily on Ubuntu Server.
>>> >>>
>>> >>> I was also expecting that the connection would be reset after some
>>> >>> reasonable timeout, but that didn't happen (or maybe I didn't wait
>>> >>> long enough? I remember it was at least 10 minutes). So the present
>>> >>> mechanism doesn't seem to work. Thanks, a heartbeat is an interesting
>>> >>> idea, but perhaps there is something we can do via TCP connection
>>> >>> settings that we haven't done yet? I am not a big specialist in TCP
>>> >>> protocol settings, but I suppose you know this area better, so I am
>>> >>> asking whether you can recommend something else. If nothing more can
>>> >>> be done, then sure, I will need to implement a heartbeat.
>>> >>>
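On the TCP side, one option is kernel keepalive, which probes an idle connection and makes pending reads and writes fail once the peer stops answering. Whether WANProxy enables it was not established in this thread, so this is a sketch under that assumption. On Linux, keepalive stays off unless a socket sets SO_KEEPALIVE, and the system-wide default of net.ipv4.tcp_keepalive_time = 7200 s before the first probe is far longer than the 10 minutes mentioned above; the per-socket TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options override it. enable_keepalive() and get_int_opt() are hypothetical helpers, and the values below are illustrative.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <cassert>

// Turn on TCP keepalive for one socket and override the system-wide timing:
//   idle_s     - seconds of idle time before the first probe (TCP_KEEPIDLE)
//   interval_s - seconds between unanswered probes (TCP_KEEPINTVL)
//   probes     - unanswered probes before the connection is dropped (TCP_KEEPCNT)
// Once the peer is declared dead, pending reads/writes on fd fail (typically
// with ETIMEDOUT). Returns 0 on success, -1 on the first failing setsockopt().
int enable_keepalive(int fd, int idle_s, int interval_s, int probes) {
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0) return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0) return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) != 0) return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) != 0) return -1;
    return 0;
}

// Read back an integer socket option; returns -1 on error.
int get_int_opt(int fd, int level, int name) {
    int value = 0;
    socklen_t len = sizeof value;
    return getsockopt(fd, level, name, &value, &len) == 0 ? value : -1;
}
```

For example, enable_keepalive(fd, 60, 10, 6) on a connected socket would report a silently vanished peer after roughly 60 + 10 * 6 = 120 seconds. Keepalive only detects a dead peer; it would not help when the peer's close has been seen but the EOS is lost inside the proxy.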
>>> >>> Ivan.
>>> >>>
>>> >>>
>>> >>> 2018-03-06 3:48 GMT+02:00 Juli Mallett <juli at clockworksquid.com>:
>>> >>> > Hi Ivan,
>>> >>> >
>>> >>> > WANProxy should pass along state when a stream is closed from end to
>>> >>> > end, not perfectly, but your connection should be properly reset at
>>> >>> > some point from the server going away. There isn't anything that can
>>> >>> > be done in a protocol-neutral way that exceeds that, but that should
>>> >>> > be good enough for most uses. Of course there are things that can
>>> >>> > disrupt the TCP state machine, or settings on a system can mean that
>>> >>> > connections aren't timed out when they should be.
>>> >>> >
>>> >>> > Are you using libuinet, FreeBSD, Linux, or something else for the
>>> >>> > TCP/IP stack?
>>> >>> >
>>> >>> > An easy change would be to add a heartbeat on all active sessions with
>>> >>> > WANProxy to actively probe for disconnected peers, but I'm not sure I'd
>>> >>> > encourage that. If you think that would be helpful to you, let me know.
>>> >>> >
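A heartbeat of the kind suggested here mostly needs bookkeeping about when the peer was last heard from. The sketch below covers only that timing logic; what a heartbeat message would look like in the XCodec protocol is left open. HeartbeatMonitor, its interval and max_missed parameters, and the Action values are all hypothetical, and times are passed in explicitly so the logic can be tested without sleeping.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical per-session heartbeat bookkeeping: decides when to send a
// probe and when to give up on a silent peer.
class HeartbeatMonitor {
public:
    using Clock = std::chrono::steady_clock;
    enum class Action { None, SendHeartbeat, PeerDead };

    HeartbeatMonitor(std::chrono::milliseconds interval, int max_missed, Clock::time_point now)
        : interval_(interval), max_missed_(max_missed), last_rx_(now), last_tx_(now) {
        assert(interval.count() > 0 && max_missed > 0);
    }

    // Call whenever anything (data or a heartbeat) arrives from the peer.
    void on_receive(Clock::time_point now) { last_rx_ = now; }

    // Call whenever anything is sent; real traffic counts as a heartbeat.
    void on_send(Clock::time_point now) { last_tx_ = now; }

    // Call periodically from a timer. PeerDead means: tear the session down.
    Action tick(Clock::time_point now) {
        if (now - last_rx_ >= interval_ * max_missed_) return Action::PeerDead;
        if (now - last_tx_ >= interval_) {
            last_tx_ = now;
            return Action::SendHeartbeat;
        }
        return Action::None;
    }

private:
    std::chrono::milliseconds interval_;
    int max_missed_;
    Clock::time_point last_rx_;
    Clock::time_point last_tx_;
};
```

With an interval of 1 s and max_missed = 3, a session that hears nothing from its peer for 3 s is reported as PeerDead, at which point the proxy would close the client connection. Juli's reservation above still applies; this only illustrates the mechanism.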
>>> >>> > Thanks,
>>> >>> > Juli.
>>> >>> >
>>> >>> > On Sat, Feb 24, 2018 at 1:09 AM, Ivan Pizhenko
>>> >>> > <ivan.pizhenko at gmail.com>
>>> >>> > wrote:
>>> >>> >>
>>> >>> >> Hi,
>>> >>> >>
>>> >>> >> I am running some tests with Wanproxy to understand how stable
>>> >>> >> and reliable it is. I am using the latest Wanproxy code from
>>> >>> >> GitHub and working on Ubuntu 16.04 LTS with kernel 4.13 and all
>>> >>> >> latest updates.
>>> >>> >>
>>> >>> >> I conducted the following simple test:
>>> >>> >>
>>> >>> >> I installed Apache 2 HTTP Server locally and put a large file
>>> >>> >> into the document root. Then I configured, also locally, a
>>> >>> >> "client" and a "server" Wanproxy, similar to how it is described
>>> >>> >> in the examples section on wanproxy.org but without an ssh tunnel
>>> >>> >> between them, to proxy Apache's HTTP port. Then I used wget to
>>> >>> >> download that large file through the "client" Wanproxy. It worked
>>> >>> >> fine, but slower than a direct download from Apache. Then I tried
>>> >>> >> the same thing, but shut down the "server" Wanproxy somewhere in
>>> >>> >> the middle of the download. The download froze and made no
>>> >>> >> further progress. When I restarted the "server" Wanproxy, the
>>> >>> >> download did not resume. When I shut down the "client" Wanproxy,
>>> >>> >> wget showed an error like "connection refused" and exited.
>>> >>> >>
>>> >>> >> I would expect that when the "server" Wanproxy goes down, the
>>> >>> >> "client" one would disconnect the clients connected to it to
>>> >>> >> indicate that the upstream link is broken, if not immediately,
>>> >>> >> then after some reasonable timeout. Is there a way to achieve
>>> >>> >> something like this with Wanproxy? If not, what changes to
>>> >>> >> Wanproxy are needed to enable such functionality?
>>> >>> >>
>>> >>> >> Ivan.
>>> >>> >> _______________________________________________
>>> >>> >> wanproxy mailing list
>>> >>> >> wanproxy at lists.wanproxy.org
>>> >>> >> https://wanproxy.org/listinfo.cgi/wanproxy-wanproxy.org
>>> >>> >
>>> >>> >
>>> >>
>>> >>
>>
>>