Gerald,
We do have window scaling enabled, but it’s hard-set at 32 KB per HP’s recommendations. I will recommend we “supersize” it when we discuss this issue again.
Bill,
I’m starting to see the pattern here. I’ll recommend to HP that we set the window as large as it’ll go. They’re following a chart in their user guide that, based on CIR and latency, tells them what window size and scaling factor to set. Their magic formula indicates we should be using 32 KB, but in my tests with iperf, larger windows produced greater data rates. I suspect the window is not the root issue, though, as replication still fails with a high count of TCP timer expired errors.
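HP’s chart presumably comes down to the bandwidth-delay product: to keep a path full, a sender needs roughly bandwidth × RTT bytes in flight, and a single TCP connection can move at most one window per round trip. The sketch below works through that arithmetic in Python using the ~36 ms ping time and the 1 Gbps link rate from this thread. It assumes “32kb” means 32 KiB and uses the link rate because the CIR isn’t stated; the function names are illustrative, not anything from HP’s guide.

```python
import math


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps * rtt_s / 8


def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on single-connection TCP throughput: one window per RTT."""
    return window_bytes * 8 / rtt_s


def min_wscale(window_bytes: float) -> int:
    """Smallest RFC 7323 window scale shift so that 65535 << shift covers the window.

    The shift is capped at 14, the maximum the RFC allows.
    """
    shift = 0
    while (65535 << shift) < window_bytes and shift < 14:
        shift += 1
    return shift


if __name__ == "__main__":
    rtt = 0.036  # ~36 ms ping time observed between sites
    link = 1e9   # 1 Gbps link rate; assumption: the CIR may be lower
    print(f"BDP at 1 Gbps: {bdp_bytes(link, rtt):,.0f} bytes")
    print(f"32 KiB window cap: {max_throughput_bps(32 * 1024, rtt) / 1e6:.1f} Mbps")
    print(f"Window scale needed: {min_wscale(bdp_bytes(link, rtt))}")
```

At 36 ms, a 32 KiB window caps a single connection at about 7.3 Mbps, while filling 1 Gbps would need about 4.5 MB in flight and a window scale of at least 7. Plugging in the actual CIR instead of the link rate gives the figure the chart should be aiming for.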
Martin,
The near side is 10.244.249.31/32 and the far side is 10.245.249.31/32. I think the IP addressing is fine; otherwise, we would have worse problems and replication would not even start.
We’re following HP’s recommendations, and so far they’ve wanted to set the window at 32 KB. I agree with you that it should be as large as possible, but for now, this is HP’s show. I will recommend a larger window the next time we discuss the issue with them.
I agree there is a configuration problem or timeout that needs to be adjusted, or possibly a malfunctioning NIC in their FC gateways.
Now the update:
I discovered that if I disable the FCP decode, Wireshark correctly decodes the traffic as FCIP.
We applied a QoS config to mark SAN replication traffic as DSCP EF, and we have seen consistent ping times of ~36 ms between sites and bandwidth climbing as high as 45 Mbps on a 1 Gbps link. The jobs still fail after replicating for a few hours; last time we watched them replicate for 12 hours before failing. The TCP timer expired counter seems to point to that as the problem, but I have found nothing significant in the Wireshark captures to support this.
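For reference, DSCP EF is code point 46 (RFC 3246), and DSCP occupies the upper six bits of the old IPv4 TOS byte, so EF-marked packets carry 0xB8 in that byte. The sketch below is a minimal Python illustration, assuming a Linux or macOS host where `socket.IP_TOS` is honored; the helper names are hypothetical and this is not the QoS config we applied on the network.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding code point (RFC 3246)


def tos_from_dscp(dscp: int) -> int:
    """Shift a 6-bit DSCP value into the TOS byte; the low two bits (ECN) stay 0."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def ef_marked_udp_socket() -> socket.socket:
    """Create a UDP socket whose outgoing IPv4 packets are marked DSCP EF."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_from_dscp(DSCP_EF))
    return sock


if __name__ == "__main__":
    print(hex(tos_from_dscp(DSCP_EF)))
```

Sending a few packets from such a socket and checking the DS field in Wireshark at each site (EF shows as 0xb8) is one way to see whether the marking survives end to end or gets re-marked along the path.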
HP has decided that the MPX110 on the far side needs to be replaced. I’ll post an update after that’s done.