(Sorry in advance this is so long...I'm trying to braindump as much of my troubleshooting notes as possible in case any of it clicks with anyone...)
I have been troubleshooting numerous server errors being returned to the browser. This has been happening occasionally for a while -- perhaps a year -- but has increased greatly in the past month or so, to the point of causing substantial problems for our K-12 students' achievement.
Students occasionally get a 500 status code and this web page:
This server has encountered an internal error which prevents it
from fulfilling your request. The most likely cause is a misconfiguration.
Please ask the administrator to look for messages in the server's error log.
I'm fairly certain this error is generated by the web server software (Sun Java System Web Server 7.0) It shows a line such as this in the error log every time the error occurs:
|[26/Oct/2012:12:44:32] failure ( 1460): for host 220.127.116.11 trying to POST /cgi-stu/foxweb.exe/quiz@/db/stu/stu, cgi_scan_headers reports: HTTP4044: the CGI program e:/cgi/pub/foxweb.exe did not produce a valid header (program terminated without a valid CGI header. Check for core dump or other abnormal termination)
...and this in the access log:
|18.104.22.168 - [26/Oct/2012:12:44:32 -0400] "POST /cgi-stu/foxweb.exe/quiz@/db/stu/stu HTTP/1.1" 500 305 "http://dl.ket.org/cgi-stu/foxweb.exe/quiz@/db/stu/stu" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4" POST /cgi-stu/foxweb.exe/quiz@/db/stu/stu - HTTP/1.1 www.dl.ket.org
These two logs are the only place I can reliably detect the error has occurred, and unfortunately they're not very helpful in exploring the cause of the error.
The problem seems loosely related to traffic load, but even then, we are not heavily loaded. If a class of 20 students are all told to open a particular page at once, there are fair odds the error will occur for at least some of them, and it seems to snowball when it does occur. The problem is sporadic -- a reload will often succeed.
I cannot recreate it on demand, but do occasionally see it myself.
There are no entries in FWStart.log, the errors table, or Windows event log that correspond to the appearance of these errors. I do some logging ASAP in fw_enter.prg, and no activity registers there for these hits. I suspect foxweb.exe is not getting far enough to connect to vfp, or perhaps foxweb.exe is not even being called by the web server correctly in the first place.
We lose FoxWeb channels over time. We used to run a dozen channels, and would occasionally find only five present in fwstatus a week later. Early this week, I increased this to 20 channels and 3,000MB just to ensure more channels will survive.
The first entry in our FWStart.log is about two years old: 11/30/2010 16:28:02 Starting FoxWeb . Two days ago, this error appeared for the first time in FWStart.log:
10/23/2012 12:49:08 Error: 1429 Message: "OLE IDispatch exception code 61704 from FoxWebCtrl: Internal application error..." Program: "PROCEDURE FWMAIN C:\PROGRAM FILES\FOXWEB\FWSERVER.EXE" Line Number: 176 Line: "..." Channel: 16
It has happened 274 times since then.
Later that day, I saw this error for the first time ever:
|10/23/2012 13:42:35 Error: 2005 Message: "Error loading file - record number 3. FRMSCK <or one of its members>. Loading form or the data environment : There is not enough memory to complete this operation" Program: "PROCEDURE FWMAIN C:\PROGRAM FILES\FOXWEB\FWSERVER.EXE" Line Number: 165 Line: "..." Channel: 7
That error has occurred 118 times since then.
I've also recently started seeing various channels get stuck in 30-second restart cycles:
|10/25/2012 05:18:48 Restarting Channel 20
10/25/2012 05:18:52 Restarting Channel 15
10/25/2012 05:18:57 Restarting Channel 19
10/25/2012 05:19:14 Restarting Channel 6
10/25/2012 05:19:18 Restarting Channel 20
10/25/2012 05:19:22 Restarting Channel 15
10/25/2012 05:19:27 Restarting Channel 19
10/25/2012 05:19:44 Restarting Channel 6
10/25/2012 05:19:48 Restarting Channel 20
10/25/2012 05:19:52 Restarting Channel 15
10/25/2012 05:19:57 Restarting Channel 19
10/25/2012 05:20:14 Restarting Channel 6
10/25/2012 05:20:18 Restarting Channel 20
10/25/2012 05:20:22 Restarting Channel 15
10/25/2012 05:20:27 Restarting Channel 19
10/25/2012 05:20:44 Restarting Channel 6
10/25/2012 05:20:48 Restarting Channel 20
10/25/2012 05:20:52 Restarting Channel 15
10/25/2012 05:20:57 Restarting Channel 19
10/25/2012 05:21:14 Restarting Channel 6
10/25/2012 05:21:18 Restarting Channel 20
10/25/2012 05:21:22 Restarting Channel 15
10/25/2012 05:21:27 Restarting Channel 19
10/25/2012 05:21:44 Restarting Channel 6
10/25/2012 05:21:48 Restarting Channel 20
10/25/2012 05:21:52 Restarting Channel 15
10/25/2012 05:21:57 Restarting Channel 19
10/25/2012 05:22:14 Restarting Channel 6
Our FoxWeb script timeout is 30 seconds. As a test, I changed it to 60 seconds and restarted FoxWeb. Surprisingly, the restart interval is still 30 seconds:
10/26/2012 15:50:31 Restarting Channel 4
10/26/2012 15:50:59 Restarting Channel 20
10/26/2012 15:51:01 Restarting Channel 4
10/26/2012 15:51:29 Restarting Channel 20
10/26/2012 15:51:31 Restarting Channel 4
10/26/2012 15:51:59 Restarting Channel 20
10/26/2012 15:52:01 Restarting Channel 4
10/26/2012 15:52:29 Restarting Channel 20
10/26/2012 15:52:31 Restarting Channel 4
10/26/2012 15:52:50 Closing FoxWeb
10/26/2012 15:52:55 Starting FoxWeb
10/26/2012 15:53:34 Restarting Channel 18
10/26/2012 15:53:35 Restarting Channel 19
10/26/2012 15:53:35 Restarting Channel 20
10/26/2012 15:54:04 Restarting Channel 18
10/26/2012 15:54:05 Restarting Channel 19
10/26/2012 15:54:05 Restarting Channel 20
10/26/2012 15:54:35 Restarting Channel 19
10/26/2012 15:54:35 Restarting Channel 20
10/26/2012 15:55:05 Restarting Channel 19
Looking in Windows Event Log has not revealed much. I do see these:
From the Windows Event Log, Application Log:
|Event Type: Error
Event Source: Application Error
Event Category: (100)
Event ID: 1000
Time: 7:41:35 AM
Faulting application vfp9.exe, version 22.214.171.1242, faulting module vfp9.exe, version 126.96.36.1992, fault address 0x0002f454.
I see lots of these, but not necessarily in synch with anything in FWStart.log, or the observed "Server Error" messages appearing in the browser. I also rarely see a dialogue box on the server's desktop announcing VFP has died and asking if I want to report it to Microsoft. I've seen that a handful of times this week -- normally I won't see those for months.
This is probably not relevant, but if I have FWStatus running when I restart FoxWeb, the FWStatus window will sometimes (not always) start flickering, redrawing itself constantly and ignoring input. It taps CPU utilization, and I have to kill it via Task Manager. I've gotten into the habit of stopping FWStatus before restarting FoxWeb.
As an odd twist, fwadmin.fwx works, except for show_channels.fwx. In fact, you can see the Server Error I'm chasing by hitting that page: http://www.dl.ket.org/cgi-sta/foxweb.exe/show_channels.fwx That is the *only* link on the site that consistently generates the Server Error...but I do not think it is failing for the same reason as the sporadic errors our users are seeing.
Here is our Server Info page (with some data redacted)
|Virtual Program Root
|Main Program Root
||10/26/2012 04:18:12 PM
||Visual FoxPro 09.00.0000.2412 for Windows
|VFP Compile Version
|VFP Temporary Path
|Buffer Memory Size
||Company: KET Servers: 1
© Copyright Aegis Group
Well...that's about it for my observations.I can share more complete logs upon request.
Here is what I'm doing to try to solve this:
In the past few days, I have forced a recompile of every .prg and .fwx file on the site (some were been compiled ten years ago, under who-knows-what version of VFP) after reading the FoxWeb release notes. No improvement.
I've been applying some long-overdue Windows Updates. Not finished yet, but the "critical" updates are all applied at least.
I even ran a memory test suite (memtest86) with no errors. (I'm willing to pursue *any* option right now!)
Last year, we moved to new/fresh/virtual/physical servers in various configurations without improvement. The problem was less pronounced then, so we shrugged through it. The errors are now to a level that is not acceptable to our students.
Things I have on my roadmap to try:
- Upgrade FoxWeb 4.3 to the latest (6.6.3)
- Continue all Windows Updates
- Check for VFP updates
- Update Sun Server to latest version
- Replace Sun server with IIS (hope to avoid this...we have no expertise with IIS)
Whew...if you're still reading this, thanks for your persistence. If anyone has any suggestions, I'd GREATLY appreciate any insight.
Director of Broadcast and Educational Technology
Kentucky Educational Television
(859)258-7164 - (800)333-9764