GCN System Status

When something happens to the system that causes a loss of service,
status information will be posted here.

08 Aug 08 3 BRIEF SWIFT GCN OUTAGES (9min, 56min, & 45min):
There were 3 outages of the Swift_TDRSS-to-GCN connection: 02:31-02:40,
09:47-10:43, and 10:56-11:40 UT. There were no Swift bursts during these intervals.
None of the other mission connections or GCN services were affected.

08 Jun 08 GCN OUTAGE (14.1 hours):
At 01:3? UT (08jun08) there was a problem with the power that caused all the computers
in the Low Energy Gamma-ray Group to shut down (this includes the GCN computer: capella).
This is a repeat of yesterday's problem!
Between 01:3? and 15:48 UT (08jun08) GCN was off-line.
Currently, the power is OK and the GCN programs have been restarted.
Since there was some suspicion that the UPSs were the cause of the outage,
and since it repeated within 24 hours, I replaced the UPSs with new units
(bought as replacements and scheduled for replacement in the near future anyway).
Time will tell if this fixes the problem. (We have been having T-storms in the
last few days, and there have been numerous power outages lasting tens to thousands of milliseconds.)

07 Jun 08 GCN OUTAGE (1.1 hours):
At 10:54 UT (07jun08) there was a problem with the power that caused all the computers
in the Low Energy Gamma-ray Group to shut down (this includes the GCN computer: capella).
(The root cause of this power glitch is as yet unknown.)
Between 10:54 and 12:04 UT (07jun08) GCN was off-line.
Currently, the power is OK and the GCN programs have been restarted.

14 May 08 SWIFT TDRSS OUTAGE (1.1 hours):
Between 19:43 and 20:52 UT (14may08) there was a problem in the socket connection between
the TDRSS Ground Station and GCN. This resulted in a loss of the telemetry stream
from the Swift spacecraft to GCN. All the rest of GCN functionality and connectivity
to all the other missions was OK. There were no Swift bursts during this interval.

05 Apr 08 GCN OUTAGE (3.6 + 2.3 hours):
The Goddard-wide main gateway/router and the Building 2 gateway were being upgraded today.
All GCN functionality/services were off-line (Notices, Circulars, Reports;
incoming and outgoing). The outage started at 10:32 UT and ended at 14:11 UT.
Then the network came back for 1.0 hours, then stopped from 15:10 to 17:25 UT (2.3 hrs).

19 Mar 08 GCN SLOWDOWN (2.0 hours):
Due to the massive amount of traffic from the back-to-back bursts tonight,
the GCN computer is suffering from a high load factor. This has resulted
in delayed email delivery (both Notices and Circulars). Please note
that this has NOT affected the socket distribution -- those Notices went out in milliseconds
for both bursts. But the email delivery of the Notices and Circulars has been delayed,
especially for the Circulars (several hours for some customers, and more so
for the later Circulars submitted in tonight's series of many follow-up observations).
I have taken steps to clear the email backlog. The distribution rate is increasing,
but there is still a backlog.
My apologies for the inconvenience and confusion these late emails have caused.

12 Mar 08 GCN OUTAGE (2.0 hours):
From 16:50 to 18:50 UT, GCN was off-line (Goddard network switch-over problems).
All of GCN was out: the socket connections, the Notices, the Circulars.
The Goddard network people said the upgrade switch-over would take 5 seconds,
but after doing the upgrade they discovered that some routers downstream of the one being upgraded
were no longer compatible with the new upgraded unit.

23 Jan 08 NOTICES OUTAGE (9 hours):
From 04:30 to 13:25, the Notices portion of GCN was off-line (due to a software problem).
The problem started a few minutes after the last Notice was distributed for GRB 080123,
so no information on that burst was lost to the community. (And of course the Circulars portion
continued to function normally.)

13 Jan 08 BRIEF LOSS OF SOME OF THE WEB PAGES:
From around noon 12 Jan to 11am 13 Jan 08, about 20% of the top-level web pages
in the GCN web site were deleted (due to a stupid mistake on my part).
Everything should be back in place now. If you notice anything missing/old
please tell me (as is always a standing request on anything/anytime
you see wrong or could use improvement).

01 Aug 07 ACCIDENTAL RE-DISTRIBUTION OF SWIFT-BAT_POSITION NOTICE:
While testing out some new code to use the Swift_MOC SERS messages as a backup
to the real-time TDRSS feed (for use during a TDRSS outage like last week's), I accidentally
distributed a BAT_POSITION Notice for GRB 070729. I thought I had the block2world
filter active for this test, but it was not.

26 Jul 07 SOLUTION TO IN-LIMBO SOCKET PROBLEM:
The problem that caused the loss of notification of the two Swift bursts 5 days ago
has been solved. Normally, there are demons and watchdogs in place that monitor
for the loss of any of the socket connections between the various programs
that make up the GCN system. But last Saturday a new problem occurred that left
the socket connections in place, but they were not actually able to pass data.
A new demon/watchdog is in place (and tested) that detects this "in limbo" problem,
and it alerts me within 2 minutes of its occurrence.
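A watchdog of this kind can be sketched as a "time since last data" check. This is a minimal sketch, not the actual GCN demon: the class name, the 120-second limit (taken from the 2-minute alert time above), and the assumption that the monitored link normally carries regular traffic (so a long silence is itself the symptom) are all illustrative.

```python
import time
from typing import Callable

# Alert threshold in seconds (the entry above says alerts come within 2 minutes).
STALL_LIMIT_S = 120.0


class LimboWatchdog:
    """Flags a connection that is still open but has stopped passing data.

    Hypothetical sketch: assumes the monitored link normally carries regular
    traffic, so a long silence is itself the symptom of an "in limbo" socket.
    """

    def __init__(self, limit_s: float = STALL_LIMIT_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit_s = limit_s
        self.clock = clock
        self.last_data = clock()  # start the timer when monitoring begins

    def on_data(self) -> None:
        """Call whenever any bytes arrive on the monitored socket."""
        self.last_data = self.clock()

    def in_limbo(self) -> bool:
        """True if no data has arrived for longer than limit_s seconds."""
        return self.clock() - self.last_data > self.limit_s
```

A polling loop would call in_limbo() periodically and page the operator when it returns True; injecting the clock makes the logic testable without actually waiting.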

21 Jul 07 TWO BURSTS LOST DUE TO AN INTERNAL GCN COMMUNICATIONS PROBLEM:
Two Swift bursts were not distributed to the world because of a problem
with the communications between two programs within the suite of programs that make up GCN.
See the full announcement.

09 Mar 07 22:00 UT CIRCULARS and NOTICES WEB PAGE UPDATE DELAYS:
The archive pages for the Circulars and Notices were delayed in being updated
because the computer sys-administrators here in building 2 at Goddard were upgrading
all the machines with new op-systems that will handle the new Daylight Savings Time change correctly.
This lasted for about 3-5 hours this afternoon.
All is fixed now.

06 Feb 07 20:00 UT NETWORK PROBLEMS AT GODDARD THIS WEEKEND:
The Goddard Center Network people were working on part of the network
and broke the connection of the TDRSS Swift telemetry stream for 4.4 hrs (15:19-19:43 UT).
Towards the end of that window Swift-BAT triggered on what turned out to be
a cosmic ray shower in the spacecraft and BAT instrument.

08 Jan 07 19:00 UT    NETWORK PROBLEMS AT GODDARD THIS WEEKEND: REALLY FIXED:
The final problem was solved, and now the email traffic is flowing with no delays.
08 Jan 07 14:00 UT    NETWORK PROBLEMS AT GODDARD THIS WEEKEND: STILL RESIDUAL PROBLEMS:
Well, I/they spoke too soon. Most of the functions came back, but there are still
some residual delays in some email deliveries. From my limited testing,
the delays now seem to be in the 0.5 - 2 min range.
07 Jan 07 23:50 UT    NETWORK PROBLEMS AT GODDARD THIS WEEKEND: FIXED:
The Goddard Network people were doing some modifications to the network
this weekend and never bothered to announce it -- not to worry; they are going to
catch it from Goddard management about this snafu. Things seem almost
back to normal (as of midnight). There appears to still be a few emails trickling
out of the backlogged queues, but for the most part, things are flowing again.
07 Jan 07    NETWORK PROBLEMS AT GODDARD THIS WEEKEND:
There are on-going network problems inside GSFC that are causing delays
in the distribution of email-based notifications (the socket connections are fine;
only the emails are slow). The delays are time variable and range from 1-5 min.
Requests have been submitted to the IT Service branch.

30 Nov 06 GCN OUTAGE (3.0 hrs):
There were network problems inside GSFC that caused GCN to be effectively off-line
for up to 3.0 hours (10:39 to 13:42 UT). I say "up to" because some socket sites
were still connected up to 11:16 UT and some communications were restored
before 13:42 UT. Normal operations have been restored.

28 Nov 06 SWIFT TDRSS DATA CONNECTION TO GCN OUTAGE:
From 10:11 to 14:10 UT, the telemetry connection for the Swift TDRSS data feed to GCN was out.
The outage is over and data is flowing again (total loss: 239 minutes).
(From the full data sets, we know that there were no Swift bursts during that time.)

31 Oct 06 PROBLEMS DURING ON-ORBIT XRT POINTING TEST (PART 2):
I have fixed the problem that allowed some of the XRT Notices
to be distributed during this afternoon's XRT on-orbit pointing
alignment testing. The problem had to do with the lack of all
the other messages that come down the TDRSS link when a normal
trigger happens. During this test _all_ the other messages
(all the BAT-related, all the FOM-related, and all the UVOT-related
messages were missing). The state-machine within the GCN programs
got screwed up, causing the swift_receiver front-end program to crash.
And when I restarted the GCN programs there was a brief window
(less than 60 sec) when XRT Notices could come down and be distributed
before I could get the block-to-world-distribution command executed.
And since there were many XRT messages coming down TDRSS during the test,
some of them slipped through this brief interval and were distributed.
Both the missing-messages-statemachine problem and the brief-window problem
have been fixed.
This new software has been tested to prove that the state-machine problem is fixed.
And regular burst data has been processed to prove that the normal mode
of operations has not been affected.
At all times during the last 3-4 hours, the normal burst processing capability
was never compromised.
My apologies for the inconvenience.

31 Oct 06 PROBLEMS DURING ON-ORBIT XRT POINTING TEST (PART 1):
The test generated a bunch of messages -- all of which were supposed to be blocked-to-world,
but some of which did get distributed. Please ignore all XRT messages between 15:00 and 15:45 UT today.

13 Sep 06 INTEGRAL-->GCN-->WORLD BACK TO NORMAL:
The Goddard-level IT people have fixed the firewall rule, and I have switched
back to using the normal connection between IBAS and GCN (stopped the "bridge" program).

05 Sep 06 INTEGRAL-->GCN-->WORLD UPDATE:
Yes, the problem was at the GSFC Firewall end. The Goddard-level IT people
were consolidating their waiver rules and in that process copied one of the GCN rules wrong.
This is being fixed. In the meantime, the "bridge" program is working fine.

03 Sep 06 INTEGRAL-->GCN-->WORLD PATCHED:
The "bridge" program running on the U Chicago machine has been running fine
now for about a day. Things seem stable -- messages from IBAS are getting to GCN,
so if another burst happens, all should work fine.
Now there is time to wait until people are back to work (Tuesday for the US)
to see if this is a GSFC firewall problem (or elsewhere).

02 Sep 06 ON-GOING: INTEGRAL-->GCN-->WORLD PROBLEM:
There are on-going problems with the connectivity between GCN and the
INTEGRAL burst information server (aka IBAS). This problem appears
to have started ~23 Aug 06. It was not noticed until shortly after
the INTEGRAL burst GRB 060901 (I received a Circular about the burst
but not a Notice -- a couple other people noticed this lack as well).

Although the connection between GCN and IBAS appeared OK, no POINT_DIR
or TEST messages were being received. Killing and restarting the GCN
program did not establish a good link. Since this problem happened
once before (and it turned out to be a Goddard firewall issue), I used
a machine outside of Goddard to set up a "bridge" between GCN and IBAS.
This bridge changed the socket connection protocol from UDP to TCP/IP
(which was the key to the firewall issue), and it then relayed the INTEGRAL
messages to GCN, thus avoiding the firewall problem.
[Carlo Graziani (U Chicago) kindly provides this machine outside of GSFC
that allows this bridge and other outside-of-Goddard testing activities.]
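A minimal sketch of such a UDP-to-TCP relay, using Python's standard socket module. The function names, ports, and the 4-byte length prefix are hypothetical; the entry does not say how the actual bridge framed messages, but some framing is needed because TCP does not preserve datagram boundaries.

```python
import socket
import struct


def relay_once(udp_sock: socket.socket, tcp_sock: socket.socket,
               bufsize: int = 65535) -> int:
    """Receive one UDP datagram and forward it over a TCP stream.

    TCP is a byte stream without message boundaries, so each datagram is
    prefixed with a 4-byte big-endian length (a hypothetical framing choice).
    Returns the number of payload bytes relayed.
    """
    payload, _sender = udp_sock.recvfrom(bufsize)
    tcp_sock.sendall(struct.pack(">I", len(payload)) + payload)
    return len(payload)


def run_bridge(listen_port: int, gcn_host: str, gcn_port: int) -> None:
    """Hypothetical bridge loop: UDP in from the mission side, TCP out to GCN."""
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.bind(("", listen_port))
    tcp = socket.create_connection((gcn_host, gcn_port))
    with udp, tcp:
        while True:
            relay_once(udp, tcp)
```

The receiving end would read 4 bytes, decode the length, then read exactly that many bytes to recover each original message.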

This worked for about 24 hours and then the POINT_DIR messages (and presumably
anything else that might have been generated) stopped again. This indicates
it is not a Goddard firewall issue this time.

I do not know what the problem is, but I am working to discover and solve it.
(Things are complicated because it is the weekend and system support people
here and at IBAS are not available.) I will keep you posted. (It is always
useful to check the GCN System Status web page to see about this and any
future problem: http://gcn.gsfc.nasa.gov/sys_status.html; and on a broader
scale, check the "what's new" page: http://gcn.gsfc.nasa.gov/whats_new.html .)

This problem affects only INTEGRAL-based notices. The Swift, HETE, XTE, MILAGRO,
and IPN notices are all working fine (ie telemetry or messages are still being received
from these sources and are being distributed).

02 Sep 06 ON-GOING: INTEGRAL-->GCN-->WORLD PROBLEM:
The fixed connection (from yesterday) between INTEGRAL and GCN ran fine for almost 24 hours;
all the POINT_DIR and TEST messages were received as they were sent by IBAS.
But then something happened around 14:00 UT today to stop the flow of messages again.
I am working this on-going problem. (The problem is either with the Goddard firewall
or with IBAS, or GCN's account within IBAS.) But given that we are in the weekend,
the lack of support personnel at both ends makes this effort difficult.

01 Sep 06 FIXED: INTEGRAL-->GCN-->WORLD PROBLEM:
The connection between INTEGRAL and GCN has been restored.

01 Sep 06 INTEGRAL-->GCN-->WORLD PROBLEM:
I am looking into why GCN did not distribute the INTEGRAL Notice on the 060901 burst.
So far, I know it involves only the INTEGRAL->GCN connection. The other mission-based
connections (eg Swift, HETE, XTE) are all ok.

16 Aug 06 NEW GCN MACHINE SWITCH-OVER OUTAGE WAS 18 min:
The old GCN computer was replaced with a new faster machine today.
The GCN system (both Notices and Circulars) was off-line from 14:02 to 14:20 UT.

22 Jul 06 ACTUAL GCN OUTAGE WAS 14 min:
The actual outage was from 15:42 until 15:56 UT (14 min). (This is small compared to the amount of time
they allocated in the original ITN announcement of the outage (see below).)
No bursts were missed.

21 Jul 06 GCN OUTAGE TOMORROW:
The Goddard IT people are replacing the main gateway machine tomorrow (22jul06)
from as soon as 15:00 to as late as 22:00 UT. This will take GCN Notices and Circulars off-line
for up to that entire time interval. See the announcement
for the details.

15 Jul 06 GCN OUTAGE FOR 0.5 HOURS:
While trying to update the active sites.cfg list, the socket connection
to the Swift TDRSS data stream became wedged (this is related to yesterday's firewall blocking changes).
I had to reboot the system to clear it. The outage of the Notices service was 14:11 to 14:44 UT.

14 Jul 06 GCN SOCKET OUTAGE FOR 0.3 HOURS:
The Notices-portion of GCN was off-line from 22:37 until 23:00 UT 14jul06.
The Network Administrators were implementing a wide-ranging set of new blocking rules
in the gateway between the GCN machine and the world. Their first attempt
at setting up these rules with holes for all the GCN socket connections
was not quite right. It took us 23 min to get the rules fixed.
Since some of the socket connections were actually broken (not just suspended),
the outage may have been longer than 23 min for some sites, depending on how long it took GCN and/or your end
to go through each end's reconnection/initialization cycle.

15 Jun 06 GCN NOTICES OUTAGE FOR 9.9 HOURS:
The Notices-portion of GCN was off-line from 16:39 15jun06 until 02:32 UT 16jun06.
The program crashed. (This did NOT affect the Circulars portion of GCN.)

15 May 06 SWIFT OUTAGE FOR 30 MIN:
The Swift-portion of GCN was off-line from 23:28 until 23:58 UT 15 May 06.
This was about an hour after the burst, and resulted in the loss of a few
of the later UVOT data products.

13 May 06 GCN NOTICES OUTAGE:
The GCN Notices system was off-line from 03:08 until 05:12 UT
due to a program crash. This affected only the Notices part of GCN;
the Circulars part continued to work.
The cause of the program crash is being investigated.

26 Apr 06 GCN OPSYS CHANGE OK:
The planned outage to switch to a new operating system on capella
lasted 40 minutes. Both the Notices and Circulars are back on-line.

25 Apr 06 GCN OUTAGE TOMORROW (26apr06):
The GCN System (both Notices and Circulars) will be off-line tomorrow
Wednesday 26 Apr 2006 from 14:00 to 15:30 UT.
NASA has issued a new set of computer security requirements.
The version of RedHat LINUX that GCN is currently running under
is no longer on the approved list, so I have to upgrade.
GCN Notices & Circulars have already been ported and tested on the new OpSys,
so the transition should go smoothly. 90 min has been allocated
for the switch-over, but it will likely take less time.
(If there are problems, this switch-over is being done in a way
which will allow us to go back to the old OpSys.)
This will NOT involve any change apparent to you. The capella name, domain, and
IP number will NOT change (so no firewall changes are needed at your end of things).
I apologize for the somewhat short notice, but the outage of services
should be small (and I want to get this in with sufficient time before
the weekend).

24 Apr 06 TWO 4-MIN OUTAGES WHILE DOING TEST:
The GCN system was offline for about 3-4 min starting at 18:58 and 19:10 UT
while the system was switched over to do a second test of GCN under Scientific LINUX.
The test was successful, and the real switch will likely be later this week or next week.

19 Apr 06 TWO 4-MIN OUTAGES WHILE DOING TEST:
The GCN system was offline for about 3-4 min starting at 18:58 and 19:22 UT
while the system was switched over to do a test of GCN under Scientific LINUX.
This OpSys change is being mandated by the IT Security people at GSFC; RedHat
is no longer allowed. The test was successful.
In between the two times listed, GCN was actually running to the world
under the new Sci-LINUX. Had there been a burst during that (brief) window,
it would have been distributed to the world per normal.

10 Jan 06 SCREWY DATES AND TIMES STILL IN ATTACHMENTS:
Please note that while the screwy dates and times in the Notices were fixed 5 days ago,
they still remain in the titles of the lightcurve plots and images being sent as attachments
(and appearing on the GCN web table page). This will be fixed in a day or two.

05 Jan 06 SCREWY DATES AND TIMES IN SWIFT NOTICES FIXED:
The problems with the Dates and Times in the Swift Notices have been fixed.
There were two problems: (1) the GCN software was unpacking the now-negative UTCF incorrectly, and
(2) a BAT FSW mistake in filling the UTCF data fields.
The UT CorrFactor has been re-instated in all Swift Notices.
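Problem (1) is a common sign-handling bug. The Swift message layout is not described here, so the sketch below simply assumes a 4-byte big-endian integer in hypothetical units. It shows how the same bytes come back as a huge positive number when unpacked as unsigned, and correctly when unpacked as a signed (two's-complement) value.

```python
import struct


def unpack_utcf_buggy(field: bytes) -> int:
    """Reads a 4-byte big-endian field as UNSIGNED.

    Works while the value is >= 0, but a negative value comes back as a
    huge positive number (2**32 minus its magnitude).
    """
    return struct.unpack(">I", field)[0]


def unpack_utcf(field: bytes) -> int:
    """Reads the same 4 bytes as a SIGNED two's-complement integer."""
    return struct.unpack(">i", field)[0]


if __name__ == "__main__":
    raw = struct.pack(">i", -5)  # a small negative value, hypothetical units
    print(unpack_utcf_buggy(raw), unpack_utcf(raw))  # prints: 4294967291 -5
```

For non-negative values both functions agree, which is why a bug like this can stay hidden until the quantity first goes negative.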

02 Jan 06 SCREWY DATES AND TIMES IN SWIFT NOTICES:
There is a problem(s) in the dates and times in the Swift Notices (only the Swift-based notices).
It seems to be related to (a) the year change, (b) the UT Correction Factor going negative
due to the LeapSecond adjustment, and (c) possibly a FSW problem.
The problem is being investigated.
In the mean time, the UT CorrFactor has been removed from all Swift Notices.
Since this is always in the range -1.0 to +1.0, removing it has only a small effect on dates and times.

04 Nov 05 LOSS OF SERVICE:
From 07:36 to 12:49 UT (delta_t = 5.2 hr), GCN was off-line due to a program crash.
The cause was a limitation in the file system.
Given the formatting of the disk and file system, the number of inodes allocated
can not support a directory that has more than 185,494 files.
GCN has a directory to which it writes a copy of every message that comes through the system.
This archive directory grew without much inspection, and then recently the addition
of the Startracker-loss-of-lock messages pushed this directory over the top (because these
StarTracker status (good and bad) messages come from Swift every 10 sec). So in roughly
30 days, the 185K limit was reached. The GCN program was changed to not write
these Startracker messages. This particular message does not need to be archived.
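The fix amounts to a per-message-type archive filter. The sketch below is illustrative only: the message-type string is hypothetical (the real GCN type identifiers are not given here), and the file-count helper just makes the back-of-the-envelope arithmetic explicit for whatever rate and starting count one assumes.

```python
# Hypothetical message-type names; the real GCN identifiers are not given here.
SKIP_ARCHIVE = {"STARTRACKER_STATUS"}

# Per-directory file limit quoted in the entry above.
DIR_FILE_LIMIT = 185_494

# One message every 10 seconds adds 86,400 / 10 = 8,640 files per day.
STARTRACKER_PER_DAY = 86_400 / 10


def should_archive(msg_type: str) -> bool:
    """Return False for high-rate status messages that need no archive copy."""
    return msg_type not in SKIP_ARCHIVE


def days_until_full(existing_files: int, files_per_day: float,
                    limit: int = DIR_FILE_LIMIT) -> float:
    """Days until a directory holding existing_files entries hits limit,
    assuming files_per_day new files are written each day."""
    return max(0, limit - existing_files) / files_per_day
```

Filtering at write time, rather than pruning the directory afterwards, keeps the high-rate status stream from ever consuming directory entries.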

26 Oct 05 LOSS OF SERVICE:
From 12:24 to 13:47 UT (delta_t = 83 min), GCN was off-line due to an unplanned system crash
while some cabling work was being done on the cluster of machines.

06 Oct 05 BRIEF LOSS OF SERVICE:
From 18:31 to 18:47 UT (delta_t = 16 min), GCN was off-line while new system s/w was installed.

03 Sep 05 LOTS OF SWIFT GCN NOTICES:
At 21:13 UT, Swift-BAT triggered and issued the standard set of GCN Notices (and so did the NFIs).
About 9 more sets of Notices came out over the next half hour.
The spacecraft Star Tracker lost lock (as it does every couple months) and so with sources
drifting in the BAT FOV, triggers were generated. See GCN Circ #3909.
At 21:46 UT the block-to-world filter was activated for all BAT-based, XRT-based, and UVOT-based Notices.
However, you are still likely to receive some Notices after that time (although they were all generated
before that time). The delay is due to the way the sendmail demon works.
During the time when Swift was generating a lot of Notices in a short amount of time,
the load-factor on the GCN computer went up to over 14 (typical values of 2-3 for a regular burst series).
When the load factor goes over 8, sendmail will suspend outgoing email activities.
It then picks them up when the load factor drops below 8 AND the next retry-to-send interval expires.
This interval is currently 15 minutes. I, personally, have been receiving Notices almost an hour
after the "blocking" time, so there is also something else at work in this email processing,
but most of the delay was due to the high load_factor-sendmail interaction.
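The load-factor interaction described above can be modeled in a few lines. This is a simplified sketch of the behavior as described in this entry (hold mail while the load is 8 or more, retry every 15 minutes), not of sendmail's actual queue-handling code; the function name and the minute-based time steps are assumptions.

```python
from typing import Callable, Optional

LOAD_LIMIT = 8.0         # mail is held while the load factor is at or above this
RETRY_INTERVAL_MIN = 15  # minutes between retry attempts


def delivery_time(queued_at: int, load_at: Callable[[int], float],
                  horizon: int = 24 * 60) -> Optional[int]:
    """Minute at which a queued email is actually sent, under a simple model.

    Mail goes out at once if the load is below LOAD_LIMIT when it is queued;
    otherwise it is retried every RETRY_INTERVAL_MIN minutes and sent at the
    first retry where the load has dropped below the limit. Returns None if
    it is still held after `horizon` minutes.
    """
    t = queued_at
    while t <= queued_at + horizon:
        if load_at(t) < LOAD_LIMIT:
            return t
        t += RETRY_INTERVAL_MIN
    return None
```

With the load at 14 for the first 20 minutes and 2 afterwards, an email queued at minute 0 is retried at minute 15 (still held) and sent at minute 30, ten minutes after the load actually dropped: the retry interval, not the load, sets that residual delay.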

15 Aug 05 17:24:
The email deliveries during the GRB050815 burst were very slow -- delays ranged from minutes up to 107 min.
The exact cause is not well understood at the moment, but it is believed to be
a very high load factor on the capella machine. What caused this high load factor is not known yet.

06 Jul 05 00:00:
GCN was effectively off-line from 00:00 to 01:00 UT (total outage 60 min),
and the Swift_TDRSS_receiver portion for an additional time until 02:40 UT.
The Goddard Center Network people were conducting an emergency power test
in preparation for the Shuttle Return-to-Flight.
The main GCN program came back on-line at 01:00 when power was restored to the routers,
but the Swift TDRSS connection to White Sands needed manual help (at 02:40 UT, Swift total outage 160 min).
I would have announced the outage beforehand had I known it was going to affect GCN,
but the Center Network announcement of the test said that it was not going to affect the part of Goddard
where GCN is located, nor was it supposed to affect the Goddard connection to the outside Internet.

03 Jul 05 14:20:
The Swift-to-GCN connection was out from 16:44 to 17:31 UT (total outage 47 min).
Manually restarting the swift_tdrss_receiver program cleared the block between White Sands and here.

13 May 05 14:20:
The Swift-to-GCN connection has been RESTORED (as of 14:20 UT).
Total lost time: 16.3 hours (Swift only; all the other Notice types as well as the Circulars suffered no loss).

13 May 05 03:00:
The Swift-to-GCN connection is down (as of 21:57UT 12may05).
The problem was "worked" for several hours, but no solution was found.
Work will resume Friday morning (13may05).

30 Apr 05 15:30 UT:
There is a problem with the intranet (and/or the mail server machine) here in Building 2 at GSFC (external to GCN).
This has caused the delay in distribution of the email-based GCN Notices. It appears to have been occurring for at least 4 hours,
and is still somewhat intermittent at the moment. People are working the problem.
17:00 UT: the problem has been fixed -- email is flowing promptly once again.

12 Feb 05 23:03 to 13 Feb 05 00:20 UT:
GCN (both Notices and Circulars) was down due to planned outage of the Goddard connection to the Internet.
The Goddard network people performed an upgrade to the gateway to the Internet. The total outage was 1.3 hours.

10 Feb 05 UT:
The GCN Notices system was off-line for 12.3 hours (01:22 to 13:40 UT).
The disk freespace went to zero (due to poor management on my part).
There was no loss on the Circulars system.

12 Dec 04 UT:
The GCN Notices system was off-line for 60 minutes. All functionality was restored at 08:26 UT.
There was no loss on the Circulars system.

08 Nov 04 UT:
GCN (both Notices and Circulars) was down due to a power failure in the (half of the) building where the GCN computer is located. The total outage was 11.2 hours.

08 Oct 04 UT:
One of the disk partitions on the GCN computer (capella) became full some time around 21:00 UT yesterday. It was not discovered and fixed until 15:00 UT today. This problem caused two Circulars to be mis-numbered, and it caused some Notices to be delayed in distribution. Since it is possible that a submitted Circular was lost, please resubmit your Circular (if you did not see it in the outgoing list). This did NOT affect the socket-site portion of GCN -- that part kept working right through the disk-full incident.

26 Sep 04 UT:
The interface program between GCN and INTEGRAL exited (for unknown reasons). A total of 16.8 hrs of connectivity to INTEGRAL was lost (23:55 25sep04 to 16:40 26sep04 UT). (The rest of GCN continued to operate normally, ie HETE, XTE, IPN, etc).

25 Aug 04 UT:
The system clock on the GCN computer was found to be off by 3min21sec (ahead). This has been fixed. Any use of email "NOTICE_TIME"s or socket_packet times will make it appear that there was a distribution delay of 3min21+sec. This is not actually the case -- it is only an appearance. The delays (monitored by other parts of the GCN system) are still short: 0.1-1.0 sec for socket sites and 1-3 sec for email sites (this is the part of the distribution time that is within GCN; I can neither account for nor control the part of the email distribution time once it gets outside of Goddard Space Flight Center). The GRB Times are completely unaffected by this problem and remain accurate.

22 Aug 04 13:31-14:19 UT:
The Goddard Center Network people took the Goddard internet down for upgrades. This resulted in a loss of connectivity (both incoming and outgoing) of GCN to the outside world. A loss of 48 min.

29 Jul 04 15:07-15:21 UT:
GCN was taken down to get an even better electrical_power/UPS/internet/router configuration. A loss of 14 min.

28 Jul 04 00:54-07:39 UT:
GCN was down due to power failure due to T-storm. A loss of 6.7 hrs.

30 Jun 04 00:39-15:15 UT:
The connection between INTEGRAL and GCN was down, so there was no INTEGRAL service within GCN for those 14.5 hrs (all the rest of GCN -- HETE, RXTE, IPN, Circulars, etc. -- was connected and working fine). The INTEGRAL connection problem was probably at the GCN end (still under investigation).

25-27 Jun 04:
There were 3 outages this weekend. They were due to an upgrade in the electrical distribution within the building that houses the Notices and Circulars portions of the GCN system. For the Notices portion, the first was on Friday evening when the power was taken down (for about 1.5 hrs) to start the upgrade. During the upgrade, arrangements were made to have the GCN computer and router put on generator power. The second outage started on Sunday around noon (EDT) when the generator failed (~1 hr). A second generator was brought on-line. Then several hours later the system was brought down to switch back to the normal building power (~1 hr).
During this weekend the response times of GCN were slowed slightly due to the primary Domain Name Server being down. A 5-sec delay was introduced for about half the socket packets while GCN timed out waiting for the primary DNS. This was most notable in the round-trip travel times reported in the "Daily Socket Connection Reports" (sent to those socket sites requesting these reports). You will notice peaks in the round-trip time histograms at 5 sec and smaller peaks at 10 sec and 15 sec. These increments in the round-trip times are in the 'return' portion of the round-trip -- not in the 'to you' portion.
The Circulars portion of GCN was off-line for the whole weekend.

12 Jun 04 08:23-21:51 UT:
The GCN Notices was off-line between 08:23 to 21:51 (13.5 hrs) -- the program crashed (cause as yet unknown).
(If anybody knows of a pager -- or other comm system -- that can get through building walls, please let me know. I'm tired of being out of touch with my watchdog systems.)

04 May 04:
The problem that disabled GCN Circulars has been fixed.
You are now able to send your circular submission to gcncirc@@lheawww.gsfc.nasa.gov and it will be scanned, accepted, and distributed automatically (just like before).
The account was temporarily disabled (for 3.5 days) as a result of a reconfiguration of the machine by the computer administration people here at Goddard. This affected only the Circulars portion of GCN; the Notices portion was never affected.

29 Mar 04 02:40 UT:
The GCN system was offline for 6.0 hours (20:03 28apr04 until 02:08 29apr04 UT). The main GCN processing demon crashed. It took 6 hours to restore the system because I was inside a metal building and so the automated monitoring system was not able to get through to my pager.

15 Mar 04 15:10-19:22 UT:
The main gateway router for Building 2 at Goddard died, which resulted in GCN being completely disabled (no incoming messages from the various missions and nothing outgoing -- not even Test Notices).
The router was replaced and services restored at 19:22 UT; a 4.2 hour loss.
(Given the earlier INTEGRAL-only outage, it never rains but it pours.)

15 Mar 04 14:18 UT:
The connection to the INTEGRAL IBAS GRB_message server was re-established. I have set up a port-number translator program on a machine operated by Carlo Graziani (U. Chicago). This translator is a work-around to the recent blanket port blockage by Goddard Network management. (In the meantime I have submitted a request to get the specific port number re-opened for GCN<-->INTEGRAL use.)
Thankfully, the universe co-operated, and there were no INTEGRAL-detected bursts during this 4.5 day outage.
Many thanks to Carlo for the use of his machine for this work-around.

12 Mar 04 UT:
GCN's ability to receive (and therefore distribute) INTEGRAL Notices has been blocked by the GSFC Network Security people instituting a firewall blockage over a range of port numbers (that includes the IBAS-to-GCN port). This happened late Wednesday (23:00 UT 10Mar04); it was not discovered until Thursday, and the root cause was not identified until late Friday. I will submit a waiver request to get the IBAS port number opened back up, but that will not be possible until Monday morning (15Mar04). A backup pathway is being developed (using a different method) which will prevent future losses of information should there be another outage of the INTEGRAL_IBAS socket-connection pathway.
I apologize for the 4-day loss of service.

01 Mar 04 14:57 UT:
The recent RXTE_ASM GRB Notice was delayed in distribution by 7.3 hrs because of a processing error within the GCN system. As part of the transition from the old Building 23 SunOS system to the new Building 2 LINUX system (a year ago), the entry in the "import" table was not updated properly for this Notice type. Insufficient testing was performed, and it was not until today's Notice that there was any real use of this Notice type. I apologize for the mistake and the delay in the distribution of this GRB Notice.

01 Mar 04 12:15 UT:
The Internet connection to the outside world was lost at 12:15 UT. It was re-established at 13:36 UT, for a loss of 1.35 hrs. At this time (14:11 UT) I do not know the cause of the outage or why it resumed. (The GCN system proper continued to run throughout this interval. Socket sites are now being reconnected via the automated reconnect process. Email/Pagers/cells/etc distribution has also resumed.)

29 Feb 04:
A mistake was made in the correction for this year's Leap year. This affected only the INTEGRAL Notices. The set of INTEGRAL Test Notices distributed at 01:00 29Feb04 had bad day-of-year, month, and day-of-month fields. I believe this has been corrected, but I am waiting for the next set of INTEGRAL Test Notices to know for sure.
The next set of INTEGRAL Test Notices have been received, processed, and distributed. Part of the fix was to adjust the time of the "event" from being in 2003/mm/dd (ie in the past) to 2004/mm/dd (in the future; the current mm/dd being used by INTEGRAL is May 23).
I will look into the possibility of making GCN function properly for dates in the past. (Currently, anything previous to January 01 of each year is too far into the past to have all the TJD, DOY, YY/MM/DD work properly. The GCN routines were conceived like GCN was conceived -- everything is real-time. Having something 2 years into the past is/was out-of-scope.)
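Date fields computed with a general-purpose calendar library handle both leap days and past dates. A minimal sketch using Python's datetime, assuming the standard definition TJD = JD - 2440000.5 (so TJD 0 is 1968 May 24); the function names are hypothetical, and this is not GCN's own routine.

```python
from datetime import date

# Standard definition: TJD = JD - 2440000.5 = MJD - 40000, so TJD 0 is 1968-05-24.
TJD_EPOCH = date(1968, 5, 24)


def tjd(d: date) -> int:
    """Truncated Julian Day number at 0h UT of date d (negative before 1968)."""
    return (d - TJD_EPOCH).days


def from_tjd(n: int) -> date:
    """Inverse of tjd(); works for past as well as future dates."""
    return date.fromordinal(TJD_EPOCH.toordinal() + n)


def doy(d: date) -> int:
    """1-based day of year; datetime accounts for leap days such as Feb 29."""
    return d.timetuple().tm_yday
```

For example, 29 Feb 2004 maps to TJD 13064 and day-of-year 60, and a date in 2003 needs no special handling.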

11 Feb 04:
The GCN Notices system crashed at 20:36 UT. It was not noticed for a while. It was restarted and sites were connected by 21:29 -- a loss of 53 minutes.
(The GCN Circulars system was NOT affected.)

30 Dec 03:
While rebooting the computer to install some new security patches in the kernel, I restarted the connection to the INTEGRAL server with the wrong IP Number. This mistake was not noticed until 20 hours later. The GCN connection to the INTEGRAL server was immediately restarted and the connection was re-made. This affected only the INTEGRAL messages (if there were any) -- the rest of the GCN system was/is operating fine.

06 Dec 03:
An infinite-loop interaction between the Circulars demon and a spammer's demon caused the disk partition for the outgoing email to be filled to capacity. I can not tell which of these Circulars were actually distributed (some were distributed once the offending messages were dequeued), so I distributed them again. My apologies for the delay in distribution (for those that never got these) and my apologies for those that are receiving them twice.
The Circulars demon program has been modified to prevent this new form of infinite loop in the future.

16 May 03:
A T-storm power outage caused the system to go offline at 07:50 UT this morning.
The outage was longer than the UPS battery capacity.
Power was restored at 09:07 UT and the system was rebooted.
Some socket sites were able to reconnect automatically starting at 09:07;
however a manual restarting of the program was needed to clear out problems
preventing the rest of the socket sites from connecting. This was done at 13:41 UT.
There was a 1.2-hour loss for some sites and a 3.9-hour loss for the other sites.

21 Apr 03:
The system went offline at 07:00 UT this morning. The cause is unknown.
The system was rebooted. There was a 5-hour loss.

16:30 UT 17 Apr 03:
The recent cluster of identical Circulars was due to the submitter sending 8 separate copies of the message over the span of an hour.
His account has been disabled.
I am in the process of cleaning up the mess.
I have reset the Circular serial number back to the point after his first submission.

The GCN contact is: Scott Barthelmy, scott@@lheamail.gsfc.nasa.gov, (301)-286-3106

This file was last modified on 08-Jun-08.