Older Announcements

Announcements 2013

Updates on the planned cluster upgrades: New Cluster.

-- Gerald Ragghianti - 2013-11-15

Note: the maintenance outage on Friday, Oct. 25th has been extended to 8:00PM.

The Newton clusters will have a maintenance outage on Friday, Oct. 25th from 6PM until 8PM for backup power systems upgrade. All systems will be powered down during this period.

-- Gerald Ragghianti - 2013-10-02

Gromacs software on Newton systems has been upgraded to version 4.6.1. Two versions are available (multithreading and MPI), and both versions are compiled using the Intel compilers.

-- Gerald Ragghianti - 2013-07-16

The PGI compiler suite with compiler accelerators (OpenACC, CUDA, etc) is now available on the Newton cluster. It can be loaded via "module load pgi".
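
As a minimal sketch of how this might be used (pgcc is the PGI C compiler, -acc enables OpenACC, and the source file name is only a placeholder):

    # Load the PGI compiler suite into your environment
    module load pgi
    # Compile a C source file with OpenACC directives enabled
    # (saxpy.c is a placeholder file name)
    pgcc -acc -Minfo=accel -o saxpy saxpy.c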

-- Gerald Ragghianti - 2013-06-06

The Newton cluster will be shut off on June 8th from 5-11AM for facility chilled water maintenance. Please keep this in mind when submitting long jobs to the queue.

We have also installed R version 3.0.1 on the Newton cluster. The default version will remain at 2.15.2 until June 8th so that users may test the new version and rebuild any packages that require it. To test version 3.0.1, run "module load R/3.0.1".
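
A minimal sketch of testing the new version and rebuilding a package under it (the package name below is only an example):

    # Switch to the new version for this session
    module load R/3.0.1
    # Confirm which version is now active
    R --version
    # Rebuild a package under the new version (package name chosen only as an example)
    R -e 'install.packages("Rcpp", repos="http://cran.r-project.org")'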

-- Gerald Ragghianti - 2013-05-29

We will be upgrading the default version of MATLAB to R2013a on May 15th. It is available for testing now by using the command "module switch matlab/R2013a". Older versions of MATLAB will remain available.

-- Gerald Ragghianti - 2013-05-13

The default version of LAMMPS on the Newton cluster has been upgraded to version 22Feb13.

-- Gerald Ragghianti - 2013-03-04

The Newton clusters are now able to send outgoing email in response to job status changes. See the "-m" and "-M" options in the qsub manual for more information. Email notifications are turned off by default.
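
For example, a submission requesting notification at job begin, end, and abort might look like the following (the email address and job script name are placeholders; consult the qsub manual for the exact option behavior on Newton):

    # Request email at job begin (b), end (e), and abort (a), sent to the given address
    qsub -m bea -M your_netid@utk.edu myjob.sh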

-- Gerald Ragghianti - 2013-02-12

Announcements 2012

We are planning a 12-hour maintenance outage of the Newton cluster for Sunday, November 4th starting at 10AM. The cluster will be inaccessible during this period, and all jobs will be stopped. This work will reconfigure the cluster for higher network throughput and fault-tolerance.

-- Gerald Ragghianti - 2012-10-05

Four new compute nodes of the new GPGPU compute cluster are now available for testing. Each node has a Tesla M2090 GPGPU card onboard along with 16 CPU cores (Intel Xeon E5-2670) and 32GB of RAM. The eventual size of the cluster will be 24 nodes. For more information on using this resource, visit Tesla Compute Cluster.

-- Gerald Ragghianti - 2012-09-25

Our new introductory workshop is now entirely self-guided and available online at any time.

-- Gerald Ragghianti - 2012-09-16

We are going to continue the previously announced work on Wed. (Sept. 12th) from 8:00-10:00 AM. Temporary disruption of network traffic to and from the Newton cluster will occur during this period. New job submission and execution will be disabled, but no running jobs will be affected.

-- Gerald Ragghianti - 2012-09-10

We are rescheduling the next Newton maintenance period to Sept. 10th from 8:00-10:00 AM. During this time we will be upgrading and reconfiguring the network core to 10Gbit/sec with full redundancy, and we will be moving the cluster servers to a new rack with redundant network connections and electrical power. While this work is underway, all running jobs will temporarily wait for the network and servers to come back online.

-- Gerald Ragghianti - 2012-09-03

The cluster is back online now. We are scheduling a maintenance period starting Tuesday, July 24th at 8:00 AM in order to do software updates on the main storage server. During this period the cluster will be unavailable, but all cluster operations will be safely halted in place: no active jobs will be killed; they will simply pause until the storage server is back in service.

-- Gerald Ragghianti - 2012-07-20

The Newton cluster is currently offline due to a failure of the main storage server. This is possibly related to the problem we experienced with this system two weeks ago. With the extra information we have gathered, we will be working with the hardware vendor to find a permanent solution. However, we will first try to get the system back online. We do not expect any data corruption or loss of compute job progress due to this problem, but we will send an update once the system is back online.

-- Gerald Ragghianti - 2012-07-20

The Newton storage server is now back online. There is no indication of data corruption or of failed jobs due to the downtime. Jobs and interactive sessions should have been able to continue from the point in time where the downtime started. We are currently running an online check of data integrity and will continue monitoring the system for aberrant behavior.

-- Gerald Ragghianti - 2012-07-03

The main Newton home directory server has unexpectedly halted and is currently being rebooted. This is temporarily causing all interaction with home directories to halt while the NFS server is unavailable. Once the server comes back online, jobs and interactive sessions should continue as normal.

Update: The Newton storage server is taking a longer time than normal to restart because it is replaying data transaction logs to avoid data corruption caused by the unexpected reboot. We are working with the vendor on how we can speed up this process, but there is currently no indication of when the system will be available.

-- Gerald Ragghianti - 2012-07-03

We are looking at a possible service outage on June 6th for the Newton cluster from 6-10AM in order for Facilities Services to do maintenance on the chiller. All jobs will have to be stopped during this period. If you are starting long-running jobs, please be aware that they may be affected by this, and plan to checkpoint your calculation progress appropriately.

-- Gerald Ragghianti - 2012-05-23

Announcements 2011

On Thursday, December 15, from 10PM to midnight, we will be doing a network upgrade to the external network connection between the Newton cluster and UT campus. This work will upgrade the connection to 10Gbit/sec and will lessen the possibility of network outages in the future. The upgrade procedure should only cause a momentary loss of connectivity to interactive login sessions; no running user jobs will be affected.

-- Gerald Ragghianti - 2011-12-13

The cluster is at full capacity of about 4200 CPU cores.

-- Gerald Ragghianti - 2011-09-18

The work on the Lustre filesystem is taking much longer than expected due to a large number of very small files. We do not know when the filesystem rebuild will complete, but it is expected to be some time tomorrow.

-- Gerald Ragghianti - 2011-09-03

We are planning a large maintenance period for the Newton cluster starting Sept. 2nd at 5PM and ending on Sept. 3rd at 5PM. During this period, the entire cluster will be offline for the following work:

  • Upgrade of the cluster head node operating system
  • Rebuild of the Lustre high-performance file system
  • Bringing online a new compute node operating system image
  • Integrating 1700 new processors into the cluster

-- Gerald Ragghianti - 2011-08-28

The Newton HPC Program is offering three different workshops this semester. You can register for each 2-hour session at http://oit.utk.edu/workshops/data.php

  • Linux Basics: Sept 6 3:00-5:00PM in 208 Perkins Hall
  • Introduction to Newton Research Computing: Sept 8 3:00-5:00PM in 208 Perkins Hall
  • Advanced computing techniques on the Newton cluster: Sept 22 3:00-5:00PM in 208 Perkins Hall

-- Gerald Ragghianti - 2011-08-17

The facility maintenance has been scheduled for tomorrow, June 15th from 6AM to 9AM. We will be shutting down the Newton cluster at about 5:30AM and will begin bringing it back up after 9AM. I am now turning off access to the long* queues on Newton, but the medium and short queues will remain available until shutdown.

-- Gerald Ragghianti - 2011-06-15

There is a strong possibility that we will need to take the Newton cluster offline in order to allow facilities services to do maintenance on the data center air conditioning system. The likely dates are early in the morning of either June 13th or 14th. We will notify you when we know the planned time. During this period, all running jobs will be killed, so do not submit any jobs that will run longer than 24 hours unless you are saving calculation progress periodically.

-- Gerald Ragghianti - 2011-06-10

The Newton 24-seat license for Mathematica has been upgraded to version 8.0.1.

-- Gerald Ragghianti - 2011-05-02

We have three workshops in February that are related to the use of the Newton cluster:

Introduction to Linux - Feb 21 at 1:30

Introduction to High Performance Computing on Newton - Feb 23 at 1:30

Tools for optimizing parallel programs - Feb 25 at 1:30

More information and workshop registration are available at http://oit.utk.edu/workshops/data.php

-- Gerald Ragghianti - 2011-02-14

We have upgraded the Newton cluster login nodes to a pair of 12-core, 24GB RAM computers. This upgrade should help to balance the load on the login machines and enable more people to do interactive computations simultaneously. When you use the login server name "login.newton.utk.edu" it will automatically take you to one of these machines. If you are currently logged into the cluster, your session will not be affected until you log in again.

-- Gerald Ragghianti - 2011-02-09

Announcements 2010

We will be doing maintenance on one storage server of the Newton cluster on Oct. 29 from 12PM until 5PM. During this time, the following storage locations will be unavailable: /data/data2/, /data/data4/, /data/data5/, /data/data6/, /data/data7/, and /data/DEM_CFD/.

During this time, no jobs should be started which use these storage locations.

-- Gerald Ragghianti - 2010-10-27

If you have attended one of the Newton workshops this semester, you will know that we now recommend that users manage their computing environments using the "Modules" system. This is the same system that is used on Kraken, and we hope that using Modules on Newton will make it easier to translate your work between the two computer systems. We have finished the documentation for using Modules on Newton: Managing Your Environment With Modules. Currently, all applications that are installed under /data/apps can be managed with modules.
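
As a quick sketch, the most common module commands look like the following (the application name is only an example; see the documentation page above for details):

    # See which software packages are available as modules
    module avail
    # Load an application into your environment
    module load gromacs
    # List the modules currently loaded in this session
    module list
    # Remove a module when you no longer need it
    module unload gromacs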

-- Gerald Ragghianti - 2010-10-20

We are currently draining jobs from the short_phi, medium_phi, and long_phi queues in order to do a power supply upgrade tomorrow. This process will not affect any running jobs; however, queue wait times may be longer until the phi machines are back online.

-- Gerald Ragghianti - 2010-10-19

We have three upcoming workshops:

  • Sept. 1st @ 1:30PM: Linux Basics
  • Sept. 3rd @ 1:30PM: Introduction to Newton Research Computing
  • Sept. 8th @ 1:30PM: Advanced computing techniques on the Newton cluster

You may register for these workshops at http://oit.utk.edu/workshops/data.php#newton.

-- Gerald Ragghianti - 2010-08-19

The planned maintenance period for Newton on July 16th has been rescheduled for Sunday July 18th from 5:00PM until midnight.

-- Gerald Ragghianti - 2010-07-13

We have just finished the installation of a rack of 72 new compute nodes having a total of 864 processor cores. You have an opportunity to test your applications on these nodes and on the new operating system version before the nodes are fully integrated into the Newton cluster. More information is available at TestingNewComputeNodes.

-- Gerald Ragghianti - 2010-06-30

We are planning to implement a complete downtime of the Newton cluster at least once per year in order to do major upgrades that cannot be done while the cluster is in operation. This includes hardware upgrades to the central networks and software patches on all systems. We will not be able to fully integrate the newly purchased compute nodes and storage systems until these upgrades are complete. This work will involve:

  • Migrating data from the Lustre test system onto the newly purchased hardware
  • Upgrading the firmware of all Ethernet switches in order to be able to connect them at maximum speed
  • Connecting the Ethernet switches for the Lustre systems to the other Newton networks
  • Applying software updates to all servers
  • Updating all compute nodes to the newest software
  • Upgrading the central storage system to 10Gbit/sec Ethernet

This work will require that the entire cluster be down for a number of hours. All running compute jobs will be stopped, and there will be no external access to Newton systems. We would like to do this work on July 16th starting at 5:00PM. Please let us know if you have any questions.

-- Gerald Ragghianti - 2010-06-30

We noticed that using qrsh to start an interactive job has been failing on some nodes due to an incorrect node configuration in the Grid Engine on Newton. We have identified and fixed the source of the problem. Interactive (qrsh) jobs should now run reliably.
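
As a reminder, a minimal interactive session can be started as follows:

    # Start an interactive shell on a compute node via Grid Engine
    qrsh
    # Or run a single command interactively on a compute node
    qrsh hostname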

-- Gerald Ragghianti - 2010-04-22

We have activated the second login node on the Newton cluster. When you log in via ssh to login.newton.utk.edu, one of the two login nodes will be chosen at random. This will allow us to load balance interactive user jobs and data transfers between the two systems. One possible complication is the PGI software license, which is only available on the login node named login0.newton.utk.edu. If you are using PGI compilers you should log into login0.newton.utk.edu.
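
For example (replace your_netid with your own username):

    # Connect directly to the login node that holds the PGI license
    ssh your_netid@login0.newton.utk.edu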

-- Gerald Ragghianti - 17 Mar 2010

Newton Program newsletter for March 15, 2010

-- Gerald Ragghianti - 15 Mar 2010

We are planning a large memory upgrade on Jan. 26 from 3PM to 5PM. This upgrade will increase the available RAM on the newest 120 compute nodes from 12GB to 16GB (2GB per processor core). It will require stopping all jobs running on "tao" compute nodes. We will also be reconfiguring the Ethernet fabric in order to increase the available inter-cluster bandwidth and redundancy. This work will briefly interrupt all cluster TCP/IP traffic but will not cause loss of progress to currently executing jobs.

-- Gerald Ragghianti - 18 Jan 2010

Announcements 2009

We are preparing to soon shut down the old cluster head node newton.usg.utk.edu. If you still require use of this machine, please notify us as soon as possible so that we can make arrangements for you. The tentative date for shutdown will be Wednesday, May 6th.

-- Gerald Ragghianti - 28 Apr 2009

We are planning to transfer 112 CPUs of the CMRG group to the new Newton cluster on March 1st at 4:00PM. Jobs running on these machines should be stopped or checkpointed before this time. Once the machines are installed in the new cluster and proper operation is verified, we will increase the priority of the CMRG group on the new cluster and announce the availability of the machines. This should take no more than a few hours.

-- Gerald Ragghianti - 25 Feb 2009

Announcements 2008

There is a power outage and maintenance period planned from Saturday, Oct. 11th at 6:00 AM to Monday, Oct. 13th at 12:00AM. We will be taking advantage of this time to upgrade the Ethernet and Infiniband networks. Please take appropriate measures to avoid loss of compute progress on your jobs.

-- Gerald Ragghianti - 07 Oct 2008

The data backup and "Time Machine" historical file retention system is now online. See Data Backup for how to access your backup data.

-- Gerald Ragghianti - 12 Mar 2008

We have created a new LSF queue called "testing", which is designed to enable quick testing of parallel code. This queue includes two dedicated machines with a total of 16 processor cores.
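
A quick test submission to this queue might look like the following sketch (bsub is the LSF submission command; the job script name is a placeholder):

    # Submit a 16-process test run to the dedicated "testing" queue
    bsub -q testing -n 16 ./my_parallel_test.sh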

-- Gerald Ragghianti - 18 Jan 2008

Announcements 2007

The university has provided funds for us to purchase a 24TB storage server to augment the existing disk-based storage on the cluster. The server is a Sun Fire X4500 (Thumper) and consists of an innovative combination of a 4-core Opteron server and 48 attached SATA hard disks using the ZFS file system for redundancy and performance. We will be making this available for global access on the cluster through NFS shares of various sizes, each for a maximum duration of six months. Visit the storage application page to request space on this device.

-- Gerald Ragghianti - 26 Oct 2007

The cluster is currently back online and operational with the exception of one storage array (/data6). We are currently investigating this problem and will provide further information when it is available. Update: replacement parts for the affected array should be installed on Monday.

-- Gerald Ragghianti - 22 Jul 2007

There will be a power outage in the early morning of July 22nd to do maintenance on the backup power system. The queue will be drained of jobs 24 hours in advance, but if you have a long-running job, it may be killed. We will take the opportunity to upgrade the Ethernet switch as well.

-- Gerald Ragghianti - 12 Jul 2007

We are planning a number of changes to the cluster in order to increase ease of use for users and administrators. However, since the cluster is currently in use we will make every effort to minimize interruption to cluster operations. Any changes that will affect end users will be announced via the USG_HPCC mailing list (http://listserv.utk.edu/archives/usg_hpcc.html).

-- Gerald Ragghianti - 27 Jun 2007