Density and PSF FWHM Maps

Here are some updated plots for us to look at during Thursday’s meeting.

First are the PSF maps for u and u*. I used Koe to extract the profiles for every 5th PSF (the code crashed without my knowledge, and I had to redo it at somewhat short notice), and used the curve of growth to get a FWHM (looking for the first value to rise above 0.5 after normalization). I tried my best to make the plot look as close to Thibaud’s as possible. It is very clear from this map that the PSFs I measure are much smaller.
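As a minimal sketch of that FWHM estimate (the function and array names are illustrative, not the actual Koe code):

    import numpy as np

    def fwhm_from_cog(radii, cog):
        """FWHM from a curve of growth: first radius where the normalized
        curve rises above 0.5, doubled (HWHM -> FWHM)."""
        cog = np.asarray(cog) / cog[-1]   # normalize so the curve tends to 1
        idx = np.argmax(cog > 0.5)        # index of the first value above 0.5
        return 2 * radii[idx]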

If I renormalize the value range, I get something that looks a little more consistent with Thibaud’s map.

Density Maps

I finally learned how to properly use astropy’s SkyCoord, and I’m kicking myself for not using it sooner. Ivana, I should have trusted you more! I used sky coordinate matching to find the 6th nearest neighbour (really the 5th, since the object being matched counts as its own nearest neighbour) and get the distances. For the purposes of making this plot, I get distances for every 20th galaxy. The resulting plot is very pretty!
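A minimal sketch of the matching, assuming catalogue positions in degrees (the array names are illustrative):

    from astropy.coordinates import SkyCoord
    import astropy.units as u

    # Positions of all galaxies in the catalogue
    coords = SkyCoord(ra=ra_deg * u.deg, dec=dec_deg * u.deg)

    # Match every 20th galaxy back to the full catalogue; nthneighbor=6
    # skips the object itself and returns the 5th true neighbour.
    idx, sep2d, _ = coords[::20].match_to_catalog_sky(coords, nthneighbor=6)

    distances = sep2d.to(u.arcmin)

Conveniently, sep2d comes back as an astropy Angle, so its unit is always explicit.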

I am pretty confident that the units of the output separation are radians, as that was the only interpretation that resulted in distances that “made sense”. If the values were in degrees, the distances would be FAR too small (the most isolated region would have a 5th nearest neighbour something like 20 pixels away). What I get in radians seems far more reasonable.

I also decided to have a little fun and plot it over a black background to really up the “prettiness” factor.

I’m still waiting on results from the simulations with background subtraction. I’m also continuing to explore using Photutils Background2D, but I have no figures for it yet.
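For the record, the kind of Background2D call I’m experimenting with looks roughly like this (the box and filter sizes are placeholders, and data stands for a generic image array):

    from astropy.stats import SigmaClip
    from photutils.background import Background2D, MedianBackground

    # Estimate a smoothly varying 2D background over the image
    bkg = Background2D(data, (64, 64), filter_size=(3, 3),
                       sigma_clip=SigmaClip(sigma=3.0),
                       bkg_estimator=MedianBackground())
    data_sub = data - bkg.background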

PSF Resizing, Global Backgrounds

U-Band PSFs

So the ePSFs that we have been getting for u-band, as you know, are oversampled and thus are not the right size (the pixels are not the same size as those of the input images). After spending some time fruitlessly trying to resize the PSF directly in the builder, I turned to rebinning. The scipy methods I tried did not remotely work, but then I realized that the PSFs are images, and you can use image resizing procedures. As it turns out, scikit-image has a resize function that uses interpolation to change a 2D array to any size you want. Below is a simple example:
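(A sketch of the idea, assuming our 41×41 cutout grid as the target shape:)

    import numpy as np
    from skimage.transform import resize

    # epsf_data: the oversampled ePSF array (illustrative name)
    psf_resized = resize(epsf_data, (41, 41))

    # Interpolation does not conserve flux, so renormalize to a total of 1
    psf_resized /= psf_resized.sum()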

This SHOULD be fine, so long as resizing the ePSF back to the original image shape returns the pixels to the right size. I have no reason to think otherwise.

Global Backgrounds and Subtraction

As Marcin predicted, there are no two clear peaks in the asymptotic values. I tested three different methods of measuring an average: a plain median, a sigma-clipped median, and a weighted mean. I sigma clip with a bias towards larger SB values, as we know the lower values are definitely not ideal measurements. The results (SB histograms and medians) follow the sketch below:
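A minimal sketch of the biased clip (sb_values and the exact sigmas are illustrative):

    import numpy as np
    from astropy.stats import sigma_clip

    # Clip the low side harder, since the low SB values are the
    # untrustworthy ones; the sigma values here are placeholders.
    clipped = sigma_clip(sb_values, sigma_lower=2.0, sigma_upper=4.0)
    sb_median = np.ma.median(clipped)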

All pretty consistent values! I use the sigma-clipped value to generate a new globally background-subtracted profile in the matrix plot figure (the orange curve). The background measurement now has an associated error, determined as the standard deviation of the SB values where the smoothed gradient is close to 0. I have also, after far too much time, added a legend to the plot to explain what the curves are. Finally, I fixed the label on the top row to use the same font size as the others.
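The error estimate, roughly (the smoothing width and the gradient tolerance are placeholders):

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    # sb: a 1D surface-brightness profile (illustrative name)
    grad = np.gradient(gaussian_filter1d(sb, 3))

    flat = np.abs(grad) < 1e-3        # where the smoothed gradient is ~0
    bkg_error = np.std(sb[flat])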

They definitely don’t match the green profiles identically in every bin, but they look far better than I expected for such a rough measurement (of the background in both the green and the new orange profiles). We definitely lose some data to negative values in some bins, but that is to be expected at such low SB values.

Updated U-Band PSF FWHMs and More

Continuing with our exploration of the low FWHM values for u-band PSFs, I re-extracted stars from the u and u* images with a larger cutout size: instead of 41×41, I bumped it up to 91×91 pixels. I then re-ran the PSF generation procedure as before using these larger cutouts, resulting in a set of Moffat PSFs with a cutout width of 91.

Below are two histograms: the first for u-band, the second for u*-band. It appears, at least to me, that these PSFs are worse than the ones from smaller cutouts (there are now lots of PSFs with low FWHMs). It is also clear that the median is not shifting higher as we had hoped, at least not sufficiently.

Why is it worse? My explanation would be that a larger cutout introduces more error by way of companion objects (even if severely masked) or other artifacts. Then again, that doesn’t explain the large number of PSFs, especially in u-band, that now have smaller FWHMs than before.

I also wanted to ensure that the FWHM estimate I was getting from the Moffat model was correct. Per Ivana’s suggestion, I extracted the profiles (50 for each band, selected at random) for the u and u* PSFs and plotted their curves of growth, normalized to 1. The reason I have to normalize is that the total light within the 1D profile will not sum to 1, even though the image will. Where these profiles cross the dashed line gives the HWHM, so we have to remember to multiply by 2.

Below are two plots showing the curves of growth. The plot on the right simply zooms in on the central region, whereas the left plot shows the curve shapes in full. It is also important to note that the flat region at the very centre is due to limitations of numerical integration.

Finally, I updated the cutoff uncertainties chart to also include the numerical values of the uncertainties on the chart itself. I think it looks pretty good!

U vs U*, Densities, and Cutoff Uncertainties

Continuing with our exploration of the weird effects we are seeing in the different u-band images, I wanted to explore how things might differ between the two bands. The first thing I tried was looking at masking effects. I took u and u* patches covering the same region of the sky and drew an even distribution of cutouts from each. I then estimated the background for each cutout and masked it as well. This plot shows the difference in the number of masked pixels.

I also checked the difference in the background estimates for these cutouts. Plotted here are the differences in both the estimate and the RMS between u and u*. This recovers what we see for the estimates across multiple patches: the background estimate is lower for u, but the background RMS is higher for u*.

As we discussed, we also wanted to see some profiles extracted from u or u*. I generated a simulated galaxy and convolved it with a u-band PSF to give us a simple model to test. I didn’t change the galaxy since we are only looking at changes in the profiles at different places in the image.

I added this galaxy to each cutout and masked out companions. I then extracted the profile, saving the ones that succeeded in both u and u*. The figure below shows the differences between the profiles of 63 cutouts, as well as the median profile plotted in black. It appears that u-band is slightly fainter on average than u*, but definitely not by much.

I also did some work figuring out how to properly use the Scoville density data. It is a single datacube, but it has a 3D WCS. I had to do some work fixing the WCS because, for whatever reason, it was not loading in Python. Some checks lead me to believe it is working fine now.
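Loading it looks roughly like this (the filename is illustrative); conveniently, astropy can hand back just the celestial part of the 3D WCS:

    from astropy.io import fits
    from astropy.wcs import WCS

    with fits.open("scoville_densities.fits") as hdul:  # illustrative name
        cube = hdul[0].data                # 3D ndarray: (slice, y, x)
        wcs3d = WCS(hdul[0].header)

    wcs2d = wcs3d.celestial                # 2D sky WCS for position matching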

From the README I can construct 3 arrays of redshift information, containing the central, low, and high values for each slice. When we “collapse” this information, we can simply collapse the arrays in the same fashion, letting us use the data seamlessly at any level of coarseness we need.
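A sketch of what I mean by collapsing, with an assumed grouping factor (all names are illustrative):

    import numpy as np

    # z_low, z_cen, z_high: per-slice redshift arrays from the README.
    # Combine every n slices into one coarser slice.
    n = 4
    m = (len(z_cen) // n) * n                 # drop any leftover slices

    z_low_c = z_low[0:m:n]                    # low edge of each group
    z_high_c = z_high[n - 1:m:n]              # high edge of each group
    z_cen_c = 0.5 * (z_low_c + z_high_c)      # representative centre

    widths = z_high_c - z_low_c               # slice width vs. redshift

The widths array is what goes into the plot below.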

Here is a plot showing the slice width as a function of redshift, based on the arrays I have available.

Lastly, the test of the model matrix code on i-band images went smoothly. I do not see any downward trends in the i-band images, giving me confidence that it is the input images causing the different effect. However, something I DID notice was that the cutoff values differ between the i-band test and the original results. Not by much, and the trend is still there as expected, but I think the code is quite sensitive to randomness. After all, we have random selections throughout the workflow (KDE selection, image selection, and bootstrapping). It makes sense to me that we won’t get the same values every time.

To explore this, I have 10 jobs currently running on Cedar. All of these run the model matrix code on i-band images, which will give us multiple trials on a single band. This should give us some better knowledge of the uncertainty of the cutoff values themselves, which I think is very important for the paper.

Horrific Blunders, New Code, New Jobs

I will start this post off by saying that I probably had the most incredibly stupid moment of my research career thus far. In an absolutely monumental display of boneheadedness, I found out the hard way that I didn’t have backups for some of my Jupyter Notebooks. One of the notebooks (in particular, the one which generates our lovely matrix plots for medians and cutoff values) decided to corrupt itself to the point that it could not be recovered. I then found out, to my absolute horror, that not a SINGLE backup existed on any of my machines, and the copy on the external HDD was also corrupted.

So when I said that I would add the PSFs to the matrix plot, what I thought was going to be a quick 5-minute venture turned into a 4-hour panic, desperately trying to figure out how the me of 2 months ago made the dang code in the first place. I have to give my past self credit. It must have been pretty well written!

Despite these dumb mistakes, the code is back and (maybe) better than before! I checked the cutoff values that it generated against those in my thesis, and they are identical, so no funny business there. I will also have to rewrite the code for the percent differences plot, but that is a small additional step that will be really simple to replicate.

Here is the matrix plot with the median PSF added as the black profile. I’m assuming that as we start work on the PSFs, we’re gonna want to normalize it in some fashion. I left it un-normalized here because I wasn’t sure what we want to do with it; best to just show the shape of the PSF as an initial step.

Jobs Update

On a slightly less depressing note, jobs are steady and running well. Of the two simulations (u and u*), the u-band sims ran successfully. u* failed, but only because I had input the wrong key value for the band when submitting the job. It is running again (without incident as far as I know) and should be done shortly. Not wanting to waste any time, I am running the median extractions on Beluga for the u-band results, and a new (recreated) matrix plot showing these results should be ready for our meeting tomorrow if things go well.

I also attempted some troubleshooting on my FitsCompiler code, and it seems to me that there is no easy fix. Astropy’s documentation says that HDUList.copy() returns a shallow copy, so the new list presumably still points at the same underlying HDU objects in memory, which would explain the behaviour. Even when using the with fits.open(...) as f: pattern, I still get the same error when trying to save the compiled FITS file. I’m sure there is a more involved workaround that directly copies the headers and data for each HDU based on its type, but right now it’s unnecessary to work on. I’ll cross that bridge when I get there. You are welcome to try your hand at it if you’d like. The file is available at this github link.
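If anyone wants to try, the manual workaround I had in mind would look something like this (an untested sketch; it assumes each HDU class accepts data and header keywords, which is true for the common image and table HDUs):

    import copy
    from astropy.io import fits

    def deep_copy_hdulist(hdul):
        """Build a new HDUList whose headers and data are real copies."""
        new_hdus = []
        for hdu in hdul:
            new_hdus.append(hdu.__class__(data=copy.deepcopy(hdu.data),
                                          header=hdu.header.copy()))
        return fits.HDUList(new_hdus)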

PSF Final Testing

As a final check on our PSFs (Moffat versus 4 Gaussians), Ivana suggested I plot, for each PSF, the difference between the two models’ summed absolute residuals, versus the average magnitude of the input stars that make up the stacked image. This plot shows the result for u-band. I didn’t make the same figure for u*, as I assume the results will be consistent.

If a data point is below the line, the Moffat PSF modelled the median image better than the Gaussians; the opposite is true for points above the line. So for the most part the Moffat PSFs perform better than the Gaussians, but not in every case!

I also obtained all of the PSFs using the Gaussian models, and have them saved just in case we want to switch to them in the future.

Lots of AceNet Jobs

It’s been a productive few days! The first thing I wanted to tackle was the PSFs, to get them properly processed in full. Chi-squared values, even when using the square root of the stacked image as the uncertainty, are still suspiciously low (or absurdly high). This can be due to the many zeros present in the uncertainties, but even when accounting for them, things don’t get much better. Instead of wasting time on a workaround, I figured we could use the residuals we already have to compare the PSF models and decide on the better one. The sum of the absolute values of the residual pixels can be used to compare the Moffat models to a combination of Gaussians (in this case I tried 4). For the most part, the Moffat PSF tends to perform better. Perhaps we choose the model that has the smallest residual sum?
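The comparison itself is simple; a sketch, with illustrative names:

    import numpy as np

    def residual_score(data, model_image):
        """Sum of the absolute residuals between data and a model image."""
        return np.abs(data - model_image).sum()

    # moffat_img, gauss_img: the two evaluated model images
    diff = (residual_score(median_image, moffat_img)
            - residual_score(median_image, gauss_img))
    # diff < 0 -> the Moffat model fits this stacked image better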

For now, I decided to go ahead with the Moffat PSFs, as they seem to be consistently good at modelling the stars. I generated both sets of PSFs (for u and uS) and set out to start testing the simulations. I did a huge amount of work on the simulation code, allowing it to be more easily configured. After all, if we say it can run seamlessly on any combination of data sets, we need to make sure that’s actually true once things are published!

I submitted the jobs on AceNet, and we’ll see if things run smoothly! Based on some initial checks, I am confident that things will run without incident and we will have the cutoff values soon.

The processing jobs for extracting galaxies from the COSMOS field are complete (at least for the grizy bands; u and uS have to be run separately)! Checking the output files, it looks like everything ran smoothly with no need to rerun anything.

Density Estimates

After doing a little digging, I found Scoville et al. (2013), which appears to be exactly what we’re looking for. The downloadable data consist of a 3D dataset from which the field density per deg^2 or Mpc^2 can be obtained at a given sky coordinate and redshift, and the redshift coverage extends well beyond the range we care about. So for each object in our catalogue, I should be able to match it to a position on the image and obtain the appropriate slice using arrays built from the README. I’ve attached an example slice below:

It looks like it is saved as a single HDU inside the HDUList, and the data is a 3D ndarray containing everything I need to run the processing. Then again, since there is only 1 WCS (as far as I can tell), I might have to be careful with how I proceed.

Initial PSF Results

Sorry that this is coming a little late. I didn’t have time to create the blog post before Thursday, but I did make quite a bit of progress. I’ll go through the updates to CutoutCreator, as well as the procedure to create PSFs from the input data.

Procedure

CutoutCreator was first adjusted to save a couple of sets of parameters to the header for each object: the X and Y coordinates of the object relative to the initial image, and the X and Y coordinates of the object relative to the cutout. These are derived from the input RA and Dec coordinates and are used to get sub-pixel values for shifting. I ran CutoutCreator on both the u and u* images, so we can generate these results seamlessly for either band.

My PSF procedure starts by gathering all stars within 2 arcminutes of the PSF position. If this number is less than 5, we try again with a distance of 4 arcminutes. Right now, I don’t try to generate a PSF if even this larger area has fewer than 5 stars.
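In code, the fallback is simple (a sketch; psf_pos and star_coords are illustrative SkyCoord objects):

    import astropy.units as u

    sep = psf_pos.separation(star_coords)

    nearby = star_coords[sep < 2 * u.arcmin]
    if len(nearby) < 5:
        nearby = star_coords[sep < 4 * u.arcmin]   # widen the search
    if len(nearby) < 5:
        nearby = None                              # skip this PSF for now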

After this, I create a stacked image of all our stars. I use scipy.ndimage.shift to shift each individual image using the sub-pixel coordinates, and then normalize each image to have a total value of 1. I then stack the images together and take the median image. Finally, I shift the median image so that the brightest pixel sits at the centre.
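The core of the stacking, as a sketch (the cutout list and offsets are illustrative names):

    import numpy as np
    from scipy.ndimage import shift

    stacked = []
    for cutout, (dx, dy) in zip(cutouts, offsets):
        shifted = shift(cutout, (dy, dx))   # sub-pixel recentring (row, col)
        shifted /= shifted.sum()            # normalize to a total flux of 1
        stacked.append(shifted)

    median_image = np.median(np.array(stacked), axis=0)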

I then create a model using astropy. Right now I’ve been testing a 2D Gaussian and a 2D Moffat model, fitting each to the median image. At Marcin’s suggestion, I also compute the residuals between the model and the data. Below are some examples of 2D Moffat fits, showing the data (left), best-fit model (centre), and residuals (right). I also note the location of the generated PSF, as well as the number of input stars.
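The Moffat fit itself, roughly (the initial guesses are placeholders):

    import numpy as np
    from astropy.modeling import models, fitting

    y, x = np.mgrid[:median_image.shape[0], :median_image.shape[1]]

    moffat = models.Moffat2D(amplitude=median_image.max(),
                             x_0=median_image.shape[1] / 2,
                             y_0=median_image.shape[0] / 2,
                             gamma=3.0, alpha=2.5)

    fit = fitting.LevMarLSQFitter()(moffat, x, y, median_image)
    residuals = median_image - fit(x, y)

    # FWHM from the fitted Moffat parameters
    fwhm = 2 * fit.gamma.value * np.sqrt(2 ** (1 / fit.alpha.value) - 1)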

I’m going to be running some tests using multiple Gaussians. Without the ability to keep parameters fixed, 2 Gaussians go all over the place; however, I recently discovered that astropy models do have a fixed-parameter setting. That being said, the Moffat models tend to perform very well. We should have PSFs very soon!
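Fixing parameters is a one-liner per parameter (the values here are placeholders):

    from astropy.modeling import models

    # Two concentric Gaussians; pin the centres so they can't wander apart
    g1 = models.Gaussian2D(amplitude=1.0, x_mean=20, y_mean=20,
                           x_stddev=2.0, y_stddev=2.0)
    g2 = models.Gaussian2D(amplitude=0.1, x_mean=20, y_mean=20,
                           x_stddev=6.0, y_stddev=6.0)
    for g in (g1, g2):
        g.x_mean.fixed = True
        g.y_mean.fixed = True

    double_gaussian = g1 + g2   # compound model, fit exactly as before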

Stars Coverage Map

Above is the coverage map for PSFs. Plotted are the locations of galaxies in the COSMOS field, with the PSF regions overplotted, coloured by the number of stars brighter than magnitude 24 within 2 arcminutes. You can also see how I downloaded the PSFs from the PSF picker: the higher-density region on the left side of the image is the result of a required second download (it is a very finicky application and doesn’t download all the PSFs on the first pass). This was consistent across all HSC bands.

This is the same coverage map, but for stars within a 4 arcminute distance. Once we increase the distance, we definitely have a suitable number of stars for PSF creation.

Saturation Tests

Moving on: a second look at the catalogue confirms that peak surface brightness values are indeed included. I added them to my active catalogue with a quick concatenation using TopCat. After this, I plotted the “clean” values against each other (those not flagged with a -100 value for either the peak SB or the magnitude) in the following plots.

These provide some interesting results. Based on my interpretation of this figure, this implies no saturated stars, right? Saturation would, if I remember correctly, show up as a flattened region at the bright (leftmost) end of each figure.

More to come! Cedar has been a little finicky over the past couple of days, but I should soon have all stars (in both the u and uS bands) with a “proxy” SNR above 50 extracted.

Updates to CutoutCreator

CutoutCreator.py is becoming a lovely piece of software. Something that I am actually quite proud of. As I add more and more features, I realize just how useful a tool this will be for people processing data like this.

The code and README can be found here on my GitHub: https://github.com/HSouch/CutoutCreator

New Flags

--snr INT : Choose a signal-to-noise ratio that objects must be above in order for the cutout to be saved. This is currently defined as the maximum value of the cutout divided by the calculated RMS noise value. It is best used when masking images, as the maximum value should then lie within the target object.

--isolated FLOAT {0 – 1} : Choose the maximum allowed fraction of masked pixels for the image to be saved. Can only be used when masking is requested.

--maskparams STR (FLOAT, FLOAT, INT) : Directly control the masking parameters from the command line. The three inputs are the nsigma, the Gaussian width, and npixels. This is incredibly useful, as the severity of the masking can be adjusted directly at the user level.
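For illustration, a run combining the new flags might look like the following (the values are placeholders, and the exact maskparams format should be checked against the README):

python CutoutCreator.py HSC_IMAGES/u/ CATALOG.fits -O output/ -s 41 -m --snr 50 --isolated 0.1 --maskparams "1.5,2.0,5"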

Cutouts and SNR

Continuing with our exploration of required SNR and masking, I wrote a fun program that automatically generates a plot to view cutouts made by CutoutCreator. This is for images where the minimum SNR is 50. (Note that I am not sorting these by SNR. Why they appear to be sorted is COMPLETELY beyond me. Maybe they got sorted by u-band magnitude?).

For each cutout, the top number is the SNR and the bottom number is the fraction of pixels that are masked. I noticed that, with the more severe masking required for PSFs, the only cutouts with absolutely 0% masked pixels are those where the masking failed (at least a couple of pixels tend to be masked otherwise), so I am removing those as well. There are still rare cutouts where the masking fails and the source is not in the centre; given how rare they are (1 in 161 for these cutouts), I think they shouldn’t have any major effect on the final PSF.

Likewise, if we require a very high SNR > 1000, these are some of the cutouts we get. They are looking good!

CDFs for Different Regions

As requested, I created CDFs (with limits adjusted to show only out to u = 24) for different local regions. I created these for square regions of width 2 arcmin (the spacing between PSFs for HSC images), as well as 5 and 10 arcmin. I was having a little trouble getting a cone search to work, so I used square regions, which I think should be mostly consistent. The empty regions are places outside the defined COSMOS field where I still have PSFs from the PSF picker. They shouldn’t cause any problems when running simulations on MegaCam images, as those PSFs wouldn’t be used in HSC simulations.

U-Band Cutouts and File Structure

The first major thing I worked on was giving the file structure on Cedar a much-needed and overdue rework. Having things organized by tract is honestly a pretty bad way of structuring things. To fix this, I spent some time reorganizing everything by band, then splitting the files into appropriately sized sub-directories for AceNet jobs.

Now, if we need to run software on a specific group of images by filter, it is a piece of cake! This is also an important piece of preparation for getting u-band PSFs.

Everything organized by filter. How nice!

U-Band Cutouts for PSF Generation

As we discussed, we want to use images of stars in the COSMOS field to create our PSFs for u-band images. The first thing I wanted to do was check an ideal case where every object defined as a star in the catalogue gives a resolved image of a star.

To do this, I ran my code CutoutCreator on the u band images. I rewrote some parts of the code to better find images in a directory. Since this code runs much faster than Koe, we can do the whole directory in a single job with the following command:

python CutoutCreator.py HSC_IMAGES/u/ CATALOG.fits -b -O output/ -s 41 -m --multi

The flag -s specifies the width of the cutout (so 41 to match our PSFs), and we ask for masking with -m.

And it runs seamlessly! One thing I added to the code was a check for a square cutout after it is pulled out of the image. If it isn’t square, that means the object is extremely close to the edge of the image, so I don’t save it.
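The check itself is trivial; inside the loop over objects it is just:

    # A cutout clipped by the image edge comes out non-square
    ny, nx = cutout.shape
    if ny != nx:
        continue    # too close to the edge; don't save this object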

Checking the images, most of them look like this.

Not too good. Your first thought (and mine) would be that the code to get the cutouts isn’t working properly, so that was the first thing I inspected. When checking, I remembered that the code in CutoutCreator is identical to Koe, except that it doesn’t do profile extractions. You might also ask, “if these are the cutouts, why don’t we see them in the cutouts generated by Koe?” The reason is that Koe only saves the cutouts for objects with successful extractions; CutoutCreator lacks such a criterion.

To further ensure that CutoutCreator is working properly, I applied a magnitude cut of 0 < u < 24 to see if the brighter stars get acceptable cutouts. These are some examples of what I get.

Much better! Scrolling through the available stars, they consistently look exactly like this, so there is definitely a magnitude limit on which cutouts are good. For a magnitude cut of 24, u-band stars have the coverage shown in the map below:

The holes are NOT because of missing images, as these points are objects directly taken from the CLAUDS catalogue.

And as requested, here are the CDFs of u-band magnitudes for each patch. The same for uS band is on the way, but I’m waiting for the Cedar job to finish up.

Proceeding

In terms of proceeding from here, could we potentially use 2D interpolation? For all of our bright stars (say, a magnitude cut of 24), we could fit a Moffat or double Gaussian. We could then make 2D maps of the fit parameters, meaning we could recreate a decent PSF at any point we choose and ensure the PSF locations match those for the HSC imaging. This would be, as far as I know, a task I could complete in a day.
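A sketch of the idea using scipy (all names are illustrative):

    import numpy as np
    from scipy.interpolate import griddata

    # star_x, star_y: positions of the fitted stars; gammas, alphas:
    # the Moffat parameters fitted at those positions
    points = np.column_stack([star_x, star_y])

    def moffat_params_at(x, y):
        """Interpolate the Moffat fit parameters to any position."""
        gamma = griddata(points, gammas, (x, y), method="linear")
        alpha = griddata(points, alphas, (x, y), method="linear")
        return gamma, alpha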