IT learning progress log

Apr 2023: rekindled

  • Bought 2 ebooks but they were locked with a password, forcing me to use proprietary software to read them, so I couldn’t read locally. While trying to crack the PDF password with hashcat + rockyou, I realised my lifelong interest in general IT, hacking, tinkering etc. (i.e. I only ever stayed up until 4am to either play computer games or tinker with tech-related stuff). Found out more about digital forensics, computer forensics. Very interesting!

Jun 2023: planning my way forward

  • Heard that one cannot get straight into DFIR but should start somewhere like SOC analyst or sysadmin
  • Learned from reddit & confirmed through Seek that DFIR jobs are not common in Australia – switched target to an entry-level sysadmin position
  • Learning plan & roadmap created
  • Went through the first few lectures of several general computing courses that seemed to complement each other: Harvard CS50, NYU Computer hardware & OS… all in parallel, hopefully one area can help explain/enhance understanding of the other
  • Read more great blog posts about starting a career, pillars of skills… Went through A+ videos, went deep into subnet masks and subnetting, wrote 1st blog post, looked for grad cert/grad dip in comp sci at local universities with CSP to save fees

Jul 2023: first VPS running OpenBSD

  • Set up an OpenBSD VPS on Vultr to store sensitive data on detached storage, enabled passwordless SSH with a key pair
  • Learned how secure and robust OpenBSD is in multiple applications, such as a web server (this blog is running on one) or a simple router & firewall
  • Received a raspberry pi zero – thinking of building pihole, see notiapoint.com

Aug 2023: starting homelab, VMs

  • Bought an old Dell T630 server & an old Cisco Catalyst 3750 switch - started tinkering
  • Learned what lifecycle management is, what iDRAC is, and how it’s used to remotely manage the server
  • Learned to update firmware, BIOS
  • Learned about different types of hypervisor
  • Got a free VMware ESXi 8.0 license to run on bare metal
  • Learned about the difference between normal HDD and SAS HDD, different RAID array types, FreeNAS/TrueNAS and ZFS compared to other filesystems
  • Set up a 5xSAS RAID5 array
  • Learned to boot from image using vFlash
  • Decided to install ESXi on the vFlash card instead of any existing drive
  • Spun up first VMs, learned about thick and thin provisioning
  • Got a shell on ESXi and learned some basic esxcli commands - learned to attach a USB drive as a VMFS datastore
  • Learned to configure vmnic and let ESXi and VMs connect to the internet

Sep 2023: upgrading homelab

  • Got 2 GPUs and a more powerful PSU (1600W) to accommodate the extra load
  • Learned that I need a power supply extension board & cable kit to power 2 GPUs
  • Learned the different power cable connectors, 6-pin vs 8-pin… and learned that I can merge 2 6-pin connectors into 1 8-pin connector to power the 2nd GPU
  • Learned from the manual that I need the 2nd CPU in order to run the 2nd GPU. Also realised that I installed the 2nd GPU in the wrong, low-priority PCIe slot (slot 7), that’s why it couldn’t be seen by the OS
  • Had to mod the “wind tunnel” shroud to fit the oversized 2nd GPU in slot 6, network card went to slot 7.
  • Purchased & installed 2nd CPU (Intel Xeon E5-2660 v3)
  • Learned about FreeBSD while trying to set up a pfSense VM - tried installing FreeBSD including an X.org desktop environment for fun
  • Completed building the server with 2 x GPUs
  • Fixed fan overspeed issue after installing 2nd GPU (kids!! can you hear me??) Upgraded BIOS and PSU firmware along the way.
  • Set up Samba for file sharing within the home network with other Windows machines

Oct 2023: setup home network

  • Switched ISP and installed OpenWrt on an old Netgear wifi router as the main router, replacing the previous ISP-provided Archer C1200
  • Learned that I need a rollover cable to connect to and set up the Cisco switch
  • Learned that the switch has SFP ports and needs (expensive) GLC-T 1000BASE-T transceivers to take the RJ45 connectors of CAT6 cables (part # 30-1410-03 or 30-1410-02)
  • Ran CAT6 cable through the attic connecting ISP-provided router with homelab switch (got myself covered in fiberglass…)
  • Learned how to SSH into an AWS instance running CentOS 7 using both AWS CloudShell and PowerShell 7 on my work computer. Started to get how SSH and its key pair work in a cloud environment.

Nov-Dec 2023: tried running local AI

  • Naturally got sidetracked by machine learning, local AI etc. 2 GPUs were great for that purpose!
  • During that time learned about git and git lfs
  • Tried to pass a GPU through to an Ubuntu VM for local AI finetuning - struggled to find the right driver/CUDA/kernel version combo to utilise both GPUs. Installing CUDA Toolkit by reading the (outdated) doc is a pain in the ass.
  • No success in getting the Ubuntu VM in ESXi to see the 2nd GPU… the different drivers on ESXi and the VM OS started to complicate the whole process.
  • TIL I can use watch -d -n 0.5 nvidia-smi to watch GPU usage, or the output of any other command.
  • Took a break from local AI

Jan 2024: looked into cloud

  • As advised by a close friend working in IT, started looking into adding cloud to my skill mix. Came back to interest in sysadmin
  • Wrote a post about which cloud to choose
  • Went through MS Learn Azure Fundamentals in 3 days
  • While backing up by simply copying the LLMs I downloaded (~400GB) to the backup volume, I realized there are a lot of large objects in git/lfs/ - will try to see if I can remove these safely. Still have a very vague idea how git works in general
  • Uninstalled ESXi as VMware got acquired by Broadcom and free licenses got pulled. Might have to come back to older (pirated) ESXi versions. Reckon it should be enough for Linux sysadmin and cloud learning purposes.
  • Found myself increasingly interested in sysadmin and infrastructure side of IT. Frequently found myself staying up til 2am, wide awake and excited!

Feb 2024: kept learning linux admin, Docker

  • Ran Ollama with deepseek-coder:33b to help speed up learning process. Super helpful!
  • Got an APC UPS - learned that it shouldn’t be connected to anti-surge powerboard
  • Learned about C13/C14 power cable and these are needed to connect my UPS to my server’s PSUs
  • Learned to add second domain to httpd webserver, and create SSL certificate with Let’s Encrypt to enable https
  • TIL Docker Desktop is different from Docker CE, using different builder, having different context; installing Docker Desktop will disable docker engine daemon etc. and how to switch between the two in the CLI.
  • Tried to run a hashcat Docker container (making use of the NVIDIA GPU) but it failed with the following error - might be a permission issue. Will try to learn how to run Docker Desktop as root, or in rootless mode
(HTTP code 400) unexpected - failed to create task for container: failed to create shim task: OCI runtime create failed: 
runc create failed: unable to start container process: error during container init: error running hook #0: 
error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: load library failed: 
libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
  • Created a volume with fdisk on the 5 x SAS disk RAID5 array to store the server backup. Finally understood that a “mount point” is simply a folder through which you access the entire disk partition (something I failed to understand clearly as a long-time Windows user)
  • Created a shell script to automate/schedule this backup with a cron job - learned about the difference between tar and rsync. I’ll still have to learn to use both, though rsync seems to fit my needs particularly well for now.
  • Finally able to run the hashcat Docker container as root from the CLI. It really was a permission issue.
  • Installed zfsutils-linux and created a 14TB zpool spanning across 2 SATA disks as another place to store backup. Must write an article to compare these 2 options.
  • Setup tailscale on work computer and homelab for remote access. Learned about /etc/resolv.conf, why it gets overwritten on WSL and how to use systemd-resolved as a DNS manager. Finally intuited how tailscale and, similarly, cloudflare work now. Also started to intuit the concept of shell and shell within shell.
  • Ubuntu booted up extremely slowly. TIL some handy tools to diagnose this: systemd-analyze blame and systemd-analyze critical-chain, or dumping the whole boot process into a nifty chart with systemd-analyze plot > bootchart.svg, which is particularly helpful for visual learners like me.
  • Finally intuited Docker context. Uninstalled Docker Desktop to move to Docker CE, working on CLI entirely. Learned to save Docker Desktop images to tarballs and load them back into CE. Learned to organise docker directory. Created first .yml file and spun up a container from it. Started to get the concept of IaC now.
  • Spun up my first KVM VM running OpenBSD. Learned that setting up a console to shell into the VM can be tricky. I don’t understand enough about serial consoles etc.
  • TIL that due to the cost of bandwidth with Aussie ISP, Cloudflare CDN in free plan actually reroutes traffic overseas before sending it back to host, defeating the purpose of a CDN
  • Realized that rsync alone is not enough for my use case, as I will tinker and break a lot of stuff. Better to have historical backups of the machine to roll back to. Created another bash script to back up the system using tar this time, keeping versions in the form of multiple tarballs. Will pair this with git eventually. Learned that a trailing / after directories makes a difference (at least in tar), and that tar usually runs relative to the current dir, so if you want to archive files etc. from another dir, use -C <dir> - and this particular flag is order sensitive as it affects all the arguments that follow it
  • Spun up a container running an Alma Linux base image. Took a long time. Will try to install Foreman, not Foreman + Katello, for simplicity’s sake while learning (only learned recently that they are totally different beasts). Maybe will have to use a VM here instead of a container, but I’m slowly building my knowledge foundation up.
  • Alma Linux: to view all open ports: ss -tulpn (make sure to have the iproute package installed first)
  • Encountered “Could not get default values, cannot continue” running foreman-installer. Log shows:
Error: The parameter '$slowlog_log_slower_than' must be a literal type, not a Puppet::Pops::Model::AccessExpression (file: /usr/share/foreman-installer/modules/redis/manifests/init.pp, line: 426, column: 15)
  • Downgraded puppet as suggested here
  • Encountered a different error
Forward DNS points to 127.0.1.1 which is not configured on this server
Output of 'facter fqdn' (T630.lan) is different from 'hostname -f' (T630)
  • Set hostname domain as suggested here.

the solution is to create a fully qualified domain name (FQDN). I did it this way (src; also answered here):

1. Change the server IP to a static one. I reserved the IP address in the router and left the server in DHCP mode; one might apply a static address on the server itself.

2. Set a fully qualified host name, like foreman.example.com, using the hostnamectl set-hostname foreman.example.com command.

3. Edit (append to) the /etc/hosts file to map the host name of the server (step 2) to its IP (step 1); one can use this command: echo "192.168.1.50 foreman.example.com" >> /etc/hosts (of course, change the IP and the hostname). Note the >>: a single > would overwrite the whole file instead of appending.
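A minimal sketch of that append step, written so it can be rerun safely. It works on a temporary copy rather than the real /etc/hosts, and the IP, hostname and the add_host_entry helper are all made-up example names:

```shell
#!/usr/bin/env bash
# Sketch only: operates on a temp file, not the real /etc/hosts.
# The IP address and hostname are example values.
hosts_file=$(mktemp)
trap 'rm -f "$hosts_file"' EXIT
printf '127.0.0.1 localhost\n' > "$hosts_file"

add_host_entry() {
    local ip=$1 fqdn=$2 file=$3
    # Only append when the name is not already mapped, so reruns are harmless.
    if ! grep -qwF -- "$fqdn" "$file"; then
        printf '%s %s\n' "$ip" "$fqdn" >> "$file"   # >> appends; > would truncate
    fi
}

add_host_entry 192.168.1.50 foreman.example.com "$hosts_file"
add_host_entry 192.168.1.50 foreman.example.com "$hosts_file"   # second call is a no-op
cat "$hosts_file"
```

On the real server the same function would be pointed at /etc/hosts and run as root.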

  • Got error
System has not been booted with systemd as init system (PID 1). Can't operate.
  • Destroyed the container to use a different image: almalinux:8-init. Didn’t work. Don’t want to delve too deep into containers yet. Decided to dual boot Alma Linux 8 on the 2nd SSD (virtualisation host profile).
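Going back to the tar notes from the backup script earlier this month, here is a small sketch of how -C behaves. It assumes a tar that supports -z (GNU tar or bsdtar), and every path is a throwaway example under a temp dir:

```shell
#!/usr/bin/env bash
# Sketch: how -C changes the directory tar works from.
# All paths below are example names created under a temp dir.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir -p "$workdir/src/etc-copy" "$workdir/out"
echo "hello" > "$workdir/src/etc-copy/app.conf"

# -C switches directory before the member names that FOLLOW it are read,
# so the archive stores "etc-copy/app.conf" rather than the full path.
tar -czf "$workdir/out/backup-$(date +%F).tar.gz" -C "$workdir/src" etc-copy

# List the archive to confirm the stored paths are relative.
listing=$(tar -tzf "$workdir"/out/backup-*.tar.gz)
echo "$listing"
```

Putting -C after the member name would leave tar looking for etc-copy in the current directory, which is why the order matters.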

Mar 2024: KVM KVM KVM

  • During installation of Alma, learned more about LVM; comparing xfs and ext4. Very interesting! Noticed that Alma Linux creates a logical volume (LV) for /home by default, spanning 2 disks, which made me nervous because I don’t know how to assign /home to a separate disk like I did before.
  • Read a lot of discussion and advice about linux admin. Started to move away from Ubuntu and more into Enterprise Linux ecosystem. Probably only use Ubuntu for fun running LocalAI etc.
  • Successfully installed Foreman on the Thinkpad T480 running Alma Linux workstation. Still don’t understand the architecture of how Foreman interacts with the future hosts.
  • Installed Alma linux 8 but with virtualization host profile. Very interesting and daunting to purely use CLI to manage the server, but this is what I’m in for.
  • Set up Cockpit web GUI. Accessed it remotely from a Windows machine. This is just like ESXi!! Then I realized I gotta run Podman instead of Docker for containers, because the two don’t work on the same machine. May have to defer learning about containers until I get a full grasp of KVM? I don’t know why I still shy away from Proxmox. Maybe because it seems to be an easy way out from ESXi?
  • I need Alma linux iso to spin up KVM VMs, which is not stored in the current drives of the newly installed Alma. It’s locked behind the zfs pool of ubuntu, not on a separate drive that can be mounted. The pure CLI environment means I can’t cheat with GUI

5/3

  • Learned to edit /etc/fstab to mount the drive. But the iso was not in there
    1. Get partition UUID with lsblk -f
    2. Edit /etc/fstab: add UUID=<partition UUID> <mount point path> <fs type> defaults 0 2 - use UUID here instead of device path /dev/sda1 because if we swap HDD location, it’ll still use the right drive
    3. Create mount point as added in previous step such as /mnt/backup-storage
    4. Update systemd with systemctl daemon-reload
    5. Mount the disk with mount <partition> <mount point> such as mount /dev/sda2 /mnt/backup-storage
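The steps above can be sketched as a tiny helper that only builds the fstab line, without touching the real /etc/fstab. The UUID, mount point, filesystem type and the fstab_entry function are example values and names, not taken from the actual server:

```shell
#!/usr/bin/env bash
# Sketch: build an /etc/fstab entry without modifying the real file.
# The UUID below is made up; on a real system it comes from `lsblk -f`.
fstab_entry() {
    local uuid=$1 mountpoint=$2 fstype=$3
    # Fields: device, mount point, fs type, options, dump flag, fsck order.
    printf 'UUID=%s %s %s defaults 0 2\n' "$uuid" "$mountpoint" "$fstype"
}

entry=$(fstab_entry 0a1b2c3d-1111-2222-3333-444455556666 /mnt/backup-storage ext4)
echo "$entry"

# On the real machine (as root), the remaining steps would be roughly:
#   mkdir -p /mnt/backup-storage
#   echo "$entry" >> /etc/fstab
#   systemctl daemon-reload
#   mount /mnt/backup-storage      # device and options are read from fstab
```

Once the entry exists, mount only needs the mount point, which is also a quick check that the new line is valid before the next reboot.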

6/3

  • Trying to set up Samba so I can copy the ISO from my main Windows machine to the drive attached earlier. Couldn’t get it working as easily as before with a GUI. Encountered multiple errors:
    1. Can’t access the server IP at all. Turned out it’s the firewall rule in Cockpit. Added samba to the white list
    2. Still can’t access the server. Got system error 1219, meaning multiple connections exist to the server using the same username and different credentials. But I swear I didn’t have anything connected to the server but File Explorer. Turned out it’s the network drive mapping I did earlier to assign a network drive to a disk for media sharing purposes. Disconnecting it, signing out and back in again fixed it.
    3. I could see the shared folders at the IP now, but I couldn’t access them. Error: the network name cannot be found. After a while diagnosing with ping, changing ownership and permissions of the shared dir, net use etc., I stumbled upon the /etc/samba/smb.conf.example file, which mentions labelling newly created shared dirs as samba_share_t because the system is running Security-Enhanced Linux. With this label, SELinux allows Samba to read and write to these dirs. What an eye-opening read!! Now I know what SELinux is lol. Fixed it by recursively modifying the dir contexts with sudo chcon -t samba_share_t -R <dir> (learned that you can’t group -t and -R together as -tR, because -t expects its type argument right after it, and understood a bit more about Linux command syntax) - then restarted the samba service.
  • TIL about the idea behind systemd and a bit deeper into the booting process
  • Learned that I need to sign zfs module because of secure boot. Gotta learn how to do this and write about it. (update: I did it! See here)
  • Installed & upgraded Ansible - realized I will need to upgrade Python 3.6 (RHEL8 default) to 3.9 as well

8/3

  • Trying to spin up a KVM VM again after the previous OpenBSD one, but instead of virt-install, this time using an XML file as I think it will pave the way for future automation. And man it was a struggle for 2 full nights. I was on the verge of “cheating” with virt-manager because of its GUI and being more beginner-friendly. But I decided to take the hard way to understand each line of the XML file and the proper way to spin up KVM VMs. Here is what I learned.
  • Finally learned to sign zfs kernel module!! Feel really good about this! See here
  • I have been living on edge as I got this in log: Device: /dev/sdf, SMART Failure: DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH. I was right to be nervous. That’s why I have been avoiding using /home. Looks like time to learn to replace a disk that belongs to an LVM volume group.
  • Got stuck at the first step, moving the extents off /dev/sdf1 (the failing disk): pvmove /dev/sdf1 returns “No extents available for allocation.” Realized I can’t move 1TB of space onto 500GB even though there’s no file on that 1TB. Looks like I have to add a member to the group that’s big enough to hold the extents. But how will I remove the member later?
  • Ok kinda got it sorted! Here are the steps
  • Getting stuck at using cloud-init to spin up an Alma Linux KVM VM. Can’t pass the user data to the VM for some reason. Arrgh!!
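For the LVM disk swap above, this is a dry-run sketch of the usual command sequence, not necessarily the exact steps from the linked write-up. The volume group name "almalinux" and the replacement partition /dev/sdg1 are assumptions, and nothing is executed: each command is only printed.

```shell
#!/usr/bin/env bash
# Dry-run sketch of swapping a failing disk out of an LVM volume group.
# VG and REPLACEMENT are placeholder names; only FAILING comes from the log.
VG=almalinux
FAILING=/dev/sdf1
REPLACEMENT=/dev/sdg1

# Print the command instead of running it.
plan() { printf '%s\n' "$*"; }

lvm_replace_plan() {
    plan pvcreate "$REPLACEMENT"        # initialise the new partition as a PV
    plan vgextend "$VG" "$REPLACEMENT"  # add it so the VG has free extents
    plan pvmove "$FAILING"              # migrate allocated extents off the bad disk
    plan vgreduce "$VG" "$FAILING"      # drop the emptied PV from the VG
    plan pvremove "$FAILING"            # wipe the LVM label before pulling the disk
}

lvm_replace_plan
```

The vgreduce step is the answer to "how will I remove the member later": once pvmove has emptied a physical volume, it can be taken out of the group.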

11/3

  • After going at it for 3 long nights I finally did it!!! Starting a KVM VM reliably from a generic CentOS Stream cloud image and cloud-init. And the cause of my struggle was so stupid it’s not even funny. Time to start installing Foreman. Gotta remind myself to treat the VMs from now on as cattle instead of pets.

14-25/3

  • Something is wrong with Plesk at work: we cannot log in via the usual route. It’s on and off in a funny way. Suspecting a DNS problem. Spent 2 weeks investigating the cause and hardening the whole server, all while planning to move from CentOS to Alma Linux. This is a good chance to have a fresh start without the mess from the previous admin (nginx auto-restarting every 5 secs made log analysis a pain in the hind). Found an image from Plesk that seemed to make the entire thing much less painful.
  • Finally able to use Terraform to spin up some EC2 instances as well as local VMs. Now some more practice before adding Ansible on top for config. Maybe I can skip Foreman for now(?)
  • Installed webmin in parallel with Cockpit
  • Finally got dual GPUs running stably on Alma Linux with the right driver and kernel etc. This is such a pain in the ass as the documentation is all over the place. Learned to undervolt the cards to have better inference performance while saving power! See it here

Apr 2024: ZFS, backup, and f***ed up

  • TIFU while migrating files to a new zpool because I was distracted. Lost quite a few books. Luckily having backups on other media helped. Learned much more about ZFS’ zpool, dataset, scrub, snapshot, ashift etc. and how I must be extra focused at this stage of learning while working on this kind of production-lab hybrid environment, which is the nature of a homelab. Will have to practice backing up and restoring many, many more times, with rsync, with tarballs, and with ZFS snapshots.
  • TIL about systemd timers, a bit deeper about shell scripts, dot files, customising the shell. Might move to zsh soon, and will learn to use tmux, and learn to automate a lot more tasks other than backup and scrub. Also added MFA to SSH login
  • TIFU again, this time much bigger than yesterday. Lost 15 years of media library. Hurt so much not because of the data but because it seems the lesson yesterday didn’t sink in fast enough. I was too foolish to try migrating data from one zpool to another with zfs send | receive without clearly understanding it or doing a dry run. Painful lessons learned here

May 2024: automate VM and cloud deployment

  • Learned to use terraform to deploy local VMs with libvirt provider. Also learned that I can login to guest VM remotely with virt-viewer -c qemu+ssh://<username>@<hostname>/system <vm name>. Super excited to be able to access my VMs from anywhere!

  • Whelp, after rebooting the CentOS Stream 8 VM spun up with cloud-init a month ago, I realized I can’t log in through the console again. I can only shell in as the default user, but then I can’t change the password because the current password is wrong. Shelling in or logging in as root is fine (but that’s not best practice). That’s when this part in the cloud-init documentation made sense: “Most cloud-init configuration is only applied to the system once. This means that simply rebooting the system will only re-run a subset of cloud-init.” I might be able to ignore this problem for now, but still, gotta read deeper about cloud-init.

  • Learned that we can pass the password’s hash instead of plain text thanks to this line in the cloud-init doc: “Specifying a hash of a user’s password with passwd is a security risk if the cloud-config can be intercepted. SSH authentication is preferred.”

  • Turned out the problem above is not a problem but actually a feature. After all, password auth is frowned upon, so as long as I can shell in with my SSH key, I’d consider it a win.

  • Learned a bit deeper about file and directory permissions. Recalled that a few months ago, while trying to contain a piece of malware on Plesk (alfabypass.php), I thought I was being smart by removing execute from all files AND directories in . Naturally, all of the websites’ styling broke, leaving only barebone HTML. Had to roll back the whole server in shame, because I didn’t know which default permission to restore for which file/dir. Fun and scary experience.

  • The more I learn and read and listen about Linux and its administration, the more I want to be good at this!! I think I found the one job that I want to do even without pay.

  • TIL about Single Event Effects (SEE) that can cause a single bit flip in computers, which can lead to devastating consequences. Now I understand more about the role of ECC RAM and how fragile technology can be

  • Although I’ve been comfortably deploying local VMs with terraform, it has come to the point where I might need to reconsider this approach compared to Foreman + Katello or Vagrant, as it looks like terraform is better at provisioning cloud instances thanks to more robust providers. Deploying local VMs, on the other hand, relies solely on this single provider, which can be limited in terms of functionality. In other words, a simple way to provision local VMs can just be a bunch of .xml files and virsh, and a more complex way is through Vagrant or Foreman, each of which has its own pros that seem to outweigh terraform’s.

  • Also I’ve been actively avoiding using any GUI for admin tasks, as I think it doesn’t help with my understanding of how the system works under the hood. One click can mask so many command lines that I need to learn to understand the mechanism.

  • Learned more about SELinux and how to read, add and remove fcontext labels etc.

  • Learned to use logrotate to preserve .bash_history, which is really important during this stage of learning.

  • Migrated this blog to a new AWS instance running RHEL9

  • Some nifty handy scripts:

    • diff -qr . data | grep ' differ' - get a list of corresponding files whose content differs
    • sed 's/\r$//' - search for carriage return char and replace with an empty string. Very useful for cleaning up the text files created in Windows
    • watch -n 1 iostat -xy --human 1 1 - handy when you need to monitor disk activity
    • iftop -i eno0 - monitor network activity on specific network interface
    • ss -ltp
    • for i in *.mp4; do ffmpeg -i "$i" "${i%.*}.mp3"; done - convert all mp4 files in dir to mp3 with ffmpeg
    • find . -type f -name '*.jpg' -exec rm {} \; - find and remove all .jpg files (quote the pattern so the shell doesn’t expand it first); add -size to filter by file size
    • sudo dmidecode --type 17 or sudo lshw -short -C memory - display RAM info (type and speed etc.)
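A safe way to try two of the one-liners above is inside a throwaway directory. This sketch assumes GNU sed (for \r) and uses made-up file names:

```shell
#!/usr/bin/env bash
# Sketch run against a temp directory; all file names are examples.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

touch keep.txt a.jpg
mkdir sub && touch sub/b.jpg

# Quote the pattern so the shell does not expand *.jpg before find sees it,
# and end -exec with an escaped semicolon.
find . -type f -name '*.jpg' -exec rm {} \;
remaining=$(find . -type f | sort)
echo "$remaining"

# Strip Windows carriage returns from a CRLF file (GNU sed understands \r).
printf 'line one\r\nline two\r\n' > dos.txt
sed 's/\r$//' dos.txt > unix.txt
```

Swapping -exec rm {} \; for -print first is a cheap dry run before deleting anything for real.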
  • Today I’m one more step closer to deploying multiple VMs locally with static IPs using 1 single main.tf file, after a lot of terraform apply and destroy. I’ve been trying to deploy 2 VMs, one Fedora and one RHEL-based, with static IPs. Initially, I got a permission denied error where libvirt couldn’t read the newly-created libvirt_volume. From experience, I guessed it could be that kvm-qemu doesn’t have enough permission, or the file permission of the parent folder where the libvirt pool is located, or some SELinux issue with it. After some troubleshooting, it turned out to be the simple problem of parent dir permission. Setting it to 775 instead of 770 helped. It’s a bit weird because this kind of problem didn’t pop up last time I ran terraform.

  • Then I could deploy the VMs, but none of them got the static IP I assigned in the main.tf file. So I read around and found out that I had to specify qemu_agent=true for the VMs to grab the static IPs. It kind of worked, but for some reason only the Fedora VM got a static address, while the RHEL VM only had IPv6. So I thought maybe it has to do with the cloud image itself, so I switched from RHEL8 to CentOS 8. Didn’t work either. I mean at this point, I was thinking maybe I can just log in and manually configure the IP address with nmcli. After all it’s about getting the VMs up and running, not HOW you do it. But well, I can’t live with it!

  • So I tried providing a different cloud-init network config for each of the VMs in the main.tf file, thinking that maybe the QEMU guest agents in the VMs didn’t start properly, and we can assign the IP with cloud-init (still not sure how cloud-init works under the hood though, must find out). And this time, the Fedora VM didn’t get a static IP!! Ha! We’re getting somewhere. So the problem is not with the RHEL-based image but maybe the order in which terraform + libvirt provision and configure the VMs.

  • So finally I tried providing 2 totally separate cloud-init injections for each VM. And it worked!!! I was exhilarated!

  • While trying to set up IdM (FreeIPA) and Foreman, I just realized that I don’t know anything about DNS servers and what happens under the hood when a machine requests a domain name. All my DNS knowledge is limited to changing the zone file on my registrars’ platforms. Felt like hitting a brick wall because this topic is certainly hard and worth diving deep into.

Jun 2024: CLI syntax, DNS, basic infra setup

  • TIL the exact behaviour of rsync’s --exclude flag. The problem: I was trying to exclude a specific dir, but it still showed up in the backup destination: rsync -arv --exclude=/home/user/excluded-dir /home/user destination/ From the man page: if the pattern starts with a / then it is anchored to a particular spot in the hierarchy of files, otherwise it is matched against the end of the pathname. This is similar to a leading ^ in regular expressions. Thus “/foo” would match a name of “foo” at either the “root of the transfer” (for a global rule) or in the merge-file’s directory (for a per-directory rule).

    An unqualified “foo” would match a name of “foo” anywhere in the tree because the algorithm is applied recursively from the top down; it behaves as if each path component gets a turn at being the end of the filename. Even the unanchored “sub/foo” would match at any point in the hierarchy where a “foo” was found within a directory named “sub”.

  • Basically, slashes matter because they determine the root of the transfer.

  • If you append a slash to the end of the source directory, rsync -a /home/user/ destination, the root of the transfer is the user dir, and rsync will copy the contents of the user dir.

  • Otherwise, with no slash at the end, rsync -a /home/user destination, the root of the transfer is the home dir, and rsync will copy the whole user directory itself to the destination. The root of the transfer is what anchored exclude patterns are matched against.

  • Then, within the root of the transfer, to match the excluded dir /home/user/excluded-dir/ the filter path must be /user/excluded-dir/ (or /excluded-dir/ if the source had a trailing slash). Also a filter pattern of excluded-dir/ would match that directory anywhere in /home/user, even /home/user/subdir/subdir/subdir/excluded-dir/. Syntax matters so much in Linux.
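To see the anchoring rules in action, here is a small sketch that runs the three variants against a temp directory. The layout under $workdir is made up to mirror the /home/user example, and the demo is skipped if rsync isn’t installed:

```shell
#!/usr/bin/env bash
# Sketch of anchored vs. unanchored --exclude patterns, run in a temp dir.
# Directory names mirror the example above but live under $workdir.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir -p "$workdir/home/user/excluded-dir" "$workdir/home/user/docs/excluded-dir"
touch "$workdir/home/user/excluded-dir/a" \
      "$workdir/home/user/docs/excluded-dir/b" \
      "$workdir/home/user/docs/c"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash on the source: "user" itself is copied, so the
    # anchored pattern has to start with /user/.
    rsync -a --exclude='/user/excluded-dir/' "$workdir/home/user" "$workdir/dest1/"

    # Trailing slash on the source: the contents of user are the root of the
    # transfer, so the anchored pattern starts at /excluded-dir/.
    rsync -a --exclude='/excluded-dir/' "$workdir/home/user/" "$workdir/dest2/"

    # Unanchored pattern: matches excluded-dir at any depth.
    rsync -a --exclude='excluded-dir/' "$workdir/home/user/" "$workdir/dest3/"

    # Show which files made it into each destination.
    (cd "$workdir" && find dest1 dest2 dest3 -type f | sort)
else
    echo "rsync not installed; nothing to demonstrate"
fi
```

The anchored patterns only drop the top-level excluded-dir and keep docs/excluded-dir, while the unanchored one removes both.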

3/6

  • Today is the day I feel like, after a lot of building up and tearing down and frustrating troubleshooting of virtual network issues around the kvm/qemu/libvirt/cloud-init/terraform/ansible stack, I can safely say that I have achieved a small milestone, not too different from a kid who has learned to walk safely, or an apprentice who has learned to use the basic tools properly. At this stage, I can:

    • Reliably spin up a proper virtual network and machines using Terraform and corresponding OS cloud image
    • Inject/modify user info, SSH keys, and run some command on first boot with cloud-init
    • Create basic Ansible inventory file and group & run adhoc commands on multiple VMs
    • Manage VMs with virsh for basic tasks such as snapshot, revert, and editing XML files if needed
    • Manage files & directory owner, group, permission and SELinux fcontext following least-privilege principle
    • Create a basic backup and recovery plan following 3-2-1 principle, using rsync and tar
    • Automate backup and maintenance shell script using crontab, timer
    • Understand multiple RAID configs
    • Create partition and filesystem on new disks, edit /etc/fstab and mount the filesystem
    • Setup basic network connection with nmcli and nmtui
    • Basic tmux, enough to get by without opening multiple shell windows
    • Set up SSH connection to VM/cloud instance over a VPN such as tailscale
    • Use rsync and scp to transfer file remotely
    • Setup Samba fileshare server & manage access
    • Learn to combine commands such as find, grep, awk, sed together, and make use of history for more efficient administration
    • Troubleshoot most problems with the right Google-fu of the error message (I dread the day when I encounter a problem that I can’t find the answer online)
    • Be able to dive deeply into a topic to learn enough to be productive with it in a short time
  • Also more importantly, I have had several revelations and realisations that, albeit small, were intuited so quickly that if I don’t write them down, I will no longer remember why it caused me so much confusion in the beginning. Things such as:

    • libvirt default network virbr0 shouldn’t be touched or configured manually, instead we should leave it for when creating/modifying the virtual network
    • when specifying a static IP address with terraform, it actually adds a static lease to the virtual network instead of changing the VM’s NIC config itself
    • many other tools will have to rely on an agent running on the host machine to perform tasks. Such can be qemu agent, or EC2 connect, or cloudflare tunnel or tailscale
    • aware of the endless possibilities and strength of shell scripting, be it bash or python or powershell, combined with regex, for file manipulation and system management
    $ echo "I love sushi, sushi is great!" | sed 's/sushi/hotdog/'
    I love hotdog, sushi is great!
    
    $ echo "I love sushi, sushi is great!" | sed 's/sushi/hotdog/g'
    I love hotdog, hotdog is great!
    
    $ sudo netstat -anp | grep ::80
    tcp6       0      0 :::80                   :::*                    LISTEN      4011/apache2  
    
    $ sudo netstat -anp | grep ::80 | sed 's/.*LISTEN *//'
    4011/apache2
    
    $ hello="Hello, world!"
    $ echo $hello
    Hello, world!
    $ unset hello
    $ echo $hello
    
    $ du -hs ~
    100G	/home/me
    $ myhomedirsize=$(du -hs ~ | awk '{print $1}')
    $ echo $myhomedirsize
    100G
    
    $ y="cats dogs bears"
    $ for x in $y ; do echo "I like $x" ; done
    I like cats
    I like dogs
    I like bears
    
    $ for x in cats dogs bears ; do echo "I like $x" ; done
    I like cats
    I like dogs
    I like bears
    
    $ cat lines.txt | while read x ; do echo $x ; done
    x equals 0
    x equals 1
    x equals 2
    
  • Setup a VM as DNS server for lab virtual network using BIND. Learned much more about zone file than I did in the past. Now the whole lab network will use this DNS server.

  • Setting up a DNS server also taught me how a system prioritises hostname resolution: /etc/hosts > NetworkManager > /etc/resolv.conf

  • Learned about monolithic libvirt daemon vs modular daemons after encountering some hiccups with libvirtd not starting after adding more disks to LVM

  • Had to rebuild the zpool to add 2 more disks to improve redundancy, going from raidz1 to raidz2. This is the third time but I learned so much more from past experience. Now each dataset has proper recordsize, xattr, atime and compression. Also the whole pool has an SSD partition as ZIL to improve performance.

  • Learned about Proxmox HA clusters and Ceph storage. So eager to try that I got 4 mini PCs to start a cluster and build my own OPNsense box on top. These will run applications that benefit from HA in the context of home usage, mostly network services like DNS, DHCP, firewall or VPN…

Jul 2024: Proxmox cluster, Opnsense, DNS again, podman

  • Set up a Proxmox cluster with 3 x ThinkCentre M910q mini PCs, upgraded them with a bunch of old laptop RAM and NVMe drives. Going to build a 4th one with a 4-port NIC to be an OPNsense or NVR box.
  • Moved main server to a new location, rewired all cables in the homelab. Learned to crimp RJ45 connectors & keystones. Cable management. Preparing to bring home a 12U rack + patch panel + PDU for proper management
  • Started to intuit the 7 OSI layers when troubleshooting the Cisco managed switch not “switching” after changing ports due to the move above. Should write a post about this.
  • Restored and patched work website after it was hacked
  • Something is wrong with tailscale DNS (100.100.100.100) as domain names don’t get resolved. Disabling MagicDNS, switching to systemd-resolved and using the previous DNS fixed it. Gotta read a bit more into systemd-resolved to see how to control it properly.
  • Dabbled with Postgres trying to run Miniflux, Paperless…
  • Getting more used to containers (mainly podman) and understanding a bit more after running some in production (Deluge, Jellyfin, Paperless and Miniflux), particularly in an SELinux environment
  • Learned about setting up dynamic DNS with the Cloudflare API via a nifty shell script. Will carefully explore this to expose some services to the internet just for fun.
  • Successfully built and deployed an OPNsense box with a Lenovo M720q, PCIe riser card, 4-port NIC and the Proxmox cluster. Minimal downtime. On the very evening that CrowdStrike bricked all MS systems. Also learned about the existence of the ML2 interface - a proprietary PCIe interface from Lenovo.
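For the Cloudflare dynamic DNS idea above, a dry-run sketch of what such a script might look like. This is not the script from the log: ZONE_ID, RECORD_ID, CF_API_TOKEN, the hostname, the IP and the DDNS_SEND switch are all placeholders, and the request is only sent when DDNS_SEND=1 is set. It assumes Cloudflare’s v4 endpoint for overwriting an existing DNS record.

```shell
#!/usr/bin/env bash
# Dry-run sketch of a Cloudflare dynamic DNS update.
# ZONE_ID, RECORD_ID and CF_API_TOKEN are placeholders to be supplied by the user.
ZONE_ID=${ZONE_ID:-your-zone-id}
RECORD_ID=${RECORD_ID:-your-record-id}
API="https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records/$RECORD_ID"

# Build the JSON body for an A record (hypothetical helper).
ddns_payload() {
    local name=$1 ip=$2
    printf '{"type":"A","name":"%s","content":"%s","ttl":300,"proxied":false}' "$name" "$ip"
}

# In a real script the IP would be looked up at runtime; this one is a
# documentation address used purely as an example.
payload=$(ddns_payload home.example.com 203.0.113.7)
echo "PUT $API"
echo "$payload"

# Only talk to the API when explicitly asked to.
if [ "${DDNS_SEND:-0}" = "1" ]; then
    curl -sS -X PUT "$API" \
        -H "Authorization: Bearer $CF_API_TOKEN" \
        -H "Content-Type: application/json" \
        --data "$payload"
fi
```

Run from a cron job or systemd timer, a script like this would only need to call the API when the public IP has actually changed.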