CPU Instruction Set Summary
```text
/home/ubuntu/src/cpp/tbox git:(master) lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Vendor ID: AuthenticAMD
Model name: AMD EPYC 9354 32-Core Processor
CPU family: 25
Model: 17
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 1
Stepping: 1
Frequency boost: enabled
CPU(s) scaling MHz: 54%
CPU max MHz: 3250.0000
CPU min MHz: 1500.0000
BogoMIPS: 6499.72
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma
cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd
mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total
cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc amd_ibpb_ret arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi
avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d sev sev_es debug_swap
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 1 MiB (32 instances)
L1i: 1 MiB (32 instances)
L2: 32 MiB (32 instances)
L3: 256 MiB (8 instances)
NUMA:
NUMA node(s): 4
NUMA node0 CPU(s): 0-7,32-39
NUMA node1 CPU(s): 8-15,40-47
NUMA node2 CPU(s): 16-23,48-55
NUMA node3 CPU(s): 24-31,56-63
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Mitigation; Safe RET
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsx async abort: Not affected
```
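The Flags line is the same list the kernel exposes in the `flags` field of /proc/cpuinfo. Checking it for a feature needs a whole-word match, because many flag names are prefixes of others (for example, `sse` is a prefix of `sse2` and `sse4_2`). A minimal sketch in C follows; `has_flag` is a hypothetical helper, not an existing library function, and it only inspects a string you pass in rather than reading /proc/cpuinfo itself.

```c
#include <string.h>

/* Separator test for space-separated flag lists. */
static int is_sep(char c) {
    return c == ' ' || c == '\t' || c == '\n';
}

/* Hypothetical helper: returns 1 if `name` occurs in `flags` as a whole
   whitespace-separated token (e.g. the lscpu "Flags:" value or the "flags"
   line of /proc/cpuinfo), 0 otherwise. A bare strstr() would report "sse"
   as present just because "sse2" is. */
int has_flag(const char *flags, const char *name) {
    size_t n = strlen(name);
    if (n == 0)
        return 0;
    for (const char *p = flags; (p = strstr(p, name)) != NULL; p++) {
        int starts_token = (p == flags) || is_sep(p[-1]);
        int ends_token = (p[n] == '\0') || is_sep(p[n]);
        if (starts_token && ends_token)
            return 1;
    }
    return 0;
}
```

Given the Flags line above as input, `has_flag(flags, "avx512f")` and `has_flag(flags, "sse4_2")` should both return 1, while a prefix such as `"avx512"` returns 0.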
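The NUMA lines use the kernel's CPU-list syntax (comma-separated CPU numbers and inclusive ranges), the same format as /sys/devices/system/node/nodeN/cpulist. Each node here holds 16 logical CPUs (64 CPUs across 4 nodes). A minimal parser sketch that counts the CPUs in such a list; `cpulist_count` is a hypothetical name, and any malformed input simply yields -1.

```c
#include <stdlib.h>

/* Hypothetical helper: counts the CPUs in a Linux CPU-list string such as
   "0-7,32-39" (format of lscpu's NUMA lines and of
   /sys/devices/system/node/nodeN/cpulist). Returns -1 on malformed input. */
int cpulist_count(const char *s) {
    int total = 0;
    char *end;
    while (*s != '\0' && *s != '\n') {
        long lo = strtol(s, &end, 10);
        if (end == s || lo < 0)
            return -1;                  /* expected a CPU number */
        long hi = lo;
        if (*end == '-') {              /* inclusive range "lo-hi" */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo)
                return -1;
        }
        total += (int)(hi - lo + 1);
        if (*end == ',')
            end++;                      /* move to the next element */
        else if (*end != '\0' && *end != '\n')
            return -1;                  /* unexpected character */
        s = end;
    }
    return total;
}
```

For example, `cpulist_count("0-7,32-39")` returns 16, matching the per-node count above.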
Summary of Major CPU Instruction Sets
| Instruction / Feature | Purpose / Functionality |
|---|---|
| mmx | Original SIMD (64-bit) instructions for integer math, multimedia, and DSP operations. |
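| sse / sse2 / sse3 / ssse3 | 128-bit SIMD. SSE covers single-precision floating point; SSE2 adds double precision and 128-bit integer operations; SSE3 (reported as pni) and SSSE3 add horizontal arithmetic and byte shuffles (e.g., PSHUFB). Widely used in graphics, video, and scientific workloads. |
| sse4a / sse4_1 / sse4_2 | SSE4.1 adds blends, rounding, dot products, and more packed-integer operations; SSE4.2 adds the CRC32 instruction and PCMPxSTRx string comparison. SSE4a is an AMD-only extension (EXTRQ/INSERTQ, MOVNTSS/MOVNTSD). POPCNT has its own flag (popcnt). |
| avx / avx2 | 256-bit SIMD. AVX widens floating-point operations to 256 bits and adds non-destructive three-operand (VEX) encoding; AVX2 extends the 256-bit width to integer operations and adds gather loads (used in AI, video, HPC). |
| avx512f | AVX-512 Foundation: 512-bit vectors (twice the AVX2 width), 32 vector registers (zmm0 to zmm31), and opmask registers (k0 to k7) for per-lane predication. |
| avx512dq / avx512bw / avx512vl | DQ adds doubleword/quadword integer and floating-point instructions; BW adds byte/word integer operations; VL lets AVX-512 instructions operate on 128-bit and 256-bit vectors. |
| avx512ifma / avx512_vnni / avx512_bf16 | Specialized AVX-512 arithmetic: IFMA provides 52-bit integer fused multiply-add (big-number and crypto math); VNNI provides int8/int16 dot-product accumulation for neural-network inference; BF16 provides bfloat16 conversion and dot products for mixed-precision ML. |
| bmi1 / bmi2 | Bit-manipulation instructions: BMI1 adds ANDN, BEXTR, BLSI/BLSR/BLSMSK, and TZCNT; BMI2 adds BZHI, PDEP/PEXT, MULX, and flag-free shifts/rotates (SARX/SHLX/SHRX, RORX). Useful in compression and crypto code. |
| fma | Fused multiply-add: computes a*b + c in one instruction with a single rounding, which is both faster and more accurate; central to linear algebra, ML, and scientific computing. |
| aes / pclmulqdq / vaes / vpclmulqdq | Hardware AES rounds (AESENC/AESDEC) and carry-less multiplication over GF(2) (used in AES-GCM's GHASH and fast CRC folding); VAES and VPCLMULQDQ apply the same operations across 256/512-bit vectors. |
| sha_ni | Hardware acceleration for SHA-1 and SHA-256 hashing. |
| rdseed / rdrand | On-chip hardware random number generation: RDSEED returns conditioned entropy-source output (intended for seeding), while RDRAND returns output of a hardware DRBG that is reseeded from that source. |
| adx (adcx / adox) | Add-with-carry variants that use only CF (ADCX) or only OF (ADOX), so two independent carry chains can be interleaved; speeds up multi-precision (big-integer) arithmetic, e.g., in RSA/ECC. |
| f16c / avx512_bf16 | 16-bit floating-point support: F16C converts between IEEE half precision (FP16) and FP32; AVX512_BF16 handles bfloat16, a different 16-bit format with FP32's exponent range (used in ML training and inference). |
| popcnt / abm (lzcnt) | POPCNT counts set bits; LZCNT counts leading zero bits and, unlike BSR, is defined for a zero input. LZCNT is reported under the abm flag. |
| clflush / clflushopt / clwb | Cache-line write-back instructions: CLFLUSH and CLFLUSHOPT (a more weakly ordered, faster variant) write back and invalidate a line; CLWB writes back a dirty line without necessarily evicting it (useful for persistent memory). |
| xsave / xsaveopt / xsavec / xsaves / xgetbv1 | Save/restore of extended processor state (x87, SSE, AVX, AVX-512 registers). XSAVEOPT, XSAVEC, and XSAVES skip unmodified or unused components, and xgetbv1 means XGETBV can report which components are in use. Used by the OS for context switches and by hypervisors for VM switches. |
| smep / smap / umip / pku / ospke | Memory-protection features: SMEP stops the kernel from executing user pages; SMAP stops it from accessing user pages except where explicitly allowed; UMIP blocks instructions such as SGDT/SIDT in user mode; PKU provides memory protection keys for user pages (ospke means the OS has enabled them). |
| smca / perfmon_v2 | SMCA is AMD's Scalable Machine Check Architecture (hardware error reporting); perfmon_v2 is AMD's Performance Monitoring version 2 (global control and status registers for the performance counters). |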
| cpb (Core Performance Boost) | Enables frequency boosting depending on workload and thermal conditions. |
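| bpext / perfctr_core / perfctr_nb / perfctr_llc / cqm* | bpext extends data breakpoints with address masks (debugging); perfctr_core, perfctr_nb, and perfctr_llc are core, northbridge (data fabric), and last-level-cache performance counters; the cqm* flags provide cache-occupancy and memory-bandwidth monitoring (useful for profiling). |
| rdtscp / constant_tsc / nonstop_tsc | Timestamp counter features: RDTSCP reads the TSC together with TSC_AUX (which Linux sets to the CPU and node number) and waits for earlier instructions to complete; constant_tsc means the TSC ticks at a fixed rate regardless of frequency scaling; nonstop_tsc means it keeps counting in deep idle states. |
| svm / npt / avic / x2avic / vgif / vnmi / vmcb_clean | AMD-V virtualization: SVM is the base extension; NPT (nested page tables) translates guest addresses in hardware; AVIC/x2AVIC virtualize the interrupt controller; vGIF and vNMI virtualize interrupt masking and NMIs; VMCB clean bits let the CPU skip reloading unchanged VM state. |
| tce | Translation Cache Extension: allows more selective invalidation of cached page-table entries (a TLB-maintenance optimization rather than a virtualization feature). |
| ibrs / ibpb / stibp / ssbd | Speculative-execution controls: IBRS, IBPB, and STIBP mitigate Spectre v2 (branch target injection); SSBD mitigates Speculative Store Bypass (Spectre v4). lscpu reports this CPU as not affected by Meltdown. |
| sev / sev_es | AMD Secure Encrypted Virtualization: encrypts each VM's memory with its own key for isolation and confidentiality; SEV-ES additionally encrypts guest register state on VM exits. |
| cat_l3 / mba / cdp_l3 / rdt_a | Resource allocation for shared servers: CAT partitions the L3 cache, CDP splits code and data allocations, MBA throttles memory bandwidth per class, and rdt_a indicates allocation support; on Linux these are managed through the resctrl filesystem. |
| fsgsbase | RDFSBASE/WRFSBASE/RDGSBASE/WRGSBASE let code read and write the FS/GS base addresses directly instead of through a system call; used for thread-local storage and user-space threading. |
| gfni | Galois-field GF(2^8) instructions (byte multiplication and affine transforms), used in cryptography and error-correcting codes such as Reed-Solomon. |
| pni (SSE3) | "Prescott New Instructions": the name Linux uses for SSE3; there is no separate sse3 flag. |
| movbe | Load or store with byte swap, speeding up access to big-endian (e.g., network byte order) data. |
| mwaitx / clzero | MONITORX/MWAITX let a thread wait for a memory write (with an optional timeout) in a low-power state; CLZERO zeroes an entire cache line. |
| la57 | 5-level paging: 57-bit virtual addresses (128 PiB of virtual address space, versus 256 TiB with 4-level, 48-bit paging). |
| rdpid | Reads TSC_AUX (which Linux sets to the current CPU and node number) without also reading the TSC; useful for quickly indexing per-CPU data. |
| fsrm | Fast Short REP MOV: REP MOVSB is fast even for short copies, so memcpy/memmove implementations can rely on it. |
| wbnoinvd | Writes back all modified cache lines without invalidating them (unlike WBINVD), so caches stay warm; a privileged instruction. |
| irperf / rapl / cpuid / topoext | Monitoring and topology: irperf is a dedicated instructions-retired counter; rapl exposes energy/power reporting (Running Average Power Limit); cpuid means the CPUID instruction is available; topoext adds CPUID leaves describing cores, threads, and caches. |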
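Because one binary may run on machines with different subsets of these features, code that uses them is usually selected at run time. A minimal sketch, assuming GCC or Clang on x86, where `__builtin_cpu_supports` is a compiler builtin; `simd_tier` is a hypothetical helper name. Note that the builtin spells some features differently from lscpu (`sse4.2` rather than `sse4_2`).

```c
/* Hypothetical helper: picks the widest SIMD tier the running CPU reports,
   using the GCC/Clang builtin __builtin_cpu_supports. The x86 feature names
   are only valid on x86 targets, so other targets fall back to "generic". */
const char *simd_tier(void) {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();                  /* harmless if already initialised */
    if (__builtin_cpu_supports("avx512f"))
        return "avx512f";
    if (__builtin_cpu_supports("avx2"))
        return "avx2";
    if (__builtin_cpu_supports("sse4.2"))
        return "sse4.2";
    if (__builtin_cpu_supports("sse2"))
        return "sse2";
#endif
    return "generic";
}
```

On the EPYC 9354 above, which reports avx512f, this would be expected to return `"avx512f"`; a caller would then branch to the matching implementation.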
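The CRC32 instruction from SSE4.2 computes CRC-32C (the Castagnoli polynomial), not the zlib/Ethernet CRC-32. A bit-at-a-time reference like the sketch below is handy for checking a hardware-accelerated version (for example, one built on the `_mm_crc32_u64` intrinsic with `-msse4.2`); `crc32c` is a hypothetical function name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: bitwise CRC-32C (Castagnoli, reflected polynomial
   0x82F63B78), the checksum the SSE4.2 CRC32 instruction accelerates. The
   hardware version processes up to 8 bytes per instruction instead of one
   bit per loop step. */
uint32_t crc32c(const void *data, size_t len) {
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;            /* standard initial value */
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)        /* shift out one bit at a time */
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;                           /* standard final inversion */
}
```

The standard CRC-32C check value for the ASCII string "123456789" is 0xE3069283, which makes a convenient test for any implementation.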
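BMI2's PDEP and PEXT are good examples of the bit-field operations in the bmi1/bmi2 row: they scatter and gather bits according to a mask. The following reference sketch shows their semantics in portable C; `pext64`/`pdep64` are hypothetical names, and with `-mbmi2` the hardware versions are available as the `_pext_u64`/`_pdep_u64` intrinsics.

```c
#include <stdint.h>

/* Hypothetical reference for PEXT: gathers the bits of `src` selected by
   `mask` and packs them into the low bits of the result. */
uint64_t pext64(uint64_t src, uint64_t mask) {
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t low = mask & (0 - mask);  /* lowest set bit of mask */
        if (src & low)
            out |= bit;
        mask &= mask - 1;                  /* clear that bit */
    }
    return out;
}

/* Hypothetical reference for PDEP: the inverse scatter, placing the low
   bits of `src` at the positions of the set bits of `mask`. */
uint64_t pdep64(uint64_t src, uint64_t mask) {
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t low = mask & (0 - mask);
        if (src & bit)
            out |= low;
        mask &= mask - 1;
    }
    return out;
}
```

For example, `pext64(0xB6, 0xF0)` extracts bits 4 to 7 of 0xB6 and returns 0xB; `pdep64(0xB, 0xF0)` puts them back, giving 0xB0.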
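Carry-less multiplication, which PCLMULQDQ performs on two 64-bit operands to produce a 128-bit result, is ordinary long multiplication with XOR in place of addition, i.e. polynomial multiplication over GF(2). A scalar 32-bit sketch (`clmul32` is a hypothetical name; the hardware intrinsic is `_mm_clmulepi64_si128` with `-mpclmul`):

```c
#include <stdint.h>

/* Hypothetical reference: carry-less product of two 32-bit polynomials over
   GF(2). Partial products are combined with XOR, so no carries propagate;
   the result fits in 63 bits. */
uint64_t clmul32(uint32_t a, uint32_t b) {
    uint64_t result = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1u)
            result ^= (uint64_t)a << i;
    return result;
}
```

For example, `clmul32(3, 3)` is 5, because (x + 1)^2 = x^2 + 1 over GF(2), whereas the ordinary product 3 * 3 is 9.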
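The carry chain that ADC, and ADX's ADCX/ADOX, speed up looks like the portable limb-by-limb addition below. This is a minimal sketch: `bigint_add` and the little-endian array of 64-bit limbs are assumptions for illustration. ADX matters most inside big-integer multiplication, where two such chains run interleaved; the `_addcarry_u64` and `_addcarryx_u64` intrinsics expose add-with-carry directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: r = a + b for n-limb little-endian integers
   (limb 0 is least significant). Returns the final carry (0 or 1).
   Each iteration is one "add with carry" step; ADC performs it with a
   single carry flag, and ADCX/ADOX allow two independent chains. */
uint64_t bigint_add(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n) {
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = a[i] + carry;
        uint64_t c1 = s < carry;           /* overflow from adding the carry */
        r[i] = s + b[i];
        uint64_t c2 = r[i] < s;            /* overflow from adding b[i] */
        carry = c1 | c2;
    }
    return carry;
}
```

Adding 1 to the two-limb value {UINT64_MAX, 0}, for instance, propagates the carry into the second limb and yields {0, 1}.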
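For POPCNT and LZCNT, a minimal sketch assuming GCC or Clang: the builtins `__builtin_popcount` and `__builtin_clz` are typically compiled to single POPCNT/LZCNT instructions when the target allows it (e.g. `-mpopcnt -mlzcnt`, or `-march=native` on this machine), and to a software sequence otherwise. `popcount32`/`lzcnt32` are hypothetical wrapper names.

```c
#include <stdint.h>

/* Hypothetical wrapper: number of set bits (POPCNT semantics). */
int popcount32(uint32_t x) {
    return __builtin_popcount(x);
}

/* Hypothetical wrapper with LZCNT semantics: LZCNT is defined for 0
   (it returns the operand width), while __builtin_clz(0) is undefined,
   so the zero case is handled explicitly. */
int lzcnt32(uint32_t x) {
    return x == 0 ? 32 : __builtin_clz(x);
}
```

The explicit zero check reflects the difference between LZCNT and the older BSR instruction noted in the table.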
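GFNI's GF2P8MULB multiplies bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1, the same field AES uses. A scalar sketch of one such multiplication (`gf256_mul` is a hypothetical name); the vector instruction, exposed as `_mm_gf2p8mul_epi8` with `-mgfni`, applies it to every byte lane at once.

```c
#include <stdint.h>

/* Hypothetical reference: multiply two elements of GF(2^8) modulo the
   AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), shift-and-XOR style. */
uint8_t gf256_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;                /* add (XOR) the current multiple of a */
        uint8_t carry = a & 0x80;
        a <<= 1;                   /* multiply a by x */
        if (carry)
            a ^= 0x1B;             /* reduce modulo the field polynomial */
        b >>= 1;
    }
    return p;
}
```

FIPS-197's worked example {57} * {83} = {c1} is a convenient test case.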
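The la57 row and the `Address sizes: 46 bits physical, 57 bits virtual` line translate into byte counts by powers of two (1 TiB = 2^40 bytes, 1 PiB = 2^50 bytes):

$$
\begin{aligned}
2^{48}\ \text{B} &= 2^{8}\cdot 2^{40}\ \text{B} = 256\ \text{TiB} && \text{(4-level paging, 48-bit virtual)}\\
2^{57}\ \text{B} &= 2^{7}\cdot 2^{50}\ \text{B} = 128\ \text{PiB} && \text{(5-level paging, 57-bit virtual)}\\
2^{46}\ \text{B} &= 2^{6}\cdot 2^{40}\ \text{B} = 64\ \text{TiB} && \text{(46-bit physical)}
\end{aligned}
$$

On Linux, user space gets the lower half of the virtual range: 128 TiB with 4-level paging and 64 PiB with 5-level paging.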
Architecture Summary:
| Category | Support |
|---|---|
| SIMD Extensions | MMX, SSE through SSE4.2 (plus AMD's SSE4a), AVX, AVX2, and an extensive AVX-512 set (F, CD, DQ, BW, VL, IFMA, VBMI, VBMI2, VNNI, BF16, BITALG, VPOPCNTDQ). |
| Crypto Extensions | AES, PCLMULQDQ, SHA, VAES, VPCLMULQDQ, GFNI. |
| Security / Virtualization | SEV, SEV-ES, SVM (AMD-V), SMEP, SMAP, IBRS, IBPB, STIBP, SSBD. |
| Performance / Power | FMA, BMI1/BMI2, CPB, PERFMON_v2, CAT_L3, MBA, CPPC (Collaborative Processor Performance Control). |
| Memory / Addressing | LA57 (5‑level paging), CLWB, CLZERO, WBNOINVD. |
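The summary above describes what the CPU supports; whether a given binary uses those extensions depends on how it was compiled. A minimal sketch, assuming GCC or Clang: their predefined macros (`__AVX512F__`, `__AVX2__`, `__SSE4_2__`, `__SSE2__`) reveal what the compiler was allowed to target, for example with `-march=native` on this machine or `-march=znver4` in recent compiler releases. `compiled_simd_level` is a hypothetical helper name.

```c
/* Hypothetical helper: reports the widest SIMD level the compiler was
   allowed to target when this file was built. Unlike a run-time CPUID
   check, it says nothing about the machine the binary later runs on. */
const char *compiled_simd_level(void) {
#if defined(__AVX512F__)
    return "avx512f";
#elif defined(__AVX2__)
    return "avx2";
#elif defined(__SSE4_2__)
    return "sse4.2";
#elif defined(__SSE2__)
    return "sse2";
#else
    return "generic";
#endif
}
```

A binary built for avx512f will fault with an illegal-instruction error on a CPU lacking it, so portable builds still pair compile-time targeting with a run-time feature check.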