CPU Instruction Set Summary

/home/ubuntu/src/cpp/tbox git:(master) lscpu
Architecture:             x86_64
  CPU op-mode(s):         32-bit, 64-bit
  Address sizes:          46 bits physical, 57 bits virtual
  Byte Order:             Little Endian
CPU(s):                   64
  On-line CPU(s) list:    0-63
Vendor ID:                AuthenticAMD
  Model name:             AMD EPYC 9354 32-Core Processor
    CPU family:           25
    Model:                17
    Thread(s) per core:   2
    Core(s) per socket:   32
    Socket(s):            1
    Stepping:             1
    Frequency boost:      enabled
    CPU(s) scaling MHz:   54%
    CPU max MHz:          3250.0000
    CPU min MHz:          1500.0000
    BogoMIPS:             6499.72
    Flags:                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma
                          cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd
                          mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc
                          cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc amd_ibpb_ret arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi
                          avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d sev sev_es debug_swap
Virtualization features:
  Virtualization:         AMD-V
Caches (sum of all):
  L1d:                    1 MiB (32 instances)
  L1i:                    1 MiB (32 instances)
  L2:                     32 MiB (32 instances)
  L3:                     256 MiB (8 instances)
NUMA:
  NUMA node(s):           4
  NUMA node0 CPU(s):      0-7,32-39
  NUMA node1 CPU(s):      8-15,40-47
  NUMA node2 CPU(s):      16-23,48-55
  NUMA node3 CPU(s):      24-31,56-63
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Mitigation; Safe RET
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected
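
To work with the flag list programmatically, it is usually easier to read `/proc/cpuinfo` than to scrape `lscpu`, because terminal wrapping can split a flag name across two lines. The following is a minimal Python sketch, assuming a Linux x86 system whose `/proc/cpuinfo` contains a `flags` line. The function names `parse_cpuinfo_flags` and `missing_flags` are made up for illustration and are not part of any library.

```python
from pathlib import Path


def parse_cpuinfo_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. a non-x86 system)


def missing_flags(available, required):
    """Return the required flags that are absent, in sorted order."""
    return sorted(set(required) - set(available))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        flags = parse_cpuinfo_flags(cpuinfo.read_text())
        # Example requirement list; adjust it to what your binary needs.
        print("missing:", missing_flags(flags, ["avx2", "avx512f", "avx512_vnni"]))
```

On the machine shown above, all three example flags appear in the `lscpu` output, so the list of missing flags would be expected to be empty.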

Summary of Major CPU Instruction Sets

Instruction / Feature	Purpose / Functionality
mmx	Original SIMD (64-bit) instructions for integer math, multimedia, and DSP operations.
sse / sse2 / sse3 / ssse3	128-bit SIMD instructions for floating-point and integer processing; improve performance in graphics, video, and scientific workloads.
sse4a / sse4_1 / sse4_2	Add more efficient vector and string-processing operations; sse4_2 includes `crc32`, and `popcnt` arrived alongside it as a separate flag. sse4a is a small AMD-only extension.
avx / avx2	256-bit SIMD extensions; enable faster floating-point and integer computation (used by AI, video, HPC).
avx512f	Base 512-bit SIMD (AVX-512 Foundation). Doubles vector width for massive parallel math.
avx512dq / avx512bw / avx512vl	Add doubleword/quadword (DQ) and byte/word (BW) operations, plus vector-length (VL) support that makes AVX-512 instructions usable on 128/256-bit registers.
avx512ifma / avx512_vnni / avx512_bf16	Specialized AVX-512: 52-bit integer fused multiply-add (IFMA, for big-number crypto), neural-network dot products (VNNI), and bfloat16 mixed-precision math (BF16).
bmi1 / bmi2	Bit-manipulation instructions for efficient bit-field operations (e.g., in crypto or data compression).
fma	Fused multiply-add (combined multiply and add in one instruction); crucial for high-performance math, ML, scientific computing.
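aes / pclmulqdq / vaes / vpclmulqdq	Hardware acceleration for AES encryption and carry-less (GF) polynomial multiplication; vaes and vpclmulqdq are the wider vector forms. Used in cryptographic workloads.
sha_ni	Hardware acceleration for SHA hashing functions (SHA-1, SHA-256).
rdseed / rdrand	Hardware random numbers: rdseed returns output from the hardware entropy source (intended for seeding), while rdrand returns output from a hardware-seeded generator.
adx (adcx / adox)	Add-with-carry instructions that use separate carry chains; they speed up multi-precision arithmetic (e.g., big-integer arithmetic for crypto). lscpu reports them as adx.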
f16c / avx512_bf16	Conversions and operations involving 16-bit floating-point formats (used in ML / inference).
popcnt / lzcnt (abm)	Fast bit-counting and “leading zero count” instructions; lscpu reports lzcnt under the abm flag.
clflush / clflushopt / clwb	Cache-line flush and write-back instructions; they control when cached data reaches memory (clwb writes a line back without necessarily evicting it).
xsave / xsaveopt / xsavec / xsaves / xgetbv1	Efficient saving/restoring of extended CPU states (for virtualization, context switching).
smep / smap / umip / pku / ospke	Security and memory protection extensions (supervisor/user mode access prevention, key isolation).
smca / perfmon_v2	Scalable Machine Check Architecture (hardware error reporting) and version 2 of AMD's performance-monitoring interface.
cpb (Core Performance Boost)	Enables frequency boosting depending on workload and thermal conditions.
bpext / perfctr_core / perfctr_llc / cqm*	Data-breakpoint extension (bpext), extended core and last-level-cache performance counters, and cache QoS monitoring (useful for debugging and profiling).
rdtscp / constant_tsc / nonstop_tsc	Timestamp counter enhancements for precise timing.
tce / npt / svm / vgif / avic / vnmi / vmcb_clean	Virtualization extensions (AMD-V): nested paging, interrupt virtualization, and VMCB caching improve performance and isolation in VMs.
ibrs / ibpb / stibp / ssbd	Controls used to mitigate speculative-execution vulnerabilities: Spectre v2 (ibrs, ibpb, stibp) and Speculative Store Bypass (ssbd).
sev / sev_es	AMD Secure Encrypted Virtualization: encrypts VM memory (sev) and guest register state (sev_es) for isolation and confidentiality.
cat_l3 / mba / cdp_l3 / rdt_a	Cache and memory bandwidth allocation; useful in data centers for resource partitioning.
fsgsbase	User-space instructions for reading and writing the FS/GS base addresses without a system call; important for thread-local storage and user-space threading models.
gfni	Galois-field arithmetic instructions (used in cryptography and error correction).
pni (SSE3)	Linux's name for SSE3 (“Prescott New Instructions”); it is the same feature, not a separate one.
movbe	Load or store with a byte swap in a single instruction; improves performance in endian conversion.
mwaitx / clzero	Power management and cache-zeroing instructions (low-latency cleanup or sleep states).
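la57	Enables 5-level paging with 57-bit virtual addresses, growing the virtual address space from 256 TiB (48-bit) to 128 PiB.
rdpid	Reads the IA32_TSC_AUX value (which the OS typically loads with the CPU and node number) without a system call; useful for per-CPU data structures and per-thread optimization.
fsrm	Fast Short REP MOVSB: makes `rep movsb` efficient for short copies, speeding up string/memory operations.
wbnoinvd	Writes back all modified cache lines without invalidating them (unlike WBINVD), so the cache stays warm afterwards; helps in certain low-latency use cases.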
irperf / rapl / cpuid / topoext	Performance and topology querying tools (used by OS and monitoring software).
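
The `la57` row can be cross-checked against the `Address sizes` line of the `lscpu` output (46 bits physical, 57 bits virtual) with a little arithmetic. This is a minimal Python sketch; the helper name `address_space` is hypothetical and simply converts a bit width into a binary-prefixed size.

```python
def address_space(bits):
    """Return the size of a 2**bits-byte address space with a binary unit prefix."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    size = 1 << bits
    index = 0
    # Divide by 1024 until the value fits the largest applicable unit.
    while size >= 1024 and index < len(units) - 1:
        size //= 1024
        index += 1
    return f"{size} {units[index]}"


if __name__ == "__main__":
    print("physical (46 bits):", address_space(46))          # 64 TiB
    print("virtual, 4-level (48 bits):", address_space(48))  # 256 TiB
    print("virtual, 5-level (57 bits):", address_space(57))  # 128 PiB
```

The 57-bit figure covers the whole virtual address space; how it is split between kernel and user space depends on the operating system.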

Architecture Summary:

Category	Support
SIMD Extensions	MMX, SSE – SSE4.2, AVX – AVX2, broad AVX-512 suite (F, CD, DQ, BW, VL, IFMA, VNNI, BF16, VBMI, VBMI2, BITALG, VPOPCNTDQ).
Crypto Extensions	AES, PCLMULQDQ, SHA, VAES, VPCLMULQDQ, GFNI.
Security / Virtualization	SME, SEV, SEV‑ES, SVM (AMD‑V), SMEP, SMAP, IBRS, IBPB, STIBP, SSBD.
Performance / Power	FMA, BMI1+2, CPB, PERFMON_v2, CAT_L3, MBA, CPPC (Collaborative Processor Performance Control).
Memory / Addressing	LA57 (5‑level paging), CLWB, CLZERO, WBNOINVD.
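
The categories above can also be used as a lookup table when inspecting another machine. The sketch below is one possible way to do that in Python: the `CATEGORIES` mapping is transcribed by hand from the summary using Linux flag names (an assumption for illustration, not an official classification), and `summarize` is a hypothetical helper that reports which flags of each category are present in a given flag set.

```python
# Category -> Linux flag names, transcribed from the summary table (illustrative only).
CATEGORIES = {
    "SIMD Extensions": [
        "mmx", "sse", "sse2", "pni", "ssse3", "sse4_1", "sse4_2", "avx", "avx2",
        "avx512f", "avx512dq", "avx512bw", "avx512vl", "avx512ifma", "avx512_vnni",
        "avx512_bf16", "avx512vbmi", "avx512_vbmi2", "avx512_bitalg", "avx512_vpopcntdq",
    ],
    "Crypto Extensions": ["aes", "pclmulqdq", "sha_ni", "vaes", "vpclmulqdq", "gfni"],
    "Security / Virtualization": [
        "sme", "sev", "sev_es", "svm", "smep", "smap", "ibrs", "ibpb", "stibp", "ssbd",
    ],
    "Performance / Power": [
        "fma", "bmi1", "bmi2", "cpb", "perfmon_v2", "cat_l3", "mba", "cppc",
    ],
    "Memory / Addressing": ["la57", "clwb", "clzero", "wbnoinvd"],
}


def summarize(flags):
    """Map each category to the subset of its flags present in `flags`."""
    present = set(flags)
    return {
        category: [name for name in names if name in present]
        for category, names in CATEGORIES.items()
    }


if __name__ == "__main__":
    # A small hand-written flag set; in practice pass the flags read from the system.
    example = {"aes", "gfni", "avx2", "la57", "fma"}
    for category, found in summarize(example).items():
        print(f"{category}: {', '.join(found) if found else '(none)'}")
```

Keeping the mapping as plain data makes it easy to adjust when a flag is renamed or a category is added.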
